The main argument against using UUIDs as a primary key in a relational database seems to be that if the UUIDs aren't serialized sequentially tables can become fragmented. At first blush, version 1 UUIDs seem like they would work (they are sequential after all) but because their timestamps are stored in reverse order (29, 30, 31 becomes 92, 03, 13) when the UUIDs are converted into binary values they aren't consecutive.
Recognizing this problem, Microsoft introduced a built in function which allows developers to generate database friendly UUIDs; unfortunately, the function is only available to SQL Server users. Ideally we would like to abstract Microsoft's logic out of the database and move it into our application where we would be free to leverage it anywhere; to this end, I decided to analyze SQL Server's output by creating a loop of 1000 inserts and carefully inspecting the results.
Upon looking at the sample set of data I noticed a few very obvious things. The first was that the format of the generated ids was consistent with that of a traditional UUID: 16 bytes divided into 5 segments of 4, 2, 2, 2 and 6 bytes. If parsed like a traditional UUID the variant field was well formed (it resolved to Leach-Salz) but the version and timestamp were not. I also noticed that the first byte where time_high was normally stored changed with greater frequency than the last byte (where a traditional version 1 UUID most often varied). It looked as if a version 1 UUID had been generated in the database and its byte order had been reversed. Indeed, flipping the byte order of the time_high, time_mid, and time_low fields of any SQL Server generated id allowed me to convert it into a valid version 1 UUID; I could verify this by looking at the timestamp:
Manually inserting a version 1 UUID into the database produced similar results. The byte order of the time_high, time_mid, and time_low fields of the UUID were stored in reverse. This was the algorithm!
Abstracting out the logic was simply a matter of identifying which segments of the UUID needed to have their byte order changed and encapsulating the necessary transformations in a builder class. To test the class, I used it to generate one million globally unique identifiers and checked to see if SQL Server stored them sequentially. It did! My code worked and seemed portable. I was generating database friendly sequential ids outside of SQL Server!
Almost. The algorithm worked perfectly under MS SQL Server but operated inconsistently when used with any other database. Note that the two UUIDs above represent the respective timestamps of "Tuesday, February 28, 2012 7:10:17 PM GMT" and "Tuesday, February 28, 2012 7:10:20 PM GMT". SQL Server inserted these ids in the right order but if you look at their actual numeric values they are NOT sequential. The hex value 0x0001a3db (107483) should PRECEDE 0x0001abd9 (12896). SQL Server was obviously sorting its ids using something other than their binary value. As I suspected it may be using hints supplied by the proprietary UNIQEIDENTIFIER type, I stopped investigating. This wasn't going to be a portable solution.
Persisting UUIDs generated using this algorithm resulted in sequential writes to SQL Server, MySQL and HqSQL databases. The calls were ANSI SQL compliant as I was now storing the UUID as a BINARY(16). In addition, because I left the node bits in the least significant spot, UUIDs could be generated in parallel on an arbitrary number of hosts and still be written in the order they were created (the format of the UUID was roughly a concatenation of 'create date' and 'host id').
I also decided to change the version number of the UUID from 1 to 13 (D). This would prevent accidental collisions with version 1 UUIDs. I am presently implementing this strategy in our back end and welcome any feedback.