Talk:Universally unique identifier/Archive 1

From Wikipedia, the free encyclopedia

Bring back the stat!

I am saddened that this article no longer includes this stat in the opening paragraph: "A UUID is a 16-byte (128-bit) number. The number of theoretically possible UUIDs is therefore 2^(16*8) = 2^128 = 256^16 or about 3.4 × 10^38. This means that 1 trillion UUIDs would have to be created every nanosecond for 10 billion years to exhaust the number of UUIDs." It made the article several times more readable, and gave the context straight away, even if it is not genuinely of mathematical use... — Preceding unsigned comment added by 87.127.211.206 (talk) 13:37, 25 August 2011 (UTC)

Doesn't a 128-bit number have 2^128 possible values (where ^ means "raised to the power")? I don't understand the phrase "216*8 = 2128 = 25616 or about 3.4 × 1038". 2128 is not equal to 25616, and 216*8 is not 2128.

I have read, elsewhere, that there are enough UUIDs to assign one to every atom in the known universe. On the other hand, http://en.wikipedia.org/wiki/Observable_universe says that the known universe has about 10^80 atoms. I think that 10^80 is about 2^265, which is larger than 2^128. I'm confused. 75.146.141.142 (talk) 22:19, 27 April 2012 (UTC)
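
The arithmetic in this thread can be checked directly. The following is only a sketch in Python (the variable names are illustrative, and the 10^80 figure is the estimate quoted above, not something derived here). It confirms that the three forms in the quoted sentence agree once the superscripts are read as exponents, and it compares 2^128 with 10^80.

```python
import math

# Size of the 128-bit space, written the three ways the quoted sentence intends.
total = 2 ** 128
assert total == 2 ** (16 * 8) == 256 ** 16

# "1 trillion UUIDs every nanosecond": years needed to enumerate the whole space.
per_second = 10 ** 12 * 10 ** 9            # 1e12 per nanosecond, 1e9 ns per second
seconds_per_year = 365.25 * 24 * 3600
years_to_exhaust = total / per_second / seconds_per_year

# Estimated atoms in the observable universe (figure taken from the comment above).
atoms = 10 ** 80
atoms_bits = 80 * math.log2(10)            # exponent b such that 2**b == 10**80

print(f"2^128 is about {total:.3e}")
print(f"years to exhaust: {years_to_exhaust:.2e}")
print(f"10^80 is about 2^{atoms_bits:.1f}")
print(f"atoms per UUID: {atoms / total:.1e}")
```

On these figures 10^80 is far larger than 2^128, so there are not enough UUIDs to give one to every atom, while the "10 billion years" statement is of the right order.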

We need more material in this article

As a Linux user, I understand the importance and awesomeness of UUIDs. However, I'd like to know more. For instance, when did the idea of UUID come into creation? When was it created as a set standard? What I'm trying to say is that this article needs more dates, years, names of people related to those things. --Cyberman (talk) 02:31, 14 June 2008 (UTC)


ETC

What is Leach-Salz? --Abdull 15:16, 4 April 2007 (UTC)

The UUID variant described in RFC 4122, "A Universally Unique IDentifier (UUID) URN Namespace", by Paul Leach, Michael Mealling, and Rich Salz. Guy Harris 17:44, 4 April 2007 (UTC)
Perfect answer, thank you very much! I first thought it has something to do with salt... --Abdull 18:55, 4 May 2007 (UTC)

private-use UUIDs?

The article doesn't make clear if there is a "private-use" address space for UUIDs, i.e., some rules to create UUIDs not intended to be exported from a closed system, like we have IPv4 and IPv6 private networks, Unicode's Private Use Area, the .local domain name, local-use EAN-13 barcodes and so on... The most logical idea is to use a specific reserved version number, but the RFC 4122 doesn't seem to mention anything other than the 5 versions (clock, dce, md5, random and sha1). --Juliano (T) 01:39, 20 August 2007 (UTC)

AFAIK there are no such "private use" UUIDs. The inventors of these UUIDs claim that the collision risk is small enough that in practice there are no collisions, so there is no need for "private use" UUIDs. --81.27.124.161 (talk) 21:22, 9 April 2008 (UTC) (RokerHRO)
AFAIK?? Seems like that acronym is a good example of private use UUIDs. The acronym is unintelligible, its meaning known only to a select few ... as the link provided demonstrates.

K. Kellogg-Smith (talk) 13:09, 20 October 2011 (UTC)

Collisions

"This means that 1 trillion UUIDs would have to be created every nanosecond for 10 billion years to exhaust the number of UUIDs." this leads to misunderstanding. While the number space is that big, chance for a collision are much higher, indeed! I.e. see http://en.wikipedia.org/wiki/Birthday_paradox —Preceding unsigned comment added by 84.129.161.7 (talk) 03:16, August 30, 2007 (UTC)

Duplicate UUID's on hosts

Is it possible to have the same valid UUID on multiple hosts in a network? —Preceding unsigned comment added by 128.222.37.20 (talk) 05:15, 13 September 2007 (UTC)

It is, but very unlikely as explained in the article if the UUIDs are chosen randomly. --81.27.124.161 (talk) 21:24, 9 April 2008 (UTC) (RokerHRO)

"Well Known" UUID's

Perhaps a section should be added to list some well known UUIDs. The article already uses the MS IUnknown UUID as an example of such a UUID - however a more exhaustive list of "known" aka documented UUIDs that are for specific uses should be added.Myrdred (talk) 16:43, 9 February 2008 (UTC)

Am I the only one horrified by this idea?

What a terrible idea! So basically because we can't be bothered to create a central listing of things, we're just generating an enormous gillion-character string and "hoping" that since it's so big it's not going to collide?! This is used in ext3?! Wow I feel great knowing that the only reason my filesystem isn't catastrophically corrupted is because I'm lucky. This type of thing might have limited applications where it's impossible to safely merge IDs that could potentially collide (can't imagine why it would be impossible but maybe for some crazy deep space probe where energy can't be spared for merging processing and high bandwidth is critical), but for labelling articles on E's website?! This is terrible programming! :D\=< (talk) 02:11, 2 March 2008 (UTC)

It is not enormous, it is not gillion-character, and it is not a string. It is a 128-bit number. The purpose of UUID is to be used when you simply don't have any means of using a central registry. Two autonomous systems, with no previously established communication between them or access to the Internet, may create lots of objects identified by UUIDs; they can then be merged together into one, and their objects will still be unique.
Or do you really want to force everyone making a filesystem on a newly-installed system to have an Internet connection, in order for mke2fs to connect to a central registry to create the UUID of your partition? And even in the remote possibility of two objects in the universe having the same UUID, it is easier to believe in the existence of the Invisible Pink Unicorn than in their happening to coexist in the same context.
Your filesystem won't be catastrophically corrupted just because it has the same UUID as a news article on some website. It won't be corrupted even if there were another object on the same computer with the same UUID. You have to be extremely unlucky to get two partitions with the same UUID on the same computer, and even so, your biggest problem will be referencing one of them by UUID when mounting, something very simple to fix.
The principle is the same as in hashing. For mission-critical systems, you either won't depend on UUIDs or will properly handle UUIDs so that they don't collide or collisions will be treated accordingly.
--Juliano (T) 17:00, 9 March 2008 (UTC)
And how is a filesystem not mission-critical? And give one example where you can't have a central registry. I don't understand why partitions have UUIDs- they're already registered in the drive's partition table, and each hard drive's partitions are numbered and made available by the BIOS. No need for an internet connection of course. Obviously there are cases where data needs to be merged and still have a unique identifier but if you actually give some processing to sorting through the data and preventing collisions, reissuing identifiers, and generally doing what a merge should be doing, then you have no problems. It's a lot slower than just issuing huge UUIDs to everything and crossing your fingers, which does work, but come on! How ugly! This is not the right thing. :D\=< (talk) 15:19, 10 March 2008 (UTC)
Yes, sure, a filesystem is mission-critical... but what is critical is the data contained in it, not its identification among all the partitions of the system. Once you build a mission-critical system, the UUIDs of the partitions are set in stone and won't change anymore during its mission-critical activity. You use UUIDs only to identify each partition among the others. Once mounted, the kernel uses its (major, minor) tuple to access it. And you are thinking too strictly... forget the BIOS and the drive's partition table. A computer is not restricted to having a single drive.
Suppose you have 2 hard drives in your computer, each one with a few partitions, and each partition identified by its UUID. If you consider the possibility of changing their internal connections, referring to them by their /dev/sdXY path becomes inherently broken. Using UUIDs, you can buy a new SATA PCI card and move one of the disks to it, and the system will boot with all mount points in place, as if nothing had changed. Or you can buy a USB enclosure, put the other disk into it and plug it into the USB port, and the system will still boot and mount all the partitions properly on their mount points, since they still have the same UUIDs they had when created. You can bring a hard drive from a friend and plug it in anywhere on your system, reassured that it would be easier to win the Lottery's full prize three consecutive times than to have his partitions collide with yours (unless he knew beforehand the UUID of one of your system's partitions and set it on his drive, but this is another problem).
With dynamic and redundant disk systems, like LVM and RAID, where disks and partitions may be freely moved and new ones added, UUIDs are even more important. You ask for one example where you can't have a central registry... disk partitioning is pretty much an excellent example. Creating and formatting RAID partitions on hardware RAID implementations is done by the BIOS, before the operating system is loaded. How can you think of using a central registry in this case?
I work with distributed computing, process migration and some grid computing. We use UUIDs extensively to identify nodes, tasks and all sort of objects being processed and passed around. Objects must be unique on a system that is not fully connected. Sections of the system get disconnected from the rest and get reconnected a few hours later. The idea of a central registry for UUIDs is simply crazy and stupid. Half the objects created don't even live for more than a few seconds. A central registry not only would slow everything down, it would break its mobility and create a single point of failure, for no added value. Such a distributed system is inherently designed with fault-tolerance in mind, since you may spawn a task to a node, the node crashes and you won't ever receive its answer back. UUID collision is not even considered, it is a non-issue.
And talking about BIOS, your worst nightmare is already becoming true. The BIOS is obsolete and is currently being replaced with the Extensible Firmware Interface, which in turn replaces the old and obsolete Master Boot Record with the new GUID Partition Table, heavily based on UUIDs. Sorry.
--Juliano (T) 17:19, 10 March 2008 (UTC)
Froth, I thought about this problem too. I guess I would feel safer if all programs that use UUID's are written to handle UUID collisions safely, but I guess the probability of collision is known to be too small to bother with. I wonder if there are many life-and-death applications that blindly depend on uniqueness of UUIDs, e.g.avionics, railways, medical equipment. Perhaps only the most hardnosed engineer/mathematician-types would feel safe. Glueball (talk) 11:47, 21 April 2008 (UTC)

xxxxxxxx-xxxx-3xxx-xxxxxxxxxx

This pattern is supplied in the section on Version 3 UUIDs. Isn't there a group of four hex digits missing here? I hesitate to correct it myself in case I'm missing a key point about version 3. Jimgawn (talk) 18:36, 11 April 2008 (UTC)

 Fixed. You are right. It should be (and now is) xxxxxxxx-xxxx-3xxx-xxxx-xxxxxxxxxxxx . --68.0.124.33 (talk) 06:07, 15 January 2009 (UTC)
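
As a quick check of the corrected pattern, here is a minimal sketch in Python using the standard `uuid` module. The regular expression and the name "example.org" are illustrative choices, not anything taken from the article.

```python
import re
import uuid

# Canonical text form: 8-4-4-4-12 hex digits, with the version digit
# leading the third group ("3" for name-based MD5 UUIDs).
V3_PATTERN = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-3[0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

# "example.org" is an arbitrary name chosen for illustration.
u = uuid.uuid3(uuid.NAMESPACE_DNS, "example.org")
text = str(u)

# The same string with the fourth group dropped, mimicking the old, broken pattern.
broken = text[:19] + text[24:]

print(text)
print(bool(V3_PATTERN.match(text)))     # True
print(bool(V3_PATTERN.match(broken)))   # False
```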

Also, it would be better to write "340 282 366 920 938 463 463 374 607 431 768 211 456 possible UUIDs" instead of "There are 340,282,366,920,938,463,463,374,607,431,768,211,456 possible UUIDs". — Preceding unsigned comment added by Palladipeloarancione (talkcontribs) 13:47, 5 April 2012 (UTC)

Where UUID stored?

Please say where the UUID number is stored in flash memory cards.

I have a CF card that shows up as

/dev/disk/by-uuid/2004-1223 -> ../../sdb1

and an SD card that doesn't seem to have a UUID. Can one use e.g., the GNU/Linux dd command to see where the UUID is stored? Jidanni (talk) 01:24, 13 July 2008 (UTC)

It is not a property of the SD card, but of the partitions contained in it. It is part of the filesystem, depending on which filesystem you have in it. This number (2004-1223) is most likely a 32-bit FAT volume serial number, which is not a UUID as described in this article.
This number is chosen when you create the filesystem in (i.e., format) the SD card. On Linux, you may force a given number by passing the -i xxxxxxxx parameter to mkdosfs.
The FAT article tells exactly where the serial number is stored inside the partition, but this is out of the scope of this article. UUIDs are not used by Windows to identify its partitions.
--Juliano (T) 17:35, 13 July 2008 (UTC)

Is there an error or am I a bad counter?

I think, that 8+4+4+4+12=32 not 36? —Preceding unsigned comment added by BartekBl (talkcontribs) 20:19, 14 November 2008 (UTC)

32 hexadecimal digits, plus 4 dashes, equals 36 characters in the textual representation. --Juliano (T) 11:31, 15 November 2008 (UTC)
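
The count can be confirmed with a short Python sketch; it uses the standard `uuid` module, and the random UUID is just a convenient sample.

```python
import uuid

u = uuid.uuid4()              # any UUID will do; a random one is used here
text = str(u)                 # canonical textual representation

groups = text.split("-")
group_lengths = [len(g) for g in groups]
hex_digits = sum(group_lengths)
dashes = text.count("-")

print(group_lengths)                   # [8, 4, 4, 4, 12]
print(hex_digits, dashes, len(text))   # 32 4 36
print(len(u.bytes) * 8)                # 128
```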

NCS UUID

When I first came across UUIDs they always seemed to be referred to as NCS UUIDs, but there is no mention of NCS in the article. I don't know where NCS fits in, but I would like to. —Preceding unsigned comment added by 170.148.215.156 (talk) 14:16, 5 March 2009 (UTC)

Too many implementations

Am I the only one who thinks that mentioning implementations of UUIDs in every possible language/environment (29 currently) only clutters the article and doesn't belong to encyclopedia? I think the section should either be removed or shortened a lot. Svick (talk) 18:55, 7 October 2009 (UTC)

I think it's useful to continue to include them. They might be moved to List of UUID implementations, but that seems like overkill. Maybe change the list to a toggled table? Ant (talk) 15:29, 19 November 2010 (UTC)
Why do you think it's useful? Could you elaborate on that? Keep in mind that Wikipedia is not a repository of links. Svick (talk) 17:51, 20 November 2010 (UTC)

"Sufficient entropy" requirement for collision avoidance

However, these probabilities only hold when the UUIDs are generated using sufficient entropy. [1]

This is true, but it seems to gloss over an important factor. Some UUID allocation schemes are deliberately running with reduced entropy. On the same page, for example, SQL Server's NEWSEQUENTIALID() is mentioned, and there's the practice of overwriting a portion of the GUID with predictable information, for example the wFormatTag/GUID conversion used in WAVE_FORMAT_EXTENSIBLE headers. I don't assert that either of these examples drastically reduce the safety of the system, but similar implementations may be more greedy with the number of predictable bits, and substantially increase the risk of collision. --ToobMug (talk) 13:36, 10 February 2010 (UTC)

Such schemes generally use the MAC address based form of UUIDs. Those do not depend on entropy for uniqueness, only the improbability of generating two UUIDs exactly 2^60 / 10 ^ 7 seconds (which is more than 3600 years) apart and expecting them to differ. Everything else is by central authority: First digits of MAC address allocated by IEEE to hardware maker, remaining digits of MAC allocated by hardware maker (one per card made), 60 bit Date/time allocated by the clock on the computer with that card, 14 bit reboot counter allocated by OS of that computer, variant and type bits fixed by standard. To generate sequential UUIDs with this scheme, simply reserve a continuous time interval of the needed number of 100ns units, and allocate them all as a block from the system daemon/facility that ensures only one program gets the UUID for a given moment in time. 77.215.46.17 (talk) 23:38, 18 April 2011 (UTC)
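
To make the field sizes above concrete, the following Python sketch computes the timestamp rollover period and builds a version 1 UUID with the standard `uuid` module. The node and clock-sequence values are made up for illustration; this is not how any particular operating system allocates them.

```python
import uuid

# Timestamp field: 60 bits counting 100 ns ticks, i.e. 10**7 ticks per second.
ticks_per_second = 10 ** 7
rollover_seconds = 2 ** 60 / ticks_per_second
rollover_years = rollover_seconds / (365.25 * 24 * 3600)
print(f"timestamp wraps after about {rollover_years:.0f} years")

# A version 1 UUID built from an explicit node and clock sequence.
# Both values are hypothetical; a real generator would use the
# interface's MAC address and a persisted counter.
u = uuid.uuid1(node=0x001122334455, clock_seq=0x1234)

print(u)
print(u.version)                    # 1
print(hex(u.node))                  # 0x1122334455
print(hex(u.clock_seq))             # 0x1234
print(u.time.bit_length() <= 60)    # True
```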

Compress the Implementations section?

Would anybody mind if I replaced the entire Implementations section with just a list of languages in which implementations exist (to give a feel for how widespread adoption is) and all the citations? I think that would improve overall readability of the article and detract almost nothing from the content. Anybody who is looking for a specific implementation need only google "uuid <language name>". -- RoySmith (talk) 18:05, 26 June 2011 (UTC)

Yes, I would mind. Implementations are not always conformant (e.g. CouchDB's) or may offer differing features, licensing, be target for different compilers, etc. Lambda-mon key (talk) 01:17, 24 August 2011 (UTC)

"In perspective"

Regarding the paragraph "To put these numbers into perspective, one's annual risk of being hit by a meteorite...", this is bad exposition. People are notoriously bad at having an intuition for these types of odds, so if your goal is to assess risk, this isn't helpful. — Preceding unsigned comment added by 17.209.4.116 (talk) 06:19, 23 February 2012 (UTC)

entropy

the Author wrote: >>However, these probabilities only hold when the UUIDs are generated using sufficient entropy. >>Otherwise the probability of duplicates may be significantly higher, since the statistical dispersion may be lower.

So what is required to provide "sufficient entropy?" — Preceding unsigned comment added by Eostermueller (talkcontribs) 15:37, 24 February 2012 (UTC)

Citation for "e2fsprogs is used by all these people"

The current version of this article (2012-05-07T13:56:32) says that "Linux's ext2/ext3 filesystem, LUKS encrypted partitions, GNOME, KDE, and Mac OS X" all generate UUIDs (or GUIDs? paragraph is unclear) by using the "e2fsprogs" software. The citation for this links to what appears to be the e2fsprogs website, but (as of this writing) this webpage doesn't make any claims regarding what other software uses it to create UUIDs/GUIDs. Bowmanjj (talk) 16:11, 9 May 2012 (UTC)

I was bored so I dug around and confirmed that e2fsprogs does indeed implement UUID. It's under lib/uuid. A direct link that I hope works is http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=tree;f=lib/uuid;h=8b3114ef4e04e05248251da519633bcc982021ae;hb=HEAD Now how to cite this? I have no idea. 68.190.112.86 (talk) 10:47, 6 July 2012 (UTC)

I've added some links, though they're a bit ugly. For Mac OS X, e2fsprogs' implementation appears to have been introduced in 10.4; 10.3.9's CFUUIDCreate() instead calls _CFUUIDGenerate() which appears to originate with DEC (via HP and the OSF). However, I find the sentence problematic because it conflates what UUIDs are used for (e.g. for identifying filesystems) with what they are used by (e.g. by the kernel, for finding the root FS) and what they are generated by (e.g. gen_uuid.c). I think the anecdote that a single implementation has been copied into many projects is worth mentioning, but could use a little rephrasing. I also changed the wording WRT ext2/ext3 — UUIDs are not used by "the filesystem" (the filesystem driver almost certainly doesn't care!), they're used to identify the filesystem across reboots/hotplugging/etc. ⇌Elektron 02:24, 11 July 2012 (UTC)

LVM UUID / Non-standard UUIDs

The Logical Volume Manager (Linux) seems to use a non-standard form of UUID using digits and upper and lowercase letters, for a space of size 62^32 = 2.27x10^57 or 2^190. An LVM UUID of "VexQRf-qHxg-dQ8N-AM6r-Xtf0-WvItBa" does not fit the standard referenced in this article. The LVM code at http://git.fedorahosted.org/git/?p=lvm2.git;a=tree;f=lib/uuid;hb=HEAD documents the implementation. Should the page cover non-standard UUIDs like this? Drf5n (talk) 20:15, 30 July 2012 (UTC)
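
The figures quoted here can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, assuming 32 characters drawn from digits plus upper- and lowercase letters as described above; it says nothing about how LVM actually generates its identifiers.

```python
import math
import uuid

# Size of the space quoted above: 32 characters from [0-9A-Za-z].
alphabet_size = 10 + 26 + 26
space = alphabet_size ** 32
space_bits = 32 * math.log2(alphabet_size)

print(f"62^32 is about {space:.3e}, i.e. about 2^{space_bits:.1f}")

# The identifier quoted above does not parse as an RFC 4122 string,
# so the standard parser rejects it with ValueError.
lvm_id = "VexQRf-qHxg-dQ8N-AM6r-Xtf0-WvItBa"
try:
    uuid.UUID(lvm_id)
    parsed = True
except ValueError:
    parsed = False

print(parsed)   # False
```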

Not the only one horrified...

..but perhaps for a different reason, namely that UUIDs involve, in principle, the computer telling lies to the user about what an object's real name or identifier is. A disk partition is not /dev/sda1, it is actually 87937597593793753.... A user in Active Directory, likewise. This is obfuscation of the worst possible kind, and ought to be avoided if at all possible. It creates numerous operational difficulties where legitimate and normal changes to a computer give rise to unexpected and unpredictable results, and in some circumstances can cause backups to be unusable or data to be lost. --Anteaus (talk) 19:10, 13 September 2012 (UTC)

Fortunately, in most of the examples you mention, the computer is not lying. The thing has one or more names and one or more numbers; some of those numbers happen to be UUIDs. A disk partition is /dev/sda1 (meaning the first partition on the first disk that uses a libscsi driver with your current kernel and boot timing), partition # xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx in the GPT partition table on disk # yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy, the partition named "MyHomes" inside its superblock, the partition numbered zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz inside its own superblock and the partition currently mounted as /home until you change your mind. A user in Active Directory is user RID #1011 in domain foo.example (which has # S-1-5-21-yyyyyyyyy-yyyyyyyyy), it is also SID S-1-5-21-yyyyyyyyy-yyyyyyyyy-1011, user "John Doe", login username jd@foo.example and AD object {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}. Users can (with some difficulty) see all these numbers and names, but tend to prefer some to others.
The operational difficulties are mostly caused by the other numbers, not the UUIDs, or by trying to use configuration files that refer to UUIDs that were not restored (such as creating a new filesystem with new UUIDs and then restoring an /etc/fstab that refers to the old UUIDs, or upgrading to a kernel that renames /dev/hda1 to /dev/sda1 as happened recently to all Linux users).
I will admit though that some recent operating systems (from both camps) tend to go out of their way to hide the real names of things from users, resulting in much operational difficulty, but the UUIDs are the least of the problems here, using pretty names such as "John Doe" while hiding the more specific names such as "jd" is a much bigger problem.
77.215.46.17 (talk) 03:43, 26 November 2012 (UTC)

Why the format?

One thing this article currently doesn't address is WHY this canonical format was chosen: Why 8-4-4-4-12, instead of 8-8-8-8, or 4-4-4-4-4-4-4-4, or something else altogether? Having 5 blocks of variable size (instead of power-of-2 blocks of fixed size) just doesn't look like something an IT guy would normally do. I guess it might have to do with the blocks using information from different sources in the initial scheme of MAC address+time, but I'm not sure about that. If information about this exists, it would be nice if it was added to the article. -- 91.48.253.170 (talk) 09:12, 3 April 2013 (UTC)

If you read the RFC, it's fairly obvious why the 8-4-4-4-12 was chosen. The UUID is (originally, and mostly) a 64-bit integer timestamp, a 16-bit counter in case you generate UUIDs faster than once every 100ns, and a 48-bit MAC address. That would give you
AAAAAAAAAAAAAAAA-DDDD-EEEEEEEEEEEE
Except that most systems at the time did not have a native 64-bit integer construct. For example, Windows uses a structure (ULONGLONG) that glues two 32-bit values together. And when dealing with the UUID structure in memory, it is helpful to have the two 32-bit values:
AAAAAAAA-BBBBBBBB-DDDD-EEEEEEEEEEEE
Next is the issue that 4 bits inside the "B" chunk are stripped out, and used to indicate a version. In order to make addressing that value easy (either reading or writing), it helps to split the "B" chunk also into two 16-bit chunks:
AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE
In other words, speaking as a programmer, it's obviously all a practical matter. Unfortunately, in the original RFC the author didn't say what he was **thinking** at the time. So without any verifiable source for the author's thoughts (aside from the author himself telling us what he thought), there is no verifiable source, and anyone who adds the explanation I just gave will have it removed from Wikipedia.
Pauladin (talk) 15:59, 10 April 2014 (UTC)
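
For readers who want to see how the five text groups line up with the underlying fields, here is a small Python sketch using the standard `uuid` module. The example string is the one given in RFC 4122; the field names are those exposed by Python's `UUID.fields`, and the sketch shows the layout only, not the original designers' reasoning.

```python
import uuid

# Example UUID string taken from RFC 4122; any version 1 UUID would do.
u = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")

(time_low, time_mid, time_hi_version,
 clock_seq_hi_variant, clock_seq_low, node) = u.fields

print(f"{time_low:08x}")              # first group  (32 bits)
print(f"{time_mid:04x}")              # second group (16 bits)
print(f"{time_hi_version:04x}")       # third group  (16 bits, top 4 bits = version)
print(f"{clock_seq_hi_variant:02x}{clock_seq_low:02x}")  # fourth group (16 bits)
print(f"{node:012x}")                 # fifth group  (48 bits)

# The version lives in the top 4 bits of the third group.
print(time_hi_version >> 12)          # 1
```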

Number of atoms in the universe

I think that the comparison between the number of possible UUIDs and the (estimated) number of atoms in the universe is at best useless, at worst misleading. They differ by 42 orders of magnitude! — Preceding unsigned comment added by 137.132.3.8 (talk) 02:33, 14 April 2015 (UTC)

Agreed. The "as many ____ as atoms in the universe" analogy is overused, anyway. – voidxor 22:12, 16 January 2017 (UTC)

Collisions (Duplication)

I am mostly liking the article now that it is merged with GUID. However, the current version of the article lacks two things which the previous version had:

  • an explanation of why RFC 4122 recommends version 5 (SHA1) over version 3 (MD5) and counsels against either in security applications.
  • discussion of the probability of collisions and duplication.

I'll come back to the first of these another time, but I'd like to bring up the second point now. The analysis of collision probability with UUIDs is generally done using the birthday problem. The previous version(s) of this article, going back years, mentioned this and provided examples. I understand why this was removed: there were no references supporting it.

Ironically, there are numerous references out there on the Internet which talk about the birthday problem and UUID's, and which do provide calculations of examples. They are almost all blogs and forums like Stack Overflow, which is a problem. More aggravating is that almost all of them got their information from one of the previous incarnations of the UUID article on Wikipedia! I can find legitimate references, such as academic journal articles, regarding the relationship between UUIDs and the birthday problem; but unfortunately they do not provide concrete examples.

voidxor, question for you: Assuming I had a valid reference which stated that the birthday problem was the correct way of analyzing UUID collisions, would it be allowed under Wikipedia policies for me to then plug the numbers into the birthday problem formula, so as to provide concrete examples, such as I did in the text which you didn't carry forward during the merge? Or would this be considered "original research"? It is just arithmetic, after all.

In short, I have a good-ish reference for the relationship between UUIDs and the birthday problem; but to make it concrete, I would have to do the calculations myself. Original research, or no? 73.253.110.94 (talk) 17:05, 18 January 2017 (UTC)

Last time I looked at this (which was some years ago) there was no chance of an overlap (ignoring clock rollover, which is longer than the Mayan calendar). What there was instead was a rate limit in how fast they could be allocated.
As my problem involved generating them (for database keys) in a way that the burst rate of their consumption could exceed the allocation rate, our fix for this was to install multiple NICs in the server, thus multiple MAC addresses and so a faster potential allocation rate. I dimly recall that MS SQL Server did this for us automatically, so it was a much easier fix than any sort of buffering.
For a good discussion of the virtues of UUIDs as database keys, particularly for databases with distributed record creation, then look at Kimball's Data Warehouse Toolkit ISBN 0471200247 Andy Dingley (talk) 17:23, 18 January 2017 (UTC)

For version 1 UUIDs using MAC addresses, there is no chance of duplication if they are "correctly" generated. Part of "correctly" means not exceeding a maximum average rate of generation of 16384 per 100 nanoseconds per MAC address or node id. With proper programming, this can be an average rate, sometimes exceeded, because there are techniques for "pocketing" unused ones for later when the rate temporarily goes above 16384/100 nanoseconds. However, the key words here are "correctly generated". There are a lot of ways to generate them incorrectly, such as having a network card which duplicates the MAC address on another network card (known to have happened). You can have bugs. And even if you generate your UUIDs perfectly, for "universal" uniqueness you are also depending on all the other guys to generate their UUIDs correctly too, and of course your perfection does not prevent the other guys from having problems. And one thing we have learned from the Internet is that if somebody can figure out how to exploit your trusting everybody-should-just-play-nice-and-follow-the-rules scheme to his advantage or even just for his amusement, it will be exploited. So even the version 1 UUIDs come down to probabilities. Version 2 is similar, but the maximum average rate of generation per node-domain-id is lower. Versions 1 and 2, using randomly-generated node ids, versions 3 and 5 (hash-based) and version 4 (random) do have a chance of collision, even when generated perfectly. It is "almost" a zero chance, and how close to zero it actually is can be determined using the birthday problem formula. 73.253.110.94 (talk) 17:45, 18 January 2017 (UTC)
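
As an illustration of what "plugging the numbers in" looks like, here is a sketch in Python of the usual birthday-problem approximation. It assumes 122 uniformly random, independent bits (as for a well-generated version 4 UUID); the function names are hypothetical, and nothing here addresses whether such a calculation counts as original research.

```python
import math

def collision_probability(n: int, bits: int = 122) -> float:
    """Approximate chance of at least one duplicate among n random values.

    Uses the standard birthday approximation p = 1 - exp(-n(n-1) / (2N))
    with N = 2**bits equally likely values.
    """
    space = 2 ** bits
    return -math.expm1(-n * (n - 1) / (2 * space))

def count_for_probability(p: float, bits: int = 122) -> float:
    """Approximate number of values needed to reach collision probability p."""
    space = 2 ** bits
    return math.sqrt(2 * space * math.log(1 / (1 - p)))

print(f"{collision_probability(10 ** 9):.3e}")     # one billion UUIDs
print(f"{collision_probability(10 ** 12):.3e}")    # one trillion UUIDs
print(f"{count_for_probability(0.5):.3e}")         # UUIDs needed for a 50% chance
```

`math.expm1` is used instead of `1 - math.exp(...)` because the probabilities involved are far smaller than floating-point rounding error near 1.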

Fascinating!
Comments:
1) One of you guys said you actually generate these extremely fast in your IT operation. Do you have a dedicated UUID server, or do you just call a function in your database software to spit out a new UUID?
Note that if you do it inside the database program, it becomes impossible to overrun the UUID generator, which seems to be a big concern here.
2) Some standard said to use a hash of the generated UUID. That's a really great idea, but how could they possibly fail to notice that hashing each one would slow your generator by--what? 1,000 clocks per UUID?
3) Using the Mac address in the UUID is brilliant. It guarantees no collisions between computers. But within a pc, you just set almost half the bits in your preferably-random number to a constant.
if you have to use a 16,384 counter to insure uniqueness, then you're close to being fu cked by future technology. Instead of incrementing a 16384 counter, increment the Mac @ field with each UUID. That field is invulnerable to the time stamp stall that happens on really fast processors.
If you're worried about the incremented mac @ colliding with another mac @ in the same batch of NIC cards, just hash it before you start. --VerdanaBold 19:35, 13 March 2017 (UTC)

Hi, Verdana. Probably not kosher to be having this discussion on a Wikipedia Talk page, unless it can be related back to the article, but I'll add that if you are in a situation where version 1 or 2 UUIDs have to be generated so fast that you can't rely on the "uniquifying" clock sequence to keep them unique, and need to start incrementing the MAC address too, you are probably better off just using a version 4 random UUID, or one of the flavors (version 3 or 5) of hash-based UUID's. The MAC address is a 48-bit number, and even if you start at a random point, if you are generating UUIDs so fast that you use a large number of sequentially assigned MAC addresses, you are creating a larger and larger target for someone else using random node assignment to accidentally hit. Might as well throw in the towel, and make the whole UUID random. That way you get 122 bits of randomness (128 minus the 6 bits which tell you what kind of UUID it is). Personally, I don't see much point in version 1 or 2 with anything other than an actual MAC address from a manufacturer who can be relied upon to generate unique MAC addresses, and you are in a domain where you can be reasonably assured that other UUID generators aren't going to screw around (or you don't think you'll ever care.) If you don't have that, then one of the other UUID types is probably better for your use case. Person54 (talk) 22:06, 13 March 2017 (UTC)
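
The "122 bits" figure can be seen by inspecting the fixed bits of a version 4 UUID. A minimal Python sketch with the standard `uuid` module follows; the bit positions are counted from the least significant bit of the 128-bit integer.

```python
import uuid

u = uuid.uuid4()

# Fixed bits in a version 4 UUID: 4 version bits and 2 variant bits.
version_bits = (u.int >> 76) & 0xF      # top nibble of the third group
variant_bits = (u.int >> 62) & 0b11     # top two bits of the fourth group

random_bits = 128 - 4 - 2

print(version_bits)                 # 4
print(bin(variant_bits))            # 0b10
print(random_bits)                  # 122
print(u.variant == uuid.RFC_4122)   # True
```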

Incorrect Article

The article says "anyone can set a UUID". But it also says it requires 128 bits. It doesn't elude to that software is required: WHICH IT IS.

The article doesn't say UUID can't be stored on any disks: byut only within the header for filesystems (which must be supported) on a partition (which must also be supported) of a disk.

Personally, I just tried to set UUID on a flash stick and I get blank <none>, but other software (an emulator) insists I must "use UUID" to conform with it's "booting methods"

"anyone can set a UUID" is far from true

MOREOVER, saying anything 128-bit (that isn't human readable and has no real standard except a changing standard) is equivalent to "being a UUID" is absolutely asinine. By those standards any 128 bits collected from anywhere are categorized under a new name. But the proper name for that is guess what? 128 bits. — Preceding unsigned comment added by 2600:8806:400:B090:4DD4:234D:C19F:DC85 (talk) 18:58, 20 June 2018 (UTC)

Er, wut? Try proofreading what you post so that it is comprehendible, and don't post irrelevant rants ... this isn't a forum. -- Jibal (talk)

Microsoft GUID

In the FORMAT section there is an example specifically labeled as Microsoft GUID. It does not specify what is meant by "Microsoft" GUID, but MS does have its own GUID variant, according to https://tools.ietf.org/html/rfc4122#section-4.4. The GUID provided as an example in the variant section starts with the hex digit 'A', but Microsoft GUIDs start with the hex digits C/D. Therefore that part of the format is ambiguous at best and incorrect at worst. — Preceding unsigned comment added by Kein (talkcontribs) 16:56, 13 January 2020 (UTC)
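
The digit ranges mentioned above follow from the variant bit patterns in RFC 4122, section 4.1.1. A minimal sketch, assuming the canonical 8-4-4-4-12 text form; the function name and the sample UUIDs are made up for illustration:

```python
def variant_of(uuid_text: str) -> str:
    """Classify a UUID by the first hex digit of its fourth group."""
    digit = int(uuid_text.split("-")[3][0], 16)
    if digit & 0b1000 == 0:
        return "NCS backward compatibility"        # 0xxx: digits 0-7
    if digit & 0b0100 == 0:
        return "RFC 4122"                          # 10xx: digits 8-b
    if digit & 0b0010 == 0:
        return "Microsoft backward compatibility"  # 110x: digits c-d
    return "reserved for future definition"        # 111x: digits e-f


# Hypothetical sample values, differing only in the variant digit.
print(variant_of("00000000-0000-4000-a000-000000000000"))
print(variant_of("00000000-0000-4000-c000-000000000000"))
```

Under this reading, an example whose variant digit is 'a' falls in the RFC 4122 range, not the Microsoft backward-compatibility range.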

MAC and clock - bad ideas?

It is common to copy MAC addresses over from one NIC to another in clustered environments. So make sure to use the factory MAC and not the MAC visible on the network.

The clock may generate (improbable) duplicates if it is set backwards. I already had to adjust the clock of a server backwards, probably due to a HW glitch. A rogue NTP server might be used as an attack vector.

Versions 1 and 2 also have poor entropy. This is not necessarily a disadvantage. A more or less sequential nature of UUIDs may be a desired feature [1].

The clock has the advantage of making it possible to roughly reconstruct the order in which the UUIDs were generated - if you can trust the clock.

BTW, version 6 is on the way: [2]

Stonux (talk) 14:11, 10 July 2020 (UTC)
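
A minimal sketch of the ordering point above, assuming Python's standard uuid module. A version 1 UUID carries a 60-bit count of 100-nanosecond intervals since 15 October 1582; the helper name v1_datetime is hypothetical.

```python
import uuid
from datetime import datetime, timezone

# 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000


def v1_datetime(u: uuid.UUID) -> datetime:
    """Recover the embedded timestamp of a version 1 UUID as a UTC datetime."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs embed this timestamp")
    unix_100ns = u.time - GREGORIAN_TO_UNIX_100NS
    return datetime.fromtimestamp(unix_100ns / 1e7, tz=timezone.utc)


batch = [uuid.uuid1() for _ in range(3)]
# Sorting by the embedded timestamp approximates generation order,
# but only as far as the generating clock can be trusted.
for u in sorted(batch, key=lambda item: item.time):
    print(u, v1_datetime(u).isoformat())
```

If the clock was set backwards between two calls, this ordering is wrong, which is exactly the caveat raised in the comment.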

References

UUID generator: why only uuidgen.org allowed?

While this generator is recommended by ITU-T, I don't see uuidgen.org being recommended by any authorities (ITU-T, ISO, IETF). The only thing on that website is "@Accelery", which is a Twitter account with only one tweet: "Online UUID generators seem stuck in the past, so we made a new one." That's it, no more information is given. Very flimsy. How can we trust that? How is it superior to the one recommended by ITU-T? Can any users of that tool comment? There was a comment, "<!--Do not add any more GUID generator web sites; they will be removed.-->", which discouraged editors from adding alternative UUID generators. That's advertising, in my opinion. Therefore I removed it and replaced it with the ITU-T one. Feelthhis (talk) 04:00, 19 December 2020 (UTC)

DOCTYPE Puzzle

Readers of this page may be interested in the following discussion:

--Guy Macon (talk) 22:37, 1 June 2021 (UTC)

When Microsoft started using UUIDs

To help fill in the history for when Microsoft started using UUIDs (the "when" question is flagged in the article), I have a bit of information that might help someone search for the answer.

I attended a Microsoft conference at the Redmond campus in 1995 where a presenter gave information about UUIDs, how they are essentially unique, and that Microsoft was going to be using them.

That's not enough to put into the article, obviously, but maybe it'll help give a timeline if someone is researching this. — Preceding unsigned comment added by Chris uvic (talkcontribs) 18:20, 30 June 2021 (UTC)

Other variants of UUID

Reviewing logs in a web gateway, UUID Variant "e" was observed in the wild. This is not random, as fields 3, 4, and 5 are the same between two separate UUIDs:

406bb3ab-a127-ec11-981f-c896653b5010 213d5787-2629-ec11-981f-c896653b5010

https://email.ngpvan.com/unsubscribeUnique/406bb3ab-a127-ec11-981f-c896653b5010/213d5787-2629-ec11-981f-c896653b5010?nvep=ewogICJUZW5hbnRVcmkiOiAibmdwdmFuOi8vdmFuL05HUC9OR1A0Mi8xLzg4NTAzIiwKICAiRGlzdHJpYnV0aW9uVW5pcXVlSWQiOiAiMjEzZDU3ODctMjYyOS1lYzExLTk4MWYtYzg5NjY1M2I5MjA4IiwKICAiRW1haWxBZGRyZXNzIjogImxvbGVyY29hc3RlckBjaXNjby5jb20iCn0%3D &hmac=wVRUIC0pn7S85lof8rzhxr1gJh2MXaBVOMZRmH_96HI=&id=1326529531

  • Log has been altered to remove PII while not changing data type/or meaning

173.38.117.90 (talk)DBNerd — Preceding undated comment added 13:13, 11 October 2021 (UTC)
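
A minimal sketch for inspecting the two logged values with Python's standard uuid module. The last step rests on an assumption not confirmed by the log: that the first three groups were written byte-swapped, as happens when a little-endian GUID structure is dumped byte by byte.

```python
import uuid

a = uuid.UUID("406bb3ab-a127-ec11-981f-c896653b5010")
b = uuid.UUID("213d5787-2629-ec11-981f-c896653b5010")

# Which of the five text groups are identical between the two values?
shared = [x == y for x, y in zip(str(a).split("-"), str(b).split("-"))]

# The "e" is the high nibble of the third group, i.e. the version position.
version_nibble = (a.int >> 76) & 0xF

# ASSUMPTION: treat the logged bytes as a little-endian GUID layout and
# reinterpret them; this swaps the byte order within the first three groups.
swapped = uuid.UUID(bytes_le=a.bytes)

print(shared)
print(hex(version_nibble), a.variant)
print(swapped, swapped.version)
```

Read as written, the "e" sits in the version position and the variant bits match RFC 4122. Under the byte-swap assumption the same bytes read as a version 1 UUID, which would also explain the shared trailing groups (clock sequence and node).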

New Versions

There is a draft RFC which introduces versions 6, 7 and 8, as well as a form called MAX UUID.

https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#section-3

I don't know if this proposal merits inclusion, or if we should wait until it is finalized.

Bend1010 (talk) 20:31, 20 July 2022 (UTC)

ULID

Hi folks. As a heads-up, I created a page for universally unique lexicographically sortable identifier (ULID). – ClockworkSoul 17:35, 28 September 2022 (UTC)

Variant 2 UUID and COM/OLE

This page states:

"Variant 2 UUIDs, historically used in Microsoft's COM/OLE libraries, use a mixed-endian format, whereby the first three components of the UUID are little-endian, and the last two are big-endian."

But a blog post by Microsoft engineer Raymond Chen, https://devblogs.microsoft.com/oldnewthing/20220928-00/?p=107221, states:

"No, it is little-endian all the way. But if you don’t understand how GUIDs are formed, it might look like some parts are big-endian."

Wongm (talk) 03:27, 31 October 2022 (UTC)
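
A minimal sketch of the two byte layouts under discussion, using Python's standard uuid module; the sample value is made up so that each byte is easy to track.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")  # hypothetical value

# Network (big-endian) byte order, as in the RFC 4122 wire format.
network_order = u.bytes.hex()

# Layout with the first three fields stored little-endian, which is how a
# GUID structure (32-bit, 16-bit, 16-bit, then 8 raw bytes) sits in memory
# on a little-endian machine.
guid_memory_order = u.bytes_le.hex()

print(network_order)      # 00112233445566778899aabbccddeeff
print(guid_memory_order)  # 33221100554477668899aabbccddeeff
```

The final eight bytes are identical in both layouts. One way to reconcile the two quotes, offered here as an interpretation only: those eight bytes are a plain byte array in the GUID structure, so byte order does not apply to them, and they merely look big-endian when printed next to the swapped integer fields.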

Variants 1 and 2 confusion

Several distinctions in the main article seem to be backward (and comments below tend to confirm this suspicion). For example, the main article claims that variant 1, aka Leach-Salz, is most common. I believe it is in fact variant 2, aka Leach-Salz, that is most common. For example:

Java UUID documentation, several times refers to variant 2 as the "Leach-Salz" variant, such as: "The layout of a variant 2 (Leach-Salz) UUID is as follows" and "valid only for a UUID with a variant value of 2, which indicates the Leach-Salz variant".

https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/UUID.html

Java's standard UUID.randomUUID() factory method returns version 4, variant 2 by default (as tested and proven).

It is stated that variants 1 vs 2 are distinguished by endian-ness. The article also claims that these distinctions are lost in textual representation (that variant 1 and 2 are the same as text). These may be correct assertions, but of course, one must be sceptical.

I could imagine these inaccuracies may have been introduced when merging the UUID and GUID articles. It may be that variant 1 was the most common GUID and variant 2 is the most common UUID (maybe), yet we now pretend that both UUID and GUID are the same (with different variants and versions).

Alexgenaud (talk) 11:20, 9 November 2022 (UTC)

Adding a bit to your comment (not disagreeing at all):
As it stands now, the article uses the terms "variant 1" and "variant 2" in a couple of places, but the section on variants only gives names to the variants and says what is in the 3-bit field to indicate each one. It doesn't number them.
Way down in section "Version 4 (random)", the article does define in passing what it considers variant 1 and variant 2 to be, in terms of the 3-bit values. That definition makes OSF DCE variant 1 and Microsoft COM/DCOM variant 2. So I guess the idea was that Apollo NCS is variant 0.
It would be easy to clean this up by defining the variant numbers up in the Variants section or in the variant/field table. Except if other references like the Java documentation number the variants differently, that's not ideal. It would be better to use the same variant numbering as the industry, if that's consistent. A problem with the Java numbering is that it takes the variant field as being only two bits, so both Microsoft COM/DCOM and Reserved for Future Use are 3. Tim Mann (talk) 20:53, 29 June 2023 (UTC)

Upcoming versions

I suggest we make reference to the upcoming updates on UUID from IETF (https://www.ietf.org/archive/id/draft-ietf-uuidrev-rfc4122bis-07.html#name-update-motivation), especially on the "Uses" section, when it goes:

The random nature of standard UUIDs of versions 3, 4, and 5, ...may create problems with database locality or performance when UUIDs are used as primary keys.

I think it's helpful to know that one can address UUID v4's issues for DB keys with UUID v7, even though it is still a draft.

On the draft state, it's worth mentioning that it's been worked on since 2021 and there are implementations out there (JS: 1, 2).

Thoughts? Leite~enwiki (talk) 17:08, 17 August 2023 (UTC)
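
A minimal sketch of why version 7 helps with database locality, assuming the draft's version 7 layout: a 48-bit Unix timestamp in milliseconds, then the version, 12 random bits, the variant, and 62 more random bits. The function name uuid7_sketch is hypothetical and this is not a reference implementation.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a version 7 style UUID: timestamp first, random bits after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 used
    rand_a = rand >> 68                  # top 12 bits
    rand_b = rand & ((1 << 62) - 1)      # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                         # bits 79..76: version 7
    value |= rand_a << 64                      # bits 75..64: random
    value |= 0b10 << 62                        # bits 63..62: variant
    value |= rand_b                            # bits 61..0: random
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_000)
later = uuid7_sketch(2_000)
print(earlier, later, earlier < later)
```

Because the timestamp occupies the most significant bits, values generated later sort after earlier ones, so new primary keys land near each other in an index instead of being scattered as version 4 keys are. Values generated within the same millisecond are ordered only by their random bits in this sketch.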

Citation needed

The text says a citation for DomainOS is needed. Here are two:

Leach, P. J., Levine, P.H., Hamilton, J. A., Stumpf, B.L., "UIDs as Internal Names in a Distributed File System," in Proceedings ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, Ottawa, Ont., Aug. 18-20, 1982, pp. 34-41.

Leach, P. J., Levine, P.H., Douros, B. D., Hamilton, J. A., Nelson, D. L., Stumpf, B.L., "The Architecture of an Integrated Local Network," IEEE Journal on Selected Areas in Communications, v.SAC-1, n.5, Nov. 1983, pp. 842-857.

The first is more about UIDs in DomainOS, the second is about the OS overall. Paul Jay Seattle (talk) 23:18, 15 September 2023 (UTC)