Skip navigation.

Software Development

Correcting The Future

GUID

There was a recent post titled GUIDs are Great gotten from reddit here. The comments on reddit seemed curious to me. There was a concerted effort to convince people that GUID's were good enough for everyone. I think I talked about this topic on this blog before, but I can't remember. In any case, I think it deserves another mention.

Is my math off? Let's consider two scenarios. In all cases, we'll assume a GUID takes 100 cycles to compute. It may be more. It may be less. But even if I'm off by a factor of 100, you only need 100 times as many machines or you'll need machines that are 100 times faster which will happen in about 10 years.

Before getting to the two scenarios, we have to consider what it takes to create a collision. Most would think that since a GUID is 128 bits, the odds of a collision would be 1/(2128). Not so. That's the probability of a collision on ONE GUID. The probability that you'll produce an identical GUID as another that is already out there goes up dramatically. So much so that in order to achieve odds of 50% of producing a collision, you actually need sqrt(2128) = 264 GUID's. The address space of GUID's has significantly diminished. To understand this, one must know about the birthday paradox.

Our objective is now to produce 264 GUID's.

Scenario #1

Year: 1995
CPU: Intel Pentium
Speed: 100 Mhz

100,000,000/100 = 1,000,000 GUID's per second.

Time required for 264 GUIDs
Seconds: 18446744073709.55
Minutes: 307445734561.82
Hours: 5124095576.03
Days: 213503982.33
Years: 584942.42

We're safe for 585,000 years. Sounds good, eh? Well, you only need 585,000 Pentium machines from 1995 to get a collision within a year. Doesn't sound so good to me though it's certainly nothing to fret over.

Scenario #2

Year: 2005
CPU: Intel Pentium 4
Speed: 3.8Ghz

3,800,000,000/100 = 38,000,000 GUID's per second.

Time required for 264 GUIDs
Seconds: 485440633518.67
Minutes: 8090677225.31
Hours: 134844620.42
Days: 5618525.85
Years: 15393.22

15,000 years! Still great, no?

In ten years, we went from requiring 585,000 years to 15,000 years. How did that happen? That's 39 times shorter than in 1995. Let's see what will happen in 2015. We know processor speed doubles every 18 months. And GUID creation is especially conducive to multicore processing. So in ten years, we'll get 7 doublings in speed. 4Ghz*27 = 512Ghz. Will we have 128 or 256 core computers in 2015? Probably not. But it won't be far off. Maybe quantum computing will take off? What we know is that we'll get this kind of speed sooner or later. So let's look at this future scenario.

Scenario #3

Year: 2015
CPU: Maas Biolabs
Speed: 512Ghz

512,000,000,000/100 = 5,120,000,000 GUID's per second.

Time required for 264 GUIDs
Seconds: 3602879701.90
Minutes: 60047995.03
Hours: 1000799.92
Days: 41700.00
Years: 114.25

114 years! Still sounds good? Uh, NO! It means you need 114 machines running for a year to get 50% chance of collision. Not only that, but once you cross that barrier, there is no going back. That's the thing to remember with GUID's. Once they are put in use, the odds of a collision get higher and higher. These next 10 years brought 135 times the progress of the last 10 as far as GUID production. It'll get even worse after that.

There's another issue to take into account. Right now, there aren't many reasons to share GUID's all over the world between different applications. If you use a GUID, it's usually within one application. Once distributed computing takes off, pieces of data and code objects will have to be identified. These can be created in a nanosecond. You'll have billions and billions of these being produced per second all over the world. Under such a scenario, you'll get a guaranteed collision in under a second.

The only two things keeping GUID's in use today is the fact that most of their uses are in different applications and that machines and their software aren't fully distributed on a massive scale yet.

Are my numbers off? Why do the 'best experts' in the field still say that GUID's are great and there's no need to check for collisions or even worry about it. The odds they mention also seem very weird to me. I don't understand where they get their numbers. I often see static odds. But collisions become more likely the more GUID's you have. The odds cannot possibly be static. So either I'm missing a big piece of the puzzle or these experts are seriously misinformed.

D3D Variable Width Line Drawing in Hardware (w/ screenshot)Project V TODO List (Updated April 15, 2009)

Comments

Anonymous 16. August 2008, 19:09

mind writes:

Well, GUIDs aren't hashes. One part of them is literally a machine's ethernet mac address, one part is literally the time, etc. A machine which generates "too many" GUIDs will only cause collisions with itself.

I do think they've got a limited scope of applicability. If you're looking for a global identifier, you most likely either need to adjust your design or put more thought into your requirements (GUIDs are certainly forgeable). And their obtuse grouping of digits causes me to write them off as a Microsoft-only thing.

Anonymous 16. August 2008, 20:27

Assaf Arkin writes:

GUID actually means different things to different people. Atom and RSS, for example, use that term for strings of arbitrary size. Microsoft uses that term for their version of UUID (RFC 4122), which are indeed 128 bits long.

Part of UUID are control bits which tell you which version it is. Version 1 includes the machine's MAC address, which is unique to that machine, at least until 2100, when we start running out of MAC address numbers. So if you use version 1, you're only risking colliding with yourself.

Then there's the clock number, which limits you to one UUID per 100 nanoseconds (more if you can take advantage of the sequence number). If your sole purpose is to create collisions with yourself, today's hardware is good enough. But if you're doing something productive, that rate is good enough.

So while your math looks correct, it assumes software will do nothing but look for ways to collide. In practice, you get a space to prevent collisions with others, it's much smaller than 128 bits and you're going to waste (read: never use) most of the UUIDs allocated to you, yet, it's not a problem.

Anonymous 8. August 2009, 02:33

Just Some Guy writes:

You are missing the big puzzle piece.

Collisions only matter when they occur between two identifiers that must be unique within a given context. Take COM components. A collision would only matter if two different COM components with the same GUID were published into the same component directory. If you have a collision between a COM component and a laser printer, it wouldn't matter.

Increasing computer speed by 1000X does not appreciably increase the speed with which COM components are published.

Vorlath 8. August 2009, 03:23

If you use context, then the ID's are no longer global. The whole point of GUID's is that you don't need context.

The speed increase wasn't the point. It was an analogy as to how much the industry would start needing exponentially more ID's as time goes on. If someone decides to use GUID's to ID everything from albums to cooking recipes, we'll run out of GUID's pretty quickly.

Anonymous 17. August 2009, 14:55

Hason writes:

My GUID is:

EB563CE2-8B3D-11DE-A2C5-8FFC55D89593

Has anybody else got that one?

How to use Quote function:

  1. Select some text
  2. Click on the Quote link

Write a comment

Comment
(BBcode and HTML is turned off for anonymous user comments.)

If you can't read the words, press the small reload icon.


Smilies

December 2009
S M T W T F S
November 2009January 2010
1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31