UUID v4 vs. v7 vs. ULID vs. Snowflake — which ID when?
When creating a new table, most developers reflexively reach for UUIDv4 as the primary key — it's available everywhere, unique, and simple. But v4 has a stubborn performance problem in large tables that UUIDv7 (RFC 9562, 2024), ULID, and Snowflake all solve elegantly. This article classifies the four variants and gives a pragmatic recommendation.
Why use random IDs at all?
Classic auto-incrementing integer IDs are compact and sortable but have two hard problems. First, they leak business information: anyone hitting /invoices/4711 immediately knows there are about 4,711 invoices — and can discover further tickets, orders, or user profiles by incrementing. Second, they scale poorly across distributed systems because they require a central sequence.
Random or pseudo-random IDs solve both problems: they're not enumerable, and any node can generate them independently without collision risk. The standard for this has been the UUID for decades — a 128-bit value with different properties depending on the version.
UUIDv4 — the classic
UUIDv4 is essentially 122 bits of randomness plus 6 bits of version and variant marker. It's generated from a cryptographically secure random number generator. The collision probability is astronomically low — even at a trillion IDs per second over 100 years, you wouldn't statistically expect a duplicate.
That randomness is both a strength and a weakness. Strength: no central coordination, no leak, no timestamp inside. Weakness: in a B-tree index, successive inserts are scattered across the entire index structure — each insert touches a different page, the cache barely helps, and the index fragments.
The index problem in practice
Example: a MySQL table with 100 million rows and a UUIDv4 primary key has a multi-GB InnoDB B-tree. Inserting a new row lands somewhere in the middle of the index; a page has to be read, modified, and written back. At 10,000 inserts per second, that's 10,000 random disk seeks (or at least cache misses) — write performance collapses.
With a monotonically growing ID (auto-increment, timestamp-based, etc.), inserts always land at the end of the index, on the same or a neighboring page. The cache almost always hits, the index stays compact. This is exactly where the newer ID schemes come in.
UUIDv7 — timestamp prefix per RFC 9562
UUIDv7 was standardized in 2024 with RFC 9562. It combines the first 48 bits as a Unix timestamp in milliseconds with 74 bits of randomness plus the usual 6 bits of version marker. The result: the ID is sortable by time but, with 2^74 possible random suffixes per millisecond, practically collision-free.
For databases this is a game-changer: inserts happen almost monotonically increasing, the index stays sequential, and write performance is practically that of integer IDs. At the same time, the advantages of random UUIDs are preserved — no enumeration, no central sequence, multi-region capable. Postgres, MySQL, and SQL Server have had native support or extensions since 2024/25.
ULID — the unofficial predecessor
ULID (Universally Unique Lexicographically Sortable Identifier, 2016) anticipated UUIDv7 for years: 48 bits of timestamp, 80 bits of randomness, lexicographically sortable in Crockford Base32 (26 characters instead of UUID's 36). Many languages and databases got ULID libraries long before RFC 9562 was finalized.
Functionally, ULID is almost identical to UUIDv7. Differences: shorter string form (good for URLs), Crockford Base32 avoids visually similar characters, but there's no formal RFC standard. Starting fresh today, prefer UUIDv7 — already on ULID? No reason to migrate.
Snowflake — Twitter, Discord, X
Snowflake IDs are 64-bit integers (not UUID format!): typically 41 bits of timestamp, 10 bits of machine ID, 12 bits of sequence counter per millisecond. They fit into a BIGINT, are extremely compact, sortable by time, and can produce 4,096 IDs per millisecond per node across up to 1,024 nodes without coordination.
Trade-off: Snowflake leaks the timestamp very precisely (and with it the creation date of objects — as on Twitter, where the creation date of a tweet can be reconstructed directly from the ID). If that's a problem, use UUIDv7. Snowflake shines where extreme write rates and compact IDs matter — and implementations like Sonyflake or TSID extend the concept.
Decision guide
A compact rule of thumb:
- UUIDv4: for small tables, external API IDs, tokens — anywhere sorting doesn't matter and maximum randomness counts.
- UUIDv7: today's default for new tables with high write volume. Standardized, widely supported, monotonic, no index pain.
- ULID: when you already have legacy code with it, or when the shorter string form (26 characters) for URLs matters.
- Snowflake: at very high write rates (Twitter, Discord, trading), when 64-bit BIGINT instead of 128-bit UUID saves significant storage and a timestamp leak is acceptable.
Frequently asked questions
Can I migrate from UUIDv4 to UUIDv7?
Existing IDs remain valid — the UUID format is backward compatible. From a certain cutoff date you can create new inserts with v7 and leave old v4 IDs untouched. The index then immediately benefits for new entries. A real backfill migration is usually not worthwhile because old UUIDs are linked from external systems.
Is UUIDv7 really multi-region safe?
Yes — the RFC requires millisecond-level clock synchronization, easily achievable via NTP. If two nodes generate IDs in the same millisecond window simultaneously, the 74 random bits make collisions practically impossible. Only for a strict global sort order do you still need explicit ordering — that's not guaranteed by the UUID standard.
Should I store UUIDs as BINARY(16) or as strings?
Performance-wise: always BINARY(16). A UUID as a string (CHAR(36)) takes 2.25x the space, loads slower, and bloats indexes. Postgres has a native uuid type; MySQL and SQL Server need BINARY(16) with helper functions. The only reason for strings: debug comfort when logging — better solved with hex/uuid display in your tools.
Komentarze
Komentarze są dostarczane przez Disqus. Przed ich załadowaniem potrzebujemy Twojej zgody — Disqus to usługa zewnętrzna i ustawia własne pliki cookie.