How UUIDs work internally
May 12, 2026 · 15 min read
Strings like 550e8400-e29b-41d4-a716-446655440000 are just one skin on 16 bytes.
When integrations fail, the bug is almost always byte order or version bits, not "broken UUID math."
The 128-bit layout
RFC 4122 names five logical fields packed into 128 bits:
time_low(32 bits)time_mid(16 bits)time_hi_and_version(16 bits) - includes 4-bit versionclock_seq_hi_and_reserved+clock_seq_low(16 bits) - includes variantnode(48 bits)
Random v4 reuses this skeleton but fills most bits with entropy. v7 overwrites the front with a Unix timestamp. The hyphenated string is big-endian hex presentation of those fields in a defined order.
Example: 550e8400-e29b-41d4-a716-446655440000
|time_low |mid |ver|seq| node |
Version nibble ----^ (4 in "41d4" -> 0x4xxx)
Version and variant nibbles
The version sits in the high nibble of time_hi_and_version. Valid values include 1, 3, 4, 5, 6, 7, 8.
The variant is encoded in the most significant bits of the clock sequence field; RFC 4122 variant starts with 10.
Validators reject strings where these bits are wrong even if hex length is perfect.
Example: a string with version nibble 0 or variant that does not match RFC 4122 is not a valid UUID
even if every character is hexadecimal. That is why copy-paste from corrupted logs often fails validation - one
nibble flipped during OCR or PDF export breaks structural rules.
How v4 and v7 populate the same skeleton
Version 4 sets the version to 4 and fills the remaining 122 bits with cryptographically strong randomness
(implementation quality matters - use OS CSPRNG, not Math.random()).
Version 7 writes a 48-bit Unix timestamp in milliseconds into the most significant bits, then random data for the rest. The result still parses as a UUID string, but the leading hex changes slowly over time, which is why indexes behave more like auto-increment integers than like random v4 keys.
v4: random everywhere (except version/variant)
v7: | timestamp (48b) | random (80b) | (simplified view)
Endianness: where Microsoft GUIDs diverge
The first three fields (time_low, time_mid, time_hi_and_version) are stored
little-endian in some ecosystems (.NET Guid, SQL Server uniqueidentifier on wire).
The last two fields remain big-endian. Copying hex dumps between Java and C# without conversion produces "different" UUIDs for the same logical value.
MongoDB Extended JSON uses subtype 04 (standard) or subtype 03 (legacy C#). Our MongoDB UUID converter exists because this confusion is routine in data migrations.
How databases store UUIDs
- CHAR(36) / VARCHAR - human-readable, larger indexes.
- BINARY(16) - compact; driver must encode/decode.
- Native UUID type (PostgreSQL) - database handles parsing.
Random v4 in BINARY(16) still fragments indexes. v7 or ULID improves sequential insert performance because leading bytes change slowly.
-- PostgreSQL: store as native uuid, compare efficiently
SELECT id FROM users WHERE id = '550e8400-e29b-41d4-a716-446655440000'::uuid;
-- MySQL 8+: UUID_TO_BIN / BIN_TO_UUID for byte order control
SELECT UUID_TO_BIN('550e8400-e29b-41d4-a716-446655440000', 1);
When exporting to data warehouses, teams often denormalize UUID to string for Parquet readability. Re-importing those files is where endianness bugs return - always store the canonical string alongside raw bytes if you need lossless round-trip.
A practical debugging workflow
- Normalize to lowercase canonical string.
- Confirm version nibble matches generator you think you used.
- Dump 16-byte hex from DB and compare to converter output.
- If mismatch only in first 8 bytes, suspect endianness.
- Log both string and binary during one request to catch driver bugs early.
FAQ
- Why does my UUID start with ff?
- Check if you are viewing binary as hex without proper field boundaries, or if data is not a UUID at all.
- Are UUIDs always 36 characters?
- Canonical hyphenated form yes. Compact 32-hex and URN forms are common alternatives.
- Can two different strings be the same UUID?
- Yes if casing differs or leading zeros are omitted in non-canonical parsers. Always use a strict validator.
- What is the UUID epoch?
- v1 timestamps count 100-ns intervals since 1582-10-15 UTC. v7 uses Unix ms. Do not mix semantics when parsing.
- How many bytes is a UUID on the wire?
- Always 16 bytes. String form is a presentation layer.
Related: How to validate UUIDs · UUID validator tool