How UUIDs work internally

May 12, 2026 · 15 min read

Strings like 550e8400-e29b-41d4-a716-446655440000 are just one skin on 16 bytes. When integrations fail, the bug is almost always byte order or version bits, not "broken UUID math."

The 128-bit layout

RFC 4122 names five logical fields packed into 128 bits:

time_low (32 bits)
time_mid (16 bits)
time_hi_and_version (16 bits) - includes 4-bit version
clock_seq_hi_and_reserved + clock_seq_low (16 bits) - includes variant
node (48 bits)

Random v4 reuses this skeleton but fills most bits with entropy. v7 overwrites the front with a Unix timestamp. The hyphenated string is big-endian hex presentation of those fields in a defined order.

Example: 550e8400-e29b-41d4-a716-446655440000
         |time_low |mid |ver|seq|      node      |
Version nibble ----^ (4 in "41d4" -> 0x4xxx)

Version and variant nibbles

The version sits in the high nibble of time_hi_and_version. Valid values include 1, 3, 4, 5, 6, 7, 8. The variant is encoded in the most significant bits of the clock sequence field; RFC 4122 variant starts with 10. Validators reject strings where these bits are wrong even if hex length is perfect.

Example: a string with version nibble 0 or variant that does not match RFC 4122 is not a valid UUID even if every character is hexadecimal. That is why copy-paste from corrupted logs often fails validation - one nibble flipped during OCR or PDF export breaks structural rules.

How v4 and v7 populate the same skeleton

Version 4 sets the version to 4 and fills the remaining 122 bits with cryptographically strong randomness (implementation quality matters - use OS CSPRNG, not Math.random()).

Version 7 writes a 48-bit Unix timestamp in milliseconds into the most significant bits, then random data for the rest. The result still parses as a UUID string, but the leading hex changes slowly over time, which is why indexes behave more like auto-increment integers than like random v4 keys.

v4: random everywhere (except version/variant)
v7: | timestamp (48b) | random (80b) |  (simplified view)

Endianness: where Microsoft GUIDs diverge

The first three fields (time_low, time_mid, time_hi_and_version) are stored little-endian in some ecosystems (.NET Guid, SQL Server uniqueidentifier on wire). The last two fields remain big-endian. Copying hex dumps between Java and C# without conversion produces "different" UUIDs for the same logical value.

MongoDB Extended JSON uses subtype 04 (standard) or subtype 03 (legacy C#). Our MongoDB UUID converter exists because this confusion is routine in data migrations.

How databases store UUIDs

CHAR(36) / VARCHAR - human-readable, larger indexes.
BINARY(16) - compact; driver must encode/decode.
Native UUID type (PostgreSQL) - database handles parsing.

Random v4 in BINARY(16) still fragments indexes. v7 or ULID improves sequential insert performance because leading bytes change slowly.

-- PostgreSQL: store as native uuid, compare efficiently
SELECT id FROM users WHERE id = '550e8400-e29b-41d4-a716-446655440000'::uuid;

-- MySQL 8+: UUID_TO_BIN / BIN_TO_UUID for byte order control
SELECT UUID_TO_BIN('550e8400-e29b-41d4-a716-446655440000', 1);

When exporting to data warehouses, teams often denormalize UUID to string for Parquet readability. Re-importing those files is where endianness bugs return - always store the canonical string alongside raw bytes if you need lossless round-trip.

A practical debugging workflow

Normalize to lowercase canonical string.
Confirm version nibble matches generator you think you used.
Dump 16-byte hex from DB and compare to converter output.
If mismatch only in first 8 bytes, suspect endianness.
Log both string and binary during one request to catch driver bugs early.

FAQ

Why does my UUID start with ff?: Check if you are viewing binary as hex without proper field boundaries, or if data is not a UUID at all.
Are UUIDs always 36 characters?: Canonical hyphenated form yes. Compact 32-hex and URN forms are common alternatives.
Can two different strings be the same UUID?: Yes if casing differs or leading zeros are omitted in non-canonical parsers. Always use a strict validator.
What is the UUID epoch?: v1 timestamps count 100-ns intervals since 1582-10-15 UTC. v7 uses Unix ms. Do not mix semantics when parsing.
How many bytes is a UUID on the wire?: Always 16 bytes. String form is a presentation layer.