How MongoDB stores UUIDs internally: BSON Binary and field layout
June 2, 2026 · 15 min read
Developers often ask what MongoDB “really” stores when a field looks like a UUID in Compass. There is no dedicated UUID BSON type comparable to SQL Server's UNIQUEIDENTIFIER column type. The value is a Binary element with a length, a subtype byte, and sixteen payload octets. Everything else - shell pretty-printing, Extended JSON, driver wrappers - is interpretation layered on that envelope. Understanding the envelope explains duplicate key behavior, index size, and why oplog entries rendered as Extended JSON show base64 blobs instead of hyphenated strings.
The BSON Binary envelope
Each BSON element begins with a type byte; for binary data it is 0x05. The field name follows as a CString (UTF-8 bytes plus a terminating NUL). After that come a 32-bit little-endian integer giving the payload length (the subtype byte is not counted), the subtype byte, and the raw bytes. For UUIDs the length is 16 and the subtype is 0x03 (legacy UUID) or 0x04 (standard RFC 4122 UUID). No hyphens are stored; they appear only when a tool formats the value for humans.
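The following minimal Python sketch uses only the standard library to hand-encode a single binary element. The helper name `encode_binary_element` is hypothetical, not a driver API. A full document would also need its own int32 length prefix and a trailing NUL.

```python
import struct
import uuid

BSON_TYPE_BINARY = 0x05
SUBTYPE_UUID_STANDARD = 0x04

def encode_binary_element(field_name: str, payload: bytes, subtype: int) -> bytes:
    """Hypothetical helper: type byte, CString name, int32 LE length, subtype, payload."""
    name = field_name.encode("utf-8")
    if b"\x00" in name:
        raise ValueError("BSON field names cannot contain NUL bytes")
    return (
        bytes([BSON_TYPE_BINARY])
        + name + b"\x00"                   # CString: UTF-8 bytes plus terminator
        + struct.pack("<i", len(payload))  # int32 little-endian, counts payload only
        + bytes([subtype])                 # subtype byte (0x04 = standard UUID)
        + payload                          # 16 raw octets, no hyphens
    )

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
element = encode_binary_element("_id", u.bytes, SUBTYPE_UUID_STANDARD)
print(element.hex(" "))
# 05 5f 69 64 00 10 00 00 00 04 55 0e 84 00 e2 9b 41 d4 a7 16 44 66 55 44 00 00
```

The element is 26 bytes: 1 type byte, 4 bytes of `_id\0`, a 4-byte length, 1 subtype byte, and 16 payload bytes.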
Because the subtype is part of the value, two binaries with identical 16-byte payloads but different subtypes are not equal in MongoDB comparison semantics. That detail matters when migrations rewrite only the payload but forget to update the subtype byte, or when a script constructs Binary manually with subtype 0 (generic) instead of 04.
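A small model of that rule follows. It assumes MongoDB's documented BinData ordering: length first, then subtype, then bytes. `bindata_key` is a hypothetical helper, not a driver function. In practice this means an equality lookup or a unique index treats the two values below as distinct keys.

```python
import uuid

def bindata_key(subtype: int, payload: bytes) -> tuple[int, int, bytes]:
    """Hypothetical model of BinData comparison: payload length, then subtype, then bytes."""
    return (len(payload), subtype, payload)

payload = uuid.UUID("550e8400-e29b-41d4-a716-446655440000").bytes

standard = bindata_key(0x04, payload)  # what a standard UUID codec writes
generic = bindata_key(0x00, payload)   # hand-built Binary left at the generic subtype 0

print(standard == generic)  # False: identical 16 bytes, yet different values
print(generic < standard)   # True: subtype 0 sorts before subtype 4
```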
How UUID fields sit inside documents
_id fields are commonly UUID binary in microservice architectures where ObjectId’s embedded timestamp is unwanted. Fields that reference those IDs from other collections use the same type. Embedded subdocuments may carry correlation IDs as binary to keep array payloads compact. When the same identifier appears in multiple shapes - a string in an audit log collection and binary in the operational collection - joins must convert explicitly, because a string never compares equal to a Binary value.
```json
{
  "_id": { "$binary": { "base64": "VQ6EAOKbQdSnFkRmVUQAAA==", "subType": "04" } },
  "tenantId": { "$binary": { "base64": "...", "subType": "04" } }
}
```
Extended JSON is how exports and log pipelines usually show binary UUIDs. Base64 inflates each 16-byte payload to 24 characters on the wire, but the explicit subType keeps the value unambiguous. Always preserve the subtype field when round-tripping through ETL; dropping it forces the next loader to guess.
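Here is a round-trip sketch in Python. It assumes canonical Extended JSON v2 and handles subtype 04 only, and both helper names are hypothetical. The parser raises an error when subType is missing instead of falling back to a default.

```python
import base64
import json
import uuid

def to_extended_json(u: uuid.UUID) -> dict:
    """Canonical Extended JSON v2 for a standard (subtype 04) binary UUID."""
    return {"$binary": {"base64": base64.b64encode(u.bytes).decode("ascii"), "subType": "04"}}

def from_extended_json(value: dict) -> uuid.UUID:
    """Parse {"$binary": {...}} back to a UUID; refuse to guess if subType is absent or not 04."""
    binary = value["$binary"]
    subtype = binary.get("subType")
    if subtype is None or int(subtype, 16) != 0x04:
        raise ValueError(f"expected subType 04, got {subtype!r}")
    payload = base64.b64decode(binary["base64"], validate=True)
    if len(payload) != 16:
        raise ValueError("UUID payload must be 16 bytes")
    return uuid.UUID(bytes=payload)

doc_id = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
encoded = json.dumps(to_extended_json(doc_id))
print(encoded)  # {"$binary": {"base64": "VQ6EAOKbQdSnFkRmVUQAAA==", "subType": "04"}}
print(from_extended_json(json.loads(encoded)) == doc_id)  # True
```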
Indexes on UUID primary and foreign keys
MongoDB's B-tree indexes order BinData values first by length, then by subtype, then byte by byte. Every UUID has a 16-byte payload, so keys that share a subtype compare as a plain memcmp of the stored bytes; for subtype 04 that is the RFC octet order. That order tracks creation time only for time-ordered layouts such as UUID v7. Random UUID v4 keys land throughout the index, causing more page splits than the roughly increasing ObjectIds. That trade-off is often acceptable for globally unique IDs generated offline.
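The sketch below compares generation order with index order for a UUID v7-style layout and for random v4 values. `uuid7_like` is a hypothetical helper that follows the v7 bit layout: a 48-bit Unix millisecond timestamp, then random bits. It sets the version and variant bits by hand because older Python releases reject `version=7` in `uuid.UUID`.

```python
import random
import uuid

def uuid7_like(unix_ms: int, rng: random.Random) -> uuid.UUID:
    """Hypothetical helper: UUID v7-style value with a 48-bit ms timestamp in the top bits."""
    value = (unix_ms & ((1 << 48) - 1)) << 80       # timestamp occupies bytes 0-5
    value |= rng.getrandbits(80)                    # random filler for the remaining 80 bits
    value = (value & ~(0xF << 76)) | (0x7 << 76)    # version nibble = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)    # RFC variant bits = 10
    return uuid.UUID(int=value)

def index_order(keys: list[uuid.UUID]) -> list[uuid.UUID]:
    """Same length, same subtype (04): BinData order reduces to byte order."""
    return sorted(keys, key=lambda u: u.bytes)

rng = random.Random(42)          # seeded for repeatable output
start_ms = 1_780_000_000_000     # arbitrary example timestamp
v7_keys = [uuid7_like(start_ms + i, rng) for i in range(5)]  # generated in time order
v4_keys = [uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(5)]

print(index_order(v7_keys) == v7_keys)  # True: byte order follows creation time
# Index positions of the v4 keys, listed in generation order (no time correlation)
print([index_order(v4_keys).index(u) for u in v4_keys])
```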
Compound indexes that lead with a UUID tenant key still benefit from binary storage. Keys are shorter than their UTF-8 string equivalents, each index page and the WiredTiger cache hold more entries, and collation rules do not apply. Partial indexes and unique constraints behave exactly as with any other BSON type - provided all writers agree on subtype and byte order.
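Here is rough per-value arithmetic, as a Python sketch, for the same UUID encoded as a BSON string and as BSON Binary. These are document-encoding sizes. Index entries use WiredTiger's own key format, so treat the numbers as indicative rather than exact index sizes.

```python
import struct
import uuid

def bson_string_value(s: str) -> bytes:
    """BSON string value: int32 length (counting the trailing NUL), UTF-8 bytes, NUL."""
    data = s.encode("utf-8") + b"\x00"
    return struct.pack("<i", len(data)) + data

def bson_binary_value(payload: bytes, subtype: int = 0x04) -> bytes:
    """BSON binary value: int32 payload length, subtype byte, payload."""
    return struct.pack("<i", len(payload)) + bytes([subtype]) + payload

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
as_string = bson_string_value(str(u))   # 36-character hyphenated form
as_binary = bson_binary_value(u.bytes)  # 16-byte payload, subtype 04
print(len(as_string), len(as_binary))   # 41 21
```

For comparison, a BSON ObjectId value is a fixed 12 bytes with no length prefix.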
WiredTiger, compression, and on-disk representation
WiredTiger stores BSON documents as encoded buffers with optional block compression (snappy, zlib, or zstd, depending on configuration). UUID fields compress differently than repetitive strings. Random v4 values are high-entropy, so the compressor can do little with them. They still take less space than 36-byte strings, because there are fewer bytes to begin with.
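The quick experiment below illustrates the point. It uses `zlib` from the Python standard library as a stand-in for WiredTiger's configured block compressor, so snappy and zstd will produce different numbers. Hex strings do compress, because each digit carries only 4 bits. With a general-purpose compressor, though, they do not get down to the raw binary size.

```python
import random
import uuid
import zlib

rng = random.Random(7)  # seeded so the comparison is repeatable
ids = [uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(10_000)]

blobs = {
    "binary": b"".join(u.bytes for u in ids),                 # 16 bytes per ID
    "string": b"".join(str(u).encode("ascii") for u in ids),  # 36 bytes per ID
}

sizes = {}
for label, blob in blobs.items():
    sizes[label] = (len(blob), len(zlib.compress(blob, 6)))
    print(f"{label}: raw={sizes[label][0]} compressed={sizes[label][1]}")
```

Expect the binary blob to stay close to its raw size. The string blob shrinks substantially but still ends up larger.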
Backup tools (mongodump, Ops Manager, Atlas snapshots) copy the stored bytes verbatim. Restoring into a cluster whose client drivers use different UUID codec defaults does not rewrite payloads; the bytes come back exactly as they were backed up. Disaster recovery runbooks should note subtype expectations the same way they note schema versions.
Operational implications for SRE and DBAs
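Monitoring queries that filter on UUID should pass the value as a driver UUID or Binary object, not as an interpolated string. A string literal compared against a binary field fails quietly. Values of different BSON types are never equal, so the query can still use the index yet return nothing. A COLLSCAN on a UUID field usually points elsewhere. One common cause is a filter that wraps the field in a conversion expression inside $expr, which the planner cannot serve from the index. Schema validation with $jsonSchema can require bsonType "binData" for defense in depth. It has no keyword for the subtype byte, so pinning 04 still depends on every writer using a standard UUID codec.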
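The following sketch builds a binary-typed equality filter in canonical Extended JSON, the kind of string that tools accepting Extended JSON filters (such as `mongoexport --query`) expect. It also defines a `$jsonSchema` validator as a plain dictionary. `uuid_filter` is a hypothetical helper. In PyMongo code you would instead pass a `uuid.UUID` with a standard `uuidRepresentation` configured, or `bson.binary.Binary.from_uuid(...)`.

```python
import base64
import json
import uuid

def uuid_filter(field: str, value: str) -> dict:
    """Hypothetical helper: equality filter on a subtype 04 binary field, as Extended JSON."""
    u = uuid.UUID(value)  # raises ValueError on malformed input instead of silently matching nothing
    return {field: {"$binary": {"base64": base64.b64encode(u.bytes).decode("ascii"), "subType": "04"}}}

# Requires tenantId to be BSON binary; $jsonSchema itself cannot pin the subtype byte.
validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["tenantId"],
        "properties": {"tenantId": {"bsonType": "binData"}},
    }
}

print(json.dumps(uuid_filter("tenantId", "550e8400-e29b-41d4-a716-446655440000")))
print(json.dumps(validator))
```

The validator dictionary has the shape you would pass as `validator` to `createCollection` or `collMod`.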
When cloning data to analytics warehouses, document whether BigQuery, Snowflake, or Spark readers interpret bytes as strings, structs, or FIXED(16). A one-time conversion UDF at the warehouse boundary is cheaper than silent mis-joins across billions of fact rows.
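Here is a sketch of such a boundary conversion in Python, kept warehouse-agnostic. `binary_uuid_to_string` is hypothetical. The legacy layouts mirror PyMongo's `UuidRepresentation` names (PYTHON_LEGACY, CSHARP_LEGACY, JAVA_LEGACY). Which layout applies to subtype 03 data is an assumption you must confirm from the writers, because the bytes alone do not say.

```python
import uuid

def binary_uuid_to_string(payload: bytes, subtype: int, legacy_layout: str = "python") -> str:
    """Hypothetical UDF body: 16-byte Binary payload -> canonical hyphenated UUID string.

    legacy_layout is consulted only for subtype 0x03 and must come from knowledge of
    the writers:
      "python" - bytes already in RFC order
      "csharp" - first three fields little-endian (.NET Guid byte order)
      "java"   - each 8-byte half stored in reversed order
    """
    if len(payload) != 16:
        raise ValueError("UUID payload must be 16 bytes")
    if subtype == 0x04 or (subtype == 0x03 and legacy_layout == "python"):
        return str(uuid.UUID(bytes=payload))
    if subtype == 0x03 and legacy_layout == "csharp":
        return str(uuid.UUID(bytes_le=payload))
    if subtype == 0x03 and legacy_layout == "java":
        # Undo the reversal of bytes 0-7 and bytes 8-15.
        return str(uuid.UUID(bytes=payload[7::-1] + payload[:7:-1]))
    raise ValueError(f"cannot interpret subtype {subtype:#04x} with layout {legacy_layout!r}")

stored = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(binary_uuid_to_string(stored.bytes, 0x04))
print(binary_uuid_to_string(stored.bytes_le, 0x03, "csharp"))
```

Both calls print the same hyphenated string, even though the stored payloads differ. That is exactly the kind of mismatch a join on raw bytes would miss.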
Replica set elections and sharded chunk migrations do not rewrite UUID payloads; what you inserted is what travels in the oplog. That immutability is a feature for forensic replay - if a UUID byte changes in replication, suspect application updates, not MongoDB internals rearranging BSON.
FAQ
- Does MongoDB have a native UUID BSON type?
- No. UUIDs are stored as BSON Binary with subtype 03 or 04. Drivers may expose a UUID class, but on the wire it is still Binary.
- Are UUID _id values smaller than ObjectIds?
- No. An ObjectId is 12 bytes; a binary UUID payload is 16 bytes. Binary UUIDs are still much smaller than 36-character string UUIDs and offer different uniqueness semantics.
- How do I see raw bytes in mongosh?
- Use an Extended JSON export (for example with mongoexport), hex-print the Binary value in mongosh, or write a short driver script. Compass shows a friendly UUID view when the subtype is 03 or 04.
- Does changing subtype change WiredTiger file size?
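- Not by itself. The subtype is one byte per value either way, so rewriting 03 to 04 leaves document and index entry sizes unchanged and does not alter the overall schema shape. The migration does rewrite every affected document and index entry. For legacy layouts that stored bytes in a driver-specific order, the payload changes too, not just the subtype.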