MongoDB binary UUID explained: subtype 04 and 16 bytes

May 26, 2026 · 14 min read

When you store a UUID in MongoDB as a native value rather than a 36-character string, you are usually looking at a BSON Binary field with exactly 16 bytes of payload and a subtype byte that tells clients how to interpret those bytes. For greenfield applications on modern drivers, that subtype is almost always 04, meaning “UUID in RFC 4122 byte order.” Understanding that single field prevents the classic bug where the same logical identifier prints as two different hex strings depending on whether you query in the shell, export Extended JSON, or read the value in C# or Java.

Why MongoDB uses Binary instead of strings

Storing UUIDs as strings works until it does not. A canonical hyphenated UUID is 36 bytes in UTF-8, while the binary form is 16 bytes plus a few bytes of BSON overhead. On large collections, that difference compounds in working set size, index B-tree footprint, and backup volume. More importantly, string comparison is lexicographic, not semantic: uppercase versus lowercase, missing hyphens, and legacy brace formats all create duplicate-key surprises that binary storage avoids entirely.
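
The size difference can be checked with a little arithmetic. The sketch below follows the BSON element layout (type byte, field name, value) and compares a string-valued _id with a Binary-valued one. The helper names are made up for illustration, and the numbers describe the encoded BSON only, not what the storage engine writes after compression.

```javascript
// Size of one BSON element: type byte + field name (C string) + value.
function bsonElementSize(fieldName, valueSize) {
  const nameSize = Buffer.byteLength(fieldName, "utf8") + 1; // trailing NUL
  return 1 + nameSize + valueSize;
}

// BSON string value: int32 length + UTF-8 bytes + trailing NUL.
function bsonStringValueSize(text) {
  return 4 + Buffer.byteLength(text, "utf8") + 1;
}

// BSON binary value: int32 length + subtype byte + payload.
function bsonBinaryValueSize(payloadLength) {
  return 4 + 1 + payloadLength;
}

const canonical = "550e8400-e29b-41d4-a716-446655440000";
const asString = bsonElementSize("_id", bsonStringValueSize(canonical));
const asBinary = bsonElementSize("_id", bsonBinaryValueSize(16));

console.log({ asString, asBinary, saved: asString - asBinary });
```

For an _id field this works out to 46 bytes as a string and 26 bytes as Binary, so each stored identifier is 20 bytes smaller before index entries are counted.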

Binary UUIDs also align with how drivers and aggregation pipelines think about identity. Equality is bytewise. Index bounds are predictable. When you project to Extended JSON for debugging, tools can render subtype and base64 consistently. Teams that standardize on subtype 04 early spend less time reconciling microservices that each invented their own string normalization rules.

What subtype 04 means in practice

BSON Binary subtypes are a contract between the database and your application runtime. Subtype 04 was reserved for UUIDs so drivers no longer had to guess whether 16 arbitrary bytes were a GUID, a hash, or opaque blob data. When subtype is 04, encoders write the time_low, time_mid, time_hi_and_version, clock_seq, and node fields in the order RFC 4122 specifies for the canonical string, not in the legacy Microsoft mixed-endian layout that older .NET drivers wrote under subtype 03.

If you open a document in MongoDB Compass or mongosh and see UUID("550e8400-e29b-41d4-a716-446655440000"), the shell is presenting a subtype-04 value. Pass the same logical identifier through a driver configured for subtype 03 and the stored bytes come out in a different order, even though the string that driver renders after its own conversion still “looks” correct. That mismatch is endianness, not corruption, but it feels like corruption when two services disagree on storage settings.
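
To make the byte-order difference concrete, here is a minimal sketch of the swap between RFC 4122 order and the legacy .NET mixed-endian order. The function name is hypothetical, and it covers only the .NET layout discussed above; other legacy subtype-03 layouts, such as the old Java one, permute bytes differently and are not handled here.

```javascript
// Swap between RFC 4122 order and the legacy .NET mixed-endian order.
// The swap is its own inverse, so one function covers both directions.
function swapDotNetLegacyOrder(bytes) {
  if (bytes.length !== 16) {
    throw new RangeError("expected exactly 16 bytes");
  }
  const out = Buffer.from(bytes); // copy; the input is left untouched
  out.subarray(0, 4).reverse(); // time_low
  out.subarray(4, 6).reverse(); // time_mid
  out.subarray(6, 8).reverse(); // time_hi_and_version
  return out; // bytes 8..15 (clock_seq and node) are unchanged
}

const rfcBytes = Buffer.from("550e8400e29b41d4a716446655440000", "hex");
const legacyBytes = swapDotNetLegacyOrder(rfcBytes);

console.log(rfcBytes.toString("hex"));    // 550e8400e29b41d4a716446655440000
console.log(legacyBytes.toString("hex")); // 00840e559be2d441a716446655440000
```

Only the first eight bytes move, which is why the two hex dumps share their second half and why the mismatch is easy to miss in logs.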

The 16-byte layout under the hood

A UUID is 128 bits. In binary form those bits are stored as 16 octets with no hyphens. The familiar 8-4-4-4-12 string groups are a human rendering; the bytes themselves follow the RFC field structure. Version nibble and variant bits live in time_hi_and_version and clock_seq_hi, which is why validators can reject malformed values even when the hex length is correct.

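// Node.js driver (BSON UUID, subtype 4)
import { MongoClient, UUID } from "mongodb";

const id = new UUID("550e8400-e29b-41d4-a716-446655440000");
// id._bsontype === "Binary" (UUID extends Binary)
// id.sub_type === 4
// id.buffer.length === 16

// The connection string and database name are placeholders.
const client = new MongoClient("mongodb://localhost:27017");
const users = client.db("app").collection("users");
await users.insertOne({ _id: id });
await client.close();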
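
The version nibble and variant bits described above can be read straight from the 16 bytes. This is a small sketch that assumes the bytes are in RFC order (subtype 04); the function name is illustrative only.

```javascript
// Read the version and variant from 16 UUID bytes in RFC 4122 order.
function describeUuidBytes(bytes) {
  if (bytes.length !== 16) {
    throw new RangeError("expected exactly 16 bytes");
  }
  // High nibble of byte 6 (start of time_hi_and_version) is the version.
  const version = bytes[6] >> 4;
  // Top two bits of byte 8 (clock_seq_hi) are 10 for the RFC variant.
  const isRfcVariant = (bytes[8] & 0xc0) === 0x80;
  return { version, isRfcVariant };
}

const sample = Buffer.from("550e8400e29b41d4a716446655440000", "hex");
console.log(describeUuidBytes(sample)); // { version: 4, isRfcVariant: true }
```

Running the same check on bytes stored in a legacy order reads the wrong octet for the version, which is one quick way to notice a subtype mismatch.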

When you hash or sign payloads that include UUID bytes, always specify whether you mean raw binary or canonical string form. Cryptographic libraries frequently assume lowercase hyphenated text, while MongoDB stores raw octets. Document the choice in your API schema so consumers do not accidentally double-encode.

How drivers read and write subtype 04

Modern MongoDB drivers can map a UUID type to BSON Binary subtype 04, but the defaults differ. In Node and Rust, the BSON UUID type writes subtype 04 when you construct it from a canonical string. PyMongo encodes native uuid.UUID values as subtype 04 only once uuidRepresentation is set to "standard", and Go code typically builds a Binary value with subtype 4 explicitly or registers a codec for a third-party UUID type. Java and .NET require more care: older stacks defaulted to subtype 03 for Guid compatibility, and migration guides often mention explicit codec configuration.

Aggregation operators such as $toUUID and $toString (where available) assume RFC byte order when the input is already binary subtype 04. If your pipeline receives hex strings from an external system, convert them to the UUID type before comparing against stored binary fields; a string never equals a Binary value, so the comparison silently matches nothing.
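
As a rough illustration of that normalization step, the pipeline below converts an incoming string field before comparing it with a stored binary field. The field names externalId and userId are hypothetical, and the sketch assumes a server version that provides $toUUID and an input already in canonical hyphenated form.

```javascript
// Hypothetical fields: externalId is a canonical UUID string,
// userId is a subtype-04 Binary UUID stored in the same document.
const pipeline = [
  // Convert the string to a UUID value so both sides share one type.
  { $addFields: { externalUuid: { $toUUID: "$externalId" } } },
  // Compare UUID to UUID; comparing the raw string would never match.
  { $match: { $expr: { $eq: ["$externalUuid", "$userId"] } } },
];

console.log(JSON.stringify(pipeline, null, 2));
```

The array would be passed to the collection's aggregate method; converting once in an early stage keeps later stages free of string handling.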

Pitfalls when mixing strings and Binary

The most expensive mistake is storing some documents with string IDs and others with binary IDs in the same collection. Unique indexes treat "550e8400-..." and Binary(16) as different values even when they represent the same identity. Secondary mistakes include copying hex from logs without subtype context and assuming Compass’s friendly UUID view matches what a legacy C# reader decodes.

Establish a schema rule: primary keys and foreign keys use BSON UUID subtype 04 unless you are explicitly integrating with a subtype-03 system. Enforce it in ODM validators, migration scripts, and code review checklists. When you must interoperate with legacy GUID bytes, convert at the boundary and store only one representation inside MongoDB.
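
Converting at the boundary can be as simple as parsing every accepted spelling into the same 16 bytes before anything reaches the database. The sketch below is deliberately lenient and uses a made-up function name; it assumes the text represents a UUID in RFC order, so legacy GUID bytes would still need their own byte-order conversion.

```javascript
// Parse common UUID spellings (case, braces, hyphens) into 16 bytes.
function uuidTextToBytes(text) {
  const hex = text
    .trim()
    .replace(/^\{|\}$/g, "") // drop surrounding braces, if any
    .replace(/-/g, "")       // drop hyphens
    .toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) {
    throw new TypeError(`not a UUID: ${text}`);
  }
  return Buffer.from(hex, "hex");
}

const spellings = [
  "550e8400-e29b-41d4-a716-446655440000",
  "550E8400-E29B-41D4-A716-446655440000",
  "{550e8400-e29b-41d4-a716-446655440000}",
  "550e8400e29b41d4a716446655440000",
];

// Four distinct strings, but a single distinct byte sequence.
const distinctStrings = new Set(spellings).size;
const distinctBytes = new Set(
  spellings.map((s) => uuidTextToBytes(s).toString("hex"))
).size;

console.log({ distinctStrings, distinctBytes });
```

A unique index on the string form would accept all four spellings as separate keys, while the binary form collapses them into one.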

FAQ

Is subtype 04 the same as a UUID string?
They represent the same 128-bit value when byte order is RFC 4122. The string is a rendering; subtype 04 is the compact binary storage form MongoDB prefers for indexes and equality.

How many bytes does a MongoDB UUID use?
The payload is 16 bytes. BSON adds a 4-byte length prefix and a subtype byte, so the encoded value is slightly larger than 16 bytes but far smaller than a 36-character UTF-8 string.

Can I change subtype without changing the logical UUID?
Sometimes you only need to reinterpret byte order (03 vs 04). That is a transformation, not an in-place subtype rename. Use a tested conversion routine rather than flipping the subtype byte alone.

Does MongoDB generate UUIDs for me?
MongoDB generates ObjectIds for _id by default, but UUID primary keys are application-generated unless you use server-side scripts. Applications typically call a random UUID v4 or time-ordered UUID v7 generator client-side.
