June 23, 2025·Any3D Team

Slimming a Model, Lesson One: The Three Weapons of Vertex Compression

3d-compression · vertex-compression · meshopt · draco · gltf

In the previous article we cracked open a GLB file and saw textures eat 80% of the size while vertices take only 10-20%. So is vertex compression irrelevant?

Quite the opposite. Once a model's textures are compressed to KTX2 and the mesh is dense, vertices make up a much larger share of what remains, and that share can be cut in half, or even by 90%. More importantly, vertex compression is one of the few optimizations that is almost free and works instantly: add a few commands, swap a decoder, and the file is thinner.

This article clarifies three things: what vertex data actually looks like; the temperament of each of the three approaches (quantization, MeshOpt, Draco); and a conclusion that will save you from pitfalls—there is no "best" solution, only the "most fitting" solution.

How big is one vertex

First, what's inside a vertex. In glTF, each vertex is made up of several attributes:

Attribute	Purpose	Default precision	Bytes per vertex
position	Vertex coordinates in space	3 × float32	12
normal	Determines lighting direction	3 × float32	12
tangent	Normal-map computation	4 × float32	16
texcoord_0 (UV)	Texture sampling coordinates	2 × float32	8
color (vertex color)	Per-vertex shading	4 × float32	16

A vertex with the full set of PBR attributes takes 48 bytes (position, normal, tangent, UV) just for geometry data, or 64 bytes with vertex color. A 100,000-vertex model is therefore roughly 4.8-6.4MB in vertices alone.
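The arithmetic behind these figures is easy to reproduce. The sketch below simply multiplies out the table above; the names `COMPONENTS` and `vertex_bytes` are illustrative and not part of any glTF library.

```python
# Component counts per attribute, taken from the table above.
# Every component is a float32 (4 bytes) at glTF's default precision.
FLOAT32_BYTES = 4
COMPONENTS = {
    "position": 3,
    "normal": 3,
    "tangent": 4,
    "texcoord_0": 2,
    "color": 4,
}

def vertex_bytes(attributes):
    """Size in bytes of one float32 vertex carrying the given attributes."""
    return sum(COMPONENTS[name] * FLOAT32_BYTES for name in attributes)

pbr = ["position", "normal", "tangent", "texcoord_0"]
pbr_with_color = pbr + ["color"]

vertex_count = 100_000
for attrs in (pbr, pbr_with_color):
    per_vertex = vertex_bytes(attrs)
    total_mb = per_vertex * vertex_count / 1_000_000
    print(f"{per_vertex} bytes/vertex, {total_mb:.1f} MB for {vertex_count:,} vertices")
```

Swapping in your own attribute list and vertex count gives a quick estimate of how much of a file is raw vertex data.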

Notice almost everything here uses float32 (32-bit floats). That's the default, and it's also the attack surface for vertex compression—because the vast majority of attributes simply don't need 32-bit precision.

Weapon one: Quantization

Quantization is the underlying principle of all vertex compression; Draco and MeshOpt use it internally too.

Quantization (mapping high-precision floats to low-precision integers) boils down to this: for the float 3.14159265, remembering 3.14 is enough. For a set of coordinates within a space, instead of recording every decimal precisely on 32 bits, you use an integer with a smaller range to represent it.

Original:  position.x = 1.234567   (float32, 4 bytes)
Quantized: position.x = 1234       (int16,   2 bytes)  + a scale/offset to restore
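As a rough illustration of the "integer plus scale/offset" idea, here is a minimal sketch. It assumes a simple scheme that maps the value range onto unsigned integers; the function names are hypothetical, and real exporters differ in how they choose and store the offset and scale.

```python
def quantize(values, bits=16):
    """Map floats onto integers in [0, 2**bits - 1]; return ints plus offset and scale."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # Guard against a zero-width range (all values identical).
    scale = (hi - lo) / levels if hi > lo else 1.0
    ints = [round((v - lo) / scale) for v in values]
    return ints, lo, scale

def dequantize(ints, offset, scale):
    """Restore approximate floats from the integers."""
    return [offset + q * scale for q in ints]

xs = [-0.75, 0.0, 1.234567, 2.5]
q, offset, scale = quantize(xs)
restored = dequantize(q, offset, scale)
max_error = max(abs(a - b) for a, b in zip(xs, restored))
print(q, max_error)
```

The round-trip error is bounded by half a quantization step, which is why the size of the range being quantized matters as much as the bit depth.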

Before vs after quantization:

Attribute	float32 bytes	Quantized (16-bit)	Savings
position	12	6	50%
normal	12	6 (or 4, using int8 + octahedral)	50-67%
tangent	16	4-8	50-75%
texcoord	8	4	50%

For the 48-byte vertex (position, normal, tangent, UV), the rows above add up to roughly 18-24 bytes, so quantization more than halves the size.
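The "int8 + octahedral" entry in the table refers to storing a unit normal as two numbers instead of three. The sketch below shows the general octahedral mapping under that assumption; it is not taken from any particular tool, and `oct_encode` / `oct_decode` are illustrative names.

```python
import math

def _sign(v):
    return 1.0 if v >= 0.0 else -1.0

def oct_encode(n, bits=8):
    """Unit normal (x, y, z) -> two signed integers of `bits` bits each."""
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    u, v = x / s, y / s          # project onto the octahedron |x|+|y|+|z| = 1
    if z < 0.0:                  # fold the lower hemisphere over the diagonals
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    m = (1 << (bits - 1)) - 1    # 127 for 8 bits
    return round(u * m), round(v * m)

def oct_decode(e, bits=8):
    """Inverse of oct_encode; returns a renormalized (x, y, z) tuple."""
    m = (1 << (bits - 1)) - 1
    u, v = e[0] / m, e[1] / m
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:                  # unfold the lower hemisphere
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    length = math.sqrt(u * u + v * v + z * z)
    return (u / length, v / length, z / length)

length = math.sqrt(14.0)
normal = (1 / length, 2 / length, -3 / length)
decoded = oct_decode(oct_encode(normal))
print(sum(a * b for a, b in zip(normal, decoded)))  # cosine of the angular error
```

Two 8-bit values cost 2 bytes of payload per normal; the angular error stays small because the bits are spread fairly evenly over the sphere.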

When to use quantization

You just want to reduce size and don't need the ultimate compression ratio
You want zero decoder dependencies—a quantized glTF uses the standard KHR_mesh_quantization extension, natively supported by mainstream engines, with no need to bring in an extra decoder library
Your target platform is sensitive to package size (e.g. WeChat Mini Programs, where bundling a Draco decoder costs on the order of 100-200KB)

When not to use it

The model is tiny and detail is the selling point (e.g. millimeter-level industrial parts). Quantization betrays itself most on small models—textures may be fine, but a 0.1mm vertex displacement is visible in a close-up.

The real cost of precision loss: a jewelry showcase scene quantized a ring model to 16 bits, and the metal edge showed visible stair-stepping in close-ups. The cause wasn't too few vertices; at that tiny world-space scale, the 16-bit grid was too coarse for the detail. The fix is to shrink the quantization range (tighten the bounding box of position, so the same number of steps covers less space) or bump up the bit depth for small models.
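How coarse the grid is follows directly from the size of the quantized range and the bit depth. A small sketch, using made-up extents (a 2 cm box around a ring versus a 2 m box around a whole scene) purely for illustration:

```python
def quantization_step(extent, bits):
    """Smallest coordinate difference representable across `extent` units."""
    return extent / ((1 << bits) - 1)

# Hypothetical extents in meters; substitute your own bounding box sizes.
ring_only = quantization_step(0.02, 16)
whole_scene = quantization_step(2.0, 16)

for bits in (12, 14, 16):
    step_mm = quantization_step(2.0, bits) * 1000.0
    print(f"{bits}-bit over a 2 m box: step of about {step_mm:.3f} mm")
```

The same bit depth gives a step 100 times finer on the tight box than on the scene-sized one, which is why tightening the range is often the cheaper fix.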

Weapon two: MeshOpt

MeshOpt here means the glTF extension EXT_meshopt_compression (a multi-vendor EXT extension backed by the meshoptimizer library), positioned as "decent compression ratio, blazing-fast decoding."

What it does: first quantize the attributes (same as above), then re-compress the quantized integers with a lossless entropy-coding stage. In other words: lossy quantization + lossless entropy coding = smaller size, same quality as quantization alone.

Compression ratio: another 30-50% smaller than quantization alone
Decode speed: extremely fast, pure C/JS, tens of millions of vertices per second on a single thread
Decoder size: tiny (~20-30KB gzipped)
Compatibility: natively supported by Three.js and Babylon.js, a de facto web standard
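The "lossy quantization + lossless coding" pipeline can be illustrated with a stand-in. The sketch below is not MeshOpt's codec: it uses delta encoding plus `zlib` from the Python standard library, on synthetic data, only to show that already-quantized integers still compress further without losing anything.

```python
import math
import struct
import zlib

# Hypothetical data: a smooth curve, already quantized to 16-bit integers.
quantized = [round(32767 * (0.5 + 0.5 * math.sin(i / 200))) for i in range(20_000)]

def pack_u16(values):
    """Serialize integers as little-endian unsigned 16-bit values."""
    return struct.pack(f"<{len(values)}H", *values)

def delta_encode(values):
    """Store each value as the difference from its predecessor, modulo 2**16."""
    prev, out = 0, []
    for v in values:
        out.append((v - prev) & 0xFFFF)
        prev = v
    return out

def delta_decode(deltas):
    """Inverse of delta_encode."""
    prev, out = 0, []
    for d in deltas:
        prev = (prev + d) & 0xFFFF
        out.append(prev)
    return out

raw = pack_u16(quantized)
packed = zlib.compress(pack_u16(delta_encode(quantized)), 9)
unpacked = struct.unpack(f"<{len(quantized)}H", zlib.decompress(packed))
restored = delta_decode(list(unpacked))

print(len(raw), len(packed), restored == quantized)
```

Neighboring vertices tend to have similar values, so the differences are small and repetitive, and a lossless coder can squeeze them well. The quality is exactly that of the quantized input.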

When to use MeshOpt

You need a higher compression ratio but can't accept Draco's slower decoding
Web-first, mobile, WebXR—decode speed directly affects first-paint experience
The model is decompressed frequently (e.g. dynamically loaded levels)

When not to use it

Your target platform doesn't even recognize EXT_meshopt_compression (rare, old engines)
You only need "it runs" and don't care about a 30% difference—then plain quantization is simpler and has one less dependency

Weapon three: Draco

Draco is Google's compression solution, positioned for "ultimate compression ratio."

The fundamental difference from the other two: Draco also re-encodes the connectivity (topology) of the mesh. Quantization only changes the numeric representation of each vertex; MeshOpt adds lossless coding on top; Draco reorganizes the triangle mesh and expresses "which vertices form triangles" more compactly, which means the original vertex and triangle order is not preserved.

Compression ratio: highest of the three, often 90%+ reduction on vertex-dense models
Decode speed: slowest of the three, but still fast in absolute terms
Decoder size: larger (~100-200KB, usually loaded as a separate wasm)
Quality: tunable, but at extreme ratios you'll see visible deformation
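To see why connectivity is worth compressing at all, it helps to count how many bytes a plain index buffer takes. The sketch below does that for a regular grid mesh, assuming 32-bit indices; it does not implement Draco's encoding, and `grid_mesh_sizes` is an illustrative name.

```python
def grid_mesh_sizes(n, index_bytes=4):
    """Vertex count, triangle count and index-buffer bytes for an n x n vertex grid."""
    vertices = n * n
    triangles = 2 * (n - 1) * (n - 1)   # two triangles per grid cell
    return vertices, triangles, triangles * 3 * index_bytes

vertices, triangles, index_size = grid_mesh_sizes(317)
print(f"{vertices:,} vertices, {triangles:,} triangles, "
      f"{index_size / 1_000_000:.1f} MB of indices "
      f"({index_size / vertices:.1f} bytes per vertex)")
```

On a dense mesh the uncompressed index data is about as large as the quantized vertex attributes, so a codec that also shrinks connectivity has more to work with.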

When to use Draco

Extremely large, vertex-super-dense models (million-vertex scans, terrain)
One-time load, reused for a long time after decoding (slower decode is acceptable)
Package size isn't the bottleneck, download speed is

When not to use it

Mobile + need a fast first paint—you have to download both the decoder and the model, which drags things out
Strict package-size environments like Mini Programs
Models that need skinned animation, morph targets—Draco's support for these is weak, and misconfiguration causes problems

All three side by side: a selection table

The compression ratios below reference community benchmarks (DeepKolos's tests + Reddit r/threejs discussions). Different models vary, but the relative relationships are stable:

Option	Resulting size (vs float32)	Decode speed	Decoder size	Lossy?	glTF extension
Plain quantization	~50%	Native, no decode	0	Yes (precision)	`KHR_mesh_quantization`
MeshOpt	~25-35%	Extremely fast	~25KB	Yes (precision)	`EXT_meshopt_compression`
Draco	~10-20%	Fast (slowest of the three)	~100-200KB	Yes (precision; vertex order is not preserved)	`KHR_draco_mesh_compression`

Decoders and platform compatibility:

Platform	Plain quantization	MeshOpt	Draco
Desktop web	✅ Native	✅ Native	✅ Needs decoder config
Mobile web	✅ Native	✅ Native	⚠️ Decoder is heavy
WebXR/VR	✅ Native	✅ Recommended	⚠️ Use with caution
WeChat Mini Program	✅ Recommended	✅ Recommended	❌ Avoid if possible

One-line summary: want it easy and dependency-free → plain quantization; want balanced → MeshOpt; want max ratio and can wait → Draco.
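The selection logic above can be written down as a small decision helper. This is only one reading of the guidance in this article; the function name and its four flags are hypothetical, and real projects will weigh more factors.

```python
def pick_vertex_compression(*, zero_decoder_required=False,
                            strict_package_size=False,
                            very_dense_static_mesh=False,
                            skinning_or_morph_targets=False):
    """Return 'quantization', 'meshopt' or 'draco' following the summary above."""
    # No extra decoder allowed: only plain quantization qualifies.
    if zero_decoder_required:
        return "quantization"
    # Draco pays off on huge, load-once meshes where the decoder size
    # is acceptable and no skinning or morph targets are involved.
    if (very_dense_static_mesh
            and not strict_package_size
            and not skinning_or_morph_targets):
        return "draco"
    # Balanced default: good ratio, fast decode, small decoder.
    return "meshopt"

print(pick_vertex_compression(very_dense_static_mesh=True))
print(pick_vertex_compression(very_dense_static_mesh=True, strict_package_size=True))
```

Encoding the rules this way makes the trade-off explicit: Draco is the exception that has to earn its place, and MeshOpt is the fallback when in doubt.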

Hands-on: quantization and MeshOpt with gltfpack

gltfpack is the command-line tool from the meshoptimizer project; one command handles quantization and MeshOpt.

Install it first (prebuilt binaries are available from the gltfpack releases), then run:

# Quantize model.glb to 16-bit and add MeshOpt compression
gltfpack -i model.glb -o model-packed.glb -cc

# -cc = compress with a higher ratio than plain -c (both add EXT_meshopt_compression on top of default quantization)

Common parameters:

# Quantize only, no MeshOpt (lightest, zero decoder dependency)
# gltfpack quantizes vertices to 16-bit (KHR_mesh_quantization) by default,
# so no extra flag is needed
gltfpack -i model.glb -o model-quant.glb

# Quantize and enable MeshOpt
gltfpack -i model.glb -o model-meshopt.glb -cc

# With very many triangles, you can also simplify (reduces triangle count, alters the model)
gltfpack -i model.glb -o model-simplify.glb -cc -si 0.5
# -si 0.5 means simplify to roughly 50% of the original triangle count

About -c and -cc: -c is gltfpack's "compress" switch, which applies EXT_meshopt_compression; -cc does the same with a higher compression ratio. Without either, gltfpack still quantizes by default, meaning gltfpack -i in.glb -o out.glb on its own is already "plain quantization, zero decoder dependency." (-v is the verbose logging flag; don't confuse them.)
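If you process many models, it can help to build these command lines from a script. The sketch below is a hypothetical wrapper that uses only the flags shown above (-i, -o, -cc, -si); it assumes gltfpack is on your PATH and leaves the actual invocation commented out.

```python
import shlex

def gltfpack_command(src, dst, meshopt=True, simplify_ratio=None):
    """Build a gltfpack argument list using only the flags discussed above."""
    args = ["gltfpack", "-i", src, "-o", dst]
    if meshopt:
        args.append("-cc")          # add MeshOpt compression on top of quantization
    if simplify_ratio is not None:
        if not 0.0 < simplify_ratio <= 1.0:
            raise ValueError("simplify_ratio must be in (0, 1]")
        args += ["-si", str(simplify_ratio)]
    return args

cmd = gltfpack_command("model.glb", "model-meshopt.glb")
print(shlex.join(cmd))

# To actually run it (assumes gltfpack is installed and on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems with file names that contain spaces.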

Typical results (a 5MB, 120k-vertex PBR model, for reference only):

Treatment	File size	Notes
Original (float32)	5.0MB	Baseline
Plain quantization (default)	2.6MB	Halved, no visible difference
MeshOpt (`-cc`)	1.7MB	Another 35% off, slightly faster load
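The percentages in the notes column follow from the file sizes. A quick check, using the reference numbers from the table (your own models will differ):

```python
# File sizes in MB from the table above (reference figures only).
sizes_mb = {"original": 5.0, "quantized": 2.6, "meshopt": 1.7}

def percent_saved(before, after):
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before

print(f"quantization vs original: {percent_saved(sizes_mb['original'], sizes_mb['quantized']):.0f}%")
print(f"MeshOpt vs quantization:  {percent_saved(sizes_mb['quantized'], sizes_mb['meshopt']):.0f}%")
print(f"MeshOpt vs original:      {percent_saved(sizes_mb['original'], sizes_mb['meshopt']):.0f}%")
```

Note that the "another 35%" is relative to the quantized file, not to the original.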

Note: -si simplification is a lossy operation that alters geometry; it's not the same thing as compression. Compression tries to preserve visual fidelity; simplification actively removes detail. The two can stack, but it depends on whether the scene allows it.

Common pitfalls

Normal direction changed after quantization: usually too-low precision. Use at least 16 bits for normals, or 8-bit octahedral encoding.
Materials lost after Draco decode: Draco only compresses meshes; materials and textures must be handled separately. At load time you must configure both the Draco decoder and the KHR extensions.
Draco won't load in a Mini Program: the decoder wasm is restricted in some runtimes; switching to MeshOpt usually fixes it.
Model "drifts" after quantization: when the model is far from the origin, 16-bit precision can't express both large coordinates and small detail. The fix is to move the model near the origin before quantizing, or increase the bit depth.
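For the "drift" pitfall, moving the model near the origin can be as simple as subtracting the bounding-box center and remembering it as a translation. A minimal sketch, assuming positions are plain (x, y, z) tuples; `recenter` is an illustrative name, and in a real pipeline the returned offset would go into the node's transform.

```python
def recenter(positions):
    """Shift positions so their bounding box is centered on the origin.

    Returns the moved positions and the offset to add back at render time.
    """
    mins = [min(p[i] for p in positions) for i in range(3)]
    maxs = [max(p[i] for p in positions) for i in range(3)]
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(mins, maxs))
    moved = [tuple(p[i] - center[i] for i in range(3)) for p in positions]
    return moved, center

# Hypothetical model sitting about 1 km from the origin.
positions = [(1000.0, 0.0, 5.0), (1002.0, 4.0, 5.0)]
moved, offset = recenter(positions)
print(moved, offset)
```

After recentering, the quantization range only has to cover the model's own extent, so the available steps are spent on detail rather than on distance from the origin.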

What's next

Vertices are compressed—don't celebrate too early. As noted, textures are 80% of a model's size. Next we switch battlefields and look at why traditional PNG/JPG is a "VRAM hog" in the GPU's eyes, and how GPU-native texture formats solve it.

Textures, the Glutton That Eats Your VRAM

Why does a 1.5MB JPG balloon to 87MB in VRAM? What's really wrong with PNG/JPG in the GPU's eyes? And how do GPU-native texture formats, Basis Universal, and KTX2 fit together?
