Any3D Team

Slimming a Model, Lesson One: The Three Weapons of Vertex Compression

Tags: 3d-compression, vertex-compression, meshopt, draco, gltf

In the previous article we cracked open a GLB file and saw textures eat 80% of the size while vertices take only 10-20%. So is vertex compression irrelevant?

Quite the opposite. Once a model's textures are compressed to KTX2, vertex data makes up a much larger share of what remains, especially on dense meshes, and that share can be cut in half or even by 90%. More importantly, vertex compression is one of the few optimizations that is almost free and takes effect immediately: run one command, configure a decoder, and the file is thinner.

This article clarifies three things: what vertex data actually looks like; the temperament of each of the three approaches (quantization, MeshOpt, Draco); and a conclusion that will save you from pitfalls—there is no "best" solution, only the "most fitting" solution.

How big is one vertex

First, what's inside a vertex. In glTF, each vertex is made up of several attributes:

| Attribute | Purpose | Default precision | Bytes per vertex |
| --- | --- | --- | --- |
| position | Vertex coordinates in space | 3 × float32 | 12 |
| normal | Determines lighting direction | 3 × float32 | 12 |
| tangent | Normal-map computation | 4 × float32 | 16 |
| texcoord_0 (UV) | Texture sampling coordinates | 2 × float32 | 8 |
| color (vertex color) | Per-vertex shading | 4 × float32 | 16 |

A vertex with the full set of PBR attributes takes 48 bytes (position, normal, tangent, UV), or 64 bytes with vertex color, just for geometry data. For a 100,000-vertex model that is roughly 4.8-6.4MB in vertices alone.
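The byte counts above can be checked with a few lines of arithmetic. This is a minimal sketch: the dictionary and function names are made up for illustration, and it counts only tightly packed float32 attributes, ignoring the index buffer and any alignment padding.

```python
# Bytes per component for the float32 layout in the attribute table.
FLOAT32_BYTES = 4

# Component counts per attribute, taken from the table.
ATTRIBUTE_COMPONENTS = {
    "position": 3,
    "normal": 3,
    "tangent": 4,
    "texcoord_0": 2,
    "color": 4,
}


def vertex_stride(attributes):
    """Return the size in bytes of one float32 vertex with the given attributes."""
    return sum(ATTRIBUTE_COMPONENTS[name] * FLOAT32_BYTES for name in attributes)


def vertex_data_megabytes(vertex_count, attributes):
    """Return the vertex data size in MB (10**6 bytes), ignoring the index buffer."""
    return vertex_count * vertex_stride(attributes) / 1_000_000


pbr_attributes = ["position", "normal", "tangent", "texcoord_0"]
with_color = pbr_attributes + ["color"]

print(vertex_stride(pbr_attributes))                    # 48
print(vertex_stride(with_color))                        # 64
print(vertex_data_megabytes(100_000, pbr_attributes))   # 4.8
print(vertex_data_megabytes(100_000, with_color))       # 6.4
```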

Notice almost everything here uses float32 (32-bit floats). That's the default, and it's also the attack surface for vertex compression—because the vast majority of attributes simply don't need 32-bit precision.

Weapon one: Quantization

Quantization is the underlying principle of all vertex compression; Draco and MeshOpt use it internally too.

Quantization (mapping high-precision floats to low-precision integers) boils down to this: for the float 3.14159265, remembering 3.14 is enough. For a set of coordinates within a space, instead of recording every decimal precisely on 32 bits, you use an integer with a smaller range to represent it.

Original:  position.x = 1.234567   (float32, 4 bytes)
Quantized: position.x = 1234       (int16,   2 bytes)  + a scale/offset to restore
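The mapping above can be sketched in Python. This is an illustrative min/max quantizer, not the exact scheme any particular tool uses; the function names and sample values are assumptions. It shows the two pieces that matter: the integers themselves, and the offset and scale needed to restore approximate floats.

```python
def quantize(values, bits=16):
    """Map floats to unsigned integers in [0, 2**bits - 1], plus an offset and scale."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # Guard against a zero-width range (all values identical).
    scale = (hi - lo) / levels if hi > lo else 1.0
    quantized = [round((v - lo) / scale) for v in values]
    return quantized, lo, scale


def dequantize(quantized, offset, scale):
    """Restore approximate floats from the integers."""
    return [q * scale + offset for q in quantized]


xs = [1.234567, -0.5, 0.0, 2.75]
q, offset, scale = quantize(xs)
restored = dequantize(q, offset, scale)

# Rounding to the nearest integer bounds the error by half a step.
max_error = max(abs(a - b) for a, b in zip(xs, restored))
print(q)
print(f"step = {scale:.3e}, max error = {max_error:.3e}")
```

The key property is that the worst-case error is half of one step, so the step size (range divided by the number of levels) decides whether the loss is visible.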

Before vs after quantization:

| Attribute | float32 bytes | Quantized (16-bit) bytes | Savings |
| --- | --- | --- | --- |
| position | 12 | 6 | 50% |
| normal | 12 | 6 (or 4, using int8 + octahedral) | 50-67% |
| tangent | 16 | 4-8 | 50-75% |
| texcoord | 8 | 4 | 50% |

Summing the rows, the 48-byte vertex (position, normal, tangent, UV) shrinks to roughly 18-24 bytes, more than halving the size.
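The "int8 + octahedral" option for normals deserves a closer look. The idea is to project a unit normal onto an octahedron, unfold it into a square, and store only two small integers. The sketch below is a generic textbook version of octahedral encoding written for illustration; it is not claimed to match the exact bit layout used by gltfpack or the MeshOpt decoder, and the function names are assumptions.

```python
import math


def _sign(x):
    return 1.0 if x >= 0.0 else -1.0


def oct_encode(normal, bits=8):
    """Encode a unit normal as two signed integers of the given bit width."""
    x, y, z = normal
    s = abs(x) + abs(y) + abs(z)
    u, v = x / s, y / s
    if z < 0.0:
        # Fold the lower hemisphere over the diagonals of the square.
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    max_int = (1 << (bits - 1)) - 1  # 127 for 8 bits
    return round(u * max_int), round(v * max_int)


def oct_decode(encoded, bits=8):
    """Decode two signed integers back to an approximate unit normal."""
    max_int = (1 << (bits - 1)) - 1
    u, v = encoded[0] / max_int, encoded[1] / max_int
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    length = math.sqrt(u * u + v * v + z * z)
    return (u / length, v / length, z / length)


n = (0.6, -0.48, 0.64)  # already unit length
decoded = oct_decode(oct_encode(n))
cos_angle = sum(a * b for a, b in zip(n, decoded))
print(oct_encode(n), decoded)
print(f"cosine of angular error: {cos_angle:.6f}")
```

Two 8-bit integers replace three 32-bit floats, at the cost of a small angular error that is usually acceptable for lighting.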

When to use quantization

  • You just want to reduce size and don't need the ultimate compression ratio
  • You want zero decoder dependencies—a quantized glTF uses the standard KHR_mesh_quantization extension, natively supported by mainstream engines, with no need to bring in an extra decoder library
  • Your target platform is sensitive to package size (e.g. WeChat Mini Programs, where bundling a Draco decoder costs on the order of 100-200KB)

When not to use it

  • The model is tiny and detail is the selling point (e.g. millimeter-level industrial parts). Quantization betrays itself most on small models—textures may be fine, but a 0.1mm vertex displacement is visible in a close-up.

The real cost of precision loss: a jewelry showcase scene quantized a ring model to 16 bits, and the metal edge showed visible stair-stepping in close-ups. The cause wasn't too few vertices; the ring was so small relative to the quantization range that a single 16-bit step was coarser than its edge detail. The fix is to shrink the quantization range (fit it tightly to the bounding box of position) or bump up the bit depth for small models.
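Whether a given bit depth is "fine enough" comes down to one division: the quantization range over the number of integer levels. The numbers below are hypothetical (a 20mm ring and a 10m showcase scene), chosen only to illustrate why the same 16 bits can be invisible in one setup and visible in another.

```python
def quantization_step_mm(range_mm, bits=16):
    """Return the size of one integer step, in millimetres, for a given range."""
    return range_mm / ((1 << bits) - 1)


ring_range_mm = 20.0        # hypothetical: range fitted tightly to a 20mm ring
scene_range_mm = 10_000.0   # hypothetical: range covering a 10m scene

print(f"ring-sized range, 16 bits:  {quantization_step_mm(ring_range_mm):.6f} mm")
print(f"scene-sized range, 16 bits: {quantization_step_mm(scene_range_mm):.4f} mm")
print(f"ring-sized range, 12 bits:  {quantization_step_mm(ring_range_mm, bits=12):.6f} mm")
```

With the range fitted to the ring, one step is a fraction of a micrometre. With the range stretched over a 10m scene, one step is about 0.15mm, which is already larger than the 0.1mm displacement mentioned above.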

Weapon two: MeshOpt

MeshOpt here means the glTF extension EXT_meshopt_compression (a multi-vendor extension built on the meshoptimizer library), positioned as "decent compression ratio, blazing-fast decoding."

What it does: first quantize the attributes (same as above), then re-compress the quantized integers with a lossless coding step. In other words: lossy quantization + lossless coding = smaller size, same quality as quantization alone.

  • Compression ratio: another 30-50% smaller than quantization alone
  • Decode speed: extremely fast, pure C/JS, tens of millions of vertices per second on a single thread
  • Decoder size: tiny (~20-30KB gzipped)
  • Compatibility: supported by Three.js (register the decoder via GLTFLoader.setMeshoptDecoder) and Babylon.js, a de facto web standard
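The "lossy quantization + lossless coding" split can be illustrated with a toy pipeline. This sketch uses delta encoding followed by zlib purely as a stand-in; it is not the codec that EXT_meshopt_compression actually specifies, and the sample data is synthetic. The point is only that neighbouring quantized values tend to be close together, so a lossless stage can shrink them further without touching quality.

```python
import math
import struct
import zlib


def delta_encode(values):
    """Store each value as the difference from the previous one."""
    previous = 0
    deltas = []
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    total = 0
    values = []
    for delta in deltas:
        total += delta
        values.append(total)
    return values


# Synthetic, smoothly varying 16-bit quantized coordinates.
quantized = [round(20000 + 1000 * math.sin(i / 50)) for i in range(2000)]
count = len(quantized)

raw = struct.pack(f"<{count}H", *quantized)                # plain uint16 storage
packed_deltas = struct.pack(f"<{count}h", *delta_encode(quantized))
compressed = zlib.compress(packed_deltas, 9)               # lossless stage

# Decoding reverses both steps and recovers the integers exactly.
restored = delta_decode(struct.unpack(f"<{count}h", zlib.decompress(compressed)))

print(len(raw), len(compressed), restored == quantized)
```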

When to use MeshOpt

  • You need a higher compression ratio but can't accept Draco's slower decoding
  • Web-first, mobile, WebXR—decode speed directly affects first-paint experience
  • The model is decompressed frequently (e.g. dynamically loaded levels)

When not to use it

  • Your target platform doesn't even recognize EXT_meshopt_compression (rare, old engines)
  • You only need "it runs" and don't care about a 30% difference—then plain quantization is simpler and has one less dependency

Weapon three: Draco

Draco is Google's compression solution, positioned for "ultimate compression ratio."

The fundamental difference from the other two: Draco also compresses the connectivity (topology) of the mesh. Quantization only changes the numeric representation of each vertex; MeshOpt adds lossless coding on top; Draco re-encodes "which vertices form triangles" in a more compact form, reordering vertices and triangles in the process.

  • Compression ratio: highest of the three, often 90%+ reduction on vertex-dense models
  • Decode speed: slowest of the three, but still fast in absolute terms
  • Decoder size: larger (~100-200KB, usually loaded as a separate wasm)
  • Quality: tunable, but at extreme ratios you'll see visible deformation

When to use Draco

  • Extremely large, vertex-super-dense models (million-vertex scans, terrain)
  • One-time load, reused for a long time after decoding (slower decode is acceptable)
  • Package size isn't the bottleneck, download speed is

When not to use it

  • Mobile + need a fast first paint—you have to download both the decoder and the model, which drags things out
  • Strict package-size environments like Mini Programs
  • Models that need skinned animation, morph targets—Draco's support for these is weak, and misconfiguration causes problems

All three side by side: a selection table

The compression ratios below reference community benchmarks (DeepKolos's tests + Reddit r/threejs discussions). Different models vary, but the relative relationships are stable:

| Option | Size vs. float32 original | Decode speed | Decoder size | Lossy? | glTF extension |
| --- | --- | --- | --- | --- | --- |
| Plain quantization | ~50% | Native, no decode | 0 | Yes (precision) | KHR_mesh_quantization |
| MeshOpt | ~25-35% | Extremely fast | ~25KB | Yes (precision) | EXT_meshopt_compression |
| Draco | ~10-20% | Fast (slowest of the three) | ~100-200KB | Yes (precision; connectivity is re-encoded) | KHR_draco_mesh_compression |

Decoders and platform compatibility:

| Platform | Plain quantization | MeshOpt | Draco |
| --- | --- | --- | --- |
| Desktop web | ✅ Native | ✅ Native | ✅ Needs decoder config |
| Mobile web | ✅ Native | ✅ Native | ⚠️ Decoder is heavy |
| WebXR/VR | ✅ Native | ✅ Recommended | ⚠️ Use with caution |
| WeChat Mini Program | ✅ Recommended | ✅ Recommended | ❌ Avoid if possible |

One-line summary: want it easy and dependency-free → plain quantization; want balanced → MeshOpt; want max ratio and can wait → Draco.
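The one-line summary can be written down as a tiny decision helper. Everything here is hypothetical: the function name and the two boolean inputs are a simplification invented for illustration, and a real project would weigh more factors (platform, animation needs, model size) from the tables above.

```python
def choose_vertex_compression(can_ship_decoder, wants_max_ratio):
    """Pick one of the three approaches from two coarse project constraints.

    can_ship_decoder: whether bundling an extra decoder library is acceptable.
    wants_max_ratio: whether the smallest download matters more than decode speed.
    """
    if not can_ship_decoder:
        return "quantization"   # easy and dependency-free
    if wants_max_ratio:
        return "draco"          # max ratio, slower decode
    return "meshopt"            # balanced


print(choose_vertex_compression(can_ship_decoder=False, wants_max_ratio=False))  # quantization
print(choose_vertex_compression(can_ship_decoder=True, wants_max_ratio=False))   # meshopt
print(choose_vertex_compression(can_ship_decoder=True, wants_max_ratio=True))    # draco
```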

Hands-on: quantization and MeshOpt with gltfpack

gltfpack is a command-line tool from the meshoptimizer project (the same project behind the MeshOpt codec); one command handles quantization and MeshOpt.

First install gltfpack (prebuilt binaries are available from the gltfpack releases), then run:

# Quantize model.glb (default settings) and add MeshOpt compression
gltfpack -i model.glb -o model-packed.glb -cc

# -cc = compress with a higher compression ratio (-c is the basic form);
# both add EXT_meshopt_compression on top of the default quantization

Common parameters:

# Quantize only, no MeshOpt (lightest, zero decoder dependency)
# gltfpack quantizes vertex attributes to 16-bit or smaller integers (KHR_mesh_quantization) by default,
# so no extra flag is needed
gltfpack -i model.glb -o model-quant.glb

# Quantize and enable MeshOpt
gltfpack -i model.glb -o model-meshopt.glb -cc

# With very dense meshes, you can also simplify (reduces triangle count, alters the model)
gltfpack -i model.glb -o model-simplify.glb -cc -si 0.5
# -si 0.5 means simplify to roughly 50% of the original triangle count

About -cc: it is the switch that additionally applies EXT_meshopt_compression (-c is the basic form; -cc asks for a higher compression ratio). Without it, gltfpack still quantizes by default, so gltfpack -i in.glb -o out.glb on its own is already "plain quantization, zero decoder dependency." (-v is the verbose logging flag; don't confuse the two.)
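When gltfpack runs inside a build script, it helps to assemble the command line in one place. The helper below is a hypothetical wrapper written for illustration; it uses only the flags shown in this article (-i, -o, -cc, -si) and merely builds the argument list, leaving execution to the caller.

```python
import shlex


def gltfpack_command(input_path, output_path, meshopt=False, simplify_ratio=None):
    """Build a gltfpack argument list from the options discussed in this article."""
    args = ["gltfpack", "-i", input_path, "-o", output_path]
    if meshopt:
        args.append("-cc")  # add EXT_meshopt_compression on top of quantization
    if simplify_ratio is not None:
        if not 0.0 < simplify_ratio <= 1.0:
            raise ValueError("simplify_ratio must be in (0, 1]")
        args += ["-si", str(simplify_ratio)]  # lossy: alters geometry
    return args


print(shlex.join(gltfpack_command("model.glb", "model-quant.glb")))
print(shlex.join(gltfpack_command("model.glb", "model-meshopt.glb", meshopt=True)))
print(shlex.join(gltfpack_command("model.glb", "model-simplify.glb",
                                  meshopt=True, simplify_ratio=0.5)))
```

The returned list can be passed to subprocess.run(args, check=True), assuming the gltfpack binary is on the PATH.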

Typical results (a 5MB, 120k-vertex PBR model, for reference only):

| Treatment | File size | Notes |
| --- | --- | --- |
| Original (float32) | 5.0MB | Baseline |
| Plain quantization (default) | 2.6MB | Halved, no visible difference |
| MeshOpt (-cc) | 1.7MB | Another 35% off, slightly faster load |
Note: -si simplification is a lossy operation that alters geometry; it's not the same thing as compression. Compression tries to preserve visual fidelity; simplification actively removes detail. The two can stack, but it depends on whether the scene allows it.

Common pitfalls

  • Normal direction changed after quantization: usually too-low precision. Use at least 16 bits for normals, or 8-bit octahedral encoding.
  • Materials lost after Draco decode: Draco only compresses meshes; materials and textures must be handled separately. At load time you must configure both the Draco decoder and the KHR extensions.
  • Draco won't load in a Mini Program: the decoder wasm is restricted in some runtimes; switching to MeshOpt usually fixes it.
  • Model "drifts" after quantization: when the model is far from the origin, 16-bit precision can't express both large coordinates and small detail. The fix is to move the model near the origin before quantizing, or increase the bit depth.
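For the "drift" pitfall, moving the model near the origin can be sketched as a small preprocessing step. This is an illustrative helper, not something gltfpack exposes; the function name and sample coordinates are assumptions. It subtracts the bounding-box center from every position and returns that center, which would then need to be applied back as a translation on the node so the model stays in the same place in the scene.

```python
def recenter(positions):
    """Shift positions so their bounding box is centered on the origin.

    Returns the shifted positions and the original center.
    """
    mins = [min(p[axis] for p in positions) for axis in range(3)]
    maxs = [max(p[axis] for p in positions) for axis in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    centered = [tuple(p[axis] - center[axis] for axis in range(3)) for p in positions]
    return centered, center


# Hypothetical model sitting about 5km from the origin along x.
positions = [(5000.0, 20.0, -3.0), (5000.5, 20.25, -2.5), (5001.0, 20.5, -2.0)]
centered, center = recenter(positions)

largest_before = max(abs(c) for p in positions for c in p)
largest_after = max(abs(c) for p in centered for c in p)
print(center)                          # (5000.5, 20.25, -2.5)
print(largest_before, largest_after)   # 5001.0 0.5
```

After recentering, the largest coordinate drops from about 5000 to 0.5, so the available integer levels are spent on the model's actual extent rather than on its distance from the origin.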

What's next

Vertices are compressed—don't celebrate too early. As noted, textures are 80% of a model's size. Next we switch battlefields and look at why traditional PNG/JPG is a "VRAM hog" in the GPU's eyes, and how GPU-native texture formats solve it.
