June 30, 2025·Any3D Team

Textures, the Glutton That Eats Your VRAM

3d-compression · texture-compression · webgl · webgpu

Last time we halved the vertex count and the model shrank a little, but it did not suddenly load at lightning speed, because the real size hog was still there: textures. In a PBR model, textures usually account for 80% or more of the file size, and they are also the part that bloats the most in VRAM.

This article is the cure for texture "VRAM gluttony." It covers three things: why PNG/JPG is guilty in the GPU's eyes; what GPU-native texture formats look like and why you can't use them directly; and how Basis Universal + KTX2 tie it all together.

Recap: why JPG makes VRAM explode

In the previous article we gave a formula:

VRAM usage = width * height * 4 bytes (RGBA) * 1.333 (with mipmaps)

A 4096×4096 texture, whether it's a 1.5MB JPG or an 8MB PNG on disk, becomes ~87MB in VRAM. There's one reason: the GPU doesn't understand JPG/PNG.
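The formula is easy to check by hand. Here is a minimal sketch in Python; the function name is ours, and it assumes 8-bit RGBA with the full mipmap chain approximated by a factor of 4/3:

```python
def raw_texture_bytes(width: int, height: int, mipmaps: bool = True) -> float:
    """Approximate VRAM for an uncompressed 8-bit RGBA texture."""
    base = width * height * 4                  # 4 bytes per pixel (R, G, B, A)
    return base * 4 / 3 if mipmaps else base   # a full mip chain adds about one third


size = raw_texture_bytes(4096, 4096)
print(f"{size / 2**20:.1f} MiB")   # 85.3 MiB
print(f"{size / 10**6:.1f} MB")    # 89.5 MB
```

Whether you count in binary or decimal megabytes, the result lands in the neighborhood of the ~87MB quoted here. Note that the size of the file on disk never enters the calculation.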

The GPU's texture sampler understands one thing: given a UV coordinate, read a color from a fixed-size block of pixels. It requires the texture to be "laid-out raw pixels" in VRAM. So before the browser uploads a JPG to the GPU, it must first fully decompress it into RGBA pixels on the CPU, then shove the whole block into VRAM.

This process has three problems:

VRAM explosion: the decompressed raw pixels are huge. 87MB isn't an exaggeration—it's what the formula computes.
Upload stall: moving a large block of pixels from CPU memory to GPU VRAM is a slow operation that blocks the first frame.
CPU decode cost: decoding a big image is itself time-consuming, especially on mobile.

Extending the "compressed sponge" metaphor from last time: PNG/JPG is a sponge squeezed flat for easy transport; once on the GPU, the sponge soaks up water and expands back to full size. Download got faster; VRAM didn't save a thing.

The GPU's own texture formats: compressed in VRAM by nature

Since the GPU won't sample from a compressed PNG, can we use a compression scheme the GPU does understand, so the texture stays compressed even inside VRAM? The GPU would then decode just the small block of pixels it needs, on the fly during sampling, at almost no cost.

That's exactly what GPU-native texture formats do. Representative families:

Format family	Full name	Main platforms	Notes
BC1-7	Block Compression	Desktop (PC, Mac)	Veteran; every generation compresses 4×4 pixel blocks
ETC1/2	Ericsson Texture Compression	Mobile (older Android/iOS)	Old mobile standard
ASTC	Adaptive Scalable Texture Compression	Mobile/VR (newer devices)	Flexible, best quality, per-block tunable
PVRTC	PowerVR	Older iOS	Being phased out by ASTC

These formats share one trait: textures are stored compressed in small fixed-size pixel blocks (4×4 for BC and ETC; ASTC also allows larger block sizes), and the GPU decodes a block on demand during sampling. What comes out isn't a single pixel but a whole block. The benefit is that VRAM usage shrinks by a fixed ratio regardless of content.
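Because every block has a fixed byte size, the compressed footprint depends only on the texture dimensions. A small sketch of that arithmetic follows; the table keys and function name are ours, while the block dimensions and byte counts are the standard layouts for each format:

```python
# (block width, block height, bytes per block)
BLOCK_FORMATS = {
    "BC1":       (4, 4, 8),    # 4 bits per pixel
    "BC7":       (4, 4, 16),   # 8 bits per pixel
    "ETC2_RGB":  (4, 4, 8),    # 4 bits per pixel
    "ETC2_RGBA": (4, 4, 16),   # 8 bits per pixel
    "ASTC_4x4":  (4, 4, 16),   # 8 bits per pixel
    "ASTC_8x8":  (8, 8, 16),   # 2 bits per pixel
}


def compressed_bytes(fmt: str, width: int, height: int) -> int:
    """Size of one mip level: number of blocks times bytes per block."""
    bw, bh, block_bytes = BLOCK_FORMATS[fmt]
    blocks_x = (width + bw - 1) // bw    # round up to whole blocks
    blocks_y = (height + bh - 1) // bh
    return blocks_x * blocks_y * block_bytes


raw = 4096 * 4096 * 4  # uncompressed RGBA, one level
for fmt in BLOCK_FORMATS:
    size = compressed_bytes(fmt, 4096, 4096)
    print(f"{fmt:<10} {size / 2**20:5.1f} MiB  ratio 1:{raw // size}")
```

The ratio is the same for a photo, a flat color, or pure noise, which is exactly what lets the hardware jump straight to the block it needs.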

Compare:

Aspect	PNG/JPG (traditional)	GPU-native formats
Disk size	Small (JPG especially)	Medium (block-compressed, fixed bitrate)
VRAM usage	Large (decompressed to raw pixels)	Small (block-compressed, resident)
Upload to GPU	Slow (CPU decode + big transfer)	Fast (just move it, no decode)
Sampling speed	Fast (already raw pixels)	Fast (hardware real-time decode)

GPU formats look like the perfect solution. So why can't we just use them?

Here's the problem: different devices recognize different formats

This is the biggest pitfall of GPU texture formats—fragmentation.

Desktop PCs recognize BC1-7, not ASTC
Android phones recognize ETC2/ASTC, mostly not BC
iOS (A8+) recognizes ASTC, older devices PVRTC
WebGL and WebGPU can only expose what the underlying hardware supports

If you want one texture to "exist as a GPU-native format on every device," you have to prepare a separate copy for each platform. One product shipping on desktop, Android, and iOS means the same texture needs BC, ETC2, and ASTC: three versions. The package triples, and so does the engineering effort.

Worse, on the web you have no idea what device the user will open your page with. Pre-generating every format is unrealistic, and by the time you can detect the GPU at runtime, the assets have already been built and published.

Basis Universal: encode once, transcode everywhere

Basis Universal (Basis for short) was born to solve this fragmentation. Its idea in one sentence:

First encode the texture into an "intermediate format," then at runtime transcode it into the corresponding native format based on the current device's GPU capabilities.

Transcoding flow (schematic):

Source texture (PNG/JPG)
      │  one-time offline encode (slow, done once)
      ▼
Basis intermediate format (ETC1S or UASTC)
      │  packed into a KTX2 container
      ▼
Publish to web ──┬── Desktop GPU ──→ runtime transcode → BC1/3/7
                ├── Android ────→ runtime transcode → ETC2
                └── iOS/VR ─────→ runtime transcode → ASTC

Key points:

Offline encoding happens once, yielding a compact intermediate representation
Runtime transcoding is very fast (pure compute, a few milliseconds), because it converts one block format directly into another, with no detour through raw pixels
What enters VRAM after transcoding is a real GPU-native format, so VRAM usage is computed at block-compression rates, identical to GPU-native formats
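The "pick a target for this device" step can be sketched as a plain priority lookup. This is only an illustration: the family names, the preference order, and the uncompressed fallback are assumptions of ours, and real loaders keep their own, more detailed tables.

```python
from typing import Iterable

# Assumed preference order, for illustration only.
PREFERENCE = ["astc", "bc", "etc2", "pvrtc"]
FALLBACK = "rgba8"  # uncompressed: works everywhere, saves no VRAM


def pick_transcode_target(supported: Iterable[str]) -> str:
    """Return the first preferred format family the device reports."""
    available = set(supported)
    for family in PREFERENCE:
        if family in available:
            return family
    return FALLBACK


print(pick_transcode_target({"bc"}))                     # desktop GPU
print(pick_transcode_target({"etc2"}))                   # older Android phone
print(pick_transcode_target({"astc", "etc2", "pvrtc"}))  # recent iPhone
print(pick_transcode_target(set()))                      # nothing supported
```

The same encoded file feeds every branch; only the last step differs per device.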

Basis offers two intermediate encoding modes; the next article unpacks them, but remember the names:

ETC1S: extremely high compression ratio, suited for diffuse/albedo and other color maps
UASTC: higher quality, suited for normals and other precision-sensitive maps

KTX2: the standard container for GPU textures

There's still an engineering question: where do the encoded Basis data go, how are they marked, and how do they relate to glTF? The answer is KTX2.

KTX2 (Khronos Texture 2) is not another image format; it's a container format. Just as a .zip doesn't care whether it holds documents or images, KTX2 only packages GPU texture data (including Basis-encoded data) into a standard structure with metadata (format, mipmap levels, color space, etc.).
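To make the "box with a label" idea concrete, here is a sketch that reads the fixed header at the start of a KTX2 file: a 12-byte identifier followed by nine little-endian 32-bit fields. The function name and the returned dictionary keys are ours, and the demo header is synthetic rather than taken from a real file.

```python
import struct

KTX2_IDENTIFIER = bytes([0xAB, 0x4B, 0x54, 0x58, 0x20, 0x32,
                         0x30, 0xBB, 0x0D, 0x0A, 0x1A, 0x0A])
SUPERCOMPRESSION = {0: "none", 1: "BasisLZ", 2: "Zstandard", 3: "ZLIB"}


def read_ktx2_header(data: bytes) -> dict:
    """Parse the identifier and the nine uint32 fields that follow it."""
    if len(data) < 48 or data[:12] != KTX2_IDENTIFIER:
        raise ValueError("not a KTX2 file")
    (vk_format, type_size, width, height, depth,
     layers, faces, levels, scheme) = struct.unpack_from("<9I", data, 12)
    return {
        "vkFormat": vk_format,   # 0 (undefined) for Basis-encoded payloads
        "width": width,
        "height": height,
        "levels": levels,        # number of mip levels stored
        "supercompression": SUPERCOMPRESSION.get(scheme, f"unknown ({scheme})"),
    }


# Synthetic header for a 4096x4096 ETC1S texture with 13 mip levels.
header = KTX2_IDENTIFIER + struct.pack("<9I", 0, 1, 4096, 4096, 0, 0, 1, 13, 1)
info = read_ktx2_header(header)
print(info)
```

Everything a loader needs to decide how to handle the payload sits in these first 48 bytes, before any texture data is touched.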

In glTF, KTX2 plugs in via the KHR_texture_basisu extension: the texture is no longer a PNG file but a KTX2 file containing Basis encoding. At load time the engine detects device capabilities and transcodes to the matching BC/ETC/ASTC.
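As a rough picture of what that looks like inside the glTF JSON, the sketch below rewires one texture to a KTX2 image. The helper name and the file names are hypothetical; the extension name, the `source` field, and the `image/ktx2` MIME type follow the KHR_texture_basisu extension. This version keeps no PNG fallback, which is why the extension is also listed as required.

```python
import json


def use_ktx2_texture(gltf: dict, texture_index: int, ktx2_uri: str) -> dict:
    """Point one glTF texture at a KTX2 image via KHR_texture_basisu."""
    images = gltf.setdefault("images", [])
    images.append({"uri": ktx2_uri, "mimeType": "image/ktx2"})

    texture = gltf["textures"][texture_index]
    texture.pop("source", None)  # drop the PNG/JPG reference (no fallback)
    texture.setdefault("extensions", {})["KHR_texture_basisu"] = {
        "source": len(images) - 1
    }

    # With no fallback image, the extension must be declared as required.
    for key in ("extensionsUsed", "extensionsRequired"):
        names = gltf.setdefault(key, [])
        if "KHR_texture_basisu" not in names:
            names.append("KHR_texture_basisu")
    return gltf


gltf = {"images": [{"uri": "albedo.png"}], "textures": [{"source": 0}]}
print(json.dumps(use_ktx2_texture(gltf, 0, "albedo.ktx2"), indent=2))
```

The sketch leaves the original PNG entry in `images` untouched; a real pipeline would normally prune unused images as a separate step.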

Let's untangle the three roles—don't confuse them:

Name	Role	Analogy
Basis Universal	Encoding scheme (how to compress a texture into the intermediate format)	A "compression algorithm"
KTX2	Container format (how to package the encoded data)	A "box"
KHR_texture_basisu	glTF extension (tells the engine this is a Basis texture)	A "label"

A KTX2 file can hold Basis encoding (cross-platform) or a native format (e.g. raw BC7). On the web it almost always holds Basis, because what we want is "encode once, transcode everywhere."

VRAM example: a 4096 texture compared

Stacking the formula and GPU formats together, here's the real footprint of a 4096×4096 RGBA texture under different options:

Option	Disk size	VRAM usage (with mipmaps)	Upload speed	Cross-platform
PNG	~8MB	~87MB	Slow (needs decode)	✅
JPG	~1.5MB	~87MB	Slow (needs decode)	✅
WebP	~2MB	~87MB	Slow (needs decode)	✅
KTX2 (ETC1S)	~2-3MB	~11-14MB	Fast	✅ (transcode)
KTX2 (UASTC)	~6-8MB	~22MB	Fast	✅ (transcode)

Where the VRAM numbers come from: GPU block compression usually counts at 4bpp (4 bits per pixel) or 8bpp. 4096×4096 at 4bpp is about 8MB, ×1.333 with mipmaps ≈ 11MB. UASTC mostly transcodes to 8bpp, so about 22MB.
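The ×1.333 factor is an approximation of the full mipmap chain. Summing the levels one by one gives nearly the same figure, as this sketch shows; the function name is ours, and it assumes 4×4 blocks with every level rounded up to whole blocks:

```python
def mip_chain_bytes(width: int, height: int, bits_per_pixel: int, block: int = 4) -> int:
    """Total size of all mip levels of a block-compressed texture."""
    block_bytes = block * block * bits_per_pixel // 8
    total = 0
    w, h = width, height
    while True:
        blocks_x = (w + block - 1) // block   # partial blocks still cost a full block
        blocks_y = (h + block - 1) // block
        total += blocks_x * blocks_y * block_bytes
        if w == 1 and h == 1:
            break
        w, h = max(1, w // 2), max(1, h // 2)
    return total


for bpp in (4, 8):
    size = mip_chain_bytes(4096, 4096, bpp)
    print(f"{bpp} bpp: {size / 2**20:.1f} MiB ({size / 10**6:.1f} MB)")
# 4 bpp: 10.7 MiB (11.2 MB)
# 8 bpp: 21.3 MiB (22.4 MB)
```

The smallest levels cost slightly more than the approximation suggests, because a 1×1 or 2×2 level still occupies a whole block, but the difference is a few bytes.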

The point isn't the exact number in any one row, but these two:

Traditional formats (PNG/JPG/WebP) have nearly identical VRAM usage—all raw decompressed pixels, 87MB. No matter how small on disk, VRAM isn't saved.
KTX2 drops VRAM to 1/4 to 1/8, and the disk size is competitive too.

This is why VR and mobile web almost always go with KTX2. With a 2GB memory budget on a phone, only about 23 textures fit at 87MB each. At 11MB each, you can fit roughly eight times as many.

Platform support matrix: which GPUs recognize which formats

Basis shields us from the details, but understanding the underlying mapping helps with troubleshooting. Here's current mainstream device support for native formats:

Platform / device	BC1-7	ETC2	ASTC	PVRTC
Desktop PC (D3D11/12, Vulkan, WebGPU)	✅	❌	Rare (some integrated GPUs)	❌
macOS (Metal)	✅ (newer machines)	❌	✅	❌
Android (mainstream)	❌	✅	✅	❌
iOS (A8+)	❌	✅	✅	✅ (older devices)
WebGL 2	Depends on extensions	Depends on extensions	Depends on extensions	Depends on extensions
WebGPU	✅ (desktop)	✅ (device-dependent)	✅ (device-dependent)	❌

The loader probes these capabilities at runtime and has Basis transcode the same intermediate encoding into the best match. That's why the Basis layer is nearly irreplaceable on the web: you can't predict the user's device before publishing.
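When troubleshooting, it helps to know which capability names map to which format family. The lookup below lists the WebGL extension names and WebGPU feature names involved; the function and the sample input are ours. In a browser the names would come from `gl.getSupportedExtensions()` or `adapter.features`, while this sketch simply takes them as a list of strings.

```python
WEBGL_EXTENSIONS = {
    "WEBGL_compressed_texture_s3tc": "BC1-3",
    "EXT_texture_compression_bptc": "BC6H/BC7",
    "WEBGL_compressed_texture_etc": "ETC2",
    "WEBGL_compressed_texture_astc": "ASTC",
    "WEBGL_compressed_texture_pvrtc": "PVRTC",
}
WEBGPU_FEATURES = {
    "texture-compression-bc": "BC1-7",
    "texture-compression-etc2": "ETC2",
    "texture-compression-astc": "ASTC",
}


def families_from_names(names: list[str]) -> list[str]:
    """Translate reported extension/feature names into format families."""
    table = {**WEBGL_EXTENSIONS, **WEBGPU_FEATURES}
    return sorted({table[name] for name in names if name in table})


# Hypothetical extension list, as a desktop browser might report it.
reported = [
    "EXT_color_buffer_float",
    "WEBGL_compressed_texture_s3tc",
    "EXT_texture_compression_bptc",
]
print(families_from_names(reported))  # ['BC1-3', 'BC6H/BC7']
```

If a texture falls back to uncompressed RGBA on a device where you expected a block format, checking this list is usually the first step.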

Upload flow compared: traditional vs GPU formats

Finally, fix the difference in one flow diagram.

Traditional PNG/JPG:

PNG file ──download──> CPU memory ──CPU decode (slow)──> RGBA pixel block ──upload (big, slow)──> VRAM (87MB)

KTX2 + Basis:

KTX2 file ──download──> CPU memory ──runtime transcode (fast)──> GPU block format ──upload (small, fast)──> VRAM (11MB)

The latter skips the expensive per-pixel CPU decode, and the uploaded data is roughly four to eight times smaller. A faster first frame and less VRAM: that is the core value of this approach.

What's next

Theory done; next article is hands-on. We'll use toktx and gltf-transform to actually compress textures to KTX2, load them in Three.js / Babylon.js, and discuss how to choose ETC1S vs UASTC and how to tune compression parameters.

