Textures, the Glutton That Eats Your VRAM
Last time we halved the vertex count and the model shrank a little, but it still wasn't lightning fast, because the real size hog was untouched: textures. In a PBR model, textures usually account for more than 80% of the file size, and they are also the part that balloons most in VRAM.
This article is the cure for texture "VRAM gluttony." It covers three things: why PNG/JPG is a problem from the GPU's point of view; what GPU-native texture formats look like and why you can't use them directly; and how Basis Universal and KTX2 tie everything together.
Recap: why JPG makes VRAM explode
In the previous article we gave a formula:
VRAM usage = width * height * 4 bytes (RGBA) * 1.333 (with mipmaps)
A 4096×4096 texture, whether it's a 1.5MB JPG or an 8MB PNG on disk, becomes ~87MB in VRAM. There's one reason: the GPU doesn't understand JPG/PNG.
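To make the formula concrete, here is a minimal sketch that sums the mip levels one by one instead of multiplying by 1.333. The function name is made up for illustration, and it assumes an uncompressed RGBA8 texture with a full mip chain down to 1×1. For 4096×4096 it lands at about 85 MiB (roughly 89 million bytes), the same ballpark as the ~87MB quoted above.

```typescript
const BYTES_PER_RGBA8_TEXEL = 4;

// Sum the byte size of every mip level of an uncompressed RGBA8 texture.
function rgba8VramBytes(width: number, height: number, withMipmaps: boolean): number {
  let total = 0;
  let w = width;
  let h = height;
  for (;;) {
    total += w * h * BYTES_PER_RGBA8_TEXEL;
    // Stop after the base level, or once the 1x1 level has been counted.
    if (!withMipmaps || (w === 1 && h === 1)) break;
    w = Math.max(1, Math.floor(w / 2));
    h = Math.max(1, Math.floor(h / 2));
  }
  return total;
}

const MIB = 1024 * 1024;
const baseBytes = rgba8VramBytes(4096, 4096, false);
const mippedBytes = rgba8VramBytes(4096, 4096, true);
console.log(`base level:   ${(baseBytes / MIB).toFixed(1)} MiB`);      // 64.0 MiB
console.log(`with mipmaps: ${(mippedBytes / MIB).toFixed(1)} MiB`);    // 85.3 MiB
console.log(`overhead:     x${(mippedBytes / baseBytes).toFixed(3)}`); // x1.333
```

Note that the file size of the source image never enters the calculation: only the pixel dimensions matter.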
The GPU's texture sampler understands one thing: given a UV coordinate, read a color from a fixed-size block of pixels. It requires the texture to be "laid-out raw pixels" in VRAM. So before the browser uploads a JPG to the GPU, it must first fully decompress it into RGBA pixels on the CPU, then shove the whole block into VRAM.
This process has three problems:
- VRAM explosion: the decompressed raw pixels are huge. 87MB isn't an exaggeration—it's what the formula computes.
- Upload stall: moving a large block of pixels from CPU memory to GPU VRAM is a slow operation that blocks the first frame.
- CPU decode cost: decoding a big image is itself time-consuming, especially on mobile.
Extending the "compressed sponge" metaphor from last time: PNG/JPG is a sponge squeezed flat for easy transport; once on the GPU, the sponge soaks up water and expands back to full size. Download got faster; VRAM didn't save a thing.
The GPU's own texture formats: compressed in VRAM by nature
Since the GPU can't sample a compressed PNG directly, can we use a format that stays compressed even inside VRAM, one where the GPU decodes just the pixel block it needs on the fly during sampling, at almost no cost?
That's exactly what GPU-native texture formats do. Representative families:
| Format family | Full name | Main platforms | Notes |
|---|---|---|---|
| BC1-7 | Block Compression | Desktop (PC, Mac) | Long-established; every variant compresses 4×4 pixel blocks |
| ETC1/2 | Ericsson Texture Compression | Mobile (older Android/iOS) | Old mobile standard |
| ASTC | Adaptive Scalable Texture Compression | Mobile/VR (newer devices) | Flexible, best quality, block size tunable per texture (4×4 up to 12×12) |
| PVRTC | PowerVR Texture Compression | Older iOS | Being phased out in favor of ASTC |
These formats share one trait: the texture is stored as fixed-size compressed pixel blocks (4×4 in most formats; ASTC also allows larger blocks), and the GPU decodes a block on demand during sampling. The unit of decoding is a whole block, not a single pixel. The benefit is that VRAM usage shrinks by a fixed ratio regardless of image content.
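Because every block has the same byte size, finding the block that holds a given texel is plain arithmetic, which is what makes on-demand decoding cheap. The sketch below illustrates the idea for a 4×4 block format. It assumes blocks are laid out linearly, row by row; real GPUs usually tile or swizzle blocks in memory, and the names here are hypothetical.

```typescript
const BLOCK_DIM = 4; // texels per block edge (4x4 blocks assumed)

interface BlockLocation {
  blockX: number;
  blockY: number;
  byteOffset: number;   // offset of the block from the start of the mip level
  texelInBlock: number; // 0..15, row-major index inside the block
}

// bytesPerBlock: 8 for BC1 / ETC2 RGB (4 bpp), 16 for BC7 / ETC2 RGBA / ASTC 4x4 (8 bpp).
function locateTexel(
  x: number,
  y: number,
  textureWidth: number,
  bytesPerBlock: number,
): BlockLocation {
  const blocksPerRow = Math.ceil(textureWidth / BLOCK_DIM);
  const blockX = Math.floor(x / BLOCK_DIM);
  const blockY = Math.floor(y / BLOCK_DIM);
  return {
    blockX,
    blockY,
    // Assumption: linear, row-major block order.
    byteOffset: (blockY * blocksPerRow + blockX) * bytesPerBlock,
    texelInBlock: (y % BLOCK_DIM) * BLOCK_DIM + (x % BLOCK_DIM),
  };
}

// Texel (5, 9) of a 4096-wide texture stored at 8 bytes per block.
console.log(locateTexel(5, 9, 4096, 8));
```

With a variable-rate format like JPG there is no such formula: you cannot know where a pixel's data sits without decoding what comes before it.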
Compare:
| | PNG/JPG (traditional) | GPU-native formats |
|---|---|---|
| Disk size | Small (JPG especially) | Medium (block-compressed, fixed bitrate) |
| VRAM usage | Large (decompressed to raw pixels) | Small (block-compressed, resident) |
| Upload to GPU | Slow (CPU decode + big transfer) | Fast (just move it, no decode) |
| Sampling speed | Fast (already raw pixels) | Fast (hardware real-time decode) |
GPU formats look like the perfect solution. So why can't we just use them?
Here's the problem: different devices recognize different formats
This is the biggest pitfall of GPU texture formats—fragmentation.
- Desktop GPUs recognize BC1-7, and generally not ASTC
- Android phones recognize ETC2/ASTC, and mostly not BC
- iOS recognizes ASTC on A8 and later; older devices use PVRTC
- WebGL/WebGPU can only expose whatever the underlying hardware supports
If you want one texture to exist in a GPU-native format on every device, you have to prepare a separate copy for each platform. A product shipping on desktop, Android, and iOS needs the same texture in BC, ETC2, and ASTC: three versions. Package size triples, and so does the engineering effort.
Worse, on the web you have no idea which device will open your page. Pre-generating every format is unrealistic, and by the time you can detect the device at runtime, the assets are already built and published.
Basis Universal: encode once, transcode everywhere
Basis Universal (Basis for short) was born to solve this fragmentation. Its idea in one sentence:
First encode the texture into an "intermediate format," then at runtime transcode it into the corresponding native format based on the current device's GPU capabilities.
Transcoding flow (schematic):
Source texture (PNG/JPG)
│ one-time offline encode (slow, done once)
▼
Basis intermediate format (ETC1S or UASTC)
│ packed into a KTX2 container
▼
Publish to web ──┬── Desktop GPU ──→ runtime transcode → BC1/3/7
                 ├── Android ──────→ runtime transcode → ETC2
                 └── iOS/VR ───────→ runtime transcode → ASTC
Key points:
- Offline encoding happens once, yielding a compact intermediate representation
- Runtime transcoding is fast (pure computation, typically on the order of milliseconds per texture) because it converts one block format into another, with no per-pixel decompression
- What lands in VRAM after transcoding is a true GPU-native format, so VRAM usage follows block-compression rates
Basis offers two intermediate encoding modes; the next article unpacks them, but remember the names:
- ETC1S: extremely high compression ratio, suited for diffuse/albedo and other color maps
- UASTC: higher quality, suited for normals and other precision-sensitive maps
KTX2: the standard container for GPU textures
One engineering question remains: where does the Basis-encoded data live, how is it labeled, and how does it connect to glTF? The answer is KTX2.
KTX2 (Khronos Texture 2) is not another image format; it's a container format. Just as a .zip doesn't care whether it holds documents or images, KTX2 simply packages GPU texture data (including Basis-encoded data) into a standard structure with metadata (format, mipmap levels, color space, etc.).
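To see what "a standard structure with metadata" means in practice, here is a rough sketch that reads the fixed-size fields at the start of a KTX2 file. The layout (a 12-byte identifier followed by little-endian 32-bit fields) follows my reading of the KTX 2.0 specification; the function and interface names are my own, and the demo header is synthetic rather than taken from a real file.

```typescript
// 12-byte file identifier: «KTX 20»\r\n\x1A\n
const KTX2_IDENTIFIER: readonly number[] = [
  0xab, 0x4b, 0x54, 0x58, 0x20, 0x32, 0x30, 0xbb, 0x0d, 0x0a, 0x1a, 0x0a,
];
const SUPERCOMPRESSION_NAMES: readonly string[] = ["None", "BasisLZ", "Zstandard", "ZLIB"];

interface Ktx2Header {
  vkFormat: number; // 0 (VK_FORMAT_UNDEFINED) for Basis Universal payloads
  pixelWidth: number;
  pixelHeight: number;
  faceCount: number;
  levelCount: number;
  supercompressionScheme: number;
}

function parseKtx2Header(bytes: Uint8Array): Ktx2Header {
  if (bytes.byteLength < 48) throw new Error("Too short to be a KTX2 header");
  for (let i = 0; i < KTX2_IDENTIFIER.length; i++) {
    if (bytes[i] !== KTX2_IDENTIFIER[i]) throw new Error("Not a KTX2 file");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const u32 = (offset: number): number => view.getUint32(offset, true); // little-endian
  return {
    vkFormat: u32(12),
    pixelWidth: u32(20),
    pixelHeight: u32(24),
    faceCount: u32(36),
    levelCount: u32(40),
    supercompressionScheme: u32(44),
  };
}

function supercompressionName(scheme: number): string {
  return SUPERCOMPRESSION_NAMES[scheme] ?? "Unknown";
}

// Synthetic header for a 4096x4096 texture with 13 mip levels and BasisLZ (ETC1S).
const demoBytes = new Uint8Array(48);
demoBytes.set(KTX2_IDENTIFIER, 0);
const demoView = new DataView(demoBytes.buffer);
demoView.setUint32(20, 4096, true); // pixelWidth
demoView.setUint32(24, 4096, true); // pixelHeight
demoView.setUint32(36, 1, true);    // faceCount
demoView.setUint32(40, 13, true);   // levelCount
demoView.setUint32(44, 1, true);    // supercompressionScheme
const demoHeader = parseKtx2Header(demoBytes);
console.log(demoHeader, supercompressionName(demoHeader.supercompressionScheme));
```

A real loader would go on to read the level index and data format descriptor that follow these fields; this sketch stops at the part that shows the container idea.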
In glTF, KTX2 plugs in via the KHR_texture_basisu extension: the texture is no longer a PNG file but a KTX2 file containing Basis encoding. At load time the engine detects device capabilities and transcodes to the matching BC/ETC/ASTC.
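As an illustration of how the extension hooks into a glTF texture, the sketch below models the relevant JSON as TypeScript objects and picks the image a loader would use. The shape (an extension object with its own `source` index next to an optional fallback `source`) follows the KHR_texture_basisu extension as I understand it; the file names and the `resolveImageIndex` helper are hypothetical.

```typescript
interface GltfImage {
  uri: string;
  mimeType?: string;
}

interface GltfTexture {
  source?: number; // optional fallback image (e.g. a PNG)
  extensions?: { KHR_texture_basisu?: { source: number } };
}

// Hypothetical asset: one texture with a PNG fallback and a KTX2 version.
const gltfImages: GltfImage[] = [
  { uri: "albedo.png", mimeType: "image/png" },
  { uri: "albedo.ktx2", mimeType: "image/ktx2" },
];
const gltfTexture: GltfTexture = {
  source: 0,
  extensions: { KHR_texture_basisu: { source: 1 } },
};

// Prefer the KTX2 image when the loader understands the extension.
function resolveImageIndex(tex: GltfTexture, loaderSupportsBasisu: boolean): number {
  const basisu = tex.extensions?.KHR_texture_basisu;
  if (loaderSupportsBasisu && basisu !== undefined) return basisu.source;
  if (tex.source === undefined) throw new Error("Texture has no fallback image");
  return tex.source;
}

console.log(gltfImages[resolveImageIndex(gltfTexture, true)]?.uri);  // albedo.ktx2
console.log(gltfImages[resolveImageIndex(gltfTexture, false)]?.uri); // albedo.png
```

Keeping a PNG fallback is optional; dropping it saves download size but means a loader without Basis support cannot display the texture.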
Let's untangle the three roles—don't confuse them:
| Name | Role | Analogy |
|---|---|---|
| Basis Universal | Encoding scheme (how to compress a texture into the intermediate format) | A "compression algorithm" |
| KTX2 | Container format (how to package the encoded data) | A "box" |
| KHR_texture_basisu | glTF extension (tells the engine this is a Basis texture) | A "label" |
A KTX2 file can hold Basis encoding (cross-platform) or a native format (e.g. raw BC7). On the web it almost always holds Basis, because what we want is "encode once, transcode everywhere."
VRAM example: a 4096 texture compared
Putting the formula and the GPU formats together, here's the approximate footprint of a 4096×4096 RGBA texture under different options:
| Option | Disk size | VRAM usage (with mipmaps) | Upload speed | Cross-platform |
|---|---|---|---|---|
| PNG | ~8MB | ~87MB | Slow (needs decode) | ✅ |
| JPG | ~1.5MB | ~87MB | Slow (needs decode) | ✅ |
| WebP | ~2MB | ~87MB | Slow (needs decode) | ✅ |
| KTX2 (ETC1S) | ~2-3MB | ~11-14MB | Fast | ✅ (transcode) |
| KTX2 (UASTC) | ~6-8MB | ~22MB | Fast | ✅ (transcode) |
Where the VRAM numbers come from: GPU block-compressed formats typically use 4 bpp (4 bits per pixel) or 8 bpp. 4096×4096 at 4 bpp is 8MB, and ×1.333 for mipmaps gives about 11MB. UASTC is mostly transcoded to an 8 bpp format, so about 22MB.
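The same arithmetic can be checked block by block. This is a small sketch, assuming 4×4 blocks and a full mip chain, where 8 bytes per block corresponds to 4 bpp and 16 bytes per block to 8 bpp; the function name is made up for illustration.

```typescript
const BLOCK_EDGE = 4; // texels per block edge; 4x4 blocks are assumed throughout

// Sum the byte size of every mip level of a block-compressed texture.
function blockCompressedVramBytes(
  width: number,
  height: number,
  bytesPerBlock: number,
  withMipmaps: boolean,
): number {
  let total = 0;
  let w = width;
  let h = height;
  for (;;) {
    // Levels smaller than 4x4 still occupy one whole block.
    const blocks = Math.ceil(w / BLOCK_EDGE) * Math.ceil(h / BLOCK_EDGE);
    total += blocks * bytesPerBlock;
    if (!withMipmaps || (w === 1 && h === 1)) break;
    w = Math.max(1, Math.floor(w / 2));
    h = Math.max(1, Math.floor(h / 2));
  }
  return total;
}

const ONE_MIB = 1024 * 1024;
const fourBpp = blockCompressedVramBytes(4096, 4096, 8, true);
const eightBpp = blockCompressedVramBytes(4096, 4096, 16, true);
console.log(`4 bpp with mipmaps: ${(fourBpp / ONE_MIB).toFixed(1)} MiB`);  // 10.7 MiB
console.log(`8 bpp with mipmaps: ${(eightBpp / ONE_MIB).toFixed(1)} MiB`); // 21.3 MiB
```

The result depends only on dimensions and bytes per block, never on image content, which is why the table can quote one VRAM figure per format.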
The point isn't the exact number in any one row, but these two:
- Traditional formats (PNG/JPG/WebP) have nearly identical VRAM usage—all raw decompressed pixels, 87MB. No matter how small on disk, VRAM isn't saved.
- KTX2 drops VRAM to 1/4 to 1/8, and the disk size is competitive too.
This is why VR and mobile web almost always go with KTX2. A phone with 2GB of VRAM holds only about 23 textures at 87MB each; at 11MB each, the same budget holds roughly eight times as many.
Platform support matrix: which GPUs recognize which formats
Basis shields us from the details, but understanding the underlying mapping helps with troubleshooting. Here's current mainstream device support for native formats:
| Platform / device | BC1-7 | ETC2 | ASTC | PVRTC |
|---|---|---|---|---|
| Desktop PC (D3D11/12, Vulkan, WebGPU) | ✅ | ❌ | Partial (newer GPUs) | ❌ |
| macOS (Metal) | ✅ (newer machines) | ❌ | ✅ | ❌ |
| Android (mainstream) | ❌ | ✅ | ✅ | ❌ |
| iOS (A8+) | ❌ | ✅ | ✅ | ✅ (older devices) |
| WebGL 2 | Depends on extensions | ✅ | Partial | ❌ |
| WebGPU | ✅ (desktop) | ✅ | ✅ (device-dependent) | ❌ |
The loader probes these capabilities at runtime, and the Basis transcoder converts the same intermediate encoding into the best match. That's why the Basis layer is nearly irreplaceable on the web: you can't predict the user's device before publishing.
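The capability check behind that choice can be sketched as a pure function over feature names. The three `texture-compression-*` strings are the WebGPU feature names for these format families; the target labels, the function name, and the priority order (ASTC, then BC, then ETC2, then an uncompressed fallback) are assumptions for illustration, not the order any particular engine uses.

```typescript
// Hypothetical labels for the format a loader would ask the transcoder to produce.
type TranscodeTarget = "astc-4x4" | "bc7" | "etc2-rgba8" | "rgba8-uncompressed";

function pickTranscodeTarget(features: ReadonlySet<string>): TranscodeTarget {
  if (features.has("texture-compression-astc")) return "astc-4x4";
  if (features.has("texture-compression-bc")) return "bc7";
  if (features.has("texture-compression-etc2")) return "etc2-rgba8";
  // No compressed format available: fall back to raw pixels (full VRAM cost).
  return "rgba8-uncompressed";
}

// Simulated devices; in a browser the set would come from the GPU adapter's feature list.
const desktopLike = new Set(["texture-compression-bc"]);
const phoneLike = new Set(["texture-compression-etc2", "texture-compression-astc"]);

console.log(pickTranscodeTarget(desktopLike)); // bc7
console.log(pickTranscodeTarget(phoneLike));   // astc-4x4
console.log(pickTranscodeTarget(new Set()));   // rgba8-uncompressed
```

The fallback branch matters: a device with no compressed-format support still gets a working texture, just without the VRAM savings.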
Upload flow compared: traditional vs GPU formats
Finally, let's pin down the difference with a pair of flow diagrams.
Traditional PNG/JPG:
PNG file ──download──> CPU memory ──CPU decode (slow)──> RGBA pixel block ──upload (big, slow)──> VRAM (87MB)
KTX2 + Basis:
KTX2 file ──download──> CPU memory ──runtime transcode (fast)──> GPU block format ──upload (small, fast)──> VRAM (11MB)
The latter drops the expensive per-pixel CPU decode step, and the uploaded data is roughly 4 to 8 times smaller (22MB or 11MB instead of 87MB). Faster first frame, less VRAM: that's the core value of this approach.
What's next
Theory done; next article is hands-on. We'll use toktx and gltf-transform to actually compress textures to KTX2, load them in Three.js / Babylon.js, and discuss how to choose ETC1S vs UASTC and how to tune compression parameters.