DirectStorage has been in a strange place for three years. Microsoft shipped the API in 2022, the Windows storage stack caught up in 2023, a handful of flagship titles demonstrated it in 2024, and then the conversation largely stalled. DirectStorage 1.1 gave us GPU decompression via GDeflate, but the UE integration was partial, the packaging story was messy, and most studios decided the ROI was not there yet for PC-only work.
DirectStorage 2.0, paired with UE5.7, changes that calculus. The queueing model is cleaner, the CPU fallback is genuinely competitive for mid-range hardware, the BC7 texture path has been plumbed through, and — importantly — the .pak/.utoc/.ucas format in UE5.7 can now be read directly by the DirectStorage runtime without Unreal having to serialize through its own IO subsystem first. That was the blocker that made early adoption painful. It is gone.
This is a practical guide. What you get, what it costs you in pipeline complexity, and how to structure your game to actually realize the load-time gains on a range of hardware. Numbers are from a test project: a 12 GB open-world game built on UE5.7.1, tested on four hardware configs.
## What is actually new in DirectStorage 2.0
Three things matter.
GPU decompression, for real. DirectStorage 1.1 had GPU decompression on paper. In practice, the handoff between the DirectStorage runtime and Unreal's loading system forced a CPU round-trip for metadata and texture header fixups; the net benefit on UE projects was often a wash, or a small regression, versus CPU decompression with a fast NVMe. DirectStorage 2.0 restructures this so decompression completes directly into the destination GPU resource with no CPU touch beyond the initial dispatch. On a 4070, this yields a genuine 3–5× throughput improvement over the CPU path for BC7 asset loads.
Better queueing and priority. The 1.x queue model was effectively FIFO with a single priority. Good for streaming texture data, bad for the real workload, which is a mix of "critical cinematic asset, load now" and "distant LOD, load eventually." 2.0 adds proper multi-queue with priority, cancellation of in-flight requests, and query APIs for how full the queue is. This matters because it means you can stop building custom schedulers on top of DirectStorage and instead describe priorities to the runtime.
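The scheduling model is easy to reason about in isolation. The sketch below is not the DirectStorage API; it is a self-contained illustration of the two behaviors described above, priority-ordered dispatch and cancellation of in-flight requests, with all type and function names invented for illustration.

```cpp
#include <cstdint>
#include <queue>
#include <unordered_set>
#include <vector>

// Priority tiers loosely mirroring the idea of multiple DS queues.
enum class IoPriority : int { Background = 0, Normal = 1, Critical = 2 };

struct IoRequest {
    uint64_t id;
    IoPriority priority;
};

// Higher priority dispatches first; within a tier, earlier ids go first.
struct RequestOrder {
    bool operator()(const IoRequest& a, const IoRequest& b) const {
        if (a.priority != b.priority)
            return static_cast<int>(a.priority) < static_cast<int>(b.priority);
        return a.id > b.id;  // smaller id = enqueued earlier = on top
    }
};

class IoScheduler {
public:
    void Enqueue(IoRequest r) { queue_.push(r); }
    void Cancel(uint64_t id) { cancelled_.insert(id); }  // lazy cancellation

    // Pop the next live request, silently dropping anything cancelled
    // while it was still waiting in the queue.
    bool Dispatch(IoRequest& out) {
        while (!queue_.empty()) {
            IoRequest r = queue_.top();
            queue_.pop();
            if (cancelled_.count(r.id)) continue;
            out = r;
            return true;
        }
        return false;
    }

private:
    std::priority_queue<IoRequest, std::vector<IoRequest>, RequestOrder> queue_;
    std::unordered_set<uint64_t> cancelled_;
};
```

In UE terms, a distant LOD would enqueue as Background and a cinematic-critical asset as Critical, and `Cancel` lets a stale distant-LOD request die in the queue without disturbing anything else. The real runtime expresses this through separate queue objects with per-queue priority rather than a single scheduler class.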
Integration with DirectX 12's Enhanced Barrier API. Small but real: the transition from a streamed-in resource to a used resource is now a single barrier instead of a sequence. That saves a fraction of a millisecond per streamed batch and eliminates a class of race-condition bugs that plagued DirectStorage 1.x shipping titles.
What is not new, but worth restating: DirectStorage has always been Windows 10 1909+. It does not require Windows 11, and the "Windows 11 only" myth refuses to die. It does not require a specific NVMe model, though it does benefit enormously from one. It does require DirectX 12 feature level 12.0 or higher.
## The UE5.7 integration state
Honest assessment: it is good. Not perfect, but good enough to plan a ship around.
What works out of the box:
- `.pak` and `.utoc`/`.ucas` archives built with `bDirectStorage=true` in your `.ini` are read directly by the DS runtime.
- Async loading requests from `FStreamableManager` route through DS when available, with CPU fallback when not.
- Texture streaming (via the texture streaming pool) uses GPU decompression when the texture format is BC1/3/5/6H/7 and the platform supports it.
- Nanite streaming pages go through DS with GPU decompression. This is the most visible win.
- Virtual Texture streaming uses DS. Less visible, but real.
What does not yet work without engine modifications:
- Level streaming (sub-level loading) does not currently route its package reads through DS. It still uses Unreal's normal loader. Fix is coming in 5.7.2 according to the roadmap; until then, open-world World Partition gets partial benefit only.
- Bulk data reads for skeletal mesh LODs are not DS-aware.
- Sound bank streaming still uses the platform's normal async read path.
The practical consequence: most of your load-time improvement comes from textures and Nanite geometry, which is also where most of your loading time was spent anyway. World Partition streaming is the remaining weak spot, and it is being actively worked on.
## The numbers
Test scene: a 12 GB packaged open-world build. Cold boot, OS file cache cleared between runs. Measured from splash to playable.
| Hardware | UE5.6 (no DS) | UE5.7 (DS off) | UE5.7 (DS on) | DS improvement |
|---|---|---|---|---|
| Gen5 NVMe + RTX 4090 | 8.4s | 8.1s | 3.2s | 2.5× |
| Gen4 NVMe + RTX 4070 | 11.2s | 10.8s | 4.6s | 2.3× |
| Gen3 NVMe + RTX 3060 | 14.9s | 14.2s | 7.8s | 1.8× |
| SATA SSD + GTX 1660 | 22.1s | 21.6s | 15.4s | 1.4× |
In-game streaming hitch measurements (average ms stall during cell load in continuous play):
| Hardware | UE5.6 | UE5.7 (DS off) | UE5.7 (DS on) |
|---|---|---|---|
| Gen5 + 4090 | 42 ms | 38 ms | 11 ms |
| Gen4 + 4070 | 58 ms | 52 ms | 19 ms |
| Gen3 + 3060 | 89 ms | 82 ms | 41 ms |
| SATA + 1660 | 140 ms | 128 ms | 98 ms |
Two takeaways. First, the improvements are large but not uniform — they scale with both storage speed and GPU decompression capability. On a SATA SSD the bottleneck is the drive itself, and DS helps less. Second, the streaming hitch reduction is arguably more important than the initial load time. A 10 ms hitch is invisible to most players; a 50 ms hitch is a visible stutter. DS moves the 4070 tier from "visible" to "invisible."
## Hardware requirements, clearly
The requirements conversation is muddled by marketing. Here is the actual breakdown.
To use DirectStorage 2.0 at all:
- Windows 10 version 1909 or later, or any Windows 11.
- DirectX 12 capable GPU, any vendor.
- An NVMe SSD is strongly recommended but not required. SATA SSDs work, HDDs do not.
To get GPU decompression (the big win):
- A GPU supporting DirectX 12 Ultimate (feature level 12_2), OR a GPU with a validated GPU decompression path.
- NVIDIA: GTX 1060 and above (Pascal+). Yes, Pascal.
- AMD: RX 5000 and above (RDNA and later).
- Intel: Arc, all models.
- A Gen3 or faster NVMe drive. SATA SSDs with GPU decompression enabled actually perform worse because the decompression workload stalls waiting for data.
To get the full advertised performance:
- Gen4 NVMe or better.
- Modern CPU (the DS runtime is single-threaded for dispatch; a weak CPU bottlenecks it).
- 16 GB system RAM minimum. 32 GB if you also want the OS file cache to work well.
Fallbacks:
- No DS support: BC decompression falls back to Unreal's normal pipeline. No regression, no gain.
- DS supported but no GPU decompression: CPU decompression path. Still gets the better queueing benefits; about 30% of the full DS benefit.
- DS supported, GPU decompression, but slow storage: you'll get some benefit but not much. This is the "SATA SSD + 1660" line in the table above.
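Those three tiers reduce to a small decision function. The sketch below is a hypothetical negotiation helper, not UE's actual logic; it just encodes the fallback rules listed above, including the earlier note that GPU decompression on SATA drives is a net loss.

```cpp
enum class StorageTier { HDD, SATA_SSD, NVMe };

enum class LoadPath {
    LegacyLoader,   // no DS support: Unreal's normal pipeline, no regression
    DsCpuDecomp,    // DS queueing benefits, decompression on the CPU
    DsGpuDecomp     // the full path
};

// Hypothetical helper mirroring the fallback tiers in the text: DS needs at
// least an SSD to be worthwhile, and GPU decompression wants NVMe so the
// decompression workload is not stalled waiting on the drive.
LoadPath ChooseLoadPath(bool dsSupported, bool gpuDecompSupported,
                        StorageTier tier) {
    if (!dsSupported || tier == StorageTier::HDD)
        return LoadPath::LegacyLoader;
    if (gpuDecompSupported && tier == StorageTier::NVMe)
        return LoadPath::DsGpuDecomp;
    // SATA SSDs land here even when the GPU could decompress.
    return LoadPath::DsCpuDecomp;
}
```

A table-driven version would work just as well; the point is that the negotiation is small enough to audit by eye, which is why "enable unconditionally and let the runtime pick" is a defensible shipping posture.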
For shipping PC games in 2026, we recommend enabling DS unconditionally and letting the runtime pick the best available path. The fallbacks are genuinely graceful now. DS 1.x had some nasty failure modes on weird hardware; 2.0 has not surfaced any in our testing across a reasonable variety of machines.
## Packaging your .pak files for DirectStorage
This is where the pipeline complexity lives. Default UE packaging produces .utoc/.ucas archives that DS 2.0 can read, but you will not get the GPU decompression benefit unless you explicitly configure compression for it.
In DefaultGame.ini:
```ini
[/Script/UnrealEd.ProjectPackagingSettings]
bUseZenStore=True
bUseIoStore=True
bCompressed=True
PackageCompressionMethod=Oodle
PackageCompressionFormatCommandLineOverride=Oodle
PackageCompressionLevel_DebugDevelopment=5
PackageCompressionLevel_TestShipping=7
PackageCompressionLevel_Distribution=9

[/Script/Engine.StreamingSettings]
s.AsyncLoadingThreadEnabled=True
s.EventDrivenLoaderEnabled=True
s.DirectStorage.Enabled=True
s.DirectStorage.GPUDecompression=True
s.DirectStorage.QueueDepth=1024
```
Two notes. Oodle Kraken is the recommended compression format — it has a dedicated GDeflate-compatible variant that DS can decompress on the GPU. If you are still on zlib from a pre-Oodle UE license, move. The compression ratio and speed are dramatically better and RAD/Epic ship Oodle with UE now.
QueueDepth is the maximum number of in-flight IO requests. 1024 is fine for most games. Bump to 2048 for projects with many small assets (tile-based games, lots of UI atlases).
For the actual packaging pipeline, nothing changes on your build machines. UnrealAutomationTool BuildCookRun -package produces DS-compatible archives with the above settings. You can verify by checking the generated .utoc header — DS-ready archives have a version ≥ 9 and a compression format flag of Oodle or GDeflate.
## Open-world loading workflows
World Partition was designed around Unreal's normal streaming path. DS does not plug into it cleanly in 5.7. The workflow we recommend for open-world games:
Initial load: Route through DS. This is where you get the huge win — the "splash to playable" time. All root world assets, startup levels, initial character, initial textures load via the streaming path and benefit from DS.
WP cell loading during play: Partially benefits. Textures and Nanite geometry inside a cell load via DS because they are referenced as streamed assets. The package loading itself (the .umap and its linker) does not, until 5.7.2.
Texture streaming pool updates: Fully DS. This is actually most of the in-game IO cost, so you still get the hitch reduction.
Nanite streaming: Fully DS. Huge win for detail-heavy scenes.
The net: open-world games see most of the benefit, just not all of it. If you are targeting a 2026/2027 PC release and the 5.7.2 WP fix lands as scheduled, full benefit.
## The CPU fallback, more carefully
A lot of older advice says "DS CPU fallback is not worth it, just use the normal loader." That was true in 1.x. It is less true in 2.0 because the queueing improvements are real even without GPU decompression.
Measured on a Ryzen 5 3600 + GTX 1660 Super + Gen3 NVMe — the "minimum spec" test machine:
| Scenario | DS off | DS on, GPU decomp off | DS on, GPU decomp on |
|---|---|---|---|
| Cold boot | 19.8s | 16.4s | 13.1s |
| Mid-game streaming hitch | 118 ms | 89 ms | 52 ms |
The "DS on, GPU decomp off" column is better than "DS off" across the board. Why? Because the 2.0 queueing model does a better job of keeping a consumer NVMe drive's command queues full. Even on 1660-class hardware, the GPU decompression path works — just more slowly.
Our recommendation: ship with DS on, GPU decompression auto-negotiated. Let the runtime pick.
## Things that break, and how to fix them
The .pak mod pipeline. If you have players who mod your game via .pak patching, the DS-optimized .utoc format is not compatible with the old PakManager workflow. You will need to provide either (a) a mod SDK that produces DS-compatible archives or (b) a non-DS mod path that the engine can load from. Most shipping games that support mods went with option (b) — a designated Mods/ folder that loads through the legacy path, with a performance disclaimer.
Anti-cheat integration. EAC and BattlEye both have DS-aware hooks as of their 2025 builds. If you are on an older version, you will see load failures with cryptic error codes. Update.
Save game migration. Has nothing to do with DS, but people get confused. Your save files are not in .pak archives. DS changes nothing about saves.
Memory pressure during high-throughput streaming. DS can feed data to the GPU faster than UE's streaming pool can evict old data. On 8 GB cards, this surfaces as occasional texture pop-in that did not happen pre-DS. Mitigation: tighten r.Streaming.PoolSize on low-VRAM profiles by 256–512 MB to give the eviction path more headroom.
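A sketch of that mitigation using UE's device-profile CVar convention. The profile name and the exact pool value are placeholders to tune per project, not shipped defaults:

```ini
; DefaultDeviceProfiles.ini -- hypothetical low-VRAM profile
[WindowsLowVRAM DeviceProfile]
DeviceType=Windows
BaseProfileName=Windows
; Pull the streaming pool back 256-512 MB from your project baseline
; (value below is illustrative) so the eviction path keeps headroom
; during DS-speed streaming bursts.
+CVars=r.Streaming.PoolSize=2560
```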
Shader compilation stalls masking DS benefits. Your cold-boot measurements will be dominated by PSO compilation if your PSO cache is not pre-warmed. Measure DS improvements against a warm PSO cache, or you will mis-attribute gains.
## What this changes for your project
Three practical consequences.
You can afford higher-quality assets. Before DS, texture budgets were implicitly capped by how fast you could stream them. With DS, you can ship BC7 textures at higher resolution and detail-heavy Nanite meshes more aggressively. The streaming system keeps up.
Your "minimum spec" can be more honest. The fallback paths work. You do not have to design your game around the worst-case HDD user, because that user barely exists in Windows PC gaming in 2026.
Loading screens are the new design smell. When your actual cold-boot time is 4 seconds on a mid-range machine, a 10-second loading screen with animated tips feels slow. Games shipping in the next 18 months will start dropping loading screens for initial scenes entirely. Plan for that.
## Workflow recommendations
For a new project starting in 2026:
- Use UE5.7 (or newer) from the start. Do not plan to migrate from 5.3 late in development.
- Enable `bUseZenStore`, `bUseIoStore`, and Oodle compression from day one. Changing these mid-project causes cook-time pain.
- Profile cold-boot and streaming hitches on at least three hardware tiers. Do not ship based only on your RTX 4090 dev machine.
- Budget for the pre-warming of PSO caches. DS will expose how much time you were really spending on shader compilation versus IO.
For a migration from an earlier UE version:
- Migrate to 5.7, confirm the base game works without DS.
- Enable DS via `s.DirectStorage.Enabled=True`.
- Measure cold-boot. It should improve immediately.
- Migrate to Oodle compression if you have not. Further ~15% load-time improvement.
- Enable GPU decompression via `s.DirectStorage.GPUDecompression=True`.
- Measure streaming hitches. They should drop 40–70%.
- Audit `r.Streaming.PoolSize` on low-VRAM configs.
If you want to automate the profiling portion of this — cold-boot measurement, hitch detection, diffing across builds — Unreal MCP Server can run build-and-measure cycles from a chat interface or a CI pipeline without you having to hand-script the Unreal Insights sessions. For teams that need a faster starting point on streaming-friendly content, our Blueprint Template Library includes WP-aware streaming volume patterns and level loading helpers that are already tuned for DS-era pipelines.
## Real-world case: a 30 GB open-world port
We did a DS integration on a client's open-world action project — 30 GB packaged, about 180k textures, heavy Nanite usage, Lumen GI. The project was on UE5.6 with the normal async loader and had legitimate complaints about hitching during fast traversal. Here is what the migration actually looked like.
Week 1: UE5.6 to 5.7 migration. Normal engine bump pain. One deprecated API in a custom material expression, one shader compile issue on D3D12 debug layer. Two days of cleanup. Base game runs at parity with 5.6.
Week 2: Oodle compression migration. Project had been on zlib. Cook time went from 41 minutes to 28 minutes (Oodle Kraken decodes faster and compresses smaller). Pak size dropped 14%. Zero gameplay changes required.
Week 3: DS enablement and profiling. Flipped the .ini settings, rebuilt, ran the cold-boot profiler on six hardware tiers. Five of them showed immediate improvement. The sixth — an older laptop with a SATA SSD and integrated Intel graphics — showed a small regression that turned out to be a known issue with DS on integrated GPUs. Added a detection path that disables GPU decompression on iGPUs.
Week 4: streaming hitch audit. Cell load hitches dropped from an average of 64 ms to 23 ms on mid-spec PC. The remaining 23 ms was almost entirely shader compilation on first use of streamed-in materials. This led to a second-order project — pre-warming the PSO cache — that was independently valuable.
Total engineering time: about 70 hours for one engineer. Cold-boot time went from 16 seconds average to 6 seconds average. Streaming hitches reduced by roughly 60%. Shipped three months later with DS on by default.
The point: this was not a massive integration project. It was a normal week of engine work with well-defined steps and measurable wins at each step. The technology has matured to the point where you can actually budget for it.
## Measuring DS correctly
A lot of early DS benchmarks were misleading because they measured cold-boot time with a warm OS file cache, or measured streaming hitches with an empty PSO cache, or measured on a single hardware config and extrapolated. If you want a number you can trust:
- Clear the OS file cache between runs. On Windows this means either rebooting or using RAMMap to drop the standby list. Otherwise your second run is reading from RAM, not storage, and DS makes no difference.
- Run with a warm PSO cache. PSO compilation stalls will dominate DS gains on cold runs. Either pre-compile shaders explicitly or run the benchmark twice and report the second number.
- Measure on the minimum spec you intend to ship. DS gains vary by hardware more than any other recent streaming improvement. A 4090 number is useless for predicting 1660 behavior.
- Separate cold-boot from in-game streaming. They are different loads with different bottlenecks. A 20% cold-boot improvement and a 60% streaming-hitch improvement are not the same story, even if the headline numbers look similar.
- Check for memory pressure. On 8 GB cards, DS can produce eviction-rate issues that do not surface in the first 5 minutes of gameplay but show up in a 30-minute session. Long sessions are part of the test matrix.
A reasonable benchmark protocol we use on client projects: three hardware tiers (high/mid/low), three scenarios (cold boot, in-game streaming, long-session stability), three runs each, dropped cache between runs. Nine data points per hardware tier. It is slow but the numbers mean something.
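The reduction step of that protocol is worth pinning down. A minimal sketch with illustrative names: take the median of the three runs per scenario, so a single outlier run (background indexer, thermal throttle) does not skew the report, then compute the improvement ratio used in the tables above.

```cpp
#include <algorithm>
#include <array>

// Median of three runs of one scenario (cold-boot seconds, hitch ms, etc.).
// Median-of-three resists a single outlier run better than the mean does.
double MedianOfThree(std::array<double, 3> runs) {
    std::sort(runs.begin(), runs.end());
    return runs[1];
}

// Relative improvement of "DS on" over "DS off" for the same scenario;
// a result of 2.3 means 2.3x faster, matching the tables' "DS improvement".
double Improvement(double baseline, double withDs) {
    return baseline / withDs;
}
```

Three tiers times three scenarios gives nine medians per build; diffing those nine numbers across builds is what a CI job should actually track, not raw per-run timings.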
## What comes next
The DS 2.0 + UE5.7 combination is the first time this technology has been genuinely ready. Here is what we expect in the next 12-18 months:
- 5.7.2 will fix WP package loading. Based on the public roadmap, this is the last major missing piece.
- More middleware will become DS-aware. FMOD, Wwise, and Scaleform are all reportedly working on DS paths for their streaming operations. Some of this will be transparent to users; some will require middleware version bumps.
- Console DS analogues will continue diverging. PS5's IO system is its own thing; Xbox's Velocity Architecture shares a lineage with DS but is not identical. Expect UE's console paths to continue improving independently of the PC DS work.
- Consumer hardware baseline will shift. By 2027, DS-capable GPUs will be a meaningful majority of the Steam survey. At that point the fallback paths matter less and you can design more aggressively for DS as a baseline.
What this means practically: if you are starting a project in 2026, assume DS will be a baseline assumption by ship date. Build for it. If you are maintaining an existing project on UE5.7, the migration is worth doing — the ROI is high and the risk is low.
## The honest summary
DirectStorage 2.0 in UE5.7 is the first version of this technology that is worth the integration effort for most PC games. The CPU fallback does not hurt you. The GPU decompression is a meaningful win for anyone with modern hardware. The pipeline changes are minimal if you start correctly. The measurable results range from "nice" on low-end hardware to "dramatic" on high-end.
The remaining weak spot is World Partition package loading, and that is scheduled to be fixed. If you are shipping a PC title in the next 18 months, you should turn DS on, measure, and likely ship with it.
The era of 30-second cold boots on PC is over. Games that still have them are going to look like they are from a different decade. That is a good thing, and DirectStorage is finally the path that gets us there reliably.