Reading the MFT Directly: How DiskSleuth Scans a Drive in Seconds

1 May 2026 3 min read

The standard way to build a disk space analyser is to walk the directory tree — recurse through subdirectories, accumulate sizes, done. It works. It’s also slow, memory-hungry on large volumes, and forces the OS to construct the path hierarchy on every call.

DiskSleuth doesn’t do that. When elevated, it reads the NTFS Master File Table directly and reconstructs the directory tree in memory from raw MFT records. A 500 GB volume with a million files scans in a few seconds, not minutes.

What the MFT is

NTFS stores every file and directory as a record in the Master File Table. Each record contains the file’s name, size, timestamps, parent directory reference, and data run locations. The directory tree in Explorer isn’t separately stored — it’s reconstructed from the parent references whenever you navigate somewhere.

That means to get the size of every file on a volume, you don’t need to walk the tree at all. Enumerate the MFT directly, read the size from each record, then reconstruct parent-child relationships from the parent FRN in each record. Sequential disk reads rather than pointer chasing.
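To make the reconstruction step concrete, here is a minimal sketch. It assumes each enumerated record has already been reduced to an FRN, a parent FRN, and a name, and that the root entry names itself as its own parent. `RawEntry`, `link_children`, and `full_path` are hypothetical names for illustration, not DiskSleuth's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of one enumerated record.
pub struct RawEntry {
    pub frn: u64,
    pub parent_frn: u64,
    pub name: String,
}

/// Groups entries by parent FRN in a single pass: parent FRN -> child FRNs.
pub fn link_children(entries: &[RawEntry]) -> HashMap<u64, Vec<u64>> {
    let mut children: HashMap<u64, Vec<u64>> = HashMap::new();
    for e in entries {
        // Assumption: the root refers to itself as parent; skip that self-edge.
        if e.frn != e.parent_frn {
            children.entry(e.parent_frn).or_default().push(e.frn);
        }
    }
    children
}

/// Rebuilds a path relative to the root by following parent references upward.
/// Returns None if a parent is missing or the chain never reaches the root.
pub fn full_path(by_frn: &HashMap<u64, &RawEntry>, frn: u64) -> Option<String> {
    let mut parts = Vec::new();
    let mut cur = frn;
    // Bound the walk so corrupt data with a parent cycle cannot loop forever.
    for _ in 0..=by_frn.len() {
        let e = by_frn.get(&cur)?;
        if e.parent_frn == cur {
            parts.reverse();
            return Some(parts.join("\\"));
        }
        parts.push(e.name.as_str());
        cur = e.parent_frn;
    }
    None
}
```

No directory is ever opened here: the hierarchy falls out of one pass over a flat list of records, which is what makes the sequential read pattern possible.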

FSCTL_ENUM_USN_DATA

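The API for bulk MFT enumeration belongs to the USN journal interface. FSCTL_ENUM_USN_DATA iterates MFT records in bulk and returns one USN_RECORD_V2 structure per file or directory. Each structure carries the file reference number (FRN), the parent FRN, the attributes, and the name. It has no size field, so the size has to come from the MFT record itself rather than from this structure.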

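// Start at the first MFT record and accept every USN value.
let input = MFT_ENUM_DATA_V0 {
    StartFileReferenceNumber: 0,
    LowUsn: 0,
    HighUsn: i64::MAX,
};

// One call fills `buffer` with as many records as fit.
DeviceIoControl(
    volume_handle,
    FSCTL_ENUM_USN_DATA,
    Some(&input as *const _ as *const _),
    size_of::<MFT_ENUM_DATA_V0>() as u32,
    Some(buffer.as_mut_ptr() as *mut _),
    buffer.len() as u32,
    // Recent versions of the windows crate take this out-parameter as an Option.
    Some(&mut bytes_returned),
    None,
)?;
// The first 8 bytes of the output are the FRN to resume from; the records follow.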

DiskSleuth uses a 256 KB buffer per call — standard examples use 64 KB, but larger buffers mean fewer round-trips to the kernel and the gain levels off around 256 KB.
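Each call fills the buffer with an 8-byte "resume from here" FRN followed by packed, variable-length records. The sketch below shows one way to decode that buffer. It is plain Rust with no Windows dependency, so it can be exercised against a synthetic buffer. The field offsets follow the documented USN_RECORD_V2 layout; `UsnEntry` and `parse_enum_buffer` are hypothetical names, not DiskSleuth's code.

```rust
/// The fields this sketch extracts from one USN_RECORD_V2.
#[derive(Debug, PartialEq)]
pub struct UsnEntry {
    pub frn: u64,
    pub parent_frn: u64,
    pub attributes: u32,
    pub name: String,
}

// Bounds-checked little-endian readers.
fn le_u16(b: &[u8], off: usize) -> Option<u16> {
    let s = b.get(off..off + 2)?;
    Some(u16::from_le_bytes([s[0], s[1]]))
}

fn le_u32(b: &[u8], off: usize) -> Option<u32> {
    let s = b.get(off..off + 4)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

fn le_u64(b: &[u8], off: usize) -> Option<u64> {
    let s = b.get(off..off + 8)?;
    let mut a = [0u8; 8];
    a.copy_from_slice(s);
    Some(u64::from_le_bytes(a))
}

/// Decodes the first `bytes_returned` bytes of an FSCTL_ENUM_USN_DATA output buffer.
/// Returns the FRN to use as the next StartFileReferenceNumber and the decoded records,
/// or None if the buffer is truncated or malformed.
pub fn parse_enum_buffer(buf: &[u8]) -> Option<(u64, Vec<UsnEntry>)> {
    // The buffer opens with the FRN at which the next call should resume.
    let next_frn = le_u64(buf, 0)?;
    let mut entries = Vec::new();
    let mut off = 8;
    while off < buf.len() {
        // RecordLength (offset 0) covers the header, the name, and alignment padding.
        let record_len = le_u32(buf, off)? as usize;
        if record_len < 60 {
            return None; // shorter than the fixed V2 header
        }
        let rec = buf.get(off..off + record_len)?;
        // MajorVersion is at offset 4; records of other versions are skipped.
        if le_u16(rec, 4)? == 2 {
            let name_len = le_u16(rec, 56)? as usize; // FileNameLength, in bytes
            let name_off = le_u16(rec, 58)? as usize; // FileNameOffset from record start
            let name_bytes = rec.get(name_off..name_off + name_len)?;
            let utf16: Vec<u16> = name_bytes
                .chunks_exact(2)
                .map(|c| u16::from_le_bytes([c[0], c[1]]))
                .collect();
            entries.push(UsnEntry {
                frn: le_u64(rec, 8)?,         // FileReferenceNumber
                parent_frn: le_u64(rec, 16)?, // ParentFileReferenceNumber
                attributes: le_u32(rec, 52)?, // FileAttributes
                name: String::from_utf16_lossy(&utf16),
            });
        }
        off += record_len;
    }
    Some((next_frn, entries))
}
```

A full scan would loop: call DeviceIoControl, parse the buffer, copy the returned FRN into StartFileReferenceNumber, and repeat until the call fails with ERROR_HANDLE_EOF.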

This requires elevation. Without it, DiskSleuth falls back to parallel directory traversal with jwalk and rayon, which takes 15–30 seconds versus 2–4 seconds for the MFT path. That is still faster than single-threaded walking, but nowhere near the MFT scan.

Arena-allocated tree

After reading the MFT records you need to reconstruct parent-child relationships. The naive approach — HashMap<FRN, DirNode> with boxed children — means heap allocations scattered across memory and pointer chasing on every traversal.

DiskSleuth uses an arena instead:

pub struct FileTree {
    nodes: Vec<FileNode>,
}

pub struct FileNode {
    pub name:    compact_str::CompactString,
    pub size:    u64,
    pub parent:  Option<NodeIndex>,
    pub children: Vec<NodeIndex>,
}

#[derive(Clone, Copy)]
pub struct NodeIndex(u32);

All nodes live in a single Vec. References are NodeIndex(u32) offsets: cache-friendly, no pointer chasing, no per-node allocator overhead. Bottom-up size aggregation is O(n) over the flat array: order the nodes leaves-first (a breadth-first pass from the root, read in reverse, does this in linear time), then accumulate sizes upward through the parent references. No recursion, no stack overflow on deep hierarchies.
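The aggregation pass can be sketched as follows. This is an illustration under assumptions rather than DiskSleuth's implementation: it uses String in place of CompactString to stay std-only, treats directories as starting with size 0, and the `add` and `aggregate_sizes` methods are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NodeIndex(pub u32);

pub struct FileNode {
    pub name: String, // stand-in for CompactString
    pub size: u64,
    pub parent: Option<NodeIndex>,
    pub children: Vec<NodeIndex>,
}

pub struct FileTree {
    pub nodes: Vec<FileNode>,
}

impl FileTree {
    pub fn new() -> Self {
        FileTree { nodes: Vec::new() }
    }

    /// Appends a node to the arena and links it into its parent's child list.
    pub fn add(&mut self, name: &str, size: u64, parent: Option<NodeIndex>) -> NodeIndex {
        let idx = NodeIndex(self.nodes.len() as u32);
        self.nodes.push(FileNode {
            name: name.to_string(),
            size,
            parent,
            children: Vec::new(),
        });
        if let Some(p) = parent {
            self.nodes[p.0 as usize].children.push(idx);
        }
        idx
    }

    /// Rolls every node's size up into its ancestors, iteratively.
    pub fn aggregate_sizes(&mut self, root: NodeIndex) {
        // Breadth-first order lists each parent before any of its children...
        let mut order = vec![root];
        let mut i = 0;
        while i < order.len() {
            let cur = order[i];
            order.extend_from_slice(&self.nodes[cur.0 as usize].children);
            i += 1;
        }
        // ...so the reverse order visits every child before its parent.
        for &idx in order.iter().rev() {
            let (parent, size) = {
                let n = &self.nodes[idx.0 as usize];
                (n.parent, n.size)
            };
            if let Some(p) = parent {
                self.nodes[p.0 as usize].size += size;
            }
        }
    }
}
```

Because the traversal order lives in a heap-allocated Vec rather than on the call stack, a directory chain tens of thousands of levels deep costs nothing extra.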

Keeping the UI responsive

The scan runs in a background thread. Shared state is Arc<RwLock<FileTree>>. The naive version takes a write lock per node insertion — on a million nodes, that’s a million lock acquisitions, and the UI thread is competing for read access the whole time.

DiskSleuth batches writes: accumulate 2,000 nodes, take one write lock, insert them all. Write-lock acquisitions drop by a factor of 2,000, from a million to 500 on a million-node scan. The UI shows a progress counter (a single read) rather than trying to render the tree mid-scan.
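A minimal sketch of the batching pattern, using only the standard library. `SharedTree` is a hypothetical stand-in for the real FileTree (it stores only sizes), and `scan_batched` and `progress` are illustrative names. The function returns the number of write-lock acquisitions so the saving is visible.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

const BATCH_SIZE: usize = 2_000;

/// Hypothetical stand-in for the shared tree: just a list of node sizes.
#[derive(Default)]
pub struct SharedTree {
    pub sizes: Vec<u64>,
}

/// Produces `total` placeholder nodes on a background thread, taking the
/// write lock once per batch. The thread returns how many times it locked.
pub fn scan_batched(tree: Arc<RwLock<SharedTree>>, total: usize) -> thread::JoinHandle<usize> {
    thread::spawn(move || {
        let mut batch = Vec::with_capacity(BATCH_SIZE);
        let mut lock_count = 0;
        for i in 0..total {
            batch.push(i as u64); // placeholder for a node decoded from the MFT
            if batch.len() == BATCH_SIZE {
                // One write lock moves the whole batch in and leaves `batch` empty.
                tree.write().unwrap().sizes.append(&mut batch);
                lock_count += 1;
            }
        }
        // Flush whatever is left over after the last full batch.
        if !batch.is_empty() {
            tree.write().unwrap().sizes.append(&mut batch);
            lock_count += 1;
        }
        lock_count
    })
}

/// What a UI frame would do during the scan: one short read lock for the counter.
pub fn progress(tree: &RwLock<SharedTree>) -> usize {
    tree.read().unwrap().sizes.len()
}
```

The batch size is a trade-off: larger batches mean fewer lock acquisitions but a coarser progress counter, since nodes only become visible to readers when a batch is flushed.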

The finished view is a squarified treemap with click-to-drill-down navigation. It is fast enough for egui’s immediate-mode render loop because, with the arena layout, listing a directory’s children is just reading a contiguous slice of indices.

The tool is on GitHub, MIT licensed.
