FSAnalyser

Windows File Server Analysis & Inventory Tool
Prepared by Steve Watson
Read-Only • Non-Destructive • Single Application • Runs Locally on Each Server

What Is This?

FSAnalyser is a lightweight analysis tool that scans Windows file servers and produces a detailed report showing what data exists, how it is organised, who has access, and where optimisation opportunities exist.

It functions as a comprehensive inventory and health check. The output provides the evidence needed to plan migrations, security audits, storage optimisation, and compliance reviews with complete visibility into file server contents and permissions.

🔒 Safety Guarantee — Strictly Read-Only FSAnalyser operates in strict read-only mode. It does not modify, move, delete, or lock any files or folders. It does not write anything to the drives being scanned — all output (database, CSV exports, and report) is written to a separate output folder that you choose. The scanned data volumes are never altered in any way. It is a single application with a graphical interface — there is nothing to install and nothing to uninstall afterwards. In metadata-only mode the impact is typically low; optional content hashing is more I/O intensive and is best scheduled off-peak. It reads the same information that Windows Explorer reads when you browse a folder — just faster and more systematically.

What Does It Actually Do?

FSAnalyser is delivered as a single application with a point-and-click interface that runs directly on each file server. There is no remote scanning, no network overhead for file enumeration, and no agent or software to install. The system automatically detects local drives, the domain controller, and DFS namespaces — pre-filling the form for immediate execution. The scan runs in up to eleven automated stages:

🖧
1. Discover Shares

Finds every SMB share on the local server, including hidden shares (e.g. Finance$), and records who has share-level permissions.

🔗
2. Map DFS Namespaces

Discovers all DFS links and where they point, identifying data that may be accessible via multiple paths.

📂
3. Inventory Every File

Walks the entire folder tree recording file name, size, type, dates, and NTFS attributes (including EFS encryption and reparse point flags) for every item.
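As an illustration of what stage 3 collects, a minimal metadata-only walk can be sketched in Python. This is not FSAnalyser's actual implementation; `os.scandir` is used because it exposes cached stat data (including Windows attribute bits where available) without an extra system call per file:

```python
import os
from dataclasses import dataclass

@dataclass
class FileRecord:
    path: str
    size: int
    modified: float   # POSIX timestamp of last modification
    attributes: int   # raw Windows attribute bits; 0 on other platforms

def inventory(root: str) -> list[FileRecord]:
    """Walk the folder tree iteratively, recording metadata only."""
    records, stack = [], [root]
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)   # descend, but never follow reparse points
                elif entry.is_file(follow_symlinks=False):
                    st = entry.stat(follow_symlinks=False)
                    attrs = getattr(st, "st_file_attributes", 0)  # Windows-only field
                    records.append(FileRecord(entry.path, st.st_size, st.st_mtime, attrs))
    return records
```

An iterative stack rather than recursion avoids depth limits on very deep folder trees.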

🔐
4. Read Security Permissions

Reads the NTFS access control lists (ACLs) on every folder to understand exactly who can access what.

👥
5. Resolve Principals

Resolves every user and group SID found in permissions to readable identities and can expand nested group membership for access analysis.

💚
6. AD Enrichment & Group Health

Enriches resolved identities from Active Directory (for example display name, department, manager) and flags groups with no enabled members.

🔓
7. Detect Open Files

Uses the Windows Server API to identify files currently open by users. Helps plan migration cutover — open files may fail to copy. Requires running elevated (Administrator).

#️⃣
8. Content Hashing (optional)

Reads file contents and computes SHA-256 hashes for exact duplicate detection. Only runs when the "Hash file contents" option is ticked. This is the only stage that reads file content.
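The chunked-read pattern behind this stage is standard; a sketch (illustrative, not the tool's internal code) of streaming SHA-256 plus exact-duplicate grouping:

```python
import hashlib
from collections import defaultdict

CHUNK = 1 << 20  # 1 MiB reads keep memory flat even on multi-GB files

def sha256_of_file(path: str) -> str:
    """Stream a file through SHA-256 without loading it into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            digest.update(chunk)
    return digest.hexdigest()

def exact_duplicates(paths):
    """Group files by content hash; any group larger than one is a true duplicate set."""
    groups = defaultdict(list)
    for p in paths:
        groups[sha256_of_file(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]
```

In a real scan, only files that already share a size would typically be hashed, which limits the read volume considerably.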

📎
9. Alternate Data Streams (optional)

Enumerates hidden NTFS alternate data streams attached to files. These are invisible in Explorer but consume space and will be lost during migration. Only runs when the "Scan Alternate Data Streams" option is ticked.

📊
10. Export Raw Data

Produces up to 21 CSV files covering every dimension of the analysis so the data can be loaded into Excel or Power BI for further review.

📋
11. Generate Report

Produces a self-contained HTML report with colour-coded charts, tables, and clear status indicators — no software needed to view it.

💡 Network Traffic Because FSAnalyser runs locally on the file server, all file and folder enumeration is local disk I/O — there is no SMB network traffic. The only network traffic is LDAP queries to Active Directory during identity enrichment (stage 6), which is lightweight (a few MB at most).

Scan Control — Stop, Pause & Resume

FSAnalyser provides full control over long-running scans, making it practical for very large file servers where a complete scan may take minutes or hours:

⏹️
Immediate Stop

The Stop button cancels the scan within seconds. All data collected so far is safely committed to the database — nothing is lost. Workers across all parallel threads respond cooperatively to the cancel signal.

⏸️
Pause & Resume

The Pause button temporarily suspends all scan activity. Worker threads block safely until Resume is pressed. Useful for reducing disk I/O during busy periods, then continuing without restarting.

🔄
Resume on Restart

If the scan is stopped, the application is closed, or the server is rebooted, simply run the tool again pointing to the same output folder. Completed stages are automatically skipped, and partially-completed stages (such as content hashing) resume from where they left off — files already processed are not re-scanned.

Fresh Scan Option

Tick the “Fresh scan” checkbox to clear all previous state and start over from scratch. This removes the stored progress and database, ensuring a completely clean run.

🛡️ Safe by Design All database writes use transactions with idempotent inserts. Stopping or pausing at any point — even mid-write — cannot corrupt the database. The scan can always be safely resumed from its last checkpoint.
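The document doesn't name the database engine, but the pattern it describes (one transaction per batch, inserts that are no-ops on replay) can be sketched with SQLite as an assumed stand-in:

```python
import sqlite3

def open_scan_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # A PRIMARY KEY on the file path makes re-inserting an already-recorded
    # row a no-op, so a resumed scan can safely replay its last batch.
    db.execute("""CREATE TABLE IF NOT EXISTS files (
        path TEXT PRIMARY KEY, size INTEGER, mtime REAL)""")
    return db

def commit_batch(db: sqlite3.Connection, rows) -> None:
    """Write one batch atomically; a crash mid-batch rolls back cleanly."""
    with db:  # the connection context manager wraps one transaction
        db.executemany(
            "INSERT OR IGNORE INTO files (path, size, mtime) VALUES (?, ?, ?)",
            rows)
```

Because replayed rows are ignored rather than duplicated, stop, crash, and resume all converge on the same database contents.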

CSV Exports — Raw Data for Excel & Power BI

In addition to the HTML report, FSAnalyser produces up to 21 individual CSV files containing the full raw data behind every report section; the most commonly produced files are listed below. These are designed for import into Excel, Power BI, or any other analytics tool. Only CSVs with data are produced — empty exports are automatically skipped.

CSV File | Contents
cold_data_by_folder.csv | Top-level folder breakdown showing file count and size in each age band (current, 1–2 yr, 2–5 yr, 5–7 yr, 7+ yr)
stale_files.csv | Individual files not modified in over 2 years — path, size, and last modified date
unpublished.csv | Top-level folders that are not accessible through any SMB share or DFS link
acl_complexity.csv | Folders ranked by permission complexity — unique ACE count, direct user grants, and inheritance status
principals.csv | Every domain user and group found in NTFS permissions, with AD display name and type
group_health.csv | AD group health analysis — total members, enabled members, and whether the group is effectively dead
duplicates.csv | Duplicate file clusters (same hash or same size+type), showing every file path and wasted bytes
compressed_files.csv | Every NTFS-compressed file — path, compressed size, and original size
file_types.csv | Complete file extension breakdown — extension, file count, and total size per type
folder_breakdown.csv | Storage consumed by each top-level folder — file count, folder count, and total size
sp_compatibility.csv | Every file that would fail a SharePoint upload — path, issue type (long path, illegal characters, blocked extension, oversized)
efs_files.csv | Every EFS-encrypted file — full path and size
reparse_points.csv | All symbolic links and junction points — path, type, and target
churn_by_month.csv | Monthly file modification counts and volume for trend analysis
permission_mapping.csv | Per-folder permission mapping suggestions for SharePoint migration (simple / moderate / complex)
open_files.csv | Files that were open/locked by users at scan time — path, username, and lock type
alternate_data_streams.csv | Hidden NTFS alternate data streams — host file path, stream name, and size
💡 Tip CSV files use UTF-8 encoding with a BOM header for reliable opening in Excel. Each file has column headers in the first row. Files are only created when data exists — you won't see empty CSV files cluttering the output folder.
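In Python terms (an illustrative sketch, not the tool's code), the behaviour described in the tip corresponds to writing with the `utf-8-sig` encoding, which emits the BOM Excel uses to detect UTF-8, and skipping empty exports:

```python
import csv

def export_csv(path, header, rows):
    """Write a CSV as described: UTF-8 with BOM so Excel auto-detects the
    encoding, header row first, and no file at all when there is no data."""
    if not rows:
        return False  # empty exports are not created
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return True
```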

What Will the Report Show?

The report is designed to answer the key questions that come up in every SharePoint migration planning exercise:

πŸ“ How Much Data Do We Have?

Total file count, folder count, and storage consumed — broken down by top-level folder, file type, and file size range. You'll see at a glance which departments own the most data and what types of files dominate (documents, images, executables, etc.).

Total Items: 2,045,491
Total Size: 357.9 GB
Top file types: .docx, .xlsx, .pdf, .pptx, .jpg, .png

❄️ How Much of It Is Cold (Unused)?

Files are grouped by how recently they were last modified: current (under 1 year), ageing (1–2 years), stale (2–5 years), and archive (5+ years). A colour-coded stacked bar shows the split instantly. Cold data is the single biggest opportunity — data that nobody has touched in years does not need to be migrated to premium SharePoint storage.

📋 Are There Duplicates?

The report identifies clusters of files that are likely duplicates (same file type and exact same file size appearing 3 or more times). It calculates the total wasted storage and highlights the worst offenders. When the "Hash file contents" option is ticked, the tool also performs exact duplicate detection by computing SHA-256 hashes, confirming which files are truly identical down to the byte.
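The likely-duplicate heuristic described above can be sketched as grouping on (extension, exact size) and keeping clusters of three or more, with wasted bytes counted as every copy beyond the first:

```python
import os
from collections import defaultdict

def likely_duplicate_clusters(files, min_count=3):
    """files: iterable of (path, size). Returns (clusters, wasted_bytes).

    Clusters of min_count or more same-type, same-size files are flagged
    as likely duplicates; hashing (when enabled) confirms them exactly."""
    groups = defaultdict(list)
    for path, size in files:
        ext = os.path.splitext(path)[1].lower()
        groups[(ext, size)].append(path)
    clusters = {k: v for k, v in groups.items() if len(v) >= min_count}
    # wasted bytes = every copy beyond the first in each cluster
    wasted = sum(size * (len(paths) - 1) for (_, size), paths in clusters.items())
    return clusters, wasted
```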

🔒 Is There Unpublished Data?

Not all data on a file server is necessarily shared with users. The report compares the folder structure against SMB shares and DFS links, highlighting any top-level folders that are not accessible through any share or DFS path. This could be orphaned data, backup copies, or folders that were removed from shares but never cleaned up.

πŸ” How Complex Are the Permissions?

SharePoint handles permissions very differently from NTFS. The report shows how many distinct access control lists exist across folders, flags folders with unusually high complexity (many unique permission entries), and highlights where individual users have been given direct permissions instead of going through groups — a pattern that is hard to migrate cleanly.

💚 Are AD Groups Healthy?

The report checks every security group referenced in the file server permissions and verifies it against Active Directory. Groups with no enabled members ("dead groups") are flagged — these are groups that appear in permissions but effectively grant access to nobody. Cleaning these up before migration simplifies the permission model significantly.

🖧 Share and DFS Overview

A complete inventory of every SMB share (including hidden admin shares) and every DFS link, with their paths and permissions, so you have a full picture of how users currently access the data.

🎯 SharePoint Compatibility Check

The report automatically flags files that would fail to upload to SharePoint: paths exceeding 400 characters, illegal characters in names (# % * : < > ? |), blocked file extensions, and files exceeding 250 GB. Each issue is categorised by severity so remediation can be prioritised.
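Using the limits stated above, the per-file check reduces to a few rules. The blocked-extension set below is a hypothetical placeholder, since the actual list isn't given here:

```python
import os

ILLEGAL = set("#%*:<>?|")       # characters the report flags in file names
MAX_PATH_CHARS = 400            # path length limit used by the report
MAX_SIZE = 250 * 1024**3        # 250 GB upload ceiling
BLOCKED_EXTENSIONS = {".pst"}   # hypothetical example; the real list may differ

def compatibility_issues(path: str, size: int) -> list[str]:
    """Return the issue categories a file would be flagged with."""
    issues = []
    if len(path) > MAX_PATH_CHARS:
        issues.append("long path")
    name = os.path.basename(path)  # check the name only, not the drive letter
    if any(c in ILLEGAL for c in name):
        issues.append("illegal characters")
    if os.path.splitext(name)[1].lower() in BLOCKED_EXTENSIONS:
        issues.append("blocked extension")
    if size > MAX_SIZE:
        issues.append("oversized")
    return issues
```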

🔓 EFS Encrypted Files & Reparse Points

Files encrypted with NTFS EFS and items with reparse points (symbolic links, junction points) are automatically detected from file attributes. EFS files cannot be migrated without decryption; junctions and symlinks may cause data to be counted twice or create circular references.

📡 Migration Bandwidth Estimation

Based on the total data volume, the report estimates migration time at three speeds: LAN (1 Gbps), WAN (100 Mbps), and SharePoint-throttled (~200 Mbps effective). This helps schedule realistic cutover windows.
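The underlying arithmetic is bytes over usable bandwidth. A sketch, with an assumed 80% protocol-efficiency factor (the tool's actual correction, if any, isn't stated):

```python
def migration_hours(total_bytes: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate transfer time: bytes to bits, divided by the usable link rate.

    `efficiency` is an assumed protocol-overhead factor, not a figure
    from the tool itself."""
    usable_bps = link_mbps * 1_000_000 * efficiency
    return (total_bytes * 8) / usable_bps / 3600

# The report's three planning speeds, in Mbps
SPEEDS = {"LAN (1 Gbps)": 1000, "WAN (100 Mbps)": 100, "SharePoint-throttled": 200}
```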

📈 Data Churn & User Activity

The report analyses how recently files were modified (last 30 days, 90 days, 1 year) and provides monthly modification trends. Highly active folders may need a delta-sync migration approach. Where access timestamps are available, the report also shows user activity distribution.

πŸ—ΊοΈ Permission Mapping Suggestions

For each top-level folder, the report analyses ACL complexity and suggests how it might map to SharePoint permission levels — simple (single group), moderate (2–3 levels), or complex (requiring manual flattening). This gives a starting point for permission planning.

🔓 Open Files at Scan Time

When run elevated, the report shows which files were locked/open by users at scan time, along with the username and access type. These files need special handling during migration cutover.

Overall Health Score

The report header includes an overall health badge — Healthy, Needs Attention, or Critical — based on the percentage of cold data, duplicate waste, SharePoint compatibility issues, and dead groups found.
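The exact weighting is internal to the tool; one plausible shape, with purely illustrative thresholds, is scoring each signal against warning and critical levels and letting the worst one set the badge:

```python
def health_badge(cold_pct, duplicate_pct, sp_issue_pct, dead_group_pct):
    """Illustrative only: thresholds are hypothetical, not the tool's."""
    def level(value, warn, crit):
        return 2 if value >= crit else 1 if value >= warn else 0
    worst = max(
        level(cold_pct, 30, 60),        # % of storage that is cold
        level(duplicate_pct, 5, 15),    # % of storage wasted on duplicates
        level(sp_issue_pct, 1, 5),      # % of files with SharePoint issues
        level(dead_group_pct, 10, 25),  # % of referenced groups that are dead
    )
    return ["Healthy", "Needs Attention", "Critical"][worst]
```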

📊 Example Report See a complete example of the HTML report produced by FSAnalyser: View Example Report →

Key Savings Opportunities

Based on typical file server assessments, the report will help quantify savings in these areas:

Opportunity | Typical Savings | How We Identify It
Cold data exclusion | 30–60% of total storage | Files not modified in 2+ years — archive or delete instead of migrating to SharePoint
Duplicate elimination | 5–15% of total storage | Same-size, same-type file clusters appearing across multiple folders
Orphaned/unpublished data | Varies | Folders on disk that no user can reach via any share — often forgotten data
Permission simplification | Reduced migration effort | Dead groups and direct user ACEs that can be cleaned up before migration
Empty files & tiny files | Clutter reduction | Zero-byte files and files under 1 KB that may be temporary or broken

Safety & Impact on Production

✅ What the tool DOES:
  • Reads file and folder metadata (name, size, dates, attributes) from local disk
  • Reads NTFS security descriptors (permissions)
  • Reads SMB share configurations and DFS namespace structures
  • Queries Active Directory for user and group details (read-only lookups — the only network traffic)
  • Detects currently open/locked files via Windows Server APIs (requires running as Administrator)
  • Optionally reads file contents to compute hashes for exact duplicate detection
  • Optionally enumerates NTFS alternate data streams
  • Stores all findings in a local database and CSV exports on the same server
🚫 What the tool DOES NOT do:
  • Does not modify, move, rename, or delete any file or folder — 100% read-only
  • Does not write anything to the drives being scanned — all output goes to a separate folder
  • Does not change any permissions, shares, or DFS configuration
  • Does not require installation — it is a single portable application
  • Does not lock files or prevent user access
  • Does not modify Active Directory in any way — only read-only LDAP lookups
  • Does not send data anywhere — all output is local files on the server

Performance impact: Because the application runs locally, all file enumeration is direct disk I/O with no network overhead. In a benchmark dataset of 2 million files (358 GB), the inventory stage completed in under 35 seconds on local NVMe. Real production times vary by file count, storage speed, ACL complexity, and AD responsiveness. The optional content hashing feature does read file contents, which will generate sustained disk read activity — this is best scheduled outside business hours on busy servers. All other stages have minimal I/O impact and are safe to run at any time.

Estimated Runtime by Data Volume (TB)

The key point is that runtime is driven by file count and disk throughput more than TB alone. Many small files take longer than fewer large files, even at the same total TB.

Planning assumptions The ranges below are practical planning bands for production servers. They include metadata, permissions, and reporting overhead. Optional content hashing adds substantial extra read time.
Data Volume | Metadata-Only Scan | With Focused Hashing | With Full Hashing
1 TB | 15–60 minutes | 30 minutes – 3 hours | 2–8 hours
5 TB | 1–5 hours | 2–12 hours | 8–40 hours
10 TB | 2–10 hours | 4–24 hours | 16–80 hours

Hashing formula (best-effort): hashing time in hours is approximately total bytes to hash divided by effective read throughput in bytes per second, then divided by 3600. Example: 1 TB at 150 MB/s is about 1.9 hours for content reads alone, before database and report overhead.
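As code, the same best-effort formula (reproducing the 1 TB at 150 MB/s example):

```python
def hashing_hours(bytes_to_hash: float, read_mb_per_s: float) -> float:
    """Hours of pure content reads: bytes / throughput / 3600.

    Excludes database and report overhead, as the formula above notes."""
    return bytes_to_hash / (read_mb_per_s * 1_000_000) / 3600
```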

Recommended execution model Run metadata-only scans first (fast, low risk), then run focused/full hashing as a second pass during an overnight window if exact duplicate evidence is required.

Advanced Capabilities

In addition to the core inventory, the following advanced analysis features are included and appear automatically in the report:

Capability | What It Does | Status
SharePoint compatibility | Flags files with paths over 400 characters, illegal characters, blocked extensions, and files over 250 GB — everything that would fail a SharePoint upload. | Included
Content-based deduplication | SHA-256 hashing of file contents for exact duplicate detection. Confirms which files are truly identical, not just same-size. Enabled by ticking a checkbox in the application. | Included (opt-in)
Open/locked file detection | Detects files currently open by users at scan time, showing username and lock type. Helps plan migration cutover for actively-used files. Runs automatically when elevated. | Included
Permission mapping suggestions | Analyses ACL complexity per folder and suggests how NTFS permissions could map to SharePoint permission levels (simple / moderate / complex). | Included
Cross-server duplicate detection | Compares two completed scan databases and identifies duplicates across servers by matching content hashes. Available via the CLI compare-db command, which generates cross_server_duplicates.csv and cross_server_duplicates.html. | Included (CLI)
EFS encrypted files | Automatically flags files with the NTFS encryption attribute set. These cannot be migrated without decryption by the key holder. | Included
Symbolic links & junctions | Detects NTFS reparse points (junctions and symlinks) that could cause data to be counted twice or create circular references. SharePoint does not support these. | Included
Alternate Data Streams | Enumerates hidden NTFS data streams (e.g. Zone.Identifier, macOS resource forks). These consume space but will be lost during migration. Enabled by ticking a checkbox in the application. | Included (opt-in)
Bandwidth estimation | Estimates migration time at three speeds: LAN (1 Gbps), WAN (100 Mbps), and SharePoint-throttled (~200 Mbps). Helps schedule realistic cutover windows. | Included
Data churn rate | Analyses modification patterns (30-day, 90-day, yearly) with monthly trends. Highly active folders may need delta-sync migration rather than bulk copy. | Included
User activity analysis | Shows access-time distribution when timestamps are available, identifying files that may be candidates for archiving rather than migration. | Included
Scan control (stop/pause/resume) | Stop and Pause buttons provide immediate control over the scan. If interrupted, the scan resumes from where it left off on the next run — completed stages are skipped and partially-processed data is not repeated. Safe for very large datasets that may require multiple sessions to complete. | Included
✅ All previously identified limitations have been addressed. Every item from the original roadmap is now implemented and appears in the HTML report. Optional features (content hashing, alternate data stream scanning) are enabled by ticking a checkbox in the application, giving full control over scan duration and disk read activity.

Typical Runtime Workflow

  1. Copy the application onto the file server. It is a single file — no installer, no dependencies, no .NET or Java runtime required.
  2. Right-click and choose "Run as Administrator" to ensure full access to NTFS security descriptors and the open-files API. The application opens and automatically detects the local environment: all fixed drives (C:\, D:\, etc.), the domain controller, and DFS namespace paths are pre-filled based on system configuration. No manual data entry is needed in most cases.
  3. Select any optional features (such as content hashing or alternate data stream scanning) and press Start. The scan runs automatically through all stages with a live progress display. Stop and Pause buttons are always available to immediately halt or temporarily suspend the scan at any point — the tool responds within seconds even during heavy parallel processing. If the scan is interrupted (stopped, paused, or the application is closed), it can be resumed from where it left off simply by running it again with the same output folder — completed stages are automatically skipped and partially-completed work is not repeated. A benchmark run across 2 million files completed inventory in under 35 seconds on local NVMe. Production elapsed time varies by file count, storage performance, and selected options. All I/O is local disk — no network traffic for file enumeration.
  4. All data stays on the server. The scan results (database, CSV exports, and HTML report) are written to an output folder on the server for later collection.
  5. Open the HTML report directly from the application, or in any web browser β€” no special software needed. The report button is right there on screen once the scan completes.
  6. Repeat for each file server, then optionally run compare-db in the CLI to find duplicates spanning environments. This generates a dedicated cross-server CSV and HTML report.
  7. Use the findings to decide what to migrate, what to archive, and what to clean up β€” before incurring cloud storage costs.
🎯 Zero-Configuration Startup The application automatically detects your local drives, domain controller, and DFS namespaces on startup, pre-filling the form with sensible defaults. In most cases, you can simply press Start without entering any information manually. Advanced options remain available for custom scenarios.
🔑 Administrator Access The application should be run as Administrator (right-click → "Run as administrator") on each file server. This ensures it can read all NTFS security descriptors regardless of file permissions, and detect which files are currently open by users. Without elevation, the application still works but may miss some permission data and will skip open-file detection.

Next Steps

  1. Identify the file servers to scan and the output folder location for each run.
  2. Schedule a maintenance window or agree to run during business hours (the core scan has minimal impact; optional content hashing is more I/O-intensive and can be scheduled separately).
  3. Log in to each file server with a domain admin or local admin account, copy the application across, and run it as Administrator. No service account, installation, or special configuration is needed — it is a single file with a graphical interface.
  4. Copy the output (database, CSV exports, and HTML report) off the server for review.
  5. Review the reports together and agree on the migration scope, archiving strategy, and cleanup plan.
Questions? This document is intended as a high-level overview. A full technical specification covering the scan stages, database schema, and output formats is available on request.