FSAnalyser is a lightweight analysis tool that scans Windows file servers and produces a detailed report showing what data exists, how it is organised, who has access, and where optimisation opportunities exist.
It functions as a comprehensive inventory and health check. The output provides the evidence needed to plan migrations, security audits, storage optimisation, and compliance reviews with complete visibility into file server contents and permissions.
FSAnalyser is delivered as a single application with a point-and-click interface that runs directly on each file server. There is no remote scanning, no network overhead for file enumeration, and no agent or software to install. The system automatically detects local drives, domain controllers, and DFS namespaces, pre-filling the form for immediate execution. The scan runs in up to eleven automated stages:
1. Finds every SMB share on the local server, including hidden shares (e.g. Finance$), and records who has share-level permissions.
2. Discovers all DFS links and where they point, identifying data that may be accessible via multiple paths.
3. Walks the entire folder tree, recording file name, size, type, dates, and NTFS attributes (including EFS encryption and reparse point flags) for every item.
4. Reads the NTFS access control lists (ACLs) on every folder to understand exactly who can access what.
5. Resolves every user and group SID found in permissions to readable identities and can expand nested group membership for access analysis (stages 4 and 5 are sketched in code after this list).
6. Enriches resolved identities from Active Directory (for example display name, department, manager) and flags groups with no enabled members.
7. Uses the Windows Server API to identify files currently open by users, which helps plan migration cutover, since open files may fail to copy. Requires running elevated (Administrator).
8. Reads file contents and computes SHA-256 hashes for exact duplicate detection. Only runs when the "Hash file contents" option is ticked; this is the only stage that reads file content.
9. Enumerates hidden NTFS alternate data streams attached to files. These are invisible in Explorer but consume space and will be lost during migration. Only runs when the "Scan Alternate Data Streams" option is ticked.
10. Produces up to 21 CSV files covering every dimension of the analysis, so the data can be loaded into Excel or Power BI for further review.
11. Produces a self-contained HTML report with colour-coded charts, tables, and clear status indicators; no extra software is needed to view it.
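To make stages 4 and 5 concrete, here is a minimal Python sketch using the pywin32 bindings (an assumption; FSAnalyser's own implementation is not shown here). It reads one folder's DACL and resolves each trustee SID to a readable name:

```python
import pywintypes
import win32security  # pywin32: pip install pywin32

def folder_aces(path: str):
    """Yield (trustee, access_mask, ace_type) for each entry in a folder's DACL."""
    sd = win32security.GetFileSecurity(
        path, win32security.DACL_SECURITY_INFORMATION
    )
    dacl = sd.GetSecurityDescriptorDacl()
    if dacl is None:
        return  # no DACL at all means unrestricted access
    for i in range(dacl.GetAceCount()):
        (ace_type, ace_flags), mask, sid = dacl.GetAce(i)
        try:
            # Stage 5: resolve the SID to DOMAIN\name
            name, domain, _ = win32security.LookupAccountSid(None, sid)
            trustee = f"{domain}\\{name}"
        except pywintypes.error:
            trustee = win32security.ConvertSidToStringSid(sid)  # orphaned SID
        yield trustee, mask, ace_type

# Hypothetical folder path, shown for illustration only.
for trustee, mask, ace_type in folder_aces(r"D:\Shares\Finance"):
    print(f"{trustee}: mask={mask:#010x} type={ace_type}")
```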
FSAnalyser provides full control over long-running scans, making it practical for very large file servers where a complete scan may take minutes or hours:
- The Stop button cancels the scan within seconds. All data collected so far is safely committed to the database; nothing is lost. Workers across all parallel threads respond cooperatively to the cancel signal.
- The Pause button temporarily suspends all scan activity. Worker threads block safely until Resume is pressed. Useful for reducing disk I/O during busy periods, then continuing without restarting.
- If the scan is stopped, the application is closed, or the server is rebooted, simply run the tool again pointing to the same output folder. Completed stages are automatically skipped, and partially-completed stages (such as content hashing) resume from where they left off; files already processed are not re-scanned (a conceptual sketch follows this list).
- Tick the “Fresh scan” checkbox to clear all previous state and start over from scratch. This removes the stored progress and database, ensuring a completely clean run.
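Conceptually, the resume logic behaves like a per-stage checkpoint. The sketch below is illustrative only: the JSON state file, stage names, and run_stage helper are assumptions, since FSAnalyser persists progress in its own database.

```python
import json
from pathlib import Path

STAGES = ["shares", "dfs", "inventory", "acls", "sids", "ad",
          "open_files", "hashing", "ads", "csv", "html"]   # illustrative names
STATE = Path("output/scan_state.json")                      # hypothetical location

def run_stage(name: str) -> None:
    print(f"running stage: {name}")  # stand-in for the real worker

def run_scan(fresh: bool = False) -> None:
    STATE.parent.mkdir(parents=True, exist_ok=True)
    if fresh:
        STATE.unlink(missing_ok=True)   # "Fresh scan": discard stored progress
    done = set(json.loads(STATE.read_text())) if STATE.exists() else set()
    for stage in STAGES:
        if stage in done:
            continue                    # completed stages are skipped on re-run
        run_stage(stage)
        done.add(stage)
        STATE.write_text(json.dumps(sorted(done)))  # checkpoint after every stage
```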
In addition to the HTML report, FSAnalyser produces up to 21 individual CSV files containing the full raw data behind every report section. These are designed for import into Excel, Power BI, or any other analytics tool. Only CSVs with data are produced; empty exports are automatically skipped.
| CSV File | Contents |
|---|---|
| cold_data_by_folder.csv | Top-level folder breakdown showing file count and size in each age band (current, 1–2 yr, 2–5 yr, 5–7 yr, 7+ yr) |
| stale_files.csv | Individual files not modified in over 2 years: path, size, and last modified date |
| unpublished.csv | Top-level folders that are not accessible through any SMB share or DFS link |
| acl_complexity.csv | Folders ranked by permission complexity: unique ACE count, direct user grants, and inheritance status |
| principals.csv | Every domain user and group found in NTFS permissions, with AD display name and type |
| group_health.csv | AD group health analysis: total members, enabled members, and whether the group is effectively dead |
| duplicates.csv | Duplicate file clusters (same hash or same size+type), showing every file path and wasted bytes |
| compressed_files.csv | Every NTFS-compressed file: path, compressed size, and original size |
| file_types.csv | Complete file extension breakdown: extension, file count, and total size per type |
| folder_breakdown.csv | Storage consumed by each top-level folder: file count, folder count, and total size |
| sp_compatibility.csv | Every file that would fail a SharePoint upload: path and issue type (long path, illegal characters, blocked extension, oversized) |
| efs_files.csv | Every EFS-encrypted file: full path and size |
| reparse_points.csv | All symbolic links and junction points: path, type, and target |
| churn_by_month.csv | Monthly file modification counts and volume for trend analysis |
| permission_mapping.csv | Per-folder permission mapping suggestions for SharePoint migration (simple / moderate / complex) |
| open_files.csv | Files that were open/locked by users at scan time: path, username, and lock type |
| alternate_data_streams.csv | Hidden NTFS alternate data streams: host file path, stream name, and size |
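Because the exports are plain CSV, they load directly into any analytics stack. As a quick illustration with pandas (the column names 'wasted_bytes', 'path', and 'size_bytes' are assumptions; check the headers in your actual export):

```python
import pandas as pd

dups = pd.read_csv("duplicates.csv")
stale = pd.read_csv("stale_files.csv")

# Total reclaimable duplicate storage (assumes a 'wasted_bytes' column).
print(f"Duplicate waste: {dups['wasted_bytes'].sum() / 2**30:.1f} GiB")

# Ten largest stale files (assumes 'path' and 'size_bytes' columns).
print(stale.nlargest(10, "size_bytes")[["path", "size_bytes"]])
```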
The report is designed to answer the key questions that come up in every SharePoint migration planning exercise:
Total file count, folder count, and storage consumed, broken down by top-level folder, file type, and file size range. You'll see at a glance which departments own the most data and what types of files dominate (documents, images, executables, etc.).
Files are grouped by how recently they were last modified: current (under 1 year), ageing (1–2 years), stale (2–5 years), and archive (5+ years). A colour-coded stacked bar shows the split instantly. Cold data is the single biggest opportunity: data that nobody has touched in years does not need to be migrated to premium SharePoint storage.
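The banding itself is a simple bucket test on the last-modified timestamp. A minimal sketch, using the band boundaries above (the function is illustrative, not the tool's code):

```python
from datetime import datetime, timezone
from pathlib import Path

def age_band(path: Path) -> str:
    """Classify a file into the report's age bands by years since last modification."""
    mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    years = (datetime.now(timezone.utc) - mtime).days / 365.25
    if years < 1:
        return "current"
    if years < 2:
        return "ageing"
    if years < 5:
        return "stale"
    return "archive"
```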
The report identifies clusters of files that are likely duplicates (same file type and exact same file size appearing 3 or more times). It calculates the total wasted storage and highlights the worst offenders. When the "Hash file contents" option is ticked, the tool also performs exact duplicate detection by computing SHA-256 hashes, confirming which files are truly identical down to the byte.
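That two-step approach can be sketched as: group by extension and exact size first, then hash only the candidate clusters, so most files are never read. An illustrative Python version (not the tool's implementation):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def confirmed_duplicates(root: Path, min_cluster: int = 3):
    """Group by (extension, size), then SHA-256 only the likely clusters."""
    candidates = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            candidates[(path.suffix.lower(), path.stat().st_size)].append(path)
    for (ext, size), paths in candidates.items():
        if len(paths) < min_cluster:
            continue                      # not a likely cluster; skip reading content
        by_hash = defaultdict(list)
        for path in paths:
            digest = hashlib.sha256()
            with path.open("rb") as fh:   # hash in chunks to bound memory use
                while chunk := fh.read(1 << 20):
                    digest.update(chunk)
            by_hash[digest.hexdigest()].append(path)
        for content_hash, dupes in by_hash.items():
            if len(dupes) > 1:
                wasted = (len(dupes) - 1) * size  # keep one copy, the rest is waste
                yield content_hash, dupes, wasted
```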
Not all data on a file server is necessarily shared with users. The report compares the folder structure against SMB shares and DFS links, highlighting any top-level folders that are not accessible through any share or DFS path. This could be orphaned data, backup copies, or folders that were removed from shares but never cleaned up.
SharePoint handles permissions very differently from NTFS. The report shows how many distinct access control lists exist across folders, flags folders with unusually high complexity (many unique permission entries), and highlights where individual users have been given direct permissions instead of going through groups, a pattern that is hard to migrate cleanly.
The report checks every security group referenced in the file server permissions and verifies it against Active Directory. Groups with no enabled members ("dead groups") are flagged: these are groups that appear in permissions but effectively grant access to nobody. Cleaning these up before migration simplifies the permission model significantly.
A complete inventory of every SMB share (including hidden admin shares) and every DFS link, with their paths and permissions, so you have a full picture of how users currently access the data.
The report automatically flags files that would fail to upload to SharePoint: paths exceeding 400 characters, illegal characters in names (# % * : < > ? |), blocked file extensions, and files exceeding 250 GB. Each issue is categorised by severity so remediation can be prioritised.
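A minimal sketch of these checks, mirroring the criteria above (the blocked-extension set shown is a placeholder; the tool maintains its own list):

```python
ILLEGAL_CHARS = set('#%*:<>?|')     # characters the report flags in file names
MAX_PATH = 400                       # path-length limit used by the report
MAX_SIZE = 250 * 1024**3             # 250 GB upload ceiling
BLOCKED_EXT = {".pst", ".tmp"}       # placeholder set; the tool has its own list

def compatibility_issues(path: str, size_bytes: int) -> list[str]:
    """Return the issue categories the report would raise for one file."""
    name = path.rsplit("\\", 1)[-1]
    issues = []
    if len(path) > MAX_PATH:
        issues.append("long path")
    if ILLEGAL_CHARS & set(name):
        issues.append("illegal characters")
    if any(name.lower().endswith(ext) for ext in BLOCKED_EXT):
        issues.append("blocked extension")
    if size_bytes > MAX_SIZE:
        issues.append("oversized")
    return issues
```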
Files encrypted with NTFS EFS and items with reparse points (symbolic links, junction points) are automatically detected from file attributes. EFS files cannot be migrated without decryption; junctions and symlinks may cause data to be counted twice or create circular references.
Based on the total data volume, the report estimates migration time at three speeds: LAN (1 Gbps), WAN (100 Mbps), and SharePoint-throttled (~200 Mbps effective). This helps schedule realistic cutover windows.
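The underlying arithmetic is simple: hours = bytes × 8 ÷ link speed in bits per second ÷ 3600, assuming ideal sustained throughput with no protocol overhead. A small sketch:

```python
SPEEDS_MBPS = {
    "LAN (1 Gbps)": 1000,
    "WAN (100 Mbps)": 100,
    "SharePoint-throttled (~200 Mbps)": 200,
}

def migration_hours(total_bytes: int) -> dict[str, float]:
    """Transfer time at each planning speed, assuming ideal sustained throughput."""
    return {
        label: total_bytes * 8 / (mbps * 1_000_000) / 3600
        for label, mbps in SPEEDS_MBPS.items()
    }

# Example: 5 TB of data to migrate.
for label, hours in migration_hours(5 * 10**12).items():
    print(f"{label}: {hours:.1f} hours")  # LAN ~11 h, WAN ~111 h, throttled ~56 h
```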
The report analyses how recently files were modified (last 30 days, 90 days, 1 year) and provides monthly modification trends. Highly active folders may need a delta-sync migration approach. Where access timestamps are available, the report also shows user activity distribution.
For each top-level folder, the report analyses ACL complexity and suggests how it might map to SharePoint permission levels: simple (single group), moderate (2–3 levels), or complex (requiring manual flattening). This gives a starting point for permission planning.
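A sketch of how such a tiering could be computed from the ACL statistics (the thresholds here are illustrative assumptions, not the report's exact scoring):

```python
def permission_tier(group_aces: int, direct_user_aces: int) -> str:
    """Suggest a SharePoint mapping tier from a folder's ACL statistics."""
    if direct_user_aces > 0:
        return "complex"    # direct user grants are hard to migrate cleanly
    if group_aces <= 1:
        return "simple"     # a single group maps straight onto a permission level
    if group_aces <= 3:
        return "moderate"   # 2-3 levels, mappable with some restructuring
    return "complex"        # too many distinct entries; needs manual flattening
```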
When run elevated, the report shows which files were locked/open by users at scan time, along with the username and access type. These files need special handling during migration cutover.
The report header includes an overall health badge (Healthy, Needs Attention, or Critical) based on the percentage of cold data, duplicate waste, SharePoint compatibility issues, and dead groups found.
Based on typical file server assessments, the report will help quantify savings in these areas:
| Opportunity | Typical Savings | How We Identify It |
|---|---|---|
| Cold data exclusion | 30–60% of total storage | Files not modified in 2+ years: archive or delete instead of migrating to SharePoint |
| Duplicate elimination | 5–15% of total storage | Same-size, same-type file clusters appearing across multiple folders |
| Orphaned/unpublished data | Varies | Folders on disk that no user can reach via any share, often forgotten data |
| Permission simplification | Reduced migration effort | Dead groups and direct user ACEs that can be cleaned up before migration |
| Empty files & tiny files | Clutter reduction | Zero-byte files and files under 1 KB that may be temporary or broken |
Performance impact: Because the application runs locally, all file enumeration is direct disk I/O with no network overhead. In a benchmark dataset of 2 million files (358 GB), the inventory stage completed in under 35 seconds on local NVMe. Real production times vary by file count, storage speed, ACL complexity, and AD responsiveness. The optional content hashing feature does read file contents, which generates sustained disk read activity, so it is best scheduled outside business hours on busy servers. All other stages have minimal I/O impact and are safe to run at any time.
The key point is that runtime is driven by file count and disk throughput more than by raw data volume: many small files take longer to scan than fewer large files, even at the same total size.
| Data Volume | Metadata-Only Scan | With Focused Hashing | With Full Hashing |
|---|---|---|---|
| 1 TB | 15–60 minutes | 30 minutes to 3 hours | 2–8 hours |
| 5 TB | 1–5 hours | 2–12 hours | 8–40 hours |
| 10 TB | 2–10 hours | 4–24 hours | 16–80 hours |
Hashing formula (best-effort): hashing hours ≈ total bytes to hash ÷ effective read throughput (bytes per second) ÷ 3600. Example: 1 TB at 150 MB/s is about 1.9 hours for content reads alone, before database and report overhead.
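The same estimate expressed as a hypothetical helper:

```python
def hashing_hours(total_bytes: float, read_mb_per_s: float) -> float:
    """Best-effort content-read time for the hashing stage, in hours."""
    return total_bytes / (read_mb_per_s * 1_000_000) / 3600

print(f"{hashing_hours(1e12, 150):.2f} hours")  # 1 TB at 150 MB/s -> ~1.85 hours
```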
In addition to the core inventory, the following advanced analysis features are included and appear automatically in the report:
| Capability | What It Does | Status |
|---|---|---|
| SharePoint compatibility | Flags files with paths over 400 characters, illegal characters, blocked extensions, and files over 250 GB: everything that would fail a SharePoint upload. | Included |
| Content-based deduplication | SHA-256 hashing of file contents for exact duplicate detection. Confirms which files are truly identical, not just same-size. Enabled by ticking a checkbox in the application. | Included (opt-in) |
| Open/locked file detection | Detects files currently open by users at scan time, showing username and lock type. Helps plan migration cutover for actively-used files. Runs automatically when elevated. | Included |
| Permission mapping suggestions | Analyses ACL complexity per folder and suggests how NTFS permissions could map to SharePoint permission levels (simple / moderate / complex). | Included |
| Cross-server duplicate detection | Compares two completed scan databases and identifies duplicates across servers by matching content hashes. Available via the CLI compare-db command, which generates cross_server_duplicates.csv and cross_server_duplicates.html. | Included (CLI) |
| EFS encrypted files | Automatically flags files with the NTFS encryption attribute set. These cannot be migrated without decryption by the key holder. | Included |
| Symbolic links & junctions | Detects NTFS reparse points (junctions and symlinks) that could cause data to be counted twice or create circular references. SharePoint does not support these. | Included |
| Alternate Data Streams | Enumerates hidden NTFS data streams (e.g. Zone.Identifier, macOS resource forks). These consume space but will be lost during migration. Enabled by ticking a checkbox in the application. | Included (opt-in) |
| Bandwidth estimation | Estimates migration time at three speeds: LAN (1 Gbps), WAN (100 Mbps), and SharePoint-throttled (~200 Mbps). Helps schedule realistic cutover windows. | Included |
| Data churn rate | Analyses modification patterns (30-day, 90-day, yearly) with monthly trends. Highly active folders may need delta-sync migration rather than bulk copy. | Included |
| User activity analysis | Shows access-time distribution when timestamps are available, identifying files that may be candidates for archiving rather than migration. | Included |
| Scan control (stop/pause/resume) | Stop and Pause buttons provide immediate control over the scan. If interrupted, the scan resumes from where it left off on the next run: completed stages are skipped and partially-processed data is not repeated. Safe for very large datasets that may require multiple sessions to complete. | Included |