
HDFS Capacity Planner

Plan HDFS storage capacity with replication, erasure coding, and growth forecasting


How HDFS Capacity Planning Works

HDFS capacity planning starts with raw storage: the total number of DataNodes multiplied by disks per node and disk size. From this raw capacity, the planner subtracts OS and filesystem overhead, plus any space reserved per disk via dfs.datanode.du.reserved.
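The raw-to-net calculation described above can be sketched as follows. This is an illustrative sketch, not the planner's actual code; the cluster figures (10 nodes, 12 disks of 3 TB, 5% overhead, 0.1 TB reserved per disk) are assumptions chosen for the example.

```python
# Assumed example cluster — all figures are illustrative, not defaults.
datanodes = 10
disks_per_node = 12
disk_size_tb = 3.0

# Raw capacity: nodes x disks per node x disk size.
raw_tb = datanodes * disks_per_node * disk_size_tb  # 360.0 TB raw

os_overhead_pct = 0.05         # assumed OS + filesystem overhead fraction
du_reserved_tb_per_disk = 0.1  # dfs.datanode.du.reserved, expressed here in TB per disk

# Net capacity: subtract overhead, then per-disk reserved space.
net_tb = raw_tb * (1 - os_overhead_pct) - datanodes * disks_per_node * du_reserved_tb_per_disk
print(f"net capacity: {net_tb:.1f} TB")  # ≈ 330 TB for these inputs
```

Note that dfs.datanode.du.reserved is configured in bytes in hdfs-site.xml; it is converted to TB here only to keep the arithmetic in one unit.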

The remaining net capacity is divided by the replication factor (typically 3) to arrive at usable capacity — the actual amount of unique data you can store. Enabling erasure coding (EC) changes this calculation: policies like RS-6-3 store 6 data blocks and 3 parity blocks per stripe, yielding 66.7% storage efficiency (6 data blocks out of 9 total) compared to 33.3% with triple replication (1 logical copy out of 3 physical).
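The efficiency difference between replication and erasure coding works out as below; a minimal sketch, assuming a 330 TB net capacity carried over from the previous example.

```python
net_tb = 330.0  # assumed net capacity from the earlier example

# Triple replication: each logical byte consumes 3 physical bytes.
replication_factor = 3
usable_replicated_tb = net_tb / replication_factor  # 110.0 TB usable (33.3% efficiency)

# Erasure coding RS-6-3: each stripe holds 6 data blocks and 3 parity blocks.
data_blocks, parity_blocks = 6, 3
ec_efficiency = data_blocks / (data_blocks + parity_blocks)  # 6/9 ≈ 0.667
usable_ec_tb = net_tb * ec_efficiency  # ≈ 220 TB usable

print(f"replicated: {usable_replicated_tb:.1f} TB, RS-6-3: {usable_ec_tb:.1f} TB")
```

For the same hardware, RS-6-3 roughly doubles usable capacity while still tolerating the loss of any 3 blocks per stripe, at the cost of reconstruction overhead on reads after failures.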

Growth forecasting projects when your cluster will hit 70%, 80%, 90%, and 100% utilization based on current usage and monthly data growth. The sanity checks validate that your DataNode count supports the chosen replication factor, that overhead percentages are in a safe range, and that erasure coding policies have sufficient fault domain diversity.
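The forecasting step reduces to simple linear projection. This sketch assumes a linear monthly growth rate and illustrative figures (330 TB net, 120 TB used, 8 TB/month); the planner's own model may differ.

```python
import math

def months_to_threshold(net_tb: float, used_tb: float,
                        monthly_growth_tb: float, threshold_pct: int) -> int:
    """Whole months until usage crosses threshold_pct% of net capacity (0 if already there)."""
    target_tb = net_tb * threshold_pct / 100
    if used_tb >= target_tb:
        return 0
    return math.ceil((target_tb - used_tb) / monthly_growth_tb)

# Assumed example: 330 TB net, 120 TB currently used, growing 8 TB/month.
forecast = {pct: months_to_threshold(330.0, 120.0, 8.0, pct)
            for pct in (70, 80, 90, 100)}
print(forecast)  # {70: 14, 80: 18, 90: 23, 100: 27}
```

A sanity check in the same spirit: with replication factor 3, the cluster needs at least 3 DataNodes to place all replicas on distinct nodes, and an RS-6-3 policy needs at least 9 fault domains (nodes or racks, depending on placement policy) to keep every block of a stripe separated.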