Storage That Survives: NVMe/SSD/HDD Tiers & RAID
Why a database crawls on the wrong disk, and how a tiered NVMe/SSD/HDD layout plus the right RAID level (and the one thing RAID emphatically is not) fixes it.
Part 2 of a 9-part series: teaching CompTIA A+ (Core 1 / 220-1101 and Core 2 / 220-1102) through a real build (a private, local AI workstation/server for a small business).
The job: a database choking on the wrong disk
The stack came up clean. Docker showed every container healthy, the inference server answered its first probe, and the operator declared victory. Then someone ran a real workload (a few hundred documents indexed into the database) and the operation that should have finished in seconds crawled for several minutes.
The machine had a fast NVMe drive. The database was not on it.
During bring-up, the database container was configured with a default Docker named volume. Docker places named volumes under /var/lib/docker/volumes/, which lands on whatever disk holds the root filesystem (unless /var/lib/docker has its own mount). On this box, that was a spinning HDD used for the OS and general storage. That is perfectly fine for log files, but a disaster for a write-heavy relational database doing concurrent inserts and index updates. Under the same write pattern, random IOPS on a spinning disk are roughly three orders of magnitude lower than on a mid-range NVMe drive.
The fix was not buying faster hardware. Everything we needed was already in the machine; the data was simply on the wrong tier. That problem, and the discipline that prevents it, is what this article is about. By the end you'll understand drive types and interfaces well enough to spec a tiered layout from scratch, know your RAID levels cold, and have a migration runbook you can adapt to move any service to the right tier.
> 📘 Objectives covered (220-1101)
>
> Core 1 (220-1101)
> - 3.3: Given a scenario, install and configure storage devices: drive types (HDD, SSD, NVMe), form factors (M.2, 2.5-inch), storage interfaces (SATA, PCIe/NVMe), RAID concepts and hardware/software RAID, NAS/network storage.
> - 5.3: Given a scenario, troubleshoot hard drives and RAID arrays: failure symptoms (read/write errors, clicking, slow response), RAID degraded/failed states, NVMe thermal throttling.
>
> Concepts taught below: SSD vs. NVMe vs. HDD trade-offs, SATA and PCIe interfaces, RAID 0/1/5/10 levels and their failure behavior, hot vs. bulk tier placement, NAS/NFS backup targets, and migrating a running database to the correct drive.
Concepts: drives, interfaces, and tiers
Drive types and interfaces (1101 3.3)
Three things matter for storage: the physical medium (how bits are stored), the interface (how the drive talks to the motherboard), and the form factor (what slot or connector it uses).
HDD. Spinning magnetic platters with a read/write head that physically moves. Sequential throughput is decent (fine for streaming logs or bulk files), but random I/O is the weakness. Every random-access request requires a physical seek, which adds 5–15 ms of latency per operation. At that latency a single drive tops out at roughly 70–200 random operations per second. A database doing thousands of small random reads and writes per second hits that ceiling immediately.
SATA SSD. Flash, no moving parts. Random I/O jumps dramatically, to 80,000–100,000 IOPS, compared with at most a couple of hundred for an HDD. The SATA interface caps sequential throughput at roughly 600 MB/s (about 550 MB/s in practice). That is fast enough for general-purpose storage, but it leaves performance on the table for flash that could go faster.
NVMe. Same flash technology, different interface. NVMe is a protocol purpose-built for low-latency flash. NVMe drives connect directly over PCIe lanes and deliver 3,000–7,000 MB/s sequential and hundreds of thousands of IOPS. That's roughly 5–13× the sequential throughput of a SATA SSD.
The most common form factor for NVMe drives is M.2, a small card that plugs into a dedicated motherboard slot. M.2 is a form factor, not an interface. M.2 drives come in both SATA (slower) and NVMe (faster) versions, and the two can look nearly identical in the box.

Before you buy, check whether the M.2 slot on your motherboard is wired for SATA, NVMe, or both. An NVMe drive in a SATA-only slot won't be detected. An M.2 SATA drive runs at SATA speed even in a slot that supports NVMe.

A 2.5-inch SATA SSD (the same physical shape as a laptop HDD) is the option for systems without M.2 slots.
| Type | Interface | Random IOPS | Sequential | Use for |
|---|---|---|---|---|
| HDD | SATA | ~70–200 | ~150 MB/s | Bulk/archive; sequential workloads |
| SATA SSD | SATA | ~80–100K | ~550 MB/s | OS, general storage, mid-tier |
| NVMe SSD | PCIe (NVMe) | 200K–1M+ | 3,000–7,000 MB/s | Database, cache, model weights |
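The IOPS column follows from per-operation latency. With one request in flight, a drive can't complete more than 1 ÷ latency operations per second.

Below is a minimal sketch of that arithmetic, applied to a hypothetical burst of 60,000 small random writes. The burst size, the 10 ms HDD latency (the middle of the 5–15 ms range above), and the SSD/NVMe IOPS figures are all illustrative assumptions, not measurements from this build.

```python
# Back-of-envelope arithmetic: how long a burst of small random writes takes per tier.
# All IOPS figures are illustrative assumptions, not measurements from this build.


def iops_ceiling(latency_ms: float) -> float:
    """With one request in flight, a drive can't beat 1 / (per-operation latency)."""
    return 1000.0 / latency_ms


def seconds_for(ops: int, iops: float) -> float:
    """Time to complete `ops` operations at a sustained `iops` rate."""
    return ops / iops


TIER_IOPS = {
    "HDD": iops_ceiling(10.0),   # ~10 ms seek + rotation per random op -> 100 IOPS
    "SATA SSD": 90_000.0,        # assumed, inside the 80-100K range
    "NVMe SSD": 400_000.0,       # assumed, inside the 200K-1M+ range
}

if __name__ == "__main__":
    ops = 60_000  # hypothetical: small random writes from one indexing job
    for name, iops in TIER_IOPS.items():
        print(f"{name:<9} {iops:>9,.0f} IOPS -> {seconds_for(ops, iops):8.2f} s")
```

At 10 ms per operation, the HDD needs about ten minutes (600 s) for a burst that flash finishes in under a second. That is the same shape as the indexing job in the opening story.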
RAID: what it is and what it is not
RAID (Redundant Array of Independent Disks) combines multiple drives into one logical volume for redundancy (survive a drive failure) and/or performance (spread I/O across spindles). The four levels for the exam:
RAID 0: Striping, no redundancy. Data is split across two or more drives for full combined capacity and higher throughput. There is zero fault tolerance: one drive failure loses everything. Use it only for re-creatable data where speed outweighs survival.
RAID 1: Mirroring. Every write goes to both drives identically. If one drive fails, replace it, and the array rebuilds from the surviving copy. Cost: 50% of raw capacity. Reads can be somewhat faster because either drive can serve them; writes are no faster than a single drive. Best for boot/OS or small critical volumes.
RAID 5: Striping with distributed parity. Minimum three drives. Parity information is distributed across all drives so any single drive can be reconstructed. Usable capacity = N-1 drives. Good read performance; writes are slower (parity must be recalculated). The risk: rebuilding after a failure is I/O-intensive and takes hours on large drives. A second failure during rebuild means total loss: a meaningful real-world risk with large-capacity disks.
RAID 10 (1+0): Mirroring + striping. Four drives minimum. Drives are mirrored in pairs, and data is then striped across the pairs. The array survives one failure in each mirrored pair (but not the loss of both drives in the same pair). It delivers strong read and write performance with 50% usable capacity.
Quick reference:
- "Best performance, don't care about data": RAID 0
- "Most redundancy, small volume": RAID 1
- "Balance of capacity, redundancy, performance, 3+ drives": RAID 5
- "Best all-around for critical data with budget for drives": RAID 10
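The capacity rules above reduce to a few lines of arithmetic. The sketch below makes two simplifying assumptions:

- Every member is clipped to the smallest drive (typical for mirrored and parity arrays).
- Fault tolerance counts only the drive failures each level is guaranteed to survive.

raid_summary is a hypothetical helper, not part of any RAID tool.

```python
# Usable capacity and guaranteed fault tolerance for the four exam RAID levels.
# Simplifying assumption: every member is clipped to the smallest drive.


def raid_summary(level: str, drives_tb: list) -> tuple:
    """Return (usable TB, drive failures the array is guaranteed to survive)."""
    n = len(drives_tb)
    if n == 0:
        raise ValueError("need at least one drive")
    smallest = min(drives_tb)
    if level == "0":
        if n < 2:
            raise ValueError("RAID 0 needs 2+ drives")
        return n * smallest, 0              # striping only: no redundancy
    if level == "1":
        if n < 2:
            raise ValueError("RAID 1 needs 2+ drives")
        return smallest, n - 1              # every member holds a full copy
    if level == "5":
        if n < 3:
            raise ValueError("RAID 5 needs 3+ drives")
        return (n - 1) * smallest, 1        # one drive's worth of parity
    if level == "10":
        if n < 4 or n % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4+")
        # Guaranteed to survive 1; more only if failures land in different pairs.
        return (n // 2) * smallest, 1
    raise ValueError(f"unsupported level {level!r}")


if __name__ == "__main__":
    four = [4.0, 4.0, 4.0, 4.0]
    for lvl in ("0", "1", "5", "10"):
        usable, survives = raid_summary(lvl, four)
        print(f"RAID {lvl:>2}: {usable:4.1f} TB usable, survives {survives} failure(s)")
    print("RAID 5 on 2 + 2 + 4 TB:", raid_summary("5", [2.0, 2.0, 4.0])[0], "TB usable")
```

With four 4 TB drives this gives:

- RAID 0: 16 TB, no failures tolerated.
- RAID 1 (four-way mirror): 4 TB.
- RAID 5: 12 TB.
- RAID 10: 8 TB.

This is why RAID 5 is the answer when a question asks for maximum usable capacity with redundancy.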
RAID is not a backup. RAID protects against hardware failure (a drive dies). It does not protect against logical corruption, accidental deletion, or ransomware: if a bug writes bad data, RAID faithfully writes it to every member. Backup is a separate concern, covered in Part 8.
When an array degrades: detect and respond (1101 5.3)
A degraded array is not a failed array, but it's one drive failure away from one. Detecting the state and responding promptly is both exam material and real operational discipline.
Detect. Software RAID on Linux uses mdadm. The fastest check:
```bash
cat /proc/mdstat
# Healthy members show [UU]; a downed member shows [U_]
sudo mdadm --detail /dev/md0
# Reports "State : clean, degraded" and marks the bad member faulty/removed
```

For hardware RAID, check the controller's management utility or configured alerts. Don't wait for a full failure: smartctl -a /dev/sdX reads SMART data directly. Climbing reallocated-sector and pending-sector counts are the early warning:

- Reallocated sectors are bad blocks the drive has already swapped for spares.
- Pending sectors are unreadable blocks waiting to be remapped.

A steady rise in either count is a countdown.
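The /proc/mdstat check is easy to script for a cron job or monitoring hook. Below is a minimal sketch that assumes the usual mdstat layout: an "mdN : ..." header line followed by a line containing "[n/m] [UU]".

parse_mdstat and degraded_arrays are hypothetical helpers, and the embedded sample text is illustrative.

```python
import re
from pathlib import Path

ARRAY_RE = re.compile(r"^(md\w+)\s*:\s*(.+)$")            # "md0 : active raid1 sdb1[1] sda1[0]"
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")   # "[2/1] [U_]"

# Illustrative sample: a two-disk RAID 1 that has lost one member.
EXAMPLE_DEGRADED = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      976630464 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""


def parse_mdstat(text: str) -> dict:
    """Map each md array to its RAID level and whether any member is missing."""
    arrays = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None                   # a blank line ends an array's stanza
            continue
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            fields = m.group(2).split()
            level = next((f for f in fields if f.startswith("raid")), "unknown")
            arrays[current] = {"level": level, "degraded": False}
            continue
        s = STATUS_RE.search(line) if current else None
        if s:
            arrays[current].update(
                expected=int(s.group(1)),
                active=int(s.group(2)),
                members=s.group(3),
                degraded="_" in s.group(3),  # "_" marks a missing member
            )
    return arrays


def degraded_arrays(text: str) -> list:
    return sorted(name for name, info in parse_mdstat(text).items() if info["degraded"])


if __name__ == "__main__":
    mdstat = Path("/proc/mdstat")
    if mdstat.exists():
        text = mdstat.read_text()
    else:
        print("no /proc/mdstat here; parsing the built-in example instead")
        text = EXAMPLE_DEGRADED
    bad = degraded_arrays(text)
    print("DEGRADED: " + ", ".join(bad) if bad else "no degraded md arrays")
```

An array that is mid-rebuild still shows "_" until recovery completes, so it keeps reporting as degraded. That is the correct reading: redundancy isn't back until the rebuild finishes.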
What "degraded" means. The array is still serving data, but redundancy is spent: the next failure loses everything on RAID 1 or RAID 5. Treat a degraded array as urgent; "it still works, I'll deal with it later" is how you lose data.
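Respond and rebuild. Remove the failed member, swap in the new disk, then add it to the array. Here /dev/sdY is the failed member and /dev/sdX its replacement:

```bash
# Only needed if the failed member is still listed in the array
sudo mdadm --manage /dev/md0 --fail /dev/sdY --remove /dev/sdY
# After the physical swap, add the replacement disk
sudo mdadm --manage /dev/md0 --add /dev/sdX
watch cat /proc/mdstat   # rebuild progress, refreshes every 2 s
```

Rebuild is I/O-heavy and can take hours on large drives, which is exactly when a second marginal drive tends to fail. On RAID 5 that means total loss. This is the practical argument for RAID 10 on arrays where availability matters, and the reason tested backups matter more than the RAID level you chose.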
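"Hours" is easy to sanity-check. A rebuild has to rewrite at least one full member, so capacity divided by the sustained rebuild rate gives a floor on rebuild time.

Below is a sketch of that arithmetic. The 150 MB/s rate is an assumption: real rates depend on the drives, the load on the array, and, for Linux md, the dev.raid.speed_limit_min/max settings.

```python
def rebuild_hours(member_tb: float, rebuild_mb_per_s: float) -> float:
    """Floor on rebuild time: one full member must be rewritten at the sustained rate."""
    member_bytes = member_tb * 1e12          # drive vendors quote decimal terabytes
    bytes_per_s = rebuild_mb_per_s * 1e6
    return member_bytes / bytes_per_s / 3600.0


if __name__ == "__main__":
    rate = 150.0  # assumed sustained rebuild rate in MB/s
    for tb in (2, 8, 16):
        print(f"{tb:>2} TB member at {rate:.0f} MB/s: at least {rebuild_hours(tb, rate):.1f} h")
```

At 150 MB/s, an 8 TB member needs close to 15 hours before the array is redundant again, and that assumes no competing workload.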
Hot, bulk, and archive tiers
When you have NVMe, SATA SSD, and HDD available, the right model is tiers rather than "pick one best drive":
Hot tier (NVMe): Database WAL and active tables, vector search indexes, model weight files. These need fast random I/O and fast loads.

Keep headroom: a 95%-full NVMe drive suffers badly on random writes, because the controller runs short of free blocks for garbage collection and wear-leveling. Maintain a minimum free floor (20 GiB is a reasonable default) and alert before you breach it.
Bulk tier (SATA SSD or HDD): Sealed audit segments, backup staging, large sequential uploads. SATA speed is sufficient for sequential workloads. If this tier is on spinning disk, the workload must be sequential: random I/O on HDD is a cliff, not a slope.
Archive tier (NAS/network storage): Permanent backup destination. A network-attached storage device (NFS or SMB/CIFS share) adds physical separation from the primary server, which matters for any failure that takes the whole box down.

If the NAS supports it, WORM (Write Once, Read Many) retention adds another layer of protection. Written data becomes immutable for a retention period, so it cannot be deleted even by an administrator with full access.
NAS and NFS basics
A NAS device exports storage over the network. The most common protocol for Linux clients is NFS: the server exports a directory, and the client mounts it and works with the files normally. NFS is simpler than it sounds. A NAS exports /backups, the server mounts it at /mnt/nas-backups, and the backup job writes there exactly as if it were local.
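The exam failure mode: stale file handles. Open handles on the client can become stale ("Stale file handle", errno ESTALE) if any of the following happens:

- The NAS reboots.
- An export is re-created.
- Files are changed out from under the client.

Separately, while the export is unreachable, operations hang or time out. Either way, nothing works until the NAS is back and the mount is refreshed. Robust backup code checks that the NAS is actually mounted before writing and retries on transient errors.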
Hands-on walkthrough: tier the storage, migrate the database
Step 1: Inventory the drives and map the tiers
```bash
lsblk -o NAME,TYPE,SIZE,ROTA,TRAN,MOUNTPOINT
# ROTA=1 = spinning disk; ROTA=0 = flash (SSD or NVMe); TRAN shows sata or nvme
sudo nvme list
# Lists NVMe drives with model, serial, size, firmware
```

Once you know your drives, define the tier layout explicitly:
```yaml
# storage_tiers.yaml (essential structure)
tiers:
  nvme_hot:
    root: "/var/lib/ai-server/nvme"
    min_free_bytes: 21474836480   # 20 GiB alert threshold
  bulk_shared:
    root: "/var/lib/ai-server/bulk"
    min_free_bytes: 10737418240   # 10 GiB
  nas_archive:
    root: "/mnt/nas"
    min_free_bytes: 0             # operator policy
purposes:
  pg:           { tier: nvme_hot,    subpath: "pg" }
  vector:       { tier: nvme_hot,    subpath: "vector" }
  models:       { tier: nvme_hot,    subpath: "models" }
  audit_active: { tier: nvme_hot,    subpath: "audit/active" }
  audit_sealed: { tier: bulk_shared, subpath: "audit/sealed" }
  audit_worm:   { tier: nas_archive, subpath: "audit/worm" }
  backup:       { tier: nas_archive, subpath: "backup" }
  wal_ship:     { tier: nas_archive, subpath: "pg-wal" }
```

Pattern: hot tier for random I/O (database, vector indexes, model weights), bulk for sealed/cold data, NAS for backup and immutable audit copies.
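A service can resolve its directory from this layout and enforce the free-space floor before writing. Below is a minimal sketch that hard-codes the YAML above as Python dicts, so it needs no YAML parser. resolve and below_floor are hypothetical helper names.

One caveat: shutil.disk_usage reports on whatever filesystem holds the path. On an unmounted /mnt/nas it would silently measure the local disk instead.

```python
import shutil
from pathlib import Path
from typing import Optional

GIB = 1024 ** 3

# Hard-coded mirror of storage_tiers.yaml so the sketch needs no YAML library.
TIERS = {
    "nvme_hot":    {"root": "/var/lib/ai-server/nvme", "min_free_bytes": 20 * GIB},
    "bulk_shared": {"root": "/var/lib/ai-server/bulk", "min_free_bytes": 10 * GIB},
    "nas_archive": {"root": "/mnt/nas",                "min_free_bytes": 0},
}
PURPOSES = {
    "pg":           ("nvme_hot", "pg"),
    "vector":       ("nvme_hot", "vector"),
    "models":       ("nvme_hot", "models"),
    "audit_active": ("nvme_hot", "audit/active"),
    "audit_sealed": ("bulk_shared", "audit/sealed"),
    "audit_worm":   ("nas_archive", "audit/worm"),
    "backup":       ("nas_archive", "backup"),
    "wal_ship":     ("nas_archive", "pg-wal"),
}


def resolve(purpose: str) -> Path:
    """Turn a purpose name into the absolute directory it should live in."""
    tier, subpath = PURPOSES[purpose]
    return Path(TIERS[tier]["root"]) / subpath


def below_floor(tier: str, free_bytes: Optional[int] = None) -> bool:
    """True when the tier's free space is under its min_free_bytes floor."""
    if free_bytes is None:
        free_bytes = shutil.disk_usage(TIERS[tier]["root"]).free
    return free_bytes < TIERS[tier]["min_free_bytes"]


if __name__ == "__main__":
    for name, cfg in TIERS.items():
        if not Path(cfg["root"]).exists():
            print(f"{name}: {cfg['root']} does not exist")
            continue
        print(f"{name}: {'BELOW FLOOR' if below_floor(name) else 'ok'}")
```

Keeping the purpose-to-path mapping in one place means a service never falls back to a default location, which is exactly how the database ended up on the HDD.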
Step 2: Mount the NFS share
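```bash
sudo apt-get install -y nfs-common
showmount -e 192.168.1.50        # verify the NAS is exporting the share
sudo mkdir -p /mnt/nas
sudo mount -t nfs 192.168.1.50:/exports/backups /mnt/nas
# To persist across reboots, add this line to /etc/fstab:
# 192.168.1.50:/exports/backups  /mnt/nas  nfs  defaults,_netdev,soft,timeo=30  0 0
```

_netdev tells the OS to wait for the network before mounting; without it, a boot-time NFS failure can hang the boot sequence.

soft,timeo=30 keeps NFS operations from hanging indefinitely when the NAS is unreachable. timeo is in tenths of a second, so 30 means 3 seconds. With soft, the client returns an I/O error after its retries instead of retrying forever. The nfs(5) man page warns that soft timeouts can cause silent data corruption in some cases, which is one more reason the backup job should confirm its writes.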
Step 3: Migrate the database to the NVMe hot tier
Stop the stack, copy the data, redirect the bind mount, restart.
```bash
# Confirm where data lives and how much there is
docker volume inspect appserver-postgres-data --format '{{ .Mountpoint }}'
docker exec appserver-postgres du -sh /var/lib/postgresql/data

# Take a dump before touching anything (copy it off /tmp if /tmp is cleared on reboot)
docker exec appserver-postgres pg_dumpall -U appuser \
  | gzip > /tmp/pre-migration-backup.sql.gz

# Stop the stack
sudo systemctl stop ai-backend ai-frontend

# Copy to NVMe: rsync -aHAX preserves permissions, ownership, hardlinks, ACLs, and xattrs
OLD_PATH=$(docker volume inspect appserver-postgres-data --format '{{ .Mountpoint }}')
sudo mkdir -p /var/lib/ai-server/nvme/pg/data
sudo rsync -aHAX --info=progress2 "$OLD_PATH/" /var/lib/ai-server/nvme/pg/data/

# Remove the old volume so Docker accepts the new bind-mount definition.
# This deletes the old copy of the data: run it only after rsync finished cleanly.
# It fails with "volume is in use" while any container (even a stopped one) references it.
docker volume rm appserver-postgres-data

# Set the new data path and restart
echo 'POSTGRES_DATA_PATH=/var/lib/ai-server/nvme/pg/data' \
  | sudo tee -a /opt/ai-server/.env
sudo systemctl start ai-backend ai-frontend
```

Two points are worth noting:

- Take the dump before touching anything; even a non-destructive migration can go wrong mid-copy.
- The explicit docker volume rm is necessary. Docker silently ignores new driver options for an existing volume name, so without removing the old registration the bind-mount definition never takes effect. Because that command also deletes the old data, it belongs after the copy, never before.
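Before running the docker volume rm step, it's worth confirming the copy is complete. Below is a minimal sketch that compares file lists and sizes between the old volume path and the new NVMe directory. The script name verify_copy.py and its helpers are hypothetical. A second rsync pass with --dry-run --itemize-changes is an equally valid check.

```python
import os
import sys
from pathlib import Path


def tree_manifest(root: str) -> dict:
    """Map every regular file under root (as a relative path) to its size in bytes."""
    base = Path(root)
    manifest = {}
    for dirpath, _dirs, files in os.walk(base):
        for name in files:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue                      # compare real files only
            manifest[str(path.relative_to(base))] = path.stat().st_size
    return manifest


def diff_manifests(src: dict, dst: dict) -> list:
    """Files missing from dst or with a different size; an empty list means the copy looks complete."""
    problems = [f"missing: {p}" for p in src.keys() - dst.keys()]
    problems += [f"size differs: {p}" for p in src.keys() & dst.keys() if src[p] != dst[p]]
    return sorted(problems)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (as root so every file is readable): python3 verify_copy.py OLD_PATH NEW_PATH
    issues = diff_manifests(tree_manifest(sys.argv[1]), tree_manifest(sys.argv[2]))
    print("\n".join(issues) if issues else "copy looks complete")
    sys.exit(1 if issues else 0)
```

To run it:

```bash
sudo python3 verify_copy.py "$OLD_PATH" /var/lib/ai-server/nvme/pg/data
```

A non-zero exit means something is missing or truncated. Sizes match only because the database is stopped during the copy; comparing a live data directory this way would report spurious differences.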
Step 4: Configure the NFS backup target
The backup job must verify the NAS is mounted before writing, otherwise it writes to an empty local directory at the mount point path, reports success, and the backup never reaches the NAS:
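```python
import asyncio
import errno
from pathlib import Path


class NFSBackupTarget:
    def __init__(self, mount_point: str, backup_subdir: str = "ai-server-backups",
                 retry_count: int = 3, retry_delay: float = 1.0):
        if retry_count < 1:
            raise ValueError("retry_count must be at least 1")
        self.mount_point = Path(mount_point)
        self.backup_dir = self.mount_point / backup_subdir
        self.retry_count, self.retry_delay = retry_count, retry_delay

    def _is_mounted(self) -> bool:
        if not self.mount_point.exists():
            return False
        # Match the mount-point field exactly; a substring test would let
        # /mnt/nas match an unrelated mount such as /mnt/nas-old.
        for line in Path("/proc/mounts").read_text().splitlines():
            fields = line.split()
            if len(fields) >= 2 and fields[1] == str(self.mount_point):
                return True
        return False

    def _write_and_check(self, target: Path, data: bytes) -> None:
        self.backup_dir.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        # Cheap read-back: confirm the file that landed has the expected size.
        if target.stat().st_size != len(data):
            raise OSError(errno.EIO, f"size mismatch after writing {target}")

    async def write_backup(self, filename: str, data: bytes) -> None:
        target = self.backup_dir / filename
        for attempt in range(self.retry_count):
            if not self._is_mounted():
                raise RuntimeError(f"NAS not mounted at {self.mount_point}")
            try:
                # Blocking file I/O runs in a worker thread, off the event loop.
                await asyncio.to_thread(self._write_and_check, target, data)
                return
            except OSError as exc:
                last_try = attempt == self.retry_count - 1
                if exc.errno == errno.ESTALE and not last_try:
                    await asyncio.sleep(self.retry_delay * (attempt + 1))  # linear backoff
                    continue
                raise  # any other error, or a stale handle on the final attempt
```

The /proc/mounts check is the key. The mount-point directory exists and is readable even when nothing is mounted there, so a path existence check alone is insufficient. Two other details matter:

- Stale handles are detected by errno rather than by matching the error text.
- On the final attempt a stale handle is re-raised instead of swallowed, so a failed backup can never look like a success.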
Verification: confirm placement and performance
```bash
# Confirm Postgres is on NVMe
docker inspect appserver-postgres --format '{{json .Mounts}}'
# Look for "Type":"bind", "Source":"/var/lib/ai-server/nvme/pg/data"
findmnt -T /var/lib/ai-server/nvme
# Expected (device name and filesystem may differ): SOURCE=/dev/nvme0n1p1, FSTYPE=ext4
docker exec appserver-postgres pg_isready -U appuser -d appdb
# Expected: accepting connections

# Throughput spot-check (rough, single-stream sequential write): NVMe vs. bulk tier
sudo dd if=/dev/zero of=/var/lib/ai-server/nvme/testfile bs=1M count=1024 oflag=direct 2>&1 | grep -E "MB/s|GB/s"
sudo dd if=/dev/zero of=/var/lib/ai-server/bulk/testfile bs=1M count=1024 oflag=direct 2>&1 | grep -E "MB/s|GB/s"
sudo rm /var/lib/ai-server/nvme/testfile /var/lib/ai-server/bulk/testfile
# Against an HDD bulk tier, NVMe should be roughly 20-45x faster sequentially
# (about 5-13x against a SATA SSD); the gap on random I/O is larger still.
# If the numbers are close, confirm the drive really is NVMe (sudo nvme list)
# and check the negotiated PCIe link speed/width (LnkSta in sudo lspci -vv).

# Confirm NFS backup mount
grep nas /proc/mounts
echo "backup-test $(date)" | sudo tee /mnt/nas/ai-server-backups/mount-test.txt
```

> 🎯 What the exam asks
>
> - NVMe vs. SATA SSD is a near-certain question. The exam distinguishes them by interface (PCIe/NVMe vs. SATA) and speed. An M.2 slot can hold either; know that M.2 SATA ≈ 550 MB/s and M.2 NVMe = 3,000+ MB/s. "NVMe" is the storage protocol; the electrical interface is PCIe. A question describing "a drive in an M.2 slot using the PCIe interface" is describing an NVMe drive.
> - RAID level identification appears as "best for this scenario." Drill the quick reference above:
>   - 0 = fast/no redundancy
>   - 1 = mirror/50% capacity
>   - 5 = parity/N-1 capacity/three-drive minimum
>   - 10 = mirror+stripe/50% capacity/four-drive minimum
>
>   The exam asks which level provides the least fault tolerance (RAID 0) and which maximizes usable capacity across 4 drives while maintaining redundancy (RAID 5, with three drives' worth usable).
> - RAID failure symptoms (1101 5.3): "degraded" means one drive has failed but the array is still functioning on the remaining drives. This applies to RAID 1, 5, and 10, never RAID 0. A degraded RAID 5 has no redundancy left, so a second failure means total loss. A degraded RAID 10 is lost if the second failure hits the surviving partner of the broken mirror. "Drive is clicking" or "system reports drive failure" = replace immediately.
> - NVMe thermal throttling (1101 5.3): under sustained load with inadequate airflow, an NVMe drive drops to a lower performance state to protect itself. Symptoms: starts fast, degrades over minutes, recovers after idle. Fix: an M.2 heatsink or better airflow, not a drive replacement.
> - NAS/NFS in context: NFS = Linux/Unix native; SMB/CIFS = Windows native. "Shared storage accessible by multiple Linux machines over the network" = NAS + NFS.
> - RAID is not a backup: an explicit exam concept. A mirrored array where both drives are corrupted by a software bug gives you two perfect copies of corrupted data. The correct answer always involves a separate backup strategy.
Common pitfalls
- Treating RAID as backup. RAID protects against a drive dying, not data corruption or deletion. A RAID 1 array where a runaway migration corrupts the database gives you two perfect copies of corrupted data. Backups go to a separate physical location.
- Mismatched RAID member sizes. The array clips every member to the smallest drive's capacity. Consider three drives at 2 TB, 2 TB, and 4 TB in a RAID 5: all are treated as 2 TB (6 TB of the 8 TB raw), and one drive's worth goes to parity, leaving 4 TB usable. The extra 2 TB on the large drive is wasted. Match sizes when building an array; this is both a cost trap and an exam concept.
- NVMe thermal throttling mistaken for a failing drive. Performance starts fast, degrades over 5–15 minutes of heavy load, then recovers after idle. Check nvme smart-log /dev/nvme0: a rising temperature and a climbing thermal_mgmt_temp1_trans_count confirm throttling. Add an M.2 heatsink, and don't replace the drive until you've ruled this out.
- Filling the hot tier. NVMe write performance degrades sharply as the drive approaches full, because the controller runs out of empty blocks for garbage collection and wear-leveling. A nearly full NVMe can drop from 3,000 MB/s to 200 MB/s. The min_free_bytes threshold in the tier config alerts before the hot tier fills.
- Default Docker volumes landing on the root disk. Docker's default named volumes land wherever the root filesystem is, often a slow HDD or small SATA SSD. Any service with significant I/O needs an explicit bind mount to the correct tier. Don't trust defaults.
- NFS stale file handles failing silently. Without a mount check, the backup job writes to an empty local directory at the mount point path and reports success. Always verify the mount is active before writing (see _is_mounted() above), and read back to confirm the write landed.
- M.2 SATA mistaken for NVMe. Both use the M.2 form factor, and the drive and the slot must both support NVMe for PCIe speeds. nvme list shows NVMe drives; lsblk shows SATA SSDs as sdX block devices. Two symptoms to tell apart:
  - An NVMe drive won't work at all in a SATA-only M.2 slot, so a "fast NVMe" that shows up as sdX is really an M.2 SATA drive.
  - An NVMe drive that is detected but delivers only SATA-class throughput is usually on fewer or slower PCIe lanes than expected (check LnkSta in sudo lspci -vv), or it is throttling.
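For the thermal-throttling pitfall, it helps to watch the drive's temperature while a load test runs. Below is a minimal sketch that reads the Linux hwmon sensors the kernel's nvme driver registers:

- hwmon name "nvme"
- temp1_input, in millidegrees Celsius
- temp1_crit, when the drive reports a critical threshold

It assumes a kernel new enough to expose NVMe hwmon sensors (roughly 5.5 or later). The 5 °C margin is an arbitrary illustrative choice.

```python
from pathlib import Path
from typing import Optional


def read_celsius(path: Path) -> Optional[float]:
    """hwmon reports temperatures in millidegrees Celsius; return degrees C or None."""
    try:
        return int(path.read_text().strip()) / 1000.0
    except (OSError, ValueError):
        return None


def nvme_temperatures(hwmon_root: str = "/sys/class/hwmon") -> dict:
    """Return {hwmon dir: (composite temp C, critical temp C or None)} for NVMe sensors."""
    root = Path(hwmon_root)
    if not root.is_dir():
        return {}
    readings = {}
    for hw in sorted(root.glob("hwmon*")):
        try:
            if (hw / "name").read_text().strip() != "nvme":
                continue
        except OSError:
            continue
        current = read_celsius(hw / "temp1_input")   # temp1 = composite temperature
        if current is not None:
            readings[hw.name] = (current, read_celsius(hw / "temp1_crit"))
    return readings


def near_limit(current_c: float, limit_c: Optional[float], margin_c: float = 5.0) -> bool:
    """True when the drive is within margin_c of its reported critical temperature."""
    return limit_c is not None and current_c >= limit_c - margin_c


if __name__ == "__main__":
    for name, (temp, crit) in nvme_temperatures().items():
        flag = "  <-- near critical: check airflow/heatsink" if near_limit(temp, crit) else ""
        print(f"{name}: {temp:.1f} C (crit {crit if crit is not None else 'n/a'}){flag}")
```

Run it under watch -n 5 during a sustained write test. If the temperature climbs toward the limit as throughput drops, the problem is cooling rather than a failing drive.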
Recap + what's next
You diagnosed a database crawling on the wrong disk and laid out a three-tier storage architecture:

- NVMe for random-access-heavy data such as the database, vector indexes, and model weights.
- Bulk HDD/SSD for sealed/cold data.
- NAS for backups and immutable archive.

You then migrated the database with a repeatable runbook. You know your RAID levels cold and why RAID and backup are two completely separate things.
The hot tier is now pulling its weight.
Next up: Part 3: "From Cables to Certificates: Networking a Multi-Node AI Cluster." A single box is a good start, but inference throughput scales with hardware, and the next hardware arrives as a second node. Getting two machines to work together means starting at the physical layer (cables and switch ports), moving up through IP addressing and DNS resolution, and ending at the reason a browser trusts the HTTPS endpoint while the two nodes trust each other through a completely different certificate chain. See you there.
