[Linux] 07 Linux Storage & Monitoring


Learning Objectives: Block Devices & Disk Usage 🎯

  • 1. Visualizing the Disk Tree Hierarchy 📊
    • Objective: Learn how to use the lsblk command to map out the system’s block devices in an intuitive tree structure.
    • Core Competency: Understand standard disk-naming rules to instantly identify the differences between whole raw physical disks (like /dev/sda) and individual partitions carved out of them (like /dev/sda1).
  • 2. Identifying Filesystem Types & UUIDs 🆔
    • Objective: Master tracking commands like lsblk -f or blkid to reveal hidden attributes of active partitions.
    • Core Competency: Extract the partition’s functional filesystem type and its uniquely generated UUID (Universally Unique Identifier), establishing the baseline information needed to pinpoint and map individual drives permanently.
  • 3. Navigating “Whole Storage Room” vs. “Single Folder” Space (df -h vs. du -sh) 🗺️📁
    • Objective: Differentiate between broad filesystem usage data and micro-level directory volume calculations.
    • Core Competency: Run df -h to see the total storage landscape per mount point (“the whole room view”) while utilizing du -sh to audit localized folder weights (“the specific box view”) to target and eliminate storage-clogging directories.

Storage Devices

💡 Core Definition: Computer’s “Chest of Drawers”

A storage device is a piece of hardware containing data cells that do not lose their content even when the computer’s power is completely turned off (secondary memory). While humans categorize storage by their physical form factor or media type, Linux unifies them under a single concept known as a Block Device.

  • Block Device Mechanism: Hardware that reads and writes data exclusively in fixed-size chunks or “blocks”.
  • The Linux Philosophy: In Linux, all block devices are abstracted into the filesystem tree and represented as raw file nodes inside the /dev/ directory. This follows the foundational Linux design maxim: “Everything is a file” (including your hard drives).

🔍 The 3 Main Classes of Storage Hardware

The lesson breaks storage down into three distinct tiers based on mechanical behavior and practical server placement:

  • 1. HDD (Hard Disk Drive) — The Magnetic Spinner 💿
    • How it works: Uses a motor to spin magnetic platters while a read/write head on a moving arm accesses the data.
    • Role: Provides large capacity at a low cost per gigabyte, making it well suited to file archives and system backups.
    • Drawback: Offers slow random access speeds and is highly susceptible to physical drops or shocks due to moving parts.
  • 2. SSD / NVMe — The Flash Solid-State 🎛️
    • How it works: Constructed completely from silicon flash memory chips with zero moving physical components, ensuring quiet and fast performance.
    • Role: Serves as the primary main drive hosting the core Operating System (OS), heavy production applications, and critical databases.
    • NVMe Variant: Talks to the system over the PCIe bus instead of SATA, which makes it the fastest tier of flash storage in common use.
  • 3. USB / SD Card — The Removable Media 🔌
    • How it works: Detachable flash storage that can be plugged in and removed while the system is running.
    • Role: Used primarily as operating system installation media or for moving photos and backups between machines.
    • Linux Treatment: The operating system tracks these fluidly—the drive profile registers under a path like /dev/sdb as soon as it is plugged in and vanishes from the /dev/ tree the moment it is unplugged.

Device Naming Rules

💿 1. SCSI / SATA & USB Media Family Rule

For standard hard disks (HDDs), solid-state drives (SSDs) connected via SATA, and removable USB sticks, the system allocates names using the /dev/sd[a-z][Number] layout:

      /dev/  sd  a  1

  • /dev/: The system hardware device folder path.
  • sd: SCSI Disk abstraction identifier.
  • a: The Disk Order (Alphabetical notation index). Drives are lettered chronologically based on discovery order (a = 1st disk, b = 2nd disk, c = 3rd disk).
  • 1: The Partition Slice Number (1, 2, 3, etc.). No trailing number means you are targeting the entire raw block device container.

🔍 Practical Naming Index:

  • /dev/sda: The entire block space of your first physical main drive.
  • /dev/sda1: The first partition slice carved out inside drive a.
  • /dev/sdb: A secondary hard drive or an external flash drive plugged into a USB port.
  • /dev/sdb2: The second partition slice container resting on drive b.

⚠️ System Administrator Warning: Drive sequencing letters are dynamically assigned at boot. Depending on controller detection speeds or the exact sequence you plug your external USB drives into a hub, /dev/sda and /dev/sdb could swap roles. To prevent boot errors, never use raw sd names in permanent configuration tables like /etc/fstab—always reference unique partition UUIDs instead.

🎛️ 2. High-Performance NVMe (M.2) Family Rule

Modern, ultra-fast solid-state modules attached straight to the motherboard’s PCIe bus follow a controller-oriented naming syntax, /dev/nvme[Controller]n[Namespace]p[Partition]:

      /dev/  nvme0  n1  p1

  • nvme0: Identifies hardware controller index number 0.
  • n1: Namespace 1 on that controller. A namespace is a logical slice of the device’s flash that the kernel presents as its own disk; most consumer drives expose exactly one.
  • p1: Partition Number 1. Notice that the letter “p” must be explicitly written out ahead of the partition digit to distinguish it from the namespace suffix.

🔍 NVMe Mapping Examples:

  • /dev/nvme0n1: The whole disk (namespace 1) behind the first NVMe controller.
  • /dev/nvme0n1p1: The first partition on that NVMe disk, often a boot or application partition.
  • /dev/nvme1n1: The whole disk behind a second NVMe controller, for example a second M.2 card.
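
The two naming rules above can be checked mechanically. The following is only a sketch: describe_dev is a hypothetical helper (not a standard tool) that uses Bash regular expressions to split a device path into its parts, and it assumes nothing beyond the plain sdX and nvmeXnYpZ patterns described in this section.

```bash
#!/usr/bin/env bash
# describe_dev is a hypothetical helper, not a standard utility.
# It only understands the two naming patterns covered in this section.
describe_dev() {
  local dev=$1
  if [[ $dev =~ ^/dev/sd([a-z]+)([0-9]*)$ ]]; then
    # Group 1 = disk letter(s), group 2 = partition number (empty for a whole disk)
    if [[ -n ${BASH_REMATCH[2]} ]]; then
      echo "sd disk=${BASH_REMATCH[1]} partition=${BASH_REMATCH[2]}"
    else
      echo "sd disk=${BASH_REMATCH[1]} whole-disk"
    fi
  elif [[ $dev =~ ^/dev/nvme([0-9]+)n([0-9]+)(p([0-9]+))?$ ]]; then
    # Group 1 = controller, group 2 = namespace, group 4 = partition (optional)
    if [[ -n ${BASH_REMATCH[4]} ]]; then
      echo "nvme controller=${BASH_REMATCH[1]} namespace=${BASH_REMATCH[2]} partition=${BASH_REMATCH[4]}"
    else
      echo "nvme controller=${BASH_REMATCH[1]} namespace=${BASH_REMATCH[2]} whole-namespace"
    fi
  else
    echo "unrecognized: $dev"
    return 1
  fi
}

describe_dev /dev/sdb2
describe_dev /dev/nvme0n1p1
```

Note how the NVMe pattern needs the literal p before the partition digits, while the sd pattern simply appends them to the disk letter.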

Storage Auditing — df vs du

🏠 1. df (Disk Free) — The “Whole Room View”

  • Core Command Practice (Bash):

      $ df -h
    
  • What it does: Inspects the entire active filesystem structure, summarizing the total capacity, used space, available space, and percentage utilization for every mounted block device partition.
  • How it calculates space: It asks each mounted filesystem for its pre-aggregated usage counters, which are kept in metadata structures such as the superblock. Because it reads these totals rather than scanning individual files, the command returns almost immediately, regardless of how many terabytes of data are stored.
  • The Analogy: Standing at the doorway of a storage room and judging how full it is from the open floor space that remains.
  • Crucial Flag: The -h flag stands for Human-Readable. It converts the raw block counts into familiar units such as Gigabytes (G) or Megabytes (M).
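
As a rough illustration of putting df output to work, the sketch below filters for filesystems that are nearly full. The helper name over_threshold is hypothetical, and it assumes the POSIX column layout printed by df -P and mount points without spaces.

```bash
#!/usr/bin/env bash
# over_threshold is a hypothetical helper: it reads `df -P` style output on
# stdin and prints every mount point whose usage is at or above LIMIT percent.
# Assumption: mount points contain no spaces.
over_threshold() {
  local limit=$1
  awk -v limit="$limit" '
    NR > 1 {                 # skip the header row
      pct = $5
      sub(/%$/, "", pct)     # "93%" -> "93"
      if (pct + 0 >= limit) print $6, pct "%"
    }'
}

# Typical use on a live system:
#   df -P | over_threshold 90
```

The -P flag is used instead of -h here because fixed columns are easier to parse; -h remains the better choice when a person is reading the output.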

📦 2. du (Disk Usage) — The “Specific Box View”

  • Core Command Practice (Bash):

      $ du -sh [Target_Directory_Path]
    
  • What it does: Measures the specific storage weight of a designated folder or individual file tree.
  • How it calculates space: It recursively wanders down into the specified directory path, scanning every single inner subdirectory and totaling up the file sizes item by item. If you run it on a massive multi-terabyte directory, it can take considerable time to crawl and complete the scan.
  • The Analogy: Opening a specific cardboard box inside that storage room, picking up every item one by one, and adding up their individual weights on a scale.
  • Crucial Flags:
    • -s (Summarize): Compresses the terminal output. Instead of printing a line for every subdirectory, it prints a single total for the directory you named.
    • -h (Human-Readable): Converts output figures into standard readable formats (K, M, G).

📊 Metric Breakdown Comparison

| Attribute | df (Disk Free) 🌍 | du (Disk Usage) 📂 |
|---|---|---|
| Primary Scope | The entire overall filesystem / device. | A specific targeted directory or file. |
| Query Mechanism | Superblock metadata index read (Instant). | Recursive directory crawling (Can be slow). |
| Common Use Case | Checking if the main system root partition (/) is 100% full. | Hunting for heavy log files or large directories to free up space. |
| Standard Practice | df -h | du -sh /var/log |

⚠️ Real-World Discrepancy Note (Why the numbers might not match)

The slide points out an important system administrator quirk: Sometimes the numbers reported by df and du do not match.

If an application (like Nginx) is writing to a massive log file and a user runs rm large_file.log, du immediately reports a smaller folder size because the file’s name has been removed from the directory. However, as long as the Nginx process is still running and holding that file descriptor open, the kernel cannot free the disk blocks. In this scenario, df keeps showing the disk as full until the process closes the file, which in practice usually means restarting the service.
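
The effect can be reproduced safely with a scratch file. This is a Linux-only sketch that assumes /proc is mounted; show_deleted_open_file is a hypothetical function name, and descriptor 9 is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Linux-only sketch: relies on /proc being available.
show_deleted_open_file() {
  local tmp
  tmp=$(mktemp)
  exec 9>"$tmp"              # keep the file open on descriptor 9
  echo "log line" >&9
  rm -- "$tmp"               # the directory entry disappears...
  readlink /proc/self/fd/9   # ...but the kernel still tracks the open file
  exec 9>&-                  # closing the descriptor finally releases the blocks
}
```

Calling show_deleted_open_file prints the old path followed by "(deleted)", which is how the kernel labels a file that has no name left but is still open. On a real server, sudo lsof +L1 is one common way to list such files.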


Hunting Large Files with du+sort and find -size

📂 1. Pipeline A: Sorting Folders by Size (du + sort + head)

When a directory is bloated, a plain du prints one line per subdirectory in traversal order, which is hard to read. To find the largest directories quickly, chain several utilities together with the pipe (|) operator:

Bash

$ sudo du -xh /var/log | sort -rh | head -n 10

⚙️ Breaking Down the Components:

  • du -xh /var/log: Measures disk usage recursively, starting at the system log directory.
    • -h: Prints sizes in human-readable notation (e.g., 2G, 45M).
    • -x (One File System): Safety flag. Tells du to stay on the filesystem where the scan started, so it does not descend into other mounts such as network shares or virtual filesystems like /proc, which could make the scan slow or appear to hang.
  • sort -rh: Sorts the incoming lines by size.
    • -h: Compares human-readable sizes by their real magnitude (it knows that 2G is larger than 45M, even though 2 is less than 45).
    • -r (Reverse): Flips the default ascending order so that the largest items appear first.
  • head -n 10: Keeps only the first 10 lines, i.e., the 10 largest directories.
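
To see why the -h on sort matters, the sketch below sorts the same three lines twice. The sizes and paths are made-up sample data in the size-TAB-path shape that du -h prints; sort -h is assumed to be available (it is part of GNU coreutils).

```bash
#!/usr/bin/env bash
# Sample lines in the "size<TAB>path" shape that du -h prints.
# The sizes and paths are invented for illustration.
sample=$'45M\t/var/log/nginx\n2G\t/var/log/journal\n900K\t/var/log/apt'

# Human-numeric sort: understands that 2G > 45M > 900K.
human_top=$(printf '%s\n' "$sample" | sort -rh | head -n 1)

# Plain numeric sort: ignores the unit suffix and compares 900 > 45 > 2.
naive_top=$(printf '%s\n' "$sample" | sort -rn | head -n 1)

echo "sort -rh puts first: $human_top"
echo "sort -rn puts first: $naive_top"
```

With -rh the 2G entry comes first; with -rn the 900K entry wrongly wins, because only the leading digits are compared.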

🎯 2. Pipeline B: Targeting Specific Large Files (find -size)

While the du pipeline isolates heavy directory paths, the find command directly hunts down specific standalone files that exceed a set storage threshold across the operating system.

  • Core Command Practice (Bash):

      $ sudo find /var/log -type f -size +100M -exec ls -lh {} \;
    

⚙️ Breaking Down the Components:

  • /var/log: The directory where the search starts.
  • -type f: Restricts matches to regular files, skipping directories and symbolic links.
  • -size +100M: Matches files larger than 100 MiB. (A lowercase k means kibibytes, and a capital G means gibibytes.)
  • -exec ls -lh {} \;: Runs a second command on every file that matches.
    • {}: A placeholder that find replaces with the path of each matching file.
    • \;: Marks the end of the -exec command for find. The backslash stops the shell from treating the semicolon as its own command separator.
    • Result: Instead of printing bare path names, this runs ls -lh on each match to show its permissions, owner, modification time, and human-readable size.
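
The same search can be tried without sudo in a scratch directory. In this sketch, big_files is a hypothetical wrapper around find; the -xdev flag is added as the find counterpart of du -x, and the file names are invented.

```bash
#!/usr/bin/env bash
# big_files is a hypothetical wrapper: list regular files under DIR that are
# larger than SIZE (find syntax, e.g. 100M) without crossing filesystems.
big_files() {
  local dir=$1 size=$2
  find "$dir" -xdev -type f -size "+$size" -print
}

# Self-contained trial in a scratch directory (no sudo needed).
scratch=$(mktemp -d)
dd if=/dev/zero of="$scratch/big.bin" bs=1024 count=2048 2>/dev/null   # 2 MiB file
echo "tiny" > "$scratch/small.txt"

big_files "$scratch" 1M    # prints only the path of big.bin
```

When many files match, ending the action with {} + instead of {} \; lets find pass several paths to a single ls invocation, which is usually faster.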

💡 Core Summary Takeaway

  • Use the du pipeline when you want to scan the directory tree to find which application or log folder is hogging disk space.
  • Use the find command when you want to bypass the folder structure and instantly find individual files (like a rogue error.log or a forgotten .tar.gz backup) that are over a certain size.

Learning Objectives: File System & Mount 🎯

  • 1. Understanding the Core Concept of a Filesystem 📂🏗️
    • Objective: Learn what a filesystem is and why raw storage hardware cannot be used by the operating system without it.
    • Core Competency: Understand how a filesystem acts as a logical data management framework, organizing unformatted blocks into a structured index of files and directories.
  • 2. Mastering the Mount Mechanism 🔌🔗
    • Objective: Demystify the mount process, which connects an independent storage device to a specific directory path within the unified Linux root (/) directory tree.
    • Core Competency: Understand the difference between Windows-style drive letters (e.g., C:, D:) and the Linux unified single-directory tree system, learning how the OS bridges hardware slices to software paths.
  • 3. Hands-On Storage Attachment Workflows (mount & umount) 🛠️⌨️
    • Objective: Learn the precise operational commands required to attach and detach storage devices safely.
    • Core Competency: Gain practical experience using the mount command to link a partition to a folder, and the umount command to safely unmount devices to prevent data corruption before hardware removal.

File System

💡 Core Definition: The Storage Bookcase

A raw, unformatted storage drive (like a brand-new SSD or HDD) is essentially just a massive, empty warehouse filled with millions of sequential data blocks. The operating system cannot naturally understand where one file begins and another ends in this raw space.

  • The Analogy: Think of a file system as installing a labeled bookcase or filing cabinet inside that empty warehouse. It defines how data is partitioned into distinct files, gives them names, records their creation dates, and manages their directory locations so they can be easily searched and accessed later.

🗂️ Linux vs. Windows File Systems

Different operating systems use different architectural frameworks to manage their filesystems:

  • Windows Frameworks: Typically relies on NTFS (for main system drives) or FAT32/exFAT (commonly used for cross-platform removable USB drives).
  • Linux Standard Frameworks: Most distributions default to ext4 (the fourth extended filesystem), a mature general-purpose choice. Common alternatives include XFS (well suited to large servers with big data volumes) and Btrfs.

🔨 The Administrative Rule: Formatting

Because a disk cannot hold structured data without a file system, you must create one before using raw hardware. This administrative process is called formatting.

In Linux, you format a partition by running the mkfs (Make Filesystem) utility:

Bash

$ sudo mkfs.ext4 /dev/sdb1

  • What it does: This command writes a brand-new, empty ext4 filesystem onto the /dev/sdb1 partition. Any files previously stored there become inaccessible, so double-check the target before running it. Afterwards the partition is ready to host directories and files.

Mount — “Plugging a Disk into a Directory”

💡 Core Analogy: The Compartment and the Window

Unlike Windows, which assigns completely separate root drive letters (like C:\ or D:\) to each physical piece of storage hardware, Linux integrates everything into a single, seamless directory tree originating from the absolute root (/). To interact with any hardware block device in this unified framework, you must perform a mount operation.

  • The Disk is a “Compartment”: A partition (e.g., /dev/sdb1) that holds data blocks on the hardware. Until it is mounted, its contents are not reachable through any directory path.
  • The Directory is a “Window”: An empty directory on your existing filesystem (e.g., /mnt/data).
  • Mounting is the Bridge: The operational act of aligning a specific disk compartment with a designated directory window. Once connected, looking into or writing files inside that specific folder path means you are directly modifying the physical block sectors of the underlying hardware drive.

🗺️ The Architecture Mapping Flow

The lecture provides a step-by-step breakdown of what happens under the hood when a device is mounted:

      Block Device File (/dev/sdb1)  →  [MOUNT]  →  Target Mount Point (/mnt/data)

  1. Before Mount: The drive partition exists safely as a detached hardware layer, registered in the system device tree as /dev/sdb1. Users cannot browse or create documents inside it yet.
  2. Execution: The system administrator runs a manual link action to attach the device file to a targeted folder location.
  3. After Mount: The target directory becomes the entry point (mount point) for that partition’s filesystem.

⚠️ Critical Precaution: Overwriting Directory Windows

An important warning is noted regarding the state of the target folder used as a mount point:

  • If an administrator mounts a new storage device onto a folder that already contains existing files, those pre-existing files do not get deleted or corrupted.
  • Instead, they are completely hidden from the system because the new drive’s block structure has layered directly over that folder.
  • The original content will remain completely inaccessible to users until the secondary device is safely unmounted (umount). To avoid confusion, always ensure a target mount point directory is entirely empty before mounting any device to it.
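
The empty-directory precaution above can be turned into a small guard. This is a sketch only: mountpoint_is_empty is a hypothetical helper, and the device and path in the usage comment are the ones used as examples in this section.

```bash
#!/usr/bin/env bash
# mountpoint_is_empty is a hypothetical helper: it succeeds only if DIR exists
# and contains no entries (hidden files included). Returns 2 if DIR is missing.
mountpoint_is_empty() {
  local dir=$1
  [[ -d $dir ]] || return 2
  [[ -z $(ls -A -- "$dir") ]]
}

# Possible use before a manual mount:
#   if mountpoint_is_empty /mnt/data; then
#     sudo mount /dev/sdb1 /mnt/data
#   else
#     echo "/mnt/data is missing or not empty; refusing to mount" >&2
#   fi
```

ls -A is used rather than plain ls so that dotfiles, which would also be hidden by the mount, count as content.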

/etc/fstab

💾 /etc/fstab (File System Table) is a critical system configuration file that manages persistent, automatic block device storage mapping across system reboots.

💡 Core Definition: Permanent Storage Automation

When an administrator attaches and configures a storage partition using the manual mount command, the linkage is only temporary. The moment the operating system triggers a restart or sudden power cycle, the active kernel clears the mounting runtime state.

To prevent administrators from having to manually remount every secondary hard drive, network share, and swap space every single time the system boots, Linux relies on /etc/fstab. During the initial boot sequence, system processes parse this configuration matrix line-by-line and automatically execute the defined mounting rules.

📊 The 6-Column Syntax Layout

Each active mapping rule inside the /etc/fstab file is written as a single horizontal text line broken into six distinct configuration columns, separated by spaces or tabs:

Plaintext

# <file system>             <mount point>   <type>    <options>   <dump>  <pass>
UUID=a1b2c3d4-e5f6-7890...  /mnt/data       ext4      defaults    0       2

⚙️ Column Breakdown:

  1. <file system> (Device Identifier)
    • Specifies the target hardware device partition.
    • Best Practice: While you can use traditional paths like /dev/sdb1, it is highly recommended to reference the partition’s unique UUID (Universally Unique Identifier). This ensures the configuration remains stable even if hardware cables are switched or storage devices change order at boot time.
  2. <mount point> (Target Directory)
    • The explicit folder path inside the file tree where the device contents should be loaded (e.g., /, /home, /mnt/data).
  3. <type> (Filesystem Framework)
    • Identifies the filesystem that was created when the partition was formatted, such as ext4, xfs, ntfs, or vfat.
  4. <options> (Mount Settings)
    • Controls access and behavior. The defaults keyword bundles the standard choices together: rw, suid, dev, exec, auto, nouser, and async. In short, the filesystem is mounted read-write at boot, programs on it may run, and writes are buffered asynchronously.
  5. <dump> (Backup Flag)
    • A legacy utility flag deciding if the filesystem needs to be completely backed up using the old dump command. In modern environments, this is almost always set to 0 (disabled).
  6. <pass> (FSCK Integrity Check Order)
    • Instructs the fsck (File System Consistency Check) utility whether to scan the drive for file errors during startup.
      • 0: Skip integrity scanning completely.
      • 1: Highest priority validation (strictly reserved for the main OS root directory /).
      • 2: Secondary validation (used for everyday data volumes and non-boot storage extensions).
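
As a quick way to internalize the six columns, here is a minimal sketch of a line checker. check_fstab_line is a hypothetical helper that only validates the shape described above (it is stricter than the real parser and never touches a device); the UUID in the example is the sample value used in these notes.

```bash
#!/usr/bin/env bash
# check_fstab_line is a hypothetical linter for a single fstab entry.
# It checks only the six-column shape described above.
check_fstab_line() {
  local line=$1 spec mnt type opts dump pass extra
  # Split on spaces or tabs into the six columns (anything left goes to extra)
  read -r spec mnt type opts dump pass extra <<<"$line"
  [[ -n $pass && -z $extra ]]      || { echo "expected exactly 6 fields"; return 1; }
  [[ $mnt == /* || $mnt == none ]] || { echo "mount point must be an absolute path"; return 1; }
  [[ $dump =~ ^[0-9]+$ ]]          || { echo "dump must be a number"; return 1; }
  [[ $pass =~ ^[0-2]$ ]]           || { echo "pass must be 0, 1, or 2"; return 1; }
  echo "ok: $spec -> $mnt ($type, $opts)"
}

check_fstab_line 'UUID=a1b2c3d4-e5f6-7890-abcd-1234567890ef /mnt/data ext4 defaults 0 2'
```

On systems with a reasonably recent util-linux, findmnt --verify performs a far more thorough check of the real /etc/fstab and is the better tool for production use.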

🛠️ Practical Workflow & Safety Test

Editing /etc/fstab incorrectly can break the boot sequence: if the system cannot find a partition listed there, startup may stop and drop into an emergency recovery shell.

To mitigate this risk, experienced administrators always perform a safety validation test after making modifications:

Bash

# 1. Edit the file securely using root privileges
$ sudo nano /etc/fstab

# 2. RUN THE SAFETY TEST BEFORE REBOOTING
$ sudo mount -a

  • sudo mount -a: Tells the system to read /etc/fstab and immediately try to mount every entry that is not already mounted (entries marked noauto are skipped).
  • Why it is critical: If there is a syntax typo or an invalid UUID, the command fails and prints an error in your current session, so you can fix it right away. If it completes without printing errors, the entries can be mounted as written, which is strong evidence that the next reboot will not stall on them.

Mount Options — ro, noexec, and nosuid

By layering these standard options into manual mount flags or permanent /etc/fstab columns, you can harden the Linux filesystem against potential exploits.

🛡️ Core Security Options Breakdown

1. ro (Read-Only) 🛑✍️

  • What it does: Locks the target filesystem down completely into an immutable, read-only state.
  • Security Behavior: Any attempt to write, edit, delete, or append files on the device fails immediately with a Read-only file system error.
  • Ideal Use Case: Critical system rescue media, sensitive core asset archives, or system boot folders that should remain completely unchanged during standard operation.

2. noexec (No Execute) 🚫🏃‍♂️

  • What it does: Disallows the operating system kernel from directly running any standalone binary file or script stored inside that specific storage volume.
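  • Security Behavior: Even if a rogue user sets execute permission (chmod +x rogue_script.sh) on a malicious file inside that volume, trying to run it directly fails with Permission denied. Note that a script can still be passed as an argument to an interpreter stored elsewhere (for example sh rogue_script.sh), so noexec is one layer of defense rather than a complete block.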
  • Ideal Use Case: Public multi-user upload spaces, shared data directories, or storage areas where only raw document media, images, or configuration strings should reside.

3. nosuid (No Set-User-ID) ⚠️🎭

  • What it does: Explicitly strips away and ignores the special SUID (Set-User-ID) and SGID execution privileges on all executable binaries stored inside that filesystem.
  • Security Behavior: Under normal circumstances, an executable with the SUID attribute runs with the elevated permission layer of the owner of the file (often root), rather than the limited permissions of the calling user. Turning on nosuid forces every single executable inside that volume to stay bound to the regular user’s limited sandbox scope, blocking privilege escalation vectors.
  • Ideal Use Case: Removable external storage expansions (like USB flash drives) or mounted local client partitions, ensuring untrusted files brought onto the system cannot secretly seize root shell control.

⌨️ Administrative Command Application

These flags are passed as a comma-separated block using the standard -o (Options) flag during execution:

Bash

# Securely attach a partition so files can only be read, with no executable permissions allowed
$ sudo mount -o ro,noexec /dev/sdb1 /mnt/secure_data

To make these restrictions persist across reboots, add them to the 4th column (options) of the matching /etc/fstab entry:

Plaintext

# /etc/fstab entry layout sample
# <file system>    <mount point>       <type>    <options>          <dump>  <pass>
UUID=xyz-123...    /home/user/uploads  ext4      noexec,nosuid      0       2
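
After mounting, it is worth confirming that the hardening options actually took effect. The sketch below is one possible check: has_mount_opts is a hypothetical helper that inspects a comma-separated option string, and the findmnt call in the comment assumes the uploads path from the sample entry above.

```bash
#!/usr/bin/env bash
# has_mount_opts is a hypothetical helper: given a comma-separated option
# string, it succeeds only if every required option is present as a whole word.
has_mount_opts() {
  local opts=",$1," required
  shift
  for required in "$@"; do
    # Wrapping in commas prevents "exec" from matching inside "noexec"
    [[ $opts == *",$required,"* ]] || { echo "missing: $required"; return 1; }
  done
  echo "all present"
}

# On a live system the option string could come from findmnt, for example:
#   has_mount_opts "$(findmnt -n -o OPTIONS /home/user/uploads)" noexec nosuid
```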

Learning Objectives: Disk Addition 🎯

  • 1. Standard 4-Step Provisioning Framework 🛠️📦
    • Objective: Master the mandatory structural sequence required to safely onboard any new raw storage hardware into a running Linux system.
    • Core Competency: Memorize and understand the dependency chain of the four steps: 1. Identify Device (lsblk) → 2. Partition (fdisk) → 3. Format (mkfs) → 4. Mount (mount)

  • 2. Partition Creation via fdisk 💾⚙️
    • Objective: Learn how to use the interactive terminal partitioning utility, fdisk, to carve up a raw physical block device into distinct, manageable logical sections.
    • Core Competency: Master the core sub-commands inside the interactive fdisk interface (such as printing partition tables, creating new partitions, and saving changes to the disk sector).
  • 3. Filesystem Creation, Mounting & Verification 🗂️🔗
    • Objective: Bridge the gap between raw hardware partitions and user-accessible directory trees.
    • Core Competency: Learn how to deploy a clean file grid using mkfs.ext4, establish a stable target mount point, link them together, and run df -h to verify that the newly added capacity is officially active and available for system data storage.

Adding a Virtual Disk to VirtualBox / VMware

📦 1. The VirtualBox GUI Attachment Workflow

For systems running on Oracle VM VirtualBox, the lecture dictates a sequence of mouse selections to register a secondary block drive:

  1. Power Off State: You must completely turn off the target Virtual Machine (VM) before modifying its hardware layer allocations.
  2. Access Storage Settings: Right-click on your selected guest VM system, select Settings (설정), and navigate over to the Storage (저장소) sidebar menu.
  3. Locate Controller: Look inside the storage controller tree layout to locate your active Controller: SATA (컨트롤러: SATA) category.
  4. Append New Drive Attachment: Click on the tiny “Add Hard Disk” (하드 디스크 추가 아이콘) icon positioned right beside the SATA line entry.
  5. Create the Virtual Disk File: Choose Create (생성) to make a brand-new virtual disk image. Choose the standard VDI format, specify the size (e.g., 5 GB), and select Dynamically Allocated (동적 할당) so the file only consumes real host space as data is written into it.
  6. Finalize: Select your newly created disk in the disk selection dialog and click OK.

⚙️ 2. The VMware Workstation GUI Attachment Workflow

For systems running on VMware Workstation Pro / Player, the steps are similar but follow VMware’s own menu wording:

  1. Power Off State: Safely shut down the target Linux guest VM.
  2. Access Hardware Settings: Open your VM profile view and click on Edit virtual machine settings (가상 머신 설정 편집).
  3. Trigger Hardware Addition Wizard: At the bottom edge of the Hardware overview tab page, click on the Add… (추가) option button.
  4. Select Component Type: Choose Hard Disk inside the wizard checklist window pane and click Next.
  5. Choose the Controller Type: Select the recommended virtual disk interface, typically SCSI or SATA.
  6. Create the Disk: Choose Create a new virtual disk (새 가상 디스크 생성), enter the size (e.g., 5 GB), and save the configuration.

🔍 3. Post-Boot Administrative Verification via Terminal

Once the hypervisor settings are successfully saved, boot up your target Linux guest operating system. Before you can format or save items to the expansion drive, you must confirm that the kernel has discovered the new hardware allocation.

Open a terminal and run the block device listing utility:

Bash

$ lsblk

📊 Anticipated Shell Output Structure:

Plaintext

NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda       8:0    0   20G  0 disk
└─sda1    8:1    0   20G  0 part /
sdb       8:16   0    5G  0 disk

  • Analyzing Your Terminal Readout: The disk that holds your operating system appears as sda (20 GB), with its single partition sda1 mounted at the root directory (/).
  • The New Hardware: If the hypervisor steps succeeded, a new entry named sdb with a size of 5G is displayed. It appears as type disk with no child partitions and no mount point. This empty disk is now ready for the next phase: fdisk partitioning.
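
Spotting the new disk by eye works for two devices; on larger systems a filter helps. The sketch below assumes input in the shape produced by lsblk -rn -o NAME,TYPE,PKNAME (name, type, parent name); unpartitioned_disks is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# unpartitioned_disks is a hypothetical helper. It reads the output of
#   lsblk -rn -o NAME,TYPE,PKNAME
# on stdin and prints every disk that has no partition children.
unpartitioned_disks() {
  awk '
    $2 == "part" { has_child[$3] = 1 }     # remember each partition parent
    $2 == "disk" { disks[++n] = $1 }       # remember disks in listing order
    END {
      for (i = 1; i <= n; i++)
        if (!(disks[i] in has_child)) print disks[i]
    }'
}

# Typical use on a live system:
#   lsblk -rn -o NAME,TYPE,PKNAME | unpartitioned_disks
```

For the listing shown above, this would print sdb, since sda already has the child partition sda1.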

Partitioning via fdisk

💡 Core Definition: Carving Up Raw Space

When you attach a brand-new 5 GB virtual disk (like /dev/sdb), it exists as an unorganized, raw block device with no partition table and no filesystem, so there is nowhere to store files yet. Partitioning is the process of dividing that single disk into independent logical sections (partitions), each of which the operating system treats as a separate block device.

  • The Primary Tool: Linux uses the fdisk utility to create, view, and modify partition tables on standard storage drives.
  • Targeting the Raw Disk: To start partitioning, pass the path of the entire raw block device (not a partition) to fdisk with root privileges (Bash):

      $ sudo fdisk /dev/sdb
    

⌨️ Interactive fdisk Internal Command Menu

Running fdisk opens an interactive command-line interface rather than executing a quick, one-off task. Once inside, the utility waits for you to type specific single-letter sub-commands to guide the wizard:

  • p (Print) 📋: Displays the current partition table layout of the target disk, showing existing partitions, their start/end sectors, and total block capacities.
  • n (New) ➕: Initiates the wizard to create a brand-new partition slice.
  • d (Delete) ❌: Deletes an existing partition entry from the disk’s index table.
  • w (Write) 💾: Crucial Command. Saves all the architectural changes you have made during the session and writes them permanently to the disk’s partition table sector, before automatically exiting back to the shell terminal.
  • q (Quit) 🚪: Quits fdisk without saving. Because fdisk keeps your edits in memory until you type w, quitting with q leaves the disk exactly as it was, which makes it the safe way out after a mistake.

🛠️ Step-by-Step Creation Sequence (n Menu Workflow)

When you type n to carve out a new partition, fdisk will prompt you with a sequence of configuration choices:

  1. Select Partition Type: Choose between p (Primary) or e (Extended). Under standard setups, choosing p for a primary partition is the norm.
  2. Partition Number: Select a standard tracking index digit (typically 1 if it is the first partition on the drive).
  3. First Sector: Defines where on the disk the partition begins. Press Enter to accept the default, which is the first free sector that satisfies fdisk’s alignment rules.
  4. Last Sector (Size Specification): Defines where the partition ends. Instead of calculating sector blocks manually, you can use a user-friendly size notation:
    • To use the entire remaining capacity of the drive, simply press Enter.
    • To explicitly specify a size (e.g., a 2 Gigabyte partition), type +2G.
  5. Commit the Changes: After the wizard confirms the creation of the new partition profile, you must type w and hit Enter to flush the new map onto the hardware.
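
The five prompts above map one-to-one onto a short keystroke sequence. The sketch below only prints that sequence; fdisk_keys is a hypothetical helper, and it assumes an MBR (DOS) disk label, where fdisk asks the p/e question (on a GPT label that prompt does not appear).

```bash
#!/usr/bin/env bash
# fdisk_keys is a hypothetical helper that prints the keystrokes for creating
# one new primary partition. SIZE is optional (e.g. +2G); empty means
# "use the rest of the disk". Assumes an MBR (DOS) disk label.
fdisk_keys() {
  local size=${1:-}
  # n = new, p = primary, 1 = partition number,
  # "" = accept default first sector, size (or "") = last sector, w = write
  printf '%s\n' n p 1 "" "$size" w
}

# A low-risk way to practice is a scratch image file instead of a real disk:
#   truncate -s 64M practice.img
#   fdisk_keys +32M | fdisk practice.img
```

Piping keystrokes into fdisk is a common trick for labs, but it is fragile because it depends on the exact prompt order; for real automation a scriptable tool such as sfdisk or parted is usually preferred.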

🔍 Post-Partitioning Verification

Once you return to the standard bash terminal, verify that the kernel recognizes the new logical partition layout by running the block listing command:

Bash

$ lsblk

📊 Expected Output Structure:

Plaintext

NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sdb       8:16   0    5G  0 disk
└─sdb1    8:17   0    5G  0 part

  • The Result: The 5 GB sdb disk now has a new branch beneath it labeled sdb1 of type part (partition). This partition is ready for Step 3: Filesystem Formatting.

Filesystem Formatting via mkfs

💡 Core Definition: Building the Storage Grid

A newly carved partition is like an empty plot of land with no architectural structure. The operating system does not know how to map names, boundaries, or index properties to the raw sector blocks.

Formatting is the administrative process of laying down a filesystem grid onto that logical partition. This grid establishes the foundational metadata tables, directory indexing rules, and data sectors required to organize future files.

⌨️ The Administrative Command Syntax

In Linux, formatting is performed using variations of the mkfs (Make Filesystem) utility. To format a partition with the standard Linux native filesystem, you execute the command with root privileges, specifying the target partition block path:

Bash

$ sudo mkfs.ext4 /dev/sdb1

⚙️ Key Elements of the Command:

  • mkfs.ext4: Specifies the creation of an ext4 (fourth extended filesystem) structure, which is the default, high-performance standard filesystem for modern Ubuntu and Debian-based Linux environments.
  • /dev/sdb1: Crucial Target Precision. You must specify the exact logical partition slice (sdb1), not the raw physical disk parent (sdb). Formatting the raw parent disk instead of the partition can corrupt the partition table structure you built in the previous step.

🔍 Post-Formatting Verification

Once the formatting tool finishes constructing the filesystem blocks, you can verify that the partition successfully holds its new file structure.

Run the block listing command with the -f (Filesystem info) flag appended:

Bash

$ lsblk -f

📊 Expected Output Structure:

Plaintext

NAME    FSTYPE   FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sdb
└─sdb1  ext4     1.0         a1b2c3d4-e5f6-7890-abcd-1234567890ef
  • Analyzing the Output Attributes:
    • FSTYPE: Shows ext4, proving that the partition is no longer a raw unformatted block and now contains an active Linux filesystem.
    • UUID: Displays a unique, 36-character string automatically generated during formatting. This identifier stays permanently bound to this specific file grid, allowing you to reliably map the drive inside /etc/fstab later.
    • MOUNTPOINTS: Appears blank. This indicates that while the disk is structured and ready to receive files, it is still detached from the operating system directory tree, leading directly into the final provisioning step: Mounting.

Permanent Mounting

Using the standard sudo mount /dev/sdb1 /mnt/new_disk command only creates a runtime configuration inside the active volatile memory (RAM). The moment the Linux server restarts or undergoes a power cycle, this relationship is wiped out, and the drive becomes detached and invisible again.

To make the connection permanent, the system administrator must write the layout details into the /etc/fstab (File System Table) configuration file. During startup, the init system (systemd on most modern distributions) reads this table line by line and automatically mounts the listed filesystems.

🛠️ Step-by-Step Implementation Workflow

To permanently mount a new disk, follow this strict command-line sequence:

Step 1: Create a Dedicated Mount Point Folder 📁

You must establish an empty target directory to serve as the gateway for your drive’s file contents.

Bash

$ sudo mkdir /mnt/data

Step 2: Extract the Unique Partition UUID 🆔

Never use raw names like /dev/sdb1 in permanent configuration tables because disk ordering letters can dynamically shift during boot sequences. Instead, use the UUID (Universally Unique Identifier), which is uniquely stamped onto the partition during formatting.

Bash

$ lsblk -f

Copy the exact alphanumeric UUID string listed under your target partition line (e.g., a1b2c3d4-e5f6...).

Step 3: Edit the File System Table Registry 📝

Open the configuration file with root privileges:

Bash

$ sudo nano /etc/fstab

Append a brand-new configuration line at the very bottom of the document using the 6-column syntax schema:

Plaintext

# <file system>             <mount point>   <type>    <options>   <dump>  <pass>
UUID=your-copied-uuid-here  /mnt/data       ext4      defaults    0       2
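As a minimal sketch of the six-column entry, the helper below formats one line from a UUID, a mount point, and a filesystem type. The name fstab_line is hypothetical, and the options, dump, and pass values are simply the ones used in the entry above.

```shell
# Hypothetical helper: format one /etc/fstab entry from a UUID.
# On a real system the UUID could be read with:  blkid -s UUID -o value /dev/sdb1
fstab_line() {
    uuid=$1
    mount_point=$2
    fs_type=${3:-ext4}
    printf 'UUID=%s  %s  %s  defaults  0  2\n' "$uuid" "$mount_point" "$fs_type"
}

# Example with the placeholder UUID from the lsblk -f output shown earlier.
fstab_line a1b2c3d4-e5f6-7890-abcd-1234567890ef /mnt/data
```

The function only prints the line. Reviewing it on screen before copying it into /etc/fstab keeps a typo from reaching the real file.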

⚠️ Critical Precaution: The Safety Verification Test

Making a typo or pasting an invalid UUID inside /etc/fstab is dangerous. If a filesystem listed there cannot be found during startup, the boot process can stall and drop the server into an emergency maintenance shell instead of starting normally.

To completely avoid this risk, always execute the safety validation test before restarting the machine:

Bash

$ sudo mount -a
  • How it works: The -a (All) flag makes mount parse /etc/fstab immediately and try mounting every listed entry that isn’t currently attached.
  • Evaluating the Result:
    • If the command hits an error or a syntax typo, it prints the error on your current terminal, allowing you to fix it while the system is still running.
    • If it finishes silently without printing any errors, every entry mounted successfully, and the same table should mount cleanly on the next reboot.
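Before running mount -a, a quick column count can catch a half-pasted line. The sketch below is a hypothetical lint, not a replacement for mount -a: it only checks that each non-comment line has the six fields used in this guide, which is stricter than mount itself (the last two fields may legally be omitted).

```shell
# Hypothetical lint: report non-comment lines that do not have exactly six fields.
# Prints "line <n>: <count> fields" for each offender and returns non-zero if any exist.
check_fstab_columns() {
    awk '
        /^[ \t]*(#|$)/ { next }     # skip comments and blank lines
        NF != 6 { printf "line %d: %d fields\n", NR, NF; bad = 1 }
        END { exit bad }
    ' "$1"
}

# Demonstration on a throwaway file, not the real /etc/fstab.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# <file system>  <mount point>  <type>  <options>  <dump>  <pass>
UUID=a1b2c3d4-e5f6-7890-abcd-1234567890ef  /mnt/data  ext4  defaults  0  2
UUID=a1b2c3d4-e5f6-7890-abcd-1234567890ef  /mnt/data  ext4  defaults  0
EOF
check_fstab_columns "$sample" || echo "fix the lines listed above before rebooting"
rm -f "$sample"
```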

Learning Objectives: Software RAID 🎯

  • 1. Understanding RAID Typology & Multi-Disk Architectures (RAID 개념 및 종류 이해) 🎛️⛓️
    • Objective: Learn the core engineering concepts behind RAID (Redundant Array of Independent Disks) and distinguish how it groups multiple physical drives into a single logical volume.
    • Core Competency: Master the structural, performance, and data-redundancy trade-offs between the primary RAID tiers: RAID 0 (Striping), RAID 1 (Mirroring), and RAID 5 (Striping with Distributed Parity).
  • 2. Array Management via the mdadm Utility (mdadm 명령어 사용법 숙지) 🛠️⌨️
    • Objective: Gain hands-on familiarity with mdadm (Multiple Device administration utility), the standard Linux command-line tool used to configure and monitor software RAID arrays.
    • Core Competency: Learn the specific syntax patterns required to initialize a new array container, monitor its real-time rebuilding sync status, and safely stop or assemble running multi-disk volumes.
  • 3. Implementing Storage Redundancy & Drive Failover Workflows (디스크 장애 대응) 🚨💾
    • Objective: Simulate real-world storage hardware failures to understand how a redundant system maintains continuous uptime without data loss.
    • Core Competency: Learn how to logically fail a broken disk drive partition out of an active array, hot-swap a clean replacement drive into the empty slot, and verify that the array rebuilds its redundancy cleanly.

What is RAID?

💡 Core Definition: Strength in Numbers

In a server environment, relying on a single large hard drive creates a Single Point of Failure (SPOF)—if that individual drive physically breaks, the entire server crashes and data is permanently lost. Additionally, a single drive is limited by its mechanical read/write speed.

RAID solves this by grouping several independent physical disks together. The Linux operating system abstracts this cluster, hiding the individual hardware units and presenting them to users and applications as one massive, unified logical disk drive (typically registered as device nodes like /dev/md0).

🛠️ Hardware RAID vs. Software RAID

The lecture divides RAID implementations into two distinct engineering architectures based on where the processing workload occurs:

  • 1. Hardware RAID (하드웨어 RAID) 🔌
    • Mechanism: Relies on a dedicated, physical RAID controller expansion card plugged directly into the motherboard’s PCIe slot. This card features its own specialized processor chip and high-speed cache memory.
    • Pros: Because all complex data calculation rules and parity distributions happen inside the dedicated controller hardware, it places almost no performance overhead on the host computer’s main CPU. It is highly stable and fast.
    • Cons: Highly expensive and creates hardware vendor lock-in; if the controller card itself breaks, you often must purchase the exact same model to recover your data.
  • 2. Software RAID (소프트웨어 RAID) ⚙️
    • Mechanism: Requires no extra hardware components. Instead, the host operating system kernel uses pure software logic to group standard storage partitions together. In Linux, this is driven by the native md (Multiple Devices) kernel driver and managed via the mdadm command utility.
    • Pros: Cost-effective and highly flexible. Because it is entirely software-defined, you can easily migrate the drives to a completely different computer running Linux and assemble the array instantly without needing proprietary hardware.
    • Cons: Consumes a portion of the host system’s CPU cycles and RAM to compute disk operations, which can slightly impact overall server performance during heavy read/write workloads.

📊 Summary Takeaway

While Hardware RAID remains dominant in high-end, legacy enterprise data centers, Software RAID provides an incredibly accessible, flexible, and robust alternative for modern cloud instances, home labs, and virtualized development sandboxes.


RAID Levels 0, 1, and 5

🏎️ 1. RAID 0 (Striping — “Speed First”)

  • How it works: Data streams are broken down into chunks and spread evenly across all disks in the array simultaneously.
  • Minimum Disks Required: 2 disks.
  • Storage Capacity: $N \times \text{Size of smallest disk}$ (100% space efficiency; zero space wasted on safety backups).
  • Pros: Maximum performance. Because the system reads and writes to all disks at the same time, the throughput speed scales directly with the number of drives.
  • Cons: Zero Fault Tolerance. If even a single disk fails, the entire array is destroyed, and all data is permanently lost. It should never be used for critical server data.

🛡️ 2. RAID 1 (Mirroring — “Safety First”)

  • How it works: Every piece of data written to the primary disk is simultaneously cloned (mirrored) onto a secondary backup disk in real time.
  • Minimum Disks Required: 2 disks.
  • Storage Capacity: $\text{Size of smallest disk}$ (50% space efficiency; half the drive space is hidden for safety duplication).
  • Pros: High Fault Tolerance. If one disk physically dies, the system continues running smoothly off the surviving mirror disk without any downtime or data loss.
  • Cons: Poor storage efficiency. Buying two 2TB drives only yields 2TB of usable space. Write speeds are also limited to the speed of a single drive.

⚖️ 3. RAID 5 (Distributed Parity — “The Sweet Spot”)

  • How it works: Data chunks and parity blocks (XOR-based recovery information) are striped in rotation across all participating disks. No single disk is dedicated to parity; it is distributed evenly.
  • Minimum Disks Required: 3 disks.
  • Storage Capacity: $(N - 1) \times \text{Size of smallest disk}$ (The capacity of exactly one disk’s worth of space is allocated for parity calculations).
  • Pros: Balances excellent read performance with efficient data redundancy. It can survive the complete failure of any single disk. If a drive dies, the missing data chunks are reconstructed on-the-fly from the surviving data and parity blocks.
  • Cons: Write speeds are slower compared to RAID 0 because the system must calculate and rewrite parity codes for every single write transaction.

📊 Metric Matrix Comparison

| Attribute | RAID 0 🏎️ | RAID 1 🛡️ | RAID 5 ⚖️ |
|---|---|---|---|
| Core Goal | Extreme Performance | Complete Duplication | Balanced Efficiency |
| Min. Disks | 2 | 2 | 3 |
| Max. Faults Allowed | 0 (Any failure breaks the array) | 1 disk | 1 disk |
| Usable Space Formula | $N \times S$ | $1 \times S$ | $(N - 1) \times S$ |
| Best Used For | Temporary scratch spaces, caches. | Operating system boot drives. | File servers, database backends. |
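The usable-space formulas can be turned into a small calculator. This is a sketch under the same simplification as the formulas: N is the number of disks and S is the size of the smallest disk, in whatever unit you pass in. The name raid_usable is hypothetical.

```shell
# Hypothetical helper: usable capacity for N disks whose smallest member has size S.
raid_usable() {
    level=$1; n=$2; s=$3
    case "$level" in
        0) echo $(( n * s )) ;;          # striping: every disk holds data
        1) echo "$s" ;;                  # mirroring: one disk's worth of space
        5) echo $(( (n - 1) * s )) ;;    # one disk's worth goes to parity
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
}

raid_usable 0 3 2   # three 2 TB disks in RAID 0 -> 6
raid_usable 1 2 2   # two 2 TB disks in RAID 1   -> 2
raid_usable 5 3 2   # three 2 TB disks in RAID 5 -> 4
```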

Array Management via mdadm

Page 36 details the usage of mdadm (Multiple Device administration utility), the standard command-line tool used in Linux to build, manage, and monitor software RAID arrays.

⚙️ 1. Building a RAID Array (--create)

To group independent disk partitions into a single operational RAID structure, you initialize the device using the create context syntax:

Bash

$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

🛠️ Breaking Down the Parameters:

  • --create /dev/md0: Instructs the utility to construct a brand-new virtual RAID block device named /dev/md0. This node acts as the unified entry point for the array.
  • --level=1: Specifies the targeted RAID architecture tier (e.g., set --level=0 for speed striping, --level=1 for mirror duplication, or --level=5 for distributed parity).
  • --raid-devices=2: Defines the exact count of active physical partition slices participating in the array.
  • /dev/sdb1 /dev/sdc1: Passes the explicit device path locations of the underlying disk storage partitions assigned to make up the volume.

🔍 2. Auditing Real-Time Array Status (--detail)

Once an array is built, the kernel immediately triggers a background synchronization task to align the storage drives. To check the real-time health, recovery progress, or structural composition of a running volume, pass the detail flag:

Bash

$ sudo mdadm --detail /dev/md0

📊 Key Metrics Provided in the Output:

  • RAID Level: Validates the operating configuration tier (e.g., raid1).
  • Array Size: Displays the total usable storage capacity available to user applications.
  • State: Reports the structural health status. A healthy array reads as clean (or active). During an initial synchronization or a rebuild, the state adds resyncing or recovering, and a separate status line reports the completion percentage.
  • Device Grid Matrix: Lists the individual member partitions, mapping their operational status (active sync, faulty, or spare).

🛑 3. Stopping and Deconstructing Arrays (--stop)

If you need to perform system maintenance, reconfigure storage layouts, or safely dismantle an array, you must deactivate the device node.

  • The Command:

      Bash

      $ sudo mdadm --stop /dev/md0
    
  • Administrative Rule: An array cannot be deactivated if it is currently in use. Before running the stop command, you must explicitly unmount (umount) the active file tree and ensure no active terminal sessions or background application processes are reading or writing data inside the mounted path.

🔄 4. Re-Assembling Existing Arrays (--assemble)

If an array was previously stopped, or if physical storage drives are migrated to a completely new Linux server, the array may not come up on its own. In that case you instruct mdadm to read the RAID metadata on the member devices and assemble the components back into a unified volume:

Bash

$ sudo mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1
  • What it does: This command reads the unique RAID superblocks stamped onto /dev/sdb1 and /dev/sdc1, verifies their historical relationship, and safely wakes up the /dev/md0 device interface with all original files intact.

Viewing RAID Status

📄 1. The /proc/mdstat File — Quick System Overview

The fastest way to check the status of all active software RAID devices on a system is by reading the virtual kernel statistics file located at /proc/mdstat.

  • Core Command Practice:

      Bash

      $ cat /proc/mdstat
    
  • What it does: It queries the Linux kernel’s multiple-device (md) driver directly to print a lightweight, text-based snapshot of all currently running arrays.
  • Real-Time Monitoring Hack: When an array is first built or recovering from a drive failure, the kernel actively synchronizes data across the disks. To monitor this progress without repeatedly typing cat, administrators use the watch utility, which by default refreshes the output every two seconds:

      Bash

      $ watch cat /proc/mdstat
    

📊 Understanding the /proc/mdstat Text Readout:

A typical healthy output line looks like this:

Plaintext

md0 : active raid1 sdc1[1] sdb1[0]
      20958464 blocks super 1.2 [2/2] [UU]
  • md0: The name of the virtual logical RAID device node.
  • active raid1: Indicates that the array is actively running under a RAID 1 (Mirroring) configuration framework.
  • sdc1[1] sdb1[0]: Lists the physical disk partition slices that are bound together to create this array.
  • [2/2]: Shows the device count ratio [Total Devices / Healthy Devices]. This means the array expects two operational partitions, and both are successfully online.
  • [UU]: The health status visualizer. Each U stands for Up (Healthy).
    • [UU] means both drives are healthy and in sync.
    • If a drive fails or is pulled out, the readout shifts to [_U] or [U_], instantly showing that the array is running in a degraded state.
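The [UU] readout is easy to check from a script. The sketch below scans /proc/mdstat-style text for a status bracket containing an underscore and prints the name of each affected array. The function name degraded_arrays is hypothetical, and the md1/sde1 lines in the sample are invented for illustration.

```shell
# Hypothetical check: list arrays whose status brackets contain "_" (a missing member).
# Reads /proc/mdstat-style text from standard input.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/  { name = $1 }     # header line: remember the array name
        /\[U*_[U_]*\]/ { print name }    # status brackets with at least one "_"
    '
}

# Sample text: md0 is healthy, md1 has lost a member.
degraded_arrays <<'EOF'
md0 : active raid1 sdc1[1] sdb1[0]
      20958464 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sde1[1]
      20958464 blocks super 1.2 [2/1] [_U]
EOF
```

On a live system the same function could be fed with `degraded_arrays < /proc/mdstat`; empty output would mean no array reports a missing member.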

🛠️ 2. mdadm --detail — Detailed Administrative View

While /proc/mdstat provides a quick overview, it doesn’t give you deep configuration data. To get an exhaustive, itemized breakdown of a specific array, use the --detail flag with the mdadm tool:

  • Core Command Practice:

      Bash

      $ sudo mdadm --detail /dev/md0
    

🔍 Key Fields Provided in the Detailed Output:

  • UUID: The unique tracking identifier assigned to the entire virtual RAID array.
  • Rebuild Status: If the array is recovering, the output adds a progress line (e.g., Rebuild Status : 45% complete). The estimated time to completion appears in /proc/mdstat (the finish= field) rather than in this view.
  • Detailed Partition Roles Matrix: The bottom of the printout maps every partition to its exact state: active sync (working normally), faulty (broken/failed), or spare (idle backup drive ready to take over).

Learning Objectives: Logical Volume Manager (LVM) 🎯

  • 1. Grasping the Core Philosophy of LVM (LVM 개념 및 필요성 이해) 🧠🏗️
    • Objective: Learn what LVM is and why standard physical partitioning scales poorly in production enterprise server environments.
    • Core Competency: Understand how LVM decouples the operating system’s filesystems from raw physical hardware, eliminating fixed disk size boundaries.
  • 2. Mastering the 3-Layer Storage Abstraction (LVM 구성 요소 파악) 🧱📐
    • Objective: Memorize and differentiate the three architectural building blocks that make up an LVM pipeline.
    • Core Competency: Define the precise functions and structural relationships between:
      1. PV (Physical Volume): Raw physical partitions initialized for LVM use.
      2. VG (Volume Group): Combined storage resource pools created by grouping PVs together.
      3. LV (Logical Volume): Flexible virtual partitions carved out of a VG that host the final filesystems.
  • 3. Hands-On Lifecycle Management Commands (LVM 생성 및 확장 방법 숙지) 🛠️⌨️
    • Objective: Gain operational proficiency with the command utilities required to build, monitor, and dynamically resize storage pools on-the-fly without data loss.
    • Core Competency: Master the specific management toolsets across all three layers (e.g., pvcreate, vgcreate, lvcreate) and learn how to scale an active volume dynamically using tools like lvextend.

The Core Necessity of LVM: Overcoming Static Partitioning

🛑 The Problem: The Hard Limits of Static Partitions

Under a traditional storage architecture (like standard fdisk setups), when an operating system partition is built, its size boundaries are hardcoded directly into the physical disk’s partition table. This creates severe administrative bottlenecks in production environments:

  • The Storage Crunch: If a specific directory (e.g., /var or /home) completely runs out of space, you cannot easily steal empty space from an adjacent partition.
  • The High Cost of Scaling: To fix a full partition traditionally, an administrator must shut down services, back up the data, physically install a larger hard drive, recreate the entire partition map, and migrate the data back over. This causes significant system downtime.

🚀 The Solution: What is LVM?

LVM (Logical Volume Manager) solves this by decoupling the operating system’s filesystems from the rigid boundaries of physical storage hardware. Instead of formatting a physical disk directly, LVM pools your physical storage resources together into a flexible software layer.

  • The Virtual Pool Analogy: Think of LVM as turning your hard drives into a giant, fluid pool of water. If a specific volume is running low, you can dynamically pour more storage space into it from the pool.
  • Zero Downtime Scaling: LVM allows administrators to expand storage volumes on-the-fly while the system is actively running, which removes most downtime from storage upgrades. Shrinking is more restrictive: an ext4 filesystem must be unmounted before it can be shrunk, and XFS generally cannot be shrunk at all.

🧱 The 3-Layer Architecture Preview

To manage this fluid space, LVM stacks storage into three distinct logical building blocks:

\[\textbf{PV}\ (\text{Physical Volume}) \longrightarrow \textbf{VG}\ (\text{Volume Group}) \longrightarrow \textbf{LV}\ (\text{Logical Volume})\]

  1. PV (Physical Volume): Raw physical hard drives or partitions initialized to speak the LVM language.
  2. VG (Volume Group): The massive storage pool created by combining multiple PVs together.
  3. LV (Logical Volume): The flexible virtual partitions carved out of the VG. These function just like traditional partitions, meaning they can be formatted with a filesystem and mounted to a directory.
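The three layers map onto three commands: pvcreate, vgcreate, and lvcreate. The sketch below is a dry run that prints the commands rather than executing them, so it is safe to try anywhere. The helper name lvm_plan, the names myVG and myLV, the 5G size, and the device paths are all placeholders for illustration.

```shell
# Hypothetical dry-run helper: print the PV -> VG -> LV commands instead of running them.
lvm_plan() {
    vg=$1; lv=$2; size=$3
    shift 3                                  # the remaining arguments are the partitions
    echo "pvcreate $*"                       # layer 1: initialise each partition as a PV
    echo "vgcreate $vg $*"                   # layer 2: pool the PVs into one VG
    echo "lvcreate -n $lv -L $size $vg"      # layer 3: carve an LV out of the pool
}

lvm_plan myVG myLV 5G /dev/sdb1 /dev/sdc1
```

After reviewing the printed plan, each line would be run with sudo on a machine where the listed partitions are really meant for LVM.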

Dynamic Storage Expansion: lvextend and resize2fs

💡 The Two-Step Expansion Philosophy

When increasing available storage space in LVM, you must expand your allocation across two completely independent structural layers before the operating system can use it:

  1. The Container Layer (Logical Volume): You must first stretch the virtual boundaries of your LVM logical container. This is driven by lvextend.
  2. The Grid Layer (Filesystem): Once the container is larger, the filesystem inside it (such as ext4) is still locked to its original dimensions. You must instruct the filesystem itself to expand and occupy the newly available space inside the enlarged container. For ext4 filesystems, this is driven by resize2fs.

⌨️ Command Execution Breakdown

Step 1: Expand the Logical Volume (lvextend) 📐

To increase the size of an active Logical Volume, use the lvextend command. You can choose to allocate all available unallocated space remaining inside your parent Volume Group pool:

Bash

$ sudo lvextend -l +100%FREE /dev/myVG/myLV
  • -l +100%FREE: Takes all of the free extents left inside the Volume Group (myVG) and appends them to your Logical Volume (myLV).

Step 2: Expand the Filesystem Grid (resize2fs) 🗂️

If you check your disk space (df -h) immediately after running lvextend, your directory will still report its old size. To complete the expansion, you must stretch the ext4 file grid to match the new container boundaries:

Bash

$ sudo resize2fs /dev/myVG/myLV
  • Default Target Size: When no size parameter is added to the end of the command, resize2fs grows the filesystem to fill the whole underlying logical volume.
  • Online Resizing: For modern ext4 filesystems, growing is fully supported online. The filesystem stays mounted, and your applications can read and write data uninterrupted while the storage expands in the background.

🚨 Critical File System Architecture Exception: ext4 vs. XFS

While resize2fs is the tool used for standard setups, it is explicitly bound to the ext4 filesystem family framework.

If your specific Linux installation utilizes the XFS filesystem profile instead, resize2fs will fail. For XFS architectures, you must substitute the second command with the xfs_growfs tool, targeting the active mount directory path rather than the raw device node block:

Bash

# Target the active MOUNT POINT directory for XFS systems, not the /dev device
$ sudo xfs_growfs /mnt/data
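The ext4 versus XFS split can be captured in a small lookup. The sketch below prints the matching grow command for a given filesystem type without running it. The helper name grow_command is hypothetical; the device and mount point are the example values used in this section.

```shell
# Hypothetical helper: print the filesystem-grow command that matches the filesystem type.
grow_command() {
    fs_type=$1; device=$2; mount_point=$3
    case "$fs_type" in
        ext2|ext3|ext4) echo "resize2fs $device" ;;        # takes the device node
        xfs)            echo "xfs_growfs $mount_point" ;;  # takes the mount point
        *) echo "no grow command known for $fs_type" >&2; return 1 ;;
    esac
}

grow_command ext4 /dev/myVG/myLV /mnt/data
grow_command xfs  /dev/myVG/myLV /mnt/data
```

The filesystem type of a mounted volume can be read with `findmnt -no FSTYPE /mnt/data`, so the first argument does not have to be guessed.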

Learning Objectives: System Monitoring 🎯

  • 1. Real-Time Process and Resource Auditing (시스템 모니터링 명령어 사용법 숙지) 📊🕒
    • Objective: Gain operational familiarity with the essential command-line utilities used to track active processes, CPU utilization, and system load averages.
    • Core Competency: Master the interactive terminal utilities—specifically top and htop—to inspect running processes, sort resource consumers, and evaluate real-time system performance.
  • 2. Memory and Swap Space Tracking (메모리 사용량 확인) 🧠💾
    • Objective: Learn how to audit volatile memory allocation and virtual memory utilization across the operating system.
    • Core Competency: Use the free utility (alongside human-readable flags like -h) to check total, used, and available physical RAM, as well as active Swap space utilization.
  • 3. Storage I/O and Disk Space Analysis (디스크 및 기타 자원 모니터링) 🗂️🔍
    • Objective: Keep track of storage consumption and disk space limits to prevent file grid overflows and system lockups.
    • Core Competency: Master filesystem monitoring commands like df (Disk Free) to check capacity thresholds per mount point, and du (Disk Usage) to calculate the precise storage footprint of specific directories and files.

Introduction to System Monitoring

🏛️ The 3 Pillars of System Monitoring

  • 1. CPU & Load Performance (CPU 및 부하 모니터링) 📊🕒
    • Core Focus: Tracking raw processor core utilization percentages alongside overall system load metrics to catch runaway software processes or bottlenecks before they freeze the operating system.
  • 2. Memory Allocation Tracking (메모리 모니터링) 🧠💾
    • Core Focus: Auditing the allocation grid of volatile physical memory (RAM) and virtual storage overflow fields (Swap space) to make sure applications have enough workspace to run efficiently without running out of memory.
  • 3. Storage Architecture & Resource Analysis (디스크 및 기타 자원 모니터링) 🗂️🔍
    • Core Focus: Checking disk space usage thresholds across all active mount points to prevent file grid overflows, while also analyzing directories to see which specific system folders are consuming the most space.

Memory Auditing and Resource Distribution: The free Command

Page 55 details the primary administrative method for auditing a Linux server’s volatile memory footprint using the free utility. It explains how to interpret physical RAM allocation and virtual Swap space distribution to ensure the operating system has enough headroom to run applications smoothly.

📉 1. Auditing Volatile Memory Space (free)

The free command reads the kernel’s memory statistics (exposed in /proc/meminfo) to print a quick, tabular snapshot of the system’s current memory usage.

  • Core Command Practice:

      Bash

      $ free -h
    
  • The Importance of the -h Flag: By default, free outputs raw statistics in kibibytes, which can be difficult to interpret quickly during an emergency. Appending the -h (human-readable) flag automatically scales the values into friendlier units such as mebibytes (Mi) or gibibytes (Gi).

📊 Understanding the Memory Matrix Columns

When you execute the command, the terminal displays two distinct row categories—Mem (Physical RAM) and Swap (Virtual Page File Space)—broken down into six structural columns:

  • total: The absolute total amount of physical RAM or swap space installed and recognized by the hardware layer.
  • used: The total volume of memory actively consumed by running processes, background daemons, and the operating system kernel itself.
  • free: Completely unallocated, pristine memory sectors that are currently untouched by any system tasks.
  • shared: The amount of memory utilized simultaneously by multiple individual processes to communicate or pass data blocks back and forth.
  • buff/cache (Buffers & Cache):
    • Buffers: Temporary storage blocks used to hold data waiting to be written to slow physical disks.
    • Cache: Frequently accessed files stored directly in fast RAM by the Linux kernel to speed up read operations. Linux automatically claims idle RAM for this purpose to optimize system performance.
  • available: The Most Important Metric. This shows the true amount of memory space remaining to launch new applications. Unlike the rigid free column, available factors in the portions of buff/cache that the kernel can instantly reclaim and free up if a new program suddenly demands memory.
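Since available is the column to watch, it can be handy to express it as a percentage. The sketch below reads /proc/meminfo-style text, where MemAvailable is the figure behind the available column, and prints the share of RAM still available. The function name mem_available_pct is hypothetical and the sample numbers are made up.

```shell
# Hypothetical helper: percentage of RAM reported as available.
# Reads /proc/meminfo-style text from standard input (values are in kB).
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%d\n", avail * 100 / total }
    '
}

# Sample figures, not real measurements: 6000000 of 8000000 kB is 75 percent.
mem_available_pct <<'EOF'
MemTotal:        8000000 kB
MemFree:          500000 kB
MemAvailable:    6000000 kB
EOF
```

On a live system the input would be `mem_available_pct < /proc/meminfo`. Note how the sample's MemFree is far lower than MemAvailable, which is the gap the buff/cache explanation describes.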

🔄 2. Analyzing Virtual Memory: The Swap Space Row

The second row in the output tracking layout is dedicated entirely to Swap space.

  • The Safety Net: Swap space is an allocated section of a hard drive or solid-state drive (a partition or a file) that acts as an extension of virtual memory.
  • The Overflow Mechanism: When physical RAM (Mem) runs short, the Linux kernel avoids an immediate out-of-memory failure by moving inactive memory pages out of RAM and writing them temporarily into the slower Swap space on disk.
  • Administrative Warning: While Swap keeps the system alive, relying on it heavily will drastically slow down server performance because storage drives, mechanical or solid-state, are far slower than RAM. A steadily increasing used value in the Swap row is a strong sign of memory pressure, suggesting the workload needs more RAM or a smaller memory footprint.

Process Lifecycle Management: ps, kill, and pkill

🔍 1. Auditing Active Processes (ps)

Before you can manage a process, you need to find it. The ps (Process Status) command generates a static snapshot of the tasks currently running on the system.

  • Standard Practice Command:

      Bash

      $ ps -ef
    
  • What it does:

    • -e: Selects all processes running across the entire operating system, not just the ones owned by the current terminal user.
    • -f: Generates a full-format listing layout. This column grid details critical metadata such as the owner (UID), the unique Process ID (PID), the Parent Process ID (PPID), the execution start time, and the exact command string that launched the program.

🎯 2. Targeted Process Termination (kill)

When an application freezes or consumes too many resources, you can shut it down by sending a termination signal directly to its unique Process ID.

  • Standard Practice Command:

      Bash

      $ kill -[Signal_Number] [PID]
    
  • Core Signal Levels:

    • kill -15 [PID] (SIGTERM): The default and safest termination signal. It politely requests the application to stop, giving it a chance to save its current state, close open database files, and clean up temporary resources before exiting.
    • kill -9 [PID] (SIGKILL): The forced termination signal. It completely bypasses the application logic; the Linux kernel immediately drops the process from memory. This should only be used as a last resort for frozen programs that are completely unresponsive to a standard signal 15 request.
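The "polite first, force last" rule can be sketched as a small function: send signal 15, wait a few seconds, and only then fall back to signal 9. The name stop_gracefully and the default five-second wait are assumptions for illustration.

```shell
# Hypothetical helper: ask a process to exit, then force it only if it ignores the request.
stop_gracefully() {
    pid=$1
    timeout=${2:-5}                             # seconds to wait before escalating
    kill -15 "$pid" 2>/dev/null || return 0     # SIGTERM; give up quietly if it cannot be sent
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        kill -0 "$pid" 2>/dev/null || return 0  # signal 0 only tests that the PID still exists
        sleep 1
        waited=$((waited + 1))
    done
    kill -9 "$pid" 2>/dev/null || true          # SIGKILL as the last resort
}

# Demonstration on a harmless background process.
sleep 60 &
stop_gracefully "$!" 3
```

Most well-behaved programs exit during the wait, so the final kill -9 line is rarely reached.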

📛 3. Bulk Termination by String Matching (pkill)

If an application has spawned dozens of worker processes, finding and typing every individual PID can be tedious. The pkill (Process Kill) command allows you to terminate tasks using their literal process names.

  • Standard Practice Command:

      Bash

      $ pkill -9 firefox
    
  • What it does: The system automatically scans the entire active process tree, identifies every single process containing the keyword “firefox” in its name string, and applies the forced termination signal (9) to all of them simultaneously.


System Log Auditing: journalctl and /var/log/syslog

📄 1. The Traditional Log Store: /var/log/syslog

Historically, Linux systems have recorded a wide variety of global system activities, kernel messages, and application alerts in standardized plain-text log files under /var/log.

  • Core File Path: /var/log/syslog (or /var/log/messages on certain enterprise distributions).
  • Administrative Practice: Because these are raw, append-only text files, administrators typically monitor them using the tail utility combined with the follow flag to stream updates in real time:

      Bash

      $ sudo tail -f /var/log/syslog
    
  • What it captures: Chronological events detailing hardware attachments, authentication tracking, and system daemon startup or failure states.
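Because the file is plain text, ordinary text tools can summarize it. The sketch below counts log lines per program, assuming the classic "Mon DD HH:MM:SS host program[pid]: message" layout (some newer systems use a different timestamp format, which would shift the fields). The function name count_by_program and the sample log lines are invented for illustration.

```shell
# Hypothetical helper: count log lines per program in traditional syslog-format text.
# Assumes the program name is the fifth whitespace-separated field.
count_by_program() {
    awk '
        { prog = $5; sub(/\[.*/, "", prog); sub(/:$/, "", prog); count[prog]++ }
        END { for (p in count) print count[p], p }
    ' | sort -k1,1nr -k2
}

# Sample lines, not taken from a real server.
count_by_program <<'EOF'
Jan 10 09:15:01 server1 sshd[1021]: Accepted password for alice
Jan 10 09:15:07 server1 kernel: usb 1-1: new high-speed USB device
Jan 10 09:16:12 server1 sshd[1033]: Failed password for bob
EOF
```

A lopsided count is often the first hint of which daemon is flooding the log.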

⚡ 2. The Modern Unified Journaling Engine: journalctl

Modern Linux distributions utilizing the systemd init framework route system events through a specialized, high-performance binary logging service called systemd-journald. Because these files are stored in a compressed binary format, you cannot read them with standard tools like cat or nano. Instead, you query them using the journalctl command utility.

⌨️ Essential journalctl Operational Commands:

  • View All System Logs:

      $ journalctl
    
    • What it does: Opens a full, scrollable chronological timeline of every single system event captured by the journal since the earliest recorded boot log.
  • Stream Logs in Real-Time (Live Follow Mode):

      $ journalctl -f
    
    • What it does: Works much like tail -f: it shows the most recent journal entries and then keeps printing new ones as they are recorded. Press Ctrl+C to stop following.
  • Isolate Logs for a Specific Background Service:

      $ journalctl -u nginx
    
    • What it does: Filters out global system noise to show only the logs generated by a specific systemd service unit (e.g., -u nginx or -u sshd), making it easy to troubleshoot why a particular server daemon failed to start.
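The unit filter can be combined with other journalctl options to narrow a search further. The sketch below is only illustrative: the helper name recent_unit_errors is hypothetical, the nginx unit is borrowed from the example above, and the extra options (-p, --since, -n, -b, --no-pager) are standard journalctl flags that the text above does not cover.

```shell
# Hypothetical helper: print recent error-level entries for one systemd unit.
recent_unit_errors() {
    unit=$1
    since=${2:-"1 hour ago"}
    # -u         : only this unit
    # -p err     : priority "err" and anything more severe
    # --since    : only entries newer than the given time
    # -n 20      : at most the 20 most recent matching entries
    # --no-pager : print to the terminal and exit instead of opening a pager
    journalctl -u "$unit" -p err --since "$since" -n 20 --no-pager
}

if command -v journalctl >/dev/null 2>&1; then
    recent_unit_errors nginx || true
    # Everything the unit logged since the current boot (-b), last 20 lines.
    journalctl -u nginx -b -n 20 --no-pager || true
else
    echo "journalctl is not available on this system"
fi
```

Depending on the system, reading the full journal may require sudo or membership in a log-reading group.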

Learning Objectives: Daily Backup System 🎯

  • 1. Architectural Integrity & Automation Strategy (백업 시스템 구축 목적 및 계획 수립) 📋🛡️
    • Objective: Understand why automated backups are essential for server data integrity, disaster recovery, and system administration.
    • Core Competency: Learn how to plan a data preservation routine that targets critical server configuration directories and user spaces automatically.
  • 2. Archive Packaging via the tar Utility (tar 명령어 활용법 숙지) 🗜️📦
    • Objective: Gain operational proficiency with the tar (Tape Archive) utility to compress, package, and extract directory structures.
    • Core Competency: Master the specific command-line options required to bundle extensive system file trees into compact, compressed archive files.
  • 3. Task Automation via Shell Scripting and cron (스크립트 작성 및 자동화 등록) ⚙️⏰
    • Objective: Combine system commands into an executable script file and configure the system to run it on a strict recurring timeline without manual intervention.
    • Core Competency: Learn how to write a shell script that compresses files with dynamic, date-stamped filenames, and register that script inside the system’s cron scheduler daemon for automated daily execution.

The Core Necessity: Why Backup?

🛑 The Core Risks: Why Data Disappears

Server data is fragile and constantly exposed to risk. The lecture categorizes the primary threats to data integrity into three major areas:

  • 1. Hardware Failure (하드웨어 고장) 🔌💥
    • Physical storage media (HDDs, SSDs) have a finite lifespan and can experience catastrophic mechanical or electrical failure at any time, leading to sudden, permanent data loss.
  • 2. Human Error (사용자의 실수) 👤❌
    • System administrators or users can accidentally execute destructive commands (such as rm -rf), overwrite critical configuration files, or mistakenly delete active database directories.
  • 3. Security Breaches & Cyber Attacks (보안 공격 및 랜섬웨어) 🥷🔓
    • Malicious network intruders or automated malware infections can corrupt operating system configurations or deliberately encrypt enterprise data files for ransom.

🎯 The Operational Goals of a Backup System

To counter these risks, a properly engineered backup architecture must achieve three main operational objectives:

  • Data Preservation (데이터 보존): Ensure that a complete, identical copy of essential enterprise data is securely isolated and maintained outside of the active production workspace.
  • Disaster Recovery (재해 복구): Provide a predictable, reliable pipeline to restore data files and return the operating system to a stable, functional state immediately following a system failure.
  • Minimizing Downtime (서비스 중단 최소화): Enable swift recovery workflows to reduce the duration of a system outage, ensuring that critical business services can resume operation with minimal impact on users.

tar Command

🧱 Understanding the Core Options (cvf vs. xvf)

The behavior of the tar command changes based on the options you pass to it. The lecture splits these into two primary workflows: Creating an archive and Extracting an archive.

1. Creating a New Archive Package ➕

To gather a group of files or an entire folder and pack them into a single file, use the create syntax:

Bash

$ tar cvf [Archive_Name.tar] [Target_Directory_or_Files]
  • c (Create): Instructs the utility to build a brand-new archive file.
  • v (Verbose): Prints the name of each file to the terminal as it is added to the archive.
  • f (File): Specifies the filename of the archive you are generating. Note: The archive filename must come immediately after this option, so f goes last in a combined option group such as cvf.

2. Extracting an Existing Archive Package 📂

To unpack a tarball and restore the files back to their original state, use the extract syntax:

Bash

$ tar xvf [Archive_Name.tar]
  • x (Extract): Instructs the utility to unpack and break open the target archive file, reconstructing the original file structure inside the current working directory.
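A complete create-and-restore round trip might look like the sketch below. It works entirely inside a temporary directory with made-up sample files, and it uses two standard tar options not covered above: t, which lists an archive without extracting it, and -C, which switches to a directory before archiving or extracting.

```shell
# Build a small sample tree in a scratch directory (sample data only).
work=$(mktemp -d)
mkdir -p "$work/project/docs"
echo "hello" > "$work/project/readme.txt"
echo "notes" > "$work/project/docs/notes.txt"

# c = create, v = verbose, f = archive name follows.
# -C "$work" makes tar store the relative path "project/..." in the archive.
tar cvf "$work/project.tar" -C "$work" project

# t = list the archive contents without unpacking anything.
tar tf "$work/project.tar"

# x = extract; -C chooses the destination instead of the current directory.
mkdir "$work/restore"
tar xvf "$work/project.tar" -C "$work/restore"

# Confirm that the restored tree is identical to the original.
diff -r "$work/project" "$work/restore/project" && echo "restore matches original"
```

Listing an archive with t before extracting is a useful habit, since it shows where the files will land.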

⚡ Combining Archiving with Compression (z / j)

By default, tar cvf only bundles files together; it does not reduce their total size. To save disk space during a daily backup routine, page 64 of the lecture introduces compression flags that pass the archive through a compression program on the fly:

  • z (gzip Compression) 🔘: Passes the archive through gzip. This is the most common choice because it compresses quickly.
    • File Extension: .tar.gz (or .tgz).
    • Creation Command: tar czvf backup.tar.gz /home/user.
  • j (bzip2 Compression) 🗜️: Passes the archive through bzip2. This takes more CPU time but typically produces a smaller file than gzip.
    • File Extension: .tar.bz2.
    • Creation Command: tar cjvf backup.tar.bz2 /home/user.
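As a rough sketch of how the z flag fits into a daily backup script, the example below builds a date-stamped .tar.gz and compares its size with an uncompressed archive. The directories are temporary stand-ins, the backup_YYYY-MM-DD naming scheme is an assumption, and the script path in the cron comment is hypothetical.

```shell
# Stand-in directories; a real script might target /etc or /home instead.
src_dir=$(mktemp -d)
backup_dir=$(mktemp -d)

# Generate repetitive sample data so the effect of compression is visible.
yes "sample configuration line" | head -n 5000 > "$src_dir/app.conf"

# Date-stamped archive name in YYYY-MM-DD form.
stamp=$(date +%F)
archive="$backup_dir/backup_${stamp}.tar.gz"

# z = compress through gzip while archiving.
tar czf "$archive" -C "$src_dir" .

# Same content without compression, for comparison only.
tar cf "$backup_dir/plain.tar" -C "$src_dir" .

plain_size=$(wc -c < "$backup_dir/plain.tar")
gz_size=$(wc -c < "$archive")
echo "plain tar: $plain_size bytes, gzip tar: $gz_size bytes"

# Reading the archive back (t = list) is a quick integrity check.
tar tzf "$archive" >/dev/null && echo "archive is readable: $archive"

# To run such a script every day at 02:00, a crontab entry (crontab -e)
# could look like this (hypothetical script path):
#   0 2 * * * /usr/local/bin/daily_backup.sh
```

Swapping z for j and the extension for .tar.bz2 gives the bzip2 variant, provided bzip2 is installed.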

Fast and Efficient Data Synchronization: The rsync Command

💡 The Core Advantage: Incremental Replication & Delta Transfer

The defining feature of rsync is its ability to perform incremental backups using a specialized delta-transfer algorithm.

  • The Efficiency Gap: If you have a 10 GB directory and only a single 1 KB text file changes, a traditional copy command (cp) or archiving tool (tar) will re-copy all 10 GB of data.
  • The rsync Solution: It compares the source and destination, skips files whose size and modification time are unchanged, and transfers only new or modified files. When copying over a network, its delta-transfer algorithm goes further and sends only the changed blocks within a file. This drastically reduces disk I/O, network bandwidth consumption, and overall backup execution time.

⌨️ Command Syntax and Key Options

  • Standard Practice Command:

      $ rsync -avh --delete /src/directory/ /dest/directory/
    

Parameter Breakdown:

  • a (Archive Mode) 🗄️: A comprehensive flag that preserves almost all file attributes. It recursively copies directories, preserves symbolic links, retains file modification times, and maintains original user/group permissions and ownerships.
  • v (Verbose) 🗣️: Instructs the utility to output a real-time list of files currently being scanned, compared, and transferred in the terminal window.
  • h (Human-Readable) 📊: Formats all data transfer volumes, file sizes, and throughput speeds into easily readable units like Megabytes (M) or Gigabytes (G) instead of raw bytes.
  • --delete (Mirror Synchronization) ⚠️: Tells rsync to delete files in the destination directory that no longer exist in the source directory. This keeps the backup target an exact mirror of the source and prevents old, deleted files from cluttering your backup storage. Because it removes data, double-check the source and destination paths before running it.
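The mirroring behaviour can be tried safely on temporary directories, as in the sketch below. The file names are made up for the demo, and -n (--dry-run), which is not covered above, is a standard rsync option that previews the changes without applying them.

```shell
if command -v rsync >/dev/null 2>&1; then
    src=$(mktemp -d)
    dest=$(mktemp -d)
    echo "keep" > "$src/keep.txt"
    echo "old"  > "$src/old.txt"

    # Initial sync. The trailing slash on "$src/" means "the contents of src",
    # not the directory itself.
    rsync -avh --delete "$src/" "$dest/"

    # Change the source: remove one file, add another.
    rm "$src/old.txt"
    echo "new" > "$src/new.txt"

    # Preview first: -n reports what would be copied or deleted.
    rsync -avhn --delete "$src/" "$dest/"

    # Real run: new.txt is copied and old.txt is deleted from the mirror.
    rsync -avh --delete "$src/" "$dest/"
    ls "$dest"
else
    echo "rsync is not installed on this system"
fi
```

A dry run before any --delete sync is a cheap safeguard against pointing the command at the wrong destination.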






© 2017. by isme2n
