How Linux runs on files - The Linux File System

Today, I find myself staring into the depths of a Debian Linux terminal running on a virtual machine. At first the Linux file system felt like an incomprehensible labyrinth. Coming from a Windows background, where the "C:" drive included everything the system needed, the root directory on Linux, represented simply by a forward slash ( / ), looked like an empty layer. There were no user-friendly folders named "Program Files" or "Users" or "Windows". Instead, there were a bunch of vaguely Latin-sounding abbreviations: etc, var, usr, bin.
We always hear the famous Unix mantra, "Everything is a file," but what does that actually mean when you look at the files? I became Terry and went to brute-force my way into folders (sorry, directories!) and files, googling aggressively along the way. What I found was a brilliantly built system where text files act as the levers and dials for the entire machine.
Is it even on the hard drive? /proc generates files on the fly
My journey began with the most fascinating and deceptive directory in the entire operating system: the process information pseudo-file system at /proc. When I first navigated into this directory, I expected to find standard configuration files. Instead, I discovered a chaotic list of numbered directories alongside files like cpuinfo and meminfo.
I was shocked to find that most of these files report a size of zero bytes, i.e. empty. Yet, when I read them, they overflowed with detailed, real-time hardware data. After researching, I realized that /proc is not a real folder on the hard drive at all; it exists entirely in RAM. It acts as a real-time window into the kernel's activity. Every numbered directory corresponds to a running process ID (PID) and contains files describing what that process is doing, its memory consumption, and its open network ports. Reading (or "opening") a file like /proc/cpuinfo asks the kernel to generate that text on the fly from its internal data structures and stream it straight into your terminal. Rather than forcing developers to use complex programming APIs to query the system for basic data (looking at you, Windows and CoreTemp), Linux abstracts the current state of the machine into simple text files.
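You can see the illusion for yourself. A quick sketch, assuming any Linux box with /proc mounted:

```shell
# /proc files report a size of zero bytes...
ls -l /proc/cpuinfo          # the size column shows 0
# ...yet reading one produces real data, generated on the fly by the kernel
head -n 5 /proc/cpuinfo      # processor, vendor_id, model name, ...
# Every numbered directory is a live process; /proc/self is "this very process"
tr '\0' ' ' < /proc/self/cmdline; echo
```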
If you want to know more about processes and how to find them, click here.
Handling Hardware like Text: /sys and /dev
This thought about virtual files naturally led me to wonder how Linux handles physical hardware. This is where the brilliant relationship between the /sys and /dev directories comes into play.
In the past, hardware management was a messy affair; there weren't many standards and everyone was still figuring it out. Today, on boot, the kernel populates /sys with perfectly organized, highly structured virtual data about every hardware bus and kernel driver. A background process called udev then reads /sys and dynamically generates the interactive device nodes we see in /dev. Essentially, /sys tells you what the hardware is, and /dev is how you use it. Look into udev here.
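A glimpse at both sides of the split (which entries you see depends on your machine's hardware):

```shell
# /sys: structured facts about the hardware, one value per file
ls /sys/class/net/                           # one directory per network interface
cat /sys/class/net/*/address 2>/dev/null     # their MAC addresses, as plain text
# /dev: the interactive nodes generated from that information
ls -l /dev/null /dev/zero /dev/urandom       # the leading "c" marks character devices
```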
The most random-looking files in this hardware abstraction layer solve incredibly specific problems. For example, /dev/null is a black hole. When a script generates output you don't care about, Linux redirects it into /dev/null, completely discarding the data stream without allocating any disk space. It's like shredding papers and then burning them. Conversely, /dev/urandom solves the problem of cryptographic entropy. Generating SSH keys requires unpredictable randomness, so the kernel gathers system noise (disk read timings, network packet arrivals) and exposes it as an endless stream of random bytes through this file (crazy, right?). Whether you are sending data to a void or pulling cryptographic noise from the ether, you do it, once again, by simply reading and writing files.
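Both devices are one ordinary file operation away. A minimal sketch:

```shell
# Discard output you don't care about: redirect stdout and stderr into the void
ls /nonexistent > /dev/null 2>&1     # no output, no error, nothing stored anywhere
# Pull 16 bytes of kernel-gathered randomness and show them as hex
head -c 16 /dev/urandom | od -An -tx1
```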
Networking by reading files: /etc and Routing
From the hardware layer, my curiosity drifted toward the network. How does the system resolve DNS? This led me into the /etc directory.
I opened /etc/resolv.conf, half-expecting to find a list of DNS servers. This file is actually a symbolic link managed by systemd-resolved. This choice solves a major historical problem: if you manually set your DNS servers, connecting to a new Wi-Fi network would blindly overwrite your settings. By using a dynamic symlink, the system can seamlessly switch between office networks, home Wi-Fi, and VPNs without destroying base configurations. Read more about it here.
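On a machine running systemd-resolved you can watch this indirection directly (containers and other setups may use a plain file instead of a symlink):

```shell
ls -l /etc/resolv.conf   # on systemd-resolved machines: a symlink into /run/systemd/resolve/
cat /etc/resolv.conf     # the nameserver lines applications actually read
```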
But how does the system know whether to check a local host file or query those external DNS servers? The answer lives in /etc/nsswitch.conf (the Name Service Switch). This file establishes a strict hierarchy for resolving names, telling the OS to check local files first before querying the internet. This guarantees that every application from your browser to a basic ping command made on terminal resolves network addresses using this logic. Learn about nsswitch here.
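The relevant line is short. A sketch that prints it, falling back to a common default when the file isn't present:

```shell
# "files" = check /etc/hosts first; "dns" = then ask the resolvers in resolv.conf
grep '^hosts:' /etc/nsswitch.conf 2>/dev/null \
  || echo 'hosts: files dns   (a common default)'
```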
Meanwhile, physical packet routing relies on /proc/net/route. Unlike friendlier tools, this file presents the raw routing table with addresses encoded in hexadecimal; I only figured out what I was looking at after some googling. Because routing decisions must be made in fractions of a millisecond, the kernel keeps the table in this compact form, and network daemons read the hex values straight from the kernel rather than waiting for a pretty-printed version.
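The hex is just IPv4 addresses stored little-endian (least significant byte first). A sketch of the decoding, using a hypothetical value as it might appear in the Gateway column:

```shell
gw_hex="0101A8C0"   # hypothetical gateway field from /proc/net/route
# Bytes are little-endian: C0 A8 01 01 -> 192.168.1.1
b1=$(echo "$gw_hex" | cut -c7-8); b2=$(echo "$gw_hex" | cut -c5-6)
b3=$(echo "$gw_hex" | cut -c3-4); b4=$(echo "$gw_hex" | cut -c1-2)
printf '%d.%d.%d.%d\n' "0x$b1" "0x$b2" "0x$b3" "0x$b4"
# prints 192.168.1.1
```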
services.msc of Linux: systemd
Network connections are often useless if the underlying services aren't running, and Linux's initialization system, systemd, is what runs those services. But notice that the system has both a /lib/systemd/system/ and an /etc/systemd/system/ directory.
Why have two folders for the same thing, you may ask? I asked Google, and it told me that the /lib directory is where software vendors place their default service files. The /etc directory is reserved entirely for the local system administrator (you, but with more privilege). If an admin places a modified unit file in /etc with the same name as one in /lib, the system executes the admin's version. This solves the upgrade problem: when you update your system, the package manager safely overwrites the vendor defaults in /lib without ever touching your customizations in /etc.
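The precedence rule can be simulated with two plain directories (demo paths only, not real unit files):

```shell
# Mimic systemd's lookup order: /etc (admin) beats /lib (vendor)
mkdir -p demo/lib demo/etc
echo "vendor default" > demo/lib/myapp.service
echo "admin override" > demo/etc/myapp.service
# Check the admin directory first, fall back to the vendor copy
unit=demo/lib/myapp.service
[ -f demo/etc/myapp.service ] && unit=demo/etc/myapp.service
cat "$unit"          # prints "admin override"
rm -r demo
```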
file_shortcut: Hard Links
My exploration of files led me to one of the strangest things in the Linux file system: hard links. Coming from Windows, I know what a "shortcut" is; in Linux, the closest equivalent is called a symbolic link (symlink). It just points to another file path, and if you delete the original file, the shortcut breaks.
Linux goes one step further than Windows. Because Linux separates the filename from the actual data (the inode), you can create a completely new filename in a different directory that points to the exact same inode as the original file. Both files are completely equal; they share one soul. If you edit File A, File B changes instantly because they are literally the same data blocks on the disk. In Windows, the best you could do is wire up a shortcut in Properties.
What happens when you try to delete them? If you delete File A, the data isn't lost! The system only removes the actual data from the hard drive when all hard links pointing to that specific inode are gone. It completely redefined what "deleting" a file means to me. You aren't actually deleting data; you are just removing a name tag. Only when the last name tag is gone does the system finally reclaim the disk blocks.
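Here's the whole lifecycle in a few commands (throwaway filenames; run anywhere writable):

```shell
echo "hello" > original.txt
ln original.txt hardlink.txt        # a second name for the same inode
ls -li original.txt hardlink.txt    # same inode number, link count now 2
echo "edited" > original.txt
cat hardlink.txt                    # prints "edited" -- same data blocks
rm original.txt                     # removes one name tag...
cat hardlink.txt                    # ...but still prints "edited"
rm hardlink.txt                     # last name gone; now the blocks are freed
```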
Where Do Third-Party Apps Live? (/opt vs /usr/local)
While standard system utilities live in /bin and system configurations in /etc, I realized I didn't know where my own development tools, browsers, and applications, like Docker, GIMP, Firefox, or Chrome, actually belonged. If everything is so highly organized, where do these go?
I discovered that Linux has two specific directories: /opt and /usr/local. The /opt (Optional) directory is where massive, monolithic third-party software bundles live. An application installed here keeps its binaries, libraries, and configs in one self-contained folder (like /opt/google/chrome), completely isolated from the rest of the OS.
On the other hand, /usr/local is designated for software you compile yourself from source code: anything you clone from GitHub and build by hand. By forcing custom software into /usr/local, the OS guarantees that your manual installations will never accidentally overwrite the native software managed by the official package manager. Linux inherently distrusts third-party integrations (and rightfully so!) and uses the directory structure itself to keep the core system clean.
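The split is easy to see on disk (example paths; what's actually present depends on what you've installed):

```shell
# /opt: self-contained vendor bundles, one directory per product
ls /opt 2>/dev/null              # e.g. google/, containerd/
# /usr/local: the classic home for software you built yourself
ls /usr/local/bin 2>/dev/null    # your own compiled tools land here
# The package manager owns /usr/bin and will never fight you over /usr/local
```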
Privilege Separation in Plain Text: /etc/passwd and /etc/shadow
Managing users and services requires proper permissions, which threw me into a fascinating security rabbit hole. In the early Unix days, password hashes were stored directly inside /etc/passwd. The /etc/passwd file is a plain-text file that describes the login accounts on the system. It must be readable by all users (many utilities, like ls, use it to map user IDs to usernames), but writable only by the superuser. And that readability was the problem: because anyone could read the file, anyone could copy the hashes out of it and run offline dictionary attacks.
The solution to this was privilege separation. Linux engineers replaced the password field in /etc/passwd with a simple "x" and moved the actual cryptographic hashes into /etc/shadow. By strictly locking down the permissions of /etc/shadow so only the superuser can read it, Linux secured the system from snooping.
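You can verify the split yourself (the exact group and permission bits vary by distro):

```shell
# The password field for every account is just "x" -- a pointer to /etc/shadow
grep "^root:" /etc/passwd    # e.g. root:x:0:0:root:/root:/bin/bash
# The real hashes live here, readable only by root (and often a "shadow" group)
ls -l /etc/shadow            # e.g. -rw-r----- 1 root shadow ...
```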
Managing Volatile Runtime Data: /run
While /etc holds persistent config data, the /run directory is entirely volatile. Mounted as a tmpfs (Temporary File System), it lives completely in RAM and serves as a database for active sockets, logged-in user sessions, and PID files.
This directory describes the state of the system since it was booted. Historically, this purpose was served by /var/run, which on modern systems is typically just a symlink to /run so that older programs keep working.
This solves a historically annoying problem. In older systems, if a server crashed, an old service lock file would remain on the hard drive. On reboot, the service would see the old file, assume it was already running, and refuse to start. By moving data to /run, the system guarantees that if power is cut, the data is gone. The server boots anew, preventing deadlocks and saving modern SSDs from unnecessary read/write wear.
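A quick check (output differs per system, and some containers set /run up differently):

```shell
findmnt -n -o FSTYPE /run 2>/dev/null   # typically prints "tmpfs" -- RAM-backed
ls /run                                  # sockets, PID files, session data
ls -ld /var/run                          # usually a symlink pointing at /run
```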
The Log Files in /var/log
Even with perfect security and clean boots, things tend to go wrong. That's why Linux has a special directory for storing logs: /var/log. It contains logs from the OS itself, services, and the various applications running on the system. Unlike the static files in /etc or the volatile files in /run, the /var (variable) directory is meant for data that continuously grows.
Linux logs are just plain-text streams, so you don't need fancy graphical event viewers. The tail command is probably one of the single handiest tools at your disposal for viewing log files: it outputs the last part of a file. So, if you issue the command tail /var/log/dpkg.log, it will print only the last few lines of dpkg.log.
But wait, the fun doesn't end there. The tail command has a very important trick up its sleeve: the -f option. When you issue tail -f /var/log/dpkg.log, tail keeps watching the log file and prints each new line as it is written. This means you can follow what is written to the file, as it happens, right in your terminal window.
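A self-contained demo of both modes, using a throwaway file so it works even on a machine without dpkg logs:

```shell
printf 'line1\nline2\nline3\nline4\n' > demo.log
tail -n 2 demo.log     # prints only: line3, line4
# Follow mode: "tail -f demo.log" would now block, printing each new line as
# another process appends it (Ctrl-C to stop). Try it on a real log:
#   tail -f /var/log/dpkg.log
rm demo.log
```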
fig. tail command output
Conclusion: A Beautiful Tree
As I wrap up this exploration, my perspective on the server environment has permanently shifted. As developers, it's easy to treat the server our apps run on as an invisible black box. But the OS isn't magic; it is built on years of hard work by extremely smart folks around the world.
fig. hierarchical file system chart of Linux. source
The Linux file system is not just a random assortment of directories designed to hold executables. It is a brilliant, battle-tested architecture forged by decades of solving complex computer science problems. From the virtual, zero-byte hardware mirrors in /proc, to the strict vendor-admin segregation of systemd, to the shadow password architecture protecting our user accounts, every text file has a highly engineered purpose. Linux doesn't hide its complexity behind a polished GUI (unless it does); it shows the reality of its operations in plain text. You just have to be curious enough to read the files.





