January 2026

MiniSH

A mini shell in C to deeply understand processes, pipes, redirections, and signals.

C · POSIX · Make · fork · execvp · pipe · dup2

TL;DR

After taking Operating Systems Fundamentals, I wanted to move from “understanding on paper” to “understanding in code.” This project came from that need: building a mini shell in C to see, in practice, how processes, pipes, redirections, signals, and dynamic memory work together.

I did not try to clone Bash. I aimed for something more useful for learning: a small, readable system with clear boundaries.


The origin: when theory falls short

There is one very specific moment that pushed me to build this.

In class we talked about fork(), process tables, file descriptors, and execve(). Conceptually, it all made sense. I could answer exam questions. But when I asked myself, “what exactly happens when I type cat file | wc -l in a terminal?” my real answer was: “I kind of know.”

And that “kind of” bothered me.

So I set a goal: build a minimal shell from scratch in C, and force myself to truly solve what a shell solves every time it runs a command.

Not to ship an impressive feature. Not to sell anything. Not to show off complexity.

Just out of properly focused technical curiosity.


The project idea in one sentence

minish is a POSIX-style mini shell for learning that runs external commands, supports pipelines/redirections, and includes basic builtins.

The important part is not the feature list. The important part is that each feature was chosen to exercise a systems concept:

  • External commands -> fork + execvp
  • Pipelines -> pipe + dup2 + correct FD closing
  • Redirections -> open + dup2
  • Shell state -> builtins executed in the parent when needed
  • Interactive control -> SIGINT handling

Real scope (no hype)

What it does

  • Interactive loop with prompt (minish$)
  • Line reading with getline
  • Token/operator parsing: |, <, >, >>
  • Basic support for single and double quotes
  • Common syntax validations
  • External command execution via PATH (execvp)
  • N-command pipelines
  • Input/output redirections
  • Builtins: cd, pwd, echo, exit
  • Basic Ctrl+C handling

What it does not do (intentionally)

  • No variable expansion ($HOME, $?)
  • No &&, ||, ;, & operators
  • No job control (jobs, fg, bg)
  • No full Bash-style parser (advanced escapes/quoting)
  • No history/autocomplete

This point is key when defending the project: explicitly defining boundaries is a mature technical decision, not an accidental limitation.


Architecture: separation of responsibilities

The structure is designed so each module does one thing and does it well:

  • src/main.c

    • Main REPL
    • Interactive prompt
    • SIGINT capture/management
    • Calls parser and executor
  • src/parser.c

    • Line tokenization
    • Operator detection
    • pipeline_t construction
    • Syntax errors
  • src/executor.c

    • Command execution
    • Pipe creation
    • Redirections with dup2
    • fork, execvp, waitpid
  • src/builtins.c

    • cd, pwd, echo, exit
  • src/utils.c

    • Safe memory wrappers (xmalloc, xrealloc, xstrdup)
  • include/minish.h

    • Shared data types
    • Prototypes and module contracts

This split helped me iterate quickly without breaking everything on each change.


End-to-end command flow

When a user types a command, the path is:

  1. main reads the line with getline.
  2. If there is content, it calls parse_line.
  3. The parser returns a pipeline_t structure (or a syntax error).
  4. execute_pipeline walks the commands:
    • It decides whether it is a parent-executed builtin (simple case), or
    • It creates child processes and pipes (general case).
  5. It waits for all children with waitpid.
  6. It takes the last command’s status.
  7. It frees memory (free_pipeline) and returns to the prompt.

The key in this flow is that there is no magic: these are direct, traceable steps.


Important technical decisions and why

1) State-changing builtins run in the parent process

I decided that if there is a single command and it is a builtin, it runs in the parent.

Reason:

  • cd must change the real shell’s working directory.
  • exit must modify global shell state.

If this runs in a child, the effect dies with the child. This is one of those OS lessons you truly internalize when implementing it yourself.

2) execvp instead of absolute/manual paths

Using execvp simplifies things and aligns behavior with a real shell: it looks up binaries in PATH.

That allows running ls, cat, wc, etc. without manually resolving paths.

3) Simple parser with explicit limits

I did not want to build a complex grammar at this stage.

I preferred a linear, predictable parser with clear errors:

  • “Unclosed quotes”
  • “Pipe without command”
  • “Missing file for redirection”
  • “Duplicate redirection”

The goal was robustness and understandability, not full shell syntax support.

4) Dynamic memory for variable-length structures

A shell does not know how many words, commands, or pipes a line will contain.

That is why token_list_t, argv, and pipeline->commands grow with xrealloc. And that is why wrappers (xmalloc, xrealloc, xstrdup) exist, so I do not repeat NULL checks at every call site.

5) Signal handling designed for interactive mode

In the main shell process, SIGINT should not kill the whole shell. In child processes, default behavior is better so external commands react normally to Ctrl+C.

This separation greatly improves interactive UX and mirrors how real shells behave.


Where it was hardest (and why)

File descriptor closing in pipelines

This was the classic point where one small detail breaks everything:

  • Close too early -> command loses input/output
  • Do not close -> deadlocks or leaks

Understanding the correct order of dup2, close, and FD inheritance was one of the most formative parts of the project.

Redirections for parent-executed builtins

For builtins running in the parent process, you need temporary redirection and then restoration of the shell’s stdin/stdout.

If you do not restore them, the shell stays in a broken state for later commands.

Exit status consistency

Running commands is not enough: status codes must be coherent.

In pipelines, minish uses the last command’s status, which is what users and basic scripting expect.


Current drawbacks and known issues

This section is important for an honest defense: what works well today and what is still fragile.

  • Limited quoting/escaping. Current state: supports basic single and double quotes, but not a full Bash-like escaping model; sequences like \n in echo are printed literally. Consequence: some commands that are valid in Bash behave differently here or are not supported.
  • No variable expansion. Current state: does not replace $HOME, $USER, $?, etc. Consequence: common shell scripts and habits do not work as-is.
  • No control-flow operators. Current state: no &&, ||, ;, or background execution with &. Consequence: you cannot chain command logic on a single line like in full shells.
  • No job control. Current state: no jobs, fg, bg. Consequence: no control over background processes.
  • Builtins with limited options. Current state: echo implements a minimal version (a simple -n) and cd, pwd, exit do not cover all variants from mature shells. Consequence: partial compatibility, focused on learning.
  • Fail-fast memory handling. Current state: if malloc/realloc/strdup fails, the program exits. Consequence: valid for an educational context, but no graceful product-level degradation.
  • Mostly manual testing. Current state: validation relies mostly on manual command tests. Consequence: no automated tests to systematically detect regressions.
  • Basic UX and messaging. Current state: error messages are functional but improvable; no history or autocomplete. Consequence: the user experience is still simple.
  • Portability not fully audited. Current state: depends on POSIX APIs (fork, execvp, dup2, waitpid, etc.). Consequence: works well on Unix-like systems, but does not target broad cross-platform support (e.g., native Windows without a POSIX layer).

A concrete execution example

Command:

```shell
cat < salida.txt | wc -c
```

What happens internally, in short:

  1. The parser creates two command_t entries in a pipeline_t.
  2. The executor creates one pipe.
  3. Child 1 (cat):
    • Opens salida.txt
    • dup2(file_fd, STDIN_FILENO)
    • dup2(pipe_write, STDOUT_FILENO)
    • execvp("cat", ... )
  4. Child 2 (wc -c):
    • dup2(pipe_read, STDIN_FILENO)
    • execvp("wc", ... )
  5. Parent closes unused pipe ends and waits for both children.
  6. Returns the last status (wc).

Seen this way, it stops being magic and becomes system mechanics.


Quick code look (real snippet)

This block comes from src/executor.c (original repository), inside execute_pipeline:

```c
pid_t pid = fork();
if (pid < 0) {
    perror("fork");
    if (pipefd[0] != -1) {
        close(pipefd[0]);
    }
    if (pipefd[1] != -1) {
        close(pipefd[1]);
    }
    if (prev_read != -1) {
        close(prev_read);
    }
    for (int j = 0; j < started; ++j) {
        waitpid(pids[j], NULL, 0);
    }
    free(pids);
    return 1;
}
if (pid == 0) {
    signal(SIGINT, SIG_DFL);
    if (prev_read != -1) {
        if (dup2(prev_read, STDIN_FILENO) < 0) {
            perror("dup2");
            _exit(1);
        }
    }
    if (needs_pipe) {
        if (dup2(pipefd[1], STDOUT_FILENO) < 0) {
            perror("dup2");
            _exit(1);
        }
    }
    if (pipefd[0] != -1) {
        close(pipefd[0]);
    }
    if (pipefd[1] != -1) {
        close(pipefd[1]);
    }
    if (prev_read != -1) {
        close(prev_read);
    }
    if (apply_redirections(&pipeline->commands[i]) != 0) {
        _exit(1);
    }
    if (is_builtin(pipeline->commands[i].argv[0])) {
        int st = execute_builtin(shell, pipeline->commands[i].argv);
        _exit(st);
    }
    execvp(pipeline->commands[i].argv[0], pipeline->commands[i].argv);
    fprintf(stderr, "%s: %s\n", pipeline->commands[i].argv[0], strerror(errno));
    _exit(127);
}
pids[i] = pid;
started++;
if (prev_read != -1) {
    close(prev_read);
}
if (pipefd[1] != -1) {
    close(pipefd[1]);
}
prev_read = pipefd[0];
```

This is what the most delicate part looks like in real code: fork, dup2, FD closing, and command execution.

What is happening here, step by step:

  1. fork() splits execution into parent and child. If it fails (pid < 0), the code cleans everything opened so far (pipe ends, prev_read), waits for already-started children, and returns error. This cleanup prevents leaks and inconsistent states.

  2. In the child, SIGINT is reset to default. signal(SIGINT, SIG_DFL) lets the external command respond to Ctrl+C like in a real shell. The main shell process may follow a different policy, but the child should behave “normally.”

  3. dup2 wires the pipeline into the right command. If prev_read exists, it is duplicated to STDIN_FILENO (current command input). If needs_pipe is true, pipefd[1] is duplicated to STDOUT_FILENO (output to next command). In practical terms: input from previous command, output to next command.

  4. After dup2, original FDs are closed. This is key: after duplication, old descriptors are no longer needed. If they stay open, you may cause deadlocks (because a write end remains open) or consume resources unnecessarily.

  5. Redirections are applied and then command execution happens. apply_redirections(...) can override stdin/stdout for <, >, or >>. Then, if the command is a builtin, it runs in the child (pipeline case); otherwise execvp is called. If execvp fails, it exits with 127, the typical code for “command not executable/not found.”

  6. The parent keeps pipeline control. It stores the pid, closes what it no longer needs, and moves prev_read to the read end of the newly created pipe. This pattern allows chaining N commands without mixing descriptors.

In short: this snippet does not just “spawn processes”; it implements a precise choreography of descriptors and signals. Change the order of these steps and the pipeline can break even with the same system calls.


Why this project helped me so much

Because it forced me to answer questions I could avoid in class:

  • “What exactly changes after fork?”
  • “What does a child inherit and what not?”
  • “When should I close each FD?”
  • “Why can’t cd just be an external executable?”
  • “How does a system error surface in terminal UX?”

Most importantly: it forced me to debug real behavior, not just repeat concepts.


Technical lessons I take with me

  1. Systems work is about operation order. The same set of calls can work or fail depending on order.

  2. A clear module API is gold. Separating parser/executor/builtins made the code defendable and maintainable.

  3. Defining scope early avoids chaos. Explicitly deciding what NOT to implement keeps focus and quality.

  4. Clear errors are part of the product. A systems program without useful messages is much harder to use and debug.

  5. Dynamic memory is not optional here. In a shell, almost everything has variable size.


If I had a next iteration

The natural progression would be:

  1. Variable expansion ($VAR, $?)
  2. Logical operators (&&, ||)
  3. More complete quoting/escaping parser
  4. History and autocomplete
  5. Automated parser and execution tests

That way the shell would grow without losing the foundation already built.


Personal closing

The most valuable part of minish is not that it runs ls or supports pipes.

The valuable part is moving from “I understand it theoretically” to “I know exactly what is happening, where it can break, and why.”

If this project has one core idea, it is this: focused technical curiosity turns abstract concepts into real intuition.

And for me, that already made it worth it.