January 2026

MiniSH

A mini shell in C to deeply understand processes, pipes, redirections, and signals.

C · POSIX · Make · fork · execvp · pipe · dup2

TL;DR

After taking Operating Systems Fundamentals, I wanted to move from “understanding on paper” to “understanding in code.” This project came from that need: building a mini shell in C to see, in practice, how processes, pipes, redirections, signals, and dynamic memory work together.

I did not try to clone Bash. I aimed for something more useful for learning: a small, readable system with clear boundaries.


The origin: when theory falls short

There is one very specific moment that pushed me to build this.

In class we talked about fork(), process tables, file descriptors, and execve(). Conceptually, it all made sense. I could answer exam questions. But when I asked myself, “what exactly happens when I type cat file | wc -l in a terminal?” my real answer was: “I kind of know.”

And that “kind of” bothered me.

So I set a goal: build a minimal shell from scratch in C, and force myself to truly solve what a shell solves every time it runs a command.

Not to ship an impressive feature. Not to sell anything. Not to show off complexity.

Just out of properly focused technical curiosity.


The project idea in one sentence

minish is a POSIX-style mini shell for learning that runs external commands, supports pipelines/redirections, and includes basic builtins.

The important part is not the feature list. The important part is that each feature was chosen to exercise a systems concept:

  • External commands -> fork + execvp
  • Pipelines -> pipe + dup2 + correct FD closing
  • Redirections -> open + dup2
  • Shell state -> builtins executed in the parent when needed
  • Interactive control -> SIGINT handling

Real scope (no hype)

What it does

  • Interactive loop with prompt (minish$)
  • Line reading with getline
  • Token/operator parsing: |, <, >, >>
  • Basic support for single and double quotes
  • Common syntax validations
  • External command execution via PATH (execvp)
  • N-command pipelines
  • Input/output redirections
  • Builtins: cd, pwd, echo, exit
  • Basic Ctrl+C handling

What it does not do (intentionally)

  • No variable expansion ($HOME, $?)
  • No &&, ||, ;, & operators
  • No job control (jobs, fg, bg)
  • No full Bash-style parser (advanced escapes/quoting)
  • No history/autocomplete

This point is key when defending the project: explicitly defining boundaries is a mature technical decision, not an accidental limitation.


Architecture: separation of responsibilities

The structure is designed so each module does one thing and does it well:

  • src/main.c

    • Main REPL
    • Interactive prompt
    • SIGINT capture/management
    • Calls parser and executor
  • src/parser.c

    • Line tokenization
    • Operator detection
    • pipeline_t construction
    • Syntax errors
  • src/executor.c

    • Command execution
    • Pipe creation
    • Redirections with dup2
    • fork, execvp, waitpid
  • src/builtins.c

    • cd, pwd, echo, exit
  • src/utils.c

    • Safe memory wrappers (xmalloc, xrealloc, xstrdup)
  • include/minish.h

    • Shared data types
    • Prototypes and module contracts

This split helped me iterate quickly without breaking everything on each change.


End-to-end command flow

When a user types a command, the path is:

  1. main reads the line with getline.
  2. If there is content, it calls parse_line.
  3. The parser returns a pipeline_t structure (or a syntax error).
  4. execute_pipeline walks the commands:
    • It decides whether it is a parent-executed builtin (simple case), or
    • It creates child processes and pipes (general case).
  5. It waits for all children with waitpid.
  6. It takes the last command’s status.
  7. It frees memory (free_pipeline) and returns to the prompt.

The key in this flow is that there is no magic: these are direct, traceable steps.


Important technical decisions and why

1) State-changing builtins run in the parent process

I decided that if there is a single command and it is a builtin, it runs in the parent.

Reason:

  • cd must change the real shell’s working directory.
  • exit must modify global shell state.

If this runs in a child, the effect dies with the child. This is one of those OS lessons you truly internalize when implementing it yourself.

2) execvp instead of absolute/manual paths

Using execvp simplifies things and aligns behavior with a real shell: it looks up binaries in PATH.

That allows running ls, cat, wc, etc. without manually resolving paths.

3) Simple parser with explicit limits

I did not want to build a complex grammar at this stage.

I preferred a linear, predictable parser with clear errors:

  • “Unclosed quotes”
  • “Pipe without command”
  • “Missing file for redirection”
  • “Duplicate redirection”

The goal was robustness and understandability, not full shell syntax support.

4) Dynamic memory for variable-length structures

A shell does not know how many words, commands, or pipes a line will contain.

That is why token_list_t, argv, and pipeline->commands grow with xrealloc. And that is why wrappers (xmalloc, xrealloc, xstrdup) exist, so I do not repeat NULL checks at every call site.

5) Signal handling designed for interactive mode

In the main shell process, SIGINT should not kill the whole shell. In child processes, default behavior is better so external commands react normally to Ctrl+C.

This separation greatly improves interactive UX and mirrors how real shells behave.


Where it was hardest (and why)

File descriptor closing in pipelines

This was the classic point where one small detail breaks everything:

  • Close too early -> command loses input/output
  • Do not close -> deadlocks or leaks

Understanding the correct order of dup2, close, and FD inheritance was one of the most formative parts of the project.

Redirections for parent-executed builtins

For builtins running in the parent process, you need temporary redirection and then restoration of the shell’s stdin/stdout.

If you do not restore them, the shell stays in a broken state for later commands.

Exit status consistency

Running commands is not enough: status codes must be coherent.

In pipelines, minish uses the last command’s status, which is what users and basic scripting expect.


Current drawbacks and known issues

This section is important for an honest defense: what works well today and what is still fragile.

  • Limited quoting/escaping. Current state: supports basic single and double quotes, but not a full Bash-like escaping model; sequences like \n in echo are printed literally. Consequence: some commands that are valid in Bash behave differently here or are not supported.
  • No variable expansion. Current state: does not replace $HOME, $USER, $?, etc. Consequence: common shell scripts and habits do not work as-is.
  • No control-flow operators. Current state: no &&, ||, ;, or background execution with &. Consequence: you cannot chain command logic on a single line like in full shells.
  • No job control. Current state: no jobs, fg, bg. Consequence: no control over background processes.
  • Builtins with limited options. Current state: echo implements a minimal version (a simple -n) and cd, pwd, exit do not cover all variants from mature shells. Consequence: partial compatibility, focused on learning.
  • Fail-fast memory handling. Current state: if malloc/realloc/strdup fails, the program exits. Consequence: valid for an educational context, but no graceful product-level degradation.
  • Mostly manual testing. Current state: validation relies mostly on manual command tests. Consequence: no automated tests to systematically detect regressions.
  • Basic UX and messaging. Current state: error messages are functional but improvable; no history or autocomplete. Consequence: the user experience is still simple.
  • Portability not fully audited. Current state: depends on POSIX APIs (fork, execvp, dup2, waitpid, etc.). Consequence: works well on Unix-like systems, but does not target broad cross-platform support (e.g., native Windows without a POSIX layer).

A concrete execution example

Command:

```shell
cat < salida.txt | wc -c
```

What happens internally, in short:

  1. The parser creates two command_t entries in a pipeline_t.
  2. The executor creates one pipe.
  3. Child 1 (cat):
    • Opens salida.txt
    • dup2(file_fd, STDIN_FILENO)
    • dup2(pipe_write, STDOUT_FILENO)
    • execvp("cat", ... )
  4. Child 2 (wc -c):
    • dup2(pipe_read, STDIN_FILENO)
    • execvp("wc", ... )
  5. Parent closes unused pipe ends and waits for both children.
  6. Returns the last status (wc).

Seen this way, it stops being magic and becomes system mechanics.


Quick code look (real snippet)

This block comes from src/executor.c (original repository), inside execute_pipeline:

```c
pid_t pid = fork();
if (pid < 0) {
    perror("fork");
    if (pipefd[0] != -1) {
        close(pipefd[0]);
    }
    if (pipefd[1] != -1) {
        close(pipefd[1]);
    }
    if (prev_read != -1) {
        close(prev_read);
    }
    for (int j = 0; j < started; ++j) {
        waitpid(pids[j], NULL, 0);
    }
    free(pids);
    return 1;
}
if (pid == 0) {
    signal(SIGINT, SIG_DFL);
    if (prev_read != -1) {
        if (dup2(prev_read, STDIN_FILENO) < 0) {
            perror("dup2");
            _exit(1);
        }
    }
    if (needs_pipe) {
        if (dup2(pipefd[1], STDOUT_FILENO) < 0) {
            perror("dup2");
            _exit(1);
        }
    }
    if (pipefd[0] != -1) {
        close(pipefd[0]);
    }
    if (pipefd[1] != -1) {
        close(pipefd[1]);
    }
    if (prev_read != -1) {
        close(prev_read);
    }
    if (apply_redirections(&pipeline->commands[i]) != 0) {
        _exit(1);
    }
    if (is_builtin(pipeline->commands[i].argv[0])) {
        int st = execute_builtin(shell, pipeline->commands[i].argv);
        _exit(st);
    }
    execvp(pipeline->commands[i].argv[0], pipeline->commands[i].argv);
    fprintf(stderr, "%s: %s\n", pipeline->commands[i].argv[0], strerror(errno));
    _exit(127);
}
pids[i] = pid;
started++;
if (prev_read != -1) {
    close(prev_read);
}
if (pipefd[1] != -1) {
    close(pipefd[1]);
}
prev_read = pipefd[0];
```

This is what the most delicate part looks like in real code: fork, dup2, FD closing, and command execution.

What is happening here, step by step:

  1. fork() splits execution into parent and child. If it fails (pid < 0), the code cleans everything opened so far (pipe ends, prev_read), waits for already-started children, and returns error. This cleanup prevents leaks and inconsistent states.

  2. In the child, SIGINT is reset to default. signal(SIGINT, SIG_DFL) lets the external command respond to Ctrl+C like in a real shell. The main shell process may follow a different policy, but the child should behave “normally.”

  3. dup2 wires the pipeline into the right command. If prev_read exists, it is duplicated to STDIN_FILENO (current command input). If needs_pipe is true, pipefd[1] is duplicated to STDOUT_FILENO (output to next command). In practical terms: input from previous command, output to next command.

  4. After dup2, original FDs are closed. This is key: after duplication, old descriptors are no longer needed. If they stay open, you may cause deadlocks (because a write end remains open) or consume resources unnecessarily.

  5. Redirections are applied and then command execution happens. apply_redirections(...) can override stdin/stdout for <, >, or >>. Then, if the command is a builtin, it runs in the child (pipeline case); otherwise execvp is called. If execvp fails, it exits with 127, the typical code for “command not executable/not found.”

  6. The parent keeps pipeline control. It stores the pid, closes what it no longer needs, and moves prev_read to the read end of the newly created pipe. This pattern allows chaining N commands without mixing descriptors.

In short: this snippet does not just “spawn processes”; it implements a precise choreography of descriptors and signals. Change the order of these steps and the pipeline can break even with the same system calls.


Why this project helped me so much

Because it forced me to answer questions I could avoid in class:

  • “What exactly changes after fork?”
  • “What does a child inherit and what not?”
  • “When should I close each FD?”
  • “Why can’t cd just be an external executable?”
  • “How does a system error surface in terminal UX?”

Most importantly: it forced me to debug real behavior, not just repeat concepts.


Technical lessons I take with me

  1. Systems work is about operation order. The same set of calls can work or fail depending on order.

  2. A clear module API is gold. Separating parser/executor/builtins made the code defendable and maintainable.

  3. Defining scope early avoids chaos. Explicitly deciding what NOT to implement keeps focus and quality.

  4. Clear errors are part of the product. A systems program without useful messages is much harder to use and debug.

  5. Dynamic memory is not optional here. In a shell, almost everything has variable size.


If I had a next iteration

The natural progression would be:

  1. Variable expansion ($VAR, $?)
  2. Logical operators (&&, ||)
  3. More complete quoting/escaping parser
  4. History and autocomplete
  5. Automated parser and execution tests

That way the shell would grow without losing the foundation already built.


Personal closing

The most valuable part of minish is not that it runs ls or supports pipes.

The valuable part is moving from “I understand it theoretically” to “I know exactly what is happening, where it can break, and why.”

If this project has one core idea, it is this: focused technical curiosity turns abstract concepts into real intuition.

And for me, that already made it worth it.