As beautiful as a shell..
In this project, you’ll create your own little bash.
This is a step-by-step guide to writing your own simple shell, we’ll provide you with a lot of information based on our own experience.
- What we’ll cover:
- readline.
- pipes & redirections.
- processes.
- signals.
- more…
char *readline (char *prompt);
The function readline()
prints a prompt and then reads and returns a single line of text from the user. The line readline
returns is allocated with malloc();
you should free()
the line when you are done with it.
So this is how you’ll get the user input.
char *line;
line = readline("[minishell][:)]~> ");
In order to read a line of text from the user. The line returned has the final newline removed, so only the text remains.
If readline
encounters an EOF
(Ctrl + D) while reading the line, and the line is empty at that point, then (char *)NULL
is returned. Otherwise, the line is ended just as if a newline \n
had been typed.
If you want the user to be able to get at the line later, you must call add_history()
to save the line away in a history list of such lines. By adding this function you’ll be able to navigate through history with up and down and retry some commands.
add_history(*line);
At this point, you displayed a prompt with a working history, here are more things you need to check:
if the line returned contains only spaces and tabs → all you need to do is to display a new prompt.
if there is something in the line then you’ll add it to your history.
while (1)
{
line = readline("[minishell][:)]~> ");
if (!line)
{
printf("exit\n");
exit(1);
}
if (strcmp(*line, "") == EQUAL || ft_strisspace(line))
return (1);
if (strlen(line) > 0)
add_history(line);
}
rl_clear_history (void);
Clear the history list by deleting all of the entries.
After getting the command line you need to execute it properly as bash does.
We’ll give you a brief summary of how you’ll do that. There are 3 steps:
- Lexer / tokenizer
- Parser
- executor
lexing or tokenization is the process of converting a sequence of characters into a sequence of lexical tokens for us we choose to traverse the line returned character by character and tokenize it, tokenizing is basically naming things (ex: this a WORD, this is a PIPE….) at this stage we don’t care if it’s a valid command or which command is it, all we do is discovering what it’s in there.
Technically speaking, let’s see how can we do that:
- We chose to create a doubly-linked list to store our lexer results.
typedef struct s_list
{
t_elem *head;
t_elem *tail;
int size;
} t_list;
This is a list element:
typedef struct s_elem
{
char *content;
int len;
enum e_token type;
enum e_state state;
t_elem *next;
t_elem *prev;
} t_elem;
content: a pointer to the string stored in a node.
len: the content length.
type: the content token.
-
enum e_token
enum e_token { WORD = -1, WHITE_SPACE = ' ', NEW_LINE = '\n', QOUTE = '\'', DOUBLE_QUOTE = '\"', ESCAPE = '\\', ENV = '$', PIPE_LINE = '|', REDIR_IN = '<', REDIR_OUT = '>', HERE_DOC, DREDIR_OUT, };
here we defined the tokens that we’ll need.
state: we choose to add this information, the content state, if it's inside/outside (double/single) quotes.
-
enum e_state
enum e_state { IN_DQUOTE, IN_QUOTE, GENERAL, };
Here is an example of how a line is tokenized:
echo "hello $USER " > file | grep h | cat << eof | cat >> file | echo 'done'
In details:
- we start with the first character ‘e’ its not a special char
**(|, >, <,>>,<<, $, ‘ ‘ )**
, so we store that pointer then we start counting the length till we find the first special character which is in this case ‘WHITE_SPACE’. - We store the white space.
- Now we have ‘DOUBLE QUOTES’ so we change the state from GENERAL (which is the default) to ’IN_DQUOUTE’ and we keep counting till finding the next ‘DOUBLE QUOTES’, then the state is set back to general.
- we store the string inside double quotes in a single node and if there is a
$
we store it as an ENV token.
- we store the string inside double quotes in a single node and if there is a
- when we find SINGLE_QUOTES we store the string inside in a single node without caring if there is a
$
because it will not be expanded.
we keep doing the same thing till the end.
💡 It’s better to write all the functions you’ll need related to linked list before you start implementing your lexer.
These are the functions we used.
// ----> list.c:
int is_empty(t_list *list);
t_list *init_list(t_list *list);
t_elem *new_elem(char *content, int len, enum e_token type, enum e_state state);
void add_tail(t_list *list, t_elem *new);
void free_list(t_list *list);
Now we have our linked list with all the information needed.
⚠️ Before launching the parser we check if there are any syntax errors, if yes we print an error message, then we display another prompt.
For us, Parsing is the process of turning a linked_list of tokens into a command tree.
We chose to implement an AST*(abstract syntax tree)* to store the result of the parser and then execute it recursively, here are the structs we used along the way:
-
Tree_struct:
❓ **- Structs:** allocate enough space to store all of the fields in the struct. The first one is stored at the beginning of the struct, the second is stored after that, and so on. **- Unions:** only allocate enough space to store the largest field listed, and all fields are stored in the same space.typedef struct s_ast_node { enum e_node_type type; t_union *content; } t_ast_node; typedef struct s_ast { t_ast_node *root; } t_ast;
We created the struct s_ast to store the root of the AST, and the struct s_ast_node to store a tree node.
type:
enum e_node_type { CMD, PIPE, };
the tree node can be either a command or a pipeline:
COMMAND#1 | COMMAND#2 | COMMAND#3: __ PIPELINE__ ___/ \____ / \ __ PIPELINE __ COMMAND #3 / \ COMMAND #1 COMMAND #2
-
Pipe_struct:
typedef struct s_pipe { t_ast_node *left; t_ast_node *right; } t_pipe;
-
Command_struct:
typedef struct s_cmd { char **args; t_fd fd; t_redir_list *redir; } t_cmd;
args: a 2D array to store the arguments for each command with args[0] the command name.
fd: a t_fd variable to store the fd_in and fd_out for every command we need to store it when there is a redirection or a pipeline as we’ll see later.
typedef struct s_fd { int in; int out; } t_fd;
redir: a t_redir linked list to store all the redirection related to a single command.
- arg: a string to store the name of the redirection file name for
(>,<,>>)
, or EOF for**(<<)**
. we’ll see this in detail later. - type: the redirection’s type
(REDIR_IN, REDIR_OUT, DREDIR_OUT, HERE_DOC).
typedef struct s_redir_list { t_redir_elem *head; t_redir_elem *tail; int size; } t_redir_list; typedef struct s_redir_elem { char *arg; enum e_token type; t_redir_elem *next; } t_redir_elem;
- arg: a string to store the name of the redirection file name for
After we got the lexer list and the syntax is valid we start by parsing the first command.
-
if the first list node is WORD type we know that it's going to be the command name for sure we store it in args[0] as we said before, and we continue to store the other arguments.
-
if we find an ENV we expand it and store it in the args array.
-
HOW?!
we store the env variables in key/value pairs to make things easier, and we’ve implemented all the functions needed to search/add/delete an env variable and also
functions to convert a list to a 2D array and vice versa.
💡 you can search in the env array directly.
-
-
If we find a (single/double) quotes we store all the content inside in args[i], without forgetting to expand env variables if we have double-quotes.
-
If we find a redirection
(>,<,>>)
we store the type, and the first node of the (WORD/ENV) type is stored in the arg variable as the file name.⚠️ If the arg is an env variable we expand it, if it’s not found we print an **`ambiguous redirect`** error**.**
-
If we find HERE_DOC:
- We open a temporary file and we start reading from
STDIN_FILENO
into it, and we compare every line entered with the**EOF**
character specified by the user, then we stop reading when matched.
💡 We should check if the line contains an `**ENV**` so we can expand it.
- We assign the temporary file name to the arg variable so we can treat it as a redirection.
⚠️ DON’T forget to unlink the file using the unlink()function, so it can be deleted automatically after being used.
- We open a temporary file and we start reading from
-
If we find PIPE_LINE:
- We initialize a pipe tree_node.
- We assign the tree root pointer to the pipe left child and the pipe node becomes the new tree root.
- We parse the next command the same way above to be the pipe right child.
- We repeat the same process recursively.
COMMAND: echo hello $USER1 $USER2 > file | cat -e file | grep h
__ PIPELINE __
___/ \____
/ \
__ PIPELINE __ COMMAND
/ \ |
COMMAND COMMAND ARGS
/ \ | |
ARGS REDIR ARGS 0: grep
| | | 1: h
0: echo > 0: cat
1: ael-khni | 1: -e
2: fathjami file 2: file
Typing...
- In order for
readline
functions to work you need to install it using brew! BUT before that you need to update brew or use the following commandbrew reinstall [email protected]
because brew will fail to install readline for some reason!. - If you have no more space, check with
du -sh /Users/$(USER)/*
- if you encountered this error:
error: unknown type name 'FILE' FILE *outf;
In your header file! you need to putstdio.h
file beforereadline/readline.h
andreadline/history.h
file
becauseFILE
is defined instdio.h