
1  The command line

In this chapter we will explore the fundamentals of the command-line interface (CLI). We will distinguish between Unix, the CLI, Bash and the terminal, along with other concepts from computer science.

As you will see, the CLI is composed of several programs that enable interaction with the machine. We will discuss some of the basic ones for navigating your machine, and some advanced ones that enable complex operations and task automation.

1.1 Command line basics

Before landing in the CLI, let us consider the Unix concept. The first question that arises in this section is: what is Unix? It is simply an operating system (OS). In other words, it is a set of programs that interoperate to let you communicate with the machine. A very important variant (or clone) of Unix is the well-known OS Linux, which was created from scratch by Linus Torvalds. The most important idea behind Unix-based systems is that we can use them to access information and hardware programmatically. Another main feature of Unix-like operating systems is that data is usually stored as text files, and the interface through which users communicate with the machine is also text-based (TUI, text user interface, as opposed to GUI, graphical user interface).

Figure 1.1: A terminal app displaying common features of the command line interface

Almost every computer offers a way to interact with, or access, its inner elements. This interface is called the command-line interface (Fig. 1.1).

1.1.1 File paths

Programs, files and directories on every machine with a Unix-like OS are organized in hierarchical paths (routes) that start from the root, represented by the forward slash character (/). The root is the top of the file system, and all other files and directories are nested beneath it, forming a tree-like structure of paths (Fig. 1.2).

Figure 1.2: A terminal displaying tree-like structure of the programs in a machine with macOS
Tip

You can inspect the paths of a nested directory tree using the tree command in your CLI (here -d lists directories only and -L 1 limits the depth to one level):

tree -d -L 1

There are basically two ways to refer to a location when exploring or navigating your file system. If you always write it starting from the root, you are using an absolute path. For instance, the absolute path to my desktop is /Users/camilogarcia/Desktop. If you instead write it starting from your current location, you are using a relative path.
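As a minimal sketch of the difference between the two kinds of path, the snippet below reaches the same directory first with an absolute path and then with a relative one. The directory names (project, data) and the temporary sandbox are hypothetical and used only for illustration.

```shell
# Build a small sandbox so nothing outside it is touched (hypothetical names).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/project/data"

# Absolute path: it starts at the root (/), so it works from anywhere.
cd "$sandbox/project/data"
abs_location=$(pwd)

# Relative path: it is interpreted from the current working directory.
cd "$sandbox/project"
cd data
rel_location=$(pwd)

# Both routes end in the same place.
echo "$abs_location"
echo "$rel_location"
```

The relative form is shorter, but it only makes sense if you know where you currently are.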

1.1.2 Basic Unix commands

Given that the vast majority of file systems are organized in file paths, the first question when starting with the CLI is “Where am I?”. Unix comes equipped with many commands, but the basic ones are mostly oriented toward answering that question and navigating this text-based interface of files. The following three commands (pwd, cd, ls) will help you conquer the CLI.

1.1.2.1 Printing your working directory

To find out where you are, print your working directory (your current location) using the pwd command.

pwd

1.1.2.2 Change to other directory

cd test-dir
Tip

Some basic arguments to navigate across your terminal:

cd .. # move up to the parent directory
cd ~  # change to your home directory
cd /  # change to the root directory
cd -  # change to the previous directory

1.1.2.3 Listing files

ls 
Tip

You can navigate through your previously executed commands by pressing the up or down arrow keys.

1.1.2.4 Making new directories

mkdir test-dir

1.1.2.5 Creating a file

A simple command to create a file from your terminal is touch. It just creates an empty file; it does not let you edit it.

touch new-file.txt

The file new-file.txt is empty and is created in your current location unless you give another path when creating it. We suggest taking a look at Allison Horst's illustrations, especially the one on naming conventions for files (see Fig. 1.3).

Figure 1.3: Different conventions for naming files or directories as a good computational practice, such as kebab-case or UpperCamelCase. Illustration by Allison Horst
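To illustrate how touch behaves with and without an explicit path, here is a small sketch that works inside a temporary directory. The file name nested-file.txt and the sandbox are assumptions made for the example; test-dir and new-file.txt follow the names used above.

```shell
# Work in a throwaway directory so the example leaves no traces elsewhere.
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir test-dir                    # make a new directory
touch new-file.txt                # created in the current location
touch test-dir/nested-file.txt    # created at another (relative) path

ls            # lists new-file.txt and test-dir
ls test-dir   # lists nested-file.txt
```

Both files are empty; touch only creates them.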

1.1.2.6 Printing files or inputs

cat new-file.txt
some
lines
that
were
written
echo "This will be printed"
This will be printed

1.1.2.7 Removing files or directories

rm
Tip

With a long command, it is practical to jump to its beginning or its end. To do so, use the key combinations Ctrl + A and Ctrl + E, respectively.

rmdir
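A short sketch of how rm and rmdir are typically used, run inside a temporary directory so that nothing important can be deleted. All file and directory names here are hypothetical. Keep in mind that files removed with rm do not go to a trash folder.

```shell
# Prepare a sandbox with one file, one empty and one non-empty directory.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir empty-dir full-dir
touch old-file.txt full-dir/inner.txt

rm old-file.txt    # remove a single file
rmdir empty-dir    # remove a directory; works only if it is empty
rm -r full-dir     # remove a directory and everything inside it

# Count what is left in the sandbox (tr strips padding added by some wc versions).
remaining=$(ls | wc -l | tr -d ' ')
echo "$remaining"
```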

1.1.3 Anatomy of a command

There are still many conventions for naming the parts of a command line, yet a very standard one is presented in Fig. 1.4.

Figure 1.4: A simple command and a convention to call its main components

Some people, for instance, also call the option a flag. These conventions are powerful because almost every command-line interface displays this structure (complex ones add other features, and simple ones tend to lack subcommands).
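As a rough illustration of these parts (not necessarily the same command shown in Fig. 1.4), the sketch below runs ls with and without an option on a hypothetical temporary directory. Here ls is the command, -a is the option (or flag) and the directory path is the argument.

```shell
# A temporary directory with one hidden and one visible file (hypothetical names).
sandbox=$(mktemp -d)
touch "$sandbox/.hidden" "$sandbox/visible"

# command + argument: hidden files are not listed.
plain=$(ls "$sandbox")

# command + option + argument: -a changes the behaviour and shows hidden files too.
with_option=$(ls -a "$sandbox")

echo "$plain"
echo "$with_option"
```

The option modifies how the command behaves, while the argument tells it what to act on.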

Bacterial defense mechanisms to avoid bacteriophage infections are abundant. One of these is the restriction-modification system (RM-system), which works by targeting specific sites called motifs, shared by the phage and the bacterium, with methylations. Motifs are commonly represented as a sequence logo, which is a probabilistic representation of the nucleotides at each position. The challenge consists of finding the number of times the motif from Fig. 1.5 appears in the B. tequilensis EA-CB0015 genome using a single command. Assume that probabilities are equal when multiple bases appear at one site.

Figure 1.5: A RM-system motif logo

Before diving into an answer, take your time to think and solve it on your own.

1.2 Most important skills

When facing the CLI, several issues or problems will arise, as with any other unintuitive challenge such as a completely text-based interface. The most important skills are:

Handling errors

Getting help

Patience

1.3 Intermediate Unix

1.3.1 Special operators or metacharacters

Some operators or metacharacters have special functions in Bash. For instance, the * (wildcard) is a pattern-matching character (sometimes called a placeholder) that matches any sequence of characters, including none; similarly, the ? matches exactly one character. Note that these are shell patterns (globs) rather than regular expressions, where * and ? have different meanings. The $ (dollar sign), in turn, is intended for a special task: retrieving the value of a variable. Once a variable is defined (e.g., var=1), it can be called via the $ operator at any time: echo $var will give us 1 as the standard output.
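The behaviour of these three metacharacters can be sketched on a few empty files. The file names below are hypothetical and are created in a temporary directory only for this example.

```shell
# Create some toy files to match against.
sandbox=$(mktemp -d)
cd "$sandbox"
touch seq1.fasta seq2.fasta seq10.fasta notes.txt

# * matches any run of characters: seq1.fasta, seq2.fasta and seq10.fasta.
star_matches=$(echo seq*.fasta)

# ? matches exactly one character: seq10.fasta is left out.
question_matches=$(echo seq?.fasta)

# $ retrieves the value stored in a variable.
var=1
value=$(echo $var)

echo "$star_matches"
echo "$question_matches"
echo "$value"
```

The shell expands * and ? into the matching file names before the command (here echo) ever runs.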

1.3.2 Intermediate commands

wc
tr
grep 
sed 
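A brief sketch of what each of these four commands does, using a toy text file with three short lines. The file name toy.txt and its content are assumptions made for the example.

```shell
# A toy file with three lines of text.
sandbox=$(mktemp -d)
cd "$sandbox"
printf 'ACGT\nGGCC\nACGA\n' > toy.txt

# wc counts; with -l it counts lines (tr strips padding added by some versions).
line_count=$(wc -l < toy.txt | tr -d ' ')

# tr translates characters one by one; here upper case to lower case.
lowercase=$(tr 'ACGT' 'acgt' < toy.txt)

# grep searches for lines matching a pattern; -c counts the matching lines.
matching=$(grep -c 'ACG' toy.txt)

# sed edits a stream of text; s/T/U/g substitutes every T with U.
replaced=$(sed 's/T/U/g' toy.txt)

echo "$line_count"
echo "$lowercase"
echo "$matching"
echo "$replaced"
```

None of these commands modifies toy.txt; they only print their result to the standard output.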

1.3.3 Unix flows

Tip

When first using the CLI, it is common to feel quite slow. A very useful tip to boost productivity on the command line is autocompletion: press <tab> after typing the first characters of a command or file name.

1.3.3.1 Redirection

Redirecting flow
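As a minimal sketch of redirection, the snippet below sends output to files and reads input from a file using the most common operators. The file names out.txt, errors.txt and missing-file are hypothetical.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

echo "first line"  > out.txt    # > creates the file or overwrites it
echo "second line" >> out.txt   # >> appends to the end of the file

# < feeds the content of a file to a command as its standard input.
line_count=$(wc -l < out.txt | tr -d ' ')

# 2> redirects error messages (standard error) instead of regular output.
# The file missing-file does not exist, so ls reports an error.
ls missing-file 2> errors.txt || true

echo "$line_count"
cat errors.txt
```

Using > on an existing file replaces its content without asking, so >> is the safer choice when you want to keep what is already there.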


1.3.3.2 Pipe

Pipe flow
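A pipe (|) passes the standard output of one command to the next one as its standard input. As a small sketch, the chain below counts how many different names appear in a toy list; the names are made up for the example.

```shell
# printf writes four lines; sort orders them so that duplicates become adjacent;
# uniq collapses adjacent duplicates; wc -l counts the remaining lines.
unique_count=$(printf 'lion\ntiger\nlion\nbear\n' | sort | uniq | wc -l | tr -d ' ')

echo "$unique_count"
```

Each command does one small job, and the pipe lets you combine them without creating intermediate files.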

Tip

With a long command, it is also useful to jump word by word instead of character by character. To move backward or forward one word, use the key combinations Alt + <- and Alt + ->, respectively (this may depend on your terminal's settings).

1.3.4 Loops, conditionals and script variables

The second part of this challenge consists of creating a script from the motif-search one-line command that searches for the motif in each of the genomes of a zip file containing 10 bacterial genomes. The script should include the shebang, loops, conditionals and environment variables.

See a possible script that solves the challenge here.
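As a generic sketch of these building blocks on toy data, the script below combines a shebang, variables, a for loop and an if conditional. The file names and their content are hypothetical, and the task (counting lines) is deliberately simpler than the motif search.

```shell
#!/usr/bin/env bash
# Toy input: three small files in a temporary directory (hypothetical names).
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/one.txt"
printf 'c\n'    > "$workdir/two.txt"
: > "$workdir/empty.txt"          # an empty file

non_empty=0                       # a script variable used as a counter

# Loop: visit every .txt file in the working directory.
for file in "$workdir"/*.txt; do
    # Conditional: -s is true when the file exists and is not empty.
    if [ -s "$file" ]; then
        lines=$(wc -l < "$file" | tr -d ' ')
        echo "$(basename "$file"): $lines lines"
        non_empty=$((non_empty + 1))
    else
        echo "$(basename "$file"): empty, skipped"
    fi
done

echo "Files with content: $non_empty"
```

The first line (the shebang) tells the system which interpreter should run the script when it is saved to a file and executed.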

1.4 Advanced Unix stuff

1.4.1 System permissions

1.4.2 Aliasing

1.4.3 The .bashrc

1.4.4 awk snippet

Restriction endonucleases (REs) cleave DNA by breaking the phosphodiester bond between two nucleotides. Many REs are directed to specific DNA motifs, which are normally palindromic. There are mainly three types, according to their digestion mechanism. REs have been widely used in molecular biotechnology because of their specificity and versatility for carrying out different experiments.

One of the main uses of REs is to generate a pattern of restricted fragments from different organisms so that samples of organisms, sequences or genes can be distinguished, as long as they differ in the number of recognition motifs. This is normally done in the lab, where an RE is mixed with a DNA sample and later an electrophoresis gel is run to see a separation pattern according to fragment size.

Professor Javier has sequenced the genome of a sampled SARS-CoV-2 and wants to see the band pattern that the genome would display if it were digested with the RE EcoRV. He has asked you to help him with this problem. The expected output is a text file with the sizes of the fragments, where the size is the number of nucleotides of each fragment.
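One possible way to approach this with awk is sketched below on a short made-up sequence rather than on the real genome. It assumes that the sequence is linear, upper case and stored on a single line, and it relies on the fact that EcoRV recognizes GATATC and cuts in the middle of the site (GAT|ATC), leaving blunt ends. The sequence and the output file name fragment-sizes.txt are hypothetical.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# Toy sequence with two EcoRV sites: AA + GATATC + CCC + GATATC + GG.
seq="AAGATATCCCCGATATCGG"

printf '%s\n' "$seq" | awk '{
    # Split the sequence at every recognition site.
    n = split($0, parts, "GATATC")
    for (i = 1; i <= n; i++) {
        len = length(parts[i])
        if (i > 1) len += 3   # ATC half left by the site on the left
        if (i < n) len += 3   # GAT half left by the site on the right
        print len
    }
}' > fragment-sizes.txt

fragment_sizes=$(cat fragment-sizes.txt)
echo "$fragment_sizes"
```

For a real genome in FASTA format, the header line would need to be removed and the sequence lines joined before applying this idea.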

For more explanations of the basic commands of the command line, we suggest visiting the first chapters of Computing Skills for Biologists by Allesina & Wilmes (2019).

A list of readings for this section:

Dudley & Butte (2009)

Perkel et al. (2021)

Brandies & Hogg (2021)