What is the command-line?

The command-line is a way for you to communicate with your computer.

Imagine that you needed some information from a distant library, in a city where a friend of yours lives. Your friend is willing to help, but knows nothing about the subject matter of the information you want---they can get to the library, but from there on out you're going to have to tell them what to do. Your friend goes to the library, and you start a telephone conversation with each other.

You might start asking your friend questions in this situation, like:

What part of the library are you in?
Give me a list of some of the books you see there.

Based on that information, you might then ask your friend to start doing things for you, like:

Take from the shelf the book with the title "Cheese: A Cultural History".
Read for me the first several lines of the book.

The command-line is kind of like this scenario, except your friend on the other end of the line is the computer. And you're asking it questions not about books in a library, but about files on the computer.

Another difference is that your friend in the library can understand human language, and (as a human) is clever enough to figure out your intent, even if you use ambiguous, misleading, or sarcastic language. A computer, on the other hand, can't understand human language. You have to communicate with it through a more limited language of pre-programmed verbs and nouns, following very strict syntax.

But why? Surely we've advanced past such barbarities.

This style of interaction with a computer was invented soon after the invention of computers themselves. It's a very simple way for a programmer to create an interface to the computer's functionality, and an efficient way for human operators to unambiguously communicate their intent about what they want the computer to do.

Contemporary "graphical" user interfaces (GUIs) have existed in some form for decades. An early example of the GUI can be found in Doug Engelbart's so-called "Mother of All Demos", presented in 1968. The Xerox Alto project, developed in the 1970s, established many conventions in GUIs that we still use today, and served as the inspiration for the Apple Macintosh.

But the command-line has a number of advantages over the GUI. For example, this command-line command:

$ cp file1.html animals/feline.html

... takes a file called file1.html and makes a copy called feline.html in a folder called animals. (cp is the UNIX command for "copy.") For an experienced user, performing this operation on the command-line can be much faster than performing the tasks necessary to do it in the GUI (which might involve opening several "windows," dragging an icon with the mouse, performing right clicks, etc.).
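If you'd like to try this yourself without touching any of your real files, here's a minimal sketch as a small shell script. It works inside a fresh temporary directory made with mktemp -d, and it creates a made-up file1.html (its contents are just an assumption for the demo) so there's something to copy.

```shell
# Work in a throwaway temporary directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical source file and destination folder for the demo.
printf '<p>Cats are felines.</p>\n' > file1.html
mkdir animals

# The copy itself: the same command as above, minus the $ prompt.
cp file1.html animals/feline.html

# The original is still there, and the copy has identical contents.
copied=$(cat animals/feline.html)
echo "$copied"
```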

The command-line interface also easily allows for multiple actions to be combined into a single action, or for one program to use another program's output as input. Here's another example command-line command:

$ cut -d ',' -f 2-3 data.txt | sort | uniq | grep 'cheese'

This command extracts the second and third fields of a comma-separated values file, sorts the values in alphabetical order, eliminates all of the duplicate lines, and then filters the result to include only lines that contain the string cheese. In order to perform this same task in a GUI, you'd need to either cut-and-paste your data between different programs that accomplished the individual tasks (one program to select parts of the data, another to sort it), or you'd have to find a single program that supported all of the desired features.
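Here's that pipeline run on a tiny, made-up data.txt, as a sketch you can paste into a shell. The file contents are purely illustrative. The script also sets LC_ALL=C, which makes sort compare raw byte values so the order comes out the same on every machine.

```shell
export LC_ALL=C              # byte-by-byte sorting, same on every machine
cd "$(mktemp -d)" || exit 1  # scratch directory for the demo file

# Hypothetical comma-separated data: id, food, country.
printf '%s\n' \
  '1,cheese,France' \
  '2,bread,France' \
  '3,cheese,France' \
  '4,cheese,Italy' \
  '5,wine,Italy' > data.txt

# Fields 2-3, sorted, duplicates removed, only lines containing "cheese".
result=$(cut -d ',' -f 2-3 data.txt | sort | uniq | grep 'cheese')
echo "$result"
```

The two cheese,France lines collapse into one because sort puts them next to each other and uniq then drops the repeat. (uniq only removes duplicates that are adjacent, which is why it usually comes right after sort.)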

The UNIX command-line

Nearly all operating systems provide a command-line interface of some sort or other. (I cut my teeth on the MS-DOS command-line, an analog of which is still available on most Windows machines as cmd.exe.) When most people think of the command-line in a contemporary context, they're thinking of the UNIX command-line.

SIDEBAR: 'Wait, what's an "operating system"?' I hear you ask. Good question! An operating system is the software that runs on your computer that provides the basic functionality necessary for other programs to function---everything from interfaces to your computer's components, like its hard drive or peripherals (mouse, printer, etc.) to things that you see as a user, like the user interface. You probably use multiple operating systems throughout the day: Windows or OS X on your computer, Android or iOS on your mobile phone.

UNIX is a family of operating systems that originated in the 1970s. UNIX and UNIX-like systems are still frequently used today, in particular a clone called Linux. OS X is itself a derivative of UNIX (with a fancy proprietary GUI).

When UNIX was first being developed, and subsequently in its history, the programmers on the project came up with a series of command-line tools to help them accomplish tasks and solve common problems. It turns out that programmers, like other kinds of writers, deal with text a LOT, and so many of the tools they developed deal with text: filtering text, sorting text, modifying text. Over time, many other programmers have contributed to these tools, adding functionality and fixing bugs. For my money, they're some of the most useful things that writers, researchers and computer users in general can learn.

You can also use these tools creatively. So we're going to learn how to use them.

How do computers think about text?

Text can be divided into any number of different, overlapping units (document, page, section, subsection, chapter, clause, sentence, ascender, descender, act, stanza, syllable, foot, etc.) but only some of these are easy for computers to work with. (It's harder than you think to teach a computer what a "sentence" is, for example.)

The two most obvious units of text in a computer are:

the character, i.e., the byte (or series of bytes) that represents a single element of written language (e.g., A through Z in English, any one of many glyphs in Chinese, etc.)
the file, i.e., an ordered collection of characters

Somewhere in between these two is the line, a formal unit of text that has been part of written language from the beginning. (Cuneiform, an ancient writing system, was already written in lines, for example.) The line arises in written text because writing transcribes speech, which is a one-dimensional medium, onto two-dimensional surfaces (paper, clay, stone, etc.). Line breaks in text are, fundamentally, a way of using up all of the space allotted on a surface.

But line breaks also serve syntactic, semantic, and metrical functions, as in poetry. Here is "Sea Rose" by H.D.:

SEA ROSE

Rose, harsh rose, 
marred and with stint of petals, 
meagre flower, thin, 
spare of leaf,

more precious 
than a wet rose 
single on a stem -- 
you are caught in the drift.

Stunted, with small leaf, 
you are flung on the sand, 
you are lifted 
in the crisp sand 
that drives in the wind.

Can the spice-rose 
drip such acrid fragrance 
hardened in a leaf?

In computer text, the line is often used as a "record marker." This is how a text file can be used as a rudimentary database, with one "record" per line. (A plain-text table of NBA stats, with one line per player, is a typical example.)

Perhaps because of these parallelisms (text layout/poetic structure/database structure), many programs that operate on text use the line as their fundamental unit---especially those in UNIX (coming right up). The programs that we write in this class will do the same.
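As a small preview of those tools, here's a minimal sketch of a text file acting as a rudimentary database, one record per line. The file name records.txt and its contents are made up for illustration; wc and grep are both covered later in this tutorial.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory for the demo file

# A hypothetical database: one record per line (name, team, points).
printf '%s\n' \
  'Alice Hawks 21' \
  'Bashir Comets 17' \
  'Chen Hawks 30' > records.txt

# Counting lines counts records. $(( )) trims any padding wc adds.
count=$(( $(wc -l < records.txt) ))
echo "$count records"

# Selecting lines selects records: every player on the Hawks.
hawks=$(grep 'Hawks' records.txt)
echo "$hawks"
```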

Getting started: Logging In

If you're already familiar with UNIX (or some variant thereof) and have access to a UNIX machine, then you're good to go, and you can skip to the next section.

For the duration of this course, there is a UNIX server (well, Amazon Linux) available online for student use. In order to log into it, though, you'll need an SSH client. (SSH means "secure shell": it's a protocol that allows you to log in to remote machines with secure data encryption.)

If you're using OS X, then you've already got a good SSH client. Open Terminal (Applications > Utilities > Terminal); at the prompt, type the following:

ssh -p port_number your_user_name@server_name

... replacing your_user_name with the username I gave you in class, server_name with the server name I gave in class, and port_number with the port number I gave in class. (I'm not putting these in the notes in order to mitigate hacking attempts from third parties. E-mail me if you need me to remind you about this information.)

You'll be prompted for a password, which I gave in class. Come see me if you missed it, or if you've forgotten the new password you chose when you changed it in class.

If you're using Windows, you'll need to download an SSH client. I recommend PuTTY. The file you want to download is putty-0.63-installer.exe (though by the time you read this, the number 63 might be different; it's fine to download newer versions). If you get stuck, video walkthroughs of installing PuTTY on a Windows machine are easy to find on YouTube.

If you're using Linux, you probably already know the drill. Open your terminal emulator and SSH to the server, using the same command given above for OS X.

When you've successfully reached the command line (another line!), you should see something like this:

[aparrish@ip-172-30-0-159 ~]$

This is the "prompt" (because it "prompts" you to do something). It's telling you your username, the server you're logged into, and the current directory. More on that later.

Keystrokes you should know

The keys you type on the command-line generally do what you think they will: they print the character you typed to the screen. The command-line also has a number of special keystrokes that have particular meanings. Two are important to know from the very beginning.

Ctrl+D tells a program that is waiting for your input that you're done typing. For example, the sort command, when run on its own, will wait for you to type in the lines that you want to sort. Hit Ctrl+D to tell sort that you've entered your last line.

Ctrl+C signals to a program that you want it to stop doing whatever it's doing immediately, even if it hasn't yet completed its task. If, for example, you accidentally used the wrong file in an operation and you want the operation to stop (because it's printing out too many lines, or the wrong lines), hit Ctrl+C.

Summed up:

Ctrl+D: "I'm done typing things in. kthxbye."
Ctrl+C: "You're doing something I don't like. Please stop."

Notably, Ctrl+D is also used to signal to the command-line that you're done entering commands. At the prompt, hit Ctrl+D to log out. (You can accomplish the same thing by typing exit and hitting return.)

Your first UNIX commands

First off, we're going to create a directory, so that you can find it later and you don't risk overwriting something:

$ mkdir workshop 
$ cd workshop

(Don't type the $! That's just there to indicate that those commands should be typed at the command line.)

The mkdir command means "make directory" ("directory" is just UNIX-speak for "folder"). When you're using the command line, there's one directory on your machine that is considered your "current" directory, i.e., the directory you're doing stuff in. The cd ("change directory") command makes the directory you give it (workshop in this case) the current directory.
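Here's a minimal sketch of those two commands as a script, run inside a temporary directory so it can't clash with anything you already have. It also uses pwd (listed under "Other helpful commands" at the end of these notes) to confirm where cd put you.

```shell
# Start from a fresh temporary directory.
cd "$(mktemp -d)" || exit 1

mkdir workshop          # make a new directory named "workshop"
cd workshop || exit 1   # make "workshop" the current directory

here=$(pwd)             # pwd prints the current directory
echo "$here"            # ends in /workshop
```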

There are (broadly) two kinds of commands in UNIX: commands that work on lines of input/output, and commands that operate on files and directories. The mkdir and cd commands are examples of the latter. We're primarily concerned with the former. Let's start with cat:

$ cat

(Make sure to hit "return" after you type cat.) Now type. After you enter a line, cat will print out the same line. It's the simplest text filter possible (one rule: let everything through).

When you're done with cat, press Ctrl+D. Let's try something more interesting, like grep:

$ grep foo

Now type some lines of text. Try typing, for example, I like drink and then I like food. The grep command only prints out lines that "match" the string of characters that follows the command (foo, in this case). Let's try it again, this time with a different "pattern":

$ grep you

If we cut and paste the poem above ("Sea Rose") into the terminal application, the resulting output would look like this:

you are caught in the drift.
you are flung on the sand,
you are lifted
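You don't have to type the input by hand, either. In this sketch, printf writes each of its quoted arguments on its own line, and the pipe character (|, explained under "Sorting and piping") feeds those lines to cat or grep as if you had typed them. The sample lines are the ones suggested above, plus two lines from "Sea Rose."

```shell
# cat lets every line through unchanged.
echoed=$(printf '%s\n' 'I like drink' 'I like food' | cat)
echo "$echoed"

# grep foo lets through only the lines that contain "foo".
foo_lines=$(printf '%s\n' 'I like drink' 'I like food' | grep foo)
echo "$foo_lines"

# grep you, given two lines from the poem.
you_lines=$(printf '%s\n' 'more precious' 'you are lifted' | grep you)
echo "$you_lines"
```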

The commands head and tail print out a certain number of lines from the beginning of a file and the end of a file, respectively. If you type in the following:

$ tail -3

... and then paste in the poem above, you'd get:

Can the spice-rose
drip such acrid fragrance
hardened in a leaf?
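Here's a quick sketch of both commands, using five made-up lines as input. It spells the option as -n 3, the POSIX-standard form; the shorter tail -3 used above is an older spelling that many versions of tail still accept.

```shell
# Five stand-in lines: one, two, three, four, five.
first_two=$(printf '%s\n' one two three four five | head -n 2)
last_three=$(printf '%s\n' one two three four five | tail -n 3)

echo "$first_two"    # one, two
echo "$last_three"   # three, four, five
```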

Structure of UNIX commands

UNIX commands generally follow this structure:

name_of_command [options] arguments

(The "[options]" part of that schema is usually one or more characters preceded by hyphens. The -3 in tail -3 tells it to print the last three lines; grep takes an option, -i, which tells it to be case insensitive.)

You can think of UNIX commands like commands in English, but with a funny syntax: "Fetch thoroughly my slippers!"

You can figure out which options and arguments a command supports by typing man name_of_command at the command line.
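To see what an option changes, here's a sketch that runs grep with and without -i on three lines of "Sea Rose" (including its title).

```shell
lines='SEA ROSE
Rose, harsh rose,
spare of leaf,'

# Without -i, "rose" matches only lowercase text.
plain=$(printf '%s\n' "$lines" | grep rose)

# With -i, case is ignored, so "ROSE" and "Rose" match as well.
insensitive=$(printf '%s\n' "$lines" | grep -i rose)

echo "$plain"
echo "$insensitive"
```

Without -i, only "Rose, harsh rose," gets through (thanks to its lowercase "rose"); with -i, the title line does too.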

Sorting and piping

The sort command takes every line of input and prints them back out, in alphabetical order. Try:

$ sort

... paste in the poem, and hit Ctrl+D. After a handful of blank lines, you'll get something like:

Can the spice-rose 
drip such acrid fragrance 
hardened in a leaf?
in the crisp sand 
marred and with stint of petals, 
meagre flower, thin, 
more precious 
Rose, harsh rose, 
SEA ROSE
single on a stem -- 
spare of leaf,
Stunted, with small leaf, 
than a wet rose 
that drives in the wind.
you are caught in the drift.
you are flung on the sand, 
you are lifted

(Why do you think there are so many blank lines at the beginning?)

So far, we've just been sending these commands input (by typing, or cutting and pasting), then letting the output be printed back to the screen. UNIX provides a means by which we can send the output of one program as the input of another program. We do this using the pipe character (| ... usually shift+backslash). For example:

$ grep leaf | sort

... takes lines from input, keeps only those that contain the string "leaf," and then passes them to sort, which displays those lines in alphabetical order. The output from the poem:

hardened in a leaf?
spare of leaf,
Stunted, with small leaf,
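One caution about sort: the order it produces depends on your locale settings. The output above is what a typical English-language locale gives, where "Stunted" lands after "spare." Under the C locale (LC_ALL=C), sort compares raw byte values, and every capital letter comes before every lowercase one. Here's a sketch of the same pipeline with the locale pinned, run on a few lines from the poem:

```shell
# Compare raw bytes: capital letters sort before lowercase ones.
export LC_ALL=C

excerpt='spare of leaf,
more precious
Stunted, with small leaf,
hardened in a leaf?'

sorted=$(printf '%s\n' "$excerpt" | grep leaf | sort)
echo "$sorted"
```

With LC_ALL=C, "Stunted, with small leaf," comes out first instead of last.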

cut

The cut command breaks up a line of text into its constituent parts. Let's say we had a text file full of data, where each line contained multiple "fields" separated by commas. (This is a "comma-separated values" file, a common way of exporting data from a spreadsheet program like Excel, especially if you want to share that data with someone who doesn't have Excel.) Here's what the data looks like:

Geraldine,New York,welding
Roberto,Tennessee,birdwatching
Dana,Wyoming,poetry
Priya,Maine,rock climbing

The cut command allows us to easily process this text and "select" only particular items from each line. Here's how, for example, we could print out just the "state" from each line:

$ cut -d , -f 2

Run that command, then cut-and-paste in the data from above. Here's the output you'd get:

New York
Tennessee
Wyoming
Maine

The cut command takes two options, both of which themselves have parameters. (This is confusing, but stick with me here.) The -d option is followed by the "delimiter" string (i.e., what you want to split the line on---in the example above, the comma); the -f option is followed by which field you want.
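Here's a minimal sketch of the same cut command run on the data above without any pasting: the four lines are stored in a shell variable and piped in with printf. It also pulls out the third field, to show that a field can contain spaces when only commas count as delimiters.

```shell
data='Geraldine,New York,welding
Roberto,Tennessee,birdwatching
Dana,Wyoming,poetry
Priya,Maine,rock climbing'

# -d , splits each line on commas; -f 2 keeps only the second field.
states=$(printf '%s\n' "$data" | cut -d , -f 2)
echo "$states"

# -f 3 keeps the third field; "rock climbing" stays whole.
hobbies=$(printf '%s\n' "$data" | cut -d , -f 3)
echo "$hobbies"
```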

Words in a line of text also have a "delimiter" between them---a space character. So we can use cut to transform some text by selecting only, say, the first word of each line. For example, try this command:

$ cut -d ' ' -f 1

And paste in 'Sea Rose'. The output:

SEA

Rose,
marred
meagre
spare

more
than
single
you

Stunted,
you
you
in
that

Can
drip
hardened

SIDEBAR: What's with the ' '? Why are those quotes just hanging out like that? That's a good question! It turns out that the space character has a special meaning on the UNIX command-line: you use one or more space characters to separate commands and parameters from each other. If we want to tell a command to use a space character as a parameter, literally a space character, we need some way to set that space character apart from its "normal" use. We do this by putting the character in quotes (''). Quoting is extremely important to computer programming, and this isn't the last time you'll see it, not by a long shot!
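You can watch quoting at work with printf '%s\n', which prints each argument it receives on its own line and so shows exactly where the shell split your words. A minimal sketch:

```shell
# Unquoted: the spaces separate three arguments.
unquoted=$(printf '%s\n' spare of leaf)

# Quoted: one argument, spaces and all.
quoted=$(printf '%s\n' 'spare of leaf')

echo "$unquoted"   # three lines
echo "$quoted"     # one line
```

The same thing would happen with cut -d ' ': without the quotes, the shell would treat the space as a separator, and cut would never receive it.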

Exercise: Use man cut to find out how to use "ranges" as a parameter to the -f option. Then use cut to print out, for each line of "Sea Rose," the second and third words on the line.

tr

The tr command "translates" a set of characters in the original line to another set of characters. The source character set is the first parameter, and the second parameter is the characters you want them to be translated to. For example:

$ tr aeiou eioua
hello there, how are you?
hillu thiri, huw eri yua?

You can specify a range of characters with a hyphen:

$ tr a-z A-Z
hello there, how are you?
HELLO THERE, HOW ARE YOU?
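Here are both tr examples as a sketch that feeds in the sentence with printf instead of typing it. The script sets LC_ALL=C so that the range a-z means exactly the 26 ASCII lowercase letters; that's worth making explicit, since ranges can behave differently in other locales.

```shell
# Use the C locale so a-z and A-Z are plain ASCII letter ranges.
export LC_ALL=C

sentence='hello there, how are you?'

# Each vowel becomes the next vowel in the list (u wraps around to a).
swapped=$(printf '%s\n' "$sentence" | tr aeiou eioua)

# Every lowercase letter becomes its uppercase counterpart.
shouted=$(printf '%s\n' "$sentence" | tr a-z A-Z)

echo "$swapped"
echo "$shouted"
```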

Multiple pipes

Of course, you can include more than one command in a "pipeline":

$ sort | tail -6 | tr aeiou e

... which, if you send it our venerable poem, outputs the following:

Stented, weth smell leef,
then e wet rese
thet dreves en the wend.
yee ere ceeght en the dreft.
yee ere fleng en the send,
yee ere lefted

What happened? The input went to sort, which sorted the lines in alphabetical order. Then tail -6 grabbed only the last six lines of the output of sort and sent those lines through to tr, which replaced every vowel with e. (You can build pipelines of any length using this technique.)
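Here's a smaller sketch of the same three-stage pipeline on four lines of the poem, with tail -n 2 instead of tail -6 so the input stays short. One detail worth knowing: when tr's second set is shorter than its first, as with aeiou and e, the GNU and BSD versions of tr repeat the last character of the second set, which is why every vowel turns into e.

```shell
export LC_ALL=C   # predictable sort order

lines='than a wet rose
you are lifted
more precious
that drives in the wind.'

# Sort, keep the last two lines, then turn every vowel into "e".
result=$(printf '%s\n' "$lines" | sort | tail -n 2 | tr aeiou e)
echo "$result"
```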

Using files ("redirection")

So far, we've been building "programs" that can only read from the keyboard (or from cut-and-paste) and can only send their output to the screen. What if we want to read from an existing file, and then output to a file?

No complicated code is needed. UNIX provides a method for us. It's called "redirection." Here's how it works:

$ sort <sea_rose.txt

The < character means "instead of taking input from the keyboard, take input from this file." Likewise:

$ grep were >some_file.txt

The > means "instead of sending your output to the screen, send it to this file." You can use them both at the same time:

$ grep were <sea_rose.txt >some_file.txt

... in which case some_file.txt will end up with every line from sea_rose.txt that contains the string "were." (If the output file doesn't already exist, it will be created. If it does already exist, it will be overwritten, so be careful!)
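Here's a sketch of both kinds of redirection, using a hypothetical sea_rose.txt that holds just the first stanza. It searches for leaf rather than were (were doesn't actually appear in "Sea Rose"), and then runs a second command into the same output file to show that > replaces whatever was there.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory for the demo files

# A hypothetical sea_rose.txt holding the poem's first stanza.
printf '%s\n' 'Rose, harsh rose,' 'marred and with stint of petals,' \
  'meagre flower, thin,' 'spare of leaf,' > sea_rose.txt

# < reads from sea_rose.txt; > creates some_file.txt for the output.
grep leaf <sea_rose.txt >some_file.txt
first=$(cat some_file.txt)

# Redirecting into the same file again overwrites its old contents.
grep rose <sea_rose.txt >some_file.txt
second=$(cat some_file.txt)

echo "$first"
echo "$second"
```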

Other helpful commands

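wc -w foo will print out the number of words in the file named foo. (wc -l will count the number of lines; wc -c will count the number of bytes, which for plain English text is the same as the number of characters. Use wc -m to count characters in text with accented letters or non-Latin scripts.)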
curl -s http://some.url/ fetches the web page at the given URL and prints its content to standard output. (We'll be using this command extensively!)
The ls command will give you a list ("ls" stands for "list") of files in your current directory. If you give it a parameter, it will give you a listing of the files in the directory you gave to it. (On OS X, for example, try ls /Users/your_user_name/Desktop)
Type pwd to find out what your current directory is.
The cp command will make a copy ("cp" for "copy") of a file. It takes two parameters: the first is the source file name, the second is the destination file name.
Type rm foo to delete the file named foo. (Note: this is permanent! The file won't go to your Trash, so be careful.)
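Here's a sketch that strings several of these commands together in a scratch directory. The file name foo follows the examples above, and its two lines are borrowed from the poem. curl is left out because it needs a network connection.

```shell
export LC_ALL=C               # ls lists names in plain byte order
cd "$(mktemp -d)" || exit 1   # scratch directory for the demo files

printf '%s\n' 'Rose, harsh rose,' 'marred and with stint of petals,' > foo

# With < the file name isn't printed; $(( )) trims any padding.
words=$(( $(wc -w < foo) ))
lines=$(( $(wc -l < foo) ))
bytes=$(( $(wc -c < foo) ))
echo "$words words, $lines lines, $bytes bytes"

cp foo foo_copy   # copy foo to a new file called foo_copy
before=$(ls)      # foo and foo_copy
rm foo            # delete foo permanently (no Trash!)
after=$(ls)       # only foo_copy remains

echo "$before"
echo "$after"
```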