Getting Started With UNIX


I.0 Getting Started With Unix

This guide is designed to give you just enough UNIX knowledge to let you work productively on the UNIX server. This means that you will only have to learn about a dozen commands, plus some information about how files and folders (directories) are organized and managed on the system. If you are interested in more detail, consult the UNIX Help for Users help system. So chin up, and away we go.

I.1 What is an operating system, and why do we need to know how to use it?

UNIX is an operating system, or environment, that provides commands for creating, manipulating, and examining data files, and for running programs. UNIX is a multi-user system, so it is necessarily more complex than the operating system of a personal computer, because user files must be protected from one another. Some operating systems with which you may be familiar are MS-DOS and Windows (IBM PCs and compatibles), Macintosh OS, or VMS (DEC VAX systems). Despite their differences, all of these operating systems do essentially the same thing, which is to act as the unifying framework within which all tasks are performed.

If you have only used computers for word processing, you have been doing operating system tasks, using commands within the word processor to list your files, rename them etc. While it is useful for software packages to have these facilities, they are really reinventing the wheel, in that they duplicate the things that are already present in the operating system. In fact, with each new software package that provides these functions, you have to relearn how to do the same housekeeping tasks.

For a variety of reasons, it is neither possible nor desirable for every program to offer the complete facilities of an operating system. Thus, most programs concentrate on the specific task for which they were designed (such as comparing two sequences for similarity). General tasks, such as creating and editing data files, are carried out using the tools provided by the operating system.

The programs and databases on the BSR computers come from a number of different sources, and perform a wide variety of tasks. There is no single program package that will do everything, and certainly none that will do everything well. Additionally, as new methods of analysis are devised, it is useful to be able to simply add new programs to the existing set. For these and other reasons, it is necessary to make the effort to learn how to do a minimal number of tasks using the operating system. The advantage is that once you have learned this minimal subset of the operating system, you can perform all of the basic housekeeping and editing functions consistently, regardless of how many new programs are added to the system.

I.2 Why do we need to know UNIX?

UNIX has traditionally been used to develop software tools to solve problems in many disciplines, since it runs on many different types of computers. So the simplest answer is that a large number of programs that solve many problems are already available under UNIX, and most of them are free. UNIX is a multi-user system, so many people can use the same programs on the same computer, much like the old mainframe computer model.

Although you will find many programs also running on Macs and PCs, most of the comprehensive packages are very expensive, costing thousands of dollars per computer.

I.3 What you need to learn

Although UNIX is an immense operating system, it is possible to define a small set of commands that will enable you to do most of the things you need to do, and to find out how to do new tasks as the need arises. The minimal set, covered in the sections below, includes the core commands, a text editor, and a way of organizing your files.

It may also be advisable to buy or borrow a book on using UNIX. These books are easy to read and contain useful reference guides for frequently used commands.

I.3.1 The core commands (from the command line)

Table 1 lists a minimal set of UNIX commands. If you learn these commands, you will be able to do the vast majority of what you need to do on the computer, without having to learn the literally thousands of other commands that are present on the system. Always be aware that, unlike DOS, UNIX is case sensitive: you must type commands and file names in lower case or upper case exactly as required. You will find a Quick Reference of Commands at the end of this document.

Table 1  Core Commands

UNIX     DOS equivalent  Description
cat      type            Display and concatenate files
more     type            View a file one page at a time (space bar to advance)
cd       cd              Move to a new working directory
chmod                    Change read, write, and execute permissions for files
cp       copy            Copy files
logout                   Terminate a UNIX session
lp                       Send files to a printer
ls       dir             List files and directories
man                      Read or find UNIX manual pages
mkdir    mkdir           Make a new directory
mv       ren             Rename or move files
passwd                   Change password
rm       del             Remove files
rmdir    rmdir           Remove a directory
^C                       Control-C (key sequence to stop program execution)
^Z                       Control-Z (key sequence to suspend a job)
fg                       Resume a suspended job

If you have used MS-DOS or other operating systems, you will recognize many of these commands under different names, but they accomplish the same thing. For example, ls is comparable to the dir command in DOS, although it does a lot more. Similarly, cat in UNIX corresponds to type in DOS, cp to copy, and mv to rename. This is not an accident, since DOS borrowed many of its ideas from UNIX. Consequently, if you are already familiar with DOS, you will have no problem picking up UNIX. In fact, after you have come to appreciate the extra power of UNIX, you may find yourself dissatisfied with the limitations of DOS. Some of the UNIX commands on our systems have been "aliased" (given alternate names) to their DOS equivalents, for example dir=ls and del=rm.

Most UNIX systems also have a graphical user interface (GUI), much like Windows or the Macintosh, in which many of these commands can be performed by pointing and clicking with the mouse. These GUIs differ from vendor to vendor, so we will discuss only the command-line versions that you type at the system prompt.

Type man [command name] at the UNIX prompt for a description of the command.

For example, % man ls will give you the help page on the ls command.

Logging On to and Off of a UNIX System

You must have a valid account, called a login account, from your system manager to use a UNIX system. To log onto a UNIX system, enter the login name and password that your system manager gives you. For security reasons, you should change this password often.

When you first sit down to start a session, the login: prompt should appear on your screen. For example:



login: bubba
Password:

If it doesn't, press the <return> key (the carriage return) a few times. UNIX should then display the login: prompt. Begin to log on by typing in your login name, then press <return>. (Be aware that the user name and password are case sensitive, so upper case letters are distinguished from lower case letters; type carefully.) UNIX then displays the Password: prompt. Type in your password and press <return>. You are logged on and ready to use the computer when UNIX displays the prompt, for example:

%>

When you want to log off a UNIX system, type the UNIX command logout. UNIX displays the login: prompt again.

Note: Before you log off, make sure you are not still running a program or editing a file.

I.3.2 Text editors

A text editor is a program that lets you enter data into files, and modify it, with a minimal amount of fuss. Text editors are distinct from word processors in two crucial ways. First, the text editor is a much simpler program, providing none of the formatting features (e.g., footnotes, special fonts, tables, graphics, pagination) that word processors provide. This means that the text editor is simpler to learn, and what it can do is adequate for the task of entering a sequence, changing a few lines of text, or writing a quick note to send by electronic mail. For these simple tasks, it is easier and faster to use a text editor.

The second important difference between word processors and text editors is the way in which the data is stored. The price you pay for having underlining, bold face, multiple columns, and other features in word processors is the embedding of special computer codes within your file. If you used a word processor to enter data, your datafile would thus also contain these same codes. Consequently, only the word processor can directly manipulate the data in that file.

Text editors offer a way out of this dilemma, because files produced by a text editor contain only the characters that appear on the screen, and nothing more. These files are sometimes referred to as ASCII or TEXT files, since they only contain standard ASCII characters.

Generally, files created by UNIX or by other programs are ASCII files. This seemingly innocuous fact is of great importance, because it implies a certain universality of files. Thus, regardless of which program or UNIX command was used to create a file, it can be viewed on the screen ('cat filename' or 'more filename' ), sent to the printer ('lp filename'), appended to another file ('cat filename1 >> filename2'), or used as input by other programs. More importantly, all ASCII files can be edited with the same text editor.
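The commands mentioned above can be tried out safely in a scratch directory. The following is only a sketch: the file contents are made up, and mktemp (used to create the scratch directory) is an assumption, since it is common on UNIX systems but is not one of the commands covered in this guide.

```shell
# Work in a scratch directory so nothing in your account is touched.
# (mktemp is an assumption; it is not part of this guide's command set.)
workdir=$(mktemp -d)
cd "$workdir"

# Create two small text files (hypothetical contents).
echo "ATGGCTTCT" > filename1
echo "notes on sequences" > filename2

# View a file on the screen.
cat filename1

# Append the contents of filename1 to the end of filename2, then view it.
cat filename1 >> filename2
cat filename2

# To page through a long file, or to print it, you would type:
#   more filename2
#   lp filename2
```

Because both files are plain ASCII, the same commands work no matter which program created them.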

There are a variety of text editors available for UNIX.

As mentioned earlier, the trade off for simplicity of the text editor is that it doesn't do a lot of things that word processors do. Consequently, when you have simple data to enter, or changes to make in a file, it is easier to use a text editor. When you have large amounts of text to type, or need special formatting, use a word processor on your personal computer to type in the information. You can save word processor files as TEXT to convert them to ASCII, then transfer them to the UNIX system using FTP or a variety of different file sharing protocols.

I.3.3 File organization

It is very easy to rapidly generate so many files that they become an unmanageable mess. This section will describe some strategies for managing your data, and keeping things simple. Also, disk space is precious on any computer. Make sure you remove outdated files to free up disk space for everyone.

Structuring Your Data in Directories

Probably the most useful habit to get into is to organize your files in tree-structured directories. Whenever you log in, you are placed in your home directory. Depending on what sort of work you are doing, it is useful to create subdirectories ('mkdir') to hold different sets of files. For example, if you were working with pea genes, you might have a subdirectory within your home directory called 'pea'. The organization of this directory is shown in Fig. 2. In the example, the current working directory is /home/bubba/pea. Listing the files ('ls -l') shows that pea contains three subdirectories, indicated by a 'd' in the first column of each line. A directory listing of the drr directory shows two data files ('-' in column 1) and two subdirectories, each devoted to a particular multigene family (i.e., drr39 and drr49). Within each directory are sequences and other files related to each multigene family.

Aside: Since UNIX is a multi-user system, each user has a login name (bubba in Figure 2) and a home directory which is "owned" by the user. In this example, the first column of information on each line gives access permissions and is in the form -rwxrwxrwx. You can see that rwx (read, write, execute) occurs three times. Read access is the ability to read the files, write access is the ability to write to a directory or to make changes to a file, and execute access is for programs, which must be run. The first rwx represents what the owner of the file (bubba, from column 3) can do to the file. The second rwx represents what other members of the user's group can do to the file. UNIX allows the system manager to set up work groups so that the protection system can grant access to groups but not to other users on the system. The last rwx represents what all other users can do to the file. The chmod command allows you to change these permissions; type man chmod for details.
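As a rough illustration of how the three rwx groups can be changed, the sketch below sets permissions on a scratch file with chmod and lists the result. The file name is borrowed from Figure 2 purely as an example, and the scratch directory created with mktemp is an assumption that is not part of this guide.

```shell
# Work in a scratch directory (mktemp is an assumption, not covered in this guide).
workdir=$(mktemp -d)
cd "$workdir"
echo "ATGC" > oligos.dna    # hypothetical data file

# Owner (u) may read and write; group (g) and others (o) get no access.
chmod u=rw,go= oligos.dna
ls -l oligos.dna            # first column reads -rw-------

# Additionally let members of your group read the file.
chmod g+r oligos.dna
ls -l oligos.dna            # first column now reads -rw-r-----
```

The letters u, g, and o correspond to the first, second, and third rwx groups described above.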

Figure 2 Example of directory organization using nested subdirectories.

% ls -l
total 3
drwx------  5 bubba      512 Mar 28 18:54   cab
drwx------  4 bubba      512 Apr 24 18:09   drr
drwx------  2 bubba      512 Nov 24 17:35   wft
% ls -l drr
total 49
drwx------  2 bubba      1024  Mar  8 10:02  drr39
drwx------  2 bubba      1024  Mar  8 17:45  drr49
-rw-------  1 bubba       754  Mar  9 15:15  oligos.dna
-rw-------  1 bubba     19932 Jul 10  1990   pCHS2.seq
% ls -l drr/drr39
total 23
-rw-------  1 bubba      1460  Mar  6 19:13   drr39.aln
-rw-------  1 bubba       354  Mar  4 17:16   drr39.pep.aln
-rw-------  1 bubba      2275  Mar  6 19:16   drr39.ref
-rw-------  1 bubba       314  Sep  7  1990   pi230.pep
-rw-------  1 bubba       570  Mar  4 18:09   pi230.seq
-rw-------  1 bubba       326  Sep  7  1990   pi39.pep
-rw-------  1 bubba     11558  Nov 14 11:18   pi39.rest
-rw-------  1 bubba       556  Mar  4 18:08   pi39.seq
-rw-------  1 bubba       469  Mar  4 18:11   pi39.wrp
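
A directory tree like the one in Figure 2 could be built with a handful of mkdir commands. This is a sketch only: a scratch directory (created with mktemp, which is an assumption not covered in this guide) stands in for the home directory.

```shell
# A scratch directory stands in for your home directory.
workdir=$(mktemp -d)
cd "$workdir"

# Create the project directory and move into it.
mkdir pea
cd pea

# Create one subdirectory per set of files, then two more inside drr.
mkdir cab drr wft
mkdir drr/drr39 drr/drr49

# Directories show a 'd' in the first column of the long listing.
ls -l
ls -l drr

# Confirm where you are in the tree.
pwd
```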

Organization of directories can be tailored to each particular problem. If you were sequencing several genes, each gene should probably have a separate directory to contain all of the files related to the sequencing project. Another approach might be to set up directory hierarchies to match an evolutionary tree. The most important thing is to use some sort of organization that makes sense in the context of the projects you are working on. Here are some general guidelines for organizing directories:

  1. Your home directory should be mostly composed of subdirectories. Leave individual files there only on a temporary basis.
  2. Directory organization is for your convenience. Whenever a set of files all relate to the same thing, dedicate a directory to them.
  3. If a directory gets too big (e.g., more files than will fit on the screen when you type 'ls -l'), it's time to split it into subdirectories.
  4. Don't go overboard with directories. By splitting your files among too many directories, you could make it harder to use your data.


Directories can evolve

The tree-structured directories you create are not cast in concrete. UNIX is uniquely suited to re-shuffling directories at will. For example, if you were sequencing three cab genes, you might have three separate directories for genes a, b and c, called caba, cabb and cabc. When the sequences are completed, it might be more useful to reorganize files related to these sequences by other criteria. For example, two directories, cabpep and cabdna might contain amino acid and DNA sequence of the three genes, respectively. A third directory, cabfig, might contain figures for publication using the three sequences. The organizational utility of directory hierarchies is limited only by your imagination.
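
The cab reorganization described above might look something like the following sketch. The file names (caba.seq, caba.pep, and so on) are hypothetical, and the scratch directory created with mktemp is an assumption that is not part of this guide.

```shell
# Work in a scratch directory (mktemp is an assumption, not covered in this guide).
workdir=$(mktemp -d)
cd "$workdir"

# Starting layout: one directory per gene, each holding a DNA (.seq)
# and a protein (.pep) file. All file names here are hypothetical.
mkdir caba cabb cabc
echo "DNA of cab a" > caba/caba.seq
echo "protein of cab a" > caba/caba.pep
echo "DNA of cab b" > cabb/cabb.seq
echo "protein of cab b" > cabb/cabb.pep
echo "DNA of cab c" > cabc/cabc.seq
echo "protein of cab c" > cabc/cabc.pep

# New layout: one directory per type of data.
mkdir cabdna cabpep
mv caba/*.seq cabb/*.seq cabc/*.seq cabdna
mv caba/*.pep cabb/*.pep cabc/*.pep cabpep

# The per-gene directories are now empty and can be removed.
rmdir caba cabb cabc

ls cabdna cabpep
```

Note that rmdir only removes empty directories, so it acts as a safety check that every file was moved first.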

It is sometimes useful to create temporary directories, even if you only use them for half an hour and then get rid of them. For example, if you were searching the databases for DNA and protein sequences for plant 'pathogenesis-related proteins', you might create a directory called prp, and use this as your working directory when searching for and retrieving the sequences. Once the sequences have been retrieved, you can discard the 'false positives' and then divide the remaining entries among directories specialized for particular classes of sequences (e.g., chitinase, glucanase, and so forth). Once the sequences have been redistributed, you can delete the prp directory.

File extensions identify the type of data in a file

Most operating systems permit files to have extensions that can be used to identify the type of data contained in the file. Although use of file extensions is not required in UNIX, it is strongly advised that all files have file extensions. (Please note that certain characters have special meaning on the UNIX command line and should not be used in filenames: / \ " ' * ; ? [ ] ( ) ~ ! $ { } < > space tab.)

In Fig. 2, the drr39 directory illustrates the strategic use of file extensions. Two members of the drr39 multigene family have been sequenced: cDNAs pi39 and pi230, whose DNA and protein sequences are stored in pi39.seq and pi230.seq, and pi39.pep and pi230.pep, respectively. Additionally, a restriction site search was done on pi39, and the output stored in pi39.rest (Note that UNIX permits file extensions longer than 3 characters). Sequence similarity alignments of the DNA and protein sequences are stored in the files drr39.aln and drr39.pep.aln.

Another useful convention of file extensions is to use all or part of the name of the program that produced the file as the file extension. Thus, the output from a string search using grep would have the file extension '.grep'. Similarly, multiply-aligned sequences reformatted by the reform program have the extension '.ref', as in drr39.ref.

File extensions make it possible to work with groups of files in single commands. For example, if you wanted to create a new directory containing only protein sequences taken from the current directory, the following commands would create the directory 'protein', and move all '.pep' files into it:

mkdir  protein
mv  *.pep  protein

In this case the * is a wild card that will match any file that ends with .pep.

Similarly, the following commands would create a new directory containing all pi230-related files:

mkdir 230
mv pi230.* 230
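
Before moving files with a wild card, it can be worth checking which files the pattern matches by giving the same pattern to ls first. The sketch below tries this in a scratch directory; the empty files are created with touch, which, like mktemp, is an assumption that is not one of the commands covered in this guide.

```shell
# Work in a scratch directory (mktemp and touch are assumptions,
# not covered in this guide).
workdir=$(mktemp -d)
cd "$workdir"

# Create empty files named after those in Fig. 2.
touch pi39.seq pi39.pep pi230.seq pi230.pep drr39.aln

# Preview what each pattern matches before moving anything.
ls *.pep
ls pi230.*

# Move only the protein files into their own directory.
mkdir protein
mv *.pep protein

# Check the result: .pep files are in protein, everything else stayed put.
ls protein
ls
```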

I.4 Connecting to the system

Because of the variety of terminals and computers networked at the Center, there is no single set of procedures for connecting. However, in this section we will attempt to cover those aspects of connection that can be generalized.

I.4.1 Ethernet connection

Connection by Ethernet is the method of choice, both because of faster data transfer rates as well as greater data integrity. Ethernet also simplifies direct connection with remote systems. Finally, Ethernet makes it possible to run X-window terminal sessions (this gives you the ability to run UNIX graphic windows on your local computer), either using an X-terminal or running an X-terminal emulator on a PC or Mac. Alternatively, a regular line-mode session can be run using the TELNET program.

Unless your PC already has an Ethernet port, you will need to obtain an Ethernet card. It should be possible to get an Ethernet card for about $200. Check with Purchasing for options or call BSR. Macintosh computers without built-in Ethernet adapters can be connected through their AppleTalk port, or you can go through the same process as for PCs and install an Ethernet card. All new Macs come with Ethernet built in.

When using Ethernet with a PC, you need two programs: TELNET and FTP, which handle terminal sessions and file transfer, respectively. The most popular PC-based TELNET and FTP program at the Center is a product called REFLECTION. You are strongly advised to speak to the manager of the network applications you wish to run BEFORE you purchase. There are many flavors of graphics emulators available for TELNET, and you must choose the right product to match your application. The same is true for the Macintosh TELNET programs. There are free versions and commercial versions, and you must choose carefully to make sure you will be able to see the screen graphics you require. FETCH is a shareware FTP program for the Mac. You will also need MacTCP, a program that allows a Mac to communicate over the network. Ask BSR staff for help choosing and installing this software.

Once TELNET and FTP are installed on your PC, connection is simple. In the example shown below, the user is connecting to workstation 'adam' by running TELNET at the DOS prompt (You may have another method of starting up TELNET and FTP depending upon your computer configuration):

 
C:\>TELNET adam
Trying 137.28.109.6...
Connected to adam.cs.uwec.edu.
Escape character is '^]'.


OSF/1 (adam.cs.uwec.edu) (ttyp6)

login: bubba
Password:

Last login: Wed Mar 27 15:23:28 from bubba

DEC OSF/1 V3.2 (Rev. 214); Wed Aug 16 02:05:09 CDT 1995 
DEC OSF/1 V3.2 Worksystem Software (Rev. 214)


.... various login messages, and finally the prompt:

%

This procedure can also be followed for connection to any remote system. Connection using FTP is similar:

C:\>FTP  adam.cs.uwec.edu
Connected to adam.cs.uwec.edu.
220 adam.cs.uwec.edu FTP server (OSF/1 Version 5.60) ready.
Name (adam:tan): tan
331 Password required for tan.
Password:
230 User tan logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> 

TELNET and FTP are actually UNIX utilities that have been ported to DOS and the Mac.

Please refer to your manuals or ask BSR staff for help using Telnet or FTP.

MODEM connection details vary depending upon the modem and operating system you use and will not be covered here.

I.5 Electronic mail

Electronic mail (Email) makes it possible to send messages between users on different systems or on the same system. In its simplest form, Email lets you type a quick message and send it to one or more recipients in a matter of seconds. No printing is involved, and you don't have to wait for your secretary to get around to typing up a FAX form and sending the FAX out. If the recipients happen to be logged on, the reply can be received in minutes, right at your screen. Electronic mail has many advantages over telephone and FAX. Email is usually cheaper than calling, and avoids the frustrating phenomenon of 'telephone tag'. Additionally, Email helps break down time zone differences. The recipient gets the message the next time he/she logs in.

While many people are now hooked on FAX, the widespread availability of this medium may in the long run be counter productive by slowing the acceptance of Email. While FAXs are often difficult to read, Email, like other data sent across the Internet, is filtered through error-checking programs. If an error is detected during transmission, the packet is simply resent. Perhaps the greatest advantage of Email over FAX is that the recipient gets the message in computer-readable form, meaning that it can be edited or used as data, printed or stored on disk. Finally, FAX becomes impractical beyond 10 or so pages. In contrast, Email can handle very large documents.

Depending upon the computer and operating system you use, you may use one of several different Email programs. Our servers give access to several Email programs used to write, send, receive, and read Email. The PINE program is a menu-driven system that works when you TELNET to the host computer. This has been the primary mail utility for PCs. We also use an Email program called Eudora, which transfers your UNIX mail to your Mac or PC and presents it in a graphical interface. Eudora is the program of choice. Finally, the UNIX mail program is more difficult to use and is not recommended for the average user.

Please ask BSR staff for an Email account and directions for access.

Quick Reference for UNIX and UNIX Mail

Most of the commands listed below can take both arguments and options. Usually, arguments follow options (e.g. "man -k file"). Use the "man" command to find out more details.

General Account Information:

access:
    login         		start a new login session on the same machine
    logout or exit  		end the current login session
    telnet        		connect to a remote host via Internet.  Also "rlogin"

security:
    passwd      		change your password

help:
    man         		Unix manual pages (specify a command as argument)
    man -k      		searches all manual pages for a string (specify a string)

files:
    ls         		        list files in the current or specified directory
    cd          		change directories (with no arguments, goes to home. 
 						 "cd ~" also goes to home)
    pwd         		show current working directory
    mkdir       		create a subdirectory
    rm          		remove a file (rmdir to remove a directory)
    cat         		dump a file to your screen
    more        		display a file one screen at a time
    grep        		search a file for a string
    
job info:
    whoami      		displays your username
    hostname    		displays the host's name
    w           		shows who's logged on and system stats
    uptime      		shows just system stats
    finger      		gives information about particular users (or system 
				use, if used without an argument).   


communication:
    mail        		send mail.  specify a username, e.g. "mail jdoe@fred.fhcrc.org"
    mail        		read mail.  specify no argument.
    talk        		use the "talk" protocol for interactive chatting
    write       		use the "write" protocol to send a line message
    

file transfer:
    kermit      		upload and download to a PC running Kermit
    ftp         		transfer files between Internet hosts

file compression/uncompression:
    compress/uncompress     	for files ending in .Z

printing:
    lp 			ONLY for text files.

job control:
    ^C          	control-c  To kill a job
    ^S          	control-s  To suspend output to your terminal
    ^Q          	control-q  To restart output suspended with ^S
    ^D          	control-d  "end of input"  sometimes used for ending 
			sessions or programs.
    ^Z          	control-z  To suspend a job.  This simply puts a job in 
			hibernation, it does NOT kill it.
    fg          	foreground  To bring a job suspended with ^Z back
    bg          	background  To run a suspended job in the background.  Good 
			for long-running jobs you don't want to watch or wait for.
    &           	background  Used on the command line to put a job in the 
			background from the start (without  ^Z).  e.g. 
			"rcp bigfile alexia.lis&"

I/O redirection & pipes:
    >           	direct output to a new file (may overwrite an existing 
			file!)
    <           	get input from a file
    >>          	append output to an existing file
    |           	pipe the output of one command to the input of another
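
The four operators above can be tried together in a short session. This is a sketch under stated assumptions: the file name notes.txt and its contents are made up, and the scratch directory created with mktemp is not part of this guide.

```shell
# Work in a scratch directory (mktemp is an assumption, not covered in this guide).
workdir=$(mktemp -d)
cd "$workdir"

# >  creates a new file from a command's output.
echo "library hours" > notes.txt

# >> appends to the end of an existing file.
echo "lab meeting" >> notes.txt
echo "Library card" >> notes.txt

# <  makes a command read its input from a file (wc -l counts lines).
wc -l < notes.txt

# |  feeds the output of one command to the input of another:
#    count the lines that mention "library", ignoring case.
grep -i library notes.txt | wc -l

# >  used again on the same file overwrites what was there.
echo "fresh start" > notes.txt
cat notes.txt
```

The last step is the reason for the warning above: after it, notes.txt holds only the single new line.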

Common mail commands (given at the UNIX Mail prompt):
    h           	show 1-line message summaries
    ?           	help screen
   "n"                 	some number.  Show that message, e.g.  "2"
    d           	delete a message.  Just "d" deletes current message.  "dn"
			 deletes message n,  e.g.  "d2"
    s           	save a message to a file.  Specify a filename, or else 
			default is ~/mbox.  e.g. "s 33 mail.doc"

Common mail tilde escapes (given within a mail message in column 1):
   ~r filename		read a file into the current message
   ~q                  	quit.  Do not send the message.  (Saves what you have 
			written so far to a file called "dead.letter")
   ~b username   	blind carbon copy to "username"

Common command usage:
    man -k file	        to search for all commands with "file" in their description.

    ls -a 		list all files, including those which start with a period.

    ls -l            	long file listing.

    ls *.txt            lists all files ending in ".txt"

    lp file.txt	prints a text file on www's printer in B1-080

    rlogin www.fhcrc.org -l username
                        Access www using  rlogin.  The -l tells www which 
			username to log you in as.

    cd ~username 	change current directory to the home directory of some 
			other user, e.g. "cd  ~gbnewby"  This works only if the 
			other user has enabled access!

    telnet localhost 	login to the local host (that is, the same machine you are 
			on) using telnet.   Allows multiple sessions.

    grep -in library file | more
			look for all occurrences of the string  "library" in a file
			called "file" (e.g., the list of mailing lists).  Print 
			them with the line number.  Pipe the output through the 
			"more" command (so it doesn't scroll off your screen)
    
    ls -lR > dirfile	generate a long recursive directory listing, and place the 
			output in a file called "dirfile" for later searching.

    rm -i               cautious rm.  Prompts you before removing each file (this
			is usually the default).

    mail gbnewby@alexia.lis.uiuc.edu < filename
                        send mail, but get input from "filename" 
			instead of from the keyboard. This has the effect of 
			sending a file via email.

    cat file1 file2 > file3
                        has the effect of creating a new file, file3, which 
			contains the contents of  file1 and file2

    cat file3 >> file4
                        has the effect of appending the contents of file3 to an 
			existing file, file4

Additional commands in brief (see "man" for more detailed information):
    head              	show the first part of a file
    tail              	show the end of a file
    wc                	count words/lines/characters in a file
    pr                	paginate a file for printing
    which             	find the location of a command
    csh, sh, ksh, tcsh 	 run another shell
    tar               	tape archive.  For collections of files
    make              	for managing compilation of source code
    whoami            	get your username
    script            	capture all commands and output in a file

general command syntax:
    				command [ options ] [ arguments ]
                  			Some commands take options or arguments. 
					Some options or arguments may be required. 
					Most options are preceded by a hyphen ( - )
					and are a single letter.  Options might be
                  			specified separately or together (e.g., these
                  			are equivalent:  "grep -ni str fil"  and 
					"grep -n -i str fil").  Arguments usually
					 follow options.
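
The equivalence of combined and separate options can be checked directly. In the sketch below, the file "fil" and its contents are made up, the scratch directory created with mktemp is an assumption, and cmp (which compares two files) is a standard command that is not otherwise covered in this guide.

```shell
# Work in a scratch directory (mktemp is an assumption, not covered in this guide).
workdir=$(mktemp -d)
cd "$workdir"

# A small file to search (hypothetical contents).
echo "Library" > fil
echo "lab" >> fil
echo "the library annex" >> fil

# Options given together and options given separately.
grep -ni library fil > together.out
grep -n -i library fil > separate.out

# Matching lines are shown with their line numbers:
#   1:Library
#   3:the library annex
cat together.out

# cmp reports nothing and succeeds when two files are identical.
# (cmp is an assumption; it is not one of the commands in this guide.)
cmp together.out separate.out && echo "same output"
```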

Some general tips:

Password selection: Make sure your password is hard to guess. It should not be in any dictionary, forwards or backwards. Mix in special characters (such as -_+='.), numbers, and mixed-case letters. You should change your password often.

"You have stopped jobs." This message appears when you log out after using ^Z to suspend jobs without ever resuming them. Use "fg" to bring back the jobs and exit them normally (or interrupt them with ^C). If you type "logout" twice in a row, UNIX will try to kill the suspended jobs for you, but this is not reliable.