This guide is designed to give you just enough UNIX knowledge to let you work productively on the Unix server. This means that you will only have to learn about a dozen commands and some information about how files and folders (directories) are organized and managed on the system. For those interested in details, you can use the UNIX Help system at Unix Help for Users. So chin-up and away we go....
UNIX is an operating system, or environment, that provides commands for creating, manipulating, and examining data files, and for running programs. UNIX is a multi-user system, so it is necessarily more complex than a personal computer operating system because of the security that must be provided for user files. Some operating systems with which you may be familiar are MS-DOS and Windows (IBM PCs and compatibles), Macintosh OS, or VMS (DEC VAX systems). Despite their differences, all of these operating systems do essentially the same thing, which is to act as the unifying framework within which all tasks are performed.
If you have only used computers for word processing, you have been doing operating system tasks, using commands within the word processor to list your files, rename them etc. While it is useful for software packages to have these facilities, they are really reinventing the wheel, in that they duplicate the things that are already present in the operating system. In fact, with each new software package that provides these functions, you have to relearn how to do the same housekeeping tasks.
For a variety of reasons, it is neither possible nor desirable for every program to offer the complete facilities of an operating system. Thus, most programs concentrate on the specific task for which they were designed (such as comparing two sequences for similarity). General tasks, such as creating and editing data files, are carried out using the tools provided by the operating system.
The programs and databases on the BSR computers come from a number of different sources, and perform a wide variety of tasks. There is no single program package that will do everything, and certainly none that will do everything well. Additionally, as new methods of analysis are devised, it is useful to be able to simply add new programs to the existing set. For these and other reasons, it is necessary to make the effort to learn how to do a minimal number of tasks using the operating system. The advantage is that once you have learned this minimal subset of the operating system, you can perform all of the basic housekeeping and editing functions consistently, regardless of how many new programs are added to the system.
UNIX has traditionally been used to develop software tools to solve problems in many disciplines, since it runs on many different types of computers. So the simplest answer is that a large number of programs that solve many kinds of problems already run under UNIX, and most of them are freely available. UNIX is also a multi-user system, so many people can use the same programs on the same computer, much like the old mainframe computer model.
Although you will find many programs also running on Macs and PCs, most of the comprehensive packages are very expensive, costing thousands of dollars per computer.
Although UNIX is an immense operating system, it is possible to define a small set of commands that will enable you to do most of the things you need to do, and to find out how to do new tasks, as the need arises. The minimal set includes:
It may also be advisable to buy or borrow a book on using UNIX. These books are easy to read and contain useful reference guides for frequently used commands.
Table 1 lists a minimal set of UNIX commands. If you learn these commands, you will be able to do the vast majority of what you need to do on the computer, without having to learn the literally thousands of other commands that are present on the system. Always be aware that, unlike DOS, UNIX is case sensitive: you must type lower case or upper case exactly as required. You will find a Quick Reference of Commands at the end of this document.
Table 1 Core Commands

UNIX     DOS equivalent   Description
cat      type             Write and concatenate files
more     type             View a file one page at a time (space bar to advance)
cd       cd               Move to new working directory
chmod                     Change read, write, execute permissions for files
cp       copy             Copy files
logout                    Terminate UNIX session
lp                        Send files to a printer
ls       dir              List files and directories
man                       Read or find UNIX manual pages
mkdir    mkdir            Make a new directory
mv       ren              Rename or move files
passwd                    Change password
rm       del              Remove files
rmdir    rmdir            Remove a directory
^C                        Control C (key sequence to stop program execution)
^Z                        Control Z (key sequence to suspend a job)
fg                        Restart a suspended job
If you have used MS-DOS or other operating systems, you will recognize many of these commands by different names, but they accomplish the same thing. For example, ls is comparable to the dir command in DOS, although it does a lot more. Similarly, cat in UNIX corresponds to type in DOS, cp to copy and mv to rename. This is not an accident, since DOS borrowed many of its ideas from UNIX. Consequently, if you are already familiar with DOS, you will have no problem picking up UNIX. In fact, after you have come to appreciate the extra power of UNIX, you will find yourself dissatisfied with the limitations of DOS. Some of the UNIX commands on our systems have been "aliased" (given alternate names) to their DOS equivalents. For example: dir=ls, del=rm, etc. (Most UNIX systems also have a graphical user interface (GUI), much like Windows or the Macintosh, in which many of these commands can be performed via point and click with the mouse. These GUIs differ for each vendor, so we will only discuss the so-called command line options that you will type at the system prompt.)
Type man [command name] at the UNIX prompt for a description of the command.
For example, % man ls will give you the help page on the ls command.
You must have a valid account, called a login account, from your system manager to use a UNIX system. To log onto a UNIX system, enter the login name and password that your system manager gives you. For security reasons, you should change this password often.
When you first sit down to start a session, the login: prompt should appear on your screen. For example:
login: bubba
Password:
If it doesn't, press the <return> key (the carriage return) a few times. UNIX should then display the login: prompt. Begin to log on by typing in your login name, then press <return>. (Be aware that the user name and password are case sensitive -- upper case letters are distinguished from lower case letters -- so type carefully.) UNIX displays the Password: prompt. Type in your password and press <return>. You are logged on and ready to use the computer when UNIX displays the prompt, like
%>
When you want to log off a UNIX system, type the UNIX command logout. UNIX displays the login: prompt again.
Note: Before you log out, make sure you are not still running a program or editing a file.
A text editor is a program that lets you enter data into files, and modify it, with a minimal amount of fuss. Text editors are distinct from word processors in two crucial ways. First, the text editor is a much simpler program, providing none of the formatting features (e.g., footnotes, special fonts, tables, graphics, pagination) that word processors provide. This means that the text editor is simpler to learn, and what it can do is adequate for the task of entering a sequence, changing a few lines of text, or writing a quick note to send by electronic mail. For these simple tasks, it is easier and faster to use a text editor.
The second important difference between word processors and text editors is the way in which the data is stored. The price you pay for having underlining, bold face, multiple columns, and other features in word processors is the embedding of special computer codes within your file. If you used a word processor to enter data, your datafile would thus also contain these same codes. Consequently, only the word processor can directly manipulate the data in that file.
Text editors offer a way out of this dilemma, because files produced by a text editor contain only the characters that appear on the screen, and nothing more. These files are sometimes referred to as ASCII or TEXT files, since they only contain standard ASCII characters.
Generally, files created by UNIX or by other programs are ASCII files. This seemingly innocuous fact is of great importance, because it implies a certain universality of files. Thus, regardless of which program or UNIX command was used to create a file, it can be viewed on the screen ('cat filename' or 'more filename' ), sent to the printer ('lp filename'), appended to another file ('cat filename1 >> filename2'), or used as input by other programs. More importantly, all ASCII files can be edited with the same text editor.
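The commands mentioned in this paragraph can be tried out safely. The following is only a minimal sketch, assuming a Bourne-compatible shell (sh) and the widely available mktemp utility; the file names notes.txt and log.txt are made up for illustration and are not part of this guide.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create two small ASCII files (hypothetical names).
printf 'first line\n' > notes.txt
printf 'older entry\n' > log.txt

# View a file on the screen.
cat notes.txt

# Append the contents of notes.txt to the end of log.txt.
cat notes.txt >> log.txt

# log.txt now holds its original line followed by the line from notes.txt.
cat log.txt
```

Because both files are plain ASCII, the same cat command works on either of them, whichever program created them.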
There are a variety of text editors available for UNIX.
As mentioned earlier, the trade off for simplicity of the text editor is that it doesn't do a lot of things that word processors do. Consequently, when you have simple data to enter, or changes to make in a file, it is easier to use a text editor. When you have large amounts of text to type, or need special formatting, use a word processor on your personal computer to type in the information. You can save word processor files as TEXT to convert them to ASCII, then transfer them to the UNIX system using FTP or a variety of different file sharing protocols.
It is very easy to rapidly generate so many files that they become an unmanageable mess. This section will describe some strategies for managing your data, and keeping things simple. Also, disk space is precious on any computer. Make sure you remove outdated files to free up disk space for everyone.
Structuring Your Data in Directories
Probably the most useful habit to get into is to organize your files in tree-structured directories. Whenever you log in, you are placed in your home directory. Depending on what sort of work you are doing, it is useful to create subdirectories ('mkdir') to hold different sets of files. For example, if you were working with pea genes, you might have a sub-directory within your home directory called 'pea'. The organization of this directory is shown in Fig. 2. In the example, the prompt (enclosed in the {} characters) shows that the current working directory is /home/bubba/pea. Listing the files ('ls -l') shows that pea contains three subdirectories, indicated by a 'd' in the first column of each line. A directory listing of the drr directory shows several data files ('-' in column 1) and two subdirectories, each devoted to a particular multigene family (i.e., drr39 and drr49). Within each directory are sequences and other files related to each multigene family.
Aside: Since UNIX is a multi-user system, each user has a login name (bubba in Figure 2) and a home directory which is "owned" by the user. In this example, the first column of information on each line gives the access permissions, in a form such as -rwxrwxrwx. You can see that rwx (read, write, execute) occurs three times. Read access is the ability to read a file, write access is the ability to write to a directory or to make changes to a file, and execute access is for programs which must be run. The first rwx represents what the owner of the file (bubba, from column 3) can do to the file. The second rwx represents what other members of the user's group can do to the file. UNIX allows the system manager to set up work groups so that the protection system can grant access to groups but not to other users on the system. The last rwx represents what all other users can do to the file. See the manual page for the chmod command (man chmod), which allows you to change these permissions.
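To make the permission string in this aside more concrete, here is a small sketch of chmod in action. It assumes a Bourne-compatible shell and the mktemp utility; the file name results.txt and the choice of mode 640 are illustrative assumptions, not settings recommended by this guide.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create an empty file (hypothetical name).
touch results.txt

# 640 = owner may read and write (rw-), group may read (r--),
# all other users get no access (---).
chmod 640 results.txt

# Keep only the first ten characters of the long listing:
# the file type followed by the three rwx groups.
perms=$(ls -l results.txt | cut -c1-10)
echo "$perms"    # prints -rw-r-----
```

The three digits correspond to the three rwx groups (owner, group, others), with read counting 4, write 2 and execute 1.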
Figure 2 Example of directory organization using nested subdirectories.
% ls -l
total 3
drwx------  5 bubba    512 Mar 28 18:54 cab
drwx------  4 bubba    512 Apr 24 18:09 drr
drwx------  2 bubba    512 Nov 24 17:35 wft
% ls -l drr
total 49
drwx------  2 bubba   1024 Mar  8 10:02 drr39
drwx------  2 bubba   1024 Mar  8 17:45 drr49
-rw-------  1 bubba    754 Mar  9 15:15 oligos.dna
-rw-------  1 bubba  19932 Jul 10  1990 pCHS2.seq
% ls -l drr/drr39
total 23
-rw-------  1 bubba   1460 Mar  6 19:13 drr39.aln
-rw-------  1 bubba    354 Mar  4 17:16 drr39.pep.aln
-rw-------  1 bubba   2275 Mar  6 19:16 drr39.ref
-rw-------  1 bubba    314 Sep  7  1990 pi230.pep
-rw-------  1 bubba    570 Mar  4 18:09 pi230.seq
-rw-------  1 bubba    326 Sep  7  1990 pi39.pep
-rw-------  1 bubba  11558 Nov 14 11:18 pi39.rest
-rw-------  1 bubba    556 Mar  4 18:08 pi39.seq
-rw-------  1 bubba    469 Mar  4 18:11 pi39.wrp

Organization of directories can be tailored to each particular problem. If you were sequencing several genes, each gene should probably have a separate directory to contain all of the files related to the sequencing project. Another approach might be to set up directory hierarchies to match an evolutionary tree. The most important thing is to use some sort of organization that makes sense in the context of the projects you are working on. Here are some general guidelines for organizing directories:
It is sometimes useful to create temporary directories, even if you only use them for half an hour and get rid of them. For example, if you were searching the databases for DNA and protein sequences for plant 'pathogenesis-related proteins', you might create a directory called prp, and use this as your working directory when searching for and retrieving the sequences. Once the sequences have been retrieved, you can discard the false positives and then divide the remaining entries among directories specialized for particular classes of sequences (e.g., chitinase, glucanase, and so forth). Once the sequences have been redistributed, you can delete the prp directory.
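The temporary-directory workflow in this paragraph might look roughly like the following. This is a sketch only, assuming a Bourne-compatible shell and mktemp; the sequence file names (chit1.seq, gluc1.seq, junk.seq and so on) are invented stand-ins for whatever a real database search would return.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create the temporary working directory and move into it.
mkdir prp
cd prp

# Stand-ins for retrieved sequences (hypothetical file names).
touch chit1.seq chit2.seq gluc1.seq junk.seq

# Discard a false positive.
rm junk.seq

# Distribute the remaining entries among specialized directories.
mkdir ../chitinase ../glucanase
mv chit*.seq ../chitinase
mv gluc*.seq ../glucanase

# Leave the temporary directory and delete it.
# rmdir only succeeds once the directory is empty.
cd ..
rmdir prp
ls
```

Using rmdir rather than a recursive remove is a deliberate choice here: it refuses to delete prp if any file was forgotten inside it.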
In Fig. 2, the drr39 directory illustrates the strategic use of file extensions. Two members of the drr39 multigene family have been sequenced: cDNAs pi39 and pi230, whose DNA and protein sequences are stored in pi39.seq and pi230.seq, and pi39.pep and pi230.pep, respectively. Additionally, a restriction site search was done on pi39, and the output stored in pi39.rest (Note that UNIX permits file extensions longer than 3 characters). Sequence similarity alignments of the DNA and protein sequences are stored in the files drr39.aln and drr39.pep.aln.
Another useful convention of file extensions is to use all or part of the name of the program that produced the file as the file extension. Thus, the output from a string search using grep would have the file extension '.grep'. Similarly, multiply-aligned sequences reformatted by the reform program have the extension '.ref', as in drr39.ref.
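As a small illustration of naming an output file after the program that produced it, the sketch below saves the result of a grep search in a file ending in '.grep'. It assumes a Bourne-compatible shell and mktemp; the file sample.seq and its contents are made up for the example.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# A made-up three-line sequence file.
printf 'ATGCATGC\nGGGTTTAA\nATGAAACC\n' > sample.seq

# Search for the string ATG, numbering the matching lines (-n),
# and store the output under the name of the program that made it.
grep -n ATG sample.seq > sample.grep

# Lines 1 and 3 contain ATG, so sample.grep holds:
#   1:ATGCATGC
#   3:ATGAAACC
cat sample.grep
```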
File extensions make it possible to work with groups of files in single commands. For example, if you wanted to create a new directory containing only protein sequences taken from the current directory, the following commands would create the directory 'protein', and move all '.pep' files into it:
mkdir protein
mv *.pep protein
In this case the * is a wild card that will match any file that ends with .pep.
Similarly, the following commands would create a new directory containing all pi230-related files:
mkdir 230
mv pi230.* 230
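Before moving files with a wild card, it can be reassuring to see exactly which names the pattern will match. One way to do this, sketched here under the assumption of a Bourne-compatible shell and mktemp, is to hand the same pattern to echo first. The empty files are created only so that the example has something to match.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Empty stand-ins for the files shown in Fig. 2.
touch pi39.pep pi230.pep pi39.seq pi230.seq

# The shell expands the pattern before echo runs,
# so this prints the names that "mv *.pep protein" would move.
pep_files=$(echo *.pep)
echo "$pep_files"

# Likewise for everything related to pi230.
pi230_files=$(echo pi230.*)
echo "$pi230_files"
```

If the list is not what you expected, adjust the pattern before running mv.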
Because of the variety of terminals and computers networked at the Center, there is no single set of procedures for connecting. However, in this section we will attempt to cover those aspects of connection that can be generalized.
Unless your PC already has an Ethernet port, you will need to obtain an Ethernet card. It should be possible to get an Ethernet card for about $200. Check with Purchasing for options or call BSR. Macintosh computers without built-in Ethernet adapters can be connected through their AppleTalk port, or you can go through the same process as for PCs and install an Ethernet card. All new Macs come with Ethernet built in.
When using Ethernet with a PC, you need two programs: TELNET and FTP, which handle terminal sessions and file transfer, respectively. The most popular PC based TELNET and FTP program at the Center is a product called REFLECTION. You are strongly advised to speak to the manager of the network applications you wish to run BEFORE you purchase. There are many flavors of graphics emulators available for TELNET and you must choose the right product to match your application. The same is true for the Macintosh TELNET programs. There are free versions and commercial versions and you must carefully choose to make sure you will be able to see the screen graphics you require. FETCH is a shareware program available for Mac FTP. You will also need MacTCP which is a program which allows a Mac to communicate over the network. Ask BSR staff for help choosing and installing this software.
Once TELNET and FTP are installed on your PC, connection is simple. In the example shown below, the user is connecting to workstation 'adam' by running TELNET at the DOS prompt (You may have another method of starting up TELNET and FTP depending upon your computer configuration):
C:\>TELNET adam
Trying 137.28.109.6...
Connected to adam.cs.uwec.edu.
Escape character is '^]'.

OSF/1 (adam.cs.uwec.edu) (ttyp6)

login: bubba
Password:
Last login: Wed Mar 27 15:23:28 from bubba
DEC OSF/1 V3.2 (Rev. 214); Wed Aug 16 02:05:09 CDT 1995
DEC OSF/1 V3.2 Worksystem Software (Rev. 214)
.... various login messages, and finally the prompt:
%
This procedure can also be followed for connection to any remote system. Connection using FTP is similar:
C:\>FTP adam.cs.uwec.edu
Connected to adam.cs.uwec.edu.
220 adam.cs.uwec.edu FTP server (OSF/1 Version 5.60) ready.
Name (adam:tan): tan
331 Password required for tan.
Password:
230 User tan logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp>
TELNET and FTP are actually UNIX utilities that have been ported to DOS and the Mac.
Please refer to your manuals or ask BSR staff for help using Telnet or FTP.
MODEM connection details vary depending upon the modem and operating system you use and will not be covered here.
Electronic mail (Email) makes it possible to send messages between users on different systems or on the same system. In its simplest form, Email lets you type a quick message and send it to one or more recipients in a matter of seconds. No printing is involved, and you don't have to wait for your secretary to get around to typing up a FAX form and sending the FAX out. If the recipients happen to be logged on, the reply can be received in minutes, right at your screen. Electronic mail has many advantages over telephone and FAX. Email is usually cheaper than calling, and avoids the frustrating phenomenon of 'telephone tag'. Additionally, Email helps break down time zone differences. The recipient gets the message the next time he/she logs in.
While many people are now hooked on FAX, the widespread availability of this medium may in the long run be counter productive by slowing the acceptance of Email. While FAXs are often difficult to read, Email, like other data sent across the Internet, is filtered through error-checking programs. If an error is detected during transmission, the packet is simply resent. Perhaps the greatest advantage of Email over FAX is that the recipient gets the message in computer-readable form, meaning that it can be edited or used as data, printed or stored on disk. Finally, FAX becomes impractical beyond 10 or so pages. In contrast, Email can handle very large documents.
Depending upon the computer and operating system you use, you may use one of several different Email programs. Our servers give access to several Email programs used to write, send, receive, and read Email. The PINE program is a menu-driven system that works when you TELNET to the host computer. This has been the primary mail utility for PCs. We also use an Email program called Eudora. It will transfer your UNIX mail to your Mac or PC and present it in a graphical interface. Eudora is the program of choice. Finally, the UNIX mail program is more difficult and is not recommended for the average user.
Please ask BSR staff for an Email account and directions for access.
Most of the commands listed below can take both arguments and options. Usually, arguments follow options (e.g. "man -k file"). Use the "man" command to find out more details.
access:
login start a new login session on the same machine
logout or exit end the current login session
telnet connect to a remote host via Internet. Also "rlogin"
security:
passwd change your password
help:
man Unix manual pages (specify a command as argument)
man -k searches all manual pages for a string (specify a string)
files:
ls list files in the current or specified directory
cd change directories (with no arguments, goes to home.
"cd ~" also goes to home)
pwd show current working directory
mkdir create a subdirectory
rm remove a file (rmdir to remove a directory)
cat dump a file to your screen
more display a file one screen at a time
grep search a file for a string
job info:
whoami displays your username
hostname displays the host's name
w shows who's logged on and system stats
uptime shows just system stats
finger gives information about particular users (or system
use, if used without an argument).
communication:
mail send mail. specify a username, e.g. "mail jdoe@fred.fhcrc.org"
mail read mail. specify no argument.
talk use the "talk" protocol for interactive chatting
write use the "write" protocol to send a line message
file transfer:
kermit upload and download to a PC running Kermit
ftp transfer files between Internet hosts
file compression/uncompression:
compress/uncompress for files ending in .Z
printing:
lp ONLY for text files.
job control:
^C control-c To kill a job
^S control-s To suspend output to your terminal
^Q control-q To restart output suspended with ^S
^D control-d "end of input" sometimes used for ending
sessions or programs.
^Z control-z To suspend a job. This simply puts a job in
hibernation, it does NOT kill it.
fg foreground To bring a job suspended with ^Z back to the foreground
bg background To run a suspended job in the background. Good
for long-running jobs you don't want to watch or wait for.
& background Used on the command line to put a job in the
background from the start (without ^Z). e.g.
"rcp bigfile alexia.lis&"
I/O redirection & pipes:
> direct output to a new file (may overwrite an existing
file!)
< get input from a file
>> append output to an existing file
| pipe the output of one command to the input of another
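The four redirection symbols above can be seen working together in a short session. This is a minimal sketch, assuming a Bourne-compatible shell and mktemp; the file names fruit.txt and sorted.txt are invented for the example.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# ">" creates (or overwrites) a file with the output of a command.
printf 'pear\napple\nfig\n' > fruit.txt

# "<" feeds a file to a command as input; ">" stores the result.
sort < fruit.txt > sorted.txt

# ">>" appends to an existing file instead of overwriting it.
printf 'kiwi\n' >> sorted.txt

# "|" passes the output of one command to the next one.
# tr strips the padding that some versions of wc put around the number.
count=$(cat sorted.txt | wc -l | tr -d ' ')
echo "$count"    # prints 4

cat sorted.txt
```

Note that sorted.txt is no longer fully sorted after the append: ">>" simply adds to the end of the file.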
Common mail commands (given at the UNIX Mail prompt):
h show 1-line message summaries
? help screen
"n" some number. Show that message, e.g. "2"
d delete a message. Just "d" deletes current message. "dn"
deletes message n, e.g. "d2"
s save a message to a file. Specify a filename, or else
default is ~/mbox. e.g. "s 33 mail.doc"
Common mail tilde escapes (given within a mail message in column 1):
~r filename read a file into the current message
~q quit. Do not send the message. (Saves what you have
written so far to a file called "dead.letter")
~b username blind carbon copy to "username"
Common command usage:
man -k file to search for all commands with "file" in their description.
ls -a list all files, including those which start with a period.
ls -l long file listing.
ls *.txt lists all files ending in ".txt"
lp file.txt prints a text file on www's printer in B1-080
rlogin www.fhcrc.org -l username
Access www using rlogin. The -l tells www which
username to log you in as.
cd ~username change current directory to the home directory of some
other user, e.g. "cd ~gbnewby" This works only if the
other user has enabled access!
telnet localhost login to the local host (that is, the same machine you are
on) using telnet. Allows multiple sessions.
grep -in library file | more
look for all occurrences of the string "library" in a file
called "file" (e.g., the list of mailing lists). Print
them with the line number. Pipe the output through the
"more" command (so it doesn't scroll off your screen)
ls -lR > dirfile generate a long recursive directory listing, and place the
output in a file called "dirfile" for later searching.
rm -i cautious rm. Prompts you before removing each file (this
is usually the default).
mail gbnewby@alexia.lis.uiuc.edu < filename
send mail, but get input from "filename"
instead of from the keyboard. This has the effect of
sending a file via email.
cat file1 file2 > file3
has the effect of creating a new file, file3, which
contains the contents of file1 and file2
cat file3 >> file4
has the effect of appending the contents of file3 to an
existing file, file4
Additional commands in brief (see "man" for more detailed information):
head show the first part of a file
tail show the end of a file
wc count words/lines/characters in a file
pr paginate a file for printing
which find the location of a command
csh, sh, ksh, tcsh run another shell
tar tape archive. For collections of files
make for managing compilation of source code
whoami get your username
script capture all commands and output in a file
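A few of these additional commands can be tried on a small file. The sketch below assumes a Bourne-compatible shell and mktemp; the file five.txt is made up for the example.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# A made-up file with five lines.
printf 'line1\nline2\nline3\nline4\nline5\n' > five.txt

# head shows the first part of a file; -n 2 asks for two lines.
first=$(head -n 2 five.txt)
echo "$first"

# tail shows the end of a file; -n 1 asks for the last line only.
last=$(tail -n 1 five.txt)
echo "$last"     # prints line5

# wc -l counts lines. tr strips the padding that some
# versions of wc put around the number.
lines=$(wc -l < five.txt | tr -d ' ')
echo "$lines"    # prints 5
```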
general command syntax:
command [ options ] [ arguments ]
Some commands take options or arguments.
Some options or arguments may be required.
Most options are preceded by a hyphen ( - )
and are a single letter. Options might be
specified separately or together (e.g., these
are equivalent: "grep -ni str fil" and
"grep -n -i str fil"). Arguments usually
follow options.
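The claim that options can be given separately or together is easy to check for yourself. The following sketch assumes a Bourne-compatible shell and mktemp; the file name "fil" follows the example above, but its contents are invented here.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# A made-up file to search.
printf 'Library hours\nno match here\nthe LIBRARY annex\n' > fil

# Options combined behind one hyphen...
combined=$(grep -ni library fil)

# ...and the same options given separately.
separate=$(grep -n -i library fil)

# Both forms number the matching lines (-n) and ignore case (-i).
echo "$combined"

if [ "$combined" = "$separate" ]; then
    echo "both forms give the same output"
fi
```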
Some general tips:
Password selection: Make sure your password is impossible to guess. It should not be in any dictionary, forwards or backwards. Mix in special characters (such as -_+='.), numbers, and mixed-case letters. You should change your password often.
"You have stopped jobs." This message appears when you log out after using ^Z to suspend jobs without ever resuming them. Use "fg" to bring back the jobs and exit them normally (or interrupt them with ^C). If you type "logout" twice in a row, UNIX will try to kill the suspended jobs for you, but this is not reliable.