Safari | Beginning Perl for Bioinformatics -> B.14 Input/Output

B.14 Input/Output

This section covers getting information into programs and receiving data back from them.

B.14.1 Input from Files

Perl has several convenient ways to get information into a program. In this book, I've emphasized opening files and reading in the information they contain, because this technique is used frequently and behaves much the same way on all operating systems. You've seen the open and close functions and how opening a file associates a filehandle with it; the filehandle is then used to read in the data. As an example:

open(FILEHANDLE, "informationfile")
    or die "Cannot open informationfile: $!\n";
@data_from_informationfile = <FILEHANDLE>;
close(FILEHANDLE);

This code opens the file informationfile and associates the filehandle FILEHANDLE with it. The filehandle is then used within angle brackets < > to actually read in the contents of the file and store the contents in the array @data_from_informationfile. Finally, the file is closed by referring once again to the opened filehandle.
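
Reading a whole file into an array is convenient, but it holds every line in memory at once. As a minimal sketch of an alternative, the following uses the three-argument form of open with a lexical filehandle and reads one line at a time. The subroutine name count_lines and the temporary file are assumptions made for illustration; they are not part of the book's code.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: count the lines in a file, reading one line at a time.
sub count_lines {
    my ($filename) = @_;
    open(my $fh, '<', $filename)
        or die "Cannot open $filename: $!\n";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
    }
    close($fh);
    return $count;
}

# Create a small temporary file so the example runs on its own.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print $tmp_fh "ACGT\nTTGA\n";
close($tmp_fh);

print count_lines($tmp_name), "\n";
```

The loop form is usually the better choice for large sequence files, since only the current line needs to be in memory.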

B.14.2 Input from STDIN

Perl allows you to read in any input that is automatically sent to your program via standard input (STDIN). STDIN is a filehandle that by default is always open. Your program may be expecting some input that way. For instance, on a Mac, you can drag and drop a file icon onto the Perl applet for your program to make the file's contents appear in STDIN. On Unix systems, you can pipe the output of some other program into the STDIN of your program with shell commands such as:

someprog | my_perl_program

You can also pipe the contents of a file into your program with:

cat file | my_perl_program

or with:

my_perl_program < file

Your program can then read in the data (from program or file) that comes as STDIN just as if it came from a file that you've opened:

@data_from_stdin = <STDIN>;
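
Input that arrives on STDIN can also be processed a line at a time. The sketch below is one way to do that, assuming a hypothetical helper named read_nonblank_lines that accepts any filehandle. So that the example runs without piped input, an in-memory filehandle stands in for STDIN; in a real program you would pass \*STDIN instead.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from any filehandle, strip the
# trailing newline from each, and skip blank lines.
sub read_nonblank_lines {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        push @lines, $line;
    }
    return @lines;
}

# In a real program: my @seqs = read_nonblank_lines(\*STDIN);
# Here an in-memory filehandle stands in for piped input.
my $fake_input = "ACGT\n\nGGCC\n";
open(my $in, '<', \$fake_input)
    or die "Cannot open in-memory filehandle: $!\n";
my @seqs = read_nonblank_lines($in);
close($in);

print scalar(@seqs), " lines read\n";
```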

B.14.3 Input from Files Named on the Command Line

You can name your input files on the command line. Perl places all command-line arguments into the array @ARGV. Some of these may be special flags, which should be read and removed from @ARGV if datafiles are also named, because Perl assumes that anything left in @ARGV is an input filename when it reaches a <> operator. The empty angle operator <> is shorthand for <ARGV>: the ARGV filehandle treats @ARGV as a list of filenames and returns the contents of all those files, one line at a time. (If @ARGV is empty, <> reads from STDIN instead.) The contents of the file or files are then available to the program using the angle brackets <> without a filehandle, like so:

@data_from_files = <>;

For example, on Microsoft Windows, Unix, or Mac OS X, you specify input files at the command line, like so:

% my_program file1 file2 file3
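
Removing flags from @ARGV before using <> can be sketched as follows. This is only an illustration under a simple assumed convention (any argument beginning with a hyphen is a flag); the helper name split_args and the sample arguments are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a command-line argument list into
# flags (arguments starting with '-') and filenames.
sub split_args {
    my @args = @_;
    my (@flags, @files);
    for my $arg (@args) {
        if ($arg =~ /^-/) {
            push @flags, $arg;
        }
        else {
            push @files, $arg;
        }
    }
    return (\@flags, \@files);
}

# Illustrative argument list; a real program would pass @ARGV.
my ($flags, $files) = split_args('-v', 'file1', 'file2');

# Assigning the filenames back with  @ARGV = @$files;  would leave
# only input files for the <> operator to open.
print "flags: @$flags\n";
print "files: @$files\n";
```

For anything beyond a flag or two, the Getopt::Long module that ships with Perl handles this parsing and removes the options it recognizes from @ARGV.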

B.14.4 Output Commands

The print statement is the most common way to output data from a Perl program. The print statement takes as arguments a list of scalars separated by commas. An array can be an argument, in which case, the elements of the array are all printed one after the other:

@array = ('DNA', 'RNA', 'Protein');
print @array;

This prints out:

DNARNAProtein

If you want to put spaces between the elements of an array, place it between double quotes in the print statement, like this:

@array = ('DNA', 'RNA', 'Protein');
print "@array";

This prints out:

DNA RNA Protein
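
If you want a separator other than a space, two options are the join function and the list separator variable $", which controls what is placed between array elements interpolated in double quotes. The following short sketch shows both; the variable names are illustrative only.

```perl
use strict;
use warnings;

my @array = ('DNA', 'RNA', 'Protein');

# join builds a string with any separator you choose.
my $with_commas = join(', ', @array);
print "$with_commas\n";

# Interpolation in double quotes uses the list separator variable $",
# a single space by default. Use local to limit the change to a block.
my $with_dashes;
{
    local $" = '-';
    $with_dashes = "@array";
}
print "$with_dashes\n";
```

This prints DNA, RNA, Protein on the first line and DNA-RNA-Protein on the second.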

The print statement can specify a filehandle as an optional indirect object between the print keyword and its arguments, like so:

print FH "@array";

The printf function gives more control over the formatting of the output of numbers. For instance, you can specify field widths; the precision, or number of places after the decimal point; and whether the value is right- or left-justified in the field. I showed the most common options in Chapter 12 and refer you to the Perl documentation that comes with your copy of Perl for all the details.

The sprintf function is related to the printf function; instead of printing the formatted text, it returns it as a string.
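
As a brief illustration of field width, precision, and justification, here is a small sketch using sprintf and printf. The values and the label GC are made up for the example.

```perl
use strict;
use warnings;

# %-10s  left-justifies a string in a 10-character field
# %6.2f  right-justifies a number in a 6-character field
#        with 2 places after the decimal point
my $line = sprintf("%-10s|%6.2f|", 'GC', 42.5);
print "$line\n";

# printf formats and prints in one step;
# %05d pads an integer with zeros to 5 digits
printf("%05d\n", 42);
```

The first statement stores the formatted text in $line without printing it, which is useful when the string is needed later or must be written to several places.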

The format and write commands are a way to format a multiline output, as when generating reports. format can be a useful command, but in practice it isn't used much. The full details are available in your Perl documentation, and O'Reilly's Programming Perl contains an entire chapter on format. You can also see format in Chapter 12 of this book.

B.14.4.1 Output to STDOUT, STDERR, and Files

Standard output, with the filehandle STDOUT, is the default destination for output from a Perl program, so it doesn't have to be named. The following two statements are equivalent unless you used select to change the default output filehandle:

print "Hello biology world!\n";
print STDOUT "Hello biology world!\n";

Note that the STDOUT isn't followed by a comma. STDOUT is usually directed to the computer screen, but it may be redirected at the command line to other programs or files. This Unix command pipes the STDOUT of my_program to the STDIN of your_program:

my_program | your_program

This Unix command directs the output of my_program to the file outputfile:

my_program > outputfile

It's also common to direct certain error messages to the predefined standard error filehandle STDERR or to a file you've opened for output and named with a particular filehandle. Here are examples of these two tasks:

print STDERR "If you reached this part of the program, something is terribly wrong!";

open(OUTPUTFD, ">output_file")
    or die "Cannot open output_file: $!\n";
print OUTPUTFD "Here is the first line in the output file output_file\n";

STDERR is also usually directed to the computer screen by default, but it can be redirected to a file from the command line. The syntax differs between systems; on Unix with the sh or bash shells, for example:

myprogram 2>myprogram.error

You can also direct STDERR to a file from within your Perl program by including code such as the following before the first output to STDERR. This is the most portable way to redirect STDERR:

open (STDERR, ">myprogram.error")
    or die "Cannot open error file myprogram.error:$!\n";

The problem with this is that the original STDERR is lost. This method, taken from Programming Perl, saves and restores the original STDERR:

open ERRORFILE, ">myprogram.error"
	or die "Can't open myprogram.error";
open SAVEERR, ">&STDERR";
open STDERR, ">&ERRORFILE";

print STDERR "This will appear in error file myprogram.error\n";

# now, restore STDERR 
close STDERR;
open STDERR, ">&SAVEERR";

print STDERR "This will appear on the computer screen\n";
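
The same save-and-restore idea can be written with lexical filehandles, the three-argument form of open, and an error check on every call. This is a sketch under my own assumptions rather than the version from Programming Perl: a temporary file stands in for myprogram.error, and the variable names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file standing in for myprogram.error.
my ($tmp_fh, $error_file) = tempfile(UNLINK => 1);
close($tmp_fh);

# Save a duplicate of the original STDERR.
open(my $saved_err, '>&', \*STDERR)
    or die "Cannot duplicate STDERR: $!\n";

# Redirect STDERR to the file.
open(STDERR, '>', $error_file)
    or die "Cannot redirect STDERR: $!\n";
print STDERR "This goes to the error file\n";

# Restore the original STDERR.
close(STDERR);
open(STDERR, '>&', $saved_err)
    or die "Cannot restore STDERR: $!\n";
close($saved_err);

# Read the file back to confirm what was captured.
open(my $in, '<', $error_file)
    or die "Cannot open $error_file: $!\n";
my @captured = <$in>;
close($in);
print "Captured: $captured[0]";
```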

There are a lot of details concerning filehandles not covered in this book, and redirecting one of the predefined filehandles such as STDERR can cause problems, especially as your programs get bigger and rely more on modules and libraries of subroutines. A safer approach is to define a new filehandle associated with an error file and to send all your error messages to it:

open (ERRORMESSAGES, ">myprogram.error")
	or die "Cannot open myprogram.error:$!\n";

print ERRORMESSAGES "This is an error message\n";

Note that the die function, and the closely related warn function, print their error messages to STDERR.
