Beginning Perl for Bioinformatics > 12. BLAST > 12.5 Presenting Data

12.5 Presenting Data

Up to now, we've relied on the print statement to format output. In this section, I introduce three additional Perl features for writing output:

printf function
here documents
format and write functions

The entire story about these Perl output features is beyond the scope of this book, but I'll tell you just enough to give you an idea of how they can be used.

12.5.1 The printf Function

The printf function is like the print function but with extra features that allow you to specify how certain data is printed out. Perl's printf function is taken from the C language function of the same name. Here's an example of a printf statement:

my $first  = '3.14159265';
my $second  = 76;
my $third = "Hello world!";

printf STDOUT "A float: %6.4f An integer: %-5d and a string: %s\n", 
     $first, $second,  $third;

This code snippet prints the following:

A float: 3.1416 An integer: 76    and a string: Hello world!

The arguments to the printf function consist of a format string followed by a list of values, which are printed as specified by the format string. Besides the directives that control how each value is printed, the format string may contain any literal text. (You may also specify an optional filehandle, just as you would with the print function.)

Each directive consists of a percent sign followed by a required conversion specifier; the example uses f for floating point, d for integer, and s for string. The conversion specifier indicates what kind of data is in the variable to be printed. Between the % and the conversion specifier, there may be zero or more flags, an optional minimum field width, an optional precision, and an optional length modifier. The values following the format string are matched to the directives in order, so there should be one value of the appropriate type for each directive.

There are many possible options for these flags and specifiers (some are listed in Appendix B). Here's what the preceding example does. First, the directive %6.4f prints a floating-point (that is, decimal) number with a minimum width of six characters overall (padded on the left with spaces if necessary) and exactly four digits after the decimal point. You can see in the output that, although $first gives the value of pi to eight decimal places, only four decimal places are printed, and the value is rounded (3.14159265 becomes 3.1416) rather than simply cut off.
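To make the difference between width and precision concrete, here is a short sketch that uses sprintf, which accepts the same format strings as printf but returns the formatted string instead of printing it. The values are arbitrary examples; the point is that the precision fixes the number of decimal places (rounding or adding zeros), while the width is only a minimum that pads but never cuts.

```perl
#!/usr/bin/perl
# Width versus precision in printf-style directives, shown with sprintf
use strict;
use warnings;

my $pi = '3.14159265';

my $rounded  = sprintf '%.4f',     $pi;  # "3.1416": rounded to four places
my $exact    = sprintf '%.4f',     2;    # "2.0000": zeros added to reach four places
my $padded   = sprintf '[%8.2f]',  $pi;  # "[    3.14]": width 8, padded with spaces
my $zero_pad = sprintf '[%08.2f]', $pi;  # "[00003.14]": the 0 flag pads with zeros
my $too_wide = sprintf '[%3.4f]',  $pi;  # "[3.1416]": width is a minimum, never a limit

print "$rounded\n$exact\n$padded\n$zero_pad\n$too_wide\n";
```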

The %-5d directive specifies an integer to be printed in a field of width 5; the - flag causes the number to be left-justified in the field. Finally, the %s directive prints a string.
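Left- and right-justified fields are what make printf useful for lining up columns of results. The following sketch prints a small report of sequence lengths and GC content; the sequence IDs and sequences are made-up examples, and %% in the format string prints a literal percent sign.

```perl
#!/usr/bin/perl
# Aligned report columns with printf-style directives.
# The IDs and sequences are hypothetical examples.
use strict;
use warnings;

my %sequences = (
    seq1    => 'ACGTACGT',
    seq_two => 'ACG',
);

# Build each line with sprintf so it can be printed (or checked) later
my @report;
push @report, sprintf "%-10s %6s %7s", 'ID', 'Length', 'GC';
for my $id (sort keys %sequences) {
    my $dna    = $sequences{$id};
    my $gc     = ($dna =~ tr/GCgc//);          # count G and C bases
    my $gc_pct = 100 * $gc / length($dna);
    # %-10s:   ID left-justified in 10 columns
    # %6d:     length right-justified in 6 columns
    # %6.1f%%: GC percentage with one decimal place, then a literal %
    push @report, sprintf "%-10s %6d %6.1f%%", $id, length($dna), $gc_pct;
}

print "$_\n" for @report;
```

Because every field has a fixed width, each line comes out the same length, so the columns line up no matter how long the individual IDs or numbers are (as long as they fit their fields).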

12.5.2 here Documents

Now we'll briefly examine here documents. A here document is a convenient way to specify multiline output text, optionally with some variables to be interpolated, so that the text looks much the same in your code as it will in the output: no long series of print statements and no embedded newline (\n) characters. We'll follow Example 12-3 and its output with a discussion.

Example 12-3. Example of here document

#!/usr/bin/perl
# Example of here document

use strict;
use warnings;

my $DNA = 'AAACCCCCCGGGGGGGGTTTTTT';

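for ( my $i = 0 ; $i < 2 ; ++$i ) {
    # The body lines and the HEREDOC terminator start in column 1,
    # so no unwanted leading spaces appear in the output
    print <<HEREDOC;
On iteration $i of the loop!
$DNA

HEREDOC
}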

exit;

Here's the output from Example 12-3:

On iteration 0 of the loop!
AAACCCCCCGGGGGGGGTTTTTT

On iteration 1 of the loop!
AAACCCCCCGGGGGGGGTTTTTT

In Example 12-3, the here document is placed inside a for loop so that you can see the value of $i change in the printout. Variables are interpolated into a here document in the same way they are interpolated into a double-quoted string: each time through the loop, the contents of the here document are interpolated afresh and printed. The terminating string used in this example, HEREDOC, can be any string you choose, but in this form of here document it must appear on a line by itself, starting in the first column, to mark the end of the text. (There are several options for dealing with things like indentation and quoting; I won't discuss them here, but you can find them in the Perl documentation.)

Here documents are handy for some tasks, such as when you have a long, multiline document with just a few changes applied each time you print it. A business form letter, with only the addressee changed, is a typical example. Using a here document preserves the look of the final output in the code, while allowing variable interpolation.
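Here's a minimal sketch of that form-letter idea. The addressees and the sequence ID are hypothetical placeholders. It also shows one of the quoting options: putting the terminator in double quotes (<<"LETTER") behaves like the bare form and interpolates variables, while single quotes (<<'LETTER') turn interpolation off, just as with ordinary quoted strings.

```perl
#!/usr/bin/perl
# A form letter built with here documents.
# The names and the sequence ID are hypothetical placeholders.
use strict;
use warnings;

my $sequence_id = 'A0000';
my @letters;

for my $name ('Dr. Smith', 'Dr. Jones') {
    # <<"LETTER" interpolates like a double-quoted string
    push @letters, <<"LETTER";
Dear $name,

Your sequence $sequence_id has been analyzed.
LETTER
}

# <<'LETTER' works like a single-quoted string: no interpolation
my $template = <<'LETTER';
Dear $name,
LETTER

print @letters, $template;
```

Only the addressee changes from one letter to the next; the rest of the text is written once, laid out exactly as it will appear.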

12.5.3 format and write

Finally, let's take a look at the format and write functions. format is designed to generate reports and can handle page numbers, headers, and various layout options such as centering and left and right justification. It's modelled on the FORTRAN programming-language conventions for formatting and so is particularly handy for producing reports based on that style, such as the PDB file format, in which fields are specified as occupying certain columns on the line.

Example 12-4 is a short example of a format that creates a FASTA-style output.

Example 12-4. Example of format function to produce FASTA output

#!/usr/bin/perl
# Create fasta format  DNA output with "format" function

use strict;
use warnings;

# Declare variables
my $id = 'A0000';
my $description = 'Highly unlikely DNA.  This DNA is so unlikely!';
my $DNA = 'AAAAAACCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTTTT';

# Define the format
format STDOUT =
# The header line
>@<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<...
$id,        $description
# The DNA lines
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~
$DNA
.

# Print the fasta-formatted DNA output
write;

exit;

Here's the output of Example 12-4:

>A0000      Highly unlikely DNA.  This DNA is so...
AAAAAACCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGTTTTTTTTT
TTTTTTTTTTTT

After declaring and initializing the variables that fill in the format, the program begins the format definition with:

 format STDOUT =

and the format definition continues until a line consisting of a single period (.) in the first column.

The format is composed of three kinds of lines:

A comment line, which begins with a pound sign (#) in the first column
A picture line that specifies the layout of text
An argument line that names the variables that fill in the preceding picture line

The picture line and the argument line must be adjacent; they can't be separated by a comment line, for instance.

The first picture line/argument line combo is for the header information:

>@<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<...
$id,        $description

The picture line has two picture fields in it, associated with the variables $id and $description, respectively. The picture line begins with a greater-than sign (>), which is just literal text: by definition, every FASTA header line begins with >. Then comes the first picture field, which is an @ sign followed by nine < signs. The @ sign declares a field into which the value of the associated variable is interpolated. The nine less-than signs specify that the value should be left-justified in a field 10 columns wide (the @ counts as one of the columns). If the value is longer than 10 columns, it is truncated. A less-than sign left-justifies, a greater-than sign right-justifies, and a vertical bar (|) centers the data in the field.

The second picture field is almost identical. It is longer, and it ends with three dots (an ellipsis), which are printed only if the value of $description is too long to fit in the field (as is the case here).
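To see the three kinds of justification and truncation side by side without writing a whole format, you can call formline, the built-in function that formats use internally (see perlfunc). It formats a list of values according to a picture string and appends the result to the accumulator variable $^A instead of printing it. This is a sketch for experimenting with picture fields; the bracketed picture strings are arbitrary examples.

```perl
#!/usr/bin/perl
# Left, right, and centered @ fields, demonstrated with formline.
# formline appends its result to $^A rather than printing it.
use strict;
use warnings;

# Each field is six columns wide: the @ plus five <, >, or | signs.
# Single quotes keep Perl from treating @<<<<< as an array name.
$^A = '';
formline('[@<<<<<] [@>>>>>] [@|||||]', 'ab', 'ab', 'ab');
my $justified = $^A;    # left, right, and centered copies of 'ab'

# A value longer than its field is truncated to the field width
$^A = '';
formline('[@<<<]', 'abcdefg');
my $truncated = $^A;

print "$justified\n$truncated\n";
```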

The next pair of picture/argument lines is:

^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~
$DNA

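The picture field starts with a caret (^), which declares a fill-mode field: Perl prints as much of the variable's text as fits in the field and then removes that text from the front of the variable, leaving the remainder for the next line. (As a result, $DNA is empty after the write.) The line also contains 49 less-than signs, for a total of 50 columns, left-justified. At the end are two tildes (~~), which tell Perl to repeat the line as many times as necessary until all of the data has been printed; the tildes themselves are printed as spaces.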
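You can watch a ^ field consume its variable with formline as well. This sketch wraps a short, made-up sequence into 10-column lines (the width is arbitrary). Since formline changes the variable it is given, you must pass it a variable rather than a literal string.

```perl
#!/usr/bin/perl
# A ^ field with ~~ wraps a long value over as many lines as needed,
# removing each printed chunk from the front of the variable.
# The sequence and the 10-column width are arbitrary examples.
use strict;
use warnings;

my $dna = 'ACGTACGTACGTACGTAC';    # 18 bases

$^A = '';
formline("^<<<<<<<<<~~\n", $dna);  # ^ plus nine < signs: 10 columns
my $wrapped = $^A;

print $wrapped;
print "Bases left in \$dna: ", length($dna), "\n";
```

The output is two lines, ACGTACGTAC and GTACGTAC, and $dna is left empty, which is exactly what happens to $DNA in Example 12-4.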

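The write function prints a record using a previously defined format. By default, write sends its output to the currently selected output filehandle, normally STDOUT, using the format that has the same name as that filehandle; that's why the format in the example is named STDOUT. You can instead give both the format and the write call the name of a different filehandle.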
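As a sketch of writing to a filehandle other than STDOUT, the following program sends a small report to a temporary file through a filehandle named REPORT. It also defines a format named REPORT_TOP, which Perl prints automatically at the top of each page; this is the header handling mentioned at the start of this section. The sequence IDs and lengths are hypothetical, and the temporary file (from the standard File::Temp module) simply stands in for whatever report file you would normally open.

```perl
#!/usr/bin/perl
# Writing a formatted report to a named filehandle instead of STDOUT.
# The sequence IDs and lengths are hypothetical examples.
use strict;
use warnings;
use File::Temp qw(tempfile);

# Format fields refer to these package variables when write is called
our ($id, $length);

# Page header: the filehandle name followed by _TOP
format REPORT_TOP =
ID         Length
---------- ------
.

# Body format: same name as the filehandle it is used with
format REPORT =
@<<<<<<<<< @>>>>>
$id,       $length
.

my %lengths = (seq1 => 1234, seq2 => 56);

# A temporary file stands in for a real report file here
my (undef, $path) = tempfile(UNLINK => 1);

open(REPORT, '>', $path) or die "Cannot open $path: $!";
for my $key (sort keys %lengths) {
    ($id, $length) = ($key, $lengths{$key});
    write REPORT;    # header on the first call, then one record per call
}
close(REPORT) or die "Cannot close $path: $!";
```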

The upcoming release of Perl 6 will move formats out of the core of the language and make them into a module. Details are not available as of this writing, but this change will probably entail adding a statement such as use Formats; near the top of your code in order to load the module for using formats.
