Safari | Beginning Perl for Bioinformatics -> 6.3 Command-Line Arguments and Arrays

6.3 Command-Line Arguments and Arrays

Example 6-3 is another program that uses subroutines. You use the command line to give the program information it needs (such as filenames, or strings of DNA) without having to interactively answer the program's prompts. This is useful if you're scheduling a program to run at a time when you won't be there, for instance.

Example 6-3 also shows a little more about using arrays. You'll see how to use subscripts to access a specific element of an array.

For command-line programs, you type the name of the program, followed by the arguments to the program, if any, and then hit the Enter (or Return) key to start the program running. In Example 6-3, when the user types the program name, she follows that with the argument, which, in this case, is just the string of DNA in which she'll count the G's. So the program is called and returns an answer like so:

AAGGGGTTTCCC

The DNA AAGGGGTTTCCC has 4 G's in it!

Of course, many programs come with a graphical user interface (GUI). This gives the program some or all of the computer screen and usually includes such things as menus, buttons, and places to type in values to set parameters from the keyboard.

However, many programs are run from a command line. Even Mac OS X, which is built on top of Unix, provides a command line. (Although most Windows users rarely use the MS-DOS command window, it's still useful, for example, for running Perl programs.) As already mentioned, running a program noninteractively, with its parameters passed as command-line arguments, lets you run it automatically, say in the middle of the night when no one is sitting at the computer.

Example 6-3 counts the number of G's in a string of DNA.

Example 6-3. Counting the G's in some DNA on the command line

#!/usr/bin/perl -w
# Counting the number of G's in some DNA on the command line

use strict;

# Collect the DNA from the arguments on the command line
#   when the user calls the program.
# If no arguments are given, print a USAGE statement and exit.

# $0 is a special variable that has the name of the program.
my($USAGE) = "$0 DNA\n\n";

# @ARGV is an array containing all command-line arguments.
#
# If it is empty, it evaluates to false, so the unless block runs:
#   the program prints $USAGE and exits.
unless(@ARGV) {
    print $USAGE;
    exit;
}

# Read in the DNA from the argument on the command line.
my($dna) = $ARGV[0];

# Call the subroutine that does the real work, and collect the result.
my($num_of_Gs) = countG ( $dna );

# Report the result and exit.
print "\nThe DNA $dna has $num_of_Gs G\'s in it!\n\n";

exit;

################################################################################
# Subroutines for Example 6-3
################################################################################

sub countG {
    # return a count of the number of G's in the argument $dna

    # initialize arguments and variables
    my($dna) = @_;

    my($count) = 0;

    # Use the fourth method of counting nucleotides in DNA, as shown in
    # Chapter 5, "Motifs and Loops"
    $count = ( $dna =~ tr/Gg//);

    return $count;
}

Now let's look at how this program works, while examining and explaining the new features. For starters, notice the new line:

use strict;

which I will use from now on to ensure all variables are declared with my, thus enforcing lexical scoping.
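To see what use strict buys you, here is a minimal sketch, not taken from the book. The helper name compiles_under_strict is hypothetical; it compiles a small snippet with a string eval and reports whether Perl accepted it under use strict.

```perl
use strict;
use warnings;

# compiles_under_strict is a hypothetical helper for illustration only.
# It compiles a snippet with a string eval and returns 1 if the snippet
# is accepted under 'use strict', or 0 if Perl rejects it.
sub compiles_under_strict {
    my ($code) = @_;
    my $ok = eval "use strict; $code; 1";
    return $ok ? 1 : 0;
}

# Declared with my: accepted.
print compiles_under_strict('my $dna = "ACGT"'), "\n";   # prints 1

# Not declared anywhere: rejected at compile time.
print compiles_under_strict('$dna = "ACGT"'), "\n";      # prints 0
```

In an ordinary program you don't need the eval; the undeclared variable simply stops the program from compiling, which is how misspelled variable names get caught early.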

Perl has some special variables it sets so you can easily use the arguments from the command line. Every Perl program has an array variable @ARGV that contains any command-line arguments. Also, there's a special variable called $0 (a zero) that has the name of the program as it was called from the command line.
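A short sketch may help make these two variables concrete. The subroutine name describe_args is an assumption made for this illustration and does not appear in Example 6-3; it just reports the program name and whatever arguments were given.

```perl
use strict;
use warnings;

# describe_args is a hypothetical helper, not part of Example 6-3.
# It returns a one-line summary of the program name and its arguments.
sub describe_args {
    my ($program, @args) = @_;
    my $count = scalar @args;    # number of arguments
    return "$program was called with $count argument(s): @args";
}

# $0 holds the program name as typed; @ARGV holds the arguments.
print describe_args($0, @ARGV), "\n";
```

If you save this under a name of your choosing and run it with two strings of DNA as arguments, the summary should report a count of 2 followed by both strings.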

Notice in Example 6-3 that an informative message is defined in the variable $USAGE and that it begins with the value of the variable $0, followed by an indication of the arguments the program needs. This is a common practice; if the user doesn't give the program what it needs, which is determined by some kind of test, the program prints information about how to properly use it and exits.

In fact, this program does check whether any arguments were typed on the command line. In a conditional test, @ARGV evaluates to true if it has anything in it and to false if it is completely empty. To require that an argument be given, you can use the unless conditional: if @ARGV is empty, the program prints the $USAGE statement and exits:

unless(@ARGV) {
    print $USAGE;
    exit;
}
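The test works because an array used as a condition evaluates to the number of elements it holds. The following sketch shows that behavior in isolation; has_arguments is a hypothetical name chosen for this illustration.

```perl
use strict;
use warnings;

# has_arguments is a hypothetical helper for illustration only.
# An array used as a condition evaluates to its element count,
# so an empty array is false and a nonempty one is true.
sub has_arguments {
    my @args = @_;
    return @args ? 1 : 0;
}

print has_arguments(), "\n";                 # empty list: prints 0
print has_arguments('AAGGGGTTTCCC'), "\n";   # one element: prints 1
```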

The next bit of code shows something new about arrays, namely, how to extract one element from an array by its subscript. In other words, it shows how to get at the first, fourth, or whichever element. The code in Example 6-3 shows how to extract the first element, which, as you've seen, is numbered 0:

my($dna) = $ARGV[0];

Now you already know there is a first element, since you've just tested to make sure the array isn't empty. You get the first element of the array @ARGV by changing the @ to a $ and appending square brackets containing the desired subscript: 0 for the first element, 1 for the second element, and so on. The dollar sign indicates that you're now looking at just one element of the array, which is a scalar value, so you write it as you would any other scalar variable.
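Here is a small sketch of subscripting on an ordinary array. The array @args stands in for @ARGV so that the sketch runs without any command-line arguments, and element_at is a hypothetical helper, not something from Example 6-3.

```perl
use strict;
use warnings;

# element_at is a hypothetical helper, used here only for illustration.
# It returns the element of a list stored at the given subscript.
sub element_at {
    my ($subscript, @list) = @_;
    return $list[$subscript];
}

# A stand-in for @ARGV, so the sketch runs without command-line arguments.
my @args = ('AAGGGGTTTCCC', 'ACGT', 'GGGG');

print element_at(0, @args), "\n";   # first element:  AAGGGGTTTCCC
print element_at(1, @args), "\n";   # second element: ACGT
print scalar(@args), "\n";          # number of elements: 3
```

Note that the last valid subscript is always one less than the number of elements, because counting starts at 0.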

In Example 6-3, you copy this first (and only) element of the command-line array @ARGV into the variable $dna.

Finally comes the call to the subroutine, which contains nothing new but fulfills a dream from the final paragraph of Chapter 5:

my($num_of_Gs) = countG ( $dna );
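To try the counting step on its own, without the command line, something like the following sketch should do. The name count_g is an assumption for this illustration; the body uses the same tr operator as countG in Example 6-3, which returns the number of matching characters and, with an empty replacement list, leaves the string unchanged.

```perl
use strict;
use warnings;

# count_g is a hypothetical stand-alone version of the counting step.
# tr/Gg// counts both uppercase and lowercase G without altering $dna.
sub count_g {
    my ($dna) = @_;
    my $count = ($dna =~ tr/Gg//);
    return $count;
}

print count_g('AAGGGGTTTCCC'), "\n";   # prints 4
print count_g('aaggtt'), "\n";         # prints 2
print count_g('ACT'), "\n";            # prints 0
```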
