6.3 Command-Line Arguments and Arrays
Example 6-3 is another program that uses
subroutines.
You use the command line to give the program information it needs
(such as filenames, or strings of DNA) without having to
interactively answer the program's prompts. This is useful if
you're scheduling a program to run at a time when you
won't be there, for instance.
Example 6-3 also shows a little more about using
arrays. You'll see how to use subscripts to access a specific
element of an array.
For command-line programs, you type the name of the program, followed
by the arguments to the program, if any, and then hit the Enter (or
Return) key to start the program running. In Example 6-3, when the user types the program name, she
follows that with the argument, which, in this case, is just the
string of DNA in which she'll count the G's. So the
program is called and returns an answer like so:
AAGGGGTTTCCC
The DNA AAGGGGTTTCCC has 4 G's in it!
Of course, many programs come with a graphical user interface
(GUI). This gives the program some or all of the computer screen and
usually includes such things as menus, buttons, and places to type in
values to set parameters from the keyboard.
However, many
programs are run from a command line.
Even the newer MacOS X, which is built on top of Unix, now provides a
command line. (Although most Windows users don't use the MS-DOS
command window much, it's still useful, e.g., for running Perl
programs.) As already mentioned, running a program noninteractively,
passing parameters in as command-line arguments, allows you to run
the program automatically, say in the middle of the night when no one
is actually sitting at the computer.
Example 6-3 counts the
number of G's in a string of DNA.
Example 6-3. Counting the G's in some DNA on the command line
#!/usr/bin/perl -w
# Counting the number of G's in some DNA on the command line
use strict;
# Collect the DNA from the arguments on the command line
# when the user calls the program.
# If no arguments are given, print a USAGE statement and exit.
# $0 is a special variable that has the name of the program.
my($USAGE) = "$0 DNA\n\n";
# @ARGV is an array containing all command-line arguments.
#
# If it is empty, @ARGV evaluates to false, so the unless block runs:
# the program prints $USAGE and exits.
unless(@ARGV) {
    print $USAGE;
    exit;
}
# Read in the DNA from the argument on the command line.
my($dna) = $ARGV[0];
# Call the subroutine that does the real work, and collect the result.
my($num_of_Gs) = countG ( $dna );
# Report the result and exit.
print "\nThe DNA $dna has $num_of_Gs G\'s in it!\n\n";
exit;
################################################################################
# Subroutines for Example 6-3
################################################################################
sub countG {
    # return a count of the number of G's in the argument $dna
    # initialize arguments and variables
    my($dna) = @_;
    my($count) = 0;
    # Use the fourth method of counting nucleotides in DNA, as shown in
    # Chapter 5, "Motifs and Loops"
    $count = ( $dna =~ tr/Gg//);
    return $count;
}
Now let's look at how this program works, while examining and
explaining the new features. For starters, notice the new line:
use strict;
which I will use from now on to ensure all variables are declared
with my, thus enforcing
lexical scoping.
Perl has some special variables it sets so you can easily use the
arguments from the command line. Every Perl program has an array
variable
@ARGV that
contains any command-line arguments. Also, there's a special
variable called $0 (a zero) that has the name of
the program as it was called from the command line.
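As a minimal sketch of these two special variables, the following short program reports its own name and lists whatever arguments it was given. The subroutine name describe_invocation is hypothetical and is not part of Example 6-3; it is only there to make the behavior easy to see.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# describe_invocation is a hypothetical helper, for illustration only.
# It takes the program name followed by a list of arguments and
# returns a short report as a single string.
sub describe_invocation {
    my($program, @arguments) = @_;

    my $report = "Program name: $program\n";

    # scalar(@arguments) gives the number of elements in the array.
    $report .= "Number of arguments: " . scalar(@arguments) . "\n";

    foreach my $argument (@arguments) {
        $report .= "Argument: $argument\n";
    }

    return $report;
}

# $0 holds the program name as it was called;
# @ARGV holds the command-line arguments, if any.
print describe_invocation($0, @ARGV);
```

If you run this sketch with no arguments, it should report zero arguments and print no "Argument:" lines, since @ARGV is then empty.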
Notice in Example 6-3 that an informative message is
defined in the variable $USAGE and that it begins
with the value of the variable $0, followed by an
indication of the arguments the program needs. This is a common
practice; if the user doesn't give the program what it needs,
which is determined by some kind of test, the program prints
information about how to properly use it and exits.
In fact, this program does check to see if any arguments were typed
on the command line. It checks if @ARGV has
anything in it, in which case it evaluates to
true; or if it is completely empty, in which case
it evaluates to false. To require that an argument be given, use the
unless conditional, which prints the $USAGE statement and exits the
program if @ARGV is empty:
unless(@ARGV) {
    print $USAGE;
    exit;
}
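To see the true-or-false behavior of an array in a conditional on its own, here is a small sketch. The subroutine has_arguments and the two test arrays are hypothetical names chosen for illustration; the sketch uses ordinary arrays in place of @ARGV so it can be run without typing anything on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# has_arguments is a hypothetical helper, for illustration only.
# It returns 1 if the list it is given has at least one element,
# and 0 if the list is empty.
sub has_arguments {
    my(@arguments) = @_;

    # In a conditional, an array is evaluated as its number of
    # elements, so an empty array (0 elements) counts as false.
    if (@arguments) {
        return 1;
    }
    return 0;
}

my @no_arguments  = ();
my @one_argument  = ('AAGGGGTTTCCC');

print "Empty array: ", has_arguments(@no_arguments), "\n";
print "One element: ", has_arguments(@one_argument), "\n";
```

The first line printed shows 0 and the second shows 1, which is the same distinction the unless test in Example 6-3 relies on.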
The next bit of code shows something new about
arrays, namely, how to extract one element
from an array, as referenced by a subscript. In other words, it shows
how to get at the first, fourth, or whichever element. The code in
Example 6-3 shows how to extract the first element,
which as you've seen, is numbered 0:
my($dna) = $ARGV[0];
Now you already know there is a first element, since you've
just tested to make sure the array isn't empty. You get the
first element of array @ARGV by changing the
@ to a $ and appending square
brackets containing the desired subscript; 0 for the first element, 1
for the second element, and so on. The dollar sign is used because
you're now looking at just one element of the array, which is
a scalar value, so you write it as you would any other scalar
variable.
In Example 6-3, you copy this first (and only)
element of the command-line array @ARGV into the
variable $dna.
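The subscript syntax works the same way on any array, not just @ARGV. The following sketch assumes a made-up array @bases and a hypothetical subroutine element_at, neither of which appears in Example 6-3:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# @bases is a hypothetical array used only for this illustration.
my @bases = ('A', 'C', 'G', 'T');

# element_at is a hypothetical helper: given a subscript and a list,
# it returns the element stored at that subscript.
sub element_at {
    my($subscript, @array) = @_;

    # One element of @array is a scalar, so it is written $array[...]
    return $array[$subscript];
}

# Subscripts start at 0, so the first element is $bases[0]
# and the fourth element is $bases[3].
print "First element:  $bases[0]\n";
print "Fourth element: $bases[3]\n";
print "Second element: ", element_at(1, @bases), "\n";
```

Note that the array as a whole is still written @bases; only a single element taken from it is written with the dollar sign.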
Finally comes the call to the subroutine, which contains nothing new
but fulfills a dream from the final paragraph of Chapter 5:
my($num_of_Gs) = countG ( $dna );
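As a quick check of what the call returns, this sketch repeats the countG subroutine from Example 6-3 and calls it on a few strings. The test strings other than AAGGGGTTTCCC are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# countG as defined in Example 6-3: count the G's (upper- or
# lowercase) in a string of DNA.
sub countG {
    my($dna) = @_;
    my($count) = 0;

    # With an empty replacement list, tr counts the matching
    # characters and leaves the string unchanged.
    $count = ( $dna =~ tr/Gg// );

    return $count;
}

# The first string is the one used in the text; the others are
# hypothetical inputs.
foreach my $dna ('AAGGGGTTTCCC', 'aaggtt', 'ATATAT') {
    my($num_of_Gs) = countG($dna);
    print "The DNA $dna has $num_of_Gs G's in it!\n";
}
```

Because the subroutine works on its own private copy of the string in $dna, the variable passed in by the caller is not altered by the call.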