6.2
Scoping and Subroutines
A subroutine is defined by the reserved word[3] for subroutine
definitions, sub; the subroutine's name, in this case, addACGT; and a
block, enclosed in a pair of matching curly braces. This is the same
kind of block seen earlier in loops and conditional statements that
groups statements together.
In Example 6-1, the name of the subroutine is
addACGT, and the block is everything after the
name. Here is the subroutine definition again:
sub addACGT {
    my($dna) = @_;
    $dna .= 'ACGT';
    return $dna;
}
Now let's look into the block of the subroutine.
A subroutine is like a separate helper program for the main program,
and it needs to have its own variables. You will use two types of
variables in your subroutines in this
book:[4]
Arguments are the values given to a subroutine when it is used, or
called. The values of the arguments are passed into the subroutine by
means of the special variable
@_, as you'll
see in the next section.
Other variables a subroutine might use must be protected from
interacting with variables in other parts of the program, so they
have effect only within the subroutine's own
scope. This is
accomplished by declaring them as
my variables, as will be explained
shortly.
Finally, most subroutines return their results via the
return function. The returned value can be a single scalar
(as in return $dna; in our subroutine addACGT), a list of
scalars (as in return ($dna1, $dna2);), an array (as in
return @lines;), and more.
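Here is a minimal sketch of those three kinds of return values. The subroutine names add_tail, split_in_half, and bases are made up for this illustration and don't appear in the book's examples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutines, named only for this illustration.

# Returns a single scalar, as addACGT does.
sub add_tail {
    my($dna) = @_;
    return $dna . 'ACGT';
}

# Returns a list of two scalars: the two halves of the sequence.
sub split_in_half {
    my($dna) = @_;
    my $middle = int(length($dna) / 2);
    return (substr($dna, 0, $middle), substr($dna, $middle));
}

# Returns an array with one element per base.
sub bases {
    my($dna) = @_;
    my @bases = split(//, $dna);
    return @bases;
}

my $longer          = add_tail('CCGG');
my($first, $second) = split_in_half('AACCGGTT');
my @letters         = bases('ACGT');

print "$longer\n";          # CCGGACGT
print "$first $second\n";   # AACC GGTT
print "@letters\n";         # A C G T
```

Notice that the caller collects each kind of result with a matching kind of variable: a scalar, a parenthesized list of scalars, or an array.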
6.2.1
Arguments
To call a subroutine means to type its name and
give it appropriate arguments and, usually, collect its results.
Arguments, sometimes called parameters, usually contain the
data that the subroutine computes on. In Example 6-1, this is the call
of the subroutine addACGT with the argument $dna:
$longer_dna = addACGT($dna);
The essential point is that whenever you, the programmer, want to use
a subroutine, you can call it with whatever arguments it is designed
to accept (in this case, whatever DNA needs ACGT appended to it), and
the value of each argument appears in the subroutine in the @_ array.
When you call a subroutine with certain arguments, the names of the
arguments you provide in the call are not important inside the
subroutine. Only the values of those arguments that are actually
passed inside the subroutine are important. The subroutine typically
collects the values from the @_ array and assigns
them to new variables that may or may not have the same names as the
variables with which you called the subroutine. The only thing
preserved is the order of the values, not the names of the variables
containing the values.
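To see that only the order of the values matters, consider this small sketch. The subroutine join_sequences and the variable names are hypothetical, chosen just for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutine: joins two sequences with a dash between them.
# Inside, the values are known as $first and $second, whatever the
# caller's variables happened to be called.
sub join_sequences {
    my($first, $second) = @_;
    return $first . '-' . $second;
}

my $dna  = 'ACGT';
my $tail = 'TTTT';

# Same two variables, passed in two different orders.
my $forward  = join_sequences($dna, $tail);
my $reversed = join_sequences($tail, $dna);

print "$forward\n";    # ACGT-TTTT
print "$reversed\n";   # TTTT-ACGT
```

The subroutine never sees the names $dna and $tail. It sees only a first value and a second value.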
Here's how it works. The first line in the subroutine's
block is:
my($dna) = @_;
The values of the arguments from the call of the subroutine are
passed into the subroutine in the special array variable
@_. You know it's an array because it starts
with the @ character. It has the brief name
"_", and it's a special array variable that comes
predefined in Perl programs. (It's not a name you should pick
for your own arrays.) The array @_ contains all
the scalar values passed into the subroutine. These scalar values are
the values of the arguments to the subroutine. In this case, there is
one scalar value: the string of DNA that's the value of the
variable $dna passed in as an argument.
If the subroutine has more arguments—for instance one argument
for DNA, one for the associated protein, and one for the name of the
gene—they are all passed in and assigned to
my variables inside the subroutine:
my($dna,$protein,$name_of_gene) = @_;
If there are no arguments, just omit that statement in the subroutine.
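As a rough sketch of both cases, here is one subroutine that collects three arguments and one that takes none. The names describe_gene and banner, and the toy data, are assumptions made for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutine with three arguments, collected in order.
sub describe_gene {
    my($dna, $protein, $name_of_gene) = @_;
    return $name_of_gene . ': ' . length($dna) . ' bases, '
         . length($protein) . ' residues';
}

# Hypothetical subroutine with no arguments, so there is no
# my(...) = @_; statement at all.
sub banner {
    return '=== gene report ===';
}

my $title  = banner();
my $report = describe_gene('ATGAAATAG', 'MK', 'toy_gene');

print "$title\n";    # === gene report ===
print "$report\n";   # toy_gene: 9 bases, 2 residues
```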
After the statement:
my($dna) = @_;
executes in the subroutine, the passed-in value is assigned to the
subroutine's variable $dna. The next section
explains why this is a new variable specific to the subroutine. The
subroutine's variable can be called anything; it certainly
doesn't have to be the same name as the argument, as it happens
to be in this example. What's cool about scoping is that it
doesn't matter if it is or not.
Beware the common mistake of forgetting the @_
array when naming your arguments in a subroutine, that is, using the
statement:
my($dna);
instead of:
my($dna) = @_;
If you make this mistake, the values of the arguments won't
appear in your subroutine, even though their names are declared.
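The following sketch puts the mistaken and the correct forms side by side. The subroutine names forgets_args and collects_args are invented for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The mistake: $dna is declared but never receives the argument.
sub forgets_args {
    my($dna);
    return defined($dna) ? $dna : 'undefined';
}

# The correct form: $dna is declared and assigned from @_.
sub collects_args {
    my($dna) = @_;
    return defined($dna) ? $dna : 'undefined';
}

my $wrong = forgets_args('ACGT');
my $right = collects_args('ACGT');

print "forgets_args got:  $wrong\n";   # undefined
print "collects_args got: $right\n";   # ACGT
```

Perl doesn't report an error for the first version, because declaring a variable without giving it a value is perfectly legal. That is why the mistake is easy to miss.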
6.2.2
Scoping
By keeping all variables a subroutine uses active only within the
subroutine, you can make it safe to call the subroutine from anywhere.
You make the variables specific to the subroutine by declaring them as
my variables.
my is a keyword defined in Perl that limits
variables to the block in which they are declared (in this case, the
block is the subroutine).[5]
Hiding variables and making them local to only a restricted part of a
program is called scoping. In Perl, using
my variables is known as lexical
scoping, and it's an essential part of
modularizing your programs.
You declare that a variable is a
my variable like this:
my($x);
or:
my $x;
or, combining the declaration with an initialization to a value:
my($x) = '49';
or, if you're collecting an argument within a subroutine:
my($x) = @_;
Once a variable is declared in this fashion, it exists only until the
end of the block it was declared in. So in a subroutine, if you
declare all your variables like this (both the arguments and any
other variables), they are active only in the subroutine. If any
variable has the same name as another variable elsewhere in the
program, you don't have to worry, because the
my declaration actually creates a new variable,
active only in the enclosing block, and any other variable of the
same name used elsewhere outside the block is kept separate.
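Any block will show this behavior, not just a subroutine. The sketch below uses a bare block and a helper variable, $seen_inside, that exists only so the inner value can be printed afterward. Both are assumptions made for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = 'AAAA';
my $seen_inside;       # helper for this illustration only

{
    my $dna = 'CCCC';  # a new variable, active only in this block
    $dna .= 'GG';
    $seen_inside = $dna;
}

# The block has ended, so its $dna is gone and the outer one is untouched.
print "inside the block:  $seen_inside\n";   # CCCCGG
print "outside the block: $dna\n";           # AAAA
```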
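The example that showed collecting an argument in a subroutine uses
parentheses around the variable. Because @_ is an
array, the parentheses around the new variables put the assignment in
list context (also called array context) and ensure that the variables
are initialized correctly (see Chapter 4). Without the parentheses, the
assignment is in scalar context, and the variable receives the number
of elements in @_ rather than the first value.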
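A short sketch shows what the parentheses do. The subroutine names with_parens and without_parens are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# With parentheses: a list assignment, so $dna gets the first argument.
sub with_parens {
    my($dna) = @_;
    return $dna;
}

# Without parentheses: a scalar assignment, so $dna gets the number
# of elements in @_ instead of any of the values.
sub without_parens {
    my $dna = @_;
    return $dna;
}

my $first_value = with_parens('ACGT', 'TTTT');
my $count       = without_parens('ACGT', 'TTTT');

print "with parentheses:    $first_value\n";   # ACGT
print "without parentheses: $count\n";         # 2
```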
Always declare all the variables in your subroutines with the
my construct, even those variables that don't come in as arguments.
Why use scoping? Example 6-2 shows the trouble that
can happen when you don't. Recall that one of the advantages of
subroutines is writing a useful bit of code once and then using it
whenever you need it. Example 6-2 is a program that
has a variable in the main program with the same name as a variable
in a subroutine it calls. This can easily happen if you write the
subroutine at a time other than the main program (say six months
later) or if you call a subroutine someone else wrote.
Example 6-2. The pitfalls of not using my variables
#!/usr/bin/perl -w
# Illustrating the pitfalls of not using my variables
$dna = 'AAAAA';
$result = A_to_T($dna);
print "I changed all the A's in $dna to T's and got $result\n\n";
exit;
################################################################################
# Subroutines
################################################################################
sub A_to_T {
    my($input) = @_;
    $dna = $input;
    $dna =~ s/A/T/g;
    return $dna;
}
Example 6-2 gives the following output:
I changed all the A's in TTTTT to T's and got TTTTT
What was expected was this output:
I changed all the A's in AAAAA to T's and got TTTTT
You can get the expected output by changing the definition of
subroutine A_to_T to the following, in which the
variable $dna in the subroutine is declared as a
my variable:
sub A_to_T {
    my($input) = @_;
    my($dna) = $input;
    $dna =~ s/A/T/g;
    return $dna;
}
Where exactly did Example 6-2 go wrong? When the
program entered the subroutine, and used the variable
$dna to calculate the string with A's
changed to T's, the Perl language saw that there was already a
variable $dna being used in the main part of the
program and just kept using it. When the program returned from the
subroutine and got to the print statement, it was
still using the same (the one and only) variable
$dna. So, when it printed the results, the
variable $dna, instead of having the original DNA
in it, had the altered DNA that had been computed in the subroutine.
Now this sort of thing can happen a lot. Programmers tend to use
certain names for variables a great deal: the usual suspects are
names such as $tmp, $temp,
$x, $a,
$number, $variable,
$var, $array,
$input, $output,
$result, $data,
$file, $filename, and so on.
Bioinformaticians are quite fond of $dna,
$protein, $motif,
$sequence, and the like. As you start using
libraries of subroutines from other people and as your programs get
larger, it's much easier—and a whole lot safer—to
let the Perl language worry about avoiding the problem of name
collisions.
In fact, from now on we're going to stop using
undeclared variables. From this point
forward, all our variables, even those in the main program, will be
declared with my. You can enforce this discipline
by adding the following directive to your programs:
use strict;
which has the effect of insisting that your programs have all their
variables declared as my variables.
Lest you rail at this seemingly unnecessary complication to your
coding, compared to the simpler and happier days of Chapter 4 and Chapter 5, you should
know that many languages require declarations for all their
variables. The fact that in Perl you don't have to enforce
strict scoping is handy when you're writing short programs, for
example, or when you're trying to teach programming without
hitting the students with a thousand details at the beginning.
Another benefit you get from strict scoping appears if you accidentally
misspell a variable name while writing a program. If the variables
aren't being declared, Perl creates a new variable with the
(misspelled) name. The program may not work correctly, and it may be
hard to find where the problem is. When the program is strictly scoped,
any misspelled variable is also undeclared, and Perl complains
about it, saving you hours or days of hair-pulling and bad language.
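You can watch Perl complain with a sketch like the following. Normally the error would simply stop your program before it runs. Here, purely for demonstration, the faulty code is kept in a string and compiled with eval so the error message can be caught and printed. The variable names are hypothetical, and the exact wording of the message may vary with your version of Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A snippet with a deliberately misspelled variable name. It is held in
# a string so this sketch can catch the compile-time error.
my $snippet = q{
    use strict;
    my $sequence = 'ACGT';
    $sequnce .= 'ACGT';    # misspelled on purpose
    1;
};

my $ok    = eval $snippet;   # undefined if the snippet fails to compile
my $error = $@;

if ($ok) {
    print "The snippet compiled and ran.\n";
} else {
    print "Perl refused to compile the snippet:\n$error";
}
```

Under use strict, Perl reports the misspelled $sequnce as an undeclared variable instead of quietly creating it.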
Finally, let's recap how scoping,
arguments, and subroutines work by taking
another look at Example 6-1. The subroutine is
called by writing its name addACGT, passing it
the argument $dna, and collecting results (if any)
by assignment to $longer_dna:
$longer_dna = addACGT($dna);
The first line in the subroutine gets the value of the argument from
the special variable @_, and stores it in its own
variable called $dna, which can't be seen
outside the subroutine because it uses my. Even
though the original variable outside the subroutine is also called
$dna, the variable called $dna
within the subroutine is an entirely new variable (with the same
name) that belongs only to the subroutine due to the use of
my. This new variable is in effect only during the
time the program is in the subroutine. Notice in the output from the
print statement at the end of Example 6-1 that even though a variable called
$dna is lengthened inside the subroutine, the
original variable, $dna, outside the subroutine
isn't changed.
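The recap can be checked with a small sketch along the lines of Example 6-1. The subroutine is the one shown at the start of this section. The short sample sequence is an assumption made here and is not necessarily the data used in the example itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The subroutine definition from the start of this section.
sub addACGT {
    my($dna) = @_;
    $dna .= 'ACGT';
    return $dna;
}

my $dna = 'CCGG';    # assumed sample sequence for this sketch

my $longer_dna = addACGT($dna);

# The $dna inside the subroutine was lengthened, but this one was not.
print "original:   $dna\n";          # CCGG
print "lengthened: $longer_dna\n";   # CCGGACGT
```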