Safari | Beginning Perl for Bioinformatics -> 6.2 Scoping and Subroutines

6.2 Scoping and Subroutines

A subroutine definition consists of the reserved word^[3] sub; the subroutine's name, in this case addACGT; and a block enclosed in a pair of matching curly braces. This is the same kind of block, grouping statements together, that you saw earlier in loops and conditional statements.

^[3] A reserved word is a fundamental, defined word in the Perl language, such as if, while, foreach, or sub.

In Example 6-1, the name of the subroutine is addACGT, and the block is everything after the name. Here is the subroutine definition again:

sub addACGT {
    my($dna) = @_;

    $dna .= 'ACGT';
    return $dna;
}

Now let's look into the block of the subroutine.

A subroutine is like a separate helper program for the main program, and it needs to have its own variables. You will use two types of variables in your subroutines in this book:^[4]

Arguments passed in to the subroutine
Other variables declared with my and restricted to the scope of the subroutine

^[4] In the subroutines in this book, we won't use global variables, which can be seen by both the main program and the subroutines; nor will we use variables declared with local, which provides a different kind of scoping restriction than my.

Arguments are the values given to a subroutine when it is used, or called. The values of the arguments are passed into the subroutine by means of the special variable @_, as you'll see in the next section.

Other variables a subroutine might use must be protected from interacting with variables in other parts of the program, so they have effect only within the subroutine's own scope. This is accomplished by declaring them as my variables, as will be explained shortly.

Finally, most subroutines return their results via the return function. In our subroutine addACGT, the statement return $dna; returns a single scalar. A statement such as return ($dna1, $dna2); returns a list of scalars, and return @lines; returns the contents of an array. Other kinds of return values are possible too.
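The three kinds of return just described can be sketched as follows. Only addACGT comes from this chapter; the subroutines split_in_half and get_bases, and the sequences passed to them, are hypothetical names and data made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns a single scalar
sub addACGT {
    my($dna) = @_;
    $dna .= 'ACGT';
    return $dna;
}

# Returns a list of two scalars (hypothetical helper)
sub split_in_half {
    my($dna) = @_;
    my $half = int(length($dna) / 2);
    return (substr($dna, 0, $half), substr($dna, $half));
}

# Returns an array; the caller receives its elements as a list (hypothetical helper)
sub get_bases {
    my($dna) = @_;
    my @bases = split //, $dna;
    return @bases;
}

my $longer_dna     = addACGT('GGCC');
my($left, $right)  = split_in_half('AACCGGTT');
my @bases          = get_bases('ACGT');

print "$longer_dna\n";      # GGCCACGT
print "$left $right\n";     # AACC GGTT
print "@bases\n";           # A C G T
```

Notice that the caller decides how to collect the results: a scalar on the left of the assignment for one value, a parenthesized list of scalars for two values, or an array for any number of values.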

6.2.1 Arguments

To call a subroutine means to type its name, give it appropriate arguments, and, usually, collect its results. Arguments, sometimes called parameters, usually contain the data that the subroutine computes on. In Example 6-1, this is the call of the subroutine addACGT with the argument $dna:

$longer_dna = addACGT($dna);

The essential point is that whenever you, the programmer, want to use a subroutine, you call it with whatever argument(s) it is designed to accept and with which you need to compute (in this case, whatever DNA needs ACGT appended to it). The value of each argument then appears in the subroutine in the @_ array.

When you call a subroutine with certain arguments, the names of the arguments you provide in the call are not important inside the subroutine. Only the values of those arguments that are actually passed inside the subroutine are important. The subroutine typically collects the values from the @_ array and assigns them to new variables that may or may not have the same names as the variables with which you called the subroutine. The only thing preserved is the order of the values, not the names of the variables containing the values.

Here's how it works. The first line in the subroutine's block is:

my($dna) = @_;

The values of the arguments from the call of the subroutine are passed into the subroutine in the special array variable @_. You know it's an array because it starts with the @ character. It has the brief name "_", and it's a special array variable that comes predefined in Perl programs. (It's not a name you should pick for your own arrays.) The array @_ contains all the scalar values passed into the subroutine. These scalar values are the values of the arguments to the subroutine. In this case, there is one scalar value: the string of DNA that's the value of the variable $dna passed in as an argument.

If the subroutine has more arguments—for instance one argument for DNA, one for the associated protein, and one for the name of the gene—they are all passed in and assigned to my variables inside the subroutine:

my($dna,$protein,$name_of_gene) = @_;

If there are no arguments, just omit that statement in the subroutine.
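Here is a minimal sketch of a subroutine with three arguments, showing that only the order of the values matters, not the names of the variables in the call. The subroutine describe_gene, the variable names, and the data are all hypothetical and chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# describe_gene is a hypothetical subroutine, not one from this book
sub describe_gene {
    # Values arrive in @_ in the order they were given in the call
    my($dna, $protein, $name_of_gene) = @_;
    return "$name_of_gene: DNA=$dna protein=$protein";
}

# The caller's variable names differ from the names inside the subroutine
my $seq     = 'ATGGCC';
my $peptide = 'MA';
my $label   = 'demo_gene';

print describe_gene($seq, $peptide, $label), "\n";
# demo_gene: DNA=ATGGCC protein=MA

# Swapping two arguments in the call swaps the values the subroutine sees
print describe_gene($peptide, $seq, $label), "\n";
# demo_gene: DNA=MA protein=ATGGCC
```

The second call runs without complaint even though its result is nonsense, so it's up to you to pass the arguments in the order the subroutine expects.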

After the statement:

my($dna) = @_;

executes in the subroutine, the passed-in value is assigned to the subroutine's variable $dna. The next section explains why this is a new variable specific to the subroutine. The subroutine's variable can be called anything; it certainly doesn't have to be the same name as the argument, as it happens to be in this example. What's cool about scoping is that it doesn't matter if it is or not.

Beware the common mistake of forgetting the @_ array when naming your arguments in a subroutine, that is, using the statement:

my($dna);

instead of:

my($dna) = @_;

If you make this mistake, the values of the arguments won't appear in your subroutine, even though their names are declared.
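A short sketch makes the difference visible. The subroutine names length_wrong and length_right are hypothetical; each one reports the length of its argument, or the word undefined if no value reached the variable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub length_wrong {
    my($dna);               # declared, but @_ is never consulted
    return defined $dna ? length($dna) : 'undefined';
}

sub length_right {
    my($dna) = @_;          # declared and given the argument's value
    return defined $dna ? length($dna) : 'undefined';
}

print length_wrong('ACGT'), "\n";   # undefined
print length_right('ACGT'), "\n";   # 4
```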

6.2.2 Scoping

By keeping all the variables a subroutine uses active only within the subroutine, you make it safe to call the subroutine from anywhere. You make the variables specific to the subroutine by declaring them as my variables. my is a keyword defined in Perl that limits variables to the block in which they are declared (in this case, the block is the subroutine).^[5]

^[5] There are different models of scoping; my implements a type called lexical scoping, also known as static scoping. Another method is available in Perl via the local construct, but you almost always want to use my.

Hiding variables and making them local to a restricted part of a program is called scoping. In Perl, using my variables is known as lexical scoping, and it's an essential part of modularizing your programs.

You declare that a variable is a my variable like this:

my($x);

or:

my $x ;

or, combining the declaration with an initialization to a value:

my($x) = '49';

or, if you're collecting an argument within a subroutine:

my($x) = @_;

Once a variable is declared in this fashion, it exists only until the end of the block it was declared in. So in a subroutine, if you declare all your variables like this (both the arguments and any other variables), they are active only in the subroutine. If any variable has the same name as another variable elsewhere in the program, you don't have to worry, because the my declaration actually creates a new variable, active only in the enclosing block, and any other variable of the same name used elsewhere outside the block is kept separate.
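The rule that a my variable lasts only until the end of its block applies to any block, not just a subroutine's. The following sketch uses a bare block and a variable $x with made-up values to show two variables of the same name being kept separate.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = 'outer';

{
    my $x = 'inner';    # a new variable, active only inside this block
    print "Inside the block: $x\n";     # Inside the block: inner
}

# The inner $x no longer exists; the outer $x was never touched
print "After the block: $x\n";          # After the block: outer
```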

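The example that showed collecting an argument in a subroutine uses parentheses around the variable. Because @_ is an array, the parentheses around the new variables put the assignment in array context (also called list context) and ensure that the variables are initialized correctly (see Chapter 4). Without the parentheses, my $x = @_; is a scalar assignment, and $x receives the number of arguments rather than the first argument.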
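To see what the parentheses do, compare these two subroutines. The names with_parens and without_parens are hypothetical, as are the two strings passed in.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub with_parens {
    my($first) = @_;    # list assignment: $first gets the first argument
    return $first;
}

sub without_parens {
    my $count = @_;     # scalar assignment: $count gets the number of arguments
    return $count;
}

print with_parens('ACGT', 'TTTT'), "\n";        # ACGT
print without_parens('ACGT', 'TTTT'), "\n";     # 2
```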

Always declare all the variables in your subroutines with the my construct, even those variables that don't come in as arguments.

Why use scoping? Example 6-2 shows the trouble that can happen when you don't. Recall that one of the advantages of subroutines is writing a useful bit of code once and then using it whenever you need it. Example 6-2 is a program that has a variable in the main program with the same name as a variable in a subroutine it calls. This can easily happen if you write the subroutine at a time other than the main program (say six months later) or if you call a subroutine someone else wrote.

Example 6-2. The pitfalls of not using my variables

#!/usr/bin/perl -w
# Illustrating the pitfalls of not using my variables

$dna = 'AAAAA';

$result = A_to_T($dna);

print "I changed all the A's in $dna to T's and got $result\n\n";

exit;

################################################################################
# Subroutines
################################################################################
sub A_to_T {
    my($input) = @_;

    $dna = $input;

    $dna =~ s/A/T/g;

    return $dna;
}

Example 6-2 gives the following output:

I changed all the A's in TTTTT to T's and got TTTTT

What was expected was this output:

I changed all the A's in AAAAA to T's and got TTTTT

You can get the expected output by changing the definition of subroutine A_to_T to the following, in which the variable $dna in the subroutine is declared as a my variable:

sub A_to_T {
    my($input) = @_;

    my($dna) = $input;

    $dna =~ s/A/T/g;

    return $dna;
}

Where exactly did Example 6-2 go wrong? When the program entered the subroutine, and used the variable $dna to calculate the string with A's changed to T's, the Perl language saw that there was already a variable $dna being used in the main part of the program and just kept using it. When the program returned from the subroutine and got to the print statement, it was still using the same (the one and only) variable $dna. So, when it printed the results, the variable $dna, instead of having the original DNA in it, had the altered DNA that had been computed in the subroutine.

Now this sort of thing can happen a lot. Programmers tend to use certain names for variables a great deal: the usual suspects are names such as $tmp, $temp, $x, $a, $number, $variable, $var, $array, $input, $output, $result, $data, $file, $filename, and so on. Bioinformaticians are quite fond of $dna, $protein, $motif, $sequence, and the like. As you start using libraries of subroutines from other people and as your programs get larger, it's much easier—and a whole lot safer—to let the Perl language worry about avoiding the problem of name collisions.

In fact, from now on we're going to stop using undeclared variables. From this point forward, all our variables, even those in the main program, will be declared with my. You can enforce this discipline by adding the following directive to your programs:

use strict;

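which has the effect of insisting that all the variables in your program be declared before they are used; in this book, that means declaring them as my variables. Perl refuses to run a program that uses an undeclared variable.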

Lest you rail at this seemingly unnecessary complication to your coding, compared to the simpler and happier days of Chapter 4 and Chapter 5, you should know that many languages require declarations for all their variables. The fact that in Perl you don't have to enforce strict scoping is handy when you're writing short programs, for example, or when you're trying to teach programming without hitting the students with a thousand details at the beginning.

Another benefit you get from strict scoping comes when you accidentally misspell a variable name while writing a program. If variables don't have to be declared, Perl quietly creates a new variable with the (misspelled) name. The program may not work correctly, and it may be hard to find where the problem is. When the program is strictly scoped, any misspelled variable is also an undeclared variable, and Perl complains about it, saving you hours or days of hair-pulling and bad language.
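You can watch Perl catch a misspelling with the following sketch. Because a strict program with an undeclared variable won't compile at all, the faulty code is kept in a string and handed to eval, which lets the surrounding program survive and print the complaint. The variable names and the misspelling $sequnce are made up for illustration, and the exact wording of the error message varies between Perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A small program, held in a string, with a deliberately misspelled variable
my $snippet = q{
    use strict;
    my $sequence = 'ACGT';
    $sequnce = 'TTTT';      # misspelled on purpose
    1;
};

# eval compiles and runs the string; it returns undef if compilation fails
my $compiled = eval $snippet;
my $error    = $@;

if (not $compiled) {
    print "Perl rejected the snippet:\n$error";
}
```

In an ordinary program you wouldn't use eval this way; Perl would simply stop with the same message before running anything, pointing you at the line with the misspelled name.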

Finally, let's recap how scoping, arguments, and subroutines work by taking another look at Example 6-1. The subroutine is called by writing its name addACGT, passing it the argument $dna, and collecting results (if any) by assignment to $longer_dna:

$longer_dna = addACGT($dna);

The first line in the subroutine gets the value of the argument from the special variable @_ and stores it in its own variable called $dna, which can't be seen outside the subroutine because it is declared with my. Even though the original variable outside the subroutine is also called $dna, the variable called $dna within the subroutine is an entirely new variable (with the same name) that belongs only to the subroutine. This new variable is in effect only while the program is in the subroutine. Notice in the output from the print statement at the end of Example 6-1 that even though a variable called $dna is lengthened inside the subroutine, the original variable $dna outside the subroutine isn't changed.
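As a recap, here is a minimal sketch in the spirit of Example 6-1. The subroutine addACGT is the one shown at the start of this section; the main program around it, including the sequence CGACGT and the wording of the print statements, is an assumption made for illustration and isn't the text of Example 6-1 itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Main program: this $dna belongs to the main program only
my $dna = 'CGACGT';     # made-up sequence for illustration

my $longer_dna = addACGT($dna);

print "Original: $dna\n";           # Original: CGACGT
print "Longer:   $longer_dna\n";    # Longer:   CGACGTACGT

# Subroutine: this $dna is a separate variable created by my
sub addACGT {
    my($dna) = @_;

    $dna .= 'ACGT';
    return $dna;
}
```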
