Safari | Beginning Perl for Bioinformatics -> 6.4 Passing Data to Subroutines

6.4 Passing Data to Subroutines

When you start parsing GenBank, PDB, and BLAST files in later chapters, you'll need more complicated arguments to your subroutines to hold the several fields of data you'll parse out of the records. These next sections explain the way it's done in Perl. You can skim this section and return for a closer read when you get to Chapter 10.

6.4.1 Subroutines: Pass by Value

So far, all our subroutines have had fairly simple arguments. Each subroutine copies the values of its arguments into its own my variables, and whatever happens to those copies in the subroutine doesn't affect the values of the arguments in the main program. This is called pass by value or call by value. For example:

#!/usr/bin/perl -w
# Example of pass-by-value (a.k.a. call-by-value)

use strict;

my $i = 2;

simple_sub($i);

print "In main program, after the subroutine call, \$i equals $i\n\n";

exit;

################################################################################
# Subroutines
################################################################################
sub simple_sub {

    my($i) = @_;

    $i += 100;

    print "In subroutine simple_sub, \$i equals $i\n\n";
}

This gives the following output:

In subroutine simple_sub, $i equals 102

In main program, after the subroutine call, $i equals 2

6.4.2 Subroutines: Pass by Reference

If you have more complicated arguments, say a mixture of scalars, arrays, and hashes, the subroutine often cannot distinguish between them. Perl passes all arguments into the subroutine as a single array, the special @_ array. If any of the arguments are arrays or hashes, their elements get "flattened" out into this single @_ array in the subroutine. Here's an example:

#!/usr/bin/perl -w
# Example of problem of pass-by-value with two arrays

use strict;

my @i = ('1', '2', '3');
my @j = ('a', 'b', 'c');

print "In main program before calling subroutine: i = " . "@i\n";
print "In main program before calling subroutine: j = " . "@j\n";

reference_sub(@i, @j);

print "In main program after calling subroutine: i = " . "@i\n";
print "In main program after calling subroutine: j = " . "@j\n";

exit;

################################################################################
# Subroutines
################################################################################

sub reference_sub {

    my(@i, @j) = @_;

    print "In subroutine : i = " . "@i\n";
    print "In subroutine : j = " . "@j\n";

    push(@i, '4');

    shift(@j);
}

The following output illustrates the problem of this approach:

In main program before calling subroutine: i = 1 2 3
In main program before calling subroutine: j = a b c
In subroutine : i = 1 2 3 a b c
In subroutine : j = 
In main program after calling subroutine: i = 1 2 3
In main program after calling subroutine: j = a b c

As you see, in the subroutine all the elements of @i and @j were grouped into one @_ array. All distinction between the two arrays you started with was lost in the subroutine. When you try to get the two arrays back in the statement:

my(@i, @j) = @_;

Perl assigns everything to the first array, @i. This behavior makes passing multiple arrays into subroutines somewhat dicey.

Also, as usual, the original arrays in the main program were not affected by the subroutine, since the subroutine copied the values out of @_ into its own lexically scoped (my) variables and changed only those copies.
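The flattening described above can be checked directly by counting what arrives in @_. The following is a minimal sketch; the subroutine names count_args and split_attempt are made up for illustration and do not appear elsewhere in this chapter.

```perl
#!/usr/bin/perl
# Sketch: how two array arguments arrive in @_
# (count_args and split_attempt are hypothetical names)

use strict;
use warnings;

my @i = ('1', '2', '3');
my @j = ('a', 'b', 'c');

# Six separate elements arrive in @_, not two arrays
my $count = count_args(@i, @j);
print "Number of arguments in \@_: $count\n";

# The first array in the my() list takes everything
my($first_size, $second_size) = split_attempt(@i, @j);
print "First array got $first_size elements, second got $second_size\n";

sub count_args {
    # An array in scalar context gives its number of elements
    return scalar(@_);
}

sub split_attempt {
    my(@first, @second) = @_;
    return (scalar(@first), scalar(@second));
}
```

The first array receives all six elements and the second receives none, which matches the output of the larger example above.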

To get around this problem, you can pass arguments into subroutines in a style called pass by reference or call by reference. Using pass by reference, you can pass a subroutine any collection of scalars, arrays, hashes, and more, and the subroutine can distinguish between them. There is a price to pay: the resulting code looks a little more complex. But the payoff is often well worth it.

There is one big difference in the behavior of arguments that are passed by reference. When argument variables are passed in this fashion, anything you do to the values of the argument variables in the subroutine also affects the values of the arguments in the main program.

To call a subroutine that has its arguments passed by reference, you call it the same way as before, with one difference: you must preface the argument names with a backslash. In the example of pass by reference in this section, the subroutine call is accomplished like so:

reference_sub(\@i, \@j);

As you see here, the arguments are two arrays, and, to preserve the distinction between them as they are passed into the reference_sub subroutine, they are passed by reference by prefixing their names with a backslash.

Within the subroutine, there are a few changes. First, the arguments are collected from the @_ array, and saved as scalar variables. This is because a reference is a special kind of data that is stored in a scalar variable, no matter whether it's a reference to a scalar, an array, a hash, or other. The example collects its arguments as follows:

my($i, $j) = @_;

reading them from the @_ array as scalars.
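Perl's built-in ref function reports what kind of data a reference points to, which is one way to confirm that each argument really did arrive as a single scalar. A minimal sketch, in which the subroutine name describe_args and the sample data are made up for illustration:

```perl
#!/usr/bin/perl
# Sketch: each reference arrives in @_ as one scalar
# (describe_args and the sample data are hypothetical)

use strict;
use warnings;

my @bases  = ('A', 'C', 'G', 'T');
my %counts = (A => 2, C => 1);
my $name   = 'sample1';

my @kinds = describe_args(\@bases, \%counts, \$name);
print "Kinds of references received: @kinds\n";

sub describe_args {
    # Three arguments were passed, so @_ holds exactly three scalars
    my @found;
    foreach my $arg (@_) {
        # ref returns 'ARRAY', 'HASH', or 'SCALAR' for these references
        push(@found, ref($arg));
    }
    return @found;
}
```

Because every reference occupies one slot in @_, the boundaries between the original arguments are preserved no matter how many elements each array or hash contains.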

The subroutine has to do one more thing with these referenced arguments. When it uses them, it has to dereference them. To dereference a referenced argument, you prefix the reference with the symbol that shows what kind of variable it refers to: a $ for a scalar, @ for an array, % for a hash. So these variables have two symbols before their name: reading left to right, the symbol for the kind of variable referred to, and then the $ of the scalar variable that holds the reference. The lines:

push(@$i, '4');
shift(@$j);

in the following subroutine are the ones that manipulate the arguments. The push adds an element '4' to the end of the @i array, and the shift removes the first element from the @j array. Because these arrays have been passed by reference, their names in the subroutine are @$i and @$j. (If you want to look at the third element of the @j array, which normally is $j[2], you'd say $$j[2].)
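The dereferencing forms just described can be tried side by side. This is a small sketch with made-up data and variable names; the arrow form $j_ref->[2] is standard Perl that selects the same element as $$j_ref[2], though this section's example does not use it.

```perl
#!/usr/bin/perl
# Sketch: dereferencing array, hash, and scalar references
# (all variable names and data here are hypothetical)

use strict;
use warnings;

my @j = ('a', 'b', 'c');
my %h = (gene => 'BRCA1');
my $s = 'ACGT';

my $j_ref = \@j;    # reference to an array
my $h_ref = \%h;    # reference to a hash
my $s_ref = \$s;    # reference to a scalar

my @whole_array = @$j_ref;            # the whole array
my $third       = $$j_ref[2];         # one element of the array
my $third_arrow = $j_ref->[2];        # the same element, arrow notation
my $gene        = $$h_ref{'gene'};    # one value from the hash
my @hash_keys   = keys %$h_ref;       # the whole hash, here just its keys
my $sequence    = $$s_ref;            # the scalar itself

print "Array: @whole_array\n";
print "Third element: $third (arrow form: $third_arrow)\n";
print "Gene: $gene\n";
print "Sequence: $sequence\n";
```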

Whatever changes you make to the arguments in the subroutine also take effect in the main program. This is because the references are references to the actual arguments; they are not copies of their values as in pass by value. So, as you see in the example, after calling the subroutine, the arrays in the main program have been altered accordingly:

#!/usr/bin/perl
# Example of pass-by-reference (a.k.a. call-by-reference)

use strict;
use warnings;

my @i = ('1', '2', '3');
my @j = ('a', 'b', 'c');

print "In main program before calling subroutine: i = " . "@i\n";
print "In main program before calling subroutine: j = " . "@j\n";

reference_sub(\@i, \@j);

print "In main program after calling subroutine: i = " . "@i\n";
print "In main program after calling subroutine: j = " . "@j\n";

exit;

################################################################################
# Subroutines
################################################################################

sub reference_sub {
    my($i, $j) = @_;

    print "In subroutine : i = " . "@$i\n";
    print "In subroutine : j = " . "@$j\n";

    push(@$i, '4');
    shift(@$j);
}

This gives the following output:

In main program before calling subroutine: i = 1 2 3
In main program before calling subroutine: j = a b c
In subroutine : i = 1 2 3
In subroutine : j = a b c
In main program after calling subroutine: i = 1 2 3 4
In main program after calling subroutine: j = b c

The subroutine can now distinguish between the two arrays passed in as arguments. The changes made to the variables inside the subroutine remain in effect after the subroutine has ended and you've returned to the main program. This is the essential characteristic of pass by reference.
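The same technique extends to the mixture of scalars, arrays, and hashes mentioned at the start of this section. The following is a sketch only; the subroutine name count_base, the variable names, and the short sequences are made up for illustration.

```perl
#!/usr/bin/perl
# Sketch: passing a scalar, an array, and a hash to one subroutine
# (count_base and the sample sequences are hypothetical)

use strict;
use warnings;

my $base      = 'A';
my @sequences = ('ACGTA', 'GGCA', 'TTTT');
my %totals    = ();

# The scalar is passed by value; the array and hash are passed by reference
count_base($base, \@sequences, \%totals);

foreach my $seq (@sequences) {
    print "$seq contains $totals{$seq} of $base\n";
}

sub count_base {
    my($wanted, $seqs, $results) = @_;

    foreach my $seq (@$seqs) {
        # Count each match of the wanted base in this sequence
        my $count = 0;
        while ($seq =~ /$wanted/g) {
            $count++;
        }
        # Storing through the reference fills in the caller's %totals
        $$results{$seq} = $count;
    }
}
```

Here the subroutine only reads the array but writes into the hash, so after the call %totals in the main program holds one count for each sequence.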
