6.4 Passing Data to Subroutines
When you start parsing GenBank, PDB, and BLAST files
in later chapters, you'll need more complicated arguments to
your subroutines to hold the several fields of data you'll
parse out of the records. These next sections explain the way
it's done in Perl. You can skim this section and return for a
closer read when you get to Chapter 10.
6.4.1 Subroutines: Pass by Value
So far, all our subroutines have had fairly simple arguments. Each
subroutine copies the values of its arguments into its own my
variables, and whatever happens to those copies in the subroutine
doesn't affect the values of the arguments in the main program.
This is called pass by value or call by value. For example:
#!/usr/bin/perl -w
# Example of pass-by-value (a.k.a. call-by-value)
use strict;
my $i = 2;
simple_sub($i);
print "In main program, after the subroutine call, \$i equals $i\n\n";
exit;
################################################################################
# Subroutines
################################################################################
sub simple_sub {
    my($i) = @_;
    $i += 100;
    print "In subroutine simple_sub, \$i equals $i\n\n";
}
This gives the following output:
In subroutine simple_sub, $i equals 102
In main program, after the subroutine call, $i equals 2
6.4.2 Subroutines: Pass by Reference
If you have more complicated arguments, say a mixture of scalars,
arrays, and hashes, the subroutine often cannot tell them apart.
Perl passes all arguments into the subroutine as a single array, the
special @_ array. If there are arrays or hashes among the arguments,
their elements get "flattened" out into this single @_ array in the
subroutine. Here's an example:
#!/usr/bin/perl -w
# Example of problem of pass-by-value with two arrays
use strict;
my @i = ('1', '2', '3');
my @j = ('a', 'b', 'c');
print "In main program before calling subroutine: i = " . "@i\n";
print "In main program before calling subroutine: j = " . "@j\n";
reference_sub(@i, @j);
print "In main program after calling subroutine: i = " . "@i\n";
print "In main program after calling subroutine: j = " . "@j\n";
exit;
################################################################################
# Subroutines
################################################################################
sub reference_sub {
    my(@i, @j) = @_;
    print "In subroutine : i = " . "@i\n";
    print "In subroutine : j = " . "@j\n";
    push(@i, '4');
    shift(@j);
}
The following output illustrates the problem of this approach:
In main program before calling subroutine: i = 1 2 3
In main program before calling subroutine: j = a b c
In subroutine : i = 1 2 3 a b c
In subroutine : j =
In main program after calling subroutine: i = 1 2 3
In main program after calling subroutine: j = a b c
As you see, in the subroutine all the elements of
@i and @j were grouped into one
@_ array. All distinction between the two arrays
you started with was lost in the subroutine. When you try to get the
two arrays back in the statement:
my(@i, @j) = @_;
Perl assigns everything to the first array, @i.
This behavior makes passing multiple arrays into subroutines somewhat
dicey.
Also, as usual, the original arrays in the main program were not
affected by the subroutine, since you used lexical scoping
(my variables).
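The same flattening happens with any mixture of argument types. Here is a minimal sketch, not taken from the examples in this chapter: the subroutine name count_args and the variables $name, @bases, and %count are made up for illustration. It simply counts how many separate values arrive in @_ when you pass a scalar, an array, and a hash.

```perl
#!/usr/bin/perl
# Sketch: a scalar, an array, and a hash all flatten into @_
# (count_args, $name, @bases, and %count are hypothetical names)
use strict;
use warnings;

my $name  = 'seq1';
my @bases = ('A', 'C', 'G', 'T');
my %count = (A => 10);

my $n = count_args($name, @bases, %count);
print "The subroutine received $n separate values\n";

sub count_args {
    # @_ holds 1 scalar, 4 array elements, and 1 hash key plus its value
    return scalar @_;
}
```

The subroutine sees seven values and has no way of knowing that the first was a scalar, the next four an array, and the last two a key and value from a hash.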
To get around this problem, you can pass arguments into subroutines
in a style called pass by reference or
call by reference. Using pass by reference, you
can pass a subroutine any collection of scalars, arrays, hashes, and
more, and the subroutine can distinguish between them. There is a
price to pay: the resulting code looks a little more complex. But the
payoff is often well worth it.
There is one big difference in the behavior of arguments that are
passed by reference. When argument variables are passed in this
fashion, anything you do to the values of the argument variables in
the subroutine also affects the values of the arguments in the main
program.
To call a subroutine that has its arguments passed by reference, you
call it the same way as before, with one difference: you must preface
the argument names with a backslash. In the example of pass by
reference in this section, the subroutine call is accomplished like
so:
reference_sub(\@i, \@j);
As you see here, the arguments are two arrays. To preserve the
distinction between them as they are passed into the
reference_sub subroutine, each one is passed by reference, by
prefixing its name with a backslash.
Within the subroutine, there are a few changes. First, the arguments
are collected from the @_ array and saved as scalar variables. This
is because a reference is a special kind of data that is stored in a
scalar variable, no matter whether it's a reference to a scalar, an
array, a hash, or anything else. The example collects its arguments
as follows:
my($i, $j) = @_;
reading them from the @_ array as scalars.
The subroutine has to do one more thing with these referenced
arguments. When it uses them, it has to dereference them. To
dereference a referenced argument, you prefix the reference with the
symbol that shows what kind of variable it refers to: a $ for a
scalar, @ for an array, % for a hash. So these variables have two
symbols before their name: reading left to right, the usual symbol
for the kind of variable, and then a $ that indicates the variable
is a reference. The lines:
push(@$i, '4');
shift(@$j);
in the following subroutine are the ones that manipulate the
arguments. The push adds an element '4' to the end of the
@i array, and the
shift removes the first element from the
@j array. Because these arrays have been passed by
reference, their names in the subroutine are @$i
and @$j. (If you want to look at the third element
of the @j array, which normally is
$j[2], you'd say $$j[2].)
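As a small illustration of element access through a reference, the sketch below reads the third element of @j in several ways. The variable $ref is a made-up name for this sketch. Besides the $$ form described above, Perl also accepts a form with braces and an arrow form; all of them should give the same element.

```perl
# Sketch: reading one array element directly and through a reference
# ($ref is a hypothetical name used only in this illustration)
use strict;
use warnings;

my @j   = ('a', 'b', 'c');
my $ref = \@j;              # $ref holds a reference to @j

my $direct = $j[2];         # the usual way, using the array itself
my $prefix = $$ref[2];      # through the reference, as described above
my $braces = ${$ref}[2];    # the same thing with explicit braces
my $arrow  = $ref->[2];     # the arrow form, which many find easier to read

print "$direct $prefix $braces $arrow\n";
```

Which form you use is a matter of taste; the rest of this section sticks with the $$ form.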
Whatever changes you make to the arguments in the subroutine also
take effect in the main program. This is because the references are
references to the actual arguments; they are not copies of their
values as in pass by value. So, as you see in the example, after
calling the subroutine, the arrays in the main program have been
altered accordingly:
#!/usr/bin/perl
# Example of pass-by-reference (a.k.a. call-by-reference)
use strict;
use warnings;
my @i = ('1', '2', '3');
my @j = ('a', 'b', 'c');
print "In main program before calling subroutine: i = " . "@i\n";
print "In main program before calling subroutine: j = " . "@j\n";
reference_sub(\@i, \@j);
print "In main program after calling subroutine: i = " . "@i\n";
print "In main program after calling subroutine: j = " . "@j\n";
exit;
################################################################################
# Subroutines
################################################################################
sub reference_sub {
    my($i, $j) = @_;
    print "In subroutine : i = " . "@$i\n";
    print "In subroutine : j = " . "@$j\n";
    push(@$i, '4');
    shift(@$j);
}
This gives the following output:
In main program before calling subroutine: i = 1 2 3
In main program before calling subroutine: j = a b c
In subroutine : i = 1 2 3
In subroutine : j = a b c
In main program after calling subroutine: i = 1 2 3 4
In main program after calling subroutine: j = b c
The subroutine can now distinguish between the two arrays passed in
as arguments. The changes that were made inside the subroutine to the
variables remain in effect after the subroutine has ended and
you've returned to the main program. This is the essential
characteristic of pass by reference.
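The same technique extends to a mixture of argument types. The following is a minimal sketch, assuming a made-up subroutine add_base and made-up variables $label, @bases, and %count; none of them come from the examples above. It passes a scalar, an array, and a hash by reference, followed by one ordinary scalar passed by value, and changes all three referenced variables from inside the subroutine.

```perl
#!/usr/bin/perl
# Sketch: passing a scalar, an array, and a hash by reference
# (add_base, $label, @bases, and %count are hypothetical names)
use strict;
use warnings;

my $label = 'fragment';
my @bases = ('A', 'C', 'G');
my %count = (A => 1, C => 1, G => 1);

add_base(\$label, \@bases, \%count, 'T');

print "label = $label\n";
print "bases = @bases\n";
print "T count = $count{T}\n";

sub add_base {
    # Three references arrive as scalars; $new is a plain value
    my($label, $bases, $count, $new) = @_;

    $$label .= '+';          # dereference the scalar and append to it
    push(@$bases, $new);     # dereference the array and add an element
    $$count{$new}++;         # dereference the hash and update one entry
}
```

Because each argument arrives as its own scalar reference, the subroutine can tell the three variables apart, and the main program sees every change after the call.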