6.1 Subroutines
Subroutines are an important way to organize a program and are used
in all major programming languages.
A subroutine wraps up a bit of code, gives the
code a name, and provides a way to pass in some values for its
calculations and then report back the results. The rest of the
program can then use the subroutine's code just by calling its
name, giving the needed values to pass in to the subroutine code and
then collecting the results. This use, or "invocation," of a subroutine
is commonly referred to as calling the subroutine. You can think of a
subroutine as a program within a
program; just as you run programs to get results, so your programs
call subroutines to get results. Once you have a subroutine, you can
use it in a program simply by knowing which values to pass in and
what kind of values to expect it to pass out.
6.1.1 Advantages of Subroutines
Subroutines provide several benefits. They bring abstraction and
modularization to programs, and they make it possible to create large
programs by organizing the code into manageable chunks with defined
inputs and outputs.
Say you need to calculate something, for instance the mean of a
distribution, at several places in a program or in several different
programs. By writing this calculation as a subroutine, you can write
it once and then call it whenever you need it, thus making your
program:
- Shorter, since you're reusing the code.
- Easier to test, since you can test the subroutine separately.
- Easier to understand, since it reduces clutter and better organizes
  programs.
- More reliable, since you have less code when you reuse subroutines,
  so there are fewer opportunities for something to go wrong.
- Faster to write, since you may, for example, have already written
  some subroutines that handle basic statistics and can just call the
  one that calculates the mean without having to write it again. Or
  better yet, you found a good statistics library someone else wrote,
  and you never had to write it at all.
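As a minimal sketch of the idea, the following program writes the mean calculation once and calls it from two places. The subroutine name mean, the variable names, and the numbers are all invented for illustration; they are not taken from any particular statistics library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical measurements, invented for this illustration
my @gc_percentages = (40, 50, 60);
my @read_lengths   = (100, 150, 200, 250);

# The same subroutine is called at two different places
my $mean_gc     = mean(@gc_percentages);
my $mean_length = mean(@read_lengths);

print "Mean GC percentage: $mean_gc\n";
print "Mean read length: $mean_length\n";

# mean: return the arithmetic mean of a list of numbers,
# or nothing at all if the list is empty
sub mean {
    my(@values) = @_;

    return unless @values;    # avoid dividing by zero

    my $sum = 0;
    foreach my $value (@values) {
        $sum += $value;
    }
    return $sum / scalar(@values);
}
```

With these numbers the program reports a mean of 50 for the first list and 175 for the second. If the calculation ever needs to change, only the one definition has to be edited.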
There is another subtle yet powerful idea at work here. Subroutines
can themselves call other subroutines; that is, a subroutine
can use another subroutine for help in its calculations.[1]
By writing a set of
subroutines, each of which does one or a few things well, you can
combine them in various ways to make new subroutines. You can then
combine the new subroutines, and so on, and the end result can be
large and flexible programming systems. Decomposing problems into
sets of subroutines that can be conveniently combined allows you to
create environments that can grow and adapt to changing conditions
with a minimum of effort.
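The layering described above can be sketched with three small subroutines, each built on the one before it. The names count_base, gc_count, and gc_fraction, and the sample DNA, are hypothetical and chosen only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample DNA invented for this illustration
my $dna = 'GGCCAATT';

print "GC count: ", gc_count($dna), "\n";
print "GC fraction: ", gc_fraction($dna), "\n";

# count_base: count how many times one base appears in some DNA
sub count_base {
    my($dna, $base) = @_;

    my $count = 0;
    foreach my $letter (split //, $dna) {
        $count++ if $letter eq $base;
    }
    return $count;
}

# gc_count: built on count_base
sub gc_count {
    my($dna) = @_;
    return count_base($dna, 'G') + count_base($dna, 'C');
}

# gc_fraction: built on gc_count (assumes the DNA is not empty)
sub gc_fraction {
    my($dna) = @_;
    return gc_count($dna) / length($dna);
}
```

Each subroutine does one small thing, and the higher-level ones read almost like a description of the calculation. For this sample the GC count is 4 and the GC fraction is 0.5.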
The trick in all this lies in how you partition the code into
subroutines. You want subroutines that encapsulate something that
will be generally useful, not something that is called just once
(although that can sometimes be useful too). There are various rules
of thumb: a subroutine should do one thing well, and it should be no
more than a page or two of code. These are not real rules, and
exceptions are frequent, but they can help you divide your code into
manageable chunks, suitable for subroutines.
6.1.2 Writing Subroutines
Let's look at how
subroutines
are used and then at how they're defined.
To use a subroutine, you pass data into the subroutine as
arguments,
and then you collect the return value(s) of the subroutine. For
example, say you want a subroutine that, given some DNA, appends
"ACGT" to the end of the DNA and returns the new, longer
DNA. Let's call the subroutine addACGT. In
Perl, you usually call a subroutine by typing its name, followed by a
parenthesized list of
arguments (if any). For example, here's a call to
addACGT with the one argument
$dna:
addACGT($dna);
Older versions of Perl required you to start the name of a subroutine
with the & (ampersand) character when calling it. It's still okay to
do so (e.g., &addACGT), but these days the ampersand is
usually omitted.[2]
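As a small sketch, the two calling styles can be put side by side. The addACGT subroutine here is the one described above; the short DNA string is invented for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Short DNA string invented for this illustration
my $dna = 'CGACGT';

my $modern = addACGT($dna);     # the usual style today
my $older  = &addACGT($dna);    # the older style, still accepted

print "Without ampersand: $modern\n";
print "With ampersand:    $older\n";

# addACGT: append ACGT to a copy of the DNA and return the result
sub addACGT {
    my($dna) = @_;
    $dna .= 'ACGT';
    return $dna;
}
```

Both calls pass the same argument and return the same string, CGACGTACGT.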
Example 6-1 shows in detail how this works.
Example 6-1. A subroutine to append ACGT to DNA
#!/usr/bin/perl -w
# A program with a subroutine to append ACGT to DNA
# The original DNA
$dna = 'CGACGTCTTCTCAGGCGA';
# The call to the subroutine "addACGT".
# The argument being passed in is $dna; the result is saved in $longer_dna
$longer_dna = addACGT($dna);
print "I added ACGT to $dna and got $longer_dna\n\n";
exit;
################################################################################
# Subroutines for Example 6-1
################################################################################
# Here is the definition for subroutine "addACGT"
sub addACGT {
    my($dna) = @_;

    $dna .= 'ACGT';
    return $dna;
}
Example 6-1 produces the following output:
I added ACGT to CGACGTCTTCTCAGGCGA and got CGACGTCTTCTCAGGCGAACGT
We'll now look at this code to see how subroutines are defined
and used in a Perl program.
The first thing to notice, taking the large view, is that the program
now has two sections. The first section starts from the beginning of
the program and ends with the exit command.
Following that (and announced by a blizzard of comments for easy
reading) is a section for
subroutine definitions, in this
case, only the one definition for subroutine
addACGT. It is common to place all subroutine
definitions together at the end of a program, for ease in reading.
Usually they're listed alphabetically or in some other
convenient way.
Actually, it is legal to put the
subroutine definitions almost anywhere in a
program. This is because Perl first scans through the code and does
things like check the syntax and learn subroutine definitions, before
it starts to run the program. In particular, subroutine definitions
can come after the point in the code where you use them (not
necessarily before, which many people assume is the rule), and they
don't have to be grouped together but can be scattered
throughout the code. But our method of collecting them together at
the end can make reading a program much easier. The possible
exception is when a small subroutine is used in one section of code,
as sometimes happens with the sort function, for
instance. In this case having the definition right there can save the
reader paging back and forth between the subroutine definition and
its use. Usually, it's more convenient to read the program
without the subroutine definitions, to get the overall flow of the
program first, and then go back and look into the subroutines, if
necessary.
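The exception for small subroutines used with sort might look like the following sketch, in which a tiny comparison subroutine sits right next to the one sort that uses it. The name by_length and the list of fragments are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# DNA fragments invented for this illustration
my @fragments = ('ACGTACGT', 'AC', 'ACGTA', 'ACG');

# A small comparison subroutine, kept beside the sort that uses it.
# sort supplies the two items being compared in $a and $b.
sub by_length { length($a) <=> length($b) }

my @sorted = sort by_length @fragments;

print "@sorted\n";
```

This prints the fragments from shortest to longest: AC ACG ACGTA ACGTACGT. Because the definition is only one line long, keeping it beside the sort costs little and saves the reader a trip to the end of the program.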
As you see, Example 6-1 is very simple. It first
stores some DNA into the variable $dna and then
passes that variable as an argument to the subroutine call, which
looks like this: addACGT($dna). The
subroutine is called by its name,
followed by parentheses containing the arguments to the subroutine.
There may be no arguments at all; if there is more than one argument,
they are separated by commas. The value returned by the subroutine can be
saved; in this program the value is saved in a variable called
$longer_dna, which is then printed, and the
program exits.
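To illustrate calls with no arguments and with more than one argument, here is a brief sketch. The subroutines default_tail and append_tail are hypothetical variations on addACGT, written only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Short DNA string invented for this illustration
my $dna = 'CGACGT';

my $tail   = default_tail();             # no arguments
my $longer = append_tail($dna, $tail);   # two arguments, separated by a comma

print "$dna plus $tail gives $longer\n";

# default_tail: takes no arguments and returns a fixed string
sub default_tail {
    return 'ACGT';
}

# append_tail: takes some DNA and a tail, and returns the DNA
# with the tail appended
sub append_tail {
    my($dna, $tail) = @_;
    $dna .= $tail;
    return $dna;
}
```

The program prints "CGACGT plus ACGT gives CGACGTACGT". In both calls the return value is saved in a variable, just as $longer_dna is in Example 6-1.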
The part of the program from the beginning to the
exit statement is called variously the
main program
or the main
body of the program. By looking over this section of the
code, you can see what happens from the beginning to the end of the
program without looking into the details of the subroutines.
Now that you've looked over the main program of Example 6-1, it's time to look at the subroutine
definition and how it uses the principle of
scoping.