Safari | Beginning Perl for Bioinformatics -> 6.6 Fixing Bugs in Your Code

6.6 Fixing Bugs in Your Code

Now let's talk about what to do when your program is having trouble.

A program can go wrong in any number of ways. Maybe it won't run at all. A look at the error messages, especially the first line or two, usually leads you to the problem, which will be somewhere in the syntax. The solution is to use the correct syntax (e.g., matching braces or ending each statement with a semicolon).

Your program may run but not behave as you planned. Then you have some problem with the logic of the program. Perhaps at some point, you've zigged when you should have zagged, like adding instead of subtracting or using the assignment operator = when you meant to test for equality between two numbers with ==. Or, the problem could be that you just have a poor design to accomplish your task, and it's only when you actually try it out that the flaw becomes evident.
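As a minimal sketch of the = versus == slip (the variable names and values here are hypothetical, chosen only for illustration), the following shows how an assignment inside a test both succeeds unexpectedly and overwrites the variable being tested:

```perl
use strict;
use warnings;

# Hypothetical values for illustration.
my $expected = 5;
my $observed = 3;

# Bug: = copies $expected into $observed, and the test then sees the
# assigned value (5), which counts as true.
my $buggy_result = ($observed = $expected) ? 'equal' : 'different';
my $observed_after_bug = $observed;

# Fix: == compares the two numbers and changes nothing.
$observed = 3;
my $fixed_result = ($observed == $expected) ? 'equal' : 'different';

print "with =  : $buggy_result (observed is now $observed_after_bug)\n";
print "with == : $fixed_result (observed is still $observed)\n";
```

The buggy test reports "equal" even though 3 and 5 differ, which is the kind of logic error that produces no syntax error at all.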

However, sometimes the problem is not obvious, and you have to resort to the heavy artillery.

Fortunately, Perl has several ways to help you find and fix bugs in your programs. The use of the statements use strict; and use warnings; should become a habit, as you can catch many errors with them. The Perl debugger gives you complete freedom to examine a program in detail as it runs.

6.6.1 use warnings; and use strict;

In general, it's not too hard to tell when the syntax of a program is wrong because the Perl interpreter will produce error messages that usually lead you right to the problem. It's much harder to tell when the program is doing something you didn't really want. Many such problems can be caught if you turn on the warnings and enforce the strict use of declarations.

You have probably noticed that all the programs in this book up until now start with the command interpreter line:

#!/usr/bin/perl -w

That -w turns on Perl's warnings, which look for potential problems in your code and warn you about them. Warnings catch common problems, such as variables that are declared more than once, that are not syntax errors but can lead to bugs.

Another way to turn on warnings is to add the following statement near the top of the program:

use warnings;

The statement use warnings; may not be available on your version of Perl, if it's an old one. So if your Perl complains about it, take it out and use the -w switch instead, either on the command interpreter line or from the command line:

$ perl -w my_program

However, use warnings; is a bit more portable between different operating systems. So, from now on, that's the way I'll turn on warnings in my code. Another important helper you should use is the following statement placed near the top of your program (next to use warnings;):

use strict;

As mentioned previously, this forces you to declare your variables. (It has some options that are beyond the scope of this book.) It finds misspelled variables, undeclared variables that may be interfering with other parts of the program, and so on.

It's best to always use both use strict; and use warnings; when writing your Perl code.
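The following is a small sketch of use strict; refusing a misspelled variable. The snippet and its variable names are hypothetical; it is compiled with a string eval only so that the demonstration itself can keep running and print the error that a normal program would report at startup:

```perl
use strict;
use warnings;

# Hypothetical snippet, compiled at run time with a string eval so that
# this demonstration can continue after the compile error.
my $snippet = <<'END_SNIPPET';
use strict;
my $sequence_count = 1;
$sequence_cuont = 2;    # misspelled, so never declared with my
1;
END_SNIPPET

my $compiled = eval $snippet;
my $error    = $@;

print "compiled ok\n" if $compiled;
print "strict refused the code:\n$error" unless $compiled;
```

Without use strict; inside the snippet, the misspelled name would quietly become a new, separate variable, and the program would run with the wrong value.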

6.6.2 Fixing Bugs with Comments and Print Statements

Sometimes you can identify misbehaving code by selectively commenting out sections of the program until you find the part that seems to cause the problem. You can also add print statements at suspicious parts of a misbehaving program to check what certain variables are doing. Both of these are time-honored programming techniques, and they work well in almost any programming language.

Commenting out sections of code can be particularly helpful when the error messages that you get from Perl don't point you directly at the offending line. This happens occasionally. When it does happen you may, by trial and error, discover that commenting out a small section of code causes the error messages to go away; then you know where the error is occurring.

Adding print statements can also be a quick way to pinpoint a problem, especially if you already have some idea of where the problem is. As a novice programmer, however, you may find that using the Perl debugger is easier than adding print statements. In the debugger, you can easily set print statements at any line. For instance, the following debugger command says to print the values of $i and $k before line 48:

  a 48 print "$i $k\n"

Once you learn how to do it, this method is generally faster and easier than editing the Perl program and adding print statements by hand. Using this method is partly a matter of taste, since some extremely good Perl programmers prefer to do it the old-fashioned way, by adding print statements.
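For comparison, here is a rough sketch of the old-fashioned approach. The $DEBUG flag and the counting task are hypothetical, not part of the book's examples; the idea is simply to print the variables of interest at the suspicious spot and to make the tracing easy to switch off:

```perl
use strict;
use warnings;

# $DEBUG is a hypothetical flag; set it to 0 to silence the tracing.
my $DEBUG = 1;

my @bases = split //, 'ACGT';
my $count_of_g_or_c = 0;

foreach my $base (@bases) {
    $count_of_g_or_c++ if $base eq 'G' || $base eq 'C';

    # Tracing goes to STDERR so it doesn't mix with the real output.
    print STDERR "base=$base count=$count_of_g_or_c\n" if $DEBUG;
}

print "G or C bases: $count_of_g_or_c\n";
```

Sending the trace to STDERR is a design choice rather than a requirement; it keeps the diagnostic lines out of any file the program's normal output is redirected to.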

6.6.3 The Perl Debugger

My favorite way to deal with nonobvious bugs in my programs is to use the Perl debugger. The problem with bugs in code is that once a program starts running, all you can see is the output; you can't see the steps a program is taking. The Perl debugger lets you examine your program in detail, step by step, and almost always can lead you quickly to the problem. You'll also find that it's easy to use with a little practice.

There are situations the Perl debugger can't handle well: interacting processes that depend on timing considerations, for instance. The debugger can examine only one program at a time, and while examining, it stops the program, so timing considerations with other processes go right out the window.

For most purposes, the Perl debugger is a great, essential programming tool. This section introduces its most important features.

6.6.3.1 A program with bugs

Example 6-4 has some bugs we can examine. It's supposed to take a sequence and two bases, and output everything from those two bases to the end of the sequence (if it can find them in the sequence). The two bases can be given as an argument, or if no argument is given, the program uses the bases TA by default.

There is one new thing in Example 6-4. The next statement affects the control flow in a loop. It immediately returns the control flow to the next iteration of the loop, skipping whatever else would have followed. Also, you may want to recall $_ , which we discussed back in Example 5-5 in the context of a foreach loop.
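As a quick illustration of next and $_ on their own (a sketch with a made-up sequence, separate from Example 6-4), this loop skips every N and keeps the other bases:

```perl
use strict;
use warnings;

my @bases = split //, 'ACNGT';
my $kept  = '';

foreach (@bases) {
    # $_ holds the current element because no loop variable was named.
    next if $_ eq 'N';    # skip the rest of the body for this element
    $kept .= $_;          # reached only when next did not fire
}

print "$kept\n";
```

This prints ACGT: when next fires, the statement after it is skipped and the loop moves straight on to the following element.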

Example 6-4. A program with a bug or two

#!/usr/bin/perl
# A program with a bug or two
#
# An optional argument, for where to start printing the sequence,
#  is a two-base subsequence.
#
# Print everything from the subsequence ( or TA if no subsequence
# is given as an argument) to the end of the DNA.

# declare and initialize variables
my $dna = 'CGACGTCTTCTAAGGCGA';
my @dna;
my $receivingcommittment;
my $previousbase = ''; 

my $subsequence = '';

if (@ARGV) {
    my $subsequence = $ARGV[0];
}else{
    $subsequence = 'TA';
}

my $base1 = substr($subsequence, 0, 1);
my $base2 = substr($subsequence, 1, 1);

# explode DNA
@dna = split ( '', $dna );

######### Pseudocode of the following loop:
#
# If you've received a committment, print the base and continue.  Otherwise:
#
# If the previous base was $base1, and this base is $base2, print them.
#   You have now received a committment to print the rest of the string.
#
# At each loop, save the previous base.

foreach (@dna) {
    if ($receivingcommittment) {
        print;
        next;
    } elsif ($previousbase eq $base1) {
        if ( /$base2/ ) {
            print $base1, $base2; 
            $recievingcommitment = 1;
        }
    }
    $previousbase = $_;
}

print "\n";

exit;

Here's the output of two runs of Example 6-4:

$ perl example6-4 AA

$ perl example6-4
TA

Huh? It should have printed out AAGGCGA when called with the argument AA, and TAAGGCGA when called with no arguments. There must be a bug in this program. But, if you look it over, there isn't anything obviously wrong. It's time to fire up the debugger. What follows is an actual debugging session on Example 6-4, interspersed with comments to explain what's happening and why.

6.6.3.2 How to start and stop the debugger

The debugger runs interactively, and you control it from the keyboard. (You can also run it automatically to produce a trace of the program in a file.) The most common way to start it is by giving the -d switch to Perl at the command line. Since you're using buggy Example 6-4 to demonstrate the debugger, here's how to start that program:

perl -d example6-4

Alternatively, you could have added a -d flag to the command interpreter line:

#!/usr/bin/perl -d

On systems such as Unix and Linux where command interpretation works, this starts the debugger automatically.

To stop the debugger, simply type q.

6.6.3.3 Debugger command summary

First, let's try to find the bug in Example 6-4 when it's called with no arguments:

$ perl -d example6-4
Default die handler restored.

Loading DB routines from perl5db.pl version 1.07
Editor support available.

Enter h or 'h h' for help, or 'man perldebug' for more help.

main::(example6-4:11):    my $dna = 'CGACGTCTTCTAAGGCGA';
  DB<1>

Let's stop right here at the beginning and look at a few things. After some messages, which may not mean a whole lot right now, you get the excellent information that the commands h and h h give more help. Let's try h h:

  DB<1> h h
List/search source lines:               Control script execution:
  l [ln|sub]  List source code            T           Stack trace
  - or .      List previous/current line  s [expr]    Single step [in expr]
  w [line]    List around line            n [expr]    Next, steps over subs
  f filename  View source in file         <CR/Enter>  Repeat last n or s
  /pattern/ ?patt?   Search forw/backw    r           Return from subroutine
  v          Show versions of modules     c [ln|sub]  Continue until position
Debugger controls:                        L           List break/watch/actions
  O [...]     Set debugger options        t [expr]    Toggle trace [trace expr]
  <[<]|{[{]|>[>] [cmd] Do pre/post-prompt b [ln|event|sub] [cnd] Set breakpoint
  ! [N|pat]   Redo a previous command     d [ln] or D Delete a/all breakpoints
  H [-num]    Display last num commands   a [ln] cmd  Do cmd before line
  = [a val]   Define/list an alias        W expr      Add a watch expression
  h [db_cmd]  Get help on command         A or W      Delete all actions/watch
  |[|]db_cmd  Send output to pager        ![!] syscmd Run cmd in a subprocess
  q or ^D     Quit                        R           Attempt a restart
Data Examination:       expr     Execute perl code, also see: s,n,t expr
  x|m expr      Evals expr in list context, dumps the result or lists methods.
  p expr        Print expression (uses script's current package).
  S [[!]pat]    List subroutine names [not] matching pattern
  V [Pk [Vars]] List Variables in Package.  Vars can be ~pattern or !pattern.
  X [Vars]     Same as "V current_package [Vars]".
For more help, type h cmd_letter, or run man perldebug for all docs.
  DB<2>

It's a bit hard to read, but you have a concise summary of the debugger commands. You can also use the h command, which gives several screens' worth of information. The | h command displays those several pages one at a time; the pipe at the beginning of a debugger command pipes the output through a pager, which typically advances a page when you hit the spacebar on your keyboard. You should try those out. Right now, however, let's focus on a few of the most useful commands. But remember that typing h followed by a command gives you help about that command.

6.6.3.4 Stepping through statements with the debugger

Back to the immediate problem. When you started up the debugger, you saw that it stopped on the first line of real Perl code:

main::(example6-4:11):    my $dna = 'CGACGTCTTCTAAGGCGA';

There's an important point about the debugger you should understand right away. It shows the line it's about to execute, not the line it just executed.

So really, Example 6-4 hasn't done anything yet. You can see from the command summary that p tells the debugger to print out values. If you ask it to print the value of $dna, you'll find:

  DB<2> p $dna

  DB<3>

It didn't show anything because there's nothing to show; the program hasn't executed the statement that sets $dna yet. So you should execute the statement. There are two commands to use: n and s both execute the statement being displayed. (The difference is that n or "next" skips the plunge into a subroutine call, treating it like a single statement; s or "single step" enters a subroutine and single-steps you through that code as well.) Once you've given one of these commands, you can just hit Enter to repeat the same command.

Since there aren't any subroutines, you needn't worry about choosing between n and s, so let's use n:

  DB<3> n
main::(example6-4:12):    my @dna;
  DB<3>

This shows the next line (you can see the line numbers of the Perl program at the end of the prompt). If you wish to see more lines, the w or "window" command will serve:

  DB<3> w
9 	
10 	# declare and initialize variables
11:	my $dna = 'CGACGTCTTCTAAGGCGA';
12==>	my @dna;
13:	my $receivingcommittment;
14:	my $previousbase = ''; 
15 	
16:	my $subsequence = '';
17 	
18:	if (@ARGV) {
  DB<3>

The current line—the line that will be executed next—is highlighted with an arrow (==>).

The w seems like a useful thing. Let's get more information about it with the help command h w:

  DB<3> h w
w [line]        List window around line.
  DB<4>

Actually, there's more—hitting w repeatedly keeps showing more of the program; a minus sign backs up a screen. But enough of that.

Now that $dna has been declared and initialized, let's check whether anything has gone wrong on the first statement:

  DB<4> p $dna
CGACGTCTTCTAAGGCGA
  DB<5>

That's exactly what was expected. There's no bug, so let's continue examining the lines, printing out values here and there:

  DB<5> n
main::(example6-4:13):	my $receivingcommittment;
  DB<5> n
main::(example6-4:14):	my $previousbase = ''; 
  DB<5> n
main::(example6-4:16):	my $subsequence = '';
  DB<5> n
main::(example6-4:18):	if (@ARGV) {
  DB<5> p @ARGV

  DB<6> w
15 	
16:	my $subsequence = '';
17 	
18==>	if (@ARGV) {
19:	    my $subsequence = $ARGV[0];
20 	}else{
21:	    $subsequence = 'TA';
22 	}
23 	
24:	my $base1 = substr($subsequence, 0, 1);
  DB<6> n
main::(example6-4:21):	    $subsequence = 'TA';
  DB<6> n
main::(example6-4:24):	my $base1 = substr($subsequence, 0, 1);
  DB<6> p $subsequence
TA
  DB<7> n
main::(example6-4:25):	my $base2 = substr($subsequence, 1, 1);
  DB<7> n
main::(example6-4:28):	@dna = split ( '', $dna );
  DB<7> p $base1
T
  DB<8> p $base2
A
  DB<9>

So far, everything is as expected; the default subsequence TA is being used, and the $base1 and $base2 variables are set to T and A, the first and second bases of the subsequence. Let's continue:

  DB<9> n
main::(example6-4:39):    foreach (@dna) {
  DB<9> p @dna
CGACGTCTTCTAAGGCGA
  DB<10> p "@dna"
C G A C G T C T T C T A A G G C G A
  DB<11>

This shows a trick with Perl and printing arrays: normally they are printed without any spacing between the elements, but enclosing an array in double quotes in a print statement causes it to be displayed with spaces between the elements.
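The same behavior can be seen outside the debugger. In this sketch (using a shortened, made-up sequence), the spacing comes from Perl's list separator variable $", which defaults to a single space and can be changed temporarily with local:

```perl
use strict;
use warnings;

my @dna = split //, 'CGAC';

my $joined_plain = join '', @dna;    # the same text print @dna shows by default
my $interpolated = "@dna";           # elements separated by $" (a space by default)

my $dashed;
{
    local $" = '-';                  # temporarily change the list separator
    $dashed = "@dna";
}

print "$joined_plain\n";
print "$interpolated\n";
print "$dashed\n";
```

The three lines printed are CGAC, C G A C, and C-G-A-C.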

Again, everything seems okay, and we're about to enter a loop. Let's look at the whole loop first:

  DB<11> w
36 	#
37 	# At each loop, save the previous base.
38 	
39==>	foreach (@dna) {
40:	    if ($receivingcommittment) {
41:	        print;
42:	        next;
43 	    } elsif ($previousbase eq $base1) {
44:	        if ( /$base2/ ) {
45:	            print $base1, $base2; 
  DB<11> w
43 	    } elsif ($previousbase eq $base1) {
44:	        if ( /$base2/ ) {
45:	            print $base1, $base2; 
46:	            $recievingcommitment = 1;
47 	        }
48 	    }
49:	    $previousbase = $_;
50 	}
51 	
52:	print "\n";
  DB<11>

Despite the few repeated lines resulting from the w command, you can see the whole loop. Now you know something in here is going wrong: when you tested the program without giving it an argument, as it's running now, it took the default argument TA, and so far it seemed okay. However, all it actually did in your test was to print out the TA when it was supposed to print out everything in the string starting with the first occurrence of TA. What's going wrong?

6.6.3.5 Setting breakpoints

To figure out what's wrong, you can set a breakpoint in your code. A breakpoint is a spot in your program where you tell the debugger to stop execution so you can poke around in the code. The Perl debugger lets you set breakpoints in various ways. They let you run the program, stopping only to examine it when a statement with a breakpoint is reached. That way, you don't have to step through every line of code. (If you have 5,000 lines of code, and the error happens when you hit a line of code that's first used when you're reading the 12,000th line of input, you'll be happy about this feature.)

Notice that the part of this loop that prints out the rest of the string, once the starting two bases have been found, is the if block starting at line 40:

    if ($receivingcommittment) {
        print;
        next;
    }

Let's look at that $receivingcommittment variable.

Here's one way to do this. Let's set a breakpoint at line 40. Type b 40 and then c to continue, and the program proceeds until it hits line 40:

  DB<11> b 40
  DB<12> c
main::(example6-4:40):	    if ($receivingcommittment) {
  DB<12> p
C
  DB<12>

The last command, p, prints out the element from the @dna array you reached in the foreach loop. Since you didn't specify a variable for the loop, it used the default $_ variable. Many Perl commands such as print or pattern matching operate on the default $_ variable if no other variable is given. (It's the cousin of the @_ default array that subroutines use to hold their parameters.) So the p debugger command shows that you're operating on C from the @dna array, which is the first character.

All well and good. But it would be good to have the program break when the variable $receivingcommittment has a change in its value, and then single step from there, to see why the program isn't printing out the rest of the string. Recall that this variable is the flag whose change tells the program to print the rest of the string. First let's delete all other breakpoints:

  DB<12> D
Deleting all breakpoints...

You can "watch" the variable with W like so:

  DB<12> W $receivingcommittment
  DB<13> c
TA
Debugged program terminated.  Use q to quit or R to restart,
  use O inhibit_exit to avoid stopping after program termination,
  h q, h R or h O to get additional info.  
  DB<13>

Wait a minute! The W command should indicate when $receivingcommittment changes value. But when the program continued running with the c command, it ran to the end, meaning that $receivingcommittment never changed value. So let's start up the program again and break on the line that changes its value:

  DB<13> R
Warning: some settings and command-line options may be lost!
Default die handler restored.

Loading DB routines from perl5db.pl version 1.07
Editor support available.

Enter h or 'h h' for help, or 'man perldebug' for more help.

main::(example6-4:11):	my $dna = 'CGACGTCTTCTAAGGCGA';
  DB<13> w 45
42:	        next;
43 	    } elsif ($previousbase eq $base1) {
44:	        if ( /$base2/ ) {
45:	            print $base1, $base2; 
46:	            $recievingcommitment = 1;
47 	        }
48 	    }
49:	    $previousbase = $_;
50 	}
51 	
  DB<14> b 46
  DB<15> c
TAmain::(example6-4:46):	            $recievingcommitment = 1;
  DB<15> n
main::(example6-4:49):	    $previousbase = $_;
  DB<15> p $receivingcommittment

  DB<16>

Huh? The code says it's assigning the variable a value of 1, but after you execute the code with the n command and try to print out the value, it doesn't print anything.

If you stare harder at the program, you see that at line 46 you misspelled $receivingcommittment as $recievingcommitment. That explains everything; fix it and run it again:

$ perl example6-4
TAAGGCGA

Success!

6.6.3.6 Fixing another bug

Now, did that fix the other bug when you ran Example 6-4 with an argument?

$ perl example6-4 AA
GACGTCTTCTAAGGCGA

Again, huh? You expected AAGGCGA. Can there be another bug in the program? Let's try the debugger again:

$ perl -d example6-4 AA
Default die handler restored.

Loading DB routines from perl5db.pl version 1.07
Editor support available.

Enter h or 'h h' for help, or 'man perldebug' for more help.

main::(example6-4:11):	my $dna = 'CGACGTCTTCTAAGGCGA';
  DB<1> n
main::(example6-4:12):	my @dna;
  DB<1> n
main::(example6-4:13):	my $receivingcommittment;
  DB<1> n
main::(example6-4:14):	my $previousbase = ''; 
  DB<1> n
main::(example6-4:16):	my $subsequence = '';
  DB<1> n
main::(example6-4:18):	if (@ARGV) {
  DB<1> n
main::(example6-4:19):	    my $subsequence = $ARGV[0];
  DB<1> n
main::(example6-4:24):	my $base1 = substr($subsequence, 0, 1);
  DB<1> n
main::(example6-4:25):	my $base2 = substr($subsequence, 1, 1);
  DB<1> n
main::(example6-4:28):	@dna = split ( '', $dna );
  DB<1> p $subsequence

  DB<2> p $base1

  DB<3> p $base2

  DB<4>

Okay, for some reason the $subsequence, and therefore the $base1 and $base2 variables, are not getting set right. How come?

Check out line 19 where you declared a new my variable in the block of the if statement with the same name, $subsequence. That's the variable you're setting, but it's disappearing as soon as the if statement is over, because it's scoped in the block since it's a my variable.

So again, you fix that problem by removing the my declaration on line 19 and instead inserting an assignment $subsequence = $ARGV[0]; and run the program again:

$ perl example6-4
TAAGGCGA
$ perl example6-4 AA
AAGGCGA

Here, finally, is success.
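For reference, here is one way the repaired logic might look with both fixes applied. This is a sketch rather than the book's own listing: the subroutine name rest_from_subsequence and the shorter flag name $receiving are hypothetical, the loop uses a named variable and eq instead of $_ and a pattern match, and the result is built up in a string so it can be checked:

```perl
use strict;
use warnings;

# Hypothetical helper: returns everything in $dna from the first
# occurrence of the two-base $subsequence to the end, or '' if absent.
sub rest_from_subsequence {
    my ($dna, $subsequence) = @_;

    my $base1 = substr($subsequence, 0, 1);
    my $base2 = substr($subsequence, 1, 1);

    my $receiving    = 0;     # set once the two bases have been seen
    my $previousbase = '';
    my $output       = '';

    foreach my $base (split //, $dna) {
        if ($receiving) {
            $output .= $base;
            next;
        } elsif ($previousbase eq $base1 and $base eq $base2) {
            $output .= $base1 . $base2;
            $receiving = 1;   # one spelling, checked by use strict
        }
        $previousbase = $base;
    }
    return $output;
}

my $dna = 'CGACGTCTTCTAAGGCGA';

# A single declaration of $subsequence, with no inner my to hide it.
my $subsequence = @ARGV ? $ARGV[0] : 'TA';

print rest_from_subsequence($dna, $subsequence), "\n";
```

Called with TA or AA on the sequence from Example 6-4, the helper returns TAAGGCGA and AAGGCGA respectively, matching the runs above.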

6.6.3.7 use warnings; and use strict; redux

Example 6-4 was somewhat artificial. It turns out that these problems would have been reported easily if warnings had been used. So let's see an actual example of the benefits of use strict; and use warnings;, as discussed earlier in this chapter.

If you go back to the original Example 6-4 and add the use warnings; directive near the top of the program, you get the following output:

$ perl example6-4 
Name "main::recievingcommitment" used only once: possible typo at example6-4 line 47.
TA

As you see, the warnings found the first bug immediately. They noticed there was a variable that was used only once, usually a sign of a misspelled variable. (I can never spell "receiving" or "commitment" properly.) So fix the misspelling at line 47, and run it again:

$ perl example6-4 
TAAGGCGA
$ perl example6-4 AA
substr outside of string at example6-4 line 26.
Use of uninitialized value in regexp compilation at example6-4 line 45.
Use of uninitialized value in print at example6-4 line 46.
GACGTCTTCTAAGGCGA

So, the first bug is fixed. The second bug remains with a few warnings that are, perhaps, hard to understand. But focus on the first error message, and see that it complains about line 26:

my $base2 = substr($subsequence, 1, 1);

So, there's something wrong with $subsequence. Often, error messages will be off by one line, so it may well be that the error starts on the line before, the first time $subsequence is operated on by the substr. But that's not the case here.
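To see why only the second substr call triggers the warning, here is a small sketch that reproduces the situation with an empty $subsequence. The $SIG{__WARN__} handler is used only to collect the warning text so it can be printed alongside the values; it is not part of Example 6-4:

```perl
use strict;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };   # collect instead of printing

my $subsequence = '';                     # what the buggy program ends up with
my $base1 = substr($subsequence, 0, 1);   # offset 0 is the end of '': returns ''
my $base2 = substr($subsequence, 1, 1);   # offset 1 is past the end: undef, warns

print "base1 is ", (defined $base1 ? "'$base1'" : 'undef'), "\n";
print "base2 is ", (defined $base2 ? "'$base2'" : 'undef'), "\n";
print "warning: $_" for @warnings;
```

The first call asks for a substring starting exactly at the end of the string, which Perl quietly treats as empty. The second starts beyond the end, so it returns the undefined value and warns, and that undefined $base2 is what produces the later "uninitialized value" warnings.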

Nonetheless, the warnings have pointed directly to the problem. In this case, you still have to take a little initiative; look back at the $subsequence variable and notice the extra my declaration within the if block on line 20 that is preventing the variable from being initialized properly. Declaring a variable inside a block that hides another variable of the same name outside the block is not necessarily a bug. In fact, it's perfectly legal, so the programmers who wrote the warnings did not flag it as an obvious error. However, it seems to have caused a real problem here!
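The scoping behavior can be isolated in a few lines. In this sketch (the if (1) tests and the extra bookkeeping variables are placeholders for illustration), the inner my creates a second variable that vanishes at the closing brace, while a plain assignment reaches the outer one:

```perl
use strict;
use warnings;

my $subsequence = '';
my $inner_value;

if (1) {
    # This my creates a second, unrelated $subsequence that exists only
    # inside the braces and hides the outer one while it is in scope.
    my $subsequence = 'AA';
    $inner_value = $subsequence;
}

my $outer_after_shadow = $subsequence;    # still '', the inner one is gone

if (1) {
    $subsequence = 'AA';                  # no my: assigns to the outer variable
}

print "inner: '$inner_value'\n";
print "outer after shadowing: '$outer_after_shadow'\n";
print "outer after plain assignment: '$subsequence'\n";
```

Neither use strict; nor use warnings; complains about the inner declaration, because the two variables live in different scopes.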

One final point: if you go back to the original, buggy program, notice there's no use strict; in the program. If you add that and run the program without arguments, you get the following:

$ perl example6-4   
Global symbol "$recievingcommitment" requires explicit package name at example6-4 line 47.
Execution of example6-4 aborted due to compilation errors.

Fixing the misspelled variable, and running the program with the argument, you get:

$ perl example6-4 AA
GACGTCTTCTAAGGCGA

You can see that use strict; didn't help for the other bug. Remember, it's best to employ both use strict; and use warnings;.
