Beginning Perl for Bioinformatics
by James Tisdall

This errata page lists errors outstanding in the most recent printing. If you have technical questions or error reports, you can send them to booktech@oreilly.com. Please specify the printing date of your copy.

This page was updated February 28, 2002

Here's a key to the markup:
[page-number]: serious technical mistake
{page-number}: minor technical mistake
<page-number>: important language/formatting problem
(page-number): language change or minor formatting problem
?page-number?: reader question or request for clarification
+n: n'th paragraph from the top of the page
-n: n'th paragraph from the bottom of the page

Confirmed errors:
########################################
(vii) -1
## A reader reports:
Genbank -> GenBank
## This is confirmed.
########################################
(vii) -1 a link to the brain tumors you suspect exist. -> a role in tumor development.
########################################
(viii) +1
## A reader reports:
Genbank -> GenBank
## This is confirmed.
########################################
(viii) +2 itself or an adjunct to ongoing investigations. -> itself, or as an adjunct to ongoing investigations.
########################################
+3 This books is -> This book is
########################################
(viii) -2 and the computer program. -> and computer programming.
########################################
(ix) -2 metaphor between the two disciplines.) -> metaphor.)
########################################
{xi} +2 sequences data -> sequence data
########################################
(1) +3 large, programmable, digital/electronic (the ENIAC) computers. -> large-scale, programmable, digital, electronic computers (such as ENIAC).
########################################
[2] +3 of DNA and positions -> of DNA and proteins
########################################
<2> +4 The bases joined end to end to form -> The bases are joined end to end to form
########################################
[2] -3 as in translating it to RNA -> as in transcribing it to RNA
########################################
(2) -3 it's done so left to right -> it's usually written left to right
########################################
[3] +3
## A reader reports:
Genbank, the Genetic Sequence Data Bank (http://www.ncbi.n/m.nih.gov) -> GenBank, the Genetic Sequence Data Bank (http://www.ncbi.nlm.nih.gov)
## This is confirmed.
########################################
[3] -2 are composed of an amino group and a carboxyl group. -> are composed of an amino group, a carboxyl group, and a sidechain.
########################################
(3) footnote It can appear as -> It can appear in
########################################
(3) footnote most of the life -> much of the life
########################################
(4) +2 easy to identify -> easy to calculate the
########################################
(4) +5 such as the intriguing processes of -> for instance the intriguing techniques of
########################################
(5) +2 related by virtue of their protein products as part of -> related by virtue of their protein products being part of
########################################
{7} -4 natural, or spoken, languages, such as English -> natural languages, such as English
##(NOTE to editors): natural language is not synonymous with spoken language.
## You can speak mathematics, or music, or Perl. You can write English.
## Some natural languages are now only written, such as ancient Egyptian, etc.
## A fine point, I realize, but humor me -- I spent 6 years in speech research,
## and my opinions are ossified ;>). So I use it as a technical term, and
## give an immediate example - English.
######################################## (9) +1 Mac, then -> Mac, and then ######################################## {9} -2 MacOS X -> Mac OS X ######################################## {10} -2 MacOS X -> Mac OS X ## two instances of this here ######################################## (11) +1 ## A reader reports: several books such as O'Reilly's Perl Resource Kits, -> several books ## The reader comments: ## I would suggest to remove "such as O'Reilly's Perl Resource Kits,". ## There is only the Win32 version still available, and even this version ## is totally outdated and much too expensive for the audience of this book. ## Tisdall says: sounds plausible to me: editors, how say you?? ######################################## <11> itemized list #1 the that -> that the ######################################## [14] -3 ## A reader reports: type /my_program -> type ./my_program ## This is confirmed ######################################## {15} +1 MacOS X -> Mac OS X ######################################## (16) -1 beginners, there's -> beginners: there's ######################################## [30] +1 structure -> primary structure ######################################## [30] +2 add a sugar and you get the nucleotides -> add a sugar and you get the nucleosides ######################################## <30> +2 the words "bases", "nucleosides" (see preceding note), and "nucleotides" should be in italics - these are important terms ######################################## (32) +1 remember that -> remember ######################################## (33) +2 command example ## A reader reports: perl example4 -1 -> perl example4-1 ## This is confirmed. 
########################################
{34} +2 MacOS X -> Mac OS X
########################################
(35) +2 more easily read -> more easily readable
########################################
(38) +3 and to make sure -> and its output to make sure that
########################################
(38) -1 to the right -> on the right
########################################
[42] Figure 4-1
The text "delimiters to separate the parts of the operator" only points to two of the three delimiters, missing the middle delimiter. As drawn, it might be read as representing the entire "/T/U/" instead of just the three forward slashes "/".
The text "variable" is drawn to encompass "RNA", omitting the "$". It should encompass the entire "$RNA".
The text "Other common options" should read "Other common modifiers"
########################################
(45) -2 new and successful way. -> new and successful approach.
########################################
(47) +4 You can use any name -> You can use any name for the file
########################################
<52> +2 You can print the elements one a after -> You can print the elements one after
########################################
(54) +1 both are listed in -> both are demonstrated in
########################################
<55> Exercise 4.7 the last line first. Or you may -> the last line first. You may
########################################
<60> -1 Example 5-2 (from Chapter 4) -> Example 5-2 (which modifies Example 4-7)
########################################
(61) -3 is assigned each time through the loop to the next line of the file -> is assigned to the next line of the file each time through the loop
########################################
<61> -2 a Microsoft Windows versions -> a version of Microsoft Windows
########################################
(62) +4 the type of input it gets. -> the type of input the program gets.
######################################## <63> +2 ## A reader reports: Formats A nd B -> Formats A and B ## This is confirmed. ######################################## (64) +1 Example 5-3, in Chapter 9, and -> Example 5-3; in Chapter 9; and ######################################## [66] +2 (as was true in Example 4-3) -> (as was true in Example 4-5) ######################################## <66> -3 ## A reader reports: is a scalar variable starts with a dollar sign $) -> is a scalar variable (which starts with a dollar sign $) ## This is confirmed. ######################################## <68> -3 whitespace characters, represented by \s with nothing and by the lack of anything between the second and third forward slashes. -> whitespace characters (represented by \s) with nothing (represented by the lack of anything between the second and third forward slashes). ######################################## (68) -3 characters, which is done globally -> characters. This deletion is done globally ######################################## (69) +1 ## A reader reports: line, and a formfeed advances to the next line. The two of them together amount to the same thing as a newline character. -> line. Perl newlines are handled differently on different operating systems; briefly, \n works on all, but you may see \r\n at the end of lines on Windows. ## This is confirmed. ######################################## (72) -1 We now have a design for the program, let's -> Now that we have a design for the program, let's ######################################## [74] Output from Example 5-4 ## A reader reports: T = 17 -> T = 17 errors = 1 ## This is confirmed. 
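
The <68> and (68) entries above describe a substitution that deletes whitespace globally. A minimal sketch of that idea follows. The subroutine name strip_whitespace and the sample string are hypothetical, chosen only for illustration, and are not taken from the book.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every whitespace character from a string.
sub strip_whitespace {
    my ($text) = @_;
    # \s matches one whitespace character; the empty replacement (nothing
    # between the second and third slashes) deletes it; the g modifier
    # repeats the substitution for every match in the string.
    $text =~ s/\s//g;
    return $text;
}

# Made-up sample data: a sequence broken up by spaces, a tab, and newlines.
my $dna = "ACGT ACGT\nTTGG\tCC\n";
print strip_whitespace($dna), "\n";
```

Without the g modifier only the first whitespace character would be removed, which is why the (68) entry stresses that the deletion is done globally.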
######################################## (74) -3 is the same as previous -> is the same as in previous ######################################## (75) +2 decimal or floating-point numbers -> decimal (or floating-point) numbers ## italicise "floating-point" ######################################## {75} +2 6,544000 -> 6544000 ## not 6,544,000 because readers might try inputting numbers to Perl that way ######################################## <75> +4 ## A reader reports: (see Chapter 6 and the discussion of my variables) -> ## omit this parenthesis, it duplicates a remark earlier in the paragraph. ## This is confirmed. ######################################## {75} +4 languages also require you to declare the type of a variable, for example "integer," or "string," but Perl does not. -> languages require you to pre-declare a variable's type (e.g. "integer i"). Perl indicates the type right in the variable name (e.g. the "@" in "@array"). ######################################## [77] +2 This version of the foreach loop: foreach(@DNA) {. -> This version of the foreach loop: foreach (@DNA) { ######################################## [77] +2 ## A reader reports: in the version of this loop in Example 5-5 -> in the version of this loop in Example 5-4 ## This is confirmed. ######################################## [80] +1 (output of Example 5-6) recognize this vase: -> recognize this base: ######################################## (80) +2 or Perl documentation -> or the Perl documentation ######################################## [81] +5 "seeing. " -> "seeing". ## The word and quotes should not be broken ## across the line. The period should not ## be placed within the quote. 
(This is
## an example of a string, and needs to
## be exactly "seeing")
########################################
(81) -1 regularly -> often
########################################
[82] +1 (in code fragment)
# Also write the results to a file called "countbase"
$outputfile = "countbase";
( unless ( open(COUNTBASE, ">$outputfile") ) {
->
# Also write the results to a file called "countbase"
$outputfile = "countbase";
unless ( open(COUNTBASE, ">$outputfile") ) {
########################################
[82] footnote several other behaviors. -> several other behaviors, such as appending to a file.
########################################
(82) footnote reading from, and writing to, files as well as other actions. -> reading from files and writing to files, as well as other actions.
########################################
(84) +2 also has the global modifier, -> also has the global modifier g,
########################################
(84) -3 you'll use those while loops to good effect -> you'll use while loops in a similar fashion, to good effect
########################################
(84) -1 The program, however, -> The program, moreover,
########################################
{85} Exercise 5-1 MacOS X -> Mac OS X
########################################
?85? Exercise 5.6;
## A reader reports:
The solution to Exercise 5.6 does not find the reverse complement of DNA. If shift is substituted for pop in the code, then the reverse complement would be found. However the solution to Exercise 5.6 does not show any output when compiled. Please fix the solution to Exercise 5.6, ie include STDIN in the program.
## Tisdall replies:
## This report is incorrect. The exercise determines if the two input
## strings are reverse complements of each other.
## Also, the program does not use STDIN -- it gets its data from the
## command line, and uses @ARGV for that.
Here's an example of running ## the program: #### $ perl exer05.06 ACCGGTAGCGGAGCTGGG CCCAGCTCCGCTACCGGT #### They ARE reverse complements ######################################## (85) -1 ## A reader reports: (eq actually an operator). -> (eq is actually an operator). ## This is confirmed. ######################################## (86) Exercise 5.9 ## A reader reports: (Hint: you can use the Perl functions substr or slice. -> (Hint: you can use the Perl functions substr or splice.) ## This is confirmed ######################################## [86] Exercise 5.10; ## A reader reports: After being compiled and run Exercise 5.10 does not, print the message: "File $tmpfile does not exists!\n" Shouldn't the program print this, since the program's purpose is to remove the tmp file? ## Tisdall replies: ## This report is incorrect. The program does not print any message when ## everything works. Error messages are only printed if something fails ## to work during the program. (Think of it as an error-reporting program.) ######################################## <87> +1 the mutation of DNA. -> the mutation of DNA, and in all the following chapters. ######################################## (88) +1 mean of a distribution at -> mean of a distribution, at ######################################## (88) footnote so-called recursion -> ## italicise "recursion" ######################################## (88) -3 The trick of all -> The trick to all ######################################## (89) -2 We'll now look at this code -> We'll now look more closely at this code ######################################## (90) -1 seen earlier in loops and conditional statements that groups -> as the block used in a loop to group ######################################## <91> -3 as in return $dna; in our subroutine addACGT, in a list of scalars as in return ($dna1, $dna2);, in an array as in return @lines;, and more. 
-> as in return $dna in our subroutine addACGT; in a list of scalars as in return ($dna1, $dna2); in an array as in return @lines; and more. ######################################## {93} +3 code example my $x ; -> my $x; ######################################## [93] warning such as the my construct. -> with the my construct. (Unless you're using global variables, which we're not.) ######################################## <94> -1 You can get by this -> You can obtain the ######################################## <96> +1 even though a variable called $dna is lengthened inside the subroutine -> even though the value of a variable called $dna is altered inside the subroutine ######################################## (96) +2 You use the command line -> You'll also use the command line ######################################## [96] +4 code fragment AAGGGGTTTCCC -> % perl example6-3.pl AAGGGGTTTCCC ######################################## {96} -2 MacOS X -> Mac OS X ######################################## <98> -4 as you would any other scalar variables. -> as you would any other scalar variable. ######################################## (101) +3 In the example of pass-by-reference in this section -> In the example of pass-by-reference later in this section ######################################## (102) -2 passed on as arguments. -> passed in as arguments. ######################################## (102) -1 appear in multiple program. -> appear in multiple programs. ######################################## (103) -2 Beginning in Chapter 8, I'll define subroutines and show the code, but you'll be putting them into your module and typing: -> All the useful subroutines in this book will be put into this module, which you'll start using in Chapter 8 by typing: ######################################## (109) -2 step you through that code as well. -> steps you through that code as well. 
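
The <96> entry above says that the value of a variable called $dna can be altered inside a subroutine without affecting the caller. A small sketch of that behavior, assuming the subroutine copies its argument into a my variable, is given here. The name append_acgt and the sample sequence are hypothetical and only loosely modeled on the addACGT subroutine the entries mention.

```perl
use strict;
use warnings;

# Hypothetical subroutine: append ACGT to a copy of its argument.
sub append_acgt {
    my ($dna) = @_;     # this $dna is a private copy, visible only here
    $dna .= 'ACGT';     # alters the private copy, not the caller's variable
    return $dna;        # return a single scalar value
}

my $dna    = 'CGACGTCTTCTAAGGCGA';   # made-up sequence
my $longer = append_acgt($dna);

print "caller's \$dna: $dna\n";
print "returned value: $longer\n";
```

The two variables share a name but not a scope, so the caller's $dna keeps its original value while the returned string carries the change.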
######################################## <110> +4 Now that $dna has been declared and initialized, the program seems wrong on the first statement: -> Now that $dna has been declared and initialized, let's see if it's what we expect: this may be where the program's going wrong, on the first statement! ######################################## (118) +1 nonviable offspring that dies -> nonviable offspring that die ######################################## (118) +1 they can lead to evolutionary -> mutations can lead to evolutionary ######################################## {123} +2 is a do-until loop. -> is a do-until loop, first seen in Example 5-3. ######################################## (123) +2 that do the test first and then the block. -> that test first and then do the block. ######################################## [124] +4 greater than 0 and less than 7 -> greater than or equal to 0 and less than 7 ######################################## <125> -2 a task. the following -> a task. The following ######################################## (127) +1 is really a short function -> is a very short subroutine ######################################## (127) +1 It's just like the idea in -> It's a subroutine to select a string position, very similar to the idea in ######################################## <127> +2 Of course, if you were really writing this code, you'd make a little test to see if your subroutine worked. -> While writing subroutines, you frequently want to write a little test to see if your subroutine works as you intend: ######################################## <129> -3 is a short program. -> is a short subroutine. 
######################################## (129) -3 picking a random position then -> picking a random position, then ######################################## (129) -3 have to do that a lot -> have to look things up a lot ######################################## (138) +3 main program and proceeds, following the order of the top-down design you did in pseudocode, then followed by the subroutines. -> main program and is followed by the subroutines, in the same order as the top-down design you did in pseudocode. ######################################## (142) +2 nucleotide in the same position -> nucleotides in the same positions ######################################## (148) Exercise 7.7; ## A reader reports: Sometimes not all choices are will be picked -> Sometimes not all choices are equally likely to be picked ## This is confirmed. ######################################## [149] +2 different data structures (hashes, arrays, and databases) can store -> different data structures (like hashes and arrays) and database systems can store ######################################## <150> -5 Mathematically, a Perl hash always represents a finite function. -> ## Take this sentence out of this paragraph, and place it instead ## between the two sentences of the following paragraph. ######################################## (152) +1 couldn't respond -> couldn't respond informatively ######################################## [152] +2 ## Append this at the end of the 2nd paragraph, ## after "missing from your experiment." -> (Maybe the user simply mistyped the gene name!) ######################################## (152) -1 You might try storing -> You could store ######################################## (153) +2 in an array, for example, to search -> in an array. For example, to search ######################################## (155) +1 ## A reader reports: databases. Some good ones that are free -> databases, some good ones that are free ## This is confirmed. 
######################################## [155] -3 The genetic code is how a cell translates the information contained in its DNA into amino acids and then proteins, which do the real work in the cell. -> The genetic code describes how the information in the coding regions of DNA is translated into the correct amino acids for the assembly of the cell's proteins. ######################################## (155) -2 Herein -> Here ######################################## [156] +4 The chart in Figure 8-1 shows how the various bases combine to form amino acids. -> Figure 8-1 shows each codon and its associated amino acid: the genetic code. ######################################## [156] -2 the process stops when a codon is encountered. -> the process stops when one of the three stop codons is encountered. ######################################## {156} -1 the encoding of amino acids -> the encoding for amino acids ######################################## {159} +2 The print statement accepts a filehandle as an optional argument, but so far, we've been printing to the default STDOUT. -> The print statement accepts a filehandle as an optional argument (as in Example 5-7), but by default prints to STDOUT. ######################################## (159) -1 code, and the last subroutine clearly displays this redundancy. It might be interesting to express that in your subroutine. -> code, and subroutine codon2aa clearly maps several codons to the same amino acid. We can represent this redundancy another way. ######################################## (159) -1 let's try to redo the subroutine -> let's rewrite subroutine codon2aa ###################################### (160) +3 ## A reader reports: $codon -> ## could be formatted in nonproportional font ## Tisdall says: editors, how say you??? 
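
The (159) entries above point out that several codons map to the same amino acid and that this redundancy can be represented another way. One possible representation is a hash keyed by codon, sketched below under stated assumptions: the table is a deliberately incomplete slice of the genetic code, the use of '_' for stop codons is a choice made here, and the name lookup_codon is hypothetical rather than the book's codon2aa.

```perl
use strict;
use warnings;

# A small, deliberately incomplete slice of the genetic code.
# Several keys (codons) share one value (amino acid): that is the redundancy.
my %genetic_code = (
    'GCA' => 'A', 'GCC' => 'A', 'GCG' => 'A', 'GCT' => 'A',   # Alanine
    'TGC' => 'C', 'TGT' => 'C',                               # Cysteine
    'ATG' => 'M',                                             # Methionine
    'TAA' => '_', 'TAG' => '_', 'TGA' => '_',                 # Stop
);

# Hypothetical lookup: return the one-letter amino acid for a codon,
# or an undefined value if the codon is not in the table.
sub lookup_codon {
    my ($codon) = @_;
    $codon = uc $codon;     # accept lower-case input as well
    return $genetic_code{$codon} if exists $genetic_code{$codon};
    return;
}

for my $codon (qw(GCA gcc TGT TAA)) {
    print "$codon => ", lookup_codon($codon), "\n";
}
```

A hash fits because each key has exactly one value while many keys may share a value, which matches the many-to-one shape of the genetic code.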
####################################### {163} +4 the genetic code hash -> the genetic_code hash ######################################## {163} -1 subroutine translates a whole DNA sequence -> subroutine can be used to translate a whole DNA sequence ######################################## (164) +2 ## A reader reports: condon2aa -> codon2aa ## This is confirmed. ########################################### (165) after second paragraph; ## A reader reports: "$codon = substr ($dna, $i 3);" should be "$codon = substr ($dna, $i, 3);" ## This is confirmed. ######################################## (166) -1 biologists and programmers invented -> biologists and programmers have invented ######################################## {167} -1 limit them to 80 characters in length. -> limit them to at most 80 characters. ######################################## (170) +2 would think incorrectly, that the file -> would think, incorrectly, that the file ######################################## [171] -1 It's convenient to declare these my variables as $line on the spot, -> It's often convenient to declare a loop's "index variable" (like $line) as a my variable, right on the spot in the loop, ######################################## (174) -1 As you accumulate useful subroutines in our modules, -> As you accumulate useful subroutines in modules, ######################################## (175) -3 where in the DNA you're studying the cell -> where, in the DNA you're studying, the cell ######################################## [176] Code for subroutine "revcom" my($revcom) = reverse($dna); !!! Should be: !!! 
my $revcom = reverse $dna;
########################################
(177) -1
## The equation should be reformatted so that:
+1 -> + 1
########################################
(177) -1
## A reader reports:
Pseudocode line near the bottom of the page, above last paragraph; This is pseudocode, so it doesn't have to be perfect, but as written this line looks like some kind of strange assignment instead of a comparison: "(end - 1) - (start - 1) + 1 = end - start +1" I'm not sure what the best way to fix this is. It is set in the monospace font used for pseudocode & real code, but it isn't quite either, so I was going to suggest re-setting it in the variable width font used in the main text. That probably won't clarify it either though, so the equals sign should probably just be doubled so that it's clear that the line is a comparison, and not an assignment.
## Tisdall replies:
## This report is (kind of) confirmed. The line in question is an
## equation from elementary algebra.
## = as assignment in programming languages, and = as equality
## in algebra and much other mathematics, is indeed confusing
## (and has already been discussed), therefore let me add:
So let's write this subroutine: -> as we know from algebra. So let's write this subroutine:
########################################
(182) -4
## A reader reports:
an.d -> and
## This report is confirmed.
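
The [176] correction above replaces my($revcom) = reverse($dna); with my $revcom = reverse $dna;. The difference is one of context, which the following sketch tries to show. The sample sequences and the subroutine name revcom_sketch are hypothetical, and the tr step is an assumption about how the complement might be taken, not a quotation of the book's subroutine.

```perl
use strict;
use warnings;

my $dna = 'ACGGT';   # made-up sequence

# List context: the parentheses make this a list assignment, so reverse
# reverses a one-element list and the string itself comes back unchanged.
my ($list_result) = reverse($dna);

# Scalar context: reverse reverses the characters of the string.
my $scalar_result = reverse $dna;

print "list context:   $list_result\n";     # prints ACGGT
print "scalar context: $scalar_result\n";   # prints TGGCA

# Hypothetical reverse-complement built on the scalar-context form.
sub revcom_sketch {
    my ($seq) = @_;
    my $revcom = reverse $seq;           # scalar context: reverse the string
    $revcom =~ tr/ACGTacgt/TGCAtgca/;    # swap each base for its complement
    return $revcom;
}

print revcom_sketch('ACCGGTAGCGGAGCTGGG'), "\n";
```

The last line uses the pair of sequences from the Exercise 5.6 reply earlier on this page, so its output can be compared against the second sequence given there.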
######################################## (184) +3 documentation (or Appendix B), for -> documentation (or Appendix B) for ######################################## [188] -4 If you have two or three per line that have whitespace and are separated from each other by whitespace, -> We have two or three words per line that are separated from each other by whitespace, ######################################## [188] -4 (which acts on the line as stored in the special variable @_.: -> (which by default splits on the line stored in the special variable $_): ######################################## [188] -4 ($name, $site) = split(" ") -> @fields = split(" "); ######################################## [188] -2 $name = shift@fields; $site = pop@fields; -> $name = shift @fields; $site = pop @fields; ######################################## (188) -1 the documentation on REBASE you found on its web site -> the documentation from the REBASE web site ######################################## {189} +2 Of course, REBASE uses them, because a given restriction enzyme might well match a few different recognition sites. -> Of course, REBASE uses these IUB codes because a given restriction enzyme might well bind to a few different DNA patterns, varying in one or more bases. ######################################## {189} +3 given a string, -> given a string of sequence containing IUB codes, ######################################## [191] Example 9-2 Subroutine "parseREBASE" # Read in the REBASE file @rebasefile = get_file_data($rebasefile); foreach ( @rebasefile ) { !!! Should be: !!! # Read in the REBASE file my $rebase_filehandle = open_file($rebasefile); while(<$rebase_filehandle>) { ######################################## [191] -1 You're using a foreach loop to process the lines of the bionet file stored in the @rebasefile array. !!! Should be: !!! 
You're using a while loop to process the lines of the bionet file as they are read in from the file using the filehandle called $rebase_filehandle. ######################################## [192] -1 evaluates and returns the right. -> evaluates the right and returns. ######################################### {193} +2 conditionals to their own blocks. -> conditionals with their own blocks. ####################################### [193] +3 and returns the left argument if it's true; if the left argument doesn't evaluate to true, it evaluates and returns the right argument. -> the left argument and returns if it's true; if the left argument is false, it evaluates the right argument and returns true or false. ######################################## [193] -3 you skip the rest of the loop. -> you skip the rest of the block and return to the top of the loop. ######################################## (200) +3 ## A reader reports: It's bad, because that same flexibility makes it harder to write programs that to find and extract the desired annotations. -> It's bad, because that same flexibility makes it harder to write programs that find and extract the desired annotations. ## This is confirmed. ######################################## (203) -1 contains 12,813516 loci and -> contains ######################################## [204] -3 over 8 trillion -> over 8 billion ######################################## {205} +3 with each appearing as an element -> with each line of the file appearing as an element ######################################## {205} +5 this datafile and the file record.gb in the next -> library.gb, and the file record.gb which contains just one GenBank record, in the next ######################################## (206) ## A reader reports: (typical)) foreach{ ... 
} section of the parse1() subroutine, starting at middle of page;
Most of the book's code is pretty clear & consistent, but one thing that the author keeps wavering on is the inclusion or omission of spaces on lines like the ones here. Abbreviating, the code here has these lines:
} elsif( $in_sequence) {
} elsif ( $line =~ /^ORIGIN/ ) {
} else{
Note that the second elsif is followed by a space, but the first one isn't and neither is the else. Likewise, $in_sequence has a leading space but not a trailing one.
## This report is correct. In this code fragment it does interfere with readability.
## Since the code is so dense here, I'd prefer:
if ( $line =~ /^\/\/\n/ ) {
}elsif ( $in_sequence ) {
}elsif ( $line =~ /^ORIGIN/ ) {
}else {
## But in general it's a deliberate choice to vary these things. 1) All such variations
## are legal Perl, and for good reason. 2) Beginners are likely to encounter such
## variations as they encounter different programming styles. 3) Beginners experiment
## with different styles as they develop their own preferences, so 4) I show the
## beginners different styles, and discuss the issue in the book.
##
## I am fully aware of the reason for standards for such formatting issues.
## However, I feel even in professional code they are not an unbreakable rule,
## and that readability is paramount. (You may feel that the only way to
## ensure readability is to follow a rule consistently, but I disagree.)
########################################
(207) +4 so that Perl doesn't interpret them as prematurely ending the pattern. The regular expression also ends with a newline \/\/\n, which is then placed inside the usual delimiters: /\/\/\n/ -> so that Perl doesn't interpret them as the usual / delimiters.
The regular expression is anchored to the beginning of the line (^), ends with a newline (\n), and is placed inside the delimiters: /^\/\/\n/ ######################################## (207) +4 you can use another delimiter around the regular expression -> you can replace the / delimiter with any character, like "!", ######################################## (207) +4 like so: m!//\n!). -> like so: m!^//\n! where the expression is now ^//\n). ######################################### [208] +1 Other methods of collecting annotation and sequence lines are possible, especially if you go through the lines of the array more than once. You can scan through the array, keeping track of the start-of-sequence and end-of-record line numbers, and then go back and extract the annotation and sequence using an array splice (which was described in the parseREBASE subroutine in Example 9-2). Here's an example: -> Other methods of collecting annotation and sequence lines are possible, especially if you go through the array more than once. If you first find the start-of-sequence and end-of-record line numbers, you can do the extraction with an array slice, in which you list the desired elements or indicate them with a range, for example: @arrayslice = @array[0,3,5] or @arrayslice = @array[3..8]. Here we use a range: ## Note to Editor: "array slice" should be emphasized as a new term. If possible, ## it should be added to the index. ####################################### [208] +1 in program listing ## A reader reports: /^//\n/ -> m!^//\n! ## This is confirmed. ######################################## [208] +1 in program listing ## A reader reports: -> ## This line should be deleted from the program listing. ## This is confirmed. ######################################## [208] -3 in Examples 6-2 and 6-3 -> in Examples 8-2 and 8-3 ######################################## (209) +2 so it extends the ^ and the $ to match after, or before, a newline, embedded in the string. 
-> so it extends the ^ to match after an embedded newline, and the $ to match before an embedded newline. ######################################## (209) -3 First, let's examine -> Let's examine ######################################## {209} -2 A call to read -> After setting the input record separator to "//\n", a call to read ######################################### ?218? open_file subroutine; ## A reader reports: In the open_file subroutine I keep getting an error message for the following line: unless(open($fh, $filename)){ The error message reads: Can't use an undefined value as filehandle reference at ./example1-5.pl line 52. ## This report is not confirmed, it doesn't happen to me at all. May be an ## error typing in the program, or perhaps it's behaving differently on an ## older version of Perl than I have? -Jim ######################################### [219] Example 10-5 Subroutine "get_annotation_and_dna" at top of page ## A reader reports: # - given GenBank record, get annotation and DNA -> # - given filehandle for GenBank file, get GenBank record ## This is confirmed. (N.B. This should only be changed at the top of the ## page, not on the second occurrence of the line in the middle of the page.) ######################################### [219] Example 10-5 Subroutine "get_annotation_and_dna" $dna =~ s/[\s\/]//g; !!!! Should be: !!!! $dna =~ s/[\s\/\d]//g; ####################################### {223} +3 which is a special variable pattern between -> which is a special variable that remembers the pattern that matched between ######################################## [228] -1 Example 10-8 finds -> Example 10-7 finds ######################################## {229} +1 as this allows you to store, for instance, only one instance of an exon). -> as a hash would allow you to store, for instance, only one value with key 'exon').
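## A minimal sketch of the point made in the {229} correction above: a hash
## keeps only one value per key, while an array keeps every feature. The
## feature pairs and variable names are invented for illustration; they are
## assumptions, not the book's code or data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical (key, value) feature pairs; not taken from a real GenBank record.
my @pairs = (
    [ 'exon', '1..100'   ],
    [ 'exon', '200..300' ],
    [ 'CDS',  '1..300'   ],
);

my %by_key;      # one slot per key: a later value overwrites an earlier one
my @in_order;    # every feature is kept, in the order seen

for my $pair (@pairs) {
    my ($key, $value) = @$pair;
    $by_key{$key} = $value;
    push @in_order, "$key $value";
}

print "Hash holds ",  scalar(keys %by_key), " features\n";
print "Array holds ", scalar(@in_order),    " features\n";
print "Value stored under 'exon': $by_key{'exon'}\n";
```

## The second 'exon' assignment replaces the first, so the hash ends up with
## fewer entries than the array.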
######################################## [230] +1 Example 10-8 gives the output: -> Example 10-7 gives the output: ######################################## [231] +1 parse_features of Example 10-8, -> parse_features of Example 10-7, ######################################### [231] +1 code of Example 10-8: -> code of Example 10-7: ####################################### [235] middle of page; ## A reader reports: if ($offset) -> if (defined $offset) ## $offset can have a valid zero value for the first accession number, ## which would fail the test for "if ($offset)" ## This report is confirmed. The book and the downloadable examples will be updated. ######################################## (239) +1 the effect on biology and medicine would be profound. -> the effect on biology and medicine will be profound. ######################################## {239} +2 information form them. -> information from them. ######################################## [241] -1 Since you're running this program on a folder that contains PDB files, this is what you'll see: -> You can download a small sample 'pdb' directory from this book's web site; if you do, this is what you'll see when you run this program: ######################################## (242) +1 you can give the directory name the special name "." for the current directory, -> you can just call the directory by the special name ".", ######################################## {242} +2 the special files -> the special file names ######################################## {248} +3 The reason for using a subroutine that you define is that it enables you -> File::Find is designed to call your own subroutines, because it enables you ######################################### {257} Example 11-5 Omit this line: print "****chain $chain **** \n"; ####################################### (258) 20th line of code; ## A reader reports: given an scalar containing SEQRES lines, -> given a scalar containing SEQRES lines, ## This is confirmed. 
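## A minimal sketch of the [235] correction above, if (defined $offset).
## The index hash, accession numbers, and subroutine names are hypothetical,
## made up for this illustration; only the two forms of the test come from
## the erratum.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical index: accession number => byte offset of its record.
# The first record legitimately starts at offset 0.
my %offset_of = ( 'AB000001' => 0, 'AB000002' => 4096 );

# Truth test: treats a valid offset of 0 as "missing".
sub lookup_by_truth {
    my ($accession) = @_;
    my $offset = $offset_of{$accession};
    return $offset ? "found at $offset" : "not found";
}

# Defined test: only an absent accession number counts as "missing".
sub lookup_by_defined {
    my ($accession) = @_;
    my $offset = $offset_of{$accession};
    return defined($offset) ? "found at $offset" : "not found";
}

for my $accession ('AB000001', 'AB000002', 'ZZ999999') {
    print "$accession: truth test says ", lookup_by_truth($accession),
          "; defined test says ", lookup_by_defined($accession), "\n";
}
```

## Only the first accession number gives different answers: 0 is false in
## Perl but is still a defined value.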
####################################### (260) -4 In Chapter 10, I demonstrated two ways to parse GenBank files into sequence and annotation and then how to parse the annotation into finer and finer levels of detail. -> In Chapter 10, I compared two ways to parse GenBank files into sequence and annotation, and to parse the annotation into finer and finer levels of detail. ######################################## (260) -3 what field the input line was in. -> which section of the file each input line came from. ######################################## (261) +2 captures the pattern matched, denoted by $& -> copies the pattern matched, denoted by the special variable $& that always holds the last successful pattern match, ######################################## (261) -4 Let's examine the subroutine extractSEQRES, now that the record types have been parsed out, and extract the primary amino acid sequence. -> Now that the record types have been parsed out, let's examine the subroutine extractSEQRES, and see how it extracts the primary amino acid sequence. ######################################## [261] -2 The previous parse, in Example 11-4, -> The subroutine parsePDBrecordtypes, in Example 11-5, ######################################## (261) -2 Our success with the previous parsePDBrecordtypes subroutine that used iteration over lines (as opposed to regular expressions over multiline strings) leads to the same approach here. -> Our success in parsePDBrecordtypes using iteration over lines (as opposed to regular expressions over multiline strings) leads us to try the same approach here. ######################################## {262} +4 you won't have use for the strings of amino acids in three-character codes. -> you might not encounter strings of amino acids in three-character codes. ######################################## [265] +2 and the element symbol. -> and the element symbol. The $x, $y, and $z may also contain some spaces. 
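## A minimal sketch of why $x, $y, and $z may contain spaces, as the [265]
## addition above says. The ATOM line is made up, and the column offsets
## (x at 30, y at 38, z at 46, element at 76, counting from 0) are assumed
## from the fixed-column PDB ATOM record layout; check them against the
## book's code before relying on them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A made-up ATOM line, built with sprintf so the fixed-width columns are exact.
my $line = sprintf(
    "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f          %2s",
    'ATOM', 1, ' N', ' ', 'GLY', 'A', 1, ' ',
    1.888, -8.251, -2.511, 1.00, 0.00, 'N'
);

# Fixed-width fields: each substr returns the whole column, padding included.
my $x       = substr($line, 30, 8);
my $y       = substr($line, 38, 8);
my $z       = substr($line, 46, 8);
my $element = substr($line, 76, 2);

print "raw:     [$x] [$y] [$z] [$element]\n";

# foreach aliases its list, so this strips the padding from the variables themselves.
for my $field ($x, $y, $z, $element) {
    $field =~ s/\s+//g;
}

print "trimmed: [$x] [$y] [$z] [$element]\n";
```

## The brackets in the first line of output make the leading spaces visible;
## the second line shows the values after the padding is removed.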
######################################## [265] -2 1.888 -8.251 -2.511 N 18.955 -10.180 10.777 C !!! Should be !!! 1.888 -8.251 -2.511 N -0.873 9.368 16.046 C ######################################## [266] +1 program code printf "%8.3f%8.3f%8.3f %2s\n", $x, $y, $z, $element; -> print "$x $y $z $element\n"; ######################################## [266] -2 We've already seen the use of the printf function to format output with more options then with the print function. -> For column-specific data such as in PDB, an alternative with more options than the print function is the printf function, which we'll see later in this chapter. ######################################## {266} -1 command example ## A reader reports: from the command line like so, assuming you saved the program in a file called get_two_atoms: %perl get_two_atoms pdb1a4O.ent -> from the command line like so ("biocomp%" is the computer prompt), assuming you saved the program in a file called get_two_atoms: biocomp% perl get_two_atoms pdb/c1/pdb1c1f.ent ## This report is confirmed. The prompt has been explicitly noted. ## Also, the file name is now the same as in the preceding example. ######################################## [267] +1 Alternatively, -> Alternatively, on Unix or Linux or Mac OS X, ######################################## [267] +1 program code % cat pdb1a4O.ent | perl get_two_atoms or % perl get_two_atoms < pdb1a4O.ent -> biocomp% cat pdb/c1/pdb1c1f.ent | perl get_two_atoms or coltrane% perl get_two_atoms < pdb/c1/pdb1c1f.ent ######################################## {267} -2 Unix or Linux -> Unix or Linux or Mac OS X ######################################## {267} -2 Windows or Macintosh -> Windows or older Macintosh ######################################## {267} -1 a program that outputs a secondary structure report, called stride. -> a program called stride that outputs a secondary structure report. 
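## A minimal sketch contrasting the two output statements named in the [266]
## entries above. The coordinate values are sample numbers chosen for the
## illustration; sprintf is used here only so the formatted strings can be
## inspected, and it takes the same format as printf.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample values, already stripped of any padding spaces.
my ($x, $y, $z, $element) = (1.888, -8.251, -2.511, 'N');

# print style: single spaces between values, so widths vary with the data.
my $plain = "$x $y $z $element\n";

# printf style: three 8-column numbers with 3 decimals, then a 2-column string.
my $columns = sprintf "%8.3f%8.3f%8.3f %2s\n", $x, $y, $z, $element;

print $plain;
print $columns;
```

## The second line always has the same width, which is what column-specific
## data such as PDB records call for.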
######################################## [267] -1 ## A reader reports: http://www.embl.heidelberg.ole/stride/stride_info.html -> http://www.embl-heidelberg.de/stride/stride_info.html ## This report is confirmed. ######################################## {267} -1 of a PDB filename and collect the output in the subroutine call_stride that follows. -> of a PDB filename. I collect the output in the subroutine call_stride that follows. ######################################## (270) +2 The actual running of the program and collecting its output happens in just one line. -> The actual running of the program and collecting its output happen in just one line. ######################################## <271> paragraphs +2 and +6 Move paragraph +2: Using the substr function, the two for loops alter each line of the two arrays by saving the 11th to the 60th positions of those strings. This is where the desired information lies. to the end of paragraph +6, which now reads: Next, you want to save just those positions (or columns) of the lines that have the sequence or structure information; you don't need the keywords, position numbers, or the PDB entry name at the end of the lines. Using the substr function, the two for loops alter each line of the two arrays by saving the 11th to the 60th positions. This is where the desired information lies. ######################################### <272> +1 Check the next section for a subroutine that will improve that output. -> The exercises that follow ask you to write a subroutine that will improve that output. ####################################### {274} +1 In biological research, the search for sequence similarity is very important. For instance, a researcher who has discovered a potentially important DNA or protein sequence wants to know if it's already been identified and characterized by another researcher. If it hasn't, the researcher wants to know if it resembles any known sequence from any organism.
This information can provide vital clues as to the role of the sequence in the organism. -> In biological research, the search for sequence similarity is very important. For instance, a researcher who has isolated a potentially important DNA or protein sequence wants to know if it's already been identified and characterized by another researcher. If it hasn't, the researcher wants to know if it resembles any known sequence from any organism. This information can provide vital clues as to the role of the sequence in the organism under study. And when no such resemblance is found, it is evidence that the sequence may belong to a new class of genes or gene products. ######################################## (275) +1 There are a several -> There are several ######################################## {285} +2 so that you match all the available lines. -> to match all available such lines (at least one). ######################################## [288] code at bottom of page ## A reader reports: ## This routine does not remove the characters "ct" from the word ## "Sbjct" found on each subject line. This causes the insertion ## of the "ct" preceding each subject line into the sequence ## returned in $subject. The value returned in $subject is always ## incorrect. $query = join ( '' , ($HSP =~ /^Query.*\n/gm) ); $subject = join ( '' , ($HSP =~ /^Sbjct.*\n/gm) ); !!! Should be !!! $query = join ( '' , ($HSP =~ /^Query(.*)\n/gm) ); $subject = join ( '' , ($HSP =~ /^Sbjct(.*)\n/gm) ); ## This is confirmed. ######################################## [289] program output at top of page ## A reader reports: -> Subject String: ctggagatggctcagacctggaacctccggatgccggggacgacagcaagtctgagaatg ggctgagaacgctcccatctactgcatctgtcgcaaaccggacatcaattgcttcatgattggacttgtgacaactgca acgagtggttccatggagactgcatccggatca !!! Should be !!!
-> Subject String: ggagatggctcagacctggaacctccggatgccggggacgacagcaagtctgagaatggg gagaacgctcccatctactgcatctgtcgcaaaccggacatcaattgcttcatgattggatgtgacaactgcaacgagt ggttccatggagactgcatccggatca ## This is confirmed. ######################################## [289] -5 ## Delete this sentence: Here you see that not only can a subroutine return an array on a scalar value; it can also return a hash. ######################################## [290] +2 This is the nongreedy or minimal matching mentioned in Chapter 9. -> This is the nongreedy or minimal matching; it matches the shortest string possible. By default, * is greedy, matching the longest string possible. ######################################## (290) +4 before and after embedded newlines. -> before and after embedded newlines, respectively. ######################################## {290} +6 the + following the other parentheses. -> the + following the outer parentheses. ######################################## {291} +3 in the same manner as you would a print function. -> in the same manner as you would in a print function. ######################################## [291] -3 Here's what is in Example 12-3. -> Here's what is in the example just given. ######################################## (292) -1 FORTRAN programming-language conventions -> FORTRAN programming language conventions ######################################## {294} +3 ## A reader reports confusion over the output on page 293, so I add on p.294: (which, in this case, is true.) -> (in our example it prints "This DNA is so...") ## This is confirmed. ######################################## <295> +2 Bioperl doesn't provide complete programs. Rather, it provides a fairly large-and growing-set of modules for accomplishing common tasks, including some tasks you've seen in this book. You're responsible for writing the code that holds the modules together.
By providing these ready and (usually) easy-to-use modules, Bioperl makes developing bioinformatics applications in Perl faster and easier. There are example programs for most of the modules, which can be examined and modified to get started. -> Bioperl doesn't provide the programmer with complete programs. Rather, it provides a fairly large-and growing-set of modules for accomplishing common tasks, including some tasks you've seen in this book. You have to write the programs that use the modules. The goal of Bioperl is to make developing bioinformatics applications easier, by providing easy-to-use standard modules. There are example programs for most of the modules, which can be examined and modified to get started. ######################################## (303) -1 I've frequently mentioned modules and CPAN -> I've frequently mentioned modules, and CPAN ######################################## (305) +2 the thousand of genes and gene products -> the genes and gene products ######################################## (305) +3 Graphics programming language present -> Graphics programming languages present ######################################## {310} +5 ## A reader reports: "low signal-to-noise ration" -> "low signal-to-noise ratio" ## This is confirmed. ######################################## [312] +2 Np-Completeness -> NP-Completeness ######################################### [313] +4 Baxecvanis -> Baxevanis ####################################### [315] +2 The Perl programs in this book start with the line: -> The Perl programs in this book start with the line (with or without the -w): ######################################## <315> +3 If the Perl program file was called myprogram, and had executable permissions, -> If the Perl program file is called myprogram, and has executable permissions, ######################################## [316] +1 and +2 ## The three instances of 1,000 should be changed. The comma is good ## Chicago Manual of Style, but incorrect Perl.
1,000 -> 1000 ######################################## {322} -4 (and if you're not careful, strings as well). -> (and may give you unexpected results if you apply them to strings, as well). ######################################## {323} +1 (or a to b) -> (or 'a' to 'b') ######################################## [325] -1 instead of not or and. -> instead of not, or, and. ## Also, the typesetting has made "not" and "and" in a special font, but ## has not done so for "or". All three should be in a special font. ######################################## {326} -3 and in their variants and loops. -> and their variants, and in loops. ######################################## [328] +5 This shows that a pattern match can return a count of the number of translations made in a string, which is then assigned to the variable $count. -> This shows how a pattern match can return a count of the number of patterns found in a string, which in this case is then assigned to the variable $count. ######################################## [329] -2 any variable name or none, -> ## The word 'none' is in a special font; it should be in normal font, as this ## is not a variable name ######################################## {332} +2 MacOS X -> Mac OS X