4.3 Concatenating DNA Fragments
Now we'll make a simple modification of Example 4-1 to show how to
concatenate two DNA fragments. Concatenation means attaching
one thing to the end of another. A biologist is well aware
that joining DNA sequences is a common task in the biology lab, for
instance when a clone is inserted into a cell vector or when exons
are spliced together during the expression of a gene. Many bioinformatics
software packages have to deal with such operations, which is why
concatenation was chosen as an example.
Example 4-2 demonstrates a few more things to do
with strings, variables, and print statements.
Example 4-2. Concatenating DNA
#!/usr/bin/perl -w
# Concatenating DNA
# Store two DNA fragments into two variables called $DNA1 and $DNA2
$DNA1 = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
$DNA2 = 'ATAGTGCCGTGAGAGTGATGTAGTA';
# Print the DNA onto the screen
print "Here are the original two DNA fragments:\n\n";
print $DNA1, "\n";
print $DNA2, "\n\n";
# Concatenate the DNA fragments into a third variable and print them
# Using "string interpolation"
$DNA3 = "$DNA1$DNA2";
print "Here is the concatenation of the first two fragments (version 1):\n\n";
print "$DNA3\n\n";
# An alternative way using the "dot operator":
# Concatenate the DNA fragments into a third variable and print them
$DNA3 = $DNA1 . $DNA2;
print "Here is the concatenation of the first two fragments (version 2):\n\n";
print "$DNA3\n\n";
# Print the same thing without using the variable $DNA3
print "Here is the concatenation of the first two fragments (version 3):\n\n";
print $DNA1, $DNA2, "\n";
exit;
As you can see, there are three variables here,
$DNA1, $DNA2, and
$DNA3. I've added print
statements for a running commentary, so that the output of the
program that appears on the computer screen makes more sense and
isn't simply some DNA fragments one after the other.
Here's what the output of Example 4-2 looks
like:
Here are the original two DNA fragments:

ACGGGAGGACGGGAAAATTACTACGGCATTAGC
ATAGTGCCGTGAGAGTGATGTAGTA

Here is the concatenation of the first two fragments (version 1):

ACGGGAGGACGGGAAAATTACTACGGCATTAGCATAGTGCCGTGAGAGTGATGTAGTA

Here is the concatenation of the first two fragments (version 2):

ACGGGAGGACGGGAAAATTACTACGGCATTAGCATAGTGCCGTGAGAGTGATGTAGTA

Here is the concatenation of the first two fragments (version 3):

ACGGGAGGACGGGAAAATTACTACGGCATTAGCATAGTGCCGTGAGAGTGATGTAGTA
Example 4-2 has many similarities to Example 4-1. Let's look at the differences. To start
with, the print
statements have some extra, unintuitive parts:
print $DNA1, "\n";
print $DNA2, "\n\n";
The print statements have variables containing the
DNA, as before, but now they also have a comma and
then
"\n" or "\n\n". These are
instructions to print newlines. A newline is
invisible on the page or screen, but it tells the computer to go on
to the beginning of the next line for subsequent printing. One
newline, "\n", simply positions you at the
beginning of the next line. Two newlines, "\n\n",
move to the next line and then position you at the beginning of the
line after that, leaving a blank line in between.
Look at the code for Example 4-2 to make sure
you see what these newline directives do to the output. A blank line
is a line with nothing printed on it. Depending on your operating
system, it may be just a newline (linefeed) character or a carriage
return followed by a linefeed (in which cases, it may also be called
an empty line), or it may include nonprinting whitespace characters such as
spaces and tabs. Notice that the newlines are enclosed in double
quotes, which means they are parts of strings. (Here's one
difference between single and double quotes, as mentioned earlier:
"\n" prints a newline; '\n'
prints \n as written.)
Notice the comma in the print statement. A comma
separates items in a list. The print statement
prints all the items that are listed. Simple as that.
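The behavior of newlines, quotes, and comma-separated lists can be checked with a short sketch. The fragment 'ACGT' and the variable names $double and $single are made up for illustration; they are not part of Example 4-2.

```perl
#!/usr/bin/perl -w
# A made-up fragment, used only for illustration
$DNA = 'ACGT';

# Double quotes: \n becomes a single newline character
$double = "\n";

# Single quotes: \n stays as two characters, a backslash and the letter n
$single = '\n';

# print takes a comma-separated list and prints each item in order
print $DNA, "\n";            # the DNA, then a newline
print $DNA, '\n', "\n";      # the DNA, then a literal \n, then a newline

# length counts the characters in a string
print "Characters in the double-quoted version: ", length($double), "\n";
print "Characters in the single-quoted version: ", length($single), "\n";
```

The last two lines report 1 and 2 characters respectively, which is one way to see that the single-quoted version was not turned into a newline.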
Now let's look at the
statement that concatenates the two
DNA fragments $DNA1 and $DNA2
into the variable $DNA3:
$DNA3 = "$DNA1$DNA2";
The assignment to $DNA3 is just a typical
assignment as you saw in Example 4-1, a variable
name followed by the = sign, followed by a value
to be assigned.
The value on the right side of the = sign is a string
enclosed in double quotes. The double quotes allow the variables in
the string to be replaced with their values. This is called
string interpolation.[2]
So, in effect, the string
here is just the DNA of variable $DNA1, followed
directly by the DNA of variable $DNA2. That
concatenation of the two DNA fragments is then assigned to variable
$DNA3.
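As a minimal sketch of string interpolation, the following uses two short, made-up fragments so the result is easy to check by eye. The variable names $sentence and $literal are assumptions introduced here for illustration.

```perl
#!/usr/bin/perl -w
# Short, made-up fragments
$DNA1 = 'ACGT';
$DNA2 = 'TTAA';

# Double quotes: each variable is replaced by its value
$DNA3 = "$DNA1$DNA2";

# Variables can be interpolated in the middle of other text, too
$sentence = "Joining $DNA1 and $DNA2 gives $DNA3";

# Single quotes: no interpolation, the names stay as written
$literal = '$DNA1$DNA2';

print $sentence, "\n";
print $literal, "\n";
```

The first print shows the values of the variables; the second shows the variable names themselves, dollar signs included.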
After assigning the concatenated DNA to variable
$DNA3, you print it out, followed by a blank line:
print "$DNA3\n\n";
One of the Perl catch phrases is, "There's more than one
way to do it." So, the next part of the program shows another
way to concatenate two strings, using the
dot operator. The dot operator, when placed
between two strings, creates a single string that concatenates the
two original strings. So the line:
$DNA3 = $DNA1 . $DNA2;
illustrates the use of this operator.
An operator in a
computer language takes some arguments—in this case, the
strings $DNA1 and
$DNA2—and does something to them, returning
a value—in this case, the concatenated string placed in the
variable $DNA3. The most familiar operators from
arithmetic—plus, minus, multiply, and divide—are all
operators that take two numbers as arguments and return a number as a
value.
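To make the idea of an operator concrete, here is a small sketch that puts the dot operator next to the arithmetic + operator. The fragments and the variables $labeled and $total are hypothetical, chosen only for this illustration.

```perl
#!/usr/bin/perl -w
# Short, made-up fragments
$DNA1 = 'ACGT';
$DNA2 = 'TTAA';

# The dot operator takes two strings and returns one new string
$DNA3 = $DNA1 . $DNA2;

# Dots can be chained, and literal strings can be mixed in
$labeled = 'Joined: ' . $DNA1 . $DNA2 . "\n";

# For comparison, + takes two numbers and returns a number
$total = length($DNA1) + length($DNA2);

print $labeled;
print "Length of $DNA3: ", $total, "\n";

# The arguments themselves are left unchanged
print "Originals: $DNA1 $DNA2\n";
```

Note that neither operator alters its arguments: $DNA1 and $DNA2 hold the same values after the concatenation as before.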
Finally, just to exercise the different parts of the language,
let's accomplish the same concatenation using only the
print
statement:
print $DNA1, $DNA2, "\n";
Here the print statement has three parts,
separated by commas: the two DNA fragments in the two variables and a
newline. You can achieve the same result with the following
print statement:
print "$DNA1$DNA2\n";
Maybe the Perl slogan should be, "There are more than two ways
to do it."
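One way to convince yourself that these approaches are equivalent is to build the string each way and compare the results. The sketch below does that with made-up fragments; it also uses Perl's built-in join function and the eq string comparison, neither of which appears in Example 4-2, so treat them as a look ahead.

```perl
#!/usr/bin/perl -w
# Short, made-up fragments
$DNA1 = 'ACGT';
$DNA2 = 'TTAA';

$by_interpolation = "$DNA1$DNA2";             # string interpolation
$by_dot           = $DNA1 . $DNA2;            # dot operator
$by_join          = join('', $DNA1, $DNA2);   # join with an empty separator

# eq tests whether two strings are identical
if ($by_interpolation eq $by_dot and $by_dot eq $by_join) {
    print "All three approaches give: $by_join\n";
} else {
    print "The approaches disagree\n";
}
```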
Before leaving this section, let's look ahead to other uses of
Perl variables. You've seen the use of variables to hold
strings of DNA sequence data. There are other types of data, and
programming languages need variables for them, too. In Perl, a
scalar variable such as
$DNA can hold a string, an integer, a
floating-point number (with a decimal point), a boolean
(true or false) value, and
more. When it's required, Perl works out from the way a value is
used whether to treat it as a string or as a number. For now, try adding the following lines to Example 4-1 or Example 4-2, storing a
number in a scalar variable and
printing it out:
$number = 17;
print $number,"\n";
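Going one step further, the following sketch shows scalar variables holding different kinds of data and Perl treating a value as a number or a string depending on the operator applied to it. All variable names and values here are made up for illustration.

```perl
#!/usr/bin/perl -w
$count    = 17;      # an integer
$fraction = 0.25;    # a floating-point number
$text     = '42';    # a string that happens to look like a number

# Used with +, the string '42' is treated as the number 42
$sum = $text + $count;

# Used with the dot operator, the number 17 is treated as the string '17'
$label = $count . ' bases';

print $sum, "\n";                 # prints 59
print $label, "\n";               # prints 17 bases
print $count * $fraction, "\n";   # prints 4.25
```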