1.1 The Organization of DNA
It's necessary to review some of the very basic concepts and
terminology of DNA and proteins at this point. This review is for the
benefit of the nonbiologist; if you're a biologist, you can skip the
next two sections.
DNA is a polymer composed of four molecules, usually called bases or
nucleotides. Their names and one-letter abbreviations are adenine (A),
cytosine (C), guanine (G), and thymine (T).[1] (See Chapter 4 for more
about how DNA is represented as computer data.) The bases are joined
end to end to form a single strand of DNA.
In the cell, DNA usually appears in a double-stranded form, with two
strands wrapped around each other in the famous double helix shape.
The two strands of the double helix have matching bases, known as
base pairs. An A on
one strand is always opposite a T on the other strand, and a G is
always paired with a C.
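The pairing rule can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the names PAIRS and is_paired are hypothetical, and the sketch assumes the two strands are given as plain strings read position by position, so that character i of one string sits opposite character i of the other.

```python
# Pairing rules from the text: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_paired(strand_a, strand_b):
    """Return True if every base in strand_a sits opposite its partner.

    Both strings are compared position by position, so index i of one
    strand faces index i of the other. Any character other than
    A, C, G, or T makes the check fail.
    """
    if len(strand_a) != len(strand_b):
        return False
    return all(
        PAIRS.get(a) == b
        for a, b in zip(strand_a.upper(), strand_b.upper())
    )


print(is_paired("GATTACA", "CTAATGT"))  # True
print(is_paired("GATTACA", "CTAATGA"))  # False: the last A faces an A
```

A dictionary lookup keeps the rule in one place, and anything outside the four bases is simply treated as a mismatch.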
There is also an orientation to the strands. One end of a nucleotide
is called the 5' (five prime) end, and the other is called the 3'
(three prime) end. When nucleotides join to make a single strand of
DNA, they always connect the 5' end of one to the 3' end of the other.
Furthermore, when the cell uses the DNA, as in transcribing it into
RNA, it does so base by base in the 5' to 3' direction. So, when DNA
is written, it's written left to right on the page, corresponding to
the 5' to 3' orientation of the bases. An encoded gene can appear on
either strand, so it's important to look at both strands when
searching or analyzing DNA.
When two strands are joined in a double helix (as in Figure 1-1), they
have opposite orientations. That is, the 5' to 3' orientation of one
strand runs in the opposite direction to the 5' to 3' orientation of
the other strand. So at each end of the double helix, one strand has a
3' end and the other has a 5' end.
Figure 1-1. Two strands of DNA
Because the base pairs are always matched A-T and C-G and the
orientations of the strands are the reverse of each other, the term
reverse complement describes the relationship of the bases of the two
strands. It's "reverse" because the orientations are reversed, and
"complement" because the bases always pair with their complementary
bases, A to T and C to G.
Given these facts and a single strand of DNA, it's easy to figure out
what the matching strand would be in the double helix. Simply change
all bases to their complements: A to T, T to A, C to G, and G to C.
Then, since DNA is written in the 5' to 3' direction, after
complementing the DNA, write it in reverse.
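The two steps just described, complement and then reverse, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the names COMPLEMENT and reverse_complement are hypothetical, and the input is assumed to be a string written 5' to 3' that uses only the letters A, C, G, and T in either case.

```python
# Translation table mapping each base to its complement
# (uppercase and lowercase handled separately).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the matching strand, written 5' to 3'."""
    # Step 1: change every base to its complement (A<->T, C<->G).
    complemented = dna.translate(COMPLEMENT)
    # Step 2: reverse the result so it also reads 5' to 3'.
    return complemented[::-1]


print(reverse_complement("AAGC"))  # GCTT
```

Applying the function twice returns the original sequence, which is a handy sanity check. Characters not listed in the table pass through unchanged, so a real program would probably validate its input first.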
GenBank, the Genetic Sequence Data Bank (http://www.ncbi.nlm.nih.gov), contains most
known sequence data. We'll take a closer look at GenBank in
Chapter 10.