What Is Bioinformatics?
Biological data is proliferating rapidly. Public databases such as
GenBank and the Protein Data Bank have been growing exponentially for
some time now. With the advent of the World Wide Web and fast
Internet connections, the data contained in these databases and a great many
special-purpose programs can be accessed quickly, easily, and cheaply
from any location in the world. As a consequence, computer-based
tools now play an increasingly critical role in the advancement of
biological research.
Bioinformatics, a rapidly evolving discipline, is the application of
computational
tools and techniques to the management and analysis of biological
data. The term bioinformatics is relatively new, and as defined here,
it overlaps with terms such as "computational biology,"
among others. The use of computers in biological research predates the
term bioinformatics by many years. For example, the determination of
3D protein structure from X-ray crystallographic data has long relied on
computer analysis. In this
book I refer to the use of computers in biological research as
bioinformatics. It's important to be aware, however, that
others may make different distinctions between the terms. In
particular, bioinformatics is often the term used when referring to
the data and the techniques used in large-scale sequencing and
analysis of entire genomes, such as those of C. elegans, Arabidopsis, and
Homo sapiens.
What Bioinformatics Can Do
Here's a short example of bioinformatics in action. Let's
say you have discovered a very interesting segment of mouse DNA and
you suspect it may hold a clue to the development of fatal brain
tumors in humans. After sequencing the DNA, you perform a search of
GenBank and other data sources using web-based sequence alignment
tools such as BLAST. Although you find a few related sequences, you
don't get a direct match or any information that confirms the
link to brain tumors that you suspect. You know that the public
genetic databases are growing rapidly, day by day. You would like to
perform your searches every day, comparing the results to the
previous searches, to see if anything new appears in the databases.
But this could take an hour or two each day! Luckily, you know Perl.
With a day's work, you write a program (using the Bioperl
module among other things) that automatically conducts a daily BLAST
search of GenBank for your DNA sequence, compares the results with
the previous day's results, and sends you email if there has
been any change. This program is so useful that you start running it
for other sequences as well, and your colleagues also start using it.
Within a few months, your day's worth of work has saved many
weeks of work for your community. This example is taken from real
life. Programs now exist that you can use for this purpose,
and there are even web sites where you can submit your DNA sequence and
your email address, and they'll do all the work for you!
This is only a small example of what happens when you apply the power
of computation to a biological problem. This is bioinformatics.
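The compare-and-report step of the program described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the program from the story: it is written in Python rather than Perl, it leaves out the BLAST search and the email delivery entirely, and it assumes each day's search has already been reduced to a list of hit identifiers. The function names diff_hits and build_report and the identifiers HIT_A, HIT_B, and HIT_C are made up for the example.

```python
def diff_hits(previous, current):
    """Return (added, removed) hit identifiers between two searches."""
    prev_set, curr_set = set(previous), set(current)
    added = sorted(curr_set - prev_set)      # hits that appear only today
    removed = sorted(prev_set - curr_set)    # hits that appeared only yesterday
    return added, removed


def build_report(added, removed):
    """Return a plain-text message body, or None if nothing changed."""
    if not added and not removed:
        return None
    lines = []
    if added:
        lines.append("New hits: " + ", ".join(added))
    if removed:
        lines.append("Hits no longer reported: " + ", ".join(removed))
    return "\n".join(lines)


# Placeholder identifiers for illustration, not real database accessions.
yesterday = ["HIT_A", "HIT_B"]
today = ["HIT_A", "HIT_B", "HIT_C"]

added, removed = diff_hits(yesterday, today)
report = build_report(added, removed)
if report is not None:
    # A real program would send this text by email instead of printing it.
    print(report)
```

Returning None when nothing has changed keeps the "send email only if there has been a change" decision in one place, so the surrounding script needs only a single check before notifying anyone.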