12.3 BLAST Output Files
The following is part of a
BLAST output file. I created it by
entering a few lines of the sample.dna file from
Chapter 8 into the BLAST program at the NCBI web
site, without changing any of the default parameters. I then saved
the output as text in the file blast.txt, which
is available from this book's web site. I've used it
repeatedly in the parsing routines throughout this chapter. Because
the output is several pages long, I've truncated it here to
show the beginning, the middle, and the end of the file.
BLASTN 2.1.3 [Apr-11-2001]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
RID: 991533563-27495-9092
Query=
(400 letters)
Database: nt
868,831 sequences; 3,298,558,333 total letters
Score E
Sequences producing significant alignments: (bits) Value
dbj|AB031069.1|AB031069 Homo sapiens PCCX1 mRNA for protein cont... 793 0.0
ref|NM_014593.1| Homo sapiens CpG binding protein (CGBP), mRNA 779 0.0
gb|AF149758.1|AF149758 Homo sapiens CpG binding protein (CGBP) m... 779 0.0
ref|XM_008699.3| Homo sapiens CpG binding protein (CGBP), mRNA 765 0.0
emb|AL136862.1|HSM801830 Homo sapiens mRNA; cDNA DKFZp434F174 (f... 450 e-124
emb|AJ132339.1|HSA132339 Homo sapiens CpG island sequence, subcl... 446 e-123
emb|AJ236590.1|HSA236590 Homo sapiens chromosome 18 CpG island D... 406 e-111
dbj|AK010337.1|AK010337 Mus musculus ES cells cDNA, RIKEN full-l... 234 3e-59
dbj|AK017941.1|AK017941 Mus musculus adult male thymus cDNA, RIK... 210 5e-52
gb|AC009750.7|AC009750 Drosophila melanogaster, chromosome 2L, r... 46 0.017
gb|AE003580.2|AE003580 Drosophila melanogaster genomic scaffold ... 46 0.017
ref|NC_001905.1| Leishmania major chromosome 1, complete sequence 40 1.0
gb|AE001274.1|AE001274 Leishmania major chromosome 1, complete s... 40 1.0
gb|AC008299.5|AC008299 Drosophila melanogaster, chromosome 3R, r... 38 4.1
gb|AC018662.3|AC018662 Human Chromosome 7 clone RP11-339C9, comp... 38 4.1
gb|AE003774.2|AE003774 Drosophila melanogaster genomic scaffold ... 38 4.1
gb|AC008039.1|AC008039 Homo sapiens clone SCb-391H5 from 7q31, c... 38 4.1
gb|AC005315.2|AC005315 Arabidopsis thaliana chromosome II sectio... 38 4.1
emb|AL353748.13|AL353748 Human DNA sequence from clone RP11-317B... 38 4.1
ALIGNMENTS
>dbj|AB031069.1|AB031069 Homo sapiens PCCX1 mRNA for protein containing CXXC
domain 1,
complete cds
Length = 2487
Score = 793 bits (400), Expect = 0.0
Identities = 400/400 (100%)
Strand = Plus / Plus
Query: 1 agatggcggcgctgaggggtcttgggggctctaggccggccacctactggtttgcagcgg 60
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 agatggcggcgctgaggggtcttgggggctctaggccggccacctactggtttgcagcgg 60
Query: 61 agacgacgcatggggcctgcgcaataggagtacgctgcctgggaggcgtgactagaagcg 120
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 agacgacgcatggggcctgcgcaataggagtacgctgcctgggaggcgtgactagaagcg 120
Query: 121 gaagtagttgtgggcgcctttgcaaccgcctgggacgccgccgagtggtctgtgcaggtt 180
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 gaagtagttgtgggcgcctttgcaaccgcctgggacgccgccgagtggtctgtgcaggtt 180
Query: 181 cgcgggtcgctggcgggggtcgtgagggagtgcgccgggagcggagatatggagggagat 240
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 cgcgggtcgctggcgggggtcgtgagggagtgcgccgggagcggagatatggagggagat 240
Query: 241 ggttcagacccagagcctccagatgccggggaggacagcaagtccgagaatggggagaat 300
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 241 ggttcagacccagagcctccagatgccggggaggacagcaagtccgagaatggggagaat 300
Query: 301 gcgcccatctactgcatctgccgcaaaccggacatcaactgcttcatgatcgggtgtgac 360
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 301 gcgcccatctactgcatctgccgcaaaccggacatcaactgcttcatgatcgggtgtgac 360
Query: 361 aactgcaatgagtggttccatggggactgcatccggatca 400
||||||||||||||||||||||||||||||||||||||||
Sbjct: 361 aactgcaatgagtggttccatggggactgcatccggatca 400
>ref|NM_014593.1| Homo sapiens CpG binding protein (CGBP), mRNA
... (file truncated here)
>dbj|AK010337.1|AK010337 Mus musculus ES cells cDNA, RIKEN full-length
enriched library,
clone:2410002I16, full insert sequence
Length = 2538
Score = 234 bits (118), Expect = 3e-59
Identities = 166/182 (91%)
Strand = Plus / Plus
Query: 219 gagcggagatatggagggagatggttcagacccagagcctccagatgccggggaggacag 278
||||||||||||||| |||||||| ||||||| || ||||| ||||||||||| |||||
Sbjct: 260 gagcggagatatggaaggagatggctcagacctggaacctccggatgccggggacgacag 319
Query: 279 caagtccgagaatggggagaatgcgcccatctactgcatctgccgcaaaccggacatcaa 338
|||||| |||||||||||||| || ||||||||||||||||| |||||||||||||||||
Sbjct: 320 caagtctgagaatggggagaacgctcccatctactgcatctgtcgcaaaccggacatcaa 379
Query: 339 ctgcttcatgatcgggtgtgacaactgcaatgagtggttccatggggactgcatccggat 398
||||||||||| || |||||||||||||| |||||||||||||| ||||||||||||||
Sbjct: 380 ttgcttcatgattggatgtgacaactgcaacgagtggttccatggagactgcatccggat 439
Query: 399 ca 400
||
Sbjct: 440 ca 441
Score = 44.1 bits (22), Expect = 0.066
Identities = 25/26 (96%)
Strand = Plus / Plus
Query: 118 gcggaagtagttgtgggcgcctttgc 143
||||||||||||| ||||||||||||
Sbjct: 147 gcggaagtagttgcgggcgcctttgc 172
>dbj|AK017941.1|AK017941 Mus musculus adult male thymus cDNA, RIKEN
full-length enriched library, clone:5830420C16, full insert sequence
Length = 1461
Score = 210 bits (106), Expect = 5e-52
Identities = 151/166 (90%)
Strand = Plus / Plus
Query: 235 ggagatggttcagacccagagcctccagatgccggggaggacagcaagtccgagaatggg 294
|||||||| ||||||| || ||||| ||||||||||| ||||||||||| |||||||||
Sbjct: 1048 ggagatggctcagacctggaacctccggatgccggggacgacagcaagtctgagaatggg 1107
Query: 295 gagaatgcgcccatctactgcatctgccgcaaaccggacatcaactgcttcatgatcggg 354
||||| || ||||||||||||||||| ||||||||||||||||| ||||||||||| ||
Sbjct: 1108 gagaacgctcccatctactgcatctgtcgcaaaccggacatcaattgcttcatgattgga 1167
Query: 355 tgtgacaactgcaatgagtggttccatggggactgcatccggatca 400
|||||||||||||| |||||||||||||| ||||||||||||||||
Sbjct: 1168 tgtgacaactgcaacgagtggttccatggagactgcatccggatca 1213
Score = 44.1 bits (22), Expect = 0.066
Identities = 25/26 (96%)
Strand = Plus / Plus
Query: 118 gcggaagtagttgtgggcgcctttgc 143
||||||||||||| ||||||||||||
Sbjct: 235 gcggaagtagttgcgggcgcctttgc 260
>gb|AC009750.7|AC009750 Drosophila melanogaster, chromosome 2L, region 23F-24A,
BAC clone
... (file truncated here)
>emb|AL353748.13|AL353748 Human DNA sequence from clone RP11-317B17 on
chromosome 9, complete
sequence [Homo sapiens]
Length = 179155
Score = 38.2 bits (19), Expect = 4.1
Identities = 22/23 (95%)
Strand = Plus / Plus
Query: 192 ggcgggggtcgtgagggagtgcg 214
|||| ||||||||||||||||||
Sbjct: 48258 ggcgtgggtcgtgagggagtgcg 48280
Database: nt
Posted date: May 30, 2001 3:54 AM
Number of letters in database: -996,408,959
Number of sequences in database: 868,831
Lambda K H
1.37 0.711 1.31
Gapped
Lambda K H
1.37 0.711 1.31
Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 436021
Number of Sequences: 868831
Number of extensions: 436021
Number of successful extensions: 7536
Number of sequences better than 10.0: 19
length of query: 400
length of database: 3,298,558,333
effective HSP length: 20
effective length of query: 380
effective length of database: 3,281,181,713
effective search space: 1246849050940
effective search space used: 1246849050940
T: 0
A: 30
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 19 (38.2 bits)
As you can see, the file consists of three parts: header information
at the beginning, which ends with a summary of the alignments; the
alignments themselves; and some additional summary parameters and
statistics at the end.
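
The three-part layout can be sketched in code. The following is a minimal illustration in Python, not the parsing routines developed in this chapter. The function name split_blast_output and the short sample string are hypothetical, and the sketch assumes that the alignment section begins at a line reading ALIGNMENTS and that the closing statistics begin at the first later line starting with "Database:".

```python
import re


def split_blast_output(text):
    """Split plain-text BLAST output into (header, alignments, footer).

    Assumes the alignments begin after a line reading "ALIGNMENTS" and
    the closing statistics begin at the first later "Database:" line.
    """
    marker = re.search(r"^ALIGNMENTS[ \t]*\n", text, flags=re.MULTILINE)
    if marker is None:
        raise ValueError("no ALIGNMENTS line found")

    # Everything before the marker: program banner, query, hit summary.
    header = text[:marker.start()]
    rest = text[marker.end():]

    # Search only after the marker, because the header also has a
    # "Database:" line of its own.
    footer_start = re.search(r"^[ \t]*Database:", rest, flags=re.MULTILINE)
    if footer_start is None:
        raise ValueError("no closing Database: line found")

    alignments = rest[:footer_start.start()]
    footer = rest[footer_start.start():]
    return header, alignments, footer


# A heavily shortened, hypothetical stand-in for the real output file.
sample = """BLASTN 2.1.3 [Apr-11-2001]
Query=
(400 letters)
Database: nt
Sequences producing significant alignments: (bits) Value
dbj|AB031069.1|AB031069 Homo sapiens PCCX1 mRNA for protein cont... 793 0.0
ALIGNMENTS
>dbj|AB031069.1|AB031069 Homo sapiens PCCX1 mRNA for protein containing CXXC
Length = 2487
Score = 793 bits (400), Expect = 0.0
Database: nt
Posted date: May 30, 2001 3:54 AM
Number of sequences in database: 868,831
"""

header, alignments, footer = split_blast_output(sample)
print(alignments)
```

Searching for the closing "Database:" line only after the ALIGNMENTS marker is a deliberate choice here, since the same label also appears near the top of the file.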