4.9 Arrays

In programming languages, an array is a variable that stores multiple scalar values. The values can be numbers, strings, or, in this case, lines of an input file of protein sequence data. Let's examine how arrays can be used. Example 4-7 shows how to use an array to read all the lines of an input file.

Example 4-7. Reading protein sequence data from a file, take 3
#!/usr/bin/perl -w
# Reading protein sequence data from a file, take 3

# The filename of the file containing the protein sequence data
$proteinfilename = 'NM_021964fragment.pep';

# First we have to "open" the file
open(PROTEINFILE, $proteinfilename);

# Read the protein sequence data from the file, and store it
# into the array variable @protein
@protein = <PROTEINFILE>;

# Print the protein onto the screen
print @protein;

# Close the file.
close PROTEINFILE;

exit;

Here's the output of Example 4-7:

MNIDDKLEGLFLKCGGIDEMQSSRTMVVMGGVSGQSTVSGELQD
SVLQDRSMPHQEILAADEVLQESEMRQQDMISHDELMVHEETVKNDEEQMETHERLPQ
GLQYALNVPISVKQEITFTDVSEQLMRDKKQIR

which, as you can see, is exactly the data that's in the file. Success!

The convenience of this is clear: a single line reads all the data into the program.
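Here's a minimal sketch of the same one-line read in a more defensive style. It is not the book's example: the lexical filehandle $fh, the three-argument open, the "or die" check, and the chomp call are additions. To keep the sketch runnable without a file on disk, it reads from an in-memory string $data (an assumption that stands in for the contents of NM_021964fragment.pep).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the contents of the sequence file (an assumption for this
# sketch, so that it runs without any file on disk).
my $data = "MNIDDKLEGLFLKCGGIDEMQSSRTMVVMGGVSGQSTVSGELQD\n"
         . "SVLQDRSMPHQEILAADEVLQESEMRQQDMISHDELMVHEETVKNDEEQMETHERLPQ\n"
         . "GLQYALNVPISVKQEITFTDVSEQLMRDKKQIR\n";

# Opening a reference to a string gives an in-memory filehandle.
# With a real file you would pass the filename instead of \$data.
open(my $fh, '<', \$data) or die "Cannot open the data: $!";

# In list context, the <> operator returns every remaining line,
# one line per array element.
my @protein = <$fh>;
close $fh;

# Each element still ends with its newline; chomp removes them all.
chomp @protein;

print "Number of lines read: ", scalar(@protein), "\n";
print "First line: $protein[0]\n";
```

Checking the result of open matters because an unchecked failure leaves the array empty and the program prints nothing, with no indication of why.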

Notice that the array variable starts with an at sign (@) rather than the dollar sign ($) that scalar variables begin with. Also notice that the print function can handle arrays as well as scalar variables. Arrays are used a lot in Perl, so you will see plenty of array examples as the book continues.

An array is a variable that can hold many scalar values. Each item or element is a scalar value that can be referenced by giving its position in the array (its subscript or offset). Subscripts start at 0, so the first element of @bases is $bases[0]; note the dollar sign, which is used because a single element is a scalar. Let's look at some examples of arrays and their most common operations. We'll define an array @bases that holds the four bases A, C, G, and T. Then we'll apply some of the most common array operators.

Here's a piece of code that demonstrates how to initialize an array and how to use subscripts to access the individual elements of an array:

# Here's one way to declare an array, initialized with a list of four scalar values.
@bases = ('A', 'C', 'G', 'T');

# Now we'll print each element of the array
print "Here are the array elements:";
print "\nFirst element: ";
print $bases[0];
print "\nSecond element: ";
print $bases[1];
print "\nThird element: ";
print $bases[2];
print "\nFourth element: ";
print $bases[3];

This code snippet prints out:

Here are the array elements:
First element: A
Second element: C
Third element: G
Fourth element: T
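Writing one print statement per element gets tedious for longer arrays. As a small sketch, the same elements can be visited with a loop over the subscripts. The loop variable $i and the scalar $last are names chosen for this illustration only.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# $#bases is the subscript of the last element (3 for a four-element array),
# so 0 .. $#bases covers every valid subscript.
for my $i (0 .. $#bases) {
    print "Element at subscript $i: $bases[$i]\n";
}

# Negative subscripts count back from the end of the array.
my $last = $bases[-1];
print "Last element: $last\n";
```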

You can print the elements one after another like this:

@bases = ('A', 'C', 'G', 'T');
print "\n\nHere are the array elements: ";
print @bases;

which produces the output:

Here are the array elements: ACGT

You can also print the elements separated by spaces (notice the double quotes in the print statement):

@bases = ('A', 'C', 'G', 'T');
print "\n\nHere are the array elements: ";
print "@bases";

which produces the output:

Here are the array elements: A C G T
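If you want a separator other than a space, here is one way to do it, sketched with the built-in join function and with Perl's list separator variable $", which controls what goes between elements when an array is interpolated in double quotes. The variable names $with_commas and $with_dashes are made up for this illustration.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# join builds a single string using the separator of your choice.
my $with_commas = join(', ', @bases);
print "Joined with commas: $with_commas\n";

# Interpolating an array in double quotes places the value of $"
# (a single space by default) between the elements.
my $with_dashes;
{
    local $" = '-';          # changed only inside this block
    $with_dashes = "@bases";
}
print "Interpolated with dashes: $with_dashes\n";
```

Using join is usually the clearer choice; changing $" is shown mainly to explain where the spaces in "@bases" come from.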

You can take an element off the end of an array with pop:

@bases = ('A', 'C', 'G', 'T');
$base1 = pop @bases;
print "Here's the element removed from the end: ";
print $base1, "\n\n";
print "Here's the remaining array of bases: ";
print "@bases";

which produces the output:

Here's the element removed from the end: T

Here's the remaining array of bases: A C G

You can take an element off the beginning of the array with shift:

@bases = ('A', 'C', 'G', 'T');
$base2 = shift @bases;
print "Here's an element removed from the beginning: ";
print $base2, "\n\n";
print "Here's the remaining array of bases: ";
print "@bases";

which produces the output:

Here's an element removed from the beginning: A

Here's the remaining array of bases: C G T
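Because shift shortens the array each time it is called, it can drive a loop that processes the elements one by one until none remain. The following is a sketch of that pattern; the counter $count is a name introduced here for illustration.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');
my $count = 0;

# An array used as a loop condition is true as long as it has elements.
while (@bases) {
    my $base = shift @bases;    # removes and returns the first element
    $count++;
    print "Processing base $count: $base\n";
}

print "Elements left in the array: ", scalar(@bases), "\n";
```

Keep in mind that this loop empties @bases; copy the array first if you need it afterwards.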

You can put an element at the beginning of the array with unshift:

@bases = ('A', 'C', 'G', 'T');
$base1 = pop @bases;
unshift (@bases, $base1);
print "Here's the element from the end put on the beginning: ";
print "@bases\n\n";

which produces the output:

Here's the element from the end put on the beginning: T A C G

You can put an element on the end of the array with push:

@bases = ('A', 'C', 'G', 'T');
$base2 = shift @bases;
push (@bases, $base2);
print "Here's the element from the beginning put on the end: ";
print "@bases\n\n";

which produces the output:

Here's the element from the beginning put on the end: C G T A
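The four functions can be compared side by side. In this sketch, push and unshift are given more than one element at a time, which both functions accept, while pop and shift always remove a single element. The placeholder values 'N', 'X', and 'Y' and the variable names are chosen only for this illustration.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# push and unshift accept a list, so several elements can be added at once.
push @bases, 'N', 'N';              # A C G T N N
unshift @bases, 'X', 'Y';           # X Y A C G T N N
my $after_adding = "@bases";
print "After adding:   $after_adding\n";

# pop and shift each remove exactly one element and return it.
my $from_end   = pop @bases;        # N
my $from_front = shift @bases;      # X
my $after_removing = "@bases";
print "Removed from the end: $from_end\n";
print "Removed from the beginning: $from_front\n";
print "After removing: $after_removing\n";
```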

You can reverse the array:

@bases = ('A', 'C', 'G', 'T');
@reverse = reverse @bases;
print "Here's the array in reverse: ";
print "@reverse\n\n";

which produces the output:

Here's the array in reverse: T G C A
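Two details about reverse are worth a quick sketch: it returns a new list and leaves the original array unchanged, and in scalar context it reverses the characters of a string instead. The sequence in $dna is a made-up example, not data from the book.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# In list context, reverse returns the elements in the opposite order.
my @reversed = reverse @bases;
print "Reversed copy: @reversed\n";
print "Original:      @bases\n";      # still A C G T

# In scalar context, reverse works on the characters of a string.
my $dna = 'ACGGT';                     # hypothetical sequence
my $reversed_dna = reverse $dna;       # assigning to a scalar gives scalar context
print "Reversed string: $reversed_dna\n";

# print supplies list context, so scalar is needed to force string reversal.
print "Forced with scalar: ", scalar reverse($dna), "\n";
```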

You can get the length of an array:

@bases = ('A', 'C', 'G', 'T');
print "Here's the length of the array: ";
print scalar @bases, "\n";

which produces the output:

Here's the length of the array: 4
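The scalar function works here because an array evaluated in scalar context gives its number of elements. As a brief sketch, the same rule applies when an array is assigned to a scalar variable. The names $count, $last_index, and $first are illustrative only.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# Assigning an array to a scalar gives the number of elements.
my $count = @bases;
print "Number of elements: $count\n";

# $#bases is the last subscript, always one less than the length.
my $last_index = $#bases;
print "Last subscript: $last_index\n";

# Parentheses on the left make this a list assignment, which copies
# the first element instead of the length.
my ($first) = @bases;
print "First element: $first\n";
```

The difference between "my $count = @bases" and "my ($first) = @bases" is a common source of confusion, so it helps to read the parentheses as a signal for list context.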

Here's how to insert an element at an arbitrary place in an array using the Perl splice function:

@bases = ('A', 'C', 'G', 'T');
splice ( @bases, 2, 0, 'X');
print "Here's the array with an element inserted after the 2nd element: ";
print "@bases\n";

which produces the output:

Here's the array with an element inserted after the 2nd element: A C X G T
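The arguments to splice are the array, the starting subscript, the number of elements to remove, and an optional list to insert in their place. With a removal count of 0, as above, nothing is removed. The sketch below shows the other two uses, removing and replacing; the 'N' values and the name @removed are chosen for illustration.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'X', 'G', 'T');

# Remove one element starting at subscript 2.
# splice returns the elements it removed.
my @removed = splice(@bases, 2, 1);
print "Removed: @removed\n";            # X
print "After removal: @bases\n";        # A C G T

# Replace two elements starting at subscript 1 with three new ones.
# The replacement list does not have to be the same length.
splice(@bases, 1, 2, 'N', 'N', 'N');
print "After replacement: @bases\n";    # A N N N T
```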

© 2002, O'Reilly & Associates, Inc.