| =head1 NAME |
| |
| perlreftut - Mark's very short tutorial about references |
| |
| =head1 DESCRIPTION |
| |
| One of the most important new features in Perl 5 was the capability to |
| manage complicated data structures like multidimensional arrays and |
| nested hashes. To enable these, Perl 5 introduced a feature called |
| 'references', and using references is the key to managing complicated, |
| structured data in Perl. Unfortunately, there's a lot of funny syntax |
| to learn, and the main manual page can be hard to follow. The manual |
| is quite complete, and sometimes people find that a problem, because |
| it can be hard to tell what is important and what isn't. |
| |
| Fortunately, you only need to know 10% of what's in the main page to get |
| 90% of the benefit. This page will show you that 10%. |
| |
| =head1 Who Needs Complicated Data Structures? |
| |
| One problem that came up all the time in Perl 4 was how to represent a |
| hash whose values were lists. Perl 4 had hashes, of course, but the |
| values had to be scalars; they couldn't be lists. |
| |
| Why would you want a hash of lists? Let's take a simple example: You |
| have a file of city and country names, like this: |
| |
| Chicago, USA |
| Frankfurt, Germany |
| Berlin, Germany |
| Washington, USA |
| Helsinki, Finland |
| New York, USA |
| |
| and you want to produce an output like this, with each country mentioned |
| once, and then an alphabetical list of the cities in that country: |
| |
| Finland: Helsinki. |
| Germany: Berlin, Frankfurt. |
| USA: Chicago, New York, Washington. |
| |
| The natural way to do this is to have a hash whose keys are country |
| names. Associated with each country name key is a list of the cities in |
| that country. Each time you read a line of input, split it into a country |
| and a city, look up the list of cities already known to be in that |
| country, and append the new city to the list. When you're done reading |
| the input, iterate over the hash as usual, sorting each list of cities |
| before you print it out. |
| |
| If hash values can't be lists, you lose. In Perl 4, hash values can't |
| be lists; they can only be strings. You lose. You'd probably have to |
| combine all the cities into a single string somehow, and then when |
| time came to write the output, you'd have to break the string into a |
| list, sort the list, and turn it back into a string. This is messy |
| and error-prone. And it's frustrating, because Perl already has |
| perfectly good lists that would solve the problem if only you could |
| use them. |
| |
| =head1 The Solution |
| |
| By the time Perl 5 rolled around, we were already stuck with this |
| design: Hash values must be scalars. The solution to this is |
| references. |
| |
| A reference is a scalar value that I<refers to> an entire array or an |
| entire hash (or to just about anything else). Names are one kind of |
| reference that you're already familiar with. Think of the President |
| of the United States: a messy, inconvenient bag of blood and bones. |
| But to talk about him, or to represent him in a computer program, all |
| you need is the easy, convenient scalar string "Barack Obama". |
| |
| References in Perl are like names for arrays and hashes. They're |
| Perl's private, internal names, so you can be sure they're |
| unambiguous. Unlike "Barack Obama", a reference only refers to one |
| thing, and you always know what it refers to. If you have a reference |
| to an array, you can recover the entire array from it. If you have a |
| reference to a hash, you can recover the entire hash. But the |
| reference is still an easy, compact scalar value. |
| |
| You can't have a hash whose values are arrays; hash values can only be |
| scalars. We're stuck with that. But a single reference can refer to |
| an entire array, and references are scalars, so you can have a hash of |
| references to arrays, and it'll act a lot like a hash of arrays, and |
| it'll be just as useful as a hash of arrays. |
| |
| We'll come back to this city-country problem later, after we've seen |
| some syntax for managing references. |
| |
| |
| =head1 Syntax |
| |
| There are just two ways to make a reference, and just two ways to use |
| it once you have it. |
| |
| =head2 Making References |
| |
| =head3 B<Make Rule 1> |
| |
| If you put a C<\> in front of a variable, you get a |
| reference to that variable. |
| |
| $aref = \@array; # $aref now holds a reference to @array |
| $href = \%hash; # $href now holds a reference to %hash |
| $sref = \$scalar; # $sref now holds a reference to $scalar |
| |
| Once the reference is stored in a variable like $aref or $href, you |
| can copy it or store it just the same as any other scalar value: |
| |
| $xy = $aref; # $xy now holds a reference to @array |
| $p[3] = $href; # $p[3] now holds a reference to %hash |
| $z = $p[3]; # $z now holds a reference to %hash |
| |
| |
| These examples show how to make references to variables with names. |
| Sometimes you want to make an array or a hash that doesn't have a |
| name. This is analogous to the way you like to be able to use the |
| string C<"\n"> or the number 80 without having to store it in a named |
| variable first. |
| |
| B<Make Rule 2> |
| |
| C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to |
| that array. C<{ ITEMS }> makes a new, anonymous hash, and returns a |
| reference to that hash. |
| |
| $aref = [ 1, "foo", undef, 13 ]; |
| # $aref now holds a reference to an array |
| |
| $href = { APR => 4, AUG => 8 }; |
| # $href now holds a reference to a hash |
| |
| |
| The references you get from rule 2 are the same kind of |
| references that you get from rule 1: |
| |
| # This: |
| $aref = [ 1, 2, 3 ]; |
| |
| # Does the same as this: |
| @array = (1, 2, 3); |
| $aref = \@array; |
| |
| |
| The first line is an abbreviation for the following two lines, except |
| that it doesn't create the superfluous array variable C<@array>. |
| |
| If you write just C<[]>, you get a new, empty anonymous array. |
| If you write just C<{}>, you get a new, empty anonymous hash. |
| |
| |
| =head2 Using References |
| |
| What can you do with a reference once you have it? It's a scalar |
| value, and we've seen that you can store it as a scalar and get it back |
| again just like any scalar. There are just two more ways to use it: |
| |
| =head3 B<Use Rule 1> |
| |
| You can always use an array reference, in curly braces, in place of |
| the name of an array. For example, C<@{$aref}> instead of C<@array>. |
| |
| Here are some examples of that: |
| |
| Arrays: |
| |
| |
| @a @{$aref} An array |
| reverse @a reverse @{$aref} Reverse the array |
| $a[3] ${$aref}[3] An element of the array |
| $a[3] = 17; ${$aref}[3] = 17 Assigning an element |
| |
| |
| On each line are two expressions that do the same thing. The |
| left-hand versions operate on the array C<@a>. The right-hand |
| versions operate on the array that is referred to by C<$aref>. Once |
| they find the array they're operating on, both versions do the same |
| things to the arrays. |
| |
| Using a hash reference is I<exactly> the same: |
| |
| %h %{$href} A hash |
| keys %h keys %{$href} Get the keys from the hash |
| $h{'red'} ${$href}{'red'} An element of the hash |
| $h{'red'} = 17 ${$href}{'red'} = 17 Assigning an element |
| |
| Whatever you want to do with a reference, B<Use Rule 1> tells you how |
| to do it. You just write the Perl code that you would have written |
| for doing the same thing to a regular array or hash, and then replace |
| the array or hash name with C<{$reference}>. "How do I loop over an |
| array when all I have is a reference?" Well, to loop over an array, you |
| would write |
| |
| for my $element (@array) { |
| ... |
| } |
| |
| so replace the array name, C<@array>, with the reference: |
| |
| for my $element (@{$aref}) { |
| ... |
| } |
| |
| "How do I print out the contents of a hash when all I have is a |
| reference?" First write the code for printing out a hash: |
| |
| for my $key (keys %hash) { |
| print "$key => $hash{$key}\n"; |
| } |
| |
| And then replace the hash name with the reference: |
| |
| for my $key (keys %{$href}) { |
| print "$key => ${$href}{$key}\n"; |
| } |
| |
| =head3 B<Use Rule 2> |
| |
| B<Use Rule 1> is all you really need, because it tells you how to do |
| absolutely everything you ever need to do with references. But the |
| most common thing to do with an array or a hash is to extract a single |
| element, and the B<Use Rule 1> notation is cumbersome. So there is an |
| abbreviation. |
| |
| C<${$aref}[3]> is too hard to read, so you can write C<< $aref->[3] >> |
| instead. |
| |
| C<${$href}{red}> is too hard to read, so you can write |
| C<< $href->{red} >> instead. |
| |
| If C<$aref> holds a reference to an array, then C<< $aref->[3] >> is |
| the fourth element of the array. Don't confuse this with C<$aref[3]>, |
| which is the fourth element of a totally different array, one |
| deceptively named C<@aref>. C<$aref> and C<@aref> are unrelated the |
| same way that C<$item> and C<@item> are. |
| |
| Similarly, C<< $href->{'red'} >> is part of the hash referred to by |
| the scalar variable C<$href>, perhaps even one with no name. |
| C<$href{'red'}> is part of the deceptively named C<%href> hash. It's |
| easy to forget to leave out the C<< -> >>, and if you do, you'll get |
| bizarre results when your program gets array and hash elements out of |
| totally unexpected hashes and arrays that weren't the ones you wanted |
| to use. |
| |
| |
| =head2 An Example |
| |
| Let's see a quick example of how all this is useful. |
| |
| First, remember that C<[1, 2, 3]> makes an anonymous array containing |
| C<(1, 2, 3)>, and gives you a reference to that array. |
| |
| Now think about |
| |
| @a = ( [1, 2, 3], |
| [4, 5, 6], |
| [7, 8, 9] |
| ); |
| |
| @a is an array with three elements, and each one is a reference to |
| another array. |
| |
| C<$a[1]> is one of these references. It refers to an array, the array |
| containing C<(4, 5, 6)>, and because it is a reference to an array, |
| B<Use Rule 2> says that we can write C<< $a[1]->[2] >> to get the |
| third element from that array. C<< $a[1]->[2] >> is the 6. |
| Similarly, C<< $a[0]->[1] >> is the 2. What we have here is like a |
| two-dimensional array; you can write C<< $a[ROW]->[COLUMN] >> to get |
| or set the element in any row and any column of the array. |
| |
| The notation still looks a little cumbersome, so there's one more |
| abbreviation: |
| |
| =head2 Arrow Rule |
| |
| In between two B<subscripts>, the arrow is optional. |
| |
| Instead of C<< $a[1]->[2] >>, we can write C<$a[1][2]>; it means the |
| same thing. Instead of C<< $a[0]->[1] = 23 >>, we can write |
| C<$a[0][1] = 23>; it means the same thing. |
| |
| Now it really looks like two-dimensional arrays! |
| |
| You can see why the arrows are important. Without them, we would have |
| had to write C<${$a[1]}[2]> instead of C<$a[1][2]>. For |
| three-dimensional arrays, they let us write C<$x[2][3][5]> instead of |
| the unreadable C<${${$x[2]}[3]}[5]>. |
| |
| =head1 Solution |
| |
| Here's the answer to the problem I posed earlier, of reformatting a |
| file of city and country names. |
| |
| 1 my %table; |
| |
| 2 while (<>) { |
| 3 chomp; |
| 4 my ($city, $country) = split /, /; |
| 5 $table{$country} = [] unless exists $table{$country}; |
| 6 push @{$table{$country}}, $city; |
| 7 } |
| |
| 8 foreach $country (sort keys %table) { |
| 9 print "$country: "; |
| 10 my @cities = @{$table{$country}}; |
| 11 print join ', ', sort @cities; |
| 12 print ".\n"; |
| 13 } |
| |
| |
| The program has two pieces: Lines 2--7 read the input and build a data |
| structure, and lines 8-13 analyze the data and print out the report. |
| We're going to have a hash, C<%table>, whose keys are country names, |
| and whose values are references to arrays of city names. The data |
| structure will look like this: |
| |
| |
| %table |
| +-------+---+ |
| | | | +-----------+--------+ |
| |Germany| *---->| Frankfurt | Berlin | |
| | | | +-----------+--------+ |
| +-------+---+ |
| | | | +----------+ |
| |Finland| *---->| Helsinki | |
| | | | +----------+ |
| +-------+---+ |
| | | | +---------+------------+----------+ |
| | USA | *---->| Chicago | Washington | New York | |
| | | | +---------+------------+----------+ |
| +-------+---+ |
| |
| We'll look at output first. Supposing we already have this structure, |
| how do we print it out? |
| |
| 8 foreach $country (sort keys %table) { |
| 9 print "$country: "; |
| 10 my @cities = @{$table{$country}}; |
| 11 print join ', ', sort @cities; |
| 12 print ".\n"; |
| 13 } |
| |
| C<%table> is an |
| ordinary hash, and we get a list of keys from it, sort the keys, and |
| loop over the keys as usual. The only use of references is in line 10. |
| C<$table{$country}> looks up the key C<$country> in the hash |
| and gets the value, which is a reference to an array of cities in that country. |
| B<Use Rule 1> says that |
| we can recover the array by saying |
| C<@{$table{$country}}>. Line 10 is just like |
| |
| @cities = @array; |
| |
| except that the name C<array> has been replaced by the reference |
| C<{$table{$country}}>. The C<@> tells Perl to get the entire array. |
| Having gotten the list of cities, we sort it, join it, and print it |
| out as usual. |
| |
| Lines 2-7 are responsible for building the structure in the first |
| place. Here they are again: |
| |
| 2 while (<>) { |
| 3 chomp; |
| 4 my ($city, $country) = split /, /; |
| 5 $table{$country} = [] unless exists $table{$country}; |
| 6 push @{$table{$country}}, $city; |
| 7 } |
| |
| Lines 2-4 acquire a city and country name. Line 5 looks to see if the |
| country is already present as a key in the hash. If it's not, the |
| program uses the C<[]> notation (B<Make Rule 2>) to manufacture a new, |
| empty anonymous array of cities, and installs a reference to it into |
| the hash under the appropriate key. |
| |
| Line 6 installs the city name into the appropriate array. |
| C<$table{$country}> now holds a reference to the array of cities seen |
| in that country so far. Line 6 is exactly like |
| |
| push @array, $city; |
| |
| except that the name C<array> has been replaced by the reference |
| C<{$table{$country}}>. The C<push> adds a city name to the end of the |
| referred-to array. |
| |
| There's one fine point I skipped. Line 5 is unnecessary, and we can |
| get rid of it. |
| |
| 2 while (<>) { |
| 3 chomp; |
| 4 my ($city, $country) = split /, /; |
| 5 #### $table{$country} = [] unless exists $table{$country}; |
| 6 push @{$table{$country}}, $city; |
| 7 } |
| |
| If there's already an entry in C<%table> for the current C<$country>, |
| then nothing is different. Line 6 will locate the value in |
| C<$table{$country}>, which is a reference to an array, and push |
| C<$city> into the array. But |
| what does it do when |
| C<$country> holds a key, say C<Greece>, that is not yet in C<%table>? |
| |
| This is Perl, so it does the exact right thing. It sees that you want |
| to push C<Athens> onto an array that doesn't exist, so it helpfully |
| makes a new, empty, anonymous array for you, installs it into |
| C<%table>, and then pushes C<Athens> onto it. This is called |
| 'autovivification'--bringing things to life automatically. Perl saw |
| that they key wasn't in the hash, so it created a new hash entry |
| automatically. Perl saw that you wanted to use the hash value as an |
| array, so it created a new empty array and installed a reference to it |
| in the hash automatically. And as usual, Perl made the array one |
| element longer to hold the new city name. |
| |
| =head1 The Rest |
| |
| I promised to give you 90% of the benefit with 10% of the details, and |
| that means I left out 90% of the details. Now that you have an |
| overview of the important parts, it should be easier to read the |
| L<perlref> manual page, which discusses 100% of the details. |
| |
| Some of the highlights of L<perlref>: |
| |
| =over 4 |
| |
| =item * |
| |
| You can make references to anything, including scalars, functions, and |
| other references. |
| |
| =item * |
| |
| In B<Use Rule 1>, you can omit the curly brackets whenever the thing |
| inside them is an atomic scalar variable like C<$aref>. For example, |
| C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as |
| C<${$aref}[1]>. If you're just starting out, you may want to adopt |
| the habit of always including the curly brackets. |
| |
| =item * |
| |
| This doesn't copy the underlying array: |
| |
| $aref2 = $aref1; |
| |
| You get two references to the same array. If you modify |
| C<< $aref1->[23] >> and then look at |
| C<< $aref2->[23] >> you'll see the change. |
| |
| To copy the array, use |
| |
| $aref2 = [@{$aref1}]; |
| |
| This uses C<[...]> notation to create a new anonymous array, and |
| C<$aref2> is assigned a reference to the new array. The new array is |
| initialized with the contents of the array referred to by C<$aref1>. |
| |
| Similarly, to copy an anonymous hash, you can use |
| |
| $href2 = {%{$href1}}; |
| |
| =item * |
| |
| To see if a variable contains a reference, use the C<ref> function. It |
| returns true if its argument is a reference. Actually it's a little |
| better than that: It returns C<HASH> for hash references and C<ARRAY> |
| for array references. |
| |
| =item * |
| |
| If you try to use a reference like a string, you get strings like |
| |
| ARRAY(0x80f5dec) or HASH(0x826afc0) |
| |
| If you ever see a string that looks like this, you'll know you |
| printed out a reference by mistake. |
| |
| A side effect of this representation is that you can use C<eq> to see |
| if two references refer to the same thing. (But you should usually use |
| C<==> instead because it's much faster.) |
| |
| =item * |
| |
| You can use a string as if it were a reference. If you use the string |
| C<"foo"> as an array reference, it's taken to be a reference to the |
| array C<@foo>. This is called a I<soft reference> or I<symbolic |
| reference>. The declaration C<use strict 'refs'> disables this |
| feature, which can cause all sorts of trouble if you use it by accident. |
| |
| =back |
| |
| You might prefer to go on to L<perllol> instead of L<perlref>; it |
| discusses lists of lists and multidimensional arrays in detail. After |
| that, you should move on to L<perldsc>; it's a Data Structure Cookbook |
| that shows recipes for using and printing out arrays of hashes, hashes |
| of arrays, and other kinds of data. |
| |
| =head1 Summary |
| |
| Everyone needs compound data structures, and in Perl the way you get |
| them is with references. There are four important rules for managing |
| references: Two for making references and two for using them. Once |
| you know these rules you can do most of the important things you need |
| to do with references. |
| |
| =head1 Credits |
| |
| Author: Mark Jason Dominus, Plover Systems (C<mjd-perl-ref+@plover.com>) |
| |
| This article originally appeared in I<The Perl Journal> |
| ( http://www.tpj.com/ ) volume 3, #2. Reprinted with permission. |
| |
| The original title was I<Understand References Today>. |
| |
| =head2 Distribution Conditions |
| |
| Copyright 1998 The Perl Journal. |
| |
| This documentation is free; you can redistribute it and/or modify it |
| under the same terms as Perl itself. |
| |
| Irrespective of its distribution, all code examples in these files are |
| hereby placed into the public domain. You are permitted and |
| encouraged to use this code in your own programs for fun or for profit |
| as you see fit. A simple comment in the code giving credit would be |
| courteous but is not required. |
| |
| |
| |
| |
| =cut |