| Text::Soundex - Implementation of the soundex algorithm. |
| |
| Basic Usage: |
| |
| Soundex performs a one-way transformation of a name: it converts the input |
| character string into a short code that represents the identifiable sounds |
| of those characters, so names that sound alike receive the same code. |
| |
| For example: |
| |
| use Text::Soundex; |
| |
| print soundex("Mark"), "\n"; # prints: M620 |
| print soundex("Marc"), "\n"; # prints: M620 |
| |
| print soundex("Hansen"), "\n"; # prints: H525 |
| print soundex("Hanson"), "\n"; # prints: H525 |
| print soundex("Henson"), "\n"; # prints: H525 |
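The transformation can be illustrated with a minimal plain-Perl sketch. It assumes the standard (non-NARA) letter groups and is not the module's own code; the helper name soundex_sketch is hypothetical. The first letter is kept, every letter is mapped to a digit (vowels, H, W and Y map to 0), runs of the same digit are collapsed, the first letter's digit and all zeros are dropped, and the result is padded or truncated to four characters.

```perl
use strict;
use warnings;

# Soundex digit for each letter. Vowels, H, W and Y get '0' (no digit).
my %digit_for;
my %letters_for = (
    1 => 'BFPV', 2 => 'CGJKQSXZ', 3 => 'DT', 4 => 'L',
    5 => 'MN',   6 => 'R',        0 => 'AEHIOUWY',
);
while (my ($digit, $letters) = each %letters_for) {
    $digit_for{$_} = $digit for split //, $letters;
}

# Hypothetical helper: standard (non-NARA) soundex code of one name.
sub soundex_sketch {
    my ($name) = @_;
    (my $letters = uc $name) =~ tr/A-Z//cd;    # keep ASCII letters only
    return undef unless length $letters;       # nothing left to encode

    my $first = substr $letters, 0, 1;
    my $codes = join '', map { $digit_for{$_} } split //, $letters;
    $codes =~ tr/0-9//s;          # collapse runs of the same digit
    $codes = substr $codes, 1;    # the first letter is kept as a letter
    $codes =~ tr/0//d;            # vowels, H, W and Y carry no digit
    return substr($first . $codes . '000', 0, 4);
}

print soundex_sketch($_), "\n" for qw(Mark Marc Hansen Hanson Henson);
```

Because the first letter's digit takes part in the collapsing step, a name such as "Pfister" encodes as P236 rather than P123 in this sketch.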
| |
| In many situations, code such as the following: |
| |
| if ($name1 eq $name2) { |
| ... |
| } |
| |
| can be replaced with the following, which also matches names that sound alike: |
| |
| if (soundex($name1) eq soundex($name2)) { |
| ... |
| } |
| |
| Installation: |
| |
| Once the archive has been unpacked, the following steps build, test and |
| install the module. Run them in the directory that contains Makefile.PL: |
| |
| perl Makefile.PL |
| make |
| make test |
| |
| If 'make test' succeeds, run the next step. It may need to be run as root |
| (on a Unix-like system) or with special privileges on other systems. |
| |
| make install |
| |
| If you do not want to use the XS code (for whatever reason), run the |
| following instead of the above: |
| |
| perl Makefile.PL --no-xs |
| make |
| make test |
| make install |
| |
| If any of the tests report 'not ok' and you are running perl 5.6.0 or later, |
| please contact Mark Mielke <mark@mielke.cc>. |
| |
| History: |
| |
| Version 3.03: |
| Updated to allow the XS implementation to work properly under an |
| EBCDIC/EBCDIC-UTF8 character set environment. |
| |
| Updated documentation to better describe the history of the |
| soundex algorithm and how it applies to this module. |
| |
| Version 3.02: |
| 3.01 and 3.00 used the 'U8' type incorrectly, causing some strict |
| compilers to complain about or refuse to compile the XS code. Also, Unicode |
| support did not work properly under Perl 5.6.x. Both of these problems |
| are now fixed. |
| |
| Version 3.01: |
| A bug affecting non-UTF-8 strings that contain non-ASCII alphabetic |
| characters was fixed. The soundex_unicode() and soundex_nara_unicode() |
| wrapper routines were added, and the documentation refers the user to the |
| excellent Text::Unidecode module to perform soundex encodings on Unicode |
| strings. The Perl versions of the routines have been further optimized, |
| and now correctly handle a border case involving non-alphabetic characters |
| at the beginning of the string. |
| |
| Version 3.00: |
| Support for UTF-8 strings (Unicode strings) is now in place. Note |
| that this allows UTF-8 strings to be passed to the XS version of |
| the soundex() routine. The Soundex algorithm treats characters |
| outside the ASCII range (0x00-0x7F) as if they were not |
| alphabetical. |
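The effect of treating non-ASCII characters as non-alphabetical can be shown with a small sketch of the preprocessing step alone. It assumes that every character outside A-Z/a-z is discarded before any digits are assigned; the helper name letters_for_soundex is hypothetical and is not part of the module.

```perl
use strict;
use warnings;
use utf8;                              # this source contains non-ASCII literals
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: keep only ASCII letters, as the rule above describes.
# Accented letters such as 'ü' and punctuation such as the apostrophe are
# both dropped before the name is encoded.
sub letters_for_soundex {
    my ($name) = @_;
    (my $letters = $name) =~ tr/A-Za-z//cd;
    return $letters;
}

for my $name ("Müller", "Muller", "O'Brien") {
    printf "%-8s -> %s\n", $name, letters_for_soundex($name);
}
```

Since vowels carry no digit anyway, "Müller" (reduced to "Mller") and "Muller" still end up with the same standard code, M460.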
| |
| The interface has been simplified. In order to explicitly use the |
| non-XS implementation of soundex(): |
| |
| use Text::Soundex (); |
| $code = Text::Soundex::soundex_noxs($name); |
| |
| In order to use the NARA soundex algorithm: |
| |
| use Text::Soundex 'soundex_nara'; |
| $code = soundex_nara($name); |
| |
| Use of the ':NARA-Ruleset' import directive is now obsolete. To |
| emulate the old behaviour: |
| |
| use Text::Soundex (); |
| *soundex = \&Text::Soundex::soundex_nara; |
| $code = soundex($name); |
| |
| Version 2.20: |
| This version includes support for the algorithm used to index |
| the U.S. Federal Censuses. There is a slight, not commonly known |
| discrepancy in the definition of a soundex code involving similar-sounding |
| letters that are separated by the characters H or W: under this ruleset, |
| letters with the same code that are separated only by H or W are coded |
| once, as if they were adjacent. It is called the NARA ruleset because the |
| discrepancy was discovered by NARA (the U.S. National Archives and Records |
| Administration). (Calling it "the US Census ruleset" was too unwieldy...) |
| |
| NARA can be found at: |
| http://www.nara.gov/genealogy/ |
| |
| The algorithm used by NARA can be found at: |
| http://home.utah-inter.net/kinsearch/Soundex.html |
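The difference between the two rulesets can be illustrated with a plain-Perl sketch. It assumes the NARA rule that letters with the same code separated only by H or W are coded once, while the standard rules treat H and W like vowels. The helper soundex_variant and its $nara flag are hypothetical; from version 3.00 on, the module itself exposes this ruleset as soundex_nara().

```perl
use strict;
use warnings;

# Digit for each letter; vowels and Y give '0'. H and W get the marker '9'
# so the NARA rule can recognise them.
my %digit_for;
my %letters_for = (
    1 => 'BFPV', 2 => 'CGJKQSXZ', 3 => 'DT', 4 => 'L',
    5 => 'MN',   6 => 'R',        0 => 'AEIOUY', 9 => 'HW',
);
while (my ($digit, $letters) = each %letters_for) {
    $digit_for{$_} = $digit for split //, $letters;
}

# Hypothetical helper: a true $nara applies the NARA ruleset, a false one
# the standard ruleset, in which H and W behave like vowels.
sub soundex_variant {
    my ($name, $nara) = @_;
    (my $letters = uc $name) =~ tr/A-Z//cd;
    return undef unless length $letters;

    my $first = substr $letters, 0, 1;
    my $codes = join '', map { $digit_for{$_} } split //, $letters;
    $codes =~ tr/9/0/ unless $nara;       # standard: H/W just separate digits
    $codes =~ tr/0-9//s;                  # collapse runs of the same digit
    $codes =~ s/(\d)9\1/$1/g if $nara;    # NARA: same digit across H/W once
    $codes = substr $codes, 1;            # drop the first letter's own digit
    $codes =~ tr/09//d;                   # vowels, Y, H and W carry no digit
    return substr($first . $codes . '000', 0, 4);
}

for my $name (qw(Ashcraft Hansen)) {
    printf "%-8s standard=%s nara=%s\n", $name,
        soundex_variant($name, 0), soundex_variant($name, 1);
}
```

In this sketch "Ashcraft" encodes as A226 under the standard rules and A261 under the NARA rules, because S and C (both 2) are separated only by H. "Hansen" gives H525 either way.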
| |
| Version 2.00: |
| This version, written by Mark Mielke, is a full rewrite of the 1.0 engine. |
| The goal was speed, and this was achieved. An optional XS module, which |
| can be used completely transparently by the user, offers a further speed |
| increase by a factor of more than 7.5. |
| |
| Version 1.00: |
| This version is included in the Perl core distribution of at least |
| Perl 5.8.0 and earlier. It was written by Mike Stok. It can be |
| identified by the absence of a $VERSION at the beginning of the module |
| and by an RCS tag with a version of 1.x. This version was actually |
| written for perl4, before some Perl 5-style packaging was introduced. |