You should have read section 2 of this FAQ. There you would have learned that comp.lang.perl.misc is the appropriate place to go for free advice. If your question is really important and you require a prompt and correct answer, you should hire a consultant.
Furthermore, you may include this document in any distribution of the full Perl source or binaries, in its verbatim documentation, or on a complete dump of the CPAN archive, providing that the three stipulations given above continue to be met.
Added a new question on Perl BNF to the perlfaq7 manpage.
In particular, the core development team (known as the Perl Porters) are a rag-tag band of highly altruistic individuals committed to producing better software for free than you could hope to purchase for money. You may snoop on pending developments via news://genetics.upenn.edu/perl.porters-gw/ and http://www.frii.com/~gnat/perl/porters/summary.html.
While the GNU project includes Perl in its distributions, there's no such thing as ``GNU Perl''. Perl is neither produced nor maintained by the Free Software Foundation. Perl's licensing terms are also more open than GNU software's tend to be.
You can get commercial support of Perl if you wish, although for most users the informal support will more than suffice. See the answer to ``Where can I buy a commercial version of perl?'' for more information.
5 release of Perl'', but some people have interpreted this to
mean there's a language called ``perl5'', which isn't the case. Perl5 is
merely the popular name for the fifth major release (October 1994), while
perl4 was the fourth major release (March 1991). There was also a perl1 (in
January 1988), a perl2 (June 1988), and a perl3 (October 1989).
The 5.0 release is, essentially, a complete rewrite of the perl source code from the ground up. It has been modularized, object-oriented, tweaked, trimmed, and optimized until it almost doesn't look like the old code. However, the interface is mostly the same, and compatibility with previous releases is very high.
To avoid the ``what language is perl5?'' confusion, some people prefer to simply use ``perl'' to refer to the latest version of perl and avoid using ``perl5'' altogether. It's not really that big a deal, though.
Larry and the Perl development team occasionally make changes to the internal core of the language, but all possible efforts are made toward backward compatibility. While not quite all perl4 scripts run flawlessly under perl5, an update to perl should nearly never invalidate a program written for an earlier version of perl (barring accidental bug fixes and the rare new keyword).
Most tasks only require a small subset of the Perl language. One of the guiding mottos for Perl development is ``there's more than one way to do it'' (TMTOWTDI, sometimes pronounced ``tim toady''). Perl's learning curve is therefore shallow (easy to learn) and long (there's a whole lot you can do if you really want).
Finally, Perl is (frequently) an interpreted language. This means that you can write your programs and test them without an intermediate compilation step, allowing you to experiment and test/debug quickly and easily. This ease of experimentation flattens the learning curve even more.
Things that make Perl easier to learn: Unix experience, almost any kind of programming experience, an understanding of regular expressions, and the ability to understand other people's code. If there's something you need to do, then it's probably already been done, and a working example is usually available for free. Don't forget the new perl modules, either. They're discussed in Part 3 of this FAQ, and the CPAN, where they are collected, is discussed in Part 2.
Probably the best thing to do is try to write equivalent code to do a set of tasks. These languages have their own newsgroups in which you can learn about (but hopefully not argue about) them.
If you have a library that provides an API, you can make any component of it available as just another Perl function or variable using a Perl extension written in C or C++ and dynamically linked into your main perl interpreter. You can also go the other direction, and write your main program in C or C++, and then link in some Perl code on the fly, to create a powerful application.
That said, there will always be small, focused, special-purpose languages dedicated to a specific problem domain that are simply more convenient for certain kinds of problems. Perl tries to be all things to all people, but nothing special to anyone. Examples of specialized languages that come to mind include prolog and matlab.
Actually, one good reason is when you already have an existing application written in another language that's all done (and done well), or you have an application language specifically designed for a certain task (e.g. prolog, make).
For various reasons, Perl is probably not well-suited for real-time embedded systems, low-level operating systems development work like device drivers or context-switching code, complex multithreaded shared-memory applications, or extremely large applications. You'll notice that perl is not itself written in Perl.
The new native-code compiler for Perl may reduce the limitations given in the previous statement to some degree, but understand that Perl remains fundamentally a dynamically typed language, not a statically typed one. You certainly won't be chastised if you don't trust nuclear-plant or brain-surgery monitoring code to it. And Larry will sleep easier, too -- Wall Street programs notwithstanding. :-)
In ``standard terminology'' a program has been compiled to physical machine code once, and can then be run multiple times, whereas a script must be translated by a program each time it's used. Perl programs, however, are usually neither strictly compiled nor strictly interpreted. They can be compiled to a bytecode form (something of a Perl virtual machine) or to completely different languages, like C or assembly language. You can't tell just by looking whether the source is destined for a pure interpreter, a parse-tree interpreter, a byte-code interpreter, or a native-code compiler, so it's hard to give a definitive answer here.
If you have a project which has a bottleneck, especially in terms of translation or testing, Perl almost certainly will provide a viable and quick solution. In conjunction with any persuasion effort, you should not fail to point out that Perl is used extensively, and with extremely reliable and valuable results, at many large computer software and hardware companies throughout the world. In fact, many Unix vendors now ship Perl by default, and support is usually just a news-posting away if you can't find the answer in the comprehensive documentation, including this FAQ.
If you face reluctance to upgrading from an older version of perl, then point out that version 4 is utterly unmaintained and unsupported by the Perl Development Team. Another big sell for Perl5 is the large number of modules and extensions which greatly reduce development time for any given task. Also mention that the difference between version 4 and version 5 of Perl is like the difference between awk and C++. (Well, ok, maybe not quite that distinct, but you get the idea.) If you want support and a reasonable guarantee that what you're developing will continue to work in the future, then you have to run the supported version. That probably means running the 5.004 release, although 5.003 isn't that bad (it's just one year and one release behind). Several important bugs were fixed from the 5.000 through 5.002 versions, though, so try upgrading past them if possible.
Although it's rumored that the (imminent) 5.004 release may build on Windows NT, this is yet to be proven. Binary distributions for 32-bit Microsoft systems and for Apple systems can be found in the http://www.perl.com/CPAN/ports/ directory. Because these are not part of the standard distribution, they may, and in fact do, differ from the base Perl port in a variety of ways. You'll have to check their respective release notes to see just what the differences are. These differences can be either positive (e.g. extensions for the features of the particular platform that are not supported in the source release of perl) or negative (e.g. they might be based upon a less current source release of perl).
A useful FAQ for Win32 Perl users is http://www.endcontsw.com/people/evangelo/Perl_for_Win32_FAQ.html
make install. Most other approaches are doomed to failure.
One simple way to check that things are in the right place is to print out
the hard-coded @INC, the list of directories in which perl looks for library files.
perl -le 'print for @INC'
If this command lists any paths which don't exist on your system, then you may need to move the appropriate libraries to these locations, or create symlinks, aliases, or shortcuts appropriately.
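If the list is long, a small script can flag only the entries that are missing. This is a minimal sketch, not a tool shipped with perl; the check for references is there only because later perls also allow code hooks in @INC.

```perl
#!/usr/bin/perl
# Report each directory in the hard-coded @INC and whether it exists.
use strict;
use warnings;

my @missing;
for my $dir (@INC) {
    next if ref $dir;    # skip code-ref hooks (later perls only)
    if (-d $dir) {
        print "ok       $dir\n";
    } else {
        print "MISSING  $dir\n";
        push @missing, $dir;
    }
}
print scalar(@missing), " missing director", (@missing == 1 ? "y" : "ies"), "\n";
```

Any line marked MISSING is a candidate for moving libraries, or for creating a symlink, alias, or shortcut.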
CPAN/path/... is a naming convention for files available on CPAN sites. CPAN indicates the base directory of a CPAN mirror, and the rest of the path is the path from that directory to the file. For instance, if you're using ftp://ftp.funet.fi/pub/languages/perl/CPAN as your CPAN site, the file CPAN/misc/japh is downloadable as ftp://ftp.funet.fi/pub/languages/perl/CPAN/misc/japh .
Considering that there are hundreds of existing modules in the archive, one probably exists to do nearly anything you can think of. Current categories under CPAN/modules/by-category/ include perl core modules; development support; operating system interfaces; networking, devices, and interprocess communication; data type utilities; database interfaces; user interfaces; interfaces to other languages; filenames, file systems, and file locking; internationalization and locale; world wide web support; server and daemon utilities; archiving and compression; image manipulation; mail and news; control flow utilities; filehandle and I/O; Microsoft Windows modules; and miscellaneous modules.
man perl if you're on a system resembling Unix. This will lead you to other
important man pages. If you're not on a Unix system, access to the
documentation will be different; for example, it might be only in HTML
format. But all proper perl installations have fully-accessible
documentation.
You might also try perldoc perl in case your system doesn't have a proper man command, or it's been
misinstalled. If that doesn't work, try looking in /usr/local/lib/perl5/pod
for documentation.
If all else fails, consult the CPAN/doc directory, which contains the complete documentation in various formats, including native pod, troff, html, and plain text. There's also a web page at http://www.perl.com/perl/info/documentation.html that might help.
It's also worth noting that there's a PDF version of the complete documentation for perl available in the CPAN/authors/id/BMIDD directory.
Many good books have been written about Perl -- see the section below for more details.
comp.lang.perl.announce Moderated announcement group
comp.lang.perl.misc Very busy group about Perl in general
comp.lang.perl.modules Use and development of Perl modules
comp.lang.perl.tk Using Tk (and X) from Perl
comp.infosystems.www.authoring.cgi Writing CGI scripts for the Web.
There is also a USENET gateway to the mailing list used by the crack Perl development team (perl5-porters) at news://genetics.upenn.edu/perl.porters-gw/ .
The incontestably definitive reference book on Perl, written by the creator of Perl and his apostles, is now in its second edition and fourth printing.
Programming Perl (the "Camel Book"):
Authors: Larry Wall, Tom Christiansen, and Randal Schwartz
ISBN 1-56592-149-6 (English)
ISBN 4-89052-384-7 (Japanese)
(French and German translations in progress)
Note that O'Reilly books are color-coded: turquoise (some would call it teal) covers indicate perl5 coverage, while magenta (some would call it pink) covers indicate perl4 only. Check the cover color before you buy!
What follows is a list of the books that the FAQ authors found personally useful. Your mileage may (but, we hope, probably won't) vary.
If you're already a hard-core systems programmer, then the Camel Book just might suffice for you to learn Perl from. But if you're not, check out the ``Llama Book''. It currently doesn't cover perl5, but the 2nd edition is nearly done and should be out by summer 97:
Learning Perl (the Llama Book):
Author: Randal Schwartz, with intro by Larry Wall
ISBN 1-56592-042-2 (English)
ISBN 4-89502-678-1 (Japanese)
ISBN 2-84177-005-2 (French)
ISBN 3-930673-08-8 (German)
Another stand-out book in the turquoise O'Reilly Perl line is the ``Hip Owls'' book. It covers regular expressions inside and out, with quite a bit devoted exclusively to Perl:
Mastering Regular Expressions (the Cute Owls Book):
Author: Jeffrey Friedl
ISBN 1-56592-257-3
You can order any of these books from O'Reilly & Associates, 1-800-998-9938. Local/overseas is 1-707-829-0515. If you can locate an O'Reilly order form, you can also fax to 1-707-829-0104. See http://www.ora.com/ on the Web.
Recommended Perl books that are not from O'Reilly are the following:
Cross-Platform Perl, (for Unix and Windows NT)
Author: Eric F. Johnson
ISBN: 1-55851-483-X
How to Set up and Maintain a World Wide Web Site, (2nd edition)
Author: Lincoln Stein, M.D., Ph.D.
ISBN: 0-201-63462-7
CGI Programming in C & Perl,
Author: Thomas Boutell
ISBN: 0-201-42219-0
Note that some of these address specific application areas (e.g. the Web) and are not general-purpose programming books.
Beyond this, two other magazines that frequently carry high-quality articles on Perl are Web Techniques (see http://www.webtechniques.com/) and Unix Review (http://www.unixreview.com/).
http://www.perl.com/CPAN (redirects to another mirror)
http://www.perl.org/CPAN
ftp://ftp.funet.fi/pub/languages/perl/CPAN/
http://www.cs.ruu.nl/pub/PERL/CPAN/
ftp://ftp.cs.colorado.edu/pub/perl/CPAN/
If you subscribe to a mailing list, it behooves you to know how to unsubscribe from it. Strident pleas to the list itself to get you off will not be favorably received.
Also see Matthias Neeracher's (the creator and maintainer of MacPerl) webpage at http://www.iis.ee.ethz.ch/~neeri/macintosh/perl.html for many links to interesting MacPerl sites, and the applications/MPW tools, precompiled.
subscribe Perl-Win32-Users
The list software, also written in perl, will automatically determine your address and subscribe you. To unsubscribe, send the following in the message body to the same address:
unsubscribe Perl-Win32-Users
You can also check http://www.activeware.com/ and select ``Mailing Lists'' to join or leave this list.
subscribe perl-packrats
The list software, also written in perl, will automatically determine your address and subscribe you. To unsubscribe, simply prepend ``un'' to the same command and mail it to the same address, like so:
unsubscribe perl-packrats
ftp.cis.ufl.edu:/pub/perl/comp.lang.perl.*/monthly has an almost complete collection dating back to 12/89 (missing 08/91 through 12/93). They are kept as one large file for each month.
You'll probably want a more sophisticated query and retrieval mechanism than a file listing, preferably one that allows you to retrieve articles using fast-access indices, keyed on at least author, date, subject, thread (as in ``trn'') and probably keywords. The best solution the FAQ authors know of is the MH pick command, but it is very slow to select on 18000 articles.
If you have the missing sections, or know where they can be found, please let perlfaq-suggestions@perl.com know.
However, these answers may not suffice for managers who require a purchase order from a company they can sue should anything go wrong. Or maybe they need very serious hand-holding and contractual obligations. Shrink-wrapped CDs with perl on them are available from several sources if that will help.
Or you can purchase a real support contract. Although Cygnus historically provided this service, they no longer sell support contracts for Perl. Instead, the Paul Ingram Group will be taking up the slack through The Perl Clinic. The following is a commercial from them:
``Do you need professional support for Perl and/or Oraperl? Do you need a support contract with defined levels of service? Do you want to pay only for what you need?
``The Paul Ingram Group has provided quality software development and support services to some of the world's largest corporations for ten years. We are now offering the same quality support services for Perl at The Perl Clinic. This service is led by Tim Bunce, an active perl porter since 1994 and well known as the author and maintainer of the DBI, DBD::Oracle, and Oraperl modules and author/co-maintainer of The Perl 5 Module List. We also offer Oracle users support for Perl5 Oraperl and related modules (which Oracle is planning to ship as part of Oracle Web Server 3). 20% of the profit from our Perl support work will be donated to The Perl Institute.''
For more information, contact The Perl Clinic:
Tel: +44 1483 424424
Fax: +44 1483 419419
Web: http://www.perl.co.uk/
Email: perl-support-info@perl.co.uk or Tim.Bunce@ig.co.uk
If you are posting a bug with a non-standard port (see the answer to ``What platforms is Perl available for?''), a binary distribution, or a non-standard module (such as Tk, CGI, etc), then please see the documentation that came with it to determine the correct place to post bugs.
Read the perlbug man page (perl5.004 or later) for more information.
The perl.com domain is Tom Christiansen's domain. He created it as a public service long before perl.org came about. It's the original PBS of the Perl world, a clearinghouse for information about all things Perlian, accepting no paid advertisements, glossy gifs, or (gasp!) java applets on its pages.
Objects          perlref, perlmod, perlobj, perltie
Data Structures  perlref, perllol, perldsc
Modules          perlmod, perlsub
Regexps          perlre, perlfunc, perlop
Moving to perl5  perltrap, perl
Linking w/C      perlxstut, perlxs, perlcall, perlguts, perlembed
Various          http://www.perl.com/CPAN/doc/FMTEYEWTK/index.html
                 (not a man-page but still useful)
The perltoc manpage provides a crude table of contents for the perl man page set.
perldebug man page, on an ``empty'' program, like this:
perl -de 42
Now just type in any legal Perl code, and it will be immediately evaluated. You can also examine the symbol table, get stack backtraces, check variable values, set breakpoints, and perform other operations typically found in symbolic debuggers.
-w?
Have you tried use strict?
Did you check the returns of each and every system call?
Did you read the perltrap manpage?
Have you tried the Perl debugger, described in the perldebug manpage?
perl -MO=Xref[,OPTIONS] foo.pl
indent
will do for C. The complex feedback between the scanner and the parser
(this feedback is what confuses the vgrind and emacs programs) makes it
challenging at best to write a stand-alone Perl parser.
Of course, if you simply follow the guidelines in the perlstyle manpage, you shouldn't need to reformat.
Your editor can and should help you with source formatting. The perl-mode for emacs can provide a remarkable amount of help with most (but not all) code, and even less programmable editors can provide significant assistance.
If you are used to using the vgrind program for printing out nice code on a laser printer, you can take a stab at this using http://www.perl.com/CPAN/doc/misc/tips/working.vgrind.entry, but the results are not particularly satisfying for sophisticated code.
In the perl source directory, you'll find a directory called ``emacs'', which contains a cperl-mode that color-codes keywords, provides context-sensitive help, and other nifty things.
Note that the perl-mode of emacs will have fits with ``main'foo'' (single quote), and mess up the indentation and highlighting. You should be using ``main::foo'', anyway.
Other approaches include autoloading seldom-used Perl code. See the AutoSplit and AutoLoader modules in the standard distribution for that. Or you could locate the bottleneck and think about writing just that part in C, the way we used to take bottlenecks in C code and write them in assembler. Similar to rewriting in C is the use of modules that have critical sections written in C (for instance, the PDL module from CPAN).
In some cases, it may be worth it to use the backend compiler to produce byte code (saving compilation time) or compile into C, which will certainly save compilation time and sometimes a small amount (but not much) execution time. See the question about compiling your Perl programs.
If you're currently linking your perl executable to a shared libc.so, you can often gain a 10-25% performance benefit by rebuilding it to link with a static libc.a instead. This will make a bigger perl executable, but your Perl programs (and programmers) may thank you for it. See the INSTALL file in the source distribution for more information.
Unsubstantiated reports allege that Perl interpreters that use sfio outperform those that don't (for IO intensive applications). To try this, see the INSTALL file in the source distribution, especially the ``Selecting File IO mechanisms'' section.
The undump program was an old attempt to speed up your Perl program by storing the already-compiled form to disk. This is no longer a viable option, as it only worked on a few architectures, and wasn't a good solution anyway.
In some cases, using substr or vec to simulate
arrays can be highly beneficial. For example, an array of a thousand
booleans will take at least 20,000 bytes of space, but it can be turned
into one 125-byte bit vector for a considerable memory savings. The
standard Tie::SubstrHash module can also help for certain types of data
structure. If you're working with specialist data structures (matrices, for
instance) modules that implement these in C may use less memory than
equivalent Perl modules.
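As a rough sketch of the bit-vector idea, the following stores a thousand booleans with vec. The every-third-bit pattern and the variable names are purely illustrative.

```perl
use strict;
use warnings;

# Pack 1000 booleans into one string, one bit each, with vec().
my $flags = '';    # the string grows as bits are set
my $n     = 1000;

# Mark every multiple of 3 as "true" (an arbitrary example pattern).
for (my $i = 0; $i < $n; $i += 3) {
    vec($flags, $i, 1) = 1;
}

# Reading a bit back is the same call used as an rvalue.
print "bit 9 is ",  vec($flags, 9, 1),  "\n";    # 9 is a multiple of 3
print "bit 10 is ", vec($flags, 10, 1), "\n";
print "storage: ", length($flags), " bytes\n";   # highest bit is 999

# Count the set bits with unpack's checksum feature.
my $set = unpack('%32b*', $flags);
print "$set bits set\n";
```

The whole vector occupies 125 bytes, matching the figure given above.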
Another thing to try is learning whether your Perl was compiled with the
system malloc or with Perl's built-in malloc. Whichever one it is, try
using the other one and see whether this makes a difference. Information
about malloc is in the INSTALL file in the source distribution. You can find out whether you are using
perl's malloc by typing perl -V:usemymalloc.
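sub makeone {
    my @a = ( 1 .. 10 );    # a fresh lexical array on each call
    return \@a;             # the reference keeps it alive after return
}
my @many;
for ( 1 .. 10 ) {
    push @many, makeone();
}
print $many[4][5], "\n";    # 6
print "@many\n";            # ten distinct ARRAY(0x...) references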
However, judicious use of my on your variables will help make
sure that they go out of scope so that Perl can free up their storage for
use in other parts of your program. (NB: my variables also
execute about 10% faster than globals.) A global variable, of course, never
goes out of scope, so you can't get its space automatically reclaimed,
although applying undef and/or delete to it will
achieve the same effect. In general, memory allocation and de-allocation
isn't something you can or should be worrying about much in Perl, but even
this capability (preallocation of data types) is in the works.
There are at least two popular ways to avoid this overhead. One solution involves running the Apache HTTP server (available from http://www.apache.org/) with either of the mod_perl or mod_fastcgi plugin modules. With mod_perl and the Apache::* modules (from CPAN), httpd will run with an embedded Perl interpreter which pre-compiles your script and then executes it within the same address space without forking. The Apache extension also gives Perl access to the internal server API, so modules written in Perl can do just about anything a module written in C can. With the FCGI module (from CPAN), a Perl executable compiled with sfio (see the INSTALL file in the distribution), and the mod_fastcgi module (available from http://www.fastcgi.com/), each of your perl scripts becomes a permanent CGI daemon process.
Both of these solutions can have far-reaching effects on your system and on the way you write your CGI scripts, so investigate them with care.
First of all, however, you can't take away read permission, because the source code has to be readable in order to be compiled and interpreted. (That doesn't mean that a CGI script's source is readable by people on the web, though.) So you have to leave the permissions at the socially friendly 0755 level.
Some people regard this as a security problem. If your program does insecure things, and relies on people not knowing how to exploit those insecurities, it is not secure. It is often possible for someone to determine the insecure things and exploit them without viewing the source. Security through obscurity, the name for hiding your bugs instead of fixing them, is little security indeed.
You can try using encryption via source filters (Filter::* from CPAN). But crackers might be able to decrypt it. You can try using the byte-code compiler and interpreter described below, but crackers might be able to de-compile it. You can try using the native-code compiler described below, but crackers might be able to disassemble it. These pose varying degrees of difficulty to people wanting to get at your code, but none can definitively conceal it (this is true of every language, not just Perl).
If you're concerned about people profiting from your code, then the bottom line is that nothing but a restrictive licence will give you legal security. License your software and pepper it with threatening statements like ``This is unpublished proprietary software of XYZ Corp. Your access to it does not give you permission to use it blah blah blah.'' We are not lawyers, of course, so you should see a lawyer if you want to be sure your licence's wording will stand up in court.
Please understand that merely compiling into C does not in and of itself guarantee that your code will run very much faster. That's because except for lucky cases where a lot of native type inferencing is possible, the normal Perl run time system is still present and thus will still take just as long to run and be just as big. Most programs save little more than compilation time, leaving execution no more than 10-30% faster. A few rare programs actually benefit significantly (like several times faster), but this takes some tweaking of your code.
Malcolm will be in charge of the 5.005 release of Perl itself to try to unify and merge his compiler and multithreading work into the main release.
You'll probably be astonished to learn that the current version of the
compiler generates a compiled form of your script whose executable is just
as big as the original perl executable, and then some. That's because as
currently written, all programs are prepared for a full eval
statement. You can tremendously reduce this cost by building a shared
libperl.so library and linking against that. See the
INSTALL podfile in the perl source distribution for details. If you link your main
perl binary with this, it will make it minuscule. For example, on one
author's system, /usr/bin/perl is only 11k in size!
extproc perl -S -your_switches
as the first line in *.cmd file (-S due to a bug in cmd.exe's `extproc' handling). For DOS one should first
invent a corresponding batch file, and codify it in ALTERNATIVE_SHEBANG (see the
INSTALL file in the source distribution for more information).
The Win95/NT installation, when using the Activeware port of Perl, will modify the Registry to associate the .pl extension with the perl interpreter. If you install another port, or (eventually) build your own Win95/NT Perl using WinGCC, then you'll have to modify the Registry yourself.
Macintosh perl scripts will have the appropriate Creator and Type, so that double-clicking them will invoke the perl application.
IMPORTANT!: Whatever you do, PLEASE don't get frustrated, and just throw the perl interpreter into your cgi-bin directory, in order to get your scripts working for a web server. This is an EXTREMELY big security risk. Take the time to figure out how to do it correctly.
# sum first and last fields
perl -lane 'print $F[0] + $F[-1]'
# identify text files
perl -le 'for(@ARGV) {print if -f && -T _}' *
# remove comments from C program
perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c
# make file a month younger than today, defeating reaper daemons
perl -e '$X=24*60*60; utime(time(),time() + 30 * $X,@ARGV)' *
# find first unused uid
perl -le '$i++ while getpwuid($i); print $i'
# display reasonable manpath
echo $PATH | perl -nl -072 -e '
s![^/+]*$!man!&&-d&&!$s{$_}++&&push@m,$_;END{print"@m"}'
Ok, the last one was actually an obfuscated perl entry. :-)
For example:
# Unix
perl -e 'print "Hello world\n"'
# DOS, etc.
perl -e "print \"Hello world\n\""
# Mac
print "Hello world\n"
(then Run "Myscript" or Shift-Command-R)
# VMS
perl -e "print ""Hello world\n"""
The problem is that none of this is reliable: it depends on the command interpreter. Under Unix, the first two often work. Under DOS, it's entirely possible neither works. If 4DOS was the command shell, I'd probably have better luck like this:
perl -e "print <Ctrl-x>"Hello world\n<Ctrl-x>""
Under the Mac, it depends which environment you are using. The MacPerl shell, or MPW, is much like Unix shells in its support for several quoting variants, except that it makes free use of the Mac's non-ASCII characters as control characters.
I'm afraid that there is no general solution to all of this. It is a mess, pure and simple.
[Some of this answer was contributed by Kenneth Albanowski.]
The Idiot's Guide to Solving Perl/CGI Problems, by Tom Christiansen http://www.perl.com/perl/faq/idiots-guide.html
Frequently Asked Questions about CGI Programming, by Nick Kew ftp://rtfm.mit.edu/pub/usenet/news.answers/www/cgi-faq http://www3.pair.com/webthing/docs/cgi/faqs/cgifaq.shtml
Perl/CGI programming FAQ, by Shishir Gundavaram and Tom Christiansen http://www.perl.com/perl/faq/perl-cgi-faq.html
The WWW Security FAQ, by Lincoln Stein http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html
World Wide Web FAQ, by Thomas Boutell http://www.boutell.com/faq/
make test TEST_VERBOSE=1 along with perl -V.
perl program 2>diag.out
splain [-v] [-p] diag.out
or change your program to explain the messages for you:
use diagnostics;
or
use diagnostics -verbose;
oct or hex if you want the values converted.
oct interprets both hex (``0x350'') numbers and octal ones
(``0350'' or even without the leading ``0'', like ``377''), while
hex only converts hexadecimal ones, with or without a leading
``0x'', like ``0x255'', ``3A'', ``ff'', or ``deadbeef''.
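A quick sketch of the conversions just described, using the same sample strings:

```perl
use strict;
use warnings;

# oct() accepts octal, and also hex when the string starts with "0x".
my $from_hex_via_oct = oct("0x350");    # 848
my $from_octal       = oct("0350");     # 232
my $octal_no_zero    = oct("377");      # 255

# hex() accepts hex digits with or without a leading "0x".
my @hex_values = map { hex($_) } ("0x255", "3A", "ff", "deadbeef");

print "$from_hex_via_oct $from_octal $octal_no_zero\n";
print "@hex_values\n";    # 597 58 255 3735928559
```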
This problem shows up most often when people try using chmod,
mkdir, umask, or sysopen, which all
want permissions in octal.
chmod(644, $file); # WRONG -- perl -w catches this
chmod(0644, $file); # right
sprintf or
printf is usually the easiest route.
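For example, assuming you want a fixed number of decimal places (the values here are arbitrary):

```perl
use strict;
use warnings;

my $pi = 3.14159265;

# "%.2f" rounds to two places after the decimal point and returns a string.
my $two_places = sprintf("%.2f", $pi);    # "3.14"
my $no_places  = sprintf("%.0f", 7.8);    # "8"

printf("%.3f\n", $pi);                    # prints 3.142
print "$two_places $no_places\n";
```

Note that sprintf returns a string, and that "halfway" inputs such as 2.675 may round down because they have no exact binary representation.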
The POSIX module (part of the standard perl distribution) implements
ceil, floor, and a number of other mathematical
and trigonometric functions.
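A small sketch comparing ceil and floor with plain int, which truncates toward zero:

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

# ceil rounds toward +infinity, floor toward -infinity.
my @up   = map { ceil($_)  } (3.2, -3.2);   # (4, -3)
my @down = map { floor($_) } (3.2, -3.2);   # (3, -4)

# int() simply truncates toward zero, which differs for negatives.
my @trunc = map { int($_) } (3.2, -3.2);    # (3, -3)

print "ceil: @up  floor: @down  int: @trunc\n";
```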
The Math::Complex module (part of the standard perl distribution) defines a number of mathematical functions that can also work on real numbers. It's not as efficient as the POSIX library, but the POSIX library can't work with complex numbers.
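A minimal sketch, assuming you want sqrt to return a complex result for negative input rather than dying. Math::Complex exports its own sqrt along with the Re and Im accessors.

```perl
use strict;
use warnings;
use Math::Complex;    # overrides sqrt, log, etc. with complex-aware versions

my $z = sqrt(-4);     # a complex number instead of a fatal error
print "sqrt(-4) = $z\n";
printf "real part %g, imaginary part %g\n", Re($z), Im($z);

# The same functions still work on ordinary reals.
my $r = sqrt(16);
print "sqrt(16) = $r\n";
```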
Rounding in financial applications can have serious implications, and the rounding method used should be specified precisely. In these cases, it probably pays not to trust whichever system rounding is being used by Perl, but to instead implement the rounding function you need yourself.
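One possible approach is sketched below, under the assumption that amounts are kept as integers in a smaller unit (mills, here) and that the required rule is "round half away from zero". The helper name is hypothetical, and other rules, such as round-half-to-even, would need different code.

```perl
use strict;
use warnings;

# Hypothetical helper: divide integer $n by positive integer $d and round
# half away from zero, using only exact integer arithmetic.
sub round_div_half_away {
    my ($n, $d) = @_;
    my $r = abs($n) % $d;            # remainder, 0 .. $d-1
    my $q = (abs($n) - $r) / $d;     # exact integer quotient
    $q++ if 2 * $r >= $d;            # halfway or more: round away from zero
    return $n < 0 ? -$q : $q;
}

# Amounts in mills (thousandths of a unit), rounded to cents.
my @mills = (2675, 2674, -2675);
my @cents = map { round_div_half_away($_, 10) } @mills;
print "@cents\n";    # 268 267 -268
```

Keeping money in integer units sidesteps the binary floating-point surprises that make halfway cases unreliable.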
pack function (documented in
pack):
$decimal = ord(pack('B8', '10110110'));
Here's an example of going the other way:
$binary_string = unpack('B*', "\x29");
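Both directions are shown together below in one runnable sketch. The oct('0b...') and %b forms are shortcuts that only later perls (5.6 and up) understand.

```perl
use strict;
use warnings;

# Binary string -> number: pack the bits into a byte, then take its ord.
my $decimal = ord(pack('B8', '10110110'));    # 182

# Number -> binary string: unpack the bits of a one-byte string.
my $bits = unpack('B8', chr(0x29));           # "00101001"

# Later perls (5.6 and up) also accept these shortcuts:
my $decimal2 = oct('0b10110110');             # 182
my $bits2    = sprintf('%08b', 0x29);         # "00101001"

print "$decimal $bits $decimal2 $bits2\n";
```

Note that pack alone returns a one-byte string, not a number; the ord call is what turns it into 182.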
@results = map { my_func($_) } @array;
For example:
@triple = map { 3 * $_ } @single;
To call a function on each element of an array, but ignore the results:
foreach my $item (@array) {
    my_func($item);
}
To call a function on each integer in a (small) range, you can use:
@results = map { my_func($_) } (5 .. 25);
but you should be aware that the .. operator here builds a list of all the integers in the range. This can take a lot of memory for large ranges. Instead use:
@results = ();
for (my $i = 5; $i < 500_005; $i++) {
    push(@results, my_func($i));
}
You should also check out the Math::TrulyRandom module from CPAN.
The day of the year is element 7 of the list returned by localtime (see
localtime):
$day_of_year = (localtime(time()))[7];
or more legibly (in 5.004 or higher):
use Time::localtime;
$day_of_year = localtime(time())->yday;
You can find the week of the year by dividing this by 7:
$week_of_year = int($day_of_year / 7);
Of course, this believes that weeks start at zero.
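A reproducible sketch of the same arithmetic. It uses gmtime on a fixed timestamp instead of localtime only so the result does not depend on the machine's time zone:

```perl
use strict;
use warnings;

# 45 days after the epoch is 15 February 1970 (UTC).
my $when = 45 * 24 * 60 * 60;

my $day_of_year  = (gmtime($when))[7];      # 0-based: 1 January is day 0
my $week_of_year = int($day_of_year / 7);   # weeks counted from 0, as above

print "day $day_of_year, week $week_of_year\n";
```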
When gmtime and localtime are used in a scalar
context they return a timestamp string that contains a fully-expanded year.
For example,
$timestamp = gmtime(1005613200);
sets $timestamp to ``Tue Nov 13 01:00:00 2001''. There's no
year 2000 problem here.
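The trap is in list context, where the year element counts years since 1900 and has to be adjusted by hand. A sketch contrasting the two contexts:

```perl
use strict;
use warnings;

# Scalar context: a ctime(3)-style string with the full four-digit year.
my $timestamp = gmtime(1005613200);
print "$timestamp\n";                        # Tue Nov 13 01:00:00 2001

# List context: element 5 is years since 1900, so add 1900 yourself.
my $year = (gmtime(1005613200))[5] + 1900;
print "$year\n";                             # 2001
```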
s/\\(.)/$1/g;
Note that this won't expand \n or \t or any other special escapes.
s/(.)\1/$1/g;
print "My sub returned @{[mysub(1,2,3)]} that time.\n";
If you prefer scalar context, similar chicanery is also useful for arbitrary expressions:
print "That yields ${\($n + 5)} widgets\n";
If you are matching between two single characters, /x([^x]*)x/ will get the intervening bits in $1. For multi-character delimiters, something more
like /alpha(.*?)omega/ would be needed. But none of these deals with nested patterns, nor can
they. For that you'll have to write a parser.
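Both forms in action, as a sketch. The non-greedy (.*?) matters in the second case: with a greedy .* a string holding several alpha...omega pairs would be captured as one long match:

```perl
use strict;
use warnings;

my $single = "fooxbarxbaz";
my ($between_x) = $single =~ /x([^x]*)x/;           # "bar"

my $multi = "one alpha two omega three";
my ($between_words) = $multi =~ /alpha(.*?)omega/;  # " two "

print "[$between_x] [$between_words]\n";
```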
Use reverse in a scalar context, as documented in
reverse.
$reversed = reverse $string;
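The context is what matters: in list context reverse reverses the order of a list, so a single string comes back unchanged. This bites most often inside print, which supplies list context. A sketch:

```perl
use strict;
use warnings;

my $string = "hello";

my $reversed = reverse $string;       # scalar context: "olleh"
my @list     = reverse $string;       # list context: a one-element list, unchanged
print scalar(reverse $string), "\n";  # force scalar context inside print
```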
1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
Or you can just use the Text::Tabs module (part of the standard perl distribution).
use Text::Tabs;
@expanded_lines = expand(@lines_with_tabs);
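A sketch checking that the module and the hand-rolled substitution agree, assuming the default tab stop of 8 ($Text::Tabs::tabstop):

```perl
use strict;
use warnings;
use Text::Tabs;

my @lines_with_tabs = ("a\tb", "abcdefghij\tk");
my @expanded_lines  = expand(@lines_with_tabs);

# The substitution from above, applied to a copy of each line.
my @by_regex = map {
    my $string = $_;
    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
    $string;
} @lines_with_tabs;

print "$_|\n" for @expanded_lines;
```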
use Text::Wrap;
print wrap("\t", ' ', @paragraphs);
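To control the width, set $Text::Wrap::columns; according to the module's documentation, each resulting line is at most $columns - 1 characters long. A sketch (the 30-column width and the sample text are arbitrary):

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap $columns);

$columns = 30;    # default is 76

my @paragraphs = ("Perl's Text::Wrap module breaks long text into lines at word boundaries.");

# First argument indents the first line, second indents the following lines.
my $wrapped = wrap('    ', '  ', @paragraphs);
print "$wrapped\n";
```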
$first_byte = substr($a, 0, 1);
If you want to modify part of a string, the simplest way is often to use
substr as an lvalue:
substr($a, 0, 3) = "Tom";
Although those with a regexp kind of thought process will likely prefer
$a =~ s/^.../Tom/;
$count = 0;
s{((whom?)ever)}{
++$count == 5 # is it the 5th?
? "${2}soever" # yes, swap
: $1 # renege and leave it there
}igex;
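The same counting trick in a smaller form. The /e modifier evaluates the replacement as code for every match, so each occurrence can decide whether it is the one to change; matches that are not the Nth simply put back what they matched:

```perl
use strict;
use warnings;

my $text = "a-b-c-d-e";
my $n    = 0;

# Replace only the second hyphen; all others are written back unchanged.
(my $changed = $text) =~ s{-}{ ++$n == 2 ? "+" : "-" }ge;

print "$changed\n";    # a-b+c-d-e
```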
$string = "ThisXlineXhasXsomeXx'sXinXit";
$count = ($string =~ tr/X//);
print "There are $count X characters in the string";
This is fine if you are just looking for a single character. However, if
you are trying to count multiple character substrings within a larger
string, tr/// won't work. What you can do is wrap a while loop around a
global pattern match. For example, let's count negative integers:
$string = "-9 55 48 -2 23 -76 4 14 -44";
$count = 0;
while ($string =~ /-\d+/g) { $count++ }
print "There are $count negative numbers in the string";
To make the first letter of each word upper case:
$line =~ s/\b(\w)/\U$1/g;
To make the whole line upper case: $line = uc($line);
To force each word to be lower case, with the first letter upper case:
$line =~ s/(\w+)/\u\L$1/g;
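A sketch of the last form on mixed-case input. Note a limitation of using \w to define a "word": an apostrophe ends the word, so the letter after it gets capitalized too:

```perl
use strict;
use warnings;

my $line = "hELLO wORLD, don't panic";
(my $titled = $line) =~ s/(\w+)/\u\L$1/g;

print "$titled\n";    # Hello World, Don'T Panic
```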
SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
Due to the restriction of the quotes, this is a fairly complex problem. Thankfully, we have Jeffrey Friedl, author of a highly recommended book on regular expressions, to handle these for us. He suggests (assuming your string is contained in $text):
@new = ();
push(@new, $+) while $text =~ m{
"([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
| ([^,]+),?
| ,
}gx;
push(@new, undef) if substr($text,-1,1) eq ',';
Alternatively, the Text::ParseWords module (part of the standard perl distribution) lets you say:
use Text::ParseWords;
@new = quotewords(",", 0, $text);
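Applied to the sample line above, quotewords with a keep flag of 0 strips the quotes and leaves embedded commas inside their fields. A sketch:

```perl
use strict;
use warnings;
use Text::ParseWords;

my $text = q{SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"};

# Split on commas, honoring double quotes; 0 means "don't keep the quotes".
my @new = quotewords(",", 0, $text);

print scalar(@new), " fields\n";
print "[$_]\n" for @new;
```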
$string =~ s/^\s*(.*?)\s*$/$1/;
It would be faster to do this in two steps:
$string =~ s/^\s+//;
$string =~ s/\s+$//;
Or more nicely written as:
for ($string) {
s/^\s+//;
s/\s+$//;
}
Use substr or unpack, both documented in the perlfunc manpage.
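A sketch on a hypothetical fixed-width record: a 5-character code, 3 filler characters, then a 4-character year. unpack describes the whole layout at once, while substr pulls out a single column by offset and length:

```perl
use strict;
use warnings;

# Hypothetical fixed-width record.
my $record = "AB123---1999";

# "A5" takes 5 characters (trailing spaces stripped), "x3" skips 3.
my ($code, $year) = unpack("A5 x3 A4", $record);

# The year column again with substr: offset 8, length 4.
my $year_again = substr($record, 8, 4);

print "$code $year $year_again\n";
```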
$text = 'this has a $foo in it and a $bar';
$text =~ s/\$(\w+)/${$1}/g;
Before version 5 of perl, this had to be done with a double-eval substitution:
$text =~ s/(\$\w+)/$1/eeg;
Which is bizarre enough that you'll probably actually need an EEG afterwards. :-)
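A common alternative (a different technique from the two above, not a drop-in equivalent) is to keep the values in a hash and look the names up there. This works under use strict and cannot reach arbitrary global variables. A sketch with a hypothetical %vars:

```perl
use strict;
use warnings;

# Values live in a hypothetical hash rather than in package variables.
my %vars = (foo => "FOO", bar => "BAR");

my $text = 'this has a $foo in it and a $bar';

# Unknown names are written back as-is instead of being blanked.
$text =~ s/\$(\w+)/exists $vars{$1} ? $vars{$1} : "\$$1"/ge;

print "$text\n";    # this has a FOO in it and a BAR
```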
If you get used to writing odd things like these:
print "$var"; # BAD
$new = "$old"; # BAD
somefunc("$var"); # BAD
You'll be in trouble. Those should (in 99.8% of the cases) be the simpler and more direct:
print $var;
$new = $old;
somefunc($var);
Otherwise, besides slowing you down, you're going to break code when the thing in the scalar is actually neither a string nor a number, but a reference:
func(\@array);
sub func {
my $aref = shift;
my $oref = "$aref"; # WRONG
}
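To see what goes wrong: the quoted copy is only a string such as "ARRAY(0x...)", and under use strict it can no longer be dereferenced. A sketch:

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $oref  = "$aref";          # just text like "ARRAY(0x...)"

print ref($aref) ? "real reference\n" : "not a reference\n";
print ref($oref) ? "real reference\n" : "not a reference\n";

# Dereferencing the string dies under "strict refs".
my $ok = eval { my @copy = @$oref; 1 };
print "dereferencing the string failed: $@" unless $ok;
```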
You can also get into subtle problems on those few operations in Perl that
actually do care about the difference between a string and a number, such
as the magical ++ autoincrement operator or the syscall function.
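For example, ++ on a string that matches /^[a-zA-Z]*[0-9]*$/ and has only been used as a string increments it "alphabetically", with carry. A sketch:

```perl
use strict;
use warnings;

my $id  = "aa9";  $id++;     # "ab0": 9 wraps to 0 and carries into the letters
my $col = "Az";   $col++;    # "Ba"
my $big = "zz";   $big++;    # "aaa": the string grows when the carry runs out
my $mix = "a9";   $mix++;    # "b0"

print "$id $col $big $mix\n";
```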
Sometimes it doesn't make a difference, but sometimes it does. For example, compare:
$good[0] = `some program that outputs several lines`;
with
@bad[0] = `same program that outputs several lines`;
The -w flag will warn you about these matters.
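The difference can be reproduced without an external program. In this sketch, the hypothetical sub `lines_or_text` stands in for the backticks: it returns one string in scalar context and a list of lines in list context. Expect perl's "better written as" warning on the slice line, which is exactly the diagnostic mentioned above:

```perl
use strict;
use warnings;

# Hypothetical stand-in for `command`: context decides the return shape.
sub lines_or_text {
    return wantarray ? ("line 1\n", "line 2\n") : "line 1\nline 2\n";
}

my (@good, @bad);
$good[0] = lines_or_text();    # scalar context: all text in one element
@bad[0]  = lines_or_text();    # list context: only the first line is kept

print "good: $good[0]bad: $bad[0]";
```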
$prev = 'nonesuch';
@out = grep($_ ne $prev && ($prev = $_), @in);
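This is nice in that it doesn't use much extra memory, simulating
uniq's behavior of removing only adjacent duplicates. It's less nice in that it silently drops false values such as 0 or the empty string, because the assignment ($prev = $_) then evaluates as false.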
undef %saw;
@out = grep(!$saw{$_}++, @in);
@out = grep(!$saw[$_]++, @in);
undef %saw;
@saw{@in} = ();
@out = sort keys %saw; # remove sort if undesired
undef @ary;
@ary[@in] = @in;
@out = grep { defined } @ary;
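The %saw idiom wrapped in a reusable sub, as a sketch; the name `uniq_in_order` is hypothetical. It keeps the first occurrence of each value in its original position, and unlike the $prev version it handles "0" and "" correctly:

```perl
use strict;
use warnings;

# Hypothetical helper: remove duplicates, preserving first-seen order.
sub uniq_in_order {
    my %saw;
    return grep { !$saw{$_}++ } @_;
}

my @out = uniq_in_order(3, 1, 3, 0, 2, 1, 0);
print "@out\n";    # 3 1 0 2
```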
@blues = qw/azure cerulean teal turquoise lapis-lazuli/;
undef %is_blue;
for (@blues) { $is_blue{$_} = 1 }
Now you can check whether $is_blue{$some_color}. It might have been a good idea to keep the blues all in a hash in the first place.
If the values are all small integers, you could use a simple indexed array. This kind of an array will take up less space:
@primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
undef @is_tiny_prime;
for (@primes) { $is_tiny_prime[$_] = 1; }
Now you check whether $is_tiny_prime[$some_number].
If the values in question are integers instead of strings, you can save quite a lot of space by using bit strings instead:
@articles = ( 1..10, 150..2000, 2017 );
undef $read;
foreach (@articles) { vec($read,$_,1) = 1 }
Now check whether vec($read,$n,1) is true for some $n.
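A complete version of the bit-string approach, as a sketch. The whole set of article numbers up to 2017 fits in 253 bytes, one bit per possible number:

```perl
use strict;
use warnings;

my @articles = ( 1..10, 150..2000, 2017 );

my $read = '';
vec($read, $_, 1) = 1 for @articles;     # set one bit per article number

for my $n (7, 11, 2017) {
    printf "%4d %s\n", $n, vec($read, $n, 1) ? "read" : "unread";
}
printf "storage: %d bytes\n", length $read;   # bits 0..2017 need 253 bytes
```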
Please do not use
$is_there = grep $_ eq $whatever, @array;
or worse yet
$is_there = grep /$whatever/, @array;
These are slow (checks every element even if the first matches), inefficient (same reason), and potentially buggy (what if there are regexp characters in $whatever?).
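If you only need a one-off test, a search that stops at the first hit and compares with eq avoids both problems. This sketch swaps in first from List::Util (bundled with perl since 5.8); for repeated lookups, the hash approach above is still better:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array    = qw(apple axb cherry);
my $whatever = "a.b";

# first() stops at the first true result; eq compares literally,
# so the "." in $whatever is not treated as a regex wildcard.
my $found    = first { $_ eq $whatever } @array;
my $is_there = defined $found;

# The regex version wrongly matches "axb".
my $regex_hits = grep /$whatever/, @array;

print $is_there ? "found\n" : "not found\n";
```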
@union = @intersection = @difference = ();
%count = ();
foreach $element (@array1, @array2) { $count{$element}++ }
foreach $element (keys %count) {
push @union, $element;
push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}
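The same loop with sample data, as a sketch. It assumes that no value is repeated within either array (otherwise the counts are wrong), and "difference" here means the symmetric difference: elements in one array but not both. Hash order is arbitrary, so the results are sorted before printing:

```perl
use strict;
use warnings;

my @array1 = (1, 2, 3, 4);
my @array2 = (3, 4, 5);

my (@union, @intersection, @difference);
my %count;
foreach my $element (@array1, @array2) { $count{$element}++ }
foreach my $element (keys %count) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

print "union:        @{[ sort { $a <=> $b } @union ]}\n";
print "intersection: @{[ sort { $a <=> $b } @intersection ]}\n";
print "difference:   @{[ sort { $a <=> $b } @difference ]}\n";
```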
for ($i=0; $i < @array; $i++) {
if ($array[$i] eq "Waldo") {
$found_index = $i;
last;
}
}
Now $found_index has what you want.
If you really, really wanted, you could use structures as described in the perldsc manpage or the perltoot manpage and do just what the algorithm book tells you to do.
unshift(@array, pop(@array)); # the last shall be first
push(@array, shift(@array)); # and vice versa
srand;
@new = ();
@old = 1 .. 10; # just a demo
while (@old) {
push(@new, splice(@old, rand @old, 1));
}
For large arrays, this avoids a lot of the reshuffling:
srand;
@new = ();
@old = 1 .. 10000; # just a demo
for( @old ){
my $r = rand @new+1;
push(@new,$new[$r]);
$new[$r] = $_;
}
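Another option is the in-place Fisher-Yates shuffle, which needs no second array. This is a sketch; the sub name is hypothetical, and newer perls also ship List::Util's shuffle, which does the same job:

```perl
use strict;
use warnings;

# Hypothetical helper: shuffle an array in place, given a reference to it.
# Walk from the last slot down, swapping each slot with a randomly
# chosen slot at or before it.
sub fisher_yates_shuffle {
    my $array = shift;
    for (my $i = @$array - 1; $i > 0; $i--) {
        my $j = int rand($i + 1);
        @$array[$i, $j] = @$array[$j, $i];
    }
}

my @deck = 1 .. 10;
fisher_yates_shuffle(\@deck);
print "@deck\n";
```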
Use for/foreach:
for (@lines) {
s/foo/bar/;
tr[a-z][A-Z];
}
Here's another; let's compute spherical volumes:
for (@radii) {
$_ **= 3;
$_ *= (4/3) * 3.14159; # this will be constant folded
}
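This works because inside the loop $_ is an alias to each element, not a copy, so assigning to it changes the array itself. A runnable sketch with two sample radii:

```perl
use strict;
use warnings;

my @radii = (1, 2);
for (@radii) {                    # $_ aliases each element in turn
    $_ **= 3;
    $_ *= (4/3) * 3.14159;
}
printf "%.4f %.4f\n", @radii;     # the array now holds the volumes
```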
Use the rand function (see rand):
srand; # not needed for 5.004 and later
$index = rand @array;
$element = $array[$index];
The following permut function should work on any list:
#!/usr/bin/perl -n
# permute - tchrist@perl.com
permut([split], []);
sub permut {
my @head = @{ $_[0] };
my @tail = @{ $_[1] };
unless (@head) {
# stop recursing when there are no elements in the head
print "@tail\n";
} else {
# for all elements in @head, move one from @head to @tail
# and call permut() on the new @head and @tail
my(@newhead,@newtail,$i);
foreach $i (0 .. $#head) {
@newhead = @head;
@newtail = @tail;
unshift(@newtail, splice(@newhead, $i, 1));
permut([@newhead], [@newtail]);
}
}
}
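If you need the permutations as data rather than printed lines, a variation can return them. This is a sketch of the same head/tail recursion, not the original program; the sub name is hypothetical and each result is an array reference:

```perl
use strict;
use warnings;

# Hypothetical variation on permut(): collect permutations instead of printing.
sub permutations {
    my @items = @_;
    return ([]) unless @items;             # exactly one permutation of nothing
    my @result;
    for my $i (0 .. $#items) {
        my @rest = @items;
        my ($picked) = splice(@rest, $i, 1);
        push @result, [ $picked, @$_ ] for permutations(@rest);
    }
    return @result;
}

print "@$_\n" for permutations(qw(a b c));
```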
Give sort (described in sort) a numeric comparison block:
@list = sort { $a <=> $b } @list;
The default sort function is cmp, string comparison, which would sort (1, 2, 10) into (1, 10, 2). <=>, used above, is the numerical comparison operator.
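The difference in a few lines, as a sketch; swapping $a and $b reverses the order:

```perl
use strict;
use warnings;

my @list = (1, 2, 10);

my @as_strings = sort @list;                  # (1, 10, 2): cmp compares text
my @as_numbers = sort { $a <=> $b } @list;    # (1, 2, 10)
my @descending = sort { $b <=> $a } @list;    # (10, 2, 1)

print "@as_strings | @as_numbers | @descending\n";
```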
If you have a complicated function needed to pull out the part you want to sort on, then don't do it inside the sort function. Pull it out first, because the sort BLOCK can be called many times for the same element. Here's an example of how to pull out the first word after the first number on each item, and then sort those words case-insensitively.
@idx = ();
for (@data) {
($item) = /\d+\s*(\S+)/;
push @idx, uc($item);
}
@sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
Which could also be written this way, using a trick that's come to be known as the Schwartzian Transform:
@sorted = map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { [ $_, uc((/\d+\s*(\S+)/)[0]) ] } @data;
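A runnable version with some hypothetical records. Read it bottom-up: pair each item with its key, sort on the key, then strip the key off again:

```perl
use strict;
use warnings;

# Hypothetical records: a number followed by the word to sort on.
my @data = ("10 banana split", "3 Apple pie", "7 cherry tart");

my @sorted = map  { $_->[0] }                                  # 3. keep the item
             sort { $a->[1] cmp $b->[1] }                      # 2. sort on the key
             map  { [ $_, uc((/\d+\s*(\S+)/)[0]) ] } @data;    # 1. item + key

print "$_\n" for @sorted;
```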
If you need to sort on several fields, the following paradigm is useful.
@sorted = sort { field1($a) <=> field1($b) ||
field2($a) cmp field2($b) ||
field3($a) cmp field3($b)
} @data;
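A concrete sketch of the paradigm, using hypothetical hash records sorted by age and then by name. Because || only evaluates its right side when the left is 0, later fields act purely as tie-breakers:

```perl
use strict;
use warnings;

my @people = (
    { name => "carol", age => 30 },
    { name => "alice", age => 25 },
    { name => "bob",   age => 30 },
);

my @sorted = sort {
    $a->{age}  <=> $b->{age}      # the first field decides...
        ||
    $a->{name} cmp $b->{name}     # ...and only ties fall through to this one
} @people;

print join(" ", map { $_->{name} } @sorted), "\n";    # alice bob carol
```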
This can be conveniently combined with precalculation of keys as given above.
See http://www.perl.com/CPAN/doc/FMTEYEWTK/sort.html for more about this approach.
See also the question below on sorting hashes.
Use pack and unpack, or else vec and
the bitwise operations.
For example, this sets bit N of $vec for each number N in @ints:
$vec = '';
foreach(@ints) { vec($vec,$_,1) = 1 }
And here's how, given a vector in $vec, you can get those bits into your
@ints array:
sub bitvec_to_list {
my $vec = shift;
my @ints;
# Find null-byte density then select best algorithm
if ($vec =~ tr/\0// / length $vec > 0.95) {
use integer;
my $i;
# This method is faster with mostly null-bytes
while($vec =~ /[^\0]/g ) {
$i = -9 + 8 * pos $vec;
push @ints, $i if vec($vec, ++$i, 1);
push @ints, $i if vec($vec, ++$i, 1);
push @ints, $i if vec($vec, ++$i, 1);
push @ints, $i if vec($vec, ++$i, 1);
push @ints, $i if vec($vec, ++$i, 1);
push @ints, $i if vec($vec, ++$i, 1);
push @ints, $i if vec($vec, ++$i, 1);
push @ints, $i if vec($vec, ++$i, 1);
}
} else {
# This method is a fast general algorithm
use integer;
my $bits = unpack "b*", $vec;
push @ints, 0 if $bits =~ s/^(\d)// && $1;
push @ints, pos $bits while($bits =~ /1/g);
}
return \@ints;
}
This method gets faster the more sparse the bit vector is. (Courtesy of Tim Bunce and Winfried Koenig.)
Use the each function (see each) if you don't care whether it's sorted:
while (($key,$value) = each %hash) {
print "$key = $value\n";
}
If you want it sorted, you'll have to use foreach on the
result of sorting the keys as shown in an earlier question.
%by_value = reverse %by_key;
$key = $by_value{$value};
That's not particularly efficient. It would be more space-efficient to use:
while (($key, $value) = each %by_key) {
$by_value{$value} = $key;
}
If your hash could have repeated values, the methods above will only find one of the associated keys. This may or may not worry you.
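If the duplicates do matter, one approach is to collect every key for each value in an array. A sketch with hypothetical data:

```perl
use strict;
use warnings;

my %by_key = (apple => "red", banana => "yellow", cherry => "red");

# Map each value to the list of all keys that had it.
my %keys_for_value;
while (my ($key, $value) = each %by_key) {
    push @{ $keys_for_value{$value} }, $key;
}

print "$_: @{[ sort @{ $keys_for_value{$_} } ]}\n" for sort keys %keys_for_value;
```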
To find out how many keys a hash has, use the keys function in scalar context:
$num_keys = scalar keys %hash;
In void context it just resets the iterator, which is faster for tied hashes.
@keys = sort keys %hash; # sorted by key
@keys = sort {
$hash{$a} cmp $hash{$b}
} keys %hash; # and by value
Here we'll do a reverse numeric sort by value, and if two keys are identical, sort by length of key, and if that fails, by straight ASCII comparison of the keys (well, possibly modified by your locale -- see the perllocale manpage).
@keys = sort {
$hash{$b} <=> $hash{$a}
||
length($b) <=> length($a)
||
$a cmp $b
} keys %hash;
You can look into using the DB_File module and tie using the
$DB_BTREE hash bindings as documented in ``In Memory Databases'' in DB_File.
If $key is present in the hash %array, exists $array{$key} will return true. The value for a given key can be undef, in which case $array{$key} will be
undef while exists $array{$key} will return true. This corresponds to ($key, undef) being in the hash.
Pictures help... here's the %ary table:
 keys   values
+------+------+
|  a   |  3   |
|  x   |  7   |
|  d   |  0   |
|  e   |  2   |
+------+------+
And these conditions hold
$ary{'a'} is true
$ary{'d'} is false
defined $ary{'d'} is true
defined $ary{'a'} is true
exists $ary{'a'} is true (perl5 only)
grep ($_ eq 'a', keys %ary) is true
If you now say
undef $ary{'a'}
your table now reads:
 keys   values
+------+------+
|  a   | undef|
|  x   |  7   |
|  d   |  0   |
|  e   |  2   |
+------+------+
and these conditions now hold; changes in caps:
$ary{'a'} is FALSE
$ary{'d'} is false
defined $ary{'d'} is true
defined $ary{'a'} is FALSE
exists $ary{'a'} is true (perl5 only)
grep ($_ eq 'a', keys %ary) is true
Notice the last two: you have an undef value, but a defined key!
Now, consider this:
delete $ary{'a'}
your table now reads:
 keys   values
+------+------+
|  x   |  7   |
|  d   |  0   |
|  e   |  2   |
+------+------+
and these conditions now hold; changes in caps:
$ary{'a'} is false
$ary{'d'} is false
defined $ary{'d'} is true
defined $ary{'a'} is false
exists $ary{'a'} is FALSE (perl5 only)
grep ($_ eq 'a', keys %ary) is FALSE
See, the whole entry is gone!
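The same sequence as a runnable sketch, checking exists and defined for 'a' after each step:

```perl
use strict;
use warnings;

my %ary = (a => 3, x => 7, d => 0, e => 2);

undef $ary{'a'};                        # the key stays, its value becomes undef
my $after_undef = join ",",
    (exists $ary{'a'} ? 1 : 0), (defined $ary{'a'} ? 1 : 0);

delete $ary{'a'};                       # the key and value are both gone
my $after_delete = join ",",
    (exists $ary{'a'} ? 1 : 0), (defined $ary{'a'} ? 1 : 0);

print "after undef:  exists,defined = $after_undef\n";    # 1,0
print "after delete: exists,defined = $after_delete\n";   # 0,0
```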
Tied hashes may or may not implement the EXISTS and
DEFINED methods differently. For example, there isn't the
concept of undef with hashes that are tied to DBM* files. This means the
true/false tables above will give different results when used on such a
hash. It also means that exists and defined do the same thing with a DBM*
file, and what they end u