Next: Accessing Email: the Net::POP3 Up: Perl on the Web Previous: Debugging CGI scripts   Contents

Web Clients: the LWP module

The LWP::Simple module exports a function called get, which accepts a single argument, the URL of the desired web page, and returns the content of that page as a scalar value (or undef if the page cannot be retrieved). Thus the following code fragment fetches the home page of the CNN web site and prints today's headline, which, as of this writing, was the only text on the page surrounded by <H3> tags:
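use LWP::Simple;
# get returns undef on failure, so check before using the result
my $page = get('http://www.cnn.com/');
die "Could not retrieve the page\n" unless defined $page;
$page =~ s#^.*<H3>(.*)</H3>.*$#$1#s;   # keep only the text inside <H3>...</H3>
print "$page\n";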
The s modifier of the substitution command is needed because, by default, the period (.) in a regular expression matches every character except newline. With the s modifier it matches newline as well, so the pattern can span the line breaks in the page. This will often be necessary when an entire document is stored in a single scalar value.
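The effect of the s modifier can be seen without fetching anything. The following is a minimal sketch that applies the same substitution, with and without the modifier, to a small made-up document; the contents of $doc are an assumption for illustration only and are not taken from any real page.

```perl
use strict;
use warnings;

# A hypothetical multi-line document held in one scalar,
# standing in for a page returned by get.
my $doc = "<HTML>\n<H3>Top\nstory</H3>\n</HTML>\n";

# Without the s modifier the period stops at each newline, so the
# pattern cannot reach the <H3> tag on the second line and nothing changes.
(my $without = $doc) =~ s#^.*<H3>(.*)</H3>.*$#$1#;

# With the s modifier the period also matches newline, so the whole
# document is replaced by the text captured between the tags.
(my $with = $doc) =~ s#^.*<H3>(.*)</H3>.*$#$1#s;

print "without s: ", ($without eq $doc ? "unchanged" : "changed"), "\n";
print "with s:    [$with]\n";
```

Because both instances of .* are greedy, the pattern picks out the last <H3>...</H3> pair in the document, which is adequate only when a page has a single such pair.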

Since it returns the entire content in a single scalar, the get function is suitable only for small web pages. To deal with larger pages, and to control headers, content types and other aspects of the HTTP protocol, the LWP::UserAgent and HTTP::Request modules should be used.
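As a rough sketch of that approach, the fragment below builds a request with an explicit header and processes the response in chunks rather than storing it in one scalar. The subroutine names build_request and count_bytes, the Accept header value, the 30-second timeout and the 4096-byte chunk size are all assumptions chosen for illustration; counting bytes simply stands in for whatever per-chunk processing is actually needed.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Build a GET request with an explicit Accept header.
# (build_request is a hypothetical helper, not part of LWP.)
sub build_request {
    my ($url) = @_;
    my $request = HTTP::Request->new(GET => $url);
    $request->header(Accept => 'text/html');
    return $request;
}

# Send the request and handle the content one chunk at a time,
# counting bytes instead of keeping the whole page in memory.
# (count_bytes is a hypothetical helper, not part of LWP.)
sub count_bytes {
    my ($request) = @_;
    my $ua    = LWP::UserAgent->new(timeout => 30);
    my $bytes = 0;
    my $response = $ua->request(
        $request,
        sub { my ($chunk) = @_; $bytes += length $chunk; },  # called for each chunk
        4096,                                                 # suggested chunk size
    );
    die "Request failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return ($bytes, scalar $response->content_type);
}

# Fetch only when a URL is given on the command line.
if (@ARGV) {
    my ($bytes, $type) = count_bytes(build_request($ARGV[0]));
    print "Received $bytes bytes of type $type\n";
}
```

Saved under a name such as fetch.pl (the name is arbitrary), this could be run as perl fetch.pl http://www.cnn.com/ to report the size and content type of the page.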


Phil Spector 2002-10-18