At http://news.ask.com/news, there are a
number of headlines broken down into categories, like "Top Stories" and
"Science & Tech".
Extract all the headlines from the page into a vector of character strings in R, one
headline per string, and print them.
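One possible starting point, sketched here on made-up lines rather than the real page: read the page with readLines(), keep the lines that match a headline pattern, and extract the tagged part with sub(). The markup and the pattern below are assumptions for illustration only; inspect the actual page source to find what really surrounds each headline.

```r
# Made-up lines standing in for readLines() on the page;
# the real markup will differ.
page <- c('<div class="cat">Top Stories</div>',
          '<a class="title" href="/a1">First headline</a>',
          '<p>other text</p>',
          '<a class="title" href="/a2">Second headline</a>')

# Assumed pattern for a headline link; the parentheses tag the text we want.
pat <- '<a class="title"[^>]*>([^<]*)</a>'

# Keep only the lines that contain a headline.
hits <- grep(pat, page, value = TRUE)

# Replace each whole line with just the tagged headline text.
headlines <- sub(paste0('.*', pat, '.*'), '\\1', hits)
print(headlines)
```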
The pdf file http://www.bea.gov/scb/pdf/2011/02 February/D Pages/0211dpg_c.pdf contains, among other things, the Gross Domestic Product of the
United States for the last 50 years. You can convert a pdf file called, say, file.pdf, to a text file
using the command:
pdftotext -layout file.pdf
which will automatically create a file called file.txt. Using this data, create
a data frame with columns for year and GDP. The pdftotext command is available on any
of the SCF computers. If you'd like to install pdftotext on your own computer, go
to the xpdf home page.
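As a rough sketch of the text-processing step, the lines of the converted file can be filtered with grep() and split into fields with strsplit(). The lines below are invented to imitate pdftotext -layout output; the real file has a different layout and real values, so the pattern and the column positions are assumptions you will need to adjust.

```r
# Made-up lines imitating pdftotext -layout output; in practice these
# would come from readLines("file.txt").
txt <- c("  Year      GDP    Other",
         "  2001    100.5     1.0",
         "  2002  1,234.5     2.0",
         "  some footnote text")

# Keep lines that begin (after optional blanks) with a four-digit year.
rows <- grep("^ *[0-9]{4} ", txt, value = TRUE)

# Drop leading blanks, then split each line on runs of spaces.
parts <- strsplit(sub("^ +", "", rows), " +")

# First field is the year, second is GDP (commas removed before conversion).
gdp <- data.frame(
  year = as.numeric(sapply(parts, "[", 1)),
  GDP  = as.numeric(gsub(",", "", sapply(parts, "[", 2)))
)
print(gdp)
```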
At http://www.treasurydirect.gov/govt/reports/pd/pd.htm, there is a set
of links entitled "Historical Debt Outstanding". These links
lead to five tables, showing dates and the amount of the national debt on
those dates.
Read the debt data from the tables that correspond to the GDP data, i.e. the last
fifty years, and merge them with
the GDP data based on the year of the data. Finally, calculate the percentage of the GDP that the
debt represents for
each year of available data, and make a plot of this percentage over time.
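The merge and percentage steps might look something like the following sketch. The two data frames hold invented numbers, and the column names (year, GDP, debt) are assumptions; with the real data, check that both amounts are in the same units before dividing.

```r
# Invented values for illustration only.
gdp  <- data.frame(year = c(2001, 2002, 2003), GDP  = c(100, 200, 400))
debt <- data.frame(year = c(2002, 2003, 2004), debt = c(50, 200, 300))

# merge() keeps only the years present in both data frames by default.
both <- merge(gdp, debt, by = "year")

# Debt as a percentage of GDP.
both$pct <- 100 * both$debt / both$GDP
print(both)

plot(both$year, both$pct, type = "b",
     xlab = "Year", ylab = "Debt as % of GDP")
```

Here only 2002 and 2003 appear in both data frames, so the merged result has two rows.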
The populations of the US states in 2004 through 2009 can be found at
http://www.infoplease.com/ipa/A0004986.html. Create a data frame with columns for the state name, 2009 population,
2004 population, and the percent change from 1990-2000. Calculate the change in population between
2004 and 2009 as a column
in the data frame, and compare that change with the one from 1990-2000. Which states had
population growth from 1990 to 2000 that differed most from their growth from 2004 to 2009?
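One way to set up the comparison is sketched below with invented state names and numbers. It assumes the 2004 to 2009 change is expressed as a percentage so that it is on the same scale as the 1990-2000 figure; note that the two periods differ in length, so you may prefer to convert both to yearly rates before comparing.

```r
# Invented data; the column names are assumptions for this sketch.
pop <- data.frame(state   = c("A", "B", "C"),
                  pop2009 = c(110, 210, 90),
                  pop2004 = c(100, 200, 100),
                  pct9000 = c(10, 2, 5),
                  stringsAsFactors = FALSE)

# Percent change from 2004 to 2009.
pop$pct0409 <- 100 * (pop$pop2009 - pop$pop2004) / pop$pop2004

# Absolute difference between the two growth figures.
pop$diff <- abs(pop$pct0409 - pop$pct9000)

# States with the largest differences first.
ranked <- pop[order(pop$diff, decreasing = TRUE), ]
print(ranked)
```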
At http://www.forbes.com/lists/2009/54/rich-list-09_The-400-Richest-Americans_Rank.html is the first page of a
list of the 400 richest Americans from 2009. The complete list of the top 400 is spread over 16 pages.
Write a program that reads all of the pages and creates a data frame with the rank, name,
net worth, age, residence, and source of wealth for the top 400 richest Americans. Who was the
youngest person on the list? What were the ten most common sources of wealth among the 400
richest Americans?
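The overall shape of such a program could be sketched as follows. The function read_page() is hypothetical: here it simply returns made-up rows so that the example runs offline, whereas a real version would build the URL for page i, download it, and parse the table. Only four of the six columns are shown.

```r
# Hypothetical stand-in for a function that downloads and parses page i.
read_page <- function(i) {
  data.frame(rank   = c(2 * i - 1, 2 * i),
             name   = paste("Person", c(2 * i - 1, 2 * i)),
             age    = c(40 + i, 70 - i),
             source = c("software",
                        ifelse(i %% 2 == 1, "retail", "software")),
             stringsAsFactors = FALSE)
}

# Read every page (3 here; 16 for the real list) and stack the results.
rich <- do.call(rbind, lapply(1:3, read_page))

# Youngest person: the row with the smallest age.
youngest <- rich[which.min(rich$age), ]
print(youngest)

# Most common sources of wealth: tabulate, sort, keep at most ten.
top_sources <- head(sort(table(rich$source), decreasing = TRUE), 10)
print(top_sources)
```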
Submit for grading all of your R code and the output requested by the
assignment, along with any output you used to answer the questions posed by the assignment.
Your submission should be contained in a single pdf file.
Email this file to me (s133@stat.berkeley.edu) by 11:59PM
on the due date. Make certain to save a copy of your email submission.
File translated from TEX by TTH, version 3.67, on 23 Feb 2011, 13:22.