------------------------------------------------------------------------
Baseball
The BaseballDatabank database on statdocs.berkeley.edu contains 21
relations/tables. These provide information about major league baseball
teams and players for the years 1884 through 2003. The database comes
from http://baseball1.info and was created and
licensed by Sean Lahman. We are entitled to use it for research use and
cannot distribute it further. And we are grateful for his efforts in
compiling and managing this database.
The attributes in the different relations are described in Baseball
Archive documentation .
Other sites for baseball data include
* http://asp.usatoday.com/sports/baseball/salaries/default.aspx
* http://www.baseball-databank.org/
The goal of this homework assignment is to illustrate aspects of using a
relational database, and understanding how and when to perform some
commands in a database and others in R.
To help you gain familiarity with using a database, answer each of the
questions below. Provide a history of your R session used to answer the
questions, the requisite plots, and answers to the questions below.
1. Use the table that contains salaries and compute the payroll for
each team in 1999. Which teams had the highest payrolls? Were
their payrolls much higher than other teams?
2. Now modify the above SQL statement to find the team payrolls for
all years.
3. Study the change in salary over time. Use the annual inflation
rates below to control for inflation. Have salaries kept up with
inflation, fallen behind, or grown faster? Use the bwplot function
in the lattice package to compare payrolls over years.
85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 00
01 02 03
--- 1.91 3.66 4.08 4.83 5.39 4.25 3.03 2.98 2.61 2.81 2.93
2.34 1.55 2.19 3.38 2.83 1.59 2.27
4. Have certain teams always had top payrolls over the years? Show
this graphically.
5. Augment the SQL statment above to include the following team
statistics for each season: the number of games in the season, the
number games won in a season, the information as to whether the
team won the division, wild card, league, or world series. To do
this you will need to join the table with the salaries with the
team table.
6. Use the xyplot in the lattice plot to explore the relationship
between payroll and performance. To do this, control for inflation
by cutting the data into six groups according to 3 year intervals,
and lay out six plots for these intervals. Construct a factor that
indicates whether the team won their division (or wild card),
league, or world series. The options are: no title, won division
or wild card but not league, won league but not the world series,
or won the world series.