Baseball
The BaseballDatabank database on statdocs.berkeley.edu
contains 21 relations/tables.
These provide information about major league baseball teams and
players for the years 1884 through 2003.
The database comes from
http://baseball1.info and
was created and licensed by Sean Lahman. We are entitled to use it
for research use and cannot distribute it further. And we are
grateful for his efforts in compiling and managing this database.
The attributes in the different relations are described in
Baseball
Archive documentation.
Other sites for baseball data include
The goal of this homework assignment is to illustrate aspects of using
a relational database, and understanding how and when to perform some
commands in a database and others in R.
To help you gain familiarity with using a database, answer
each of the questions below. Provide a history of your R session used
to answer the questions, the requisite plots, and answers to
the questions below.
-
Use the table that contains salaries and compute the payroll
for each team in 1999.
Which teams had the highest payrolls?
Were their payrolls much higher than other teams?
-
Now modify the above SQL statement to find the team payrolls
for all years.
-
Study the change in salary over time. Use the annual inflation
rates below to control for inflation. Have salaries kept up
with inflation, fallen behind, or grown faster?
Use the bwplot function in the lattice package to compare
payrolls over years.
85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94
| 95 | 96 | 97 | 98 | 99 | 00 | 01 | 02 | 03
|
--- | 1.91 | 3.66 | 4.08 | 4.83 | 5.39 | 4.25 | 3.03 | 2.98 | 2.61 | 2.81 | 2.93 | 2.34 | 1.55 | 2.19 |
3.38 | 2.83 | 1.59 | 2.27
|
-
Have certain teams always had top payrolls over the years?
Show this graphically.
-
Augment the SQL statment above to include the following team statistics
for each season: the number of games in the season,
the number games won in a season, the information as to whether
the team won the division, wild card, league, or world series.
To do this you will need to join the table with the salaries
with the team table.
-
Use the xyplot in the lattice plot to explore the relationship
between payroll and performance. To do this, control for inflation by
cutting the data into six groups according to 3 year intervals,
and lay out six plots for these intervals.
Construct a factor that indicates whether the team won their
division (or wild card), league, or world series.
The options are: no title, won division or wild card but not league,
won league but not the world series, or won the world series.