Baseball

The BaseballDatabank database on statdocs.berkeley.edu contains 21 relations/tables. These provide information about major league baseball teams and players for the years 1884 through 2003. The database comes from http://baseball1.info and was created and licensed by Sean Lahman. We are entitled to use it for research use and cannot distribute it further. And we are grateful for his efforts in compiling and managing this database.

The attributes in the different relations are described in Baseball Archive documentation.

Other sites for baseball data include

The goal of this homework assignment is to illustrate aspects of using a relational database, and understanding how and when to perform some commands in a database and others in R.

To help you gain familiarity with using a database, answer each of the questions below. Provide a history of your R session used to answer the questions, the requisite plots, and answers to the questions below.

  1. Use the table that contains salaries and compute the payroll for each team in 1999. Which teams had the highest payrolls? Were their payrolls much higher than other teams?
  2. Now modify the above SQL statement to find the team payrolls for all years.
  3. Study the change in salary over time. Use the annual inflation rates (they are relative to the dollar in 1985) below to control for inflation. Have salaries kept up with inflation, fallen behind, or grown faster? Use the bwplot function in the lattice package to compare payrolls over years.
    85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 00 01 02 03
    --- 1.91 3.66 4.08 4.83 5.39 4.25 3.03 2.98 2.61 2.81 2.93 2.34 1.55 2.19 3.38 2.83 1.59 2.27
  4. Have certain teams always had top payrolls over the years? Show this graphically.
  5. Augment the SQL statment above to include the following team statistics for each season: the number of games in the season, the number games won in a season, the information as to whether the team won the division, wild card, league, or world series. To do this you will need to join the table with the salaries with the team table.