Doing calculations on subgroups of data

Suppose you want to calculate the mean within each group, where the group is defined by a column (field) in your data frame. First use the split() function to split the data by group; the output is a list with one element per group. Then use lapply() to perform the calculation on each element of the list, or sapply() if you want the results simplified to a vector. You can also create discrete groups from a continuous variable using the cut() function.
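As a minimal sketch of the split-then-apply approach, assuming a small made-up data frame (the name dat and the columns group and value are hypothetical, chosen only for illustration):

```r
# Hypothetical data frame; 'group' and 'value' are made-up column names
dat <- data.frame(group = c("a", "b", "a", "b", "c"),
                  value = c(1, 2, 3, 4, 5))

# split() returns a list with one element per level of the grouping column
byGroup <- split(dat$value, dat$group)

# lapply() applies mean() to each list element and returns a list
groupMeans <- lapply(byGroup, mean)
```

Here groupMeans is a list whose names are the group labels; unlist(groupMeans) would turn it into a named vector.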

Here's an example. Suppose I have a vector housePrice and a vector income, where the observations are the house price and income for a number of households. To calculate the median house price for people with similar incomes, I can bin income into brackets with cut(), split housePrice by those brackets, and take the median within each bracket.
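One way this could look is sketched below. The data values and the income breaks are invented for illustration, and the variable names incomeGroup and medianByIncome are assumptions rather than anything from the original example.

```r
# Made-up values purely for illustration
income <- c(25000, 32000, 48000, 51000, 75000, 82000)
housePrice <- c(100000, 120000, 200000, 220000, 400000, 380000)

# Assumed income brackets; choose breaks that suit your own data
incomeGroup <- cut(income, breaks = c(0, 40000, 60000, 100000))

# Split house prices by income bracket, then take the median in each bracket
medianByIncome <- sapply(split(housePrice, incomeGroup), median)
```

The result is a named numeric vector with one median per income bracket, in the order of the brackets.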

You can do something similar using the aggregate() function, which takes the data, a list of grouping variables, and the function to apply to each group.
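A sketch of the aggregate() version, again with invented data and assumed income breaks (the names incomeGroup and result are hypothetical):

```r
# Made-up values purely for illustration
income <- c(25000, 32000, 48000, 51000, 75000, 82000)
housePrice <- c(100000, 120000, 200000, 220000, 400000, 380000)

# Assumed income brackets; choose breaks that suit your own data
incomeGroup <- cut(income, breaks = c(0, 40000, 60000, 100000))

# aggregate() wants the grouping variables as a list; naming the list
# element gives the grouping column a readable name in the output
result <- aggregate(housePrice, by = list(incomeGroup = incomeGroup),
                    FUN = median)
```

Unlike the split() approach, aggregate() returns a data frame, here with one row per income bracket and the medians in a column named x.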

I haven't used it, but the reshape package looks to be useful for manipulating the dimensions of datasets.

Last modified: 12/13/08.

Chris Paciorek 2012-01-21