Suppose you want to calculate the mean within each group, where the group is defined by a column (field) in your data frame. First use the split() function to divide the data by group; the result is a list with one element per group. Then use lapply() to apply the calculation to each element of that list. You can also create discrete groups from a continuous variable using the cut() function.
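To see what cut() does on its own, here is a tiny sketch with made-up income values (the numbers are hypothetical, for illustration only). It bins each value into an interval and returns a factor whose levels are those intervals. The include.lowest = TRUE argument is an addition here: without it, a value exactly equal to the lowest break (0) would fall outside every interval and become NA.

```r
# Hypothetical incomes (made-up values, for illustration only)
income <- c(5000, 12000, 27500, 31000, 250000)

# Bin into $10,000-wide intervals up to 100,000, plus one catch-all bin up to 500,000.
# include.lowest = TRUE keeps an income of exactly 0 in the first bin instead of NA.
categories <- cut(income, breaks = c(seq(0, 100000, by = 10000), 500000),
                  include.lowest = TRUE)

# Each value is now labelled with the interval it falls in
print(categories)

# Count how many values landed in each interval (empty intervals show 0)
table(categories)
```

Twelve break points give eleven intervals, so the factor has eleven levels even though only five of them are used here.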
Here's an example. Suppose I have a vector housePrice and a vector income, where each observation is the house price and income of one household. To calculate the mean house price for households with similar incomes, I can do the following:
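# Bin incomes into $10,000 brackets up to 100,000, plus one bracket up to 500,000
categories=cut(income,breaks=c(seq(0,100000,by=10000),500000),include.lowest=TRUE)
groupedPrices=split(housePrice,categories)    # list of prices, one element per bracket
meanPrice=unlist(lapply(groupedPrices,mean))  # named vector of per-bracket means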
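The same steps can be run end to end on a data frame, where the grouping comes from a column. This is a minimal sketch assuming a hypothetical data frame named households with made-up numbers. It swaps in sapply(), which is lapply() followed by simplification and gives the same named vector as unlist(lapply(...)) when each result is a single number.

```r
# Hypothetical household data (made-up numbers for illustration)
households <- data.frame(
  income     = c(8000, 15000, 18000, 42000, 47000, 150000),
  housePrice = c(90000, 120000, 140000, 210000, 230000, 600000)
)

# Bin incomes into the same brackets used in the example above
households$bracket <- cut(households$income,
                          breaks = c(seq(0, 100000, by = 10000), 500000),
                          include.lowest = TRUE)

# split() returns a list with one element per factor level, including empty levels
groupedPrices <- split(households$housePrice, households$bracket)

# sapply() applies mean() to each element and simplifies to a named vector
meanPrice <- sapply(groupedPrices, mean)

# Empty brackets give NaN, because mean(numeric(0)) is NaN; drop them for display
meanPrice[!is.nan(meanPrice)]
```

Because split() keeps every factor level, brackets with no households still appear in the list; filtering out the NaN entries leaves only the brackets that actually contain data.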
You can do something similar using the aggregate() function, which returns a data frame rather than a list:
categories=cut(income,breaks=c(seq(0,100000,by=10000),500000),include.lowest=TRUE)
meanPrice=aggregate(housePrice,by=list(categories),FUN=mean)
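To show the shape of what aggregate() returns, here is a self-contained sketch on a hypothetical households data frame with made-up numbers. With an unnamed by list, the grouping column is called Group.1 and the statistic column is called x; naming the list element (bracket is an arbitrary, illustrative name) gives the grouping column a readable name. Unlike split(), aggregate() leaves out groups that contain no data.

```r
# Hypothetical household data (made-up numbers for illustration)
households <- data.frame(
  income     = c(8000, 15000, 18000, 42000, 47000, 150000),
  housePrice = c(90000, 120000, 140000, 210000, 230000, 600000)
)

categories <- cut(households$income,
                  breaks = c(seq(0, 100000, by = 10000), 500000),
                  include.lowest = TRUE)

# One row per non-empty group: grouping in "Group.1", mean price in "x"
meanPrice <- aggregate(households$housePrice, by = list(categories), FUN = mean)
print(meanPrice)

# Naming the element of the 'by' list names the grouping column
meanPrice2 <- aggregate(households$housePrice, by = list(bracket = categories), FUN = mean)
names(meanPrice2)
```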
I haven't used it, but the reshape package looks useful for manipulating the dimensions of datasets.
Last modified: 12/13/08.