Best Proj

STAT 157: Selected student course projects from 2006, 2008, 2011, 2014, 2016 and 2017

These are some of the projects which I found most interesting and which cover a range of topics and styles.

I discourage "book report" type projects, but well-written accounts of research literature are appropriate. Examples are Infant mortality rate (Jennifer Cheng and Wing Tam, 2006), written in the style of the Ropeik-Gray Risk book, and Risk measurement and management: an in-depth look at how Wall Street professionals deal with market risk (Kwon Joon Chang, Christopher Lin and Tianqi Zhu, 2008), which provides a timely tutorial on "value at risk". Nuclear power: benefits and risks (Zhixin Li, 2017) uses surveys to assess whether extra information changes people's opinions.

Repeating experiments dealing with the psychology of probability is fun: Risk aversion in games of chance (Rose Cendak and Rebecca Graff, 2006) shows how people can calculate probabilities rationally while simultaneously believing in lucky/unlucky streaks.

Prediction market prices as martingales: theory and analysis (David Klein, 2008) analyses one of the course's basic examples. Coincidences are another basic example, illustrated by Near Misses in Bingo (William Shin-Yu Chon, 2014). And another is sports ranking and rating, illustrated by Elo Ratings and the ATP (tennis) Rankings (Justine Huang, 2014). Another is game theory, illustrated by Rationality of Strategies in Game Theory (Shichang Zhang, 2016).
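For readers unfamiliar with the Elo ratings mentioned above, the following is a minimal sketch of the standard Elo update, not the method used in the student project. The function names, the K-factor of 32 and the 400-point scale are conventional illustrative choices and are assumptions here.

```python
def elo_expected(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of player A against player B under the usual Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    k (the K-factor) is an assumed illustrative value.
    """
    expected_a = elo_expected(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Points gained by A are lost by B, so the rating total is unchanged.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated players; A wins.
    print(elo_update(1500.0, 1500.0, 1.0))
```

An upset moves ratings more than an expected result, because the update is proportional to the gap between the actual and the expected score.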

Sports data provides an endless range of possible projects. Strategies in fantasy NBA basketball (Tommy Lu and Angela Zhang, 2006) devises improved player-ranking schemes for managers of fantasy basketball to plan advantageous drafts and trades; the authors modestly didn't say that they won their league competition. In an honors thesis project parallel to the course, A regression model using common baseball statistics to project offensive and defensive efficiency (Dennis Moy, 2006) studies improved ways of using runs scored and runs allowed to predict winning percentages. MLB: Why we should use the bullpen differently (Andrew Soncrant, 2016) suggests how to use relief pitchers more efficiently. Also in baseball, Swing or Take: Decision Making under Uncertainty Based on Visual Information (Dibya Ghosh and Maxwell Weinstein, 2017) combines data with a psychophysical model of human perception of moving 3D projectiles to study the interaction between visual stimuli from an incoming pitch and a hitter's decision to swing. Easier schedules and increasing home field advantage in college football (Jeffrey Gonda, 2008) investigates why the average home field advantage has increased from 4 to 7 points over the last 30 years. Drafting NFL Wide Receivers: Hit or Miss? (Amrit Dhar, 2011) studies why there is a gap between how NFL teams value wide receivers in the draft and how well they perform in the NFL. Statistical analysis of suspected officiating improprieties in two quite different settings and with different motivations (financial, nationalistic) is provided by Bias in the 2008 Beijing Olympics Gymnastics (Andrew Duong, 2008) and NBA Conspiracy Theory (Irvin Liu, 2008), each repeating previously published analyses in different years. The truth behind PGA tour players' scores (Dong Kyun Kim, Sukhyun Park and Ilsung Lee, 2016) studies which particular statistical factors of golfing ability are most important.
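As background to predicting winning percentage from runs scored and runs allowed, here is a minimal sketch of the classical "Pythagorean expectation" baseline. It is an assumption that this is the kind of baseline such projects improve on; it is not the regression model from the thesis. The function name and the default exponent of 2 are illustrative choices.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Estimate winning percentage as RS^x / (RS^x + RA^x).

    exponent = 2 is the classical choice; other values are sometimes fitted to data.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("at least one run total must be positive")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


if __name__ == "__main__":
    # Hypothetical season totals, for illustration only.
    print(round(pythagorean_win_pct(800, 700), 3))
```

Comparing such an estimate with a team's actual winning percentage is one simple way to judge whether a more elaborate model adds anything.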

Stock market behavior is another focus of the course. Unconventional Success Analysis: Asset Allocation after the Financial Crisis (Robert McIlhatton, 2017) studies the post-publication performance of the asset allocations suggested in David Swensen's 2005 book. The link Design and analysis of an online investment game (Dodo Qian and Yixin Shen, 2017) goes to an actual game you can play: can you find the 1 in 25 funds that will beat the market in the long term? Time-variant CAPM: learning about factor loading (Jiho Han, 2006) describes a variant of recent sophisticated models and assesses its realism.

Predicting box office success: do critical reviews really matter (Alec Kennedy, 2008) examines the connection, highlighting a difference between limited release and wide release movies. Predicting the success of Kickstarter campaigns (Haochen Zhou, 2017) studies which factors are most significant.

Evaluating the Accuracy of Public Opinion Polls with California Propositions (Erika Oblea, 2011) compares their accuracy to the extent of internet coverage. Predicting Stock Market Fluctuations from Twitter (Sang Chung and Sandy Liu, 2011) may not help with actual prediction but studies the correlation between stock fluctuations and Twitter activity. Insurance Probability: Comparison and Analysis of Consumer Psychology and Pricing Strategy in U.S. and China (Wenjun Dong, Wanyue Zhang and Yan Dai, 2011) seeks to assess the variables affecting premiums in the two countries. Theft Insurance in Berkeley (Christina Yang and Amy Zhang, 2011) considers whether Berkeley students' willingness to pay for insurance against property loss could generate profit opportunities for insurers. Racial Discrimination in the Sharing Economy: a Study on Airbnb (Hanying Mo, 2016) studies differences between prices offered by Asian-American and White hosts. Success of Celebrity Restaurants in China (Hang Gao, 2017) shows these have not been so successful.

The breadth of possible projects is illustrated by Probability in Music: Composer Styles, Music Perception, Identification, and Generation (Xu Deng, Yufan Hu and Yuqiu Shen, 2014).

Usually projects are to be written up in "paper" style, but other styles (web pages, etc.) are sometimes more appropriate. Imran Yousuf (2014) provided very well-prepared slides and a YouTube video for his overview of Big Data with Social and Search Networks. And Data-driven Analysis of NBA Draft Adages (Canzhi Ye and Shashank Bhargava, 2016) is a nice example of an HTML write-up.