Undergraduate Research Projects - Archive

This is an archival record of student projects, mostly proposed by the students themselves, that I advised during 2007-2017.

A link under "project" indicates a write-up specifically of the student's work. Where the work features as part of a broader write-up, the link is under "description". Some projects were under the URAP and VIGRE programs. Projects directly relevant to the "Probability in the Real World" project are marked (RW).

Students Project Description
Fayd Shelley (Summer 2006); Sunny Zhao (Fall 2008) Coincidences in Wikipedia (RW) Data appears in this unfinished draft paper.
Dennis Moy (Fall 2006) [Honors thesis] A regression model using common baseball statistics to project offensive and defensive efficiency.
Yanjiao Cheng, Jesse Friedman, Yu-Jay Huoh, Wayne Lee, Harrison Liu (Spring 2007) [URAP and VIGRE] Statistics of road networks Data collection for Figure 1 of this paper.
Tamar Lando (Spring 2008) [Masters thesis] Efficient Networks and Enumerations on Forests Part appears as part of this joint paper.
Julian Shun (Spring 2008) Optimal spatial networks Simulations, forming a substantial part of our joint paper.
Robert Huang (Spring 2008) [VIGRE] Exploratory data analysis of amazon.com book review data.
Eric Chao and Regina Wu (Spring 2009) [URAP] This and the next are continuations of the same project.
Timothy Wong (Spring 2009) [Honors thesis] Exploratory Data Analysis of Amazon.com Book Reviews
Amy Huang and Irvin Liu (Spring 2009) [URAP] References to chance in blogs (RW) What type of things do "ordinary people" attribute to chance? One way to study this is to search through blogs.
Amy Huang and Irvin Liu (Spring 2009) The 1.4 trillion dollar project (RW). A Google search on "1.4 trillion dollars" gets a surprisingly large number of hits, which can be traced back to some smaller number of different appearances of "1.4 trillion dollars" in some authoritative data. The project was to count this "number of different appearances" for a variety of dollar amounts (2.8 trillion, 1.8 billion, etc.) to see whether they follow a particular "informationless" distribution. URAP.
Tung Phan (Spring 2009) Benford's law. (RW) Data collection, forming a substantial part of our short joint paper When Can One Test an Explanation? Compare and Contrast Benford's Law and the Fuzzy CLT exhibiting a typical undergrad project.
Priscilla Ku and Janet Larwood (Spring 2009) 40,000 coin tosses yield ambiguous evidence for dynamical bias (RW) Testing a prediction of Persi Diaconis et al. that in coin-tossing there is a small bias (maybe 1/100) towards the coin landing the same way as it started. URAP.
Alan Choi (Spring 2009) Statistics of road networks Data collection, forming a substantial part of our short joint paper A Route-Length Efficiency Statistic for Road Networks.
Wei Zhou and Jonathan Ong (Spring 2009) Empires and percolation. Simulations and pictures, used to complement theory in our joint paper Empires and percolation: stochastic merging of adjacent regions.
Bowei Zheng (2008-2009) Java simulations for a "parking process" The process was studied analytically in this old paper.
Tung Phan (Fall 2009) What can you predict about a team's performance next season? (RW) Quantifies the regression effect for sports teams.
Karthik Ganesan [VIGRE] (Spring 2012) Empirical Study on Route-Length Efficiency of Road Networks Data collection for route-length efficiency of road networks. Graphics used in this talk
Hyerim Hong [Independent study] (Spring 2012) Perception on role of chance in different aspects of life Via a survey
Bowen Huang [VIGRE] (Spring 2012) City Growth Model Simulation Here is a slightly complicated model for city growth in which cities have positions, populations and spheres of influence. It's not hard to simulate the process, but I want some pretty pictures of the spheres of influence.
Willy Lai [VIGRE] (Spring 2012) Fitting power-law distributions to data Testing data for fit to power-law distributions. e.g. this data on family names.
Russell Mays [volunteer] (Spring 2012) Road route networks linking 4 addresses Data on real road networks used in this article.
Max Moacanin [volunteer] (Spring 2012) Lucky vs Unlucky teams Assuming gambling odds give true probabilities, one can classify a team as having been lucky or unlucky so far. Do results of matches between lucky and unlucky teams fit the gambling odds?
Selene Xu [Independent study] (Spring 2012) Study of Auction Theory in eBay Data Collecting and studying data about auction prices.
Amy Zhang [honors thesis] (Spring 2012) Pairs trading A simulation study exploring possible relationships between profit and the variables associated with stock selection in pairs trading.
Yiming Zhou [Independent study] (Spring 2012) Spatial Poisson processes Draft of possible Wikipedia article
Xiaoyu (Lily) Wang [Volunteer] (Summer 2012) Design of simulation of efficient road networks Continuing the theme of heuristic algorithms in this paper to study models with junctions.
Jian Li [Volunteer] (Summer 2012) Dynamic random Gabriel networks Computer simulations and graphics.
Morgan Thompson [graduate volunteer] (Fall 2011 - Summer 2012) Data on dust-to-dust models Producing data used in Chapter 11 of Draft write-up of 13 lectures.

Karthik Ganesan [Independent study] (Summer-Fall 2012) math models of road networks Graphics and simulation data for the "binary hierarchy" model; used in this paper
Bowen Huang [Volunteer] (Summer-Fall 2012) Simulations of a model for city growth Graphics and simulation data appear in this paper.
Xiaoyu (Lily) Wang [Independent study] (Fall 2012) Dynamic Gabriel graphs The file of dynamic simulations of a network model is too large to show, but here is a static snapshot.
Weijian (James) Han [work-study] (Fall 2012) References to chance in micro-blogs Examples/analysis posted here.
Weijian (James) Han [work-study] (Fall 2012) Simulations of the Waves in long lines model Brief write-up of model description, simulation results and math conjectures.
Weijian (James) Han [work-study] (Fall 2012) Distribution of Losses Due to Structural Fires Analysis of data from Berkeley CA.
Weijian (James) Han [work-study] (Spring 2013) Simulation of a model of Random Particle Motion related to this model. Description on this page (applet doesn't work on Macs).
Weijian (James) Han [work-study] (Spring 2013) Simulation of multilevel Dyson's Brownian motion as studied by Mykhaylo Shkolnikov. Description on this page (applet doesn't work on Macs).
Max Moacanin [volunteer] (Spring 2013) Simulation data from the iPod process where favorites are played.
Wen Liang [senior thesis] (Spring 2013) Life Expectancy Index model and Risk Management. Understanding what J.P. Morgan's LifeMetrics does.
Bonghyun Kim [Independent study] (Spring 2013) Event Dispersal Simulation Simulating the dispersal of Facebook Events.
Misha Jhaveri [Independent study] (Spring 2013) Investigating the game of Hangman Progress report
MoonSoo Choi [Independent study] (Spring 2013) Statistical Analysis of Nuel Tournaments. These are N-person duels. Here is the Cal Day poster.
Seungjun Lee and Mingu Jo [Independent study] (Spring 2013) Simulation of Interactions on Campus Data and modeling of the chance of meeting different students while walking on campus. Note a nice example of a poster for Cal Day.
Wenyu Zhang [Independent study] (Spring 2013) Trends in iPhone 5 Sales on eBay Collection and analysis of data.
Yu Haihan (Mark) [Independent study] (Spring 2013) Expanding Civilizations and the Fermi Paradox Simulation of a model.
Weijian (James) Han [work-study] (Fall 2013) Simulating a greedy tree Part of ongoing theory research project
Yee Tung (Alice) Man [Independent study] (Fall 2013) Spatial network simulation Java simulations.
Yuan He [Independent study] (Fall 2013) Predicting market value of soccer players How well can their market value be predicted from available quantitative data?
Chan Ik Jang and Kody Law [Independent study] (Fall 2013) The Relationship between Intellectual Property Infringement and Economic Indicators. Conventional wisdom meets empirical data.
Max Moacanin [volunteer] (Fall 2013) Simulation data for the iPod process Working with Dan Lanoue.
Zhijun (Steven) Yang [volunteer] (Fall 2013) Simulation of Brownian motion
Zhijun (Steven) Yang [volunteer] (Fall 2013) Computational analysis in risk and profit problem
Zhijun (Steven) Yang [volunteer] (Fall 2013) Geometric Brownian Motion Model in Financial Market.
Zhijun (Steven) Yang [volunteer] (Fall 2013) Escaping Time and Particle Collision Modelling and Simulation
Yijia Mao [Independent study] (Fall 2013) Risk of alcohol and caffeine Report on the scientific literature.
Weijian (James) Han [work-study] (Spring 2014) Simulations of the Compulsive Gambler process.
Jane Wenjin Liang [Independent study] (Spring 2014) Generic vs Brand Name Food Packaging.
Teeranan (Ben) Pokaprakarn [Volunteer] (Spring 2014) Anti-Streaky behavior in Currency Markets.
Sida Ye [Independent study] (Spring 2014) Sentiment analysis for iPad 2 and linear modeling. Cal Day poster
Frances Chen [Volunteer] (Summer 2014) Spatial networks. Continuing the theme of heuristic algorithms in this paper to study models with junctions.
Yijia Mao [Volunteer] (Summer 2014) Credit Risks in Daily Lives and Bond Rating Analysis. Brief literature surveys.
Weijian (James) Han [Volunteer] (Summer 2014) Numerical calculations related to this project.
Weijian (James) Han [Volunteer] (Summer 2014) Data collection, computing and graphics for the paper ........... Nash equilibria in an online game.
Tiffany Chang [Volunteer]. Selecting Energy Efficiency Indicators for Sustainable Development using Regression Analysis. Proposal.
Jiangzhen (Jane) Yu [Volunteer] (Summer 2014). Random Eulerian circuits. Simulations in this published open problem
Jiangzhen (Jane) Yu [Volunteer] (Fall 2014) Mixing cards in a box -- experiments and analysis.
Weijian (James) Han [Work-study] (Fall 2014) Simulations and inference for the basic model of league sports.
Vishal Kalyanasundaram [independent study] (Fall 2014) Comparing tech company valuations during the dot-com bubble and how to predict a boom and bust. Proposal
TianTian Li [Volunteer] (Fall 2014) Quantitative Investment Research Based on Merrill Lynch Investment Cycle
Jian Wang [Honors Thesis] (Fall 2014) Joint Model for Exchange Rate Dynamics and Influence of Presetting Correlation between Stock Price and Exchange Market
EuiSeok Kim [Honors Thesis, Math Dept] (Fall 2014) Mathematical model for cost-efficient installation of public transportation system
Yongtae Lee [Independent study] (Fall 2014). Correlation between Nationalism and Social Welfare. Proposal.
Michael Preischl [Masters Thesis, TU Munchen] (Spring 2015) Average Percolation.
Yuanhao (Stanley) Yang [Honors thesis, Statistics] (Spring 2015) Predicting Regular Season Results of NBA Teams Based on Regression Analysis of Common Basketball Statistics.
Yui (Hayden) Sheung [Independent study] (Spring 2015) An Improved Fantasy Basketball Prediction Tool A web site where you can rate NBA players using your own formula (points for blocks, steals, assists etc).
Wenjie (Wendy) Xu [Honors thesis, Statistics] (Spring 2015) Genetic Algorithms and Poker Rule Induction Cal Day poster
Cangao (Steven) Chu [Independent study] (Spring 2015) Predicting Cardiovascular Disease Based on Regression Analysis and Classification of NHANES Survey Statistics
Vishal Kalyanasundaram [Volunteer] (Spring 2015) The Effect of VC Tweets on the Stock Market.
Harish Kumar Palaniswamy [independent study] (Spring 2015) Exploratory Data Analysis of Enron Emails.
Jennifer Alch [Independent study] (Summer 2015) Improving Cognitive State Detection Using Supervised Learning Proposal
YinYin Lu [Volunteer] (Summer 2015) Predicting Concrete Compressive Strength Using Supervised Learning Proposal
Yilian Zhang [Peking University] [Volunteer] (Summer 2015) Simulations related to research project concerning maximum entropy martingales.
Weijian (James) Han [Honors thesis, Statistics] (Fall 2015) Analysis of the Rank-Order Mechanism for User-Generated Content through Simulations of Game Play. Development of ideas from this paper.
Andrew Soncrant [Independent study] (Fall 2015) An Examination of Indicators of Postseason Success in Major League Baseball Slides of presentation
Qiujun Li [Independent study] (Fall 2015) Does the Stock Connect Program Eliminate the Price Disparities of Cross-listed Companies? Proposal
Zhenyang Zhang [Independent study] (Fall 2015) Double dummy hand evaluator Proposal
William Wang [Independent study] (Fall 2015) Birth and Assassination Process Visualization Link goes to web site.
Jerry Chen [USA Independent Study Group] (Fall 2015) Predicting Stock Prices from News Articles. Proposal
Ziao Ju [Independent study] (Fall 2015) Trends in admission of international students in U.S. universities Proposal
Tyler Thompson [Volunteer] (Fall 2015) Would the Kelly Criterion outperform Warren Buffett's 90-10 rule? Proposal
Jin Deng [Volunteer] (Fall 2015) Impact of ECB communication on high frequency response of EUR-USD exchange rate. Slides of presentation
Noah Ngyuyen [Honors thesis] (Summer 2016) Copyrights and innovation Proposal.
Qianhui Wan [Volunteer] (Summer 2016) Looking into ZIKA Data Proposal.
Paul Park [Honors thesis] (Summer 2016) Does the NFL Combine Really Matter? Proposal
Grace Deng [Volunteer] (Summer 2016) Quality and Value Monitoring of Acute Care Hospitals
Ethan Yiqing Xu [Volunteer, pre-senior-thesis] (Summer 2016) Stock price changes before earnings statements Proposal
Steven Sun [Volunteer] (Summer 2016) Physical mixing: tickets in a box.
Zhen Dai and Zhenyang Zhang [reading course] (Summer 2016) Basic Stochastic Processes by Brzezniak and Zastawniak
Sky Cao [reading course] (Summer 2016) Group representations in probability and statistics by Diaconis.
Qianhui Wan [Volunteer] (Fall 2016) Looking into ZIKA Data Proposal
Junyuan Gao [Independent Study course] (Fall 2016) Predicting Premier League Final Points and Rank Using Linear Modeling Techniques. Proposal
Xiaochen Yang [Independent Study course] (Fall 2016) Predictive Power of Elo Rating Systems and Markovian System on Association Football Proposal
Jassem Shukr [Independent Study course] (Fall 2016) Is there a correlation between NBA teams' financial performance and on-court performance? Proposal
Nura Kawa [Independent Study course] (Fall 2016) Text Classification of BBC News Dataset Here is the report, and here are the code, data, and images. Proposal
Grace Deng [Honors thesis] (Fall 2016) Analyzing the risk of mortgage default Proposal
Ethan Yiqing Xu [Honors thesis] (Fall 2016) Analysis of stock price fluctuations before earnings statements. Proposal
Shichang Zhang [Honors thesis] (Fall 2016) Simulation of Birth-and-Assassination Process Proposal
Binglin Song [Reading course] (Fall 2016) Stroock Probability theory: An analytic view
Shirley Danlei Zhu [reading course] (Spring 2017) Information, physics, and computation by Montanari and Mezard.
Shirley Danlei Zhu (Spring 2017) This simulation of a coalescing regions process studied theoretically in this paper.
Charles Zhu [Senior thesis] (Spring 2017) Effect of implementing a defined contribution plan on cost for public employees Proposal
Xiaoyu Hu [Independent study] (Spring 2017) Predicting Domestic Gross of Movies. Proposal
Zihao Yang [reading course] (Spring 2017) Probability and Computing by Mitzenmacher and Upfal.
Ye Jin Choi [reading course] (Spring 2017) Probability and Random Processes by Grimmett and Stirzaker.
Divyansh Agarwal and team [Volunteer] (Summer 2017) Factors which influence Kiva loans. Proposal
Francesco Piva (Summer 2017) Earthquake statistics: comparing Italy and California
Luhuan Wu [Volunteer] (Summer 2017) Finding schemes for arranging large tournaments based on probability models Proposal
Jin Kweon [Volunteer] (Summer 2017) Which club is the most efficient? Proposal
Haoyu Wang [Independent Study] (Summer 2017) Simulation of Spatial Network Model of City Growth and Formation. Proposal
Winne Luo [Volunteer] (Summer 2017) Predicting Pertussis Proposal
Qing Guo [Independent Study] (Summer 2017) Predictions of Stock Price Trends Proposal
Haoyu Wang [Independent Study] (Fall 2017) Better bounds for the worst case of the stretch-length tradeoff in geometric networks

General styles of possible future projects

These are undergraduate in the sense of not requiring research-level mathematical probability knowledge (see Open Problems for ones which do). On the other hand they are intended to be serious in that successful results will often become a small part of some future published scholarly paper. For further suggestions from my undergraduate course click here.

Data collection

Simulation - drawing static pictures

In fact all of these would be even nicer as moving pictures, below.

Simulation - drawing moving pictures

The processes below are very easy to simulate, but I want "moving pictures", in Java or whatever, to illustrate their dynamics.

Inventing good heuristic algorithms and writing code

Physical experiments

Statistical analysis

Math theory

I don't encourage undergrads to attempt to do new math research. But a good project is on some topic you have learned from a course or a textbook.