Spring 2012 Undergraduate Research Project Descriptions
All applications are due by 12pm on November 30, 2011.
Deadline has been extended to 12pm on Tuesday, December 6, 2011.
Probability, Data and Simulation
Professor David Aldous
I have a variety of projects, ranging from seeking interesting data over the web to running simulations of stochastic processes. See http://www.stat.berkeley.edu/~aldous/Research/Ugrad/ugrad_res.html for explicit suggestions and examples of previous projects, and see http://www.stat.berkeley.edu/~aldous/157/topics.html for implicit suggestions. I am happy to meet with students and discuss which projects might fit their own interests and expertise.
Evaluation of Similarity Metrics (SimBench)
Professor Jim Pitman
This project aims to build a test collection and to develop a test methodology that can be used to evaluate similarity measures for the purpose of Scientific Recommender Systems.
The student would support the team by:
1. Verifying existing similarity metrics and implementing new ones (citation- and text-based)
2. Helping to improve the evaluation methodology
3. Running statistical analyses on the results
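To make the two metric families concrete, here is a minimal sketch of one text-based measure and one citation-based measure. The project description does not say which metrics the team uses, so both choices (cosine similarity over word counts, and Jaccard overlap of reference lists) and the function names are assumptions for illustration only. The sketch is in Python for brevity, although the project itself mentions Java.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Text-based: cosine of the angle between word-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(refs_a, refs_b):
    """Citation-based: shared references divided by all references cited."""
    a, b = set(refs_a), set(refs_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)
```

Both functions return a value between 0 and 1, so their scores can be compared against the same relevance judgments in a test collection.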
The student is expected to work approx. 10 hours per week. Good statistical knowledge and some programming experience in Java would be beneficial.
Evaluation of Meta Data Extraction Tools
Professor Jim Pitman
As part of our ongoing research we are building a database for storing bibliographic data of scientific documents. These documents are usually available in PDF format. To add them to the database, we need tools that extract the meta data from PDF documents. As part of this project, a framework for PDF meta data extraction tools was developed that makes these tools interchangeable. A first survey of the available tools showed that each has different strengths and weaknesses.
The student's first task will be to evaluate the available tools on a larger data set. That requires:
- choosing and importing a test collection
The results should then guide the further implementation of meta data extraction tools in the project environment.
The student is expected to work approx. 10 hours per week. Some statistical skills and knowledge of the Java programming language are required.
Precinct-Based Voting Systems and Voter Anonymity
Professor Philip B. Stark
Precinct-count optical scan (PCOS) voting systems, which are used in Alameda County, Contra Costa County, Marin County, and many other California counties, scan ballots cast by voters in polling places. These hand-marked ballots drop through the scanner into a ballot box.
It is widely claimed that as the ballots fall into the ballot box, they are in effect shuffled, which protects voter anonymity: Even if you observed the order in which people voted, you would not be able to tell which ballot was cast by which voter once the ballots "mix" in the ballot box.
Is this true?
This project will involve visiting the offices of the Registrar of Voters of three Bay-Area counties that use PCOS systems made by three different voting equipment vendors. We will conduct an experiment: feed numbered ballots through PCOS systems and measure how well the systems shuffle ballots, using statistical tests. For instance, we will estimate the average fraction of ballots that end up in their original order, for each vendor's equipment.
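One way such a measurement might be set up is sketched below. It assumes ballots are numbered 0, 1, 2, ... in the order they are fed in, and that `order` lists those numbers in the order they are recovered from the ballot box. The two statistics and the Monte Carlo comparison against uniformly random permutations are illustrative assumptions, not the experiment's actual test plan, and all function names are hypothetical.

```python
import random


def fraction_fixed(order):
    """Fraction of ballots recovered in the same position they were fed in."""
    return sum(1 for pos, ballot in enumerate(order) if pos == ballot) / len(order)


def fraction_adjacent_kept(order):
    """Fraction of neighboring recovered pairs that were also fed in
    back to back, in the same order (ballot x followed by ballot x + 1)."""
    kept = sum(1 for x, y in zip(order, order[1:]) if y == x + 1)
    return kept / (len(order) - 1)


def monte_carlo_p_value(order, statistic, trials=10_000, seed=0):
    """Estimate how often a uniformly random permutation preserves at least
    as much of the input order as the observed one (one-sided p-value)."""
    rng = random.Random(seed)
    observed = statistic(order)
    perm = list(range(len(order)))
    hits = 0
    for _ in range(trials):
        rng.shuffle(perm)
        if statistic(perm) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)
```

A small p-value would suggest that the ballot box preserves more of the casting order than a true shuffle would. If ballots simply stack, reading the box from the top reverses the order, so the reversed sequence should be checked as well.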
Touchscreen voting systems deployed in California "shuffle" the order of electronic records of voters' selections relative to the paper print-out of the voters' selections, again with the purported goal of protecting voter anonymity. How well does the software shuffle the electronic records? Does it use a random permutation? Does it use a pseudo-random number generator? If so, is it a good one? If so, how is the seed selected? Is the shuffling deterministic? Can the shuffling be reversed? *If* we can get official access to the systems and software, we will quantify how well these systems shuffle.
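The questions about determinism and reversibility can be illustrated with a small sketch. It is not a description of how any vendor's software works; it only shows that if a shuffle is driven by a pseudo-random number generator, anyone who knows or can guess the seed can regenerate the permutation and undo it. The function names are hypothetical, and Python's standard `random` module stands in for whatever generator a real system might use.

```python
import random


def seeded_shuffle(records, seed):
    """Shuffle records with a PRNG; the same seed always gives the same order."""
    rng = random.Random(seed)
    perm = list(range(len(records)))
    rng.shuffle(perm)
    return [records[i] for i in perm]


def unshuffle(shuffled, seed):
    """Recover the original order by regenerating the same permutation."""
    rng = random.Random(seed)
    perm = list(range(len(shuffled)))
    rng.shuffle(perm)
    original = [None] * len(shuffled)
    # seeded_shuffle placed records[perm[k]] at output position k.
    for out_pos, in_pos in enumerate(perm):
        original[in_pos] = shuffled[out_pos]
    return original
```

For example, `unshuffle(seeded_shuffle(records, seed), seed)` returns `records` unchanged, which is why the way the seed is selected matters for voter anonymity.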
This project is time-sensitive. The data will be collected in January and February; we must submit an article describing the experiment and results by April.