## YU GROUP

## Group Memos

* Preparing for Conferences
* Getting Things Done
* Tips on Transitioning From Classes To Research
* Staying Current with RSS
* Computing Links
* Journal Submission Links
* LaTeX and BibTeX Links
* Recommended Reading
* On Reading
* On Writing
* Zen and Art of Search Engines

### Preparing for Conferences

Your main purpose at this stage is to present results and to meet people in the field. Spending a lot of time with others from Cal is probably a bad idea. Check out the program in advance, determine which speakers you'd like to meet, and plan which talks to attend accordingly. Contact people in advance to set up times to meet and to exchange cell phone numbers. Getting online is often difficult, so don't count on being able to email people; that is why it's useful to exchange mobile numbers.

Keep extra copies of your presentation on a USB drive, in case something happens to your computer or in case one computer is used for all presentations in a session. Be sure to write notes on the comments and questions you receive; it's too hard to remember everything later.

In the department, you may be able to get funding from the Grajski fund or from VIGRE. Additional funding sources include the GA (Graduate Assembly) and the Graduate Division. You will need to apply for these well before the conference, probably a semester ahead. Check the application deadlines.

Keep a large envelope for paperwork for reimbursable expenses (e.g. taxi receipts, hotel bills, registration receipt). Every such piece of paper is worth money and time to you, so you don't want to lose any of them. Fill out your reimbursement paperwork even before the flight, then fill in the amounts and attach the receipts afterward. Write up your trip for VIGRE or Grajski ASAP.

- David Purdy, Jan 2009

### Getting Things Done

David Allen authored a book entitled "Getting Things Done", which describes a very useful and straightforward methodology for organizing one's work and priorities.
A single webpage is a poor summary of the book, but if you've read it, this page may be useful in providing reminders and further ideas. As a reminder of specific models he illustrates, here are (1) the "4 Criteria Model for Choosing Actions in the Moment" and (2) the "3-fold Model for Evaluating Daily Work":

4 Criteria Model for Choosing Actions in the Moment

* Context
* Time Available
* Energy Available
* Priority

3-fold Model for Evaluating Daily Work

* Doing predefined work
* Doing work as it shows up
* Defining your work

### Tips on Transitioning From Classes To Research

I can think of two related issues about taking classes and starting research.

First, the most important motivation for research is interest. Classes are good resources for finding and developing the academic interests of a young PhD student, since they provide detailed views, both historical and technical, of certain fields. So I think that what we should get from classes before research is our interests. I also think that students who do not yet know their research interests had better sit in on different kinds of classes.

Second, some special classes may lead a student into the environment of research. In my own experience, I got through most of my classes by just grabbing what teachers said in lectures, working on the homework, and then trying my best to get high scores on the finals. When it comes to research, however, a class serves us better if it leads to more advanced topics in the area and tells us about ongoing research and challenging problems, so that we gain a better idea of the field by giving a reading presentation or doing a term project. These assignments are more useful than in-class finals in terms of leading students into research, since we practice basic skills including learning, looking for references, communication, and giving talks.

- Jing Lei, Jan 2007

### Staying Current with RSS

RSS is a relatively new technology for syndication.
It is frequently used as a method for keeping up to date on new postings to websites such as blogs, journals, and traditional news. I liken it to a traditional newswire. The primary benefit is that with a proper RSS client, you can aggregate news from many different sources in one place, without having to manually check each individual source. Clients are available for both Windows and Mac OS X, and there are web-based ones as well. I recommend Google Reader because it is web-based and you can access it from anywhere.

Of course, you have to be careful with RSS. It can be like drinking water from a fire hydrant, and you may find it slowing you down if you spend too much time manually sifting. This is where automated filtering is beneficial.

I've been keeping aware of new publications and research by subscribing to RSS feeds from arXiv. Many researchers post preprints to arXiv prior to, or at the same time as, submission to a proper journal. In addition, all IMS journal articles get automatically posted to arXiv. The most interesting arXiv sections to me are math.ST (statistics) and math.PR (probability). Here are the URLs for the respective RSS feeds:

* math.ST
* math.PR

The more traditional publishers, such as Blackwell and Elsevier, also provide RSS feeds for new publications. Ingenta Connect carries ASA journals and also provides RSS feeds. However, these feeds are pretty minimal compared to the arXiv feeds; their greatest weakness is that no abstract is provided. In any event, here are links to RSS feeds for some other journals that I pay attention to:

* Journal of the American Statistical Association
* Journal of Computational and Graphical Statistics
* Technometrics

- Vince Vu, Feb 2007

### Computing Links

#### Software

UC Berkeley IST Software Central has commercial software that can be downloaded by on-campus individuals.

TextMate may be the greatest text editor ever. UC Berkeley has a campus-wide license, so it is free for us.
#### General Programming

* Optimizing Machine Learning Programs: a blog entry written by John Langford on high-level optimization in programming.
* Software Carpentry: "an intensive introduction to basic software development practices for scientists and engineers that can reduce the time they spend programming by 20-25%".

#### R

* CRAN, where R can be downloaded.
* Table of Contents for R News, which has nice short articles related to using R:
  * Uwe Ligges and John Fox. R help desk: How can I avoid this loop or make it faster? R News, 8(1):46-50, May 2008.
  * Paul Murrell. Fonts, lines, and transparency in R graphics. R News, 4(2):5-9, September 2004.

#### Mac OS X

* Application guide for those who have switched from Windows to Mac OS X

#### Perl

* O'Reilly Perl Bookshelf

In general, ProQuest is a great resource for ebooks. UC Berkeley has a subscription, so it's all free to us.

### Journal Submission Links

* Institute of Mathematical Statistics (IMS): LaTeX support
  * Annals of Statistics: preparation, submission
  * Annals of Applied Statistics: submission
  * Statistical Science: preparation, submission
* American Statistical Association (ASA): style, LaTeX
* arXiv: registration, submission

- Vince Vu, Apr 2007

### LaTeX and BibTeX Links

* TeX FAQ
* latexdiff generates a LaTeX document that shows the differences between two LaTeX documents, which is useful for tracking revisions.
* David highly recommends LyX, a system for creating documents. It has its own format, but converts to LaTeX for compilation into PDF and other formats. It also has a nice set of Beamer examples. David Rosenberg demonstrated extraordinary proficiency in being able to type lecture notes in real time, which sold me on using LyX.
* Beamer, a useful LaTeX package for creating presentations
* BibTeX HOWTO
* The natbib LaTeX package is wonderful because it allows you to have fancier citation styles. One really useful style is the (Author Year) style. Here is a reference.
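As a minimal sketch of author-year citations with natbib, the example below shows the two most common citation commands. The bibliography file `refs.bib` and the key `stein1981` are hypothetical placeholders, not files provided on this page, and `plainnat` is only one of several styles that work with natbib.

```latex
\documentclass{article}
\usepackage[round]{natbib}  % author-year citations in round parentheses

\begin{document}

% Textual citation: the author is part of the sentence, e.g. Stein (1981).
\citet{stein1981} studies estimation of a multivariate normal mean.

% Parenthetical citation: author and year both go inside the parentheses.
Shrinkage can improve on the usual estimator \citep{stein1981}.

\bibliographystyle{plainnat}  % a natbib-compatible bibliography style
\bibliography{refs}           % assumes a hypothetical refs.bib with key stein1981

\end{document}
```

Using `\citet` when the author is the grammatical subject and `\citep` otherwise keeps the sentences readable whichever citation style a journal later requires.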
- Vince Vu, Apr 2007

### Recommended Reading

Theoretical: A general remark is that our (geometric) intuition is fine for three dimensions and lower. It generally fails in higher dimensions.

Shrinkage: The classic paper is actually James & Stein (1961); however, no electronic version is available. The paper below summarizes the results from the original paper and adds more. The "Stein Effect" may be one of the most important discoveries of the last half-century, because it provides a simple example where our low-dimensional intuition breaks down in high dimensions.

* Charles M. Stein, Estimation of the mean of a multivariate normal distribution, The Annals of Statistics 9 (1981), no. 6, 1135–1151.

Empirical Bayes: Stein's discovery and the James-Stein estimator were probably inspired by Herbert Robbins' work on compound decision theory and empirical Bayes. Compound decision theory has to do with finding optimal decision rules for situations where you have several decision problems to consider simultaneously, e.g. estimating several parameters simultaneously. Empirical Bayes has a similar theory, but the formulation is quite different: the parameters are assumed to be distributed according to some unknown prior distribution. The problem is to approximate the Bayes rule that one would use if the prior were known. This is *very* related to the _unknown regularization parameter_ problem. Neyman called Empirical Bayes a "breakthrough." Robbins' survey paper is a classic.

* Herbert Robbins, The empirical Bayes approach to statistical decision problems, The Annals of Mathematical Statistics 35 (1964), no. 1, 1–20.

Concentration of Measure: Hoeffding's Inequality is part of a more general phenomenon known as "concentration of measure." The phenomenon was first discovered by Paul Lévy, and exhibited in the volume of the n-dimensional sphere. In that case, it roughly says that if you look at a subset of the sphere with 1/2 of the volume, then most of the other points are near this subset.
Our low-dimensional intuition fails here too. Michel Talagrand wrote an article in the Annals of Probability, which is more or less a survey, on why and how exponential inequalities arise from product measure spaces. It is worth skimming.

* Michel Talagrand, A new look at independence, The Annals of Probability 24 (1996), no. 1, 1–34.

Grad school in general:

* Getting What You Came For, by Robert Peters
* Advice for a Young Investigator, by Santiago Ramon y Cajal

Visual Analytics:

* Wilkinson, Anand, and Grossman, Graph-Theoretic Scagnostics, IEEE Symposium on Information Visualization 2005.

- Vince Vu, David Purdy, Mar 2007

### On Reading

When reading an academic paper, there are several questions to keep in mind:

* What is the paper about?
* Why is it interesting?
* How?
* What have we learned?

Tips

* Don't expect to be able to read quickly initially. As you become more familiar with a particular area or problem, you will find yourself more able to place a paper within a context and discriminate its features.
* The link that Guilherme provided on reviewing papers seems applicable to reading as well.

- Guilherme Roche, Vince Vu, Feb 2007.

### On Writing

When you are starting to draft a paper, you should follow these helpful suggestions.

* Define the notation that you will use so that you can change it easily. At the beginning of the document, type something like \newcommand{\X}{\ensuremath{{X}}}. Now, whenever you type $\X$ it will appear as X. This is especially useful when you might change your mind about what subscripts and superscripts you want to use.
* Put in citations as you go, while it is easy and fresh in your mind. If you don't, this could easily take an extra day at the end.
* Save backups. You might save the first one as ConsistencyA, then the second as ConsistencyB, and so on.
* Always capitalize Section or Chapter. For example, "This proof refers to the inequality in Section 2."
* Read some of David Pollard's work for quintessential examples in style.
  Notice the order of thoughts, which makes reading easy; the succinct statements of theorems; and when the math is centered and when it is not. John Hartigan also has a clean writing style. You can find links to some of his papers here.
* Pay attention to how things look. For example, should you use \frac{}{} or should you use /? Be aware that in inline math \frac{}{} makes the font much smaller.
* If your proof is not the main point, it should be put at the end of the paper.
* For backing up, it's easiest to use a version control system, just as one would do with code. This way, you can also fork your papers and presentations; it's helpful to be able to create a short set of slides or a brief report out of a longer one. The version control system I use is Mercurial (aka Hg).
* Each section should have an opening passage that summarizes the section and a closing passage that connects to the next section.
* Each symbol or piece of notation should be introduced or defined before being used.
* Each index should have its range clearly defined.
* Equations are parts of sentences, so they need commas or periods at their ends.
* Do not use the same symbol for two different variables in the same paper.
* Express things as precisely as you can. Imagine someone you know as the reader, and ask whether the writing is clear for that reader. When you have a choice between saying "a statistical approach" and "a hierarchical Bayesian approach", use the more specific one.
* But don't cram too much into one sentence. Each sentence should express only one idea. If another piece of information is also important to mention, it probably deserves another sentence.
* Writing a good introduction
* William Strunk Jr. and E.B. White, The Elements of Style
* David Pollard's writing is clean and to the point. He is a good example: Convergence of Stochastic Processes; Strong Consistency of K-means Clustering.

- David Purdy, Karl Rohe, Yueqing Wang, Apr 2011.
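As a minimal sketch of the notation-macro, fraction, and punctuation suggestions in the writing tips above, the following small document may help. The symbols a and b and the commented-out alternative definition are purely illustrative assumptions, not notation from any particular paper.

```latex
\documentclass{article}

% Define notation once, near the top of the document.
\newcommand{\X}{\ensuremath{X}}
% To change notation later, edit only this definition, for example:
% \renewcommand{\X}{\ensuremath{X^{(n)}}}

\begin{document}

% \ensuremath lets the macro work both inside and outside math mode.
Let $\X$ denote the data; we can also write \X{} in running text.

% Inline, \frac is set in a smaller size, so a slash is often easier to read.
Compare $\frac{a}{b}$ with $a/b$.

% In a displayed equation the numerator and denominator are set at full size.
% The equation ends with a period because it closes the sentence.
The ratio is
\[
  \frac{a}{b} .
\]

\end{document}
```

Because every occurrence goes through the macro, a later change of subscripts or superscripts touches a single line instead of the whole draft.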
Lecture notes on mathematical writing, based on a course of the same name given at Stanford University during autumn quarter, 1987, by Donald E. Knuth, Tracy Larrabee, and Paul M. Roberts. The course description (CS 209): Issues of technical writing and the effective presentation of mathematics and computer science; preparation of theses, papers, books, and "literate" computer programs.

- Yueqing Wang, Mar 2012.

### Zen and Art of Search Engines

When trying to find information on the web, there are a lot of different things to get your head around. There are many other guides out there that teach how to use search engines, various directories, and so on. I'd like to share the techniques and thinking that I use.

First, I should explain how these ideas and techniques evolved. One of the roles that I had at AltaVista, an early search engine acquired by Yahoo, was to design a set of tests for evaluating the quality of our search results. There are a lot of ideas for metrics, but the most important thing is that any proposed metric be well-defined and repeatable. Otherwise, it is subject to variations in human interpretation or lacks technical and scientific validity.

The most basic kind of quality measure was for so-called "Navigational" queries: these are queries where the user intends to go directly from the search engine to the one page on the internet that all would agree *authoritatively* answers their question. A simple example is that [Stanford] should produce www.stanford.edu, but what about [IBM]? www.ibm.com? What if you're a user in Japan? www.ibm.co.jp? In fact, that currently redirects to www.ibm.com/jp/. So, it gets tricky fast. A precise set of definitions and exhaustive examples takes about a dozen pages, and I won't address that now.
The key things I learned were:

* Interpreting a query
* Deciding if an authoritative page should exist
* Deciding if an authoritative page does exist
* Identifying which page is authoritative

Just focusing on the first three should help you get to where you're going.

Before I begin, though, I want to emphasize that the search engine is like a librarian who doesn't have time to help you find the right resource. If you walk up to such a librarian and express an interest in [windows], they may send you to the computer section when you're really thinking of architecture. If you wander over and discover it's the wrong section, you should think about what they sent you to, and whether there's a way to emphasize your interest more precisely and rule out other interpretations. Often, you can see how your query was interpreted from the words that appear on the search results page. However, it is also possible that you need to think of other words that could appear on the page you believe should exist and that you want to go to.

- David Purdy.