Computing Resources and Tips

Generally

This is a broad overview of some of the principles in our group. I maintain an internal Google Doc with instructions for getting set up on our servers that are specific to the SCF clusters. Please ask me for access to that file.

For help with parallel computing and using the cluster on SCF, see their help pages at http://statistics.berkeley.edu/computing (in particular, the description of the cluster servers and the workshops and tutorials).

How I like my students to set up their work:
  • I like to have a project account on the SCF that we can both access, rather than storing work on your local SCF machines. This makes it easier to share work, create git repositories, etc. I can also ask for significant space allotments for these projects that you won't have on your own account. And after students move on, I still have the account!

  • Everyone in my group should be using GitHub for version tracking of our code and paper writing. This allows us to work on the same files without worrying about overwriting each other's work (much better than Dropbox). Git is primarily for text files (.tex, .Rmd, .R, .py, etc.). Do not put a lot of binary files (.pdf, .rdata, etc.) in git unless they are fairly 'permanent' results needed for the long term (such as pdfs needed by a LaTeX paper, final versions of a paper, etc.). In particular, don't commit the compiled .pdf of LaTeX or Knitr files or any intermediate files like .aux, .md, etc. They can be regenerated locally, and because they are constantly overwritten they make a mess of the git repository for no reason.

    Please continually commit your changes to git -- not just when I ask to see something. It is much better to have lots of small changes committed than a giant change every month. Please don't worry that you will 'mess up' the repository. That's the point of git -- it's pretty hard to erase changes permanently.

  • I want my students to use Knitr to create reports, papers, and theses. This ensures that the R code for making the plots, etc. is there and that I can tweak the plotting code as I want. There are likely to be extensive analyses that take a lot of time, and probably the cluster, to run. I don't expect those to be in an omnibus knitr file. Rather, save meaningful, compact summaries of the results of your simulations or extensive data analysis/processing, and use them as the starting point of the analysis you want to show in the writeup. And of course, anything that doesn't take long (e.g. simple data transformations or cleaning of the data) should be in the knitr file.

  • If you are making a report or presentation, do save a copy of the compiled file, but in a different location (e.g. a 'Reports' folder) so that we can go back to it for reference. This is particularly important if you show something to collaborators and will later be tweaking the code. We need a 'hard' copy of what you showed them -- and this includes the important individual pictures, not just the pdf/html report. This folder can be in a Google Drive folder rather than on GitHub.

  • Always close your R session at the end of each day (without saving your session). If it would take you longer than 5 minutes to get back to where you were by rerunning your code, you should save some intermediate files (and have the corresponding code that you can run in BATCH mode to recreate those intermediate files). If you can't get back to where you were because you did something interactively, then it's good to discover that right away and not after a week of work.

  • It's better to have .txt files for your intermediate and final results than .rdata files, except in the rare cases where what you are saving is a complex R object. This will make the results more transferable and force you to create nice headers and identification for your files, rather than relying on multiple objects that link up. Some R objects are too useful to break apart (e.g. the result of lm).
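
The Git advice above (keep compiled and intermediate files out of the repository) can be approximated with a .gitignore file at the top of the repository. The following is only a sketch: the .aux, .md, .pdf and .rdata patterns come from the list above, while ignoring every .pdf by default, the extra .RData spelling, and the README.md exception are assumptions that may not suit every project.

```shell
# Sketch: append ignore rules for regenerable files to the repository's .gitignore.
# Run from the top-level directory of the repository.
cat >> .gitignore <<'EOF'
# LaTeX and knitr intermediate files (regenerated on every compile)
*.aux
*.md
# Assumption: keep a hand-written README tracked despite the *.md rule
!README.md
# Compiled output and saved R workspaces (assumption: ignored by default)
*.pdf
*.rdata
*.RData
EOF
```

A 'permanent' binary, such as a figure pdf that a paper needs, can still be tracked despite these rules by adding it explicitly with git add -f followed by the file's path.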

I give some brief help on these tools below (more a compilation of useful tricks).


Last updated 06/05/2015