

Some thoughts on improving the performance of an analysis job on a cluster like ECCO or SDSx


Here are some tips (some specific to ECCO/SDSx, some general) about improving performance of a typical social science analysis job:

  1. You are writing a lot of files to your home directory. If you do not CARE about these files other than to read them back in at some later point (i.e., they are part of your iteration), consider writing them to $SCRATCH/(your id) instead of your home directory ($HOME). On ECCO, $SCRATCH stands for /scratch; this may vary on other HPC systems. On ECCO and SDSx, the home directory is a shared (and in this case, slow) filesystem, whereas $SCRATCH is optimized for exactly this purpose: intermediate files. On ECCO, $SCRATCH is a node-local filesystem and about twice as fast as $HOME, but it is not shared and thus not usable from other nodes, which may affect how you structure your jobs. The per-user subdirectory of $SCRATCH does not exist by default, so you need to create it from within the qsub script, for instance by adding the line
    [[ -d $SCRATCH/$(whoami) ]] || mkdir $SCRATCH/$(whoami)
  2. By default on ECCO/SDSx, you are using stock R. If your routines use a lot of matrix operations, you might look into using the MKL/MP versions of R (their availability varies across HPC systems, so ask your local system administrator). These versions are optimized for parallel and matrix operations and speed those operations up by about 3x. Use the module command to switch versions of R. You may need to re-install your R packages, since they are version-specific, but that is a one-time operation.
  3. You run some loops within each job. You might want to consider using the "foreach" loop, run in parallel through a backend such as doMC; it is one of the easiest speed-ups. See https://www.vrdc.cornell.edu/computing-for-economists/web/day2-3.html#/15. Take care to limit it to one core fewer than you request in your qsub script: if you are requesting 8 cores, use "registerDoMC(cores=7)". Otherwise your job gets killed.
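Tip 1 can be sketched as two small shell helpers for a qsub script. This is a minimal sketch, not a tool provided on ECCO or SDSx: the function names make_scratch_dir and save_result, the myjob.R script, and the $HOME/results destination are all hypothetical, and it assumes $SCRATCH is set by the system. Because ECCO's $SCRATCH is node-local and not visible from other nodes, anything you want to keep is copied back to $HOME at the end of the job.

```shell
#!/bin/bash
# Minimal sketch for a qsub script; function and file names are hypothetical.

# Create (if needed) the per-user directory under the scratch base $1,
# then print its path.
make_scratch_dir() {
  local dir="$1/$(whoami)"
  [[ -d "$dir" ]] || mkdir -p "$dir"
  printf '%s\n' "$dir"
}

# Copy the file $1 (e.g., a final result on scratch) into directory $2,
# creating $2 if necessary.
save_result() {
  mkdir -p "$2"
  cp "$1" "$2/"
}

# Example use inside a job (assumes $SCRATCH is set by the system):
#   WORKDIR=$(make_scratch_dir "$SCRATCH")
#   Rscript myjob.R "$WORKDIR"        # writes intermediates to $WORKDIR
#   save_result "$WORKDIR/final.csv" "$HOME/results"
```

Only the final output touches the slow shared home directory; all the intermediate reads and writes stay on scratch.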
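For tip 3, the core count passed to registerDoMC can be derived in the qsub script instead of being hard-coded in R, so the two cannot drift apart. This is a sketch under assumptions: it assumes the scheduler exports the requested per-node core count as PBS_NUM_PPN (Torque) or NCPUS (PBS Pro), which may not match how ECCO or SDSx are configured, and worker_cores and myanalysis.R are hypothetical names. On the R side, the value would be read with commandArgs(trailingOnly = TRUE) and handed to registerDoMC.

```shell
#!/bin/bash
# Sketch: derive the foreach worker count from the cores requested in qsub.
# ASSUMPTION: the scheduler exports the requested per-node core count as
# PBS_NUM_PPN (Torque) or NCPUS (PBS Pro); check your own system.

# Print the requested core count minus one, but never less than 1.
worker_cores() {
  local requested="$1"
  if (( requested > 1 )); then
    echo $(( requested - 1 ))
  else
    echo 1
  fi
}

# Fall back to 1 core if neither variable is set.
REQUESTED="${PBS_NUM_PPN:-${NCPUS:-1}}"
WORKERS=$(worker_cores "$REQUESTED")
echo "Requested $REQUESTED cores; foreach will use $WORKERS workers"

# Hypothetical R script; inside it:
#   library(doMC)
#   registerDoMC(cores = as.integer(commandArgs(trailingOnly = TRUE)[1]))
# Rscript myanalysis.R "$WORKERS"
```

The floor of 1 keeps a single-core job from asking doMC for zero workers; with 8 cores requested, the script passes 7, matching the example in tip 3.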