Step 4 - Using the SDS

This page

This page describes how to use the SDS. Once your results have been computed on the SDS, you will want to validate them; see the relevant page for details.

Important: No internet access when logged on

Please note that the Synthetic Data Server is a restricted-access server. Although you can log on to the SDS from anywhere you have internet access, while you are logged on to the server:

  • you cannot transfer programs and/or data to or from the server; contact the data providers to perform these services for you. The key reason is that the data you are accessing are not intended for distribution: your access to the data is limited to the time you are logged on to the server.
  • you cannot download programs, code, modules, packages, or auxiliary data automatically from within your programs. This applies to R, SAS, Stata, and Python. See below for how to address this for packages in R and Stata. For any other data, you will need to identify the precise source and nature of the material, and request that it be uploaded for you. This restriction mirrors similar restrictions on the validation server, and enforces strong replicability: in order to validate your analysis, your analysis must be completely replicable.

Filesystem layout

The main filesystem is $HOME (7TB). The directory structure replicates that of the typical Census RDC node, on which both synthetic data and completed gold standard data reside:

  • /temporary/ for scratch space (for both SAS and Stata)
  • /rdcprojects/co/co00517 (for SSB)
  • /rdcprojects/tr/tr00612 (for SynLBD)


The most current data can change over time; the most reliable indicator is to inspect the data directories of each project:

  • SSB data resides in /rdcprojects/co/co00517/SSB/data/
  • SynLBD resides in /rdcprojects/tr/tr00612/data/synlbd/

Data documentation on these datasets can be found at

Additional public-use data is accessible under /data

  • Zero-obs datasets from the Census RDC are available at /data/virtualrdc/, in locations otherwise corresponding to their locations on the Census RDC: e.g., /economic/cbo/microdata is where the CBO files would be on the Census RDC, and /data/virtualrdc/economic/cbo/microdata is where they are found on the SDS.
  • Cleaned and ready-to-use (generally, as SAS files) public-use data for a variety of data sources can be found under /data/clean/(NAME), with accompanying documentation under /data/doc/(NAME). If you notice anything out of date, please let us know.
  • Note that if you use these data in your SDS-based analysis and request validation by the data owner, you need to explicitly identify these data, as they may NOT be available on the data owner's compute server.

User-created programs

Users should create programs OUTSIDE of their home directories (see backup policy below). Create a directory for your project under

  • /rdcprojects/co/co00517/SSB/programs/users/(LOGIN ID)
  • /rdcprojects/tr/tr00612/programs/users/(LOGIN ID)

This ensures ease of validation on the Census internal computers. See below on suggested programming practices.
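For example, with the hypothetical login ID jdoe, the SSB program directory would be created as follows (the SynLBD case is analogous):

```shell
# Create a personal program directory in the SSB project tree
# ('jdoe' is a hypothetical login ID; substitute your own)
mkdir -p /rdcprojects/co/co00517/SSB/programs/users/jdoe
```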

Statistical and other software

SDSx uses job scheduling software. If you need to learn about qsub, please see various tutorials about using qsub, including our own qsub page. However, we also have a few convenience commands (our 'q-commands' and 'i-commands'), which automatically submit jobs to the queue using the appropriate qsub commands (for general queues, and interactive queues, respectively). Available queues can be found on the SDSx queue page.

We also have a short tutorial you may want to consult.

Software                         | Versions                | Command line (on compute node)     | Qsub-aware command              | Runs on
-------------------------------- | ----------------------- | ---------------------------------- | ------------------------------- | -------------
SAS                              |                         |                                    | qsas, isas (use job queue 'sas') | compute nodes
Stata (SE, MP)                   |                         | stata(-mp,-se), xstata(-mp,-se)    | qstata(-mp,-se), iStata         | compute nodes
R                                |                         | R, Rscript                         | qR, iR                          | compute nodes
RStudio                          | 0.98 (using R 3.0.1)    |                                    |                                 | compute nodes
Matlab                           | R2013b, R2014b          |                                    | qmatlab, imatlab                | compute nodes
Octave                           |                         |                                    | qoctave, ioctave                | on demand
ASReml (also related R package)  | 3.00 [01 Jan 2009]      |                                    |                                 | on demand

Interactive usage of software

You will find many of these software packages available from the Gnome menu, under "Statistics". You can also launch them using the 'i-command' version (e.g., 'iStata' to launch Stata). However, all instances run from the menu or via an 'i-command' will run in the interactive queue, and are subject to limitations in terms of CPUs, memory, and runtime:

  • Wallclock limit: 2 hours
  • Jobs per user: 1
  • Memory limit per job: 4GB

Long-running jobs need to be submitted from the command line. The interactive versions should be considered appropriate for debugging, not for full computational jobs.

Batch submission of software

For longer-running jobs, users should use the 'q-commands'. Default runtimes, memory limits, and numbers of CPUs are noted on the SDSx queue page. Most 'q-commands' take "chunks" as arguments, where one chunk is 2 CPUs and 8GB of memory. For differing requirements (for instance, longer-running jobs), custom qsub scripts can be used; see the qsub page for more details. To monitor jobs in the queue, including your own jobs during processing, use qstat. A graphical utility wrapped around qstat is also available.
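For requirements beyond the chunk defaults, a plain qsub submission might look like this (script name and resource values are illustrative; consult the qsub page for queue names and limits):

```shell
# Request 4 CPUs (2 chunks), 16GB of memory, and 24 hours of walltime
# for a hypothetical batch script:
qsub -l nodes=1:ppn=4,mem=16gb,walltime=24:00:00 run_analysis.sh

# Monitor your own jobs in the queue:
qstat -u $USER
```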

Suggested programming practices

When validating results on the confidential data, the data custodians will use the same programs, but certain aspects of the environment will be different:

  • Exact filenames may differ ("synlbd" vs. "lbd")
  • Exact paths may change, although relative file structures are expected to be constant (by design)
  • Available add-on packages may be limited or not installed by default.

Think of the validation as a replication exercise, where your analysis is replicated by a different person in a somewhat different, constrained environment.

Paths and filenames

The most robust way to ensure ease of replication is to NEVER hard-code paths. Suggested practice is to use macro/global variables to encode such paths:
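As an illustration, a minimal sketch in Stata (the SAS macro-variable approach is analogous); the login ID and dataset name below are hypothetical, and only the root global would need to change in the validation environment:

```stata
* Define the project root once; derive all other paths from it.
* Only this first line changes between the SDS and the validation server.
global PROJROOT "/rdcprojects/co/co00517/SSB"
global DATA     "${PROJROOT}/data"
global PROGS    "${PROJROOT}/programs/users/jdoe"   // jdoe: hypothetical login ID

* All later file references go through the globals:
use "${DATA}/ssb_extract.dta", clear                // hypothetical file name
```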





R users should check that all packages that are required to run the program are available on the CRAN mirror, in the appropriate versions (versions change quickly; this can be checked with sessionInfo()). We regularly add certain R packages, and mirror CRAN, but if you need anything in particular, please contact the Help Desk. Since you cannot access the internet from within the SDSx, we will need to transfer the R packages for you (or update the CRAN mirror). You should include the following code at the TOP of your code (or in a setup R script that is run before all other R code):
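For instance, a minimal sketch of such a setup check (the package names are placeholders; substitute your actual dependencies):

```r
## Fail fast if a required package is missing -- packages cannot be
## installed from the internet on the SDS; ask the Help Desk instead.
required <- c("data.table", "sandwich")   # hypothetical package list
missing  <- setdiff(required, rownames(installed.packages()))
if (length(missing) > 0) {
  stop("Packages not installed on the SDS: ", paste(missing, collapse = ", "))
}
sessionInfo()   # record package versions for the validation request
```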

We occasionally mirror the RePEc repository of Stata packages to /cac/contrib/mirror/. Users can install packages by running commands such as the following (for any package, use the first character of the package name in the first line). Do not assume that any package is installed in the validation environment - specify all packages explicitly.
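A sketch, assuming the mirror keeps one subdirectory per leading letter, and using the hypothetical example package 'estout':

```stata
* Point Stata at the local RePEc mirror ('e' is the first character
* of the hypothetical package name 'estout'):
net from /cac/contrib/mirror/e
net install estout
```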

For more information, see


The SDSx cluster is configured as follows:

Names              | Processor | Number of … | Cores per … | Total cores, all nodes | Clockspeed | Memory per node | Resource set
------------------ | --------- | ----------- | ----------- | ---------------------- | ---------- | --------------- | ------------
login/head node    | AMD 6380  |             |             |                        |            | 64GB            | Login only
compute nodes      | AMD 6380  | 2           | 16          | 96                     | 2.5Ghz     | 256GB           | All
Total (June 2014)  |           |             |             | 128                    |            | 832 GB          |

Note: access to these resources may be limited. All access on SDSx is channeled through queues; see the queue configuration page for more details.


Due to the restricted-access nature of the server, we provide backup of critical files. However, we do not back up all files on the system, so in order to ensure that your critical programs get backed up, please note the following backup policy:

  • Files in your home directory (/home/(userid)) (and your desktop) are NOT backed up.
  • Files under /rdcprojects/co/co00517 and /rdcprojects/tr/tr00612 are generally backed up, but user-created data files (in the user/ directories) may be excluded in the future.
  • User-created programs under /rdcprojects/{co,tr}/{co00517,tr00612}/.../programs/users are ALWAYS backed up.
  • Files in the scratch space are never backed up, and are regularly removed to efficiently manage space.

Keeping informed

You will be notified by the Cornell Center for Advanced Computing (CAC) of any downtimes of the SDS cluster. You can unsubscribe from CAC's mailing list by closing your account on SDS.

For updates on data, you may receive email from an announcement-only mailing list. If you wish to be notified at a different email address, send an email with the body of the message stating "subscribe virtualrdc-sds-l". To unsubscribe, send an email with the body of the message stating "unsubscribe virtualrdc-sds-l".

Getting help

If you need further assistance, please consult our Help page on how best to direct your inquiry.