
Step 4: Using ECCO


First, if you have not already done so, verify that you have changed your original password, as described in Step 2. You only need to log onto the Social Science Gateway using SSH once, in order to change your password; subsequent connections can be made using SSH or NX.

SSH vs. NX

Although you can use SSH with X11 forwarding from your university desktop, the latency can be considerable, especially for the SAS or Stata GUI. For that reason, we suggest using NX.
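For reference, a minimal X11-forwarded SSH connection might look like the following (your user name is a placeholder; the hostname matches the one used in the profile example further down this page):

  ssh -Y youruserid@ecco.vrdc.cornell.edu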

  • Launch the web interface to NX for easy access (requires Java; will not work on OS X 10.8+. For permission issues with Java, see this page).

While the web interface will serve most people's initial needs, you may want to take advantage of a larger screen on your computer, or you may encounter restrictions with some graphical applications (SAS is a notorious culprit). If that is the case, proceed with the full client install for NX. The NoMachine NX Player is free to use.

Hardware

The cluster is currently being expanded. It consists of a head node and the compute nodes listed below, running CentOS 6. You access the compute nodes through the job scheduler, by specifying appropriate resources. Not all resources may be available to you.

  Names            Processor     Number of   Cores per  Total cores,  Clockspeed  Memory per        Resource set
                                 processors  processor  all nodes                 node
  compute-0-0      Xeon E7330    4           4          16            2.4 GHz     128 GB            Interactive jobs
  compute-0-{1-3}  Xeon E5530    2           4          24            2.40 GHz    144 GB or 212 GB  All
  compute-0-4      Xeon E7-8837  4           8          32            2.67 GHz    1 TB              All
  compute-0-{5-8}  AMD 6380      2           16         128           2.5 GHz     256 GB            All

compute-0-8 additionally hosts Knitro (see the software table below).

  Total (Nov 2013):  44 cores    628 GB
  Total (Dec 2013): 204 cores  2,676 GB
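If you need to steer a job toward particular hardware, resources are requested through the scheduler. A minimal sketch, using standard Torque resource syntax (adjust the counts to what your queue and access rights permit):

  # request one node with 4 processors and 32 GB of memory for an interactive session
  qsub -I -l nodes=1:ppn=4,mem=32gb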

Filesystem layout

The main filesystems are

  • /home (5.4TB)  is shared across all nodes, albeit using NFS over gigabit. /home houses two directory structures:
    • $HOME - your home directory. You cannot share files in your home directory with other people. We enforce a policy of deleting files older than 4 weeks from your home directory's "Trash" folder; such files cannot be "undeleted".
    • /ssgprojects/projectXXXX, where (shareable) project spaces live. The exact number is related to your primary user group (something like lv39_XXXX) and was communicated to you in the initial welcome message. You may be a member of multiple groups.
  • /scratch (also aliased (symlinked) as /temporary for compatibility with the Census RDC). /scratch is local to each node, including the headnode, and sizes vary between 4.4TB (headnode) and 5.2TB-7TB (compute nodes). We enforce a policy of deleting files older than 4 weeks in /scratch.
  • /data houses some (but not all) common public-use datasets, typically in SAS format. Prominent among these are the QWI (in /data/clean/qwipu/) and OnTheMap (/data/clean/onthemap). Other examples are public-use QCEW data or County Business Patterns (CBP).
  • The Census-RDC-emulating structure ("zero-obs datasets") is available under /data/virtualrdc (and, soon, on the VirtualRDC Vnode).
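For orientation, a few commands illustrating the layout (projectXXXX stands in for the project number from your welcome message; the per-user scratch subdirectory is an assumption, create it if it does not exist):

  # public-use QWI extracts, in SAS format
  ls /data/clean/qwipu/

  # node-local scratch space; files older than 4 weeks are deleted
  mkdir -p /scratch/$USER && cd /scratch/$USER

  # shareable output belongs in your project space
  cp output.sas7bdat /ssgprojects/projectXXXX/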

Statistical software

  Software                         Versions            Commandline            Qsub-aware command  Availability
                                                       (on compute node)      (head node)
  SAS                              9.3                 sas, sas93             qsas, isas          Compute nodes
  Stata (SE, MP)                   13.0                stata(-mp,-se),        qstata(-mp,-se)     Compute nodes
                                                       xstata(-mp,-se)
  R                                3.0.1               R, Rscript             qR, iR, iRstudio    Compute nodes
  ASReml (also related R package)  3.00 [01 Jan 2009]  asreml                 iasreml             Compute nodes
  Matlab                           R2013b (8.2.0.701)  matlab, matlab-R2013b  qmatlab, imatlab    Compute nodes
  Octave                           3.4.3               octave                 qoctave, ioctave    Compute nodes
  GRASS                            6.4.2               grass                                      Compute nodes
  Knitro                           9.0.1-z             (callable from Matlab)                     compute-0-8
  SPSS                             21.0.0              spss

Most software is available only on the compute nodes, not on the head node. If a 'q' command exists, you can submit batch jobs directly from a command-line window on the head node.

For interactive processing, you can

  • open an interactive QSUB session ('iqsub' or menu item) and launch the software from there,
  • use the convenience 'iXXX' commands listed above from a command-line window, or
  • use the desktop menu entries under "Other" or "VirtualRDC" in the Gnome/KDE desktops accessed through NX.

Job scheduler

ECCO uses Maui+Torque as its job scheduling system. Generic information can be found at http://www.clusterresources.com/torquedocs21/commands/qsub.shtml.

Queues

ECCO uses a number of queues that allow for efficient scheduling of jobs, combined with access rights to those queues. Queues impose limits on the type of jobs that can be submitted. The currently configured queues can be found on the "ECCO Queues" page. (As of Nov 26, 2013, no access restrictions are imposed; this will be updated as restrictions are put in place.)
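For instance, to direct a custom submission script (see the section on complex batch submission, below) at a particular queue, the usual Torque form is:

  # submit to a named queue; 'regular' and 'premium' appear elsewhere on this page
  qsub -q regular myjob.qsub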

Interactive jobs

To start an interactive job, launch "Interactive Session" (you will find it under Applications -> Statistics). Interactive SAS and R sessions are also available there. Other software packages (in particular Stata) need to be launched manually from an Interactive Session using 'xstata', 'xstata-se' or 'xstata-mp'. You can also start an interactive session by opening a terminal and typing 'iqsub'.
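From a terminal, the sequence is, as a sketch:

  iqsub       # request an interactive session; the scheduler places you on a compute node
  xstata-mp   # then launch the application, e.g. the Stata/MP GUI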

Advanced interactive jobs

At present, all nodes have licenses for SAS and Stata, and R is also installed on all nodes. If in the future some software is available only on a specific node, an attribute will be set allowing you to transparently request nodes with that software. If, for data-processing purposes, you have local data on a specific node, you can request that particular node by opening a terminal session (you will find it under Applications -> Accessories -> Terminal) and issuing

  qsub -I -q premium -l nodes=compute-0-3

Interactive jobs are restricted to run for one hour.
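Within that limit, you can make your expected duration explicit using the standard Torque walltime resource (whether anything longer would be granted depends on the queue configuration):

  # request a one-hour walltime for an interactive session
  qsub -I -l walltime=01:00:00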

Initial configuration for improved monitoring

The job scheduler can send you email messages at certain points during job submission and execution. Our scripts are configured to do so when the job ends and if it is aborted. To ensure that you receive those messages, perform the following steps. They only need to be done once, ideally after the first login:

  1. Open a terminal shell (if logged in via NX; not necessary if logged in through SSH)
  2. Type the following command, substituting your own email address:
    echo "myfavorite.email@somewhere.com" > .forward

Easy batch submission

In order to run longer jobs, you will want to submit batch jobs. There are pre-defined shortcuts for some standard statistical software packages. Typically, they allow some minimal customization (amount of CPU and memory used); simply typing the command without any arguments will print a help text.

  • Submit a qsub SAS job:
    qsas program.sas
  • Submit a qsub Stata job (this does NOT work for Xstata):
    qstata program.do
    qstata-se program.do
    qstata-mp program.do

    (note: qstata-mp forces the use of 8 processors, which may delay the scheduling of your program if there is high node usage)

  • Submit a qsub R job:
    qR program.R

    (note: this will write statistical output to program.output and errors to program.log)

  • Submit a qsub Matlab job:
    qmatlab program.m

    (note: this will write output to program.log)

  • Submit a qsub Octave or ASReml job: please see the section on complex batch submission, below. Note that for Matlab, you will need to specify
    matlab -nodesktop -nosplash -r foo
    or
    matlab -nodesktop -nosplash < foo.m

    where foo.m is the program containing your Matlab commands [ref]

Complex batch submission

For more complex job submissions, you will need to create a custom qsub script. For more detailed notes, please see this page.
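As a minimal sketch of such a script (the directives are standard Torque/PBS; the job name, queue, resource amounts, and program name are placeholders):

  #!/bin/bash
  # myjob.qsub - a minimal custom Torque submission script
  #PBS -N myanalysis              # job name
  #PBS -q regular                 # target queue (see the "ECCO Queues" page)
  #PBS -l nodes=1:ppn=2,mem=8gb   # one node, two processors, 8 GB of memory
  #PBS -m ae                      # mail on abort and on end (see monitoring setup above)
  #PBS -j oe                      # merge stdout and stderr into a single file

  cd $PBS_O_WORKDIR               # run from the directory the job was submitted from
  octave -q program.m > program.log 2>&1

Submit it from the head node with 'qsub myjob.qsub'.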

Monitoring your jobs

The status of the compute nodes can be monitored using the web interface, available locally (when using a browser running on the head node) at http://localhost, or from outside ECCO at http://www.vrdc.cornell.edu/ecco-monitor/

The status of your own jobs can be monitored using

 qstat

which will show jobs in the queue and running, identifying the nodes they are running on:

 Job id                    Name             User            Time Use S Queue
 ------------------------- ---------------- --------------- -------- - -----
 1902.ecco                  ..._program.qsub (userid)        06:18:04 R regular
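To restrict the listing to your own jobs and include the nodes allocated to each, the standard Torque flags are:

  qstat -n -u $USER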

If you need to stop a job, run

 qdel (JOBID)

e.g.

  qdel 1902

Next steps

If ECCO's resources no longer suffice, you may want to expand to other local or national compute clusters. You may want to transfer your prepared data to the XSEDE resource of your choice; follow Step 6.

Backup

By default, your account and data are not backed up. Typically, you will rely on backups at your host institution, and you will synchronize or use an off-site versioning system (Subversion, Git) to store your programs. However, if needed, backup can be provided for an additional fee; please contact the PIs. Subversion client tools are installed.
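As a sketch of the off-site versioning workflow with the installed Subversion client (the repository URL is a placeholder for whatever your host institution provides):

  # check out a repository hosted at your institution
  svn checkout https://svn.example.edu/repos/myproject programs
  cd programs
  # ... edit your programs, then record the changes off-site ...
  svn add newprogram.do
  svn commit -m "add data preparation step"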

Keeping informed

By default, we will subscribe you to an announcement-only mailing list (econ-compute-discussion-l@cornell.edu) to notify you of any important information about the server.

  • If you wish to have additional notifications about general changes or events at the VirtualRDC, or you wish to be notified at a different email address, consult our Mailing list page.

Tips and tricks

If you prefer a more informative command line, you may want to adjust your shell prompt and your login profile. Careful: doing this wrong may result in unwanted effects.

  • Modify your (hidden) file ".profile" or ".bash_profile" (notice the dot), using your editor of choice:
    if [[ "$HOSTNAME" = "ecco.vrdc.cornell.edu" ]]
    then
      PS1="\h \w ->"
    fi
    export PATH=$PATH:$HOME/bin/
  • In your Gnome shell/terminal, on the menu bar,
    • choose "Edit -> Profile Preferences -> Title and Command"
    • check the box "Run a custom command..." and enter "/bin/bash -l" in the field.

Getting assistance

If you need further assistance, please consult our Help page on how best to direct your inquiry.