First, if you have not already done so, verify that you have changed your original password, as per Step 2. You only have to log onto the Social Science Gateway using SSH once in order to change your password. Subsequent connections can be made using SSH or NX.
SSH vs. NX
Although you can use SSH with X11 forwarding from your university desktop, high latency will make graphical applications sluggish, especially the SAS or Stata GUI. For that reason, we suggest using NX.
- Launch the web interface to NX for easy access (requires Java).
While the web interface will serve most people’s initial needs, you may want to take advantage of a larger screen on your computer, or you may encounter restrictions with some graphical applications (SAS is a notorious culprit). In either case, proceed with the full client install for NX. The NoMachine NX client is free to use.
- If you do not already have NX installed, follow download and installation instructions at http://www.nomachine.com/documents/client/install.php. You should choose the NX client for the “platform” of your desktop computer (i.e., if you are using a Windows computer to access the SSG, you should download the Windows client).
- If using Mac OS X 10.7 or higher, you need to follow these alternate instructions.
- Once you have NX installed, you need to configure a “session”. We have prepared a pre-configured NX session config file, downloadable from here. Unzip it, and put the file it contains, called ‘SSG.nxs’, into the NX config location:
- $HOME/.nx/config (Linux, Mac)
- C:\Documents and Settings\(USERNAME)\.nx\config (Windows XP and previous)
- C:\Users\(USERNAME)\.nx\config (Windows 7, maybe Vista)
- You can now open the “NX Client for Windows” (or … for Linux or … for Mac), and should see a session called ‘SSG’. Open that, and when prompted for a password, use your SSG password.
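On Linux or Mac, placing the session file can be done from a terminal. The following is a minimal sketch, assuming the download has already been unzipped; `install_nx_session` is a hypothetical helper name, not something shipped with the NX client.

```shell
# Sketch only: copy the unzipped SSG.nxs into the per-user NX config directory.
# install_nx_session is a hypothetical helper, not part of the NX client.
install_nx_session() {
    src="$1"                             # path to the unzipped SSG.nxs
    dest_dir="${2:-$HOME/.nx/config}"    # NX config location on Linux and Mac
    if [ ! -f "$src" ]; then
        echo "session file not found: $src" >&2
        return 1
    fi
    # Create the config directory if NX has not created it yet, then copy.
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/"
}

# Example, run from the directory where you unzipped the download:
#   install_nx_session ./SSG.nxs
```

The optional second argument is only there so the function can be tried against a scratch directory before touching your real NX configuration.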
Filesystem layout
The main filesystems are:
- /home (5.4TB) is shared across all nodes, albeit using NFS over gigabit. /home houses two directory structures:
  - $HOME – your home directory. You cannot share files from your home directory with other people. Important note: as of May 1, 2011, we are enforcing a policy of deleting files older than 4 weeks in your home directory’s “Trash” folder. Thus, files older than 4 weeks cannot be “undeleted”.
  - /ssgprojects/projectXXXX, where (shareable) project spaces live. The exact number is related to your primary user group (something like lv39_XXXX) and was communicated to you in the initial welcome message. You may be a member of multiple groups.
- /scratch (also aliased (symlinked) as /temporary for compatibility with the Census RDC). /scratch is local to each node, including the headnode, and sizes vary between 4.4TB (headnode) and 5.2TB (compute nodes). Important note: as of May 1, 2011, we are enforcing a policy of deleting files older than 4 weeks in /scratch.
- /data houses some (but not all) common public-use datasets, typically in SAS format. Prominent among these are the QWI (in /data/clean/qwipu/) and OnTheMap (/data/clean/onthemap). Other examples are public-use QCEW data or County Business Patterns (CBP).
  - The Census-RDC-emulating structure (“zero-obs datasets”) is available under /data/virtualrdc.
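Because files older than 4 weeks are deleted from /scratch, it can help to check which of your files are approaching that age. The sketch below uses `find` with a modification-time test; `list_stale_files` is a hypothetical helper, and whether the cleanup policy actually keys on modification time (rather than access time) is an assumption.

```shell
# Sketch only: list regular files under a directory that were last modified
# more than DAYS days ago (default 28, i.e. 4 weeks).
# Assumption: the cleanup policy is based on modification time.
list_stale_files() {
    dir="$1"
    days="${2:-28}"
    # find counts age in whole days, so "+28" matches files at least 29 days old.
    find "$dir" -type f -mtime +"$days" -print
}

# Example (the per-user subdirectory is an assumed layout):
#   list_stale_files "/scratch/$USER"
```

Passing a smaller number of days, for example `list_stale_files /scratch/mydir 21`, gives a week of warning before files reach the 4-week mark.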
Statistical software
| Software | Versions | Commandline (on compute node) | Availability |
|---|---|---|---|
| SAS | 9.2 | sas, sas92 | Compute nodes |
| SAS | 9.3 | sas93 | Compute nodes |
| Stata (SE, MP) | 12.1 | stata(-mp,-se), xstata(-mp,-se) | Compute nodes |
| R | 2.15.0 | R, R-2.15, Rscript | Compute nodes |
| R | 2.14.1 | R-2.14 | Compute nodes |
| R | 2.13.1 | R-2.13 | Compute nodes |
| ASReml (also related R package) | 3.00 [01 Jan 2009] | asreml | Compute nodes |
| Matlab | R2012b (8.0.0.783) | matlab, matlab-R2012b | Compute nodes |
| Matlab | R2012a (7.14.0.739) | matlab-R2012a | Compute nodes |
| Octave | 3.0.5 | octave | compute-0-3 only |
| SPSS | 20.0.0 | spss | |
Most software is available only on the compute nodes, not on the head node. For interactive processing, you will need to open an interactive qsub session and launch the software from there.
Job scheduler
The SSG uses Maui+Torque for its job scheduling systems. Generic information can be found at http://www.clusterresources.com/torquedocs21/commands/qsub.shtml.
Interactive jobs
In order to launch interactive jobs, launch “Interactive Session” (you will find it under Applications -> Statistics). Interactive SAS and R sessions are also available there. Other software packages (in particular Stata) need to be manually launched from an Interactive Session using ‘xstata’, ‘xstata-se’ or ‘xstata-mp’. You can also launch an interactive session by opening a terminal, and typing ‘iqsub’.
Advanced interactive jobs
At present, all nodes have licenses to SAS and Stata, and R is also installed on all nodes. If software becomes available only on specific nodes in the future, an attribute will be set so that you can transparently request nodes with that software. If you have local data on a specific node, you can request that particular node by opening a terminal session (you will find it under Applications -> Accessories -> Terminal) and typing
qsub -I -q interactive -l nodes=compute-0-3
or
iqsub 3
Interactive jobs are restricted to run for one hour.
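The two commands above appear to be equivalent, with the argument to iqsub selecting the compute node. As a rough illustration of that correspondence, the sketch below prints the qsub command for a given node number. `iqsub_command` is a hypothetical function; the real iqsub script may do more, and the command printed when no node is given is an assumption.

```shell
# Sketch only: show which qsub command corresponds to "iqsub N".
# iqsub_command is hypothetical; it prints the command instead of running it.
iqsub_command() {
    case "$1" in
        "")
            # Assumption: without a node number, no specific node is requested.
            echo "qsub -I -q interactive" ;;
        *[!0-9]*)
            echo "node number must be numeric: $1" >&2
            return 1 ;;
        *)
            echo "qsub -I -q interactive -l nodes=compute-0-$1" ;;
    esac
}

iqsub_command 3   # prints: qsub -I -q interactive -l nodes=compute-0-3
```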
Initial configuration for improved monitoring
The job scheduler can send you email messages at certain points in the job submission and execution process. Our scripts are configured to do so when the job ends, and if it gets aborted. To ensure that you receive those messages, do the following steps. They only need to be done once, ideally after the first login:
- Open a terminal shell (if logged in via NX; not necessary if logged in through SSH)
- Type the following commands:
echo "myfavorite.email@somewhere.com" > .forward
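The command above writes .forward into the current directory, so it only has the intended effect when run from your home directory. A slightly more defensive variant, as a sketch: `set_forward` is a hypothetical helper that writes to the home directory explicitly, does a very loose sanity check on the address, and prints the file back for verification.

```shell
# Sketch only: write an email address to .forward in the home directory.
# set_forward is a hypothetical helper; the address check is deliberately loose.
set_forward() {
    email="$1"
    home_dir="${2:-$HOME}"
    case "$email" in
        *@*.*) ;;   # looks roughly like user@host.domain
        *) echo "not an email address: $email" >&2; return 1 ;;
    esac
    echo "$email" > "$home_dir/.forward"
    # Print the file back so you can confirm what was written.
    cat "$home_dir/.forward"
}

# Example:
#   set_forward "myfavorite.email@somewhere.com"
```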
Easy batch submission
In order to run longer jobs, you will want to submit batch jobs. There are pre-defined short-cuts for standard SAS and Stata submissions:
- Submit a qsub SAS job:
qsas program.sas
- Submit a qsub Stata job (this does NOT work for Xstata):
qstata program.do
qstata-se program.do
qstata-mp program.do
(note: qstata-mp forces the use of 8 processors, which may delay the scheduling of your program if there is high node usage)
- Submit a qsub R job:
qR program.R
(note: this will write statistical output to program.output and errors to program.log)
- Submitting a qsub Matlab, Octave, or ASReml job: Please see the section on complex batch submission, below. Note that for Matlab, you will need to specify
matlab -nodesktop -nosplash -r foo
or
matlab -nodesktop -nosplash < foo.m
where foo.m is the program containing your Matlab commands.
Complex batch submission
For more complex job submissions, you will need to create a custom qsub script. The basic steps are:
- Using a regular editor, write a small script (for instance, myjob.qsub) with all the necessary information:
#PBS -l nodes=1:ppn=1
#PBS -N myjobname
#PBS -j oe
#PBS -m ae
cd /path/to/my/program/directory
sas myprog.sas
where SAS (or Stata, or R) should be called as you would normally call the program when submitting a batch job (refer to your software documentation for details). For the PBS parameters, four examples of which are shown above, refer to the qsub manual. It is preferable to use full paths wherever appropriate.
- Open a terminal shell, if logged in graphically
- Submit the job with
qsub myjob.qsub
- Monitor the job (see below)
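For software without a pre-defined shortcut, such as Matlab, the same steps apply. The sketch below generates a qsub script using the PBS directives from the example and the stdin form of the Matlab call mentioned earlier. `write_matlab_job` and its arguments are hypothetical names chosen for illustration, and the directive values are simply copied from the example rather than tuned for any particular job.

```shell
# Sketch only: generate a qsub script that runs a Matlab program in batch.
# write_matlab_job JOBNAME WORKDIR PROGRAM.m [OUTFILE]
# The function name and arguments are hypothetical.
write_matlab_job() {
    jobname="$1"
    workdir="$2"                      # ideally a full path
    program="$3"                      # e.g. foo.m
    outfile="${4:-$jobname.qsub}"
    cat > "$outfile" <<EOF
#PBS -l nodes=1:ppn=1
#PBS -N $jobname
#PBS -j oe
#PBS -m ae
cd $workdir
matlab -nodesktop -nosplash < $program
EOF
    # Print the name of the generated script.
    echo "$outfile"
}

# Example:
#   write_matlab_job foo /path/to/my/program/directory foo.m
#   qsub foo.qsub
```

Reading the program from standard input means Matlab exits when it reaches the end of the file, which suits an unattended batch job.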
Monitoring your jobs
The status of the compute nodes can be monitored using the web interface available locally (when using a browser running on the headnode) at http://localhost, or from outside the SSG at http://www.vrdc.cornell.edu/ssg-monitor/
The status of your own jobs can be monitored using
qstat
which will show your queued and running jobs (qstat -n additionally lists the nodes they are running on):
Job id Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1902.ssg ..._program.qsub (userid) 06:18:04 R regular
If you need to stop a job, run
qdel (JOBID)
e.g.
qdel 1902
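When several jobs are in the queue, the numeric job IDs needed by qdel can be pulled out of the qstat listing. The following is a sketch, assuming the column layout shown in the sample output above (state in the fifth column); `running_job_ids` is a hypothetical helper, and the demo input simply replays that sample.

```shell
# Sketch only: read qstat output on stdin and print the numeric IDs of jobs
# in state R (running). Assumes the column layout from the sample listing.
running_job_ids() {
    awk '$1 ~ /^[0-9]+\./ && $5 == "R" { split($1, id, "."); print id[1] }'
}

# Demo using the sample listing; on the SSG you would run: qstat | running_job_ids
printf '%s\n' \
    'Job id Name User Time Use S Queue' \
    '------------------------- ---------------- --------------- -------- - -----' \
    '1902.ssg ..._program.qsub (userid) 06:18:04 R regular' | running_job_ids
# prints: 1902
```

The header and separator lines are skipped because their first field does not start with a job number.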
Next steps
Once your jobs are done, you may want to transfer your prepared data to the XSEDE resource of your choice. Follow Step 6.
Backup
By default, your account and data are not backed up. Typically, you will rely on backup at your host institution, and you will synchronize or use an off-site versioning system (CVS, Subversion, Bazaar) to store your programs; Subversion client tools are installed. However, if needed, backup can be provided at an additional fee; please contact the PIs.
Keeping informed
By default, we will subscribe you to an announcement-only mailing list (virtualrdc-ssg-l@cornell.edu) to notify you of any important information about the server.
- If you wish to have additional notifications about general changes or events at the VirtualRDC, you can subscribe from our front page to our RSS feed, or via Google to email notifications of the RSS feed.
- If you wish to be notified at a different email address, send an email to listmanager@list.cornell.edu with the body of the message stating “subscribe virtualrdc-ssg-l”.
Getting assistance
If you need further assistance, please consult our Help page on how best to direct your inquiry.


