Step 2 - Setting up on the Simulated FSRDC server

The Simulated FSRDC is hosted on the Synthetic Data Server (SDS).

Please follow these instructions on how to install software and configuration files.


User interface

While both the real FSRDC and the SDS use Linux desktop systems as the primary interface, there are a few key (and unavoidable) differences:

  • One key difference between the FSRDC Linux system (older, as of December 2015) and the SDS is the graphical interface. On the FSRDC, you will use the KDE 3.5 desktop; the SDS uses the somewhat newer Gnome 2.28 interface. The saving grace is that the two are closer to each other, and to older versions of Windows (Windows 7 and earlier), than they are to more modern user interfaces.
    If you have used a Linux desktop system in the past 5 years, you have probably encountered neither of them, because both are fairly old; you are more likely to have seen KDE 4 (Plasma), Gnome 3 (Gnome Shell), or Unity (Ubuntu's custom interface) (see this site for more information on Linux desktop environments).
  • Another, more subtle difference is the underlying Linux operating system, which in part drives the user interface differences above. The real FSRDC uses RHEL 5; the SDS uses CentOS 6, a RHEL 6 clone (see here for lifecycle dates for RHEL systems). If you have used a desktop Linux system yourself, it was most likely an Ubuntu system. There are some differences, but for the most part the commands and utilities available to you are the same.
  • The actual hardware underlying the two clusters is of course different. The key component to worry about is available memory (in particular when using Stata): the SDS uses compute nodes with 256GB of memory, which as of December 2015 was more than most, but not all, FSRDC nodes had available.

Job scheduler

On the SDS, programs are run by submitting them to a job scheduler; the real FSRDC uses the same mechanism. Some of the details differ: we use a free, open-source solution, while the Census Bureau uses a commercial one. However, there are more similarities than differences:

  • Both systems use the fundamental 'qsub' command.
  • On both systems, some system-specific wrapper commands allow for convenient quick submission of SAS, Stata, etc. programs (see Statistical Software on the SDS page). There are some minor differences between these commands (many years ago, they started from the same base, but have evolved separately).
  • On both systems, you need to submit jobs from the command line.
  • On the SDS only, there are menu entries in the Gnome desktop. These are not available on the real FSRDC.
  • More information can be found
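
As a rough illustration of command-line submission with 'qsub', the sketch below builds a small shell script and the matching submission command. It is only a sketch under stated assumptions: the file names (analysis.do, run_analysis.sh) are hypothetical, "stata -b do" is Stata's generic batch-mode invocation rather than a confirmed SDS or FSRDC convention, and no scheduler options are shown because they differ between the two systems. The system-specific wrapper commands mentioned above are likely to be more convenient in practice.

```python
from typing import List


def build_job_script(command_line: str) -> str:
    """Return the text of a minimal shell script that runs one command."""
    return "#!/bin/bash\n" + command_line + "\n"


def qsub_command(script_path: str) -> List[str]:
    """Return the argument list for submitting a script with qsub.

    No scheduler options are added here; they are site-specific.
    """
    return ["qsub", script_path]


# Hypothetical example: run a Stata do-file in batch mode.
script_text = build_job_script("stata -b do analysis.do")
command = qsub_command("run_analysis.sh")

print(script_text, end="")
print(" ".join(command))
```

The code only prints the script and the command; it does not call the scheduler, so it can be tried on any machine before logging in to the SDS.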

Special considerations

  • On the real FSRDC,
    • your project directory will be under /rdcprojects/XX/XXNNNNN, where XX is a two-letter abbreviation related to the physical FSRDC where you initiated the project, and NNNNN is an arbitrary number.
    • the data you are allowed to access can be found under /economic (Economic Directorate data), /demographic (Demographic Directorate data), and /mixed (LEHD data). You will only be able to access the data for which you requested permissions.

    On the Simulated RDC,

    • your project directory will not exist, but we can create one under /rdcprojects/XX/XXNNNNN with a number and letter combination that you provide to us.
    • the zero-obs data can be found under /data/virtualrdc/economic, /data/virtualrdc/demographic, and /data/virtualrdc/mixed. If you write programs, you should take into account that the "/data/virtualrdc" prefix needs to be removed from all paths in any program that is transferred to the FSRDC.
  • You are using the Synthetic Data Server. Because the primary purpose of the server is to host synthetic data products with specific non-dissemination rights, it is not possible to upload or download data or programs. This is the same as with the real FSRDC servers. If you need public-use data on the system that you intend to test your programs with, please let us know.
  • You cannot access the internet from within the remote desktop environment. This, too, is the same as with the real FSRDC servers.
  • You can copy-and-paste from your personal (or university) desktop INTO the remote environment. This is different from the real FSRDC servers.
  • Not all Census data sets have a "zero-obs" equivalent on these servers. Please let us know if you need any that are missing for your project. There is no guarantee that we can provide them in a timely fashion, since that depends entirely on the ability of the Census Bureau to provide disclosure-protected versions of those datasets (yes, even though these zero-obs datasets contain no observations, and thus no real data, the metadata, such as the names of variables, is itself sometimes confidential).
  • There is no documentation on zero-obs datasets on the remote desktop environment. This is different from the real FSRDC servers, where Word documents and PDF files (containing confidential information, sometimes) are available to be viewed with appropriate permissions.
  • There are quite a few public-use datasets available on the SDS filesystem under /data/clean. This is a feature of the SDS - these data are not available on the real FSRDC servers. However, if you find them useful, you can request that a copy be made onto the FSRDC servers. This does not differ from any other request for providing user-provided data to FSRDC-hosted projects, and you should consult with the local FSRDC administrator.
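
The path difference for zero-obs data described above (the "/data/virtualrdc" prefix exists on the SDS but not on the real FSRDC) can be handled with a small rewrite step before a program is transferred. The following is a minimal sketch, not an official tool: the function name and the example file path are hypothetical, and it assumes the prefix only needs to be removed in front of the three data roots named above (economic, demographic, mixed).

```python
import re

SDS_PREFIX = "/data/virtualrdc"
DATA_ROOTS = ("economic", "demographic", "mixed")

# Match the SDS prefix only when it is directly followed by one of the
# three data roots, so other paths (for example /data/clean) are untouched.
_PREFIX_PATTERN = re.compile(
    re.escape(SDS_PREFIX)
    + r"(?=/(?:" + "|".join(DATA_ROOTS) + r")(?![A-Za-z0-9_]))"
)


def to_fsrdc_paths(program_text: str) -> str:
    """Return program text with the SDS-only path prefix removed."""
    return _PREFIX_PATTERN.sub("", program_text)


# Hypothetical Stata line written on the SDS.
sds_line = 'use "/data/virtualrdc/economic/example/file.dta"'
print(to_fsrdc_paths(sds_line))  # prints: use "/economic/example/file.dta"
```

An alternative that avoids rewriting altogether is to define the data root once at the top of each program (for instance in a macro or variable) and change only that single line on transfer.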