In July 2009, John Abowd and Lars Vilhuber were awarded NSF grant SES-0922005 to continue the VirtualRDC in its current form and to create a new Social Science Gateway to TeraGrid. From our grant proposal:
"The Virtual Research Data Center at Cornell University has been a successful research support tool for users of many of the Census Bureau large-scale confidential data products including, but not limited to, those that are accessible via the Census Research Data Center network. [...] In addition, most social science researchers face substantial hurdles when they wish to harness the power of large-scale computational clusters, in particular when using new, very large synthetic data sets with their unprecedented detail on people, jobs, and firms. The proposed activity seeks to extend the VirtualRDC model to allow support of tera-scale social science computing via the NSF-sponsored TeraGrid resources. The most widespread statistical software packages used by social scientists, i.e., SAS, Stata, and SPSS, are not available on the TeraGrid itself or on any of the servers at the borders of the TeraGrid with fast connections to it. When viewing the problem through the lens of the typical data-driven research process (extract, edit and transform data; transfer data to a computational location; and perform analysis) social science researchers are typically constrained in at least one of these steps when approaching the high-performance computing clusters on the TeraGrid. For most data preparation, and for much analysis, the lack of standard statistical analysis and data preparation software packages is a serious impediment. However, the typical social scientist workstation or university-provided computational infrastructure does not have the resources to handle these very large data sets. Furthermore, the social science workstation and the university-provided infrastructure do not have sufficiently fast data connectivity to transfer any large prepared data files to the TeraGrid for processing there. 
This project aims to remedy bottlenecks in the first and second steps, with a focused expansion of resources at a critical location resulting in a highly useful gateway to the TeraGrid for the social sciences. The project builds a social science TeraGrid gateway that (i) allows researchers to perform the data preparation step using their comfort-level software packages, speeding up the data preparation phase, and (ii) do so on servers that have a fast connection to the TeraGrid, thus greatly speeding up the data-transfer process. [...]"
The full award information can be found at http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0922005. Press articles are available at the Cornell Chronicle and at the ILR News Center.