LBD Synthetic Data v2

Background

The Synthetic LBD Beta Data Product (SynLBD) is an experimental data product produced by the U.S. Census Bureau in collaboration with Duke University, Cornell University, the National Institute of Statistical Sciences (NISS), the Internal Revenue Service (IRS), and the National Science Foundation (NSF). The purpose of the SynLBD is to provide users with access to a longitudinal business data product that can be used outside of a secure Census Bureau facility. The Census Bureau created version 2 of the SynLBD by synthesizing information from the confidential Longitudinal Business Database (LBD; Miranda & Jarmin, 2002) on establishments' employment and payroll, establishments' birth and death years, and industrial classification. The Census Disclosure Review Board and its counterparts at the IRS have reviewed the content of the file and allowed the release of these data for public use. See Kinney et al. (2011) for detailed information on methods.

Scope

The SynLBD v2 covers the years 1976-2000 and contains synthetic values for employment and payroll.

How to Access the SynLBD

Interested researchers should first read the Census Bureau's information page on the SynLBD, in particular with respect to

  • analytic validity (disclaimer)
  • validation protocol

They may then apply for a free account on the VirtualRDC's Synthetic Data Server (SDS), by submitting “Application to use the SynLBD Synthetic Beta File” to ces.synthetic.data.use@census.gov.

Users may run programs on the server and may remove results without disclosure or content review by Census Bureau staff. The SDS provides SAS, Stata, and R analysis software and a computing environment similar to the one used to analyze the confidential LBD Gold Standard data on Census Bureau internal computers.

An important component of using the SynLBD v2 is the possibility of validating results against the confidential LBD Gold Standard files. The SynLBD v2 has no guaranteed analytic validity. Validation against the internal-use, confidential LBD files will occur only if users run a clearly documented, error-free analysis against the SynLBD v2 files on the Synthetic Data Server.

Documentation

  • S. K. Kinney, J. P. Reiter, A. P. Reznek, J. Miranda, R. S. Jarmin, and J. M. Abowd, "Towards unrestricted public use business microdata: the Synthetic Longitudinal Business Database," Center for Economic Studies, U.S. Census Bureau, Working Papers 11-04, 2011.
    @TECHREPORT{CES-WP-11-04,
    author = {Satkartar K. Kinney and Jerome P. Reiter and Arnold P. Reznek and
    Javier Miranda and Ron S. Jarmin and John M. Abowd},
    title = {Towards Unrestricted Public Use Business Microdata: The {Synthetic}
    {Longitudinal} {Business} {Database}},
    institution = {Center for Economic Studies, U.S. Census Bureau},
    year = {2011},
    type = {Working Papers},
    number = {11-04},
    month = Feb,
    abstract = {In most countries, national statistical agencies do not release establishment-level
    business microdata, because doing so represents too large a risk
    to establishments' confidentiality. One approach with the potential
    for overcoming these risks is to release synthetic data; that is,
    the released establishment data are simulated from statistical models
    designed to mimic the distributions of the underlying real microdata.
    In this article, we describe an application of this strategy to create
    a public use file for the Longitudinal Business Database, an annual
    economic census of establishments in the United States comprising
    more than 20 million records dating back to 1976. The U.S. Bureau
    of the Census and the Internal Revenue Service recently approved
    the release of these synthetic microdata for public use, making the
    synthetic Longitudinal Business Database the first-ever business
    microdata set publicly released in the United States. We describe
    how we created the synthetic data, evaluated analytical validity,
    and assessed disclosure risk.},
    owner = {vilhuber},
    timestamp = {2013.10.14},
    url = {http://ideas.repec.org/p/cen/wpaper/11-04.html}
    }
  • S. K. Kinney, J. P. Reiter, A. P. Reznek, J. Miranda, R. S. Jarmin, and J. M. Abowd, "Towards unrestricted public use business microdata: the Synthetic Longitudinal Business Database," International Statistical Review, vol. 79, iss. 3, pp. 362-384, 2011.
    @ARTICLE{KinneyEtAl2011,
    author = {Kinney, Satkartar K. and Reiter, Jerome P. and Reznek, Arnold P.
    and Miranda, Javier and Jarmin, Ron S. and Abowd, John M.},
    title = {Towards Unrestricted Public Use Business Microdata: The {Synthetic}
    {Longitudinal} {Business} {Database}},
    journal = {International Statistical Review},
    year = {2011},
    volume = {79},
    pages = {362--384},
    number = {3},
    doi = {10.1111/j.1751-5823.2011.00153.x},
    issn = {1751-5823},
    keywords = {Economic census, data confidentiality, synthetic data, disclosure
    limitation},
    owner = {vilhuber},
    publisher = {Blackwell Publishing Ltd},
    timestamp = {2012.09.04},
    url = {http://dx.doi.org/10.1111/j.1751-5823.2011.00153.x}
    }
  • J. Miranda, "LBD codebook," U.S. Census Bureau, mimeo, 2011.
    @TECHREPORT{LBD_Codebook,
    author = {Javier Miranda},
    title = {{LBD} Codebook},
    institution = {U.S. Census Bureau},
    year = {2011},
    type = {mimeo},
    owner = {vilhuber},
    timestamp = {2013.10.14}
    }
  • J. Miranda, "SynLBD codebook," U.S. Census Bureau, mimeo, 2011.
    @TECHREPORT{SynLBD_Codebook,
    author = {Javier Miranda},
    title = {{SynLBD} Codebook},
    institution = {U.S. Census Bureau},
    year = {2011},
    type = {mimeo},
    owner = {vilhuber},
    timestamp = {2013.10.14}
    }
  • J. Miranda and R. Jarmin, "The Longitudinal Business Database," U.S. Census Bureau, Center for Economic Studies, Discussion Paper CES-WP-02-17, 2002.
    @TECHREPORT{MirandaJarmin2002,
    author = {Javier Miranda and Ron Jarmin},
    title = {The {Longitudinal} {Business} {Database}},
    institution = {U.S. Census Bureau, Center for Economic Studies},
    year = {2002},
    type = {Discussion Paper},
    number = {CES-WP-02-17},
    abstract = {The LBD is a research dataset constructed at the Census Bureau's Center
    for Economic Studies. The LBD is an establishment based file created
    by linking the annual snapshot files from Census Bureau's Business
    Register over time. It contains high quality longitudinal establishment
    linkages. Firm level linkages are currently under development at
    CES. The LBD contains several basic data items such as firm ownership,
    location, industry, payroll and employment.},
    owner = {vilhuber},
    timestamp = {2009.09.25},
    url = {http://ideas.repec.org/p/cen/wpaper/02-17.html}
    }
  • S. K. Kinney, J. P. Reiter, A. P. Reznek, J. Miranda, R. S. Jarmin, and J. M. Abowd, "Appendix to 'Towards unrestricted public use business microdata: the Synthetic Longitudinal Business Database'," Center for Economic Studies, U.S. Census Bureau, online document, 2011.
    @TECHREPORT{Kinney_et_al_2011_Appendix,
    author = {Kinney, Satkartar K. and Reiter, Jerome P. and Reznek, Arnold P.
    and Miranda, Javier and Jarmin, Ron S. and Abowd, John M.},
    title = {Appendix to '{T}owards Unrestricted Public Use Business Microdata: The {Synthetic}
    {Longitudinal} {Business} {Database}'},
    institution = {Center for Economic Studies, U.S. Census Bureau},
    year = {2011},
    type = {online document},
    keywords = {Economic census, data confidentiality, synthetic data, disclosure
    limitation},
    owner = {vilhuber},
    url = {https://www.census.gov/ces/pdf/SynLBD_Kinney_et_al_2011_Appendix.pdf}
    }

Additional Information

Earlier versions

Version 1 of the LBD Synthetic Beta file was released in 2007 and is documented here.

Citing the grant and data

The creation of the LBD Synthetic Beta file v2 was funded by NSF Grant #0427889. Ongoing work to improve the LBD Synthetic Beta file occurs on the NSF-funded RDC supercomputer, the acquisition and initial maintenance of which was funded by the same grant. The Synthetic Data Server is funded through NSF grant SES-1042181 with support by the U.S. Census Bureau.

The data can be cited as follows:

  • U.S. Census Bureau, "Synthetic LBD Beta version 2.0," U.S. Census Bureau and Cornell University, Synthetic Data Server [distributor], Washington, DC and Ithaca, NY, USA, [Computer file], 2011.
    @TECHREPORT{SynLBD20,
    author = {{U.S. Census Bureau}},
    title = {Synthetic {LBD} {Beta} Version 2.0},
    institution = {{U.S. Census Bureau} and Cornell University, Synthetic Data Server
    [distributor]},
    year = {2011},
    type = {[Computer file]},
    address = {Washington, DC and Ithaca, NY, USA},
    abstract = {The Synthetic LBD Beta Data Product (SynLBD) is an experimental data
    product produced by the U.S. Census Bureau in collaboration with
    Duke University, Cornell University, the National Institute of Statistical
    Sciences (NISS), the Internal Revenue Service (IRS) and the National
    Science Foundation (NSF). The purpose of the SynLBD is to provide
    users with access to a longitudinal business data product that can
    be used outside of a secure Census Bureau facility. The Census Bureau
    created version 2 of the SynLBD by synthesizing information on establishments'
    employment and payroll, establishments' birth and death years, and
    industrial classification. The Census Disclosure Review Board and
    their counterparts at IRS have reviewed the content of the file,
    and allowed the release of these data for public use.},
    howpublished = {Computer file},
    organization = {Cornell University, Synthetic Data Server [distributor]},
    owner = {vilhuber},
    timestamp = {2013.06.10},
    url = {http://www2.vrdc.cornell.edu/news/data/lbd-synthetic-data/}
    }
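
For LaTeX users, the BibTeX entries on this page can be cited by their keys. The following is only a minimal sketch: it assumes the entries have been saved to a file named synlbd.bib (a hypothetical file name) and uses the standard plain bibliography style, neither of which is prescribed by this page.

```latex
% Minimal sketch: cite the SynLBD data and its methods paper.
% Assumes the BibTeX entries from this page are saved in synlbd.bib
% (hypothetical file name) in the same directory as this document.
\documentclass{article}
\begin{document}

We use the Synthetic LBD Beta version 2.0~\cite{SynLBD20}; the
synthesis methods are described in~\cite{KinneyEtAl2011}.

\bibliographystyle{plain}  % any standard style works here
\bibliography{synlbd}      % loads synlbd.bib

\end{document}
```

Running latex, then bibtex, then latex twice on such a file resolves the citations.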
