LBD Synthetic Data v2

Background

The Synthetic Longitudinal Business Database Beta Data Product (SynLBD) is an experimental data product produced by the U.S. Census Bureau in collaboration with Duke University, Cornell University, the National Institute of Statistical Sciences (NISS), the Internal Revenue Service (IRS) and the National Science Foundation (NSF). The purpose of the SynLBD is to provide users with access to a longitudinal business data product that can be used outside of a secure Census Bureau facility, without disclosing confidential information. The Census Bureau created version 2 of the SynLBD by synthesizing information from the (confidential) LBD [1, 2] on establishments' employment and payroll, establishments' birth and death years, and industrial classification. The Census Bureau's Disclosure Review Board and their counterparts at IRS have reviewed the content of the file, and allowed the release of these data for public use. Details on methods are described in [3, 4] and [5].

Scope

The SynLBD v2 covers years 1976-2000, and contains synthetic values for employment and payroll.

How to Access the SynLBD

Interested researchers should first read important information at the Census Bureau's information page on the SynLBD, in particular with respect to

  • analytic validity (disclaimer)
  • validation protocol

They may then apply for a free account on the VirtualRDC's Synthetic Data Server (SDS), by submitting “Application to use the SynLBD Synthetic Beta File” to ces.synthetic.data.use@census.gov.

Users may run programs on the server and results may be removed without disclosure or content review by Census Bureau staff. The SDS provides SAS, Stata and R analysis software and a computing environment similar to the one used to analyze the confidential LBD Gold Standard data on Census Bureau internal computers.

An important component of the use of the SynLBD v2 is the possibility of validation against the confidential LBD Gold Standard files. The SynLBD v2 has no guaranteed analytic validity. Validation against the internal-use, confidential LBD files will only occur if users run a clearly documented, error-free analysis against the SynLBD v2 files on the Synthetic Data Server.

Documentation

The Synthetic LBD v2 is documented in two ways:

Additional Information

Earlier versions

A version 1 of the LBD Synthetic Beta file was released in 2007, and is documented here.

Citing the grant and data

The creation of the Synthetic LBD v2 was funded by NSF Grant #0427889. Ongoing work to improve the LBD Synthetic Beta file occurs on the NSF-funded RDC supercomputer, the acquisition and initial maintenance of which was funded by the same grant. The Synthetic Data Server is funded through NSF grant SES-1042181 with support by the U.S. Census Bureau.

The data can be cited as follows:

  • U.S. Census Bureau, "Synthetic LBD Beta Version 2.0," U.S. Census Bureau and Cornell University, Synthetic Data Server [distributor], Washington, DC and Ithaca, NY, USA, [Computer file], 2011.
    The Synthetic LBD Beta Data Product (SynLBD) is an experimental data product produced by the U.S. Census Bureau in collaboration with Duke University, Cornell University, the National Institute of Statistical Sciences (NISS), the Internal Revenue Service (IRS) and the National Science Foundation (NSF). The purpose of the SynLBD is to provide users with access to a longitudinal business data product that can be used outside of a secure Census Bureau facility. The Census Bureau created version 2 of the SynLBD by synthesizing information on establishments' employment and payroll, establishments' birth and death years, and industrial classification. The Census Disclosure Review Board and their counterparts at IRS have reviewed the content of the file, and allowed the release of these data for public use.

    @TECHREPORT{SynLBD20,
    author = {{U.S. Census Bureau}},
    title = {Synthetic {LBD} {Beta} Version 2.0},
    institution = {{U.S. Census Bureau} and Cornell University, Synthetic Data Server
    [distributor]},
    year = {2011},
    type = {[Computer file]},
    address = {Washington, DC and Ithaca, NY, USA},
    abstract = {The Synthetic LBD Beta Data Product (SynLBD) is an experimental data
    product produced by the U.S. Census Bureau in collaboration with
    Duke University, Cornell University, the National Institute of Statistical
    Sciences (NISS), the Internal Revenue Service (IRS) and the National
    Science Foundation (NSF). The purpose of the SynLBD is to provide
    users with access to a longitudinal business data product that can
    be used outside of a secure Census Bureau facility. The Census Bureau
    created version 2 of the SynLBD by synthesizing information on establishments'
    employment and payroll, establishments' birth and death years, and
    industrial classification. The Census Disclosure Review Board and
    their counterparts at IRS have reviewed the content of the file,
    and allowed the release of these data for public use.},
    howpublished = {Computer file},
    organization = {Cornell University, Synthetic Data Server [distributor]},
    owner = {vilhuber},
    timestamp = {2013.06.10},
    url = {http://www2.vrdc.cornell.edu/news/data/lbd-synthetic-data/}
    }
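
    The entry above can be used from a LaTeX document as in the following minimal sketch. It assumes the entry has been saved to a file named synlbd.bib; that filename and the sample sentence are illustrative assumptions, not part of the original page. The plain bibliography style is also just one possible choice.

```latex
% Minimal sketch: cite the SynLBD v2 data via the SynLBD20 key.
% Assumption: the @TECHREPORT{SynLBD20, ...} entry is stored in
% synlbd.bib (hypothetical filename) next to this file.
\documentclass{article}
\begin{document}

% \cite resolves the key defined in the BibTeX entry.
Our estimates use the Synthetic LBD Beta Version 2.0 \cite{SynLBD20}.

% "plain" is a standard BibTeX style; it ignores the url field.
\bibliographystyle{plain}
\bibliography{synlbd}

\end{document}
```

    Running latex, then bibtex, then latex twice resolves the citation. A style that prints the url field may be preferable if the download location should appear in the reference list.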

References

Below are references for documents cited on this page. Also consult the Synthetic Data Server (SDS) bibliography for additional papers that relate to the use of SynLBD data and methodology.

  • J. Miranda and R. Jarmin, "The Longitudinal Business Database," U.S. Census Bureau, Center for Economic Studies, Discussion Paper CES-WP-02-17, 2002.
    The LBD is a research dataset constructed at the Census Bureau's Center for Economic Studies. The LBD is an establishment based file created by linking the annual snapshot files from Census Bureau's Business Register over time. It contains high quality longitudinal establishment linkages. Firm level linkages are currently under development at CES. The LBD contains several basic data items such as firm ownership, location, industry, payroll and employment.

    @TECHREPORT{MirandaJarmin2002,
    author = {Javier Miranda and Ron Jarmin},
    title = {The {Longitudinal} {Business} {Database}},
    institution = {U.S. Census Bureau, Center for Economic Studies},
    year = {2002},
    type = {Discussion Paper},
    number = {CES-WP-02-17},
    abstract = {The LBD is a research dataset constructed at the Census Bureau's Center
    for Economic Studies. The LBD is an establishment based file created
    by linking the annual snapshot files from Census Bureau's Business
    Register over time. It contains high quality longitudinal establishment
    linkages. Firm level linkages are currently under development at
    CES. The LBD contains several basic data items such as firm ownership,
    location, industry, payroll and employment.},
    owner = {vilhuber},
    timestamp = {2009.09.25},
    url = {http://ideas.repec.org/p/cen/wpaper/02-17.html}
    }
  • J. Miranda, "LBD Codebook," U.S. Census Bureau, mimeo, 2011.
    @TECHREPORT{LBD_Codebook,
    author = {Javier Miranda},
    title = {{LBD} Codebook},
    institution = {U.S. Census Bureau},
    year = {2011},
    type = {mimeo},
    owner = {vilhuber},
    timestamp = {2013.10.14},
    }
  • S. K. Kinney, J. P. Reiter, A. P. Reznek, J. Miranda, R. S. Jarmin, and J. M. Abowd, "Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database," International Statistical Review, vol. 79, iss. 3, pp. 362-384, 2011.
    @ARTICLE{KinneyEtAl2011,
    author = {Kinney, Satkartar K. and Reiter, Jerome P. and Reznek, Arnold P.
    and Miranda, Javier and Jarmin, Ron S. and Abowd, John M.},
    title = {Towards Unrestricted Public Use Business Microdata: The {Synthetic}
    {Longitudinal} {Business} {Database}},
    journal = {International Statistical Review},
    year = {2011},
    volume = {79},
    pages = {362--384},
    number = {3},
    doi = {10.1111/j.1751-5823.2011.00153.x},
    issn = {1751-5823},
    keywords = {Economic census, data confidentiality, synthetic data, disclosure
    limitation},
    owner = {vilhuber},
    publisher = {Blackwell Publishing Ltd},
    timestamp = {2012.09.04},
    url = {http://dx.doi.org/10.1111/j.1751-5823.2011.00153.x}
    }
  • S. K. Kinney, J. P. Reiter, A. P. Reznek, J. Miranda, R. S. Jarmin, and J. M. Abowd, "Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database," Center for Economic Studies, U.S. Census Bureau, Working Papers 11-04, 2011.
    In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to release synthetic data; that is, the released establishment data are simulated from statistical models designed to mimic the distributions of the underlying real microdata. In this article, we describe an application of this strategy to create a public use file for the Longitudinal Business Database, an annual economic census of establishments in the United States comprising more than 20 million records dating back to 1976. The U.S. Bureau of the Census and the Internal Revenue Service recently approved the release of these synthetic microdata for public use, making the synthetic Longitudinal Business Database the first-ever business microdata set publicly released in the United States. We describe how we created the synthetic data, evaluated analytical validity, and assessed disclosure risk.

    @TECHREPORT{CES-WP-11-04,
    author = {Satkartar K. Kinney and Jerome P. Reiter and Arnold P. Reznek and
    Javier Miranda and Ron S. Jarmin and John M. Abowd},
    title = {Towards Unrestricted Public Use Business Microdata: The {Synthetic}
    {Longitudinal} {Business} {Database}},
    institution = {Center for Economic Studies, U.S. Census Bureau},
    year = {2011},
    type = {Working Papers},
    number = {11-04},
    month = Feb,
    abstract = {In most countries, national statistical agencies do not release establishment-level
    business microdata, because doing so represents too large a risk
    to establishments' confidentiality. One approach with the potential
    for overcoming these risks is to release synthetic data; that is,
    the released establishment data are simulated from statistical models
    designed to mimic the distributions of the underlying real microdata.
    In this article, we describe an application of this strategy to create
    a public use file for the Longitudinal Business Database, an annual
    economic census of establishments in the United States comprising
    more than 20 million records dating back to 1976. The U.S. Bureau
    of the Census and the Internal Revenue Service recently approved
    the release of these synthetic microdata for public use, making the
    synthetic Longitudinal Business Database the first-ever business
    microdata set publicly released in the United States. We describe
    how we created the synthetic data, evaluated analytical validity,
    and assessed disclosure risk.},
    owner = {vilhuber},
    timestamp = {2013.10.14},
    url = {http://ideas.repec.org/p/cen/wpaper/11-04.html}
    }
  • S. K. Kinney, J. P. Reiter, A. P. Reznek, J. Miranda, R. S. Jarmin, and J. M. Abowd, "Appendix to 'Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database'," Center for Economic Studies, U.S. Census Bureau, online document, 2011.
    @TECHREPORT{Kinney_et_al_2011_Appendix,
    author = {Kinney, Satkartar K. and Reiter, Jerome P. and Reznek, Arnold P.
    and Miranda, Javier and Jarmin, Ron S. and Abowd, John M.},
    title = {Appendix to '{T}owards Unrestricted Public Use Business Microdata: The {Synthetic}
    {Longitudinal} {Business} {Database}'},
    institution = {Center for Economic Studies, U.S. Census Bureau},
    year = {2011},
    type = {online document},
    keywords = {Economic census, data confidentiality, synthetic data, disclosure
    limitation},
    owner = {vilhuber},
    url = {https://www.census.gov/ces/pdf/SynLBD_Kinney_et_al_2011_Appendix.pdf}
    }
  • J. Miranda, "SynLBD Codebook," U.S. Census Bureau, mimeo, 2011.
    @TECHREPORT{SynLBD_Codebook,
    author = {Javier Miranda},
    title = {{SynLBD} Codebook},
    institution = {U.S. Census Bureau},
    year = {2011},
    type = {mimeo},
    owner = {vilhuber},
    url = {http://www.census.gov/ces/pdf/SynLBD_Codebook.pdf},
    timestamp = {2013.10.14}
    }
  • L. Vilhuber, "Codebook for the Synthetic LBD Version 2.0 [Codebook file]," Comprehensive Extensible Data Documentation and Access Repository (CED2AR), Cornell Institute for Social and Economic Research and Labor Dynamics Institute [distributor]. Cornell University, Ithaca, NY, USA, DDI-C document, 2013.
    @TECHREPORT{CED2AR-SynLBDv2,
    author = {Lars Vilhuber},
    title = {Codebook for the Synthetic LBD Version 2.0 [Codebook file]},
    institution = {{Comprehensive Extensible Data Documentation and Access Repository (CED2AR)}, Cornell Institute for Social and Economic Research and Labor Dynamics Institute [distributor]. Cornell University},
    type = {DDI-C document},
    address = {Ithaca, NY, USA},
    year = {2013},
    url = {http://www2.ncrn.cornell.edu/ced2ar-web/codebooks/synlbd/v/v2}
    }