SIPP Synthetic Beta file

The SIPP Synthetic Beta (SSB) is a Census Bureau product that integrates person-level micro-data from a household survey with administrative tax and benefit data. These data link respondents from the Survey of Income and Program Participation (SIPP) to Social Security Administration (SSA)/Internal Revenue Service (IRS) Form W-2 records and SSA records of retirement and disability benefit receipt, and were produced by Census Bureau staff economists and statisticians in collaboration with researchers at Cornell University, the SSA and the IRS. The SIPP Synthetic Beta files are available on the VirtualRDC.

Applying for Access

Application forms and other documents are available at http://www.census.gov/programs-surveys/sipp/methodology/sipp-synthetic-beta-data-product.html. All but the application forms are also available on this site (see below).

Applications are judged solely on feasibility (i.e., whether the necessary variables are available in the SSB). After a project is approved by the Census Bureau, researchers are given accounts on the VirtualRDC (more specifically, on the Synthetic Data Server). More details regarding the use of the Synthetic Data Server are available at http://www.vrdc.cornell.edu/news/synthetic-data-server/.

Users should be aware that validation requests must follow certain rules, which are outlined on the Census Bureau's website. Deviations from the guidelines may be possible with prior approval of the Census Bureau, but are typically granted only if specialized software (other than SAS or Stata) is needed, and only if that software already exists on Census Bureau computing systems. Early contact with the data providers is highly encouraged.

Documentation

Documentation is available on this website as well as at the Census Bureau website:

  • Version 6.0.2:
    • U.S. Census Bureau, "Disclosure Review Board Memo: Second Request for Release of SIPP Synthetic Beta Version 6.0," U.S. Census Bureau 2015.
      [PDF] [URL] [Bibtex]
      @TECHREPORT{drbmemo2015,
      author = {{U.S. Census Bureau}},
      title = {Disclosure Review Board Memo: {S}econd Request for Release of {SIPP} {S}ynthetic
      {B}eta Version 6.0},
      institution = {U.S. Census Bureau},
      year = {2015},
      month = {January 15},
      owner = {vilhuber},
      timestamp = {2015.03.13},
      comment = {Original location http://www.census.gov/content/dam/Census/programs-surveys/sipp/methodology/DRBMemoTablesVersion2SSBv6_0.pdf},
      url = {http://hdl.handle.net/1813/42334}
      }

An online codebook is available at CED²AR (provided by the NSF-Census Research Network - Cornell Node)

  • L. B. Reeder, M. Stinson, K. E. Trageser, and L. Vilhuber, "Codebook for the SIPP Synthetic Beta v6.0.2 [Codebook file]," Cornell Institute for Social and Economic Research and Labor Dynamics Institute [distributor]. Cornell University, Ithaca, NY, USA, DDI-C document, 2015.
    [URL] [Bibtex]
    @TECHREPORT{CED2AR-SSBv602,
    author = {Lori B. Reeder and Martha Stinson and Kelly E. Trageser and Lars Vilhuber},
    title = {Codebook for the {SIPP} {S}ynthetic {B}eta v6.0.2 [Codebook file]},
    institution = {{Cornell Institute for Social and Economic Research} and {Labor Dynamics Institute} [distributor]. Cornell University},
    type = {{DDI-C} document},
    address = {Ithaca, NY, USA},
    year = {2015},
    url = {http://www2.ncrn.cornell.edu/ced2ar-web/codebooks/ssb/v/v602}
    }

Older Versions

  • Version 6.0:
    • U.S. Census Bureau, "Disclosure Review Board Memo: Second Request for Release of SIPP Synthetic Beta Version 6.0," U.S. Census Bureau 2015.
      [PDF] [URL] [Bibtex]
      @TECHREPORT{drbmemo2015,
      author = {{U.S. Census Bureau}},
      title = {Disclosure Review Board Memo: {S}econd Request for Release of {SIPP} {S}ynthetic
      {B}eta Version 6.0},
      institution = {U.S. Census Bureau},
      year = {2015},
      month = {January 15},
      owner = {vilhuber},
      timestamp = {2015.03.13},
      comment = {Original location http://www.census.gov/content/dam/Census/programs-surveys/sipp/methodology/DRBMemoTablesVersion2SSBv6_0.pdf},
      url = {http://hdl.handle.net/1813/42334}
      }

An online codebook is available at CED²AR (provided by the NSF-Census Research Network - Cornell Node)

  • L. B. Reeder, M. Stinson, K. E. Trageser, and L. Vilhuber, "Codebook for the SIPP Synthetic Beta v6.0 [Codebook file]," Cornell Institute for Social and Economic Research and Labor Dynamics Institute [distributor]. Cornell University, Ithaca, NY, USA, DDI-C document, 2015.
    [URL] [Bibtex]
    @TECHREPORT{CED2AR-SSBv6,
    author = {Lori B. Reeder and Martha Stinson and Kelly E. Trageser and Lars Vilhuber},
    title = {Codebook for the {SIPP} {S}ynthetic {B}eta v6.0 [Codebook file]},
    institution = {{Cornell Institute for Social and Economic Research} and {Labor Dynamics Institute} [distributor]. Cornell University},
    type = {{DDI-C} document},
    address = {Ithaca, NY, USA},
    year = {2015},
    url = {http://www2.ncrn.cornell.edu/ced2ar-web/codebooks/ssb/v/v6}
    }
  • Version 5.1:
    • U.S. Census Bureau, "Codebook for SIPP Synthetic Beta version 5.1," U.S. Census Bureau 2013.
      [Abstract] [PDF] [URL] [Bibtex]
      This codebook documents version 5.0 of the SIPP Synthetic Beta (SSB). The SSB is a set of files containing individual-level data synthesized from linked survey and administrative data. The SSB is produced by the US Census Bureau as part of a joint project with the Social Security Administration (SSA), and the Internal Revenue Service (IRS). The goal of the project is to make some of the benefits of linked survey and administrative data available to researchers outside of restricted‐access Census Bureau facilities in a manner that protects the confidentiality of the underlying data.

      @TECHREPORT{ssb_v5_1_codebook,
      author = {{U.S. Census Bureau}},
      title = {Codebook for {SIPP} {Synthetic} {Beta} version 5.1},
      institution = {U.S. Census Bureau},
      year = {2013},
      abstract = {This codebook documents version 5.0 of the SIPP Synthetic Beta (SSB).
      The SSB is a set of files containing individual-level data synthesized
      from linked survey and administrative data. The SSB is produced by
      the US Census Bureau as part of a joint project with the Social Security
      Administration (SSA), and the Internal Revenue Service (IRS). The
      goal of the project is to make some of the benefits of linked survey
      and administrative data available to researchers outside of restricted‐access
      Census Bureau facilities in a manner that protects the confidentiality
      of the underlying data.},
      owner = {vilhuber},
      timestamp = {2013.10.07},
      url = {http://hdl.handle.net/1813/42335}
      }

An online codebook is available at CED²AR (provided by the NSF-Census Research Network - Cornell Node)

  • L. B. Reeder, M. Stinson, K. E. Trageser, and L. Vilhuber, "Codebook for the SIPP Synthetic Beta v5.1 [Codebook file]," Cornell Institute for Social and Economic Research and Labor Dynamics Institute [distributor]. Cornell University, Ithaca, NY, USA, DDI-C document, 2014.
    [URL] [Bibtex]
    @TECHREPORT{CED2AR-SSBv51,
    author = {Lori B. Reeder and Martha Stinson and Kelly E. Trageser and Lars Vilhuber},
    title = {Codebook for the {SIPP} {S}ynthetic {B}eta v5.1 [Codebook file]},
    institution = {{Cornell Institute for Social and Economic Research} and {Labor Dynamics Institute} [distributor]. Cornell University},
    type = {{DDI-C} document},
    address = {Ithaca, NY, USA},
    year = {2014},
    url = {http://www2.ncrn.cornell.edu/ced2ar-web/codebooks/ssb/v/v51}
    }
  • Version 5.0:
    • U.S. Census Bureau, "DRB Memo September 20, 2010," U.S. Census Bureau 2010.
      [PDF] [URL] [Bibtex]
      @TECHREPORT{drbmemo2010,
      author = {{U.S. Census Bureau}},
      title = {{DRB} {M}emo {S}eptember 20, 2010},
      institution = {U.S. Census Bureau},
      year = {2010},
      month = {September 20},
      owner = {vilhuber},
      timestamp = {2013.10.07},
      oldurl = {http://www2.vrdc.cornell.edu/news/wp-content/uploads/2011/01/DRBMemoSeptember202010.pdf},
      url = {http://hdl.handle.net/1813/43926}
      }
    • U.S. Census Bureau, "Codebook for SIPP Synthetic Beta version 5.0," U.S. Census Bureau 2010.
      [Abstract] [PDF] [URL] [Bibtex]
      This codebook documents version 5.0 of the SIPP Synthetic Beta (SSB). The SSB is a set of files containing individual-level data synthesized from linked survey and administrative data. The SSB is produced by the US Census Bureau as part of a joint project with the Social Security Administration (SSA), and the Internal Revenue Service (IRS). The goal of the project is to make some of the benefits of linked survey and administrative data available to researchers outside of restricted‐access Census Bureau facilities in a manner that protects the confidentiality of the underlying data.

      @TECHREPORT{ssb_codebook,
      author = {{U.S. Census Bureau}},
      title = {Codebook for {SIPP} {Synthetic} {Beta} version 5.0},
      institution = {U.S. Census Bureau},
      year = {2010},
      abstract = {This codebook documents version 5.0 of the SIPP Synthetic Beta (SSB).
      The SSB is a set of files containing individual-level data synthesized
      from linked survey and administrative data. The SSB is produced by
      the US Census Bureau as part of a joint project with the Social Security
      Administration (SSA), and the Internal Revenue Service (IRS). The
      goal of the project is to make some of the benefits of linked survey
      and administrative data available to researchers outside of restricted‐access
      Census Bureau facilities in a manner that protects the confidentiality
      of the underlying data.},
      comment = {Original location: http://www.census.gov/sipp/SSB_Codebook.pdf},
      owner = {vilhuber},
      timestamp = {2013.10.07},
      oldurl = {http://www2.vrdc.cornell.edu/news/wp-content/uploads/2011/01/SSB_Codebook.pdf},
      url = {http://hdl.handle.net/1813/43925}
      }
  • Version 4.x:
    • J. M. Abowd, G. Benedetto, and M. Stinson, "Using the SIPP Synthetic Beta for Analysis," U.S. Census Bureau, Training provided to participants at a meeting at the U.S. Census Bureau on October 26, 2007, 2007.
      [PDF] [URL] [Bibtex]
      @TECHREPORT{sipp_synthetic_beta_training_final_20071026,
      author = {John M. Abowd and Gary Benedetto and Martha Stinson},
      title = {Using the {SIPP} {Synthetic} {Beta} for Analysis},
      institution = {U.S. Census Bureau},
      year = {2007},
      type = {Training provided to participants at a meeting at the U.S. Census
      Bureau on October 26, 2007},
      owner = {vilhuber},
      timestamp = {2013.10.08},
      oldurl = {http://www2.vrdc.cornell.edu/news/?p=306},
      url = {http://hdl.handle.net/1813/43930}
      }
    • J. M. Abowd, M. Stinson, and G. Benedetto, "Final Report to the Social Security Administration on the SIPP/SSA/IRS Public Use File Project," U.S. Census Bureau 2006.
      [Abstract] [PDF] [URL] [Bibtex]
      The creation of public use data that combine variables from the Census Bureau's Survey of Income and Program Participation (SIPP), the Internal Revenue Service's (IRS) individual lifetime earnings data, and the Social Security Administration's (SSA) individual benefit data began as part of ongoing collaborative research at the Census Bureau and SSA. The current project had its genesis with the formation of a joint committee containing representatives from the Census Bureau, SSA, IRS, and the Congressional Budget Office (CBO) that designed a prospective public use file. Aimed at a user community that was primarily interested in national retirement and disability programs, the selection of variables for the proposed SIPP/SSA/IRS-PUF focused on the critical demographic data to be supplied from the SIPP, earnings histories from the IRS data maintained at SSA, and benefit data from SSA’s master beneficiary records. After attempting to determine the feasibility of adding a limited number of variables from the SIPP directly to the linked earnings and benefit data, it was decided that the set of variables that could be added without compromising the confidentiality protection of the existing SIPP public use files was so limited that alternative methods had to be used to create a useful new public use file. The committee agreed to allow the Census Bureau to experiment with the confidentiality protection system known generically as "synthetic data." The actual technique adopted is called partially synthetic data with multiple imputation of missing items. As the term is used in this report, "partially synthetic data" means the release of person-level records containing some variables from the actual responses and other variables where the actual responses have been replaced by values sampled from the posterior predictive distribution for that record, conditional on all of the confidential data. 
This final report accompanies the delivery of version 4.0 to SSA as part of the fiscal year 2006 Jointly Financed Cooperative Agreement between the Census Bureau and SSA.

      @TECHREPORT{ssafinal,
      author = {John M. Abowd and Martha Stinson and Gary Benedetto},
      title = {Final Report to the {Social Security Administration} on the {SIPP/SSA/IRS}
      {Public} {Use} {File} {Project}},
      institution = {U.S. Census Bureau},
      year = {2006},
      owner = {vilhuber},
      timestamp = {2013.10.07},
      abstract = {The creation of public use data that combine variables from the Census Bureau's Survey of Income and Program Participation (SIPP), the Internal Revenue Service's (IRS) individual lifetime earnings data, and the Social Security Administration's (SSA) individual benefit data began as part of ongoing collaborative research at the Census Bureau and SSA. The current project had its genesis with the formation of a joint committee containing representatives from the Census Bureau, SSA, IRS, and the Congressional Budget Office (CBO) that designed a prospective public use file. Aimed at a user community that was primarily interested in national retirement and disability programs, the selection of variables for the proposed SIPP/SSA/IRS-PUF focused on the critical demographic data to be supplied from the SIPP, earnings histories from the IRS data maintained at SSA, and benefit data from SSA’s master beneficiary records. After attempting to determine the feasibility of adding a limited number of variables from the SIPP directly to the linked earnings and benefit data, it was decided that the set of variables that could be added without compromising the confidentiality protection of the existing SIPP public use files was so limited that alternative methods had to be used to create a useful new public use file. The committee agreed to allow the Census Bureau to experiment with the confidentiality protection system known generically as "synthetic data." The actual technique adopted is called partially synthetic data with multiple imputation of missing items. As the term is used in this report, "partially synthetic data" means the release of person-level records containing some variables from the actual responses and other variables where the actual responses have been replaced by values sampled from the posterior predictive distribution for that record, conditional on all of the confidential data. 
This final report accompanies the delivery of version 4.0 to SSA as part of the fiscal year 2006 Jointly Financed Cooperative Agreement between the Census Bureau and SSA.},
      oldurl = {http://www2.vrdc.cornell.edu/news/?p=308},
      url = {http://hdl.handle.net/1813/43929}
      }
    • U.S. Census Bureau, "DRB Memo on Disclosure Testing the SIPP Synthetic Beta," U.S. Census Bureau 2006.
      [Abstract] [URL] [Bibtex]
      As the result of a four year joint project between the Census Bureau, the Internal Revenue Service, and the Social Security Administration, the LEHD Program has created an enhanced SIPP file that links a subset of SIPP variables to administrative earnings and benefits data. We have reviewed this file for disclosure risk and here present our results to the Census Disclosure Review Board. We believe that the procedures we used to create the synthetic data conform to the Census Bureau’s disclosure avoidance requirements and request that the DRB grant permission for the file release.

      @TECHREPORT{drbmemnov2006,
      author = {{U.S. Census Bureau}},
      title = {{DRB} Memo on Disclosure Testing the {SIPP} {Synthetic} {Beta}},
      institution = {U.S. Census Bureau},
      year = {2006},
      month = {September 20},
      owner = {vilhuber},
      timestamp = {2013.10.07},
      abstract = {As the result of a four year joint project between the Census Bureau, the Internal
      Revenue Service, and the Social Security Administration, the LEHD Program
      has created an enhanced SIPP file that links a subset of SIPP variables to
      administrative earnings and benefits data. We have reviewed this file for disclosure
      risk and here present our results to the Census Disclosure Review Board. We
      believe that the procedures we used to create the synthetic data conform to the
      Census Bureau’s disclosure avoidance requirements and request that the DRB
      grant permission for the file release.},
      oldurl = {http://www2.vrdc.cornell.edu/news/?p=307},
      url = {http://hdl.handle.net/1813/43928}
      }
    • U.S. Census Bureau, "Codebook for the SIPP Synthetic Beta Version 4.1," U.S. Census Bureau 2007.
      [Abstract] [PDF] [URL] [Bibtex]
      This codebook documents version 4.1 of the SIPP Synthetic Beta (SSB). The SSB is a set of files containing individual-level data synthesized from linked survey and administrative data. The SSB is produced by the US Census Bureau as part of a joint project with the Social Security Administration (SSA), and the Internal Revenue Service (IRS). The goal of the project is to make some of the benefits of linked survey and administrative data available to researchers outside of restricted‐access Census Bureau facilities in a manner that protects the confidentiality of the underlying data.

      @TECHREPORT{technicaldescriptionsippsyntheticbetaoct42007,
      author = {{U.S. Census Bureau}},
      title = {Codebook for the {SIPP} {Synthetic} {Beta} Version 4.1},
      institution = {U.S. Census Bureau},
      year = {2007},
      month = {October},
      abstract = {This codebook documents version 4.1 of the SIPP Synthetic Beta (SSB).
      The SSB is a set of files containing individual-level data synthesized
      from linked survey and administrative data. The SSB is produced by
      the US Census Bureau as part of a joint project with the Social Security
      Administration (SSA), and the Internal Revenue Service (IRS). The
      goal of the project is to make some of the benefits of linked survey
      and administrative data available to researchers outside of restricted‐access
      Census Bureau facilities in a manner that protects the confidentiality
      of the underlying data.},
      owner = {vilhuber},
      timestamp = {2013.10.07},
      oldurl = {http://www.census.gov/sipp/technicaldescriptionsippsyntheticbetaoct42007.pdf},
      url = {http://hdl.handle.net/1813/43927}
      }

Accessing the data

Once an account on the Synthetic Data Server has been established, you will find template programs and instructions on the server under

 /rdcprojects/co00517/SSB/data/
                                  current -> v6.0.2
                                  v4.2/
                                  v5.0/
                                  v5.1/
                                  v6.0/
                                  v6.0.1/
                                  v6.0.2/
                                  users/
 /rdcprojects/co00517/SSB/programs/
                                  template/v4.2
                                  template/v5.0
                                  users/
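
As a rough illustration of the layout above, the following Python sketch lists the version directories under the data path and reports which one `current` points to. It is not one of the template programs provided on the server. It assumes that Python is available in your session, that `current` is a symbolic link (as the `->` in the listing suggests), and that version directories follow the `vN.N[.N]` naming shown. The function names `list_versions` and `resolve_current` are hypothetical helpers introduced here only for illustration.

```python
import os
import re
from pathlib import Path

# Data path as shown in the listing above; adjust if the server layout differs.
SSB_DATA_ROOT = Path("/rdcprojects/co00517/SSB/data")

# Assumed naming pattern for version directories, e.g. v4.2 or v6.0.2.
VERSION_DIR = re.compile(r"^v\d+(\.\d+)*$")


def list_versions(data_root):
    """Return version directory names under data_root, sorted numerically."""
    root = Path(data_root)
    names = [
        p.name
        for p in root.iterdir()
        if p.is_dir() and not p.is_symlink() and VERSION_DIR.match(p.name)
    ]
    # Sort on the numeric components so that v6.0 precedes v6.0.1.
    return sorted(names, key=lambda n: tuple(int(x) for x in n[1:].split(".")))


def resolve_current(data_root):
    """Return the directory name that the `current` link targets, or None."""
    link = Path(data_root) / "current"
    if not link.is_symlink():
        return None
    return Path(os.readlink(link)).name


if __name__ == "__main__":
    if SSB_DATA_ROOT.is_dir():
        print("versions:", list_versions(SSB_DATA_ROOT))
        print("current :", resolve_current(SSB_DATA_ROOT))
    else:
        print(f"{SSB_DATA_ROOT} not found; run this on the Synthetic Data Server")
```

Recording the resolved version name in your program logs, rather than relying on `current`, makes it easier to state exactly which SSB version an analysis used when citing the data.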

Citing and Funding Acknowledgement

We ask that users of the data give credit to the different funders that contributed to the creation and distribution of the data:

The creation of the SIPP Synthetic Beta was funded by the US Census Bureau and SSA, with additional funding from NSF Grants #0427889 and #0339191. The Synthetic Data Server is funded through NSF grants SES-1042181 and BCS-0941226, and through a grant from the Alfred P. Sloan Foundation.

The data itself can be cited as

  • U.S. Census Bureau, "SIPP Synthetic Beta Version 6.0.2," U.S. Census Bureau [producer] and Cornell University, Synthetic Data Server [distributor], Washington, DC and Ithaca, NY, USA, [Computer file], 2015.
    [URL] [Bibtex]
    @TECHREPORT{SSB602,
    author = {{U.S. Census Bureau}},
    title = {{SIPP} {S}ynthetic {B}eta Version 6.0.2},
    institution = {{U.S. Census Bureau} [producer] and Cornell University, Synthetic Data Server
    [distributor]},
    year = {2015},
    type = {[Computer file]},
    address = {Washington, DC and Ithaca, NY, USA},
    howpublished = {Computer file},
    organization = {Cornell University, Synthetic Data Server [distributor]},
    owner = {vilhuber},
    timestamp = {2015.01.10},
    url = {http://www2.vrdc.cornell.edu/news/data/sipp-synthetic-beta-file/}
    }

Citations for Older Versions

  • U.S. Census Bureau, "SIPP Synthetic Beta Version 5.1," U.S. Census Bureau [producer] and Cornell University, Synthetic Data Server [distributor], Washington, DC and Ithaca, NY, USA, [Computer file], 2013.
    [URL] [Bibtex]
    @TECHREPORT{SSB5.1,
    author = {{U.S. Census Bureau}},
    title = {{SIPP} {S}ynthetic {B}eta Version 5.1},
    institution = {{U.S. Census Bureau} [producer] and Cornell University, Synthetic Data Server
    [distributor]},
    year = {2013},
    type = {[Computer file]},
    address = {Washington, DC and Ithaca, NY, USA},
    howpublished = {Computer file},
    organization = {Cornell University, Synthetic Data Server [distributor]},
    owner = {vilhuber},
    timestamp = {2013.06.10},
    url = {http://www2.vrdc.cornell.edu/news/data/sipp-synthetic-beta-file/}
    }
  • U.S. Census Bureau, "SIPP Synthetic Beta Version 5.0," U.S. Census Bureau [producer] and Cornell University, Synthetic Data Server [distributor], Washington, DC and Ithaca, NY, USA, [Computer file], 2011.
    [URL] [Bibtex]
    @TECHREPORT{SSB5.0,
    author = {{U.S. Census Bureau}},
    title = {{SIPP} {S}ynthetic {B}eta Version 5.0},
    institution = {{U.S. Census Bureau} [producer] and Cornell University, Synthetic Data Server
    [distributor]},
    year = {2011},
    type = {[Computer file]},
    address = {Washington, DC and Ithaca, NY, USA},
    howpublished = {Computer file},
    organization = {Cornell University, Synthetic Data Server [distributor]},
    owner = {vilhuber},
    timestamp = {2013.06.10},
    url = {http://www2.vrdc.cornell.edu/news/data/sipp-synthetic-beta-file/}
    }