Access to ECCO

The Economics Compute Cluster (ECCO) has migrated to the BioHPC environment, and accounts are not handled on this website anymore. See the BioHPC User Guide.

"Distribution Preserving Statistical Disclosure Limitation", Woodcock and Benedetto

The NSF-ITR-funded research paper by Simon Woodcock and Gary Benedetto is now available for download from the LEHD website at http://lehd.did.census.gov/led/library/techpapers_2006.html
A local copy can be found at the bottom of this announcement.

The abstract states: "One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate them. We present two practical methods of generating synthetic values when the imputer has only limited information about the true data generating process. One is applicable when the true likelihood is known up to a monotone transformation. The second requires only limited knowledge of the true likelihood, but nevertheless preserves the conditional distribution of the confidential data, up to sampling error, on arbitrary subdomains. Our method maximizes data utility and minimizes incremental disclosure risk up to posterior uncertainty in the imputation model and sampling error in the estimated transformation. We validate the approach with a simulation and application to a large linked employer-employee database."

Attached Files: