NSF-ITR Synthetic Data Workshop at CISER

The NSF-ITR confidentiality work group will hold a workshop at CISER on September 8, 2006. Participation is by invitation only. Interested participants should contact John Abowd. The workshop is part of, and financed through, NSF Grant #0427889.



Agenda

Sponsored by NSF Grant #0427889.
Hosted by the Cornell Institute for Social and Economic Research

Workshop Participant List

  • John Abowd, Cornell University and U.S. Census Bureau (LEHD), PI and organizer
  • Lars Vilhuber, Cornell University and U.S. Census Bureau (LEHD), coordinator
  • Fredrik Andersson, Cornell University and U.S. Census Bureau (LEHD)
  • Gary Benedetto, University of Maryland and U.S. Census Bureau (LEHD)
  • Rob Creecy, U.S. Census Bureau (SRD)
  • Josep Domingo-Ferrer, Univ. Rovira i Virgili
  • Lisa Dragoset, Cornell University and U.S. Census Bureau (LEHD)
  • Kaj Gittings, Cornell University and U.S. Census Bureau (LEHD)
  • Sam Hawala, U.S. Census Bureau (SRD)
  • Daniel Kifer, Cornell University
  • Ron Jarmin, U.S. Census Bureau (CES), Co-PI
  • Saki Kinney, Duke University
  • Karen Masken, Internal Revenue Service
  • Kevin McKinney, U.S. Census Bureau (LEHD)
  • Javier Miranda, U.S. Census Bureau (CES)
  • Ashwin Machanavajjhala, Cornell University
  • Kerry Papps, Cornell University
  • Corinne Prost, INSEE and Cornell University
  • Trevillore Raghunathan, University of Michigan, Co-PI
  • Jerry Reiter, Duke University
  • Arnie Reznek, U.S. Census Bureau (CES)
  • Bryan Ricchetti, Cornell University and U.S. Census Bureau (LEHD)
  • Rolando Rodriguez, U.S. Census Bureau (SRD)
  • Stephen Roehrig, Carnegie Mellon University, Co-PI
  • Ian Schmutte, Cornell University
  • Martha Stinson, U.S. Census Bureau (LEHD)
  • Vicenc Torra, University of Barcelona, Artificial Intelligence Research Institute
  • Simon Woodcock, Simon Fraser University

Thursday, September 7, 2006

  • Dinner: 7:30pm at John Abowd's home (directions sent to all invitees)
  • After-dinner discussion: introductions and where we are
Hotel: Hilton Garden Inn (downtown Ithaca). The group room rate for the NSF-ITR Workshop is $134/night. A block of rooms is available for September 7th. Reservations can be made now through August 7th by calling 1-877-STAY-HGI or 607-277-8900, or online at www.ithaca.stayhgi.com by entering group/convention code ABOWD.

Friday, September 8, 2006

Location: Ives 109 Distance Learning Room (Cornell campus; a shuttle bus is available from the Hilton Garden Inn).

Breakfast

  • 8:00am (ILR Conference Center Room 329. The hotel shuttle bus will bring you directly to the ILR Conference Center; go to the third floor. Food is not allowed in the distance learning room.)

Morning Sessions (simulcast to the Census Bureau, room G-316/Building 3, and Barcelona, Spain)

  • 8:30-9:20 Building synthesizers for different data structures
    • Household data structures (longitudinal: SIPP, HRS; cross-sectional: ACS)
    • Establishment data structures (longitudinal: LBD; cross-sectional: CBP)
    • Job data structures (longitudinal: LEHD)
    • Origin/destination data structures (cross-sectional: OTM)
    • Dynamically linked tabular data (longitudinal: QWI)
    • IPSO synthesizers; probabilistic record linkage; distance record linking
  • 9:30-10:20 Testing the validity of synthetic data
    • Univariate methods (KDE; MI combining formulae)
    • Propensity score methods
    • Other multivariate methods
  • 10:30-11:20 Roundtable discussion of Data Privacy and Confidentiality Protection Technologies
    • (Open session, joint with the Institute for Social Sciences Networks Team)
  • 11:30-12:20 Certifying the degree of protection: Re-identification models and techniques
    • Probabilistic record linking
    • Distance record linking
    • Estimating the probability of re-identification
    • Estimating the PPF for information and protection

Lunch

  • 12:30 (ILR Conference Center Room 329, same as breakfast)

Afternoon Sessions (simulcast to the Census Bureau, room G-316/Building 3)

  • 1:30-2:00 Testing the validity of synthetic data (continued)
    • Univariate methods (KDE; MI combining formulae)
    • Propensity score methods
    • Other multivariate methods
  • 2:00-2:20 Computational issues
    • Basic computational engines (SAS, Java, R)
    • Computational problems for synthesizers (control of multithreading; implementing informative priors)
    • Computational problems for re-identification software (SAS callable, native SAS)
    • Open issues
  • 2:30-3:20 Getting data to the users
    • What should we support on the VRDC?
    • How can we best teach the users the combining formulae for multiply-imputed synthetic data?
  • 3:30-4:00 Wrap-up
    • Progress reports
    • Working papers and publications
    • VRDC files