The NSF-ITR confidentiality work group will hold a workshop at CISER on September 8, 2006. Participation is by invitation only. Those interested in participating should contact John Abowd. The workshop is part of, and financed through, NSF Grant #0427889.
Agenda
Sponsored by NSF Grant #0427889.
Hosted by the Cornell Institute for Social and Economic Research
Workshop Participant List
- John Abowd, Cornell University and U.S. Census Bureau (LEHD), PI and organizer
- Lars Vilhuber, Cornell University and U.S. Census Bureau (LEHD), coordinator
- Fredrik Andersson, Cornell University and U.S. Census Bureau (LEHD)
- Gary Benedetto, University of Maryland and U.S. Census Bureau (LEHD)
- Rob Creecy, U.S. Census Bureau (SRD)
- Josep Domingo-Ferrer, Univ. Rovira i Virgili
- Lisa Dragoset, Cornell University and U.S. Census Bureau (LEHD)
- Kaj Gittings, Cornell University and U.S. Census Bureau (LEHD)
- Sam Hawala, U.S. Census Bureau (SRD)
- Daniel Kifer, Cornell University
- Ron Jarmin, U.S. Census Bureau (CES), Co-PI
- Saki Kinney, Duke University
- Karen Masken, Internal Revenue Service
- Kevin McKinney, U.S. Census Bureau (LEHD)
- Javier Miranda, U.S. Census Bureau (CES)
- Ashwin Machanavajjhala, Cornell University
- Kerry Papps, Cornell University
- Corinne Prost, INSEE and Cornell University
- Trivellore Raghunathan, University of Michigan, Co-PI
- Jerry Reiter, Duke University
- Arnie Reznek, U.S. Census Bureau (CES)
- Bryan Ricchetti, Cornell University and U.S. Census Bureau (LEHD)
- Rolando Rodriguez, U.S. Census Bureau (SRD)
- Stephen Roehrig, Carnegie Mellon University, Co-PI
- Ian Schmutte, Cornell University
- Martha Stinson, U.S. Census Bureau (LEHD)
- Vicenc Torra, University of Barcelona, Artificial Intelligence Research Institute
- Simon Woodcock, Simon Fraser University
Thursday, September 7, 2006
- Dinner: 7:30pm at John Abowd's home (directions sent to all invitees)
- After-dinner discussion: introductions and where we are
Hotel: Hilton Garden Inn (downtown Ithaca). The group room rate for the NSF-ITR Workshop is $134/night. A block of rooms is available for September 7th. Reservations can be made now through August 7th by calling 1-877-STAY-HGI or 607-277-8900, or online at www.ithaca.stayhgi.com by entering group/convention code ABOWD.
Friday, September 8, 2006
Location: Ives 109 Distance Learning Room (Cornell campus; a shuttle bus is available from the Hilton Garden Inn)
Breakfast
- 8:00am (ILR Conference Center Room 329. The hotel shuttle bus will bring you directly to the ILR Conference Center; go to the third floor. Food is not allowed in the distance learning room.)
Morning Sessions (simulcast to the Census Bureau, room G-316/Building 3, and Barcelona, Spain)
- 8:30-9:20 Building synthesizers for different data structures
  - Household data structures (longitudinal: SIPP, HRS; cross-sectional: ACS)
  - Establishment data structures (longitudinal: LBD; cross-sectional: CBP)
  - Job data structures (longitudinal: LEHD)
  - Origin/destination data structures (cross-sectional: OTM)
  - Dynamically linked tabular data (longitudinal: QWI)
  - IPSO synthesizers; probabilistic record linkage; distance record linking
- 9:30-10:20 Testing the validity of synthetic data
  - Univariate methods (KDE; MI combining formulae)
  - Propensity score methods
  - Other multivariate methods
- 10:30-11:20 Roundtable discussion of Data Privacy and Confidentiality Protection Technologies
  - (Open session, joint with the Institute for Social Sciences Networks Team)
- 11:30-12:20 Certifying the degree of protection: Re-identification models and techniques
  - Probabilistic record linking
  - Distance record linking
  - Estimating the probability of re-identification
  - Estimating the PPF for information and protection
Lunch
- 12:30 (ILR Conference Center Room 329, same as breakfast)
Afternoon Sessions (simulcast to the Census Bureau, room G-316/Building 3)
- 1:30-2:00 Testing the validity of synthetic data (continued)
  - Univariate methods (KDE; MI combining formulae)
  - Propensity score methods
  - Other multivariate methods
- 2:00-2:20 Computational issues
  - Basic computational engines (SAS, Java, R)
  - Computational problems for synthesizers (control of multithreading, implementing informative priors)
  - Computational problems for re-identification software (SAS callable, native SAS)
  - Open issues
- 2:30-3:20 Getting data to the users
  - What should we support on the VRDC?
  - How can we best teach the users the combining formulae for multiply-imputed synthetic data?
- 3:30-4:00 Wrap-up
  - Progress reports
  - Working papers and publications
  - VRDC files