Overview of LEHD Infrastructure files, including QWI

UPDATED 2009, 2012

This paper by John Abowd, Bryce Stephens, Lars Vilhuber, and their co-authors was originally presented at the NBER-CRIW conference in April 2005. It is also available as LEHD TP-2006-01 (which replaces LEHD TP-2002-05). The paper describes the LEHD Infrastructure files, most of which are or will be available in the RDCs. It has subsequently been published in Dunne, T.; Jensen, J. B. & Roberts, M. J., eds. (2009), Producer Dynamics: New Evidence from Micro Data, The University of Chicago Press for the National Bureau of Economic Research. The definitive publication can be accessed here.

The Longitudinal Employer-Household Dynamics (LEHD) program at the U.S. Census Bureau, with funding from several national funding agencies, has built a set of infrastructure files using administrative data provided by state agencies, enhanced with information culled from demographic and economic (business) surveys and censuses. The LEHD Infrastructure Files provide a detailed and comprehensive picture of workers, employers, and their interaction in the U.S. economy. The Quarterly Workforce Indicators (QWI), a new data series published by the U.S. Census Bureau since 2003, are computed from this infrastructure. The QWI offer unprecedented detail on the local dynamics of labor markets. Despite the fine detail, confidentiality is maintained through the application of state-of-the-art confidentiality protection methods. This article describes how the input files are compiled and combined to create the infrastructure files. The multiple imputation mechanisms used to fill in missing data and the statistical matching techniques used to combine data where a direct match is not possible are both crucial to the success of the final product and are described in detail here. Finally, special attention is paid to the confidentiality protection mechanisms used to hide the identity of the underlying entities in the final published data. A brief description of public-use and restricted-access data files is also provided, with pointers to further documentation for researchers interested in using these data.

The paper (updated 2006-01-16) and the presentation can be downloaded here.