Two-step estimation of panel data models with censored endogenous variables and selection bias
Introduction
Despite the frequent use of panel data in empirical work, there are few suitable estimators for panel data models with sample selection, truncation and limited dependent variables. While maximum likelihood can, under appropriate distributional assumptions, provide consistent estimators, its empirical use is hampered by computational complexities such as local maxima and multi-dimensional integrals. This paper proposes two-step estimators, for a range of parametric panel data models, which avoid these problems. The models comprise a primary equation with an endogenous explanatory variable, or selection bias, and a reduced form for the endogenous explanator or selection process. Following Heckman (1979), we argue that endogeneity and sample selection bias result from the failure to account for unobserved heterogeneity in the primary equation. We derive estimates of this heterogeneity from the reduced form residuals and include them as additional explanatory variables.
We examine two classes of models. The first is characterized by a primary equation with an uncensored dependent variable and an endogenous censored explanator. This case includes the sample selection model. Given estimates of the reduced form parameters, the primary equation parameters can be estimated by ordinary least squares, based upon a conditional expectation, and we refer to this as conditional moment estimation. The second class features a primary equation with a censored dependent variable and an uncensored endogenous explanatory variable. As the primary equation is estimated by maximum likelihood, where the likelihood function corresponds to the conditional density given the endogenous explanatory variable, we assign it an interpretation of conditional maximum likelihood (Smith and Blundell, 1986).
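The first class of models can be illustrated with a minimal sketch of the cross-sectional sample selection case in the spirit of Heckman (1979): a probit reduced form is estimated first, a correction term (the inverse Mills ratio) is computed from it, and the primary equation is then fitted by ordinary least squares on the selected sample. This is a simplified illustration, not the panel estimator proposed in the paper: it ignores individual effects and time dependence, and the simulated design, function names and parameter values are all assumptions made for the example.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    """Standard normal CDF, elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density, elementwise."""
    return np.exp(-0.5 * np.asarray(x, dtype=float) ** 2) / math.sqrt(2.0 * math.pi)


def fit_probit(X, d, n_iter=30):
    """Probit maximum likelihood by Fisher scoring (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        xb = X @ beta
        Phi = np.clip(norm_cdf(xb), 1e-10, 1.0 - 1e-10)
        phi = norm_pdf(xb)
        grad = X.T @ (phi * (d - Phi) / (Phi * (1.0 - Phi)))
        info = (X * (phi ** 2 / (Phi * (1.0 - Phi)))[:, None]).T @ X
        beta = beta + np.linalg.solve(info, grad)
    return beta


def heckman_two_step(y, X, Z, d):
    """Step 1: probit for selection. Step 2: OLS with the inverse Mills ratio."""
    gamma = fit_probit(Z, d)
    index = Z @ gamma
    mills = norm_pdf(index) / np.clip(norm_cdf(index), 1e-10, None)
    selected = d == 1
    # The last column is the correction term estimated from the reduced form.
    W = np.column_stack([X[selected], mills[selected]])
    coef, *_ = np.linalg.lstsq(W, y[selected], rcond=None)
    return gamma, coef


def simulate(n=5000, rho=0.6, seed=0):
    """Simulated data (assumed design): the errors u and v have correlation rho."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z_excl = rng.normal(size=n)  # variable excluded from the primary equation
    v = rng.normal(size=n)
    u = rho * v + math.sqrt(1.0 - rho ** 2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    Z = np.column_stack([np.ones(n), x, z_excl])
    d = (Z @ np.array([0.2, 0.5, 1.0]) + v > 0).astype(float)
    y = X @ np.array([1.0, 2.0]) + u  # only used where d == 1
    return y, X, Z, d
```

Calling `heckman_two_step(*simulate())` returns the probit coefficients and the second-step coefficients; in this design the last second-step coefficient estimates the covariance between the two errors, so a value far from zero signals selection bias.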
Many panel data estimators assume that the endogeneity or selection bias is due to time-invariant individual effects (see, for example, Hausman and Taylor, 1981; Amemiya and MaCurdy, 1986; Honoré, 1992, 1993). We, however, also incorporate endogeneity/selectivity through an individual time-specific component. This extends cross-sectional estimators (see, for example, Heckman, 1978, 1979; Smith and Blundell, 1986; Rivers and Vuong, 1988; Vella, 1993) by separating the individual effects from these individual-specific time effects. We also capture state dependence in the process generating the endogeneity/selection bias. Our approach also encompasses existing panel data procedures for sample selection and attrition bias (see, for example, Ridder, 1990; Nijman and Verbeek, 1992).
Two-step procedures are generally inefficient (see, for example, Newey, 1987) and thus the attraction of our approach, in contrast to maximum likelihood, is its relative computational ease. In some instances, however, our two-step estimator is asymptotically efficient within a limited information maximum likelihood (LIML) framework. Our method provides initial consistent estimators for a LIML approach so that asymptotically efficient estimators can be obtained in one iteration.
The following two sections consider conditional moment and conditional maximum likelihood estimation, respectively. Section 4 presents an empirical example, featuring the wage–hours relationship, which illustrates a non-conventional form of sample selection bias where the endogenous explanatory variable, which is also the basis of the selection rule, enters the conditional mean non-linearly. This empirical example illustrates our procedure and highlights how many of our assumptions can be tested. Concluding comments are contained in Section 5.
Conditional moment estimation
Consider the following model where the parameters of Eq. (1) are of primary focus while Eq. (2) is the reduced form for the explanatory variable which is endogenous and/or the basis of the selection rule. The censoring and selection rules are in , : where i indexes individuals and t indexes time and are latent endogenous variables with
Conditional maximum-likelihood estimation
We now consider the case where the dependent variable in the primary equation is censored while the endogenous explanatory variable is fully observed. The model has the following general form: where h is a function mapping the latent into the observed yit. The assumptions on the unobservables are stronger than before and given by which implies Eq. (6) with τ1=σηv/ση2
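As a hedged illustration of the conditional maximum likelihood idea, the sketch below implements a cross-sectional two-step probit in the spirit of Rivers and Vuong (1988): the reduced form for the fully observed endogenous regressor is estimated by OLS, and its residual is added as an extra regressor in a probit for the censored (here binary) outcome. It is not the panel estimator of this paper. The simulated design, the function names and the normalization that the error remaining after conditioning has unit variance are assumptions made for the example.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    """Standard normal CDF, elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density, elementwise."""
    return np.exp(-0.5 * np.asarray(x, dtype=float) ** 2) / math.sqrt(2.0 * math.pi)


def fit_probit(X, d, n_iter=30):
    """Probit maximum likelihood by Fisher scoring (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        xb = X @ beta
        Phi = np.clip(norm_cdf(xb), 1e-10, 1.0 - 1e-10)
        phi = norm_pdf(xb)
        grad = X.T @ (phi * (d - Phi) / (Phi * (1.0 - Phi)))
        info = (X * (phi ** 2 / (Phi * (1.0 - Phi)))[:, None]).T @ X
        beta = beta + np.linalg.solve(info, grad)
    return beta


def control_function_probit(d, z, W):
    """Step 1: OLS reduced form for z. Step 2: probit on (1, z, residual)."""
    pi_hat, *_ = np.linalg.lstsq(W, z, rcond=None)
    v_hat = z - W @ pi_hat  # reduced form residual
    X = np.column_stack([np.ones(len(z)), z, v_hat])
    return pi_hat, fit_probit(X, d)


def simulate(n=5000, tau=0.8, seed=1):
    """Simulated data (assumed design): z is endogenous because v enters y*."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)  # instrument, excluded from the primary equation
    v = rng.normal(size=n)  # reduced form error
    e = rng.normal(size=n)  # unit-variance error left after conditioning on v
    W = np.column_stack([np.ones(n), w])
    z = W @ np.array([0.0, 1.0]) + v
    y_star = 0.5 + 1.0 * z + tau * v + e
    d = (y_star > 0).astype(float)  # only the sign of y* is observed
    return d, z, W
```

Calling `control_function_probit(*simulate())` returns the reduced form coefficients and the second-step probit coefficients; the coefficient on the residual plays the role of the endogeneity parameter, so testing whether it is zero is a simple exogeneity test in this simplified setting.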
Empirical example
We now provide an empirical example estimating the impact of weekly hours worked on the offered hourly wage rate while accounting for the endogeneity of hours. This issue has attracted attention in the labor economics literature (see, for example, Moffitt, 1984; and Biddle and Zarkin, 1989). The model has the form where wit represents the log of the
Concluding remarks
This paper presents a two-step approach to estimating panel data models with censored endogenous variables and sample selection. In contrast to maximum likelihood estimation our procedure is computationally simple because only one-dimensional numerical integration is required, while a closed-form solution for the second step estimator is available. The cost is a loss of efficiency, which partially depends upon the magnitude of the covariances responsible for the endogeneity/selectivity. Our
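The one-dimensional numerical integration mentioned above can be sketched with Gauss-Hermite quadrature. The example assumes, purely for illustration, that the reduced form is a random effects probit, so that one individual's likelihood contribution is an integral over a single normally distributed individual effect. The function names and the number of quadrature nodes are assumptions, and the paper's own reduced form may differ.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    """Standard normal CDF, elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def normal_expectation(f, sigma, n_nodes=20):
    """Approximate E[f(a)] for a ~ N(0, sigma^2) by Gauss-Hermite quadrature.

    f must accept a 1-D array of evaluation points and return one value per point.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    a = math.sqrt(2.0) * sigma * nodes  # change of variables to the N(0, sigma^2) scale
    return float(np.sum(weights * f(a)) / math.sqrt(math.pi))


def re_probit_likelihood(d, index, sigma_alpha, n_nodes=20):
    """Likelihood contribution of one individual in a random effects probit.

    d:     0/1 outcomes over the T periods
    index: linear index x_it'beta for each period (assumed already computed)
    """
    d = np.asarray(d, dtype=float)
    index = np.asarray(index, dtype=float)
    sign = 2.0 * d - 1.0

    def conditional_likelihood(a):
        # Rows correspond to quadrature points, columns to time periods.
        arg = sign[None, :] * (index[None, :] + a[:, None])
        return np.prod(norm_cdf(arg), axis=1)

    return normal_expectation(conditional_likelihood, sigma_alpha, n_nodes)
```

For example, `re_probit_likelihood([1, 0, 1], [0.3, -0.2, 0.5], sigma_alpha=0.7)` evaluates a single one-dimensional integral regardless of the number of periods, which is what keeps the first step cheap compared with a full maximum likelihood treatment.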
Acknowledgements
This paper was partially written while the authors were visitors in the Department of Economics, Research School of Social Sciences and the Department of Statistics, The Faculties at the Australian National University, Canberra, and while Vella was visiting the CentER for Economic Research at Tilburg University. An earlier version of this paper was circulated under the title ‘Estimating and Testing Simultaneous Equations Panel Data Models with Censored Endogenous Variables’. Helpful comments by
References (30)
- et al. Simulation-based inference. A survey with special reference to panel data models. Journal of Econometrics (1993)
- et al. Generalized residuals. Journal of Econometrics (1987)
- Orthogonality conditions for tobit models with fixed effects and lagged dependent variables. Journal of Econometrics (1993)
- A method of moments interpretation of sequential estimators. Economics Letters (1984)
- Efficient estimation of limited dependent variable models with endogenous explanatory variables. Journal of Econometrics (1987)
- et al. Limited information estimators and exogeneity tests for simultaneous probit models. Journal of Econometrics (1988)
- Selection corrections for panel data models under conditional mean independence assumptions. Journal of Econometrics (1995)
- et al. Instrumental-variable estimation of an error-components model. Econometrica (1986)
- Arellano, M., Bover, O., Labeaga, J.M., 1997. Autoregressive models with sample selectivity for panel data. Working...
- et al. Choice among wage-hours packages: An empirical investigation of male labor supply. Journal of Labor Economics (1989)
- A computationally efficient quadrature procedure for the one-factor multinomial probit model. Econometrica
- Analysis of covariance with qualitative data. Review of Economic Studies
- Estimation and Inference in Econometrics
- Panel data and unobservable individual effects. Econometrica