Two-step estimation of panel data models with censored endogenous variables and selection bias

https://doi.org/10.1016/S0304-4076(98)00043-8

Abstract

This paper presents some two-step estimators for a wide range of parametric panel data models with censored endogenous variables and sample selection bias. Our approach is to derive estimates of the unobserved heterogeneity responsible for the endogeneity/selection bias to include as additional explanatory variables in the primary equation. These are obtained through a decomposition of the reduced form residuals. The panel nature of the data allows adjustment, and testing, for two forms of endogeneity and/or sample selection bias. Furthermore, it incorporates roles for dynamics and state dependence in the reduced form. Finally, we provide an empirical illustration which features our procedure and highlights the ability to test several of the underlying assumptions.

Introduction

Despite the frequent use of panel data in empirical work, there are few suitable estimators for panel data models with sample selection, truncation and limited dependent variables. While maximum likelihood can, under appropriate distributional assumptions, provide consistent estimators, its empirical use is hampered by computational complexities such as local maxima and multi-dimensional integrals. This paper proposes some two-step estimators, for a range of parametric panel data models, which avoid these problems. The models comprise a primary equation with an endogenous explanatory variable, or selection bias, and a reduced form for the endogenous explanator or selection process. Following Heckman (1979), we argue that endogeneity and sample selection bias result from the failure to account for unobserved heterogeneity in the primary equation. We derive estimates of this heterogeneity from the reduced form residuals, to include as additional explanatory variables.
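As a rough illustration of how heterogeneity can be estimated from a censored reduced form, the sketch below computes the generalized residual for a cross-sectional probit in the spirit of Heckman (1979). It is a simplification and not the paper's panel estimator: it assumes a single period, no individual effect, a standard normal error v, and a known reduced form index. The function names (`generalized_residual`, `norm_pdf`, `norm_cdf`) are hypothetical.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def generalized_residual(d, index):
    """E[v | d, index] when d = 1[index + v > 0] and v ~ N(0, 1).

    Assumption for illustration only: a cross-sectional probit reduced form
    with a known index. The latent residual v is not observed, but its
    conditional expectation given the observed outcome d is.
    """
    if d == 1:
        # E[v | v > -index] = phi(index) / Phi(index), the inverse Mills ratio
        return norm_pdf(index) / norm_cdf(index)
    # E[v | v <= -index] = -phi(index) / (1 - Phi(index))
    return -norm_pdf(index) / (1.0 - norm_cdf(index))
```

In a two-step procedure of this kind, the generalized residual would be evaluated at first-step parameter estimates and added to the primary equation as an extra regressor.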

We examine two classes of models. The first is characterized by a primary equation with an uncensored dependent variable and an endogenous censored explanator. This case includes the sample selection model. Given estimates of the reduced form parameters, the primary equation parameters can be estimated by ordinary least squares, based upon a conditional expectation, and we refer to this as conditional moment estimation. The second class features a primary equation with a censored dependent variable and an uncensored endogenous explanatory variable. As the primary equation is estimated by maximum likelihood, where the likelihood function corresponds to the conditional density given the endogenous explanatory variable, we assign it an interpretation of conditional maximum likelihood (Smith and Blundell, 1986).
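The two-step logic for the first class of models can be sketched on simulated data. This is a deliberately simplified setting, not the estimator of the paper: the endogenous explanator is assumed uncensored (so the reduced form residuals are directly observed), both equations are assumed linear, and there are no dynamics. All names, parameter values and the data-generating process are assumptions made for illustration. The reduced form residual is split into its individual mean and its period-specific value, and both enter the primary equation as additional regressors.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y via Gauss-Jordan."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def simulate(n=1500, t_periods=4, beta=1.0, seed=0):
    """Hypothetical DGP: z is endogenous through both alpha_i and v_it."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        alpha = rng.gauss(0, 1)
        mu = 0.8 * alpha + 0.6 * rng.gauss(0, 1)      # correlated individual effects
        for _ in range(t_periods):
            x = rng.gauss(0, 1)                       # exogenous, excluded from y equation
            v = rng.gauss(0, 1)
            eta = 0.5 * v + 0.75 ** 0.5 * rng.gauss(0, 1)  # correlated time-varying errors
            z = x + alpha + v                         # reduced form
            y = beta * z + mu + eta                   # primary equation
            rows.append((i, x, z, y))
    return rows


def two_step(rows):
    ids = [r[0] for r in rows]
    x = [r[1] for r in rows]
    z = [r[2] for r in rows]
    y = [r[3] for r in rows]
    # Step 1: reduced form by OLS, keep the residuals
    g = ols([[1.0, xi] for xi in x], z)
    resid = [zi - g[0] - g[1] * xi for zi, xi in zip(z, x)]
    # Decompose the residuals: individual mean and period-specific value
    sums, counts = {}, {}
    for i, e in zip(ids, resid):
        sums[i] = sums.get(i, 0.0) + e
        counts[i] = counts.get(i, 0) + 1
    resid_bar = [sums[i] / counts[i] for i in ids]
    # Step 2: primary equation with both correction terms added
    b = ols([[1.0, zi, e, eb] for zi, e, eb in zip(z, resid, resid_bar)], y)
    naive = ols([[1.0, zi] for zi in z], y)
    return {"naive": naive[1], "two_step": b[1],
            "coef_resid": b[2], "coef_resid_mean": b[3]}
```

Calling `two_step(simulate())` returns the naive OLS slope next to the corrected one. In this design the naive slope is pushed above the true value of 1, while the corrected slope should be close to it. The coefficients on the two correction terms correspond to the two forms of endogeneity, so their significance could serve as a test.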

Many panel data estimators assume that the endogeneity or selection bias is due to time-invariant individual effects (see, for example, Hausman and Taylor, 1981; Amemiya and MaCurdy, 1986; Honoré, 1992, 1993). We, however, also incorporate endogeneity/selectivity through an individual time specific component. This extends cross-sectional estimators (see, for example, Heckman, 1978, 1979; Smith and Blundell, 1986; Rivers and Vuong, 1988; Vella, 1993) by separating the individual effects from these individual specific/time effects. We also capture state dependence in the process generating the endogeneity/selection bias. Our approach also encompasses existing panel data procedures for sample selection and attrition bias (see, for example, Ridder, 1990; Nijman and Verbeek, 1992).

Two-step procedures are generally inefficient (see, for example, Newey, 1987) and thus the attraction of our approach, in contrast to maximum likelihood, is its relative computational ease. In some instances, however, our two-step estimator is asymptotically efficient within a limited information framework (LIML). Our method provides initial consistent estimators for a LIML approach so that asymptotically efficient estimators can be obtained in one iteration.
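The computational point can be made concrete. With a single random individual effect in the reduced form, the likelihood contribution of one individual involves only a one-dimensional integral over that effect, whatever the number of periods. The sketch below evaluates such a contribution for a random effects probit, which is an assumed special case of the reduced form chosen for illustration. Simpson's rule on a fixed grid is used here for simplicity in place of a Gauss-Hermite type quadrature, and all function names are hypothetical.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def re_probit_prob(z, index, sigma_alpha, n_grid=2001, width=8.0):
    """P(z_1, ..., z_T) for z_t = 1[index_t + alpha + v_t > 0].

    Assumptions for illustration: v_t ~ N(0, 1) independent over t and
    alpha ~ N(0, sigma_alpha^2). Writing alpha = sigma_alpha * a with
    a ~ N(0, 1), the probability is the one-dimensional integral
        integral of prod_t Phi((2 z_t - 1)(index_t + sigma_alpha a)) phi(a) da,
    approximated by Simpson's rule on [-width, width]. n_grid must be odd.
    """
    h = 2.0 * width / (n_grid - 1)
    total = 0.0
    for j in range(n_grid):
        a = -width + j * h
        f = norm_pdf(a)
        for zt, it in zip(z, index):
            f *= norm_cdf((2 * zt - 1) * (it + sigma_alpha * a))
        # Simpson weights: 1 at the ends, 4 at odd nodes, 2 at even interior nodes
        w = 1 if j in (0, n_grid - 1) else (4 if j % 2 else 2)
        total += w * f
    return total * h / 3.0
```

For a single period the integral has the closed form Phi(index / sqrt(1 + sigma_alpha^2)), which gives a simple check on the numerical approximation.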

The following two sections consider conditional moment and conditional maximum likelihood estimation, respectively. Section 4 presents an empirical example, featuring the wage–hours relationship, which illustrates a non-conventional form of sample selection bias where the endogenous explanatory variable, which is also the basis of the selection rule, enters the conditional mean non-linearly. This empirical example illustrates our procedure and highlights how many of our assumptions can be tested. Concluding comments are contained in Section 5.

Conditional moment estimation

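Consider the following model where the parameters of Eq. (1) are of primary focus while Eq. (2) is the reduced form for the explanatory variable which is endogenous and/or the basis of the selection rule. The censoring and selection rules are in Eqs. (3) and (4):

$$
\begin{aligned}
y_{it}^{*} &= m_1(x_{it}, z_{it}; \theta_1) + \mu_i + \eta_{it}, && (1)\\
z_{it}^{*} &= m_2(x_{it}, z_{i,t-1}; \theta_2) + \alpha_i + v_{it}, && (2)\\
z_{it} &= h(z_{it}^{*}; \theta_3), && (3)\\
y_{it} &= y_{it}^{*} \quad \text{if } g_t(z_{i1}, \ldots, z_{iT}) = 1,\\
       &= 0 \ (\text{unobserved}) \quad \text{if } g_t(z_{i1}, \ldots, z_{iT}) = 0, && (4)
\end{aligned}
$$

where $i$ indexes individuals ($i = 1, \ldots, N$) and $t$ indexes time ($t = 1, \ldots, T$); $y_{it}^{*}$ and $z_{it}^{*}$ are latent endogenous variables with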

Conditional maximum-likelihood estimation

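We now consider the case where the dependent variable in the primary equation is censored while the endogenous explanatory variable is fully observed. The model has the following general form:

$$
\begin{aligned}
y_{it}^{*} &= m_1(x_{it}, z_{it}; \theta_1) + \mu_i + \eta_{it},\\
z_{it}^{*} &= m_2(x_{it}, z_{i,t-1}; \theta_2) + \alpha_i + v_{it},\\
y_{it} &= h(y_{it}^{*}; \theta_3),\\
z_{it} &= z_{it}^{*},
\end{aligned}
$$

where $h$ is a function mapping the latent $y_{it}^{*}$ into the observed $y_{it}$. The assumptions on the unobservables are stronger than before and given by

$$
\begin{pmatrix} \mu_i \iota + \eta_i \\ \alpha_i \iota + v_i \end{pmatrix} \Bigg|\, X_i \sim \text{N.I.D.}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_\mu^2 \iota\iota' + \sigma_\eta^2 I & \sigma_{\mu\alpha} \iota\iota' + \sigma_{\eta v} I \\ \sigma_{\mu\alpha} \iota\iota' + \sigma_{\eta v} I & \sigma_\alpha^2 \iota\iota' + \sigma_v^2 I \end{pmatrix} \right),
$$

which implies Eq. (6) with $\tau_1 = \sigma_{\eta v}/\sigma_\eta^2$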

Empirical example

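We now provide an empirical example estimating the impact of weekly hours worked on the offered hourly wage rate while accounting for the endogeneity of hours. This issue has attracted attention in the labor economics literature (see, for example, Moffitt, 1984; and Biddle and Zarkin, 1989). The model has the form

$$
\begin{aligned}
w_{it} &= x_{1,it}'\beta_1 + x_{2,it}'\beta_2 + m(\mathit{hours}_{it}; \beta_3) + \mu_i + \eta_{it},\\
\mathit{hours}_{it}^{*} &= x_{3,it}'\theta_1 + \mathit{hours}_{i,t-1}\theta_2 + \alpha_i + v_{it},\\
\mathit{hours}_{it} &= \mathit{hours}_{it}^{*} \quad \text{if } \mathit{hours}_{it}^{*} > 0,\\
\mathit{hours}_{it} &= 0,\ w_{it} \text{ not observed} \quad \text{if } \mathit{hours}_{it}^{*} \leqslant 0,
\end{aligned}
$$

where $w_{it}$ represents the log of the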

Concluding remarks

This paper presents a two-step approach to estimating panel data models with censored endogenous variables and sample selection. In contrast to maximum likelihood estimation our procedure is computationally simple because only one-dimensional numerical integration is required, while a closed-form solution for the second step estimator is available. The cost is a loss of efficiency, which partially depends upon the magnitude of the covariances responsible for the endogeneity/selectivity. Our

Acknowledgements

This paper was partially written while the authors were visitors in the Department of Economics, Research School of Social Sciences and the Department of Statistics, The Faculties at the Australian National University, Canberra, and while Vella was visiting the CentER for Economic Research at Tilburg University. An earlier version of this paper was circulated under the title ‘Estimating and Testing Simultaneous Equations Panel Data Models with Censored Endogenous Variables’. Helpful comments by

References (30)

  • J Butler et al. A computationally efficient quadrature procedure for the one-factor multinomial probit model. Econometrica (1982)
  • G Chamberlain. Analysis of covariance with qualitative data. Review of Economic Studies (1980)
  • R Davidson et al. Estimation and Inference in Econometrics (1993)
  • J Hausman et al. Panel data and unobservable individual effects. Econometrica (1981)
  • J.J. Heckman. Dummy endogenous variables in a simultaneous equation system. Econometrica... (1978)