Are specialist certification examinations a reliable measure of physician competence?

  • Original Paper
Advances in Health Sciences Education

Abstract

High-stakes postgraduate specialist certification examinations have considerable implications for the future careers of examinees. Medical colleges and professional boards have a social and professional responsibility to ensure that these examinations are fit for purpose. To date, few data have been published about the reliability of specialist certification examinations or about objective methods for improving them. Such data are needed to improve current assessment practices and sustain the international credibility of specialist certification processes. This study aimed to determine the component and composite reliability of the Fellowship examination of the College of Physicians of South Africa and to identify strategies for further improvement. Generalizability theory and multivariate generalizability theory were used to estimate the reliability of the examination subcomponents and the overall reliability of the composite examination. Decision studies were used to identify strategies for improving the composition of the examination. Reliability coefficients of the component subtests ranged from 0.58 to 0.64. The composite reliability of the examination was 0.72. This could be increased to 0.8 by weighting all test components equally or by increasing the number of patient encounters in the clinical component of the examination. Correlations between examination components were high, suggesting that similar parameters of competence were being assessed. This composite certification examination, if equally weighted, achieved an overall reliability sufficient for high-stakes examination purposes. Increasing the weighting of the clinical component decreased the reliability; this could be rectified by increasing the number of patient encounters in the examination. Practical ways of achieving this are suggested.
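
The decision-study reasoning in the abstract (more patient encounters yield a more reliable clinical score) can be illustrated with a minimal sketch. It assumes the simplest single-facet design, in which candidates are crossed with encounters and all error is pooled into one residual term; the study itself used a multivariate model, which this sketch does not reproduce. The function names and the variance components below are hypothetical and are not taken from the paper.

```python
def projected_reliability(var_person: float, var_residual: float, n_encounters: int) -> float:
    """Generalizability coefficient for a mean score over n_encounters.

    Assumes a single-facet persons-by-encounters design:
        E(rho^2) = var_person / (var_person + var_residual / n_encounters)
    """
    if n_encounters < 1:
        raise ValueError("n_encounters must be at least 1")
    if var_person <= 0 or var_residual < 0:
        raise ValueError("variance components must be positive (residual may be 0)")
    return var_person / (var_person + var_residual / n_encounters)


def encounters_needed(var_person: float, var_residual: float,
                      target: float, max_encounters: int = 1000) -> int:
    """Smallest number of encounters whose projected coefficient reaches target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    for n in range(1, max_encounters + 1):
        if projected_reliability(var_person, var_residual, n) >= target:
            return n
    raise ValueError("target not reachable within max_encounters")


if __name__ == "__main__":
    # Hypothetical variance components, chosen only for illustration.
    var_person, var_residual = 1.0, 3.2
    for n in (2, 4, 8, 16):
        coefficient = projected_reliability(var_person, var_residual, n)
        print(f"{n:2d} encounters -> projected reliability {coefficient:.2f}")
    print("encounters needed for 0.80:",
          encounters_needed(var_person, var_residual, 0.80))
```

Under these assumed components, the projected coefficient rises with each added encounter but with diminishing returns, which is why a decision study is useful for weighing extra examination time against the reliability gained.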

Corresponding author

Correspondence to V. C. Burch.

About this article

Cite this article

Burch, V.C., Norman, G.R., Schmidt, H.G. et al. Are specialist certification examinations a reliable measure of physician competence?. Adv in Health Sci Educ 13, 521–533 (2008). https://doi.org/10.1007/s10459-007-9063-5

  • DOI: https://doi.org/10.1007/s10459-007-9063-5
