Abstract
High-stakes postgraduate specialist certification examinations have considerable implications for the future careers of examinees. Medical colleges and professional boards have a social and professional responsibility to ensure that these examinations are fit for purpose. To date, little has been published about the reliability of specialist certification examinations or about objective methods for improving them. Such data are needed to improve current assessment practices and to sustain the international credibility of specialist certification processes. To determine the component and composite reliability of the Fellowship examination of the College of Physicians of South Africa, and to identify strategies for further improvement, generalizability theory and multivariate generalizability theory were used to estimate the reliability of the examination subcomponents and the overall reliability of the composite examination. Decision studies were used to identify strategies for improving the composition of the examination. Reliability coefficients of the component subtests ranged from 0.58 to 0.64. The composite reliability of the examination was 0.72; this could be increased to 0.8 by weighting all test components equally or by increasing the number of patient encounters in the clinical component. Correlations between examination components were high, suggesting that similar parameters of competence were being assessed. This composite certification examination, if equally weighted, achieved an overall reliability sufficient for high-stakes examination purposes. Increasing the weighting of the clinical component decreased the reliability; this could be rectified by increasing the number of patient encounters in the examination. Practical ways of achieving this are suggested.
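The abstract's central finding — that re-weighting the components changes the composite reliability — follows from the standard formula for the reliability of a weighted composite: true-score variance divided by observed-score variance, assuming measurement errors are uncorrelated across components. The sketch below uses the subtest reliability range reported in the abstract (0.58–0.64); the standard deviations, inter-component correlations, and weights are hypothetical illustration values, not the paper's actual data.

```python
import numpy as np

def composite_reliability(weights, sds, reliabilities, corr):
    """Reliability of a weighted composite score.

    Assumes errors are uncorrelated across components, so off-diagonal
    observed covariances equal the true-score covariances; only the
    diagonal is shrunk by each component's reliability.
    """
    w = np.asarray(weights, dtype=float)
    sd = np.asarray(sds, dtype=float)
    r = np.asarray(reliabilities, dtype=float)
    # Observed covariance matrix of component scores
    cov_obs = np.outer(sd, sd) * np.asarray(corr, dtype=float)
    # True-score covariance matrix: reliability-weighted variances on the diagonal
    cov_true = cov_obs.copy()
    np.fill_diagonal(cov_true, r * sd**2)
    return (w @ cov_true @ w) / (w @ cov_obs @ w)

# Hypothetical values: three subtests with the reported reliabilities,
# equal spread, and moderately high inter-component correlations
sds = [1.0, 1.0, 1.0]
rel = [0.58, 0.61, 0.64]
corr = [[1.0, 0.6, 0.6],
        [0.6, 1.0, 0.6],
        [0.6, 0.6, 1.0]]

equal = composite_reliability([1, 1, 1], sds, rel, corr)   # ≈ 0.82
skewed = composite_reliability([1, 1, 3], sds, rel, corr)  # ≈ 0.79
```

Even with these made-up inputs, the pattern in the abstract reproduces: the equally weighted composite is more reliable than any single subtest, and heavily up-weighting one component pulls the composite reliability back down toward that component's own reliability.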
Burch, V. C., Norman, G. R., Schmidt, H. G., et al. (2008). Are specialist certification examinations a reliable measure of physician competence? Advances in Health Sciences Education, 13, 521–533. https://doi.org/10.1007/s10459-007-9063-5