Abstract
High-stakes postgraduate specialist certification examinations have considerable implications for the future careers of examinees. Medical colleges and professional boards have a social and professional responsibility to ensure that these examinations are fit for purpose. To date, little has been published about the reliability of specialist certification examinations or about objective methods for improving them. Such data are needed to improve current assessment practices and to sustain the international credibility of specialist certification processes. To determine the component and composite reliability of the Fellowship examination of the College of Physicians of South Africa, and to identify strategies for further improvement, generalizability theory and multivariate generalizability theory were used to estimate the reliability of the examination subcomponents and the overall reliability of the composite examination. Decision studies were used to identify strategies for improving the composition of the examination. Reliability coefficients of the component subtests ranged from 0.58 to 0.64. The composite reliability of the examination was 0.72; this could be increased to 0.8 by weighting all test components equally or by increasing the number of patient encounters in the clinical component. Correlations between examination components were high, suggesting that similar parameters of competence were being assessed. This composite certification examination, if equally weighted, achieved an overall reliability sufficient for high-stakes examination purposes. Increasing the weighting of the clinical component decreased the reliability; this could be rectified by increasing the number of patient encounters in the examination. Practical ways of achieving this are suggested.
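The abstract's central finding — that re-weighting the components changes the composite reliability — follows from the standard formula for the reliability of a weighted composite: true-score variance divided by observed-score variance, assuming measurement errors are uncorrelated across components. The sketch below uses the subtest reliability range reported in the abstract (0.58–0.64); the standard deviations, inter-component correlations, and weights are hypothetical illustration values, not the paper's actual data.

```python
import numpy as np

def composite_reliability(weights, sds, reliabilities, corr):
    """Reliability of a weighted composite score.

    Assumes errors are uncorrelated across components, so off-diagonal
    observed covariances equal the true-score covariances; only the
    diagonal is shrunk by each component's reliability.
    """
    w = np.asarray(weights, dtype=float)
    sd = np.asarray(sds, dtype=float)
    r = np.asarray(reliabilities, dtype=float)
    # Observed covariance matrix of component scores
    cov_obs = np.outer(sd, sd) * np.asarray(corr, dtype=float)
    # True-score covariance matrix: reliability-weighted variances on the diagonal
    cov_true = cov_obs.copy()
    np.fill_diagonal(cov_true, r * sd**2)
    return (w @ cov_true @ w) / (w @ cov_obs @ w)

# Hypothetical values: three subtests with the reported reliabilities,
# equal spread, and moderately high inter-component correlations
sds = [1.0, 1.0, 1.0]
rel = [0.58, 0.61, 0.64]
corr = [[1.0, 0.6, 0.6],
        [0.6, 1.0, 0.6],
        [0.6, 0.6, 1.0]]

equal = composite_reliability([1, 1, 1], sds, rel, corr)   # ≈ 0.82
skewed = composite_reliability([1, 1, 3], sds, rel, corr)  # ≈ 0.79
```

Even with these made-up inputs, the pattern in the abstract reproduces: the equally weighted composite is more reliable than any single subtest, and heavily up-weighting one component pulls the composite reliability back down toward that component's own reliability.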
Burch, V. C., Norman, G. R., Schmidt, H. G., et al. (2008). Are specialist certification examinations a reliable measure of physician competence? Advances in Health Sciences Education, 13, 521–533. https://doi.org/10.1007/s10459-007-9063-5