Do examinations influence student evaluations?

https://doi.org/10.1016/j.ijer.2009.10.001

Abstract

This paper measures the impact of timing on student evaluations of teaching effectiveness, using a dataset of close to 3000 observations from Erasmus School of Economics. A special feature of the data is that students were able to complete on-line questionnaires during a time window ranging from one week before to one week after the final examination. This allows for the isolation of the effect of the examination on student evaluations. Among students who subsequently pass the exam, we find little difference between pre- and post-exam ratings. Among students who fail, evaluation scores are significantly lower after the exam on a number of items. Our evidence is compatible with a self-serving bias in student evaluations, but does not indicate that students seek revenge on instructors through lower ratings.

Introduction

Student evaluation of teaching (SET) is the most common means of evaluating teaching at educational institutions. The value of the information in SET scores is a hotly debated issue in higher education. The survey by Marsh (1987) concludes that student evaluations are generally valid and reliable and that they can yield useful information for students, instructors and management. More recent papers by Aleamoni (1999), Boex (2000), Bosshardt and Watts (2001) and Theall and Franklin (2001) reaffirm this conclusion and raise doubts about the most commonly reported biases and criticisms of student evaluations.

Over the past decade, the mode of SET administration has changed considerably. Hmieleski (2000) reports that in 2000, 98% of the most wired educational institutions in the US still used a paper-and-pencil mode of administration. A few years later, Gamliel and Davidovitz (2005) observe that on-line evaluation has become the established practice at most institutions of higher education. Nulty (2008) likewise notes the growing use of on-line course and teaching evaluations in recent years. Since paper-and-pencil evaluations are time-consuming, costly, susceptible to faculty influence and less suited to dealing with open questions, the shift towards on-line SETs is not hard to understand (Dommeyer, Baum, Hanna, & Chapman, 2004). Academic research on the mode of SET administration has focused on comparing response rates and ratings. While most studies observe lower response rates for on-line surveys, the consensus finding is that on-line evaluations do not produce significantly different mean SET scores than traditional paper-based evaluations (Carini et al., 2003; Dommeyer et al., 2004; Layne et al., 1999; Liu, 2006).

A common practice at educational institutions is to administer SETs before the final examination of a course. In the paper-and-pencil era, students typically completed SETs during a class session at the very end of the term (Layne et al., 1999). Administration during the exam was usually avoided because of the lack of time and the heightened stress of the exam setting, and the absence of contact with the full class after the exam session ruled out paper-based evaluation at that point. Although on-line evaluation makes it possible to extend the evaluation period beyond the date of the exam, most educational institutions stick to the practice of administering SETs before the final examination. Administration after the final examination is usually avoided out of concern that a disappointing experience at the exam might sour student attitudes and depress ratings (Carrier, Howard, & Miller, 1974). If instructors share this concern, administering SETs after the exam would strengthen their incentives to make exams less demanding.

The support for this concern in the literature is mixed. A survey by Simpson and Siguaw (2000) among faculty members indicates that most respondents perceive SETs as problematic measurement instruments, which encourage professors to lower standards and which students may use as a tool for revenge. Simpson and Siguaw (2000) ask faculty members to list potential responses to SETs, including activities used by colleagues expressly to influence SETs. Most comments (23.6%) concern activities that lower grading or course-work standards. This result supports earlier findings by Tabachnick, Keith-Spiegel, and Pope (1991, p. 510), whose survey reveals that 22% of instructors admit to “giving easy courses or tests to ensure popularity with students”. Schmelkin, Spencer, and Gellman (1997), however, find that faculty members do not show strong resistance to student ratings and their use, in contrast to most anecdotal evidence. In addition, the literature on student perceptions of SETs indicates that students try to be fair and accurate in completing SETs and have confidence in their ability to rate professors (Brown, 2008; Spencer & Schmelkin, 2002).

While the literature on SETs is huge, few papers address the issue of timing and, more specifically, the effect of examinations. This paper exploits the fact that at Erasmus School of Economics, SET questionnaires are made available to students on-line, during a two-week time window centered around the date of the final examination. This set-up allows for a much better isolation of the effect of examinations on ratings, compared to the old practice of processing SETs using paper forms, filled in by the whole class at the same time.

While one could argue that situational factors like the timing of SETs in principle should not affect evaluations, there are three reasons why the examination might have an impact. First, the examination confronts the student in a direct manner with the success of his or her individual learning. Post-exam evaluation might thus lead to a different and, arguably, more complete student assessment of teaching effectiveness than pre-exam evaluation. A second explanation is based on economic behavior and holds that students use ratings to exact revenge on or reward instructors (Lin, 2008). Third, examinations are stressful events which release feelings of relief or disappointment, or of success or failure. This may give rise to attributional biases to reduce cognitive dissonance, whereby students internalize success and externalize failure. The asymmetrical nature of the attributional bias will be used to empirically disentangle this explanation from the other two explanations.

The paper is organized as follows. Section 2 reviews the literature and further develops the hypothesis to be tested. Section 3 describes the data. Section 4 reports the results and Section 5 concludes.


Literature

This paper focuses on the effect of timing (pre- or post-exam) on SET scores and controls for the influence of course grades. The review below is therefore confined to previous research relating to timing and grading, and to a discussion of the relevant literature on attributional bias and revenge theory. Throughout, a stringent definition of “bias” will be used, holding that “student ratings are biased to the extent that they are unrelated to teaching effectiveness” (Marsh, 1984, p. 733).

Student evaluation of teaching

The Erasmus School of Economics (ESE) at Erasmus University Rotterdam is the largest faculty of economics in the Netherlands, with a domestic market share of approximately 25%. The annual enrolment consists of approximately 700 freshmen. The school's largest programme is in Economics and Business. It is offered as a three-year Bachelor programme, followed by a one-year Master programme.

Our dataset consists of 19 courses from the first two years of the Bachelor programme in Economics and Business,

Tests of equality of means

Table 2 reports descriptive statistics for the complete sample. In addition to the means (μ) and standard deviations (σ) for the pre-exam and post-exam subgroups, the final column reports the p-values of the test of equality of pre- and post-exam μ. The first row reports the results for the average of the SET scores on all items. On average, SET scores are 0.03 above the class average before and 0.021 below the class average after the exam. While the test of equality indicates that this
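
The final column of Table 2 reports p-values for tests of equality of the pre- and post-exam means; the excerpt does not name the specific test used. As a rough illustration only, the sketch below runs a Welch two-sample t-test in Python on hypothetical score deviations; the arrays pre_scores and post_scores, their sizes and the random draws are invented for the example and are not taken from the paper's data.

    import numpy as np
    from scipy import stats

    # Hypothetical stand-ins for SET-score deviations from the class average
    # (group sizes, spread and the random seed are assumptions, not the paper's data).
    rng = np.random.default_rng(seed=0)
    pre_scores = rng.normal(loc=0.030, scale=0.50, size=1500)    # rated before the exam
    post_scores = rng.normal(loc=-0.021, scale=0.50, size=1400)  # rated after the exam

    # Welch's t-test: compares the two means without assuming equal variances.
    t_stat, p_value = stats.ttest_ind(pre_scores, post_scores, equal_var=False)
    print(f"pre-exam mean:  {pre_scores.mean():+.3f}")
    print(f"post-exam mean: {post_scores.mean():+.3f}")
    print(f"Welch t = {t_stat:.2f}, two-sided p = {p_value:.4f}")

A small p-value would indicate that the pre- and post-exam means differ by more than sampling variation alone would explain.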

Summary and conclusions

This paper measures the impact of timing on student evaluations, using a dataset of 3000 observations from Erasmus School of Economics. We exploit students’ ability to complete the on-line questionnaires during a window running from one week before to one week after the final examination. In this way, we try to isolate the effect of the examination on SET scores. Among students who subsequently pass their exam, we find no significant differences between pre- and post-exam average

References

  • Carini, R. M., et al. (2003). College student responses to web and paper surveys: Does mode matter? Research in Higher Education.
  • Carrier, N. A., et al. (1974). Course evaluation: When? Journal of Educational Psychology.
  • Dommeyer, C. J., et al. (2004). Gathering faculty teaching evaluations by in-class and online surveys: Their effects on response rates and evaluations. Assessment & Evaluation in Higher Education.
  • Feldman, K. A. (1976). Grades and college students’ evaluations of their courses and teachers. Research in Higher Education.
  • Feldman, K. A. (1979). The significance of circumstances for college students’ ratings of their teachers and courses. Research in Higher Education.
  • Frey, P. W. (1976). Validity of student instructional ratings: Does timing matter? The Journal of Higher Education.
  • Gamliel, E., et al. (2005). Online versus traditional teaching evaluations: Mode can matter. Assessment & Evaluation in Higher Education.
  • Gigliotti, R. J., et al. (1990). Attributional bias and course evaluations. Journal of Educational Psychology.
  • Greenwald, A. G., et al. (1997). Grading leniency is a removable contaminant of student ratings. American Psychologist.
  • Greenwald, A. G., et al. (1997). No pain, no gain? The importance of measuring course workload in student ratings of instruction. Journal of Educational Psychology.
  • Hausman, J. A. (1978). Specification tests in econometrics. Econometrica.
  • Heckert, T. M., et al. (2006). Relations among student effort, perceived class difficulty appropriateness, and student evaluations of teaching: Is it possible to “buy” better evaluations through lenient grading? College Student Journal.
I thank Maurit Kroon and Harry Post for assistance in gathering the data and William E. Becker and two referees for comments on an earlier draft of this paper.
