Note that this manuscript is currently a draft version and will be revised later this year. This is the version that was presented at the AERA conference in 2001.

 

Grading Leniency, Grade Discrepancy, and Student Ratings of Instruction

Bryan W. Griffin

Georgia Southern University

March 28, 2002

            Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA, April 2002.

            Correspondence concerning this article should be addressed to Bryan W. Griffin; Department of Curriculum, Foundations, and Research; Georgia Southern University; P. O. Box 8144; Statesboro, GA 30460 (e-mail: bwgriffin@gasou.edu).


Grading Leniency, Grade Discrepancy, and Student Ratings of Instruction

            Student ratings are a widespread and common tool for evaluating faculty. When asked, most faculty members approve of the use of student ratings of instruction for teaching improvement (Baxter, 1991; Griffin, 1999; Moses, 1986; Schmelkin, Spencer, & Gellman, 1997), but many are resistant to the use of student ratings for tenure, promotion, and merit decisions (Feldman, 1997; McKeachie, 1997a). Many educators believe that student ratings are affected, or biased, by a number of factors unrelated to teaching performance (Marsh & Overall, 1979; Wilson, 1998), and one common concern is that grading standards employed by instructors could bias ratings. As Marsh and Roche (2000) have noted, the average correlation between expected grades and student ratings of instruction is around .20. Typically this relationship has been interpreted using one of three theoretical explanations (for reviews see Greenwald & Gillmore, 1997a; Marsh & Roche, 2000; Wachtel, 1998).

First, the positive correlation between expected grade and student ratings of instruction may indicate that student ratings are valid measures, since better instruction should result in more learning, better grades, and better ratings. Second, the association between expected grades and ratings of instruction could be spurious and produced by various student characteristics such as motivation. For example, more motivated students who have greater interest in the subject matter are likely to learn more, achieve more, and rate the instructor higher. Third, an association between expected grades and ratings could reflect some type of biasing effect. For example, one possible biasing effect is grading leniency. Under this hypothesis, instructors are rewarded with higher ratings for assigning higher grades as a result of lenient grading practices, or conversely penalized with lower ratings for assigning lower grades due to grading harshness. One important weakness of studies examining the grading leniency hypothesis is that few have incorporated measures of student perceptions of the instructor’s grading leniency (Marsh, 1987; Marsh & Roche, 2000).

Olivares (2001) was the only study found that incorporated a measure of grading leniency. Olivares measured grading leniency by asking students to compare their current instructor to others they have had and rate this instructor’s grading from 1 “much easier/lenient grader” to 7 “much harder/strict grader.” Olivares found zero-order correlations of -.42 between grading leniency and an overall rating of the instructor, and of -.45 between grading leniency and a composite rating of the instructor based on students’ perceptions of the instructor’s organization, communication, level of caring, and classroom atmosphere. Given the scoring system of the rating scale used for grading leniency, the negative correlations indicate that more lenient grading was associated with higher ratings of the instructors. Olivares also found that the association between grading leniency and student ratings of the instructor remained after controlling for pre-course interest, change in interest, expected grade for the course, and a measure of cognitive ability.

            In addition to the grading leniency hypothesis, another possible biasing effect interpretation for the grades-ratings association can be found in the theories of attribution and retribution (Feldman, 1997). Attribution theory suggests that a student may react in one of two ways if that student receives a grade that differs from what was expected. If the grade is lower than expected, then that student is likely to activate a defensive mechanism commonly referred to as a self-serving bias (Gigliotti & Buchtel, 1990). With self-serving bias, a student will attempt to protect his or her view of self and assign blame for the lower than expected performance to an external cause. The likely target will be the instructor, so the student will rate the instructor lower, and thus a rating penalty effect will occur. If a student receives a grade that is higher than expected, then the student will assign credit for this performance to internal causes, such as his or her intelligence, ability, hard work, etc. Since the better than expected grade is seen as a result of the student’s behavior or ability, ratings of the instructor are not likely to differ from ratings given by students who receive grades as expected; in essence, there is no rating reward effect. Further diminishing the possible rating reward effect is the situation identified by Miller and Ross (1975) in which individuals typically anticipate positive outcomes, so it is unlikely that many students will acknowledge higher than expected grades since high grades were expected anyway. In short, with attribution theory and self-serving bias, students are likely to penalize instructors for lower than expected grades, but there is unlikely to be any reward effect for the few students who might believe they are receiving a grade higher than expected. Retribution effect (Feldman, 1997) predicts simpler behavior on the part of students. If, for example, a student receives lower than expected grades, this individual will penalize the instructor, while a student who receives higher than expected grades will reward the instructor.

One difficulty with student ratings research using the self-serving bias and retribution effect explanations has been the method for determining the grade discrepancy, that is, whether grades are higher or lower than what students expect. The cleanest method for assessing grade discrepancy is usually found in grade manipulation experiments in which students are led to anticipate one grade, but then receive a grade inconsistent with their expectations (e.g., Abrami, Dickens, Perry, & Leventhal, 1980; Tata, 1999; Worthington & Wong, 1979). Reviewers of these studies, however, have pointed to a number of potential flaws. One important flaw is that in classroom settings, students often do not know what their actual grade will be before they complete ratings of instruction forms, so the external validity of these studies is limited. For correlational studies of attribution and retribution effects, researchers often calculate grade discrepancy by considering pre-course grade point average (GPA) or pre-course expected grade, and then examining how the end-of-course expected grade or actual grade differs from the pre-course GPA or expected grade (e.g., Gigliotti & Buchtel, 1990; Granzin & Painter, 1973; Greenwald & Gillmore, 1997b; Palmer, Carliner, & Romer, 1978). A potential limitation of these designs is that students are very likely to reassess their expectations once they are exposed to the course and instructor, so pre-course grade expectation may provide an inaccurate grade discrepancy baseline. Similarly, the use of GPA for determining grade discrepancy could be misleading since performance, and expectation for performance, in a given course can be independent of performance in other courses. This does not mean that previous correlational studies are flawed or misleading, but alternative methods for assessing grade discrepancy may prove useful.

The purpose of this study is twofold. First, since only one study of the grading leniency hypothesis has incorporated a measure of leniency as perceived by students, it is important to understand better how scores from such a measure relate to student ratings, and to learn if the relationship between grading leniency and student ratings replicates across studies. Second, the calculation of grade discrepancy for assessing the self-serving bias and retribution effect hypotheses can be done in a manner that is perhaps more course-appropriate than previously examined. Thus, the intent of this study is to examine the grading leniency explanation of student ratings by incorporating a measure of students’ perceptions of leniency, and to test both self-serving bias and retribution effect hypotheses by incorporating a more course-specific measure of grade discrepancy.

Method

Participants

            A total of 754 undergraduate students enrolled in 39 education courses at a medium-sized (14,000 students), regional university in the southeastern United States participated in this study. The classes ranged in size from 6 to 34 students. Undergraduate education students at this institution are predominantly White (71%) and female (80%). Most respondents (76%) reported grade point averages in the range of 2.5 to 3.5 on a 4.0 scale. Data were collected during the fall and spring semesters of the years 1998 through 2000.

Instrument and Variables

            An instrument to assess student evaluations of instruction and course characteristics was developed drawing item and question wording from multiple sources (Abrami, d'Apollonia, & Rosenfield, 1997; Feldman, 1997; Marsh, 1987; Murray, 1997). To measure teaching effectiveness, 12 statements were used to assess multiple dimensions of instruction with ratings following a five-point scale. The 12 statements follow.

  1. Overall, how would you rate this course?
  2. Overall, how would you rate this instructor?
  3. The instructor was dynamic and energetic in conducting the course.
  4. The instructor presented the material in a clear and understandable manner.
  5. Course materials were well prepared and organized.
  6. Students were invited to share their ideas and knowledge.
  7. The instructor made students feel welcome in seeking help/advice in or outside of class.
  8. The content of this course is useful, worthwhile, or relevant to you.
  9. Methods of evaluating student work were fair and appropriate.
  10. The instructor seems to have a real interest in and concern for students.
  11. The instructor gave students useful/helpful feedback on work.
  12. The instructor is very knowledgeable in the subject of this course.

For the first 2 items, overall course and overall instructor, the scale ranged from 1 “Poor” to 5 “Excellent,” and for the remaining 10 items the scale ranged from 1 “strongly disagree” to 5 “strongly agree.”

The two predictors of interest in this study are grading leniency and grade discrepancy. Grading leniency was assessed by students’ responses to the statement “This instructor is a lenient/easy grader” (1 “strongly disagree” to 5 “strongly agree”). Grade discrepancy was calculated as the grade a student expected (“What grade do you think the instructor will assign you in this course?”) minus the grade a student believed they deserved in the course (“What grade do you think you deserve in this course?”). Both expected and deserved grades were scored on a four-point scale such that grades of A+, A, or A- were scored as 4; B+, B, and B- were scored as 3, etc. The difference of expected minus deserved grade can be interpreted as follows: a positive difference indicates the expected grade was higher than the deserved grade (e.g., expect an A but deserve a B), no difference shows expected and deserved grades are the same (e.g., expect a B and deserve a B), and a negative difference reflects an expected grade lower than the deserved grade (e.g., expect a B but deserve an A).
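The scoring rule described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to prepare the data: the function names are hypothetical, and because the text scores only A and B grades explicitly ("etc."), the values assigned to C, D, and F are assumptions.

```python
# Four-point scoring described in the text: plus/minus variants collapse onto
# the base letter (A+, A, A- -> 4; B+, B, B- -> 3). The text ends with "etc.",
# so the values for C, D, and F below are assumptions.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def grade_points(letter: str) -> int:
    """Return the points for a letter grade, ignoring any + or - suffix."""
    return GRADE_POINTS[letter.strip().upper().rstrip("+-")]


def grade_discrepancy(expected: str, deserved: str) -> int:
    """Expected grade minus deserved grade, both on the four-point scale."""
    return grade_points(expected) - grade_points(deserved)


def describe(discrepancy: int) -> str:
    """Label the sign of the discrepancy as it is interpreted in the text."""
    if discrepancy > 0:
        return "expected higher than deserved"
    if discrepancy < 0:
        return "expected lower than deserved"
    return "expected same as deserved"


# The three examples given in the text.
for expected, deserved in [("A", "B"), ("B", "B"), ("B", "A")]:
    d = grade_discrepancy(expected, deserved)
    print(expected, deserved, d, describe(d))
```

Because plus and minus grades collapse onto the base letter, a student who expects a B+ and believes a B- is deserved is scored as having no discrepancy under this rule.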

            In addition to these measures, students also provided information concerning (a) the instructor’s reputation (1 “very bad” to 5 “very good”, and 6 “didn’t know about the instructor”), (b) course difficulty (1 “one of easiest” to 5 “one of most difficult”), (c) course workload (1 “very light” to 5 “very heavy”), (d) current GPA, and (e) prior subject interest (1 “no interest” to 5 “very interested”). Class size and instructor’s sex were also included in the analysis. Three categories of instructor reputation were developed for the analyses performed in this study: negative reputation, which included students who selected responses 1 to 3 (“instructor very bad” to “about average”) for the instructor reputation item; positive reputation, which included students who chose responses 4 and 5 (“above average” to “instructor very good”); and no information, which consisted of students who selected response 6 (“didn’t know about the instructor”).

            From these three categories of instructor reputation, two dummy variables (Pedhazur, 1997) were created for the regression analyses performed below. The first, called positive reputation, was coded 1 if student responses corresponded with the positive reputation category, and 0 otherwise. The second dummy variable, labeled negative reputation, was coded 1 if student responses corresponded with the negative reputation category, and 0 otherwise. Students in the no information group were therefore coded 0 on both variables and serve as the reference category. Of the 754 respondents, 176 (23.3%) were classified into the positive reputation group, 420 (55.7%) into the no information group, and 158 (21.0%) into the negative reputation group.
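The recoding of the six-option reputation item into two dummy variables can be illustrated as follows. This is a minimal sketch with hypothetical function names, not the code used for the analyses; the group counts are those reported in the text.

```python
from typing import Tuple


def reputation_category(response: int) -> str:
    """Collapse the six response options into the three reputation categories."""
    if response in (1, 2, 3):
        return "negative"
    if response in (4, 5):
        return "positive"
    if response == 6:
        return "no information"
    raise ValueError(f"response must be 1-6, got {response}")


def reputation_dummies(response: int) -> Tuple[int, int]:
    """Return (positive_reputation, negative_reputation) dummy codes.

    The no information group is coded 0 on both dummies.
    """
    category = reputation_category(response)
    return int(category == "positive"), int(category == "negative")


# Group sizes reported in the text, with their shares of the 754 respondents.
counts = {"positive": 176, "no information": 420, "negative": 158}
total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
print(total, shares)
```

Coding the no information group as 0 on both variables means each dummy coefficient is read as a contrast against students who knew nothing about the instructor beforehand.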

[Discussion of construct validity for the scores from the ratings instrument will be added here]

Procedures

            Students in 39 classes were administered the evaluation instrument during the last week of regular classes in the fall and spring semesters, 1998 through 2000. Instructors were required to leave the classroom during evaluations. Students were told that evaluations would not be made available until after course grades had been assigned and would only be provided to instructors in aggregate form.

Results

            Of the 754 students sampled, 82.63% (n = 623) believed that the grade they expected in the course was the grade they deserved, hence there was no difference between expected and deserved grade for these students. A total of 118 students (15.65%) expected a grade lower than they deserved, and only 13 students (1.72%) expected a grade higher than they deserved. Of the two competing theories, self-serving bias and retribution effect, these data provide a better fit to the self-serving bias explanation since almost none of the students thought they were to receive a grade higher than deserved. As noted above, Miller and Ross (1975) predicted such behavior. It is also interesting to note that the great majority of students expected no discrepancy at all, so it is likely that any grade discrepancy effect on student ratings of instruction may be small or limited to only a small percentage of students overall.

            To statistically model student ratings, it was necessary to create a dummy variable for grade discrepancy. Since the number of students who expected grades higher than they deserved was only 13, too few to provide reliable regression analysis, these students were combined with the 623 who believed their expected and deserved grades were the same. As a check of the effect of combining these two groups, the means for each of the 12 items were calculated separately for both groups. The overall average rating across the 12 items provided by the group of 13 students was 4.19, and the overall average rating by the group of 623 students was 4.18, for a mean difference of only .01. Hence the mean rating from the 13 students who expected grades higher than they deserved was essentially the same as the mean rating by the 623 students who expected the grade they deserved. The remaining group of 118 students represents those who believed their expected grade was lower than they deserved. A dummy variable, called negative grade discrepancy, was created to represent these two new groups with a coding of 1 for the students with lower than deserved expectations (n = 118) and 0 for the other students (n = 636).
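The collapsing of the three discrepancy groups into a single dummy variable can be sketched as below. The function name is hypothetical, and the discrepancy values of -1 and +1 are placeholders, since only the sign matters for the coding; the group sizes are those reported above.

```python
from collections import Counter


def negative_grade_discrepancy(discrepancy: int) -> int:
    """Code 1 when the expected grade is lower than the deserved grade, else 0."""
    return 1 if discrepancy < 0 else 0


# One discrepancy value per student, rebuilt from the reported group sizes:
# 623 with no discrepancy, 118 expecting lower, 13 expecting higher.
# The magnitudes (-1, +1) are placeholders; only the sign is used.
discrepancies = [0] * 623 + [-1] * 118 + [1] * 13

dummy_counts = Counter(negative_grade_discrepancy(d) for d in discrepancies)
print(dummy_counts[1], dummy_counts[0])  # 118 636
```

Under this coding the 13 students who expected more than they deserved fall into the reference group, which is why the comparison of group means reported above matters for interpreting the dummy.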

            To learn whether grading leniency and grade discrepancy are associated with student ratings of instruction, multilevel regression (Bryk & Raudenbush, 1992; Goldstein, 1995; Longford, 1993) was used in an effort to examine variation in student ratings both within and across classes. Several researchers of student ratings of instruction (e.g., Cranton & Smith, 1990; Feldman, 1998; Gigliotti & Buchtel, 1990) have noted that the level of analysis, either student- or class-level, at which student ratings are examined could influence the nature of the relationships that are revealed. For example, the analysis of class means rather than student-level data may obscure important variation in ratings that result from individual student differences within the classroom. Multilevel analysis allows one to combine both levels of analysis to provide a more complete model of student ratings.
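To illustrate why the level of analysis matters, the following sketch splits the total variation in a set of ratings into a between-class part and a within-class part. The ratings are hypothetical and the function name is illustrative; this is a simple sum-of-squares decomposition, not the multilevel estimation used in this study.

```python
from statistics import mean


def sum_of_squares_by_level(classes):
    """Split total variation in ratings into between- and within-class parts."""
    all_ratings = [x for ratings in classes.values() for x in ratings]
    grand_mean = mean(all_ratings)
    # Between-class: how far each class mean sits from the grand mean.
    between = sum(
        len(ratings) * (mean(ratings) - grand_mean) ** 2
        for ratings in classes.values()
    )
    # Within-class: how far each student sits from his or her class mean.
    within = sum(
        (x - mean(ratings)) ** 2
        for ratings in classes.values()
        for x in ratings
    )
    total = sum((x - grand_mean) ** 2 for x in all_ratings)
    return between, within, total


# Hypothetical ratings for two small classes.
classes = {"class_a": [5, 4, 3, 4], "class_b": [3, 2, 4, 3]}
between, within, total = sum_of_squares_by_level(classes)
print(between, within, total)
```

In this made-up example most of the variation lies within classes, so an analysis of the two class means alone would discard the larger share of the differences among students.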

            Incorporated into the multilevel analyses that follow were several covariates previously identified as important predictors of student ratings of instruction. At the student level, these covariates include course difficulty, course workload, prior subject interest, instructor reputation, and expected grade in the course. Research on student ratings has demonstrated course difficulty and course workload, often measured together, to correlate positively with ratings of instruction (Greenwald & Gillmore, 1997a, 1997b; Marsh, 1980; Marsh & Roche, 2000). Interest in the subject matter of the course before enrollment—prior subject interest—has been linked to higher student ratings of instruction (Howard & Maxwell, 1980; Marsh, 1980; Prave & Baril, 1993). Barké, Tollefson, and Tracy (1983), Griffin (2001), and Ory (1980) found that instructor reputation was associated with various measures of teaching effectiveness. Finally, expected grade in the course, which typically correlates positively with ratings, has been the subject of much debate and research (Greenwald & Gillmore, 1997a; Marsh, 1987; Marsh & Roche, 1997, 2000; McKeachie, 1997b) and therefore was included in the analysis.

            At the class level, class size and instructor sex were included. Research shows that class size correlates, albeit weakly, with ratings of instruction (Feldman, 1984). The sex of the instructor also appears to relate to student ratings. Feldman's (1998) reviews have shown that women tend to receive slightly higher ratings than men. However, Feldman (1998) also notes that a same-sex favorability in ratings exists; students of the same sex as their instructor may provide slightly higher ratings (Centra & Gaubatz, 2000). Since the majority of students in the classes examined in this study were women, it is likely that women instructors in this sample may have higher ratings.

            Thus, the models examined were, with variables enclosed in parentheses, as follows:

Student-level.

$$
\begin{aligned}
(\text{Student Rating of Instruction Item})_{ij} = {} & b_{0j} + b_{1}(\text{Grade Leniency})_{ij} + b_{2}(\text{Neg. Grade Discrepancy})_{ij} \\
& + b_{3}(\text{Positive Reputation})_{ij} + b_{4}(\text{Negative Reputation})_{ij} + b_{5}(\text{Course Difficulty})_{ij} \\
& + b_{6}(\text{Course Workload})_{ij} + b_{7}(\text{Prior Subject Interest})_{ij} + b_{8}(\text{Expected Grade})_{ij} + e_{ij}
\end{aligned}
$$

At the class-level, mean ratings of the instructor were modeled with class size and instructor sex:

Class-level.

$$
b_{0j} = g_{00} + g_{01}(\text{Instructor's Sex})_{j} + g_{02}(\text{Class Size})_{j} + m_{0j}
$$

Combining the student- and class-level equations yields the following model of instructor rating:

Combined.

$$
\begin{aligned}
(\text{Student Rating of Instruction Item})_{ij} = {} & g_{00} + b_{1}(\text{Grade Leniency})_{ij} + b_{2}(\text{Neg. Grade Discrepancy})_{ij} \\
& + b_{3}(\text{Positive Reputation})_{ij} + b_{4}(\text{Negative Reputation})_{ij} + b_{5}(\text{Course Difficulty})_{ij} \\
& + b_{6}(\text{Course Workload})_{ij} + b_{7}(\text{Prior Subject Interest})_{ij} + b_{8}(\text{Expected Grade})_{ij} \\
& + g_{01}(\text{Instructor's Sex})_{j} + g_{02}(\text{Class Size})_{j} + e_{ij} + m_{0j}
\end{aligned}
$$

This combined model was used to estimate the regression coefficients for each of the 12 rating items presented above. Correlations and descriptive statistics for the student-level variables are presented in Table 1. Multilevel regression results, using full information maximum likelihood to obtain estimates (Hox, 1995), are presented in Table 2.
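The step from the two-level specification to the combined model is a substitution of the class-level equation for the intercept of the student-level equation. The sketch below checks that substitution numerically. All coefficient values, the student profile, and the 0/1 coding of instructor sex are hypothetical and are not estimates from Table 2; the function and variable names are illustrative only.

```python
import math

# Student-level predictors, in the order they appear in the model.
STUDENT_PREDICTORS = (
    "grade_leniency", "neg_grade_discrepancy", "positive_reputation",
    "negative_reputation", "course_difficulty", "course_workload",
    "prior_subject_interest", "expected_grade",
)


def class_intercept(g, instructor_sex, class_size, m0j=0.0):
    """Class-level equation: b0j = g00 + g01*sex + g02*size + m0j."""
    return g["g00"] + g["g01"] * instructor_sex + g["g02"] * class_size + m0j


def student_rating(b0j, b, student, eij=0.0):
    """Student-level equation: rating = b0j + sum of b_k * x_k + eij."""
    return b0j + sum(b[name] * student[name] for name in STUDENT_PREDICTORS) + eij


def combined_rating(g, b, student, instructor_sex, class_size, eij=0.0, m0j=0.0):
    """Combined equation, written out as a single sum."""
    student_part = sum(b[name] * student[name] for name in STUDENT_PREDICTORS)
    class_part = g["g01"] * instructor_sex + g["g02"] * class_size
    return g["g00"] + student_part + class_part + eij + m0j


# Hypothetical coefficients and one hypothetical student (not Table 2 estimates).
g = {"g00": 2.0, "g01": 0.10, "g02": -0.01}
b = dict(zip(STUDENT_PREDICTORS, (0.15, -0.25, 0.10, -0.30, 0.05, 0.00, 0.08, 0.12)))
student = dict(zip(STUDENT_PREDICTORS, (3, 0, 1, 0, 3, 3, 4, 3)))

# Route 1: compute the class intercept first, then the student-level equation.
two_step = student_rating(class_intercept(g, 1, 25, m0j=0.2), b, student, eij=-0.1)
# Route 2: evaluate the combined equation directly.
one_step = combined_rating(g, b, student, 1, 25, eij=-0.1, m0j=0.2)
print(math.isclose(two_step, one_step))
```

Only the intercept varies across classes in this specification; the eight student-level slopes are treated as common to all classes.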

 

Tables 1 and 2 about here

 

As the correlations demonstrate, grading leniency showed a positive correlation with each of the 12 ratings items. The correlations ranged from a low of .06 to a high of .36 with an average correlation of .21. The negative grade discrepancy dummy correlated negatively with each of the 12 ratings items, with correlations ranging from -.08 to -.25, and an average of -.18. These correlations show that students with lower expected than deserved grades tended to rate the instructor and instruction lower on each of the ratings items. While the zero-order correlations are informative about the general nature of the relationships among these variables, it is important to determine whether these patterns of association remain once other important predictors of student ratings are taken into account in a regression equation. The regression results in Table 2 indicated that grading leniency was positively and statistically significantly related to all 12 rating items. The weakest relationship (b = .08) was with the course content item and the strongest (b = .21) was with the fair evaluation of students item. The latter coefficient may be interpreted as showing that the more lenient the instructor’s grading, the more fair and appropriate students judged the instructor’s evaluation of their work to be. The average partial regression coefficient for the 12 items was .13. To put these estimates into perspective, consider the single overall instructor rating item, for which the grading leniency regression estimate is b = .14. Assuming that all other factors are held constant, two instructors who differ only on perceived grading leniency by one standard deviation (SD = 1.16, see Table 1) could expect a mean difference of 1.16 × .14 = .16 points on their overall instructor rating item. On the extremes, one instructor judged the least lenient (rating = 1) and another judged most lenient (rating = 5) would differ by (5-1) × .14 = .56 points on their average overall instructor rating, for example, 4.56 vs. 4.00 on a scale of 1 to 5.
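The two comparisons above are simple products of the regression coefficient and a difference in perceived leniency. A short sketch of that arithmetic follows, using the coefficient (.14) and standard deviation (1.16) reported in the text; the function name is hypothetical.

```python
LENIENCY_COEFFICIENT = 0.14  # b for the overall instructor item, from the text
LENIENCY_SD = 1.16           # standard deviation of grading leniency, from the text


def expected_rating_difference(leniency_a, leniency_b, b=LENIENCY_COEFFICIENT):
    """Predicted rating gap for two instructors who differ only in leniency."""
    return b * (leniency_a - leniency_b)


one_sd_gap = expected_rating_difference(LENIENCY_SD, 0.0)  # one SD apart
extreme_gap = expected_rating_difference(5, 1)             # most vs. least lenient
print(round(one_sd_gap, 2), round(extreme_gap, 2))  # 0.16 0.56
```

Both figures assume every other predictor in the model is held constant, as stated in the text.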

The relationship between grade discrepancy and student ratings was less robust than that found with grading leniency. Negative grade discrepancy was found to be negatively associated with student ratings in all cases, but was statistically significant for only 7 of the 12 items. Since negative grade discrepancy is a dummy variable, the regression coefficient may be interpreted as the mean difference in student ratings between (a) those students who expect a grade lower than they deserve and (b) everyone else. The largest differences (b = -.27) were found with two items, course content worthwhile and instructor showed interest in students. The smallest difference (b = -.03) was found for the instructor knowledgeable item. Drawing on the example above using the overall instructor rating item, consider two instructors who differ only in the expectations held by their students regarding their expected and deserved grades. The overall instructor rating for the instructor whose students believe their expected grades will be lower than they deserve will be .20 points lower than that of the instructor whose students do not anticipate any difference between their expected and deserved grades, e.g., 4.00 vs. 4.20.
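The reading of a dummy-variable coefficient as a difference between group means can be verified in the simplest case, a regression with the dummy as the only predictor. The sketch below uses hypothetical ratings and a hypothetical helper function; the coefficients reported in Table 2 are additionally adjusted for the other covariates, so they will not in general equal the raw mean difference.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: 1 = expects a grade lower than deserved, 0 = everyone else.
dummy = [1, 1, 0, 0, 0, 0]
rating = [4, 3, 4, 5, 4, 4]

slope = ols_slope(dummy, rating)
mean_difference = (
    mean(r for d, r in zip(dummy, rating) if d == 1)
    - mean(r for d, r in zip(dummy, rating) if d == 0)
)
print(slope, mean_difference)
```

With the dummy as the sole predictor the slope and the raw difference in group means coincide; once covariates enter the model the coefficient becomes an adjusted mean difference.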

For the other variables included in the models, results mirrored findings from previous studies. The negative instructor reputation dummy shows strong negative differences for each rating item except for the presented information clearly item and the materials organized item. Course difficulty was consistently, and positively, related to all rating items. The more difficult the course, as judged by students, the more positively students rated the course. Course workload was not statistically related to any of the rating items. Prior subject interest was statistically related to 8 of the 12 rating items, and in all cases prior subject interest was positively associated with ratings. Expected grade was also positively and statistically related to 10 of the 12 rating items (except for the course content worthwhile item and the instructor knowledgeable item, as these two relationships were not statistically significant). The partial regression coefficients for expected grade ranged from a low of .08 to a high of .24.

Discussion

Recall the three possible interpretations of the positive relationship between expected grade and student ratings of instruction: (a) valid teaching/learning association, (b) spurious association, and (c) biasing effect. Two ways of expressing the biasing effect were examined in this paper, grading leniency and grade discrepancy. Grading leniency was positively and linearly associated with each of the 12 rating items examined such that more lenient graders also tended to have higher student ratings and, conversely, instructors with harsher grading practices tended to receive lower ratings (i.e., grading harshness effect; Marsh & Roche, 2000; Worthington & Wong, 1979). This finding replicates that reported by Olivares (2001), who also found that instructors with more lenient grading practices tended to have higher student ratings. On the basis of results from this study and Olivares’ study, it appears that students rate instructors who are lenient graders higher than instructors who are less lenient with their grading.

Also examined was the relationship between student ratings and grade discrepancy, which is defined in this study as the difference between students’ expected grade and perceived deserved grade. Two theoretical explanations for such an effect were listed, self-serving bias and retribution effect. As noted, self-serving bias suggests that students will penalize instructors for lower than deserved grades, but will not reward instructors for higher than deserved grades. Retribution effect holds that students will reward instructors for higher than deserved grades, and penalize instructors for lower than deserved grades. The data examined here provide a better fit to the self-serving bias hypothesis. Less than 2% of the students sampled expected grades higher than they deserved, and about 16% expected grades lower than they deserved. There was little evidence that those who expected higher than deserved grades rewarded instructors with higher ratings when compared to students who expected grades to be what they deserved. There is, however, evidence of a penalty effect; students who expected grades lower than they deserved consistently provided ratings that were lower than those of the other two groups. The differences, adjusting for the modeled covariates, ranged from a low of -.03 to a high of -.27, with an overall average of -.18. Marsh and Roche (2000) point out that the self-serving bias may not be a bias under certain conditions for student ratings of instruction. Perhaps, for example, if a grade discrepancy is due to factors unrelated to instruction or the instructor, then students may not provide lower ratings. Unfortunately, the reason for a grade discrepancy was not assessed in this study, so it is impossible to know further what students were thinking when they identified a grade discrepancy.

In summary, these results suggest two things. First, there may be a grading leniency effect in student ratings, but so far only this study and Olivares’ (2001) study have apparently examined grading leniency directly. Replication studies are needed to further evaluate this finding. Second, in addition to a possible grading leniency effect, there appears to be an association between negative grade discrepancy and student ratings. This finding supports the self-serving bias hypothesis in that students appear to penalize instructors when grades are lower than expected, but do not reward instructors when grades are higher than expected. Since grading leniency and grade discrepancy, both possible parts of the biasing effect interpretation, were statistically controlled in the multilevel regression models, the partial regression coefficients for expected grade may perhaps represent a purer examination of the (a) valid teaching/learning association and (b) spurious association hypotheses. Several factors that could lead to the spurious association effect were included in the regression models, such as prior subject interest, course difficulty, and workload. It is possible, though, that other factors could be contributing to the observed relationship between expected grade and ratings found in this and other studies. More careful examinations taking into account various motivational factors such as intrinsic and extrinsic motivation, personal control, autonomy, etc. may prove useful in further elimination of the spurious effects hypothesis. However, since at least part of the spurious association and biasing effects hypotheses have been controlled in this study, the relationships between expected grades and student ratings of instruction found in the current study probably can be explained, at least in part, by the valid teaching/learning hypothesis. Thus, the results provided here suggest that student ratings of instruction are probably a function of both valid teaching and learning and some biasing due to grading leniency and grade discrepancy.


References

            Abrami, P. C., d'Apollonia, S., & Rosenfield, S. (1997). The dimensionality of student ratings of instruction: What we know and what we do not. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 321-367). New York: Agathon.

            Abrami, P. C., Dickens, W. J., Perry, R. P., & Leventhal, L. (1980). Do teacher standards for assigning grades affect student evaluations of instruction? Journal of Educational Psychology, 72, 107-118.

            Barké, C. R., Tollefson, N., & Tracy, D. B. (1983). Relationship between course entry attitudes and end-of-course ratings. Journal of Educational Psychology, 75, 75-85.

            Baxter, E. P. (1991). The TEVAL experience, 1983-88: The impact of a student evaluation of teaching scheme on university teachers. Studies in Higher Education, 16, 151-179.

            Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.

            Feldman, K. A. (1997). Identifying exemplary teachers and teaching: Evidence from student ratings. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 368-395). New York: Agathon.

            Gigliotti, R. J., & Buchtel, F. S. (1990). Attributional bias and course evaluations. Journal of Educational Psychology, 82, 341-351.

            Granzin, K. L., & Painter, J. J. (1973). A new explanation for students’ course evaluation tendencies. American Educational Research Journal, 10, 115-124.

            Greenwald, A. G., & Gillmore, G. M. (1997a). Grading leniency is a removable contaminant of student ratings. American Psychologist, 52, 1209-1217.

            Greenwald, A. G., & Gillmore, G. M. (1997b). No pain, no gain? The importance of measuring course workload in student ratings of instruction. Journal of Educational Psychology, 89, 743-751.

            Griffin, B. W. (1999). Results of the Faculty Survey on Student Ratings of Instruction: Preliminary Report. Statesboro, GA: Georgia Southern University, Student Ratings Committee.

            Griffin, B. W. (2001). Instructor reputation and student ratings of instruction. Contemporary Educational Psychology, 26, 534-552.

            Marsh, H. W. (1987). Students’ evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Educational Research, 11, 253-388.

            Marsh, H. W., & Roche, L. A. (2000). Effects of grading leniency and low workload on students’ evaluations of teaching: Popular myth, bias, validity, or innocent bystanders? Journal of Educational Psychology, 92, 202-228.

            McKeachie, W. J. (1997). Good teaching makes a difference--and we know what it is. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 396-408). New York: Agathon.

            Miller, D. T., & Ross, M. (1975). Self-serving biases in the attribution of causality: Fact or fiction? Psychological Bulletin, 82, 213-225.

            Moses, I. (1986). Student evaluation of teaching in an Australian university--staff perceptions and reactions. Assessment & Evaluation in Higher Education, 11, 117-129.

            Murray, H. G. (1997). Effective teaching behaviors in the college classroom. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 171-204). New York: Agathon.

            Olivares, O. J. (2001). Student interest, grading leniency, and teacher ratings: A conceptual analysis. Contemporary Educational Psychology, 26, 382-399.

            Ory, J. C. (1980). The influence of students’ affective entry on instructor and course evaluations. The Review of Higher Education, 4, 13-24.

            Palmer, J., Carliner, G., & Romer, T. (1978). Leniency, learning, and evaluations. Journal of Educational Psychology, 70, 855-863.

            Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation and prediction (3rd ed.). New York: Harcourt, Brace.

            Schmelkin, L. P., Spencer, K. J., & Gellman, E. S. (1997). Faculty perspectives on course and teacher evaluations. Research in Higher Education, 38, 575-592.

            Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage.

            Tata, J. (1999). Grade distributions, grading procedures, and students’ evaluations of instructors: A justice perspective. Journal of Psychology: Interdisciplinary and Applied, 133, 263-271.

            Wachtel, H. K. (1998). Student evaluation of college teaching effectiveness: A brief review. Assessment & Evaluation in Higher Education, 23, 191-212.

            Wigington, H., Tollefson, N., & Rodriguez, E. (1989). Students' ratings of instructors revisited: Interactions among class and instructor variables. Research in Higher Education, 30, 331-344.

            Wilson, R. (1998, January 16). New research casts doubt on value of student evaluations of professors. The Chronicle of Higher Education, pp. A12-A14.

            Worthington, A. G., & Wong, P. T. P. (1979). Effects of earned and assigned grades on student evaluations of an instructor. Journal of Educational Psychology, 71, 764-775.


Table 1. Descriptive Statistics and Correlations Among Student-level Variables

     1      2      3      4      5      6      7      8      9      10     11     12     13     14     15     16     17     18     19     20
1   1.000
2    .789  1.000
3    .715   .646  1.000
4    .716   .676   .728  1.000
5    .654   .618   .700   .759  1.000
6    .479   .391   .499   .492   .481  1.000
7    .617   .505   .548   .556   .544   .652  1.000
8    .556   .658   .569   .608   .543   .415   .498  1.000
9    .604   .519   .543   .571   .570   .602   .637   .467  1.000
10   .675   .557   .635   .628   .618   .644   .762   .511   .703  1.000
11   .656   .577   .645   .664   .662   .604   .663   .520   .671   .737  1.000
12   .544   .487   .605   .597   .627   .537   .504   .506   .571   .658   .649  1.000
13   .232   .167   .158   .190   .164   .238   .252   .061   .361   .276   .239   .145  1.000
14  -.208  -.212  -.153  -.197  -.174  -.130  -.235  -.161  -.246  -.224  -.184  -.081  -.168  1.000
15   .216   .202   .150   .176   .153   .109   .127   .174   .122   .134   .120   .117   .084  -.074  1.000
16  -.356  -.304  -.213  -.240  -.199  -.256  -.294  -.163  -.369  -.319  -.284  -.218  -.253   .227  -.284  1.000
17   .131   .135   .157   .081   .133   .072   .045   .178   .027   .075   .099   .174  -.337   .155   .012   .112  1.000
18   .048   .057   .114  -.021   .083   .015   .023   .039   .021   .039   .089   .086  -.169   .063  -.047   .067   .478  1.000
19   .148   .296   .164   .206   .156   .036   .108   .309   .094   .111   .113   .129  -.052  -.058   .097  -.031   .142   .166  1.000
20   .155   .154   .136   .149   .129   .121   .217   .095   .219   .179   .167   .064   .154  -.537   .061  -.164  -.235  -.085   .043  1.000
M   3.85   3.50   4.06   3.87   4.12   4.52   4.26   4.04   4.19   4.27   4.13   4.47   2.94   0.16   0.23   0.21   3.25   3.47   3.25   3.52
SD  1.16   1.13   1.11   1.15   1.02   0.81   0.99   1.14   1.02   0.97   1.01   0.80   1.16   0.36   0.42   0.41   0.90   0.94   1.10   0.63

Note. Variables include: 1 = Overall Instructor Rating; 2 = Overall Course Rating; 3 = Dynamic/Energetic Rating; 4 = Presented Clearly Rating; 5 = Materials Organized Rating; 6 = Students Invited to Share Ideas Rating; 7 = Students Could Seek Help Rating; 8 = Course Content Worthwhile Rating; 9 = Fair Evaluations Rating; 10 = Instructor Show Interest in Students Rating; 11 = Feedback Helpful Rating; 12 = Instructor Knowledgeable Rating; 13 = Grading Leniency; 14 = Negative Grading Discrepancy; 15 = Positive Reputation Dummy; 16 = Negative Reputation Dummy; 17 = Course Difficulty; 18 = Course Workload; 19 = Prior Subject Interest; 20 = Expected Grade.

All correlations larger than .071 in absolute value are statistically significant at the .05 level.

n = 754
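
The .071 cutoff reported in the note can be checked with a short calculation. The sketch below is not part of the original analysis. It assumes a two-tailed test of a Pearson correlation with df = n - 2 and approximates the t critical value with the standard normal quantile, which is close for df = 752. The function name critical_r is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n: int, alpha: float = 0.05) -> float:
    """Smallest |r| that is significant at `alpha` (two-tailed) for sample size n.

    Assumption: the t critical value is approximated by the standard
    normal quantile, which is reasonable for large df (here df = n - 2).
    """
    df = n - 2
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    # Invert t = r * sqrt(df) / sqrt(1 - r^2) to solve for r.
    return z / sqrt(z * z + df)


print(round(critical_r(754), 3))  # 0.071
```

Under these assumptions the computed threshold agrees with the value given in the note for n = 754.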


Table 2. Multilevel Regression Results for Student Ratings of Instruction

                               Overall         Overall         Dynamic and     Presented       Materials       Students
                               Instructor      Course          Energetic       Clearly         Organized       Shared Ideas
Fixed Portion of Model         B       SE B    B       SE B    B       SE B    B       SE B    B       SE B    B       SE B
Student Level
  Grading Leniency             .14*    .03     .10*    .03     .13*    .03     .15*    .03     .12*    .03     .13*    .03
  Grade Lower than Expected   -.20*    .10    -.25*    .10    -.09     .10    -.19     .10    -.21*    .10    -.06     .09
  Instructor Reputation
    Pos. Reputation            .22*    .08     .11     .08     .10     .08     .08     .08     .10     .08     .05     .07
    Neg. Reputation           -.45*    .10    -.39*    .10    -.24*    .10    -.14     .10    -.17     .10    -.32*    .08
  Course Difficulty            .16*    .04     .13*    .04     .12*    .04     .11*    .05     .11*    .04     .13*    .04
  Course Workload              .01     .04     .03     .04     .01     .04    -.07     .05     .04     .04    -.02     .04
  Prior Interest in Subject    .09*    .03     .20*    .03     .08*    .03     .11*    .03     .05     .03     .02     .03
  Expected Grade               .20*    .06     .17*    .06     .18*    .06     .23*    .07     .15*    .06     .13*    .05
  Intercept                   2.50     .47    2.23     .45    2.82     .48    2.93     .48    3.19     .43    3.34     .34
Class Level
  Class Size                  -.01     .01    -.02     .01    -.01     .02    -.03     .02    -.02     .01     .00     .01
  Instructor’s Sex            -.60*    .21    -.50*    .20    -.51*    .22    -.46*    .21    -.40*    .17    -.13     .11
Random Portion of Model
  Class-level variance         .38*            .33*            .39*            .37*            .23*            .08*
  Student-level variance       .66*            .63*            .67*            .72*            .66*            .14*
  R² (total variance modeled)  .26             .26             .17             .18             .16             .67

* p < .05.

n = 754 students in 39 courses.
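
The two variance components in the random portion of Table 2 can be combined into a residual intraclass correlation, the share of unexplained variance that lies between classes. This quantity is not reported in the table. The sketch below assumes the conventional ratio of class-level variance to total residual variance, and the function name residual_icc is hypothetical.

```python
def residual_icc(class_var: float, student_var: float) -> float:
    """Share of residual variance that lies between classes.

    Assumption: the usual two-level ratio tau / (tau + sigma^2), applied
    to the conditional (post-predictor) variance components.
    """
    return class_var / (class_var + student_var)


# Variance components copied from the first panel of Table 2.
components = {
    "Overall Instructor": (0.38, 0.66),
    "Overall Course": (0.33, 0.63),
    "Materials Organized": (0.23, 0.66),
}

for outcome, (class_var, student_var) in components.items():
    print(f"{outcome}: {residual_icc(class_var, student_var):.2f}")
```

Because the components are rounded to two decimals in the table, the resulting ratios are approximate.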


Table 2. continued

                               Students Could  Course Content  Fair Evaluation Interest in     Feedback        Instructor
                               Seek Help       Worthwhile      of Students     Students        Helpful         Knowledgeable
Fixed Portion of Model         B       SE B    B       SE B    B       SE B    B       SE B    B       SE B    B       SE B
Student Level
  Grading Leniency             .15*    .03     .08*    .03     .21*    .03     .15*    .03     .16*    .03     .09*    .03
  Grade Lower than Expected   -.26*    .10    -.27*    .11    -.22*    .10    -.27*    .09    -.15     .10    -.03     .08
  Instructor Reputation
    Pos. Reputation            .07     .08     .06     .09    -.01     .08     .05     .08     .03     .08     .01     .07
    Neg. Reputation           -.40*    .10    -.33*    .11    -.55*    .09    -.40*    .09    -.36*    .10    -.35*    .08
  Course Difficulty            .13*    .04     .18*    .05     .11*    .04     .13*    .04     .10*    .04     .16*    .04
  Course Workload             -.01     .04     .00     .05     .02     .04     .02     .04     .08     .04     .01     .04
  Prior Interest in Subject    .08*    .03     .21*    .03     .07*    .03     .06*    .03     .05     .03     .05     .03
  Expected Grade               .18*    .06     .10     .07     .22*    .06     .15*    .06     .24*    .06     .08     .05
  Intercept                   2.88     .39    2.88     .47    2.46     .39    3.11     .39    2.58     .41    3.73     .33
Class Level
  Class Size                  -.00     .01    -.02     .01    -.00     .01    -.01     .01    -.01     .01    -.02*    .01
  Instructor’s Sex            -.41*    .13    -.52*    .19    -.20     .14    -.33*    .14    -.35*    .16    -.23*    .11
Random Portion of Model
  Class-level variance         .11*            .28*            .15*            .16*            .18*            .09*
  Student-level variance       .67*            .78*            .61*            .58*            .66*            .47*
  R² (total variance modeled)  .22             .21             .27             .23             .19             .15

* p < .05.

n = 754 students in 39 courses.
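
The asterisks in Table 2 can be roughly reproduced from the reported B and SE B values. The sketch below assumes a two-tailed Wald test that treats B / SE B as a standard normal deviate. The table does not state which test was used, so this is an illustration and not the original procedure, and the function name wald_test is hypothetical. The tabled values are rounded to two decimals, so ratios near the cutoff may not match the asterisks. Class Size, for example, appears as -.02 with SE .01 both with and without an asterisk.

```python
from statistics import NormalDist


def wald_test(b: float, se: float) -> tuple[float, float]:
    """Return (z, two-tailed p) for a coefficient and its standard error.

    Assumption: the ratio b / se is referred to a standard normal
    distribution; the source does not state the test actually used.
    """
    z = b / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Fair Evaluation of Students column of Table 2 (continued).
for label, b, se in [
    ("Grading Leniency", 0.21, 0.03),
    ("Course Workload", 0.02, 0.04),
]:
    z, p = wald_test(b, se)
    print(f"{label}: z = {z:.2f}, significant at .05: {p < 0.05}")
```

For these two rows the approximation agrees with the table: Grading Leniency is flagged as significant and Course Workload is not.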


Copyright © 2000, Bryan W. Griffin

Last revised on 08 October, 2002 11:19 PM