Instructor Reputation and Student Ratings of Instruction

Bryan W. Griffin
Georgia Southern University

(To be published in Contemporary Educational Psychology)

 

I thank Drs. Kent A. Rittschof, Namok Choi, and Marlynn M. Griffin for their valuable and insightful comments on this research.

            Correspondence concerning this article should be addressed to Bryan W. Griffin; Department of Curriculum, Foundations, and Research; Georgia Southern University; P. O. Box 8144; Statesboro, GA 30460 (E-mail: bwgriffin@gasou.edu).


Abstract

The purpose of this study was to examine the association between instructor reputation, as perceived by students, and student evaluations of the instructor and course. A total of 754 students from 39 classes participated in the study. Based upon what students claimed to have heard about the instructor prior to enrolling in the course, they were classified into one of three groups: positive reputation, no information, and negative reputation. Using these groupings, two analyses were performed. In the first, mean overall ratings for the instructor and course were calculated and presented by class. In the second, both instructor and course ratings were modeled using multilevel regression. Results show large mean differences in both instructor and course ratings between the positive and negative reputation groups. More specifically, students who heard positive information regarding the instructor's reputation rated both the instructor and course higher than students who heard negative information about the instructor.


Instructor Reputation and Student Ratings of Instruction

            The use of student ratings of instruction for teaching improvement and administrative decision making is now widespread in colleges and universities across North America (Wilson, 1998), and their use worldwide appears to be growing (Husbands, 1996; Husbands & Fosh, 1993; Powell, Hunt, & Irving, 1997; Stringer & Irwing, 1998). When asked, most faculty members approve of the use of student ratings of instruction for teaching improvement (Baxter, 1991; Griffin, 1999; Moses, 1986; Schmelkin, Spencer, & Gellman, 1997), but many are resistant to the use of student ratings for tenure, promotion, and merit decisions (Feldman, 1997; McKeachie, 1997a). Many educators believe that student ratings are affected, or biased, by a number of factors unrelated to teaching performance (Marsh & Overall, 1979; Wilson, 1998), yet numerous studies on the validity of student ratings have shown that relatively few variables unrelated to instructional performance correlate strongly with student ratings (Marsh & Dunkin, 1992; Marsh & Roche, 1997). However, one variable, instructor reputation, does appear to affect student ratings but has received relatively little attention (Wachtel, 1998) in the hundreds of studies on student ratings over the last several decades. In this context, instructor reputation is defined as information students learn about an instructor prior to enrolling in the instructor’s course. This information may be obtained from a number of sources, such as published ratings of the instructor or prior experience with the instructor, but in most instances this information will likely be obtained by word-of-mouth from other students.

            How might instructor reputation influence student ratings? Research on expectancies, impression formation, and stereotyping provides a clue (Higgins, 1996; Mackie & Hamilton, 1993; Olson, Roese, & Zanna, 1996). An instructor’s reputation, for example, may stimulate in students certain expectations, and these expectations may then act as filters that influence perceptions of events and behaviors. Thus, if a student hears from other students (or reads from published ratings) that an instructor is a poor teacher, is unfair, or communicates poorly, then this information may create an expectation about the instructor that could influence the student’s perceptions and judgements of instruction in the classroom. This relationship was illustrated well in Kelley’s (1950) experiment on instructor reputation and student ratings of instruction. Kelley found that when students were prompted prior to viewing a lecture that an instructor was either “very warm” or “rather cold” as a person, these simple descriptions, which were embedded in a one paragraph biographical sketch of the instructor, were enough to influence student ratings of the instructor in a positive or negative direction. Widmeyer and Loy (1988), in a replication of Kelley's experiment, obtained similar results.

            Both experimental and nonexperimental research on instructor reputation and student ratings of instruction support the hypothesis that expectations can influence students' perceptions and judgements. Among the experimental studies it is clear that attempts to manipulate instructor reputation either negatively or positively result in decreases or increases, respectively, in student ratings of instruction (Brady, 1994; Feldman & Prohaska, 1979; Kelley, 1950; McClelland, 1970; Perry, Abrami, Leventhal, & Check, 1979; Perry, Niemi, & Jones, 1974; Widmeyer & Loy, 1988). Most of these experiments, however, did not occur in field settings, or involved only one lecture or presentation by an instructor. As a result of these limitations, it is difficult to know whether such manipulations of reputation would hold in regular classrooms over a longer period of time, such as a full semester.

            A number of researchers have examined the association between instructor reputation and student ratings of instruction with nonexperimental studies. The most common approach has been to survey students to learn why they selected a particular course or instructor. Research of this nature has shown consistent results; instructor reputation is always ranked among the top reasons for selecting the course (Centra & Creech, 1976; Leventhal, Abrami, & Perry, 1976; Leventhal, Abrami, Perry, & Breen, 1975). In addition, the findings of these studies show that students who selected the course due to the instructor's reputation also rated the instructor higher on end-of-term ratings of instruction than students who selected the course for reasons other than reputation (Tollefson & Wigington, 1986; Wigington, Tollefson, & Rodriguez, 1989). These studies do not show, however, whether the information students heard about the instructor's reputation was negative, positive, or mixed, and how such an assessment of reputation would correlate with student ratings of instruction. At least two field-based studies have examined the association between instructor reputation and student ratings of instruction.

In one such study, Ory (1980) asked students to indicate, among a number of background and affective factors, the amount of gossip they heard about the course and their pre-course opinion (either positive, negative, or no opinion) of the instructor. While neither of these variables is a direct measure of instructor reputation, they appear to provide relevant information. Ory found that students who heard more gossip about the course tended to rate the instructor’s teaching higher, and that students who entered the course with positive opinions about the instructor also rated the instructor’s teaching higher. In a similar study, Barké, Tollefson, and Tracy (1983) had students in 65 classes complete a 52-item questionnaire at the beginning of the semester that measured several background variables. Fifteen of the items were indicators of instructor reputation (e.g., instructor is fair, hard grader, and is sensitive to class response). Of the 927 student respondents, 60% indicated "no basis on which to make a judgement" (p. 78) for each item related to reputation on the questionnaire, and 87% could not make a judgement on more than five items. At the end of the semester students completed a course evaluation form that measured six general areas related to instruction. Using stepwise regression to determine the predictive contribution of a large number of variables, Barké, Tollefson, and Tracy found that only a few of the 15 items designed to measure instructor reputation correlated with end-of-course ratings of instruction. For example, the expected grading leniency of the instructor at the outset of the course correlated with three of the six dimensions of teaching effectiveness: relationship with students, subject matter expertise, and evaluation standards.

Despite the findings of these two studies, it is still not clear how students' overall impression of an instructor’s reputation relates to end-of-term instructional ratings and whether this relationship persists when other factors associated with student ratings (e.g., course workload or difficulty, expected grade) are controlled.

To better understand the nature of the relationship between instructor reputation and student ratings of instruction, the present study was conducted to provide additional, field-based data that more closely addressed this issue. Students were asked to make overall judgements about what they heard concerning the instructor before enrolling in the course. This information was then categorized as either positive reputation, negative reputation, or no prior information. These categories were used to compare overall, end-of-term ratings of the instructor and the course. Finally, the relationship between instructor reputation and student ratings was examined within the context of variables that have been shown in previous research to relate to student ratings of instruction, such as prior subject interest, course workload and difficulty, and expected grade for the course.

Method

Participants

            A total of 754 undergraduate students enrolled in 39 education courses at a medium-sized (14,000 students), regional university in the southeastern United States participated in this study. The classes ranged in size from 6 to 34 students. Undergraduate education students at this institution are predominantly White (71%) and female (80%). Most respondents (76%) reported grade point averages in the range of 2.5 to 3.5 on a 4.0 scale.

Instrument and Variables

            An instrument to assess student evaluations of instruction and course characteristics was developed by drawing item and question wording from multiple sources (Abrami, d'Apollonia, & Rosenfield, 1997; Feldman, 1997; Marsh, 1987; Murray, 1997). The variables used in this study were measured in the manner described below.

 

Overall instructor rating. Students were asked to respond on a 5-point scale (1 = poor to 5 = excellent) to the statement "Overall, how would you rate this instructor?"

Overall course rating. Using the same scale described above, overall assessment of the course was measured by student responses to this question: "Overall, how would you rate this course?"

Instructor reputation. This variable was measured by student responses to the question "Before taking this course, what did you hear about this instructor?" The scale included the following step descriptors: 1 = instructor very bad, 3 = about average, 5 = instructor very good, 6 = didn't know about instructor.

Course difficulty. On a 1 to 5 point scale, students rated the perceived difficulty of the course (1 = one of easiest to 5 = one of most difficult) by addressing the statement "Course difficulty, relative to other courses was:"

Course workload. This variable was assessed, using the scale 1 = very light to 5 = very heavy, by responses to the statement "Course workload, relative to other courses was:"

Prior subject interest.1 Students responded to the question "What was your level of interest in this subject matter before taking this course?" using the scale 1 = no interest at all to 7 = very interested.

Expected grade. Students indicated the grade they expected to receive in the class by selecting responses ranging from F = 1 to A+ = 13 to address the following question: "What grade do you think the instructor will assign you in this course?"

 

            In addition to these variables, two class-level variables were used in this study: class size and the instructor’s sex. Class size was the number of students enrolled in the class (M = 19.56, SD = 7.30), and instructor’s sex was dummy coded (Pedhazur, 1997) as 1 for males (n = 16) and 0 for females (n = 23).

Procedures

            Students in 39 classes were administered the evaluation instrument on the last day of class during the fall 1998 semester. A student in each class was selected to dispense and collect all instruments, and to return the instruments to a central location in the College of Education. Instructors were required to leave the classroom during evaluations. Students were told that evaluations would not be made available until after course grades had been assigned and would only be provided to instructors in aggregate form.

                 To use the instructor reputation variable in the analyses that follow, it was necessary to combine the didn’t know about instructor response (option 6) with the other five response options in a way that would allow regression analyses to be performed. To do this, responses to the instructor reputation variable, described above, were categorized as follows:

 

(a)    Negative reputation: This category included students who selected responses 1 to 3 (instructor very bad to about average) for the instructor reputation item.

(b)    Positive reputation: Students in this category chose responses 4 and 5 (above average to instructor very good) for the instructor reputation item.

(c)    No information: This category consisted of students who selected response 6 (didn’t know about the instructor) for the instructor reputation item.

 

                 In determining how to construct the three reputation categories, it was clear that response options 1 and 2 represented negative information, options 4 and 5 reflected positive information, and option 6 indicated no prior information was known about the instructor. It was less clear how response option 3 (about average) should be classified since this response may reflect a mixture of both positive and negative information regarding the instructor. The decision to include option 3 with 1 and 2 was empirically driven. For each of the first five response options for the instructor reputation item, mean ratings on both overall instructor and overall course were examined. This analysis revealed that students who selected options 1 through 3 provided mean overall ratings of the instructor and course that were very similar and, in contrast, were markedly different from mean overall ratings by students who selected responses 4 and 5. For example, the mean overall instructor rating by students who selected response options 1 and 2 was 2.90; the mean rating by students who selected option 3 was 3.18; and the mean rating by students who selected responses 4 and 5 was 4.31.

                 To facilitate analysis of instructor reputation, two dummy variables were created. The first, called positive reputation, was coded 1 if student responses corresponded with the positive reputation category, and 0 otherwise. The second indicator variable, labeled negative reputation, was coded 1 if student responses corresponded with the negative reputation category, and 0 otherwise. Students in the no information group were therefore coded 0 on both variables and served as the reference group. Of the 754 respondents, 176 (23.3%) were classified into the positive reputation group, 420 (55.7%) into the no information group, and 158 (21.0%) into the negative reputation group. The percentage of respondents in this study who had no prior information about the instructor matches well the figure of 60% reported in Barké, Tollefson, and Tracy’s (1983) study.
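The categorization and dummy coding described above can be illustrated with a short sketch. It is not the procedure actually used to prepare the data; the function names and the sample responses are hypothetical, and only the response codes (1 to 6) and the grouping rule come from the text.

```python
from collections import Counter
from typing import Tuple

# Response codes for the reputation item, as described in the text:
# 1 = instructor very bad ... 5 = instructor very good, 6 = didn't know.
NEGATIVE_CODES = {1, 2, 3}
POSITIVE_CODES = {4, 5}
NO_INFO_CODE = 6


def classify_reputation(response: int) -> str:
    """Map a raw response (1-6) to one of the three reputation groups."""
    if response in NEGATIVE_CODES:
        return "negative"
    if response in POSITIVE_CODES:
        return "positive"
    if response == NO_INFO_CODE:
        return "no_information"
    raise ValueError(f"unexpected response code: {response!r}")


def dummy_code(response: int) -> Tuple[int, int]:
    """Return (positive_reputation, negative_reputation).

    The no information group is the reference category, coded (0, 0).
    """
    group = classify_reputation(response)
    return (int(group == "positive"), int(group == "negative"))


# Hypothetical responses, for illustration only.
responses = [1, 3, 4, 5, 6, 6, 2]
groups = Counter(classify_reputation(r) for r in responses)
dummies = [dummy_code(r) for r in responses]
```

Because the no information group is coded 0 on both indicators, each regression coefficient for a reputation dummy is read as a mean difference from that group.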

Results

                 Descriptive statistics and correlations among all student-level variables are given in Table 1. The correlations in Table 1 show that the dummy variable positive reputation was positively related, and the dummy variable negative reputation was negatively related, to overall ratings for both the instructor and course. Thus, students who heard positive information regarding the instructor’s reputation rated the instructor and course higher than students who heard primarily negative information or no information. Course difficulty, prior subject interest, and expected grade in the course were also positively related to instructor and course ratings. To understand better how reputation relates to student ratings, two separate analyses were performed. First, means for each rating (instructor and course) were calculated for each of the reputation groups formed (positive, no information, and negative) and presented by selected classes. Second, mean differences among the three reputation groups were estimated via multilevel regression after controlling for covariates that have been shown in previous research to relate to student ratings of instruction.

 

Table 1 about here

 

Analysis 1: Mean Overall Ratings by Reputation Groups and Instructor

            Differences in ratings among the three reputation groups of students could be interpreted in at least two ways. With the first interpretation, differences in overall ratings of the instructor and course may be a function of a priming effect upon expectancies that colors students’ judgments about the instructor and course. An alternative interpretation is that differences in overall ratings among reputation groups stem from actual instructional differences among instructors. That is, better instructors have earned the positive reputation by offering better instruction while the converse is true for instructors with negative reputations. If one only compared student ratings between instructors, one would not be able to separate the confounded effects of priming-expectancies from actual instructional differences among instructors. One method for controlling for actual instructional differences across instructors is to examine differences among the reputation groups within instructors. Thus, the purpose of this first set of analyses was to explore, in a descriptive manner, student ratings by the three reputation groups within instructors to address this important interpretational issue. If differences in overall ratings of the instructor and course occur within the same classroom for the same instructor, then the alternative interpretation is seriously weakened.

 

Tables 2 and 3 about here

 

Descriptive statistics and effect sizes by each reputation group for overall rating of the instructor are presented in Table 2, and for overall rating of the course are presented in Table 3. Note that the data presented in these two tables reflect only those classes for which three or more students were present in each of the three reputation groups within the same classroom. A total of 9 of the 39 classes surveyed met this criterion. It is important to point out that each row of Tables 2 and 3 reflects a unique, distinct instructor and class; thus, each row represents a within instructor comparison rather than a between or across instructor comparison.
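The within-class selection rule just described (three or more students in each reputation group within the same class) can be sketched as follows. This is only an illustration under assumed data structures: the record layout, the function name, and the ratings are hypothetical and do not reproduce Tables 2 and 3.

```python
from collections import defaultdict
from statistics import mean

GROUPS = ("positive", "no_information", "negative")
MIN_PER_GROUP = 3  # selection criterion stated in the text


def within_class_means(records, min_per_group=MIN_PER_GROUP):
    """Compute group means within each class that meets the criterion.

    records: iterable of (class_id, group, rating) tuples.
    Returns {class_id: {group: mean_rating}} for classes in which every
    reputation group has at least min_per_group ratings.
    """
    by_class = defaultdict(lambda: defaultdict(list))
    for class_id, group, rating in records:
        by_class[class_id][group].append(rating)

    result = {}
    for class_id, groups in by_class.items():
        if all(len(groups.get(g, [])) >= min_per_group for g in GROUPS):
            result[class_id] = {g: mean(groups[g]) for g in GROUPS}
    return result


# Hypothetical ratings: class "A" meets the criterion, class "B" does not
# because only one student falls in the negative reputation group.
records = (
    [("A", "positive", r) for r in (5, 4, 5)]
    + [("A", "no_information", r) for r in (4, 4, 3)]
    + [("A", "negative", r) for r in (3, 2, 3)]
    + [("B", "positive", r) for r in (5, 5, 4)]
    + [("B", "no_information", r) for r in (4, 3, 4)]
    + [("B", "negative", 2)]
)
means = within_class_means(records)
```

Comparing groups only inside a class holds the instructor constant, which is the point of the within-instructor design.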

            As the means and effect sizes in Tables 2 and 3 show, students who indicated that they heard positive information about the instructor prior to enrolling in the course consistently rated the instructor higher than students in the same class who indicated they heard nothing about the instructor before enrolling in the course. Conversely, students who indicated that they heard primarily negative information about the instructor before enrolling in the course tended to rate the instructor lower than students in the same class who heard nothing about that instructor. Similar relationships existed for the ratings of the course. Overall, the effect size for the difference between the positive reputation and negative reputation groups for the instructor rating is d = .82 – (–.40) = 1.22. The effect size for the difference between the positive and negative reputation groups for the course rating is d = .42 – (–.63) = 1.25. Both of these effect size estimates are substantial. Note the consistent pattern of means revealed in these two tables. In 17 of the 18 cases examined, the mean for the positive reputation group is larger than the mean for the negative reputation group; the single exception is class 9 for overall course rating.

Analysis 2: Multilevel Regression

The effect sizes reported above do not reflect adjustments for control of possible confounding variables such as prior subject interest, instructor sex, or class size. To better estimate the size of the reputation effect, several covariates that relate to instructional evaluations were included in the models of instructor and course ratings. Researchers in the area of student ratings (e.g., Cranton & Smith, 1990; Feldman, 1998) have noted that the unit of analysis examined, either student level or class-mean level, may have an important effect on the types of relationships revealed between ratings of instruction and various predictor variables. The analysis of class means only, which is advocated by Marsh (1987) for example, may obscure important variation in ratings that results from individual student differences within the classroom. Given this, multilevel regression (Bryk & Raudenbush, 1992; Goldstein, 1995; Longford, 1993) was used in an effort to examine variation in student ratings both within and across classes (Feldman, 1998).

            The covariates included in the analysis at the student level were course difficulty, course workload, prior subject interest, and expected grade in the course. Research on student ratings has shown that course difficulty and course workload, often measured together, relate positively to ratings of instruction (Greenwald & Gillmore, 1997a, 1997b; Marsh, 1980; Marsh & Roche, 2000). Interest in the subject matter of the course before enrollment—prior subject interest—has been linked to higher student ratings of instruction (Howard & Maxwell, 1980; Marsh, 1980; Prave & Baril, 1993). Expected grade in the course, which typically correlates positively with ratings, has been the subject of much debate and research (Greenwald & Gillmore, 1997a; Marsh, 1987; Marsh & Roche, 1997, 2000; McKeachie, 1997b) and therefore was included in the analysis.

            At the class level, class size and instructor sex were included. Research demonstrates that class size correlates, albeit weakly, with ratings of instruction (Feldman, 1984). The sex of the instructor also appears to relate to student ratings. Feldman's (1998) reviews have shown that women tend to receive slightly higher ratings than men. However, Feldman (1998) also notes that a same-sex favorability in ratings exists; students of the same sex as their instructor may provide slightly higher ratings (Centra & Gaubatz, 2000). Since the majority of students in the classes examined in this study were women, it is likely that women instructors in this sample have higher ratings.

            Thus, the models examined were, with variables enclosed in parentheses, as follows:

Student-level.

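\[
\begin{aligned}
(\text{Overall Instructor Rating})_{ij} = {} & b_{0j} + b_{1} (\text{Positive Reputation})_{ij} + b_{2} (\text{Negative Reputation})_{ij} \\
& + b_{3} (\text{Course Difficulty})_{ij} + b_{4} (\text{Course Workload})_{ij} + b_{5} (\text{Prior Subject Interest})_{ij} \\
& + b_{6} (\text{Expected Grade})_{ij} + e_{ij}
\end{aligned}
\]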

At the class-level, mean ratings of the instructor were modeled with class size and instructor sex:

Class-level.

\[
b_{0j} = g_{00} + g_{01} (\text{Instructor's Sex})_{j} + g_{02} (\text{Class Size})_{j} + m_{0j}
\]

 

Combining the student- and class-level equations yields the following model of instructor rating:

Combined.

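\[
\begin{aligned}
(\text{Overall Instructor Rating})_{ij} = {} & g_{00} + g_{01} (\text{Instructor's Sex})_{j} + g_{02} (\text{Class Size})_{j} \\
& + b_{1} (\text{Positive Reputation})_{ij} + b_{2} (\text{Negative Reputation})_{ij} + b_{3} (\text{Course Difficulty})_{ij} \\
& + b_{4} (\text{Course Workload})_{ij} + b_{5} (\text{Prior Subject Interest})_{ij} + b_{6} (\text{Expected Grade})_{ij} + e_{ij} + m_{0j}
\end{aligned}
\]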

 

The same student-level, class-level, and combined models were also examined for overall course rating.
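To make the structure of the combined model concrete, the sketch below writes it as a prediction rule: a class-level part (intercept, instructor's sex, class size, and a class-specific deviation) plus a student-level part (the two reputation dummies, the four covariates, and a residual). It is an illustration only and does not fit the model. The two reputation coefficients are the instructor-rating estimates reported in the Results (.25 and –.49); every other coefficient and every covariate value is a hypothetical placeholder, as are the function and key names.

```python
# b1 and b2 are the reputation estimates reported in the text for the
# instructor rating. All other values are hypothetical placeholders
# chosen only so the example runs; they are NOT estimates from Table 4.
COEFS = {
    "g00": 3.0,   # intercept (hypothetical)
    "g01": -0.3,  # instructor's sex, 1 = male, 0 = female (hypothetical)
    "g02": 0.0,   # class size (hypothetical)
    "b1": 0.25,   # positive reputation (reported)
    "b2": -0.49,  # negative reputation (reported)
    "b3": 0.1,    # course difficulty (hypothetical)
    "b4": 0.0,    # course workload (hypothetical)
    "b5": 0.1,    # prior subject interest (hypothetical)
    "b6": 0.05,   # expected grade (hypothetical)
}


def predicted_rating(student, klass, coefs=COEFS, class_effect=0.0, residual=0.0):
    """Evaluate the combined model for one student in one class.

    class_effect plays the role of m_0j and residual the role of e_ij.
    """
    class_part = (
        coefs["g00"]
        + coefs["g01"] * klass["instructor_sex"]
        + coefs["g02"] * klass["class_size"]
        + class_effect
    )
    student_part = (
        coefs["b1"] * student["positive_reputation"]
        + coefs["b2"] * student["negative_reputation"]
        + coefs["b3"] * student["course_difficulty"]
        + coefs["b4"] * student["course_workload"]
        + coefs["b5"] * student["prior_subject_interest"]
        + coefs["b6"] * student["expected_grade"]
    )
    return class_part + student_part + residual


# Two hypothetical students in the same class, identical on all covariates
# and differing only in reputation group.
base = {
    "course_difficulty": 3,
    "course_workload": 3,
    "prior_subject_interest": 4,
    "expected_grade": 10,
}
positive_student = dict(base, positive_reputation=1, negative_reputation=0)
negative_student = dict(base, positive_reputation=0, negative_reputation=1)
klass = {"instructor_sex": 0, "class_size": 20}

gap = predicted_rating(positive_student, klass) - predicted_rating(negative_student, klass)
```

Whatever placeholder values are used for the other terms, the gap between the two students reduces to b1 – b2, because everything else in the equation cancels within a class.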

 

Table 4 about here

 

Results of the multilevel regression, using full information maximum likelihood to obtain estimates (Hox, 1995), are presented in Table 4. For these analyses note that all 39 classes and 754 student responses were included. The parameter estimates in Table 4 show that the mean difference in student ratings of the instructor is .25 for the positive reputation vs. the no information groups, and –.49 for the negative reputation vs. the no information groups. These estimates indicate that positive and negative reputation correspond with higher and lower mean ratings even after controlling for prior subject interest, expected grade, etc. The estimated mean difference in student ratings between the positive reputation and negative reputation groups is .25 – (–.49) = .74 (SE = .10, t = 7.37, p < .001). This difference can be converted to an effect size, d, by dividing it by the standard deviation of overall instructor rating responses which is estimated via the null multilevel model (Bryk & Raudenbush, 1992; Hox, 1995) as 1.19, thus d = .74 / 1.19 = .62. (Note that the standard deviation obtained from the null multilevel regression model is slightly larger than the standard deviation provided in Table 1. This is not uncommon in multilevel modeling according to Snijders & Bosker, 1994.) After controlling for the various covariates included in the model, there still appears to be a large difference in overall ratings of the instructor between students who heard positive and negative information about the instructor prior to enrolling in the class.

            Like the estimates for the instructor’s ratings, the estimated mean differences between the reputation groups for overall ratings of the course also reveal mean ratings that are higher for the positive reputation group and lower for the negative reputation group. The estimated difference between the positive reputation and no information groups was .14, which was not statistically significant at the .05 level. The difference between the negative reputation and no information group, –.43, was statistically significant at the .05 level. The estimated mean difference in student ratings of the course between the positive reputation and negative reputation groups is .14 – (–.43) = .57 (SE = .10, t = 5.75, p < .001), which is statistically significant. This estimate can also be converted to a standardized effect: d = .57 / 1.14 = .50. Thus, the difference in mean ratings for the course between students who heard positive vs. negative information regarding the instructor is estimated to be just less than half a standard deviation.
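The two conversions above follow the same arithmetic: subtract the negative reputation estimate from the positive reputation estimate, then divide by a standard deviation. A minimal sketch using only the rounded figures given in the text is shown below. The function name is hypothetical, and it is assumed here that the course-rating divisor of 1.14 was obtained from a null multilevel model in the same way as the instructor-rating value of 1.19.

```python
def contrast_effect_size(b_positive, b_negative, sd):
    """Return (mean difference, standardized effect d) for the
    positive vs. negative reputation contrast."""
    diff = b_positive - b_negative
    return diff, diff / sd


# Rounded estimates as reported in the text.
instructor_diff, instructor_d = contrast_effect_size(0.25, -0.49, 1.19)
course_diff, course_d = contrast_effect_size(0.14, -0.43, 1.14)

print(f"instructor: difference = {instructor_diff:.2f}, d = {instructor_d:.2f}")
print(f"course:     difference = {course_diff:.2f}, d = {course_d:.2f}")
```

Because the inputs are already rounded to two decimals, results computed this way can differ slightly in the last digit from values based on the unrounded estimates.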

            The regression models also showed that course difficulty, prior subject interest, expected grade, and instructor sex were statistically related to both overall instructor and overall course ratings. Course workload and class size were not statistically related to student ratings. Recently Marsh and Roche (2000) demonstrated the importance of considering course workload in assessing factors related to student ratings. A potential limitation for one of Marsh and Roche’s analyses was that course workload was confounded with course difficulty. The results provided here show that perceived course difficulty, not workload, corresponds positively with student ratings. As with many other studies, prior subject interest and expected grade were statistically related to student ratings (Greenwald & Gillmore, 1997a; Marsh, 1980, 1987; Marsh & Roche, 2000; Prave & Baril, 1993). The models also revealed a strong association between instructor sex and ratings. As noted earlier, a disproportionate number of students in the sample were female since the sample consisted of college of education faculty and students. Centra and Gaubatz (2000) and Feldman (1998) reported that a same-sex favorability exists in ratings, so female instructors are likely to benefit from a disproportionate number of female students in their classrooms. In addition, Feldman also notes that female instructors tend to get slightly better ratings in general. Combined, these two factors may explain the large estimated mean difference between male and female instructors modeled here.

Discussion

            Results from this study show that prior information about the instructor, when interpreted by students either positively or negatively, corresponds with higher or lower end-of-term ratings for both the instructor and the course. This association demonstrates large mean differences even when important predictors of student ratings are taken into account, such as prior subject interest, course difficulty, and expected grade. As a practical example to illustrate this difference, consider estimated ratings for a hypothetical female instructor who has two groups of students in her class: those who heard negative information about the instructor prior to enrolling in the course and those who heard positive information prior to enrolling. With all modeled covariates held constant at their means, the predicted rating for this instructor by students in the positive reputation group is 4.42, and the predicted mean rating for this instructor by students in the negative reputation group is 3.68, a mean difference of .74 points. For administrative and personnel decisions based upon student ratings, which are becoming more common, a mean difference of this size in ratings could have important consequences (Perry & Smart, 1997; McKeachie, 1997b; Wilson, 1998).

While the current study differs from previous research on instructor reputation in a number of ways (e.g., inclusion of distinct reputation categories, multiple covariates, and use of student-level and class-level data), the results of this study are consistent with findings from both experimental (Brady, 1994; Feldman & Prohaska, 1979; Kelley, 1950; McClelland, 1970; Perry, Abrami, Leventhal, & Check, 1979; Perry, Niemi, & Jones, 1974; Widmeyer & Loy, 1988) and nonexperimental (Barké, Tollefson, & Tracy, 1983; Ory, 1980) investigations of instructor reputation and student ratings of instruction. Combined, the evidence from these studies indicates that instructor reputation is associated with ratings of instruction. Not addressed in these studies is the question of which types of prior information shape student assessment of an instructor’s reputation. For example, do published ratings or written comments about an instructor influence reputation more than informal communication among students? Of those factors that shape reputation, which are most important? Does, for instance, hearing that an instructor is a hard grader, is disorganized, or communicates poorly have the same priming influence on reputation assessment as hearing that an instructor is an uncaring person, is not enthusiastic in the classroom, or does not value student input?

            While an instructor’s reputation may influence, to some extent, the end-of-term evaluations the instructor receives, the more critical issues of motivation, cognition, and learning should not be overlooked. Clearly certain types of priming information are likely to engender stronger expectancies than others, and if expectancies shape one's judgement about the effectiveness of an instructor, then it is possible that the instructional expectancies students bring to the classroom may affect their motivation to learn (Bandura, 1997). If students anticipate a poor instructor who does not provide, for example, contingent feedback, interaction, or clear and organized material, then this could lead to a weakened sense of personal control that may negatively affect active engagement, persistence, self-regulation, and interest in the subject (Perry, 1997; Pintrich & Schunk, 1996). Experimental research in this area suggests that there may be some connection between instructional expectancies and student motivation. Kelley (1950), for instance, noted that students who were primed that the instructor was a very warm person participated more in class discussions than students who read that the instructor was rather cold. Similarly, Feldman and Theiss (1982) noted that students with positive expectations about the instructor’s competence perceived the instructor and lecture more positively. Feldman and Prohaska (1979) also found that positive and negative expectancies for the instructor influenced student attitudes and non-verbal behavior in the classroom, and had some impact on achievement. Perry, Abrami, Leventhal, and Check (1979) found that reputation did not affect achievement, but did interact with some teaching behaviors to influence student ratings of instruction.

            Finally, there is some evidence that students’ expectations for the instructor could influence the instructional environment of the classroom. For example, Feldman and Prohaska (1979) found that students’ expectancies for the instructor were related to the instructor’s attitudes and behaviors in the classroom, but in a follow-up study, Feldman and Theiss (1982) found that student expectancies did not affect the instructor’s behaviors. The connection between student expectancies for the instructor and the instructor's behavior seems weak, but it could exist. If student expectancies for the instructor do shape the instructor’s behavior, then this suggests a reciprocal model similar to Bandura’s (1986) triadic reciprocality. First, an instructor’s reputation may affect student motivation, perceptions of control, and various thoughts and behaviors in the classroom. Noting these student behaviors, the instructor may react either positively or negatively, depending upon the cues received, and this reaction could directly affect the quality of instruction provided. If this model is credible, then it is possible that instructor reputation may influence student behavior and thought, but not directly bias student ratings. More specifically, if a positive or negative reputation corresponds with higher or lower end-of-term evaluations by students, then perhaps this occurs as a result of attitudes and behaviors that are communicated within the classroom between students and instructors. More research is needed to determine if, and by how much, student expectancies for the instructor influence the instructor's behaviors in the classroom. If this relationship is weak or nonexistent, then reputation appears to be a biasing factor in student ratings of instruction.


References

            Abrami, P. C., d'Apollonia, S., & Rosenfield, S. (1997). The dimensionality of student ratings of instruction: What we know and what we do not. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 321-367). New York: Agathon.

Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Englewood Cliffs, NJ: Prentice Hall.

Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman.

            Barké, C. R., Tollefson, N., & Tracy, D. B. (1983). Relationship between course entry attitudes and end-of-course ratings. Journal of Educational Psychology, 75, 75-85.

            Baxter, E. P. (1991). The TEVAL experience, 1983-88: The impact of a student evaluation of teaching scheme on university teachers. Studies in Higher Education, 16, 151-179.

            Brady, P. J. (1994). How likeability and effectiveness ratings of college professors by their students are affected by course demands and professors’ attitudes. Psychological Reports, 74, 907-913.

            Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.

Centra, J. A., & Creech, F. R. (1976). The relationship between student, teacher, and course characteristics and student ratings of teacher effectiveness (ETS Project Report 76-1). Princeton, NJ: Educational Testing Service.

Centra, J. A., & Gaubatz, N. B. (2000). Is there gender bias in student evaluations of teaching? The Journal of Higher Education, 70, 17-33.

Cranton, P., & Smith, R. A. (1990). Reconsidering the unit of analysis: A model of student ratings of instruction. Journal of Educational Psychology, 82, 207-212.

Feldman, K. A. (1994). Class size and college students' evaluations of teachers and courses: A closer look. Research in Higher Education, 21, 45-116.

            Feldman, K. A. (1997). Identifying exemplary teachers and teaching: Evidence from student ratings. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 368-395). New York: Agathon.

            Feldman, K. A. (1998). Reflections on the study of effective college teaching and student ratings: One continuing question and two unresolved issues. In J. C. Smart (Ed.), Higher Education: Handbook of Theory and Research (pp. 35-74). New York: Agathon.

            Feldman, R. S., & Prohaska, T. (1979). The student as Pygmalion: Effect of student expectation on the teacher. Journal of Educational Psychology, 71, 485-493.

            Feldman, R. S., & Theiss, A. J. (1982). The teacher and student as Pygmalion: Joint effects of teacher and student expectations. Journal of Educational Psychology, 74, 217-223.

            Goldstein, H. (1995). Multilevel statistical models (2nd ed.). London: Edward Arnold.

            Greenwald, A. G., & Gillmore, G. M. (1997a). Grading leniency is a removable contaminant of student ratings. American Psychologist, 52, 1209-1217.

            Greenwald, A. G., & Gillmore, G. M. (1997b). No pain, no gain? The importance of measuring course workload in student ratings of instruction. Journal of Educational Psychology, 89, 743-751.

Griffin, B. W. (1999). Results of the Faculty Survey on Student Ratings of Instruction: Preliminary Report. Statesboro, GA: Georgia Southern University, Student Ratings Committee.

            Higgins, E. T. (1996). Knowledge activation: Accessibility, applicability, and salience. In E. T. Higgins & A. W. Kruglanski (Eds.), Social Psychology: Handbook of Basic Principles (pp. 133-168). New York: Guilford.

            Howard, G. S., & Maxwell, S. E. (1980). Correlation between student satisfaction and grades: A case of mistaken causation? Journal of Educational Psychology, 72, 810-820.

Howard, G. S., & Schmeck, R. R. (1979). Relationship of changes in student motivation to student evaluations of instruction. Research in Higher Education, 10, 305-315.

            Hox, J. J. (1995). Applied multilevel analysis. Amsterdam: TT-Publikaties. Available on-line (March 6, 2000): http://www.ioe.ac.uk/multilevel/workpap.html

            Husbands, C. T. (1996). Variations in students’ evaluations of teachers’ lecturing and small-group teaching: A study at the London School of Economics and Political Science. Studies in Higher Education, 21, 187-207.

            Husbands, C. T., & Fosh, P. (1993). Students’ evaluation of teaching in higher education: Experiences from four European countries and some implications of the practice. Assessment & Evaluation in Higher Education, 18, 95-115.

            Kelley, H. H. (1950). The warm-cold variable in first impressions of persons. Journal of Personality, 18, 431-439.

            Leventhal, L., Abrami, P. C., & Perry, R. P. (1976). Do teacher rating forms reveal as much about students as about teachers? Journal of Educational Psychology, 68, 441-445.

            Leventhal, L., Abrami, P. C., Perry, R. P., & Breen, L. J. (1975). Section selection in multisection courses: Implications for the validation and use of teacher rating forms. Educational and Psychological Measurement, 35, 885-895.

            Longford, N. T. (1993). Random coefficient models. Oxford, UK: Oxford University Press.

            Mackie, D. M., & Hamilton, D. L. (Eds.). (1993). Affect, cognition, and stereotyping: Interactive processes in group perception. New York: Academic Press.

            Marsh, H. W. (1980). The influence of student, course, and instructor characteristics on evaluations of university teaching. American Educational Research Journal, 17, 219-237.

            Marsh, H. W. (1987). Students’ evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Educational Research, 11, 253-388.

            Marsh, H. W., & Dunkin, M. J. (1992). Students' evaluations of university teaching: A multidimensional perspective. In J. C. Smart (Ed.), Higher Education: Handbook of Theory and Research (Volume VIII) (pp. 143-232). New York: Agathon Press.

            Marsh, H. W., & Overall, J. U. (1979). Validity of students' evaluations of teaching: A comparison with instructor self-evaluations by teaching assistants, undergraduate faculty, and graduate faculty. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA. (ERIC Document No. ED 177 205).

            Marsh, H. W., & Roche, L. A. (1997). Making students' evaluations of teaching effectiveness effective: The critical issues of validity, bias, and utility. American Psychologist, 52, 1187-1197.

            Marsh, H. W., & Roche, L. A. (2000). Effects of grading leniency and low workload on students’ evaluations of teaching: Popular myth, bias, validity, or innocent bystanders? Journal of Educational Psychology, 92, 202-228.

             McClelland, J. N. (1970). The effect of student evaluations of college instruction upon subsequent evaluations. California Journal of Educational Research, 21, 88-95.

            McKeachie, W. J. (1997a). Good teaching makes a difference—and we know what it is. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 396-408). New York: Agathon.

            McKeachie, W. J. (1997b). Student ratings: The validity of use. American Psychologist, 52, 1219-1225.

            Moses, I. (1986). Student evaluation of teaching in an Australian university—staff perceptions and reactions. Assessment & Evaluation in Higher Education, 11, 117-129.

            Murray, H. G. (1997). Effective teaching behaviors in the college classroom. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 171-204). New York: Agathon.

            Olson, J. M., Roese, N. J., & Zanna, M. P. (1996). Expectancies. In E. T. Higgins & A. W. Kruglanski (Eds.), Social Psychology: Handbook of Basic Principles (pp. 211-238). New York: Guilford.

            Ory, J. C. (1980). The influence of students’ affective entry on instructor and course evaluations. The Review of Higher Education, 4, 13-24.

            Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation and prediction (3rd ed.). New York: Harcourt, Brace.

            Perry, R. P. (1997). Perceived control in college students: Implications for instruction in higher education. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 11-60). New York: Agathon.

            Perry, R. P., Abrami, P. C., Leventhal, L., & Check, J. (1979). Instructor reputation: An expectancy relationship involving student ratings and achievement. Journal of Educational Psychology, 71, 776-787.

            Perry, R. P., Niemi, R. P., & Jones, K. (1974). Effect of prior teaching evaluations and lecture presentation on ratings of teaching performance. Journal of Educational Psychology, 66, 851-856.

            Perry, R. P., & Smart, J. C. (1997). Introduction. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 1-8). New York: Agathon.

            Pintrich, P. R., & Schunk, D. H. (1996). Motivation in education: Theory, research, and applications. Columbus, OH: Merrill.

            Powell, A. M., Hunt, A., & Irving, A. (1997). Evaluation of courses by whole student cohorts: A case study. Assessment & Evaluation in Higher Education, 22, 397-404.

            Prave, R. S., & Baril, G. L. (1993). Instructor ratings: Controlling for bias from initial student interest. Journal of Education for Business, 68, 362-366.

Schmelkin, L. P., Spencer, K. J., & Gellman, E. S. (1997). Faculty perspectives on course and teacher evaluations. Research in Higher Education, 38, 575-592.

Snijders, T. A. B., & Bosker, R. J. (1994). Modeled variance in two-level models. Sociological Methods & Research, 22, 342-363.

Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel Analysis: An introduction to basic and advanced multilevel modeling. London: Sage.

Stringer, M., & Irwing, P. (1998). Students' evaluations of teaching effectiveness: A structural modelling approach. British Journal of Educational Psychology, 68, 409-426.

            Tollefson, N., & Wigington, H. (1986). Teacher-generated and student-generated variability in teacher effectiveness ratings. Instructional Science, 15, 109-120.

            Wachtel, H. K. (1998). Student evaluation of college teaching effectiveness: A brief review. Assessment & Evaluation in Higher Education, 23, 191-212.

            Widmeyer, W. N., & Loy, J. W. (1988). When you're hot, you're hot! Warm-cold effects in first impressions of persons and teaching effectiveness. Journal of Educational Psychology, 80, 118-121.

            Wigington, H., Tollefson, N., & Rodriguez, E. (1989). Students' ratings of instructors revisited: Interactions among class and instructor variables. Research in Higher Education, 30, 331-344.

            Wilson, R. (1998, January 16). New research casts doubt on value of student evaluations of professors. The Chronicle of Higher Education, pp. A12-A14.


Footnote

1 Both the instructor reputation and prior subject interest variables represent retrospective information in which students are asked to recall, at the end of a course, thoughts or opinions they held prior to enrolling in a course. There may be concern that such information is subject to faulty memory or confounding influences. At least two studies have examined this issue in regard to prior subject interest. Clegg (as cited in Prave & Baril, 1993) found a correlation of .93 between pre-course and end-of-course measures of student motivation, and Howard and Schmeck (1979) noted that end-of-course measurement of motivation was an accurate estimate of pre-course motivation.  


Table 1: Correlations and Descriptive Statistics among Student Ratings Variables (N = 754)

Variable                            1      2      3      4      5      6      7      8
1. Overall Rating of Instructor     --
2. Overall Rating of Course         .79*   --
3. Positive Reputation Indicator    .22*   .20*   --
4. Negative Reputation Indicator    -.36*  -.30*  -.28*  --
5. Course Difficulty                .13*   .14*   .01    .11*   --
6. Course Workload                  .05    .06    -.05   .07    .48*   --
7. Prior Subject Interest           .15*   .30*   .10*   -.03   .14*   .17*   --
8. Expected Grade                   .17*   .16*   .05    -.17*  -.28*  -.11*  .04    --
Means                               3.86   3.50   0.23   0.21   3.25   3.47   3.25   10.54
Standard Deviations                 1.16   1.13   0.42   0.41   0.90   0.94   1.10   1.77

* p < .05.
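
The descriptive statistics for the two reputation indicators in Table 1 can be checked against one another. The following is a minimal Python sketch, not part of the original analysis, and its function names are hypothetical. It assumes the two indicators are mutually exclusive 0/1 dummy codes (each student falls in exactly one of the three reputation groups) and uses the population formulas: an indicator with mean p has a standard deviation of sqrt(p(1 - p)), and two mutually exclusive indicators correlate at -sqrt(p1 p2 / ((1 - p1)(1 - p2))).

```python
import math


def indicator_sd(p: float) -> float:
    """Population SD of a 0/1 indicator whose mean (proportion of 1s) is p."""
    return math.sqrt(p * (1.0 - p))


def exclusive_indicator_corr(p1: float, p2: float) -> float:
    """Correlation of two mutually exclusive 0/1 indicators with means p1, p2.

    Because both can never equal 1 together, their covariance is -p1 * p2.
    """
    return -math.sqrt((p1 * p2) / ((1.0 - p1) * (1.0 - p2)))


# Means of the positive and negative reputation indicators from Table 1.
p_positive, p_negative = 0.23, 0.21

sd_positive = round(indicator_sd(p_positive), 2)
sd_negative = round(indicator_sd(p_negative), 2)
r_pos_neg = round(exclusive_indicator_corr(p_positive, p_negative), 2)

print(sd_positive, sd_negative, r_pos_neg)
```

With N = 754, the difference between the population and sample standard deviation is negligible at two decimal places, and the computed values agree with the .42, .41, and -.28 reported in Table 1.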

   

Table 2: Means, Standard Deviations, and Effect Sizes for Overall Instructor Rating by the Three Categories of Reputation

         Reputation Categories
         Positive Reputation  No Information    Negative Reputation  Overall Class   F-Test
Class    M      d      n      M        n        M      d      n      M      SD       df     F
1        4.83   .40    12     4.55     11       3.67   -1.26  3      4.58   .70      2,23   4.17*
2        4.70   .53    10     4.17     6        3.00   -1.10  8      4.00   1.06     2,21   10.61*
3        3.50   1.21   4      2.44     16       2.22   -.25   9      2.52   .87      2,26   3.74*
4        5.00   .92    3      4.44     9        4.38   -.10   8      4.50   .61      2,17   1.26
5        4.13   .11    8      4.00     3        2.50   -.46   12     3.26   1.18     2,20   9.17*
6        4.00   1.32   5      2.67     6        3.00   .33    8      3.16   1.01     2,16   3.12
7        3.40   .52    5      2.75     4        1.92   -.66   13     2.41   1.26     2,19   3.23
8        4.00   1.03   4      3.00     4        2.50   -.51   12     2.90   .97      2,17   5.26*
9        4.00   1.38   5      2.83     12       3.17   .40    12     3.17   .85      2,26   4.07*
Mean d          .82                                    -.40

Note. The data presented in this table reflect only those classes for which three or more students were present in each of the three reputation groups. The effect size is defined as d = (M(positive [or negative]) - M(no information)) / SD(overall).

* p < .05  
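
The effect size defined in the note to Table 2 can be illustrated with a short worked example. The following is a minimal Python sketch rather than the procedure actually used to produce the table; the function name is hypothetical, and the inputs are the Class 1 means and overall standard deviation reported in Table 2.

```python
def effect_size(group_mean: float, no_info_mean: float, overall_sd: float) -> float:
    """d = (M(group) - M(no information)) / SD(overall), per the table note."""
    return (group_mean - no_info_mean) / overall_sd


# Class 1, overall instructor rating (Table 2):
# positive M = 4.83, no information M = 4.55, negative M = 3.67, overall SD = .70
d_positive = round(effect_size(4.83, 4.55, 0.70), 2)
d_negative = round(effect_size(3.67, 4.55, 0.70), 2)

print(d_positive, d_negative)
```

For Class 1 this reproduces the tabled values of .40 and -1.26. Because the no-information group serves as the reference, a positive d indicates a group mean above that of students who had heard nothing about the instructor.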

Table 3: Means, Standard Deviations, and Effect Sizes for Overall Course Rating by the Three Categories of Reputation

         Reputation Categories
         Positive Reputation  No Information    Negative Reputation  Overall Class   F-Test
Class    M      d      n      M        n        M      d      n      M      SD       df     F
1        4.50   .39    12     4.18     11       3.33   -1.04  3      4.23   .82      2,23   2.86
2        3.90   -.32   10     4.17     6        3.25   -1.08  8      3.75   .85      2,21   2.60
3        3.00   1.00   4      2.25     16       2.00   -.33   9      2.28   .75      2,26   2.80
4        4.67   .59    3      4.22     9        3.63   -.78   8      4.05   .76      2,17   3.00
5        2.38   -1.25  8      3.67     3        2.00   -1.62  12     2.35   1.03     2,20   4.04*
6        4.00   .83    5      3.33     6        2.75   -.72   8      3.26   .81      2,16   5.68*
7        3.20   .69    5      2.50     4        2.15   -.35   13     2.45   1.01     2,19   2.15
8        4.00   1.51   4      2.50     4        2.25   -.25   12     2.65   .99      2,17   8.55*
9        3.00   .38    5      2.67     12       3.08   .48    12     2.90   .86      2,26   0.73
Mean d          .42                                    -.63

Note. The data presented in this table reflect only those classes for which three or more students were present in each of the three reputation groups. The effect size is defined as d = (M(positive [or negative]) - M(no information)) / SD(overall).

* p < .05  

Table 4: Multilevel Regression Estimates for Models of Overall Rating of Instructor and Overall Rating of Course

                                            Overall Instructor Rating       Overall Course Rating
Fixed Portion of Model                      B      SE B   t        ΔR²      B      SE B   t        ΔR²
Student Level
  Intercept                                 4.17   .15    27.93*            3.81   .14    27.84*
  Instructor Reputation                                            .10                             .08
    Positive Reputation Dummy               .25    .08    3.10*             .14    .08    1.74
    Negative Reputation Dummy               -.49   .10    -5.01*            -.43   .10    -4.48*
  Course Difficulty                         .11    .04    2.69*    .03      .10    .04    2.32*    .02
  Course Workload                           .01    .04    0.22     .00      .03    .04    0.65     .00
  Prior Interest in Subject                 .09    .03    2.95*    .00      .20    .03    6.62*    .04
  Expected Grade                            .11    .02    5.56*    .00      .10    .02    5.30*    .02
Class Level
  Class Size                                -.01   .02    -0.85    .00      -.02   .01    -1.57    .02
  Instructor’s Sex                          -.60   .23    -2.65*   .06      -.50   .21    -2.39*   .05
Random Portion of Model
  Class-level variance (between classes)    .44    χ²(36) = 443.88*         .36    χ²(36) = 419.84*
  Student-level variance (within classes)   .67                             .64
  R² (total variance modeled)               .21                             .24

Note. All predictors, except dummy variables, centered about their grand means, and df = 36 for each statistical test. Sample sizes were 754 students nested within 39 classes. Partial R², denoted ΔR², is calculated in the normal manner (Pedhazur, 1997), but model variance is calculated by summing both the between and within class variances (Snijders & Bosker, 1999).

* p < .05
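
The note to Table 4 describes R² as based on the sum of the between-class and within-class variances. One common reading of that approach, for a random-intercept model, is the proportional reduction in total variance relative to an intercept-only (null) model. The following is a minimal Python sketch under that assumption; the function names are hypothetical, and the null-model variances are placeholder values because they are not reported in Table 4. Only the fitted-model variances (.44 and .67) come from the table.

```python
def total_variance(between_class: float, within_class: float) -> float:
    """Total unexplained variance: class-level plus student-level components."""
    return between_class + within_class


def multilevel_r_squared(null_total: float, model_total: float) -> float:
    """Proportional reduction in total variance relative to the null model."""
    return 1.0 - model_total / null_total


# Fitted instructor-rating model (Table 4): between = .44, within = .67.
model_total = total_variance(0.44, 0.67)

# HYPOTHETICAL null-model components, chosen only to make the example run.
null_total = total_variance(0.50, 1.00)

r_squared = multilevel_r_squared(null_total, model_total)
print(round(r_squared, 2))
```

Under the same assumption, a partial ΔR² for a predictor would be the difference between the R² of the full model and the R² of a model that omits that predictor. The placeholder inputs mean the printed value is not expected to match the R² of .21 reported in Table 4.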


Copyright © 2000, Bryan W. Griffin

Last revised on 08 December, 2000 03:23 AM