Instructor Reputation and Student Ratings of Instruction
Georgia Southern University
I thank Drs. Kent A. Rittschof, Namok Choi, and Marlynn M. Griffin for their
valuable and insightful comments on this research.
Correspondence concerning this article should be addressed to Bryan W. Griffin; Department of Curriculum, Foundations, and Research; Georgia Southern University; P. O. Box 8144; Statesboro, GA 30460 (E-mail: bwgriffin@gasou.edu).
Abstract
The purpose of this study was to examine the association between instructor
reputation, as perceived by students, and student evaluations of the instructor
and course. A total of 754 students from 39 classes participated in the study.
Based upon what students claimed to have heard about the instructor prior to
enrolling in the course, they were classified into one of three groups: positive
reputation, no information, and negative reputation. Using these groupings, two
analyses were performed. In the first, mean overall ratings for the instructor
and course were calculated and presented by class. In the second, both
instructor and course ratings were modeled using multilevel regression. Results
show large mean differences in both instructor and course ratings between the
positive and negative reputation groups. More specifically, students who heard
positive information regarding the instructor's reputation rated both the
instructor and course higher than students who heard negative information about
the instructor.
Instructor Reputation and Student Ratings of Instruction
The use of student ratings of instruction for teaching improvement and
administrative decision making is now widespread in colleges and universities
across North America (Wilson, 1998), and their use worldwide appears to be
growing (Husbands, 1996; Husbands & Fosh, 1993; Powell, Hunt, & Irving,
1997; Stringer & Irwing, 1998). When asked, most faculty members approve of
the use of student ratings of instruction for teaching improvement (Baxter,
1991; Griffin, 1999; Moses, 1986; Schmelkin, Spencer, & Gellman, 1997), but
many are resistant to the use of student ratings for tenure, promotion, and
merit decisions (Feldman, 1997; McKeachie, 1997a). Many educators believe
that student ratings are affected, or biased, by a number of factors
unrelated to teaching performance (Marsh & Overall, 1979; Wilson, 1998), yet
numerous studies on the validity of student ratings have shown that relatively
few variables unrelated to instructional performance correlate strongly with
student ratings (Marsh & Dunkin, 1992; Marsh & Roche, 1997). However,
one variable, instructor reputation, does appear to impact student ratings, but
has received relatively little attention (Wachtel, 1998) in the hundreds of
studies on student ratings over the last several decades. In this context
instructor reputation is defined as information students learn about an
instructor prior to enrolling in the instructor’s course. This information may
be obtained from a number of sources, such as published ratings of the
instructor or prior experience with the instructor, but in most instances this
information will likely be obtained by word-of-mouth from other students.
How might instructor reputation influence student ratings? Research on expectancies,
impression formation, and stereotyping provides a clue (Higgins, 1996; Mackie
& Hamilton, 1993; Olson, Roese, & Zanna, 1996). An instructor’s
reputation, for example, may stimulate in students certain expectations, and
these expectations may then act as filters that influence perceptions of events
and behaviors. Thus, if a student hears from other students (or reads from
published ratings) that an instructor is a poor teacher, is unfair, or
communicates poorly, then this information may create an expectation about the
instructor that could influence the student’s perceptions and judgements of
instruction in the classroom. This relationship was illustrated well in
Kelley’s (1950) experiment on instructor reputation and student ratings of
instruction. Kelley found that when students were prompted prior to viewing a
lecture that an instructor was either “very warm” or “rather cold” as a
person, these simple descriptions, which were embedded in a one paragraph
biographical sketch of the instructor, were enough to influence student ratings
of the instructor in a positive or negative direction. Widmeyer and Loy (1988),
in a replication of Kelley's experiment, obtained similar results.
Both experimental and nonexperimental research on instructor reputation and student
ratings of instruction support the hypothesis that expectations can influence
students' perceptions and judgements. Among the experimental studies it is clear
that attempts to manipulate instructor reputation either negatively or
positively result in decreases or increases, respectively, in student ratings
of instruction (Brady, 1994; Feldman & Prohaska, 1979; Kelley, 1950;
McClelland, 1970; Perry, Abrami, Leventhal, & Check, 1979; Perry, Niemi,
& Jones, 1974; Widmeyer & Loy, 1988). Most of these experiments,
however, did not occur in field settings, or only involved one lecture or
presentation by an instructor. As a result of these limitations, it is difficult
to know whether such manipulations of reputation would hold in regular
classrooms over a longer period of time, such as a full semester.
A number of researchers have examined the association between instructor
reputation and student ratings of instruction with nonexperimental studies. The
most common approach has been to survey students to learn why they selected a
particular course or instructor. Research of this nature has shown consistent
results; instructor reputation is always ranked among the top reasons for
selecting the course (Centra & Creech, 1976; Leventhal, Abrami, & Perry,
1976; Leventhal, Abrami, Perry, & Breen, 1975). In addition, the findings of
these studies show that students who selected the course due to the instructor's
reputation also rated the instructor higher on end-of-term ratings of
instruction than students who selected the course for reasons other than
reputation (Tollefson & Wigington, 1986; Wigington, Tollefson, &
Rodriguez, 1989). These studies do not show, however, whether the information
students heard about the instructor's reputation was negative, positive, or
mixed, and how such an assessment of reputation would correlate with student
ratings of instruction. At least two field-based studies have examined the
association between instructor reputation and student ratings of instruction.
In one such study, Ory (1980) asked students to indicate, among a number of
background and affective factors, the amount of gossip they heard about the
course and their pre-course opinion (either positive, negative, or no opinion)
for the instructor. While neither of these variables is a direct measure of
instructor reputation, they appear to provide relevant information. Ory found
that students who heard more gossip about the course tended to rate the
instructor’s teaching higher, and that students who entered the course with
positive opinions about the instructor also rated the instructor’s teaching
higher. In a similar study, Barké, Tollefson, and Tracy (1983) had students in
65 classes complete a 52-item questionnaire at the beginning of the semester
that measured several background variables. Fifteen of the items were indicators
of instructor reputation (e.g., instructor is fair, hard grader, and is
sensitive to class response). Of the 927 student respondents, 60% had "no
basis on which to make a judgement" (p. 78) to each item related to
reputation on the questionnaire, and 87% could not make a judgement to more than
five items. At the end of the semester students completed a course evaluation
form that measured six general areas related to instruction. Using stepwise
regression to determine the predictive contribution of a large number of
variables, Barké, Tollefson, and Tracy found that only a few of the 15 items
designed to measure instructor reputation correlated with end-of-course ratings
of instruction. For example, the expected grading leniency of the instructor at
the outset of the course correlated with three of the six dimensions of teaching
effectiveness: relationship with students, subject matter expertise, and
evaluation standards. Despite the findings of these two studies, it is still not
clear how students' overall impression of an instructor’s reputation relates
to end-of-term instructional ratings and whether this relationship persists when
other factors associated with student ratings (e.g., course workload or
difficulty, expected grade) are controlled.
To better understand the nature of the relationship between instructor reputation
and student ratings of instruction, the present study was conducted to provide
additional, field-based data that more closely addressed this issue. Students
were asked to make overall judgements about what they heard concerning the
instructor before enrolling in the course. This information was then categorized
as either positive reputation, negative reputation, or no prior information.
These categories were used to compare overall, end-of-term ratings of the
instructor and the course. Finally, the relationship between instructor
reputation and student ratings was examined within the context of variables that
have been shown in previous research to relate to student ratings of
instruction, such as prior subject interest, course workload and difficulty, and
expected grade for the course.
Method
Participants
A total of 754 undergraduate students enrolled in 39 education courses at a
medium-sized (14,000 students), regional university in the southeastern United States
participated in this study. The classes ranged in size from 6 to 34 students.
Undergraduate education students at this institution are predominately White
(71%) and female (80%). Most respondents (76%) reported grade point averages in
the range of 2.5 to 3.5 on a 4.0 scale.
Instrument and Variables
An instrument to assess student evaluations of instruction and course
characteristics was developed drawing item and question wording from multiple
sources (Abrami, d'Apollonia, & Rosenfield, 1997; Feldman, 1997; Marsh,
1987; Murray, 1997). The variables used in this study were measured in the
manner described below.
Overall instructor rating. Students were asked to respond on a 5-point scale
(1 = poor to 5 = excellent) to the statement "Overall, how would you rate this
instructor?"
Overall course rating. Using the same scale described above, overall assessment
of the course was measured by student responses to this question: "Overall, how
would you rate this course?"
Instructor reputation. This variable was measured by student responses to the
question "Before taking this course, what did you hear about this instructor?"
The scale included the following step descriptors: 1 = instructor very bad,
3 = about average, 5 = instructor very good, 6 = didn't know about instructor.
Course difficulty. On a 1 to 5 point scale, students rated the perceived
difficulty of the course (1 = one of easiest to 5 = one of most difficult) by
addressing the statement "Course difficulty, relative to other courses was:"
Course workload. This variable was assessed, using the scale 1 = very light to
5 = very heavy, by responses to the statement "Course workload, relative to
other courses was:"
Prior subject interest.1 Students responded to the question "What was your level
of interest in this subject matter before taking this course?" using the scale
1 = no interest at all to 7 = very interested.
Expected grade. Students indicated the grade they expected to receive in the
class by selecting responses ranging from F = 1 to A+ = 13 to address the
following question: "What grade do you think the instructor will assign you in
this course?"
In addition to these variables, two other variables were used in this study: class
size and the instructor’s sex. Class size was the number of students enrolled
in the class (M = 19.56, SD = 7.30), and instructor’s sex was
dummy coded (Pedhazur, 1997) as 1 for males (n = 16) and 0 for females
(n = 23).
Procedures
Students in 39 classes were administered the evaluation instrument on the last day of
class during the fall 1998 semester. A student in each class was selected to
dispense and collect all instruments, and to return the instruments to a central
location in the College of Education. Instructors were required to leave the
classroom during evaluations. Students were told that evaluations would not be
made available until after course grades had been assigned and would only be
provided to instructors in aggregate form.
To use the instructor reputation variable in the analyses that follow, it
was necessary to incorporate the didn’t know about instructor response,
option 6, with the other five response options in such a way that would allow
for regression analyses to be performed. To do this, responses to the instructor
reputation variable, described above, were categorized as follows:
(a) Negative reputation: This category included students who selected responses
1 to 3 (instructor very bad to about average) for the instructor reputation item.
(b) Positive reputation: Students in this category chose responses 4 and 5
(above average to instructor very good) for the instructor reputation item.
(c) No information: This category consisted of students who selected response 6
(didn’t know about the instructor) for the instructor reputation item.
In determining how to construct the three reputation categories, it was
clear that response options 1 and 2 represented negative information, options 4
and 5 reflected positive information, and option 6 indicated no prior
information was known about the instructor. It was less clear how response
option 3 (about average) should be classified since this response may
reflect a mixture of both positive and negative information regarding the
instructor. The decision to include option 3 with 1 and 2 was empirically
driven. For each of the first five response options for the instructor
reputation item, mean ratings on both overall instructor and overall course were
examined. This analysis revealed that students who selected options 1 through 3
provided mean overall ratings of the instructor and course that were very
similar and, in contrast, were markedly different from mean overall ratings by
students who selected responses 4 and 5. For example, the mean overall
instructor rating by students who selected response options 1 and 2 was 2.90;
the mean rating by students who selected option 3 was 3.18; and the mean rating
by students who selected responses 4 and 5 was 4.31.
To facilitate analysis of instructor reputation, two dummy variables were
created. The first, called positive reputation, was coded 1 if student responses
corresponded with the positive reputation category, and 0 otherwise. The second
indicator variable, labeled negative reputation, was coded 1 if student
responses corresponded with the negative reputation category, and 0 otherwise.
Students in the no information group were therefore coded 0 on both variables
and served as the reference group. Of the 754 respondents, 176 (23.3%) were
classified into the positive reputation group, 420 (55.7%) into the no
information group, and 158 (21.0%) into the negative reputation group. The
percentage of respondents in this study who had no prior information about the
instructor is close to the figure of 60%
reported in Barké, Tollefson, and Tracy’s (1983) study.
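The recoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and group labels are hypothetical and are not taken from the study's materials, and only the group counts (176, 420, and 158 of 754) come from the text.

```python
def classify_reputation(response):
    """Map a 1-6 reputation response to one of the three groups."""
    if response in (1, 2, 3):      # very bad through about average
        return "negative"
    if response in (4, 5):         # above average through very good
        return "positive"
    if response == 6:              # didn't know about instructor
        return "no_information"
    raise ValueError(f"unexpected response: {response!r}")


def dummy_codes(response):
    """Return the two indicator variables; no information is the reference (0, 0)."""
    group = classify_reputation(response)
    return {
        "positive_reputation": int(group == "positive"),
        "negative_reputation": int(group == "negative"),
    }


# Group sizes as reported in the text, used to check the percentages.
group_counts = {"positive": 176, "no_information": 420, "negative": 158}
total = sum(group_counts.values())
group_percentages = {
    name: round(100 * count / total, 1) for name, count in group_counts.items()
}
```

Because a student in the no information group receives 0 on both indicators, each regression coefficient on an indicator is read as a difference from that group.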
Results
Descriptive statistics and correlations among all student-level variables
are given in Table 1. The correlations in Table 1 show that the dummy variable
positive reputation was positively related, and the dummy variable negative
reputation was
negatively related, to overall ratings for both the instructor and course. Thus,
students who heard positive information regarding the instructor’s reputation
rated the instructor and course higher than students who heard primarily
negative information or no information. Course difficulty, prior subject
interest, and expected grade in the course were also positively related to
instructor and course ratings. To understand better how reputation relates to
student ratings, two separate analyses were performed. First, means for each
rating (instructor and course) were calculated for each of the reputation groups
formed (positive, no information, and negative) and presented by selected
classes. Second, mean differences among the three reputation groups were
estimated via multilevel regression after controlling for covariates that have
been shown in previous research to relate to student ratings of instruction.
Table 1 about here
Analysis 1: Mean Overall Ratings by Reputation Groups and Instructor
Differences in ratings among the three reputation groups of students could be interpreted in
at least two ways. With the first interpretation, differences in overall ratings
of the instructor and course may be a function of a priming effect upon
expectancies that colors students’ judgments about the instructor and course.
An alternative interpretation is that differences in overall ratings among
reputation groups stem from actual instructional differences among instructors.
That is, better instructors have earned the positive reputation by offering
better instruction while the converse is true for instructors with negative
reputations. If one only compared student ratings between instructors, one would
not be able to separate the confounded effects of priming-expectancies from
actual instructional differences among instructors. One method for controlling
for actual instructional differences across instructors is to examine
differences among the reputation groups within instructors. Thus, the
purpose of this first set of analyses was to explore, in a descriptive manner,
student ratings by the three reputation groups within instructors to address
this important interpretational issue. If differences in overall ratings of the
instructor and course occur within the same classroom for the same
instructor, then the alternative interpretation is seriously weakened.
Tables 2 and 3 about here
Descriptive statistics and effect sizes by each reputation group for overall rating of the
instructor are presented in Table 2, and for overall rating of the course are
presented in Table 3. Note that the data presented in these two tables reflect
only those classes for which three or more students were present in each of the
three reputation groups within the same classroom. A total of 9 of the 39
classes surveyed met this criterion. It is important to point out that each row
of Tables 2 and 3 reflects a unique, distinct instructor and class; thus, each
row represents a within instructor comparison rather than a between or
across instructor comparison.
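The selection rule for Tables 2 and 3 (at least three students in each of the three reputation groups within a class) can be sketched as follows. The function name, the group labels, and the example records are hypothetical and are not the study's data.

```python
from collections import Counter, defaultdict

GROUPS = ("positive", "no_information", "negative")


def eligible_classes(records, minimum=3):
    """Return class ids with at least `minimum` students in every group.

    `records` is an iterable of (class_id, reputation_group) pairs,
    one pair per responding student.
    """
    counts = defaultdict(Counter)
    for class_id, group in records:
        counts[class_id][group] += 1
    return sorted(
        class_id
        for class_id, by_group in counts.items()
        if all(by_group[group] >= minimum for group in GROUPS)
    )


# Hypothetical example: class "A" qualifies, class "B" has too few
# students in the positive reputation group.
records = (
    [("A", group) for group in GROUPS for _ in range(3)]
    + [("B", "positive")] * 2
    + [("B", "no_information")] * 5
    + [("B", "negative")] * 3
)
selected = eligible_classes(records)
```

Restricting the comparison to such classes keeps every contrast within a single instructor, which is what separates the expectancy interpretation from real differences in instruction.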
As the means and effect sizes in Tables 2 and 3 show, students who indicated that
they heard positive information about the instructor prior to enrolling in the
course consistently rated the instructor higher than students in the same class
who indicated they heard nothing about the instructor before enrolling in the
course. Conversely, students who indicated that they heard primarily negative
information about the instructor before enrolling in the course tended to rate
the instructor worse than students in the same class who heard nothing about
that instructor. Similar relationships existed for the ratings of the course.
Overall, the effect size for the difference between the positive reputation and
negative reputation groups for the instructor rating is d = .82 –
(–.40) = 1.22. The effect size for the difference between the positive and
negative reputation groups for the course rating is d = .42 – (–.63)
= 1.25. Both of these effect size estimates are substantial. Note the consistent
pattern of means revealed in these two tables. In 17 of the 18 comparisons
examined (9 classes for each of the two ratings), the mean for the positive
reputation group is larger than the mean for the negative reputation group; the
exception is class 9 for overall course rating.
Analysis 2: Multilevel Regression
The effect sizes reported above do not reflect adjustments for control of possible
confounding variables such as prior subject interest, instructor sex, or class
size. To better estimate the size of the reputation effect, several covariates
that relate to instructional evaluations were included in the models of
instructor and course ratings. Researchers in the area of student ratings (e.g.,
Cranton & Smith, 1990; Feldman, 1998) have noted that the unit of analysis
examined, either student level or class-mean level, may have an important effect
on the types of relationships revealed between ratings of instruction and
various predictor variables. The analysis of class means only, which is
advocated by Marsh (1987) for example, may obscure important variation in
ratings that results from individual student differences within the classroom.
Given this, multilevel regression (Bryk & Raudenbush, 1992; Goldstein, 1995;
Longford, 1993) was used in an effort to examine variation in student ratings
both within and across classes (Feldman, 1998).
The covariates included in the analysis at the student level were course difficulty,
course workload, prior subject interest, and expected grade in the course.
Research on student ratings has shown that course difficulty and course
workload, often measured together, relate positively to ratings of instruction
(Greenwald & Gillmore, 1997a, 1997b; Marsh, 1980; Marsh & Roche, 2000).
Interest in the subject matter of the course before enrollment—prior subject
interest—has been linked to higher student ratings of instruction (Howard
& Maxwell, 1980; Marsh, 1980; Prave & Baril, 1993). Expected grade in
the course, which typically correlates positively with ratings, has been the
subject of much debate and research (Greenwald & Gillmore, 1997a; Marsh,
1987; Marsh & Roche, 1997, 2000; McKeachie, 1997b) and therefore was
included in the analysis.
At the class level, class size and instructor sex were included. Research
demonstrates that class size correlates, albeit weakly, with ratings of
instruction (Feldman, 1984). The sex of the instructor also appears to relate to
student ratings. Feldman's (1998) reviews have shown that women tend to receive
slightly higher ratings than men. However, Feldman (1998) also notes that a
same-sex favorability in ratings exists; students of the same sex as their
instructor may provide slightly higher ratings (Centra & Gaubatz, 2000).
Since the majority of students in the classes examined in this study were women,
women instructors in this sample are likely to have higher ratings.
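Thus, the models examined were, with variables enclosed in parentheses, as follows:
Student-level.
\[
\begin{aligned}
(\text{Overall Instructor Rating})_{ij} ={}& \beta_{0j}
 + \beta_{1}(\text{Positive Reputation})_{ij}
 + \beta_{2}(\text{Negative Reputation})_{ij} \\
&+ \beta_{3}(\text{Course Difficulty})_{ij}
 + \beta_{4}(\text{Course Workload})_{ij}
 + \beta_{5}(\text{Prior Subject Interest})_{ij} \\
&+ \beta_{6}(\text{Expected Grade})_{ij} + e_{ij}
\end{aligned}
\]
At the class-level, mean ratings of the instructor were modeled with class size and
instructor sex:
Class-level.
\[
\beta_{0j} = \gamma_{00}
 + \gamma_{01}(\text{Instructor's Sex})_{j}
 + \gamma_{02}(\text{Class Size})_{j} + \mu_{0j}
\]
Combining the student- and class-level equations yields the following model of instructor
rating:
Combined.
\[
\begin{aligned}
(\text{Overall Instructor Rating})_{ij} ={}& \gamma_{00}
 + \gamma_{01}(\text{Instructor's Sex})_{j}
 + \gamma_{02}(\text{Class Size})_{j} \\
&+ \beta_{1}(\text{Positive Reputation})_{ij}
 + \beta_{2}(\text{Negative Reputation})_{ij}
 + \beta_{3}(\text{Course Difficulty})_{ij} \\
&+ \beta_{4}(\text{Course Workload})_{ij}
 + \beta_{5}(\text{Prior Subject Interest})_{ij}
 + \beta_{6}(\text{Expected Grade})_{ij} + e_{ij} + \mu_{0j}
\end{aligned}
\]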
The same student-level, class-level, and combined models were also examined for
overall course rating.
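A random-intercept model of this form could be fit along the following lines. This is a sketch under assumptions, not the analysis used in the study: the text does not name the software used, so the statsmodels library is a swapped-in choice, the column names are hypothetical, and the data are synthetic. Only the values .25 and -.49 used to generate the synthetic ratings echo figures from the text; every other coefficient is invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Synthetic data only: 39 classes of 19 students each (not the study's data).
n_classes, n_per_class = 39, 19
class_id = np.repeat(np.arange(n_classes), n_per_class)
n = class_id.size

# Class-level values, constant within a class (all hypothetical).
instructor_male = np.repeat(rng.integers(0, 2, n_classes), n_per_class)
class_size = np.repeat(rng.integers(6, 35, n_classes), n_per_class)
class_effect = np.repeat(rng.normal(0.0, 0.4, n_classes), n_per_class)

# Reputation group per student: 0 = no information, 1 = positive, 2 = negative.
group = rng.choice([0, 1, 2], size=n, p=[0.56, 0.23, 0.21])
data = pd.DataFrame({
    "class_id": class_id,
    "instructor_male": instructor_male,
    "class_size": class_size,
    "positive_rep": (group == 1).astype(int),
    "negative_rep": (group == 2).astype(int),
    "difficulty": rng.integers(1, 6, n),
    "workload": rng.integers(1, 6, n),
    "interest": rng.integers(1, 8, n),
    "expected_grade": rng.integers(1, 14, n),
})

# Hypothetical generating model with a random intercept per class.
data["instructor_rating"] = (
    3.5
    + 0.25 * data["positive_rep"]
    - 0.49 * data["negative_rep"]
    + 0.05 * data["interest"]
    + class_effect
    + rng.normal(0.0, 1.0, n)
)

model = smf.mixedlm(
    "instructor_rating ~ positive_rep + negative_rep + difficulty + workload"
    " + interest + expected_grade + instructor_male + class_size",
    data,
    groups=data["class_id"],  # one random intercept per class
)
result = model.fit(reml=False)  # full maximum likelihood rather than REML
reputation_gap = result.fe_params["positive_rep"] - result.fe_params["negative_rep"]
```

The fixed-effect estimates in `result.fe_params` play the role of the coefficients in the combined equation, and the difference between the two reputation coefficients estimates the gap between the positive and negative reputation groups.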
Table 4 about here
Results of the multilevel regression, using full information maximum likelihood to
obtain estimates (Hox, 1995), are presented in Table 4. For these analyses note
that all 39 classes and 754 student responses were included. The parameter
estimates in Table 4 show that the mean difference in student ratings of the
instructor is .25 for the positive reputation vs. the no information groups, and
–.49 for the negative reputation vs. the no information groups. These estimates indicate that
positive and negative reputation correspond with higher and lower mean ratings
even after controlling for prior subject interest, expected grade, etc. The
estimated mean difference in student ratings between the positive reputation and
negative reputation groups is .25 – (–.49) = .74 (SE = .10, t
= 7.37, p < .001). This difference can be converted to an effect size,
d, by dividing it by the standard deviation of overall instructor rating
responses which is estimated via the null multilevel model (Bryk &
Raudenbush, 1992; Hox, 1995) as 1.19, thus d = .74 / 1.19 = .62. (Note
that the standard deviation obtained from the null multilevel regression model
is slightly larger than the standard deviation provided in Table 1. This is not
uncommon in multilevel modeling according to Snijders & Bosker, 1994.) After
controlling for the various covariates included in the model, there still
appears to be a large difference in overall ratings of the instructor between
students who heard positive and negative information about the instructor prior
to enrolling in the class.
Like the estimates for the instructor’s ratings, the estimated mean differences
between the reputation groups for overall ratings of the course also reveal mean
ratings that are higher for the positive reputation group and lower for the
negative reputation group. The estimated difference between the positive
reputation and no information groups was .14, which was not statistically
significant at the .05 level. The difference between the negative reputation and
no information group, -.43, was statistically significant at the .05 level. The
estimated mean difference in student ratings of the course between the positive
reputation and negative reputation groups is .14 – (–.43) = .57 (SE =
.10, t = 5.75, p < .001), which is statistically significant.
This estimate can also be converted to a standardized effect: d = .57 /
1.14 = .50. Thus, the difference in mean ratings for the course between students
who heard positive vs. negative information regarding the instructor is estimated
to be about half a
standard deviation.
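The conversion from coefficient differences to standardized effects can be checked with a short calculation. The function name is hypothetical; the inputs are the rounded estimates and null-model standard deviations reported in the text, so the results are subject to the same rounding.

```python
def reputation_effect_size(b_positive, b_negative, sd_null):
    """Return the positive-minus-negative difference and its effect size d.

    b_positive and b_negative are each group's estimated difference from
    the no information group; sd_null is the rating standard deviation
    estimated from the null multilevel model.
    """
    difference = b_positive - b_negative
    return difference, difference / sd_null


instructor_diff, instructor_d = reputation_effect_size(0.25, -0.49, 1.19)
course_diff, course_d = reputation_effect_size(0.14, -0.43, 1.14)

print(f"instructor: difference = {instructor_diff:.2f}, d = {instructor_d:.2f}")
print(f"course:     difference = {course_diff:.2f}, d = {course_d:.2f}")
```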
The regression models also showed that course difficulty, prior subject interest,
expected grade, and instructor sex were statistically related to both overall
instructor and overall course ratings. Course workload and class size were not
statistically related to student ratings. Recently Marsh and Roche (2000)
demonstrated the importance of considering course workload in assessing factors
related to student ratings. A potential limitation for one of Marsh and
Roche’s analyses was that course workload was confounded with course
difficulty. The results provided here show that perceived course difficulty, not
workload, corresponds positively with student ratings. As with many other
studies, prior subject interest and expected grade were statistically related to
student ratings (Greenwald & Gillmore, 1997a; Marsh, 1980, 1987; Marsh &
Roche, 2000; Prave & Baril, 1993). The models also revealed a strong
instructor sex association with ratings. As noted earlier, a disproportionate
number of students in the sample were female since the sample consisted of
college of education faculty and students. Centra and Gaubatz (2000) and Feldman
(1998) reported that a same-sex favorability exists in ratings, so female
instructors are likely to benefit from a disproportionate number of female
students in their classrooms. In addition, Feldman also notes that female
instructors tend to get slightly better ratings in general. Combined, these two
factors may explain the large estimated mean difference between male and female
instructors modeled here.
Discussion
Results from this study show that prior information about the instructor, when
interpreted by students either positively or negatively, corresponds with higher
or lower end-of-term ratings for both the instructor and the course. This
association demonstrates large mean differences even when important predictors
of student ratings are taken into account, such as prior subject interest,
course difficulty, and expected grade. As a practical example to illustrate this
difference, consider estimated ratings for a hypothetical female instructor who
has two groups of students in her class—those who heard negative information
about the instructor prior to enrolling in the course and those who heard
positive information prior to enrolling. With all modeled covariates held
constant at their means, the predicted rating for this instructor by students in
the positive reputation group is 4.42, and the predicted mean rating for this
instructor by students in the negative reputation group is 3.68, a mean
difference of .74 points. For administrative and personnel decisions based upon
student ratings, which are becoming more common, a mean difference of this size
in ratings could have important consequences (Perry & Smart, 1997;
McKeachie, 1997b; Wilson, 1998).
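The predicted difference in this example follows directly from the combined model. As a sketch, write $\beta_{1}$ and $\beta_{2}$ for the coefficients on the positive and negative reputation dummy variables (b1 and b2 in the student-level model) and let $C$, a symbol introduced here only for illustration, stand for everything the two predictions share: the intercept, the instructor's sex and class size terms, and the covariates held at their means.

\[
\begin{aligned}
\hat{Y}_{\text{positive}} &= C + \beta_{1}, \qquad \hat{Y}_{\text{negative}} = C + \beta_{2}, \\
\hat{Y}_{\text{positive}} - \hat{Y}_{\text{negative}} &= \beta_{1} - \beta_{2} = .25 - (-.49) = .74 = 4.42 - 3.68 .
\end{aligned}
\]

Because $C$ cancels, the size of the gap does not depend on the values chosen for the covariates, only on the two reputation coefficients.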
While the current study differs from previous research on instructor reputation in a
number of ways (e.g., inclusion of distinct reputation categories, multiple
covariates, and use of student-level and class-level data), the results of this
study are consistent with findings from both experimental (Brady, 1994; Feldman
& Prohaska, 1979; Kelley, 1950; McClelland, 1970; Perry, Abrami, Leventhal,
& Check, 1979; Perry, Niemi, & Jones, 1974; Widmeyer & Loy, 1988)
and nonexperimental (Barké, Tollefson, & Tracy, 1983; Ory, 1980)
investigations of instructor reputation and student ratings of instruction.
Combined, the evidence from these studies indicates that instructor reputation
is associated with ratings of instruction. Not addressed in these studies is the
question of which types of prior information shape student assessment of an
instructor’s reputation. For example, do published ratings or written comments
about an instructor influence reputation more than informal communication among
students? Of those factors that shape reputation, which are most important?
Does, for instance, hearing that an instructor is a hard grader, is
disorganized, or communicates poorly have the same priming influence on
reputation assessment as hearing that an instructor is an uncaring person, is
not enthusiastic in the classroom, or does not value student input?
While an instructor’s reputation may influence, to some extent, the end-of-term
evaluations the instructor receives, the more critical issues of motivation,
cognition, and learning should not be overlooked. Clearly certain types of
priming information are likely to engender stronger expectancies than others,
and if expectancies shape one's judgement about the effectiveness of an
instructor, then it is possible that the instructional expectancies students
bring to the classroom may affect their motivation to learn (Bandura, 1997). If
students anticipate a poor instructor who does not provide, for example,
contingent feedback, interaction, or clear and organized material, then this
could lead to a weakened sense of personal control that may negatively affect
active engagement, persistence, self-regulation, and interest in the subject
(Perry, 1997; Pintrich & Schunk, 1996). Experimental research in this area
suggests that there may be some connection between instructional expectancies
and student motivation. Kelley (1950), for instance, noted that students who
were primed that the instructor was a very warm person participated more in
class discussions than students who read that the instructor was rather cold.
Similarly, Feldman and Theiss (1982) noted that students with positive
expectations about the instructor’s competence perceived the instructor and
lecture more positively. Feldman and Prohaska (1979) also found that positive
and negative expectancies for the instructor influenced student attitudes and
non-verbal behavior in the classroom, and had some impact on achievement. Perry,
Abrami, Leventhal, and Check (1979) found that reputation did not affect
achievement, but did interact with some teaching behaviors to influence student
ratings of instruction.
Finally, there is some evidence that students’ expectations for the instructor could
influence the instructional environment of the classroom. For example, Feldman
and Prohaska (1979) found that students’ expectancies for the instructor were
related to the instructor’s attitudes and behaviors in the classroom, but in a
follow-up study, Feldman and Theiss (1982) found that student expectancies did
not affect the instructor’s behaviors. The connection between student
expectancies for the instructor and the instructor's behavior seems weak, but it
could exist. If student expectancies for the instructor do shape the
instructor’s behavior, then this suggests a reciprocal model similar to
Bandura’s (1986) triadic reciprocality. First, an instructor’s reputation
may affect student motivation, perceptions of control, and various thoughts and
behaviors in the classroom. Noting these student behaviors, the instructor may
react either positively or negatively, depending upon the cues received, and
this reaction could directly affect the quality of instruction provided. If this
model is credible, then instructor reputation may influence
student behavior and thought, but not directly bias student ratings. More
specifically, if a positive or negative reputation corresponds with higher or
lower end-of-term evaluations by students, then perhaps this occurs as a result
of attitudes and behaviors that are communicated within the classroom between
students and instructors. More research is needed to determine if, and by how
much, student expectancies for the instructor influence the instructor's
behaviors in the classroom. If this relationship is weak or nonexistent, then
reputation would appear to be a biasing factor in student ratings of instruction.
References
Abrami, P. C., d'Apollonia, S., & Rosenfield, S. (1997). The dimensionality of student ratings of instruction: What we know and what we do not. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 321-367). New York: Agathon.
Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Englewood Cliffs, NJ: Prentice Hall.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman.
Barké, C. R., Tollefson, N., & Tracy, D. B. (1983). Relationship between course entry attitudes and end-of-course ratings. Journal of Educational Psychology, 75, 75-85.
Baxter, E. P. (1991). The TEVAL experience, 1983-88: The impact of a student evaluation of teaching scheme on university teachers. Studies in Higher Education, 16, 151-179.
Brady, P. J. (1994). How likeability and effectiveness ratings of college professors by their students are affected by course demands and professors’ attitudes. Psychological Reports, 74, 907-913.
Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.
Centra, J. A., & Creech, F. R. (1976). The relationship between student, teacher, and course characteristics and student ratings of teacher effectiveness. (ETS Project Report 76-1). Princeton, NJ: Educational Testing Service.
Centra, J. A., & Gaubatz, N. B. (2000). Is there gender bias in student evaluations of teaching? The Journal of Higher Education, 70, 17-33.
Cranton, P., & Smith, R. A. (1990). Reconsidering the unit of analysis: A model of student ratings of instruction. Journal of Educational Psychology, 82, 207-212.
Feldman, K. A. (1994). Class size and college students' evaluations of teachers and courses: A closer look. Research in Higher Education, 21, 45-116.
Feldman, K. A. (1997). Identifying exemplary teachers and teaching: Evidence from student ratings. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 368-395). New York: Agathon.
Feldman, K. A. (1998). Reflections on the study of effective college teaching and student ratings: One continuing question and two unresolved issues. In J. C. Smart (Ed.), Higher Education: Handbook of Theory and Research (pp. 35-74). New York: Agathon.
Feldman, R. S., & Prohaska, T. (1979). The student as Pygmalion: Effect of student expectation on the teacher. Journal of Educational Psychology, 71, 485-493.
Feldman, R. S., & Theiss, A. J. (1982). The teacher and student as Pygmalion: Joint effects of teacher and student expectations. Journal of Educational Psychology, 74, 217-223.
Goldstein, H. (1995). Multilevel statistical models (2nd ed.). London: Edward Arnold.
Greenwald, A. G., & Gillmore, G. M. (1997a). Grading leniency is a removable contaminant of student ratings. American Psychologist, 52, 1209-1217.
Greenwald, A. G., & Gillmore, G. M. (1997b). No pain, no gain? The importance of measuring course workload in student ratings of instruction. Journal of Educational Psychology, 89, 743-751.
Griffin, B. W. (1999). Results of the Faculty Survey on Student Ratings of Instruction: Preliminary Report. Statesboro, GA: Georgia Southern University, Student Ratings Committee.
Higgins, E. T. (1996). Knowledge activation: Accessibility, applicability, and salience. In E. T. Higgins & A. W. Kruglanski (Eds.), Social Psychology: Handbook of Basic Principles (pp. 133-168). New York: Guilford.
Howard, G. S., & Maxwell, S. E. (1980). Correlation between student satisfaction and grades: A case of mistaken causation? Journal of Educational Psychology, 72, 810-820.
Howard, G. S., & Schmeck, R. R. (1979). Relationship of changes in student motivation to student evaluations of instruction. Research in Higher Education, 10, 305-315.
Hox, J. J. (1995). Applied multilevel analysis. Amsterdam: TT-Publikaties. Available on-line (March 6, 2000): http://www.ioe.ac.uk/multilevel/workpap.html
Husbands, C. T. (1996). Variations in students’ evaluations of teachers’ lecturing and small-group teaching: A study at the London School of Economics and Political Science. Studies in Higher Education, 21, 187-207.
Husbands, C. T., & Fosh, P. (1993). Students’ evaluation of teaching in higher education: Experiences from four European countries and some implications of the practice. Assessment & Evaluation in Higher Education, 18, 95-115.
Kelley, H. H. (1950). The warm-cold variable in first impressions of persons. Journal of Personality, 18, 431-439.
Leventhal, L., Abrami, P. C., & Perry, R. P. (1976). Do teacher rating forms reveal as much about students as about teachers? Journal of Educational Psychology, 68, 441-445.
Leventhal, L., Abrami, P. C., Perry, R. P., & Breen, L. J. (1975). Section selection in multisection courses: Implications for the validation and use of teacher rating forms. Educational and Psychological Measurement, 35, 885-895.
Longford, N. T. (1993). Random coefficient models. Oxford, UK: Oxford University Press.
Mackie, D. M., & Hamilton, D. L. (Eds.). (1993). Affect, cognition, and stereotyping: Interactive processes in group perception. New York: Academic Press.
Marsh, H. W. (1980). The influence of student, course, and instructor characteristics on evaluations of university teaching. American Educational Research Journal, 17, 219-237.
Marsh, H. W. (1987). Students’ evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Educational Research, 11, 253-388.
Marsh, H. W., & Dunkin, M. J. (1992). Students' evaluations of university teaching: A multidimensional perspective. In J. C. Smart (Ed.), Higher Education: Handbook of Theory and Research (Volume VIII) (pp. 143-232). New York: Agathon Press.
Marsh, H. W., & Overall, J. U. (1979). Validity of students' evaluations of teaching: A comparison with instructor self-evaluations by teaching assistants, undergraduate faculty, and graduate faculty. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA. (ERIC Document No. ED 177 205).
Marsh, H. W., & Roche, L. A. (1997). Making students' evaluations of teaching effectiveness effective: The critical issues of validity, bias, and utility. American Psychologist, 52, 1187-1197.
Marsh, H. W., & Roche, L. A. (2000). Effects of grading leniency and low workload on students’ evaluations of teaching: Popular myth, bias, validity, or innocent bystanders? Journal of Educational Psychology, 92, 202-228.
McClelland, J. N. (1970). The effect of student evaluations of college instruction upon subsequent evaluations. California Journal of Educational Research, 21, 88-95.
McKeachie,
W. J. (1997a). Good teaching makes a difference—and we know what it is. In R.
P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education:
Research and Practice (pp. 396-408). New York: Agathon.
McKeachie, W. J. (1997b). Student ratings: The validity of use. American Psychologist, 52, 1219-1225.
Moses,
I. (1986). Student evaluation of teaching in an Australian university—staff
perceptions and reactions. Assessment & Evaluation in Higher Education,
11, 117-129.
Murray, H. G. (1997). Effective teaching behaviors in the college classroom. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 171-204). New York: Agathon.
Olson, J. M., Roese, N. J., & Zanna, M. P. (1996). Expectancies. In E. T. Higgins & A. W. Kruglanski (Eds.), Social Psychology: Handbook of Basic Principles (pp. 211-238). New York: Guilford.
Ory, J. C. (1980). The influence of students’ affective entry on instructor and course evaluations. The Review of Higher Education, 4, 13-24.
Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation and prediction (3rd ed.). New York: Harcourt, Brace.
Perry, R. P. (1997). Perceived control in college students: Implications for instruction in higher education. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 11-60). New York: Agathon.
Perry, R. P., Abrami, P. C., Leventhal, L., & Check, J. (1979). Instructor reputation: An expectancy relationship involving student ratings and achievement. Journal of Educational Psychology, 71, 776-787.
Perry, R. P., Niemi, R. P., & Jones, K. (1974). Effect of prior teaching evaluations and lecture presentation on ratings of teaching performance. Journal of Educational Psychology, 66, 851-856.
Perry, R. P., & Smart, J. C. (1997). Introduction. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 1-8). New York: Agathon.
Pintrich, P. R., & Schunk, D. H. (1996). Motivation in education: Theory, research, and applications. Columbus, OH: Merrill.
Powell, A. M., Hunt, A., & Irving, A. (1997). Evaluation of courses by whole student cohorts: A case study. Assessment & Evaluation in Higher Education, 22, 397-404.
Prave, R. S., & Baril, G. L. (1993). Instructor ratings: Controlling for bias from initial student interest. Journal of Education for Business, 68, 362-366.
Schmelkin, L. P., Spencer, K. J., & Gellman, E. S. (1997). Faculty perspectives on course and teacher evaluations. Research in Higher Education, 38, 575-592.
Snijders, T. A. B., & Bosker, R. J. (1994). Modeled variance in two-level models. Sociological Methods & Research, 22, 342-363.
Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel Analysis: An introduction to basic and advanced multilevel modeling. London: Sage.
Stringer, M., & Irwing, P. (1998). Students' evaluations of teaching effectiveness: A structural modelling approach. British Journal of Educational Psychology, 68, 409-426.
Tollefson, N., & Wigington, H. (1986). Teacher-generated and student-generated variability in teacher effectiveness ratings. Instructional Science, 15, 109-120.
Wachtel, H. K. (1998). Student evaluation of college teaching effectiveness: A brief review. Assessment & Evaluation in Higher Education, 23, 191-212.
Widmeyer, W. N., & Loy, J. W. (1988). When you're hot, you're hot! Warm-cold effects in first impressions of persons and teaching effectiveness. Journal of Educational Psychology, 80, 118-121.
Wigington, H., Tollefson, N., & Rodriguez, E. (1989). Students' ratings of instructors revisited: Interactions among class and instructor variables. Research in Higher Education, 30, 331-344.
Wilson, R. (1998, January 16). New research casts doubt on value of student evaluations of professors. The Chronicle of Higher Education, pp. A12-A14.
Footnote 1
Both the instructor reputation and prior subject interest variables represent
retrospective information in which students are asked to recall, at the end of a
course, thoughts or opinions they held prior to enrolling in a course. There may
be concern that such information is subject to faulty memory or confounding
influences. At least two studies have examined this issue in regard to prior
subject interest. Clegg (as cited in Prave & Baril, 1993) found a
correlation of .93 between pre-course and end-of-course measures of student
motivation, and Howard and Schmeck (1979) noted that end-of-course measurement
of motivation was an accurate estimate of pre-course motivation.
Table 1: Correlations and Descriptive Statistics among Student Ratings Variables (N = 754)
| Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Overall Rating of Instructor | -- | | | | | | | |
| 2. Overall Rating of Course | .79* | -- | | | | | | |
| 3. Positive Reputation Indicator | .22* | .20* | -- | | | | | |
| 4. Negative Reputation Indicator | -.36* | -.30* | -.28* | -- | | | | |
| 5. Course Difficulty | .13* | .14* | .01 | .11* | -- | | | |
| 6. Course Workload | .05 | .06 | -.05 | .07 | .48* | -- | | |
| 7. Prior Subject Interest | .15* | .30* | .10* | -.03 | .14* | .17* | -- | |
| 8. Expected Grade | .17* | .16* | .05 | -.17* | -.28* | -.11* | .04 | -- |
| Means | 3.86 | 3.50 | 0.23 | 0.21 | 3.25 | 3.47 | 3.25 | 10.54 |
| Standard Deviations | 1.16 | 1.13 | 0.42 | 0.41 | 0.90 | 0.94 | 1.10 | 1.77 |
* p < .05.
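Because the two reputation indicators in Table 1 are 0/1 dummy variables, their means are the proportions of students in each reputation group, and their standard deviations follow directly from those proportions. The short Python check below is a reader's sketch and not part of the original analysis. The function name `bernoulli_sd` is an illustrative choice, and the code assumes the population formula sqrt(p(1 - p)), which at N = 754 differs negligibly from the sample standard deviation.

```python
import math


def bernoulli_sd(p: float) -> float:
    """Standard deviation of a 0/1 indicator whose mean (proportion) is p."""
    return math.sqrt(p * (1.0 - p))


N = 754  # number of students reported in Table 1

# Means of the two dummy indicators, copied from Table 1.
proportions = {"positive": 0.23, "negative": 0.21}

# Students in neither group reported hearing nothing about the instructor.
proportions["no_information"] = (
    1.0 - proportions["positive"] - proportions["negative"]
)

for group in ("positive", "negative"):
    p = proportions[group]
    # Approximate group size implied by the rounded proportion.
    print(f"{group}: SD = {bernoulli_sd(p):.2f}, approx. n = {round(p * N)}")
```

The computed standard deviations round to .42 and .41, the values shown in the table, and the remaining proportion (about .56) corresponds to the no-information group.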
Table 2: Means, Standard Deviations, and Effect Sizes for Overall Instructor Rating by the Three Categories of Reputation
| Class | Positive Reputation M | Positive Reputation d | Positive Reputation n | No Information M | No Information n | Negative Reputation M | Negative Reputation d | Negative Reputation n | Overall Class M | Overall Class SD | F-Test df | F |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 4.83 | .40 | 12 | 4.55 | 11 | 3.67 | -1.26 | 3 | 4.58 | .70 | 2,23 | 4.17* |
| 2 | 4.70 | .53 | 10 | 4.17 | 6 | 3.00 | -1.10 | 8 | 4.00 | 1.06 | 2,21 | 10.61* |
| 3 | 3.50 | 1.21 | 4 | 2.44 | 16 | 2.22 | -.25 | 9 | 2.52 | .87 | 2,26 | 3.74* |
| 4 | 5.00 | .92 | 3 | 4.44 | 9 | 4.38 | -.10 | 8 | 4.50 | .61 | 2,17 | 1.26 |
| 5 | 4.13 | .11 | 8 | 4.00 | 3 | 2.50 | -.46 | 12 | 3.26 | 1.18 | 2,20 | 9.17* |
| 6 | 4.00 | 1.32 | 5 | 2.67 | 6 | 3.00 | .33 | 8 | 3.16 | 1.01 | 2,16 | 3.12 |
| 7 | 3.40 | .52 | 5 | 2.75 | 4 | 1.92 | -.66 | 13 | 2.41 | 1.26 | 2,19 | 3.23 |
| 8 | 4.00 | 1.03 | 4 | 3.00 | 4 | 2.50 | -.51 | 12 | 2.90 | .97 | 2,17 | 5.26* |
| 9 | 4.00 | 1.38 | 5 | 2.83 | 12 | 3.17 | .40 | 12 | 3.17 | .85 | 2,26 | 4.07* |
| Mean d | | .82 | | | | | -.40 | | | | | |
Note. The data presented in this table reflect only those classes for which three or more students were present in each of the three reputation groups. The effect size is defined as d = (M(positive [or negative]) - M(no information)) / SD(overall).
* p < .05
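The effect-size definition in the table note can be checked against a single row. The Python sketch below applies it to Class 1 of Table 2. It is an illustration rather than the author's computation: the function name `effect_size` is hypothetical, and reading the F-test as a one-way analysis of variance across the three reputation groups is an assumption based on the reported degrees of freedom.

```python
def effect_size(group_mean: float, no_info_mean: float, overall_sd: float) -> float:
    """d = (M(group) - M(no information)) / SD(overall), per the table note."""
    return (group_mean - no_info_mean) / overall_sd


# Class 1, overall instructor rating (values copied from Table 2).
positive_m, no_info_m, negative_m, overall_sd = 4.83, 4.55, 3.67, 0.70
group_sizes = (12, 11, 3)  # positive, no information, negative

d_positive = effect_size(positive_m, no_info_m, overall_sd)
d_negative = effect_size(negative_m, no_info_m, overall_sd)

# Assumed one-way ANOVA degrees of freedom: (groups - 1, students - groups).
df_between = len(group_sizes) - 1
df_within = sum(group_sizes) - len(group_sizes)

print(f"d(positive) = {d_positive:.2f}")
print(f"d(negative) = {d_negative:.2f}")
print(f"df = {df_between},{df_within}")
```

For this class the sketch reproduces the tabled values of .40 and -1.26 and the reported degrees of freedom of 2,23.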
Table 3: Means, Standard Deviations, and Effect Sizes for Overall Course Rating by the Three Categories of Reputation
| Class | Positive Reputation M | Positive Reputation d | Positive Reputation n | No Information M | No Information n | Negative Reputation M | Negative Reputation d | Negative Reputation n | Overall Class M | Overall Class SD | F-Test df | F |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 4.50 | .39 | 12 | 4.18 | 11 | 3.33 | -1.04 | 3 | 4.23 | .82 | 2,23 | 2.86 |
| 2 | 3.90 | -.32 | 10 | 4.17 | 6 | 3.25 | -1.08 | 8 | 3.75 | .85 | 2,21 | 2.60 |
| 3 | 3.00 | 1.00 | 4 | 2.25 | 16 | 2.00 | -.33 | 9 | 2.28 | .75 | 2,26 | 2.80 |
| 4 | 4.67 | .59 | 3 | 4.22 | 9 | 3.63 | -.78 | 8 | 4.05 | .76 | 2,17 | 3.00 |
| 5 | 2.38 | -1.25 | 8 | 3.67 | 3 | 2.00 | -1.62 | 12 | 2.35 | 1.03 | 2,20 | 4.04* |
| 6 | 4.00 | .83 | 5 | 3.33 | 6 | 2.75 | -.72 | 8 | 3.26 | .81 | 2,16 | 5.68* |
| 7 | 3.20 | .69 | 5 | 2.50 | 4 | 2.15 | -.35 | 13 | 2.45 | 1.01 | 2,19 | 2.15 |
| 8 | 4.00 | 1.51 | 4 | 2.50 | 4 | 2.25 | -.25 | 12 | 2.65 | .99 | 2,17 | 8.55* |
| 9 | 3.00 | .38 | 5 | 2.67 | 12 | 3.08 | .48 | 12 | 2.90 | .86 | 2,26 | 0.73 |
| Mean d | | .42 | | | | | -.63 | | | | | |
Note. The data presented in this table reflect only those classes for which three or more students were present in each of the three reputation groups. The effect size is defined as d = (M(positive [or negative]) - M(no information)) / SD(overall).
* p < .05
Table 4: Multilevel Regression Estimates for Models of Overall Rating of Instructor and Overall Rating of Course
| Fixed Portion of Model | Instructor Rating B | Instructor Rating SE B | Instructor Rating t | Instructor Rating ΔR² | Course Rating B | Course Rating SE B | Course Rating t | Course Rating ΔR² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Student Level | | | | | | | | |
| Intercept | 4.17 | .15 | 27.93* | | 3.81 | .14 | 27.84* | |
| Instructor Reputation | | | | .10 | | | | .08 |
| Positive Reputation Dummy | .25 | .08 | 3.10* | | .14 | .08 | 1.74 | |
| Negative Reputation Dummy | -.49 | .10 | -5.01* | | -.43 | .10 | -4.48* | |
| Course Difficulty | .11 | .04 | 2.69* | .03 | .10 | .04 | 2.32* | .02 |
| Course Workload | .01 | .04 | 0.22 | .00 | .03 | .04 | 0.65 | .00 |
| Prior Interest in Subject | .09 | .03 | 2.95* | .00 | .20 | .03 | 6.62* | .04 |
| Expected Grade | .11 | .02 | 5.56* | .00 | .10 | .02 | 5.30* | .02 |
| Class Level | | | | | | | | |
| Class Size | -.01 | .02 | -0.85 | .00 | -.02 | .01 | -1.57 | .02 |
| Instructor’s Sex | -.60 | .23 | -2.65* | .06 | -.50 | .21 | -2.39* | .05 |
| Random Portion of Model | | | | | | | | |
| Class-level variance (between classes) | .44 | χ²(36) = 443.88* | | | .36 | χ²(36) = 419.84* | | |
| Student-level variance (within classes) | .67 | | | | .64 | | | |
| R² (total variance modeled) | .21 | | | | .24 | | | |
Note. All predictors, except dummy variables, were centered about their grand means, and df = 36 for each statistical test. Sample sizes were 754 students nested within 39 classes. Partial R², denoted ΔR², is calculated in the normal manner (Pedhazur, 1997), but model variance is calculated by summing both the between and within class variances (Snijders & Bosker, 1999).
* p < .05
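The table note states that model variance is obtained by summing the between-class and within-class variances. As a reader's sketch (not a statistic reported in the article), the same two components from the random portion of Table 4 also give the residual intraclass correlation, that is, the share of variance left unexplained by the predictors that lies between classes. The function name `residual_icc` is hypothetical.

```python
def residual_icc(between_var: float, within_var: float) -> float:
    """Share of the unexplained variance that lies between classes."""
    return between_var / (between_var + within_var)


# Variance components copied from the random portion of Table 4.
models = {
    "instructor": {"between": 0.44, "within": 0.67},
    "course": {"between": 0.36, "within": 0.64},
}

for outcome, v in models.items():
    # Total residual variance is the sum of the two components.
    total = v["between"] + v["within"]
    icc = residual_icc(v["between"], v["within"])
    print(f"{outcome}: total residual variance = {total:.2f}, residual ICC = {icc:.2f}")
```

Under these figures, roughly 40% of the remaining variance in instructor ratings and 36% of the remaining variance in course ratings lies between classes, which is consistent with the significant class-level variance tests in the table.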
Copyright © 2000, Bryan W. Griffin
Last revised on 08 December, 2000 03:23 AM