The higher ed internets and social medias are positively abuzz these days with the Carter et al. paper on the effects of computer use in introductory econ at West Point. I've finally gotten around to reading the paper, and I like the study! It is extremely well-designed: it is situated in a real-life classroom setting, not a lab like other computer use studies; it uses real-life exams, not some ungraded quiz that students couldn't care less about; it randomly assigns computer use policies in small course sections of large classes that use a common textbook and a common final exam; it compares course sections taught by different instructors and course sections taught by the same instructor; it includes important controls, does robustness checks with different types of standard errors, and covers almost any objection that I could think of. It's exactly how a quantitative, outcome-focused study in the Scholarship of Teaching and Learning should be conducted. It should be required reading in SoTL workshops.
But I don't buy the substantive conclusion drawn by the authors: that they have shown that the use of laptops and/or other electronic devices had a substantive impact on learning. Here is why.
Well, first, here is why not: I don't have a problem with the whole correlation/causation thing in this study. Sure, correlation is not causation, but the authors do a good job (including the use of 2SLS) of excluding alternative causal stories to the extent possible. I also don't have a problem with the conclusion that the authors found a statistically significant effect of computer/electronic device use. I'm as critical of significance tests as anybody who has used Bayesian analysis, but I buy that Carter et al. find an effect that is clearly significant according to econometric practice.
My problem is that the statistically significant effect is small. (Yes, I know that the authors go out of their way to argue that it's large - more on that below.) Carter et al. present various versions of their analysis, most with a pretty consistent effect of around -0.2. Since the dependent variable is standardized, this means that permitting laptop or electronic device use was associated with a ceteris paribus reduction of the final exam grade (multiple choice and short answer portions) by 0.2 of a standard deviation. On a 100-point scale, that standard deviation was 9 points. In other words, the possibility of computer use in class reduced the average exam grade by less than two points out of 100.
Let's put this in grade terms. The average final exam score in the classes under observation was 72 in the multiple choice and short answer portions. Electronics use in the classroom was equivalent to reducing that score to a 70.2.
News flash: computer use in the classroom reduces a C- to a low C-.
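(If you want to check the arithmetic yourself, here is a minimal back-of-the-envelope sketch in Python. The numbers are the ones quoted above - an effect of roughly 0.2 standard deviations, a standard deviation of 9 points, and a mean of 72 - and the variable names are mine.)

```python
# Back-of-the-envelope check: convert the reported effect size into exam points.
effect_sd = -0.2    # treatment effect in standard deviations (roughly what Carter et al. report)
sd_points = 9       # standard deviation of the final exam score, on a 100-point scale
mean_score = 72     # average multiple choice + short answer score

effect_points = effect_sd * sd_points      # -1.8 points out of 100
new_mean = mean_score + effect_points      # 70.2

print(f"Effect in exam points: {effect_points:.1f}")            # -1.8
print(f"Average score with devices permitted: {new_mean:.1f}")  # 70.2
```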
Significance is not substance. This looks pretty insubstantial to me. But the authors make the argument that the effect is in fact meaningful, even large, and they do so by comparing it to the effects of other factors identified in other studies. The currency of comparison is the standard deviation: Carter et al. point to the fact that "Aaronson, Barrow, and Sander (2007) find that one standard deviation improvement in teacher quality increases test scores by 0.15 standard deviations" in a study of high-school students; as a result, they conclude, permitting laptops in class is worse than a one-standard-deviation drop in teacher quality. This assumes, of course, that a standard deviation on a West Point intro to econ exam is equivalent to a standard deviation on a high school test score. What's the apple and what's the orange here?
There's another aspect of the Carter et al. study that makes me wary of its conclusion that it shows a reduction of learning due to laptops in the classroom. What worries me is the 72-point average that I mentioned above. You put highly motivated young people who were selected on the basis of academic merit and test scores into a small class taught by an expert with a graduate degree - and at the end of the semester the average score is a C-? That's puzzling! But it's not uncommon (I should take another look at my own classes!), and I think, based on my own experience but no actual data analysis, that there are three likely explanations.

First, the final exam may not be well aligned with what was taught in the course. Most frequently this happens when exam questions cover material that was mentioned somewhere in the class material but never emphasized, or when they target marginal details rather than the core facts and concepts. A popular question of this type asks for the precise value of some number that was used to exemplify a course concept in the textbook. Such exams measure not only whether students have understood and can use the course material but also how well they can memorize details.

Second, the exam may consist of so many questions that a fair number of students lose points simply because they don't get through it. In that case, the exam measures how skilled students are at working quickly in a particular exam format, in addition to measuring learning.

Third, the instructors may aim for a particular grade curve. It could be that they want only a certain percentage of top students to continue in the major, or they may want to identify top students for particular honors or other special treatment.

These explanations are not mutually exclusive and, depending on the goals of the instructor and the educational program, they may be legitimate. But they suggest that the exam scores may have limited measurement validity as indicators of learning. In other words, computer use may (slightly) reduce something, but it is not necessarily learning, or not all learning.
The last point raises an interesting question about an intriguing detail of the Carter et al. study: while computer use in the classroom had a significant effect on multiple choice and short answer scores, the effect was absent for the essay portion of the exam, where the instructor effect was much stronger. The authors, in response, dismissed the essay grade as a bad indicator of student learning. Could it be, though, that the opposite was true, and that at least some instructors used the essay grade to correct for the low standardized scores of students whom they knew, from classroom experience, office hours, and the like, to be bad test-takers who had nevertheless learned a lot?
In any case, I am generally happy about the results found by Carter et al.: once you control for all the other factors that influence student performance, it barely matters whether instructors permit electronic devices in the classroom or not. Considering the many problems associated with prohibiting computers in the classroom, this is good news! Of course, the results do not tell us how instructors and students can use all the tools, electronic and otherwise, at their disposal to actually increase their learning, but that's the next conversation to be had.