This article is part of the guide Better, Faster, Stronger: How Learning Engineering Aims to Transform Education.
Imagine you’re an edtech company with thousands of students on your platform. You see an opportunity to make a small change that might improve their learning outcomes, so you roll it out to a group of students who don’t know they are part of the sample. Did you simply practice the same type of A/B testing that’s common throughout the tech sphere? Or rope unwitting students into being the guinea pigs of your experiment without consent?
That’s the ethics debate playing out in one online educator space, a lively back-and-forth carried over from a recent workshop on educational A/B testing.
It’s a timely one, too, as recent news has shown just how tired students are of secret monitoring. In June, Dartmouth College dropped a cheating investigation into medical students following the dubious use of Canvas to track perceived activity during exams. Students across the country have pushed back against the use of test proctoring software, citing the mental toll of constant monitoring and concerns over privacy. But if researchers are A/B testing two innocuous options, what’s the harm?
Providing the Spark
Jenessa Peterson, director of learning engineering at the Learning Agency, touched off the discussion in a Google Group run by her organization with the question: Is A/B testing between two benign conditions without participants' knowledge OK? One example cited was a Pearson A/B test that earned negative media coverage a few years ago. As part of the experiment, students at randomly selected colleges were shown encouraging messages after choosing an incorrect answer during online quizzes, and Pearson later published a paper on its “social-psychological interventions.” Peterson wondered at the concern over the test expressed in media coverage. She also shared research that found participants disapproved of medical A/B testing even if they thought each condition was acceptable on its own. Most people would be fine if Pearson offered either option to all users (software with or without encouraging messages), she writes.
“If both treatments are acceptable alone, why then would it be unacceptable to conduct an experiment, even without participants’ knowledge, to see which treatment leads to better learning outcomes?” Peterson asks on the message board.
In a conversation with EdSurge, Peterson says she would like to see researchers build upon the federal guidelines that protect study participants. Regulations say that researchers don't have to get informed consent to test minor changes that involve normal educational practices, as long as they’re not likely to adversely impact students’ ability to learn the required education content, she explains.
“One of the things I think we really need is a shared set of protocols or a checklist that we could create as a community for researchers to build trust for learner participants and their families,” Peterson says, describing a tool that could be used by researchers around the world which spells out when it’s OK to waive informed consent. “I think the research community should attempt to align and discuss what those standards should be.”
How Harmless?
There’s one snag in that line of thinking: How do you know your A/B is harmless until you test it? And what do you do if it’s not? Those questions were posed in the thread by Collin Lynch, an assistant professor in the Department of Computer Science at North Carolina State University. “A/B approaches, particularly those based upon deception, are experimental by nature and that means that you are subjecting one group to a different treatment than others,” he writes.
In an interview with EdSurge, Lynch poses this scenario: As the result of A/B testing on a pair of classrooms, one of the teachers finds their students performing worse because of variables the teacher did not control. A researcher like Lynch might learn something from the experiment, but some students and their teacher would suffer the consequences. What would be better, he says, is if students experienced one side of the experiment, then switched so they were ultimately exposed to both.
“My general take on it is that simple A/B testing is a useful technique, but education is a unique context and vastly different than an at-will experience like, say, Facebook,” Lynch says. “That’s really what drives my skepticism about blanket use of A/B testing. At some point we do have to experiment, but you have to be careful anytime you’re introducing something that could adversely affect one group over another. Especially if you do so without any form of informed consent or the involvement of the instructors.” He adds that the discussion is ultimately one among practitioners about methods and what’s ethical.
“It carries with it the question of, what’s benign? How do we determine what’s a safe thing to test and what's not?” Lynch says. “That is a research and methodology question we have to discuss.”
Checks and Balances
Those concerns could be addressed through an institutional review board similar to those at universities, suggests Jeff Dieffenbach, associate director of the MIT Integrated Learning Initiative.
“There's of course a continuum, but I suspect that most A/B tests that an education company would run would be benign,” Dieffenbach writes. “Yes, if the difference between A and B is significant, there could be an educational harm, but that harm is likely (although not guaranteed to be) small and temporary.”
Dieffenbach tells EdSurge that in his experience with K-12 research, parents don’t want their child to be in a control group during A/B testing. One way his lab has assuaged that fear is by offering alternatives that are still academically beneficial. If researchers are testing the benefits of a literacy program, children in the control group might receive math, computer science or mindfulness classes, something that doesn’t impact literacy. During such research, Dieffenbach says parents always give fully informed consent, and even that paperwork is approved by an institutional review board, or IRB, which reviews research methods and weighs in on ethical considerations.
“We’re always running experiments. Every time a teacher chooses to do something in a classroom, they're in effect running an experiment against some different thing,” Dieffenbach says. “If we want to make learning better, we should do that incrementally such that we’re not dooming a generation of kids to a totally faulty premise. But at the same time, not moving from where we are is essentially dooming kids to a future that’s not as good as it should be.”
The Responsibility to Test
Christopher Brooks, assistant professor at the University of Michigan School of Information, wrote in the message thread that he has requested a waiver of informed consent from his IRB in cases where participants’ consent might change their response rate or introduce bias. He tells EdSurge that any experiment, including things like questionnaires or interviews, should be approached with care. That’s the benefit of working with an IRB, he adds.
“One of the things I’m super frustrated about is the word ‘experiment’ triggers in people’s mind some sort of mad scientist,” Brooks says, referring to another user on the message board who brought up the Nuremberg Code, a set of ethical principles developed after World War II for mostly medical experiments involving people. “This is not even in the same class of what learning scientists are doing. People at that time were talking about dramatically different, horrible things, not improving education by giving slightly different test questions.”
Something that hasn’t been widely touched on during the debate, Brooks says, is the “huge missed opportunity” to get students involved with research results. “I think we have the opportunity to do translational work, taking the research we do in higher ed and making it accessible to the students that we’re doing that research on/with,” he says.
Steven Ritter, founder and chief scientist at Carnegie Learning, notes on the message board that his company is constantly tweaking its software, whether at the request of a client or to improve the product.
“We’re never going to A/B test everything, but I think we have an obligation to, as much as possible, know whether we’re moving in the right direction,” he writes.
David Porcaro, vice president of learning and innovation at General Assembly, writes that he was involved with the lightning-rod Pearson study that touched off the debate. He says the company concluded, after extensive review, that informing students of the A/B testing would impact the results.
“While the results of the study were not as impactful as everyone had hoped (echoing much of the recent research showing how context matters in applying growth mindset messages in education settings), all included in this study ... learned a lot about where people are and aren't comfortable with in educational A/B testing,” he writes. Users are fine with A/B testing on supplemental course materials and not so much when it comes to material that is graded, Porcaro says, but ultimately the logic of A/B testing in education is “upside down.”
A structured experiment that leads to improvements scares people, Porcaro posits. But an unstructured experiment, like a new feature launch or content tweak, might be seen as an improvement even if it causes harm.