Scaling up vs. spreading out

I’ve been thinking about scale lately. Not the musical sort, but the concept of size. One of the common questions asked in education research is “how well does this scale up?” It’s already no small feat to invent a practice or intervention that makes a measurable difference to students on the scale of a classroom or school. It’s another thing entirely to “scale it up” to a district, state, or nation.

But is “scaling up” even a proper goal? A recent article in The New Republic calls into question the wisdom of scaling up successful interventions in the international development arena. The basic argument is that “context matters.” To quote from the article:

The repeated “success, scale, fail” experience of the last 20 years of development practice suggests something super boring: Development projects thrive or tank according to the specific dynamics of the place in which they’re applied. It’s not that you test something in one place, then scale it up to 50. It’s that you test it in one place, then test it in another, then another. No one will ever be invited to explain that in a TED talk.

The scaling up dynamic seems to work reasonably well in some fields of medicine. A drug or vaccine is developed, tested on a small group of volunteers, found effective in a larger scale trial, and eventually comes to market. The drug “scales up” after passing several smaller-scale tests. We don’t feel the need to re-test the efficacy of a polio vaccine in each and every city of the US, for instance.

But medical research and social research differ in a key way: the nature of the “treatment.” In medicine the treatment is often a pill – something that can be standardized, and whose mechanism relies more on chemistry than on behavior or cognition. That is, we know that if you administer a polio vaccine to 100,000 children, some (high) percentage will develop immunity to polio. It’s basic immunology. Sure, there will be outliers for whom it doesn’t work, and we need to have backup plans for those few.

But the New Republic article cites a similar, medicine-like intervention: de-worming children. After implementing a de-worming program in Kenya, the researchers found:

The deworming pills made the kids noticeably better off. Absence rates fell by 25 percent, the kids got taller, even their friends and families got healthier. By interrupting the chain of infection, the treatments had reduced worm infections in entire villages. Even more striking, when they tested the same kids nearly a decade later, they had more education and earned higher salaries. The female participants were less likely to be employed in domestic services.

So of course one would want to scale up that intervention. It’s a pill, after all. Completely standardized and easy to administer. Why not de-worm an entire nation? Well, they tried that program within several states of India, and the results were… unclear. I strongly suggest you read the full article for the nuances, but here’s the author’s punch line for this part of the argument:

In 2000, the British Medical Journal (BMJ) published a literature review of 30 randomized control trials of deworming projects in 17 countries. While some of them showed modest gains in weight and height, none of them showed any effect on school attendance or cognitive performance. After criticism of the review by the World Bank and others, the BMJ ran it again in 2009 with stricter inclusion criteria. But the results didn’t change. Another review, in 2012, found the same thing: “We do not know if these programmes have an effect on weight, height, school attendance, or school performance.”

The underlying point is this: many, many things contribute to children’s health, school attendance, and intellectual development. Carrying parasites is only one of the many problems they face. Eradicating worms is unquestionably a “good thing” in itself, but it may not (re)produce the outcomes one was expecting. And, like it or not, lots of “good things” have to be weighed against one another when resources are limited.

So if something as controlled and unproblematic as giving a pill can have radically different results in different social settings, how in the world do we contemplate scaling up a much less standardized intervention like giving every child a laptop? Changing a textbook for an entire state? (I note in passing that textbooks are not like pills – teachers pick and choose which aspects of a book to use and which to ignore). Much of the Big Money in education research is looking for magic interventions that are scalable. As one who works in the trenches of evaluation, I can tell you just how hard it is to tease apart why some things work in some situations and not others.

Thus far I’m convincing myself that we should be conservative with our scaling-up desires. Local context matters. Now let’s ask the question: when can one make a case for universal policy? I suspect this is a no-brainer for Policy 101 students, but my last policy analysis class was taught by an active alcoholic while I was in the throes of an undergraduate depression.

We have a national civil rights policy – rooted in education policy – that prohibits de jure discrimination by race. (Whether de facto discrimination is addressed appears to depend on the priorities of a given administration). But we don’t permit “local control” or “local choice” with matters of racial discrimination, nor should we. I can think of other nasty actions we try to outright prohibit on a large scale: corporal punishment and gender discrimination come to mind.

So where does local control run the risk that the locals will “get it wrong”? Where does a one-size-fits-all policy run the risk of having no effect or, worse, causing harm to a significant segment of the population? In particular, I’m thinking of the Common Core standards movement, an attempt to bring unity to academic standards across the states (and, as a consequence, make it much easier and cheaper to have a single national achievement test). What “problem” does Common Core purport to solve? And is it as likely as not to cause problems if local variation is not supported?

I welcome any reader to chime in with a thought.


Starting to bite the (non-magical) bullet

I’ve been deferring writing about my professional life during NaBloPoMo. Have you ever faced a project so large and complicated that it’s difficult to even know where to begin? That’s what it feels like when I try to put down on (e-)paper my thoughts about 21st-century education.

Another inhibition I’m feeling has to do with professional pride. I don’t mind blathering about topics my paycheck and professional reputation don’t depend on. When I write about education – particularly the learning sciences – I feel the need to start citing sources, dotting i’s and crossing t’s. Another bad professional habit is a reluctance to comment in areas outside my particular specialties. For example, I know a lot has been written on the history of education reforms, but I have only a fairly cursory understanding of that history. In my blogging meanderings I’m likely to “discover” ideas that have been well known for decades. So first, an explicit disclaimer: I’m going to be using this blog to throw some wet clay on the wheel and start shaping components of arguments. Please join in by writing in the comments section. Let’s be enthusiastic amateurs together for the sake of discovery.

(Regarding the title of this post – one theme through this post and others in this thread will be that there are no magic bullets. This I feel confident in stating unequivocally.)

Theme of Variation

I came across a blurb on the web site of CAST (Center for Applied Special Technology):

The Average Learner Myth

Typically, a “mythical average learner” is used as the basis for creating a “one size fits all” curriculum. Often attempts to address learner variability take a remediation approach that emphasizes how individuals who least resemble the “mythical average” can overcome the ways in which they are different…

The most consistent finding to emerge from the interdisciplinary study of learning is that when it comes to learning, natural variability is the rule, not the exception. What is perhaps most important to understand about learner variability is not that it exists, but that not all of it is random. Because some variability is systematic, you can design for it in advance. This approach is called Universal Design for Learning (UDL). UDL is an educational framework that guides the design of learning goals, materials, methods, and assessments as well as the policies surrounding these curricular elements with a diversity of learners in mind.

The idea I want to focus on is variation and all that entails. In our daily communication, particularly in the media, we often talk about averages and trends: stock prices are rising, math scores are falling, the unemployment rate is below 6%. Statisticians call these measures of central tendency – they describe a sense of “center” in a collection of stuff. And they’re very useful – measures of center are the most frequently cited statistics by far.
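To make “measures of center” concrete, here is a minimal Python sketch computing the two most familiar ones, the mean and the median, with the standard-library `statistics` module. The scores are made-up numbers for illustration, and the helper name `center` is hypothetical.

```python
from statistics import mean, median

def center(values):
    """Return two common measures of central tendency: (mean, median)."""
    return mean(values), median(values)

# Hypothetical test scores for one class (illustrative, not real data).
scores = [52, 61, 67, 70, 73, 75, 78, 81, 88, 95]
avg, mid = center(scores)
print(avg, mid)  # → 74 74.0
```

Here the two measures agree; both summarize the whole list with a single “central” number and say nothing about how far individual scores sit from it.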

I want to focus on the second-most-used statistic, a measure of dispersion or “spread” (when I teach a research methods course, I often use the colloquial terms “center” and “spread” to describe these two common statistics). Stock prices are rising… well, are they all rising? To the same degree? Probably not, and knowing something about how strongly individual stocks (or collections of stocks) vary from one another is important. Math scores are falling… everyone’s? Are there identifiable subgroups? And isn’t some variation in individual scores due to the random luck of having particular items on the test? The unemployment rate is an interesting kind of “average,” as it’s composed of just two values: 1 (unemployed) and 0 (employed). Nobody is 6% unemployed (okay, I know, we can talk about partial or part-time employment, or number of hours worked as a fraction of the desired number of working hours, which means one could in theory be over-employed).
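A small Python sketch of the same idea, using made-up numbers: two hypothetical classes share an average but differ enormously in spread, which the center alone would hide. The last lines treat the unemployment rate as the mean of a 0/1 indicator over a hypothetical labor force of 50 people. `center_and_spread` is an illustrative helper; I use the population standard deviation (`statistics.pstdev`) as the measure of spread, though other measures (range, interquartile range) would make the same point.

```python
from statistics import mean, pstdev

def center_and_spread(values):
    """Return (mean, population standard deviation) of a list of numbers."""
    return mean(values), pstdev(values)

# Two hypothetical classes: same average, very different spread.
class_a = [72, 73, 74, 75, 76]
class_b = [50, 62, 74, 86, 98]
for name, scores in [("A", class_a), ("B", class_b)]:
    avg, sd = center_and_spread(scores)
    print(f"class {name}: mean={avg}, sd={sd:.2f}")
# class A: mean=74, sd=1.41
# class B: mean=74, sd=16.97

# An unemployment rate is the mean of a 0/1 indicator
# (1 = unemployed, 0 = employed) over a hypothetical labor force of 50.
labor_force = [1] * 3 + [0] * 47
print(mean(labor_force))  # → 0.06
```

Each person contributes a 0 or a 1, yet the average lands at 0.06, a value no individual actually has.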

Variation, dispersion… it’s no exaggeration to argue that life itself could not function without it. Biological evolution critically depends on variation, in particular variation in “fitness” for passing on one’s genes. Fitness is a relative concept – an organism is not universally “fit,” but is fit insofar as it can function well within its environment. Change the environment, and the organism’s fitness may rise or fall.

Now, there are places where we want to tame variation. Manufacturing comes to mind, particularly when safety is a concern. In producing turbines for aircraft engines we don’t want variation in the stiffness of the material or the weight of the individual fan blades. Variation in that case is a problem – drift too far away from the center and things start to go wrong very quickly. So let’s bear in mind that in some cases, variation is a good thing, and in other cases variation is to be avoided.

Education is a process of guiding human development. So, do we want lots of Darwinian variation, whereby some people are more “fit” for their environment than others, or do we want aircraft manufacturing, with very tight tolerances for assuring uniformity in the components? (Hint: it’s not a black-and-white answer. “It depends.”)

Many of the arguments in education policy have underlying them a position regarding variation. This position is not always explicitly articulated, not for nefarious reasons, but because one’s belief in the vice/virtue of diversity seems so obvious as to be not worth stating. I am going to argue that attending to variation, diversity, call it what you will, is not simply a “politically correct” genuflection, but rather an essential part of any intelligent discussion of educational policy.

Standards and Standardization

The idea of “standards” gets thrown around a lot. “We need higher standards in education” is a popular political slogan. A “standard” is more a concept of location than of dispersion. Remember, we can think in terms of “center” and “spread.” Standards tend to stick a pin in a location and say “everything here and above = good.”
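To illustrate “sticking a pin,” here is a hedged Python sketch: a hypothetical cut score of 70 and two hypothetical schools. Both report the same share of students meeting the standard, yet their spreads are very different, which a single pass rate does not reveal. The function name `share_meeting_standard` and all numbers are assumptions for illustration only.

```python
from statistics import pstdev

def share_meeting_standard(scores, cut_score):
    """Fraction of scores at or above the cut score ("here and above = good")."""
    return sum(1 for s in scores if s >= cut_score) / len(scores)

# Hypothetical cut score and two hypothetical schools.
CUT_SCORE = 70
school_x = [68, 69, 71, 72, 73]
school_y = [40, 55, 71, 90, 99]

for name, scores in [("X", school_x), ("Y", school_y)]:
    # Print the pass rate, then the spread rounded to one decimal place.
    print(name, share_meeting_standard(scores, CUT_SCORE), round(pstdev(scores), 1))
# X 0.6 1.9
# Y 0.6 21.7
```

Both schools come out at 60% meeting the standard, while the standard deviations (about 1.9 vs. about 21.7) tell very different stories about the students behind that number.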

I’m not trying to argue that standards are good or bad, just what they entail, and what we really mean by them. Again, standards for aircraft engines are a non-controversial good thing.

I think it was Lee Shulman who wrote about the difference between Michelin Guide standards and Board of Health standards for restaurants. The Michelin web site is pretty explicit:

Michelin inspectors eat their way around the globe, selecting outstanding restaurants for a diverse range of comforts, tastes and budgets. Travelers and locals alike rely on the red MICHELIN Guides to help them discover exceptional meals and hidden gems in neighborhoods across cities and regions.

Note the emphasis on diversity (“a diverse range of comforts, tastes and budgets”). A Michelin rating combines elements of a central location (“this restaurant is consistently good”) with elements of variation. When we look more closely at the criteria Michelin inspectors use, we find:

One star indicates a very good restaurant in its category, offering cuisine prepared to a consistently high standard…

So there is that coupling of central tendency and spread. The idea of variation shows up in two different ways in that sentence. First, we’re certifying a “very good restaurant in its category,” which implies that there can be several senses in which a restaurant may be very good. A very good Thai restaurant may not rate very highly if a customer is in the mood for Italian food. A second way dispersion appears is in the phrase “consistently high standard” – consistency tends to imply low variation (again, think aircraft engine components). We don’t want any bad surprises here (but for higher star ratings, we might delight in pleasant surprises).

I think we can agree that we would like the concept of Michelin standards for, say, schools and other educational institutions. There is that nagging question of “in its category,” though, and we’ll have to come back to that. We also want low variation in terms of meeting or exceeding a standard. Again, there is this dual sense: an expanded repertoire of standards by category, but also a sense of quality control.

I can’t remember whether it was Lee Shulman or his colleague Elliot Eisner who asked, rhetorically, “Does anybody really want to go to a restaurant for a standardized meal?” (Okay, yes, there are times when the sight of a McDonald’s in a foreign land might actually attract American tourists.) What he’s referring to is the boring predictability of a “standard” meal. Within certain bounds, we enjoy variety. We may particularly delight in being pleasantly surprised.

I’m running out of steam at this point. Let me wrap up by noting that standards have their place and use, and we need to be aware of what we mean by standards. Also, what sort of situational variation are we allowing for in our standards (think high-quality Thai food vs. high-quality Italian food)? Last, let’s remember CAST’s claim that “when it comes to learning, natural variability is the rule, not the exception.” We need to continue to think about natural variability in learning against a backdrop of “standards.”