Scaling up vs. spreading out

I’ve been thinking of scale lately. Not the musical sort, but the concept of size. One of the common questions asked in education research is “how well does this scale up?” It’s already no small feat to invent a practice or intervention that makes a measurable difference to students on the scale of a classroom or school. It’s another thing entirely to “scale it up” to a district, state, or nation.

But is “scaling up” even a proper goal? This recent article in The New Republic calls into question the wisdom of scaling up successful interventions in the international development arena. The basic argument is that “context matters.” To quote from the article:

The repeated “success, scale, fail” experience of the last 20 years of development practice suggests something super boring: Development projects thrive or tank according to the specific dynamics of the place in which they’re applied. It’s not that you test something in one place, then scale it up to 50. It’s that you test it in one place, then test it in another, then another. No one will ever be invited to explain that in a TED talk.

The scaling up dynamic seems to work reasonably well in some fields of medicine. A drug or vaccine is developed, tested on a small group of volunteers, found effective in a larger scale trial, and eventually comes to market. The drug “scales up” after passing several smaller-scale tests. We don’t feel the need to re-test the efficacy of a polio vaccine in each and every city of the US, for instance.

But medical research and social research are different in key ways. First, the nature of the “treatment” in medicine is often a pill – something that can be standardized, and whose mechanism relies more on chemistry than on behavior or cognition. That is, we know that if you administer a polio vaccine to 100,000 children, some (high) percentage will develop immunity to polio. It’s basic immunology. Sure, there will be outliers for whom it doesn’t work, and we need to have backup plans for those few.

But the New Republic article cites a similar medical-like intervention – de-worming children. After implementing a de-worming program in Kenya, the researcher found:

The deworming pills made the kids noticeably better off. Absence rates fell by 25 percent, the kids got taller, even their friends and families got healthier. By interrupting the chain of infection, the treatments had reduced worm infections in entire villages. Even more striking, when they tested the same kids nearly a decade later, they had more education and earned higher salaries. The female participants were less likely to be employed in domestic services.

So of course one would want to scale up that intervention. It’s a pill, after all. Completely standardized and easy to administer. Why not de-worm an entire nation? Well, they tried that program within several states of India, and the results were… unclear. I strongly suggest you read the full article for the nuances, but here’s the author’s punch line for this part of the argument:

In 2000, the British Medical Journal (BMJ) published a literature review of 30 randomized control trials of deworming projects in 17 countries. While some of them showed modest gains in weight and height, none of them showed any effect on school attendance or cognitive performance. After criticism of the review by the World Bank and others, the BMJ ran it again in 2009 with stricter inclusion criteria. But the results didn’t change. Another review, in 2012, found the same thing: “We do not know if these programmes have an effect on weight, height, school attendance, or school performance.”

The underlying point is this: many, many things contribute to children’s health, school attendance, and intellectual development. Carrying parasites is one of many problems they face. It’s not that eradicating worms isn’t by itself an unquestioned “good thing,” but it may not (re)produce the outcomes one was expecting. And, like it or not, lots of “good things” have to get rated against one another when resources are limited.

So if something as controlled and unproblematic as giving a pill can have radically different results in different social settings, how in the world do we contemplate scaling up a much less standardized intervention like giving every child a laptop? Changing a textbook for an entire state? (I note in passing that textbooks are not like pills – teachers pick and choose which aspects of a book to use and which to ignore). Much of the Big Money in education research is looking for magic interventions that are scalable. As one who works in the trenches of evaluation, I can tell you just how hard it is to tease apart why some things work in some situations and not others.

Thus far I’m convincing myself that we should be conservative with our scaling up desires. Local context matters. Now let’s ask the question: when can one make a case for universal policy? I suspect this is a no-brainer for Policy 101 students, but my last policy analysis class was taught by an active alcoholic while I was in the throes of an undergraduate depression.

We have a national civil rights policy – rooted in education policy – that prohibits de jure discrimination by race. (Whether de facto discrimination is addressed appears to depend on the priorities of a given administration). But we don’t permit “local control” or “local choice” with matters of racial discrimination, nor should we. I can think of other nasty actions we try to outright prohibit on a large scale: corporal punishment and gender discrimination come to mind.

So where does local control run the risk that the locals will “get it wrong?” Where does a one-size-fits-all policy run the risk of having no effect or, worse, causing harm to a significant segment of the population? In particular I’m thinking of the Common Core standards movement, an attempt to bring unity to academic standards across the states (and, as a consequence, make it much easier and cheaper to have a single national achievement test). What “problem” does Common Core purport to solve? And is it as likely as not to cause problems if local variation is not supported?

I welcome any reader to chime in with a thought.

Diversity and dispositions

Another post that touches on my professional life as an educator / researcher. This too may take multiple installments to get the thoughts fully fleshed out, but I want to start sketching out the issue.

In a previous post on education I brought up the issue of diversity and variation. Here’s a snippet of what I wrote:

Variation, dispersion… it’s no exaggeration to argue that life itself could not function without it. Biological evolution critically depends on variation, in particular variation in “fitness” for passing on one’s genes. Fitness is a relative concept – an organism is not universally “fit,” but is fit insofar as it can function well within its environment. Change the environment, and the organism’s fitness may rise or fall.

Now, there are places where we want to tame variation. Manufacturing comes to mind, particularly when safety is a concern. In producing turbines for aircraft engines we don’t want variation in the stiffness of the material or the weight of the individual fan blades. Variation in that case is a problem – drift too far away from the center and things start to go wrong very quickly. So let’s bear in mind that in some cases, variation is a good thing, and in other cases variation is to be avoided.

Education is a process of guiding human development. So, do we want lots of Darwinian variation, whereby some people are more “fit” for their environment than others, or do we want aircraft manufacturing, with very tight tolerances for assuring uniformity in the components? (Hint: it’s not a black or white answer. “It depends.”)

I want to come back to this question of where we desire variation, where we want to control or eliminate it, and what a “healthy” balance looks like both within and between individuals. In particular, I want to discuss interests, attitudes, and dispositions. This is going to draw on some ideas I’ve been sketching out on standards and standardization, as well as attitudes among middle schoolers.

Let’s start with interests writ large (I’m not going to analytically parse an exact definition of “interest” – let’s stick with the colloquial). It’s not controversial to hope that children and adults have a variety of healthy interests: sports, music, arts, academic subjects, Civil War re-enactments, bird watching, you name it. It also seems to be a generally agreed desire that children at least try a variety of things, and probably adopt a much smaller number as “main” interests, while continuing to cultivate a habit of curiosity and openness to new experiences.

Now let’s dive down a level. Parents and adults will differ on some of the particulars. Most would hope that their children develop some sort of strong interest in a socially acceptable, personally fulfilling and economically beneficial domain – arts, engineering, business, and the like.  But I doubt most parents would find a complete lack of interest in any of these things perfectly okay – we’d worry about the child, not just for their future but for their present sense of well-being. A child who exhibits little interest in anything may be exhibiting signs of depression. (Yes, I realize a child can have strong interests in anti-social domains, too. If I try to footnote every exception this will start to read like an academic journal article. I’m trying to avoid that).

So I’m going to postulate this: we care that our children develop interest(s) in some domains that we would consider “healthy” (a shorthand for fulfilling, productive and pro-social). But we don’t necessarily care about the particulars: computer science or culinary science; martial arts or fine arts. Or rather, across a society, individuals may care about these distinctions, but as a whole there’s a healthy mix of healthy interests.

It almost goes without saying that there are activities, pursuits, and lifestyles that few of us would wish for our children. Drug addiction leads to varying circles of Hell. Does anybody really want – as a first choice – that their child grow up to be an assassin for a gang? Backing off the obviously criminal, most of us would probably want more for our children than to sit on a sidewalk begging for spare change.

We have a healthy mix of positives, and a somewhat clear set of universal negatives. Are there any “must have” positives, something that pretty universally every adult wants for his/her child? And I mean this with some degree of specificity – not just “I wish my child to be fulfilled and happy.”  (This reminds me of a joke about a Jewish mother telling her son he can be anything he wants: a cardiologist, a neurologist, a dermatologist, a surgeon…)  At the moment I can’t think of any that jump out. Perhaps grow into a healthy romantic relationship of their own?

My main point, though, is that while we may have universal wishes for our children at a particular level of generality (I want my kids to find fulfilling work), we may disagree or even have no opinion about the particulars.

Now let’s talk about the “STEM crisis.” (STEM stands for Science, Technology, Engineering and Math). Lots of hand-wringing about how we aren’t producing enough STEM graduates in our schools. In particular, there are too few women choosing STEM careers. I’m asking – are these really problems?

What does it mean that we aren’t producing enough STEM-ready graduates? Generally it means that there are open jobs available on the market and not enough qualified individuals to fill them. In the US, that also means lobbying Congress to open up visas for skilled immigrants. But as economists and others have argued, this is not a STEM problem, it’s an economics problem. Basic supply-demand theory says if you raise the salaries for STEM employees, you’ll end up with a greater number of qualified applicants knocking on the door. So it’s not that there’s a STEM worker shortage – there is a shortage of workers willing to work for the current salary ranges. Edit: this article from Businessweek makes the same claim.

I believe the supply-demand argument works up to a point. At some point, though, we’re going to bump into an interest limit. That is, there are reasonable, intelligent people who would say “I don’t care how much you want to pay me; you couldn’t pay me enough to major in engineering. I’d rather starve and live on the streets.” Perhaps this is a first-world problem, that those who have grown up in true poverty and suffering just couldn’t understand. But anybody who has been to an American university has run into students with this attitude. And it isn’t just STEM – change the subject to social work, kindergarten teaching, marketing… you’ll find people who would rather gouge their eyeballs out than partake of that work.

Likewise, what does it mean that there aren’t enough women interested in STEM careers? Superficially, it means that the proportions of women are lower than those of men in terms of STEM interest. Some of this, as has been documented, is due to barriers to women’s entry, including discouragement from teachers and an exclusionary culture in some STEM fields. So let’s assume that some of that gender disparity is due to structural impediments imposed from the outside. Still, at some point we’re going to hit the barrier defined by intrinsic interest – surely not every woman or man wants to go into a STEM field. And if not every, what is the “natural” base rate of interest? (again, given that this base rate is partially sensitive to the perceived rewards).

I’m choosing career interest as my illustrative case – we care that children become interested in something positive, but may care less about the actual details. What other choices are we happy with leaving up to general variation? Not every child will want to take up music for starters, and those that do will have different preferences for instruments and genres.

If we step back and think of our education system, there is not a lot of respect or room given for diversity of interests, at least until the upper levels of high school. The curriculum from K to roughly grade 10 is relatively standard. We give all kids a taste of everything – some they will take to, some they will want to reject, but they are required to at least try it (sort of like making sure kids eat their vegetables?). And we select winners (at least for university admissions) based on whether they were able to succeed (i.e., get A’s) at subjects whether or not they actually enjoyed them. There’s a whole other topic for discussion, but I wouldn’t be the first to question the social consequences of that selection policy.

Specialization appears to be something that is left to the after-school or non-school part of a child’s life. Perhaps that is fine. I just want to mark that as the case.

That’s all I’m going to write for now. My main point was to push back a bit on the hand-wringing over the “STEM crisis” and distinguish between general and specific wishes for our children. This is a work in progress, but at some point I want to develop a clearer idea of how variation plays out in society and education.

When prior scores are not destiny

This post is for statistics and assessment wonks. I’ve been really engaged in a bit of data detective work, and one of my findings-in-progress has whacked me upside the head, making me re-think my interpretation of some common statistics.

Here’s the setup. In lots of educational experimental designs we have some sort of measure of prior achievement – this can be last year’s end-of-year test score, or a pre-test administered in early Fall.  Then (details vary depending on the design) we have one group of students/teachers try one thing, and another group do something else. We then administer a test at the end of the course, and compare the test score distributions (center and spread) between the two groups. What we’re looking for is a difference in the mean outcomes between the two groups.

So, why do we even need a measure of prior achievement? If we’ve randomly assigned students/teachers to groups, we really don’t. In principle, with a large enough sample, those two groups will have somewhat equal distributions of intellectual ability, motivation, special needs, etc. If the assignment isn’t random, though – say one group of schools is trying out a new piece of software, while another group of schools isn’t – then we have to worry that the schools using software may be “advantaged” as a group, or different in some other substantial way. Comparing the students on prior achievement scores can be one way of assuring ourselves that the two groups of students were similar (enough) upon entry to the study.  I’m glossing over lots of technical details here – whole books have been written on the ins and outs of various experimental designs.

Here’s another reason we like prior achievement measures, even with randomized experiments: they give us a lot more statistical power. What does that mean? Comparing the mean outcome score of two groups is done against a background of a lot of variation. Let’s say the mean scores of group A are 75% and group B are 65%. That’s a 10 percentage point difference. But let’s say the scores for both groups range from 30% to 100%. We’re looking at a 10 point difference against a background of a much wider spread of scores. It turns out that if the spread of scores is very large relative to the mean difference we see, we start to worry that our result isn’t “real” but is in fact just an artifact of some statistical randomness in our sample. In more jargon-y language, our result may not be “statistically significant” even if the difference is educationally important.
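To make the “difference against a background of spread” idea concrete, here is a minimal sketch of a two-group t statistic. Only the 75% and 65% means come from the example above; the standard deviations, the group sizes, and the function name `welch_t` are assumptions chosen purely for illustration.

```python
import math

def welch_t(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Welch's t statistic: the mean difference divided by its standard error."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# Same 10-point difference in means, two hypothetical amounts of spread.
# The SDs (20 and 5) and the group size (10 per group) are assumed values.
t_wide_spread = welch_t(75, 65, sd_a=20, sd_b=20, n_a=10, n_b=10)
t_narrow_spread = welch_t(75, 65, sd_a=5, sd_b=5, n_a=10, n_b=10)

print(f"wide spread:   t = {t_wide_spread:.2f}")
print(f"narrow spread: t = {t_narrow_spread:.2f}")
```

With the wider spread the statistic lands near 1.1, well short of the rough “about 2” rule of thumb for significance, while the narrower spread puts the very same 10-point difference near 4.5.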

Prior scores to the rescue. We can use these to eliminate some of the spread of outcome scores by first using the prior scores to predict what the outcome scores would likely be for a given student. Then we look at the mean difference of the two groups against not the raw spread of scores, but the spread of scores around their predicted values (the residuals). That ends up reducing a lot of the variation in the background and draws out our “signal” against the “noise” more clearly. Again, this is a hand-wavy explanation, but that’s the essence of it. (A somewhat equivalent model is to look at the gains from pretest to posttest and compare those gains across groups. This requires a few extra conditions but is entirely feasible and increases power for the same reasons).
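A small simulation can illustrate how a predictive pretest shrinks the background variation. Everything here is invented for the sketch: the score distributions, the 5-point treatment effect, and the helper names. It is a simplified stand-in for a proper covariate-adjusted analysis, not a description of any particular study.

```python
import random
import statistics

random.seed(0)

def simulate(n_per_group=200, effect=5.0):
    """Return (group, pretest, posttest) rows; all parameters are assumed values."""
    rows = []
    for group in (0, 1):
        for _ in range(n_per_group):
            pre = random.gauss(60, 15)
            post = 10 + 0.9 * pre + effect * group + random.gauss(0, 5)
            rows.append((group, pre, post))
    return rows

def center_within_group(rows, column):
    """Subtract each group's own mean from the chosen column."""
    means = {
        g: statistics.fmean([r[column] for r in rows if r[0] == g])
        for g in (0, 1)
    }
    return [r[column] - means[r[0]] for r in rows]

rows = simulate()
pre_centered = center_within_group(rows, 1)
post_centered = center_within_group(rows, 2)

# Pooled within-group least-squares slope of posttest on pretest.
slope = (sum(x * y for x, y in zip(pre_centered, post_centered))
         / sum(x * x for x in pre_centered))
residuals = [y - slope * x for x, y in zip(pre_centered, post_centered)]

raw_spread = statistics.pstdev(post_centered)    # background without the pretest
adjusted_spread = statistics.pstdev(residuals)   # background after using the pretest

print(f"within-group SD of posttest scores: {raw_spread:.1f}")
print(f"within-group SD of residuals:       {adjusted_spread:.1f}")
```

Under these assumptions the residual spread comes out at roughly a third of the raw spread, so the same group difference is judged against much less noise.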

In order for this to work, it is very helpful to have a prior achievement measure that is highly predictive of the outcome. When we have a strong predictor, we can (it turns out) be much more confident that any experimental manipulation or comparisons we observe are “real” and not due to random noise. And for many standardized tests across large samples, this is the case – the best predictor of how well a student does at the end of grade G is how well they were doing at the end of grade G-1. Scores at the end of grade G-1 swamp race, SES, first language… all of these predictors virtually disappear once we know prior scores.

What happens in the case when the prior test scores don’t predict outcomes very well? From a statistical power perspective, we’re in trouble – we may not have reduced the “noise” enough to detect our signal. Or, it could indicate technical issues with the tests themselves – they may not be very reliable (meaning the same student taking both tests near to one another in time may get wildly different scores). In general, I’ve historically been disappointed by low pretest/posttest correlations.

So today I’m engaged in some really interesting data detective work. A bunch of universities are trying out this nifty new way of teaching developmental math – that’s the course you have to take if your math skills aren’t quite what are needed to engage in college-level quantitative coursework. It’s a well-known problem course, particularly in the community colleges: students may take a developmental math course 2 or 3 times, fail it each time, accumulate no college credits, and be in debt after this discouraging experience. This is a recipe for dropping out of school entirely.

In my research, I’ve been looking at how different instructors go about using this nifty new method (I’m keeping the details vague to protect both the research and participant interests – this is all very preliminary stuff). One thing I noticed is that in some classes, the pretest predicts the posttest very accurately. In others, it barely predicts the outcome at all. The “old” me was happy to see the classrooms with high prediction – it made detecting the “outlier” students, those that were going against all predicted trends, easier to spot. The classes with low prediction were going to cause me trouble in spotting “mainstream” and “outlier” students.

Then it hit me – how should I interpret the low pretest-posttest correlation? It wasn’t a problem with test reliability – the same tests were being used across all instructors and institutions, and were known to be reliable. Restriction of range wasn’t a problem either (although I still need to document that for sure) – sometimes we get low correlations because, for example, everyone aces the posttest – there is therefore very little variation to “predict” in the first place.
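The restriction-of-range effect is easy to reproduce with made-up numbers. In this sketch the score distributions, the 100-point ceiling, and the `pearson` helper are all assumptions for illustration; the point is only that capping the posttest lowers the correlation even though nothing about the students changed.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

random.seed(1)

# Hypothetical pretest scores, and an easy posttest that most students max out.
pre = [random.gauss(50, 15) for _ in range(2000)]
raw_post = [p + 60 + random.gauss(0, 8) for p in pre]
capped_post = [min(100.0, score) for score in raw_post]  # test ceiling at 100

r_uncapped = pearson(pre, raw_post)
r_capped = pearson(pre, capped_post)
share_at_ceiling = sum(score == 100.0 for score in capped_post) / len(capped_post)

print(f"correlation without a ceiling: {r_uncapped:.2f}")
print(f"correlation with a ceiling:    {r_capped:.2f}")
print(f"share of students at ceiling:  {share_at_ceiling:.0%}")
```

Checking the share of students at the ceiling (or floor) is one quick way to document whether restriction of range could be behind a low correlation.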

Here’s one interpretation: the instructors in the low pretest-posttest correlation classrooms are doing something interesting and adaptive to change a student’s trajectory. Think about it – high pretest-posttest correlation essentially means “pretest is destiny” – if I know what you scored before even entering the course, I can very well predict what you’ll score on the final exam. It’s not that you won’t learn anything – we can have high correlations even if every student learns a whole lot. It’s just that whatever your rank order in the course was when you came in, that’ll likely be your rank order at the end of the course, too. And usually the bottom XX% of that distribution fails the class.
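A toy example with five invented students shows both halves of this claim: a correlation can be perfect while everyone gains, and it can collapse when the lowest scorers gain the most. The scores and the `pearson` helper below are hypothetical, chosen only to make the arithmetic visible.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

pretest = [30, 45, 50, 62, 70]

# Scenario 1: every student gains 25 points, so rank order is untouched.
uniform_post = [score + 25 for score in pretest]

# Scenario 2: everyone still gains, but the lowest scorers gain the most.
adaptive_post = [85, 70, 75, 80, 78]

r_uniform = pearson(pretest, uniform_post)
r_adaptive = pearson(pretest, adaptive_post)
adaptive_gains = [post - pre for pre, post in zip(pretest, adaptive_post)]

print(f"uniform gains:  r = {r_uniform:.2f}")
print(f"adaptive gains: r = {r_adaptive:.2f}, gains = {adaptive_gains}")
```

In both scenarios every student improves; only the second one reshuffles who ends up where, and that reshuffling is what the correlation registers.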

So rather than strong pretest-posttest correlations being desirable for power, I’m starting to see them as indicators of “non-adaptive instruction.” This means whatever is going on in the course, it’s not affecting the relative ranking of students; put another way, it’s affecting each student’s learning somewhat consistently. Again, it doesn’t mean they’re not learning, just that they’re still distributed similarly relative to one another. I’m agnostic as to whether this constitutes a “problem” – that’s actually a pretty deep question I don’t want to dive into in this post.

I’m intrigued for many reasons by the concept of effective adaptive instruction – giving the bottom performers extra attention or resources so that they may not just close the gap but leap ahead of other students in the class. It’s really hard to find good examples of this in general education research – for better or worse, relative ranks on test scores are stubbornly persistent. It also means, however, that the standard statistical models we use are not accomplishing everything we want in courses where adaptive instruction is the norm. “Further research is needed” is music to the ears of one who makes a living conducting research. 🙂

I’m going to be writing up a more detailed and technical treatment of this over the next months and years, but I wanted to get an idea down on “paper” and this blog seemed like a good place to plant it. It may turn out that these interesting classes are not being adaptive at all – the low pretest-posttest correlations could be due to something else entirely. Time will tell.

Starting to bite the (non-magical) bullet

I’ve been deferring writing about my professional life during NaBloPoMo. Have you ever faced a project so large and complicated that it’s difficult to even know where to begin? That’s what it feels like when I try to put down on (e-)paper my thoughts about 21st century education.

Another inhibition I’m feeling has to do with professional pride. I don’t mind blathering about topics my paycheck and professional reputation don’t depend on. When I write about education – particularly the learning sciences – I feel the need to start citing sources, dotting i’s and crossing t’s. Another bad professional habit – a reluctance to comment in areas outside my particular specialties. For example, I know a lot has been written on the history of education reforms, but I only have a fairly cursory understanding of that history. In my blogging meanderings I’m likely to “discover” ideas that have been well known for decades. So first, an explicit disclaimer: I’m going to be using this blog to throw some wet clay on the wheel and start shaping components of arguments. Please join in by writing in the comments section. Let’s be enthusiastic amateurs together for the sake of discovery.

(Regarding the title of this post – one theme through this post and others in this thread will be that there are no magic bullets. This I feel confident in stating unequivocally.)

Theme of Variation

I came across a blurb on the CAST web site (Center for Applied Special Technology):

The Average Learner Myth

Typically, a “mythical average learner” is used as the basis for creating a “one size fits all” curriculum. Often attempts to address learner variability take a remediation approach that emphasizes how individuals who least resemble the “mythical average” can overcome the ways in which they are different…

The most consistent finding to emerge from the interdisciplinary study of learning is that when it comes to learning, natural variability is the rule, not the exception. What is perhaps most important to understand about learner variability is not that it exists, but that not all of it is random. Because some variability is systematic, you can design for it in advance. This approach is called Universal Design for Learning (UDL). UDL is an educational framework that guides the design of learning goals, materials, methods, and assessments as well as the policies surrounding these curricular elements with a diversity of learners in mind.

The idea I want to focus on is variation and all that entails. In our daily communication, particularly in the media, we often talk about averages and trends: stock prices are rising, math scores are falling, the unemployment rate is below 6%. Statisticians call these measures of central tendency – they describe a sense of “center” in a collection of stuff. And they’re very useful – measures of center are the most frequently cited statistics by far.

I want to focus on the second-most used statistic, a measure of dispersion or “spread” (when I teach a research methods course, I often use the colloquial terms “center” and “spread” to describe these two common statistics). Stock prices are rising… well, are they all rising? By the same degree? Probably not, and knowing something about how strongly individual stocks (or collections of stocks) vary from one another is important. Math scores are falling… everyone’s? Are there identifiable subgroups? And isn’t some variation in individual scores due to the random luck of having particular items on the test? The unemployment rate is an interesting measure of “average” as it’s composed of two values: 0 (not employed) and 1 (employed). Nobody is 6% employed (okay, I know, we can talk about partial or part-time employment, or number of hours worked as a fraction of the desired number of working hours, which means one could in theory be over-employed).
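A few lines of code can show why “center” alone hides a lot. The two classes and the 100-person labor force below are invented for illustration: the classes share the same mean with very different spreads, and the unemployment rate falls out as an average of 0s and 1s.

```python
import statistics

# Two hypothetical classes with the same "center" but different "spread".
class_a = [70, 70, 70, 70, 70]
class_b = [40, 55, 70, 85, 100]

mean_a, mean_b = statistics.fmean(class_a), statistics.fmean(class_b)
spread_a, spread_b = statistics.pstdev(class_a), statistics.pstdev(class_b)

print(f"class A: mean {mean_a:.0f}, standard deviation {spread_a:.1f}")
print(f"class B: mean {mean_b:.0f}, standard deviation {spread_b:.1f}")

# A made-up labor force of 100 people: 1 = employed, 0 = not employed.
employed = [1] * 94 + [0] * 6
unemployment_rate = 1 - statistics.fmean(employed)

print(f"unemployment rate: {unemployment_rate:.0%}")
```

No individual in the list is “6% unemployed”; the 6% exists only as a summary of a collection of 0s and 1s.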

Variation, dispersion… it’s no exaggeration to argue that life itself could not function without it. Biological evolution critically depends on variation, in particular variation in “fitness” for passing on one’s genes. Fitness is a relative concept – an organism is not universally “fit,” but is fit insofar as it can function well within its environment. Change the environment, and the organism’s fitness may rise or fall.

Now, there are places where we want to tame variation. Manufacturing comes to mind, particularly when safety is a concern. In producing turbines for aircraft engines we don’t want variation in the stiffness of the material or the weight of the individual fan blades. Variation in that case is a problem – drift too far away from the center and things start to go wrong very quickly. So let’s bear in mind that in some cases, variation is a good thing, and in other cases variation is to be avoided.

Education is a process of guiding human development. So, do we want lots of Darwinian variation, whereby some people are more “fit” for their environment than others, or do we want aircraft manufacturing, with very tight tolerances for assuring uniformity in the components? (Hint: it’s not a black or white answer. “It depends.”)

Many of the arguments in education policy have underlying them a position regarding variation. This position is not always explicitly articulated, not for nefarious reasons, but because one’s belief in the vice/virtue of diversity seems so obvious as to be not worth stating. I am going to argue that attending to variation, diversity, call it what you will, is not simply a “politically correct” genuflection, but rather an essential part of any intelligent discussion of educational policy.

Standards and Standardization

The idea of “standards” gets thrown around a lot. “We need higher education standards” is a popular political slogan. A “standard” is more a concept of location, rather than dispersion. Remember, we can think in terms of “center” and “spread”. Standards tend to stick a pin in a location and say “everything here and above = good.”

I’m not trying to argue that standards are good or bad, just what they entail, and what we really mean by them. Again, standards for aircraft engines are a non-controversial good thing.

I think it was Lee Shulman who wrote about the difference between Michelin Guide standards and Board of Health standards for restaurants. The Michelin web site is pretty explicit:

Michelin inspectors eat their way around the globe, selecting outstanding restaurants for a diverse range of comforts, tastes and budgets. Travelers and locals alike rely on the red MICHELIN Guides to help them discover exceptional meals and hidden gems in neighborhoods across cities and regions.

I’ve underlined the emphasis of diversity. A Michelin rating combines elements of a central location (“this restaurant is consistently good”) with elements of variation. When we look more closely at the criteria Michelin inspectors use we find:

One star indicates a very good restaurant in its category, offering cuisine prepared to a consistently high standard…

So there is that coupling of central tendency and spread. The idea of variation shows up in two different ways in that sentence. First, we’re certifying a “very good restaurant in its category,” which implies that there can be several senses in which a restaurant may be very good. A very good Thai restaurant may not rate very highly if a customer is in the mood for Italian food. A second way dispersion appears is in the phrase “consistently high standard” – consistency tends to imply low variation (again, think aircraft engine components). We don’t want any bad surprises here (but for higher star ratings, we might delight in pleasant surprises).

I think we can agree we like the concept of Michelin standards for, say, schools and other educational institutions. There is that nagging question of “in its category,” though, and we’ll have to come back to that. We also want low variation in terms of meeting/exceeding a standard. Again, this dual sense of an expanded repertoire of standards by category, but a sense of quality control as well.

I can’t remember whether it was Lee Shulman or his colleague Elliot Eisner who asked, rhetorically, “does anybody really want to go to a restaurant for a standardized meal?” (Ok, yes, there are times when the sight of a McDonald’s in a foreign land might actually attract American tourists). What he’s referring to is the boring predictability of a “standard” meal. Within certain bounds, we enjoy variety. We may particularly delight in being pleasantly surprised.

I’m running out of steam at this point. Let me wrap up by noting that standards have their place and use, and we need to be aware of what we mean by standards. Also, what sort of situational variation are we allowing for in our standards (think high-quality Thai food vs. high-quality Italian food)? Last, let’s remember the claim of CAST that “when it comes to learning, natural variability is the rule, not the exception.” We need to continue to think about natural variability in learning against a backdrop of “standards.”

More thoughts on reading & writing

In a comment to my last blog post, drkrisg writes

I have a few thoughts here, so here is my first one 🙂. Do you want people to take you “seriously”? Is that really the question you are asking? Or are you more asking if people will derive pleasure and/or interest from your writing?

My thoughts in that posting were a bit all over the map – some autobiographical, some a virtual primal scream at the frustration of trying to understand “the game” of academic literary analysis, some an inquiry into rhetoric and justification. Let me take a second pass at these and see whether it’s any clearer this time around.

One can “write” for a number of purposes. Entertainment/amusement, provocation, inspiration, persuasion, and literal communication all come to mind. (Edit: here’s an article that explores this question in more depth. And this book looks like an interesting treatment of the “why write” question.) I’m mainly thinking of writing with the purpose of persuasion or illumination, where the author has a definite point to make, be that through fictional or non-fiction genres.

First, just a little more personal background. My doctoral thesis studied how adolescents acquire different forms of “knowing” as they develop. I’ll use a taxonomy from Women’s Ways of Knowing as a way of summarizing what I mean (but read that linked Wikipedia article for more detail)

  • Silence (feeling deaf and dumb)
  • Received Knowing (knowledge as a set of absolute truths received from infallible authorities)
  • Subjective Knowing (the “inner voice”)
  • Procedural Knowing (methods for reliably evaluating knowledge claims)
  • Constructed Knowing (integration of subjective and procedural)

I was particularly interested in how/whether/when adolescents jumped between “received knowing” and “procedural knowing.” Some kids study math as “received knowers” throughout their K-12 careers, while others early on recognize that there is reason and logic behind mathematical claims. Others may treat history as a “received knowing” subject (memorizing names, dates, and places) while some treat historical knowledge as an integration of contexts, perspectives of authors, evaluations of historical sources and multiple perspectives, etc. (What did I learn? Stay tuned for a future post…)

Now, when I come across a blog post or editorial that is essentially non-fiction in nature, I feel confident in my ability to evaluate any claims being made and the warrants for those claims. There are assumptions to be made, for sure. For example, I generally assume an author is not deliberately lying or mis-stating facts (I know, a naive assumption in many cases, particularly with regard to foreign policy). But I can spot logical fallacies, question assumptions behind claims, and otherwise weigh the persuasiveness of an argument.

With fiction the waters become much more murky for me. In “serious” literature the author is often holding up a situation or event or relationship for our inspection. The classics of ancient Greece and Rome were held up as exemplars of good, virtuous living for generations of school children, for example. The “message” of the author (and yes, I’m thinking of cases where authors write to make a point, not solely as entertainment or artistic expression) should never be too overt or the piece comes off as “heavy handed.”

Fictional works need to be studied more deeply, in a sense, than non-fiction writing. With non-fiction you generally know what the author is up to; in fiction it can be more subtle (and yes, I do appreciate the artistry of good fiction writing – that’s not the issue). And here is my fundamental conundrum – fiction may be more or less grounded in reality. By that I mean the author can paint a portrait of a realistic-seeming situation, with realistic-seeming characters, acting in perhaps a not-very-plausible manner.

I guess I need an example, and I’ll pull from some of the criticisms I remember Denis Phillips making about treating fiction as social science research. Basically, the example goes like this: imagine someone telling a story of a teacher in a classroom, and the particular events of that day. They may illustrate what we call “good pedagogy,” student interactions that a seasoned teacher would recognize as realistic – all in all, a fairly plausible story. Perhaps the author is illustrating the virtue of, let’s say, listening deeply to student arguments.

Phillips argues that in spite of its illustrative value (perhaps this narrative showcases a particular concept being taught in an educational methods class), the fact that it never actually happened is worrisome. That is, the story was not constrained by any real facts on the ground. It was plausible, but not strictly true.

Here is why I worry about this – let’s take that story one step further. Now the teacher is a bit impatient with the African American students in his class, and perhaps in the narrator’s mind we see his pitying of the genetic intellectual inferiority of these students. We would all (I hope) recognize this as ignorant racism. The expected narrative arc would show how the teacher eventually changes his mind or finds some other redemption.

But perhaps there is no happy ending to this story – the classroom dynamics are illustrated and this thick dollop of racial prejudice is sitting front and center of the narrative.  Worse still, the university instructor who assigned this story as a reading goes on to talk about the “well known” inferiority of Black students in his class. In that case, a fictional story was used to support a factually inaccurate and morally bankrupt lesson.

Here’s my point – I don’t think one can simply assume that students would necessarily see through this as “unrealistic.” I crafted this example precisely because in some regions of the US and in some social circles (if not now then certainly within the recent past) this is an entirely plausible scenario. The fictional work is used to illustrate a “point” that is simply not grounded in reality.

So when I pick up a piece of fiction – say, about Palestinian and Israeli teens falling in love – I’m going to absorb a lot of attitudes and situational inferences from the work. I’ll probably claim to have learned a lot about what it’s like to be a Palestinian living in Israel – and I won’t have any basis whatsoever for making that claim. All I have, really, is some naive trust in the author’s good intentions. I could be, for all I know, accepting either a sugar-coated version of reality or a horribly prejudiced narrative (or both), and with general ignorance of facts on the ground may find myself forming opinions – strong opinions, even – based on pure fiction.

I’m unresolved on this point – what do I dare draw from fiction as a “lesson,” and what do I hold tentatively, at arm’s length, as an “insight” for further exploration? (Obviously the latter is a safer choice.) Again, the specter of confirmation bias looms – I certainly can’t use a mere “gut check” to evaluate the lesson, for many Southern whites in the 1950s would gut check the story about those poor Negro kids and find that it fits their world view perfectly.

Now if you’re reading this, and you were a literature major in college, or otherwise don’t share my blind spot about reading fiction, you may be chuckling to yourself about how I’ve gotten this all wrong, or how I’m fundamentally mis-understanding how one should approach the deep reading of literature. If so, please leave me a comment! I’d really like to wrap my head around what insights I can reasonably expect from fiction without being led down a “plausible” but otherwise unproductive path.

Back to my roots – middle school math

Once upon a time, I was a high school math teacher. Twice upon a time, actually. Both times I flunked out. Well, not really flunked out – voluntarily withdrew from the profession. The first time I was 23 and teaching in a private school that turned out to be a bit of a cult (no exaggeration). Six years later I was in a public school with a “mentor” teacher who would leave the class with me when he needed to meet with his real estate clients (guess what his moonlighting job was). Sure, I could blame my early exits on the particulars of the situations, but the second go-around also taught me that I really didn’t want to work with adolescents so intensely day after day. I thought I did, but when push came to shove I preferred working with adults, which I’ve done ever since.

Now I live with two 7th graders. It’s homework time. The distributive property. Simplifying algebraic expressions. And everything I’ve been studying about middle-school mathematics teaching and learning is coming to the surface. There’s the basic “how do you teach algebra” question, but I had that pretty well down from the get-go. Then there’s the attitude question: “why do we have to learn this stuff?” Tonight’s sticking point – developing the mental habit of careful accounting for terms and signs. It’s not that the procedure for distributing terms is difficult, but it entails really understanding each component of the process, why a particular transformation is used, and careful bookkeeping. That’s tonight’s struggle.

An example: simplify the left side of 6b – 2(2b – 7) = 21. The tricky part here is keeping track of the minus signs. Actually, working through this has exposed some confusion in the student between unary minus (“eyebrow level minus”, as my old teacher called it) and binary minus (“belly-button minus”). So, what gets distributed here? There are a couple of ways to approach it. One is to treat the term 2(2b-7) as the basic unit to be unpacked, and you get:

6b – [2(2b) – 2(7)] = 21

Then you’ve got to deal with that minus sign before the expression in brackets – essentially distributing a (-1) again.

Or, you can try to distribute (-2) across the expression, and end up with

6b -2(2b) -2(-7) = 21

My student was not tracking the minus signs accurately, in part due to some odd (but not entirely incorrect) use of parentheses that obscured what was happening to the signs of each term.
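
For anyone who wants to check the sign bookkeeping, here is a minimal Python sketch of the two approaches. The function names are my own labels for illustration, and the final step of solving for b is an addition that goes one step past the homework as described; treat it as a sanity check rather than a teaching method.

```python
from fractions import Fraction

def left_side(b):
    """Original left-hand side: 6b - 2(2b - 7)."""
    return 6 * b - 2 * (2 * b - 7)

def bracket_route(b):
    """First approach: unpack 2(2b - 7) as a unit, then subtract the bracket."""
    return 6 * b - (2 * (2 * b) - 2 * 7)

def negative_two_route(b):
    """Second approach: distribute (-2) across both terms."""
    return 6 * b + (-2) * (2 * b) + (-2) * (-7)

# Both routes should agree with the original expression for any b.
for b in range(-5, 6):
    assert left_side(b) == bracket_route(b) == negative_two_route(b)

# Either way the left side collapses to 2b + 14, so 2b + 14 = 21.
solution = Fraction(21 - 14, 2)
assert left_side(solution) == 21
print(solution)  # 7/2
```

The point of writing both routes out is that the -2(-7) term comes out as +14 either way; the only difference is where the sign flip happens.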

Here’s where we hit the wall – I had started with a simpler version of this where all signs were +. Then I moved to just having a minus sign inside the parentheses – no problem. Then I switched to a minus sign after the 6b but a + sign within the parentheses – trouble! That’s where the diagnostic flag went up. How to either distribute that pesky minus sign, or block out the whole expression in parentheses? Neither was making sense to the student. Blocking out the whole expression in parentheses (my first expansion example above) was a non-starter. But distributing -2 as a factor ran into trouble because “that’s not a negative 2, that’s a minus!” (in the way only a 7th grade girl can whine). In the end she worked enough examples that she’s learned to recognize the -a(b-c) form and that it expands to -ab + ac, but the learning is not robust. She hasn’t yet seen, for example, -a(-b-c) expressions, and I’m sure she’ll struggle as soon as she sees one.
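
The two sign patterns mentioned here can be spot-checked with small integers. This is only a rough sketch: the helper names are hypothetical, and the second pattern, -a(-b-c) = ab + ac, is the standard expansion rather than something from tonight's homework.

```python
from itertools import product

def expand_neg_a_b_minus_c(a, b, c):
    """Expanded form of -a(b - c): -ab + ac."""
    return -a * b + a * c

def expand_neg_a_neg_b_minus_c(a, b, c):
    """Expanded form of -a(-b - c): ab + ac (both inner signs flip)."""
    return a * b + a * c

# Try every combination of small integers, negatives and zero included.
values = range(-3, 4)
for a, b, c in product(values, repeat=3):
    assert -a * (b - c) == expand_neg_a_b_minus_c(a, b, c)
    assert -a * (-b - c) == expand_neg_a_neg_b_minus_c(a, b, c)
```

A numerical check like this proves nothing algebraically, but it is a quick way to catch a dropped sign before it reaches a worksheet.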

Attitude. As soon as I tried to work an example I got the “just tell me if it’s right!” impatience. She’s not really trying to “get” it, but “get through” it. And – tonight at least – there isn’t a lot I’m going to do to change that attitude. That’s a longer term… I was going to use the word “battle,” but that frames the situation as one of domination. How to get her to take an interest in how algebra works? That’s the question. If she had some degree of curiosity tonight, I could steer her energy in a productive direction. But what I’m feeling is a sense of drudgery, that the homework is something to be gotten through and then moved past.

(Her sister, on the other hand, has an entirely different set of issues: she grasps the concepts quickly, is actually curious about how things work, aces tests, but is “rebelling” by not actually handing in her work and lying about what she actually has to do in a given night. That’s a later story…)

So what to do? This situation, right here, repeated day after day as a high school teacher, is where I hit a wall. If a student isn’t really latching onto the problem, how do I inspire? I had a high school senior once say to me “Mr. G., I know you’re trying hard, but really, I’m gonna take this course again in community college next year anyway, so I just don’t care.”  Anybody who has ever found a passion – or even a modest interest – in life knows that feeling of “latching on.” One long-term question I’ve been really curious about is how to “transfer” that attitude of curiosity from situations where it occurs naturally to those where it might take a little work.

Then again, why *should* a student be deeply curious about the ins and outs of algebra? I was, but I wouldn’t claim that everyone *should* be. I was never that curious about British literature as a student, and still am not. But if I were in school right now with a general ed. requirement that included British literature, I would try to understand what the instructor saw in the subject matter. In fact, that’s one of the joys of getting to know somebody new – understanding what they’re interested in even if it’s not my own personal interest. But as a 7th grader, I was either interested in something or I wasn’t… but I have to wonder how my interests may have been shaped by talented teachers. I had a 6th grade social studies teacher who made world history fascinating for a year, and that was the only time I enjoyed a social studies class through 12 years of schooling.

I actually just asked the 7th grader about this – whether she was “interested” in understanding. She said she was actually interested when she started the homework, “but then it got hard.” She agrees that she’s not always interested in math, “not like that boy in class who gets excited whenever the teacher is about to do something new.” It makes me wonder, how much interest is enough?

To be continued…

The medicine man behind the curtain

Over the past year or so I’ve journeyed with my partner through a significant health crisis. It began with what appeared to be an allergic reaction to an antibiotic (Flagyl) – when her breathing was affected I took her straight to an emergency room. There she began vomiting, was unable to walk on her own, and was having trouble controlling her limbs. The attending ER doctor called a “stroke alert,” and I was impressed with how the medical team ran her through a complete diagnostic screening, complete with CAT scan to check for active bleeding on the brain. Fortunately, she was cleared – no sign of stroke.

So now what do we do? The ER doc was puzzled. He focused on high blood pressure and said it could be some sort of “hypertensive encephalopathy,” but was otherwise stymied. They admitted her to a neurology unit for observation, and that turned out to be a critical fork in the road (since they could have admitted her to, I suppose, a ward focusing on allergies and immune response). The attending neurologist said she could be suffering from a very rare neurological disorder, but was not sure. The following day, his replacement (while the attending was away at a conference) looked at my partner’s chart, looked at her and said, effectively, “I don’t know why you’re here – just get up and get out of bed. This is all in your head.” He meant that, of course, in a psychiatric sense. My partner was sent home frightened, with no treatment plan and no significant follow-up planned with a doctor.

I’ll fast forward through the next six months, else this post will become a novella. A repeat ER episode a month after the first, this time at a different hospital. Once again physicians argued amongst themselves and with my partner over diagnoses, and once again a psychiatric “diagnosis of exclusion” was assigned (i.e., if we can’t find a definitive set of measurable symptoms, it must be psychological). Fortunately one senior physician suggested a particular medication known for its anti-histamine properties, and that started us down a road of significant recovery.

For the past year I have watched no fewer than 8 physicians of varying specialties (internal medicine, immunology, psychiatry, endocrinology, in addition to several neurologists) try to figure out what’s wrong. What struck me first was how varied they were in their respect for and attentiveness to the patient. Some clearly were ready to dismiss the “hysterical woman” in spite of her calm, rational presentation of her history. Others were very respectful and kind. Physicians also differed in their comfort with the unknown – again, some seemed to want to reach a decisive diagnosis rather than say “I don’t know,” while others were more comfortable sitting with an open mystery and acknowledging as much.

In education we often hear calls to “professionalize” teaching by having it function more like medicine (at some point I’ll deconstruct the whole “education = medicine” argument in depth – fodder for a future post). In the “reformer’s” idealized view, doctors make rational treatment decisions based on the best science, not snap judgments based on the latest diagnostic fads, stereotypes about patients, or fear of looking incompetent. Hmmm.  I beg to differ. Doctors are professional humans operating in resource-constrained environments, taking care of people with complex needs. Just like teachers. And both are susceptible to getting it wrong sometimes.

Just as I’ve seen what a difference it makes for students to have parents who are able to advocate on their behalf with a teacher, I have to wonder what becomes of patients who do not have graduate degrees, who do not feel entitled to ask a physician to elaborate on their reasoning or on why other diagnostic avenues may be excluded.  In both education and medicine having personal and social capital affects the outcome far more strongly than I would like to believe.

Tomorrow is election day in the U.S.  Be sure to get out and vote!

(Photo: an EEG trace. Lots and lots of data!)

Data!