...as good as I wanna be...: Rethinking negative reinforcement

Animals in a state of relief.

As I wrote in my last post regarding the increasingly fuzzy distinction between classical and operant conditioning, old terminology can hobble new thinking, and given how awkward the language of behaviorism was at its inception, we shouldn't be surprised to discover how creaky it has become in its dotage. (The surprise lies in its holding up at all!) The positive/negative confusion has never really cleared up for many lay training students (positive punishment? WTF?), and no term has given people more trouble than "negative reinforcement," which bundles all the paradoxes and blurred connotations of behaviorist theory into seven dry-sounding but intellectually and emotionally fraught syllables. Technically, it's negative because it describes the removal of some "thing" (which may not be a thing at all). Colloquially, it's negative because the thing that gets removed needs to be nasty or at least unpleasant in order for its removal to be reinforcing, and so the deliberate use of negative reinforcement implies (and carries the ghost of) the deliberate introduction of nasty or unpleasant things, i.e., positive punishment. That's the theoretical tangle as clearly as I can state it (not very!), and it has significant consequences in practice, as teachers and trainers line up on either side of the R+/R- divide (and take occasional potshots at each other over the crevasse that yawns between them).

Does the theory still encompass what we know of reality? Do the terms describe with satisfactory accuracy our growing knowledge of how animals learn? On the contrary, they appear to be busting at the seams. We're patching as fast as we can right now, but I think our best hope of finding our way to a new kind of coherence (to a description of teaching and learning that covers our collective butts once again) may be to pick at the threads where they're coming unraveled. To combine my canyon and sewing metaphors, these may become the ropes that swing us over the training divide. (Ack.)

Some of the most exciting work in contemporary learning theory is being done by scientists and practitioners (e.g., teachers and trainers) who dare to test the boundaries between behaviorism and humanism; between the body and the mind; between emotion and thought; between psychology, ethology, and neuroscience; between biological and historical accounts of the past; between objective and subjective accounts of the present. On the scholarly and/or scientific side, Frans de Waal, Sarah Blaffer Hrdy, Oliver Sacks, Irene Pepperberg, Marc Bekoff, Mihaly Csikszentmihalyi, Alison Gopnik, Timothy Wilson, Gerd Gigerenzer, Antonio Damasio, Daniel Kahneman, and V. S. Ramachandran are some of the great "unravelers" I've encountered (if only on the page), and Jaak Panksepp seems like someone who might actually help us knit a new pattern.

But I think all of us who practice learning theory with focused intent and honest reflection can contribute significantly to the radical revision now underway, and a re-examination of the R+/R- split could be an excellent place to begin. I'm not prepared to say that, as a philosophical distinction, it's totally illusory (I'd like to tackle that question in another post), but as a scientific distinction, it may be. This is one of many places where Jaak Panksepp's work is so fascinating and potentially useful, as he's been investigating the physiological and neurochemical bases of approach and avoidance, of appetite and satisfaction, of aversion and reward. I look forward to the publication of his promised book for the lay reader, because I hope it will make his insights more widely accessible. (Temple Grandin's Animals in Translation remains the best introduction to his ideas for the general reader, as far as I know.) In the meantime, I've been making my way very slowly through Affective Neuroscience and highly recommend it despite its density. I hope I don't distort its content too badly here!

In his book, Panksepp describes a discrete number of affective (emotional) processes whose physiological coherence is marked enough that he is comfortable labeling them "systems." These are activated and expressed in more or less predictable ways by animals of diverse species, and we can guess from our common evolutionary history that there are also strong similarities in how they are subjectively experienced. Panksepp is keen to avoid Skinner's mistake of choosing his terms in opposition to common parlance, so he simply capitalizes the colloquial names for these primal emotions/processes to denote their technical use: FEAR, PANIC, RAGE, and SEEKING. While this group may appear heavily weighted to the unpleasant, the SEEKING system encompasses many varieties of pleasurable anticipation.

If I understand him correctly, Panksepp suggests that most of our strongest appetites or drives (and the emotions that accompany their satisfaction or frustration) arise from various kinds of disequilibrium. A truly safe and contented animal is an animal at rest. FEAR is activated by perceived threats to the self, PANIC by social isolation, and RAGE by constraint (especially of one's access to valued resources). The SEEKING system may be engaged when any of these other emotions is in less than full flower. When we're a little anxious, a little lonely, or a little hungry, our minds/brains are primed to seek out whatever will restore our internal equilibrium: an escape route, a friendly touch, a Hostess cupcake.

In such situations, our minds are also primed to learn, to draw connections between environmental circumstances, our own behavior, and the consequences that result from their meeting. Indeed, our capacity to learn has so many advantages for our continued survival that we are primed to find it intrinsically pleasurable. Thus the SEEKING system affords us pleasures that are largely independent from the satisfaction of consuming a good meal or the relief of escaping a fearsome predator. They're compelling enough to be literally addictive - the SEEKING system appears to be modulated primarily by the action of dopamine, and gets easily hijacked by cocaine and methamphetamine among other stimulants.

In addition, while the research remains sketchy, it appears that the (intrinsically rewarding) SEEKING system is activated whether an animal is seeking out the object of some appetitive desire (food, a mate, etc.) or seeking escape from a perceived threat.

Okay, if you've followed this far, I should finally be able to bring the conversation back around to positive and negative reinforcement and the question of whether they're entirely distinct. Once we start thinking about drive or desire in terms of disequilibrium, it becomes harder to draw an absolute line between the internal pressure of hunger and the external pressure of a bit or a leg; it becomes harder to separate the gift of peace from the gift of an apple. It becomes clear that all effective teaching necessarily "exploits" one appetite or another. And it becomes much more interesting and rich to talk about how to do so in a way that best enlists an animal's SEEKING system and taps into our shared love of learning.

I don't want to tax your patience much further in this post, but in closing I'd like to quote a couple of eloquent descriptions of expert horse trainers who supposedly sit on opposite sides of the R+/R- divide, but who clearly overlap in their ability to help other animals to flourish. I already knew I needed to learn more about Alex Kurland's work, but Cindy Martin persuaded me that I'd better do it soon. She wrote in an email, "When the dog world found clicker training, many people abandoned their leashes, vowed to free-shape everything and never touch their dogs. Well, with horses, we're bound to have physical contact. Riding is about tactile cues. Our weight shifts, we squeeze with our legs, we ask with the reins. Alex developed the idea of pressure as information, below the level of a true aversive. So is it still R-? Probably. But if we very quickly lighten pressure, by highlighting the first approximations of a desired behavior, with the click/treat, then all these kinds of pressure can be information, simply cues for the horse. And they can still learn to work for 'the release.' In fact, the release of subtle pressure can be a low value reinforcer, once the horse gets more sophisticated, and the click/treat can highlight the especially good responses. Alex calls this process, 'Shaping on a point of contact.'"

Emma Kline attended the same Buck Brannaman clinic in Spanaway that inspired me to write my bumptious letter back in November. You can find some lovely reflections on the SEEKING system on her blog, and you can also find her poetic response to seeing Buck at work:

"At one point Buck was talking about how extraordinary it was to be with a horse that was hunting the feel. He talked about giving the horse what it wants most in the world: PEACE. No wonder this guy doesn't need to use treats.

I could feel the lines in my forehead getting deeper as I strained to see how he was utilizing the laws of science and behavior modification with an accuracy I have rarely seen. And sure enough, he was using a marker and a reward. His marker was the release and his reward was the Peace of Feeling Together.

I think that it is very important to note that this is not a "peacefulness" that comes from robbing the horse of his sense of security or taking away the little peace he, as a flight animal, is born with. It's about adding a peace the horse didn't have before. That's when horse and human become more than what we were separately. So in fact, the release is a marker and not a reward."

4 comments:

Funder, February 26, 2012 at 8:42 PM
Fascinating post. I just got home from six hours of riding so I'm too brain-dead to say anything meaningful yet, but I'll come back to it tomorrow.
Emma Kline of Northwood Farms, February 28, 2012 at 1:26 PM
I find this to be a delightfully technical and thought-provoking post. It's so fascinating to think of this art as evolving. Thank you for including my blogpost as well. I feel honoured to be a part of the whole "thing". :) emma
Gretchen Icenogle, February 28, 2012 at 2:40 PM
Thanks Emma and Funder for your own provocative intelligence and eloquence! Damn, but I love a juicy discussion.
Funder, March 7, 2012 at 4:52 PM
Ok, I've tended to the rest of the fires in my life and I'm back here to think Deep Thoughts. Panksepp's work sounds really interesting! I need to buy a new copy of AIT - I read it, and was deeply affected by it, before I'd ever bought my first horse, but I sold/lost it in a move years ago.

This FEAR/PANIC/RAGE/SEEKING paradigm seems really useful to me. You said, "it appears that the (intrinsically rewarding) SEEKING system is activated whether an animal is seeking out the object of some appetitive desire (food, a mate, etc.) or seeking escape from a perceived threat." That fits with my imperfect and limited experience with horses. My "-R" cues hardly ever rise to a level that would cause fear or panic in my horse. They don't shut down her ability to dialogue with me, and they don't make her feel inherently threatened. I just want her to start seeking the response I want. I've always felt that well-done traditional or "natural" horsemanship leaves the horse curious to keep working, just like well-done clicker training. I think it's really interesting that a neuroscientist has found some biological justification for my private theory. :)

Sunday, February 26, 2012