Sunday, February 26, 2012

Rethinking negative reinforcement

Animals in a state of relief.
As I wrote in my last post regarding the increasingly fuzzy distinction between classical and operant conditioning, old terminology can hobble new thinking, and given how awkward the language of behaviorism was at its inception, we shouldn't be surprised to discover how creaky it has become in its dotage. (The surprise lies in its holding up at all!) The positive/negative confusion has never really cleared up for many lay training students (positive punishment? WTF?), and no term has given people more trouble than "negative reinforcement," which bundles all the paradoxes and blurred connotations of behaviorist theory into seven dry-sounding but intellectually and emotionally fraught syllables. Technically, it's negative because it describes the removal of some "thing" (which may not be a thing at all). Colloquially, it's negative because the thing that gets removed needs to be nasty or at least unpleasant in order for its removal to be reinforcing, and so the deliberate use of negative reinforcement implies (and carries the ghost of) the deliberate introduction of nasty or unpleasant things, i.e., positive punishment. That's the theoretical tangle as clearly as I can state it (not very!), and it has significant consequences in practice, as teachers and trainers line up on either side of the R+/R- divide (and take occasional potshots at each other over the crevasse that yawns between them).

Does the theory still encompass what we know of reality? Do the terms describe with satisfactory accuracy our growing knowledge of how animals learn? On the contrary, they appear to be busting at the seams. We're patching as fast as we can right now, but I think our best hope of finding our way to a new kind of coherence (to a description of teaching and learning that covers our collective butts once again) may be to pick at the threads where they're coming unraveled. To combine my canyon and sewing metaphors, these may become the ropes that swing us over the training divide. (Ack.)

Some of the most exciting work in contemporary learning theory is being done by scientists and practitioners (e.g., teachers and trainers) who dare to test the boundaries between behaviorism and humanism; between the body and the mind; between emotion and thought; between psychology, ethology, and neuroscience; between biological and historical accounts of the past; between objective and subjective accounts of the present. On the scholarly and/or scientific side, Frans de Waal, Sarah Blaffer Hrdy, Oliver Sacks, Irene Pepperberg, Marc Bekoff, Mihaly Csikszentmihalyi, Alison Gopnik, Timothy Wilson, Gerd Gigerenzer, Antonio Damasio, Daniel Kahneman, and V. S. Ramachandran are some of the great "unravelers" I've encountered (if only on the page), and Jaak Panksepp seems like someone who might actually help us knit a new pattern.

But I think all of us who practice learning theory with focused intent and honest reflection can contribute significantly to the radical revision now underway, and a re-examination of the R+/R- split could be an excellent place to begin. I'm not prepared to say that, as a philosophical distinction, it's totally illusory (I'd like to tackle that question in another post), but as a scientific distinction, it may be. This is one of many places where Jaak Panksepp's work is so fascinating and potentially useful, as he's been investigating the physiological and neurochemical bases of approach and avoidance, of appetite and satisfaction, of aversion and reward. I look forward to the publication of his promised book for the lay reader, because I hope it will make his insights more widely accessible. (Temple Grandin's Animals in Translation remains the best introduction to his ideas for the general reader, as far as I know.) In the meantime, I've been making my way very slowly through Affective Neuroscience and highly recommend it despite its density. I hope I don't distort its content too badly here!

In his book, Panksepp describes a discrete number of affective (emotional) processes whose physiological coherence is marked enough that he is comfortable labeling them "systems." These are activated and expressed in more or less predictable ways by animals of diverse species, and we can guess from our common evolutionary history that there are also strong similarities in how they are subjectively experienced. Panksepp is keen to avoid Skinner's mistake of choosing his terms in opposition to common parlance, so he simply capitalizes the colloquial names for these primal emotions/processes to denote their technical use: FEAR, PANIC, RAGE, and SEEKING. While this group may appear heavily weighted to the unpleasant, the SEEKING system encompasses many varieties of pleasurable anticipation.

If I understand him correctly, Panksepp suggests that most of our strongest appetites or drives (and the emotions that accompany their satisfaction or frustration) arise from various kinds of disequilibrium. A truly safe and contented animal is an animal at rest. FEAR is activated by perceived threats to the self, PANIC by social isolation, and RAGE by constraint (especially of one's access to valued resources). The SEEKING system may be engaged when any of these other emotions is in less than full flower. When we're a little anxious, a little lonely, or a little hungry, our minds/brains are primed to seek out whatever will restore our internal equilibrium: an escape route, a friendly touch, a Hostess cupcake.

In such situations, our minds are also primed to learn, to draw connections between environmental circumstances, our own behavior, and the consequences that result from their meeting. Indeed, our capacity to learn has so many advantages for our continued survival that we are primed to find it intrinsically pleasurable. Thus the SEEKING system affords us pleasures that are largely independent from the satisfaction of consuming a good meal or the relief of escaping a fearsome predator. They're compelling enough to be literally addictive - the SEEKING system appears to be modulated primarily by the action of dopamine, and gets easily hijacked by cocaine and methamphetamine among other stimulants.

In addition, while the research remains sketchy, it appears that the (intrinsically rewarding) SEEKING system is activated whether an animal is seeking out the object of some appetitive desire (food, a mate, etc.) or seeking escape from a perceived threat.

Okay, if you've followed this far, I should finally be able to bring the conversation back around to positive and negative reinforcement and the question of whether they're entirely distinct. Once we start thinking about drive or desire in terms of disequilibrium, it becomes harder to draw an absolute line between the internal pressure of hunger and the external pressure of a bit or a leg; it becomes harder to separate the gift of peace from the gift of an apple. It becomes clear that all effective teaching necessarily "exploits" one appetite or another. And it becomes much more interesting and rich to talk about how to do so in a way that best enlists an animal's SEEKING system and taps into our shared love of learning.

I don't want to tax your patience much further in this post, but in closing I'd like to quote a couple of eloquent descriptions of expert horse trainers who supposedly sit on opposite sides of the R+/R- divide, but who clearly overlap in their ability to help other animals to flourish. I already knew I needed to learn more about Alex Kurland's work, but Cindy Martin persuaded me that I'd better do it soon. She wrote in an email, "When the dog world found clicker training, many people abandoned their leashes, vowed to free-shape everything and never touch their dogs. Well, with horses, we're bound to have physical contact. Riding is about tactile cues. Our weight shifts, we squeeze with our legs, we ask with the reins. Alex developed the idea of pressure as information, below the level of a true aversive. So is it still R-? Probably. But if we very quickly lighten pressure, by highlighting the first approximations of a desired behavior, with the click/treat, then all these kinds of pressure can be information, simply cues for the horse. And they can still learn to work for 'the release.' In fact, the release of subtle pressure can be a low value reinforcer, once the horse gets more sophisticated, and the click/treat can highlight the especially good responses. Alex calls this process, 'Shaping on a point of contact.'"

Emma Kline attended the same Buck Brannaman clinic in Spanaway that inspired me to write my bumptious letter back in November. You can find some lovely reflections on the SEEKING system on her blog, and you can also find her poetic response to seeing Buck at work:  

"At one point Buck was talking about how extraordinary it was to be with a horse that was hunting the feel. He talked about giving the horse what it wants most in the world: PEACE. No wonder this guy doesn't need to use treats.

I could feel the lines in my forehead getting deeper as I strained to see how he was utilizing the laws of science and behavior modification with an accuracy I have rarely seen. And sure enough, he was using a marker and a reward. His marker was the release and his reward was the Peace of Feeling Together.

I think that it is very important to note that this is not a "peacefulness" that comes from robbing the horse of his sense of security or taking away the little peace he, as a flight animal, is born with. It's about adding a peace the horse didn't have before. That's when horse and human become more than what we were separately. So in fact, the release is a marker and not a reward."

Saturday, February 11, 2012

Have we outgrown our vocabulary?

A couple of weeks ago now, Professor Jesús Rosales-Ruiz (of the Department of Behavior Analysis at the University of North Texas) gave an esoteric but fascinating talk on the disappearing distinction between respondent (a.k.a. classical) and operant conditioning. By tracing the history of the terms and describing the difficulties that contemporary researchers often encounter when trying to apply them with any consistency, he exposed their contingency and fragility: while they have been extremely useful as springboards to the investigation of how we learn, they may prove not to have any real substance. They might even have brought us far enough that we can safely discard them (and move forward more easily without their dead weight). Wittgenstein once noted how many stubborn philosophical problems are in fact problems of vocabulary; we are sometimes slow to recognize when we've exhausted our terms.

But dying words (and the concepts or categories they name) have something left to teach. By looking closely at their definitional foundations, and then taking note of their specific failures vis-à-vis reality, we can identify some of the perceptual biases that made them so appealing in the first place. The lay distinction between "respondent" and "operant" has always hinged on the question of whether or not a response to a given stimulus (or set of stimuli) is voluntary, whether or not it can be brought under conscious control. But as Jesús described in his talk, even during Skinner's time, the erosion of that distinction was already underway, as physiological responses that had been considered perfectly autonomic (such as blood pressure) were brought through biofeedback under conscious control. More recently (and provocatively), challenges have come from the opposite direction, as individual cells have been observed in response patterns that mimic operant conditioning. As Jesús noted, anytime that relationships between contingencies (in the environment and behavior) grow measurably more consistent, learning is taking place. Our loyalty to the terms "respondent" (or "classical") and "operant" may obscure the complex but unified realities of that process.

Among the phenomena resistant to any simple respondent/operant dichotomy has been the tendency of certain behaviors to wander from unconscious to conscious and back to unconscious "control." We're all familiar with this dynamic as it applies to our assimilation of complex skills. The famous four stages of competence trace the general pattern, from unconscious incompetence (we don't know we can't do something), to conscious incompetence (we know we can't do it), to conscious competence (we can do it with great mental effort and focus), to unconscious competence (we can do it without effort and without conscious focus). If I'm a skilled driver, or soccer player, or surgeon, as long as the given challenge falls within the range of what is well-known to me and therefore predictable, the relationship between the contingencies of the environment and the contingencies of my behavior may be so consistent as to appear reflexive, and I will hardly have the sense that I am making a voluntary decision at any juncture. Only novelty is likely to wake me from the dream of competence and force me back into a state of conscious engagement.

"Respondent" and "operant," like "unconscious" and "conscious," may only describe different modes of energetic expenditure. The brain is a highly economical organ, a regular Bartleby when it comes to the heavy lifting required for conscious thought. ("I would prefer not to.") That said, it is much more active at an unconscious level than we generally give it credit for being.


Image by Jolyon.

Sunday, February 5, 2012

A no-brainer?

What can brown do for you?
As I've been getting ready to formally launch my people/dog training business (and to boldly go where no human or canine has gone before, split infinitives be damned), I've been seeking out advice from many corners, trying to sort and synthesize the ideas that seem most salient to my own particular enterprise. The ideal of "synergy" has become clichéd to the point of comedy, but it's worth pursuing nonetheless, particularly in this context. Given that dog training is primarily about bringing diverse desires into harmonious alignment, it makes perfect sense to extrapolate that goal out to one's business relations. So trainers ally themselves with sitters and vets and shelters and other trainers, and so we all sustain the flow of good things and the trust that follows in their wake.

Well, the muse of synergy tapped me on the shoulder the other day and whispered naughtily in my ear. Why not partner my dog training business with an escort service? Preferably one that employs men as well as women, and preferably one with an emphasis on role-playing and a warehouse full of costumes. How wonderful it would be if -- in my work with a reactive dog -- I could make one call and summon a physically imposing man in a UPS uniform, or a woman in scrubs to play vet tech! One day I might say, "Bring a dozen hats and a long, yellow raincoat." On another day, "Hey, do you still have that Dick Cheney mask? Excellent. A briefcase would be good, too."

I'm pretty sure no one has done this yet, and I think I'd better wait until my business is well enough established that such a partnership would be advantageous to the escorts, too. But anyone who wants to run with the idea has my blessing!