An Important Misconception About Placebos

Concerning placebos and the “placebo effect,” there is a distinction that I have struggled to articulate, a distinction I have also noticed highly intelligent humans failing to make. I recently found an excellent explanation of the distinction in a paper questioning the meaning of recent “open-label placebo” trials, and thought it was worth a short piece explaining why it’s important.

Here is the distinction as the authors put it, with citations removed:

Before reviewing findings from OLP studies, it is crucial to clearly demarcate between two distinctive uses for the term placebo. First, is the usage of placebos in RCTs. Here the term is often understood to refer to a certain kind of ‘thing’ (eg, saline injections or sugar pills). Strictly speaking, this interpretation is incorrect: instead, placebos in RCTs ought to be conceived as methodological tools since their function is to duplicate the ‘noise’ associated with clinical trials including spontaneous remission, regression to the mean, Hawthorne effects and placebo effects. Properly understood, then, these types of placebos are deployed as controls that are specifically designed to evaluate the difference—if any—between a control group and a particular treatment under scrutiny. Ideally, in RCTs, controls should mimic the appearance and modality of the particular treatment or medical intervention under investigation. In contrast, placebos in clinical contexts are interventions that may be intentionally or unintentionally administered by practitioners either with the goal of placating patients and/or of eliciting placebo effects.

Blease, C. R., Bernstein, M. H., & Locher, C. (2020). Open-label placebo clinical trials: is it the rationale, the interaction or the pill? BMJ Evidence-Based Medicine, 25(5), 159-165.

On the one hand, there is the use of placebos in randomized controlled trials, in which the point is to “duplicate the noise” that’s likely to exist in the treatment group. On the other hand, there are hypothesized “placebo effects” that may take the form of real healing, which is not at all the same.

To take a specific example: the fact that antidepressant trials produce enormous placebo effects does not mean that depression responds to placebo in real life. The proper conclusion to draw from the size of placebo effects in these trials is that the measurement of depression is extremely noisy, to put it in the most polite way.
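
To make the "noise" reading concrete, here is a minimal simulation sketch of regression to the mean, one of the sources of noise named in the quoted passage. Every number in it (the severity distribution, the size of the measurement noise, the enrollment cutoff) is an assumption chosen for illustration, not a value taken from any trial. Subjects are enrolled only if a noisy baseline score clears a cutoff, nobody is treated, and nobody's underlying severity changes.

```python
import random
from statistics import mean


def simulate_untreated_arm(n_screened=20000, cutoff=20.0, seed=0):
    """Enroll subjects on a noisy baseline score, then re-measure them untreated.

    All parameters are illustrative assumptions, not trial data.
    """
    rng = random.Random(seed)
    baselines, followups = [], []
    for _ in range(n_screened):
        true_score = rng.gauss(15.0, 4.0)            # stable underlying severity
        baseline = true_score + rng.gauss(0.0, 5.0)  # noisy rating at screening
        if baseline < cutoff:
            continue                                 # not "depressed enough" to enroll
        # Same underlying severity, fresh measurement noise, no treatment at all.
        followup = true_score + rng.gauss(0.0, 5.0)
        baselines.append(baseline)
        followups.append(followup)
    return mean(baselines), mean(followups), len(baselines)


baseline_mean, followup_mean, n_enrolled = simulate_untreated_arm()
print(f"enrolled subjects: {n_enrolled}")
print(f"mean baseline score:  {baseline_mean:.1f}")
print(f"mean follow-up score: {followup_mean:.1f}")
print(f"apparent improvement with no treatment: {baseline_mean - followup_mean:.1f} points")
```

In this sketch the enrolled group "improves" by several points purely because it was selected on a noisy measurement. That is the sense in which a large placebo-arm change can be noise rather than healing.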

While true “placebo effects” of the healing variety may exist, it’s worth engaging with these authors’ concerns over how that may be demonstrated, particularly in open-label placebo trials in which the hope is to pave the way toward ethical placebo treatment. The choice of control is particularly tricky; for example, as with antidepressant treatments, simply using “treatment as usual” or “wait list” as controls likely inflates apparent effects. True blinding requires a great deal of subtlety and effort in research design.

In summary: noise isn’t healing.

Now we can all pretend that we knew it all along and never mistook the one for the other!

The Limits of “Help”

As a banana who lives among humans, I am greatly interested in the sacred beliefs of humans. By “sacred beliefs” I mean beliefs that are widely shared and have a high level of emotional vehemence surrounding them, preventing them from being questioned, except by trolls. Usually the goal of trolling is to provoke a defensive reaction, so sacred beliefs can often be brought into visibility by effective trolls.

This is not a troll. I am deadly serious and I think this is extremely important.

The sacred belief I am addressing here seems to have originated in the late twentieth century, and blossomed into ubiquity in the twenty-first century. It concerns “mental health,” and has multiple parts. First, it is the belief that mental illnesses are real diseases, just as serious as ordinary medical diseases, and no more the fault of the sufferer. Responses to trolls like “depression is a choice” bring this belief into visibility. Second, it is the belief that shame and stigma prevent people from seeking help for mental illness. Third, and the only part of the sacred belief system that concerns me here, is the belief that effective treatments for mental illnesses are available, and that once a person overcomes shame and stigma to seek help, the sufferer stands a good chance of getting meaningfully better.

Here, I will focus on one of the most common mental health problems, depression, known in current medical jargon as Major Depressive Disorder. Further, I will focus on the two gold-standard treatments for depression, generally considered to be the most effective treatments: antidepressant medication and cognitive behavioral therapy. I hope to eventually address treatments for two more conditions, schizophrenia and substance abuse disorder, but those will have to wait for future installments.


Medicine has always existed in some form, and there have always been people who claim to have been cured by the techniques of the day, going back at least as far as recorded history and almost certainly much further. But the evidentiary technique that is supposed to separate modern medicine from the misguided attempts at medicine of the human past is the double-blind placebo-controlled trial. Individual trials may lack statistical power and suffer from publication bias and other quality problems, however, so the true best evidence is probably the large meta-analysis of placebo-controlled trials. 

Meta-analyses raised concerns about the efficacy of antidepressant medications almost as soon as they began, but I will focus on a recent large meta-analysis, Cipriani et al., 2018, that received a great deal of attention. Cipriani et al. interpreted their findings thus: “All antidepressants were more efficacious than placebo in adults with major depressive disorder.” This seems like good news, but the bad news was how much more efficacious the drugs were. They report an effect size (standardized mean difference compared to placebo) of .30. This effect is considered “small” according to tradition and current guidelines, but how small is small? 

Most antidepressant trials use one or both of two measurement instruments: the Hamilton Depression Rating Scale (HAM-D or HDRS), a 52-point rating scale applied by researchers or doctors to judge how depressed a person is, and the Beck Depression Inventory, a self-rating scale. (Note: the HAM-D may sometimes be referred to as the HAM-D-17 or HDRS-17 because it has 17 items, but it is a 52-point scale, as each item may score multiple points toward the total.) An effect size of .3 corresponds to about two points on the 52-point HAM-D scale, which, on the face of it, if you happen to read the instrument itself, is not much. This does not even take into account that methodological issues and questionable research practices may account for most or even all of the apparent superiority over placebo. A difference smaller than seven points may not even be detectable by clinicians. Interestingly, a Cochrane review of Dance and Movement Therapy rejected the therapy as lacking clinical significance because it reduced depression scores by only slightly over seven points more than (psychological) placebo, which was less than 25% of baseline in the relevant studies. By this standard, since baseline HAM-D scores in the antidepressant trials were generally in the mid-to-high 20s, a difference of at least 6 or 7 points would be necessary to achieve clinical significance. The upshot is that antidepressants “work” to a degree that is much too subtle for anyone to notice, which is bad enough, but which also increases suspicions that they don’t work at all.
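
The arithmetic behind these figures can be sketched in a few lines. A standardized mean difference is a raw difference divided by a standard deviation, so converting it back to scale points requires an SD. The SD of 7 HAM-D points used here is an assumption for illustration (the essay does not state one); it is chosen because it reproduces the "about two points" figure. The function names are likewise hypothetical.

```python
HAMD_MAX_POINTS = 52   # total range of the scale, as described above
ASSUMED_SD = 7.0       # assumption: pooled SD of HAM-D scores, in points


def smd_to_points(smd, sd=ASSUMED_SD):
    """Convert a standardized mean difference into raw scale points."""
    return smd * sd


def clinical_threshold(baseline_score, fraction=0.25):
    """Points needed to reach a given fraction of the baseline score."""
    return baseline_score * fraction


drug_points = smd_to_points(0.30)
print(f"SMD 0.30 is about {drug_points:.1f} points "
      f"({drug_points / HAMD_MAX_POINTS:.0%} of the 52-point scale)")

# Baselines "in the mid-to-high 20s", with the 25%-of-baseline standard.
for baseline in (24, 26, 28):
    print(f"baseline {baseline}: 25% threshold = {clinical_threshold(baseline):.1f} points")
```

Under these assumptions, the drug-placebo difference is roughly a third of what the 25%-of-baseline standard would require.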

There are many branching paths of cope that have been explored for the seemingly devastating result that antidepressants are all pretty similar to each other and move a 52-point depression scale by two points at best compared to placebo. One is that the problem is the scale. Maybe using a self-report scale would better capture the healing effects of antidepressants? However, the Beck did even worse than the HAM-D in placebo-controlled trials, so that one can’t be the case. Or perhaps antidepressants work very well for some people, but just not for most people? That line of thinking was crushed too – the variability in scores for placebo was indistinguishable from the variability in scores in the treatment arms. (Note: a previous version of this study, by the same authors, found that there were significant differences, and was retracted because it was wrong. Since the authors originally came to the opposite result, suggesting some amount of researcher allegiance to the hypothesis, the negative result seems especially likely to be valid.) 
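
The reasoning behind the variability argument can be illustrated with a small simulation. If a drug helped a minority of people a lot and everyone else not at all, the treatment arm should show a wider spread of outcomes than the placebo arm, even when the average advantage is the same two points. This is a sketch under made-up assumptions (an 8-point average improvement in both arms, an SD of 7, and 20% "responders" who gain 10 extra points), not a reanalysis of any study.

```python
import random
from statistics import mean, stdev


def simulate_arms(responder_share, responder_gain, n=50000, seed=1):
    """Return (mean advantage of drug over placebo, ratio of drug SD to placebo SD).

    Improvement scores are in scale points; all parameters are illustrative.
    """
    rng = random.Random(seed)
    placebo = [rng.gauss(8.0, 7.0) for _ in range(n)]
    drug = []
    for _ in range(n):
        improvement = rng.gauss(8.0, 7.0)      # same background change as placebo
        if rng.random() < responder_share:
            improvement += responder_gain      # extra benefit for responders only
        drug.append(improvement)
    return mean(drug) - mean(placebo), stdev(drug) / stdev(placebo)


# Scenario A: everyone gets the same 2-point benefit.
uniform_diff, uniform_ratio = simulate_arms(responder_share=1.0, responder_gain=2.0)
# Scenario B: 20% of subjects get a 10-point benefit, the rest get nothing.
subgroup_diff, subgroup_ratio = simulate_arms(responder_share=0.2, responder_gain=10.0)

print(f"uniform benefit:  mean advantage {uniform_diff:.2f}, SD ratio {uniform_ratio:.3f}")
print(f"hidden subgroup:  mean advantage {subgroup_diff:.2f}, SD ratio {subgroup_ratio:.3f}")
```

Both scenarios give about the same average advantage, but only the hidden-subgroup scenario inflates the spread of the treatment arm. Finding equal variability in the two arms is therefore evidence against a subgroup of strong responders.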

Another line of cope is the idea that big placebo effects represent real healing, and that antidepressants should continue to be prescribed for their placebo effect alone. Unfortunately, a great deal of the “placebo effect” found in these studies is probably a result of poor methodology. One such problem is known in the field of dermatology as “eligibility creep”: researchers inflate scores at baseline in order to qualify subjects, and then don’t inflate the scores at later points of analysis. There does seem to be evidence that this occurs in antidepressant trials. Furthermore, antidepressant medications cause significant side effects, so even if we believed that placebo healing is real healing rather than the result of research bias and questionable research practices, such drugs would not be appropriate to prescribe for this effect.

Perhaps it shouldn’t be surprising that SSRIs in particular are not effective in treating depression, as every few years, someone points out that the popular serotonin hypothesis of depression is not substantiated by evidence. The most recent is a 2022 review, summarized here. The authors respond to common objections here.  I have no reason to believe that this will be any more effective than similar efforts going back to the 1990s to debunk the serotonin hypothesis; the myth of serotonin seems as sticky and ineradicable as the myth of antidepressant efficacy. 


What about Cognitive Behavioral Therapy? If antidepressants can’t be expected to produce clinically relevant relief from symptoms of depression, what about the most-touted “evidence-based” form of talk therapy? If you have read meta-analyses comparing CBT to “psychological placebo,” you may have seen enormous effect sizes reported. A psychological placebo is something like treatment as usual, a waiting list, or some kind of vague “talking to a therapist,” not a real pill placebo. Even the aforementioned “Dance and Movement Therapy” achieved big effect sizes against psychological placebo that bordered on clinical significance. While I do not believe that placebo healing is real healing, the pill placebo control does seem to be something of a questionable research practice limiter. These authors give some suggestions why. I am not familiar with all possible methods of questionable research practices in therapy trials, but it does seem like it’s harder to cheat against a pill placebo compared to a psychological placebo.

When Cognitive Behavioral Therapy is up against pill placebo, the effect size, according to the most recent meta-analysis I could find, is a mere .22 when using the HAM-D. When using the self-report Beck Depression Inventory, the result is indistinguishable from zero and non-significant. As meager as the results for antidepressants are, the results for CBT are even worse. Researchers seem to think that subjects get a trivial amount better, but subjects themselves seem to feel no better compared to pill placebo.

You might imagine that there would be an outcry challenging this analysis, but I wasn’t able to find one. A typical write-up of this finding was the following:

CBT can benefit patients with severe depression, say researchers

…When compared with pill placebo, CBT led to greater symptom reduction on average by a standardized mean difference of -0.22 (95% confidence interval -0.42 to -0.02; P=0.03) on the Hamilton Rating Scale for Depression. The researchers said that this meant that the number needed to treat is 12 in typical cases of major depression, where the expected placebo response rates may be 30-50%. This would compare favorably, they said, with the number needed to treat (9) that can be expected in antidepressants, with an effect size of 0.31 over placebo.

The community seems to have just accepted this finding as meaning that CBT works, without apparently addressing the fact that the effect represents less than a two-point drop on a 52-point symptom scale, which is, as explained above, not clinically significant and probably not even clinically detectable. 
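
The numbers needed to treat in the quoted write-up can be roughly reproduced from the effect sizes alone. The write-up does not say which conversion the researchers used, so the method below is an assumption: a common normal-distribution conversion that shifts the placebo response rate by the standardized mean difference. The function name is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def nnt_from_smd(smd, control_response_rate):
    """Approximate number needed to treat from an effect size.

    Assumes normally distributed outcomes with equal spread in both arms;
    `smd` is given as a positive benefit of treatment over control.
    """
    shifted = _STD_NORMAL.inv_cdf(control_response_rate) + smd
    treated_response_rate = _STD_NORMAL.cdf(shifted)
    return 1.0 / (treated_response_rate - control_response_rate)


for label, smd in (("CBT", 0.22), ("antidepressants", 0.31)):
    for placebo_rate in (0.30, 0.40, 0.50):
        nnt = nnt_from_smd(smd, placebo_rate)
        print(f"{label}, SMD {smd}, placebo response {placebo_rate:.0%}: NNT about {nnt:.1f}")
```

Under this conversion, the values for CBT cluster around 12 and those for antidepressants fall between 8 and 9 across the quoted 30-50% placebo response range, in line with the quoted figures. The conversion restates the same small effect in different units; it does not make a difference of under two scale points any larger.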

One defense of CBT might be that, unlike antidepressants, at least talk therapy cannot be harmful. I am not sure I believe this. Although CBT is not approved for the treatment of major depression in bananas, my experience with CBT was initially promising: the hope that my bad emotions were caused by bad thoughts, and that deconstructing the bad thoughts would limit the occurrence of bad emotions. However, what I learned through a few weeks of CBT, deconstructing every bad thought and bad emotion, is that the frequency and intensity of bad emotions is not affected at all by reasoning. If anything, the thoughts that co-occurred with bad emotions got even more ridiculous. Without CBT, it might not have been clear that I had no control over the occurrence of bad emotions. This might be regarded by some as a harm. Initially, I’d suspected that the apparent large effect sizes for CBT were a result of subjects answering surveys differently – e.g. experiencing a pure bad emotion rather than identifying it as “guilt” – but apparently my hypothesis was wrong: it was bad controls all the time, and I had been insufficiently cynical.

I am also a bit surprised, given the sacred beliefs mentioned at the beginning, by the fact that the existence of CBT is not considered insulting. It is hard for me to distinguish the methodology of CBT and other talk therapy methodologies from the aforementioned troll “depression is a choice.” If it’s really a disease, how would it make sense for it to be treatable by thinking correctly instead of thinking wrong? But apparently most humans do not make this connection and hence do not thereby feel insulted. 


An interesting rejoinder to the evidence that psychiatric treatments are not particularly effective is that very little medicine is actually effective. Harriet Hall, biting every bullet, titles her article “Most Patients Get No Benefit From Most Drugs.” Certainly, the problem of medicine generally not being effective is not limited to the field of psychiatry. But does that make it somehow excusable that the top treatments for depression cannot produce any clinically relevant effect? If I found out that all shoes tend to degrade into uselessness in two days, it wouldn’t make me feel a lot better to find out that hats also degrade into uselessness in two days. 

If antidepressants and CBT are the best treatments available, and have no clinically significant effect on symptoms of depression, what “help” is reasonably available? If the “stigma and shame” preventing people from seeking mental health treatment disappeared overnight, and everyone got treatment – and this seems to have largely happened, as antidepressants and CBT are as popular as they have ever been – it seems unlikely to make any difference in outcome. If the best the field has to offer is a glorified placebo, perhaps it has no help to offer at all. If this is true in many fields of medicine, then the problem is multiplied rather than solved.

Survey Chicken

by a literal banana

As a banana who lives among humans, I am naturally interested in humans, and in the social sciences they use to study themselves. This essay is my current response to the Thiel question: “What important truth do very few people agree with you on?” And my answer is that surveys are bullshit.

In the abstract, I think a lot of people would agree with me that surveys are bullshit. What I don’t think is widely known is how much “knowledge” is based on survey evidence, and what poor evidence it makes in the contexts in which it is used. The nutrition study that claims that eating hot chili peppers makes you live longer is based on surveys. The twin study about the heritability of joining a gang or carrying a gun is based on surveys of young people. The economics study claiming that long commutes reduce happiness is based on surveys, as are all studies of happiness, like the one that claims that people without a college degree are much less happy than they were in the 1970s. The study that claims that pornography is a substitute for marriage is based on surveys. That criminology statistic about domestic violence or sexual assault or drug use or the association of crime with personality factors is almost certainly based on surveys. (Violent crime studies and statistics are particularly likely to be based on extremely cursed instruments, especially the Conflict Tactics Scale, the Sexual Experiences Survey, and their descendants.) Medical studies of pain and fatigue rely on surveys. Almost every study of a psychiatric condition is based on surveys, even if an expert interviewer is taking the survey on the subject’s behalf (e.g. the Hamilton Depression Rating Scale). Many studies that purport to be about suicide are actually based on surveys of suicidal thoughts or behaviors. In the field of political science, election polls and elections themselves are surveys. 

The Ongoing Accomplishment of the Big Five

 

by a literal banana

I have been trying to understand the “lexical hypothesis” of personality, and its modern descendant, the Five Factor Model of personality, for several months. In that time, I have said some provocative things about the Big Five, and even some unkind things that I admit were unbecoming to a banana. Here, I wish to situate the Five Factor Model in the context of its historical development and modern use, and to demonstrate to the reader the surprising accomplishment that it represents for the field of psychology.

Words Fail

Notes on semantic deconversion

It’s difficult to study words, because words are hard to see. Words are tools used in communication, and when communication is working, they disappear into invisibility. 

One way to see words is to make a word jail: a list of problematic words ripped out of their contexts, so that they may be seen for themselves instead of hiding behind meanings. 

Another way to see words freshly, to experience them as broken and therefore present, is to enter a new domain with its own unfamiliar jargon. Military basic training, rock climbing, sailing (whether Melville-era or contemporary), and weaving all require that novices take on a new jargon in order to get a grip on a new domain. The jargon enables the initiates to pick out important aspects of the world (in their bodies, in the natural environment, in the technology). With new words, they learn to identify newly-salient aspects of reality and communicate with others about them.

The Extended Sniff Test

A literal banana has published a method for volunteer investigators of suspicious science, this month in the Journal of Lexical Crime. From the abstract:

Fact checking of scientific claims by lay volunteers, also known as recreational hostile fact checking or community-based science policing, is a growing hobby. A method for the evaluation of scientific claims by scientifically literate non-expert investigators is presented. The Extended Sniff Test, performed after an initial sniff test, uses the methods of Double Contextualization, Noun Abuse Assessment, and lay literature review, in addition to traditional literature review. As a case study, a suspicious paper is subjected to the Extended Sniff Test, and fails. A quick guide to the Extended Sniff Test is provided.

The paper is available here:

The Extended Sniff Test: A Method for Recreational Hostile Fact Checking

The paper’s “quick guide”:

[Image: quick guide to the Extended Sniff Test]

Fake Martial Arts, A Disorientation

The genesis for this line of thought is not some respectable philosophy paper or classic novel. It’s 100% YouTube videos. 

I became interested in fake martial arts through the beneficence of the YouTube algorithm, which oddly but correctly thought that I might be interested in the work of gentlemen such as Rokas Leonavičius, who diligently practiced aikido for years and now uses the term “fantasy-based martial arts” to describe his own former art and practices like it, and Ramsey Dewey, who enjoys pressure-testing dubious self-defense techniques against resisting sparring partners on his YouTube channel.

Here is the fake martial arts situation as I understand it, as a sort of gestalt impression of many videos that are difficult to cite point-for-point individually:

Teachers of fake martial arts, whether they are unscrupulous or pious frauds, teach techniques that are not useful in actual self-defense situations, but sell them under the guise that they are, in fact, useful in self-defense situations. Because of deference-oriented institutional cultures and a lack of testing against resisting opponents, the uselessness of the techniques is kept hidden from students, and sometimes even from the masters themselves.

The Breakdown of Ignoring

A banana’s perspective on the human experience of time

When I was a regular banana, before I was uplifted, we would pretty much just hang out. It was a warm, fragrant, undifferentiated time, not yet cut up into shots of consciousness, and certainly not curated and arranged into concepts and stories. It took a long time for me to figure out that this awkward mental state was not the habitual state of most humans. Over time, many humans have taught me their methods for managing conscious awareness – for coping with it, changing it, pausing it, and avoiding it. Some critical banana scholars have asserted that we long to go “back to the bunch,” but I don’t think the undifferentiated time of bunch consciousness (or lack thereof) is really a foreign or undesirable state for humans.

It is in the nature of the world to be ignored. To the extent that tools, environments, and relationships are properly functioning, they are invisible. You hit the lightswitch a thousand times, successfully ignoring the material substrate of its reality every time it works. When the electricity goes off, when there is a breakdown, then ordinary ignoring must temporarily pause, and the underlying reality must be seen and dealt with.

It is in the nature of the mind to ignore things. Conscious awareness is an awkward sort of debugging mode, for use when things break down. The goal of conscious awareness is to adjust reality as necessary to successfully resume ignoring, for the mode of ignoring is the mode in which handiness, productivity, and even virtuosity can be practiced. 

A system can be ignored so long as its functioning is managed without conscious attention. To be ignorable, either a system must be managed by others (garbage removal, electricity), or managed through unconscious rituals performed without interrupting one’s train of thought (seat belts, hand washing). 

Most people are naturally capable of ignoring almost everything. There are various mental illness constructs created to explain people who lack the ability to ignore almost everything at all times. The inability to ignore things has real consequences.

One measure of the functioning of civilization is just how much its citizens can get away with ignoring. Another might be how its citizens respond to a mass failure of ignoring.

Time is mostly perceived in brief, awkward wakings-up from ignoring. Meeting again a child who has grown, or an adult who has aged, brings to awareness the fact of the passage of time, as revealed in the system of the body. When a relationship is permanently interrupted by death, a traumatic cessation of ignoring occurs. Some people experience regret in grief – if only I had spent more time, paid more attention! Is it regret for the misuse of time, or is it regret at learning the nature of time? Much of love is skillful ignoring. 

A sudden absence (as with death) can be a breakdown that causes a failure of ignoring. But a sudden and unexpected presence can also be a breakdown. Right now these are both common: breakdowns of absence (including isolation and death), and breakdowns of presence (having to deal nonstop with the unaccustomed presence of even the most beloved others, whose consciousnesses are usually managed off-site). 

A breakdown usually does not come all at once, in one moment. When there is a breakdown in the capacities of the body, breakdown occurs not just at the moment of injury, but in interaction with all the things of the world. Even if it happens suddenly, as with a broken arm or a stroke, as opposed to the almost imperceptibly slow breakdown of aging, the breakdown is a process unforeseeable at the time of injury. How do I brush my teeth? What about gardening, grocery shopping, opening the tricky door to the ancient van? The breakdowns play out in the learning process of the injury, forcing one into the breakdown state of conscious awareness over and over until the injury is fully coped with, and successful ignoring resumed.

Mass breakdown leads to mass conscious awareness, which is an awkward and undesirable state for most healthy humans. During a time of mass breakdown, there will be a great deal of conscious human attention available to fix all the mutually interacting layers of the base reality. But it’s important to remember that the final goal is to return to a state of ignoring the base reality once more. Many will be looking first for institutional approaches that allow a return to ignoring before the base reality is fixed, more or less. That might not be bad, depending on the institutions. It may be that humans are more effective when using medium-sized groups to organize their behavior in a state of successful ignoring, even as repair progresses.

Conscious awareness is often most vivid when it is most unwanted. Consider the late-night insomniac, or the athlete or performer unable to fall into a flow state because of excessive self-awareness. Perhaps we will learn about a time experience in which one is awkwardly aware all the time, for years or decades. 

The miracle of ordinary times is that they are ignorable and ignored. Mass awkward conscious awareness is the distinguishing feature of interesting times. 

Wait, what? Sense-making & sense-breaking in modern art

In a recent paper, my collaborator Tom Rutten and I advanced a tentative theory of how contemporary visual artworks might interact with a predictive error minimization (or “predictive processing”) system in human viewers. The predictive processing model of cognition is a relatively recent figuration of the age-old problem of inference (how humans make predictions from patterns and pull patterns from data), originating in the work of computational neuroscientists like Friston, Rao, and Ballard in the 1990s but prefigured by Jürgen Schmidhuber, whose theory of cognitive “compression” has been covered previously on this site and its neighbors.

I haven’t yet tried summarizing the paper’s ideas in an informal way, or arguing (beyond Twitter) for its usefulness as a theory. Here, I advance that argument both modestly and boldly.

Ignorance, a skilled practice

Containment protocol: None. Words can’t hurt you. Words aren’t real. Philosophical ideas don’t affect reality. You won’t notice any changes after reading this. You won’t find yourself, in conversation and in your own thoughts, ceasing to reach for institutionally certified sources of aggregate information of universal applicability. You won’t find yourself reaching instead for personal anecdotes or any tangentially-related connection to your own experience. You won’t gradually cease to expect that positive knowledge exists for new questions you encounter. You won’t notice the words squirming beneath your feet with their sense gelatinized, like cobblestones turned to jellyfish. “Hermeneutic” doesn’t count.

Description: “Ignorance, A Skilled Practice” is a guest blog post written by a literal banana. The banana’s tiny cartoon arms barely span the keyboard, and as a result the banana is only able to press one key at a time with each hand or foot. The blog post is offered here as an example of what bananas can accomplish when given proper access to technology.
