An Important Misconception About Placebos

Concerning placebos and the “placebo effect,” there is a distinction that I have struggled to articulate, a distinction I have also noticed highly intelligent humans failing to make. I recently found an excellent explanation of the distinction in a paper questioning the meaning of recent “open-label placebo” trials, and thought it was worth a short piece explaining why it’s important.

Here is the distinction as the authors put it, with citations removed:

Before reviewing findings from OLP studies, it is crucial to clearly demarcate between two distinctive uses for the term placebo. First, is the usage of placebos in RCTs. Here the term is often understood to refer to a certain kind of ‘thing’ (eg, saline injections or sugar pills). Strictly speaking, this interpretation is incorrect: instead, placebos in RCTs ought to be conceived as methodological tools since their function is to duplicate the ‘noise’ associated with clinical trials including spontaneous remission, regression to the mean, Hawthorne effects and placebo effects. Properly understood, then, these types of placebos are deployed as controls that are specifically designed to evaluate the difference—if any—between a control group and a particular treatment under scrutiny. Ideally, in RCTs, controls should mimic the appearance and modality of the particular treatment or medical intervention under investigation. In contrast, placebos in clinical contexts are interventions that may be intentionally or unintentionally administered by practitioners either with the goal of placating patients and/or of eliciting placebo effects.

Blease, C. R., Bernstein, M. H., & Locher, C. (2020). Open-label placebo clinical trials: is it the rationale, the interaction or the pill? BMJ Evidence-Based Medicine, 25(5), 159–165.

On the one hand, there is the use of placebos in randomized controlled trials, in which the point is to “duplicate the noise” that’s likely to exist in the treatment group. On the other hand, there are hypothesized “placebo effects” that may take the form of real healing, which is not at all the same.

For a specific example, the enormous placebo effects in antidepressant trials do not mean that depression responds to placebo in real life. The proper conclusion to draw from the size of the placebo effects in these trials is that the measurement of depression is extremely noisy, to put it in the most polite way.

While true “placebo effects” of the healing variety may exist, it’s worth engaging with these authors’ concerns over how that may be demonstrated, particularly in open-label placebo trials, in which the hope is to pave the way toward ethical placebo treatment. The choice of control is particularly tricky; for example, as with antidepressant treatments, simply using “treatment as usual” or “wait list” as controls likely inflates apparent effects. True blinding requires a great deal of subtlety and effort in research design.

In summary: noise isn’t healing.

Now we can all pretend that we knew it all along and never mistook the one for the other!

The Limits of “Help”

As a banana who lives among humans, the sacred beliefs of humans interest me a great deal. By “sacred beliefs” I mean beliefs that are widely shared and have a high level of emotional vehemence surrounding them, preventing them from being questioned, except by trolls. Usually the goal of trolling is to provoke a defensive reaction, so sacred beliefs can often be brought into visibility by effective trolls. 

This is not a troll. I am deadly serious and I think this is extremely important.

The sacred belief I am addressing here seems to have originated in the late twentieth century, and blossomed into ubiquity in the twenty-first century. It concerns “mental health,” and has multiple parts. First, it is the belief that mental illnesses are real diseases, just as serious as ordinary medical diseases, and no more the fault of the sufferer. Responses to trolls like “depression is a choice” bring this belief into visibility. Second, it is the belief that shame and stigma prevent people from seeking help for mental illness. Third, and the only part of the sacred belief system that concerns me here, is the belief that effective treatments for mental illnesses are available, and that once a person overcomes shame and stigma to seek help, the sufferer stands a good chance of getting meaningfully better.

Here, I will focus on one of the most common mental health problems, depression, known in current medical jargon as Major Depressive Disorder. Further, I will focus on the two gold-standard treatments for depression, generally considered to be the most effective treatments: antidepressant medication and cognitive behavioral therapy. I hope to eventually address treatments for two more conditions, schizophrenia and substance abuse disorder, but those will have to wait for future installments.

Medicine has always existed in some form, and there have always been people who claim to have been cured by the techniques of the day, going back at least as far as recorded history and almost certainly much further. But the evidentiary technique that is supposed to separate modern medicine from the misguided attempts at medicine of the human past is the double-blind placebo-controlled trial. Individual trials may lack statistical power and suffer from publication bias and other quality problems, however, so the true best evidence is probably the large meta-analysis of placebo-controlled trials. 

Meta-analyses raised concerns about the efficacy of antidepressant medications almost as soon as they began, but I will focus on a recent large meta-analysis, Cipriani et al., 2018, that received a great deal of attention. Cipriani et al. interpreted their findings thus: “All antidepressants were more efficacious than placebo in adults with major depressive disorder.” This seems like good news, but the bad news was how much more efficacious the drugs were. They report an effect size (standardized mean difference compared to placebo) of .30. This effect is considered “small” according to tradition and current guidelines, but how small is small? 

Most antidepressant trials use one or both of two measurement instruments: the Hamilton Depression Rating Scale (HAM-D or HDRS), a 52-point rating scale applied by researchers or doctors to judge how depressed a person is, or the Beck Depression Inventory, a self-rating scale. (Note: the HAM-D may sometimes be referred to as the HAM-D-17 or HDRS-17 because it has 17 items, but it is a 52-point scale, as each item may score multiple points toward the total.) An effect size of .3 corresponds to about two points on the 52-point HAM-D scale, which, if you happen to read the instrument itself, is plainly not much. And this is before taking into account that methodological issues and questionable research practices may account for most or even all of the apparent superiority over placebo. A difference smaller than seven points may not even be detectable by clinicians. Interestingly, a Cochrane review of Dance and Movement Therapy rejected the therapy as lacking clinical significance because it reduced depression scores by only slightly over seven points relative to (psychological) placebo, which was less than 25% of baseline in the relevant studies. By this standard, since HAM-D scores in the antidepressant trials were generally in the mid-to-high 20s, at least 6 or 7 points would be necessary to achieve clinical significance. The upshot is that antidepressants “work” to a degree that is much too subtle for anyone to notice, which is bad enough, but which also deepens the suspicion that they don’t work at all.
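To make the arithmetic concrete: a standardized mean difference is the raw score difference divided by the pooled standard deviation, so converting back to HAM-D points requires an SD figure. The ~7-point pooled SD below is my assumption, not a number from the trials; it is roughly what makes an SMD of 0.30 come out near two points, as described above.

```python
# Convert a standardized mean difference (SMD) back to raw scale points.
# The ~7-point pooled SD is an assumption (typical of HAM-D trials),
# not a figure reported in the article.
def smd_to_points(smd, pooled_sd=7.0):
    return smd * pooled_sd

print(smd_to_points(0.30))  # ≈ 2.1 points on the 52-point HAM-D
print(7.0 / 7.0)            # an SMD near 1.0 would be needed for a 7-point gap
```

Under this assumption, a clinically detectable seven-point difference would require an effect size around 1.0 – more than three times what the meta-analysis found.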

There are many branching paths of cope that have been explored for the seemingly devastating result that antidepressants are all pretty similar to each other and move a 52-point depression scale by two points at best compared to placebo. One is that the problem is the scale. Maybe a self-report scale would better capture the healing effects of antidepressants? But the Beck did even worse than the HAM-D in placebo-controlled trials, so that explanation fails. Or perhaps antidepressants work very well for some people, just not for most people? That line of thinking was crushed too – the variability in scores in the placebo arms was indistinguishable from the variability in scores in the treatment arms. (Note: a previous version of this study, by the same authors, found significant differences, and was retracted because it was wrong. Since the authors originally came to the opposite result, suggesting some amount of researcher allegiance to the hypothesis, the negative result seems especially likely to be valid.)

Another line of cope is the idea that big placebo effects represent real healing, and that antidepressants should continue to be prescribed for their placebo effect alone. Unfortunately, a great deal of the “placebo effect” found in these studies is probably a result of poor methodology. In the field of dermatology, this is known as “eligibility creep” – researchers inflate scores at baseline in order to qualify subjects, and then don’t inflate the scores at later points of analysis. There does seem to be evidence that this occurs in antidepressant trials. Furthermore, antidepressant medications cause significant side effects, so even if we believed that placebo healing is real healing rather than the result of research bias and questionable research practices, such drugs would not be appropriate to prescribe for this effect. 

Perhaps it shouldn’t be surprising that SSRIs in particular are not effective in treating depression, as every few years, someone points out that the popular serotonin hypothesis of depression is not substantiated by evidence. The most recent is a 2022 review, summarized here. The authors respond to common objections here.  I have no reason to believe that this will be any more effective than similar efforts going back to the 1990s to debunk the serotonin hypothesis; the myth of serotonin seems as sticky and ineradicable as the myth of antidepressant efficacy. 

What about Cognitive Behavioral Therapy? If antidepressants can’t be expected to produce clinically relevant relief from symptoms of depression, what about the most-touted “evidence-based” form of talk therapy? If you have read meta-analyses comparing CBT to “psychological placebo,” you may have seen enormous effect sizes reported. A psychological placebo is something like treatment as usual, a waiting list, or some kind of vague “talking to a therapist,” not a real pill placebo. Even the aforementioned Dance and Movement Therapy achieved big effect sizes against psychological placebo, bordering on clinical significance. While I do not believe that placebo healing is real healing, the pill placebo control does seem to act as a check on questionable research practices. These authors give some suggestions why. I am not familiar with all possible methods of questionable research practice in therapy trials, but it does seem harder to cheat against a pill placebo than against a psychological placebo.

When Cognitive Behavioral Therapy is up against pill placebo, the effect size, according to the most recent meta-analysis I could find, is a mere .22 when using the HAM-D. When using the self-report Beck Depression Inventory, the result is indistinguishable from zero and non-significant. As meager as the results for antidepressants are, the results for CBT are even worse. Researchers seem to think that subjects get a trivial amount better, but subjects themselves seem to feel no better compared to pill placebo.

You might imagine that there would be an outcry challenging this analysis, but I wasn’t able to find one. A typical write-up of this finding was the following:

CBT can benefit patients with severe depression, say researchers

…When compared with pill placebo, CBT led to greater symptom reduction on average by a standardized mean difference of -0.22 (95% confidence interval -0.42 to -0.02; P=0.03) on the Hamilton Rating Scale for Depression. The researchers said that this meant that the number needed to treat is 12 in typical cases of major depression, where the expected placebo response rates may be 30-50%. This would compare favorably, they said, with the number needed to treat (9) that can be expected in antidepressants, with an effect size of 0.31 over placebo.

The community seems to have just accepted this finding as meaning that CBT works, without apparently addressing the fact that the effect represents less than a two-point drop on a 52-point symptom scale, which is, as explained above, not clinically significant and probably not even clinically detectable. 
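The NNT figures in the quoted write-up can be roughly reproduced from the effect sizes. The sketch below uses Furukawa’s probit conversion – one common method for turning an SMD into an NNT given a control-group response rate; the write-up does not say which conversion was actually used, and the 40% placebo response rate is simply a value picked from the quoted 30–50% range.

```python
import math

def normal_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_ppf(p, lo=-10.0, hi=10.0):
    # Inverse CDF by bisection; plenty of precision for this sketch.
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def nnt_from_smd(d, control_response_rate):
    # Furukawa's method: shift the control response threshold by d on the
    # latent normal scale, take the implied treatment response rate, and
    # invert the absolute difference in response rates.
    z = normal_ppf(control_response_rate)
    treatment_rate = normal_cdf(z + d)
    return 1.0 / (treatment_rate - control_response_rate)

print(round(nnt_from_smd(0.22, 0.40)))  # about 12, matching the CBT figure
```

With d = 0.31 the same formula gives an NNT in the 8–9 range, depending on the assumed placebo response rate, consistent with the antidepressant figure quoted. Note that the conversion says nothing about whether a sub-two-point average difference is clinically meaningful; it only restates the same small effect in a different unit.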

One defense of CBT might be that, unlike antidepressants, at least talk therapy cannot be harmful. I am not sure I believe this. Although CBT is not approved for the treatment of major depression in bananas, my experience with CBT was initially promising: it offered the hope that my bad emotions were caused by bad thoughts, and that deconstructing the bad thoughts would limit the occurrence of bad emotions. However, what I learned through a few weeks of CBT, deconstructing every bad thought and bad emotion, is that the frequency and intensity of bad emotions are not affected at all by reasoning. If anything, the thoughts that co-occurred with bad emotions got even more ridiculous. Without CBT, it might not have been clear to me that I had no control over the occurrence of bad emotions. This might be regarded by some as a harm. Initially, I’d suspected that the apparent large effect sizes for CBT were a result of subjects answering surveys differently – e.g. experiencing a pure bad emotion rather than identifying it as “guilt” – but apparently my hypothesis was wrong: it was bad controls all along, and I had been insufficiently cynical.

I am also a bit surprised, given the sacred beliefs mentioned at the beginning, by the fact that the existence of CBT is not considered insulting. It is hard for me to distinguish the methodology of CBT and other talk therapy methodologies from the aforementioned troll “depression is a choice.” If it’s really a disease, how would it make sense for it to be treatable by thinking correctly instead of thinking wrong? But apparently most humans do not make this connection and hence do not thereby feel insulted. 

An interesting rejoinder to the evidence that psychiatric treatments are not particularly effective is that very little medicine is actually effective. Harriet Hall, biting every bullet, titles her article “Most Patients Get No Benefit From Most Drugs.” Certainly, the problem of medicine generally not being effective is not limited to the field of psychiatry. But does that make it somehow excusable that the top treatments for depression cannot produce any clinically relevant effect? If I found out that all shoes tend to degrade into uselessness in two days, it wouldn’t make me feel a lot better to find out that hats also degrade into uselessness in two days. 

If antidepressants and CBT are the best treatments available, and have no clinically significant effect on symptoms of depression, what “help” is reasonably available? If the “stigma and shame” preventing people from seeking mental health treatment disappeared overnight, and everyone got treatment – and this seems to have largely happened, as antidepressants and CBT are as popular as they have ever been – it seems unlikely to make any difference in outcome. If the best the field has to offer is a glorified placebo, perhaps it has no help to offer at all. If this is true in many fields of medicine, then the problem is multiplied rather than solved.