Day One: Placebo Workshop: Translational Research Domains and Key Questions
Transcript
July 11, 2024
Welcome Remarks
ERIN KING: All right. We'll go ahead and get started. On behalf of the co-chairs and the NIMH planning committee, I'd like to welcome you to the NIMH Placebo Workshop: Translational Research Domains and Key Questions.
Before we begin, I'm going to quickly go through a few housekeeping items. All attendees have been entered into the workshop in listen-only mode with cameras disabled. You can submit your questions via the Q&A box at any time during the presentation, and be sure to address your question to the speaker you'd like to have respond. For more information on today's speakers, their biographies can be found on the event registration website.
If you have technical difficulties hearing or viewing the workshop, please note these in the Q&A box and our technicians will work to fix the problem. You can also send an email to NIMH@mn-ee.com. And we'll put that email address in the chat box. This workshop will be recorded and posted to the NIMH event website for later viewing.
Now I'd like to turn it over to the acting NIMH Director, Dr. Shelli Avenevoli for opening remarks.
I think the audio is still out. If we can restart the video with the audio turned up.
TOR WAGER: That was some placebo audio. I think I might be able to share my screen and get the audio to come up on the video. So maybe I can try that. Hopefully you can see this okay. Let's see if it comes through.
SHELLI AVENEVOLI: Good morning. I'm excited to be here today to kick off the NIMH Placebo Workshop. I am currently the Acting Director of NIMH, and I look forward to serving in this role while NIMH conducts a national search for the next NIMH Director.
Today we are bringing together experts in neurobiology, clinical trials and regulatory science to examine placebo effects in drug, device and psychosocial interventions. NIMH has long understood that the placebo phenomenon is highly active in studies of mental illness. How to design clinical trials and interpret their results, and what the neurobiological mechanisms of placebo are, have been important research questions that still have significant gaps. Consequently, I'm eager to learn what you believe are the most important questions in placebo research and how they might be answered. This is no small charge, I understand. But our organizers have designed a carefully thought out agenda to help facilitate our success.
The workshop is organized into domains that aim to identify those important questions. I'm looking forward to hearing a historical review of the successes and failures around mitigating the placebo response in both academic and industry research. This includes historical perspectives in drug and device trials, understanding psychosocial aspects of the placebo response and measuring and mitigating the placebo effect.
Clearly, several perspectives will be discussed during these presentations. It will be exciting to hear your individual views as well as the panel discussions. I'd like to thank Doctors Tor Wager and Cristina Cusin, the co-chairs of the workshop, as well as the rest of the planning committee for their work in organizing this excellent agenda.
I will now turn it over to Dr. Tor Wager. Thank you.
Introduction and Workshop Overview
TOR WAGER: Okay. Hi, everybody. Sorry the audio didn't turn out as well as we had hoped, but I hope you could still hear it to some degree. And I just want to say I'm really delighted to have you all here. And I'm really delighted that NIMH has decided to organize this workshop and has worked so hard in planning it.
I'd like to thank my co-chair Cristina and also the NIMH co-leads Erin King and Doug Meinecke, as well as the rest of the team that's been working really hard on preparing this meeting, including Meg Grabb and Laura Rowland and Alex Talkovsky, Mi Hillefors and Arina Knowlton.
My job for the next few minutes is just to give you a brief overview of some of the main concepts in the placebo field altogether. And I'm going to start really at the very, very beginning.
The workshop goals are really to understand how placebo and nocebo effects impact clinical trial design and outcomes; to understand some of the psychological, neurobiological, and social mechanisms that underlie placebo effects.
And we'd like to think together to use this understanding to help to identify and maximize therapeutic effects of drugs and devices. And that means better clinical trial designs, better identification of outcomes, and also to harness placebo mechanisms in clinical care alongside active treatments so that we don't think of only specific treatments, we think of treatments as having psychological and psychosocial components as well as active drug or device components.
And to go back to the very, very beginning, my colleague Ted Kaptchuk once wrote that the history of medicine is the history of the placebo effect. So this is the Ebers Papyrus, circa 1500 BCE, and it documents hundreds of ancient medications that are now thought to be little better than, or no better than, placebos. Some of them we recognize today: for example, opium, the ingredient of opiates; and wormwood, the ingredient of absinthe, for headache.
If you were poisoned, you might be treated with crushed-up emerald or bezoar stone, which is undigested material from the intestines of animals. You might be treated with human sweat and tapeworms and feces, moss scraped from the skull of a hanged criminal, or powdered Egyptian mummy, among many other treatments. And what all of these have in common is that none of them, or very few of them, have active ingredients in terms of specific effects, but they all act on the mind and brain of the perceiver. And so there is something about the beliefs and the imagination of the person that has made these treatments persist for many, many centuries.
And this provides both a challenge and an opportunity. I'm going to introduce the challenge with this clinical trial of a gene therapy for Parkinson's disease, AAV2-neurturin, which was an industry funded trial. And they went out two years. This is a genetic manipulation intervention for Parkinson's disease. And what you see here is an improvement in motor scores, UPDRS Part III, in Parkinson's. And if you look, people getting the active treatment got substantially better within the first six months, and they stayed better for two years.
And this seems great. But the problem is that this trial failed. And the failure resulted in the drug company being sold off and this treatment may never see the light of day. And that's because people in the placebo group also got better and stayed better for two years. And there was no drug placebo difference.
And this is really shocking to me because Parkinson's is a neurodegenerative disorder. And so it's very surprising to see changes of this magnitude last this long. So the opportunity is in harnessing these psychosocial processes and the active ingredients that go into placebo responses like this. And the challenge, of course, is that placebo responses can mask effects of treatment in the way that we've seen here.
And this is not a unique occurrence. In many cases, there are treatments that are widely used that are Medicare reimbursed that turn out after they are tested later to not be better than placebo in clinical trials, randomized trials. And this includes arthroscopic knee surgery for arthritis, vertebroplasty, epidural steroid injections which are still practiced widely every day. Some other interesting ones like stents for angina, which is chest pain. And also some recent high profile failures to beat placebo after very initially promising results in emerging treatments like gene therapy for Parkinson's disease that I mentioned before and deep brain stimulation for depression.
A recent interesting case is the reversal of FDA approval for phenylephrine, which is a very common nasal decongestant. It's the most widely used decongestant on the market, with almost $2 billion in sales. It turns out it may not be better than placebo. One of the problems is that in some areas, for example in chronic pain, placebo effects are growing across time but drug effects are not. And so the drug-placebo gap is shrinking, and fewer and fewer treatments are getting through clinical trials and to market.
And that's particularly true in the United States, as shown in this study by Alex Tuttle. So as an example, surgery has been widely practiced first in an open label way, where people know what they are getting. And it was only much later that people started to go back and do trials where patients would get a sham surgery that was blinded, or just a superficial incision, so the person doesn't know that they are not getting the real surgery. And those sham surgeries in many cases have effects that are substantial, and in some cases as large or nearly as large as the active drug effects.
So this is what we call placebo response which is overall improvement on placebo. It doesn't mean that the sham surgery or other placebo treatment caused them to get better.
And so if we think about what the placebo response is, it's a mixture of interesting and uninteresting effects, including regression to the mean: people fluctuate in their symptoms over time, and they tend to enroll when their symptoms are high. There is sampling bias and selective attrition. There are natural history effects. And then there is the placebo effect, which we'll define as a causal effect of the placebo context.
And the simplest way to identify a placebo effect is to compare placebo treatment with a natural history, or no-treatment, group in a randomized trial. So here in this three-arm, parallel-groups trial, the typical comparison identifies the active drug effect by comparing active treatment to placebo. And you compare placebo to the natural history group to identify the placebo effect, as in the sketch below.
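To make those two contrasts concrete, here is a minimal sketch in Python with hypothetical group means; all the numbers are illustrative only, not from any trial discussed here.

# Hypothetical mean improvement per arm in a three-arm parallel-groups trial.
mean_improvement = {
    "natural_history": 2.0,  # no-treatment arm: regression to the mean, natural course
    "placebo": 5.0,          # placebo arm: natural history plus placebo effect
    "drug": 7.0,             # active arm: placebo response plus specific drug effect
}

# Placebo effect: placebo arm versus natural history arm.
placebo_effect = mean_improvement["placebo"] - mean_improvement["natural_history"]

# Specific drug effect: active arm versus placebo arm.
drug_effect = mean_improvement["drug"] - mean_improvement["placebo"]

print(f"Placebo effect (placebo - natural history): {placebo_effect}")  # 3.0
print(f"Drug effect (drug - placebo): {drug_effect}")                   # 2.0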
And if we look at those studies that do such comparisons, we can see that there are many effects across different areas. And those effects are active brain body responses or mental responses to the treatment in context. And so there are many ingredients. It's not the placebo drug or stimulation or device itself, of course, that has the effect. It's the suggestions and the context surrounding that.
And there are many types of cues. There are verbal suggestions and information; there are place cues; there are social cues, including body language and touch. There are specific treatment cues that are associated with the drugs. And there is a rich internal context: expectations about treatment outcomes, interpretations of what symptoms mean and of the meaning of the therapeutic context and the care context, as well as engagement of emotions and memories, and what I'm calling here precognitive associations, which are learned or conditioned responses in the brain and the body. So there is a large family of placebo effects; not one, but many placebo effects. They operate both via conscious and unconscious means. They are embedded in the nervous system through learning processes. And an idea here is that the meaning of the treatment and the symptoms to the person is really the key: what are the implications of the cues and the symptoms and the whole context for future well-being?

So if we look at studies that have isolated placebo effects compared to no treatment, we see that there are many studies and many systematic reviews and meta-analyses, covering many types of clinical pain; depression; Parkinson's disease, in motor symptoms as well as other symptoms; anxiety, including social anxiety in particular and general anxiety; substance misuse and perceived drug effects; some effects in schizophrenia; potentially some effects in asthma, and that is a tricky one, with conflicting results that we could talk about; and effects on sleep and cognitive function and more. So these effects are really widespread.
There have been some attempts to decompose how large the effects of placebo are versus the effects of active drugs. So if you look at pharmacotherapy for depression, at least in one analysis here by Irving Kirsch, half of the overall benefit, the active treatment response I should say, is placebo. A very small proportion is specific drug effects. And about a quarter of it is people who would have gotten better anyway; they recover spontaneously from depression. That's natural history.
So the placebo effect is a large part of the overall therapy response. And this mirrors what's called common factors in psychotherapy, for mood and anxiety disorders, substance use disorders and more. Common factors are those therapeutic elements that are shared across many treatments, drug and therapy alike: providing listening and social support, positive engagement and positive expectations. And in this analysis here, the common factors were responsible for the lion's share of the therapeutic effects of psychotherapy.
So in one sense you can say that placebo effects are really powerful; they can affect many kinds of outcomes. But there is continuing controversy, I would say, even though these competing "New York Times" headlines are somewhat old now. And this latter headline came out after a landmark meta-analysis by Hróbjartsson and Gøtzsche in 2001, which they've updated several times since then.

And what they found is consistent with what I said: there are significant placebo effects in the domains that they were powered to detect. But they discounted those. They said it's probably due to reporting bias and other kinds of biases. So a key question is: which outcomes count as important?
So here is an example from a fairly recent study of expectancy effects in anxiety. It compares people getting an SSRI in a typical open label way, which is the blue line, with people who got a hidden SSRI; they didn't know that they were getting the SSRI. And that difference is a placebo-like effect, or an expectancy effect.

There was a substantial drop in anxiety that was caused by the knowledge that one was being treated. So the question is, does that actually count as a meaningful effect? And I think it's right to debate and discuss this. It relates to an idea I'll call, heuristically, depth. This effect might simply be people telling us what we want to hear; that's a communication bias, or a so-called demand characteristic, which has been studied since the '50s.

It could be an effect on how people feel and their decision making about how they report feelings. It could be an effect on the construction of anxiety in the brain. Or it could be a deeper effect, potentially on some kind of lower-level pathophysiology, some kind of effect on the organic causes of anxiety.
So the gold standard has been to look for these organic causes. And it gets very tricky when you define outcomes in terms of symptoms, as is true with pain, with depression-related symptoms, anxiety-related symptoms and more in mental health. In pain, what the field has been trying to do is to look at pathways that are involved in early perceptual effects of nociception, and at those central circuits that are involved in constructing the pain experience, to ask if those are affected. This is probably the most developed area of human neuroscience of placebo effects. And we see reduced responses to painful events in many relevant areas, including, in some studies, spinal cord areas that are known to give rise to nociceptive input to the brain.

There are increases in activity in putative pain control systems that send descending projections down to the spinal cord. And there is release of endogenous opioids with placebo treatment in some of those pain control systems and in other areas of the frontal cortex and forebrain. So these are all causal effects of placebo treatment that seem to be relevant for the construction of pain.
And what is remarkable is that the effects in the frontal cortex that are the most reliably influenced by placebo including the medial prefrontal cortex and the insula and other areas really are not just involved in pain, of course. They really affect some systems that are involved in high-level predictive control of motivation, decision making and perception.
So an emerging concept is the idea that what these circuits are for, and what a lot of our brain is for in general, is forming a predictive model of what is going to happen to us, what situation we find ourselves in. So these cortical circuits are important for representing hidden states that we have to infer. And that's another way of saying meaning: understanding what the meaning of events is. If it's an eye gaze, what is the meaning of that look? If it's a movement, what is the underlying meaning of the movement?
And it's that underlying situation model, predictive model that guides how we respond to a situation and what we learn from experience. So these systems in the brain that are influenced by placebo provide joint control over perception, over behavior and decision making including whether we choose to smoke or not smoke or eat more or eat less. And the body through the autonomic and neuroendocrine and immune systems. So broadly speaking, there is this joint control.
So this is one example where we can get closer to pathophysiology with some forms of placebo effects. And this is forebrain control over all of the various brainstem and spinal centers that are important for particular kinds of regulation of the body. The respiratory muscles, the heart, the intestines, and immune responses as well. When we look in the brain, the most consistent correlates in meta analyses of immune changes in the body are those that seem to play central roles in placebo effects as well like the ventromedial prefrontal cortex.
And another important development in, and aspect of, this is the idea of parallel models in nonhuman animals and in humans, particularly those that use classical conditioning. So there are many kinds of pharmacological conditioning in which a cue is paired with a drug over time, usually over several days. And then the cues alone, like the injection alone, can come to elicit effects that sometimes mimic drug effects and sometimes are compensatory responses that oppose them.
And one of the most famous was the phenomenon of conditioned immunosuppression, first demonstrated by Bob Ader in the mid-1970s, which has since been developed quite a lot. This is from a very comprehensive review by Manfred Schedlowski's group of different kinds of immunosuppressive responses. And the point I want to make here is that there is increasing evidence that the insular cortex, as an example, is really important for storing memories about context that then get translated into effects on cellular immunity that are relevant for the trajectory of health and disease in broad ways.

And those areas of the insula are similar to those that are involved in placebo effects in humans on pain, itch, cough, disgust and other conditions as well. So there is the potential here for memories that are stored in the cortex to play out in very important ways in the body. And that can influence mental health directly and indirectly as well.
And I want to move toward wrapping up here with a couple of ideas about why these effects should exist. Why do we have placebo effects in the first place? Two ideas are that we need them for two reasons. One is predictive control: the idea that what we need an evolved, highly developed brain for is to anticipate threats and opportunities in the environment and respond in advance. We don't respond to the world as it is; we really respond to the world as it could be, or as we think it will be.
And the second principle is causal inference. What is less relevant is the particular sensory signals that are hitting our receptors at any one time. What is really more important is the underlying state of the body and the world, what is happening.

Just to illustrate those things, one example from Peter Sterling is the very complicated machinery for regulating blood pressure when you stand up and when you are under psychological stress. We need this complex set of machinery in order to predict what the future metabolic demands are. So our blood pressure, like other systems, essentially responds in advance of challenges. And that's why stress affects so much of our physiology.
An example of the second is a simple example from vision. If you look at these two squares that we circled here, you can see they probably look like they are different colors. One is brighter and one is darker. But if I just take away the context, you can see that the squares are exactly the same color. And so you don't see the color of the light hitting your retina. What you see is your brain's guess about the underlying color of the paint or the color of the cubes that discounts illumination and factors it out as a cause. So what our perceptual systems are doing is causal inference.
So with pain, itch or nausea, for example, or other symptoms, or mood or motivation, you don't feel your skin or your stomach or your body in a direct way. Your brain is making a guess about the underlying state from multiple types of information. And this really starts with our memories and past associations and our projections about the future.
So I'm using pain as an example because we study it a lot. But the idea is that the pain really starts with these projections about the future. And there is a representation in the brain of the current state of threat and safety, if you will. Nociceptive input from the body plays some role in that, but it's really the central construction that integrates other forms of context, what is the look, what kind of support are you getting, that together determines what we end up feeling.
And there are different kinds of responses that are linked to different parts of that system. But suffering and well-being, fatigue and motivation, all of those things I think are related to the current state.
There are many open questions. One is which outcomes count as important for determining whether an intervention is meaningful. Can we separate changes in decision making and suffering from response biases that we really shouldn't consider important for clinical research?

Secondly, can we identify outcomes affected by real treatments, drugs and devices, but not placebos? And how can we use those outcomes in clinical trials, on the regulatory front as well as the scientific front?

Third, what kinds of experimental designs will help us separate specific effects from these broader context effects? And is this a reasonable goal? Can we actually separate them, or do they often work together or synergize with one another? So do they interact?
Fourth, can we predict who will be a placebo responder from personality, genetics perhaps, or brain responses? Can we use this to maximize our treatment effects in clinical trials and improve the pipeline? And, you know, unclear whether that is possible.
And finally, how can we use all of these factors we've discussed alongside other treatments that are current medical treatments to improve outcomes?
With that, I'm just going to introduce the rest of today. I realize we're a little bit long getting started; hopefully we can make up some time here. But now we're going to start our first session, which is about perspectives on placebo in drug trials from Michael Detke, Ni Khin and Tiffany Farchione. So this is going to be about the history and current state of how placebo effects interface with the regulatory environment.
Then we'll take a break. And after that we'll continue to the rest of the sessions. So without further ado, I would like to turn it over to Mike. Thank you.
Historic Perspectives on Placebo in Drug Trials
MICHAEL DETKE: I think Ni is going before me. Correct, Ni?
NI AYE KHIN: Yes, I am.
MICHAEL DETKE: Okay, thank you.
NI AYE KHIN: I'll do the first part for the historical perspective.
Hi, I'm Ni Khin. And I'll be talking about historical perspective on placebo response in drug trials.
My disclaimer slide. Although I'm currently an employee of Neurocrine Biosciences, part of the presentation today is work conducted during my tenure with the U.S. Food and Drug Administration.

The presentation reflects my own views and should not be attributed to any of the organizations I was, or am currently, affiliated with.
Let me start with a brief overview of what the FDA requires for drug approval. FDA regulations require substantial evidence, consisting of evidence from adequate and well-controlled trials.

The usual interpretation is that this requires two positive randomized controlled clinical trials. However, in the drug approval process we use a holistic approach in reviewing clinical efficacy and safety from clinical trials. The FDA sees data from both successful and unsuccessful studies, positive and negative, as a package when the industry or drug sponsors submit New Drug Application packages to the agency. The efficacy results generally come from shorter-term efficacy data. And the safety data follow the ICH requirement: 1,500 patients exposed, 300 to 600 for six months, and at least 100 patients for a year. Generally, the maintenance efficacy trials, also called relapse prevention trials, are conducted mostly post-approval in the U.S.
So the data that I'm presenting come from a pooled analysis of data submitted to the agency in support of New Drug Applications. Why did we undertake this data mining effort? As you know, the high rate of placebo response and the decline in treatment effect over time in psychiatry were the major concerns. At the time we did this analysis, there was an increasing number of trials at clinical trial sites outside the U.S., and we were looking into the applicability of data from non-U.S. sites to the U.S. population.
So we did exploratory analyses of pooled efficacy data from two different psychiatric indications, major depressive disorder and schizophrenia. We had both trial-level and subject-level data. For depression, across the application packages we had the Hamilton Depression Rating Scale as the common primary or key secondary efficacy rating scale. In the schizophrenia application packages we had the PANSS, the Positive and Negative Syndrome Scale.
So we were looking at those two endpoint measures, did some exploratory analyses, and then summarized the findings. I'll also share today the processes and challenges we experienced in looking into these databases.
Let me start with the depression trial-level data that we looked at. It consisted of 81 short-term randomized controlled trials, spanning about 25 years, mainly of SSRI and SNRI antidepressants. Across those 81 short-term controlled trials, the total number of subjects was over 20,000, with 81% enrolled at U.S. sites. And as you can see here, the majority were white and female, and the mean age was around 43 years. Baseline HAMD scores were approximately 24. And the average dropout rate in these trials was approximately 33%.
We explored treatment effect and trial success rate, based on the questions raised about applicability of data from non-U.S. sites to the U.S. population. These are the overall results that we published in a 2011 paper. We noticed that both placebo and drug groups from non-U.S. sites tended to show larger changes from baseline in HAMD-17 total scores than those observed in the U.S.

You can see in the left-hand column that the non-U.S. placebo response is approximately 9.5 points, and the U.S. is about 8. But the drug response was also slightly larger at non-U.S. sites. So if you take the drug-placebo difference, the average is about the same for data coming from both U.S. and non-U.S. sites: about 2.5 points on the HAMD total.

So what we see overall, over 25 years of antidepressant trials, is an increase in highly variable placebo responses across trials, and a slight decline in treatment effect, moving from approximately a three-point drug-placebo difference in HAMD total toward two points. And the trial success rate was slightly lower, 55% versus 50%.
As part of that analysis, we also looked at any difference between fixed- and flexible-dose data. Ninety-five percent of the trials in the database utilized a flexible dosing regimen. Placebo responses were quite similar, and the treatment effect was slightly larger for flexible dosing as compared to fixed dosing.

And we pointed out that in our analysis we used the number of trials, rather than the number of treatment arms, as the denominator in the calculation. That gave a slightly higher trial success rate for fixed-dose trials, 57%, versus 50% for flexible-dose trials.
Some of you may already know that there was an earlier paper published by Arif Khan and his group, using a similar database, but with datasets from trials conducted between 1985 and 2000.

That analysis showed a success rate of 61% for flexible-dose studies versus 33% for fixed-dose studies, and Khan used the number of treatment arms as the denominator; calculated that way, flexible dose is also about 60% compared to 31% for fixed dose. However, in our larger database, which included data from trials conducted after 2000, that is 2001 to 2008, the findings favor the fixed-dose design, with a success rate around 60% for fixed-dose arms compared to 34% for flexible-dose arms. So we think that in more recent trials, the success rate for fixed-dose studies is likely higher.
In addition to trial-level data, we also looked into subject-level data from these trials. For the subject-level data, we started with 24 randomized controlled trials and then expanded to 45. And the main thing we were looking at was what we could use as a responder definition. Do we need a HAMD total score cutoff?

From that analysis we noticed that a 50% change from baseline is sufficient to define responder status, and a HAMD total cutoff is not necessary. Whether you use percent change or a HAMD total cutoff or both, you capture more or less the same people as meeting responder status. A simple sketch of that definition follows.
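Just to make the criterion concrete, here is a minimal sketch in Python of the responder rule described above; the data frame, column names and values are purely hypothetical.

import pandas as pd

# Hypothetical subject-level HAMD scores; all values are illustrative only.
df = pd.DataFrame({
    "subject": [1, 2, 3, 4],
    "hamd_baseline": [24, 26, 22, 25],
    "hamd_endpoint": [10, 14, 12, 6],
})

# Responder: at least a 50% reduction from baseline, with no absolute
# HAMD total cutoff, per the analysis described above.
pct_reduction = (df["hamd_baseline"] - df["hamd_endpoint"]) / df["hamd_baseline"]
df["responder"] = pct_reduction >= 0.5

print(df)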
Another item we looked into was optimal trial duration. Generally, eight-week trials are the ones that give overall successful trial results, and we looked into whether shortening to six weeks would give similar results. The answer was somewhere in between: you could perhaps shorten the trial if you could see the two-point difference at week six.

Another item we looked into was time to treatment discontinuation, instead of change from baseline, as the primary efficacy endpoint. And the data were not supportive of time to treatment discontinuation as an alternative primary endpoint for drug trials.
So I'm going to cover a little bit about efficacy results from maintenance efficacy trials, also known as relapse prevention trials, where we usually use a randomized withdrawal design.
These are generally not a regulatory requirement in the U.S. But if the agency sees that such a study would be needed, we communicate that to the drug sponsor before the application comes in.
As you can see on this slide, these longer-term maintenance efficacy studies are generally designed with open label treatment for approximately 12 weeks. Once subjects meet stable responder status, they are randomized into a double-blind randomized withdrawal phase, to either continue on the drug or switch to placebo. The endpoint generally used is time to relapse, or the relapse rate. We looked at trial-level data from 15 randomized controlled maintenance trials, using randomized withdrawal, conducted between 1987 and 2012. You can see the demographic disposition is more or less the same across these trials. The average number of subjects per study is around 500. The mean HAMD score at baseline, prior to open label treatment, is more or less the same across studies; at randomization, after subjects meet responder status, the mean HAMD total score is 9.4.
The response and relapse criteria used in these studies varied among studies, and the stabilization period varied. Regardless, these drugs were approved based on short-term studies, and you also see maintenance efficacy based on the results of these studies.

This is just the overall slide that shows the duration of the open label phase, the open label response criteria and response rate, the double-blind study period, the relapse criteria, and the placebo and drug relapse rates, with roughly a 50% reduction in relapse seen with drug treatment.

These results were published. Overall, I just want to summarize by saying that almost all the trials were successful. In the open label phase, the mean treatment response was about 52%. Among those meeting responder status and going into the double-blind randomized withdrawal phase, there was on average a 50% reduction in relapse rate for the drug treatment group as compared to placebo. And in that paper we have side-by-side comparisons of subject-level data in terms of relapse survival analysis, Kaplan-Meier curves.
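Here is a minimal sketch, in Python with the lifelines library, of the kind of Kaplan-Meier time-to-relapse comparison just described; the data are simulated, and all the numbers are illustrative rather than from the trials discussed.

import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(0)

# Simulated randomized-withdrawal data: time to relapse in weeks, with
# administrative censoring at 26 weeks of follow-up.
n, follow_up = 100, 26.0
t_drug = rng.exponential(scale=40.0, size=n)     # drug arm relapses later
t_placebo = rng.exponential(scale=15.0, size=n)  # placebo arm relapses sooner

kmf = KaplanMeierFitter()
for label, times in [("drug", t_drug), ("placebo", t_placebo)]:
    observed = times <= follow_up                # was relapse seen before censoring?
    kmf.fit(np.minimum(times, follow_up), event_observed=observed, label=label)
    # Estimated probability of remaining relapse-free at week 12.
    print(label, "P(no relapse by week 12) =", round(float(kmf.predict(12.0)), 2))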
And let me summarize a little bit about the schizophrenia trial data. We did a pooled analysis of 32 randomized placebo-controlled short-term clinical trials conducted between 1991 and 2009, mainly of atypical antipsychotics. This slide shows the number of subjects, along with mean age and demographic distribution, and the mean baseline PANSS total score.
We observed increasing placebo response, stable drug response, and declining treatment effect over time in the North America region. One thing we noticed was that treatment effect decreased as body weight increased in North American trial patients. FDA also conducted an analysis of the post-2009 period, and this slide shows the comparison between pre-2009 and post-2009 trials. Trials in recent years are predominantly multiregional, and the dropout rate is slightly higher. But when you look at the two pooled analyses in combination, the continuing trend of increasing placebo response and decreasing treatment effect still persists over roughly a 25-year period, in both the depression and the schizophrenia pooled data analyses.
I just want to let folks know a little bit about the challenges in doing these types of pooled analyses. One is data standards, which differ because of the technology of those times; we do not have subject-level data in the database for trials conducted before 1997.

And of course resources are always an issue. The main point that I would like to bring to everyone's attention is collaboration, collaboration, collaboration in terms of solving this major issue of placebo response.
I'm going to stop here. And I'll let Dr. Mike Detke continue with this topic from industry perspective. Mike.
MICHAEL DETKE: Thanks, Ni. I'm having problems sharing my screen. I got to make this full screen first. Okay, great. Sorry, minor technical problems. Thanks for the introductions, thanks to NIMH for inviting me to present here.
As Ni said very well, my background is industry, so I'll be presenting this from an industry perspective. I've spent 25 years working at a clinical trial site, in big pharma, in small biotech, and at a vendor company, all in CNS clinical development, mostly drugs. I'm also a board certified psychiatrist and practiced for about 20 years; I still do medicine part time. And I'll note relevant disclosures as they come up during my talk, because I have worked in these fields a fair bit.
So that being said, there we go. This is just a high level overview of what I'll talk about. And again, from the industry perspective in contrast to the –
ERIN KING: Your camera is off if you want to turn it on.
MICHAEL DETKE: I will turn it on. My apologies. There we go.
So as I said, I'll be presenting from the industry perspective. And for the most part, my definition of placebo response throughout this talk is this: if the patients got seven points better on placebo and the patients got ten points better on drug, the placebo response is seven points. We'll be focusing on that perspective.
And Tor gave a great overview of many other aspects of understanding placebo. And we'll talk and my esteemed co-presenters will talk more about that, too.
But again, I'll give you the historical perspective. Mostly I'm going to try to go through some data, some a little older, some a little newer, on things that have been tried to reduce placebo response and/or improve signal detection, the drug-placebo separation, which, especially with a proven effective therapeutic, is probably the better way to look at it. And this is just a list of some of the topics I'll cover. I've got a lot of ground to cover, and this won't be exhaustive, but I'll do my best to get through as much of it as possible for you today.
Dr. Khin already talked about designs including the randomized withdrawal design. Important to keep those in mind. I'll briefly mention a couple of other major designs here that are worth keeping in mind.
The crossover design has the advantage of much higher statistical power, because, used in the ideal way, the patients serve as their own controls. You're doing within-subject statistics, which makes the design much more powerful; you can do a much more statistically powerful study with far fewer patients.

A couple of important cons are that there can be carryover effects of the drugs, pharmacokinetic ones, or, even if the drug is completely washed out, the patient's depression or whatever might have reached a better state that lingers for some time. Because of these carryover effects, you can't be totally certain that the baseline of phase two is the same as the baseline of phase one. And that's an important issue.

But diseases with stable baselines, and I think in the CNS space things like adult ADHD, could be candidates you would consider for this, perhaps in proof of concept rather than confirmatory trials, though. I'll leave that to my colleagues from the FDA.
Sequential parallel comparison design. This was introduced quite a while ago and has been published on a great deal. This is a design where, in phase one, some of the patients get drug and others get placebo, randomized just like a typical parallel-arm randomized study. However, in a second phase, the placebo nonresponders specifically are re-randomized to receive placebo or drug. So this has a couple of advantages.

One is that there are two phases from which you can combine the data. The other is that the second phase enriches for placebo nonresponders, just like the randomized withdrawal design enriches for drug responders. This has been published on in the literature. This slide hasn't been updated in a while, but even a few years ago, across quite a few reported trials, there was a reduction in placebo response in phase two, the drug-placebo difference improved, and the p-values were better and so forth. So this is an important trial design to know about. Dr. Farchione will talk about one example of this having been used recently. It's a little bit hard to evaluate because you can't really do within-trial comparisons of different trial designs; that's a limitation, so these are really all cross-trial comparisons.

And there are some advantages and disadvantages. By using patients twice, you might be able to do the trial with somewhat fewer patients, saving money and time. On the other hand, there are two phases, so in that sense it might take a little longer. So, various pros and cons, like anything. A sketch of the two-phase analysis idea follows.
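To illustrate the pooling idea, here is a minimal simulated sketch in Python. The equal-weight combination of the two phase estimates is only a simplified stand-in for the weighted test statistics used in the sequential parallel comparison design literature, and all numbers, including the nonresponder threshold, are invented.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Phase 1: standard parallel-group randomization (simulated improvement scores).
drug1 = rng.normal(8.0, 6.0, 150)
placebo1 = rng.normal(6.0, 6.0, 150)

# Identify placebo nonresponders (a simple threshold, purely for illustration).
nonresponders = placebo1 < 6.0
n2 = int(nonresponders.sum())

# Phase 2: re-randomize the placebo nonresponders to drug or placebo.
drug2 = rng.normal(7.0, 6.0, n2 // 2)
placebo2 = rng.normal(4.0, 6.0, n2 - n2 // 2)

def diff_and_se(a, b):
    """Mean difference (a - b) and its standard error."""
    d = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return d, se

d1, se1 = diff_and_se(drug1, placebo1)
d2, se2 = diff_and_se(drug2, placebo2)

# Combine the two phases with equal weights (w = 0.5 here; the published
# methodology derives the weights and the test statistic more formally).
w = 0.5
z = (w * d1 + (1 - w) * d2) / np.sqrt(w**2 * se1**2 + (1 - w) ** 2 * se2**2)
p = 2 * stats.norm.sf(abs(z))
print(f"phase 1 diff = {d1:.2f}, phase 2 diff = {d2:.2f}, z = {z:.2f}, p = {p:.4f}")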
And then I'm going to talk about placebo lead-in. So historically people did single-blind placebo lead-ins where all patients would get placebo for the first week or so blinded to the patient, not to the staff. And then if they had a high placebo response, they would be excluded from the study.
Typically it was about a week, with about a 30% placebo response cutoff, but it varied. Trivedi and Rush did a great review of this, over a hundred trials as you can see, and found little evidence that it reduced placebo response or improved drug-placebo separation.

This is some work from my early days, in the early 2000s at Eli Lilly, when I worked on duloxetine (Cymbalta) for about seven years. We did something called a variable-duration placebo lead-in. The design, as it was presented to the patients and to the site personnel, was that randomization would occur any time between visits two and four, which meant they were on placebo for zero, one, or two weeks. Usually, in fact, they were on it for one week.

This has some pros and cons, again, practically. The placebo lead-in adds a week or two of timeline and cost. And the way this was designed, to maintain the blind, the patients that you, air quotes, throw out for having too high a placebo response have to be maintained throughout the study, which costs money and means that your overall N might need to be higher. So there are time and money implications.

When we looked at this, Craig Mallinckrodt, a statistician, published on it. And we found that the average effect size did go up pretty substantially. But you also lost some N when you excluded placebo responders, so the frequency of significant differences did not go up substantially in this analysis.
Moving on. Dr. Khin referred to this study by Arif Khan where flexible-dose trials did better than fixed-dose. I would say that the database that Dr. Khin presented from the FDA is a bigger database, with less publication bias and things like that, so I would lean in favor of preferring that result. But I would also say, focusing on my last bullet point, that there is clinical intuition about this. Ask yourself: if you had a case of depression, and you could go see a doctor who would only prescribe 20 milligrams of Prozac to every patient, or a doctor who would prescribe 20 milligrams and, if you're having side effects, maybe titrate down, and, if you're not having a response, titrate up, which doctor would you rather go to?

So I think on some level it has good face validity that adjusting the dose to individual patients should lead to better efficacy and better assessment of true tolerability and safety, and that it should do a better job than adjusting the dose of placebo. But importantly, flexible-dose studies are two-arm studies, one drug arm with a flexible dose and one placebo arm, while fixed-dose studies are frequently dose-finding studies with, say, one arm of placebo and maybe three arms at 10, 20 and 40 milligrams of drug. So the number of treatment arms is, practically speaking, confounded with fixed versus flexible dosing, and that may matter. Likewise the percentage randomized to placebo is confounded with the number of arms.
If you do equal randomization in a two-arm study, you have a 50% chance of placebo; in a four-arm study, you've got a 25% chance of placebo. And it makes good sense that if your chance of getting active drug is higher, you might have a higher placebo response rate.

And that is what Papakostas found in a meta-analysis in depression, and Mallinckrodt in a meta-analysis of schizophrenia data. So those factors are all confounded, and they have pros and cons, and you do need to do some dose finding with your drug anyway. So these are all design choices with pros and cons for getting to better outcomes.
Better scales. This is a simple analysis taken from that same placebo lead-in paper with Mallinckrodt. We looked at a pooled set of 22 RCTs; I think these were mostly or all duloxetine studies in depression. The 17-item HAMD scale had an average effect size of about 0.38. But some of its subscales, which are five, six, seven or eight items long, drawn from among the 17 HAMD items, did better. In other words, if you throw out half of the data from the HAMD, you can actually get a better effect size. So this is something to think about, at least in proof of concept. Obviously these subscales would need to be validated for regulatory and other purposes. But it's good to know that there are different approaches.
And if you have a drug that you believe, based on earlier clinical data or preclinical data, is more likely to be efficacious in certain symptom domains, that is important, too.
Statistical approaches. This is a little bit dated at this point, but there are a lot of important statistical issues to take into account. When I entered the industry, last observation carried forward, LOCF, was the gold standard. There have since been a lot of papers published on mixed-model repeated measures (MMRM), which protects better against both false positives and false negatives and gives you better effect sizes, here almost 30 or so percent bigger, which is pretty substantial. Better protection against false positives and false negatives means more true positives and true negatives, which is exactly what we want in therapeutic development. A simplified contrast of the two approaches is sketched below.
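Here is a toy contrast of LOCF and an MMRM-style analysis in Python with statsmodels, on simulated data. The random-intercept mixed model below is only an approximation of a full MMRM, which typically models an unstructured within-subject covariance; everything here is illustrative.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# Simulated longitudinal trial: 40 subjects x 4 visits, with some dropout
# (missing scores after a subject's last observed visit).
rows = []
for subj in range(40):
    arm = "drug" if subj < 20 else "placebo"
    drop_after = rng.integers(2, 5)  # last visit with an observation (2..4)
    for visit in range(1, 5):
        slope = -2.0 if arm == "drug" else -1.2
        score = 24 + slope * visit + rng.normal(0, 3)
        rows.append((subj, arm, visit, score if visit <= drop_after else np.nan))
df = pd.DataFrame(rows, columns=["subject", "arm", "visit", "hamd"])

# LOCF: carry each subject's last observed value forward, then compare endpoints.
df["hamd_locf"] = df.groupby("subject")["hamd"].ffill()
endpoint = df[df["visit"] == 4]
locf_diff = (endpoint.loc[endpoint["arm"] == "drug", "hamd_locf"].mean()
             - endpoint.loc[endpoint["arm"] == "placebo", "hamd_locf"].mean())
print(f"LOCF endpoint drug-placebo difference: {locf_diff:.2f}")

# MMRM-style analysis: model all observed data with a mixed model.
dat = df.dropna(subset=["hamd"])
model = smf.mixedlm("hamd ~ arm * visit", dat, groups=dat["subject"]).fit()
print(model.params.filter(like="arm"))  # arm effect and arm-by-visit interaction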
And I'll talk now about different implementation strategies during the trial. Central raters; a lot of people use different terminology here. My terminology for central ratings is when a rater is remote and actually does the assessment: they are asking the questions, hearing the answers, asking for clarification, doing the scoring, etc. These raters can be more easily blinded to protocol pressures, and are more easily independent of pressures to meet enrollment and so on. Note here, I was previously an employee of, stockholder in, and consultant to MedAvante, which was one of the companies that pioneered central ratings. I no longer have any stock or financial conflicts of interest, but I did work with them for a while.
One advantage to centralized ratings, on the right, is that you can simply use fewer raters, which reduces the variance that all of us humans contribute. These raters can be trained together more frequently and more consistently, and that can reduce variability, too.
Just some perspective, and Tor presented some nice material from other therapeutic areas, too: in psychiatry and CNS, most of our outcomes are subjective and highly variable, and probably need to be improved upon in some ways. Despite that, in other areas where there is probably less inherent variability, centralized blinded review, or assessment by at least a second or third person, has already been standardized for lots of other types of therapeutics. And these are relatively old guidances from the EMA and FDA mandating this in other therapeutic areas.
So to get back to the data on centralized ratings, MedAvante was able to conduct about seven studies with within-study comparisons of site-based ratings and centralized ratings. And across these seven studies, my interpretation, and you can look at the data, is that about five of seven were green: they clearly showed lower placebo responses or, where there was an effective drug, better drug-placebo separation with centralized ratings. And two showed pretty equivocal or unimpressive differences.
And again, I'm a former employee of and consultant to MedAvante. Here is one example, a large GAD study that had escitalopram as an active comparator. And you can see the effect size was about twice as big in HAM-A points; the Cohen's d effect size was about twice as large. And the chart we put together when I was at MedAvante illustrates that a doubling of the Cohen's d effect size means that you can either reduce your sample size by 75% and still have the same statistical power, or you can keep a sample size of, say, N of 100 and your power goes up from about 60% to almost 100%.

The more important way to read these powers is that your chance of a false negative, your chance of killing your drug when you shouldn't have, is 38% with the smaller effect size and less than 1% with the doubled effect size.
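That arithmetic can be checked with a standard two-sample power calculation. Here is a sketch using statsmodels, with illustrative values (a Cohen's d of 0.3 doubled to 0.6, N of 100 per arm) rather than the exact numbers on the slide.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Illustrative only: d = 0.3 versus a doubled d = 0.6,
# with N = 100 per arm and two-sided alpha = 0.05.
for d in (0.3, 0.6):
    power = analysis.power(effect_size=d, nobs1=100, alpha=0.05)
    n_per_arm = analysis.solve_power(effect_size=d, power=0.8, alpha=0.05)
    print(f"d = {d}: power with 100/arm = {power:.2f} "
          f"(false-negative risk {1 - power:.0%}); "
          f"N/arm for 80% power = {n_per_arm:.0f}")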
So there are other approaches besides having a central rater actually do the assessment remotely. You can have a third party review the work of the site-based raters. MedAvante, their competitors Verisite, Signant and others all offer these services now, and other companies do, too. And I don't know of any reason to prefer one versus another.
So you can review the source documents and audio or video recordings. This looks like it should work; it has good face validity, and I've run trials with this. But I'm just not aware of any controlled data. I haven't seen studies where people have done third-party remote feedback in, say, half the sites or half the raters and not the other half, and shown results. If you have those data, please send them to me; I'd love to incorporate them.
But, as I said, it has good face validity. If you're giving people feedback on the quality of their assessments, the raters should do nothing but improve. There is also an effect called the Hawthorne effect: people behave differently when they know they are being monitored. This should work.
And let me talk a little bit about operations. Doing central ratings is pretty burdensome: you have to coordinate ratings among a rater who is somewhere else, maybe in a different time zone, the patient, and the site. It's expensive and labor intensive. Third-party review is less labor intensive because you don't have to review all the recordings, and it can be done not in real time. So it's less burdensome and less expensive.
It's not clear exactly how efficacious it is, but it has good face validity. Or, just replace those human raters with computers. There have been a lot of different groups that have done work on this, and I'm going to jump right into some data.
These are data from, you'll recognize, duloxetine again. John Greist was one of the early pioneers in this, in a company called Healthcare Technology Systems. And this was done with patient self-report using interactive voice response (IVR); basically an old-fashioned keypad on a phone is good enough to do this. And for those of you that don't know, separating 30 and 60 milligrams of duloxetine is really hard; we never really saw this with clinician rating scales.

But patients self-rating using a computer showed really nice, and really rapid, signal detection, within days. And this is just another example with a different measure, the PGI; again, really impressive separation. Or, if humans are good and computers are good, why not combine both? Gary Sachs founded a company called Concordant many years ago, which has since been merged into other companies and is part of Signant now. And they showed that if you did a clinician rating and a patient self-rating by computer and compared them, you could learn a lot from the points that were discordant, both about severity ratings and about inclusion/exclusion criteria, diagnosis, things like that. So that's valuable.
Let's talk about professional patients quickly. This is just an anecdote, and I generally stay away from anecdotes, but I find this really compelling. This subject returned to the site with the unused pills from their pill bottle. Unfortunately, it was a pill bottle from a different trial site, same sponsor and protocol. And this is probably a common problem. This is a phase three program in depression where they had up to 4% duplicate subjects, at least in screening. It could be higher; we don't know how big the problem is. But we know it's a tip-of-the-iceberg issue, because there probably aren't too many patients bold enough to try to enroll twice at different sites in the same study, but they might enroll sequentially. They might go through multiple screenings until they get in. They might be in different studies by different sponsors for the same or even different indications: in a bipolar study this week, a schizophrenia study next month, and a depression study the month after.

And these patients may or may not be compliant with medications and also protocol features. Some anecdotal data on subject selection: there are lots of websites out there that will teach you how to be a bad patient in a clinical trial. And I just want to note, not that it's a bad thing, I love ClinicalTrials.gov and I use it a lot, but any tool can be used for good or bad purposes.
And the reason I mention this to you again, as you are posting your trials on ClinicalTrials.gov you want to be transparent enough to share what you need to share, but you might not want to help them too much with specific details of certain inclusion/exclusion criteria that are subjective and can be, for lack of a better word, faked.
The top three of these are all companies that check for duplicate patients who might be in your study and another study in their database. I've worked with all of them. And worth noting, this is relatively inexpensive; you just have to get a few demographics on each patient at screening, so the site and patient burden are pretty minimal.
And AiCure is really more of a medication adherence platform. But of course the really bad professional patients don't want to take the medications either, so there is some overlap between professional patients per se and medication adherence. On medication adherence, I'm going to go through the rest of this quickly in the interest of time. Adherence is difficult to know with certainty, and excluding patients is not as helpful if done after randomization, certainly if you need intent-to-treat. But PK collection is important. One way to do it is just PK collection; that is a gold standard that tells you that the drug is in the patient's body. I'm going to skip this slide, too.
If half the patients don't take their medicine, you can imagine that the power is very bad. And I did consult with AiCure previously; that's an important disclosure, too. The reason I like AiCure, not so much because I consulted with them, is that there are many medication adherence platforms out there on the market, and this is the only one where I've seen evidence that their platform is consistent with, correlates with, and predicts PK values. So if I were you, that's an important question to ask. Then you also have to ask about all of the operational issues, too.
Biomarkers. When we've got biomarkers, they're great. If you've got a PET ligand that can help you narrow down the dose and really demonstrate that you are engaging the target, that's fantastic. This is just an example of a PET ligand. And here is another biomarker, hot off the press; this was presented just a few weeks ago at ASCP. The idea here is basically taking baseline demographics and putting them all into an AI model to see what predicts placebo response and drug-placebo separation.
This is another company that I work with currently, so there is that disclosure, to take with as many grains of salt as you like. We did a blinded analysis of baseline EEGs and identified three clusters in a placebo-controlled Zoloft study.
In the overall study, it just failed to separate. And we identified three distinct clusters, one of which has a huge Cone-C effect size and P value even in a little less than half the population. Another cluster that really weren't responders at all. And a cluster, the third cluster that is less than 20% of the population that had fantastic placebo responders and terrible drug responders.
So this needs more validation, like all biomarkers. And I just want to leave this with the point that biomarkers are great as we continue to understand the biology and pathophysiology better. But first we are going to have to validate them against the gold standards, and the current gold standards are variable, biased, and imperfect. So, to close on a relatively optimistic note, this is a red/yellow/green summary: green is good, yellow is questionable, red is probably not worth it. This is my own personal, subjective assessment. But the takeaway is that a lot of these things can be helpful, especially when fit for purpose with the therapeutic that you are developing, the phase of development, and your strategic goals for that therapeutic.
So I'll end there. Thank you very much for your attention. Look forward to questions and so forth.
TOR WAGER: Great. Thank you, Mike. For time reasons, we're going to go on to our next speaker. But just to let everybody know, there's a Q&A and people are posting questions there. And our panelists can answer questions there in the Q&A panel as well as in the -- during the discussion phase. So keep the questions coming, thank you.
All right. Dr. Farchione, thank you.
Current State of Placebo in Regulatory Trials
TIFFANY FARCHIONE: Thank you. Let me just get this all cued up here. So thanks, everybody, and good afternoon.
As we've already said, my name is Tiffany Farchione, and I'm the Director of the Division of Psychiatry in the Center for Drug Evaluation and Research at the Food and Drug Administration. So because I'm Fed, I have no conflicts to disclose.
I'm going to be providing the regulatory perspective of placebo response in psychiatric trials. So, so far today you've heard a little bit of an historical perspective from Dr. Khin, who is actually my former team leader and peer reviewer. And she showed us that not only do we have a high rate of placebo response in psychiatry trials, but the extent of that problem has actually been increasing over time.
And then Dr. Detke just presented some of the strategies that have been proposed for dealing with this problem, which in some examples are of somewhat limited utility.
So I'm going to talk a little bit about the importance of placebos for regulatory decision making and give a few examples of placebo response mitigation strategies and registration studies. And then I'll go on and talk a bit about placebo response in other disease areas and end with some thoughts on what may ultimately help us to resolve this issue. All right. So I want to start first by expanding a bit on Dr. Khin's presentation and just quickly presenting some updated data. I saw that there was a question either in the chat or the Q&A about depression studies. And honestly, we don't have too much more from what she presented in depression. And also the things that we've approved more recently have different designs, different lengths of treatment and things like that so it makes it hard to combine them with the existing dataset.
But here we've got figures for schizophrenia and bipolar disorder. And they look a little different from each other because I pulled them from a couple of different presentations. But essentially the data points in each figure represent the change from baseline to endpoint on either the PANSS, on the left, or the YMRS, on the right, in clinical trials of atypical antipsychotic medications for the treatment of either schizophrenia or bipolar I disorder.
And the drugs included in these figures are ones for which we have both adult and pediatric data. So on the left you can see that the trend for increasing placebo response over time is also evident in the adolescent trials. And then on the right, we have data from adult and adolescent bipolar I studies, which Dr. Khin didn't present. There are fewer data points on this side than in schizophrenia, so the trend is less obvious from the dots alone. But if you draw in the trend lines, which are here on the figure, you can see that the same phenomenon is also at play in the bipolar studies.
All right. So let's go back to basics for a minute and talk about why we need placebos in clinical trials in the first place. So simply put, placebo-controlled studies are our bread and butter. And in order to support a marketing claim, companies need to provide substantial evidence of effectiveness for their drugs. Ni went over this a little bit as well. This is generally achieved with two positive adequate and well-controlled clinical studies. And the characteristics of adequate and well-controlled studies are outlined in the Code of Federal Regulations.
So there are seven characteristics listed in the CFR, but one of them states that the study has to use a design that permits a valid comparison with a control to provide a quantitative assessment of the drug effect. More often than not, that's a placebo control.
And basically, we just need some way to determine that the drug itself is actually doing something. So if the treatment response in the drug arm is greater than the response in the placebo arm, then that difference is assumed to be evidence of a drug effect. But that may be oversimplifying things just a little bit. It's important to remember the difference between an effect and a response. The response is the observed result, like the change from baseline on a PANSS or a MADRS score. The drug effect can be one component of that, but adherence to the drug, timing of the assessment, and other factors also influence the observed response.
And yes, a portion of the drug response is probably attributable to placebo effect. Same thing with placebo response: yes, the placebo effect itself is a component of the observed response, but you also have things like the natural history of the disease, regression to the mean, or, when we talk about adjunctive treatment, the contribution of the other treatment. All of those play a role in the observed response in a study.
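Put schematically, and with the simplifying assumption that the components just add:

$$
R_{\text{arm}} \;=\; E_{\text{treatment}} \;+\; E_{\text{expectancy}} \;+\; E_{\text{natural history}} \;+\; E_{\text{regression to mean}} \;+\; \varepsilon
$$

so the drug-placebo difference $R_{\text{drug}} - R_{\text{placebo}}$ isolates $E_{\text{treatment}}$ only to the extent that the other components are balanced across the two arms.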
So what exactly is it that can account for the placebo response rate in our clinical trials? Dr. Detke went over several of these examples earlier, but let's start with expectancy, and this is a big one. If folks expect to have some benefit from a drug that they're taking, they oftentimes do experience some benefit. The structure of a clinical trial can also contribute to the placebo response: folks are being seen on a regular basis; they have a caring clinician they interact with routinely. Those things can, in and of themselves, be somewhat therapeutic.
The fact that we use subjective outcome assessment is another aspect of this that I want to highlight. Because in psychiatry trials, we can't draw labs or order a scan to ensure that we have the right patients in our trials or to objectively assess their response to the drug. What we have are clinician interviews and patient reported outcomes. And oftentimes these outcome assessments involve a report from a patient that is then being filtered through a clinician's interpretation and then translated into a score on a scale. So there is a lot of room for variability in that.
The distal nature of that assessment from the actual biological underpinnings of the disease can be problematic, and it's certainly prone to misinterpretation and to biases. Again, Dr. Detke also mentioned how enrolling inappropriate participants can impact placebo response. If you have folks in a trial who don't actually belong in the trial, whether that's the professional patients that he finished with, or folks who just don't quite meet the inclusion criteria or who have been misdiagnosed somewhere along the line, that's going to increase the variability in your study and could potentially increase the placebo response. Of course, there are lots of other factors that can contribute to the placebo response, but because Dr. Detke spent a lot of time on this already, I just wanted to highlight these few.
So next I want to talk a little bit about ways in which we could potentially manage the placebo response in clinical trials. First, I want to present one option that we actually have not yet accepted for new drugs in psychiatry, but it's an option that actually takes placebo out of the equation entirely. We have a bunch of approved antidepressants, a bunch of approved antipsychotics. So at this point you might be asking why we can't just do non-inferiority studies and attempt to demonstrate that the new drug is no worse than some approved drug.
So the complicating factor here is that conducting a non-inferiority study requires defining a non-inferiority margin. And in a non-inferiority study, you are trying to show that the amount by which the test drug is inferior to the active control is less than that prespecified non-inferiority margin, which is M1.
And M1 is estimated based on the past performance of the active control. But, unfortunately, because of the secular increase of placebo response over time, we can't really estimate M1. It's a moving target. So even though we have things that have been approved in the past, we don't know that the margin by which the active drug was superior to placebo in the clinical trial that supported its approval is the same margin that would be observed today under similar circumstances. So because we can't set a non-inferiority margin, we can't do non-inferiority trials, at least not for regulatory purposes in psychiatry.
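For illustration only, here is a minimal sketch of the non-inferiority decision rule being described; the numbers are hypothetical, and the real analysis also involves a stricter clinical margin (M2) below M1:

```python
from scipy.stats import norm

def non_inferior(diff_hat, se, margin, alpha=0.025):
    # One-sided non-inferiority test. diff_hat estimates
    # (active control - test drug); positive values favor the control.
    # Non-inferiority is declared if the upper confidence bound on how
    # much worse the test drug is stays below the prespecified margin.
    upper_bound = diff_hat + norm.ppf(1 - alpha) * se
    return upper_bound < margin

# Illustrative numbers only: a 1-point observed deficit with SE 1.5
# against a hypothetical 4-point margin would pass ...
print(non_inferior(diff_hat=1.0, se=1.5, margin=4.0))
# ... but the whole approach collapses if the margin itself cannot be
# estimated, which is the problem a drifting placebo response creates.
```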
Another strategy that's been employed in a few trials at this point is sequential parallel comparison design. And again, Dr. Detke went over this briefly, so you have some idea of the principles behind this already. Recall that this is a design in which you have two stages, and the first is intended to weed out the placebo responders so that in the second stage the drug-placebo difference is amplified.
So there are some statistical concerns with this type of study design, related to the relative weights of the two stages and the impact of dropouts. But we have had one application where trials that employed this kind of design made it to the New Drug Application stage. That application was presented at an advisory committee meeting back in November of 2018, so there is publicly available information for me to share even though the application ultimately was not approved.
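To illustrate the weighting issue, here is a minimal sketch of one common SPCD analysis, a prespecified weighted combination of the two stage statistics; the weights and z values below are illustrative assumptions, not those from any application:

```python
import numpy as np
from scipy.stats import norm

def spcd_combined_z(z1, z2, w1=0.6, w2=0.4):
    # Weighted combination of the stage 1 test statistic (all randomized
    # subjects) and the stage 2 statistic (placebo non-responders
    # re-randomized). With prespecified weights and approximately
    # independent stages, the combination is standard normal under the null.
    return (w1 * z1 + w2 * z2) / np.sqrt(w1**2 + w2**2)

z = spcd_combined_z(z1=1.8, z2=1.2)   # illustrative stage statistics
print(f"combined z = {z:.2f}, one-sided p = {norm.sf(z):.3f}")
```

The weights have to be fixed before the data are seen, which is exactly why agreement with the agency in advance matters.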
This was for a fixed-dose combination of buprenorphine and samidorphan, intended for the adjunctive treatment of major depressive disorder. Now, the figure on the right-hand side was taken directly from the AC briefing book, and it shows diagrams of three studies in which SPCD was employed as part of the clinical development program.
The important thing to observe here is that you do in fact have a large placebo response in stage one and a much smaller placebo response in stage two. But what we don't see is the expected amplification of the drug placebo difference in stage two.
So as I said at the advisory committee meeting, either SPCD isn't working or the drug isn't working. So regardless of the outcome here, the important take home point is that we were able to file an application with SPCD in it. We had reached agreement with the applicant on the weights for the two stages and the analyses. And there weren't many dropouts in stage one of the studies. So we were able to overcome two of the big hurdles for this design in this program.
But if we receive another application with SPCD in the future, we're going to have to look at those issues again because they really are trial specific. So we'd advise sponsors to use consistent stage lengths and to reach agreement with us in advance on the primary endpoint and other critical trial features. And even if we reach agreement on all of those things, we're still not going to be able to agree a priori that the study will be acceptable, because some of the things that we're concerned about will remain open questions until we have the data in hand.
I already mentioned that here there weren't many dropouts in stage one, but you don't know that until stage one is done. So even if we do accept the design, the study is positive, and all of these issues are resolved, labeling is still going to be super complicated if you have an SPCD.
[AUDIO INTERRUPTION] end up writing a label for this one.
All right. So moving from complicated to something much more straightforward. This is a table taken from the clinical study section of the valbenazine label. This is the data that supported the approval of valbenazine for the treatment of tardive dyskinesia. The studies that supported this application really provide a good example of one of the strategies to mitigate placebo response that has been, you know, successful. And that's the use of blinded central raters.
In this study, the raters were blinded to treatment assignment and also to visit number. And using the blinded central raters was feasible here because the symptoms of tardive dyskinesia are directly observable and can even be captured on video. So they can be rated by the remote central raters fairly easily.
And then you'll note here that the change from baseline on the AIMS in the placebo arms was basically negligible.
All right. So I think it's also important to bear in mind that this phenomenon of placebo response in clinical trials is not something that's unique to psychiatry. We see it in multiple other areas of medicine. It's ultimately the reason that we have placebo controlled studies in the first place.
We do expect to see some response in a placebo group. Folks get something that they think could be an active drug and, lo and behold, they have some response. It's important, though, if you want to establish that the observed response is, in fact, related to the active treatment, that you show folks on the investigational drug doing better than folks on placebo.
So for the next couple of slides, I'm going to show some examples of what we see in other disease areas and speculate a bit on why the placebo response rate in those trials is higher or lower than what we're used to seeing.
And I'll caveat this by noting that I pulled my examples from the most recent Office of New Drugs annual report, and I haven't done a deep dive to see if other drugs behave similarly or if my speculation here bears out consistently. But with those caveats in mind, I'm also going to try to draw some parallels to circumstances in psychiatry trials.
All right. So the first example I have here is from the clinical study section of labeling for zavegepant, which is an intranasal calcitonin gene related peptide antagonist that's approved for the acute treatment of migraine with or without aura in adults.
The point I want to make with this example is that the endpoint here, pain, is very subjective. So similar to a lot of what we do in psychiatry, the endpoint is relying on patient report of their subjective experience.
Now, in this case, it probably helps somewhat to have a dichotomous endpoint of pain free versus not, rather than asking participants to rate their pain on a Likert scale that would introduce more variability. And honestly, as somebody who gets migraines, I can tell you that pain free is what matters. Like, a little bit of migraine pain is still migraine pain. Like, I don't want to deal with it.
Anyhow, with that kind of subjectivity, it's not too surprising that about 15% of the folks in the placebo group were responders.
Now, if you think back to that slide I showed earlier about contributors to the placebo response, some of this could be placebo effect. Some of it could just be that their migraines were resolving spontaneously within two hours anyways. Regardless, we have a pretty high placebo response rate here.
But we also have a responder rate of almost 24% in the active treatment group and a statistically significant difference on the primary endpoint of pain free at two hours.
On the secondary endpoint of relief from the most bothersome symptom, things like photophobia, phonophobia, and nausea, both the placebo and the active groups had even higher response rates, but again, a significantly higher response in the active treatment group than in placebo.
So this is from the clinical pharmacology section of that same label. And I want to point out that this is very similar to what a lot of our drugs look like in psychiatry. We describe what the drug does at the receptor level, and then we say that the relationship between that action and the clinical effect on depression or schizophrenia or whatever is unknown. And until we have a better understanding of pathophysiology, that's going to continue to be our approach in labeling.
All right. The next example I have comes from the clinical study section of labeling for linaclotide oral capsules. And I have to say, when I'm talking outside of my own disease area, hopefully I'm getting these pronunciations right. But anyways, it's a guanylate cyclase C agonist. The data here supported the irritable bowel syndrome with constipation indication.
And I think this is a really interesting example because we have two different endpoints here. Like our last example, one is a pain endpoint that's likely to be highly responsive to placebo. Again, it's subjective. But unlike the last example, it's not dichotomous. So it requires a bit more interpretation.
The other endpoint is something that's a bit closer to objective. CSBM is complete spontaneous bowel movements. So, clearly, the number of bowel movements is something that can be counted. But the endpoint itself is a little bit of a hybrid because it also involves a subjective report of the sense of completeness of evacuation.
So, interestingly, you see a much higher percentage of placebo subjects meeting the criteria for responder on the fully subjective pain endpoint than you do on the CSBM endpoint.
And I got to tell you, Section 12 of this label is something that I dream about being able to do for psychiatry. We can only aspire to this, frankly, at this point. The language here very clearly lays out the pathway between the action of the drug and the downstream physiologic effects on constipation. And it even presents an animal model to support the drug's effect on pain. So this suggests that the drug acts on some aspect of the underlying pathophysiology of IBS C.
All right. So, so far I started with an example of a trial with a subjective endpoint, then went to something that's a little bit more objectively measurable. Here I'm going to show data from the bimekizumab label and the studies that supported its indication for the treatment of moderate to severe plaque psoriasis in adults.
So bimekizumab is a humanized interleukin 17A and F antagonist. The endpoints in the study were Investigator Global Assessment, which is an overall assessment of psoriasis severity, and the Psoriasis Area and Severity Index. Now, you might think that these things are somewhat subjective because they are investigator assessments and, of course, require some interpretation to get to the score on these scales.
But these are assessments of the size and extent of the psoriasis plaques, things that are directly observable. And both scales have anchors that describe what appearance plaques of a given severity would have. So it gives you a framework for how to rate these different lesions.
So even though these are global assessments and you might think of clear and almost clear as being analogous to something like improved or much improved on a CGI, we're really talking about very different things.
Here, both what the patient is experiencing and what the clinician is observing are things that you can see and measure. You're not asking the patient if the patient feels like their skin is redder, you can see the erythema. And here you can see a much lower rate of placebo response in the studies. When you're directly observing the pathophysiology in question, and it's something that is objective or relatively objectively measurable, you get less placebo response.
All right. And Section 12 of this label isn't quite as definitive as the linaclotide label in terms of directly linking the drug effect to pathophysiology, but it's pretty close. And, again, it's probably a combination of the relatively objective outcome measures and the tight link between drug action and pathophysiology that's contributing to the low placebo response in these trials.
Finally, I want to put up an example that, of course, has been in the news a lot lately. This is from Section 14 of the tirzepatide label, and this is one of the GLP-1 receptor agonist drugs that's indicated for chronic weight management as an adjunct to a reduced calorie diet and increased physical activity.
Now, there are all sorts of things that can contribute to placebo response in weight management studies. So, for example, the folks who are in these studies are likely to be motivated to lose weight in the first place. They're required to engage in diet and exercise as part of the study. And even though it's difficult, sometimes folks just lose weight.
So even though weight is something that is objectively measurable, there's multiple physiologic and behavioral factors that may contribute to changes in weight. So there's a lot of variability, and it's been traditionally pretty difficult to show improvement in weight loss trials, or at least to show enough improvement that it overcomes the adverse events that are observed in the trials.
Anyways, the primary outcome in these studies was the percent of patients losing at least 5% of their body weight [AUDIO INTERRUPTION]. Now, you'd think that that would be pretty difficult to surpass, but these studies still managed to show a treatment difference because the active treatment works like gangbusters.
So another way to overcome concerns about placebo response is to find something that really has an impressive treatment effect. Then, even if you have a massive placebo response rate, you'll still be able to show a difference. And so far we don't have much of anything with this kind of an effect in psychiatry, unfortunately.
And then again, once again, in Section 12 we have a mechanism of action description that links the drug action directly to the clinical effects. The drug binds to a physiologic regulator of appetite, the person taking the drug eats less. It's pretty straightforward.
All right. So what lessons can we take away from all of this? Ultimately, the point that I want folks to take home from the examples I've shown in psychiatry and in other disease areas is that there are things that we can do to help mitigate the placebo response in our clinical trials. For things like SPCD or other nontraditional study design elements, I would advise sponsors to talk to us early and often. There are still some methodological issues that, you know, need to be overcome, but we're willing to consider SPCD studies as long as we're able to agree on specific aspects of the design and analysis.
Folks can also do things like trying to improve rater training and to mitigate some of the variability that's just inherent in asking human beings to assign a rating to something that is subjective.
Still related to measurement, but maybe more of a medium term than a short term solution, it could be worthwhile to develop better clinical outcome assessments. The scales that we use in clinical trials now have been around a long time. They were mostly based on expert consensus; they're face valid, for sure, and obviously we have precedent for them, but they've been around longer than modern psychometric principles, quite frankly. So developing new ones would potentially be welcome.
Anyways, in terms of other sources of variability, I'd refer back to Dr. Detke's presentation and his comments on the number of sites, enrollment criteria, and so on. Essentially, quality controls on study design and implementation. But ultimately the real game changer here will be when we can develop drugs that actually target pathophysiology. That's when we'll finally be able to take some of this variability and subjectivity out of our clinical trials and really get much more objective measures.
In the best of all possible worlds, we would have a much better understanding of pathophysiology of psychiatric disorders. We'd be able to develop drugs that target the pathophysiological underpinnings of our diseases, and we would even be able to define study entry criteria more appropriately because we wouldn't be relying on subjective assessments for diagnosis or inclusion.
We'd be able to get that blood test or get that scan that can tell us that, yes, this is, in fact, what's going on here, and this is a patient who is appropriate for this clinical trial.
And I understand that we're, you know, a long way from that today, but I hope that folks will think of this as an aspirational goal, that our current state of understanding is less of a roadblock and more of a call to action.
And so with that, and recognizing that I am the one thing standing between you and our break, I will just say thank you very much for your attention.
TOR WAGER: Okay, wonderful. Thank you to all of our speakers and panelists in this first session.
Let's take a short break. We have some questions in the chat. More questions are coming in. But we have a break now until 1:50. And so I suggest that it's a short break, but we can get back on track and start then in about seven minutes. Okay? Thank you.
[BREAK]
TOR WAGER: Okay. Hi, everybody. It's a short break, but thanks for hanging with us here and coming back after this short break.
Current State of Placebo in Device Trials
TOR WAGER: Our next session is going to be led off by Dr. Holly Lisanby and Zhi De Deng on the current state of placebo effects in device trials. Then we'll move on to a series of talks on placebo effects in psychosocial trials, and after that, the panel discussion. Dr. Lisanby, thank you.
Sham in device trials: Historical perspectives and lessons learned
SARAH “HOLLY” LISANBY: Thank you, Tor. And so these are my disclosures. And as Tor said, I'm going to be talking about placebo in device trials. And so although up until now in the workshop we've been talking about placebo in drug trials, which are typically given either by mouth or intravenous or intranasal, we're now turning our attention to how you would do a placebo in a device trial.
And that's where we use the term sham. So we blind device trials typically by doing a sham procedure. And the idea of sham is that the mode of application of the device and the ancillary effects that the device elicits are meant to be as closely matched as possible but without having active stimulation of the body or the brain specifically.
Now, one of the challenges in blinding device trials using sham procedures is that one sham does not fit all or even most. And let me explain what I mean by that.
There are a growing range of different devices. Here you see the landscape of neuromodulation devices. On the X axis is how invasive they are and on the Y axis is how focal they are. And they all use different forms of stimulation applied to the head or the body. Some are surgically implanted, others are not. And those are just the devices that directly apply energy to the head or cranial nerves.
But there's another space of devices that deliver audio or visual stimuli to affect brain activity indirectly, and these include prescription digital therapeutics and neurofeedback devices.
Now, even within one modality of device, here I'm going to use transcranial magnetic stimulation, or TMS, as an example. We have a broad range of different TMS devices. Here I'm showing you just a few of them. And while they all use rapidly alternating magnetic fields, they differ in how they apply that to the head.
So this device, for example, uses an iron core figure 8 coil. This device uses an air core figure 8 coil. Now, those are pretty similar in terms of the electric field induced in the brain, but this device uses three different types of coil that are called H coils with different coil windings that stimulate very different parts of the brain and have different ancillary effects.
The device on the left uses an air core figure 8 coil, but it has some additional bells and whistles to it. It uses neuronavigation. So there's a camera in the room and a tracker to be able to navigate the TMS coil to a specific spot in the brain that was identified before treatment on the basis of fMRI. And so there's an additional aspect of this procedure. And also it's given with an accelerated schedule, where ten treatments are given a day, each day, for five days.
Now that brings us to some of these ancillary effects of TMS. One is the intensive provider contact in a high tech environment. And I'm showing you here just a few pictures from our lab. And this is intensive contact: it can range from one session a day for six weeks to ten sessions a day over five days. And this really highlights the importance of blinding, not just for the patient, but also the coil operator and the raters.
Now, there are also sensory components to TMS. It makes a clicking noise, which is induced by the vibration of the coil within the casing, and this is quite loud. Even with earplugs, you can't mask the bone conduction of the sound. In addition to the sound, TMS can also induce scalp sensations, and these can range from just feeling a tapping on your head to scalp discomfort, even to scalp pain.
And TMS can also evoke movements. Even if you're not over the motor cortex, if you're over the frontal cortex, which is the site for depression treatment, it can cause movement in the face or the jaw, which can come from directly stimulating scalp muscles, facial nerves, or cranial nerves.
You can also, depending on the shape of the coil, get some evoked movement from the motor cortex. And this is more common with the more diffuse coils, such as the H coil configurations.
Now, not only are these ancillary effects important for blinding of clinical trials, they also represent important confounds for the physiological studies that we do with TMS, where we want to use TMS to probe brain function, such as coupling TMS with EEG to study evoked potentials or coupling TMS with fMRI.
Now, sham TMS has evolved over the years. I'm showing you in the center of this photograph active TMS, and in the corners are four different types of early forms of sham TMS, which were called coil tilt TMS configurations, where you tilt the coil off the head so that the magnetic field is sort of grazing the scalp. You get some sensation, you get the noise, but you're trying to not stimulate the brain.
Now, while this coil tilt sham does induce some scalp stimulation and clicking, it lacks operator blinding. But even worse than that, what we showed from intracerebral recordings of the electric field induced in the brain by these different forms of coil tilt sham in non-human primates is that, compared to active TMS, which is the top line, one of these four sham coil tilt configurations was almost 75% of the strength of active TMS. And that's the second line from the top, with the black circles.
And so some forms of these coil tilt shams were actually biologically active. And that represents a confound when you're trying to study the older literature, trying to look at, do meta analyses of TMS clinical effects.
The next evolution in the step of sham TMS was shielding. And for example, figure 8 coils could have a metal shield between the coil and the head that blocked the flow of the magnetic field. And here, this E shield has both the magnetic shield as well as a printed circuit board on top of the coil that was meant to be fired antiphase with the TMS in order to try to cancel out the magnetic field at the surface of the head.
These types of approaches look and sound like active TMS, they provide operator masking, and they're biologically inactive. However, they don't feel like active TMS. Here you're looking at subjective ratings of scalp pain, muscle twitch, and facial pain, with active TMS in red and sham in black. So there's not appropriate masking or matching of these ancillary effects.
But that sham, the E shield sham was used in the pivotal trial for depression in adults. And that pivotal trial missed its primary endpoint, which is shown here in the yellow box, where active TMS is in the blue line and sham is in the gray line.
Ultimately, TMS became FDA cleared in 2008 for a limited indication based on this post hoc analysis, which I'm showing you here, where about half of the patients in the pivotal trial who had failed only one antidepressant medication in the current episode showed a significant separation between active in the black line and sham in the gray line. However, those who had more failed trials in the current episode, from two to four, did not separate between active and sham.
Subsequently, the label was expanded and CMS coverage determinations have been provided, but that was on the basis of additional evidence, which came from additional randomized controlled trials as well as open label experience and literature reviews.
Now, that same sham has been used in a pivotal trial for TMS for adolescent depression, which also failed its primary endpoint and failed to separate active from sham. Here you see the antidepressant scores on the Y axis with active TMS in the blue and sham in the red, and they were indistinguishable.
And the sham is described in the paper, as I'm showing you here in the quote, and this is another one of these metal shield or E shield shams that did not provide scalp stimulation.
Now, ultimately, FDA did clear TMS down to the age of 15 on the basis of retrospective analysis of real world data that were derived from a registry of over a thousand adolescents over a span of 15 years, all of whom were obviously receiving off label treatment, as well as a literature review. And the status of insurance coverage is to be determined.
The next step in the evolution of sham TMS was scalp stimulation, and that's what we used in the OPT TMS trial of almost 200 patients. And this was the first study to use scalp stimulation. And you see those little patches on her forehead. Those are electrodes through which we administered weak electrical stimulation to the scalp along with auditory masking in order to better mimic the ancillary effects of TMS.
And here you can see the ratings of scalp discomfort and headache were similar between active TMS in the red and this scalp stimulation sham in the black.
We did assess the integrity of the blind in the OPT TMS trial, and we found that the blind was preserved, with a very low percentage of extremely confident correct responses. And we found a separation between active and sham in this study, with 14% remission with active and 5% remission with sham. That was statistically significant.
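One standard way to quantify blind integrity is a blinding index. Here is a minimal sketch of the simplified Bang index, using made-up guess counts rather than the OPT TMS data:

```python
def bang_blinding_index(n_correct, n_incorrect, n_dont_know):
    # Simplified Bang blinding index for one treatment arm:
    # (proportion guessing correctly - proportion guessing incorrectly),
    # counting "don't know" responses in the denominator. Values near 0
    # are consistent with random guessing (blind preserved); values near
    # +1 suggest the blind is broken.
    n = n_correct + n_incorrect + n_dont_know
    return (n_correct - n_incorrect) / n

# Illustrative counts: many participants unsure, guesses roughly balanced.
print(bang_blinding_index(n_correct=30, n_incorrect=25, n_dont_know=45))  # ~0.05
```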
Shams in the modern era have kept this idea of scalp stimulation and auditory masking, but they come in different versions that are now available as turnkey systems. For example, this sham, which has an active magnetic stimulation on one side of the coil and no stimulation on the other side, but the sides are identical in appearance, and this comes along with an adjustable output for electrical stimulation of the scalp, which is synchronous with the TMS pulses that's built into the system.
Now I'm going to shift from TMS to a different form of stimulation, transcranial direct current stimulation, or tDCS. This is from one of the randomized controlled trials that we conducted of active versus sham tDCS for depression in 130 patients, which failed its primary endpoint.
Now, I'm showing you the depression response on the Y axis for unipolar patients on the left and bipolar patients on the right. And although we did not find active tDCS to be better than sham, we found something curious, which was that sham was better than active, particularly in the unipolar patients. And that caused us to ask, well, what is going on in our sham tDCS intervention?
Here's what our active intervention looked like. We stimulated at 2.5 milliamps continuously over 30 minutes. The sham, which we thought was biologically innocuous, actually had these brief ramp ups and then ramp downs intermittently during the 30 minutes.
But in addition to that, it had a weak current of 0.032 milliamps that was continuous throughout the stimulation. We weren't aware of this continuous stimulation, and it begs the question whether this waveform might have had some biological activity. Certainly, when you find sham better than active, one has to ask that question.
Now, this question of how to sham tDCS trials has been addressed in the literature. In this study in 2019, they reported that there was a great multiplicity of sham approaches being used in the field, and some of these might have biological action.
Now, in 2018 we had conducted an NIMH sponsored workshop and published a report from that workshop in which we urged the field to present the rationale and the effectiveness of sham stimulation when you do studies. And we observed that this is rarely documented. We also encouraged the field to do blinding checklists during the study design, reporting, and assessment of study validity. And we still encourage this. It's still timely.
Now I'm going to move from tDCS to another form of stimulation. TMS and tDCS are non surgical; now we're dealing with a surgically implanted device, vagus nerve stimulation.
So it's a surgically implanted pulse generator, and sham is done by implanting the device but not turning it on. The pivotal trial of VNS for depression failed its primary endpoint, which is shown in the yellow box here. But it was subsequently FDA cleared based on a non randomized, open label comparison with treatment as usual, as you see here. Insurance coverage was frequently denied, which limited utilization.
More recently, there was a study called the RECOVER trial, a randomized, controlled, blinded trial to demonstrate the safety and effectiveness of VNS as an adjunctive therapy versus a no-stimulation control.
This RECOVER study was designed in accordance with the CMS coverage with evidence determination decision memo. The study is not yet published, to my knowledge, but according to a press release from the company that sponsored it, after one year of active VNS versus sham, which was implantation but not being turned on, this study failed its primary endpoint.
And I'm quoting here from the press release that it failed due to a strong response in the sham group, which they said was unforeseen in the study design. And I would say that we might have foreseen this based on the original pivotal trial, which also failed to differentiate active versus sham.
Now I'm going to move to deep brain stimulation. And this is the randomized controlled trial that we conducted on bilateral subcallosal cingulate DBS for depression. Sham was done by implanting but not turning it on. And this study, in a futility analysis, failed to differentiate between active and sham. So you can see this has been a recurring theme in the studies that I've shown you.
Now, there are some specific challenges to blinding DBS trials. By the time you get to DBS, you're dealing with a very severely ill, depressed population, and that clinical severity poses real risks, for example the relapse that may occur in crossover designs, like crossing over from active to sham.
There are unique things that may unblind the study such as battery recharging or batteries that don't need to be recharged that could cue a patient. And also there's a need for rigorous safety protocols to protect patients who are so severely ill during their sham phases due to the risk of clinical worsening.
So, to conclude, sham methodology poses a lot of complex challenges for device trials. One size does not fit all. The interpretation of the literature is complicated by this variability in the sham methodology across studies and across time as the sham approaches have evolved.
Measuring the biological activity of the sham intervention before using it in a clinical trial is important and it is seldom done. And assessing the integrity of the blind is important for patients, operators, and raters. And that's why with sham procedures we need to think about triple blinding, not just double blinding.
And the shortest pathway to regulatory approval, which I gave you in the example of VNS, does not guarantee insurance coverage nor clinical adoption.
Some thoughts about future directions. We could focus on developing next generation active devices that lack these ancillary effects that need to be mimicked by sham. Some examples that you'll hear about from Zhi Deng, who's coming up next, include quiet TMS and controllable pulse TMS. We could conduct studies to validate and characterize the biological actions and expectancy effects of sham interventions. And there's a role for active stimulation of a control brain area as a comparison condition.
These are the members of the Noninvasive Neuromodulation Unit in our lab at NIMH. And I'll just show you the slide that we're recruiting for jobs as well as for patients in our trial. And thank you very much, and let me hand it back to you, Tor.
TOR WAGER: Wonderful. Thank you, Holly. All right. I think we have Zhi up next. So please take it away, Zhi.
Challenges and Strategies in Implementing Effective Sham Stimulation for Noninvasive Brain Stimulation Trials
ZHI DE DENG: I will share screen and maximize it. Good day, everyone. Thanks for having me here today. And for the next few minutes, I will discuss the challenges and strategies in implementing effective sham stimulation for noninvasive brain stimulation trials.
Dr. Lisanby has already given a very nice overview as to why this topic is crucial as we strive to improve the validity and reliability of our neurostimulation device trials. I'll be discussing in more depth the physical characterizations, computational modeling, and some measurements that we took of various sham strategies, and their trade-offs, in case you are interested in picking, implementing, or improving a sham technique. I'll be focusing primarily on TMS and tDCS.
Before we proceed, I need to disclose that I am inventor on patents and patent applications owned by various institutions. Some of them are on brain stimulation technology. Additionally, this work is supported in part by the NIMH Intramural Research Program.
So when we talk about ... is this panel in the way? Let me put that aside.
TOR WAGER: It looks good. I don't think we can see it.
ZHI DE DENG: Okay, good. So when we talk about creating a valid sham TMS, Dr. Lisanby has already mentioned that there are several critical elements that we need to consider.
Firstly, the sham should look and sound like the active TMS to ensure blinding. This means that the visual and auditory cues must be indistinguishable between sham and active conditions.
Secondly, the sham should reproduce the same somatic sensations, such as coil vibrations and scalp nerve and muscle activation. This sensory mimicry is essential to maintain the perception of receiving active stimulation.
And finally, perhaps the more important one, that there should be no active brain stimulation, which means that the electric field induced in the brain should be minimized to avoid any therapeutic effects.
For TMS, there are several categories of ways to implement sham, which are loosely categorized into the coil tilt techniques, two coil configurations, and dedicated sham systems. I'm going to describe each of them in some detail next.
So Dr. Lisanby has already covered the coil tilt technique, and this is one that was pretty popular in the early days of TMS. By angling the coil 45 degrees or 90 degrees relative to the tangential plane of the head, one can minimize the stimulation to the brain. At least they thought so.
It turns out, through modeling and also intracranial recordings of induced voltages, that some of these coil tilt techniques remain biologically active. Here you see simulations on a spherical head model of various coil tilt manipulations. Up here we have the active figure of 8 stimulation, producing a single focus of electric field directly underneath the center of the figure of 8 coil.
When you tilt the coil 45 degrees or 90 degrees, and when you look into the brain, there is considerable residual electric field that is still induced with these coil tilt techniques.
A better, very clever way, popularized by some folks in Europe doing motor excitability studies, involves a two coil configuration. You use two TMS coils attached to two different TMS stimulators, and you position the coils perpendicular to each other: one in the active, tangential configuration, and one rotated 90 degrees, sitting on top of the active coil.
And with this technique, the advantage is that you can interleave active and sham TMS pulses in the same protocol because you are dealing with two different TMS stimulators. So in active mode, you would simply fire the coil that is closer to the head, which is tangential in the active configuration. In sham mode, you would simply fire the coil that is on top of the active coil.
However, with this technique, as with the coil tilt, there is effectively a spacer in the perpendicular coil setup. So the field induced in the brain is lower than with the 90 degree coil tilt, but it also does not induce any scalp stimulation. That means the sensation at the scalp level is reduced and not felt by the participants.
Another implementation involves a sandwich design, also using a two coil setup, with the coils sandwiching a metal shielding plate. In active stimulation mode, one fires the coil that is closer to the head, and in sham mode one fires the coil that's further away. The shield limits the penetration of the magnetic field, resulting in no scalp stimulation as well as no brain stimulation.
The final category is the dedicated sham systems manufactured by different companies, the first of which is a reversed current sham; Magstim has an implementation of this concept. In active stimulation, the currents in the two loops flow in the same direction underneath the center of the coil, so the fields summate at the center.
In the sham setup, the current in one of the loops is reversed so that the field cancels at the center of the coil. This effectively creates a larger circular or oval coil, and a larger coil has slower field decay with depth, so when you actually look into the brain, there remains substantial electric field stimulation there.
Another technique that was mentioned earlier is shielding: by putting a metal or mu metal shield underneath the coil, you can effectively block out all of the field penetration, but you also completely eliminate any scalp stimulation, making the sensation feel different.
Another implementation strategy uses a spacer and passive shielding. This is the approach of the MagVenture coil, for example, which uses a large block coil in which the winding is built into only one side of the block. During active stimulation, one flips the coil so that the active winding is closer to the head; for sham stimulation, one flips it over so that the passive shielding is closer to the head and the active winding elements are further away.
This shield technique plus the spacer would completely eliminate any brain stimulation, but it also would eliminate any scalp stimulation.
A final coil setup was invented by our lab several years ago, which we called the quadrupole coil. This implementation splits the figure of 8 coil into four loops, and by reversing the coil current direction in the outside loops during sham stimulation, you effectively get a smaller figure of 8 coil. And as we know, smaller coils have lower field penetration, so both the scalp stimulation and the brain stimulation are reduced.
How do all of these different sham stimulation strategies stack up against each other? The criterion we want at the scalp is essentially 100% of the active stimulation: when we quantify the sham electric field at the scalp, we would like it to match the E field of the active configuration.
When it comes to the brain, the sham E field should be zero: you don't want any electric field induced in the brain in the sham condition. So one would like to maximize the contrast between scalp stimulation and brain stimulation.
But looking across the coil tilt techniques, the two coil configurations, and the dedicated sham systems, none of these techniques perfectly achieves this. Either you have no brain stimulation but also no scalp stimulation, or you have residual scalp stimulation and brain stimulation at the same time, confounding clinical trial results.
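One way to summarize that trade-off is a simple figure of merit for each candidate sham. The sketch below uses hypothetical field values for illustration, not measurements from any study:

```python
def sham_fidelity(e_scalp_sham, e_brain_sham, e_scalp_active=1.0, e_brain_active=1.0):
    # Crude figure of merit for a candidate sham: the scalp ratio should
    # approach 1 (matched sensation) while the brain ratio approaches 0
    # (no residual stimulation). Inputs are peak E-field magnitudes.
    return e_scalp_sham / e_scalp_active, e_brain_sham / e_brain_active

# Hypothetical candidates (values illustrative only):
for name, fields in {
    "90-degree coil tilt": (0.4, 0.3),
    "shield + spacer":     (0.0, 0.0),
    "reversed current":    (0.6, 0.5),
}.items():
    scalp, brain = sham_fidelity(*fields)
    print(f"{name}: scalp {scalp:.0%} (want 100%), brain {brain:.0%} (want 0%)")
```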
So these are the primary challenges in implementing sham systems: either an incomplete mimicry of the sensory experience, that is, of the scalp stimulation, or too much residual, possibly biologically active, electric field induced in the brain.
So why don't we take a coil that produces no brain stimulation and no scalp stimulation, and add some scalp stimulation back? This is the proposed technique of concurrent cutaneous electrical stimulation, which was used in some of the early clinical trials of TMS. It utilizes two electrodes placed relatively close together, approximately one centimeter edge to edge, underneath the center of the coil.
And the placement of the electrodes is such that you maintain the current direction induced in the head relative to active TMS. The current is mostly shunted in the scalp, but a little of it enters the brain.
The early implementations of this technique used a customized ECT device that delivered low amplitude square pulses synchronized to the TMS pulses. In more modern configurations, the electrical stimulation module is incorporated into a dedicated sham coil, such as the MagVenture setup.
There are several ways to use this electrical stimulation. One way is to carefully titrate the stimulus intensity to match the active TMS sensation. Alternatively, some labs maximize the intensity of the electrical stimulation and deliver it in both active and sham TMS conditions to entirely mask scalp sensation in both conditions.
Now, there are some problems with this cutaneous electrical stimulation, the first of which is waveform considerations: what is the waveform of the electrical pulses that accompany the sham TMS pulses? The manufacturer specifies triangular waveforms with a 200 microsecond rise time and a 2 millisecond fall time.
When we actually measure these current pulses, though, the waveform deviates substantially from the triangular waveform specified in the manual. What we actually measured were exponentially decaying waveforms with a much longer tail than the 2 millisecond fall time of the specified triangular waveform.
What's more, if one characterizes the decay constant of this exponential decay and plots it as a function of pulse intensity, one finds that more intense pulses have a shorter decay constant and are therefore more pulsatile. If you reduce the electrical intensity, you end up with a pulse waveform that gets longer and longer. I'll tell you why that's important a little bit later.
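A minimal sketch of why that tail matters, comparing the specified triangular pulse to exponential decays; the time constants here are assumptions for illustration, not the measured values:

```python
import numpy as np

dt = 1e-6
t = np.arange(0, 20e-3, dt)   # 20 ms window, in seconds

def triangular_pulse(t, t_rise=200e-6, t_fall=2e-3):
    # Manufacturer-specified shape: 200 us linear rise, 2 ms linear fall.
    rising = (t < t_rise) * (t / t_rise)
    falling = ((t >= t_rise) & (t < t_rise + t_fall)) * (1 - (t - t_rise) / t_fall)
    return rising + falling

def measured_pulse(t, tau):
    # Measured shape: exponential decay whose time constant lengthens
    # as the intensity setting is reduced (tau values are assumptions).
    return np.exp(-t / tau)

tri = triangular_pulse(t)
print(f"specified triangular: {tri[t > 2e-3].sum() / tri.sum():.0%} of charge after 2 ms")
for setting, tau in (("high intensity", 1e-3), ("low intensity", 5e-3)):
    pulse = measured_pulse(t, tau)
    print(f"measured, {setting} (tau = {tau*1e3:.0f} ms): "
          f"{pulse[t > 2e-3].sum() / pulse.sum():.0%} of charge after 2 ms")
```

In this sketch, the low intensity exponential delivers the majority of its charge after the point where the specified triangular pulse has essentially ended.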
A second peculiar feature of this system is that the current amplitude is not linear with the dial setting. That is, if you increase the intensity by rotating the dial on the machine, an increase from a setting of 1 to 2 is not the same as a jump from 8 to 9, for example.
And the maximum current at maximum stimulator setting is upwards of 6.7 milliamps, which is considerably higher compared to other electrical stimulation such as tDCS, which typically uses 2 milliamps.
There's another issue with the electrical stimulation intensity, which is that it was advertised to scale with TMS intensity. That is, as you dial up the intensity of the TMS pulses, the intensity of the electrical stimulation should also increase.
And this is not the case in our measurements. As you can see here, at two different electrical stimulation intensity settings, as we dialed the TMS pulse intensity up from 50% to 90%, the amplitude of the electrical stimulation waveforms didn't really change.
Why does pulse shape matter? It has to do with the strength-duration properties of the sensory fibers underneath the TMS coil. The sensory fibers are classified in this rudimentary drawing of sensory nerves that I put up here.
There are A beta fibers, which are larger diameter, myelinated fibers. They have faster conduction, and they carry information about vibration, pressure, and touch. A delta fibers are slightly smaller, about one to five microns in diameter, and they typically carry information about sharper pain. And then we have the C fibers, which are unmyelinated and smaller in diameter; because of their slower conduction, they carry information about burning sensations and thermal pain.
I know this is not a very professional drawing of these nerves, and, of course, when it comes to drawing, I am no Rembrandt, but neither was Picasso.
This is actually a more professional drawing, but the important point about different pulse shapes is that they preferentially activate different kinds of fibers with different time constants. One can actually model that using a nerve model, which I have done here, and we can show that the proportional nerve activation differs across waveforms.
On the left cluster of bars, we see what the profile of the proportional nerve activation is like for various types of TMS waveforms, including biphasic sinusoids, monophasic sinusoids, and controllable pulse width, which are near rectangular pulses.
These TMS waveforms preferentially activate A beta and A delta fibers, contributing to this tapping sensation that you feel with TMS.
But when it comes to electrical stimulation using these exponentially decaying waveforms, you see that they preferentially activate C fibers. Not only that: as you decrease the intensity of the stimulation from maximum to minimum, the tail gets longer and longer, you stimulate more and more of the C fibers, and you create more and more of the burning and tingling sensation that people sometimes report with tDCS, for example, which is uncomfortable for some people.
But as you increase the electrical stimulation intensity, yes, the pulses become shorter and it feels more pulsatile, but then the intensity is increased, so now it feels more painful.
So that does not seem to be a way to achieve a comfortable setup with this electrical stimulation. And more importantly, it does not feel like TMS; the profile of nerve activation is very different from that of a TMS waveform.
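The fiber-selectivity argument can be sketched with a first-order leaky-integrator membrane model (a much cruder stand-in for the nerve model just described); the membrane time constants below are rough assumptions for illustration, not measured values:

```python
import numpy as np

def peak_response(stimulus, dt, tau_m):
    # First-order leaky integrator: dv/dt = (stimulus - v) / tau_m.
    # Peak |v| is a crude proxy for how strongly a fiber class is driven.
    v = np.zeros_like(stimulus)
    for i in range(1, len(stimulus)):
        v[i] = v[i - 1] + dt * (stimulus[i - 1] - v[i - 1]) / tau_m
    return np.abs(v).max()

dt = 1e-5
t = np.arange(0, 20e-3, dt)
fibers = {"A-beta": 0.2e-3, "A-delta": 0.5e-3, "C": 2.0e-3}  # assumed time constants

brief = np.exp(-t / 0.3e-3)     # brief, TMS-like pulse
long_tail = np.exp(-t / 5e-3)   # low-intensity cutaneous pulse with a long tail

for name, tau in fibers.items():
    bias = peak_response(long_tail, dt, tau) / peak_response(brief, dt, tau)
    print(f"{name}: long-tail pulse drives this fiber {bias:.1f}x the brief pulse")
```

Even in this toy model, the long-tailed pulse disproportionately drives the slow, C-like fiber, which is the qualitative point being made.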
So we did not find any perfect sham. The next order of business was to look into the clinical literature: might there be other stimulation parameters, such as intensity, stimulation site, or stimulation protocol, that are predictive of sham response, something that we can modulate and modify?
So we looked into the literature, and we replicated and extended a previous meta analysis of randomized controlled TMS trials in depression. The average sample size across these trials is 35 subjects. In terms of stimulation protocol, high frequency stimulation predominates, and the second largest group is low frequency stimulation.
In terms of intensity, we have a mixture, with most protocols administering either 100%, 110%, or 120% of motor threshold. In terms of stimulation site, most of these clinical trials use the left dorsolateral prefrontal cortex as the treatment target. That single site, combined with bilateral dlPFC, accounts for close to 80% of the clinical trials.
In terms of targeting approach, I was surprised to find that we were still using the scalp based targeting strategy of the five centimeter rule, which uses just measurements on the scalp, five centimeters on the scalp anterior to the motor hotspot. And that's where they determine the location for the left dorsolateral prefrontal cortex.
In terms of sham type, a lot of the earlier studies, as Dr. Lisanby mentioned, use the coil-tilt configuration, either 45 degrees or 90 degrees. In this analysis they still account for the majority of the studies, and only about a third of the studies included use a dedicated sham coil setup.
Manufacturers are a mix. In terms of coil types, they are predominantly figure-of-8 coils. And in terms of the number of sessions in these studies, the median is 12 sessions of treatment.
So what did we find? What are the correlates of sham response in these clinical trials? The first thing we found was that the number of sessions is correlated with sham response. Here on the Y axis we are plotting the percent change from baseline for the primary outcome of the study, typically a depression severity rating, so down is good, an antidepressant effect. And here we see a weak correlation between the number of sessions in a typical clinical trial and improvement under sham stimulation.
Over a longer treatment course, participants may develop stronger expectations of improvement, and this continued engagement with the treatment process, plus regular clinic visits and interaction with a healthcare team, can reinforce these expectations, contributing to a sustained and enhanced placebo response that can accumulate over time.
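As a rough illustration of how such a trial-level correlate can be estimated, here is a sketch of a sample-size-weighted meta-regression; all trial numbers below are made up for the example.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical trial-level data: one row per randomized controlled trial.
    n_sessions  = np.array([10, 12, 15, 20, 20, 30])         # treatment sessions
    sham_change = np.array([-12., -15, -18, -22, -20, -28])  # % change from baseline under sham
    sample_size = np.array([20, 35, 30, 50, 40, 60])         # subjects per trial

    # Weighted least squares meta-regression, weighting trials by sample size.
    X = sm.add_constant(n_sessions)
    fit = sm.WLS(sham_change, X, weights=sample_size).fit()
    print(fit.params)  # negative slope: more sessions, larger sham improvement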
The second correlate that we found to be significantly correlated with sham response is active response. So in any given clinical trial, the higher the active response, the higher the sham response. And the correlation between sham and active responses may indicate that the mechanisms driving the placebo effect are also at play in the active treatment response.
This correlation might also reflect nonspecific factors common to any form of intervention. The finding underscores the importance of effective blinding, management of participants' expectations, and accounting for placebo effects in clinical trial design and interpretation. And the final correlate is the effect of time, something that was also mentioned in relation to pain medication a little earlier: Dr. Wager mentioned that sham response seems to be increasing over time. We observe this effect as well.
Now, this increase in placebo response with drugs is sometimes hypothesized to be associated with societal changes in attitudes towards certain types of treatments, perhaps greater awareness of medical research, increased exposure to healthcare information, and more advertising in general, particularly post-approval of a drug or a device. Altogether these can enhance participants' expectations and belief in the efficacy of certain types of treatments, contributing to a stronger placebo response.
Here we see the same thing with devices. There are also other interpretations of this increasing placebo response. Perhaps the demographic characteristics of participants in clinical trials have changed over time. Perhaps participants today are more health conscious and more proactively engaged in healthcare, leading to stronger expectations of treatment options.
It could also be that sham devices and procedures are becoming more realistic, changing from the earlier coil-tilt techniques to more dedicated sham systems that can enhance the belief that one is receiving an active treatment. The good news, though, is that active response is also increasing, although not quite at the same rate. Active response may be increasing over the years as well, likely attributable to improvements in dosing and targeting techniques.
Speaking of similarities between drugs and devices in their placebo response, there are also some key differences. A study published last year in Neuromodulation pointed out the differential placebo responses between neurostimulation techniques and pharmacotherapy in late-life depression. The time course of the sham or placebo response differs between sham rTMS and placebo pills. Specifically, at the four-week time point, participants receiving sham rTMS showed a significantly greater reduction in their Hamilton Depression Rating Scale score compared to those receiving placebo pills. This suggests a stronger early placebo response to neurostimulation compared to pharmacotherapy.
But when we look at 12 weeks, the placebo response for drugs starts to catch up, and by the end of the trial at 12 weeks there is no statistically significant difference between the placebo pill response and the sham rTMS response. This is important to consider if we are designing clinical trials to compare drugs versus devices, for example.
So we must think carefully about when to assess the primary outcome and also employ statistical techniques to account for this time-dependent placebo effect.
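One standard way to handle such a time-dependent placebo effect, sketched here with hypothetical data and column names, is a mixed-effects model with an arm-by-time interaction, which lets the placebo-pill and sham-rTMS arms follow different time courses:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical long-format data: one row per subject per visit, with
    # columns: subject, arm ("placebo_pill" / "sham_rtms"), week (0, 4, 12), hamd.
    df = pd.read_csv("trial_visits.csv")  # hypothetical file

    # Random intercept per subject; the arm-by-week interaction captures the
    # early sham-rTMS response versus the later catch-up of the placebo pill.
    m = smf.mixedlm("hamd ~ C(arm) * C(week)", df, groups=df["subject"]).fit()
    print(m.summary())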
Touching on tDCS for a second. We don't really have a lot of work on tDCS. The typical sham protocol in tDCS is implemented by changing the temporal waveform of the stimulation: ramping up during the beginning phase of the stimulation, and sometimes a ramp up/ramp down towards the end, to give a transient sense that the brain is being stimulated. Some protocols maintain a constant low intensity, as shown in Dr. Lisanby's slides; these microamp stimulations may or may not be biologically active, and that may confound the results of clinical trials.
NIMH STAFF: Dr. Deng, I'm sorry, but we are going to need to wrap up to give enough time for our following speakers.
ZHI-DE DENG: Okay, wrap up. Sure. Sure. Final slides. We were just going to talk about some of the determinants of sham response in tDCS trials. There seems to be a large sham effect. Some protocols have better blinding than others, and certain electrode placements have lower sham response. And again, similar to TMS, the sham response in tDCS is correlated with the active tDCS response.
With that, I think I will skip the rest of this talk and, you know, allow questions if you have any.
TOR WAGER: Okay. Thank you. Great. Well, keep putting the questions in the chat. And for our panelists, please keep answering them as you can.
We'll move on to the next session right now which is going to cover placebo effects in psychosocial trials and interpersonal interactions.
So our two speakers are Winfried Rief and Lauren Atlas. I believe, Winfried, you are going to go first so please take it away.
Current State of Placebo in Psychosocial Trials
WINFRIED RIEF: Thank you. First, greetings from Germany. And I'm pleased to be invited to this exciting conference.
I was asked to talk about placebo effects in psychosocial trials. And there is certainly a quite critical question of whether we can really apply the placebo construct to psychological therapies and to trials of psychological therapies.
So I first want to highlight why it is complicated to transfer this concept to psychological treatments. But then I will dive into the details of how placebo mechanisms might apply and how we might be able to control for them in psychological treatments.
So what is the problem? The problem is the definition of psychological treatments. They are defined as treatments that utilize psychological mechanisms to treat clinical conditions. But if we consider the definition of placebo effects in medicine, it is pretty similar to, or highly overlapping with, the definition of psychological treatments themselves.
The impact of psychological and contact factors is typically considered the placebo mechanism in medical interventions. We can switch to other attempts to define placebo mechanisms, but then we need a concept of which mechanisms are specific and which are nonspecific. And this is quite difficult to define for psychological interventions, because we don't have the very clear active ingredient that we have in drug trials.
Novel definitions define placebo mechanisms as mechanisms of conditioning and expectation. But this is already a definition of psychological interventions.
And, as you know, CBT started with the concept of using learning mechanisms to improve clinical conditions. So there is an overlap between the definition of what placebo mechanisms are and what psychological treatments are, and therefore it is quite difficult to disentangle the effects.
To provide more insight, I reanalyzed a meta-analysis from Stefan Hofmann's group on depression and anxiety trials, because they only included placebo-controlled trials of psychological interventions. Some of these trials had proper placebo conditions when they also integrated psychoactive drug arms. But most of the trials used control arms with psychoeducation components, information-only controls, or supportive therapy, which means simply reflecting and supporting emotional well-being.
But some other trials used, as control conditions, interventions that are known to be effective, such as interpersonal psychotherapy, cognitive restructuring, or GRN therapy -- therapies known to be effective for other conditions. And this shows how difficult it is to define what a good placebo condition is for psychological interventions.
In the first version of this meta-analysis, six years ago, the authors defined a good psychological placebo condition as one that uses an intervention excluding the specific factor, including only the nonspecific factors. And the mechanisms used in the placebo arm should have been shown to be ineffective for the clinical condition under consideration. That is already a point that will be pretty hard to define in detail if we develop placebo conditions for psychological treatments.
Another attempt, as Tor already mentioned, is to disentangle the variance components of treatment outcome. This approach is associated with names like Bruce Wampold and Michael Lambert. I show here the results of Michael Lambert's analysis. You see that he defines placebo effects as the mere treatment-expectation effect, declares this to be about 15% of the outcome, and allocates other parts of the effect to other factors.
We have to be aware that this kind of variance-disentangling analysis is just statistical modeling; it is not a causal investigation of factors. A second shortcoming is that it does not consider the interactions of these factors. Therefore the insight we get from this kind of analysis is limited.
But coming back to psychological treatments, we can say that patients' expectations are powerful predictors of outcome, as we already know from medical interventions. Here are data from a psychological treatment study on chronic pain conditions, which shows response rates of 35-36%, but only if patients have positive outcome expectations before they start treatment. Those who have negative outcome expectations have much lower success rates, around 15%. And the relationship between positive and more negative expectations remains stable over months and years.
So what is the major challenge when we try to define control conditions for psychological treatments? The first point is that we are unable to truly blind psychological treatments; at least the psychotherapist knows what he or she is doing. And the placebo groups in clinical trials often differ from the active interventions in terms of credibility -- that is, whether being in the control condition feels like being in a treatment as credible as the active one.
For some control conditions it is even questionable whether they are a kind of nocebo condition, such as standard medical care or a waiting-list group. If you are randomized to standard medical care or a waiting list, you might be disappointed; you don't expect much improvement. Being in a natural-course group might even be better: you try some self-help strategies, for instance. Another aspect is that nonspecific effects can sometimes switch to become specific effects, depending on what your treatment is and what your treatment rationale is.
I'll show one example from one of our studies of this effect. We investigated treatment expectations in patients undergoing heart surgery. Before the surgery, we did a few sessions to optimize treatment outcome expectations. That means outcome expectations were moved from being a noise signal, a placebo effect, to being the target mechanism of our intervention. In this case the therapist works with the patient to develop positive expectations about what happens after they come through the heart surgery.
So we did a randomized clinical trial with expectation optimization in the main group, compared with two control groups. And we were able to show that if we optimize treatment outcome expectations in heart surgery patients, these patients really did better six months after surgery. Standard medical care shows little improvement; it mainly provides survival, which is important enough, no question about that. But whether patients are really feeling better six months after surgery depends on whether they got this preoperative psychological preparation.
We also used this approach of optimizing expectations to develop complete psychological treatment programs for patients with depression and other mental disorders. So let's come to the other part of placebo mechanisms, the nocebo effect. I would like to report on nocebo effects in psychological treatments, but the major problem is that side effects and other unwanted effects are only rarely assessed in psychological treatments. This is a real shortcoming.
Here are just the top ten side effects of psychological treatments. Many of them involve increased conflicts and problems, but some concern new symptoms that develop. In some of our other studies we even found that symptoms such as suicidal ideation sometimes increase for some patients in psychological treatments. So negative side effects are an issue in psychological treatments, and we need to assess them to better understand whether nocebo effects occur.
How do these treatment expectations develop, be they positive or negative? One major factor has already been shown in many placebo trials: pretreatment experience. Here are data from about 300 former psychotherapy users who planned to attend another psychological treatment. You can see that how much improvement patients expect mainly depends on how much improvement they experienced during their last treatment.
The same holds for negative expectations and for side-effect expectations. Of note, positive clinical outcome expectations are not correlated with negative outcome expectations. That means people can be optimistic and worried at the same time. A critical factor in patients' treatment expectations is the clinician. We wanted to evaluate the effect of the clinician using an experimental design. Here is our clinician -- I will call him Tom -- who is explaining to a critical patient whether psychological treatments can help or not.
We wanted to modulate this situation, and therefore we first brought all our participants into a state of negative treatment outcome expectations. We were quite successful in establishing negative treatment outcome expectations or, as you see here, a reduction of positive outcome expectations. After that, Tom explained to the patient that psychological treatments are helpful for his or her condition. But Tom changed his behavior while always using the same information: psychological treatments are powerful for improving your clinical condition.
Sometimes he was more warm and empathetic; sometimes he showed signs of competence; sometimes both. You can see that whether the information he wants to convey really has an effect mainly depends on these behavior patterns of the therapist. If the therapist is low in competence and low in warmth, the same information doesn't have any effect, while it can have a very powerful effect if the therapist shows warmth and competence.
So let me conclude these few insights into our placebo research. The distinction between specific treatment mechanisms and unspecific mechanisms is less clear than in biomedical interventions. But we can still say that expectations also predict outcome in psychological and psychosocial treatments.
The main determinants of treatment expectations are pretreatment experiences, but also the clinician-patient relationship and many other factors. Expectations can be a nonspecific factor to be controlled for, but they can also be the focus of an intervention and can really boost treatment effects, and therefore it is really valuable to focus on them.
Unfortunately, side-effect assessments are typically overlooked in clinical trials; I'll come back to this in a moment. We want to recommend that placebo-controlled trials are needed for psychosocial interventions. But it is more difficult to decide what to include in them. The major idea is to exclude the active mechanisms, but this is not so easily defined, and therefore we need credible psychological attention-control conditions for psychological treatments to be compared with.
I would say that we need a variety of trial designs. For very new interventions, it might be justifiable to start with a waiting-list control group or a standard-medical-care group. But if you want to learn more about the treatment, you need further control group designs. There is not one perfect control condition; you need variations of it. And last but not least, with a strong emphasis: side effects, adverse events, and unwanted events need to be assessed in psychological treatments as well.
Finally, let me make two comments. I think placebo-controlled investigations have been developed, and have to be developed, to better understand treatment mechanisms. From the patient's view, they are less important. Patients want to know what the overall efficacy of a treatment is -- the combination of specific and nonspecific effects, the overall package. And we shouldn't lose sight of that.
And second, all these mechanisms we are talking about are not really to be separated from one another; they typically interact. Expectation effects interact with the development of side effects, which interact with the experience of improvement, which can be attributed to the drug or to the psychological treatment.
So, so far from my side, and I'm happy to hand over to Lauren who will continue to talk about this issue.
TOR WAGER: Wonderful. Thank you, Winfried.
Now we have Lauren Atlas.
LAUREN ATLAS: Thank you. So it's really an honor to be wrapping up this first exciting day of this workshop. And to kind of I guess in a way bring you back to some of the themes that Tor highlighted in his introduction.
So I'll be talking about why I think we as a field would benefit from taking a social neuroscience approach to placebo analgesia and placebo effects more generally. Tor used the same figure in his introduction to the day, and one of the things I really want to highlight in it is the set of intrapersonal factors: things like expectations, learning, and a history of associations with different treatments and different clinical contexts. This has really been the foundation of most studies of how placebo effects work, because it is quite easy to manipulate things like expectations and learning in the lab and understand how those affect clinical outcomes.
But there has been far less work on the interpersonal processes that support placebo. In some ways this is really where we need to be going as a field, because it could be a lot easier to teach clinicians how to enhance patient outcomes than to mold what a patient brings to the table -- although of course these factors interact and are both important in determining clinical outcomes.
The way I like to think about this interplay is from a social affective neuroscience standpoint. The term social neuroscience has come about over the past couple of decades to describe how we can use neuroscience techniques to understand emotional and interpersonal processes across a variety of domains. Where I think about this in the context of placebo is, first, that through neuroscience techniques we can understand how placebo effects are mediated, whether by mechanisms supporting specific types of outcomes or by more general processes that shape placebo effects across domains.
From an affective neuroscience standpoint, we can determine whether the mechanisms of different types of placebo are shared or unique. For instance, in the context of placebo analgesia, we can ask whether placebo effects are supported by pain-specific mechanisms or whether we are looking at the same mechanisms that might also be relevant in placebo effects for depression.
And finally, from a social standpoint, we can isolate the role of the social context surrounding treatment. A couple of years back I wrote a review looking at placebo effects from this social affective neuroscience standpoint, focusing on the roles of expectations, affect, and the social context.
Today I'd like to focus first on mechanistic work using neuroscience to understand how placebo effects are mediated, and secondly to address the role of the social context surrounding treatment, which I think has implications not only for the study of placebo and clinical outcomes but also for reducing health disparities more generally. And I do want to say that I think the study of placebo can really point to all of the different features of the psychosocial context that influence clinical outcomes.
So this is why I think there is so much we can take from the study of placebo more generally. So turning first to how placebo effects are mediated. First, throughout the day we've been talking about how expectations associated with treatment outcomes can directly influence clinical outcomes in the form of placebo. And as Tor mentioned, if we not only compare treatment arms to placebo groups to isolate drug effects but instead also include natural history control groups, we can isolate placebo effects on a treatment outcome by controlling for things like regression to the mean.
Now, again, this came up earlier, but a meta-analysis of clinical trials that compared placebo with no treatment revealed no placebo effect on binary outcomes or objective outcomes, but a substantial placebo effect on continuous subjective outcomes, especially in the context of pain. The authors concluded that the fact that placebos had no significant effect on objective continuous outcomes suggests that reporting bias may have been a factor in the trials with subjective outcomes.
So when we talk about our model of placebo, traditionally we think that things like social dynamics, the psychosocial context surrounding treatment, and cues associated with treatments lead to changes in one's sensory processing or one's bodily state, and based on that, one makes a subjective decision about how one is feeling. For instance, a placebo effect in depression might lead to shifts in emotional processing, or a placebo effect in pain would lead to someone reporting less pain.
The reporting-bias idea, by contrast, is that rather than expectations changing that sensory processing, they affect subjective responses directly, perhaps by changing our criteria for calling something painful in the first place. So for over two decades now, the field has focused on asking to what extent these effects are mediated by changes in sensory processing.
And placebo effects in pain are a really ideal way to ask this question, because we can objectively manipulate pain in the lab. We can use a device called a thermode, heated to different temperatures, and measure how much pain it elicits. The targets of nociceptive signals are very well studied, and we know the tracts that carry this information to the cortex.
These can be visualized using functional magnetic resonance imaging, or fMRI. We see reliable activation in response to nociceptive stimuli in a network of regions often referred to as the pain matrix, including the insula, dorsal anterior cingulate, thalamus, somatosensory cortex, brainstem, and cerebellum.
Now, we used machine learning to identify a pattern of weights, which we call the neurologic pain signature (NPS), that is sensitive and specific to pain and can reliably detect whether something is painful and which of two conditions is more painful. So this provides an opportunity to ask what happens when placebos affect pain. For instance, if we apply an inert topical treatment that a patient believes will reduce pain to their arm before administering a noxious stimulus, does the pain reduction come about through changes in pain-specific brain mechanisms, or do we see shifts in more general mechanisms, such as shifts in affect, emotion regulation, or value-based learning? Maybe people just feel less anxious, and there is nothing specific about pain. That wouldn't really be a problem, because it would also mean that what we are learning about might transfer to other domains.
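The core computation behind such a signature is simple enough to sketch: each subject's activation map is reduced to one scalar by a dot product with the fixed weight map, and those scalars are compared across conditions. A minimal sketch, assuming preprocessed, vectorized maps in hypothetical files:

    import numpy as np
    from scipy import stats

    # Hypothetical inputs, all registered to the same voxel space:
    weights = np.load("signature_weights.npy")  # shape (n_voxels,)
    control = np.load("betas_control.npy")      # shape (n_subjects, n_voxels)
    placebo = np.load("betas_placebo.npy")      # shape (n_subjects, n_voxels)

    # Signature response = dot product of each subject's map with the weights.
    resp_control = control @ weights
    resp_placebo = placebo @ weights

    # Paired test: does placebo reduce the signature response within subjects?
    t, p = stats.ttest_rel(resp_control, resp_placebo)
    print(f"mean reduction = {np.mean(resp_control - resp_placebo):.3f}, t = {t:.2f}, p = {p:.4f}")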
So a couple of years back, nearly all the labs that use neuroimaging to study placebo analgesia in the brain combined participant-level data. What we found is that there was a reliable reduction in pain reports during fMRI scanning when people had a placebo relative to a control treatment that they didn't believe would reduce pain, with a moderate to large effect size.
But there were no reliable placebo effects on the NPS. This suggests that we are not seeing placebo effects on this best brain-based biomarker of pain. So what do we see placebo effects modulating? Oh, sorry, it is important for me to say that even though we don't see placebo effects on the NPS, other psychological manipulations -- such as mindfulness, cues that predict different levels of pain, or administering pain-reducing treatments both when subjects know they are receiving them and when they believe they are not -- all did affect NPS responses. So it is possible for psychological manipulations to modulate the NPS, but we did not see any placebo effect on NPS responses.
We also conducted a meta-analysis of placebo analgesia looking at other published studies. What we found is that with placebo administration there were reliable reductions during pain in the insula, thalamus, and dorsal anterior cingulate. These regions are indeed targets of the nociceptive pathways I mentioned. However, these regions are also activated by pretty much any salient stimulus in the MRI scanner, as well as by anything involving interoception or attention to the body.
And so I think an important point for the discussion is to what extent are these mechanisms or any of the principles we've been talking about today unique to pain or depression or any specific clinical endpoint.
When we looked for regions that showed increases with placebo, we saw increases in the ventromedial prefrontal cortex, dorsolateral prefrontal cortex, and the striatum: regions that have been implicated in domain-general shifts in affect, things like emotion regulation and learning about valued outcomes.
So in this first half of my talk, I demonstrated that placebo effects seem to be mediated by domain-general circuits involved in salience, affective value, and cognitive control. We did not see any placebo effects on the neurologic pain signature pattern, and this really points to the idea that these placebo mechanisms are unlikely to be specific to pain.
However, there are many different labs working on different mechanisms of placebo, and so I think this is an ongoing question that demands further trials and different comparisons within and across participants.
So now I'd like to turn to the second half of my talk, addressing the role of the social context surrounding treatment. I'm going to talk about this in terms of patients' expectations, providers' assessments of patients' pain, and patients' pain outcomes themselves.
We were interested in asking whether patients' perceptions of providers impact pain expectations. And we know from work that Winfried and many others have conducted that placebo responses indeed depend on many different factors in the patient-provider relationship, including how a provider treats a patient.
So Ted Kaptchuk and his group showed that a warm provider can lead to reductions in IBS symptoms in an open-label placebo trial. We just heard data on how a provider's warmth and competence can influence outcomes, and this has also been shown in an experimental context by Alia Crum's lab. And finally -- and I'll present this briefly at the end of my talk -- we also know that a patient's perceived similarity to their provider influences pain and placebo effects in simulated clinical interactions.
So a former postdoc in my lab, Liz Necka, was interested in studying this by asking not only whether interactions between patient and provider influence pain expectations, but also whether our first impressions of our providers, namely in terms of their competence and/or similarity to us, influence expectations even without actual interactions.
The reason Liz wanted to do this is that we know from social psychology that people's first impressions are really important for a lot of different behaviors. Simply looking at people's faces and judging competence can predict the outcomes of elections; this is work that has been led by Alex Todorov and his group.
So these faces are morphed along a dimension of competence. You can see, moving from three standard deviations below the mean to three standard deviations above, that certain features are associated with competence and dominance, and we use them to make judgments about that person's traits. So Liz asked whether these types of first impressions also influence expectations about pain and treatment outcomes.
We conducted five studies using Amazon's Mechanical Turk. The first studies used those morphed faces from Todorov's group; importantly, these were just male faces in the first two studies. In our third study, we used the same competence dimensions morphed onto either male or female faces.
We conducted another study in which we removed cues like hair or clothing and showed just the morphed male or female face itself, between subjects.
And in the final study, we used real individual faces that varied in race and ethnicity, again with a between-groups manipulation of sex. Participants first went through a series of trials in which they saw two faces that varied in competence and told us which provider they would prefer for a potentially painful medical intervention. They were then asked to imagine that provider performing a painful medical procedure on them and to rate how painful the procedure would be, and whether after the procedure they would be more likely to use over-the-counter or prescription medication -- the assumption being that if the procedure were less painful, they would expect to be more likely to use over-the-counter medication.
We also asked about similarity, but I won't be focusing on that today. So across all of the studies -- this line is chance, and this is that first decision, how likely you are to select the more competent face -- what we found is that participants chose the more competent-looking provider based on those facial features in the first study, and we replicated that in the second study. In the third study we found no difference as a function of the features related to competence, in part because people preferred female doctors who looked less competent based on these features.
In the fourth study we used other individuals' ratings of perceived competence and again found that people selected more competent faces, though particularly among the male faces. And when we used real individuals, we again found that other people's ratings of competence predicted the likelihood of selecting that person as a provider, and this was strongest for white providers. We found that competence directly influenced pain expectations in all of the studies except study three. Here you see the association between ratings of competence and pain: higher competence is associated with less expected pain across all the studies but study three. And, again, all the studies showed that the stronger the competence, the more likely somebody was to say they would use an over-the-counter treatment. But we found an interaction with sex, such that competence predicted over-the-counter treatment expectations only for male providers, whereas competent female providers were associated with a higher likelihood of prescription medication rather than over-the-counter.
Finally, stereotypes tied to information about race, ethnicity, and gender, which we were able to test in the fifth study, also impacted pain expectations. In study five, expectations about pain varied as a function of provider race: people expected the least pain, and the highest likelihood of over-the-counter medication, from the Asian providers relative to all others. We also found sex differences in expected medication use.
And when we ran the meta-analysis across all the studies, we found that the effects of similarity on expected analgesic use were strongest in white participants. This is likely an in-group preference, mainly because studies one through four all included white providers. And we found no other effects of the perceived demographics themselves.
In the last three minutes or so: we know that not only do patients' stereotypes impact perceptions of providers, but we also know through studies on health disparities that providers' beliefs impact assessment of patients' pain. Peter Mende-Siedlecki has run beautiful studies in this area looking at how race bias in pain assessment may be mediated through perceptual changes. Peter had black or white male actors depict painful or neutral expressions, and he created morphed images ranging from neutral to painful.
What he found is that white perceivers needed more evidence of a pain expression before labeling pain on black faces relative to white faces. And the larger this difference in the likelihood of seeing pain on white relative to black faces, the more analgesics they prescribed to white relative to black targets, across a number of studies.
We asked whether we saw similar biases in evaluations of real pain by measuring facial reactions to acute pain in 100 healthy individuals who rated their pain in response to heat, shock, or a cold-water bath. What you can see is that people have very different reactions to pain: these are all roughly the same level of pain, but you see differences in expressiveness.
We're going to be creating a public database that will be available for other researchers to use to study pain assessment in diverse individuals. We had other healthy volunteers view these videos and assess pain. Critically, we selected the videos so that there were no differences across target race or gender in the pain or its intensity; all the videos we presented were matched. Subjects saw the videos and rated whether the target was in pain and how intense the pain was.
What we found is that perceivers were less likely to ascribe pain to black individuals relative to white individuals. Again, black is here in cyan and white is in pink, women are the hashed lines and men are solid, and these are all trials where everybody is feeling the same amount of pain. This was really driven by a failure to ascribe pain to black male participants when they were experiencing pain, and it was supported by signal detection analysis. We found that these race-based differences in pain assessment correlated with scores on a modern racism scale but did not vary depending on perceiver race or gender. We're now doing a study looking at how this type of bias might be reduced through learning and instruction. Basically, we find that when people are told about a participant's pain after every trial, they are more accurate in judging other people's pain, and that whether or not people receive feedback, pain assessment accuracy improves over time as people practice. This suggests we may be able to reduce these pain assessment biases through training, perhaps in clinical samples.
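A minimal sketch of the kind of signal detection analysis referred to here: sensitivity (d') and criterion are computed from hit and false-alarm rates separately for each target group, so a bias in pain attribution shows up as a higher criterion (more evidence needed) rather than lower sensitivity. The counts below are hypothetical.

    import numpy as np
    from scipy.stats import norm

    def sdt(hits, misses, fas, crs):
        """d-prime and criterion, with a log-linear correction for 0/1 rates."""
        hr = (hits + 0.5) / (hits + misses + 1.0)  # corrected hit rate
        far = (fas + 0.5) / (fas + crs + 1.0)      # corrected false-alarm rate
        return norm.ppf(hr) - norm.ppf(far), -0.5 * (norm.ppf(hr) + norm.ppf(far))

    # Hypothetical counts for one perceiver: hits = pain videos labeled "pain",
    # false alarms = no-pain videos labeled "pain".
    for group, counts in {"white targets": (40, 10, 12, 38),
                          "black targets": (30, 20, 10, 40)}.items():
        d, c = sdt(*counts)
        print(f"{group}: d' = {d:.2f}, criterion = {c:.2f}")  # higher criterion = stricter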
And finally, I just want to acknowledge that in this kind of dyadic interaction, we ultimately also want to look at the direct interpersonal interactions that shape placebo analgesia. This has been done in a series of studies of simulated clinical interactions, where healthy volunteers are randomly assigned to act as doctor or patient and administer a placebo to somebody else.
So Andy Chen showed that telling a doctor that a treatment was analgesic affected the patient's pain, and that this was likely mediated through nonverbal communication. Liz Losin, when she was in Tor's lab, showed that the more similarity or trust somebody had for a clinician, the lower the pain they experienced. And finally, Steve Anderson, a grad student with Liz Losin, showed that racial concordance between the patient and the provider in a placebo context could reduce pain, particularly in black individuals, and this was also associated with reduced physiological responses.
So just to summarize the second part on the role of the social context surrounding treatment. I've shown you that first impressions shape pain expectations. Stereotypes impact pain expectations and pain assessment. And that concordance can enhance treatment outcomes.
Finally, to make clear where I think the path forward is from this social affective neuroscience approach: I believe that further research on how social factors shape clinical outcomes, including placebo effects and placebo analgesia, can help us improve patient-provider interactions, reduce health disparities, and maximize beneficial patient outcomes. We need more work distinguishing between domain-specific and domain-general mechanisms of placebo in order to isolate general effects of the clinical context versus targeting of disease-specific endpoints. And identifying these domain-specific mechanisms, and the relevant features of both patients and providers, can really help us address the goals of personalized medicine.
So with that, I want to thank the organizers again for the opportunity to present our work, and acknowledge my former postdoc, Liz Necka, my former PhD student, Troy Dildine, and my current postdoc, Allie Jao, and mention that we have positions available in my lab. Thank you.
TOR WAGER: All right. Wonderful. Thank you, Lauren. So that concludes the series of presentations for this webinar for today. But we're not done yet.
Now we're moving into a phase where we have a panel discussion, so it's going to be very exciting, and we'll get a chance to talk about some of the comments you brought up and other things.
So this is moderated by Carolyn Rodriguez and Alexander Talkovsky. So hi, thank you for doing this, and please lead us off.
Panel Discussion
CAROLYN RODRIGUEZ: Yeah, definitely. So it's my pleasure to do this with Alex. My name is Carolyn Rodriguez. I'm a professor at Stanford. And I see there has been a very lively Q&A already, and some of them are being answered. So maybe we'll just popcorn a little bit.
There is one question here which, you know, I think gets at the fact that what we have been presenting is a lot of human data. So maybe it's just worth noting: are studies in animals free of placebo effects? And, Tor, I see you are typing an answer, but I don't know if you wanted to answer that.
TOR WAGER: Sure. Yeah, I just finished typing my answer. But yeah, it's a good discussion point.
I mean, I think that one of the first studies of placebo effects was by Herrnstein in 1962 in Science, called Placebo Effect in the Rat, I think it was. And there's a resurgence of modern neuroscience work on placebo effects in animals; Greg Corder is going to give a talk on this tomorrow as one of the group of investigators doing this.
So long story short, I think that there are conditioned or learned placebo effects. So pharmacological conditioning pairing with a drug cue or conditioning with place cues can change the response patterns of animals as well.
It's difficult to know what animals are expecting. But there is quite a bit of circumstantial and other evidence, from Robert Rescorla years back or from Geoff Schoenbaum, who used clever paradigms to suggest that for animals it's really a lot about information value, and that they are expecting and predicting a lot more than we might at first assume.
So even in those conditioning paradigms, there might be something very similar to what we would call an internal or mental model, or expectations, that is happening. That is my first answer -- others can jump in here and say more.
CAROLYN RODRIGUEZ: Thank you. Yeah, any other panelists -- panelists, feel free to just turn on your videos and we'll be sort of, you know, asking, anybody else want to weigh in on animals and placebo?
Go ahead, Dr. Atlas.
LAUREN ATLAS: I'd be happy to do so. Actually, there is a study I love from a former postdoc who worked with me, Anza Lee, from her PhD work. We haven't really talked about the roles of dopamine and opioids so far today, which is interesting because those often dominate our conversations about mechanisms of placebo. But she had a really lovely study showing that dopamine was necessary for learning the association between a context and pain relief, while the mu-opioid receptor system was necessary for actually experiencing that pain relief. So that is a really nice dissociation between the learning, the development of expectation, and the actual pain modulation.
So that was a really lovely place where I thought that the preclinical work had some really nice findings for those of us who are doing human studies.
CAROLYN RODRIGUEZ: Wonderful. Thank you. And I think there is still a day two, so stay tuned. There's -- I can see in the agenda there will be more on this.
But a question, I think specifically for you, was: how does Naloxone influence the NPS? I think you answered it, but if there's anything additional.
LAUREN ATLAS: I think that's a great question. And I actually don't know of any studies that have administered Naloxone and looked at NPS responses.
As for Naloxone effects on fMRI responses in placebo, I think we may have a bit of a file-drawer problem there. There are a lot of studies that haven't found effects; we really need everybody to publish their data.
But I think we've shown that there are studies of opioid -- or there are effects of opioid analgesics. But I don't think we know anything about blocking the opioid system and its effect on the NPS. But that would be really interesting and important so that's a great suggestion and question.
CAROLYN RODRIGUEZ: Yeah, I look forward to it. That's a very, very exciting question.
I'm going to hop over to neuromodulation. Dr. Lisanby and Dr. Deng, I think you guys have already answered a question which I found fascinating about whether when you try and get the motor threshold, what -- like does that unblind people? So I loved your answer and I just wanted you guys to just say it out loud.
SARAH “HOLLY” LISANBY: Yeah, thank you. I can start, and Zhi might want to comment as well. As you may know, we individualize the intensity of transcranial magnetic stimulation by determining the motor threshold, where we stimulate with single magnetic pulses over the primary motor cortex and measure a muscle twitch in the hand.
And this is real TMS. We do real TMS for motor threshold determination regardless of whether the person is going to be getting active or sham, in order to give them the same level of intensity and so on. You might plausibly think that this would unblind them if you then give them sham rTMS with repetitive pulses. It turns out that single pulses do not cause the same amount of scalp pain or discomfort that repetitive trains of stimulation can cause.
Also, the motor cortex is farther away from the facial muscles and facial nerves, so there is less of a noxious effect of stimulating over the motor cortex. Because of these differences, it is a very common occurrence that people think they are getting active rTMS even when they are assigned to sham.
Maybe Zhi may want to comment.
ZHI-DE DENG: No, I totally agree with that. The different protocols feel very different. So being non-naive to one protocol might not necessarily mean that you break a blind.
CAROLYN RODRIGUEZ: Wonderful, thank you so much. Dr. Deng, always appreciate your humor in your presentations so thank you for that.
We're going to move over -- Dr. Detke, I think you had messaged that you have a couple of slides that may address some of the questions. And particularly Steve Brennan had asked a question about COVID interference. And there was a question about excluding sites with unusual response patterns. So would love to hear more about that.
I think you are on mute, though. We'd love to hear you.
MICHAEL DETKE: There we go. I have one kind of interesting slide on COVID. It kind of doesn't -- it doesn't get directly at the placebo response.
Let me walk you through. It's a weird slide, because we've been looking at slides all day where the X axis, from left to right, is the duration of the study or the treatment.
Here, as you can see, the X axis is actual calendar months. Focus first on the blue line. The blue line is the ADCS-ADL, which is a scale of activities of daily living. There are questions in it like: have you gone to the grocery store recently, and are you able to do that by yourself? Have you attended doctor's appointments? Things like that.
And the reduction from early 2020 to about the peak of the pandemic -- a change of five points or so -- would be about the biggest drug effect in the history of Alzheimer's; this is an Alzheimer's study. And it changed back even faster, with a similar, actually slightly larger, magnitude. That was also a huge change.
This is pooled drug and placebo patients, so there is nothing here that tells you about drug effects. You can see this ADL scale was really impacted by the peak of COVID cases. I'm actually surprised this came out as clean as it did, because about 30% of our patients were in Europe -- Italy, France, Spain -- and, as you may recall, the peak of cases there was at a different time than in the U.S.
But I think the takeaway is that things like COVID can certainly impact assessment scales, and they are especially going to impact scales that specifically ask, hey, have you gone to your doctor's office, when you can't go to the doctor's office. Scales like that are obviously going to be more impacted, though moods and other things could be affected, too. So that is one piece of data where I know COVID had a whopping effect on at least one scale.
As for sites over time: a lot has been said and thought about excluding sites with high placebo response, or excluding sites with low drug-placebo separation. Of course, if you do that post hoc, it's not valid. There's a band-pass approach, where you exclude the extreme sites on both ends, high and low placebo response, which is somewhat more valid. But my understanding from statisticians is that any of these approaches increase false positives if you do them post hoc.
The other thing to think about when you're assessing site performance is that, first, sites change over time: they have different raters who might be there for ten years or maybe ten months. And maybe the single most important point is to realize that the average depression trial, with 100 or 150 patients per arm, is nominally powered at 80% to see a separation, and it's effectively more like 50% power, as Ni Khin and others have shown.
Now imagine you are looking at a single clinical trial site with ten patients, five per arm. What is the statistical power there? It's close to zero. These are some data that my colleague Dave Debrota at Lilly put together a long time ago: a huge database of, I think, Prozac depression studies, spanning many studies, many of which went back to the same sites that performed well.
As you can see here, each chart is a site that was in multiple different studies, and their performance over time in HAMD change was no different. This is another study that looks at different investigative sites within the same trial. It's a little bit of a build, but you can see that this site and this site have virtually identical drug responses, the yellow bars -- sorry, that one is supposed to be a little higher -- almost identical efficacy. But this one has a huge placebo response and that one has a tiny placebo response, probably because they only had five or six subjects per site, and you can get just two or three huge placebo responders.
So trying to assess site performance in the context of a single trial is pretty hard just because of the Ns. Evaluating performance by site is challenging, and excluding sites for reasons like high placebo response is also challenging. So that's a little bit of context on that.
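The site-level power point can be illustrated with a quick calculation, assuming a drug-placebo effect size of d = 0.3, a figure often cited for antidepressant trials:

    from statsmodels.stats.power import TTestIndPower

    power = TTestIndPower()
    d = 0.3  # assumed standardized drug-placebo effect size

    for n_per_arm in (5, 150):
        p = power.power(effect_size=d, nobs1=n_per_arm, alpha=0.05, ratio=1.0)
        print(f"n = {n_per_arm:>3} per arm: power = {p:.2f}")
    # Five per arm gives power barely above the 5% false-positive rate,
    # so a single small site's drug-placebo difference is essentially noise.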
CAROLYN RODRIGUEZ: Thank you. Yeah, appreciate that. Question for your colleague Dr. Khin, but maybe for everyone, right?
So there is a question that says: isn't it difficult to say that a two-point difference on a 52-point scale is clinically significant? I know on a lot of slides we were trying to say what is going to be significant and what the difference is between these scales. At the end of the day, we want to help patients.
So what can we say about the significance of a two-point change?
NI AYE KHIN: So the two-point change is the difference between drug and placebo. Each individual might have a ten-point change, or a 50% change, depending on the individual response. And drug approval is mostly based on statistical significance.
So if there is a two-point difference between drug and placebo on, for example, the Hamilton Depression Score, that's approximately the between-group difference at which most drugs get approved. So, of course, we base drug approval on statistically significant changes. But in the real world, we don't really know what a clinically meaningful change or difference is, right? So that's still an issue.
So Tiffany might be able to add more on this topic.
TIFFANY FARCHIONE: Yeah, I mean, I can add a little bit. In terms of the depression studies, again, those were conducted before we adopted what we do now.
Like if we have a new indication, a new endpoint, something like that, we're going to ask companies to give us an a priori definition of clinically meaningful within-patient change. And we're looking, like Ni said, at the difference for an individual, not the difference between the drug and placebo, but at what matters to patients: how much change do they need to have.
And then they can power their study to see some amount of difference that they think matters. But ultimately we have them anchor their studies to things like global assessments of functioning. If sponsors are using new endpoints, we have them do qualitative work so that we can understand what a change means on a given scale. There is a lot of additional work that goes into it now. But yeah, it's the within-patient change, not the between-group change, that ultimately matters most.
CAROLYN RODRIGUEZ: Thank you so much. I felt like it was worth saying out loud. And, Dr. Farchione, I know you've done a lot of wonderful work. I heard you speak at ACNP about kind of more global measurements of functioning and really thinking about patients more globally, right. You can change a little bit on a scale, but does that translate into life functioning, work function, these are the things that we care about for our patients. So thank you both for that.
I see Dr. Rief wants to weigh in and then Dr. Lisanby.
WINFRIED RIEF: Just one small point. The question also has to be asked about the benefit-harm ratio, and it is an important issue and very good that the question was asked. If the difference is just two points, we have to weigh it against the risks and potential side effects. We cannot focus only on the benefits.
TIFFANY FARCHIONE: We always compare it to the risk regardless of the size of that difference.
CAROLYN RODRIGUEZ: All right. Dr. Lisanby.
SARAH “HOLLY” LISANBY: So this is an opportunity to talk about outcome measures.
CAROLYN RODRIGUEZ: Yes.
SARAH “HOLLY” LISANBY: And how sensitive they are to the intervention, and also how proximal they are to the intervention with respect to mechanism. These are points that Dr. Farchione raised in her talk as well. In psychiatry, having outcome measures that are more proximal to how our intervention engages mechanisms might help us measure and differentiate active treatment effects from nonspecific placebo effects.
And this is part of the rationale of the Research Domain Criteria, or RDoC, research platform: to look at domains of function across levels of analysis, with measurements that might not just be a clinical rating scale. It might be a neurocognitive task related to the cognitive function that is the target of a therapy, or a physiological measure that might serve as an intermediate outcome measure.
So I was hoping we might generate some discussion on the panel about regulatory pathways for these other types of outcome measures and how we might think about selecting outcome measures that may be better at differentiating real treatment effects from nonspecific placebo effects.
CAROLYN RODRIGUEZ: Thank you. I see Dr. Wager, I don't know if you had something to add onto Dr. Lisanby's point or if you had a separate question.
TOR WAGER: I would like to add on to that, if I may.
CAROLYN RODRIGUEZ: Okay. Yeah, of course.
TOR WAGER: I think that's a really important question. I'd love to hear people's opinions about it. Especially the FDA, you know, Tiffany's perspective on it.
Because, to add to that, I was wondering how strongly the FDA considers pathophysiology and mechanism of action, and what counts as mechanism of action. There are certainly pharmacological and cellular-level changes that obviously seem to matter a lot. But what about fMRI, EEG, and other kinds of indirect measures: do they count, have they counted as mechanistic evidence?
TIFFANY FARCHIONE: Yeah, so they haven't counted yet, in part because so far with EEG or fMRI we see group differences, but those aren't the kinds of things that can help predict something for an individual patient.
It just goes back to the whole point about understanding pathophysiology and being able to, you know, not just describe that this drug works on this receptor, but also show that working on this receptor has downstream relationships to X, Y, and Z effects, in a clinically meaningful way.
I think ultimately, for a lot of the things we do in terms of our biomarker qualification program and the like, it's about understanding not just that a drug has some action or interacts with some sort of biology, but in what way, and what kind of information that gives you to help inform the trial or your assessment of drug effect. That's also important. We're a long way off from being able to put things like that into a drug label, I would say.
CAROLYN RODRIGUEZ: All right. Dr. Lisanby.
SARAH “HOLLY” LISANBY: I certainly agree with Dr. Farchione's comments.
And I would like to talk for a moment about devices. There are different regulations and different considerations in drug trial design versus device trial design. And we are already at a stage in the field of devices where individual physiology is on the label. That is the case with the SAINT technology, where individual resting-state functional connectivity MRI is used to target, on a per-patient basis, where to put the TMS coil.
And I would say that the jury is still out on studies that unpack SAINT to show whether that individualized targeting is essential, or whether it's the accelerated intermittent theta burst and the ten treatments a day and so on.
Regardless, it is on the label; it's in the instructions for how to use the product. And so I think that might be a sign of where things may be going in the future. When we think about the way focal brain stimulation is administered, whether it's non-invasive or surgically implanted, we're targeting circuits in the brain. And for measuring the impact of that targeted stimulation on the functioning of that circuit, EEG or fMRI might be the right readout, and it might give some evidence.
I think even still, though, those measures may be useful in identifying treatments and optimizing their dosing, but ultimately, I understand from my FDA colleagues that we'll still need to demonstrate that the intervention, whatever it is, improves quality of life and clinical outcomes for those patients.
But it may be an important part of getting the treatments to that phase where they could be reviewed by FDA.
CAROLYN RODRIGUEZ: Thank you so much. That's a good point. Anyone else to contribute to that? I don't see any other hands raised.
Maybe I'll pass it to Dr. Talkovsky and see if there are any other questions that you see on the Q&A that we could continue to ask the panel.
ALEXANDER TALKOVSKY: Yeah, there was one that jumped out to me a bit earlier. There was a bit of a discussion about warmth and competence as well as a perceived tradeoff between the two. And also some ideas about manipulating them as experimental variables that I thought was interesting. I saw, Dr. Rief, you had jumped into that discussion, too.
I thought that was an important enough topic that would be worth spending a little bit more time here in the group discussion making sure that everybody sees it. So I'll throw it back to you, Dr. Rief.
If you could maybe even elaborate on the answer you gave in there about warmth and competence and those as experimental variables, too.
WINFRIED RIEF: The major point I want to make is that we have to control these variables. If we don't control them, we risk that they differ between the two or three arms of our trials, and then we cannot interpret the results. That means we have to assess them and make sure that they are comparable between the different treatments. This is something I can really recommend; I think it makes a lot of sense. On other points, I'm not sure what to recommend. Some people suggest minimizing warmth and competence to minimize potential placebo effects. This is where the tradeoff comes into the game. If we minimize warmth and competence, people are not motivated to participate, they might discontinue treatments, and they are not willing to cope with side effects.
But if we maximize warmth and competence, we risk that the placebo effect overwhelms everything. So at this stage I would say let's try to keep it at an average level, but really assess it and make sure that it's comparable between the different treatment arms.
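To make the recommended comparability check concrete, here is a minimal sketch in Python, with entirely hypothetical ratings and group sizes, of how assessed warmth could be compared across arms; the same check would apply to competence ratings:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    # Hypothetical patient ratings of provider warmth (1-7 scale) in three arms.
    warmth_drug    = rng.normal(5.2, 0.8, 50)
    warmth_placebo = rng.normal(5.1, 0.8, 50)
    warmth_psych   = rng.normal(5.3, 0.8, 50)

    # One-way ANOVA across arms: a non-significant F is consistent with the
    # arms being comparable on warmth, so warmth is less likely to confound
    # between-arm differences in outcome.
    f, p = stats.f_oneway(warmth_drug, warmth_placebo, warmth_psych)
    print(f"F = {f:.2f}, p = {p:.3f}")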
ALEXANDER TALKOVSKY: Dr. Atlas, I see your hand up.
LAUREN ATLAS: Yeah. I love this question because I think it depends on what the goal is. If the goal is to reduce placebo to find the best benefit of the drug, then yes -- in clinical trials, when people never see the same rater, for instance, that reduces the likelihood of building a relationship. And there are all these different kinds of features that, if you really want to minimize placebo, we can use in that way.
On the other hand, if the goal is to have the best patient outcomes, then I think we want to do the exact opposite: identify exactly how these features improve patients' wellbeing, and heighten them. And so I think that is part of why talking about placebo is so fascinating -- it tells us both how to improve patient outcomes and how to reduce placebo effects in the context of trials. So I think it really depends on what context you're talking about.
ALEXANDER TALKOVSKY: Dr. Rief.
WINFRIED RIEF: Yeah, may I just add a point, because I missed it and Lauren reminded me of it.
Most of us assume that we have to reduce the placebo effects to maximize the difference between placebo and drug effects. And this is an assumption; it is not something that we really know. We have seen studies of antidepressants and SSRIs, and we know studies of analgesics, showing that if you reduce the placebo mechanisms to a minimum, then you are not able to show a difference from the drug afterward, because the drug effects are reduced as well.
In other words, a good drug needs some minimum of placebo mechanisms to show its full action. Therefore, the assumption that minimizing placebo mechanisms increases the difference between placebo and drug is one we have to be concerned about. And maybe for some drugs it's much better to have an average amount of placebo mechanisms.
ALEXANDER TALKOVSKY: Dr. Wager, let's go to you. Then I think we have another question that we want to tackle in the chat after you wrap up.
TOR WAGER: Yep, that sounds good. I see it, too. But just to weigh in on this, because I think this is one of the most important issues to me. I think Winfried also just wrote a review about this, and there have been a couple of others. The issue is that there is always this tendency to want to screen out placebo responders, and it doesn't seem to work very well most of the time in clinical trials.
And if you have a synergistic rather than an additive interaction between an active drug element and a placebo factor like motivation or expectation, then screening out placebo responders also screens out the drug responders.
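The logic can be made explicit with a toy model (purely illustrative, not from any of the talks). Under an additive model,

\[ Y = \beta_0 + \beta_d D + \beta_p P, \]

where $D$ indicates active drug and $P$ indexes a placebo factor such as expectation, the drug-placebo difference is $\beta_d$ regardless of $P$, so excluding high-$P$ participants does not change the estimated drug effect. Under a synergistic model,

\[ Y = \beta_0 + \beta_d D + \beta_p P + \beta_{dp} D P, \]

the observed drug-placebo difference is $\beta_d + \beta_{dp}\bar{P}$, so screening out high-$P$ "placebo responders" lowers $\bar{P}$ and shrinks the apparent drug effect along with it.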
And so I think there is this opportunity to test this more -- to test jointly the effects of active treatments, whether it's neuromodulation or drugs or something else, and factors like expectations or perceived warmth and competence of the care provider.
So I guess I'm wondering if in the neurostimulation world are there many studies like that or any studies like that because they seem to be very separate worlds, right? You either study the device or you study the psychosocial aspects.
SARAH “HOLLY” LISANBY: Well, I can weigh in, and maybe others can as well. It's a good point. Lauren, your talk was really beautiful, and my take-home point from it is that in a device trial, even if we're not studying the effect of the device operator, that effect is occurring in the trial.
And so measuring these aspects of the whole context of care, I think, can help us sort that out. In order to do that, I think it could be helpful for investigators who are designing device trials to partner with investigators who have that expertise. Also in terms of expertise: I was listening very carefully to the talks about psychosocial interventions, and maybe the ancillary effects of the procedure are like a psychosocial intervention, so we might benefit from mixed-methods approaches that pull from both fields to really better understand what we're doing.
And then there are also trials that use drugs and devices together. So being able to have cross-pollination across the fields would be very useful, both with respect to our selection of measures to test the integrity of the blind, and with respect to looking at expectancy and even measuring anything about the provider, which is usually not done in device studies. We're usually not even reporting anything about the provider or the subject's perceptions of the context of their care.
CAROLYN RODRIGUEZ: I wanted to jump in, in terms of topics. For psychedelic-assisted therapy, Harriet de Wit has a very good question here about special considerations in testing placebos. This is something that has come up a lot. And Boris Heifets, among others, has really gotten us to think about different kinds of designs to disguise the effects of ketamine, for example, with general anesthesia. There are other designs as well, but there are open questions in this space.
So how important is it, when you have a very active placebo that can itself have empathogenic or psychedelic effects, in terms of the placebo effect?
TIFFANY FARCHIONE: Yeah, I figure I should probably jump in on this one first.
So, you know, I will say that when it comes to the psychedelics, whether it's a classic psychedelic like psilocybin or the empathogen or entactogen types like MDMA, blinding is practically impossible. Folks know if they are on active drug or placebo. And that makes it really challenging to have an adequate and well-controlled study, right?
On the one hand, we still need to have placebo-controlled studies so that we can get a fairly -- as accurate as you can get assessment of safety of the drug. On the other hand, we've really been struggling trying to figure out what is the best design. Trying to add some kind of an active comparator, you know, choosing something that might mimic some aspect of the psychedelic effect without actually having a treatment effect of any kind is next to impossible. People still know. You know, you've talked about anything from niacin or benzos, a little bit of this, a little bit of that. They know. They just know.
So the best that we've come up with so far is asking for at least one placebo-controlled study so we can get a clear idea of safety. And we've suggested trying to use complementary designs. For instance, it is still possible to have a dose-response study serve as an adequate and well-controlled study; then there is no placebo there. If you can see, across a low dose, mid dose, and high dose, a linear increase in treatment effect in that kind of study, that is helpful to us. One of the other things we ask for is some assessment like an unblinding questionnaire: Do you think you got the active drug, yes or no? Do you think you got placebo?
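As a minimal sketch of the kind of dose-response trend analysis described there, with all dose levels, group sizes, and effect sizes hypothetical:

    import numpy as np
    from scipy import stats

    # Hypothetical change-from-baseline scores in three dose arms (30 per arm).
    dose = np.repeat([10.0, 20.0, 40.0], 30)  # mg; made-up dose levels
    rng = np.random.default_rng(0)
    outcome = 0.05 * dose + rng.normal(0.0, 3.0, dose.size)  # simulated linear dose effect

    # A positive, reliable slope of outcome on dose supports a dose-response
    # relationship, which can serve as evidence from an adequate and
    # well-controlled study even without a placebo arm.
    res = stats.linregress(dose, outcome)
    print(f"slope = {res.slope:.3f} per mg, p = {res.pvalue:.4f}")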
And then one of the things we're starting to ask for now, in addition to that, is an assessment at the end of whether folks thought they were on active drug or not -- not just from the patient but also from the raters. Because a lot of times the raters can figure out what the person was on, too, and that could introduce some bias.
Now we're starting to think about asking for a pre-dose expectancy questionnaire of some kind. And so even if we can't necessarily control for the unblinding issues and the expectancy and everything, at least we can have more data to assess the impact on the study and use those measures as, you know, covariates in the analyses. But yeah, we don't have the right answer yet. We are learning as we go, and we are learning very rapidly.
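One minimal sketch of that covariate-adjustment idea on a hypothetical simulated trial; all variable names and effect sizes are invented for illustration:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 120
    arm = rng.integers(0, 2, n)              # 0 = placebo, 1 = active drug
    expectancy = rng.normal(5.0, 2.0, n)     # pre-dose expectancy rating
    # End-of-study guess about assignment, partially unblinded in this simulation.
    guess = (arm + rng.normal(0.0, 1.0, n) > 0.5).astype(int)
    change = 2.0 * arm + 0.8 * expectancy + rng.normal(0.0, 3.0, n)

    df = pd.DataFrame(dict(arm=arm, expectancy=expectancy, guess=guess, change=change))
    # Estimate the treatment effect with pre-dose expectancy and assignment
    # guesses entered as covariates, as described above.
    fit = smf.ols("change ~ arm + expectancy + guess", data=df).fit()
    print(fit.summary().tables[1])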
CAROLYN RODRIGUEZ: That may be a plug for NIMH to do like another -- this placebo panel is amazing. We could keep going. I see we have nine minutes left. I'm going to pass it back to Dr. Talkovsky.
And but I know Dr. Lisanby and Dr. Wager have their hands up so I'll pass it back to Alex.
ALEXANDER TALKOVSKY: Thank you. Because we're short on time, with apologies, Dr. Lisanby and Dr. Wager, there is a question I want to address from the Q&A box that I saw a couple of our panelists already addressed in text but seems worth bringing up here as a group.
Are we confident that the placebo effect and the specific effect are additive and not interactive?
LAUREN ATLAS: So I'll just -- can I -- oh, sorry.
CAROLYN RODRIGUEZ: Dr. Atlas, yes, that was quick. You won the buzzer.
ALEXANDER TALKOVSKY: Yes, start us off.
LAUREN ATLAS: I had already responded and was putting something in the chat kind of addressing the dose in the same context.
So basically one approach for testing additivity is to use the balanced placebo design, where people receive drug or control, and that is crossed with instructions about drug administration. So people receive the drug under open administration; they receive placebo while being told it's the drug; and they also receive the drug when they believe they are not getting treatment, which is hidden administration.
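A minimal analysis sketch for that 2x2 design, with simulated data standing in for a real trial (cell sizes and effect sizes are hypothetical): additivity predicts a negligible drug-by-instruction interaction, while a reliable interaction term indicates the effects combine non-additively.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n_cell = 40
    drug = np.repeat([0, 0, 1, 1], n_cell)   # received placebo vs. active drug
    told = np.repeat([0, 1, 0, 1], n_cell)   # told "placebo" vs. told "drug"
    # Simulated outcome with a built-in synergistic drug-by-instruction term.
    relief = 1.5 * drug + 1.0 * told + 1.2 * drug * told + rng.normal(0, 2, 4 * n_cell)

    df = pd.DataFrame(dict(drug=drug, told=told, relief=relief))
    fit = smf.ols("relief ~ drug * told", data=df).fit()  # drug:told is the interaction
    print(fit.params)
    print(fit.pvalues)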
And this has been tested with nicotine, with caffeine. We've done it in the context of remifentanil. There have been a couple of other trials with different analgesics. It was really developed in the context of studies of alcohol.
We found, for instance, that depending on the endpoint, we have different conclusions about additivity. So when it came to pain, we found additive effects on pain. But we found pure drug effects on neurologic pain signature responses during remifentanil regardless of whether people knew they were receiving the drug or not. We found interactions when we looked at effects on intention.
And other groups, Christian’s group, has found interactions when they did the same exact trial but used lidocaine. And then furthermore, this is what I think we were just talking about in the context of doses. If people have unblinding at higher doses then there is going to be less of an effect of the context surrounding it. So expectations could grow with higher drug effects.
So I think that the question of additivity versus interaction really may depend on the dose, the specific drug, and the specific endpoint. I don't think we can draw a general conclusion.
And so even though balanced placebo designs do require a level of deception, I think there is really an urgent need to understand how expectations combine with drugs to influence outcomes.
So yeah, I'm really glad somebody asked that question.
CAROLYN RODRIGUEZ: Thank you, Dr. Atlas. I just want to acknowledge Dr. Cristina Cusin, who is the other co-chair for the panel. She's on, and I want to be mindful of the time and make sure that she and Dr. Wager have the final words or thoughts, or, if you want, give the panelists your thoughts.
But we wanted to just pass it back to you so you have plenty of time to say any of the things that you wanted to say to wrap things up.
CRISTINA CUSIN: I will leave it to Tor if he has any concluding remarks. My job will be to summarize the wonderful presentations from today and do a brief overview of the meeting tomorrow. It was amazing.
TOR WAGER: Since we have a few minutes left, I would like to go back to what Holly was going to say. We have about five minutes. I'd love to use that time to continue that conversation.
SARAH “HOLLY” LISANBY: I'm assuming that you're referring to the psychedelic question. I agree there is no perfect answer to that and it's very complicated. And there are different views on how to address it.
One of my concerns is therapist unblinding and the potential impact of therapist unblinding on the therapy that is being administered. Because, as we've heard, it's very likely that the patient receiving a psychedelic intervention will be unblinded, and so might the therapist, because they know what a patient going through psychedelic-assisted therapy typically experiences.
And one thought I have about that is that we could measure the therapy: record it, quantify adherence to the manual, and at least document what is going on in the therapy interaction. That would give you some data that might help you interpret and better understand whether therapist unblinding is impacting the psychosocial aspects of the intervention. Because we've heard from the field that the setting and context of the use of the psychedelic are an important part, so let's measure that, too.
TOR WAGER: It's really interesting. I want to note that Boris Heifets has put something in the chat that is a different take.
There might be more things to discuss about whether it's possible to blind these things in some ways and some diversity of opinions there. But you can see the chat comment and we can think about that.
I have one other question about that. I understand the unblinding problem, and that seems to be something we're all really concerned about. But what about what you might call a sensitivity-analysis type of design? If you can independently manipulate expectations or context -- and maybe some of these other kinds of drug manipulations that induce another kind of experience, one that is not from the target drug -- then you can see whether the outcomes are sensitive to those things or not.
So for some outcomes, they might -- it might not matter what you think or feel or whether you had a, you know, crazy experience or not. And if it doesn't, then that is ignorable, right? So you can manipulate that independently. You don't have to blind it out of your, you know, main manipulation. Or it might turn out to be that yes, that outcome is very sensitive to those kinds of manipulations. So I was wondering what you think about this kind of design.
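One minimal way to operationalize that sensitivity check, with purely hypothetical numbers: randomize an independent expectancy manipulation, then quantify how strongly the outcome responds to it; a negligible effect size suggests the factor is ignorable for that outcome.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    # Hypothetical outcome under two independently randomized instruction arms.
    neutral = rng.normal(10.0, 3.0, 60)   # neutral instructions
    boosted = rng.normal(10.4, 3.0, 60)   # expectancy-enhancing instructions

    t, p = stats.ttest_ind(boosted, neutral)
    # Cohen's d indexes how sensitive this outcome is to the manipulation.
    pooled_sd = np.sqrt((neutral.var(ddof=1) + boosted.var(ddof=1)) / 2)
    d = (boosted.mean() - neutral.mean()) / pooled_sd
    print(f"t = {t:.2f}, p = {p:.3f}, d = {d:.2f}")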
TIFFANY FARCHIONE: I'm not quite sure that I followed that entirely.
TOR WAGER: Yeah, it's really like so you have one that is the psychedelic drug and you don't unblind it. But then you do an independent manipulation to try to manipulate the non-specific factors. If it's, you know, having a, you know, sort of unique experience or having a -- yeah, or just treatment expectations.
TIFFANY FARCHIONE: I guess that's the piece I'm not quite understanding because I'm not sure what you would be manipulating and how you would accomplish that.
TOR WAGER: In the simplest case, the expectation piece is simpler, because you can induce expectations in other ways as well, right? By giving people suggestions that it's going to really impact them. For example, in a design that we've used, you tell everyone: this drug is going to give you these sort of strange experiences. But for one group you say: if it gives you these experiences, that means it's not working for you, that's bad. For another group you say: this is a sign that it's working.
So you take the subjective symptoms and give people different instructions that those are going to be either helpful or harmful and see if that matters.
TIFFANY FARCHIONE: Yeah, I mean I think if you are giving different people different instructions now you are introducing a different source of potential variability so that kind of makes me a little bit nervous.
I guess what I would say is that if somebody had, you know, some sort of creative problem solving approach to dealing with this, I'd love to hear about it. I would love to see a proposal and a protocol. I would say it's probably best to do in an exploratory proof of concept way first before trying to implement a bunch of fancy bells and whistles in a pivotal study that would try to support the actual approval of a product.
But again, because we're learning as we go, we do tend to be pretty open to different design ideas here and different strategies. You know, as long as people are being monitored appropriately because that piece we don't really budge on.
CAROLYN RODRIGUEZ: I see we're at time. Maybe give Dr. Lisanby the last word. Just some food for thought: it would be nice to have a toolkit to help clinical trialists with considerations about how to minimize placebo effects. That's a wish-list item.
SARAH “HOLLY” LISANBY: Yeah, and I just wanted to add to that last question that this is part of why we're sponsoring this workshop. We want to hear from you what are the gaps in the field, what research needs to be done.
Because we are interested in developing safe and effective interventions, be they psychosocial, drug, device or some combination.
And the research studies that we support use placebos or other forms of control. We're interested in hearing from you where the research gaps are. What sorts of manipulations -- like, Tor, you were talking about manipulating expectation -- and how to do that. All of these are really interesting research topics. Whether that is in the design of a pivotal trial or not, it doesn't necessarily need to be that.
We're interested in mapping that gap space so we can figure out how to be most helpful to the field.
TOR WAGER: That's a great last word. We still have tomorrow to solve it all. Hope you all join us tomorrow. Looking forward to it. Thank you.
(Adjourned)