Introduction
In healthcare, decision-making must always have the patient and the public at the centre. In psychiatry, there is little consensus regarding optimal ways to measure outcomes, but prioritising patient relevance is crucial for enhancing the evidence base of psychiatric interventions and, ultimately, improving the mental health outcomes of psychiatric patients.1 This paper examines critical challenges in selecting and measuring patient-important outcomes in psychiatric trials across diverse age groups, including children, adolescents, adults and older adults. We describe four general problems concerning outcome selection and measurement in psychiatric trials and propose solutions to these problems. Throughout the article, we use examples from research on major depressive disorder, schizophrenia and autism spectrum disorder, although we hypothesise that the methodological issues apply to most other psychiatric disorders.
Problem 1: excessive reliance on symptom scales
Psychiatric symptoms are unquestionably important to psychiatric patients. Still, the way they are measured may not reflect this importance, and the suppression of symptoms may also come at the cost of other outcomes, such as general functioning. Many different symptom scales are used across psychiatric trials to assess symptom severity. For example, we previously evaluated 182 trials of psychological interventions published in major psychiatry journals and found that 77% of the primary outcomes were symptom scales.2
It is essential to establish a quantifiable minimal important difference (MID) when using symptom scales as outcome measures. The MID is the smallest change in score that patients perceive as beneficial and is therefore a patient-centred concept.3 4 However, it remains unclear whether any method currently suggested can produce an estimate that corresponds to real-world benefits.
The Hamilton Depression Rating Scale (HDRS) is the most commonly used depression symptom scale.5 However, the MID on the HDRS has been heavily debated and remains unclear.5 6 Further, the psychometric properties of the HDRS have also been questioned,5 including whether the scale should be treated as an ordinal rather than an interval scale.6
The Positive and Negative Syndrome Scale (PANSS) is considered the gold standard for assessing symptom severity in schizophrenia in research and clinical settings.7 A wide range of MIDs for the PANSS (often expressed as percentage improvement from baseline) has been suggested, ranging from 4.3% to 31.5%.8 9 Hence, the MID for the PANSS also remains unclear. The scale has further been criticised for its inability to differentiate negative symptoms from depression and extrapyramidal adverse effects, and for the ambiguity of its cognitive items.7
The Autism Diagnostic Observation Schedule (ADOS) is a commonly used instrument assessing social interaction, communication, play and imaginative use of materials in individuals suspected of having autism spectrum disorder.10 The MID of the ADOS is also unknown, and its diagnostic accuracy has been questioned.10 A numerical ADOS improvement will be demonstrated, for example, if the individual can sustain eye contact for extended durations. However, it is questionable whether every individual's quality of life correlates with the degree of eye contact; sustaining eye contact is not necessarily beneficial for the individual (perhaps the individual prefers not to). The three examples above are not unique: the importance of symptom scale scores to patients, and how such scores should be interpreted, are most often uncertain.
The two most commonly used methods to determine MIDs in clinical trials are anchor-based methods and distributional-based methods. Anchor-based methods relate the change in a score to another scale, which is used as an ‘anchor’. A systematic step-by-step approach may be used to select an optimal, anchor-based MID from various MID estimates of a given scale.3 However, if anchor-based methods are employed, the MIDs are typically derived either from extrapolations of within-patient (intra-individual) changes to between-patient (group) differences (the most common approach) or from estimations based on group differences among patients who share the same diagnosis but exhibit clinically distinct levels of impairment or disability.4 Both approaches fundamentally involve extrapolation and do not directly estimate MIDs that correspond to the differences between two groups in a clinical trial. Further, the MID associated with the anchor itself is often uncertain, thereby calling into question the validity of the MID quantification for the symptom scale under consideration.
Distribution-based methods rely on the sample's statistical characteristics, for example, the standard deviation (SD), with essentially no connection to the patient's symptoms, well-being or quality of life (see the illustrative sketch below). We are unaware of any symptom scale with a well-established, unquestionable MID. Hence, more research is still needed on what patients consider the smallest worthwhile effect.
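To make the contrast between the two approaches concrete, the sketch below uses simulated data; all numbers, thresholds and variable names are our own illustrative assumptions, not estimates from the cited literature. The anchor-based estimate is the mean within-patient change among patients who rate themselves 'minimally improved' on a global rating-of-change anchor; the distribution-based estimate is simply half the baseline SD and involves no patient judgement at all.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Simulated change scores on a depression-type symptom scale (negative = improvement)
# and a 7-point global rating-of-change anchor (1 = very much worse ... 7 = very much improved).
anchor = rng.integers(1, 8, size=n)
change = -3.0 * (anchor - 4) + rng.normal(0, 4, size=n)

# Anchor-based MID: mean within-patient change among those rating themselves
# 'minimally improved' (anchor == 5).
mid_anchor = change[anchor == 5].mean()

# Distribution-based MID: half the SD of baseline scores (one commonly cited criterion).
baseline = rng.normal(loc=22, scale=6, size=n)
mid_distribution = 0.5 * baseline.std(ddof=1)

print(f"Anchor-based MID (within-patient change): {mid_anchor:.1f} points")
print(f"Distribution-based MID (0.5 x baseline SD): {mid_distribution:.1f} points")
```

Note that the anchor-based figure is an intra-individual change, so applying it to the mean difference between two randomised groups is an extrapolation, while the distribution-based figure would be produced for any sample regardless of whether patients would notice a change of that size.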
We have previously described additional problems when symptom scales are used as outcomes, including problems with lack of blinding and large proportions of missing outcome data.1 2 11
Problem 2: short follow-up timeframes
A recent network meta-analysis of 21 commonly used antidepressants for major depressive disorder12 focused on outcome data as close to 8 weeks of follow-up as possible (within 4 to 12 weeks). Longer-term data (16 to 36 weeks) were only available in 28 of the 522 included trials.12 In two systematic reviews with meta-analysis of selective serotonin reuptake inhibitors13 and tricyclic antidepressants14 for major depressive disorder, data available for analysis beyond 12 weeks post-randomisation were notably scarce. However, common clinical practice is to prescribe antidepressants for much longer periods. For example, half of patients on antidepressants in England and 70% of patients in the USA have used them for more than 2 years.15
Similar trends can be observed in schizophrenia trials. A recent systematic review focusing on the long-term effects of antipsychotics for schizophrenia restricted inclusion to trials longer than 6 months in duration and identified only 45 trials, with a mean follow-up duration of 46 weeks.16 These durations appear insufficient, considering that clinical guidelines recommend at least 1 to 2 years of antipsychotic treatment after symptom remission of an acute episode and warn about the risks of relapse associated with treatment discontinuation.17
Research on autism spectrum disorder also demonstrates the problem of short follow-up timeframes. In a recent systematic review of parent-mediated interventions for children with autism, the mean follow-up duration was only 24 weeks, and only 1 of the 30 included trials (randomising 48 children) assessed outcomes 2 years after randomisation. Since autism spectrum disorder is considered a chronic disorder, substantially longer observation periods are warranted.
Psychiatric trials most often focus on short-term effects, which is problematic due to the long-term and often chronic use of psychiatric drugs in many psychiatric conditions.
Problem 3: inadequate reporting of adverse effects
Beneficial and harmful (or adverse) effects of any intervention should be evaluated with the same degree of attention. In most trials, efficacy is carefully and systematically examined, while adverse effects are assessed through spontaneous reports or proxies (such as dropout rates) that are known to underestimate them. A previous network meta-analysis of antidepressants assessed overall dropout rates and dropouts due to adverse effects as measures of 'acceptability' and 'tolerability', respectively, whereas serious and non-serious adverse effects and events were not assessed.12 Further, in two systematic reviews of selective serotonin reuptake inhibitors and tricyclic antidepressants for major depressive disorder, data on suicides, suicide attempts and suicidal ideation were very sparse, and the certainty of evidence for these outcomes was very low.13 14
In two large and influential systematic reviews of antipsychotics for schizophrenia, all-cause discontinuation, weight gain, use of antiparkinsonian drugs, serum prolactin level, corrected QT-interval prolongation, akathisia and sedation were assessed as outcomes, but the authors did not assess (or report) suicides, suicide attempts or serious or non-serious adverse effects.16 18 In a recent meta-analysis of parent-mediated interventions for children and adolescents with autism spectrum disorder, only 2 of the 30 included trials assessed adverse events.
Problem 4: heterogeneity in outcome selection across trials
Previous psychiatric trials have used a wide range of outcomes, which makes it difficult to synthesise data in systematic reviews. For example, one scoping review of 32 trials of interventions for the treatment of adolescent major depressive disorder identified 86 unique outcome terms and 118 different outcome measurement instruments.19
In schizophrenia research, six different symptom scales have commonly been used to assess schizophrenia symptoms, including the Present State Examination, the Brief Psychiatric Rating Scale, the Scale for the Assessment of Positive Symptoms, the Scale for the Assessment of Negative Symptoms, the PANSS and the Clinical Global Impression scale.20
Similarly, many symptom scales exist for autism spectrum disorders, for example, the Autism Diagnostic Interview–Revised (ADI-R), ADOS, Childhood Autism Rating Scale (CARS) or CARS–Second Edition (CARS-2).21 Additionally, a given symptom scale can have different types of respondents (eg, treatment providers, independent investigators, parents, teachers or the participants themselves). When heterogeneous symptom scales and respondents are used across trials, data synthesis at the meta-analytical level is threatened, as illustrated in the sketch below.
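When trials report on different scales, review authors typically fall back on standardised mean differences so that the results can be pooled. The sketch below shows that calculation with purely illustrative numbers (the mean differences, SDs and sample sizes are our own assumptions, not data from any cited trial); the pooled effect is then expressed in SD units, which compounds the interpretability problems discussed under Problem 1.

```python
import math

def hedges_g(mean_diff: float, sd1: float, sd2: float, n1: int, n2: int) -> float:
    """Standardised mean difference (Hedges' g) for a single two-arm trial."""
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = mean_diff / sd_pooled
    return d * (1 - 3 / (4 * (n1 + n2) - 9))  # small-sample (Hedges) correction

# Two hypothetical trials measuring the same construct on different scales:
print(hedges_g(mean_diff=-3.0, sd1=7.0, sd2=7.5, n1=60, n2=58))      # eg, an HDRS-based trial
print(hedges_g(mean_diff=-8.0, sd1=18.0, sd2=19.0, n1=120, n2=115))  # eg, a PANSS-based trial
```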
Future directions and solutions
Although it is relevant to assess the level of symptoms in psychiatric trials, the question is whether these symptom scales should be primary outcomes.1 2 To overcome the problem of excessive reliance on symptom scales in psychiatric trials, hard outcomes can be used as alternatives. Hard outcomes are often binary, patient-important outcomes that are conclusive regarding the progression of the condition and that reflect a patient's feelings, functioning or survival (eg, suicide or suicide attempts, hospitalisations, employment status, convictions, receipt of social assistance due to unemployment or disability, completing school).22 Defining hard outcomes as primary outcomes may solve some of the problems mentioned above: their patient importance is unquestionable, the results are objective and simple to interpret, and the risk of missing outcome data is reduced (using registries often leads to 100% follow-up data).1 2 However, using rarer but clinically important hard outcomes such as suicides, suicide attempts and hospitalisation requires larger sample sizes,1 as illustrated in the sketch below. If such sample sizes are not feasible, defining composite outcomes could reduce the required sample size, or alternative statistical methods might be considered.23
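To indicate the order of magnitude involved, the sketch below applies the standard normal-approximation sample size formula for comparing two proportions; the event rates are assumptions chosen purely for illustration, not estimates from the literature. Halving a 20% event rate can be detected with roughly 200 participants per group at conventional power, whereas halving a 2% event rate requires well over 2000 per group.

```python
from math import ceil, sqrt
from scipy.stats import norm

def n_per_group(p_control: float, p_treatment: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided comparison of two proportions."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    p_bar = (p_control + p_treatment) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_treatment * (1 - p_treatment))) ** 2
    return ceil(numerator / (p_control - p_treatment) ** 2)

# Illustrative (assumed) event rates:
print(n_per_group(0.20, 0.10))  # a common binary outcome, eg rehospitalisation
print(n_per_group(0.02, 0.01))  # a rare hard outcome, eg suicide attempt
```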
Ideally, all trials would incorporate long-term follow-up (years) into their trial designs to assess the long-term benefits and harms following treatment regardless of the treatment length. Trialists must carefully consider follow-up timeframes that are most important to patients for each outcome, and systematic review authors should include in their protocols relevant follow-up periods, regardless of the available body of evidence, to elucidate evidence gaps. However, there are presumed barriers to long-term trials, including increased expenses and risk of missing outcome data if registry data are not used.
The medical principle 'first do no harm' (or primum non nocere) reminds clinicians to first consider the possible harm that any intervention might do. An objective way of assessing serious adverse events in trials and systematic reviews could be to rely on the definition in the International Conference on Harmonisation Good Clinical Practice (ICH-GCP) guidelines.24 Furthermore, by adhering to the CONSORT extension for reporting harms in randomised clinical trials, trialists could significantly improve the evidence for balancing benefits and harms in psychiatry.25 Importantly, adverse effects in trials should not be captured through spontaneous reports alone but should be systematically assessed in all trial participants.
Core outcome sets (COS) can be developed to establish a minimum list of patient-important outcomes for a specific patient group. The Core Outcome Measures in Effectiveness Trials (COMET) initiative promotes creating and implementing core outcome sets.26 On 15 February 2024, we searched the COMET initiative database for published COS projects within the category ‘mental health’. The search gave 39 hits and showed that only a few finalised COSs are available for common psychiatric disorders (see online supplemental file 1). Future COSs should take into account the problems we have raised in this paper.
The methodological quality of psychiatric trials remains limited by excessive reliance on symptom scales, brief follow-up timeframes, inadequate reporting of adverse effects and heterogeneous outcome selection. Addressing these factors in psychiatric trials is essential if we are to improve the mental health of psychiatric patients.
Ethics statements
Patient consent for publication
Ethics approval
Not applicable.
Footnotes
X @joannamoncrieff, @markhoro, @HengartnerMP
Contributors SJ and JCJ wrote up the first manuscript draft. All authors significantly contributed to the writing and commented on the manuscript. All authors read and approved the final version of the manuscript.
Funding This project was supported by the Independent Research Fund Denmark (grant number 3166-00024B). The funding source had no role in the writing of the manuscript or the decision to submit it for publication.
Competing interests JM and MH are collaborating investigators on the NHMRC- and MRFF-funded RELEASE and RELEASE+ trials in Australia investigating hyperbolic tapering of antidepressants. MH is a co-founder of Outro Health, a digital clinic in North America that aims to help people who wish to discontinue antidepressant medication that is no longer needed. MH has received honoraria for lectures on deprescribing from NHS Trusts, Washington University and the University of Arizona. JM is a co-investigator on a National Institute for Health Research (NIHR) funded study exploring methods of antidepressant discontinuation (REDUCE) and the Chief Investigator on the RADAR trial of antipsychotic reduction and discontinuation, funded by the NIHR. She collects royalties from three books on psychiatric drugs. ZS is a board member of the Canadian mood and anxiety disorders treatment guidelines. LM receives consulting fees from AstraZeneca, Bayer and WHO. All other authors declare that they have no competing interests.
Patient and public involvement statement Patients or the public were not involved in the design, conduct, reporting or dissemination plans of this paper.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.