Durable response rate as an endpoint in cancer immunotherapy: insights from oncolytic virus clinical trials

Background Traditional response criteria may be insufficient to characterize full clinical benefits of anticancer immunotherapies. Consequently, endpoints such as durable response rate (DRR; a continuous response [complete or partial objective response] beginning within 12 months of treatment and lasting ≥6 months) have been employed. There has not, however, been validation that DRR correlates with other more traditional endpoints of clinical benefit such as overall survival. Methods We evaluated whether DRR was associated with clinically meaningful measures of benefit (eg, overall survival [OS], quality of life [QoL], or treatment-free interval [TFI]) in a phase 3 clinical trial of an oncolytic virus for melanoma treatment. To evaluate the association between DRR and OS and to mitigate lead time bias, landmark analyses were used. QoL was evaluated using the FACT-BRM questionnaire (comprising the FACT-BRM Physical, Social/Family, Emotional, and Functional well-being domains, the Additional Concerns, Physical and Mental treatment-specific subscales, and the Trial Outcome Index [TOI]). TFI was defined as time from the last study therapy dose to first subsequent therapy dose (including any systemic anticancer therapy for melanoma after study therapy discontinuation). Results Four hundred thirty-six patients were included in the intent-to-treat population. Achieving DR was associated with a statistically significant improvement in OS in a landmark analysis at 9 months (HR = 0.07; P = 0.0003), 12 months (HR = 0.05, P < 0.0001), and 18 months (HR = 0.11; P = 0.0002) that persisted after adjusting for disease stage and line of therapy. Achieving a DR was associated with a longer median TFI (HR = 0.33; P = 0.0007) and a higher TOI improvement rate (58.1% versus 30.0%; P = 0.025). Conclusions Achieving a DR was associated with clinical benefits such as improved OS and QoL and prolonged TFI, thus supporting the usefulness of DR as a meaningful immunotherapy clinical trial endpoint. Trial registration ClinicalTrials.gov identifier, NCT00769704 (https://clinicaltrials.gov/ct2/show/NCT00769704) October 7, 2008 Electronic supplementary material The online version of this article (doi:10.1186/s40425-017-0276-8) contains supplementary material, which is available to authorized users.


Background
A growing body of clinical evidence indicates that traditional response criteria may not be sufficient to completely characterize the full spectrum of activities of current anticancer immunotherapies and/or define the clinical benefit associated with their use. Although a number of immunotherapies have recently been approved based, in part, upon tumor responses evaluated by traditional response criteria (eg, Response Evaluation Criteria in Solid Tumors [RECIST] [1,2] or World Health Organization [WHO] criteria for response [3]), these criteria may not accurately capture the full benefits of immunotherapy-based drugs or regimens. In particular, these measures of response lack a durability dimension, and thus therapeutic benefit may be overestimated. Conversely, because responses may occur after an initial increase in tumor burden or appearance of new tumor lesions characterized as progressive disease by WHO criteria or RECIST [4][5][6][7][8][9][10][11], therapeutic benefit may be underestimated. Consequently, allowing for disease progression before response is critical for evaluation of immunotherapies because initial tumor growth may not always be indicative of treatment failure [4,11].
Assuming response criteria that allow for progression prior to response are used, a conventional analysis is to evaluate duration of response for responders; however, no prospective criteria are designated for interpreting the clinical relevance of this information at the patient level or for treatment effect in a controlled study. To this end, the endpoint of durable response rate (DRR) has been proposed, which includes standard response criteria and a prospective durability dimension. Although DRR has not been validated as a surrogate endpoint for clinical benefit, anticancer agents have been approved based on a demonstrated ability to achieve retrospectively defined durable responses (DRs) [12][13][14]. With the allowance for progression prior to response, we hypothesized that DRR is an appropriate measure of the ability of immunotherapy anticancer agents to mediate a clinically meaningful, sustained reduction in tumor burden that may be associated with measures of clinical benefit such as prolonged overall survival (OS), improved quality of life (QoL), and freedom from successive anticancer therapies. This is especially important for immunotherapy-based regimens since the kinetics of response may be different when compared to cytotoxic chemotherapy or targeted therapy that mediate antitumor activity directly in tumor cells.
For this analysis, we used prospectively collected clinical data from the OPTiM trial, a randomized phase 3 study that compared intralesional talimogene laherparepvec, a genetically modified herpes simplex virus (HSV) type-1, designed to selectively replicate in tumors and produce human granulocyte macrophage-colony stimulating factor (GM-CSF) to subcutaneous GM-CSF, to evaluate whether DRR is associated with measures of clinical benefit, such as OS and QoL or treatment-free interval (TFI) after therapy. Talimogene laherparepvec is thought to mediate antitumor activity, at least in part, through activation of systemic immune responses and initial results from previous clinical trials demonstrated delayed kinetics of response similar to observations from T-cell checkpoint inhibitor trials [8,15]. DRR, defined as a continuous response (complete response [CR]/partial response [PR]) beginning in the first 12 months of treatment and lasting 6 months or longer, was the primary endpoint in OPTiM. Although results from the OPTiM study have been reported previously [15], this analysis was undertaken to further validate the DRR endpoint by determining if DR is associated with other measures of clinical benefit. These data have implications for incorporating DRR as a regulatory endpoint in tumor immunotherapy clinical trial designs.

Study design
Study design and patient population for OPTiM have been previously described [15]. Briefly, eligible patients with histologically confirmed, surgically unresectable, stage IIIB/IIIC/IV melanoma that could undergo direct/ ultrasound-guided injection of dermal, subcutaneous, or lymph node melanoma metastases were included. Patients were randomized 2:1 to receive intralesional talimogene laherparepvec or subcutaneous GM-CSF. The primary endpoint was DRR, defined as the rate of objective response (CR/PR) by modified WHO criteria lasting ≥6 months continuously and beginning within the first 12 months of initiating treatment where disease progression was allowed prior to onset of response. Key secondary endpoints included OS, best overall response, onset and duration of response, and time to treatment failure. Exploratory objectives included evaluation of patients' QoL and analysis of the influence of achieving a response on OS.

Analysis population
Data cutoff for DR and QoL analyses was December 2012, and data cutoff for OS and subsequent therapy analyses was February 2013. Data cutoff for duration of response per investigator was August 2014 (final analysis). A landmark approach was used to limit lead-time bias because the minimum time required to achieve a DR was 9 months (ie, 3 months to first tumor assessment plus minimum 6 months of response duration). Analyses excluded patients with a follow-up duration less than the landmark time.

Assessments
Assessment of durable response in OPTiM has been described previously [15]. Briefly, visible/palpable lesions were assessed by clinical evaluation performed on day 1 of each cycle; computed tomography assessments of response were performed at baseline and every 12 weeks thereafter. Patients who had a best response of CR/PR per modified WHO criteria based on investigator assessment or were on therapy ≥9 months were evaluated for response by a blinded endpoint assessment committee (EAC) [15]. Patients were considered in response if there was ≥50% reduction in total tumor burden of all measurable lesions compared with baseline or, if any new tumors/lesions had appeared, compared with when they were first documented. Disease progression was considered a >25% increase in total tumor burden or appearance of new lesions, and clinically relevant disease progression was defined as disease progression associated with a decline in performance status and/or required alternative therapy. Treatment continued for 24 weeks, irrespective of progression. After 24 weeks, treatment continued until clinically relevant disease progression, intolerability, withdrawal of consent, complete remission, lack of response by 12 months, or disappearance of all injectable lesions (talimogene laherparepvec only) [15]. After 12 months, patients with stable/responding disease could continue treatment for 6 additional months.
Quality of life was assessed using a validated Functional Assessment of Cancer Therapy-Biologic Response Modifier (FACT-BRM) questionnaire, which comprises four domains (Physical, Social/Family, Emotional, and Functional well-being) and two treatmentspecific subscales (Additional Concerns: Physical and Mental) [16]. The Trial Outcome Index (TOI) is a 27item sum of the scores of the Physical and Functional well-being domains and the two treatment-specific subscales of FACT-BRM that has been shown to be a helpful single scale instrument for assessment of QoL in clinical trials [17,18]. Changes in QoL were evaluated using the TOI as well as the FACT-BRM Total, FACT-BRM subdomains, and treatment-specific subscales. The questionnaire was completed before study procedures were conducted on day 1 of each treatment cycle (5 weeks for the first cycle and 4 weeks for subsequent treatment cycles) until the end of treatment and at the end-of-treatment visit (30 days after last treatment). In this analysis, a clinically meaningful improvement in QoL was defined as a ≥5-point increase from baseline lasting ≥28 days for the TOI and FACT-BRM Total and a ≥2-point increase for all other domains and Additional Concerns [19]. A minimum clinically important difference for TOI of 5 to 7 points has been established for group changes.
Melanoma treatments administered after discontinuation of talimogene laherparepvec/GM-CSF were recorded during the survival follow-up. Subsequent therapy was defined as any systemic anticancer therapy for melanoma after discontinuation of study therapy and was categorized as chemotherapy/targeted agents or immunotherapy.

Statistical analysis
The primary analysis was planned when no further patients had the possibility of meeting the criteria for durable response; the final OS analysis was planned for after all patients had reached 18 months from their first dose of treatment. To mitigate potential for lead-time bias (ie, patients who live long enough to have a DR tend to have longer survival, irrespective of whether durable response prolongs survival), OS for patients who had achieved a DR and were alive was compared with OS for patients who had not achieved a DR and were alive in a landmark analysis at 9, 12, and 18 months after randomization [20]. In this analysis, OS was calculated from a landmark time to death; patients who died or who were censored before the landmark time were excluded from the analysis. Because disease stage and line of therapy were found to be associated with DR and OS in multivariate analyses and because the talimogene laherparepvec treatment effect on DRR and OS was pronounced in patients with stage IIIB/IIIC/IVM1a melanoma and treatment-naive patients in exploratory subgroup analyses of OPTiM, Cox model hazard ratio estimates and log-rank tests in landmark analysis were stratified by these factors, accounting for observed baseline imbalances (ie, disease stages IIIB/IIIC/IVM1a versus IVM1b/IVM1c, first-line versus second-line or later therapy).
To further evaluate any potential association between achievement of DR and OS, the time when DR was achieved was also evaluated in a Cox proportional hazards model as a time-dependent covariate for all randomized patients. Treatment-free interval (defined as time from the last dose of study therapy to first dose of subsequent therapy or censoring in the absence thereof at end of follow-up) was estimated using the Kaplan-Meier method. As with OS, the association between TFI and achieving a DR was evaluated in a landmark analysis using a Cox proportional hazards model and log-rank test. The TOI analysis set included patients with a baseline score (defined as the maximum possible score-5point absolute increase criterion for improvement), ≥1 post-baseline non-missing score, and tumor assessments for ≥9 months of follow-up. Associations between FACT-BRM improvement and achievement of DR were evaluated using an odds ratio stratified by disease stage and line of therapy in a landmark analysis at 9 months after randomization. Odds ratios for those who achieved a DR versus those who did not and P values for TOI improvement were obtained by fitting an exact logistic regression model using DR status as a predictor and TOI improvement as response. Statistical significance was evaluated at a nominal two-sided 0.05 significance level without multiplicity assessment.

Patients
In total, 436 patients were included in the intent-to-treat population (295 in the talimogene laherparepvec arm and 141 in the GM-CSF arm). Baseline demographic/clinical characteristics for the study population are summarized in Table 1 and have been reported previously by arm [15]. At the primary analysis, 86 patients in the intent-to-treat population achieved an OR per EAC and, of these, the OR duration did not qualify for a DR for 35 patients but it did for 51 (talimogene laherparepvec, n = 48; GM-CSF, n = 3), 23 of whom had progression before response (all in the talimogene laherparepvec arm). Among the 86 patients with an OR, a similar proportion of DR versus non-DR patients had early stage disease (82% versus 74%) and received first-line therapy (65% versus 63%). At the final analysis, 59 patients achieved a DR per investigator (talimogene laherparepvec, n = 57; GM-CSF, n = 2; Additional file 1: Figure S1), indicating that responses continued to evolve over time.

Durable response and treatment-free interval
Of the 436 patients included in the intent-to-treat population, 129 received study therapy and had tumor assessments at ≥9 months of follow-up and were included in the TFI analysis; of these, 50 patients achieved a DR per the EAC (talimogene laherparepvec, n = 47; GM-CSF, n = 3) and 79 patients did not (talimogene laherparepvec, n = 65; GM-CSF, n = 14). Of the 50 patients with a DR, 11 (22%) were treated with  Kaplan-Meier plots of freedom from subsequent systemic anti-melanoma therapy in patients who achieved a DR versus patients who did not achieve a DR with tumor assessments at ≥9 months, including hazard ratio (HR) and log-rank P value, are shown in Fig. 1. DR was associated with a significantly longer median TFI (HR = 0.33; 95% CI, 0.17-0.65; P = 0.0007; Fig. 1d). Achieving a DR was also associated with a 27.5% (95% CI, 10

Quality of life
Of the 129 patients with tumor assessments at ≥9 months of follow-up, 117 had QoL data and 103 had data for the TOI analysis. Overall, achieving a DR was associated with improved scores in the TOI, the FACT-BRM Total, and all FACT-BRM domains and treatmentspecific subscales, except for social well-being (Fig. 2a).
Achievement of a DR was associated with a higher TOI improvement rate. Among patients who achieved a DR, 58.1% (25/43) had a clinically meaningful TOI improvement, compared with 30.0% (18/60) of patients who did not achieve a DR (P = 0.025; Fig. 2b). In a landmark analysis limited to patients with ≥9 months of follow-up for tumor response assessment, the odds ratio for TOI improvement was 2.8 (95% CI, 1.1-7.0; Table 3). A sensitivity analysis evaluated whether the association was retained when TOI improvement required a larger absolute increase from baseline and/or longer improvement duration. Although nominal 0.05 significance was not achieved in all cases, it was achieved for a magnitude up to 9 points or a duration up to 4 cycles. In addition, with one exception, odds ratios were greater than 1 for all magnitude and duration combinations (Additional file 1: Table S1).    Fig. 1 Kaplan-Meier plots of overall survival in patients who achieved a durable response versus patients who did not achieve a durable response prior to landmark times of 9 months (a), 12 months (b), and 18 months from randomization (c), and treatment-free survival in patients who achieved a durable response versus patients who did not achieve a durable response (d). For landmark analyses, OS was calculated from the landmark time (9, 12, or 18 months after randomization) to death. TFI analysis was limited to treated patients with tumor assessments ≥9 months. Unadjusted hazard ratios (HR) and log-rank P values are shown. DR durable response, NE not evaluable, OS overall survival, TFI treatment-free interval

Discussion
Data from this analysis of outcomes in the phase 3 OPTiM study indicate that achievement of a DR in patients with unresectable regionally and distantly advanced melanoma was associated with several independent clinical benefits, providing further support for DR as a useful endpoint in patients with melanoma. Achievement of a DR was associated with improved OS in landmark analyses at 9, 12, and 18 months, lower risk of subsequent systemic therapy use at 36 months, and improved QoL at ≥9 months of follow-up. The risk of death was decreased by approximately 90% for patients with a DR. Similar results were obtained in analyses adjusted for imbalances in baseline demographic/clinical characteristics and in a Cox proportional hazards model that evaluated achievement of DR as a time-dependent covariate.
Furthermore, patients with a DR have an approximately two-thirds reduction in the risk of requiring a new therapy and were approximately 3 times more likely to have a clinically meaningful improvement in QoL. The association with improvement in QoL is particularly notable in the melanoma setting where skin lesions may be regarded as disfiguring by patients. Taken together, this evidence indicates that DR is an efficacy endpoint that is associated with other favorable clinical outcomes.
Since the proposed mechanism of action of talimogene laherparepvec includes release of tumor-associated antigens and cell-and damage-associated molecular pattern molecules, and local expression of GM-CSF, talimogene laherparepvec is likely able to recruit and expand tumor associated antigen-reactive T cells [21]. This process may take time, and lesions may continue to grow before regression. Furthermore, this process may cause a local inflammatory-like response in lesions, resulting in a period of "pseudo-progression" before response [15,22]; thus, alternative measures of clinical response may be needed. These observations of progression before response are consistent with the immune-mediated antitumor activity associated with the talimogene laherparepvec mechanism of action, and have been reported with other immunotherapy agents such as ipilimumab [23]. With the allowance for progression before response, our results demonstrate the utility of DR as an appropriate endpoint for oncolytic virus clinical trials and for trials involving combination regimens of immunotherapeutic agents. An element of DR that may be a key to its clinical value is that it incorporates a time component: responses must be maintained for ≥6 months continuously and begun within the first 12 months of initiating treatment. In contrast, the WHO criteria and RECIST do not make allowance for progression before response or define a meaningful minimum duration of response. In a randomized clinical trial of ipilimumab in melanoma, a significant improvement in OS was seen; however, treatment had no impact on progressionfree survival (PFS) [10]. These results highlight the inherent problems in using PFS using conventional response criteria as a primary endpoint in immunotherapy clinical trials, and the need for identification of alternative endpoints to accurately assess immunotherapy drugs in early phase clinical development where OS may not be a feasible endpoint. Our data support the use of DRR as a flexible and appropriate surrogate of OS and other measures of clinical benefit.
Other attempts to define alternative clinical trial endpoints for immunotherapy agents include modifications of RECIST (such as immune-related response criteria; irRC [4]), which are being used to evaluate tumor responses in clinical trials across a variety of tumor types [24]. A benefit to the use of irRC is that progressive disease prior to treatment response is permitted, consistent with the criteria for DR in the OPTiM study. However, like the WHO criteria and RECIST, irRC does not define a meaningful duration for tumor response. Furthermore, endpoints such as objective response or progression-free survival based on irRC have not yet been validated as correlates of OS.
Unlike survival endpoints, response indicates a direct biological effect of a therapy on the tumor. Patients can survive for prolonged periods without treatment, but tumors rarely recede or disappear without intervention. To be considered a DR, however, the response must be maintained for a period of at least 6 months. This duration component may make achieving a DR a more descriptive endpoint than others such as objective response, which requires only response confirmation at least 4 weeks after the initial assessment of response. Six months was initially selected as a reasonable duration because most non-immunotherapy treatments rarely induce responses beyond 6 months in melanoma [25]. The expanded use of DRR for other cancers allows for the selection of the most appropriate duration based on the natural history of the disease and known impact of established therapeutic regimens. Thus, by altering the duration of response, DRR can be used in a more flexible manner for other cancers that may exhibit different natural history and/or have other standard treatments available that extend survival for variable times. The inclusion of the initiation at any time within 12 months of starting treatment allows capturing pseudoprogression, which has been reported in some immunotherapy trials, and permits continued treatment beyond clinically asymptomatic progression, as has been reported for irRC proposed for tumor immunotherapy studies [4]. It is also possible that a rate of durable stable disease might represent an informative endpoint; however, further evaluation would be required to validate such an endpoint.
This study had certain limitations. First, these analyses were exploratory and should therefore be interpreted with caution. Second, although we identified associations between DR and clinical outcomes, causative relationships could not be evaluated. For example, although our analysis indicated an association between DR and OS, it is possible that factors other than achievement of DR (eg, imbalances in disease stage and line of therapy, or subsequent therapy) might have influenced the observed outcome. Although endeavors were made to mitigate the potential for confounding by such factors, it is difficult to determine how successful they might have been. Additionally, the selection of 6 months as the minimum duration for achievement of a durable response was based on responses to cytotoxic chemotherapy agents; this time point was, to an extent, arbitrary. In the OPTiM study, some patients had DRs that lasted substantially longer than 6 months. It is unknown whether the associations found in this analysis would have been observed had the distribution of DR duration been closer to the 6-month minimum. Finally, it is possible that the associations seen were driven by the preferential occurrence of complete response among patients with DR (of any duration).

Conclusions
Results from this analysis indicate that the achievement of a DR in patients with unresectable regionally and distantly advanced melanoma was associated with improved clinical benefit and QoL. Although causal relationships cannot be determined via these analyses, these findings support the utility of DR as a meaningful trial endpoint when assessing efficacy of immunotherapies for solid tumors. Further consideration of durable response rate as a clinical endpoint in cancer immunotherapy trials is also warranted and may allow for improved selection of promising agents for later phase clinical development.

Additional file
Additional file 1: Table S1. TOI Improvement Rate Association with Achievement of a DR (per EAC). Figure S1. Talimogene laherparepvec Duration of Response (per Investigator). As of the final analysis 93 of 295 patients (31.5%) randomized to talimogene laherparepvec had an overall response per investigator assessment (complete response, n = 50; partial response, n = 43) and 57 patients (