Predictive analytics is a useful tool in risk-based monitoring and in risk-based study management overall. It increases the proportion of correct decisions because decisions become more data-driven. It also helps a central CRA or study manager understand trends in the clinical data and spot critical developments earlier.
Predictive analytics involves extracting information from existing data sets to identify trends and patterns, which are then used to predict future outcomes and changes in trends. It generates predictive scores (probabilities) for each individual organizational element in order to influence organizational processes. In risk assessment and management, this means identifying subjects at risk. Currently, EarlyBird® features predictive analytics of clinical trial metrics, which allows forecasting the trend of a metric and the interval in which its future values are likely to fall.
Figure 1. Comparison of the number of visits per site, including a prediction, for a phase III oncology trial.
EarlyBird® Predictive Analytics
Metric values for every day since the start of the trial (let us call them “data”) are decomposed into the sum of three components: a seasonal component, which is periodic; a trend component; and a remainder component, which captures outliers. The longest possible period is used for the seasonal component: for example, if the observations span 2 years or more, the period is 1 year; if they span 2 months or more, the period is 1 month. The algorithm applied is STL (Seasonal and Trend decomposition using Loess).
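The period rule above can be sketched in a few lines of Python. This is only an illustration, not EarlyBird® code: the function name `choose_seasonal_period`, the day counts of 365 and 30, and returning `None` when less than 2 months of data are available are all assumptions made for the example.

```python
from typing import Optional

# Assumed day counts; the text only says "1 year" and "1 month".
DAYS_PER_YEAR = 365
DAYS_PER_MONTH = 30


def choose_seasonal_period(n_days: int) -> Optional[int]:
    """Return the longest seasonal period (in days) that the data covers at least twice."""
    for period in (DAYS_PER_YEAR, DAYS_PER_MONTH):  # longest candidate first
        if n_days >= 2 * period:
            return period
    return None  # assumed behaviour: too little data for either period


print(choose_seasonal_period(800))  # 365
print(choose_seasonal_period(90))   # 30
print(choose_seasonal_period(45))   # None
```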
The decomposition first calculates the seasonal component by a weighted averaging of the data within each period, followed by averaging across periods.
Next, the seasonal component is subtracted from the data. The resulting difference is the sum of the trend and remainder components. A smooth approximation of this difference is then calculated, and this approximation is the trend component. The approximation excludes outliers, which are not useful for forecasting.
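The order of these steps can be illustrated with a deliberately simplified additive decomposition. It is not STL itself: plain averages stand in for the loess smoothing that STL uses, and the function name `decompose` and the moving-average `window` are assumptions chosen for the sketch.

```python
from statistics import mean


def decompose(data, period, window=5):
    """Split data into seasonal + trend + remainder (simplified stand-in for STL)."""
    n = len(data)

    # Seasonal: average all values that share the same position within the
    # period, then centre so the seasonal values average to zero over a period.
    by_position = [mean(data[i::period]) for i in range(period)]
    centre = mean(by_position)
    seasonal = [by_position[i % period] - centre for i in range(n)]

    # Subtract the seasonal component; what is left is trend + remainder.
    deseasonalized = [y - s for y, s in zip(data, seasonal)]

    # Trend: a centred moving average as the smooth approximation
    # (STL uses loess here; the window size is an arbitrary choice).
    half = window // 2
    trend = [mean(deseasonalized[max(0, i - half):i + half + 1]) for i in range(n)]

    # Remainder: whatever the smooth trend does not explain, including outliers.
    remainder = [d - t for d, t in zip(deseasonalized, trend)]
    return seasonal, trend, remainder


# Hypothetical metric: slow growth, a weekly bump, and one outlier on day 20.
data = [10 + 0.1 * i + (3 if i % 7 == 0 else 0) for i in range(56)]
data[20] += 15
seasonal, trend, remainder = decompose(data, period=7)
```

In this sketch the outlier on day 20 ends up mostly in the remainder, because the moving average spreads it thinly across the trend. This matches the idea that the trend approximation leaves outliers out of the forecast.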
A prediction model is used to calculate the trend component. It consists of a formula that expresses metric values through the values preceding them in time, and an algorithm for finding the parameters of that formula. Two models are tried: Exponential Smoothing State Space (ETS) and, if ETS cannot be fitted because too little data is available, an autoregressive (AR) model. ETS represents metric values as weighted averages of past observations, with the weights decreasing exponentially for older observations. AR can be described in the same way, but the weights apply only to the last several observations and are zero for all older ones, so it is a rougher model.
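The difference between the two weighting schemes can be seen in a small sketch. Simple exponential smoothing is used here as the most basic member of the ETS family, and the AR forecast omits the intercept. The smoothing parameter `alpha` and the AR coefficients are hypothetical values picked by hand; in practice they would be found by the fitting algorithm, which is not shown.

```python
def ses_weights(alpha, n):
    """Weights on the last n observations, newest first: alpha * (1 - alpha) ** k."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


def ses_forecast(values, alpha):
    """One-step-ahead forecast by the usual recursion, initialised at values[0]."""
    level = values[0]
    for y in values[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def ar_forecast(values, coeffs):
    """AR(p) forecast: weighted sum of only the last p observations (newest first)."""
    p = len(coeffs)
    recent = values[-1:-p - 1:-1]  # last p values, newest first
    return sum(c * y for c, y in zip(coeffs, recent))


values = [4.0, 6.0, 5.0, 7.0]
print(ses_weights(0.5, 3))              # [0.5, 0.25, 0.125]
print(ses_forecast(values, alpha=0.5))  # 6.0
ar_value = ar_forecast(values, coeffs=[0.6, 0.4])  # uses only 7.0 and 5.0
```

Every past observation contributes to the exponential smoothing forecast, with a weight that shrinks geometrically. The AR forecast ignores everything older than the last two observations.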
The algorithm then calculates the forecast as the sum of:
- the trend forecast, obtained by substituting past observations into the prediction model formula,
- the values of the last period of the seasonal component,
- random noise imitating natural randomness and outliers.
For non-negative metrics, negative forecast values are replaced with zero.
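A minimal sketch of this summation follows. The function name `compose_forecast`, the Gaussian noise, and its standard deviation `noise_sd` are assumptions made for illustration, since the text does not say how the noise is generated.

```python
import random


def compose_forecast(trend_forecast, seasonal, period, noise_sd=0.0,
                     non_negative=True, seed=None):
    """Sum trend forecast, repeated last seasonal period, and random noise."""
    rng = random.Random(seed)
    last_season = seasonal[-period:]  # values of the last seasonal period
    forecast = []
    for h, trend_value in enumerate(trend_forecast):
        noise = rng.gauss(0.0, noise_sd)  # assumed noise model
        value = trend_value + last_season[h % period] + noise
        if non_negative:
            value = max(0.0, value)  # a non-negative metric cannot drop below zero
        forecast.append(value)
    return forecast


# With the noise switched off, the result is deterministic.
print(compose_forecast([1.0, 1.0, 1.0], [0.5, -2.0, 0.5, -2.0], period=2))
# [1.5, 0.0, 1.5]
```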
To calculate the prediction interval (the likely range of a forecast value), the standard deviation of the forecast (the square root of its variance) is multiplied by a coefficient corresponding to the chosen confidence level. Subtracting this product from the forecast value and adding it to the forecast value gives the lower and upper boundaries of the prediction interval for that value.
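As a sketch, assuming normally distributed forecast errors (an assumption of this example, not a statement about EarlyBird®), the coefficient is the standard normal quantile for the chosen confidence level, and it scales the square root of the forecast variance. The function name `prediction_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prediction_interval(forecast, variance, level=0.95):
    """Return (lower, upper) bounds around one forecast value."""
    # Two-sided coefficient for the confidence level, about 1.96 for 95 %.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(variance)
    return forecast - half_width, forecast + half_width


lower, upper = prediction_interval(forecast=10.0, variance=4.0, level=0.95)
# The interval is roughly 10 ± 1.96 * 2, i.e. about (6.08, 13.92).
```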
To identify subjects at risk, the software needs to forecast Key Risk Indicators (KRIs) proactively and raise alerts when they exceed thresholds. Cyntegrity can proactively identify risks for the most important KRIs. A time series forecast of an underlying metric alone is not enough for risk identification. Cyntegrity needs to evaluate a solid model of what is being measured to enable predictions with proven statistical power. This is necessary because risk identification has costly organizational consequences. Such a model can be created by aggregating all available information related to the KRI: the knowledge base for the trial, third-party knowledge bases of medical data, metadata of trial entities, and observations and forecasts of related metrics.
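The threshold comparison itself is the simple part and might look like the sketch below. The function name `first_breach` and the example values are hypothetical. The sketch covers only the alerting step and not the broader model of the KRI described above.

```python
from typing import Optional, Sequence


def first_breach(forecast: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first forecast step (0-based) at which the KRI exceeds the threshold."""
    for step, value in enumerate(forecast):
        if value > threshold:
            return step
    return None  # no breach within the forecast horizon


# Hypothetical KRI forecast for the next four days and a threshold of 0.3.
print(first_breach([0.10, 0.20, 0.35, 0.50], threshold=0.3))  # 2
```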