In our previous article, “predictive analytics in RBM”, we started a discussion of machine learning (ML) algorithms, predictive analytics, and artificial intelligence. We also explained that risk software needs to forecast Key Risk Indicators (KRIs) proactively and raise alerts when the forecasts exceed thresholds.
In some cases, a program that employs business rules and deterministic formulas, without statistics or artificial intelligence, can calculate forecast values. Combining that approach with time series analysis can deliver good results, but in general it is not enough, and advanced ML methods should be used.
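As an illustration of the simpler, deterministic end of that spectrum, the following minimal Python sketch extrapolates a KRI along a least-squares straight line and checks the forecast against a threshold. The function names, the example KRI, and the threshold value are hypothetical and are not taken from EarlyBird®.

```python
def linear_forecast(history, steps_ahead=1):
    """Extrapolate a least-squares line through equally spaced observations."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    # Ordinary least-squares slope for x = 0, 1, ..., n - 1.
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history)) / sum(
        (x - x_mean) ** 2 for x in range(n)
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


def kri_alert(history, threshold, steps_ahead=1):
    """Return the forecast and whether it exceeds the threshold."""
    forecast = linear_forecast(history, steps_ahead)
    return forecast, forecast > threshold


# Hypothetical KRI: monthly query rate per 100 data points at one site.
query_rate = [2.0, 2.5, 3.0, 3.5]
forecast, alert = kri_alert(query_rate, threshold=4.4, steps_ahead=2)
print(forecast, alert)
```

The alert is raised on the forecast, not on the last observed value, which is what makes the check proactive: in the example the latest observation is still below the threshold.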
Such methods, be it a regression model, a neural network, or something else, can also output a forecast value. In addition, they can accept as input not only past observations of a metric but also a list of any of the parameters mentioned above. Transforming the data into a list of input values for an ML model, however, can be a complex task requiring experimentation and a lot of manual programming.
If a large ML model is used, it is more natural not to keep a separate model for each predicted metric but to predict a group of metrics with a single model. That single ML model (a “world”) will exist permanently as a part of EarlyBird®, driven by arriving metric values and other data exports. The internals of a “world” are the relations of inputs to outputs (forecasts).
Greater reliability and power of the model can be achieved by using an ensemble of forecasts. Ensembling either implies majority voting between forecast methods or splits the work between methods so that one specializes in the cases where another produces its weakest results. The combination can be sequential or parallel. For example, the decomposition of a time series into seasonal and trend components can supply two inputs of a neural network.
For each prediction method used in predictive analytics, there exists a statistical procedure that strictly proves that, under the required assumptions, the method delivers predictions with a required level of reliability, typically 95%. For a study designed with the holistic approach, these assumptions are already listed in the study documentation, and Cyntegrity staff verify the existence of that proof for each method with respect to the specific study.
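A parallel ensemble with majority voting can be sketched in a few lines of Python. The three base forecasters below (last value, moving average, and average drift) are deliberately simple stand-ins chosen for illustration; they are assumptions of this sketch, not the methods used in EarlyBird®.

```python
from statistics import median


def naive(history):
    """Forecast the next value as the last observed value."""
    return history[-1]


def moving_average(history, window=3):
    """Forecast the next value as the mean of the most recent observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def drift(history):
    """Last value plus the average historical step (needs two or more points)."""
    return history[-1] + (history[-1] - history[0]) / (len(history) - 1)


FORECASTERS = (naive, moving_average, drift)


def ensemble_forecast(history):
    """Combine the base forecasts by taking their median."""
    return median(f(history) for f in FORECASTERS)


def majority_breach(history, threshold):
    """True if more than half of the base forecasts exceed the threshold."""
    votes = sum(f(history) > threshold for f in FORECASTERS)
    return votes * 2 > len(FORECASTERS)


kri = [4.0, 5.0, 6.0, 8.0]
print(ensemble_forecast(kri), majority_breach(kri, threshold=7.0))
```

The median is used instead of the mean so that a single badly behaved forecaster cannot pull the combined value far from the others.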
However, predictive analytics is not limited to using KRI forecasts.
Risk detection can be based not only on separate metrics of entities (sites, patients, patient visits) but also on the entities themselves. When a risk relates to a specific patient, the predictive analytics system is called a clinical decision support system, and there have recently been big advances in such software.
A special branch of machine learning called anomaly detection can identify entities whose every property lies in the desired range but whose combination of properties (whether predicted, or actual but not directly observable) is unusual and unwanted. It is not possible to create a KRI for every suspicious combination of metrics, but anomaly detection solves that problem. Moreover, being an unsupervised learning method, it is able to find risks of unexpected and surprising types in the data.
In general, anomaly detection is one of the applications of ML classification methods. Those methods can also be useful for grouping entities of kinds that demand special attention or treatment because they are potentially risky in a certain way.
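To make the idea of an "unusual combination" concrete, here is a minimal sketch that scores each site by its squared Mahalanobis distance from the average site, using two metrics at once. The site data are invented for illustration, and the cutoff of 5.991 (the 95% quantile of the chi-square distribution with two degrees of freedom) assumes roughly normal metrics. This is one simple anomaly detection technique among many, not a description of the method used in EarlyBird®.

```python
from statistics import mean

CHI2_95_2DF = 5.991  # 95% chi-square quantile, 2 degrees of freedom


def squared_mahalanobis(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    n = len(points)
    # Sample variances and covariance of the two metrics.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("metrics are constant or perfectly correlated")
    # Closed-form inverse of the 2x2 covariance matrix.
    return [
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
        for x, y in points
    ]


def flag_anomalies(sites, cutoff=CHI2_95_2DF):
    """Return names of sites whose metric combination is unusually far out."""
    names = list(sites)
    distances = squared_mahalanobis([sites[name] for name in names])
    return [name for name, d2 in zip(names, distances) if d2 > cutoff]


# Hypothetical data: (enrolled patients, reported adverse events) per site.
SITES = {
    "S1": (10, 11), "S2": (12, 12), "S3": (14, 15), "S4": (16, 16),
    "S5": (18, 19), "S6": (20, 20), "S7": (22, 23), "S8": (24, 24),
    "S9": (22, 12),  # both values are within range, but the pair is unusual
}
print(flag_anomalies(SITES))
```

Site S9 would pass any per-metric range check, since other sites have as many patients and as few adverse events; only the joint view reveals that its reporting is low for its enrollment.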
Predicting entities rather than metrics is called structured prediction, and it includes probabilistic models on graphs, which are important for us. In this regard, Bayesian networks should be mentioned. They allow detected risks to be generalized by automatically reasoning about the causes of a risk or a fault and diagnosing it. More important are Markov chains, which are capable of modeling queues (of enrolled subjects, of electronic data capture queries, etc.). Cyntegrity actively uses the method of Prof. V. Anisimov, based on Markov chains, for prediction and adaptive adjustment of recruitment, and has obtained excellent results.
Figure 1. Presentation of the V. Anisimov prediction model implemented in a phase III trial.
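The flavor of recruitment prediction can be conveyed with a much simpler model than the one cited. The sketch below treats enrollment as a Poisson process (a pure-birth Markov chain in continuous time) with a fixed, known rate per site, and computes the expected enrollment and the probability of reaching the target by a deadline. The fixed rates, function names, and numbers are simplifying assumptions made for illustration; this is not the method of Prof. V. Anisimov, which is considerably more elaborate.

```python
import math


def poisson_cdf(k, mu):
    """P(N <= k) for N ~ Poisson(mu); adequate for moderate mu."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(N = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i  # P(N = i) from P(N = i - 1)
        total += term
    return total


def recruitment_outlook(site_rates, enrolled, target, months_left):
    """Expected final enrollment and probability of reaching the target.

    site_rates: assumed recruitment rate of each site, in patients per month.
    """
    mu = sum(site_rates) * months_left  # expected number of new patients
    remaining = max(target - enrolled, 0)
    expected_total = enrolled + mu
    if remaining == 0:
        return expected_total, 1.0
    # Target is met if at least `remaining` new patients arrive in time.
    return expected_total, 1.0 - poisson_cdf(remaining - 1, mu)


# Hypothetical study: three sites, 40 of 60 patients enrolled, 4 months left.
expected, p_success = recruitment_outlook([1.5, 2.0, 1.0], 40, 60, 4)
print(expected, round(p_success, 3))
```

Re-running the calculation as new enrollment counts arrive gives a basic form of adaptive adjustment: if the probability of success drops, sites can be added or the deadline revised.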
Predictive analytics is often viewed in a broader framework together with descriptive and prescriptive analytics. In our case, descriptive analytics should be used for visualization. A more important use of it is the preparation of data for consumption by predictive analytics methods. An important example is natural language processing, which parses the free-form text fields of exported documents into metric values. As a result, text input data will not lie dormant in an archive but will be put to constant use.
As for prescriptive analytics, its main application in EarlyBird® is the selection of risk mitigation actions. This is also achieved with an ML model, with risks given as inputs and actions as outputs; in this case, it is a recommendation system. The advantage of recommendation systems is that the recommended actions are personalized for the related site or patient and take into account the decisions investigators made on similar incidents in the past. They are also adjusted automatically when a user discards a certain recommendation. These systems are closely related to predictive analytics because recommending an action requires predicting what a user would do in such a situation.
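The feedback loop described above can be sketched with a simple score table: past decisions raise the score of an action for a given context, and discarded recommendations lower it. The class name, the penalty value, and the example risks and actions are hypothetical; a production recommendation system would use a trained ML model instead of plain counts.

```python
from collections import defaultdict


class ActionRecommender:
    """Count-based recommender keyed by a context such as (site, risk type)."""

    def __init__(self):
        self._scores = defaultdict(lambda: defaultdict(float))

    def record_decision(self, context, action):
        """An investigator chose `action` for an incident in this context."""
        self._scores[context][action] += 1.0

    def record_discard(self, context, action, penalty=0.5):
        """A user discarded a recommendation; lower its score."""
        self._scores[context][action] -= penalty

    def recommend(self, context, top_n=1):
        """Return up to `top_n` actions with a positive score, best first."""
        candidates = [
            (action, score)
            for action, score in self._scores.get(context, {}).items()
            if score > 0
        ]
        # Sort by descending score; break ties alphabetically for stability.
        candidates.sort(key=lambda item: (-item[1], item[0]))
        return [action for action, _ in candidates[:top_n]]


recommender = ActionRecommender()
context = ("site-12", "late data entry")  # hypothetical site and risk
recommender.record_decision(context, "retrain site staff")
recommender.record_decision(context, "retrain site staff")
recommender.record_decision(context, "schedule monitoring visit")
print(recommender.recommend(context))
```

Because the table is keyed by context, two sites with the same risk can receive different recommendations, which is the personalization the paragraph refers to.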
It should be mentioned that for sensitive facilities there are security systems that target the staff of an organization and predict such things as which employee has a higher chance of violating a standard operating procedure (SOP), for example, because of tiredness. Such systems analyze data from turnstile checkpoints, corporate canteens, etc. A more extreme example is that they may be able to detect that an employee is going to quit the job before the employee realizes it. As an intermediate result, informal groups among the personnel are identified. Such a system can therefore become an embodiment of Big Brother and raises all the related ethical issues. This is mostly outside the scope of EarlyBird® functionality because sites and sponsors have their own security departments, but Cyntegrity can build on the idea by estimating personal risks for staff or even patients, in an ethical way, of course, and only if the blinding aspect of the study design allows it.
 Read more here: Anisimov, V.V., Downing, D., Fedorov, V.V., 2007. Recruitment in Multicentre Trials: Prediction and Adjustment, in: mODa 8 – Advances in Model-Oriented Design and Analysis, Contributions to Statistics. Physica-Verlag HD, pp. 1–8.