Clinical Prediction Models

The clinical prediction model provides risk estimates of graft failure for individual transplant patients.

Artificial Intelligence

Artificial intelligence and machine learning methods provide the basis for our clinical prediction model.

Decentralized Infrastructure for Machine Learning

A decentralized infrastructure for federated learning helps protect sensitive data by analyzing it directly at the hospital site.

The latest advances in computer hardware have only recently enabled the wide adoption of machine learning approaches in real-world settings. In contrast to traditional programming, supervised Machine Learning (ML) models are programmed through training: supervised ML algorithms identify relevant patterns in the training data and incorporate them as rule sets for decision making. The model applies these internalized rules to derive decisions on unseen validation data.
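The following minimal sketch illustrates this training-and-validation cycle. The synthetic data, feature dimensions, and the choice of a random forest classifier are illustrative assumptions only and do not represent the NephroCAGE model.

```python
# Minimal sketch of supervised learning: a model "programmed through training".
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for labeled training data:
# rows = patients, columns = clinical parameters, y = outcome label (0/1).
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Training: the algorithm derives its internal decision rules from the data.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# The learned rules are then applied to unseen validation data.
print("validation accuracy:", model.score(X_val, y_val))
```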

Today, key challenges for training ML algorithms are:

  • Availability of large sets of training data,
  • Computing resources for long-running model training, and
  • The dimensionality of training data.


In the NephroCAGE project, we will incorporate training data from Canada and Germany, which gives us a substantially larger combined set of training data. However, instead of pooling the data into a single huge data source, we will train on multiple local data sources, which also reduces the time required for model training. Pre-trained models from one site will be exchanged via the federated learning infrastructure to continue training at the next site, as sketched below. Instead of exchanging sensitive healthcare data, partners exchange only comparatively small models.
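The sketch below illustrates this model-exchange idea: a small parameter vector travels between sites and is trained further on each site's local data. The two-site loop, the synthetic data, and the logistic-regression model are illustrative assumptions, not the actual NephroCAGE infrastructure.

```python
# Instead of pooling patient data, only a small model is passed between sites.
import numpy as np

rng = np.random.default_rng(1)

def local_data(n, shift):
    """Synthetic stand-in for one site's local clinical data (never leaves the site)."""
    X = rng.normal(loc=shift, size=(n, 5))
    y = (X[:, 0] - X[:, 1] > 0).astype(float)
    return X, y

def train_locally(weights, X, y, epochs=5, lr=0.1):
    """Continue training the received model on local data (logistic regression, SGD)."""
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + np.exp(-xi @ weights))
            weights -= lr * (p - yi) * xi  # gradient step on local data only
    return weights

sites = {"Site A": local_data(200, shift=0.0), "Site B": local_data(200, shift=0.3)}

# The only thing exchanged between partners is this small parameter vector.
weights = np.zeros(5)
for name, (X, y) in sites.items():
    weights = train_locally(weights, X, y)
    print(f"model updated at {name}; parameters exchanged onward")
```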

Due to the high dimensionality of the available clinical parameters, the derived clinical prediction model (CPM) becomes increasingly complex. We will therefore investigate a minimal set of clinical parameters that still achieves adequate quality of CPM results while keeping the complexity of the CPM to a minimum.
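One possible way to search for such a minimal parameter set is recursive feature elimination with cross-validation, sketched below. The synthetic data and the choice of estimator are placeholders; the parameter selection strategy actually used in NephroCAGE may differ.

```python
# Sketch: shrink the feature set while monitoring prediction quality.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic high-dimensional stand-in for clinical parameters.
X, y = make_classification(n_samples=400, n_features=30, n_informative=5, random_state=0)

selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,
    cv=5,              # prediction quality is checked by cross-validation
    scoring="roc_auc",
)
selector.fit(X, y)

print("parameters kept:", selector.n_features_)  # minimal set with adequate AUC
print("selected columns:", [i for i, keep in enumerate(selector.support_) if keep])
```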

To address these training challenges and to cope with ever-growing datasets and increasingly complex models, training machine learning models increasingly requires distributing the optimization of model parameters over multiple machines. Existing machine learning algorithms were designed for highly controlled environments, such as data centers, where data is distributed among machines in a balanced, independent and identically distributed fashion and high-throughput networks are available. Recently, federated learning and related decentralized approaches have been proposed as an alternative setting: a shared global model is trained under the coordination of a central server by a federation of participating devices. Such approaches avoid transferring the data to a central location and can therefore connect several data pools globally and derive models from them. One of the first use cases of federated learning was implemented by Google: ML models that predict the word someone is typing on their mobile phone were trained locally on the phones, and only the training results were shared with Google and thus with other users of the Android operating system.
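The sketch below illustrates this coordination pattern in the style of federated averaging: in each round, every participating site trains on its own data and sends back only model parameters, which the server combines into a shared global model. The data, the simple logistic-regression model, and the size-weighted averaging are simplified assumptions for illustration.

```python
# Sketch of server-coordinated federated averaging over several data pools.
import numpy as np

rng = np.random.default_rng(2)
DIM = 5

def make_site(n):
    """Synthetic local data pool for one participating site."""
    X = rng.normal(size=(n, DIM))
    y = (X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) > 0).astype(float)
    return X, y

def local_update(global_w, X, y, epochs=3, lr=0.1):
    """One site's local training pass, starting from the current global model."""
    w = global_w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

sites = [make_site(n) for n in (120, 200, 80)]
global_w = np.zeros(DIM)

for round_ in range(10):
    # The server collects locally trained parameters, never the raw data.
    updates = [local_update(global_w, X, y) for X, y in sites]
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    # Weighted average: sites with more data contribute proportionally more.
    global_w = np.average(updates, axis=0, weights=sizes)

print("shared global model:", np.round(global_w, 2))
```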