SISTRAT Datasets
This repository is organized into four main sections: data preparation, deduplication, predictive modeling, and documentation.
1. Data Preparation & Standardization
Core Datasets
2. Data Cleaning & Deduplication (C1)
3. Predictive Modeling Pipeline
Database Formatting
Machine Learning – XGBoost
Penalized Survival Models
Deep Learning – DeepHit
Deep Learning – DeepSurv
Prediction & ML-Informed Survival Modeling
4. Documentation
The main processes are summarized in the following figures.