This notebook implements a DeepSurv model for competing risks. It uses a single neural network to simultaneously predict the risk of death and readmission, explicitly modeling their competition. The model outputs Cumulative Incidence Functions (CIFs) for each risk, allowing evaluation of prediction accuracy over time for both outcomes using metrics like Uno’s C-Index (corrected for competing risks) and cause-specific Brier scores. It processes multiple imputed datasets and evaluates performance across different time horizons.
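As background for readers unfamiliar with CIFs: the cumulative incidence function for cause k at time t is the probability of experiencing event k by t while the competing cause can intercept it first. A minimal nonparametric (Aalen–Johansen-style) sketch in plain NumPy, independent of the notebook's model code (the toy data and function name are illustrative only):

```python
import numpy as np

def aalen_johansen_cif(time, event, cause, grid):
    """Nonparametric CIF for `cause`; events coded 0 = censored, 1/2 = causes."""
    order = np.argsort(time, kind="stable")
    t = np.asarray(time, dtype=float)[order]
    e = np.asarray(event)[order]
    n = len(t)
    surv = 1.0                      # overall event-free survival just before t[i]
    cif = 0.0                       # cumulative incidence of `cause` so far
    out = np.zeros(len(grid))
    j = 0
    for i in range(n):
        # fill grid points that fall strictly before this event time
        while j < len(grid) and grid[j] < t[i]:
            out[j] = cif
            j += 1
        at_risk = n - i
        if e[i] == cause:
            cif += surv / at_risk   # cause-specific hazard × overall survival
        if e[i] != 0:
            surv *= 1.0 - 1.0 / at_risk
    out[j:] = cif
    return out

# toy data: cause 1 at t=1 and t=4, censored at t=2, cause 2 at t=3
print(aalen_johansen_cif([1, 2, 3, 4], [1, 0, 2, 1], cause=1, grid=[2.5, 5.0]))
```

Note that the CIF increments use the overall survival, so a death at t=3 reduces how much the readmission CIF can still grow — exactly the competition the models below must respect.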
import numpy as np
import pandas as pd
from IPython.display import display, Markdown

if isinstance(imputations_list_jan26, list) and len(imputations_list_jan26) > 0:
    display(Markdown(f"**First element type:** `{type(imputations_list_jan26[0])}`"))
    if isinstance(imputations_list_jan26[0], dict):
        display(Markdown(f"**First element keys:** `{list(imputations_list_jan26[0].keys())}`"))
    elif isinstance(imputations_list_jan26[0], (pd.DataFrame, np.ndarray)):
        display(Markdown(f"**First element shape:** `{imputations_list_jan26[0].shape}`"))
First element type: <class 'pandas.core.frame.DataFrame'>
First element shape: (88504, 56)
This code block:
Imports the pickle library, which implements binary protocols for serializing and de-serializing Python object structures.
Specifies the file_path, pointing to the selected .pkl file.
Opens the file in binary read mode ('rb'), which is required for pickle files.
Loads the object: pickle.load(f) reads the serialized object from the file and reconstructs it in memory.
Prints confirmation and basic information: it verifies that the file was loaded, shows the type of the loaded object, and, when the object is a list of common data structures, prints some details about its first element.
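The loading cell itself is not reproduced above; a minimal self-contained round-trip sketch of the described steps (the file name and the toy payload are illustrative, not the notebook's actual .pkl):

```python
import pickle
import tempfile
from pathlib import Path

# write a toy object so the example is self-contained
file_path = Path(tempfile.gettempdir()) / "demo_imputations.pkl"
with open(file_path, "wb") as f:
    pickle.dump([{"a": 1}, {"b": 2}], f)

# 'rb' is required: pickle files are binary
with open(file_path, "rb") as f:
    loaded = pickle.load(f)

print(type(loaded), len(loaded))
```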
Format data
Due to inconsistencies and structural heterogeneity across previously merged datasets, we decided not to proceed with a direct inspection and comparison of column names between the first imputed dataset from imputations_list_jan26 (which likely included dummy-encoded variables) and imputation_nodum_1 (which likely retained non–dummy-encoded variables).
Instead, we reconstructed the analytic datasets de novo using the most recent source files available in the original directory (BASE_DIR). Time-to-event variables were re-derived to ensure internal consistency. Variables that could introduce information leakage (e.g., time from admission) were excluded, and the center identifier variable was removed prior to modeling.
Code
# 1.2. Build Surv objects from df_final
from IPython.display import display, Markdown
from sksurv.util import Surv

for i in range(1, 6):
    # Get the DataFrame
    df = globals()[f"imputation_nodum_{i}"]

    # Extract time and event arrays
    time_readm = df["readmit_time_from_disch_m"].to_numpy()
    event_readm = (df["readmit_event"].to_numpy() == 1)
    time_death = df["death_time_from_disch_m"].to_numpy()
    event_death = (df["death_event"].to_numpy() == 1)

    # Create survival objects
    y_surv_readm = Surv.from_arrays(event=event_readm, time=time_readm)
    y_surv_death = Surv.from_arrays(event=event_death, time=time_death)

    # Store in global variables (optional but matches your pattern)
    globals()[f"y_surv_readm_{i}"] = y_surv_readm
    globals()[f"y_surv_death_{i}"] = y_surv_death

    # Print info
    display(Markdown(f"\n--- Imputation {i} ---"))
    display(Markdown(
        f"**y_surv_readm dtype:** {y_surv_readm.dtype}\n"
        f"**shape:** {y_surv_readm.shape}"
    ))
    display(Markdown(
        f"**y_surv_death dtype:** {y_surv_death.dtype}\n"
        f"**shape:** {y_surv_death.shape}"
    ))
fold_output("Show imputation_nodum_1 (newer database) glimpse",
            lambda: glimpse(imputation_nodum_1))
fold_output("Show first db of imputations_list_jan26 (older) glimpse",
            lambda: glimpse(imputations_list_jan26[0]))
For each imputed dataset (1–5), we identified and removed predictors with zero variance, as they provide no useful information and can destabilize models. We printed the dropped variables and produced a cleaned version of each design matrix. This ensures that all downstream analyses use only informative predictors.
Code
# Keep only these objects
objects_to_keep = {
    "objects_to_keep",
    "imputation_nodum_1",
    "imputation_nodum_2",
    "imputation_nodum_3",
    "imputation_nodum_4",
    "imputation_nodum_5",
    "y_surv_readm",
    "y_surv_death",
    "imputations_list_jan26",
}

import types

for name in list(globals().keys()):
    obj = globals()[name]
    if (
        name not in objects_to_keep
        and not name.startswith("_")
        and not callable(obj)
        and not isinstance(obj, types.ModuleType)  # <- protects ALL modules
    ):
        del globals()[name]
Code
from IPython.display import display, Markdown

# 1. Define columns to exclude (same as before)
target_cols = [
    "readmit_time_from_disch_m",
    "readmit_event",
    "death_time_from_disch_m",
    "death_event",
]
leak_time_cols = [
    "readmit_time_from_adm_m",
    "death_time_from_adm_m",
]
center_id = ["center_id"]
cols_to_exclude = target_cols + center_id + leak_time_cols

# 2. Create list of your EXISTING imputation DataFrames (1-5)
imputed_dfs = [
    imputation_nodum_1,
    imputation_nodum_2,
    imputation_nodum_3,
    imputation_nodum_4,
    imputation_nodum_5,
]

# 3. Preprocessing loop
X_reduced_list = []
for d, df in enumerate(imputed_dfs):
    imputation_num = d + 1  # Convert 0-index to 1-index for display
    display(Markdown(f"\n=== Imputation dataset {imputation_num} ==="))

    # a) Identify and drop constant predictors
    const_mask = (df.nunique(dropna=False) <= 1)
    dropped_const = df.columns[const_mask].tolist()
    display(Markdown(f"**Constant predictors dropped ({len(dropped_const)}):**"))
    display(Markdown(f"{dropped_const if dropped_const else 'None'}"))

    # b) Remove constant columns
    X_reduced = df.loc[:, ~const_mask]

    # c) Drop target/leakage columns (if present)
    cols_to_drop = [col for col in cols_to_exclude if col in X_reduced.columns]
    if cols_to_drop:
        X_reduced = X_reduced.drop(columns=cols_to_drop)
        display(Markdown(f"**Dropped target/leakage columns:** {cols_to_drop}"))
    else:
        display(Markdown("No target/leakage columns found to drop"))

    # d) Store cleaned DataFrame
    X_reduced_list.append(X_reduced)

    # e) Report shapes
    display(Markdown(f"**Original shape:** {df.shape}"))
    display(Markdown(
        f"**Cleaned shape:** {X_reduced.shape} "
        f"(removed {df.shape[1] - X_reduced.shape[1]} columns)"
    ))

display(Markdown("\n✅ **Preprocessing complete! X_reduced_list contains 5 cleaned DataFrames.**"))
A structured preprocessing pipeline was implemented prior to modeling. Ordered categorical variables (e.g., housing status, educational attainment, clinical evaluations, and substance use frequency) were manually mapped to numeric scales reflecting their natural ordering. For nominal categorical variables, prespecified reference categories were enforced to ensure consistent baseline comparisons across imputations. All remaining categorical predictors were then converted to dummy variables using one-hot encoding with the first category dropped to prevent multicollinearity. The procedure was applied consistently across all imputed datasets to ensure harmonized model inputs.
Code
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype


def preprocess_features_robust(df):
    df_clean = df.copy()

    # ---------------------------------------------------------
    # 1. Ordinal encoding
    # ---------------------------------------------------------
    ordered_mappings = {
        # --- NEW: Housing & Urbanicity ---
        "tenure_status_household": {
            "illegal settlement": 4,                          # Situación Calle
            "stays temporarily with a relative": 3,           # Allegado
            "others": 2,                                      # En pensión / Otros
            "renting": 1,                                     # Arrendando
            "owner/transferred dwellings/pays dividends": 0,  # Vivienda Propia
        },
        "urbanicity_cat": {"1.Rural": 2, "2.Mixed": 1, "3.Urban": 0},
        # --- Clinical Evaluations (Minimo -> Intermedio -> Alto) ---
        "evaluacindelprocesoteraputico": {"logro minimo": 2, "logro intermedio": 1, "logro alto": 0},
        "eva_consumo": {"logro minimo": 2, "logro intermedio": 1, "logro alto": 0},
        "eva_fam": {"logro minimo": 2, "logro intermedio": 1, "logro alto": 0},
        "eva_relinterp": {"logro minimo": 2, "logro intermedio": 1, "logro alto": 0},
        "eva_ocupacion": {"logro minimo": 2, "logro intermedio": 1, "logro alto": 0},
        "eva_sm": {"logro minimo": 2, "logro intermedio": 1, "logro alto": 0},
        "eva_fisica": {"logro minimo": 2, "logro intermedio": 1, "logro alto": 0},
        "eva_transgnorma": {"logro minimo": 2, "logro intermedio": 1, "logro alto": 0},
        # --- Frequency (Less freq -> More freq) ---
        "prim_sub_freq_rec": {"1.≤1 day/wk": 0, "2.2–6 days/wk": 1, "3.Daily": 2},
        # --- Education (Less -> More) ---
        "ed_attainment_corr": {
            "3-Completed primary school or less": 2,
            "2-Completed high school or less": 1,
            "1-More than high school": 0,
        },
    }

    for col, mapping in ordered_mappings.items():
        if col in df_clean.columns:
            df_clean[col] = df_clean[col].astype(str).str.strip()
            df_clean[col] = df_clean[col].map(mapping)
            n_missing = df_clean[col].isnull().sum()
            if n_missing > 0:
                if n_missing == len(df_clean):
                    print(f"⚠️ WARNING: Mapping failed completely for '{col}'.")
                mode_val = df_clean[col].mode()[0]
                df_clean[col] = df_clean[col].fillna(mode_val)

    # ---------------------------------------------------------
    # 2. FORCE reference categories for dummies
    # ---------------------------------------------------------
    dummy_reference = {
        "sex_rec": "man",
        "plan_type_corr": "pg-pab",
        "marital_status_rec": "married/cohabiting",
        "cohabitation": "alone",
        "sub_dep_icd10_status": "hazardous consumption",
        "tr_outcome": "completion",
        "adm_motive": "spontaneous consultation",
        "tipo_de_vivienda_rec2": "formal housing",
        "occupation_condition_corr24": "employed",
        "any_violence": "0.No domestic violence/sex abuse",
        "first_sub_used": "marijuana",
        "primary_sub_mod": "marijuana",
    }

    for col, ref in dummy_reference.items():
        if col in df_clean.columns:
            df_clean[col] = df_clean[col].astype(str).str.strip()
            cats = df_clean[col].unique().tolist()
            if ref in cats:
                new_order = [ref] + [c for c in cats if c != ref]
                cat_type = CategoricalDtype(categories=new_order, ordered=False)
                df_clean[col] = df_clean[col].astype(cat_type)
            else:
                print(f"⚠️ Reference '{ref}' not found in {col}")

    # ---------------------------------------------------------
    # 3. One-hot encoding
    # ---------------------------------------------------------
    df_final = pd.get_dummies(df_clean, drop_first=True, dtype=float)
    return df_final


X_encoded_list_final = [preprocess_features_robust(X) for X in X_reduced_list]
X_encoded_list_final = [clean_names(X) for X in X_encoded_list_final]
Code
from IPython.display import display, Markdown

# 1. DIAGNOSTIC: Check exact string values
display(Markdown("### --- Diagnostic Check ---"))
sample_df = X_encoded_list_final[0]

if 'tenure_status_household' in sample_df.columns:
    display(Markdown("**Unique values in 'tenure_status_household':**"))
    display(Markdown(str(sample_df['tenure_status_household'].unique())))
else:
    display(Markdown("❌ 'tenure_status_household' is missing entirely from input data!"))

if 'urbanicity_cat' in sample_df.columns:
    display(Markdown("**Unique values in 'urbanicity_cat':**"))
    display(Markdown(str(sample_df['urbanicity_cat'].unique())))

if 'ed_attainment_corr' in sample_df.columns:
    display(Markdown("**Unique values in 'ed_attainment_corr':**"))
    display(Markdown(str(sample_df['ed_attainment_corr'].unique())))
— Diagnostic Check —
Unique values in ‘tenure_status_household’:
[3 0 1 2 4]
Unique values in ‘urbanicity_cat’:
[0 1 2]
Unique values in ‘ed_attainment_corr’:
[1 2 0]
We recoded the first substance used so that small categories are grouped into an "Others" category.
Code
# Columns to combine
cols_to_group = [
    "first_sub_used_opioids",
    "first_sub_used_others",
    "first_sub_used_hallucinogens",
    "first_sub_used_inhalants",
    "first_sub_used_tranquilizers_hypnotics",
    "first_sub_used_amphetamine_type_stimulants",
]

# Loop over datasets 0–4 and modify in place
for i in range(5):
    df = X_encoded_list_final[i].copy()

    # Collapse into one dummy: if any of these == 1, mark as 1
    df["first_sub_used_other"] = df[cols_to_group].max(axis=1)

    # Drop the rest except the new combined column
    df = df.drop(columns=[c for c in cols_to_group if c != "first_sub_used_other"])

    # Replace the dataset in the original list
    X_encoded_list_final[i] = df
Code
import sys

fold_output("Show first db of X_encoded_list_final (newer) glimpse",
            lambda: glimpse(X_encoded_list_final[0]))
Show first db of X_encoded_list_final (newer) glimpse
For each imputed dataset, we fitted two regularized Cox models (one for readmission and one for death) using Coxnet, which applies elastic-net penalization with a strong LASSO component to enable variable selection. The loop fits both models on every imputation, prints basic model information, and stores all fitted models so they can later be combined or compared across imputations.
Create bins for follow-up (landmarks)
We extracted the observed event times and corresponding event indicators directly from the structured survival objects (y_surv_readm and y_surv_death). Using the observed event times, we constructed evaluation grids based on the 5th to 95th percentiles of the event-time distribution. These grids define standardized time points at which model performance is assessed for both readmission and mortality outcomes.
Code
import numpy as np
from IPython.display import display, Markdown

# Extract event times directly from structured arrays
event_times_readm = y_surv_readm["time"][y_surv_readm["event"]]
event_times_death = y_surv_death["time"][y_surv_death["event"]]

# Build evaluation grids (5th–95th percentiles, 50 points)
times_eval_readm = np.unique(
    np.quantile(event_times_readm, np.linspace(0.05, 0.95, 50))
)
times_eval_death = np.unique(
    np.quantile(event_times_death, np.linspace(0.05, 0.95, 50))
)

# Display only final result
display(Markdown(f"**Eval times (readmission):** `{times_eval_readm[:5]}` ... `{times_eval_readm[-5:]}`"))
display(Markdown(f"**Eval times (death):** `{times_eval_death[:5]}` ... `{times_eval_death[-5:]}`"))
First, we eliminated immortal time bias (otherwise, patients who died would appear to remain readmission-free).
This correction is essentially the cause-specific hazard preparation. It is the correct way to handle Aim 3 unless you switch to a Fine–Gray model (which treats death as a distinct event type, coded 2, rather than as censoring, coded 0). For RSF/Coxnet, censoring at the time of death is the correct cause-specific approach.
Code
import numpy as np

# Step 3. Replicate across imputations (safe copies)
n_imputations = len(X_encoded_list_final)
y_surv_readm_list = [y_surv_readm.copy() for _ in range(n_imputations)]
y_surv_death_list = [y_surv_death.copy() for _ in range(n_imputations)]


def correct_competing_risks(y_readm_list, y_death_list):
    corrected = []
    for y_readm, y_death in zip(y_readm_list, y_death_list):
        y_corr = y_readm.copy()
        # death observed and occurring strictly before the readmission/censoring time:
        # recode as censored at the death time
        mask = (y_death["event"]) & (y_death["time"] < y_corr["time"])
        y_corr["event"][mask] = False
        y_corr["time"][mask] = y_death["time"][mask]
        corrected.append(y_corr)
    return corrected


# Step 4. Apply correction
y_surv_readm_list_corrected = correct_competing_risks(
    y_surv_readm_list, y_surv_death_list
)
Code
# Check type and length
type(y_surv_readm_list_corrected), len(y_surv_readm_list_corrected)

# Look at the first element
y_surv_readm_list_corrected[0][:5]  # first 5 rows
from IPython.display import display, HTML
import html


def nb_print(*args, sep=" "):
    msg = sep.join(str(a) for a in args)
    display(HTML(f"<pre style='margin:0'>{html.escape(msg)}</pre>"))
The fully preprocessed and encoded feature matrices were renamed from X_encoded_list_final to imputations_list_mar26 to reflect the finalized March 2026 analytic version of the imputed datasets.
This object contains the harmonized, ordinal-encoded, and one-hot encoded predictor matrices for all five imputations and will serve as the definitive input for subsequent modeling procedures.
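The renaming cell itself is not shown; a minimal sketch, assuming `X_encoded_list_final` holds the five encoded DataFrames (a tiny stand-in list is used here so the example is self-contained). Copies are taken so that later in-place edits to one list do not silently alter the other:

```python
import pandas as pd

# stand-in for the five encoded design matrices from earlier cells
X_encoded_list_final = [pd.DataFrame({"x": [1.0, 2.0]}) for _ in range(5)]

# alias under the finalized mar26 name; .copy() decouples the two lists
imputations_list_mar26 = [df.copy() for df in X_encoded_list_final]

print(len(imputations_list_mar26), imputations_list_mar26[0].shape)
```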
# counts per stratum in train/test
train_counts = sdiag.iloc[train_idx].value_counts()
test_counts = sdiag.iloc[test_idx].value_counts()

min_train = int(train_counts.min())
min_test = int(test_counts.min())
nb_print_md(f"**Min stratum count in TRAIN (used strata):** `{min_train}`")
nb_print_md(f"**Min stratum count in TEST (used strata):** `{min_test}`")

# strata that got 0 in test or 0 in train
zero_in_test = sorted(set(train_counts.index) - set(test_counts.index))
zero_in_train = sorted(set(test_counts.index) - set(train_counts.index))
nb_print_md(f"**Strata with 0 in TEST:** `{len(zero_in_test)}`")
nb_print_md(f"**Strata with 0 in TRAIN:** `{len(zero_in_train)}`")

# show examples with their full-data counts
if len(zero_in_test) > 0:
    ex = zero_in_test[:10]
    nb_print_md(f"**Examples 0 in TEST (up to 10):** `{ex}`")
    nb_print_md(f"**Full-data counts:** `{[int(sdiag.value_counts()[k]) for k in ex]}`")
# Use the actual stratification mode that was used to split
strata_used, strat_mode, _, _ = build_strata(full["X"][0], full["y_readm"][0], full["y_death"][0])
s = pd.Series(strata_used)

train_strata = set(s.iloc[train_idx].unique())
test_strata = set(s.iloc[test_idx].unique())
missing_in_test = sorted(train_strata - test_strata)
missing_in_train = sorted(test_strata - train_strata)

display(Markdown(f"**Strata used:** `{strat_mode}`"))
display(Markdown(f"**# strata in train:** `{len(train_strata)}` | **# strata in test:** `{len(test_strata)}`"))
display(Markdown(f"**Strata present in train but missing in test:** `{len(missing_in_test)}`"))
display(Markdown(f"**Strata present in test but missing in train:** `{len(missing_in_train)}`"))
Strata used: fallback(plan+readm+death)
# strata in train: 20 | # strata in test: 20
Strata present in train but missing in test: 0
Strata present in test but missing in train: 0
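The split itself and the notebook's `build_strata` helper are defined in earlier cells not shown here. A hedged, self-contained sketch of the fallback stratification mode (plan + readmission event + death event); the `SEED`/`TEST_SIZE` values and the balanced synthetic data are illustrative, not the notebook's:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def fallback_strata(plan, readm_event, death_event):
    # e.g. "1_0_1": plan category 1, no readmission, died
    return pd.Series([f"{p}_{int(r)}_{int(d)}"
                      for p, r, d in zip(plan, readm_event, death_event)])

# balanced synthetic data: 8 strata (2 plans × readm × death), 25 rows each
plan = np.repeat([0, 1], 100)
readm = np.tile(np.repeat([0, 1], 50), 2)
death = np.tile(np.repeat([0, 1], 25), 4)
strata = fallback_strata(plan, readm, death)

SEED, TEST_SIZE = 42, 0.2  # illustrative values
idx = np.arange(len(strata))
train_idx, test_idx = train_test_split(
    idx, test_size=TEST_SIZE, random_state=SEED, stratify=strata
)
print(len(train_idx), len(test_idx))
```

With `stratify=` set, every stratum keeps the same train/test proportion, which is exactly what the zero-missing-strata check above verifies.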
Code
from pathlib import Path
import pandas as pd
import numpy as np
from IPython.display import display, Markdown

PROJECT_ROOT = find_project_root()  # no hardcoded absolute path
OUT_DIR = PROJECT_ROOT / "_out"
OUT_DIR.mkdir(parents=True, exist_ok=True)

SPLIT_PARQUET = OUT_DIR / f"deepsurv_split_seed{SEED}_test{int(TEST_SIZE*100)}_mar26.parquet"

split_df = pd.DataFrame({
    "row_id": np.arange(n),
    "is_train": np.isin(np.arange(n), train_idx),
})
split_df.to_parquet(SPLIT_PARQUET, index=False)

display(Markdown(f"**Project root:** `{PROJECT_ROOT}`"))
display(Markdown(f"**Saved split to:** `{SPLIT_PARQUET}`"))
df0 = imputations_list_mar26[0]

# Calculate exactly what you need
mean_age = df0["adm_age_rec3"].mean()
count_foreign = (df0["national_foreign"] == 1).sum()

# Print results
nb_print(f"Mean of adm_age_rec3: {mean_age:.4f}")
nb_print(f"Count of national_foreign == 1: {count_foreign}")
Mean of adm_age_rec3: 35.7256
Count of national_foreign == 1: 453
We cleaned our environment safely so that old models, temporary objects, and large intermediate datasets do not interfere with the next modeling block.
Code
# Safe cleanup before Readmission XGBoost blocks
import types
import gc

# Ensure logger exists (some target cells expect it)
if "nb_print" not in globals():
    def nb_print(*args, **kwargs):
        print(*args, **kwargs)

# Compatibility: one Optuna/Bootstrap block checks jan26 naming
# if "imputations_list_jan26" not in globals() and "imputations_list_mar26" in globals():
#     imputations_list_jan26 = imputations_list_mar26

KEEP = {
    "nb_print", "study",
    "imputations_list_mar26",  # "imputations_list", "imputations_list_jan26"
    "X_train", "y_surv_readm_list_corrected", "y_surv_readm_list", "y_surv_death_list",
    # Optional plot config objects:
    "plt", "sns", "matplotlib", "mpl", "rcParams", "PROJECT_ROOT",
}

# ensure both variants are kept
KEEP.update({
    "y_surv_readm_list_corrected_mar26", "y_surv_readm_list_corrected",
    "y_surv_readm_list_mar26", "y_surv_death_list_mar26",
})

# before cleanup, sanity-check X/y alignment for tuning
if "imputations_list_mar26" in globals() and "y_surv_readm_list_corrected" in globals():
    assert len(imputations_list_mar26[0]) == len(y_surv_readm_list_corrected[0]), \
        f"Row mismatch: X={len(imputations_list_mar26[0])}, y={len(y_surv_readm_list_corrected[0])}"

for name, obj in list(globals().items()):
    if name in KEEP or name.startswith("_"):
        continue
    if isinstance(obj, types.ModuleType):  # keep imports
        continue
    if callable(obj):  # keep functions/classes
        continue
    del globals()[name]

gc.collect()

required = ["y_surv_readm_list_corrected", "y_surv_readm_list", "y_surv_death_list"]
missing = [x for x in required if x not in globals()]
nb_print("Missing required objects:", missing)
Missing required objects: []
PyCox
We tune DeepSurv cause-specific Cox models for 1–5 year death and readmission via Optuna TPE with 5-fold stratified CV.
Optuna TPE replaces grid search (100 trials).
Cause-specific Cox handles competing risks.
Separate models fit for death and readmission.
Risk evaluated at 3, 6, 12, 36, 60 months.
IPCW C-index + IBS combined into one metric.
Metric: √((1−C)² + IBS²), lower is better.
Horizon weights emphasize 1–5 year outcomes.
Wide search: 1–4 layers, lr/dropout/decay.
Re-seeded per trial for full reproducibility.
Early stopping + patience tuned per trial.
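The combined metric in the bullets above can be made concrete with a small worked example: it is the Euclidean distance from the ideal point (C-index = 1, IBS = 0), so lower is better. Note the notebook additionally weights horizons before aggregating, so its reported values differ slightly from this unweighted sketch:

```python
import numpy as np

def combined_metric(c_index, ibs):
    """Distance from the ideal point (C = 1, IBS = 0); lower is better."""
    return float(np.sqrt((1.0 - c_index) ** 2 + ibs ** 2))

# with the Phase 2 winner's reported averages (C ≈ 0.747, IBS ≈ 0.0283):
print(round(combined_metric(0.747027, 0.028331), 4))  # ≈ 0.2546
```

Because IBS is an order of magnitude smaller than 1 − C here, the metric is dominated by discrimination, which matches the Phase 2 observation below that gains came mainly from C-index rather than calibration.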
Code
# Check how many imputations there are and the number of rows in each
nb_print(f"Imputations: {len(imputations_list_mar26)}")
for i, df in enumerate(imputations_list_mar26):
    nb_print(f"Imputation {i}: {df.shape[0]} rows × {df.shape[1]} columns")
#@title 📝 Take-Home Message: Interpretation of Best DeepSurv Configuration
import pandas as pd
from IPython.display import display

# UPDATED 2026-04-01:
# Previous winner:
#   Nodes [64, 128, 64] | LR 0.000585 | WD 0.00056 | Dropout 0.393 | Batch 256
# New Phase 2 winner:
#   Nodes [256, 64, 128, 64] | LR 0.000935 | WD 0.000475 | Dropout 0.543 | Batch 256
#   Phase1 Combined Metric = 0.253676
#   Phase2 Mean C-Index  = 0.747027 | Std = 0.001992
#   Phase2 Mean IBS      = 0.028331 | Std = 0.000027
#   Phase2 Mean Combined = 0.254563 | Std = 0.001982

config_interpretation = pd.DataFrame([
    {
        'Component': 'Regularization (Stronger Stochastic Shield)',
        'Selected Value': 'Dropout: 0.543 | Weight Decay: 0.000475',
        'Interpretation': 'The Phase 2 winner uses stronger dropout than the previous best model, but still keeps weight decay relatively light. This suggests the network benefits more from aggressively disrupting noisy co-adaptations during training than from heavily shrinking coefficients. In practical terms, the model needs freedom to express nonlinear risk structure, but it also needs strong protection against memorizing unstable patterns.'
    },
    {
        'Component': 'Model Capacity (Deep Bottleneck Funnel)',
        'Selected Value': 'Nodes: [256, 64, 128, 64]',
        'Interpretation': 'The winning architecture is deeper and more structured than the earlier compact funnel. It starts wide, compresses sharply, expands again, and then contracts before output. That pattern is consistent with hierarchical feature extraction: broad first-pass interaction capture, bottleneck-based denoising, mid-level representation rebuilding, and final compression into a stable prognostic signal. The result implies the data supports more complex nonlinear structure than the earlier 3-layer winner suggested.'
    },
    {
        'Component': 'Optimization Mechanics',
        'Selected Value': 'Batch: 256 | LR: 0.000935',
        'Interpretation': 'The optimizer again favored batch size 256, reinforcing the pattern that smaller batches work better than larger ones for this Cox setup. The learning rate moved upward relative to the previous Phase 2 winner, indicating that once the architecture became more expressive, slightly faster optimization helped the model reach a better ranking solution without losing stability.'
    },
    {
        'Component': 'Performance Context (Best Generalization Across Imputations)',
        'Selected Value': 'Phase1 Combined: 0.253676 | Phase2 C-Index: 0.747027 ± 0.001992 | Phase2 IBS: 0.028331 ± 0.000027 | Phase2 Combined: 0.254563 ± 0.001982',
        'Interpretation': 'This model is especially interesting because it was only Rank 4 in Phase 1, yet it emerged as Rank 1 after validation across all imputations. That means the final winner was not simply the best single-study Optuna result; it was the configuration that generalized most reliably. IBS remained extremely stable across candidates, so the Phase 2 win was driven mainly by better discrimination while preserving essentially the same calibration level.'
    },
])

nb_print("\n>>> TAKE-HOME MESSAGE: UPDATED OPTUNA-OPTIMIZED DEEPSURV CONFIGURATION (PHASE 2 WINNER)")
pd.set_option('display.max_colwidth', None)
styled_table = (
    config_interpretation.style
    .set_properties(**{
        "text-align": "left",
        "white-space": "pre-wrap",
        "font-size": "14px",
        "vertical-align": "top",
    })
    .set_table_styles([
        {"selector": "th", "props": [("background-color", "#f0f2f6"), ("font-weight", "bold"), ("font-size", "14px")]},
        {"selector": "td", "props": [("padding", "12px"), ("border-bottom", "1px solid #ddd")]},
    ])
)
display(styled_table)
Regularization (Stronger Stochastic Shield): Dropout: 0.543 | Weight Decay: 0.000475
The Phase 2 winner uses stronger dropout than the previous best model, but still keeps weight decay relatively light. This suggests the network benefits more from aggressively disrupting noisy co-adaptations during training than from heavily shrinking coefficients. In practical terms, the model needs freedom to express nonlinear risk structure, but it also needs strong protection against memorizing unstable patterns.

Model Capacity (Deep Bottleneck Funnel): Nodes: [256, 64, 128, 64]
The winning architecture is deeper and more structured than the earlier compact funnel. It starts wide, compresses sharply, expands again, and then contracts before output. That pattern is consistent with hierarchical feature extraction: broad first-pass interaction capture, bottleneck-based denoising, mid-level representation rebuilding, and final compression into a stable prognostic signal. The result implies the data supports more complex nonlinear structure than the earlier 3-layer winner suggested.

Optimization Mechanics: Batch: 256 | LR: 0.000935
The optimizer again favored batch size 256, reinforcing the pattern that smaller batches work better than larger ones for this Cox setup. The learning rate moved upward relative to the previous Phase 2 winner, indicating that once the architecture became more expressive, slightly faster optimization helped the model reach a better ranking solution without losing stability.

Performance Context (Best Generalization Across Imputations): Phase1 Combined: 0.253676 | Phase2 C-Index: 0.747027 ± 0.001992 | Phase2 IBS: 0.028331 ± 0.000027 | Phase2 Combined: 0.254563 ± 0.001982
This model is especially interesting because it was only Rank 4 in Phase 1, yet it emerged as Rank 1 after validation across all imputations. That means the final winner was not simply the best single-study Optuna result; it was the configuration that generalized most reliably. IBS remained extremely stable across candidates, so the Phase 2 win was driven mainly by better discrimination while preserving essentially the same calibration level.
We leveraged Optuna’s Bayesian optimization to efficiently find the best hyperparameters for a DeepSurv survival model. By combining stratified 5-fold cross-validation with multi-horizon C-index evaluation, we ensure the model is both highly accurate and generalizable. The code dynamically learns from previous trials to zero in on the optimal learning rate, dropout, and network architecture without wasting time on poor combinations.
Bayesian Efficiency: Optuna learns from past trials, far outpacing blind grid search.
Targeted Space: Search is aggressively narrowed to moderate learning rates (~1e-3) and calibrated dropout (0.28-0.60).
Robust Validation: Stratified 5-fold CV ensures the model generalizes across patient splits.
Leakage Prevention: StandardScalers are strictly fit only on the training folds.
Smart Training: 250 epochs with Early Stopping halts training exactly when optimal.
Full trials history saved to: g:\My Drive\Alvacast\SISTRAT 2023\dh\_out\DS_Optuna_Tuning_TargetedCombined_20260401_1737.csv
Total Time: 247.25 min
Code
#@title 📝 Take-Home Message: Interpretation of Targeted Optuna Results
import pandas as pd
from IPython.display import display

# --- UPDATED HYPERPARAMETER INTERPRETATION DATAFRAME ---
config_interpretation = pd.DataFrame([
    {
        'Component': 'Regularization (Balanced Shield)',
        'Selected Value': 'Dropout: 0.514 | Weight Decay: 0.000703',
        'Interpretation': 'The updated targeted search winner favors moderate-high dropout together with modest L2 regularization. This combination suggests the model benefits from strong stochastic regularization to suppress noisy co-adaptations, while still using enough weight decay to stabilize the Cox risk surface without over-shrinking it.'
    },
    {
        'Component': 'Model Capacity (Lean Twin Architecture)',
        'Selected Value': 'Nodes: [128, 128]',
        'Interpretation': 'The new best targeted-search model is a compact 2-layer architecture rather than a wider funnel. This indicates that, under the narrowed search space and combined-metric objective, a leaner network was sufficient to capture the relevant nonlinear survival structure. In other words, extra depth did not translate into better discrimination-calibration trade-off in this run.'
    },
    {
        'Component': 'Optimization Mechanics',
        'Selected Value': 'Batch: 256 | LR: 0.000983',
        'Interpretation': 'The optimizer again selected batch size 256, reinforcing the earlier pattern that smaller batches are more effective for this DeepSurv setup. The learning rate remained close to 1e-3, which appears to be the stable operating region for navigating the Cox partial likelihood landscape efficiently without becoming erratic.'
    },
    {
        'Component': 'Performance Context (Best Targeted Combined-Metric Result)',
        'Selected Value': 'Combined Metric: 0.2498 | Weighted Uno C-Index: 0.7518 | Average IBS: 0.0283',
        'Interpretation': 'This configuration delivered the best result in the targeted Optuna search under the Step 1 combined metric. The improvement over earlier targeted candidates is modest, and IBS changed very little across top trials, so the gain appears to come mainly from slightly better discrimination rather than a major shift in calibration.'
    },
])

# --- DISPLAY ---
nb_print("\n>>> TAKE-HOME MESSAGE: DEEPSURV TARGETED OPTUNA OPTIMIZATION (BEST COMBINED METRIC = 0.2498)")
pd.set_option('display.max_colwidth', None)
styled_table = (
    config_interpretation.style
    .set_properties(**{
        "text-align": "left",
        "white-space": "pre-wrap",
        "font-size": "14px",
        "vertical-align": "top",
    })
    .set_table_styles([
        {"selector": "th", "props": [("background-color", "#f0f2f6"), ("font-weight", "bold"), ("font-size", "14px")]},
        {"selector": "td", "props": [("padding", "12px"), ("border-bottom", "1px solid #ddd")]},
    ])
)
display(styled_table)
Regularization (Balanced Shield): Dropout: 0.514 | Weight Decay: 0.000703
The updated targeted search winner favors moderate-high dropout together with modest L2 regularization. This combination suggests the model benefits from strong stochastic regularization to suppress noisy co-adaptations, while still using enough weight decay to stabilize the Cox risk surface without over-shrinking it.

Model Capacity (Lean Twin Architecture): Nodes: [128, 128]
The new best targeted-search model is a compact 2-layer architecture rather than a wider funnel. This indicates that, under the narrowed search space and combined-metric objective, a leaner network was sufficient to capture the relevant nonlinear survival structure. In other words, extra depth did not translate into better discrimination-calibration trade-off in this run.

Optimization Mechanics: Batch: 256 | LR: 0.000983
The optimizer again selected batch size 256, reinforcing the earlier pattern that smaller batches are more effective for this DeepSurv setup. The learning rate remained close to 1e-3, which appears to be the stable operating region for navigating the Cox partial likelihood landscape efficiently without becoming erratic.

Performance Context (Best Targeted Combined-Metric Result): Combined Metric: 0.2498 | Weighted Uno C-Index: 0.7518 | Average IBS: 0.0283
This configuration delivered the best result in the targeted Optuna search under the Step 1 combined metric. The improvement over earlier targeted candidates is modest, and IBS changed very little across top trials, so the gain appears to come mainly from slightly better discrimination rather than a major shift in calibration.