Steps

Objectives

General Aim: The main goal of this study is to estimate and compare the risk and time to treatment readmission between adult women admitted to state-funded “women-only” and “mixed-gender” substance use disorder treatment programs in Chile between 2010 and 2019, considering the treatment outcome obtained prior to the first treatment readmission.
- To describe the incidence rate of SUD treatment readmissions among women admitted to women-only treatment program and among women admitted to mixed-gender treatment program.
- To estimate the risk and the time to treatment readmission by type of treatment program (i.e, women-only and mixed-gender treatment programs) conditioned by previous treatment outcome (i.e., administrative discharge, early and late drop-outs, therapeutic discharge).
- To discuss the appropriateness of women-only and mixed-gender treatment programs in light of the results obtained in the statistical analysis.

Database consolidation

We selected adult women at baseline (21,378) in the following variables of interest:

‘Type of program’(tipo_de_programa_2)
‘Marital status’(estado_conyugal_2)
‘Educational Attainment’ (escolaridad_rec)= We selected the most vulnerable category among continuous treatments for the same user.
‘Age at admission to treatment, grouped’(edad_al_ing_grupos)
‘Consumption frequency of primary or main substance’(freq_cons_sus_prin)= Among the categories of the Frequency of Drug Consumption, we grouped Less than 1 day a week (3.16%) with Did not use in the last 30 days (1.80%)
‘Pregnant at admission’(embarazo)
‘Tenure status of households’ (tenencia_de_la_vivienda_mod)= Three categories were collapsed into one single condition: Owner (28.66%), with Transferred dwellings (3.76%) and Pays Dividend (1.79%).
‘Primary or main substance’(sus_principal_mod)= We used the primary substance at admission of the largest treatment for continuous treatments. Every substance was chosen from the largest treatment, excepting cases where the value was not available in the largest treatment.
‘Co-occurring SUD’ (num_otras_sus_mod)= We counted the times that different substances at admission appear for each entry.
‘Number of children (max. Value) (Dichotomized)’(numero_de_hijos_mod_rec)= We selected the number of children with the maximum value among continuous treatments. We decided to dichotomize between no children (12.12%) and children (87.50%).
‘Treatment Setting’(tipo_de_plan_res)= We dichotomized into Residential and Outpatient treatments.

## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 21378 rows [1, 2,
## 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

## Warning in chisq.test(xx, correct = FALSE): Chi-squared approximation may be
## incorrect

## Warning in compare.i(X[, i], y = y, selec.i = selec[i], method.i = method[i], :
## Some levels of 'tipo_de_plan_res' are removed since no observation in that/those
## levels

Table 1. Summary descriptives in Women, Between the Different Programs
Variables	Mixed gender	Women only	Sig.
	N=13170	N=8191
Marital status:			<0.001
Married/Shared living arrangements	4624 (35.1%)	2323 (28.4%)
Separated/Divorced	1736 (13.2%)	869 (10.6%)
Single	6411 (48.7%)	4843 (59.1%)
Widower	371 (2.82%)	150 (1.83%)
‘Missing’	28 (0.21%)	6 (0.07%)
Age at admission to treatment, grouped.:			.
18-29	4519 (34.3%)	3360 (41.0%)
30-39	4273 (32.4%)	2750 (33.6%)
40-49	2650 (20.1%)	1380 (16.8%)
50+	1725 (13.1%)	700 (8.55%)
‘Missing’	3 (0.02%)	1 (0.01%)
Educational Attainment:			<0.001
3-Completed primary school or less	4285 (32.5%)	2636 (32.2%)
2-Completed high school or less	6630 (50.3%)	4277 (52.2%)
1-More than high school	2160 (16.4%)	1258 (15.4%)
‘Missing’	95 (0.72%)	20 (0.24%)
Primary or main substance:			.
Alcohol	4840 (36.8%)	1903 (23.2%)
Cocaine hydrochloride	2526 (19.2%)	1434 (17.5%)
Marijuana	942 (7.15%)	473 (5.77%)
Other	561 (4.26%)	272 (3.32%)
Cocaine paste	4301 (32.7%)	4108 (50.2%)
‘Missing’	0 (0.00%)	1 (0.01%)
Consumption frequency of primary or main substance:			<0.001
Less than 1 day a week	842 (6.39%)	217 (2.65%)
2 to 3 days a week	3888 (29.5%)	1677 (20.5%)
4 to 6 days a week	2051 (15.6%)	1175 (14.3%)
1 day a week or more	1044 (7.93%)	295 (3.60%)
Daily	5252 (39.9%)	4804 (58.6%)
‘Missing’	93 (0.71%)	23 (0.28%)
Biopsychosocial involvement:			0.000
1-Mild	1299 (9.86%)	161 (1.97%)
2-Moderate	7723 (58.6%)	3328 (40.6%)
3-Severe	3870 (29.4%)	4610 (56.3%)
‘Missing’	278 (2.11%)	92 (1.12%)
Tenure status of households:			<0.001
Illegal Settlement	180 (1.37%)	146 (1.78%)
Others	343 (2.60%)	223 (2.72%)
Owner/Transferred dwellings/Pays Dividends	4836 (36.7%)	2469 (30.1%)
Renting	2662 (20.2%)	1348 (16.5%)
Stays temporarily with a relative	4555 (34.6%)	3660 (44.7%)
‘Missing’	594 (4.51%)	345 (4.21%)
Co-occurring SUD:			.
No additional substance	4393 (33.4%)	1726 (21.1%)
One additional substance	4894 (37.2%)	3210 (39.2%)
More than one additional substance	3883 (29.5%)	3254 (39.7%)
‘Missing’	0 (0.00%)	1 (0.01%)
Have children (Dichotomized):			0.003
No	1655 (12.6%)	911 (11.1%)
Yes	11459 (87.0%)	7254 (88.6%)
‘Missing’	56 (0.43%)	26 (0.32%)
Setting of Treatment:			0.000
Outpatient	12567 (95.4%)	4868 (59.4%)
Residential	603 (4.58%)	3323 (40.6%)
Note. Variables of C1 dataset had to be standardized before comparison;
Continuous variables are presented as Medians and Percentiles 25 and 75 were shown;
Categorical variables are presented as number (%)

Imputation

We generated a plot to see all the missing values in the sample.

#<div style="border: 1px solid #ddd; padding: 5px; overflow-y: scroll; height:400px; overflow-x: scroll; width:100%">
library(dplyr)
library(ggplot2)

vector_variables<-
c("tipo_de_programa_2", "estado_conyugal_2", "edad_al_ing_grupos", "escolaridad_rec", "sus_principal_mod", "freq_cons_sus_prin", "compromiso_biopsicosocial", "tenencia_de_la_vivienda_mod", "num_otras_sus_mod", "numero_de_hijos_mod_rec", "motivodeegreso_mod_imp","tipo_de_plan_res")

missing.values<-
CONS_C1_df_dup_SEP_2020_women %>%
  rowwise %>%
  dplyr::mutate_at(.vars = vars(vector_variables),
                   .funs = ~ifelse(is.na(.), 1, 0)) %>% 
  dplyr::ungroup() %>% 
  dplyr::summarise_at(vars(vector_variables),~sum(.))
#t(missing.values)

miss_val_bar<-
melt(missing.values) %>% 
    mutate(perc=scales::percent(value/nrow(CONS_C1_df_dup_SEP_2020_women))) %>% 
    arrange(desc(perc))

plot_miss<-
missing.values %>%
  data.table::melt() %>%  #condicion_ocupacional_corr
  dplyr::filter(!variable %in% c("row", "hash_key", "dias_treat_imp_sin_na", "dup")) %>% 
  dplyr::mutate(perc= value/sum(nrow(CONS_C1_df_dup_SEP_2020_women))) %>% 
  dplyr::mutate(label_text= paste0("Variable= ",variable,"<br>n= ",value,"<br>",scales::percent(round(perc,3)))) %>%
  dplyr::mutate(perc=perc*100) %>% 
  ggplot() +
  geom_bar(aes(x=factor(variable), y=perc,label= label_text), stat = 'identity') +
  sjPlot::theme_sjplot()+
#  scale_y_continuous(limits=c(0,1), labels=percent)+
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size=9))+
  labs(x=NULL, y="% of Missing Values", caption=paste0("Nota. Percentage of missing values (n= ",sum(nrow(CONS_C1_df_dup_SEP_2020_women)),")"))

  ggplotly(plot_miss, tooltip = c("label_text"))%>% layout(xaxis= list(showticklabels = T), height = 600, width=800) %>%   layout(yaxis = list(tickformat='%',  range = c(0, 5)))

Figure 3. Bar plot of Percentage of Missing Values per Variables at Basline

  #</div>

From the figure above, we could see that the Tenure status of households (tenencia_de_la_vivienda_mod) and the Biopsychosocial Involvement (compromiso_biopsicosocial) had a proportion of missing values, but no greater than 5%. This is why we imputed these values under MAR assumption.

vector_variables_only_for_imputation<-
c("row", "hash_key", "tipo_de_programa_2", "estado_conyugal_2", "edad_al_ing_grupos", "escolaridad_rec", "sus_principal_mod", "freq_cons_sus_prin", "compromiso_biopsicosocial", "tenencia_de_la_vivienda_mod", "num_otras_sus_mod", "numero_de_hijos_mod_rec","motivodeegreso_mod_imp","tipo_de_plan_res")

#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:
#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:

  #HACER BASE ESPECIAL QUE CONTENGA UNA VARIABLE DE EDAD DE INICIO DE CONSUMO DE SUSTANCIA PRINCIPAL PARA EQUIPARAR
CONS_C1_df_dup_SEP_2020_women_miss<-
CONS_C1_df_dup_SEP_2020_women %>% 
    #dplyr::group_by(hash_key) %>% 
    #dplyr::mutate(rn=row_number()) %>% 
    #dplyr::ungroup() %>% 
  
  #:#:#:#:#:#:#:#:#:#:#:
  # ORDINALIZAR LAS VARIABLES ORDINALES: 
  dplyr::select_(.dots = vector_variables_only_for_imputation) %>% 
    data.table::data.table()
  
#CONS_C1_df_dup_SEP_2020 %>% janitor::tabyl(evaluacindelprocesoteraputico) 
#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:
#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:

library(Amelia)

amelia_fit <- amelia(CONS_C1_df_dup_SEP_2020_women_miss, 
#Warning message:
#In amcheck(x = x, m = m, idvars = numopts$idvars, priors = priors,  : 
#The number of categories in one of the variables marked nominal has greater than 10 categories. Check nominal specification.
                     m=61, 
                     parallel = "multicore",
                     idvars="row",
                     noms= c("tipo_de_programa_2", "estado_conyugal_2", "sus_principal_mod", "tenencia_de_la_vivienda_mod", "numero_de_hijos_mod_rec","motivodeegreso_mod_imp", "tipo_de_plan_res"),
                     ords= c("edad_al_ing_grupos", "escolaridad_rec", "freq_cons_sus_prin", "compromiso_biopsicosocial", "num_otras_sus_mod"),
                     cs = "hash_key",
                     incheck = TRUE)
# Se sacó el servicio de salud porque tiene mucha información: The number of categories in one of the variables marked nominal has greater than 10 categories. Check nominal specification.

#Error in yy %*% unique(na.omit(x.orig[, i])) :  non-conformable arguments.

Age at Admission to Treatment (in groups)

We started looking over the missing values in the age at admission (in groups) (n=4).

#On this graph, a y = x line indicates the line of perfect agreement; that is, if the imputation model was a perfect predictor of the true value, all the imputations would fall on this line
no_mostrar=0
if(no_mostrar==1){
  res <- { 
    setTimeLimit(nn_K*500)
    ovr_imp_edad_ini_cons<-overimpute(amelia_fit, var = "edad_al_ing_grupos")
  }
}

paste0("Users that had more than one treatment with no date of admission: ",CONS_C1_df_dup_SEP_2020_women_miss %>% 
    dplyr::group_by(hash_key) %>% 
    dplyr::mutate(na_edad_ing=sum(is.na(edad_al_ing_grupos))) %>% 
    dplyr::ungroup() %>% 
    dplyr::filter(na_edad_ing>0) %>% 
    dplyr::group_by(hash_key) %>% 
    dplyr::summarise(n=n()) %>% dplyr::filter(n>1) %>% nrow())

## [1] "Users that had more than one treatment with no date of admission: 0"

#Hay poca relación en las imputaciones.
#table(is.na(CONS_C1_df_dup_SEP_2020_women_not_miss$edad_al_ing),exclude=NULL)

edad_al_ing_grupos_imputed<-
  cbind.data.frame(amelia_fit$imputations$imp1$row,
       amelia_fit$imputations$imp1$edad_al_ing_grupos,
       amelia_fit$imputations$imp2$edad_al_ing_grupos,
       amelia_fit$imputations$imp3$edad_al_ing_grupos,
       amelia_fit$imputations$imp4$edad_al_ing_grupos,
       amelia_fit$imputations$imp5$edad_al_ing_grupos,
       amelia_fit$imputations$imp6$edad_al_ing_grupos,
       amelia_fit$imputations$imp7$edad_al_ing_grupos,
       amelia_fit$imputations$imp8$edad_al_ing_grupos,
       amelia_fit$imputations$imp9$edad_al_ing_grupos,
       amelia_fit$imputations$imp10$edad_al_ing_grupos,
       amelia_fit$imputations$imp11$edad_al_ing_grupos,
       amelia_fit$imputations$imp12$edad_al_ing_grupos,
       amelia_fit$imputations$imp13$edad_al_ing_grupos,
       amelia_fit$imputations$imp14$edad_al_ing_grupos,
       amelia_fit$imputations$imp15$edad_al_ing_grupos,
       amelia_fit$imputations$imp16$edad_al_ing_grupos,
       amelia_fit$imputations$imp17$edad_al_ing_grupos,
       amelia_fit$imputations$imp18$edad_al_ing_grupos,
       amelia_fit$imputations$imp19$edad_al_ing_grupos,
       amelia_fit$imputations$imp20$edad_al_ing_grupos,
       amelia_fit$imputations$imp21$edad_al_ing_grupos,
       amelia_fit$imputations$imp22$edad_al_ing_grupos,
       amelia_fit$imputations$imp23$edad_al_ing_grupos,
       amelia_fit$imputations$imp24$edad_al_ing_grupos,
       amelia_fit$imputations$imp25$edad_al_ing_grupos,
       amelia_fit$imputations$imp26$edad_al_ing_grupos,
       amelia_fit$imputations$imp27$edad_al_ing_grupos,
       amelia_fit$imputations$imp28$edad_al_ing_grupos,
       amelia_fit$imputations$imp29$edad_al_ing_grupos,
       amelia_fit$imputations$imp30$edad_al_ing_grupos,
       amelia_fit$imputations$imp31$edad_al_ing_grupos,
       amelia_fit$imputations$imp32$edad_al_ing_grupos,
       amelia_fit$imputations$imp33$edad_al_ing_grupos,
       amelia_fit$imputations$imp34$edad_al_ing_grupos,
       amelia_fit$imputations$imp35$edad_al_ing_grupos,
       amelia_fit$imputations$imp36$edad_al_ing_grupos,
       amelia_fit$imputations$imp37$edad_al_ing_grupos,
       amelia_fit$imputations$imp38$edad_al_ing_grupos,
       amelia_fit$imputations$imp39$edad_al_ing_grupos,
       amelia_fit$imputations$imp40$edad_al_ing_grupos,
       amelia_fit$imputations$imp41$edad_al_ing_grupos,
       amelia_fit$imputations$imp42$edad_al_ing_grupos,
       amelia_fit$imputations$imp43$edad_al_ing_grupos,
       amelia_fit$imputations$imp44$edad_al_ing_grupos,
       amelia_fit$imputations$imp45$edad_al_ing_grupos,
       amelia_fit$imputations$imp46$edad_al_ing_grupos,
       amelia_fit$imputations$imp47$edad_al_ing_grupos,
       amelia_fit$imputations$imp48$edad_al_ing_grupos,
       amelia_fit$imputations$imp49$edad_al_ing_grupos,
       amelia_fit$imputations$imp50$edad_al_ing_grupos,
       amelia_fit$imputations$imp51$edad_al_ing_grupos,
       amelia_fit$imputations$imp52$edad_al_ing_grupos,
       amelia_fit$imputations$imp53$edad_al_ing_grupos,
       amelia_fit$imputations$imp54$edad_al_ing_grupos,
       amelia_fit$imputations$imp55$edad_al_ing_grupos,
       amelia_fit$imputations$imp56$edad_al_ing_grupos,
       amelia_fit$imputations$imp57$edad_al_ing_grupos,
       amelia_fit$imputations$imp58$edad_al_ing_grupos,
       amelia_fit$imputations$imp59$edad_al_ing_grupos,
       amelia_fit$imputations$imp60$edad_al_ing_grupos,
       amelia_fit$imputations$imp61$edad_al_ing_grupos
        ) %>% 
  melt(id.vars="amelia_fit$imputations$imp1$row") %>% 
  #18-29 30-39 40-49 50+
  janitor::clean_names() %>% 
  dplyr::arrange(amelia_fit_imputations_imp1_row) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(amelia_fit_imputations_imp1_row) %>%
  dplyr::summarise(edad_18_29=sum(value == "18-29",na.rm=T),
                   edad_30_39=sum(value == "30-39",na.rm=T),
                   edad_40_49=sum(value == "40-49",na.rm=T),
                  edad_50mas=sum(value =="50+",na.rm=T)) %>% 
  dplyr::ungroup() %>% 
  #dplyr::mutate(edad_suma = base::rowSums(dplyr::select(is.na(.),starts_with("edad"))))
  dplyr::mutate(ties= base::rowSums(dplyr::select(.,starts_with("edad"))>0)) %>% 
  dplyr::mutate(edad_al_ing_grupos_imp= dplyr::case_when(
      (edad_18_29> edad_30_39) & (edad_18_29> edad_40_49) & (edad_18_29> edad_50mas)~"18-29",
      (edad_30_39> edad_18_29) & (edad_30_39> edad_40_49) & (edad_30_39> edad_50mas)~"30-39",
      (edad_40_49> edad_18_29) & (edad_40_49> edad_30_39) & (edad_40_49> edad_50mas)~"40-49",
      (edad_50mas> edad_18_29) & (edad_50mas> edad_30_39) & (edad_50mas> edad_40_49)~"50+"
      )) 

#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:
# Reemplazo los valores perdidos:
CONS_C1_df_dup_SEP_2020_women_miss0<-
CONS_C1_df_dup_SEP_2020_women_miss %>% 
  dplyr::left_join(edad_al_ing_grupos_imputed,by=c("row"="amelia_fit_imputations_imp1_row")) %>% 
  #si la edad al ingreso no existe, el valor promedio imutado es
  dplyr::mutate(edad_al_ing_grupos=dplyr::case_when(is.na(edad_al_ing_grupos)~edad_al_ing_grupos_imp,
                                                    T~as.character(edad_al_ing_grupos))) %>% 
  dplyr::select(-edad_18_29, -edad_30_39, -edad_40_49, -edad_50mas, -ties, -edad_al_ing_grupos_imp)

if(nrow(CONS_C1_df_dup_SEP_2020_women_miss0)-nrow(CONS_C1_df_dup_SEP_2020_women_miss)>0){
  warning("AGS: Some rows were added in the imputation")}

After the imputation, there were no missing cases left.

Primary or main substance

Then we imputed the primary/main substance at admission (n= 1).

# Ver distintos valores propuestos para sustancia de inciio
sus_principal_mod_imputed<-
 cbind.data.frame(amelia_fit$imputations$imp1$row,
       amelia_fit$imputations$imp1$sus_principal_mod,
       amelia_fit$imputations$imp2$sus_principal_mod,
       amelia_fit$imputations$imp3$sus_principal_mod,
       amelia_fit$imputations$imp4$sus_principal_mod,
       amelia_fit$imputations$imp5$sus_principal_mod,
       amelia_fit$imputations$imp6$sus_principal_mod,
       amelia_fit$imputations$imp7$sus_principal_mod,
       amelia_fit$imputations$imp8$sus_principal_mod,
       amelia_fit$imputations$imp9$sus_principal_mod,
       amelia_fit$imputations$imp10$sus_principal_mod,
       amelia_fit$imputations$imp11$sus_principal_mod,
       amelia_fit$imputations$imp12$sus_principal_mod,
       amelia_fit$imputations$imp13$sus_principal_mod,
       amelia_fit$imputations$imp14$sus_principal_mod,
       amelia_fit$imputations$imp15$sus_principal_mod,
       amelia_fit$imputations$imp16$sus_principal_mod,
       amelia_fit$imputations$imp17$sus_principal_mod,
       amelia_fit$imputations$imp18$sus_principal_mod,
       amelia_fit$imputations$imp19$sus_principal_mod,
       amelia_fit$imputations$imp20$sus_principal_mod,
       amelia_fit$imputations$imp21$sus_principal_mod,
       amelia_fit$imputations$imp22$sus_principal_mod,
       amelia_fit$imputations$imp23$sus_principal_mod,
       amelia_fit$imputations$imp24$sus_principal_mod,
       amelia_fit$imputations$imp25$sus_principal_mod,
       amelia_fit$imputations$imp26$sus_principal_mod,
       amelia_fit$imputations$imp27$sus_principal_mod,
       amelia_fit$imputations$imp28$sus_principal_mod,
       amelia_fit$imputations$imp29$sus_principal_mod,
       amelia_fit$imputations$imp30$sus_principal_mod,
       amelia_fit$imputations$imp31$sus_principal_mod,
       amelia_fit$imputations$imp32$sus_principal_mod,
       amelia_fit$imputations$imp33$sus_principal_mod,
       amelia_fit$imputations$imp34$sus_principal_mod,
       amelia_fit$imputations$imp35$sus_principal_mod,
       amelia_fit$imputations$imp36$sus_principal_mod,
       amelia_fit$imputations$imp37$sus_principal_mod,
       amelia_fit$imputations$imp38$sus_principal_mod,
       amelia_fit$imputations$imp39$sus_principal_mod,
       amelia_fit$imputations$imp40$sus_principal_mod,
       amelia_fit$imputations$imp41$sus_principal_mod,
       amelia_fit$imputations$imp42$sus_principal_mod,
       amelia_fit$imputations$imp43$sus_principal_mod,
       amelia_fit$imputations$imp44$sus_principal_mod,
       amelia_fit$imputations$imp45$sus_principal_mod,
       amelia_fit$imputations$imp46$sus_principal_mod,
       amelia_fit$imputations$imp47$sus_principal_mod,
       amelia_fit$imputations$imp48$sus_principal_mod,
       amelia_fit$imputations$imp49$sus_principal_mod,
       amelia_fit$imputations$imp50$sus_principal_mod,
       amelia_fit$imputations$imp51$sus_principal_mod,
       amelia_fit$imputations$imp52$sus_principal_mod,
       amelia_fit$imputations$imp53$sus_principal_mod,
       amelia_fit$imputations$imp54$sus_principal_mod,
       amelia_fit$imputations$imp55$sus_principal_mod,
       amelia_fit$imputations$imp56$sus_principal_mod,
       amelia_fit$imputations$imp57$sus_principal_mod,
       amelia_fit$imputations$imp58$sus_principal_mod,
       amelia_fit$imputations$imp59$sus_principal_mod,
       amelia_fit$imputations$imp60$sus_principal_mod,
       amelia_fit$imputations$imp61$sus_principal_mod
       )  %>% 
  melt(id.vars="amelia_fit$imputations$imp1$row") %>% 
  #18-29 30-39 40-49 50+
  janitor::clean_names() %>% 
  dplyr::arrange(amelia_fit_imputations_imp1_row) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(amelia_fit_imputations_imp1_row) %>%
  dplyr::summarise(sus_prin_mar=sum(value == "Marijuana",na.rm=T),
                   sus_prin_oh=sum(value == "Alcohol",na.rm=T),
                   sus_prin_pb=sum(value == "Cocaine paste",na.rm=T),
                  sus_prin_coc=sum(value =="Cocaine hydrochloride",na.rm=T),
                  sus_prin_other=sum(value =="Other",na.rm=T)) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(ties= base::rowSums(dplyr::select(.,starts_with("sus_prin_"))>0)) %>% 
  dplyr::mutate(sus_principal_mod_imp= dplyr::case_when(
  (sus_prin_mar> sus_prin_oh)& (sus_prin_mar> sus_prin_pb)& (sus_prin_mar> sus_prin_coc)& (sus_prin_mar> sus_prin_other)~"Marijuana",
  (sus_prin_oh> sus_prin_mar)& (sus_prin_oh> sus_prin_pb)& (sus_prin_oh> sus_prin_coc)& (sus_prin_oh> sus_prin_other)~"Alcohol",
  (sus_prin_pb> sus_prin_mar)& (sus_prin_pb> sus_prin_oh)& (sus_prin_pb> sus_prin_coc)& (sus_prin_pb> sus_prin_other)~"Cocaine paste",
  (sus_prin_coc> sus_prin_mar)& (sus_prin_coc> sus_prin_oh)& (sus_prin_coc> sus_prin_pb)& (sus_prin_coc> sus_prin_other)~"Cocaine hydrochloride",
  (sus_prin_other> sus_prin_mar)& (sus_prin_other> sus_prin_oh)& (sus_prin_other> sus_prin_pb)& (sus_prin_other> sus_prin_coc)~"Cocaine hydrochloride"
  ))

## `summarise()` ungrouping output (override with `.groups` argument)

#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:
CONS_C1_df_dup_SEP_2020_women_miss1<-
CONS_C1_df_dup_SEP_2020_women_miss0 %>% 
   dplyr::left_join(sus_principal_mod_imputed, by=c("row"="amelia_fit_imputations_imp1_row")) %>% 
    dplyr::mutate(sus_principal_mod=factor(dplyr::case_when(is.na(sus_principal_mod)~as.character(sus_principal_mod_imp),
                                 TRUE~as.character(sus_principal_mod)))) %>% 
  dplyr::select(-c(sus_prin_mar, sus_prin_oh, sus_prin_pb, sus_prin_coc, sus_prin_other, ties, sus_principal_mod_imp)) %>% 
  data.table()
#_#_#_#_#_#_#__#_##_#_#_#_#_#_#_#_#_#_#_#_#__#_##_#_#_#_#_##_#_#_#_#_#_#__#_##_#_#_#_#_#_#_#_#_#_#_#_#__#_##_#_#_#_#_#
if(nrow(CONS_C1_df_dup_SEP_2020_women_miss1)-nrow(CONS_C1_df_dup_SEP_2020_women_miss0)>0){
  warning("AGS: Some rows were added in the imputation")}

As a result of the imputations, there were no missing values.

Co-occurring SUD

Another variable worth imputing is the presence of additional Substance Use Disorders (n= 1).

#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:
#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:
num_otras_sus_mod_imputed<-
  cbind.data.frame(amelia_fit$imputations$imp1$row,
       amelia_fit$imputations$imp1$num_otras_sus_mod,
       amelia_fit$imputations$imp2$num_otras_sus_mod,
       amelia_fit$imputations$imp3$num_otras_sus_mod,
       amelia_fit$imputations$imp4$num_otras_sus_mod,
       amelia_fit$imputations$imp5$num_otras_sus_mod,
       amelia_fit$imputations$imp6$num_otras_sus_mod,
       amelia_fit$imputations$imp7$num_otras_sus_mod,
       amelia_fit$imputations$imp8$num_otras_sus_mod,
       amelia_fit$imputations$imp9$num_otras_sus_mod,
       amelia_fit$imputations$imp10$num_otras_sus_mod,
       amelia_fit$imputations$imp11$num_otras_sus_mod,
       amelia_fit$imputations$imp12$num_otras_sus_mod,
       amelia_fit$imputations$imp13$num_otras_sus_mod,
       amelia_fit$imputations$imp14$num_otras_sus_mod,
       amelia_fit$imputations$imp15$num_otras_sus_mod,
       amelia_fit$imputations$imp16$num_otras_sus_mod,
       amelia_fit$imputations$imp17$num_otras_sus_mod,
       amelia_fit$imputations$imp18$num_otras_sus_mod,
       amelia_fit$imputations$imp19$num_otras_sus_mod,
       amelia_fit$imputations$imp20$num_otras_sus_mod,
       amelia_fit$imputations$imp21$num_otras_sus_mod,
       amelia_fit$imputations$imp22$num_otras_sus_mod,
       amelia_fit$imputations$imp23$num_otras_sus_mod,
       amelia_fit$imputations$imp24$num_otras_sus_mod,
       amelia_fit$imputations$imp25$num_otras_sus_mod,
       amelia_fit$imputations$imp26$num_otras_sus_mod,
       amelia_fit$imputations$imp27$num_otras_sus_mod,
       amelia_fit$imputations$imp28$num_otras_sus_mod,
       amelia_fit$imputations$imp29$num_otras_sus_mod,
       amelia_fit$imputations$imp30$num_otras_sus_mod,
       amelia_fit$imputations$imp31$num_otras_sus_mod,
       amelia_fit$imputations$imp32$num_otras_sus_mod,
       amelia_fit$imputations$imp33$num_otras_sus_mod,
       amelia_fit$imputations$imp34$num_otras_sus_mod,
       amelia_fit$imputations$imp35$num_otras_sus_mod,
       amelia_fit$imputations$imp36$num_otras_sus_mod,
       amelia_fit$imputations$imp37$num_otras_sus_mod,
       amelia_fit$imputations$imp38$num_otras_sus_mod,
       amelia_fit$imputations$imp39$num_otras_sus_mod,
       amelia_fit$imputations$imp40$num_otras_sus_mod,
       amelia_fit$imputations$imp41$num_otras_sus_mod,
       amelia_fit$imputations$imp42$num_otras_sus_mod,
       amelia_fit$imputations$imp43$num_otras_sus_mod,
       amelia_fit$imputations$imp44$num_otras_sus_mod,
       amelia_fit$imputations$imp45$num_otras_sus_mod,
       amelia_fit$imputations$imp46$num_otras_sus_mod,
       amelia_fit$imputations$imp47$num_otras_sus_mod,
       amelia_fit$imputations$imp48$num_otras_sus_mod,
       amelia_fit$imputations$imp49$num_otras_sus_mod,
       amelia_fit$imputations$imp50$num_otras_sus_mod,
       amelia_fit$imputations$imp51$num_otras_sus_mod,
       amelia_fit$imputations$imp52$num_otras_sus_mod,
       amelia_fit$imputations$imp53$num_otras_sus_mod,
       amelia_fit$imputations$imp54$num_otras_sus_mod,
       amelia_fit$imputations$imp55$num_otras_sus_mod,
       amelia_fit$imputations$imp56$num_otras_sus_mod,
       amelia_fit$imputations$imp57$num_otras_sus_mod,
       amelia_fit$imputations$imp58$num_otras_sus_mod,
       amelia_fit$imputations$imp59$num_otras_sus_mod,
       amelia_fit$imputations$imp60$num_otras_sus_mod,
       amelia_fit$imputations$imp61$num_otras_sus_mod
       ) %>% 
  melt(id.vars="amelia_fit$imputations$imp1$row") %>% 
  janitor::clean_names() %>% 
  dplyr::arrange(amelia_fit_imputations_imp1_row) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(amelia_fit_imputations_imp1_row) %>%
  dplyr::summarise(no_ad_subs=sum(value == "No additional substance",na.rm=T),
                   one_ad_subs=sum(value == "One additional substance",na.rm=T),
                   more_one_ad_subs=sum(value == "More than one additional substance",na.rm=T)) %>% 
  dplyr::ungroup() %>% 
# Hacer la variable imputada
  dplyr::mutate(num_otras_sus_mod_imp= dplyr::case_when(
      (no_ad_subs>one_ad_subs)&(no_ad_subs>more_one_ad_subs)~"No additional substance",
      (one_ad_subs>no_ad_subs)&(one_ad_subs>more_one_ad_subs)~"One additional substance",
      (more_one_ad_subs>no_ad_subs)&(more_one_ad_subs>one_ad_subs)~"More than one additional substance",
      T~NA_character_))

#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:
#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:

CONS_C1_df_dup_SEP_2020_women_miss2<-
  CONS_C1_df_dup_SEP_2020_women_miss1 %>% 
  dplyr::left_join(num_otras_sus_mod_imputed[,c("amelia_fit_imputations_imp1_row","num_otras_sus_mod_imp")],
                   by=c("row"="amelia_fit_imputations_imp1_row")) %>% 
  #si la edad al ingreso no existe, el valor promedio imutado es
  dplyr::mutate(num_otras_sus_mod= 
                  dplyr::case_when(is.na(num_otras_sus_mod)~num_otras_sus_mod_imp,
                                  T~as.character(num_otras_sus_mod))) %>% 
  dplyr::select(-num_otras_sus_mod_imp)

#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:

#is.na(edad_ini_cons) & is.na(edad_ini_sus_prin) & is.na(min_edad_al_ing)~as.numeric(avg),
#table(is.na(CONS_C1_df_dup_SEP_2020_women_miss1$edad_ini_cons))
paste0("Number of rows with values that did not fulfilled the conditions: ",CONS_C1_df_dup_SEP_2020_women_miss2 %>%  dplyr::filter(is.na(num_otras_sus_mod)) %>% 
    dplyr::select(hash_key, edad_al_ing_grupos,num_otras_sus_mod) %>% nrow())

## [1] "Number of rows with values that did not fulfilled the conditions: 0"

#Lo importante es tener en cuenta que las imputaciones se hicieron por filas; no, en cambio, ahora debemos reemplazar aquellos casos que tienen perdidos (no cumplieron con las condiciones) con el valor mínimo
if(nrow(CONS_C1_df_dup_SEP_2020_women_miss2)-nrow(CONS_C1_df_dup_SEP_2020_women_miss1)>0){
  warning("AGS: Some rows were added in the imputation")}

As a result of the imputations, there were 0 values of co-occurring SUDs available.

Frequency of Use of the Primary Substance at Admission

Another variable that is worth imputing is the Frequency of use of primary drug at admission (n= 116). In case of ties, we selected the imputed values with the value with the most frequent drug use.

# Ver distintos valores propuestos para sustancia de inciio
freq_cons_sus_prin_imputed<-
 cbind.data.frame(amelia_fit$imputations$imp1$row,
       amelia_fit$imputations$imp1$freq_cons_sus_prin,
       amelia_fit$imputations$imp2$freq_cons_sus_prin,
       amelia_fit$imputations$imp3$freq_cons_sus_prin,
       amelia_fit$imputations$imp4$freq_cons_sus_prin,
       amelia_fit$imputations$imp5$freq_cons_sus_prin,
       amelia_fit$imputations$imp6$freq_cons_sus_prin,
       amelia_fit$imputations$imp7$freq_cons_sus_prin,
       amelia_fit$imputations$imp8$freq_cons_sus_prin,
       amelia_fit$imputations$imp9$freq_cons_sus_prin,
       amelia_fit$imputations$imp10$freq_cons_sus_prin,
       amelia_fit$imputations$imp11$freq_cons_sus_prin,
       amelia_fit$imputations$imp12$freq_cons_sus_prin,
       amelia_fit$imputations$imp13$freq_cons_sus_prin,
       amelia_fit$imputations$imp14$freq_cons_sus_prin,
       amelia_fit$imputations$imp15$freq_cons_sus_prin,
       amelia_fit$imputations$imp16$freq_cons_sus_prin,
       amelia_fit$imputations$imp17$freq_cons_sus_prin,
       amelia_fit$imputations$imp18$freq_cons_sus_prin,
       amelia_fit$imputations$imp19$freq_cons_sus_prin,
       amelia_fit$imputations$imp20$freq_cons_sus_prin,
       amelia_fit$imputations$imp21$freq_cons_sus_prin,
       amelia_fit$imputations$imp22$freq_cons_sus_prin,
       amelia_fit$imputations$imp23$freq_cons_sus_prin,
       amelia_fit$imputations$imp24$freq_cons_sus_prin,
       amelia_fit$imputations$imp25$freq_cons_sus_prin,
       amelia_fit$imputations$imp26$freq_cons_sus_prin,
       amelia_fit$imputations$imp27$freq_cons_sus_prin,
       amelia_fit$imputations$imp28$freq_cons_sus_prin,
       amelia_fit$imputations$imp29$freq_cons_sus_prin,
       amelia_fit$imputations$imp30$freq_cons_sus_prin,
       amelia_fit$imputations$imp31$freq_cons_sus_prin,
       amelia_fit$imputations$imp32$freq_cons_sus_prin,
       amelia_fit$imputations$imp33$freq_cons_sus_prin,
       amelia_fit$imputations$imp34$freq_cons_sus_prin,
       amelia_fit$imputations$imp35$freq_cons_sus_prin,
       amelia_fit$imputations$imp36$freq_cons_sus_prin,
       amelia_fit$imputations$imp37$freq_cons_sus_prin,
       amelia_fit$imputations$imp38$freq_cons_sus_prin,
       amelia_fit$imputations$imp39$freq_cons_sus_prin,
       amelia_fit$imputations$imp40$freq_cons_sus_prin,
       amelia_fit$imputations$imp41$freq_cons_sus_prin,
       amelia_fit$imputations$imp42$freq_cons_sus_prin,
       amelia_fit$imputations$imp43$freq_cons_sus_prin,
       amelia_fit$imputations$imp44$freq_cons_sus_prin,
       amelia_fit$imputations$imp45$freq_cons_sus_prin,
       amelia_fit$imputations$imp46$freq_cons_sus_prin,
       amelia_fit$imputations$imp47$freq_cons_sus_prin,
       amelia_fit$imputations$imp48$freq_cons_sus_prin,
       amelia_fit$imputations$imp49$freq_cons_sus_prin,
       amelia_fit$imputations$imp50$freq_cons_sus_prin,
       amelia_fit$imputations$imp51$freq_cons_sus_prin,
       amelia_fit$imputations$imp52$freq_cons_sus_prin,
       amelia_fit$imputations$imp53$freq_cons_sus_prin,
       amelia_fit$imputations$imp54$freq_cons_sus_prin,
       amelia_fit$imputations$imp55$freq_cons_sus_prin,
       amelia_fit$imputations$imp56$freq_cons_sus_prin,
       amelia_fit$imputations$imp57$freq_cons_sus_prin,
       amelia_fit$imputations$imp58$freq_cons_sus_prin,
       amelia_fit$imputations$imp59$freq_cons_sus_prin,
       amelia_fit$imputations$imp60$freq_cons_sus_prin,
       amelia_fit$imputations$imp61$freq_cons_sus_prin
       ) 

freq_cons_sus_prin_imputed<-
freq_cons_sus_prin_imputed %>% 
  data.frame() %>% 
dplyr::mutate(across(c(amelia_fit.imputations.imp1.freq_cons_sus_prin:amelia_fit.imputations.imp30.freq_cons_sus_prin),~dplyr::case_when(grepl("1 day a week or more",as.character(.))~1,TRUE~0), .names="1_day_{col}"))%>%
dplyr::mutate(across(c(amelia_fit.imputations.imp1.freq_cons_sus_prin:amelia_fit.imputations.imp30.freq_cons_sus_prin),~dplyr::case_when(grepl("2 to 3 days a week",as.character(.))~1,TRUE~0), .names="2_3_{col}"))%>%
dplyr::mutate(across(c(amelia_fit.imputations.imp1.freq_cons_sus_prin:amelia_fit.imputations.imp30.freq_cons_sus_prin),~dplyr::case_when(grepl("4 to 6 days a week",as.character(.))~1,TRUE~0), .names="4_6_{col}"))%>%
  dplyr::mutate(across(c(amelia_fit.imputations.imp1.freq_cons_sus_prin:amelia_fit.imputations.imp30.freq_cons_sus_prin),~dplyr::case_when(grepl("Less than 1 day a week",as.character(.))~1,TRUE~0), .names="less_1_{col}"))%>%
  dplyr::mutate(across(c(amelia_fit.imputations.imp1.freq_cons_sus_prin:amelia_fit.imputations.imp30.freq_cons_sus_prin),~dplyr::case_when(grepl("Did not use",as.character(.))~1,TRUE~0), .names="did_not_{col}"))%>%
    dplyr::mutate(across(c(amelia_fit.imputations.imp1.freq_cons_sus_prin:amelia_fit.imputations.imp30.freq_cons_sus_prin),~dplyr::case_when(grepl("Daily",as.character(.))~1,TRUE~0), .names="daily_{col}"))%>%
  dplyr::mutate(freq_cons_sus_prin_daily = base::rowSums(dplyr::select(., starts_with("daily_")))) %>% 
  dplyr::mutate(freq_cons_sus_prin_4_6 = base::rowSums(dplyr::select(., starts_with("4_6_"))))%>%
  dplyr::mutate(freq_cons_sus_prin_2_3 = base::rowSums(dplyr::select(., starts_with("2_3_"))))%>%
  dplyr::mutate(freq_cons_sus_prin_1_day = base::rowSums(dplyr::select(., starts_with("1_day_"))))%>%
  dplyr::mutate(freq_cons_sus_prin_less_1 = base::rowSums(dplyr::select(., starts_with("less_1_"))))%>%
  dplyr::mutate(freq_cons_sus_prin_did_not = base::rowSums(dplyr::select(., starts_with("did_not_")))) %>% 
  #dplyr::summarise(min_mar=max(sus_ini_mod_mvv_mar[sus_ini_mod_mvv_mar<30]),min_oh=max(sus_ini_mod_mvv_oh[sus_ini_mod_mvv_oh<30]),min_pb=max(sus_ini_mod_mvv_pb[sus_ini_mod_mvv_pb<30]),min_coc=max(sus_ini_mod_mvv_coc[sus_ini_mod_mvv_coc<30]),min_otr=max(sus_ini_mod_mvv_otr[sus_ini_mod_mvv_otr<30]))
  dplyr::mutate(freq_cons_sus_prin_tot=dplyr::case_when(freq_cons_sus_prin_1_day>0~1,TRUE~0)) %>% 
  dplyr::mutate(freq_cons_sus_prin_tot=dplyr::case_when(freq_cons_sus_prin_2_3>0~freq_cons_sus_prin_tot+1,TRUE~freq_cons_sus_prin_tot)) %>% 
  dplyr::mutate(freq_cons_sus_prin_tot=dplyr::case_when(freq_cons_sus_prin_4_6>0~freq_cons_sus_prin_tot+1,TRUE~freq_cons_sus_prin_tot)) %>% 
  dplyr::mutate(freq_cons_sus_prin_tot=dplyr::case_when(freq_cons_sus_prin_less_1>0~freq_cons_sus_prin_tot+1,TRUE~freq_cons_sus_prin_tot)) %>% 
  dplyr::mutate(freq_cons_sus_prin_tot=dplyr::case_when(freq_cons_sus_prin_did_not>0~freq_cons_sus_prin_tot+1,TRUE~freq_cons_sus_prin_tot)) %>% 
  dplyr::mutate(freq_cons_sus_prin_tot=dplyr::case_when(freq_cons_sus_prin_daily>0~freq_cons_sus_prin_tot+1,TRUE~freq_cons_sus_prin_tot)) %>% 
  #hierarchy
  dplyr::mutate(freq_cons_sus_prin_to_imputation=
                  dplyr::case_when(freq_cons_sus_prin_tot==1 & freq_cons_sus_prin_daily>0~"Daily",
                                     freq_cons_sus_prin_tot==1 & freq_cons_sus_prin_4_6>0~"4 to 6 days a week",freq_cons_sus_prin_tot==1 & freq_cons_sus_prin_2_3>0~"2 to 3 days a week",freq_cons_sus_prin_tot==1 & freq_cons_sus_prin_1_day>0~"1 day a week or more",freq_cons_sus_prin_tot==1 & freq_cons_sus_prin_less_1>0~"Less than 1 day a week",freq_cons_sus_prin_tot==1 & freq_cons_sus_prin_did_not>0~"Did not use",freq_cons_sus_prin_tot>1 & freq_cons_sus_prin_daily>0~"Daily",freq_cons_sus_prin_tot>1 & freq_cons_sus_prin_4_6>0~"4 to 6 days a week",freq_cons_sus_prin_tot>1 & freq_cons_sus_prin_2_3>0~"2 to 3 days a week",freq_cons_sus_prin_tot>1 & freq_cons_sus_prin_1_day>0~"1 day a week or more",freq_cons_sus_prin_tot>1 & freq_cons_sus_prin_less_1>0~"Less than 1 day a week",freq_cons_sus_prin_tot>1 & freq_cons_sus_prin_did_not>0~"Did not use")) %>% 
  janitor::clean_names()

freq_cons_sus_prin_imputed<-
dplyr::select(freq_cons_sus_prin_imputed,amelia_fit_imputations_imp1_row,freq_cons_sus_prin_to_imputation)

#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:

CONS_C1_df_dup_SEP_2020_women_miss3<-
CONS_C1_df_dup_SEP_2020_women_miss2 %>% 
   dplyr::left_join(freq_cons_sus_prin_imputed, by=c("row"="amelia_fit_imputations_imp1_row")) %>% 
    dplyr::mutate(freq_cons_sus_prin=factor(dplyr::case_when(is.na(freq_cons_sus_prin)~as.character(freq_cons_sus_prin_to_imputation), TRUE~as.character(freq_cons_sus_prin)))) %>% 
  dplyr::select(-freq_cons_sus_prin_to_imputation) %>% 
  data.table()
if(nrow(CONS_C1_df_dup_SEP_2020_women_miss3)-nrow(CONS_C1_df_dup_SEP_2020_women_miss2)>0){
  warning("AGS: Some rows were added in the imputation")}

As a result of the imputations, there were no missing values.

Educational Attainment

Another variable that is worth imputing is the Educational Attainment (n= 115). In case of ties, we imputed for the most vulnerable category among the candidates.

# Ver distintos valores propuestos para sustancia de inciio
escolaridad_rec_imputed<-
 cbind.data.frame(amelia_fit$imputations$imp1$row,
                  amelia_fit$imputations$imp1$escolaridad_rec,
                  amelia_fit$imputations$imp2$escolaridad_rec,
                  amelia_fit$imputations$imp3$escolaridad_rec,
                  amelia_fit$imputations$imp4$escolaridad_rec,
                  amelia_fit$imputations$imp5$escolaridad_rec,
                  amelia_fit$imputations$imp6$escolaridad_rec,
                  amelia_fit$imputations$imp7$escolaridad_rec,
                  amelia_fit$imputations$imp8$escolaridad_rec,
                  amelia_fit$imputations$imp9$escolaridad_rec,
                  amelia_fit$imputations$imp10$escolaridad_rec,
                  amelia_fit$imputations$imp11$escolaridad_rec,
                  amelia_fit$imputations$imp12$escolaridad_rec,
                  amelia_fit$imputations$imp13$escolaridad_rec,
                  amelia_fit$imputations$imp14$escolaridad_rec,
                  amelia_fit$imputations$imp15$escolaridad_rec,
                  amelia_fit$imputations$imp16$escolaridad_rec,
                  amelia_fit$imputations$imp17$escolaridad_rec,
                  amelia_fit$imputations$imp18$escolaridad_rec,
                  amelia_fit$imputations$imp19$escolaridad_rec,
                  amelia_fit$imputations$imp20$escolaridad_rec,
                  amelia_fit$imputations$imp21$escolaridad_rec,
                  amelia_fit$imputations$imp22$escolaridad_rec,
                  amelia_fit$imputations$imp23$escolaridad_rec,
                  amelia_fit$imputations$imp24$escolaridad_rec,
                  amelia_fit$imputations$imp25$escolaridad_rec,
                  amelia_fit$imputations$imp26$escolaridad_rec,
                  amelia_fit$imputations$imp27$escolaridad_rec,
                  amelia_fit$imputations$imp28$escolaridad_rec,
                  amelia_fit$imputations$imp29$escolaridad_rec,
                  amelia_fit$imputations$imp30$escolaridad_rec,
                  amelia_fit$imputations$imp31$escolaridad_rec,
                  amelia_fit$imputations$imp32$escolaridad_rec,
                  amelia_fit$imputations$imp33$escolaridad_rec,
                  amelia_fit$imputations$imp34$escolaridad_rec,
                  amelia_fit$imputations$imp35$escolaridad_rec,
                  amelia_fit$imputations$imp36$escolaridad_rec,
                  amelia_fit$imputations$imp37$escolaridad_rec,
                  amelia_fit$imputations$imp38$escolaridad_rec,
                  amelia_fit$imputations$imp39$escolaridad_rec,
                  amelia_fit$imputations$imp40$escolaridad_rec,
                  amelia_fit$imputations$imp41$escolaridad_rec,
                  amelia_fit$imputations$imp42$escolaridad_rec,
                  amelia_fit$imputations$imp43$escolaridad_rec,
                  amelia_fit$imputations$imp44$escolaridad_rec,
                  amelia_fit$imputations$imp45$escolaridad_rec,
                  amelia_fit$imputations$imp46$escolaridad_rec,
                  amelia_fit$imputations$imp47$escolaridad_rec,
                  amelia_fit$imputations$imp48$escolaridad_rec,
                  amelia_fit$imputations$imp49$escolaridad_rec,
                  amelia_fit$imputations$imp50$escolaridad_rec,
                  amelia_fit$imputations$imp51$escolaridad_rec,
                  amelia_fit$imputations$imp52$escolaridad_rec,
                  amelia_fit$imputations$imp53$escolaridad_rec,
                  amelia_fit$imputations$imp54$escolaridad_rec,
                  amelia_fit$imputations$imp55$escolaridad_rec,
                  amelia_fit$imputations$imp56$escolaridad_rec,
                  amelia_fit$imputations$imp57$escolaridad_rec,
                  amelia_fit$imputations$imp58$escolaridad_rec,
                  amelia_fit$imputations$imp59$escolaridad_rec,
                  amelia_fit$imputations$imp60$escolaridad_rec,
                  amelia_fit$imputations$imp61$escolaridad_rec) %>% 
  melt(id.vars="amelia_fit$imputations$imp1$row") %>% 
  janitor::clean_names() %>% 
  dplyr::arrange(amelia_fit_imputations_imp1_row) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(amelia_fit_imputations_imp1_row) %>%
  # 1-Mild 2-Moderate   3-Severe 
  dplyr::summarise(primary_3=sum(value == "3-Completed primary school or less",na.rm=T),
                   high_sc_2=sum(value == "2-Completed high school or less",na.rm=T),
                  more_h_sc_1=sum(value =="1-More than high school",na.rm=T)) %>% 
  dplyr::ungroup() %>%
    dplyr::mutate(escolaridad_rec_imp= dplyr::case_when(
      (more_h_sc_1>primary_3) & (more_h_sc_1>high_sc_2)~"1-More than high school",
      (high_sc_2>primary_3) & (high_sc_2>more_h_sc_1)~"2-Completed high school or less",
      (primary_3>more_h_sc_1) & (primary_3>high_sc_2)~"3-Completed primary school or less"
      )) %>% 
#2) Resolve ties    
  dplyr::mutate(ties= dplyr::case_when(is.na(escolaridad_rec_imp)~1,T~0)) %>% 
  dplyr::mutate(escolaridad_rec_imp= dplyr::case_when(ties==1 & ((primary_3>more_h_sc_1)|(primary_3>high_sc_2))~"3-Completed primary school or less", ties==1 & ((high_sc_2>primary_3)|(high_sc_2>more_h_sc_1))~"2-Completed high school or less",
                T~escolaridad_rec_imp))

## `summarise()` ungrouping output (override with `.groups` argument)

#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:
#CONS_C1_df_dup_SEP_2020 %>% janitor::tabyl(motivodeegreso_mod_imp,evaluacindelprocesoteraputico)

CONS_C1_df_dup_SEP_2020_women_miss4<-
CONS_C1_df_dup_SEP_2020_women_miss3 %>% 
   dplyr::left_join(escolaridad_rec_imputed[,c("amelia_fit_imputations_imp1_row","escolaridad_rec_imp")], by=c("row"="amelia_fit_imputations_imp1_row")) %>% 
    dplyr::mutate(escolaridad_rec=factor(dplyr::case_when(is.na(escolaridad_rec)~ escolaridad_rec_imp,
                                                                        TRUE~as.character(escolaridad_rec)))) %>% 
     dplyr::mutate(escolaridad_rec=parse_factor(as.character(escolaridad_rec),levels=c("1-More than high school", "2-Completed high school or less","3-Completed primary school or less"), ordered =T,trim_ws=T,include_na =F, locale=locale(encoding = "UTF-8"))) %>% 
  dplyr::select(-escolaridad_rec_imp) %>% 
  data.table()
if(nrow(CONS_C1_df_dup_SEP_2020_women_miss4)-nrow(CONS_C1_df_dup_SEP_2020_women_miss3)>0){
  warning("AGS: Some rows were added in the imputation")}

We ended having 0 missing values in educational attainment.

Additionally, we replaced missing values of the marital status (n=34). Since different marital status were not clearly more vulnerable between each other, we selected the most frequent imputed value among the different imputed databases. Only in case of ties in the candidate values, we resolved them by discarding “married” status, which could be somehow less vulnerable than other categories.

# Ver distintos valores propuestos para estado conyugal
estado_conyugal_2_imputed<-
 cbind.data.frame(amelia_fit$imputations$imp1$row,
       amelia_fit$imputations$imp1$estado_conyugal_2,
       amelia_fit$imputations$imp2$estado_conyugal_2,
       amelia_fit$imputations$imp3$estado_conyugal_2,
       amelia_fit$imputations$imp4$estado_conyugal_2,
       amelia_fit$imputations$imp5$estado_conyugal_2,
       amelia_fit$imputations$imp6$estado_conyugal_2,
       amelia_fit$imputations$imp7$estado_conyugal_2,
       amelia_fit$imputations$imp8$estado_conyugal_2,
       amelia_fit$imputations$imp9$estado_conyugal_2,
       amelia_fit$imputations$imp10$estado_conyugal_2,
       amelia_fit$imputations$imp11$estado_conyugal_2,
       amelia_fit$imputations$imp12$estado_conyugal_2,
       amelia_fit$imputations$imp13$estado_conyugal_2,
       amelia_fit$imputations$imp14$estado_conyugal_2,
       amelia_fit$imputations$imp15$estado_conyugal_2,
       amelia_fit$imputations$imp16$estado_conyugal_2,
       amelia_fit$imputations$imp17$estado_conyugal_2,
       amelia_fit$imputations$imp18$estado_conyugal_2,
       amelia_fit$imputations$imp19$estado_conyugal_2,
       amelia_fit$imputations$imp20$estado_conyugal_2,
       amelia_fit$imputations$imp21$estado_conyugal_2,
       amelia_fit$imputations$imp22$estado_conyugal_2,
       amelia_fit$imputations$imp23$estado_conyugal_2,
       amelia_fit$imputations$imp24$estado_conyugal_2,
       amelia_fit$imputations$imp25$estado_conyugal_2,
       amelia_fit$imputations$imp26$estado_conyugal_2,
       amelia_fit$imputations$imp27$estado_conyugal_2,
       amelia_fit$imputations$imp28$estado_conyugal_2,
       amelia_fit$imputations$imp29$estado_conyugal_2,
       amelia_fit$imputations$imp30$estado_conyugal_2,
       amelia_fit$imputations$imp31$estado_conyugal_2,
       amelia_fit$imputations$imp32$estado_conyugal_2,
       amelia_fit$imputations$imp33$estado_conyugal_2,
       amelia_fit$imputations$imp34$estado_conyugal_2,
       amelia_fit$imputations$imp35$estado_conyugal_2,
       amelia_fit$imputations$imp36$estado_conyugal_2,
       amelia_fit$imputations$imp37$estado_conyugal_2,
       amelia_fit$imputations$imp38$estado_conyugal_2,
       amelia_fit$imputations$imp39$estado_conyugal_2,
       amelia_fit$imputations$imp40$estado_conyugal_2,
       amelia_fit$imputations$imp41$estado_conyugal_2,
       amelia_fit$imputations$imp42$estado_conyugal_2,
       amelia_fit$imputations$imp43$estado_conyugal_2,
       amelia_fit$imputations$imp44$estado_conyugal_2,
       amelia_fit$imputations$imp45$estado_conyugal_2,
       amelia_fit$imputations$imp46$estado_conyugal_2,
       amelia_fit$imputations$imp47$estado_conyugal_2,
       amelia_fit$imputations$imp48$estado_conyugal_2,
       amelia_fit$imputations$imp49$estado_conyugal_2,
       amelia_fit$imputations$imp50$estado_conyugal_2,
       amelia_fit$imputations$imp51$estado_conyugal_2,
       amelia_fit$imputations$imp52$estado_conyugal_2,
       amelia_fit$imputations$imp53$estado_conyugal_2,
       amelia_fit$imputations$imp54$estado_conyugal_2,
       amelia_fit$imputations$imp55$estado_conyugal_2,
       amelia_fit$imputations$imp56$estado_conyugal_2,
       amelia_fit$imputations$imp57$estado_conyugal_2,
       amelia_fit$imputations$imp58$estado_conyugal_2,
       amelia_fit$imputations$imp59$estado_conyugal_2,
       amelia_fit$imputations$imp60$estado_conyugal_2,
       amelia_fit$imputations$imp61$estado_conyugal_2
       ) 

estado_conyugal_2_imputed<-
estado_conyugal_2_imputed %>% 
  data.frame() %>% 
dplyr::mutate(across(c(amelia_fit.imputations.imp1.estado_conyugal_2:amelia_fit.imputations.imp30.estado_conyugal_2),~dplyr::case_when(grepl("Married/Shared living arrangements",as.character(.))~1,TRUE~0), .names="married_{col}"))%>%
dplyr::mutate(across(c(amelia_fit.imputations.imp1.estado_conyugal_2:amelia_fit.imputations.imp30.estado_conyugal_2),~dplyr::case_when(grepl("Separated/Divorced",as.character(.))~1,TRUE~0), .names="sep_div_{col}"))%>%
dplyr::mutate(across(c(amelia_fit.imputations.imp1.estado_conyugal_2:amelia_fit.imputations.imp30.estado_conyugal_2),~dplyr::case_when(grepl("Single",as.character(.))~1,TRUE~0), .names="singl_{col}"))%>%
  dplyr::mutate(across(c(amelia_fit.imputations.imp1.estado_conyugal_2:amelia_fit.imputations.imp30.estado_conyugal_2),~dplyr::case_when(grepl("Widower",as.character(.))~1,TRUE~0), .names="widow_{col}"))%>%
 
  dplyr::mutate(estado_conyugal_2_married = base::rowSums(dplyr::select(., starts_with("married_"))))%>%
  dplyr::mutate(estado_conyugal_2_sep_div = base::rowSums(dplyr::select(., starts_with("sep_div_"))))%>%
  dplyr::mutate(estado_conyugal_2_singl = base::rowSums(dplyr::select(., starts_with("singl_"))))%>%
  dplyr::mutate(estado_conyugal_2_wid = base::rowSums(dplyr::select(., starts_with("widow_"))))%>%
  #dplyr::summarise(min_mar=max(sus_ini_mod_mvv_mar[sus_ini_mod_mvv_mar<30]),min_oh=max(sus_ini_mod_mvv_oh[sus_ini_mod_mvv_oh<30]),min_pb=max(sus_ini_mod_mvv_pb[sus_ini_mod_mvv_pb<30]),min_coc=max(sus_ini_mod_mvv_coc[sus_ini_mod_mvv_coc<30]),min_otr=max(sus_ini_mod_mvv_otr[sus_ini_mod_mvv_otr<30]))
  dplyr::mutate(estado_conyugal_2_tot=dplyr::case_when(estado_conyugal_2_married>0~1,TRUE~0)) %>% 
  dplyr::mutate(estado_conyugal_2_tot=dplyr::case_when(estado_conyugal_2_sep_div>0~estado_conyugal_2_tot+1,TRUE~estado_conyugal_2_tot)) %>% 
  dplyr::mutate(estado_conyugal_2_tot=dplyr::case_when(estado_conyugal_2_singl>0~estado_conyugal_2_tot+1,TRUE~estado_conyugal_2_tot)) %>% 
  dplyr::mutate(estado_conyugal_2_tot=dplyr::case_when(estado_conyugal_2_wid>0~estado_conyugal_2_tot+1,TRUE~estado_conyugal_2_tot)) %>% 
  janitor::clean_names()
  
estado_conyugal_2_imputed_cat_est_cony<-  
    estado_conyugal_2_imputed %>%
        tidyr::pivot_longer(c(estado_conyugal_2_married, estado_conyugal_2_sep_div, estado_conyugal_2_singl, estado_conyugal_2_wid), names_to = "cat_est_conyugal", values_to = "count") %>%
        dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
        dplyr::mutate(estado_conyugal_2_imputed_max=max(count,na.rm=T)) %>% 
        dplyr::ungroup() %>% 
        dplyr::filter(estado_conyugal_2_imputed_max==count) %>% 
        dplyr::select(amelia_fit_imputations_imp1_row,cat_est_conyugal,count) %>% 
        dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
        dplyr::mutate(n_row=n()) %>% 
        dplyr::ungroup() %>% 
        dplyr::mutate(cat_est_conyugal=dplyr::case_when(n_row>1~NA_character_,
                                                        TRUE~cat_est_conyugal)) %>% 
        dplyr::distinct(amelia_fit_imputations_imp1_row,.keep_all = T)
  
estado_conyugal_2_imputed<-
  estado_conyugal_2_imputed %>% 
    dplyr::left_join(estado_conyugal_2_imputed_cat_est_cony, by="amelia_fit_imputations_imp1_row") %>%
    dplyr::mutate(cat_est_conyugal=dplyr::case_when(cat_est_conyugal=="estado_conyugal_2_married"~"Married/Shared living arrangements",cat_est_conyugal=="estado_conyugal_2_sep_div"~"Separated/Divorced",cat_est_conyugal=="estado_conyugal_2_singl"~"Single",cat_est_conyugal=="estado_conyugal_2_wid"~"Widower"
    ))%>% 
  janitor::clean_names()

#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:

CONS_C1_df_dup_SEP_2020_women_miss5_prev<-
CONS_C1_df_dup_SEP_2020_women_miss4 %>% 
   dplyr::left_join(dplyr::select(estado_conyugal_2_imputed,amelia_fit_imputations_imp1_row,cat_est_conyugal), by=c("row"="amelia_fit_imputations_imp1_row")) %>% 
    dplyr::mutate(estado_conyugal_2=factor(dplyr::case_when(is.na(estado_conyugal_2)~as.character(cat_est_conyugal),TRUE~as.character(estado_conyugal_2)))) %>% 
  dplyr::select(-cat_est_conyugal) %>% 
  data.table()

# casos problemáticos de matrimonio c(59664, 17582, 161721, 36520)

no_calzaron_estado_cony<-
CONS_C1_df_dup_SEP_2020_women_miss5_prev %>% dplyr::filter(is.na(estado_conyugal_2)) %>% dplyr::distinct(row) %>% unlist()

estado_conyugal_2_imputed2<-
estado_conyugal_2_imputed %>% 
     dplyr::filter(amelia_fit_imputations_imp1_row %in%  no_calzaron_estado_cony) %>% 
  dplyr::select(amelia_fit_imputations_imp1_row, estado_conyugal_2_married, estado_conyugal_2_sep_div,estado_conyugal_2_singl, estado_conyugal_2_wid, estado_conyugal_2_tot, cat_est_conyugal) %>% 
  melt(id.vars="amelia_fit_imputations_imp1_row") %>% 
  dplyr::mutate(value=as.numeric(value)) %>% 
  dplyr::arrange(amelia_fit_imputations_imp1_row) %>% 
  dplyr::filter(value!="cat_est_conyugal") %>% 
  dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
  slice_max(value, with_ties = T) %>% 
  dplyr::filter(variable!="estado_conyugal_2_married") %>% 
  dplyr::left_join(CONS_C1_df_dup_SEP_2020_women[,c("row","edad_al_ing")], by=c("amelia_fit_imputations_imp1_row"="row")) %>% 
  dplyr::mutate(value=dplyr::case_when(variable=="estado_conyugal_2_sep_div"~value*10,
                                       T~value)) %>% 
  dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
  slice_max(value, with_ties = T) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(marital_status_imp=dplyr::case_when(grepl("_singl",variable)~"Single",
                grepl("_sep_div",variable)~"Separated/Divorced",
                grepl("_married",variable)~"Married/Shared living arrangements",
                grepl("_wid",variable)~"Widower"
                ))

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#2nd round of imputation for ties
CONS_C1_df_dup_SEP_2020_women_miss5<-
CONS_C1_df_dup_SEP_2020_women_miss5_prev %>% 
   dplyr::left_join(dplyr::select(estado_conyugal_2_imputed2,amelia_fit_imputations_imp1_row,marital_status_imp), by=c("row"="amelia_fit_imputations_imp1_row")) %>% 
    dplyr::mutate(estado_conyugal_2=factor(dplyr::case_when(is.na(estado_conyugal_2)~as.character(marital_status_imp),TRUE~as.character(estado_conyugal_2)))) %>% 
  dplyr::select(-marital_status_imp) %>% 
  data.table()

#CONS_C1_df_dup_SEP_2020_women_miss5 %>% 
#dplyr::filter(hash_key %in% CONS_C1_df_dup_SEP_2020_women_miss5 %>% dplyr::filter(is.na(estado_conyugal_2)) %>% dplyr::distinct(hash_key) %>% unlist())

if(nrow(CONS_C1_df_dup_SEP_2020_women_miss5)-nrow(CONS_C1_df_dup_SEP_2020_women_miss4)>0){
  warning("AGS: Some rows were added in the imputation")}

We could not resolve Marital status in 0 cases due to ties in the most frequent values.

Cause of Discharge

We looked over possible imputations to the truly missing values, discarding missing values due to censorship (n=3). In case of ties, we replace with the more vulnerable value.

motivo_de_egreso_a_imputar<-
CONS_C1_df_dup_SEP_2020_women_miss %>% dplyr::filter(is.na(motivodeegreso_mod_imp)) %>% dplyr::left_join(dplyr::select(CONS_C1_df_dup_SEP_2020,row,fech_egres_imp)) %>% dplyr::filter(!is.na(fech_egres_imp))%>%dplyr::select(row)

## Joining, by = "row"

#CONS_C1_df_dup_SEP_2020 %>% dplyr::filter(is.na(motivodeegreso_mod_imp)) %>% 
#    dplyr::select(row, hash_key, motivodeegreso_mod_imp, fech_egres_imp)
#    dplyr::filter(fech_egres_imp=="2019-11-13")

motivodeegreso_mod_imp_imputed<-
 cbind.data.frame(amelia_fit$imputations$imp1$row,
       amelia_fit$imputations$imp1$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp2$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp3$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp4$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp5$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp6$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp7$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp8$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp9$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp10$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp11$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp12$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp13$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp14$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp15$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp16$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp17$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp18$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp19$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp20$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp21$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp22$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp23$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp24$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp25$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp26$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp27$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp28$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp29$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp30$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp31$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp32$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp33$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp34$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp35$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp36$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp37$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp38$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp39$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp40$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp41$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp42$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp43$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp44$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp45$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp46$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp47$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp48$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp49$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp50$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp51$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp52$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp53$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp54$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp55$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp56$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp57$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp58$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp59$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp60$motivodeegreso_mod_imp,
       amelia_fit$imputations$imp61$motivodeegreso_mod_imp
       ) %>% 
  melt(id.vars="amelia_fit$imputations$imp1$row") %>% 
  janitor::clean_names() %>% 
  dplyr::arrange(amelia_fit_imputations_imp1_row) %>% 
  dplyr::ungroup() %>% 
  dplyr::filter(amelia_fit_imputations_imp1_row %in% unlist(motivo_de_egreso_a_imputar$row)) %>% 
  #FILTRAR CASOS QUE SON ILÓGICOS: MUERTES CON TRATAMIENTOS POSTERIORES (1)
  dplyr::left_join(dplyr::select(CONS_C1_df_dup_SEP_2020,row,motivodeegreso_mod_imp, fech_egres_imp,dup, duplicates_filtered,evaluacindelprocesoteraputico,fech_ing_next_treat),by=c("amelia_fit_imputations_imp1_row"="row")) %>% 
  dplyr::mutate(value_death=dplyr::case_when(value=="Death"& !is.na(fech_ing_next_treat)~1,T~0)) %>% 
  dplyr::filter(value_death!=1) %>%  
  #FILTRAR CASOS QUE SON ILÓGICOS: NO PUEDEN HABER TRATAMIENTOS EN CURSO CON TRATAMIENTOS POSTERIORES (2)
  dplyr::mutate(value_fail=dplyr::case_when(value=="Ongoing treatment"& !is.na(fech_ing_next_treat)~1,T~0)) %>% 
  dplyr::filter(value_fail!=1) %>%  
  #FILTRAR CASOS QUE SON ILÓGICOS: NO PUEDE HABER OTRA COSA QUE TRATAMIENTO EN CURSO CON FECHA DE CENSURA
  dplyr::mutate(value_ong=dplyr::case_when(value!="Ongoing treatment" & fech_egres_imp=="2019-11-13"~1,T~0)) %>% 
  dplyr::filter(value_ong!=1) %>%  
  #:#:#:#:#:
  dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
  dplyr::summarise(adm_dis=sum(value == "Administrative discharge",na.rm=T),
                    death=sum(value == "Death",na.rm=T),
                    referral=sum(value == "Referral to another treatment",na.rm=T),
                    ther_dis=sum(value == "Therapeutic discharge",na.rm=T),
                    on_treat=sum(value == "Ongoing treatment",na.rm=T),
                    dropout=sum(value =="Drop-out",na.rm=T)) %>% 
  melt(id.vars="amelia_fit_imputations_imp1_row") %>% 
  dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
  dplyr::slice_max(value) %>% 
  dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
  dplyr::mutate(n=n()) %>% 
  dplyr::mutate(emp=dplyr::case_when(variable=="adm_dis" & n>1~1,T~0)) %>% 
  dplyr::filter(emp!=1) %>% 
  dplyr::mutate(motivodeegreso_mod_imp_imputation=
                  dplyr::case_when(variable=="adm_dis"~"Administrative discharge",
                                   variable=="death"~"Death",  
                                   variable=="ther_dis"~"Therapeutic discharge",
                                   variable=="on_treat"~"Ongoing treatment",
                                   variable=="referral"~"Referral to another treatment",
                                   variable=="dropout"~"Drop-out"))

## `summarise()` ungrouping output (override with `.groups` argument)

#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:
#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:
CONS_C1_df_dup_SEP_2020_women_miss6<-
CONS_C1_df_dup_SEP_2020_women_miss5 %>% 
   dplyr::left_join(motivodeegreso_mod_imp_imputed[,c("amelia_fit_imputations_imp1_row","motivodeegreso_mod_imp_imputation")], by=c("row"="amelia_fit_imputations_imp1_row")) %>%
  #dplyr::filter(is.na(motivodeegreso_mod_imp)) %>% dplyr::select(row,hash_key,motivodeegreso_mod_imp_original, motivodeegreso_mod_imp_imputation,motivodeegreso_mod_imp,fech_egres_num,fech_egres_imp)
  dplyr::left_join(cbind.data.frame(motivo_de_egreso_a_imputar,value_to_impute=1),"row") %>% 
  dplyr::mutate(motivodeegreso_mod_imp=factor(
     dplyr::case_when(is.na(motivodeegreso_mod_imp) & value_to_impute==1~motivodeegreso_mod_imp_imputation,
             T~as.character(motivodeegreso_mod_imp)))) %>% 
  dplyr::select(-motivodeegreso_mod_imp_imputation,-value_to_impute) %>% 
  data.table()
#CONS_C1_df_dup_SEP_2020_women_miss9 %>% janitor::tabyl(motivodeegreso_mod_imp,motivodeegreso_mod_imp_original)
#CONS_C1_df_dup_SEP_2020_women_miss9 %>% janitor::tabyl(motivodeegreso_mod_imp_original)

CONS_C1_df_dup_SEP_2020_women_miss6 %>% janitor::tabyl(motivodeegreso_mod_imp) %>%
    dplyr::mutate(percent=scales::percent(percent)) %>%  
    knitr::kable(.,format = "html", format.args = list(decimal.mark = ".", big.mark = ","),
               caption = paste0("Table 2. Imputed Cause of Discharge vs. Original Cause of Discharge"),
               #col.names = c("Cause of Discharge","1-High Achievement", "2- Medium Achievement","3- Minimum Achievement","Null Values"),
               align =rep('c', 101)) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),font_size = 12) %>%
  kableExtra::add_footnote("Note. NA= Null values", notation="none") %>% 
  kableExtra::scroll_box(width = "100%", height = "375px")

Table 2. Imputed Cause of Discharge vs. Original Cause of Discharge
motivodeegreso_mod_imp	n	percent
Administrative discharge	1,738	8.13%
Early Drop-out	3,160	14.78%
Late Drop-out	7,097	33.20%
Ongoing treatment	1,599	7.48%
Referral to another treatment	2,670	12.49%
Therapeutic discharge	5,114	23.92%
Note. NA= Null values

#

if(nrow(CONS_C1_df_dup_SEP_2020_women_miss6)-nrow(CONS_C1_df_dup_SEP_2020_women_miss5)>0){
  warning("AGS: Some rows were added in the imputation")}

As a result of the imputations, there were no missing values.

Biopsychosocial involvement

Another variable that is worth imputing is the Biopsychosocial involvement (n= 370). In case of ties, we selected the imputed values with the value with the minimum involvement. In case of ties, we chose the most vulnerable value.

# Ver distintos valores propuestos para sustancia de inciio

#No se ve un patrón de dependencia entre el compromiso biopsicosocial y el estatus de egreso
#  table(CONS_C1_df_dup_SEP_2020_women_miss$compromiso_biopsicosocial,
#       CONS_C1_df_dup_SEP_2020_women_miss$motivodeegreso_mod_imp)

comp_biopsisoc_imputed<-
 cbind.data.frame(amelia_fit$imputations$imp1$row,
         amelia_fit$imputations$imp1$compromiso_biopsicosocial,
       amelia_fit$imputations$imp2$compromiso_biopsicosocial,
       amelia_fit$imputations$imp3$compromiso_biopsicosocial,
       amelia_fit$imputations$imp4$compromiso_biopsicosocial,
       amelia_fit$imputations$imp5$compromiso_biopsicosocial,
       amelia_fit$imputations$imp6$compromiso_biopsicosocial,
       amelia_fit$imputations$imp7$compromiso_biopsicosocial,
       amelia_fit$imputations$imp8$compromiso_biopsicosocial,
       amelia_fit$imputations$imp9$compromiso_biopsicosocial,
       amelia_fit$imputations$imp10$compromiso_biopsicosocial,
       amelia_fit$imputations$imp11$compromiso_biopsicosocial,
       amelia_fit$imputations$imp12$compromiso_biopsicosocial,
       amelia_fit$imputations$imp13$compromiso_biopsicosocial,
       amelia_fit$imputations$imp14$compromiso_biopsicosocial,
       amelia_fit$imputations$imp15$compromiso_biopsicosocial,
       amelia_fit$imputations$imp16$compromiso_biopsicosocial,
       amelia_fit$imputations$imp17$compromiso_biopsicosocial,
       amelia_fit$imputations$imp18$compromiso_biopsicosocial,
       amelia_fit$imputations$imp19$compromiso_biopsicosocial,
       amelia_fit$imputations$imp20$compromiso_biopsicosocial,
       amelia_fit$imputations$imp21$compromiso_biopsicosocial,
       amelia_fit$imputations$imp22$compromiso_biopsicosocial,
       amelia_fit$imputations$imp23$compromiso_biopsicosocial,
       amelia_fit$imputations$imp24$compromiso_biopsicosocial,
       amelia_fit$imputations$imp25$compromiso_biopsicosocial,
       amelia_fit$imputations$imp26$compromiso_biopsicosocial,
       amelia_fit$imputations$imp27$compromiso_biopsicosocial,
       amelia_fit$imputations$imp28$compromiso_biopsicosocial,
       amelia_fit$imputations$imp29$compromiso_biopsicosocial,
       amelia_fit$imputations$imp30$compromiso_biopsicosocial,
       amelia_fit$imputations$imp31$compromiso_biopsicosocial,
       amelia_fit$imputations$imp32$compromiso_biopsicosocial,
       amelia_fit$imputations$imp33$compromiso_biopsicosocial,
       amelia_fit$imputations$imp34$compromiso_biopsicosocial,
       amelia_fit$imputations$imp35$compromiso_biopsicosocial,
       amelia_fit$imputations$imp36$compromiso_biopsicosocial,
       amelia_fit$imputations$imp37$compromiso_biopsicosocial,
       amelia_fit$imputations$imp38$compromiso_biopsicosocial,
       amelia_fit$imputations$imp39$compromiso_biopsicosocial,
       amelia_fit$imputations$imp40$compromiso_biopsicosocial,
       amelia_fit$imputations$imp41$compromiso_biopsicosocial,
       amelia_fit$imputations$imp42$compromiso_biopsicosocial,
       amelia_fit$imputations$imp43$compromiso_biopsicosocial,
       amelia_fit$imputations$imp44$compromiso_biopsicosocial,
       amelia_fit$imputations$imp45$compromiso_biopsicosocial,
       amelia_fit$imputations$imp46$compromiso_biopsicosocial,
       amelia_fit$imputations$imp47$compromiso_biopsicosocial,
       amelia_fit$imputations$imp48$compromiso_biopsicosocial,
       amelia_fit$imputations$imp49$compromiso_biopsicosocial,
       amelia_fit$imputations$imp50$compromiso_biopsicosocial,
       amelia_fit$imputations$imp51$compromiso_biopsicosocial,
       amelia_fit$imputations$imp52$compromiso_biopsicosocial,
       amelia_fit$imputations$imp53$compromiso_biopsicosocial,
       amelia_fit$imputations$imp54$compromiso_biopsicosocial,
       amelia_fit$imputations$imp55$compromiso_biopsicosocial,
       amelia_fit$imputations$imp56$compromiso_biopsicosocial,
       amelia_fit$imputations$imp57$compromiso_biopsicosocial,
       amelia_fit$imputations$imp58$compromiso_biopsicosocial,
       amelia_fit$imputations$imp59$compromiso_biopsicosocial,
       amelia_fit$imputations$imp60$compromiso_biopsicosocial,
       amelia_fit$imputations$imp61$compromiso_biopsicosocial
       ) %>% 
  melt(id.vars="amelia_fit$imputations$imp1$row") %>% 
  janitor::clean_names() %>% 
  dplyr::arrange(amelia_fit_imputations_imp1_row) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(amelia_fit_imputations_imp1_row) %>%
  # 1-Mild 2-Moderate   3-Severe 
  dplyr::summarise(severe_3=sum(value == "3-Severe",na.rm=T),
                   mod_2=sum(value == "2-Moderate",na.rm=T),
                  mild_1=sum(value =="1-Mild",na.rm=T)) %>% 
  dplyr::ungroup() %>%
    dplyr::mutate(comp_biopsisoc_imp= dplyr::case_when(
      (severe_3>mild_1) & (severe_3>mod_2)~"3-Severe",
      (mod_2>mild_1) & (mod_2>severe_3)~"2-Moderate",
      (mild_1>mod_2) & (mild_1>severe_3)~"1-Mild"
      )) %>% 
#2) Resolve ties    
  dplyr::mutate(ties= dplyr::case_when(is.na(comp_biopsisoc_imp)~1,T~0)) %>% 
  dplyr::mutate(comp_biopsisoc_imp= dplyr::case_when(ties==1 & ((severe_3>mod_2)|(severe_3>mild_1))~"3-Severe",
                                                     ties==1 & ((mod_2>mild_1)|(mod_2>severe_3))~"2-Moderate",
                T~comp_biopsisoc_imp))

## `summarise()` ungrouping output (override with `.groups` argument)

#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:
##
#CONS_C1_df_dup_SEP_2020 %>% janitor::tabyl(motivodeegreso_mod_imp,evaluacindelprocesoteraputico)

CONS_C1_df_dup_SEP_2020_women_miss7<-
CONS_C1_df_dup_SEP_2020_women_miss6 %>% 
   dplyr::left_join(comp_biopsisoc_imputed[,c("amelia_fit_imputations_imp1_row","comp_biopsisoc_imp")], by=c("row"="amelia_fit_imputations_imp1_row")) %>% 
    dplyr::mutate(compromiso_biopsicosocial=factor(dplyr::case_when(is.na(compromiso_biopsicosocial) ~comp_biopsisoc_imp,
                                                                        TRUE~as.character(compromiso_biopsicosocial)))) %>% 
     dplyr::mutate(compromiso_biopsicosocial=parse_factor(as.character(compromiso_biopsicosocial),levels=c('1-Mild', '2-Moderate','3-Severe'), ordered =T,trim_ws=T,include_na =F, locale=locale(encoding = "UTF-8"))) %>% 
  dplyr::select(-comp_biopsisoc_imp) %>% 
  data.table()

if(nrow(CONS_C1_df_dup_SEP_2020_women_miss7)-nrow(CONS_C1_df_dup_SEP_2020_women_miss6)>0){
  warning("AGS: Some rows were added in the imputation")}

As a result of the imputations, there were no missing values.

Tenure status of households

Another variable that is worth imputing is the Tenure status of households (n= 942). In case of ties, we selected the imputed values with the value with the minimum involvement. In case of ties, we kept what we thought was the most vulnerable value (discarding “Owner” or “Renting” values).

tenencia_de_la_vivienda_mod_imputed<-
 cbind.data.frame(amelia_fit$imputations$imp1$row,
         amelia_fit$imputations$imp1$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp2$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp3$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp4$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp5$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp6$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp7$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp8$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp9$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp10$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp11$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp12$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp13$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp14$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp15$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp16$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp17$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp18$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp19$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp20$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp21$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp22$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp23$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp24$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp25$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp26$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp27$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp28$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp29$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp30$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp31$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp32$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp33$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp34$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp35$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp36$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp37$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp38$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp39$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp40$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp41$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp42$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp43$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp44$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp45$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp46$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp47$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp48$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp49$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp50$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp51$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp52$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp53$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp54$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp55$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp56$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp57$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp58$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp59$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp60$tenencia_de_la_vivienda_mod,
       amelia_fit$imputations$imp61$tenencia_de_la_vivienda_mod
       ) %>% 
  melt(id.vars="amelia_fit$imputations$imp1$row") %>% 
  janitor::clean_names() %>% 
  dplyr::group_by(amelia_fit_imputations_imp1_row, value) %>% 
  tally() %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
  dplyr::top_n(1,n) %>% 
  dplyr::ungroup()

#tenencia_de_la_vivienda_mod_imputed %>% 
#  pivot_wider(id_cols="amelia_fit_imputations_imp1_row",names_from="value", values_from="n", values_fill=0) %>% 
#  dplyr::ungroup()

tenencia_de_la_vivienda_mod_imputed_dup<-
  tenencia_de_la_vivienda_mod_imputed %>% 
    dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
    dplyr::mutate(num=n()) %>% 
    dplyr::filter(num>1) %>% 
    dplyr::ungroup() %>% 
  #1) owner, discard if it is in the maximum
    dplyr::mutate(n=dplyr::case_when(value=="Owner/Transferred dwellings/Pays Dividends"~0,T~as.numeric(n))) %>% 
    dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
    dplyr::top_n(1,n) %>% 
    dplyr::ungroup() %>% 
    dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
  #2) Renting vs. stays temporarily with a relative, keep the second
    dplyr::mutate(n=dplyr::case_when(value=="Renting"~0,T~as.numeric(n))) %>% 
    dplyr::top_n(1,n) %>% 
    dplyr::ungroup() %>% 
    dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
    dplyr::mutate(n_dup=n()) 

tenencia_de_la_vivienda_mod_imputed_final<-
tenencia_de_la_vivienda_mod_imputed %>% 
    dplyr::left_join(tenencia_de_la_vivienda_mod_imputed_dup, by=c("amelia_fit_imputations_imp1_row", "value")) %>% 
  #si es vacío, y no está en la base, es valor 0 (es difícil que)
    dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
    dplyr::mutate(sum= suppressWarnings(max(num, na.rm=T))) %>% 
    dplyr::ungroup() %>% 
  #descarto los que presentaron más de un valor para una misma fila y aquellos que no fueron seleccionados
    dplyr::mutate(descartar=dplyr::case_when(sum>1 & is.na(n.y)~1,T~0)) %>% 
    dplyr::filter(descartar==0)

ifelse(nrow(tenencia_de_la_vivienda_mod_imputed_final)/length(unique(CONS_C1_df_dup_SEP_2020_women_miss7$row))>1,
       "There are still more than one value in the imputation","")

## [1] ""

#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:
#CONS_C1_df_dup_SEP_2020 %>% janitor::tabyl(motivodeegreso_mod_imp,evaluacindelprocesoteraputico)

CONS_C1_df_dup_SEP_2020_women_miss8<-
CONS_C1_df_dup_SEP_2020_women_miss7 %>% 
   dplyr::left_join(tenencia_de_la_vivienda_mod_imputed_final[,c("amelia_fit_imputations_imp1_row","value")], by=c("row"="amelia_fit_imputations_imp1_row")) %>% 
    dplyr::mutate(tenencia_de_la_vivienda_mod=factor(dplyr::case_when(is.na(tenencia_de_la_vivienda_mod) ~value,
                                                                        TRUE~as.character(tenencia_de_la_vivienda_mod)))) %>% 
  dplyr::select(-value) %>% 
  data.table()
if(nrow(CONS_C1_df_dup_SEP_2020_women_miss8)-nrow(CONS_C1_df_dup_SEP_2020_women_miss7)>0){
  warning("AGS: Some rows were added in the imputation")}

As a result of the imputations, there were no missing values.

Number of children (max. Value) (Dichotomized)

A numeric variable that had a great proportion of missing values was this (n= 82).

As seen in the figure above, most of the imputations were around 1 and 3 children, leaving less space for an imputation of no children or more than 3. We imputed these values, by approximating the mean of the 61 candidate values to a discrete number.

numero_de_hijos_mod_rec_imputed<-
  cbind.data.frame(amelia_fit$imputations$imp1$row,
         amelia_fit$imputations$imp1$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp2$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp3$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp4$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp5$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp6$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp7$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp8$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp9$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp10$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp11$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp12$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp13$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp14$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp15$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp16$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp17$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp18$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp19$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp20$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp21$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp22$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp23$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp24$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp25$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp26$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp27$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp28$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp29$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp30$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp31$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp32$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp33$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp34$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp35$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp36$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp37$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp38$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp39$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp40$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp41$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp42$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp43$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp44$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp45$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp46$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp47$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp48$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp49$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp50$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp51$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp52$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp53$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp54$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp55$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp56$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp57$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp58$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp59$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp60$numero_de_hijos_mod_rec,
       amelia_fit$imputations$imp61$numero_de_hijos_mod_rec
       )  %>% 
  melt(id.vars="amelia_fit$imputations$imp1$row") %>% 
  janitor::clean_names() %>% 
  dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
  dplyr::summarise(children= sum(value=="Yes"),
                   no_children= sum(value=="No")) %>% 
  dplyr::mutate(numero_de_hijos_mod_rec_imp=dplyr::case_when(children>=31~"Yes",
                                                    no_children>=31~"No"))

#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:

CONS_C1_df_dup_SEP_2020_women_miss9<-
CONS_C1_df_dup_SEP_2020_women_miss8 %>% 
    dplyr::left_join(dplyr::select(numero_de_hijos_mod_rec_imputed,amelia_fit_imputations_imp1_row,numero_de_hijos_mod_rec_imp), by=c("row"="amelia_fit_imputations_imp1_row")) %>% 
  dplyr::mutate(numero_de_hijos_mod_rec=factor(dplyr::case_when(is.na(numero_de_hijos_mod_rec)~as.character(numero_de_hijos_mod_rec_imp),T~as.character(numero_de_hijos_mod_rec)))) %>%
  dplyr::select(-numero_de_hijos_mod_rec_imp) %>% 
  data.table()
#table(is.na(CONS_C1_df_dup_SEP_2020_women_miss12$numero_de_hijos_mod_rec))
if(nrow(CONS_C1_df_dup_SEP_2020_women_miss9)-nrow(CONS_C1_df_dup_SEP_2020_women_miss8)>0){
  warning("AGS: Some rows were added in the imputation")}

As a result of the imputations, there were no missing values.

Type of Program

A numeric variable that was important to impute missing values was the type of program (n= 17).

tipo_de_programa_2_imputed<-
  cbind.data.frame(amelia_fit$imputations$imp1$row,
         amelia_fit$imputations$imp1$tipo_de_programa_2,
       amelia_fit$imputations$imp2$tipo_de_programa_2,
       amelia_fit$imputations$imp3$tipo_de_programa_2,
       amelia_fit$imputations$imp4$tipo_de_programa_2,
       amelia_fit$imputations$imp5$tipo_de_programa_2,
       amelia_fit$imputations$imp6$tipo_de_programa_2,
       amelia_fit$imputations$imp7$tipo_de_programa_2,
       amelia_fit$imputations$imp8$tipo_de_programa_2,
       amelia_fit$imputations$imp9$tipo_de_programa_2,
       amelia_fit$imputations$imp10$tipo_de_programa_2,
       amelia_fit$imputations$imp11$tipo_de_programa_2,
       amelia_fit$imputations$imp12$tipo_de_programa_2,
       amelia_fit$imputations$imp13$tipo_de_programa_2,
       amelia_fit$imputations$imp14$tipo_de_programa_2,
       amelia_fit$imputations$imp15$tipo_de_programa_2,
       amelia_fit$imputations$imp16$tipo_de_programa_2,
       amelia_fit$imputations$imp17$tipo_de_programa_2,
       amelia_fit$imputations$imp18$tipo_de_programa_2,
       amelia_fit$imputations$imp19$tipo_de_programa_2,
       amelia_fit$imputations$imp20$tipo_de_programa_2,
       amelia_fit$imputations$imp21$tipo_de_programa_2,
       amelia_fit$imputations$imp22$tipo_de_programa_2,
       amelia_fit$imputations$imp23$tipo_de_programa_2,
       amelia_fit$imputations$imp24$tipo_de_programa_2,
       amelia_fit$imputations$imp25$tipo_de_programa_2,
       amelia_fit$imputations$imp26$tipo_de_programa_2,
       amelia_fit$imputations$imp27$tipo_de_programa_2,
       amelia_fit$imputations$imp28$tipo_de_programa_2,
       amelia_fit$imputations$imp29$tipo_de_programa_2,
       amelia_fit$imputations$imp30$tipo_de_programa_2,
       amelia_fit$imputations$imp31$tipo_de_programa_2,
       amelia_fit$imputations$imp32$tipo_de_programa_2,
       amelia_fit$imputations$imp33$tipo_de_programa_2,
       amelia_fit$imputations$imp34$tipo_de_programa_2,
       amelia_fit$imputations$imp35$tipo_de_programa_2,
       amelia_fit$imputations$imp36$tipo_de_programa_2,
       amelia_fit$imputations$imp37$tipo_de_programa_2,
       amelia_fit$imputations$imp38$tipo_de_programa_2,
       amelia_fit$imputations$imp39$tipo_de_programa_2,
       amelia_fit$imputations$imp40$tipo_de_programa_2,
       amelia_fit$imputations$imp41$tipo_de_programa_2,
       amelia_fit$imputations$imp42$tipo_de_programa_2,
       amelia_fit$imputations$imp43$tipo_de_programa_2,
       amelia_fit$imputations$imp44$tipo_de_programa_2,
       amelia_fit$imputations$imp45$tipo_de_programa_2,
       amelia_fit$imputations$imp46$tipo_de_programa_2,
       amelia_fit$imputations$imp47$tipo_de_programa_2,
       amelia_fit$imputations$imp48$tipo_de_programa_2,
       amelia_fit$imputations$imp49$tipo_de_programa_2,
       amelia_fit$imputations$imp50$tipo_de_programa_2,
       amelia_fit$imputations$imp51$tipo_de_programa_2,
       amelia_fit$imputations$imp52$tipo_de_programa_2,
       amelia_fit$imputations$imp53$tipo_de_programa_2,
       amelia_fit$imputations$imp54$tipo_de_programa_2,
       amelia_fit$imputations$imp55$tipo_de_programa_2,
       amelia_fit$imputations$imp56$tipo_de_programa_2,
       amelia_fit$imputations$imp57$tipo_de_programa_2,
       amelia_fit$imputations$imp58$tipo_de_programa_2,
       amelia_fit$imputations$imp59$tipo_de_programa_2,
       amelia_fit$imputations$imp60$tipo_de_programa_2,
       amelia_fit$imputations$imp61$tipo_de_programa_2
       )  %>% 
  melt(id.vars="amelia_fit$imputations$imp1$row") %>% 
  janitor::clean_names() %>% 
  dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
  dplyr::summarise(WE= sum(value=="Women specific"),
                   GP= sum(value=="General population")) %>% 
  dplyr::mutate(tipo_de_programa_2_imp=dplyr::case_when(WE>=31~"Women specific",
                                                    GP>=31~"General population"))

#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:

CONS_C1_df_dup_SEP_2020_women_miss10<-
CONS_C1_df_dup_SEP_2020_women_miss9 %>% 
    dplyr::left_join(dplyr::select(tipo_de_programa_2_imputed,amelia_fit_imputations_imp1_row,tipo_de_programa_2_imp), by=c("row"="amelia_fit_imputations_imp1_row")) %>% 
  dplyr::mutate(tipo_de_programa_2=factor(dplyr::case_when(is.na(tipo_de_programa_2)~as.character(tipo_de_programa_2_imp),T~as.character(tipo_de_programa_2)))) %>%
  dplyr::select(-tipo_de_programa_2_imp) %>% 
  data.table()
#table(is.na(CONS_C1_df_dup_SEP_2020_women_miss12$tipo_de_programa_2))
if(nrow(CONS_C1_df_dup_SEP_2020_women_miss10)-nrow(CONS_C1_df_dup_SEP_2020_women_miss9)>0){
  warning("AGS: Some rows were added in the imputation")}

As a result of the imputations, there were no missing values.

Type of Plan

We looked over possible imputations to the type of plan (n=17).

tipo_de_plan_res_imputed<-
 cbind.data.frame(amelia_fit$imputations$imp1$row,
       amelia_fit$imputations$imp1$tipo_de_plan_res,
       amelia_fit$imputations$imp2$tipo_de_plan_res,
       amelia_fit$imputations$imp3$tipo_de_plan_res,
       amelia_fit$imputations$imp4$tipo_de_plan_res,
       amelia_fit$imputations$imp5$tipo_de_plan_res,
       amelia_fit$imputations$imp6$tipo_de_plan_res,
       amelia_fit$imputations$imp7$tipo_de_plan_res,
       amelia_fit$imputations$imp8$tipo_de_plan_res,
       amelia_fit$imputations$imp9$tipo_de_plan_res,
       amelia_fit$imputations$imp10$tipo_de_plan_res,
       amelia_fit$imputations$imp11$tipo_de_plan_res,
       amelia_fit$imputations$imp12$tipo_de_plan_res,
       amelia_fit$imputations$imp13$tipo_de_plan_res,
       amelia_fit$imputations$imp14$tipo_de_plan_res,
       amelia_fit$imputations$imp15$tipo_de_plan_res,
       amelia_fit$imputations$imp16$tipo_de_plan_res,
       amelia_fit$imputations$imp17$tipo_de_plan_res,
       amelia_fit$imputations$imp18$tipo_de_plan_res,
       amelia_fit$imputations$imp19$tipo_de_plan_res,
       amelia_fit$imputations$imp20$tipo_de_plan_res,
       amelia_fit$imputations$imp21$tipo_de_plan_res,
       amelia_fit$imputations$imp22$tipo_de_plan_res,
       amelia_fit$imputations$imp23$tipo_de_plan_res,
       amelia_fit$imputations$imp24$tipo_de_plan_res,
       amelia_fit$imputations$imp25$tipo_de_plan_res,
       amelia_fit$imputations$imp26$tipo_de_plan_res,
       amelia_fit$imputations$imp27$tipo_de_plan_res,
       amelia_fit$imputations$imp28$tipo_de_plan_res,
       amelia_fit$imputations$imp29$tipo_de_plan_res,
       amelia_fit$imputations$imp30$tipo_de_plan_res,
       amelia_fit$imputations$imp31$tipo_de_plan_res,
       amelia_fit$imputations$imp32$tipo_de_plan_res,
       amelia_fit$imputations$imp33$tipo_de_plan_res,
       amelia_fit$imputations$imp34$tipo_de_plan_res,
       amelia_fit$imputations$imp35$tipo_de_plan_res,
       amelia_fit$imputations$imp36$tipo_de_plan_res,
       amelia_fit$imputations$imp37$tipo_de_plan_res,
       amelia_fit$imputations$imp38$tipo_de_plan_res,
       amelia_fit$imputations$imp39$tipo_de_plan_res,
       amelia_fit$imputations$imp40$tipo_de_plan_res,
       amelia_fit$imputations$imp41$tipo_de_plan_res,
       amelia_fit$imputations$imp42$tipo_de_plan_res,
       amelia_fit$imputations$imp43$tipo_de_plan_res,
       amelia_fit$imputations$imp44$tipo_de_plan_res,
       amelia_fit$imputations$imp45$tipo_de_plan_res,
       amelia_fit$imputations$imp46$tipo_de_plan_res,
       amelia_fit$imputations$imp47$tipo_de_plan_res,
       amelia_fit$imputations$imp48$tipo_de_plan_res,
       amelia_fit$imputations$imp49$tipo_de_plan_res,
       amelia_fit$imputations$imp50$tipo_de_plan_res,
       amelia_fit$imputations$imp51$tipo_de_plan_res,
       amelia_fit$imputations$imp52$tipo_de_plan_res,
       amelia_fit$imputations$imp53$tipo_de_plan_res,
       amelia_fit$imputations$imp54$tipo_de_plan_res,
       amelia_fit$imputations$imp55$tipo_de_plan_res,
       amelia_fit$imputations$imp56$tipo_de_plan_res,
       amelia_fit$imputations$imp57$tipo_de_plan_res,
       amelia_fit$imputations$imp58$tipo_de_plan_res,
       amelia_fit$imputations$imp59$tipo_de_plan_res,
       amelia_fit$imputations$imp60$tipo_de_plan_res,
       amelia_fit$imputations$imp61$tipo_de_plan_res
       ) %>% 
  melt(id.vars="amelia_fit$imputations$imp1$row") %>% 
  janitor::clean_names() %>% 
  dplyr::group_by(amelia_fit_imputations_imp1_row) %>% 
  dplyr::summarise(outpatient= sum(value=="Outpatient"),
                   residential= sum(value=="Residential")) %>% 
  dplyr::mutate(tipo_de_plan_res_imp=dplyr::case_when(outpatient>=31~"Outpatient",
                                                    residential>=31~"Residential"))

## `summarise()` ungrouping output (override with `.groups` argument)

#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:#:#:#::#:#:#:

CONS_C1_df_dup_SEP_2020_women_miss11<-
CONS_C1_df_dup_SEP_2020_women_miss10 %>% 
    dplyr::left_join(dplyr::select(tipo_de_plan_res_imputed,amelia_fit_imputations_imp1_row,tipo_de_plan_res_imp), by=c("row"="amelia_fit_imputations_imp1_row")) %>% 
  dplyr::mutate(tipo_de_plan_res=factor(dplyr::case_when(is.na(tipo_de_plan_res)~as.character(tipo_de_plan_res_imp),T~as.character(tipo_de_plan_res)))) %>%
  dplyr::select(-tipo_de_plan_res_imp) %>% 
  data.table()
#table(is.na(CONS_C1_df_dup_SEP_2020_women_miss11$tipo_centro_pub))
#table(is.na(CONS_C1_df_dup_SEP_2020_women_miss11$nombre_region))

As a result, there were no missing values once imputed.

Sample Characteristics

We checked the characteristics of the sample depending on type of program.

#añado los imputados
CONS_C1_df_dup_SEP_2020_women_miss_after_imp<-
CONS_C1_df_dup_SEP_2020_women_miss11 %>% 
#  relocate(otras_sus1_mod, .after = last_col()) %>% 
  dplyr::left_join(dplyr::select(CONS_C1_df_dup_SEP_2020, row, fech_ing, fech_egres_imp, fech_ing_num, fech_egres_num, dias_treat_imp_sin_na,fech_ing_next_treat), by="row")%>% 
  #dplyr::filter(fech_egres_num==18213,!is.na(motivodeegreso_mod_imp)) %>% 
  dplyr::mutate(motivodeegreso_mod_imp=factor(dplyr::case_when(dias_treat_imp_sin_na>=90 & motivodeegreso_mod_imp=="Drop-out"~ "Late Drop-out",
                                                        dias_treat_imp_sin_na<90 & motivodeegreso_mod_imp=="Drop-out"~ "Early Drop-out",
                                                        fech_egres_num==18213 & is.na(motivodeegreso_mod_imp)~"Ongoing treatment",
                                                        TRUE~as.character(motivodeegreso_mod_imp)
                                                        ))) %>%
  dplyr::mutate(sum_miss = base::rowSums(is.na(dplyr::select(.,c(tipo_de_programa_2:tipo_de_plan_res))))) %>% 
  dplyr::group_by(hash_key) %>% 
  dplyr::mutate(sum_miss=sum(sum_miss)) %>% 
  dplyr::ungroup() 

CONS_C1_df_dup_SEP_2020_women_miss_after_imp_descartados <-
  CONS_C1_df_dup_SEP_2020_women_miss_after_imp %>% 
  dplyr::filter(sum_miss>0)

CONS_C1_df_dup_SEP_2020_women_miss_after_imp_descartados %>% 
  rowwise %>%
  dplyr::mutate_at(.vars = vars(vector_variables),
                   .funs = ~ifelse(is.na(.), 1, 0)) %>% 
  dplyr::ungroup() %>% 
  dplyr::summarise_at(vars(vector_variables),~sum(.)) %>% 
  melt

## No id variables; using all as measure variables

##                       variable value
## 1           tipo_de_programa_2     0
## 2            estado_conyugal_2     0
## 3           edad_al_ing_grupos     0
## 4              escolaridad_rec     0
## 5            sus_principal_mod     0
## 6           freq_cons_sus_prin     0
## 7    compromiso_biopsicosocial     0
## 8  tenencia_de_la_vivienda_mod     0
## 9            num_otras_sus_mod     0
## 10     numero_de_hijos_mod_rec     0
## 11      motivodeegreso_mod_imp     0
## 12            tipo_de_plan_res     0

#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:   
#:#:#:#:#:#:#:#:#:#BASE DE DATOS DEFINITIVA#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:
#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:
CONS_C1_df_dup_SEP_2020_women_miss_after_imp %>% 
  dplyr::filter(sum_miss==0) %>% 
  dplyr::select(-sum_miss) %>% 
#DAR FORMATO ORDINAL A LAS VARIABLES
  dplyr::mutate(edad_al_ing_grupos=parse_factor(as.character(edad_al_ing_grupos),levels=c('18-29', '30-39', '40-49', '50+'), ordered =T,trim_ws=F,include_na =F)) %>% #, locale=locale(encoding = "Latin1")
    dplyr::mutate(escolaridad_rec=parse_factor(as.character(escolaridad_rec),levels=c('3-Completed primary school or less', '2-Completed high school or less', '1-More than high school'), ordered =T,trim_ws=T,include_na =F, locale=locale(encoding = "Latin1"))) %>%  
  dplyr::mutate(freq_cons_sus_prin=parse_factor(as.character(freq_cons_sus_prin),levels=c('Less than 1 day a week','2 to 3 days a week','4 to 6 days a week','1 day a week or more','Daily'), ordered =T,trim_ws=F,include_na =F)) %>% #, locale=locale(encoding = "Latin1")
  dplyr::mutate(compromiso_biopsicosocial=parse_factor(as.character(compromiso_biopsicosocial),levels=c('1-Mild', '2-Moderate','3-Severe'), ordered =T,trim_ws=F,include_na =F)) %>% #, locale=locale(encoding = "Latin1")
  dplyr::mutate(num_otras_sus_mod=parse_factor(as.character(num_otras_sus_mod),levels=c('No additional substance', 'One additional substance','More than one additional substance'), ordered =T,trim_ws=F,include_na =F)) %>% #, locale=locale(encoding = "Latin1")
  dplyr::mutate(fech_ing_next_treat_date=as.Date(fech_ing_next_treat)) %>% 
#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:
#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:
  data.table::data.table() %>% 
    assign("CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados",.,envir=.GlobalEnv)

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$tipo_de_programa_2,"label") <- 'Type of program'
attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$edad_al_ing_grupos,"label") <- 'Age at admission to treatment, grouped.'
attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$estado_conyugal_2,"label") <- 'Marital status'
attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$numero_de_hijos_mod_rec,"label") <- 'Have children (Dichotomized)'

attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$sus_principal_mod,"label") <- 'Primary or main substance'
attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$num_otras_sus_mod,"label") <- 'Co-occurring SUD'
attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$freq_cons_sus_prin,"label") <- 'Consumption frequency of primary or main substance'
attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$compromiso_biopsicosocial,"label") <- 'Biopsychosocial involvement'
attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$tenencia_de_la_vivienda_mod,"label") <- 'Tenure status of households'
attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$escolaridad_rec,"label") <- 'Educational Attainment'
attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$motivodeegreso_mod_imp,"label") <- 'Cause of Discharge'
attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$fech_ing_next_treat_date,"label") <- 'Date of Admission to Posterior Treatment'
attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$tipo_de_plan_res,"label") <- 'Setting of Treatment'
#attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$tipo_centro,"label") <- 'Type of center of the last entry'
#attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$embarazo,"label") <- 'Pregnant at admission'
#attr(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados$edad_ini_sus_prin_grupos,"label") <- 'Age at first use of principal substance, grouped'

We ended the process having 21,378 compelte cases (users= 21,378).

kableone <- function(x, ...) {
  capture.output(x <- print(x,...))
  knitr::kable(x,format= "html", format.args= list(decimal.mark= ".", big.mark= ","))
}
Variables_after_imp<-c("edad_al_ing_grupos","escolaridad_rec","sus_principal_mod","freq_cons_sus_prin","compromiso_biopsicosocial","tenencia_de_la_vivienda_mod","num_otras_sus_mod","numero_de_hijos_mod_rec","motivodeegreso_mod_imp","dias_treat_imp_sin_na","tipo_de_plan_res")
catVars_after_imp<-
c("edad_al_ing_grupos","escolaridad_rec","sus_principal_mod","freq_cons_sus_prin","compromiso_biopsicosocial","tenencia_de_la_vivienda_mod","num_otras_sus_mod","numero_de_hijos_mod_rec","motivodeegreso_mod_imp","tipo_de_plan_res")

pre_tab1<-Sys.time()
tab1<-
CreateTableOne(vars = Variables_after_imp, 
               strata = "tipo_de_programa_2", 
               data = CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados, 
               factorVars = catVars_after_imp, 
               smd=T)
post_tab1<-Sys.time()
diff_time_tab1=post_tab1-pre_tab1

kableone(tab1, 
         caption = paste0("Table 4. Covariate Balance in the Variables of Interest"),
         col.names= c("Variables","Ambulatory","Residential", "p-values","test","SMD"),
         nonnormal= c("edad_ini_cons","edad_al_ing","fech_ing_num"),#"\\hline",
                       smd=T, test=T, varLabels=T,noSpaces=T, printToggle=T, dropEqual=F) %>% 
    kableExtra::kable_styling(bootstrap_options = c("striped", "hover","condensed"),font_size= 11) %>%
    row_spec(1, bold = T, italic =T,color ="black",hline_after=T,extra_latex_after="\\arrayrulecolor{white}",font_size= 11) %>%
    #kableExtra::pack_rows("", 2, 2) %>% 
    scroll_box(width = "100%", height = "400px")

	General population	Women specific	p	SMD
n	13178	8200
Age at admission to treatment, grouped. (%)			<0.001	0.194
18-29	4522 (34.3)	3364 (41.0)
30-39	4277 (32.5)	2754 (33.6)
40-49	2651 (20.1)	1382 (16.9)
50+	1728 (13.1)	700 (8.5)
Educational Attainment (%)			0.023	0.039
3-Completed primary school or less	4360 (33.1)	2655 (32.4)
2-Completed high school or less	6645 (50.4)	4281 (52.2)
1-More than high school	2173 (16.5)	1264 (15.4)
Primary or main substance (%)			<0.001	0.384
Alcohol	4843 (36.8)	1903 (23.2)
Cocaine hydrochloride	2526 (19.2)	1435 (17.5)
Cocaine paste	4304 (32.7)	4116 (50.2)
Marijuana	943 (7.2)	474 (5.8)
Other	562 (4.3)	272 (3.3)
Consumption frequency of primary or main substance (%)			<0.001	0.414
Less than 1 day a week	842 (6.4)	218 (2.7)
2 to 3 days a week	3891 (29.5)	1680 (20.5)
4 to 6 days a week	2052 (15.6)	1175 (14.3)
1 day a week or more	1045 (7.9)	295 (3.6)
Daily	5348 (40.6)	4832 (58.9)
Biopsychosocial involvement (%)			<0.001	0.612
1-Mild	1329 (10.1)	163 (2.0)
2-Moderate	7823 (59.4)	3334 (40.7)
3-Severe	4026 (30.6)	4703 (57.4)
Tenure status of households (%)			<0.001	0.227
Illegal Settlement	180 (1.4)	146 (1.8)
Others	343 (2.6)	223 (2.7)
Owner/Transferred dwellings/Pays Dividends	4936 (37.5)	2498 (30.5)
Renting	2725 (20.7)	1355 (16.5)
Stays temporarily with a relative	4994 (37.9)	3978 (48.5)
Co-occurring SUD (%)			<0.001	0.299
No additional substance	4397 (33.4)	1727 (21.1)
One additional substance	4898 (37.2)	3215 (39.2)
More than one additional substance	3883 (29.5)	3258 (39.7)
Have children (Dichotomized) = Yes (%)	11522 (87.4)	7287 (88.9)	0.002	0.044
Cause of Discharge (%)			<0.001	0.222
Administrative discharge	994 (7.5)	744 (9.1)
Early Drop-out	1638 (12.4)	1522 (18.6)
Late Drop-out	4784 (36.3)	2313 (28.2)
Ongoing treatment	1026 (7.8)	573 (7.0)
Referral to another treatment	1628 (12.4)	1042 (12.7)
Therapeutic discharge	3108 (23.6)	2006 (24.5)
Días de Tratamiento (valores perdidos en la fecha de egreso se reemplazaron por la diferencia con 2019-11-13)/Days of Treatment (missing dates of discharge were replaced with difference from 2019-11-13) (mean (SD))	247.57 (198.18)	221.48 (190.78)	<0.001	0.134
Setting of Treatment = Residential (%)	603 (4.6)	3323 (40.5)	<0.001	0.953

#"tipo_de_plan_ambulatorio",
#https://cran.r-project.org/web/packages/tableone/vignettes/smd.html
#http://rstudio-pubs-static.s3.amazonaws.com/405765_2ce448f9bde24148a5f94c535a34b70e.html
#https://cran.r-project.org/web/packages/tableone/vignettes/introduction.html
#https://cran.r-project.org/web/packages/tableone/tableone.pdf
#https://www.rdocumentation.org/packages/tableone/versions/0.12.0/topics/CreateTableOne

## Construct a table 
#standardized mean differences of greater than 0.1

Final Sample

Some transitions were shown to be simultaneous. Small adjustment such that transitions were sequential rather than simultaneous.

#  dplyr::filter(motivodeegreso_mod_imp!="En curso")%>% #Sacar los tratamientos que estén en curso 

tab1_lab<- paste0('Original C1 Dataset \n(n = ', formatC(nrow(CONS_C1), format='f', big.mark=',', digits=0), ';\nusers: ',formatC(CONS_C1%>% dplyr::distinct(HASH_KEY)%>% nrow(), format='f', big.mark=',', digits=0),')')
tab2_lab<- paste0('C1 Dataset \n(n = ', formatC(nrow(CONS_C1_df_dup_SEP_2020), format='f', big.mark=',', digits=0), ';\nusers: ',formatC(CONS_C1_df_dup_SEP_2020%>% dplyr::distinct(hash_key)%>% nrow(), format='f', big.mark=',', digits=0),')')
tab1_5_lab<- paste0('&#8226; Duplicated entries\\l &#8226; Overlapping treatments of users\\l &#8226; Intermediate events of treatment (continuous referrals)')
tab22_lab<- paste0('C1 Dataset \n(n = ', formatC(nrow(CONS_C1_df_dup_SEP_2020_women), format='f', big.mark=',', digits=0), ';\nusers: ',formatC(CONS_C1_df_dup_SEP_2020_women%>% dplyr::distinct(hash_key)%>% nrow(), format='f', big.mark=',', digits=0),')')
tab32_lab <- paste0('&#8226;Discard men (n= ',formatC(CONS_C1_df_dup_SEP_2020%>% dplyr::filter(sexo_2=="Men") %>% nrow(), format='f', big.mark=',', digits=0), '; users=',formatC(CONS_C1_df_dup_SEP_2020%>% dplyr::filter(sexo_2=="Men") %>% distinct(hash_key) %>% nrow(), format='f', big.mark=',', digits=0),')\\\\l&#8226;Discard posterior treatments (n= ',formatC(CONS_C1_df_dup_SEP_2020%>% dplyr::filter(dup>1) %>% nrow(), format='f', big.mark=',', digits=0), '; users=',formatC(CONS_C1_df_dup_SEP_2020%>% dplyr::filter(dup>1) %>% distinct(hash_key) %>% nrow(), format='f', big.mark=',', digits=0),')\\\\l&#8226;Discard population of minors (n= ',formatC(CONS_C1_df_dup_SEP_2020%>% dplyr::filter(edad_al_ing<18) %>% nrow(), format='f', big.mark=',', digits=0), '; users=',formatC(CONS_C1_df_dup_SEP_2020%>% dplyr::filter(edad_al_ing<18) %>% distinct(hash_key) %>% nrow(), format='f', big.mark=',', digits=0),')\\\\l')
tab3_5_lab<- paste0('C1 Dataset \n(n = ', formatC(nrow(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_descartados), format='f', big.mark=',', digits=0), ';\nusers: ',formatC(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_descartados%>% dplyr::distinct(hash_key)%>% nrow(), format='f', big.mark=',', digits=0),')')
tab_miss <-paste0("  Impute missing values: (n=",CONS_C1_df_dup_SEP_2020_women_miss[,..vector_variables_only_for_imputation] %>% complete.cases() %>% janitor::tabyl() %>% dplyr::filter(.=="FALSE") %>% dplyr::select(n) %>% as.numeric() %>% format(big.mark=","),")")
tab4_lab<- paste0('Imputed C1 Dataset \n(n = ', formatC(nrow(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados), format='f', big.mark=',', digits=0), ';\nusers: ',formatC(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados%>% dplyr::distinct(hash_key)%>% nrow(), format='f', big.mark=',', digits=0),')')
tab_ref_lab <- paste0("&#8226; Referrals (n=",CONS_C1_df_dup_SEP_2020_women_miss_after_imp %>% dplyr::filter(motivodeegreso_mod_imp=="Referral to another treatment") %>% nrow() %>% format(big.mark=","),")\\\\l&#8226; Referrals That Had a Readmission (n=",CONS_C1_df_dup_SEP_2020_women_miss_after_imp %>% dplyr::filter(motivodeegreso_mod_imp=="Referral to another treatment", !is.na(fech_ing_next_treat)) %>% nrow() %>% format(big.mark=","),")\\\\l")
tab_final <- paste0("  Imputed C1 Dataset w/o Referrals w/ Readmissions: \n(n=",CONS_C1_df_dup_SEP_2020_women_miss_after_imp %>% dplyr::mutate(cens=ifelse(motivodeegreso_mod_imp=="Referral to another treatment"& !is.na(fech_ing_next_treat),1,0)) %>% dplyr::filter(cens==0) %>% nrow() %>% format(big.mark=","),'\n users= ',CONS_C1_df_dup_SEP_2020_women_miss_after_imp %>% dplyr::mutate(cens=ifelse(motivodeegreso_mod_imp=="Referral to another treatment"& !is.na(fech_ing_next_treat),1,0)) %>% dplyr::filter(cens==0) %>% distinct(hash_key) %>% nrow() %>% format(big.mark=","),")")

#https://stackoverflow.com/questions/46750364/diagrammer-and-graphviz
#https://mikeyharper.uk/flowcharts-in-r-using-diagrammer/
#http://blog.nguyenvq.com/blog/2012/05/29/better-decision-tree-graphics-for-rpart-via-party-and-partykit/
#http://blog.nguyenvq.com/blog/2014/01/17/skeleton-to-create-fast-automatic-tree-diagrams-using-r-and-graphviz/
#https://cran.r-project.org/web/packages/DiagrammeR/vignettes/graphviz-mermaid.html
#https://stackoverflow.com/questions/39133058/how-to-use-graphviz-graphs-in-diagrammer-for-r
#https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781789802566/1/ch01lvl1sec21/creating-diagrams-via-the-diagrammer-package
#https://justlegal.be/2019/05/using-flowcharts-to-display-legal-procedures/
# paste0("No. of treatments: ",table(table(t_id_1)) %>% formatC(big.mark = ","),"; No. of controls: ",table(table(c_id_1))%>% formatC(big.mark = ","))
#
library(DiagrammeR) #⋉
grViz("digraph flowchart {
      # node definitions with substituted label text
      node [fontname = Times, shape = rectangle,fontsize = 9]        
      tab1 [label = '@@1']
      tab2 [label = '@@2']
      tab32 [label = '@@5',fontsize = 7]
      tab22 [label = '@@6']
      tab3 [label = '&#8226;Duplicated entries\\l&#8226;Intermediate events of treatment (continuous referrals)\\l',fontsize = 7]
      tab4 [label = '@@4']
      blank [label = '', width = 0.0001, height = 0.0001]
      blank1 [label = '', width = 0.0001, height = 0.0001]
      blank2 [label = '', width = 0.0001, height = 0.0001]
      blank3 [label = '', width = 0.0001, height = 0.0001]
      tab5 [label = '&#8226;Logically inconsistent candidates for imputation\\l&#8226;Ties in candidates for imputation\\l',fontsize = 7]
      tab6 [label = '@@8',fontsize = 7,style=dotted]#
      tab7 [label = '@@9',style=dotted]#

      # edge definitions with the node IDs
      tab1 -> blank [arrowhead = none,label='  Data wrangling and normalization process',fontsize = 8];
      blank -> tab3
      blank -> tab2
      tab2 -> blank1 [arrowhead = none,label='  Sample selection',fontsize = 8];
      blank1 -> tab32
      blank1 -> tab22
      tab22 -> blank2 [arrowhead = none, label= '@@7', fontsize= 8];
      blank2 -> tab5 
      blank2 -> tab4 [label='  Result of the imputation of missing values',fontsize = 8];#
      tab4 -> blank3 [arrowhead = none,label='  Unkown exact times of discharge',fontsize = 8, style=dotted];#
      blank3 -> tab6 [style=dotted]
      blank3 -> tab7 [style=dotted]
            subgraph {
              rank = same; tab3; blank;
            }
              subgraph {
              rank = same; tab32; blank1;
            }
            subgraph {
              rank = same; tab5; blank2;
            }
            subgraph {
              rank = same; tab6; blank3;
            }
      }

      [1]:  tab1_lab
      [2]:  tab2_lab
      [3]:  tab1_5_lab
      [4]:  tab4_lab
      [5]:  tab32_lab
      [6]:  tab22_lab
      [7]:  tab_miss
      [8]:  tab_ref_lab
      [9]:  tab_final
      ")

#      {rank=same; 'tab2'' -> tab3 [label='',fontsize = 11]}; #⋉
#CONS_C1_df_dup_SEP_2020_irrs_health

tab_miss <-paste0("      Impute missing values: (n=",CONS_C1_df_dup_SEP_2020_women_miss[,..vector_variables_only_for_imputation] %>% complete.cases() %>% janitor::tabyl() %>% dplyr::filter(.=="FALSE") %>% dplyr::select(n) %>% as.numeric() %>% format(big.mark=","),")")
tab4_lab<- paste0('Imputed C1 Dataset \n(users: ',formatC(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados%>% dplyr::distinct(hash_key)%>% nrow(), format='f', big.mark=',', digits=0),')')
tab32_lab <- paste0('&#8226;Discard men (users=',formatC(CONS_C1_df_dup_SEP_2020%>% dplyr::filter(sexo_2=="Men") %>% distinct(hash_key) %>% nrow(), format='f', big.mark=',', digits=0),')\\\\l&#8226;Discard posterior treatments (>2) (n= ',formatC(CONS_C1_df_dup_SEP_2020%>% dplyr::filter(dup>2) %>% nrow(), format='f', big.mark=',', digits=0), '; users=',formatC(CONS_C1_df_dup_SEP_2020%>% dplyr::filter(dup>2) %>% distinct(hash_key) %>% nrow(), format='f', big.mark=',', digits=0),')        \\\\l&#8226;Discard population of minors (users=',formatC(CONS_C1_df_dup_SEP_2020%>% dplyr::filter(edad_al_ing<18) %>% distinct(hash_key) %>% nrow(), format='f', big.mark=',', digits=0),')\\\\l')

gr_selected<-
grViz("digraph flowchart {
      # node definitions with substituted label text
      node [fontname = Times, shape = rectangle,fontsize = 10]
      tab1 [label = '@@1']
      tab2 [label = '@@2']
      tab32 [label = '@@5',fontsize = 10]
      tab22 [label = '@@6', fontsize= 11]
      tab3 [label = '&#8226;Duplicated entries\\l&#8226;Intermediate events of treatment (continuous referrals)        \\l',fontsize = 10]
      tab4 [label = '@@4']
      blank [label = '', width = 0.0001, height = 0.0001]
      blank1 [label = '', width = 0.0001, height = 0.0001]
      blank2 [label = '', width = 0.0001, height = 0.0001]
      blank3 [label = '', width = 0.0001, height = 0.0001]
      tab5 [label = '&#8226;Logically inconsistent candidates for imputation    \\l&#8226;Ties in candidates for imputation\\l',fontsize = 10]

      # edge definitions with the node IDs
      tab1 -> blank [arrowhead = none,label='      Data wrangling and normalization process',fontsize = 13];
      blank -> tab3
      blank -> tab2
      tab2 -> blank1 [arrowhead = none,label='  Sample selection',fontsize = 13];
      blank1 -> tab32
      blank1 -> tab22
      tab22 -> blank2 [arrowhead = none, label= '@@7', fontsize= 13];
      blank2 -> tab5 
      blank2 -> tab4 [label='      Result of the imputation of missing values',fontsize = 13];#

            subgraph {
              rank = same; tab3; blank;
            }
              subgraph {
              rank = same; tab32; blank1;
            }
            subgraph {
              rank = same; tab5; blank2;
            }
      }

      [1]:  tab1_lab
      [2]:  tab2_lab
      [3]:  tab1_5_lab
      [4]:  tab4_lab
      [5]:  tab32_lab
      [6]:  tab22_lab
      [7]:  tab_miss
      [8]:  tab_ref_lab
      ")
#install.packages("rsvg")
#install.packages("DiagrammeRsvg")

library(rsvg)
library(DiagrammeRsvg)
gr_selected %>%
    export_svg %>% charToRaw %>% rsvg_pdf("flowchart_women_specific_programs.pdf")
gr_selected %>%
    export_svg %>% charToRaw %>% rsvg_png("flowchart_women_specific_programs.png")
#library(htmltools); html_print(HTML(gr_selected %>% export_svg))

#DiagrammeR::export_graph(gr_selected, height= 10, width=8)
#      {rank=same; 'tab2'' -> tab3 [label='',fontsize = 11]}; #⋉
#CONS_C1_df_dup_SEP_2020_irrs_health

Survival analysis

Incidence rate

To describe the incidence rate of SUD treatment readmissions by type of program, we recoded the different variables of interest into pairs of different groups by each variable.

CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr<-
CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados %>%  
  dplyr::left_join(CONS_C1_df_dup_SEP_2020[,c("row","person_years")], by="row") %>% 
  dplyr::mutate(readmission= factor(ifelse(!is.na(fech_ing_next_treat),1,0))) %>% 
  dplyr::mutate(day_to_readmission= dplyr::case_when(
                        readmission==1~ (fech_ing_next_treat-fech_ing_num)/365.25,#,
                        readmission==0~ (as.numeric(as.Date("2019-11-13"))-fech_ing_num)/365.25)) %>% 
  dplyr::mutate(comp_status=factor(dplyr::case_when(motivodeegreso_mod_imp=="Therapeutic discharge"~1,grepl("Drop",motivodeegreso_mod_imp)~2,grepl("Administrative",motivodeegreso_mod_imp)~2,T~0),levels = c(0,1,2),labels = c("Censored","Therapeutic discharge","Discharge without clinical advice"))) %>% 
  dplyr::mutate(time_to_outcome= dplyr::case_when(
                        grepl("Censored",comp_status,ignore.case= T)~(as.numeric(as.Date("2019-11-13"))-fech_ing_num)/365.25,
                        grepl("Discharge",comp_status,ignore.case= T)~(fech_egres_num-fech_ing_num)/365.25)) %>% 
  dplyr::mutate(outcome_to_readmission= dplyr::case_when(
                        readmission==1~ (fech_ing_next_treat-fech_egres_num)/365.25,# & grepl("",comp_status)
                        readmission==0~ (as.numeric(as.Date("2019-11-13"))-fech_egres_num)/365.25))
  
library(survminer)
tipo_de_programa_2_fit<- survfit(Surv(day_to_readmission, readmission==1) ~ strata(tipo_de_programa_2), 
                      data=CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr,
            type      = "kaplan-meier",
            error     = "greenwood",
            conf.type = "log-log") 
                      
survdiff_tipo_de_programa_2<-survdiff(Surv(day_to_readmission, readmission==1) ~ tipo_de_programa_2, 
                          data=CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr)

library(tidyverse)
library(lubridate)
library(ggfortify) 

tipo_de_programa_2_na <- tipo_de_programa_2_fit %>% fortify %>% group_by(strata) %>% mutate(CumHaz = cumsum(n.event/n.risk)) 
#%>% mutate(time=time*365.25)

#If one of the groups has not yet dropped to 50% survival at the end of the available data, you cannot compute a median survival and there will be NA values for median survival in such cases. Even if median survival has been reached in a group, it might not be possible to calculate complete confidence intervals for those median values, as you have seen.

#for which <50% of records have an event. Without further assumptions (e.g. that some specific parametric distribution applies) it is then not possible to give a point estimate for the median time to event (the time when half the records have an event).

#http://rstudio-pubs-static.s3.amazonaws.com/522481_5e55bec9c94044678e680a6d07e96a2e.html
#https://rstudio-pubs-static.s3.amazonaws.com/258589_cd197f86fb5548ac89d7bcffd4bc6afe.html
#http://pcool.dyndns.org:8080/statsbook/?page_id=513
#http://rstudio-pubs-static.s3.amazonaws.com/316989_83cbe556125645b698c9ff6cf88c4c1a.html
#https://cran.r-project.org/web/packages/survminer/readme/README.html
#https://docs.ufpr.br/~jlpadilha/CE077/Aulas/2.TecnicasNaoParametricas.pdf 
#http://www.columbia.edu/~sjm2186/EPIC_R/packages.pdf

if(no_mostrar==1){
tipo_de_programa_2_na %>% 
    ggplot(aes(time,CumHaz,color=strata,fill=strata))+
    geom_line()+
    geom_ribbon(aes(ymin = CumHaz - (1.96*std.err), ymax = CumHaz + (1.96*std.err)),alpha=.4)+
    sjPlot::theme_sjplot2()+
    scale_color_manual(name="Type of Program", values=c("#E69F00", "#56B4E9"))+
    scale_fill_manual(name="Type of Program", values=c("#E69F00", "#56B4E9"))
}

ggsurvplot_tipo_de_programa_2_fit<-
  ggsurvplot(tipo_de_programa_2_fit, 
           fun = "cumhaz",
           conf.int = TRUE,
           legend.labs = c("Mixed gender", "Women only"), 
           risk.table = "abs_pct",
           #ncensor.plot = TRUE,
           ggtheme = theme_classic2(base_size=10),
           #ylim=c(0,1),
           legend = c(0.88, 0.15), 
           legend.title="Type of Program",
           xlab= "Time (in years)", 
           cumevents = TRUE,
           surv.connect = T,
           tables.theme = theme_cleantable(),
           censor= F,
           tables.height = 0.15,
           risk.table.y.text.col = F,
           risk.table.col="black",
           font.tickslab = c(10),
           risk.table.height = .15,
           risk.table.fontsize = 2.5,
           cumevents.y.text.col = F,
           cumevents.col="black",
           cumevents.height = .12,
           cumevents.fontsize = 2,
           break.time.by = 1,
           pval = F,
           #xscale=  "d_y", #scale days to years
           palette = c("skyblue4","orangered4"))
ggsurvplot_tipo_de_programa_2_fit

Figure 5a. Cum. Hazards to Experience Readmission to SUD Treatment, Stratified by Type of Program

 # scale_y_continuous(breaks = sort(c(seq(0, 100, 10), 56)))

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

irrs<-function(x, y="event", z="person_days",db){
  #x= variable que agrupa
  #y= evento explicado
  #z= person days
  #db= base de datos
  fmla <- as.formula(paste0(y,"~",x))
  fmla2 <- as.formula(paste0(z,"~",x))
assign(paste0("irr_",y,"_por_",x),
       rateratio.test::rateratio.test(
     x=as.numeric(xtabs(fmla, data=get(db)))[c(2,1)],
     n=as.numeric(xtabs(fmla, data=get(db)))[c(2,1)]
    )
   )
return(
  rateratio.test::rateratio.test(
     x=as.numeric(xtabs(fmla, data=get(db)))[c(2,1)],
     n=as.numeric(xtabs(fmla2, data=get(db)))[c(2,1)]
      )
    )
}
#biostat3::survRate(Surv((day_to_readmission)/1000, readmission==1) ~ tipo_de_programa_2, data=CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk)

irrs_tipo_de_programa_2<-irrs(x="tipo_de_programa_2", z="day_to_readmission", y="as.numeric(readmission)-1", db="CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr")
#General population=1     Women specific=2 

if(no_mostrar==1){
jpeg("C:/Users/andre/Desktop/SUD_CL/eso.jpg", height=10, width= 10, res= 96, units = "in")
ggsurvplot_tipo_de_programa_2_fit
dev.off()
}

The incidence rate of readmission was 1.55 (95% IC 1.47-1.63) higher in users that were admitted in women-only programs, compared with users that were admitted in mixed-gender programs (p<0.001).

library(survminer)
comp_status_fit<- survfit(Surv(outcome_to_readmission, readmission==1) ~strata(comp_status), 
            data= CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr,
            type      = "kaplan-meier",
            error     = "greenwood",
            conf.type = "log-log") 
library(tidyverse)
library(lubridate)
library(ggfortify) 
comp_status_fit_na <- comp_status_fit %>% fortify %>% group_by(strata) %>% mutate(CumHaz = cumsum(n.event/n.risk))

ggsurvplot_comp_status_fit<-
  ggsurvplot(comp_status_fit, 
           fun = "cumhaz",
           conf.int = TRUE,
          legend.labs = c("Censored","Therapeutic discharge", "Discharge w/o clinical advice"), 
           risk.table = "abs_pct",
           #ncensor.plot = TRUE,
           ggtheme = theme_classic2(base_size=10),
           risk.table.y.text.col = F,
           risk.table.col="black",
           font.tickslab = c(10),
           risk.table.height = .15,
           risk.table.fontsize = 2.5,
           break.time.by = 1,
           pval = F,
           cumevents = TRUE,
           tables.theme = theme_cleantable(),
           cumevents.y.text.col = F,
           cumevents.col="black",
           cumevents.height = .11,
           cumevents.fontsize = .5,
           #ylim=c(0,10),
           legend = c(0.58, 0.15), 
           legend.title="Type of Program",
           xlab= "Time (in years)", 
           #cumevents=T,
           surv.connect = T,
           censor= F,
           #xscale=  "d_y",
           palette = c("skyblue4","darkgreen","orangered4"))
ggsurvplot_comp_status_fit

Figure 5b. Cum. Hazards to Experience Readmission to SUD Treatment, Stratified by Treatment Outcome

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

#table(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$comp_status,
#      CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$motivodeegreso_mod_imp)

irrs<-function(x, y="event", z="person_days",db){
  #x= variable que agrupa
  #y= evento explicado
  #z= person days
  #db= base de datos
  fmla <- as.formula(paste0(y,"~",x))
  fmla2 <- as.formula(paste0(z,"~",x))
assign(paste0("irr_",y,"_por_",x),
       rateratio.test::rateratio.test(
     x=as.numeric(xtabs(fmla, data=get(db)))[c(2,1)],
     n=as.numeric(xtabs(fmla, data=get(db)))[c(2,1)]
    )
   )
return(
  rateratio.test::rateratio.test(
     x=as.numeric(xtabs(fmla, data=get(db)))[c(2,1)],
     n=as.numeric(xtabs(fmla2, data=get(db)))[c(2,1)]
      )
    )
}

survdiff_comp_status<-survdiff(Surv(outcome_to_readmission, readmission==1) ~ comp_status, 
                          data=CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr)

CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk<-
CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr[which(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$comp_status!='Censored')]

CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk$outcome_to_readmission<-
ifelse(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk$outcome_to_readmission==0,.000001,
       CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk$outcome_to_readmission)

#biostat3::survRate(Surv((outcome_to_readmission)/1000, readmission==1) ~ comp_status, data=CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk)
#para compararlo
comp_status_readmission <-
  irrs(x="as.numeric(comp_status)", 
                              y="as.numeric(readmission)-1",# 2021-04-06, 
                              z="outcome_to_readmission", 
                              db="CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk")
time_bef_inc_rate<-Sys.time()

Excluding censored cases, the incidence rate of readmission was significantly higher (1.32, 95% IC 1.23-1.42) in users that had a Discharge without clinical advice, compared with users that had a Therapeutic discharge (p<0.001).

Parametric regression of the risk of Readmission, Stratified by Treatment Outcome

To estimate the risk and the time to treatment readmission by type of treatment program (i.e, women-only and mixed-gender treatment programs) conditioned by previous treatment outcome (i.e., administrative discharge, early and late drop-outs, therapeutic discharge), we chose the parametric distribution that resembles most to the Kaplan-Meier survival curve among the intercept-only models.

#Revisión de los casos
#print(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr[,c("comp_status", "outcome_to_readmission", "readmission")],10)
#_#_#_#_#_#_#_#_#_#_#_

fitted_flexsurvreg0<-data.frame()
fit_flexsurvreg0<-data.frame()
dists_simple_surv <- cbind.data.frame(
                          formal=rep(c("Weibull (AFT)", "Weibull (PH)", "Gompertz", "Log-logistic", "Gamma",
                                       "Generalized gamma", "Lognormal", "Exponential"),1),
                          model=rep(c("weibull", "weibullph", "llogis", "gamma", "gengamma", "gompertz", "lnorm", "exp"),1))

fitform0a <- Surv(outcome_to_readmission, readmission==1) ~  1

fitform0b <- Surv(outcome_to_readmission, readmission==1) ~  edad_al_ing_grupos+ escolaridad_rec+ sus_principal_mod+ 
         freq_cons_sus_prin+ compromiso_biopsicosocial+ tenencia_de_la_vivienda_mod+ 
         num_otras_sus_mod+ numero_de_hijos_mod_rec+ tipo_de_plan_res+ tipo_de_programa_2+ time_to_outcome 

fitform0c <- Surv(outcome_to_readmission, readmission==1) ~  tipo_de_programa_2


#goes out for 15 years with cycles of one month 
tt15<-seq(0,15,1/12)    
newtime0 = seq(from=1/24, to= 15, by=1/24)

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

#_#_#_#_#_#_#_#_#_#_#
#_#_#_#_#_#_#_#_#_#_##comp_status
for (u in c("Therapeutic discharge","Discharge without clinical advice")){
for (i in 1:nrow(dists_simple_surv)){  #
model<-paste0("mod_surv_simple_",dists_simple_surv[i,"model"])
  flexsurvreg(formula=fitform0a, 
  # IF time 0, then its changed by .000001 to avoid having continous transitions
  # Excluded censored cases at outcomes
              #data = CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk,
              data = subset(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk,comp_status==u),
              dist = dists_simple_surv[i,"model"]) %>% 
    assign(model,.,envir=,.GlobalEnv)
  
  #Generate databases
 fitted_flexsurvreg0<-  rbind(fitted_flexsurvreg0,cbind.data.frame(dist=rep(dists_simple_surv[i,"formal"],),
                                                                   status=u,
                        data.table::data.table(summary(get(model), t=newtime0, type = "hazard", tidy=T)))) 
  # Generate fit indices
  fit_flexsurvreg0<-rbind(fit_flexsurvreg0,
     cbind(dist= dists_simple_surv[i,"formal"],
           status=u,
           fitstats.flexsurvreg(get(model))))
  #the BIC may not be appropriate if none of the candidate models are considered to be close to the ‘true’ model.     
  }
}

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

#Calculate error
library("muhaz")
kernel_haz_est_simple_td <- muhaz(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk$outcome_to_readmission[which(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk$comp_status=="Therapeutic discharge")],
                        as.numeric(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk$readmission[which(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk$comp_status=="Therapeutic discharge")])-1)

kernel_haz_est_simple_dwca <- muhaz(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk$outcome_to_readmission[which(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk$comp_status=="Discharge without clinical advice")],
                        as.numeric(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk$readmission[which(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk$comp_status=="Discharge without clinical advice")])-1)

kernel_haz_simple <- rbind(data.table(time = kernel_haz_est_simple_td$est.grid,
                         est = kernel_haz_est_simple_td$haz.est,
                         dist = "Kernel density",
                         status= "Therapeutic discharge"),
                         data.table(time = kernel_haz_est_simple_dwca$est.grid,
                         est = kernel_haz_est_simple_dwca$haz.est,
                         dist = "Kernel density",
                         status= "Discharge without clinical advice"))
                         
kernel_haz_simple_binned<-
  kernel_haz_simple %>% 
  dplyr::mutate(time=ifelse(time<.01,.01,time)) %>% 
  dplyr::group_by(status) %>% 
  dplyr::mutate(x_binned = cut(time, 
          breaks= seq(0,max(kernel_haz_simple$time),.05))) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(status, dist, x_binned) %>% 
  dplyr::summarise(mean_time=mean(time,na.rm=T),mean_est=mean(est,na.rm=T)) %>% 
  dplyr::ungroup() %>% 
  dplyr::select(-dist)

fitted_flexsurvreg0_binned<-
  fitted_flexsurvreg0[,c("status","dist","time","est")] %>% 
  dplyr::filter(time<=max(kernel_haz_simple$time)) %>%
  dplyr::mutate(time=ifelse(time<.01,.01,time)) %>% 
  dplyr::group_by(status, dist) %>%
  dplyr::mutate(x_binned = cut(time, 
          breaks= seq(0,max(kernel_haz_simple$time),.05))) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(status, dist, x_binned) %>% 
  dplyr::summarise(mean_time=mean(time,na.rm=T),mean_est=mean(est,na.rm=T)) %>% 
  dplyr::ungroup()     

fitted_flexsurvreg0_binned_mix<-
fitted_flexsurvreg0_binned %>% 
  dplyr::left_join(kernel_haz_simple_binned[,c("x_binned","mean_est")], by="x_binned",suffix = c("_par","_kernel"))

db_for_apply_rmse0<-
  data.frame(dist=rep(c("Weibull (AFT)", "Weibull (PH)", "Gompertz", "Log-logistic", "Gamma",
                "Generalized gamma", "Lognormal", "Exponential"),1))
   
rmse_comp_fits_0<- data.frame()
for(i in 1:nrow(db_for_apply_rmse0)){
rmse1<- Metrics::rmse(subset(fitted_flexsurvreg0_binned_mix[complete.cases(fitted_flexsurvreg0_binned_mix),], 
                     dist==db_for_apply_rmse0[i,"dist"] & status=="Discharge without clinical advice")$mean_est_par,
              subset(fitted_flexsurvreg0_binned_mix[complete.cases(fitted_flexsurvreg0_binned_mix),], 
                     dist==db_for_apply_rmse0[i,"dist"] & status=="Discharge without clinical advice")$mean_est_kernel)
rmse2<- Metrics::rmse(subset(fitted_flexsurvreg0_binned_mix[complete.cases(fitted_flexsurvreg0_binned_mix),], 
                     dist==db_for_apply_rmse0[i,"dist"] & status=="Therapeutic discharge")$mean_est_par,
              subset(fitted_flexsurvreg0_binned_mix[complete.cases(fitted_flexsurvreg0_binned_mix),], 
                     dist==db_for_apply_rmse0[i,"dist"] & status=="Therapeutic discharge")$mean_est_kernel)

rmse_comp_fits_0<- rbind(rmse_comp_fits_0,cbind(dist=db_for_apply_rmse0[i,"dist"],
                                                  rmse_dwca=round(rmse1,4), rmse_td=round(rmse2,4)))
}

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
haz0 <- rbind(kernel_haz_simple, fitted_flexsurvreg0[,1:4])

haz_plot_0_int_only<-
haz0 %>% 
  dplyr::mutate(dist=factor(dist,levels=c("Kernel density",dists_simple_surv$formal))) %>% 
  dplyr::filter(est<1.5) %>% 
ggplot()+
    geom_line(aes(time, est, color=dist),size=1)+
    scale_color_manual(name="Distributions", values = c("black","#f54b96","#00e9b1","#69b763",
"#166000","#b27ff9","#fa863b","#013eab","#a7aa48","#b34b40"))+
                         #c("black",brewer.pal(n = 9, name = 'Paired')))+
                         #c("#112A60","#085754","#D3A347","#4F3C91","red","#112A60","#085754","#8F630D","#251363")) +
    facet_wrap(~status)+#labeller = labeller(trans = transition_label))+
    sjPlot::theme_sjplot2()+
    theme(legend.position="bottom",
          strip.background = element_rect(fill = "white", colour = "white"))+
  scale_x_continuous(breaks = seq(0, 15, 2))+
  #theme(axis.text.x = element_blank(), 
  #      panel.grid.major = element_blank(), 
  #      panel.grid.minor = element_blank()) +
  labs(y="Hazard",x="Time (years)")

haz_plot_0_int_only

Figure 5. Comparison of parametric models on intercept-only models vs. Kernel hazard density curves

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

dt_coefs_simple_surv<-
data.frame(
real_vars=c("tipo_de_programa_2Women specific", "edad_al_ing_grupos.L", "edad_al_ing_grupos.Q", "edad_al_ing_grupos.C","escolaridad_rec.L", "escolaridad_rec.Q", "sus_principal_modCocaine hydrochloride", "sus_principal_modCocaine paste", "sus_principal_modMarijuana", "sus_principal_modOther",  "freq_cons_sus_prin.L", "freq_cons_sus_prin.Q", "freq_cons_sus_prin.C", "freq_cons_sus_prin^4", "compromiso_biopsicosocial.L", "compromiso_biopsicosocial.Q", "tenencia_de_la_vivienda_modOthers", "tenencia_de_la_vivienda_modOwner/Transferred dwellings/Pays Dividends", "tenencia_de_la_vivienda_modRenting", "tenencia_de_la_vivienda_modStays temporarily with a relative", "num_otras_sus_mod.L", "num_otras_sus_mod.Q", "numero_de_hijos_mod_recYes", "tipo_de_plan_resResidential", "comp_statusDischarge without clinical advice", "tipo_de_programa_2Women specific:comp_statusDischarge without clinical advice","time_to_outcome"),
 formal_vars= c('Type of program-Women specific', 'Age at admission to treatment, grouped- 30-39', 'Age at admission to treatment, grouped- 40-49', 'Age at admission to treatment, grouped- 50+', 'Ed. Attainment- Completed high school or less', 'Ed. Attainment- More than high school', 'Primary or main substance- Cocaine hydrochloride', 'Primary or main substance- Cocaine paste', 'Primary or main substance- Marijuana', 'Primary or main substance- Other', 'Consumption frequency of primary or main substance- 2 to 3 days a week', 'Consumption frequency of primary or main substance- 4 to 6 days a week', 'Consumption frequency of primary or main substance- 1 day a week or more', 'Consumption frequency of primary or main substance- Daily', 'Biopsychosocial involvement- 2-Moderate', 'Biopsychosocial involvement- 3-Severe', 'Tenure status of households- Others', 'Tenure status of households- Owner/Transferred dwellings/Pays Dividends', 'Tenure status of households- Renting', 'Tenure status of households- Stays temporarily with a relative', 'Co-occurring SUD- One additional substance', 'Co-occurring SUD- More than one additional substance', 'Have children (Dichotomized)- Yes', 'Setting of Treatment- Residential', 'Cause of Discharge- W/o clinical advice', 'Type of Program x Cause of Discharge', 'Days in baseline treatment')#'Educational Attainment', tipo_de_plan_resResidential
)

  # si habían tiempos 0, los cambié por .000001 para que no hayan casos absolutamente continuos
  # saqué los casos censurados

flexsurv_1a_td<-
flexsurvreg(formula=fitform0b, 
              data = subset(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk,grepl("Therapeutic",comp_status)),
              dist = "Gompertz"
    ) 
flexsurv_1b_td<-
flexsurvreg(formula=fitform0b, 
              data = subset(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk,grepl("Therapeutic",comp_status)),
              dist = "weibullph"
    ) 
flexsurv_1c_td<-
flexsurvreg(formula=fitform0c, 
              data = subset(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk,grepl("Therapeutic",comp_status)),
              dist = "Gompertz"
    ) 
flexsurv_1d_td<-
flexsurvreg(formula=fitform0c, 
              data = subset(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk,grepl("Therapeutic",comp_status)),
              dist = "weibullph"
    ) 
flexsurv_1a_dwca<-
flexsurvreg(formula=fitform0b, 
              data = subset(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk,grepl("without",comp_status)),
              dist = "Gompertz"
    ) 
flexsurv_1b_dwca<-
flexsurvreg(formula=fitform0b, 
              data = subset(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk,grepl("without",comp_status)),
              dist = "Gamma"
    ) 
flexsurv_1c_dwca<-
flexsurvreg(formula=fitform0c, 
              data = subset(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk,grepl("without",comp_status)),
              dist = "Gompertz"
    ) 
flexsurv_1d_dwca<-
flexsurvreg(formula=fitform0c, 
              data = subset(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_cmprsk,grepl("without",comp_status)),
              dist = "Gamma"
    ) 
#effect = "distp") # distp for distribution parameters (name can be changed)
#tipo_de_programa_2Women specific:comp_statusDischarge without clinical advice

note_flexsurv_1a_td<-
  paste0("N= ",format(as.numeric(flexsurv_1a_td$N), big.mark=","),"; Events= ",format(as.numeric(flexsurv_1a_td$events), big.mark=","),"; Censored= ",format(as.numeric(flexsurv_1a_td$N - flexsurv_1a_td$events), big.mark=","),"; Time at risk= ",format(as.numeric(flexsurv_1a_td$trisk), big.mark=","),"; Df= ",format(as.numeric(flexsurv_1a_td$npars), big.mark=","), "; AIC= ",format(as.numeric(flexsurv_1a_td$AIC), big.mark=","))
note_flexsurv_1b_td <-
  paste0("N= ",format(as.numeric(flexsurv_1b_td$N), big.mark=","),"; Events= ",format(as.numeric(flexsurv_1b_td$events), big.mark=","),"; Censored= ",format(as.numeric(flexsurv_1b_td$N - flexsurv_1b_td$events), big.mark=","),"; Time at risk= ",format(as.numeric(flexsurv_1b_td$trisk), big.mark=","),"; Df= ",format(as.numeric(flexsurv_1b_td$npars), big.mark=","), "; AIC= ",format(as.numeric(flexsurv_1b_td$AIC), big.mark=","))
note_flexsurv_1c_td<-
  paste0("N= ",format(as.numeric(flexsurv_1c_td$N), big.mark=","),"; Events= ",format(as.numeric(flexsurv_1c_td$events), big.mark=","),"; Censored= ",format(as.numeric(flexsurv_1c_td$N - flexsurv_1c_td$events), big.mark=","),"; Time at risk= ",format(as.numeric(flexsurv_1c_td$trisk), big.mark=","),"; Df= ",format(as.numeric(flexsurv_1c_td$npars), big.mark=","), "; AIC= ",format(as.numeric(flexsurv_1c_td$AIC), big.mark=","))
note_flexsurv_1d_td <-
  paste0("N= ",format(as.numeric(flexsurv_1d_td$N), big.mark=","),"; Events= ",format(as.numeric(flexsurv_1d_td$events), big.mark=","),"; Censored= ",format(as.numeric(flexsurv_1d_td$N - flexsurv_1d_td$events), big.mark=","),"; Time at risk= ",format(as.numeric(flexsurv_1d_td$trisk), big.mark=","),"; Df= ",format(as.numeric(flexsurv_1d_td$npars), big.mark=","), "; AIC= ",format(as.numeric(flexsurv_1d_td$AIC), big.mark=","))

note_flexsurv_1a_dwca<-
  paste0("N= ",format(as.numeric(flexsurv_1a_dwca$N), big.mark=","),"; Events= ",format(as.numeric(flexsurv_1a_dwca$events), big.mark=","),"; Censored= ",format(as.numeric(flexsurv_1a_dwca$N - flexsurv_1a_dwca$events), big.mark=","),"; Time at risk= ",format(as.numeric(flexsurv_1a_dwca$trisk), big.mark=","),"; Df= ",format(as.numeric(flexsurv_1a_dwca$npars), big.mark=","), "; AIC= ",format(as.numeric(flexsurv_1a_dwca$AIC), big.mark=","))
note_flexsurv_1b_dwca <-
  paste0("N= ",format(as.numeric(flexsurv_1b_dwca$N), big.mark=","),"; Events= ",format(as.numeric(flexsurv_1b_dwca$events), big.mark=","),"; Censored= ",format(as.numeric(flexsurv_1b_dwca$N - flexsurv_1b_dwca$events), big.mark=","),"; Time at risk= ",format(as.numeric(flexsurv_1b_dwca$trisk), big.mark=","),"; Df= ",format(as.numeric(flexsurv_1b_dwca$npars), big.mark=","), "; AIC= ",format(as.numeric(flexsurv_1b_dwca$AIC), big.mark=","))
note_flexsurv_1c_dwca<-
  paste0("N= ",format(as.numeric(flexsurv_1c_dwca$N), big.mark=","),"; Events= ",format(as.numeric(flexsurv_1c_dwca$events), big.mark=","),"; Censored= ",format(as.numeric(flexsurv_1c_dwca$N - flexsurv_1c_dwca$events), big.mark=","),"; Time at risk= ",format(as.numeric(flexsurv_1c_dwca$trisk), big.mark=","),"; Df= ",format(as.numeric(flexsurv_1c_dwca$npars), big.mark=","), "; AIC= ",format(as.numeric(flexsurv_1c_dwca$AIC), big.mark=","))
note_flexsurv_1d_dwca <-
  paste0("N= ",format(as.numeric(flexsurv_1d_dwca$N), big.mark=","),"; Events= ",format(as.numeric(flexsurv_1d_dwca$events), big.mark=","),"; Censored= ",format(as.numeric(flexsurv_1d_dwca$N - flexsurv_1d_dwca$events), big.mark=","),"; Time at risk= ",format(as.numeric(flexsurv_1d_dwca$trisk), big.mark=","),"; Df= ",format(as.numeric(flexsurv_1d_dwca$npars), big.mark=","), "; AIC= ",format(as.numeric(flexsurv_1d_dwca$AIC), big.mark=","))

if(no_mostrar==1){
jpeg("C:/Users/andre/Desktop/SUD_CL/eso4b.jpg", height=10, width= 8, res= 96, units = "in")
  haz_plot_0_int_only
dev.off()
}

We chose the most adequate distribution following visual assessment of the smooth kernel hazards. This was followed by a measure of the error of the data, and considering that the smooth kernel distribution had different time points than the parametric survival distributions presented, we split time into 198 categories, each one separated .05 years from 0 to the last available time point in the smooth kernel (in this case, 9.8562628 years). Once we had the same amount of data between the smooth kernel hazards and each distribution and different cause of discharge, we calculated the RMSE by comparing the smooth kernel hazards with each parametric survival distribution and type of discharge.

The estimated model included covariates of interest, stratified by Cause of Discharge of the baseline treatment.

#<div style="border: 1px solid #ddd; padding: 5px; overflow-y: scroll; height:350px; overflow-x: scroll; width:100%">
coefs_1a<-
data.table::data.table(round(exp(flexsurv_1a_td$res),2)[,1:3],keep.rownames = T) %>% 
   dplyr::left_join(dt_coefs_simple_surv, by=c("rn"="real_vars")) %>% 
  #dplyr::mutate(p.value=ifelse(p.value<.001,"<0.001",as.character(sprintf("%1.3f",p.value)))) %>% 
  dplyr::mutate(conf.low=sprintf("%1.2f",`L95%`),
                conf.high=sprintf("%1.2f",`U95%`),
                statistic=sprintf("%1.2f",est),
                `CI 95%`=paste0(conf.low,", ",conf.high)) %>% 
  dplyr::filter(!rn %in% c("shape","scale","rate")) %>% 
  dplyr::select(formal_vars, statistic, `CI 95%`) 

coefs_1b<-
data.table::data.table(round(exp(flexsurv_1b_td$res),2)[,1:3],keep.rownames = T) %>% 
   dplyr::left_join(dt_coefs_simple_surv, by=c("rn"="real_vars")) %>% 
  #dplyr::mutate(p.value=ifelse(p.value<.001,"<0.001",as.character(sprintf("%1.3f",p.value)))) %>% 
  dplyr::mutate(conf.low=sprintf("%1.2f",`L95%`),
                conf.high=sprintf("%1.2f",`U95%`),
                statistic=sprintf("%1.2f",est),
                `CI 95%`=paste0(conf.low,", ",conf.high)) %>% 
  dplyr::filter(!rn %in% c("shape","scale","rate")) %>% 
  dplyr::select(formal_vars, statistic, `CI 95%`) 

coefs_1a2<-
data.table::data.table(round(exp(flexsurv_1c_td$res),2)[,1:3],keep.rownames = T) %>% 
   dplyr::left_join(dt_coefs_simple_surv, by=c("rn"="real_vars")) %>% 
  #dplyr::mutate(p.value=ifelse(p.value<.001,"<0.001",as.character(sprintf("%1.3f",p.value)))) %>% 
  dplyr::mutate(conf.low=sprintf("%1.2f",`L95%`),
                conf.high=sprintf("%1.2f",`U95%`),
                statistic=sprintf("%1.2f",est),
                `CI 95%`=paste0(conf.low,", ",conf.high)) %>% 
  dplyr::filter(!rn %in% c("shape","scale","rate")) %>% 
  dplyr::select(formal_vars, statistic, `CI 95%`) 

coefs_1b2<-
data.table::data.table(round(exp(flexsurv_1d_td$res),2)[,1:3],keep.rownames = T) %>% 
   dplyr::left_join(dt_coefs_simple_surv, by=c("rn"="real_vars")) %>% 
  #dplyr::mutate(p.value=ifelse(p.value<.001,"<0.001",as.character(sprintf("%1.3f",p.value)))) %>% 
  dplyr::mutate(conf.low=sprintf("%1.2f",`L95%`),
                conf.high=sprintf("%1.2f",`U95%`),
                statistic=sprintf("%1.2f",est),
                `CI 95%`=paste0(conf.low,", ",conf.high)) %>% 
  dplyr::filter(!rn %in% c("shape","scale","rate")) %>% 
  dplyr::select(formal_vars, statistic, `CI 95%`) 


coefs_1c<-
data.table::data.table(round(exp(flexsurv_1a_dwca$res),2)[,1:3],keep.rownames = T) %>% 
   dplyr::left_join(dt_coefs_simple_surv, by=c("rn"="real_vars")) %>% 
  #dplyr::mutate(p.value=ifelse(p.value<.001,"<0.001",as.character(sprintf("%1.3f",p.value)))) %>% 
  dplyr::mutate(conf.low=sprintf("%1.2f",`L95%`),
                conf.high=sprintf("%1.2f",`U95%`),
                statistic=sprintf("%1.2f",est),
                `CI 95%`=paste0(conf.low,", ",conf.high)) %>% 
  dplyr::filter(!rn %in% c("shape","scale","rate")) %>% 
  dplyr::select(formal_vars, statistic, `CI 95%`) 

coefs_1d<-
data.table::data.table(round(exp(flexsurv_1b_dwca$res),2)[,1:3],keep.rownames = T) %>% 
   dplyr::left_join(dt_coefs_simple_surv, by=c("rn"="real_vars")) %>% 
  #dplyr::mutate(p.value=ifelse(p.value<.001,"<0.001",as.character(sprintf("%1.3f",p.value)))) %>% 
  dplyr::mutate(conf.low=sprintf("%1.2f",`L95%`),
                conf.high=sprintf("%1.2f",`U95%`),
                statistic=sprintf("%1.2f",est),
                `CI 95%`=paste0(conf.low,", ",conf.high)) %>% 
  dplyr::filter(!rn %in% c("shape","scale","rate")) %>% 
  dplyr::select(formal_vars, statistic, `CI 95%`) 

coefs_1c2<-
data.table::data.table(round(exp(flexsurv_1c_dwca$res),2)[,1:3],keep.rownames = T) %>% 
   dplyr::left_join(dt_coefs_simple_surv, by=c("rn"="real_vars")) %>% 
  #dplyr::mutate(p.value=ifelse(p.value<.001,"<0.001",as.character(sprintf("%1.3f",p.value)))) %>% 
  dplyr::mutate(conf.low=sprintf("%1.2f",`L95%`),
                conf.high=sprintf("%1.2f",`U95%`),
                statistic=sprintf("%1.2f",est),
                `CI 95%`=paste0(conf.low,", ",conf.high)) %>% 
  dplyr::filter(!rn %in% c("shape","scale","rate")) %>% 
  dplyr::select(formal_vars, statistic, `CI 95%`) 

coefs_1d2<-
data.table::data.table(round(exp(flexsurv_1d_dwca$res),2)[,1:3],keep.rownames = T) %>% 
    dplyr::left_join(dt_coefs_simple_surv, by=c("rn"="real_vars")) %>% 
    #dplyr::mutate(p.value=ifelse(p.value<.001,"<0.001",as.character(sprintf("%1.3f",p.value)))) %>% 
    dplyr::mutate(conf.low=sprintf("%1.2f",`L95%`),
                  conf.high=sprintf("%1.2f",`U95%`),
                  statistic=sprintf("%1.2f",est),
                  `CI 95%`=paste0(conf.low,", ",conf.high)) %>% 
    dplyr::filter(!rn %in% c("shape","scale","rate")) %>% 
    dplyr::select(formal_vars, statistic, `CI 95%`) 

coefs_surv<-
dplyr::right_join(coefs_1a2,coefs_1a,by="formal_vars")%>% #
    dplyr::left_join(coefs_1b2,by="formal_vars")%>% 
    dplyr::left_join(coefs_1b,by="formal_vars")%>% 
    dplyr::left_join(coefs_1c2,by="formal_vars")%>%
    dplyr::left_join(coefs_1c,by="formal_vars")%>%
    dplyr::left_join(coefs_1d2,by="formal_vars")%>% 
    dplyr::left_join(coefs_1d,by="formal_vars")
 options(knitr.kable.NA = '')

coefs_surv %>%   
    knitr::kable(format= "html", format.args= list(decimal.mark= ".", big.mark= ","),
               caption=paste0("Table 5. Coefficients of selected parametric distributions"),
               col.names = c("Term",rep(c("Haz","CI 95%"),8)),
               align= c("l",rep('c', 16)))%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),font_size=10)%>%
  kableExtra::row_spec(1, bold = TRUE) %>% 
  kableExtra::add_header_above(c(" ", "Gompertz" = 4, "Weibull (PH)" = 4, "Gompertz" = 4, "Gamma" = 4),bold=T) %>% 
  kableExtra::add_header_above(c(" ", "Therapeutic discarge" = 8, "Discharge w/o clinical advice" = 8),bold=T) %>% 
  kableExtra::add_footnote(c("Note. From left to right:",paste0("Gompertz: ",note_flexsurv_1c_td),
    paste0("Gompertz: ",note_flexsurv_1a_td),
    paste0("Weibull (PH): ",note_flexsurv_1d_td),
    paste0("Weibull (PH): ",note_flexsurv_1b_td),
    paste0("Gompertz: ",note_flexsurv_1c_dwca),
    paste0("Gompertz: ",note_flexsurv_1a_dwca),
    paste0("Gamma: ",note_flexsurv_1d_dwca),
    paste0("Gamma: ",note_flexsurv_1b_dwca)), notation = "none") %>%
  kableExtra::scroll_box(width = "100%", height = "350px")

Table 5. Coefficients of selected parametric distributions
	Therapeutic discarge								Discharge w/o clinical advice
	Gompertz				Weibull (PH)				Gompertz				Gamma
Term	Haz	CI 95%	Haz	CI 95%	Haz	CI 95%	Haz	CI 95%	Haz	CI 95%	Haz	CI 95%	Haz	CI 95%	Haz	CI 95%
Type of program-Women specific	1.85	1.64, 2.08	1.07	0.92, 1.24	1.83	1.62, 2.06	1.07	0.92, 1.24	1.44	1.35, 1.54	1.11	1.03, 1.20	1.89	1.68, 2.14	1.23	1.06, 1.42
Age at admission to treatment, grouped- 30-39			0.81	0.69, 0.96			0.82	0.70, 0.97			0.73	0.65, 0.82			0.56	0.45, 0.69
Age at admission to treatment, grouped- 40-49			0.89	0.77, 1.01			0.89	0.78, 1.01			0.87	0.79, 0.95			0.77	0.65, 0.92
Age at admission to treatment, grouped- 50+			1.09	0.97, 1.23			1.09	0.96, 1.23			1.01	0.94, 1.09			1.01	0.88, 1.17
Ed. Attainment- Completed high school or less			0.83	0.72, 0.95			0.83	0.72, 0.95			0.85	0.78, 0.92			0.72	0.61, 0.85
Ed. Attainment- More than high school			0.95	0.85, 1.05			0.95	0.86, 1.05			0.89	0.84, 0.95			0.80	0.71, 0.89
Primary or main substance- Cocaine hydrochloride			1.24	1.02, 1.50			1.21	1.00, 1.46			1.17	1.04, 1.31			1.32	1.07, 1.64
Primary or main substance- Cocaine paste			1.16	0.98, 1.36			1.15	0.97, 1.35			1.32	1.20, 1.45			1.63	1.36, 1.94
Primary or main substance- Marijuana			0.75	0.55, 1.03			0.75	0.54, 1.02			0.97	0.82, 1.14			0.90	0.66, 1.22
Primary or main substance- Other			0.65	0.46, 0.92			0.64	0.45, 0.90			0.80	0.63, 1.03			0.63	0.39, 1.02
Consumption frequency of primary or main substance- 2 to 3 days a week			0.92	0.74, 1.13			0.91	0.74, 1.13			1.26	1.10, 1.44			1.55	1.20, 2.00
Consumption frequency of primary or main substance- 4 to 6 days a week			1.05	0.87, 1.28			1.06	0.87, 1.29			0.85	0.75, 0.96			0.73	0.58, 0.92
Consumption frequency of primary or main substance- 1 day a week or more			1.09	0.88, 1.36			1.10	0.88, 1.37			1.10	0.98, 1.24			1.20	0.96, 1.50
Consumption frequency of primary or main substance- Daily			1.17	0.97, 1.41			1.17	0.97, 1.42			0.87	0.79, 0.96			0.78	0.64, 0.94
Biopsychosocial involvement- 2-Moderate			1.34	1.08, 1.68			1.34	1.07, 1.67			1.07	0.95, 1.21			1.13	0.90, 1.42
Biopsychosocial involvement- 3-Severe			1.03	0.89, 1.18			1.02	0.89, 1.18			1.01	0.94, 1.10			1.03	0.89, 1.19
Tenure status of households- Others			1.64	0.83, 3.27			1.62	0.81, 3.23			0.95	0.69, 1.30			0.83	0.46, 1.47
Tenure status of households- Owner/Transferred dwellings/Pays Dividends			1.47	0.81, 2.69			1.50	0.82, 2.75			0.99	0.76, 1.28			0.93	0.58, 1.48
Tenure status of households- Renting			1.34	0.72, 2.47			1.38	0.75, 2.54			0.99	0.76, 1.28			0.90	0.56, 1.45
Tenure status of households- Stays temporarily with a relative			1.53	0.84, 2.78			1.56	0.86, 2.84			0.94	0.73, 1.21			0.83	0.52, 1.32
Co-occurring SUD- One additional substance			1.30	1.14, 1.48			1.30	1.14, 1.48			1.09	1.01, 1.16			1.13	1.00, 1.28
Co-occurring SUD- More than one additional substance			0.83	0.75, 0.92			0.83	0.75, 0.92			1.04	0.98, 1.10			1.08	0.97, 1.20
Have children (Dichotomized)- Yes			0.97	0.80, 1.18			0.97	0.80, 1.18			1.12	1.00, 1.25			1.25	1.01, 1.55
Setting of Treatment- Residential			2.17	1.84, 2.56			2.13	1.80, 2.51			1.63	1.48, 1.79			2.16	1.83, 2.56
Days in baseline treatment			1.01	0.90, 1.13			1.02	0.91, 1.14			1.27	1.18, 1.37			1.54	1.34, 1.77
Note. From left to right:
Gompertz: N= 5,114; Events= 1,063; Censored= 4,051; Time at risk= 16,862.38; Df= 3; AIC= 7,574.561
Gompertz: N= 5,114; Events= 1,063; Censored= 4,051; Time at risk= 16,862.38; Df= 27; AIC= 7,329.749
Weibull (PH): N= 5,114; Events= 1,063; Censored= 4,051; Time at risk= 16,862.38; Df= 3; AIC= 7,221.371
Weibull (PH): N= 5,114; Events= 1,063; Censored= 4,051; Time at risk= 16,862.38; Df= 27; AIC= 6,989.208
Gompertz: N= 11,995; Events= 3,595; Censored= 8,400; Time at risk= 43,125.41; Df= 3; AIC= 23,179.76
Gompertz: N= 11,995; Events= 3,595; Censored= 8,400; Time at risk= 43,125.41; Df= 27; AIC= 22,852.54
Gamma: N= 11,995; Events= 3,595; Censored= 8,400; Time at risk= 43,125.41; Df= 3; AIC= 22,337.32
Gamma: N= 11,995; Events= 3,595; Censored= 8,400; Time at risk= 43,125.41; Df= 27; AIC= 22,044.21

#</div>

#"baseline.real" which transforms the baseline distribution parameters to the real number line used for estimation.
#"coefs.exp" which exponentiates the covariate effects.
library(flexsurv)
newdat <- data.table::data.table(tipo_de_programa_2= factor(c(rep("Women specific",2),rep("General population",2))),
  comp_status= factor(rep(c("Therapeutic discharge","Discharge without clinical advice"),2)),
  edad_al_ing_grupos= factor(rep("50+",4)),
  escolaridad_rec= factor(rep("1-More than high school",4)),
  sus_principal_mod= factor(rep("Marijuana",4)),
  freq_cons_sus_prin= factor(rep("2 to 3 days a week",4)),
  compromiso_biopsicosocial= factor(rep("1-Mild",4)),
  tenencia_de_la_vivienda_mod= factor(rep("Owner/Transferred dwellings/Pays Dividends",4)),
  num_otras_sus_mod= factor(rep("No additional substance",4)),
  numero_de_hijos_mod_rec= factor(rep("No",4)),
  tipo_de_plan_res= factor(rep("Outpatient",4))
                       )
newdat2 <- data.table::data.table(tipo_de_programa_2= factor(c(rep("Women specific",2),rep("General population",2))),#factor(c(rep("Women specific",2),rep("General population",2))),
  comp_status= factor(rep(c("Therapeutic discharge","Discharge without clinical advice"),2)),
  edad_al_ing_grupos= factor(rep("30-39",4)),
  escolaridad_rec= factor(rep("2-Completed high school or less",4)),
  sus_principal_mod= factor(rep("Alcohol",4)),
  freq_cons_sus_prin= factor(rep("Less than 1 day a week",4)),
  compromiso_biopsicosocial= factor(rep("2-Moderate",4)),
  tenencia_de_la_vivienda_mod= factor(rep("Stays temporarily with a relative",4)),
  num_otras_sus_mod= factor(rep("One additional substance",4)),
  numero_de_hijos_mod_rec= factor(rep("Yes",4)),
  tipo_de_plan_res= factor(rep("Outpatient",4))
                       )

haz_covs1 <- summary(flexsurv_1, t= newtime, newdata = newdat, type = "hazard", tidy = TRUE) %>% 
  dplyr::mutate(int=factor(paste0(tipo_de_programa_2, " - ", comp_status))) %>% 
  dplyr::mutate(tooltip= paste0("Days after finishing treatments: ", round(time*365.25),0))
haz_covs2 <- summary(flexsurv_1, t= newtime, newdata = newdat2, type = "hazard", tidy = TRUE) %>% 
  dplyr::mutate(int=factor(paste0(tipo_de_programa_2, " - ", comp_status))) %>% 
  dplyr::mutate(tooltip= paste0("Days after finishing treatments: ", round(time*365.25),0))

plot_haz_covs<-
 ggplot(haz_covs1, aes(x = time, y = est, col = int)) + #[c(2,66),]
    geom_line() +
    xlab("Days") + ylab("Hazard") +
    scale_color_discrete(name = "") +
    theme(legend.position = "bottom")+
   sjPlot::theme_sjplot()
 #+ xlim(c(.1,10))+ ylim(c(0,.2))

plot_haz_covs2<-
 ggplot(haz_covs1) + #[c(2,66),]
    geom_line_interactive(aes(x = time, y = est, col = int, group=int, tooltip=tooltip)) +
    geom_ribbon_interactive(aes(x=time, ymin=lcl, ymax=ucl, fill= int, group=int),alpha=.3)+
    xlab("Time (in years)") + ylab("Hazard") +
    theme(legend.position = "bottom")+
    scale_fill_manual(name="Terms", values =                         c("gray70","#B89673","#A0A36D","#886894","darkorchid4","#496A72","gray70","#496A72")) +
    scale_color_manual(name="Terms", values =                         c("gray70","#B89673","#A0A36D","#886894","darkorchid4","#496A72","#496A72")) +
   sjPlot::theme_sjplot()+
  theme(legend.position="bottom", 
        legend.title=element_text(size=9), 
        legend.text=element_text(size=7))+
guides(color=guide_legend(ncol=2),fill=guide_legend(ncol=2))

tooltip_css <- "background-color:gray;color:white;font-style:italic;padding:10px;border-radius:10px 20px 10px 20px;"
ggiraph(code = {print(plot_haz_covs2)}, tooltip_extra_css = tooltip_css, tooltip_opacity = .75)

Competing risks

We estimated the cumulative incidence (risk over time) of each outcome of treatment type in the presence of the other event types.

time_aft_inc_rate<-Sys.time()

paste0("Time taken in process: ");time_aft_inc_rate-time_bef_inc_rate

## [1] "Time taken in process: "

## Time difference of 1.605014 mins

## Calculate the grouped cumulative incidence functions (CIF)

cuminc_comp_status_0<-
  CumIncidence (CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$time_to_outcome,
                CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$comp_status,
                CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$tipo_de_programa_2, 
                cencode = "Censored", 
                xlab="Time (in years)",
                level=.95)

## 
## +-------------------------------------------------------------------+
## | Cumulative incidence function estimates from competing risks data |
## +-------------------------------------------------------------------+
## Test equality across groups:
##                                   Statistic df p-value
## Therapeutic discharge                 3.674  1 0.05528
## Discharge without clinical advice     9.032  1 0.00265
## 
## Estimates at time points:
##                                                              0      2      4
## General population Therapeutic discharge             0.0012141 0.2451 0.2699
## Women specific Therapeutic discharge                 0.0000000 0.2598 0.2821
## General population Discharge without clinical advice 0.0007588 0.5853 0.5974
## Women specific Discharge without clinical advice     0.0004878 0.5807 0.5871
##                                                           6      8     10
## General population Therapeutic discharge             0.2701 0.2701 0.2701
## Women specific Therapeutic discharge                 0.2824 0.2824 0.2824
## General population Discharge without clinical advice 0.5974 0.5974 0.5974
## Women specific Discharge without clinical advice     0.5871 0.5871 0.5871
## 
## Standard errors:
##                                                              0        2
## General population Therapeutic discharge             0.0003034 0.003963
## Women specific Therapeutic discharge                 0.0000000 0.005132
## General population Discharge without clinical advice 0.0002399 0.004471
## Women specific Discharge without clinical advice     0.0002439 0.005657
##                                                             4        6        8
## General population Therapeutic discharge             0.004127 0.004128 0.004128
## Women specific Therapeutic discharge                 0.005316 0.005323 0.005323
## General population Discharge without clinical advice 0.004478 0.004478 0.004478
## Women specific Discharge without clinical advice     0.005669 0.005669 0.005669
##                                                            10
## General population Therapeutic discharge             0.004128
## Women specific Therapeutic discharge                 0.005323
## General population Discharge without clinical advice 0.004478
## Women specific Discharge without clinical advice     0.005669
## 
## 95% pointwise confidence intervals:
## 
## , , General population Therapeutic discharge
## 
##               0      2      4      6      8     10
## lower 0.0007305 0.2374 0.2619 0.2620 0.2620 0.2620
## upper 0.0019471 0.2529 0.2780 0.2782 0.2782 0.2782
## 
## , , Women specific Therapeutic discharge
## 
##         0      2      4      6      8     10
## lower NaN 0.2498 0.2717 0.2720 0.2720 0.2720
## upper NaN 0.2699 0.2925 0.2929 0.2929 0.2929
## 
## , , General population Discharge without clinical advice
## 
##               0      2      4      6      8     10
## lower 0.0003973 0.5764 0.5886 0.5886 0.5886 0.5886
## upper 0.0013739 0.5940 0.6061 0.6061 0.6061 0.6061
## 
## , , Women specific Discharge without clinical advice
## 
##               0      2      4      6      8     10
## lower 0.0001715 0.5695 0.5759 0.5759 0.5759 0.5759
## upper 0.0012234 0.5917 0.5981 0.5981 0.5981 0.5981

cuminc_comp_status_0b<-
  CumIncidence (CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr[CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$time_to_outcome<3]$time_to_outcome,
                CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr[CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$time_to_outcome<3]$comp_status,
                CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr[CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$time_to_outcome<3]$tipo_de_programa_2, 
                cencode = "Censored", 
                xlab="Time (in years)",
                level=.95)

## 
## +-------------------------------------------------------------------+
## | Cumulative incidence function estimates from competing risks data |
## +-------------------------------------------------------------------+
## Test equality across groups:
##                                   Statistic df p-value
## Therapeutic discharge                 4.015  1 0.04509
## Discharge without clinical advice     7.966  1 0.00477
## 
## Estimates at time points:
##                                                              0     0.5      1
## General population Therapeutic discharge             0.0013314 0.03248 0.1505
## Women specific Therapeutic discharge                 0.0000000 0.03623 0.1588
## General population Discharge without clinical advice 0.0008322 0.38753 0.5738
## Women specific Discharge without clinical advice     0.0005307 0.43215 0.5772
##                                                         1.5      2    2.5
## General population Therapeutic discharge             0.2400 0.2787 0.3020
## Women specific Therapeutic discharge                 0.2502 0.2940 0.3162
## General population Discharge without clinical advice 0.6293 0.6498 0.6613
## Women specific Discharge without clinical advice     0.6228 0.6383 0.6454
## 
## Standard errors:
##                                                              0      0.5
## General population Therapeutic discharge             0.0003327 0.001646
## Women specific Therapeutic discharge                 0.0000000 0.002194
## General population Discharge without clinical advice 0.0002631 0.004513
## Women specific Discharge without clinical advice     0.0002653 0.005784
##                                                             1      1.5        2
## General population Therapeutic discharge             0.003407 0.004129 0.004385
## Women specific Therapeutic discharge                 0.004404 0.005290 0.005637
## General population Discharge without clinical advice 0.004665 0.004612 0.004602
## Women specific Discharge without clinical advice     0.005865 0.005821 0.005825
##                                                           2.5
## General population Therapeutic discharge             0.004547
## Women specific Therapeutic discharge                 0.005829
## General population Discharge without clinical advice 0.004622
## Women specific Discharge without clinical advice     0.005861
## 
## 95% pointwise confidence intervals:
## 
## , , General population Therapeutic discharge
## 
##               0     0.5      1    1.5      2    2.5
## lower 0.0008009 0.02937 0.1438 0.2320 0.2701 0.2931
## upper 0.0021346 0.03582 0.1572 0.2482 0.2873 0.3109
## 
## , , Women specific Therapeutic discharge
## 
##         0     0.5      1    1.5      2    2.5
## lower NaN 0.03211 0.1503 0.2398 0.2830 0.3048
## upper NaN 0.04071 0.1675 0.2606 0.3051 0.3277
## 
## , , General population Discharge without clinical advice
## 
##               0    0.5      1    1.5      2    2.5
## lower 0.0004355 0.3787 0.5646 0.6201 0.6407 0.6521
## upper 0.0015061 0.3964 0.5828 0.6382 0.6588 0.6702
## 
## , , Women specific Discharge without clinical advice
## 
##               0    0.5      1    1.5      2    2.5
## lower 0.0001864 0.4208 0.5656 0.6113 0.6268 0.6338
## upper 0.0013301 0.4435 0.5886 0.6341 0.6496 0.6568

cuminc_comp_status<-  
cuminc(ftime= CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$time_to_outcome,
    fstatus= CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$comp_status, 
    group=CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$tipo_de_programa_2, 
    cencode = "Censored")

## CIF and the variance of point estimates
timepoints_cuminc_comp<-cmprsk::timepoints(cuminc_comp_status, times = seq(0,3,.05))

#Estimates

cuminc_comp_table<-
cbind.data.frame(melt(timepoints_cuminc_comp$est),var=melt(timepoints_cuminc_comp$var)[,3]) %>% 
    dplyr::mutate(lower= value ^ exp(-qnorm(1-(1-.95)/2)*sqrt(var)/(value*log(value)))) %>% 
    dplyr::mutate(upper= value ^ exp(qnorm(1-(1-.95)/2)*sqrt(var)/(value*log(value))))  
    # z <- qnorm(1-(1-.95)/2)
    #  lower <- x$est ^ exp(-qnorm(1-(1-.95)/2)*sqrt(x$var)/(x$est*log(x$est)))
    #  upper <- x$est ^ exp(qnorm(1-(1-.95)/2)*sqrt(x$var)/(x$est*log(x$est)))
    #  col <- if(missing(col)) rep(1:(s-1), rep(g,(s-1))) 
    #         else             rep(col, g*(s-1))
    #  lwd <- if(missing(lwd)) rep(1, g*(s-1)) 
    #         else             rep(lwd, g*(s-1))   

#rio::export(
#CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr, 
#paste0(dta_path,"cmprsk.dta"))

#plot(cuminc_comp_status, col = 1:4, lwd = 3, lty = 1, xlab = "Time (in years)")

#at year 10, the estimated marginal probability of discharge w/o clinical advice was 59%

#https://rpubs.com/oireyescortes/rpubs-supervivencia-riesgos-competitivos
plot_incidence_cum<-
cuminc_comp_table %>% 
    data.frame() %>% 
    dplyr::mutate(Var1=dplyr::case_when(grepl("population Therapeutic",Var1)~"Mixed gender & Therapeutic discharge",
                                        grepl("specific Therapeutic",Var1)~"Women only & Therapeutic discharge",
                                        grepl("specific Discharge without",Var1)~"Women only & Discharge without clinical advice",
                                        T~"Mixed gender & Discharge without clinical advice")) %>% 
    #reshape(idvar= c("V2", "V3"),timevar="V1", direction="wide") %>% 
    ggplot()+
    geom_line(aes(x=Var2, y=value, fill=Var1))+
    geom_ribbon(aes(x=Var2, ymin = lower, ymax = upper, fill=Var1),alpha=.4)+
    sjPlot::theme_sjplot()+
    ylim(0,1)+
    guides(fill=guide_legend(ncol=2))+
    scale_y_continuous(breaks = seq(0,.65, .1),
                     labels = scales::percent,
                     limits = c(0, .65)) +
    scale_x_continuous(breaks=seq(0,3,.25))+
    theme(legend.position="bottom")+
    scale_fill_manual(name="Type of\nprogram &\nOutcome",values=c("cornflowerblue","brown3","#461c50","darkolivegreen"))+
    labs(x="Time (in years)",y="Probability")+
   # geom_vline(xintercept=1095/365.25, color="darkred", linetype=2)
  labs(caption= "Note. There are no discharges after 1,095 days from the admission.")

plot_incidence_cum

Figure 6. Cum. Incidence of Treatment Outcome, Grouped by Type of Program

if(no_mostrar==1){
jpeg("C:/Users/andre/Desktop/SUD_CL/eso7.jpg", height=10, width= 10, res= 96, units = "in")
plot_incidence_cum
dev.off()
}
time_bef_cmprsk<-Sys.time()

tests_comp_status_crsk<-
data.table(cuminc_comp_status$Tests, keep.rownames = T)

The cumulative incidence allow us to estimate different causes of failure (Therapeutic discharge & Discharge without clinical advice) and compare between groups (Women in Women-only and Mixed-gender Programs).

#<div style="border: 1px solid #ddd; padding: 5px; overflow-y: scroll; height:350px; overflow-x: scroll; width:100%">

## Regression (crr can only take a covariate matrix)

# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5326634/
# Para ver residuos
# https://www.nature.com/articles/bmt2009359.pdf?origin=ppub

 covars <- model.matrix(~edad_al_ing_grupos+ escolaridad_rec+ sus_principal_mod+ 
                 freq_cons_sus_prin+ compromiso_biopsicosocial+ tenencia_de_la_vivienda_mod+ 
                 num_otras_sus_mod+ numero_de_hijos_mod_rec+ tipo_de_programa_2+ tipo_de_plan_res, CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr)[,-1]
#The final [,-1] removes the constant term from the output of model.matrix

bmtDisMat <- matrix(
              cbind(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$tipo_de_programa_2,
                    CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$tipo_de_plan_res,
                    #2021-04-30, removed marital status
                    #CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$estado_conyugal_2,
                    CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$edad_al_ing_grupos,
                    CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$escolaridad_rec,
                    CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$sus_principal_mod,
                    CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$freq_cons_sus_prin,
                    CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$compromiso_biopsicosocial,
                    CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$tenencia_de_la_vivienda_mod,
                    CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$num_otras_sus_mod,
                    CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$numero_de_hijos_mod_rec
                    ),ncol=10)
colnames(bmtDisMat) <- c("program","plan","age at admission","educational attainment","substance","frequency of subs use","biopsychosoc comp", "housing cond", "other subs", "kids")

cov_alone<-matrix(as.numeric(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$tipo_de_programa_2=="Women specific"))

dimnames(cov_alone)[[2]] <- c("Type of program-Women specific")

resCrrRelByDis_only <-
cmprsk::crr(ftime= CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$time_to_outcome, 
   fstatus= CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$comp_status, 
   cov1= cov_alone,
   failcode = "Discharge without clinical advice", 
   cencode  = "Censored")

resCrrRelByDis <- cmprsk::crr(ftime= CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$time_to_outcome, # vector of failure/censoring times
                      fstatus= CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$comp_status, # vector with a unique code for each failure type and censoring
                      cov1= covars, #  matrix (nobs x ncovs) of fixed covariates
                      ## cov2     = , # matrix of covariates that will be multiplied by functions of time
                      ## tf       = , # functions of time
                      ## cengroup = , # vector with different values for each group with a distinct censoring distribution
                      failcode = "Discharge without clinical advice", # code of fstatus that denotes the failure type of interest
                      cencode  = "Censored" # code of fstatus that denotes censored observations
                      )

#~15 min. in a DELL 3ghz 32gb ram
#2021-04-30, we deleted marital status
caso_hipotetico<-rbind(bmtDisMat[120,2:10],bmtDisMat[120,2:10])

caso_hipotetico[1,]<- c(1,4,3,4,1,1,3,1,1)#2021-04-30, we deleted marital status
caso_hipotetico[2,]<- c(1,4,3,4,1,1,3,1,1)#2021-04-30, we deleted marital status
#The first argument to the function wald.test() is the estimated covariance matrix for the coefficients, followed
#by the vector of coefficients estimates, and the position of coefficients for which we want to assess significance
#aod::wald.test(resCrrRelByDis$var, resCrrRelByDis$coef, Terms = 1:3)
#aod::wald.test(resCrrRelByDis$var, resCrrRelByDis$coef, Terms = 6:9)
#aod::wald.test(resCrrRelByDis$var, resCrrRelByDis$coef, Terms = 10:13)
#aod::wald.test(resCrrRelByDis$var, resCrrRelByDis$coef, Terms = 16:19) #no diffs in tennence in every  levels
#
#Phase is a factor with relapse as baseline, so each P-value provides a test for the difference
#of each level with respect to the baseline. An overall Pvalue for Phase (the overall P-value is always required
#when modeling a factor with more than two levels), can be obtained through the Wald test. 
# ESTO PERMITE VER SI UNA VRIABLE FACTOR TIENE DIFERENCIAS EN TODOS SUS NIVELES
 options(knitr.kable.NA = '')

data.table::data.table(summary(resCrrRelByDis)$conf.int, keep.rownames = T) %>% 
  dplyr::left_join(dt_coefs_simple_surv, by=c("rn"="real_vars"))%>% 
  dplyr::mutate(formal_vars= ifelse(rn=="factor(tipo_de_programa_2)Women specific","Type of program-Women specific",formal_vars))%>% 
  dplyr::left_join(data.table::data.table(summary(resCrrRelByDis_only)$conf.int,keep.rownames=T), by=c("formal_vars"="rn"))%>% 
  dplyr::mutate(formal_vars=dplyr::case_when(grepl("Type of program-Women specific",formal_vars)~" Type of program-Women only",T~formal_vars)) %>% 
  dplyr::arrange(formal_vars) %>% 
  dplyr::mutate(conf.low2=sprintf("%1.2f",`2.5%.y`),
                conf.high2=sprintf("%1.2f",`97.5%.y`),
                conf.low1=sprintf("%1.2f",`2.5%.x`),
                conf.high1=sprintf("%1.2f",`97.5%.x`),
                `exp(coef)2`=sprintf("%1.2f",`exp(coef).y`),
                `exp(coef)1`=sprintf("%1.2f",`exp(coef).x`),
                `exp(-coef)2`=sprintf("%1.2f",`exp(-coef).y`),
                `exp(-coef)1`=sprintf("%1.2f",`exp(-coef).x`),
                `CI 95%1`=paste0(conf.low1,", ",conf.high1),
                `CI 95%2`=paste0(conf.low2,", ",conf.high2)) %>% 
  dplyr::filter(!rn %in% c("shape","scale")) %>% 
  dplyr::select(formal_vars, `exp(coef)2`, `CI 95%2`,
                `exp(coef)1`,  `CI 95%1`) %>%
  dplyr::mutate(`CI 95%1`=ifelse(`CI 95%1`=="NA, NA",NA,`CI 95%1`)) %>% 
  dplyr::mutate(`CI 95%2`=ifelse(`CI 95%2`=="NA, NA",NA,`CI 95%2`)) %>% 
  dplyr::mutate_at(c(2,4),~as.numeric(.)) %>% 
    knitr::kable(format= "html", format.args= list(decimal.mark= ".", big.mark= ","),
               caption=paste0("Table 6. Competing Risks (Treatment Outcomes, Failure Type of Interest= Discharge without clinical advice) Regression Coefficients"),
               col.names = c("Term","Hazard", "CI 95%", "Hazard", "CI 95%"),
               align= c("l",rep('c', 6)))%>%
  kableExtra::row_spec(1, bold = TRUE) %>% 
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),font_size=11)%>%
  kableExtra::add_header_above(c(" ", "Only Type of Program" = 2, "Plus other covariates" = 2),bold=T) %>% 
  kableExtra::add_footnote("Note. ", notation = "none") %>%
  kableExtra::scroll_box(width = "100%", height = "340px")

Table 6. Competing Risks (Treatment Outcomes, Failure Type of Interest= Discharge without clinical advice) Regression Coefficients
	Only Type of Program		Plus other covariates
Term	Hazard	CI 95%	Hazard	CI 95%
Type of program-Women only	1.06	1.02, 1.10	0.95	0.91, 0.99
Age at admission to treatment, grouped- 30-39			0.70	0.67, 0.74
Age at admission to treatment, grouped- 40-49			0.95	0.91, 0.99
Age at admission to treatment, grouped- 50+			0.99	0.96, 1.03
Biopsychosocial involvement- 2-Moderate			1.00	0.94, 1.06
Biopsychosocial involvement- 3-Severe			1.05	1.01, 1.09
Co-occurring SUD- More than one additional substance			0.93	0.90, 0.96
Co-occurring SUD- One additional substance			1.05	1.01, 1.09
Consumption frequency of primary or main substance- 1 day a week or more			1.14	1.08, 1.21
Consumption frequency of primary or main substance- 2 to 3 days a week			1.12	1.06, 1.19
Consumption frequency of primary or main substance- 4 to 6 days a week			0.94	0.89, 0.99
Consumption frequency of primary or main substance- Daily			1.09	1.03, 1.15
Ed. Attainment- Completed high school or less			0.79	0.76, 0.83
Ed. Attainment- More than high school			0.97	0.94, 1.00
Have children (Dichotomized)- Yes			1.06	1.00, 1.13
Primary or main substance- Cocaine hydrochloride			1.30	1.23, 1.37
Primary or main substance- Cocaine paste			1.54	1.47, 1.62
Primary or main substance- Marijuana			1.12	1.04, 1.22
Primary or main substance- Other			0.84	0.76, 0.94
Setting of Treatment- Residential			0.92	0.87, 0.97
Tenure status of households- Others			1.03	0.86, 1.22
Tenure status of households- Owner/Transferred dwellings/Pays Dividends			0.88	0.76, 1.02
Tenure status of households- Renting			0.94	0.80, 1.09
Tenure status of households- Stays temporarily with a relative			0.95	0.82, 1.10
Note.

#resCrrRelByDis$loglik.null
#resCrrRelByDis$loglik

#For eGFR as determinant, inverse hazard ratio was determined (1/HR) in order to report risk of ESKD associated with decrease of eGFR

# Scrucca L, Santucci A, Aversa F (2009) Regression Modeling of Competing 
#   Risk Using R: An In Depth Guide for Clinicians. Submitted to Bone Marrow Transplantation
#MODEL SELECTION
modsel.crr(resCrrRelByDis_only,resCrrRelByDis)

## Model selection table
## 
## Model 0: Null model
## Model 1: cmprsk::crr(ftime = CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$time_to_outcome, fstatus = CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$comp_status, cov1 = cov_alone, failcode = "Discharge without clinical advice", cencode = "Censored")
## Model 2: cmprsk::crr(ftime = CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$time_to_outcome, fstatus = CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$comp_status, cov1 = covars, failcode = "Discharge without clinical advice", cencode = "Censored")
##   Num.obs  logLik Df.fit    BIC BIC diff
## 0   21378 -114520      0 229039   1143.6
## 1   21378 -114515      1 229040   1144.9
## 2   21378 -113828     24 227895      0.0

#Hacer distintos modelos y compararlos
#</div>

# To adjust for confounding, three models were constructed: the first model was adjusted for sex and age and the second model was further adjusted for smoking, T2DM, SBP, BMI, non-HDL-cholesterol and exercise (if not a determinant of interest). A third model was constructed with addition of use of glucose-lowering medication, antihypertensive medication and lipid-lowering medication to the second model.

#Predicted Cumulative Incidence
#We predict the CIF of death for a justice appointed at age 55 completed years in 1950 and 2000. The code below adds 0.5 to convert to exact ages and mid years.)

df_pred_cmprsk_only<-
predict(resCrrRelByDis_only, matrix(rbind("Type of program-Women specific"=1,"Type of program-Women specific"=0),nrow=2))

cifd_only <- data.frame(
   program = factor(rep(c("Women specific", "General population"),each=length(df_pred_cmprsk_only[,1]))),
   years = c(df_pred_cmprsk_only[,1],df_pred_cmprsk_only[,1]),
   cif = c(df_pred_cmprsk_only[,2], df_pred_cmprsk_only[,3])
 )
ggplot(cifd_only, aes(years, cif, color=program)) + geom_step()

df_pred_cmprsk<-
predict(resCrrRelByDis, rbind(c(program=1,caso_hipotetico[1,]),c(program=0,caso_hipotetico[1,])))

cifd <- data.frame(
  program = factor(rep(c("Women specific", "General population"),each=length(df_pred_cmprsk[,1]))),
   years = c(df_pred_cmprsk[,1],df_pred_cmprsk[,1]),
   cif = c(df_pred_cmprsk[,2], df_pred_cmprsk[,3])
 )

ggplot(cifd, aes(years, cif, color=program)) + geom_step() + scale_color_manual(name="Type of program", values=c("tomato", "steelblue"), labels=c("Mixed gender", "Women only"))
#We see that the probability of dying while serving in the court is declining over time net of age at appointment. For a justice appointed at age 55, the probability would be 31.6% if appointed in 1950 and 22.0% if appointed in the year 2000, assuming of course that current trends continue, so the model applies.

time_aft_cmprsk<-Sys.time()

#https://rpubs.com/kaz_yos/cmprsk2

paste0("Time taken in process: ");time_aft_cmprsk-time_bef_cmprsk

## [1] "Time taken in process: "

## Time difference of 5.792451 mins

par(mfrow = c(4,6))
for(j in 1:ncol(resCrrRelByDis$res))
  scatter.smooth(resCrrRelByDis$uft, 
                 resCrrRelByDis$res[,j],
    main = names(resCrrRelByDis$coef)[j],
    xlab = "Failure time",
    ylab = "Schoenfeld residuals")

Figure 7a. Plots of Schoenfeld-type residuals against time failure of the covariates

#Schoenfeld residuals against failure time for each covariate. It is noted that the residuals follows a non-constant distribution across failure times, indicating a potential violation to the proportional assumption.

#the PH assumption can be assessed by fitting a LOESS curve to the plot. A straight line passing through a residual value of 0 with gradient 0 indicates that the variable satisfies the PH assumption and therefore does not depend on time.
time_bef_cmprsk2<-Sys.time()

options(knitr.kable.NA = '')

resCrrRelByDis_only2 <-
cmprsk::crr(ftime= CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$time_to_outcome, 
   fstatus= CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$comp_status, 
   cov1= cov_alone,
   failcode = "Therapeutic discharge", 
   cencode  = "Censored")

resCrrRelByDis2 <- cmprsk::crr(ftime= CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$time_to_outcome, # vector of failure/censoring times
                      fstatus= CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$comp_status, # vector with a unique code for each failure type and censoring
                      cov1= covars, #  matrix (nobs x ncovs) of fixed covariates
                      ## cov2     = , # matrix of covariates that will be multiplied by functions of time
                      ## tf       = , # functions of time
                      ## cengroup = , # vector with different values for each group with a distinct censoring distribution
                      failcode = "Therapeutic discharge", # code of fstatus that denotes the failure type of interest
                      cencode  = "Censored" # code of fstatus that denotes censored observations
                      )

data.table::data.table(summary(resCrrRelByDis2)$conf.int, keep.rownames = T) %>% 
  dplyr::left_join(dt_coefs_simple_surv, by=c("rn"="real_vars"))%>% 
  dplyr::mutate(formal_vars= ifelse(rn=="factor(tipo_de_programa_2)Women specific","Type of program-Women specific",formal_vars))%>% 
  dplyr::left_join(data.table::data.table(summary(resCrrRelByDis_only2)$conf.int,keep.rownames=T), by=c("formal_vars"="rn"))%>%  
  dplyr::mutate(formal_vars=dplyr::case_when(grepl("Type of program-Women specific",formal_vars)~" Type of program-Women only",T~formal_vars)) %>% 
  dplyr::arrange(formal_vars) %>% 
  dplyr::mutate(conf.low2=sprintf("%1.2f",`2.5%.y`),
                conf.high2=sprintf("%1.2f",`97.5%.y`),
                conf.low1=sprintf("%1.2f",`2.5%.x`),
                conf.high1=sprintf("%1.2f",`97.5%.x`),
                `exp(coef)2`=sprintf("%1.2f",`exp(coef).y`),
                `exp(coef)1`=sprintf("%1.2f",`exp(coef).x`),
                `exp(-coef)2`=sprintf("%1.2f",`exp(-coef).y`),
                `exp(-coef)1`=sprintf("%1.2f",`exp(-coef).x`),
                `CI 95%1`=paste0(conf.low1,", ",conf.high1),
                `CI 95%2`=paste0(conf.low2,", ",conf.high2)) %>% 
  dplyr::filter(!rn %in% c("shape","scale")) %>% 
  dplyr::select(formal_vars, `exp(coef)2`, `CI 95%2`,
                `exp(coef)1`, `CI 95%1`) %>%
  dplyr::mutate(`CI 95%1`=ifelse(`CI 95%1`=="NA, NA",NA,`CI 95%1`)) %>% 
  dplyr::mutate(`CI 95%2`=ifelse(`CI 95%2`=="NA, NA",NA,`CI 95%2`)) %>% 
  dplyr::mutate_at(c(2,4),~as.numeric(.)) %>% 
    knitr::kable(format= "html", format.args= list(decimal.mark= ".", big.mark= ","),
               caption=paste0("Table 7. Competing Risks (Treatment Outcomes, Failure Type of Interest= Therapeutic discharge) Regression Coefficients"),
               col.names = c("Term","Hazard", "CI 95%", "Hazard", "CI 95%"),
               align= c("l",rep('c', 6)))%>%
  kableExtra::row_spec(1, bold = TRUE) %>% 
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),font_size=11)%>%
  kableExtra::add_header_above(c(" ", "Only Type of Program" = 2, "Plus other covariates" = 2),bold=T) %>% 
  kableExtra::add_footnote("Note. ", notation = "none") %>%
  kableExtra::scroll_box(width = "100%", height = "340px")

Table 7. Competing Risks (Treatment Outcomes, Failure Type of Interest= Therapeutic discharge) Regression Coefficients
	Only Type of Program		Plus other covariates
Term	Hazard	CI 95%	Hazard	CI 95%
Type of program-Women only	1.05	0.99, 1.11	1.22	1.15, 1.30
Age at admission to treatment, grouped- 30-39			1.50	1.40, 1.61
Age at admission to treatment, grouped- 40-49			1.02	0.96, 1.08
Age at admission to treatment, grouped- 50+			1.01	0.96, 1.07
Biopsychosocial involvement- 2-Moderate			0.83	0.77, 0.90
Biopsychosocial involvement- 3-Severe			0.98	0.93, 1.03
Co-occurring SUD- More than one additional substance			1.06	1.02, 1.11
Co-occurring SUD- One additional substance			0.90	0.86, 0.96
Consumption frequency of primary or main substance- 1 day a week or more			0.86	0.79, 0.93
Consumption frequency of primary or main substance- 2 to 3 days a week			0.87	0.79, 0.95
Consumption frequency of primary or main substance- 4 to 6 days a week			1.05	0.97, 1.14
Consumption frequency of primary or main substance- Daily			0.94	0.87, 1.01
Ed. Attainment- Completed high school or less			1.36	1.29, 1.44
Ed. Attainment- More than high school			0.99	0.95, 1.04
Have children (Dichotomized)- Yes			0.89	0.82, 0.97
Primary or main substance- Cocaine hydrochloride			0.72	0.66, 0.78
Primary or main substance- Cocaine paste			0.62	0.57, 0.67
Primary or main substance- Marijuana			0.92	0.82, 1.04
Primary or main substance- Other			1.05	0.93, 1.18
Setting of Treatment- Residential			1.20	1.10, 1.30
Tenure status of households- Others			0.95	0.70, 1.30
Tenure status of households- Owner/Transferred dwellings/Pays Dividends			1.08	0.84, 1.39
Tenure status of households- Renting			0.99	0.77, 1.29
Tenure status of households- Stays temporarily with a relative			1.01	0.79, 1.31
Note.

modsel.crr(resCrrRelByDis_only2,resCrrRelByDis2)

## Model selection table
## 
## Model 0: Null model
## Model 1: cmprsk::crr(ftime = CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$time_to_outcome, fstatus = CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$comp_status, cov1 = cov_alone, failcode = "Therapeutic discharge", cencode = "Censored")
## Model 2: cmprsk::crr(ftime = CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$time_to_outcome, fstatus = CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_irr$comp_status, cov1 = covars, failcode = "Therapeutic discharge", cencode = "Censored")
##   Num.obs logLik Df.fit   BIC BIC diff
## 0   21378 -49561      0 99121   714.11
## 1   21378 -49559      1 99128   721.24
## 2   21378 -49084     24 98407     0.00

#By fitting a set of candidate
#models for which it pursues model selection after removal of all
#non significant and marginally significant covariates.

Results were not conclusive. Without covariates, Women only programs had greater hazards of therapeutic discharge (HR= 1.05; CI 95% 0.99, 1.11), but these estimations overlap with null hazards. However, Women specific had greater hazards of therapeutic discharge when controlling for covariates of interest (HR= 1.22; CI 95% 1.15, 1.30).

#Predicted Cumulative Incidence
#We predict the CIF of death for a justice appointed at age 55 completed years in 1950 and 2000. The code below adds 0.5 to convert to exact ages and mid years.)

df_pred_cmprsk_only2<-
predict(resCrrRelByDis_only2, matrix(rbind("Type of program-Women specific"=1,"Type of program-Women specific"=0),nrow=2))

cifd_only2 <- data.frame(
   program = factor(rep(c("Women specific", "General population"),each=length(df_pred_cmprsk_only2[,1]))),
   years = c(df_pred_cmprsk_only2[,1],df_pred_cmprsk_only2[,1]),
   cif = c(df_pred_cmprsk_only2[,2], df_pred_cmprsk_only2[,3])
 )
ggplot(cifd_only2, aes(years, cif, color=program)) + geom_step()

df_pred_cmprsk2<-
predict(resCrrRelByDis2, rbind(c(program=1,caso_hipotetico[1,]),c(program=0,caso_hipotetico[1,])))

cifd2 <- data.frame(
  program = factor(rep(c("Women specific", "General population"),each=length(df_pred_cmprsk2[,1]))),
   years = c(df_pred_cmprsk2[,1],df_pred_cmprsk2[,1]),
   cif = c(df_pred_cmprsk2[,2], df_pred_cmprsk2[,3])
 )

grid.arrange(
    ggplot(cifd2, aes(years, cif, color=program)) + geom_step()+ggtitle("Therapeutic Discharge")+ sjPlot::theme_sjplot2()+ scale_color_manual(name="Type of program",values=c("tomato", "steelblue"), labels=c("Mixed gender", "Women only"))+theme(legend.position="none")+ylab("")+ scale_y_continuous(labels = function(x) paste0(x*100, "%"), limits=c(0,round(max(c(cifd2$cif,cifd$cif))+.1,1)))+ xlim(0,3),
    ggplot(cifd, aes(years, cif, color=program)) + geom_step()+ggtitle("Discharge without clinical advice")+ sjPlot::theme_sjplot2()+theme(legend.position = "bottom")+ scale_color_manual(name="Type of program",values=c("red","steelblue"), labels=c("Mixed gender", "Women only"))+ylab("")+ scale_y_continuous(labels = function(x) paste0(x*100, "%"),limits=c(0,round(max(c(cifd2$cif,cifd$cif))+.1,1)))+ xlim(0,3),
    left = grid::textGrob("Predicted probability of achieving one of the outcomes", rot = 90, vjust = 1),
    heights=c(1,1.2)
)

#We see that the probability of dying while serving in the court is declining over time net of age at appointment. For a justice appointed at age 55, the probability would be 31.6% if appointed in 1950 and 22.0% if appointed in the year 2000, assuming of course that current trends continue, so the model applies.

time_aft_cmprsk2<-Sys.time()

#https://rpubs.com/kaz_yos/cmprsk2

paste0("Time taken in process: ");time_aft_cmprsk2-time_bef_cmprsk2

## [1] "Time taken in process: "

## Time difference of 24.87332 mins

par(mfrow = c(4,6))
for(j in 1:ncol(resCrrRelByDis2$res))
  scatter.smooth(resCrrRelByDis2$uft, 
                 resCrrRelByDis2$res[,j],
    main = names(resCrrRelByDis2$coef)[j],
    xlab = "Failure time",
    ylab = "Schoenfeld residuals")

Figure 7b. Plots of Schoenfeld-type residuals against time failure of the covariates

#Schoenfeld residuals against failure time for each covariate. It is noted that the residuals follows a non-constant distribution across failure times, indicating a potential violation to the proportional assumption.

#the PH assumption can be assessed by fitting a LOESS curve to the plot. A straight line passing through a residual value of 0 with gradient 0 indicates that the variable satisfies the PH assumption and therefore does not depend on time.

These inconsistencies and differences between the type of program and the treatment outcome, add support to the implementation of a joint model that capture different trajectories and their subsequent relationship with readmissions.

Multi-state

Transition matrix

The model schematic illustrates treatment progression and the possible transitions between the states and the model structure. The first represents a semi-competing risks model, and the second represent a competing risk structure.

library(igraph)

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
## 3 ESTADOS SIMPLES ##
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
links<-data.frame(stringsAsFactors=FALSE,
  from = c(rep("Admitted to\nTreatment", 2),"Therapeutic\nDischarge"),
  to = c(rep(c("Therapeutic\nDischarge"),1), rep("Readmission", 2)),
  value = c("1","2","3"))

links2<-data.frame(stringsAsFactors=FALSE,
   from = c(rep("Admitted to\nTreatment", 3),"Therapeutic\nDischarge","Discharge w/o\nClinical Advice"),
   to = c(rep(c("Therapeutic\nDischarge","Discharge w/o\nClinical Advice"),1), rep("Readmission", 3)),
   value = c("1","2","3","4","5"))

#dev.off()
#https://www.r-graph-gallery.com/248-igraph-plotting-parameters.html
#https://rstudio-pubs-static.s3.amazonaws.com/341807_7b9d1a4e787146d492115f90ad75cd2d.html
par(mfrow=c(1, 2))  
#for (i in c(2560:2660)){
  set.seed(2630) #i #2660 #2646 #2642 #2630 #2650
  co <- layout.fruchterman.reingold(graph_from_data_frame(links, directed=TRUE))
  plot(graph_from_data_frame(links, directed=TRUE),
            asp = 0,
            layout= co,
            edge.label=links$value,
            edge.label.cex=3,
            edge.label.color="black",
            #vertex.label= rev(),
            vertex.color="white",
            vertex.size=120,
            vertex.size2=25,
            vertex.label.cex=1, 
            edge.arrow.size=1,
            edge.color="black",
            vertex.shape="rectangle",
            vertex.label.color="black",
            edge.curved=0,
            edge.width=1.5,
           #main=paste0(i),
            outputorder="edgesfirst",  
            vertex.label.dist=0,
            vertex.cex = 3)
#}
title("a) Three-states Model (Simplest)", sub = "No recurring states; Absorving state: Readmission;\nOther causes of discharge were not events of interest;\nModified version of an illness-death model") ## internal titles

set.seed(10990)#i #10990 10921 10898 10835
co2 <- layout.fruchterman.reingold(graph_from_data_frame(links2, directed=TRUE))
  plot(graph_from_data_frame(links2, directed=TRUE),
            asp = 0,
            layout= co2,
            edge.label=links2$value,
            edge.label.cex=2.5,
            edge.label.color="black",
            #vertex.label= rev(),
            vertex.color="white",
            vertex.size=120,
            vertex.size2=25,
            vertex.label.cex=1, 
            edge.arrow.size=1,
            edge.color="black",
            vertex.shape="rectangle",
            vertex.label.color="black",
            edge.curved=0,
            edge.width=1.5,
            outputorder="edgesfirst",
            #main=paste0(i),
            vertex.label.dist=0,
            vertex.cex = 3)
  title("b) Four-states Model", sub = "No recurring states; Absorving state: Readmission;\nOther causes of discharge were not events of interest") ## internal titles

#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:
#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:
## Probando con paquetes estadísticos
if(no_mostrar==1){
library(igraph)
Nodes <- c("Admitted to\nGP","Admitted to\nWE","Therapeutic\nDischarge","Discharge w/o\nClinical Advice","Readmission") #states possible in MSM
Edges <- list("Admitted to\nGP"=list(edges=c("Therapeutic\nDischarge","Discharge w/o\nClinical Advice")),
              "Admitted to\nWE"=list(edges=c("Therapeutic\nDischarge","Discharge w/o\nClinical Advice")),
              "Therapeutic\nDischarge"=list(edges=c("Readmission")),
              "Discharge /wo\nClinical Advice"=list(edges=c("Readmission")),
              "Readmission"=list(edges=NULL)) #transitions from each state

RCLTtree <- new("graphNEL",nodes=Nodes,edgeL=Edges,edgemode="directed")
plot(RCLTtree)
#https://www.rdocumentation.org/packages/msSurv/versions/1.2-2/topics/msSurv
msSurv(LTRCdata, RCLTtree, cens.type="ind", LT=FALSE, bs=FALSE, B=200)
}

library(Epi)
#For censored state transitions it can be awkward having to replicate the censoring time for each non-visited state
#https://github.com/stulacy/multistateutils
mat_3_states <- trans.illdeath(names=c("Admission","Therapeutic\nDischarge","Readmission"))
#All possible paths through the multi-state model can be found here:
#paths(mat_3_states)
box_ms_3s<-
boxes.Lexis(mat_3_states, wmult=1.25, hmult=1.25, cex=1.2,
             boxpos = list(x = c(20, 70, 70), y = c(80, 80, 20)),
            txt.arr=c(expression(" 1) " *lambda['01']), 
                      expression(" 2) " *lambda['02']),
                      expression(" 3) " *lambda['12'])))
title("a) Three-states Model (Simplest)", sub = "No recurring states; Absorving state: Readmission;\nOther causes of discharge were not events of interest;\nModified version of an illness-death model") ## internal titles

#Los números son determinados por posición en cada columna (o eje Y). 
#Si uno quiere definir para la otra fila, salta al siguiente vector: 
mat_4_states<- transMat(list(c(2,3,4), 4, 4, c()), 
                names = c("Admission", "Therapeutic\nDischarge", "Discharge Without\nMedical Advice", "Readmission"))


box_ms_4s<-
boxes.Lexis(mat_4_states, wmult=1.25, hmult=1.5, boxpos = list(x = c(20, 20, 70, 70), y = c(80, 20, 80, 20)), cex=1.2,
                        txt.arr=c(
                      expression("1)" *lambda['01']), 
                      expression("2)" *lambda['02']),
                      expression("3)" *lambda['03']),
                      expression("4)" *lambda['13']),
                      expression("5)" *lambda['23'])))
  title("b) Four-states Model", sub = "No recurring states; Absorving state: Readmission;\nOther causes of discharge were not events of interest") ## internal titles

Any observation where an event occurs which is not the event of interest for a specific transition is treated as a censored at the end of the study (2019, November 13) observation (only referrals and also had inexact dates of discharge), that is, in the same way as a patient that was lost to follow-up. If there is a readmission posterior to loss-to follow-up cases, these cases we obtained the length in days between being readmitted posterior and the time of admission, knowing that any intermediate date of discharge was interval-censored (the exact time in which users discharged is unknown, and its treatment outcomes are unknown, so there is no exact time-to-readmission). Covariates are fixed at baseline. Some transitions were shown to be simultaneous (n= 132). Small adjustment such that transitions were sequential rather than simultaneous by adding one day to the absorbing event, and subtracting a day if the transition corresponded to an intermediate state.

The stacked format of the data allows to calculate hazards very simply by each transition defined earlier (trans).

#Data should be in a data frame with column names "id", "stop", "st.stage", and "stage" where "id" is the individual's identification number, "stop" is the transition time from state j to j', "st.stage" is the state the individual is transitioning from (i.e., j), and "stage" is the state the individual is transitioning to (i.e., j') and equals 0 if right censored.

## 80% of sample is LT, rest has start time of 0
### AGS: Parten en 0, salvo que estén truncados a la izquierda. 
### Parece que todos comparten un mismo tiempo ojetivo.
### AGS: Cuando hay un estado seguido no es necesario interval censoring, se dn en tun tiempo continuo
### El 0 es censura

library(mstate)

#### MSPREP= 3 STATES 
#_#_#_#_#_#_#_
CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_msprep<-
  CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados %>% 
# 2021-03-24: Eliminate cases with readmissions posterior to interval-censored discharges
  #dplyr::mutate(cens=ifelse(motivodeegreso_mod_imp=="Referral to another treatment"& !is.na(fech_ing_next_treat),1,0)) %>% dplyr::filter(cens==0) %>% 
  dplyr::mutate(ther_disch=ifelse(motivodeegreso_mod_imp=="Therapeutic discharge",1,0)) %>% 
  # para problema para intervalos 0 entre un tratamiento nuevo y otro y entre el ingreso a un tratamiento y su término y que comento más abajo
  dplyr::mutate(cambio_fecha_ing= dplyr::case_when(dias_treat_imp_sin_na==0~1 ,T~0)) %>%
  dplyr::mutate(fech_ing_num= dplyr::case_when(cambio_fecha_ing==1~fech_ing_num-1,T~fech_ing_num)) %>% 
  dplyr::mutate(dias_treat_imp_sin_na= dplyr::case_when(cambio_fecha_ing==1~fech_egres_num-fech_ing_num, T~dias_treat_imp_sin_na)) %>% 
  dplyr::select(-cambio_fecha_ing) %>% 
  #If status=1, the corresponding transition has been observed. Si no no se ha observado
  # para efectos de este modelo simple, experimentar el alta terapéutica es el objetivo, por lo que el resto será censura
  dplyr::mutate(readmission=ifelse(!is.na(fech_ing_next_treat),1,0)) %>% 
  #time of arrival at the state
  dplyr::mutate(diff_bet_treat=fech_ing_next_treat-fech_egres_num) %>% 
  # para problema para intervalos 0 entre un tratamiento nuevo y otro y entre el ingreso a un tratamiento y su término y que comento más abajo. Le añado un día
  dplyr::mutate(cambio_fecha_ing_nuevo_t= dplyr::case_when(diff_bet_treat==0~1 ,T~0)) %>%
  dplyr::mutate(fech_ing_next_treat= dplyr::case_when(cambio_fecha_ing_nuevo_t==1~fech_ing_next_treat+1,T~fech_ing_next_treat)) %>% 
  dplyr::mutate(diff_bet_treat= dplyr::case_when(cambio_fecha_ing_nuevo_t==1~fech_ing_next_treat-fech_egres_num, T~diff_bet_treat)) %>% 
  dplyr::select(-cambio_fecha_ing_nuevo_t) %>% 
  #dplyr::filter(diff_bet_treat_corr!=diff_bet_treat)
  #ADD DATES TO THE FINAL FOLLOW UP OF THE STUDY IF CENSORED
  dplyr::mutate(dias_treat_imp_sin_na= dplyr::case_when(ther_disch==0~ as.numeric(as.Date("2019-11-13"))-fech_ing_num,T~dias_treat_imp_sin_na)) %>% 

  ## 2021-03-24, I had to reespecify times to objective times, in order to avoid further problems
  dplyr::mutate(date_ther_disch= dias_treat_imp_sin_na, #fech_ing_num+ 
  ## 2021-03-24, The posterior treatment has to include the days in the previous treatment
                date_post_treat= dplyr::case_when(ther_disch==1 & readmission==1~ dias_treat_imp_sin_na+ diff_bet_treat,
                                #if the first time is censored, 
                                ther_disch==0 & readmission==1~ fech_ing_next_treat-fech_ing_num,
                                readmission==0~ as.numeric(as.Date("2019-11-13"))-fech_ing_num)) %>% 
  ## EL RESTO DE LOS PACIENTES NO VAN A HABER REGISTRADO EVENTOS EN ESE TIEMPO, POR LO QUE LLEGARÁN AL FINAL DEL SEGUIMIENTO
  dplyr::select(row, fech_ing_num, date_ther_disch, ther_disch, date_post_treat, readmission, tipo_de_programa_2:numero_de_hijos_mod_rec,tipo_de_plan_res) %>% 
  as.data.frame() 

#### MSPREP= 4 STATES 
#_#_#_#_#_#_#_
CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_msprep2<-
  CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados %>% 
  # 2021-03-24: Eliminate cases with readmissions posterior to interval-censored discharges
  #dplyr::mutate(cens=ifelse(motivodeegreso_mod_imp=="Referral to another treatment"& !is.na(fech_ing_next_treat),1,0)) %>% dplyr::filter(cens==0) %>% 
  dplyr::mutate(ther_disch=ifelse(motivodeegreso_mod_imp=="Therapeutic discharge",1,0)) %>% 
  # For problems with 0 intervals between a new treatment and other between the admission and discharge, I subtract 1 day of admission. Then, and if there is a change in the date, we replace it. If not, we maintain the original value
  dplyr::mutate(cambio_fecha_ing= dplyr::case_when(dias_treat_imp_sin_na==0~1 ,T~0)) %>%
  dplyr::mutate(fech_ing_num= dplyr::case_when(cambio_fecha_ing==1~fech_ing_num-1,T~fech_ing_num)) %>% 
  dplyr::mutate(dias_treat_imp_sin_na= dplyr::case_when(cambio_fecha_ing==1~fech_egres_num-fech_ing_num, T~dias_treat_imp_sin_na)) %>% 
  dplyr::select(-cambio_fecha_ing) %>% 
  #If status=1, the corresponding transition has been observed. If not, it assumes not observed
  dplyr::mutate(disch_wo_clin_adv=ifelse(motivodeegreso_mod_imp %in% c("Early Drop-out","Late Drop-out","Administrative discharge"),1,0)) %>% 
  dplyr::mutate(readmission=ifelse(!is.na(fech_ing_next_treat),1,0)) %>% 
  #time of arrival at the state of readmission
  dplyr::mutate(diff_bet_treat=fech_ing_next_treat-fech_egres_num) %>% 
  # For problems with 0 intervals between a new treatment and other between the admission and discharge, I add 1 day of admission to the next treatment. Then, and if there is a change in the date, we replace it. If not, we maintain the original value
  dplyr::mutate(cambio_fecha_ing_nuevo_t= dplyr::case_when(diff_bet_treat==0~1 ,T~0)) %>%
  dplyr::mutate(fech_ing_next_treat= dplyr::case_when(cambio_fecha_ing_nuevo_t==1~fech_ing_next_treat+1,T~fech_ing_next_treat)) %>% 
  dplyr::mutate(diff_bet_treat= dplyr::case_when(cambio_fecha_ing_nuevo_t==1~fech_ing_next_treat-fech_egres_num, T~diff_bet_treat)) %>% 
  dplyr::select(-cambio_fecha_ing_nuevo_t) %>% 
  #dplyr::filter(diff_bet_treat_corr!=diff_bet_treat)
  #ADD DATES TO THE FINAL FOLLOW UP OF THE STUDY IF CENSORED
  dplyr::mutate(days_to_ther_disch= dplyr::case_when(ther_disch==0~ as.numeric(as.Date("2019-11-13"))-fech_ing_num,T~dias_treat_imp_sin_na)) %>% 
  dplyr::mutate(days_to_disch_wo_clin_adv= dplyr::case_when(disch_wo_clin_adv==0~ as.numeric(as.Date("2019-11-13"))-fech_ing_num,T~dias_treat_imp_sin_na)) %>% 

  ## 2021-03-24, I had to reespecify times to objective times, in order to avoid further problems
  dplyr::mutate(date_ther_disch= days_to_ther_disch, #fech_ing_num+ 
    date_disch_wo_clin_adv= days_to_disch_wo_clin_adv,
  ## 2021-03-24, The posterior treatment has to include the days in the previous treatment
    date_post_treat= dplyr::case_when(ther_disch==1 & readmission==1~ days_to_ther_disch+ diff_bet_treat,
                    disch_wo_clin_adv==1 & readmission==1~ days_to_disch_wo_clin_adv+ diff_bet_treat,
                    #if the first time is censored, 
                    disch_wo_clin_adv==0 & ther_disch==0 & readmission==1~ fech_ing_next_treat-fech_ing_num,
                    ther_disch==1 & disch_wo_clin_adv==0 & readmission==1~ fech_ing_next_treat-fech_ing_num,
                    readmission==0~ as.numeric(as.Date("2019-11-13"))-fech_ing_num)) %>% 
  ## EL RESTO DE LOS PACIENTES NO VAN A HABER REGISTRADO EVENTOS EN ESE TIEMPO, POR LO QUE LLEGARÁN AL FINAL DEL SEGUIMIENTO
  dplyr::select(row, fech_ing_num, date_ther_disch, ther_disch, date_disch_wo_clin_adv, disch_wo_clin_adv, date_post_treat, readmission, tipo_de_programa_2:numero_de_hijos_mod_rec,tipo_de_plan_res) %>% 
  as.data.frame() 

tail(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_msprep) %>% 
      knitr::kable(format= "html", format.args= list(decimal.mark= ".", big.mark= ","),
               caption="Table 8a. Data in Wide, 3-states",
               align= c("c",rep('c', 5)))%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),font_size= 11)%>% 
  kableExtra::add_footnote("Note= Proportions from the initial state") %>% 
  kableExtra::scroll_box(width = "100%", height = "350px")

Table 8a. Data in Wide, 3-states
	row	fech_ing_num	date_ther_disch	ther_disch	date_post_treat	readmission	tipo_de_programa_2	estado_conyugal_2	edad_al_ing_grupos	escolaridad_rec	sus_principal_mod	freq_cons_sus_prin	compromiso_biopsicosocial	tenencia_de_la_vivienda_mod	num_otras_sus_mod	numero_de_hijos_mod_rec	tipo_de_plan_res
21373	18,172	15,187	3,026	0	1,761	1	General population	Single	30-39	3-Completed primary school or less	Cocaine paste	Daily	2-Moderate	Stays temporarily with a relative	One additional substance	Yes	Outpatient
21374	41,467	15,873	2,340	0	275	1	Women specific	Married/Shared living arrangements	18-29	3-Completed primary school or less	Cocaine paste	Daily	2-Moderate	Illegal Settlement	More than one additional substance	Yes	Residential
21375	16,343	15,114	3,099	0	2,244	1	Women specific	Single	18-29	2-Completed high school or less	Alcohol	2 to 3 days a week	3-Severe	Renting	More than one additional substance	Yes	Residential
21376	139,357	17,688	525	0	525	0	Women specific	Single	18-29	2-Completed high school or less	Cocaine paste	Daily	3-Severe	Stays temporarily with a relative	More than one additional substance	No	Residential
21377	24,900	15,345	349	1	2,868	0	Women specific	Single	18-29	1-More than high school	Alcohol	Daily	2-Moderate	Owner/Transferred dwellings/Pays Dividends	One additional substance	Yes	Outpatient
21378	41,118	15,874	2,339	0	2,339	0	Women specific	Married/Shared living arrangements	30-39	2-Completed high school or less	Cocaine paste	Daily	3-Severe	Stays temporarily with a relative	No additional substance	Yes	Residential
^a Note= Proportions from the initial state

tail(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_msprep2) %>% 
      knitr::kable(format= "html", format.args= list(decimal.mark= ".", big.mark= ","),
               caption="Table 8b. Data in Wide, 4-states",
               align= c("c",rep('c', 5)))%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),font_size= 11)%>% 
  kableExtra::add_footnote("Note= Proportions from the initial state") %>% 
  kableExtra::scroll_box(width = "100%", height = "350px")

Table 8b. Data in Wide, 4-states
	row	fech_ing_num	date_ther_disch	ther_disch	date_disch_wo_clin_adv	disch_wo_clin_adv	date_post_treat	readmission	tipo_de_programa_2	estado_conyugal_2	edad_al_ing_grupos	escolaridad_rec	sus_principal_mod	freq_cons_sus_prin	compromiso_biopsicosocial	tenencia_de_la_vivienda_mod	num_otras_sus_mod	numero_de_hijos_mod_rec	tipo_de_plan_res
21373	18,172	15,187	3,026	0	304	1	1,761	1	General population	Single	30-39	3-Completed primary school or less	Cocaine paste	Daily	2-Moderate	Stays temporarily with a relative	One additional substance	Yes	Outpatient
21374	41,467	15,873	2,340	0	2,340	0	275	1	Women specific	Married/Shared living arrangements	18-29	3-Completed primary school or less	Cocaine paste	Daily	2-Moderate	Illegal Settlement	More than one additional substance	Yes	Residential
21375	16,343	15,114	3,099	0	115	1	2,244	1	Women specific	Single	18-29	2-Completed high school or less	Alcohol	2 to 3 days a week	3-Severe	Renting	More than one additional substance	Yes	Residential
21376	139,357	17,688	525	0	56	1	525	0	Women specific	Single	18-29	2-Completed high school or less	Cocaine paste	Daily	3-Severe	Stays temporarily with a relative	More than one additional substance	No	Residential
21377	24,900	15,345	349	1	2,868	0	2,868	0	Women specific	Single	18-29	1-More than high school	Alcohol	Daily	2-Moderate	Owner/Transferred dwellings/Pays Dividends	One additional substance	Yes	Outpatient
21378	41,118	15,874	2,339	0	38	1	2,339	0	Women specific	Married/Shared living arrangements	30-39	2-Completed high school or less	Cocaine paste	Daily	3-Severe	Stays temporarily with a relative	No additional substance	Yes	Residential
^a Note= Proportions from the initial state

#For censored state transitions it can be awkward having to replicate the censoring time for each non-visited state
#https://github.com/stulacy/multistateutils
mat_3_states <- trans.illdeath(names=c("Admission","Therapeutic\nDischarge","Readmission"))
#All possible paths through the multi-state model can be found here:
#paths(mat_3_states)

#Los números son determinados por posición en cada columna (o eje Y). 
#Si uno quiere definir para la otra fila, salta al siguiente vector: 
mat_4_states<- transMat(list(c(2,3,4), 4, 4, c()), 
                names = c("Admission", "Therapeutic\nDischarge", "Discharge Without\nMedical Advice", "Readmission"))

#illness-death model without recovery
ms_CONS_C1_SEP_2020_women_imputed <- msprep(time = c(NA, "date_ther_disch", "date_post_treat"), 
                                            status = c(NA, "ther_disch", "readmission"), 
                                            data = CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_msprep,
                                            id = "row",
                                            trans = mat_3_states,
                                            keep =  c("tipo_de_programa_2","estado_conyugal_2","edad_al_ing_grupos","escolaridad_rec","sus_principal_mod", "freq_cons_sus_prin","compromiso_biopsicosocial","tenencia_de_la_vivienda_mod","num_otras_sus_mod","numero_de_hijos_mod_rec","tipo_de_plan_res"))
#model of 5 states without recovery.
ms2_CONS_C1_SEP_2020_women_imputed <- msprep(time = c(NA, "date_ther_disch", 
                                                      "date_disch_wo_clin_adv", "date_post_treat"), 
                                            status = c(NA, "ther_disch", "disch_wo_clin_adv", "readmission"), 
                                            data = CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_msprep2,
                                            id = "row",
                                            trans = mat_4_states,
                                            keep = c("tipo_de_programa_2","estado_conyugal_2","edad_al_ing_grupos","escolaridad_rec","sus_principal_mod", "freq_cons_sus_prin","compromiso_biopsicosocial","tenencia_de_la_vivienda_mod","num_otras_sus_mod","numero_de_hijos_mod_rec","tipo_de_plan_res"))

#Starting from state 1, simultaneous transitions possible for subjects 36666 36586 56465 136847 37595 60609 51706 76376 117544 140210 at times 126 472 32 36 1 203 45 14 5 71; smallest receiving state chosen
invisible(c("This problem responds to differences between treatments 0. Should be resolved in the initial stages"))
if(no_mostrar==1){
  CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_msprep2 %>% 
    dplyr::filter(row %in% unlist(
      ms2_CONS_C1_SEP_2020_women_imputed %>% 
        dplyr::filter(Tstop<=Tstart) %>% 
        dplyr::select(row,from,to,trans,Tstart,Tstop,time,status) %>% 
        distinct(row)
    )) %>%
    #dplyr::mutate(diff_bet_treat=fech_ing_next_treat-fech_egres_num)%>% 
    View()
}

if(no_mostrar==1){
CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados %>% 
    dplyr::filter(row %in% unlist(
        ms2_CONS_C1_SEP_2020_women_imputed %>% 
            dplyr::filter(Tstop<=Tstart) %>% 
            dplyr::select(row,from,to,trans,Tstart,Tstop,time,status) %>% 
            distinct(row)
    )) %>%
    dplyr::select(row, motivodeegreso_mod_imp, contains("fech"))
}
invisible(c("Investigate warning: done"))
#37595    Administrative discharge    2013-03-18    2013-03-19    2013-03-19
#

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#SCALE INTERVALS TO YEARS
ms_CONS_C1_SEP_2020_women_imputed[, c("Tstart", "Tstop", "time")] <- ms_CONS_C1_SEP_2020_women_imputed[, c("Tstart", "Tstop", "time")]/365.25
ms2_CONS_C1_SEP_2020_women_imputed[, c("Tstart", "Tstop", "time")] <- ms2_CONS_C1_SEP_2020_women_imputed[, c("Tstart", "Tstop", "time")]/365.25

path<-rstudioapi::getSourceEditorContext()$path
if (grepl("CISS Fondecyt",path)==T){
    dta_path<-paste0("C:/Users/CISS Fondecyt/OneDrive/Escritorio/SUD_CL/")
  } else if (grepl("andre",path)==T){
    dta_path<-paste0('C:/Users/andre/Desktop/SUD_CL/')
  } else if (grepl("E:",path)==T){
    dta_path<-paste0("E:/Mi unidad/Alvacast/SISTRAT 2019 (github)/_WO vs MG/")
  } else {
    dta_path<-paste0("G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/_WO vs MG/")
  }

rio::export(
CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_msprep %>% 
  dplyr::rename("id"="row", "ther_disch_time"="date_ther_disch","ther_disch_status"="ther_disch",
                "readmission_time"="date_post_treat","readmission_status"="readmission"),
paste0(dta_path,"three_st_msprep.dta"))


rio::export(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_msprep2%>% 
  dplyr::rename("id"="row", "ther_disch_time"="date_ther_disch","ther_disch_status"="ther_disch",
                "disch_wo_clin_adv_time"="date_disch_wo_clin_adv",
                "disch_wo_clin_adv_status"="disch_wo_clin_adv",
                "readmission_time"="date_post_treat","readmission_status"="readmission"),
  paste0(dta_path,"four_st_msprep.dta"))

Then we present the transition with the frequencies and proportions.

data.frame(events(ms_CONS_C1_SEP_2020_women_imputed)$Frequencies) %>% 
    dplyr::filter(to!="total entering") %>% 
    left_join(data.frame(events(ms_CONS_C1_SEP_2020_women_imputed)$Proportions), by=c("from", "to")) %>% 
    dplyr::rename("Frequencies"="Freq.x", "Proportions"="Freq.y") %>% 
    dplyr::arrange(from, to) %>% 
    dplyr::mutate(diff=ifelse(as.character(from)!=as.character(to),0,1)) %>% 
    dplyr::filter(diff==0) %>%
    dplyr::select(-diff) %>% 
    dplyr::mutate(Proportions=scales::percent(Proportions)) %>% 
    knitr::kable(format= "html", format.args= list(decimal.mark= ".", big.mark= ","),
               caption="Table 9a. Empirical State Transition Matrix, 3 States Model",
               align= c("c",rep('c', 5)))%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),font_size= 11)%>% 
  kableExtra::add_footnote("Note= Proportions from the initial state") %>% 
  kableExtra::scroll_box(width = "100%", height = "350px")

Table 9a. Empirical State Transition Matrix, 3 States Model
from	to	Frequencies	Proportions
Admission	Therapeutic Discharge	5,114	23.92%
Admission	Readmission	4,348	20.34%
Admission	no event	11,916	55.74%
Therapeutic Discharge	Admission	0	0.00%
Therapeutic Discharge	Readmission	1,063	20.79%
Therapeutic Discharge	no event	4,051	79.21%
Readmission	Admission	0	0.00%
Readmission	Therapeutic Discharge	0	0.00%
Readmission	no event	5,411	100.00%
^a Note= Proportions from the initial state

data.frame(events(ms2_CONS_C1_SEP_2020_women_imputed)$Frequencies) %>% 
    dplyr::filter(to!="total entering") %>% 
    left_join(data.frame(events(ms2_CONS_C1_SEP_2020_women_imputed)$Proportions), by=c("from", "to")) %>% 
    dplyr::rename("Frequencies"="Freq.x", "Proportions"="Freq.y") %>% 
    dplyr::arrange(from, to) %>% 
    dplyr::mutate(diff=ifelse(as.character(from)!=as.character(to),0,1)) %>% 
    dplyr::filter(diff==0) %>%
    dplyr::select(-diff) %>%
    dplyr::mutate(Proportions=scales::percent(Proportions)) %>% 
    knitr::kable(format= "html", format.args= list(decimal.mark= ".", big.mark= ","),
               caption="Table 9b. Empirical State Transition Matrix, 4 States Model",
               align= c("c",rep('c', 5)))%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),font_size= 11)%>% 
  kableExtra::add_footnote("Note= Proportions from the initial state")%>%
  kableExtra::scroll_box(width = "100%", height = "350px")

Table 9b. Empirical State Transition Matrix, 4 States Model
from	to	Frequencies	Proportions
Admission	Therapeutic Discharge	5,114	23.9%
Admission	Discharge Without Medical Advice	11,995	56.1%
Admission	Readmission	753	3.5%
Admission	no event	3,516	16.4%
Therapeutic Discharge	Admission	0	0.0%
Therapeutic Discharge	Discharge Without Medical Advice	0	0.0%
Therapeutic Discharge	Readmission	1,063	20.8%
Therapeutic Discharge	no event	4,051	79.2%
Discharge Without Medical Advice	Admission	0	0.0%
Discharge Without Medical Advice	Therapeutic Discharge	0	0.0%
Discharge Without Medical Advice	Readmission	3,595	30.0%
Discharge Without Medical Advice	no event	8,400	70.0%
Readmission	Admission	0	0.0%
Readmission	Therapeutic Discharge	0	0.0%
Readmission	Discharge Without Medical Advice	0	0.0%
Readmission	no event	5,411	100.0%
^a Note= Proportions from the initial state

Consideration of the Appropriateness of the proportional hazards assumption

Continuous variables need to be categorised into groups. The plot described is also known as the log(−log(survival)) plot, as the cumulative hazard is equal to the negative logarithm of the survival proportion. This approach requires a subjective assessment (Bradburn, Clark, Love, et al., 2003).

#Bradburn, M., Clark, T., Love, S. et al. Survival Analysis Part III: Multivariate data analysis – choosing a model and assessing its adequacy and fit. Br J Cancer 89, 605–611 (2003). https://doi.org/10.1038/sj.bjc.6601120

#reescribo el tipo de programa para indicar 0 y 1s para tratamiento vs. control
if(dimnames(table(ms_CONS_C1_SEP_2020_women_imputed$tipo_de_programa_2))[[1]][1]=="General population"){
ms_CONS_C1_SEP_2020_women_imputed$tipo_de_programa_2<-ifelse(ms_CONS_C1_SEP_2020_women_imputed$tipo_de_programa_2=="Women specific",1,0)
}
if(dimnames(table(ms2_CONS_C1_SEP_2020_women_imputed$tipo_de_programa_2))[[1]][1]=="General population"){
ms2_CONS_C1_SEP_2020_women_imputed$tipo_de_programa_2<-ifelse(ms2_CONS_C1_SEP_2020_women_imputed$tipo_de_programa_2=="Women specific",1,0)
}

# ’expand.covs()’ permits to define the set of covariates which impacts
# each transition:
#NO SE SI CONVIENE TRAER ESTA FUNCION
#data_mstate <- expand.covs(ms2_CONS_C1_SEP_2020_women_imputed, names(CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados_msprep)[-c(1:6)], append = TRUE, longnames = FALSE)

plots<- data.frame(title=rep(c("(a) Admission -> Ther. Disch.","(b) Admission -> Readmission","(c) Ther. Disch. -> Readmission"),1),trans=rep(1:3,1))

## SIN COVARIABLES
layout(matrix(1:6, nc = 2, byrow = F))
for(i in c(1:3)){
plot(log(survfit(Surv(time,status)~1, data=subset(ms_CONS_C1_SEP_2020_women_imputed, trans==plots[i,"trans"] & tipo_de_programa_2==0))$time), 
     log(-log(survfit(Surv(time,status)~1, data=subset(ms_CONS_C1_SEP_2020_women_imputed, trans==plots[i,"trans"] & tipo_de_programa_2==0))$surv)), type="l",
     xlab="log(Years)", ylab="", xaxs="i",yaxs="i",
     las=1,cex.lab=1.5, cex.axis=1.5)
lines(log(survfit(Surv(time,status)~1, data=subset(ms_CONS_C1_SEP_2020_women_imputed, trans==plots[i,"trans"] & tipo_de_programa_2==1))$time), 
      log(-log(survfit(Surv(time,status)~1, data=subset(ms_CONS_C1_SEP_2020_women_imputed, trans==plots[i,"trans"] & tipo_de_programa_2==1))$surv)), lty=2)
legend(0,-4, c("MG", "WO"), bty="n", lty=c(2,1), cex=1.5)
title(main=paste0("LOG CUMULATIVE HAZARD VS LOG TIME PLOT \n",plots[i,"title"]), cex.main=1.5)
}

for(i in c(1:3)){
plot(survfit(Surv(time,status)~1, data=subset(ms_CONS_C1_SEP_2020_women_imputed, trans==plots[i,"trans"] & tipo_de_programa_2==0))$time, 
     -log(survfit(Surv(time,status)~1, data=subset(ms_CONS_C1_SEP_2020_women_imputed, trans==plots[i,"trans"] & tipo_de_programa_2==0))$surv), type="l",
     xlab="Years", ylab="", xaxs="i",yaxs="i", 
     las=1,cex.lab=1.5, cex.axis=1.5, col=1)
lines(survfit(Surv(time,status)~1, data=subset(ms_CONS_C1_SEP_2020_women_imputed, trans==plots[i,"trans"] & tipo_de_programa_2==1))$time, 
      -log(survfit(Surv(time,status)~1, data=subset(ms_CONS_C1_SEP_2020_women_imputed, trans==plots[i,"trans"] & tipo_de_programa_2==1))$surv), lty=2)
legend(6,.1, c("MG", "WO"), bty="n", lty=c(2,1), cex=1.5)
title(main=paste0("CUMULATIVE HAZARD PLOT: -LOG(KM SURVIVAL) \n",plots[i,"title"]), cex.main=1.5)
}

Figure 9a. Vissual Assessment of Hazards, Three-states Model (w/o covars)

As seen n the Figure above, the cumulative hazards does not follow a proportional trend in the transitions of (b) Admission → Readmission, and (c) Ther. Disch. → Readmission between Mixed gender or Women-only programs.

plots2<- data.frame(title=rep(c("(a) Admission -> Ther. Disch.","(b) Admission -> Discharge w/o Med. Adv.","(c) Admission -> Readmission", "(d) Ther. Disch. -> Readmission","(e) Discharge w/o Med. Adv. -> Readmission"),1),trans=1:5)

layout(matrix(1:10, nc = 2, byrow = F))
for(i in c(1:5)){
plot(log(survfit(Surv(time,status)~1, data=subset(ms2_CONS_C1_SEP_2020_women_imputed, trans==plots2[i,"trans"] & tipo_de_programa_2==0))$time), 
     log(-log(survfit(Surv(time,status)~1, data=subset(ms2_CONS_C1_SEP_2020_women_imputed, trans==plots2[i,"trans"] & tipo_de_programa_2==0))$surv)), type="l",
     xlab="log(Years)", ylab="", xaxs="i",yaxs="i",
     las=1,cex.lab=1.5, cex.axis=1.5)
lines(log(survfit(Surv(time,status)~1, data=subset(ms2_CONS_C1_SEP_2020_women_imputed, trans==plots2[i,"trans"] & tipo_de_programa_2==1))$time), 
      log(-log(survfit(Surv(time,status)~1, data=subset(ms2_CONS_C1_SEP_2020_women_imputed, trans==plots2[i,"trans"] & tipo_de_programa_2==1))$surv)), lty=2)
legend(0,-4, c("MG", "WO"), bty="n", lty=c(2,1), cex=1.5)
title(main=paste0("LOG CUMULATIVE HAZARD VS LOG TIME PLOT \n",plots2[i,"title"]), cex.main=1.5)
}

for(i in c(1:5)){
plot(survfit(Surv(time,status)~1, data=subset(ms2_CONS_C1_SEP_2020_women_imputed, trans==plots2[i,"trans"] & tipo_de_programa_2==0))$time, 
     -log(survfit(Surv(time,status)~1, data=subset(ms2_CONS_C1_SEP_2020_women_imputed, trans==plots2[i,"trans"] & tipo_de_programa_2==0))$surv), type="l",
     xlab="Years", ylab="", xaxs="i",yaxs="i", 
     las=1,cex.lab=1.5, cex.axis=1.5, col=1)
lines(survfit(Surv(time,status)~1, data=subset(ms2_CONS_C1_SEP_2020_women_imputed, trans==plots2[i,"trans"] & tipo_de_programa_2==1))$time, 
      -log(survfit(Surv(time,status)~1, data=subset(ms2_CONS_C1_SEP_2020_women_imputed, trans==plots2[i,"trans"] & tipo_de_programa_2==1))$surv), lty=2)
legend(7.5,.23, c("MG", "WO"), bty="n", lty=c(2,1), cex=1.5)
title(main=paste0("CUMULATIVE HAZARD PLOT: -LOG(KM SURVIVAL) \n",plots2[i,"title"]), cex.main=1.5)
}

Figure 9b. Vissual Assessment of Hazards, Four-states Model (w/o covars)

As seen in the figures above, the transition (b) Admission → Discharge w/o Med. Adv., did not show a proportional trajectory in time between women in the General-population and Women-specific programs in the clog-log plot. On the other part, the transitions (c) Admission → Readmission, (d) Ther. Disch. → Readmission, (e) Discharge w/o Med. Adv. → Readmission did not follow proportional cumulative hazards between women in the Mixed gender or Women-only programs.

#The Coxproportional hazards model can still be used, but the interpretation of the results is different. Thiswill be outlined in some detail in Section 3

#The effect of covariates on disease progression is most often modelled using the Cox proportionalhazards model. 

#https://onlinelibrary.wiley.com/doi/epdf/10.1002/sim.2712
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

#If the SR plot for a given variable shows deviation from a straight line while it stays flat for the rest of the variables, then it is something you shouldn't ignore. First thing you can do is to look at the results of the global test. The global test might indicate the overall assumption of PH holds true [or not]. If the global test is fine then switching the reference category of the variable for which the assumption didn't held true, you might be able to achieve PH. The hazards may be proportional when compared to one reference category but not the other. Hence, by switching the reference categories, you might be able to find the category which results in PH assumption being true.

#If the switching doesn't solve your problem, and assuming you have got the right variables in your model, then this indicates that the hazards is not proportional for this particular variable i.e. different hazards at different time points. Hence, you may want to introduce interaction between variable and time in your model.

#_________
#https://stats.stackexchange.com/questions/33615/schoenfeld-residuals?rq=1
#Judgement of proportional hazards(PH) should be based on the results from a formal statistical test and the Schoenfeld residuals (SR) plot together.

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

#The best cures for some problems--running an experiment longer or doing more aggressive follow-up to avoid a large proportion of censored values, or using a large enough sample size to lessen the problems of small sample sizes--are outside the scope of statistical analysis per se.

#Alternative procedures:
#Stratification: Dividing the sample into homogeneous subsamples
#Parametric methods: Using methods that assume a specific survival distribution
#Choosing a particular nonparametric test: Situations favoring one of Mantel-Cox, Gehan-Breslow, or Tarone-Ware over the others
#Other nonparametric tests

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

#Is it possible that the analysis could be valid despite the non-proportional hazards?

#As others have mentioned, large sample size may be a factor leading to the violation of the statistical test of the PH assumption. As @EdM mentioned, it will depend on your context/knowledge of the subject as to whether the deviations matter From page 1461 of this paper

#When the sample size is small, this method may lack power to detect deviations from PH; while for large sample sizes, hypothesis tests may be over sensitive to slight deviations from this assumption.

#The answer to this question suggests that the effect of fitting a Cox model with non-proportional hazards is "slightly less power" which can be recovered with robust standard errors, leading to the hazard ratios being interpreted as the time-weighted average of the hazard ratio

#This interesting paper (Why Test for Proportional Hazards?) was published recently highlighting that there are legitimate reasons to assume violation of PH, and that one of the effects is on the interpretation rather than invalidity of the results. They actually suggest that statistical testing of PH is unnecessary if it's expected that the hazard ratio varies over time. (I'd be interested to hear what others think of this paper)

#In the section "How should hazard ratios be interpreted?" on page 1402

#As a weighted average of the time-varying hazard ratios, the hazard ratio estimate from a Cox proportional hazards model is often used as a convenient summary of the treatment effect during the follow-up. However, a hazard ratio from a Cox model needs to be interpreted as a weighted average of the true hazard ratios over the entire follow-up period. The 95% confidence interval should be estimated using a valid method such as bootstrapping and also using inverse probability weighting to adjust for losses to follow-up.
#_________
#xtna (https://stats.stackexchange.com/users/307358/xtna), Violated non-proportional hazards - Cox regression model of time-dependent covariable, URL (version: 2021-01-15): https://stats.stackexchange.com/q/505055

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

#The Cox proportional hazards model has traditionally been applied to assess the accuracy of prognostic models. However, it may be suboptimal due to the inflexibility to model the baseline survival function and when the proportional hazards assumption is violated

#_________
#Miladinovic B, Kumar A, Mhaskar R, Kim S, Schonwetter R, Djulbegovic B (2012) A Flexible Alternative to the Cox Proportional Hazards Model for Assessing the Prognostic Accuracy of Hospice Patient Survival. PLoS ONE 7(10): e47804. https://doi.org/10.1371/journal.pone.0047804

Decision whether to use Markov or Semi-Markov

For the three-states model, we only looked at transition 3 as only transition where different times entering the state. Interested in knowing the risk of readmission in completion of a treatment w/therapeutic discharge (status) from different treatments and different time to progression.

In the four-states model, we looked at transition 4 and 5.

#state arrival extended (semi-)Markov to mean that the i → j transition hazard depends on thetime of arrival at state i. 

#Build a Cox proportional hazard model including treatment and time in previous state as covariates
CoxMarkov<-coxph(Surv(Tstart,Tstop,status)~factor(tipo_de_programa_2)+Tstart,
                  data=subset(ms_CONS_C1_SEP_2020_women_imputed, trans==3),method = "breslow")

## Warning in coxph(Surv(Tstart, Tstop, status) ~ factor(tipo_de_programa_2) + : a
## variable appears on both the left and right sides of the formula

HR<-round(exp(coef(CoxMarkov)),2)
CI<-round(exp(confint(CoxMarkov)),2)
P<-round(coef(summary(CoxMarkov))[,5],4)

CoxMarkov12<-coxph(Surv(time,status)~factor(tipo_de_programa_2)+Tstart,
                  data=subset(ms_CONS_C1_SEP_2020_women_imputed, trans==3),method = "breslow") 
HR12<-round(exp(coef(CoxMarkov12)),2)
CI12<-round(exp(confint(CoxMarkov12)),2)
P12<-round(coef(summary(CoxMarkov12))[,5],4)

#In Surv(Tstart, Tstop, status) : Stop time must be > start time, NA created
CoxMarkov2<-coxph(Surv(Tstart,Tstop,status)~factor(tipo_de_programa_2)+Tstart,
                  data=subset(ms2_CONS_C1_SEP_2020_women_imputed, trans  %in% c(4,5)),method = "breslow")

## Warning in coxph(Surv(Tstart, Tstop, status) ~ factor(tipo_de_programa_2) + : a
## variable appears on both the left and right sides of the formula

HR2<-round(exp(coef(CoxMarkov2)),2)
CI2<-round(exp(confint(CoxMarkov2)),2)
P2<-round(coef(summary(CoxMarkov2))[,5],4)
tab_cox_markov<- as.data.frame(cbind(HR2,CI2,P2))

CoxMarkov22<-coxph(Surv(time,status)~factor(tipo_de_programa_2)+Tstart,
                  data=subset(ms2_CONS_C1_SEP_2020_women_imputed, trans  %in% c(4,5)),method = "breslow") 
HR22<-round(exp(coef(CoxMarkov22)),2)
CI22<-round(exp(confint(CoxMarkov22)),2)
P22<-round(coef(summary(CoxMarkov22))[,5],4)
tab_cox_markov2<- as.data.frame(cbind(HR22,CI22,P22))

#Stop time must be > start time, NA created #INVESTIGATE - RESOLVED IN 2021-03-24

rbindlist(list(data.table(cbind(HR,CI,P),keep.rownames=T),
      data.table(cbind(HR12,CI12,P12),keep.rownames=T),
      data.table(cbind(HR2,CI2,P2),keep.rownames=T),
      data.table(cbind(HR22,CI22,P22),keep.rownames=T))) %>% 
  dplyr::rename("Terms"="rn") %>% 
  dplyr::mutate(Terms=dplyr::case_when(grepl("tipo_de_", Terms)~"Type of Program",
                                    grepl("Tstart",Terms)~"Time in previous state(in years)")) %>% 
  dplyr::mutate(P=ifelse(as.numeric(P)<.001,"<.001",as.character(round(P,3)))) %>% 
  dplyr::rename("Sig."="P") %>% 
  dplyr::mutate(`95% CIs`=paste0(`2.5 %`,", ",`97.5 %`)) %>% 
  dplyr::select(-`2.5 %`,-`97.5 %`) %>% 
  dplyr::select(Terms, HR, `95% CIs`, Sig.) %>% 
      knitr::kable(format= "html", format.args= list(decimal.mark= ".", big.mark= ","),
               caption="Table 10. PH Model incluiding time in previous state & Type of Program as a covariate",
               align= c("c",rep('c', 5)))%>%
  kableExtra::pack_rows("Three-states", 1, 2) %>% 
  kableExtra::pack_rows("Three-states- Renewal time", 3, 4) %>% 
  kableExtra::pack_rows("Four-states", 5, 6) %>% 
  kableExtra::pack_rows("Four-states- Renewal time", 7, 8) %>% 
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),font_size= 11)%>% 
  kableExtra::scroll_box(width = "100%", height = "350px")

## Column 2 ['HR12'] of item 2 is missing in item 1. Use fill=TRUE to fill with NA (NULL for list columns), or use.names=FALSE to ignore column names. use.names='check' (default from v1.12.2) emits this message and proceeds as if use.names=FALSE for  backwards compatibility. See news item 5 in v1.12.2 for options to control this message.

Table 10. PH Model incluiding time in previous state & Type of Program as a covariate
Terms	HR	95% CIs	Sig.
Three-states
Type of Program	1.85	1.64, 2.09	<.001
Time in previous state(in years)	1.55	1.37, 1.76	<.001
Three-states- Renewal time
Type of Program	1.83	1.63, 2.07	<.001
Time in previous state(in years)	0.89	0.8, 1	0.042
Four-states
Type of Program	1.48	1.39, 1.56	<.001
Time in previous state(in years)	1.56	1.47, 1.66	<.001
Four-states- Renewal time
Type of Program	1.50	1.42, 1.59	<.001
Time in previous state(in years)	0.90	0.85, 0.95	<.001

#a variable appears on both the left and right sides of the formula
#this warning should be normal, since we are dealing with time to arrival at a determined state.

columna_dummy <- function(df, columna) {
  df %>% 
  mutate_at(columna, ~paste(columna, eval(as.symbol(columna)), sep = "_")) %>% 
    mutate(valor = 1) %>% 
    spread(key = columna, value = valor, fill = 0)
}

catVars<-c("edad_al_ing_grupos","escolaridad_rec","sus_principal_mod","freq_cons_sus_prin","compromiso_biopsicosocial",
           "tenencia_de_la_vivienda_mod","num_otras_sus_mod","numero_de_hijos_mod_rec","tipo_de_programa_2","tipo_de_plan_res")

ms_CONS_C1_SEP_2020_women_imputed_mod<-ms_CONS_C1_SEP_2020_women_imputed
for (i in c(1:length(catVars))){#catVars[-10] excluding treatment indicator
  cat<-as.character(catVars[i])#catVars[-10] excluding treatment indicator
  ms_CONS_C1_SEP_2020_women_imputed_mod<-columna_dummy(ms_CONS_C1_SEP_2020_women_imputed_mod,cat)
}

colnames(ms_CONS_C1_SEP_2020_women_imputed_mod)[8:length(ms_CONS_C1_SEP_2020_women_imputed_mod)]<-
  colnames(janitor::clean_names(ms_CONS_C1_SEP_2020_women_imputed_mod))[8:length(ms_CONS_C1_SEP_2020_women_imputed_mod)]

attr(ms_CONS_C1_SEP_2020_women_imputed_mod,"trans")<-mat_3_states

#data: dataset in etm format: "entry", "exit", "from", "to", "id". Should also contain the relevant covariates: no factors allowed
#Multi-state data in msdata format. Should also contain (dummy codings of) the relevant covariates; no factors allowed  
#  ms_CONS_C1_SEP_2020_women_imputed$id<-ms_CONS_C1_SEP_2020_women_imputed$row

ms_CONS_C1_SEP_2020_women_imputed_mod$id<-ms_CONS_C1_SEP_2020_women_imputed_mod$row
ms_CONS_C1_SEP_2020_women_imputed_mod$row<-NULL
formula_char<-
"edad_al_ing_grupos_18_29+ edad_al_ing_grupos_30_39+ 
edad_al_ing_grupos_40_49+ edad_al_ing_grupos_50+ escolaridad_rec_1_more_than_high_school+
escolaridad_rec_2_completed_high_school_or_less+ escolaridad_rec_3_completed_primary_school_or_less+
sus_principal_mod_alcohol+ sus_principal_mod_cocaine_hydrochloride+ sus_principal_mod_cocaine_paste+
sus_principal_mod_marijuana+ sus_principal_mod_other+ freq_cons_sus_prin_1_day_a_week_or_more+
freq_cons_sus_prin_2_to_3_days_a_week+ freq_cons_sus_prin_4_to_6_days_a_week+
freq_cons_sus_prin_daily+ freq_cons_sus_prin_less_than_1_day_a_week+
compromiso_biopsicosocial_1_mild+ compromiso_biopsicosocial_2_moderate+
compromiso_biopsicosocial_3_severe+ tenencia_de_la_vivienda_mod_illegal_settlement+
tenencia_de_la_vivienda_mod_others+ tenencia_de_la_vivienda_mod_owner_transferred_dwellings_pays_dividends+
tenencia_de_la_vivienda_mod_renting+ tenencia_de_la_vivienda_mod_stays_temporarily_with_a_relative+
num_otras_sus_mod_more_than_one_additional_substance+ num_otras_sus_mod_no_additional_substance+
num_otras_sus_mod_one_additional_substance+ numero_de_hijos_mod_rec_no+
numero_de_hijos_mod_rec_yes+ tipo_de_programa_2_1+
tipo_de_programa_2_0+ tipo_de_plan_res_outpatient+
tipo_de_plan_res_residential
"

MT <- MarkovTest(ms_CONS_C1_SEP_2020_women_imputed_mod, 
                 formula= formula_char,
                 transition = 2,
                 grid = 1,#seq(0, 11, by = 1/12), 
                 B = 25)
#Tried with transition 2 and 3
#Error in rep(mmm, length.out = l1) : 
#  attempt to replicate an object of type 'symbol'
#Además: There were 50 or more warnings (use warnings() to see the first 50)

data<-ms_CONS_C1_SEP_2020_women_imputed_mod #no puede ir arrival
id<-"id"
transition<-3
grid<-90 #3 months
grid<-1096 #3 years
dist<-"poisson"
trans=ifelse(is.null(attr(data, "trans")),get("mat_3_states"),attr(data, "trans"))
fn = list(function(x) mean(abs(x), na.rm = TRUE))
fn2 = list(function(x) mean(x, na.rm = TRUE))
formula<-formula_char
B=25

MarkovTest <- function(data, id, formula = NULL, transition, grid,
                       trans=NULL,
                       B = 1000,
                       fn = list(function(x) mean(abs(x), na.rm = TRUE)),
                       fn2 = list(function(x) mean(x, na.rm = TRUE)),
                       min_time = 0,
                       other_weights = NULL,
                       dist = c("poisson", "normal")) {
  
  dist <- match.arg(dist)
  if (missing(id)) id <- "id"
  # Remove "empty" lines in the data
  wh <- which(data$Tstop <= data$Tstart)
  if (length(wh)>0)
  {
    warning(length(wh), " lines with Tstart <= Tstop, have been removed before applying tests!")
    data <- data[-wh, ]
  }

  # Convert data to etm data
  # Make sure to retain all covariates (possibly way to many) in msdata (needed in formula perhaps)
  mtch <- match(c("id", "from", "to", "trans", "Tstart", "Tstop", "status"), names(data)) 
  covcols <- 1:ncol(data)
  covcols <- covcols[!covcols %in% mtch]
  ncovs <- length(covcols)
  
  trans <- get("mat_3_states")
  etmdata <- msdata2etm(data, id)
  if (ncovs > 0) etmdata <- msdata2etm(data, id, names(data)[covcols])
  trans2 <- to.trans2(trans)
  tfrom <- trans2$from[trans2$transno == transition]
  tto <- trans2$to[trans2$transno == transition]
  
  # Determine qualifying set
  qualset <- c(tfrom, which(trans2Q(trans)[, tfrom] > 0))
  qualset <- sort(unique(qualset))  # for circular models, tfrom is included twice

  # Functions
  if (!is.list(fn)) 
    fn <- list(fn)  # coerce to be list if a single function is provided
  if (is.list(fn) & is.function(fn[[1]])) {
    # coerce to be a list of lists, by repeating the same list each time
    tempfn <- list()
    for (i in 1:length(qualset)) tempfn[[i]] <- fn
    fn <- tempfn
  }
  if (!is.list(fn2)) 
    fn2 <- list(fn2)  # coerce to be list if a single function is provided
  
  # Establish the relevant patients who ever enter tfrom
  relpat <- sort(unique(etmdata$id[etmdata$from == tfrom]))
  
  rdata <- etmdata[etmdata$from == tfrom, ]  # only need time periods in the relevant state...
  rdata$status <- 1 * (rdata$to == tto)
  if (!is.null(formula)) {
    form <- as.formula(paste("Surv(entry, exit, status) ~ ", formula, 
                             sep = ""))
    progfit <- coxph(form, data = rdata)
    if (length(progfit$coefficients) > 0) {
      #Sacado por andrés
      Zmat <- as.matrix(rdata[, match(names(progfit$coefficients), 
                                      names(rdata))])
      #Zmat <- as.matrix(rdata[, 7:44])
      Ncov <- dim(Zmat)[2]
    } else {
      Ncov <- 0
    }
    if (!is.null(progfit$offset)) {
      offset <- progfit$offset
    } else {
      offset <- rep(0, dim(rdata)[1])
    }
  } else {
    Ncov <- 0
    offset <- rep(0, dim(rdata)[1])
    progfit <- NULL
  }
  
  # Minimal data, change names
  progdat <- rdata[, match(c("id", "entry", "exit", "status"), names(rdata))]
  names(progdat) <- c("id", "T0", "T1", "D")
  
  nobs_grid <- sapply(grid, function(x) sum(progdat$D[progdat$T1 > x]))
  
  # Have the extra dimension of indexes
  index_gM <- array(0, c(length(relpat), length(grid), length(qualset)))
  for (indx in 1:length(qualset)) {
    qualstate <- qualset[indx]
    index_g <- sapply(grid, function(y) sapply(relpat, function(x)
      which(etmdata$entry < y & etmdata$exit >= y & etmdata$id == x)))
    index_g <- array(1 * (etmdata$from[sapply(index_g, function(y)
      ifelse(length(y) > 0, y, 
             dim(etmdata)[1] + 1))] == qualstate), c(length(relpat), length(grid)))
    index_g[is.na(index_g)] <- 0
    index_gM[, , indx] <- index_g
  }

  # Need a separate Z3mat for each group as well...
  Z3mat <- index_gM[match(progdat$id, relpat), , , drop = FALSE]
  
  N1 <- dim(progdat)[1]
  
  if (Ncov > 0) {
    LP <- c(Zmat %*% progfit$coefficients) + offset
  } else {
    LP <- rep(0, N1) + offset
  }
  S0 <- sapply(1:N1, function(x) sum(exp(LP) * (progdat$T0 < progdat$T1[x] & 
                                                  progdat$T1 >= progdat$T1[x])))
  
  incr <- progdat$D / S0
  cumhaz <- approxfun(c(0, sort(unique(progdat$T1)), Inf),
                      c(0, cumsum(tapply(incr, progdat$T1, sum)), sum(incr)),
                      method = "constant")
  resid_mat <- sapply(grid, function(x) progdat$D * (progdat$T1 > x) - exp(LP) *
                        (cumhaz(pmax(x, progdat$T1)) - cumhaz(pmax(x, progdat$T0))))
  
  # Have a separate trace for each qualifying state...
  obs_trace <- array(0, c(length(grid), length(qualset)))
  for (indx in 1:(length(qualset))) {
    obs_trace[, indx] <- sapply(1:length(grid), function(k)
      sum(resid_mat[, k] * Z3mat[, k, indx] * (progdat$T1 > grid[k])))
  }
  
  nqstate <- length(qualset)
  if (Ncov > 0) 
    Ifish <- progfit$var
  
  N1 <- dim(progdat)[1]
  if (Ncov > 0) 
    Zbar0 <- array(0, c(N1, Ncov))
  
  Zbar <- array(0, c(N1, length(grid), nqstate))
  for (i in 1:N1) {
    x <- i
    if (Ncov > 0) {
      for (j in 1:Ncov) {
        Zbar0[i, j] <- sum(Zmat[, j] * exp(LP) *
                             (progdat$T0 < progdat$T1[x] & progdat$T1 >= progdat$T1[x])) /
          sum(exp(LP) * (progdat$T0 < progdat$T1[x] & progdat$T1 >= progdat$T1[x]))
      }
    }
    for (j in 1:length(grid)) {
      for (k in 1:nqstate)
        Zbar[i, j, k] <- sum(Z3mat[, j, k] * exp(LP) * 
                               (progdat$T0 < progdat$T1[x] & progdat$T1 >= progdat$T1[x])) /
          sum(exp(LP) * (progdat$T0 < progdat$T1[x] & progdat$T1 >= progdat$T1[x]))
    }
  }
  
  NAe <- incr
  
  if (Ncov > 0) {
    Hmat <- array(0, c(length(grid), Ncov, nqstate))
    for (j in 1:Ncov) {
      # for (k in 1:nqstate) Hmat[,j,k] <- sapply(1:length(grid),function(y)
      # sum(sapply(1:N1,function(x) sum(exp(LP) *Zmat[,j]* (Z3mat[x,y,k] -
      # Zbar[x,y,k]) * NAe[x] * (progdat$T0[x] > grid[y] & progdat$T1[x] <=
      # progdat$T1)))))
      for (k in 1:nqstate) Hmat[, j, k] <- sapply(1:length(grid), function(y) 
        sum(sapply(1:N1, function(x)
          sum(exp(LP[x]) * ((Zmat[x, j] - Zbar0[, j]) *
                              (Z3mat[x, y, k] - Zbar[, y, k])) * NAe *
                (progdat$T1[x] > grid[y]) * (progdat$T1 > progdat$T0[x] & progdat$T1 <= progdat$T1[x])))))
    }
  }
  
  if (Ncov > 0) {
    multiplier <- array(0, dim(Hmat))
    for (k in 1:nqstate) multiplier[, , k] <- Hmat[, , k] %*% Ifish
    est_cov <- array(0, c(length(grid), nqstate, nqstate))
    for (indx1 in 1:nqstate) {
      for (indx2 in (indx1):nqstate) {
        est_var <- sapply(1:length(grid), function(k)
          sum(sapply(1:N1, function(v) 
            sum(((Z3mat[v, k, indx1] - Zbar[, k, indx1]) * 
                   (progdat$T1 > grid[k]) - c(multiplier[k, , indx1, drop = FALSE] %*%
                                                t(Zmat[v, ] - Zbar0))) *
                  ((Z3mat[v, k, indx2] - Zbar[, k, indx1]) * (progdat$T1 > grid[k]) - 
                     c(multiplier[k, , indx2, drop = FALSE] %*% t(Zmat[v, ] - Zbar0))) *
                  exp(LP[v]) * (progdat$T0[v] < progdat$T1 & progdat$T1[v] >= progdat$T1) * NAe))))
        est_cov[, indx1, indx2] <- est_cov[, indx2, indx1] <- est_var
      }
    }
    
  } else {
    est_cov <- array(0, c(length(grid), nqstate, nqstate))
    for (indx1 in 1:nqstate) {
      for (indx2 in (indx1):nqstate) {
        est_var <- sapply(1:length(grid), function(k)
          sum(sapply(1:N1, function(v)
            sum((Z3mat[v, k, indx1] - Zbar[, k, indx1]) * (Z3mat[v, k, indx2] - Zbar[, k, indx2]) *
                  exp(LP[v]) * (progdat$T1 > grid[k] & progdat$T0[v] < progdat$T1 & progdat$T1[v] >= progdat$T1) * NAe))))
        est_cov[, indx1, indx2] <- est_cov[, indx2, indx1] <- est_var
      }
    }
  }
  
  # First obtain the individually normalized traces...
  est_var <- obs_norm_trace <- array(0, c(length(grid), nqstate))
  for (k in 1:nqstate) {
    est_var[, k] <- est_cov[cbind(1:length(grid), k, k)]
    # This should be the same as before...
    obs_norm_trace[, k] <- obs_trace[, k] / sqrt(est_var[, k] + 1 * (est_var[, k] == 0))
  }
  # Find singular matrices
  obs_chisq_trace <- rep(0, length(grid))
  for (k in 1:length(grid)) {
    sol <- tryCatch(solve(est_cov[k, -1, -1]), error = function(e)
      return(diag(0, nqstate - 1)))
    obs_chisq_trace[k] <- (obs_trace[k, -1]) %*% sol %*%
      (obs_trace[k, -1]) # do something about singular matrices...
  }
  
  ############## 
  
  n_wb_trace <- wb_trace0 <- wb_trace <- array(0, c(B, length(grid), nqstate))
  nch_wb_trace <- array(0, c(B, length(grid)))
  for (wb in 1:B) {
    if (dist == "poisson") {
      G <- rpois(dim(progdat)[1], 1) - 1
    } else if (dist == "normal") {
      G <- rnorm(dim(progdat)[1], 0, 1)
    } else stop("argument dist should be poisson or normal")
    trace0 <- array(0, c(length(grid), nqstate))
    for (k in 1:nqstate) {
      trace0[, k] <- apply(sapply(1:length(grid), function(x)
        progdat$D * (Z3mat[, x, k] - Zbar[, x, k]) * (progdat$T1 > grid[x]) * G), 2, sum)
      if (Ncov > 0) {
        Imul <- sapply(1:Ncov, function(x)
          sum(progdat$D * (Zmat[, x] - Zbar0[, x]) * G))
        trace1 <- (Hmat[, , k] %*% Ifish %*% Imul)[, 1]
      } else {
        trace1 <- 0
      }
      wb_trace[wb, , k] <- trace0[, k] - trace1
      n_wb_trace[wb, , k] <- wb_trace[wb, , k]/sqrt(est_var[, k] + 
                                                      1 * (est_var[, k] == 0))
      for (w in 1:length(grid)) {
        sol <- tryCatch(solve(est_cov[w, -1, -1]), error = function(e)
          return(diag(0, nqstate - 1)))
        nch_wb_trace[wb, w] <- (wb_trace[wb, w, -1]) %*% sol %*% 
          (wb_trace[wb, w, -1]) # do something about singular matrices...
      }
    }
  }
  
  # Need to have one of these per nqstate
  NS <- length(fn[[1]])
  
  orig_stat <- array(sapply(1:nqstate, function(y)
    sapply(fn[[y]], function(g) g(obs_norm_trace[, y]))), c(NS, nqstate))
  orig_ch_stat <- sapply(fn2, function(g) g(obs_chisq_trace))
  
  p_stat_wb <- array(0, c(NS, nqstate))
  wb_stat <- array(0, c(B, NS, nqstate))
  for (k in 1:nqstate) {
    wb_stat[, , k] <- array(t(apply(n_wb_trace[, , k, drop = FALSE], 
                                    1, function(x)
                                      sapply(fn[[k]], function(g) g(x)))), c(B, NS))
    p_stat_wb[, k] <- sapply(1:NS, function(x) mean(wb_stat[, x, k] > orig_stat[x, k]))
  }
  est_quant <- array(0, c(2, length(grid), nqstate))
  for (k in 1:nqstate)
    est_quant[, , k] <- apply(n_wb_trace[, , k, drop = FALSE], 2,
                              quantile, c(0.025, 0.975), na.rm = TRUE)
  NS2 <- length(fn2)
  p_ch_stat_wb <- rep(0, NS2)
  wb_ch_stat <- array(t(apply(nch_wb_trace, 1, function(x)
    sapply(fn2, function(g) g(x)))), c(B, NS2))
  p_ch_stat_wb <- sapply(1:NS2, function(x) mean(wb_ch_stat[, x] > orig_ch_stat[x]))
  # Is a question whether should use Nsub as number of subjects or number
  # of spells within the state
  
  MTres <- list(orig_stat = orig_stat, orig_ch_stat = orig_ch_stat, p_stat_wb = p_stat_wb, 
                p_ch_stat_wb = p_ch_stat_wb, b_stat_wb = wb_stat, zbar = obs_norm_trace, 
                nobs_grid = nobs_grid, Nsub = length(relpat), est_quant = est_quant, 
                obs_chisq_trace = obs_chisq_trace, nch_wb_trace = nch_wb_trace, 
                n_wb_trace = n_wb_trace, est_cov = est_cov, transition = transition,
                from = tfrom, to = tto, B = B, dist = dist,
                qualset = qualset, coxfit = progfit, fn = fn, fn2 = fn2)
  
  class(MTres) <- c("MarkovTest")
  return(MTres)
}

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

#' @export
optimal_weights_multiple <- function(data, id, grid, transition, min_time = 0)
{
  # Convert data to etm data
  trans <- attr(data, "trans")
  etmdata <- msdata2etm(data, id)
  trans2 <- to.trans2(trans)
  from <- trans2$from[trans2$transno == transition]
  to <- trans2$to[trans2$transno == transition]

  numbers <- sapply(grid, function(x)
    table(factor(etmdata$from)[(etmdata$entry <= x & etmdata$exit > x)]))
  subevent <- sapply(grid, function(x)
    sum(etmdata$from == from & etmdata$to == to & etmdata$exit > x))
  tnumbers <- apply(numbers, 2, sum)
  weights <- sapply(1:dim(numbers)[1], function(x)
    subevent * numbers[x, ] * (tnumbers - numbers[x, ])/tnumbers^2)
  weights[is.nan(weights)] <- 0
  weight <- apply(weights, 1, max)
  weight * diff(c(min_time, grid))
}

#' @export
optimal_weights_matrix <- function(data, id, grid, transition, min_time = 0, 
                                   other_weights = NULL)
{
  # Convert data to etm data
  trans <- attr(data, "trans")
  etmdata <- msdata2etm(data, id)
  trans2 <- to.trans2(trans)
  from <- trans2$from[trans2$transno == transition]
  to <- trans2$to[trans2$transno == transition]

  numbers <- sapply(grid, function(x)
    table(factor(etmdata$from)[(etmdata$entry <= x & etmdata$exit > x)]))
  subevent <- sapply(grid, function(x)
    sum(etmdata$from == from & etmdata$to == to & etmdata$exit > x))
  tnumbers <- apply(numbers, 2, sum)
  weights <- sapply(1:dim(numbers)[1], function(x)
    sqrt(subevent * numbers[x, ] * (tnumbers - numbers[x, ]))/tnumbers)
  weights[is.nan(weights)] <- 0
  fn_list <- list()
  for (i in 1:dim(numbers)[1]) {
    # Take into account the distance between grids
    val <- weights[, i] * diff(c(min_time, grid))
    fn_list[[i]] <- list(fn = function(x)
      weighted.mean(abs(x), w = val, na.rm = TRUE))
    if (!is.null(other_weights)) {
      nother <- length(other_weights)
      fn_list[[i]][2:(nother + 1)] <- other_weights
    }
  }
  # Store the weights as an attribute
  attr(fn_list, "weights") <- weights
  fn_list
}

The model considered the transition from intermediate states to our absorbing state (being readmitted) is explained by the time spent in the previous health state. This covariate (time in the previous state) was shown to be statistically significant (p<.001); results indicated a longer duration spent in the first treatment is associated with increased risk of readmission. Therefore, a semi-Markov (called a Markov renewal model) or clock reset approach should be undertaken for both models. They bear the advantage that information from the process history can be included as transition-specific explanatory covariates.

#Since prtime only makes sense for transition 3 (PR → RelDeath), we need the transition-specific covariate
#of prtime for transition 3, which is prtime.3. The corresponding model is termed the ”state
#arrival extended Markov PH” model in the tutorial, and appears on the right of Table III.

ms_CONS_C1_SEP_2020_women_imputed$arrival<-ms_CONS_C1_SEP_2020_women_imputed$Tstart
ms2_CONS_C1_SEP_2020_women_imputed$arrival<-ms2_CONS_C1_SEP_2020_women_imputed$Tstart

ms_CONS_C1_SEP_2020_women_imputed_exp <- expand.covs(ms_CONS_C1_SEP_2020_women_imputed, "arrival", append = TRUE, longnames = FALSE)
ms2_CONS_C1_SEP_2020_women_imputed_exp <- expand.covs(ms2_CONS_C1_SEP_2020_women_imputed, "arrival", append = TRUE, longnames = FALSE)

Assessment of Fit of Transitions

We need to derive appropriate functional forms and define respective survival functions. One reason to favor patametric models is that they can be easier to generalize. Seven candidate distributions were considered to model transitions from Adm→Ther.Disch., Adm.→Readm. and Ther.Disch.→Readm., including Weibull (assume a monotonically increasing or decreasing hazard), Log-logistic (non-monotonic hazards), Gompertz (assume a monotonically increasing or decreasing hazard), Log-normal (non-monotonic hazards), Exponential (constant hazard), Gamma & Generalized gamma (more flexible).

The following plots fitted survival curves from each model (coloured lines), with the Kaplan-Meier estimate, in red, obtained from an example of Jackson available here, added to the contributions of Wathers & Cutler available here.

options(warn=-1)

n_iter<-10000

tiempo_antes_fits<-Sys.time()

#Weathers, Brandon and Cutler, Richard Dr., "Comparision of Survival Curves Between Cox Proportional 
#Hazards, Random Forests, and Conditional Inference Forests in Survival Analysis" (2017). All Graduate 
#Plan B and other Reports. 927. 
#https://digitalcommons.usu.edu/gradreports/927 

#<div style="border: 1px solid #ddd; padding: 5px; overflow-y: scroll; height:350px; overflow-x: scroll; width:100%">            
#https://devinincerti.com/2019/01/01/sim-mstate.html

n_trans <- max(mat_3_states, na.rm = TRUE)
fits_wei <- vector(mode = "list", length = n_trans)
fits_weiph <- vector(mode = "list", length = n_trans)
fits_llogis <- vector(mode = "list", length = n_trans)
fits_gomp <- vector(mode = "list", length = n_trans)
fits_logn <- vector(mode = "list", length = n_trans)
fits_exp <- vector(mode = "list", length = n_trans)
fits_gam <- vector(mode = "list", length = n_trans)
fits_ggam <- vector(mode = "list", length = n_trans)
fits_genf <- vector(mode = "list", length = n_trans)
fits_c_wei <- vector(mode = "list", length = n_trans)
fits_c_weiph <- vector(mode = "list", length = n_trans)
fits_c_llogis <- vector(mode = "list", length = n_trans)
fits_c_gomp <- vector(mode = "list", length = n_trans)
fits_c_logn <- vector(mode = "list", length = n_trans)
fits_c_exp <- vector(mode = "list", length = n_trans)
fits_c_gam <- vector(mode = "list", length = n_trans)
fits_c_ggam <- vector(mode = "list", length = n_trans)
fits_c_genf <- vector(mode = "list", length = n_trans)
km.lc<-list()
fits_cox<-list()
fits_c_cox<-list()
#"gengamma" Generalized gamma (stable parameterisation)
#"gengamma.orig"    Generalized gamma (original parameterisation)
#"genf" Generalized F (stable parameterisation)
#"genf.orig"    Generalized F (original parameterisation)
#"weibull"  Weibull
#"gamma"    Gamma
#"exp"  Exponential
#"lnorm"    Log-normal
#"gompertz" Gompertz

library(flexsurv)

#Specify the formula
fitform <- Surv(time, status) ~ 1

for (i in 1:n_trans){  
  fits_wei[[i]] <- flexsurvreg(formula=fitform,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "weibull")
}

for (i in 1:n_trans){  
  fits_weiph[[i]] <- flexsurvreg(formula=fitform,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "weibullph")
}

for (i in 1:n_trans){
  fits_llogis[[i]] <- flexsurvreg(formula=fitform,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "llogis")
}

for (i in 1:n_trans){
  fits_gam[[i]] <- flexsurvreg(formula=fitform,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "gamma")
}
#LOS ERRORES OCURREN CUANDO NO HAY COVARIABLES
#In (function (q, shape, rate = 1, scale = 1/rate, lower.tail = TRUE,  ... :   NaNs produced
#gamma no funcionó: NaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs produced
for (i in 1:n_trans){
  fits_ggam[[i]] <- flexsurvreg(formula=fitform,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "gengamma")
}

for (i in 1:n_trans){
  fits_gomp[[i]] <- flexsurvreg(formula=fitform,
                                data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                                dist = "gompertz")
}


for (i in 1:n_trans){
  fits_logn[[i]] <- flexsurvreg(formula=fitform,
                                data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                                dist = "lnorm")
}

for (i in 1:n_trans){
  fits_exp[[i]] <- flexsurvreg(formula=fitform,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "exp")
}

for (i in 1:n_trans){
  fits_genf[[i]] <- flexsurvreg(formula=fitform,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "genf")
}

#_#_#_#_#_#_#_#_#_#_#_#_#_
#covariates

#Specify the formula
fitform2 <- Surv(time, status) ~  edad_al_ing_grupos+ escolaridad_rec+ sus_principal_mod+ 
         freq_cons_sus_prin+ compromiso_biopsicosocial+ tenencia_de_la_vivienda_mod+ 
         num_otras_sus_mod+ numero_de_hijos_mod_rec+ factor(tipo_de_programa_2)+ tipo_de_plan_res
fitform2_t3 <- Surv(time, status) ~  edad_al_ing_grupos+ escolaridad_rec+ sus_principal_mod+ 
         freq_cons_sus_prin+ compromiso_biopsicosocial+ tenencia_de_la_vivienda_mod+ 
         num_otras_sus_mod+ numero_de_hijos_mod_rec+ factor(tipo_de_programa_2)+ tipo_de_plan_res+ arrival

for (i in 1:(n_trans-1)){
  fits_c_wei[[i]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "weibull")
}

for (i in 1:(n_trans-1)){
  fits_c_weiph[[i]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "weibullph")
}

for (i in 1:(n_trans-1)){
  fits_c_llogis[[i]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "llogis")
}

for (i in 1:(n_trans-1)){
  fits_c_gam[[i]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "gamma")
}
#LOS ERRORES OCURREN CUANDO NO HAY COVARIABLES
#In (function (q, shape, rate = 1, scale = 1/rate, lower.tail = TRUE,  ... :   NaNs produced
#gamma no funcionó: NaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs produced
for (i in 1:(n_trans-1)){
  fits_c_ggam[[i]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "gengamma")
}

for (i in 1:(n_trans-1)){
  fits_c_gomp[[i]] <- flexsurvreg(formula=fitform2,
                                data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                                dist = "gompertz")
}


for (i in 1:(n_trans-1)){
  fits_c_logn[[i]] <- flexsurvreg(formula=fitform2,
                                data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                                dist = "lnorm")
}

for (i in 1:(n_trans-1)){
  fits_c_exp[[i]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "exp")
}

for (i in 1:(n_trans-1)){
  fits_c_genf[[i]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "genf")
}

fits_c_wei[[n_trans]] <- flexsurvreg(formula=fitform2_t3,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == n_trans),
                               dist = "weibull")
fits_c_weiph[[n_trans]] <- flexsurvreg(formula=fitform2_t3,
                                 data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == n_trans),
                                 dist = "weibullph")
fits_c_llogis[[n_trans]] <- flexsurvreg(formula=fitform2_t3,
                                  data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == n_trans),
                                  dist = "llogis")
fits_c_gam[[n_trans]] <- flexsurvreg(formula=fitform2_t3,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == n_trans),
                               dist = "gamma")
fits_c_ggam[[n_trans]] <- flexsurvreg(formula=fitform2_t3,
                                data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == n_trans),
                                dist = "gengamma")
fits_c_gomp[[n_trans]] <- flexsurvreg(formula=fitform2_t3,
                                data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == n_trans),
                                dist = "gompertz")
fits_c_logn[[n_trans]] <- flexsurvreg(formula=fitform2_t3,
                                data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == n_trans),
                                dist = "lnorm")
fits_c_exp[[n_trans]] <- flexsurvreg(formula=fitform2_t3,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == n_trans),
                               dist = "exp")
fits_c_genf[[n_trans]] <- flexsurvreg(formula=fitform2_t3,
                               data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == n_trans),
                               dist = "genf")
for (i in 1:n_trans){
km.lc[[i]] <- survfit(formula= fitform, data = subset(ms_CONS_C1_SEP_2020_women_imputed_exp, trans == i))
}
endTime <- Sys.time()
paste0("Time in process: ");endTime - tiempo_antes_fits

## [1] "Time in process: "

## Time difference of 20.90386 mins

transition_label<- c(`1`="Transition 1: Admission to Ther.Discharge",`2`="Transition 2: Admission to Readmission",`3`="Transition 3: Ther.Discharge to Readmission")

startTime <- Sys.time()
layout(matrix(1:3, nc = 1, byrow = F))
for (i in 1:n_trans){
plot(km.lc[[i]], col="red", conf.int=F, xlim=c(0,12));
  lines(fits_wei[[i]], col="coral4", ci=F);
  lines(fits_weiph[[i]], col="navyblue", ci=F);
  lines(fits_gomp[[i]], col="lightpink", ci=F);
  lines(fits_llogis[[i]], col="gray25", ci=F);#A0A36D
  lines(fits_gam[[i]], col="darkorchid4", ci=F);
  lines(fits_ggam[[i]], col="#496A72", ci=F);
  lines(fits_logn[[i]], col="gray70", ci=F);
  lines(fits_exp[[i]],col="#A0A36D", ci=F);
  lines(fits_exp[[i]],col="cadetblue", ci=F)
  
legend("bottomleft", legend = c("Kaplan-Meier","Weibull (AFT)", "Weibull (PH)", "Gompertz", "Log-logistic", "Gamma",
                "Generalized gamma", "Lognormal", "Exponential", "Generalized F"), col = 
         c("red","coral4","navyblue","lightpink","gray25",#"#A0A36D","#886894",
           "darkorchid4","#496A72","gray70","#A0A36D", "cadetblue"), 
       title = "Distributions", cex = .95, bty = "n", lty=1,lwd=3)# lty = 1:2, 
title(main=transition_label[[i]])
}

Figure 10a. Vissual Assessment of Survival Curves, Three-states Model

endTime <- Sys.time()

paste0("Time in process: ");endTime - startTime

## [1] "Time in process: "

## Time difference of 0.6183422 secs

#23 min aprox.

#For more complicated models, users should specify what covariate values they want summaries for, rather than relying on the default
#</div>
options(warn=0)

options(warn=-1)

n_iter<-10000

tiempo_antes_fits<-Sys.time()

#Weathers, Brandon and Cutler, Richard Dr., "Comparision of Survival Curves Between Cox Proportional 
#Hazards, Random Forests, and Conditional Inference Forests in Survival Analysis" (2017). All Graduate 
#Plan B and other Reports. 927. 
#https://digitalcommons.usu.edu/gradreports/927 

#<div style="border: 1px solid #ddd; padding: 5px; overflow-y: scroll; height:350px; overflow-x: scroll; width:100%">            
#https://devinincerti.com/2019/01/01/sim-mstate.html

n_trans2 <- max(mat_4_states, na.rm = TRUE)
fits_wei2 <- vector(mode = "list", length = n_trans2)
fits_weiph2 <- vector(mode = "list", length = n_trans2)
fits_llogis2 <- vector(mode = "list", length = n_trans2)
fits_gomp2 <- vector(mode = "list", length = n_trans2)
fits_logn2 <- vector(mode = "list", length = n_trans2)
fits_exp2 <- vector(mode = "list", length = n_trans2)
fits_gam2 <- vector(mode = "list", length = n_trans2)
fits_ggam2 <- vector(mode = "list", length = n_trans2)
fits_genf2 <- vector(mode = "list", length = n_trans2)
fits_c_wei2 <- vector(mode = "list", length = n_trans2)
fits_c_weiph2 <- vector(mode = "list", length = n_trans2)
fits_c_llogis2 <- vector(mode = "list", length = n_trans2)
fits_c_gomp2 <- vector(mode = "list", length = n_trans2)
fits_c_logn2 <- vector(mode = "list", length = n_trans2)
fits_c_exp2 <- vector(mode = "list", length = n_trans2)
fits_c_gam2 <- vector(mode = "list", length = n_trans2)
fits_c_ggam2 <- vector(mode = "list", length = n_trans2)
fits_c_genf2 <- vector(mode = "list", length = n_trans2)

km.lc2<-list()
fits_cox2<-list()
fits_c_cox2<-list()
#"gengamma" Generalized gamma (stable parameterisation)
#"gengamma.orig"    Generalized gamma (original parameterisation)
#"genf" Generalized F (stable parameterisation)
#"genf.orig"    Generalized F (original parameterisation)
#"weibull"  Weibull
#"gamma"    Gamma
#"exp"  Exponential
#"lnorm"    Log-normal
#"gompertz" Gompertz

library(flexsurv)

#Specify the formula
fitform_4s <- Surv(time, status) ~ 1

for (i in 1:n_trans2){  
  fits_wei2[[i]] <- flexsurvreg(formula=fitform_4s,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "weibull")
}

for (i in 1:n_trans2){  
  fits_weiph2[[i]] <- flexsurvreg(formula=fitform_4s,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "weibullph")
}

for (i in 1:n_trans2){
  fits_llogis2[[i]] <- flexsurvreg(formula=fitform_4s,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "llogis")
}

for (i in 1:n_trans2){
  fits_gam2[[i]] <- flexsurvreg(formula=fitform_4s,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "gamma")
}
#LOS ERRORES OCURREN CUANDO NO HAY COVARIABLES
#In (function (q, shape, rate = 1, scale = 1/rate, lower.tail = TRUE,  ... :   NaNs produced
#gamma no funcionó: NaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs produced
for (i in 1:n_trans2){
  fits_ggam2[[i]] <- flexsurvreg(formula=fitform_4s,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "gengamma")
}

for (i in 1:n_trans2){
  fits_gomp2[[i]] <- flexsurvreg(formula=fitform_4s,
                                data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                                dist = "gompertz")
}


for (i in 1:n_trans2){
  fits_logn2[[i]] <- flexsurvreg(formula=fitform_4s,
                                data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                                dist = "lnorm")
}

for (i in 1:n_trans2){
  fits_exp2[[i]] <- flexsurvreg(formula=fitform_4s,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "exp")
}
for (i in 1:n_trans2){
  fits_genf2[[i]] <- flexsurvreg(formula=fitform_4s,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "genf")
}

#_#_#_#_#_#_#_#_#_#_#_#_#_
#covariates

#Specify the formula
fitform2 <- Surv(time, status) ~  edad_al_ing_grupos+ escolaridad_rec+ sus_principal_mod+ 
         freq_cons_sus_prin+ compromiso_biopsicosocial+ tenencia_de_la_vivienda_mod+ 
         num_otras_sus_mod+ numero_de_hijos_mod_rec+ factor(tipo_de_programa_2)+ tipo_de_plan_res
fitform2_t3 <- Surv(time, status) ~  edad_al_ing_grupos+ escolaridad_rec+ sus_principal_mod+ 
         freq_cons_sus_prin+ compromiso_biopsicosocial+ tenencia_de_la_vivienda_mod+ 
         num_otras_sus_mod+ numero_de_hijos_mod_rec+ factor(tipo_de_programa_2)+ tipo_de_plan_res+ arrival

for (i in 1:(n_trans2-2)){
  fits_c_wei2[[i]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "weibull")
}

for (i in 1:(n_trans2-2)){
  fits_c_weiph2[[i]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "weibullph")
}

for (i in 1:(n_trans2-2)){
  fits_c_llogis2[[i]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "llogis")
}

for (i in 1:(n_trans2-2)){
  fits_c_gam2[[i]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "gamma")
}
#LOS ERRORES OCURREN CUANDO NO HAY COVARIABLES
#In (function (q, shape, rate = 1, scale = 1/rate, lower.tail = TRUE,  ... :   NaNs produced
#gamma no funcionó: NaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs produced
for (i in 1:(n_trans2-3)){
  fits_c_ggam2[[i]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "gengamma")
}
#https://stackoverflow.com/questions/47100140/estimating-survival-with-a-generalized-gamma-function-in-flexsurv-fails
#optim uses a finite-difference approximation for the gradient as stated in help("optim")
  fits_c_ggam2[[3]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == 3),
                               dist = "gengamma", inits=c(.0001, mean(subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == 3)[,"time"],na.rm=T)))
  
#A function of the uncensored survival times t, which returns a vector of reasonable initial values for maximum likelihood estimation of each parameter. For example, function(t){ c(1, mean(t)) } will always initialize the first of two parameters at 1, and the second (a scale parameter, for instance) at the mean of t.

for (i in 1:(n_trans2-2)){
  fits_c_gomp2[[i]] <- flexsurvreg(formula=fitform2,
                                data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                                dist = "gompertz")
}


for (i in 1:(n_trans2-2)){
  fits_c_logn2[[i]] <- flexsurvreg(formula=fitform2,
                                data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                                dist = "lnorm")
}

for (i in 1:(n_trans2-2)){
  fits_c_exp2[[i]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "exp")
}
  
for (i in 1:(n_trans2-2)){
  fits_c_genf2[[i]] <- flexsurvreg(formula=fitform2,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i),
                               dist = "genf")
}

##4th state
fits_c_wei2[[(n_trans2-1)]] <- flexsurvreg(formula=fitform2_t3,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-1)),
                               dist = "weibull")
fits_c_weiph2[[(n_trans2-1)]] <- flexsurvreg(formula=fitform2_t3,
                                 data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-1)),
                                 dist = "weibullph")
fits_c_llogis2[[(n_trans2-1)]] <- flexsurvreg(formula=fitform2_t3,
                                  data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-1)),
                                  dist = "llogis")
fits_c_gam2[[(n_trans2-1)]] <- flexsurvreg(formula=fitform2_t3,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-1)),
                               dist = "gamma")
fits_c_ggam2[[(n_trans2-1)]] <- flexsurvreg(formula=fitform2_t3,
                                data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-1)),
                                dist = "gengamma")
fits_c_gomp2[[(n_trans2-1)]] <- flexsurvreg(formula=fitform2_t3,
                                data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-1)),
                                dist = "gompertz")
fits_c_logn2[[(n_trans2-1)]] <- flexsurvreg(formula=fitform2_t3,
                                data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-1)),
                                dist = "lnorm")
fits_c_exp2[[(n_trans2-1)]] <- flexsurvreg(formula=fitform2_t3,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-1)),
                               dist = "exp")
fits_c_genf2[[(n_trans2-1)]] <- flexsurvreg(formula=fitform2_t3,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-1)),
                               dist = "genf")
#5th state
fits_c_wei2[[n_trans2]] <- flexsurvreg(formula=fitform2_t3,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-0)),
                               dist = "weibull")
fits_c_weiph2[[n_trans2]] <- flexsurvreg(formula=fitform2_t3,
                                 data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-0)),
                                 dist = "weibullph")
fits_c_llogis2[[n_trans2]] <- flexsurvreg(formula=fitform2_t3,
                                  data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-0)),
                                  dist = "llogis")
fits_c_gam2[[n_trans2]] <- flexsurvreg(formula=fitform2_t3,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-0)),
                               dist = "gamma")
fits_c_ggam2[[n_trans2]] <- flexsurvreg(formula=fitform2_t3,
                                data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-0)),
                                dist = "gengamma")
fits_c_gomp2[[n_trans2]] <- flexsurvreg(formula=fitform2_t3,
                                data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-0)),
                                dist = "gompertz")
fits_c_logn2[[n_trans2]] <- flexsurvreg(formula=fitform2_t3,
                                data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-0)),
                                dist = "lnorm")
fits_c_exp2[[n_trans2]] <- flexsurvreg(formula=fitform2_t3,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-0)),
                               dist = "exp")
fits_c_genf2[[n_trans2]] <- flexsurvreg(formula=fitform2_t3,
                               data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == (n_trans2-0)),
                               dist = "genf")

for (i in 1:n_trans2){
km.lc2[[i]] <- survfit(formula= fitform_4s, data = subset(ms2_CONS_C1_SEP_2020_women_imputed_exp, trans == i))
}
endTime <- Sys.time()
paste0("Time in process: ");endTime - tiempo_antes_fits

## [1] "Time in process: "

## Time difference of 31.604 mins

transition_label_4s<- c(`1`="Transition 1: Admission to Ther.Discharge",`2`="Transition 2: Admission to Discharge w/o Clinical Advice",`3`="Transition 3: Admission to Readmission",`4`="Transition 4: Ther.Discharge to Readmission", `5`="Transition 5: Discharge w/o Clinical Advice to Readmission")

startTime <- Sys.time()
layout(matrix(1:6, nc = 2, byrow = F))
for (i in 1:n_trans2){
plot(km.lc2[[i]], col="red", conf.int=F, xlim=c(0,12));
  lines(fits_wei2[[i]], col="coral4", ci=F);
  lines(fits_weiph2[[i]], col="navyblue", ci=F);
  lines(fits_gomp2[[i]], col="lightpink", ci=F);
  lines(fits_llogis2[[i]], col="gray25", ci=F);##A0A36D
  lines(fits_gam2[[i]], col="darkorchid4", ci=F);##886894
  lines(fits_ggam2[[i]], col="#496A72", ci=F);
  lines(fits_logn2[[i]], col="gray70", ci=F);
  lines(fits_exp2[[i]],col="#A0A36D", ci=F);
  lines(fits_genf2[[i]],col="cadetblue", ci=F)

legend("bottomleft", legend = c("Kaplan-Meier","Weibull (AFT)", "Weibull (PH)", "Gompertz", "Log-logistic", "Gamma",
                "Generalized gamma", "Lognormal", "Exponential", "Generalized F"), col = 
         c("red","coral4","navyblue","lightpink","gray25",#"#B89673",##886894
           "darkorchid4","#496A72","gray70","#A0A36D","cadetblue"), 
       title = "Distributions", cex = .95, bty = "n", lty=1.2,lwd=3)# lty = 1:2, 
title(main=transition_label_4s[[i]])
}
endTime <- Sys.time()

paste0("Time in process: ");endTime - startTime

## [1] "Time in process: "

## Time difference of 1.047194 secs

#23 min aprox.

#For more complicated models, users should specify what covariate values they want summaries for, rather than relying on the default
#</div>
options(warn=0)

Figure 10b. Vissual Assessment of Survival Curves, Four-states Model

Additionally, we compared the hazards of these distributions with non-parametric techniques, using a smoothed hazard function from right-censored data using kernel-based methods (as Incerti shows in his blog). The confidence interval around the predicted hazard function is estimated using bootstrapping methods of 10,000 iterations. Confidence intervals are obtained by sampling randomly from the asymptotic normal distribution of the maximum likelihood estimates and then taking quantiles (see, e.g. Mandel, 2013).

#Micha Mandel (2013) Simulation-Based Confidence Intervals for Functions With Complicated Derivatives, The American Statistician, 67:2, 76-81, DOI: 10.1080/00031305.2013.783880
# https://rpubs.com/martina_morris/testhazard
tiempo_antes_fits2<-Sys.time()

newtime2 = seq(from=1/24, to= 15, by=1/12)

#```{r fit,eval=T, echo=T, paged.print=TRUE, fig.height=13, fig.width=10, fig.align="center", results = 'asis'}
kernel_haz_est<-list()
kernel_haz<-list()
for (i in 1:n_trans){
library("muhaz")
kernel_haz_est[[i]] <- muhaz(ms_CONS_C1_SEP_2020_women_imputed_exp[which(ms_CONS_C1_SEP_2020_women_imputed_exp$trans==i),"time"],
                        ms_CONS_C1_SEP_2020_women_imputed_exp[which(ms_CONS_C1_SEP_2020_women_imputed_exp$trans==i),"status"])
kernel_haz[[i]] <- data.table(time = kernel_haz_est[[i]]$est.grid,
                         est = kernel_haz_est[[i]]$haz.est,
                         dist = "Kernel density")
}

haz_int_only<-
cbind(trans=rep(1:n_trans,each=nrow(kernel_haz[[i]])),
      rbindlist(kernel_haz))
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#create list of survreg for different transitions
#<- flexsurvreg_list(fit1, fit2, fit3)
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

dists_wo_covs <- cbind.data.frame(covs=c(rep("fits_",9)),
              formal=rep(c("Weibull (AFT)", "Weibull (PH)", "Gompertz", "Log-logistic", "Gamma",
                "Generalized gamma", "Lognormal", "Exponential", "Generalized F"),1),
              model=rep(c("wei", "weiph", "gomp", "llogis", "gam","ggam", "logn", "exp", "genf"),1))

dists_w_covs <- cbind.data.frame(covs=c(rep("fits_c_",9)),
              formal=rep(c("Weibull (AFT)", "Weibull (PH)", "Gompertz", "Log-logistic", "Gamma",
                "Generalized gamma", "Lognormal", "Exponential", "Generalized F"),1),
              model=rep(c("wei", "weiph", "gomp", "llogis", "gam","ggam", "logn", "exp", "genf"),1))

#hr.exp <- round(exp(coef(get(paste0("fits_",dists[x,"model"]))[[y]])["groupGood"]),3)

#WO COVS
fitted_flexsurvreg2a<-data.frame()
fit_flexsurvreg2a<-data.frame()
for (y in 1:n_trans){
  for (x in 1:nrow(dists_wo_covs)){
  cat(paste0("#### Flexible Survival Model (w/o covs): ",
             dists_wo_covs[x,"formal"], "; transition: ",y, "\n \n"))
    #Return fitted survival, cumulative hazard or hazard at a series of times from a fitted model
    #
    mod_flexsurv<-paste0(dists_wo_covs[x,"covs"],dists_wo_covs[x,"model"])
    #FITTED LINES
    #Lines
    est_by_time<-
      data.table::data.table(summary(get(mod_flexsurv)[[y]], ci = F, t= newtime2, B=10000,type = "hazard", tidy=T))
    #dataframe
    fitted_flexsurvreg2a<-rbind.data.frame(fitted_flexsurvreg2a,
          cbind.data.frame(dist= rep(dists_wo_covs[x,"formal"],),
                           trans=rep(y,),
                           est_by_time))
    #
    fit_flexsurvreg2a<-rbind(fit_flexsurvreg2a,
                      cbind(dist= dists_wo_covs[x,"formal"], trans=y, 
                                                        covs="w/o covs",
                           AIC= get(mod_flexsurv)[[y]]$AIC,
                           Llik= get(mod_flexsurv)[[y]]$loglik,
                           npars= get(mod_flexsurv)[[y]]$npars,
                           pars= get(mod_flexsurv)[[y]]$AIC/2 + get(mod_flexsurv)[[y]]$loglik,
                           BIC= get(mod_flexsurv)[[y]]$loglik+ log(get(mod_flexsurv)[[y]]$N)* (get(mod_flexsurv)[[y]]$AIC/2 + get(mod_flexsurv)[[y]]$loglik)
  #the BIC may not be appropriate if none of the candidate models are considered to be close to the ‘true’ model.
                           )
                      )
    }
}

## #### Flexible Survival Model (w/o covs): Weibull (AFT); transition: 1
##  
## #### Flexible Survival Model (w/o covs): Weibull (PH); transition: 1
##  
## #### Flexible Survival Model (w/o covs): Gompertz; transition: 1
##  
## #### Flexible Survival Model (w/o covs): Log-logistic; transition: 1
##  
## #### Flexible Survival Model (w/o covs): Gamma; transition: 1
##  
## #### Flexible Survival Model (w/o covs): Generalized gamma; transition: 1
##  
## #### Flexible Survival Model (w/o covs): Lognormal; transition: 1
##  
## #### Flexible Survival Model (w/o covs): Exponential; transition: 1
##  
## #### Flexible Survival Model (w/o covs): Generalized F; transition: 1
##  
## #### Flexible Survival Model (w/o covs): Weibull (AFT); transition: 2
##  
## #### Flexible Survival Model (w/o covs): Weibull (PH); transition: 2
##  
## #### Flexible Survival Model (w/o covs): Gompertz; transition: 2
##  
## #### Flexible Survival Model (w/o covs): Log-logistic; transition: 2
##  
## #### Flexible Survival Model (w/o covs): Gamma; transition: 2
##  
## #### Flexible Survival Model (w/o covs): Generalized gamma; transition: 2
##  
## #### Flexible Survival Model (w/o covs): Lognormal; transition: 2
##  
## #### Flexible Survival Model (w/o covs): Exponential; transition: 2
##  
## #### Flexible Survival Model (w/o covs): Generalized F; transition: 2
##  
## #### Flexible Survival Model (w/o covs): Weibull (AFT); transition: 3
##  
## #### Flexible Survival Model (w/o covs): Weibull (PH); transition: 3
##  
## #### Flexible Survival Model (w/o covs): Gompertz; transition: 3
##  
## #### Flexible Survival Model (w/o covs): Log-logistic; transition: 3
##  
## #### Flexible Survival Model (w/o covs): Gamma; transition: 3
##  
## #### Flexible Survival Model (w/o covs): Generalized gamma; transition: 3
##  
## #### Flexible Survival Model (w/o covs): Lognormal; transition: 3
##  
## #### Flexible Survival Model (w/o covs): Exponential; transition: 3
##  
## #### Flexible Survival Model (w/o covs): Generalized F; transition: 3
##

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#Calculate error
haz_int_only1_binned<-
  haz_int_only %>% 
  dplyr::mutate(time=ifelse(time<.05,.05,time)) %>% 
  dplyr::group_by(trans) %>% 
  dplyr::mutate(x_binned = cut(time, 
          breaks= seq(0,max(haz_int_only$time),.05))) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(trans, dist, x_binned) %>% 
  dplyr::summarise(mean_time=mean(time,na.rm=T),mean_est=mean(est,na.rm=T)) %>% 
  dplyr::ungroup() %>% 
  dplyr::select(-dist)

## `summarise()` regrouping output by 'trans', 'dist' (override with `.groups` argument)

#180 rows vs. 303 in haz int only
fitted_flexsurvreg2a_binned<-
fitted_flexsurvreg2a[,c("dist","trans","time","est")] %>% 
  dplyr::filter(time<=max(haz_int_only$time)) %>%
  dplyr::mutate(time=ifelse(time<.05,.05,time)) %>% 
  dplyr::group_by(trans, dist) %>%
  dplyr::mutate(x_binned = cut(time, 
          breaks= seq(0,max(haz_int_only$time),.05))) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(trans, dist, x_binned) %>% 
  dplyr::summarise(mean_time=mean(time,na.rm=T),mean_est=mean(est,na.rm=T)) %>% 
  dplyr::ungroup()

## `summarise()` regrouping output by 'trans', 'dist' (override with `.groups` argument)

fitted_flexsurvreg2a_binned_mix<-
fitted_flexsurvreg2a_binned %>% 
  dplyr::left_join(haz_int_only1_binned, by=c("trans", "x_binned")) %>% 
  dplyr::select(-mean_time.y,-mean_time.x) %>% 
  dplyr::rename("mean_est_flexsurv"="mean_est.x","mean_est_cumhaz"="mean_est.y")

db_for_apply_rmse_a<-
  cbind.data.frame(
      dist=rep(c("Weibull (AFT)", "Weibull (PH)", "Gompertz", "Log-logistic", "Gamma",
                "Generalized gamma", "Lognormal", "Exponential", "Generalized F"),3),
      trans=rep(c(1:3),each=9))
   
rmse_comp_fits_2a<- data.frame()
for(i in 1:nrow(db_for_apply_rmse_a)){
rmse<- Metrics::rmse(subset(fitted_flexsurvreg2a_binned_mix[complete.cases(fitted_flexsurvreg2a_binned_mix),], 
                       dist==db_for_apply_rmse_a[i,"dist"] & 
                       trans==db_for_apply_rmse_a[i,"trans"])$mean_est_flexsurv,
              subset(fitted_flexsurvreg2a_binned_mix[complete.cases(fitted_flexsurvreg2a_binned_mix),], 
                       dist==db_for_apply_rmse_a[i,"dist"] & 
                       trans==db_for_apply_rmse_a[i,"trans"])$mean_est_cumhaz)

rmse_comp_fits_2a<- rbind(rmse_comp_fits_2a,cbind(dist=db_for_apply_rmse_a[i,"dist"],
                                                  trans=db_for_apply_rmse_a[i,"trans"],
                                                  rmse=rmse))
}

rmse_comp_fits_2a<-
  rmse_comp_fits_2a %>% 
  dplyr::mutate(rmse=round(as.numeric(rmse),4)) %>% 
  dplyr::arrange(trans,rmse)

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
haz <- rbind(haz_int_only, fitted_flexsurvreg2a)

#brown burlywood3 blueviolet blue4 cadetblue4

haz_plot_int_only<-
haz %>% 
  dplyr::mutate(dist=factor(dist,levels=c("Kernel density",dists_wo_covs$formal))) %>% 
ggplot()+
    geom_line(aes(time, est, color=dist),size=1)+
    #geom_ribbon(aes(ymin=lcl, ymax=ucl), alpha=.5, linetype=0) +
    scale_color_manual(name="Distributions", values = c("black","#f54b96","#00e9b1","#69b763",
"#166000","#b27ff9","#fa863b","#013eab","#a7aa48","#b34b40","darkred"))+
                         #c("black",brewer.pal(n = 9, name = 'Paired')))+
                         #c("#112A60","#085754","#D3A347","#4F3C91","red","#112A60","#085754","#8F630D","#251363")) +
    facet_wrap(~trans,labeller = labeller(trans = transition_label))+
    sjPlot::theme_sjplot2()+
    theme(legend.position="bottom",
          strip.background = element_rect(fill = "white", colour = "white"))+
  scale_x_continuous(breaks = seq(1, 15, 2))+
  #theme(axis.text.x = element_blank(), 
  #      panel.grid.major = element_blank(), 
  #      panel.grid.minor = element_blank()) +
  labs(y="Hazard",x="Time (years)")

fit_plot_int_only<-
fit_flexsurvreg2a %>%  
  melt(id.vars=c("dist","trans","covs","npars","pars")) %>% 
  dplyr::filter(variable!="Llik") %>% 
  dplyr::mutate(dist=factor(dist, levels= dists_wo_covs$formal),
                value=as.numeric(value)) %>% 
ggplot(aes(dist, value, fill=variable))+
  geom_bar(stat= "identity")+
    #geom_ribbon(aes(ymin=lcl, ymax=ucl), alpha=.5, linetype=0) +
    scale_fill_manual(name="Indices", values = c("gray30","gray70"))+
                         #c("#112A60","#085754","#D3A347","#4F3C91","red","#112A60","#085754","#8F630D","#251363")) +
    facet_wrap(~trans,labeller = labeller(trans = transition_label))+
    sjPlot::theme_sjplot2()+
    theme(legend.position="bottom",
          axis.text.x= element_text(angle=45, vjust=0.5),
          strip.background = element_blank(),
          strip.text.x = element_blank(),
          axis.title.x=element_blank())+
  #theme(axis.text.x = element_blank(), 
  #      panel.grid.major = element_blank(), 
  #      panel.grid.minor = element_blank())+
  labs(y="Value")

haz_plot_int_only

Figure 10c. Vissual Assessment of Hazards, Three-states Model (w/o covars)

if(no_mostrar==1){
jpeg("C:/Users/andre/Desktop/SUD_CL/eso3.jpg", height=14, width= 10, res= 96, units = "in")
haz_plot_int_only
dev.off()
}

We looked over estimations and confidence intervals (made by resamples of 10,000 iterations), compared to the mentioned smooth estimate of the hazard function for censored data. The main difference is these models contain covariates of interest for the study. To extrapolate confidence intervals, we defined a different model for the third transition, including the time of arrival to the intermediate state.

options(warn=-1)

kernel_haz_est2a<-list()
kernel_haz_est2b<-list()
kernel_haz2a<-list()
kernel_haz2b<-list()
for (i in 1:n_trans){
library("muhaz")
kernel_haz_est2a[[i]] <- muhaz(ms_CONS_C1_SEP_2020_women_imputed_exp[which(ms_CONS_C1_SEP_2020_women_imputed_exp$trans==i &
                        ms_CONS_C1_SEP_2020_women_imputed_exp$tipo_de_programa_2==1),"time"],
                        ms_CONS_C1_SEP_2020_women_imputed_exp[which(ms_CONS_C1_SEP_2020_women_imputed_exp$trans==i &
                        ms_CONS_C1_SEP_2020_women_imputed_exp$tipo_de_programa_2==1),"status"])
kernel_haz2a[[i]] <- data.table(time = kernel_haz_est2a[[i]]$est.grid,
                         est = kernel_haz_est2a[[i]]$haz.est,
                         dist = "Kernel density")
kernel_haz_est2b[[i]] <- muhaz(ms_CONS_C1_SEP_2020_women_imputed_exp[which(ms_CONS_C1_SEP_2020_women_imputed_exp$trans==i &
                        ms_CONS_C1_SEP_2020_women_imputed_exp$tipo_de_programa_2==0),"time"],
                        ms_CONS_C1_SEP_2020_women_imputed_exp[which(ms_CONS_C1_SEP_2020_women_imputed_exp$trans==i &
                        ms_CONS_C1_SEP_2020_women_imputed_exp$tipo_de_programa_2==0),"status"])
kernel_haz2b[[i]] <- data.table(time = kernel_haz_est2b[[i]]$est.grid,
                         est = kernel_haz_est2b[[i]]$haz.est,
                         dist = "Kernel density")
}

haz_int_only2<-
  rbind(cbind(trans=rep(1:n_trans,each=nrow(kernel_haz2a[[i]])),
              tipo_programa=rep(1,nrow(kernel_haz2a[[i]])),
      rbindlist(kernel_haz2a)),
      cbind(trans=rep(1:n_trans,each=nrow(kernel_haz2b[[i]])),
            tipo_programa=rep(0,nrow(kernel_haz2b[[i]])),
      rbindlist(kernel_haz2b)))


# Database to contrast adjustments
newdat2 <- data.table::data.table(tipo_de_programa_2= factor(c(rep(1,1),rep(0,1))),
  #comp_status= factor(rep(c("Therapeutic discharge","Discharge without clinical advice"),2)),
  edad_al_ing_grupos= factor(rep("50+",2)),
  escolaridad_rec= factor(rep("1-More than high school",2)),
  sus_principal_mod= factor(rep("Marijuana",2)),
  freq_cons_sus_prin= factor(rep("2 to 3 days a week",2)),
  compromiso_biopsicosocial= factor(rep("1-Mild",2)),
  tenencia_de_la_vivienda_mod= factor(rep("Owner/Transferred dwellings/Pays Dividends",2)),
  num_otras_sus_mod= factor(rep("No additional substance",2)),
  numero_de_hijos_mod_rec= factor(rep("No",2)),
  tipo_de_plan_res= factor(rep("Outpatient",2)))

newdat2_t3 <- data.table::data.table(tipo_de_programa_2= factor(c(rep(1,1),rep(0,1))),
  #comp_status= factor(rep(c("Therapeutic discharge","Discharge without clinical advice"),2)),
  edad_al_ing_grupos= factor(rep("50+",2)),
  escolaridad_rec= factor(rep("1-More than high school",2)),
  sus_principal_mod= factor(rep("Marijuana",2)),
  freq_cons_sus_prin= factor(rep("2 to 3 days a week",2)),
  compromiso_biopsicosocial= factor(rep("1-Mild",2)),
  tenencia_de_la_vivienda_mod= factor(rep("Owner/Transferred dwellings/Pays Dividends",2)),
  num_otras_sus_mod= factor(rep("No additional substance",2)),
  numero_de_hijos_mod_rec= factor(rep("No",2)),
  tipo_de_plan_res= factor(rep("Outpatient",2)),
  arrival=0)

## W COVS
fitted_flexsurvreg2b<-data.frame()
fit_flexsurvreg2b<-data.frame()
for (y in 1:n_trans){
  for (x in 1:nrow(dists_w_covs)){
  cat(paste0("#### Flexible Survival Model (w/ covs): ",
             dists_w_covs[x,"formal"], "; transition: ",y, "\n \n"))
    #Return fitted survival, cumulative hazard or hazard at a series of times from a fitted model
    #
    mod_flexsurv<-paste0(dists_w_covs[x,"covs"],dists_w_covs[x,"model"])
    
    #FITTED LINES
    #Lines
      if(y==n_trans){
             est_by_time<- data.table::data.table(summary(get(mod_flexsurv)[[y]],newdata=newdat2_t3,t=newtime2,B=10000,type="hazard",tidy=T))} else{
             est_by_time<- data.table::data.table(cbind.data.frame(summary(get(mod_flexsurv)[[y]],newdata=newdat2,t=newtime2,B=10000,type="hazard",tidy=T),arrival=rep(0,)))
}
          #"survival" for survival probabilities.
          #"cumhaz" for cumulative hazards.
          #"hazard" for hazards.
          #"rmst" for restricted mean survival.
          #"mean" for mean survival.
          #"median" for median survival (alternative to type="quantile" with quantiles=0.5).
          #"quantile" for quantiles of the survival time distribution.
          #"link" for the fitted value of the location parameter (i.e. the "linear predictor")
    
    #dataframe
    fitted_flexsurvreg2b<-rbind.data.frame(fitted_flexsurvreg2b,
          cbind.data.frame(dist= rep(dists_w_covs[x,"formal"],),
                           hr= round(exp(coef(get(mod_flexsurv)[[y]])["factor(tipo_de_programa_2)1"]),3), 
                           trans=rep(y,),
                           est_by_time))
    #
    fit_flexsurvreg2b<-rbind(fit_flexsurvreg2b,
                      cbind(dist= dists_w_covs[x,"formal"], 
                            trans=y, 
                            covs="w/ covs",
                           AIC= get(mod_flexsurv)[[y]]$AIC,
                           Llik= get(mod_flexsurv)[[y]]$loglik,
                           npars= get(mod_flexsurv)[[y]]$npars,
                           pars= get(mod_flexsurv)[[y]]$AIC/2 + get(mod_flexsurv)[[y]]$loglik,
                           BIC= get(mod_flexsurv)[[y]]$loglik+ log(get(mod_flexsurv)[[y]]$N)* (get(mod_flexsurv)[[y]]$AIC/2 + get(mod_flexsurv)[[y]]$loglik)
  #the BIC may not be appropriate if none of the candidate models are considered to be close to the ‘true’ model.
                           )
                      )
    }
}

## #### Flexible Survival Model (w/ covs): Weibull (AFT); transition: 1
##  
## #### Flexible Survival Model (w/ covs): Weibull (PH); transition: 1
##  
## #### Flexible Survival Model (w/ covs): Gompertz; transition: 1
##  
## #### Flexible Survival Model (w/ covs): Log-logistic; transition: 1
##  
## #### Flexible Survival Model (w/ covs): Gamma; transition: 1
##  
## #### Flexible Survival Model (w/ covs): Generalized gamma; transition: 1
##  
## #### Flexible Survival Model (w/ covs): Lognormal; transition: 1
##  
## #### Flexible Survival Model (w/ covs): Exponential; transition: 1
##  
## #### Flexible Survival Model (w/ covs): Generalized F; transition: 1
##  
## #### Flexible Survival Model (w/ covs): Weibull (AFT); transition: 2
##  
## #### Flexible Survival Model (w/ covs): Weibull (PH); transition: 2
##  
## #### Flexible Survival Model (w/ covs): Gompertz; transition: 2
##  
## #### Flexible Survival Model (w/ covs): Log-logistic; transition: 2
##  
## #### Flexible Survival Model (w/ covs): Gamma; transition: 2
##  
## #### Flexible Survival Model (w/ covs): Generalized gamma; transition: 2
##  
## #### Flexible Survival Model (w/ covs): Lognormal; transition: 2
##  
## #### Flexible Survival Model (w/ covs): Exponential; transition: 2
##  
## #### Flexible Survival Model (w/ covs): Generalized F; transition: 2
##  
## #### Flexible Survival Model (w/ covs): Weibull (AFT); transition: 3
##  
## #### Flexible Survival Model (w/ covs): Weibull (PH); transition: 3
##  
## #### Flexible Survival Model (w/ covs): Gompertz; transition: 3
##  
## #### Flexible Survival Model (w/ covs): Log-logistic; transition: 3
##  
## #### Flexible Survival Model (w/ covs): Gamma; transition: 3
##  
## #### Flexible Survival Model (w/ covs): Generalized gamma; transition: 3
##  
## #### Flexible Survival Model (w/ covs): Lognormal; transition: 3
##  
## #### Flexible Survival Model (w/ covs): Exponential; transition: 3
##  
## #### Flexible Survival Model (w/ covs): Generalized F; transition: 3
##

tipo_programa_label<- c(`0`="General Population",`1`="Women specific") 
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
if(no_mostrar==1){
  fitted_flexsurvreg2b %>% 
    dplyr::group_by(trans) %>% 
    summarise(mean(ucl,na.rm=T))
  }
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#Calculate error
haz_int_only2_binned<-
  haz_int_only2 %>% 
  dplyr::mutate(time=ifelse(time<.5,.5,time)) %>% 
  dplyr::group_by(trans, tipo_programa) %>% 
  dplyr::mutate(x_binned = cut(time, 
          breaks= seq(0,max(haz_int_only2$time),.5))) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(trans, tipo_programa, dist, x_binned) %>% 
  dplyr::summarise(mean_time=mean(time,na.rm=T),mean_est=mean(est,na.rm=T)) %>% 
  dplyr::ungroup() %>% 
  dplyr::rename("tipo_de_programa_2"="tipo_programa") %>% 
  dplyr::mutate(tipo_de_programa_2=factor(tipo_de_programa_2))%>% 
  dplyr::select(-dist)

## `summarise()` regrouping output by 'trans', 'tipo_programa', 'dist' (override with `.groups` argument)

fitted_flexsurvreg2b_binned<-
fitted_flexsurvreg2b[,c("dist","trans","time","est","factor(tipo_de_programa_2)")] %>% 
  dplyr::rename("tipo_de_programa_2"="factor(tipo_de_programa_2)") %>% 
  dplyr::filter(time<=max(haz_int_only2$time)) %>%
  dplyr::mutate(time=ifelse(time<.5,.5,time)) %>% 
  dplyr::group_by(trans, dist, tipo_de_programa_2) %>%
  dplyr::mutate(x_binned = cut(time, 
          breaks= seq(0,max(haz_int_only2$time),.5))) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(trans, dist, tipo_de_programa_2, x_binned) %>% 
  dplyr::summarise(mean_time=mean(time,na.rm=T),mean_est=mean(est,na.rm=T)) %>% 
  dplyr::ungroup()

## `summarise()` regrouping output by 'trans', 'dist', 'tipo_de_programa_2' (override with `.groups` argument)

fitted_flexsurvreg2b_binned_mix<-
fitted_flexsurvreg2b_binned %>% 
  dplyr::left_join(haz_int_only2_binned, by=c("trans","tipo_de_programa_2", "x_binned")) %>% 
  dplyr::select(-mean_time.y,-mean_time.x) %>% 
  dplyr::rename("mean_est_flexsurv"="mean_est.x","mean_est_cumhaz"="mean_est.y")

db_for_apply_rmse<-
  cbind.data.frame(tipo_prog=rep(c("0","1"),each=9,3),
      dist=rep(c("Weibull (AFT)", "Weibull (PH)", "Gompertz", "Log-logistic", "Gamma",
                "Generalized gamma", "Lognormal", "Exponential", "Generalized F"),2*3),
      trans=rep(c(1:3),each=9*2))
   
rmse_comp_fits_2b<- data.frame()
for(i in 1:nrow(db_for_apply_rmse)){
rmse<- Metrics::rmse(subset(fitted_flexsurvreg2b_binned_mix[complete.cases(fitted_flexsurvreg2b_binned_mix),], 
                     tipo_de_programa_2==db_for_apply_rmse[i,"tipo_prog"] & 
                       dist==db_for_apply_rmse[i,"dist"] & 
                       trans==db_for_apply_rmse[i,"trans"])$mean_est_flexsurv,
              subset(fitted_flexsurvreg2b_binned_mix[complete.cases(fitted_flexsurvreg2b_binned_mix),], 
                     tipo_de_programa_2==db_for_apply_rmse[i,"tipo_prog"] & 
                       dist==db_for_apply_rmse[i,"dist"] & 
                       trans==db_for_apply_rmse[i,"trans"])$mean_est_cumhaz)

rmse_comp_fits_2b<- rbind(rmse_comp_fits_2b,cbind(dist=db_for_apply_rmse[i,"dist"],
                                                  tipo_prog=db_for_apply_rmse[i,"tipo_prog"],
                                                  trans=db_for_apply_rmse[i,"trans"],
                                                  rmse=rmse))
}

rmse_comp_fits_2b<-
  rmse_comp_fits_2b %>% 
      tidyr::pivot_wider(names_from = tipo_prog, values_from = rmse) %>% 
      dplyr::rename("gp"="0","we"="1") %>% 
      dplyr::mutate(gp=as.numeric(gp),we=as.numeric(we)) %>% 
      dplyr::mutate(mean_rmse=rowSums(.[3:4])/2) %>% 
      dplyr::arrange(trans,dist, mean_rmse)%>%
      dplyr::mutate(mean_rmse=round(mean_rmse,4))

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

#http://colorschemedesigner.com/csd-3.5/#
#https://www.r-graph-gallery.com/42-colors-names.html

plot_fitted_flexsurvreg2b<-
fitted_flexsurvreg2b[,c("dist","trans","time","est","lcl","ucl","factor(tipo_de_programa_2)")] %>% 
  rbind(cbind(dplyr::rename(haz_int_only2, "factor(tipo_de_programa_2)"="tipo_programa"),
              "lcl"=haz_int_only2$est,"ucl"=haz_int_only2$est)) %>% 
  dplyr::rename("tipo_de_programa_2"="factor(tipo_de_programa_2)") %>% 
  dplyr::mutate(tipo_de_programa_2=factor(tipo_de_programa_2),
                dist=factor(dist, levels=c("Kernel density",dists_w_covs$formal)),
                trans=factor(trans)) %>% 
  #dplyr::group_by(tipo_de_programa_2,dist,time,trans) %>% summarise(n=n()) %>% 
ggplot()+
    geom_line(aes(time, est, color=dist),size=1)+
    geom_ribbon(aes(x = time, ymin = lcl, ymax = ucl, fill = dist), alpha = 0.2)+
    #geom_ribbon(aes(ymin=lcl, ymax=ucl), alpha=.5, linetype=0) +
    scale_color_manual(name="Distributions", values =                         c("red","gray20","darkolivegreen","#D3A347","#4F3C91","coral4","#085754","lightpink","turquoise","navyblue")) +
    scale_fill_manual(name="Distributions", values = c("red","gray20","darkolivegreen","#D3A347","#4F3C91","coral4","#085754","lightpink","turquoise","navyblue")) +
    facet_wrap(tipo_de_programa_2~trans,labeller = labeller(trans = transition_label, tipo_de_programa_2=tipo_programa_label))+
    sjPlot::theme_sjplot2()+
    #ylim(0,.3)+
    scale_x_continuous(breaks = seq(0, 15, 2))+
    theme(legend.position="bottom",
          strip.background = element_rect(fill = "white", colour = "white"))+
    labs(y="Hazard",x="Time (years)",caption="Note. Kernel density, stratified by type of program")

plot_fitted_flexsurvreg2b

Figure 10d. Vissual Assessment of Hazards, Three-states Model (w/ covars)

## http://www.statistica.it/gianluca/teaching/r-hta-workshop/2019/Bullement.pdf
tiempo_despues_fits2<-Sys.time()

paste0("Time in process: ");tiempo_despues_fits2-tiempo_antes_fits2

## [1] "Time in process: "

## Time difference of 2.391648 mins

#13 minutos aprox. en DELL
options(warn=0)

if(no_mostrar==1){
jpeg("C:/Users/andre/Desktop/SUD_CL/eso4.jpg", height=10, width= 10, res= 96, units = "in")
plot_fitted_flexsurvreg2b
dev.off()
}

4 states

#Micha Mandel (2013) Simulation-Based Confidence Intervals for Functions With Complicated Derivatives, The American Statistician, 67:2, 76-81, DOI: 10.1080/00031305.2013.783880
# https://rpubs.com/martina_morris/testhazard
tiempo_antes_fits2<-Sys.time()

newtime2 = seq(from=1/24, to= 15, by=1/12)

kernel_haz_est_4s<-list()
kernel_haz_4s<-list()
for (i in 1:n_trans2){
library("muhaz")
kernel_haz_est_4s[[i]] <- muhaz(ms2_CONS_C1_SEP_2020_women_imputed_exp[which(ms2_CONS_C1_SEP_2020_women_imputed_exp$trans==i),"time"],
                        ms2_CONS_C1_SEP_2020_women_imputed_exp[which(ms2_CONS_C1_SEP_2020_women_imputed_exp$trans==i),"status"])
kernel_haz_4s[[i]] <- data.table(time = kernel_haz_est_4s[[i]]$est.grid,
                         est = kernel_haz_est_4s[[i]]$haz.est,
                         dist = "Kernel density")
}
#, n.est.grid=length(unique(ms2_CONS_C1_SEP_2020_women_imputed_exp[which(ms2_CONS_C1_SEP_2020_women_imputed_exp$trans==i),"time"]))/300

haz_int_only_4s<-
cbind(trans=rep(1:n_trans2,each=nrow(kernel_haz_4s[[i]])),
      rbindlist(kernel_haz_4s))
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#create list of survreg for different transitions
#<- flexsurvreg_list(fit1, fit2, fit3)
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_


dists_wo_covs_4s <- cbind.data.frame(covs=c(rep("fits_",9)),
              formal=rep(c("Weibull (AFT)", "Weibull (PH)", "Gompertz", "Log-logistic", "Gamma",
                "Generalized gamma", "Lognormal", "Exponential", "Generalized F"),1),
              model=rep(c("wei2", "weiph2", "gomp2", "llogis2", "gam2","ggam2", "logn2", "exp2", "genf2"),1))

dists_w_covs_4s <- cbind.data.frame(covs=c(rep("fits_c_",9)),
              formal=rep(c("Weibull (AFT)", "Weibull (PH)", "Gompertz", "Log-logistic", "Gamma",
                "Generalized gamma", "Lognormal", "Exponential", "Generalized F"),1),
              model=rep(c("wei2", "weiph2", "gomp2", "llogis2", "gam2","ggam2", "logn2", "exp2", "genf2"),1))

#WO COVS
fitted_flexsurvreg2a_4s<-data.frame()
fit_flexsurvreg2a_4s<-data.frame()
for (y in 1:n_trans2){
  for (x in 1:nrow(dists_wo_covs_4s)){
  cat(paste0("#### Flexible Survival Model (w/o covs): ",
             dists_wo_covs_4s[x,"formal"], "; transition: ",y, "\n \n"))
    #Return fitted survival, cumulative hazard or hazard at a series of times from a fitted model
    #
    mod_flexsurv2<-paste0(dists_wo_covs_4s[x,"covs"],dists_wo_covs_4s[x,"model"])
    #FITTED LINES
    #Lines
    est_by_time2<-
      data.table::data.table(summary(get(mod_flexsurv2)[[y]], ci = F, t= newtime2, B=10000,type = "hazard", tidy=T))
    #dataframe
    fitted_flexsurvreg2a_4s<-rbind.data.frame(fitted_flexsurvreg2a_4s,
          cbind.data.frame(dist= rep(dists_wo_covs_4s[x,"formal"],),
                           trans=rep(y,),
                           est_by_time2))
    #
    fit_flexsurvreg2a_4s<-rbind(fit_flexsurvreg2a_4s,
                      cbind(dist= dists_wo_covs_4s[x,"formal"], trans=y, 
                                                        covs="w/o covs",
                           AIC= get(mod_flexsurv2)[[y]]$AIC,
                           Llik= get(mod_flexsurv2)[[y]]$loglik,
                           npars= get(mod_flexsurv2)[[y]]$npars,
                           pars= get(mod_flexsurv2)[[y]]$AIC/2 + get(mod_flexsurv2)[[y]]$loglik,
                           BIC= get(mod_flexsurv2)[[y]]$loglik+ log(get(mod_flexsurv2)[[y]]$N)* (get(mod_flexsurv2)[[y]]$AIC/2 + get(mod_flexsurv2)[[y]]$loglik)
  #the BIC may not be appropriate if none of the candidate models are considered to be close to the ‘true’ model.
                           )
                      )
    }
}

## #### Flexible Survival Model (w/o covs): Weibull (AFT); transition: 1
##  
## #### Flexible Survival Model (w/o covs): Weibull (PH); transition: 1
##  
## #### Flexible Survival Model (w/o covs): Gompertz; transition: 1
##  
## #### Flexible Survival Model (w/o covs): Log-logistic; transition: 1
##  
## #### Flexible Survival Model (w/o covs): Gamma; transition: 1
##  
## #### Flexible Survival Model (w/o covs): Generalized gamma; transition: 1
##  
## #### Flexible Survival Model (w/o covs): Lognormal; transition: 1
##  
## #### Flexible Survival Model (w/o covs): Exponential; transition: 1
##  
## #### Flexible Survival Model (w/o covs): Generalized F; transition: 1
##  
## #### Flexible Survival Model (w/o covs): Weibull (AFT); transition: 2
##  
## #### Flexible Survival Model (w/o covs): Weibull (PH); transition: 2
##  
## #### Flexible Survival Model (w/o covs): Gompertz; transition: 2
##  
## #### Flexible Survival Model (w/o covs): Log-logistic; transition: 2
##  
## #### Flexible Survival Model (w/o covs): Gamma; transition: 2
##  
## #### Flexible Survival Model (w/o covs): Generalized gamma; transition: 2
##  
## #### Flexible Survival Model (w/o covs): Lognormal; transition: 2
##  
## #### Flexible Survival Model (w/o covs): Exponential; transition: 2
##  
## #### Flexible Survival Model (w/o covs): Generalized F; transition: 2
##  
## #### Flexible Survival Model (w/o covs): Weibull (AFT); transition: 3
##  
## #### Flexible Survival Model (w/o covs): Weibull (PH); transition: 3
##  
## #### Flexible Survival Model (w/o covs): Gompertz; transition: 3
##  
## #### Flexible Survival Model (w/o covs): Log-logistic; transition: 3
##  
## #### Flexible Survival Model (w/o covs): Gamma; transition: 3
##  
## #### Flexible Survival Model (w/o covs): Generalized gamma; transition: 3
##  
## #### Flexible Survival Model (w/o covs): Lognormal; transition: 3
##  
## #### Flexible Survival Model (w/o covs): Exponential; transition: 3
##  
## #### Flexible Survival Model (w/o covs): Generalized F; transition: 3
##  
## #### Flexible Survival Model (w/o covs): Weibull (AFT); transition: 4
##  
## #### Flexible Survival Model (w/o covs): Weibull (PH); transition: 4
##  
## #### Flexible Survival Model (w/o covs): Gompertz; transition: 4
##  
## #### Flexible Survival Model (w/o covs): Log-logistic; transition: 4
##  
## #### Flexible Survival Model (w/o covs): Gamma; transition: 4
##  
## #### Flexible Survival Model (w/o covs): Generalized gamma; transition: 4
##  
## #### Flexible Survival Model (w/o covs): Lognormal; transition: 4
##  
## #### Flexible Survival Model (w/o covs): Exponential; transition: 4
##  
## #### Flexible Survival Model (w/o covs): Generalized F; transition: 4
##  
## #### Flexible Survival Model (w/o covs): Weibull (AFT); transition: 5
##  
## #### Flexible Survival Model (w/o covs): Weibull (PH); transition: 5
##  
## #### Flexible Survival Model (w/o covs): Gompertz; transition: 5
##  
## #### Flexible Survival Model (w/o covs): Log-logistic; transition: 5
##  
## #### Flexible Survival Model (w/o covs): Gamma; transition: 5
##  
## #### Flexible Survival Model (w/o covs): Generalized gamma; transition: 5
##  
## #### Flexible Survival Model (w/o covs): Lognormal; transition: 5
##  
## #### Flexible Survival Model (w/o covs): Exponential; transition: 5
##  
## #### Flexible Survival Model (w/o covs): Generalized F; transition: 5
##

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#Calculate error
haz_int_only1_4s_binned<-
  haz_int_only_4s %>% 
  dplyr::mutate(time=ifelse(time<.05,.05,time)) %>% 
  dplyr::group_by(trans) %>% 
  dplyr::mutate(x_binned = cut(time, 
          breaks= seq(0,max(haz_int_only_4s$time),.05))) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(trans, dist, x_binned) %>% 
  dplyr::summarise(mean_time=mean(time,na.rm=T),mean_est=mean(est,na.rm=T)) %>% 
  dplyr::ungroup() %>% 
  dplyr::select(-dist)

## `summarise()` regrouping output by 'trans', 'dist' (override with `.groups` argument)

#180 rows vs. 303 in haz int only
fitted_flexsurvreg2a_4s_binned<-
fitted_flexsurvreg2a_4s[,c("dist","trans","time","est")] %>% 
  dplyr::filter(time<=max(haz_int_only_4s$time)) %>%
  dplyr::mutate(time=ifelse(time<.05,.05,time)) %>% 
  dplyr::group_by(trans, dist) %>%
  dplyr::mutate(x_binned = cut(time, 
          breaks= seq(0,max(haz_int_only_4s$time),.05))) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(trans, dist, x_binned) %>% 
  dplyr::summarise(mean_time=mean(time,na.rm=T),mean_est=mean(est,na.rm=T)) %>% 
  dplyr::ungroup()

## `summarise()` regrouping output by 'trans', 'dist' (override with `.groups` argument)

fitted_flexsurvreg2a_4s_binned_mix<-
fitted_flexsurvreg2a_4s_binned %>% 
  dplyr::left_join(haz_int_only1_4s_binned, by=c("trans", "x_binned")) %>% 
  dplyr::select(-mean_time.y,-mean_time.x) %>% 
  dplyr::rename("mean_est_flexsurv"="mean_est.x","mean_est_cumhaz"="mean_est.y")

db_for_apply_rmse_c<-
  cbind.data.frame(
      dist=rep(c("Weibull (AFT)", "Weibull (PH)", "Gompertz", "Log-logistic", "Gamma",
                "Generalized gamma", "Lognormal", "Exponential", "Generalized F"),5),
      trans=rep(c(1:5),each=9))
   
rmse_comp_fits_2c<- data.frame()
for(i in 1:nrow(db_for_apply_rmse_c)){
rmse<- Metrics::rmse(subset(fitted_flexsurvreg2a_4s_binned_mix[complete.cases(fitted_flexsurvreg2a_4s_binned_mix),], 
                       dist==db_for_apply_rmse_c[i,"dist"] & 
                       trans==db_for_apply_rmse_c[i,"trans"])$mean_est_flexsurv,
              subset(fitted_flexsurvreg2a_4s_binned_mix[complete.cases(fitted_flexsurvreg2a_4s_binned_mix),], 
                       dist==db_for_apply_rmse_c[i,"dist"] & 
                       trans==db_for_apply_rmse_c[i,"trans"])$mean_est_cumhaz)

rmse_comp_fits_2c<- rbind(rmse_comp_fits_2c,cbind(dist=db_for_apply_rmse_c[i,"dist"],
                                                  trans=db_for_apply_rmse_c[i,"trans"],
                                                  rmse=rmse))
}

rmse_comp_fits_2c<-
  rmse_comp_fits_2c %>% 
  dplyr::mutate(rmse=round(as.numeric(rmse),4)) %>% 
  dplyr::arrange(trans,rmse)

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
haz_4s <- rbind(haz_int_only_4s, fitted_flexsurvreg2a_4s)

#brown burlywood3 blueviolet blue4 cadetblue4

haz_plot_4s_int_only<-
haz_4s %>% 
  dplyr::mutate(dist=factor(dist,levels=c("Kernel density",dists_wo_covs_4s$formal))) %>% 
ggplot()+
    geom_line(aes(time, est, color=dist),size=1)+
    #geom_ribbon(aes(ymin=lcl, ymax=ucl), alpha=.5, linetype=0) +
    scale_color_manual(name="Distributions", values = c("black","#f54b96","#00e9b1","#69b763",
"#166000","#b27ff9","#fa863b","#013eab","#a7aa48","#b34b40","darkred"))+
                         #c("black",brewer.pal(n = 9, name = 'Paired')))+
                         #c("#112A60","#085754","#D3A347","#4F3C91","red","#112A60","#085754","#8F630D","#251363")) +
    facet_wrap(.~trans,labeller = labeller(trans = transition_label_4s), ncol=1, scales = "free_y")+
    sjPlot::theme_sjplot2()+
    theme(legend.position="bottom",
          strip.background = element_rect(fill = "white", colour = "white"))+
  scale_x_continuous(breaks = seq(1, 15, 2))+
  #ylim(0,1.25)+
  #theme(axis.text.x = element_blank(), 
  #      panel.grid.major = element_blank(), 
  #      panel.grid.minor = element_blank()) +
  labs(y="Hazard",x="Time (years)")

fit_plot_int_4s_only<-
fit_flexsurvreg2a_4s %>%  
  melt(id.vars=c("dist","trans","covs","npars","pars")) %>% 
  dplyr::filter(variable!="Llik") %>% 
  dplyr::mutate(dist=factor(dist, levels= dists_wo_covs_4s$formal),
                value=as.numeric(value)) %>% 
ggplot(aes(dist, value, fill=variable))+
  geom_bar(stat= "identity")+
    #geom_ribbon(aes(ymin=lcl, ymax=ucl), alpha=.5, linetype=0) +
    scale_fill_manual(name="Indices", values = c("gray30","gray70"))+
                         #c("#112A60","#085754","#D3A347","#4F3C91","red","#112A60","#085754","#8F630D","#251363")) +
    facet_wrap(~trans,labeller = labeller(trans = transition_label_4s), ncol=1)+
    sjPlot::theme_sjplot2()+
    theme(legend.position="bottom",
          axis.text.x= element_text(angle=45, vjust=0.5),
          strip.background = element_blank(),
          strip.text.x = element_blank(),
          axis.title.x=element_blank())+
  #theme(axis.text.x = element_blank(), 
  #      panel.grid.major = element_blank(), 
  #      panel.grid.minor = element_blank())+
  labs(y="Value")

#grid.arrange(haz_plot_4s_int_only,fit_plot_int_4s_only, heights=c(2,1.5))
haz_plot_4s_int_only

$Figure 10e. Vissual Assessment of Hazards, Four-states Model (w/o covars)$

Figure 10e. Vissual Assessment of Hazards, Four-states Model (w/o covars)

if(no_mostrar==1){
jpeg("C:/Users/andre/Desktop/SUD_CL/eso9.jpg", height=17, width= 10, res= 96, units = "in")
haz_plot_4s_int_only
#grid.arrange(haz_plot_4s_int_only,fit_plot_int_4s_only, heights=c(2,1.5))
dev.off()
}

options(warn=-1)
tiempo_antes_fits22<- Sys.time()

kernel_haz_est2a_4s<-list()
kernel_haz_est2b_4s<-list()
kernel_haz2a_4s<-list()
kernel_haz2b_4s<-list()
for (i in 1:n_trans2){
library("muhaz")
kernel_haz_est2a_4s[[i]] <- muhaz(ms2_CONS_C1_SEP_2020_women_imputed_exp[which(ms2_CONS_C1_SEP_2020_women_imputed_exp$trans==i &
                        ms2_CONS_C1_SEP_2020_women_imputed_exp$tipo_de_programa_2==1),"time"],
                        ms2_CONS_C1_SEP_2020_women_imputed_exp[which(ms2_CONS_C1_SEP_2020_women_imputed_exp$trans==i &
                        ms2_CONS_C1_SEP_2020_women_imputed_exp$tipo_de_programa_2==1),"status"])
kernel_haz2a_4s[[i]] <- data.table(time = kernel_haz_est2a_4s[[i]]$est.grid,
                         est = kernel_haz_est2a_4s[[i]]$haz.est,
                         dist = "Kernel density")
kernel_haz_est2b_4s[[i]] <- muhaz(ms2_CONS_C1_SEP_2020_women_imputed_exp[which(ms2_CONS_C1_SEP_2020_women_imputed_exp$trans==i &
                        ms2_CONS_C1_SEP_2020_women_imputed_exp$tipo_de_programa_2==0),"time"],
                        ms2_CONS_C1_SEP_2020_women_imputed_exp[which(ms2_CONS_C1_SEP_2020_women_imputed_exp$trans==i &
                        ms2_CONS_C1_SEP_2020_women_imputed_exp$tipo_de_programa_2==0),"status"])
kernel_haz2b_4s[[i]] <- data.table(time = kernel_haz_est2b_4s[[i]]$est.grid,
                         est = kernel_haz_est2b_4s[[i]]$haz.est,
                         dist = "Kernel density")
}

haz_int_only2_4s<-
  rbind(cbind(trans=rep(1:n_trans2,each=nrow(kernel_haz2a_4s[[i]])),
              tipo_programa=rep(1,nrow(kernel_haz2a_4s[[i]])),
      rbindlist(kernel_haz2a_4s)),
      cbind(trans=rep(1:n_trans2,each=nrow(kernel_haz2b_4s[[i]])),
            tipo_programa=rep(0,nrow(kernel_haz2b_4s[[i]])),
      rbindlist(kernel_haz2b_4s)))


# Database to contrast adjustments
newdat2_4s <- data.table::data.table(tipo_de_programa_2= factor(c(rep(1,1),rep(0,1))),
  #comp_status= factor(rep(c("Therapeutic discharge","Discharge without clinical advice"),2)),
  edad_al_ing_grupos= factor(rep("50+",2)),
  escolaridad_rec= factor(rep("1-More than high school",2)),
  sus_principal_mod= factor(rep("Marijuana",2)),
  freq_cons_sus_prin= factor(rep("2 to 3 days a week",2)),
  compromiso_biopsicosocial= factor(rep("1-Mild",2)),
  tenencia_de_la_vivienda_mod= factor(rep("Owner/Transferred dwellings/Pays Dividends",2)),
  num_otras_sus_mod= factor(rep("No additional substance",2)),
  numero_de_hijos_mod_rec= factor(rep("No",2)),
  tipo_de_plan_res= factor(rep("Outpatient",2)))

newdat2_t45 <- data.table::data.table(tipo_de_programa_2= factor(c(rep(1,1),rep(0,1))),
  #comp_status= factor(rep(c("Therapeutic discharge","Discharge without clinical advice"),2)),
  edad_al_ing_grupos= factor(rep("50+",2)),
  escolaridad_rec= factor(rep("1-More than high school",2)),
  sus_principal_mod= factor(rep("Marijuana",2)),
  freq_cons_sus_prin= factor(rep("2 to 3 days a week",2)),
  compromiso_biopsicosocial= factor(rep("1-Mild",2)),
  tenencia_de_la_vivienda_mod= factor(rep("Owner/Transferred dwellings/Pays Dividends",2)),
  num_otras_sus_mod= factor(rep("No additional substance",2)),
  numero_de_hijos_mod_rec= factor(rep("No",2)),
  tipo_de_plan_res= factor(rep("Outpatient",2)),
  arrival=0)

## W COVS
fitted_flexsurvreg2b_4s<-data.frame()
fit_flexsurvreg2b_4s<-data.frame()
for (y in 1:n_trans2){
  for (x in 1:nrow(dists_w_covs_4s)){
  cat(paste0("#### Flexible Survival Model (w/ covs): ",
             dists_w_covs[x,"formal"], "; transition: ",y, "\n \n"))
    #Return fitted survival, cumulative hazard or hazard at a series of times from a fitted model
    #
    mod_flexsurv2<-paste0(dists_w_covs_4s[x,"covs"],dists_w_covs_4s[x,"model"])
    
    #FITTED LINES
    #Lines
      if(y%in% c(4,5)){
             est_by_time2<- data.table::data.table(summary(get(mod_flexsurv2)[[y]],newdata=newdat2_t45,t=newtime2,B=10000,type="hazard",tidy=T))} else{
             est_by_time2<- data.table::data.table(cbind.data.frame(summary(get(mod_flexsurv2)[[y]],newdata=newdat2_4s,t=newtime2,B=10000,type="hazard",tidy=T),arrival=rep(0,)))
}
          #"survival" for survival probabilities.
          #"cumhaz" for cumulative hazards.
          #"hazard" for hazards.
          #"rmst" for restricted mean survival.
          #"mean" for mean survival.
          #"median" for median survival (alternative to type="quantile" with quantiles=0.5).
          #"quantile" for quantiles of the survival time distribution.
          #"link" for the fitted value of the location parameter (i.e. the "linear predictor")
    
    #dataframe
    fitted_flexsurvreg2b_4s<-rbind.data.frame(fitted_flexsurvreg2b_4s,
          cbind.data.frame(dist= rep(dists_w_covs_4s[x,"formal"],),
                           hr= round(exp(coef(get(mod_flexsurv2)[[y]])["factor(tipo_de_programa_2)1"]),3), 
                           trans=rep(y,),
                           est_by_time2))
    #
    fit_flexsurvreg2b_4s<-rbind(fit_flexsurvreg2b_4s,
                      cbind(dist= dists_w_covs_4s[x,"formal"], 
                            trans=y, 
                            covs="w/ covs",
                           AIC= get(mod_flexsurv2)[[y]]$AIC,
                           Llik= get(mod_flexsurv2)[[y]]$loglik,
                           npars= get(mod_flexsurv2)[[y]]$npars,
                           pars= get(mod_flexsurv2)[[y]]$AIC/2 + get(mod_flexsurv2)[[y]]$loglik,
                           BIC= get(mod_flexsurv2)[[y]]$loglik+ log(get(mod_flexsurv2)[[y]]$N)* (get(mod_flexsurv2)[[y]]$AIC/2 + get(mod_flexsurv2)[[y]]$loglik)
  #the BIC may not be appropriate if none of the candidate models are considered to be close to the ‘true’ model.
                           )
                      )
    }
}

## #### Flexible Survival Model (w/ covs): Weibull (AFT); transition: 1
##  
## #### Flexible Survival Model (w/ covs): Weibull (PH); transition: 1
##  
## #### Flexible Survival Model (w/ covs): Gompertz; transition: 1
##  
## #### Flexible Survival Model (w/ covs): Log-logistic; transition: 1
##  
## #### Flexible Survival Model (w/ covs): Gamma; transition: 1
##  
## #### Flexible Survival Model (w/ covs): Generalized gamma; transition: 1
##  
## #### Flexible Survival Model (w/ covs): Lognormal; transition: 1
##  
## #### Flexible Survival Model (w/ covs): Exponential; transition: 1
##  
## #### Flexible Survival Model (w/ covs): Generalized F; transition: 1
##  
## #### Flexible Survival Model (w/ covs): Weibull (AFT); transition: 2
##  
## #### Flexible Survival Model (w/ covs): Weibull (PH); transition: 2
##  
## #### Flexible Survival Model (w/ covs): Gompertz; transition: 2
##  
## #### Flexible Survival Model (w/ covs): Log-logistic; transition: 2
##  
## #### Flexible Survival Model (w/ covs): Gamma; transition: 2
##  
## #### Flexible Survival Model (w/ covs): Generalized gamma; transition: 2
##  
## #### Flexible Survival Model (w/ covs): Lognormal; transition: 2
##  
## #### Flexible Survival Model (w/ covs): Exponential; transition: 2
##  
## #### Flexible Survival Model (w/ covs): Generalized F; transition: 2
##  
## #### Flexible Survival Model (w/ covs): Weibull (AFT); transition: 3
##  
## #### Flexible Survival Model (w/ covs): Weibull (PH); transition: 3
##  
## #### Flexible Survival Model (w/ covs): Gompertz; transition: 3
##  
## #### Flexible Survival Model (w/ covs): Log-logistic; transition: 3
##  
## #### Flexible Survival Model (w/ covs): Gamma; transition: 3
##  
## #### Flexible Survival Model (w/ covs): Generalized gamma; transition: 3
##  
## #### Flexible Survival Model (w/ covs): Lognormal; transition: 3
##  
## #### Flexible Survival Model (w/ covs): Exponential; transition: 3
##  
## #### Flexible Survival Model (w/ covs): Generalized F; transition: 3
##  
## #### Flexible Survival Model (w/ covs): Weibull (AFT); transition: 4
##  
## #### Flexible Survival Model (w/ covs): Weibull (PH); transition: 4
##  
## #### Flexible Survival Model (w/ covs): Gompertz; transition: 4
##  
## #### Flexible Survival Model (w/ covs): Log-logistic; transition: 4
##  
## #### Flexible Survival Model (w/ covs): Gamma; transition: 4
##  
## #### Flexible Survival Model (w/ covs): Generalized gamma; transition: 4
##  
## #### Flexible Survival Model (w/ covs): Lognormal; transition: 4
##  
## #### Flexible Survival Model (w/ covs): Exponential; transition: 4
##  
## #### Flexible Survival Model (w/ covs): Generalized F; transition: 4
##  
## #### Flexible Survival Model (w/ covs): Weibull (AFT); transition: 5
##  
## #### Flexible Survival Model (w/ covs): Weibull (PH); transition: 5
##  
## #### Flexible Survival Model (w/ covs): Gompertz; transition: 5
##  
## #### Flexible Survival Model (w/ covs): Log-logistic; transition: 5
##  
## #### Flexible Survival Model (w/ covs): Gamma; transition: 5
##  
## #### Flexible Survival Model (w/ covs): Generalized gamma; transition: 5
##  
## #### Flexible Survival Model (w/ covs): Lognormal; transition: 5
##  
## #### Flexible Survival Model (w/ covs): Exponential; transition: 5
##  
## #### Flexible Survival Model (w/ covs): Generalized F; transition: 5
##

#### Flexible Survival Model (w/ covs): Exponential; transition: 2
tipo_programa_label<- c(`0`="General Population",`1`="Women specific") 
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
if(no_mostrar==1){
  fitted_flexsurvreg2b_4s %>% 
    dplyr::group_by(trans) %>% 
    summarise(mean(ucl,na.rm=T))
  }
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#Calculate error
haz_int_only2_binned<-
  haz_int_only2_4s %>% 
  dplyr::mutate(time=ifelse(time<.5,.5,time)) %>% 
  dplyr::group_by(trans, tipo_programa) %>% 
  dplyr::mutate(x_binned = cut(time, 
          breaks= seq(0,max(haz_int_only2_4s$time),.5))) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(trans, tipo_programa, dist, x_binned) %>% 
  dplyr::summarise(mean_time=mean(time,na.rm=T),mean_est=mean(est,na.rm=T)) %>% 
  dplyr::ungroup() %>% 
  dplyr::rename("tipo_de_programa_2"="tipo_programa") %>% 
  dplyr::mutate(tipo_de_programa_2=factor(tipo_de_programa_2))%>% 
  dplyr::select(-dist)

## `summarise()` regrouping output by 'trans', 'tipo_programa', 'dist' (override with `.groups` argument)

fitted_flexsurvreg2b_4s_binned<-
fitted_flexsurvreg2b_4s[,c("dist","trans","time","est","factor(tipo_de_programa_2)")] %>% 
  dplyr::rename("tipo_de_programa_2"="factor(tipo_de_programa_2)") %>% 
  dplyr::filter(time<=max(haz_int_only2_4s$time)) %>%
  dplyr::mutate(time=ifelse(time<.5,.5,time)) %>% 
  dplyr::group_by(trans, dist, tipo_de_programa_2) %>%
  dplyr::mutate(x_binned = cut(time, 
          breaks= seq(0,max(haz_int_only2_4s$time),.5))) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(trans, dist, tipo_de_programa_2, x_binned) %>% 
  dplyr::summarise(mean_time=mean(time,na.rm=T),mean_est=mean(est,na.rm=T)) %>% 
  dplyr::ungroup()

## `summarise()` regrouping output by 'trans', 'dist', 'tipo_de_programa_2' (override with `.groups` argument)

fitted_flexsurvreg2b_4s_binned_mix<-
fitted_flexsurvreg2b_4s_binned %>% 
  dplyr::left_join(haz_int_only2_binned, by=c("trans","tipo_de_programa_2", "x_binned")) %>% 
  dplyr::select(-mean_time.y,-mean_time.x) %>% 
  dplyr::rename("mean_est_flexsurv"="mean_est.x","mean_est_cumhaz"="mean_est.y")

db_for_apply_rmse_d<-
  cbind.data.frame(tipo_prog=rep(c("0","1"),each=9,5),
      dist=rep(c("Weibull (AFT)", "Weibull (PH)", "Gompertz", "Log-logistic", "Gamma",
                "Generalized gamma", "Lognormal", "Exponential", "Generalized F"),2*5),
      trans=rep(c(1:5),each=9*2))
   
rmse_comp_fits_2b_4s<- data.frame()
for(i in 1:nrow(db_for_apply_rmse_d)){
rmse<- Metrics::rmse(subset(fitted_flexsurvreg2b_4s_binned_mix[complete.cases(fitted_flexsurvreg2b_4s_binned_mix),], 
                     tipo_de_programa_2==db_for_apply_rmse_d[i,"tipo_prog"] & 
                       dist==db_for_apply_rmse_d[i,"dist"] & 
                       trans==db_for_apply_rmse_d[i,"trans"])$mean_est_flexsurv,
              subset(fitted_flexsurvreg2b_4s_binned_mix[complete.cases(fitted_flexsurvreg2b_4s_binned_mix),], 
                     tipo_de_programa_2==db_for_apply_rmse_d[i,"tipo_prog"] & 
                       dist==db_for_apply_rmse_d[i,"dist"] & 
                       trans==db_for_apply_rmse_d[i,"trans"])$mean_est_cumhaz)

rmse_comp_fits_2b_4s<- rbind(rmse_comp_fits_2b_4s,cbind(dist=db_for_apply_rmse_d[i,"dist"],
                                                  tipo_prog=db_for_apply_rmse_d[i,"tipo_prog"],
                                                  trans=db_for_apply_rmse_d[i,"trans"],
                                                  rmse=rmse))
}

rmse_comp_fits_2b_4s<-
  rmse_comp_fits_2b_4s %>% 
      tidyr::pivot_wider(names_from = tipo_prog, values_from = rmse) %>% 
      dplyr::rename("gp"="0","we"="1") %>% 
      dplyr::mutate(gp=as.numeric(gp),we=as.numeric(we)) %>% 
      dplyr::mutate(mean_rmse=rowSums(.[3:4])/2) %>% 
      dplyr::arrange(trans,dist, mean_rmse)%>%
      dplyr::mutate(mean_rmse=round(mean_rmse,4))

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

#http://colorschemedesigner.com/csd-3.5/#
#https://www.r-graph-gallery.com/42-colors-names.html

plot_fitted_flexsurvreg2b_4s<-
fitted_flexsurvreg2b_4s[,c("dist","trans","time","est","lcl","ucl","factor(tipo_de_programa_2)")] %>% 
  rbind(cbind(dplyr::rename(haz_int_only2_4s, "factor(tipo_de_programa_2)"="tipo_programa"),
              "lcl"=haz_int_only2_4s$est,"ucl"=haz_int_only2_4s$est)) %>% 
  dplyr::rename("tipo_de_programa_2"="factor(tipo_de_programa_2)") %>% 
  dplyr::mutate(tipo_de_programa_2=factor(tipo_de_programa_2),
                dist=factor(dist, levels=c("Kernel density",dists_w_covs$formal)),
                trans=factor(trans)) %>% 
  #dplyr::group_by(tipo_de_programa_2,dist,time,trans) %>% summarise(n=n()) %>% 
ggplot()+
    geom_line(aes(time, est, color=dist),size=1)+
    geom_ribbon(aes(x = time, ymin = lcl, ymax = ucl, fill = dist), alpha = 0.2)+
    #geom_ribbon(aes(ymin=lcl, ymax=ucl), alpha=.5, linetype=0) +
    scale_color_manual(name="Distributions", values =                         c("red","gray20","darkolivegreen","#D3A347","#4F3C91","coral4","#085754","lightpink","turquoise","greenyellow")) + 
    scale_fill_manual(name="Distributions", values = c("red","gray20","darkolivegreen","#D3A347","#4F3C91","coral4","#085754","lightpink","turquoise","greenyellow")) +
    facet_wrap(tipo_de_programa_2~trans,labeller = labeller(trans = transition_label_4s, tipo_de_programa_2=tipo_programa_label),ncol=2, dir="v", scales="free_y")+
    sjPlot::theme_sjplot2()+
    #ylim(0,.3)+
    scale_x_continuous(breaks = seq(0, 15, 2))+
    theme(legend.position="bottom",
          strip.background = element_rect(fill = "white", colour = "white"))+
    labs(y="Hazard",x="Time (years)",caption="Note. Kernel density, stratified by type of program")

plot_fitted_flexsurvreg2b_4s

Figure 10f. Vissual Assessment of Hazards, Four-states Model (w/ covars)

## http://www.statistica.it/gianluca/teaching/r-hta-workshop/2019/Bullement.pdf
tiempo_despues_fits22<-Sys.time()

paste0("Time in process: ");tiempo_despues_fits22-tiempo_antes_fits22

## [1] "Time in process: "

## Time difference of 3.624504 mins

#13 minutos aprox. en DELL
options(warn=0)

if(no_mostrar==1){
jpeg("C:/Users/andre/Desktop/SUD_CL/eso13.jpg", height=17, width= 10, res= 96, units = "in")
plot_fitted_flexsurvreg2b_4s
dev.off()
}

Session Info

Sys.getenv("R_LIBS_USER")

## [1] "C:/Users/CISS Fondecyt/OneDrive/Documentos/R/win-library/4.0"

rstudioapi::getSourceEditorContext()

## Document Context: 
## - id:        'DF5EAB49'
## - path:      'G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/SUD_CL/Proyecto_carla3.Rmd'
## - contents:  <6270 rows>
## Document Selection:
## - [4320, 89] -- [4320, 89]: ''

#save.image("G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/mult_state_carla.RData")

tryCatch(
  save.image("G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/mult_state_carla.RData"),
  error = function(e) {
  save.image("C:/Users/andre/Desktop/SUD_CL/mult_state_carla.RData")
  })

tryCatch(
CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados%>%
  dplyr::arrange(hash_key, desc(fech_ing))%>% 
  #rio::export(file = "G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/mult_state_carla.dta")
  rio::export(file = "C:/Users/andre/Desktop/SUD_CL/mult_state_carla.dta"),
error = function(e) {
CONS_C1_df_dup_SEP_2020_women_miss_after_imp_conservados%>%
  dplyr::arrange(hash_key, desc(fech_ing))%>% 
  rio::export(file = "G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/mult_state_carla.dta")  
  })

sessionInfo()

## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Spanish_Chile.1252  LC_CTYPE=Spanish_Chile.1252   
## [3] LC_MONETARY=Spanish_Chile.1252 LC_NUMERIC=C                  
## [5] LC_TIME=Spanish_Chile.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] Epi_2.40                cmprsk_2.2-10           lubridate_1.7.9        
##  [4] compareGroups_4.4.5     gurobi_9.1-0            radiant.update_1.4.1   
##  [7] Metrics_0.1.4           muhaz_1.2.6.1           flexsurv_2.0           
## [10] DiagrammeR_1.0.6.1.9000 Amelia_1.7.6            Rcpp_1.0.5             
## [13] mstate_0.3.1            igraph_1.2.6            eha_2.8.1              
## [16] cobalt_4.2.3            sensitivityfull_1.5.6   sensitivity2x2xk_1.01  
## [19] MatchIt_3.0.2           tableone_0.12.0         stargazer_5.2.2        
## [22] reshape2_1.4.4          exactRankTests_0.8-31   gridExtra_2.3          
## [25] foreign_0.8-80          glpkAPI_1.3.2           designmatch_0.3.1      
## [28] Rglpk_0.6-4             slam_0.1-47             MASS_7.3-51.6          
## [31] survMisc_0.5.5          ggfortify_0.4.10        rateratio.test_1.0-2   
## [34] survminer_0.4.8         ggpubr_0.4.0            epiR_1.0-15            
## [37] forcats_0.5.0           purrr_0.3.4             readr_1.3.1            
## [40] tibble_3.0.3            tidyverse_1.3.0         treemapify_2.5.3       
## [43] ggiraph_0.7.0           chilemapas_0.2          sf_0.9-3               
## [46] finalfit_1.0.1          lsmeans_2.30-0          emmeans_1.4.8          
## [49] choroplethrAdmin1_1.1.1 choroplethrMaps_1.0.1   choroplethr_3.6.3      
## [52] acs_2.1.4               XML_3.99-0.3            RColorBrewer_1.1-2     
## [55] panelr_0.7.3            lme4_1.1-23             Matrix_1.2-18          
## [58] dplyr_1.0.1             data.table_1.13.0       codebook_0.9.2         
## [61] devtools_2.3.0          usethis_1.6.1           sqldf_0.4-11           
## [64] RSQLite_2.2.0           gsubfn_0.7              proto_1.0.0            
## [67] broom_0.7.0             zoo_1.8-8               altair_4.0.1           
## [70] rbokeh_0.5.1            janitor_2.0.1           plotly_4.9.2.1         
## [73] kableExtra_1.1.0        Hmisc_4.4-0             Formula_1.2-3          
## [76] survival_3.1-12         lattice_0.20-41         ggplot2_3.3.2          
## [79] stringr_1.4.0           stringi_1.4.6           tidyr_1.1.1            
## [82] knitr_1.29              matrixStats_0.56.0      boot_1.3-25            
## 
## loaded via a namespace (and not attached):
##   [1] class_7.3-17        ps_1.3.3            rprojroot_1.3-2    
##   [4] crayon_1.3.4        V8_3.1.0            nlme_3.1-148       
##   [7] backports_1.1.7     reprex_0.3.0        rlang_0.4.7        
##  [10] readxl_1.3.1        performance_0.4.8   nloptr_1.2.2.2     
##  [13] callr_3.4.3         flextable_0.5.10    rjson_0.2.20       
##  [16] ggmap_3.0.0         bit64_0.9-7         glue_1.4.1         
##  [19] sjPlot_2.8.4        parallel_4.0.2      processx_3.4.3     
##  [22] classInt_0.4-3      tcltk_4.0.2         haven_2.3.1        
##  [25] tidyselect_1.1.0    km.ci_0.5-2         rio_0.5.16         
##  [28] sjmisc_2.8.5        chron_2.3-55        xtable_1.8-4       
##  [31] magrittr_1.5        evaluate_0.14       gdtools_0.2.2      
##  [34] RgoogleMaps_1.4.5.3 cli_2.0.2           rstudioapi_0.11    
##  [37] sp_1.4-2            rpart_4.1-15        jtools_2.0.5       
##  [40] sjlabelled_1.1.6    RJSONIO_1.3-1.4     maps_3.3.0         
##  [43] gistr_0.5.0         xfun_0.16           parameters_0.8.2   
##  [46] pkgbuild_1.1.0      cluster_2.1.0       ggfittext_0.9.0    
##  [49] png_0.1-7           withr_2.2.0         bitops_1.0-6       
##  [52] plyr_1.8.6          cellranger_1.1.0    e1071_1.7-3        
##  [55] survey_4.0          coda_0.19-3         pillar_1.4.6       
##  [58] multcomp_1.4-13     fs_1.5.0            vctrs_0.3.2        
##  [61] ellipsis_0.3.1      generics_0.0.2      rgdal_1.5-8        
##  [64] tools_4.0.2         munsell_0.5.0       compiler_4.0.2     
##  [67] pkgload_1.1.0       abind_1.4-5         tigris_0.9.4       
##  [70] sessioninfo_1.1.1   visNetwork_2.0.9    jsonlite_1.7.0     
##  [73] WDI_2.6.0           scales_1.1.1        carData_3.0-4      
##  [76] estimability_1.3    lazyeval_0.2.2      car_3.0-8          
##  [79] latticeExtra_0.6-29 reticulate_1.16     effectsize_0.3.2   
##  [82] checkmate_2.0.0     rmarkdown_2.6       openxlsx_4.1.5     
##  [85] sandwich_2.5-1      statmod_1.4.34      webshot_0.5.2      
##  [88] pander_0.6.3        numDeriv_2016.8-1.1 yaml_2.2.1         
##  [91] systemfonts_0.2.3   htmltools_0.5.0     memoise_1.1.0      
##  [94] quadprog_1.5-8      viridisLite_0.3.0   jsonvalidate_1.1.0 
##  [97] digest_0.6.25       assertthat_0.2.1    rappdirs_0.3.1     
## [100] repr_1.1.0          bayestestR_0.7.2    BiasedUrn_1.07     
## [103] KMsurv_0.1-5        units_0.6-6         remotes_2.2.0      
## [106] blob_1.2.1          labeling_0.3        deSolve_1.28       
## [109] splines_4.0.2       hms_0.5.3           rmapshaper_0.4.4   
## [112] modelr_0.1.8        colorspace_1.4-1    base64enc_0.1-3    
## [115] nnet_7.3-14         mvtnorm_1.1-1       fansi_0.4.1        
## [118] truncnorm_1.0-8     R6_2.4.1            grid_4.0.2         
## [121] crul_0.9.0          lifecycle_0.2.0     acepack_1.4.1      
## [124] labelled_2.5.0      zip_2.1.1           writexl_1.3        
## [127] curl_4.3            geojsonlint_0.4.0   ggsignif_0.6.0     
## [130] pryr_0.1.4          minqa_1.2.4         testthat_2.3.2     
## [133] snakecase_0.11.0    desc_1.2.0          TH.data_1.0-10     
## [136] htmlwidgets_1.5.1   officer_0.3.13      crosstalk_1.1.0.1  
## [139] mgcv_1.8-31         rvest_0.3.6         insight_0.9.0      
## [142] htmlTable_2.0.1     codetools_0.2-16    prettyunits_1.1.1  
## [145] dbplyr_1.4.4        vegawidget_0.3.1    gtable_0.3.0       
## [148] DBI_1.1.0           etm_1.1             httr_1.4.2         
## [151] highr_0.8           KernSmooth_2.23-17  farver_2.0.3       
## [154] uuid_0.1-4          hexbin_1.28.1       mice_3.11.0        
## [157] xml2_1.3.2          ggeffects_0.15.1    bit_1.1-15.2       
## [160] sjstats_0.18.0      jpeg_0.1-8.1        pkgconfig_2.0.3    
## [163] maptools_1.0-1      rstatix_0.6.0       mitools_2.4        
## [166] HardyWeinberg_1.6.6 Rsolnp_1.16         httpcode_0.3.0

Readmission risk among women in women-only versus mixed gender treatment programs for substance use disorder in Chile (1st step)

22 mayo, 2021

Steps

Objectives

Database consolidation

Imputation

Age at Admission to Treatment (in groups)

Primary or main substance

Co-occurring SUD

Frequency of Use of the Primary Substance at Admission

Educational Attainment

Cause of Discharge

Biopsychosocial involvement

Tenure status of households

Number of children (max. Value) (Dichotomized)

Type of Program

Type of Plan

Sample Characteristics

Final Sample

Survival analysis

Incidence rate

Parametric regression of the risk of Readmission, Stratified by Treatment Outcome

Competing risks

Multi-state

Transition matrix

Consideration of the Appropriateness of the proportional hazards assumption

Decision whether to use Markov or Semi-Markov

Assessment of Fit of Transitions

4 states

Session Info