SISTRAT- Fondecyt 1191282
SUD treatment and ER admissions, hospitalizations, and death among adult patients in Chile
Welcome to the repository documenting the construction of the treatment information system (SISTRAT) datasets. Here you can find the processes and actions taken to standardize and prepare the data for analysis by the project's investigators.
SISTRAT Datasets
This page covers the following main topics:
Encryption of RUTs and Generation of HASHs
Data Preparation and Standardization of C1
Associations & Analytic Exercises
3a.1. Effect of residential versus ambulatory treatment for substance use disorders on readmission risk in a register-based national retrospective cohort- Main analyses
3a.1.1.a Step 1: Imputation, Matching & set the database
3a.1.1.b Stata, Step 1: Set the database, AJ estimates
3a.1.2.b Stata, Step 2: Compute Transition Probabilities
3a.1.3.b Stata, Step 3: Summarise estimates in tables and export
3a.1.4.b Stata, Step 4: Plot probabilities and differences
3a.1.5.b Stata, Step 5: CI 83% transitions 3 & 4
3a.1.6.b Stata, Step 6: Predicted survival curves for stratified and clustered cox model with time-varying coefficients
3a.1.7.a Step 5: Sensitivity analyses, Sankey, Transition Trees
3a.1.8.a Figures to paper
3a.2. Effect of residential versus ambulatory treatment for substance use disorders on readmission risk in a register-based national retrospective cohort- Supplemental analyses
3a.2.0.a Step 1: Explore relationships, Matching, set multistate framework, check PH
3a.2.1.b Stata, Step 1: Set the database, AJ estimates (complete-cases)
3a.2.2.b Stata, Step 2: Compute transition probabilities (complete-cases)
3a.2.3.b Stata, Step 3: Summarise estimates in tables and export (complete-cases)
3a.2.4.b Stata, Step 2b: Compute transition probabilities (Royston Parmar)
3a.2.5.b Stata, Step 3b: Summarise estimates in tables and export (Royston Parmar)
3a.2.1.a Step 1.25: Markovianity
3a.2.2.a Step 2: Cumulative Hazards, Landmark Aalen-Johansen Estimator
3a.2.3.a Step 3: AFT & hazards by transitions & transformation from AFT to HRs
3a.2.4.b Stata, Step 6: Frailty with Royston-Parmar & clustered by ID & match
3b.1. Treatment outcome and readmission risk among women in women-only versus mixed-gender drug treatment programs in Chile- Main
3b.2. Treatment outcome and readmission risk among women in women-only versus mixed-gender drug treatment programs in Chile- Supplemental
3c. Living with (consolidation)
Data Preparation and Standardization of TOP or Profile of Treatment Results
Chilean prosecutor’s office Data merge
Webinar “¿Qué sabemos de los programas de tratamiento de drogas en Chile?” (What do we know about Chilean substance use treatments?)
The main processes are summarized in the following figures.
Figure (flowchart): workflow for building the FONDECYT Analyst Dataset for query. Box labels, grouped by topic:

Specific Goal N° 1
Describe the incidence rate of readmissions by health conditions among everyone admitted to a public treatment during the study period, comparing these rates with those of the general population with similar demographic characteristics.

Phases
- Phase 1 = Entries with unique IDs
- Phase 2 = Generate entries of unique events
- Phase 3 = Data cleaning and generation of unique treatments

Identification of users
- Dataset with duplicated data
- HASH keys (masked IDs) with more than one SENDA ID?
- Analyse the origin of discrepancies: does it come from an encryption error, or from an error in SENDA's dataset?
- Contact the developer of the encryption
- IT professional: generate modifications to the encrypter
- Changes in the application for the retrieval of IDs from DEIS datasets
- Ask SENDA's professional; send an e-mail with discrepancies
- SENDA's professional: original ID in the original dataset
- Cases with row numbers and user identifiers from the dataset processed by FONDECYT professionals will be contrasted with the original ID
- Validations of entries in previous datasets
- Protocols, algorithms & institutional procedures of case examination
- Institutional validations by SENDA's professional
- Does the discrepancy affect the identification of unique users, treatments and state of treatments?
- Establish the criteria that identify each user with more confidence
- Approximate them until reaching criteria that identify each user effectively
- See the origin and ask third parties, e.g., through probabilistic matching

Identification of events and treatments
- Identify duplicated treatments
- Duplicated/overlapped entries
- Distinguish entries by unique events within treatments
- Send doubts to SENDA's professional
- Keep corrected entries
- Discard
- Add to a dataset
- Up to this point, we must have the events within each treatment differentiated and without duplicated events
- Collapse events within treatments into individual treatments

Normalization of the dataset & cleansing of relevant variables
- Logical & probabilistic imputation
- Actions (e.g.): define variables; standardize dates; standardize programs & plans; correct ages; gender & sex related to plans; normalize days of treatment

Must consider that:
1. Users can have more than one admission and treatment, but some of them can be duplicated due to insufficient or wrong information that would be completed in a later entry.
2. Discarded information should be available in a separate dataset, to query in case we need to impute values of other variables.
3. The latest registry of admission must be considered (understood as the registry that contains a date of discharge, comes from a recent yearly dataset, or maybe the entry in this yearly dataset that comes last, in equal conditions).

Invariant to user
- HASH key (hash_key)
- Sex (sexo_2)
- Age (edad)
- Nationality (nacionalidad)

Invariant to treatments
- Center ID (id_centro)
- Motive of admission (origen_de_ingreso)
- Date of admission (fech_ing)

Variant
- Treatment days (dias_trat)
- Date of discharge (fech_egres)
- Educational attainment (educacion)
- No. of children

Variables
- Outcomes: readmission to treatments
- Exposure: treatment outcome (administrative discharge, early or late drop-out, referral); identify referrals that are part of the same treatment
- Effect modifiers: sex; age; substance of admission (e.g., polydrug user); type of treatment plan or program
- Covariates: marital status; educational attainment; occupational status; age of onset of drug use; frequency of consumption of the main substance; motive of admission to treatment; psychiatric comorbidity; region; type of treatment
Diagram
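The note in the figure above on keeping the latest registry of admission can be illustrated with a short Python sketch. It is not the project's code: the field names `hash_key`, `fech_ing` and `fech_egres` come from the figure, while `dataset_year`, `row` and the strict priority order (has a date of discharge, then more recent yearly dataset, then later row) are assumptions made only for illustration. Discarded entries are returned separately, in line with the note that discarded information should remain available for queries.

```python
from datetime import date
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    hash_key: str                # masked user identifier
    fech_ing: date               # date of admission
    fech_egres: Optional[date]   # date of discharge (None if missing)
    dataset_year: int            # hypothetical: year of the source yearly dataset
    row: int                     # hypothetical: row position within that dataset


def latest_registry(entries):
    """Keep one entry per (hash_key, fech_ing); return (kept, discarded)."""
    best = {}        # key -> (rank, entry) currently preferred
    discarded = []   # entries set aside, kept for later queries
    for e in entries:
        key = (e.hash_key, e.fech_ing)
        # Assumed priority: discharge date present > recent dataset > later row.
        rank = (e.fech_egres is not None, e.dataset_year, e.row)
        if key not in best:
            best[key] = (rank, e)
        elif rank > best[key][0]:
            discarded.append(best[key][1])
            best[key] = (rank, e)
        else:
            discarded.append(e)
    kept = [e for _, e in best.values()]
    return kept, discarded


entries = [
    Entry("a", date(2015, 1, 1), None, 2016, 5),
    Entry("a", date(2015, 1, 1), date(2015, 6, 1), 2015, 3),
    Entry("b", date(2014, 2, 2), None, 2014, 1),
]
kept, discarded = latest_registry(entries)
```

In this toy input, the two entries of user "a" share the same admission date; the one with a date of discharge is kept and the other goes to the discarded list.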
Figure (flow diagram): deduplication of SUD treatment entries. Text recovered from the figure:

Stages
- 1st Part of Deduplication: Exploratory Approach
- 2nd Part of Deduplication: Definition of Individual Treatments and Related Events
- 3rd Part of Deduplication: Standardization of Variables and Exploration of Spaces Between Treatments
- 4th Part of Deduplication: Collapse of Continuous Entries of Referrals into Treatments

Main boxes
- SUD treatments from different yearly datasets
- Discard cases that share the same values in 103 variables
- SUD treatments once duplicated variables were discarded
- SUD treatments w/ different values in 13 variables
- Discard cases that share the same values in 13 variables
- Discard case that was in a type of plan under probation/parole
- Invalid or missing ages were filled w/ information of TOP datasets
- Defined unique dates of birth for users that had more than one
- Standardized & normalized variables relevant for the study
- Unique combination of HASH & Date of Admission
- Entries w/ same HASH Key & Date of Admission
- Discarded 417 entries of 381 distinct HASHs & Dates of Admission
- Kept most recent cases, except for ~3 cases
- After 1st discard, or kept earliest treatments in overlapped cases
- Overlaps in treatment ranges & same HASH Key
- Different admissions of each user w/o overlaps (n= 117,388)
- Overlaps in treatment ranges & same HASH Key
- Changed the date of discharge of negative days of treatment
- Cases w/ different values in normalized & standardized variables
- Cases w/ different values in 17 normalized & standardized variables
- After application of criteria provided by SENDA professional
- Data editing/wrangling
- Normalization of more than one user-invariant value in users
- Different admissions of each user w/o overlaps and valid cases (n= 117,212)
- Normalization of more than one user-invariant value in users (more complex)
- Correction of ties in values of variables of rule-based replacements
- Standardization of variables to provide for internal studies of the project
- Delete entries with >1095 days of treatment
- Database w/o overlaps or intermediate events marked by referrals (n= 109,756)
- <45 days of difference w/ a posterior entry & referral as a cause of discharge
- Discard cases with >1095 days of treatment (& no intermediate treatments)

Annotations
- 63,206 cases are entries that repeat the same information in two rows
- 7,305 cases are entries that repeat the same information in three rows
- 1,116 cases are entries that repeat the same information in four rows
- 175 cases are entries that repeat the same information in five rows
- 48 cases are entries that repeat the same information in six rows
- 7 cases are entries that repeat the same information in seven rows

Age, SENDA ID, Date of Birth, Type of Plan, Age of Onset of Drug Use, Pregnancy Status, Days of Treatment, Main Substance of Consumption, Other Substances (1, 2 and 3), Starting Substance, Marital Status, Occupational Status, Occupational Category, Motive of Admission to Treatment, Educational Attainment, Route of Administration of Main Substance, Frequency of Consumption of the Main Substance.

The rows that corresponded to particular cases:
- 4118 and 4842, discarded the first
- 38147 and 38755, discarded the first
- 9875 and 7161, discarded the last

- Imputed treatment days, and replaced the date of discharge
- Kept the earliest treatment
- Discarded the earliest treatment
- Subtracted days from the date of discharge of the last treatment

Most of these cases corresponded to overlaps:
- With more than two cases involved, or with missing values
- With imputed treatment days
- Without center ID or name of the center

HASH Key, Masked Identifier (RUT); Date of Admission to Treatment; Type of Center; Type of Program; Type of Plan; Program Financed by SENDA; Main Substance of Consumption; Other Substances (1); Other Substances (2); Other Substances (3); Frequency of Consumption of the Main Substance; Starting Substance; Age of Onset of Drug Use.

- 14,926 cases are entries that repeat the same information in two rows
- 147 cases are entries that repeat the same information in three rows

- Masked Identifier
- Date of Admission to Treatment
- Center ID
- Primary or Main Substance of Consumption at Admission
- Other Substances (1, 2 & 3)
- Starting Substance
- Age of Onset of Drug Use
- Marital Status
- Occupational Status
- Occupational Category
- Age in groups
- Motive of Admission to Treatment
- Educational Attainment
- Route of Administration of the Primary or Main Substance
- Frequency of Consumption of the Primary or Main Substance

- Recoded route of administration depending on the primary substance
- Replaced first DSM-IV and ICD-10 diagnostics with the second and third if empty

- Sex
- Nationality
- Ethnicity
- Starting Substance

- Age of Onset of Drug Use
- Age of Onset of Drug Use Primary Substance

- Sex
- Age of Onset of Drug Use
- Age of Onset of Drug Use Primary Substance

- Geographical (Communes & Regions)
- Related to drug use
- Related to social support and socioeconomic variables
- Related to dependent children

- Cases that started before 2010
- Cases that rounded the yearly change of the databases

- Entries with <45 days of difference w/ a posterior entry & referral as a cause of discharge (n= 3,067)
- Involved other entries that were surrounding continuous entries (n= 12,945)
- Kept treatments that were in the middle of a trajectory of a user (n= 71; users= 71)

Counts shown in the diagram (step labels in parentheses)
n= 163,146; n= 37,496 (1.1); n= 7,561 (1.3); n= 125,650; n≈ 215 (1.4 & 1.5); n≈ 5,222 (1.7); n= 797 (2.2); n≈ 1,448 (2.3); n= 117,619; n= 1; n≈ 8,702 (1.6); n≈ 11 (1.2); n= 118,036; n= 118,089; n= 118,088; n= 52 (2.1); n≈ 11 (2.4); n= 117,388; n≈ 29,257 (2.5); n≈ 31,384 (2.6); n≈ 44,550 (3.01 & 3.02); n≈ 2,123 (3.03); n= 541 (3.04); n= 6,809; n= 647
STROBE
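Two rules named in the diagram, collapsing entries that are followed within 45 days by a posterior entry after a referral, and discarding treatments longer than 1095 days, can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's procedure: the `referral` flag, the function names, and the order in which the two rules are applied are hypothetical, and the sketch assumes overlaps between entries of the same user were already resolved.

```python
from datetime import date
from typing import NamedTuple

MAX_GAP_DAYS = 45          # "<45 days of difference w/ a posterior entry"
MAX_TREATMENT_DAYS = 1095  # ">1095 days of treatment"


class Entry(NamedTuple):
    hash_key: str      # masked user identifier
    fech_ing: date     # date of admission
    fech_egres: date   # date of discharge
    referral: bool     # hypothetical flag: cause of discharge was a referral


def collapse_referrals(entries):
    """Merge consecutive entries of a user linked by a referral and a short gap."""
    treatments = []
    for e in sorted(entries, key=lambda x: (x.hash_key, x.fech_ing)):
        prev = treatments[-1] if treatments else None
        linked = (
            prev is not None
            and prev.hash_key == e.hash_key
            and prev.referral
            and (e.fech_ing - prev.fech_egres).days < MAX_GAP_DAYS
        )
        if linked:
            # Extend the previous treatment to the discharge of the later entry.
            treatments[-1] = Entry(prev.hash_key, prev.fech_ing, e.fech_egres, e.referral)
        else:
            treatments.append(e)
    return treatments


def drop_long_treatments(treatments):
    """Discard treatments lasting more than MAX_TREATMENT_DAYS."""
    return [t for t in treatments
            if (t.fech_egres - t.fech_ing).days <= MAX_TREATMENT_DAYS]


entries = [
    Entry("a", date(2015, 1, 1), date(2015, 3, 1), True),
    Entry("a", date(2015, 3, 20), date(2015, 8, 1), False),
    Entry("a", date(2016, 1, 1), date(2016, 2, 1), False),
    Entry("b", date(2010, 1, 1), date(2014, 1, 1), False),
]
collapsed = collapse_referrals(entries)
valid = drop_long_treatments(collapsed)
```

Here the first two entries of user "a" become one treatment (19-day gap after a referral), the third stays separate, and the entry of user "b" is dropped because it spans more than 1095 days.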