Summary

We conducted a direct deterministic linkage by encrypting the patient’s ID, for which we commissioned a third party to develop an offline web application. This program normalizes each RUN (the unique national identification number assigned to Chileans and to foreigners residing in Chile) to a standardized format and then transforms it into a hash code with the MD5 algorithm. The hash is generated together with a secret key (“salt key”) known only to the developer and the counterparts at SENDA and DEIS, in order to protect the user’s identity (see Bioethical considerations).

Preliminary setup for encryption

At each center, it is necessary to define the processes involved and the actors who take part in them, which means identifying roles (informatics staff, users, permissions). The encryption also requires defining the scope in terms of resources (characteristics of the datasets, character encoding formats), the ethical considerations regarding the personal information they contain, and the characteristics of the computers on which the program will be executed. It is also necessary to define how the dataset will be obtained (yearly, by the indices measured, etc.), the time range covered, the file format (.txt, .csv), and how the fields are separated.

Once these concerns are resolved, we recommend that analysts explore how RUNs are formatted and distributed, in order to adjust the program if required or to consider transforming RUNs under different assumptions. This step is very important because a deterministic linkage on hashed values cannot rely on RUNs being written similarly: two records that correspond to the same patient will link only if their RUNs normalize to exactly the same string. If possible, a data dictionary or a sample row with brief information about the characteristics of the dataset can be created.
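As an illustration of this exploratory step, the following minimal Python sketch tabulates the formats in which RUNs appear in a dataset. It is not part of the commissioned program; the function names `format_pattern` and `profile_runs` and the sample values are hypothetical and were made up for this example.

```python
from collections import Counter


def format_pattern(run: str) -> str:
    """Map each digit to '9' and each letter to 'A'; keep any other character."""
    out = []
    for ch in run:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile_runs(runs):
    """Count how many raw RUN values follow each format pattern."""
    return Counter(format_pattern(r) for r in runs)


# Made-up values for illustration only; these are not real identifiers.
sample = ["12.345.678-5", "12345678-5", "123456785", "9.876.543-k", " 12345678-5"]

for pattern, n in profile_runs(sample).most_common():
    # repr() makes leading or trailing spaces visible
    print(repr(pattern), n)
```

Each distinct pattern in the output (dots, hyphen, letter in the last position, stray spaces) points to a variant that the normalization has to handle before hashing.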

Program Structure and RUN normalization

The program hashes each RUN with the MD5 algorithm after concatenating a secret key as a suffix of the RUN, which makes it less likely that the original value can be recovered. We recommend a key of at least 30 alphanumeric characters.
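The hashing step described above can be sketched in Python as follows. This is a minimal sketch and not the code of the commissioned program: the function name `hash_run`, the placeholder key, and the choice of UTF-8 encoding are assumptions, and the input is assumed to be a RUN that has already been normalized.

```python
import hashlib


def hash_run(normalized_run: str, secret_key: str) -> str:
    """Return the MD5 hex digest of the RUN with the secret key appended as a suffix."""
    # Follow the recommendation of at least 30 alphanumeric characters
    if len(secret_key) < 30 or not secret_key.isalnum():
        raise ValueError("secret key should have at least 30 alphanumeric characters")
    salted = normalized_run + secret_key
    # Assumption: text is encoded as UTF-8; every site must use the same encoding
    return hashlib.md5(salted.encode("utf-8")).hexdigest()


# Placeholder key for illustration only; a real key must be kept secret.
EXAMPLE_KEY = "EXAMPLEKEY0123456789abcdefghij12"

# Made-up RUN value, assumed to be already normalized
digest = hash_run("123456785", EXAMPLE_KEY)
print(len(digest))  # an MD5 hex digest always has 32 characters
```

Because the same normalized RUN and the same key always yield the same digest, records from different datasets can be linked on the digest without exchanging the RUN itself, provided every site uses the same key and the same character encoding.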

The process of normalizing RUNs to standardize their format is explained in the following diagram:

knitr::include_graphics(paste0(path,"/Figures/diagram.svg"))