Background
The Health Improvement Network (“THIN”) is a database containing health information from millions of patients, provided by thousands of doctors throughout the UK, France, Spain, Belgium, Romania and Italy[1]. The data is used by several institutions for studies on health management, primary and secondary care, epidemiological research and drug approval, among other purposes. In June 2023, the Italian Privacy Authority penalized THIN’s Italian subsidiary for violating the General Data Protection Regulation (“GDPR”) due to the use of allegedly inadequate anonymization techniques[2]. The pecuniary fine applied by the Authority was minor (15,000 euros), but the precedent’s repercussions are significant not only for THIN, but for any player processing anonymized data in Italy.
Decision
Health data is classified as a “special category of personal data” by art. 9 of the GDPR, which dictates strict compliance rules. As a result, entities that process health data must either: (i) anonymize such data; or (ii) rely on one of the legal bases listed in art. 9(2) of the GDPR. THIN’s business model relies upon the first option: all data collected by THIN’s affiliated doctors is anonymized, thus avoiding the application of the GDPR altogether.
THIN’s anonymization process is robust and involves a series of layers[3]. First, the patient’s ID is replaced by a random GUID[4] code, to which a hash algorithm is then applied, transforming it into a 64-character alphanumeric code. Subsequently, the data is minimized (the patient’s birthdate is reduced to the year of birth, height and weight are grouped into broad intervals of centimeters and kilograms, etc.). Finally, the information is sent to an independent third-party company that attempts to re-identify it in order to test the efficacy of the anonymization process. Any data that shows a risk of re-identification is then discarded.
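For illustration only, the pipeline described in the decision can be sketched in code. The following is a minimal Python sketch, not THIN’s actual implementation: the field names and interval widths are hypothetical, and SHA-256 is assumed solely because its hexadecimal digest happens to be 64 alphanumeric characters, matching the code length described above.

```python
import hashlib
import uuid
from datetime import date


def pseudonymize_patient_id() -> str:
    """Replace the original patient ID with a random GUID, then hash it.

    SHA-256 is an assumption: its hex digest is exactly 64 alphanumeric
    characters, but the decision does not name the algorithm used.
    """
    random_guid = str(uuid.uuid4())  # random GUID, unrelated to the real patient ID
    return hashlib.sha256(random_guid.encode("utf-8")).hexdigest()  # 64-char code


def minimize_record(record: dict) -> dict:
    """Generalize quasi-identifiers: keep only the birth year and broad
    height/weight bands (the 10 cm / 10 kg bands are illustrative only)."""
    return {
        "patient_code": pseudonymize_patient_id(),
        "birth_year": record["birth_date"].year,             # full date -> year only
        "height_band_cm": (record["height_cm"] // 10) * 10,  # e.g. 178 cm -> 170
        "weight_band_kg": (record["weight_kg"] // 10) * 10,  # e.g. 74 kg  -> 70
        "diagnosis": record["diagnosis"],                     # clinical content kept
    }


# Hypothetical usage: one raw record in, one minimized record out. The final
# step described in the decision (an independent third party attempting
# re-identification and discarding risky records) is not modeled here.
raw = {
    "birth_date": date(1993, 5, 17),
    "height_cm": 178,
    "weight_kg": 74,
    "diagnosis": "viral infection",
}
print(minimize_record(raw))
```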
The Italian Privacy Authority concluded that said anonymization process is flawed, since THIN still individualizes each patient’s information around a single code. Hence, in the Authority’s view, there can be no true anonymization while a link remains between a given set of data and an individualized person, even if the controller cannot actually know who that person is.
Comment
In order to analyze the Authority’s decision, one should turn to the concept of anonymization as established by Recital 26 of the GDPR, which defines “anonymous information” as information “which does not relate to an identified or identifiable natural person” (emphasis added). In its reasoning, the Italian Authority essentially affirmed that it does not matter whether a given set of data is individualized around a personal ID or around an alphanumeric code, inasmuch as in both cases there is individualization and the data thus remains personal. By contrast, Recital 26 does not focus on whether there is individualization in a given case, but rather on whether an identifiable natural person is being individualized. In other words, individualization is not, in itself, incompatible with anonymization; it all boils down to whether the data controller can identify who is being individualized.
In the case at hand, the Authority did not thoroughly demonstrate how each set of data processed by THIN could be traced back to a single, identifiable natural person. Instead, the Authority simply stated that each set of data could be traced back to a single alphanumeric code. However, being able to trace a set of data back to a single code, but not to a single, identifiable natural person, is actually a classic example of anonymization. For instance, if THIN could hypothetically affirm that a male person who lives in Milan, was born in 1993 and weighs between 70 and 80 kilos was diagnosed with a viral infection on 10 January 2023, but could not affirm who that person is (no name, no social security number, no photo, etc.), then, for all practical purposes, that person is not identifiable and that data is presumably anonymous[5].
Moreover, Recital 26 makes it clear that “to determine whether a natural person is identifiable, (…) account should be taken of all objective factors, such as the costs of and the amount of time required for identification” (emphasis added). The Authority, however, failed to mention in its reasoning any objective factors that would allow THIN to re-identify the data being processed.
THIN has naturally appealed the decision, which has also drawn criticism from the Italian legal community[6], and the outcome of the case may yet shift. Still, for the time being, the fact remains that companies that process anonymized data will need to tread a very “THIN” line if they wish to remain GDPR compliant in Italy.
[1] More information available at: https://www.the-health-improvement-network.com/.
[2] Access the full decision here: https://www.garanteprivacy.it/home/docweb/-/docweb-display/docweb/9913795.
[3] Said anonymization process was explained by the Italian Privacy Authority itself in its decision.
[4] Abbreviation of “Globally Unique Identifier”.
[5] Indeed, art. 4(1) of the GDPR determines that “an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”
[6] See, e.g., Silvia Stefanelli, Thin, perché il Garante sbaglia e blocca la ricerca in Sanità, Agenda Digitale, August 2, 2023, available at: https://www.agendadigitale.eu/sanita/sanita-perche-su-thin-il-garante-ha-sbagliato-e-blocca-la-ricerca/.