Expanded "De-identified patient data" section with more information on methods and cases
Revision as of 05:34, 28 April 2015
== De-identified patient data ==
De-identified patient data is patient data that has been scrubbed of important identifiers such as birth date, gender, address, and age. De-identified patient data is often used for research. The [[Health Insurance Portability and Accountability Act (HIPAA)]] Privacy Rule allows the use or disclosure of such data without restriction and without individuals’ consent. The rule acknowledges the risk of [[Data re-identification]] and is meant to balance this risk against the utility of sharing data for research. The Privacy Rule provides two methods for satisfying the HIPAA de-identification standard: the “Safe Harbor” method, in which 18 specified types of identifiers are removed, and the “Expert Determination” method. The latter requires mitigation strategies and a statistical determination that the risk of identification is very small, but not zero.<ref name="HHS">U.S. Department of Health and Human Services, Office for Civil Rights. Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. [http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html]</ref> The Safe Harbor method is far more commonly used.<ref name="Malin2012">Malin, B., Benitez, K., & Masys, D. (2011). Never too old for anonymity: a statistical standard for demographic data sharing via the HIPAA Privacy Rule. Journal of the American Medical Informatics Association, 18(1), 3–10. doi:10.1136/jamia.2010.004622 [http://jamia.oxfordjournals.org/content/18/1/3]</ref>
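As a rough illustration of the Safe Harbor approach, the Python sketch below applies a few of its rules to a single record: it drops a handful of direct identifiers, keeps only the first three digits of the ZIP code (or "000" for sparsely populated prefixes), keeps only the year of the birth date, and reports ages over 89 as "90+". The field names and the `RESTRICTED_ZIP3` set are hypothetical placeholders, and the sketch covers only part of the 18 identifier types, so it is not a complete Safe Harbor implementation.

```python
from datetime import date

# Hypothetical field names for a few direct identifiers. A real Safe Harbor
# implementation must cover all 18 identifier types, not just these.
DIRECT_IDENTIFIERS = {"name", "street_address", "phone", "email", "ssn", "mrn"}

# Placeholder values: 3-digit ZIP prefixes whose combined area has 20,000 or
# fewer people. The real list must be derived from current Census data.
RESTRICTED_ZIP3 = {"036", "059"}


def safe_harbor_sketch(record: dict, today: date) -> dict:
    """Return a copy of `record` with a subset of Safe Harbor rules applied."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # ZIP codes: keep the first three digits, or "000" for restricted prefixes.
    if "zip" in out:
        zip3 = str(out["zip"])[:3]
        out["zip"] = "000" if zip3 in RESTRICTED_ZIP3 else zip3

    # Dates: keep only the year. Ages over 89 collapse into a single "90+"
    # category, and the birth year is dropped because it would reveal the age.
    if "birth_date" in out:
        bd = out.pop("birth_date")
        age = today.year - bd.year - ((today.month, today.day) < (bd.month, bd.day))
        if age > 89:
            out["age"] = "90+"
        else:
            out["birth_year"] = bd.year
            out["age"] = age
    return out


record = {"name": "Jane Doe", "zip": "02139", "birth_date": date(1930, 5, 1), "dx": "E11.9"}
print(safe_harbor_sketch(record, today=date(2015, 4, 28)))
# {'zip': '021', 'dx': 'E11.9', 'birth_year': 1930, 'age': 84}
```

A record with a 1920 birth date evaluated on the same day would instead be reduced to an `age` of "90+" with no birth year.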
Under the Expert Determination method, an analysis is conducted to assess three factors that may contribute to identification risk:
*Replicability: the degree to which the data will serve as a unique marker that is stable over time.
*Data source availability: other sources that can be used to cross-reference patient records (e.g., public records).
*Distinguishability: identifiers that allow records to be uniquely identified, alone or in combination (such as birth date, ZIP code, and gender).
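To make distinguishability and data source availability concrete, the following Python sketch measures how many records are unique on a chosen set of quasi-identifiers, then links those records to a second, identified source. The field names, the toy records, and the "voter list" are hypothetical; the sketch only illustrates the two risk factors and is not a substitute for a formal expert determination.

```python
from collections import Counter

# Hypothetical quasi-identifiers (the combination cited for distinguishability).
QI = ("birth_date", "zip", "gender")


def key(record, quasi_ids=QI):
    return tuple(record[q] for q in quasi_ids)


def uniqueness(records, quasi_ids=QI):
    """Distinguishability: fraction of records whose quasi-identifier
    combination appears exactly once in the dataset."""
    counts = Counter(key(r, quasi_ids) for r in records)
    return sum(1 for r in records if counts[key(r, quasi_ids)] == 1) / len(records)


def link(deidentified, public, quasi_ids=QI):
    """Data source availability: join de-identified records to an identified
    public source. Only combinations unique in BOTH sources are returned,
    because only those point to a single named person."""
    de_counts = Counter(key(r, quasi_ids) for r in deidentified)
    pub_counts = Counter(key(p, quasi_ids) for p in public)
    names = {key(p, quasi_ids): p["name"] for p in public}
    return [
        (names[key(r, quasi_ids)], r)
        for r in deidentified
        if de_counts[key(r, quasi_ids)] == 1 and pub_counts[key(r, quasi_ids)] == 1
    ]


# Toy clinical extract: names removed, but quasi-identifiers kept.
clinical = [
    {"birth_date": "1950-03-14", "zip": "02138", "gender": "F", "dx": "hypertension"},
    {"birth_date": "1962-11-02", "zip": "02139", "gender": "M", "dx": "asthma"},
    {"birth_date": "1962-11-02", "zip": "02139", "gender": "M", "dx": "diabetes"},
]
# Hypothetical identified public list, e.g. a voter roll.
voters = [
    {"name": "A. Smith", "birth_date": "1950-03-14", "zip": "02138", "gender": "F"},
    {"name": "B. Jones", "birth_date": "1962-11-02", "zip": "02139", "gender": "M"},
]

print(uniqueness(clinical))                          # 1 of 3 records is unique
print([name for name, _ in link(clinical, voters)])  # ['A. Smith']
```

The two male records share the same combination, so neither can be tied to one voter; the unique female record is re-identified even though it carries no name.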
Information is de-identified when it is not possible to “reasonably ascertain” the identity of a person from that data.
Apart from the removal of Safe Harbor identifiers, de-identification methods must be adapted to the context. Methods may include '''suppression''' of identifiers, in whole or in part; '''generalization''', which reduces the specificity of identifiers (e.g., truncating ZIP codes); and '''perturbation''', which introduces noise into the data.<ref name="HHS" /> Efforts are also being made to automate the anonymization of health information by developing de-identification models that can successfully remove personal health information from clinical documents.<ref>Szarvas, G., Farkas, R., & Busa-Fekete, R. (2007). State-of-the-art anonymization of medical records using an iterative machine learning framework. Journal of the American Medical Informatics Association, 14(5), 574–580.</ref><ref>Friedlin, F. J., & McDonald, C. J. (2008). A software tool for removing patient identifying information from clinical documents. Journal of the American Medical Informatics Association, 15(5), 601–610. PMCID: PMC2528047</ref>

One popular concept is '''k-anonymity''', under which each record in a de-identified dataset is indistinguishable, on its quasi-identifiers, from at least ''k'' − 1 other records. The data are transformed until this criterion is met, while preserving as much of the dataset's information content as possible.<ref name="Malin2012" /><ref name="ElEmam2009">El Emam, K., Dankar, F. K., Issa, R., Jonker, E., Amyot, D., Cogo, E., … Bottomley, J. (2009). A globally optimal k-anonymity method for the de-identification of health data. Journal of the American Medical Informatics Association, 16(5), 670–682. doi:10.1197/jamia.M3144 [http://jamia.oxfordjournals.org/content/16/5/670]</ref>

Another example of risk mitigation involved the Heritage Health Prize dataset. These insurance claims data were released to researchers competing to predict future hospitalizations. An algorithm was developed to reduce identification risk; it included the use of pseudonyms for providers and patients, suppression of records deemed high risk, and generalization of certain claims. The competition also required participants to agree to an access restriction policy.<ref>El Emam, K., Arbuckle, L., Koru, G., Eze, B., Gaudette, L., Neri, E., … Gluck, J. (2012). De-identification methods for open health data: the case of the Heritage Health Prize claims dataset. Journal of Medical Internet Research, 14(1), e33. doi:10.2196/jmir.2001 [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3374547/]</ref>
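The Python sketch below shows how generalization, suppression, and pseudonyms can combine to reach k-anonymity. It is a minimal sketch under stated assumptions, not the globally optimal method of El Emam et al. or the algorithm used for the Heritage Health Prize: the column names, the single fixed generalization level (10-year age bands, 3-digit ZIP prefixes), and the HMAC-SHA256 pseudonym are all illustrative choices.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical quasi-identifier columns.
QUASI_IDS = ("age", "zip", "gender")


def generalize(record):
    """Generalization: 10-year age bands and 3-digit ZIP prefixes."""
    out = dict(record)
    low = (record["age"] // 10) * 10
    out["age"] = f"{low}-{low + 9}"
    out["zip"] = str(record["zip"])[:3] + "**"
    return out


def class_sizes(records, quasi_ids=QUASI_IDS):
    """Size of each equivalence class (records sharing all quasi-identifier values)."""
    return Counter(tuple(r[q] for q in quasi_ids) for r in records)


def is_k_anonymous(records, k, quasi_ids=QUASI_IDS):
    return all(n >= k for n in class_sizes(records, quasi_ids).values())


def suppress_small_classes(records, k, quasi_ids=QUASI_IDS):
    """Suppression: drop records whose equivalence class has fewer than k members."""
    sizes = class_sizes(records, quasi_ids)
    return [r for r in records if sizes[tuple(r[q] for q in quasi_ids)] >= k]


def pseudonym(patient_id, key):
    """Keyed hash (HMAC-SHA256): the same ID always maps to the same pseudonym,
    but the mapping cannot be recomputed without the secret key."""
    return hmac.new(key, str(patient_id).encode(), hashlib.sha256).hexdigest()[:16]


def anonymize(records, k, key):
    released = suppress_small_classes([generalize(r) for r in records], k)
    for r in released:
        r["patient_id"] = pseudonym(r["patient_id"], key)
    return released


patients = [
    {"patient_id": 101, "age": 34, "zip": "02139", "gender": "F", "dx": "asthma"},
    {"patient_id": 102, "age": 37, "zip": "02141", "gender": "F", "dx": "diabetes"},
    {"patient_id": 103, "age": 31, "zip": "02134", "gender": "F", "dx": "migraine"},
    {"patient_id": 104, "age": 58, "zip": "02139", "gender": "M", "dx": "gout"},
]
released = anonymize(patients, k=2, key=b"replace-with-a-secret-key")
print(len(released), is_k_anonymous(released, k=2))  # 3 True
```

Here the 58-year-old male patient is the only member of his generalized class, so with ''k'' = 2 his record is suppressed, and the remaining three records share the class (30-39, 021**, F). Practical tools search across many generalization levels to limit information loss; this sketch fixes one level for brevity.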
The definition of irreversible de-identification of data is context driven. The capacity to re-identify de-identified data may depend critically on particular resources (intellectual resources, information technology, and access to multiple data sets).<ref>Australian Government, Office of the Privacy Commissioner.</ref>

== References ==
<references/>

== See also ==
El Emam, Khaled. Guide to the De-Identification of Personal Health Information. CRC Press, 2013. [https://books.google.com/books?id=Bc8nAAAAQBAJ]

Submitted by Jacob Schwartzman

[[Category:BMI512-SPRING-15]]