Amazon announced a new service that allows healthcare professionals to de-identify medical images by automatically removing any protected health information (PHI) present in the images.
Pictures taken in medical settings often contain personally identifiable information (PII) stored as text within the image, such as the patient’s name, date of birth, and age. Because this information falls under the remit of HIPAA and is classed as PHI, anyone wanting to use these images for research must obtain the patient’s consent to access the image and its accompanying information. Alternatively, the information must be removed before researchers are given access to the image.
Before Amazon released its new service, removing PHI required each image to be checked and altered manually. This is a time-consuming, laborious process, and an expensive one for the organisation that needs to anonymise the data. Some research studies require very large numbers of images to be altered, and the cost and time involved may be prohibitive.
Amazon’s solution to this problem builds on its Rekognition machine-learning service, which Amazon offers so that users can easily ‘add image and video analysis to [your] applications’. Researchers have trained its deep-learning algorithms to detect and extract text from images. The extracted text is then passed to Amazon Comprehend Medical, which identifies any PHI, and that PHI can then be quickly redacted from the image. The system works on PNG, JPEG, and DICOM images.
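To make the two-stage pipeline more concrete, the sketch below shows one way the output of text detection could be linked back to image locations once PHI has been found. It assumes response shapes like those of Rekognition's `DetectText` (a `TextDetections` list with `Type`, `DetectedText` and a `Geometry.BoundingBox` given as fractions of the image size) and Comprehend Medical's `DetectPHI` (an `Entities` list with `BeginOffset`/`EndOffset` character offsets). The helper names `join_lines`, `to_pixels` and `phi_boxes` are hypothetical, and redacting whole lines that overlap a PHI entity is an illustrative choice, not necessarily the method Amazon uses.

```python
from typing import Any

# (left, top, right, bottom) in pixels
Box = tuple[int, int, int, int]
Span = tuple[int, int, dict[str, Any]]


def join_lines(text_detections: list[dict[str, Any]]) -> tuple[str, list[Span]]:
    """Concatenate LINE detections into one string, recording each line's character span.

    The joined string is what would be sent for PHI detection; the spans let us map
    PHI character offsets back to the detection (and bounding box) they came from.
    """
    parts: list[str] = []
    spans: list[Span] = []
    cursor = 0
    for det in text_detections:
        if det.get("Type") != "LINE":
            continue  # WORD items duplicate the LINE text; lines are enough here
        text = det["DetectedText"]
        spans.append((cursor, cursor + len(text), det))
        parts.append(text)
        cursor += len(text) + 1  # +1 for the space used to join lines
    return " ".join(parts), spans


def to_pixels(det: dict[str, Any], width: int, height: int) -> Box:
    """Convert a bounding box expressed as fractions of the image size to pixels."""
    bb = det["Geometry"]["BoundingBox"]
    left = round(bb["Left"] * width)
    top = round(bb["Top"] * height)
    right = round((bb["Left"] + bb["Width"]) * width)
    bottom = round((bb["Top"] + bb["Height"]) * height)
    return (left, top, right, bottom)


def phi_boxes(spans: list[Span], phi_entities: list[dict[str, Any]],
              width: int, height: int) -> list[Box]:
    """Return pixel boxes for every text line that overlaps a PHI entity."""
    boxes: list[Box] = []
    for start, end, det in spans:
        for ent in phi_entities:
            # Half-open intervals [start, end) and [BeginOffset, EndOffset) overlap?
            if ent["BeginOffset"] < end and start < ent["EndOffset"]:
                boxes.append(to_pixels(det, width, height))
                break  # one box per line is enough
    return boxes
```

In a real deployment, the detections might come from `boto3.client("rekognition").detect_text(Image={"Bytes": ...})["TextDetections"]` and the entities from `boto3.client("comprehendmedical").detect_phi(Text=text)["Entities"]`; the resulting boxes would then be filled in with an image library before the redacted image is saved.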
Once processing is complete, the service provides a score for each detected entity, indicating its level of confidence that the entity has been identified correctly. Users can rely on this score to check that information has been correctly identified, and can set the desired confidence threshold, from 0.00 to 1.00, themselves. Setting the threshold to 0.00 means that all text identified by the service will be redacted.
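The threshold behaviour described above can be expressed as a small selection rule. This is a minimal sketch under stated assumptions: each PHI entity carries a `Score` between 0 and 1 (as Comprehend Medical's `DetectPHI` entities do), and, following the article's description, a threshold of exactly 0.00 is treated as "redact every detected line of text". The function name `redaction_targets` is hypothetical.

```python
from typing import Any


def redaction_targets(line_texts: list[str],
                      phi_entities: list[dict[str, Any]],
                      threshold: float) -> list[str]:
    """Pick the text to redact for a user-chosen confidence threshold (0.00 to 1.00).

    Hypothetical policy mirroring the article: a threshold of 0.00 redacts all
    detected text; otherwise only PHI entities scoring at or above it are redacted.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.00 and 1.00")
    if threshold == 0.0:
        return list(line_texts)  # redact every line of detected text
    return [e["Text"] for e in phi_entities if e["Score"] >= threshold]
```

For example, with entities `{"Text": "JOHN DOE", "Score": 0.98}` and `{"Text": "03/04/1950", "Score": 0.62}`, a threshold of 0.9 selects only the name, while 0.5 selects both. Lowering the threshold trades more over-redaction for a smaller risk of leaving PHI in place.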
In a blog post, James Wiggins, a senior healthcare solutions architect at Amazon Web Services (AWS), says the system allows healthcare organisations to de-identify large numbers of images quickly and inexpensively. The tech giant says the system is versatile, and users can process thousands or even millions of images. Once an image has been processed and the location of its PHI identified, an AWS Lambda function can be associated with an Amazon S3 bucket to redact PHI automatically from any new images as they are uploaded.
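The upload-triggered workflow might look roughly like the Lambda handler below. It is a sketch, not AWS's implementation: it assumes the standard S3 event notification shape (`Records[].s3.bucket.name` and a URL-encoded `Records[].s3.object.key`), while `redact_image_object` and the `redacted/` output prefix are hypothetical placeholders for the download, detect, redact and upload steps.

```python
from typing import Any, Callable
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "redacted/"  # hypothetical prefix for de-identified copies


def objects_from_s3_event(event: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every object in an S3 notification event."""
    pairs: list[tuple[str, str]] = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded (e.g. spaces become '+'), so decode them.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def redact_image_object(bucket: str, key: str) -> str:
    """Hypothetical placeholder: fetch the image, detect text and PHI, redact, upload.

    Returns the location of the redacted copy. Not implemented in this sketch.
    """
    raise NotImplementedError("download, detect_text, detect_phi, redact and upload here")


def lambda_handler(event: dict[str, Any], context: Any,
                   redact: Callable[[str, str], str] = redact_image_object) -> dict[str, Any]:
    """Entry point invoked by S3 when new images are uploaded."""
    written: list[str] = []
    for bucket, key in objects_from_s3_event(event):
        if key.startswith(OUTPUT_PREFIX):
            continue  # skip our own output so the trigger does not re-fire on it
        written.append(redact(bucket, key))
    return {"redacted": written}
```

Skipping the output prefix matters if redacted copies are written back to the same bucket: otherwise each upload of a redacted image would trigger the function again. Writing to a separate bucket, or restricting the S3 trigger to an input prefix, achieves the same thing.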