What is metadata scrubbing?

Metadata scrubbing is one of the terms used to describe the accidental or malicious loss or overwriting of important data about a document, such as its author or the time it was created.

Having access to the original metadata is crucial when analysing potential fraud, as it significantly increases the accuracy of the results. For example, OpenAI uses C2PA metadata to recognise documents that have been created or modified by GenAI.

This precious metadata often contains evidence that photoshopping or other modification has taken place, as well as a record of where, when and by whom the document was generated, all of which is crucial for fraud detection.

Common metadata scrubbing situations

  • Embedding images - Images shared directly in WhatsApp, Slack, email messages and other tools are likely to lose their metadata.
  • Compression - While file data is often the primary target for compression, the process can lead to the document metadata becoming inaccessible.
  • Anti-virus checks - Anti-virus scanning itself seldom updates metadata; however, when operations such as quarantine or file repair are necessary, these can overwrite the original values.
  • Copy and paste - When creating claim investigation reports and documents, users sometimes paste files and images inside MS Word and PDF documents to make things easier for the reader. Unfortunately, the metadata is compromised in most cases.
  • Internal system changes and upgrades - Many insurers hold documents inside legacy and siloed systems and even simple upgrades can wipe important metadata.
  • Document transfer - Moving documents from one store to another, e.g. from a claims system to a data lake, can compromise metadata.
  • Format limitations - Applications and services that require certain file types may force an end-user to convert the file to a different type, losing original metadata.
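
To make the document transfer case concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular insurer's or Shift's systems: the file name and the helper `copy_and_compare` are hypothetical, and only the filesystem modification time is examined (embedded metadata such as EXIF or C2PA is out of scope). It compares a plain copy, which gives the new file a fresh timestamp, with a copy that attempts to carry the original timestamp across.

```python
import os
import shutil
import tempfile
from pathlib import Path


def copy_and_compare(preserve: bool) -> tuple[float, float]:
    """Copy a backdated file and return (original_mtime, copy_mtime)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "claim_photo.jpg"  # hypothetical file name
        src.write_bytes(b"example document bytes")

        # Backdate the source so the original timestamp is easy to recognise.
        backdated = 1_600_000_000  # seconds since the epoch (September 2020)
        os.utime(src, (backdated, backdated))

        dst = Path(tmp) / "transferred.jpg"
        if preserve:
            # copy2 copies the content and attempts to preserve timestamps.
            shutil.copy2(src, dst)
        else:
            # copy copies the content and permission bits only, so the
            # destination receives a new modification time.
            shutil.copy(src, dst)

        return src.stat().st_mtime, dst.stat().st_mtime


if __name__ == "__main__":
    for preserve in (False, True):
        original, copied = copy_and_compare(preserve)
        print(f"preserve={preserve}: original={original:.0f} copy={copied:.0f}")
```

In the plain copy the "last modified" value reflects the moment of transfer rather than the moment the document was last edited, which is exactly the kind of silent overwrite described above. Transfers to object stores or data lakes often behave like the plain copy unless the original values are captured explicitly.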

How can we protect metadata?

  • Train handlers to use attachments and, if they need to tell a story around attachments, to also attach the original documents.
  • Change processes to persist metadata if needed. Consider creating a document store or portal where documents can be uploaded and where code can be built to protect metadata.
  • Understand the potential risks of metadata scrubbing on any core system transformation project.
  • Work with IT to use anti-virus checkers, compression tools and other services at optimal points in the process.
  • Avoid pasting and annotating in separate tools. Shift summarisation automates case summaries without losing data.
  • Transfer methods can interfere with metadata. Shift can advise on the best approach to preserve important details.
  • Leverage cross checking. Shift has the capability to combine the entire pipeline of metadata available. Using dedicated scenarios, Shift combines this with structured claim data and external data sources in order to detect anomalies across sources.
  • Ensure software has options for controlling metadata. Shift Case Management allows insurers to configure how they want to protect metadata.
  • Secure uploads. The Shift document storage architecture preserves metadata on upload.
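
As an illustration of what "code built to protect metadata" on upload might look like, the sketch below records a small metadata snapshot at the moment a document is received, so later changes can be detected even if a downstream system overwrites the values. This is a hedged example only: the function names `snapshot_metadata` and `changed_fields` and the choice of fields are assumptions made for illustration, and it does not describe how Shift's document storage works.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def snapshot_metadata(path: Path) -> dict:
    """Record basic file metadata plus a content hash at upload time."""
    stat = path.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "original_name": path.name,
        "size_bytes": stat.st_size,
        "modified_utc": modified.isoformat(),
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }


def changed_fields(before: dict, after: dict) -> list[str]:
    """Return the names of snapshot fields that differ between two snapshots."""
    return sorted(key for key in before if before[key] != after.get(key))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "invoice.pdf"  # hypothetical uploaded document
        doc.write_bytes(b"example document bytes")
        os.utime(doc, (1_600_000_000, 1_600_000_000))
        at_upload = snapshot_metadata(doc)

        # Simulate a later system migration that resets the timestamp.
        os.utime(doc, (1_700_000_000, 1_700_000_000))
        after_migration = snapshot_metadata(doc)

        print(changed_fields(at_upload, after_migration))
```

Storing the snapshot alongside the document (for example as a JSON record) means the original values survive even when the file itself is later compressed, converted or moved. The content hash also makes it possible to tell a metadata-only change apart from a change to the document itself.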


About the Author

Haiyuan Shi
Shift Technology, Tech Lead Data Scientist