

Health care fraud is rampant, and health insurance companies face a staggering number of fraud, waste, and abuse cases. It is estimated that these cases cost insurers and payers more than $600 billion a year globally. Technology offers a way to simplify the process: health insurance fraud detection has come a long way thanks to innovations in artificial intelligence and analytics. However, to make those tools work, your organization must feed them clean data. Even the most advanced anomaly detection suites can't help you prevent fraud if you can't give them high-quality data to work with. Let's explore why getting clean data is difficult, and what it takes to prepare data properly.

The State of Health Insurance Data

No matter how your company is detecting fraud, one principle holds true: The results you produce can only be as good as the data you call upon. In other words, if you put poor-quality data into your fraud model, you'll end up with poor results. While analytics and data science are powerful tools for detecting fraudulent activity, they depend on good data. Your organization must be able to gather and organize data to take advantage of what AI offers for health insurance fraud detection.

Ensuring your data is up to par is no simple matter, considering the number and variety of data sources you draw on. Medicare data show an average of 38–39 services per beneficiary annually [1]. Multiply that by the number of members and providers, and by the number of internal and external data sources that feed every claim processing transaction, and the volume is tremendous. As a result, it takes significant time and effort to prepare data for detecting fraudulent claims accurately. Sorting and preparing data is like building a house: it takes time to lay a quality foundation, but the results are worth the effort.

Let’s look more closely at where all that data comes from:

  • Internal data sources. Health care companies often pull information from multiple legacy systems to detect and prevent fraud, waste, and abuse. In addition to confirming code sets and policies, and consulting medical records and documentation, companies need to arrive at a comprehensive view of the provider making the claim.

  • External data sources. It’s also necessary to pull in data from third-party sources. These might include aggregated historical claims data, fraud watch and sanction lists, business and credit bureaus, news and social media, and medical billing data, to name a few.

  • Structured and unstructured data. Every company generates and works with a combination of structured and unstructured data. Just as it sounds, structured data is stored in a structured format (think rows and columns within a relational database, for example), which makes it easy for software programs to analyze. Unstructured data, on the other hand, lacks any consistent or defined organization. Examples include emails, images, and medical records, and it's far more difficult for a software program to understand data in this form. For health insurers, the challenge is compounded: over 80% of health insurance data is considered unstructured, and that share is a large part of what makes fraud detection difficult. Unstructured data challenges almost every modern organization, but it is especially hard to tackle in health care, where your solution must process that data correctly to detect fraudulent claims, and where failing to detect a false claim or catch a fraud scheme carries a concrete cost.
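To make the internal/external distinction more concrete, here is a minimal Python sketch of one common step: screening internal claim records against an external sanction or watch list. It assumes each claim carries a provider name and a National Provider Identifier (NPI), and that the watch list supplies names and NPIs; the `Claim` class, both functions, and all records are hypothetical illustrations, not part of any Shift product. Names are normalized first (case, accents, punctuation) because the same provider is often spelled differently across sources.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    provider_name: str
    provider_npi: str  # assumed: provider's National Provider Identifier


def normalize_name(name: str) -> str:
    """Lowercase, drop accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    no_punct = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in ascii_only.lower())
    return " ".join(no_punct.split())


def screen_against_watch_list(claims: list[Claim], watch_npis: set[str], watch_names: list[str]) -> list[str]:
    """Return IDs of claims whose provider matches the external list by NPI or normalized name."""
    normalized_watch_names = {normalize_name(n) for n in watch_names}
    return [
        c.claim_id
        for c in claims
        if c.provider_npi in watch_npis
        or normalize_name(c.provider_name) in normalized_watch_names
    ]


# Internal claims (hypothetical records).
internal_claims = [
    Claim("C1", "Dr. José Smith", "1234567890"),
    Claim("C2", "Acme Family Clinic", "1111111111"),
    Claim("C3", "Northside Imaging", "2222222222"),
]
# External watch list (hypothetical entries): one NPI and one name.
print(screen_against_watch_list(internal_claims, {"2222222222"}, ["DR JOSE SMITH"]))
# ['C1', 'C3']
```

Exact matching on normalized names is deliberately simple; real screening would likely need fuzzy matching and human review, since identical names do not guarantee the same provider.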

Going from Raw to Clean Data

In addition to aggregating and normalizing all this data so it's consistent, your company needs to make sure the data is accurate and usable. This involves steps such as confirming that entities are correctly identified and checking whether data is missing. These checks are essential because it's easy for people to enter data such as a date of birth incorrectly, or to falsify it deliberately. This is possibly the biggest obstacle to streamlining health insurance fraud detection, simply because of the volume of work required to audit and establish useful databases for your model to pull from.
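The checks described above can be sketched as a simple rule-based validator. This is a minimal Python example under assumed conditions: the field names, the 120-year age ceiling, and a provider registry passed in as a set are all hypothetical choices for illustration. It reports missing fields, an implausible date of birth, a service date before birth, and a provider ID that does not match a known entity.

```python
from __future__ import annotations

from datetime import date

# Hypothetical schema: fields every claim record is expected to carry.
REQUIRED_FIELDS = ("member_id", "provider_id", "date_of_birth", "service_date", "procedure_code")
MAX_PLAUSIBLE_AGE_YEARS = 120  # assumed ceiling for a plausible member age


def validate_claim(record: dict, known_provider_ids: set[str], today: date) -> list[str]:
    """Return a list of data-quality issues found in one claim record."""
    issues = []
    # Missing or empty fields.
    for name in REQUIRED_FIELDS:
        if record.get(name) in (None, ""):
            issues.append(f"missing:{name}")
    # Date-of-birth plausibility (catches mis-keyed or falsified values).
    dob = record.get("date_of_birth")
    service = record.get("service_date")
    if isinstance(dob, date):
        if dob > today:
            issues.append("dob_in_future")
        elif today.year - dob.year > MAX_PLAUSIBLE_AGE_YEARS:  # approximate age in years
            issues.append("dob_implausibly_old")
        if isinstance(service, date) and service < dob:
            issues.append("service_before_birth")
    # Entity check: the provider must exist in the reference registry.
    provider = record.get("provider_id")
    if provider and provider not in known_provider_ids:
        issues.append("unknown_provider")
    return issues


registry = {"P100", "P200"}
today = date(2024, 1, 1)
clean = {"member_id": "M1", "provider_id": "P100", "date_of_birth": date(1980, 5, 17),
         "service_date": date(2023, 6, 1), "procedure_code": "99213"}
dirty = {"member_id": "M2", "provider_id": "P999", "date_of_birth": date(2031, 5, 17),
         "service_date": date(2023, 6, 1)}
print(validate_claim(clean, registry, today))  # []
print(validate_claim(dirty, registry, today))
# ['missing:procedure_code', 'dob_in_future', 'service_before_birth', 'unknown_provider']
```

Returning a list of named issues, rather than a single pass/fail flag, lets a data team count which problems are most common across millions of records and fix them at the source.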

Experience has shown that it takes a significant amount of upfront work to prepare data for your fraud detection model. In fact, most companies spend about 80% of their time preparing their data for use in the model, and the remaining 20% designing and launching the model. To bear the load, you’ll need to invest heavily in a strong analytics and data science team or find a partner to support your internal team. Partnering gives you access to already built models and databases, so you can spin up your fraud prevention efforts quickly; some organizations, however, prefer to build their own or use technology to supplement an internal team.

However, preparing the data is time well spent. The more clean data you feed into your fraud detection model, the better the results and insights. At Shift, we have found that an effective fraud, waste, and abuse reduction program powered by clean, accurate data can significantly help your organization.

Health insurance fraud is a costly problem, and it's worth investing the time and effort to ensure you have the capability to:

  • Detect more fraud, waste and abuse cases
  • Reduce fraud losses and improper payments
  • Prioritize actionable alerts

Turning raw data into clean data is a tall order, but it's necessary for using AI to improve your ability to detect fraud, waste, and abuse and to address the problem before it can damage your organization. Big data is an inherent challenge in the insurance industry, and the best way to address it is with emerging automation tools that can flag suspicious claims autonomously. It's no longer a question of whether to implement these tools; it's a matter of which one best fits your business.
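As a rough illustration of flagging suspicious claims and prioritizing alerts, the sketch below applies a few hand-written rules with point weights, then sorts flagged claims so investigators see the highest-scoring ones first. The rules, weights, threshold, and field names are invented for illustration. This rule-based approach is a simplified stand-in, not the AI models discussed in this article; such models would learn these kinds of signals from clean historical data rather than from fixed rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Alert:
    claim_id: str
    score: int
    reasons: list[str] = field(default_factory=list)


# Hypothetical rule weights; illustrative only.
HIGH_AMOUNT_POINTS = 5
MANY_SERVICES_POINTS = 3
WATCH_LIST_POINTS = 7


def score_claim(claim: dict, peer_avg_amount: float) -> Alert:
    """Add points for each rule the claim triggers and record why."""
    alert = Alert(claim_id=claim["claim_id"], score=0)
    if claim["amount"] > 3 * peer_avg_amount:
        alert.score += HIGH_AMOUNT_POINTS
        alert.reasons.append("amount_far_above_peer_average")
    if claim["services_billed"] > 20:
        alert.score += MANY_SERVICES_POINTS
        alert.reasons.append("unusually_many_services")
    if claim.get("provider_on_watch_list", False):
        alert.score += WATCH_LIST_POINTS
        alert.reasons.append("provider_on_watch_list")
    return alert


def prioritize(claims: list[dict], peer_avg_amount: float, threshold: int = 5) -> list[Alert]:
    """Return alerts at or above the threshold, highest score first."""
    alerts = [score_claim(c, peer_avg_amount) for c in claims]
    flagged = [a for a in alerts if a.score >= threshold]
    return sorted(flagged, key=lambda a: a.score, reverse=True)


claims = [
    {"claim_id": "A", "amount": 5000.0, "services_billed": 5},
    {"claim_id": "B", "amount": 800.0, "services_billed": 25, "provider_on_watch_list": True},
    {"claim_id": "C", "amount": 900.0, "services_billed": 3},
]
for alert in prioritize(claims, peer_avg_amount=1000.0):
    print(alert.claim_id, alert.score, alert.reasons)
# B 10 ['unusually_many_services', 'provider_on_watch_list']
# A 5 ['amount_far_above_peer_average']
```

Keeping the `reasons` alongside each score makes every alert explainable to an investigator, which matters as much as the ranking itself when the goal is actionable alerts.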

To learn more about how Shift approaches health insurance fraud detection and how our solution helps reduce fraud, waste, and abuse for health insurers, please visit our healthcare solutions page.

[1] Medicare Physicians/Suppliers: Utilization, Program Payments, Cost Sharing, and Balance Billing for Original Medicare Beneficiaries, by Type of Entitlement, Calendar Years 2012-2017.