
The State of AI in Insurance: Claims Decisioning and Liability Determination in Subrogation (Vol. VI)

Written by Shift Technology | Jul 24, 2025 4:00:14 PM

The Report in Brief:

  • The introduction and continued advancement of reasoning LLMs create new opportunities to apply Gen AI to important insurance use cases such as subrogation liability assessment and claims decisions
  • The wide variety of LLMs available makes assessing pros and cons even more important when determining which LLM to use for a specific use case
  • In certain situations, “standard” models achieve performance comparable to that of “reasoning” models on reasoning tasks

A Continuing LLM Evolution:

Since Shift began publishing this report more than a year ago, the use of generative artificial intelligence (Gen AI) to drive efficiency, accuracy, and fairness in the claims process has become increasingly mainstream. And like most technologies, the large language models (LLMs) powering this important insurance transformation have continued to evolve. From its beginnings, this report was designed to provide insight into the intersection between LLMs and specific insurance use cases, and to clarify how individual LLMs perform when applied to specific tasks.

With the latest edition of the State of AI in Insurance Report, we tested a total of 21 LLMs. As with previous reports, we retired older models and included newer ones to create an optimal testing environment, one that best represents the current state of the art and highlights the LLMs most likely to be in use in insurance environments. For this report we have added 10 new LLMs to the benchmark:

  • GPT4.5: the short-lived OpenAI flagship standard model, to be retired soon
  • GPT4.1, GPT4.1-mini, and GPT4.1-nano: the new suite of OpenAI’s standard models
  • o4-mini: the latest OpenAI reasoning model
  • Deepseek V3: the latest version of Deepseek’s standard model
  • MAI-DS-R1: Microsoft’s version of Deepseek R1, the Deepseek reasoning model
  • Claude3.7 Sonnet: an updated version of Anthropic’s flagship model
  • Mistral Small 2503: the latest version of Mistral’s small model
  • Llama4-maverick: the latest available Llama model 

Download the full report for our complete findings and analysis.