The Report in Brief:
- Developers continue to introduce LLMs — both brand new and extensions of existing model families — bringing with them new questions about cost, performance and appropriateness for different use cases
- We continue to find that “best” is a relative term when comparing LLM performance, which is tightly bound to the individual use case
- As the LLM landscape becomes more diverse, understanding the intended purpose of an LLM becomes an important evaluation criterion
- The price/performance ratio continues to be a critical metric for evaluating which LLM is right for each use case
- Deepseek R1 demonstrates the viability of large models coming from the open source community
LLM Evolution and the Emergence of Open Source Models:
The LLM landscape continues to evolve at a rapid pace, which can make it feel impossible to keep up. Established model families release new versions, and new players enter the mix. It is therefore important to understand how these changes may affect the way LLMs are used in support of critical insurance processes and use cases.
This publication series began with the inaugural The State of AI in Insurance report, in which we explored the performance of six different Large Language Models (LLMs) applied to various insurance-specific use cases. Since that first publication, some of the models tested have been retired from the testing suite and new models have been added. Shift researchers make these changes to ensure that the report best reflects the current state of the art among available LLMs, highlights models receiving significant interest from the technology community (e.g. Deepseek R1), and includes those most likely to be considered for deployment against insurance-specific use cases. The report is intended not only to compare relative performance against a set of predetermined tasks, but also to illustrate the cost/performance comparisons associated with each of the LLMs tested.
Download the full report for our complete findings and analysis.
 
      