
Pennsylvania Claims Data Release

Authors
  • Mike Gartner, PhD

Context

In May of 2023 we submitted a public records request under the Pennsylvania Right to Know Law to the Pennsylvania Insurance Department in an attempt to acquire comprehensive records documenting health insurer claims denial data in the state.

The request was partially granted, to the extent possible given the records maintained by the department, and today we are open sourcing the data.

Raw Data

The data we received covers plan years 2020 and 2021 only; the figures for those years were reported in 2022 and 2023, respectively. We requested data for the last ten years, but to the extent our request was granted, the records provided cover only those two plan years. See the notes in the open source repository readme for more details about the request submitted.

Population Represented

The data provided in the public records corresponds primarily to "on-exchange" marketplace plans for insured consumers in PA, and provides breakdowns pertaining to insurers, individual plans, denials, internal and external appeals, and denial rationales.

In aggregate, the data from the two years of public records corresponds to:

  • 13 unique insurers.
  • 599 unique plans.
  • At least 337,722 consumers.
  • 21,646,696 claims adjudicated.
  • 2,964,421 claims denials.
  • 2,751 internal appeals.
  • 103 external appeals.
  • All marketplace plans.
  • Possibly additional, non-marketplace group plans regulated by the PA Department of Insurance.

Detail

Different levels of detail are provided in the data for marketplace and non-marketplace insurers.

At the level of individual insurers, the data provides only high level aggregate statistics about claims, denials, and appeals.

  • Issuer Level Data
    • Claims received.
    • Claims denied.
    • Claims internally appealed.
    • Claims overturned on internal appeal.
    • Claims externally appealed.
    • Claims overturned on external appeal.

These data elements correspond to the total aggregate values available for each insurer, and include data from both marketplace and non-marketplace plans.

At the level of individual plans, the data provides only aggregate statistics about claims, denials, and claims denial rationales.

  • Plan Level Data (marketplace only)
    • Claims received.
    • Claims denied.
    • Claims denied broken down by rationale.

These data elements correspond to total aggregate values available only for plans from each insurer sold on a marketplace.

Format

The raw data was delivered as pdfs containing tables of issuer and plan level details; oddly, the tables were split into horizontal pieces spread across several separate tables. You can see one example of the raw pdfs provided here.

Disclaimer

We note that the underlying raw data was provided by the Pennsylvania Insurance Department, but the Department is not responsible for any findings in, or manipulation of, the data.

In particular, the repository we are open sourcing includes both the public records provided by the PA Insurance Department (pdf files) and our own conversions of that data into what we view as more accessible formats. We have made an effort to label explicitly which data constitutes the raw records we received from the Department and which we produced ourselves.

Analysis

Our entire motivation for requesting and releasing this data is to facilitate analysis of the landscape and practices of claims denials in U.S. health insurance by interested members of the public (ourselves included).

Naturally, we were eager to jump into the details of the data ourselves. We are releasing a detailed, formal article about this and other data early next week, but we discuss a sample of our analysis here.

Parsed Data

First things first: we had to parse the public records we obtained, extract the numerical data from the split pdf tables, and store it in a more usable format for analysis. We consolidated the data for all insurers and plans into two csv files.
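To give a flavor of the stitching involved, here is a minimal sketch in plain Python. It is an illustration, not the parsing code from the repository. It assumes that "split" means each piece carries a different subset of columns for the same rows, in the same order, and that the pieces have already been extracted from the pdf as rows of strings. The column names, the values, and the helpers `stitch_pieces` and `to_csv_text` are all made up for the example.

```python
import csv
import io

# Hypothetical pieces of one wide table, already extracted as rows of strings.
# Assumption: every piece lists the same rows in the same order,
# but holds a different subset of the columns.
piece_a = [
    ["issuer", "claims_received"],
    ["Issuer A", "1000"],
    ["Issuer B", "2500"],
]
piece_b = [
    ["claims_denied", "internal_appeals"],
    ["120", "3"],
    ["400", "7"],
]


def stitch_pieces(pieces):
    """Concatenate table pieces side by side, row by row."""
    if len({len(piece) for piece in pieces}) != 1:
        raise ValueError("all pieces must have the same number of rows")
    # zip(*pieces) yields the i-th row of every piece together.
    return [[cell for row in rows for cell in row] for rows in zip(*pieces)]


def to_csv_text(rows):
    """Serialize rows (header first) to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


stitched = stitch_pieces([piece_a, piece_b])
csv_text = to_csv_text(stitched)
print(csv_text)
```

The row-count check is deliberately strict: if two pieces disagree on the number of rows, joining them silently would misalign every value after the mismatch.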

Outcomes Overview

We analyzed the data in numerous ways, most of which will be described in detail in our forthcoming article.

The most notable highlights are that, as is typical among well-studied claims denial data, initial denial rates are high and internal appeal rates are minuscule, yet internal and external appeal overturn rates are high.

The tables below summarize these statistics for the PA Insurance Department data.

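| Plan Year | Claims Received | Claims Denials | Denial Rate | Internal Appeals | Internal Appeal Overturns | Internal Appeal Rate | Internal Appeal Overturn Rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2020 | 10,183,218 | 1,289,006 | .13 | 1,515 | 905 | .0012 | .60 |
| 2021 | 11,463,477 | 1,675,414 | .15 | 1,234 | 723 | .0007 | .59 |

| Plan Year | External Appeals | External Appeal Overturns | External Appeal Rate | External Appeal Overturn Rate |
| --- | --- | --- | --- | --- |
| 2020 | 62 | 17 | .1 | .27 |
| 2021 | 40 | 9 | .08 | .23 |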
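The rate columns can be recomputed from the counts. The sketch below is one way to do so in plain Python. The counts are copied from the tables above, but the denominators are assumptions inferred from the column names rather than documented definitions. In particular, the external appeal rate is assumed to be measured against internal appeals that were not overturned, a choice that happens to reproduce the tabulated values after rounding.

```python
# Counts copied from the summary tables above.
counts = {
    2020: {
        "received": 10_183_218, "denied": 1_289_006,
        "internal_appeals": 1_515, "internal_overturns": 905,
        "external_appeals": 62, "external_overturns": 17,
    },
    2021: {
        "received": 11_463_477, "denied": 1_675_414,
        "internal_appeals": 1_234, "internal_overturns": 723,
        "external_appeals": 40, "external_overturns": 9,
    },
}


def rates(c):
    """Derive the rate columns from raw counts (denominators are assumptions)."""
    # Internal appeals that were upheld, i.e. still denied after appeal.
    upheld_internal = c["internal_appeals"] - c["internal_overturns"]
    return {
        "denial_rate": c["denied"] / c["received"],
        "internal_appeal_rate": c["internal_appeals"] / c["denied"],
        "internal_overturn_rate": c["internal_overturns"] / c["internal_appeals"],
        # Assumed denominator: internal appeals that were not overturned.
        "external_appeal_rate": c["external_appeals"] / upheld_internal,
        "external_overturn_rate": c["external_overturns"] / c["external_appeals"],
    }


summary = {year: rates(c) for year, c in counts.items()}
for year, year_rates in summary.items():
    print(year, {name: round(value, 4) for name, value in year_rates.items()})
```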

For a sneak peek into our article, we show below two types of more detailed breakdowns that the data supports.

The figure below shows how the claims received in the aggregate, two-year dataset are broken down by insurer.

The data supports performing the same insurer-breakdown at the level of denials; the figure below shows the denial rate for the 10 insurers in the data with the most claims received.
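A breakdown of this kind can be computed with a few lines of plain Python. The sketch below is illustrative only: the rows, the field names (`issuer`, `plan_year`, `claims_received`, `claims_denied`), and the helper `denial_rates_for_largest` are hypothetical, and the real csv columns may be named differently. It sums each insurer's counts across plan years, keeps the insurers with the most claims received, and reports their denial rates.

```python
from collections import defaultdict

# Hypothetical issuer-level rows, one per insurer and plan year.
rows = [
    {"issuer": "Issuer A", "plan_year": 2020, "claims_received": 2_000, "claims_denied": 300},
    {"issuer": "Issuer A", "plan_year": 2021, "claims_received": 3_000, "claims_denied": 600},
    {"issuer": "Issuer B", "plan_year": 2020, "claims_received": 12_000, "claims_denied": 1_200},
    {"issuer": "Issuer C", "plan_year": 2021, "claims_received": 800, "claims_denied": 400},
]


def denial_rates_for_largest(rows, n):
    """Return (issuer, denial rate) for the n issuers with the most claims received."""
    totals = defaultdict(lambda: {"received": 0, "denied": 0})
    for row in rows:
        totals[row["issuer"]]["received"] += row["claims_received"]
        totals[row["issuer"]]["denied"] += row["claims_denied"]
    # Rank by total claims received, largest first, then keep the top n.
    largest = sorted(totals.items(), key=lambda kv: kv[1]["received"], reverse=True)[:n]
    return [(name, t["denied"] / t["received"]) for name, t in largest]


top = denial_rates_for_largest(rows, n=2)
for name, rate in top:
    print(f"{name}: {rate:.1%}")
```

Summing counts before dividing gives a claims-weighted rate per insurer, which differs from averaging the two yearly rates whenever claim volumes change between years.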

In addition to breaking down statistics by individual insurer, the data supports breaking down denial counts for marketplace plans by the rationale associated with each denial. The figure below shows the distribution of denial rationales in all of the data.
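As a rough sketch of how such a distribution can be derived, the plain-Python example below sums denial counts per rationale across plans and converts them to shares of all denials. The plan rows, the rationale labels, and the helper `rationale_distribution` are hypothetical placeholders, not the categories or column names used in the actual records.

```python
from collections import Counter

# Hypothetical plan-level rows: denial counts keyed by rationale label.
plans = [
    {"plan": "Plan 1", "denials_by_rationale": {"not_covered": 40, "prior_auth": 10}},
    {"plan": "Plan 2", "denials_by_rationale": {"not_covered": 20, "other": 30}},
]


def rationale_distribution(rows):
    """Return each rationale's share of all denials, largest share first."""
    totals = Counter()
    for row in rows:
        # Counter.update adds counts for matching keys instead of replacing them.
        totals.update(row["denials_by_rationale"])
    grand_total = sum(totals.values())
    return {name: count / grand_total for name, count in totals.most_common()}


shares = rationale_distribution(plans)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```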

More to come on this data in our article release!

All of our analyses, including the plots above, will be open sourced along with the article release; details to come.

Get Involved

If you are interested in analyzing this data, you can get started right away by downloading our parsed aggregates from the pdfs.

If you enjoy Python and pandas, or Rust and polars, you can use the following snippets to get started with your analysis directly:

load_data.py
import pandas as pd

# Processed csv files published in the open source repository.
issuers_url = "https://repos.persius.org/public-records/data/claims_denials/pa/processed/issuers.csv"
plans_url = "https://repos.persius.org/public-records/data/claims_denials/pa/processed/plans.csv"

# pandas can read a csv directly from a URL.
issuers_df = pd.read_csv(issuers_url)
plans_df = pd.read_csv(plans_url)


# Your insightful analysis goes here.
# Happy coding!

load_data.rs
use std::io;
use std::fs::File;

use polars::prelude::*;

// Download `url` and write the response body to `local_path`.
async fn download_file(url: &str, local_path: &str) {
    let resp = reqwest::get(url).await.expect("request failed");
    let body = resp.text().await.expect("body invalid");
    let mut out = File::create(local_path).expect("failed to create file");
    io::copy(&mut body.as_bytes(), &mut out).expect("failed to copy content");
}


#[tokio::main]
async fn main() {
    let issuers_url = "https://repos.persius.org/public-records/data/claims_denials/pa/processed/issuers.csv";
    let plans_url = "https://repos.persius.org/public-records/data/claims_denials/pa/processed/plans.csv";
    let local_issuers_path = "issuers.csv";
    let local_plans_path = "plans.csv";

    download_file(issuers_url, local_issuers_path).await;
    download_file(plans_url, local_plans_path).await;

    // Parse each downloaded csv into a polars DataFrame.
    let _issuers_df = CsvReader::from_path(local_issuers_path).unwrap().finish().unwrap();
    let _plans_df = CsvReader::from_path(local_plans_path).unwrap().finish().unwrap();

    // Your insightful analysis goes here.
    // Happy coding!
}

Alternatively, you can work directly with the raw pdfs we received if you'd like to validate that we parsed and extracted the data correctly.

In any case, we'd love to hear about how you're using the data and whether it proves useful; please feel free to reach out with questions, comments, or feedback to info@persius.org.