hcris-analyst
Two of the most cited datasets in U.S. hospital research, CMS Medicare Cost Reports (HCRIS) and IRS Form 990 Schedule H, report overlapping measures for the same hospitals using different field definitions, different fiscal periods, and different file formats. CMS ships 100,000-row long-skinny CSVs; the IRS ships 990s as XML bulk ZIPs. Joining them at the hospital level is real work.
hcris-analyst is a Claude Code skill that queries the trove-published bundles where that work has already been done. It runs in your own Claude Code session, translates natural-language questions into DuckDB SQL, knows where the data lives, and handles the definitional caveats that trip up naive cross-form comparisons.
Install
/plugin marketplace add cbetz/trove
/plugin install trove@trove
Or copy the skill directory directly:
git clone https://github.com/cbetz/trove
cp -r trove/skills/hcris-analyst ~/.claude/skills/
Example prompts
- "Show me a profile of New York-Presbyterian."
- "What does Worksheet S-10 line 23 column 1 mean?"
- "Show the aligned-period CMS and IRS charity-care fields for Yale-New Haven."
- "Which hospitals provided the most uncompensated care in FY2023?"
- "Did Memorial Hermann amend their TY2022 990?"
- "How do charity-care cost ratios vary by home-county SVI band?"
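As a rough illustration of the translation step, a prompt like the uncompensated-care question above might turn into a ranking query shaped like the one this sketch builds. It is not the skill's actual output: the helper `build_top_n_query`, the placeholder bundle URL, and the column names `hospital_name` and `uncompensated_care_cost` are all hypothetical. The only thing assumed about DuckDB is that `read_parquet` can read a remote Parquet file over HTTPS.

```python
import re

# Hypothetical placeholder; the real bundle URLs are not shown on this page.
EXAMPLE_BUNDLE_URL = "https://example.invalid/hcris_fy2023.parquet"

# Accept only plain SQL identifiers so a column name cannot inject SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_top_n_query(bundle_url: str, metric_column: str, n: int = 10) -> str:
    """Return DuckDB SQL that ranks hospitals by one numeric column.

    `hospital_name` and `metric_column` are assumed column names used for
    illustration; they are not the published bundle schema.
    """
    if not _IDENTIFIER.fullmatch(metric_column):
        raise ValueError(f"unsafe column name: {metric_column!r}")
    if "'" in bundle_url:
        raise ValueError("bundle URL must not contain a single quote")
    if n < 1:
        raise ValueError("n must be positive")
    return (
        f"SELECT hospital_name, {metric_column}\n"
        f"FROM read_parquet('{bundle_url}')\n"
        f"WHERE {metric_column} IS NOT NULL\n"
        f"ORDER BY {metric_column} DESC\n"
        f"LIMIT {int(n)}"
    )


if __name__ == "__main__":
    print(build_top_n_query(EXAMPLE_BUNDLE_URL, "uncompensated_care_cost", 5))
```

The function only builds the query text. Executing it would require DuckDB and a real bundle URL in place of the placeholder.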
What this skill does
- Knows the bundles. Three Parquet files on troveproject.com: HCRIS FY2023 (Hospital 2552-10, wide format), IRS 990 Schedule H TY2022 (across 2024/2025/2026 IRS release years), and the joined community-benefit dataset.
- Translates questions to DuckDB SQL. Single-hospital lookups, peer cohorts, cross-form comparisons, field-glossary questions — the skill picks the right query shape and runs it against the bundles over HTTPS.
- Handles period alignment. HCRIS labels its bulk files by federal-fiscal-year reporting cycle, not by the period covered. For many hospitals, the "FY2023" cost report covers a period 12 months later than the matching TY2022 990. The skill defaults to filtering on aligned-period rows for cross-form analyses and flags when comparisons are misaligned.
- Disambiguates names. "Memorial Hermann" matches a dozen rows; the skill surfaces candidates with bed counts and revenue and asks which one.
- Names the structural reasons when a cross-form difference looks suspicious. Children's hospitals, specialty cancer centers, and major teaching hospitals whose volume is largely non-Medicare legitimately report near-zero charity care on HCRIS S-10 while reporting substantial financial assistance on Schedule H 7a. The skill explains the pattern rather than presenting it as a finding.
- Carries SVI context. Each system carries CDC Social Vulnerability Index 2022 percentiles for its home county. Useful for coarse geographic context; not a service-area measure.
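The period-alignment check described above can be pictured as a date-range overlap test. The sketch below is one way to express it, not the skill's own logic: the function names, the 0.9 overlap threshold, and the example dates are all assumptions chosen for illustration.

```python
from datetime import date


def overlap_days(a_start: date, a_end: date, b_start: date, b_end: date) -> int:
    """Count the calendar days shared by two inclusive date ranges."""
    latest_start = max(a_start, b_start)
    earliest_end = min(a_end, b_end)
    return max(0, (earliest_end - latest_start).days + 1)


def is_aligned(
    hcris_start: date,
    hcris_end: date,
    irs_start: date,
    irs_end: date,
    min_share: float = 0.9,  # arbitrary illustrative threshold
) -> bool:
    """True when the shared days cover at least min_share of the longer period."""
    longer = max((hcris_end - hcris_start).days, (irs_end - irs_start).days) + 1
    shared = overlap_days(hcris_start, hcris_end, irs_start, irs_end)
    return shared / longer >= min_share


if __name__ == "__main__":
    # Illustrative July-to-June filer: both forms cover the same 12 months.
    print(is_aligned(date(2022, 7, 1), date(2023, 6, 30),
                     date(2022, 7, 1), date(2023, 6, 30)))
    # Illustrative calendar-year filer: the cost report runs a year later.
    print(is_aligned(date(2023, 1, 1), date(2023, 12, 31),
                     date(2022, 1, 1), date(2022, 12, 31)))
```

A share-based threshold, rather than exact date equality, tolerates short or slightly shifted reporting periods. Whether the published bundles use a rule like this is not stated here.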
What this skill doesn't do
- Patient-level outcomes, quality metrics, or readmissions. Those live in CMS Care Compare / Hospital Compare, not HCRIS.
- For-profit and government hospitals' cross-form view. They're in HCRIS but don't file 990s, so the matched dataset only covers nonprofits. The skill can still answer HCRIS-only questions about them.
- Pre-TY2022 / pre-FY2023 trends. Earlier years are planned for v2 — not available now.
- Hospitals outside the U.S. HCRIS is CMS Medicare-specific.
- Definitive accusations from cross-form differences. The skill explains why HCRIS S-10 and Schedule H 7a legitimately differ rather than treating a gap as evidence of bad faith.
Coverage
1,295 nonprofit U.S. hospital systems matched at the EIN level. HCRIS Hospital 2552-10, FY2023. IRS 990 Schedule H, TY2022 (across 2024/2025/2026 release years). CCN↔EIN crosswalk from Community Benefit Insight (RTI International / RWJF), December 2024 vintage. Browse the index at /hospitals/.
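To show what matching at the EIN level can involve, here is a minimal sketch that rolls HCRIS rows (keyed by CCN) up to the EIN through a crosswalk and then pairs them with Schedule H rows. The identifiers, amounts, field names, and helper functions are made up for illustration, and summing across CCNs that share an EIN is an assumption about the roll-up, not a description of how trove builds its joined dataset.

```python
from collections import defaultdict

# Made-up identifiers and amounts, for illustration only.
crosswalk = {
    "000001": "00-0000001",
    "000002": "00-0000001",  # two facilities filing under one EIN
    "000003": "00-0000002",
}
hcris_rows = [
    {"ccn": "000001", "charity_care_cost": 100.0},
    {"ccn": "000002", "charity_care_cost": 50.0},
    {"ccn": "000003", "charity_care_cost": 25.0},
    {"ccn": "000004", "charity_care_cost": 10.0},  # no crosswalk entry
]
schedule_h_rows = [{"ein": "00-0000001", "financial_assistance_cost": 160.0}]


def roll_up_by_ein(rows, ccn_to_ein):
    """Sum charity-care cost per EIN, skipping CCNs missing from the crosswalk."""
    totals = defaultdict(float)
    for row in rows:
        ein = ccn_to_ein.get(row["ccn"])
        if ein is not None:
            totals[ein] += row["charity_care_cost"]
    return dict(totals)


def match_at_ein(hcris, schedule_h, ccn_to_ein):
    """Pair each Schedule H filer with its rolled-up HCRIS total, if any."""
    totals = roll_up_by_ein(hcris, ccn_to_ein)
    return [
        {
            "ein": row["ein"],
            "hcris_charity_care_cost": totals[row["ein"]],
            "schedule_h_financial_assistance_cost": row["financial_assistance_cost"],
        }
        for row in schedule_h
        if row["ein"] in totals
    ]


if __name__ == "__main__":
    for matched in match_at_ein(hcris_rows, schedule_h_rows, crosswalk):
        print(matched)
```

In this toy data, the facility without a crosswalk entry and the EIN without a Schedule H row both fall out of the matched result, which mirrors why the matched dataset covers only hospitals present on both sides.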
Source and license
MIT-licensed. Source code, parsers, raw artifacts: github.com/cbetz/trove. The data is U.S. government work, public domain — CMS HCRIS, IRS 990 e-file, and CDC SVI. The CCN↔EIN crosswalk is from Community Benefit Insight: RTI Press DOI 10.3768/rtipress.2023.op.0080.2302.