trove

Reference tools for underused healthcare data.

trove builds open-source lookup tools, parsers, and Claude skills on top of public-domain healthcare datasets that are widely cited but rarely usable in their raw form. Each area is search-first: type a name or identifier and see one record at a time. The full data, parsers, and skills are MIT-licensed at github.com/cbetz/trove.

Datasets covered

v0.1 · NEW

FDA drug approvals

Look up any FDA novel drug approval from 2021–2024 and find the approval package — sponsor, application number, key dates, the FDA-approved label, and a deep link to every document FDA released for that approval (medical review, statistical review, pharmacology review, chemistry review).

v1.1

Hospital reporting

Look up related CMS Worksheet S-10 and IRS Form 990 Schedule H charity-care reporting fields for 1,295 nonprofit U.S. hospital systems, side-by-side, with filing-period context and a home-county Social Vulnerability Index proxy.

More areas coming. Suggestions welcome.

Claude Code skills

Each area ships with a Claude Code skill that translates natural-language questions into queries over the area's published data. The skills are bundled as a single Claude Code plugin in Anthropic's community marketplace:

/plugin marketplace add anthropics/claude-plugins-community
/plugin install trove@claude-community

This is the public community marketplace, not the separate official claude-plugins-official marketplace.

Skill details and example prompts:

fda-analyst — reads FDA approval-package documents to answer questions about the basis for approval, trials, endpoints, and regulatory pathway.
hcris-analyst — queries CMS Worksheet S-10 and IRS Form 990 Schedule H side-by-side for U.S. nonprofit hospitals, with home-county SVI context.

What makes this useful

Public-domain healthcare data is hard to use in its raw form. CMS publishes long-skinny CSVs with 100,000+ rows; the IRS publishes Form 990 filings as XML in bulk ZIPs; the FDA scatters approval reviews across hundreds of PDF directories. trove does the parsing, joining, and packaging so the data is browsable and queryable, not something only people with a Python environment and free time can use.
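To make the "long-skinny" problem concrete, here is a minimal sketch of the kind of reshaping involved: turning one-row-per-cell input into one record per report. It is not trove's actual parser. The column names, the sample rows, and the mapping from worksheet/line/column codes to field labels are all assumptions made for illustration and should be checked against the CMS documentation before any real use.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in a long-skinny layout: one row per reported cell.
# Column names and values are illustrative, not copied from a real CMS file.
SAMPLE = """report_id,worksheet,line,column,value
1001,S100000,02000,00100,5000000
1001,S100000,02300,00300,1200000
1002,S100000,02000,00100,750000
"""

# Assumed mapping from (worksheet, line, column) to a readable label.
# These labels are placeholders for illustration only.
FIELDS = {
    ("S100000", "02000", "00100"): "charity_care_charges",
    ("S100000", "02300", "00300"): "charity_care_cost",
}


def pivot_long_to_wide(text, fields):
    """Collapse one-row-per-cell input into one dict per report."""
    records = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["worksheet"], row["line"], row["column"])
        name = fields.get(key)
        if name is None:
            continue  # skip cells that are not in the mapping
        records[row["report_id"]][name] = float(row["value"])
    return dict(records)


wide = pivot_long_to_wide(SAMPLE, FIELDS)
print(wide["1001"])
```

Reports that omit a mapped cell simply lack that key in their record, so missing values stay visibly missing instead of defaulting to zero.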

Each area also ships with a Claude Code skill, so you can install it, ask questions in natural language, and get answers grounded in the underlying data rather than in what an LLM half-remembers from training.

Docs

Plain-language explainers for the datasets:

What is HCRIS? — CMS Medicare Cost Reports, Form 2552-10, Worksheet S-10 charity care.
What is IRS Form 990 Schedule H? — nonprofit hospital community-benefit reporting.
What is an FDA novel drug approval? — NMEs, novel BLAs, and the approval package.