Docs

Plain-language explainers for the datasets that trove indexes.

Three of the most widely cited public-domain datasets in U.S. healthcare are also three of the least approachable. CMS Medicare Cost Reports ship as 100,000-row, long-and-skinny CSVs. IRS Form 990 Schedule H filings are buried in bulk ZIP archives of XML. FDA approval packages are scattered across hundreds of PDF directories on accessdata.fda.gov. The data is public; learning what's in it is a project.

These pages explain what each dataset is, what's in it, and how to use it — in plain language, with citations.

Datasets

CMS

What is HCRIS?

CMS Medicare Cost Reports, the financial filing every Medicare-participating hospital submits annually, collected in HCRIS (the Healthcare Cost Report Information System). Hospitals file on Form CMS-2552-10, which covers beds, staffing, revenue, costs, charity care, and uncompensated care.

IRS

What is IRS Form 990 Schedule H?

The community-benefit schedule that nonprofit hospitals attach to their annual Form 990 information return. Covers financial assistance, Medicaid shortfall, research, education, and other categories that justify tax-exempt status.

FDA

What is an FDA novel drug approval?

A first-time FDA approval of a new molecular entity (NME) or a new therapeutic biologic (novel BLA), as tracked in CDER's annual Novel Drug Approvals list. Each approval comes with an approval package of medical, statistical, pharmacology, and chemistry reviews.

Tools

The docs explain the data. The tools let you query it.