📊 CSV Dataset Explorer

Lightweight, accessible data dictionary for government health datasets

🚀 Quick Start

Try the Demo

Load and explore CSV files with an interactive data dictionary, validation rules, and AI prompts.

Install Bookmarklet

One-click access from healthcare.gov, CDC, and other data portals. Works anywhere with CSV data.

✨ Features

  • 📋 Auto Schema - Infers field types and constraints from the data
  • 🔍 Field Search - Filter and inspect columns instantly
  • Validation Rules - Auto-generated constraints per field
  • 💾 Local Cache - IndexedDB stores data in the browser only
  • 📤 Export - Download the schema as JSON or CSV
  • 🤖 AI Prompts - Copy prompts for LLM analysis
  • Accessible - Keyboard-only navigation, ARIA labels, screen-reader support
  • 🏥 Healthcare.gov - Direct integration via the bookmarklet

🔖 Bookmarklet Installation

Drag this link to your bookmarks bar:

Manual Installation

If drag-and-drop doesn't work, create a bookmark manually with this code:

How to Use

Navigate to any page with CSV data (healthcare.gov, CDC, CMS, etc.) and click your bookmark. The explorer will automatically find and load CSV files.

💡 Use Cases

1. Explore Healthcare.gov Data

Find datasets on healthcare.gov, click the bookmark, and instantly see the data dictionary with field definitions, validation rules, and sample values.

Example: https://data.healthcare.gov/dataset/5k5i-wzex

2. Quick Schema Review

Before loading a CSV into your analysis tool, use the explorer to understand the data structure, field types, and constraints.
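
As a rough illustration of what inferring field types can look like, here is a minimal sketch. It is not the explorer's actual implementation: the function names (inferFieldType, inferSchema), the type labels, and the "required" flag are all assumptions made for this example, and it expects rows that have already been parsed into arrays of strings.

```javascript
// Hypothetical helper: true when a cell is missing or only whitespace.
function isBlank(value) {
  return value === null || value === undefined || String(value).trim() === "";
}

// Hypothetical helper: guess a column's type from its non-blank values.
function inferFieldType(values) {
  const present = values.filter((v) => !isBlank(v)).map((v) => String(v).trim());
  if (present.length === 0) return "empty";
  const all = (pattern) => present.every((s) => pattern.test(s));
  if (all(/^-?\d+$/)) return "integer";
  if (all(/^-?\d+(\.\d+)?$/)) return "number";
  if (all(/^(true|false)$/i)) return "boolean";
  if (all(/^\d{4}-\d{2}-\d{2}$/)) return "date"; // ISO dates only (assumption)
  return "string";
}

// Hypothetical helper: one schema entry per column of the parsed CSV.
function inferSchema(header, rows) {
  return header.map((name, i) => {
    const column = rows.map((row) => row[i]);
    const missing = column.filter(isBlank).length;
    return { name, type: inferFieldType(column), required: missing === 0 };
  });
}
```

Calling inferSchema(["id", "cost"], [["1", "10.5"], ["2", "7"]]) under these rules would label id as an integer and cost as a number, both required.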

3. AI-Assisted Analysis

Generate ready-made prompts for ChatGPT, Claude, or other LLMs, then copy them directly from the explorer to analyze specific fields.
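
To give a sense of what a field-level prompt might contain, the sketch below assembles one from a field's metadata. The wording of the prompt, the function name buildFieldPrompt, and the shape of the field object are assumptions for illustration; the explorer's real prompts may differ.

```javascript
// Hypothetical helper: build an LLM prompt for a single field.
// `field` is assumed to look like { name: "cost", type: "number" }.
function buildFieldPrompt(field, sampleValues) {
  // Include at most five sample values to keep the prompt short.
  const samples = sampleValues.slice(0, 5).join(", ");
  return [
    "You are helping analyze a CSV dataset.",
    `Field name: ${field.name}`,
    `Inferred type: ${field.type}`,
    `Sample values: ${samples}`,
    "Describe what this field likely represents and suggest data quality checks.",
  ].join("\n");
}
```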

4. Data Documentation

Export the data dictionary as JSON or CSV headers to include in your project documentation.
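
A minimal sketch of both export formats follows, assuming the dictionary is an array of objects with name, type, and required properties. That shape, the column order, and the function names are assumptions for this example, not the explorer's documented output.

```javascript
// Hypothetical helper: pretty-printed JSON export of the dictionary.
function schemaToJson(schema) {
  return JSON.stringify(schema, null, 2);
}

// Quote a CSV cell when it contains a comma, quote, or line break.
function csvEscape(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical helper: CSV export with one row per field.
function schemaToCsv(schema) {
  const columns = ["name", "type", "required"]; // assumed column set
  const lines = [columns.join(",")];
  for (const field of schema) {
    lines.push(columns.map((column) => csvEscape(field[column])).join(","));
  }
  return lines.join("\n");
}
```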

5. Field-Level Search

Search for specific fields across large CSVs. Find all fields matching "patient", "date", "cost", etc.
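
Field search of this kind can be as simple as a case-insensitive substring match on field names. The sketch below assumes that behavior and a hypothetical searchFields helper; the explorer may also match on descriptions or types.

```javascript
// Hypothetical helper: return the fields whose name contains the query,
// ignoring case. An empty query returns every field.
function searchFields(fields, query) {
  const needle = query.trim().toLowerCase();
  if (needle === "") return fields.slice();
  return fields.filter((field) => field.name.toLowerCase().includes(needle));
}
```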

🔌 API Endpoints (Server)

The backend proxy server provides these endpoints for programmatic access:

POST /api/proxy/csv

Fetch a CSV file from a whitelisted domain through the proxy, which handles CORS. Returns raw CSV data.

curl -X POST http://localhost:3000/api/proxy/csv \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/data.csv"}'

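The same request can be made from JavaScript. This sketch mirrors the curl command using the standard fetch API; the helper names and the PROXY_BASE constant are assumptions for illustration, and error handling is kept to a single status check.

```javascript
// Assumed default taken from the curl example; change it for your deployment.
const PROXY_BASE = "http://localhost:3000";

// Hypothetical helper: build the endpoint and fetch options for the proxy.
function buildProxyRequest(csvUrl) {
  return {
    endpoint: `${PROXY_BASE}/api/proxy/csv`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: csvUrl }),
    },
  };
}

// Hypothetical helper: request the CSV and return its raw text.
async function fetchCsvViaProxy(csvUrl) {
  const { endpoint, options } = buildProxyRequest(csvUrl);
  const response = await fetch(endpoint, options);
  if (!response.ok) {
    throw new Error(`Proxy request failed with status ${response.status}`);
  }
  return response.text();
}
```
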
GET /api/healthcare/dataset/:id

Extract CSV URLs and metadata from healthcare.gov datasets.

curl http://localhost:3000/api/healthcare/dataset/5k5i-wzex

Returns: JSON with dataset title and CSV URLs

GET /api/socrata

Query Socrata data portals (CDC, CMS, etc.) for dataset metadata.

curl "http://localhost:3000/api/socrata?domain=data.cdc.gov&id=dataset-id"
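
When calling this endpoint from code, building the query string with the standard URL API avoids manual escaping. The helper name below is hypothetical; the domain and id parameters come from the curl example.

```javascript
// Hypothetical helper: build the /api/socrata query URL.
function buildSocrataQueryUrl(serverBase, domain, id) {
  const url = new URL("/api/socrata", serverBase);
  url.searchParams.set("domain", domain); // e.g. "data.cdc.gov"
  url.searchParams.set("id", id);         // the portal's dataset identifier
  return url.toString();
}
```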

🌐 Supported Data Portals

The proxy server handles CORS restrictions for these whitelisted domains:

  • healthcare.gov - Health datasets
  • cdc.gov - Disease and health statistics
  • cms.gov - Medicare/Medicaid data
  • healthdata.gov - Health tracking data
  • github.com - CSV files in public repos

⚠️ Other domains: Direct browser requests work only if the remote server sends permissive CORS headers. Otherwise, add the domain to the whitelist in server.js.
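
For orientation, a whitelist check of this kind might look like the sketch below. It is not the code in server.js: the constant name, the function name, and the rule that subdomains of a listed domain are accepted are all assumptions.

```javascript
// Domains from the list above; the array name is hypothetical.
const ALLOWED_DOMAINS = [
  "healthcare.gov",
  "cdc.gov",
  "cms.gov",
  "healthdata.gov",
  "github.com",
];

// Hypothetical helper: accept a URL only if its hostname is a listed
// domain or a subdomain of one (assumed matching rule).
function isWhitelisted(rawUrl, allowed = ALLOWED_DOMAINS) {
  let hostname;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // not a parseable absolute URL
  }
  return allowed.some(
    (domain) => hostname === domain || hostname.endsWith("." + domain)
  );
}
```

Matching on the "." + domain suffix rather than a plain substring keeps look-alike hosts such as evilcdc.gov or cdc.gov.evil.com from passing.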

🏗️ How It Works

Browser                      Node.js Server (localhost:3000)
├─ CSV Explorer App          ├─ Express.js
├─ IndexedDB Cache           ├─ CORS Proxy
├─ Bookmarklet               ├─ Healthcare.gov extractor
└─ Vanilla JavaScript        └─ Socrata API gateway

User clicks bookmark on healthcare.gov
  ↓
Bookmarklet extracts dataset ID
  ↓
App opens with ?dataset=ID&domain=healthcare.gov
  ↓
Server fetches metadata and CSV URLs
  ↓
Browser parses CSV, infers schema
  ↓
Data cached locally in IndexedDB (browser only)
  ↓
User browses, searches, exports (all local)
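
The first two steps of this flow (extracting the dataset ID and opening the app with query parameters) can be sketched as follows. The helper names, the /dataset/ID path pattern, and the app base URL are assumptions based on the example URL in this document; the real bookmarklet may detect datasets differently.

```javascript
// Hypothetical helper: pull the dataset ID out of a URL such as
// https://data.healthcare.gov/dataset/5k5i-wzex
function extractDatasetId(pageUrl) {
  const match = new URL(pageUrl).pathname.match(/\/dataset\/([^/]+)/);
  return match ? match[1] : null;
}

// Hypothetical helper: build the explorer URL the bookmarklet would open.
// `appBaseUrl` is an assumption; use wherever the app is actually served.
function buildExplorerUrl(appBaseUrl, pageUrl) {
  const id = extractDatasetId(pageUrl);
  if (id === null) return null; // no dataset found on this page
  const url = new URL(appBaseUrl);
  url.searchParams.set("dataset", id);
  url.searchParams.set("domain", "healthcare.gov");
  return url.toString();
}
```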

🔒 Security & Privacy

  • No server-side storage - The server is stateless and stores no data
  • Client-side caching - Data stays in your browser (IndexedDB)
  • Domain whitelisting - Only trusted sources allowed through proxy
  • URL validation - Unsafe URLs rejected before fetching
  • HTTPS ready - Works with HTTPS in production
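
This document does not say what counts as an unsafe URL. As one plausible reading, the sketch below rejects non-HTTP schemes, embedded credentials, and loopback or private IPv4 hosts before any fetch is attempted. The function name and every rule in it are assumptions, and it does not cover IPv6 or DNS-based tricks.

```javascript
// Hypothetical helper: basic pre-fetch URL checks (assumed rules).
function isSafeUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  // Only plain web schemes; rejects file:, javascript:, data:, etc.
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  // Reject URLs that embed a username or password.
  if (url.username !== "" || url.password !== "") return false;

  const host = url.hostname;
  if (host === "localhost" || host.endsWith(".localhost")) return false;

  // For literal IPv4 hosts, reject loopback, private, and link-local ranges.
  const isIpv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
  const privateRange = /^(127\.|10\.|192\.168\.|169\.254\.|172\.(1[6-9]|2\d|3[01])\.)/;
  if (isIpv4 && privateRange.test(host)) return false;

  return true;
}
```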

❓ Troubleshooting

Bookmarklet not working

Ensure the server is running on http://localhost:3000 and the bookmarklet code includes the correct PROXY_URL.

CSV not loading

Check browser console (F12 → Console). Common issues:

  • Domain not whitelisted (edit server.js)
  • URL is a dataset page, not a direct CSV file
  • CSV requires authentication

Healthcare.gov dataset not detected

Run this diagnostic in a terminal (not the browser console) to check that the server can resolve the dataset:

curl http://localhost:3000/api/healthcare/dataset/5k5i-wzex

📚 Next Steps

  1. Try the demo: Open demo
  2. Install bookmarklet: Copy or drag the bookmark above
  3. Test with healthcare.gov: Visit https://data.healthcare.gov/dataset/5k5i-wzex and click the bookmark
  4. Deploy to production: See SERVER_DEPLOYMENT.md in the project repo