The national water quality data landscape encompasses 200+ discrete sources spanning federal agencies, 56 state and territorial programs, satellite systems, and thousands of published regulatory documents. This is the plan to connect them all.
PIN connects to 29 data sources spanning USGS, EPA, NOAA, state portals, and regional programs. Of these, 13 return live data; 16 are offline because of endpoint errors, changed URLs, missing headers, failed fetches, or handlers that are not yet wired.
| Source | Data Type | Status | Coverage | Action Required |
|---|---|---|---|---|
| USGS Real-Time (IV) | Stream flow, gage height, DO, temp | LIVE | ~13,000 active sites | — |
| USGS Daily Values | Daily stats from continuous monitors | LIVE | 1.9M+ sites | — |
| USGS Groundwater | Well levels, aquifer data | LIVE | 850K+ sites | — |
| EPA WQP | Unified portal: 1,000+ orgs | LIVE | 430M+ records | — |
| EPA ATTAINS | Assessment units, impairments, TMDLs | LIVE | 565K units | — |
| EPA SDWIS | Drinking water systems, violations | LIVE | 150K systems | — |
| NOAA CO-OPS | Tidal stations, water level, temp | LIVE | 210+ stations | — |
| Chesapeake Bay Program | Bay monitoring, nutrients, DO | LIVE | 20M+ records | — |
| Blue Water Baltimore | Harbor bacteria, nutrients | LIVE | Water Reporter API | — |
| CA CEDEN | California surface water monitoring | LIVE | 10K+ sites | — |
| TX TCEQ | Texas surface water quality | LIVE | 5M+ records | — |
| EJScreen | Environmental justice screening (EPA API offline Feb 2025; Census/SDWIS fallback active) | LIVE | State-level fallback | — |
| CBP DataHub | Chesapeake watershed detail | LIVE | Multiple endpoints | — |
| ICIS-NPDES | Discharge permits, violations | OFFLINE | 400K+ permits | HTTP 404 — verify Envirofacts table names |
| ICIS DMR | Discharge monitoring measurements | OFFLINE | Monthly per permit | HTTP 404 — correct table name |
| ECHO Facilities | CWA compliance status | OFFLINE | Facility-level | HTTP 404 — fix URL params |
| EPA FRS | Facility locations, NPDES links | OFFLINE | All NPDES | HTTP 404 — verify table name |
| PFAS / UCMR | PFAS screening data | OFFLINE | National UCMR | HTTP 406 — add Accept header |
| NPS Water Quality | National Park monitoring | OFFLINE | 11M results | HTTP 406 — add Accept header |
| NY Open Data | New York state WQ | OFFLINE | — | HTTP 404 — find Socrata dataset ID |
| NJ Open Data | New Jersey state WQ | OFFLINE | — | HTTP 404 — find Socrata dataset ID |
| PA Open Data | Pennsylvania state WQ | OFFLINE | — | HTTP 404 — find Socrata dataset ID |
| VA Open Data | Virginia state WQ | OFFLINE | — | HTTP 404 — find Socrata dataset ID |
| MD DNR ERDDAP | Maryland tidal continuous monitoring | OFFLINE | — | HTTP 400 — fix tabledap query |
| Monitor My Watershed | Citizen science sensor network | OFFLINE | — | HTTP 404 — verify API path |
| FL DBHYDRO | South Florida water management | OFFLINE | 35M+ records | Fetch fail — add retry logic |
| State Portal (generic) | State-level discovery | OFFLINE | — | Fetch fail — per-state URL config |
| CDC NWSS | Wastewater pathogen surveillance | OFFLINE | — | Not wired — Socrata endpoint ready |
| ICIS Enforcement | Enforcement actions | OFFLINE | — | Not wired — Envirofacts table |
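
The Action Required column follows a pattern by failure type, which can be sketched as a small triage helper. This is a minimal illustration, not PIN's code: `suggest_action` and `ACTIONS_BY_STATUS` are hypothetical names, and the mapping only restates the table above.

```python
from typing import Optional

# Remediation hints keyed by HTTP status, restating the Action Required column.
ACTIONS_BY_STATUS = {
    400: "fix query format",
    404: "verify URL, table name, or dataset ID",
    406: "add Accept header",
}


def suggest_action(status: Optional[int]) -> str:
    """Return a remediation hint for one source health check.

    `status` is the HTTP status code, or None when the request never
    completed (the "Fetch fail" rows in the table).
    """
    if status is None:
        return "add retry logic or check per-source URL config"
    if 200 <= status < 300:
        return "none"
    return ACTIONS_BY_STATUS.get(status, "investigate manually")
```

For example, `suggest_action(406)` returns the hint that applies to the PFAS / UCMR and NPS Water Quality rows.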
Goal: every source already in the codebase returns live data. The work is limited to URL corrections, header fixes, and endpoint verification; no new architecture is needed. ~20 hours total LOE.
| Source | Data Type | Status | LOE | Action Required |
|---|---|---|---|---|
| ICIS-NPDES | Discharge permits + violations | OFFLINE | 2 hrs | Verify Envirofacts table names at data.epa.gov/efservice/ |
| ICIS DMR | Monthly discharge measurements | OFFLINE | 2 hrs | Correct table name, add state filter |
| ECHO Facilities | CWA compliance status | OFFLINE | 1 hr | Fix URL params for cwa_rest_services |
| FRS WWTPs | Treatment plant locations | OFFLINE | 1 hr | Verify FRS_PROGRAM_FACILITY table |
| PFAS / UCMR | National PFAS screening | OFFLINE | 2 hrs | Check UCMR table name, add Accept header |
| NPS Water Quality | Park-specific monitoring | OFFLINE | 1 hr | Add Accept: application/json header |
| NY / NJ / PA / VA Open Data | State water quality (Socrata) | OFFLINE | 4 hrs | Find correct dataset IDs from state portals |
| MD DNR ERDDAP | Maryland tidal real-time | OFFLINE | 2 hrs | Fix tabledap query format, verify dataset ID |
| Monitor My Watershed | Citizen science sensors | OFFLINE | 2 hrs | Verify EnviroDIY API path |
| FL DBHYDRO | South Florida 35M+ records | OFFLINE | 1 hr | Add retry logic for intermittent FL DEP servers |
| State Portal (generic) | 50 state portals | OFFLINE | 2 hrs | Per-state URL configuration |
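
Two of the fixes above, the `Accept: application/json` header and retry logic for intermittent servers, might look like the following. This is a sketch using only the Python standard library; the function names, the attempt count, and the backoff delays are assumptions, not PIN's actual implementation.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def build_json_request(url: str) -> urllib.request.Request:
    """Build a GET request that explicitly asks for JSON (the HTTP 406 fix)."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_with_retry(
    fetch: Callable[[], bytes],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `fetch` up to `attempts` times, doubling the delay after each failure.

    Network errors, timeouts, and 5xx responses are retried. 4xx responses
    are raised immediately, since retrying a bad URL or header cannot help.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except (urllib.error.URLError, TimeoutError) as err:
            client_error = isinstance(err, urllib.error.HTTPError) and err.code < 500
            if client_error or attempt == attempts - 1:
                raise  # not retryable, or out of attempts
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A caller would wrap the real request in a lambda, for example `fetch_with_retry(lambda: urllib.request.urlopen(build_json_request(url), timeout=30).read())`, where `url` is the source's endpoint. Passing `fetch` and `sleep` as parameters keeps the retry loop testable without network access.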
These sources are identified in the national manifest but are not yet in the codebase. All public, no-auth APIs. Each requires a new fetch handler and registry entry. ~41 hours total LOE.
| Source | Data Type | Status | Coverage | LOE |
|---|---|---|---|---|
| CDC NWSS | Wastewater pathogens (COVID, flu, RSV) | NEW | National treatment plants | 3 hrs |
| NASA Earthdata CMR | Satellite chlorophyll-a, SST, turbidity | NEW | Global catalog | 4 hrs |
| Data.gov Catalog | 1,798+ water quality datasets (meta-index) | NEW | Source discovery engine | 4 hrs |
| EWG Tap Water | 50,000+ utility profiles, contaminant results | NEW | National utilities | 6 hrs |
| NOAA NDBC | 1,300+ ocean/lake buoys, waves, wind, temp | NEW | Coastal monitoring | 4 hrs |
| NOAA NERRS | 29 estuarine research reserves, continuous | NEW | Estuarine reference | 4 hrs |
| EPA NARS | National probabilistic surveys | NEW | Rivers, lakes, coast | 3 hrs |
| USACE | Lock/dam water quality, reservoir data | NEW | Corps infrastructure | 4 hrs |
| US Desalination | Municipal desal plants, capacity | NEW | Data.gov | 2 hrs |
| NEWTS | Brackish water treatment for energy | NEW | USGS/DOE | 2 hrs |
| Tribal WQP | 100+ tribal org submissions | NEW | WQP org filter | 2 hrs |
| SAM.gov Entity API | Registered water-infrastructure contractors per state | NEW | NAICS-filtered entities | 3 hrs |
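
"A new fetch handler and registry entry" could take a shape like the one below. This is an illustrative sketch only: `SourceEntry`, `REGISTRY`, `HANDLERS`, `register`, and the field names are hypothetical and are not taken from PIN's codebase.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SourceEntry:
    """Registry metadata for one data source (field names are illustrative)."""
    key: str
    name: str
    data_type: str
    status: str  # "LIVE", "OFFLINE", "NEW", or "PLANNED"


# Source key -> registry entry, and source key -> handler that fetches records.
REGISTRY: Dict[str, SourceEntry] = {}
HANDLERS: Dict[str, Callable[[], List[dict]]] = {}


def register(entry: SourceEntry):
    """Decorator that adds a fetch handler and its registry entry together."""
    def wrap(handler: Callable[[], List[dict]]) -> Callable[[], List[dict]]:
        if entry.key in REGISTRY:
            raise ValueError(f"duplicate source key: {entry.key}")
        REGISTRY[entry.key] = entry
        HANDLERS[entry.key] = handler
        return handler
    return wrap


@register(SourceEntry("noaa_ndbc", "NOAA NDBC", "Buoy waves, wind, temp", "NEW"))
def fetch_ndbc() -> List[dict]:
    # Placeholder body: a real handler would call the source's API here.
    return []
```

Registering the entry and the handler in one step means a source cannot appear in the registry without a handler behind it.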
Every state environmental agency publishes water quality data through its own portal. Some of that data never reaches WQP. Direct integration catches what federal systems miss. ~120 hours across all 50 states.
| Tier | Integration Approach | Status | Coverage | LOE |
|---|---|---|---|---|
| Tier 1: 10 states | Socrata/open API portals — wire directly | NEW | MD, VA, PA, NY, NJ, FL, CA, TX, OH, MI | ~2 hrs each |
| Tier 2: 20 states | Downloadable CSV/Excel — scheduled scrape + parse | NEW | IL, NC, GA, WA, MN, and 15 more | ~3 hrs each |
| Tier 3: 15 states | Already in WQP — verify freshness | PLANNED | Cross-check vs WQP pull dates | ~1 hr each |
| Tier 4: 5 states + territories | Limited online data — manual outreach or FOIA | PLANNED | Flag for partnership | ~1 hr each |
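
For the Tier 1 Socrata portals, wiring a state directly mostly comes down to the portal domain and the dataset ID. A minimal sketch of building a paged Socrata (SODA) request URL follows; the function name is hypothetical, and the domain and dataset ID used with it must come from each state's portal.

```python
import re
from urllib.parse import urlencode

# Socrata dataset IDs use a "four-by-four" form such as "abcd-1234".
_DATASET_ID = re.compile(r"^[a-z0-9]{4}-[a-z0-9]{4}$")


def socrata_url(domain: str, dataset_id: str, limit: int = 1000, offset: int = 0) -> str:
    """Build a paged SODA resource URL for one dataset on one portal."""
    if not _DATASET_ID.match(dataset_id):
        raise ValueError(f"not a Socrata four-by-four ID: {dataset_id!r}")
    # safe="$" keeps the "$limit" and "$offset" parameter names readable.
    query = urlencode({"$limit": limit, "$offset": offset}, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"
```

Validating the ID format up front catches malformed IDs before any request is made; it does not confirm that the dataset exists on the portal.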
States publish Integrated Reports (305(b)/303(d)) every 2 years containing data 12–18 months fresher than ATTAINS. AI-powered extraction turns thousands of PDFs into structured data with full citation. ~100 hours.
| Source | Data Type | Status | Coverage | LOE |
|---|---|---|---|---|
| 56 Integrated Reports | Assessment data fresher than ATTAINS | PLANNED | 56 state/territory PDFs | 20 hrs collect, 40 hrs extract |
| Consumer Confidence Reports | Annual drinking water test results per utility | PLANNED | ~50,000 reports nationally | Phased |
| TMDL Documents | Load allocations, monitoring data, targets | PLANNED | Thousands nationally | Phased |
| Stormwater Annual Reports | Outfall monitoring, BMP performance per MS4 | PLANNED | Thousands nationally | Phased |
| HAB Reports | Cyanotoxin concentrations, bloom locations | PLANNED | Seasonal per state | Phased |
| Fish Consumption Advisories | Contaminant levels in tissue as WQ proxy | PLANNED | Annual per state | Phased |
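
"Structured data with full citation" implies that every extracted value carries a pointer back to its source document. One possible record shape is sketched below; the class and field names are assumptions for illustration, not PIN's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Where an extracted value came from."""
    document: str  # report title
    state: str     # two-letter state or territory code
    page: int      # page number within the PDF


@dataclass(frozen=True)
class ExtractedRecord:
    """One value pulled from a report, kept together with its citation."""
    waterbody: str
    parameter: str
    value: str
    citation: Citation

    def cite(self) -> str:
        """Format the citation for display next to the value."""
        c = self.citation
        return f"{c.document} ({c.state}), p. {c.page}"
```

Keeping the citation inside the record, rather than in a separate lookup, means a value cannot be stored or displayed without its source.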
Satellite remote sensing fills spatial gaps where no ground monitoring exists. International datasets enable benchmarking. Emerging sources from academia, citizen science, and industrial self-monitoring round out coverage. ~80 hours.
| Source | Data Type | Status | Coverage | LOE |
|---|---|---|---|---|
| NASA Earthdata (MODIS/Landsat) | Chlorophyll-a, SST, turbidity raster | PLANNED | Global coverage | 12 hrs |
| NOAA CoastWatch | Ocean color, sea surface temperature | PLANNED | Gridded raster | 8 hrs |
| Sentinel-2/3 (ESA) | Higher resolution water quality indices | PLANNED | Free, global | 12 hrs |
| GEMStat (UN) | Global water quality from 100+ countries | PLANNED | International benchmark | 6 hrs |
| HydroShare | Published academic monitoring datasets | PLANNED | University research | 8 hrs |
| Citizen Science Networks | ALLARM, Izaak Walton League, state volunteer programs | PLANNED | Quality-flagged | 10 hrs |
| TRI (Toxics Release Inventory) | Industrial self-monitoring, partially via Envirofacts | PLANNED | National facilities | 4 hrs |
| USDA NRCS | Agricultural runoff, edge-of-field monitoring | PLANNED | Conservation practice data | 6 hrs |
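
For the NASA Earthdata row, satellite granules covering a region and date range can be located through the CMR granule search endpoint. The sketch below only builds the search URL; the function name is hypothetical, and the product `short_name` is a placeholder that must be replaced with a real collection name.

```python
from typing import Tuple
from urllib.parse import urlencode

CMR_GRANULES = "https://cmr.earthdata.nasa.gov/search/granules.json"


def cmr_granule_url(
    short_name: str,
    bbox: Tuple[float, float, float, float],
    start: str,
    end: str,
    page_size: int = 10,
) -> str:
    """Build a CMR granule search URL.

    `bbox` is (west, south, east, north) in decimal degrees;
    `start` and `end` are ISO 8601 timestamps.
    """
    west, south, east, north = bbox
    params = {
        "short_name": short_name,
        "bounding_box": f"{west},{south},{east},{north}",
        "temporal": f"{start},{end}",
        "page_size": page_size,
    }
    # Keep commas and colons unescaped so the URL stays readable.
    return f"{CMR_GRANULES}?{urlencode(params, safe=',:')}"
```

The search response lists matching granules; downloading the underlying raster files is a separate step that this sketch does not cover.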
PIN is the first platform to unify all of it, show exactly where the gaps are, and fill them with fresher data from state reports, mobile sensors, and lab results.