Data Source Roadmap

From 29 sources to 200+

The national water quality data landscape encompasses 200+ discrete sources spanning federal agencies, 56 state and territorial programs, satellite systems, and thousands of published regulatory documents. This is the plan to connect them all.

Sources Today: 29
90-Day Target: 75+
6-Month Target: 150+
Full Landscape: 200+
Progress: 15 live of 29 wired (target: 200+; milestones at 40, 75, 130, 200)
00. Current State | Today | 29 Sources (15 Live)

PIN connects to 29 data sources spanning USGS, EPA, NOAA, state portals, and regional programs. 15 are returning live data; 14 are offline due to endpoint errors, URL changes, or missing headers.

Source | Data Type | Status | Coverage | Action Required
USGS Real-Time (IV) | Stream flow, gage height, DO, temp | LIVE | ~13,000 active sites | -
USGS Daily Values | Daily stats from continuous monitors | LIVE | 1.9M+ sites | -
USGS Groundwater | Well levels, aquifer data | LIVE | 850K+ sites | -
EPA WQP | Unified portal: 1,000+ orgs | LIVE | 430M+ records | -
EPA ATTAINS | Assessment units, impairments, TMDLs | LIVE | 565K units | -
EPA SDWIS | Drinking water systems, violations | LIVE | 150K systems | -
NOAA CO-OPS | Tidal stations, water level, temp | LIVE | 210+ stations | -
Chesapeake Bay Program | Bay monitoring, nutrients, DO | LIVE | 20M+ records | -
Blue Water Baltimore | Harbor bacteria, nutrients | LIVE | Water Reporter API | -
CA CEDEN | California surface water monitoring | LIVE | 10K+ sites | -
TX TCEQ | Texas surface water quality | LIVE | 5M+ records | -
EJScreen | Environmental justice screening (EPA API offline Feb 2025; Census/SDWIS fallback active) | LIVE | State-level fallback | -
CBP DataHub | Chesapeake watershed detail | LIVE | Multiple endpoints | -
ICIS-NPDES | Discharge permits, violations | OFFLINE | 400K+ permits | HTTP 404: verify Envirofacts table names
ICIS DMR | Discharge monitoring measurements | OFFLINE | Monthly per permit | HTTP 404: correct table name
ECHO Facilities | CWA compliance status | OFFLINE | Facility-level | HTTP 404: fix URL params
EPA FRS | Facility locations, NPDES links | OFFLINE | All NPDES | HTTP 404: verify table name
PFAS / UCMR | PFAS screening data | OFFLINE | National UCMR | HTTP 406: add Accept header
NPS Water Quality | National Park monitoring | OFFLINE | 11M results | HTTP 406: add Accept header
NY Open Data | New York state WQ | OFFLINE | - | HTTP 404: find Socrata dataset ID
NJ Open Data | New Jersey state WQ | OFFLINE | - | HTTP 404: find Socrata dataset ID
PA Open Data | Pennsylvania state WQ | OFFLINE | - | HTTP 404: find Socrata dataset ID
VA Open Data | Virginia state WQ | OFFLINE | - | HTTP 404: find Socrata dataset ID
MD DNR ERDDAP | Maryland tidal continuous monitoring | OFFLINE | - | HTTP 400: fix tabledap query
Monitor My Watershed | Citizen science sensor network | OFFLINE | - | HTTP 404: verify API path
FL DBHYDRO | South Florida water management | OFFLINE | 35M+ records | Fetch fail: add retry logic
State Portal (generic) | State-level discovery | OFFLINE | - | Fetch fail: per-state URL config
CDC NWSS | Wastewater pathogen surveillance | OFFLINE | - | Not wired: Socrata endpoint ready
ICIS Enforcement | Enforcement actions | OFFLINE | - | Not wired: Envirofacts table
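
Several of the offline entries above call for the same two fixes: an explicit Accept header (the HTTP 406 rows) and retry logic (the fetch-fail rows). A minimal Python sketch of both follows. The function name `fetch_json`, the retry count, the backoff schedule, and the set of retryable status codes are illustrative assumptions, not PIN's actual implementation.

```python
import json
import time
import urllib.error
import urllib.request

# Assumption: these server-side codes are treated as transient and worth retrying.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def fetch_json(url, *, attempts=3, backoff_s=1.0, timeout_s=30,
               opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch JSON with an explicit Accept header, retrying transient failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    last_error = None
    for attempt in range(attempts):
        try:
            with opener(request, timeout=timeout_s) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # Client errors such as 404 or 406 will not fix themselves: fail fast.
            if err.code not in RETRYABLE_STATUS:
                raise
            last_error = err
        except OSError as err:
            # Network-level failures: DNS errors, refused connections, timeouts.
            last_error = err
        if attempt < attempts - 1:
            sleep(backoff_s * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    raise last_error
```

A 404 is raised immediately, since retrying cannot fix a wrong table name or dataset ID; only network failures and the listed server-side codes are retried.
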
01. Fix Offline Sources | Week 1–2 | 29/29 Online

Goal: every source already in the codebase returns live data. The work is URL corrections, header fixes, and endpoint verification; no new architecture is needed. Total level of effort (LOE): ~20 hours.

Source | Data Type | Status | LOE | Action Required
ICIS-NPDES | Discharge permits + violations | OFFLINE | 2 hrs | Verify Envirofacts table names at data.epa.gov/efservice/
ICIS DMR | Monthly discharge measurements | OFFLINE | 2 hrs | Correct table name, add state filter
ECHO Facilities | CWA compliance status | OFFLINE | 1 hr | Fix URL params for cwa_rest_services
FRS WWTPs | Treatment plant locations | OFFLINE | 1 hr | Verify FRS_PROGRAM_FACILITY table
PFAS / UCMR | National PFAS screening | OFFLINE | 2 hrs | Check UCMR table name, add Accept header
NPS Water Quality | Park-specific monitoring | OFFLINE | 1 hr | Add Accept: application/json header
NY / NJ / PA / VA Open Data | State water quality (Socrata) | OFFLINE | 4 hrs | Find correct dataset IDs from state portals
MD DNR ERDDAP | Maryland tidal real-time | OFFLINE | 2 hrs | Fix tabledap query format, verify dataset ID
Monitor My Watershed | Citizen science sensors | OFFLINE | 2 hrs | Verify EnviroDIY API path
FL DBHYDRO | South Florida 35M+ records | OFFLINE | 1 hr | Add retry logic for intermittent FL DEP servers
State Portal (generic) | 50 state portals | OFFLINE | 2 hrs | Per-state URL configuration
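
For the MD DNR ERDDAP row, one plausible cause of an HTTP 400 is a malformed constraint: ERDDAP's tabledap protocol expects a comma-separated variable list followed by `&`-prefixed constraints, with characters such as `>` and `<` percent-encoded. The Python sketch below builds such a URL. The server URL, dataset ID, and variable names are placeholders, since the real ones still have to be verified, and `tabledap_url` is a hypothetical helper name.

```python
from urllib.parse import quote


def tabledap_url(base_url, dataset_id, variables, constraints=(), file_type="json"):
    """Build an ERDDAP tabledap URL: variables first, then &-joined constraints.

    Each constraint is a (variable, operator, value) tuple, e.g.
    ("time", ">=", "2024-01-01T00:00:00Z").
    """
    query = ",".join(variables)
    for name, operator, value in constraints:
        # '>' and '<' are not URL-safe, so percent-encode operator and value.
        query += "&" + name + quote(operator, safe="=") + quote(str(value), safe="")
    return f"{base_url.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{query}"


# Placeholder server and dataset; substitute the verified values.
url = tabledap_url(
    "https://example.invalid/erddap",
    "hypothetical_dataset",
    ["time", "temperature"],
    [("time", ">=", "2024-01-01T00:00:00Z")],
)
print(url)
```

Keeping URL construction in one function means the encoding rules are fixed in a single place once the correct dataset ID is known.
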
02. New Federal Sources | Week 3–4 | 40+ Sources

Sources identified in the national manifest not yet in the codebase. All public, no-auth APIs. Each requires a new fetch handler and registry entry. ~38 hours total LOE.

Source | Data Type | Status | Coverage | LOE
CDC NWSS | Wastewater pathogens (COVID, flu, RSV) | NEW | National treatment plants | 3 hrs
NASA Earthdata CMR | Satellite chlorophyll-a, SST, turbidity | NEW | Global catalog | 4 hrs
Data.gov Catalog | 1,798+ water quality datasets (meta-index) | NEW | Source discovery engine | 4 hrs
EWG Tap Water | 50,000+ utility profiles, contaminant results | NEW | National utilities | 6 hrs
NOAA NDBC | 1,300+ ocean/lake buoys, waves, wind, temp | NEW | Coastal monitoring | 4 hrs
NOAA NERRS | 29 estuarine research reserves, continuous | NEW | Estuarine reference | 4 hrs
EPA NARS | National probabilistic surveys | NEW | Rivers, lakes, coast | 3 hrs
USACE | Lock/dam water quality, reservoir data | NEW | Corps infrastructure | 4 hrs
US Desalination | Municipal desal plants, capacity | NEW | Data.gov | 2 hrs
NEWTS | Brackish water treatment for energy | NEW | USGS/DOE | 2 hrs
Tribal WQP | 100+ tribal org submissions | NEW | WQP org filter | 2 hrs
SAM.gov Entity API | Registered water-infrastructure contractors per state | NEW | NAICS-filtered entities | 3 hrs
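
Each new source needs a fetch handler and a registry entry. One way this pairing could look is sketched below in Python. The names `SourceEntry`, `REGISTRY`, `register_source`, and `run_all` are hypothetical, and the handler returns placeholder data rather than calling a real endpoint; PIN's actual registry may be structured differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SourceEntry:
    source_id: str
    name: str
    status: str                      # e.g. "NEW", "LIVE", "OFFLINE"
    fetch: Callable[[], List[dict]]  # the fetch handler for this source


REGISTRY: Dict[str, SourceEntry] = {}


def register_source(source_id, name, status="NEW"):
    """Decorator that records a fetch handler under a unique source id."""
    def decorator(fetch):
        if source_id in REGISTRY:
            raise ValueError(f"duplicate source id: {source_id}")
        REGISTRY[source_id] = SourceEntry(source_id, name, status, fetch)
        return fetch
    return decorator


@register_source("noaa_ndbc", "NOAA NDBC")
def fetch_ndbc():
    # Placeholder rows; a real handler would call the source's API here.
    return [{"station": "EXAMPLE", "water_temp_c": 12.5}]


def run_all():
    """Run every registered handler, capturing failures per source."""
    results = {}
    for source_id, entry in REGISTRY.items():
        try:
            results[source_id] = entry.fetch()
        except Exception as err:  # one failing source must not stop the rest
            results[source_id] = err
    return results
```

With this shape, adding a source is one decorated function, and a failing handler shows up as an error value for that source instead of aborting the whole run.
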
03. All 50 State Direct Portals | Month 2–3 | 75+ Sources

Every state environmental agency publishes water quality data through its own portal, and some of that data never reaches WQP. Direct integration catches what federal systems miss. ~120 hours across all 50 states.

Source | Data Type | Status | Coverage | LOE
Tier 1: 10 states | Socrata/open API portals: wire directly | NEW | MD, VA, PA, NY, NJ, FL, CA, TX, OH, MI | ~2 hrs each
Tier 2: 20 states | Downloadable CSV/Excel: scheduled scrape + parse | NEW | IL, NC, GA, WA, MN, and 15 more | ~3 hrs each
Tier 3: 15 states | Already in WQP: verify freshness | PLANNED | Cross-check vs WQP pull dates | ~1 hr each
Tier 4: 5 states + territories | Limited online data: manual outreach or FOIA | PLANNED | Flag for partnership | ~1 hr each
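
For the Tier 1 Socrata portals, wiring directly mostly comes down to a dataset ID and a paged query. The Python sketch below builds a SODA-style resource URL and pages through results. The domain and the dataset ID `abcd-1234` are placeholders (the real IDs still have to be found on each state portal), and `soda_url` and `iter_rows` are hypothetical helper names.

```python
from urllib.parse import urlencode


def soda_url(domain, dataset_id, *, where=None, order=None, limit=1000, offset=0):
    """Build a Socrata SODA resource URL with paging and optional filters."""
    params = {"$limit": limit, "$offset": offset}
    if where:
        params["$where"] = where
    if order:
        params["$order"] = order
    # Keep '$' readable in parameter names; everything else is percent-encoded.
    return f"https://{domain}/resource/{dataset_id}.json?" + urlencode(params, safe="$")


def iter_rows(fetch_page, page_size=1000):
    """Yield rows page by page until a short page signals the end.

    fetch_page(offset, page_size) is any callable returning a list of rows,
    for example one that requests soda_url(..., limit=page_size, offset=offset).
    """
    offset = 0
    while True:
        rows = fetch_page(offset, page_size)
        yield from rows
        if len(rows) < page_size:
            return
        offset += page_size


# Placeholder domain and dataset ID, for illustration only.
print(soda_url("data.example.invalid", "abcd-1234",
               where="sample_date > '2024-01-01'", limit=500))
```

Separating URL construction from paging keeps the per-state configuration down to a domain, a dataset ID, and an optional filter.
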
04. State Integrated Report Extraction | Month 3–4 | 130+ Sources

States publish Integrated Reports (305(b)/303(d)) every 2 years containing data 12–18 months fresher than ATTAINS. AI-powered extraction turns thousands of PDFs into structured data with full citation. ~100 hours.

Source | Data Type | Status | Coverage | LOE
56 Integrated Reports | Assessment data fresher than ATTAINS | PLANNED | 56 state/territory PDFs | 20 hrs collect, 40 hrs extract
Consumer Confidence Reports | Annual drinking water test results per utility | PLANNED | ~50,000 reports nationally | Phased
TMDL Documents | Load allocations, monitoring data, targets | PLANNED | Thousands nationally | Phased
Stormwater Annual Reports | Outfall monitoring, BMP performance per MS4 | PLANNED | Thousands nationally | Phased
HAB Reports | Cyanotoxin concentrations, bloom locations | PLANNED | Seasonal per state | Phased
Fish Consumption Advisories | Contaminant levels in tissue as WQ proxy | PLANNED | Annual per state | Phased
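
Extraction "with full citation" implies that every structured value carries a pointer back to the page it came from. The Python sketch below shows one possible record shape and a basic traceability check. All class names, field names, and the sample values ("Example Creek" and the quoted sentence) are hypothetical and do not describe PIN's actual schema or any real report.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Citation:
    document_title: str
    report_cycle: str   # e.g. the Integrated Report year
    page: int           # 1-based page number in the source PDF
    quoted_text: str    # the exact passage the value was extracted from


@dataclass(frozen=True)
class ExtractedAssessment:
    waterbody: str
    parameter: str
    status: str
    citation: Citation


def validate(record: ExtractedAssessment) -> List[str]:
    """Return a list of traceability problems; an empty list means none found."""
    problems = []
    quote = record.citation.quoted_text
    if record.citation.page < 1:
        problems.append("page must be 1 or greater")
    if not quote.strip():
        problems.append("missing quoted source text")
    # Every extracted value should be findable in the passage it cites.
    for label, value in (("waterbody", record.waterbody),
                         ("parameter", record.parameter),
                         ("status", record.status)):
        if value.lower() not in quote.lower():
            problems.append(f"{label} not found in quoted text")
    return problems


record = ExtractedAssessment(
    waterbody="Example Creek",
    parameter="dissolved oxygen",
    status="impaired",
    citation=Citation(
        document_title="Hypothetical Integrated Report",
        report_cycle="2024",
        page=112,
        quoted_text="Example Creek is listed as impaired for dissolved oxygen.",
    ),
)
print(validate(record))
```

A check like this cannot prove an extraction is right, but it does flag values that have no support in the cited passage before they enter the dataset.
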
05. Satellite, International & Emerging | Month 4–6 | 200+ Sources

Satellite remote sensing fills spatial gaps where no ground monitoring exists. International datasets enable benchmarking. Emerging sources from academia, citizen science, and industrial self-monitoring round out coverage. ~80 hours.

Source | Data Type | Status | Coverage | LOE
NASA Earthdata (MODIS/Landsat) | Chlorophyll-a, SST, turbidity raster | PLANNED | Global coverage | 12 hrs
NOAA CoastWatch | Ocean color, sea surface temperature | PLANNED | Gridded raster | 8 hrs
Sentinel-2/3 (ESA) | Higher resolution water quality indices | PLANNED | Free, global | 12 hrs
GEMStat (UN) | Global water quality from 100+ countries | PLANNED | International benchmark | 6 hrs
HydroShare | Published academic monitoring datasets | PLANNED | University research | 8 hrs
Citizen Science Networks | ALLARM, Izaak Walton League, state volunteer programs | PLANNED | Quality-flagged | 10 hrs
TRI (Toxics Release Inventory) | Industrial self-monitoring, partially via Envirofacts | PLANNED | National facilities | 4 hrs
USDA NRCS | Agricultural runoff, edge-of-field monitoring | PLANNED | Conservation practice data | 6 hrs
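
To decide where satellite data adds the most, it helps to know which areas have no ground monitoring at all. The Python sketch below is one simple way to find them: bin station coordinates into a regular grid and list the cells with no stations. The function name `empty_cells`, the 0.5-degree cell size, the bounding box, and the station coordinates are illustrative assumptions, not PIN's gap analysis.

```python
import math


def empty_cells(stations, bbox, cell_deg=0.5):
    """Return (col, row) grid cells inside bbox that contain no station.

    stations: iterable of (lon, lat) pairs in decimal degrees.
    bbox: (west, south, east, north) in decimal degrees.
    """
    west, south, east, north = bbox
    cols = math.ceil((east - west) / cell_deg)
    rows = math.ceil((north - south) / cell_deg)
    occupied = set()
    for lon, lat in stations:
        # Ignore stations outside the area of interest.
        if west <= lon < east and south <= lat < north:
            col = int((lon - west) // cell_deg)
            row = int((lat - south) // cell_deg)
            occupied.add((col, row))
    return [(c, r) for r in range(rows) for c in range(cols)
            if (c, r) not in occupied]


# Made-up coordinates for illustration only.
stations = [(-76.9, 38.1), (-76.2, 38.8)]
gaps = empty_cells(stations, bbox=(-77.0, 38.0, -76.0, 39.0))
print(gaps)
```

The empty cells are candidates for satellite-derived estimates; a finer grid gives sharper gaps at the cost of more cells to fill.
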

PIN is the first platform to unify all of it, show exactly where the gaps are, and fill them with fresher data from state reports, mobile sensors, and lab results.
