Advanced Caching Optimization
Hot-path cache orchestration, partial refresh, and source-aware invalidation to reduce latency and stale reads.
The national water quality data landscape encompasses 200+ discrete sources spanning federal agencies, 56 state and territorial programs, satellite systems, and thousands of published regulatory documents. This is the plan to connect them all.
PIN's core source stack is largely online. The roadmap previously overstated outages: most sources once flagged as down are live, and the remaining work is probe hardening or environment-specific configuration rather than missing integrations.
| Source | Data Type | Status | Coverage | Action Required |
|---|---|---|---|---|
| USGS Real-Time (IV) | Stream flow, gage height, DO, temp | LIVE | ~13,000 active sites | — |
| USGS Daily Values | Daily stats from continuous monitors | LIVE | 1.9M+ sites | — |
| USGS Groundwater | Well levels, aquifer data | LIVE | 850K+ sites | — |
| EPA WQP | Unified portal: 1,000+ orgs | LIVE | 430M+ records | — |
| EPA ATTAINS | Assessment units, impairments, TMDLs | LIVE | 565K units | — |
| EPA SDWIS | Drinking water systems, violations | LIVE | 150K systems | — |
| NOAA CO-OPS | Tidal stations, water level, temp | LIVE | 210+ stations | — |
| Chesapeake Bay Program | Bay monitoring, nutrients, DO | LIVE | 20M+ records | — |
| Blue Water Baltimore | Harbor bacteria, nutrients | DEGRADED | Water Reporter API | Needs Water Reporter API key in this environment |
| CA CEDEN | California surface water monitoring | LIVE | 10K+ sites | — |
| TX TCEQ | Texas surface water quality | LIVE | 5M+ records | — |
| EJScreen | Environmental justice screening (EPA API offline Feb 2025; Census/SDWIS fallback active) | LIVE | State-level fallback | — |
| CBP DataHub | Chesapeake watershed detail | LIVE | Multiple endpoints | — |
| ICIS-NPDES | Discharge permits, violations | LIVE | 400K+ permits | — |
| ICIS DMR | Discharge monitoring measurements | LIVE | Monthly per permit | — |
| ECHO Facilities | CWA compliance status | LIVE | Facility-level | — |
| EPA FRS | Facility locations, NPDES links | LIVE | All NPDES | — |
| PFAS / UCMR | PFAS screening data | LIVE | National UCMR | — |
| NPS Water Quality | National Park monitoring | LIVE | 11M results | — |
| NY Open Data | New York state WQ | LIVE | — | — |
| NJ Open Data | New Jersey state WQ | LIVE | — | — |
| PA Open Data | Pennsylvania state WQ | LIVE | — | — |
| VA Open Data | Virginia state WQ | LIVE | — | — |
| MD DNR ERDDAP | Maryland tidal continuous monitoring | OFFLINE | — | Primary MDE ArcGIS endpoint is still returning HTTP 500 in health checks |
| Monitor My Watershed | Citizen science sensor network | LIVE | — | — |
| FL DBHYDRO | South Florida water management | LIVE | 35M+ records | — |
| State Portal (generic) | State-level discovery | DEGRADED | — | Health probe reaches an authenticated internal route; source path is live but should not be classified as public-online |
| CDC NWSS | Wastewater pathogen surveillance | LIVE | — | — |
| ICIS Enforcement | Enforcement actions | LIVE | — | — |
The immediate work is no longer mass recovery. It is making source-health reporting accurate across environments, adding fallback-aware probes, and tightening cache-backed resilience where public endpoints are brittle.
| Source | Data Type | Status | LOE | Action Required |
|---|---|---|---|---|
| Source Health Probes | Per-source health classification | NEW | 3 hrs | Normalize live vs degraded vs offline so timeout-only probes do not read as dead integrations |
| State Portal Proxy | Internal state discovery health | DEGRADED | 1 hr | Probe the state portal through a health-safe internal path instead of an authenticated dashboard route |
| Blue Water Baltimore | Regional harbor monitoring | DEGRADED | 1 hr | Inject Water Reporter key in each deployment environment or suppress environment-specific warning from roadmap claims |
| MD MDE ArcGIS | Integrated report and TMDL overlays | OFFLINE | 2 hrs | Mirror the cron failover chain in health checks instead of probing a single brittle ArcGIS root |
| Public vs Internal Probe Split | Health classification hygiene | NEW | 1 hr | Separate public-source availability from internal-route auth requirements so health dashboards stay truthful |
| Roadmap Status Sync | Roadmap page truthfulness | NEW | 2 hrs | Drive roadmap counts from live source-health output or periodically refresh the static inventory |
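The classification rules described in the table above can be sketched roughly as follows. This is a minimal illustration, not PIN's actual probe code: the `Health`, `ProbeResult`, and `classify` names and the exact rules are assumptions chosen to match the rows above (a missing key or authenticated route reads as degraded, a failover chain is probed primary first, and a timeout alone does not mark a source dead).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Health(Enum):
    LIVE = "LIVE"
    DEGRADED = "DEGRADED"
    OFFLINE = "OFFLINE"


@dataclass
class ProbeResult:
    """Outcome of one probe against one endpoint (hypothetical shape)."""
    endpoint: str
    status_code: Optional[int] = None  # None means no HTTP response arrived
    timed_out: bool = False


def _ok(result: ProbeResult) -> bool:
    return result.status_code is not None and 200 <= result.status_code < 300


def classify(results: list[ProbeResult]) -> Health:
    """Classify a source from probes of its endpoints, primary first.

    Assumed rules (not taken from PIN's code):
      primary answers 2xx              -> LIVE
      only a fallback answers 2xx      -> DEGRADED
      any 401/403 (auth or key issue)  -> DEGRADED
      timeouts only, no hard failure   -> DEGRADED
      anything else                    -> OFFLINE
    """
    if not results:
        return Health.OFFLINE
    if _ok(results[0]):
        return Health.LIVE
    if any(_ok(r) for r in results[1:]):
        return Health.DEGRADED
    if any(r.status_code in (401, 403) for r in results):
        return Health.DEGRADED
    if all(r.timed_out for r in results):
        return Health.DEGRADED
    return Health.OFFLINE
```

Under these assumed rules, a primary endpoint returning HTTP 500 with a healthy fallback reports DEGRADED instead of OFFLINE, which is the behavior the failover-mirroring row asks for.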
Sources identified in the national manifest but not yet in the codebase. All are public, no-auth APIs. Each requires a new fetch handler and registry entry. ~38 hours total LOE.
| Source | Data Type | Status | Coverage | LOE |
|---|---|---|---|---|
| CDC NWSS | Wastewater pathogens (COVID, flu, RSV) | NEW | National treatment plants | 3 hrs |
| NASA Earthdata CMR | Satellite chlorophyll-a, SST, turbidity | NEW | Global catalog | 4 hrs |
| Data.gov Catalog | 1,798+ water quality datasets (meta-index) | NEW | Source discovery engine | 4 hrs |
| EWG Tap Water | 50,000+ utility profiles, contaminant results | NEW | National utilities | 6 hrs |
| NOAA NDBC | 1,300+ ocean/lake buoys, waves, wind, temp | NEW | Coastal monitoring | 4 hrs |
| NOAA NERRS | 29 estuarine research reserves, continuous | NEW | Estuarine reference | 4 hrs |
| EPA NARS | National probabilistic surveys | NEW | Rivers, lakes, coast | 3 hrs |
| USACE | Lock/dam water quality, reservoir data | NEW | Corps infrastructure | 4 hrs |
| US Desalination | Municipal desal plants, capacity | NEW | Data.gov | 2 hrs |
| NEWTS | Brackish water treatment for energy | NEW | USGS/DOE | 2 hrs |
| Tribal WQP | 100+ tribal org submissions | NEW | WQP org filter | 2 hrs |
| SAM.gov Entity API | Registered water-infrastructure contractors per state | NEW | NAICS-filtered entities | 3 hrs |
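As a rough sketch of what "a new fetch handler and registry entry" could look like, the example below pairs a handler function with a registry record through a decorator. Everything here is hypothetical: the `SourceEntry` fields, the `register` decorator, the `noaa-ndbc` source id, and the handler signature are illustrative assumptions, and the handler body is a placeholder that makes no network call.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed handler contract: take a state code, return normalized records.
FetchHandler = Callable[[str], list[dict]]


@dataclass(frozen=True)
class SourceEntry:
    source_id: str
    data_type: str
    requires_auth: bool
    fetch: FetchHandler


REGISTRY: dict[str, SourceEntry] = {}


def register(source_id: str, data_type: str, requires_auth: bool = False):
    """Decorator that records a fetch handler in REGISTRY under source_id."""
    def wrap(fn: FetchHandler) -> FetchHandler:
        if source_id in REGISTRY:
            raise ValueError(f"duplicate source id: {source_id}")
        REGISTRY[source_id] = SourceEntry(source_id, data_type, requires_auth, fn)
        return fn
    return wrap


@register("noaa-ndbc", "Ocean/lake buoys: waves, wind, temp")
def fetch_noaa_ndbc(state: str) -> list[dict]:
    # Placeholder: a real handler would call the public API and normalize
    # the response. No request is made in this sketch.
    return [{"source": "noaa-ndbc", "state": state, "records": []}]
```

Keeping the registry entry next to the handler means a source cannot be registered without a fetch function, and duplicate ids fail loudly at import time.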
Every state environmental agency publishes water quality data through its own portal. Some data never reaches WQP. Direct integration catches what federal systems miss. ~120 hours across all 50 states.
| Source | Data Type | Status | Coverage | LOE |
|---|---|---|---|---|
| Tier 1: 10 states | Socrata/open API portals — wire directly | NEW | MD, VA, PA, NY, NJ, FL, CA, TX, OH, MI | ~2 hrs each |
| Tier 2: 20 states | Downloadable CSV/Excel — scheduled scrape + parse | NEW | IL, NC, GA, WA, MN, and 15 more | ~3 hrs each |
| Tier 3: 15 states | Already in WQP — verify freshness | PLANNED | Cross-check vs WQP pull dates | ~1 hr each |
| Tier 4: 5 states + territories | Limited online data — manual outreach or FOIA | PLANNED | Flag for partnership | ~1 hr each |
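The Tier 3 freshness check (cross-checking state updates against WQP pull dates) might look something like the sketch below. The 90-day threshold, the function names, and the shape of the input mapping are all assumptions for illustration, not values from the roadmap.

```python
from datetime import date, timedelta

# Assumed threshold: how far a state portal may run ahead of WQP before
# the state is flagged for direct integration.
MAX_LAG = timedelta(days=90)


def needs_direct_pull(last_wqp_pull: date, last_state_update: date,
                      max_lag: timedelta = MAX_LAG) -> bool:
    """True when the state portal is newer than the WQP copy by more than max_lag."""
    return last_state_update - last_wqp_pull > max_lag


def flag_states(dates: dict[str, tuple[date, date]]) -> list[str]:
    """Return the sorted state codes whose WQP copy lags the state portal too far.

    `dates` maps a state code to (last_wqp_pull, last_state_update).
    """
    return sorted(
        state for state, (wqp, portal) in dates.items()
        if needs_direct_pull(wqp, portal)
    )
```

With made-up dates, `flag_states({"CO": (date(2025, 1, 1), date(2025, 6, 1))})` would flag `CO`, since the portal is about five months ahead of the WQP pull.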
States publish Integrated Reports (305(b)/303(d)) every 2 years containing data 12–18 months fresher than ATTAINS. AI-powered extraction turns thousands of PDFs into structured data with full citation. ~100 hours.
| Source | Data Type | Status | Coverage | LOE |
|---|---|---|---|---|
| 56 Integrated Reports | Assessment data fresher than ATTAINS | PLANNED | 56 state/territory PDFs | 20 hrs collect, 40 hrs extract |
| Consumer Confidence Reports | Annual drinking water test results per utility | PLANNED | ~50,000 reports nationally | Phased |
| TMDL Documents | Load allocations, monitoring data, targets | PLANNED | Thousands nationally | Phased |
| Stormwater Annual Reports | Outfall monitoring, BMP performance per MS4 | PLANNED | Thousands nationally | Phased |
| HAB Reports | Cyanotoxin concentrations, bloom locations | PLANNED | Seasonal per state | Phased |
| Fish Consumption Advisories | Contaminant levels in tissue as WQ proxy | PLANNED | Annual per state | Phased |
Satellite remote sensing fills spatial gaps where no ground monitoring exists. International datasets enable benchmarking. Emerging sources from academia, citizen science, and industrial self-monitoring round out coverage. ~80 hours.
| Source | Data Type | Status | Coverage | LOE |
|---|---|---|---|---|
| NASA Earthdata (MODIS/Landsat) | Chlorophyll-a, SST, turbidity raster | PLANNED | Global coverage | 12 hrs |
| NOAA CoastWatch | Ocean color, sea surface temperature | PLANNED | Gridded raster | 8 hrs |
| Sentinel-2/3 (ESA) | Higher resolution water quality indices | PLANNED | Free, global | 12 hrs |
| GEMStat (UN) | Global water quality from 100+ countries | PLANNED | International benchmark | 6 hrs |
| HydroShare | Published academic monitoring datasets | PLANNED | University research | 8 hrs |
| Citizen Science Networks | ALLARM, Izaak Walton League, state volunteer programs | PLANNED | Quality-flagged | 10 hrs |
| TRI (Toxics Release Inventory) | Industrial self-monitoring, partially via Envirofacts | PLANNED | National facilities | 4 hrs |
| USDA NRCS | Agricultural runoff, edge-of-field monitoring | PLANNED | Conservation practice data | 6 hrs |
Beyond data-source onboarding, the roadmap expands PIN into a more autonomous intelligence and operations platform: higher-confidence data, earlier anomaly detection, stronger forecasting, and automated execution support.
Hot-path cache orchestration, partial refresh, and source-aware invalidation to reduce latency and stale reads.
Per-source confidence, freshness, completeness, and lineage scoring surfaced directly in user workflows.
Cross-signal anomaly detection on monitoring, compliance, weather, and infrastructure data streams.
Forecast models for impairments, contamination pressure, flood effects, and program workload.
Scenario overlays for heat, drought, flooding, salinity, wildfire smoke, and long-range watershed stress.
Operational ingestion of Earth observation imagery for HABs, turbidity, sediment plumes, and shoreline change.
Shared incident workspace connecting hazards, utilities, response teams, and executive actions in one operational thread.
Auto-generated permit, inspection, monitoring, and executive reporting packages from live system data.
Rules-driven routing, escalations, reminders, and cross-team handoffs tied to live environmental conditions.
Direct sensor, deployment, SCADA-adjacent, and field device integration with alerting and health diagnostics.
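To make the caching item in this list more concrete, here is a minimal in-memory sketch of per-source TTLs, partial refresh, and source-aware invalidation. It is an illustration under stated assumptions, not PIN's cache: the `SourceAwareCache` class, its method names, and the source ids used in the usage note are hypothetical.

```python
import time
from typing import Any, Callable, Optional


class SourceAwareCache:
    """In-memory cache keyed by (source, key) with a TTL per source."""

    def __init__(self, ttl_by_source: dict[str, float],
                 default_ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_by_source
        self._default_ttl = default_ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[tuple[str, str], tuple[float, Any]] = {}

    def _ttl_for(self, source: str) -> float:
        return self._ttl.get(source, self._default_ttl)

    def put(self, source: str, key: str, value: Any) -> None:
        self._entries[(source, key)] = (self._clock(), value)

    def get(self, source: str, key: str) -> Optional[Any]:
        """Return the cached value, or None if missing or past the source TTL."""
        entry = self._entries.get((source, key))
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl_for(source):
            return None
        return value

    def stale_keys(self, source: str) -> list[str]:
        """Expired keys for one source: the candidates for a partial refresh."""
        now = self._clock()
        ttl = self._ttl_for(source)
        return [k for (s, k), (stored_at, _) in self._entries.items()
                if s == source and now - stored_at > ttl]

    def invalidate_source(self, source: str) -> int:
        """Drop every entry for one source and leave other sources untouched."""
        doomed = [sk for sk in self._entries if sk[0] == source]
        for sk in doomed:
            del self._entries[sk]
        return len(doomed)
```

A fast-moving feed could be given a short TTL (say 60 seconds) and a slow-moving one a long TTL, so a refresh job only refetches `stale_keys(source)` instead of rebuilding the whole cache, and an upstream outage or schema change can be handled with `invalidate_source` for that source alone.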
PIN is the first platform to unify all of it, show exactly where the gaps are, and fill them with fresher data from state reports, mobile sensors, and lab results.