Harvey floodplain stress test: the data workup behind the map.
This page is the technical companion to the article. The article tells the
story; this page shows the evidence pipeline, scoring method, caveats, and
what still needs to be tested before the project can claim more.
- 535 Houston-region Harvey high-water marks scored
- 170 intersect Zone X / minimal context
- 9 Zone X marks with 2+ ft above ground at least 0.25 mi from SFHA
- 4 Zone X marks with 2+ ft above ground at least 1 mi from SFHA
Short version:
The tested Variant C result does not say “the maps failed.” It says most
outside-SFHA marks were boundary-near, but a smaller material tail was not:
among Zone X/minimal-context marks with at least 2 ft above ground, 9 were
at least a quarter mile from current SFHA geometry and 4 were at least a mile.
Open the working map or read the narrative article.
Pipeline
Acquire
Pull USGS Harvey peak high-water marks from the CUAHSI HydroShare WFS
service. Preserve the raw GeoJSON and a source manifest.
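A minimal sketch of the acquire step, assuming the HydroShare endpoint behaves as a standard WFS 2.0.0 service that can return GeoJSON through `outputFormat=application/json`. The endpoint URL and layer name below are hypothetical placeholders, not the project's real values, and the manifest fields are one reasonable choice rather than the project's actual format.

```python
import hashlib
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholders: substitute the real HydroShare WFS endpoint
# and layer name recorded in the source manifest.
WFS_BASE_URL = "https://example.org/geoserver/wfs"
LAYER_NAME = "harvey:high_water_marks"


def build_getfeature_url(base_url: str, type_name: str) -> str:
    """Build a WFS 2.0.0 GetFeature request that asks for GeoJSON output."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        "outputFormat": "application/json",
    }
    return f"{base_url}?{urlencode(params)}"


def fetch(url: str, timeout: float = 60.0) -> bytes:
    """Download the raw response body without parsing it."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def manifest_entry(url: str, raw_bytes: bytes, saved_as: str) -> dict:
    """Describe one raw download so a later run can verify it is unchanged."""
    return {
        "source_url": url,
        "saved_as": saved_as,
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "bytes": len(raw_bytes),
        "retrieved_utc": datetime.now(timezone.utc).isoformat(),
    }


def save_raw_with_manifest(url, raw_bytes, raw_path, manifest_path):
    """Write the untouched GeoJSON bytes, then a JSON manifest beside them."""
    with open(raw_path, "wb") as f:
        f.write(raw_bytes)
    entry = manifest_entry(url, raw_bytes, raw_path)
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(entry, f, indent=2)
    return entry
```

In use, `raw = fetch(build_getfeature_url(WFS_BASE_URL, LAYER_NAME))` followed by `save_raw_with_manifest(...)`. Hashing the untouched bytes lets a later run confirm it is scoring the same download.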
Normalize
Keep coordinates, county, waterbody, peak stage, height above ground,
peak date, datum, estimated flags, and notes. Filter to Harris, Brazoria,
Fort Bend, Liberty, Montgomery, Chambers, Galveston, and Waller counties.
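The normalize step can be sketched as one pass over the raw GeoJSON features. The raw property names on the left of `FIELD_MAP` are hypothetical stand-ins for whatever the HydroShare layer actually exposes; only normalized names such as `height_above_ground_ft` and the `is_*_estimated` flags appear on this page. County matching assumes values like "Harris County" or "Harris".

```python
# Hypothetical raw property names (left) mapped to normalized names (right).
FIELD_MAP = {
    "site_no": "site_no",
    "countyName": "county",
    "waterbody": "waterbody",
    "peak_stage": "peak_stage",
    "height_above_gnd": "height_above_ground_ft",
    "peak_date": "peak_date",
    "vertical_datum": "datum",
    "peak_is_estimated": "is_peak_estimated",
    "stage_is_estimated": "is_peak_stage_estimated",
    "time_is_estimated": "is_peak_time_estimated",
    "notes": "notes",
}

REGION_COUNTIES = {
    "harris", "brazoria", "fort bend", "liberty",
    "montgomery", "chambers", "galveston", "waller",
}


def county_key(name):
    """Lower-case a county name and drop a trailing ' county'."""
    key = (name or "").strip().lower()
    if key.endswith(" county"):
        key = key[: -len(" county")]
    return key


def normalize(feature_collection):
    """Keep point features in the eight-county region, with renamed fields."""
    records = []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue
        props = feature.get("properties") or {}
        record = {new: props.get(old) for old, new in FIELD_MAP.items()}
        if county_key(record["county"]) not in REGION_COUNTIES:
            continue
        # GeoJSON stores positions as [longitude, latitude, (optional z)].
        record["lon"], record["lat"] = geometry["coordinates"][:2]
        records.append(record)
    return records
```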
Score
Query FEMA NFHL Flood Hazard Zones at each high-water mark point.
Classify the result as current SFHA context or Zone X/minimal context.
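A sketch of the per-point query, assuming FEMA's public NFHL ArcGIS REST MapServer, where layer 28 has served as the Flood Hazard Zones layer (confirm the layer id in the service directory before relying on it). The parameters are standard ArcGIS REST `query` parameters, and `FLD_ZONE`, `ZONE_SUBTY`, and `SFHA_TF` are NFHL zone fields. Treating `SFHA_TF == "T"` as current SFHA context and every other returned zone as Zone X/minimal context is this sketch's assumption, as is the extra bucket for points where no zone comes back.

```python
from urllib.parse import urlencode

# Assumption: public NFHL MapServer, layer 28 = Flood Hazard Zones.
NFHL_ZONES_QUERY = (
    "https://hazards.fema.gov/arcgis/rest/services/public/NFHL/MapServer/28/query"
)


def nfhl_point_query_url(lon, lat):
    """ArcGIS REST query for flood zones intersecting one WGS84 point."""
    params = {
        "geometry": f"{lon},{lat}",
        "geometryType": "esriGeometryPoint",
        "inSR": "4326",
        "spatialRel": "esriSpatialRelIntersects",
        "outFields": "FLD_ZONE,ZONE_SUBTY,SFHA_TF",
        "returnGeometry": "false",
        "f": "json",
    }
    return f"{NFHL_ZONES_QUERY}?{urlencode(params)}"


def classify_context(zone_attributes):
    """Map the attribute dicts returned for one point to a context label."""
    if not zone_attributes:
        return "no NFHL zone returned"  # hypothetical bucket, not on this page
    # A point on a shared edge can touch several polygons; any SFHA wins.
    if any(a.get("SFHA_TF") == "T" for a in zone_attributes):
        return "current SFHA context"
    return "Zone X/minimal context"
```

The attribute dicts come from `features[i]["attributes"]` in the JSON reply. Caching that reply per site id keeps reruns from re-querying FEMA.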
Distance
Fetch current FEMA SFHA polygons for the Houston-region bounding box,
project points and polygons to EPSG:3083, and calculate distance from each
Zone X/minimal-context Harvey mark to the nearest SFHA polygon geometry.
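Once everything is in EPSG:3083 (NAD83 / Texas Centric Albers Equal Area, in meters), the distance step reduces to point-to-polygon distance. The sketch below assumes coordinates are already projected, for example with `pyproj.Transformer.from_crs("EPSG:4326", "EPSG:3083", always_xy=True)`, and uses only the standard library so the geometry is visible; a production version would more likely call a geometry library such as Shapely. Each polygon is a list of rings, exterior first and holes after.

```python
import math

METERS_PER_MILE = 1609.344  # international mile


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to segment AB in projected units."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment, e.g. a repeated closing vertex
        return math.hypot(px - ax, py - ay)
    # Project P onto AB and clamp the projection to the segment's endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_ring(px, py, ring):
    """Even-odd ray casting test against one ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def distance_to_polygon(px, py, rings):
    """0 if P lies inside the polygon, else distance to its nearest edge."""
    exterior, holes = rings[0], rings[1:]
    if point_in_ring(px, py, exterior) and not any(
        point_in_ring(px, py, h) for h in holes
    ):
        return 0.0
    return min(
        point_segment_distance(px, py, *ring[i], *ring[(i + 1) % len(ring)])
        for ring in rings
        for i in range(len(ring))
    )


def nearest_sfha_miles(px, py, sfha_polygons):
    """Distance in miles from a projected point to the closest SFHA polygon."""
    meters = min(distance_to_polygon(px, py, rings) for rings in sfha_polygons)
    return meters / METERS_PER_MILE
```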
Visualize
Aggregate the observed marks into coarse stress cells so the reader sees
the pattern first, then can toggle individual marks as receipts.
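One way to build coarse stress cells is to bin projected marks on a square grid. The 5 km cell size and the per-cell fields are hypothetical choices for illustration; the page does not state the cell size or what each cell reports.

```python
import math
from collections import defaultdict

CELL_SIZE_M = 5_000.0  # hypothetical cell edge length in EPSG:3083 meters


def stress_cells(marks, cell_size=CELL_SIZE_M):
    """Group marks (projected x/y in meters) into square grid cells."""
    cells = defaultdict(
        lambda: {"marks": 0, "zone_x_marks": 0, "max_height_ft": 0.0}
    )
    for mark in marks:
        # floor() keeps negative coordinates in the correct cell.
        key = (math.floor(mark["x"] / cell_size), math.floor(mark["y"] / cell_size))
        cell = cells[key]
        cell["marks"] += 1
        if mark["context"] == "Zone X/minimal context":
            cell["zone_x_marks"] += 1
        cell["max_height_ft"] = max(cell["max_height_ft"], mark["height_above_ground_ft"])
    return dict(cells)
```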
What is defensible now
- The workflow can reproduce the 535-mark Houston-region subset.
- Current FEMA context can be queried and cached for each mark.
- The distance calculation scored 170 Zone X/minimal-context marks against current SFHA geometry.
- The material subset is 54 Zone X/minimal-context marks with at least 2 ft above ground.
- Among that material subset, 9 marks are at least 0.25 miles from current SFHA geometry and 4 are at least 1 mile away.
What is not proven yet
- This is not yet matched to the exact effective pre-Harvey FIRM panel for each point.
- It does not yet incorporate terrain/elevation or official flood-depth rasters.
- It should not be used as insurance, engineering, emergency, or property advice.
Variant C result
The surprise-distance test is now built:
Zone X mark + height above ground + distance to nearest SFHA boundary
The median Zone X/minimal-context distance is 0.032 miles (about 170 ft), so at
least half of these outside-SFHA marks sit close to a current SFHA boundary. The
tail is what stands out: 9 material marks are at least 0.25 miles from current
SFHA geometry, and 4 are at least 1 mile away.
| Site | County | Waterbody | Above ground | Distance to current SFHA | FEMA context |
|---|---|---|---|---|---|
| TXGAL20912 | Galveston | cloud bayou | 2.6 ft | 1.469 mi | Zone X, minimal hazard |
| TXCHA21578 | Chambers | Oyster Bayou | 2.9 ft | 1.278 mi | Zone X, 0.2 pct annual chance |
| TXGAL20911 | Galveston | cloud bayou | 3.58 ft | 1.214 mi | Zone X, 0.2 pct annual chance |
| TXLIB21725 | Liberty | Ditch | 5.8 ft | 1.085 mi | Zone X, minimal hazard |
| TXHAR22784 | Harris | Buffalo Bayou | 2.9 ft | 0.741 mi | Zone X, minimal hazard |
Full generated table: harvey-surprise-top10.json.
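The Variant C ranking can be sketched as filter, then sort. The record keys (`context`, `height_above_ground_ft`, `distance_to_sfha_mi`) are hypothetical names; the 2 ft threshold and the top-10 cut come from this page. Sorting by distance, descending, puts the farthest material marks first, which matches the order of the receipts table.

```python
import json

ZONE_X = "Zone X/minimal context"


def surprise_rows(marks, min_height_ft=2.0, top_n=10):
    """Material Zone X/minimal-context marks, farthest from SFHA first."""
    material = [
        m for m in marks
        if m["context"] == ZONE_X and m["height_above_ground_ft"] >= min_height_ft
    ]
    material.sort(key=lambda m: m["distance_to_sfha_mi"], reverse=True)
    return material[:top_n]


def write_surprise_json(marks, path="harvey-surprise-top10.json"):
    """Write the ranked rows as the public receipts file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(surprise_rows(marks), f, indent=2)
```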
Version E context comparison
The current FEMA point-intersection split is not balanced: 365 marks fall in
current SFHA context and 170 fall in Zone X/minimal context. The outside group
is shallower on median, but still contains 54 marks at or above 2 ft above ground.
| Current FEMA context | Marks | At least 2 ft above ground | Median above ground | Max above ground | Zero-height marks |
|---|---|---|---|---|---|
| Inside SFHA | 365 | 181 | 1.97 ft | 11.85 ft | 132 |
| Zone X/minimal context | 170 | 54 | 1.07 ft | 7.5 ft | 67 |
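The context comparison is a group-by over the scored marks. This sketch assumes each record carries a `context` label and a numeric `height_above_ground_ft` (hypothetical key names), and it keeps zero-height marks in the median and max, on the assumption that the table does the same, since the data-handling note says they stay in context counts.

```python
from statistics import median


def context_summary(marks, material_ft=2.0):
    """Per-context counts and height statistics, zero-height marks included."""
    heights_by_context = {}
    for mark in marks:
        heights_by_context.setdefault(mark["context"], []).append(
            mark["height_above_ground_ft"]
        )
    return {
        context: {
            "marks": len(heights),
            "material_marks": sum(1 for h in heights if h >= material_ft),
            "median_ft": median(heights),
            "max_ft": max(heights),
            "zero_height": sum(1 for h in heights if h == 0),
        }
        for context, heights in heights_by_context.items()
    }
```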
Version E sensitivity check
The 2 ft material threshold is a screening choice, so the distance tail was
checked against lower and higher height cutoffs. The tail persists at 1 ft
(19 marks at least 0.25 mi out, 4 at least 1 mi) and shrinks as the cutoff
rises, down to a single mark beyond 1 mi at 4+ ft.
| Zone X/minimal height cutoff | Marks | At least 0.05 mi from SFHA | At least 0.25 mi from SFHA | At least 1 mi from SFHA |
|---|---|---|---|---|
| 0+ ft | 170 | 76 | 37 | 8 |
| 1+ ft | 88 | 39 | 19 | 4 |
| 2+ ft | 54 | 23 | 9 | 4 |
| 3+ ft | 34 | 12 | 4 | 2 |
| 4+ ft | 16 | 7 | 1 | 1 |
| 5+ ft | 9 | 5 | 1 | 1 |
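The sensitivity grid reruns the same distance counts at each height cutoff. The cutoffs and distance thresholds come from the table, and `>=` implements "at least". The input is assumed to be the 170 Zone X/minimal-context records, with hypothetical `height_above_ground_ft` and `distance_to_sfha_mi` keys.

```python
HEIGHT_CUTOFFS_FT = (0, 1, 2, 3, 4, 5)
DISTANCE_THRESHOLDS_MI = (0.05, 0.25, 1.0)


def sensitivity_grid(zone_x_marks):
    """One row per height cutoff, counting marks at or beyond each distance."""
    rows = []
    for cutoff in HEIGHT_CUTOFFS_FT:
        subset = [m for m in zone_x_marks if m["height_above_ground_ft"] >= cutoff]
        row = {"cutoff_ft": cutoff, "marks": len(subset)}
        for threshold in DISTANCE_THRESHOLDS_MI:
            row[f"at_least_{threshold}_mi"] = sum(
                1 for m in subset if m["distance_to_sfha_mi"] >= threshold
            )
        rows.append(row)
    return rows
```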
Version E near-vs-far check
Among the 54 material Zone X/minimal-context marks, 45 are less than 0.25 miles
from current SFHA geometry and 9 are at least 0.25 miles away. This keeps the
main claim narrow: most outside marks are boundary-near, while the far tail is
small enough to inspect site by site.
| Distance bucket | Material marks | Median above ground | Max above ground | Median distance | Max distance |
|---|---|---|---|---|---|
| Less than 0.05 mi | 31 | 3.54 ft | 6.2 ft | 0.008 mi | 0.046 mi |
| 0.05 to 0.25 mi | 14 | 3.865 ft | 7.5 ft | 0.102 mi | 0.226 mi |
| 0.25 to 1 mi | 5 | 2.9 ft | 3.7 ft | 0.307 mi | 0.741 mi |
| 1 mi or more | 4 | 3.24 ft | 5.8 ft | 1.246 mi | 1.469 mi |
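The near-vs-far table assigns each material mark to one distance bucket and summarizes each bucket. The half-open intervals (a mark at exactly 0.25 mi lands in "0.25 to 1 mi") are this sketch's assumption, chosen to agree with the "at least 0.25 mi" wording elsewhere on the page. Record keys are hypothetical.

```python
import math
from statistics import median

BUCKETS = (
    ("Less than 0.05 mi", 0.0, 0.05),
    ("0.05 to 0.25 mi", 0.05, 0.25),
    ("0.25 to 1 mi", 0.25, 1.0),
    ("1 mi or more", 1.0, math.inf),
)


def bucket_for(distance_mi):
    """Return the label of the half-open [lo, hi) bucket holding a distance."""
    for label, lo, hi in BUCKETS:
        if lo <= distance_mi < hi:
            return label
    raise ValueError(f"distance out of range: {distance_mi!r}")


def near_far_table(material_marks):
    """Count and summarize material marks per distance bucket."""
    grouped = {label: [] for label, _, _ in BUCKETS}
    for mark in material_marks:
        grouped[bucket_for(mark["distance_to_sfha_mi"])].append(mark)
    table = {}
    for label, rows in grouped.items():
        if not rows:
            continue  # empty buckets are omitted rather than reported as zero
        heights = [r["height_above_ground_ft"] for r in rows]
        distances = [r["distance_to_sfha_mi"] for r in rows]
        table[label] = {
            "material_marks": len(rows),
            "median_ft": median(heights),
            "max_ft": max(heights),
            "median_mi": median(distances),
            "max_mi": max(distances),
        }
    return table
```

Fed the four receipts at 1 mi or more from the Variant C table, it reproduces that row's 3.24 ft median height and 1.246 mi median distance.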
Version E data handling
Zero-height records are retained as source observations with
height_above_ground_ft = 0. They remain in the 535-mark scored set
and in context counts, but they are excluded from material-threshold tables such
as the 2+ ft outside-SFHA test.
Estimated flags are preserved as audit fields and are not used as filters in this
version. Counts below are for the 535 scored Houston-region marks.
| Field | Definition used here | True | False |
|---|---|---|---|
| is_peak_estimated | Peak value includes an estimated component in the normalized source record. | 530 | 5 |
| is_peak_stage_estimated | Peak stage was flagged as estimated in the normalized source record. | 393 | 142 |
| is_peak_time_estimated | Peak timing was flagged as estimated in the normalized source record. | 532 | 3 |
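The flag counts are a tally, not a filter. This sketch assumes flags may arrive as booleans or as strings such as "true"/"false" and coerces them before counting; a missing flag is counted as False here, which is an assumption the real pipeline may handle differently.

```python
from collections import Counter

FLAG_FIELDS = ("is_peak_estimated", "is_peak_stage_estimated", "is_peak_time_estimated")
TRUTHY_STRINGS = {"true", "t", "1", "yes", "y"}


def as_bool(value):
    """Coerce a flag value; strings are parsed because bool("false") is True."""
    if isinstance(value, str):
        return value.strip().lower() in TRUTHY_STRINGS
    return bool(value)


def flag_audit(marks):
    """Count True/False per estimated flag without dropping any record."""
    audit = {}
    for field in FLAG_FIELDS:
        counts = Counter(as_bool(m.get(field)) for m in marks)
        audit[field] = {"True": counts[True], "False": counts[False]}
    return audit
```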
Measurement plan
For the public site, track the funnel without collecting more personal data
than needed:
- LinkedIn post variant using UTM tags:
utm_content=harvey_a or utm_content=harvey_b.
- Article page view and scroll-depth completion.
- Click-through from article to this data science workup.
- Time on this page and outbound clicks to source manifest, map, notebook, or repo.
- Aggregate counts by page and referrer only; keep raw IP/user-agent logs siloed from the public proof packet.
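A minimal sketch of two mechanical pieces of this plan: adding the `utm_content` variant to a shared link, and reducing raw page-view events to page and referrer counts. The event dict shape is hypothetical; the point is that the aggregate never reads IP or user-agent fields, so those can stay in a separate log.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

VARIANTS = {"a", "b"}


def tag_url(url, variant):
    """Set utm_content=harvey_<variant>, keeping other query params."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    parts = urlsplit(url)
    # Drop any existing utm_content so a link is never tagged twice.
    query = [
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != "utm_content"
    ]
    query.append(("utm_content", f"harvey_{variant}"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def aggregate_views(events):
    """Count views by (page, referrer); all other event fields are ignored."""
    return Counter((e["page"], e.get("referrer") or "(direct)") for e in events)
```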