# Technical: Why H3?
Why iSamples uses Uber’s H3 hexagonal grid for spatial aggregation, and why specifically resolutions 4 / 6 / 8
The progressive globe and the Interactive Explorer both render millions of samples by aggregating points into pre-computed H3 cells at three resolutions. This page documents why H3, why those resolutions, and what we considered before adopting it.
H3 is a hierarchical hexagonal grid system, originally built by Uber for ride-sharing analytics and released as open source in 2018. We use it because hexagons have uniform neighbor distance (square grids do not), the index is a single 64-bit integer (cheap to store, fast to filter), and DuckDB has first-class H3 support. We pre-aggregate at resolutions 4 / 6 / 8 — chosen so each tier file fits comfortably in browser memory and roughly matches a continental / regional / neighborhood zoom band on the globe.
## 1 What H3 is
H3 partitions the Earth’s surface into a hierarchy of hexagonal cells at 16 resolutions (0 = ~4.4 M km² per cell, 15 = ~0.9 m²). Each cell has a unique 64-bit integer ID. Parents and children at adjacent resolutions don’t perfectly nest (seven smaller hexagons cannot exactly tile the hexagon one level up), but the approximate parent–child relationship is good enough for binning and rollup.
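The 64-bit integer ID is not opaque: the H3 spec packs the mode, resolution, and cell digits into fixed bit fields, so a few shifts recover them without any library. A minimal sketch, using the bit layout from the H3 documentation (bits 59–62 mode, bits 52–55 resolution) and a res-9 example cell from the H3 docs:

```python
# Decode mode and resolution from a raw H3 cell index with plain bit
# arithmetic -- no H3 library needed. Layout per the H3 spec:
# bit 63 reserved, bits 59-62 mode (1 = cell), bits 52-55 resolution,
# bits 45-51 base cell, bits 0-44 fifteen 3-bit child digits.

def h3_mode(index: int) -> int:
    """Extract the index mode; 1 means an ordinary cell index."""
    return (index >> 59) & 0xF

def h3_resolution(index: int) -> int:
    """Extract the resolution (0-15) from a 64-bit H3 cell index."""
    return (index >> 52) & 0xF

# A resolution-9 example cell taken from the H3 documentation.
cell = 0x8928308280FFFFF
print(h3_mode(cell))        # 1  (cell mode)
print(h3_resolution(cell))  # 9
```

This is also why a BIGINT column is enough to carry the full cell identity: resolution is embedded in the value itself.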
| Resource | Why follow it |
|---|---|
| h3geo.org | Authoritative documentation, including the resolution table |
| H3 GitHub | C library (canonical) plus bindings for Python, JS, Java, R, Go |
| Uber Engineering blog (2018) | Original announcement and design rationale |
| Sahr (2008), Discrete Global Grid Systems | Theoretical underpinning for hex-based DGGs |
| Wikipedia: Discrete global grid | Background — H3 has no standalone Wikipedia article yet (April 2026) |
The lack of a standalone Wikipedia article is a real gap and a fair signal of how niche the technology still is outside ride-sharing, mapping, and geo-analytics circles. iSamples chose H3 anyway; this page is part of how we make the choice legible to people who would otherwise have a Wikipedia article to fall back on.
## 2 Why hexagons over squares (or triangles)
Most map-tile systems and database geohash schemes (Bing’s quadkey, the geohash string, S2’s Hilbert-curve-on-square cells) partition the world into squares or rectangles. The classical critique:
| Property | Square grid | Hex grid |
|---|---|---|
| Neighbor count | 8 (4 edge + 4 corner) | 6 (all edge) |
| Distance to all neighbors | Two distinct values (d and d√2) | One value |
| Direction sampling | Anisotropic (axis-aligned) | More uniform |
| Fits a sphere cleanly | No (poles/dateline distortion) | Better (12 pentagons hide the curvature) |
For aggregation queries — “how many samples in this cell and its neighbors?” — uniform neighbor distance is the property that matters. With squares, you must decide whether a diagonal neighbor “counts the same” as an edge neighbor; with hexagons, you don’t.
Triangles satisfy uniform neighbor distance too, but they alternate orientation (point-up / point-down), which makes neighborhood logic and rendering both more complex.
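The neighbor-distance claim is easy to check numerically. A small sketch (axial hex coordinates mapped to Cartesian; lattice spacing and layout are illustrative, not H3's actual projection):

```python
import math

# Centers of the 6 axial-coordinate neighbors of a hex cell, mapped to
# Cartesian coordinates (pointy-top layout, unit spacing).
HEX_NEIGHBORS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def hex_center(q: int, r: int) -> tuple[float, float]:
    return (q + r / 2, r * math.sqrt(3) / 2)

hex_dists = {round(math.hypot(*hex_center(q, r)), 6) for q, r in HEX_NEIGHBORS}

# The 8 neighbors of a square cell on a unit lattice.
SQUARE_NEIGHBORS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0)]
square_dists = {round(math.hypot(dx, dy), 6) for dx, dy in SQUARE_NEIGHBORS}

print(hex_dists)     # one distance to every hex neighbor: {1.0}
print(square_dists)  # two values for squares: 1.0 (edge) and 1.414214 (corner)
```

Every hex neighbor sits at the same distance; the square lattice splits into edge and corner distances, which is exactly the ambiguity the aggregation queries want to avoid.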
## 3 Why H3 over S2 or geohash
S2 (Google) and geohash are the two most common alternatives. Both partition into squares.
- Geohash uses a string-based base-32 encoding. Adjacent cells often have very different prefixes (the poles-and-dateline problem), which makes range queries unreliable for neighborhood lookups. Cells get badly distorted near the poles.
- S2 uses a quad-tree projected onto the six faces of a cube, with cells indexed via a Hilbert curve. The neighborhood logic is sound, but cells are still squares with anisotropic neighbor distance, and the index is not as straightforward to use from SQL.
- H3 is a hex grid with a 64-bit integer index and first-class C / Python / JavaScript / SQL bindings. The DuckDB H3 extension (which we use) operates on the integer index directly: `WHERE h3_res6 = 612345...` is a fast equality scan over a sorted column.
We did not benchmark S2 head-to-head against H3 for this project. The hexagon-vs-square argument plus H3’s DuckDB integration plus the prior art (Eric Kansa flagged H3 in December 2025; pqg #19 added the H3 indexing CLI in February 2026) was enough.
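The geohash prefix problem above can be seen with a toy encoder. This is a minimal from-scratch sketch of the standard geohash bisection scheme (not any particular library's API), used only to show that two points a few hundred meters apart across the Greenwich meridian share no prefix at all:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # the geohash alphabet

def geohash(lat: float, lng: float, length: int = 6) -> str:
    """Minimal geohash encoder: interleave longitude/latitude bisection
    bits (longitude first) and pack each 5 bits into one base-32 char."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    bits, bit_count, out, even = 0, 0, [], True
    while len(out) < length:
        if even:  # longitude bit
            mid = (lng_lo + lng_hi) / 2
            bit = lng >= mid
            lng_lo, lng_hi = (mid, lng_hi) if bit else (lng_lo, mid)
        else:     # latitude bit
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | bit
        bit_count += 1
        even = not even
        if bit_count == 5:
            out.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(out)

# Two points ~150 m apart, straddling the Greenwich meridian in London:
west = geohash(51.5, -0.001)
east = geohash(51.5, 0.001)
print(west[0], east[0])  # g u  -- no shared prefix at any length
```

A prefix range scan on the string index therefore cannot express "this cell and its neighbors" near such a boundary, which is the unreliability the bullet refers to.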
## 4 Why resolutions 4 / 6 / 8 specifically
H3 has 16 resolutions. We pre-aggregate at three of them — 4, 6, 8 — and serve each as a separate parquet file. The choice is driven by:
Cell size at each resolution, from the H3 resolution table:
| Resolution | Avg edge length | Avg cell area | Roughly… |
|---|---|---|---|
| 4 | 22 km | 1,770 km² | Subregion of a small country |
| 5 | 8.5 km | 253 km² | County |
| 6 | 3.2 km | 36 km² | Town |
| 7 | 1.2 km | 5 km² | Neighborhood |
| 8 | 460 m | 0.74 km² | A few city blocks |

Globe altitude bands. The Cesium camera at altitude 1000 km sees roughly continental scale; at 100 km, regional; below ~10 km, neighborhood. Resolutions 4 / 6 / 8 land near the centers of those bands — odd resolutions (5, 7, 9) would also work but offer diminishing returns at the cost of an extra file to ship.
Parquet size budget. The progressive globe loads the lowest-resolution tier first and reaches for higher resolution as the user zooms in. Each tier has to fit comfortably in browser memory:
| File | Resolution | Cells | Size on R2 |
|---|---|---|---|
| `isamples_202601_h3_summary_res4.parquet` | 4 | ~38 K | 580 KB |
| `isamples_202601_h3_summary_res6.parquet` | 6 | ~112 K | 1.6 MB |
| `isamples_202601_h3_summary_res8.parquet` | 8 | ~176 K | 2.4 MB |

Adding res-5 and res-7 tiers would roughly triple the on-the-wire payload for a barely-perceptible improvement in cluster smoothness during zoom transitions.
Skip-by-two leaves obvious detail jumps, which the renderer leans into rather than fights — the user perceives the level change as deliberate progressive disclosure rather than a stutter.
Below res-8 (zoom ≥ ~10, altitude < ~120 km on the globe), aggregation stops mattering: there are usually fewer than a few thousand individual samples in view, and we serve them as points from samples_map_lite.parquet instead.
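The tier choice described in this section reduces to a small lookup in the client. A sketch of that decision, where the 1000 km and 120 km breakpoints come from the prose above but the 250 km regional cutoff is an assumed value for illustration (the real breakpoint table lives in data.qmd §4):

```python
# Map a Cesium camera altitude to the data tier to fetch.
# Breakpoints are illustrative: ~1000 km and ~120 km from the text,
# 250 km is an assumed regional cutoff, not the production value.

def pick_tier(altitude_km: float) -> str:
    if altitude_km >= 1000:  # continental view
        return "h3_res4"
    if altitude_km >= 250:   # regional view (assumed breakpoint)
        return "h3_res6"
    if altitude_km >= 120:   # neighborhood view
        return "h3_res8"
    return "points"          # serve individual samples below ~120 km

print(pick_tier(5000))  # h3_res4
print(pick_tier(60))    # points
```

Because each tier is a separate parquet file, the renderer only ever has one tier's worth of cells in memory per zoom band.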
## 5 What this means for queries
The wide parquet carries `h3_res4`, `h3_res6`, and `h3_res8` BIGINT columns (added by `pqg add-h3` — see pqg PR #19). Filtering or grouping on these columns is a sorted-integer scan, much faster than recomputing H3 cells from (latitude, longitude) at query time. In the browser there is no alternative anyway: DuckDB-WASM doesn’t ship the H3 extension, so without the precomputed columns we would have to ship every point.
Two query patterns are common:
```python
# 1. Aggregate cells in a region (use the dedicated tier file — much smaller)
con.sql("""
    SELECT h3_cell, sample_count, dominant_source, center_lat, center_lng
    FROM read_parquet('https://data.isamples.org/isamples_202601_h3_summary_res6.parquet')
    WHERE center_lat BETWEEN 30 AND 40
      AND center_lng BETWEEN -125 AND -115
    ORDER BY sample_count DESC
""").df()

# 2. Filter the wide file to one or more H3 cells (use the precomputed column)
con.sql("""
    SELECT pid, label, latitude, longitude, n AS source
    FROM read_parquet('https://data.isamples.org/current/wide.parquet')
    WHERE h3_res6 = 612345678901234567  -- one cell at resolution 6
      AND otype = 'MaterialSampleRecord'
    LIMIT 100
""").df()
```

For the full schema and aggregate columns, see the serialization catalog and data downloads.
## 6 What we’d revisit
The current design assumes dominant-source-per-cell is good enough for color encoding on the globe (see the source-color legend in the Interactive Explorer). When two sources are nearly equally represented in the same cell, the rendered color hides the second one. Eric Kansa raised this in our December 2025 discussion; we accepted the simplification for the initial release and may revisit it with per-source counts if the closeout demos or the June 2026 keynote surface the issue.
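A hypothetical sketch of what the revisited rollup could look like: keep the dominant source for coloring, but also record per-source counts and flag cells where the runner-up is nearly as common. The field names and the 0.8 "near tie" threshold are illustrative, not the pipeline's actual schema:

```python
from collections import Counter

def summarize_cell(sources: list[str], tie_ratio: float = 0.8) -> dict:
    """Roll up a cell's sample sources: dominant source for coloring,
    per-source counts, and a near-tie flag for ambiguous cells.
    (Illustrative only -- not the production aggregation schema.)"""
    counts = Counter(sources)
    ranked = counts.most_common()
    dominant, top = ranked[0]
    near_tie = len(ranked) > 1 and ranked[1][1] >= tie_ratio * top
    return {"dominant_source": dominant, "near_tie": near_tie,
            "per_source": dict(counts)}

print(summarize_cell(["SESAR"] * 90 + ["OpenContext"] * 10)["near_tie"])  # False
print(summarize_cell(["SESAR"] * 51 + ["OpenContext"] * 49)["near_tie"])  # True
```

A renderer could then hatch or split near-tie cells instead of silently hiding the second source.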
We also do not ship resolutions 0–3 (continental / global) or 9–15 (sub-meter). The globe never zooms out far enough to need lower resolutions, and the lite parquet covers the high-resolution case better than per-cell aggregates would.
## 7 See also
- `tutorials/progressive_globe.qmd` — the tier files in action on a Cesium globe
- `tutorials/narrow_vs_wide_performance.qmd` — performance comparison across schema shapes, including the H3-augmented wide
- `data.qmd` §4 — the zoom-to-resolution breakpoint table
- `SERIALIZATIONS.md` — full catalog including the H3 tier files
- pqg PR #19 — the build-time CLI that adds H3 columns and emits the tier files