Narrow vs Wide vs H3 Schema Performance Comparison
parquet
performance
benchmarking
h3
This page benchmarks the performance difference between narrow, wide, and wide + H3 parquet schema formats when accessing data “over the wire” via HTTP range requests in DuckDB-WASM.
1 Introduction
1.1 What are Narrow vs Wide Schemas?
The iSamples property graph data can be serialized in three different parquet formats:
| Format | Description | File Size | Row Count | Sources |
|---|---|---|---|---|
| Narrow | Stores relationships as separate edge rows (otype='_edge_') |
~850 MB | ~106M rows | All 4 sources |
| Wide | Stores relationships as p__* columns on entity rows |
~278 MB | ~20M rows | All 4 sources |
| Wide + H3 | Wide format + pre-computed H3 spatial indices (h3_res4/6/8) |
~292 MB | ~20M rows | All 4 sources |
All three formats represent the same underlying data (SESAR, OpenContext, GEOME, Smithsonian) with identical semantics. The wide format eliminates edge rows; the H3 variant adds pre-computed hexagonal spatial indices for accelerated geospatial queries.
A facet summaries file (2KB) provides pre-aggregated counts for instant facet lookups.
Data source: Cloudflare R2 (updated January 2026)
1.2 Why Performance Matters
When using DuckDB-WASM in the browser:
- Data is fetched via HTTP range requests (206 Partial Content)
- Only the columns and row groups needed for a query are downloaded
- Smaller files with fewer rows = fewer bytes to transfer, faster queries
Expected speedup: Wide format should be 2-3x faster based on local benchmarks.
2 Methodology
NoteBenchmarking Approach
- Cold run: First query (includes metadata fetch, JIT compilation) - reported separately
- Warm runs: Runs 2-3 (metadata cached, JIT warmed up)
- Warm median: Median of warm runs only (excludes cold run for fair comparison)
- Sequential execution: Benchmarks run one after another, not concurrently
- Network variability: Results will vary based on your network connection and hardware
Results are shown in real-time as benchmarks complete. Data loading only begins when you click the button.
3 Setup
3.1 Environment Info
3.2 Run Benchmarks
4 Data Validity Check
Before benchmarking, let’s confirm both schemas represent the same underlying data.
Checking data validity…
Code
validityCheck = {
// Only run when button clicked AND databases are initialized
if (runBenchmarks < 1 || !dbNarrow || !dbWide || !dbWideH3) return null;
const loadingDiv = document.getElementById('loading_validity');
const errorDiv = document.getElementById('error_display');
if (loadingDiv) loadingDiv.hidden = false;
try {
// Count rows in narrow
const narrowCount = await dbNarrow.query(`SELECT COUNT(*) as cnt FROM narrow`);
const narrowTotal = narrowCount[0].cnt;
// Count rows in wide
const wideCount = await dbWide.query(`SELECT COUNT(*) as cnt FROM wide`);
const wideTotal = wideCount[0].cnt;
// Count rows in wide+H3
const wideH3Count = await dbWideH3.query(`SELECT COUNT(*) as cnt FROM wide_h3`);
const wideH3Total = wideH3Count[0].cnt;
// Count entity types in narrow (excluding edges)
const narrowEntities = await dbNarrow.query(`
SELECT COUNT(*) as cnt FROM narrow
WHERE otype != '_edge_'
`);
const narrowEntityCount = narrowEntities[0].cnt;
// Count samples in all three
const narrowSamples = await dbNarrow.query(`
SELECT COUNT(*) as cnt FROM narrow
WHERE otype = 'MaterialSampleRecord'
`);
const wideSamples = await dbWide.query(`
SELECT COUNT(*) as cnt FROM wide
WHERE otype = 'MaterialSampleRecord'
`);
const wideH3Samples = await dbWideH3.query(`
SELECT COUNT(*) as cnt FROM wide_h3
WHERE otype = 'MaterialSampleRecord'
`);
return {
narrowTotal: narrowTotal,
wideTotal: wideTotal,
wideH3Total: wideH3Total,
narrowEntities: narrowEntityCount,
narrowSamples: narrowSamples[0].cnt,
wideSamples: wideSamples[0].cnt,
wideH3Samples: wideH3Samples[0].cnt,
sampleMatch: narrowSamples[0].cnt === wideSamples[0].cnt
&& wideSamples[0].cnt === wideH3Samples[0].cnt
};
} catch (e) {
if (errorDiv) {
errorDiv.textContent = `Validity check failed: ${e.message}`;
errorDiv.style.display = 'block';
}
return { error: e.message };
} finally {
if (loadingDiv) loadingDiv.hidden = true;
}
}5 Benchmark 1: Entity Count Query
This benchmark tests a simple COUNT(*) GROUP BY otype query, which requires scanning row metadata.
Running Benchmark 1…
Code
benchmark1 = {
// Wait for validity check to complete first (sequential execution)
if (runBenchmarks < 1 || !validityCheck || validityCheck.error) return null;
const loadingDiv = document.getElementById('loading_b1');
const errorDiv = document.getElementById('error_display');
if (loadingDiv) loadingDiv.hidden = false;
const query = `SELECT otype, COUNT(*) as cnt FROM {table} GROUP BY otype ORDER BY cnt DESC`;
const runs = 3;
try {
// Narrow benchmark
const narrowTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbNarrow.query(query.replace('{table}', 'narrow'));
narrowTimes.push(performance.now() - start);
}
// Wide benchmark
const wideTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbWide.query(query.replace('{table}', 'wide'));
wideTimes.push(performance.now() - start);
}
// Calculate medians (excluding cold run for warm median)
const median = arr => {
const sorted = [...arr].sort((a, b) => a - b);
if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
return sorted[Math.floor(sorted.length / 2)];
};
const warmMedian = arr => {
if (arr.length <= 1) return arr[0] || 0;
const warm = arr.slice(1); // exclude first (cold) run
return median(warm);
};
const narrowMedian = warmMedian(narrowTimes);
const wideMedian = warmMedian(wideTimes);
return {
name: "Entity Count (GROUP BY otype)",
narrowCold: narrowTimes[0],
narrowMedian: narrowMedian,
narrowAll: narrowTimes,
wideCold: wideTimes[0],
wideMedian: wideMedian,
wideAll: wideTimes,
speedup: narrowMedian / wideMedian
};
} catch (e) {
if (errorDiv) {
errorDiv.textContent = `Benchmark 1 failed: ${e.message}`;
errorDiv.style.display = 'block';
}
return { error: e.message };
} finally {
if (loadingDiv) loadingDiv.hidden = true;
}
}6 Benchmark 2: Sample Count by Site
This benchmark counts samples per sampling site, requiring a join between samples and sites.
Query complexity: - Narrow: Requires joining through edge rows - Wide: Direct join via p__* columns
Running Benchmark 2…
Code
benchmark2 = {
// Wait for benchmark1 to complete first (sequential execution)
if (runBenchmarks < 1 || !benchmark1 || benchmark1.error) return null;
const loadingDiv = document.getElementById('loading_b2');
const errorDiv = document.getElementById('error_display');
if (loadingDiv) loadingDiv.hidden = false;
// Narrow query: traverse edges to get from sample -> event -> site
const narrowQuery = `
WITH sample_events AS (
SELECT
e.o[1] as event_id,
s.row_id as sample_id
FROM narrow s
JOIN narrow e ON s.row_id = e.s AND e.p = 'produced_by'
WHERE s.otype = 'MaterialSampleRecord'
),
event_sites AS (
SELECT
se.sample_id,
e2.o[1] as site_id
FROM sample_events se
JOIN narrow e2 ON se.event_id = e2.s AND e2.p = 'sampling_site'
)
SELECT
site.label,
COUNT(*) as sample_count
FROM event_sites es
JOIN narrow site ON es.site_id = site.row_id
GROUP BY site.label
ORDER BY sample_count DESC
LIMIT 10
`;
// Wide query: direct column access
const wideQuery = `
WITH sample_sites AS (
SELECT
s.row_id as sample_id,
e.p__sampling_site[1] as site_id
FROM wide s
JOIN wide e ON s.p__produced_by[1] = e.row_id
WHERE s.otype = 'MaterialSampleRecord'
AND e.otype = 'SamplingEvent'
)
SELECT
site.label,
COUNT(*) as sample_count
FROM sample_sites ss
JOIN wide site ON ss.site_id = site.row_id
WHERE site.otype = 'SamplingSite'
GROUP BY site.label
ORDER BY sample_count DESC
LIMIT 10
`;
const runs = 3;
try {
// Narrow benchmark
const narrowTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbNarrow.query(narrowQuery);
narrowTimes.push(performance.now() - start);
}
// Wide benchmark
const wideTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbWide.query(wideQuery);
wideTimes.push(performance.now() - start);
}
const median = arr => {
const sorted = [...arr].sort((a, b) => a - b);
if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
return sorted[Math.floor(sorted.length / 2)];
};
const warmMedian = arr => {
if (arr.length <= 1) return arr[0] || 0;
const warm = arr.slice(1);
return median(warm);
};
const narrowMedian = warmMedian(narrowTimes);
const wideMedian = warmMedian(wideTimes);
return {
name: "Sample Count by Site (multi-join)",
narrowCold: narrowTimes[0],
narrowMedian: narrowMedian,
narrowAll: narrowTimes,
wideCold: wideTimes[0],
wideMedian: wideMedian,
wideAll: wideTimes,
speedup: narrowMedian / wideMedian
};
} catch (e) {
if (errorDiv) {
errorDiv.textContent = `Benchmark 2 failed: ${e.message}`;
errorDiv.style.display = 'block';
}
return { error: e.message };
} finally {
if (loadingDiv) loadingDiv.hidden = true;
}
}7 Benchmark 3: Material Type Distribution
This benchmark aggregates sample counts by material category.
Running Benchmark 3…
Code
benchmark3 = {
// Wait for benchmark2 to complete first (sequential execution)
if (runBenchmarks < 1 || !benchmark2 || benchmark2.error) return null;
const loadingDiv = document.getElementById('loading_b3');
const errorDiv = document.getElementById('error_display');
if (loadingDiv) loadingDiv.hidden = false;
// Narrow query: join through edges to material concepts
const narrowQuery = `
SELECT
c.label as material,
COUNT(*) as sample_count
FROM narrow s
JOIN narrow e ON s.row_id = e.s AND e.p = 'has_material_category'
JOIN narrow c ON e.o[1] = c.row_id
WHERE s.otype = 'MaterialSampleRecord'
GROUP BY c.label
ORDER BY sample_count DESC
LIMIT 10
`;
// Wide query: direct column access to material category
const wideQuery = `
SELECT
c.label as material,
COUNT(*) as sample_count
FROM wide s
JOIN wide c ON s.p__has_material_category[1] = c.row_id
WHERE s.otype = 'MaterialSampleRecord'
AND c.otype = 'IdentifiedConcept'
GROUP BY c.label
ORDER BY sample_count DESC
LIMIT 10
`;
const runs = 3;
try {
// Narrow benchmark
const narrowTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbNarrow.query(narrowQuery);
narrowTimes.push(performance.now() - start);
}
// Wide benchmark
const wideTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbWide.query(wideQuery);
wideTimes.push(performance.now() - start);
}
const median = arr => {
const sorted = [...arr].sort((a, b) => a - b);
if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
return sorted[Math.floor(sorted.length / 2)];
};
const warmMedian = arr => {
if (arr.length <= 1) return arr[0] || 0;
const warm = arr.slice(1);
return median(warm);
};
const narrowMedian = warmMedian(narrowTimes);
const wideMedian = warmMedian(wideTimes);
return {
name: "Material Type Distribution",
narrowCold: narrowTimes[0],
narrowMedian: narrowMedian,
narrowAll: narrowTimes,
wideCold: wideTimes[0],
wideMedian: wideMedian,
wideAll: wideTimes,
speedup: narrowMedian / wideMedian
};
} catch (e) {
if (errorDiv) {
errorDiv.textContent = `Benchmark 3 failed: ${e.message}`;
errorDiv.style.display = 'block';
}
return { error: e.message };
} finally {
if (loadingDiv) loadingDiv.hidden = true;
}
}8 Benchmark 4: Geospatial Bounding Box Query
This benchmark counts samples within the western United States (lat 32-49, lon -125 to -104), comparing lat/lon filtering on wide format versus H3-cell-based filtering on the H3-indexed file.
Running Benchmark 4…
Code
benchmark4 = {
if (runBenchmarks < 1 || !benchmark3 || benchmark3.error) return null;
const loadingDiv = document.getElementById('loading_b4');
const errorDiv = document.getElementById('error_display');
if (loadingDiv) loadingDiv.hidden = false;
// Wide: baseline lat/lon bounding box
const wideQuery = `
SELECT COUNT(*) as cnt FROM wide
WHERE otype = 'MaterialSampleRecord'
AND latitude BETWEEN 32 AND 49
AND longitude BETWEEN -125 AND -104
`;
// Wide+H3: H3-accelerated — first identify cells in the bbox, then filter by cells
const h3Query = `
WITH cells AS (
SELECT DISTINCT h3_res4 FROM wide_h3
WHERE latitude BETWEEN 32 AND 49
AND longitude BETWEEN -125 AND -104
AND otype = 'MaterialSampleRecord'
)
SELECT COUNT(*) as cnt FROM wide_h3
WHERE h3_res4 IN (SELECT h3_res4 FROM cells)
AND otype = 'MaterialSampleRecord'
`;
const runs = 3;
try {
const wideTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbWide.query(wideQuery);
wideTimes.push(performance.now() - start);
}
const h3Times = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbWideH3.query(h3Query);
h3Times.push(performance.now() - start);
}
const median = arr => {
const sorted = [...arr].sort((a, b) => a - b);
if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
return sorted[Math.floor(sorted.length / 2)];
};
const warmMedian = arr => {
if (arr.length <= 1) return arr[0] || 0;
return median(arr.slice(1));
};
const wideMedian = warmMedian(wideTimes);
const h3Median = warmMedian(h3Times);
return {
name: "Geospatial BBox (Western US)",
wideCold: wideTimes[0],
wideMedian: wideMedian,
wideAll: wideTimes,
h3Cold: h3Times[0],
h3Median: h3Median,
h3All: h3Times,
speedup: wideMedian / h3Median
};
} catch (e) {
if (errorDiv) {
errorDiv.textContent = `Benchmark 4 failed: ${e.message}`;
errorDiv.style.display = 'block';
}
return { error: e.message };
} finally {
if (loadingDiv) loadingDiv.hidden = true;
}
}9 Benchmark 5: Facet Aggregation — Full Scan vs Pre-computed Summary
This benchmark compares full-scan source aggregation on the wide file versus a pre-computed 2KB facet summary file.
Running Benchmark 5…
Code
benchmark5 = {
if (runBenchmarks < 1 || !benchmark4 || benchmark4.error) return null;
const loadingDiv = document.getElementById('loading_b5');
const errorDiv = document.getElementById('error_display');
if (loadingDiv) loadingDiv.hidden = false;
// Full scan: aggregate source counts from full wide file
const fullScanQuery = `
SELECT n, COUNT(*) as cnt FROM wide
WHERE otype = 'MaterialSampleRecord'
GROUP BY n
`;
// Pre-computed: read from 2KB summary file
const summaryQuery = `
SELECT facet_value, count FROM read_parquet('${summariesUrl}')
WHERE facet_type = 'source'
`;
const runs = 3;
try {
const fullScanTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbWide.query(fullScanQuery);
fullScanTimes.push(performance.now() - start);
}
const summaryTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbWide.query(summaryQuery);
summaryTimes.push(performance.now() - start);
}
const median = arr => {
const sorted = [...arr].sort((a, b) => a - b);
if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
return sorted[Math.floor(sorted.length / 2)];
};
const warmMedian = arr => {
if (arr.length <= 1) return arr[0] || 0;
return median(arr.slice(1));
};
const fullScanMedian = warmMedian(fullScanTimes);
const summaryMedian = warmMedian(summaryTimes);
return {
name: "Facet Aggregation (full scan vs summary)",
fullScanCold: fullScanTimes[0],
fullScanMedian: fullScanMedian,
fullScanAll: fullScanTimes,
summaryCold: summaryTimes[0],
summaryMedian: summaryMedian,
summaryAll: summaryTimes,
speedup: fullScanMedian / summaryMedian
};
} catch (e) {
if (errorDiv) {
errorDiv.textContent = `Benchmark 5 failed: ${e.message}`;
errorDiv.style.display = 'block';
}
return { error: e.message };
} finally {
if (loadingDiv) loadingDiv.hidden = true;
}
}10 Results Summary
11 Technical Notes
11.1 Pitfalls and Considerations
| Consideration | How We Address It |
|---|---|
| Browser caching | First run is “cold” (metadata not cached), subsequent runs are “warm” |
| Network variability | We run 3 iterations and report the warm-run median (exclude cold run) |
| JIT compilation | First run includes JIT overhead; warm runs are more representative |
| Memory limits | 691 MB narrow file may stress browser memory; wide format is safer |
11.2 Schema Differences
Narrow schema stores relationships as edge rows:
-- Edge row example
{otype: '_edge_', s: 123, p: 'produced_by', o: [456]}Wide schema stores relationships as columns:
-- Entity row with relationship columns
{otype: 'MaterialSampleRecord', p__produced_by: [456], p__has_material_category: [789]}Wide+H3 schema adds pre-computed spatial indices:
-- Same as wide, plus H3 hexagonal index columns
{..., h3_res4: 595536348953485311, h3_res6: 604265133842685951, h3_res8: 613003918731886591}The wide format eliminates ~9M edge rows (60% file size reduction). The H3 variant adds ~14MB for spatial index columns that enable cell-based geospatial filtering.
Facet summaries (2KB) pre-compute common aggregations:
{facet_type: 'source', facet_value: 'SESAR', scheme: null, count: 4600000}12 See Also
- Interactive Explorer - Browse samples on a 3D globe
- Search Explorer - Faceted search and filter across all samples
- Deep-Dive Analysis - Comprehensive DuckDB-WASM tutorial