iSamples Data Serializations

A catalog of the parquet files that back the iSamples query substrate

Categories: data, architecture, parquet

Author: iSamples team

Published: July 10, 2026

1. Purpose and scope

iSamples has roughly a dozen parquet files in circulation at any given moment — each with a specific role, a specific upstream parent, and a specific set of downstream consumers (the web Explorer, the Python reference notebook, the progressive globe, the PQG conformance work). Some are primary archival products; others are derived aggregates or caches; still others are source-specific variants published outside the data.isamples.org namespace.

This document is a catalog, not an ingestion guide: it tells you what each file is, where it came from, who consumes it, and where in the spec tree to look for its normative definition. For how to build these files, see the scripts in scripts/ and the converters in pqg/. For how to query them, see query-spec.qmd. For how to cite them, see the Zenodo deposition plan.

All sizes and row counts below were verified by DuckDB DESCRIBE + COUNT(*) against https://data.isamples.org/ on 2026-04-24.

2. The derivation DAG

Zenodo export (doi:10.5281/zenodo.15278211, ~300 MB, 6.7 M samples)
  │   sample-centric, nested STRUCTs (PQG "export" format)
  │
  └─► isamples_202512_narrow.parquet  (820 MB, 101 M rows)
        │   graph-normalized, nodes + _edge_ rows (PQG "narrow")
        │
        └─► isamples_202601_wide.parquet  (278 MB, 20.7 M rows)
              │   entity-centric, p__* relationship arrays (PQG "wide")
              │
              ├─► isamples_202604_wide.parquet  (292 MB, 20.7 M rows)
              │     = 202601 wide + ~47 K OpenContext thumbnails
              │     (see scripts/enrich_wide_with_oc_thumbnails.py)
              │
              ├─► isamples_202601_wide_h3.parquet  (292 MB, 20.7 M)
              │     = wide + h3_res4 / h3_res6 / h3_res8 columns
              │
              ├─► isamples_202601_samples_map_lite.parquet  (60 MB, 6.0 M)
              │     display projection for map points
              │
              ├─► isamples_202601_sample_facets_v2.parquet  (63 MB, 6.0 M)
              │     pid → facet-URI strings for multi-dim filtering
              │
              ├─► isamples_202601_facet_summaries.parquet  (2 KB, 56 rows)
              │     baseline (facet_type, facet_value, count) tuples
              │
              ├─► isamples_202601_facet_cross_filter.parquet  (6 KB, 526 rows)
              │     single-active-filter cross cache
              │
              └─► isamples_202601_h3_summary_res{4,6,8}.parquet
                    geospatial aggregates for the progressive globe
                    (38 K / 112 K / 176 K cells)

Source-specific variants (parallel to the substrate, not derived from it):

oc_isamples_pqg.parquet        (GCS, 11.8 M, narrow, OC-only)
oc_isamples_pqg_wide.parquet   (GCS,  2.5 M, wide,   OC-only)
  └─► serve as upstream for OpenContext thumbnails folded into 202604 wide

Vocabulary labels (parallel to the substrate, sourced from isamplesorg/vocabularies):

vocab_labels.parquet           (58 KB, 537 SKOS concepts)
  └─► consumed by Search Explorer to render facet URIs as prefLabels

Arrows indicate derivation, not containment. The Stage-4 frontend-derived files are rebuilt by isamplesorg.github.io/scripts/build_frontend_derived.py (+ build_vocab_labels.py); the Stage-2 narrow/wide files are rebuilt by pqg/. Note: the currently deployed isamples_202601_* files predate that builder — a fresh build is NOT bit-for-bit identical to them (see DATA_PROVENANCE.md, “deployed 202601 not reproducible”).
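
The derivation arrows can also be held as a small child-to-parent map, which makes questions such as "what is upstream of this file?" or "what needs rebuilding if narrow changes?" mechanical. The following is a minimal sketch, not part of any iSamples script: the function names `lineage` and `descendants` are illustrative, the map copies only single-parent edges drawn in the diagram, and the OpenContext thumbnail input to 202604 wide and the h3_summary files are left out for brevity.

```python
# Child -> parent edges copied from the diagram above (file names only).
# "zenodo.15278211" stands for the Zenodo export at the root of the DAG.
# Simplification: the OC thumbnail input to 202604 wide and the
# h3_summary_res{4,6,8} files are omitted.
PARENT = {
    "isamples_202512_narrow.parquet": "zenodo.15278211",
    "isamples_202601_wide.parquet": "isamples_202512_narrow.parquet",
    "isamples_202604_wide.parquet": "isamples_202601_wide.parquet",
    "isamples_202601_wide_h3.parquet": "isamples_202601_wide.parquet",
    "isamples_202601_samples_map_lite.parquet": "isamples_202601_wide.parquet",
    "isamples_202601_sample_facets_v2.parquet": "isamples_202601_wide.parquet",
    "isamples_202601_facet_summaries.parquet": "isamples_202601_wide.parquet",
    "isamples_202601_facet_cross_filter.parquet": "isamples_202601_wide.parquet",
}


def lineage(name):
    """Return the chain from `name` up to the root, nearest parent first."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain


def descendants(name):
    """Return every file derived, directly or indirectly, from `name`."""
    return sorted(child for child in PARENT if name in lineage(child))
```

For example, `lineage("isamples_202604_wide.parquet")` walks back through 202601 wide and 202512 narrow to the Zenodo export.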

3. Catalog

3.1 Tier: source of truth

File	Role	Size	Rows	Upstream	Consumers	Spec
`zenodo.15278211` export	Aggregated Zenodo export (all 4 sources, sample-centric, nested)	~300 MB	6.7 M	SESAR + OpenContext + GEOME + Smithsonian ingestion	PQG converters (narrow, wide)	PQG §3.3 (export format)

3.2 Tier: graph normalization

File	Role	Size	Rows	Upstream	Consumers	Spec
`isamples_202512_narrow.parquet`	Graph-normalized with explicit `_edge_` rows; canonical archival form	820 MB	101.4 M	Zenodo export	Graph traversals, PQG tutorials, narrow→wide converter, Zenodo archive	PQG §3.1, §4.2
`isamples_202601_wide.parquet`	Entity-centric, relationships as `p__*` arrays; primary analytic substrate	278 MB	20.7 M	narrow	Search Explorer, Python notebook, facet/h3/lite derivations	PQG §3.2, §4.5
`isamples_202604_wide.parquet`	202601 wide + ~47 K OC thumbnails folded in	292 MB	20.7 M	202601 wide + `oc_isamples_pqg.parquet`	`current/wide.parquet` alias points here	PQG §3.2

3.3 Tier: derived aggregates (progressive globe / H3)

File	Role	Size	Rows	Upstream	Consumers	Spec
`isamples_202601_wide_h3.parquet`	Wide with `h3_res{4,6,8}` BIGINT columns pre-joined	292 MB	20.7 M	wide	Deep-Dive Analysis tutorial (H3 filtering without join)	QUERY_SPEC §2.4
`isamples_202601_h3_summary_res4.parquet`	Continental tier: `(h3_cell, sample_count, center_lat, center_lng, dominant_source, source_count, resolution)`	580 KB	38 K	wide_h3	Interactive Explorer globe (zoomed out), Python Explorer H3 tier mode	QUERY_SPEC §2.4
`isamples_202601_h3_summary_res6.parquet`	Regional tier	1.6 MB	112 K	wide_h3	Interactive Explorer globe (mid zoom)	QUERY_SPEC §2.4
`isamples_202601_h3_summary_res8.parquet`	Neighborhood tier	2.4 MB	176 K	wide_h3	Interactive Explorer globe (close zoom)	QUERY_SPEC §2.4

3.4 Tier: display projections

File	Role	Size	Rows	Upstream	Consumers	Spec
`isamples_202601_samples_map_lite.parquet`	Minimum map projection; only `MaterialSampleRecord` rows with coordinates	60 MB	6.0 M	wide (filtered)	Interactive Explorer point-level rendering below ~120 km altitude	QUERY_SPEC §4.1

3.5 Tier: facet caches

File	Role	Size	Rows	Upstream	Consumers	Spec
`isamples_202601_sample_facets_v2.parquet`	`(pid, source, material, context, object_type, label, description, place_name)` — all VARCHAR scalars; each facet column is a single URI per sample (not an array)	63 MB	6.0 M	wide	Search Explorer multi-dim facet filtering	QUERY_SPEC §3.3, §5.1
`isamples_202601_facet_summaries.parquet`	Baseline `(facet_type, facet_value, scheme, count)`	2 KB	56	wide	Every tutorial (instant initial facet counts)	QUERY_SPEC §3.3 tier 1
`isamples_202601_facet_cross_filter.parquet`	Pre-computed counts for single-filter cross-facet queries	6 KB	526	wide	Search Explorer cross-filter UI	QUERY_SPEC §3.3 tier 2a
`<tag>_sample_facet_index.parquet`	Complete per-pid facet index `(pid, source, material_mask, context_mask, object_type_mask, build_id, schema_version)` — one row per located sample, including samples with no tree membership (zero-masked, #306). Scanned by the multi-filter global-view count path (#304/#305).	~60 MB	6.0 M	wide (membership + samp_geo)	Interactive Explorer multi-filter facet counts	§4.12 below
`<tag>_sample_facet_index_meta.parquet`	Tiny trusted manifest `(source, count, build_id, schema_version, total_rows)` — per-source histogram + generation id, built DIRECTLY from `samp_geo` (not by reading back `sample_facet_index`). Read by the explorer’s `facetIndexReady` boot preflight instead of a live GROUP BY scan of the 9.68 MB index (#313 P1). Must always be uploaded/deployed paired with `sample_facet_index` of the same `build_id`.	~1 KB	~30	samp_geo (same source as sample_facet_index)	Interactive Explorer boot-time facet-index readiness check	§4.13 below

3.6 Tier: vocabulary labels

File	Role	Size	Rows	Upstream	Consumers	Spec
`vocab_labels.parquet`	SKOS concept URI → human-readable `pref_label` map (plus `definition`, `alt_labels`, `scheme`); covers material, sample object type, and sampled feature type vocabularies	58 KB	537	`isamplesorg/vocabularies` TTLs (built by `scripts/build_vocab_labels.py`)	Search Explorer (renders facet URIs as prefLabels); any tutorial that surfaces controlled-vocabulary URIs	issue #148

3.7 Tier: alternative export formats (upstream of the aggregated Zenodo export)

The export_client can emit each source’s records in multiple formats; the aggregated Zenodo deposition archives the GeoParquet flavor, but JSONL and CSV are also emitted by the same pipeline and are useful for streaming or human inspection.

File	Role	Size	Rows	Upstream	Consumers	Spec
`isamples_export_*.jsonl`	Streaming JSON export (one sample per line, nested structs)	per query	—	`isamplesorg/export_client` (`isample export -f jsonl`)	Local DuckDB ingestion, STAC catalog generation	export_client docs
`isamples_export_*.csv`	Flat CSV export — convenience only, not authoritative for the query substrate	per query	—	`isamplesorg/export_client` (`isample export -f csv`)	Human inspection	export_client docs
`stac.json` / `manifest.json`	STAC/discovery sidecars emitted with local exports	< 1 KB	—	`isamplesorg/export_client`	STAC browser, local server, refresh workflow	export_client README

3.8 Tier: legacy bindings and convenience copies

File	Role	Size	Rows	Upstream	Consumers	Spec
Solr indexed documents	Legacy search-server binding for the same canonical query dimensions. Not a portable serialization; listed here because QUERY_SPEC §5.3 documents the Solr dialect bindings	N/A	~6 M	`isamplesorg/isamples_inabox` harvest/index pipelines + schema mappings	iSamples Central (API offline as of Aug 2025; Solr schema remains the authoritative precedent for dimension names)	QUERY_SPEC §5.3
H3 + lite CSV twins	Human-readable CSV duplicates of `samples_map_lite.parquet` and `h3_summary_res{4,6,8}.parquet`	~640 MB total	mirror	the corresponding parquet files	Manual inspection only	parquet copies are authoritative; CSV twins excluded from the Zenodo substrate deposition by design

3.9 Tier: source-specific variants (not part of the substrate)

File	Role	Size	Rows	Upstream	Consumers	Spec
`oc_isamples_pqg.parquet` (GCS)	OpenContext-only narrow; carries `thumbnail_url` values absent from the aggregated export	~1.8 GB	11.8 M	OpenContext ETL (Eric Kansa)	`scripts/enrich_wide_with_oc_thumbnails.py` → 202604 wide; PQG development	PQG §3.1
`oc_isamples_pqg_wide.parquet` (GCS)	OpenContext-only wide	~600 MB	2.5 M	OC narrow	OC-specific analyses, PQG benchmarks	PQG §3.2

No OpenContext sidecar file exists yet. Per the sidecar-pattern plan (Raymond endorsed 2026-04-17), thumbnails are currently merged directly into isamples_202604_wide.parquet rather than joined at query time from a sidecar. A future isamples_202601_oc_sidecar.parquet (keyed on pid, with thumbnail_url, is_public, license, media_url, harvested_at) is planned — see project_isamples_sidecar_pattern.md.

4. Per-file detail

URL convention: each file is available at https://data.isamples.org/<filename> (versioned, 1-yr immutable cache) and, where applicable, at https://data.isamples.org/current/<alias> (302 redirect, 5-min cache). Examples below use the versioned URL; swap for the alias when you want “latest.”

4.1 Zenodo export (source of truth)

Role: The raw aggregated Zenodo export — all four sources, sample-centric, nested STRUCTs.
DOI: 10.5281/zenodo.15278211
Headline schema (PQG export, 19 cols): sample_identifier, label, description, produced_by {sampling_site {sample_location {latitude, longitude, ...}}}, etc.
Query pattern: one row per sample; no JOINs needed for basic queries.
DuckDB: download the parquet from Zenodo, then SELECT * FROM read_parquet('isamples_export_*.parquet') LIMIT 10.

4.2 `isamples_202512_narrow.parquet`

Role: PQG narrow format — the canonical, lossless graph-normalized representation.
Headline schema (40 cols): row_id, pid, otype, s, p, o, n, altids, geometry, ...entity-specific columns.... Edges are rows with otype='_edge_' and populated s/p/o.
Query pattern: multi-hop JOIN via _edge_ rows (see PQG §2.2).

DuckDB:

SELECT COUNT(*) FROM read_parquet('https://data.isamples.org/isamples_202512_narrow.parquet')
WHERE otype = 'MaterialSampleRecord';
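
The multi-hop pattern over `_edge_` rows can be illustrated on a handful of in-memory rows before running it against the 820 MB file. This is a rough sketch under stated assumptions: `s` is taken to be a subject row_id and `o` a list of object row_ids; the predicate names are borrowed from the `p__*` column names in §4.3; the toy rows, their pids, and the otype values other than `MaterialSampleRecord` and `_edge_` are invented for illustration.

```python
# Toy rows in the narrow layout: entity rows plus '_edge_' rows with s/p/o.
# Assumption: `s` is a subject row_id and `o` is a list of object row_ids.
# All values below are made up; only the column names come from the text.
ROWS = [
    {"row_id": 1, "otype": "MaterialSampleRecord", "pid": "sample:1"},
    {"row_id": 2, "otype": "SamplingEvent", "pid": "event:1"},
    {"row_id": 3, "otype": "SamplingSite", "pid": "site:1"},
    {"row_id": 10, "otype": "_edge_", "s": 1, "p": "produced_by", "o": [2]},
    {"row_id": 11, "otype": "_edge_", "s": 2, "p": "sampling_site", "o": [3]},
]

BY_ID = {r["row_id"]: r for r in ROWS if r["otype"] != "_edge_"}
EDGES = [r for r in ROWS if r["otype"] == "_edge_"]


def hop(row_ids, predicate):
    """Follow one predicate from a set of subject row_ids to object row_ids."""
    return {o for e in EDGES
            if e["p"] == predicate and e["s"] in row_ids
            for o in e["o"]}


def follow(start_id, predicates):
    """Apply several hops in order; each hop plays the role of one JOIN."""
    current = {start_id}
    for predicate in predicates:
        current = hop(current, predicate)
    return [BY_ID[i]["pid"] for i in sorted(current)]
```

Each call to `hop` corresponds to one self-join of the narrow table on an `_edge_` row, so a two-predicate path such as sample to event to site costs two joins in SQL.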

4.3 `isamples_202601_wide.parquet`

Role: PQG wide format — primary analytic substrate for Explorer + notebook.
Headline schema (49 cols): same core columns as narrow, plus p__produced_by, p__sample_location, p__sampling_site, p__site_location, p__responsibility, p__registrant, p__has_material_category, p__has_context_category, p__has_sample_object_type, p__keywords, p__curation, p__related_resource — each an integer array of target row_ids. Exact DuckDB types are mixed: p__produced_by, p__sample_location, p__sampling_site, p__site_location, p__registrant, p__curation are INTEGER[]; p__has_material_category, p__has_context_category, p__has_sample_object_type, p__keywords, p__responsibility, p__related_resource are BIGINT[].
Column name gotcha: the source column is n on wide/narrow (PQG convention), not source. Alias it in projections (e.g. n AS source) to match what the lite and facet parquets already call it.
Query pattern: entity-centric; relationships via array-element JOIN (see PQG §3.2).

DuckDB:

SELECT n AS source, COUNT(*) AS sample_count
FROM read_parquet('https://data.isamples.org/isamples_202601_wide.parquet')
WHERE otype = 'MaterialSampleRecord'
GROUP BY n
ORDER BY sample_count DESC;
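
To make the array-element JOIN and the `n AS source` alias concrete, here is a small in-memory sketch of what resolving a `p__*` array means. It is illustrative only: the rows, pids, source value, and the `IdentifiedConcept` otype label are assumptions made for the example, and `resolve` and `project` are hypothetical helper names rather than anything shipped in pqg/.

```python
# Toy wide-format rows: relationships are arrays of target row_ids.
# All values are invented; only the column names come from the text above.
WIDE = [
    {"row_id": 1, "otype": "MaterialSampleRecord", "pid": "sample:1",
     "n": "SOURCE_A", "p__has_material_category": [20, 21]},
    {"row_id": 20, "otype": "IdentifiedConcept", "pid": "concept:rock"},
    {"row_id": 21, "otype": "IdentifiedConcept", "pid": "concept:mineral"},
]
BY_ROW_ID = {r["row_id"]: r for r in WIDE}


def resolve(row, column):
    """Return the rows referenced by a p__* array, preserving array order."""
    return [BY_ROW_ID[i] for i in (row.get(column) or []) if i in BY_ROW_ID]


def project(row):
    """Mimic `n AS source`: expose the PQG column `n` under the name `source`."""
    return {"pid": row["pid"], "source": row["n"]}
```

In SQL the same lookup is an unnest of the array followed by a join back to the table on `row_id`; the Python version only shows which ids meet which rows.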

4.4 `isamples_202604_wide.parquet`

Role: 202601 wide enriched with ~47 K OpenContext thumbnails. current/wide.parquet 302-redirects here.
Headline schema: identical to 202601 wide (49 cols). Only the thumbnail_url column on OC MaterialSampleRecord rows is populated differently.
Query pattern: drop-in replacement for 202601 wide; use current/wide.parquet unless you need a pinned version.

DuckDB:

SELECT COUNT(*) FROM read_parquet('https://data.isamples.org/current/wide.parquet')
WHERE thumbnail_url IS NOT NULL;

4.5 `isamples_202601_wide_h3.parquet`

Role: Wide with H3 indices pre-joined, so H3 predicates don’t need a join.
Headline schema (52 cols): wide columns + h3_res4, h3_res6, h3_res8 (BIGINT).
Query pattern: direct H3-cell filtering without an H3 UDF.

DuckDB:

SELECT COUNT(*) FROM read_parquet('https://data.isamples.org/isamples_202601_wide_h3.parquet')
WHERE h3_res6 = 604932829406232575;
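
Because the filter compares raw integers, a cell literal of the wrong resolution simply matches nothing. A quick sanity check is possible without any H3 library, since the H3 index format stores the resolution in bits 52 to 55 of the 64-bit value. The sketch below relies on that published bit layout; the helper names are illustrative, and the assumption that the hex form corresponds to columns such as `h3_res8_hex` is not verified here.

```python
def h3_resolution(cell):
    """Read the resolution nibble (bits 52-55) from a 64-bit H3 index."""
    return (cell >> 52) & 0xF


def h3_hex(cell):
    """Render the integer cell id in lowercase hexadecimal string form."""
    return format(cell, "x")


CELL = 604932829406232575  # the literal used in the query above

# A literal meant for the h3_res6 column should decode to resolution 6.
assert h3_resolution(CELL) == 6
```

Running the check before pasting a literal into a `WHERE h3_res4 = ...` or `WHERE h3_res8 = ...` predicate catches the common mistake of mixing resolutions.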

4.6 `isamples_202601_h3_summary_res{4,6,8}.parquet`

Role: Zoom-adaptive aggregates that back the Cesium progressive globe and the Python Explorer’s “H3 tier” rendering mode.
Headline schema (7 cols, identical across resolutions): h3_cell (UBIGINT — H3 cells are unsigned 64-bit; a signed BIGINT would go negative for high-bit cells), sample_count (INT), center_lat, center_lng (DOUBLE, rounded 6 dp), dominant_source (VARCHAR; ties broken by source name ASC for determinism), source_count (INT), resolution (INT).
Query pattern: fetch the right resolution for the current zoom; no join needed.

DuckDB:

SELECT * FROM read_parquet('https://data.isamples.org/isamples_202601_h3_summary_res6.parquet')
ORDER BY sample_count DESC LIMIT 20;

4.7 `isamples_202601_samples_map_lite.parquet`

Role: Display projection for point-level map rendering. Contains only MaterialSampleRecord rows with valid coordinates.
Headline schema (9 cols): pid, label, source, latitude, longitude, place_name, result_time, h3_res8, h3_res8_hex. No description — it’s in wide only.
Query pattern: the Explorer reads this directly when altitude drops below the point-render threshold.

DuckDB:

SELECT source, COUNT(*) FROM read_parquet('https://data.isamples.org/isamples_202601_samples_map_lite.parquet')
WHERE latitude BETWEEN 32 AND 42 GROUP BY 1;

4.8 `isamples_202601_sample_facets_v2.parquet`

⚠️ Deployed-file caveat: the live isamples_202601_sample_facets_v2.parquet still contains 346,768 bare-root “Material” rows — it predates the #271 selection rule below. The rule describes the builder contract for the next rebuild (verified to drop the root → 0), not the file currently served.

Role: Cross-dimension facet filtering — one row per sample, each facet column holds a single controlled-vocabulary URI.
Headline schema (8 cols, all VARCHAR): pid, source, material, context, object_type, label, description, place_name. material/context/object_type are scalar URI strings, NOT arrays — one row per sample, so a sample tagged with multiple URIs is represented by a single chosen URI. Selection rule: material = the first NON-ROOT concept in the array (the broad root .../material/1.0/material is dropped — #265/#271); root-only samples → NULL material. This is NOT necessarily the leaf/most-specific concept (the arrays are not clean SKOS paths). context/object_type = the first array element ([1]). place_name is a VARCHAR cast of the wide’s VARCHAR[] (note: samples_map_lite keeps place_name as VARCHAR[]). For multi-value accuracy, JOIN back to wide.p__has_*_category.
Query pattern: WHERE material = '<uri>' for exact match; WHERE material ILIKE '%rock%' to substring-match URI fragments.

DuckDB:

SELECT pid, label
FROM read_parquet('https://data.isamples.org/isamples_202601_sample_facets_v2.parquet')
WHERE material ILIKE '%rock%'
LIMIT 10;
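
The selection rule that collapses each facet array to a single URI is easy to misread, so here it is restated as code. This is a sketch of the rule as described above, not the builder's actual implementation: the function names are hypothetical, and the root concept is matched by its path suffix only because the full URI is elided in the text.

```python
# The root concept is identified by its path suffix only; the full URI is
# elided in the text above and is deliberately not reconstructed here.
ROOT_SUFFIX = "/material/1.0/material"


def pick_material(uris):
    """Return the first non-root concept, or None for root-only or empty input."""
    for uri in uris or []:
        if not uri.endswith(ROOT_SUFFIX):
            return uri
    return None


def pick_first(uris):
    """context / object_type rule: the first array element (SQL index [1])."""
    return uris[0] if uris else None
```

Note that `pick_material` returns the first non-root entry in array order, which is not necessarily the most specific concept; that is why multi-value accuracy still requires the join back to wide.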

4.9 `isamples_202601_facet_summaries.parquet`

Role: Baseline (no-filter) facet counts. Loaded by every tutorial at startup.
Headline schema (4 cols, 56 rows): facet_type (source|material|context|object_type), facet_value, scheme, count.
Query pattern: sort by count DESC to render a top-N facet list.

DuckDB:

SELECT * FROM read_parquet('https://data.isamples.org/isamples_202601_facet_summaries.parquet')
WHERE facet_type = 'material' ORDER BY count DESC;

4.10 `isamples_202601_facet_cross_filter.parquet`

Role: Cross-facet counts for the single-active-filter case (QUERY_SPEC §3.3 tier 2a). Avoids recomputing when one facet dimension is active.
Headline schema (7 cols): filter_source, filter_material, filter_context, filter_object_type, facet_type, facet_value, count. Two row kinds: baseline rows have all filter_* NULL (these equal facet_summaries); single-dimension rows have exactly one filter_* non-NULL. Single-dimension rows include self-dimension counts (facet_type == filter dim), which the explorer ignores. (Both kinds are emitted by build_frontend_derived.py and asserted by validate_frontend_derived.py.)
Query pattern: lookup by the active filter to get counts for the remaining dimensions.

DuckDB:

SELECT facet_type, facet_value, count
FROM read_parquet('https://data.isamples.org/isamples_202601_facet_cross_filter.parquet')
WHERE filter_source = 'SESAR'
ORDER BY facet_type, count DESC;
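
The two row kinds and the self-dimension rule can be written down as a short classifier, which may help when consuming the file outside the explorer. The following is an illustrative sketch operating on rows already loaded as dictionaries; `row_kind` and `counts_for` are hypothetical names, and the behavior of raising on rows with more than one active filter is an assumption rather than something the builder is documented to do.

```python
FILTER_COLS = ("filter_source", "filter_material",
               "filter_context", "filter_object_type")


def row_kind(row):
    """Classify a cross-filter row as 'baseline' or 'single'."""
    active = [c for c in FILTER_COLS if row.get(c) is not None]
    if not active:
        return "baseline"
    if len(active) == 1:
        return "single"
    # Assumption: rows with several active filters should never occur.
    raise ValueError("more than one filter_* column is set")


def counts_for(rows, filter_col, filter_value):
    """Return counts for the other dimensions under one active filter.

    Self-dimension rows (facet_type equal to the filtered dimension) are
    skipped, mirroring how the explorer is described as ignoring them.
    """
    dim = filter_col[len("filter_"):]
    return {(r["facet_type"], r["facet_value"]): r["count"]
            for r in rows
            if row_kind(r) == "single"
            and r.get(filter_col) == filter_value
            and r["facet_type"] != dim}
```

Calling `counts_for(rows, "filter_source", "SESAR")` is the in-memory counterpart of the query above with the self-dimension rows removed.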

4.11 `oc_isamples_pqg.parquet` and `oc_isamples_pqg_wide.parquet` (OC variants)

Role: OpenContext-specific PQG files maintained by Eric Kansa. Hosted at https://storage.googleapis.com/opencontext-parquet/, not under data.isamples.org. They are not part of the cross-source substrate — they carry OC-internal detail (notably thumbnail_url) that the aggregated Zenodo export drops.
Headline schema: PQG narrow (40 cols) and wide (47 cols). OC wide has slightly fewer p__* columns than the unified wide — this is schema drift, not semantically meaningful for standard queries.
Consumer: scripts/enrich_wide_with_oc_thumbnails.py uses OC narrow to fill thumbnails into 202604 unified wide. Also used directly in PQG benchmark work.
Future: these become the prototype upstream for per-source sidecars (see the sidecar note at the end of §3.9).

4.13 `<tag>_sample_facet_index_meta.parquet` (tiny trusted manifest, #313 P1)

Role: replaces the explorer’s former boot-time live queries against sample_facet_index.parquet — a SELECT DISTINCT build_id, schema_version plus a full GROUP BY source coverage scan that forced a near-full read of the 9.68 MB / 6 M-row index on every page load (issue #313: this could block multi-filter count readiness for 20–80 s on a slow connection). The explorer’s facetIndexReady cell now fetches this KB-sized manifest instead.
Headline schema (5 cols, one row per non-null/non-empty source): source (VARCHAR), count (BIGINT), build_id (VARCHAR), schema_version (INTEGER), total_rows (BIGINT). build_id and schema_version are the same values written into sample_facet_index for the same build (repeated as constants on every row); total_rows is the full located universe count from samp_geo (COUNT(*), including null/empty-source pids) — matching how sample_facet_index covers all of samp_geo, not just pids with a source (#306).
Independence (Codex requirement): built DIRECTLY from samp_geo — the same authoritative table build_facet_summaries/build_sample_facet_index derive from — and NEVER by reading back sample_facet_index.parquet. Embedding metadata only inside the same index file would not be an independent staleness guarantee; deriving it from the shared upstream source, then validating it independently against the actual on-disk index (below), is.
Validation: validate_frontend_derived.py --index <index file> --index-meta <meta file> (or --dir/--tag auto-discovery) reads the ACTUAL on-disk sample_facet_index.parquet (full scan — fine at CI/batch time, never on the browser critical path), independently recomputes the per-source histogram, build_id, schema_version, and row count, and asserts they match the meta file’s content (relational content, not byte-identical Parquet). Also cross-checked against facet_summaries’ source facet, mirroring the comparison the explorer’s runtime preflight performs.
Build invocation / escape hatch: produced alongside sample_facet_index in a normal build, or with --only sample_facet_index,sample_facet_index_meta. A narrower --only sample_facet_index_meta (used ALONE) builds just this file without forcing a full sample_facet_index rebuild. This is useful for pairing a newly built meta file with an already-deployed index built from the identical wide input (same build_id).
Deployment contract: sample_facet_index_meta and sample_facet_index must always be uploaded to R2 together, with the same build_id — the explorer’s preflight compares meta.build_id against window.__nodeBitsBuild and would (correctly) mark the index failed if a mismatched pair were ever deployed.
Immutability: published under the same versioned tag as its paired sample_facet_index (never overwrites a prior tag’s meta file).
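
The validation step amounts to recomputing the manifest from the index and comparing relational content. The sketch below shows that comparison on rows already loaded as dictionaries. It is not the code of validate_frontend_derived.py: the function names are hypothetical, the cross-check against facet_summaries is omitted, and the real validator reads the Parquet files from disk.

```python
from collections import Counter


def expected_meta(index_rows):
    """Recompute the manifest content from index rows (batch-time check)."""
    build_ids = {r["build_id"] for r in index_rows}
    versions = {r["schema_version"] for r in index_rows}
    if len(build_ids) != 1 or len(versions) != 1:
        raise ValueError("index mixes build_id or schema_version values")
    # One manifest row per non-null, non-empty source.
    hist = Counter(r["source"] for r in index_rows if r["source"])
    return {
        "build_id": build_ids.pop(),
        "schema_version": versions.pop(),
        "total_rows": len(index_rows),  # includes null/empty-source pids
        "per_source": dict(hist),
    }


def meta_matches(index_rows, meta_rows):
    """Return True when the meta rows agree with what the index implies."""
    want = expected_meta(index_rows)
    got_hist = {m["source"]: m["count"] for m in meta_rows}
    return got_hist == want["per_source"] and all(
        m["build_id"] == want["build_id"]
        and m["schema_version"] == want["schema_version"]
        and m["total_rows"] == want["total_rows"]
        for m in meta_rows
    )
```

A mismatched pair, for instance a meta file carrying a different build_id than the index, makes `meta_matches` return False, which is the batch-time analogue of the explorer's preflight marking the index as failed.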

5. URL convention

All substrate files live under https://data.isamples.org/ — a Cloudflare Worker fronting an R2 bucket. The Worker provides:

Versioned URLs https://data.isamples.org/isamples_<YYYYMM>_<variant>.parquet — 1-year immutable cache. Safe to pin in papers, Zenodo manifests, reproducibility notebooks.
Alias URLs https://data.isamples.org/current/<alias> — 302 redirect with 5-min cache; always resolves to the latest snapshot. Use for “always fresh” consumers. Currently current/wide.parquet → isamples_202604_wide.parquet.

Never reference the raw pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/... URL. It bypasses the Worker and defeats both the alias layer and the Cache-Control headers that DuckDB-WASM relies on for HTTP range requests.

OpenContext-specific variants live at https://storage.googleapis.com/opencontext-parquet/ and are maintained outside this convention.
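
The convention is simple enough to encode in a few helpers, which keeps notebooks from hard-coding the wrong host. This is a minimal sketch: the function names are illustrative, the versioned pattern follows the `isamples_<YYYYMM>_<variant>.parquet` form given above, and the host check is one possible guard rather than an existing iSamples utility.

```python
from urllib.parse import urlparse

BASE = "https://data.isamples.org"


def versioned_url(yyyymm, variant):
    """Build a pinned URL, e.g. versioned_url("202601", "wide")."""
    return f"{BASE}/isamples_{yyyymm}_{variant}.parquet"


def alias_url(alias):
    """Build a 'latest' URL served as a redirect, e.g. alias_url("wide.parquet")."""
    return f"{BASE}/current/{alias}"


def check_host(url):
    """Reject raw R2 URLs, which bypass the Worker; return the URL otherwise."""
    host = urlparse(url).hostname or ""
    if host.endswith(".r2.dev"):
        raise ValueError("use data.isamples.org, not the raw r2.dev URL")
    return url
```

Use `versioned_url` for anything that must stay reproducible (papers, manifests) and `alias_url` for consumers that should follow the latest snapshot.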

6. Relationship to other documents

query-spec.qmd §5.1 — the DuckDB binding table, which maps query-spec dimensions (source, material, bbox, h3, time, text) to the specific parquet files above. This catalog says what the files are; the query spec says which dimension each file serves.
ZENODO_DEPOSITION_PLAN.md (in the monorepo root) — specifies which subset of these files are archived in each Zenodo deposition. The 202601 deposition bundles the 10 R2-served files plus a MANIFEST.json and README.md. Source-specific OC variants and the raw Zenodo export are not part of the substrate deposition.
pqg/docs/PQG_SPECIFICATION.md — defines the three canonical formats (export, narrow, wide) whose schemas the primary files conform to. §3.5 is the normative section.
pqg/docs/conformance_matrix.md (planned) — will document, for each file above, exactly which clauses of the PQG spec it satisfies (required columns, allowed otype values, edge-type constraints, etc.). This catalog is the prose companion; the conformance matrix will be the machine-checkable companion.
project_isamples_sidecar_pattern.md (memory) — planning for per-source sidecars that would sit alongside the unified wide file rather than being folded in at build time (as OC thumbnails currently are). When that lands, it adds a new tier to §3.

Last updated: 2026-04-24 by iSamples team. Row counts and sizes verified by DuckDB against https://data.isamples.org/ on the same date.
