Zenodo — archived datasets for reproducible research
Data Catalog
All files are served from data.isamples.org backed by Cloudflare R2. A Cloudflare Worker in front of the bucket sets Cache-Control: public, max-age=31536000, immutable on filename-versioned parquets (so browsers and the Cloudflare edge cache aggressively) and exposes CORS headers required by DuckDB-WASM’s HTTP range requests.
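To make the caching header above concrete, here is a minimal sketch that parses the quoted Cache-Control value using only the standard library. The helper name `parse_cache_control` is hypothetical and not part of any iSamples tooling; it handles only the simple comma-separated form shown above.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header value into its directives.

    Valueless directives (e.g. public, immutable) map to True;
    numeric ones (e.g. max-age) map to int; anything else stays a string.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        name = name.strip().lower()
        if not sep:
            directives[name] = True
        else:
            arg = arg.strip().strip('"')
            directives[name] = int(arg) if arg.isdigit() else arg
    return directives


# Header value as quoted in the paragraph above.
header = "public, max-age=31536000, immutable"
parsed = parse_cache_control(header)

# Convert the freshness lifetime from seconds to days.
max_age_days = parsed["max-age"] / 86400
```

The max-age of 31536000 seconds works out to 365 days, and `immutable` signals that the response will not change while it is fresh, which is only safe because the filename itself carries the version.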
File naming convention: isamples_<YYYYMM>_<variant>.parquet. The month in the filename is the data-generation snapshot — content at a given URL never changes.
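The naming convention can be checked mechanically. The sketch below parses a filename into its snapshot month and variant; `Snapshot` and `parse_snapshot` are hypothetical names, and the character set allowed in `<variant>` (letters, digits, underscores) is an assumption, since the convention above does not spell it out.

```python
import re
from typing import NamedTuple

# Pattern derived from the convention: isamples_<YYYYMM>_<variant>.parquet
# Assumption: <variant> consists of letters, digits, and underscores.
_FILENAME_RE = re.compile(
    r"^isamples_(\d{4})(0[1-9]|1[0-2])_([A-Za-z0-9_]+)\.parquet$"
)


class Snapshot(NamedTuple):
    year: int
    month: int
    variant: str


def parse_snapshot(filename: str) -> Snapshot:
    """Extract the snapshot year, month, and variant from a dated filename."""
    match = _FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a dated iSamples parquet filename: {filename!r}")
    return Snapshot(int(match.group(1)), int(match.group(2)), match.group(3))


snap = parse_snapshot("isamples_202604_wide.parquet")
```

Because `Snapshot` orders by year and month first, sorting a list of parsed filenames puts the most recent snapshot last.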
Primary datasets
The two main files carrying the sample records themselves:
Graph traversals, relationship-centric analysis, PQG work
∗ /current/wide.parquet is a stable alias that HTTP 302-redirects to the latest dated file (currently isamples_202604_wide.parquet, enriched with ~47 K OpenContext thumbnails). The dated filename is immutable; the alias rotates atomically when we rebuild. Use the alias for interactive work, the dated URL when you want a pinned, reproducible reference. The original isamples_202601_wide.parquet (278 MB, no thumbnails) is kept available for historical pinning.
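To pin an analysis to whatever the alias currently points at, one option is to read the redirect target without downloading the file. This is a sketch under assumptions: it presumes the server answers HEAD requests on the alias with the same redirect it gives to GET, and the function names `pinned_url` and `resolve_alias` are hypothetical.

```python
import http.client
from urllib.parse import urljoin, urlsplit

ALIAS_URL = "https://data.isamples.org/current/wide.parquet"


def pinned_url(alias_url: str, location: str) -> str:
    """Turn a redirect Location header (absolute or relative) into an absolute URL."""
    return urljoin(alias_url, location)


def resolve_alias(alias_url: str = ALIAS_URL, timeout: float = 10.0) -> str:
    """Return the dated URL the alias redirects to, without fetching the file.

    Assumption: the server responds to HEAD on the alias with a redirect.
    """
    parts = urlsplit(alias_url)
    conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    try:
        # http.client never follows redirects, so the 302 is visible here.
        conn.request("HEAD", parts.path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if resp.status not in (301, 302, 303, 307, 308) or location is None:
            raise RuntimeError(f"expected a redirect, got HTTP {resp.status}")
        return pinned_url(alias_url, location)
    finally:
        conn.close()
```

Calling `resolve_alias()` once and recording the result alongside your results gives an interactive session the convenience of the alias and the reproducibility of the dated URL.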
All three represent the same underlying data (SESAR + OpenContext + GEOME + Smithsonian) with identical semantics — they differ only in serialization strategy. See the Technical: Narrow vs Wide tutorial for a performance comparison.
Pre-aggregated helpers
Small lookup tables computed ahead of time so a page can render facets and counts instantly, without touching the 278 MB primary file:
You need to render facet URIs as human-readable text
Geospatial aggregates (H3)
Hexagonal H3 cells pre-aggregated at three resolutions for zoom-adaptive globe rendering. Each row: h3_cell, center_lat, center_lng, sample_count, dominant_source, source_count. For the design rationale (why hexagons, why these resolutions), see Technical: Why H3?.
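Zoom-adaptive rendering comes down to choosing which of the pre-aggregated resolutions to load for the current zoom level. The sketch below shows one way to do that lookup. The function name `pick_resolution` is hypothetical, and the breakpoints and resolution numbers in the example are illustrative values only, not the ones the site uses.

```python
from bisect import bisect_right


def pick_resolution(zoom: float, breakpoints: list[tuple[float, int]]) -> int:
    """Return the H3 resolution of the last breakpoint whose min_zoom <= zoom.

    breakpoints: (min_zoom, h3_resolution) pairs sorted ascending by min_zoom.
    Zoom levels below the first breakpoint fall back to the coarsest entry.
    """
    if not breakpoints:
        raise ValueError("at least one breakpoint is required")
    min_zooms = [min_zoom for min_zoom, _ in breakpoints]
    index = bisect_right(min_zooms, zoom) - 1
    return breakpoints[max(index, 0)][1]


# Illustrative values only: three resolutions, coarse to fine.
EXAMPLE_BREAKPOINTS = [(0.0, 4), (5.0, 6), (8.0, 8)]

resolution_far = pick_resolution(2.0, EXAMPLE_BREAKPOINTS)
resolution_near = pick_resolution(12.0, EXAMPLE_BREAKPOINTS)
```

Keeping the breakpoints in a data structure rather than in branching code makes it straightforward to tune them against real rendering performance.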
import duckdb

con = duckdb.connect()

# Count sample records per source, reading the parquet file directly over HTTPS.
con.sql("""
    SELECT source, COUNT(*) AS n
    FROM read_parquet('https://data.isamples.org/current/wide.parquet')
    WHERE otype = 'MaterialSampleRecord'
    GROUP BY 1
    ORDER BY 2 DESC
""").df()
From the browser via DuckDB-WASM — see the tutorials for complete examples with HTTP range requests.