How to Use iSamples

Get started exploring 6.7 million scientific samples

Quick Start

Open the Interactive Explorer — a 3D globe loads with clustered sample data
Zoom in — clusters break into finer detail as you zoom (resolution 4 → 6 → 8 → individual samples)
Filter by source — use the checkboxes to show/hide data from SESAR, OpenContext, GEOME, or Smithsonian
Click a cluster — see sample count and nearby samples with links to source records
Click an individual sample — view metadata and follow the “View at source” link to the original repository
Share your view — copy the URL to share your exact position, zoom level, and selected sample

What’s in the Data?

Source	Samples	Focus
SESAR	4.6M	Earth science — rocks, minerals, sediments, soils
OpenContext	1M	Archaeology — artifacts, excavation materials
GEOME	605K	Biology — genomic and tissue specimens
Smithsonian	322K	Natural history — museum collections

No Installation Required

Everything runs in your browser using:

DuckDB-WASM — a fast analytical database running client-side
HTTP range requests — only the data you need is downloaded (typically < 1 MB to start)
Cesium — 3D globe visualization

Works in Chrome, Firefox, Edge, Safari, and Brave. No plugins, no downloads, no accounts.

For Developers

All code is visible and foldable on tutorial pages. Want to build your own analysis?

Deep-Dive Analysis — statistical exploration with Observable Plot
Tutorials index — step-by-step guides from basic exploration to advanced analysis
GitHub — all source code and data pipelines
Zenodo — archived datasets for reproducible research

Data Catalog

All files are served from data.isamples.org, backed by Cloudflare R2. A Cloudflare Worker in front of the bucket does two things. It sets Cache-Control: public, max-age=31536000, immutable on filename-versioned parquet files, so browsers and the Cloudflare edge cache them aggressively. It also exposes the CORS headers that DuckDB-WASM’s HTTP range requests require.
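The range-request behaviour described above can also be observed outside the browser. The following is a minimal sketch using only the Python standard library: it asks for the last 8 bytes of a file and checks for the `PAR1` magic that ends every Parquet file. The helper names `fetch_tail` and `looks_like_parquet` are illustrative, not part of any iSamples tooling, and the headers actually returned depend on how the Worker is configured.

```python
import urllib.request


def fetch_tail(url: str, nbytes: int = 8, timeout: float = 30.0):
    """Request only the last `nbytes` of a remote file (suffix Range request).

    Returns (status, cache_control_header, body). The body is read only when
    the server answers 206 Partial Content, so a server that ignores Range
    never triggers a full download.
    """
    request = urllib.request.Request(url, headers={"Range": f"bytes=-{nbytes}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        cache_control = response.headers.get("Cache-Control")
        if response.status != 206:
            return response.status, cache_control, b""
        return response.status, cache_control, response.read()


def looks_like_parquet(tail: bytes) -> bool:
    """A Parquet file ends with a 4-byte footer length followed by b'PAR1'."""
    return len(tail) >= 8 and tail[-4:] == b"PAR1"
```

Calling `fetch_tail("https://data.isamples.org/current/wide.parquet")` should transfer a handful of bytes rather than the whole file, assuming the server honours the Range header as described.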

File naming convention: isamples_<YYYYMM>_<variant>.parquet. The month in the filename is the data-generation snapshot, and the content at a given dated URL never changes. Only the /current/ alias described below moves.
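As an illustration of the naming convention, here is a small sketch that splits a dated filename into its snapshot month and variant. The names `Snapshot` and `parse_snapshot` are hypothetical helpers introduced for this example; files that do not follow the dated pattern (such as `vocab_labels.parquet`) simply return `None`.

```python
import re
from typing import NamedTuple, Optional

# isamples_<YYYYMM>_<variant>.parquet
DATED_NAME = re.compile(r"^isamples_(\d{4})(\d{2})_(.+)\.parquet$")


class Snapshot(NamedTuple):
    year: int
    month: int
    variant: str


def parse_snapshot(filename: str) -> Optional[Snapshot]:
    """Return the snapshot month and variant, or None for non-dated names."""
    match = DATED_NAME.match(filename)
    if match is None:
        return None
    year, month, variant = match.groups()
    if not 1 <= int(month) <= 12:
        return None  # six digits, but not a valid YYYYMM
    return Snapshot(int(year), int(month), variant)
```

For example, `parse_snapshot("isamples_202601_wide_h3.parquet")` yields year 2026, month 1, and variant `wide_h3`.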

Primary datasets

The main files carrying the sample records themselves:

File	Size	Shape	Rows	Use when you need…
`current/wide.parquet` ∗	292 MB	Wide (one row per entity, nested relationships in `p__*` array columns)	20 M	General entity queries, UI filtering, description text
`isamples_202601_wide_h3.parquet`	292 MB	Wide + H3 BIGINT indices (`h3_res4`, `h3_res6`, `h3_res8`)	20 M	Geospatial queries with H3 clustering at arbitrary zoom
`isamples_202512_narrow.parquet`	820 MB	Narrow (graph: nodes + explicit `_edge_` rows, s/p/o/n fields)	106 M	Graph traversals, relationship-centric analysis, PQG work

∗ /current/wide.parquet is a stable alias that HTTP 302-redirects to the latest dated file (currently isamples_202604_wide.parquet, enriched with ~47 K OpenContext thumbnails). The dated filename is immutable; the alias rotates atomically when we rebuild. Use the alias for interactive work, the dated URL when you want a pinned, reproducible reference. The original isamples_202601_wide.parquet (278 MB, no thumbnails) is kept available for historical pinning.
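To record which dated file the alias pointed to at the time of an analysis, one option is to follow the redirect and keep the final URL. The sketch below uses only the Python standard library; `resolve_alias` is an illustrative name, and the one-byte Range header is an assumption made here to keep the transfer small, not something the site requires.

```python
import urllib.request

ALIAS_URL = "https://data.isamples.org/current/wide.parquet"


def resolve_alias(url: str = ALIAS_URL, timeout: float = 30.0) -> str:
    """Follow any redirects and return the final URL.

    urllib follows 302 responses to GET requests automatically. The body is
    never read; the connection is closed as soon as the headers arrive.
    """
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()
```

Storing the returned dated URL alongside your results gives a pinned reference, while interactive work can keep using the alias.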

All three represent the same underlying data (SESAR + OpenContext + GEOME + Smithsonian) with identical semantics — they differ only in serialization strategy. See the Technical: Narrow vs Wide tutorial for a performance comparison.

Pre-aggregated helpers

Small lookup tables computed ahead of time so a page can render facets and counts instantly, without touching the ~290 MB primary file:

File	Size	Contents	Use when…
`isamples_202601_facet_summaries.parquet`	2 KB	`(facet_type, facet_value, count)` for source, material, context, object_type	You want instant initial facet counts with no filters applied
`isamples_202601_facet_cross_filter.parquet`	6 KB	Pre-computed counts for single-facet selections	You want instant cross-filtered counts for a single active filter
`isamples_202601_sample_facets_v2.parquet`	63 MB	`(pid, material, context, object_type)` facet URIs per sample	You need to filter on combinations of facets at query time
`vocab_labels.parquet`	58 KB	`(uri, pref_label, definition, alt_labels, scheme)` for 537 SKOS concepts (material, sample object type, sampled feature type)	You need to render facet URIs as human-readable text
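The helper files above are meant to be combined. The following sketch builds a DuckDB query that reads the facet summaries and attaches human-readable labels from the vocabulary table, falling back to the raw value when no label matches. It rests on two assumptions that the catalog does not state: that the helper files are served from the site root (`BASE_URL`), and that `facet_value` holds the same URIs as `vocab_labels.uri` for vocabulary-backed facets. The function names are illustrative.

```python
BASE_URL = "https://data.isamples.org"  # assumption: helper files sit at the site root

FACET_TYPES = ("source", "material", "context", "object_type")


def facet_counts_sql(facet_type: str, base_url: str = BASE_URL) -> str:
    """Build a query that returns (label, n) rows for one facet type."""
    if facet_type not in FACET_TYPES:
        raise ValueError(f"unknown facet type: {facet_type!r}")
    # facet_type was checked against a fixed list, so interpolating it is safe.
    return f"""
        SELECT COALESCE(v.pref_label, s.facet_value) AS label,
               s."count" AS n
        FROM read_parquet('{base_url}/isamples_202601_facet_summaries.parquet') AS s
        LEFT JOIN read_parquet('{base_url}/vocab_labels.parquet') AS v
               ON v.uri = s.facet_value
        WHERE s.facet_type = '{facet_type}'
        ORDER BY n DESC
    """


def facet_counts(facet_type: str) -> list:
    """Run the query with DuckDB and return a list of (label, n) tuples."""
    import duckdb  # imported here so the SQL builder above has no dependencies

    return duckdb.connect().sql(facet_counts_sql(facet_type)).fetchall()
```

Because both files are small (2 KB and 58 KB), `facet_counts("material")` should return quickly even on a slow connection. The LEFT JOIN keeps facets such as source, which have no vocabulary entry, in the result.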

Geospatial aggregates (H3)

Hexagonal H3 cells pre-aggregated at three resolutions for zoom-adaptive globe rendering. Each row: h3_cell, center_lat, center_lng, sample_count, dominant_source, source_count. For the design rationale (why hexagons, why these resolutions), see Technical: Why H3?.

File	Size	Cells	Typical altitude
`isamples_202601_h3_summary_res4.parquet`	580 KB	~38 K	Continental (world view)
`isamples_202601_h3_summary_res6.parquet`	1.6 MB	~112 K	Regional (country / state)
`isamples_202601_h3_summary_res8.parquet`	2.4 MB	~176 K	Neighborhood

CSV twins exist alongside each parquet (3× larger) for human inspection — browsers use the parquet versions.
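As a rough sketch of how a viewer might use these aggregates, the code below selects the cells whose centers fall inside a latitude/longitude box, using the column names listed above. The URL layout (files at the site root) is an assumption, the function names are illustrative, and the simple box test does not handle views that cross the antimeridian. The connection is passed in by the caller, for example `duckdb.connect()`.

```python
BASE_URL = "https://data.isamples.org"  # assumption: summary files sit at the site root
H3_RESOLUTIONS = (4, 6, 8)


def h3_summary_url(resolution: int, base_url: str = BASE_URL) -> str:
    """Return the URL of the pre-aggregated H3 summary for one resolution."""
    if resolution not in H3_RESOLUTIONS:
        raise ValueError(f"no summary file for resolution {resolution}")
    return f"{base_url}/isamples_202601_h3_summary_res{resolution}.parquet"


def cells_in_view(con, resolution: int, south: float, north: float,
                  west: float, east: float) -> list:
    """Fetch cells whose center lies inside the box, largest counts first.

    `con` is a DuckDB connection. The bounds are bound as query parameters;
    only the file URL is interpolated, and it comes from h3_summary_url().
    """
    sql = f"""
        SELECT h3_cell, center_lat, center_lng, sample_count, dominant_source
        FROM read_parquet('{h3_summary_url(resolution)}')
        WHERE center_lat BETWEEN ? AND ?
          AND center_lng BETWEEN ? AND ?
        ORDER BY sample_count DESC
    """
    return con.execute(sql, [south, north, west, east]).fetchall()
```

For instance, `cells_in_view(duckdb.connect(), 4, south=30, north=50, west=-130, east=-100)` would return the continental-scale cells over the western United States.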

Individual sample points (lite)

File	Size	Contents	Use when…
`isamples_202601_samples_map_lite.parquet`	60 MB	`pid, label, source, latitude, longitude, place_name, result_time, h3_res8, h3_res8_hex` — no description	Point-level rendering below ~120 km altitude
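Taken together, the H3 summaries and the lite point file suggest a simple zoom-adaptive rule: pick coarser data the higher the camera is. The sketch below shows one way to express that. Only the ~120 km point-level cutoff comes from the table above; the other two thresholds are hypothetical placeholders chosen for illustration and are not the values the Interactive Explorer uses.

```python
POINTS_BELOW_KM = 120.0   # from the table above (approximate)
RES8_BELOW_KM = 300.0     # hypothetical threshold, for illustration only
RES6_BELOW_KM = 3000.0    # hypothetical threshold, for illustration only


def layer_for_altitude(altitude_km: float) -> str:
    """Pick the data file to render for a given camera altitude."""
    if altitude_km < 0:
        raise ValueError("altitude must be non-negative")
    if altitude_km < POINTS_BELOW_KM:
        return "isamples_202601_samples_map_lite.parquet"
    if altitude_km < RES8_BELOW_KM:
        return "isamples_202601_h3_summary_res8.parquet"
    if altitude_km < RES6_BELOW_KM:
        return "isamples_202601_h3_summary_res6.parquet"
    return "isamples_202601_h3_summary_res4.parquet"
```

The ordering mirrors the progression from resolution 4 to 6 to 8 to individual samples as the view zooms in.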

Which tutorial uses which file

File	Interactive Explorer	Search Explorer	Deep-Dive Analysis
`wide.parquet`		●
`wide_h3.parquet`			●
`facet_summaries.parquet`	●	●	●
`facet_cross_filter.parquet`		●
`sample_facets_v2.parquet`	●	●
`h3_summary_res4/6/8.parquet`	●
`samples_map_lite.parquet`	●
`vocab_labels.parquet`	●	●

Quick query recipes

From Python:

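import duckdb

# In-memory database; nothing is written to disk
con = duckdb.connect()

# Count sample records per source. DuckDB reads the remote parquet over HTTP
# and fetches only the byte ranges the query needs.
# .df() returns a pandas DataFrame, so pandas must be installed.
con.sql("""
    SELECT source, COUNT(*) AS n
    FROM read_parquet('https://data.isamples.org/current/wide.parquet')
    WHERE otype = 'MaterialSampleRecord'
    GROUP BY 1 ORDER BY 2 DESC
""").df()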

From the browser via DuckDB-WASM — see the tutorials for complete examples with HTTP range requests.