How to Use iSamples

Get started exploring 6.7 million scientific samples

Quick Start

  1. Open the Interactive Explorer — a 3D globe loads with clustered sample data
  2. Zoom in — clusters break into finer detail as you zoom (resolution 4 → 6 → 8 → individual samples)
  3. Filter by source — use the checkboxes to show/hide data from SESAR, OpenContext, GEOME, or Smithsonian
  4. Click a cluster — see sample count and nearby samples with links to source records
  5. Click an individual sample — view metadata and follow the “View at source” link to the original repository
  6. Share your view — copy the URL to share your exact position, zoom level, and selected sample

What’s in the Data?

Source | Samples | Focus
SESAR | 4.6M | Earth science: rocks, minerals, sediments, soils
OpenContext | 1M | Archaeology: artifacts, excavation materials
GEOME | 605K | Biology: genomic and tissue specimens
Smithsonian | 322K | Natural history: museum collections

No Installation Required

Everything runs in your browser using:

  • DuckDB-WASM — a fast analytical database running client-side
  • HTTP range requests — only the data you need is downloaded (typically < 1 MB to start)
  • Cesium — 3D globe visualization

Works in Chrome, Firefox, Edge, Safari, and Brave. No plugins, no downloads, no accounts.
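
To make the "only the data you need is downloaded" point concrete, here is a minimal sketch of the byte ranges a Parquet reader typically asks for first. It relies only on the public Parquet layout (a file ends with a 4-byte little-endian metadata length followed by the magic bytes PAR1). The function names are illustrative, and this is not a claim about the exact request sequence DuckDB-WASM issues.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def suffix_range_header(n_bytes: int) -> dict:
    """HTTP header asking the server for only the last n_bytes of a file."""
    return {"Range": f"bytes=-{n_bytes}"}


def footer_length(tail: bytes) -> int:
    """Given the last 8 bytes of a Parquet file, return the metadata size."""
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not the tail of a Parquet file")
    (length,) = struct.unpack("<I", tail[:4])
    return length


def metadata_range_header(file_size: int, tail: bytes) -> dict:
    """Header for a follow-up request that fetches just the footer metadata."""
    n = footer_length(tail)
    start = file_size - 8 - n  # metadata sits directly before the 8-byte tail
    end = file_size - 9        # HTTP byte ranges are inclusive
    return {"Range": f"bytes={start}-{end}"}
```

Once the footer metadata is known, a reader can request only the column chunks a query touches, which is why a first view can stay small even against a file of several hundred megabytes.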

For Developers

All code on the tutorial pages is visible and foldable. To build your own analysis, start with these resources:

  • Search Explorer — faceted search across all 6.7M samples with cross-filtering
  • Deep-Dive Analysis — statistical exploration with Observable Plot
  • Tutorials index — step-by-step guides from basic exploration to advanced analysis
  • GitHub — all source code and data pipelines
  • Zenodo — archived datasets for reproducible research

Data Catalog

All files are served from data.isamples.org, backed by Cloudflare R2. A Cloudflare Worker in front of the bucket does two things. It sets Cache-Control: public, max-age=31536000, immutable on filename-versioned parquets, so browsers and the Cloudflare edge cache them aggressively. It also exposes the CORS headers that DuckDB-WASM’s HTTP range requests require.

File naming convention: isamples_<YYYYMM>_<variant>.parquet. The month in the filename is the data-generation snapshot, so the content at a given dated URL never changes.
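
The naming convention can be checked mechanically. The following is a small sketch, not part of the iSamples tooling; the SnapshotFile type and parse_snapshot_name function are hypothetical names introduced here for illustration.

```python
import re
from typing import NamedTuple


class SnapshotFile(NamedTuple):
    year: int
    month: int
    variant: str


# isamples_<YYYYMM>_<variant>.parquet
_PATTERN = re.compile(r"^isamples_(\d{4})(\d{2})_(.+)\.parquet$")


def parse_snapshot_name(filename: str) -> SnapshotFile:
    """Split a dated catalog filename into snapshot year, month, and variant."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a dated iSamples filename: {filename!r}")
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {filename!r}")
    return SnapshotFile(year, month, match.group(3))
```

For example, parse_snapshot_name("isamples_202601_wide_h3.parquet") yields year 2026, month 1, and variant "wide_h3". Names outside the convention, such as vocab_labels.parquet, raise ValueError.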

Primary datasets

The three main files carrying the sample records themselves:

File | Size | Shape | Rows | Use when you need…
current/wide.parquet | 292 MB | Wide (one row per entity, nested relationships in p__* array columns) | 20 M | General entity queries, UI filtering, description text
isamples_202601_wide_h3.parquet | 292 MB | Wide + H3 BIGINT indices (h3_res4, h3_res6, h3_res8) | 20 M | Geospatial queries with H3 clustering at arbitrary zoom
isamples_202512_narrow.parquet | 820 MB | Narrow (graph: nodes + explicit _edge_ rows, s/p/o/n fields) | 106 M | Graph traversals, relationship-centric analysis, PQG work

/current/wide.parquet is a stable alias that HTTP 302-redirects to the latest dated file (currently isamples_202604_wide.parquet, enriched with ~47 K OpenContext thumbnails). The dated filename is immutable; the alias rotates atomically when we rebuild. Use the alias for interactive work, the dated URL when you want a pinned, reproducible reference. The original isamples_202601_wide.parquet (278 MB, no thumbnails) is kept available for historical pinning.
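
If you start from the alias but want to record a pinned reference, one option is to follow the redirect once and keep the dated URL it lands on. This is a minimal sketch using only the standard library; resolve_alias and pinned_filename are hypothetical helper names, and the opener parameter exists only so the function can be exercised without network access.

```python
import urllib.request
from urllib.parse import urlsplit

ALIAS_URL = "https://data.isamples.org/current/wide.parquet"


def resolve_alias(alias_url: str = ALIAS_URL, opener=urllib.request.urlopen) -> str:
    """Follow the alias redirect and return the final (dated) URL."""
    request = urllib.request.Request(alias_url, method="HEAD")
    # urllib follows the 302 automatically. It may re-issue the redirected
    # request as GET, but the body is never read and the connection is
    # closed as soon as the with-block exits.
    with opener(request) as response:
        return response.geturl()


def pinned_filename(url: str) -> str:
    """Extract the bare filename from a resolved URL."""
    return urlsplit(url).path.rsplit("/", 1)[-1]
```

Storing the output of pinned_filename(resolve_alias()) next to your results records which snapshot an analysis actually used, even after the alias rotates.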

All three represent the same underlying data (SESAR + OpenContext + GEOME + Smithsonian) with identical semantics — they differ only in serialization strategy. See the Technical: Narrow vs Wide tutorial for a performance comparison.

Pre-aggregated helpers

Small lookup tables computed ahead of time so a page can render facets and counts instantly, without touching the 278 MB primary file:

File | Size | Contents | Use when…
isamples_202601_facet_summaries.parquet | 2 KB | (facet_type, facet_value, count) for source, material, context, object_type | You want instant initial facet counts with no filters applied
isamples_202601_facet_cross_filter.parquet | 6 KB | Pre-computed counts for single-facet selections | You want instant cross-filtered counts for a single active filter
isamples_202601_sample_facets_v2.parquet | 63 MB | (pid, material, context, object_type) facet URIs per sample | You need to filter on combinations of facets at query time
vocab_labels.parquet | 58 KB | (uri, pref_label, definition, alt_labels, scheme) for 537 SKOS concepts (material, sample object type, sampled feature type) | You need to render facet URIs as human-readable text
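
As an illustration of how the helper tables fit together, the sketch below builds a DuckDB query that reads facet counts and attaches human-readable labels. It rests on two assumptions: that the helper files sit at the root of data.isamples.org, and that facet_value holds the same URIs as the uri column of vocab_labels.parquet. The LEFT JOIN falls back to the raw value wherever no label matches. The function names are hypothetical.

```python
BASE_URL = "https://data.isamples.org"  # assumption: helper files live at the root
FACET_TYPES = {"source", "material", "context", "object_type"}


def helper_url(filename: str, base_url: str = BASE_URL) -> str:
    """Build the full URL for a catalog file."""
    return f"{base_url}/{filename}"


def labelled_facet_sql(facet_type: str, base_url: str = BASE_URL) -> str:
    """Return SQL listing the counts for one facet, with labels where available."""
    if facet_type not in FACET_TYPES:
        raise ValueError(f"unknown facet type: {facet_type!r}")
    summaries = helper_url("isamples_202601_facet_summaries.parquet", base_url)
    labels = helper_url("vocab_labels.parquet", base_url)
    # "count" is quoted because it is also the name of an aggregate function.
    return f"""
        SELECT f.facet_value,
               COALESCE(v.pref_label, f.facet_value) AS label,
               f."count" AS n
        FROM read_parquet('{summaries}') AS f
        LEFT JOIN read_parquet('{labels}') AS v
               ON v.uri = f.facet_value
        WHERE f.facet_type = '{facet_type}'
        ORDER BY n DESC
    """
```

The returned string can be passed to a DuckDB connection, for example con.sql(labelled_facet_sql("material")).df(). Both files together are about 60 KB, so the query avoids the primary file entirely.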

Geospatial aggregates (H3)

Hexagonal H3 cells pre-aggregated at three resolutions for zoom-adaptive globe rendering. Each row: h3_cell, center_lat, center_lng, sample_count, dominant_source, source_count. For the design rationale (why hexagons, why these resolutions), see Technical: Why H3?.

File | Size | Cells | Typical altitude
isamples_202601_h3_summary_res4.parquet | 580 KB | ~38 K | Continental (world view)
isamples_202601_h3_summary_res6.parquet | 1.6 MB | ~112 K | Regional (country / state)
isamples_202601_h3_summary_res8.parquet | 2.4 MB | ~176 K | Neighborhood

CSV twins exist alongside each parquet (3× larger) for human inspection — browsers use the parquet versions.
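
Zoom-adaptive rendering comes down to choosing one of these files from the camera altitude. The sketch below shows the idea. Only the ~120 km switch to individual points comes from this catalog; the other two cut-offs are hypothetical values chosen for illustration and are not the thresholds the Interactive Explorer uses.

```python
POINTS_BELOW_KM = 120.0   # from the catalog: point-level rendering below ~120 km
RES8_BELOW_KM = 600.0     # hypothetical cut-off, for illustration only
RES6_BELOW_KM = 3000.0    # hypothetical cut-off, for illustration only


def layer_for_altitude(altitude_km: float) -> str:
    """Pick the catalog file suited to a given camera altitude."""
    if altitude_km < 0:
        raise ValueError("altitude must be non-negative")
    if altitude_km < POINTS_BELOW_KM:
        return "isamples_202601_samples_map_lite.parquet"
    if altitude_km < RES8_BELOW_KM:
        return "isamples_202601_h3_summary_res8.parquet"
    if altitude_km < RES6_BELOW_KM:
        return "isamples_202601_h3_summary_res6.parquet"
    return "isamples_202601_h3_summary_res4.parquet"
```

Because each summary file is at most a few megabytes, a viewer can switch layers as the camera moves without touching the primary datasets.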

Individual sample points (lite)

File | Size | Contents | Use when…
isamples_202601_samples_map_lite.parquet | 60 MB | pid, label, source, latitude, longitude, place_name, result_time, h3_res8, h3_res8_hex (no description) | Point-level rendering below ~120 km altitude

Which tutorial uses which file

Interactive Explorer Search Explorer Deep-Dive Analysis
wide.parquet
wide_h3.parquet
facet_summaries.parquet
facet_cross_filter.parquet
sample_facets_v2.parquet
h3_summary_res4/6/8.parquet
samples_map_lite.parquet
vocab_labels.parquet

Quick query recipes

From Python:

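import duckdb

# In-memory database; the parquet file is read remotely over HTTPS.
con = duckdb.connect()

# Count sample records per source, largest first.
# .df() returns a pandas DataFrame, so pandas must be installed.
df = con.sql("""
    SELECT source, COUNT(*) AS n
    FROM read_parquet('https://data.isamples.org/current/wide.parquet')
    WHERE otype = 'MaterialSampleRecord'
    GROUP BY 1 ORDER BY 2 DESC
""").df()
print(df)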

From the browser via DuckDB-WASM — see the tutorials for complete examples with HTTP range requests.