Tutorials
Learn to explore 6.7 million physical samples from scientific collections worldwide using modern browser-based tools.
Start Here
| Tutorial | What You’ll Learn |
|---|---|
| Interactive Explorer | Browse samples on a 3D globe with H3-clustered, zoom-adaptive rendering |
| Search Explorer | Faceted search across all 6.7M samples with cross-filtering |
| Deep-Dive Analysis | Comprehensive DuckDB-WASM analysis with Observable JS — charts, maps, statistics |
| Technical: Narrow vs Wide | Schema comparison and performance benchmarks for the PQG data formats |
What’s in the Data?
| Source | Samples | Focus |
|---|---|---|
| SESAR | 4.6M | Earth science — rocks, minerals, sediments, soils |
| OpenContext | 1M | Archaeology — artifacts, excavation materials |
| GEOME | 605K | Biology — genomic and tissue specimens |
| Smithsonian | 322K | Natural history — museum collections |
Data Files
All data is hosted on data.isamples.org with HTTP range request support — DuckDB-WASM only downloads the bytes it needs.
| File | Size | Description |
|---|---|---|
| Wide format | 292 MB | One row per entity, all sources — primary file for tutorials. Stable alias redirects to the current dated build (isamples_YYYYMM_wide.parquet). |
| Wide + H3 | 292 MB | Wide format with H3 spatial indices for globe visualizations |
| Facet summaries | 2 KB | Pre-computed filter counts — loads instantly |
| H3 clusters (res4) | 0.6 MB | Zoomed-out globe view |
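To make the range-request idea concrete, here is a minimal sketch of the byte arithmetic a Parquet reader needs before it can fetch only part of a remote file. It relies on one fact about the Parquet layout: a file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`. The helper names (`range_header`, `footer_length`, `footer_range`) are hypothetical, and the exact sequence of requests DuckDB-WASM issues is not described here, so treat this as an illustration of the technique rather than its implementation.

```python
import struct

PARQUET_MAGIC = b"PAR1"  # every Parquet file ends with these four bytes


def range_header(start: int, end: int) -> dict:
    """Build an HTTP Range header for the inclusive byte span [start, end]."""
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    return {"Range": f"bytes={start}-{end}"}


def footer_length(tail: bytes) -> int:
    """Read the metadata length from the final 8 bytes of a Parquet file."""
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("tail does not look like the end of a Parquet file")
    (length,) = struct.unpack("<I", tail[:4])  # 4-byte little-endian unsigned int
    return length


def footer_range(file_size: int, tail: bytes) -> dict:
    """Range header covering only the metadata footer (hypothetical helper)."""
    n = footer_length(tail)
    start = file_size - 8 - n   # footer sits just before the 8-byte tail
    end = file_size - 8 - 1     # inclusive end, last byte before the tail
    return range_header(start, end)


if __name__ == "__main__":
    # Synthetic tail for a made-up 10,000-byte file with a 1,000-byte footer.
    fake_tail = struct.pack("<I", 1000) + PARQUET_MAGIC
    print(footer_range(10_000, fake_tail))
```

Once the footer is read, the reader knows the byte offsets of each column chunk and can request just those spans, which is why a small query does not need the whole file.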
Why Browser-Based?
Our approach using GeoParquet + DuckDB-WASM provides:
- Universal access — No installation, works in Chrome, Firefox, Edge, Safari, and Brave
- Fast analysis — 5-10x faster than downloading full datasets
- Memory efficient — Analyze 300 MB datasets using less than 100 MB of browser memory
- Minimal transfer — HTTP range requests download only the columns and rows you need (typically <1 MB to start)
- Reproducible — All code is visible and foldable on tutorial pages
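The "minimal transfer" point depends on queries naming only the columns they need, so the engine can skip the rest of the file. The sketch below builds such a query as a string. The function names, the example URL, and the column names `pid` and `label` are all placeholders chosen for illustration, not the actual iSamples schema or file location; `read_parquet` is the DuckDB table function for Parquet sources.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_projection_query(url: str, columns: list[str], limit: int = 10) -> str:
    """Build a DuckDB query that reads only the named columns (hypothetical helper)."""
    if not columns:
        raise ValueError("name at least one column; SELECT * defeats column pruning")
    cols = ", ".join(quote_ident(c) for c in columns)
    safe_url = url.replace("'", "''")  # escape single quotes inside the string literal
    return f"SELECT {cols} FROM read_parquet('{safe_url}') LIMIT {int(limit)}"


if __name__ == "__main__":
    # Placeholder URL and column names, for illustration only.
    print(build_projection_query("https://example.org/wide.parquet", ["pid", "label"], 5))
```

The resulting string could be handed to any DuckDB connection, for instance `duckdb.sql(query)` in the Python package, assuming remote Parquet access is available in that environment.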
For Developers
All tutorial source code is on GitHub. Want to build your own analysis? Fork the repo, modify a .qmd file, and run quarto preview.
- GitHub repositories — all source code and data pipelines
- Zenodo community — archived datasets for reproducible research
- Query architecture — how the Explorer queries work under the hood