Provenance

A serious genealogy platform for people who actually own their data. Cross-platform desktop and mobile via Tauri 2 + SvelteKit 5, a live web build of the desktop app, and a Python provenance CLI that handles the unglamorous data work — scan, check, fix, dedupe, merge, and export — without ever uploading a tree to someone else’s cloud.

  • 7 build targets: macOS, Windows, Linux, Android, iOS, Web, Chrome Extension
  • 9 CLI subcommands: full GEDCOM operations surface
  • GEDCOM 5.5.1 fidelity target with structured-issue scanner output
[Image: Provenance cross-platform genealogy app interface showing tree view, GEDCOM cleanup, and AI-assisted research panel]

Signature Key Art

A cinematic visual system built around the Provenance product theme — rendered via Flux + custom LoRA stacks on brand palette.

Why Provenance Exists

The genealogy software market is split among three uncomfortable choices. Subscribe to a cloud service (Ancestry, MyHeritage) and accept that your tree lives on someone else’s server, with their export rules, their pricing model, and no programmatic access. Buy a closed desktop product (Family Tree Maker, MacFamilyTree) and accept proprietary file formats, no scripting, and one-vendor risk. Or use an SQLite-backed power-user tool (RootsMagic) and accept a UI that lags the modern web and a workflow that assumes Windows.

Provenance takes the fourth path: a single SvelteKit codebase that targets every desktop and mobile OS through Tauri 2, a web build for collaboration, a Chrome extension for in-browser capture, and a Python CLI that treats GEDCOM as a real engineering surface — structured issue reports, deterministic backups, scriptable batch operations. The data lives on your hardware. The format is canonical GEDCOM 5.5.1. The interface is modern. The automation is real.

The platform was built against a specific lineage problem: multigenerational Polish-Jewish ancestry from Galicia (Kalush) crossing into Vancouver and Pittsburgh. On that tree, commercial tools dropped diacritics, mishandled BET/AND date ranges, or would not surface likely duplicates without manual review. The features that work for that case work for any tree where the data is messy because the history is messy.

Platform Capabilities

  • Tauri 2 + SvelteKit 5 desktop builds for macOS, Windows, and Linux — production-complete
  • Live web build for read-only collaboration and shareable views
  • Tauri 2 mobile (Android, iOS) and Chrome MV3 side-panel extension in active development
  • Python provenance CLI — scan, check, fix, dedupe, review, stats, merge, export, version
  • GEDCOM 5.5.1 export fidelity with structured-issue JSON output for every scan
  • Living-person privacy filtering on export — --filter-living flag for public sharing
  • Date normalization handling American order, FROM/TO, BET/AND with month+year support
  • Fuzzy-match duplicate detection with relationship-safe merge and rollback
  • Public TypeScript shared-schema package @provenance/core for downstream tooling

Deep Dive: The GEDCOM Operations Surface

Provenance treats GEDCOM as data, not folklore. The Python CLI ships nine subcommands that turn the format’s ambiguities into structured, scriptable operations — with deterministic backups before every mutating run.

Scan, Check, Fix — the read-write triad

provenance scan walks an entire tree and emits a structured JSON report keyed by stable rule IDs. Every issue carries a rule code, a record reference, and a severity classification, so downstream tooling — the desktop app, a CI job, a custom dashboard — can render the same data without re-implementing the parser. scan_file() is exposed as a public Python API for programmatic use.
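
As a rough illustration of how a CI job or dashboard might consume a scan report, the sketch below counts issues by severity and by rule ID. The JSON shape, the field names (issues, rule, record, severity), and the rule codes are assumptions made for this example, not the documented Provenance schema.

```python
import json
from collections import Counter

# Hypothetical report: the field names and rule IDs are assumptions
# for illustration, not the real scan output format.
SAMPLE_REPORT = """
{
  "issues": [
    {"rule": "DATE001", "record": "@I12@", "severity": "warning"},
    {"rule": "NAME003", "record": "@I40@", "severity": "error"},
    {"rule": "DATE001", "record": "@I57@", "severity": "warning"}
  ]
}
"""


def summarize(report_text: str) -> dict:
    """Count issues per severity and per rule ID."""
    issues = json.loads(report_text)["issues"]
    return {
        "by_severity": dict(Counter(i["severity"] for i in issues)),
        "by_rule": dict(Counter(i["rule"] for i in issues)),
    }


def has_blocking_issues(report_text: str, blocking=("error",)) -> bool:
    """Return True when any issue has a severity a CI job should fail on."""
    issues = json.loads(report_text)["issues"]
    return any(i["severity"] in blocking for i in issues)


summary = summarize(SAMPLE_REPORT)
print(summary)
print("fail build:", has_blocking_issues(SAMPLE_REPORT))
```

Because the rule IDs are stable, a consumer like this can key its own configuration (which rules to ignore, which to fail on) to them without parsing GEDCOM itself.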

provenance check runs a stricter consistency pass and writes a markdown report suitable for review or commit. Both commands are read-only; nothing mutates state. provenance fix is the mutating counterpart — date normalization (American order rewriting, FROM/TO ranges, BET/AND with month+year), name canonicalization, place-string normalization — and every fix invocation produces an AutoFix annotation log plus a timestamped backup, so any change is reversible.

A separate in-memory API, fix_gedcom_text(), takes a GEDCOM string and returns the normalized output without touching disk — the seam used by the Tauri app, the web build, and the Chrome extension to share one parser across every surface.

Dedupe, Merge, Review — the trust layer

provenance dedupe runs fuzzy-match duplicate detection across individuals and surfaces candidate pairs with a similarity score. The merge logic is relationship-safe: when a pair is collapsed, child and spouse references on both sides are retained, conflicting facts are preserved as alternates, and the entire operation is captured as a reversible diff against the pre-merge backup.
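
To make the idea of a similarity score concrete, here is a minimal sketch of name-based candidate pairing. It uses Python's standard difflib.SequenceMatcher as a stand-in; the source does not say which matching algorithm Provenance uses, and the 0.85 threshold and the helper names are assumptions. A real matcher would presumably also weigh dates, places, and relationships.

```python
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations


def fold(name: str) -> str:
    """Lowercase, strip combining diacritics, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())


def similarity(a: str, b: str) -> float:
    """Score two names between 0.0 and 1.0 after folding."""
    return SequenceMatcher(None, fold(a), fold(b)).ratio()


def candidate_pairs(people: dict, threshold: float = 0.85) -> list:
    """Return (xref_a, xref_b, score) for pairs at or above the threshold.

    `people` maps an xref to a display name. The threshold is an
    illustrative assumption, not a Provenance default.
    """
    pairs = []
    for (xa, na), (xb, nb) in combinations(sorted(people.items()), 2):
        score = similarity(na, nb)
        if score >= threshold:
            pairs.append((xa, xb, round(score, 3)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


PEOPLE = {
    "@I1@": "Jan Wójcik",
    "@I2@": "Jan Wojcik",
    "@I3@": "Rivka Goldberg",
}
pairs = candidate_pairs(PEOPLE)
print(pairs)
```

Folding diacritics before comparison means a transcription that dropped an accent still scores as a match, while the original spelling stays untouched in the record.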

provenance merge joins two GEDCOM files into one with automatic xref collision handling — the engineering equivalent of merging two researchers’ trees without manually renumbering every @I pointer. provenance review opens an interactive review session over the consistency-check output, letting a researcher walk issues one at a time, accept or annotate each, and produce a clean follow-up scan to confirm closure.
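
One simple way to avoid xref collisions when joining two files is to shift every pointer in the second file past the highest number used in the first. The sketch below shows that idea only; it is not the Provenance merge implementation. It assumes xrefs of the form @I12@ or @F3@, and it ignores HEAD/TRLR records and escaped @ characters in text values.

```python
import re

# Matches xrefs such as @I12@ or @F3@ (letters followed by digits).
XREF = re.compile(r"@([A-Za-z]+)(\d+)@")


def max_numbers(gedcom: str) -> dict:
    """Find the highest number used per xref prefix, e.g. {'I': 12, 'F': 3}."""
    highest = {}
    for prefix, num in XREF.findall(gedcom):
        highest[prefix] = max(highest.get(prefix, 0), int(num))
    return highest


def renumber(gedcom: str, offsets: dict) -> str:
    """Shift every xref by the offset for its prefix, definitions and pointers alike."""
    def shift(match):
        prefix, num = match.group(1), int(match.group(2))
        return f"@{prefix}{num + offsets.get(prefix, 0)}@"
    return XREF.sub(shift, gedcom)


def merge_records(first: str, second: str) -> str:
    """Append the second file's records after renumbering them past the first's."""
    shifted = renumber(second, max_numbers(first))
    return first.rstrip("\n") + "\n" + shifted.lstrip("\n")


FIRST = "0 @I1@ INDI\n1 NAME Jan /Wojcik/\n1 FAMS @F1@\n0 @F1@ FAM\n1 HUSB @I1@\n"
SECOND = "0 @I1@ INDI\n1 NAME Rivka /Goldberg/\n1 FAMS @F1@\n0 @F1@ FAM\n1 WIFE @I1@\n"

merged = merge_records(FIRST, SECOND)
print(merged)
```

Because definitions and pointers pass through the same substitution, references inside the second file stay consistent with each other after the shift.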

provenance stats rounds out the surface with tree statistics (individuals, families, surnames, date ranges) for quick orientation when picking up a file you haven’t opened in months. provenance export --filter-living applies living-person privacy rules — suppressing dates of birth, current locations, and identifying details for individuals presumed living — before producing a public-shareable file.
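
The living-person filter can be pictured as a predicate plus a redaction step. The sketch below is illustrative only: the Person type, the 100-year cutoff, and the "Living" placeholder name are assumptions, and the actual rules behind --filter-living may differ.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Person:
    # Hypothetical record type for this example, not the Provenance schema.
    xref: str
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    residence: Optional[str] = None


def presumed_living(p: Person, as_of_year: int, max_age: int = 100) -> bool:
    """Treat a person as living unless a death is recorded or they would exceed max_age."""
    if p.death_year is not None:
        return False
    if p.birth_year is None:
        return True  # no birth and no death on record: err toward privacy
    return as_of_year - p.birth_year < max_age


def redact(p: Person) -> Person:
    """Suppress identifying details while keeping the xref so links still resolve."""
    return replace(p, name="Living", birth_year=None, residence=None)


def filter_living(people: list, as_of_year: int) -> list:
    """Return a copy of the list with presumed-living people redacted."""
    return [redact(p) if presumed_living(p, as_of_year) else p for p in people]


PEOPLE = [
    Person("@I1@", "Jan Wojcik", 1850, 1920, "Kalush"),
    Person("@I2@", "Anna Wojcik", 1990, None, "Vancouver"),
]
public = filter_living(PEOPLE, as_of_year=2024)
print(public)
```

Keeping the xref on redacted records preserves the tree's shape in the exported file even when the details are withheld.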

Platform Components

Tauri 2 + SvelteKit 5 desktop / mobile shell

One Svelte codebase compiled to native bundles for macOS, Windows, and Linux through Tauri 2’s Rust backend, with the same code path producing Android and iOS builds via Tauri 2 mobile (in active development). The native shell handles file-system access, OS keychain integration, and the local data layer; the Svelte frontend handles the entire UI. The build matrix is wired into a single GitHub Actions release workflow triggered by an app-v* tag.

Web build for the browser

The same SvelteKit codebase compiled with VITE_WEB=true ships as a static web bundle for read-only collaboration, shareable lineage views, and the kind of tree exploration that doesn’t require local file write access. It is the public face of the platform — the URL you give a relative when you want them to see what you’ve assembled without asking them to install anything.

Chrome MV3 side-panel extension

A Manifest V3 Chrome extension with a side-panel UI that captures records mid-research from FamilySearch, Ancestry, FindAGrave, and other genealogy sites. The extension bridges to the desktop app over local port 19876, so captured records land directly in the active tree without an intermediate clipboard or import step. In active development.

@provenance/core shared TypeScript schema

The contract between the desktop app, the web build, the Chrome extension, and any downstream tooling. Shared types, the canonical record schema, and prepared queries live in one TypeScript package so an AI plugin, a custom report generator, or a third-party visualizer can target the platform without re-implementing the model. A small surface, a hard contract.

Implementation Details

Concrete engineering specifics — the contract surface that downstream tooling can rely on without surprise.

CLI distribution and install path

The Python CLI lives at tools/cli in the monorepo and installs in editable mode with pip install -e tools/cli. provenance --help lists all nine subcommands. It is tested against Python 3.11 and 3.12 in CI and is MIT-licensed.

Public Python APIs

normalize_gedcom_date_safe(text), fix_gedcom_text(gedcom_str), and scan_file(path) are stable public APIs as of v1.0.0. Each returns a well-defined result (a DateResult, a normalized GEDCOM string, or a list of typed issue records), never free-form text or an untyped dict. Downstream code can rely on the shape.

Date parsing rules

The date parser handles American date order (e.g. “January 15, 1850” → “15 JAN 1850”), FROM/TO and BET/AND patterns with month+year support, and GEDCOM-canonical month abbreviations. Its deterministic DateResult output records both the normalized form and the original string, so every rewrite stays reversible.
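
The rules above can be sketched in a few lines. This is a simplified stand-in, not the real normalize_gedcom_date_safe: the DateResult fields shown here are assumed, and only American order, month+year, and BET/AND are covered.

```python
import re
from dataclasses import dataclass
from typing import Optional

MONTHS = {
    "january": "JAN", "february": "FEB", "march": "MAR", "april": "APR",
    "may": "MAY", "june": "JUN", "july": "JUL", "august": "AUG",
    "september": "SEP", "october": "OCT", "november": "NOV", "december": "DEC",
}


@dataclass(frozen=True)
class DateResult:
    # Hypothetical stand-in for the real DateResult; field names are assumed.
    original: str
    normalized: Optional[str]  # None when the input is not recognized


_AMERICAN = re.compile(r"^\s*([A-Za-z]+)\.?\s+(\d{1,2}),\s*(\d{4})\s*$")
_MONTH_YEAR = re.compile(r"^\s*([A-Za-z]+)\.?\s+(\d{4})\s*$")
_RANGE = re.compile(r"^\s*(?:BETWEEN|BET)\s+(.+?)\s+AND\s+(.+?)\s*$", re.IGNORECASE)


def _month(token: str) -> Optional[str]:
    """Map a full month name or three-letter abbreviation to GEDCOM form."""
    token = token.lower()
    if token in MONTHS:
        return MONTHS[token]
    if len(token) == 3 and token.upper() in MONTHS.values():
        return token.upper()
    return None


def _single(text: str) -> Optional[str]:
    """Normalize one date: 'January 15, 1850' or 'March 1850'."""
    m = _AMERICAN.match(text)
    if m:
        mon = _month(m.group(1))
        return f"{int(m.group(2))} {mon} {m.group(3)}" if mon else None
    m = _MONTH_YEAR.match(text)
    if m:
        mon = _month(m.group(1))
        return f"{mon} {m.group(2)}" if mon else None
    return None


def normalize(text: str) -> DateResult:
    """Return the normalized date alongside the untouched original string."""
    m = _RANGE.match(text)
    if m:
        low, high = _single(m.group(1)), _single(m.group(2))
        return DateResult(text, f"BET {low} AND {high}" if low and high else None)
    return DateResult(text, _single(text))


print(normalize("January 15, 1850"))
print(normalize("BET March 1850 AND June 1852"))
```

Returning None for unrecognized input, rather than guessing, is what keeps the operation safe: the original string is always available to restore.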

Backup discipline

Every mutating CLI run — fix, dedupe, merge, export — produces a timestamped backup of the input file before touching it. Backups are local, named with the operation that produced them, and never deleted automatically. Rollback is a file copy.
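
A backup-then-mutate routine along these lines might look like the sketch below. The file naming scheme (stem, operation, UTC timestamp, .bak suffix) and the function names are assumptions for illustration; the source only states that backups are timestamped, named with the operation, and restored by file copy.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def backup_before(path: Path, operation: str, now: datetime = None) -> Path:
    """Copy `path` to a timestamped sibling file before a mutating run."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    # Hypothetical naming scheme: family.fix.20240102T030405Z.ged.bak
    backup = path.with_name(f"{path.stem}.{operation}.{stamp}{path.suffix}.bak")
    if backup.exists():
        raise FileExistsError(backup)  # never overwrite an earlier backup
    shutil.copy2(path, backup)
    return backup


def rollback(backup: Path, target: Path) -> None:
    """Restore the target from a backup; rollback is just a file copy."""
    shutil.copy2(backup, target)


with tempfile.TemporaryDirectory() as tmp:
    tree = Path(tmp) / "family.ged"
    tree.write_text("0 HEAD\n0 TRLR\n", encoding="utf-8")

    fixed_time = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
    saved = backup_before(tree, "fix", now=fixed_time)

    # Simulate a mutating run, then undo it.
    tree.write_text("0 HEAD\n1 NOTE changed\n0 TRLR\n", encoding="utf-8")
    rollback(saved, tree)

    backup_name = saved.name
    restored = tree.read_text(encoding="utf-8")

print(backup_name)
```

Passing the timestamp in as a parameter keeps the naming deterministic and testable.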

Release pipeline

Pushing an app-v* tag triggers a single GitHub Actions release workflow that produces macOS, Windows, Linux, Android, iOS, and web builds in parallel. Separately, the CI pipeline validates the Python CLI on 3.11 and 3.12 on every push.

Technology Stack

App Shell

  • Tauri 2 (Rust)
  • SvelteKit 5
  • TypeScript
  • Tauri 2 mobile
  • MV3 Chrome extension

Data Engine

  • Python 3.11 / 3.12
  • GEDCOM 5.5.1
  • Structured-issue JSON
  • DateResult type
  • Fuzzy match

Shared Contract

  • @provenance/core
  • Canonical schema
  • Prepared queries
  • Plugin contract

Distribution

  • GitHub Actions CI
  • app-v* release tag
  • VITE_WEB=true web build
  • pnpm dev / pnpm dev:web
  • pip install -e tools/cli
  • MIT

Why It Matters

Genealogy software is one of the most user-hostile categories in consumer computing — closed formats, cloud lock-in, and dated UIs that assume your tree is small and tidy. Provenance is the reset: open format, local-first, modern stack, real automation surface.

Data Sovereignty

Your tree is a file on your hardware in canonical GEDCOM 5.5.1, not a row in someone else’s database. Every operation is reversible. Living-person filtering on export is a flag, not a feature request.

Engineering Surface

Nine CLI subcommands plus three public Python APIs mean a researcher with scripting skills can automate batch operations across many trees — the kind of work that closed desktop products simply do not allow.

Modern UI on Every OS

One SvelteKit codebase, seven build targets, one release tag. The desktop app feels native because Tauri 2 ships native; the web build runs on any browser; the Chrome extension captures records mid-research without breaking flow.

Try Provenance

The web build of the desktop app is live. For desktop, mobile, the Python CLI, or partnership, talk to us.
