Versioneer | Domain Data Platform Engineering

Domain Platform Engineering

Turn domain knowledge into platform capabilities.

A useful data platform is not a generic place where every file is copied. It is a set of domain-aware workflows that let teams register, check, version, approve, and publish data products while keeping ownership close to the people who understand them.

Versioneer gives those workflows a technical foundation: storage connections, snapshots, access rules, lifecycle states, metadata profiles, workflow definitions, and live status. In Earth Observation this becomes an EO profile with STAC, product states, publication gates, and long-term archival handover. For other domains, the same pattern captures their own product rules. Publication becomes a governed transition, not a late file move.

Open source

Building blocks you can inspect and evolve.

DataLab, storage access, catalogs, credentials, policies, snapshots, and controllers can be adopted as composable infrastructure instead of a closed platform.

Expert services

Hands-on data platform engineering.

We help platform teams and data stewards define the domain profile: product model, lifecycle states, metadata, validation, access, publication gates, and the interfaces that make those rules usable by humans, pipelines, and agents.

Private SaaS

A control plane we operate with you.

Your data plane stays in your environment. Versioneer operates the control plane with your team, handling lifecycle automation, policy rollout, monitoring, updates, and support.

What We Believe

Put platform changes in code.

We have run cloud sandboxes for years. The lesson is simple and more important now: people, scripts, and AI agents all touch the same infrastructure. Domain rules should not live only in meetings and memory. Important changes should be written down, reviewed, applied automatically, and limited by policy.

Declarative

Describe what should exist.

Workspaces, data products, access, lifecycle states, credentials, storage, and services should be defined in one clear form, not scattered across tickets and manual scripts.

Kubernetes-first

Use Kubernetes as the common base.

Kubernetes gives teams a shared way to run namespaces, policies, services, storage, sandboxes, agents, and the custom resources that describe them.

GitOps-first

Use Git for platform changes.

The desired state should be versioned, reviewed, and easy to trace. Controllers can then keep applying it without relying on hand-run commands.

Policy-first

Put rules in front of the work.

Identity, permissions, quotas, security, approvals, and data lifecycle changes should be checked before work starts, especially when agents can act faster than humans can review.
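As a rough illustration of checking a rule before work starts, the sketch below admits or rejects a storage request against a quota. The function names, the request shape, and the limited unit handling (only `Gi` and `Ti`) are assumptions made for this example, not Versioneer's actual admission logic.

```python
# Hypothetical pre-flight quota check; names and units are illustrative only.
UNITS_IN_GI = {"Gi": 1, "Ti": 1024}


def to_gi(quantity: str) -> int:
    """Convert a quantity such as '500Gi' or '2Ti' to whole gibibytes."""
    for suffix, factor in UNITS_IN_GI.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity}")


def admit(requests: dict, quota_storage: str) -> tuple:
    """Return (allowed, reason) before any resource is created."""
    requested = sum(to_gi(q) for q in requests.values())
    limit = to_gi(quota_storage)
    if requested > limit:
        return (False, f"requested {requested}Gi exceeds quota {limit}Gi")
    return (True, f"requested {requested}Gi within quota {limit}Gi")


ok = admit({"registry": "500Gi", "postgres": "250Gi"}, "1Ti")
denied = admit({"archive": "2Ti"}, "1Ti")
```

The point of the design is ordering: the decision is made from the declared request alone, so it applies equally to a human, a pipeline, or an agent.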

01

Early registration

Register datasets while they are still changing. Do not wait until publication, do not force a full transfer first, and do not manage the lifecycle in spreadsheets and tickets.

02

Clear lifecycle

Keep working data separate from immutable published snapshots, with clear states such as staged, committed, and published.
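A minimal sketch of such a lifecycle as a small state machine. The state names come from the text above; the allowed transitions, the `Snapshot` class, and the product name are assumptions for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical transition table: which moves are allowed is an assumption.
ALLOWED = {
    "staged": {"committed"},
    "committed": {"published"},
    "published": set(),  # published snapshots are treated as immutable
}


@dataclass
class Snapshot:
    name: str
    state: str = "staged"
    history: list = field(default_factory=list)

    def promote(self, target: str) -> None:
        """Move to the target state, or raise if the move is not allowed."""
        if target not in ALLOWED.get(self.state, set()):
            raise ValueError(f"cannot move {self.name} from {self.state} to {target}")
        self.history.append((self.state, target))
        self.state = target


snap = Snapshot("example-product")
snap.promote("committed")
snap.promote("published")
```

Because `published` has no outgoing transitions, any later attempt to move the snapshot raises an error instead of silently changing published data.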

03

Domain profile

Encode product structure, metadata, validation, lifecycle states, access policy, and publication checks once. EO can have an EO profile; other domains get their own profile.
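One way to picture a profile is as a small, reusable object that carries the rules and can be asked to validate a product. The sketch below is illustrative only: the `DomainProfile` class, the metadata keys, and the license check are assumptions, not the actual EO profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainProfile:
    name: str
    required_metadata: tuple
    lifecycle_states: tuple
    publication_checks: tuple  # callables: metadata dict -> error string or None

    def validate(self, metadata: dict) -> list:
        """Collect every problem that would block publication."""
        errors = [
            f"missing metadata: {key}"
            for key in self.required_metadata
            if key not in metadata
        ]
        for check in self.publication_checks:
            message = check(metadata)
            if message:
                errors.append(message)
        return errors


def has_license(metadata: dict):
    """Hypothetical publication check: a license must be declared."""
    return None if metadata.get("license") else "license must be set before publication"


# Illustrative profile; the keys below are assumed, not taken from Versioneer.
eo_profile = DomainProfile(
    name="eo",
    required_metadata=("id", "datetime", "bbox"),
    lifecycle_states=("staged", "committed", "published"),
    publication_checks=(has_license,),
)

errors = eo_profile.validate({"id": "scene-001", "datetime": "2024-01-01T00:00:00Z"})
```

Another domain would define its own `DomainProfile` instance with different keys and checks while reusing the same validation path.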

04

Workflow as code

Describe inspection, validation, snapshots, publication, and permissions as code so controllers can apply them and keep status visible to facilitators and platform teams.

How We Facilitate

Composable building blocks, shaped into your domain profile.

Where we are involved

Our Earth Observation work with EOX, EarthCODE, and EOEPCA is one concrete profile. The same platform pattern applies wherever domain teams need governed products, shared infrastructure, and reproducible publication.

Company collaboration: EOX
Earth observation data space: ESA EarthCODE
Open platform ecosystem: EOEPCA

Domain profiles, not generic workflows

We translate domain rules into reusable contracts: product models, metadata expectations, validation hooks, lifecycle states, access policies, and publication gates. The EO profile for Earth Observation is the example we are proving in practice.

Run by facilitators close to users

Data stewards, platform teams, and domain leads should own the operating context. We support them with implementation, automation, and hands-on engineering so stewardship becomes visible, repeatable, and less manual.

Works with what you already have

Existing object storage, shared filesystems, scientific tools, machine learning libraries, and cloud infrastructure can be connected without forcing teams into one new system. The control plane coordinates lifecycle and policy without requiring repeated full-dataset copies or a central copy of all data.

Explore Versioneer open source

datalab.yaml DataLab example

apiVersion: pkg.internal/v1beta2
kind: Datalab
metadata:
  name: s-research-team
spec:
  users: [jane, jim, john]
  sessions: [{name: default, state: started}, {name: analysis, state: stopped}]
  vcluster: true
  persistence:
    storageClassName: sbs-default-retain
  data:
    readOnlyMount: true
  quota:
    memory: 64Gi
    storage: 2Ti
    budget: x-large
  registry: # OCI container registry
    enabled: true
    storage: 500Gi
  security:
    policy: privileged
    kubernetesRole: admin
    kubernetesAccess: false
  databases: # PostgreSQL
    pg0:
      names: [analytics, dev, prod]
      storage: 250Gi
      backupStorage: 750Gi
  documentStores: # MongoDB
    prod:
      storage: 200Gi
  cacheStores: # Redis
    prod:
      storage: 100Gi
  vectorStores: # Qdrant
    prod:
      storage: 50Gi

Source: DataLab examples on GitHub

DataLab Foundation

One manifest for governed domain workspaces.

The DataLab creates shared cloud workspaces on Kubernetes for the people and agents working closest to the data. A single Datalab claim says who can enter, which sessions exist, whether the lab gets its own virtual cluster, which storage credentials are mounted, and which services are available.

The important part is the Kubernetes resource model behind the manifest. The claim is a durable platform contract with metadata, desired state, observed state, labels, RBAC, admission checks, audit, reconciliation, and status. Teams get self-service inside a bounded workspace; the platform keeps governance around it.

This is the welcome experience we want for data engineering: humans and agents arrive in the same governed sandbox, with domain data, tools, storage access, quotas, security settings, and databases already in place.

For humans

Researchers, engineers, and stewards get a ready lab with storage, services, permissions, and enough room to work.

For agents

Agents can use the same governed sandbox with scoped credentials, quotas, services, and policy boundaries.

Define personal or shared labs with users, admins, and session state.
Attach storage, durable workspaces, credentials, read-only data mounts, and policy-controlled services.
Expose status that humans can read, controllers can react to, GitOps can wait on, and agents can reason over.

Our Offering

We can operate a control plane in your cloud.

Versioneer can run the control plane where your organization needs it: inside your cloud and governance boundary. The data plane remains your storage, compute, identity, network, and workspaces. Domain facilitators stay close to end users while Versioneer handles lifecycle automation, policy rollout, monitoring, updates, and support.

Register → Inspect → Stage → Commit snapshot → Publish

Customer data plane

Your clouds, data, and domain workspaces.

Storage, compute, identity, network boundaries, sensitive data, pipelines, and DataLab workspaces stay close to the teams, stewards, and platforms that use them.

Versioneer control plane

Operate lifecycle, policy, and publication.

Versioneer coordinates snapshots, validation, monitoring, policy changes, updates, and support without becoming the place where all data must be copied.

Contract

Data Product Model

States, metadata, versions, policies, publication rules, and domain profiles.

Interface

APIs & CRDs

Actions for validation, promotion, publication, archive, and access.

Storage

Federated Storage

Several buckets and storage systems with shared identity and rules, without forcing one central copy.

Control

Reconciliation

Controllers compare declared state with the live system, then run validation, indexing, copying, lifecycle changes, policy updates, and status checks.
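The comparison step can be sketched as a plain function that turns declared and observed state into a list of actions. This is a simplified, in-memory illustration under assumed data shapes (dicts keyed by product name); a real controller would act on the live system and report status instead.

```python
def plan_actions(declared: dict, observed: dict) -> list:
    """Compare declared and observed state and list what must change."""
    actions = []
    for name, want in sorted(declared.items()):
        if name not in observed:
            actions.append(("create", name, want))
        elif observed[name] != want:
            actions.append(("update", name, want))
    for name in sorted(observed.keys() - declared.keys()):
        actions.append(("delete", name, None))
    return actions


def reconcile(declared: dict, observed: dict) -> list:
    """Apply the planned actions to the observed state and return them."""
    applied = plan_actions(declared, observed)
    for verb, name, want in applied:
        if verb == "delete":
            observed.pop(name)
        else:
            observed[name] = dict(want)
    return applied


# Hypothetical products and states, used only to exercise the loop.
declared = {"a": {"state": "published"}, "b": {"state": "staged"}}
observed = {"a": {"state": "committed"}, "c": {"state": "staged"}}

first = reconcile(declared, observed)
second = reconcile(declared, observed)
```

Running the loop a second time yields no actions, which is the property that lets a controller keep applying the declared state safely.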

Publish

Publication

Metadata, catalog updates, webhooks, and long-term storage steps.

Policy

Governance & Access

Public, embargoed, and licensed access, built on common OIDC (identity) and STS (short-lived credential) concepts, with secrets handled safely.
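A hedged sketch of how the three access modes might be decided. The field names (`access`, `embargo_until`, `owners`, `license_id`, `licenses`) are assumptions for this example; in a real deployment the user attributes would come from verified identity claims, not a plain dict.

```python
from datetime import date


def can_read(product: dict, user: dict, today: date) -> bool:
    """Decide read access for one product under an assumed policy shape."""
    access = product["access"]
    if access == "public":
        return True
    if access == "embargoed":
        # Owners can read during the embargo; everyone can read once it ends.
        return today >= product["embargo_until"] or user["id"] in product["owners"]
    if access == "licensed":
        return product["license_id"] in user.get("licenses", ())
    return False  # unknown access modes are denied


embargoed = {"access": "embargoed", "embargo_until": date(2025, 1, 1), "owners": {"jane"}}
licensed = {"access": "licensed", "license_id": "lic-42"}

during_embargo = can_read(embargoed, {"id": "jim"}, date(2024, 6, 1))
owner_during_embargo = can_read(embargoed, {"id": "jane"}, date(2024, 6, 1))
after_embargo = can_read(embargoed, {"id": "jim"}, date(2025, 1, 2))
```

Denying unknown modes by default keeps a typo in a policy from accidentally opening a product.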

History

Audit Log

Git history, lifecycle events, current status, and user/API views.

Change

Changing Datasets

Version graphs, delta copy, reused unchanged content, and clear change reports.
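Delta copy and content reuse can be illustrated with content hashes: two versions are compared by digest, and only new or changed files need to move. This is a minimal sketch with in-memory bytes and assumed file names, not a description of how Versioneer stores versions.

```python
import hashlib


def manifest(files: dict) -> dict:
    """Map each path to the SHA-256 digest of its bytes."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def change_report(old: dict, new: dict) -> dict:
    """Classify paths by comparing digests between two version manifests."""
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        "changed": sorted(p for p in new if p in old and new[p] != old[p]),
        "reused": sorted(p for p in new if p in old and new[p] == old[p]),
    }


# Hypothetical dataset contents for two versions.
v1 = manifest({"a.tif": b"aaa", "b.tif": b"bbb"})
v2 = manifest({"a.tif": b"aaa", "b.tif": b"BBB", "c.tif": b"ccc"})

report = change_report(v1, v2)
to_copy = report["added"] + report["changed"]  # only these need transferring
```

Here `a.tif` is unchanged, so the new version can reference the existing content, and the report doubles as a readable summary of what changed.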

Delivery

Service Packaging

Reusable Helm or Kustomize packages, docs, reference deployments, and demos.

Data platforms built for your domain, in your cloud.