
The hard part shouldn't be getting to the data

Researchers lose months fighting through access requests, approvals, and IT hoops just to start an analysis. BioVault cuts through the red tape: a free, open-source platform that lets you run pipelines directly on distributed data with end-to-end encryption and secure enclaves. Using data visitation and optional human-in-the-loop approvals, data owners can rest easy knowing they never give up ownership — raw data is never uploaded or exposed via open ports. No centralizing, no delays — just research projects starting in days, not months.

Join the Beta and get:

  • Personal help setting up BioVault for your research
  • Introductions to collaborators and patient groups
  • A direct say in our development, so we solve your problems first

For Researchers

Are you a researcher working with sensitive datasets, such as whole genome sequences, clinical records, or phenotype data? You want to collaborate with other scientists, but you can't just hand over your raw data. That's where BioVault comes in.

  1. Create your private BioVault — set up a secure workspace for your sensitive data (e.g., patient genomes or clinical records) that only you control.
    $ bv init my_patient_data
    ✓ BioVault "my_patient_data" created locally
  2. Add your data — place sequencing files (FASTQ, BAM/CRAM, VCF, phenotype data, etc.) into the vault. The files stay on your machine and never leave your control.
    $ bv add ./data/patient1.vcf --vault my_patient_data
    ✓ Added 1 file to BioVault "my_patient_data"
  3. Collaborator prepares code — another researcher builds and tests a workflow or arbitrary code against your vault (e.g., GWAS, splicing analysis, or custom scripts).
    $ bv vault list
    ✓ Listing Vaults
    $ bv run ./variant_discovery_analysis syft://research@my_patient_data#participants --test
    ✓ Testing workflow on target vault
  4. Workflow is submitted for your review — the collaborator sends their code to your inbox. You see exactly what will run before deciding whether to approve.
    $ bv submit ./variant_discovery_analysis syft://research@my_patient_data#participants
    ⚡ Analysis Submitted
    $ bv inbox
    1 pending request: Collaborator@University → pipeline: variant_discovery_analysis.nf
  5. Approve and run locally — once approved, the workflow or arbitrary code runs inside your BioVault. The raw data never leaves your machine; only results are shared back.
    $ bv approve variant_discovery_analysis
    ✓ Approved. Running NextFlow pipeline securely...
    $ bv inbox
    Project                      Status       Path
    variant_discovery_analysis   ✅ Success   ./results/variant_discovery_out.csv
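
The review-then-run flow in the steps above can be sketched in a few lines of Python. This is a toy model, not BioVault's implementation or API: `ToyVault`, `Request`, and the `analyze` entry point are hypothetical names, and tying approval to a SHA-256 digest of the submitted code is an assumption made here to illustrate "you see exactly what will run". The `exec` call stands in for a real pipeline runner and is not sandboxed.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Request:
    sender: str
    name: str
    source: str              # code the collaborator wants to run
    status: str = "pending"

    @property
    def digest(self) -> str:
        # Fingerprint of the exact code that was submitted.
        return hashlib.sha256(self.source.encode("utf-8")).hexdigest()


@dataclass
class ToyVault:
    records: List[dict]                                      # raw data, stays here
    inbox: Dict[str, Request] = field(default_factory=dict)
    approved: Dict[str, str] = field(default_factory=dict)   # name -> approved digest

    def submit(self, request: Request) -> None:
        self.inbox[request.name] = request

    def approve(self, name: str) -> None:
        request = self.inbox[name]
        request.status = "approved"
        self.approved[name] = request.digest

    def run(self, name: str) -> dict:
        request = self.inbox[name]
        # Refuse anything unapproved, or changed since it was approved.
        if self.approved.get(name) != request.digest:
            raise PermissionError(f"{name} is not approved in its current form")
        namespace: dict = {}
        exec(request.source, namespace)   # illustration only, no sandboxing
        result = namespace["analyze"](self.records)
        request.status = "success"
        return result                     # only the result leaves the vault


# Hypothetical analysis: count participants carrying a non-reference allele.
SOURCE = '''
def analyze(records):
    carriers = sum(1 for r in records if r["genotype"] != "0/0")
    return {"n": len(records), "carriers": carriers}
'''

if __name__ == "__main__":
    vault = ToyVault(records=[
        {"id": "p1", "genotype": "0/1"},
        {"id": "p2", "genotype": "0/0"},
        {"id": "p3", "genotype": "1/1"},
    ])
    vault.submit(Request("collaborator@university", "variant_discovery_analysis", SOURCE))
    vault.approve("variant_discovery_analysis")
    print(vault.run("variant_discovery_analysis"))
```

In this sketch the collaborator only ever receives the summary dictionary; the per-participant records are never returned, and code that is edited after approval is rejected until it is reviewed again.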

BioVault is powered by SyftBox, an open-source protocol from OpenMined for privacy-preserving remote data science. Instead of moving sensitive datasets to outside servers, SyftBox enables data visitation: code travels securely to where the data lives, runs locally, and only the results are shared back. This technology is already proven in industry for secure distributed computation, and BioVault applies it directly to genomics and biomedical research.
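
To make data visitation concrete, here is a minimal Python sketch of one analysis visiting two sites and pooling only their summaries. It does not use SyftBox or BioVault; the site names, the `visit` helper, and the genotype lists are invented for illustration, and a real deployment would add the encryption, transport, and approval steps described on this page.

```python
from typing import Callable, Dict, List

Genotypes = List[str]          # diploid genotype calls such as "0/1" or "1|1"
Summary = Dict[str, int]


def count_alt_alleles(genotypes: Genotypes) -> Summary:
    """Runs next to the data and returns aggregate counts only."""
    alt = sum(g.replace("|", "/").split("/").count("1") for g in genotypes)
    return {"alt": alt, "total": 2 * len(genotypes)}


def visit(sites: Dict[str, Genotypes],
          analysis: Callable[[Genotypes], Summary]) -> List[Summary]:
    """Send the analysis to each site; collect one summary per site."""
    return [analysis(local_data) for local_data in sites.values()]


def pooled_frequency(summaries: List[Summary]) -> float:
    """Combine per-site counts into one alternate-allele frequency."""
    alt = sum(s["alt"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return alt / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sites; in practice each list would live on a different machine.
    sites = {
        "clinic_a": ["0/1", "0/0", "1/1"],
        "clinic_b": ["0/0", "0/1"],
    }
    summaries = visit(sites, count_alt_alleles)
    print(summaries)
    print(pooled_frequency(summaries))
```

The researcher sees two small dictionaries of counts and the pooled frequency, while the individual genotype calls stay with each site.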

Collaboration Without Surrendering Control

You Stay in Control

Run pipelines on sensitive datasets without moving raw files—code travels to the data, not the other way around.

Privacy First

Protect participants with end-to-end encryption and secure enclaves for joint analysis without centralizing data.

Global Equity

Empower researchers and patient groups everywhere—including the Global South and under-resourced labs—by keeping data under local control while still enabling insights to be shared worldwide.

Reproducible Science

Standardized workflows and versioned pipelines ensure results can be validated and reproduced across labs.

Fast, Remote Data Science

Perform remote data visitation at scale—submit pipelines across multiple vaults and get results back quickly.

Open and Transparent

Open-source and free to use. No gatekeepers, no hidden costs—just science moving forward.

Join as a Beta Tester

BioVault is part of a pilot initiative to make data free and open access for research while protecting the rights of individuals whose samples power these datasets. Your participation helps prove that global collaboration in science and genomics is possible without ceding control of data to centralized institutions or big tech companies.

Frequently Asked Questions

What is BioVault?

BioVault is a platform for secure, privacy-preserving genomics research. It enables data visitation—where code travels to the data, runs locally, and only results are shared. This allows rare disease groups, biobanks, and population genomics researchers to collaborate globally without surrendering control of sensitive datasets.

Who should join the Beta?

We're looking for rare disease organizations, population genomics projects, and research teams who want to collaborate securely across institutions and borders. If you steward valuable datasets but need to protect participant privacy, BioVault is built for you.

Is BioVault free to use?

Yes. BioVault is open-source and free under the Apache 2.0 License. Our goal is to advance science and diagnosis by making genomic data accessible for research while ensuring equity—especially for communities often excluded from large-scale projects.

How does BioVault protect privacy?

Your raw data never leaves your control. BioVault uses end-to-end encryption, secure enclaves, and user approval for every analysis request. Collaborators receive results, not your underlying data.

What is SyftBox and why does it matter?

SyftBox is an open-source protocol from OpenMined that powers privacy-preserving remote data science. It enables distributed computations across datasets without centralizing them—technology already trusted in industry and now applied to genomics through BioVault.

Can this work across countries and under-resourced labs?

Yes. BioVault is designed for global collaboration—including researchers in the Global South. Because data never leaves local machines, institutions can participate without losing sovereignty over their datasets, helping ensure genomic equity worldwide.