
FAIR Data Compliance Without a Data Manager

Most research teams promised FAIR-aligned data in the proposal and never built the practice. How to make FAIR compliance real without a dedicated data manager.

Published 8 April 2026 · 7 min read

The Data Management Plan you submitted with the proposal said your project's outputs would be FAIR — Findable, Accessible, Interoperable, Reusable. Two and a half years later, the actual state of the project's data is: scattered across team laptops, undocumented schemas, no persistent identifiers, no reuse licence, and no plan for archival deposition. The reviewer's question — "where can a researcher access your cleaned dataset?" — does not have a clean answer. The DMP was aspirational. The execution never happened.

This is the most common pattern in FAIR data compliance for grant-funded research. The principles are well-known. The practices that produce FAIR-aligned outputs are not, especially for teams without dedicated data-management headcount. This post is a practical guide to making FAIR compliance real without hiring a data manager.

What FAIR actually requires (in operational terms)

FAIR names four properties research data should have. In practice, each letter translates to a small set of concrete operational requirements.

F — Findable

The data needs to be discoverable by anyone who might want to reuse it. This means:

  • A persistent identifier (typically a DOI from Zenodo, your institutional repository, or a domain-specific archive)
  • Structured metadata that describes what the data is, who produced it, when, and under what conditions
  • A listing in a searchable repository that other researchers actually search

In operational terms: deposit your dataset in Zenodo or equivalent, with a complete metadata form, and you've satisfied F.

A — Accessible

The data needs to be retrievable using standard protocols, with clear access controls.

  • Open access if the data is non-sensitive and your funder permits
  • Authenticated access with documented procedures if the data is restricted (clinical, personal, commercially sensitive)
  • Long-term availability — the data needs to be retrievable years after the project ends

The metadata should be openly accessible even when the data itself is restricted. A reviewer should be able to see that the dataset exists and what it contains, even if they can't download the data without an access request.

I — Interoperable

The data needs to use formats and vocabularies that machines (and other research teams) can read.

  • Open file formats (CSV, JSON, Parquet, NetCDF — not proprietary Excel-only formats)
  • Standard vocabularies for variable names where they exist in your domain (ontologies, controlled vocabularies)
  • Schema documentation so a reuser knows what each column means without having to ask you

This is the letter most research teams underweight. Interoperability is what makes the dataset useful to anyone outside your team.

R — Reusable

The data needs to come with the licence terms and provenance information that allow reuse.

  • Clear reuse licence (Creative Commons, Open Data Commons, custom licence — not "open access" in the abstract)
  • Provenance metadata describing how the data was collected, processed, and cleaned
  • Data dictionary documenting variables, units, codes, and decision rules

Without this, technically-FAIR data is practically unreusable. The licence and the data dictionary are non-negotiable.
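A data dictionary doesn't need tooling: a plain CSV that travels with the dataset is enough, and it stays open-format by construction. A minimal sketch in Python, where the variable names, ranges, and decision rules are all hypothetical examples, not a standard:

```python
import csv

# Hypothetical data-dictionary rows for an illustrative clinical dataset.
ROWS = [
    {"variable": "subject_id", "type": "string", "units": "",
     "allowed_values": "S000-S999", "source": "recruitment log",
     "decision_rules": "assigned at enrolment; never reused"},
    {"variable": "visit_date", "type": "date (ISO 8601)", "units": "",
     "allowed_values": "2023-01-01..2025-12-31", "source": "clinic form",
     "decision_rules": "ambiguous dates resolved against appointment system"},
    {"variable": "hba1c", "type": "float", "units": "mmol/mol",
     "allowed_values": "20-130", "source": "lab export",
     "decision_rules": "values outside range set to missing and logged"},
]

def write_data_dictionary(path, rows):
    """Write the data dictionary as plain CSV so it remains an open format."""
    fieldnames = ["variable", "type", "units", "allowed_values",
                  "source", "decision_rules"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

The point of the `decision_rules` column is that it captures exactly the knowledge that otherwise lives only in the head of whoever cleaned the data.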

The five-stage research data lifecycle and where FAIR fits

| Stage | What FAIR requires | What teams without a data manager often miss |
|---|---|---|
| Plan | DMP aligned with funder template, FAIR principles named explicitly | DMP written once at proposal, never revisited |
| Collect | Schema discipline, metadata at ingestion | Schemas drift; metadata captured retroactively |
| Process | Reproducible cleaning with provenance | Cleaning in Excel with no record of decisions |
| Analyse | Versioned analysis tied to specific data versions | Analyses cite "the cleaned dataset" without specifying which one |
| Archive | Deposition with metadata, DOI, licence | Archive deposit rushed in the last week, often incomplete |

The research data lifecycle is where FAIR either gets implemented or doesn't. Most teams without dedicated data-management roles do well on Plan and Collect, struggle on Process and Analyse, and rush Archive. The single highest-leverage investment is treating Process as code, not as ad-hoc work.

A 12-step FAIR readiness audit you can run today

Block two hours. Open your project drive and your DMP.

  1. Find your DMP. Read it. Note every FAIR commitment it makes.
  2. Identify the canonical version of your dataset. If there are three "cleaned final" files, pick one and rename the others.
  3. Open your data dictionary. If it doesn't exist, draft one now: variable name, type, units, allowed values, source, decision rules.
  4. Find the cleaning script. If cleaning is manual, document the manual steps in a script-shaped checklist that another person could follow.
  5. Confirm the file format. Are your data files in open formats (CSV, Parquet, NetCDF)? If not, plan a conversion.
  6. Choose a licence. CC BY 4.0 is a common default for non-sensitive research data. CC0 if you want maximal reuse. Custom licences for sensitive data with a documented justification.
  7. Identify your archive target. Zenodo for general-purpose deposition; your institutional repository if it's discoverable; domain-specific archives where they exist (NCBI, EBI, neurodata.io, ICPSR).
  8. Draft your repository metadata. Title, authors, contributors, abstract, keywords, related publications, funder, grant number.
  9. Plan access controls if data is restricted. Document who can request access, on what conditions, through what process.
  10. Document provenance. How the data was collected, by whom, with what protocols, and what cleaning was applied.
  11. Test reusability. Can a colleague who didn't collect the data understand it from the metadata + dictionary alone? If not, fix the gap.
  12. Schedule the actual deposition. Don't leave it to the last week. Deposit a near-final version 4–6 weeks before closeout, and update if needed.
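Parts of the audit are scriptable. A rough readiness check, stdlib only; the required file names and format lists below are illustrative conventions, not a funder requirement:

```python
from pathlib import Path

# Assumed conventions for this sketch: adjust the names to your project.
REQUIRED_FILES = ["README.md", "LICENSE.txt", "data_dictionary.csv"]
PROPRIETARY = {".xlsx", ".xls", ".sav", ".dta", ".mat"}

def fair_readiness_gaps(project_dir):
    """Return a list of audit gaps found under project_dir."""
    root = Path(project_dir)
    gaps = [f"missing {name}" for name in REQUIRED_FILES
            if not (root / name).exists()]
    for f in sorted(root.rglob("*")):
        if f.is_file() and f.suffix.lower() in PROPRIETARY:
            gaps.append(f"proprietary format: {f.name}")
    return gaps
```

An empty return value doesn't prove the project is FAIR, but a non-empty one gives you a concrete to-do list for step 12.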

Two hours of focused work usually moves a project from "FAIR aspirational" to "FAIR mostly real". The gaps that remain are usually specific and addressable.

The implementation gap most teams miss

The biggest practical gap between a FAIR-aligned project and a non-FAIR project isn't the principles — it's the engineering practices that make the principles deliverable.

Schema validation as code

Define your schemas (column names, types, allowed values) in code that runs at ingestion. A simple Pydantic model or JSON Schema definition catches drift on day one rather than month six. Without this, your "structured" data accumulates inconsistencies that cleaning has to retroactively fix.
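The same idea in a dependency-free sketch — a Pydantic model or JSON Schema definition is the more scalable version of this — with illustrative field names and rules:

```python
# Minimal row-level schema check run at ingestion. Field names, ranges,
# and allowed values are hypothetical examples.
SCHEMA = {
    "subject_id": {"type": str, "required": True},
    "age": {"type": int, "required": True, "min": 0, "max": 120},
    "group": {"type": str, "required": True,
              "allowed": {"control", "treatment"}},
}

def validate_row(row, schema=SCHEMA):
    """Return a list of violations; an empty list means the row passes."""
    errors = []
    for field, rules in schema.items():
        if field not in row or row[field] is None:
            if rules.get("required"):
                errors.append(f"{field}: missing")
            continue
        value = row[field]
        if not isinstance(value, rules["type"]):
            errors.append(f"{field}: expected {rules['type'].__name__}")
            continue
        if "min" in rules and value < rules["min"]:
            errors.append(f"{field}: below {rules['min']}")
        if "max" in rules and value > rules["max"]:
            errors.append(f"{field}: above {rules['max']}")
        if "allowed" in rules and value not in rules["allowed"]:
            errors.append(f"{field}: not in {sorted(rules['allowed'])}")
    return errors
```

Run this on every row as data arrives and log the failures; a rejected row on day one is far cheaper than a silent inconsistency discovered in month six.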

Cleaning as code

Even simple cleaning — dropping missing values, standardising date formats, deduplicating subjects — should be in a versioned script, runnable end-to-end. The script is the provenance documentation. Without it, "the cleaned dataset" is a snapshot of someone's manual Excel work that nobody can reproduce.
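A cleaning script of this shape can be very small. A stdlib-only sketch, where the column names and the list of accepted date formats are assumptions about a hypothetical raw file:

```python
from datetime import datetime

# Date formats observed in the raw data (assumed for this sketch).
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y"]

def standardise_date(raw):
    """Normalise a date string to ISO 8601; unparseable values become ''."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return ""  # missing rather than guessed, so the decision stays visible

def clean(rows):
    """Drop rows without an identifier, deduplicate on subject_id
    (keeping the first occurrence), and standardise dates."""
    seen, out = set(), []
    for row in rows:
        if not row.get("subject_id"):
            continue
        if row["subject_id"] in seen:
            continue
        seen.add(row["subject_id"])
        row["visit_date"] = standardise_date(row.get("visit_date", ""))
        out.append(row)
    return out
```

Because every rule is a line of code, the script doubles as the provenance record: a reuser can read exactly what happened between "raw" and "cleaned".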

Analysis tied to specific data versions

Each analysis should reference the specific version of the dataset it ran on. When the dataset gets updated, analyses can be re-run against the new version, with documented changes. Without this, the link between published figures and the underlying data degrades over time.
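One low-tech way to pin an analysis to a data version is to record the dataset file's SHA-256 alongside the analysis outputs. A sketch, with hypothetical file and record names:

```python
import hashlib
import json
from pathlib import Path

def dataset_fingerprint(path):
    """SHA-256 of the exact file an analysis ran on."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def record_provenance(data_path, out_path="analysis_provenance.json"):
    """Write a small provenance record next to the analysis outputs."""
    record = {
        "dataset": str(data_path),
        "sha256": dataset_fingerprint(data_path),
    }
    Path(out_path).write_text(json.dumps(record, indent=2))
    return record
```

If the fingerprint in a published figure's provenance record no longer matches the current file, you know immediately that the analysis needs re-running.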

Documentation as the work happens

Documentation written at the deadline is bad documentation. The README, data dictionary, and decision log should grow alongside the data, not be assembled in the last week. A team without a data manager can still do this — it just requires the discipline to update the README every time a meaningful data decision is made.

FAIR for sensitive data

The FAIR principles apply even when the data is restricted. The metadata should still be findable and accessible; the access controls just gate the data itself.

For projects working with clinical data, personal data, or commercially sensitive data:

  • The dataset metadata is public even if the data is not. Findability requires this.
  • Access procedures are documented and discoverable. A reviewer should know how to request access.
  • The data dictionary is public so a potential reuser knows whether the data is relevant before requesting access.
  • De-identification, where possible, is standard. A de-identified version of the data may be openly publishable even when raw data is not.
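A common building block for de-identification is keyed pseudonymisation of subject identifiers. A sketch — the secret key is hypothetical and must be stored outside the dataset, and this is one technique, not a substitute for a formal de-identification review:

```python
import hashlib
import hmac

# Hypothetical per-project key, kept out of the repository and the dataset.
# Keyed hashing (HMAC) blocks the dictionary attacks that plain hashing
# of short identifiers allows.
PROJECT_SECRET = b"replace-with-a-real-secret-kept-out-of-the-repo"

def pseudonymise(identifier, secret=PROJECT_SECRET):
    """Deterministic pseudonym: same input always maps to the same token."""
    return hmac.new(secret, identifier.encode(), hashlib.sha256).hexdigest()[:16]
```

Determinism matters: the same subject gets the same pseudonym across files, so the de-identified dataset stays joinable without exposing the original identifiers.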

This is increasingly funder-required. Horizon Europe's "as open as possible, as closed as necessary" principle is the explicit framing.

Where Pragma fits

We implement FAIR-aligned research data management as a service for grant-funded teams that don't have dedicated data-management headcount. A typical engagement runs 4–8 weeks: schema validation as code, reproducible cleaning pipelines, FAIR metadata structures, archive deposition, handover documentation. The output is infrastructure your team owns and operates without further dependency.

If your DMP committed to FAIR and the practices to deliver it haven't materialised, that's the engagement we exist for. We've done this work for academic research collaborations, EU-funded consortia, and health-domain projects across multiple sites.

Three things to do this week

  1. Run the 12-step audit above. Note your top three gaps.
  2. Pick the most blocking gap and define the smallest fix that closes it. Often it's writing the data dictionary or testing the cleaning script end-to-end — not a multi-week project.
  3. If the cumulative gap exceeds your team's available capacity, request a scope review. We'll be honest about what's worth implementing externally and what your team should keep.

FAIR data compliance without a data manager is doable. It's a discipline, not a job title. The trick is treating data work as code-shaped engineering, building the practices once, and committing to them through the project lifecycle. The DMP's commitments stop being aspirational and start being operationally true.