Research Data Management Without a Full-Time Data Manager

How grant-funded research teams handle the full data-management lifecycle — collection, cleaning, FAIR, handover — without a dedicated data manager.

Published 11 November 2025 · 7 min read

Most grant-funded research projects need someone who looks after the data. Fewer of them have anyone whose job that actually is. The funder expects a data-management plan, FAIR-aligned outputs, archival-ready handover, and a chain of custody that survives the postdoc moving on. The team has a PI, two PhD students, a postdoc, and 0.2 FTE of an institutional research software engineer who is also supporting four other projects.

This is the gap research data management lives in. Universities frame it as a library function. Funders frame it as a compliance obligation. The teams actually doing research treat it as something that has to happen between two grant submissions, on top of fieldwork, on top of analysis, on top of writing. The data-manager job exists in the funder's mind; it rarely exists in the project budget.

This post is for grant-funded research teams that don't have one and aren't going to. It covers what research data management actually involves across the project lifecycle, where the highest-impact gaps are, and how to handle the work without hiring a full-time data role.

What research data management actually means

Research data management (RDM) is the set of practices that make data collected, generated, or curated by a research project usable beyond the moment of collection. It spans the full research data lifecycle — from the data-management plan written at proposal time, through collection and processing, through analysis and publication, to long-term archival and reuse by other teams.

A common misreading: RDM is about storage and backup. It is, but only superficially. The harder part is everything that makes the data legible to someone who didn't collect it. Schemas. Metadata. Provenance. Versioning. License and access policies. Documentation that survives the team. None of that happens by accident, and most of it can't be retrofitted at deadline.

The funder-facing requirements have ratcheted up. Horizon Europe expects FAIR-aligned data. National funders increasingly require open-data archives. Most institutions have data-management policies even if individual projects don't follow them. The question isn't whether RDM is required — it's how to do it without dedicated headcount.

Where the gaps actually live

In practice, the gaps in research data management cluster in three places.

Schema drift during collection

The data-management plan defines a clean schema. Six months in, the actual data has three extra columns nobody documented, two field-name conflicts between sites, and a date format that flips between ISO and DD/MM/YYYY depending on which RA was on shift. Nothing was malicious — every change had a reason at the time. But the schema document is now wrong, and the team doesn't know which version they should be re-aligning to.

Fix: schema as code, not as a document. A small validation script run at every data ingestion catches drift on day one rather than month six. If the validation fails, the data either gets fixed or the schema does — explicitly, with a version bump.
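
A minimal sketch of what that can look like, assuming a tabular CSV ingest and a pandas environment; the column names, date rule, and file layout are illustrative stand-ins for whatever your DMP actually defines:

```python
# schema_check.py — minimal ingestion-time schema validation (illustrative).
# Column names and rules below are placeholders for your DMP's real schema.
import sys
import pandas as pd

SCHEMA_VERSION = "1.2.0"
EXPECTED_COLUMNS = {"subject_id", "site", "measured_at", "value"}  # hypothetical

def validate(path: str) -> list[str]:
    """Return a list of human-readable schema violations for one batch."""
    df = pd.read_csv(path, dtype=str)
    errors = []

    extra = set(df.columns) - EXPECTED_COLUMNS
    missing = EXPECTED_COLUMNS - set(df.columns)
    if extra:
        errors.append(f"undocumented columns: {sorted(extra)}")
    if missing:
        errors.append(f"missing columns: {sorted(missing)}")

    # Catch the ISO vs DD/MM/YYYY drift described above: accept ISO only.
    if "measured_at" in df.columns:
        bad = df[~df["measured_at"].str.match(r"\d{4}-\d{2}-\d{2}", na=False)]
        if not bad.empty:
            errors.append(f"{len(bad)} rows with non-ISO dates (first: row {bad.index[0]})")

    return errors

if __name__ == "__main__":
    problems = validate(sys.argv[1])
    if problems:
        print(f"schema {SCHEMA_VERSION} violations in {sys.argv[1]}:")
        for p in problems:
            print(f"  - {p}")
        sys.exit(1)  # fail the ingestion: fix the data or bump the schema
```

Run it on every incoming batch; a non-zero exit code is the explicit prompt to either fix the data or version-bump the schema.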

Cleaning that isn't reproducible

By the time the analysis runs, the dataset has been touched by at least three people, mostly in Excel, mostly without recording what they did. The published figures cite "the cleaned dataset". Nobody can produce that dataset from the raw inputs deterministically.

Fix: cleaning as code, even simple cleaning. A single notebook or script that takes raw → cleaned, runnable from end to end, version-controlled. It does not need to be sophisticated. It needs to exist and be re-runnable. Scientific data management without reproducible cleaning is a fiction.
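
A minimal sketch, again assuming pandas; the paths, columns, and exclusion rules are placeholders for your project's real decisions:

```python
# clean.py — raw → cleaned in one deterministic, re-runnable step (illustrative).
# Paths and rules are placeholders; the point is that the whole
# transformation lives in one version-controlled script.
import pandas as pd

RAW = "data/raw/measurements.csv"         # hypothetical paths
CLEANED = "data/cleaned/measurements.csv"

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Every exclusion is explicit and commented, so the decision log
    # and the code agree.
    df = df.drop_duplicates(subset="subject_id", keep="first")
    df["measured_at"] = pd.to_datetime(df["measured_at"], errors="raise")
    df = df[~df["subject_id"].str.startswith("BCN-", na=False)]  # pilot phase excluded
    return df

if __name__ == "__main__":
    clean(pd.read_csv(RAW, dtype=str)).to_csv(CLEANED, index=False)
```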

Documentation that lives in heads

The PI knows why the early-2025 measurements are excluded. The postdoc knows which subjects had sensor calibration issues. The PhD student knows that subject codes starting with BCN- are pilot-phase and shouldn't be in the main analysis. None of this is written down. When any of those people leaves, the institutional knowledge of the dataset goes with them.

Fix: a README.md at the dataset level and a notes.md per significant decision. Not glamorous, but it keeps the dataset interpretable after staff turnover.

The five-stage lifecycle, practically

| Stage | What it requires | What "done well" looks like |
|---|---|---|
| Plan | DMP at proposal, updated at kickoff | Aligned with funder template (Horizon Europe, AEI, AGAUR), revisited annually |
| Collect | Schema discipline, validation at ingestion | Raw data versioned, schema enforced, every batch logged |
| Process | Reproducible cleaning, documented decisions | Cleaning as code, decisions recorded, traceability raw → analysis-ready |
| Analyse | Analysis pipelines, documented choices | Analysis as code, parameters configurable, figures regenerable |
| Archive | FAIR metadata, repository deposition, license clarity | Zenodo / institutional / domain repo, citable DOI, reuse license stated |

Most teams are fine on Plan and Collect. The breakdown happens at Process and Analyse. Archive is often the rushed last week.

The single highest-leverage move for teams without a data manager: invest in the Process stage. Reproducible cleaning makes everything downstream tractable. Without it, the analysis stage is fragile and the archive stage is dishonest.

What a "done-with-you" approach looks like

The all-or-nothing framing of research data management — either you hire a data manager or you ignore it — presents a false choice. There's a third path: bring in a partner who handles the implementation work, leaves your team with maintainable code, and exits. This is the research data services model that makes sense for funded projects without recurring data-engineering needs.

Concretely, this looks like:

  • A focused engagement (4–8 weeks) that lifts your data from "scattered on team laptops" to "structured, documented, reproducible".
  • Cleaning pipelines, schema validation, and metadata structure delivered as code your team owns.
  • A documentation pack — data dictionary, decision log, README, FAIR-alignment notes — that the next person on the project can actually use.
  • Archive deposition (Zenodo, institutional repository) with citable DOIs and stated reuse licenses (a minimal deposition sketch follows this list).
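
For the Zenodo step specifically, deposition can itself be code. A minimal sketch of the create-upload-describe flow against Zenodo's public REST API, assuming a personal access token in ZENODO_TOKEN (endpoint shapes follow Zenodo's API docs and are worth re-checking before use):

```python
# deposit.py — minimal Zenodo deposition sketch (verify against Zenodo's
# current API docs before relying on it). ZENODO_TOKEN is assumed to be
# a personal access token with deposit scope.
import os
import requests

API = "https://zenodo.org/api/deposit/depositions"
TOKEN = os.environ["ZENODO_TOKEN"]

# 1. Create an empty deposition.
dep = requests.post(API, params={"access_token": TOKEN}, json={}).json()

# 2. Upload the cleaned dataset into the deposition's file bucket.
with open("data/cleaned/measurements.csv", "rb") as fh:
    requests.put(f"{dep['links']['bucket']}/measurements.csv",
                 params={"access_token": TOKEN}, data=fh)

# 3. Attach metadata, including an explicit reuse licence.
meta = {"metadata": {
    "title": "Project measurements, cleaned",  # illustrative
    "upload_type": "dataset",
    "description": "Cleaned dataset; see README for provenance.",
    "creators": [{"name": "Doe, Jane"}],
    "license": "cc-by-4.0",
}}
requests.put(f"{API}/{dep['id']}", params={"access_token": TOKEN}, json=meta)
# Publishing (POST .../actions/publish) is what mints the citable DOI.
```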

The output is a research data infrastructure your team operates without further dependency. The exit criterion is that the team can re-run the pipelines, document new datasets, and respond to funder data requests independently. No subscription. No retainer. No vendor lock-in.

This is what we mean by done-with-you: the implementation is done by us, the operating model is owned by you.

What evaluators actually ask for

A practical list, drawn from real Horizon Europe and ERC review feedback:

  • "Where is the cleaned dataset accessible?" — It needs a URL or DOI.
  • "How were quality decisions made and recorded?" — They need to be readable.
  • "Can a reviewer re-run your analysis?" — One command, not "follow these 12 steps".
  • "What licence governs reuse?" — A specific licence name, not "open access".
  • "What's the long-term plan?" — Not "the institution's repository", but who maintains, until when, with what support.

Failing any of these doesn't usually break the project. But across enough deliverables, it drags down the dissemination and impact score, and that's where consortia lose marks they didn't notice they were losing.

A 90-minute audit you can do this afternoon

Block 90 minutes. Open your project drive.

  1. Find the canonical version of your dataset. If there are three "cleaned final" files dated within a month of each other, document which one is true and delete or rename the others.
  2. Open the cleaning script (if one exists). Re-run it from raw → cleaned. If the output doesn't match the canonical version, you have hidden manual steps. Document them. (A checksum-comparison sketch follows this list.)
  3. Read the README in the dataset folder. If it doesn't exist, write a draft now.
  4. Check whether your data-management plan was updated more recently than your kickoff. If not, flag it.
  5. Find your archive deposition target (Zenodo, institutional repo). Confirm you have an account and know the deposition process.
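
For step 2, the comparison needs no tooling beyond checksums: re-run the cleaning script to a scratch path and compare digests. A minimal sketch, with illustrative paths:

```python
# audit_step2.py — does the cleaning script reproduce the canonical file?
# Paths are illustrative; point them at your project's actual files.
import hashlib

def sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

canonical = sha256("data/cleaned/measurements.csv")
regenerated = sha256("data/cleaned/regenerated.csv")  # output of re-running clean.py

if canonical == regenerated:
    print("reproducible: cleaning script regenerates the canonical dataset")
else:
    print("mismatch: hidden manual steps exist; document them")
```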

Five steps, ninety minutes, and you'll have a clearer picture of where the gaps actually are. Most of them turn out to be smaller than feared. A few turn out to be the difference between a clean closeout and a frantic one.

Where Pragma fits

We deliver research data management as a service for grant-funded teams that don't have, and don't want to hire, an internal data manager. The engagement is finite — one of our typical Data-to-Report Sprints runs 2–4 weeks, leaves a reproducible pipeline behind, and exits cleanly. We've done this work for EU-funded consortia, university research groups, and academic collaborations across health, neuroscience, and policy domains.

If you're carrying the data-management workload without dedicated headcount, that's the engagement we exist for.

Three things to do this week

  1. Run the 90-minute audit above. Note the top three gaps.
  2. Identify the one gap that, if fixed, would prevent the largest closeout-time scramble. That's your highest-leverage investment.
  3. If the team's available capacity is below what the gap needs, request a scope review. We'll be honest about what's worth doing externally and what your team should keep.

Research data management without a full-time data manager is doable. The trick is treating it as code-shaped engineering work, not as an administrative obligation. The rest follows.