Note on the project!

This project is finished.

Post-COVID Data Model for the Data Institute Challenge

With the digitization of all areas of life, more and more data is being generated – including on the COVID-19 pandemic and Long COVID. But how can this data benefit everyone – from society and administration to research? Using health data as an example, we are developing a Post-COVID data model as part of the Data Institute Challenge by the Federal Ministry of the Interior and Community (BMI).

What is the Post-COVID data model about?

Medical research has already collected a large amount of data on Post-COVID, yet key questions remain unanswered. What impact does Long COVID have not only on individuals, but on society as a whole? What can administration or research do to make everyday life with the condition easier? Linking health data with general datasets – for example, on education or the labor market – could help answer these questions.

Why a Post-COVID data model?

The COVID-19 pandemic has shown how society at large can benefit when relevant data is processed and made available. COVID data already helps us to better understand the disease, its effects, and its circumstances. But what additional potential could this data unlock if we connect it with other datasets beyond its original collection purpose? This is the key question guiding our approach to a Post-COVID data model.

Linked data can provide answers to many questions, for example:

  • Which demographic groups are most affected by Post-COVID?

  • Are there spatial differences in how accessible and effective treatment and support services are?

  • Is there a correlation between poverty and Post-COVID?

  • How does Post-COVID affect student performance?
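To illustrate the kind of linking such questions require, here is a minimal sketch in Python. It assumes aggregated figures per district that share a common district key. All names and numbers are invented for illustration and are not project data; the helper functions `link_by_district` and `pearson` are hypothetical and not part of the prototype.

```python
from math import sqrt

# Invented illustrative figures per district (not real data).
post_covid_cases = {"District A": 120, "District B": 340, "District C": 90}
population = {"District A": 60000, "District B": 100000, "District C": 50000, "District D": 80000}
poverty_rate = {"District A": 0.12, "District B": 0.21, "District C": 0.15, "District D": 0.18}


def link_by_district(*tables):
    """Inner join: keep only districts that appear in every table."""
    shared = set.intersection(*(set(t) for t in tables))
    return {d: tuple(t[d] for t in tables) for d in sorted(shared)}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


linked = link_by_district(post_covid_cases, population, poverty_rate)

# Cases per 1,000 inhabitants make districts of different size comparable.
cases_per_1000 = [cases / pop * 1000 for cases, pop, _ in linked.values()]
poverty = [rate for _, _, rate in linked.values()]

r = pearson(cases_per_1000, poverty)
print(f"Linked districts: {list(linked)}")
print(f"Correlation between case rate and poverty rate: {r:.2f}")
```

District D is dropped because it has no case count, which shows why shared keys and complete coverage matter when datasets are linked. A correlation computed this way says nothing about causation.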

Together with the Berlin Institute of Health (BIH) at Charité Berlin, we are developing a concept for a sustainable and open data model that supports Post-COVID research. For this, we combine the data and prototyping expertise of our colleagues at the Open Data Information Office (ODIS) and CityLAB. The project is being carried out on behalf of the Federal Ministry of the Interior and Community (BMI).

How are we developing the Post-COVID data model?

Our Post-COVID data model follows a transparent and open approach:

  • It should both facilitate access to key medical research data and enable linking with additional data from various research areas.

  • The model must be capable of integrating different datasets (including personal, anonymized, or pseudonymized data) and, ideally, of updating them regularly and automatically via APIs.

  • The goal is to make the data model openly accessible to the public free of charge – e.g., under a CC-BY 4.0 or DLD-2.0 license. To this end, we are developing processes that ensure the interoperability and quality of the data and the data model, while considering existing metadata standards.
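As an illustration of how pseudonymized records from different data holders could still be linked, the following Python sketch replaces a direct identifier with a keyed hash (HMAC-SHA256). This is one common technique and an assumption on our part, not a description of the project's actual implementation. The records, field names, and key are invented.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a stable keyed hash (HMAC-SHA256) of a personal identifier."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def with_pseudonym(records, secret_key):
    """Replace the direct identifier in each record with its pseudonym."""
    result = []
    for record in records:
        cleaned = {k: v for k, v in record.items() if k != "patient_id"}
        cleaned["pid"] = pseudonymize(record["patient_id"], secret_key)
        result.append(cleaned)
    return result


# Hypothetical key; in practice it would be held by a trusted party only.
SECRET_KEY = b"example-key-for-illustration-only"

# Invented records from two hypothetical data holders.
clinic_records = [{"patient_id": "P-001", "diagnosis": "post-covid"}]
labor_records = [
    {"patient_id": "P-001", "employment_status": "part-time"},
    {"patient_id": "P-002", "employment_status": "full-time"},
]

clinic = with_pseudonym(clinic_records, SECRET_KEY)
labor = with_pseudonym(labor_records, SECRET_KEY)

# The same identifier always maps to the same pseudonym, so records can be
# joined on "pid" without exposing the original identifier.
labor_by_pid = {record["pid"]: record for record in labor}
linked_records = [
    {**record, **labor_by_pid[record["pid"]]}
    for record in clinic
    if record["pid"] in labor_by_pid
]
print(linked_records)
```

Because the hash is keyed, someone without the key cannot recompute pseudonyms from known identifiers. Pseudonymized data generally still counts as personal data, so this step does not replace a legal and organizational data protection concept.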

What’s next in developing the Post-COVID data model?

In the first project phase, we developed a concept for the Post-COVID data model that describes the processes and requirements needed to build an initial software implementation in later stages. We also documented how data holders and users can connect to the model and how the research community can contribute its requirements. Our vision is a data model that can be reused and transferred to other domains beyond Post-COVID data.

After qualifying for the second project phase, our team focused on the technical implementation of tools for data merging, setting up a metadata catalog, further stakeholder engagement, and developing frontend components such as wireframes to enable interaction with the data model.

In the third and final stage of the Challenge, we worked intensively on implementing our prototype: we programmed the interface and finalized the scripts for data merging. In addition, we designed a concept for operating the data model.

The project was successfully completed in May 2025. The Challenge enabled us to refine our concept and technical prototype and to publish them as a minimum viable product (MVP). The final result shows how medical research data can be combined with other public datasets and what infrastructure is required to do so. Our team shared second place with the German Foundation for General and Family Medicine; no third place was awarded. The MVP is publicly available and can be viewed and tested.

The prototype includes tools for data integration, a metadata catalog, and an interface that makes the data easier to access and navigate. Processes for data holders and researchers who want to use or extend the model are also documented. Transferability to other subject areas remains a key result: the model can serve as a blueprint for other domains.
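
To give an idea of what an entry in such a metadata catalog might record, here is a small Python sketch. The field names are assumptions loosely inspired by common metadata vocabularies such as DCAT. They do not reflect the actual schema of the prototype, and the example values are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata record describing one dataset in the catalog."""
    title: str
    description: str
    publisher: str
    license: str
    update_frequency: str
    keywords: list = field(default_factory=list)


# Placeholder values for illustration only.
entry = CatalogEntry(
    title="Example dataset (placeholder)",
    description="Aggregated example figures per district.",
    publisher="Example data holder",
    license="CC-BY 4.0",
    update_frequency="monthly",
    keywords=["post-covid", "example"],
)

# Serializing to JSON keeps the record readable for people and machines.
serialized = json.dumps(asdict(entry), indent=2, ensure_ascii=False)
print(serialized)
```

Recording the license and update frequency alongside each dataset lets users judge whether and how they may reuse the data before they request access.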
