The urgent need for better coordination across the global data sharing landscape
#datasaveslives: Never has this statement been more true than during the COVID-19 pandemic. Access to robust data has informed policy, planning and the public health response. Research using health data has been crucial to help understand how the virus is evolving, who is most at risk, what the most effective treatments might be, and whether vaccines are safe.
With significant investment in new trials, an ever-increasing amount of relevant data is being generated and collected. One year after the start of the pandemic, 2,866 COVID-related clinical trials have been registered. The UKCDR COVID Research Project Tracker reports that, as of January 2021, 7,778 projects, across 136 countries, have been funded or repurposed to tackle COVID-19.
However, it is not always easy for researchers to know how to access the data resulting from these studies. Despite initial optimism that the pandemic would be the trigger to encourage researchers to share data, the jury is still out as to whether this has been the case. Indeed, the evidence increasingly appears to suggest the opposite.
To what extent is this failure due to a lack of infrastructure to support researchers in sharing and accessing data? What more needs to be done to remove barriers and enable effective data sharing between countries?
Researchers want to be able to combine data across multiple platforms, working across institutional and geographical boundaries with no additional effort. An effective FAIR data ecosystem, with data that is Findable, Accessible, Interoperable and Reusable, requires a number of elements:
- Metadata must be shared and syndicated across platforms, allowing researchers to search and discover data through multiple gateways.
- There need to be streamlined and automated mechanisms to allow researchers to apply to access data, with clear and transparent processes for decision-making.
- Data should be hosted in permanent repositories around the world, so it can be collected once and then used or combined in different settings.
- Secure analysis platforms, or Trusted Research Environments, can allow researchers to access data in a controlled way and to reduce the need to send data to multiple places. The use of federated access will increasingly enable analysis without the need to transfer data.
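To make the idea of federated access concrete, here is a minimal sketch in Python of sending an analysis to the data rather than the data to the researcher. It is an illustration only, not a description of any particular platform: the names `Site`, `LocalSummary`, `summarise` and `pooled_mean` are hypothetical, and the sketch assumes each data holder is willing to return simple aggregate counts and totals. A real Trusted Research Environment would add authentication, governance approval and disclosure control on top of this.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class LocalSummary:
    """Hypothetical aggregate result: the only thing that leaves a site."""
    n: int
    total: float


class Site:
    """Hypothetical data holder. Record-level values stay inside the site."""

    def __init__(self, name: str, values: Iterable[float]) -> None:
        self.name = name
        self._values: List[float] = list(values)  # never returned directly

    def run(self, analysis: Callable[[List[float]], LocalSummary]) -> LocalSummary:
        # The analysis is executed where the data lives; only its
        # aggregate output is handed back to the caller.
        return analysis(self._values)


def summarise(values: List[float]) -> LocalSummary:
    """The analysis sent to each site: a count and a sum, no individual rows."""
    return LocalSummary(n=len(values), total=sum(values))


def pooled_mean(sites: Iterable[Site]) -> float:
    """Combine per-site aggregates into one pooled mean across all sites."""
    summaries = [site.run(summarise) for site in sites]
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no records available at any site")
    return sum(s.total for s in summaries) / n


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    sites = [Site("site_a", [1.0, 2.0, 3.0]), Site("site_b", [5.0])]
    print(pooled_mean(sites))
```

The design choice to note is that the coordinator only ever sees `LocalSummary` objects, so the pooled result is the same as if the data had been combined centrally, without any record-level transfer.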
In addition to interoperable infrastructure, an effective FAIR data ecosystem also needs to demonstrate an ethical and trustworthy approach. Patient and public involvement should be embedded from the outset, with responsible data stewardship and clarity over data governance to reduce complexity and confusion. The approach needs to encourage collaboration, for example focusing on the need for harmonisation of data standards to allow effective comparison. And it is crucial that appropriate credit and attribution are given to those who collected, generated and curated the data, to incentivise sharing.
To what extent does each of these elements exist to help address the COVID-19 pandemic or to tackle future health challenges? Initial mapping by the International COVID-19 Data Alliance (ICODA), a global alliance convened by HDR UK, in autumn 2020 found nearly 100 data repositories, platforms, databases and libraries of datasets that were relevant to COVID-19. Further details about this mapping exercise, and the full list of data initiatives we identified, are publicly available. TDR, the Special Programme for Research and Training in Tropical Diseases, has commissioned a more in-depth study of key COVID-19 repositories, which will examine how they meet FAIR requirements. We look forward to seeing the outputs shortly.
The key finding from our mapping is that the landscape of COVID-19 data-related activities is busy and complex. There has been a flurry of COVID-19 data initiatives, with new initiatives set up and well-established repositories pivoting to include COVID platforms (for example, the COVID-19 Data Portal from EMBL-EBI, Vivli’s COVID-19 portal, and IDDO’s COVID-19 data platform). Six months on, the challenges remain the same:
- Fragmentation. While there are a number of high quality data repositories, these are often isolated islands and it is difficult to navigate from one to another.
- Lack of coherence. Even the terminology is confused, with descriptions such as repositories, platforms, registries, dashboards and libraries used inconsistently across different initiatives.
- Coordination. Improved collaboration will be crucial to reduce the risk of duplication and to identify and fill gaps.
Recognising the fragmentation, ICODA was set up to help bridge the gap between different silos, by enabling multiple datasets, often from different countries, to be analysed together. ICODA aims to enable data sharing by providing a secure Trusted Research Environment through the ICODA Workbench, so that relevant data from multiple sources can be brought together for analysis. We are working to improve discoverability, building on the Health Data Research Innovation Gateway. And, over time, we will increasingly facilitate federated access. Where it is not possible to transfer data, providing a federated approach will allow researchers to send analysis to the data (rather than moving data to the researcher) in one or more locations. Initial results from our second Driver Project, bringing together data from up to 42 countries to explore the impact of the pandemic on perinatal outcomes, are already demonstrating what can be achieved.
ICODA offers one part of the solution, but the complex challenges cannot be solved by any one player alone. Instead, it will require many stakeholders to come together in a networked ecosystem. And that will need coordination and collaboration.
Such coordination already exists at a country level. In the UK, for example, the Health Data Research Alliance brings together over 40 national health organisations, NHS trusts, research institutes and charities. There are also excellent examples in other sectors, including the Global Alliance for Genomics and Health, which has driven collaboration across genomics, and the Global Partnership for Sustainable Development Data, which supports better use of data to monitor the Sustainable Development Goals.
However, there is not yet the coordination needed to improve sharing across clinical trials, research studies and real-world evidence to support an effective pandemic response. Indeed, there is growing concern about the number of unconnected initiatives that are beginning to emerge, each separately attempting to improve coordination, leading to further duplication and confusion. The national Academies have recently come together in the lead-up to the G7 summit to make the case for a better level of ‘data readiness’ for future international health emergencies. There is an urgent need for G7 Governments, the WHO and funders to listen to their call, and to work together to provide the coordination and collaboration needed to develop a networked and effective FAIR data ecosystem. Only then will we be able to harness the real power of data.