Xiaofan Liu at the City University of Hong Kong and colleagues will reconstruct COVID-19 transmission chains between individuals in communities and households, applying statistical methods to existing datasets to more reliably estimate COVID-19 transmission characteristics, such as reproduction rates, that are critical for planning effective control measures. Currently, transmission characteristics are estimated using aggregate-level data, which leads to inaccuracies. Ideally, data on how COVID-19 is transmitted between individuals are needed. They will curate an existing collection of datasets containing over 40,000 COVID-19 cases in five Asian countries with person-to-person transmission evidence to reconstruct transmission chains. They will then apply statistical tests and an analytical methodology called regression analysis to identify the most important transmission risk factors, which may include virus strain, transmission media, population density, and climate conditions.
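To illustrate why individual-level transmission data matter, the sketch below counts how many secondary cases each person is recorded as causing and averages those counts into a crude reproduction estimate. It is a minimal illustration and not the project's actual method: the `pairs` data, the function names, and the use of `None` for an unknown source are all hypothetical, and the estimate ignores under-reporting and cases whose onward transmission has not yet been observed.

```python
from collections import Counter

# Hypothetical (infector, infectee) pairs; None marks a case with no known source.
pairs = [
    (None, "A"),
    ("A", "B"),
    ("A", "C"),
    ("B", "D"),
    (None, "E"),
]

def secondary_case_counts(pairs):
    """Return {case_id: number of people that case is recorded as infecting}."""
    # Every infectee and every known infector is a case.
    cases = {infectee for _, infectee in pairs}
    cases.update(infector for infector, _ in pairs if infector is not None)
    # Count how often each case appears as an infector.
    offspring = Counter(infector for infector, _ in pairs if infector is not None)
    return {case: offspring.get(case, 0) for case in cases}

def mean_secondary_cases(pairs):
    """Crude reproduction estimate: average secondary cases per recorded case."""
    counts = secondary_case_counts(pairs)
    return sum(counts.values()) / len(counts)

counts = secondary_case_counts(pairs)
r_estimate = mean_secondary_cases(pairs)
```

Because the counts are kept per individual, they could in principle be related to case-level factors such as setting or strain, which aggregate case totals do not allow.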
To improve control measures for COVID-19 by applying statistical methods to existing datasets containing over 40,000 COVID-19 cases from five Asian countries to reconstruct transmission chains between individuals in households and communities.
We’re using epidemiological survey data from eight countries. The challenge so far, of course, is the data curation process: different datasets have different standards and different organisations, and we have to identify the identical parts in the data and merge them together.
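The curation step described here, finding the fields that different sources share and merging them, can be sketched roughly as follows. This is an assumption-laden illustration rather than the team's pipeline: the source names, field names, and records in `FIELD_MAPS` and `datasets` are invented for the example.

```python
# Hypothetical per-source field mappings: source field name -> common field name.
FIELD_MAPS = {
    "source_a": {"case_id": "case_id", "infected_by": "infector_id", "onset": "onset_date"},
    "source_b": {"id": "case_id", "source_case": "infector_id", "symptom_onset": "onset_date"},
}

def harmonise(record, source):
    """Rename one record's fields to the common schema and tag its origin."""
    mapping = FIELD_MAPS[source]
    row = {common: record.get(raw) for raw, common in mapping.items()}
    row["source"] = source
    return row

def merge_sources(datasets):
    """Combine records from all sources into one list with a shared schema."""
    merged = []
    for source, records in datasets.items():
        for record in records:
            merged.append(harmonise(record, source))
    return merged

# Invented example records, one per source.
datasets = {
    "source_a": [{"case_id": "A1", "infected_by": None, "onset": "2020-01-20"}],
    "source_b": [{"id": "B7", "source_case": "B3", "symptom_onset": "2020-02-02"}],
}
merged = merge_sources(datasets)
```

Keeping the mapping in one table per source means a newly added dataset needs only a new entry in `FIELD_MAPS`, not new merging logic.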
Once the project has finished, that is, once the variations in COVID-19 transmission chains across different parts of the world, different cultures and different demographic features have been characterised, we hope we can help people understand COVID-19 better and help policymakers make more scientific strategies.
We have already finished the data curation part and we are now applying statistical models to our data. Hopefully we can have the results in three months.
I think the first priority of health data science right now is opening up data. During our data collection process we identified many papers that characterise specific sets of transmission chains in specific demographic or geographic settings. However, these data are not open. We also identified many databases that contain useful data for this project, but we don’t have access to them. We understand that there are privacy concerns in this line of research; that is, epidemiological survey data can sometimes be sensitive from a privacy perspective. However, we really hope that these databases can at least be opened to the global scientific community. The second priority for health data science is the normalisation and standardisation of datasets. We spent a lot of time curating datasets from different sources. It would be better if they had been prepared in a standardised way; that would save not only our time but other researchers’ time, so that they can put more time into analysis instead of data processing.
It’s obvious that our research question cannot be answered without data being shared. We rely on different data sources, for example academic papers and public government reports, for collecting and curating these epidemiological reports or the traces of the transmission chains.