This study presents a novel hybrid machine-learning approach that combines supervised and unsupervised learning to predict COVID-19 cases and mortality rates. Using a dataset covering 3,142 U.S. counties from January 2020 to June 2021, the study evaluates how factors such as demographics, climate, traffic, public health policies, and social distancing adherence affect the spread of COVID-19. The proposed hybrid hierarchical ensemble outperforms single classification models, achieving 91.2% accuracy, and improves regression performance by up to 43% on key metrics. The findings identify virus pressure as the most significant factor in COVID-19 spread and suggest that the approach can aid in designing public health strategies for future outbreaks.
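The summary does not describe the internals of the hybrid hierarchical ensemble, so the following is only a minimal sketch of one common way to combine the two learning modes: an unsupervised step (k-means) groups similar rows, and a supervised step (here a 1-nearest-neighbour classifier) predicts within each group. The function names, the choice of k-means and 1-NN, the farthest-first seeding, and the toy data are all assumptions made for illustration; none of them come from the study.

```python
from math import dist


def farthest_first_init(points, k):
    """Deterministic seeding: take the first point, then repeatedly add
    the point farthest from the seeds chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids


def assign(point, centroids):
    """Return the index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, k, iters=10):
    """Unsupervised step: plain k-means, returning the k centroids."""
    centroids = farthest_first_init(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[assign(p, centroids)].append(p)
        # New centroid is the mean of its group; an empty group keeps its old centroid.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def fit(X, y, k):
    """Cluster the rows, then store the labelled rows of each cluster."""
    centroids = kmeans(X, k)
    members = [[] for _ in range(k)]
    for features, label in zip(X, y):
        members[assign(features, centroids)].append((features, label))
    return centroids, members


def predict(model, features):
    """Supervised step: 1-nearest-neighbour restricted to the row's cluster."""
    centroids, members = model
    cluster = members[assign(features, centroids)]
    if not cluster:  # fall back to all training rows if the cluster is empty
        cluster = [m for group in members for m in group]
    _, label = min(cluster, key=lambda m: dist(features, m[0]))
    return label


# Made-up toy rows (two hypothetical numeric features per county), not study data.
X = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.3)]
y = ["low", "low", "low", "high", "high", "high"]

model = fit(X, y, k=2)
print(predict(model, (0.0, 0.0)), predict(model, (5.1, 5.0)))
```

In practice the per-cluster learner would usually be a stronger model trained separately on each cluster's rows; 1-NN is used here only to keep the sketch free of external dependencies.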
Takeaways:
- The study introduces a hybrid machine-learning model combining supervised and unsupervised learning for COVID-19 case and mortality prediction.
- The hybrid hierarchical ensemble outperforms single classification models, achieving 91.2% accuracy and an F1-score of 0.916.
- The proposed approach improves regression performance by up to 43% on key metrics.
- Virus pressure is identified as the most critical factor influencing COVID-19 spread.
- Additional significant features include demographics, climate, traffic, public health policies, and social distancing adherence.
- The method provides insights that can help policymakers design better prevention and control measures for future public health crises.
- The study demonstrates the effectiveness of combining supervised and unsupervised learning for epidemiological predictions.
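For readers unfamiliar with the figures quoted above, the sketch below shows how accuracy, a binary F1-score, and a relative improvement in an error metric are typically computed. It is an illustration under assumptions: the summary does not say whether the reported F1-score is binary or averaged across classes, nor which regression metrics the 43% figure refers to, and the labels and error values used here are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline_error, new_error):
    """Share by which an error metric (lower is better) drops versus a baseline."""
    return (baseline_error - new_error) / baseline_error


# Made-up labels and error values, purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

print(accuracy(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(relative_improvement(10.0, 5.7))
```

Under this definition, a 43% improvement would mean that the error of the proposed model is 57% of the baseline error on that metric.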