Identifying global biases in hydro-hazard research by mining the scientific literature
Abstract
Floods, droughts, and rainfall-induced landslides are hydro-geomorphic hazards that affect millions of people every year. As a result, these hazards are heavily researched, with several hundred thousand articles published. This volume of literature makes it challenging to identify existing research gaps, especially regarding research specific to local risk conditions and impacts. How well does hydro-geomorphic hazard research cover heavily impacted regions, different hydro-climatic processes, or relevant socio-economic aspects? In this work, we use natural language processing to search a database of 100 million abstracts for mentions of floods, droughts, and landslides. We annotate all hazard and location mentions and geolocate each study via Nominatim. We use this information to create global gridded research densities for the three hazards based on all study locations from 293,156 abstracts. We then compare research density to environmental, socio-economic, and disaster impact data. The global distribution of research is heavily influenced by human activity, national wealth, data availability, and population distribution. Countries that have been heavily impacted by hydro-geomorphic hazards in the past have a higher research density. However, this relationship strongly depends on country wealth: in low-income countries, 100 times more people need to be affected before research density reaches a level comparable to that of high-income countries. This disparity needs to be addressed to reduce disaster impact and to adapt to changing conditions in the future. Here we provide guidance on the regions and hydro-climatic conditions for which an increased research focus on hydro-geomorphic hazards is most urgent.
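The step of turning geolocated studies into a global gridded research density can be illustrated with a minimal Python sketch. It assumes each abstract has already been reduced to a list of (latitude, longitude) pairs, for example from geocoding its location mentions. The function name `grid_research_density`, the 1° cell size, and the option to split each abstract's weight evenly across its locations are illustrative assumptions, not the exact procedure used in this study.

```python
import math
from collections import defaultdict


def grid_research_density(studies, cell_deg=1.0, split_weight=True):
    """Aggregate geolocated studies into counts per regular lat/lon grid cell.

    Hypothetical helper for illustration only.
    studies: iterable of studies, each a list of (lat, lon) pairs in decimal degrees.
    cell_deg: grid resolution in degrees (assumed value, not from the study).
    split_weight: if True, each study contributes a total weight of 1 spread
        evenly over its locations; if False, every location counts as 1.
    Returns a dict mapping (row, col) cell indices to accumulated weight.
    """
    n_rows = round(180 / cell_deg)
    n_cols = round(360 / cell_deg)
    density = defaultdict(float)
    for locations in studies:
        if not locations:
            continue  # studies without a resolved location add nothing
        weight = 1.0 / len(locations) if split_weight else 1.0
        for lat, lon in locations:
            # Row 0 starts at -90 (south), column 0 at -180 (antimeridian);
            # clamp so that lat = 90 and lon = 180 fall into the last cell.
            row = min(math.floor((lat + 90.0) / cell_deg), n_rows - 1)
            col = min(math.floor((lon + 180.0) / cell_deg), n_cols - 1)
            density[(row, col)] += weight
    return dict(density)


if __name__ == "__main__":
    # Two hypothetical studies: one with a single location, one with two.
    studies = [[(52.5, 13.4)], [(52.7, 13.9), (-1.3, 36.8)]]
    print(grid_research_density(studies))
```

Splitting the weight keeps a study that mentions many places from dominating the map; counting every location once instead emphasizes how often a place is studied at all. Either choice is a modelling decision that would shift the resulting density map.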