Using XGBoost models for daily rainfall prediction
- Rafael Grecco Sanches Universidade de São Paulo https://orcid.org/0000-0003-2471-5070
- Rodrigo Sanches Miani School of Computer Science, Federal University of Uberlandia (UFU), Minas Gerais, Brazil https://orcid.org/0000-0002-8176-8040
- Bruno César dos Santos Department of Environmental Sciences (DCAM), Federal University of São Carlos (UFSCar), São Carlos, Brazil https://orcid.org/0000-0001-8218-6803
- Rodrigo Martins Moreira Environmental Engineering Department, Federal University of Rondônia (UNIR), Rondônia, Brazil https://orcid.org/0000-0001-6794-6026
- Gustavo Zen de Figueiredo Neves São Carlos School of Engineering, University of São Paulo (USP), São Paulo, Brazil https://orcid.org/0000-0002-3952-8238
- Vandoir Bourscheidt Department of Environmental Sciences (DCAM), Federal University of São Carlos (UFSCar), São Carlos, Brazil https://orcid.org/0000-0001-5419-323X
- Pedro Augusto Toledo Rios Faculty of Civil Engineering, Federal University of Uberlandia (UFU), Minas Gerais, Brazil
Abstract
Machine learning models for predicting daily rainfall have gained traction in recent years. Understanding the benefits of using this technology in different regions is a relevant research topic. For this reason, this study aims to evaluate daily rainfall forecasts based on climate data from 1983 to 2019 in Itirapina, São Paulo, Brazil. We used a novel machine learning algorithm, XGBoost (eXtreme Gradient Boosting), to build several daily rainfall prediction models. Two tasks were modeled: the occurrence of daily precipitation (classification) and the amount of daily precipitation (regression). The results showed that the occurrence of daily precipitation could be predicted with an accuracy of around 90%. In addition, models were developed to predict the daily rainfall amount with error rates of around 3 mm. We observed that precipitation in the study area is directly associated with solar radiation, and that the rainfall forecasts and the corresponding months are characteristic of a tropical climate.
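To make the two reported figures concrete, the sketch below shows how an accuracy score (classification task) and an error in millimetres (regression task) might be computed from observed and predicted values. It is only an illustration: the data are made up, the 0.5 mm wet-day threshold is hypothetical, and the abstract does not state which error metric was used, so both mean absolute error and root mean squared error are shown as assumptions.

```python
import math


def accuracy(observed, predicted):
    """Fraction of days whose wet/dry label was predicted correctly."""
    hits = sum(1 for o, p in zip(observed, predicted) if o == p)
    return hits / len(observed)


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted amounts (mm)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_squared_error(observed, predicted):
    """Square root of the average squared difference (mm)."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Made-up daily rainfall amounts in mm (NOT data from the study).
observed_mm = [0.0, 12.0, 0.0, 3.0, 0.0, 25.0, 0.0, 0.0, 8.0, 0.0]
# Made-up output of a hypothetical regression model.
predicted_mm = [0.0, 9.0, 1.0, 0.0, 0.0, 19.0, 0.0, 0.0, 10.0, 0.0]

# Hypothetical threshold for labelling a day as "wet"; the study may define it differently.
WET_DAY_THRESHOLD_MM = 0.5
observed_wet = [mm > WET_DAY_THRESHOLD_MM for mm in observed_mm]
# Made-up output of a hypothetical, separately trained classification model.
predicted_wet = [False, True, False, False, False, True, False, False, True, True]

occurrence_accuracy = accuracy(observed_wet, predicted_wet)
amount_mae = mean_absolute_error(observed_mm, predicted_mm)
amount_rmse = root_mean_squared_error(observed_mm, predicted_mm)

print(f"occurrence accuracy: {occurrence_accuracy:.2f}")
print(f"amount MAE (mm): {amount_mae:.2f}")
print(f"amount RMSE (mm): {amount_rmse:.2f}")
```

Keeping the two tasks separate mirrors the abstract: the occurrence model is scored on labels, while the amount model is scored in the same unit as the rainfall itself.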
Author biographies
He holds a bachelor's degree in Mathematics from the Federal University of São Carlos (2005), a master's degree in Electrical Engineering from the State University of Campinas (2009), and a doctorate in Electrical Engineering from the State University of Campinas (2013). During his doctorate, he spent six months as a visiting researcher at CyQL (Cybersecurity Quantification Laboratory) at the University of Maryland, USA.
He is currently an adjunct professor at the School of Computer Science (FACOM) of the Federal University of Uberlândia (UFU).
Bachelor's degree in Geography from the Federal University of Alfenas-MG (UNIFAL-MG). Master's and PhD in Environmental Engineering Sciences from the University of São Paulo (USP). Post-doctorate in progress at the Department of Environmental Sciences (DCAm) of the Federal University of São Carlos (UFSCar). I conduct research in the areas of Climatology and Geotechnologies applied to the environment. I have experience as a teacher in the area of Geography education at elementary and high school levels.
Adjunct Professor in the Department of Environmental Engineering at the Federal University of Rondônia - Ji-Paraná Campus, promoting quality public education and popularization of science. Coordinator of the Laboratory of Geomatics and Statistics (LABGET - UNIR) and leader of the Environmental Engineering Research Group (GPEA-UNIR). I am a data scientist who transforms multi-sensor remote sensing products (optical and radar) and census data into spatial information to respond to complex environmental problems related to climate science, forest fragmentation, water resources, and public health. Post-Doctorate at San Diego State University - USA (2023). Post-Doctorate in Natural Resources - UFMS (2020). PhD in Environmental Engineering Sciences - EESC/USP (2017) with a sandwich period at the University of Michigan - USA. Master in Agricultural Sciences (2014), Environmental Sanitarian (2011) and Environmental Manager (2013) from IFGoiano - Campus Rio Verde. Mastery of Google Earth Engine, R and Python for spatial data analysis.
He holds a bachelor's and a degree in Geography from UTPR. He holds a master's degree in Science from USP (2014), with a study on Urban Climate in the city of São Carlos-SP. He holds a PhD in Science from USP (2018), whose research focused on the performance of atmospheric systems and the rainfall distribution of the state of Goiás and the Federal District. He was a collaborating professor in the Postgraduate Program in Environmental Engineering Sciences at the University of São Paulo, offering two courses: [SEA5916] Introduction to the Dynamic Study of Climate Generalities and Specificities and [SEA5862] Climatology Applied to the Environment; he is an advisor for academic master's degrees in the research line Climatology Applied to the Environment. He taught courses related to geosciences at the State University of Goiás and at the Barretos Educational Foundation, in undergraduate and graduate courses. Scientific advisor and visiting researcher at the National Laboratory for Sustainable Living and Communities of the Faculty of Architecture of the Autonomous University of Chiapas (UNACH - Mexico). He is a member of the Brazilian Association of Climatology (ABClima), serving as Director-Secretary (2018-2021) and of the Deliberative Council (2023-2025; 2021-2023; 2016-2018 and 2014-2016). Member of the research group Readings and Analysis in Hydrography, Climatology and Cartography (UFRN/CNPq) and of the Locality Study Group (USP/CNPq). He has teaching and professional experience in the area of Geosciences, with interests in the themes of Geography, Climatology, Atmospheric circulation of South America, Comfort and Technology of the built environment, Engineering Geology and Environmental Geotechnics. Reviewer for national and international scientific journals.
He is an associate professor at the Department of Environmental Sciences (DCAm) and a permanent professor at the Postgraduate Program in Environmental Sciences at the Federal University of São Carlos (UFSCar), where he has worked since 2013. He has a degree in Geography from the Federal University of Santa Maria (2006), a master's degree (2008) and a doctorate (2012) in Space Geophysics (concentration in Atmospheric Sciences) from the National Institute for Space Research (INPE). He was a postdoctoral fellow at the atmospheric electricity group (ELAT) at INPE until early 2013. He was a visiting researcher at INRAE (Institut national de recherche pour l'agriculture, l'alimentation et l'environnement), Antony, France, between 2019 and 2020. He works mainly in the area of atmospheric sciences and GIS, with an emphasis on topics such as: climatology and hydrology, storm formation, hydrological modeling, remote sensing and geoprocessing.
License
To promote the global exchange of knowledge, the journal Anales de Geografía de la Universidad Complutense provides unrestricted access to its content from the moment of publication in this electronic edition, and is therefore an open-access journal. The originals published in this journal are the property of the Universidad Complutense de Madrid, and their source must be cited in any full or partial reproduction. All content is distributed under a Creative Commons Attribution 4.0 (CC BY 4.0) use and distribution license. This must be expressly stated in this form whenever necessary. The informative version and the legal text of the license are available for consultation.







