Original Research

Ensemble learning models for the prediction of the weekly peak of PM2.5 concentration in Algiers, Algeria

Abstract

Introduction: This paper focuses on predicting weekly peak levels of particulate matter with an aerodynamic diameter of less than 2.5 µm (PM2.5) using various Machine Learning (ML) models. The study compares ML models with deep learning models and emphasizes the explainability of ML models for PM2.5 prediction.
Materials and methods: We examine different combinations of features and time-window sizes to evaluate the performance of ML models. The models considered are Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Decision Tree (DT), and five Ensemble Learning (EL) models: AdaBoost, XGBoost, LightGBM, CatBoost, and Random Forest (RF). The dataset comprises three years of daily measurements of weather parameters and PM2.5.
Results: Lagged values of PM2.5 improve prediction performance, particularly when the lag window spans seven days or a multiple thereof. This confirms that road traffic, which exhibits a weekly seasonality, is the primary source of PM2.5 in Algiers. Interestingly, including lagged values of weather parameters decreases prediction performance, even when they are chosen based on their correlation with PM2.5. The AdaBoost model performs best, achieving a Root Mean Squared Error (RMSE) of 2.899 µg/m³ and an R² value of 0.96.
Conclusion: EL models, specifically AdaBoost, exhibit strong performance in predicting PM2.5 levels. They not only provide accurate predictions but also allow analysis of feature importance. Lagged values of PM2.5 have a greater impact on predictions than weather parameters do; surprisingly, including weather parameters hampers prediction performance. Ensemble learning models therefore offer valuable insight into feature significance.
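The lagged-window setup and the two reported metrics can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify how the weekly peak target is constructed, so the sketch takes it to be the maximum of the following seven days; the function names (`make_lagged_samples`, `rmse`, `r2`) are hypothetical; the series is synthetic; and the predictor is a naive persistence baseline, not the AdaBoost model evaluated in the study.

```python
import math


def make_lagged_samples(series, window=7, horizon=7):
    """Pair each `window`-day history with the peak of the next `horizon` days.

    Assumption: the "weekly peak" target is the maximum daily value over
    the following `horizon` days. The paper may define it differently.
    """
    features, targets = [], []
    for t in range(window, len(series) - horizon + 1):
        features.append(series[t - window:t])        # lagged PM2.5 values
        targets.append(max(series[t:t + horizon]))   # peak of the next week
    return features, targets


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic daily series (not real data): a weekly cycle plus a slow trend.
weekly_pattern = [30, 32, 35, 33, 31, 20, 18]
pm25 = [weekly_pattern[d % 7] + 0.1 * d for d in range(70)]

X, y = make_lagged_samples(pm25, window=7, horizon=7)

# Persistence baseline: predict next week's peak as last week's peak.
baseline = [max(row) for row in X]

print("samples:", len(X))
print("baseline RMSE:", rmse(y, baseline))
print("baseline R2:", r2(y, baseline))
```

With a seven-day window, every feature row covers exactly one full weekly cycle, which is one plausible reading of why windows of seven days or multiples thereof help. The rows of `X` and the values in `y` could equally be passed to any regressor, for example an ensemble model, in place of the baseline.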

Issue: Vol 8 No 3 (2023): Summer 2023
Section: Original Research
DOI https://doi.org/10.18502/japh.v8i3.13783
Keywords
Particulate matter with an aerodynamic diameter of less than 2.5 µm (PM2.5); Air pollution; Ensemble learning; Time series forecasting; Air pollution prediction

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
How to Cite
Ghazi S, Dib A, Mendjel MS, Tarek M, Dugdale J. Ensemble learning models for the prediction of the weekly peak of PM2.5 concentration in Algiers, Algeria. JAPH. 2023;8(3):381-398.