Review Article

Review on air pollution of Delhi zone using machine learning algorithm


Air pollution is a major problem in urban areas today, especially in cities such as New Delhi, where elevated levels of toxic gases have been detected in the air and have reduced air quality. Predictive analytics therefore plays a significant role in forecasting future air quality from historical data. Forecasting the air quality of such cities is essential for mitigating the consequences of pollution. Several machine learning algorithms, such as random forest, support vector machine, and regression and classification methods, are widely used for this kind of prediction. The main pollutants present in the air are PM2.5, PM10, CO, NO2, SO2, and O3. In this paper, we focus mainly on a New Delhi data set for predicting ambient air pollution and air quality using several machine learning algorithms.
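To make the idea of predicting future air quality from historical data concrete, the following is a minimal sketch and not the method of any study reviewed here. It assumes a single pollutant (PM2.5), uses the simplest of the algorithm families named above (regression, with the previous day's reading as the only feature), and runs on a synthetic series generated inside the script rather than on real New Delhi measurements. The function names `make_lagged_pairs`, `fit_line`, and `mean_absolute_error` are hypothetical helpers introduced only for this illustration.

```python
import random


def make_lagged_pairs(series, lag=1):
    """Turn a series into (earlier value, later value) training pairs."""
    return [(series[i - lag], series[i]) for i in range(lag, len(series))]


def fit_line(pairs):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(pairs, slope, intercept):
    """Average absolute gap between observed and predicted values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in pairs) / len(pairs)


# Synthetic (made-up) daily PM2.5 readings in ug/m3. This is NOT real data;
# the generating rule below is an assumption chosen only for illustration.
random.seed(0)
pm25 = [150.0]
for _ in range(199):
    pm25.append(0.8 * pm25[-1] + 30.0 + random.gauss(0, 10))

pairs = make_lagged_pairs(pm25)

# Chronological split: fit on the earlier days, evaluate on the later days.
train, test = pairs[:150], pairs[150:]
slope, intercept = fit_line(train)
mae = mean_absolute_error(test, slope, intercept)

# One-step-ahead forecast from the most recent reading.
next_day = slope * pm25[-1] + intercept

print(f"fitted model: next = {slope:.2f} * previous + {intercept:.2f}")
print(f"mean absolute error on held-out days: {mae:.2f} ug/m3")
print(f"forecast for the next day: {next_day:.2f} ug/m3")
```

The split is chronological rather than random so that the model is always evaluated on days later than those it was fitted on, which mirrors how a forecast would be used in practice. A random forest or support vector machine could be substituted for `fit_line` with the same lagged-feature layout, typically with meteorological variables and other pollutants added as further inputs.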

Issue: Vol 5 No 4 (2020): Autumn 2020
Section: Review Article(s)
Keywords: Forecasting, Air pollution, Machine Learning.

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
How to Cite
Sinha A, Singh S. Review on air pollution of Delhi zone using machine learning algorithm. japh. 2020;5(4):259-272.