[Sustinere] Manuscript Template

Data Driven Analysis using Fuzzy Time Series for Air Quality Management in Surabaya

Didiet Darmawan1, Mohammad Isa Irawan2, Arie Dipareza Syafei3

1Department of Technology Management, Faculty of Business and Technology Management,

2Department of Mathematics, Faculty of Mathematics and Natural Sciences,

3Department of Environmental Engineering, Faculty of Civil, Environmental and Geo Engineering,

Sepuluh Nopember Institute of Technology, Surabaya-Indonesia


Abstract. Air pollution is one of the environmental issues that can affect human health. In Surabaya, the second largest city in Indonesia, economic development and construction have increased industrial activity and motor vehicle use, which in turn has raised fuel oil consumption and ultimately degraded air quality. The gaseous pollutants CO, SO2, O3 and NO2 and the particulate matter PM10 all have a direct impact on health. This study aims to analyze, monitor and predict the air pollutant concentrations recorded by the Surabaya City Environment Agency, using the Fuzzy Time Series method. The resulting MAPE values for each pollutant are NO2: 23.6%, CO: 19.5%, O3: 22.75%, PM10: 9.96% and SO2: 3.6%.

Keywords: Fuzzy Time Series; forecasting; air pollution; air quality management

1.  Introduction

Surabaya, one of the major cities in Indonesia, has a population of 2,943,528 (based on 2015 data from Population and Civil Registration in Surabaya) and faces a variety of environmental quality issues. Air pollution is a major concern in big Indonesian cities, and in Surabaya in particular, because it can affect human health. Economic development in Surabaya has increased industrial activity and vehicle use, which in turn has increased fuel oil consumption.

These activities release pollutants into the atmosphere: each liter of fuel burned generates 30 grams of nitrogen oxides (NOx) and 100 grams of carbon monoxide (CO) (Hickman, 1999). These pollutants can damage the human respiratory tract when their levels exceed safe limits. Epidemiological studies have found a close connection between the level of urban air pollution and the incidence of respiratory disease. For example, NO2 at levels of 250 μg/m3 to 500 μg/m3 can impair respiratory function in patients with asthma [11]. Lungs exposed to NO2 can swell, making it difficult to breathe.

To mitigate the impact of air pollution, environmental quality must be managed. Air quality management is carried out by supervising and monitoring activities that have the potential to contaminate the air (City Regulation No. 3 Surabaya, 2008). Pollutant monitoring is carried out by the Environment Agency (BLH), and the results are shown to the public in the form of the Air Pollutant Unit Index. The monitored air pollutant parameters are nitrogen dioxide (NO2), carbon monoxide (CO), ozone (O3), particulates (PM10) and sulfur dioxide (SO2).

Previous air quality prediction based on time series was done using the ARIMA model (Kusumawardhana, 2015). That model uses training data to predict the next day's air pollutant content from the sequences of air pollutant concentrations released by the Environment Agency (BLH). In this study, Fuzzy Time Series with average-based intervals is used to build predictions from the time series data recorded by the Quality Monitoring Stations (SUF) of the BLH in Surabaya. The five parameters listed above are the pollutant variables to be calculated using Fuzzy Time Series.

2.  Fuzzy Set and Fuzzy Time Series

2.1.  Fuzzy Set

The basic principle of a fuzzy set is to represent values that are not simply false (0) or true (1). In a fuzzy set, the membership value lies in the range 0 to 1, so a fuzzy set can represent degrees of an opinion or decision: there are values that lie between true and false. In other words, the truth value of an item is not limited to true or false. The universe of discourse is the whole range of values that a fuzzy variable is allowed to take. It is a set of real numbers that increases monotonically from left to right, and its values can be negative or positive. For example, the universe of discourse for a sample temperature variable is [-7°C, 50°C].
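To illustrate a membership value between 0 and 1, the following minimal Python sketch evaluates a triangular membership function on the temperature universe of discourse mentioned above. The triangular shape, the function name `triangular_membership` and the fuzzy set "warm" with its breakpoints are assumptions chosen for illustration; they are not taken from the study.

```python
def triangular_membership(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set.

    Returns 0.0 outside (left, right), 1.0 at the peak, and a linear
    value in between. The triangular shape is an illustrative assumption.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy set "warm" on the universe of discourse [-7, 50] degrees C.
WARM = (15.0, 25.0, 35.0)

for temperature in (10.0, 20.0, 25.0, 30.0):
    print(temperature, triangular_membership(temperature, *WARM))
```

Under these assumed breakpoints, 20°C belongs to "warm" with degree 0.5, which is neither fully true nor fully false.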

2.2.  Fuzzy Time Series

Fuzzy time series is a forecasting method that uses the concept of fuzzy sets as the basis for calculation. The method captures the pattern of past data and then uses it to project future data. Fuzzy time series is defined as follows (Q. Song and B. S. Chissom, 1993):

·  Definition 1: Let Y(t) (t = 0, 1, 2, ...), a subset of R, be the universe of discourse on which the fuzzy sets μi(t) (i = 1, 2, ...) are defined. If F(t) is the collection of μi(t) (i = 1, 2, ...), then F(t) is called a fuzzy time series on Y(t).

·  Definition 2: If F(t) = Ai and F(t + 1) = Aj, the fuzzy logical relationship can be written as Ai → Aj, where Ai and Aj are the left-hand side and the right-hand side of the relationship, respectively.

3.  Air Quality Monitoring

Air quality monitoring is an activity to measure the pollutant content of the air. These measurements are carried out to get a general picture of pollution conditions. The general purposes of air quality monitoring are (Ahmet and Dijk, 1994):

·  Provide a strong scientific basis for developing cost-effective policy controls and solutions to overcome air pollution.

·  Determine how air quality compares with standards and thresholds.

·  Evaluate the potential impacts of air pollution to ecosystems and the environment.

·  Meet air quality legal reporting requirements.

The air quality category is obtained from measurements at the monitoring stations. The Air Pollution Standard Index is a dimensionless number that describes ambient air quality conditions at a particular location and time, based on the impacts on human health, aesthetic values and other living things.

Table 1. Index ranges and categories of the Air Pollutant Unit Index

Index / Category
1 – 50 / Good
51 – 100 / Average
101 – 199 / Unhealthy
200 – 299 / Very Unhealthy
> 300 / Dangerous

Table 2. Limits of the Air Pollution Unit Index

Air Pollution Unit Index / NO2 per Hour (µg/Nm3) / CO per 8 Hrs (mg/Nm3) / O3 per Hour (µg/Nm3) / PM10 per 24 Hrs (µg/Nm3) / SO2 per 24 Hrs (µg/Nm3)
50 / (2) / 5 / 120 / 50 / 80
100 / (2) / 10 / 235 / 150 / 365
200 / 1130 / 17 / 400 / 350 / 800
300 / 2260 / 34 / 800 / 420 / 1600
400 / 3000 / 46 / 1000 / 500 / 2100
500 / 3750 / 57.5 / 1200 / 600 / 2620

1.  At 25 °C and 760 mm Hg.

2.  No index can be reported at low concentrations with short exposure terms.

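The Air Pollutant Unit Index is calculated as follows:

$$I = \frac{I_a - I_b}{X_a - X_b}\,(X_x - X_b) + I_b \tag{1}$$

where:

I = calculated Air Pollutant Unit Index

Ia = upper limit of the index

Ib = lower limit of the index

Xa = upper limit of the ambient concentration

Xb = lower limit of the ambient concentration

Xx = measured ambient concentration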
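As a worked illustration of Equation (1), the linear interpolation can be sketched in Python using the PM10 column of Table 2. The function name `pollutant_index` and the choice to reject concentrations outside the tabulated rows are assumptions made for this sketch; Table 2 gives no row below index 50, so lower concentrations are not handled here.

```python
from bisect import bisect_left

# PM10 breakpoints copied from Table 2: (index, 24-hour concentration in ug/Nm3).
PM10_BREAKPOINTS = [(50, 50.0), (100, 150.0), (200, 350.0),
                    (300, 420.0), (400, 500.0), (500, 600.0)]


def pollutant_index(concentration, breakpoints):
    """Apply Equation (1) between the two table rows surrounding the measurement."""
    concentrations = [c for _, c in breakpoints]
    # Assumption: values outside the tabulated range are rejected, not extrapolated.
    if not concentrations[0] <= concentration <= concentrations[-1]:
        raise ValueError("concentration outside the tabulated range")
    upper = max(1, bisect_left(concentrations, concentration))
    index_b, x_b = breakpoints[upper - 1]  # lower limits Ib, Xb
    index_a, x_a = breakpoints[upper]      # upper limits Ia, Xa
    return (index_a - index_b) / (x_a - x_b) * (concentration - x_b) + index_b


print(pollutant_index(100.0, PM10_BREAKPOINTS))
```

With the Table 2 values, a 24-hour PM10 concentration of 100 µg/Nm3 gives an index of 75, which falls in the Average category of Table 1.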

4.  Research Methods

The method used is Fuzzy Time Series, and the data are historical records from fixed monitoring stations. The research procedure is shown schematically in Figure 1. For the nitrogen dioxide (NO2) and carbon monoxide (CO) parameters, the data come from the fixed monitoring station SUF 6 in Wonorejo, recorded from July 8th to July 21st 2013. For ozone (O3), the data were recorded from July 8th to July 21st 2013 at SUF 1, located in Prestasi Park. For PM10, the data were recorded from July 8th to July 21st 2013 at SUF 3 in Sukomanunggal. For SO2, the data were recorded at SUF 4 in Gayungan from July 20th to August 2nd 2013. The data represent the actual content of each air pollutant parameter at 30-minute intervals. In this paper, all data are multiplied by 100 to make the calculation easier and divided by 100 again to restore the original values when computing MSE and MAPE.

Figure 1. Flowchart diagram

The Fuzzy Time Series steps are as follows:

1)  Determine the data interval. Calculate the absolute differences between consecutive data values Di (i = 1, ..., n-1) and sum them:

$$\sum_{i=1}^{n-1} |D_{i+1} - D_i| \tag{2}$$

2)  Divide this sum by the total number of data points.

3)  To determine the interval basis, divide the result of step 2 by 2.

4)  Form the universe of discourse (U). The number of intervals is given by:

$$(D_{max} - D_{min}) / \text{interval length} \tag{3}$$

5)  where Dmax is the largest data value and Dmin is the smallest data value. Then U is:

$$U = \{U_1, U_2, U_3, \dots, U_n\} \tag{4}$$

6)  with U1 = [Dmin, x1], U2 = [x1, x2], ..., Un = [xn-1, Dmax], where x1 < x2 < ... < xn-1.

7)  Define each fuzzy set Ai on the intervals formed above:

$$A_i = f_{A_i}(U_1)/U_1 + f_{A_i}(U_2)/U_2 + \dots + f_{A_i}(U_n)/U_n \tag{4}$$

where fAi is the membership function of Ai, fAi: U → [0, 1], and fAi(Uj) is the degree of membership of interval Uj in the fuzzy set Ai, with 1 ≤ i ≤ k and 1 ≤ j ≤ n. Writing aij = fAi(Uj), the fuzzy sets are obtained as follows:

$$A_1 = a_{11}/U_1 + a_{12}/U_2 + \dots + a_{1n}/U_n$$

$$A_2 = a_{21}/U_1 + a_{22}/U_2 + \dots + a_{2n}/U_n$$

$$\vdots$$

$$A_k = a_{k1}/U_1 + a_{k2}/U_2 + \dots + a_{kn}/U_n \tag{5}$$

8)  Define the fuzzy logic relationships. Let Y(t) (t = 0, 1, 2, 3, ...) be the universe of discourse on which the fuzzy sets μi(t) are defined. If F(t) consists of μi(t) (i = 1, 2, 3, ...), then F(t) is called a fuzzy time series on Y(t).

If F(t) then Y(t) (6)

9)  Let F(i) = Ai and F(i + 1) = Aj. The relationship between two consecutive observations F(i) and F(i + 1), written F(i) → F(i + 1), is called a fuzzy logic relationship and is denoted by Ai → Aj, where Ai is the left-hand side or current state and Aj is the right-hand side or next state.

10) Combine the fuzzy logic relationships into Fuzzy Logic Relationship Groups (FLRG) by collecting the relationships that have the same left-hand side. For example, for Ai: Ai → Aj1, Ai → Aj1 and Ai → Aj2. These three relationships belong to the same group, and a relationship that appears more than once is counted only once.

11) Defuzzify. Suppose the FLRG for the current state Ai has the next states Aj1, Aj2, ..., Ajp, and the maximum membership values of Aj1, Aj2, ..., Ajp occur at intervals whose midpoints are m1, m2, ..., mp. The forecast is then (m1 + m2 + ... + mp)/p.
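Steps 1 to 11 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `fts_forecast` is hypothetical, the intervals are taken to be of equal width, the interval length is not rounded to a base value, each observation is fuzzified to the single interval that contains it, and repeated relationships in a group are counted once.

```python
import math
from collections import defaultdict


def fts_forecast(data):
    """In-sample one-step-ahead forecasts for data[1:], following steps 1-11."""
    n = len(data)
    # Steps 1-3: sum of absolute first differences, divided by the number
    # of data points, then halved to give the interval length.
    total_diff = sum(abs(data[i + 1] - data[i]) for i in range(n - 1))
    if total_diff == 0:
        raise ValueError("need at least two data points that are not all equal")
    length = (total_diff / n) / 2
    # Steps 4-6: split [Dmin, Dmax] into equal-width intervals U1..Un
    # (equal width is an assumption of this sketch).
    d_min, d_max = min(data), max(data)
    count = max(1, math.ceil((d_max - d_min) / length))
    width = (d_max - d_min) / count
    midpoints = [d_min + (k + 0.5) * width for k in range(count)]
    # Step 7: fuzzify each value to the index of the interval containing it.
    states = [min(int((x - d_min) / width), count - 1) for x in data]
    # Steps 8-10: group relationships Ai -> Aj by left-hand side;
    # a set keeps each repeated relationship only once.
    groups = defaultdict(set)
    for current, following in zip(states, states[1:]):
        groups[current].add(following)
    # Step 11: forecast = mean of the midpoints of the next states.
    forecasts = []
    for current in states[:-1]:
        targets = groups[current]
        forecasts.append(sum(midpoints[j] for j in targets) / len(targets))
    return forecasts


# Made-up readings for illustration only (not data from the monitoring stations).
sample = [10, 12, 11, 14, 13, 15]
print(fts_forecast(sample))
```

Each returned value is the forecast for the observation that follows the corresponding current state, so the list can be compared directly with `data[1:]` when computing the error measures.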

Then calculate the forecast error rate using MAPE and MSE.

$$\text{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \frac{|A_i - F_i|}{A_i} \times 100\% \tag{7}$$

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (A_i - F_i)^2 \tag{8}$$

where

Ai = actual value of the i-th data point

Fi = forecast value for the i-th data point

n = number of data points
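The two error measures can be computed with a short Python sketch. The function names `mape` and `mse` and the sample numbers are illustrative assumptions; MAPE is returned in percent, and the actual values are assumed to be non-zero, as is the case for measured pollutant concentrations.

```python
def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    n = len(actual)
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / n


def mse(actual, forecast):
    """Mean squared error between actual and forecast values."""
    n = len(actual)
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n


# Made-up values for illustration only.
actual_values = [10.0, 20.0]
forecast_values = [9.0, 22.0]
print(mape(actual_values, forecast_values), mse(actual_values, forecast_values))
```

Because MAPE is relative to the actual value, it is unaffected by multiplying the data by a constant, whereas MSE depends on the scale of the data.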

5.  Analysis Result

5.1.  Pollutant Parameter NO2

For the NO2 parameter, the MAPE is 23% and the MSE is 13.8. The largest predicted value is 22 μg/m3. At this level the public display does not show any information, because the lowest NO2 concentration for which an Air Pollutant Unit Index can be reported is 1130 µg/Nm3 (Table 2); the forecast from the fuzzy time series method is therefore below the lower limit that can be recorded. A comparison between actual and forecasted data for 8th July – 21st July 2013 is shown in Figure 2.

Figure 2. Comparison graph for pollutant parameter NO2

(8th July – 21st July 2013)

5.2.  Pollutant Parameter CO

Figure 3. Comparison graph for pollutant parameter CO

(8th July – 21st July 2013)

From the calculation using the FTS method, the MAPE is 19.5% and the MSE is 0.11. Based on Equation (1) and Table 1, the CO pollutant status is Average. As a short-term management recommendation to anticipate rising pollutant levels, the city government should provide alternative traffic routes or distribute masks. Natural sources of CO include methane oxidation in the atmosphere, volcanic activity and forest fires. Meanwhile, the CO produced by human activities comes mostly from motor vehicle emissions. Other sources include coal gas, which contains more than 5% CO, and poorly functioning gas-fired heating appliances, gas refrigerators, gas stoves and chimneys.

5.3.  Pollutant Parameter O3