Geostatistical Approach In Determining Appropriate Cell Size For Slope Length (Ls) Factor In Soil Erosion Modeling

1 Mohammad Almasinia, 2 Wan Muhd Aminuddin Wan Hussin, 3hasan alizadeh, 4Shahryar Mamzaie

1Lecturer, Department of Geography, Payame Noor University,

PO BOX 19395-3697 Tehran, IRAN

3Assistant Professor, Department of Geology, Payame Noor University,

PO BOX 19395-3697 Tehran, IRAN

1&2School of Civil Engineering

Universiti Sains Malaysia

Engineering Campus

14300 Nibong Tebal, Pulau Pinang

Email:

Abstract

Soil erosion has been one of the environmental problems that have a very significant impact in human life. Various models have been introduced and used in studying the impact which include, among others, Revised Universal Soil Loss Equation (RUSLE) which is used in this study. This model requires a number of parameters such as rainfall erosivity (R), soil erodability (K), slope length and steepness (LS), crop cover (C) and prevention practice (P). With a Geographical Information System (GIS), such modeling can be accomplished and commonly done using raster technique whereby all the parameters are mapped in the form of grid layers of a specific cell size. The issue is that the appropriate cell size is always ignored or wrongly chosen without a thorough statistical analysis made prior to the analysis. This is the focus of this research whereby four cell sizes are selected, i.e. 30m, 50m, 100m and 300m and applied to data of Kuala Yong (in Pergau, Kelantan) as a sample test. LS factor has been selected for this comparison where semivariogram models are fitted to the height information (based on the original 20-m contour topographic map) and the best parameters such as nugget, partial sill and sill are derived. Overall results show that with increasing cell size up to 50m, the nugget effect decreases and spatial dependency increases. According to best spatial dependency and high variances and diversity of 50 m cell spacing, the Digital Elevation Model (DEM) of this cell size is found the most appropriate for such dataset. In general, results of this study confirm that geostatistical analysis could and should be applied to select suitable cell spacing in DEM to predict topographical factor in soil erosion modeling.

Key words: Soil Erosion Modeling, Cell Size, Slope Length, GIS

  1. Introduction

RUSLE is a series of mathematical equations that estimates the average annual soil loss and sediment yield resulting from interrill and rill erosion. RUSLE is an exceptionally well-validated and documented equation. The strength of RUSLE is that it was developed by a group of nationally-recognized scientists and soil conservationists who had considerable experience with erosional processes. (Soil and Water Conservation Society, 1993).

RUSLE is a model to predict annual soil loss erosion at longtime average, based on several parameters, i.e. rainfall-runoff, soil erodibility, slope length and steepness, cover management, and support practice. The yield of slope length L and steepness S is topographic factor LS, implying the topographic effect on soil loss.

In RUSLE, the LS factor is much more essential to the soil erosion. The soil erosion increases as the slope length and steepness increase although it is more sensitive to slope steepness than to slope length. So, the LS factor is combined from two factors as follows:

The slope length factor L which is a fundamental slope length measured in meters

The slope steepness factor S which is a function of slope angles measured in degrees and reflects the influence of slope gradient on erosion

In RUSLE, L and S are calculated by using a set of empirical models (Renard et. al., 1997). With a field sample, Wang et al., in 2000 have presented a spatial prediction and uncertainty analysis of the LS factor which was derived from the empirical models using the geostatistical methods. Thus, a physical based topographical factor LS equation has been developed based on a Digital Elevation Model (DEM) (Moore and Wilson, 1992).

For an accurate prediction, obtaining the spatial dependency with an appropriate DEM spacing which represents the spatial characteristics in the LS factor is essential. Recently, using DEMs with different cell sizes is readily in practice but the leading problem is to select an appropriate DEM spacing for a given task. Furthermore, the appropriate DEM spacing may be a fundamental to other variables, e.g. the complexity of the terrain, required precision, desired information and etc. It is clear that choosing appropriate DEM spacing is very important and may affect the results of consequent computation such as soil loss (due to erosion).

  1. Case Study Area and Datasets

The area selected for this research Kuala Yong which is located in the northwest region of peninsular state of Kelantan, Malaysia.The area located between latitude 5°38′41″N, 5°30′17″N and longitude 101°39′53″E, 101°45′18″ E as shown in Figure 1. The climate in this area is influenced by both the northeast and southwest monsoons and also annual rainfall is approximately 1200mm in the lower plain, but more in the hilly and mountainous area. Maximum rainfall occurs during September to January (Malaysian Meteorology Department, 2010). The extent of this area is approximately 90 Km2, 90 % of which is covered by forests.

Figure 1: Location of study area

Source: Google Earth

Data needed for this research project include:

  • R factor (from rainfall data)
  • K factor (obtained from soil map and soil samples)
  • LS factor (obtained from topographic map- contours)
  • C factor (obtained from Remote Sensing image)
  • P factor (obtained from Remote Sensing image and previous site visit)
  1. Methodology

Figure 2 illustrates the framework of this study in three stages. Firstly, data collections, such as collections of topographic map, hydrological map, satellite images, and soil samples. Secondly, statistical analysis, i.e. kriging and interpolation, are used to find most appropriate cell size in DEM spacing. Finally, this proposed methodology concludes via mapping of annual soil loss erosion with five parameters. The methodology is arranged in a systematic order so that the aims and objectives of this research study are achieved.

Figure 2: The project framework

This research had been conducted using an integrated approach to determine the parameters influencing erosion especially LS factor in the study area. The soil maps with the C, P, and K factors were obtained from the Department of Agriculture, while land use type maps were acquired from Department of Town Regional Planning, Malaysia. Also, rainfall data were obtained from information recorded in the Department of Meteorological, Malaysia as R factor. Finally, the digital topographic map and its derivative LS factor can be gained from the Survey and Mapping Department (JUPEM).

S and L are two topography factors of soil erosion which are calculated as a common factor. These factors calculate the effect of slope steepness (S) and slope length (L) on erosion. The slope steepness and slope length factor had been calibrated from a digitized topographic map of the study area. The digitized topographic map in line format (GIS shapefile) was then converted to digital elevation model (DEM) using “TIN to grid extension”.

A generated TIN based on contour line with 20m interval and boundary is shown in Figure 3. There is need to convert TIN layer into raster layer based on four cell sizes comprise 30, 50, 100 and 300 m resolution. These four DEM layers are fundamental to create the most appropriate sell size from the contour lines digitized with 20m interval. Figure 4 shows the DEM created from original contour lines layer with different cell size.

Figure 4: The comparison of DEMs generated with different cell sizes

According to the different cell size, the generated DEMs will be converted to grid which consists of desired tables. Extracting randomly some appropriate points, as the sample points through these tables are so important ingeostatistical methods. These samples points can be regularly or randomly spaced. Wherever the input points and their distribution is more, the results will be more reliable. Figure 5 indicates the distribution of the points for each DEM based on different cell size.

Using grid to point toolset can create a point layer with integer values that have a table to enter X, Y, and Z values for the points created from DEM layers. Due to the excessive points obtained, the model needs to reduce the number of these points for data fitting in the variogram. The locations of points are at center of each pixel. The geostatistical analyst toolbar has a toolset which is called cerate subset. The cerate subset toolset is the process of randomly dividing the database into two parts, the training and test dataset. To create a model using the training dataset and by validation tool it can be evaluated how good the predictions are relative to the known values in the dataset test

The geostatistical analysis of LS concludes in the following phases:

  • modeling the semivariogram or covariance to analyze surface properties
  • Kriging

A number of kriging methods are available for surface creation in geostatistical analyst, including ordinary, simple, universal, indicator, probability, and disjunctive kriging. Spatial statistical analyst provides the following functions from which desirable number of function are chosen for modeling the empirical semivariogram:

  • Circular
  • Spherical
  • Exponential
  • Gaussian
  • Linear

The selected model influences the prediction of the unknown values, particularly when the shape of the curve near the origin differs significantly. The steeper the curve nears the origin is, the more influence the closest neighbors will have on the prediction. As a result, the output surface will be less smooth.

Spherical model shows a progressive decrease of spatial autocorrelation (equivalently, an increase of semivariance) until beyond some distance autocorrelation is zero. The spherical model is one of the most commonly used models.

Kriging uses the semivariance to measure the spatially correlated component which is also called dependence or spatial autocorrelation. The semivariance is computed by the Eq 1:

(Eq 1)

Where:

γ = experimental semivariogram value.

z (xi) and z(xi +h) = values of variable z at xi xi+h, respectively

xi and xi+h = position in two dimensions.

Kriging is an advanced geostatistical procedure that generates an estimated surface from a scattered set of points with Z-values. Unlike the other interpolation methods supported by ArcGIS spatial analyst, kriging involves an interactive investigation of the spatial behavior of the phenomenon represented by the Z-values before selecting the best estimation method for generating the output surface.

One of the important tasks that must be done before fitting the data is the review of the data set to select a suitable model using Exploratory Spatial Data Analysis (ESDA) tools. The ESDA environment comprise of a set of tools so that each one allows a view into the data. Each view can be manipulated and explored based on different insights into the data such as the distribution of the data, the spatial autocorrelation, and the covariation among multiple data sets. ESDA promotes understanding about of natural phenomena so that itcan make better decisions on the issue related to the data.

Some methods in the geostatistical analyst require that the data to be normally distributed. When the data is skewed (the distribution is lopsided), it must be converted to the normal data. The histogram and normal QQ Plot allow for exploring the effects of different transformations on the distribution of the dataset. If the interpolation model uses one of the kriging methods, and the data as one of the steps are chosen to transform, the predictions will be transformed back to the original scale in the interpolated surface.

If data points are not normally distributed, they can be close to normal using of different transformation option. There are two methods as follows:

  • The Box-Cox transformation is shown by the Eq 2:

Y(s) = (Z(s) λ - 1)/λ, for λ≠ 0 (Eq 2)

For example, suppose that data is composed of counts of some phenomenon. For these types of data, the variance is often related to the mean. If data have small counts in part of study area, the variability in that local region will be smaller than the variability in another region where the counts are larger. In this case, the square-root transformation may help to make the variances more constant throughout the study area and often makes the data appear normally distributed as well. The square-root transformation is a special case of the Box-Cox.

  • Log transformation: The log transformation is actually a special case of the Box-Cox transformation when λ = 0; the transformation is described by the Eq 3:

Y(s) = ln(Z(s)), for Z(s) > 0 (Eq 3)

Where ln is the natural logarithm

The log transformation is often used where the data has a positively skewed distribution and there are a few very large values. If these large values are located in the study area, the log transformation will help make the variances more constant and it normalizes the data. Concerning terminology, when a log transformation is implemented with kriging, the prediction method is known as log normal kriging, whereas for all other values of λ, the associated kriging method is known as trans-Gaussian kriging (help of ArcGIS 9.2). Figure 6 illustrates both skewness and kurtosis diagrams.

In Figure 6 and Figure 7, the data in this study shows positively skewed distribution; therefore the skewed data must betransforming into the normal distribution by using log option.

Figure 7: Data skewed to transform normal distribution using log

The next step is to fit a model to the points forming the empirical semivariogram. The semivariogram modeling is a key step between spatial description and spatial prediction. The main application of kriging is the prediction of attribute values at unsampled locations. The empirical semivariogram provides information on the spatial autocorrelation of datasets. However, it does not provide the information for all possible directions and distances. For this reason, and to ensure that kriging predictions have positive kriging variances, it is necessary to fit a model which is to be a continuous function or curve to the empirical semivariogram. Abstractly, this is similar to a regression analysis in which a continuous line or curve is fitted to the data points.

A semivariogram plots the average semivarince against the average distance. Because of the directional component, one or more average semivariances may be plotted at the same distance. If spatial dependence exists among the sample points, then pairs of points that are closer in distance will have more similar values than pairs that are farther apart. In other words, the semivariance is expected to increase as the distance increases in the presence of spatial dependence.

A semivariogram can also be examined by direction. If spatial dependence has directional differences, then the semivariance values may change more rapidly in one direction than another. Anisotropy is a term describing the existence of directional differences in spatial dependence (Eriksson and Siska 2000). Isotropy represents the opposite case in which spatial dependence changes with the distance but not with the direction.

A semivariogram may be used alone as measure of spatial autocorrelation in the data set. But to be used as an interpolator in kriging, the semivariogram must be fitted with a mathematical function or model. The fitted semivariogram can then be used for estimating the semivariance at any given distance. Fitting a model to a semivariogram is a difficult and often controversial task in geostatistics (Webster and Oliver 2001). One reason for the difficulty is the number of models to choose from (geostatistical analyst offers 11 models). The other reason is the lack of a standardized procedure for comparing the models. Webster and Oviler (2001) recommend a procedure that combines visual inspection and cross-validation. Crass-validation is a method for comparing interpolation methods. A fitted semivarance can be dissected into three possible elements:

  • Nugget: is the semivariance at the distance of 0, representing measurement error
  • Range: is the distance at which the semivariance starts to level off
  • Sill: the semivariance at which the leveling takes place

The sill comprises two components: the nugget and partial sill. In other words, the partial sill is the difference between the sill and the nugget which is shown in Figure 8.

The geostatistical analyst can provide various types of map layers including prediction maps, quantile maps, probability maps, and prediction standard error maps.

The method used in this study is universal probability by using spherical model because this method can be used to predict where values exceed a critical threshold. Calculation equation spherical is shown in the Eq 4:

(Eq 4)

Universal kriging assumes that the spatial variation in Z values has a drift or a trend in addition to the spatial correlation between the sample points. Typically, universal kriging incorporates a first order (plane surface) or a second order (quadratic surface) polynomial in the krining process.

After fitting to the models, ME (Mean Error) and RMS (Root Mean Square Error) are criteria to select optimal models for skewness estimations and accuracy which are used respectively. The Eq 5 and the Eq 6 are shown as follow:

(Eq 5)

(Eq 6)