-ECE539 Project Report (Professor Yu Hen Hu)-
Application of Multilayer Perceptron (MLP) Neural Network in Identification and Picking P-wave arrival
Haijiang Zhang
Department of Geology and Geophysics
University of Wisconsin-Madison
Abstract
Quickly detecting and accurately picking the first-arrival of a P wave is of great importance in locating earthquakes and characterizing velocity structure, especially in the era of large volumes of digital and real-time seismic data. The detector should be capable of finding the onset of the P-wave arrival against the background of microseismic and cultural noise. Normally, P-wave onset is characterized by a rapid change in the amplitude and/or the arrival of high-frequency energy.
The Akaike information criterion (AIC) picker has been used to detect and pick the P-wave arrival (Maeta 1986; Maeta 1989). However, the AIC picker requires an appropriate time window; otherwise it will pick the wrong P-wave arrival. In this report, a Multilayer Perceptron (MLP) neural network is used to detect the P-wave arrival, from which a time window can be chosen for the AIC picker. This method has been applied to our PASO array data set. About 90% of P first-arrivals are detected correctly. Compared with manual picks, this picker provides onset times and uncertainties with high confidence: 91% of autopicks are within 0.15 seconds of analyst picks for this data set.
1. Introduction
Quickly detecting and picking the arrival times of P and S waves in recordings of earthquake events is of great importance in event location, event identification, source mechanism analysis, and spectral analysis. Traditionally, this work is done by an analyst who checks the seismograms and picks the P and S arrivals based on individual experience. This task is time consuming and subjective, especially in the era of large volumes of digital and real-time seismic data. There is a need for a more reliable and robust alternative that is less time consuming and perhaps more objective.
Several techniques for detecting and picking seismic wave arrivals have been described in the literature. The traditional approach to automatic phase detection has been to apply a series of narrow bandpass frequency filters and then use the absolute value as the characteristic function (CF). When the ratio between the short-term average (STA) and the long-term average (LTA) of the CF exceeds a predefined threshold, a detection is declared. Absolute values and the envelope function of the seismogram are usually used as the CF (Allen, 1982).
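The STA/LTA detection rule described above can be sketched as follows. This is a minimal illustration, not the implementation used in the cited work: it skips the bandpass filtering step, uses the absolute value as the CF, and the function names, window lengths, and threshold are assumptions chosen only for the example.

```python
import numpy as np

def sta_lta(x, n_sta, n_lta):
    """Ratio of short-term to long-term average of the CF |x|.

    Both averages use trailing windows that end at the current sample.
    Entries before the first full LTA window are left at 0.
    """
    cf = np.abs(np.asarray(x, dtype=float))          # characteristic function
    csum = np.concatenate(([0.0], np.cumsum(cf)))    # csum[i] = sum(cf[:i])
    ratio = np.zeros(len(cf))
    for i in range(n_lta, len(cf) + 1):
        sta = (csum[i] - csum[i - n_sta]) / n_sta
        lta = (csum[i] - csum[i - n_lta]) / n_lta
        if lta > 0:
            ratio[i - 1] = sta / lta
    return ratio

def first_detection(ratio, threshold):
    """Index of the first sample whose STA/LTA ratio exceeds the threshold."""
    idx = np.flatnonzero(ratio > threshold)
    return int(idx[0]) if idx.size else None

# Synthetic trace (illustrative only): weak noise, then a stronger signal.
t = np.arange(700)
trace = np.where(t < 500, 0.1 * np.sin(0.3 * t), np.sin(0.9 * t))
ratio = sta_lta(trace, n_sta=10, n_lta=100)   # window lengths are assumptions
print(first_detection(ratio, threshold=3.0))
```

The detection is declared a few samples after the true onset because the short-term window must fill with signal before the ratio crosses the threshold, which is one reason a detector of this kind gives a region rather than a precise arrival time.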
Artificial neural networks (ANNs) have also been used to construct the characteristic function to detect and pick seismic phases (Dai et al., 1995, 1997; Zhao et al., 1999; Wang et al., 1997). The ANN method is reported to be very successful and promising in detecting and picking seismic phases. Two different types of input vector are fed to the neural network: derived attributes of the seismogram, such as mean amplitude, spectral properties, and planarity, or the absolute values of the seismogram itself. Comparatively, the former may lose information and involve too much computing time, so using the full waveform as the network input might be a better choice. Although ANNs are very successful in detecting seismic phases, it is difficult to pick the arrival time from the characteristic function: because a whole region of the characteristic function exceeds the predefined threshold, it is not easy to determine which point should be chosen as the arrival time. A multi-term method has been tried to shrink this region, but it still requires an empirical value to determine the phase arrival (Zhao et al., 1999). Unlike the previous methods, this report uses the Akaike Information Criterion (AIC) picker to pick the P-wave arrival. When the time window is chosen properly, the AIC picker can pick the phase arrival very accurately. The MLP neural network is used to choose a time window for the AIC picker.
This report first reviews the AIC picker and the Multilayer Perceptron (MLP) neural network. I then discuss how the MLP neural network is constructed to detect the P-wave arrival and how the AIC picker is used to pick it. Finally, the application of this method to the PASO array data is presented.
2. AIC Picker
Suppose that the seismogram can be divided into locally stationary segments, each modeled as an autoregressive (AR) process, and that the intervals before and after the onset time are two different stationary processes (Sleeman et al., 1999). The order and the values of the AR coefficients change when the character of the current segment of the seismogram differs from that of the preceding one. For example, typical seismic noise is well represented by a relatively low-order AR process, whereas seismic signals usually require a higher-order AR process (Leonard et al., 1999). The Akaike Information Criterion (AIC) is commonly used to determine the order of the AR process when fitting a time series, and it indicates the badness of the model fit as well as its unreliability (Akaike, 1974). This method has been used in onset estimation by analyzing the variation in the AR coefficients representing both multi-component and single-component traces of broadband and short-period seismograms (Leonard et al., 1999). When the order of the AR process is fixed, the AIC function is a measure of the model fit, and the point where the AIC is minimized determines the optimal separation of the two stationary time series in the least-squares sense; this point is interpreted as the phase onset (Sleeman et al., 1999). This picker is known as the AR-AIC picker (Leonard, 2000).
Unlike the AR-AIC picker, Maeta calculates the AIC function directly from the seismogram, without using the AR coefficients (Maeta, 1985; Maeta, 1986). The onset is the point where the AIC has its minimum value. For the seismogram x, the AIC value is defined as
AIC(k)=k*log(variance(x[1,k]))+(n-k-1)*log(variance(x[k+1,n]))
where k ranges over all the samples of the seismogram and n is the total number of samples in the seismogram.
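The AIC definition above can be evaluated directly, as in the following minimal sketch. The function names, the small constant that guards against log(0), and the choice to require at least two samples on each side of the split are assumptions made for the example and are not taken from the cited work.

```python
import numpy as np

def aic_curve(x):
    """AIC(k) = k*log(var(x[1..k])) + (n-k-1)*log(var(x[k+1..n])).

    Here k counts the samples assigned to the pre-onset segment, so the
    0-based index k is the first sample of the post-onset segment.
    Positions where a segment is too short for a variance are set to +inf.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    aic = np.full(n, np.inf)
    eps = np.finfo(float).tiny        # assumption: avoid log(0) on flat data
    for k in range(2, n - 1):         # at least 2 samples on each side
        v1 = np.var(x[:k])
        v2 = np.var(x[k:])
        aic[k] = k * np.log(v1 + eps) + (n - k - 1) * np.log(v2 + eps)
    return aic

def aic_pick(x):
    """Return the split point k at the global minimum of the AIC curve."""
    return int(np.argmin(aic_curve(x)))

# Synthetic window (illustrative only): noise followed by a stronger signal.
rng = np.random.default_rng(0)
window = np.concatenate([0.1 * rng.standard_normal(300),
                         1.0 * rng.standard_normal(200)])
print(aic_pick(window))
```

Because the pick is simply the global minimum over the whole input, the result depends entirely on which samples are passed in, which is why the choice of time window matters.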
Note that the AIC picker takes the global minimum as the onset point. For this reason, it is necessary to choose a time window that includes only the segment of the seismogram of interest. If the time window is chosen properly, the AIC picker can find the P-wave arrival accurately. For a seismogram with a very clear onset, the AIC values have a very clear global minimum, which corresponds to the P-wave arrival (Figure 1a). For a seismogram with a relatively low S/N ratio, there are a few local minima in the AIC values, but the global minimum still accurately indicates the P-wave onset (Figure 1b). When the seismogram is noisier still, the global minimum is not guaranteed to indicate the P-wave arrival (Figure 1c). That is, the signal-to-noise ratio of the seismogram affects the accuracy of the AIC picker to some extent, although this effect is not significant. For this reason, we do not filter the seismogram in advance, because a bandpass filter can reduce the first motion and distort the true P-wave arrival (Douglas et al., 1997).
Figure 1. Seismograms and their corresponding AIC values. a) For a seismogram with a clear P-wave arrival, the AIC function has a very clear minimum. b) For a seismogram with a clear P-wave arrival but a relatively lower S/N ratio, the AIC function has many local minima, but the global minimum still corresponds to the P-wave onset. c) For a very low S/N seismogram, there are a few local minima close to each other. In this case, the global minimum cannot be guaranteed to be the P-wave arrival.
If there is more than one seismic phase in a time window, the AIC picker will choose the stronger phase (Figure 2). On the other hand, the AIC picker is not "smart": it will pick an "onset" for any segment of data, whether or not there is a true phase arrival in the time window (Figure 3). For this reason, we need to guide the AIC picker by choosing an appropriate window for it.
Figure 2. Seismogram with two phases and the corresponding AIC values. There is a clear local minimum for each phase arrival, but the global minimum indicates the arrival of the stronger phase.
Figure 3. Seismic noise data and its AIC values. The minimum value does not indicate any phase arrival, although it divides the data into two different stationary segments.
3. Artificial Neural Network: Multilayer Perceptrons (MLP)
Multilayer perceptrons have been successfully applied to solve many difficult and diverse problems. The mathematical neuron was proposed by McCulloch and Pitts (1943) to mimic the behavior of a biological neuron (Haykin, 1999). The biological neuron is mainly composed of three parts: the dendrites, the soma, and the axon. The dendrites accept information from other neurons through synapses. These input signals are attenuated with increasing distance from the synapses to the soma. The soma integrates the received signals and then activates an output depending on the total input. The axon transmits the output signal to other neurons through the synapses located on the tree structure at the end of the axon (Ban, 2000).
The mathematical neuron works in a similar but simpler way, since integration takes place only over space. Typically, the network is made up of sets of nodes arranged in layers: an input layer, one or more hidden layers, and an output layer. The input signal propagates through the network in the forward direction, layer by layer. Each node is a basic processing unit with a nonlinear activation function. The outputs of the nodes in one layer are transmitted to the nodes in the next layer through links called weights, which can effectively amplify or attenuate the signals. Except for the input layer, the net input to each node is the sum of the weighted outputs of the nodes in the previous layer.
MLPs have successfully solved some difficult problems when trained in a supervised manner with the highly popular error back-propagation algorithm, which is based on the error-correction learning rule. Basically, error-correction learning consists of two phases: a forward phase and a backward phase. In the forward phase, the input vector is fed into the nodes of the input layer and propagates through the network layer by layer, and the output vector is produced as the actual response of the network. During the forward phase, the weights connecting the network nodes are fixed. During the backward phase, however, the synaptic weights are all adjusted according to an error-correction rule. This method attempts to minimize the mismatch between the desired output pattern and the actual output over all of the training samples. The mismatch for each input-output pair is evaluated at the output layer and then propagated backwards through the network to adjust the synaptic weights, so that the actual response of the network moves closer to the desired response in a statistical sense.
A multilayer perceptron has three distinctive characteristics (Haykin, 1999):
(1) The model of each neuron in the network includes a nonlinear activation function. Two types of nonlinear activation function are usually used: the sigmoid function and the hyperbolic tangent function.
(2) The network includes one or more hidden layers, which could enable the network to learn complex tasks by extracting progressively more meaningful features from the input patterns.
(3) The network exhibits a high degree of connectivity, which is determined by the synapses of the network.
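The forward and backward phases described in this section can be sketched for a network with one hidden layer and sigmoid activations. This is a minimal illustration under assumed choices (squared-error loss, batch gradient descent, a fixed learning rate, and the hypothetical class name TinyMLP); it is not the specific configuration or software used in this report.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyMLP:
    """One hidden layer, sigmoid activations, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        # Forward phase: weights are fixed, signals propagate layer by layer.
        self.h = sigmoid(X @ self.W1 + self.b1)
        self.y = sigmoid(self.h @ self.W2 + self.b2)
        return self.y

    def train_step(self, X, T, lr=0.5):
        # Backward phase: propagate the output error back and adjust weights.
        y = self.forward(X)
        err = y - T                                    # mismatch at the output
        d2 = err * y * (1.0 - y)                       # output-layer delta
        d1 = (d2 @ self.W2.T) * self.h * (1.0 - self.h)  # hidden-layer delta
        n = len(X)
        self.W2 -= lr * (self.h.T @ d2) / n
        self.b2 -= lr * d2.mean(axis=0)
        self.W1 -= lr * (X.T @ d1) / n
        self.b1 -= lr * d1.mean(axis=0)
        # Mean squared-error loss, evaluated before the weight update.
        return 0.5 * np.mean(np.sum(err ** 2, axis=1))

# Toy two-class problem (illustrative only) with two output nodes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0, 1], [1, 0], [1, 0], [1, 0]], dtype=float)
net = TinyMLP(n_in=2, n_hidden=3, n_out=2)
losses = [net.train_step(X, T) for _ in range(500)]
print(losses[0], losses[-1])
```

The deltas use the sigmoid derivative y(1 - y), which is why a differentiable nonlinear activation function is needed for this training rule.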
4. MLP neural network: Detection of the P-wave arrival
Several characteristic functions of the seismogram can be used as the input of the neural network, such as the absolute value function, the square function, Allen's function, the envelope function, and the modified differential function. Following Dai's method (Dai et al., 1995, 1997), the absolute values of the seismogram are chosen as the input of the MLP neural network, since they have the highest fidelity and processing speed and are the most objective among these functions. The seismogram itself is not used because the first motion of an arrival has two possible directions (up and down) and is source dependent.
Thirty samples of the absolute values of the seismogram are fed into the neural network. The input samples are normalized because the amplitude of the seismogram depends strongly on the magnitude and epicentral distance of an earthquake. With this normalization, a small set of training data can cover recordings with different amplitudes. For a P-wave segment, the arrival is located at the 20th sample. The noise segment is extracted from the part of the seismogram preceding the P-wave arrival. The part before the onset is made longer than the part after it in order to achieve a better distinction between the signal patterns and the noise patterns. Figure 4 shows a P-wave segment and a noise segment. The two output nodes of the neural network flag the input segment with (1, 0) for P arrivals and (0, 1) for background noise.
It is very important to select appropriate training sets. The training sets should represent the typical features of signals with different frequency characteristics. A rule of thumb is to begin with a very small training set and add new patterns until the performance is satisfactory. For the PASO array data, 9 pairs of P-wave arrival and noise segments are chosen to train the MLP network (Figure 5).
For the input vectors, the MLP neural network creates a decision boundary in the input space, making it possible to recognize patterns. Any given decision boundary can be closely approximated by a two-layer network (one hidden layer and one output layer) with a sigmoid activation function. For this reason, only one hidden layer is used in configuring the MLP neural network.
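The construction of one training pair can be sketched as follows. The 30-sample length, the onset at the 20th sample, and the (1, 0) / (0, 1) targets come from the text; the normalization by the segment maximum, the offset of the noise segment, and the function names are assumptions made for this example, since the report does not specify them here.

```python
import numpy as np

N_INPUT = 30     # samples per input vector (from the text)
ONSET_POS = 20   # 1-based position of the arrival within a P-wave segment

def make_segment(trace, start):
    """Absolute values of 30 samples, scaled to a maximum of 1."""
    seg = np.abs(np.asarray(trace[start:start + N_INPUT], dtype=float))
    peak = seg.max()
    # Assumption: normalize by the segment maximum.
    return seg / peak if peak > 0 else seg

def p_and_noise_pair(trace, onset, noise_gap=N_INPUT):
    """Build one (P-wave, noise) pair of inputs with their target vectors.

    onset is the 0-based index of the picked arrival. noise_gap is a
    hypothetical parameter: how far before the P segment the noise
    segment starts.
    """
    p_start = onset - (ONSET_POS - 1)       # puts the onset at sample 20
    noise_start = p_start - noise_gap
    if noise_start < 0 or p_start + N_INPUT > len(trace):
        raise ValueError("trace too short around the onset")
    X = np.stack([make_segment(trace, p_start),
                  make_segment(trace, noise_start)])
    T = np.array([[1.0, 0.0],    # P arrival
                  [0.0, 1.0]])   # background noise
    return X, T

# Synthetic trace (illustrative only): low-level noise, arrival at index 60.
signs = np.where(np.arange(100) % 2 == 0, 1.0, -1.0)
trace = 0.01 * signs
trace[60:] = 1.0 * signs[60:]
X, T = p_and_noise_pair(trace, onset=60)
print(X.shape, T.shape)
```

Taking absolute values before scaling makes the input independent of the polarity of the first motion, and the scaling removes the dependence on absolute amplitude.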
Figure 4. P-wave arrival and noise segments