Investigation of Randomly Modulated Periodicity in Musical Instruments
Shlomo Dubnov
Music Department, University of California San Diego, La Jolla, CA 92093-0326, USA
Melvin J. Hinich
Applied Research Laboratories, The University of Texas at Austin, Austin, TX 78713-8029, USA
Abbreviated Title: Random Modulated Periodicity
Received
Abstract
Acoustical musical instruments, which are considered to produce a well-defined pitch, emit waveforms that are never exactly periodic. A periodic signal can be perfectly predicted far into the future and is therefore deterministic. In nature, and specifically in the sustained portion of musical sounds, there is always some variation in the waveform over time. Thus, signals that are labeled as periodic are not truly deterministic. In this paper we give a formal definition of such a varying periodic signal by means of a modulation coherence function. This measure characterizes the amount of random variation in each Fourier component and captures its statistical properties. The estimation is done in a period- or pitch-synchronous manner and captures even the smallest deviations away from periodicity, with only mild assumptions on the nature of the random modulating noise. This modulation coherence function is very different from the coherence function between two stationary signals, which measures second-order statistical/spectral similarity between signals. It is also different from the non-linear phase coupling measures that were previously applied to musical sounds, which depend on the interaction between several harmonic Fourier components using higher-order statistics. The method is applied to digitized recordings of acoustic signals from several musical instruments.
PACS 43: 60.Cg, 75.De, 75.Ef, 75.Fg
I. Introduction
This paper investigates fluctuations away from perfect periodicity in pitched acoustic instruments during the sustained portion of their sounds. Acoustical musical instruments, which are considered to produce a well-defined pitch, emit waveforms that are never exactly periodic (Beauchamp 1974, McIntyre et al. 1981, Schumacher 1992, Dubnov and Rodet 2003). The paper focuses quantitatively on one way of describing those fluctuations that has not been quantitatively addressed so far, namely randomly modulated periodicity (Hinich 2000, Hinich and Wild 2001). This type of random modulation is encountered in signals that are labeled as periodic but exhibit variation in the waveform over time and are therefore not truly deterministic. A randomly modulated periodic signal is created by some mechanism that has a more or less stable inherent periodicity with random deviations around the mean periodic value. For example, voiced speech is randomly modulated since the oscillating vocal cords vary slowly in amplitude and phase over several pitch periods in a seemingly random fashion. Other examples include sonar reflections pinging on a target, rotating machinery (Barker et al., 1994), and so on.
In this work we investigate instrumental sounds that have a well-defined pitch during a sustained portion of their sound. Although we are dealing with sustained portions of instrumental sounds, it is important to state that these sounds are not in the "steady state" that would be produced by an artificial blowing or bowing machine; they are played by a human player, with all the attendant vibrato, amplitude and pitch variability. For instance, both the flute and the cello are normally played with significant vibrato at around 6 Hz, while the trumpet is normally played with no vibrato. In the case of the cello, one must also distinguish between natural playing of stopped and open strings. A note played on an open string contains only a small pitch variation due to possible variations in the force applied to the bow. A flute vibrato generally adds only a small pitch variation, and generally has a large and uncorrelated variation in the amplitudes of the upper partials rather than a large variation in the amplitude of the fundamental. In stopped-string bowing, the sounds have both a significant pitch variation (a few percent) over all partials and large amplitude variations among the partials because of body resonance (Fletcher and Rossing, 1995).
Recently a method for evaluating the degree of phase-synchronous vs. asynchronous deviations among harmonics of musical instruments in sustained portions of their sounds was proposed (Dubnov and Rodet, 2003). It is based on estimating the degree of phase coupling among groups of harmonically related partials and is closely related to the evaluation of bi-coherence (using Higher Order Spectral (HOS) analysis). The bi-coherence method differs from the coherence method of the current paper in several respects: first, the bi-coherence function depends on the interaction between phases of different partials, while the coherence measure is a local property of every partial. Moreover, phase coupling measures deviations between phases of sinusoidal components, while coherence captures random modulations that may contain both phase and amplitude deviations.
We use the term “modulation coherence” to denote this new measure of signal deviation from periodicity. It measures the deviations, in the frequency domain, of the signal spectral components relative to a mean signal that has perfectly coherent (constant) spectral components with no amplitude or phase deviations between periods. We use the term “coherence” in analogy with its use in physics, as in “coherent light”: a signal of zero bandwidth with no deviations from a single frequency (monochromatic).
One of the contributions of this paper is a derivation of a theoretical estimate for the amount of decay in modulation coherence due to vibrato (mathematical details are provided in the Appendix). It might be expected that a signal containing quasi-periodic frequency fluctuations would have little modulation coherence, since it does not have a well-defined period and accordingly no averaging period or mean signal could be determined. Our analysis shows that if the vibrato is modeled as a (random) frequency modulation, then for a vibrato depth of the order of a semitone or less (typical of musical instruments), the decay in modulation coherence is actually very small. This finding is interesting when considering the experimental modulation coherence results for instruments with vibrato. For instance, comparing open and stopped notes on a cello (i.e. without and with vibrato), we conclude that the large reduction in modulation coherence in the latter case cannot be attributed to the frequency-modulation aspect of the vibrato.
The experimental analyses in the paper are performed on a set of sounds similar to those used in (Dubnov and Rodet, 2003) (specifically, the cello, flute and trumpet sounds are the same recordings). The experiments include investigation of both stopped- and open-string cello sounds, and normal playing for wind instruments containing various amounts of vibrato, with the flute having a significant vibrato and the trumpet and French horn having none. These samples were taken from the McGill University music sound database (McGill University Master Samples).
II. The Model
A varying periodic signal with a randomly modulated periodicity is defined as follows:
Definition: A signal $x(t)$ is called a randomly modulated periodicity with period T if it is of the form

x(t) = \sum_{k=0}^{K/2} \left[ \left(a_k + u_{1k}(t)\right)\cos(2\pi f_k t) + \left(b_k + u_{2k}(t)\right)\sin(2\pi f_k t) \right], \qquad f_k = k/T    (1.1)

where $E\,u_{1k}(t) = 0$ and $E\,u_{2k}(t) = 0$ for each $k$, and $E$ is the expectation operation. The $K/2+1$ pairs $\{u_{1k}(t), u_{2k}(t)\}$ are jointly dependent random processes that represent the random modulation. This signal can be written as $x(t) = s(t) + u(t)$, where

s(t) = \sum_{k=0}^{K/2} \left[ a_k \cos(2\pi f_k t) + b_k \sin(2\pi f_k t) \right] \quad\text{and}\quad u(t) = \sum_{k=0}^{K/2} \left[ u_{1k}(t)\cos(2\pi f_k t) + u_{2k}(t)\sin(2\pi f_k t) \right].    (1.2)

The periodic component $s(t)$ is the mean of $x(t)$. The zero-mean stochastic term $u(t)$ is a real-valued non-stationary process.
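As a purely illustrative sketch of definition (1.1), the following Python fragment synthesizes one frame of such a signal (the frame length, number of partials, mean coefficients and modulation depth are arbitrary choices, and the modulations are drawn independently per sample only for simplicity, whereas the model allows arbitrary joint dependence):

import numpy as np

def rmp_frame(T=256, K=8, mod_std=0.05, seed=0):
    """Synthesize one frame of a randomly modulated periodicity, eq. (1.1).
    a_k, b_k are fixed (mean) Fourier coefficients; u1, u2 are zero-mean
    random modulations."""
    rng = np.random.default_rng(seed)
    t = np.arange(T)
    x = np.zeros(T)
    for k in range(1, K + 1):
        a_k, b_k = 1.0 / k, 0.0                  # illustrative mean coefficients
        u1 = mod_std * rng.standard_normal(T)    # u_{1k}(t), zero mean
        u2 = mod_std * rng.standard_normal(T)    # u_{2k}(t), zero mean
        f_k = k / T
        x += (a_k + u1) * np.cos(2 * np.pi * f_k * t) \
             + (b_k + u2) * np.sin(2 * np.pi * f_k * t)
    return x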
A common approach in processing signals with a periodic structure is to segment the observations into frames of length T, so that each sampling frame contains exactly an integer number of periods. The term sampling frame, or simply frame, is used in this paper to match the terminology used in the speech and audio processing literature. The waveform in frame m is slightly different from that in frame m+1 due to variation in the stochastic signal. To further simplify notation, let us set the time origin at the start of the first frame. Then the start of the m-th frame is $\tau_m = (m-1)T$, where m = 1, ..., M. The variation of the waveform from frame to frame is determined by a probability mechanism described by the joint distribution of the modulation processes $\{u_{1k}(t), u_{2k}(t)\}$.
Now that the concept of a randomly modulated periodicity has been defined, the next step is to develop a measure of the amount of random variation present in each Fourier component of a signal. Such a measure, called a modulation coherence function, is presented in the next section. It is important to note that the definition of the signal in (1.1) implicitly assumes that the signal period corresponds to an integer number of samples T, and accordingly that the frequencies f_k are integer multiples of the fundamental 1/T. Since, at this point of the discussion, we are free to specify any sampling frequency, one could in principle sample any periodic analog signal so that it is also discrete periodic. The implication of this choice of sampling frequency is that the spectral analysis involved in estimating the modulation coherence function (i.e. the DFT operation performed below) does not need to employ windowing or frequency interpolation techniques in order to obtain additional spectral values “in between” the DFT bins. In practice, the signal sampling frequency is chosen a priori, independently of the signal period, a situation that does require additional methods for improving the spectral analysis. This is addressed in the section on estimating the coherence function immediately following the next section. For clarity of presentation we shall first define the modulation coherence function assuming that the sampling of the signal and the signal periodicity indeed correspond to each other (i.e. the signal is discrete periodic).
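As a purely illustrative example of this requirement, a 220 Hz tone sampled at 44,100 Hz spans 44,100/220 ≈ 200.45 samples per period, so no frame of integer length contains a whole number of periods exactly; resampling the same tone to 44,000 Hz gives exactly 200 samples per period, and every harmonic f_k then falls exactly on a DFT bin of a frame whose length is a multiple of 200 samples.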
Modulation coherence
The m-th frame of the signal is $x(t + \tau_m)$, $t = 0, \ldots, T-1$. Its discrete Fourier transform (DFT) at frequency $f_r = r/T$ for each r = 1, ..., T/2 is

X_m(r) = \sum_{t=0}^{T-1} x(t + \tau_m)\,\exp(-i 2\pi r t / T) = A(r) + U_m(r),    (2.1)

where $A(r)$ is the DFT of the periodic mean $s(t)$ and $U_m(r)$ is the DFT of the modulation component $u(t + \tau_m)$.
Essentially, the above result says that the DFT of a randomly modulated periodic signal can be split into the mean spectral component and the contribution of the modulation component at that frequency. Although initially this may seem trivial, there are a couple of points to consider here: One is that this is a first step in preparing the estimator and defining the modulation coherence. The second is more significant, and it shows that periodic modulation, which is considered here as an inherent property of the signal and not as an added noise, behaves in the frequency domain as an additive spectral component, i.e. surplus energy and possibly phase shift in addition to the spectral components of the mean signal. Mathematically, of course, this is a manifestation of the linearity of the DFT, but it is considered here in a stochastic context, i.e. the added spectral component is a random spectral deviation and some statistics need to be extracted from it in order to use it as a signal characteristic.
To simplify the notation, the index m is not used to subscript the complex-valued random variables X(r) and U(r). The variability of the complex Fourier amplitude X(r) about its mean A(r) is $\sigma_u^2(r) = E|U(r)|^2$, independent of m due to stationarity. If $A(r) \neq 0$ and $\sigma_u^2(r) = 0$, then that complex amplitude is a true periodicity. The larger the value of $\sigma_u^2(r)$, the greater is the variability of that component from frame to frame. If $A(r) = 0$ and $\sigma_u^2(r) > 0$, then that component does not contribute to the periodicity.
In order to quantify the variability, consider the function $\gamma_x(r)$, called a modulation coherence function, defined as follows for each r = 1, ..., T/2:

\gamma_x(r) = \frac{|A(r)|}{\sqrt{|A(r)|^2 + \sigma_u^2(r)}}    (2.2)

If $\sigma_u^2(r) = 0$ then $\gamma_x(r) = 1$. This is the case where the frequency component has a constant amplitude and phase. If $A(r) = 0$ then $\gamma_x(r) = 0$. This is the case where the mean value of the frequency component is zero, which is true for each frequency component of any stationary random process with finite energy.
A high coherence value can be due either to a large amplitude $|A(r)|$ relative to the standard deviation $\sigma_u(r)$, or to a small standard deviation relative to the amplitude. The signal coherence value at each harmonic is dimensionless and is a function neither of the energy in the band nor of the amplitude of the partial.
One should note that this modulation coherence function is very different from the coherence function between two stationary signals (p. 352, Jenkins and Watts, 1968). The coherence (sometimes called coherency) between two stationary signals $x(t)$ and $y(t)$ at frequency $f$ is the correlation between their Fourier components $X(f)$ and $Y(f)$. The closer the coherence value is to one, the higher the correlation between the real and imaginary parts of both Fourier components (Carter, Knapp, and Nuttall, 1973). The modulation coherence function, in contrast, is defined for one signal[1]. It measures the variability of X(r) about its mean A(r). One should keep in mind that the signal in this representation is the mean of the observed signal.
In the signal-plus-modulation-noise representation $x(t) = s(t) + u(t)$, the signal-to-modulation-noise ratio (SMNR) for frequency $f_r$ is $\mathrm{SMNR}(r) = |A(r)|^2 / \sigma_u^2(r)$. Thus $\gamma_x(r) = \sqrt{\mathrm{SMNR}(r)/(1 + \mathrm{SMNR}(r))}$ is a monotonically increasing function of the SMNR. Inverting this relationship, it follows that

\mathrm{SMNR}(r) = \frac{\gamma_x^2(r)}{1 - \gamma_x^2(r)}.    (2.3)
A modulation coherence value of 0.44 yields a SMNR of 0.24 which is –6.2 dB.
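For completeness, the algebra behind (2.3) and the value just quoted follow directly from the definition (2.2):

\gamma_x^2(r) = \frac{|A(r)|^2}{|A(r)|^2 + \sigma_u^2(r)} = \frac{\mathrm{SMNR}(r)}{1 + \mathrm{SMNR}(r)} \quad\Longrightarrow\quad \mathrm{SMNR}(r) = \frac{\gamma_x^2(r)}{1 - \gamma_x^2(r)},

so that $\gamma_x(r) = 0.44$ gives $\mathrm{SMNR}(r) = 0.1936/0.8064 \approx 0.24$, and $10\log_{10}(0.24) \approx -6.2$ dB.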
The measure is not shift invariant, in the sense that it needs to be “synchronized” to the pitch. As will be discussed in the next section, the frame size is chosen in practice to include multiple periods. The frame size defines the resolution bandwidth: the larger the frames, the better the frequency resolution, but with the tradeoff of less averaging (fewer frames for a given signal duration) and accordingly noisier estimates.
Estimating the Modulation Coherence Function
As mentioned earlier, in practice the sampling frequency and the signal period will most likely not correspond. This situation violates the model of (1.1) and requires some changes to the modulation coherence function in (2.2). One simple solution is to assume that the sampling frequency is sufficiently high compared to the signal period. Another solution is to use multiple periods per frame, possibly with zero padding or other spectral interpolation methods, to estimate the signal spectrum at frequencies that do not correspond precisely to the DFT frequencies.
We shall address these problems in two stages. First, we present a simple method for finding the fundamental frequency. Then, we use a large frame size (a frame that contains multiple periods instead of a single period) for the estimation of the mean signal, and include zero padding for the estimation of the spectrum of the remaining difference signal.
Finding the Fundamental Frequency
It is important to know the fundamental frequency of the periodic component in order to obtain the correct frame length for the DFT analysis and averaging of the signal. If the fundamental is unknown, it must be estimated from the signal. There are many algorithms in the literature that might be used for pitch or fundamental frequency detection. Below we describe the method for determining the fundamental that was used in our program.
To find the fundamental of a sound we subtract the mean (i.e. the DC value) of the signal from each data point $x(t_n)$, where $t_n = n\Delta$ and $\Delta$ is the sampling interval. In our case it is important to find the exact value of the fundamental frequency to a precision that might be higher than the DFT resolution 1/T in equation (2.1). For this purpose we resample the signal to a higher sampling frequency and then compute the discrete Fourier transform using a frame containing a multiple of the fundamental period instead of a single period, which also stabilizes the average frame in terms of the amplitude, phase and frequency fluctuations of the instrument. The coherence function is estimated from the mean and the variance of the DFT, as explained below, and the process is iterated by manually adjusting the analysis frame size (and changing the DFT analysis frequency accordingly) so as to maximize the resulting coherence values. The maximally coherent results are reported in the following graphs. It should be noted that additional zero padding is not required, since once a matching signal period and DFT analysis frequency are found, the analysis frequency is exact.
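A minimal sketch of this preprocessing step, assuming numpy and scipy are available (the function name, the upsampling factor and the use of scipy.signal.resample are illustrative choices, not the authors' actual program):

import numpy as np
from scipy.signal import resample

def prepare_signal(x, upsample_factor=4):
    """Subtract the mean (DC value) and resample to a higher rate, so that a
    frame holding several periods can match the true period more precisely
    than the original sampling grid allows."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                               # remove the DC value
    return resample(x, upsample_factor * len(x))   # band-limited resampling

# The frame length (in upsampled samples) is then adjusted by hand, and the
# coherence re-estimated (next subsection), until the coherence values are maximized.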
Mean signal, modulation variance and modulation coherence function estimates
Suppose that we have observed M frames, each of length T, of the process $x(t)$ as denoted at the beginning of Section 2. Recall that the m-th frame is $x(t + \tau_m)$ for each m = 1, ..., M. The sample mean for each t = 0, ..., T-1,

\hat{s}(t) = \frac{1}{M} \sum_{m=1}^{M} x(t + \tau_m),    (3.1)

is an unbiased estimator of the "signal" $s(t)$.
Let $\hat{A}(r)$ denote the r-th component of the DFT of $\hat{s}(t)$. We define the residual in frame m as

\hat{u}_m(t) = x(t + \tau_m) - \hat{s}(t),    (3.2)

and let $Y_m(r)$ denote the r-th DFT component of $\hat{u}_m(t)$. The estimator of the modulation variance is defined as

\hat{\sigma}_u^2(r) = \frac{1}{M} \sum_{m=1}^{M} |Y_m(r)|^2.    (3.3)

The modulation coherence statistic is then defined by

\hat{\gamma}_x(r) = \frac{|\hat{A}(r)|}{\sqrt{|\hat{A}(r)|^2 + \hat{\sigma}_u^2(r)}}.    (3.4)

It can be shown (Hinich 2000) that $\hat{\gamma}_x(r)$ is a consistent estimator of $\gamma_x(r)$ for frequency $f_r$, with an error of order $O(1/\sqrt{M})$. The expression $\hat{\gamma}_x^2(r)/(1 - \hat{\gamma}_x^2(r))$ can be used as an estimator of the signal-to-modulation-noise ratio for frequency $f_r$.
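As a concrete illustration, the estimator (3.1)-(3.4) amounts to a few lines of array arithmetic. The following Python sketch is a minimal version under simplifying assumptions (the signal is assumed to be already trimmed and pitch-synchronous, i.e. to contain an integer number M of frames of length T with an integer number of periods per frame; the function and variable names are our own):

import numpy as np

def modulation_coherence(x, T):
    """Estimate the modulation coherence function, eqs. (3.1)-(3.4).
    x : 1-D array containing M = len(x)//T frames of length T.
    Returns gamma[r] for r = 0, ..., T//2."""
    x = np.asarray(x, dtype=float)
    M = len(x) // T
    frames = x[:M * T].reshape(M, T)             # frame m is x(t + tau_m)
    s_hat = frames.mean(axis=0)                  # eq. (3.1): sample-mean signal
    A_hat = np.fft.rfft(s_hat)                   # \hat{A}(r), DFT of the mean signal
    Y = np.fft.rfft(frames - s_hat, axis=1)      # eq. (3.2): residual DFTs Y_m(r)
    var_u = np.mean(np.abs(Y) ** 2, axis=0)      # eq. (3.3): modulation variance
    return np.abs(A_hat) / np.sqrt(np.abs(A_hat) ** 2 + var_u)   # eq. (3.4)

The corresponding SMNR estimate then follows from (2.3) as gamma**2 / (1 - gamma**2).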
Example: Coherent versus modulation-only signal components
In order to better explain the difference between modulation coherence estimation and other, more standard spectral estimation methods, we consider a signal comprising a single sinusoid at a frequency $f_0$ and a band-limited noise-only component at the first harmonic frequency $2f_0$. The signal can be written as

x(t) = a\cos(2\pi f_0 t) + n(t)\cos(2\pi \cdot 2 f_0 t),    (4.1)

where $n(t)$ is a zero-mean band-limited noise process. Note that this signal has energy at two frequencies: the component at frequency $f_0$ has $\sigma_u^2(f_0) = 0$ for all times, which results in a modulation coherence of one, and the second component at frequency $2f_0$ has $A(2f_0) = 0$, resulting in a modulation coherence of zero. It should be noted that the bandwidth of the noise component is not specified in the definition of modulation coherence, since both the definition and the analysis are asymptotic. From the point of view of spectral analysis, the second component on the right-hand side of equation (4.1) is a heterodyning of the signal $n(t)$, which centers the energy of the noise on frequency $2f_0$, with a bandwidth that equals that of $n(t)$.
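A quick numerical check of this example can be made with the modulation_coherence sketch from the previous section (all numbers below are arbitrary illustrative choices: frame length T = 200 samples, M = 100 frames, the sinusoid placed exactly on DFT bin 5, and a crude moving-average process standing in for the band-limited noise n(t)):

import numpy as np

rng = np.random.default_rng(1)
T, M = 200, 100                       # frame length and number of frames
t = np.arange(M * T)
f0 = 5.0 / T                          # sinusoid exactly on DFT bin r = 5
n = np.convolve(rng.standard_normal(M * T),
                np.ones(64) / 64.0, mode="same")               # band-limited noise n(t)
x = np.cos(2 * np.pi * f0 * t) + n * np.cos(2 * np.pi * (2 * f0) * t)   # eq. (4.1)

gamma = modulation_coherence(x, T)
print(gamma[5], gamma[10])            # close to 1 at f0, close to 0 at 2*f0
                                      # (up to an estimation error of order 1/sqrt(M))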