Aalborg Universitet

On Optimal Filter Designs for Fundamental Frequency Estimation
Christensen, Mads Græsbøll; Jensen, Jesper Højvang; Jakobsson, Andreas; Jensen, Søren Holdt

Published in: IEEE Signal Processing Letters

DOI (link to publication from Publisher): 10.1109/LSP.2008.2003987

Publication date: 2008

Document Version: Accepted author manuscript, peer reviewed version

Link to publication from Aalborg University

Citation for published version (APA): Christensen, M. G., Jensen, J. H., Jakobsson, A., & Jensen, S. H. (2008). On Optimal Filter Designs for Fundamental Frequency Estimation. IEEE Signal Processing Letters, 15, 745-748. https://doi.org/10.1109/LSP.2008.2003987


On Optimal Filter Designs for Fundamental Frequency Estimation

Mads Græsbøll Christensen∗, Jesper Højvang Jensen, Andreas Jakobsson, and Søren Holdt Jensen

Abstract— Recently, we proposed using Capon’s minimum variance principle to find the fundamental frequency of a periodic waveform. The resulting estimator is formed such that it maximizes the output power of a bank of filters. We present an alternative optimal single-filter design, and then proceed to quantify the similarities and differences between the estimators using asymptotic analysis and Monte Carlo simulations. Our analysis shows that the single filter can be expressed in terms of the optimal filterbank, and that the methods are asymptotically equivalent but generally different for finite-length signals.

M. G. Christensen is supported by the Parametric Audio Processing project, Danish Research Council for Technology and Production Sciences grant no. 274–06–0521. J. H. Jensen is supported by the Intelligent Sound project, Danish Technical Research Council grant no. 26–04–0092. M. G. Christensen, J. H. Jensen, and S. H. Jensen are with the Department of Electronic Systems, Aalborg University, Denmark. A. Jakobsson is with the Dept. of Mathematical Statistics, Lund University, SE 221 00 Lund, Sweden.

I. INTRODUCTION

Bandlimited periodic waveforms can be decomposed into a finite set of sinusoids having frequencies that are integer multiples of a so-called fundamental frequency. Much research has been devoted to the problem of finding the fundamental frequency, and rightfully so. It is an important problem in many applications in, for example, speech and audio processing, and the problem has become no less relevant with the many interesting new applications in music information retrieval. The fundamental frequency estimation problem can be mathematically defined as follows: a signal consisting of a set of sinusoids that are harmonically related through the fundamental frequency $\omega_0$ is corrupted by additive white complex circularly symmetric Gaussian noise, $w(n)$, having variance $\sigma^2$, for $n = 0, \ldots, N-1$, i.e.,

$$x(n) = \sum_{l=1}^{L} \alpha_l e^{j\omega_0 l n} + w(n), \qquad (1)$$

where $\alpha_l = A_l e^{j\psi_l}$, with $A_l > 0$ and $\psi_l$ being the amplitude and the phase of the $l$th harmonic, respectively. The problem of interest is to estimate the fundamental frequency $\omega_0$ from a set of $N$ measured samples $x(n)$. Some representative examples of the various types of methods that are commonly used for fundamental frequency estimation are: linear prediction [1], correlation [2], subspace methods [3], frequency fitting [4], maximum likelihood (e.g., [5]), Bayesian estimation [6], and comb filtering [7]. The basic idea of the comb filtering approach is that when the teeth of the comb filter coincide with the frequencies of the individual harmonics, the output
power of the filter is maximized. This idea is conceptually related to our approach derived in [5]; however, here we design optimal signal-adaptive filters, reminiscent of beamformers for coherent signals (e.g., [8]), for the estimation of the fundamental frequency. In particular, we consider two fundamental frequency estimators based on the well-known minimum variance principle [9]. The two estimators are based on different filter design formulations, one using a bank of filters and the other a single filter. The first of these estimators was recently proposed in [5], while the second one is novel. We compare the estimators, analyze their asymptotic properties, and investigate and compare their finite-length performance in Monte Carlo simulations. For simplicity, we consider here only the single-pitch estimation problem, but the presented methods can easily be applied to multi-pitch estimation as well (see [5]).

The remainder of this paper is organized as follows. First, we introduce the two filter designs and the associated estimators in Section II. Then, we analyze and compare the estimators and their asymptotic properties in Section III. Their finite-length performance is investigated in Section IV, before we conclude on our work in Section V.

II. OPTIMAL FILTER DESIGNS

A. Filterbank Approach

We begin by introducing some useful notation and definitions, and by reviewing the fundamental frequency estimator proposed in [5]. First, we construct a vector from $M$ consecutive samples of the observed signal, i.e., $\mathbf{x}(n) = [\, x(n) \;\; x(n-1) \;\; \cdots \;\; x(n-M+1) \,]^T$ with $M \le N$ and with $(\cdot)^T$ denoting the transpose. Next, we introduce the output signal $y_l(n)$ of the $l$th filter having coefficients $h_l(n)$ as $y_l(n) = \sum_{m=0}^{M-1} h_l(m)\, x(n-m) = \mathbf{h}_l^H \mathbf{x}(n)$, with $(\cdot)^H$ denoting the Hermitian transpose and $\mathbf{h}_l = [\, h_l(0) \;\; \cdots \;\; h_l(M-1) \,]^H$.
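As an illustration of the signal model (1) and of the identity $y_l(n) = \mathbf{h}_l^H\mathbf{x}(n)$, the following minimal Python/NumPy sketch draws samples from (1) and checks that the convolution sum and the inner-product form give the same output. The helper names `harmonic_signal` and `snapshot`, the random seed, and all parameter values are hypothetical choices made for this example only.

```python
import numpy as np

def harmonic_signal(omega0, amps, phases, sigma2, N, rng):
    """Draw x(0), ..., x(N-1) from model (1) with alpha_l = A_l e^{j psi_l} (hypothetical helper)."""
    n = np.arange(N)
    l = np.arange(1, len(amps) + 1)
    alpha = np.asarray(amps) * np.exp(1j * np.asarray(phases))
    clean = np.exp(1j * omega0 * np.outer(n, l)) @ alpha      # sum over the L harmonics
    # white, circularly symmetric complex Gaussian noise with variance sigma2
    w = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return clean + w

def snapshot(x, n, M):
    """The vector x(n) = [x(n), x(n-1), ..., x(n-M+1)]^T (requires n >= M-1)."""
    return x[n - M + 1 : n + 1][::-1]

rng = np.random.default_rng(0)
N, M, n = 50, 20, 30
x = harmonic_signal(0.6364, [1.0, 1.0, 1.0], rng.uniform(0, 2 * np.pi, 3), 0.1, N, rng)

coeffs = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # arbitrary h_l(0), ..., h_l(M-1)
h_l = coeffs.conj()                                             # h_l = [h_l(0) ... h_l(M-1)]^H
y_conv = sum(coeffs[m] * x[n - m] for m in range(M))            # sum_m h_l(m) x(n - m)
y_inner = np.vdot(h_l, snapshot(x, n, M))                       # h_l^H x(n); vdot conjugates h_l
print(np.isclose(y_conv, y_inner))   # True
```

Storing the conjugated coefficients in $\mathbf{h}_l$ is what lets the filter output be written as a Hermitian inner product.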
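Introducing the expected value $E\{\cdot\}$ and defining the covariance matrix as $\mathbf{R} = E\{\mathbf{x}(n)\mathbf{x}^H(n)\}$, the output power of the $l$th filter can be written as

$$E\{|y_l(n)|^2\} = E\{\mathbf{h}_l^H\mathbf{x}(n)\mathbf{x}^H(n)\mathbf{h}_l\} = \mathbf{h}_l^H\mathbf{R}\mathbf{h}_l. \qquad (2)$$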
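The output-power expression (2) can be checked by Monte Carlo simulation. The sketch below assumes independent phases drawn uniformly on $[0, 2\pi)$, in which case the cross-terms between harmonics average out and $\mathbf{R} = \sum_{l=1}^{L} A_l^2\,\mathbf{z}(\omega_0 l)\mathbf{z}^H(\omega_0 l) + \sigma^2\mathbf{I}$, with $\mathbf{z}(\omega)$ the complex sinusoidal vector used in (4). This closed form, the names `zvec` and `model_covariance`, and the parameter values are assumptions of the example. A sample average of $|y_l(n)|^2$ should then approach $\mathbf{h}_l^H\mathbf{R}\mathbf{h}_l$.

```python
import numpy as np

def zvec(omega, M):
    """z(omega) = [1, e^{-j omega}, ..., e^{-j omega (M-1)}]^T."""
    return np.exp(-1j * omega * np.arange(M))

def model_covariance(omega0, amps, sigma2, M):
    """R for model (1), assuming independent phases uniform on [0, 2 pi)."""
    R = sigma2 * np.eye(M, dtype=complex)
    for l, A in enumerate(amps, start=1):
        z = zvec(omega0 * l, M)
        R += A**2 * np.outer(z, z.conj())
    return R

rng = np.random.default_rng(1)
M, K = 10, 50_000                        # filter length, Monte Carlo realizations
omega0, amps, sigma2 = 0.6364, np.array([1.0, 1.0, 1.0]), 0.5
L = len(amps)
Z = np.stack([zvec(omega0 * l, M) for l in range(1, L + 1)], axis=1)

# K independent draws of x(n): fresh random phases and noise in each column
alpha = amps[:, None] * np.exp(1j * rng.uniform(0, 2 * np.pi, (L, K)))
W = np.sqrt(sigma2 / 2) * (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K)))
X = Z @ alpha + W

h = rng.standard_normal(M) + 1j * rng.standard_normal(M)     # an arbitrary filter h_l
y = h.conj() @ X                                             # y_l(n) = h_l^H x(n), per draw
empirical = np.mean(np.abs(y) ** 2)                          # estimate of E{|y_l(n)|^2}
analytic = np.real(h.conj() @ model_covariance(omega0, amps, sigma2, M) @ h)   # h_l^H R h_l
print(empirical / analytic)   # close to 1
```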

The total output power of all the filters is $\sum_{l=1}^{L} E\{|y_l(n)|^2\} = \sum_{l=1}^{L} \mathbf{h}_l^H\mathbf{R}\mathbf{h}_l$. Defining a matrix $\mathbf{H}$ consisting of the filters $\mathbf{h}_l$ as $\mathbf{H} = [\, \mathbf{h}_1 \;\; \cdots \;\; \mathbf{h}_L \,]$, we can write the total output power as the sum of the powers of the subband signals, i.e., $\sum_{l=1}^{L} E\{|y_l(n)|^2\} = \mathrm{Tr}(\mathbf{H}^H\mathbf{R}\mathbf{H})$. The filter design problem can now be stated. We seek a set of filters that pass power undistorted at specific frequencies, here the harmonic frequencies, while minimizing the power at all other frequencies. This problem can be formulated mathematically as the optimization problem

$$\min_{\mathbf{H}} \; \mathrm{Tr}(\mathbf{H}^H\mathbf{R}\mathbf{H}) \quad \text{s.t.} \quad \mathbf{H}^H\mathbf{Z} = \mathbf{I}, \qquad (3)$$

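where $\mathbf{I}$ is the $L \times L$ identity matrix. Furthermore, the matrix $\mathbf{Z} \in \mathbb{C}^{M \times L}$ has a Vandermonde structure and is constructed from $L$ complex sinusoidal vectors as

$$\mathbf{Z} = [\, \mathbf{z}(\omega_0) \;\; \cdots \;\; \mathbf{z}(\omega_0 L) \,], \qquad (4)$$

with $\mathbf{z}(\omega) = [\, 1 \;\; e^{-j\omega} \;\; \cdots \;\; e^{-j\omega(M-1)} \,]^T$. In words, the matrix contains the harmonically related complex sinusoids. The filterbank matrix $\mathbf{H}$ solving (3) is given by (see, e.g., [10])

$$\mathbf{H} = \mathbf{R}^{-1}\mathbf{Z}\left(\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z}\right)^{-1}. \qquad (5)$$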
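A quick numerical sanity check of (5): the sketch below builds $\mathbf{H}$ for an arbitrary Hermitian positive definite $\mathbf{R}$, verifies the constraint $\mathbf{H}^H\mathbf{Z} = \mathbf{I}$, and confirms that adding any perturbation $\mathbf{P}$ with $\mathbf{Z}^H\mathbf{P} = \mathbf{0}$ (which keeps the constraint satisfied) increases the output power. The helper names, the randomly generated $\mathbf{R}$, and the values $M = 12$, $L = 3$ are illustrative assumptions.

```python
import numpy as np

def vandermonde(omega0, L, M):
    """Z = [z(omega0) ... z(omega0 L)] from (4); entry (m, l) is e^{-j omega0 l m}."""
    return np.exp(-1j * omega0 * np.outer(np.arange(M), np.arange(1, L + 1)))

def filterbank(R, Z):
    """H = R^{-1} Z (Z^H R^{-1} Z)^{-1} from (5), using solves instead of an explicit R^{-1}."""
    RinvZ = np.linalg.solve(R, Z)
    return RinvZ @ np.linalg.inv(Z.conj().T @ RinvZ)

rng = np.random.default_rng(2)
M, L = 12, 3
B = rng.standard_normal((M, 2 * M)) + 1j * rng.standard_normal((M, 2 * M))
R = B @ B.conj().T / (2 * M) + 0.1 * np.eye(M)      # a generic Hermitian positive definite R
Z = vandermonde(0.6364, L, M)
H = filterbank(R, Z)

def output_power(G):
    """Tr(G^H R G), the objective of (3)."""
    return np.real(np.trace(G.conj().T @ R @ G))

# a perturbation P with Z^H P = 0 keeps the bank feasible: (H + P)^H Z = I
G = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
P = G - Z @ np.linalg.solve(Z.conj().T @ Z, Z.conj().T @ G)

print(np.allclose(H.conj().T @ Z, np.eye(L)))          # True: constraint of (3) holds
print(np.allclose((H + P).conj().T @ Z, np.eye(L)))    # True: perturbed bank is feasible
print(output_power(H + P) > output_power(H))           # True: but it passes more power
```

Solving linear systems rather than forming $\mathbf{R}^{-1}$ explicitly is a standard numerical choice, not something prescribed by the text.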

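This data- and frequency-dependent filterbank can then be used to estimate the fundamental frequency by maximizing the power of the filters' output. Substituting (5) into the cost function in (3) gives $\mathrm{Tr}(\mathbf{H}^H\mathbf{R}\mathbf{H}) = \mathrm{Tr}[(\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z})^{-1}]$, which yields the estimator

$$\hat{\omega}_0 = \arg\max_{\omega_0} \; \mathrm{Tr}\left[\left(\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z}\right)^{-1}\right], \qquad (6)$$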

which depends only on the covariance matrix and the Vandermonde matrix constructed for different candidate fundamental frequencies.
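The estimator (6) amounts to a one-dimensional search over candidate fundamental frequencies. The following sketch assumes a uniform grid, the sample covariance estimate described in Section IV (formed here from all $N-M+1$ full-length snapshots), and illustrative values $N = 100$, $M = 2N/5$, $L = 3$, $\sigma^2 = 0.1$; it evaluates $\mathrm{Tr}[(\mathbf{Z}^H\hat{\mathbf{R}}^{-1}\mathbf{Z})^{-1}]$ on the grid and picks the maximizer. The grid limits are chosen so that all harmonics stay below $\pi$, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def vandermonde(omega0, L, M):
    """Z = [z(omega0) ... z(omega0 L)] with z(w) = [1, e^{-jw}, ..., e^{-jw(M-1)}]^T."""
    return np.exp(-1j * omega0 * np.outer(np.arange(M), np.arange(1, L + 1)))

def sample_covariance(x, M):
    """Average of x(n) x(n)^H over all N-M+1 full-length snapshots."""
    X = sliding_window_view(x, M)[:, ::-1].T     # column k is [x(n), ..., x(n-M+1)]^T
    return X @ X.conj().T / X.shape[1]

def filterbank_cost(R, omega0, L):
    """Tr[(Z^H R^{-1} Z)^{-1}], the objective maximized in (6)."""
    Z = vandermonde(omega0, L, R.shape[0])
    return np.real(np.trace(np.linalg.inv(Z.conj().T @ np.linalg.solve(R, Z))))

# synthetic data following (1); the parameter values are illustrative choices
rng = np.random.default_rng(4)
N, L, omega0_true, sigma2 = 100, 3, 0.6364, 0.1
M = 2 * N // 5
n = np.arange(N)
x = np.exp(1j * (omega0_true * np.outer(n, np.arange(1, L + 1))
                 + rng.uniform(0, 2 * np.pi, L))).sum(axis=1)
x += np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

R_hat = sample_covariance(x, M)
grid = np.arange(0.3, 1.0, 0.001)    # candidate fundamentals, so that L * omega0 < pi
costs = [filterbank_cost(R_hat, w, L) for w in grid]
omega0_hat = grid[int(np.argmax(costs))]
print(omega0_hat)
```

A finer grid, or a local refinement around the grid maximum, would reduce the quantization of $\hat{\omega}_0$ imposed by the grid step of 0.001.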

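B. An Alternative Approach

We proceed to examine an alternative formulation of the filter design problem and state its optimal solution. Suppose that we wish to design a single filter, $\mathbf{h}$, that passes the signal undistorted at the harmonic frequencies and suppresses everything else. This filter design problem can be stated as

$$\min_{\mathbf{h}} \; \mathbf{h}^H\mathbf{R}\mathbf{h} \quad \text{s.t.} \quad \mathbf{h}^H\mathbf{z}(\omega_0 l) = 1, \qquad (7)$$

for $l = 1, \ldots, L$. It is worth stressing that the single filter in (7) is designed subject to $L$ constraints, whereas in (3) the filterbank is formed using a matrix constraint. Clearly, these two formulations are related; we will return to this relation in detail in the following section. Introducing the Lagrange multipliers $\boldsymbol{\lambda} = [\, \lambda_1 \;\; \cdots \;\; \lambda_L \,]^T$, the Lagrangian associated with the problem stated above can be written as

$$\mathcal{L}(\mathbf{h}, \boldsymbol{\lambda}) = \mathbf{h}^H\mathbf{R}\mathbf{h} - \left(\mathbf{h}^H\mathbf{Z} - \mathbf{1}^T\right)\boldsymbol{\lambda} \qquad (8)$$

with $\mathbf{1} = [\, 1 \;\; \cdots \;\; 1 \,]^T$. Taking the derivative with respect to the unknown filter impulse response $\mathbf{h}$ and the Lagrange multipliers, we get

$$\nabla\mathcal{L}(\mathbf{h}, \boldsymbol{\lambda}) = \begin{bmatrix} \mathbf{R} & -\mathbf{Z} \\ -\mathbf{Z}^H & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{h} \\ \boldsymbol{\lambda} \end{bmatrix} + \begin{bmatrix} \mathbf{0} \\ \mathbf{1} \end{bmatrix}. \qquad (9)$$

By setting this expression equal to zero, i.e., $\nabla\mathcal{L}(\mathbf{h}, \boldsymbol{\lambda}) = \mathbf{0}$, and solving for the unknowns, we obtain the optimal filter as $\mathbf{h} = \mathbf{R}^{-1}\mathbf{Z}\boldsymbol{\lambda}$ and, by inserting this into the constraints $\mathbf{Z}^H\mathbf{h} = \mathbf{1}$, the optimal Lagrange multipliers as $\boldsymbol{\lambda} = (\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z})^{-1}\mathbf{1}$. By combining the last two expressions, we get the optimal filter expressed in terms of the covariance matrix and the Vandermonde matrix $\mathbf{Z}$, i.e.,

$$\mathbf{h} = \mathbf{R}^{-1}\mathbf{Z}\left(\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z}\right)^{-1}\mathbf{1}. \qquad (10)$$

The output power of this filter can then be expressed as

$$\mathbf{h}^H\mathbf{R}\mathbf{h} = \mathbf{1}^H\left(\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z}\right)^{-1}\mathbf{1}, \qquad (11)$$

which, as for the first design, depends only on the inverse of $\mathbf{R}$ and the Vandermonde matrix $\mathbf{Z}$. By maximizing the output power, we readily obtain an estimate of the fundamental frequency as

$$\hat{\omega}_0 = \arg\max_{\omega_0} \; \mathbf{1}^H\left(\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z}\right)^{-1}\mathbf{1}. \qquad (12)$$

III. ANALYSIS

We will now relate the two filter design methods and the associated estimators in (6) and (12). It is not immediately clear whether the two methods are identical or whether there are subtle differences. On the one hand, the optimization problem in (3) allows for more degrees of freedom, since $L$ filters of length $M$ are designed, while (7) involves only a single filter. On the other hand, the former design is based on $L^2$ constraints, as opposed to only $L$ for the latter approach. Comparing the optimal filters in (5) and (10), we observe that the latter can be written in terms of the former as

$$\mathbf{h} = \mathbf{R}^{-1}\mathbf{Z}\left(\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z}\right)^{-1}\mathbf{1} = \mathbf{H}\mathbf{1} = \sum_{l=1}^{L}\mathbf{h}_l, \qquad (13)$$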

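so, clearly, the two methods are related. Using this to rewrite the output power in (11), we get

$$\mathbf{h}^H\mathbf{R}\mathbf{h} = \left(\sum_{l=1}^{L}\mathbf{h}_l\right)^H \mathbf{R} \left(\sum_{m=1}^{L}\mathbf{h}_m\right), \qquad (14)$$

as opposed to $\mathrm{Tr}(\mathbf{H}^H\mathbf{R}\mathbf{H}) = \sum_{l=1}^{L}\mathbf{h}_l^H\mathbf{R}\mathbf{h}_l$ for the filterbank approach. It can be seen that the single-filter approach includes the cross-terms $\mathbf{h}_l^H\mathbf{R}\mathbf{h}_m$ for $l \neq m$, while these do not appear in the filterbank approach. From this it follows that the cost functions are generally different, i.e.,

$$\mathbf{1}^H\left(\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z}\right)^{-1}\mathbf{1} \neq \mathrm{Tr}\left[\left(\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z}\right)^{-1}\right], \qquad (15)$$

$$\mathbf{h}^H\mathbf{R}\mathbf{h} \neq \mathrm{Tr}\left(\mathbf{H}^H\mathbf{R}\mathbf{H}\right). \qquad (16)$$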
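The relations (13)–(16) are easy to confirm numerically. Since $\mathbf{H}^H\mathbf{R}\mathbf{H} = (\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z})^{-1}$, the single-filter cost is the sum of all entries of this $L \times L$ matrix, while the filterbank cost is the sum of its diagonal only. The sketch below assumes a randomly generated Hermitian positive definite $\mathbf{R}$ with $M = 12$ and $L = 3$ (all names and values are hypothetical); it forms both optimal designs, checks that the single filter satisfies the $L$ constraints of (7) and equals $\mathbf{H}\mathbf{1}$, and shows that the gap between the two output powers is exactly the sum of the cross-terms.

```python
import numpy as np

def vandermonde(omega0, L, M):
    """Z = [z(omega0) ... z(omega0 L)] from (4)."""
    return np.exp(-1j * omega0 * np.outer(np.arange(M), np.arange(1, L + 1)))

rng = np.random.default_rng(3)
M, L = 12, 3
B = rng.standard_normal((M, 2 * M)) + 1j * rng.standard_normal((M, 2 * M))
R = B @ B.conj().T / (2 * M) + 0.1 * np.eye(M)      # a generic Hermitian positive definite R
Z = vandermonde(0.6364, L, M)
ones = np.ones(L)

RinvZ = np.linalg.solve(R, Z)
Qinv = np.linalg.inv(Z.conj().T @ RinvZ)            # (Z^H R^{-1} Z)^{-1}
H = RinvZ @ Qinv                                    # filterbank (5)
h = RinvZ @ (Qinv @ ones)                           # single filter (10)

print(np.allclose(h.conj() @ Z, ones))   # True: h^H z(omega0 l) = 1 for every l, as in (7)
print(np.allclose(h, H @ ones))          # True: (13), h is the sum of the columns of H

single = np.real(h.conj() @ R @ h)                  # cost (11): all entries of H^H R H
bank = np.real(np.trace(H.conj().T @ R @ H))        # Tr(H^H R H): diagonal entries only
cross = np.real(Qinv.sum() - np.trace(Qinv))        # sum of h_l^H R h_m over l != m
print(np.isclose(single - bank, cross))  # True: the gap in (16) is exactly the cross-terms
print(single, bank)                      # generally different for finite M, cf. (15)
```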

Consequently, the two filters generally result in different output powers and thus possibly different estimates. Next, we will analyze the asymptotic properties of the cost function

$$\lim_{M\to\infty} M\,\mathbf{1}^H\left(\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z}\right)^{-1}\mathbf{1}. \qquad (17)$$

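In doing so, we will make use of the following result (see, e.g., [11]):

$$\lim_{M\to\infty}(\mathbf{A}\mathbf{B}) = \left(\lim_{M\to\infty}\mathbf{A}\right)\left(\lim_{M\to\infty}\mathbf{B}\right), \qquad (18)$$

where it is assumed that the limits $\lim_{M\to\infty}\mathbf{A}$ and $\lim_{M\to\infty}\mathbf{B}$ exist for the individual elements of $\mathbf{A}$ and $\mathbf{B}$. Using (18) to rewrite the limit of $\mathbf{I} = \mathbf{A}\mathbf{A}^{-1}$, we get

$$\lim_{M\to\infty}\mathbf{I} = \left(\lim_{M\to\infty}\mathbf{A}\right)\left(\lim_{M\to\infty}\mathbf{A}^{-1}\right). \qquad (19)$$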

Next, suppose we have an analytic expression for the limit of $\mathbf{A}$, say, $\bar{\mathbf{A}}$; then we have $\mathbf{I} = \bar{\mathbf{A}}\lim_{M\to\infty}\mathbf{A}^{-1}$, from which we conclude that $\lim_{M\to\infty}\mathbf{A}^{-1} = \bar{\mathbf{A}}^{-1}$ and thus

$$\lim_{M\to\infty}\mathbf{A}^{-1} = \left(\lim_{M\to\infty}\mathbf{A}\right)^{-1}. \qquad (20)$$

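Applying (18) and (20) to the cost function in (17) yields

$$\lim_{M\to\infty} M\,\mathbf{1}^H\left(\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z}\right)^{-1}\mathbf{1} = \mathbf{1}^H\left(\lim_{M\to\infty}\frac{1}{M}\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z}\right)^{-1}\mathbf{1}. \qquad (21)$$

Fig. 1. RMSE as a function of the SNR for N = 50 (curves: CRLB, Filterbank, Single Filter; RMSE versus SNR [dB]).

Fig. 2. RMSE as a function of the number of samples N for SNR = 20 dB (curves: CRLB, Filterbank, Single Filter).

We are now left with the problem of determining the limit $\lim_{M\to\infty}\frac{1}{M}\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z}$. In doing so, we will make use of the asymptotic equivalence of Toeplitz and circulant matrices. For a given Toeplitz matrix, here $\mathbf{R}$, we can, under certain conditions, construct an asymptotically equivalent circulant $M \times M$ matrix $\mathbf{C}$ in the sense that [12] $\lim_{M\to\infty}\frac{1}{\sqrt{M}}\|\mathbf{C}-\mathbf{R}\|_F = 0$, where $\|\cdot\|_F$ is the Frobenius norm and the limit is taken over the dimensions of $\mathbf{C}$ and $\mathbf{R}$. A circulant matrix $\mathbf{C}$ has the eigenvalue decomposition $\mathbf{C} = \mathbf{Q}\boldsymbol{\Gamma}\mathbf{Q}^H$, where $\mathbf{Q}$ is the Fourier matrix. Thus, the complex sinusoids in $\mathbf{Z}$ are asymptotically eigenvectors of $\mathbf{R}$. This allows us to determine the limit as (see [12], [13])

$$\lim_{M\to\infty}\frac{1}{M}\mathbf{Z}^H\mathbf{R}\mathbf{Z} = \mathrm{diag}\left([\, \Phi(\omega_0) \;\; \cdots \;\; \Phi(\omega_0 L) \,]\right), \qquad (22)$$

with $\Phi(\omega)$ being the power spectral density of $x(n)$. Similarly, an expression for the inverse of $\mathbf{R}$ can be obtained as $\mathbf{C}^{-1} = \mathbf{Q}\boldsymbol{\Gamma}^{-1}\mathbf{Q}^H$ (again, see [12] for details). We now arrive at the following (see also [13] and [14]):

$$\lim_{M\to\infty}\frac{1}{M}\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z} = \mathrm{diag}\left([\, \Phi^{-1}(\omega_0) \;\; \cdots \;\; \Phi^{-1}(\omega_0 L) \,]\right). \qquad (23)$$

Asymptotically, the cost function in (12), scaled by $M$, can therefore be written as

$$\lim_{M\to\infty} M\,\mathbf{1}^H\left(\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z}\right)^{-1}\mathbf{1} = \sum_{l=1}^{L}\Phi(\omega_0 l), \qquad (24)$$

which is simply the sum over the power spectral density evaluated at the harmonic frequencies. Similar derivations for the filterbank formulation yield

$$\lim_{M\to\infty} M\,\mathrm{Tr}\left[\left(\mathbf{Z}^H\mathbf{R}^{-1}\mathbf{Z}\right)^{-1}\right] = \sum_{l=1}^{L}\Phi(\omega_0 l), \qquad (25)$$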

which is the same as (24). Note that for finite $M$ the above expression still involves only the diagonal terms (due to the trace), but these diagonal terms are not the power spectral density $\Phi(\omega)$ evaluated at certain points. From the above derivations, we conclude that the two cost functions are different for finite $M$ and may yield different estimates, but are asymptotically equivalent.

IV. EXPERIMENTAL RESULTS

It remains to be answered whether there are any important differences for finite-length covariance matrices and filters. We now seek to answer that question with Monte Carlo simulations using synthetic signals generated according to (1). For each realization, the sample covariance matrix is estimated as

$$\hat{\mathbf{R}} = \frac{1}{N-M+1}\sum_{n=0}^{N-M}\mathbf{x}(n)\mathbf{x}^H(n),$$

which is used in place of the true covariance matrix. Both methods require that $\hat{\mathbf{R}}$ be invertible; since $\hat{\mathbf{R}}$ is a sum of $N-M+1$ rank-one terms, this requires $N-M+1 \ge M$, i.e., roughly $M < N/2$. In practice we use $M = \frac{2}{5}N$, a value that has been determined empirically to yield good results. First, we investigate the accuracy of the obtained fundamental frequency estimates, measured in terms of the root mean square estimation error (RMSE). We do this for $\omega_0 = 0.6364$ with $L = 3$, unit amplitudes, and random phases drawn from a uniform probability density function. In Fig. 1, the RMSE is plotted for $N = 50$ as a function of the signal-to-noise ratio (SNR), as defined in [3] for the problem in (1). The RMSE was estimated using 200 different realizations. Similarly, the RMSE is shown as a function of the number of samples, $N$, in Fig. 2 for an SNR of 20 dB, again for 200 realizations. In both figures, the Cramér-Rao lower bound (CRLB), as derived in [3], is also shown. Both figures suggest that, all things considered, there is very little difference in terms of accuracy for the estimated parameters, with both estimators performing well. The methods do, however, seem to have different threshold behavior. We note that our simulations also show that the methods perform similarly as a function of $\omega_0$, but in the interest of brevity, this figure has not been included herein.

Fig. 3. Power ratio in dB as a function of the filter length M.

Next, we will measure the differences of the estimated output powers. We measure this using the following power ratio (PR):

$$\mathrm{PR} = 10\log_{10}\frac{E\left\{\mathrm{Tr}\left[\left(\mathbf{Z}^H\hat{\mathbf{R}}^{-1}\mathbf{Z}\right)^{-1}\right]\right\}}{E\left\{\mathbf{1}^H\left(\mathbf{Z}^H\hat{\mathbf{R}}^{-1}\mathbf{Z}\right)^{-1}\mathbf{1}\right\}} \;\text{[dB]}, \qquad (26)$$

which is positive if the output power of the filterbank exceeds that of the single filter, and vice versa. It should be noted that the expectation is taken over the realizations of the sample covariance matrix $\hat{\mathbf{R}}$. The power ratio (averaged over 1000 realizations) is shown in Fig. 3 as a function of the filter length $M$ for an SNR of 10 dB. The filter length is related to the number of samples as $M = \frac{2}{5}N$. The fundamental frequency was drawn from a uniform distribution on the interval [0.1571, 0.3142], with $L = 5$ in this experiment, to avoid any biases due to special cases. The true fundamental frequency was used in obtaining the optimal filters. In Fig. 4, the same is plotted for $N = 100$, this time as a function of the number of harmonics $L$, with all other conditions being the same as before. Interestingly, Figs. 3 and 4 paint a rather clear picture: for short filters and a high number of harmonics, the single-filter design method actually leads to a better estimate of the signal power, while for long filters and few harmonics, the methods tend to perform identically. This suggests that the single-filter design method is preferable.
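The power ratio (26) can be estimated with a short Monte Carlo loop. The sketch below follows the setup described above ($L = 5$, $\omega_0$ uniform on [0.1571, 0.3142], $M = 2N/5$, the true $\omega_0$ used in $\mathbf{Z}$) but, as assumptions of our own, defaults to 200 realizations instead of 1000 and sets the noise variance to $\sigma^2 = 10^{-\mathrm{SNR}/10}$ for unit amplitudes, which may not match the SNR definition of [3]. The function names are hypothetical, and the numbers it prints are not the paper's results.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def vandermonde(omega0, L, M):
    """Z = [z(omega0) ... z(omega0 L)] from (4)."""
    return np.exp(-1j * omega0 * np.outer(np.arange(M), np.arange(1, L + 1)))

def sample_covariance(x, M):
    """R_hat: average of x(n) x(n)^H over the N-M+1 full-length snapshots."""
    X = sliding_window_view(x, M)[:, ::-1].T
    return X @ X.conj().T / X.shape[1]

def power_ratio_db(M, L=5, snr_db=10.0, trials=200, seed=0):
    """Monte Carlo estimate of PR in (26) for filter length M (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    N = int(round(5 * M / 2))                  # M = 2N/5, as in Section IV
    sigma2 = 10 ** (-snr_db / 10)              # assumed SNR convention, unit amplitudes
    n = np.arange(N)
    harmonics = np.arange(1, L + 1)
    bank_total = single_total = 0.0
    for _ in range(trials):
        omega0 = rng.uniform(0.1571, 0.3142)
        phases = rng.uniform(0, 2 * np.pi, L)
        x = np.exp(1j * (omega0 * np.outer(n, harmonics) + phases)).sum(axis=1)
        x = x + np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
        Z = vandermonde(omega0, L, M)          # built from the true omega0
        Qinv = np.linalg.inv(Z.conj().T @ np.linalg.solve(sample_covariance(x, M), Z))
        bank_total += np.real(np.trace(Qinv))  # filterbank output power, cf. (6)
        single_total += np.real(Qinv.sum())    # single-filter output power, cf. (12)
    # ratio of the two averages (the 1/trials factors cancel)
    return 10 * np.log10(bank_total / single_total)

for M in (10, 20, 30):
    print(M, power_ratio_db(M))
```

Numerator and denominator are averaged separately before the ratio is taken, which is how (26) places the expectations; averaging per-realization ratios would estimate a different quantity.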

V. CONCLUSION

We have presented two different optimal filter designs that can be used for finding high-resolution estimates of the fundamental frequency of periodic signals. The two designs differ in that one is based on the design of a filterbank while the other is based on a single filter. We have shown that the optimal single filter can in fact be obtained from the optimal filters of the filterbank, and that the methods are different for finite lengths but asymptotically equivalent. Experiments indicate that the single filter leads to superior results in terms of estimating the output power.

Fig. 4. Power ratio in dB as a function of the number of harmonics L.

REFERENCES

[1] K. W. Chan and H. C. So, “Accurate frequency estimation for real harmonic sinusoids,” IEEE Signal Processing Lett., vol. 11(7), pp. 609–612, July 2004.
[2] A. de Cheveigné and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” J. Acoust. Soc. Am., vol. 111(4), pp. 1917–1930, Apr. 2002.
[3] M. G. Christensen, A. Jakobsson, and S. H. Jensen, “Joint high-resolution fundamental frequency and order estimation,” IEEE Trans. on Audio, Speech and Language Processing, vol. 15(5), pp. 1635–1644, July 2007.
[4] H. Li, P. Stoica, and J. Li, “Computationally efficient parameter estimation for harmonic sinusoidal signals,” Signal Processing, vol. 80, pp. 1937–1944, 2000.
[5] M. G. Christensen, P. Stoica, A. Jakobsson, and S. H. Jensen, “Multipitch estimation,” Elsevier Signal Processing, vol. 88(4), pp. 972–983, Apr. 2008.
[6] S. Godsill and M. Davy, “Bayesian harmonic models for musical pitch estimation and analysis,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, vol. 2, 2002, pp. 1769–1772.
[7] A. Nehorai and B. Porat, “Adaptive comb filtering for harmonic signal enhancement,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 34(5), pp. 1124–1138, Oct. 1986.
[8] L. Zhang, H. C. So, H. Ping, and G. Liao, “Effective beamformer for coherent signal reception,” IEE Electronics Letters, vol. 39(13), pp. 949–951, June 2003.
[9] J. Capon, “High-resolution frequency-wavenumber spectrum analysis,” Proc. IEEE, vol. 57(8), pp. 1408–1418, 1969.
[10] P. Stoica and R. Moses, Spectral Analysis of Signals. Pearson Prentice Hall, 2005.
[11] T. M. Apostol, Mathematical Analysis, 2nd ed. Addison-Wesley, 1974.
[12] R. M. Gray, “Toeplitz and circulant matrices: A review,” Foundations and Trends in Communications and Information Theory, vol. 2(3), pp. 155–239, 2006.
[13] E. J. Hannan and B. Wahlberg, “Convergence rates for inverse Toeplitz matrix forms,” J. Multivariate Analysis, vol. 31, pp. 127–135, 1989.
[14] P. Stoica, H. Li, and J. Li, “Amplitude estimation of sinusoidal signals: Survey, new results and an application,” IEEE Trans. Signal Processing, vol. 48(2), pp. 338–352, Feb. 2000.