Hi, I'm interested to have an nn.module gammatone Filterbank that produces the fil

Gammatone Filterbank waveform outputs about nnaudio HOT 4 OPEN

astrocyted commented on May 27, 2024

Gammatone Filterbank waveform outputs

from nnaudio.

Comments (4)

KinWaiCheuk commented on May 27, 2024

Isn't Gammatonegram already like this? The output is already [Batch, N_filters X Signal length]. Or am I understanding your question wrongly?

astrocyted commented on May 27, 2024

KinWaiCheuk wrote: "Isn't Gammatonegram already like this? The output is already [Batch, N_filters X Signal length]. Or am I understanding your question wrongly?"

Just saw your reply now. No, that is clearly not what Gammatonegram returns. Check the docs of your code:
Returns
-------
spectrogram : torch.tensor
It returns a tensor of spectrograms. shape = (num_samples, freq_bins,time_steps).

time_steps is not the signal length but rather signal_length/frame_hop. I want the per-channel IIR-filtered waveform, not the binned FFT.
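To make the shape difference concrete, here is a minimal sketch of the arithmetic. The frame-count formula assumes a centre-padded, hop-based analysis; that convention and the helper name `framed_time_steps` are illustrative assumptions, not taken from the nnAudio source.

```python
def framed_time_steps(signal_length: int, hop_length: int) -> int:
    """Frame count under an assumed centre-padded, hop-based framing."""
    return signal_length // hop_length + 1


signal_length = 16000  # one second of audio at 16 kHz
hop_length = 160       # 10 ms hop
n_filters = 64

# What a spectrogram-style layer returns: one value per frame and band
spectrogram_shape = (1, n_filters, framed_time_steps(signal_length, hop_length))

# What is being asked for here: one filtered waveform per band
filterbank_shape = (1, n_filters, signal_length)

print(spectrogram_shape)  # (1, 64, 101)
print(filterbank_shape)   # (1, 64, 16000)
```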

KinWaiCheuk commented on May 27, 2024

I understand your question now. I am not familiar with gammatone and gammatonegram. This feature is implemented by @WangHelin1997. Maybe he can comment more on it?

Alternatively, can you recommend any Python library that produces the filtered waveforms? I will check whether I can implement it under the current nnAudio framework. Having a reference to compare against would be a great help in verifying that my implementation is correct.

astrocyted commented on May 27, 2024

https://github.com/detly/gammatone/blob/master/gammatone/filters.py

This is one example implementation. The output of the erb_filterbank() function is what I'm asking for. It is quite slow, though. I also tried to write it in torch myself, but that did not really speed it up:

import math

import torch
import torchaudio.functional as F


class GammatoneFilterbank(torch.nn.Module):
    def __init__(self,
                num_filters=64,
                sample_rate=16000,
                fmin= 50,
                fmax = None,
                gtgram = False,
                frame_length = 400,
                hop_length= 160    
                ):
        
        super(GammatoneFilterbank, self).__init__()
        self.num_filters = num_filters
        self.sample_rate = sample_rate
        
        self.gtgram = gtgram
        self.frame_length = frame_length
        self.hop_length = hop_length

        self.fmin = fmin
        if fmax:
            self.fmax = fmax
        else:
            self.fmax = self.sample_rate/2

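        # Registered as buffers so that .to(device) moves them together with the module
        self.register_buffer("centre_freqs", self.centre_frequencies())
        self.register_buffer("filter_coefs", self.make_erb_filters())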
   

    @staticmethod
    def erb_point(low_freq, high_freq, fraction):
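        ear_q = 9.26449  # Glasberg and Moore Parameters
        min_bw = 24.7

        # as_tensor accepts Python numbers and existing tensors alike
        low_freq = torch.as_tensor(low_freq, dtype=torch.float32)
        high_freq = torch.as_tensor(high_freq, dtype=torch.float32)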
        
        
        erb_point = (
            -ear_q * min_bw
            + torch.exp(
                fraction * (
                    -torch.log(high_freq + ear_q * min_bw)
                    + torch.log(low_freq + ear_q * min_bw)
                )
            ) *
            (high_freq + ear_q * min_bw)
        )
        
        return erb_point

    @staticmethod
    def erb_space(
        low_freq=50,
        high_freq=8000,
        num_bands=64):
        """
        This function computes an array of ``num_bands`` frequencies uniformly
        spaced between ``high_freq`` and ``low_freq`` on an ERB scale.
        
        For a definition of ERB, see Moore, B. C. J., and Glasberg, B. R. (1983).
        "Suggested formulae for calculating auditory-filter bandwidths and
        excitation patterns," J. Acoust. Soc. Am. 74, 750-753.
        """
        return GammatoneFilterbank.erb_point(
            low_freq,
            high_freq,
            torch.arange(1, num_bands + 1) / num_bands
            )

    
    def centre_frequencies(self):
        """
        Calculates an array of centre frequencies (for :func:`make_erb_filters`)
        from the lower cutoff ``self.fmin``, the upper limit ``self.fmax`` and
        the number of filters ``self.num_filters``.

        :return: same as :func:`erb_space`
        """
        return GammatoneFilterbank.erb_space(low_freq= self.fmin, high_freq= self.fmax, num_bands=self.num_filters)



    def make_erb_filters(self, width=1.0):
        T = 1 / self.sample_rate
        ear_q = 9.26449 # Glasberg and Moore Parameters
        min_bw = 24.7
        order = 1

        if not torch.is_tensor(self.centre_freqs):
            self.centre_freqs = torch.as_tensor(self.centre_freqs, dtype=torch.float32)

        # Equivalent rectangular bandwidth per channel, then the gammatone bandwidth B
        erb = width*((self.centre_freqs / ear_q) ** order + min_bw ** order) ** (1 / order)
        B = 1.019 * 2 * math.pi * erb

        # Centre frequencies expressed as phase advance per sample
        arg = 2 * self.centre_freqs * math.pi * T
        vec = torch.exp(2j * arg)

        A0 = T
        A2 = 0
        B0 = 1
        B1 = -2 * torch.cos(arg) / torch.exp(B * T)
        B2 = torch.exp(-2 * B * T)

        rt_pos = torch.sqrt(torch.tensor(3 + 2 ** 1.5))
        rt_neg = torch.sqrt(torch.tensor(3 - 2 ** 1.5))

        common = -T * torch.exp(-(B * T))

        k11 = torch.cos(arg) + rt_pos * torch.sin(arg)
        k12 = torch.cos(arg) - rt_pos * torch.sin(arg)
        k13 = torch.cos(arg) + rt_neg * torch.sin(arg)
        k14 = torch.cos(arg) - rt_neg * torch.sin(arg)

        A11 = common * k11
        A12 = common * k12
        A13 = common * k13
        A14 = common * k14

        gain_arg = torch.exp(1j * arg - B * T)

        gain = torch.abs(
            (vec - gain_arg * k11)
            * (vec - gain_arg * k12)
            * (vec - gain_arg * k13)
            * (vec - gain_arg * k14)
            * (T * torch.exp(B * T)
                / (-1 / torch.exp(B * T) + 1 + vec * (1 - torch.exp(B * T)))
            )**4
        )

        allfilts = torch.ones_like(self.centre_freqs)

        fcoefs = torch.stack([
            A0 * allfilts, A11, A12, A13, A14, A2*allfilts,
            B0 * allfilts, B1, B2,
            gain
        ], dim=1)

        return fcoefs

    def erb_filterbank(self, waveform):
        # Accepts (time,) or (batch, time); returns (batch, num_filters, time)
        if waveform.ndim==1:
            waveform = waveform[None,:]

        gain = self.filter_coefs[:, 9]
        # A0, A11, A2
        As1 = self.filter_coefs[:, [0, 1, 5]]
        # A0, A12, A2
        As2 = self.filter_coefs[:, [0, 2, 5]]
        # A0, A13, A2
        As3 = self.filter_coefs[:, [0, 3, 5]]
        # A0, A14, A2
        As4 = self.filter_coefs[:, [0, 4, 5]]
        # B0, B1, B2
        Bs = self.filter_coefs[:, 6:9]

        # One copy of every waveform per filter: (batch, num_filters, time).
        # The original expand() call only worked for a batch of one.
        stacked_waveforms = waveform[:, None, :].expand(-1, self.filter_coefs.shape[0], -1)

        # Cascade of four second-order sections. torchaudio's lfilter takes the
        # denominator (a_coeffs) first, so Bs precedes the numerators As1..As4.
        y1 = F.lfilter(stacked_waveforms, Bs, As1, clamp=False)
        y2 = F.lfilter(y1, Bs, As2, clamp=False)
        y3 = F.lfilter(y2, Bs, As3, clamp=False)
        y4 = F.lfilter(y3, Bs, As4, clamp=False)

        return y4 / gain.unsqueeze(-1)
    
    def forward(self, x):
        if self.gtgram:
            x = self.erb_filterbank(x)
            x = torch.nn.functional.pad(x,(self.frame_length//2, self.frame_length - self.frame_length//2))
            x = torch.sum(x.unfold(-1, self.frame_length, self.hop_length)**2, axis=-1)
            return torch.sqrt(x)
        else:
            return self.erb_filterbank(x)
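One detail of the spacing formula above is easy to miss: the centre frequencies come out in descending order, so channel 0 is the highest band and the last channel sits exactly at the lower cutoff. The NumPy sketch below mirrors erb_point and erb_space to show this. It is a standalone illustration, not part of the class.

```python
import numpy as np

EAR_Q = 9.26449  # Glasberg and Moore parameters, as in the code above
MIN_BW = 24.7


def erb_space(low_freq, high_freq, num_bands):
    """ERB-spaced centre frequencies, same formula as erb_point above."""
    c = EAR_Q * MIN_BW
    fraction = np.arange(1, num_bands + 1) / num_bands
    log_ratio = np.log(low_freq + c) - np.log(high_freq + c)
    return -c + np.exp(fraction * log_ratio) * (high_freq + c)


cf = erb_space(50.0, 8000.0, 64)

# fraction == 1 maps exactly to low_freq, so the last channel is the lowest band;
# the first channel lies somewhat below high_freq.
print(cf[0], cf[-1])
```

If the output channels are later plotted or fed to a model that expects ascending frequency, flipping the channel axis is enough.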

I guess the fastest ones are the ones directly written in C.
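One alternative that avoids the sample-by-sample IIR recursion is to approximate each gammatone filter by a truncated impulse response and apply the whole bank with FFT convolution. Note that this swaps the IIR cascade for an FIR approximation, so its outputs will not match erb_filterbank() exactly. The function names, the 50 ms truncation and the peak-gain normalisation below are illustrative assumptions, and no speed-up is claimed or measured here.

```python
import numpy as np


def gammatone_fir_bank(centre_freqs, sample_rate, duration=0.05, order=4):
    """Truncated gammatone impulse responses, shape (num_filters, taps)."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    cf = np.asarray(centre_freqs, dtype=np.float64)[:, None]
    erb = cf / 9.26449 + 24.7   # Glasberg and Moore ERB
    b = 1.019 * erb             # gammatone bandwidth in Hz
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * cf * t)
    # Scale every kernel to roughly unit gain at its spectral peak
    peak = np.abs(np.fft.rfft(ir, axis=-1)).max(axis=-1, keepdims=True)
    return ir / peak


def apply_fir_bank(waveform, kernels):
    """Filter a 1-D waveform with every kernel; returns (num_filters, time)."""
    n = waveform.shape[-1] + kernels.shape[-1] - 1
    n_fft = 1 << (n - 1).bit_length()  # next power of two, avoids circular wrap
    spec = np.fft.rfft(waveform, n_fft)
    kspec = np.fft.rfft(kernels, n_fft, axis=-1)
    out = np.fft.irfft(spec[None, :] * kspec, n_fft, axis=-1)
    return out[:, : waveform.shape[-1]]


sample_rate = 16000
time = np.arange(sample_rate) / sample_rate
sine = np.sin(2 * np.pi * 1000.0 * time)  # 1 kHz test tone

kernels = gammatone_fir_bank([250.0, 1000.0, 4000.0], sample_rate)
out = apply_fir_bank(sine, kernels)
energy = (out ** 2).sum(axis=-1)

print(out.shape)  # (3, 16000): one full-length waveform per channel
```

The same idea carries over to torch by storing the kernels as a buffer and using torch.fft.rfft and torch.fft.irfft, which keeps the operation differentiable and batched on the GPU.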
