Comments (4)
Isn't Gammatonegram already like this? The output is already [Batch, N_filters x Signal length]. Or am I misunderstanding your question?
from nnaudio.
Just saw your reply now. No, that is clearly not what Gammatonegram returns. Check the docs of your code:
Returns
-------
spectrogram : torch.tensor
It returns a tensor of spectrograms. shape = (num_samples, freq_bins, time_steps).
time_steps is not the signal length but rather signal_length/frame_hop. I want the per-channel IIR-filtered waveform, not the binned FFT.
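To make the length difference concrete, here is a minimal pure-Python sketch. The `frame_energy` helper and its zero-padding scheme are assumptions for illustration only, not nnAudio's actual framing. It frames a stand-in for one filtered channel and compares the two lengths.

```python
import math


def frame_energy(channel, frame_length=400, hop_length=160):
    """Square root of the summed squares in each frame of one channel.

    Hypothetical helper: pads by one frame length in total (half on each
    side), then slides a window of frame_length samples in hop_length steps.
    """
    pad_left = frame_length // 2
    pad_right = frame_length - pad_left
    padded = [0.0] * pad_left + list(channel) + [0.0] * pad_right
    n_frames = 1 + (len(padded) - frame_length) // hop_length
    frames = []
    for i in range(n_frames):
        start = i * hop_length
        window = padded[start:start + frame_length]
        frames.append(math.sqrt(sum(v * v for v in window)))
    return frames


# One second of a 440 Hz tone at 16 kHz stands in for a filtered channel.
filtered = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
frames = frame_energy(filtered)
print(len(filtered), len(frames))  # 16000 101
```

The per-channel waveform keeps all 16000 samples, while the framed version has about signal_length/hop_length values per channel.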
from nnaudio.
I understand your question now. I am not familiar with gammatone and gammatonegram. This feature is implemented by @WangHelin1997. Maybe he can comment more on it?
Alternatively, can you recommend a Python library that produces the filtered waveforms? I will check whether I can implement it under the current nnAudio framework. It would be a great help to have something to refer to, so that I can check my implementation is correct.
from nnaudio.
https://github.com/detly/gammatone/blob/master/gammatone/filters.py
This is one example implementation. The output of the erb_filterbank() function is what I'm asking for. It's quite slow, though. I tried to do it in torch myself, but it was not much faster:
import math

import torch
import torchaudio.functional as F


class GammatoneFilterbank(torch.nn.Module):
    def __init__(self,
                 num_filters=64,
                 sample_rate=16000,
                 fmin=50,
                 fmax=None,
                 gtgram=False,
                 frame_length=400,
                 hop_length=160):
        super().__init__()
        self.num_filters = num_filters
        self.sample_rate = sample_rate
        self.gtgram = gtgram
        self.frame_length = frame_length
        self.hop_length = hop_length
        self.fmin = fmin
        # Default the upper edge to the Nyquist frequency.
        self.fmax = fmax if fmax else self.sample_rate / 2
        self.centre_freqs = self.centre_frequencies()
        self.filter_coefs = self.make_erb_filters()

    @staticmethod
    def erb_point(low_freq, high_freq, fraction):
        ear_q = 9.26449  # Glasberg and Moore parameters
        min_bw = 24.7
        low_freq = torch.as_tensor(low_freq, dtype=torch.float32)
        high_freq = torch.as_tensor(high_freq, dtype=torch.float32)
        erb_point = (
            -ear_q * min_bw
            + torch.exp(
                fraction * (
                    -torch.log(high_freq + ear_q * min_bw)
                    + torch.log(low_freq + ear_q * min_bw)
                )
            )
            * (high_freq + ear_q * min_bw)
        )
        return erb_point

    @staticmethod
    def erb_space(low_freq=50, high_freq=8000, num_bands=64):
        """
        This function computes an array of ``num_bands`` frequencies uniformly
        spaced between ``high_freq`` and ``low_freq`` on an ERB scale.
        For a definition of ERB, see Moore, B. C. J., and Glasberg, B. R. (1983).
        "Suggested formulae for calculating auditory-filter bandwidths and
        excitation patterns," J. Acoust. Soc. Am. 74, 750-753.
        """
        return GammatoneFilterbank.erb_point(
            low_freq,
            high_freq,
            torch.arange(1, num_bands + 1) / num_bands
        )

    def centre_frequencies(self):
        """
        Calculates an array of centre frequencies (for :func:`make_erb_filters`)
        from ``self.fmin`` (lower cutoff), ``self.fmax`` (upper cutoff) and
        ``self.num_filters`` (number of centre frequencies).
        :return: same as :func:`erb_space`
        """
        return GammatoneFilterbank.erb_space(
            low_freq=self.fmin, high_freq=self.fmax, num_bands=self.num_filters
        )

    def make_erb_filters(self, width=1.0):
        T = 1 / self.sample_rate
        ear_q = 9.26449  # Glasberg and Moore parameters
        min_bw = 24.7
        order = 1
        if not torch.is_tensor(self.centre_freqs):
            self.centre_freqs = torch.Tensor(self.centre_freqs)
        erb = width * ((self.centre_freqs / ear_q) ** order + min_bw ** order) ** (1 / order)
        B = 1.019 * 2 * math.pi * erb
        arg = 2 * self.centre_freqs * math.pi * T
        vec = torch.exp(2j * arg)
        A0 = T
        A2 = 0
        B0 = 1
        B1 = -2 * torch.cos(arg) / torch.exp(B * T)
        B2 = torch.exp(-2 * B * T)
        rt_pos = math.sqrt(3 + 2 ** 1.5)
        rt_neg = math.sqrt(3 - 2 ** 1.5)
        common = -T * torch.exp(-(B * T))
        k11 = torch.cos(arg) + rt_pos * torch.sin(arg)
        k12 = torch.cos(arg) - rt_pos * torch.sin(arg)
        k13 = torch.cos(arg) + rt_neg * torch.sin(arg)
        k14 = torch.cos(arg) - rt_neg * torch.sin(arg)
        A11 = common * k11
        A12 = common * k12
        A13 = common * k13
        A14 = common * k14
        # Magnitude of the four-stage cascade at each centre frequency.
        gain_arg = torch.exp(1j * arg - B * T)
        gain = torch.abs(
            (vec - gain_arg * k11)
            * (vec - gain_arg * k12)
            * (vec - gain_arg * k13)
            * (vec - gain_arg * k14)
            * (T * torch.exp(B * T)
               / (-1 / torch.exp(B * T) + 1 + vec * (1 - torch.exp(B * T)))
               ) ** 4
        )
        allfilts = torch.ones_like(self.centre_freqs)
        # One row per filter: [A0, A11, A12, A13, A14, A2, B0, B1, B2, gain].
        fcoefs = torch.stack([
            A0 * allfilts, A11, A12, A13, A14, A2 * allfilts,
            B0 * allfilts, B1, B2,
            gain
        ], dim=1)
        return fcoefs

    def erb_filterbank(self, waveform):
        # Accepts (time,) or (batch, time); returns (batch, num_filters, time).
        if waveform.ndim == 1:
            waveform = waveform[None, :]
        coefs = self.filter_coefs.to(waveform.device)
        gain = coefs[:, 9]
        # Numerators of the four second-order stages.
        As1 = coefs[:, [0, 1, 5]]  # A0, A11, A2
        As2 = coefs[:, [0, 2, 5]]  # A0, A12, A2
        As3 = coefs[:, [0, 3, 5]]  # A0, A13, A2
        As4 = coefs[:, [0, 4, 5]]  # A0, A14, A2
        # Shared denominator: B0, B1, B2
        Bs = coefs[:, 6:9]
        # One copy of each waveform per filter: (batch, num_filters, time).
        # The original expand() call only worked for a single waveform.
        stacked_waveforms = waveform.unsqueeze(1).expand(-1, coefs.shape[0], -1)
        # torchaudio's lfilter takes the denominator (a_coeffs) before the
        # numerator (b_coeffs), so Bs comes first here.
        y1 = F.lfilter(stacked_waveforms, Bs, As1, clamp=False)
        y2 = F.lfilter(y1, Bs, As2, clamp=False)
        y3 = F.lfilter(y2, Bs, As3, clamp=False)
        y4 = F.lfilter(y3, Bs, As4, clamp=False)
        return y4 / gain.unsqueeze(-1)

    def forward(self, x):
        if self.gtgram:
            # Frame the filtered waveforms and take the root of the summed squares.
            x = self.erb_filterbank(x)
            x = torch.nn.functional.pad(
                x, (self.frame_length // 2, self.frame_length - self.frame_length // 2)
            )
            x = torch.sum(x.unfold(-1, self.frame_length, self.hop_length) ** 2, dim=-1)
            return torch.sqrt(x)
        else:
            return self.erb_filterbank(x)
I guess the fastest ones are the ones directly written in C.
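For cross-checking a faster implementation, a slow but dependency-free reference for a single channel can help. The sketch below reuses the coefficient formulas from the class above in pure Python. The function names `erb_coefs`, `biquad` and `gammatone_channel` are made up for this example, and it is only a sanity-check aid, not a replacement for the linked library.

```python
import cmath
import math

EAR_Q = 9.26449  # Glasberg and Moore parameters, as in the class above
MIN_BW = 24.7


def erb_coefs(cf, fs, width=1.0):
    """Numerators, shared denominator and gain for one channel at cf Hz."""
    T = 1.0 / fs
    erb = width * (cf / EAR_Q + MIN_BW)
    B = 1.019 * 2 * math.pi * erb
    arg = 2 * cf * math.pi * T
    vec = cmath.exp(2j * arg)
    rt_pos = math.sqrt(3 + 2 ** 1.5)
    rt_neg = math.sqrt(3 - 2 ** 1.5)
    ks = [
        math.cos(arg) + rt_pos * math.sin(arg),
        math.cos(arg) - rt_pos * math.sin(arg),
        math.cos(arg) + rt_neg * math.sin(arg),
        math.cos(arg) - rt_neg * math.sin(arg),
    ]
    common = -T * math.exp(-B * T)
    numerators = [(T, common * k, 0.0) for k in ks]
    denominator = (1.0, -2 * math.cos(arg) / math.exp(B * T), math.exp(-2 * B * T))
    gain_arg = cmath.exp(1j * arg - B * T)
    gain = abs(
        (vec - gain_arg * ks[0]) * (vec - gain_arg * ks[1])
        * (vec - gain_arg * ks[2]) * (vec - gain_arg * ks[3])
        * (T * math.exp(B * T)
           / (-1 / math.exp(B * T) + 1 + vec * (1 - math.exp(B * T)))) ** 4
    )
    return numerators, denominator, gain


def biquad(x, b, a):
    """Direct-form I second-order IIR section; a[0] is assumed to be 1."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def gammatone_channel(x, cf, fs):
    """Run x through the four cascaded sections, then normalise by the gain."""
    numerators, denominator, gain = erb_coefs(cf, fs)
    y = list(x)
    for b in numerators:
        y = biquad(y, b, denominator)
    return [v / gain for v in y]


fs = 16000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(4000)]
out = gammatone_channel(tone, 1000.0, fs)
print(len(out))  # 4000, one output sample per input sample
```

Because the gain term is the cascade's magnitude response at the centre frequency, a tone at that frequency should come out with an amplitude close to 1 once the transient has died away. That gives a quick test to run against any vectorised version.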
from nnaudio.