Hello. I'm wondering how I can use the FFT to change the pitch of audio data in pcm16

Pitch Shift using FFT? about fftea HOT 6 OPEN

Mamena2020 commented on June 28, 2024

Pitch Shift using FFT?

from fftea.

Comments (6)

liamappelbe commented on June 28, 2024

Good question. There are a few related algorithms in this area:

1. Resampling - changing to a different sampling rate, without changing the duration or pitch. You can also use resampling to change the duration and pitch without changing the sampling rate. This is easy to do with FFT and I'm planning to add a util for this to the library. But also, depending on what API you're passing the audio to, you might be able to specify an arbitrary sample rate. For example, if you were writing a WAV file, you might be able to just give it whatever sample rate you want. Some audio playback APIs also let you specify an arbitrary sample rate. In that case resampling is totally unnecessary.
2. Stretching - changing the duration of the audio without changing the sampling rate or pitch. This is not so easy to do, and I'll talk about it more below.
3. Pitch shifting - changing the pitch, without changing the duration or sampling rate. If we have 1 and 2, we can combine them to do pitch shifting. For example, to shift up by an octave, you'd stretch the audio to twice the length, then resample to halve the length again, increasing the pitch.
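The octave example generalizes to any interval: a shift of n semitones corresponds to a frequency ratio of 2^(n/12), and that ratio is also the stretch factor to apply before resampling back to the original length. A minimal sketch of the arithmetic follows; the function names are hypothetical and not part of fftea.

```dart
import 'dart:math' as math;

/// Frequency ratio for a shift of [semitones] (12 semitones = one octave up).
double pitchRatio(double semitones) =>
    math.pow(2.0, semitones / 12.0).toDouble();

/// Length, in samples, to stretch the audio to before it is resampled
/// back down (or up) to [originalLength].
int stretchedLength(int originalLength, double semitones) =>
    (originalLength * pitchRatio(semitones)).round();
```

For instance, `pitchRatio(12)` is 2.0, so a 1000-sample chunk would be stretched to 2000 samples and then resampled back to 1000.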

So we really just need to figure out number 2. I've been meaning to look into this for a while, because it's not obvious how it's done, but it's clear that FFT is involved somehow. So I dug into it a bit today. I've skimmed the source code of paulstretch, and it seems the basic idea is to do a windowed STFT, followed by an inverse STFT with a different chunk stride. Also, all the FFT phases are randomized along the way.

It's a bit hard to explain in words, but here's a picture. I think randomizing the phases is a stylistic choice, and I'm not sure it's what you want. If you don't want to randomize the phases then FFT might not even be necessary for this algorithm (you'll still need it for the resampling though). I'm going to try implementing it this afternoon and see what I find.
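To make the "different chunk stride" idea concrete, here is a minimal sketch of the variant without FFT or phase randomization, i.e. plain overlap-and-add. It is not paulstretch itself and not an fftea API: `stretchOla`, the Hann window, and the stride of a quarter chunk are all assumptions chosen for illustration. Plain overlap-and-add can sound phasey on tonal material.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Stretches [input] by [factor] (2.0 = twice as long) by reading windowed
/// chunks at one stride and adding them to the output at another stride.
Float64List stretchOla(Float64List input, double factor,
    {int chunkSize = 2048}) {
  final outStride = chunkSize ~/ 4;
  final inStride = outStride / factor; // fractional read stride
  final outLength = (input.length * factor).round();
  final output = Float64List(outLength);
  final weight = Float64List(outLength);

  // Hann window, so neighbouring chunks cross-fade into each other.
  final window = Float64List.fromList(List.generate(
      chunkSize, (i) => 0.5 - 0.5 * math.cos(2 * math.pi * i / chunkSize)));

  for (var k = 0; k * outStride < outLength; k++) {
    final readStart = (k * inStride).round();
    final writeStart = k * outStride;
    for (var i = 0; i < chunkSize; i++) {
      final r = readStart + i;
      final w = writeStart + i;
      // Stop when either buffer runs out; the uncovered tail stays silent.
      if (r >= input.length || w >= outLength) break;
      output[w] += input[r] * window[i];
      weight[w] += window[i];
    }
  }

  // Divide by the accumulated window weight to keep the level constant.
  for (var i = 0; i < outLength; i++) {
    if (weight[i] > 1e-9) output[i] /= weight[i];
  }
  return output;
}
```

Calling `stretchOla(samples, 2.0)` returns a buffer twice as long as `samples`. With a stretch factor above 1, the last fraction of a chunk in the output is left at zero because there is no input left to read for it.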

For your use case, it sounds like you want to apply this transformation to a data stream. In that case your best bet is probably to do the resampling during the stretching, because the ordinary resampling algorithm needs the entire audio at once.

In any case, before any of this, you'll need to convert your audio from pcm16 to a Float64List (I have a util for this if you don't want to do it manually).
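For anyone doing the conversion manually, a minimal sketch using only `dart:typed_data` is below. It assumes little-endian, signed 16-bit samples and scales by 32768; the function names are hypothetical, and this is a hand-rolled alternative rather than the util mentioned above.

```dart
import 'dart:typed_data';

/// Converts little-endian signed 16-bit PCM bytes to samples in [-1.0, 1.0).
Float64List pcm16ToFloat64(Uint8List bytes) {
  final data = ByteData.sublistView(bytes);
  final samples = Float64List(bytes.length ~/ 2);
  for (var i = 0; i < samples.length; i++) {
    samples[i] = data.getInt16(i * 2, Endian.little) / 32768.0;
  }
  return samples;
}

/// Converts samples back to little-endian signed 16-bit PCM,
/// clamping anything outside the representable range.
Uint8List float64ToPcm16(Float64List samples) {
  final data = ByteData(samples.length * 2);
  for (var i = 0; i < samples.length; i++) {
    final v = (samples[i] * 32768.0).round().clamp(-32768, 32767).toInt();
    data.setInt16(i * 2, v, Endian.little);
  }
  return data.buffer.asUint8List();
}
```

For interleaved stereo input, the samples would still need to be split per channel before processing.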

If this algorithm is too hard, I'm pretty sure you can use ffmpeg in streaming mode, rather than reading/writing files. I've done this before. You just need to tell it to read from stdin and write to stdout. Then you might be able to communicate directly with it using the Dart Process API. I haven't tried doing this via the Process API, but it should work. It's still starting up a subprocess, so it's not as efficient as writing the algorithm yourself, but at least you don't have to read/write files.
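A rough sketch of the stdin/stdout approach is below. It is untested against a real audio pipeline and rests on several assumptions: an `ffmpeg` executable is on the PATH (normally true only on desktop), the input is raw pcm16 (`s16le`), and the helper names `ffmpegPipeArgs` and `runFfmpeg` are made up for this example.

```dart
import 'dart:io';
import 'dart:typed_data';

/// Builds ffmpeg arguments for raw pcm16 in on stdin, raw pcm16 out on stdout.
List<String> ffmpegPipeArgs({
  required int sampleRate,
  required int channels,
  required String audioFilter,
}) =>
    [
      '-hide_banner', '-loglevel', 'error',
      // Input options must come before -i; raw PCM has no header,
      // so the format, rate and channel count are given explicitly.
      '-f', 's16le', '-ar', '$sampleRate', '-ac', '$channels',
      '-i', 'pipe:0',
      // Output options, including filters, come after -i.
      '-af', audioFilter,
      '-f', 's16le', 'pipe:1',
    ];

/// Pipes [pcm] through ffmpeg and returns the processed bytes.
Future<Uint8List> runFfmpeg(Uint8List pcm, List<String> args) async {
  final process = await Process.start('ffmpeg', args);
  // Start draining stdout and stderr before writing, so neither pipe fills up.
  final stdoutDone = process.stdout
      .fold<BytesBuilder>(BytesBuilder(), (b, chunk) => b..add(chunk));
  final stderrDone = process.stderr.drain<void>();
  process.stdin.add(pcm);
  await process.stdin.close(); // signals end of input to ffmpeg
  final out = await stdoutDone;
  await stderrDone;
  final code = await process.exitCode; // wait for exit instead of killing
  if (code != 0) {
    throw ProcessException('ffmpeg', args, 'exit code $code', code);
  }
  return out.takeBytes();
}
```

As an example filter, `asetrate=88200,aresample=44100,atempo=0.5` should raise 44100 Hz audio by an octave while keeping its duration, though it is worth checking against your ffmpeg build. Two details matter in practice: the process must be allowed to exit on its own before its output is considered complete, and for a continuous stream you would keep one process alive and feed it chunk by chunk rather than starting one per chunk.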

liamappelbe commented on June 28, 2024

@Mamena2020 Another question for you: How accurate do you need this to be? What are you wanting to use this for?

I've been doing more reading about audio stretching, and it seems it's still an active area of research. There's no perfect algorithm, and the best algorithms are pretty complicated. If you're fine with paulstretch or overlap-and-add, then this is something you can do yourself. But if you want something higher quality then you'll probably need to use something like SBSMS (you can load it using Dart FFI).

If you have Audacity you can check out each of the 3 options I just mentioned. In the Effect > Pitch and Tempo menu, you'll see paulstretch and change tempo. In change tempo there's a checkbox for high quality. High quality stretching uses SBSMS, and low quality uses overlap-and-add. IMO, paulstretch is accurate enough for most purposes, and is much simpler than SBSMS. You can implement paulstretch pretty easily using my fft library.

Mamena2020 commented on June 28, 2024

@liamappelbe My primary objective is to record audio and alter its pitch in real time. The steps I'm currently working on are as follows:

Recording:
For recording, I'm using the flutter sound package and streaming audio data in pcm16 format.

Auto Pitch:
To achieve pitch shifting, I'm using ffmpeg kit. The process involves several steps:
- Convert the audio data from pcm to wav format using flutterSoundHelper.pcmToWaveBuffer from flutter sound.
- Write the audio data (in wav format) to a file.
- Apply the desired pitch shift using FFmpeg arguments.
- Read the output audio data from the file.
- Play the output audio with player.feedFromStream(outputAudio), using FlutterSoundPlayer from flutter sound.

I have successfully accomplished this, but the real-time pitch shifting is not as smooth as I expected, most likely because the number of steps involved hurts real-time performance.

I tried to use ffmpeg via the Dart Process API as you suggested, but had no luck.

    
    Future<Uint8List> testAudio({required Uint8List audioData}) async {
      final additionalArguments = [
        '-af',
        'volume=1',
      ];

      final ffmpegProcess = await io.Process.start(
        'ffmpeg',
        [
          ...additionalArguments,
          '-i', 'pipe:0', // input
          '-f', 'wav', 'pipe:1', // output
        ],
        mode: io.ProcessStartMode.detachedWithStdio,
        runInShell: true,
      );
      ffmpegProcess.stdin.add(audioData);
      await ffmpegProcess.stdin.flush();
      await ffmpegProcess.stdin.close();
      ffmpegProcess.kill();
      final bytes = await ffmpegProcess.stdout.toBytes();
      return Uint8List.fromList(bytes);
    }

    extension StreamToBytes on Stream<List<int>> {
      Future<List<int>> toBytes() async {
        final chunks = await toList();
        final bytesBuilder = BytesBuilder();
        for (final chunk in chunks) {
          bytesBuilder.add(chunk);
        }
        return bytesBuilder.takeBytes();
      }
    }

liamappelbe commented on June 28, 2024

Ok, if streaming to an ffmpeg process doesn't work, try paulstretch. I think that's probably the simplest approach that will definitely work.

To recap:

1. Convert your chunk of pcm16 audio to a Float64List.
2. Use paulstretch to stretch or shrink your audio by whatever ratio you need (e.g. to shift the audio up an octave, stretch by a factor of 2).
3. Resample your audio back to its original length.
4. Output the audio chunk.

If you implement it cleverly, you can combine steps 2 and 3, for increased efficiency. I'll probably add the resampling util soon, since I've pretty much already written it. Let me know if you need help with paulstretch.

UPDATE: I've published the resampling util.
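To show what the "resample back to its original length" step involves, here is a minimal sketch using linear interpolation. It is a simplified stand-in, not the library's resampling util: `resampleLinear` is a hypothetical name, and unlike an FFT-based resampler it does no band-limiting, so it can alias when shrinking the audio.

```dart
import 'dart:typed_data';

/// Resamples [input] to [outputLength] samples by linear interpolation.
Float64List resampleLinear(Float64List input, int outputLength) {
  final output = Float64List(outputLength);
  if (input.isEmpty || outputLength == 0) return output;
  if (input.length == 1 || outputLength == 1) {
    output.fillRange(0, outputLength, input[0]);
    return output;
  }
  // Map the first and last output samples onto the first and last inputs.
  final step = (input.length - 1) / (outputLength - 1);
  for (var i = 0; i < outputLength; i++) {
    final pos = i * step;
    final lo = pos.floor();
    final hi = lo + 1 < input.length ? lo + 1 : lo;
    final frac = pos - lo;
    output[i] = input[lo] * (1 - frac) + input[hi] * frac;
  }
  return output;
}
```

After stretching a chunk of `n` samples by a factor of 2, `resampleLinear(stretched, n)` brings it back to `n` samples, which is what raises the pitch.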

ivanesi commented on June 28, 2024

Is it possible to detect pitch? For example like in ear training or guitar tuner apps.

liamappelbe commented on June 28, 2024

> Is it possible to detect pitch? For example like in ear training or guitar tuner apps.

@ivanesi Yep, this is easy. Just FFT your audio, find the frequency index with the largest magnitude, then use FFT.frequency to get the frequency in Hz of that index.
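A minimal sketch of that peak-picking idea is below. To stay self-contained it uses a naive O(n²) DFT in place of the library's FFT, and converts the bin index to Hz as index × sampleRate / n, which I believe is the same conversion `FFT.frequency` performs. `dominantFrequency` is a hypothetical name.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Returns the frequency in Hz of the strongest frequency bin in [samples].
double dominantFrequency(Float64List samples, double sampleRate) {
  final n = samples.length;
  var bestIndex = 0;
  var bestPower = -1.0;
  // Skip bin 0 (DC) and stop at the Nyquist bin; for real input the
  // remaining bins are mirror images of these.
  for (var k = 1; k <= n ~/ 2; k++) {
    var re = 0.0;
    var im = 0.0;
    for (var t = 0; t < n; t++) {
      final angle = -2 * math.pi * k * t / n;
      re += samples[t] * math.cos(angle);
      im += samples[t] * math.sin(angle);
    }
    final power = re * re + im * im; // squared magnitude of bin k
    if (power > bestPower) {
      bestPower = power;
      bestIndex = k;
    }
  }
  return bestIndex * sampleRate / n;
}
```

The result is only as precise as the bin spacing, sampleRate / n, and the loudest bin can be a harmonic rather than the fundamental, so a tuner-style app would likely need longer chunks or some refinement on top of this.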

But if you have more questions, please file a separate bug. Don't hijack existing bugs with unrelated questions.
