Comments (12)

dathudeptrai avatar dathudeptrai commented on May 14, 2024 1

@sujeendran if you can show that SqueezeWave performs better than MB-MelGAN, I will implement it :)). There is no reason to add a new model to this framework if it is neither faster nor stronger than what is already there. From listening to the audio samples and skimming the paper, I don't think SqueezeWave beats MB-MelGAN on either inference time or quality.

sujeendran avatar sujeendran commented on May 14, 2024 1

@dathudeptrai I will need some time to test MB-MelGAN on my target platform. I suggested SqueezeWave mostly for its speed and the possibility of running on CPU on resource-restricted edge devices; TFLite and TF Micro are favorable for such solutions. In my case, I was able to run a combined FastSpeech + SqueezeWave synthesis on the Jetson Nano platform in 0.5 seconds with PyTorch. The quality was not bad, but could have been better. Will update here if I'm successful with MB-MelGAN.

sujeendran avatar sujeendran commented on May 14, 2024 1

@manmay-nakhashi Thanks for the tip, I will try that. I'm not at liberty to share the complete C++ code, but I can share a minimal MB-MelGAN inference sample that assumes the interpreter is already loaded. The same pattern can be used for FastSpeech; you just need to fill the other input tensor buffers as well, and the input tensor will be of type int32_t (a Python sketch of the same flow follows the C++ sample below).
Hope this helps!

// MB-MelGAN
// Input signature  -> [1, -1, 80] float32 (mel spectrogram)
// Output signature -> [1, -1, 1]  float32 (audio samples)
// Assumes a built tflite::Interpreter named `interpreter`, with
// inputs = interpreter->inputs(), outputs = interpreter->outputs(),
// and an int `currentDim` caching the last input length.
void infer(float *inputtensor, int N, float *&output, int &outsize)
{
  // Resize and reallocate tensor buffers only if the input dimension has changed.
  if (currentDim != N)
  {
    const std::vector<int> newDim{1, N, 80};
    interpreter->ResizeInputTensor(0, newDim);
    // Allocate tensor buffers.
    interpreter->AllocateTensors();
    currentDim = N;
  }

  // Fill input buffers.
  float *inputptr = interpreter->typed_tensor<float>(inputs[0]);
  memcpy((void *)inputptr, inputtensor, sizeof(float) * N * 80);

  // Run inference.
  interpreter->Invoke();

  // Read output buffers.
  TfLiteIntArray *output_dims = interpreter->tensor(outputs[0])->dims;
  int output_size = output_dims->data[output_dims->size - 2];
  printf("Output shape: [1 %d 1]\n", output_size);

  // Note: the output pointer refers to memory owned by the interpreter;
  // it is invalidated by the next AllocateTensors()/Invoke() call.
  float *outputptr = interpreter->typed_tensor<float>(outputs[0]);
  output = outputptr;
  outsize = output_size;
}

EDIT: Just removed the kTfLiteOk checks on the AllocateTensors and Invoke calls; they were part of an error-check function call I forgot to remove before posting.
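
For anyone prototyping in Python first, here is a minimal sketch of the same resize → fill → invoke flow using tf.lite.Interpreter. The model path and the assumption that the first input holds int32 token ids (as for FastSpeech) are hypothetical, not part of the code above.

import numpy as np
import tensorflow as tf

# A minimal sketch of the same resize -> fill -> invoke flow in Python.
# "fastspeech.tflite" and the int32 token-id input are assumptions.
interpreter = tf.lite.Interpreter(model_path="fastspeech.tflite")
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

input_ids = np.array([[10, 23, 5, 42, 7]], dtype=np.int32)  # dummy token ids

# Resize the variable-length input to the current sequence length, then allocate.
interpreter.resize_tensor_input(input_details[0]["index"], input_ids.shape)
interpreter.allocate_tensors()

# FastSpeech-style models have extra inputs (e.g. speaker id, speed ratio);
# fill every remaining input with a placeholder of the dtype it reports.
interpreter.set_tensor(input_details[0]["index"], input_ids)
for detail in input_details[1:]:
    placeholder = np.ones(detail["shape"], dtype=detail["dtype"])
    interpreter.set_tensor(detail["index"], placeholder)

interpreter.invoke()
mel = interpreter.get_tensor(output_details[0]["index"])
print(mel.shape)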

dathudeptrai avatar dathudeptrai commented on May 14, 2024

@sujeendran

manmay-nakhashi avatar manmay-nakhashi commented on May 14, 2024

@dathudeptrai @sujeendran It is fast, but the audio quality is not so good.
Intel® Core™ i5-6300U CPU

example 1

taskset --cpu-list 1 python3 synthesis.py "Fastspeech with Squeezewave vocoder in pytorch , very fast inference on cpu"

Speech synthesis time: 1.7220683097839355

soxi out:
Input File : 'results/Fastspeech with Squeezewave vocoder in pytorch , very fast inference on cpu_112000_squeezewave.wav'
Channels : 1
Sample Rate : 22050
Precision : 16-bit
Duration : 00:00:05.96 = 131328 samples ~ 446.694 CDDA sectors
File Size : 263k
Bit Rate : 353k
Sample Encoding: 16-bit Signed Integer PCM
Approx. 6 sec of audio generated in 1.72 sec on a single CPU core.

example 2

taskset --cpu-list 0 python3 synthesis.py "How are you"

Speech synthesis time: 0.3431851863861084

soxi out:
Input File : 'results/How are you _112000_squeezewave.wav'
Channels : 1
Sample Rate : 22050
Precision : 16-bit
Duration : 00:00:00.85 = 18688 samples ~ 63.5646 CDDA sectors
File Size : 37.4k
Bit Rate : 353k
Sample Encoding: 16-bit Signed Integer PCM
0.85 sec of audio generated in 0.34 sec on a single CPU core.

dathudeptrai avatar dathudeptrai commented on May 14, 2024

@sujeendran any update?

sujeendran avatar sujeendran commented on May 14, 2024

@dathudeptrai Hi, I haven't worked on SqueezeWave for a while, as I have been working on TFLite C++ inference for FastSpeech and MB-MelGAN. As manmay noted, the quality of SqueezeWave is not as good as MB-MelGAN, but in my tests on the Jetson Nano it is definitely faster on CPU/GPU with PyTorch than FastSpeech+MB-MelGAN on CPU/GPU with TensorFlow 2.x. On the Jetson, the TensorFlow 2.x pipeline above takes 2+ seconds on GPU (even after warmup) for tiny sentences; the CPU runs faster, but inference time increases linearly with sentence length. The PyTorch GPU implementation of FastSpeech+SqueezeWave, by contrast, does this in ~0.5 seconds irrespective of sentence length and with no warmup.

dathudeptrai avatar dathudeptrai commented on May 14, 2024

@sujeendran On the Jetson, I think you can run inference directly by installing our framework, without converting to a SavedModel (.pb) or TFLite; I noticed that running inference with @tf.function and an input_signature needs no warmup compared with a .pb. Overall, I think FastSpeech + MB-MelGAN is fast enough to run in real time in streaming mode. BTW, did you use 8-bit or 32-bit for TFLite? And is the Jetson Nano ARM?
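
For illustration, a minimal sketch of what running inference through @tf.function with a fixed input_signature looks like; the Vocoder wrapper and the [1, None, 80] mel shape are assumptions, not TensorFlowTTS code.

import tensorflow as tf

# A minimal sketch of tracing inference once with a fixed input_signature so
# variable-length inputs reuse one concrete function instead of retracing.
# `model` stands in for an MB-MelGAN-style Keras model and is an assumption.
class Vocoder(tf.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    @tf.function(input_signature=[tf.TensorSpec(shape=[1, None, 80], dtype=tf.float32)])
    def infer(self, mel):
        # mel: [1, frames, 80] spectrogram -> [1, samples, 1] audio
        return self.model(mel)

# Hypothetical usage:
# vocoder = Vocoder(mb_melgan)
# audio = vocoder.infer(tf.random.normal([1, 200, 80]))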

sujeendran avatar sujeendran commented on May 14, 2024

@dathudeptrai You are right about using @tf.function directly on the Jetson for faster inference, but I was trying to reduce the size of the model files and avoid keeping the source code on the target device. GPU inference still takes at least 2 seconds, and I need something below 1 second.
In the case of TFLite, setting the supported type to tf.float16 increased the speed by around 16x, I would say. But I couldn't do the same with FastSpeech; the conversion to TFLite failed when I set the supported type to tf.float16. The Jetson Nano is ARM64. Can you help me out with 8-bit TFLite as you mentioned?
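
For reference, a minimal sketch of the float16 post-training quantization described above, using the standard TFLite converter; the SavedModel path and output filename are placeholders.

import tensorflow as tf

# A minimal sketch of float16 post-training quantization;
# "mb_melgan_saved_model" is a hypothetical SavedModel path.
converter = tf.lite.TFLiteConverter.from_saved_model("mb_melgan_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

with open("mb_melgan_fp16.tflite", "wb") as f:
    f.write(tflite_model)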

manmay-nakhashi avatar manmay-nakhashi commented on May 14, 2024

@sujeendran use TFLITE_BUILTINS_INT8 as the op set during TFLite conversion. Also, can you share your C++ inference code?

sujeendran avatar sujeendran commented on May 14, 2024

@manmay-nakhashi can you show your code for the INT8 conversion of the FastSpeech model? I tried several configurations but couldn't get INT8 to work. Did you provide a representative dataset while converting? And how is the inference quality with INT8?

dathudeptrai avatar dathudeptrai commented on May 14, 2024

@sujeendran https://www.tensorflow.org/lite/performance/post_training_quantization
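
Following that guide and the TFLITE_BUILTINS_INT8 tip above, a minimal sketch of full-integer post-training quantization with a representative dataset might look like this; the SavedModel path, the single [1, 50] int32 input, and the random calibration data are assumptions (a real FastSpeech model has several inputs, and each must be yielded by the dataset function).

import numpy as np
import tensorflow as tf

# A minimal sketch of full-integer post-training quantization;
# "fastspeech_saved_model" and the input shape are assumptions.
def representative_dataset():
    for _ in range(100):
        # Yield calibration samples matching the model's input signature.
        yield [np.random.randint(0, 100, size=(1, 50), dtype=np.int32)]

converter = tf.lite.TFLiteConverter.from_saved_model("fastspeech_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()

with open("fastspeech_int8.tflite", "wb") as f:
    f.write(tflite_model)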
