Comments (12)

dathudeptrai avatar dathudeptrai commented on May 14, 2024 1

@sujeendran if you can show that SqueezeWave performs better than MB-MelGAN, I will implement it :)). There is no reason to add a new model to this framework if it is neither faster nor stronger than what is already there. From listening to the audio samples and skimming the paper, I don't think SqueezeWave beats MB-MelGAN on either inference time or quality.

sujeendran avatar sujeendran commented on May 14, 2024 1

@dathudeptrai I will need some time to test MB-MelGAN on my target platform. I suggested SqueezeWave mostly for its speed and the possibility of running on CPU on resource-restricted edge devices; TFLite and TF Micro are favorable for such solutions. In my case, I was able to run a combined FastSpeech + SqueezeWave synthesis on the Jetson Nano platform in 0.5 seconds with PyTorch. The quality was not bad, but could have been better. Will update here if I'm successful with MB-MelGAN.

sujeendran avatar sujeendran commented on May 14, 2024 1

@manmay-nakhashi Thanks for the tip, I will try that. I'm not at liberty to share the complete C++ code, but I can share a minimal MB-MelGAN inference sample that assumes the interpreter is already loaded. The same pattern can be used for FastSpeech; you just need to fill the other input tensor buffers as well, and the input tensor will be of type int32_t (a Python sketch of the same flow follows the C++ sample below).
Hope this helps!

// MB-MelGAN
// Input signature  -> [1, -1, 80] float32 (mel spectrogram)
// Output signature -> [1, -1, 1]  float32 (audio samples)
// Assumes a built tflite::Interpreter named `interpreter`, with
// inputs = interpreter->inputs(), outputs = interpreter->outputs(),
// and an int `currentDim` caching the last input length.
void infer(float *inputtensor, int N, float *&output, int &outsize)
{
  // Resize and reallocate tensor buffers only if the input dimension has changed.
  if (currentDim != N)
  {
    const std::vector<int> newDim{1, N, 80};
    interpreter->ResizeInputTensor(0, newDim);
    // Allocate tensor buffers.
    interpreter->AllocateTensors();
    currentDim = N;
  }

  // Fill input buffers.
  float *inputptr = interpreter->typed_tensor<float>(inputs[0]);
  memcpy((void *)inputptr, inputtensor, sizeof(float) * N * 80);

  // Run inference.
  interpreter->Invoke();

  // Read output buffers.
  TfLiteIntArray *output_dims = interpreter->tensor(outputs[0])->dims;
  int output_size = output_dims->data[output_dims->size - 2];
  printf("Output shape: [1 %d 1]\n", output_size);

  // Note: the output pointer refers to memory owned by the interpreter;
  // it is invalidated by the next AllocateTensors()/Invoke() call.
  float *outputptr = interpreter->typed_tensor<float>(outputs[0]);
  output = outputptr;
  outsize = output_size;
}

EDIT: Just removed the kTfLiteOk checks on the AllocateTensors and Invoke calls; they were part of an error-check function call I forgot to remove before posting.
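
For anyone prototyping in Python first, here is a minimal sketch of the same resize → fill → invoke flow using tf.lite.Interpreter. The model path and the assumption that the first input holds int32 token ids (as for FastSpeech) are hypothetical, not part of the code above.

import numpy as np
import tensorflow as tf

# A minimal sketch of the same resize -> fill -> invoke flow in Python.
# "fastspeech.tflite" and the int32 token-id input are assumptions.
interpreter = tf.lite.Interpreter(model_path="fastspeech.tflite")
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

input_ids = np.array([[10, 23, 5, 42, 7]], dtype=np.int32)  # dummy token ids

# Resize the variable-length input to the current sequence length, then allocate.
interpreter.resize_tensor_input(input_details[0]["index"], input_ids.shape)
interpreter.allocate_tensors()

# FastSpeech-style models have extra inputs (e.g. speaker id, speed ratio);
# fill every remaining input with a placeholder of the dtype it reports.
interpreter.set_tensor(input_details[0]["index"], input_ids)
for detail in input_details[1:]:
    placeholder = np.ones(detail["shape"], dtype=detail["dtype"])
    interpreter.set_tensor(detail["index"], placeholder)

interpreter.invoke()
mel = interpreter.get_tensor(output_details[0]["index"])
print(mel.shape)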

dathudeptrai avatar dathudeptrai commented on May 14, 2024

@sujeendran

manmay-nakhashi avatar manmay-nakhashi commented on May 14, 2024

@dathudeptrai @sujeendran It is fast, but the audio quality is not so good.
Intel® Core™ i5-6300U CPU

example 1

taskset --cpu-list 1 python3 synthesis.py "Fastspeech with Squeezewave vocoder in pytorch , very fast inference on cpu"

Speech synthesis time: 1.7220683097839355

soxi out:
Input File : 'results/Fastspeech with Squeezewave vocoder in pytorch , very fast inference on cpu_112000_squeezewave.wav'
Channels : 1
Sample Rate : 22050
Precision : 16-bit
Duration : 00:00:05.96 = 131328 samples ~ 446.694 CDDA sectors
File Size : 263k
Bit Rate : 353k
Sample Encoding: 16-bit Signed Integer PCM
Approx. 6 sec of audio generated in 1.72 sec on a single CPU core.

example 2

taskset --cpu-list 0 python3 synthesis.py "How are you"

Speech synthesis time: 0.3431851863861084

soxi out:
Input File : 'results/How are you _112000_squeezewave.wav'
Channels : 1
Sample Rate : 22050
Precision : 16-bit
Duration : 00:00:00.85 = 18688 samples ~ 63.5646 CDDA sectors
File Size : 37.4k
Bit Rate : 353k
Sample Encoding: 16-bit Signed Integer PCM
0.85 sec of audio generated in 0.34 sec on a single CPU core.

dathudeptrai avatar dathudeptrai commented on May 14, 2024

@sujeendran any update?

sujeendran avatar sujeendran commented on May 14, 2024

@dathudeptrai Hi, I haven't worked on SqueezeWave for a while, as I have been working on TFLite C++ inference for FastSpeech and MB-MelGAN. As manmay noted, the quality of SqueezeWave is not as good as MB-MelGAN, but in my tests on the Jetson Nano it is definitely faster on CPU/GPU with PyTorch than FastSpeech+MB-MelGAN on CPU/GPU with TensorFlow 2.x. On the Jetson, the TensorFlow 2.x pipeline above takes 2+ seconds on GPU (even after warmup) for tiny sentences; the CPU runs faster, but inference time increases linearly with sentence length. The PyTorch GPU implementation of FastSpeech+SqueezeWave, by contrast, does this in ~0.5 seconds irrespective of sentence length and with no warmup.

dathudeptrai avatar dathudeptrai commented on May 14, 2024

@sujeendran On the Jetson, I think you can run inference directly by installing our framework, without converting to a SavedModel (.pb) or TFLite; I noticed that running inference with @tf.function and an input_signature needs no warmup compared with a .pb. Overall, I think FastSpeech + MB-MelGAN is fast enough to run in real time in streaming mode. BTW, did you use 8-bit or 32-bit for TFLite? And is the Jetson Nano ARM?
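
For illustration, a minimal sketch of what running inference through @tf.function with a fixed input_signature looks like; the Vocoder wrapper and the [1, None, 80] mel shape are assumptions, not TensorFlowTTS code.

import tensorflow as tf

# A minimal sketch of tracing inference once with a fixed input_signature so
# variable-length inputs reuse one concrete function instead of retracing.
# `model` stands in for an MB-MelGAN-style Keras model and is an assumption.
class Vocoder(tf.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    @tf.function(input_signature=[tf.TensorSpec(shape=[1, None, 80], dtype=tf.float32)])
    def infer(self, mel):
        # mel: [1, frames, 80] spectrogram -> [1, samples, 1] audio
        return self.model(mel)

# Hypothetical usage:
# vocoder = Vocoder(mb_melgan)
# audio = vocoder.infer(tf.random.normal([1, 200, 80]))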

sujeendran avatar sujeendran commented on May 14, 2024

@dathudeptrai You are right about using @tf.function directly on the Jetson for faster inference, but I was trying to reduce the size of the model files and avoid keeping the source code on the target device. GPU inference still takes at least 2 seconds, and I need something below 1 second.
In the case of TFLite, setting the supported type to tf.float16 increased the speed by around 16x, I would say. But I couldn't do the same with FastSpeech; the conversion to TFLite failed when I set the supported type to tf.float16. The Jetson Nano is ARM64. Can you help me out with 8-bit TFLite as you mentioned?
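
For reference, a minimal sketch of the float16 post-training quantization described above, using the standard TFLite converter; the SavedModel path and output filename are placeholders.

import tensorflow as tf

# A minimal sketch of float16 post-training quantization;
# "mb_melgan_saved_model" is a hypothetical SavedModel path.
converter = tf.lite.TFLiteConverter.from_saved_model("mb_melgan_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

with open("mb_melgan_fp16.tflite", "wb") as f:
    f.write(tflite_model)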

manmay-nakhashi avatar manmay-nakhashi commented on May 14, 2024

@sujeendran use TFLITE_BUILTINS_INT8 as the op set during TFLite conversion. Also, can you share your C++ inference code?

sujeendran avatar sujeendran commented on May 14, 2024

@manmay-nakhashi can you show your code for the INT8 conversion of the FastSpeech model? I tried several configurations but couldn't get INT8 to work. Did you provide a representative dataset while converting? And how is the inference quality with INT8?

dathudeptrai avatar dathudeptrai commented on May 14, 2024

@sujeendran https://www.tensorflow.org/lite/performance/post_training_quantization
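
Following that guide and the TFLITE_BUILTINS_INT8 tip above, a minimal sketch of full-integer post-training quantization with a representative dataset might look like this; the SavedModel path, the single [1, 50] int32 input, and the random calibration data are assumptions (a real FastSpeech model has several inputs, and each must be yielded by the dataset function).

import numpy as np
import tensorflow as tf

# A minimal sketch of full-integer post-training quantization;
# "fastspeech_saved_model" and the input shape are assumptions.
def representative_dataset():
    for _ in range(100):
        # Yield calibration samples matching the model's input signature.
        yield [np.random.randint(0, 100, size=(1, 50), dtype=np.int32)]

converter = tf.lite.TFLiteConverter.from_saved_model("fastspeech_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()

with open("fastspeech_int8.tflite", "wb") as f:
    f.write(tflite_model)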
