
Comments (15)

FSet89 commented on July 19, 2024

Did you find an answer to this problem?


rajeev-tbrew commented on July 19, 2024

@FSet89 No, I have not been able to find a solution. I saw that you have raised a similar issue where the difference is between CPU and NNPI. At least the issue I reported is somewhat validated :).


FSet89 commented on July 19, 2024

Yes, in my case the problem might be related to the quantization, as the inference device (NPU) may have some precision issues depending on how the quantization is performed (symmetric/asymmetric, per-channel/per-axis, int8/uint8...). Are you using a quantized model?
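
As a quick check, something like the following sketch prints each input tensor's quantization parameters via the plain TFLite Interpreter API, which shows how the model was quantized (the model path and `context` are placeholders):

    // Minimal sketch: print each input tensor's quantization parameters.
    // "model.tflite" and `context` are placeholders, not from this thread.
    import org.tensorflow.lite.Interpreter;
    import org.tensorflow.lite.Tensor;
    import org.tensorflow.lite.support.common.FileUtil;

    Interpreter interpreter =
            new Interpreter(FileUtil.loadMappedFile(context, "model.tflite"));
    for (int i = 0; i < interpreter.getInputTensorCount(); i++) {
        Tensor t = interpreter.getInputTensor(i);
        // Float tensors report scale 0 and zeroPoint 0; quantized tensors
        // report a non-zero scale (and, for asymmetric, a non-zero zeroPoint).
        System.out.printf("input %d: type=%s scale=%f zeroPoint=%d%n",
                i, t.dataType(),
                t.quantizationParams().getScale(),
                t.quantizationParams().getZeroPoint());
    }
    interpreter.close();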


rajeev-tbrew commented on July 19, 2024

Yes, it is a quantized model, but I feel the implementation should be consistent, or at least we should know how to make it behave similarly. I think this library is currently evolving and we should have more info coming soon.


wangtz commented on July 19, 2024

@lu-wang-g


lu-wang-g commented on July 19, 2024

@rajeev-tbrew, can you please provide the code snippet that creates the TensorBuffer and passes it to Model? Thanks!

@xunkai55 to investigate this further.


xunkai55 commented on July 19, 2024

Thanks for your nice words!

I'd like to know if there's any possibility of getting a model and a piece of data for debugging purposes, so that we can reproduce the problem on our side. In theory, GPU / CPU inference should be consistent.


rajeev-tbrew commented on July 19, 2024

@xunkai55 sorry for the delayed response. Here is how to reproduce it.

The model is the MediaPipe pose detector available at this URL.

To use the GPU, I used Kotlin to set the options with the following code:

    // Pick the GPU delegate if this device supports it; otherwise fall
    // back to CPU inference with 4 threads.
    val compatibilityList = CompatibilityList()
    val options = if (compatibilityList.isDelegateSupportedOnThisDevice) {
        Log.d("Output", "This device is GPU compatible")
        Model.Options.Builder().setDevice(Model.Device.GPU).build()
    } else {
        Log.d("Output", "This device is not GPU compatible")
        Model.Options.Builder().setNumThreads(4).build()
    }

    // Lazily create the model with the chosen options.
    val poseDetection: PoseDetectionMl by lazy {
        PoseDetectionMl.newInstance(context, options)
    }

    fun startPoseDetector(): PoseDetectionMl {
        return poseDetection
    }

In Java, the following code takes the image and prepares it for model input:

    // Resize, center-crop to 224x224, and normalize the image.
    // NormalizeOp(127.5f, 127.5f) applies (x - 127.5) / 127.5 per channel,
    // mapping pixel values from [0, 255] to [-1, 1].
    ImageProcessor imageProcessor = new ImageProcessor.Builder()
            .add(new ResizeOp(new_height, new_width, ResizeOp.ResizeMethod.BILINEAR))
            .add(new ResizeWithCropOrPadOp(224, 224))
            .add(new NormalizeOp(127.5f, 127.5f))
            .build();

    // Load the bitmap into a TensorImage and run the preprocessing pipeline.
    TensorImage tImage = TensorImage.fromBitmap(img);
    tImage = imageProcessor.process(tImage);

The following code then runs the model prediction and gets the boxes and scores tensors:

    PoseDetectionMl.Outputs outputs = model.process(tImage.getTensorBuffer());
    TensorBuffer boxes = outputs.getOutputFeature0AsTensorBuffer(); // 2254 ROIs
    TensorBuffer scores = outputs.getOutputFeature1AsTensorBuffer();
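
For reference, this is roughly how the two runs can be compared (a sketch of my own; `cpuScores` and `gpuScores` are hypothetical names for the buffers produced by the CPU and GPU runs):

    // Hypothetical helper: report the largest element-wise difference
    // between the CPU and GPU outputs.
    static float maxAbsDiff(float[] cpu, float[] gpu) {
        if (cpu.length != gpu.length) {
            throw new IllegalArgumentException("Output sizes differ");
        }
        float max = 0f;
        for (int i = 0; i < cpu.length; i++) {
            max = Math.max(max, Math.abs(cpu[i] - gpu[i]));
        }
        return max;
    }

    // Usage: TensorBuffer.getFloatArray() flattens the tensor to a float[].
    // float diff = maxAbsDiff(cpuScores.getFloatArray(), gpuScores.getFloatArray());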

I am sharing the input image that I used for both the GPU and CPU runs, along with the box and score outputs. Please let me know if you need any additional information. Thanks a lot for your help.

output_cpu_boxes.txt
output_cpu_scores.txt
output_gpu_boxes.txt
output_gpu_scores.txt
yoga-6c


xunkai55 commented on July 19, 2024

Thanks for reporting. I can reproduce an inconsistency on my machine and a Pixel 4. I will investigate further.


xunkai55 commented on July 19, 2024

Hi there,

TFLite by default turns on a switch for GPU inference that allows some precision loss in exchange for faster inference.

TFLite Support (org.tensorflow.lite.support.model.Model) adopts the default settings, so that switch is on. In a very simple demo, I turned off that switch (by using the TFLite Interpreter / delegate API directly) and the results then look identical.

Please take a look.

In the future, we need to revisit our API to explore ways to expose that option.

GpuDiffers.zip


xunkai55 commented on July 19, 2024

@rajeev-tbrew Here's how to replace Model with Interpreter (copy-pasted from the zipped project above):

    // Initialization: build an Interpreter with a GPU delegate that has
    // precision loss disabled, so results match CPU inference.
    Interpreter.Options gpuOptions = new Interpreter.Options();
    GpuDelegate.Options gpuDelegateOptions = new GpuDelegate.Options();
    gpuDelegateOptions.setPrecisionLossAllowed(false);
    GpuDelegate gpuDelegate = new GpuDelegate(gpuDelegateOptions);
    gpuOptions.addDelegate(gpuDelegate);
    Interpreter gpuInterpreter = new Interpreter(
            FileUtil.loadMappedFile(getApplicationContext(), MODEL), gpuOptions);

    // Inference: map each output index to a pre-allocated buffer.
    Map<Integer, Object> outputs = new HashMap<Integer, Object>();
    outputs.put(0, detection);
    outputs.put(1, score);
    gpuInterpreter.runForMultipleInputsOutputs(new Object[] {input.getBuffer()}, outputs);
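
One addition that is not in the zipped project: the interpreter and delegate hold native resources, so they should be released when inference is finished:

    // Cleanup: release native resources when done.
    gpuInterpreter.close();
    gpuDelegate.close();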


rajeev-tbrew commented on July 19, 2024

Thank you @xunkai55. It would be great to have this option exposed in the support library, as it makes using TFLite models quite simple. Please feel free to close this issue if you want to track this feature request in a separate thread, or keep it open until this option is made available in tflite-support. Thanks once again for your help.


mikaraento commented on July 19, 2024

We are looking into the differences here; however, I wanted to help set expectations: GPU inference will give different answers from CPU. Whether the difference is too large compared to the performance benefit depends on the use case, and we recommend testing your specific model and use case before production.

As an example, internal pose detection models have significantly different results on GPU vs. CPU even though they are floating-point models. In our testing, however, the difference is acceptable for our specific use cases.

If you need a systematic way to monitor for differences between accelerators for your model, please take a look at the mini-benchmark in https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/acceleration/mini_benchmark


rajeev-tbrew commented on July 19, 2024

@mikaraento Thanks for your input. I have a slightly different view (which could be entirely wrong as well).

The model output in this case should not depend on where we run it, i.e. CPU or GPU. The reason it differs is that the GPU is being run in a setup where precision loss is accepted to speed up inference. If we remove that setting, the GPU and CPU give the exact same outputs. That's what @xunkai55 meant in his post by using the other API (Interpreter), which lets us run the model without precision loss on the GPU and get the same results as on the CPU.


lu-wang-g commented on July 19, 2024

Closing the issue for now. Feel free to reopen if you have further questions.

