
depth-anything-android's Introduction

Depth-Anything V1/V2 - Android Demo

An Android app that runs inference with the popular Depth-Anything model, which is used for monocular depth estimation

(App screenshots: app_img_01, app_img_02, app_img_03)

Depth-Anything V1 vs. V2

(Comparison of V1 and V2 depth maps: v1_v2_compare)

Updates

Project Setup

  1. Clone the repository, and open the resulting directory in Android Studio
$> git clone --depth=1 https://github.com/shubham0204/Depth-Anything-Android
  2. Download the ONNX models from the models release and place them in the app/src/main/assets directory. The models are used by ONNX Runtime's OrtSession to load the computation graph and parameters in memory.

Depth-Anything V1: Any one of the following models can be placed in the assets directory:

  • model.onnx: the base Depth-Anything model
  • model_fp16.onnx: float16 quantized version of model.onnx

Depth-Anything V2: Check the models-v2 release to download the models. The models come in two input sizes, 512 and 256. Models suffixed with _256 take a 256 * 256 image as input.

In DepthAnything.kt, make the following changes to inputDim and outputDim, along with the name of the model passed to context.assets.open:

class DepthAnything(context: Context) {

    private val ortEnvironment = OrtEnvironment.getEnvironment()
    private val ortSession =
        ortEnvironment.createSession(context.assets.open("fused_model_uint8_256.onnx").readBytes())
    private val inputName = ortSession.inputNames.iterator().next()

    // For '_256' suffixed models
    private val inputDim = 256
    private val outputDim = 252
    // For other models
    // private val inputDim = 512
    // private val outputDim = 504

    // Other methods...
}
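
If you are unsure which dimensions a downloaded model actually expects, the session itself can report them. The helper below is a minimal sketch (it is not part of the app, and the function name is arbitrary) that uses the ONNX Runtime Java API to print the input and output tensor shapes of the loaded OrtSession; dynamic dimensions are reported as -1.

import ai.onnxruntime.OrtSession
import ai.onnxruntime.TensorInfo

fun logModelShapes(ortSession: OrtSession) {
    // inputInfo/outputInfo map each tensor name to a NodeInfo; for tensors, NodeInfo.info is a TensorInfo
    ortSession.inputInfo.forEach { (name, nodeInfo) ->
        println("Input  '$name' shape: ${(nodeInfo.info as TensorInfo).shape.joinToString()}")
    }
    ortSession.outputInfo.forEach { (name, nodeInfo) ->
        println("Output '$name' shape: ${(nodeInfo.info as TensorInfo).shape.joinToString()}")
    }
}

The printed shapes can then be compared against the inputDim and outputDim values above.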
  3. Connect a device to Android Studio, and select Run Application from the top navigation pane.

Useful Resources

Note

The app contains an ONNX model which was created by combining the pre/post-processing operations required by Depth-Anything into a single model. To learn more about how the model was built, refer to this notebook.
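
Because the resizing/normalization and output scaling live inside the ONNX graph itself, the calling code only needs to hand over an image and read back the depth map. The snippet below is a rough usage sketch rather than the app's actual code: predict and its Bitmap-in/Bitmap-out shape are assumptions based on the DepthAnything class referenced in the stack traces further down this page, so check DepthAnything.kt for the exact signature.

// Hypothetical usage sketch from an Activity; inputBitmap and depthImageView are placeholders.
val depthAnything = DepthAnything(this)
lifecycleScope.launch {
    // No manual preprocessing here: the fused ONNX graph resizes and normalizes internally.
    val depthMap = depthAnything.predict(inputBitmap)
    depthImageView.setImageBitmap(depthMap)
}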

Paper Summary

Depth Anything V1

  • An MDE model trained on labeled data is used to annotate unlabeled images (62M) during training (semi-supervised learning, also called self-training or pseudo-labelling)
  • A teacher model is trained on labeled images and then used to annotate the unlabeled images. The student model is trained on all images (labeled + teacher-annotated)
  • No performance gain was observed from this alone, hence a more difficult optimization target was introduced for the student model: unlabeled images are perturbed with (1) strong color distortions and (2) CutMix (mostly used in image classification)
  • Semantic-assisted perception: improve depth estimation with an auxiliary semantic segmentation task, using one shared encoder and two separate decoders

Citation

@misc{yang2024depth,
      title={Depth Anything V2}, 
      author={Lihe Yang and Bingyi Kang and Zilong Huang and Zhen Zhao and Xiaogang Xu and Jiashi Feng and Hengshuang Zhao},
      year={2024},
      eprint={2406.09414},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
@article{depthanything,
      title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, 
      author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
      journal={arXiv:2401.10891},
      year={2024}
}
@misc{oquab2023dinov2,
  title={DINOv2: Learning Robust Visual Features without Supervision},
  author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
  journal={arXiv:2304.07193},
  year={2023}
}

depth-anything-android's People

Contributors

shubham0204


depth-anything-android's Issues

Quantized model error: Non-zero status code returned while running Resize node

Hi, thanks for your work, it's a great app.
I have a question though. The first time, I copied both models into assets and it worked, but it seemed that the non-quantized model was the one running. So I removed model.onnx, kept model_fp16.onnx and renamed it to model.onnx, but now I get this error (the app crashes when it occurs):
[E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Resize node. Name:'pre-/Resize' Status Message: upsamplebase.h:369 ScalesValidation 'Cubic' mode only support 2-D inputs ('Bicubic') or 4-D inputs with the corresponding outermost 2 scale values being 1 in the Resize operator
FATAL EXCEPTION: DefaultDispatcher-worker-2
Process: com.ml.shubham0204.depthanything, PID: 23065
ai.onnxruntime.OrtException: Error code - ORT_FAIL - message: Non-zero status code returned while running Resize node. Name:'pre-/Resize' Status Message: upsamplebase.h:369 ScalesValidation 'Cubic' mode only support 2-D inputs ('Bicubic') or 4-D inputs with the corresponding outermost 2 scale values being 1 in the Resize operator
    at ai.onnxruntime.OrtSession.run(Native Method)
    at ai.onnxruntime.OrtSession.run(OrtSession.java:395)
    at ai.onnxruntime.OrtSession.run(OrtSession.java:242)
    at ai.onnxruntime.OrtSession.run(OrtSession.java:210)
    at com.ml.shubham0204.depthanything.DepthAnything$predict$2.invokeSuspend(DepthAnything.kt:44)
    at com.ml.shubham0204.depthanything.DepthAnything$predict$2.invoke(Unknown Source:8)
    at com.ml.shubham0204.depthanything.DepthAnything$predict$2.invoke(Unknown Source:4)
    at kotlinx.coroutines.intrinsics.UndispatchedKt.startUndispatchedOrReturn(Undispatched.kt:78)
    at kotlinx.coroutines.BuildersKt__Builders_commonKt.withContext(Builders.common.kt:167)
    at kotlinx.coroutines.BuildersKt.withContext(Unknown Source:1)
    at com.ml.shubham0204.depthanything.DepthAnything.predict(DepthAnything.kt:37)
    at com.ml.shubham0204.depthanything.MainActivity$ImageSelectionUI$pickMediaLauncher$1$1.invokeSuspend(MainActivity.kt:104)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
    Suppressed: kotlinx.coroutines.internal.DiagnosticCoroutineContextException: [StandaloneCoroutine{Cancelling}@13f19ec, Dispatchers.Default]

Inference latency

Thank you for the great project! Could you please provide some information about inference latency?

NNAPIFlags for Specifying CPU or GPU Inference Do Not Take Effect

  • ONNX Runtime version: 1.17.0
  • Android version: 10
  • Kotlin version: 1.9.22
  • JAVA version: 2.1
  • SDK Build-Tools: 33.0.1
  • AGP: 8.1.3
  • GPU: ARM Mali GPU | G310
  • CPU: ARMv8 Processor rev 4 (v8l)

Steps to Reproduce:

I'm trying to configure the ONNX Runtime session within the DepthAnything class to use NNAPI with specific flags (e.g. USE_FP16 and CPU_DISABLED). Here's the code snippet for setting up the session options:

    private val ortEnvironment = OrtEnvironment.getEnvironment()
    private val ortSession: OrtSession
    private val inputName: String

    init {
        // Create session options with the NNAPI execution provider enabled
        val options = OrtSession.SessionOptions().apply {
            addNnapi(EnumSet.of(NNAPIFlags.USE_FP16, NNAPIFlags.CPU_DISABLED))
        }
        val modelByteArray = context.assets.open("depth_anything_small_fp16.onnx").readBytes()
        ortSession = ortEnvironment.createSession(modelByteArray, options)
        inputName = ortSession.inputNames.iterator().next()
    }

Expected Behavior:
I expected the model inference to run using NNAPI with FP16 precision and without using the CPU.

Actual Behavior:
The inference seems to run as if these options were not applied at all. The performance and behavior do not change regardless of the flags set.
