vision_audio_and_multimodal_projects's Introduction

Computer Vision, Audio, & Multimodal Projects

This repository houses projects on semi-structured and unstructured data that were not completed using Spark and are not Natural Language Processing (NLP) projects.

Binary Image Classification (Computer Vision)

| Project Name | Accuracy | F1-Score | Precision | Recall |
|---|---|---|---|---|
| Bart vs Homer | 0.9863 | 0.9841 | 0.9688 | 1.0 |
| Brain Tumor MRI Images | 0.9216 | 0.9375 | 0.8824 | 1.0 |
| COVID19 Lung CT Scans | 0.94 | 0.9379 | 0.9855 | 0.8947 |
| Car or Motorcycle | 0.9938 | 0.9939 | 0.9951 | 0.9927 |
| Dogs or Cats Image Classification | 0.99 | 0.9897 | 0.9885 | 0.9909 |
| Male or Female Eyes | 0.9727 | 0.9741 | 0.9818 | 0.9666 |
| Breast Histopathology Image Classification | 0.8202 | 0.8151 | 0.8141 | 0.8202 |

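
As a quick sanity check on the table above, the sketch below recomputes F1 from the listed precision and recall, assuming the standard binary definition (the harmonic mean of the two). The helper name `f1_score` is made up for illustration and is not taken from the project notebooks. Because the table values are rounded to four decimals, the recomputed figures should only be expected to agree to about three decimals, and a row reported with a different averaging scheme would not necessarily satisfy the identity.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from three rows of the table.
rows = {
    "Bart vs Homer": (0.9688, 1.0, 0.9841),
    "Brain Tumor MRI Images": (0.8824, 1.0, 0.9375),
    "COVID19 Lung CT Scans": (0.9855, 0.8947, 0.9379),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f1_score(p, r):.4f}, reported {reported}")
```
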
Multiclass & Multilabel Image Classification

Multiclass Image Classification

| Project Name | Accuracy | Macro F1-Score | Macro Precision | Macro Recall | Best Algorithm |
|---|---|---|---|---|---|
| Brain Tumors Image Classification [1] | 0.8198 | 0.8054 | 0.8769 | 0.8149 | Vision Transformer (ViT) |
| Diagnoses from Colonoscopy Images | 0.9375 | 0.9365 | 0.9455 | 0.9375 | - |
| Human Activity Recognition | 0.8381 | 0.8394 | 0.8424 | 0.839 | - |
| Intel Image Classification | 0.9487 | 0.9497 | 0.9496 | 0.95 | - |
| Landscape Recognition | 0.8687 | 0.8694 | 0.8714 | 0.8687 | - |
| Lung & Colon Cancer | 0.9994 | 0.9994 | 0.9994 | 0.9994 | - |
| Mango Leaf Disease Dataset | 1.0 | 1.0 | 1.0 | 1.0 | - |
| Simpsons Family Images | 0.953 | 0.9521 | 0.9601 | 0.9531 | - |
| Vegetable Image Classification | 1.0 | 1.0 | 1.0 | 1.0 | - |
| Weather Images | 0.934 | 0.9372 | 0.9398 | 0.9354 | - |
| Hyper Kvasir Labeled Image Classification | 0.8756 | 0.5778 | 0.5823 | 0.5746 | - |

Multilabel Image Classification

| Project Name | Subset Accuracy | F1 Score | ROC AUC |
|---|---|---|---|
| Futurama - ML Image CLF | 0.9672 | 0.9818 | 0.9842 |

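
For readers unfamiliar with the first metric, here is a minimal sketch of subset accuracy, assuming it means the exact-match ratio (a sample counts as correct only if every label matches), which is how scikit-learn's `accuracy_score` treats multilabel input. The function name and the toy label vectors are hypothetical and are not drawn from the Futurama project.

```python
from typing import Sequence


def subset_accuracy(y_true: Sequence[Sequence[int]],
                    y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose full label vector is predicted exactly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Toy data: three images, three possible labels each (made up).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]  # second image has one wrong label

print(subset_accuracy(y_true, y_pred))  # two of three rows match exactly
```

A single wrong label makes the whole sample count as a miss, which is why subset accuracy is usually the strictest of the multilabel metrics.
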
Object Detection (Computer Vision)

| Project Name | Avg. Precision [2] | Avg. Recall [3] |
|---|---|---|
| License Plate Object Detection | 0.513 | 0.617 |
| Pedestrian Object Detection | 0.560 | 0.745 |
| ACL X-Rays | 0.09 | 0.308 |
| Abdomen MRIs | 0.453 | 0.715 |
| Axial MRIs | 0.284 | 0.566 |
| Blood Cell Object Detection | 0.344 | 0.448 |
| Brain Tumors | 0.185 | 0.407 |
| Cell Tower Object Detection | 0.287 | 0.492 |
| Stomata Cells | 0.340 | 0.547 |
| Excavator Object Detection | 0.386 | 0.748 |
| Forklift Object Detection | 0.136 | 0.340 |
| Hard Hat Object Detection | 0.346 | 0.558 |
| Liver Disease Object Detection | 0.254 | 0.552 |

  • Additional Object Detection projects are posted in the 'Trained, But Not To Standard' subdirectory. Their code is complete, but due to constraints, training them fully would take an unreasonably long time, so their metrics are weaker than those listed here.

Image Segmentation (Computer Vision)

| Project Name | Mean IoU | Mean Accuracy | Overall Accuracy | Use PEFT? |
|---|---|---|---|---|
| Carvana Image Modeling | 0.9917 | 0.9962 | 0.9972 | Yes |
| Dominoes | 0.9198 | 0.9515 | 0.9778 | Yes |
| CMP Facade (V2) | 0.3102 | 0.4144 | 0.6267 | Yes |

  • Additional Image Segmentation projects are posted in the 'Trained, But Not To Standard' subdirectory. Their code is complete, but due to constraints, training them fully would take an unreasonably long time, so their metrics are weaker than those listed here.

Document AI Projects

Multiclass Classification

| Project Name | Accuracy | Macro F1 Score | Macro Precision | Macro Recall |
|---|---|---|---|---|
| Document Classification - Desafio_1 | 0.9865 | 0.9863 | 0.9870 | 0.9861 |
| Document Classification RVL-CDIP | 0.9767 | 0.9154 | 0.9314 | 0.9019 |
| Real World Documents Collections | 0.767 | 0.7704 | 0.7767 | 0.7707 |
| Real World Documents Collections_v2 | 0.826 | 0.8242 | 0.8293 | 0.8237 |
| Tobacco-Related Documents | 0.7532 | 0.722 | - | - |
| Tobacco-Related Documents_v2 | 0.8666 | 0.8308 | - | - |
| Tobacco-Related Documents_v3 | 0.9419 | 0.9278 | - | - |

Audio Projects

| Project Name | Project Type |
|---|---|
| Vinyl Scratched or Not | Binary Audio Classification |
| Audio-Drum Kit Sounds | Multiclass Audio Classification |
| Speech Emotion Detection | Emotion Detection |
| Toronto Emotional Speech Set (TESS) | Emotion Detection |
| ASR Speech Recognition Dataset | Automatic Speech Recognition |

Optical Character Recognition Projects

| Project Name | CER [4] |
|---|---|
| 20,000 Synthetic Samples Dataset | 0.0029 |
| Captcha | 0.0075 |
| Handwriting Recognition (v1) | 0.0533 |
| Handwriting Recognition (v2) | 0.0360 |
| OCR License Plate Text Recognition | 0.0368 |
| Tesseract E13B | 0.0036 |
| Tesseract CMC7 | 0.0050 |

Footnotes

  1. This project is part of a transformer comparison.

  2. Average Precision (AP) @[IoU=0.50:0.95 | area=all | maxDets=100]

  3. Average Recall (AR) @[IoU=0.50:0.95 | area=all | maxDets=100]

  4. CER stands for Character Error Rate.
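
To make footnote 4 concrete, the sketch below computes Character Error Rate under the common definition: the Levenshtein (edit) distance between the reference and predicted strings, divided by the reference length. This is an illustrative assumption about how CER is usually computed, not a copy of the evaluation code used in these projects; the function names and the sample strings are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # delete from reference
                current[j - 1] + 1,                   # insert into reference
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference string."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up license-plate style example: one wrong character out of seven.
print(character_error_rate("ABC1234", "ABC1284"))
```

Lower is better, and a CER of 0 means every character was recognised correctly. Because insertions are counted, CER can exceed 1.0 when the prediction is much longer than the reference.
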

vision_audio_and_multimodal_projects's People

Contributors

dunnbc22

vision_audio_and_multimodal_projects's Issues

Your Tobacco3482 dataset has 2x3482 examples (Tobacco Dataset & DiT Transformer Project_v3.ipynb)

Your dataset has twice as many examples as the original Tobacco3482 dataset downloaded from Kaggle. When I downloaded the dataset from Kaggle, there was a copy of the Tobacco3482-jpg directory inside the Tobacco3482-jpg directory itself, so it's likely that you had duplicates. Since train_test_split is random, it's likely that you were testing on training data, so your results are unfortunately likely biased.

Edit: I looked at your v2 and it correctly has 3482 examples. So the train-test overlap likely explains the performance improvement.
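
One way to guard against the nested-copy problem described in this issue is to drop byte-identical files before splitting. The sketch below is a suggestion under stated assumptions, not the notebook's actual code: it hashes each file's contents with SHA-256 and keeps one path per hash. The helper name `unique_files` and the demo payloads are hypothetical; only the `Tobacco3482-jpg` directory name comes from the issue.

```python
import hashlib
import tempfile
from pathlib import Path


def unique_files(root: Path, pattern: str = "*.jpg") -> list[Path]:
    """Return one path per distinct file content found under root."""
    first_seen: dict[str, Path] = {}
    for path in sorted(Path(root).rglob(pattern)):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        first_seen.setdefault(digest, path)  # keep the first path per hash
    return list(first_seen.values())


# Demo on a throwaway directory that mimics the nested duplicate copy.
with tempfile.TemporaryDirectory() as tmp:
    outer = Path(tmp) / "Tobacco3482-jpg"
    inner = outer / "Tobacco3482-jpg"
    inner.mkdir(parents=True)
    for name, payload in [("a.jpg", b"scan-a"), ("b.jpg", b"scan-b")]:
        (outer / name).write_bytes(payload)
        (inner / name).write_bytes(payload)  # byte-identical duplicate

    total = len(list(outer.rglob("*.jpg")))
    kept = unique_files(outer)
    print(f"{total} files found, {len(kept)} unique")
```

The deduplicated list can then be passed to `train_test_split`, so the same image cannot land in both the training and test sets. Hashing only catches exact copies; re-encoded or resized duplicates would need a perceptual comparison instead.
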

Question regarding Speech Emotion Detection with RAVDESS

Hey, nice work on Speech Emotion Detection with Crema and TESS. I'm curious whether you carried out a similar experiment with RAVDESS as well. I'm having trouble getting the validation loss to decrease on RAVDESS.

Any comments will be helpful. Thanks!
