Computer Vision, Audio, & Multimodal Projects

This repository houses projects on semi-structured and unstructured data that neither use Spark nor are Natural Language Processing (NLP) projects.

Binary Image Classification (Computer Vision)

Project Name	Accuracy	F1-Score	Precision	Recall
Bart vs Homer	0.9863	0.9841	0.9688	1.0
Brain Tumor MRI Images	0.9216	0.9375	0.8824	1.0
COVID19 Lung CT Scans	0.94	0.9379	0.9855	0.8947
Car or Motorcycle	0.9938	0.9939	0.9951	0.9927
Dogs or Cats Image Classification	0.99	0.9897	0.9885	0.9909
Male or Female Eyes	0.9727	0.9741	0.9818	0.9666
Breast Histopathology Image Classification	0.8202	0.8151	0.8141	0.8202

Multiclass & Multilabel Image Classification

Multiclass Image Classification

Project Name	Accuracy	Macro F1-Score	Macro Precision	Macro Recall	Best Algorithm
Brain Tumors Image Classification¹	0.8198	0.8054	0.8769	0.8149	Vision Transformer (ViT)
Diagnoses from Colonoscopy Images	0.9375	0.9365	0.9455	0.9375	-
Human Activity Recognition	0.8381	0.8394	0.8424	0.839	-
Intel Image Classification	0.9487	0.9497	0.9496	0.95	-
Landscape Recognition	0.8687	0.8694	0.8714	0.8687	-
Lung & Colon Cancer	0.9994	0.9994	0.9994	0.9994	-
Mango Leaf Disease Dataset	1.0	1.0	1.0	1.0	-
Simpsons Family Images	0.953	0.9521	0.9601	0.9531	-
Vegetable Image Classification	1.0	1.0	1.0	1.0	-
Weather Images	0.934	0.9372	0.9398	0.9354	-
Hyper Kvasir Labeled Image Classification	0.8756	0.5778	0.5823	0.5746	-

Multilabel Image Classification

Project Name	Subset Accuracy	F1 Score	ROC AUC
Futurama - ML Image CLF	0.9672	0.9818	0.9842
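
For readers unfamiliar with the Subset Accuracy column, the sketch below shows the usual definition: the fraction of samples whose entire label vector is predicted exactly. It is a minimal illustration, not the code used in this project; the function name `subset_accuracy` and the sample labels are hypothetical.

```python
from typing import Sequence


def subset_accuracy(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> float:
    """Fraction of samples whose full label vector matches exactly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of samples")
    if not y_true:
        raise ValueError("at least one sample is required")
    # A sample counts only if every label in its vector is correct.
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Hypothetical multilabel targets: three images, three possible labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
print(subset_accuracy(y_true, y_pred))
```

Because a single wrong label fails the whole sample, this metric is stricter than a per-label F1 score, which is consistent with it being the lowest of the three values in the table above.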

Object Detection (Computer Vision)

Project Name	Avg. Precision²	Avg. Recall³
License Plate Object Detection	0.513	0.617
Pedestrian Object Detection	0.560	0.745
ACL X-Rays	0.09	0.308
Abdomen MRIs	0.453	0.715
Axial MRIs	0.284	0.566
Blood Cell Object Detection	0.344	0.448
Brain Tumors	0.185	0.407
Cell Tower Object Detection	0.287	0.492
Stomata Cells	0.340	0.547
Excavator Object Detection	0.386	0.748
Forklift Object Detection	0.136	0.340
Hard Hat Object Detection	0.346	0.558
Liver Disease Object Detection	0.254	0.552
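
The Avg. Precision and Avg. Recall columns are averaged over IoU thresholds from 0.50 to 0.95 (see the footnotes). As a rough illustration of what that range means, the sketch below computes the IoU of two boxes and lists the thresholds at which a prediction would still count as a match. It is not a full AP/AR computation (which also ranks detections by confidence), and the names `box_iou`, `thresholds_matched`, and the example boxes are hypothetical.

```python
from typing import List, Tuple

# Boxes are assumed to be (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]

# The ten thresholds 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS: List[float] = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def thresholds_matched(pred: Box, truth: Box) -> List[float]:
    """Thresholds at which the predicted box overlaps the truth box enough to match."""
    iou = box_iou(pred, truth)
    return [t for t in IOU_THRESHOLDS if iou >= t]


# Hypothetical ground-truth and predicted boxes.
truth = (0.0, 0.0, 10.0, 10.0)
pred = (1.0, 0.0, 10.0, 8.0)
print(box_iou(pred, truth), thresholds_matched(pred, truth))
```

A prediction that overlaps the ground truth reasonably well can still fail the stricter thresholds, which is one reason scores averaged over 0.50:0.95 tend to look low next to classification accuracies.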

Additional Object Detection projects are posted in the 'Trained, But Not To Standard' subdirectory. Their code is complete, but constraints meant that fully training them would take an unreasonably long time, so their metrics are weaker than those listed above.

Image Segmentation (Computer Vision)

Project Name	Mean IoU	Mean Accuracy	Overall Accuracy	Use PEFT?
Carvana Image Modeling	0.9917	0.9962	0.9972	Yes
Dominoes	0.9198	0.9515	0.9778	Yes
CMP Facade (V2)	0.3102	0.4144	0.6267	Yes

Additional Image Segmentation projects are posted in the 'Trained, But Not To Standard' subdirectory. Their code is complete, but constraints meant that fully training them would take an unreasonably long time, so their metrics are weaker than those listed above.

Document AI Projects

Multiclass Classification

Project Name	Accuracy	Macro F1 Score	Macro Precision	Macro Recall
Document Classification - Desafio_1	0.9865	0.9863	0.9870	0.9861
Document Classification RVL-CDIP	0.9767	0.9154	0.9314	0.9019
Real World Documents Collections	0.767	0.7704	0.7767	0.7707
Real World Documents Collections_v2	0.826	0.8242	0.8293	0.8237
Tobacco-Related Documents	0.7532	0.722	-	-
Tobacco-Related Documents_v2	0.8666	0.8308	-	-
Tobacco-Related Documents_v3	0.9419	0.9278	-	-

Audio Projects

Project Name	Project Type
Vinyl Scratched or Not	Binary Audio Classification
Audio-Drum Kit Sounds	Multiclass Audio Classification
Speech Emotion Detection	Emotion Detection
Toronto Emotional Speech Set (TESS)	Emotion Detection
ASR Speech Recognition Dataset	Automatic Speech Recognition

Optical Character Recognition Projects

Project Name	CER⁴
20,000 Synthetic Samples Dataset	0.0029
Captcha	0.0075
Handwriting Recognition (v1)	0.0533
Handwriting Recognition (v2)	0.0360
OCR License Plate Text Recognition	0.0368
Tesseract E13B	0.0036
Tesseract CMC7	0.0050

Footnotes:

¹ This project is part of a transformer comparison.
² Average Precision (AP) @[IoU=0.50:0.95 | area=all | maxDets=100]
³ Average Recall (AR) @[IoU=0.50:0.95 | area=all | maxDets=100]
⁴ CER stands for Character Error Rate.
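
As a reference for the CER values in the Optical Character Recognition table, here is a minimal sketch of the common definition: the character-level edit distance between the predicted and reference strings, divided by the reference length. How each project actually computes CER is not stated here; the function names and the sample strings are hypothetical.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete a reference character
                    curr[j - 1] + 1,     # insert a hypothesis character
                    prev[j - 1] + cost,  # substitute or keep
                )
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical license plate text with one misread character.
print(cer("ABC1234", "ABC1284"))
```

Lower is better: a CER of 0.0029 corresponds to roughly three character errors per thousand reference characters.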
