Hanoona Rasheed's Projects
A collection of papers on transformers for detection and segmentation: Awesome Detection Transformer for Computer Vision (CV).
Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Detectron2 is FAIR's next-generation platform for object detection, segmentation and other visual recognition tasks.
Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".
Official implementation of the paper "DETReg: Unsupervised Pretraining with Region Priors for Object Detection".
Official repository of paper titled "EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications".
Starter app for fastai v3 model deployment on Render
Library for deep learning and machine learning, providing functions for common use cases such as loading datasets, data visualization, data representation, and training, optimizing, and testing models.
All the assignments for the capsule degree.
A task-agnostic vision-language architecture as a step towards General Purpose Vision
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
A collection of succinct guides - Public Domain
A beginners guide to Hangar
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3).
Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
[NeurIPS 2022] Official repository of paper titled "Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection".
Replicates the results of OWOD (Open-World Object Detection).
Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, Bengali and Urdu.
Helps beginners get started with PyTorch by giving a brief introduction to tensors, basic torch operations, and building a neural network model from scratch.
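A minimal sketch of the topics the PyTorch starter above covers: tensors, a few basic operations, and a tiny neural network built from standard `torch.nn` layers. The specific shapes and layer sizes here are illustrative, not taken from the repository.

```python
import torch
import torch.nn as nn

# Tensors and basic operations
x = torch.arange(6, dtype=torch.float32).reshape(2, 3)
y = x * 2 + 1            # elementwise ops broadcast like NumPy
total = y.sum().item()   # extract a Python scalar from a 0-dim tensor

# A tiny fully connected network: 3 inputs -> 4 hidden units -> 1 output
model = nn.Sequential(
    nn.Linear(3, 4),
    nn.ReLU(),
    nn.Linear(4, 1),
)
out = model(x)           # forward pass on a batch of 2 samples, shape (2, 1)
```

From here, a typical tutorial adds a loss function (e.g. `nn.MSELoss`) and an optimizer (e.g. `torch.optim.SGD`) to train the model in a loop.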
Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation.
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models