Computer Vision Course

Taught by Prof. Sudeep Sarkar, University of South Florida, Tampa, USA

A series of Colab (Jupyter) notebooks that walk you through the fundamentals of computer vision in Python, using OpenCV, PyTorch, NumPy, and SciPy.

This elective course, part of the broader theme of artificial intelligence (AI), teaches algorithms that extract information from images and video. We will study segmentation, tracking, extraction of geometric transforms from images, estimation of 3D information from one or more 2D images, and object detection and recognition using both traditional and deep learning methods.
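Because the notebooks treat images as NumPy arrays (Module 1.1), the simplest form of segmentation is a single global threshold. The sketch below is illustrative and not taken from the course notebooks: it builds a synthetic 8-bit grayscale image and selects the threshold with Otsu's method, which picks the level that maximizes the between-class variance of the two pixel groups. The function name otsu_threshold and the test image are our own assumptions; OpenCV offers the same rule through cv2.threshold with the cv2.THRESH_OTSU flag.

```python
import numpy as np

def otsu_threshold(img: np.ndarray) -> int:
    """Hypothetical helper: return the level t (0..255) that maximizes the
    between-class variance of an 8-bit image; pixels > t are foreground."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()                 # normalized histogram
    levels = np.arange(256)
    omega = np.cumsum(prob)                  # P(pixel <= t) for each t
    mu = np.cumsum(prob * levels)            # cumulative mean up to t
    mu_total = mu[-1]                        # global mean intensity
    denom = omega * (1.0 - omega)
    # sigma_B^2(t) = (mu_T * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t)));
    # levels where one class is empty (denom == 0) get variance 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b2))

# Synthetic test image (assumption): dark noisy background, bright square object.
rng = np.random.default_rng(0)
img = np.full((64, 64), 50, dtype=np.uint8)
img[16:48, 16:48] = 200
noise = rng.integers(-20, 21, size=img.shape)    # uniform noise in [-20, 20]
img = np.clip(img.astype(int) + noise, 0, 255).astype(np.uint8)

t = otsu_threshold(img)
mask = img > t                                   # binary segmentation
print(t, int(mask.sum()))
```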

Installation

Clone this repository. For step-by-step instructions on cloning a GitHub repository in Colab, see https://www.geeksforgeeks.org/how-to-clone-github-repository-and-push-changes-in-colaboratory/

Course Objectives

  • Solve the core problems in computer vision, including segmentation, low-level features, tracking, 2D and 3D image geometry, structure from motion, stereo, and object recognition.

  • Use traditional approaches and new deep learning-based solutions to solve computer vision problems.

  • Understand and apply the theoretical underpinnings of the major solution approaches.

  • Write computer vision code based on this understanding to solve real-life problems.

Textbooks

  1. Main: Szeliski, Richard (2022) Computer Vision: Algorithms and Applications, second edition, Springer (ISBN 978-3030343712). A PDF of the book (current draft of the second edition) is available free for personal use at http://szeliski.org/Book.

  2. Nalwa, Vishvjit (1993) A Guided Tour of Computer Vision, Addison-Wesley, Reading MA (ISBN 0-201-54853-4).

  3. Jain, Ramesh, Rangachar Kasturi, and Brian G. Schunck (1995) Machine Vision, McGraw-Hill, New York (ISBN 0-07-032018-7).

Course Structure

The course is divided into 9 modules, each covering a topic area in computer vision. Each module is further divided into notebooks, one per class session, with two class sessions per week; the full set of modules is designed for a 14-week semester. Each class session notebook includes reading material and assignments that involve running a variation of the provided code, solving a math problem, or writing explanations.

  • Module 1: Image operations (Chapters 2 and 3)
    • Module 1.1: Digital camera (Section 2.3), images and videos as arrays, image thresholding (text separation, text-scanning apps)
    • Module 1.2: Geometric primitives and transformations: 2D transformations (Section 2.1)
    • Module 1.3: Pixel operations and histogram equalization (Section 3.1)
  • Module 2: Perspective Camera Model (Chapter 2)
    • Module 2.1: Perspective camera, intrinsic and extrinsic parameters, 3D rigid transformations (Section 2.1)
    • Module 2.2: Perspective camera, intrinsic and extrinsic parameters, 3D rigid transformations (Section 2.1)
  • Module 3: Linear filtering (Chapter 3)
    • Module 3.1: Linear filtering, Gaussian convolution, and Gaussian derivatives (Section 3.2)
    • Module 3.2: Linear filtering, Gaussian convolution, and Gaussian derivatives (Section 3.2)
    • Module 3.3: Linear filtering, Gaussian convolution, and Gaussian derivatives (Section 3.2)
  • Module 4: Point features and matching (Chapters 3 and 7)
    • Module 4.1: Multiresolution representations (image pyramids) (Section 3.5)
    • Module 4.2: SIFT feature detector, descriptor, and matching (Section 7.1)
    • Module 4.3: Linear estimation of the 2D-to-2D transformation from matches
    • Module 4.4: Nonlinear estimation of the 2D-to-2D transformation from matches
  • Module 5: Object labeling using Deep Learning (Chapters 5 and 6)
    • Module 5.1: Deep learning basics – single-layer network with regression (Section 5.3)
    • Module 5.2: Multilayer Perceptron (MLP) (Section 5.3)
    • Module 5.3: Convolutional Neural Networks: LeNet
    • Module 5.4: Convolutional Neural Networks: AlexNet (Section 5.4)
    • Module 5.5: Convolutional Neural Networks: VGG, ResNet (Section 5.4)
  • Module 6: Object localization using Deep Learning (Chapters 5 and 6)
    • Module 6.1: Object Detection (SSD) (Sections 6.3, 6.4)
    • Module 6.2: Region-based CNNs: R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN (Section 6.4)
    • Module 6.3: Semantic Segmentation – Fully Convolutional Networks (FCN) (Section 6.4)
  • Module 7: 2D-to-3D pose alignment
    • Module 7.1: 2D-to-3D pose alignment, camera calibration (Section 11.2.1)
    • Module 7.2: 2D-to-3D pose alignment, camera calibration (Section 11.2.2)
  • Module 8: 3D from 2D
    • Module 8.1: 3D from 2D – 2-camera, known geometry, stereo (Sections 12.1, 12.3)
    • Module 8.2: 3D from 2D – multi-camera, known geometry, triangulation (Section 11.2.4)
    • Module 8.3: 3D from 2D – multi-camera, known geometry, triangulation (Section 11.2.4)
    • Module 8.4: 3D from 2D – 2-frame, unknown geometry (motion) (Section 11.3)
  • Module 9: Video processing
    • Module 9.1: Object tracking (Kalman filter)
    • Module 9.2: Object tracking (Kalman filter)
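Modules 9.1 and 9.2 cover object tracking with the Kalman filter. As a minimal sketch, assuming a 1D constant-velocity motion model and noise settings (q, r, and the initial covariance) chosen only for illustration rather than taken from the course notebooks, the predict/update cycle can be written in NumPy as follows. The function name kalman_track is hypothetical.

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=1.0):
    """Hypothetical helper: constant-velocity Kalman filter.
    State x = [position, velocity]; only position is measured.
    q and r are illustrative noise settings (assumption)."""
    F = np.array([[1.0, dt], [0.0, 1.0]])    # state transition (constant velocity)
    H = np.array([[1.0, 0.0]])               # measurement picks out position
    Q = q * np.eye(2)                        # process noise covariance
    R = np.array([[r]])                      # measurement noise covariance
    x = np.array([measurements[0], 0.0])     # start at first measurement, zero velocity
    P = 10.0 * np.eye(2)                     # large initial uncertainty
    estimates = []
    for z in measurements:
        # Predict: propagate state and covariance through the motion model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update: correct the prediction with the new measurement.
        y = z - H @ x                        # innovation, shape (1,)
        S = H @ P @ H.T + R                  # innovation covariance, 1x1
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain, 2x1
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)

# Usage on synthetic data (assumption): object moving at 2 units per frame,
# observed with unit-variance Gaussian noise.
rng = np.random.default_rng(1)
frames = np.arange(100)
true_pos = 2.0 * frames + 5.0
z = true_pos + rng.normal(0.0, 1.0, size=frames.size)
est = kalman_track(z)
print(est[-1])    # final [position, velocity] estimate
```

Larger q lets the filter follow maneuvering targets more quickly at the cost of noisier estimates, while larger r makes it trust the measurements less.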

Contributors

sudeepsarkar, dzunglt24
