11. Video – My Blog

Video Vision Transformer

Jun 27, 2024

Videos are sequences of images. Suppose you have an image representation model (CNN, ViT, etc.) and a sequence model (RNN, LSTM, etc.) at hand, and you are asked to adapt them for video classification. The simplest approach would be to apply the image model to individual frames, use the sequence model to learn…
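The frame-by-frame approach described above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the post's implementation. A random linear projection (`encode_frame`, hypothetical) stands in for the CNN or ViT image model, and a plain Elman RNN (`run_rnn`, hypothetical) stands in for the sequence model. The sizes and weights are arbitrary and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_frame(frame, w_img):
    """Hypothetical stand-in image model: flatten the frame, project it to a feature vector."""
    return np.tanh(frame.reshape(-1) @ w_img)

def run_rnn(features, w_in, w_h):
    """Minimal Elman RNN over the per-frame features; returns the last hidden state."""
    h = np.zeros(w_h.shape[0])
    for x in features:
        h = np.tanh(x @ w_in + h @ w_h)
    return h

def classify_video(video, params):
    # 1) Apply the image model to each frame independently.
    feats = np.stack([encode_frame(f, params["w_img"]) for f in video])
    # 2) Let the sequence model aggregate the frame features over time.
    h = run_rnn(feats, params["w_in"], params["w_h"])
    # 3) Map the final hidden state to class logits, then to probabilities (softmax).
    logits = h @ params["w_out"]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Hypothetical sizes: 8 grayscale frames of 16x16, 32-d features, 24-d hidden state, 5 classes.
T, H, W, D, HID, C = 8, 16, 16, 32, 24, 5
params = {
    "w_img": rng.normal(scale=0.1, size=(H * W, D)),
    "w_in": rng.normal(scale=0.1, size=(D, HID)),
    "w_h": rng.normal(scale=0.1, size=(HID, HID)),
    "w_out": rng.normal(scale=0.1, size=(HID, C)),
}
video = rng.random((T, H, W))
probs = classify_video(video, params)
print(probs.shape, probs.argmax())
```

Here the image model never sees more than one frame at a time, so all temporal modeling is left to the recurrent step.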

Video Classification with Transformers

Jun 27, 2024

by Awais Farooq

in 11. Video

This example is a follow-up to the Video Classification with a CNN-RNN Architecture example. This time, we will be using a Transformer-based model (Vaswani et al.) to classify videos. You can follow this book chapter in case you need an introduction to Transformers (with code). After reading this example, you will know how to develop hybrid Transformer-based models for…
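To make the Transformer side concrete, here is a hedged NumPy sketch of one way to classify a video with self-attention over per-frame features. It rests on our own assumptions and is not the post's model. The per-frame features are given (random stand-ins for CNN outputs), and frame order is injected with the fixed sinusoidal encodings from Vaswani et al. The model itself is a single attention head with a residual connection, followed by mean pooling over time and a linear classifier. All function names and sizes are hypothetical.

```python
import numpy as np

def sinusoidal_positions(T, D):
    """Fixed sine/cosine position encodings (Vaswani et al.); D is assumed even."""
    pos = np.arange(T)[:, None]
    i = np.arange(0, D, 2)[None, :]
    angles = pos / (10000 ** (i / D))
    pe = np.zeros((T, D))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention across the frame axis."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (T, T): every frame attends to every frame
    return softmax(scores, axis=-1) @ v

def transformer_video_logits(frame_feats, p):
    T, D = frame_feats.shape
    x = frame_feats + sinusoidal_positions(T, D)          # inject frame order
    x = x + self_attention(x, p["wq"], p["wk"], p["wv"])  # residual connection
    pooled = x.mean(axis=0)                               # average over time
    return pooled @ p["w_out"]

rng = np.random.default_rng(1)
T, D, C = 10, 16, 4
p = {name: rng.normal(scale=0.1, size=(D, D)) for name in ("wq", "wk", "wv")}
p["w_out"] = rng.normal(scale=0.1, size=(D, C))
frame_feats = rng.normal(size=(T, D))  # hypothetical per-frame CNN features
logits = transformer_video_logits(frame_feats, p)
print(logits.shape)
```

Without the position encodings, self-attention followed by mean pooling would give the same logits for any shuffled order of frames. That is why frame order has to be injected explicitly.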

Next-Frame Video Prediction with Convolutional LSTMs

Jun 27, 2024

by Awais Farooq

in 11. Video

Convolutional LSTM architectures bring together time series processing and computer vision by introducing a convolutional recurrent cell into an LSTM layer. In this example, we will explore the Convolutional LSTM model in an application to next-frame prediction: predicting which video frames come next given a series of past frames. …
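It helps to write out a single ConvLSTM step to see what "a convolutional recurrent cell" means. The input-to-state and state-to-state transitions that a standard LSTM computes with matrix products are computed with convolutions instead, so the hidden and cell states keep their spatial layout. The NumPy sketch below is illustrative only. `conv2d_same`, `convlstm_step`, the gate ordering, the kernel sizes, and the 1x1 sigmoid output head are our own assumptions, not the post's Keras code. It uses the common peephole-free form of the gates.

```python
import numpy as np

def conv2d_same(x, w):
    """'Same'-padded 2-D cross-correlation (what most DL libraries call convolution).
    x: (H, W, C_in); w: (k, k, C_in, C_out) with odd k -> (H, W, C_out)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[3]))
    for r in range(H):
        for c in range(W):
            patch = xp[r:r + k, c:c + k, :]              # (k, k, C_in)
            out[r, c] = np.tensordot(patch, w, axes=3)   # -> (C_out,)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, wx, wh, b):
    """One ConvLSTM step: gate pre-activations come from convolutions, not matmuls.
    wx: (k, k, C_in, 4F), wh: (k, k, F, 4F), b: (4F,)."""
    z = conv2d_same(x, wx) + conv2d_same(h, wh) + b
    i, f, o, g = np.split(z, 4, axis=-1)                 # each (H, W, F)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)     # update the spatial cell state
    h_new = sigmoid(o) * np.tanh(c_new)                  # spatial hidden state
    return h_new, c_new

rng = np.random.default_rng(2)
H, W, C_in, F, k, T = 8, 8, 1, 4, 3, 5
wx = rng.normal(scale=0.1, size=(k, k, C_in, 4 * F))
wh = rng.normal(scale=0.1, size=(k, k, F, 4 * F))
b = np.zeros(4 * F)
frames = rng.random((T, H, W, C_in))  # hypothetical sequence of past frames
h = np.zeros((H, W, F))
c = np.zeros((H, W, F))
for x in frames:
    h, c = convlstm_step(x, h, c, wx, wh, b)

# Hypothetical head: a 1x1 convolution plus sigmoid maps h to a predicted next frame in (0, 1).
w_head = rng.normal(scale=0.1, size=(1, 1, F, C_in))
next_frame = sigmoid(conv2d_same(h, w_head))
print(next_frame.shape)
```

Because every state is a feature map rather than a flat vector, the same small kernels are shared across all spatial positions. This is what lets the cell track local motion from frame to frame.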

Video Classification with a CNN-RNN Architecture

Jun 27, 2024

by Awais Farooq

in 11. Video

This example demonstrates video classification, an important use case with applications in recommendations, security, and so on. We will be using the UCF101 dataset to build our video classifier. The dataset consists of videos categorized into different actions, such as cricket shot, punching, and biking. This dataset is commonly used to build action recognizers, which are an application of…

Category: 11. Video
