Category: 11. Video

  • Video Vision Transformer

    Introduction Videos are sequences of images. Let’s assume you have an image representation model (CNN, ViT, etc.) and a sequence model (RNN, LSTM, etc.) at hand. We ask you to tweak the model for video classification. The simplest approach would be to apply the image model to individual frames, use the sequence model to learn…

  • Video Classification with Transformers

    This example is a follow-up to the Video Classification with a CNN-RNN Architecture example. This time, we will be using a Transformer-based model (Vaswani et al.) to classify videos. You can follow this book chapter in case you need an introduction to Transformers (with code). After reading this example, you will know how to develop hybrid Transformer-based models for…

  • Next-Frame Video Prediction with Convolutional LSTMs

    Introduction The Convolutional LSTM architectures bring together time series processing and computer vision by introducing a convolutional recurrent cell in a LSTM layer. In this example, we will explore the Convolutional LSTM model in an application to next-frame prediction, the process of predicting what video frames come next given a series of past frames. Setup Dataset Construction…

  • Video Classification with a CNN-RNN Architecture

    This example demonstrates video classification, an important use-case with applications in recommendations, security, and so on. We will be using the UCF101 dataset to build our video classifier. The dataset consists of videos categorized into different actions, like cricket shot, punching, biking, etc. This dataset is commonly used to build action recognizers, which are an application of…