Author: Awais Farooq

  • FixRes: Fixing train-test resolution discrepancy

    It is common practice to use the same input image resolution for training and testing vision models. However, as investigated in Fixing the train-test resolution discrepancy (Touvron et al.), this practice leads to suboptimal performance. Data augmentation is an indispensable part of training deep neural networks. For vision models, we typically use…

  • Knowledge Distillation

    Knowledge Distillation is a model-compression procedure in which a small (student) model is trained to match a large pre-trained (teacher) model. Knowledge is transferred from the teacher to the student by minimizing a loss function that matches softened teacher logits as well as ground-truth labels. The logits…

  • Learning to tokenize in Vision Transformers

    Vision Transformers (Dosovitskiy et al.) and many other Transformer-based architectures (Liu et al., Yuan et al., etc.) have shown strong results in image recognition. The following provides a brief overview of the components involved in the Vision Transformer architecture for image classification: if we take 224×224 images and extract 16×16 patches, we get a total…

  • Gradient Centralization for Better Training Performance

    This example implements Gradient Centralization, a new optimization technique for Deep Neural Networks by Yong et al., and demonstrates it on Laurence Moroney’s Horses or Humans Dataset. Gradient Centralization can both speed up training and improve the final generalization performance of DNNs. It operates directly on gradients by centralizing the gradient vectors to have zero mean.…

  • Video Vision Transformer

    Videos are sequences of images. Let’s assume you have an image representation model (CNN, ViT, etc.) and a sequence model (RNN, LSTM, etc.) at hand, and we ask you to adapt them for video classification. The simplest approach would be to apply the image model to individual frames, use the sequence model to learn…

  • Video Classification with Transformers

    This example is a follow-up to the Video Classification with a CNN-RNN Architecture example. This time, we will be using a Transformer-based model (Vaswani et al.) to classify videos. You can follow this book chapter in case you need an introduction to Transformers (with code). After reading this example, you will know how to develop hybrid Transformer-based models for…

  • Next-Frame Video Prediction with Convolutional LSTMs

    The Convolutional LSTM architecture brings together time-series processing and computer vision by introducing a convolutional recurrent cell into an LSTM layer. In this example, we will explore the Convolutional LSTM model in an application to next-frame prediction: the process of predicting what video frames come next, given a series of past frames…

  • Video Classification with a CNN-RNN Architecture

    This example demonstrates video classification, an important use case with applications in recommendations, security, and so on. We will be using the UCF101 dataset to build our video classifier. The dataset consists of videos categorized into different actions, such as cricket shot, punching, and biking. This dataset is commonly used to build action recognizers, which are an application of…

  • Self-supervised contrastive learning with NNCLR

    Self-supervised representation learning aims to obtain robust representations of samples from raw data without expensive labels or annotations. Early methods in this field focused on defining pretraining tasks that involved a surrogate task on a domain with ample weak supervision labels. Encoders trained to solve such tasks are expected to learn general…

  • Metric learning for image similarity search using TensorFlow Similarity

    This example is based on the “Metric learning for image similarity search” example. We aim to use the same dataset but implement the model using TensorFlow Similarity. Metric learning aims to train models that can embed inputs into a high-dimensional space such that “similar” inputs are pulled closer to each other and “dissimilar” inputs are…
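The Knowledge Distillation entry describes a loss that matches softened teacher logits as well as ground-truth labels. Below is a minimal, framework-free Python sketch of such a loss for a single example with integer class `label`. The helper names, the weighting `alpha`, the default `temperature`, and the `temperature ** 2` scaling are illustrative assumptions, not details taken from the example itself.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by dividing by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label, alpha=0.1, temperature=3.0):
    """Weighted sum of a hard-label term and a soft-target term (assumed form).

    `alpha` weights the ground-truth cross-entropy; (1 - alpha) weights the
    KL divergence between the softened teacher and student distributions.
    Scaling the soft term by temperature**2 is a common convention that keeps
    its magnitude comparable across temperatures.
    """
    # Hard-label term: cross-entropy against the true class at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-label term: match the teacher's softened output distribution.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    return alpha * hard + (1 - alpha) * soft


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [2.0, 1.5, 0.2]
    print(distillation_loss(student, teacher, label=0))
```

When the student's logits equal the teacher's, the KL term is exactly zero and only the weighted cross-entropy remains, which is a quick sanity check on any implementation of this kind of loss.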
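The Vision Transformer entry begins by extracting 16×16 patches from 224×224 images. A minimal sketch of that arithmetic and of row-major patch extraction follows, assuming a single-channel image stored as a list of rows whose side length is divisible by the patch size. The function names `num_patches` and `extract_patches` are hypothetical.

```python
def num_patches(image_size, patch_size):
    """Number of non-overlapping square patches that tile a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def extract_patches(image, patch_size):
    """Split a 2-D grid (list of rows) into flattened square patches.

    Patches are returned in row-major order; each flattened patch would
    correspond to one token before any linear projection.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


if __name__ == "__main__":
    print(num_patches(224, 16))
```

Here `num_patches(224, 16)` evaluates (224 / 16)² = 14 × 14 = 196. In a full ViT, each flattened patch would additionally be linearly projected and combined with a position embedding; this sketch stops at the tokenization step.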
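The Gradient Centralization entry says the technique centralizes gradient vectors to have zero mean. Below is a framework-free sketch of that operation for a dense-layer weight gradient, assuming shape (fan_in, fan_out) and centralizing each output column over the fan-in axis. The function name and the choice to leave 1-D (bias) gradients untouched are assumptions for illustration.

```python
def centralize_gradient(grad):
    """Gradient Centralization for a 2-D weight gradient (sketch).

    `grad` is a list of rows with shape (fan_in, fan_out). For each output
    column, the mean over the fan-in axis is subtracted, so every column of
    the returned gradient has zero mean. 1-D gradients such as biases are
    returned unchanged (assumed behavior).
    """
    if not grad or not isinstance(grad[0], list):
        return list(grad)  # 1-D gradient: leave it as-is
    fan_in, fan_out = len(grad), len(grad[0])
    col_means = [sum(grad[i][j] for i in range(fan_in)) / fan_in
                 for j in range(fan_out)]
    return [[grad[i][j] - col_means[j] for j in range(fan_out)]
            for i in range(fan_in)]


if __name__ == "__main__":
    g = [[1.0, 2.0], [3.0, 6.0], [5.0, 1.0]]
    print(centralize_gradient(g))
```

In a training loop, the centralized gradient would simply replace the raw gradient before the optimizer's update step; for plain SGD that is `w[i][j] -= lr * g_c[i][j]`. Because only the gradient is modified, the same idea can wrap any optimizer.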
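The Next-Frame Video Prediction entry frames the task as predicting the frames that come next given a series of past frames. One common way to turn a single clip into a training pair is to shift it by one step, so the target at each time step is the following frame. The sketch below assumes frames are held in a Python list; `make_next_frame_pairs` is a hypothetical helper, not an API from the example.

```python
def make_next_frame_pairs(frames):
    """Build an (input, target) pair for next-frame prediction.

    Given frames f_0 .. f_{n-1}, the input is f_0 .. f_{n-2} and the target
    is f_1 .. f_{n-1}: at every time step the model sees one frame and is
    trained to predict the frame that follows it.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    return frames[:-1], frames[1:]


if __name__ == "__main__":
    x, y = make_next_frame_pairs(["f0", "f1", "f2", "f3"])
    print(x, y)
```

With real data each element would be an image array rather than a string, and a ConvLSTM model would consume the input sequence and emit one predicted frame per time step.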
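The metric-learning entry describes pulling "similar" inputs together and pushing "dissimilar" inputs apart in an embedding space, without naming a loss. Triplet loss is one well-known loss with exactly that effect and is swapped in here purely for illustration; the TensorFlow Similarity example may use a different loss. The function names and the `margin` default are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on embedding vectors.

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive; otherwise it is positive, and minimizing it
    pulls the positive closer and pushes the negative away.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    print(triplet_loss([0.0, 0.0], [0.1, 0.0], [3.0, 4.0]))
```

Once embeddings are trained this way, similarity search reduces to a nearest-neighbor lookup by distance in the embedding space.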