Author: Awais Farooq

  • Gradient Centralization for Better Training Performance

    This example implements Gradient Centralization, a new optimization technique for Deep Neural Networks by Yong et al., and demonstrates it on Laurence Moroney’s Horses or Humans Dataset. Gradient Centralization can both speed up the training process and improve the final generalization performance of DNNs. It operates directly on gradients by centralizing the gradient vectors to have zero mean.…

  • Video Vision Transformer

    Videos are sequences of images. Let’s assume you have an image representation model (CNN, ViT, etc.) and a sequence model (RNN, LSTM, etc.) at hand. We ask you to tweak the model for video classification. The simplest approach would be to apply the image model to individual frames, use the sequence model to learn…

  • Video Classification with Transformers

    This example is a follow-up to the Video Classification with a CNN-RNN Architecture example. This time, we will be using a Transformer-based model (Vaswani et al.) to classify videos. You can follow this book chapter in case you need an introduction to Transformers (with code). After reading this example, you will know how to develop hybrid Transformer-based models for…

  • Next-Frame Video Prediction with Convolutional LSTMs

    The Convolutional LSTM architectures bring together time series processing and computer vision by introducing a convolutional recurrent cell in an LSTM layer. In this example, we will explore the Convolutional LSTM model in an application to next-frame prediction, the process of predicting what video frames come next given a series of past frames. …

  • Video Classification with a CNN-RNN Architecture

    This example demonstrates video classification, an important use case with applications in recommendations, security, and so on. We will be using the UCF101 dataset to build our video classifier. The dataset consists of videos categorized into different actions, such as cricket shot, punching, and biking. This dataset is commonly used to build action recognizers, which are an application of…

  • Self-supervised contrastive learning with NNCLR

    Self-supervised representation learning aims to obtain robust representations of samples from raw data without expensive labels or annotations. Early methods in this field focused on defining pretraining tasks which involved a surrogate task on a domain with ample weak supervision labels. Encoders trained to solve such tasks are expected to learn general…

  • Metric learning for image similarity search using TensorFlow Similarity

    This example is based on the “Metric learning for image similarity search” example. We aim to use the same dataset but implement the model using TensorFlow Similarity. Metric learning aims to train models that can embed inputs into a high-dimensional space such that “similar” inputs are pulled closer to each other and “dissimilar” inputs are…

  • Metric learning for image similarity search

    Metric learning aims to train models that can embed inputs into a high-dimensional space such that “similar” inputs, as defined by the training scheme, are located close to each other. Once trained, these models can produce embeddings for downstream systems where such similarity is useful, for example as a ranking signal for search or…

  • Image similarity estimation using a Siamese Network with a triplet loss

    A Siamese Network is a type of network architecture that contains two or more identical subnetworks used to generate feature vectors for each input and compare them. Siamese Networks can be applied to different use cases, such as duplicate detection, anomaly detection, and face recognition. This example uses a Siamese Network with three identical subnetworks. We will…

  • Image similarity estimation using a Siamese Network with a contrastive loss

    Siamese Networks are neural networks that share weights between two or more sister networks, each producing embedding vectors of its respective inputs. In supervised similarity learning, the networks are then trained to maximize the contrast (distance) between embeddings of inputs of different classes, while minimizing the distance between embeddings of similar classes, resulting in embedding…
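
The Gradient Centralization entry above says the technique operates directly on gradients by centralizing the gradient vectors to have zero mean. A minimal sketch of that operation in plain Python follows. It assumes a 2-D gradient laid out as rows = input features and columns = output units (the shape of a Dense-layer kernel), so each column is one output unit's gradient vector and is shifted to zero mean. The helper names `centralize_gradient` and `sgd_step_with_gc` are hypothetical, and the plain SGD update is only for illustration; this is not the example's Keras code.

```python
from typing import List

Matrix = List[List[float]]


def centralize_gradient(grad: Matrix) -> Matrix:
    """Return a copy of a 2-D gradient whose columns each have zero mean.

    Assumed layout: rows are input features and columns are output units,
    so column j holds the gradient of output unit j's weight vector.
    """
    n_rows = len(grad)
    n_cols = len(grad[0])
    # Mean of each column (one value per output unit).
    col_means = [sum(row[j] for row in grad) / n_rows for j in range(n_cols)]
    # Subtract the column mean from every entry in that column.
    return [[row[j] - col_means[j] for j in range(n_cols)] for row in grad]


def sgd_step_with_gc(weights: Matrix, grad: Matrix, lr: float = 0.1) -> Matrix:
    """Plain SGD update that uses the centralized gradient instead of the raw one."""
    g = centralize_gradient(grad)
    return [
        [w - lr * gij for w, gij in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, g)
    ]


raw_grad = [[1.0, 4.0],
            [3.0, 0.0]]
print(centralize_gradient(raw_grad))  # [[-1.0, 2.0], [1.0, -2.0]]
```

Because every column of the centralized gradient sums to zero, an update with it leaves the column-wise mean of the weights unchanged. Bias vectors and other 1-D parameters are not handled in this sketch.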
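
The two metric learning entries describe embeddings in which “similar” inputs sit close together and can serve as a ranking signal for search. Assuming such embeddings already exist, retrieval can be sketched as ranking stored vectors by cosine similarity to a query. The three-dimensional toy vectors, the item names, and the `nearest_neighbors` helper below are made up for illustration; neither example's trained model nor the TensorFlow Similarity API is used here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k stored items most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings, standing in for the output of a trained encoder.
index = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "car_1": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]
print([name for name, _ in nearest_neighbors(query, index, k=2)])  # ['cat_1', 'cat_2']
```

A brute-force scan like this is fine for a handful of items; larger collections usually rely on an approximate nearest-neighbor index instead.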
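
The triplet-loss entry uses a Siamese Network with three identical subnetworks. The standard triplet loss compares an anchor embedding with a positive (same class) and a negative (different class) embedding, and penalizes the triplet unless the anchor-positive distance is smaller than the anchor-negative distance by at least a margin. A minimal sketch, assuming squared Euclidean distance and an arbitrary margin of 0.5; the example's own distance function and margin may differ, and the 2-D vectors are hypothetical.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.5,
) -> float:
    """Hinge on d(a, p) - d(a, n) + margin; zero once the negative is far enough away."""
    ap = squared_distance(anchor, positive)
    an = squared_distance(anchor, negative)
    return max(ap - an + margin, 0.0)


# Hypothetical 2-D embeddings produced by the three shared-weight subnetworks.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
easy_negative = [1.0, 0.0]  # already far from the anchor, so the loss is 0.0
hard_negative = [0.2, 0.0]  # too close to the anchor, so the loss is positive

print(triplet_loss(anchor, positive, easy_negative))
print(triplet_loss(anchor, positive, hard_negative))
```

During training, the loss would be averaged over a batch of triplets, and only triplets that violate the margin produce a gradient.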
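
The contrastive-loss entry says the networks are trained to push apart embeddings of different classes while pulling together embeddings of similar classes. One common form of that objective (Hadsell et al., 2006, here without their 1/2 factors) is sketched below in plain Python. It assumes a Euclidean distance d, a boolean label that is true for same-class pairs, and a margin of 1.0: same-class pairs pay d², and different-class pairs pay max(margin - d, 0)², so they stop contributing once they are at least `margin` apart. The label convention and margin are assumptions; the example itself may encode labels the other way round.

```python
import math
from typing import Iterable, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(
    emb_a: Sequence[float],
    emb_b: Sequence[float],
    same_class: bool,
    margin: float = 1.0,
) -> float:
    """Pull same-class pairs together; push different-class pairs past the margin."""
    d = euclidean_distance(emb_a, emb_b)
    if same_class:
        return d ** 2
    return max(margin - d, 0.0) ** 2


def batch_contrastive_loss(
    pairs: Iterable[Tuple[Sequence[float], Sequence[float], bool]],
    margin: float = 1.0,
) -> float:
    """Mean contrastive loss over (embedding_a, embedding_b, same_class) pairs."""
    losses = [contrastive_loss(a, b, same, margin) for a, b, same in pairs]
    return sum(losses) / len(losses)


# Hypothetical pair of embeddings from the two sister networks.
print(contrastive_loss([0.0, 0.0], [0.0, 0.5], same_class=False))
```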