Author: Awais Farooq
-
Semantic Image Clustering
Introduction: This example demonstrates how to apply the Semantic Clustering by Adopting Nearest neighbors (SCAN) algorithm (Van Gansbeke et al., 2020) on the CIFAR-10 dataset. The algorithm consists of two phases. Setup: prepare the data, define hyperparameters, and implement data preprocessing. The data preprocessing step resizes the input images to the desired target_size and applies feature-wise normalization. Note that, when using keras.applications.ResNet50V2 as the…
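The heart of SCAN's second phase is a clustering objective that pulls each sample's soft cluster assignment toward those of its nearest neighbors. A minimal NumPy sketch of that objective (a toy illustration, not the example's Keras implementation; the `scan_loss` helper and the `entropy_weight` value are assumptions):

```python
import numpy as np

def scan_loss(probs, neighbor_probs, entropy_weight=5.0):
    """Sketch of SCAN's clustering objective: encourage each sample's cluster
    probabilities to agree with its nearest neighbors', while an entropy term
    over the mean assignment discourages collapsing into one cluster."""
    # Consistency term: -log of the dot product between a sample's
    # soft cluster assignment and its neighbor's.
    dots = np.sum(probs * neighbor_probs, axis=1)
    consistency = -np.mean(np.log(dots + 1e-8))
    # Entropy of the batch-mean cluster distribution (maximized, hence
    # subtracted) keeps the clusters balanced.
    mean_probs = probs.mean(axis=0)
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-8))
    return consistency - entropy_weight * entropy

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
# Agreeing assignments cost less than mismatched ones.
print(scan_loss(probs, probs), scan_loss(probs, np.roll(probs, 1, axis=0)))
```

By Cauchy-Schwarz, a sample agrees with itself at least as well as with any other distribution, so the first printed loss is never larger than the second.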
-
Near-duplicate image search
Introduction Fetching similar images in (near) real time is an important use case of information retrieval systems. Some popular products utilizing it include Pinterest, Google Image Search, etc. In this example, we will build a similar image search utility using Locality Sensitive Hashing (LSH) and random projection on top of the image representations computed by a pretrained image classifier.…
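The core of the approach is easy to sketch: project each embedding onto a set of random hyperplanes and keep one bit per plane, so near-duplicate embeddings tend to land in the same hash bucket. A minimal NumPy sketch under the assumption that embeddings come from a pretrained classifier (the `random_projection_hash` helper and the bit count are illustrative choices, not the example's exact code):

```python
import numpy as np

def random_projection_hash(vectors, n_bits=16, seed=42):
    """LSH via random projection: each random hyperplane contributes one bit
    (which side of the plane the vector falls on), so similar vectors tend
    to share hash codes."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(vectors.shape[1], n_bits))
    return (vectors @ planes > 0).astype(np.uint8)

rng = np.random.default_rng(0)
base = rng.normal(size=(1, 128))                   # stand-in for an image embedding
near = base + 0.01 * rng.normal(size=(1, 128))     # a near-duplicate
far = rng.normal(size=(1, 128))                    # an unrelated image
codes = random_projection_hash(np.vstack([base, near, far]))
hamming = lambda a, b: int(np.sum(a != b))
print(hamming(codes[0], codes[1]), hamming(codes[0], codes[2]))
```

The near-duplicate differs from the base image in far fewer bits than the unrelated one, which is what makes bucketed lookup by hash code work in (near) real time.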
-
Grad-CAM class activation visualization
Setup: configurable parameters. You can change these to use another model. To get the value for last_conv_layer_name, use model.summary() to see the names of all layers in the model. The Grad-CAM algorithm: let's test-drive it and create a superimposed visualization. Let's try another image: we will see how Grad-CAM explains the model's outputs for a multi-label image. Let's try…
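The heatmap computation itself is compact: weight each feature map of the last conv layer by the spatial mean of the class score's gradient, sum the weighted maps, and apply ReLU. A minimal NumPy sketch with random stand-ins for the activations and gradients (the `grad_cam` helper is illustrative; in the real example both tensors come from the Keras model):

```python
import numpy as np

def grad_cam(conv_maps, grads):
    """Sketch of the Grad-CAM heatmap: weight each feature map by the
    spatial mean of its gradient, sum over channels, then ReLU."""
    # conv_maps, grads: (H, W, C) last-conv-layer activations and the
    # gradient of the target class score w.r.t. those activations.
    weights = grads.mean(axis=(0, 1))              # one importance weight per channel
    cam = np.tensordot(conv_maps, weights, axes=([2], [0]))
    cam = np.maximum(cam, 0)                       # keep only positive influence
    return cam / (cam.max() + 1e-8)                # normalize to [0, 1] for display

rng = np.random.default_rng(0)
maps = rng.random((7, 7, 64))          # stand-in activations
grads = rng.normal(size=(7, 7, 64))    # stand-in gradients
heatmap = grad_cam(maps, grads)
print(heatmap.shape)
```

The normalized heatmap is then upsampled to the input resolution and blended over the original image to produce the superimposed visualization.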
-
Investigating Vision Transformer representations
Introduction In this example, we look into the representations learned by different Vision Transformer (ViT) models. Our main goal with this example is to provide insights into what empowers ViTs to learn from image data. In particular, the example discusses implementations of a few different ViT analysis tools. Note: when we say “Vision Transformer”, we refer…
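One widely used analysis tool in this vein is attention rollout (Abnar & Zuidema, 2020): fuse the heads of each layer, account for residual connections by adding the identity, and multiply the per-layer attention matrices together. A NumPy sketch under the assumption of head-averaged fusion (this is one common variant, not necessarily the exact tool the example implements):

```python
import numpy as np

def attention_rollout(attentions):
    """Sketch of attention rollout: average heads, add the identity to model
    the residual connection, renormalize rows, then multiply across layers."""
    rollout = np.eye(attentions[0].shape[-1])
    for attn in attentions:                # attn: (heads, tokens, tokens)
        a = attn.mean(axis=0)              # fuse heads by averaging
        a = a + np.eye(a.shape[0])         # account for the residual branch
        a = a / a.sum(axis=-1, keepdims=True)
        rollout = a @ rollout
    return rollout

rng = np.random.default_rng(0)
layers = []
for _ in range(4):                         # 4 layers, 3 heads, 10 tokens
    logits = rng.normal(size=(3, 10, 10))
    layers.append(np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True))
rollout = attention_rollout(layers)
print(rollout.shape)
```

Because every per-layer matrix is row-stochastic, the rolled-out matrix is too, so each row can be read directly as "how much each input token contributes to this output token".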
-
Model interpretability with Integrated Gradients
Integrated Gradients: Integrated Gradients is a technique for attributing a classification model’s prediction to its input features. It is a model interpretability technique: you can use it to visualize the relationship between input features and model predictions. Integrated Gradients is a variation on computing the gradient of the prediction output with regard to features of the…
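Concretely, the method averages the gradient along a straight path from a baseline input to the actual input and scales by the input difference. A self-contained NumPy sketch on a toy differentiable function (the `integrated_gradients` helper and the quadratic toy model are illustrative, not the example's Keras code); the "completeness" property, attributions summing to f(x) − f(baseline), falls out directly:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=100):
    """Sketch of Integrated Gradients: average the gradient along the straight
    path from baseline to input, then scale by (input - baseline)."""
    alphas = (np.arange(steps) + 0.5) / steps      # midpoint Riemann sum
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Toy model: f(x) = sum(x**2); its gradient is 2*x.
f = lambda x: np.sum(x ** 2)
grad_f = lambda x: 2 * x
x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(grad_f, x, baseline)
# Completeness: the attributions sum to f(x) - f(baseline).
print(attributions, attributions.sum(), f(x) - f(baseline))
```

With a real model, `grad_fn` would be the gradient of the predicted class score with respect to the input image, and the baseline is typically a black image.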
-
Visualizing what convnets learn
Introduction In this example, we look into what sort of visual patterns image classification models learn. We’ll be using the ResNet50V2 model, trained on the ImageNet dataset. Our process is simple: we will create input images that maximize the activation of specific filters in a target layer (picked somewhere in the middle of the model: layer conv3_block4_out). Such…
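The process is gradient ascent on the input: repeatedly nudge the image in the direction that increases the chosen filter's activation. A minimal NumPy sketch with a hand-derived gradient for a single linear filter (the helpers and the tiny 2x2 "edge detector" kernel are illustrative stand-ins; the real example differentiates through ResNet50V2 with TensorFlow):

```python
import numpy as np

def filter_activation(img, kernel):
    """Total response of one linear 'filter': valid cross-correlation
    summed over all positions."""
    kh, kw = kernel.shape
    total = 0.0
    for i in range(img.shape[0] - kh + 1):
        for j in range(img.shape[1] - kw + 1):
            total += np.sum(img[i:i + kh, j:j + kw] * kernel)
    return total

def activation_gradient(img, kernel):
    """Gradient of the activation w.r.t. each pixel: every patch containing
    the pixel contributes the matching kernel weight."""
    kh, kw = kernel.shape
    grad = np.zeros_like(img)
    for i in range(img.shape[0] - kh + 1):
        for j in range(img.shape[1] - kw + 1):
            grad[i:i + kh, j:j + kw] += kernel
    return grad

rng = np.random.default_rng(0)
img = rng.normal(scale=0.1, size=(16, 16))      # start from a noisy image
kernel = np.array([[1.0, -1.0], [1.0, -1.0]])   # toy vertical-edge detector
before = filter_activation(img, kernel)
for _ in range(20):                              # gradient ascent on the input
    img += 0.01 * activation_gradient(img, kernel)
print(before, filter_activation(img, kernel))
```

Each step moves the image along the activation's gradient, so the second printed value is strictly larger than the first; with a deep model the same loop produces the characteristic filter-pattern visualizations.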
-
Natural language image search with a Dual Encoder
Introduction: This example demonstrates how to build a dual encoder (also known as a two-tower) neural network model to search for images using natural language. The model is inspired by the CLIP approach, introduced by Alec Radford et al. The idea is to train a vision encoder and a text encoder jointly to project the representation of images…
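The joint training signal is a symmetric contrastive loss: matching image/caption pairs sit on the diagonal of the similarity matrix and are treated as the positive class in both directions. A NumPy sketch of that loss (the `clip_style_loss` helper and the temperature value are assumptions for illustration, not the example's exact code):

```python
import numpy as np

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Sketch of the symmetric contrastive loss used to train dual encoders."""
    # L2-normalize so the dot product is a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature

    def xent(l):
        # Cross-entropy with the diagonal (matching pair) as the target.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image->text and text->image directions.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 64))
aligned = clip_style_loss(emb, emb)                     # encoders agree
shuffled = clip_style_loss(emb, np.roll(emb, 1, axis=0))  # pairs mismatched
print(aligned, shuffled)
```

The loss is near zero when both encoders map matching pairs to the same point and large when the pairing is scrambled, which is exactly the gradient signal that pulls the two towers into a shared embedding space.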
-
Image Captioning
Setup: download the dataset. We will be using the Flickr8K dataset for this tutorial. This dataset comprises over 8,000 images, each paired with five different captions. Preparing the dataset: vectorizing the text data. We'll use the TextVectorization layer to vectorize the text data, that is, to turn the original strings into integer sequences…
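What the vectorization step does can be sketched in plain Python: build a frequency-ranked vocabulary, map each token to an integer (reserving indices for padding and out-of-vocabulary words), and pad to a fixed length. This is a toy stand-in for Keras's TextVectorization layer, not its implementation; the helper names and the padding/OOV conventions (0 and 1) are assumptions:

```python
import numpy as np

def build_vocab(texts, max_tokens=20):
    """Lowercase, split on whitespace, keep the most frequent tokens.
    Index 0 is reserved for padding and 1 for out-of-vocabulary words."""
    counts = {}
    for t in texts:
        for w in t.lower().split():
            counts[w] = counts.get(w, 0) + 1
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i + 2 for i, w in enumerate(ranked[: max_tokens - 2])}

def vectorize(text, vocab, seq_len=6):
    """Turn a caption into a fixed-length integer sequence."""
    ids = [vocab.get(w, 1) for w in text.lower().split()][:seq_len]
    return np.array(ids + [0] * (seq_len - len(ids)))  # right-pad with zeros

captions = ["A dog runs on the beach", "A cat sleeps on the sofa"]
vocab = build_vocab(captions)
print(vectorize("The dog sleeps", vocab))
```

The real layer additionally handles punctuation stripping and can be adapted on a tf.data pipeline, but the string-to-integer-sequence contract is the same.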
-
RandAugment for Image Classification for Improved Robustness
Data augmentation is a very useful technique that can help improve the translational invariance of convolutional neural networks (CNNs). RandAugment is a stochastic data augmentation routine for vision data and was proposed in RandAugment: Practical automated data augmentation with a reduced search space. It is composed of strong augmentation transforms like color jitters, Gaussian blurs,…
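RandAugment's control flow is simple: sample N transforms uniformly from a fixed pool and apply each at a shared magnitude M, so the whole search space reduces to two scalars. A NumPy sketch with a deliberately tiny pool of three toy ops (the op pool and magnitude semantics here are illustrative assumptions, not the paper's full transform list):

```python
import numpy as np

def rand_augment(image, rng, num_ops=2, magnitude=0.3):
    """Sketch of RandAugment: draw `num_ops` transforms at random from a
    fixed pool and apply each at the shared `magnitude`."""
    def brightness(img, m):
        return np.clip(img + m, 0.0, 1.0)

    def contrast(img, m):
        mean = img.mean()
        return np.clip((img - mean) * (1.0 + m) + mean, 0.0, 1.0)

    def flip(img, m):                      # magnitude-free op, like the paper's flips
        return img[:, ::-1]

    ops = [brightness, contrast, flip]
    for idx in rng.choice(len(ops), size=num_ops, replace=True):
        image = ops[idx](image, magnitude)
    return image

rng = np.random.default_rng(0)
image = rng.random((8, 8))                 # toy grayscale image in [0, 1]
augmented = rand_augment(image, rng)
print(augmented.shape)
```

Because N and M are the only knobs, tuning RandAugment is a small grid search rather than the full policy search earlier AutoAugment-style methods required.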
-
MixUp augmentation for image classification
Introduction: mixup is a domain-agnostic data augmentation technique proposed in mixup: Beyond Empirical Risk Minimization by Zhang et al. It's implemented with the following formulas: x̃ = λ·x_i + (1 − λ)·x_j and ỹ = λ·y_i + (1 − λ)·y_j, where (x_i, y_i) and (x_j, y_j) are two examples with their labels. (Note that the lambda values lie in the [0, 1] range and are sampled from a Beta distribution.) The technique is quite systematically named. We are literally mixing up the features and their corresponding labels.…
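The two-line formulas translate almost directly into code: draw λ from Beta(α, α) and take the same convex combination of the inputs and of their one-hot labels. A NumPy sketch (the `mixup` helper name and α = 0.2 default are illustrative; the example itself implements this inside a tf.data pipeline):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Sketch of mixup: lambda ~ Beta(alpha, alpha), then the same convex
    combination is applied to the inputs and to their one-hot labels."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

rng = np.random.default_rng(0)
x1, x2 = rng.random((32, 32, 3)), rng.random((32, 32, 3))  # two toy images
y1, y2 = np.eye(10)[3], np.eye(10)[7]    # one-hot labels for classes 3 and 7
x_mix, y_mix = mixup(x1, y1, x2, y2, rng=rng)
print(x_mix.shape, y_mix.sum())
```

Since λ and 1 − λ sum to one, the mixed label is still a valid probability distribution, which is why the usual categorical cross-entropy loss works unchanged on mixup batches.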