Author: Awais Farooq

  • Metric learning for image similarity search

    Overview Metric learning aims to train models that can embed inputs into a high-dimensional space such that “similar” inputs, as defined by the training scheme, are located close to each other. Once trained, these models can produce embeddings for downstream systems where such similarity is useful; examples include using the embeddings as a ranking signal for search or…
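    A minimal sketch of what the downstream use looks like, assuming the model already produces L2-normalized embeddings (the gallery and query here are stand-in random vectors, not real model outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    # Project embeddings onto the unit sphere so cosine similarity
    # reduces to a dot product.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

gallery = normalize(rng.normal(size=(100, 32)))    # embeddings of indexed items
query = gallery[17] + 0.01 * rng.normal(size=32)   # a slightly perturbed copy of item 17
query = normalize(query)

# Rank the gallery by cosine similarity to the query.
scores = gallery @ query
top3 = np.argsort(-scores)[:3]   # item 17 should rank first
```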

  • Image similarity estimation using a Siamese Network with a triplet loss

    Introduction A Siamese Network is a type of network architecture that contains two or more identical subnetworks used to generate feature vectors for each input and compare them. Siamese Networks can be applied to different use cases, like detecting duplicates, finding anomalies, and face recognition. This example uses a Siamese Network with three identical subnetworks. We will…
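    The three identical subnetworks produce embeddings for an anchor, a positive, and a negative example, and are trained with a triplet loss. A hedged NumPy sketch of that loss (the function name and margin value are illustrative, not taken from the example):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    # Squared Euclidean distances between anchor-positive and anchor-negative.
    ap = np.sum((anchor - positive) ** 2, axis=-1)
    an = np.sum((anchor - negative) ** 2, axis=-1)
    # Zero loss once the negative is at least `margin` farther than the positive.
    return np.maximum(ap - an + margin, 0.0)

a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])   # close to the anchor
n = np.array([[2.0, 0.0]])   # far from the anchor
easy = triplet_loss(a, p, n)   # ap=0.01, an=4.0 -> loss 0
hard = triplet_loss(a, p, p)   # ap == an -> loss equals the margin
```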

  • Image similarity estimation using a Siamese Network with a contrastive loss

    Introduction Siamese Networks are neural networks which share weights between two or more sister networks, each producing embedding vectors of its respective inputs. In supervised similarity learning, the networks are then trained to maximize the contrast (distance) between embeddings of inputs of different classes, while minimizing the distance between embeddings of inputs of the same class, resulting in embedding…
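    A small NumPy sketch of the contrastive loss described above, for a single pair with label y (1 = same class, 0 = different); the margin value is an illustrative choice:

```python
import numpy as np

def contrastive_loss(d, y, margin=1.0):
    # d: Euclidean distance between the two sister-network embeddings.
    # Same-class pairs are pulled together (d^2 term); different-class
    # pairs are pushed beyond the margin (hinge term).
    return y * d ** 2 + (1 - y) * np.maximum(margin - d, 0.0) ** 2

same = contrastive_loss(np.array(0.2), 1)   # 0.04: small, pair already close
diff = contrastive_loss(np.array(0.2), 0)   # (1 - 0.2)^2 = 0.64: push apart
far  = contrastive_loss(np.array(1.5), 0)   # 0: already beyond the margin
```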

  • Semantic Image Clustering

    Introduction This example demonstrates how to apply the Semantic Clustering by Adopting Nearest neighbors (SCAN) algorithm (Van Gansbeke et al., 2020) on the CIFAR-10 dataset. The algorithm consists of two phases. (Page sections: Setup, Prepare the data, Define hyperparameters, Implement data preprocessing.) The data preprocessing step resizes the input images to the desired target_size and applies feature-wise normalization. Note that, when using keras.applications.ResNet50V2 as the…
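    One building block of SCAN that is easy to show in isolation is the neighbor-mining step: after representation learning, each image's k nearest neighbors in embedding space are found, and the clustering phase later encourages neighbors to share a cluster assignment. An illustrative NumPy sketch (the embeddings here are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 4))                  # stand-in learned embeddings
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

k = 2
sim = emb @ emb.T                              # cosine similarities
np.fill_diagonal(sim, -np.inf)                 # exclude self-matches
neighbors = np.argsort(-sim, axis=1)[:, :k]    # k nearest neighbors per image
```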

  • Near-duplicate image search

    Introduction Fetching similar images in (near) real time is an important use case of information retrieval systems. Some popular products utilizing it include Pinterest, Google Image Search, etc. In this example, we will build a similar image search utility using Locality Sensitive Hashing (LSH) and random projection on top of the image representations computed by a pretrained image classifier.…
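    The core of the LSH-with-random-projection idea can be sketched in a few lines of NumPy, assuming image representations have already been computed by the classifier (vectors here are random stand-ins; the hash size is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(42)
planes = rng.normal(size=(16, 64))   # 16 random hyperplanes -> 16-bit hash

def lsh_hash(v):
    # The sign of each projection gives one hash bit; nearby vectors
    # tend to fall on the same side of most hyperplanes.
    return ((planes @ v) > 0).astype(int)

x = rng.normal(size=64)                   # an image representation
near = x + 0.001 * rng.normal(size=64)    # a near-duplicate of it
h1, h2 = lsh_hash(x), lsh_hash(near)
hamming = int(np.sum(h1 != h2))           # small for near-duplicates
```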

  • Grad-CAM class activation visualization

    Setup. Configurable parameters: you can change these to use another model. To get the value for last_conv_layer_name, use model.summary() to see the names of all layers in the model. The Grad-CAM algorithm. Let’s test-drive it. Create a superimposed visualization. Let’s try another image: we will see how Grad-CAM explains the model’s outputs for a multi-label image. Let’s try…
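    A NumPy sketch of the core Grad-CAM computation, assuming the last conv layer's feature maps and the gradients of the class score with respect to them have already been obtained (in the example this is done through the model with tf.GradientTape; the arrays below are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(7, 7, 64))   # H x W x channels from the last conv layer
grads = rng.normal(size=(7, 7, 64))          # d(class score) / d(feature_maps)

# Global-average-pool the gradients to get one importance weight per channel.
weights = grads.mean(axis=(0, 1))

# Weighted sum of the feature maps, then ReLU: only positive evidence remains.
cam = np.maximum(feature_maps @ weights, 0.0)
cam /= cam.max() + 1e-8                      # normalize to [0, 1] for display
```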

  • Investigating Vision Transformer representations

    Introduction In this example, we look into the representations learned by different Vision Transformer (ViT) models. Our main goal with this example is to provide insights into what empowers ViTs to learn from image data. In particular, the example discusses implementations of a few different ViT analysis tools. Note: when we say “Vision Transformer”, we refer…
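    One widely used ViT analysis tool of this kind is "attention rollout": the per-layer attention matrices (averaged over heads, with an identity term for the residual connection) are multiplied together to estimate how information flows from input patches to the output. A hedged NumPy sketch with illustrative shapes, not tied to any specific ViT checkpoint:

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, num_heads, tokens = 4, 3, 5
# Random row-stochastic attention maps standing in for real ViT attention.
attn = rng.dirichlet(np.ones(tokens), size=(num_layers, num_heads, tokens))

rollout = np.eye(tokens)
for layer in attn:
    a = layer.mean(axis=0)                 # average over heads
    a = 0.5 * a + 0.5 * np.eye(tokens)     # account for the residual connection
    a /= a.sum(axis=-1, keepdims=True)     # re-normalize rows
    rollout = a @ rollout                  # compose attention across layers
```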

  • Model interpretability with Integrated Gradients

    Integrated Gradients Integrated Gradients is a technique for attributing a classification model’s prediction to its input features. It is a model interpretability technique: you can use it to visualize the relationship between input features and model predictions. Integrated Gradients is a variation on computing the gradient of the prediction output with regard to features of the…
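    A worked sketch of the technique on a toy function f(x) = sum(x^2), whose gradient (2x) is known analytically, so no deep-learning framework is needed. IG averages the gradient along the straight path from a baseline to the input; by the completeness property, the attributions sum to f(input) - f(baseline):

```python
import numpy as np

def integrated_gradients(x, baseline, grad_fn, steps=200):
    # Midpoint rule over the straight-line path from baseline to x.
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    avg_grad = grad_fn(path).mean(axis=0)
    return (x - baseline) * avg_grad

x = np.array([1.0, 2.0])
baseline = np.zeros(2)
attr = integrated_gradients(x, baseline, grad_fn=lambda p: 2 * p)
# Completeness check: f(x) - f(baseline) = (1 + 4) - 0 = 5.
```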

  • Visualizing what convnets learn

    Introduction In this example, we look into what sort of visual patterns image classification models learn. We’ll be using the ResNet50V2 model, trained on the ImageNet dataset. Our process is simple: we will create input images that maximize the activation of specific filters in a target layer (picked somewhere in the middle of the model: layer conv3_block4_out). Such…
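    The gradient-ascent loop at the heart of this process can be sketched on a toy stand-in: a single linear "filter" w whose activation is the dot product w . x, so the gradient with respect to x is simply w. The real example computes this gradient through ResNet50V2 with tf.GradientTape; the step size and iteration count below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)   # stand-in for a conv filter's response direction
x = np.zeros(16)          # start from a neutral "image"

for _ in range(50):
    grad = w                                      # d(activation)/dx for a linear filter
    grad = grad / (np.linalg.norm(grad) + 1e-8)   # normalize the gradient, as in the example
    x = x + 0.1 * grad                            # gradient-ascent step

activation = float(w @ x)   # grows as x aligns with the filter
```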

  • Natural language image search with a Dual Encoder

    Introduction This example demonstrates how to build a dual encoder (also known as two-tower) neural network model to search for images using natural language. The model is inspired by the CLIP approach, introduced by Alec Radford et al. The idea is to train a vision encoder and a text encoder jointly to project the representation of images…
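    A toy NumPy sketch of the two-tower idea: two projections map caption and image features into a shared space, and at search time the best image for a caption is the one with the highest similarity. The features and projections below are random stand-ins, not trained encoders:

```python
import numpy as np

rng = np.random.default_rng(0)
shared = rng.normal(size=(4, 8))       # 4 paired caption/image "features"
proj = rng.normal(size=(8, 8))         # stand-in for a trained projection head

def normalize(z):
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# In a real dual encoder the two towers differ; training aligns their outputs.
# Here we simulate an aligned pair by perturbing one tower slightly.
text_emb = normalize(shared @ proj)
image_emb = normalize(shared @ proj + 0.01 * rng.normal(size=(4, 8)))

logits = text_emb @ image_emb.T        # caption-to-image similarity matrix
best = logits.argmax(axis=1)           # retrieved image index per caption
```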