Author: Awais Farooq

  • Involutional neural networks

    Introduction Convolution has been the basis of most modern neural networks for computer vision. A convolution kernel is spatial-agnostic and channel-specific. Because of this, it isn’t able to adapt to different visual patterns with respect to different spatial locations. Along with location-related problems, the receptive field of convolution creates challenges with regard to capturing long-range…
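    A minimal sketch of the idea (not the example's exact layer): the kernel below is generated from the input at each spatial position, so it is location-specific, and a single kernel is shared across all channels (a single-group simplification). `kernel_size` and `reduction` are illustrative hyperparameters.

      import tensorflow as tf
      from tensorflow import keras

      class MinimalInvolution(keras.layers.Layer):
          """Simplified involution: the kernel is generated per spatial location
          from the input itself (location-specific), and one kernel is shared
          across all channels -- the inverse of a convolution kernel."""

          def __init__(self, kernel_size=3, reduction=4, **kwargs):
              super().__init__(**kwargs)
              self.kernel_size = kernel_size
              self.reduction = reduction

          def build(self, input_shape):
              channels = input_shape[-1]
              # Small bottleneck that predicts one K*K kernel per spatial position.
              self.kernel_gen = keras.Sequential([
                  keras.layers.Conv2D(channels // self.reduction, 1, activation="relu"),
                  keras.layers.Conv2D(self.kernel_size * self.kernel_size, 1),
              ])

          def call(self, x):
              k = self.kernel_size
              # Unfold K x K neighbourhoods: (batch, H, W, K*K*C).
              patches = tf.image.extract_patches(
                  x, sizes=[1, k, k, 1], strides=[1, 1, 1, 1],
                  rates=[1, 1, 1, 1], padding="SAME")
              b = tf.shape(x)[0]
              h, w = tf.shape(x)[1], tf.shape(x)[2]
              c = x.shape[-1]
              patches = tf.reshape(patches, [b, h, w, k * k, c])
              # Location-specific kernels: (batch, H, W, K*K, 1), shared over channels.
              kernels = tf.reshape(self.kernel_gen(x), [b, h, w, k * k, 1])
              # Weighted sum over each neighbourhood.
              return tf.reduce_sum(patches * kernels, axis=3)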

  • Image classification with EANet (External Attention Transformer)

    Introduction This example implements the EANet model for image classification, and demonstrates it on the CIFAR-100 dataset. EANet introduces a novel attention mechanism named external attention, based on two external, small, learnable, and shared memories, which can be implemented easily by simply using two cascaded linear layers and two normalization layers. It conveniently replaces self-attention as used in…
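    A minimal sketch of the external-attention idea, assuming token inputs of shape (batch, num_tokens, dim); `memory_units` is an illustrative size for the shared memories, and the double normalization follows the paper's description.

      import tensorflow as tf
      from tensorflow import keras

      class ExternalAttention(keras.layers.Layer):
          """External attention sketch: two small learnable, shared 'memories'
          (implemented as Dense layers) replace the query-key-value interaction
          of self-attention."""

          def __init__(self, dim, memory_units=64, **kwargs):
              super().__init__(**kwargs)
              # First memory: projects tokens onto `memory_units` slots.
              self.mk = keras.layers.Dense(memory_units, use_bias=False)
              # Second memory: maps attention weights back to the token dimension.
              self.mv = keras.layers.Dense(dim, use_bias=False)

          def call(self, x):  # x: (batch, num_tokens, dim)
              attn = self.mk(x)                   # (batch, tokens, memory_units)
              attn = tf.nn.softmax(attn, axis=1)  # normalize over the token axis
              # Second normalization (L1 over the memory axis).
              attn = attn / (tf.reduce_sum(attn, axis=-1, keepdims=True) + 1e-9)
              return self.mv(attn)                # (batch, tokens, dim)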

  • Image classification with ConvMixer

    Introduction Vision Transformers (ViT; Dosovitskiy et al.) extract small patches from the input images, linearly project them, and then apply the Transformer (Vaswani et al.) blocks. The application of ViTs to image recognition tasks is quickly becoming a promising area of research, because ViTs eliminate the need to have strong inductive biases (such as convolutions) for…
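    For reference, a sketch of a ConvMixer-style block (patch embedding aside): a depthwise convolution mixes spatial locations and a pointwise (1x1) convolution mixes channels, each followed by GELU and BatchNorm; `filters` and `kernel_size` are illustrative values.

      from tensorflow import keras
      from tensorflow.keras import layers

      def conv_mixer_block(x, filters, kernel_size=9):
          """One ConvMixer-style block: depthwise conv mixes spatial locations,
          pointwise (1x1) conv mixes channels, with a residual connection
          around the depthwise step."""
          residual = x
          x = layers.DepthwiseConv2D(kernel_size, padding="same")(x)
          x = layers.Activation("gelu")(x)
          x = layers.BatchNormalization()(x)
          x = layers.Add()([x, residual])               # residual connection
          x = layers.Conv2D(filters, kernel_size=1)(x)  # pointwise channel mixing
          x = layers.Activation("gelu")(x)
          x = layers.BatchNormalization()(x)
          return x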

  • Compact Convolutional Transformers

    As discussed in the Vision Transformers (ViT) paper, a Transformer-based architecture for vision typically requires a larger dataset than usual, as well as a longer pre-training schedule. ImageNet-1k (which has about a million images) is considered to fall under the medium-sized data regime with respect to ViTs. This is primarily because, unlike CNNs, ViTs (or a typical Transformer-based architecture)…

  • Pneumonia Classification on TPU

    Introduction + Set-up This tutorial explains how to build an X-ray image classification model to predict whether an X-ray scan shows the presence of pneumonia. We need a Google Cloud link to our data in order to load it using a TPU. Below, we define the key configuration parameters we’ll use in this example. To run on…
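    A sketch of typical TPU set-up boilerplate in TensorFlow, assuming a Colab or Cloud TPU environment; the configuration values and the gs:// bucket path are placeholders, not the tutorial's actual settings.

      import tensorflow as tf

      # Detect and initialize the TPU; fall back to the default strategy
      # when no TPU is available (e.g. when running locally).
      try:
          tpu = tf.distribute.cluster_resolver.TPUClusterResolver.connect()
          strategy = tf.distribute.TPUStrategy(tpu)
      except ValueError:
          strategy = tf.distribute.get_strategy()

      # Illustrative configuration values (placeholders).
      BATCH_SIZE = 25 * strategy.num_replicas_in_sync
      IMAGE_SIZE = [180, 180]
      GCS_PATH = "gs://your-bucket/chest_xray"  # placeholder Cloud Storage path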

  • MobileViT: A mobile-friendly Transformer-based model for image classification

    Introduction In this example, we implement the MobileViT architecture (Mehta et al.), which combines the benefits of Transformers (Vaswani et al.) and convolutions. With Transformers, we can capture long-range dependencies that result in global representations. With convolutions, we can capture spatial relationships that model locality. Besides combining the properties of Transformers and convolutions, the authors…
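    A heavily simplified sketch of the convolution-plus-Transformer mix (not the actual MobileViT block, which unfolds non-overlapping patches and folds them back): convolutions model locality, and self-attention over the flattened spatial positions models long-range dependencies. `dim` and `num_heads` are illustrative.

      from tensorflow import keras
      from tensorflow.keras import layers

      def simplified_mobilevit_block(x, dim, num_heads=2):
          """Convolution + Transformer mix (simplified).
          Assumes a fixed input resolution (static height/width)."""
          # Local representation via convolutions.
          local = layers.Conv2D(dim, 3, padding="same", activation="swish")(x)
          local = layers.Conv2D(dim, 1)(local)

          # Global representation: flatten the H*W positions into a token sequence
          # and apply self-attention for long-range dependencies.
          h, w = local.shape[1], local.shape[2]
          tokens = layers.Reshape((h * w, dim))(local)
          attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=dim)(tokens, tokens)
          tokens = layers.LayerNormalization()(tokens + attn)
          global_rep = layers.Reshape((h, w, dim))(tokens)

          # Fuse the local and global branches.
          return layers.Conv2D(dim, 1)(layers.Concatenate()([local, global_rep]))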

  • Image classification with modern MLP models

    Introduction This example implements three modern attention-free, multi-layer perceptron (MLP) based models for image classification, demonstrated on the CIFAR-100 dataset. The purpose of the example is not to compare these models, as they might perform differently on different datasets with well-tuned hyperparameters. Rather, it is to show simple implementations of their main building blocks.…
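    As one illustration of such a building block, here is a sketch of an MLP-Mixer-style block: a token-mixing MLP operates across patches and a channel-mixing MLP operates across features. All dimension arguments are illustrative, and `hidden_dim` must equal the input's channel size for the residual connections to work.

      from tensorflow import keras
      from tensorflow.keras import layers

      def mlp_mixer_block(x, num_patches, hidden_dim, tokens_mlp_dim, channels_mlp_dim):
          """Attention-free MLP block: token mixing across patches,
          then channel mixing across features, each with a residual."""
          # Token mixing: transpose so the MLP runs over the patch axis.
          y = layers.LayerNormalization()(x)
          y = layers.Permute((2, 1))(y)                       # (batch, channels, patches)
          y = layers.Dense(tokens_mlp_dim, activation="gelu")(y)
          y = layers.Dense(num_patches)(y)
          y = layers.Permute((2, 1))(y)                       # back to (batch, patches, channels)
          x = layers.Add()([x, y])

          # Channel mixing: standard MLP over the feature axis.
          y = layers.LayerNormalization()(x)
          y = layers.Dense(channels_mlp_dim, activation="gelu")(y)
          y = layers.Dense(hidden_dim)(y)
          return layers.Add()([x, y])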

  • Classification using Attention-based Deep Multiple Instance Learning (MIL).

    Introduction What is Multiple Instance Learning (MIL)? Usually, with supervised learning algorithms, the learner receives labels for a set of instances. In the case of MIL, the learner receives labels for a set of bags, each of which contains a set of instances. The bag is labeled positive if it contains at least one positive…
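    A sketch of the attention-based pooling idea used in this setting (gated attention in the style of Ilse et al., 2018), assuming instance embeddings of shape (batch, instances_per_bag, features); `attention_dim` is illustrative.

      import tensorflow as tf
      from tensorflow import keras

      class MILAttentionPooling(keras.layers.Layer):
          """Gated attention-based MIL pooling sketch: each instance in a bag
          gets a learned weight, and the bag embedding is the weighted sum
          of its instance embeddings."""

          def __init__(self, attention_dim=64, **kwargs):
              super().__init__(**kwargs)
              self.v = keras.layers.Dense(attention_dim, activation="tanh")
              self.u = keras.layers.Dense(attention_dim, activation="sigmoid")
              self.w = keras.layers.Dense(1)

          def call(self, instances):  # (batch, instances_per_bag, features)
              # Gated attention score, one per instance.
              scores = self.w(self.v(instances) * self.u(instances))  # (batch, n, 1)
              weights = tf.nn.softmax(scores, axis=1)                 # normalize per bag
              # Bag embedding: attention-weighted sum of instance embeddings.
              return tf.reduce_sum(weights * instances, axis=1)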

  • Image classification with Vision Transformer

    Introduction This example implements the Vision Transformer (ViT) model by Alexey Dosovitskiy et al. for image classification, and demonstrates it on the CIFAR-100 dataset. The ViT model applies the Transformer architecture with self-attention to sequences of image patches, without using convolution layers. Sections covered: Setup, Prepare the data, Configure the hyperparameters, Use data augmentation, Implement multilayer perceptron (MLP), Implement…
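    A sketch of the patch-extraction step this rests on, assuming square, non-overlapping patches of size `patch_size`; each image is split into patches and each patch is flattened into a vector, giving the token sequence the Transformer consumes.

      import tensorflow as tf
      from tensorflow import keras

      class Patches(keras.layers.Layer):
          """Split each image into non-overlapping patches and flatten
          each patch into a vector."""

          def __init__(self, patch_size, **kwargs):
              super().__init__(**kwargs)
              self.patch_size = patch_size

          def call(self, images):  # (batch, height, width, channels)
              patches = tf.image.extract_patches(
                  images,
                  sizes=[1, self.patch_size, self.patch_size, 1],
                  strides=[1, self.patch_size, self.patch_size, 1],
                  rates=[1, 1, 1, 1],
                  padding="VALID",
              )
              # Flatten the spatial grid of patches into a sequence.
              batch_size = tf.shape(images)[0]
              patch_dims = patches.shape[-1]
              return tf.reshape(patches, [batch_size, -1, patch_dims])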

  • Image classification via fine-tuning with EfficientNet

    Introduction: what is EfficientNet EfficientNet, first introduced in Tan and Le, 2019, is among the most efficient models (i.e., requiring the least FLOPS for inference) that reach state-of-the-art accuracy on both ImageNet and common image classification transfer learning tasks. The smallest base model is similar to MnasNet, which reached near-SOTA with a significantly smaller model. By introducing a heuristic…
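    A sketch of the general fine-tuning recipe with an ImageNet-pretrained EfficientNetB0 from keras.applications; the input size, `NUM_CLASSES`, optimizer settings, and loss are placeholder assumptions, not the example's exact choices.

      from tensorflow import keras
      from tensorflow.keras import layers

      NUM_CLASSES = 120  # placeholder: set to your dataset's number of classes

      # Load the pretrained backbone without its classification head.
      inputs = keras.Input(shape=(224, 224, 3))
      backbone = keras.applications.EfficientNetB0(
          include_top=False, weights="imagenet", input_tensor=inputs
      )
      backbone.trainable = False  # first stage: train only the new head

      # Attach a new classification head for the target dataset.
      x = layers.GlobalAveragePooling2D()(backbone.output)
      x = layers.Dropout(0.2)(x)
      outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
      model = keras.Model(inputs, outputs)

      model.compile(
          optimizer=keras.optimizers.Adam(learning_rate=1e-2),
          loss="categorical_crossentropy",
          metrics=["accuracy"],
      )

      # Second stage (fine-tuning): unfreeze some of the top backbone layers,
      # re-compile with a much smaller learning rate, and continue training.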