Author: Awais Farooq

  • OCR model for reading Captchas

    This example demonstrates a simple OCR model built with the Functional API. Apart from combining CNN and RNN, it also illustrates how you can instantiate a new layer and use it as an “Endpoint layer” for implementing CTC loss. For a detailed guide to layer subclassing, please check out this page in the developer guides. …

  • Point cloud segmentation with PointNet

    A “point cloud” is an important type of data structure for storing geometric shape data. Due to its irregular format, it’s often transformed into regular 3D voxel grids or collections of images before being used in deep learning applications, a step which makes the data unnecessarily large. The PointNet family of models solves this…

  • 3D volumetric rendering with NeRF

    In this example, we present a minimal implementation of the research paper NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis by Ben Mildenhall et al. The authors have proposed an ingenious way to synthesize novel views of a scene by modelling the volumetric scene function through a neural network. To help you understand this intuitively, let’s start with…

  • Monocular depth estimation

    Depth estimation is a crucial step towards inferring scene geometry from 2D images. The goal in monocular depth estimation is to predict the depth value of each pixel, or to infer depth information, given only a single RGB image as input. This example will show an approach to build a depth estimation model with a convnet and simple…

  • 3D image classification from CT scans

    This example will show the steps needed to build a 3D convolutional neural network (CNN) to predict the presence of viral pneumonia in computed tomography (CT) scans. 2D CNNs are commonly used to process RGB images (3 channels). A 3D CNN is simply the 3D equivalent: it takes as input a 3D volume or…

  • Object detection with Vision Transformers

    The article Vision Transformer (ViT) architecture by Alexey Dosovitskiy et al. demonstrates that a pure transformer applied directly to sequences of image patches can perform well on object detection tasks. In this Keras example, we implement an object detection ViT and we train it on the Caltech 101 dataset to detect an airplane in the given image. …

  • Keypoint Detection with Transfer Learning

    Keypoint detection consists of locating key object parts. For example, the key parts of our faces include nose tips, eyebrows, eye corners, and so on. These parts help to represent the underlying object in a feature-rich manner. Keypoint detection has applications that include pose estimation, face detection, etc. In this example, we will build a…

  • Object Detection with RetinaNet

    Object detection is a very important problem in computer vision. Here the model is tasked with localizing the objects present in an image, and at the same time, classifying them into different categories. Object detection models can be broadly classified into “single-stage” and “two-stage” detectors. Two-stage detectors are often more accurate but at the cost…

  • Image Segmentation using Composable Fully-Convolutional Networks

    The following example walks through the steps to implement Fully-Convolutional Networks for Image Segmentation on the Oxford-IIIT Pets dataset. The model was proposed in the paper Fully Convolutional Networks for Semantic Segmentation by Long et al. (2014). Image segmentation is one of the most common and introductory tasks when it comes to Computer Vision, where we…

  • Highly accurate boundaries segmentation using BASNet

    Deep semantic segmentation algorithms have improved a lot recently, but still fail to correctly predict pixels around object boundaries. In this example we implement the Boundary-Aware Segmentation Network (BASNet), which uses a two-stage predict-and-refine architecture and a hybrid loss to predict highly accurate boundaries and fine structures for image segmentation. …