Category: 08. Image & Text

  • Natural language image search with a Dual Encoder

    Introduction The example demonstrates how to build a dual encoder (also known as two-tower) neural network model to search for images using natural language. The model is inspired by the CLIP approach, introduced by Alec Radford et al. The idea is to train a vision encoder and a text encoder jointly to project the representation of images…

  • Image Captioning

    Setup Download the dataset We will be using the Flickr8K dataset for this tutorial. This dataset comprises over 8,000 images, that are each paired with five different captions. Preparing the dataset Vectorizing the text data We’ll use the TextVectorization layer to vectorize the text data, that is to say, to turn the original strings into integer sequences…