Author: Awais Farooq
-
California Housing price regression dataset
load_data function Loads the California Housing dataset. This dataset was obtained from the StatLib repository. It’s a continuous regression dataset with 20,640 samples with 8 features each. The target variable is a scalar: the median house value for California districts, in dollars. The 8 input features are the following: This dataset was derived from the 1990 U.S.…
-
Fashion MNIST dataset, an alternative to MNIST
load_data function Loads the Fashion-MNIST dataset. This is a dataset of 60,000 28×28 grayscale images of 10 fashion categories, along with a test set of 10,000 images. This dataset can be used as a drop-in replacement for MNIST. The classes are: Label Description 0 T-shirt/top 1 Trouser 2 Pullover 3 Dress 4 Coat 5 Sandal 6…
-
Reuters newswire classification dataset
load_data function Loads the Reuters newswire classification dataset. This is a dataset of 11,228 newswires from Reuters, labeled over 46 topics. This was originally generated by parsing and preprocessing the classic Reuters-21578 dataset, but the preprocessing code is no longer packaged with Keras. See this GitHub discussion for more info. Each newswire is encoded as a list of…
-
IMDB movie review sentiment classification dataset
load_data function Loads the IMDB dataset. This is a dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative). Reviews have been preprocessed, and each review is encoded as a list of word indexes (integers). For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer “3” encodes the 3rd…
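The frequency-based word indexing described in this excerpt can be illustrated with a small pure-Python sketch. It is not the Keras preprocessing code: the helper names `build_frequency_index` and `encode` and the toy reviews are hypothetical, and any reserved indices the real loader may use are not modeled here.

```python
from collections import Counter

def build_frequency_index(reviews):
    """Map each word to its 1-based frequency rank (1 = most frequent)."""
    counts = Counter(word for review in reviews for word in review.split())
    # Sort by descending count; break ties alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ranked, start=1)}

def encode(review, word_index):
    """Encode a review as a list of integer word indexes."""
    return [word_index[word] for word in review.split()]

# Toy corpus (hypothetical data, standing in for the real reviews).
reviews = ["a great great film", "a dull film", "a great cast"]
word_index = build_frequency_index(reviews)
encoded = [encode(review, word_index) for review in reviews]
```

Because low integers correspond to frequent words, a filter such as "keep only indexes below N" keeps the N most common words, which is the convenience this encoding is meant to provide.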
-
CIFAR100 small images classification dataset
load_data function Loads the CIFAR100 dataset. This is a dataset of 50,000 32×32 color training images and 10,000 test images, labeled over 100 fine-grained classes that are grouped into 20 coarse-grained classes. See more info at the CIFAR homepage. Arguments Returns x_train: uint8 NumPy array of color image data with shape (50000, 32, 32, 3), containing the training data. Pixel…
-
CIFAR10 small images classification dataset
load_data function Loads the CIFAR10 dataset. This is a dataset of 50,000 32×32 color training images and 10,000 test images, labeled over 10 categories. See more info at the CIFAR homepage. The classes are: Label Description 0 airplane 1 automobile 2 bird 3 cat 4 deer 5 dog 6 frog 7 horse 8 ship 9 truck Returns…
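As a small illustration of the label table in this excerpt, the integer labels can be turned back into class names with a lookup list. This is a minimal sketch: `CIFAR10_CLASSES` and `label_to_name` are hypothetical names, not part of the Keras API, and the sample labels are made up.

```python
# Class names in label order, copied from the table above (index = label).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def label_to_name(label):
    """Return the class name for an integer label in the range 0-9."""
    return CIFAR10_CLASSES[int(label)]

# Hypothetical labels, standing in for entries of the loaded label array.
sample_labels = [3, 8, 0]
names = [label_to_name(label) for label in sample_labels]
```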
-
MNIST digits classification dataset
load_data function Loads the MNIST dataset. This is a dataset of 60,000 28×28 grayscale images of the 10 digits, along with a test set of 10,000 images. More info can be found at the MNIST homepage. Arguments Returns x_train: uint8 NumPy array of grayscale image data with shape (60000, 28, 28), containing the training data. Pixel values range from 0…
-
Audio data loading
audio_dataset_from_directory function Generates a tf.data.Dataset from audio files in a directory. If main_directory contains the subdirectories class_a and class_b, each holding audio files, then calling audio_dataset_from_directory(main_directory, labels='inferred') will return a tf.data.Dataset that yields batches of audio files from the subdirectories class_a and class_b, together with labels 0 and 1 (0 corresponding to class_a and 1 corresponding to class_b). Only .wav files are supported at this time. Arguments Returns A tf.data.Dataset object. Rules regarding labels format:
-
Text data loading
text_dataset_from_directory function Generates a tf.data.Dataset from text files in a directory. If main_directory contains the subdirectories class_a and class_b, each holding text files, then calling text_dataset_from_directory(main_directory, labels='inferred') will return a tf.data.Dataset that yields batches of texts from the subdirectories class_a and class_b, together with labels 0 and 1 (0 corresponding to class_a and 1 corresponding to class_b). Only .txt files are supported at this time. Arguments Returns A tf.data.Dataset object. Rules regarding labels format:
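The idea of inferring labels from subdirectory names can be sketched with the standard library alone. This is an illustration of the mapping described above (class_a to 0, class_b to 1), not the Keras implementation: `infer_labels` and its `extension` parameter are hypothetical, and sorting the class names alphabetically is an assumption made here to get a stable label order.

```python
import tempfile
from pathlib import Path

def infer_labels(main_directory, extension=".txt"):
    """Return (class_names, samples), where samples is a list of (path, label)."""
    root = Path(main_directory)
    # Assumption: one class per subdirectory, labels assigned in sorted name order.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for label, class_name in enumerate(class_names):
        # Files with other extensions are skipped.
        for path in sorted((root / class_name).glob(f"*{extension}")):
            samples.append((path, label))
    return class_names, samples

# Build a throwaway directory tree to try the helper on.
layout = {"class_b": ["b1.txt"], "class_a": ["a1.txt", "a2.txt", "notes.md"]}
with tempfile.TemporaryDirectory() as tmp:
    for class_name, files in layout.items():
        class_dir = Path(tmp) / class_name
        class_dir.mkdir()
        for name in files:
            (class_dir / name).write_text("sample text")
    class_names, samples = infer_labels(tmp)
    labelled = [(path.name, label) for path, label in samples]
```

Passing `extension=".wav"` to the same helper would mirror the audio case, where only .wav files are picked up.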
-
Timeseries data loading
timeseries_dataset_from_array function Creates a dataset of sliding windows over a timeseries provided as an array. This function takes in a sequence of data points gathered at equal intervals, along with time series parameters such as the length of the sequences/windows, the spacing between two sequences/windows, etc., to produce batches of timeseries inputs and targets. Arguments Returns A tf.data.Dataset instance. If targets was passed, the…
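The sliding-window idea can be sketched in plain Python. This is an illustration under stated assumptions, not the Keras implementation: `sliding_windows` and `batch` are hypothetical helpers, the parameter names `sequence_length` and `sequence_stride` are borrowed from the Keras function for readability, and targets are left out because the excerpt is cut off before describing them.

```python
def sliding_windows(data, sequence_length, sequence_stride=1):
    """Return every full window of `sequence_length` items, `sequence_stride` apart."""
    last_start = len(data) - sequence_length
    return [
        list(data[start:start + sequence_length])
        for start in range(0, last_start + 1, sequence_stride)
    ]

def batch(items, batch_size):
    """Group items into consecutive batches; the last batch may be smaller."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# Ten equally spaced observations (toy data).
series = list(range(10))
windows = sliding_windows(series, sequence_length=4, sequence_stride=2)
batches = batch(windows, batch_size=3)
```

With these settings the windows start at positions 0, 2, 4 and 6, so consecutive windows overlap by two items; a stride equal to the window length would give non-overlapping windows.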