KerasNLP Models

KerasNLP contains end-to-end implementations of popular model architectures. These models can be created in two ways:

  • Through the from_preset() constructor, which instantiates an object with a pre-trained configuration, vocabulary, and (optionally) weights.
  • Through custom configuration controlled by the user.
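
For illustration, here is a minimal sketch of both paths. The from_preset() call mirrors the examples later on this page; the constructor arguments passed to BertBackbone are example values chosen for this sketch, not required or recommended settings.

import keras_nlp

# Pre-trained: configuration, vocabulary, and weights all come from the preset.
classifier = keras_nlp.models.BertClassifier.from_preset("bert_tiny_en_uncased")

# Custom configuration: a randomly initialized backbone fully controlled by the user.
backbone = keras_nlp.models.BertBackbone(
    vocabulary_size=30552,
    num_layers=2,
    num_heads=2,
    hidden_dim=128,
    intermediate_dim=512,
    max_sequence_length=128,
)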

Below, we list all presets available in the library. For more detailed usage, browse the docstring for a particular class. For an in-depth introduction to our API, see the getting started guide.

Backbone presets

The following preset names correspond to a configuration, weights, and vocabulary for a model backbone. These presets are not inference-ready and must be fine-tuned for a given task!

The names below can be used with any from_preset() constructor for a given model.

classifier = keras_nlp.models.BertClassifier.from_preset("bert_tiny_en_uncased")
backbone = keras_nlp.models.BertBackbone.from_preset("bert_tiny_en_uncased")
tokenizer = keras_nlp.models.BertTokenizer.from_preset("bert_tiny_en_uncased")
preprocessor = keras_nlp.models.BertPreprocessor.from_preset("bert_tiny_en_uncased")
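
Because backbone presets are not inference-ready, a typical workflow attaches a task model on top of the backbone and fine-tunes it on labeled data. A minimal sketch, using placeholder strings and labels rather than a real dataset:

features = ["The quick brown fox jumped.", "I forgot my homework."]
labels = [0, 1]

classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_tiny_en_uncased",
    num_classes=2,
)
classifier.fit(x=features, y=labels, batch_size=2)
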
Preset name | Model | Parameters | Description
albert_base_en_uncased | ALBERT | 11.68M | 12-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
albert_large_en_uncased | ALBERT | 17.68M | 24-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
albert_extra_large_en_uncased | ALBERT | 58.72M | 24-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
albert_extra_extra_large_en_uncased | ALBERT | 222.60M | 12-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
bart_base_en | BART | 139.42M | 6-layer BART model where case is maintained. Trained on BookCorpus, English Wikipedia and CommonCrawl. Model Card
bart_large_en | BART | 406.29M | 12-layer BART model where case is maintained. Trained on BookCorpus, English Wikipedia and CommonCrawl. Model Card
bart_large_en_cnn | BART | 406.29M | The bart_large_en backbone model fine-tuned on the CNN+DM summarization dataset. Model Card
bert_tiny_en_uncased | BERT | 4.39M | 2-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
bert_small_en_uncased | BERT | 28.76M | 4-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
bert_medium_en_uncased | BERT | 41.37M | 8-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
bert_base_en_uncased | BERT | 109.48M | 12-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
bert_base_en | BERT | 108.31M | 12-layer BERT model where case is maintained. Trained on English Wikipedia + BooksCorpus. Model Card
bert_base_zh | BERT | 102.27M | 12-layer BERT model. Trained on Chinese Wikipedia. Model Card
bert_base_multi | BERT | 177.85M | 12-layer BERT model where case is maintained. Trained on Wikipedias of 104 languages. Model Card
bert_large_en_uncased | BERT | 335.14M | 24-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
bert_large_en | BERT | 333.58M | 24-layer BERT model where case is maintained. Trained on English Wikipedia + BooksCorpus. Model Card
bloom_560m_multi | BLOOM | 559.21M | 24-layer Bloom model with hidden dimension of 1024. Trained on 45 natural languages and 12 programming languages. Model Card
bloom_1.1b_multi | BLOOM | 1.07B | 24-layer Bloom model with hidden dimension of 1536. Trained on 45 natural languages and 12 programming languages. Model Card
bloom_1.7b_multi | BLOOM | 1.72B | 24-layer Bloom model with hidden dimension of 2048. Trained on 45 natural languages and 12 programming languages. Model Card
bloom_3b_multi | BLOOM | 3.00B | 30-layer Bloom model with hidden dimension of 2560. Trained on 45 natural languages and 12 programming languages. Model Card
bloomz_560m_multi | BLOOMZ | 559.21M | 24-layer Bloom model with hidden dimension of 1024. Finetuned on the crosslingual task mixture (xP3) dataset. Model Card
bloomz_1.1b_multi | BLOOMZ | 1.07B | 24-layer Bloom model with hidden dimension of 1536. Finetuned on the crosslingual task mixture (xP3) dataset. Model Card
bloomz_1.7b_multi | BLOOMZ | 1.72B | 24-layer Bloom model with hidden dimension of 2048. Finetuned on the crosslingual task mixture (xP3) dataset. Model Card
bloomz_3b_multi | BLOOMZ | 3.00B | 30-layer Bloom model with hidden dimension of 2560. Finetuned on the crosslingual task mixture (xP3) dataset. Model Card
deberta_v3_extra_small_en | DeBERTaV3 | 70.68M | 12-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. Model Card
deberta_v3_small_en | DeBERTaV3 | 141.30M | 6-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. Model Card
deberta_v3_base_en | DeBERTaV3 | 183.83M | 12-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. Model Card
deberta_v3_large_en | DeBERTaV3 | 434.01M | 24-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. Model Card
deberta_v3_base_multi | DeBERTaV3 | 278.22M | 12-layer DeBERTaV3 model where case is maintained. Trained on the 2.5TB multilingual CC100 dataset. Model Card
distil_bert_base_en_uncased | DistilBERT | 66.36M | 6-layer DistilBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus using BERT as the teacher model. Model Card
distil_bert_base_en | DistilBERT | 65.19M | 6-layer DistilBERT model where case is maintained. Trained on English Wikipedia + BooksCorpus using BERT as the teacher model. Model Card
distil_bert_base_multi | DistilBERT | 134.73M | 6-layer DistilBERT model where case is maintained. Trained on Wikipedias of 104 languages. Model Card
electra_small_discriminator_uncased_en | ELECTRA | 13.55M | 12-layer small ELECTRA discriminator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
electra_small_generator_uncased_en | ELECTRA | 13.55M | 12-layer small ELECTRA generator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
electra_base_discriminator_uncased_en | ELECTRA | 109.48M | 12-layer base ELECTRA discriminator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
electra_base_generator_uncased_en | ELECTRA | 33.58M | 12-layer base ELECTRA generator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
electra_large_discriminator_uncased_en | ELECTRA | 335.14M | 24-layer large ELECTRA discriminator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
electra_large_generator_uncased_en | ELECTRA | 51.07M | 24-layer large ELECTRA generator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
f_net_base_en | FNet | 82.86M | 12-layer FNet model where case is maintained. Trained on the C4 dataset. Model Card
f_net_large_en | FNet | 236.95M | 24-layer FNet model where case is maintained. Trained on the C4 dataset. Model Card
falcon_refinedweb_1b_en | Falcon | 1.31B | 24-layer Falcon model (Falcon with 1B parameters), trained on 350B tokens of the RefinedWeb dataset. Model Card
gemma_2b_en | Gemma | 2.51B | 2 billion parameter, 18-layer, base Gemma model. Model Card
gemma_instruct_2b_en | Gemma | 2.51B | 2 billion parameter, 18-layer, instruction tuned Gemma model. Model Card
gemma_1.1_instruct_2b_en | Gemma | 2.51B | 2 billion parameter, 18-layer, instruction tuned Gemma model. The 1.1 update improves model quality. Model Card
code_gemma_1.1_2b_en | Gemma | 2.51B | 2 billion parameter, 18-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion. The 1.1 update improves model quality. Model Card
code_gemma_2b_en | Gemma | 2.51B | 2 billion parameter, 18-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion. Model Card
gemma_7b_en | Gemma | 8.54B | 7 billion parameter, 28-layer, base Gemma model. Model Card
gemma_instruct_7b_en | Gemma | 8.54B | 7 billion parameter, 28-layer, instruction tuned Gemma model. Model Card
gemma_1.1_instruct_7b_en | Gemma | 8.54B | 7 billion parameter, 28-layer, instruction tuned Gemma model. The 1.1 update improves model quality. Model Card
code_gemma_7b_en | Gemma | 8.54B | 7 billion parameter, 28-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion. Model Card
code_gemma_instruct_7b_en | Gemma | 8.54B | 7 billion parameter, 28-layer, instruction tuned CodeGemma model. This model has been trained for chat use cases related to code. Model Card
code_gemma_1.1_instruct_7b_en | Gemma | 8.54B | 7 billion parameter, 28-layer, instruction tuned CodeGemma model. This model has been trained for chat use cases related to code. The 1.1 update improves model quality. Model Card
gpt2_base_en | GPT-2 | 124.44M | 12-layer GPT-2 model where case is maintained. Trained on WebText. Model Card
gpt2_medium_en | GPT-2 | 354.82M | 24-layer GPT-2 model where case is maintained. Trained on WebText. Model Card
gpt2_large_en | GPT-2 | 774.03M | 36-layer GPT-2 model where case is maintained. Trained on WebText. Model Card
gpt2_extra_large_en | GPT-2 | 1.56B | 48-layer GPT-2 model where case is maintained. Trained on WebText. Model Card
gpt2_base_en_cnn_dailymail | GPT-2 | 124.44M | 12-layer GPT-2 model where case is maintained. Finetuned on the CNN/DailyMail summarization dataset.
llama3_8b_en | LLaMA 3 | 8.03B | LLaMA 3 8B Base model. Model Card
llama3_instruct_8b_en | LLaMA 3 | 8.03B | LLaMA 3 8B Instruct model. Model Card
llama2_7b_en | LLaMA 2 | 6.74B | LLaMA 2 7B Base model. Model Card
llama2_instruct_7b_en | LLaMA 2 | 6.74B | LLaMA 2 7B Chat model. Model Card
mistral_7b_en | Mistral | 7.24B | Mistral 7B base model. Model Card
mistral_instruct_7b_en | Mistral | 7.24B | Mistral 7B instruct model. Model Card
mistral_0.2_instruct_7b_en | Mistral | 7.24B | Mistral 7B instruct Version 0.2 model. Model Card
opt_125m_en | OPT | 125.24M | 12-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. Model Card
opt_1.3b_en | OPT | 1.32B | 24-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. Model Card
opt_2.7b_en | OPT | 2.70B | 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. Model Card
opt_6.7b_en | OPT | 6.70B | 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. Model Card
pali_gemma_3b_mix_224 | PaliGemma | 2.92B | Image size 224, mix fine-tuned, text sequence length is 256. Model Card
pali_gemma_3b_mix_448 | PaliGemma | 2.92B | Image size 448, mix fine-tuned, text sequence length is 512. Model Card
pali_gemma_3b_224 | PaliGemma | 2.92B | Image size 224, pre-trained, text sequence length is 128. Model Card
pali_gemma_3b_448 | PaliGemma | 2.92B | Image size 448, pre-trained, text sequence length is 512. Model Card
pali_gemma_3b_896 | PaliGemma | 2.93B | Image size 896, pre-trained, text sequence length is 512. Model Card
phi3_mini_4k_instruct_en | Phi-3 | 3.82B | 3.8 billion parameter, 32-layer, 4k context length, Phi-3 model. The model was trained using the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. Model Card
phi3_mini_128k_instruct_en | Phi-3 | 3.82B | 3.8 billion parameter, 32-layer, 128k context length, Phi-3 model. The model was trained using the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. Model Card
roberta_base_en | RoBERTa | 124.05M | 12-layer RoBERTa model where case is maintained. Trained on English Wikipedia, BooksCorpus, CommonCrawl, and OpenWebText. Model Card
roberta_large_en | RoBERTa | 354.31M | 24-layer RoBERTa model where case is maintained. Trained on English Wikipedia, BooksCorpus, CommonCrawl, and OpenWebText. Model Card
xlm_roberta_base_multi | XLM-RoBERTa | 277.45M | 12-layer XLM-RoBERTa model where case is maintained. Trained on CommonCrawl in 100 languages. Model Card
xlm_roberta_large_multi | XLM-RoBERTa | 558.84M | 24-layer XLM-RoBERTa model where case is maintained. Trained on CommonCrawl in 100 languages. Model Card

Note: The links provided lead to the model card, or to the official README if no model card has been provided by the author.

Classification presets

The following preset names correspond to a configuration, weights, and vocabulary for a model classifier. These models are inference-ready, but can be further fine-tuned if desired.

The names below can be used with the from_preset() constructor for classifier models and preprocessing layers.

classifier = keras_nlp.models.BertClassifier.from_preset("bert_tiny_en_uncased_sst2")
tokenizer = keras_nlp.models.BertTokenizer.from_preset("bert_tiny_en_uncased_sst2")
preprocessor = keras_nlp.models.BertPreprocessor.from_preset("bert_tiny_en_uncased_sst2")
Preset name | Model | Parameters | Description
bert_tiny_en_uncased_sst2 | BERT | 4.39M | The bert_tiny_en_uncased backbone model fine-tuned on the SST-2 sentiment analysis dataset.

Note: The links provided lead to the model card, or to the official README if no model card has been provided by the author.
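
Since these presets are inference-ready, a classifier can be used for prediction directly after loading, with no further training. A minimal sketch with placeholder input strings:

classifier = keras_nlp.models.BertClassifier.from_preset("bert_tiny_en_uncased_sst2")
classifier.predict(["What an amazing movie!", "A total waste of my time."])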

