Hugging Face is an AI community that promotes open-source contributions. It serves as a hub of open-source models for natural language processing, computer vision, and other fields where AI is applied. Even tech giants such as Google, Facebook, AWS, and Microsoft use its models, datasets, and libraries.
Hugging Face provides state-of-the-art pre-trained models for a wide range of tasks. At the time of writing this article (August 2022), the Hub hosted tens of thousands of them.
Hugging Face is best known for its contributions to the NLP domain. Common NLP tasks include:
Text classification
Text generation
Translation
Summarization
Fill-mask
Question-Answering
Zero-shot classification
Sentence similarity
The computer vision tasks are as follows:
Image classification
Image segmentation
Object detection
The audio tasks are as follows:
Text-to-speech
Automatic speech recognition
Audio classification
Hugging Face's Transformers library allows us to use these models in a way that abstracts away unnecessary details.
Hugging Face also hosts thousands of datasets. Its Datasets library lets us load these datasets, as well as our own, and provides the most commonly used operations for processing them, such as shuffling, sampling, and filtering. Backed by Apache Arrow, it also allows us to work with datasets that are larger than available memory.
Here, we use the Transformers library with a pre-trained model to generate predictions for a missing word.
from transformers import pipeline

# specifying the pipeline
bert_unmasker = pipeline('fill-mask', model="bert-base-uncased")
text = "I have to wake up in the morning and [MASK] a doctor"
result = bert_unmasker(text)
for r in result:
    print(r)
Line 4: We use pipeline to automatically configure a pipeline for our task, which is denoted as fill-mask. We have specified the bert-base-uncased model.
Line 5: The string variable text will be the input to our pipeline. Notice that we have placed a [MASK] token where we want the model to generate the actual word.
Line 6: To get the output from the model, we simply need to call the pipeline with the input.
Line 7–8: The output of the pipeline is in the form of a list of suggestions. Here we've used a loop to print them.
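Each suggestion in that list is a dictionary with the keys score, token, token_str, and sequence. The sketch below uses a hand-written suggestion list (the words and scores are made up, not model output) to show how such a result can be processed.

```python
# Hypothetical output in the shape returned by a fill-mask pipeline;
# the words and scores here are invented for illustration
result = [
    {"score": 0.42, "token": 2156, "token_str": "see",
     "sequence": "i have to wake up in the morning and see a doctor"},
    {"score": 0.17, "token": 2655, "token_str": "call",
     "sequence": "i have to wake up in the morning and call a doctor"},
]

# Pick the highest-scoring suggestion
best = max(result, key=lambda r: r["score"])
print(best["token_str"], best["score"])
```

The score field is the model's probability for that word, so sorting or taking the maximum by score ranks the suggestions.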
As we can see, by using pipeline we've abstracted away a lot of unnecessary details. The process is similar for other tasks as well.