Learn to answer questions automatically using the Hugging Face Inference API.

Most of us have encountered reading comprehension problems built around a passage like the one below.

Passage: "The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is within Brazil, with 60% of the rainforest, followed by Peru with 13%…"

Question: Which country has the lion’s share in the Amazon rainforest?

These problems can be tricky and often require a careful reading of the passage before answering.

It’s remarkable how well transformers have enabled AI to automate paragraph comprehension. This is useful for tasks such as searching documents for specific information or building a conversational chatbot. Inference requires supplying both the question and the context.

Answering questions using the API

The deepset/minilm-uncased-squad2 model is recommended for question-answering tasks. However, many other models are available for this task, and some commonly used ones are listed below:

Models for Answering Questions




Based on microsoft/MiniLM-L12-H384-uncased and fine-tuned on the SQuAD2.0 dataset by the deepset team for question answering over industry-specific language.


Based on bert-large-uncased and fine-tuned on the SQuAD dataset. BERT models are pretrained on large English-language datasets. The model finds answers in the text using the masking technique.


Based on roberta-base and fine-tuned on the SQuAD2.0 dataset. Designed for question-answering tasks in English.


Built by the Hugging Face team and based on DistilBERT. Fine-tuned on the SQuAD1.1 dataset for question-answering tasks.

We can call the following endpoint with a POST request, replacing the path parameter {model} with any of the models mentioned above: {model}
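As a minimal sketch of how the {model} path parameter is filled in, the helper below builds an endpoint URL from a model name. The base URL `https://api-inference.huggingface.co/models/` and the function name `buildEndpointUrl` are assumptions made for illustration, not taken from this lesson, so check the current Hugging Face documentation for the address that applies to your account.

```javascript
// Assumed base URL of the hosted Inference API (not stated in this lesson).
const BASE_URL = "https://api-inference.huggingface.co/models/";

// Hypothetical helper: replaces the {model} path parameter with a model ID.
function buildEndpointUrl(model) {
  if (typeof model !== "string" || model.trim() === "") {
    throw new Error("A model ID such as 'deepset/minilm-uncased-squad2' is required");
  }
  return BASE_URL + model.trim();
}

console.log(buildEndpointUrl("deepset/minilm-uncased-squad2"));
```

Keeping the model name out of the URL string makes it easy to try the other models in the list without touching the rest of the request code.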

Request parameters

The request parameters for this API call are as follows:








  • inputs.question: Specifies the question to be answered from the context.

  • inputs.context: Specifies the context that is searched to find the answer to the question.

  • options.use_cache: The Hugging Face Inference API has a cache mechanism to speed up requests. Use it for deterministic models. The default value is true.

  • options.wait_for_model: Hugging Face Inference API models take time to load and process requests. If the value is true, the API waits for the model to get ready instead of returning an error. The default value is false.
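The request parameters described above could be assembled as in the following sketch. The function name `buildRequestBody` is hypothetical, and the option name `use_cache` is assumed from the Hugging Face documentation rather than from this lesson's code (`wait_for_model` does appear in the lesson's code). The defaults mirror the ones stated above.

```javascript
// Hypothetical helper: builds the JSON body for a question-answering request.
function buildRequestBody(question, context, { useCache = true, waitForModel = false } = {}) {
  if (!question || !context) {
    throw new Error("Both a question and a context are required");
  }
  return JSON.stringify({
    inputs: { question, context },
    // Option names assumed to be use_cache and wait_for_model.
    options: { use_cache: useCache, wait_for_model: waitForModel }
  });
}

const body = buildRequestBody(
  "Which country has most of the Amazon rainforest?",
  "The majority of the forest is within Brazil, with 60% of the rainforest.",
  { waitForModel: true }
);
console.log(body);
```

Rejecting an empty question or context before sending the request avoids spending an API call on input the model cannot answer.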

The code below finds the answer to the question from the context text.

// Endpoint URL (base URL assumed; the model is deepset/minilm-uncased-squad2)
const endpointUrl = "https://api-inference.huggingface.co/models/deepset/minilm-uncased-squad2";

const headerParameters = {
  "Authorization": "Bearer {{ACCESS_TOKEN}}"
};

const passage = "THE TIME TRAVELLER (for so it will be convenient to speak of him) \
was expounding a recondite matter to us. \
His grey eyes shone and twinkled, and his usually pale face was flushed \
and animated. The fire burned brightly, and the soft radiance of the \
incandescent lights in the lilies of silver caught the bubbles \
that flashed and passed in our glasses. \
Our chairs, being his patents, embraced and caressed us rather than submitted \
to be sat upon, and there was that luxurious after-dinner atmosphere \
when thought runs gracefully free of the trammels of precision. \
And he put it to us in this way—marking the points with a lean forefinger—as \
we sat and lazily admired his earnestness over this new \
paradox (as we thought it) and his fecundity.";

const query = "What was the colour of the time traveller's eyes?";

// Question and context to send to the model
const data = JSON.stringify({
  "inputs": {
    "question": query,
    "context": passage
  },
  options: {
    wait_for_model: true
  }
});

const options = {
  method: "POST",
  headers: headerParameters,
  body: data
};

async function questionAnswer() {
  try {
    const response = await fetch(endpointUrl, options);
    console.log(await response.json()); // print the parsed JSON response
  } catch (error) {
    console.error(error); // network failure or invalid JSON
  }
}

questionAnswer();

Let’s have a look at the highlighted lines shown in the code widget above:

  • Line 2: We specify the deepset/minilm-uncased-squad2 model for the question-answer task.

  • Lines 24–32: We set the inputs.question with query and inputs.context with passage declared above.

  • Lines 40–47: We create a function, questionAnswer, to make the API call and handle the exceptions.

  • Line 49: We call the questionAnswer function to invoke the endpoint.
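One detail worth keeping in mind about exception handling: `fetch` rejects only on network failures, so an HTTP error status (for example, while a model is still loading) does not reach a `catch` block on its own. The sketch below shows one way to turn such a status into an error. The helper name `unwrapResult` is hypothetical, and the assumption that error responses carry an `error` field is not confirmed by this lesson.

```javascript
// Hypothetical helper: returns the parsed body for a successful status,
// otherwise throws an Error that includes the status code.
function unwrapResult(status, body) {
  if (status >= 200 && status < 300) {
    return body;
  }
  // Assumption: error responses carry a human-readable "error" field.
  const message = body && body.error ? body.error : "Unknown error";
  throw new Error(`Request failed with status ${status}: ${message}`);
}

// Usage with made-up values; inside questionAnswer this could be
// unwrapResult(response.status, await response.json()).
try {
  unwrapResult(503, { error: "Model is currently loading" });
} catch (error) {
  console.error(error.message);
}
```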

Response fields

The API call above returns a dictionary object or a list of dictionary objects, depending on the inputs. The response contains the following fields.






  • answer: The answer to the question, taken from the context.

  • score: Specifies the likelihood that the answer is correct.

  • start: The starting index of the answer within the context.

  • end: The ending index of the answer within the context.
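The response fields can be put to use as in the sketch below, which handles both response shapes (a single object or a list) and uses the two indices to mark the answer inside the context. The field names `answer`, `score`, `start`, and `end`, the assumption that the ending index is exclusive, and the helper names are all assumptions for illustration. The sample response is made up, not real API output.

```javascript
// Hypothetical helper: accepts a single result object or a list of them
// and returns a list, so callers can treat both shapes the same way.
function toResultList(response) {
  return Array.isArray(response) ? response : [response];
}

// Hypothetical helper: marks the answer span inside the context using the
// start and end indices (end is assumed to be exclusive).
function highlightAnswer(context, result) {
  const before = context.slice(0, result.start);
  const span = context.slice(result.start, result.end);
  const after = context.slice(result.end);
  return `${before}[${span}]${after}`;
}

// Made-up response for illustration; not real API output.
const sampleContext = "His grey eyes shone and twinkled.";
const sampleResponse = { answer: "grey", score: 0.9, start: 4, end: 8 };

for (const result of toResultList(sampleResponse)) {
  console.log(highlightAnswer(sampleContext, result));
  console.log(`score: ${result.score}`);
}
```

Comparing the highlighted span with the `answer` field is a quick way to confirm how the indices are interpreted for a given model.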


Now, ask the model the following questions in the widget above for the same passage:

// query#1
query = "Describe the face of the time traveller.";
// query#2
query = "What was our reaction to the time traveller?";