Building a Chatbot Using a Small Language Model (SLM)

Learn how to run a small language model with Ollama to power a Gradio-based chatbot.

Looking back at our original goal of creating an educational chatbot, doing so with a framework like Rasa seems like a challenging task (it is!). Thanks to generative AI, large language models (LLMs), artificial intelligence systems that can understand and generate human language, are now much more capable.

Generative AI for chatbots

Generative AI is a type of artificial intelligence that can generate new content, such as text, images, or audio. It works by learning from vast amounts of data and then using that knowledge to create something new. Generative AI has played a significant role in advancing language models. The advent of larger and more capable language models has drastically changed how we create chatbots.

  • Improved natural language understanding: Generative AI models can better understand and interpret human language, allowing chatbots to provide more accurate and relevant responses.

  • Content creation: Generative AI can produce creative and informative responses, making conversations more engaging and interesting.

  • Learning and adaptation: Generative AI models can continuously learn from interactions, improving their ability to provide relevant and helpful information over time.

Adding generative AI to chatbots allows for more natural, engaging, and personalized interactions. We have talked a lot about how generative AI is set to change the world; let’s see how we can use it.

Running a language model

LLMs, a specialized subset of generative AI, are our primary focus for chatbot development. These come in various shapes and sizes and are often customized for different applications. While LLMs with hundreds of billions of parameters might not be easy to run on a home computer, we can easily run Small Language Models (SLMs) on consumer hardware. Since we were running Rasa on our own virtual machine, let’s compare our experience with an SLM running locally.

In this course, running locally refers to running the model on a virtual machine provided by Educative.

There are a few very small language models that can run in less than 8 GB of RAM. Seven billion parameters is a common small size for most models; however, we can go even smaller: the model we will be using has just half a billion parameters. Running a language model might seem like a complex and tedious task; however, thanks to Ollama, it can be done with just a single terminal command.
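For instance, once Ollama is installed (we will do that shortly), a single command like the following downloads the model on first use and starts an interactive chat session. The qwen2:0.5b tag refers to the half-billion-parameter Qwen2 model in the Ollama model library:

```bash
# Pull (on first use) and chat with the 0.5B-parameter Qwen2 model
ollama run qwen2:0.5b
```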

Ollama is a lightweight, open-source framework for running LLMs locally on your machine. It allows you to run, build, and experiment with various models. Model weights, configuration, and data are bundled into a single package defined by a Modelfile, which allows us to easily access and use models. Ollama also provides a REST API that can be used to serve the models.
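As a minimal sketch of how that REST API can be used from Python, assuming Ollama is running on its default port (11434) and the qwen2:0.5b model has already been pulled:

```python
import requests

# Send a one-off prompt to Ollama's /api/generate endpoint.
# Ollama listens on http://localhost:11434 by default.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2:0.5b",  # any model you have pulled locally
        "prompt": "Explain photosynthesis in one sentence.",
        "stream": False,  # wait for the full response instead of streaming
    },
)

# The generated text is returned in the "response" field.
print(response.json()["response"])
```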

Using Qwen2 with Ollama

Ollama can be installed on Linux machines with this command:
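```bash
# Official Ollama install script for Linux
curl -fsSL https://ollama.com/install.sh | sh
```

Once the script finishes, the ollama command becomes available in the terminal, and we can pull and run models.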
