Exploring Open-Source LLMs
Explore the difference between open-source and closed-source LLMs.
Introduction to AI democratization
The concept of AI democratization represents a shift in the landscape of technological innovation: making advanced AI technologies available to a wider base of users beyond large tech corporations and specialized research institutions. This movement is especially important in the realm of large language models because of the complexity and resource requirements of developing and training these models. AI democratization seeks to empower individuals, startups, and academic researchers with the tools they need to explore, innovate, and contribute to the field of AI.
Open-source LLMs are at the heart of this movement, serving as a catalyst for innovation and collaboration. By providing access to pretrained models and their source code, open-source initiatives foster an environment where knowledge and resources are shared freely. This approach not only reduces the financial obstacles associated with AI development and training but also promotes transparency in AI research and applications. As a result, open-source LLMs facilitate AI advancements in many sectors, such as healthcare, education, and environmental science.
Closed-source vs. open-source LLMs
Open source refers to the practice of sharing the original source code of software with the public, allowing anyone to inspect, modify, or enhance it. This openness is essential for collaboration, because collaborative development can lead to more reliable, secure, and efficient products. What makes open-source software open is the license under which it is released, which dictates the usage, permissions, and distribution rights. Building open-source software or models means that resources such as the source code, model architecture, and documentation are freely available to the public. The open-source initiative emerged as a countermeasure to the constraints of proprietary software, encouraging freedom in the development and use of software and technology. This has led to the creation of communities and organizations that support open-source initiatives, such as the Apache Software Foundation.
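Because the license determines what you may do with a model, it is worth checking before adopting one. As a minimal sketch, the license of any model hosted on Hugging Face can be inspected programmatically with the `huggingface_hub` library; the `gpt2` checkpoint below is used purely as an example:

```python
from huggingface_hub import model_info

# Fetch the public metadata of a model hosted on Hugging Face.
info = model_info("gpt2")

# The license is exposed as a "license:<id>" tag on the model card.
license_tags = [tag for tag in info.tags if tag.startswith("license:")]
print(license_tags)  # e.g., ['license:mit']
```

The same call works for any public model ID, which makes it easy to audit licensing before building on a model.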
On the other hand, closed-source LLMs are proprietary models that can only be accessed through online platforms or APIs. The source code is not shared with the wider audience, the algorithms used are not revealed, and the training data that was utilized is not detailed. Closed-source LLMs are paid models, billed either per token or through a subscription.
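This difference in access is easiest to see in code. The sketch below contrasts the two styles, assuming the `openai` client for the closed-source call and the `transformers` library for the open-source one; the prompts and model choices are illustrative only:

```python
# Closed-source: the model runs on the provider's servers; we only see an API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is AI democratization?"}],
)
print(response.choices[0].message.content)

# Open-source: the weights are downloaded and run locally.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("AI democratization means", max_new_tokens=40)[0]["generated_text"])
```

With the closed-source model, we pay per token and never see the weights; with the open-source one, we bear the compute cost ourselves but can inspect, fine-tune, and redistribute the model within its license terms.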
A comparison of LLMs
| LLM | Released | Maintainer | License | Accessible via | Architecture | Params (billions) | Context length (tokens) |
|---|---|---|---|---|---|---|---|
| AutoGPT | Mar 2023 | Significant Gravitas | MIT | GitHub | Decoder | 175–1,000 | 8,192 |
| BERT | Oct 2018 | Google | Apache 2.0 | Google Cloud | Encoder | 0.34 | 512 |
| BLOOMChat | May 2023 | SambaNova & Together Computer | BLOOMChat-176B License v1.0 | Hugging Face | Decoder | 176 | NA |
| Cerebras-GPT | Mar 2023 | Cerebras | Apache 2.0 | Hugging Face | Decoder | 0.111–13 | 2,048 |
| Claude | Mar 2023 | Anthropic | Proprietary | Anthropic | NA | NA | 100,000 |
| DLite (v2) | May 2023 | AI Squared | Apache 2.0 | GitHub, Hugging Face | NA | 0.124–1.5 | 1,024 |
| Dolly 2.0 | Apr 2023 | Databricks | Apache 2.0 | Hugging Face | NA | NA | 2,048 |
| Falcon-40B | May 2023 | Technology Innovation Institute (TII) | TII Falcon LLM License | Hugging Face | Decoder | 40 | 2,048 |
| Falcon-180B | Sep 2023 | Technology Innovation Institute (TII) | Falcon-180B TII License | Hugging Face | Decoder | 180 | 3,500 |
| FastChat-T5 | Apr 2023 | LMSYS | Apache 2.0 | GitHub, Hugging Face | NA | 3 | 512 |
| FinLLM | Jun 2023 | AI4Finance Foundation | MIT | GitHub (FinGPT) & GitHub (FinNLP) | NA | NA | NA |
| GPT-3.5-Turbo | Aug 2023 | OpenAI | Proprietary | OpenAI API | Decoder | 154 | 4,096 |
| GPT-J-6B | Jun 2021 | EleutherAI | Apache 2.0 | Hugging Face | NA | 6 | 2,048 |
| GPT-2 | Feb 2019 | OpenAI | MIT | GitHub, Hugging Face | Decoder | 0.117–1.542 | 1,024 |
| GPT-3 | May 2020 | OpenAI | Proprietary | OpenAI API | Decoder | 175 | 4,096 |
| GPT-4 | Mar 2023 | OpenAI | Proprietary | OpenAI API | Decoder | > 1,000 | 8,192 |
| GPT4All-J | Jun 2023 | Nomic AI | Apache 2.0 | Hugging Face | NA | 6 | NA |
| h2oGPT | May 2023 | H2O.ai | Apache 2.0 | GitHub, chatbot (Hugging Face) | NA | NA | 256 & 2,048 |
| LLaMA | Feb 2023 | Meta | Noncommercial research license / Code released under GPL 3.0 | Meta AI | Decoder | NA | 2,048 |
| Llama 2 | Jul 2023 | Meta | Llama 2 Community License | Meta AI | NA | 7, 13, 70 | 4,096 |
| Megatron-LM | Oct 2019 | NVIDIA | Megatron-LM license | GitHub, Hugging Face | NA | 8.3 | 1,024 |
| MPT-7B | May 2023 | MosaicML | Apache 2.0 | Hugging Face | NA | 6.7 | 65,000 |
| OpenLLaMA | May 2023 | UC Berkeley | Apache 2.0 | GitHub, Hugging Face | Decoder | 3, 7, 13 | 2,048 |
| Palmyra Base | Jan 2023 | Writer | Apache 2.0 | Hugging Face | Decoder | 5 | 2,048 |
| Pythia | Apr 2023 | EleutherAI | Apache 2.0 | GitHub, Hugging Face | Decoder | 0.07–12 | 2,048 |
| RedPajama-INCITE | May 2023 | together.ai | Apache 2.0 | Hugging Face | NA | 3, 7 | NA |
| RoBERTa | Oct 2019 | Meta | GNU General Public License v2.0 | Hugging Face | Encoder | 0.125 | 512 |
| StableLM | Apr 2023 | Stability AI | CC BY-SA 4.0 / Code released under Apache 2.0 | GitHub, Hugging Face | NA | NA | 4,096 |
| T5 | Oct 2019 | Google | Apache 2.0 | GitHub, Hugging Face | Encoder-decoder | 11 | 512 |
| UL2 | Oct 2022 | Google | Apache 2.0 | GitHub, Hugging Face | Encoder-decoder | 20 | 512 & 2,048 |
| Vicuna-13B | Mar 2023 | LMSYS | GNU General Public License v3.0 / Code released under Apache 2.0 | LMSYS Org, chatbot (Hugging Face), GitHub | NA | 13 | 2,048 |
It is no secret that commercial, closed-source models dominate the market today; only a handful of open-source models come close to the quality of commercial models such as OpenAI's GPT family, as the sheer number of closed-source models in production and their market performance show. However, the open-source community is working hard to bridge this gap by pooling resources, such as datasets and shared computing power, to train models that can compete with their commercial counterparts in terms of quality.
Cost implications of LLM adoption
At first glance, these closed-source models seem cheap, costing fractions of a cent per 1,000 tokens. However, as soon as we scale our applications in production and start serving thousands of users, we realize that the cost of these models adds up quickly.
Let’s estimate the cost of utilizing closed-source models from the number of users, the cost per token, and the pattern of usage per day. For example, we can take the cost per token for one of the premium models. On average, the input costs around 0.005 dollars per 1,000 tokens, and the output costs ...
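To make this concrete, here is a minimal back-of-the-envelope sketch in Python. The input price is taken from the example above; the output price, user count, and per-request token counts are illustrative assumptions, not published figures:

```python
# Back-of-the-envelope cost estimate for a closed-source LLM API.
# All values below except the input price are illustrative assumptions.

input_price_per_1k = 0.005     # USD per 1,000 input tokens (from the example above)
output_price_per_1k = 0.015    # USD per 1,000 output tokens (assumed)

users = 10_000                 # assumed daily active users
requests_per_user_per_day = 5  # assumed usage pattern
input_tokens_per_request = 500
output_tokens_per_request = 300

daily_requests = users * requests_per_user_per_day
daily_cost = daily_requests * (
    input_tokens_per_request / 1_000 * input_price_per_1k
    + output_tokens_per_request / 1_000 * output_price_per_1k
)

print(f"Daily cost:   ${daily_cost:,.2f}")    # $350.00 under these assumptions
print(f"Monthly cost: ${daily_cost * 30:,.2f}")  # $10,500.00 under these assumptions
```

Even with these modest assumptions, this hypothetical workload costs hundreds of dollars a day, which shows how per-token pricing that looks negligible in a demo becomes a major line item at production scale.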