Chunking: Entity and Relationship Extraction with LLMs at Scale
Explore strategies to overcome token limits in large language models by splitting large texts into manageable chunks. Understand how to extract entities and relationships efficiently at scale, ensuring context preservation and accurate knowledge graph construction. This lesson guides you through practical chunking implementations to enhance large-scale text processing.
Understanding the problem: Token limit in LLMs
Tokens are the units of text that LLMs use to break down and analyze input. A token can represent an individual word, a sequence of characters, or a combination of words and punctuation. For example, a short word like "chatbot" may map to one or two tokens, and a sentence like "OpenAI is amazing!" is typically split into several tokens, depending on how the model's tokenizer processes it. Every large language model has a limit on the maximum number of tokens it can process in a single request.
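To see tokenization in practice, here is a minimal sketch using OpenAI's tiktoken library (one common way to count tokens client-side; the exact split depends on the tokenizer, so treat the output as illustrative):

```python
import tiktoken

# Load the tokenizer that GPT-4 uses.
encoding = tiktoken.encoding_for_model("gpt-4")

text = "OpenAI is amazing!"
token_ids = encoding.encode(text)

# Show each token as the text fragment it covers, and the total count.
print([encoding.decode([tid]) for tid in token_ids])
print(f"{len(token_ids)} tokens")
```

Counting tokens like this before sending a request is what makes the budgeting described in the next paragraphs possible.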
Let's take the GPT-4 (8k) model that we are using in our code as an example. The token limit for this model is 8,192 tokens, and this number includes both input tokens and output tokens.
Input tokens include the prompt, which consists of system- and user-level messages. In our case, the user-level message also includes the raw text.
Output tokens include the response generated by the model.
If we send an input prompt that is 5,000 tokens long, the model has only 3,192 tokens left for generating a response, because the sum of input and output tokens cannot exceed the limit. If we don't account for the output token size, the model might truncate the response, or fail to generate one at all, because the total token limit is exceeded. We can either shorten the input or split it into smaller chunks that fit within the limit, which is the approach this lesson focuses on.
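As a concrete sketch of that budgeting, assuming tiktoken is available (the constant MODEL_TOKEN_LIMIT and the helper remaining_output_tokens are our own illustrative names, not part of any API):

```python
import tiktoken

MODEL_TOKEN_LIMIT = 8192  # GPT-4 (8k): input and output tokens combined

encoding = tiktoken.encoding_for_model("gpt-4")

def remaining_output_tokens(prompt: str) -> int:
    """Tokens left for the model's response after counting the prompt.

    Note: chat APIs add a few formatting tokens per message, so this
    is a close approximation rather than an exact count.
    """
    input_tokens = len(encoding.encode(prompt))
    return MODEL_TOKEN_LIMIT - input_tokens

prompt = "System message, user message, and the raw text to analyze..."
budget = remaining_output_tokens(prompt)
print(f"Tokens available for the response: {budget}")

# If too little room is left for a useful response, the raw text must
# be shortened or split into chunks before being sent to the model.
if budget < 1000:  # illustrative threshold, not a fixed rule
    print("Prompt too long; chunk the input text first.")
```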