...
Chunking: Entity and Relationship Extraction with LLMs at Scale
Learn how to implement chunking for entity and relationship extraction over large texts.
Understanding the problem: Token limit in LLMs
Tokens are units of text that LLMs use to break down and analyze input. These tokens can represent individual words, sequences of characters, or combinations of words and punctuation. For example, the word "chatbot" would be treated as a single token. A sentence like "OpenAI is amazing!" might be split into multiple tokens depending on how the model processes it. Each large language model has a limit on the maximum number of tokens it can process in a single request.
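Because exact token counts depend on the model's tokenizer (libraries such as OpenAI's tiktoken give precise counts), a quick way to reason about budgets is the common rule of thumb of roughly four characters per token for English text. The sketch below uses that approximation; the function name and heuristic are illustrative, not part of any official API.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token rule of
    thumb for English text. For exact counts, use the model's actual
    tokenizer (e.g. the tiktoken library for OpenAI models)."""
    return max(1, len(text) // 4)

print(estimate_tokens("chatbot"))             # short word: ~1 token
print(estimate_tokens("OpenAI is amazing!"))  # short sentence: a few tokens
```

This is only a planning heuristic; always verify against the real tokenizer before relying on a budget in production.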
Let's take the GPT-4 (8k) model, which we use in our code examples. Its token limit is 8,192 tokens, and this number covers both input and output tokens.
Input tokens cover the prompt, which consists of the system and user messages. In our case, the user message also includes the raw text to be processed.
Output tokens cover the response generated by the model.
If we send an input prompt that is 5,000 tokens long, the model has only 3,192 tokens left for its response, since the sum of input and output tokens cannot exceed the limit. If we don't budget for the output tokens, the model may truncate its response, or fail to respond at all once the total limit is exceeded. We can either shorten the ...
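The budget arithmetic above can be sketched as a small helper. The function name and the way the limit is passed in are assumptions for illustration; the 8,192 figure is the GPT-4 (8k) limit discussed above.

```python
MODEL_TOKEN_LIMIT = 8_192  # GPT-4 (8k): combined input + output limit

def remaining_output_budget(input_tokens: int,
                            limit: int = MODEL_TOKEN_LIMIT) -> int:
    """Tokens left for the model's response after accounting for the prompt."""
    if input_tokens >= limit:
        raise ValueError("prompt alone exceeds the model's token limit")
    return limit - input_tokens

print(remaining_output_budget(5_000))  # 3192 tokens left for the response
```

Reserving an explicit output budget like this, before chunking the input, is what keeps the model from running out of room mid-response.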