text-embedding-ada-002 vs OpenAI's older embedding models

The text-embedding-ada-002 model is a significant step forward in OpenAI’s embedding technology. It outperforms all of the older embedding models on text search, code search, and sentence similarity tasks, and it achieves comparable performance on text classification. Its reported performance score of 53.3 exceeds those of the older models, which range from 49.0 to 52.8.

One of the standout features of text-embedding-ada-002 is the unification of capabilities. It replaces five separate models with a single one, which simplifies the interface while performing better across diverse benchmarks. The context length has increased by a factor of four, from 2048 to 8192 tokens, and the new embeddings have only 1536 dimensions, one-eighth the size of davinci-001 embeddings.
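The ratios quoted above can be checked with a little arithmetic. The sketch below derives the davinci-001 dimensionality from the stated one-eighth ratio and estimates raw storage per vector. It assumes each component is stored as a 4-byte float32, which is an illustrative choice rather than anything the API prescribes.

```python
# Figures taken from the text above.
OLD_CONTEXT_TOKENS = 2048
NEW_CONTEXT_TOKENS = 8192
ADA_002_DIMS = 1536
DAVINCI_RATIO = 8  # ada-002 vectors are one-eighth the size of davinci-001 vectors

# Assumption for illustration: components stored as 4-byte float32 values.
BYTES_PER_FLOAT32 = 4

context_factor = NEW_CONTEXT_TOKENS // OLD_CONTEXT_TOKENS
davinci_dims = ADA_002_DIMS * DAVINCI_RATIO


def vector_bytes(dims: int) -> int:
    """Raw storage for one embedding vector, ignoring any index overhead."""
    return dims * BYTES_PER_FLOAT32


print(f"context length grew {context_factor}x")
print(f"davinci-001 dimensions implied by the ratio: {davinci_dims}")
print(f"ada-002 vector: {vector_bytes(ADA_002_DIMS)} bytes")
print(f"davinci-001 vector: {vector_bytes(davinci_dims)} bytes")
```

Under that float32 assumption, a smaller vector translates directly into a proportionally smaller storage footprint for any index built on top of the embeddings.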

Moreover, OpenAI reduced the price of the new embedding model by 90% compared to older models of the same size, and it achieves better or similar performance than the old Davinci models at a 99.8% lower price. However, it’s worth noting that the new model does not outperform text-similarity-davinci-001 on the SentEval linear probing classification benchmark, so for classification tasks built on top of embeddings, a comparison with this older model might be necessary.

Older embedding models

OpenAI’s older embedding models were split by use case, with a different model for each. There were three families of embedding models:

Text similarity
Text search
Code search

Each family was designed to capture specific aspects of semantic relationships, enabling applications like analyzing astronomical reports, finding textbook content, and tagging customer call transcripts.
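To make the text search use case concrete, here is a minimal sketch of how embeddings are commonly compared: each document and the query are represented as vectors, and documents are ranked by cosine similarity to the query. The three-dimensional vectors and document names below are made up for illustration and are not real model output. The helper names `cosine_similarity` and `rank_documents` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document names sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored]


# Made-up 3-dimensional vectors standing in for real embeddings.
docs = {
    "astronomy_report": [0.9, 0.1, 0.0],
    "textbook_chapter": [0.1, 0.9, 0.1],
    "call_transcript": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

ranking = rank_documents(query, docs)
print(ranking)
```

With real embeddings the vectors would have far more dimensions, but the ranking step would look the same.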

These older models achieved top performance in benchmarks like SentEval, BEIR, and CodeSearchNet. However, the lineup was more complex, with a separate model for each function, and it was more expensive than the new model.

Querying the new model

Here’s a simple example of how you can query the new text-embedding-ada-002 model using Python:
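The following is a minimal sketch rather than a definitive implementation. It uses only the Python standard library and assumes the public `/v1/embeddings` REST endpoint, with the API key read from an `OPENAI_API_KEY` environment variable. The variable name and the helper names `build_request` and `get_embedding` are choices made for this example.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/embeddings"
MODEL = "text-embedding-ada-002"


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the POST request for the embeddings endpoint (hypothetical helper)."""
    payload = json.dumps({"model": MODEL, "input": text}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=payload, headers=headers, method="POST")


def get_embedding(text: str, api_key: str) -> list:
    """Send the request and return the embedding vector as a list of floats."""
    with urllib.request.urlopen(build_request(text, api_key)) as response:
        body = json.load(response)
    # The response carries one entry in "data" per input string.
    return body["data"][0]["embedding"]


if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY")  # assumed variable name
    if key:
        vector = get_embedding("The food was delicious.", key)
        print(len(vector))  # should match the 1536 dimensions quoted earlier
    else:
        print("Set OPENAI_API_KEY to run this example.")
```

Keeping the request construction separate from the network call makes the payload easy to inspect without spending any tokens.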

Note: This code will only run once you supply your own OpenAI API key.

Conclusion

The introduction of text-embedding-ada-002 marks a considerable advancement in OpenAI’s embedding models. With its improved performance, unified capabilities, longer context, smaller size, and reduced price, it offers a more powerful and cost-effective solution for various natural language processing and code tasks. The older models, while still valuable, are overshadowed by the new model’s efficiency and versatility.

Feature / Model	`text-embedding-ada-002`	Older OpenAI Embedding Models
Model Architecture	Transformer-based	Transformer-based
Pretraining Data	Diverse and large-scale data	Smaller or domain-specific data
Embedding Dimension	1536	Varies by model size (12288 for davinci-001)
Supported Languages	Multiple languages	Often English-only
Fine-Tuning Capability	No (not offered through the API)	No
Use Cases	General-purpose: text search, code search, and similarity in one model	Separate model families for text similarity, text search, and code search
Performance	Score of 53.3	Scores from 49.0 to 52.8, depending on the model
Availability	OpenAI API	OpenAI API