Installing spaCy's Statistical Models
Explore how to install spaCy's statistical language models needed for NLP tasks such as part-of-speech tagging and named entity recognition. Understand different model sizes, naming conventions, and various installation methods including pip and spaCy's download commands to effectively integrate language models into your projects.
Overview
A spaCy installation doesn't include the statistical language models that the spaCy pipeline needs to run its tasks. A spaCy language model contains knowledge about a specific language, collected from a set of resources. Language models let us perform a variety of NLP tasks, including part-of-speech (POS) tagging and named-entity recognition (NER).
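Because the models ship separately from the library, code that depends on one usually checks whether it is present before loading it. The sketch below assumes that a model is installed as an importable Python package with the same name as the model (which is how pip-installed spaCy models are distributed), and that `python -m spacy download <name>` is the download command. The helper names `is_model_installed` and `download_command` are hypothetical, chosen for this illustration.

```python
import importlib.util
import sys
from typing import List


def is_model_installed(name: str) -> bool:
    """Return True if a package called `name` can be imported.

    Assumption: the spaCy model was installed as a Python package of the
    same name (for example via pip or `python -m spacy download`).
    """
    return importlib.util.find_spec(name) is not None


def download_command(name: str) -> List[str]:
    """Build, but do not run, the command that downloads a spaCy model."""
    # sys.executable makes the model land in the same Python environment
    # that is running this script.
    return [sys.executable, "-m", "spacy", "download", name]


if __name__ == "__main__":
    model = "en_core_web_sm"
    if is_model_installed(model):
        print(f"{model} is installed; load it with spacy.load({model!r})")
    else:
        print("Model missing; install it with:")
        print(" ".join(download_command(model)))
```

Building the command instead of running it keeps the check free of side effects; to actually download the model, pass the list to `subprocess.run(cmd, check=True)`.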
Models are language specific, so each language has its own models, and there can be several models for the same language. We'll look at the differences between those models in detail in the Pro tip at the end of this section; in short, the models for a given language differ in their training data, while the underlying statistical algorithm is the same. Some of the currently supported languages are as follows:
LANGUAGE | CODE | LANGUAGE DATA | MODELS |
Chinese | zh | lang/zh | 3 models |
Danish | da | lang/da | 3 models |
Dutch | nl | lang/nl | 3 models |
English | en | lang/en | 3 models |
French | fr | lang/fr | 3 models |
German | de | lang/de | 3 models |
Greek | el | lang/el | 3 models |
Italian | it | lang/it | 3 models |
Japanese | ja | lang/ja | 3 models |
Lithuanian | lt | lang/lt | 3 models |
Multi-Language | xx | lang/xx | 3 models |
Norwegian Bokmål | nb | lang/nb | 3 models |
Polish | pl | lang/pl | 3 models |
Portuguese | pt | lang/pt | 3 models |
Romanian | ro | lang/ro | 3 models |
Spanish | es | lang/es | 3 models |
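spaCy's own model packages follow the naming convention `[lang]_[name]`, where `lang` is one of the language codes in the table above and `name` is further divided into a type, a genre, and a size (for example `core`, `web`, and `sm`). The sketch below splits a package name into those four parts. The `ModelName` class and `parse_model_name` function are hypothetical illustrations, and the set of size suffixes it accepts (`sm`, `md`, `lg`, `trf`) is an assumption based on spaCy's published packages.

```python
from dataclasses import dataclass

# Size suffixes used by spaCy's published packages (assumed list).
SIZES = {"sm": "small", "md": "medium", "lg": "large", "trf": "transformer"}


@dataclass(frozen=True)
class ModelName:
    lang: str   # language code from the table, e.g. "en"
    kind: str   # what spaCy's docs call the model "type", e.g. "core"
    genre: str  # kind of text the model was trained on, e.g. "web"
    size: str   # size suffix, e.g. "sm"


def parse_model_name(package: str) -> ModelName:
    """Split a name such as 'en_core_web_sm' into lang, type, genre, size."""
    parts = package.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected lang_type_genre_size, got {package!r}")
    lang, kind, genre, size = parts
    if size not in SIZES:
        raise ValueError(f"unknown size suffix {size!r} in {package!r}")
    return ModelName(lang, kind, genre, size)


if __name__ == "__main__":
    for name in ["en_core_web_sm", "en_core_web_md", "de_core_news_sm"]:
        m = parse_model_name(name)
        print(f"{name}: language={m.lang}, genre={m.genre}, size={SIZES[m.size]}")
```

This parser is deliberately strict: it only accepts names with exactly four underscore-separated parts, which matches spaCy's own packages but not necessarily third-party ones that only follow the `[lang]_[name]` prefix.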
Several pre-trained models are available for different languages. For English, the following models are available for download: en_core_web_sm, en_core_web_md, and en_ ...