Installing spaCy's Statistical Models

Let's learn how we can install statistical models locally.

Overview

Installing spaCy does not install the statistical language models that its pipeline tasks need; the models are downloaded separately. A spaCy language model contains knowledge about a specific language, collected from a set of resources. Language models let us perform a variety of NLP tasks, including POS tagging and named-entity recognition (NER).
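Because the models are separate from the library, it can help to check whether one is already present before trying to use it. The following is a minimal sketch, not an official spaCy utility: the helper names `model_is_installed` and `download_command` are made up for illustration, and the sketch assumes that a spaCy model is installed as an ordinary Python package that can be looked up by name.

```python
import importlib.util
import sys
from typing import List


def model_is_installed(model_name: str) -> bool:
    """Return True if a Python package named `model_name` is importable."""
    # Assumption: spaCy models are installed as regular Python packages,
    # so an import lookup is enough to see whether one is present.
    return importlib.util.find_spec(model_name) is not None


def download_command(model_name: str) -> List[str]:
    """Build the command line that asks spaCy to download a model."""
    # Equivalent to typing: python -m spacy download <model_name>
    python = sys.executable or "python"
    return [python, "-m", "spacy", "download", model_name]


model = "en_core_web_sm"
if model_is_installed(model):
    print(f"{model} is already installed")
else:
    print("To install it, run:", " ".join(download_command(model)))
```

The sketch only prints the command instead of running it, so nothing is downloaded without the reader choosing to do so.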

Models are language specific, so each language has its own models. Several models are also available for the same language. We'll see the differences between those models in detail in the Pro tip at the end of this section, but in short, they are trained on different data while the underlying statistical algorithm is the same. Some of the currently supported languages are as follows:

| Language         | Code | Language data | Models   |
|------------------|------|---------------|----------|
| Chinese          | zh   | lang/zh       | 3 models |
| Danish           | da   | lang/da       | 3 models |
| Dutch            | nl   | lang/nl       | 3 models |
| English          | en   | lang/en       | 3 models |
| French           | fr   | lang/fr       | 3 models |
| German           | de   | lang/de       | 3 models |
| Greek            | el   | lang/el       | 3 models |
| Italian          | it   | lang/it       | 3 models |
| Japanese         | ja   | lang/ja       | 3 models |
| Lithuanian       | lt   | lang/lt       | 3 models |
| Multi-Language   | xx   | lang/xx       | 3 models |
| Norwegian Bokmål | nb   | lang/nb       | 3 models |
| Polish           | pl   | lang/pl       | 3 models |
| Portuguese       | pt   | lang/pt       | 3 models |
| Romanian         | ro   | lang/ro       | 3 models |
| Spanish          | es   | lang/es       | 3 models |
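The language codes listed above are also what spaCy uses to identify a language in code. As a rough illustration, the sketch below copies the codes into a dictionary and wraps `spacy.blank`, which creates an empty pipeline for a language code without any downloaded model. The dictionary `LANGUAGE_CODES` and the helper `blank_pipeline` are hypothetical names introduced here, and the sketch assumes spaCy itself is installed when the helper is called.

```python
# Language codes copied from the list of supported languages above.
LANGUAGE_CODES = {
    "zh": "Chinese",
    "da": "Danish",
    "nl": "Dutch",
    "en": "English",
    "fr": "French",
    "de": "German",
    "el": "Greek",
    "it": "Italian",
    "ja": "Japanese",
    "lt": "Lithuanian",
    "xx": "Multi-Language",
    "nb": "Norwegian Bokmål",
    "pl": "Polish",
    "pt": "Portuguese",
    "ro": "Romanian",
    "es": "Spanish",
}


def blank_pipeline(code: str):
    """Create an empty spaCy pipeline for one of the codes listed above."""
    if code not in LANGUAGE_CODES:
        raise ValueError(f"Unknown language code: {code!r}")
    # Imported lazily so the lookup table can be used without spaCy.
    import spacy

    # A blank pipeline has language-specific tokenization rules but no
    # statistical components; those come from a downloaded model.
    # Note: some languages (for example Japanese) may need extra
    # tokenizer packages to be installed before this call succeeds.
    return spacy.blank(code)
```

With spaCy installed, `nlp = blank_pipeline("en")` followed by `print(nlp.pipe_names)` should show an empty list, which illustrates why tasks such as POS tagging and NER need one of the downloadable models.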

Several pre-trained models are available for different languages. For English, the following models are available for download: en_core_web_sm, en_core_web_md, and en_ ...