Solution: Text Classification with spaCy

Let's look at the solution to the previous exercise.

Solution

Here is a possible solution to the problem of detecting fake and real news articles:

import random
import spacy
from spacy.training import Example
from spacy.pipeline.textcat import DEFAULT_SINGLE_TEXTCAT_MODEL

nlp = spacy.load("en_core_web_md")
# Define the training data
train_data = [
    ("Biden administration announces new COVID-19 vaccine mandate for federal workers", {"cats": {"REAL": 1, "FAKE": 0}}),
    ("Scientists discover new species of dinosaur in Argentina", {"cats": {"REAL": 1, "FAKE": 0}}),
    ("Stock market reaches record high after positive economic data", {"cats": {"REAL": 1, "FAKE": 0}}),
    ("COVID-19 vaccine causes infertility, clickbait article shows", {"cats": {"REAL": 0, "FAKE": 1}}),
    ("Donald Trump declares martial law and orders military takeover", {"cats": {"REAL": 0, "FAKE": 1}}),
    ("Hillary Clinton linked to human trafficking ring", {"cats": {"REAL": 0, "FAKE": 1}})
]
config = {
    "threshold": 0.5,
    "model": DEFAULT_SINGLE_TEXTCAT_MODEL
}
textcat = nlp.add_pipe("textcat", config=config)
textcat.add_label("REAL")
textcat.add_label("FAKE")
train_examples = [Example.from_dict(nlp.make_doc(text), label) for text, label in train_data]
textcat.initialize(lambda: train_examples, nlp=nlp)
epochs = 20
with nlp.select_pipes(enable="textcat"):
    optimizer = nlp.resume_training()
    for i in range(epochs):
        random.shuffle(train_data)
        for text, label in train_data:
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, label)
            nlp.update([example], sgd=optimizer)
# Test the model
texts = [
    "According to a clickbait article COVID-19 vaccine causes infertility",
    "Covid-19 vaccine mandatory at work places"
]
for text in texts:
    doc = nlp(text)
    print(text, doc.cats)
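
The solution prints doc.cats, which maps each label to a score. As a minimal sketch of turning those scores into a single prediction, the helper below picks the highest-scoring label. The function name top_label, its threshold parameter, and the example scores are hypothetical and are not part of the original solution or of spaCy:

```python
def top_label(cats, threshold=0.5):
    """Return the highest-scoring (label, score) pair from a dict shaped
    like doc.cats, or None if no score reaches the threshold."""
    if not cats:
        return None
    # Pick the entry with the largest score
    label, score = max(cats.items(), key=lambda item: item[1])
    return (label, score) if score >= threshold else None


# Made-up scores for illustration only; real values come from doc.cats
example_cats = {"REAL": 0.12, "FAKE": 0.88}
print(top_label(example_cats))
```

In the test loop above, this could be called as top_label(doc.cats) in place of printing the raw dictionary. With only six training sentences, the scores are unlikely to be reliable, so the result is best read as a demonstration of the workflow.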

Solution explanation

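  • Lines 1–4: We import the required libraries, including the Example class from spacy.training and the default single-label text categorizer model configuration, DEFAULT_SINGLE_TEXTCAT_MODEL, from the pipeline components.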

  • Line 6: The pre-trained ...