Solution: Text Classification with spaCy
Let's look at the solution to the previous exercise.
Solution
Here is a possible solution to the problem of detecting fake and real news articles:
import random
import spacy
from spacy.training import Example
from spacy.pipeline.textcat import DEFAULT_SINGLE_TEXTCAT_MODEL

nlp = spacy.load("en_core_web_md")

# Define the training data
train_data = [
    ("Biden administration announces new COVID-19 vaccine mandate for federal workers", {"cats": {"REAL": 1, "FAKE": 0}}),
    ("Scientists discover new species of dinosaur in Argentina", {"cats": {"REAL": 1, "FAKE": 0}}),
    ("Stock market reaches record high after positive economic data", {"cats": {"REAL": 1, "FAKE": 0}}),
    ("COVID-19 vaccine causes infertility, clickbait article shows", {"cats": {"REAL": 0, "FAKE": 1}}),
    ("Donald Trump declares martial law and orders military takeover", {"cats": {"REAL": 0, "FAKE": 1}}),
    ("Hillary Clinton linked to human trafficking ring", {"cats": {"REAL": 0, "FAKE": 1}}),
]

config = {
    "threshold": 0.5,
    "model": DEFAULT_SINGLE_TEXTCAT_MODEL,
}
textcat = nlp.add_pipe("textcat", config=config)
textcat.add_label("REAL")
textcat.add_label("FAKE")

train_examples = [Example.from_dict(nlp.make_doc(text), label) for text, label in train_data]
textcat.initialize(lambda: train_examples, nlp=nlp)

epochs = 20
with nlp.select_pipes(enable="textcat"):
    optimizer = nlp.resume_training()
    for i in range(epochs):
        random.shuffle(train_data)
        for text, label in train_data:
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, label)
            nlp.update([example], sgd=optimizer)

# Test the model
texts = [
    "According to a clickbait article COVID-19 vaccine causes infertility",
    "Covid-19 vaccine mandatory at work places",
]
for text in texts:
    doc = nlp(text)
    print(text, doc.cats)
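The script prints the raw doc.cats dictionary for each test text. Because REAL and FAKE are mutually exclusive, each training annotation should mark exactly one label with 1, and at prediction time you usually want the single best label rather than the whole dictionary. The plain-Python sketch below shows both checks. The helper names is_one_hot and predicted_label, and the min_score parameter, are hypothetical; the score dictionaries are made-up stand-ins for doc.cats, not output from the model above.

```python
from typing import Dict, Optional


def is_one_hot(cats: Dict[str, float]) -> bool:
    """Check that a training annotation marks exactly one label as 1 and the rest as 0."""
    values = list(cats.values())
    return values.count(1) == 1 and all(v in (0, 1) for v in values)


def predicted_label(cats: Dict[str, float], min_score: float = 0.0) -> Optional[str]:
    """Return the highest-scoring label, or None if its score is below min_score.

    `cats` is assumed to look like spaCy's doc.cats: a dict mapping each
    label name to a score between 0 and 1.
    """
    if not cats:
        return None
    label, score = max(cats.items(), key=lambda item: item[1])
    return label if score >= min_score else None


# Made-up dictionaries standing in for annotations and doc.cats output.
print(is_one_hot({"REAL": 1, "FAKE": 0}))                             # True
print(predicted_label({"REAL": 0.12, "FAKE": 0.88}))                  # FAKE
print(predicted_label({"REAL": 0.45, "FAKE": 0.55}, min_score=0.8))   # None
```

With only two exclusive labels the top score is never below 0.5, so a min_score above 0.5 is what lets you flag low-confidence predictions instead of forcing a label.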
Solution explanation
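Lines 1–4: We import the required libraries: random for shuffling the training data, spacy itself, the Example class from spacy.training, and DEFAULT_SINGLE_TEXTCAT_MODEL, the default model configuration for a single-label text classifier, from spacy.pipeline.textcat.
Line 6: The pre-trained ...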
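DEFAULT_SINGLE_TEXTCAT_MODEL configures spaCy's single-label textcat component, which treats the labels as mutually exclusive; spaCy's separate textcat_multilabel component scores each label independently. As a rough intuition, assuming exclusive scoring behaves like a softmax and independent scoring like a per-label sigmoid (a simplification of the actual network), the plain-Python sketch below applies both functions to the same made-up logits. The function names and the logit values are illustrative only.

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Exclusive scoring: labels compete and the scores always sum to 1."""
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(z - max_logit) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def sigmoid(logits: Dict[str, float]) -> Dict[str, float]:
    """Independent scoring: each label gets its own probability in (0, 1)."""
    return {label: 1.0 / (1.0 + math.exp(-z)) for label, z in logits.items()}


# Made-up logits for one headline; not produced by the trained pipeline.
logits = {"REAL": 2.0, "FAKE": 1.0}

exclusive = softmax(logits)
independent = sigmoid(logits)
print(round(sum(exclusive.values()), 6))    # 1.0
print(round(sum(independent.values()), 2))  # 1.61
```

Since a news item is either real or fake, the exclusive (single-label) setup fits this task; independent scores would suit labels that can co-occur.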