You will learn to:
Explore a dataset using Python packages.
Prepare texts for stylometric analysis.
Extract textual features that help establish authorship.
Use Burrows's Delta to compare authors’ writing styles.
Skills
Natural Language Processing
Machine Learning
Data Analysis
Prerequisites
Basic understanding of Python
Intermediate knowledge of pandas
Intermediate knowledge of seaborn
Technologies
NLTK
NumPy
Python
pandas
Matplotlib
Project Description
In this project, we will explore authorship attribution by analyzing the distinctive traits of an author’s written work. Our dataset is a collection of songs by well-known songwriters and includes song titles, lyrics, and author information. We will develop a model that attributes a given text to its most likely author. Such a model has applications in fields such as plagiarism detection and literary analysis.
To get started, we will load the dataset and the language model that helps us process the text. Then, we will preprocess the text to reduce noise and extract linguistic features that can help identify an author, such as word length distribution, word frequencies, and word co-occurrences (bigrams). Next, we will create a training corpus and use it to attribute authorship to a text using Burrows's Delta.
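To make the last step concrete, here is a minimal sketch of Burrows's Delta using only the Python standard library. It is an illustration under stated assumptions, not the project's actual code: the regex tokenizer stands in for the NLTK tokenizer used in the project, the two "authors" and their texts are made up, and the names `tokenize`, `relative_freqs`, and `burrows_delta` are hypothetical helpers. The idea is to take the most frequent words in the training corpus, convert each author's relative frequencies to z-scores, and average the absolute z-score differences between the test text and each author. The smallest Delta marks the closest style.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and keep word-like tokens (stand-in for NLTK)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_freqs(tokens, features):
    """Relative frequency of each feature word in a token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {word: counts[word] / total for word in features}


def burrows_delta(train_corpus, test_text, n_features=30):
    """Return {author: delta}; a smaller delta means a closer style."""
    author_tokens = {a: tokenize(t) for a, t in train_corpus.items()}

    # Features: the most frequent words across the whole training corpus.
    all_tokens = [tok for toks in author_tokens.values() for tok in toks]
    features = [w for w, _ in Counter(all_tokens).most_common(n_features)]

    author_freqs = {
        a: relative_freqs(toks, features) for a, toks in author_tokens.items()
    }

    # Mean and (population) standard deviation of each feature across authors.
    stats = {}
    for word in features:
        values = [freqs[word] for freqs in author_freqs.values()]
        stats[word] = (mean(values), pstdev(values))

    def z_scores(freqs):
        # Skip features with zero spread; they cannot separate authors.
        return {
            w: (freqs[w] - mu) / sigma
            for w, (mu, sigma) in stats.items()
            if sigma > 0
        }

    test_z = z_scores(relative_freqs(tokenize(test_text), features))
    deltas = {}
    for author, freqs in author_freqs.items():
        author_z = z_scores(freqs)
        deltas[author] = mean(abs(test_z[w] - author_z[w]) for w in author_z)
    return deltas


# Made-up toy data, not taken from the project's dataset.
train_corpus = {
    "author_a": "the night is long and the road is long and the rain is cold",
    "author_b": "i love you baby oh baby i need you i want you baby",
}
test_text = "the wind is cold and the night is dark"

deltas = burrows_delta(train_corpus, test_text)
predicted = min(deltas, key=deltas.get)
print(deltas, predicted)
```

With real lyrics, each author's sub-corpus would be much larger, and the number of feature words (`n_features` here) is a tuning choice rather than a fixed value.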
By the end of this project, we will have built a model that attributes authorship to song lyrics. We will also explore how these techniques can be extended to analyze how an author’s style evolves over time.
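As a small illustration of comparing style over time, the sketch below computes lexical diversity as a type-token ratio (unique words divided by total words) for an early and a late text. The two strings are invented placeholders rather than lyrics from the dataset, and `lexical_diversity` is a hypothetical helper name. The project itself may measure diversity differently.

```python
import re


def lexical_diversity(text):
    """Type-token ratio: number of unique words divided by total words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Invented placeholder texts standing in for an author's early and late songs.
early_text = "la la la la love love me do"
late_text = "shadows drift across the quiet harbor at dawn"

early_score = lexical_diversity(early_text)
late_score = lexical_diversity(late_text)
print(early_score, late_score)
```

The type-token ratio is sensitive to text length, so a fair comparison would use samples of similar size from each period.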
Project Tasks
1. Getting Started
Task 0: Introduction
Task 1: Import the Libraries
Task 2: Load the Dataset
2. Authorship Attribution
Task 3: Preprocess Song Lyrics for Analysis
Task 4: Get Word Lengths
Task 5: Get Word Frequencies
Task 6: Get Bigram Frequencies
Task 7: Create Test and Train Corpora
Task 8: Tokenize Both Corpora and Calculate the Distance
3. Author Evolution
Task 9: Split the Dataset into Early Songs and Last Songs
Task 10: Compare Word Length
Task 11: Compare Frequent Words
Task 12: Compare Lexical Diversity
Task 13: Compare Function Words
Congratulations!
Relevant Courses
Use the following content to review prerequisites or explore specific concepts in detail.