Music Generation Using LSTMs

Learn how to extend a stacked LSTM network for the task of music generation.

Music is a continuous signal that combines sounds from various instruments and voices. Another characteristic is the presence of recurrent structural patterns that we pay attention to while listening. In other words, each musical piece has its own characteristic coherence, rhythm, and flow.

To keep things simple and easy to implement, we will focus on a single instrument/monophonic music generation task. Let’s first look at the dataset and think about how we would prepare it for our task of music generation.

Dataset preparation

MIDI is an easy-to-use format that gives us a symbolic representation of the music contained in a file. For the hands-on exercises in this chapter, we will use a subset of the massive public MIDI dataset collected and shared by a Reddit user.

The subset can be found in a zipped folder (midi_dataset.zip) along with the code in this course’s GitHub repository: https://github.com/PacktPublishing/Hands-On-Generative-AI-with-Python-and-TensorFlow-2.git.

The subset is based on classical piano pieces by composers such as Beethoven, Bach, and Bartók.

We’ll use music21 to process this subset and prepare our data for training the model. Because each piece combines sounds from multiple instruments and voices, we first use the chordify() function to collapse them into a single stream of chords. Once we have the list of scores, the next step is to extract the notes and their corresponding timing information. music21 exposes these details through simple interfaces such as element.pitch and element.duration, as the short example below shows.
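
To make this concrete, here is a quick, illustrative look at those interfaces on a single file. The path sample.mid is a placeholder for any file from the unzipped midi_dataset folder; this is a sketch, not the course’s exact code.

```python
from music21 import chord, converter, note

# Parse one MIDI file and collapse all parts into a single chord stream
score = converter.parse("sample.mid").chordify()

# Inspect the first few elements: their pitch(es) and their duration
for element in list(score.recurse().notes)[:5]:
    if isinstance(element, chord.Chord):
        print([str(p) for p in element.pitches], element.duration.quarterLength)
    elif isinstance(element, note.Note):
        print(str(element.pitch), element.duration.quarterLength)
```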

The following snippet builds on this idea: it parses the MIDI files into a list of scores in the required format, extracts the pitch and duration of each element, and collects them into two parallel lists:
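
The exact snippet ships with the code in the repository linked above. A minimal sketch of the same idea, assuming the MIDI files sit in an unzipped midi_dataset folder and using a hypothetical helper name (get_notes_and_durations), might look like this:

```python
import glob
from music21 import chord, converter, note

def get_notes_and_durations(midi_dir="midi_dataset"):
    """Parse every MIDI file in midi_dir into two parallel lists:
    one of note/chord names and one of the corresponding durations."""
    notes, durations = [], []
    for file_path in glob.glob(f"{midi_dir}/*.mid"):
        # Parse the file and collapse all parts into a single chord stream
        score = converter.parse(file_path).chordify()
        for element in score.recurse().notes:
            if isinstance(element, chord.Chord):
                # Encode a chord as a dot-separated string of its pitch names
                notes.append(".".join(str(p) for p in element.pitches))
            elif isinstance(element, note.Note):
                notes.append(str(element.pitch))
            else:
                continue  # skip anything that is neither a note nor a chord
            # Store durations as strings so they can be tokenized like notes
            durations.append(str(element.duration.quarterLength))
    return notes, durations

notes, durations = get_notes_and_durations()
```

These two parallel lists can then be tokenized and windowed into input/target sequences for training the stacked LSTM.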
