
MuseGAN—Polyphonic Music Generation

Learn how to generate polyphonic music using MuseGAN.

The two models we have trained so far were simplified versions of how music is actually composed and perceived. While limited, both the attention-based LSTM model and the C-RNN-GAN-based model gave us a good understanding of the music generation process. In this section, we’ll build on what we’ve learned so far and move toward a setup that is as close to the actual task of music generation as possible.

In 2017, Dong et al. presented a GAN-type framework for multi-track music generation in their work, “MuseGAN: Multi-Track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment” (Dong, Hao-Wen, Wen-Yi Hsiao, Li-Chia Yang, and Yi-Hsuan Yang. 2018. https://salu133445.github.io/musegan/pdf/musegan-aaai2018-paper.pdf). The paper offers a detailed explanation of various music-related concepts and of how Dong and the team tackled them.

Challenges

Let’s look at the three main properties of music that the MuseGAN work tries to take into account:

  • Multi-track interdependency: Most songs we listen to are composed of multiple instruments, such as drums, guitars, bass, vocals, and so on. There is a high degree of interdependency in how these tracks play together so that the listener perceives coherence and rhythm.

  • Musical texture: Musical notes are often grouped into chords and melodies. These groupings overlap heavily in time and do not necessarily follow a strict chronological order (an assumption of chronological ordering is applied in most well-known works on music generation). The chronological-ordering assumption stems not only from the need for simplification but also from generalizing ideas from the NLP domain, language generation in particular.

  • Temporal structure: Music has a hierarchical structure in which a song (at the highest level) can be seen as composed of paragraphs. A paragraph is composed of various phrases, which are, in turn, composed of multiple bars, and so on. The figure below depicts this hierarchy:

Temporal structure of a song

As shown in the figure, a bar is further composed of beats, and at the lowest level, we have pixels (the smallest time-pitch cells of a piano roll). The authors of MuseGAN take the bar, rather than the note, as the compositional unit (we have considered the note the basic unit so far). This is done to account for the grouping of notes that arises in a multi-track setup. The sketch below makes this representation concrete.
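The following is a minimal sketch, using NumPy, of how a multi-track piano roll can be laid out as a tensor indexed by track, bar, time step, and pitch. The dimensions (96 time steps per bar, 84 pitches) follow the preprocessing described in the MuseGAN paper; the variable names and the example note are our own illustration.

```python
import numpy as np

# A multi-track piano roll with the bar as the compositional unit:
# time is indexed as (bar, step-within-bar), not as a flat note sequence.
n_tracks = 4     # e.g., drums, bass, guitar, strings
n_bars = 2       # bars per generated phrase
n_steps = 96     # time steps ("pixels") per bar, as in the MuseGAN paper
n_pitches = 84   # pitch range kept after cropping the 128 MIDI pitches

# Binary piano roll: True where a track sounds a pitch at a given step.
piano_roll = np.zeros((n_tracks, n_bars, n_steps, n_pitches), dtype=np.bool_)

# Illustrative note: track 1 holds pitch index 36 for half of bar 0.
piano_roll[1, 0, :48, 36] = True
print(piano_roll.shape)  # (4, 2, 96, 84)
```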

Solutions

MuseGAN works toward solving these three major challenges through a unique framework built on three basic music generation approaches: the jamming, composer, and hybrid models. We'll briefly explain each of these now.

Jamming model

If we were to extrapolate the simplified monophonic GAN setup from the previous section to a polyphonic setup, the simplest method would be to use multiple generator-discriminator combinations, one for each instrument. The jamming model is precisely this setup, where M independent generators prepare music from their respective random vectors. Each generator has its own critic/discriminator, which helps in training the overall GAN. This setup is depicted in the figure below:

Jamming model

As shown in the preceding figure, the jamming model is composed of M generator-discriminator pairs for generating multi-track output. It imitates a group of musicians who create music by improvising independently, without any predefined arrangement.
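To make the idea concrete, here is a minimal sketch of the jamming setup in TensorFlow/Keras. The dense layers are toy stand-ins for the convolutional networks used in the paper; all sizes, names, and shapes here are illustrative assumptions, not the authors' implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 32          # size of each track's private random vector (illustrative)
BAR_SHAPE = (96, 84, 1)  # (time steps, pitches, channels) for one bar of one track
N_TRACKS = 4             # the M of the text: one GAN pair per instrument

def make_generator():
    # Maps a track's private random vector to a single-track bar.
    return tf.keras.Sequential([
        layers.Dense(256, activation="relu", input_shape=(LATENT_DIM,)),
        layers.Dense(96 * 84, activation="sigmoid"),
        layers.Reshape(BAR_SHAPE),
    ])

def make_discriminator():
    # Scores a single-track bar as real or generated.
    return tf.keras.Sequential([
        layers.Flatten(input_shape=BAR_SHAPE),
        layers.Dense(256, activation="relu"),
        layers.Dense(1),
    ])

# M fully independent generator-discriminator pairs: no weight sharing and
# no shared input, so each "musician" improvises on its own.
jamming_pairs = [(make_generator(), make_discriminator()) for _ in range(N_TRACKS)]

# Each track is generated from its own random vector.
tracks = [gen(tf.random.normal((1, LATENT_DIM))) for gen, _ in jamming_pairs]
print(tracks[0].shape)  # (1, 96, 84, 1)
```

Because nothing ties the M random vectors together, coherence across tracks has to emerge by chance, which is exactly the limitation the composer model addresses.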

Composer model

As the name suggests, this setup assumes that the generator is a typical human composer capable of creating multi-track piano rolls, as shown in the figure below:

Composer model

As shown in the figure, the composer model consists of a single generator capable of generating all M tracks and a single discriminator for detecting fake (generated) versus real samples. This model requires only one common random vector, as opposed to the M random vectors needed in the jamming model setup.
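A matching sketch of the composer model, under the same illustrative assumptions as before, replaces the M pairs with a single generator that emits all tracks from one shared random vector and a single discriminator that judges them jointly.

```python
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 32
N_TRACKS = 4
BAR_SHAPE = (96, 84, N_TRACKS)  # all M tracks stacked along the channel axis

# One generator maps a single shared random vector to all tracks at once,
# like a composer writing every part of the score together.
composer_generator = tf.keras.Sequential([
    layers.Dense(256, activation="relu", input_shape=(LATENT_DIM,)),
    layers.Dense(96 * 84 * N_TRACKS, activation="sigmoid"),
    layers.Reshape(BAR_SHAPE),
])

# One discriminator sees the full multi-track bar, so it can penalize
# tracks that do not fit together.
composer_discriminator = tf.keras.Sequential([
    layers.Flatten(input_shape=BAR_SHAPE),
    layers.Dense(256, activation="relu"),
    layers.Dense(1),
])

z = tf.random.normal((1, LATENT_DIM))    # a single common random vector
multi_track_bar = composer_generator(z)  # shape: (1, 96, 84, 4)
```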

Hybrid model

This is an interesting take that arises from combining the jamming and composer models. The hybrid model has M ...
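As a rough sketch of the hybrid idea, assuming the design described in the MuseGAN paper, each of the M generators receives a shared inter-track random vector concatenated with its own private intra-track vector, while a single discriminator judges the combined output. Sizes and names remain illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 32
N_TRACKS = 4
BAR_SHAPE = (96, 84, 1)

def make_track_generator():
    # Input: the shared inter-track vector concatenated with this track's
    # private intra-track vector, hence 2 * LATENT_DIM.
    return tf.keras.Sequential([
        layers.Dense(256, activation="relu", input_shape=(2 * LATENT_DIM,)),
        layers.Dense(96 * 84, activation="sigmoid"),
        layers.Reshape(BAR_SHAPE),
    ])

generators = [make_track_generator() for _ in range(N_TRACKS)]

# A single discriminator judges all tracks jointly, as in the composer model.
discriminator = tf.keras.Sequential([
    layers.Flatten(input_shape=(96, 84, N_TRACKS)),
    layers.Dense(256, activation="relu"),
    layers.Dense(1),
])

z_inter = tf.random.normal((1, LATENT_DIM))  # shared across all tracks
tracks = [gen(tf.concat([z_inter, tf.random.normal((1, LATENT_DIM))], axis=-1))
          for gen in generators]              # one private vector per track
multi_track_bar = tf.concat(tracks, axis=-1)  # shape: (1, 96, 84, 4)
```

The shared vector nudges the tracks toward mutual coherence, while the private vectors preserve per-track independence, combining the strengths of the two previous models.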