MuseGAN—Polyphonic Music Generation

Learn how to generate polyphonic music using MuseGAN.

The two models we have trained so far rest on simplified views of how music is actually perceived. While limited, both the attention-based LSTM model and the C-RNN-GAN-based model helped us understand the music generation process. In this section, we’ll build on what we’ve learned and move toward a setup that is as close to the actual task of music generation as possible.

In 2017, Dong et al. presented a GAN-type framework for multi-track music generation in their work “MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment” (Dong, Hao-Wen, Wen-Yi Hsiao, Li-Chia Yang, and Yi-Hsuan Yang. n.d. https://salu133445.github.io/musegan/pdf/musegan-aaai2018-paper.pdf). The paper explains various music-related concepts in detail and describes how Dong and the team tackled them.

Challenges

Let’s look at the three main properties of music that MuseGAN tries to take into account:

  • Multi-track interdependency: Most songs we listen to combine multiple instruments, such as drums, guitars, bass, and vocals. These parts are highly interdependent: they must play together in a coordinated way for the listener to perceive coherence and rhythm.

  • Musical texture: Musical notes are often grouped into chords and melodies. Notes within these groupings frequently overlap in time and do not necessarily follow a strict chronological order. Even so, most known works on music generation simplify this by treating notes as a chronologically ordered sequence. That ordering is imposed partly for simplicity and partly because it carries over from the NLP domain, language generation in particular.

  • Temporal structure: Music has a hierarchical structure in which a song can be seen as being composed of paragraphs (at the highest level). A paragraph is composed of various phrases, which are, in turn, composed of multiple bars, and so on down to individual beats. The figure below depicts this hierarchy:
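To make the three properties concrete, here is a minimal sketch of a multi-track piano-roll, the kind of symbolic representation MuseGAN operates on. It assumes NumPy, and everything in it is illustrative: the sizes (4 bars per phrase, 16 time steps per bar, 128 MIDI pitches), the track names, and the helpers `empty_phrase` and `add_note` are hypothetical choices for this example, not the paper's exact configuration. A chord shows up as several pitches active at the same time step (texture), all tracks share one time axis (interdependency), and phrases and bars become explicit tensor axes (temporal structure).

```python
import numpy as np

# Illustrative sizes (assumptions, not the paper's configuration):
# 4 bars per phrase, 16 time steps per bar, 128 MIDI pitches.
N_BARS, STEPS_PER_BAR, N_PITCHES = 4, 16, 128
TRACKS = ["drums", "bass", "piano"]  # hypothetical track names


def empty_phrase() -> np.ndarray:
    """Binary multi-track piano-roll with axes (bars, steps, pitches, tracks)."""
    return np.zeros((N_BARS, STEPS_PER_BAR, N_PITCHES, len(TRACKS)), dtype=bool)


def add_note(roll: np.ndarray, track: str, bar: int, start: int, length: int, pitch: int) -> None:
    """Hypothetical helper: switch a note on for `length` steps inside one bar."""
    t = TRACKS.index(track)
    roll[bar, start:start + length, pitch, t] = True


phrase = empty_phrase()

# Musical texture: a C-major chord (C4, E4, G4) sounds simultaneously on the piano.
for pitch in (60, 64, 67):
    add_note(phrase, "piano", bar=0, start=0, length=8, pitch=pitch)

# Multi-track interdependency: the bass plays the chord root (C2) at the same time.
add_note(phrase, "bass", bar=0, start=0, length=8, pitch=36)

# Number of notes sounding at each time step of bar 0 (summed over pitches and tracks).
polyphony = phrase[0].sum(axis=(1, 2))
print(polyphony[:10])  # -> [4 4 4 4 4 4 4 4 0 0]

# Temporal structure: a song is a sequence of phrases, each a sequence of bars.
song = np.stack([phrase, empty_phrase(), phrase])  # 3 phrases
print(song.shape)  # (phrases, bars, steps, pitches, tracks) -> (3, 4, 16, 128, 3)

# Flattening phrases and bars into one time axis gives the purely
# chronological view that a sequential model would see.
flat = song.reshape(-1, N_PITCHES, len(TRACKS))
print(flat.shape)  # -> (192, 128, 3)
```

Flattening the phrase and bar axes, as in the last step, recovers the single chronological sequence that simpler sequential models work with; keeping those axes separate gives a model direct access to bar and phrase boundaries.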
