Music Generation Using GANs

Learn how to generate music using adversarial networks.

Let’s raise the bar a bit and see how we can generate music using a GAN. Music is continuous and sequential in nature, and LSTMs, or RNNs in general, are quite adept at handling such data. We have also seen that, over the years, various types of GANs have been proposed to train deep generative networks efficiently.

Combining the power of LSTMs and GAN-based generative networks, Mogren presented “C-RNN-GAN: Continuous Recurrent Neural Networks with Adversarial Training” (Mogren, Olof. 2016. arXiv. https://doi.org/10.48550/arXiv.1611.09904) as a method for music generation. It is a straightforward yet effective approach.

We’ll keep things simple and focus only on monophonic music generation, even though the original paper represents each tone with features such as tone length, frequency, intensity, and time since the previous tone, in addition to the notes themselves. The paper also mentions a technique called feature matching to generate polyphonic music (using the C-RNN-GAN-3 variant). We’ll focus on understanding the basic architecture and pre-processing steps rather than implementing the paper as-is; a sketch of the pre-processing follows.
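Before looking at the networks, it helps to see what the training data looks like. Below is a minimal, hypothetical sketch of this kind of pre-processing: reading a MIDI file and representing each tone as a quadruplet of (length, frequency, intensity, time since the previous tone). The `pretty_midi` library and the helper name `midi_to_quadruplets` are assumptions for illustration, not taken from the original text.

```python
import pretty_midi  # assumed dependency: pip install pretty_midi

def midi_to_quadruplets(path):
    """Sketch: represent each tone in a MIDI file as a quadruplet of
    (length, frequency, intensity, time since the previous tone)."""
    midi = pretty_midi.PrettyMIDI(path)
    events = []
    prev_start = 0.0
    # Use the first instrument only, which keeps the data monophonic
    # as long as its notes do not overlap.
    for note in sorted(midi.instruments[0].notes, key=lambda n: n.start):
        length = note.end - note.start  # duration in seconds
        # note.pitch (a MIDI number) stands in for frequency here,
        # and note.velocity for intensity.
        events.append((length, note.pitch, note.velocity,
                       note.start - prev_start))
        prev_start = note.start
    return events
```

With the data representation in mind, let’s begin with defining each of the components of our music-generating GAN.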

Generator network

The generator model is a fairly straightforward implementation that highlights the effectiveness of a GAN-based generative model. We start with a random vector z of a given dimension and pass it through different non-linearities to produce a final output of the desired shape.
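To make this concrete, here is a minimal sketch of such a generator in Keras. The dimensions and layer sizes are hypothetical, and it simplifies the original design (which feeds the generator’s previous output back in at every step) by simply repeating z across all time steps before the recurrent layers.

```python
from tensorflow.keras import layers, models

LATENT_DIM = 100  # size of the random vector z (assumed)
SEQ_LEN = 50      # number of tones per generated sequence (assumed)
NOTE_DIM = 4      # tone length, frequency, intensity, time offset

def build_generator():
    # A random vector z of a given dimension...
    z = layers.Input(shape=(LATENT_DIM,))
    # ...broadcast across time so the LSTM sees it at every step.
    x = layers.RepeatVector(SEQ_LEN)(z)
    # Recurrent non-linearities model the sequential structure of music.
    x = layers.LSTM(256, return_sequences=True)(x)
    x = layers.LSTM(256, return_sequences=True)(x)
    # Map each step's hidden state to a tone feature vector
    # of the desired output shape.
    notes = layers.TimeDistributed(
        layers.Dense(NOTE_DIM, activation="tanh"))(x)
    return models.Model(z, notes, name="generator")

generator = build_generator()
generator.summary()  # output shape: (None, SEQ_LEN, NOTE_DIM)
```

The `tanh` output keeps the generated features in a bounded range, which assumes the real tone features are scaled to [-1, 1] during pre-processing.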
