Speech Quality Enhancement with SEGAN

Learn how to enhance speech quality with SEGAN using a dataset of noisy and clean audio.

Researchers have found an application similar to image restoration in NLP: GANs can be trained to remove noise from audio in order to enhance the quality of recorded speech. In this section, we will learn how to use SEGAN to reduce background noise and make the human voice in noisy audio more audible.

SEGAN architecture

Speech Enhancement GAN (SEGAN) uses 1D convolutions to remove noise from speech audio (Pascual, Santiago, Antonio Bonafonte, and Joan Serra. "SEGAN: Speech enhancement generative adversarial network." arXiv preprint arXiv:1703.09452 (2017)). We can check out the noise removal results, compared to other methods, at http://veu.talp.cat/segan. There's also an upgraded version, which can be found at http://veu.talp.cat/seganp.

Images are two-dimensional, while audio waveforms are one-dimensional. Considering GANs are so good at synthesizing 2D images, it is natural to use 1D convolution layers instead of 2D convolutions in order to harness the power of GANs when it comes to synthesizing audio data. This is exactly how SEGAN is built.
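To make the 1D versus 2D distinction concrete, here is a minimal PyTorch sketch that applies a strided convolution to a batch of waveforms and, for comparison, to a batch of images. The tensor sizes, channel counts, and kernel sizes are illustrative assumptions and are not taken from the lesson.

```python
import torch
import torch.nn as nn

# Assumed toy inputs (random data, for shape checking only).
waveform = torch.randn(4, 1, 16384)   # (batch, channels, samples): mono audio
image = torch.randn(4, 3, 64, 64)     # (batch, channels, height, width): RGB image

# A 1D convolution slides its kernel along the time axis only.
conv1d = nn.Conv1d(in_channels=1, out_channels=16,
                   kernel_size=31, stride=2, padding=15)

# A 2D convolution slides its kernel along both height and width.
conv2d = nn.Conv2d(in_channels=3, out_channels=16,
                   kernel_size=3, stride=2, padding=1)

audio_features = conv1d(waveform)
image_features = conv2d(image)

print(audio_features.shape)  # torch.Size([4, 16, 8192])
print(image_features.shape)  # torch.Size([4, 16, 32, 32])
```

With a stride of 2, the 1D layer halves the number of time steps in the same way that the 2D layer halves the height and width of an image.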

The generator network in SEGAN uses an encoder-decoder architecture with skip connections. The architecture of the generator network is as follows:
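As a rough illustration of the encoder-decoder idea with skip connections, the following PyTorch sketch builds a deliberately small 1D network. It is not the SEGAN generator itself: the class name `TinyDenoiser`, the number of layers, the channel widths, and the kernel size are all hypothetical choices made for this example, and the sketch leaves out any latent noise input at the bottleneck.

```python
import torch
import torch.nn as nn


class TinyDenoiser(nn.Module):
    """Hypothetical, scaled-down 1D encoder-decoder with skip connections."""

    def __init__(self, channels=(1, 16, 32, 64), kernel_size=31):
        super().__init__()
        pad = kernel_size // 2

        # Encoder: each strided Conv1d halves the signal length.
        self.encoders = nn.ModuleList([
            nn.Conv1d(c_in, c_out, kernel_size, stride=2, padding=pad)
            for c_in, c_out in zip(channels[:-1], channels[1:])
        ])

        # Decoder: mirrors the encoder; each ConvTranspose1d doubles the length.
        rev = channels[::-1]
        self.decoders = nn.ModuleList()
        for i, (c_in, c_out) in enumerate(zip(rev[:-1], rev[1:])):
            # After the first decoder layer, the input is the previous output
            # concatenated with a skip tensor, so the channel count doubles.
            in_ch = c_in if i == 0 else c_in * 2
            self.decoders.append(
                nn.ConvTranspose1d(in_ch, c_out, kernel_size, stride=2,
                                   padding=pad, output_padding=1)
            )
        self.act = nn.PReLU()

    def forward(self, noisy):
        skips = []
        x = noisy
        for enc in self.encoders:
            x = self.act(enc(x))
            skips.append(x)
        skips.pop()  # the deepest encoder output is the bottleneck itself

        last = len(self.decoders) - 1
        for i, dec in enumerate(self.decoders):
            x = dec(x)
            if i < last:
                x = self.act(x)
                # Skip connection: reuse the encoder feature map of equal length.
                x = torch.cat([x, skips.pop()], dim=1)
        return torch.tanh(x)  # keep output samples in [-1, 1]


noisy = torch.randn(2, 1, 1024)   # assumed toy batch: 2 clips of 1024 samples
enhanced = TinyDenoiser()(noisy)
print(enhanced.shape)             # torch.Size([2, 1, 1024])
```

The skip connections pass fine-grained waveform detail from each encoder layer directly to the decoder layer of matching length, so the bottleneck does not have to carry all of that information by itself.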
