Librosa library in Python

Librosa is a Python library for analyzing audio and music. It helps in loading audio files, extracting musical features, and visualizing audio data. With librosa, working with audio in Python becomes straightforward. To work with librosa, you first need to install it on your system.

Installation of librosa

Installation of librosa requires pip to be installed on the system. You need to run the following command to install librosa:

pip3 install librosa

Applications of librosa

Librosa is a powerful Python library. It has exciting applications in the field of analyzing and processing audio. The notable applications of librosa are illustrated below.


Audio feature extraction

We can extract audio features with the librosa library in Python. The code below loads an audio sample, computes the Mel-frequency cepstral coefficients (MFCCs), and displays them as a plot using Matplotlib. MFCCs represent the audio's spectral characteristics and are commonly used in audio processing tasks such as music information retrieval and speech recognition. In the code, y represents the audio signal as a time-series waveform, and sr represents the audio sampling rate.

import librosa
import librosa.display
import matplotlib.pyplot as plt
# Load a built-in audio sample (example: Trumpet)
y, sr = librosa.load(librosa.example('trumpet'))
# Compute MFCCs
mfccs = librosa.feature.mfcc(y=y, sr=sr)
# Display the MFCCs
plt.figure(figsize=(10, 4))
librosa.display.specshow(mfccs, x_axis='time')
plt.colorbar()
plt.title('MFCC')
plt.tight_layout()
plt.savefig("./output/Plot.png")
plt.show()

Beat and tempo detection

We can estimate the tempo (in beats per minute - BPM) and detect beat events in the audio signal with the help of librosa. The estimated tempo and the frame indices where beat events occur are printed as output in the code below. Beat and tempo detection are essential tasks in music analysis and rhythm-based applications.

import librosa
import numpy as np
# Load a built-in audio sample (example: Trumpet)
y, sr = librosa.load(librosa.example('trumpet'))
# Estimate the tempo and beat events
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
# Recent librosa versions return tempo as a one-element array
tempo = float(np.atleast_1d(tempo)[0])
print(f"Estimated Tempo: {tempo:.2f} BPM")
print("Beat Frames:", beats)

Music visualization

Music visualization refers to representing audio data in a visual format, allowing users to gain insights into the characteristics of the music, such as its waveform, spectral content, and other audio features. The code below displays both the waveform and the spectrogram of the audio signal. The waveform plot shows the audio signal's amplitude over time, while the spectrogram visualizes the audio's frequency content over time.

import librosa
import librosa.display
import matplotlib.pyplot as plt
# Load a built-in audio sample (example: Trumpet)
y, sr = librosa.load(librosa.example('trumpet'))
# Display the waveform
plt.figure(figsize=(10, 4))
librosa.display.waveshow(y, sr=sr)
plt.title('Waveform')
plt.tight_layout()
plt.savefig("./output/Plot.png")
plt.show()
# Display the spectrogram
spec = librosa.stft(y)
spec_db = librosa.amplitude_to_db(abs(spec))
plt.figure(figsize=(10, 4))
librosa.display.specshow(spec_db, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Spectrogram')
plt.tight_layout()
plt.savefig("./output/Plot1.png")
plt.show()

Onset detection

Onset detection is a fundamental task in audio signal processing that involves identifying the points in an audio signal where significant events or transients occur. The code below computes the onset strength envelope, identifies onset events, and visualizes the onset strength and the detected onsets on a plot. Onset detection is essential for identifying significant events in the audio signal, such as beats and note onsets. The resulting plot allows for observing rhythmic patterns and intensity changes in the audio.

import librosa
import librosa.display
import matplotlib.pyplot as plt
# Load a built-in audio sample (example: Trumpet)
y, sr = librosa.load(librosa.example('trumpet'))
# Compute onset strength envelope
onset_env = librosa.onset.onset_strength(y=y, sr=sr)
# Find onset events
onsets = librosa.onset.onset_detect(onset_envelope=onset_env, sr=sr)
# Plot the onset strength envelope and detected onsets
plt.figure(figsize=(10, 4))
plt.plot(librosa.times_like(onset_env), onset_env, label='Onset Strength')
plt.vlines(librosa.times_like(onset_env)[onsets], 0, onset_env.max(), color='r', alpha=0.9, label='Detected Onsets')
plt.legend()
plt.title('Onset Detection')
plt.tight_layout()
plt.savefig("./output/Plot.png")
plt.show()

Chroma feature extraction

Chroma feature extraction is a technique commonly used in music signal processing to represent the harmonic content of an audio signal. It aims to capture the distribution of pitch classes, which are the 12 distinct notes in the Western music scale (C, C#, D, D#, E, F, F#, G, G#, A, A#, B).

The code below computes and displays the chroma feature of an audio sample.

import librosa
import librosa.display
import matplotlib.pyplot as plt
# Load a built-in audio sample (example: Trumpet)
y, sr = librosa.load(librosa.example('trumpet'))
# Compute chroma feature
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
# Display the chroma feature
plt.figure(figsize=(10, 4))
librosa.display.specshow(chroma, y_axis='chroma', x_axis='time')
plt.colorbar()
plt.title('Chroma Feature')
plt.tight_layout()
plt.savefig("./output/Plot.png")
plt.show()

Harmonic and percussive source separation

Harmonic and percussive source separation is a process in audio signal processing where the goal is to decompose an audio signal into two components: the harmonic part, which contains pitched and tonal elements like melodies and chords, and the percussive part, which contains rhythmic and transient elements like drums and percussion.

import librosa
import librosa.display
import matplotlib.pyplot as plt
# Load a built-in audio sample (example: Trumpet)
y, sr = librosa.load(librosa.example('trumpet'))
# Perform harmonic-percussive source separation
harmonic, percussive = librosa.effects.hpss(y)
# Visualize the harmonic component
plt.figure(figsize=(10, 4))
librosa.display.waveshow(harmonic, sr=sr)
plt.title('Harmonic Component')
plt.tight_layout()
plt.savefig("./output/Plot.png")
plt.show()
# Visualize the percussive component
plt.figure(figsize=(10, 4))
librosa.display.waveshow(percussive, sr=sr)
plt.title('Percussive Component')
plt.tight_layout()
plt.savefig("./output/Plot1.png")
plt.show()

Constant-Q transform

The constant-Q transform (CQT) is a time-frequency representation approximating the human auditory perception of pitch. It is particularly useful for analyzing musical audio signals and has applications in transcription, pitch estimation, and music analysis tasks.

import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
# Load a built-in audio sample (example: Trumpet)
audio, sr = librosa.load(librosa.example('trumpet'))
# Compute the magnitude of the CQT representation
cqt = np.abs(librosa.cqt(audio, sr=sr))
# Display the CQT representation
plt.figure(figsize=(10, 6))
librosa.display.specshow(librosa.amplitude_to_db(cqt, ref=np.max), sr=sr, x_axis='time', y_axis='cqt_note')
plt.colorbar(format='%+2.0f dB')
plt.title('Constant-Q Transform (CQT)')
plt.tight_layout()
plt.savefig("./output/Plot1.png")
plt.show()

Stretch audio

Stretching audio, also known as time stretching, is an audio processing technique that alters the duration of an audio signal while preserving its pitch.

The code to stretch audio is given below.

import os
import librosa
import librosa.display
import soundfile as sf
import matplotlib.pyplot as plt
# Load a built-in audio sample (example: Trumpet)
y, sr = librosa.load(librosa.example('trumpet'))
# Stretch the audio by a factor of 1.5
y_stretched = librosa.effects.time_stretch(y, rate=1.5)
# Create the output directory if it doesn't exist
output_dir = "./output"
os.makedirs(output_dir, exist_ok=True)
# Save the original audio
original_output_file = os.path.join(output_dir, "original_audio.wav")
sf.write(original_output_file, y, sr)
# Save the time-stretched audio
stretched_output_file = os.path.join(output_dir, "time_stretched_audio.wav")
sf.write(stretched_output_file, y_stretched, sr)
# Display the waveforms of both the original and time-stretched audio
plt.figure(figsize=(14, 4))
# Original Audio Waveform
plt.subplot(1, 2, 1)
librosa.display.waveshow(y, sr=sr)
plt.title('Original Audio Waveform')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
# Time-Stretched Audio Waveform
plt.subplot(1, 2, 2)
librosa.display.waveshow(y_stretched, sr=sr)
plt.title('Time-Stretched Audio Waveform')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.tight_layout()
plt.savefig("./output/Plot.png")
plt.show()

Conclusion

Librosa is a music processing library that is beneficial to researchers and music enthusiasts. With its user-friendly interface and a wide range of functionalities, librosa makes it easy to analyze and manipulate audio signals and music data.
