Librosa is a Python library for analyzing and processing audio. It helps with loading audio files, extracting musical features, and visualizing audio data. With librosa, working with audio in Python becomes straightforward. To use librosa, you first need to install it on your system.
Installing librosa requires pip to be available on your system. Run the following command to install it:
pip3 install librosa
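Once the installation finishes, a quick sanity check is to import the library and print its version. This is a minimal sketch; the exact version number you see depends on your environment.

import librosa

# Confirm that librosa imports correctly and print the installed version
print(librosa.__version__)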
Librosa is a powerful Python library with exciting applications in audio analysis and processing. Its notable applications are illustrated below.
We can extract audio features with librosa in Python. The code below loads a built-in audio sample, computes the Mel-frequency cepstral coefficients (MFCCs), and displays them as a plot using Matplotlib. MFCCs represent the audio's spectral characteristics and are commonly used in audio processing tasks such as music information retrieval and speech recognition. In the code, y represents the audio signal as a time-series waveform, and sr represents the audio sampling rate.
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load a built-in audio sample (example: Trumpet)
y, sr = librosa.load(librosa.example('trumpet'))

# Compute MFCCs
mfccs = librosa.feature.mfcc(y=y, sr=sr)

# Display the MFCCs
plt.figure(figsize=(10, 4))
librosa.display.specshow(mfccs, x_axis='time')
plt.colorbar()
plt.title('MFCC')
plt.tight_layout()
plt.savefig("./output/Plot.png")
plt.show()
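As a quick check on the result, you can inspect the shape of the MFCC matrix. This short sketch repeats the loading step so it runs on its own; by default, librosa.feature.mfcc returns 20 coefficients per frame.

import librosa

# Load the trumpet sample and compute MFCCs, as above
y, sr = librosa.load(librosa.example('trumpet'))
mfccs = librosa.feature.mfcc(y=y, sr=sr)

# One row per coefficient, one column per analysis frame
print(mfccs.shape)  # (20, n_frames) with the default n_mfcc=20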
We can estimate the tempo (in beats per minute, or BPM) and detect beat events in an audio signal with the help of librosa. The code below prints the estimated tempo and the frame indices where beat events occur. Beat and tempo detection are essential tasks in music analysis and rhythm-based applications.
import librosa

# Load a built-in audio sample (example: Trumpet)
y, sr = librosa.load(librosa.example('trumpet'))

# Estimate the tempo and beat events
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)

print(f"Estimated Tempo: {tempo} BPM")
print("Beat Frames:", beats)
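The beat positions above are reported as frame indices. If timestamps in seconds are more convenient, librosa's frames_to_time can convert them. Here is a minimal sketch that repeats the tracking step so it runs on its own.

import librosa

# Load the trumpet sample and track beats, as above
y, sr = librosa.load(librosa.example('trumpet'))
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)

# Convert beat frame indices into timestamps (in seconds)
beat_times = librosa.frames_to_time(beats, sr=sr)
print("Beat times (s):", beat_times)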
Music visualization refers to representing audio data in a visual format, allowing users to gain insights into the characteristics of the music, such as its waveform, spectral content, and other audio features. The code below displays both the waveform and the spectrogram of the audio signal, providing insights into the audio's time-domain representation and frequency content. The waveform plot shows the audio signal's amplitude over time, while the spectrogram visualizes the audio's frequency content over time.
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load a built-in audio sample (example: Trumpet)
y, sr = librosa.load(librosa.example('trumpet'))

# Display the waveform
plt.figure(figsize=(10, 4))
librosa.display.waveshow(y, sr=sr)
plt.title('Waveform')
plt.tight_layout()
plt.savefig("./output/Plot.png")
plt.show()

# Display the spectrogram
spec = librosa.stft(y)
spec_db = librosa.amplitude_to_db(abs(spec))
plt.figure(figsize=(10, 4))
librosa.display.specshow(spec_db, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Spectrogram')
plt.tight_layout()
plt.savefig("./output/Plot1.png")
plt.show()
Onset detection is a fundamental task in audio signal processing that involves identifying the points in an audio signal where significant events or transients occur. The code below computes the onset strength envelope, identifies onset events, and visualizes the onset strength and the detected onsets on a plot. Onset detection is essential for identifying significant events in the audio signal, such as beats and note onsets. The resulting plot makes it possible to observe rhythmic patterns and intensity changes in the audio.
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load a built-in audio sample (example: Trumpet)
y, sr = librosa.load(librosa.example('trumpet'))

# Compute the onset strength envelope
onset_env = librosa.onset.onset_strength(y=y, sr=sr)

# Find onset events
onsets = librosa.onset.onset_detect(onset_envelope=onset_env, sr=sr)

# Plot the onset strength envelope and detected onsets
plt.figure(figsize=(10, 4))
plt.plot(librosa.times_like(onset_env), onset_env, label='Onset Strength')
plt.vlines(librosa.times_like(onset_env)[onsets], 0, onset_env.max(),
           color='r', alpha=0.9, label='Detected Onsets')
plt.legend()
plt.title('Onset Detection')
plt.tight_layout()
plt.savefig("./output/Plot.png")
plt.show()
Chroma feature extraction is a technique commonly used in music signal processing to represent the harmonic content of an audio signal. It captures the distribution of pitch classes, the 12 distinct notes in the Western music scale (C, C#, D, D#, E, F, F#, G, G#, A, A#, B).
Have a look at the output of the code below.
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load a built-in audio sample (example: Trumpet)
y, sr = librosa.load(librosa.example('trumpet'))

# Compute the chroma feature
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

# Display the chroma feature
plt.figure(figsize=(10, 4))
librosa.display.specshow(chroma, y_axis='chroma', x_axis='time')
plt.colorbar()
plt.title('Chroma Feature')
plt.tight_layout()
plt.savefig("./output/Plot.png")
plt.show()
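A quick way to see which pitch classes dominate the recording is to average the chroma matrix over time. This is a minimal sketch, not part of the original example; the rows of chroma_stft's output correspond to pitch classes C through B.

import librosa

# Load the trumpet sample and compute its chroma feature, as above
y, sr = librosa.load(librosa.example('trumpet'))
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

# Average each pitch class's energy over time (rows run from C to B)
pitch_classes = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
for note, energy in zip(pitch_classes, chroma.mean(axis=1)):
    print(f"{note}: {energy:.3f}")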
Harmonic and percussive source separation is a process in audio signal processing where the goal is to decompose an audio signal into two components: the harmonic part, which contains pitched and tonal elements like melodies and chords, and the percussive part, which contains rhythmic and transient elements like drums and percussion. The code below performs this separation and visualizes each component's waveform.
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load a built-in audio sample (example: Trumpet)
y, sr = librosa.load(librosa.example('trumpet'))

# Perform harmonic-percussive source separation
harmonic, percussive = librosa.effects.hpss(y)

# Visualize the harmonic component
plt.figure(figsize=(10, 4))
librosa.display.waveshow(harmonic, sr=sr)
plt.title('Harmonic Component')
plt.tight_layout()
plt.savefig("./output/Plot.png")
plt.show()

# Visualize the percussive component
plt.figure(figsize=(10, 4))
librosa.display.waveshow(percussive, sr=sr)
plt.title('Percussive Component')
plt.tight_layout()
plt.savefig("./output/Plot1.png")
plt.show()
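To listen to the two components rather than only plot them, you can write each one to a WAV file with the soundfile package (the same package used in the time-stretching example later). This is a minimal sketch; the file names are illustrative.

import os
import librosa
import soundfile as sf

# Load the sample and separate it, as above
y, sr = librosa.load(librosa.example('trumpet'))
harmonic, percussive = librosa.effects.hpss(y)

# Write each component to its own WAV file for listening
os.makedirs("./output", exist_ok=True)
sf.write("./output/harmonic.wav", harmonic, sr)
sf.write("./output/percussive.wav", percussive, sr)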
The constant-Q transform (CQT) is a time-frequency representation approximating the human auditory perception of pitch. It is particularly useful for analyzing musical audio signals and has applications in transcription, pitch estimation, and music analysis tasks. The code below computes and displays the CQT of the trumpet sample.
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

# Load a built-in audio sample (example: Trumpet)
audio, sr = librosa.load(librosa.example('trumpet'))

# Compute the CQT representation
cqt = librosa.cqt(audio, sr=sr)

# Display the CQT magnitude in decibels
plt.figure(figsize=(10, 6))
librosa.display.specshow(librosa.amplitude_to_db(np.abs(cqt), ref=np.max),
                         sr=sr, x_axis='time', y_axis='cqt_note')
plt.colorbar(format='%+2.0f dB')
plt.title('Constant-Q Transform (CQT)')
plt.tight_layout()
plt.savefig("./output/Plot1.png")
plt.show()
Stretching audio, also known as time stretching, is an audio processing technique that alters the duration of an audio signal while preserving its pitch. In librosa's time_stretch, a rate greater than 1 speeds the audio up (shortening it), while a rate less than 1 slows it down.
The code below stretches the audio with a rate of 1.5 and saves both the original and stretched versions to disk.
import os
import librosa
import librosa.display
import soundfile as sf
import matplotlib.pyplot as plt

# Load a built-in audio sample (example: Trumpet)
y, sr = librosa.load(librosa.example('trumpet'))

# Stretch the audio with a rate of 1.5 (rate > 1 speeds the audio up)
y_stretched = librosa.effects.time_stretch(y, rate=1.5)

# Create the output directory if it doesn't exist
output_dir = "./output"
os.makedirs(output_dir, exist_ok=True)

# Save the original audio
original_output_file = os.path.join(output_dir, "original_audio.wav")
sf.write(original_output_file, y, sr)

# Save the time-stretched audio
stretched_output_file = os.path.join(output_dir, "time_stretched_audio.wav")
sf.write(stretched_output_file, y_stretched, sr)

# Display the waveforms of the original and time-stretched audio
plt.figure(figsize=(14, 4))

# Original audio waveform
plt.subplot(1, 2, 1)
librosa.display.waveshow(y, sr=sr)
plt.title('Original Audio Waveform')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')

# Time-stretched audio waveform
plt.subplot(1, 2, 2)
librosa.display.waveshow(y_stretched, sr=sr)
plt.title('Time-Stretched Audio Waveform')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')

plt.tight_layout()
plt.savefig("./output/Plot.png")
plt.show()
Librosa is a music processing library that is beneficial to researchers and music enthusiasts alike. With its user-friendly interface and wide range of functionality, librosa makes it easy to analyze and manipulate audio signals and music data.