How to Preprocess MIDI Data for PyTorch


3 min read 31-01-2025

MIDI (Musical Instrument Digital Interface) files are a common way to represent musical information digitally. However, before you can use MIDI data to train a PyTorch model for tasks like music generation or classification, you need to preprocess it. This article will guide you through the essential steps. We'll focus on transforming raw MIDI data into a format suitable for deep learning using PyTorch.

Understanding MIDI Data Structure

MIDI data isn't directly usable by neural networks. It's structured as a sequence of events, each representing a note being pressed, a note being released, or a change to some other control parameter. Common event types include:

  • Note On: Indicates a note being played. Includes note number (pitch), velocity (loudness), and channel.
  • Note Off: Indicates a note being released. Includes note number and velocity.
  • Control Changes: Adjust parameters like volume, pan, or effects.
  • Program Changes: Select a different instrument.
  • Tempo Changes: Change the speed of the music.

This event-based nature makes direct processing challenging. We need to convert this into a numerical representation that a PyTorch model can understand.
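
To see this event stream concretely, you can print the raw messages with the mido library (an extra dependency used here purely for illustration; the rest of the pipeline does not need it):

import mido

mid = mido.MidiFile('your_midi_file.mid')
for msg in list(mid)[:10]:  # iterating a MidiFile yields messages with times in seconds
    print(msg)  # e.g. note_on channel=0 note=60 velocity=90 time=0.5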

Preprocessing Steps: From MIDI to PyTorch-Ready Tensors

Here’s a step-by-step guide to preparing MIDI data for your PyTorch models, using the popular libraries pretty_midi and music21. You only need one of the two, but both are shown for comparison. Install them with: pip install pretty_midi music21

1. Loading MIDI Files

First, you need to load your MIDI files. Both pretty_midi and music21 provide functions for this.

import pretty_midi
# or
from music21 import converter

# Pretty MIDI example:
midi_data = pretty_midi.PrettyMIDI('your_midi_file.mid')

# Music21 example:
midi_stream = converter.parse('your_midi_file.mid')

Replace 'your_midi_file.mid' with the actual path to your MIDI file.

2. Extracting Relevant Information

We're primarily interested in note information (pitch, start time, and end time). Let's extract that:

# Using pretty_midi (times are in seconds)
notes = []
for instrument in midi_data.instruments:
    if instrument.is_drum:
        continue  # skip drum tracks: their "pitches" select percussion sounds, not notes
    for note in instrument.notes:
        notes.append((note.pitch, note.start, note.end))

# Using music21 (offsets are in quarter notes; .notes can contain chords)
notes = []
for element in midi_stream.flat.notes:
    if element.isChord:
        # Expand a chord into its individual pitches
        for p in element.pitches:
            notes.append((p.midi, element.offset, element.offset + element.duration.quarterLength))
    elif element.isNote:
        notes.append((element.pitch.midi, element.offset, element.offset + element.duration.quarterLength))

Both examples give us a list of (pitch, start_time, end_time) tuples. Note that the units differ: pretty_midi reports times in seconds, while music21 reports offsets in quarter notes. The piano-roll code below assumes seconds, so music21 offsets need converting first.
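
A minimal conversion sketch, assuming the piece has a single constant tempo (bpm is a value you supply here; real MIDI files can contain tempo changes, which this ignores):

# Convert quarter-note offsets to seconds under an assumed constant tempo
bpm = 120  # assumption: one fixed tempo for the whole piece
seconds_per_quarter = 60.0 / bpm
notes = [(pitch, start * seconds_per_quarter, end * seconds_per_quarter)
         for pitch, start, end in notes]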

3. Time-Series Representation

Now, let’s convert this into a time-series representation suitable for PyTorch. We'll create a piano-roll representation:

import numpy as np

# Define parameters
num_pitches = 128     # standard MIDI pitch range (0-127)
time_resolution = 10  # time steps per second (adjust as needed)
max_length = 1000     # maximum sequence length in time steps (adjust as needed)


def create_piano_roll(notes, num_pitches, time_resolution, max_length):
    # Size the roll to the piece's actual length, capped at max_length,
    # so shorter pieces can be padded in the next step
    if notes:
        length = min(int(max(end for _, _, end in notes) * time_resolution) + 1, max_length)
    else:
        length = 1
    piano_roll = np.zeros((length, num_pitches))
    for pitch, start, end in notes:
        start_index = int(start * time_resolution)
        if start_index >= length:
            continue  # note starts beyond the truncation point
        # Clip (rather than drop) notes that extend past the end
        end_index = min(int(end * time_resolution), length)
        piano_roll[start_index:max(end_index, start_index + 1), pitch] = 1
    return piano_roll

piano_roll = create_piano_roll(notes, num_pitches, time_resolution, max_length)

This creates a binary matrix where each row represents a time step and each column represents a MIDI pitch. A '1' indicates a note is active at that time step and pitch.
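
To sanity-check the result, you can visualize the roll with matplotlib (an optional extra dependency):

import matplotlib.pyplot as plt

# Pitch on the y-axis, time steps on the x-axis
plt.imshow(piano_roll.T, aspect='auto', origin='lower', cmap='gray_r')
plt.xlabel('Time step')
plt.ylabel('MIDI pitch')
plt.show()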

4. Handling Variable-Length Sequences

MIDI sequences have variable lengths. To handle this in PyTorch, we'll use padding:

# Pad shorter sequences with zeros up to max_length
piano_roll = np.pad(piano_roll, ((0, max_length - piano_roll.shape[0]), (0, 0)), mode='constant')

# Convert to a PyTorch tensor
import torch
piano_roll_tensor = torch.tensor(piano_roll, dtype=torch.float32)

We pad shorter sequences with zeros to match max_length.
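
If your model should ignore the padded time steps (e.g. when computing a loss), you can also build a boolean mask. A minimal sketch, where real_length is a name introduced here for illustration:

# Record piano_roll.shape[0] as real_length *before* calling np.pad
real_length = 600  # example value
mask = torch.arange(max_length) < real_length  # True for real steps, False for padding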

5. Data Augmentation (Optional)

Consider data augmentation techniques to improve model robustness:

  • Pitch Shifting: Transpose the entire piece by a certain interval (sketched after this list).
  • Time Stretching: Slightly alter the tempo.
  • Noise Addition: Add small amounts of random noise.
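
A minimal pitch-shifting sketch that operates on the piano roll itself (notes shifted past the MIDI range are dropped rather than wrapped around):

def pitch_shift(piano_roll, semitones):
    # Roll the pitch axis, then zero out the columns that wrapped around
    shifted = np.roll(piano_roll, semitones, axis=1)
    if semitones > 0:
        shifted[:, :semitones] = 0
    elif semitones < 0:
        shifted[:, semitones:] = 0
    return shifted

augmented = pitch_shift(piano_roll, 2)  # transpose up a whole tone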

6. Creating Datasets and DataLoaders

Finally, organize your preprocessed data into PyTorch Datasets and DataLoaders for efficient training:

from torch.utils.data import Dataset, DataLoader

class MidiDataset(Dataset):
    def __init__(self, piano_rolls):
        self.piano_rolls = piano_rolls

    def __len__(self):
        return len(self.piano_rolls)

    def __getitem__(self, idx):
        return self.piano_rolls[idx]


# Example usage:
dataset = MidiDataset([piano_roll_tensor]) # Replace with your list of tensors
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
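
With the padding from step 4, every item has the same shape, so the default collate function can stack them into batches:

for batch in dataloader:
    print(batch.shape)  # torch.Size([batch_size, max_length, num_pitches])
    break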

This provides a basic framework for preparing MIDI data. Remember to adjust parameters like time_resolution and max_length based on your dataset and model requirements. Experimentation is key to finding optimal settings. More complex preprocessing might involve chord recognition, melodic contour extraction, or other feature engineering. This foundation, however, will allow you to build sophisticated music-related PyTorch models.
