The convergence of deep learning and audio synthesis has transformed how we compose, remix, and fine-tune music. In this in-depth guide, we'll cover everything from low-level MIDI processing to state-of-the-art text-to-music models, custom fine-tuning, and deploying an interactive Streamlit app. Strap in for detailed code examples, architectural insights, and practical tips so you can build your own AI music studio entirely in Python.
Create an isolated environment and install the essential packages:
python3 -m venv music-env
source music-env/bin/activate

# MIDI & audio processing
pip install pretty_midi mido pydub soundfile numpy scipy
# Deep learning backends
pip install torch torchvision torchaudio
# Generative music models
pip install magenta      # Symbolic music (MusicVAE, PerformanceRNN)
pip install audiocraft   # Meta's MusicGen & AudioGen
pip install diffusers transformers accelerate  # Hugging Face Diffusers for AudioLDM2
# Web deployment
pip install streamlit
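Before moving on, it is worth a quick sanity check that PyTorch can see your GPU, since the generation models below are slow on CPU (a minimal check, nothing model-specific):
import torch
print(torch.__version__, "| CUDA available:", torch.cuda.is_available())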
2.1 Loading and Inspecting MIDI Files
import pretty_midi

def load_midi(path):
    pm = pretty_midi.PrettyMIDI(path)
    print(f"Loaded '{path}': tempo={pm.get_tempo_changes()[1][0]:.1f} BPM")
    for inst in pm.instruments:
        print(f"  {inst.name} – {len(inst.notes)} notes")
    return pm

pm = load_midi('examples/mozart_symphony.mid')
2.2 Converting Audio to Spectrograms (for diffusion models)
import torchaudio

waveform, sr = torchaudio.load('examples/output.wav')
# Create a mel spectrogram
mel_spec = torchaudio.transforms.MelSpectrogram(
    sample_rate=sr, n_mels=128, n_fft=1024, hop_length=256
)(waveform)
print(mel_spec.shape)  # [channels, n_mels, time_frames]
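Most spectrogram-based diffusion models (AudioLDM among them) operate on a log-scaled mel spectrogram rather than raw power. As an optional follow-up, torchaudio's AmplitudeToDB handles that conversion:
# Convert the power mel spectrogram to decibels (log scale)
mel_db = torchaudio.transforms.AmplitudeToDB(stype='power')(mel_spec)
print(mel_db.shape, mel_db.min().item(), mel_db.max().item())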
3.1 Sampling with MusicVAE
import note_seq
from magenta.models.music_vae import configs
from magenta.models.music_vae.trained_model import TrainedModel

config = configs.CONFIG_MAP['hierdec-mel_16bar']
# Point checkpoint_dir_or_path at the downloaded hierdec-mel_16bar checkpoint
mvae = TrainedModel(config, batch_size=4,
                    checkpoint_dir_or_path='checkpoints/hierdec-mel_16bar.tar')

# Interpolate between two latent points
# sequence1 and sequence2 are note_seq.NoteSequence objects (e.g. loaded from MIDI)
z, _, _ = mvae.encode([sequence1, sequence2])  # encode returns (z, mu, sigma)
z1, z2 = z
for alpha in [0.0, 0.25, 0.5, 0.75, 1.0]:
    seq = mvae.decode([z1 * (1 - alpha) + z2 * alpha], length=64)[0]
    # Write the decoded NoteSequence straight to a MIDI file
    note_seq.sequence_proto_to_midi_file(seq, f'interp_{alpha:.2f}.mid')
Key Points
- Hierarchical VAE: Learns multi-scale structure in melodies.
- Latent interpolation: Smooth morphing of musical phrases.
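One practical note on the interpolation: a straight line between two latent codes can pass through low-probability regions of the Gaussian prior, so spherical interpolation (slerp) is sometimes preferred for smoother morphs. The helper below is an illustrative sketch, not part of the Magenta API:
import numpy as np

def slerp(z1, z2, alpha):
    """Spherical interpolation between two latent vectors."""
    cos_omega = np.dot(z1 / np.linalg.norm(z1), z2 / np.linalg.norm(z2))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - alpha) * z1 + alpha * z2  # vectors are nearly parallel
    return (np.sin((1 - alpha) * omega) * z1 + np.sin(alpha * omega) * z2) / np.sin(omega)

# Drop-in replacement for the linear blend in the loop above:
# seq = mvae.decode([slerp(z1, z2, alpha)], length=64)[0]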
4.1 Generating Music from Text Prompts
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Choose a larger model for richer quality
model = MusicGen.get_pretrained('facebook/musicgen-medium')

# Generate a jazzy bass groove
model.set_generation_params(duration=20)
wav = model.generate(["A mellow jazz bass line with brushed drums"])

# Save as WAV (audio_write adds the .wav extension)
audio_write('jazz_bass', wav[0].cpu(), model.sample_rate, strategy='loudness')
4.2 Understanding the Architecture
- Codebook tokenizer: Quantizes audio into discrete tokens.
- Transformer decoder: Autoregressively predicts codebook indices.
- Upsampler: Converts codes back to a waveform via a neural vocoder.
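The codebook idea is the least familiar piece, and it is easiest to see with a toy example: each frame of a continuous feature representation is replaced by the index of its nearest codebook entry, and those integer indices are what the transformer models autoregressively. The sizes below are arbitrary, not MusicGen's actual configuration:
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, 128))  # 1024 code vectors, 128 dims each (toy sizes)
frames = rng.normal(size=(50, 128))      # 50 frames of continuous "audio" features

# Nearest-neighbour lookup turns each frame into a discrete token id
distances = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=-1)
tokens = distances.argmin(axis=1)
print(tokens[:10])  # integer codebook indices for the first ten frames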
5.1 One-Shot Text-to-Audio
import torch
import soundfile as sf
from diffusers import AudioLDMPipeline

pipe = AudioLDMPipeline.from_pretrained(
    "haoheliu/audioldm-m-full", variant="diffusers"
).to('cuda')

out = pipe(
    "A serene piano solo with soft reverb and gentle dynamics",
    num_inference_steps=80,
    guidance_scale=3.0
)
audio = out.audios[0]  # numpy array
sf.write('piano_reverb.wav', audio, 16000)  # AudioLDM generates audio at 16 kHz
5.2 Fine-Tuning Your Own Style
- Dataset: Collect pairs of text captions and audio clips (e.g., 10-50 examples).
- Preprocessing: Resample to 24 kHz and normalize amplitude (a minimal resampling sketch follows the training loop below).
- Training Loop:
# NOTE: schematic loop. Adapt the model/tokenizer classes to the fine-tuning
# entry points of your AudioLDM implementation.
import soundfile as sf
from diffusers import AudioLDMForConditionalGeneration, AudioLDMTokenizer
from datasets import load_dataset
from transformers import Trainer, TrainingArguments

model = AudioLDMForConditionalGeneration.from_pretrained("haoheliu/audioldm-m")
tokenizer = AudioLDMTokenizer.from_pretrained("haoheliu/audioldm-m")

ds = load_dataset("csv", data_files={"train": "captions.csv"})

def prep(ex):
    ex['input_ids'] = tokenizer(ex['text']).input_ids
    ex['waveform'] = sf.read(ex['wav_path'])[0]
    return ex

train_ds = ds['train'].map(prep)

args = TrainingArguments(
    output_dir="fine_tuned_audioldm",
    per_device_train_batch_size=1,
    learning_rate=2e-5,
    num_train_epochs=10,
    save_steps=200
)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()
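For the preprocessing step in the list above, here is a minimal torchaudio sketch (the helper name is ours, and the 24 kHz target simply follows the bullet; adjust it to whatever rate your checkpoint expects):
import torchaudio

def preprocess(wav_path, target_sr=24_000):
    # Resample to the target rate and peak-normalize the amplitude
    waveform, sr = torchaudio.load(wav_path)
    if sr != target_sr:
        waveform = torchaudio.transforms.Resample(sr, target_sr)(waveform)
    waveform = waveform / waveform.abs().max().clamp(min=1e-8)
    return waveform, target_sr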
6. Building the Streamlit Interface
Below is a robust Streamlit app with live generation, file uploads for fine-tuning, and download options:
# app.py
import streamlit as st
from io import BytesIO
import soundfile as sf
from audiocraft.models import MusicGen
from diffusers import AudioLDMPipeline

st.set_page_config(page_title="AI Music Studio", layout="wide")
st.title("AI Music Studio 🎶")

# Sidebar controls
model_choice = st.sidebar.selectbox("Model", ["MusicGen-Medium", "AudioLDM2"])
prompt = st.sidebar.text_area("Music Prompt", "A bright electronic arpeggio")
duration = st.sidebar.slider("Duration (sec)", 5, 60, 15)

if st.sidebar.button("Generate"):
    buffer = BytesIO()
    if model_choice.startswith("MusicGen"):
        mg = MusicGen.get_pretrained('facebook/musicgen-medium')
        mg.set_generation_params(duration=duration)
        wav = mg.generate([prompt])[0].cpu().numpy()
        sf.write(buffer, wav.T, mg.sample_rate, format='WAV')
    else:
        pipe = AudioLDMPipeline.from_pretrained("haoheliu/audioldm-m-full").to('cuda')
        audio = pipe(prompt, num_inference_steps=60).audios[0]
        sf.write(buffer, audio, 16000, format='WAV')  # AudioLDM outputs 16 kHz audio
    st.audio(buffer.getvalue(), format='audio/wav')
    st.download_button("Download Track", data=buffer.getvalue(),
                       file_name="track.wav", mime="audio/wav")

# Fine-tuning upload
st.markdown("### Fine-Tune AudioLDM2")
uploaded = st.file_uploader("Upload CSV with text,wav paths", type="csv")
if uploaded:
    st.success("Ready for fine-tuning! (See code snippet in repo)")

st.markdown("#### Preview MIDI Example")
midi_file = st.file_uploader("Upload a MIDI file", type="mid")
if midi_file:
    import pretty_midi
    pm = pretty_midi.PrettyMIDI(midi_file)
    st.write(pm.instruments[0].notes[:5])  # Show the first 5 notes
7. Deployment Methods
- Streamlit Cloud: Connect your GitHub repo for quick deployment.
- Docker: add a Dockerfile to the project root:
FROM python:3.10-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 8501
ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
- GPU Hosts: Use AWS/GCP with NVIDIA GPUs for faster generation.
8. Next Steps & Advanced Topics
- Real-Time Looping: Integrate WebAudio for browser-side live looping.
- Hybrid Models: Combine symbolic (Magenta) and waveform (MusicGen) pipelines.
- Customization: Build your own codebook or improve vocoder quality via adversarial training.
Embark on your creative journey — whether you’re composing ambient soundtracks, crafting fresh beats, or fine-tuning the next viral hook, Python’s AI music ecosystem puts the studio at your fingertips!