Uses DemucsV4 for separating audio files.
tbd
One of the dependencies is audiowaveform, which computes a downsampled representation of an uploaded audio file. For installation, refer to its installation guide, or on Ubuntu run:
sudo add-apt-repository ppa:chris-needham/ppa
sudo apt-get update
sudo apt-get install audiowaveform
On macOS, also install soundstretch/soundtouch with:
brew install sound-touch
On Linux, you can install PySoundFile as the torchaudio backend later.
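To confirm that the command-line tools above ended up on your PATH, a small check like the following may help. It is only a sketch: the executable names `audiowaveform` and `soundstretch` are assumptions based on the package names above, so adjust them if your installation uses different names.

```python
import shutil

# Assumed executable names (not confirmed by this README):
# "audiowaveform" from the audiowaveform package and
# "soundstretch" from the SoundTouch package.
REQUIRED_TOOLS = ("audiowaveform", "soundstretch")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

A tool reported as NOT FOUND usually means the install step was skipped or the shell needs to be restarted.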
Install conda/miniconda if you haven't already.
You can use the provided environment.yml file to get started quickly.
conda env create -f environment.yml
Or create a new conda environment manually with python=3.10 or a version of your liking.
conda create --name separation python=3.10
Activate the environment with:
conda activate separation
Install pytorch and dependencies using conda/miniconda:
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
A detailed guide on how to get started with PyTorch for your environment / system can be found here.
Other needed dependencies:
pip install git+https://github.com/CarlGao4/demucs.git@4.1.0-update
pip install bullmq prisma minio asyncio ffmpeg
Additionally, on Linux:
pip install PySoundFile
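After installing, it can be useful to verify that the packages import and whether PyTorch sees a GPU. The sketch below assumes the usual import names for the packages above (PySoundFile is imported as `soundfile`); it only checks for the modules and does not validate versions.

```python
import importlib.util

# Assumed import names for the packages installed above.
# "soundfile" is only expected on Linux.
MODULES = ("torch", "torchaudio", "demucs", "soundfile", "prisma", "minio", "bullmq")


def missing_modules(names=MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cuda_status():
    """Describe whether PyTorch is installed and can use a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "CUDA not available, falling back to CPU"


if __name__ == "__main__":
    print("Missing modules:", missing_modules() or "none")
    print(cuda_status())
```

Run it inside the activated conda environment so that it inspects the right interpreter.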
Generate the Prisma client. By this point the api-gateway should have been started and a database should exist:
prisma db pull
prisma generate
Then copy .env.template to .env and fill in the required environment variables with your secrets.
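To catch variables that were left empty, you could compare .env against .env.template before starting the worker. This is a minimal sketch with a deliberately simple KEY=VALUE parser (no multi-line values, no variable expansion); the helper names `load_env` and `unfilled_keys` are hypothetical and not part of this project.

```python
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unfilled_keys(template=".env.template", env=".env"):
    """Return template keys that are absent or empty in the env file."""
    wanted = load_env(template)
    actual = load_env(env)
    return sorted(key for key in wanted if not actual.get(key))


if __name__ == "__main__":
    if Path(".env").exists() and Path(".env.template").exists():
        print("Unfilled variables:", unfilled_keys() or "none")
```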
Start the worker with python worker.py. The output shows whether a GPU is available for accelerated computing, followed by the model being downloaded from Meta's public file registry.
Once that's done, the worker is ready to process jobs.
In worker.py, where the SeparationProcessor is instantiated, you can pass the model parameter.
htdemucs: Demucs V3
htdemucs_ft: Demucs V4 (Hybrid Transformer)
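As an illustration of passing the model parameter, the sketch below uses a hypothetical stand-in class. It is not the project's actual SeparationProcessor, whose constructor may take other arguments; the default value and the validation are assumptions made only for this example.

```python
# Model names taken from the list above.
SUPPORTED_MODELS = ("htdemucs", "htdemucs_ft")


class SeparationProcessorSketch:
    """Hypothetical stand-in showing how a model name could be passed in."""

    def __init__(self, model="htdemucs"):
        # Reject names outside the list above early, before any download starts.
        if model not in SUPPORTED_MODELS:
            raise ValueError(
                f"Unknown model {model!r}; expected one of {SUPPORTED_MODELS}"
            )
        self.model = model


if __name__ == "__main__":
    processor = SeparationProcessorSketch(model="htdemucs_ft")
    print("Using model:", processor.model)
```

Validating the name up front gives a clear error instead of a failed model download later.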