HSM-TSS Demo

General Audio Separation

FSD50K

Ours (extract): "Extract the sound of belly laugh,giggle"

Ours (remove): "Drop the sound of printer"

AudioSep: "belly laugh,giggle"

Mixture

GT

AudioSep

Ours (extract)

Ours (remove)

Ours (extract): "Separate ratchet and pawl"

Ours (remove): "Drop the sound of computer keyboard"

AudioSep: "ratchet and pawl"

Mixture

GT

AudioSep

Ours (extract)

Ours (remove)

AudioCaps

Ours (extract): "Separate: a tool motor is running while metal whirring, grinding and clattering occur, then the tool whirs to a stop."

Ours (remove): "Drop: people are laughing and chuckling."

AudioSep: "A tool motor is running while metal whirring, grinding and clattering occur, then the tool whirs to a stop"

Mixture

GT

AudioSep

Ours (extract)

Ours (remove)

Ours (extract): "Extract: typing on a computer keyboard"

Ours (remove): "Drop: vehicle motor running"

AudioSep: "typing on a computer keyboard"

Mixture

GT

AudioSep

Ours (extract)

Ours (remove)

Clotho

Ours (extract): "Separate: banging on a trash can is making a drum sound."

Ours (remove): "Remove: a shrill note dings before becoming lower in tone as machine clicks play."

AudioSep: "Banging on a trash can is making a drum sound."

Mixture

GT

AudioSep

Ours (extract)

Ours (remove)

Ours (extract): "Extract: quick scratchy writing on a chalkboard is occurring."

Ours (remove): "Remove: the engine of a lawnmower runs steadily to cut the grass."

AudioSep: "Quick scratchy writing on a chalkboard is occurring."

Mixture

GT

AudioSep

Ours (extract)

Ours (remove)

3 Sets

Ours (extract): "Separate drip"

Ours (remove): "Remove: someone is frying food and scraping it out of the pan and then throws the spatula in the pan."

AudioSep: "drip"

Mixture

GT

AudioSep

Ours (extract)

Ours (remove)

Ours (extract): "Extract: a man speaks with some clicks and then loud long scrapes."

Ours (remove): "Remove the sound of wind chime."

AudioSep: "A man speaks with some clicks and then loud long scrapes."

Mixture

GT

AudioSep

Ours (extract)

Ours (remove)

Music Separation (zero-shot on the MUSIC dataset)

Ours (extract): "Extract the sound of flute"

Ours (remove): "Drop violin"

AudioSep: "flute"

Mixture

GT

AudioSep

Ours (extract)

Ours (remove)

Ours (extract): "Separate trumpet"

Ours (remove): "Drop accordion"

AudioSep: "trumpet"

Mixture

GT

AudioSep

Ours (extract)

Ours (remove)

Ours (extract): "Extract the sound of tuba"

Ours (remove): "Remove cello"

AudioSep: "tuba"

Mixture

GT

AudioSep

Ours (extract)

Ours (remove)

Ours (extract): "Extract trumpet"

Ours (remove): "Remove saxophone"

AudioSep: "trumpet"

Mixture

GT

AudioSep

Ours (extract)

Ours (remove)

Flexible Instructions for Audio Source Separation

Arbitrary Instruction: "I need to keep the loud train horn."

LLM Processed: task type("extract") + description("loud train horn")

Mixture

GT

Ours

Arbitrary Instruction: "Please remove the sound of squirrels, frogs and various birds close to a stream."

LLM Processed: task type("remove") + description("squirrels, frogs and various birds close to a stream")

Mixture

GT

Ours

Arbitrary Instruction: "I can hear the sound of rain and thunder from the mixture. Help me isolate the sound."

LLM Processed: task type("extract") + description("rain, thunder")

Mixture

GT

Ours

Arbitrary Instruction: "Only keep the woman speech without the music sound."

LLM Processed: task type("extract") + description("woman speech")

Mixture

GT

Ours
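The "LLM Processed" lines above pair a task type with a sound description. As a rough illustration, the sketch below parses that notation into a small structured record that a downstream separator could consume. The names `SeparationQuery` and `parse_llm_output` are hypothetical, and the regular expression only mirrors the notation shown on this page; the page does not state which LLM, prompt, or output schema the system actually uses.

```python
import re
from dataclasses import dataclass

# Matches the 'task type("...") + description("...")' notation shown on this page.
# Assumption: this is only the display notation, not a confirmed output schema.
_PATTERN = re.compile(
    r'task type\("(?P<task>[^"]+)"\)\s*\+\s*description\("(?P<desc>[^"]+)"\)'
)

# The two task types that appear in the examples above.
VALID_TASKS = {"extract", "remove"}


@dataclass(frozen=True)
class SeparationQuery:
    """Hypothetical structured form of a processed instruction."""
    task_type: str    # "extract" or "remove"
    description: str  # free-text description of the sound source


def parse_llm_output(text: str) -> SeparationQuery:
    """Parse one 'LLM Processed' string into a SeparationQuery."""
    match = _PATTERN.search(text)
    if match is None:
        raise ValueError(f"unrecognized LLM output: {text!r}")
    task = match.group("task").strip().lower()
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task type: {task!r}")
    return SeparationQuery(task, match.group("desc").strip())


query = parse_llm_output('task type("extract") + description("loud train horn")')
print(query.task_type, "|", query.description)
```

Validating the task type before handing the query on means a malformed LLM response fails early instead of silently selecting the wrong operation.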

Performance on AUDIT Demos

Drop the sound of a duck quacking in water

Mixture

AUDIT

Ours (remove)

Drop the sound of dishes and pots and pans in the middle

Mixture

AUDIT

Ours (remove)

Drop: pouring water

Mixture

AUDIT

Ours (remove)

Drop people cheering

Mixture

AUDIT

Ours (remove)

Drop a short firework explosion in the end

Mixture

AUDIT

Ours (remove)

Drop the sound of a woman talking

Mixture

AUDIT

Ours (remove)