AudioWeave: Unified Audio Generation and Editing via Joint Condition Modeling and Progressive Training

Text-to-Audio Samples

Instruction | Generation Result
A telephone dialing tone followed by a plastic switch flipping
A train horn blows and fades, then metal clacking occurs
An insect buzzing as plastic clacks and plastic slaps a hard surface
Heavy rainfall with a brief muffled thunder from outside
Sanding and scraping followed by a man speaking
The wind is blowing, and a person is whistling a tune
Ambulance siren repeatedly and then continuous
A toilet flushes and water drains

Audio Editing Samples

Task | Instruction | Reference Audio | Editing Result
Adding | Insert a sound of pouring water at the end
Adding | Put a sound of toilet flush at the midpoint
Adding | Mix a sound of bird chirping throughout the audio
Adding | Add a sneezing at the beginning of the audio
Removing | Drop laughing from the audio
Removing | Remove the sound of cat meowing
Replacement | Exchange clock tick with violin
Replacement | Replace the sound of dog with sheep
Reordering | Reverse the order of pouring water and air conditioner
Reordering | Swap the order of rain and munching
Inpainting | Inpaint the audio
Inpainting | Complete the missing segment
Super-resolution | Synthesize the missing high frequencies
Super-resolution | Increase audio resolution