Voice Cloning with AI

 
 

Here are some sample audio clips produced using the Keras-AutoVC voice conversion autoencoder.

Code

My implementation uses Keras, and is available on Github.

Many-to-many conversion between seen speakers

 

Source Speaker

Target Speaker

Conversion

 
 

Source Speaker

Target Speaker

Conversion

 

Samples from each speaker are cropped into two-second segments, and transformed into a mel-reduced spectrogram. Over the course of training, speech content is transferred between speakers. The model objective is transfer of content independent of style:

 
A: Source
B: Style
Content of A in voice of B: Target
 

Parameters for cleaning and dynamic range compression of audio samples were determined using the CSVTK, my toolkit for compression, cleaning, and visualization of mel spectrograms.