Voice Cloning with AI

 
 

Here are some sample audio clips produced using the Keras-AutoVC voice conversion autoencoder.

Code

My implementation uses Keras and is available on GitHub.

Many-to-many conversion between seen speakers

 

Source Speaker

Target Speaker

Conversion

 
 

Source Speaker

Target Speaker

Conversion

 

Samples from each speaker are cropped into two-second segments and converted to mel spectrograms. During training, the model learns to transfer speech content between speakers. The objective is to transfer content independently of style:

 
A: Source
B: Style
Content of A in voice of B: Target
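
The preprocessing described above (two-second crops, then mel spectrograms) can be sketched roughly as follows. This is a minimal NumPy-only illustration, not the code from the Keras-AutoVC repository: the function names are hypothetical, and the 16 kHz sample rate, 1024-point FFT, hop of 256 samples, and 80 mel bands are assumed values chosen for the example.

```python
import numpy as np

def crop_segments(waveform, sample_rate, seconds=2.0):
    """Split a 1-D waveform into non-overlapping fixed-length segments.

    Any trailing samples that do not fill a whole segment are dropped.
    """
    seg_len = int(round(sample_rate * seconds))
    n_segments = len(waveform) // seg_len
    return waveform[: n_segments * seg_len].reshape(n_segments, seg_len)

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(sample_rate, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_edges = mel_to_hz(mel_edges)
    bank = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, center, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mel_spectrogram(segment, sample_rate, n_fft=1024, hop=256, n_mels=80):
    """Magnitude STFT projected onto mel bands; shape (n_frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(segment) - n_fft) // hop
    frames = np.stack(
        [segment[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return magnitude @ mel_filterbank(sample_rate, n_fft, n_mels).T

# Usage on a synthetic 5-second tone (assumed 16 kHz sample rate).
sample_rate = 16000
t = np.arange(5 * sample_rate) / sample_rate
waveform = np.sin(2.0 * np.pi * 220.0 * t)

segments = crop_segments(waveform, sample_rate)   # two 2-second segments
mel = mel_spectrogram(segments[0], sample_rate)   # frames x mel bands
```

In practice a library such as librosa would normally handle the mel projection; the hand-written filterbank here only keeps the sketch self-contained.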
 

Parameters for cleaning and dynamic range compression of the audio samples were determined using CSVTK, my toolkit for compression, cleaning, and visualization of mel spectrograms.
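
For readers unfamiliar with dynamic range compression of spectrograms, one common approach is to convert magnitudes to decibels, clip at a floor, and rescale to the range [0, 1]. The sketch below illustrates that idea only. The function name and the -100 dB floor are assumptions for the example, not the parameters found with CSVTK.

```python
import numpy as np

def compress_dynamic_range(mel, min_level_db=-100.0):
    """Map mel magnitudes to [0, 1] on a decibel scale.

    min_level_db is an assumed floor: magnitudes at or below it map to 0,
    and a magnitude of 1.0 (0 dB) or above maps to 1.
    """
    floor = 10.0 ** (min_level_db / 20.0)
    db = 20.0 * np.log10(np.maximum(mel, floor))
    return np.clip((db - min_level_db) / -min_level_db, 0.0, 1.0)

# Usage: silence, a quiet bin, and a full-scale bin.
example = np.array([0.0, 0.1, 1.0])
compressed = compress_dynamic_range(example)
```

Clipping at a floor before taking the logarithm avoids negative infinity for silent bins, and the fixed output range tends to suit autoencoder training.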
