February 24, 2023

Voice Cloning with AI

Alexander Wei
Voice Conversion of Seen Speakers

Here are some sample audio clips produced using the Keras-AutoVC voice conversion autoencoder.

Code

My implementation uses Keras and is available on GitHub.

Many-to-many conversion between seen speakers

[Audio samples: two conversion examples, each with a Source Speaker clip, a Target Speaker clip, and the resulting Conversion]

Samples from each speaker are cropped into two-second segments and transformed into mel spectrograms. Over the course of training, the model learns to transfer speech content between speakers. The model objective is to transfer content independently of style.
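To make the preprocessing step concrete, here is a minimal sketch of cropping a waveform into two-second segments and computing a mel spectrogram for each one. It uses plain NumPy rather than the code in my repository, and everything except the two-second segment length is an assumption chosen for illustration: the 16 kHz sampling rate, the FFT size, the hop length, the 80 mel bands, and the function names.

```python
import numpy as np

SAMPLE_RATE = 16_000   # assumed sampling rate in Hz (not stated in this post)
SEGMENT_SECONDS = 2.0  # segment length described in this post
N_FFT = 1024           # assumed FFT size
HOP = 256              # assumed hop length in samples
N_MELS = 80            # assumed number of mel bands


def crop_segments(wave, sr=SAMPLE_RATE, seconds=SEGMENT_SECONDS):
    """Split a 1-D waveform into non-overlapping segments, dropping the remainder."""
    seg_len = int(sr * seconds)
    n = len(wave) // seg_len
    return wave[: n * seg_len].reshape(n, seg_len)


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lo, mid, hi = edges[i : i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mel_spectrogram(segment, sr=SAMPLE_RATE, n_fft=N_FFT, hop=HOP, n_mels=N_MELS):
    """Hann-windowed magnitude STFT projected onto mel filters, shape (frames, n_mels)."""
    fb = mel_filterbank(sr, n_fft, n_mels)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(segment) - n_fft) // hop
    frames = np.stack(
        [segment[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return magnitude @ fb.T


# Example: five seconds of a 440 Hz tone gives two full two-second segments.
t = np.arange(5 * SAMPLE_RATE) / SAMPLE_RATE
segments = crop_segments(np.sin(2 * np.pi * 440.0 * t))
mels = np.stack([mel_spectrogram(s) for s in segments])
print(segments.shape, mels.shape)
```

With these assumed settings, each two-second segment becomes a 122-frame by 80-band array. A library such as librosa would normally handle the STFT and filterbank, so treat this as an illustration of the shapes involved.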

Parameters for cleaning and dynamic range compression of audio samples were determined using CSVTK, my toolkit for compression, cleaning, and visualization of mel spectrograms.
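For readers unfamiliar with dynamic range compression of spectrograms, one common form is sketched below: clamp magnitudes to a floor, convert to decibels, and rescale to the range 0 to 1. This is not CSVTK's API. The function name and the `min_db` and `ref_db` parameters are hypothetical, and the default values are placeholders, not the parameters I settled on.

```python
import numpy as np


def compress_dynamic_range(mel, min_db=-100.0, ref_db=0.0):
    """Map mel magnitudes to [0, 1] on a decibel scale.

    min_db and ref_db are hypothetical tuning parameters for this sketch:
    magnitudes at or below min_db map to 0, magnitudes at or above ref_db map to 1.
    """
    floor = 10.0 ** (min_db / 20.0)               # linear magnitude at min_db
    db = 20.0 * np.log10(np.maximum(mel, floor))  # clamp so silent bins stay finite
    db = db - ref_db                              # shift so ref_db sits at 0 dB
    return np.clip((db - min_db) / -min_db, 0.0, 1.0)


# Example: silence, a quiet bin, a reference-level bin, and a loud bin.
print(compress_dynamic_range(np.array([0.0, 0.1, 1.0, 10.0])))
```

The floor keeps silent bins from producing negative infinity, and the final clip bounds the network's input range.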


Alexander Wei, B.A. M.S.

Mathematics and Applied Mathematics
