The Magic of Synthetic Data

Javier Marin
4 min read · Dec 12, 2022

How GANs Can Help You Generate Artificial Data that is Representative of Your Original Data

Photo by Michael Dziedzic on Unsplash

Synthetic data generation, often discussed alongside data augmentation, is the process of creating new, artificial data that is statistically similar to the original data. One common approach uses generative adversarial networks (GANs), which consist of two neural networks trained in competition with each other. The first network, the “generator,” turns random noise into candidate samples that resemble the training data, while the second network, the “discriminator,” tries to tell which samples are real and which are synthetic. As each network improves against the other, the generator learns to produce synthetic data that is representative of the original data. This can be useful in a variety of applications, such as increasing the size of a small dataset for analysis and improving the performance of machine learning models trained on it.
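To make the generator-versus-discriminator loop concrete, here is a deliberately tiny sketch in plain Python. It is an illustration under strong assumptions, not a production method: the data is one-dimensional, the "generator" is a single linear map of noise, and the "discriminator" is a one-feature logistic model. The names `train_toy_gan` and `sample` are hypothetical, and real GANs use neural networks for both roles, usually built with a deep learning library. A model this simple can at best learn to shift its samples toward the real data, and like any GAN it may oscillate rather than settle.

```python
import math
import random


def sigmoid(t):
    # Numerically safe logistic function, always returns a value in (0, 1).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(real_data, steps=2000, lr=0.01, seed=0):
    """Alternate discriminator and generator updates on 1-D data (toy sketch)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: g(z) = a * z + b, with z drawn from N(0, 1)
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c)

    for _ in range(steps):
        x_real = rng.choice(real_data)
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b

        # Discriminator step: push D(x_real) toward 1 and D(x_fake) toward 0.
        # Loss: -log D(x_real) - log(1 - D(x_fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w -= lr * (-(1.0 - d_real) * x_real + d_fake * x_fake)
        c -= lr * (-(1.0 - d_real) + d_fake)

        # Generator step: push D(x_fake) toward 1.
        # Loss: -log D(g(z)) (the common "non-saturating" variant).
        d_fake = sigmoid(w * x_fake + c)
        grad_x = -(1.0 - d_fake) * w  # gradient of the loss w.r.t. x_fake
        a -= lr * grad_x * z
        b -= lr * grad_x

    return (a, b), (w, c)


def sample(generator, n, seed=1):
    """Draw n synthetic values from the trained generator parameters."""
    a, b = generator
    rng = random.Random(seed)
    return [a * rng.gauss(0.0, 1.0) + b for _ in range(n)]
```

A possible way to try it: build a small "real" dataset, for example `real = [random.gauss(4.0, 0.5) for _ in range(200)]`, call `generator, discriminator = train_toy_gan(real)`, and then compare `sample(generator, 10)` with the original values. Note that the generator never sees the real rows directly; it only receives feedback through the discriminator.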

Some answers

I have noticed that whenever someone talks about synthetic data (or rather, data augmentation), someone always asks how a larger dataset can be created from just a few rows.

Is it magic, or are we inventing the data? Well, we could say that both are somewhat true. It’s magical because generative adversarial networks (GANs) are amazing. And there is some truth in the…

Javier Marin

Experienced technology leader with proven track record of using cutting-edge AI technologies to drive business success and innovation.