Platform overview

BlueGen generates synthetic data that mimics your real data while preserving privacy, unlocking the full potential of your data.

Data Synthesizer

BlueGen's core capability is generating synthetic data that mimics real data while guaranteeing privacy. The platform uses AI to learn from real tabular data and generate new data with the same statistical distributions, business rules, and referential integrity.
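Referential integrity means every foreign key in a child table points to an existing key in its parent table. As a minimal sketch of how that property could be checked on generated output, assuming tables are represented as lists of Python dicts (the function name and the `customers`/`orders` tables are hypothetical, not part of BlueGen), the following finds orphaned child rows:

```python
from typing import Any

Row = dict[str, Any]


def orphaned_foreign_keys(
    parent_rows: list[Row], parent_key: str, child_rows: list[Row], foreign_key: str
) -> list[Row]:
    """Return child rows whose non-null foreign key matches no parent key.

    Hypothetical helper for illustration; a NULL (None) foreign key is treated as allowed.
    """
    parent_keys = {row[parent_key] for row in parent_rows}
    return [
        row
        for row in child_rows
        if row[foreign_key] is not None and row[foreign_key] not in parent_keys
    ]


# Usage with two hypothetical synthetic tables.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # points at a customer that does not exist
    {"order_id": 12, "customer_id": None},  # nullable foreign key, not an orphan
]
print(orphaned_foreign_keys(customers, "customer_id", orders, "customer_id"))
# prints [{'order_id': 11, 'customer_id': 3}]
```

An empty result means every non-null foreign key in the child table resolves to a row in its parent table.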

Differential Privacy

BlueGen applies differential privacy, a mathematical definition of privacy, to generate data with provable privacy guarantees that still looks and behaves like the original data.
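Formally, a randomized mechanism M is ε-differentially private if, for any two datasets D and D' that differ in one record and any set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S]. The textbook way to achieve this for a numeric query is the Laplace mechanism: add noise drawn from Laplace(0, Δ/ε), where Δ is the query's sensitivity. The sketch below applies it to a counting query (Δ = 1). It is a generic illustration of the definition, not a description of how BlueGen applies differential privacy inside its generative models, and the function names are hypothetical.

```python
import random
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(
    values: Iterable[T], predicate: Callable[[T], bool], epsilon: float, rng: random.Random
) -> float:
    """Epsilon-DP count via the Laplace mechanism (hypothetical helper).

    Adding or removing one record changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 41, 58, 62, 29, 47]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
print(f"noisy count of people aged 40+: {noisy:.2f}")
```

A smaller ε gives a larger noise scale and a stronger privacy guarantee, at the cost of accuracy; each individual answer is noisy, but the noise is centered on the true count.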

Data Augmentation

BlueGen can augment datasets by generating new data points from existing data, for example to provide additional training data that improves the accuracy of ML models, or to create edge cases for software testing. Augmentation also works in reverse: BlueGen can generate a smaller subset of the real data when a small dataset is sufficient.
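To make the two directions concrete, here is a deliberately simple sketch under stated assumptions: rows are Python dicts, new rows are made by resampling an existing row and adding Gaussian jitter to its numeric fields, and the reverse direction is a random subsample. This naive technique is swapped in for illustration only and is not how BlueGen generates data; a learned generative model samples new rows from the learned distribution rather than perturbing copies of real ones, which on its own offers no privacy protection. The function names are hypothetical.

```python
import random
from typing import Any

Row = dict[str, Any]


def oversample(rows: list[Row], n_new: int, noise_scale: float, rng: random.Random) -> list[Row]:
    """Return the original rows plus n_new jittered copies (hypothetical helper)."""
    new_rows = []
    for _ in range(n_new):
        base = rng.choice(rows)
        new_row = {}
        for key, value in base.items():
            # bool is a subclass of int, so exclude it explicitly from jittering.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                new_row[key] = value + rng.gauss(0.0, noise_scale)
            else:
                new_row[key] = value  # categorical values are copied unchanged
        new_rows.append(new_row)
    return rows + new_rows


def subsample(rows: list[Row], k: int, rng: random.Random) -> list[Row]:
    """Return k distinct rows chosen uniformly at random (the 'reverse' direction)."""
    return rng.sample(rows, k)


rng = random.Random(42)
data = [
    {"amount": 120.0, "channel": "web"},
    {"amount": 80.5, "channel": "store"},
    {"amount": 310.0, "channel": "web"},
]
bigger = oversample(data, n_new=5, noise_scale=2.0, rng=rng)
smaller = subsample(data, k=2, rng=rng)
print(len(bigger), len(smaller))  # prints 8 2
```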

Federated Data Generation

BlueGen's federated learning framework allows multiple parties to collaborate on generating synthetic data without exposing their real data. Only the structure and characteristics of each data source are shared, so more diverse synthetic datasets can be generated safely and efficiently at a larger scale.
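One way to see how sharing only "characteristics" can still be useful is sufficient statistics: each party reduces a numeric column to a count, a sum, and a sum of squares, and a coordinator merges these into the global mean and variance without ever receiving a row. The sketch below assumes that simple protocol; it is not BlueGen's documented federated framework, the class and party names are hypothetical, and summary statistics alone are not a formal privacy guarantee.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ColumnSummary:
    """Per-party sufficient statistics for one numeric column (hypothetical)."""

    count: int
    total: float
    total_sq: float

    @staticmethod
    def from_values(values: Sequence[float]) -> "ColumnSummary":
        # Computed locally; only these three numbers leave the party's premises.
        return ColumnSummary(len(values), float(sum(values)), float(sum(v * v for v in values)))

    def merge(self, other: "ColumnSummary") -> "ColumnSummary":
        # Counts, sums, and squared sums combine exactly across parties.
        return ColumnSummary(
            self.count + other.count,
            self.total + other.total,
            self.total_sq + other.total_sq,
        )

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def variance(self) -> float:
        # Population variance: E[x^2] - (E[x])^2.
        m = self.mean
        return self.total_sq / self.count - m * m


party_a = ColumnSummary.from_values([1.0, 2.0, 3.0])
party_b = ColumnSummary.from_values([4.0, 5.0])
combined = party_a.merge(party_b)
print(combined.mean, combined.variance)  # prints 3.0 2.0
```

The E[x^2] - (E[x])^2 form is easy to read but can lose precision for large values; Chan et al.'s parallel algorithm, which merges means and sums of squared deviations, is the more numerically stable choice.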

Cloud Service and On-Premise

BlueGen can run on-premise in your data center or in your private cloud. Alternatively, the BlueGen platform and the computing capacity it requires can be used as a cloud service, with a local (browser) agent connecting through the federated learning framework. In that setup the data remains on-premise, because the agent sends only the structure and characteristics of the data to the cloud platform.


BlueGen is built to handle incomplete datasets with missing values, complex data distributions, and high-dimensional categorical columns commonly found in real-world scenarios.
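Handling such data starts with knowing where it is incomplete and how many categories each column has. A minimal profiling sketch, assuming rows are Python dicts in which a missing value is either an absent key or None (the function name and sample rows are hypothetical), reports per column an inferred type, the fraction of missing values, and the number of distinct values. The profile contains no individual rows; BlueGen's own internal representation is not described here.

```python
from typing import Any

Row = dict[str, Any]


def profile_columns(rows: list[Row]) -> dict[str, dict[str, Any]]:
    """Summarize each column's type, missing fraction, and cardinality (hypothetical helper)."""
    columns = sorted({key for row in rows for key in row})
    profile: dict[str, dict[str, Any]] = {}
    for column in columns:
        values = [row.get(column) for row in rows]  # absent keys count as missing
        present = [v for v in values if v is not None]
        is_numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        profile[column] = {
            "type": "numeric" if is_numeric else "categorical",
            "missing_fraction": (len(values) - len(present)) / len(values),
            "distinct": len(set(present)),
        }
    return profile


rows = [
    {"age": 34, "city": "Utrecht"},
    {"age": None, "city": "Delft"},
    {"age": 51, "city": "Utrecht"},
    {"age": 28},
]
for name, stats in profile_columns(rows).items():
    print(name, stats)
```

A high `distinct` count relative to the number of rows flags a high-cardinality categorical column, and a non-zero `missing_fraction` flags a column the synthesizer must model as incomplete.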

Data Connectors and Integrations

In addition to CSV files, BlueGen can connect directly to databases to generate synthetic data. Supported databases include MS SQL Server, Oracle, PostgreSQL, MySQL, and SQLite, and connectors are available for SAP HANA, Snowflake, and AWS Redshift. Through its Command Line Interface (CLI), BlueGen can also be integrated directly into data engineering and CI/CD pipelines.

According to Gartner, by 2025:

"The use of synthetic data will reduce the volume of real data needed for machine learning by 70%."
"Synthetic data will reduce personal customer data collection, avoiding 70% of privacy violation sanctions."

Privacy awards