Accelerate your medical research, analyze patient data, and eliminate privacy risks

Quality of healthcare.

Patient data play a critical role in improving healthcare and public health. Timely access to these data helps researchers develop new treatments, better policies, and better treatment protocols. It also gives them more ways to evaluate policies and treatments scientifically.

The need for such access is growing with the rise of personalized healthcare, which requires analyzing patient data at a more detailed, individual level.


Still, access to most patient data in healthcare is subject to many restrictions. These restrictions hold back the development, innovation, and efficient implementation of new research, products, services, and systems.

Sharing real patient data is often very difficult because of privacy regulations and ethical concerns. The United States, for example, has the Health Insurance Portability and Accountability Act (HIPAA), and the European Union has the General Data Protection Regulation (GDPR).

Use cases

Simulations and predictions

Better assessment of policy options

Simulation and prediction research requires large amounts of data to predict behaviors and outcomes accurately. Real-world sources, such as data held by statistical agencies, are often inaccessible to most researchers. Synthetic data, however, can be generated from real-world datasets and used to replace or supplement them. Researchers can increase sample sizes and add variables that are missing from the original set, which makes it easier to assess the impact of policy options.
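To make the idea of enlarging a sample concrete, here is a minimal sketch using a smoothed bootstrap: rows are resampled with replacement and each value is jittered with a small amount of Gaussian noise. This technique is swapped in purely for illustration and is not necessarily how any particular synthetic data product works. The function name `smoothed_bootstrap`, the `noise_scale` parameter, and the generated "real" dataset are all assumptions. Resampling with light noise does not by itself provide a privacy guarantee.

```python
import random
import statistics


def smoothed_bootstrap(rows, n_samples, noise_scale=0.1, seed=0):
    """Resample rows with replacement and jitter each value.

    Each column gets Gaussian noise with a standard deviation of
    noise_scale times that column's sample standard deviation.
    """
    rng = random.Random(seed)
    columns = list(zip(*rows))
    sds = [statistics.stdev(col) for col in columns]
    synthetic = []
    for _ in range(n_samples):
        base = rng.choice(rows)  # pick one original row at random
        synthetic.append(
            tuple(v + rng.gauss(0.0, noise_scale * sd) for v, sd in zip(base, sds))
        )
    return synthetic


# Hypothetical stand-in for a real dataset: (age, systolic blood pressure).
data_rng = random.Random(42)
real = []
for _ in range(200):
    age = data_rng.gauss(55, 12)
    sbp = 100 + 0.5 * age + data_rng.gauss(0, 8)
    real.append((age, sbp))

# Increase the sample size from 200 to 1000 rows.
synthetic = smoothed_bootstrap(real, n_samples=1000)

real_mean_age = statistics.fmean([row[0] for row in real])
synthetic_mean_age = statistics.fmean([row[0] for row in synthetic])
print(len(synthetic), round(real_mean_age, 1), round(synthetic_mean_age, 1))
```

Because every synthetic row starts from a complete original row, the relationship between the columns is roughly preserved. A production approach would typically fit an explicit statistical or machine learning model instead of resampling.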

Clinical research

Innovation and research for pharmaceutical companies

Clinical studies can be lengthy and extremely expensive, and processing and testing the data takes a lot of time. Developing a new medication requires collecting a large amount of patient data to measure its impact. The same information is needed for pricing and justification. The real data available are often clustered and inadequate, so datasets need to be combined, which is often not possible. Synthetic data offers a solution here.

Algorithms, hypotheses, and methods

Increase model efficiency and accuracy

Researchers analyze variables, assess the feasibility of data sets, and test hypotheses. Synthetic data provides an additional level of validation, which can benefit the techniques and testing methods behind machine learning algorithms. Combining synthetic data, public records, and real data makes it possible to verify the robustness, efficiency, and accuracy of an algorithm.

Epidemiology and public health research

Test sets for algorithm improvement

Datasets for epidemiology and public health research are often limited in size, which raises concerns about quality requirements, reporting procedures, and privacy. Because these data are owned by third parties, they are also costly to obtain. Synthesized datasets make data more readily available. They support researchers in real-time computational epidemiology and sensitivity analyses, and they make it possible to build more comprehensive test sets to improve disease detection algorithms.

Testing and demonstrating Electronic Patient Record (EPR) systems

Faster development cycle, saving cost, time and labor

Software testing is costly and labor-intensive. It accounts for between 30 and 40 percent of the development life cycle. Real patient data is sometimes used to run the tests, with all the risks that entails. With synthetic data, testers still have a data set that is realistic yet privacy-secure. This accelerates the development cycle while saving cost, time, and labor.

Release of datasets

Maintain analytical value for open research

Health datasets released for public use should retain their value for analysis, and the confidentiality of the records should be maintained in the process. However, even after microdata have been de-identified, the potential for re-identification remains. Releasing partially synthesized data limits the risk of disclosure while still allowing valid inferences to be drawn from the data.
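As an illustration of partial synthesis, the sketch below replaces only one potentially identifying field (age) with values drawn from a simple model, and leaves the other fields untouched. It assumes a normal distribution fitted per diagnosis group, which is a deliberately simple stand-in for the richer models used in practice. The function name `partially_synthesize`, the field names, and the records are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def partially_synthesize(records, sensitive_key, group_key, seed=0):
    """Return copies of records with one field replaced by model draws.

    The replacement value comes from a normal distribution fitted to the
    sensitive field within each group (an assumption for illustration).
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[sensitive_key])

    # Fit mean and standard deviation per group; a single-record group gets sd 0.
    params = {
        group: (statistics.fmean(vals), statistics.stdev(vals) if len(vals) > 1 else 0.0)
        for group, vals in groups.items()
    }

    released = []
    for rec in records:
        mean, sd = params[rec[group_key]]
        new_rec = dict(rec)  # copy so the original record is not modified
        new_rec[sensitive_key] = round(rng.gauss(mean, sd))
        released.append(new_rec)
    return released


# Hypothetical microdata.
records = [
    {"diagnosis": "diabetes", "age": 61, "hba1c": 7.9},
    {"diagnosis": "diabetes", "age": 58, "hba1c": 8.4},
    {"diagnosis": "diabetes", "age": 70, "hba1c": 7.2},
    {"diagnosis": "asthma", "age": 34, "hba1c": 5.3},
    {"diagnosis": "asthma", "age": 29, "hba1c": 5.1},
    {"diagnosis": "asthma", "age": 41, "hba1c": 5.6},
]

released = partially_synthesize(records, sensitive_key="age", group_key="diagnosis")
for rec in released:
    print(rec)
```

Only the age field changes, so analyses that rely on the untouched fields give the same results as on the original data, while the synthesized field no longer reports each patient's exact value. Whether this lowers disclosure risk enough depends on the dataset and should be assessed separately.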