Privacy Enhancing Aspects of Synthetic Data with Wim Kees

For a complete audio recording and transcript of this podcast, see this link:

In this podcast, Paul Starrett, founder of PrivacyLabs, interviews Wim Kees Janssen, the founder and CEO of Syntho, a company specializing in synthetic data with a focus on privacy-related benefits. Wim explains that the inspiration for Syntho came from facing data privacy challenges in software development, where legal contracts, risk assessments, and energy-draining processes hindered data-driven innovation. The conflict between organizations' ambitions for data-driven innovation and strong privacy regulations like GDPR and CCPA led to the creation of Syntho to bridge the gap and solve the privacy dilemma.

Synthetic data is generated by a computer algorithm, in contrast to original data, which is often sensitive and is collected or generated through interactions with clients and internal processes. Syntho uses machine learning to capture the statistical properties and patterns of the original data, then generates synthetic data that reproduces them. The main advantage is that organizations can reduce their use of original sensitive data, which lowers the risk of data breaches and unlocks data otherwise restricted by privacy regulations.
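To make the "capture statistics, then generate" idea concrete, here is a minimal Python sketch. It is not Syntho's method: it swaps in a simple two-column Gaussian model for the machine learning described in the podcast, and the column names (age, income), the helper names `fit_gaussian` and `sample_synthetic`, and the toy dataset are all assumptions made for illustration. A model this simple makes no formal privacy guarantee.

```python
import math
import random
import statistics


def fit_gaussian(rows):
    """Capture summary statistics of two numeric columns:
    their means, standard deviations, and correlation."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    std_x, std_y = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (len(rows) - 1)
    return {
        "mean": (mean_x, mean_y),
        "std": (std_x, std_y),
        "corr": cov / (std_x * std_y),
    }


def sample_synthetic(model, n, rng):
    """Draw n new rows from a bivariate Gaussian with the fitted statistics.
    No original row is copied; only the fitted parameters are used."""
    mean_x, mean_y = model["mean"]
    std_x, std_y = model["std"]
    rho = model["corr"]
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mean_x + std_x * z1
        y = mean_y + std_y * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


# Hypothetical "original" data: (age, income) pairs with a positive relationship.
rng = random.Random(0)
ages = [rng.gauss(40, 10) for _ in range(500)]
original = [(age, 1000 * age + rng.gauss(0, 8000)) for age in ages]

model = fit_gaussian(original)
synthetic = sample_synthetic(model, 5000, rng)

# Refitting on the synthetic rows should give statistics close to the original ones.
refit = fit_gaussian(synthetic)
```

Comparing `model` and `refit` shows the synthetic rows reproducing the means and the age-income correlation of the original rows. Real tabular generators have to handle many columns, categorical values, and non-Gaussian shapes, which is where machine learning comes in.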

Paul discusses the potential trade-off between privacy protection and data quality with synthetic data, which can affect a machine learning model's performance. Wim explains that a sufficient amount of original data is essential for generating high-quality synthetic data. He adds that operational use cases such as sending invoices should still rely on original data, while analytics, data science modeling, and proofs of concept can benefit significantly from synthetic data.

They also highlight how synthetic data can enhance agile development processes by providing fast and risk-free access to representative data, enabling open innovation within organizations and data sharing with third parties. Synthetic data allows for data democratization and the potential for monetization through sharing or selling the data. Wim concludes that synthetic data has immense value in accelerating data-driven innovation and AI applications.

Overall, synthetic data is gaining traction: Gartner predicts that by 2024, 60% of the data used to develop AI and analytics projects will be synthetically generated. However, organizations still need to be educated about the concept, and other challenges must be overcome, before widespread adoption occurs.

The podcast also touches on Wim's background, starting in economics and finance and transitioning to software development before venturing into privacy tech with Syntho. The interview ends with Wim providing his contact information for those interested in learning more.
