MEET OUR CODE_N CONTEST FINALISTS 2018: Statice from Germany
Are you ready for GDPR? The new EU-wide data protection regulation, which took effect in May 2018, has fundamentally changed the way companies handle, source, and distribute data. Berlin-based startup Statice has a solution for the new challenges that are emerging, such as collaboration on customer data, which has become nearly impossible. The team develops automatic data anonymization software that allows companies and their partners to easily leverage and process existing or new personal data while still preserving privacy. Curious how that works? Get all the insights from our conversation with CEO and co-founder Sebastian Weyer.
Lisa: Hi Sebastian, what is Statice all about? What are you trying to solve?
Sebastian: Statice’s mission is to enable data-driven collaboration between organizations, unlock undiscovered insights, solve business problems, and accelerate product development. With personal data becoming scarcer under the General Data Protection Regulation (GDPR), Statice is an enabler of data-driven innovation. Statice unlocks sensitive customer data for companies while protecting consumers by securely anonymizing this data. It is automatic data anonymization software that generates entirely anonymous synthetic data, and this synthetic data preserves the structure and utility of the original data. Therefore, unlike traditional anonymization technologies and products, Statice enables highly complex secondary data use cases, such as building performant machine-learning models on anonymous data. The Statice software can be used on-premises or in the cloud and allows for enterprise-grade processing of big data. The more data available, the better Statice works. Statice’s vision is to be the primary privacy-preserving data hub for data-driven collaboration across companies.
Lisa: How did you come up with the idea?
Sebastian: In our growing customer-centric economy, innovation begins with understanding people. Companies do this partly by measuring, tracking, and storing personal data on individuals in order to quantify personal preferences, and they use this knowledge to tailor experiences and products to each customer. Personal data is a core pillar of modern services and products, serving as a key resource for the majority of modern technological advances and discoveries. This holds not only for scientific settings, but also for corporate R&D. At Statice, we believe in two major trends. First, data privacy is becoming an increasingly important asset for companies that want to build trust. Second, innovation will build on collaboration across players, with data as a main resource. Because anonymous data is exempt from data privacy regulations, we decided to build Statice as an enabler of both trends, protecting individuals on the one hand and empowering data-driven innovation and collaboration on the other.
Lisa: What are you trying to solve?
Sebastian: Truly anonymizing data is difficult, and doing so in a manner that preserves privacy takes time, resources, and significant domain expertise. Traditional anonymization technologies usually have one of two problems: either they don’t protect data sufficiently, or they change data so much that it’s barely usable for many use cases. So even if the generation of truly anonymous data is successful, it often comes at the cost of a significant loss in data utility, rendering the anonymous dataset useless. This is why we built Statice. Statice makes anonymizing data easy while maintaining data utility and data granularity. By leveraging recent advances in machine learning and state-of-the-art privacy techniques, Statice enables companies to release highly granular datasets with no risk of identifying a single individual. We empower companies to open up their new synthetic data in a GDPR-compliant manner for product development, training new machine learning algorithms, and unlocking industry-wide insights, whether internally or collaboratively with partners. This allows for complex analysis on newly generated anonymous data.
Lisa: Can data truly be anonymized? What do people most often get wrong about data anonymization?
Sebastian: Two things have been, and still are, most often gotten wrong in anonymization. First, there is a common misconception that there is data that is “personally identifiable” and data that is not. This used to be the case when datasets were small, both in terms of the number of individuals and the number of attributes stored per individual. Early techniques simply suppressed uniquely identifying information such as name, phone number, or social security number. But it has been shown that a combination of zip code, sex, and birth date can uniquely identify an estimated 87% of the American population. With datasets becoming more multi-dimensional, all attributes, without exception, are potentially personal identifiers and must be protected. The second misconception concerns wrong or overly optimistic assumptions about attacks. Widely used techniques such as k-anonymity and its variants assume attack scenarios in which the only resource an attacker has is the released data. The reality is that attackers may have access to a number of auxiliary datasets, which they can use alongside the released “anonymous” dataset in order to re-identify the individuals in it. Any privacy mechanism aiming to achieve true anonymization should not make any assumptions about the attackers and must consider all data as sensitive. What we have built at Statice, a data synthesis mechanism based on differential privacy and plausible deniability, meets precisely these two requirements.
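Editor's note: the two misconceptions Sebastian describes can be illustrated with a small sketch. It is not Statice's method or code. The records, field names, and the helper functions `group_sizes`, `unique_fraction`, `k_anonymity`, and `link` are all hypothetical and made up for illustration. The sketch shows that a dataset with names removed can still contain records that are unique on zip code, sex, and birth date, and that an attacker holding an auxiliary list with the same attributes can link those records back to a name.

```python
from collections import Counter, defaultdict

def key(record, quasi_identifiers):
    """Tuple of the record's values for the chosen quasi-identifiers."""
    return tuple(record[q] for q in quasi_identifiers)

def group_sizes(records, quasi_identifiers):
    """Count how many records share each quasi-identifier combination."""
    return Counter(key(r, quasi_identifiers) for r in records)

def unique_fraction(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination occurs only once."""
    sizes = group_sizes(records, quasi_identifiers)
    unique = sum(1 for r in records if sizes[key(r, quasi_identifiers)] == 1)
    return unique / len(records)

def k_anonymity(records, quasi_identifiers):
    """Smallest group size; the data is k-anonymous for this value of k."""
    return min(group_sizes(records, quasi_identifiers).values())

def link(released, auxiliary, quasi_identifiers):
    """Linkage attack sketch: map each auxiliary name to a released record
    when exactly one released record matches its quasi-identifiers."""
    index = defaultdict(list)
    for r in released:
        index[key(r, quasi_identifiers)].append(r)
    matches = {}
    for person in auxiliary:
        candidates = index.get(key(person, quasi_identifiers), [])
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]
    return matches

# Made-up "released" data: names suppressed, other attributes kept.
records = [
    {"zip": "10115", "sex": "F", "birth_date": "1985-03-02", "diagnosis": "A"},
    {"zip": "10115", "sex": "F", "birth_date": "1985-03-02", "diagnosis": "B"},
    {"zip": "10115", "sex": "M", "birth_date": "1990-07-14", "diagnosis": "A"},
    {"zip": "10117", "sex": "F", "birth_date": "1985-03-02", "diagnosis": "C"},
]
# Made-up auxiliary data an attacker might hold (for example, a public register).
aux = [{"name": "Alice (hypothetical)", "zip": "10117", "sex": "F",
        "birth_date": "1985-03-02"}]

qi = ["zip", "sex", "birth_date"]
print(unique_fraction(records, qi))  # 0.5: half the records are unique
print(k_anonymity(records, qi))      # 1: no protection from group size
print(link(records, aux, qi))        # Alice is linked to diagnosis "C"
```

In this toy dataset, suppressing the name protects nothing for the unique records. Generalizing attributes until every group has at least k members addresses only the first problem, and it still relies on assumptions about what the attacker knows, which is the limitation Sebastian points out.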
Lisa: Thank you for the interview, Sebastian!