According to a paper, successfully anonymising data is practically impossible for any complex dataset. Here, an interesting article of Alex Hern (https://www.theguardian.com/profile/alex-hern)
Data can be deanonymised in a number of ways. In 2008, an anonymised Netflix dataset of film ratings was deanonymised by comparing the ratings with public scores on the IMDb film website in 2014; the home addresses of New York taxi drivers were uncovered from an anonymous data set of individual trips in the city; and an attempt by Australia’s health department to offer anonymous medical billing data could be reidentified by cross-referencing “mundane facts” such as the year of birth for older mothers and their children, or for mothers with many children.
Now researchers from Belgium’s Université catholique de Louvain (UCLouvain) and Imperial College London have built a model to estimate how easy it would be to deanonymise any arbitrary dataset. A dataset with 15 demographic attributes, for instance, “would render 99.98% of people in Massachusetts unique”. And for smaller populations, it gets easier: if town-level location data is included, for instance, “it would not take much to reidentify people living in Harwich Port, Massachusetts, a city of fewer than 2,000 inhabitants”.
At Espereal Technologies, we choose the approach of not to get any personal data at all, for our technologies of intelligent crowd sensing and tourist assistance.