Theoretical basics for privacy-aware analysis of mobile data 1/2: Which data do operators have? What are the limitations?
Ernst Versteeg, Swisscom Ltd

In the analysis of anonymous mobile data, two factors are mission-critical: the data from mobile networks and the location calculation. Without data, and without taking the boundary conditions into account, every analysis is doomed to fail. This first of two talks focuses on the data from mobile networks: which sources are available and which boundary conditions must be considered. We show the different data formats Swisscom can deliver and the experience we have gained with them so far. Currently we deliver anonymous data to TomTom for the online calculation of traffic speeds on all major roads in Switzerland. We are also running trials with Swiss Poster Research and other interested organizations.

Theoretical basics for privacy-aware analysis of mobile data 2/2: Where are they really? Insights into the Devil's Kitchen of positioning in mobile networks
Ernst Versteeg, Swisscom Ltd

In the analysis of anonymous mobile data, two factors are mission-critical: the data from mobile networks and the location calculation. If the location calculation is incorrect, or if the location areas are not as small as possible and do not achieve a hit rate of at least 95%, any analysis built on that data is doomed to fail, regardless of how much effort goes into it. This second of two talks therefore focuses on location calculation: the methods used and the massive limitations we found in the commercial location systems available today. We present the solution we developed and discuss what we consider achievable today in the best case. Finally, we show the difference between theory and practice.
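
The two quality criteria named above, small location areas and a hit rate of at least 95%, can be checked with a few lines of code. The following is a minimal Python sketch, assuming that each computed location area is modelled as a circle (centre and radius in metres on a local planar grid) and that a ground-truth position is known for each fix, e.g. from test drives. The circular model and all names (`LocationArea`, `hit_rate`, `mean_area_km2`) are illustrative assumptions, not Swisscom's method.

```python
import math
from dataclasses import dataclass


@dataclass
class LocationArea:
    # Hypothetical circular location area on a local planar grid (metres).
    x: float
    y: float
    radius: float

    def contains(self, px: float, py: float) -> bool:
        return math.hypot(px - self.x, py - self.y) <= self.radius


def hit_rate(areas: list[LocationArea], true_positions: list[tuple[float, float]]) -> float:
    """Fraction of true positions that fall inside their computed location area."""
    if len(areas) != len(true_positions):
        raise ValueError("need exactly one true position per computed area")
    hits = sum(a.contains(px, py) for a, (px, py) in zip(areas, true_positions))
    return hits / len(areas)


def mean_area_km2(areas: list[LocationArea]) -> float:
    """Average size of the location areas in km^2 (smaller is better at a fixed hit rate)."""
    return sum(math.pi * a.radius ** 2 for a in areas) / len(areas) / 1e6


# Made-up example: four computed areas and the positions where the phones really were.
areas = [LocationArea(0, 0, 500), LocationArea(1000, 0, 300),
         LocationArea(0, 2000, 800), LocationArea(500, 500, 200)]
truth = [(100, 200), (1250, 100), (0, 2900), (550, 480)]

rate = hit_rate(areas, truth)
print(f"hit rate: {rate:.2f}, meets 95% target: {rate >= 0.95}")
print(f"mean area: {mean_area_km2(areas):.3f} km^2")
```

With these made-up numbers the third position lies 900 m from its area's centre, outside the 800 m radius, so the hit rate is 0.75. Reporting the mean area alongside the hit rate reflects the tradeoff in the abstract: inflating the radii raises the hit rate but makes the areas less useful.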

Unveiling the complexity of human mobility by querying and mining massive trajectory data
Prof. Dr. Dino Pedreschi, KDD Lab, ISTI-CNR, Pisa University

The technologies of mobile communications pervade our society, and wireless networks sense the movement of people and vehicles, generating large volumes of mobility data such as mobile phone call records and GPS tracks. In this work, we illustrate the analytical power of massive collections of trajectory data sensed at fine spatio-temporal resolution, for example by vehicular GPS devices, in unveiling the complexity of urban mobility. We highlight the results of a large-scale experiment based on a real-life GPS dataset obtained from 17,000 private cars with on-board GPS receivers, tracked during one week of ordinary mobile activity in the metropolitan area of Milan, Italy. We show how a comprehensive atlas of urban mobility, the M-Atlas, can be created that reveals relevant mobility behaviors such as commuting trips, frequently followed itineraries and extraordinary events. We also describe how GSM data can be used within our methodology. This analysis is made possible by the M-Atlas system, an integrated platform for the analysis of mobility data. The system combines spatio-temporal querying capabilities with data mining, thus providing full support for the Mobility Knowledge Discovery process. M-Atlas has been designed around a core of models and algorithms for trajectory data mining and analysis, including trajectory pattern mining, trajectory clustering according to various similarity notions, and trajectory classification and prediction.
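
Trajectory clustering depends on a notion of similarity between trajectories. The sketch below illustrates one simple notion and is not the algorithm used in M-Atlas: each trajectory is resampled to a fixed number of points evenly spaced along its length, two trajectories are compared by the mean distance between corresponding points, and trajectories linked by a chain of distances of at most `eps` share a cluster. Coordinates are assumed to be planar metres; the function names and parameters (`resample`, `mean_distance`, `cluster`, `eps`, `n`) are hypothetical.

```python
import math

Point = tuple[float, float]


def resample(traj: list[Point], n: int) -> list[Point]:
    """Resample a polyline to n points spaced evenly along its length."""
    if n < 2:
        raise ValueError("n must be at least 2")
    seg = [math.dist(traj[i], traj[i + 1]) for i in range(len(traj) - 1)]
    total = sum(seg)
    if total == 0:  # single point or stationary trajectory
        return [traj[0]] * n
    out = []
    for k in range(n):
        target = total * k / (n - 1)  # arc length of the k-th sample
        i, acc = 0, 0.0
        # Walk forward to the segment that contains the target arc length.
        while i < len(seg) - 1 and acc + seg[i] < target:
            acc += seg[i]
            i += 1
        t = 0.0 if seg[i] == 0 else min(1.0, (target - acc) / seg[i])
        (x0, y0), (x1, y1) = traj[i], traj[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def mean_distance(a: list[Point], b: list[Point]) -> float:
    """Average distance between corresponding points of two equally resampled trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def cluster(trajs: list[list[Point]], eps: float, n: int = 20) -> list[int]:
    """Single-linkage grouping: trajectories connected by a chain of
    distances <= eps receive the same cluster label."""
    rs = [resample(t, n) for t in trajs]
    labels = [-1] * len(trajs)
    current = 0
    for start in range(len(trajs)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(trajs)):
                if labels[j] == -1 and mean_distance(rs[i], rs[j]) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


trips = [
    [(0, 0), (1000, 0), (2000, 0)],  # eastbound commute
    [(0, 50), (2000, 60)],           # same corridor, slightly offset
    [(0, 0), (0, 2000)],             # northbound trip
]
print(cluster(trips, eps=100))  # → [0, 0, 1]
```

Single linkage keeps the sketch short; a density-based method would be more robust against a few noisy trips chaining otherwise unrelated groups together.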

Towards nation-wide models of mobile behavior using GSM data
Dr. Michael May, Fraunhofer IAIS

Many applications in industry operate nation-wide and therefore require national mobility models. However, most models of mobile behavior and most data mining algorithms are limited in spatial extent or do not scale to the national level. We present an approach that aims at developing nation-wide mobility models by fusing different sources of mobility information. Starting with traffic count data at selected locations, we develop a nation-wide traffic frequency map. In a second step, we enrich this map with GPS data in order to model the spatial correlation of movements and to provide socio-demographic information. In addition, we extend our model to individual objects, for example train stations, in order to depict the mobility within them. The next step towards a realistic mobility model is refinement in time: we use GSM data to form a dynamic distribution of traffic frequencies. Nation-wide mobility models pose several challenges; perhaps the most difficult is fusing the different data sources into a coherent picture of reality.

Temporal Unmixing - a new method for the time series analysis of GSM radio network performance measurements
Dr. Stephan Schädlich, Vodafone D2 GmbH

With Temporal Unmixing we analyse performance measurements that are carried out automatically within our GSM radio network (e.g. measurements of radio traffic volume). Our goal is to derive information about radio cells that allows, for example, a classification according to the different types of usage within a cell (living, working, shopping, traffic, ...). Example: the weekly course of the radio traffic of a cell that is mainly affected by car traffic shows high values from Monday to Friday, with peaks in the morning and evening (rush hours). Estimating the fractions of different usage types within a GSM radio cell may be relevant for mobility research questions or geomarketing applications (e.g. the evaluation of shop locations).
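
The abstract does not spell out the algorithm. By analogy with linear spectral unmixing in remote sensing, one plausible reading is that a cell's weekly traffic profile is modelled as a non-negative mixture of reference profiles, one per usage type, whose fractions sum to one. The following Python/NumPy sketch estimates such fractions by projected gradient descent onto the probability simplex. The reference profiles, the four daily time slots and all function names (`project_to_simplex`, `unmix`, `weekly`) are invented for illustration and are assumptions, not Vodafone's data or method; with real measurements the profiles would also need to be normalised to a comparable scale.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {f : f >= 0, sum(f) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / ks > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def unmix(profiles: np.ndarray, observed: np.ndarray, iters: int = 2000) -> np.ndarray:
    """Fractions f >= 0 with sum(f) = 1 minimising ||profiles @ f - observed||^2.

    profiles: (T, m) matrix, one reference weekly profile per usage type (column).
    observed: (T,) weekly traffic profile of a single cell.
    """
    m = profiles.shape[1]
    f = np.full(m, 1.0 / m)  # start from equal fractions
    step = 1.0 / (2.0 * np.linalg.norm(profiles, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        grad = 2.0 * profiles.T @ (profiles @ f - observed)
        f = project_to_simplex(f - step * grad)
    return f


def weekly(weekday: list[float], weekend: list[float]) -> np.ndarray:
    """Hypothetical profile: 4 daily slots (morning, midday, evening, night), Mon-Fri then Sat-Sun."""
    return np.array(weekday * 5 + weekend * 2, dtype=float)


traffic = weekly([1.0, 0.3, 1.0, 0.1], [0.2, 0.3, 0.2, 0.1])  # rush-hour peaks on weekdays
working = weekly([0.3, 1.0, 0.2, 0.0], [0.0, 0.1, 0.0, 0.0])  # office hours
living = weekly([0.3, 0.2, 0.8, 0.6], [0.5, 0.7, 0.8, 0.6])   # evenings and weekends
E = np.column_stack([traffic, working, living])

cell = E @ np.array([0.6, 0.3, 0.1])  # synthetic cell dominated by road traffic
print(np.round(unmix(E, cell), 3))
```

For this noise-free synthetic cell the estimate should recover the mixing fractions 0.6, 0.3 and 0.1 up to numerical precision. With real measurements, the size of the residual `profiles @ f - observed` would indicate how well the chosen usage types explain a cell.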

Privacy-by-design for mobility data mining
Prof. Dr. Dino Pedreschi, KDD Lab, ISTI-CNR, Pisa University

Big data on human mobility opens up a challenging scenario for data analysis and mining. On the one hand, exciting opportunities arise for discovering new knowledge about human mobile behavior, thus fueling intelligent info-mobility applications. On the other hand, new privacy concerns arise when mobility data are published. The risk is particularly high for GPS trajectories, which represent movement at very high precision and spatio-temporal resolution: de-identifying such trajectories (i.e., removing the IDs of their owners) is only weak protection, as it is generally possible to re-identify a person by observing her routine movements. We propose a method for achieving real anonymity in a dataset of published trajectories by defining a transformation of the original GPS trajectories based on spatial generalization and k-anonymity. The proposed method offers a formal data protection safeguard, quantified as a theoretical upper bound on the probability of re-identification, while preserving the data utility for a class of analytical tasks, namely cluster analysis. We conduct a study on a real-life GPS trajectory dataset and provide strong empirical evidence that the proposed anonymity techniques achieve the conflicting goals of data utility and data privacy; moreover, the method can easily be extended to GSM data.
Our work is an instance of the “privacy by design” principle in the domain of data analysis: if the analytical process is designed with assumptions about i) the (sensitive) personal data that are the subject of the analysis, ii) the adversarial model, i.e., the knowledge and purpose of a malicious party interested in discovering the sensitive data of certain individuals (or institutions), and iii) the target analytical questions that are to be answered with the data, then it is conceivable to design a privacy-preserving analytical process able to: 1) transform the source data into an anonymous or obfuscated version with a quantifiable privacy guarantee, i.e., the probability that the malicious attack fails, and 2) guarantee that the target analytical questions can be answered correctly, within a quantifiable approximation that specifies the data utility, using the transformed data instead of the original ones.
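
The idea of combining spatial generalization with k-anonymity can be illustrated with a deliberately simplified Python sketch. Here, generalization maps each GPS point to a square grid cell and merges consecutive repeats; only generalized trajectories shared by at least `k` individuals are published, without owner IDs, and the rest are suppressed. Under a linkage attack in which the adversary knows a target's full trajectory, each published record is then indistinguishable among at least `k` records, so the re-identification probability is bounded by 1/k. The uniform grid, the suppression strategy and all names (`generalize`, `k_anonymize`, `max_reid_probability`, `cell_size`) are assumptions for illustration; the method described in the talk may use a different generalization.

```python
from collections import Counter, defaultdict

Point = tuple[float, float]
Cell = tuple[int, int]


def generalize(traj: list[Point], cell_size: float) -> tuple[Cell, ...]:
    """Replace each GPS point by its grid cell and merge consecutive repeats."""
    cells: list[Cell] = []
    for x, y in traj:
        c = (int(x // cell_size), int(y // cell_size))
        if not cells or cells[-1] != c:
            cells.append(c)
    return tuple(cells)


def k_anonymize(trajs: dict[str, list[Point]], k: int, cell_size: float) -> list[tuple[Cell, ...]]:
    """Publish generalized trajectories shared by at least k owners; suppress the rest."""
    groups: dict[tuple[Cell, ...], list[str]] = defaultdict(list)
    for owner, traj in trajs.items():
        groups[generalize(traj, cell_size)].append(owner)
    published = []
    for gen, owners in groups.items():
        if len(owners) >= k:
            published.extend([gen] * len(owners))  # owner IDs are dropped
    return published


def max_reid_probability(published: list[tuple[Cell, ...]]) -> float:
    """Worst-case chance of linking a known trajectory to its published record."""
    counts = Counter(published)
    return 1 / min(counts.values()) if counts else 0.0


# Toy data in planar metres: three trips along one corridor, one trip elsewhere.
trajs = {
    "a": [(10, 10), (120, 15), (250, 20)],
    "b": [(30, 40), (140, 60), (260, 80)],
    "c": [(50, 20), (180, 30), (220, 90)],
    "d": [(10, 300), (20, 450)],
}
published = k_anonymize(trajs, k=3, cell_size=100)
print(len(published), max_reid_probability(published))
```

In this toy dataset the three corridor trips generalize to the same cell sequence and are published as three identical records, while the single trip in another area is suppressed, so the worst-case re-identification probability is 1/3. Coarser cells raise the group sizes (more privacy) at the cost of spatial detail (less utility), which is the tradeoff the abstract quantifies.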

Future applications of mobile phone data and digital signage
Dr. Hans-Jörg Müller, Deutsche Telekom AG Laboratories

This talk gives an outlook on future applications of digital signage that may become possible using mobile phone data. Such applications begin with simple forms of audience measurement. They can be extended to estimating reach, tailoring content to audiences, and even predicting where the audiences of digital signage will go next. Beyond content tailoring, this enables, for example, continuous storytelling along audience paths. The talk gives an overview of current research in this direction in pervasive computing and discusses the tradeoff between the utility of such applications and privacy.