Official statistics and mobile network operators: a business model for partnerships

Keywords: mobile phone data, big data, official statistics, mobile network operator, business model, partnership.

1. Introduction

Recent changes in society and technology have given rise to ‘big data’, the explosive increase in huge volumes of unstructured data being created continuously by sensors and cameras, satellites, machine-to-machine communication, e-business, electronic payments and withdrawals, all kinds of internet activity, social media and so on. This data deluge contains valuable information which data owners seek to exploit for their own operational and commercial purposes. It can also serve, however, as a source for new types of official statistics that are available in real time, are more objective and open up phenomena inaccessible until now.

Mobile phone data constitute an especially promising big data source with specific characteristics. Seen from the core business of network operators, i.e. running a mobile network and billing its customers, the different types of records generated are ‘exhaust’, a by-product which needs considerable prior investment to turn it into data which can be exploited. It is far from obvious that operators, private profit-driven companies, should want to make this investment just for the sake of statistics or that they can be compelled to do so. They may be willing, however, to enter into a mutually beneficial partnership.

Before mobile phone data can be used for any purpose, numerous issues need to be tackled: getting to know the many different data types and their characteristics; technical and legal (privacy) constraints; storing and handling massive data volumes; protecting data from competitors and ensuring confidentiality; ensuring validity, accuracy and reliability and guarding against selectivity and bias; and finally, guaranteeing stable access over time, ultimately leading to viable, sustainable and affordable arrangements between data owners, statistical institutes and other partners.

The present paper, jointly written by representatives of statistical institutes and mobile network operators, presents a collaboration project in Belgium to explore the questions listed above and sketches the first outlines of a mutually beneficial long-term partnership between private companies and public bodies equally contributing and profiting.

2. A collaboration project

In December 2015 Statistics Belgium, Eurostat and Belgium’s leading[1] network operator Proximus launched a joint project to assess the information content and possible uses of the Proximus mobile phone data. Because of its exploratory nature, the scope of the project was deliberately left open at the start. But as there was a clear ambition to present first results by April 2016, concrete research objectives were formulated. The project group, soon joined by the European Commission’s Joint Research Centre (JRC), assembled 11 specialists covering all possible angles and contributing technical, statistical, data warehouse, IT, GIS, business and domain expertise.

The project was designed to follow a logical and incremental order, starting with the estimation of actual population, to be followed by resident population (establishing usual place of residence), determining place of work and all other aspects of the ‘usual environment’, finally leading to the possibility of measuring commuting, tourism, short-term and longer-term labour mobility and migration. In order to avoid all privacy issues and the delay these might cause, the first studies used only aggregated data, without any tracking of individual mobile devices.

The first overall analysis, of actual and resident population and population density, was innovative in several important ways. First of all, unlike most previous studies, which locate a device in space and time through call detail records (CDRs, needed for billing whenever a mobile phone is used), the present study used all network signalling events, which are about 10 times more frequent than CDRs. The analysis also combined mobile phone data with statistical datasets, making it possible to validate one against the other and to create totally new information. In order to do so, the geographical units of mobile phone data (the area covered by each of Proximus’ 11,000 mobile network cells) had to be converted to the standard 1 km² area of statistical datasets, and vice versa (much higher precision, using triangulated coordinates, is foreseen in future studies).
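The paper does not specify how network cell areas were converted to 1 km² grid squares. As a minimal sketch of one common approach, the Python example below reallocates device counts in proportion to the share of each cell's area that falls in each grid square (simple area weighting). The function name, the identifiers and all figures are hypothetical and chosen only for illustration; the overlap shares are assumed to have been computed beforehand with a GIS tool.

```python
def cells_to_grid(cell_counts, overlap_share):
    """Reallocate device counts from network cells to grid squares.

    cell_counts:   dict mapping cell id -> number of devices counted in the cell
    overlap_share: dict mapping cell id -> {grid id: share of the cell's area
                   lying in that grid square}; shares per cell should sum to 1
    Assumption: devices are spread uniformly over the area of each cell.
    """
    grid_counts = {}
    for cell_id, count in cell_counts.items():
        for grid_id, share in overlap_share[cell_id].items():
            grid_counts[grid_id] = grid_counts.get(grid_id, 0.0) + count * share
    return grid_counts


# Hypothetical example: two cells overlapping two 1 km² grid squares.
cell_counts = {"cell_A": 120, "cell_B": 80}
overlap_share = {
    "cell_A": {"grid_1": 0.75, "grid_2": 0.25},
    "cell_B": {"grid_2": 1.0},
}
grid_counts = cells_to_grid(cell_counts, overlap_share)
print(grid_counts)
```

The reverse conversion (grid squares to cells) follows the same pattern with the roles of the two geographies swapped. Because the shares of each cell sum to 1, the total number of devices is preserved.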

First results were completed and published as foreseen [1]. They can be highlighted by the graph below showing population densities per km² derived from mobile phone counts at 4 am on Thursday 8 October (left) and the 2011 population census (right). The Pearson correlation between these two datasets is 0.85, a clear indication that mobile phone data are able to provide a valid and accurate measure of population density.

Figure 1. Belgium’s population density per km² based on Proximus mobile phone data (left) and 2011 Census (right).
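For readers unfamiliar with the measure, the Pearson correlation reported above compares two density values per grid square. The Python sketch below computes it from first principles; the density values are invented for illustration and are not the project's data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical densities per km² for five grid squares.
mobile_density = [10, 200, 1500, 40, 650]
census_density = [12, 180, 1700, 35, 600]
r = pearson(mobile_density, census_density)
print(round(r, 3))
```

A value close to 1 means that grid squares with many mobile devices at night are also those with many census residents, which is the sense in which the two sources validate each other.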

The project is ongoing and now focuses on the addition of spatiotemporal datasets (e.g. land use, urban versus rural areas) to filter out error variation and thus hopefully further increase correlations, and on new mobile phone datasets addressing new questions. Several other papers have been published or are in press.

3. The emerging business model

Mobile phone data can be exploited by network operators for their operational and commercial purposes, and by statistical institutes to create official statistics. More and more operators are waking up to the fact that data created while running communication services constitute a strategic asset that can be transformed into business opportunities and real value, if the right information can be extracted and converted into marketable products meeting the demands of users. The statistical system is currently going through a similar process in its attempts to better serve user needs, increase efficiency and reduce the burden on respondents. Both face the common challenge of converting large amounts of data into focused information, e.g. extracting information on behaviour rather than asking persons about it, or relating information to persons instead of mobile devices.

For statistical institutes, mobile phone data are part of a paradigm shift known as the ‘third data revolution’, in which statistics will be based increasingly on big data, replacing at least in part the sources previously in use: surveys, used since the beginning of official statistics in the early 19th century, and administrative datasets such as population, tax or land registers, exploited from the end of the 20th century onwards.

The use of mobile phone data is expected to result in faster and even real-time statistics, at a much more detailed geographical and temporal level, with nearly complete coverage, eliminating response bias, at a lower cost and without the need to burden citizens and enterprises. Moreover, these data offer an entry point in near-real time to phenomena inaccessible until now (e.g., actual present as opposed to registered population, detailed commuting patterns by day of the week, weather conditions, etc.).

However, access to and use of these data by official statistics is hampered by several factors. Statistical institutes cannot readily access the data and therefore have limited knowledge of their characteristics; they lack the IT infrastructure for handling such large volumes; and the legal arrangements to regulate access and modalities of use are inadequate. Moreover, compelling network operators to provide access is problematic because mobile phone data are not just lying about, ready to be used by official statistics. On the contrary, significant prior investment is needed to transform network signals and events into exploitable data.

Fortunately, most network operators are now aware they also need these data for their own purposes and have started investing in setting up the appropriate infrastructure. Their first use is network optimisation which saves a lot of money, but another important consideration is the huge commercial potential of the data. The initial investment, however, is not the only challenge operators are facing when they start to commercialise their data.

Operators face the problem that extracting valid and accurate information from data requires specialised skills and experience which as a rule they lack, while these are a core competence of statistical institutes. The one billion registrations Proximus collects every day from clients do not automatically produce valid and usable information, whereas a much smaller sample may do so. Another drawback for operators is that they possess only mobile phone data, with limitations that soon become apparent, and lack further contextual information.

This situation, where both statistical institutes and network operators see huge potential benefits in exploiting mobile phone data but lack important resources for doing so, provides an ideal foundation for mutually advantageous cooperation. A closer look at the objectives of network operators and statistical institutes shows they are non-competing: statistical institutes are a priori interested in larger populations and general developments for informing society in general and supporting policies, while operators want to sell information on specific situations to specific clients. At the same time, the assets each can contribute are complementary.

The advantages of cooperation for official statistics are obvious. First of all, without access to the data there simply can be no statistics. But apart from this, operators can also provide other invaluable inputs such as metadata, technical expertise, data storage and processing infrastructure, and use cases that increase knowledge of the data. Statistical offices can contribute statistical competency, specifically when inferring from specific observations to the total population, and a strict commitment to quality and its various components, such as relevance, accuracy, coherence, timeliness and completeness. For this purpose they can rely on existing data at a more detailed level than the published ones, using them to add meaning to the mobile phone data, to provide a quality benchmark or, by integrating them, to increase the quality of the final product.

Mobile network operators also have a lot to gain. Most importantly, by blending tailored geocoded statistical datasets obtained from statistical institutes with their own data, they can drastically increase the commercial value of the latter. They also benefit from the statistical and domain expertise of statistical institutes in putting different data sources in context and applying a statistical approach when extracting valid and reliable information from raw data. Furthermore, collaborating with official statisticians imparts a quality stamp on the datasets the operators furnish to their commercial clients. And, last but not least, it provides an opportunity to bolster their reputation as socially responsible corporations perceived to contribute freely to the public good.

4. Conclusions

The collaboration project of Statistics Belgium, Eurostat and Proximus has proven the feasibility of a business model in which mobile network operators and statistical institutes both invest in a cooperation to better achieve their respective and non-competing commercial and statistical objectives. Ideally this public-private partnership should lead to a long-term arrangement allowing a statistical institute to produce statistics wholly or partly based on mobile phone data in a sustainable way, while the mobile network operator providing the data obtains additional datasets and statistical support enhancing its data commercialisation business.

The alternatives to a voluntary arrangement present serious flaws and shortcomings. Doing it alone is obviously impossible for official statistics, but also suboptimal for operators. An external integrator collecting and combining mobile phone data will be unable to guarantee impartiality and other quality standards to which statistical institutes have to adhere by law. Finally, a legal compulsion of network operators does not exist and will be very hard to impose in a reasonable way, given the large investment needed to create and store mobile phone data.

References

[1] F. De Meersman, G. Seynaeve, M. Debusschere, P. Lusyne, P. Dewitte, Y. Baeyens, A. Wirthmann, C. Demunter, F. Reis and H.I. Reuter, Assessing the Quality of Mobile Phone Data as a Source of Statistics, Q2016 Quality Conference paper (2016).

[1] 40.3% market share in 2012.