Data Generation in HOBBIT using TomTom’s Mimicking Algorithm

The number of users of navigation services continues to grow, whether through a vehicle’s built-in unit, a dedicated device or a smartphone application. This enables the collection of large amounts of floating car data, which can be used to extract information relevant to applications such as road administration, traffic management and jam-avoiding routing services, among many others. However, collecting a sufficiently large dataset comes with difficulties such as cost and privacy concerns. At TomTom we take the user’s right to privacy very seriously, so we developed a Synthetic Trace Generator [Figure 1], which can create an arbitrary quantity of data from a few statistical descriptions of the traffic. More specifically, it generates a desired number of synthetic individual traces, where a trace is the list of positions recorded by one device (phone, car, etc.) throughout one day.

Figure 1: Data required, and produced, by the Synthetic Trace Generator

The generator uses probability distributions for variables such as the start and end locations of trips, their starting time and the device’s update frequency. Using parameters sampled from these distributions, a map is then used to find an appropriate route for the trip, and successive points are generated at a regular time interval with typical speeds for each road, as shown in Figure 2.

Figure 2: A segment of a generated trace
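The point-generation step described above can be illustrated with a minimal sketch. It is not TomTom's implementation: the `Segment` type, the `sample_trip_params` and `generate_fixes` functions, the chosen distributions and the linear interpolation between segment endpoints are all assumptions made for illustration. The route is assumed to have already been found on a map.

```python
import random
from dataclasses import dataclass


@dataclass
class Segment:
    """One road segment of a route (hypothetical structure)."""
    start: tuple      # (lat, lon) where the segment begins
    end: tuple        # (lat, lon) where the segment ends
    length_m: float   # segment length in metres
    speed_mps: float  # typical speed on this road, in metres per second


def sample_trip_params(rng):
    """Sample a start time (seconds since midnight) and an update interval.

    The distributions are placeholders, not the ones used by the generator.
    """
    start_time_s = rng.gauss(8 * 3600, 3600) % 86400
    interval_s = rng.choice([1, 5, 10])
    return start_time_s, interval_s


def generate_fixes(route, start_time_s, interval_s):
    """Emit (time, lat, lon) fixes at a regular time interval along the route."""
    fixes = []
    elapsed = 0.0      # travel time from trip start to the current segment's start
    next_sample = 0.0  # time of the next fix, relative to trip start
    for seg in route:
        duration = seg.length_m / seg.speed_mps
        while next_sample <= elapsed + duration:
            # Fraction of the segment covered at the sample time.
            frac = (next_sample - elapsed) / duration
            lat = seg.start[0] + frac * (seg.end[0] - seg.start[0])
            lon = seg.start[1] + frac * (seg.end[1] - seg.start[1])
            fixes.append((start_time_s + next_sample, lat, lon))
            next_sample += interval_s
        elapsed += duration
    return fixes


# Usage: a two-segment route with different typical speeds.
route = [
    Segment((0.0, 0.0), (0.0, 1.0), length_m=100.0, speed_mps=10.0),
    Segment((0.0, 1.0), (1.0, 1.0), length_m=100.0, speed_mps=20.0),
]
demo_fixes = generate_fixes(route, start_time_s=0, interval_s=5)
```

Because the interval is fixed in time rather than in distance, fixes lie further apart on faster roads, which is what gives a trace its characteristic spacing.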

The low accuracy of most GPS devices used for navigation results in traces whose individual points show jittery behaviour, or noise. The generator also simulates this noise to make the data more realistic. Although the process makes use of random generators, a seed can be provided to guarantee that the same set of traces is re-generated. Every generated trace is stored in an RDF, KML or CSV file.
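As a rough illustration of seeded noise, the sketch below perturbs each position with independent Gaussian offsets. The Gaussian model, the `add_gps_noise` name and the `sigma_deg` value are assumptions; the post does not specify the noise model the generator actually uses.

```python
import random


def add_gps_noise(fixes, sigma_deg=0.00005, seed=None):
    """Return a copy of the fixes with jitter added to latitude and longitude.

    fixes: list of (time_s, lat, lon) tuples.
    sigma_deg: standard deviation of the jitter in degrees (assumed value).
    seed: when given, the same noisy trace is reproduced on every call.
    """
    rng = random.Random(seed)  # private generator, so the seed fully fixes the output
    noisy = []
    for t, lat, lon in fixes:
        noisy.append((t, lat + rng.gauss(0, sigma_deg), lon + rng.gauss(0, sigma_deg)))
    return noisy


# Usage: the same seed re-generates exactly the same noisy trace.
clean = [(0, 52.5200, 13.4050), (5, 52.5201, 13.4052), (10, 52.5202, 13.4054)]
noisy_a = add_gps_noise(clean, seed=42)
noisy_b = add_gps_noise(clean, seed=42)
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the result independent of any other random draws made elsewhere in the program.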

As an example of the data that can be generated, Figure 3 shows how traffic changes during an ordinary day. It was built using 150 thousand synthetic traces in the region of Berlin, Germany, whose individual fixes (pairs of position and time) were grouped into 10-minute buckets.

Figure 3: An animation of traces generated for a typical day in the Berlin region. The current hour of day is shown at the top, the latitude on the left margin and the longitude at the bottom
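Grouping fixes into 10-minute buckets, as done for the animation, could look like the sketch below. The `bucket_fixes` name and the representation of a fix as a `(time_s, lat, lon)` tuple, with time in seconds since midnight, are assumptions for illustration.

```python
from collections import defaultdict

BUCKET_S = 600  # 10 minutes, expressed in seconds


def bucket_fixes(fixes, bucket_s=BUCKET_S):
    """Group (time_s, lat, lon) fixes by the time bucket they fall into.

    Returns a dict mapping bucket index (0 = first 10 minutes of the day)
    to the list of (lat, lon) positions recorded in that bucket.
    """
    buckets = defaultdict(list)
    for t, lat, lon in fixes:
        buckets[int(t // bucket_s)].append((lat, lon))
    return dict(buckets)


# Usage: four fixes spread over one day.
sample = [
    (0, 52.50, 13.40),
    (599, 52.51, 13.41),
    (600, 52.52, 13.42),
    (86399, 52.53, 13.43),
]
grouped = bucket_fixes(sample)
```

With this bucket size a full day has 144 buckets, and each one would correspond to one frame of an animation like the one above.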

This generator is based on previous work published by Konrad Bösche et al. [1], which contains detailed explanations of the entire process as well as a validation of the results.


[1] Bösche K., Sellam T., Pirk H., Beier R., Mieth P., Manegold S. (2013) Scalable Generation of Synthetic GPS Traces with Real-Life Data Characteristics. In: Nambiar R., Poess M. (eds) Selected Topics in Performance Evaluation and Benchmarking. TPCTC 2012. Lecture Notes in Computer Science, vol 7755. Springer, Berlin, Heidelberg

