John Collins, co-founder and chief product officer at data vendor Thasos, spoke to HFM about the evolution, complexity and various use cases for geolocation data at hedge funds.
Some firms perceive geolocation data as inherently complex to work with. What do you think is behind this perception?
To answer that question, it might be useful to understand the general geolocation data operating model.
We license the raw data from a couple of different types of producers or owners of that data. One is the application developers or publishers themselves. These are the people and companies that build mobile phone apps. We have one-to-one contracts with them and we receive the location data only on an anonymised basis directly from those apps.
The data consists of four fundamental fields – a latitude coordinate and longitude coordinate to determine position in space, an anonymous and persistent device identifier and a timestamp representing the time when that event occurred. So that’s the format in which we receive the data – essentially a set of lat-longs. We get that information directly from application publishers, aggregators, SDK providers and several other sources.
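As a rough illustration of the four-field record described above, a single location event could be modelled as follows. The class name, field names and sample values are hypothetical and are not taken from Thasos or any particular data feed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LocationEvent:
    """One raw geolocation observation (hypothetical schema)."""
    latitude: float      # degrees, position in space
    longitude: float     # degrees, position in space
    device_id: str       # anonymous but persistent identifier
    timestamp: datetime  # when the event occurred


# Made-up sample record for illustration only.
event = LocationEvent(
    latitude=40.7580,
    longitude=-73.9855,
    device_id="device-0001",
    timestamp=datetime(2019, 6, 1, 12, 30, tzinfo=timezone.utc),
)
```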
What some vendors will do is associate all the lat-longs with a point of interest and assume that’s enough. They’ll tag it to a ticker and send it to the [client]. The problem with that approach becomes obvious when the source – let’s say a popular navigation app – has been growing its user base. Say there are twice as many people using it this year as there were last year. If I simply associate all the lat-longs with a location of interest, say, and send that to the funds, what the funds are going to find is that visitation of [the location in question] this year is approximately twice what it was last year. But that doesn’t make any sense and isn’t true to what’s occurring.
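The effect Collins describes can be shown with a little arithmetic. The sketch below is not Thasos's methodology. It is a minimal illustration, with made-up numbers and hypothetical function names, of how raw visit counts double when the panel doubles, while a simple visits-per-device rate stays flat.

```python
def naive_growth(visits_last: float, visits_this: float) -> float:
    """Year-over-year growth computed from raw visit counts only."""
    return visits_this / visits_last - 1.0


def panel_adjusted_growth(
    visits_last: float, panel_last: float,
    visits_this: float, panel_this: float,
) -> float:
    """Year-over-year growth of visits per device in the panel."""
    rate_last = visits_last / panel_last
    rate_this = visits_this / panel_this
    return rate_this / rate_last - 1.0


# Hypothetical figures: the app's user base doubles, and real-world
# behaviour at the location does not change at all.
naive = naive_growth(visits_last=1_000, visits_this=2_000)
adjusted = panel_adjusted_growth(
    visits_last=1_000, panel_last=50_000,
    visits_this=2_000, panel_this=100_000,
)

print(f"naive growth:          {naive:.0%}")
print(f"panel-adjusted growth: {adjusted:.0%}")
```

Dividing by panel size is only the simplest possible correction. It assumes every device is equally likely to be observed, which is not generally true of real panels.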
This example represents what we call a change in the panel. Such changes occur all the time. Note that even if you fix the panel, so no users are added or dropped, the probability that we observe those users also changes over time. For example, if last year the navigation app collected lat-longs every 30 seconds and this year it does so every 30 minutes, then the probability we observe someone at, say, a fast-food restaurant has been dramatically reduced.
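To put rough numbers on the sampling example, assume the app pings at a fixed interval and that a visit begins at a random moment relative to those pings. Under those assumptions, the chance of catching at least one ping during a visit is the visit length divided by the ping interval, capped at 1. The 10-minute visit length and the function name below are hypothetical.

```python
def observation_probability(visit_minutes: float, ping_interval_minutes: float) -> float:
    """Probability that at least one ping falls inside a visit.

    Assumes pings at a fixed interval and a visit start time that is
    uniformly random relative to the ping schedule.
    """
    if visit_minutes <= 0 or ping_interval_minutes <= 0:
        raise ValueError("durations must be positive")
    return min(1.0, visit_minutes / ping_interval_minutes)


# Hypothetical 10-minute stop at a fast-food restaurant.
p_last_year = observation_probability(10, 0.5)   # ping every 30 seconds
p_this_year = observation_probability(10, 30)    # ping every 30 minutes

print(f"last year: {p_last_year:.2f}")
print(f"this year: {p_this_year:.2f}")
```

With these figures the same visit goes from always observed to observed about one time in three, even though nothing about the panel or the visitor has changed.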
Solving for these problems and related biases that are inherent to geolocation data is extremely challenging – hence its reputation for being difficult to work with.