5:45pm

Sat April 23, 2011
All Tech Considered

Sifting Through An iPhone's Geo Data, Row By Row

On Wednesday, researchers announced that iPhones and iPads regularly record time and location information. Much has been written since then, but Zach Brand, NPR's senior director of technology, takes a look at the data recorded by his own iPhone and explores the more than 44,000 entries showing when and where he has been.

After Alasdair Allan and Pete Warden announced their findings at a recent conference, many folks have explored what it means. Several have described why they think it is not a big deal. Congress has requested information from Apple, though last year Apple did provide a statement on how geo information was collected. One thing to note is that Apple made the point that anybody can stop the collection of geo data by disabling location services.

I decided to take a look at the data on my personal iPhone 3GS running the latest operating system, iOS 4. After following the process described by Allan and Warden to look at the data myself, I was shocked to discover 44,477 rows of information about where (and when) my phone has been. But what was in this data?

The data itself is contained in an inconspicuous database file that is saved to your computer when you back up your iPhone or iPad during a sync. The file is located in a deep directory structure with obfuscated naming, so you won't find it unless you know where to look and specifically seek it out.

This database file is not encrypted by default, but it would only exist on a computer that you use to back up your device through iTunes. It is a SQLite database that contains 40 different tables.

Most of the focus has been on a table called 'CellLocation' within a database named consolidated.db. While some tables were empty, other tables also contained location data. These include the WifiLocation and CellLocationLocal tables. WifiLocation is worth noting because, in my case, it had the most data (111,751 rows). It has likely garnered less attention because its location data is likely GeoIP based, which means it is less precise. Entries in the WifiLocation table that were from my home were wildly inaccurate, while an entry from the Las Vegas Convention Center was spot on. A final note on the WifiLocation table is that it captures MAC addresses, not IP addresses.
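For readers who want to see which tables hold data in their own copy of the file, here is a minimal sketch in Python using the standard sqlite3 module. It is not the script used for this article. The function name `table_row_counts` is hypothetical, and the demo runs against a small stand-in database with placeholder values rather than a real backup.

```python
import sqlite3

def table_row_counts(conn):
    """Return {table name: row count} for every table in a SQLite database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    counts = {}
    for name in names:
        # Identifiers cannot be bound as parameters, so quote the name instead.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(
            "SELECT COUNT(*) FROM " + quoted).fetchone()[0]
    return counts

# Stand-in database so the sketch runs on its own (placeholder values only).
# With a real backup, pass the path of your copy of the file to
# sqlite3.connect() instead of ":memory:".
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE CellLocation (Timestamp REAL, Latitude REAL, Longitude REAL)")
demo.execute("CREATE TABLE WifiLocation (MAC TEXT, Timestamp REAL)")
demo.execute("INSERT INTO CellLocation VALUES (299972365, 0, 0)")

# Print tables from most rows to fewest.
for table, rows in sorted(table_row_counts(demo).items(), key=lambda kv: -kv[1]):
    print(table, rows)
```

Working on a copy of the database file, rather than the one inside the backup folder, avoids any risk of altering the backup.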

The CellLocation table has 14 different columns. They are:

  • rowid: A unique ID
  • MCC: Mobile Country Code
  • MNC: Mobile Network Code
  • LAC: Location Area Code
  • CI: The Cell ID or in this case presumably the Cell Tower ID
  • Timestamp: Time in seconds since Jan 1, 2001. Most values actually have sub-second precision.
  • Latitude & Longitude: Location information
  • Horizontal Accuracy: Potential inaccuracy in meters for Latitude and Longitude. The location's latitude and longitude identify the center of a circle of probable location, and this value indicates the radius of that circle. A negative value indicates that the location's latitude and longitude are invalid.
  • Altitude & Vertical Accuracy: Vertical location, and the accuracy of the altitude value in meters.
  • Speed: Speed in meters per second; however, it was not captured in any of my entries.
  • Course: Direction of travel; also not captured in any of my entries.
  • Confidence: Not documented, but it appears to vary based on the method used to obtain Latitude/Longitude. For example, a typical value I found in CellLocation is 70, while WifiLocation's confidence values were typically 50. I interpret this to mean the location based on cell tower telemetry has a higher confidence than the Wi-Fi location, which is presumably GeoIP based.

In an example row of data I examined, it shows the following:

MCC = 310 (USA) and MNC = 410 (AT&T), which make sense.

LAC = 11990, a particular area identified within AT&T's network.

The timestamp value of 299972365 translates to 4th of July, 2010 at 5:39 pm.
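The conversion can be checked with a few lines of Python. This is a minimal sketch, assuming the Jan 1, 2001 reference point is midnight UTC and that the 5:39 pm reading is Eastern Daylight Time (UTC-4); the names `EPOCH_2001` and `to_datetime` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Reference date from the column description; midnight UTC is an assumption.
EPOCH_2001 = datetime(2001, 1, 1, tzinfo=timezone.utc)

def to_datetime(seconds_since_2001, utc_offset_hours=0):
    """Convert a Timestamp value to a datetime at the given UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return (EPOCH_2001 + timedelta(seconds=seconds_since_2001)).astimezone(tz)

print(to_datetime(299972365))       # 2010-07-04 21:39:25+00:00
print(to_datetime(299972365, -4))   # 2010-07-04 17:39:25-04:00
```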

The latitude and longitude values put me near Penn Station in New York City. A Horizontal Accuracy of 500 meters means I was presumably somewhere within a 500-meter radius of the location identified. An Altitude of 0 with a VerticalAccuracy of -1 means the altitude value is not valid. A Speed of -1 and a Course of -1 mean no speed or course information was captured. Finally, the row shows a confidence of 70.
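The reading of the example row above can be expressed as a small helper. This is only a sketch: the function name `interpret_quality` and the output keys are hypothetical, the dictionary keys assume the column names are written without spaces (as in VerticalAccuracy), and the latitude and longitude are left out because the article does not give them.

```python
def interpret_quality(row):
    """Replace negative sentinel values with None so they are not read as data."""
    def valid(value):
        return value if value >= 0 else None
    return {
        "radius_m": valid(row["HorizontalAccuracy"]),
        # Altitude is only meaningful when its accuracy is non-negative.
        "altitude_m": row["Altitude"] if row["VerticalAccuracy"] >= 0 else None,
        "speed": valid(row["Speed"]),
        "course": valid(row["Course"]),
        "confidence": row["Confidence"],
    }

# Quality fields from the example row described in the article.
example = {"HorizontalAccuracy": 500, "Altitude": 0, "VerticalAccuracy": -1,
           "Speed": -1, "Course": -1, "Confidence": 70}
print(interpret_quality(example))
```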

This all fits in general with my recollection of traversing the city as I made my way to the fireworks.

For each of the 44,477 rows of data on my phone, there is one 'value' per column, as in the example above. Because my first entry was from June 25, 2010, and the last entry was from April 21, 2011, I was looking at 301 days of information. However, I quickly discovered there are many days with no entries at all. In fact, more than 40 percent of the days (125 of 301) have zero entries! Further, many duplicative measurements occur within the same second: at 11:51:18 PM on Jan. 20, 2011, my phone captured a record 176 entries, all with slight variations in location. It is unclear from my data what triggers the recording of information, but it would seem to be event-driven as opposed to recording at timed intervals.
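Counts like these can be derived from the Timestamp column alone. The sketch below is an illustration, not the script used for this article: the function name `daily_summary` is hypothetical, days are bucketed in UTC for simplicity (local time would shift some entries across midnight), and the demo uses a handful of synthetic timestamps.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone

EPOCH_2001 = datetime(2001, 1, 1, tzinfo=timezone.utc)

def daily_summary(timestamps):
    """Summarize Timestamp values (seconds since Jan 1, 2001, assumed UTC)."""
    per_day = Counter((EPOCH_2001 + timedelta(seconds=t)).date() for t in timestamps)
    per_second = Counter(int(t) for t in timestamps)  # drop the sub-second part
    span_days = (max(per_day) - min(per_day)).days + 1  # inclusive of both ends
    busiest_second, busiest_count = per_second.most_common(1)[0]
    return {
        "span_days": span_days,
        "empty_days": span_days - len(per_day),
        "busiest_second": EPOCH_2001 + timedelta(seconds=busiest_second),
        "busiest_count": busiest_count,
    }

# The 301-day span quoted in the article, counting both end dates.
print((date(2011, 4, 21) - date(2010, 6, 25)).days + 1)  # 301

# Synthetic timestamps: three within one second, then one three days later.
print(daily_summary([0.0, 0.5, 0.9, 3 * 86400 + 10]))
```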

To explore and visualize the data, I used both the iPhoneTracker app and my own custom scripts to parse the data. I generated maps using a combination of iPhoneTracker, Google Maps, and multiplotter.

Two observations jumped out at me:

First is the inaccuracy of the locations, which others have also observed. The data itself seems to acknowledge this and try to account for it through the horizontal accuracy and confidence values. The average Horizontal Accuracy in my data was 1,515 meters (a radius of nearly a mile around the recorded location). The "confidence" rating likewise suggests these values are known to be uncertain: the most common rating in my data, which was also the highest, was "70."
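Both figures are simple aggregates over the CellLocation table. The sketch below shows one way to compute them with SQL from Python. It rests on several assumptions: the column names HorizontalAccuracy and Confidence are written without spaces, negative accuracy values are excluded from the average (the article does not say whether they were), and the function name `accuracy_summary` and the demo rows are made up for illustration.

```python
import sqlite3

def accuracy_summary(conn):
    """Return (mean horizontal accuracy in meters, most common confidence)."""
    mean_accuracy = conn.execute(
        "SELECT AVG(HorizontalAccuracy) FROM CellLocation "
        "WHERE HorizontalAccuracy >= 0").fetchone()[0]
    top_confidence = conn.execute(
        "SELECT Confidence, COUNT(*) AS n FROM CellLocation "
        "GROUP BY Confidence ORDER BY n DESC, Confidence DESC LIMIT 1").fetchone()[0]
    return mean_accuracy, top_confidence

# Stand-in table with made-up rows so the sketch runs on its own.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE CellLocation (HorizontalAccuracy REAL, Confidence INTEGER)")
demo.executemany("INSERT INTO CellLocation VALUES (?, ?)",
                 [(500, 70), (1500, 70), (2500, 50), (-1, 70)])
print(accuracy_summary(demo))

# The article's 1,515-meter average expressed in miles (1 mile = 1,609.344 m).
print(round(1515 / 1609.344, 2))  # 0.94
```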

In mapping out times that I have been working at our offices here in Washington, D.C., I get a scatter of points in the area but do not see our actual office building identified. Some have suggested that it is actually the locations of nearby cell towers that are being mapped. As others have observed (and as I replicated in my own data), location values change even among entries with the same cell ID, which seems to disprove this theory. My conclusion is that the data is generally accurate in identifying the neighborhood or region the phone was located in, but does not accurately identify a specific address.

The other interesting view of the data came when I looked for significant changes (more than one degree) in my latitude or longitude from one entry to the next. This greatly reduced the amount of data and basically highlighted the flights I'd taken. Typically the first new location after a significant change in latitude/longitude values was an airport where I turned my phone back on. As I looked at this map of airports, and of travel between New York and D.C., it all looked correct, except that I had no recollection of passing through Charlotte. But the data was more accurate than my memory: by checking the relevant date, I confirmed I had indeed been on a U.S. Airways flight with a connection there. Turns out, I do have my own personal travel log.
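The one-degree filter is easy to reproduce. The sketch below is one possible version, not the script used for this article: the function name `significant_jumps` is hypothetical, rows are assumed to be (timestamp, latitude, longitude) tuples already sorted by time, and the demo coordinates are rough city positions chosen for illustration, not values from the author's phone.

```python
def significant_jumps(points, threshold_degrees=1.0):
    """Return entries whose latitude or longitude differs from the previous
    entry by more than threshold_degrees."""
    jumps = []
    for previous, current in zip(points, points[1:]):
        lat_change = abs(current[1] - previous[1])
        lon_change = abs(current[2] - previous[2])
        if lat_change > threshold_degrees or lon_change > threshold_degrees:
            jumps.append(current)
    return jumps

# Illustrative (timestamp, latitude, longitude) rows: two near New York,
# then two near Washington, D.C. Coordinates are approximate.
demo_points = [
    (0, 40.75, -73.99),
    (1, 40.76, -73.98),
    (2, 38.85, -77.04),
    (3, 38.86, -77.03),
]
print(significant_jumps(demo_points))  # [(2, 38.85, -77.04)]
```

Comparing raw degrees is a coarse test, since a degree of longitude covers less ground farther from the equator, but it is enough to separate flights from movement around a city.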

It is unclear why Apple would keep an ongoing record of this data on these devices, but now that it is more widely known, I wonder about the implications. I suspect it won't be too long before use of this data shows up in court. It is easy to imagine a scenario where this data is subpoenaed in a trial to prove or disprove the location of an individual at a given time. Likewise, if a company issues its employees iPhones, what are the ramifications of the company potentially having access to the historic location information of those employees? It does appear that if you don't want to have this information captured, you can do as Apple suggests and turn off location services. After I turned off location services on my own phone, the data entries appear to have stopped. Of course, now many of the apps are severely handicapped unless I turn it back on.

Copyright 2011 National Public Radio. To see more, visit http://www.npr.org/.