$Id$

0. Introduction.

  Vidalia tracks geographic coordinates of the IP addresses for Tor
  servers, so it can display them in its Network Map window. This document
  describes the actual lookup and caching mechanism, and also lays out some
  of the security questions and future directions.

1. Fetching and caching.

  When we learn one or more new descriptors, we check to see if we have a
  cached mapping from each IP address to a geographic location. An example
  mapping is:

    206.124.149.146,Bellevue,WA,US,47.6051,-122.1134

  If we don't have a cached answer, we send an HTTP request to a Perl
  script located on one of our geographic information servers, asking about
  one or more IP addresses. The requests are always sent to

    http://geoip.vidalia-project.net/cgi-bin/geoip

  which is currently hardcoded into Vidalia's source code. Requests are
  distributed via DNS round-robin. Currently, we have two such servers:

    Host                  IP Address      Operator
    --------------------------------------------------------------------
    pasiphae.cs.rpi.edu   128.213.48.11   Matt Edman
                                          Rensselaer Polytechnic Institute
    cups.cs.cmu.edu       128.2.220.167   Sasha Romanosky, Serge Egelman
                                          Carnegie Mellon University

  Request logs are not kept on the geographic information servers.

  Each server maintains a GeoLite City database from MaxMind which allows
  lookup of geographic location information for a given IP address.
  Database lookups are done using a Perl database API, also provided by
  MaxMind. More information on the GeoLite City database can be found at
  http://www.maxmind.com/app/geolitecity.

  Requests can be formatted as either HTTP GET or POST requests. A GET
  request for geographic information is suitable for small requests
  consisting of only a few IP addresses.
  An example of a GET request for geographic information for a single IP
  address (128.213.11.48) is:

    http://geoip.vidalia-project.net/cgi-bin/geoip?ip=128.213.11.48

  which returns the following information:

    128.213.11.48,Troy,NY,US,42.7495,-73.5951

  Geographic information is formatted in the body of an HTTP response as

    IPAddress,City,State,Country,Latitude,Longitude

  Multiple IP addresses in a single GET request are separated by commas.
  For example:

    geoip?ip=128.213.11.48,18.244.0.188,128.2.220.167

  which would return the following information in the body of a standard
  HTTP response:

    128.213.11.48,Troy,NY,US,42.7495,-73.5951
    18.244.0.188,Cambridge,MA,US,42.3646,-71.1028
    128.2.220.167,Pittsburgh,PA,US,40.4439,-79.9562

  Requests can also be formatted as HTTP POST requests, suitable for
  requesting geographic information for a large number of IP addresses. The
  request is formatted in the same manner as an HTTP GET request, but with
  the list of IP addresses placed in the body of the request.

    POST /cgi-bin/geoip HTTP/1.0
    Host: geoip.vidalia-project.net
    Content-Length: 43

    ip=128.213.11.48,18.244.0.188,128.2.220.167

  Vidalia always uses HTTP POST requests to request geographic location
  information.

  The order of results returned IS NOT guaranteed to be the same as the
  order of IP addresses given in the original request.

  If the geographic information database does not contain any information
  for an IP address given in a request, the string "UNKNOWN" follows that
  IP address in the response, separated by a comma (e.g.,
  "1.2.3.4,UNKNOWN").

  If no IP addresses are provided in a request, geographic information for
  the IP address of the requestor is returned. Vidalia currently does not
  use this feature.
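
  The request and response formats described above can be sketched in a
  few lines of Python. This is only an illustration, not Vidalia's actual
  code: the names GeoRecord, build_post_body, and parse_response are
  hypothetical, and the parser assumes that no field (for example a city
  name) contains a comma. Results are keyed by IP address because the order
  of results is not guaranteed to match the order of the request.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class GeoRecord:
    """One resolved line: IPAddress,City,State,Country,Latitude,Longitude."""
    ip: str
    city: str
    state: str
    country: str
    latitude: float
    longitude: float


def build_post_body(ips: Iterable[str]) -> str:
    """Format the request body as 'ip=' plus comma-separated addresses."""
    return "ip=" + ",".join(ips)


def parse_response(body: str) -> Dict[str, Optional[GeoRecord]]:
    """Parse a response body into a dict keyed by IP address.

    Addresses reported as "UNKNOWN" map to None.
    """
    results: Dict[str, Optional[GeoRecord]] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(",")
        if len(fields) == 2 and fields[1] == "UNKNOWN":
            results[fields[0]] = None
        elif len(fields) == 6:
            ip, city, state, country, lat, lon = fields
            results[ip] = GeoRecord(ip, city, state, country,
                                    float(lat), float(lon))
        # Assumption: lines of any other shape are silently skipped.
    return results
```

  A caller would send build_post_body(...) as the body of the POST request,
  with Content-Length set to the byte length of that string, and then look
  up each requested address in the dict returned by parse_response(...).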

  We cache geographic information in an unsorted text file called
  "~/.vidalia/geoip-cache" (on Windows, the cache file is stored in
  %APPDATA%\Vidalia\geoip-cache) along with an optional Unix timestamp,
  such as:

    206.124.149.146,Bellevue,WA,US,47.6051,-122.1134:1159123625

  If no timestamp is associated with a particular cache entry, the current
  date and time will be used.

  We load the cache file on startup, discard all entries that are over a
  month old, and write a new version. After that the cache file is
  append-only.

  The requests are made over Tor, as ordinary SOCKS requests to the local
  Tor client. Vidalia will query the local Tor client for its SOCKS
  listening address and port via Tor's controller interface.

  Once Vidalia queues a request for geographic information, we wait
  MIN_RESOLVE_QUEUE_DELAY (currently 10 seconds) after the last queued
  request, but no longer than MAX_RESOLVE_QUEUE_DELAY (currently 30
  seconds) after the first queued request, before actually launching the
  connection. This way we lump them together and send one larger request
  instead of several little ones.

  If Vidalia's network map window is not currently visible, the requests
  are queued until either the queue contains requests for at least 1/4 of
  all known servers, or the network map window becomes visible again.

  If we don't get an answer, we don't retry -- if we don't have geographic
  information for a server, it simply doesn't get mapped.

2. Security and anonymity questions.

  First, can the operator of the above URLs track popularity and spreading
  of servers? Yes. Does this buy him anything? I'm not sure.

  Second, because no end-to-end encryption/authentication is used, the exit
  node can discover what is being requested -- and can modify the answers
  that are sent back.
  What are the partitioning opportunities in this scenario -- both passive
  partitioning to discover patterns of behavior based on which descriptors
  have just been fetched, and active partitioning to mislead the user into
  believing a given server is at a certain set of coordinates?

3. Future directions.

3.1. Encryption to/from the coordinate servers.

  It would be smart to encrypt the queries and responses, to at least limit
  the exposure. This could be done simply by running a Tor server near each
  geoip service, and asking for the address

    geoip.vidalia-project.net.foo.exit:80

  Of course, this approach introduces more points of failure. A more
  complex scheme would be for Vidalia to check first whether the preferred
  exit server is running, and modify the address only when it is.

3.2. Tor servers could include geoip data in network statuses.

  Rather than having separate geoip services that Vidalia maintains, we
  could instead integrate the geoip data into the Tor network status
  documents. The Tor directory authorities would learn this information and
  the users would learn it through their ordinary directory downloads.

  This would make life easier for Vidalia, but it would also increase the
  bandwidth overhead of network-status downloads -- rather than caching the
  geoip information, users would fetch it at every update.

3.3. Map networks, not individual IP addresses.

  We should stop mapping individual IP addresses. For servers that have
  dynamic IP addresses, we end up with something like

    68.179.33.128,Toronto,ON,CA,43.6667,-79.4168:1155698575
    68.179.33.129,Toronto,ON,CA,43.6667,-79.4168:1155696448
    68.179.33.131,Toronto,ON,CA,43.6667,-79.4168:1157209955

  Instead we should just cache

    68.179.33.0/24,Toronto,ON,CA,43.6667,-79.4168:1155698575

  Ideally we would have a database, similar to our current GeoLite City
  database, that can provide network prefix information given an IP
  address. In the absence of such a database, simply matching on /24 might
  be easiest and sufficient.

3.4. What else is geoip information for?

  What other uses do we have for this information? Is it only useful for
  drawing maps of the Tor network? Once we start letting users control
  their circuits based on geographic data, the security questions in
  Section 2 become more challenging.
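
  As a rough illustration of the /24 matching suggested in Section 3.3, the
  sketch below derives a network-based cache key from an individual IPv4
  address using Python's standard ipaddress module. The function names
  network_key and lookup, and the idea of holding the cache as a dict, are
  assumptions made for this example only; nothing here reflects how Vidalia
  stores its cache.

```python
import ipaddress
from typing import Dict, Optional


def network_key(ip: str, prefix_len: int = 24) -> str:
    """Return the enclosing network in CIDR form, e.g. 68.179.33.0/24."""
    # strict=False lets us pass a host address and have the host bits masked.
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def lookup(cache: Dict[str, str], ip: str) -> Optional[str]:
    """Look up location data by the address's /24 network, if cached."""
    return cache.get(network_key(ip))


# Hypothetical cache holding one entry for the whole network.
cache = {"68.179.33.0/24": "Toronto,ON,CA,43.6667,-79.4168"}
toronto = lookup(cache, "68.179.33.131")
```

  With this keying, the three Toronto addresses in the Section 3.3 example
  all resolve to the same cache entry, so only one lookup and one cache
  line would be needed for them.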