Introduction

The increasing number of georeferenced sequences in GenBank 1 and the growth of DNA barcoding 2 mean that the raw material to create geophylogenies 3 is readily available. However, constructing visualisations of phylogenies and geography together can be tedious. Several early efforts at visualising geophylogenies focussed on using existing GIS software 4, or tools such as Google Earth 5,6,7. While the 3D visualisations enabled by Google Earth are engaging, it is not clear that they are easy to interpret. Another tool, GenGIS 12,13, supports 2D visualisations where the phylogeny is drawn flat on the map, avoiding some of the problems of Google Earth visualisations. However, like Google Earth, GenGIS requires the user to download and install additional software on their computer.

By comparison, web maps such as Google Maps 15 and OpenStreetMap 16 are becoming ubiquitous and work in most modern web browsers. They support displaying user-supplied data, including geometrical information encoded in formats such as GeoJSON, making them a lightweight alternative to 3D geophylogeny viewers. This paper describes a tool that makes use of the GeoJSON format and the capabilities of web maps to create quick and simple visualisations of geophylogenies.

2D layout of geophylogenies

The following discussion assumes that we have a phylogeny, and that most (if not all) of the OTUs in that phylogeny are associated with a point locality for which we know the latitude and longitude.

In order to draw a geophylogeny on a web map we need to solve three problems. The first, relatively trivial problem is to place the localities of the OTUs on the map (I shall refer to these as “occurrences”).

The second is to draw the phylogeny. Typically when drawing an evolutionary tree we compute x and y coordinates for a device where these coordinates have equal units and are linear in both horizontal and vertical dimensions, such as a computer screen or printer. In web maps coordinates are expressed in terms of latitude and longitude, and in the widely-used “web mercator” projection the y-axis (latitude) is non-linear. Furthermore, on a web map the user can zoom in and out, so pixel-based coordinates only make sense with respect to a particular zoom level.

Fig. 1: Web map tile

The single 256 × 256 pixel tile representing the globe at zoom level 0, showing the pixel coordinates for the top left corner, the centre (corresponding to longitude 0, latitude 0), and the bottom right. Map tile by CartoDB under CC-BY 3.0 license.

Web maps use “tiles” of a fixed size to represent the globe. Each tile is typically 256 × 256 pixels in size, and the map at a given zoom level is a grid of 2^zoom × 2^zoom tiles, where zoom is the zoom level. At zoom level 0 the map comprises a single tile (Fig. 1), at zoom level 1 it comprises 4 tiles, at zoom level 2 it comprises 16 tiles, and so on. To accommodate the web mercator projection, we first compute a geographic bounding box for the tree based on the bounding box that encloses the occurrences, then offset that box so that the tree is drawn, say, below the occurrences. We can then convert the longitude λ and latitude Ω coordinates of the bounding box to pixels x and y at zoom level 0 using the following formulae:
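The standard web mercator conversion can be written as below. This is a sketch under the usual conventions (λ and Ω in degrees, a 256-pixel tile, x increasing eastwards from the left edge and y increasing southwards from the top edge); it is an assumption that the author's notation matches this form exactly.

```latex
\begin{aligned}
x &= 256\,\frac{\lambda + 180}{360},\\
y &= 128 - \frac{256}{2\pi}\,
     \ln\tan\!\left(\frac{\pi}{4} + \frac{\Omega}{2}\cdot\frac{\pi}{180}\right).
\end{aligned}
```

With these conventions λ = 0, Ω = 0 maps to (128, 128), the centre of the tile in Fig. 1.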

Note that the maximum latitude that can be displayed in the web mercator projection is 85.051129° north and south. The tree drawing is then laid out within that bounding box, with the nodes positioned in terms of pixels. Once pixel coordinates have been computed for the whole tree, they are then converted back to latitude and longitude values:
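Under the standard web mercator conventions (a 256-pixel tile at zoom level 0, angles in degrees, y increasing southwards from the top edge), the inverse conversion can be sketched as follows. The exact form used by the author is an assumption.

```latex
\begin{aligned}
\lambda &= 360\,\frac{x}{256} - 180,\\
\Omega &= \frac{180}{\pi}\left(2\arctan\!\left(e^{\pi\,(1 - 2y/256)}\right)
          - \frac{\pi}{2}\right).
\end{aligned}
```

Setting y = 0 gives Ω ≈ 85.051129°, the latitude limit noted above.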

Expressing the tree in terms of latitude and longitude coordinates means that the rendering of the tree as the user zooms in and out is handled automatically by the web map application.
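The two conversions described above can be sketched in Python as follows. This is an illustration of the standard web mercator arithmetic, not the author's code; the function names `lonlat_to_pixel` and `pixel_to_lonlat` are hypothetical, and clamping latitudes to the projection limit is an assumption about how out-of-range input should be handled.

```python
import math

TILE_SIZE = 256.0          # pixels along one edge of the single zoom-0 tile
MAX_LATITUDE = 85.051129   # web mercator limit in degrees, north and south


def lonlat_to_pixel(lon, lat):
    """Convert longitude/latitude (degrees) to pixel (x, y) at zoom level 0."""
    # Assumption: latitudes beyond the projection limit are clamped to it.
    lat = max(-MAX_LATITUDE, min(MAX_LATITUDE, lat))
    x = TILE_SIZE * (lon + 180.0) / 360.0
    lat_rad = math.radians(lat)
    y = TILE_SIZE / 2 - (TILE_SIZE / (2 * math.pi)) * math.log(
        math.tan(math.pi / 4 + lat_rad / 2)
    )
    return x, y


def pixel_to_lonlat(x, y):
    """Convert pixel (x, y) at zoom level 0 back to longitude/latitude (degrees)."""
    lon = 360.0 * x / TILE_SIZE - 180.0
    lat_rad = 2 * math.atan(math.exp(math.pi * (1 - 2 * y / TILE_SIZE))) - math.pi / 2
    return lon, math.degrees(lat_rad)


def tiles_at_zoom(zoom):
    """Total number of tiles covering the globe at a given zoom level."""
    return (2 ** zoom) ** 2
```

Because the tree is stored as longitude and latitude rather than pixels, a single conversion at zoom level 0 is enough; the web map rescales the geometry at every other zoom level.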

If we want to provide the user with a visual connection between each occurrence on the map and the location of the corresponding OTU in the phylogeny, we can draw a line connecting the two. These lines may criss-cross, creating visual clutter; reducing this clutter is the third problem. To make the diagram more comprehensible, I adopt the approach used by GenGIS 12,13 to reorder the nodes in the tree to minimise the number of crossings 8. As an additional feature, if a taxon is represented by more than one occurrence, we can enclose the set of occurrences by a convex polygon to represent the range of that taxon.
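The convex polygon enclosing a taxon's occurrences can be computed with any standard convex hull algorithm. The sketch below uses Andrew's monotone chain, chosen here for illustration; the source does not say which algorithm the tool uses. It assumes occurrences are (longitude, latitude) pairs treated as planar coordinates and that no range crosses the antimeridian. The function names are hypothetical.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hull_to_polygon(points):
    """Build a GeoJSON Polygon geometry for a taxon range, or None if degenerate."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return None  # one or two distinct localities do not enclose an area
    ring = [list(p) for p in hull] + [list(hull[0])]  # GeoJSON rings are closed
    return {"type": "Polygon", "coordinates": [ring]}
```

A taxon with only one or two distinct localities has no polygon, so a caller would need to fall back to drawing the points alone.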

Having computed a layout, we then need to render that on a web map. There are a number of different web maps available, each with their own API. Rather than tie the visualisation to a particular API, we can use a standardised output format, such as GeoJSON, to encode the layout, so that users can pick which web map they wish to use for the visualisation.

GeoJSON

GeoJSON 17 is a format for encoding geographic data in JSON (JavaScript Object Notation). It includes various geometry types (such as Point, LineString, and Polygon), and is supported by a number of online mapping tools, including Google Maps 15 and Leaflet 18. A GeoJSON document comprises a set of one or more features, each of which has a geometry and additional properties. Using the GeoJSON geometry types we can encode occurrences (Point), the tree (a set of LineString objects), and taxon distributions (Polygon), then have the entire visualisation rendered by the web application. The GeoJSON specification does not, by itself, include any information on how to display the objects encoded in a GeoJSON document (e.g., what colour to use for a line), but some informal standards have emerged, such as storing CSS styles as properties.
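As an illustration of the structure just described, the following Python sketch builds a small GeoJSON document containing one occurrence, one branch of a tree, and one taxon range. All names and coordinates are made up, and the style property names (`stroke`, `fill`) are assumptions standing in for whichever informal styling convention a given web map honours.

```python
import json


def feature(geometry_type, coordinates, **properties):
    """Wrap a geometry and its properties as a GeoJSON Feature."""
    return {
        "type": "Feature",
        "geometry": {"type": geometry_type, "coordinates": coordinates},
        "properties": properties,
    }


# GeoJSON positions are [longitude, latitude], in that order.
occurrence = feature("Point", [-155.5, 19.6], name="OTU_A")

# One branch of the tree, drawn as a path through three positions.
branch = feature(
    "LineString",
    [[-155.5, 17.0], [-155.5, 18.0], [-156.3, 18.0]],
    stroke="black",  # hypothetical style property
)

# A taxon range: a single closed ring whose first and last positions match.
taxon_range = feature(
    "Polygon",
    [[[-156.0, 19.0], [-155.0, 19.0], [-155.5, 20.0], [-156.0, 19.0]]],
    fill="green",  # hypothetical style property
)

collection = {
    "type": "FeatureCollection",
    "features": [occurrence, branch, taxon_range],
}

print(json.dumps(collection, indent=2))
```

The resulting text can be handed unchanged to any web map that accepts GeoJSON, which is what allows the layout to remain independent of a particular mapping API.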

Input format

In order to create the visualisation we also need a way to input a phylogeny and the geographic localities. The approach taken here is to use the NEXUS format 9, and the GEOGRAPHIC datatype introduced by the Mesquite Cartographer package 14. While some might argue that XML represents the future of phylogenetic file formats 10, NEXUS is easy to edit manually and hence facilitates debugging and exploring the software. Given a set of OTUs, the tool expects a NEXUS file with a TREES block describing a tree, followed by a CHARACTERS block encoding the location of each OTU. Each OTU is typically a DNA sequence. Sets of sequences may belong to the same taxon (e.g., a species or a DNA barcode BIN 2). Following Mesquite, this information can be stored in an ALTTAXNAMES command in a NOTES block.

Figure 2 shows a NEXUS file for the widely used Banza example 11,19.

Fig. 2: NEXUS file for Hawaiian Banza

NEXUS file for Hawaiian Banza, with geographical data encoded in the CHARACTERS block.

Implementation

I have implemented a NEXUS to GeoJSON converter using PHP. The code parses the NEXUS file, computes a bounding box based on the distribution of the OTUs, draws the tree, and exports the result in GeoJSON. The code is available on GitHub at https://github.com/rdmpage/geojson-phylogeny. Code for the examples in this article is available from https://github.com/rdmpage/geojson-phylogeny-manuscript/. A live demo can be explored at http://bionames.org/~rpage/geojson-phylogeny/ which includes examples of visualising geophylogenies using both Google Maps (Fig. 3) and Leaflet (Fig. 4).
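The steps listed above (bounding box, tree drawing, GeoJSON export) can be sketched end to end as follows. This is a Python illustration, not the author's PHP code. It skips NEXUS parsing and assumes the tree is already available as nested tuples of uniquely named leaves, that every leaf has an occurrence, and that the tree box fits within the projection limits. The function names, the `role` property, and the `gap` and `depth` parameters are hypothetical.

```python
import json
import math

TILE = 256.0  # tile edge in pixels at zoom level 0


def to_pixel(lon, lat):
    """Longitude/latitude in degrees -> web mercator pixels at zoom level 0."""
    x = TILE * (lon + 180.0) / 360.0
    y = TILE / 2 - TILE / (2 * math.pi) * math.log(
        math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def to_lonlat(x, y):
    """Pixels at zoom level 0 -> [longitude, latitude] in degrees (GeoJSON order)."""
    lat = math.degrees(2 * math.atan(math.exp(math.pi * (1 - 2 * y / TILE))) - math.pi / 2)
    return [360.0 * x / TILE - 180.0, lat]


def height(node):
    """Number of edges from a node to its deepest leaf (leaves are strings)."""
    return 0 if isinstance(node, str) else 1 + max(height(c) for c in node)


def layout(tree, left, top, right, bottom):
    """Place leaves along the top edge of the box and the root on the bottom edge."""
    leaves = []

    def collect(node):
        if isinstance(node, str):
            leaves.append(node)
        else:
            for child in node:
                collect(child)

    collect(tree)
    step = (right - left) / max(len(leaves) - 1, 1)
    leaf_xy = {name: (left + i * step, top) for i, name in enumerate(leaves)}
    total = max(height(tree), 1)
    segments = []

    def place(node):
        if isinstance(node, str):
            return leaf_xy[node]
        kids = [place(child) for child in node]
        y = top + (bottom - top) * height(node) / total
        for kx, ky in kids:
            segments.append([(kx, ky), (kx, y)])  # branch from child to parent level
        segments.append([(kids[0][0], y), (kids[-1][0], y)])  # bar joining the children
        return sum(kx for kx, _ in kids) / len(kids), y

    place(tree)
    return segments, leaf_xy


def geophylogeny(tree, occurrences, gap=0.5, depth=2.0):
    """Build a GeoJSON FeatureCollection; gap and depth are in zoom-0 pixels."""
    pixels = {name: to_pixel(lon, lat) for name, (lon, lat) in occurrences.items()}
    xs = [p[0] for p in pixels.values()]
    ys = [p[1] for p in pixels.values()]
    top = max(ys) + gap  # pixel y grows southwards, so this lies below the occurrences
    segments, leaf_xy = layout(tree, min(xs), top, max(xs), top + depth)

    def feature(kind, coords, **props):
        return {"type": "Feature", "properties": props,
                "geometry": {"type": kind, "coordinates": coords}}

    features = [feature("Point", [lon, lat], name=name)
                for name, (lon, lat) in occurrences.items()]
    features += [feature("LineString", [to_lonlat(*a), to_lonlat(*b)], role="branch")
                 for a, b in segments]
    features += [feature("LineString", [to_lonlat(*pixels[n]), to_lonlat(*leaf_xy[n])],
                         role="connector") for n in leaf_xy]
    return {"type": "FeatureCollection", "features": features}


# Hypothetical three-OTU example (names and coordinates are made up).
tree = (("A", "B"), "C")
occurrences = {"A": (-159.5, 22.0), "B": (-157.9, 21.4), "C": (-155.5, 19.6)}
print(json.dumps(geophylogeny(tree, occurrences))[:80])
```

The sketch leaves out the reordering of nodes to reduce crossings and the taxon polygons, both of which would slot in before the features are assembled.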

Fig. 3: Geophylogeny for South American rodent

Geophylogeny for DNA barcodes for the rodent Proechimys guyannensis, showing two distinct clusters that are geographically allopatric (data from BOLD, map tiles by CartoDB under CC-BY 3.0 license).

Fig. 4: Geophylogeny for Hawaiian katydids

Geophylogeny for Hawaiian katydids (genus Banza) displayed using the Leaflet framework with map tiles by CartoDB under CC-BY 3.0 license.

Discussion

At present the method described here requires a middle layer (written in PHP) that resides on a web server and converts the NEXUS file to GeoJSON. An obvious extension would be to port that code to JavaScript and have the entire tool function within the web browser.

Although lacking some of the functionality of more specialised software such as GenGIS, an advantage of a web map-based tool is that it brings phylogenies into an environment already familiar to users of biodiversity data, such as the GBIF portal. Many users will have already encountered points on maps, and layers (e.g., of environmental data, or estimated species distributions). By representing phylogeny in GeoJSON we open the way for phylogenetic information to be incorporated into these maps.

GeoJSON is also attractive because, being a JSON document, it could be stored and indexed in a document database such as CouchDB 20, which I have used elsewhere for taxonomic and phylogenetic data 21. Hence we could imagine being able to quickly build a database of geophylogenies that can be queried both taxonomically and spatially. This would be one way to respond to Kidd’s call for a “map of life” 3.

Competing interests

The authors have declared that no competing interests exist.