U.C. Santa Cruz Genome Browser
The Ebola epidemic continues to grow in West Africa. The U.S. Centers for Disease Control and Prevention (CDC) estimated the occurrence of 21,000 cases in Sierra Leone and Liberia alone by Sept. 30, 2014, surging to 1,400,000 cases by Jan. 20, 2015, if the epidemic continues to grow at the current pace
The UCSC Genome Browser is a mature web tool for rapid and reliable display of any requested portion of a genome at any scale. The genome itself forms the horizontal axis that can be zoomed and scrolled. The vertical axis is a stack of annotation tracks, each containing a particular type of data. Examples of common annotation track types for a typical vertebrate genome include genes, comparative multiple alignments of many genomes, and SNPs. The tracks can be displayed at various levels of detail, and clicking on an item in a track displays a page of information about that item.
We have adapted the Genome Browser to support the display of the Ebola virus genome and a diverse set of annotations. In addition to the Ebola Genome Browser, we constructed an Ebola Portal page that wraps around the browser and other collected resources. These resources include a set of sequences of antibodies that bind Ebola, for use in research into vaccines and antiserum-type therapies, as well as links to many other Ebola resources.
We started with the UCSC Genome Browser code base, primarily written in C, which includes utilities for transforming data from one format to another, tools for loading the MySQL database, and CGI programs that create web pages based on the contents of the database. The source code, available at
The UCSC Genome Browser display centers around a reference genome assembly to which all annotations are aligned. After conversations on the compatibility of annotations with Dr. Pardis Sabeti from the Broad Institute, we decided to use the sequence from GenBank accession KM034562.1 as our reference sequence. This allowed us to quickly import the extensive set of 99 Ebola genomes from Gire et al. (2014)
We have successfully produced a UCSC Genome Browser on the Ebola virus genome (Figure 1). The multiple-alignment display was extended specifically for Ebola to show the effects of nucleotide mutations on viral proteins (non-synonymous, synonymous, stop, etc.). The close evolutionary distance among different Ebola sequences is readily observable in this new display, where only one or two coding SNPs across the entire genome are seen between two isolates from different patients in the same outbreak.
Four annotation tracks are displayed. The NCBI Genes track follows a display convention in which coding regions are shown at full height and UTRs at half height. Clicking on an item in this track takes the user to a page of information on the gene. The UniProt/SwissProt Protein Annotations track is shown in dense display mode. Clicking on this track expands it such that individual items, including protein motifs, domains, cleavage sites and other features, are labeled and staggered vertically if they overlap, as is done in the gene track. The B-Cell Epitopes track from IEDB, displayed in dense mode, shows protein regions where antibodies are known to bind. The Multiz Genome Alignments track is configured to show just a printable subset of the available genomes. When zoomed out to view the full genome (as here), differences from the reference genome are visible as colored lines: green for synonymous coding changes, red for non-synonymous, blue for UTR changes, and light yellow for missing data. When zoomed in sufficiently, individual amino acid and base differences are shown instead.
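The mutation categories used in the alignment display (synonymous, non-synonymous, stop, missing data) can be illustrated with a minimal Python sketch. This is not the browser's own code, which is written in C; the function name `classify_codon_change` and the example codons are hypothetical, and the sketch assumes the standard genetic code and in-frame codons that are already aligned to the reference.

```python
# Standard genetic code (NCBI translation table 1), in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Colors as described in the Figure 1 caption; the text gives no color for
# stop changes, so none is listed here.
DISPLAY_COLOR = {
    "synonymous": "green",
    "non-synonymous": "red",
    "missing": "light yellow",
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify the effect of replacing ref_codon with alt_codon (hypothetical helper)."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    # Anything that is not a complete A/C/G/T codon (gaps, Ns) counts as missing data.
    if ref_codon not in CODON_TABLE or alt_codon not in CODON_TABLE:
        return "missing"
    if ref_codon == alt_codon:
        return "identical"
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop"
    return "non-synonymous"


if __name__ == "__main__":
    for ref, alt in [("GCC", "GCA"), ("AAA", "GAA"), ("TGG", "TGA"), ("GCC", "NNN")]:
        effect = classify_codon_change(ref, alt)
        print(ref, alt, effect, DISPLAY_COLOR.get(effect, "(not specified)"))
```

UTR changes, shown in blue in the display, fall outside coding regions and so are not covered by a codon-level classifier like this one.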
Across various strains of Ebola and various outbreaks dating back to 1976, the crucial GP gene, whose protein product is the only one exposed on the surface of the virus, has large regions that are conserved. When combined with the positive results in non-human primate trials, it seems likely that the vaccines now in Phase I trials
Figure 2 illustrates how one can use the Genome Browser to inspect the degree of variation of a particular antibody epitope sequence in the current outbreak. At the top of the display, the browser shows the region of the Ebola genomic nucleotide reference sequence in view, here zoomed to a 77 bp region. The UniProt track indicates that one is looking at a part of the GP1 protein, in the extracellular, mucin-like region, located next to a CHZ (histone chaperone) domain (shown in the Pfam track).
In this view the NCBI Genes, SwissProt, Pfam, IEDB and PDB annotation tracks have been set to “pack” display mode, which shows descriptive labels of the individual track features. The Multiz Genome Alignments annotation showing the alignment of other Ebola strain sequences to the reference sequence has been configured by setting the 160 Accessions track to “full” display mode, which (at this zoom level) shows amino acid sequences for each of the strains listed. In this example, the number of displayed strains has been reduced to only one sequence per outbreak by adjusting additional track settings, accessed by clicking on the 160 Accessions track label. The annotation “Sites that Carry a Unique Base…” is displayed by setting the 2014 Specific variation track to “pack” mode.
The IEDB track in Figure 2 shows that the protein encoded by the nucleotide sequence in this view is partially targeted by four previously published antibodies: an unnamed one from a screen (TGEESA), 12B5-1-1, 14G7 and 4G7. Detailed information about each of these antibodies can be viewed by clicking on the antibody name in the display. In this instance, the details reveal that 14G7 and 4G7 were first described by Olal et al. (2012)
The Multiz Genome Alignments annotation in Figure 2 has been configured to show amino acid sequences, with only one strain sequence from each outbreak displayed: Sierra Leone 2014 (translation of the reference sequence), Zaire 07 (KC242800), Zaire 76 (NC_002549), Bundibugyo 07 (NC_014373), Reston 89 (NC_004161) and Sudan 76 (KC545389). In this annotation, a dot indicates that the amino acid is identical to that of the reference sequence at the same position. As previously shown in this journal
As shown in the bottom annotation in Figure 2, the 2014 outbreak sequence does have a mutation in this region (C/A), but the green color of the feature indicates that it is synonymous. Therefore, both the 14G7 and 4G7 (ZMapp) antibodies most likely will be effective against the current 2014 strain if they were effective against the original Zaire strain.
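The dot convention used in the Multiz alignment display, where a dot marks an amino acid identical to the reference at the same position, can be sketched in a few lines of Python. The function name `dot_row` and the peptide strings are invented for illustration and are not taken from the Ebola alignment.

```python
def dot_row(reference: str, aligned: str) -> str:
    """Return the aligned row with '.' wherever it matches the reference.

    Both strings must already be aligned, so they have equal length.
    Gap characters such as '-' differ from the reference and are kept as-is.
    """
    if len(reference) != len(aligned):
        raise ValueError("aligned sequences must have the same length")
    return "".join("." if a == r else a for r, a in zip(reference, aligned))


if __name__ == "__main__":
    reference = "MKTLLV"  # made-up reference peptide
    other_strains = {"strainA": "MKSLLV", "strainB": "MKTLLV", "strainC": "MK-LLI"}
    print(f"{'reference':10s} {reference}")
    for name, seq in other_strains.items():
        print(f"{name:10s} {dot_row(reference, seq)}")
```

A row consisting only of dots therefore indicates a region that is fully conserved relative to the reference at the amino acid level.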
This example, which can be repeated with any of the other 77 curated IEDB epitopes in other regions of the virus, shows how one can use the Genome Browser to get a quick overview of available information across various databases and determine whether the data support a given hypothesis. In addition to the data natively displayed in the UCSC Genome Browser, users can import their own annotation data for display on the reference sequence using the browser’s custom track feature (
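As an illustration of the custom track feature mentioned above, the following Python sketch writes a small annotation set in BED format, one of the formats accepted for custom tracks. The helper name `bed_custom_track`, the feature coordinates, and the labels are hypothetical. The sequence name `KM034562v1` is an assumption about how the reference is named in the browser and should be checked against the assembly before uploading.

```python
def bed_custom_track(name, description, features, chrom="KM034562v1"):
    """Build BED custom-track text from (start, end, label) tuples.

    BED coordinates are 0-based and half-open: start is inclusive, end is exclusive.
    The default chrom value is an assumed sequence name for the Ebola reference.
    Labels should not contain whitespace, since BED fields are whitespace-delimited.
    """
    lines = [f'track name="{name}" description="{description}"']
    for start, end, label in features:
        if not 0 <= start < end:
            raise ValueError(f"invalid interval for {label}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical regions of interest, not real annotations.
    features = [(100, 177, "regionA"), (2500, 2560, "regionB")]
    print(bed_custom_track("myRegions", "Example regions of interest", features), end="")
```

The resulting text can be pasted or uploaded through the custom track interface, after which the regions appear as an additional track alongside the native annotations.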
We link to the UCSC Ebola Genome Browser from a portal page (
Given the exponential growth rate of the epidemic and the mobility of the human population, the current Ebola virus outbreak may not be contained until late 2015. Today’s research efforts may provide some help in managing this outbreak; therefore, anything that can be done to encourage Ebola research and data-sharing seems prudent. To this end we hope the UCSC Ebola Portal and Genome Browser will be a useful tool for researchers. Suggestions for additional tracks and general feedback may be sent to the UCSC Genome Browser public mailing list at genome@soe.ucsc.edu.
We would like to thank the Phil Berman lab at UC Santa Cruz for early feedback on the browser, Pardis Sabeti at the Broad Institute for her guidance in selecting data sets, the Rachel Karchin lab at Johns Hopkins University for their customization work on the MuPIT data in support of this project and Charlotte Kent at the CDC for alerting us to the severity of the epidemic. We would also like to thank the entire staff of the UCSC Genome Browser engineering, QA, and sysadmin teams for their help and support in the rapid release of this browser.