PhyloSoC:Extending Jalview to Support RNA Alignment Annotation and Secondary Structure Visualization

From Phyloinformatics

Author and Relevant links

Lauren Lui - llui at soe dot ucsc dot edu

Project Blog

Jalview Homepage

Source Code hosted on Google

RFAM

VARNA

Abstract

The overall goal of this project is to extend many of the useful features Jalview has for protein sequence alignments to support RNA sequence analysis. By adding more parsing and analysis of the information in Stockholm files, I can add support for RNA secondary structure alignment annotation, such as coloring schemes. Other features, such as an embedded RNA secondary structure viewer and the ability to import existing RNA sequences and alignments from the Rfam database, will also be added.

The idea for this project evolved from the Google Maps-like multi-genome browsing in Jalview project. I proposed to add RNA support to Jalview and was (wisely) advised that doing this and adding Google Maps-like browsing would be far too big a project for the summer.

Project Goals

  1. Modify processing of Stockholm files
  2. RNA secondary structure alignment annotation
    • Secondary structure line indicating stems (helices) and possibly loops
    • Add feature to edit secondary structure line in alignment to indicate stems and loops.
  3. Coloring for Alignment Annotation Visualization
    • Coloring of alignment based on prediction of its consensus secondary structure
    • Coloring according to base pair conservation (covariation)
    • Coloring relevant to nucleotide alignments (pyrimidine, purine)
  4. Embed an existing Java based secondary structure viewer into Jalview
    • VARNA is a good candidate that is under the GPL (http://varna.lri.fr/)
    • Hook the viewer into Jalview so that Jalview can launch the viewer for secondary structure
    • Extend the selection/mouseover messaging system so that mouseovers and selections are highlighted in the linked views (i.e., the current position in the alignment is highlighted in the secondary structure, or clicking the secondary structure diagram selects regions of the alignment).
  5. Creation of a Stockholm file from a secondary structure alignment
  6. Ability to import preexisting alignments and sequences from Rfam (http://rfam.sanger.ac.uk/)

Timeline

Interim Period before Acceptance: 4/10-4/25

Practice programming in Java and Swing/AWT, and become familiar with the Jalview source. Set up development environment.

Community Bonding Period: 4/26-5/23

  • 4/26-5/9 – Set up wiki page and blog, discuss goals with mentors, join mailing lists, add other small goals missed in the initial proposal, such as specific color schemes.
  • 5/3-5/9 – Check feasibility of adding VARNA to Jalview. Talk to VARNA author if modifications are needed.
  • 5/10-5/23 – Start coding within this time period; the firm date to start coding is 5/17 to make up for time lost attending the RNA Society Meeting.
  • Goal 1: Modify processing of Stockholm file format to analyze secondary structure line for stems, loops, and single stranded regions.
  • Modify the class StockholmFile in the jalview.io package if necessary
  • Adding the extension ".sto" to the file types the open file dialog looks for will be helpful
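As an illustration of what Goal 1 involves, the sketch below collects the consensus secondary structure from the "#=GC SS_cons" lines of a Stockholm record. The class and method names are hypothetical; this is not Jalview's StockholmFile code. Interleaved Stockholm files repeat the SS_cons line once per block, so the pieces are concatenated in the order they appear.

```java
import java.util.List;

/** Hypothetical helper: pulls the consensus structure out of one Stockholm record. */
class SsConsExtractor {

    /**
     * Returns the concatenated "#=GC SS_cons" annotation, or null if the record
     * has none. Interleaved Stockholm files repeat the line once per block,
     * so the pieces are appended in the order they appear.
     */
    static String extractSsCons(List<String> lines) {
        StringBuilder ss = new StringBuilder();
        boolean found = false;
        for (String line : lines) {
            if (line.startsWith("//")) {
                break; // end of this alignment record
            }
            if (!line.startsWith("#=GC")) {
                continue; // sequence lines and other markup are ignored here
            }
            // Layout: "#=GC", the feature name, then the per-column annotation
            String[] fields = line.trim().split("\\s+", 3);
            if (fields.length == 3 && fields[1].equals("SS_cons")) {
                ss.append(fields[2]);
                found = true;
            }
        }
        return found ? ss.toString() : null;
    }
}
```

The returned string has one character per alignment column, which is the form the later parsing steps (bracket matching, helix detection) work on.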

What was done:

  • Set up Eclipse as my IDE and got Jalview to compile
  • Set up my code hosting on Google: Jalview on Google
  • Set up my blog for the summer: Jalview-RNAsupport
  • Discussion on which secondary structure viewer to embed, picked VARNA
    • Tried out a variety of RNA secondary structure viewers
    • Consulted other RNA biologists about which viewers they use and how they use them
    • Spoke to the developer of VARNA, Yann Ponty
    • Got VARNA to compile under Eclipse
  • Added ".sto, .stk" as possible file extensions in Jalview. (Modified jalview.io.AppletFormatAdapter)
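One simple way to make a file dialog recognize these extensions is Swing's FileNameExtensionFilter, which compares extensions case-insensitively. This is only a sketch with a hypothetical class name; it does not show how jalview.io.AppletFormatAdapter itself was changed.

```java
import java.io.File;
import javax.swing.filechooser.FileNameExtensionFilter;

/** Hypothetical sketch: a file-dialog filter for Stockholm alignments. */
class StockholmFileFilter {
    // ".sto" and ".stk" are the extensions added in this project; matching ignores case
    static final FileNameExtensionFilter FILTER =
            new FileNameExtensionFilter("Stockholm alignment (*.sto, *.stk)", "sto", "stk");

    /** True if the name carries a Stockholm extension (the filter also accepts directories). */
    static boolean accepts(String fileName) {
        return FILTER.accept(new File(fileName));
    }
}
```

In a desktop dialog, the same filter object could be registered with JFileChooser.addChoosableFileFilter so that Stockholm files are offered alongside the other formats.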

Week 1: Official coding start, 5/24-5/30

  • Goal 2: Add RNA secondary structure alignment annotation
  • Display a “Secondary structure” line specific for RNA showing glyphs for stems, loops, and single stranded regions
  • Add ability to edit this line to add stems and loops.
  • Modify the AnnotationPanel class in jalview.gui package to support stems and loops

What was done:

I was able to enable coloring of stems from WUSS notation, but most of the week was spent trying to figure out the data flow and whether to encode "helices" for coloring in the data model or as sequence annotations. I added more goals for the rest of the project.

Week 2: 5/31-6/6

  • Goal 3: Add color schemes
  • This will require adding new classes to the jalview.schemes package and modifying other classes that mention color schemes, such as PopupMenu in the jalview.gui package
  • Add color scheme for bases colored depending on which stem they are part of
  • Add purine/pyrimidine coloring

More Goals/Goals shifted from other weeks

  • Convert RALEE code for parsing secondary structure lines in WUSS into Java.

What was done:

  • RALEE code was in elisp, so I learned some basic elisp to figure out the code. Code was converted into Java.
  • Signed up as a user on new Jalview bug tracker that Jim set up: http://issues.jalview.org/secure/Dashboard.jspa
  • Created an issue for the Stockholm file parsing bug.
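RALEE's elisp is not reproduced here. As a rough sketch of the same task, the class below uses the standard stack-based approach to pair brackets in a WUSS structure line, with a separate stack for each bracket type. The names are hypothetical, and every other WUSS symbol (loop markers, pseudoknot letters) is simply treated as unpaired.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of bracket matching for a WUSS secondary structure line. */
class WussPairs {
    private static final String OPEN = "<([{";
    private static final String CLOSE = ">)]}";

    /** Returns pairs[i] == j when column i pairs with column j, and -1 for unpaired columns. */
    static int[] pairTable(String ss) {
        int[] pairs = new int[ss.length()];
        Arrays.fill(pairs, -1);
        // One stack per bracket type, so a closing bracket matches the nearest open bracket of its own type
        Map<Character, Deque<Integer>> stacks = new HashMap<>();
        for (int i = 0; i < ss.length(); i++) {
            char c = ss.charAt(i);
            int open = OPEN.indexOf(c);
            int close = CLOSE.indexOf(c);
            if (open >= 0) {
                stacks.computeIfAbsent(c, k -> new ArrayDeque<>()).push(i);
            } else if (close >= 0) {
                Deque<Integer> stack = stacks.get(OPEN.charAt(close));
                if (stack == null || stack.isEmpty()) {
                    throw new IllegalArgumentException("Unmatched '" + c + "' at column " + i);
                }
                int j = stack.pop();
                pairs[i] = j;
                pairs[j] = i;
            }
            // Any other symbol (loops such as . , : _ - ~ or pseudoknot letters) stays unpaired
        }
        for (Deque<Integer> stack : stacks.values()) {
            if (!stack.isEmpty()) {
                throw new IllegalArgumentException("Unmatched opening bracket at column " + stack.peek());
            }
        }
        return pairs;
    }
}
```

A pair table like this is a convenient intermediate form: stems, helices and crossing pairs can all be derived from it without re-reading the structure string.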

Week 3: 6/7-6/13

  • Add color scheme for covariation
  • Need to add a class to the jalview.schemes package for this.
  • Sanity check of first three goals - do extra testing of various file formats and alignments if needed, add documentation to help files

More Goals/Goals shifted from other weeks

  • Add recognition of whether the secondary structure annotation line pertains to RNA or to proteins.
  • Jalview checks that the secondary structure line and sequence are the same length, but not for alignment associated annotation; may need to add this
  • Convert WUSS symbols to simplified version
  • Change all bracket types to () for VARNA, which does not accept other bracket types; this can probably be done easily together with the previous goal.
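A minimal sketch of the simplification, assuming VARNA only needs plain dot-bracket input: every opening bracket type becomes "(", every closing type becomes ")", and all other symbols become ".". Because WUSS brackets are always properly nested (pseudoknots are written with letters such as "Aa" instead), folding the bracket types together does not change which columns pair, although pseudoknot letters are lost. The class name is hypothetical.

```java
/** Hypothetical sketch: reduce a WUSS line to the plain "(", ")" and "." that VARNA expects. */
class WussSimplifier {
    static String toDotBracket(String wuss) {
        StringBuilder out = new StringBuilder(wuss.length());
        for (char c : wuss.toCharArray()) {
            if ("<([{".indexOf(c) >= 0) {
                out.append('(');
            } else if (">)]}".indexOf(c) >= 0) {
                out.append(')');
            } else {
                // Loop symbols and pseudoknot letters (e.g. "Aa") all become unpaired
                out.append('.');
            }
        }
        return out.toString();
    }
}
```

The output has the same length as the input, so it still lines up column for column with the alignment.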

What was done:

  • Fixed Stockholm file parsing.
  • Talked to Paul Gardner about covariation coloring in Rfam

Week 4: 6/14-6/20

  • Add a new class in jalview.analysis package to hold the secondary structure line parsing functions, hook this up to processing Stockholm files
  • Get visualization of stems resulting from output of above class

What was done

  • Added a new class in the jalview.analysis package to hold the secondary structure line parsing functions
  • Fixed a bug with displaying stems

Week 5: 6/21-6/27

  • Preparing poster, at RNA Society Conference (6/22- 6/26)

What was done

  • Fixed another bug with displaying stems

Week 6: 6/28-7/4

  • Hook up new class RNA.java in analysis package to process Stockholm files
  • Get visualization of stems resulting from output of above class

What was done

  • Made sure WUSS notation is translated into a simplified form, compatible with VARNA, during Stockholm file parsing
  • Made base pairing into SequenceFeatures during processing of Stockholm
  • Hooked up processing of annotation functions in new jalview.analysis.Rna class to processing in Stockholm files

Week 7: 7/5-7/11

  • Finish the storage and parsing of the annotation line
  • Adding covariance coloring
  • Fixing the display on the annotation panel.

What was done

  • Found "bug" in RALEE parsing secondary structure code and fixed it
  • Spent some time studying how "disulfide bond" features are stored, and how the coloring schemes work, to figure out how to add covariance coloring.

Week 8: Midterm Evaluations, 7/12-7/18

  • Write midterm evaluations.
  • Submit midterm evaluations by July 16.
  • Finish the storage and parsing of the annotation line
  • Change how the helix/stem features are stored in the sequence feature objects. I am not satisfied with how I've currently implemented this.
  • Fixing the display on the annotation panel.

What was done

  • A discussion with the author of RALEE revealed that my interpretation of what a "helix" is in my parsing might not be the best, so I changed the code back to its state before I fixed the "bug"
  • Store secondary structure line in SequenceFeature objects
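Since the definition of a "helix" turned out to be a judgment call, here is one possible definition written as a sketch (hypothetical names, and not necessarily the definition RALEE uses): a base pair (i, j) extends the helix of (i - 1, j + 1) when those two columns are also paired with each other, so any bulge or interior loop starts a new helix. The input is a pair table where pairs[i] == j for paired columns and -1 otherwise.

```java
import java.util.Arrays;

/** Hypothetical sketch: label each paired column with a helix number. */
class HelixLabeller {
    /**
     * Assumed definition: pair (i, j) continues the helix of (i - 1, j + 1)
     * when those columns pair with each other; otherwise it starts a new helix.
     * Unpaired columns get -1.
     */
    static int[] helixIds(int[] pairs) {
        int[] helix = new int[pairs.length];
        Arrays.fill(helix, -1);
        int next = 0;
        for (int i = 0; i < pairs.length; i++) {
            int j = pairs[i];
            if (j <= i) {
                continue; // unpaired, or the closing side of a pair already labelled
            }
            boolean stacked = i > 0 && j + 1 < pairs.length && pairs[i - 1] == j + 1;
            int id = stacked ? helix[i - 1] : next++;
            helix[i] = id;
            helix[j] = id;
        }
        return helix;
    }
}
```

Under this definition "((.(..)))" gives two helices because the bulge at column 2 breaks the stack; a looser definition that tolerates small bulges would merge them.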

Week 9: 7/19-7/25

  • Add a color scheme for purine/pyrimidine
  • Add a color scheme for covariation
  • Start adding conservation coloring logic for the covariation coloring scheme.

What was done

  • Added a color scheme for purine/pyrimidine
  • Started to add a color scheme for covariation

Week 10: 7/26-8/1

  • Finish adding a color scheme for covariation
  • Start adding conservation coloring logic for the covariation coloring scheme.
  • Add other useful color schemes for RNA
  • Start looking into adding support to search Rfam for sequences

What was done:

  • Fixed a bug in purine/pyrimidine color scheme
  • Added the purine/pyrimidine color scheme to the Jmol menu
  • Added the skeleton of the covariation color scheme and some of the needed logic
  • Added an algorithm for generating random colors, which will be used in the covariation color scheme

Week 11: 8/2-8/8

  • Goal 6: Add support to search Rfam database
  • Add support for Rfam sequence retrieval
  • Add classes similar to Pfamseed and PfamFull for fetching alignments from Rfam
  • Modify classes for database retrieval, such as the SequenceFetcher class in the jalview.ws package
  • Add an RfamFile class similar to the PfamFile class in the jalview.io package
  • Finish adding support to search Rfam database
  • Do testing if needed
  • Add documentation to help files
  • Do a bulk test for reading files from Rfam and fix StockholmFile.java as needed.

What was done:

  • Implemented a new method in the Stockholm file parser to store and analyze the secondary structure line
  • Adding support to search Rfam database and sequence retrieval
    • Refactored Pfam class; created Xfam class
    • Added classes similar to Pfamseed and PfamFull for fetching alignments from Rfam
    • Modified the SequenceFetcher class in the jalview.ws package so that fetching sequences is available in the Jalview menu
    • Added an Rfam class similar to the Pfam class in the jalview.io package
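A rough sketch of the idea behind the Pfam/Rfam refactoring: shared fetching logic goes into an abstract base class, and each database and alignment type only supplies what differs. All class names carry a "Sketch" suffix because they are hypothetical, and the URL path layout is purely illustrative, not the real Rfam web service.

```java
import java.util.Locale;

/** Hypothetical sketch: shared fetch logic lives in an abstract base class. */
abstract class XfamSketch {
    /** Web root of the database. */
    protected abstract String baseUrl();

    /** Which alignment to fetch, e.g. "seed" or "full". */
    protected abstract String alignmentType();

    /** Builds a download URL for an accession; the path layout is illustrative only. */
    String alignmentUrl(String accession) {
        return baseUrl() + "family/" + accession.trim().toUpperCase(Locale.ROOT)
                + "/alignment/" + alignmentType();
    }
}

/** Everything Rfam-specific sits here; seed and full subclasses differ only in alignment type. */
abstract class RfamSketch extends XfamSketch {
    @Override
    protected String baseUrl() {
        return "http://rfam.sanger.ac.uk/";
    }
}

class RfamSeedSketch extends RfamSketch {
    @Override
    protected String alignmentType() {
        return "seed";
    }
}

class RfamFullSketch extends RfamSketch {
    @Override
    protected String alignmentType() {
        return "full";
    }
}
```

With this layout, a Pfam counterpart would only override baseUrl(), which is the point of pulling the common code into one place.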

Week 12: “Pencils down,” 8/9-8/15

  • Scrub code, improve documentation, submit deliverables
  • Make sure that “Help” documentation has been added sufficiently for all parts of the project.
  • If time permits, write an RNA secondary structure annotation tutorial.
  • August 16 is the firm “pencils down” date.

What was done:

  • Ran a bulk Groovy test reading files from Rfam and fixed StockholmFile.java accordingly
  • Reworked the covariation color scheme and renamed it "By RNA Helices"
  • Fixed secondary structure line view in the annotation panel
  • Fixed bug with group popup menu
  • Changed default character color for "By RNA Helices" coloring so purine/pyrimidine color scheme is used for display in sequence logo
  • Wrote documentation and help files

Week 13: Final Evaluations, 8/16-8/20

  • Submit final evaluations by August 20.

Further work after Google Summer of Code Has Ended

Things that I didn't get to or discovered that I need to do.

Secondary Structure Visualization

  • Detection of pseudoknots
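Detecting pseudoknots reduces to finding crossing base pairs: pairs (i, j) and (k, l), with i < j and k < l, cross when i < k < j < l. A minimal quadratic-time sketch over a pair table (pairs[i] == j, or -1 when unpaired) is below; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: find base pairs that cross, i.e. form a pseudoknot. */
class PseudoknotFinder {
    /** Returns each crossing as {i, j, k, l} with i < k < j < l; an empty list means no pseudoknot. */
    static List<int[]> crossingPairs(int[] pairs) {
        List<int[]> crossings = new ArrayList<>();
        for (int i = 0; i < pairs.length; i++) {
            int j = pairs[i];
            if (j <= i) {
                continue; // only start from the opening side of each pair
            }
            // A pair opening inside (i, j) but closing beyond j crosses (i, j)
            for (int k = i + 1; k < j; k++) {
                int l = pairs[k];
                if (l > j) {
                    crossings.add(new int[] {i, j, k, l});
                }
            }
        }
        return crossings;
    }
}
```

Each crossing is reported exactly once, from the pair that opens first. Note that a pair table built by collapsing all brackets to () cannot contain crossings, so pseudoknot detection would need to run on the original WUSS letters.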

Color schemes

  • Coloring scheme for pseudoknots
  • Coloring schemes from other MSA viewers, like 4Sale and Assemble?

VARNA

  • Embed secondary structure viewer (VARNA)
  • Try to hook VARNA into Jalview, see if Jalview can launch VARNA. See what issues pop up. Contact VARNA author if needed.
  • Start adding a new class in the jalview.gui package to handle VARNA
  • Write JPanel wrapper to handle UI events from the Jalview desktop and from the VARNA panel
  • Extend the selection/mouseover messaging system so that mouse over and selections get highlighted in the linked views
  • Add modifications to StructureSelectionManager class to support Sequence/Secondary Structure highlighting.
  • Add documentation of secondary structure viewer to help files
  • Testing of secondary structure viewer: spend time playing with the interface with a diverse input set representative of RNA secondary structure files and alignments.
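Linked highlighting needs a mapping between alignment columns and positions in the ungapped sequence that the structure viewer draws. A hypothetical sketch, assuming '-', '.' and ' ' are the gap characters and that positions are 0-based:

```java
/** Hypothetical sketch: translate between alignment columns and ungapped residue positions. */
class ColumnMapper {
    static boolean isGap(char c) {
        return c == '-' || c == '.' || c == ' ';
    }

    /** Returns the residue index at an alignment column, or -1 if the column is a gap. */
    static int columnToResidue(String alignedSeq, int column) {
        if (isGap(alignedSeq.charAt(column))) {
            return -1;
        }
        int residue = 0;
        for (int c = 0; c < column; c++) {
            if (!isGap(alignedSeq.charAt(c))) {
                residue++;
            }
        }
        return residue;
    }

    /** Inverse mapping, e.g. when a click in the structure viewer should select a column. */
    static int residueToColumn(String alignedSeq, int residue) {
        int seen = 0;
        for (int c = 0; c < alignedSeq.length(); c++) {
            if (!isGap(alignedSeq.charAt(c))) {
                if (seen == residue) {
                    return c;
                }
                seen++;
            }
        }
        return -1;
    }
}
```

A mouseover on column 4 of "GG--AC" would highlight residue 2 in the viewer, and a click on residue 2 would select column 4; a mouseover on a gap column highlights nothing.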

Misc.

  • Port the changes made to the main GUI to the applet GUI
  • Add creation of Stockholm file from secondary structure visualization
  • Add modifications to the StockholmFile class in the jalview.io package if necessary
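As a sketch of what writing a Stockholm file could look like, the hypothetical class below emits a single-block alignment with the "# STOCKHOLM 1.0" header, one line per sequence, an optional "#=GC SS_cons" line and the closing "//". It assumes all sequences are already aligned to the same length.

```java
import java.util.Map;

/** Hypothetical sketch: write a single-block Stockholm record with an optional SS_cons line. */
class StockholmWriterSketch {
    private static final String SS_TAG = "#=GC SS_cons";

    /** alignment maps sequence name to aligned sequence (iteration order is kept); ssCons may be null. */
    static String write(Map<String, String> alignment, String ssCons) {
        // Pad names so every sequence and the structure line start in the same column
        int width = SS_TAG.length();
        for (String name : alignment.keySet()) {
            width = Math.max(width, name.length());
        }
        StringBuilder sb = new StringBuilder("# STOCKHOLM 1.0\n\n");
        for (Map.Entry<String, String> e : alignment.entrySet()) {
            if (ssCons != null && e.getValue().length() != ssCons.length()) {
                throw new IllegalArgumentException("Length mismatch for " + e.getKey());
            }
            sb.append(pad(e.getKey(), width)).append(' ').append(e.getValue()).append('\n');
        }
        if (ssCons != null) {
            sb.append(pad(SS_TAG, width)).append(' ').append(ssCons).append('\n');
        }
        return sb.append("//\n").toString();
    }

    private static String pad(String s, int width) {
        return String.format("%-" + width + "s", s);
    }
}
```

Passing a LinkedHashMap keeps the sequences in their alignment order; the length check mirrors the requirement that the structure line covers every column.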