PhyloSoC:Command Line Topological Query Application for BioSQL

This project is part of the 2007 Phyloinformatics Summer of Code, which is itself part of the Google Summer of Code. This web page serves as the central resource for information relating to the project "A Perl-based Command Line Interface to a Topological Query Application for BioSQL in Support of High Throughput Classification and Analysis of LTR Retrotransposons in Plant Genomes", which is being developed by Jamie Estill.

I will use Perl to create a set of command line programs for topological queries in BioSQL. The goal of this project is to create an interface that is suitable for high throughput creation and modification of SQL-based phylogenies. I will use this interface to further my research on the classification of plant LTR retrotransposons.

News

  • Finally loaded the scripts to biosql-schema/scripts --Jestill 16:23, 23 October 2007 (EDT)
  • PhyInit now available from the project source code repository--Jestill 14:57, 1 June 2007 (EDT)
  • Started a Project Blog -- 23 April 2007 (EDT)
  • This page started -- 19 April 2007 (EDT)

Project Overview

I will be coding in Perl and using MySQL as the development RDBMS. I will use standard Perl and BioPerl modules when available; these include Bio::TreeIO, DBI, and Getopt::Std. The Felidae tree available from TreeBASE will be used as the test phylogeny dataset for the duration of the project. This test tree is of moderate size, has some named parent nodes, and includes one small comb. Phylogenies of LTR retrotransposons will be created and stored in the database framework throughout the development process. These phylogenies will be quite large and will use individual occurrences of LTR retrotransposons as the OTUs. Development will assume a single phylogeny per database. The variables for database name, user name, user password, and host will have default values.

Student Homepage: James Estill

Mentor(s): Hilmar Lapp (primary), Weigang Qiu, Bill Piel, Mike Muratet (secondary)

Project Blog: phylosoc2007jestill.blogspot.com

Source code: code.google.com/p/phylosoc2007jestill

System Requirements

I will be adding to the system requirements as the project develops.

  • Perl
  • Perl DBI
  • BioPerl
    The following modules are used:
    • Bio::Tree::TreeI
  • Database:
    • MySQL - 4.1 or newer
      I am currently developing this only on MySQL. Nested SQL queries (subqueries) require MySQL 4.1 or newer. I will actively try to keep this compatible with the oldest version of MySQL possible.

Additional useful applications:

phyinit - Initialize a database

Create PhyloDB tables and foreign keys.

Synopsis

 USAGE: phyinit.pl -d 'DBI:mysql:database=biosql;host=localhost' 
                   -u UserName -p dbPass
     REQUIRED ARGUMENTS:
       --dsn        # The DSN string for the DB connection
       --dbuser     # User name to connect with
       --dbpass     # User password to connect with
     ALTERNATIVE TO --dsn:
       --driver     # DB Driver "mysql", "Pg", "Oracle" 
       --dbname     # Name of database to use
       --host       # Host to connect with (e.g. localhost)
     ADDITIONAL OPTIONS:
       --sqldir     # SQL Dir that contains the SQL to create tables
       --quiet      # Run the program in quiet mode.
       --verbose    # Run the program with maximum output
     ADDITIONAL INFORMATION:
       --version    # Show the program version     
       --usage      # Show program usage
       --help       # Show a short help message
       --man        # Show full program manual
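
The synopsis offers --driver, --dbname, and --host as an alternative to a full --dsn string. As a rough illustration of how those parts relate to the DSN format shown above, here is a minimal sketch. It is written in Python purely for illustration (the project scripts themselves are Perl), the function names `build_dsn` and `parse_dsn` are hypothetical, and the default values are assumptions rather than documented phyinit defaults. Other DBI drivers may expect different key names than the MySQL-style `database=` and `host=` used here.

```python
def build_dsn(driver="mysql", dbname="biosql", host="localhost"):
    """Assemble a Perl-DBI-style DSN string from its parts.

    The defaults are illustrative assumptions, not documented defaults.
    """
    return f"DBI:{driver}:database={dbname};host={host}"


def parse_dsn(dsn):
    """Split a DSN of the form 'DBI:driver:key=value;key=value' into a dict."""
    prefix, driver, params = dsn.split(":", 2)
    if prefix.upper() != "DBI":
        raise ValueError(f"not a DBI DSN: {dsn!r}")
    # Each 'key=value' pair is separated by ';'; empty pieces are skipped.
    parts = dict(item.split("=", 1) for item in params.split(";") if item)
    parts["driver"] = driver
    return parts


if __name__ == "__main__":
    print(build_dsn())
```

Keeping the two directions symmetrical makes it easy to accept either form on the command line and normalize to one internal representation.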

Full Documentation

Full documentation is available at PhyloSoC:phyinit

Source

Relevant existing code

phyimport - Import trees into PhyloDB

This will initially support only a few "standard" formats and make use of the Bio::TreeIO module in BioPerl. It can therefore be extended by the open source community to include additional file formats as needed. The file formats supported initially will be NEXUS and Newick.
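
To make the import step concrete, the following sketch shows one way a Newick string could be flattened into node rows (ID, parent ID, label, branch length) of the kind a tree database needs. It is written in Python for illustration only and does not reproduce phyimport.pl, which relies on Bio::TreeIO. The function name `parse_newick` and the row layout are assumptions, and quoted labels and Newick comments are not handled.

```python
def parse_newick(text):
    """Parse a simple Newick string into (node_id, parent_id, label, length) rows.

    Supports nested parentheses, unquoted labels, and ':length' suffixes.
    Node IDs are assigned in preorder, starting at 1 for the root.
    """
    text = text.strip()
    if text.endswith(";"):
        text = text[:-1]
    rows = []
    pos = 0
    next_id = 1

    def read_node(parent_id):
        nonlocal pos, next_id
        node_id = next_id
        next_id += 1
        row = [node_id, parent_id, None, None]
        rows.append(row)
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            read_node(node_id)
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                read_node(node_id)
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        # Optional label following a leaf name or a closing parenthesis.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        row[2] = text[start:pos].strip() or None
        # Optional branch length introduced by ":".
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            row[3] = float(text[start:pos])

    read_node(None)
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return [tuple(row) for row in rows]
```

For example, `parse_newick("((A:1,B:2)AB:0.5,C:3)root;")` yields five rows, with the root first and each child pointing at its parent's ID.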

Synopsis

 USAGE: phyimport.pl -d 'DBI:mysql:database=biosql;host=localhost' 
                     -u UserName -p dbPass -i InFilePath -f InFileFormat 
   REQUIRED ARGUMENTS:
       --dsn        # The DSN string for the DB connection
       --dbuser     # User name to connect with
       --dbpass     # User password to connect with
       --infile     # Full path to the tree file to import to the db
       --format     # "newick", "nexus" (default "newick")
   ALTERNATIVE TO --dsn:
       --driver     # DB Driver "mysql", "Pg", "Oracle" 
       --dbname     # Name of database to use
       --host       # Host to connect with (e.g. localhost)
   ADDITIONAL OPTIONS:
       --tree       # Tree name to use
       --quiet      # Run the program in quiet mode.
       --verbose    # Run the program in verbose mode.
   ADDITIONAL INFORMATION:
       --version    # Show the program version     
       --usage      # Show program usage
       --help       # Print short help message
       --man        # Open full program manual

Full Documentation

Full documentation is available at PhyloSoC:phyimport

Source

phyimport.pl - Perl code to import trees into database.

Test Input Files

Relevant existing code

phyexport - Export PhyloDB trees

This will initially support whole tree export in the formats given below. It will later be extended to export a single tree resulting from a query of the tree. This subset function will make use of the precomputed nested sets and transitive closure. The exported trees can be viewed in TreeView for visual inspection of branch IDs.
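
As an illustration of how precomputed nested-set values can drive a subtree selection, the sketch below selects a node and all of its descendants with a single range comparison. It uses Python with an in-memory SQLite database purely so that it is self-contained; the table name `node` and the columns `left_idx` and `right_idx` are assumptions made for this example, and the project itself targets MySQL through Perl DBI.

```python
import sqlite3


def demo_connection():
    """Build a tiny example tree ((A,B)AB,C)root with precomputed nested-set values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (node_id INTEGER PRIMARY KEY, label TEXT, "
        "left_idx INTEGER, right_idx INTEGER)"
    )
    conn.executemany(
        "INSERT INTO node VALUES (?, ?, ?, ?)",
        [(1, "root", 1, 10), (2, "AB", 2, 7), (3, "A", 3, 4),
         (4, "B", 5, 6), (5, "C", 8, 9)],
    )
    return conn


def subtree_node_ids(conn, root_id):
    """Return the IDs of root_id and all of its descendants, in preorder."""
    sql = """
        SELECT child.node_id
        FROM node AS child
        JOIN node AS root
          ON child.left_idx BETWEEN root.left_idx AND root.right_idx
        WHERE root.node_id = ?
        ORDER BY child.left_idx
    """
    return [row[0] for row in conn.execute(sql, (root_id,))]
```

Because a descendant's left value always falls between its ancestor's left and right values, no recursive walk over the edges is needed at query time.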

Synopsis

 USAGE: phyexport.pl
   REQUIRED ARGUMENTS:
       --dsn         # The DSN string for the database to connect to
                     # Must conform to:
                     # 'DBI:mysql:database=biosql;host=localhost' 
       --dbuser      # User name to connect with
       --dbpass      # Password to connect with
       --outfile     # Full path to output file that will be created.
   ALTERNATIVE TO --dsn:
       --driver      # DB Driver "mysql", "Pg", "Oracle" 
       --dbname      # Name of database to use
       --host        # Host to connect with (e.g. localhost)
   ADDITIONAL OPTIONS:
       --format      # "newick", "nexus" (default "newick")
       --tree        # Name of the tree to export
       --parent-node # Node to serve as root for a subtree export
       --help        # Print this help message
       --quiet       # Run the program in quiet mode.
       --db-node-id  # Preserve DB node names in export

Full Documentation

Full documentation is available at PhyloSoC:phyexport

Source

phyexport.pl - In progress

Relevant existing code

phyopt - Compute optimization values

The phyopt program will optimize trees in a PhyloDB database by computing transitive closure paths as well as the left and right index values for the nested set indexes.
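
The two kinds of optimization values can be sketched as follows. This is a minimal Python illustration of the general techniques, not the phyopt.pl implementation: `nested_set_indexes` assigns left and right values with a depth-first traversal, and `transitive_closure` lists every ancestor-descendant pair with its distance in edges. The function names and the in-memory dictionaries standing in for database tables are assumptions made for this example.

```python
def nested_set_indexes(children, root):
    """Map each node to its (left, right) nested-set values.

    children: dict mapping a parent ID to a list of child IDs.
    An explicit stack is used instead of recursion so that very large
    trees do not hit the interpreter's recursion limit.
    """
    left, right = {}, {}
    counter = 0
    stack = [(root, False)]
    while stack:
        node, visited = stack.pop()
        counter += 1
        if visited:
            right[node] = counter  # all descendants have been numbered
        else:
            left[node] = counter
            stack.append((node, True))
            # Push children in reverse so they are visited in listed order.
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
    return {node: (left[node], right[node]) for node in left}


def transitive_closure(parent_of):
    """Return sorted (ancestor, descendant, distance) tuples.

    parent_of: dict mapping each node ID to its parent ID (None for the root).
    """
    paths = []
    for node in parent_of:
        ancestor = parent_of[node]
        distance = 1
        while ancestor is not None:
            paths.append((ancestor, node, distance))
            ancestor = parent_of.get(ancestor)
            distance += 1
    return sorted(paths)
```

For the tree ((A,B)AB,C)root with IDs 1 (root), 2 (AB), 3 (A), 4 (B), and 5 (C), the root receives the values (1, 10) and the closure contains six ancestor-descendant pairs.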

Synopsis

 USAGE: phyopt.pl -d 'DBI:mysql:database=biosql;host=localhost' 
                  -u UserName -p dbPass -t MyTree
   REQUIRED ARGUMENTS:
       --dsn        # The DSN string for the database to connect to
                    # Must conform to:
                    # 'DBI:mysql:database=biosql;host=localhost' 
       --dbuser     # User name to connect with
       --dbpass     # Password to connect with
   ALTERNATIVE TO --dsn:
       --driver     # "mysql", "Pg", "Oracle" (default "mysql")
       --dbname     # Name of database to use
       --host       # optional: host to connect with
   ADDITIONAL OPTIONS:
       --tree       # Name of the tree to optimize.
                    # Otherwise the entire db is optimized.
       --quiet      # Run the program in quiet mode.
       --verbose    # Run the program in verbose mode.
   ADDITIONAL INFORMATION:
       --version    # Show the program version     
       --usage      # Show program usage
       --help       # Print short help message
       --man        # Open full program manual

Full Documentation

Full documentation is available at PhyloSoC:phyopt

Source

phyopt.pl

Relevant existing code

phyreport - Print report of information for a tree

Return a standard set of information for a given tree or for the entire database. This includes, among other items, (1) the number of leaf nodes and (2) the node IDs and names of the terminal taxa. The output will be written to the given output file path.
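
The leaf-node portion of such a report could be produced along the following lines. This is a hedged sketch in Python against an in-memory SQLite database, not the phyreport.pl code. The tables `node` and `edge` and their column names are assumptions for this example, and the report layout is invented for illustration. A leaf is taken to be any node that never appears as the parent in an edge, which is found here with a nested query.

```python
import sqlite3


def demo_connection():
    """Build a tiny example tree ((A,B)AB,C)root as node and edge rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (node_id INTEGER PRIMARY KEY, label TEXT)")
    conn.execute("CREATE TABLE edge (child_node_id INTEGER, parent_node_id INTEGER)")
    conn.executemany(
        "INSERT INTO node VALUES (?, ?)",
        [(1, "root"), (2, "AB"), (3, "A"), (4, "B"), (5, "C")],
    )
    conn.executemany(
        "INSERT INTO edge VALUES (?, ?)",
        [(2, 1), (3, 2), (4, 2), (5, 1)],
    )
    return conn


def leaf_report(conn):
    """Return a text report with the leaf count and one 'id<TAB>label' line per leaf."""
    sql = """
        SELECT n.node_id, n.label
        FROM node AS n
        WHERE NOT EXISTS (
            SELECT 1 FROM edge AS e WHERE e.parent_node_id = n.node_id
        )
        ORDER BY n.node_id
    """
    leaves = conn.execute(sql).fetchall()
    lines = [f"Number of leaf nodes: {len(leaves)}"]
    lines += [f"{node_id}\t{label}" for node_id, label in leaves]
    return "\n".join(lines)


if __name__ == "__main__":
    print(leaf_report(demo_connection()))
```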

Synopsis

 Usage: phyreport.pl -o PhyloDbReport.txt
   REQUIRED ARGUMENTS:
       --dsn         # The DSN string for the database to connect to
                     # Must conform to:
                     # 'DBI:mysql:database=biosql;host=localhost' 
       --dbuser      # User name to connect with
       --dbpass      # Password to connect with
       --outfile     # Full path to output file that will be created.
   ALTERNATIVE TO --dsn:
       --driver      # DB Driver "mysql", "Pg", "Oracle" 
       --dbname      # Name of database to use
       --host        # Host to connect with (e.g. localhost)
   ADDITIONAL OPTIONS:
       --tree        # Name of the tree to report on
                     # Otherwise generate report for all trees
       --quiet       # Run the program in quiet mode.
       --verbose     # Run the program in verbose mode.
   ADDITIONAL INFORMATION:
       --version     # Show the program version     
       --usage       # Show program usage
       --help        # Print short help message
       --man         # Open full program manual

Full Documentation

Full documentation is available at PhyloSoC:phyreport

Source

phyreport.pl

Relevant existing code

print-trees.pl

phymod - Modify PhyloDB trees

Modify an existing phylogeny in the database. This will use -x, -c, and -v as command line arguments to indicate remove branch (cut), copy branch (copy), and add branch (paste). The add branch function will at first assume that the user is attempting to add an additional tree from an external file source to an existing database. Future development will allow for cut or copy and paste from one tree in the database to another tree. The program will assume that the user knows the ID of the branch that will be removed or added to. All precomputed fields will be set to null following changes in tree topology. By default, the program will attempt to warn the user before a potentially destructive change is made; these warnings can be turned off with the quiet flag.

DELETE:
To request a delete query, simply specify a node to cut without providing another node to paste to. This will delete the target node and all child nodes. The node attributes, edges, and edge attributes will also be deleted from the database:

PhyMod -d dbName -u dbUserName -x RemoveNodeID [-h dbHost]
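
The delete operation described above can be sketched as follows. This is a Python illustration against an in-memory SQLite database rather than the phymod.pl code. The tables `node` and `edge`, their column names, and the function name `delete_subtree` are assumptions for this example, and node and edge attribute tables are left out for brevity. The sketch collects the target node and its descendants level by level, removes them together with their edges, and then resets the precomputed nested-set values to NULL.

```python
import sqlite3


def demo_connection():
    """Build a tiny example tree ((A,B)AB,C)root with precomputed nested-set values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (node_id INTEGER PRIMARY KEY, label TEXT, "
        "left_idx INTEGER, right_idx INTEGER)"
    )
    conn.execute("CREATE TABLE edge (child_node_id INTEGER, parent_node_id INTEGER)")
    conn.executemany(
        "INSERT INTO node VALUES (?, ?, ?, ?)",
        [(1, "root", 1, 10), (2, "AB", 2, 7), (3, "A", 3, 4),
         (4, "B", 5, 6), (5, "C", 8, 9)],
    )
    conn.executemany("INSERT INTO edge VALUES (?, ?)", [(2, 1), (3, 2), (4, 2), (5, 1)])
    return conn


def delete_subtree(conn, node_id):
    """Delete node_id and all of its descendants; return the deleted IDs, sorted."""
    doomed = [node_id]
    frontier = [node_id]
    while frontier:
        marks = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT child_node_id FROM edge WHERE parent_node_id IN ({marks})",
            frontier,
        ).fetchall()
        frontier = [row[0] for row in rows]
        doomed.extend(frontier)
    marks = ",".join("?" * len(doomed))
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            f"DELETE FROM edge WHERE child_node_id IN ({marks}) "
            f"OR parent_node_id IN ({marks})",
            doomed + doomed,
        )
        conn.execute(f"DELETE FROM node WHERE node_id IN ({marks})", doomed)
        # The topology changed, so the precomputed values are no longer valid.
        conn.execute("UPDATE node SET left_idx = NULL, right_idx = NULL")
    return sorted(doomed)
```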

COPY AND PASTE:
Copy a node from a source tree and place it in the destination tree. If the destination tree name does not exist, a new tree will be created. Note: the source tree name is not required if the node ID is passed, since the node ID is unique in the database.

PhyMod -d dbName -u dbUserName -c SourceNodeID -v DestinationBranchID -t DestinationTreeName [-h dbHost]

CUT AND PASTE:
Cut a node from the source tree and place it in the destination tree. If the destination tree name does not exist, a new tree will be created. The data from the source tree will be deleted.

PhyMod -d dbName -u dbUserName -x CutNodeID -v DestNodeID [-h dbHost]
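
For the simplest case, where the cut node and the destination node are in the same tree, cut and paste amounts to re-pointing one edge. The sketch below shows this in Python against an in-memory SQLite database. It is not the phymod.pl implementation, and the tables `node` and `edge`, their columns, and the function name `cut_and_paste` are assumptions for this example. It also shows one kind of safety check that could be made before the change: refusing to paste a branch into its own subtree, which would otherwise create a cycle.

```python
import sqlite3


def demo_connection():
    """Build a tiny example tree ((A,B)AB,C)root with precomputed nested-set values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (node_id INTEGER PRIMARY KEY, label TEXT, "
        "left_idx INTEGER, right_idx INTEGER)"
    )
    conn.execute("CREATE TABLE edge (child_node_id INTEGER, parent_node_id INTEGER)")
    conn.executemany(
        "INSERT INTO node VALUES (?, ?, ?, ?)",
        [(1, "root", 1, 10), (2, "AB", 2, 7), (3, "A", 3, 4),
         (4, "B", 5, 6), (5, "C", 8, 9)],
    )
    conn.executemany("INSERT INTO edge VALUES (?, ?)", [(2, 1), (3, 2), (4, 2), (5, 1)])
    return conn


def cut_and_paste(conn, cut_id, dest_id):
    """Move the branch rooted at cut_id so that it hangs below dest_id."""
    parent_sql = "SELECT parent_node_id FROM edge WHERE child_node_id = ?"
    if conn.execute(parent_sql, (cut_id,)).fetchone() is None:
        raise ValueError("node to cut has no parent edge")
    # Walk from the destination up to the root; meeting cut_id on the way
    # means the destination lies inside the branch being moved.
    current = dest_id
    while current is not None:
        if current == cut_id:
            raise ValueError("destination lies inside the branch being cut")
        row = conn.execute(parent_sql, (current,)).fetchone()
        current = row[0] if row else None
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE edge SET parent_node_id = ? WHERE child_node_id = ?",
            (dest_id, cut_id),
        )
        # The topology changed, so the precomputed values are no longer valid.
        conn.execute("UPDATE node SET left_idx = NULL, right_idx = NULL")
```

Moving a branch between two different trees would additionally require updating whatever column ties a node to its tree, which this sketch does not model.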

Synopsis

 Usage: phymod.pl
   REQUIRED ARGUMENTS:
       --dsn        # The DSN string for the DB connection
       --dbuser     # User name to connect with
       --dbpass     # User password to connect with
       --infile     # Full path to the tree file to import to the db
       --format     # "newick", "nexus" (default "newick")
   ALTERNATIVE TO --dsn:
       --driver     # DB Driver "mysql", "Pg", "Oracle" 
       --dbname     # Name of database to use
       --host       # Host to connect with (e.g. localhost)
   ADDITIONAL OPTIONS:
       --tree       # Tree name to use
       --quiet      # Run the program in quiet mode.
       --verbose    # Run the program in verbose mode.
   ADDITIONAL INFORMATION:
       --version    # Show the program version     
       --usage      # Show program usage
       --help       # Print short help message
       --man        # Open full program manual

Full Documentation

Full documentation is available at: PhyloSoC:phymod

Source

phymod.pl - In progress

phymodwork.sql - In progress


Relevant existing code

References

The following references are relevant to the goals of this project:

I would appreciate any input for further references --Jestill 12:19, 19 April 2007 (EDT)