Two new Bio::SimpleAlign methods have been written and tested:

Generate and set unique short names

"$aln->set_displayname_safe" assigns serial names to the sequences and returns the original names in a hash, so that they can be restored later.


      Title     : set_displayname_safe
      Usage     : ($new_aln, $ref_name)=$ali->set_displayname_safe()
      Function  : Assign machine-generated serial names to sequences in input order.
                  Designed to protect names during PHYLIP runs. Assign 10-char string
                  in the form of "S000000001" to "S999999999". Restore the original
                  names using "restore_displayname".
      Returns   : 1. a new $aln with system names;
                  2. a hash ref for restoring names
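
The naming scheme itself can be illustrated without BioPerl. The following is a minimal sketch, not the method's actual implementation: "make_safe_names" is a hypothetical helper that works on a plain list of names, the example names are made up, and keying the returned hash by the generated name is an assumption of the sketch. The 10-character width matches the name field of the classic PHYLIP input format.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SimpleAlign): map each original name,
# in input order, to a 10-character serial name "S000000001", "S000000002", ...
# Assumption: the lookup hash is keyed by the generated name.
sub make_safe_names {
    my @original = @_;
    my (@safe, %name_of);
    my $count = 0;
    for my $name (@original) {
        $count++;
        my $serial = sprintf("S%09d", $count);   # "S" plus 9 zero-padded digits
        push @safe, $serial;
        $name_of{$serial} = $name;               # remember the original name
    }
    return (\@safe, \%name_of);
}

# Illustrative names only
my ($safe, $name_of) = make_safe_names(
    'Homo_sapiens_mitochondrion_cytb',
    'Pan_troglodytes_mitochondrion_cytb',
);
print "$_ => $name_of->{$_}\n" for @$safe;
```

Because the serial names depend only on input order, the hash is the sole record of the original names and has to be kept until the names are restored.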

Restore long names



      Title     : restore_displayname
      Usage     : $aln_name_restored=$ali->restore_displayname($hash_ref)
      Function  : Restore original sequence names (after running $ali->set_displayname_safe)
      Returns   : a new $aln with names restored.
      Argument  : a hash reference of names from "set_displayname_safe".
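
Restoring is the reverse lookup. As a hedged illustration only, the sketch below uses a hypothetical "restore_names" helper and a hand-written hash in place of the one returned by "set_displayname_safe"; it assumes the hash is keyed by serial name, and it passes unknown names through unchanged, which is a choice of this sketch rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SimpleAlign): translate serial names
# back to the originals. Names missing from the map are returned unchanged.
sub restore_names {
    my ($name_of, @names) = @_;
    return map { exists $name_of->{$_} ? $name_of->{$_} : $_ } @names;
}

# Hand-written stand-in for the hash ref returned by set_displayname_safe
my %name_of = (
    S000000001 => 'Homo_sapiens_mitochondrion_cytb',
    S000000002 => 'Pan_troglodytes_mitochondrion_cytb',
);

# Every bootstrap replicate carries the same serial names,
# so a single map is enough to restore all of them.
my @restored = restore_names(\%name_of, 'S000000002', 'S000000001', 'unknown');
print join("\n", @restored), "\n";
```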

Application: making bootstrapped data sets with PHYLIP

Files modified:
AlignIO/ (now stores name_ref in a new $io field {-safe_name=>$ref})
Run/Phylo/Phylip/ (the "run" methods now return $name_ref in addition to the $aln objects)


#!/usr/bin/perl -w
# Run SeqBoot (Phylip) without corrupting your sequence names

use Bio::AlignIO;
use Bio::Tools::Run::Phylo::Phylip::SeqBoot;
use Data::Dumper;

# Read the alignment whose long names need protecting
my $long_name_file = shift @ARGV;
my $in  = Bio::AlignIO->new(-file => $long_name_file);
my $aln = $in->next_aln();

my @params = ('datatype' => 'SEQUENCE', 'replicates' => 5);
my $seq = Bio::Tools::Run::Phylo::Phylip::SeqBoot->new(@params);

# Unlike the original "run", this version also returns the saved names in $name_ref
my ($aln_ref, $name_ref) = $seq->run($aln);

# An output file name must follow ">" for this stream to be usable
my $aio = Bio::AlignIO->new(-file => ">", -format => "clustalw");
foreach my $ai (@{$aln_ref}) {
    $ai = $ai->restore_displayname($name_ref); # restore sequence names
}