Difference between revisions of "NameIssue"

From Phyloinformatics
[[Category:Phylohackathon 1]]

Latest revision as of 12:50, 5 September 2007

Two new Bio::SimpleAlign methods have been written and tested:

Generate and set unique short names

"$aln->set_displayname_safe" assigns serial names to the sequences and returns the original names in a hash, so they can be restored later.

    set_displayname_safe

      Title     : set_displayname_safe
      Usage     : ($new_aln, $ref_name)=$ali->set_displayname_safe()
      Function  : Assign machine-generated serial names to sequences in input order.
                  Designed to protect names during PHYLIP runs. Assign 10-char string
                  in the form of "S000000001" to "S999999999". Restore the original
                  names using "restore_displayname".
      Returns   : 1. a new $aln with system names;
                  2. a hash ref for restoring names
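
The naming scheme described above can be sketched in plain Perl. This is only an illustration, not the Bio::SimpleAlign code: the helper name make_safe_names and the layout of the hash (serial name as key, original name as value) are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SimpleAlign): assign serial names
# "S000000001", "S000000002", ... to the given names in input order.
# Assumed hash layout: serial name => original name.
sub make_safe_names {
    my @names = @_;
    my (@safe, %safe_to_orig);
    my $serial = 0;
    for my $name (@names) {
        my $safe = sprintf 'S%09d', ++$serial;   # "S" plus 9 digits = 10 characters
        push @safe, $safe;
        $safe_to_orig{$safe} = $name;
    }
    return (\@safe, \%safe_to_orig);
}

my @long_names = ('Homo_sapiens_isolate_A', 'Homo_sapiens_isolate_B');
my ($safe, $map) = make_safe_names(@long_names);
print "$_ => $map->{$_}\n" for @$safe;
```

Because every generated name is exactly 10 characters long and unique, two long names that share their first 10 characters no longer risk being confused once the names are shortened.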

Restore long names

"$aln->restore_displayname" restores the original sequence names from the hash returned by "set_displayname_safe".

     restore_displayname

      Title     : restore_displayname
      Usage     : $aln_name_restored=$ali->restore_displayname($hash_ref)
      Function  : Restore original sequence names (after running $ali->set_displayname_safe)
      Returns   : a new $aln with names restored.
      Argument  : a hash reference of names from "set_displayname_safe".
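
The restore step amounts to a hash lookup per sequence name. A minimal sketch in plain Perl, assuming the hash is keyed by serial name; restore_names is a hypothetical helper, not the Bio::SimpleAlign method:

```perl
use strict;
use warnings;

# Hypothetical helper: translate serial names back to the original names
# using a hash ref keyed by serial name (assumed layout).
# Names that are not in the hash are passed through unchanged.
sub restore_names {
    my ($map, @names) = @_;
    return map { exists $map->{$_} ? $map->{$_} : $_ } @names;
}

my %map = (
    S000000001 => 'Homo_sapiens_isolate_A',
    S000000002 => 'Homo_sapiens_isolate_B',
);
my @restored = restore_names(\%map, 'S000000002', 'S000000001');
print join(', ', @restored), "\n";
```

Under this assumption the same hash can be reused for every alignment that carries the same set of serial names.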

Application: making bootstrapped data sets with PHYLIP

Files modified:

SimpleAlign.pm
SimpleAlign.t
AlignIO/phylip.pm (now stores name_ref in a new $io field {-safe_name=>$ref})
Run/Phylo/Phylip/SeqBoot.pm (the "run" method now returns $name_ref in addition to the $aln objects)

<perl>
#!/usr/bin/perl
# Run SeqBoot (Phylip) without corrupting your sequence names
use strict;
use warnings;

use Bio::AlignIO;
use Bio::Tools::Run::Phylo::Phylip::SeqBoot;

# Read the alignment that carries the long sequence names
my $long_name_file = shift @ARGV or die "Usage: $0 <alignment_file>\n";
my $in  = Bio::AlignIO->new(-file => $long_name_file);
my $aln = $in->next_aln();

my @params = ('datatype' => 'SEQUENCE', 'replicates' => 5);
my $seq = Bio::Tools::Run::Phylo::Phylip::SeqBoot->new(@params);

# Difference from the original "run": the names are saved in $name_ref
my ($aln_ref, $name_ref) = $seq->run($aln);

my $aio = Bio::AlignIO->new(-file => ">alignment.bootstrap.new", -format => "clustalw");
foreach my $ai (@{$aln_ref}) {
    $ai = $ai->restore_displayname($name_ref);   # restore sequence names
    $aio->write_aln($ai);
}
</perl>