GSOC2010. BioPerl Alignment Subsystem Refactoring: Example code of the new alignment method/package

I just found that BioPerl already has a package, Bio::DB::Fasta, that loads a FASTA file through an index (stored via AnyDBM_File). Using this package, we could implement a method in Bio::AlignIO that generates Bio::PrimarySeq objects for the FASTA sequences, and the resulting alignment could still use most of the methods in Bio::SimpleAlign.
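To make the idea of index-based loading concrete, here is a minimal sketch in core Perl. It is not how Bio::DB::Fasta is implemented; it only illustrates the general principle of scanning the file once, remembering where each record starts, and reading a sequence only when it is asked for. The function names build_index and fetch_seq are hypothetical, and the index is kept in an in-memory hash rather than a DBM file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scan a FASTA file once and record, for each ID, the byte offset of the
# first sequence line. No sequence data is kept in memory.
# (build_index is a hypothetical helper, not a BioPerl function.)
sub build_index {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %offset;
    while (my $line = <$fh>) {
        if ($line =~ /^>(\S+)/) {
            $offset{$1} = tell $fh;    # position just after the header line
        }
    }
    close $fh;
    return \%offset;
}

# Read a single sequence on demand by seeking to its recorded offset.
# (fetch_seq is a hypothetical helper, not a BioPerl function.)
sub fetch_seq {
    my ($path, $index, $id) = @_;
    return unless exists $index->{$id};
    open my $fh, '<', $path or die "Cannot open $path: $!";
    seek $fh, $index->{$id}, 0;
    my $seq = '';
    while (my $line = <$fh>) {
        last if $line =~ /^>/;         # stop at the next record
        $line =~ s/\s+//g;             # drop newlines and other whitespace
        $seq .= $line;
    }
    close $fh;
    return $seq;
}

# Small demonstration on a temporary two-record aligned FASTA file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} ">seq1 first\nMKV-LA\nGG\n>seq2\nMKVQLA\n";
close $tmp_fh;

my $index = build_index($tmp_path);
print join(',', sort keys %$index), "\n";
print fetch_seq($tmp_path, $index, 'seq1'), "\n";
```

Only the offsets live in memory, so an alignment with many long sequences can be opened cheaply, and each sequence is read from disk only when it is actually needed.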

At the moment this code does not work, because the add_seq method in Bio::SimpleAlign does not accept Bio::PrimarySeq objects (it expects Bio::LocatableSeq objects). This may be solved in the future.

#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;
use Bio::DB::Fasta;
use Bio::SimpleAlign;

my $in = Bio::AlignIO->new(-file   => "clustalw2-pumpkin_aa_edi.fst",
                           -format => 'fasta');

# next_locatable_aln is the proposed new method (defined below), not an existing one
my $aln = $in->next_locatable_aln;

print $aln->num_sequences, "\n";
print $aln->percentage_identity, "\n";

foreach my $seq ($aln->each_seq()) {
    # $seq only loads its sequence data when we call $seq->seq()
    # do something
}

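#############
# New method in Bio::AlignIO::fasta
#############
use Bio::DB::Fasta;

sub next_locatable_aln {
    my $self = shift;

    my $aln = Bio::SimpleAlign->new();

    # Index the FASTA file; sequences are fetched from disk on demand
    my $db     = Bio::DB::Fasta->new($self->file);
    my $stream = $db->get_PrimarySeq_stream();

    # next_seq returns one sequence per call, so loop until the stream is exhausted
    while (my $seq = $stream->next_seq()) {
        $aln->add_seq($seq);
    }

    # Return the alignment itself, since the caller invokes methods on it
    return $aln;
}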

Monday, June 14, 2010