I just found that BioPerl already includes a package, Bio::DB::Fasta, that loads a FASTA file through an on-disk index (built with AnyDBM_File). Using this package, we could implement a method in Bio::AlignIO that produces Bio::PrimarySeq objects for the FASTA sequences, and the resulting alignment could still use most of the methods in Bio::SimpleAlign.
At the moment, this code still doesn't work, because the add_seq method in Bio::SimpleAlign does not accept Bio::PrimarySeq objects (it expects Bio::LocatableSeq objects). This may be solved in the future.
#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;
use Bio::DB::Fasta;
use Bio::SimpleAlign;
my $in = Bio::AlignIO->new(-file   => "clustalw2-pumpkin_aa_edi.fst",
                           -format => 'fasta');
# next_locatable_aln is the proposed new method shown further down
my $aln = $in->next_locatable_aln;
print $aln->num_sequences, "\n";
print $aln->percentage_identity, "\n";
foreach my $seq ($aln->each_seq()) {
    # $seq only loads its sequence data when we call $seq->seq()
    # do something
}
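To make the "load only when asked" idea concrete, here is a rough pure-Perl sketch of offset-based indexing. It is only an illustration of the general technique, not how Bio::DB::Fasta is actually implemented, and the helper names build_fasta_index and fetch_sequence are made up for this post. The index stores byte offsets per sequence ID, and the sequence text is read from disk only when it is requested.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the FASTA file once and record, for each ID,
# the byte offsets where its sequence lines start and end.
sub build_fasta_index {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my (%index, $id);
    while (1) {
        my $pos  = tell($fh);          # offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        if ($line =~ /^>(\S+)/) {
            # A new header closes the previous record
            $index{$id}{end} = $pos if defined $id;
            $id = $1;
            $index{$id} = { start => tell($fh) };
        }
    }
    # The last record runs to the end of the file
    $index{$id}{end} = tell($fh) if defined $id;
    close($fh);
    return \%index;
}

# Hypothetical helper: read one sequence from disk using the stored offsets.
sub fetch_sequence {
    my ($path, $index, $id) = @_;
    my $entry = $index->{$id} or die "Unknown sequence ID: $id";
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    seek($fh, $entry->{start}, 0) or die "Cannot seek in $path: $!";
    my $buf = '';
    read($fh, $buf, $entry->{end} - $entry->{start});
    close($fh);
    $buf =~ s/\s+//g;                  # drop line breaks, keep gap characters
    return $buf;
}
```

With these two helpers, only the index (IDs and offsets) stays in memory, and a call such as fetch_sequence($path, $index, 'some_id') pulls a single sequence from the file. Bio::DB::Fasta additionally keeps the index on disk so it does not have to be rebuilt on every run.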
#############
# New method in Bio::AlignIO::fasta
#############
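use Bio::DB::Fasta;
sub next_locatable_aln {
    my $self = shift;
    my $aln = Bio::SimpleAlign->new();
    # Index the FASTA file (or reuse an index that already exists)
    my $db = Bio::DB::Fasta->new($self->{"_file"});
    my $stream = $db->get_PrimarySeq_stream();
    # next_seq() returns one sequence per call, so loop until the stream is exhausted
    while (my $seq = $stream->next_seq()) {
        $aln->add_seq($seq);
    }
    # Return the alignment itself so the caller can call methods on it
    return $aln;
}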
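Until add_seq accepts Bio::PrimarySeq objects, one possible stopgap is to wrap each sequence in a Bio::LocatableSeq before adding it. The sketch below assumes BioPerl is installed and that the FASTA file is already aligned. The function name aln_from_indexed_fasta is made up for this post, and I have not checked it against every BioPerl release. Note that it calls seq() on every entry, so it gives up the lazy loading that motivated the idea in the first place.

```perl
use strict;
use warnings;
use Bio::DB::Fasta;
use Bio::LocatableSeq;
use Bio::SimpleAlign;

# Hypothetical helper: build a Bio::SimpleAlign from an indexed, aligned FASTA file
sub aln_from_indexed_fasta {
    my ($path) = @_;
    my $db     = Bio::DB::Fasta->new($path);      # builds or reuses the index
    my $stream = $db->get_PrimarySeq_stream();
    my $aln    = Bio::SimpleAlign->new();
    while (my $seq = $stream->next_seq()) {
        # Copy ID and residues into an object type that add_seq accepts.
        # This reads the sequence eagerly, so nothing is lazy any more.
        my $loc = Bio::LocatableSeq->new(
            -id    => $seq->display_id,
            -seq   => $seq->seq,
            -start => 1,
        );
        $aln->add_seq($loc);
    }
    return $aln;
}
```

A call like my $aln = aln_from_indexed_fasta("clustalw2-pumpkin_aa_edi.fst"); should then return an alignment on which num_sequences and percentage_identity can be called as in the script above.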