Friday, July 30, 2010

Bio::DB::Align::Rfam implemented

Bio::DB::Align::Rfam is implemented. However, a test using the pfam alignment format failed, which suggests a problem in Bio::AlignIO::pfam. That needs to be solved later.

Bio::DB::Align::Rfam now provides alignment retrieval by accession.

get_Aln_by_acc #retrieves the alignment by accession and returns a Bio::SimpleAlign object



An example

my $dbobj=Bio::DB::Align->new(-db=>"rfam");
my $aln=$dbobj->get_Aln_by_acc("RF00001"); #Rfam accessions are "RF" plus five digits
my $aln2=$dbobj->get_Aln_by_acc(-accession=>"RF00001",-alignment=>"full");

print $aln->length();

foreach my $seq ($aln->each_Seq) {
#do something
}

Friday, July 23, 2010

Bio::DB::Align::Pfam implemented

Bio::DB::Align::Pfam is implemented. This package uses the Pfam RESTful service to retrieve alignment sequences from Pfam.

The package now includes these methods:

new
get_Aln_by_id #retrieves the alignment by ID and returns a Bio::SimpleAlign object
get_Aln_by_acc #retrieves the alignment by accession and returns a Bio::SimpleAlign object
id2acc #converts an ID to an accession


An example

my $dbobj=Bio::DB::Align->new(-db=>"pfam");
my $aln=$dbobj->get_Aln_by_acc("PF00001"); #Pfam accessions are "PF" plus five digits
my $aln2=$dbobj->get_Aln_by_acc(-accession=>"PF00001",-alignment=>"full");

print $aln->length();

foreach my $seq ($aln->each_Seq) {
#do something
}

Wednesday, July 21, 2010

How can I access resources on UniProt programmatically?

Here is the answer from UniProt: its resources can be accessed through a RESTful service.

http://www.uniprot.org/faq/28
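
As a rough illustration of what RESTful access means here, the sketch below builds an entry URL and fetches it with HTTP::Tiny. The base URL and the "accession.format" pattern are assumptions on my part and may differ from what the FAQ above describes, so check the UniProt documentation before relying on them. The helper names uniprot_url and fetch_entry are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Assumed base URL for UniProt entries; verify it against the UniProt docs.
my $BASE = 'https://rest.uniprot.org/uniprotkb';

# Build the URL for one entry, e.g. accession "P12345" in FASTA format.
sub uniprot_url {
    my ($acc, $format) = @_;
    $format ||= 'fasta';
    return "$BASE/$acc.$format";
}

# Fetch the entry and return the response body, or die on an HTTP error.
# (HTTPS requests with HTTP::Tiny need IO::Socket::SSL to be installed.)
sub fetch_entry {
    my ($acc, $format) = @_;
    my $res = HTTP::Tiny->new(timeout => 30)->get(uniprot_url($acc, $format));
    die "Request failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

print uniprot_url('P12345'), "\n";
```

Calling fetch_entry('P12345') would perform the actual request; it is left out of the example so the sketch runs without network access.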

Friday, July 16, 2010

Summary of Bio::SimpleAlign

A summary of what has been done for Bio::SimpleAlign:

http://spreadsheets.google.com/ccc?key=0AssLTcJFJMbXdDFfZGpJZlhidFY5blBneGdhQUZ6WFE&hl=en&authkey=CJTCw4QL

The spreadsheet lists which methods are new, which are renamed, which are rewritten, and so on.

Wednesday, July 14, 2010

First trial run for Bio::DB::Align::Pfam succeeded

I just did a trial run of some methods that retrieve Pfam alignment sequences, and it succeeded.

At the moment, this package (Bio::DB::Align::Pfam) aims to retrieve alignment sequences from Pfam using the accession number provided by the user. It can be called like this:


my $obj = Bio::DB::Align::Pfam->new(); #or Bio::DB::Align->new(-database=>"Pfam") ???
my $aln = $obj->get_Aln_by_acc(-accession=>"PF00653",-alnType=>"seed",-format=>"fasta",-order=>"t",-case=>"l",-gap=>"default");

or simply as

my $aln = $obj->get_Aln_by_acc("PF00653");

With just one or two calls, the package returns a Bio::SimpleAlign object.
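
To show how one method can accept both calling styles above, here is a minimal sketch in plain Perl. It is not the code in Bio::DB::Align::Pfam: the helper name parse_acc_args, the lowercased option names, and the defaults ("seed", "fasta") are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: normalise the arguments of a get_Aln_by_acc-style call.
# Accepts either a bare accession or -name => value pairs.
sub parse_acc_args {
    my @args = @_;
    # Assumed defaults, chosen only for this sketch.
    my %opt = (accession => undef, alntype => 'seed', format => 'fasta');

    # A single argument is treated as the accession itself.
    if (@args == 1) {
        $opt{accession} = $args[0];
        return \%opt;
    }

    die "Expected one accession or -name => value pairs\n" if @args % 2;
    my %named = @args;
    for my $key (keys %named) {
        (my $name = lc $key) =~ s/^-//;    # "-alnType" becomes "alntype"
        die "Unknown parameter: $key\n" unless exists $opt{$name};
        $opt{$name} = $named{$key};
    }
    die "No accession given\n" unless defined $opt{accession};
    return \%opt;
}

my $simple = parse_acc_args("PF00653");
my $named  = parse_acc_args(-accession => "PF00653", -alnType => "full");
print "$simple->{accession} $simple->{alntype}\n";   # PF00653 seed
print "$named->{accession} $named->{alntype}\n";     # PF00653 full
```

Stripping the leading dash and lowercasing the key keeps the lookup independent of how the caller capitalises the parameter name.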

Tuesday, July 13, 2010

A possible structure for Bio::DB::Align

The next task is to implement a package that retrieves alignment sequences from online databases and returns a Bio::SimpleAlign object. One possible way of doing that is to implement these packages:

Bio::DB::Align (folder)
AlignI.pm (Interface listing the methods that need to be implemented)
Pfam.pm (Implementation of retrieving alignment data from Pfam)
Uniprot.pm (Implementation of retrieving alignment data from Uniprot)

Probably, in AlignI, the methods are
get_Align_by_id
get_Align_by_acc

In an implementation, e.g. Pfam.pm, the package will implement both Bio::DB::Align::AlignI for the alignment retrieval methods and Bio::DB::GenericWebAgent for the web-related methods.

For simplicity, Bio::DB::Align::Pfam and Bio::DB::Align::Uniprot will only implement alignment-related methods. Packages that retrieve other information from these two databases can be implemented later as Bio::DB::Pfam and Bio::DB::Uniprot.
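
The interface-plus-implementation layout described above can be sketched in plain Perl as follows. The packages My::AlignI and My::Pfam are hypothetical stand-ins for Bio::DB::Align::AlignI and Bio::DB::Align::Pfam, and the method body returns a placeholder string rather than a real Bio::SimpleAlign object.

```perl
use strict;
use warnings;

# Hypothetical interface: declares the methods and dies if they are not overridden.
package My::AlignI;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

sub get_Align_by_id {
    my $self = shift;
    die ref($self) . " must implement get_Align_by_id\n";
}

sub get_Align_by_acc {
    my $self = shift;
    die ref($self) . " must implement get_Align_by_acc\n";
}

# Hypothetical implementation: overrides only what it supports so far.
package My::Pfam;
our @ISA = ('My::AlignI');

sub get_Align_by_acc {
    my ($self, $acc) = @_;
    # A real implementation would query the database and build an alignment object.
    return "alignment for $acc";
}

package main;

my $db = My::Pfam->new;
print $db->get_Align_by_acc("PF00653"), "\n";    # alignment for PF00653

# The inherited stub reports the method that is still missing.
my $ok = eval { $db->get_Align_by_id("some_id"); 1 };
print "get_Align_by_id is not implemented yet\n" unless $ok;
```

Keeping the stubs in the interface means every database package fails loudly, with a clear message, for any method it has not implemented yet.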

Friday, July 9, 2010

The structure of Bio::SimpleAlign

Alignment modifier methods:
    add_seq
    remove_LocatableSeq
    remove_Seqs
    remove_redundant_Seqs
    uniq_seq
    remove_columns
    sort_alphabetically
    sort_by_list
    sort_by_pairwise_identity
    sort_by_length
    sort_by_start
    set_new_reference

Alignment selection methods:
    each_seq
    each_alphabetically
    each_seq_with_id
    get_seq_by_pos
    get_seq_by_id
    select_Seqs
    select_columns
    remove_gaps
    mask_columns
    seq_with_features

Change sequences within the MSA:
    map_chars
    uppercase
    lowercase
    togglecase
    match
    unmatch

Consensus sequences:
    consensus_string
    consensus_iupac
    consensus_meta
    bracket_string
    cigar_line
    match_line
    gap_line
    all_gap_line
    gap_col_matrix

MSA attributes:
    id
    accession
    description
    source
    missing_char
    match_char
    gap_char
    mask_char
    symbol_chars

Alignment descriptors:
    score
    is_flush
    length
    maxdisplayname_length
    max_metaname_length
    num_residues
    num_sequences
    average_percentage_identity
    percentage_identity
    overall_percentage_identity
    pairwise_percentage_identity
    column_from_residue_number

Sequence names:
    displayname
    set_displayname_count
    set_displayname_flat
    set_displayname_normal
    set_displayname_safe
    restore_displayname

Methods implementing Bio::FeatureHolderI:
    get_SeqFeatures
    add_SeqFeature
    remove_SeqFeatures
    feature_count
    get_all_SeqFeatures

Methods for Bio::AnnotatableI:
    annotation

Major improvements on Bio::SimpleAlign

The cleanup of Bio::SimpleAlign is 99% finished, though a few more tests may be needed. Here are the major improvements to Bio::SimpleAlign:

1. MSA modifying and selection methods are more consistent and easier to use. I have enabled multiple/reverse selection for all sequence and column selection methods, and changed the names to be easier to understand.
2. Gap chars and missing chars are handled more consistently in the package.
3. Some redundant methods are removed, and the remaining methods are moved to more reasonable categories.
4. Some methods are renamed. Methods that select or return objects are capitalized, e.g. each_seq becomes each_Seq. (Renaming each_Seq and add_Seq may break some packages that depend on Bio::SimpleAlign, because these methods are so widely used. I will find those uses and replace them.)

Thursday, July 8, 2010

Retrieving online alignment sequences

The Pfam database provides a web service for retrieving information. I may start by writing a wrapper for it.

http://pfam.sanger.ac.uk/help#tabview=tab10

This may require implementing a Bio::DB::Pfam package.

Tuesday, July 6, 2010

Enable multiple/toggle selection feature in Bio::SimpleAlign

One major new feature in Bio::SimpleAlign is multiple and toggle selection for the methods that select sequences or columns. A selection can now be written as:

$newaln=$aln->select_Seqs([4..10,20..35,37]);
$newaln=$aln->select_Seqs(-selection=>[4..10,20..35,37]);

Or you can toggle the selection (reverse selection) using:
$newaln=$aln->select_Seqs([4..10,20..35,37],1);
$newaln=$aln->select_Seqs(-selection=>[4..10,20..35,37],-toggle=>1);

The coordinates of the sequences or columns are 1-based. This new feature makes selection much easier for users. It affects these methods:

select_Seqs
remove_Seqs
select_columns
remove_columns
mask_columns
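
The selection semantics described above (1-based positions, with an optional toggle that inverts the selection) can be sketched in plain Perl. This is not the Bio::SimpleAlign implementation: select_items is a hypothetical helper working on an ordinary array, and returning the items in their original order is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: pick items by 1-based position, or everything
# except those positions when $toggle is true.
sub select_items {
    my ($items, $selection, $toggle) = @_;
    my $n = @$items;

    my %picked;
    for my $pos (@$selection) {
        die "Position $pos is outside 1..$n\n" if $pos < 1 || $pos > $n;
        $picked{$pos} = 1;
    }

    # Walk the positions in order, keeping picked ones (or unpicked ones if toggled).
    my @keep = grep { $toggle ? !$picked{$_} : $picked{$_} } 1 .. $n;

    # Convert the 1-based positions back to 0-based array indices.
    return [ map { $items->[ $_ - 1 ] } @keep ];
}

my @seqs = qw(seq1 seq2 seq3 seq4 seq5 seq6 seq7 seq8);
print join(",", @{ select_items(\@seqs, [ 2 .. 4, 7 ]) }), "\n";       # seq2,seq3,seq4,seq7
print join(",", @{ select_items(\@seqs, [ 2 .. 4, 7 ], 1) }), "\n";    # seq1,seq5,seq6,seq8
```

With one helper of this shape, a remove-style method can be expressed as the select-style method with the toggle set, which is one way the five methods listed above could share their selection logic.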

Monday, July 5, 2010

Back to work

Back from the conference and vacation.

Continue coding now~~ o.o