Bio::DB::Align::Rfam is now implemented. However, one test using the Pfam alignment format failed; the problem appears to be in Bio::AlignIO::pfam and will need to be fixed later.
Bio::DB::Align::Rfam currently provides one retrieval method, by accession:
get_Aln_by_acc #retrieves the alignment by accession and returns a Bio::SimpleAlign object
An example
use strict;
use warnings;
use Bio::DB::Align;
my $dbobj=Bio::DB::Align->new(-db=>"rfam");
my $aln=$dbobj->get_Aln_by_acc("RF00001");
my $aln2=$dbobj->get_Aln_by_acc(-accession=>"RF00001",-alignment=>"full");
print $aln->length(), "\n";
foreach my $seq ($aln->each_Seq) {
    # do something with each sequence object
}
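Rfam and Pfam accessions have a fixed shape: a two-letter prefix (RF or PF) followed by five digits, e.g. RF00001 or PF00001. A cheap format check before contacting the server gives a clearer error than a failed request. The sketch below assumes a hypothetical helper, `valid_accession`, which is not part of Bio::DB::Align:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DB::Align): check that an accession
# has the expected database prefix followed by exactly five digits.
my %PREFIX = (rfam => 'RF', pfam => 'PF');

sub valid_accession {
    my ($db, $acc) = @_;
    my $prefix = $PREFIX{ lc $db } or die "unknown database '$db'\n";
    return $acc =~ /\A\Q$prefix\E\d{5}\z/ ? 1 : 0;
}

for my $acc (qw(RF00001 RF0001 PF00001)) {
    printf "%-8s rfam:%d\n", $acc, valid_accession('rfam', $acc);
}
```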
Friday, July 30, 2010
Friday, July 23, 2010
Bio::DB::Align::Pfam implemented
Bio::DB::Align::Pfam is implemented. This package uses the Pfam RESTful service to retrieve alignments from Pfam.
The package now includes these methods:
new
get_Aln_by_id #retrieves the alignment by ID and returns a Bio::SimpleAlign object
get_Aln_by_acc #retrieves the alignment by accession and returns a Bio::SimpleAlign object
id2acc #converts an ID to an accession
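Because id2acc presumably needs a round trip to Pfam for each ID, caching the answers is one way to avoid repeated requests. A minimal sketch, assuming a hypothetical factory `make_id2acc` that takes a fetch callback in place of the real web request (the lookup data below are made up, and none of these names come from the package):

```perl
use strict;
use warnings;

# Hypothetical sketch: an ID-to-accession converter that caches results so
# each ID is looked up only once. $fetch stands in for the real web request;
# it takes an ID and returns an accession, or undef if the ID is unknown.
sub make_id2acc {
    my ($fetch) = @_;
    my %cache;
    return sub {
        my ($id) = @_;
        # Cache misses too (undef), so unknown IDs are not re-requested.
        $cache{$id} = $fetch->($id) unless exists $cache{$id};
        return $cache{$id};
    };
}

# Offline stand-in for the web service, with made-up data.
my $calls   = 0;
my %fake_db = (Family_A => 'PF99991');
my $id2acc  = make_id2acc(sub { $calls++; return $fake_db{ $_[0] } });

my $acc = $id2acc->('Family_A');
$id2acc->('Family_A');    # second call is served from the cache
print "$acc (remote lookups: $calls)\n";
```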
An example
use strict;
use warnings;
use Bio::DB::Align;
my $dbobj=Bio::DB::Align->new(-db=>"pfam");
my $aln=$dbobj->get_Aln_by_acc("PF00001");
my $aln2=$dbobj->get_Aln_by_acc(-accession=>"PF00001",-alignment=>"full");
print $aln->length(), "\n";
foreach my $seq ($aln->each_Seq) {
    # do something with each sequence object
}
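get_Aln_by_acc accepts either a bare accession or named -accession/-alignment arguments. BioPerl modules usually handle this with `_rearrange` from Bio::Root::RootI; the plain-Perl sketch below shows the same idea without BioPerl. The helper name `parse_aln_args` and the "seed" default are assumptions, not taken from the post:

```perl
use strict;
use warnings;

# Sketch of dual calling conventions: one positional accession, or named
# arguments whose keys start with '-'. The 'seed' default is an assumption.
sub parse_aln_args {
    my @args = @_;
    my %opt;
    if (@args == 1) {
        # Positional form: get_Aln_by_acc("PF00001")
        $opt{accession} = $args[0];
    }
    elsif (@args && @args % 2 == 0 && $args[0] =~ /^-/) {
        # Named form: get_Aln_by_acc(-accession => ..., -alignment => ...)
        my %raw = @args;
        while (my ($k, $v) = each %raw) {
            (my $key = lc $k) =~ s/^-//;
            $opt{$key} = $v;
        }
    }
    else {
        die "usage: ACC, or -accession => ACC [, -alignment => TYPE]\n";
    }
    die "no accession given\n" unless defined $opt{accession};
    $opt{alignment} //= 'seed';
    return \%opt;
}

my $p = parse_aln_args("PF00001");
my $n = parse_aln_args(-accession => "PF00001", -alignment => "full");
print "$p->{accession} $p->{alignment}\n";
print "$n->{accession} $n->{alignment}\n";
```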
Wednesday, July 21, 2010
How can I access resources on UniProt programmatically?
Here is UniProt's answer: the resources can be accessed through a RESTful service.
http://www.uniprot.org/faq/28
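A minimal sketch of the REST approach, assuming the entry URL pattern of the UniProt site at the time (http://www.uniprot.org/uniprot/ACCESSION.FORMAT); the service has been reorganised since, so check the current documentation before relying on it. The helper `uniprot_url` is hypothetical, and the actual download, shown only as a comment, needs network access and LWP::Simple:

```perl
use strict;
use warnings;

# Assumed 2010-era UniProt REST pattern: http://www.uniprot.org/uniprot/ACC.FORMAT
# (hypothetical helper; the live service may use different URLs today).
sub uniprot_url {
    my ($acc, $format) = @_;
    $format //= 'fasta';
    die "bad accession '$acc'\n" unless $acc =~ /\A[A-Z0-9]+\z/;
    return "http://www.uniprot.org/uniprot/$acc.$format";
}

my $url = uniprot_url('P69905');
print "$url\n";

# Fetching requires network access and LWP::Simple (from libwww-perl):
#   use LWP::Simple qw(get);
#   my $fasta = get($url) or die "request failed\n";
```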
Friday, July 16, 2010
Summary of Bio::SimpleAlign
A summary of the work done on Bio::SimpleAlign:
http://spreadsheets.google.com/ccc?key=0AssLTcJFJMbXdDFfZGpJZlhidFY5blBneGdhQUZ6WFE&hl=en&authkey=CJTCw4QL
The spreadsheet lists which methods are new, which are renamed, which are rewritten, and so on.
Wednesday, July 14, 2010
First trial run for Bio::DB::Align::Pfam succeeded
I just ran a first trial of some methods that retrieve Pfam alignments, and it succeeded.
At the moment, this package (Bio::DB::Align::Pfam) aims to retrieve alignments from Pfam using an accession number supplied by the user. It can be called like this:
my $obj = Bio::DB::Align::Pfam->new(); # or Bio::DB::Align->new(-database=>"Pfam")? (not decided yet)
my $aln = $obj->get_Aln_by_acc(-accession=>"PF00653",-alnType=>"seed",-format=>"fasta",-order=>"t",-case=>"l",-gap=>"default");
or, more simply:
my $aln = $obj->get_Aln_by_acc("PF00653");
Either way, the package returns a Bio::SimpleAlign object in one or two statements.
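The long form of the call passes several display options through to Pfam. A sketch of merging caller options over defaults and serialising them as a query string; the defaults copy the values in the long-form call, while the helper `build_query` and the exact parameter names the Pfam server expects are assumptions:

```perl
use strict;
use warnings;

# Hypothetical helper: merge caller options over defaults and build the
# query string for the alignment request. Defaults copy the long-form call.
my %DEFAULTS = (
    alnType => 'seed',
    format  => 'fasta',
    order   => 't',
    case    => 'l',
    gap     => 'default',
);

sub build_query {
    my (%args) = @_;
    my %opt = %DEFAULTS;
    for my $k (keys %args) {
        (my $name = $k) =~ s/^-//;    # "-format" -> "format"
        $opt{$name} = $args{$k};
    }
    my $acc = delete $opt{accession} or die "accession required\n";
    # Sorted keys keep the URL deterministic. Values here are plain
    # alphanumerics, so no percent-encoding is attempted.
    my $query = join '&', map { "$_=$opt{$_}" } sort keys %opt;
    return ($acc, $query);
}

my ($acc, $q) = build_query(-accession => 'PF00653', -case => 'u');
print "$acc?$q\n";
```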
Tuesday, July 13, 2010
A possible structure for Bio::DB::Align
The next task is to implement a package that retrieves alignments online and returns a Bio::SimpleAlign object. One possible way to do that is to implement these packages:
Bio::DB::Align (folder)
AlignI.pm (interface listing the methods that must be implemented)
Pfam.pm (implementation that retrieves alignment data from Pfam)
Uniprot.pm (implementation that retrieves alignment data from Uniprot)
AlignI will probably declare these methods:
get_Align_by_id
get_Align_by_acc
Each implementation, e.g. Pfam.pm, will inherit from both Bio::DB::Align::AlignI (for the alignment retrieval methods) and Bio::DB::GenericWebAgent (for the web-related methods).
For simplicity, Bio::DB::Align::Pfam and Bio::DB::Align::Uniprot will implement only alignment-related methods. Packages that retrieve other information from these two databases can be implemented later as Bio::DB::Pfam and Bio::DB::Uniprot.
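A single-file sketch of this layout in plain Perl, assuming a factory that maps -db => "pfam" to Bio::DB::Align::Pfam (the -db spelling used in the Pfam and Rfam examples). The interface methods die until a subclass overrides them; the Bio::DB::GenericWebAgent parent and the real web request are left out, and the Pfam method returns a placeholder string instead of a Bio::SimpleAlign:

```perl
use strict;
use warnings;

# Interface: every method dies until a subclass overrides it.
package Bio::DB::Align::AlignI;
use Carp qw(croak);
sub get_Aln_by_id  { croak ref(shift) . " must implement get_Aln_by_id" }
sub get_Aln_by_acc { croak ref(shift) . " must implement get_Aln_by_acc" }

# One backend. A real one would also inherit from Bio::DB::GenericWebAgent.
package Bio::DB::Align::Pfam;
our @ISA = ('Bio::DB::Align::AlignI');
sub new { my ($class, %args) = @_; return bless +{ %args }, $class }
# Placeholder: the real method would fetch from Pfam and return a SimpleAlign.
sub get_Aln_by_acc { my ($self, $acc) = @_; return "alignment for $acc" }

# Factory: pick the backend class from -db. In separate files this would
# also need to require the backend module before calling new.
package Bio::DB::Align;
use Carp qw(croak);
my %BACKEND = (pfam => 'Bio::DB::Align::Pfam');
sub new {
    my ($class, %args) = @_;
    my $db   = lc($args{-db} // '');
    my $impl = $BACKEND{$db} or croak "unsupported database '$db'";
    return $impl->new(%args);
}

package main;
my $dbobj = Bio::DB::Align->new(-db => 'Pfam');
my $aln   = $dbobj->get_Aln_by_acc('PF00001');
print ref($dbobj), ": $aln\n";
```

In BioPerl itself, interface methods conventionally call `$self->throw_not_implemented()` from Bio::Root::RootI rather than croaking directly.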
Friday, July 9, 2010
The structure of Bio::SimpleAlign
Alignment modifier methods
add_seq
remove_LocatableSeq
remove_Seqs
remove_redundant_Seqs
uniq_seq
remove_columns
sort_alphabetically
sort_by_list
sort_by_pairwise_identity
sort_by_length
sort_by_start
set_new_reference
Alignment selection methods
each_seq
each_alphabetically
each_seq_with_id
get_seq_by_pos
get_seq_by_id
select_Seqs
select_columns
remove_gaps
mask_columns
seq_with_features
Change sequences within the MSA
map_chars
uppercase
lowercase
togglecase
match
unmatch
Consensus sequences
consensus_string
consensus_iupac
consensus_meta
bracket_string
cigar_line
match_line
gap_line
all_gap_line
gap_col_matrix
MSA attributes
id
accession
description
source
missing_char
match_char
gap_char
mask_char
symbol_chars
Alignment descriptors
score
is_flush
length
maxdisplayname_length
max_metaname_length
num_residues
num_sequences
average_percentage_identity
percentage_identity
overall_percentage_identity
pairwise_percentage_identity
column_from_residue_number
Sequence names
displayname
set_displayname_count
set_displayname_flat
set_displayname_normal
set_displayname_safe
restore_displayname
Methods implementing Bio::FeatureHolderI
get_SeqFeatures
add_SeqFeature
remove_SeqFeatures
feature_count
get_all_SeqFeatures
Methods for Bio::AnnotatableI
annotation
Major improvements on Bio::SimpleAlign
The cleanup of Bio::SimpleAlign is 99% finished, though a few more tests may be needed. Here are the major improvements:
1. MSA modification and selection methods are more consistent and easier to use. I have enabled multiple and reverse selection in all sequence and column selection methods, and renamed them to be easier to understand.
2. Gap characters and missing characters are handled more consistently throughout the package.
3. Some redundant methods have been removed, and some methods have been moved into more sensible categories.
4. Some methods have been renamed: methods that select or return objects are now capitalized, e.g. each_seq becomes each_Seq. (Because each_Seq and add_Seq are so widely used, the renaming may break some packages that depend on Bio::SimpleAlign; I will find and update them.)
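While dependent packages are being updated, one common way to soften a rename is to keep the old name as a thin wrapper that warns and delegates to the new one. A sketch with a hypothetical stand-in class (`My::Aln`), not Bio::SimpleAlign itself:

```perl
use strict;
use warnings;

# Hypothetical stand-in for an alignment class after the rename.
package My::Aln;
use Carp qw(carp);

sub new      { my ($class, @seqs) = @_; return bless { seqs => [@seqs] }, $class }
sub each_Seq { my $self = shift; return @{ $self->{seqs} } }

# Old name kept for backward compatibility: warn, then delegate.
sub each_seq {
    my $self = shift;
    carp "each_seq() is deprecated; use each_Seq()";
    return $self->each_Seq(@_);
}

package main;
my $aln = My::Aln->new(qw(seqA seqB));
my @new = $aln->each_Seq;
my @old = $aln->each_seq;    # still works, but emits a warning
print scalar(@old), " sequences via the old name\n";
```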
Thursday, July 8, 2010
Retrieving online alignment sequences
The Pfam database provides a web service for retrieving information. I may start by writing a wrapper for it.
http://pfam.sanger.ac.uk/help#tabview=tab10
This may require implementing a Bio::DB::Pfam package.
Tuesday, July 6, 2010
Enable multiple/toggle selection feature in Bio::SimpleAlign
One major new feature in Bio::SimpleAlign is multiple and toggle selection in the methods that select sequences or columns. The new calls look like this:
$newaln=$aln->select_Seqs([4..10,20..35,37]);
$newaln=$aln->select_Seqs(-selection=>[4..10,20..35,37]);
Or you can toggle the selection (i.e. reverse it) using:
$newaln=$aln->select_Seqs([4..10,20..35,37],1);
$newaln=$aln->select_Seqs(-selection=>[4..10,20..35,37],-toggle=>1);
Sequence and column positions are 1-based. This feature should make selection much easier for users. It affects these methods:
select_Seqs
remove_Seqs
select_columns
remove_columns
mask_columns
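The selection logic behind these methods can be sketched in a few lines: convert the 1-based positions to 0-based indices and, when toggling, keep the complement. This is a plain-Perl sketch over a list of N items with a hypothetical function name, not the Bio::SimpleAlign code:

```perl
use strict;
use warnings;

# Sketch of 1-based multiple/toggle selection over N items (sequences or
# columns). Returns the 0-based indices to keep, in ascending order.
sub selected_indices {
    my ($n, $selection, $toggle) = @_;
    my %chosen;
    for my $pos (@$selection) {
        die "position $pos out of range 1..$n\n" if $pos < 1 || $pos > $n;
        $chosen{ $pos - 1 } = 1;    # convert 1-based to 0-based
    }
    # Toggle (reverse) selection keeps everything that was NOT chosen.
    return grep { $toggle ? !$chosen{$_} : $chosen{$_} } 0 .. $n - 1;
}

my @items    = map { "seq$_" } 1 .. 40;
my @picked   = @items[ selected_indices(40, [4..10, 20..35, 37]) ];
my @reversed = @items[ selected_indices(40, [4..10, 20..35, 37], 1) ];
print scalar(@picked), " selected, ", scalar(@reversed), " left after toggle\n";
```

Seen this way, removing a set of sequences or columns is the same as selecting its complement, so one selection routine with a toggle flag could in principle back both the select_ and remove_ methods.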
Monday, July 5, 2010