Thursday, May 27, 2010

New structure of Bio::Align

A new summary of the Bio::Align modules is available at:

http://spreadsheets.google.com/ccc?key=0AssLTcJFJMbXdERTd1VPeFhKM3JRM2x4UHpUVVVpNFE&hl=en

Wednesday, May 19, 2010

New structure of Bio::SimpleAlign 2

See the update in the Google doc:

http://spreadsheets.google.com/ccc?key=0AssLTcJFJMbXdFp3Smg1S3JaYzBKNUcxTmQ0STBNTXc&hl=en

Wednesday, May 12, 2010

New structure of Bio::SimpleAlign

I have finished summarizing the methods in Bio::SimpleAlign. The methods are classified into several categories. The classification borrows ideas from Bio::SimpleAlign's original classification, the menu in JalView and my own understanding.

The methods are listed in:

http://spreadsheets.google.com/ccc?key=0AssLTcJFJMbXdFp3Smg1S3JaYzBKNUcxTmQ0STBNTXc&hl=en

The next step is to sort out how the methods can be divided among different packages. Functions that read/write internal alignment attributes can be transferred to Bio::Align::AlignI. Functions that modify, select, or calculate alignment sequences/features can be moved to Bio::Align::Utilities or made into new packages.
Several new methods will be added.

A text-based report will follow in the next couple of days.

General ideas of the project

The general idea of the project is twofold:

1. Re-classify the Bio::SimpleAlign module into several new modules.

The current Bio::SimpleAlign is a very complicated module with more than 80 methods. The first aim of the project is to split these methods into several smaller modules. The basic idea is to keep the "generic methods", i.e. methods that read/write alignment features, in the basic module, and to move the other "outer methods", e.g. methods that edit or calculate sequence residues and/or sub-alignments, into separate modules. Most of the methods in the current module will be kept, and a few new methods will be added.

Programming difficulty: Easy to Medium
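
To make the split between "generic" and "outer" methods concrete, here is a minimal sketch in Python rather than Perl. It is only an illustration of the layering, not the planned BioPerl code: the class CoreAlignment and the functions select_columns and remove_gap_only_columns are hypothetical names invented for this example.

```python
class CoreAlignment:
    """Hypothetical core class: only reads/writes alignment attributes."""

    def __init__(self, ident=""):
        self.ident = ident
        self.sequences = {}  # name -> gapped residue string, in insertion order

    def add_seq(self, name, residues):
        self.sequences[name] = residues

    def num_sequences(self):
        return len(self.sequences)

    def length(self):
        # Length of the longest row; 0 for an empty alignment.
        return max((len(s) for s in self.sequences.values()), default=0)


# "Outer" methods live outside the core class and take an alignment as input.
def select_columns(aln, start, end):
    """Return a new alignment with columns start..end (1-based, inclusive)."""
    sub = CoreAlignment(ident=aln.ident)
    for name, residues in aln.sequences.items():
        sub.add_seq(name, residues[start - 1:end])
    return sub


def remove_gap_only_columns(aln, gap_char="-"):
    """Return a new alignment without columns that contain only gaps."""
    rows = aln.sequences
    keep = [
        i for i in range(aln.length())
        if any(i < len(s) and s[i] != gap_char for s in rows.values())
    ]
    out = CoreAlignment(ident=aln.ident)
    for name, residues in rows.items():
        out.add_seq(name, "".join(residues[i] for i in keep if i < len(residues)))
    return out
```

The point of the layout is that the core class stays small and stable, while editing and calculation functions can grow in their own module without touching it.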

2. Alignment-oriented Bio::Align::AlignI module

The new Bio::Align::AlignI module will be refactored to be alignment-oriented. In general, when we load the alignment file/DB (next_aln()), we read only the alignment features into memory. The actual sequences are read line by line when we need them (next_seq()).

Programming difficulty: Medium to Difficult
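
The lazy-loading idea above could look roughly like the following Python sketch. It rests on several assumptions: the input is a simple FASTA-style aligned file (not clustalw, whose interleaved layout needs more bookkeeping), and LazyAlignment, names, and iter_seqs are hypothetical names. Building the object plays the role of next_aln(), and iter_seqs plays the role of repeated next_seq() calls.

```python
import io


class LazyAlignment:
    """Hypothetical alignment that keeps only metadata until rows are requested."""

    def __init__(self, handle):
        self._handle = handle   # seekable text handle, kept open
        self.names = []         # sequence names, in file order
        self._offsets = []      # position of the first residue line per record
        self._index()

    def _index(self):
        # One pass over the file: remember where each record starts,
        # but do not keep any residues in memory.
        self._handle.seek(0)
        while True:
            line = self._handle.readline()
            if not line:
                break
            if line.startswith(">"):
                self.names.append(line[1:].strip())
                self._offsets.append(self._handle.tell())

    def num_sequences(self):
        return len(self.names)

    def iter_seqs(self):
        # Residues are read line by line, one record at a time, on demand.
        for name, offset in zip(self.names, self._offsets):
            self._handle.seek(offset)
            chunks = []
            while True:
                line = self._handle.readline()
                if not line or line.startswith(">"):
                    break
                chunks.append(line.strip())
            yield name, "".join(chunks)


# Usage with an in-memory file standing in for a real alignment file.
demo = io.StringIO(">s1\nAC-G\nT\n>s2\nA--G\nA\n")
aln = LazyAlignment(demo)
print(aln.num_sequences(), aln.names)
for name, residues in aln.iter_seqs():
    print(name, residues)
```

The trade-off is one extra indexing pass and a file handle that must stay open, in exchange for never holding more than one sequence in memory at a time.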

The two points above can be the starting points of the project. At the moment, the project is limited to global alignments (e.g. a clustalw output file). The two points are closely related: the second deals with the general input protocol, and the first deals with the methods that manipulate the alignment.


Later in the project, or in the near future, assembly files (SAM/ACE) and local alignment (BLAST) files may also be considered. This may involve refactoring Bio::Assembly and other packages.

The next post will be a summary of the new structure of Bio::SimpleAlign.