Wednesday, May 12, 2010

General ideas of the project

The general ideas of the project are twofold:

1. Re-classify the Bio::SimpleAlign module into several new modules.

The current Bio::SimpleAlign is a very complicated module comprising more than 80 methods. The first aim of the project is to split these methods across several smaller modules. The basic idea is to keep the "generic methods", those that read or write alignment features, in the basic module, and to move the other "outer methods", e.g. methods that edit or calculate sequence residues and/or sub-alignments, into separate modules. Most of the methods in the current module will be kept, and a few new methods will be added.

Programming difficulty: Easy to Medium
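
To make the split concrete, here is a minimal sketch of the idea. It is written in Python purely for illustration and is not BioPerl code: the names CoreAlignment, slice_columns and percent_identity are hypothetical and do not correspond to existing or planned Bio::SimpleAlign methods. The core class only stores and reports alignment features, while the "outer" operations live outside it and work through its public interface.

```python
from dataclasses import dataclass, field


@dataclass
class CoreAlignment:
    """Hypothetical core class: only reads/writes alignment features."""
    aln_id: str
    rows: dict = field(default_factory=dict)  # sequence name -> gapped string

    def add_row(self, name, gapped_seq):
        # Every row of an alignment must have the same number of columns.
        if self.rows and len(gapped_seq) != self.length():
            raise ValueError("all rows must have the same aligned length")
        self.rows[name] = gapped_seq

    def length(self):
        # Aligned length (number of columns); 0 for an empty alignment.
        return len(next(iter(self.rows.values()), ""))

    def num_sequences(self):
        return len(self.rows)


# "Outer" functionality kept out of the core class (hypothetical helpers).

def slice_columns(aln, start, end):
    """Return a new sub-alignment of columns start..end (1-based, inclusive)."""
    sub = CoreAlignment(aln_id=f"{aln.aln_id}/{start}-{end}")
    for name, seq in aln.rows.items():
        sub.add_row(name, seq[start - 1:end])
    return sub


def percent_identity(aln):
    """Percentage of columns where all rows share the same non-gap residue."""
    if aln.length() == 0:
        return 0.0
    identical = sum(
        1 for col in zip(*aln.rows.values())
        if "-" not in col and len(set(col)) == 1
    )
    return 100.0 * identical / aln.length()
```

With this layout, a new editing or calculation routine can be added as another helper without touching the core class at all.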

2. Alignment-oriented Bio::Align::AlignI module

The new Bio::Align::AlignI module will be refactored to be alignment-oriented. Generally, when we load an alignment from a file or database (next_aln()), we read only the alignment features into memory. The actual sequences will be read line by line when we need them (next_seq()).

Programming difficulty: Medium to Difficult
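
The lazy-loading idea can be sketched roughly as follows. Again this is Python used only for illustration, not the planned Bio::Align::AlignI interface. The class LazyAlignment and the one-line "#ALN" header format are assumptions made up for this example; real formats such as ClustalW do not store their features this way. Only the method name next_seq is borrowed from the description above.

```python
import io


class LazyAlignment:
    """Hypothetical alignment object that holds features, not sequences."""

    def __init__(self, handle):
        self._handle = handle
        # Read only the header line, e.g. "#ALN id=demo nseq=2 length=6".
        # This header format is invented for the sketch.
        header = handle.readline().strip()
        if not header.startswith("#ALN"):
            raise ValueError("missing #ALN header line")
        fields = dict(item.split("=", 1) for item in header.split()[1:])
        self.aln_id = fields["id"]
        self.num_sequences = int(fields["nseq"])
        self.length = int(fields["length"])

    def next_seq(self):
        """Read the next (name, gapped_seq) pair from the handle.

        Returns None once the input is exhausted. Sequences are never
        cached, so memory use stays at one line at a time.
        """
        while True:
            line = self._handle.readline()
            if not line:
                return None
            line = line.strip()
            if line:  # skip blank lines
                name, seq = line.split(None, 1)
                return name, seq


# Usage: the features are available immediately, the sequences on demand.
data = io.StringIO("#ALN id=demo nseq=2 length=6\nseq1 ACGT-A\nseq2 ACGTTA\n")
aln = LazyAlignment(data)
first = aln.next_seq()
```

The trade-off is that random access to a given sequence needs either a rescan of the input or an index, which is part of why this point is the harder of the two.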

The two points above are the starting points of the project. At the moment, the project is limited to global alignments (e.g. ClustalW output files). The two points are closely related: the second deals with the general input protocol, and the first deals with the methods that manipulate the alignment.


Later in the project, or in the near future, assembly files (SAM/ACE) and local alignment (BLAST) files may also be considered. This may involve refactoring Bio::Assembly and other packages.

The next post will be a summary of the new structure of Bio::SimpleAlign.
