Monday, June 14, 2010

Conventions in Bio::Align package (Updating 1606)

Several conventions should be established to avoid potential conflicts among methods or packages.

1. The position of sequences/columns in the alignment should be 1-based.

2. The selection of sequences/columns should be list-based, for example, (1,5,8..10,21..50).
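
Conventions 1 and 2 together might look like the following minimal sketch. The helper select_by_position is hypothetical (it is not a BioPerl method) and works on a plain array reference rather than an alignment object; it only illustrates accepting a 1-based position list and converting it to Perl's 0-based indices.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): pick items by 1-based position.
sub select_by_position {
    my ($items, @positions) = @_;
    my $n = scalar @$items;
    my @picked;
    for my $pos (@positions) {
        # Reject anything that is not a whole number within 1..$n.
        die "Position $pos is out of range 1..$n\n"
            if $pos !~ /^\d+$/ || $pos < 1 || $pos > $n;
        push @picked, $items->[$pos - 1];    # 1-based to 0-based index
    }
    return @picked;
}

my @columns = ('A' .. 'J');    # stand-in for ten alignment columns
my @subset  = select_by_position(\@columns, 1, 5, 8 .. 10);
print join(',', @subset), "\n";    # prints A,E,H,I,J
```

Because the selection is an ordinary Perl list, ranges such as 8..10 expand before the call, so the helper never has to parse a range syntax of its own.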

3. Special characters should be read and written through methods in the Bio::Align package. For example:
$aln->gap_char;
$aln->match_char;
$aln->missing_char;
$aln->mask_char;
The selection of characters can only be made from [0-9A-Za-z\*\-\.=~\\/\?], as configured by Bio::PrimarySeq::seq.
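
As a rough illustration of that restriction, a setter could check a candidate symbol against the character class quoted above before accepting it. The function is_valid_symbol is a hypothetical name used only for this sketch, and the check assumes each special character is exactly one character long.

```perl
use strict;
use warnings;

# Character class copied from the convention above; \A and \z anchor the
# match so that only a single character is accepted.
my $ALLOWED = qr{\A[0-9A-Za-z\*\-\.=~\\/\?]\z};

# Hypothetical validator (not part of BioPerl).
sub is_valid_symbol {
    my ($char) = @_;
    return (defined $char && $char =~ $ALLOWED) ? 1 : 0;
}

for my $candidate ('-', '.', '?', '#', '--') {
    printf "%-3s %s\n", $candidate,
        is_valid_symbol($candidate) ? 'accepted' : 'rejected';
}
```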

4. The ordering of the sequences in the alignment should be based on $seq->{'_order'}.

5. The length of the sequences should be the same in all methods. At the moment, it can be obtained from $aln->length, CORE::length, and $seq->length(), and each of them is calculated differently.

The length of a sequence should be defined as the number of alphabetic characters, and the length of the alignment should be defined as the length of the longest sequence (including special characters, e.g. '-', '?').
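
The two proposed definitions can be sketched on plain strings as follows. Both helper names are hypothetical and do not correspond to existing BioPerl methods; the rows are made-up example data.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helpers operating on plain strings, not on BioPerl objects.

# Sequence length: count only alphabetic characters.
sub residue_length {
    my ($string) = @_;
    return ($string =~ tr/A-Za-z//);    # tr returns the number of matches
}

# Alignment length: longest row, counting gaps and other special characters.
sub alignment_length {
    my (@rows) = @_;
    return max(map { length($_) } @rows) // 0;    # 0 for an empty alignment
}

my @rows = ('AC-GT?', 'ACGGT-', 'A--GT');
print residue_length($rows[0]), "\n";    # prints 4
print alignment_length(@rows), "\n";     # prints 6
```

Keeping the two notions in separate functions makes it explicit which one a caller wants, which is the ambiguity this convention is meant to remove.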

6. Function names should be clearer. For example, each_seq should be each_Seq to show that it retrieves sequence objects rather than the sequence strings themselves.
