So, this is something that will have to scale to p...

2010-06-07T21:03:01.865+01:00

So, this is something that will have to scale to possibly millions of sequences.

First: the 'database' backend should be abstracted out of the equation and used only to fetch the sequences (and whatever else) the work needs. That way you can implement the backend however you want, as long as it exposes a consistent set of methods (an interface). The default should probably be an in-memory store of some kind; other backends could be tied to a simple DB via DBI, load lazily, and so on. Start with something that works (in-memory) and work outward from there. You can wrap particular tools (Bio::Samtools, Bio::BigFile, etc.) as needed, or use the tools Mark Jensen and others have already set up if they fill the need.
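A rough sketch of that separation, written in Python purely for illustration (the discussion is about Perl, and none of these names come from BioPerl). `SequenceStore`, `InMemoryStore`, and `column` are hypothetical; the point is only that the consuming code talks to the interface and never to a concrete backend.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator


class SequenceStore(ABC):
    """Hypothetical backend interface: alignment code calls only these methods."""

    @abstractmethod
    def put(self, seq_id: str, seq: str) -> None: ...

    @abstractmethod
    def get(self, seq_id: str) -> str: ...

    @abstractmethod
    def ids(self) -> Iterator[str]: ...


class InMemoryStore(SequenceStore):
    """Assumed default backend: a plain dict, so every sequence lives in RAM."""

    def __init__(self) -> None:
        self._seqs: Dict[str, str] = {}

    def put(self, seq_id: str, seq: str) -> None:
        self._seqs[seq_id] = seq

    def get(self, seq_id: str) -> str:
        return self._seqs[seq_id]

    def ids(self) -> Iterator[str]:
        # dicts preserve insertion order, so ids come back in the order added
        return iter(self._seqs)


def column(store: SequenceStore, pos: int) -> str:
    """Example consumer: builds one alignment column via the interface only."""
    return "".join(store.get(seq_id)[pos] for seq_id in store.ids())


store = InMemoryStore()
store.put("seq1", "ACG-T")
store.put("seq2", "A-GCT")
print(column(store, 1))  # second column of the two toy sequences
```

A DBI-backed or lazily loading store would subclass the same interface, and `column` would not need to change.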

Storable will just serialize the data, correct? So one would use it in conjunction with DB_File or something similar. Tie::File is maybe a possibility for a lazy parser, though I'm not sure how well it scales to very large files.
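To illustrate the "serializer plus on-disk key/value file" pairing, here is a small Python analogue. It is not Perl: `pickle` stands in for Storable and `dbm.dumb` stands in for DB_File, and the record layout is made up for the example. The serializer only turns a structure into bytes; the keyed file is what provides storage and lookup without loading everything.

```python
import dbm.dumb
import os
import pickle
import tempfile

# Made-up record layout, for illustration only
record = {"id": "seq1", "seq": "ACG-T", "start": 1, "end": 4}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "seqs")

    # Write: serialize the record to bytes, then store it under its id
    db = dbm.dumb.open(path, "c")
    db[b"seq1"] = pickle.dumps(record)
    db.close()

    # Read: fetch just this one key from disk and deserialize it
    db = dbm.dumb.open(path, "r")
    restored = pickle.loads(db[b"seq1"])
    db.close()

print(restored["seq"])
```

Only the requested record is deserialized on lookup, which is the property that matters when the collection is too large to hold in memory.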

Comments on GSOC2010. BioPerl Alignment Subsystem Refactoring: An open question of how to be memory efficient
