Saturday, August 14, 2010

Bio::LocatableSeq end-checking inconsistency (problem unsolved): conversation with Chris Fields on IRC

The end-position problem is still a tricky, unsolved issue. Here is the discussion between Chris and me in the IRC channel. The ideas here may be useful for further improvement.


[17:22] Hi, pyrimidine. It seems SimpleAlign now passes nearly all the old tests, except the end position test
[17:23] The reason is in my email to the bioperl mailing list... it is not easy to solve
[17:24] <@pyrimidine> my feeling is, truthfully, that the sequence should be immutable
[17:25] <@pyrimidine> and that any calls to modify the seq using $ls->seq('ATTTGA') beyond the constructor should ie
[17:25] <@pyrimidine> *die
[17:25] <@pyrimidine> this solves a LOT of problems
[17:25] yep ... sounds cool :D
[17:26] Then every time you modify a sequence, you can start a new sequence object. This may solve some problems
[17:26] <@pyrimidine> any methods that modify the sequence should return a new LocatableSeq with the modifications
[17:26] <@pyrimidine> yes
[17:26] <@pyrimidine> beat me to it
[17:27] <@pyrimidine> :)
[17:29] <@pyrimidine> I posed that question to Heikki L. at BOSC (switching to mainly immutable objects) and his initial thought was 'isn't that a lot of work'
[17:29] <@pyrimidine> then, after thinking it over, he realized it solves a lot of state-based problems in bioperl
[17:30] yep, that is true. It makes sense
[17:30] <@pyrimidine> one issue: setting the sequence and start or end is probably fine, but setting seq, start, and end is tricky
[17:31] Because when you modify the sequence, the sequence is changed, and it makes no sense to use the old information
[17:31] <@pyrimidine> right
[17:31] -ChanServ- [#gsoc] Welcome to the Google Summer of Code channel! Please note that this channel is logged.
[17:32] <@pyrimidine> but,
[17:32] <@pyrimidine> $ls = Bio::LocatableSeq->new(-seq => 'ATGA-AG', -start => 2, -end => 7)
[17:32] <@pyrimidine> is a bit of a snare
[17:33] <@pyrimidine> end is defined based on the sequence itself
[17:33] yep
[17:33] <@pyrimidine> so, I think passing in end should be more a validation step instead of a set
[17:34] This one is ok, it will still pass the check from $ls->end($ls->end)
[17:35] <@pyrimidine> calling end() and length() should probably just use the ungapped length (with end offset by the length)
[17:35] <@pyrimidine> oops
[17:35] <@pyrimidine> end offset by the start and the length
[17:35] yep, this is the current check in end()
[17:36] <@pyrimidine> so, what do we do in this situation:
[17:36] <@pyrimidine> $ls = Bio::LocatableSeq->new(-seq => '', -start => 2, -end => 7)
[17:36] <@pyrimidine> (a virtual sequence)
[17:36] <@pyrimidine> I tend to think, in this case, we don't use LocatableSeq
[17:37] <@pyrimidine> have something else
[17:37] <@pyrimidine> and keep the logic for that separate
[17:38] ok
[17:38] <@pyrimidine> this popped up with MUMmer I think
[17:39] Yes, we should have the location information stored in a separate object in some cases
[17:40] Location for every residue in the current sequence from the original sequence
[17:40] But this will make things even more complicated
[17:40] <@pyrimidine> well, in that case you could switch to using features
[17:41] But anyway, it is the basic idea I have for the new Bio::DB::Seq, a location-based module to store alignment sequences
[17:41] <@pyrimidine> sounds good
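
The ideas in this conversation can be sketched in a few lines. The following is only a rough illustration in Python, not BioPerl code and not how Bio::LocatableSeq is implemented: the class name `LocatableSeqSketch`, the `expected_end` argument, the `with_seq` method and the choice of `-` and `.` as gap characters are all assumptions made for the example. It shows three points from the discussion: the object is immutable, the end is derived from the start plus the ungapped length, and a passed-in end is only validated, never set.

```python
from dataclasses import dataclass
from typing import Optional

GAP_CHARS = "-."  # assumption: the symbols that count as gaps


@dataclass(frozen=True)
class LocatableSeqSketch:
    """Hypothetical immutable stand-in; not the real Bio::LocatableSeq."""
    seq: str
    start: int = 1
    # Only checked against the derived end; it never overrides it.
    expected_end: Optional[int] = None

    def __post_init__(self):
        # Virtual (empty) sequences are left to a separate class,
        # as suggested in the chat.
        if not self.seq:
            raise ValueError("empty sequence: use a different class")
        if self.expected_end is not None and self.expected_end != self.end:
            raise ValueError(
                f"end {self.expected_end} does not match derived end {self.end}"
            )

    @property
    def ungapped_length(self) -> int:
        # Count residues only, skipping gap symbols.
        return sum(1 for ch in self.seq if ch not in GAP_CHARS)

    @property
    def end(self) -> int:
        # End is offset by the start and the ungapped length.
        return self.start + self.ungapped_length - 1

    def with_seq(self, new_seq: str) -> "LocatableSeqSketch":
        # "Modifying" returns a new object; the old one is untouched
        # and the end of the new one is derived afresh.
        return LocatableSeqSketch(new_seq, self.start)


ls = LocatableSeqSketch("ATGA-AG", start=2, expected_end=7)
longer = ls.with_seq("ATGA-AGTT")
print(ls.end, longer.end)
```

With this design the stale-end situation cannot arise: there is no setter for `seq` or `end`, so an end that disagrees with the sequence is rejected at construction time instead of being discovered later.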
