What Was
--------

1. DWT files -- multiple segments, each with
   - start(ms), sampling(ms)
   - ndwell, idwell[], tdwell[] (ms)
   - nclass, amp[], sd[]

2. QUB idealization (QFS_QIF)
   - assumes idl segments are sub-ranges of data segments
   - each segment kept as the numbers and arrays in (1)

3. DataSet of idealized data (qubtree)
   - result of skm in qubopt
   - input to mil in qubopt

---
DataSet {
   sampling = 0.05   #\ ms
   Segment {
      start = 0.0    #\ ms
      amp = 0.0 1.0
      sd = 0.1 0.1
      DwellCount = 3
      Classes = 0 1 0
      Durations = 0.25 2.6 0.3
   }
}
---

Trouble
-------

After running an idealizer, QUB cleared any existing idealization and
replaced it with the new one.  This caused trouble when you idealized the
whole data set and a small part of it came out wrong: you had to tweak the
parameters, re-idealize the whole thing, and hope.

In addition, the QIF retained the segmentation which generated it, even if
two segments were back-to-back within a single data segment -- you couldn't
re-draw the lines with a new selection list.  That would have been handy
with IdlBase, which can't handle too many points at once: idealize the data
in chops, then make a selection list of clusters, each of which might cross
a chop-boundary.

What we needed was a structure which could easily insert and remove events,
quickly seek to the event at point p0, and iterate through until the event
at point p1.

Basic Replacement
-----------------

(I am going to overlook sampling, amp, sd, etc.)

   typedef struct {
      int cls;
      int first;
      int last;
   } Dwell;

   typedef std::map<int, Dwell> DwellMap;   // key: dwell.last (one workable choice)

   class IdlSeg {
      int first;
      int last;
      DwellMap dwells;
      ...
   };

We make one IdlSeg per data segment, with the same bounds.  Initially,
seg.dwells contains one dwell:

   { cls = -1, first = 0, last = seg.last - seg.first }

(dwell bounds are relative to this seg -- makes it easier to insert/remove
segments)

* finding the dwell at point p:

   DwellMap::iterator di = dwells.lower_bound(p - first);   // first dwell with last >= p - first
   if ( di != dwells.end() )
      return di->second;

* iterating from there through the event at point p1:

   for ( ; di != dwells.end() && di->second.first <= p1 - first; ++di )
      ...

* inserting dwells d[0], ..., d[n-1]:

   dwells.erase( dwells.find( d[0].first ), dwells.find( d[n-1].last ) )   // (roughly)
   dwells.insert( each d[i] )
   join adjacent dwells of same class   (sketched below)

* removing dwells between p0, p1:

   truncate the boundary events as necessary
   remove the events in the middle
   insert an event of class -1 from p0 to p1
   join adjacent dwells of same class

Wrinkle: data segments that can be inserted and deleted, e.g. cut, paste,
append

   - data segments have contiguous point ranges:
        0 .. n0-1
        n0 .. n0+n1-1
        ...
   - dwell bounds are relative to the segment, so only seg.first and
     seg.last need change
   - an IdlSegments object holds the segments, manages segmentation, and
     converts to/from qubtree form

Problem: it takes too long to destroy a big segment

   - erasing one dwell: O(log n)
   - erasing all dwells: O(n log n)
   - but n might be a million

Solution: break long segments into chunks (sketched below)

   - chunk size is M = 2^k, so we can right-shift a point index by k to
     find the chunk index
   - events that cross chunk boundaries are broken in two (or into however
     many chunks they cross)
   - now erasing all dwells takes O(n log M) == O(nk) == O(n)
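For concreteness, here is a minimal sketch of the two trickier map
operations.  Keying the map by dwell.last (as in the typedef above) is an
assumption -- the key isn't fixed anywhere in these notes -- but it makes
"the dwell containing p" a single lower_bound; joinSameClass is a
hypothetical helper:

   #include <map>

   struct Dwell { int cls, first, last; };     // bounds relative to the segment

   typedef std::map<int, Dwell> DwellMap;      // keyed by Dwell::last (assumed)

   Dwell * dwellAt(DwellMap &dwells, int p)    // p relative to the segment
   {
      DwellMap::iterator di = dwells.lower_bound(p);   // first dwell with last >= p
      return ( di == dwells.end() ) ? 0 : &di->second;
   }

   // hypothetical helper: merge runs of adjacent same-class dwells in [lo, hi]
   void joinSameClass(DwellMap &dwells, int lo, int hi)
   {
      DwellMap::iterator di = dwells.lower_bound(lo);
      while ( di != dwells.end() && di->second.first <= hi ) {
         DwellMap::iterator next = di;
         ++next;
         if ( next == dwells.end() )
            break;
         if ( next->second.cls == di->second.cls &&
              next->second.first == di->second.last + 1 ) {
            next->second.first = di->second.first;   // absorb di into next;
            dwells.erase(di);                        // next's key (its last) is unchanged
         }
         di = next;
      }
   }

Keying by last also means a join only extends the surviving dwell's first,
so nothing needs re-keying.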
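And a sketch of the right-shift chunk addressing, reusing Dwell and
DwellMap from above; CHUNK_SHIFT (the k above) and the helper names are
made up:

   #include <vector>

   enum { CHUNK_SHIFT = 16, CHUNK_SIZE = 1 << CHUNK_SHIFT };   // M = 2^k

   int chunkIndex(int p)  { return p >> CHUNK_SHIFT; }         // which chunk holds point p
   int chunkOffset(int p) { return p & (CHUNK_SIZE - 1); }     // position within it

   // break a dwell at every chunk boundary it crosses;
   // chunks[i] covers points [i*M, (i+1)*M)
   void insertSplit(std::vector<DwellMap> &chunks, Dwell d)
   {
      while ( chunkIndex(d.first) != chunkIndex(d.last) ) {
         int lastInChunk = ((chunkIndex(d.first) + 1) << CHUNK_SHIFT) - 1;
         Dwell piece = { d.cls, d.first, lastInChunk };
         chunks[ chunkIndex(piece.first) ][ piece.last ] = piece;
         d.first = lastInChunk + 1;
      }
      chunks[ chunkIndex(d.first) ][ d.last ] = d;
   }

The pieces created at chunk boundaries are exactly the same-class runs
noted under Side-effects below.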
Side-effects

   - there may be several events of the same class in a row
     -- they can be drawn separately
     -- they are separate as seen from iterators
     -- they need to be joined for purposes of MIL etc.
        (by conversion to qubtree form)
   - iterators are crazy:
        DwellMap::iterator -- iterates through one chunk
        IdlMapsIter        -- iterates seamlessly over all chunks of one segment
        IdlSegmentsIter    -- iterates seamlessly over all segments
                              (since data segmentation is irrelevant for drawing)

Problem: maps take up far more space than arrays, and even keeping the
arrays in memory could be excessive for large enough data.

Solution: QUB_Freezer

   - a chunk can be in one or more formats:
      - map            -- mutable, lots of memory
      - arrays         -- read-only, less memory
      - arrays-on-disk -- no read or write, no memory
   - chunks are kept in a "freezer"
      - it downgrades the storage level of the least-recently-accessed
        chunks when the dwell count at a storage level exceeds a
        threshold (sketched below)
      - we checkOut / checkIn a chunk while reading or modifying it
         - checkOut marks the chunk's last access time
         - and wrests control of its storage level:
            - chunks are converted back to maps as needed for setDwells
            - arrays are reloaded as needed for reading/iterating
      - the freezer is ordered roughly by access time, as a heap
      - freezer access is intermediated by one or more FreezerSeries
         - ordered and resizable like a vector, for simplicity
   - a chunk keeps its array-data in a qubtree node for easy paging to disk
      - the node is part of a QUB_NodeArray
         - the nodes share a common root, so they are all in one file,
           which you can saveAsTemp()
         - it is a collection of "unconnected" nodes (connectivity is
           managed behind the scenes to enable the common root)
         - arr[2*i] is the child of arr[i]
         - arr[2*i+1] is the sibling of arr[i]
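Those last two bullets are a left-child / right-sibling encoding of a tree
in an array.  A sketch of the index arithmetic, assuming the root sits at
arr[1] (helper names hypothetical):

   #include <cstddef>

   size_t firstChild (size_t i) { return 2*i; }       // arr[2*i] is child of arr[i]
   size_t nextSibling(size_t i) { return 2*i + 1; }   // arr[2*i+1] is sibling of arr[i]

   // going the other way: an even index j is the first child of arr[j/2],
   // and an odd index j is the next sibling of arr[j/2]

   // visiting the children of node i, given some occupancy test inUse():
   //    for ( size_t c = firstChild(i); inUse(c); c = nextSibling(c) )
   //       ...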
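Finally, a rough sketch of the freezer's downgrade policy.  This is not
the actual QUB_Freezer interface -- the names and the one-threshold
bookkeeping are assumptions:

   #include <queue>
   #include <vector>

   enum Storage { MAP, ARRAYS, ARRAYS_ON_DISK };

   struct Chunk {
      Storage level;
      long    lastAccess;   // stamped by checkOut
      int     ndwell;
      // ... dwell storage in one of the three formats
   };

   struct Older {           // oldest access time comes off the heap first
      bool operator()(const Chunk *a, const Chunk *b) const
         { return a->lastAccess > b->lastAccess; }
   };

   typedef std::priority_queue<Chunk*, std::vector<Chunk*>, Older> Freezer;

   // demote least-recently-accessed chunks until the level fits its budget
   void enforceThreshold(Freezer &frozen, long &ndwellAtLevel, long threshold)
   {
      while ( ndwellAtLevel > threshold && ! frozen.empty() ) {
         Chunk *lru = frozen.top();
         frozen.pop();
         ndwellAtLevel -= lru->ndwell;
         if ( lru->level != ARRAYS_ON_DISK )
            lru->level = (Storage)(lru->level + 1);   // map -> arrays -> disk
         // ... the real work happens here: convert the map to arrays,
         //     or page the qubtree node out to disk
      }
   }

Access times go stale while chunks sit in the heap (checkOut re-stamps
them), which is presumably why the ordering is only "roughly" by access
time.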