python - Maintaining overlapping annotations while removing dashes from string


Assume the following Python function, which removes dashes ("gaps") from a string while maintaining the correct annotations on that string. The input variables instring and annotations are a string and a dictionary, respectively.

def degapbutmaintainanno(instring, annotations):
    degapped_instring = ''
    degapped_annotations = {}
    gaps_cumulative = 0
    for range_name, index_list in annotations.items():
        gaps_within_range = 0
        for pos, char in enumerate(instring):
            # Gap inside this range: drop the index and count the gap.
            if pos in index_list and char == '-':
                index_list.remove(pos)
                gaps_within_range += 1
            # Non-gap inside this range: keep the character, shift the index.
            if pos in index_list and char != '-':
                degapped_instring += char
                index_list[index_list.index(pos)] = pos - gaps_within_range
        # Shift by the gaps already removed in previously processed ranges.
        index_list = [i-gaps_cumulative for i in index_list]
        degapped_annotations[range_name] = index_list
        gaps_cumulative += gaps_within_range
    return (degapped_instring, degapped_annotations)

This function works as expected if none of the ranges specified in the input dictionary overlap:

>>> instr = "a--at--t"
>>> annot = {"range1":[0,1,2,3,4], "range2":[5,6,7]}
>>> degapbutmaintainanno(instr, annot)
Out: ('aatt', {'range1': [0, 1, 2], 'range2': [3]})

As soon as one or more of the ranges overlap, however, the code fails:

>>> annot = {"range1":[0,1,2,3,4], "range2":[4,5,6,7]}
>>> degapbutmaintainanno(instr, annot)
Out: ('aattt', {'range1': [0, 1, 2], 'range2': [2, 3]})  # see the additional 't' in the string

Does anyone have a suggestion on how to correct the code for overlapping ranges?

I think you might be over-thinking things. Here's my attempt:

from copy import deepcopy

def rewritegene(instr, annos):
    # Deep copy so the caller's index lists are not modified.
    annotations = deepcopy(annos)
    index = instr.find('-')
    while index > -1:
        for key, ls in annotations.items():
            # Drop the gap position itself, then shift later positions left by one.
            if index in ls:
                ls.remove(index)
            annotations[key] = [e-1 if e > index else e for e in ls]
        # Remove the gap from the string and look for the next one.
        instr = instr[:index] + instr[index+1:]
        index = instr.find('-')
    return instr, annotations

instr = "a--at--t"
annos = {"range1":[0,1,2,3,4], "range2":[4,5,6,7]}

print(rewritegene(instr, annos))
# ('aatt', {'range1': [0, 1, 2], 'range2': [2, 3]})

It should be pretty readable as is, but let me know if you want clarification on anything.
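
If the string is long or contains many gaps, rescanning it once per dash may get slow. As a rough alternative, here is a single-pass sketch: it first maps every non-gap position to its position in the degapped string, then translates each range through that map. Because every range is remapped independently, overlaps should not matter. The name degap_single_pass and the gap parameter are made up for this illustration and are not part of the original code.

```python
def degap_single_pass(instring, annotations, gap='-'):
    """Remove gap characters and remap annotation indices in one pass."""
    # Map each non-gap position in the original string to its new position.
    new_pos = {}
    kept_chars = []
    for old, char in enumerate(instring):
        if char != gap:
            new_pos[old] = len(kept_chars)
            kept_chars.append(char)

    # Remap every range on its own; indices that pointed at gaps are dropped.
    # New lists are built, so the caller's annotations are left untouched.
    degapped_annotations = {
        name: [new_pos[i] for i in indices if i in new_pos]
        for name, indices in annotations.items()
    }
    return ''.join(kept_chars), degapped_annotations


instr = "a--at--t"
annos = {"range1": [0, 1, 2, 3, 4], "range2": [4, 5, 6, 7]}
print(degap_single_pass(instr, annos))
# ('aatt', {'range1': [0, 1, 2], 'range2': [2, 3]})
```

This gives the same result as rewritegene on the example above, while touching each character of the string only once.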

