Python regex: finding a substring
I'm new to Python and regex. I'm trying to recover the text between two limits: the start is an instruction mnemonic (mov/add/rd/sub/and/etc.) and the end limit is the end of the line.
/********** sample input text file *************/
f0004030: a0 10 20 02 mov %l0, %psr
//some unwanted lines
f0004034: 90 04 20 03 add %l0, 3, %o0
f0004038: 93 48 00 00 rd %psr, %o1
f000403c: a0 10 3f fe sub %o5, %l0, %g1

/*-------- here is the code -----------*/
import re
import sys

try:
    objdump = open(dest+name,"r")
except IOError:
    print("error: '" + name + "' not found in " + dest)
    sys.exit()

objdump_file = objdump.readlines()
for objdump_line in objdump_file:
    a = ['add', 'mov', 'sub', 'rd', 'and']
    if any(x in objdump_line for x in a):  # avoid unwanted lines
        # >>>>>>>>>> here is the problem >>>>>>>>>>>>>
        m = re.findall('(add|mov|rd|sub|add)(.*?)($|\n)', objdump_line, re.DOTALL)
        # <<<<<<<<<<< here is the problem <<<<<<<<<<<<<
        print(m)

/*---------- result I'm getting --------------*/
[('mov', ' %l0, %psr', '')]
[('add', ' %l0, 3, %o0', '')]
[('rd', ' %psr, %o1', '')]
[('sub', ' %o5, %l0, %g1', '')]

/*----------- expected result ----------------*/
[' %l0, %psr']
[' %l0, 3, %o0']
[' %psr, %o1']
[' %o5, %l0, %g1']
I have no idea why the parentheses and the unwanted quotes are showing up. Thanks in advance.
Quoting the Python documentation for findall:
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
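To make the quoted rule concrete, here is a small sketch of how the shape of the findall result changes with the number of groups. The sample line is taken from the question; the two simplified patterns (`add.*` and `add(.*)`) are illustrative and not part of the original code.

```python
import re

# One line from the sample input; the trailing newline mimics readlines().
line = "f0004034: 90 04 20 03 add %l0, 3, %o0\n"

# No groups: each match comes back as one plain string.
no_groups = re.findall(r"add.*", line)

# One group: each match comes back as that group's string.
one_group = re.findall(r"add(.*)", line)

# Three groups (the pattern shape used in the question):
# each match comes back as a tuple of three strings.
three_groups = re.findall(r"(add)(.*?)($|\n)", line, re.DOTALL)

print(no_groups)     # ['add %l0, 3, %o0']
print(one_group)     # [' %l0, 3, %o0']
print(three_groups)  # [('add', ' %l0, 3, %o0', '')]
```

The last line reproduces the tuple form from the question: the parentheses and the extra quotes are simply how Python prints a tuple of strings.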
Each pair of parentheses in your output is one tuple, that is, one match, and the tuple contains the groups captured for that match. There can be several matches in the list. You can access the part you want as
re.findall('(add|mov|rd|sub|add)(.*?)($|\n)', objdump_line, re.DOTALL)[0][1]
Here 0 selects the first match in the list and 1 selects the second captured group inside that match's tuple (the operands), so the elements you do not want are skipped.
A capturing group captures whatever the expression between its parentheses matched. The last capturing group, ($|\n), matches at the end of the line without consuming any text, so you get an empty ''.
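The indexing approach can be sketched as a loop over the lines. This is a minimal sketch, not the asker's exact code: the `lines` list stands in for `objdump.readlines()`, and the pattern replaces the duplicated `add` with `and` on the assumption that this is what was intended.

```python
import re

# Stand-in for objdump.readlines(); lines taken from the sample input.
lines = [
    "f0004030: a0 10 20 02 mov %l0, %psr\n",
    "//some unwanted lines\n",
    "f0004038: 93 48 00 00 rd %psr, %o1\n",
]

# Same three-group shape as in the question ('and' replaces the second 'add').
pattern = r"(add|mov|rd|sub|and)(.*?)($|\n)"

operands = []
for line in lines:
    matches = re.findall(pattern, line, re.DOTALL)
    if matches:  # [0] on an empty list would raise IndexError
        # matches[0] is the first tuple; index 1 is the second group.
        operands.append(matches[0][1])

print(operands)  # [' %l0, %psr', ' %psr, %o1']
```

Checking that the list is non-empty before indexing also filters out the unwanted lines, so the separate `any(...)` test is optional in this form.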
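As mentioned in the comment, try using this
add(.*?)$
instead of this
(add)(.*?)$
The () indicates a capturing group. With only one capturing group left in the pattern, findall returns a plain list of strings, which is the result you expected.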
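For the full set of mnemonics the alternation still needs parentheses, so one option is a non-capturing group, written `(?:...)`, which groups without adding an element to the result. The following is a sketch under that assumption, using the sample input from the question; `dump` and `results` are illustrative names, and `and` replaces the duplicated `add` in the alternation.

```python
import re

# Sample input from the question, held in a string instead of a file.
dump = """\
f0004030: a0 10 20 02 mov %l0, %psr
//some unwanted lines
f0004034: 90 04 20 03 add %l0, 3, %o0
f0004038: 93 48 00 00 rd %psr, %o1
f000403c: a0 10 3f fe sub %o5, %l0, %g1
"""

# (?:...) groups the alternatives without capturing them, so the
# pattern has exactly one capturing group: the operands.
pattern = re.compile(r"(?:add|mov|rd|sub|and)(.*)")

results = []
for line in dump.splitlines():
    m = pattern.findall(line)
    if m:  # lines without a mnemonic give an empty list
        results.append(m)
        print(m)
```

Because `.` does not match a newline by default, `(.*)` already stops at the end of the line, so neither `re.DOTALL` nor the trailing `($|\n)` group is needed here.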