Speeding up re.sub in Python
I have the following Python code, which runs a bit slow on a 10 MB file. I'm wondering if there's a way to speed it up? Maybe by doing the re.sub in one go (rather than two operations) - I'm not sure how, though. Or maybe there's another way?
def changemode(file, amode0, amode1, bmode0, bmode1):
    endstring = ''
    for line in file:
        if 'aaa' in line or 'bbb' in line or 'ccc' in line:
            line = re.sub(amode0, amode1, line)
            line = re.sub(bmode0, bmode1, line)
        endstring += line
    return endstring
cheers
If the affected lines are rare, you can speed this up a lot by using re.sub (or re.finditer) to find those lines directly, instead of iterating over lines at the Python level. And str.replace is fast in the case of simple string replacements:

def fsub(m):
    return m.group().replace('ij', 'xx').replace('kl', 'yy')

s = re.sub('(?m)^.*(?:aaa|bbb|ccc).*', fsub, open(path).read())
Note: (?m) causes ^ to match at the beginning of each line, and .* does not grab beyond the line end.
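As a minimal, self-contained sketch of this whole-text approach (the sample text and the 'ij'/'kl' replacement pairs are illustrative stand-ins for the real file and patterns):

```python
import re

# Hypothetical sample standing in for the 10 MB file's contents.
text = "aaa ij1\nplain line\nbbb kl2\n"

def fsub(m):
    # Apply both simple replacements to a matched line in one pass.
    return m.group().replace('ij', 'xx').replace('kl', 'yy')

# (?m) makes ^ match at each line start; .* stops at the line end,
# so only lines containing aaa/bbb/ccc are handed to fsub.
result = re.sub(r'(?m)^.*(?:aaa|bbb|ccc).*', fsub, text)
print(result)
```

This replaces the Python-level line loop with a single C-level scan; untouched lines are never handed to Python code.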
Regex pre-compilation can speed up many individual re.sub calls a little (when simple string replacements are not applicable):

rec = re.compile(r'ij\d+')   # once
...
line = rec.sub('xx', line)   # per line

(re.sub uses a regex compile cache, so it is quite fast anyway.)
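A small sketch of the pre-compilation pattern in context (the pattern and input lines are made up for illustration):

```python
import re

# Compile once, outside the loop, instead of re-looking up the
# pattern in re's internal cache on every call.
rec = re.compile(r'ij\d+')

lines = ["ij12 foo", "bar", "ij7 baz"]
out = [rec.sub('xx', line) for line in lines]
print(out)
```

The saving per call is small (a dict lookup), which is why this only helps "a little" compared with restructuring the loop itself.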
If the replacements do not change the string size, you can speed things up a lot by using bytearray / buffers or mmap and modifying the data in place. (re.sub(), string.replace, and endstring += line all cause a lot of memory to be copied around.)
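A sketch of the in-place idea using a bytearray (the data and the same-length 'ij' -> 'xx' replacement are illustrative; with a real file you would use mmap the same way):

```python
import re

# Same-size replacement: 'ij' -> 'xx' keeps the length unchanged,
# so we can patch the buffer in place instead of building new strings.
data = bytearray(b"aaa ij1\nplain\nbbb ij2\n")

# Scan a snapshot of the bytes, then overwrite matches in place.
for m in re.finditer(b'ij', bytes(data)):
    data[m.start():m.end()] = b'xx'   # in-place slice assignment

print(data.decode())
```

Because no new string objects are created per line, memory traffic stays proportional to the matches rather than to the whole file.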