Speeding up re.sub in Python


I have the following Python code, which runs a bit slow on a 10 MB file. I'm wondering if there is a way to speed it up? Maybe by doing the re.sub in one go (rather than two operations) - not sure how though. Or maybe there is some other way?

    def changemode(file, amode0, amode1, bmode0, bmode1):
        endstring = ''
        for line in iter(file):
            if 'aaa' in line or 'bbb' in line or 'ccc' in line:
                line = re.sub(amode0, amode1, line)
                line = re.sub(bmode0, bmode1, line)
            endstring += line
        return endstring

Cheers

If the affected lines are rare, you can speed things up a lot by using re.sub or re.finditer to find those lines directly, instead of iterating over the lines at the Python level. And str.replace is fast in the case of simple string replacements:

    def fsub(m):
        return m.group().replace('ij', 'xx').replace('kl', 'yy')

    s = re.sub('(?m)^.*(?:aaa|bbb|ccc).*', fsub, open(path).read())

Note: (?m) causes ^ to match at the beginning of each line, and .* does not grab beyond the end of a line.
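For instance, the single-pass approach above can be exercised on a small in-memory sample (the markers 'aaa'/'bbb'/'ccc' and patterns 'ij'/'kl' are just the illustrative names from the snippet; the sample text is made up):

```python
import re

def fsub(m):
    # m.group() is one whole matched line; do both replacements on it
    return m.group().replace('ij', 'xx').replace('kl', 'yy')

text = "aaa ij here\nno marker ij\nbbb kl there\n"
# only lines containing aaa/bbb/ccc are matched and rewritten
s = re.sub('(?m)^.*(?:aaa|bbb|ccc).*', fsub, text)
# s == "aaa xx here\nno marker ij\nbbb yy there\n"
```

Note how the middle line keeps its 'ij' untouched, because it contains none of the markers.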

Pre-compiling the regex can also speed up many individual re.sub calls a little (when simple string replacements are not applicable):

    rec = re.compile(r'ij\d+')   # compile once
    ...
    line = rec.sub('xx', line)   # use many times

(re.sub uses a compiled-pattern cache, so it is quite fast even without explicit compilation.)
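If you want to see how much pre-compilation buys you on your own data, timeit makes a quick comparison easy (the sample line and repeat count here are arbitrary, and the measured difference will vary by machine):

```python
import re
import timeit

line = "ij12 some text ij345"
rec = re.compile(r'ij\d+')

# pre-compiled pattern: no per-call cache lookup
t_compiled = timeit.timeit(lambda: rec.sub('xx', line), number=10000)

# module-level call: pays a small cache lookup on every call
t_module = timeit.timeit(lambda: re.sub(r'ij\d+', 'xx', line), number=10000)
```

Typically the module-level version is only slightly slower, which matches the remark above about the compile cache.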

If the replacements do not change the string size, you can speed things up a lot by using a bytearray / buffer or mmap and modifying the data in place. (re.sub(), str.replace and endstring += line all cause a lot of memory to be copied around.)
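As a minimal sketch of the in-place idea with mmap (the temp file and its contents are made up for illustration; the key constraint is that 'ij' and 'xx' have the same length, so the file never needs to be resized or copied):

```python
import mmap
import os
import tempfile

# write sample data to a temporary file (illustrative content)
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b"aaa ij end\nplain line\n")

# modify the file in place: each replacement keeps the size unchanged
with open(path, 'r+b') as f:
    mm = mmap.mmap(f.fileno(), 0)
    pos = mm.find(b'ij')
    while pos != -1:
        mm[pos:pos + 2] = b'xx'   # same length, no reallocation
        pos = mm.find(b'ij', pos + 2)
    mm.flush()
    mm.close()

with open(path, 'rb') as f:
    data = f.read()
os.remove(path)
# data == b"aaa xx end\nplain line\n"
```

Because the bytes are patched where they sit, there is no per-line string building and no endstring += line growth.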

