python - Using regular expression to find specific strings between parentheses (including parentheses)


I am trying to use a regular expression to find specific strings between parentheses in a string like the one below:

foo = '((peach w/o juice) or apple or (pear w/o water) or kiwi or (lychee , sugar) or (pineapple w/o salt))' 

Specifically, I want to find (peach w/o juice), (pear w/o water), and (pineapple w/o salt).

I tried lookahead and lookbehind, but was unable to obtain the correct results.

For example, when I use the following regex:

import re
regex = r'(?<=[\s\(])\([^\)].*\sw/o\s[^\)].*\)(?=[\)\s])'
re.findall(regex, foo)

I end up with the entire string:

['(peach w/o juice) or apple or (pear w/o water) or kiwi or (lychee , sugar) or (pineapple w/o salt)'] 

Edit:

I found the problem:

Instead of [^\)].*, it should be [^\)]*, which gives me the correct result:

regex = r'(?<=[\s\(])\([^\)]*\sw/o\s[^\)]*\)(?=[\)\s])'
re.findall(regex, foo)
['(peach w/o juice)', '(pear w/o water)', '(pineapple w/o salt)']

I think your problem is the .* operators being greedy: they consume as much as they can if you don't put a ? after them, as in .*?. Also, note that since you want the parentheses in the result, you shouldn't need the lookahead/lookbehind operations; those exclude the parentheses they find.
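To make the greedy behaviour concrete, here is a small sketch run against the string from the question. The variable names greedy, lazy, and bounded are only illustrative, and the three patterns are simplified variants without lookarounds, not the exact regexes used elsewhere in this post. Note that making .* lazy is not enough on its own, because . can still step over a closing parenthesis; a negated class keeps each match inside a single pair.

```python
import re

foo = ('((peach w/o juice) or apple or (pear w/o water) or kiwi '
       'or (lychee , sugar) or (pineapple w/o salt))')

# Greedy: .* runs to the last closing parenthesis it can reach.
greedy = re.findall(r'\(.*\sw/o\s.*\)', foo)
print(greedy)   # one match: the whole string

# Lazy: .*? stops as early as it can, but . may still cross '(' or ')'.
lazy = re.findall(r'\(.*?\sw/o\s.*?\)', foo)
print(lazy)

# Negated class: [^()]* can never leave the current pair of parentheses,
# so no lookahead or lookbehind is needed.
bounded = re.findall(r'\([^()]*\sw/o\s[^()]*\)', foo)
print(bounded)  # ['(peach w/o juice)', '(pear w/o water)', '(pineapple w/o salt)']
```

The lazy variant still returns '((peach w/o juice)' and '(lychee , sugar) or (pineapple w/o salt)', which is why restricting the characters between the parentheses matters more than the ? itself.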

Instead of debugging your regex, I decided to rewrite it:

>>> import re
>>> foo = '((peach w/o juice) or apple or (pear w/o water) or kiwi or (lychee , sugar) or (pineapple w/o salt))'
>>> regex = r'\([a-zA-Z ]*?w/o.*?\)'
>>> re.findall(regex, foo)
['(peach w/o juice)', '(pear w/o water)', '(pineapple w/o salt)']

Here's the breakdown:

\( captures the leading parenthesis (note that it's escaped)

[a-zA-Z ] captures alphabetical characters and a space (note the space after the Z, before the closing bracket). This is used instead of . so that no other parentheses are captured. Using the period operator would cause (lychee , sugar) or (pineapple w/o salt) to be captured as one match.

*? The * causes the characters in the bracket to match 0 or more times, but the ? says to capture only as many as are needed to make a match

w/o captures the "w/o" you're looking for

.*? captures more characters (again, non-greedy because of the ?)

\) captures the trailing parenthesis
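The breakdown above can also be written into the pattern itself. The sketch below uses re.VERBOSE, which ignores whitespace in the pattern except inside a character class, so the space in the bracket still counts. The names annotated and with_dot are just for illustration; the second pattern swaps the bracket for . to show the merged match described above.

```python
import re

foo = '((peach w/o juice) or apple or (pear w/o water) or kiwi or (lychee , sugar) or (pineapple w/o salt))'

# The rewritten pattern, spelled out piece by piece.
annotated = re.compile(r"""
    \(            # leading parenthesis, escaped
    [a-zA-Z ]*?   # letters and spaces, as few as needed
    w/o           # the literal text w/o
    .*?           # any characters, as few as needed
    \)            # trailing parenthesis, escaped
""", re.VERBOSE)

print(annotated.findall(foo))
# ['(peach w/o juice)', '(pear w/o water)', '(pineapple w/o salt)']

# With . in place of the bracket, a match can start in one pair of
# parentheses and finish in another.
with_dot = re.findall(r'\(.*?w/o.*?\)', foo)
print(with_dot)
# ['((peach w/o juice)', '(pear w/o water)', '(lychee , sugar) or (pineapple w/o salt)']
```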

