python - Using regular expression to find specific strings between parentheses (including parentheses)
I am trying to use a regular expression to find specific strings between parentheses in a string like the one below:

    foo = '((peach w/o juice) or apple or (pear w/o water) or kiwi or (lychee , sugar) or (pineapple w/o salt))'

Specifically, I want to find (peach w/o juice), (pear w/o water), and (pineapple w/o salt).
I tried lookahead and lookbehind, but I was unable to obtain the correct results.
For example, when I use the following regex:

    import re
    regex = r'(?<=[\s\(])\([^\)].*\sw/o\s[^\)].*\)(?=[\)\s])'
    re.findall(regex, foo)

I end up with the entire string:

    ['(peach w/o juice) or apple or (pear w/o water) or kiwi or (lychee , sugar) or (pineapple w/o salt)']

Edit:
I found the problem:
Instead of [^\)].*, it should be [^\)]*, which gives me the correct result:

    regex = r'(?<=[\s\(])\([^\)]*\sw/o\s[^\)]*\)(?=[\)\s])'
    re.findall(regex, foo)
    ['(peach w/o juice)', '(pear w/o water)', '(pineapple w/o salt)']
I think the problem is the .* operators being greedy - they will consume as much as they can if you don't put a ? after them: .*?. Also, note that since you want the parentheses, you shouldn't need the lookahead/lookbehind operations; those exclude the parentheses they find.
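To make the greedy versus non-greedy difference concrete, here is a minimal sketch on a shortened sample string. The string and the names sample, greedy, lazy and inner are made up for illustration and do not come from the question.

```python
import re

# Shortened, hypothetical sample; not the string from the question.
sample = '(a w/o b) or (c w/o d)'

# Greedy: .* runs to the end of the string and backtracks only as far as
# needed, so the match stretches from the first "(" to the last ")".
greedy = re.findall(r'\(.*w/o.*\)', sample)

# Non-greedy: .*? stops at the earliest point that still allows a match.
lazy = re.findall(r'\(.*?w/o.*?\)', sample)

# Lookbehind/lookahead assert the parentheses without consuming them,
# so the parentheses are left out of the result.
inner = re.findall(r'(?<=\()[^()]*(?=\))', sample)

print(greedy)  # ['(a w/o b) or (c w/o d)']
print(lazy)    # ['(a w/o b)', '(c w/o d)']
print(inner)   # ['a w/o b', 'c w/o d']
```

The last pattern is only there to show why lookarounds are the wrong tool when the parentheses themselves should be part of the match.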
Instead of debugging your regex, I decided to rewrite it:

    >>> import re
    >>> foo = '((peach w/o juice) or apple or (pear w/o water) or kiwi or (lychee , sugar) or (pineapple w/o salt))'
    >>> regex = r'\([a-zA-Z ]*?w/o.*?\)'
    >>> re.findall(regex, foo)
    ['(peach w/o juice)', '(pear w/o water)', '(pineapple w/o salt)']

Here's the breakdown:
\( captures the leading parenthesis - note that it's escaped
[a-zA-Z ] captures alphabetical characters and a space (note the space after the Z, before the closing bracket). This is used instead of . so that no other parentheses are captured. Using the period operator would cause (lychee , sugar) or (pineapple w/o salt) to be captured as one match.
*? the * causes the characters in the bracket to match 0 or more times, and the ? says to only capture as many as are needed to make a match
w/o captures the "w/o" you're looking for
.*? captures more characters (again, non-greedy because of the ?)
\) captures the trailing parenthesis
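The breakdown above can also be written into the pattern itself. The following is a sketch using re.VERBOSE, assuming the character class is meant to be [a-zA-Z ]; the names pattern, with_class and with_dot are made up for illustration. It also runs the period-operator variant for comparison.

```python
import re

foo = ('((peach w/o juice) or apple or (pear w/o water) or kiwi '
       'or (lychee , sugar) or (pineapple w/o salt))')

# Same pattern, spelled out with re.VERBOSE. In verbose mode unescaped
# whitespace is ignored, but a space inside [...] is kept.
pattern = re.compile(r'''
    \(            # leading parenthesis, escaped
    [a-zA-Z ]*?   # letters and spaces, as few as needed
    w/o           # the literal text w/o
    .*?           # any characters, as few as needed
    \)            # trailing parenthesis, escaped
''', re.VERBOSE)

with_class = pattern.findall(foo)

# For comparison: replacing the character class with "." lets a match
# start at one opening parenthesis and run through a later group.
with_dot = re.findall(r'\(.*?w/o.*?\)', foo)

print(with_class)
# ['(peach w/o juice)', '(pear w/o water)', '(pineapple w/o salt)']
print(with_dot)
# ['((peach w/o juice)', '(pear w/o water)', '(lychee , sugar) or (pineapple w/o salt)']
```

Note that the character class only admits letters and spaces before "w/o", so a group such as (pear2 w/o water) would not match; a negated class like [^()] may be a better fit if the text before "w/o" can contain digits or punctuation.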