python - Using regular expression to find specific strings between parentheses (including parentheses)
I am trying to use a regular expression to find specific strings between parentheses in a string like the one below:
foo = '((peach w/o juice) or apple or (pear w/o water) or kiwi or (lychee , sugar) or (pineapple w/o salt))'
Specifically, I want to find (peach w/o juice), (pear w/o water), and (pineapple w/o salt).
I tried lookahead and lookbehind, but I was unable to obtain the correct results.
For example, when I use the following regex:
import re
regex = r'(?<=[\s\(])\([^\)].*\sw/o\s[^\)].*\)(?=[\)\s])'
re.findall(regex, foo)
I end up with the entire string:
['(peach w/o juice) or apple or (pear w/o water) or kiwi or (lychee , sugar) or (pineapple w/o salt)']
Edit:
I found the problem: instead of [^\)].* it should be [^\)]* which gives me the correct result:
regex = r'(?<=[\s\(])\([^\)]*\sw/o\s[^\)]*\)(?=[\)\s])'
re.findall(regex, foo)
['(peach w/o juice)', '(pear w/o water)', '(pineapple w/o salt)']
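For anyone who wants to reproduce the difference, here is a small side-by-side sketch of the two patterns from the question. The names greedy, fixed, greedy_matches and fixed_matches are only illustrative and are not part of the original post.

```python
import re

foo = '((peach w/o juice) or apple or (pear w/o water) or kiwi or (lychee , sugar) or (pineapple w/o salt))'

# First attempt: ".*" is allowed to run past a closing parenthesis.
greedy = r'(?<=[\s\(])\([^\)].*\sw/o\s[^\)].*\)(?=[\)\s])'
# Corrected attempt: "[^\)]*" can never cross a ")".
fixed = r'(?<=[\s\(])\([^\)]*\sw/o\s[^\)]*\)(?=[\)\s])'

greedy_matches = re.findall(greedy, foo)
fixed_matches = re.findall(fixed, foo)

print(len(greedy_matches))  # 1 (a single match spanning every group)
print(fixed_matches)        # ['(peach w/o juice)', '(pear w/o water)', '(pineapple w/o salt)']
```

The negated class does the real work here: because it excludes ")", each match has to end at the first closing parenthesis it reaches.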
I think the problem is that your .* operators are greedy: they will consume as much as they can if you don't put a ? after them, as in .*?. Also, note that since you want the parentheses, you shouldn't need the lookahead/lookbehind operations; they exclude the parentheses they find.
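A minimal sketch of both points on a shorter string. The string text and the names greedy, lazy and inner are made up for illustration.

```python
import re

text = '(a) or (b)'

# Greedy: ".*" grabs as much as possible, so the match runs to the last ")".
greedy = re.findall(r'\(.*\)', text)
# Non-greedy: ".*?" stops at the first ")" that completes a match.
lazy = re.findall(r'\(.*?\)', text)
# Lookbehind/lookahead only assert; the parentheses are not part of the match.
inner = re.findall(r'(?<=\().*?(?=\))', text)

print(greedy)  # ['(a) or (b)']
print(lazy)    # ['(a)', '(b)']
print(inner)   # ['a', 'b']
```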
Instead of debugging your regex, I decided to rewrite it:
>>> import re
>>> foo = '((peach w/o juice) or apple or (pear w/o water) or kiwi or (lychee , sugar) or (pineapple w/o salt))'
>>> regex = r'\([a-zA-Z ]*?w/o.*?\)'
>>> re.findall(regex, foo)
['(peach w/o juice)', '(pear w/o water)', '(pineapple w/o salt)']
Here's a breakdown:
\(
captures the leading parenthesis - note that it's escaped
[a-zA-Z ]
captures alphabetical characters and a space (note the space after the Z, before the closing bracket). This is used instead of . so that no other parentheses are captured. Using the period operator would cause (lychee , sugar) or (pineapple w/o salt) to be captured as one match.
*?
The * causes the characters in the bracket to match 0 or more times, and the ? says to only capture as many as are needed to make a match
w/o
captures the "w/o" that you're looking for
.*?
captures more characters (again, non-greedy because of the ?)
\)
captures the trailing parenthesis
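The breakdown above can also be written as a commented pattern with re.VERBOSE, which might be easier to read. This is only a sketch: it assumes the character class was meant to be [a-zA-Z ], and the names pattern, matches and dot_matches are mine. The last part swaps in the period operator to show the merged match mentioned above.

```python
import re

foo = '((peach w/o juice) or apple or (pear w/o water) or kiwi or (lychee , sugar) or (pineapple w/o salt))'

# In verbose mode, whitespace inside a character class is still significant,
# so the space in [a-zA-Z ] is kept.
pattern = re.compile(r'''
    \(            # leading parenthesis, escaped
    [a-zA-Z ]*?   # letters or a space, as few as needed
    w/o           # the literal text "w/o"
    .*?           # any further characters, as few as needed
    \)            # trailing parenthesis, escaped
''', re.VERBOSE)

matches = pattern.findall(foo)
print(matches)  # ['(peach w/o juice)', '(pear w/o water)', '(pineapple w/o salt)']

# Same idea with "." in place of the character class: "." also matches
# parentheses, so a match can start in one group and end in another.
dot_matches = re.findall(r'\(.*?w/o.*?\)', foo)
print(dot_matches[-1])  # '(lychee , sugar) or (pineapple w/o salt)'
```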