Why does this regex include the space in the last match?


I have the following text:

2 hcl + 12 na + 3 (na₃cl₂)₂₄ → 2 nacl + h₂

I want to match each molecule, including its coefficient. The regex below is mostly working, but the space character right before the last match is getting matched, and it shouldn't be. Here's the regex I'm using:

(([0-9]* ??\(*([a-z]+[₀-₉]*)+\)*[₀-₉]*))

If you look at the regex101 link, it might be easier to see what the problem is: https://regex101.com/r/hk7jy6/1
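The behaviour described above can also be reproduced outside regex101. The following is a minimal sketch, assuming Python's `re` module (the question does not name a language, so Python is only a stand-in here); the variable names are illustrative.

```python
import re

text = "2 hcl + 12 na + 3 (na₃cl₂)₂₄ → 2 nacl + h₂"

# The pattern from the question, unchanged.
pattern = re.compile(r"(([0-9]* ??\(*([a-z]+[₀-₉]*)+\)*[₀-₉]*))")

# group(0) is the whole match; repr() makes a leading space visible.
matches = [m.group(0) for m in pattern.finditer(text)]
for match in matches:
    print(repr(match))
```

The last item is printed with a leading space, which is the unwanted match the question is about.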

Update

If your strings are valid chemical formulae, why bother spelling out the subscripts, digits and letters? They are all non-whitespace symbols. Since there must be an obligatory letter or (, put them in a character class, [a-z(], and then append \S* (zero or more non-whitespace characters):

/(?:\d+ )?[a-z(]\S*/gi

See the regex demo. The (?:...)? construct is an optional non-capturing group, i.e. a group that is used only for grouping and does not capture (= store the submatch in a memory buffer).
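As a rough illustration of the simplified pattern, here is a sketch that assumes Python's `re` module, with `re.IGNORECASE` standing in for the `i` flag and `findall` standing in for the `g` flag. The pattern is written with `\S*` (zero or more non-whitespace characters), matching the description above.

```python
import re

text = "2 hcl + 12 na + 3 (na₃cl₂)₂₄ → 2 nacl + h₂"

# Optional "digits + space" group, then a letter or "(", then any
# run of non-whitespace characters.
pattern = re.compile(r"(?:\d+ )?[a-z(]\S*", re.IGNORECASE)

# There are no capturing groups, so findall() returns whole matches.
matches = pattern.findall(text)
print(matches)
```

Because the space can only be consumed together with preceding digits, the last match starts at the letter rather than at the space.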

Original answer with an explanation of the root cause

You have the digits and the space at the beginning of the pattern as two separate optional subpatterns. Instead, you need to match them obligatorily and place them together inside an optional group:

(?:[0-9]+ )?\(*([a-z]+[₀-₉]*)+\)*[₀-₉]* 

See the regex demo.

Your [0-9]* ?? is turned into (?:[0-9]+ )?. Note that you do not have to use the lazy version of the ? quantifier here; it will work the same way as the greedy one. I also removed the 2 unnecessary outer groupings (...).
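The point about lazy versus greedy can be checked on the sample text. This is only a sketch, assuming Python's `re` module and the sample string from the question; it compares the two variants on this one input and is not a general proof.

```python
import re

text = "2 hcl + 12 na + 3 (na₃cl₂)₂₄ → 2 nacl + h₂"

# Same pattern body, with a greedy and a lazy optional group in front.
body = r"\(*([a-z]+[₀-₉]*)+\)*[₀-₉]*"
greedy = re.compile(r"(?:[0-9]+ )?" + body)
lazy = re.compile(r"(?:[0-9]+ )??" + body)

# The body has a capturing group, so use finditer() and group(0)
# to collect whole matches.
greedy_matches = [m.group(0) for m in greedy.finditer(text)]
lazy_matches = [m.group(0) for m in lazy.finditer(text)]

print(greedy_matches == lazy_matches)
```

The lazy variant first tries to skip the group, but the rest of the pattern cannot start on a digit, so the engine backtracks and takes the group anyway.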

Since the (?:[0-9]+ )? group is optional, the space is only matched if there is a digit in front of it. If there is no digit, the next characters that can be matched are 0 or more (. Then an [a-z] letter must be present (if there is no (, the letter is the first character in the match).

Let me break it down:

  • (?:[0-9]+ )? - an optional group of 1 or more digits followed by a space
  • \(* - 0 or more ( (maybe you meant ? here)
  • ([a-z]+[₀-₉]*)+ - 1 or more sequences of 1 or more letters followed by 0 or more subscript digits
  • \)* - 0 or more ) (maybe you meant ? here)
  • [₀-₉]* - 0 or more subscript digits
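The breakdown above can be written out as an annotated pattern. This is a sketch that assumes Python's `re.VERBOSE` mode, which ignores unescaped whitespace in the pattern, so the literal space is written as `[ ]`.

```python
import re

text = "2 hcl + 12 na + 3 (na₃cl₂)₂₄ → 2 nacl + h₂"

pattern = re.compile(
    r"""
    (?:[0-9]+[ ])?     # optional group: 1 or more digits followed by a space
    \(*                # 0 or more opening parentheses
    ([a-z]+[₀-₉]*)+    # 1 or more runs of letters, each with optional subscript digits
    \)*                # 0 or more closing parentheses
    [₀-₉]*             # 0 or more trailing subscript digits
    """,
    re.VERBOSE,
)

# The pattern contains a capturing group, so collect group(0) explicitly.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)
```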

If you want to make sure you do not match (ca or h), you should split \(*...\)* like this:

(?:[0-9]+ )?(?:(?:[a-z]+[₀-₉]*)+|\((?:[a-z]+[₀-₉]*)+\))[₀-₉]* 

See another demo.
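To see what the split buys you, the sketch below compares the two patterns on unbalanced input. It assumes Python's `re` module; the names `loose`, `strict` and `whole_matches` are illustrative only.

```python
import re

# Pattern with independent \(* and \)* (parentheses need not be balanced).
loose = re.compile(r"(?:[0-9]+ )?\(*([a-z]+[₀-₉]*)+\)*[₀-₉]*")

# Pattern with the alternation: either no parentheses or a matched pair.
strict = re.compile(
    r"(?:[0-9]+ )?(?:(?:[a-z]+[₀-₉]*)+|\((?:[a-z]+[₀-₉]*)+\))[₀-₉]*"
)

def whole_matches(pattern, text):
    """Return the full text of every match, ignoring capture groups."""
    return [m.group(0) for m in pattern.finditer(text)]

for sample in ["(ca", "h)", "3 (na₃cl₂)₂₄"]:
    print(sample, whole_matches(loose, sample), whole_matches(strict, sample))
```

With the stricter pattern, a lone parenthesis is left out of the match, while a balanced group such as (na₃cl₂)₂₄ is still matched as a whole.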

