regex - What is the difference between [\s\S]*? and .*? in Java regular expressions?


I have developed a regular expression to identify a block of XML inside a text file. The expression looks like this (I have removed the Java escape slashes to make it easier to read):

<\?xml\s+version="[\d\.]+"\s*\?>\s*<\s*rdf:RDF[^>]*>[\s\S]*?<\s*\/\s*rdf:RDF\s*>

Then I optimised it and replaced the [\s\S]*? in the middle with .*?, and it stopped recognising the XML.

As far as I know, \s means whitespace characters and \S means non-whitespace characters (that is, [^\s]), so [\s\S] should logically be the same as . (any character). I didn't use greedy quantifiers. So what can the difference be?

The regex expressions . and [\s\S] are not equivalent, since . doesn't match line terminators (like a new line) by default.
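To make the difference concrete, here is a minimal sketch. The class name DotVsAnyChar, the helper found and the sample text are made up for illustration and are not from the question. It runs both lazy variants against a block whose content spans several lines:

```java
import java.util.regex.Pattern;

public class DotVsAnyChar {
    // Hypothetical sample input: a block whose content spans several lines.
    static final String TEXT = "<a>\nline one\nline two\n</a>";

    // Returns true if the regex matches anywhere in the input (default flags).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // [\s\S] is "whitespace or non-whitespace", so it also consumes '\n'.
        System.out.println(found("<a>[\\s\\S]*?</a>", TEXT)); // true

        // By default . stops at line terminators, so it cannot get past the first '\n'.
        System.out.println(found("<a>.*?</a>", TEXT));         // false
    }
}
```

If the same block is written on a single line, both patterns match, which is why the substitution can look harmless when tested on one-line input.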

According to the Oracle website, . matches "Any character (may or may not match line terminators)", while a line terminator is any of the following:

  • a newline (line feed) character ('\n'),
  • a carriage-return character followed immediately by a newline character ("\r\n"),
  • a standalone carriage-return character ('\r'),
  • a next-line character ('\u0085'),
  • a line-separator character ('\u2028'), or
  • a paragraph-separator character ('\u2029').

So the two expressions are not equivalent as long as the necessary flags are not set. Again quoting the Oracle website:

If UNIX_LINES mode is activated, then the only line terminators recognized are newline characters.

The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.
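The effect of the two flags quoted above can be sketched as follows. The class name DotFlags, the helper found and the sample strings are hypothetical; Pattern.DOTALL, Pattern.UNIX_LINES and the embedded (?s) flag are part of the standard java.util.regex API:

```java
import java.util.regex.Pattern;

public class DotFlags {
    // Returns true if the regex, compiled with the given flags, matches anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "<a>\nbody\n</a>"; // hypothetical multi-line input

        // Default flags: . does not match '\n', so the lazy match fails.
        System.out.println(found("<a>.*?</a>", 0, text));              // false

        // DOTALL makes . match line terminators too, like [\s\S].
        System.out.println(found("<a>.*?</a>", Pattern.DOTALL, text)); // true

        // The embedded flag (?s) is equivalent to passing Pattern.DOTALL.
        System.out.println(found("(?s)<a>.*?</a>", 0, text));          // true

        // UNIX_LINES: only '\n' counts as a line terminator, so . now matches '\r'.
        System.out.println(found("a.b", 0, "a\rb"));                   // false
        System.out.println(found("a.b", Pattern.UNIX_LINES, "a\rb"));  // true
        System.out.println(found("a.b", Pattern.UNIX_LINES, "a\nb"));  // false
    }
}
```

For the XML pattern in the question, this suggests two options: keep [\s\S]*?, or switch to .*? and compile the pattern with Pattern.DOTALL (or prefix it with (?s)).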

