regex - What is the difference between [\s\S]*? and .*? in Java regular expressions?
I have developed a regular expression to identify a block of XML inside a text file. The expression looks like this (I have removed the Java escape slashes to make it easier to read):
<\?xml\s+version="[\d\.]+"\s*\?>\s*<\s*rdf:RDF[^>]*>[\s\S]*?<\s*\/\s*rdf:RDF\s*>
Then I optimised it by replacing the [\s\S]*? in the middle with .*?, and it stopped recognising the XML.
As far as I know, \s means white-space characters and \S means non-white-space characters (the same as [^\s]), so [\s\S] should logically be equivalent to . and I didn't use greedy quantifiers. What could the difference be?
The regular expressions . and [\s\S] are not equivalent, since by default . does not match line terminators (such as a newline).
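To reproduce the behaviour from the question, here is a minimal sketch that compiles both versions of the expression and runs them against a multi-line RDF/XML block. The class name `XmlBlockDemo`, the helper `finds` and the sample input are made up for illustration, and it assumes the root element is written `rdf:RDF`, as is usual in RDF/XML.

```java
import java.util.regex.Pattern;

public class XmlBlockDemo {
    // Shared parts of the question's expression, with Java escaping restored.
    static final String HEAD = "<\\?xml\\s+version=\"[\\d\\.]+\"\\s*\\?>\\s*<\\s*rdf:RDF[^>]*>";
    static final String TAIL = "<\\s*\\/\\s*rdf:RDF\\s*>";

    // Original version: [\s\S]*? can cross line breaks.
    static final String CLASS_VERSION = HEAD + "[\\s\\S]*?" + TAIL;
    // "Optimised" version: .*? stops at the first line terminator.
    static final String DOT_VERSION = HEAD + ".*?" + TAIL;

    // True if the regex finds a match anywhere in the text.
    static boolean finds(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Hypothetical input: an XML block spread over several lines.
        String text = "some text before\n"
                + "<?xml version=\"1.0\"?>\n"
                + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n"
                + "  <rdf:Description/>\n"
                + "</rdf:RDF>\n"
                + "some text after\n";

        System.out.println(finds(CLASS_VERSION, text)); // true
        System.out.println(finds(DOT_VERSION, text));   // false
    }
}
```

Both versions still find the block when it sits on a single line; the difference only shows up once the content between the opening and closing `rdf:RDF` tags contains a line break.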
According to the Oracle website, . matches "any character (may or may not match line terminators)", while a line terminator is any of the following:
- a newline (line feed) character ('\n'),
- a carriage-return character followed by a newline character ("\r\n"),
- a standalone carriage-return character ('\r'),
- a next-line character ('\u0085'),
- a line-separator character ('\u2028'), or
- a paragraph-separator character ('\u2029').
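A quick way to check the list above is to test each single-character terminator against both patterns. This is a small sketch; the class and method names are illustrative only, and "\r\n" is left out because it is two characters long and so can never be matched by a single `.` or `[\s\S]`.

```java
import java.util.regex.Pattern;

public class LineTerminatorDemo {
    // The single-character line terminators from the list above.
    static final String[] TERMINATORS = {"\n", "\r", "\u0085", "\u2028", "\u2029"};

    // True if the default-mode "." matches the whole one-character input.
    static boolean dotMatches(String s) {
        return Pattern.matches(".", s);
    }

    // True if the character class [\s\S] matches the whole one-character input.
    static boolean classMatches(String s) {
        return Pattern.matches("[\\s\\S]", s);
    }

    public static void main(String[] args) {
        for (String t : TERMINATORS) {
            // Print the code point, since the characters themselves are invisible.
            System.out.printf("U+%04X  .: %b  [\\s\\S]: %b%n",
                    (int) t.charAt(0), dotMatches(t), classMatches(t));
        }
        // Expected: "." is false on every row, [\s\S] is true on every row.
    }
}
```

[\s\S] matches every character because each character is either whitespace or not, so it never depends on how line terminators are defined.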
The two expressions are not equivalent as long as the relevant flags are not set. Again quoting the Oracle website:
If UNIX_LINES mode is activated, then the only line terminators recognized are newline characters.
The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.
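The effect of the two flags quoted above can be sketched as follows, using `Pattern.DOTALL`, `Pattern.UNIX_LINES` and the embedded `(?s)` flag from `java.util.regex.Pattern`. The class name `FlagDemo` and the helper `dot` are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

public class FlagDemo {
    // True if a lone "." compiled with the given flags matches the whole input.
    static boolean dot(String input, int flags) {
        return Pattern.compile(".", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Default mode: "." rejects line terminators.
        System.out.println(dot("\n", 0));                    // false
        System.out.println(dot("\r", 0));                    // false
        // DOTALL: "." matches any character, line terminators included.
        System.out.println(dot("\n", Pattern.DOTALL));       // true
        // UNIX_LINES: only '\n' is a line terminator, so "." now accepts '\r' ...
        System.out.println(dot("\r", Pattern.UNIX_LINES));   // true
        // ... but still rejects '\n'.
        System.out.println(dot("\n", Pattern.UNIX_LINES));  // false
        // The embedded flag (?s) turns on DOTALL from inside the pattern.
        System.out.println(Pattern.matches("(?s).", "\n"));  // true
    }
}
```

So for the expression in the question, compiling the .*? version with `Pattern.DOTALL`, or prefixing it with `(?s)`, should make it behave like the [\s\S]*? version; UNIX_LINES alone would not help, because '\n' still stops the match.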