PHP - Get words from string - skip HTML
I use a function to get the first "x" words of a string. The main part is:
preg_match_all('/(<\/?([\w+]+)[^>]*>)?([^<>]*)/', $text, $tags, PREG_SET_ORDER);
When a word is inside an HTML tag, for example:
<a href="/"><u>linktext</u></a>
the regex sees "linktext" as a word. The regex should be changed to skip every word inside an HTML tag.
Is this possible?
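To make the behaviour described in the question concrete, here is a minimal sketch that runs the same pattern on the example markup and collects every non-empty text capture. The variable name `$found` is an assumption for illustration and is not part of the original function.

```php
<?php
// Reproduces the reported behaviour: the text capture (group 3)
// picks up "linktext" even though it sits inside <a> and <u>.
$text = '<a href="/"><u>linktext</u></a>';

preg_match_all('/(<\/?([\w+]+)[^>]*>)?([^<>]*)/', $text, $tags, PREG_SET_ORDER);

$found = []; // hypothetical: list of [preceding tag name, captured text]
foreach ($tags as $match) {
    // $match[2] is the name of the tag directly before the text,
    // $match[3] is the text that follows that tag.
    if ($match[3] !== '') {
        $found[] = [$match[2], $match[3]];
    }
}

foreach ($found as [$tagName, $word]) {
    echo $tagName, ' => ', $word, "\n"; // prints "u => linktext"
}
```

The pattern only knows which tag came immediately before a run of text, not how deeply that text is nested, which is why it cannot skip the word on its own.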
You can use XSL transformations. I used the template from a related answer (how to remove text from an XML document):
$string = '<a href="/">some text <u>linktext</u> more text</a>';

// The XSL namespace URI is case-sensitive.
$xslTemplate = '<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <!-- copy nodes -->
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- clear attributes -->
    <xsl:template match="@*">
        <xsl:attribute name="{name()}" />
    </xsl:template>
    <!-- ignore text content of nodes -->
    <xsl:template match="text()" />
</xsl:stylesheet>';

// Collect HTML parser warnings instead of printing them.
libxml_use_internal_errors(true);

$inputDom = new DOMDocument();
$inputDom->loadHTML($string);

$xslDom = new DOMDocument();
$xslDom->loadXML($xslTemplate);

$cp = new XSLTProcessor();
$cp->registerPHPFunctions();
$cp->importStylesheet($xslDom);

$transformedResult = $cp->transformToDoc($inputDom);
$transformedHtmlString = $transformedResult->saveXML(
    $transformedResult->getElementsByTagName('body')->item(0)
);

// saveXML() keeps the automatically created <body> tag, so strip it.
$transformedHtmlString = str_replace('<body>', '', $transformedHtmlString);
$transformedHtmlString = str_replace('</body>', '', $transformedHtmlString);

echo $transformedHtmlString;
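For comparison, here is a rough sketch of a different technique from the XSL approach above: split the string into tags and text, track the nesting depth, and collect words only while outside every element. The function name `firstWordsOutsideTags`, the `$voidTags` list and the sample input are assumptions made for illustration. It assumes reasonably well-formed markup (balanced tags, no `>` inside attribute values or comments) and is not a full HTML parser.

```php
<?php
// Hypothetical helper, not from the original post: returns the first $limit
// words that are not nested inside any HTML element.
function firstWordsOutsideTags(string $html, int $limit): array
{
    if ($limit <= 0) {
        return [];
    }

    // Split into tag tokens and text tokens, keeping the tags.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // Assumed short list of elements that never have a closing tag.
    $voidTags = ['br', 'hr', 'img', 'input', 'meta', 'link'];
    $depth = 0;
    $words = [];

    foreach ($tokens as $token) {
        if (preg_match('/^<[^>]*>$/', $token)) {
            // Comments and doctypes do not match and leave the depth unchanged.
            if (!preg_match('/^<(\/?)([a-zA-Z][\w-]*)/', $token, $m)) {
                continue;
            }
            $isVoid = in_array(strtolower($m[2]), $voidTags, true)
                || substr($token, -2) === '/>';
            if ($m[1] === '/') {
                $depth = max(0, $depth - 1); // closing tag
            } elseif (!$isVoid) {
                $depth++;                    // opening tag
            }
            continue;
        }

        if ($depth > 0) {
            continue; // text inside an element: skip its words
        }

        foreach (preg_split('/\s+/', $token, -1, PREG_SPLIT_NO_EMPTY) as $word) {
            $words[] = $word;
            if (count($words) >= $limit) {
                return $words;
            }
        }
    }

    return $words;
}

$sample = 'intro <a href="/"><u>linktext</u></a> outro words here';
echo implode(' ', firstWordsOutsideTags($sample, 3)), "\n"; // prints "intro outro words"
```

Tracking depth with a counter keeps the sketch short; for markup that may be malformed, a DOM-based walk would likely be more robust.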