php - Get the first words of a string, skipping words inside HTML tags


I use a function that returns the first X words of a string. The main part is:

preg_match_all('/(<\/?([\w+]+)[^>]*>)?([^<>]*)/', $text, $tags, PREG_SET_ORDER);

The problem is words that sit inside HTML elements, for example:

<a href="/"><u>linktext</u></a> 

Here the regex still counts "linktext" as a word. How can the regex be changed so that it skips every word inside an HTML tag?

Is this possible?
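If the regex from the question is kept, one way to skip words inside elements is to track the nesting depth from the captured tag (group 1) and only count words from the text in group 3 while the depth is zero. Because each match is "optional tag, then the text after it", the depth is updated before the text is read. This is a minimal sketch, not the accepted answer: the function name `firstWordsOutsideTags` is hypothetical, and it assumes well-formed markup with no void or self-closing tags such as `<br>` or `<img/>`, which would leave the depth counter off by one.

```php
<?php
// Hypothetical helper: returns up to $limit words that are NOT inside any HTML element.
// Assumption: well-formed markup without void/self-closing tags (<br>, <img/>),
// because those would increase the depth counter without a matching close.
function firstWordsOutsideTags(string $text, int $limit): array
{
    // Same pattern as in the question: group 1 = optional tag, group 3 = text after it.
    preg_match_all('/(<\/?([\w+]+)[^>]*>)?([^<>]*)/', $text, $parts, PREG_SET_ORDER);

    $depth = 0;   // number of elements we are currently nested in
    $words = [];
    foreach ($parts as $part) {
        $tag = $part[1] ?? '';
        if ($tag !== '') {
            // A tag starting with "</" closes an element; anything else opens one.
            $depth += (substr($tag, 0, 2) === '</') ? -1 : 1;
        }
        if ($depth !== 0) {
            continue; // the text in this match lies inside an element: skip it
        }
        $chunk = $part[3] ?? '';
        foreach (preg_split('/\s+/', $chunk, -1, PREG_SPLIT_NO_EMPTY) as $word) {
            $words[] = $word;
            if (count($words) === $limit) {
                return $words;
            }
        }
    }
    return $words;
}
```

For example, `implode(' ', firstWordsOutsideTags('one two <a href="/"><u>linktext</u></a> three four', 3))` gives `one two three`: "linktext" is skipped because it is two elements deep.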

Use an XSL transformation. The template below is adapted from a related answer (how to remove text from an XML document):

$string = '<a href="/">some text <u>linktext</u> more text</a>';

$xslTemplate = '<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- copy all nodes -->
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- keep attributes but clear their values -->
  <xsl:template match="@*">
    <xsl:attribute name="{name()}" />
  </xsl:template>
  <!-- drop the text content of all nodes -->
  <xsl:template match="text()" />
</xsl:stylesheet>';

// Suppress libxml warnings about the HTML fragment
libxml_use_internal_errors(true);

$inputDom = new DOMDocument();
$inputDom->loadHTML($string);

$xslDom = new DOMDocument();
$xslDom->loadXML($xslTemplate);

$processor = new XSLTProcessor();
$processor->registerPHPFunctions();
$processor->importStylesheet($xslDom);

$transformedResult = $processor->transformToDoc($inputDom);
// loadHTML() wraps the fragment in <html><body>, so serialize only the <body> element
$transformedHtmlString = $transformedResult->saveXML($transformedResult->getElementsByTagName('body')->item(0));

// saveXML() still outputs the automatically created <body> tag, so strip it
$transformedHtmlString = str_replace('<body>', '', $transformedHtmlString);
$transformedHtmlString = str_replace('</body>', '', $transformedHtmlString);
echo $transformedHtmlString;
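If the XSL extension is not available, a similar result can probably be obtained with the DOM extension alone: select every text node with XPath and remove it, then blank every attribute value. This swaps in a different technique (DOMXPath instead of XSLT), and `stripTextKeepTags` is a hypothetical name. It assumes libxml 2.7.8 or later for `LIBXML_HTML_NOIMPLIED`; wrapping the input in a `<div>` with that flag avoids the implied `<html><body>` wrapper, so no `str_replace()` is needed. The exact serialization (for example of empty attributes) may differ from the XSLT output.

```php
<?php
// Sketch: strip all text and blank every attribute value using only the DOM extension.
// Hypothetical helper name; mirrors what the XSLT template does.
function stripTextKeepTags(string $html): string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // The <div> wrapper gives a single root element; NOIMPLIED/NODEFDTD skip <html><body> and DOCTYPE.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // iterator_to_array() copies the node list first, so removing nodes while looping is safe.
    foreach (iterator_to_array($xpath->query('//text()')) as $textNode) {
        $textNode->parentNode->removeChild($textNode);
    }
    // Keep attribute names but clear their values, like the @* template.
    foreach (iterator_to_array($xpath->query('//@*')) as $attr) {
        $attr->value = '';
    }

    // Serialize the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling `stripTextKeepTags('<a href="/">some text <u>linktext</u> more text</a>')` returns the `<a>` and `<u>` tags with all of their text removed.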
