Textpattern CMS support forum
Can I use xPath & txp:evaluate to match just 1 block of the body?
I have a site with over 1000 blog articles ported over from WordPress. In one section, all articles start with:
*Place/town* Rest of the article body here …
For the article preview cards, I’d like to retrieve just the strong text at the beginning of the article, if it exists. It has to be the bit at the beginning of the article, not the first occurrence of strong text (which might be mid-text if the intro location has been omitted).
I can do this with preg_match and a regex in a txp:php block:
<txp:php>
global $variable;
$variable['standort'] = preg_match("/^<p>\s?<strong>(.+?)<\/strong>/", parse('<txp:body />'), $location) ? trim($location[1], '.:') : '';
</txp:php>
… but I can’t help thinking there might be a better way using txp:evaluate and xPath. Am I right in understanding from the docs that only the xPath string functions are available, not the node search ability?
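To make the intended behaviour concrete, here is a minimal standalone sketch of the regex above, runnable outside Textpattern. The function name `extract_standort()` and the sample strings are made up for illustration; inside Textpattern the input would come from `parse('<txp:body />')` as in the block above.

```php
<?php
// Hypothetical helper for illustration only; not part of Textpattern.
function extract_standort(string $html): string
{
    // Same pattern as above: a <strong> that opens the very first paragraph.
    // The capture is trimmed of trailing/leading '.' and ':' characters.
    return preg_match('/^<p>\s?<strong>(.+?)<\/strong>/', $html, $m)
        ? trim($m[1], '.:')
        : '';
}

// Location at the very start of the body: captured.
echo extract_standort('<p><strong>Berlin:</strong> Rest of the article.</p>'), "\n";

// Strong text only mid-paragraph: nothing is captured, so an empty line is printed.
echo extract_standort('<p>No location here, but <strong>bold</strong> later.</p>'), "\n";
```

The `^<p>` anchor is what stops a mid-text strong from matching. It also means the pattern depends on the body starting with exactly that markup.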
TXP Builders – finely-crafted code, design and txp
Re: Can I use xPath & txp:evaluate to match just 1 block of the body?
From memory, we only support a subset of XPath 1.0, which includes string search.
If you’re concerned about regex (it is quite the blunt instrument sometimes), is there any mileage in operating on body_html and using DOMDocument to parse out the strong tag? It’s a bit more wordy, but it might be more robust.
The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.
Re: Can I use xPath & txp:evaluate to match just 1 block of the body?
Ah yes, that came up recently in another context. DOMDocument may be more robust, but it comes with its own idiosyncrasies, as I discovered by trial, error and searching: you need to silence HTML5 errors and add a prefix so that UTF-8 encoded characters are processed correctly. This seems to work:
<txp:php>
global $variable;
$body = new DOMDocument;
// xml encoding prefix to support utf-8 chars
// LIBXML_NOERROR to silence unrecognized HTML5 tag errors
$body->loadHTML('<?xml encoding="utf-8" ?>' . parse('<txp:body />'), LIBXML_NOERROR);
// first 'p' tag (null when the body contains no paragraphs)
$first_p = $body->getElementsByTagName('p')->item(0);
// first child of first 'p' tag
$body_start = $first_p ? $first_p->firstChild : null;
$variable['standort'] = ($body_start instanceof DOMElement && $body_start->tagName == 'strong') ? $body_start->textContent : '';
</txp:php>
I still require a <txp:php> block for this, though.
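For comparison, the same check can be written as a node search with PHP's own DOMXPath class. This is only a sketch: it uses PHP's XPath support, not txp:evaluate, and says nothing about whether txp:evaluate accepts the same expression. The function name `standort_via_xpath()` is hypothetical, and the HTML string stands in for `parse('<txp:body />')`.

```php
<?php
// Hypothetical helper for illustration only; not part of Textpattern.
function standort_via_xpath(string $html): string
{
    $doc = new DOMDocument;
    // Same workarounds as above: UTF-8 prefix, and LIBXML_NOERROR
    // to silence errors about unrecognized HTML5 tags.
    $doc->loadHTML('<?xml encoding="utf-8" ?>' . $html, LIBXML_NOERROR);

    $xpath = new DOMXPath($doc);
    // (//p)[1]          first 'p' in document order
    // node()[1]         its first child node, whatever type it is
    // [self::strong]    kept only if that node is a 'strong' element
    // string(...) yields '' when nothing matches.
    return trim($xpath->evaluate('string((//p)[1]/node()[1][self::strong])'));
}

echo standort_via_xpath('<p><strong>Berlin</strong> Rest of the article.</p>'), "\n";
```

Unlike the regex version, this does not trim '.' or ':' from the result, and text before the strong tag inside the paragraph (including whitespace) makes it return an empty string.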