Can I use XPath & txp:evaluate to match just 1 block of the body?

jakob · 2024-07-31 08:20:23

I have a site with over 1000 blog articles ported over from WordPress. In one section, all articles start with:

*Place/town* Rest of the article body here …

For the article preview cards, I’d like to retrieve just the strong text at the beginning of the article, if it exists. It has to be the bit at the beginning of the article, not the first occurrence of strong text (which might be mid-text if the intro location has been omitted).

I’ve got something working with preg_match and a regular expression in a txp:php block:

<txp:php>
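    global $variable;
    // Match <strong> only when it opens the first paragraph of the body,
    // then strip any '.' or ':' from either end of the captured location.
    $variable['standort'] = preg_match('/^<p>\s?<strong>(.+?)<\/strong>/', parse('<txp:body />'), $location) ? trim($location[1], '.:') : '';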
</txp:php>

… but I can’t help thinking there might be a better way using txp:evaluate and XPath. Am I right in understanding from the docs that only the XPath string functions are available, not the node search ability?

Bloke · 2024-07-31 08:36:01

From memory, we only support a subset of XPath 1.0, which includes string search.

If you’re concerned about regex (it is quite the blunt instrument sometimes), is there any mileage in operating on body_html and using DOMDocument to parse out the strong tag? A bit more wordy code, but it might be more robust.

jakob · 2024-07-31 09:11:52

Ah yes, that came up recently in another context. DOMDocument may be more robust, but it comes with its own idiosyncrasies, as I discovered by trial, error and searching: you need to silence HTML5 errors and add a prefix to process UTF-8 encoded characters. This seems to work:

<txp:php>
    global $variable;

    $body = new DOMDocument;
    // xml encoding prefix to support utf-8 chars
    // LIBXML_NOERROR to silence unrecognized HTML5 tag errors
    $body->loadHTML('<?xml encoding="utf-8" ?>' . parse('<txp:body />'), LIBXML_NOERROR);

    // first child of first 'p' tag (guarded, in case the body has no 'p' tag)
    $first_p = $body->getElementsByTagName('p')->item(0);
    $body_start = $first_p ? $first_p->firstChild : null;
    $variable['standort'] = ($body_start instanceof DOMElement && $body_start->tagName == 'strong') ? $body_start->textContent : '';
</txp:php>

I still require a <txp:php> block for this, though.
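
For comparison, here is a minimal sketch of the same check written with PHP's DOMXPath class, which does offer node selection on top of the string functions. It is an illustration only, not something txp:evaluate is confirmed to support: the helper name `leading_strong_text` is hypothetical, and the HTML is passed in as a plain string so the function can be tried outside Textpattern.

```php
<?php
// Hypothetical helper: returns the text of a <strong> element only when it is
// the very first node inside the first <p> of the given HTML fragment.
function leading_strong_text(string $html): string
{
    $doc = new DOMDocument;
    // Same workarounds as in the DOMDocument code above:
    // a UTF-8 prefix, and LIBXML_NOERROR to silence parser errors.
    $doc->loadHTML('<?xml encoding="utf-8" ?>' . $html, LIBXML_NOERROR);

    $xpath = new DOMXPath($doc);
    // (//p)[1]       : the first <p> in document order
    // /node()[1]     : its first child node of any kind (text or element)
    // [self::strong] : keep that node only if it is a <strong> element
    // string(...)    : its text content, or '' when nothing matched
    return $xpath->evaluate('string((//p)[1]/node()[1][self::strong])');
}
```

Inside a <txp:php> block the opening `<?php` tag would be dropped and the call would presumably look like `$variable['standort'] = leading_strong_text(parse('<txp:body />'));`. Wrapping the result in `trim(…, '.:')` would reproduce the punctuation clean-up of the regex version. Like the DOMDocument code, this treats leading whitespace inside the paragraph as a text node, so a body starting with `<p> <strong>` would yield an empty string.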

Textpattern CMS

Textpattern CMS support forum
