
Textpattern CMS support forum


#1 2024-07-31 08:20:23

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 4,702

Can I use xPath & txp:evaluate to match just 1 block of the body?

I have a site with over 1000 blog articles ported over from WordPress. In one section, all articles start with:

*Place/town* Rest of the article body here …

For the article preview cards, I’d like to retrieve just the strong text at the beginning of the article, if it exists. It has to be the bit at the beginning of the article, not the first occurrence of strong text (which might be mid-text if the intro location has been omitted).

I’ve got something working with preg_match and a regular expression in a txp:php block:
<txp:php>
    global $variable;
    $variable['standort'] = preg_match("/^<p>\s?<strong>(.+?)<\/strong>/", parse('<txp:body />'), $location) ? trim($location[1], '.:') : '';
</txp:php>

… but I can’t help thinking there might be a better way using txp:evaluate and xPath. Am I right in reading the docs to mean that only the xPath string functions are available, not node searches?
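For reference, the preg_match approach can be pulled out of Textpattern and tried on plain strings. A minimal sketch, assuming the HTML string stands in for the output of parse('<txp:body />'); extract_location() is a hypothetical helper name, not a Textpattern function. The pattern and the trim() are the same as in the snippet above.

```php
<?php
// Standalone sketch of the preg_match approach, outside Textpattern.
// $html stands in for parse('<txp:body />'); extract_location() is hypothetical.
function extract_location(string $html): string
{
    // Anchored with ^, so only a <strong> that opens the very first <p> matches;
    // a <strong> further into the text is ignored.
    if (preg_match('/^<p>\s?<strong>(.+?)<\/strong>/', $html, $m)) {
        // Strip leading/trailing "." and ":" as in the original snippet.
        return trim($m[1], '.:');
    }
    return '';
}

echo extract_location('<p><strong>Berlin:</strong> Rest of the article</p>'), "\n"; // Berlin
echo extract_location('<p>Intro text with <strong>bold</strong> later</p>'), "\n";  // (empty string)
```

Because the pattern is anchored on a literal `<p>`, a body that starts with whitespace or with a paragraph carrying attributes (e.g. `<p class="…">`) would not match; a looser prefix such as `^\s*<p[^>]*>` would tolerate both.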


TXP Builders – finely-crafted code, design and txp


#2 2024-07-31 08:36:01

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,419

Re: Can I use xPath & txp:evaluate to match just 1 block of the body?

From memory, we only support a subset of XPath 1.0, which includes string search.

If you’re concerned about regex (it is quite the blunt instrument sometimes), is there any mileage in operating on body_html and using DOMDocument to parse out the strong tag? It’s a bit more verbose, but it might be more robust.
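A minimal sketch of the DOMDocument idea as a standalone function, assuming the body HTML (body_html, or the output of parse('<txp:body />')) is passed in as a string. location_from_dom() and first_content_node() are hypothetical names. To honour the "beginning of the article" requirement, it only accepts a <strong> when the body's first element is a <p> and the <strong> is that paragraph's first non-blank child, rather than looking at the first <p> anywhere in the article.

```php
<?php
// Sketch: accept a <strong> only if the body's first element is a <p>
// and the <strong> is the first real content of that <p>.
// location_from_dom() and first_content_node() are hypothetical names.

// Return the first child that is an element or non-blank text, skipping
// whitespace-only text, comments and processing instructions.
function first_content_node(DOMNode $parent): ?DOMNode
{
    foreach ($parent->childNodes as $node) {
        if ($node->nodeType === XML_ELEMENT_NODE) {
            return $node;
        }
        if ($node->nodeType === XML_TEXT_NODE && trim($node->textContent) !== '') {
            return $node;
        }
    }
    return null;
}

function location_from_dom(string $html): string
{
    $doc = new DOMDocument();
    // Encoding prefix so libxml reads the input as UTF-8;
    // LIBXML_NOERROR silences warnings about HTML5 tags.
    $doc->loadHTML('<?xml encoding="utf-8" ?>' . $html, LIBXML_NOERROR);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $first = first_content_node($body);
    if (!$first instanceof DOMElement || $first->tagName !== 'p') {
        return ''; // article doesn't open with a paragraph
    }

    $lead = first_content_node($first);
    if (!$lead instanceof DOMElement || $lead->tagName !== 'strong') {
        return ''; // paragraph doesn't open with <strong>
    }
    return trim($lead->textContent, '.: ');
}
```

Walking the child nodes explicitly, instead of taking `childNodes->item(0)`, means a leading newline between `<p>` and `<strong>` doesn't hide the match.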


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.



#3 2024-07-31 09:11:52

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 4,702

Re: Can I use xPath & txp:evaluate to match just 1 block of the body?

Ah yes, that came up recently in another context. DOMDocument may be more robust, but it comes with its own idiosyncrasies, as I discovered by trial, error and searching: you need to silence HTML5 errors and add a prefix to process UTF-8 encoded characters. This seems to work:

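<txp:php>
    global $variable;

    $body = new DOMDocument;
    // xml encoding prefix to support utf-8 chars
    // LIBXML_NOERROR to silence unrecognized HTML5 tag errors
    $body->loadHTML('<?xml encoding="utf-8" ?>' . parse('<txp:body />'), LIBXML_NOERROR);

    // first 'p' tag, if the body has one (avoids an error on articles without a <p>)
    $first_p = $body->getElementsByTagName('p')->item(0);

    // first child of that 'p', skipping a leading whitespace-only text node
    $body_start = $first_p ? $first_p->firstChild : null;
    if ($body_start && $body_start->nodeType === XML_TEXT_NODE && trim($body_start->textContent) === '') {
        $body_start = $body_start->nextSibling;
    }

    $variable['standort'] = ($body_start instanceof DOMElement && $body_start->tagName === 'strong') ? $body_start->textContent : '';
</txp:php>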

I still need a <txp:php> block for this, though.
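While <txp:evaluate> only exposes a subset of XPath, PHP's own DOMXPath class evaluates full XPath 1.0, node tests included, so inside the same <txp:php> block the "starts with <p><strong>" check can be written as one query. A sketch, assuming the same loadHTML() setup as the code above; location_from_xpath() is a hypothetical name, and trimming spaces as well as "." and ":" is an assumption layered on the preg_match version's trim().

```php
<?php
// Sketch: DOMXPath evaluates full XPath 1.0 (node tests included),
// so the "article starts with <p><strong>" check fits in one query.
// location_from_xpath() is a hypothetical helper name.
function location_from_xpath(string $html): string
{
    $doc = new DOMDocument();
    // Same setup as the forum code: UTF-8 hint prefix, HTML5 errors silenced.
    $doc->loadHTML('<?xml encoding="utf-8" ?>' . $html, LIBXML_NOERROR);
    $xpath = new DOMXPath($doc);

    // //body/*[1][self::p]  : first element in <body>, only if it is a <p>
    // node()[...][1]        : its first child that is an element or non-blank text
    // [self::strong]        : kept only if that child is a <strong>
    $query = '//body/*[1][self::p]'
           . '/node()[self::* or self::text()[normalize-space()]][1][self::strong]';
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return '';
    }
    // Trim ".", ":" and spaces from both ends.
    return trim($nodes->item(0)->textContent, '.: ');
}
```

Inside Textpattern this could be called as `$variable['standort'] = location_from_xpath(parse('<txp:body />'));`. Since a form used in an article list runs once per article, wrapping the function definition in `if (!function_exists('location_from_xpath')) { … }` avoids a "Cannot redeclare" fatal error on the second article.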





Powered by FluxBB