etc_query: all things Textpattern

aslsw66 · 2014-10-29 13:09:24

I have a URL in the format www.domainname.com/folder/index.php?query1=value1&query2=value2#hashvalue.

What I want to be able to do is reproduce this URL without the final hash value, e.g. www.domainname.com/folder/index.php?query1=value1&query2=value2 .

I get a sense that etc_query should be able to do this sort of thing, but I’m not clever enough to know if it’s true or not. I’ve been researching XPath, but while I can see that text can be searched for, it’s not clear how to either delete it completely or end the string at the point where the text is found. It’s annoying to admit this, because I could make this work in programming languages.

Any ideas?

jakob · 2014-10-29 15:43:22

You should be able to get this bit ?query1=value1&query2=value2 without the hash with this:

<txp:php>echo $_SERVER['QUERY_STRING'];</txp:php>

… and then reconstruct the rest with txp:site_url, txp:section etc. You might also be able to get this via txp:page_url like you can request_uri, but I’m not sure.

And if it’s not a txp url, you can use

<txp:smd_wrap transform="split|#||first">
  www.domainname.com/folder/index.php?query1=value1&query2=value2#hashvalue
</txp:smd_wrap>

or rah_function and a PHP string function such as strtok or explode (the old split function is deprecated) to lop off the hash at the end of the string…

etc · 2014-10-29 20:36:22

aslsw66 wrote #285212:

I have a URL in the format www.domainname.com/folder/index.php?query1=value1&query2=value2#hashvalue.

What I want to be able to do is reproduce this URL without the final hash value eg. www.domainname.com/folder/index.php?query1=value1&query2=value2 .

I presume this URL is the value of some href attribute inside a chunk of HTML code (otherwise Jakob’s advice is probably easier to follow). You can try this:

<txp:etc_query data="your html code" functions="preg_replace"
	replace="//a/@href={preg_replace('/#.*$/', '', string())}" />

It’s actually more php than xpath.

aslsw66 · 2014-10-29 21:24:20

And if it’s not a txp url, you can use

It’s likely to become more complex than a straight TXP URL as this will be driving some admin functionality at the front-end for specific users.

But Jakob, thanks for the pointers to smd_wrap and rah_function. As usual, now that you point them out I remember they exist, but also, as usual, I struggle to remember all of the great plugins out there.

It’s actually more php than xpath.

I did play around with preg_replace but frankly it feels like black magic to me – it’s very powerful but highly codified. Can I admit to tip-toeing around etc_query too? So thanks for the tip.

It looks like I have three options to play with!

jakob · 2014-10-30 14:03:14

I struggle to remember all of the great plugins out there

Me too, especially after being away from things for a while.

preg_replace is also doable with smd_wrap and the transform="replace|regex|..." attribute. I also struggle with regex, but there are various helper sites (and apps like Patterns for the Mac or a similar regex tool for Windows) that make life easier. Oleg’s pattern first of all looks only in href attributes, courtesy of his plugin, and then does the following:

The / … / just delimits the pattern to search for.
The pattern starts with a literal #, then .* matches zero or more of any character, until $ (the end of the string) is reached.
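
To make the pattern breakdown above concrete outside Textpattern, here is a minimal standalone sketch in Python (not Textpattern code; the function name strip_fragment is made up for illustration). It applies the same #.*$ pattern to the example URL from the thread.

```python
import re

# Same pattern as in the etc_query example: a literal "#", then any
# characters (.*) up to the end of the string ($).
FRAGMENT = re.compile(r"#.*$")


def strip_fragment(url: str) -> str:
    """Return url with everything from the first '#' onward removed."""
    return FRAGMENT.sub("", url)


url = "www.domainname.com/folder/index.php?query1=value1&query2=value2#hashvalue"
print(strip_fragment(url))
```

A URL that contains no # at all simply does not match, so it comes back unchanged.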

etc · 2014-10-30 17:18:37

aslsw66 wrote #285228:

I did play around with preg_replace but frankly it feels like black magic to me – it’s very powerful but highly codified.

Actually, preg_replace is too much here, a simpler (and more comprehensive) option would be

<txp:etc_query data="<a href='www.domainname.com/folder/index.php?query1=value1&query2=value2#hashvalue'>link</a>"
	replace="//a@@href={substring-before(string(@href), '#')}" />

~~but I have discovered that XPath string() function breaks on the first & (amp) character, so neither will work for your example.~~ Edit: no, it is some etc_query bug, the snippet above should work.

It looks like I have three options to play with!

It depends on where the URL comes from…

Last edited by etc (2014-10-30 20:12:49)
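
One thing worth checking with the substring-before approach: as XPath 1.0 defines it, substring-before() returns an empty string when the marker does not occur, so a link without any # would presumably end up with an empty href, whereas the regex version leaves such links alone. The sketch below is plain Python written only to illustrate that difference; substring_before is a hand-written imitation of the XPath function, not a call into any XPath library, and it assumes a non-empty marker.

```python
def substring_before(text: str, marker: str) -> str:
    """Imitate XPath 1.0 substring-before() for a non-empty marker:
    the part of text before the first marker, or "" if marker is absent."""
    head, sep, _tail = text.partition(marker)
    return head if sep else ""


def strip_fragment(url: str) -> str:
    """Drop '#...' when present; leave hash-less URLs untouched."""
    return url.partition("#")[0]


with_hash = "www.domainname.com/folder/index.php?query1=value1&query2=value2#hashvalue"
without_hash = "www.domainname.com/folder/index.php?query1=value1&query2=value2"

print(repr(substring_before(with_hash, "#")))
print(repr(substring_before(without_hash, "#")))  # empty string: the whole URL is lost
print(repr(strip_fragment(without_hash)))         # URL kept as-is
```

If the HTML being processed mixes links with and without a hash, that difference is probably the deciding factor between the two snippets.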

MattD · 2015-01-14 18:15:48

If I could wrap my head around the power of this tool I’m sure I could figure this out. Until then I have to ask for help.

I’m pulling data from a Google Calendar XML feed, but the summary of each event is a big mess of HTML like this:

When: Fri Feb 6, 2015 6pm to 10:30am&nbsp;
HST<br>
<br>Who: John Doe, The Amazing Restaurant Events, Jane Doe
<br>Where: The Amazing Restaurant
<br>Event Status: tentative
<br>Event Description: Come exercise your brain with the BEST weekly Trivia...

How can I not print the Who line and the Event Status line? Or even better is there a way to separate each of the lines and selectively output them?

Recurring events are even worse!

etc · 2015-01-14 18:33:50

Off the top of my head, you can output the text following, say, the 3rd <br> with

<txp:etc_query ...>
	{//br[3]/following-sibling::text()[1]}
</txp:etc_query>

But yes, it’s a real mess…

etc · 2015-01-15 10:41:01

You might have some difficulties with the Google Calendar Atom feed: it’s namespaced XML, some parts of the content are HTML-encoded, and, when decoded, contain unclosed <br> tags, which are invalid in XML. I have managed to import it with

<txp:etc_query markup="xml:"
	url="https://www.google.com/calendar/feeds/YOUR_ID@group.calendar.google.com/public/basic"
	query="//entry"
	replace="content={str_replace('<br>', '<br />', html_entity_decode(htmlspecialchars_decode(string())))}"
	functions="htmlspecialchars_decode,html_entity_decode,str_replace"
>
	{content/br[3]/following-sibling::text()[1]}
</txp:etc_query>

fwiw
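
On the question of separating the lines and outputting them selectively, the idea can be sketched outside Textpattern as follows. This is a standalone Python illustration, not etc_query code: the function name parse_summary is made up, and the sample text is the summary quoted earlier in the thread, assumed to be already entity-decoded down to the remaining &nbsp;.

```python
import html
import re

SUMMARY = (
    "When: Fri Feb 6, 2015 6pm to 10:30am&nbsp;\nHST<br>\n"
    "<br>Who: John Doe, The Amazing Restaurant Events, Jane Doe\n"
    "<br>Where: The Amazing Restaurant\n"
    "<br>Event Status: tentative\n"
    "<br>Event Description: Come exercise your brain with the BEST weekly Trivia..."
)


def parse_summary(summary: str) -> dict:
    """Split on <br> or <br /> and return {label: value} for 'Label: value' lines."""
    fields = {}
    for chunk in re.split(r"<br\s*/?>", summary):
        # Decode entities such as &nbsp; and collapse runs of whitespace.
        line = " ".join(html.unescape(chunk).split())
        label, sep, value = line.partition(": ")
        if sep:  # skip empty chunks and anything without a label
            fields[label] = value
    return fields


fields = parse_summary(SUMMARY)
for label in ("When", "Where", "Event Description"):  # leaves out Who and Event Status
    print(f"{label}: {fields[label]}")
```

Keying on the labels rather than on the position of each <br> should be less fragile if some events omit a line, though recurring events may need their own handling.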

MattD · 2015-01-15 17:28:44

Thanks etc, but I get no output using that. I had to change markup="xml:" to remove the colon.

aslsw66 · 2015-05-18 12:38:57

A while ago, I got some help on how to pull out the fire danger ratings from the Australian Bureau of Meteorology’s weather feed using:

<txp:etc_query url="ftp://ftp2.bom.gov.au/anon/gen/fwo/IDN10035.xml" markup="xml" query="//text[@type='fire_danger']">
{?}
</txp:etc_query>

It all worked nicely, until the end of the official fire season. The BoM no longer publishes this number, but running the query returns (as far as I can tell) two carriage returns or new lines.

What is the best way to test whether or not the type='fire_danger' actually exists in the feed?

Last edited by aslsw66 (2015-05-18 12:39:36)

etc · 2015-05-18 17:24:42

aslsw66 wrote #290869:

running the query returns (as far as I can tell) two carriage returns or new lines.

What is the best way to test whether or not the type='fire_danger' actually exists in the feed?

Hello, I don’t think line breaks are generated by etc_query, which should output nothing in this case. To be sure, add <txp:else /> tag inside etc_query and some “not found” text after it. You probably have line breaks before and after <txp:etc_query /> tag?
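
For checking the feed itself outside Textpattern, the same "found, else not found" test can be sketched in standalone Python. The two XML strings below are invented, heavily simplified stand-ins for the BoM feed (only the text element and its type attribute come from the thread), and fire_danger is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Invented, simplified samples; the real feed structure is assumed, not known.
IN_SEASON = (
    "<product><forecast><area>"
    "<text type='forecast'>Sunny.</text>"
    "<text type='fire_danger'>High</text>"
    "</area></forecast></product>"
)
OFF_SEASON = (
    "<product><forecast><area>"
    "<text type='forecast'>Showers.</text>"
    "</area></forecast></product>"
)


def fire_danger(xml_text: str, default: str = "not found") -> str:
    """Return the fire danger text, or default when no such element exists."""
    root = ET.fromstring(xml_text)
    nodes = root.findall(".//text[@type='fire_danger']")
    # Ignore elements that exist but contain only whitespace.
    values = [(node.text or "").strip() for node in nodes]
    values = [value for value in values if value]
    return ", ".join(values) if values else default


print(fire_danger(IN_SEASON))
print(fire_danger(OFF_SEASON))
```

If a check like this reports the element as missing, stray line breaks in the page would be coming from the template around the tag rather than from the query.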

Textpattern CMS support forum