smd_xml : extract data from XML feeds

Bloke · 2010-01-02 13:14:29

The first plugin of the new decade arrives. This one is a kind of generic XML processor. Give it a URL (e.g. a feed) that returns well-formed XML, then use the plugin to pull out the parts you want from the various records.

Include or exclude any node information
Automatically extract XML attribute data and manipulate it
Use a Form or the plugin container to restructure / output data you have extracted
Add custom pagination to your document to allow visitors to step through your data

It uses the by-now-familiar {replacement tag} syntax that wet pioneered and I stole, so you can reformat the information from the XML document for your own purposes. Although untested, an interesting experiment might be to grab feed data from somewhere on the web and use smd_query in the container to INSERT parts of the data into your TXP database. As far as I know there are no restrictions on what you can and can’t grab from the XML document, so if you can see it, you can muck about with it.
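To make the replacement-tag idea concrete, here is a minimal sketch in Python. It is not the plugin's own PHP and does not reproduce its actual tag syntax. It parses a small made-up feed, pulls the requested child nodes and any XML attributes out of each record, and substitutes them into a template through {name} placeholders. The function name render_records, the sample feed and the field names are all assumptions made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be fetched from a URL.
SAMPLE_FEED = """<feed>
  <entry id="1"><title>First post</title><author>alice</author></entry>
  <entry id="2"><title>Second post</title><author>bob</author></entry>
</feed>"""


def render_records(xml_text, record, fields, template):
    """Fill `template` once per <record> node, replacing {name} placeholders."""
    root = ET.fromstring(xml_text)
    output = []
    for node in root.iter(record):
        # Start with the record's own XML attributes (e.g. id="1").
        values = dict(node.attrib)
        # Add the text of each requested child node (empty string if missing).
        for name in fields:
            values[name] = node.findtext(name) or ""
        # Swap each {name} for its value; unknown placeholders are left as-is.
        output.append(
            re.sub(r"\{(\w+)\}",
                   lambda m: values.get(m.group(1), m.group(0)),
                   template)
        )
    return output
```

For example, render_records(SAMPLE_FEED, "entry", ["title", "author"], "{id}: {title} by {author}") returns ["1: First post by alice", "2: Second post by bob"].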

Download the plugin and get XMLing.

As ever, post any thoughts, improvements, bugs, praise, or flaming pitchforks here and I’ll tend to the village.

Happy New Year!

Revision history
————————

All available versions and changes are listed here. Each entry indexes the relevant post(s) in the thread to learn about the features.

08 Oct 2019 | 0.4.3 | Register missed conditional tags with parser.
08 Oct 2019 | 0.4.2 | Maintenance update for PHP5+ and Textpattern 4.6+
06 Oct 2014 | 0.4.1 | Add support for customisable headers (thanks johnno)
03 Apr 2012 | 0.4.0 | Improved feed support and tag detection for more varied / complicated feeds ; added XML-over-FTP support (thanks aslsw66) ; added SOAP transport facility, transport_opts and transport_config attributes ; added XSL and regex transform support ; allowed sub->field support and added match, ontagstart, ontagend and load_atts for finer control over field extraction ; added datawrap, var_prefix and timeout attributes ; added record attribute support (thanks Mats) ; fixed mangled date field bug ; fixed attributes-in-record-entry limit bug and undesired ontag output (both thanks tye) ; changed format’s escape attribute to fordb (escape is now for htmlspecialchars()) ; added kill_spaces so inter-tag whitespace removal is optional (but highly recommended) ; added tag_delim (thanks MattD)
17 Jan 2010 | 0.3.0 | Enabled URL params to be passed in the data attribute ; added format (thanks photonomad) ; deprecated linkify ; param_delim default is now pipe
13 Jan 2010 | 0.2.2 | Added line_length (thanks nardo)
05 Jan 2010 | 0.2.1 | Added defaults, set_empty and transport ; fixed https support (thanks photonomad)
03 Jan 2010 | 0.2.0 | Added cached data (thanks variaas) ; added pagination and limit/offset ; added linkify (thanks Jaro)
02 Jan 2010 | 0.1.0 | Initial release

Last edited by Bloke (2019-10-08 15:08:14)

variaas · 2010-01-02 14:58:32

Looks awesome – any plans for caching capabilities? Pinging Twitter every page load can become excessive.

Bloke · 2010-01-02 19:09:57

variaas wrote:

any plans for caching capabilities?

Hadn’t thought about it, but it makes sense. Leave it with me.

LetterHoofd · 2010-01-02 21:11:23

This plugin looks very promising. Might be a true timesaver.

jan · 2010-01-03 01:56:14

I was looking for exactly this when I stumbled upon it as the newest submission, awesome :)
Tried it out really quickly, and seems to work very well!

One feature that may be useful: a limiter on the amount of entries you want to import.
That way you could, for example, show the last 3 posts from some blog on your site.

If you agree, then perhaps an offset parameter is possible too? (Since that could be added without much extra effort).

If the things I’m suggesting are already possible, slap me, it’s already 3 AM.. :-)

Anyway, good job!

Last edited by jan (2010-01-03 01:56:54)

Bloke · 2010-01-03 02:04:22

jan wrote:

I was looking for exactly this when I stumbled upon it as the newest submission, awesome :)

Excellent, glad it’s potentially useful.

… a limiter on the amount of entries you want to import.

Hehe, are you a mind reader? I’m adding limit, offset and paging features to the plugin right now so you can step through the records if you want :-) It’s a bit of a cheat because you can’t very easily grab part of an XML document, but it seems to work.

Variaas’ cache capability is coded and working already, so watch this space…
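For anyone wondering what caching a feed with a timeout involves, here is a rough sketch in Python. The plugin is PHP and its cache may well work differently, so treat this only as an illustration of the general idea: keep the fetched document along with the time it was fetched, and only go back to the remote server once the copy is older than the timeout. The class name TimedCache and its parameters are hypothetical.

```python
import time


class TimedCache:
    """Keep fetched documents in memory; refetch only after `timeout` seconds."""

    def __init__(self, fetch, timeout, clock=time.time):
        self.fetch = fetch      # callable taking a URL and returning the document
        self.timeout = timeout  # number of seconds a cached copy stays fresh
        self.clock = clock      # injectable clock so the expiry logic can be tested
        self._store = {}        # url -> (time fetched, document)

    def get(self, url):
        now = self.clock()
        cached = self._store.get(url)
        if cached is not None and now - cached[0] < self.timeout:
            return cached[1]    # still fresh: no request is made
        # Missing or stale: fetch again and remember when we did so.
        document = self.fetch(url)
        self._store[url] = (now, document)
        return document
```

With a timeout of, say, 300 seconds, a busy page would hit the remote feed at most once every five minutes per URL rather than on every page load.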

it’s already 3 AM

Only 2am here: the night is young… :-)

jan · 2010-01-03 02:13:57

Haha, maybe I imported an xml feed of your thoughts?
Anyway, I “fell with my nose in the butter”, as the Dutch say.
Curious for 0.2! :D

Last edited by jan (2010-01-03 02:14:41)

Jaro · 2010-01-03 11:30:14

Great plugin!

Would there be a way to define a date/time format in the output? Also it would be great if all links in the output would be automatically clickable (e.g. bit.ly links used in Twitter).

Bloke · 2010-01-03 11:36:43

Jaro wrote:

Great plugin!

Ta. Documenting the next version today…

Would there be a way to define a date/time format in the output?

Not yet. I did think it would be useful, but couldn’t think of a decent way of doing it. Any ideas how to best specify it without making it too messy or clumsy?

it would be great if all links in the output would be automatically clickable (e.g. bit.ly links used in Twitter).

Saw your other post and thought it was a good idea, so I’m workin’ on it!
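As an illustration of what automatically clickable links could mean in practice, here is a small Python sketch. It is not the plugin's PHP implementation, and the URL pattern is a deliberately simple assumption that only recognises http and https addresses. It escapes the text for HTML and wraps each URL it finds in an anchor tag, leaving trailing punctuation outside the link. The function name linkify here is just a label for the sketch.

```python
import html
import re

# Simplistic assumption: a URL is http(s):// followed by non-space,
# non-quote, non-angle-bracket characters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def linkify(text):
    """Escape `text` for HTML and wrap each http(s) URL in an anchor tag."""
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(text):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?)")
        end = match.start() + len(url)
        # Escape the plain text that precedes this URL.
        parts.append(html.escape(text[last:match.start()]))
        safe = html.escape(url, quote=True)
        parts.append('<a href="%s">%s</a>' % (safe, safe))
        last = end
    # Escape whatever follows the final URL.
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

For example, linkify("See http://bit.ly/abc, ok") returns 'See <a href="http://bit.ly/abc">http://bit.ly/abc</a>, ok'.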

Jaro · 2010-01-03 11:49:42

Bloke wrote:

Not yet. I did think it would be useful, but couldn’t think of a decent way of doing it. Any ideas how to best specify it without making it too messy or clumsy?

No, not really, sorry. I’m not really good with php.

I noticed one thing. When I activate this plugin I get a warning:

Warning: Call-time pass-by-reference has been deprecated in C:\wamp\www\mysite\textpattern\lib\txplib_misc.php(594) : eval()'d code on line 182

I’m running 4.2.0 (r3275) on localhost.

Bloke · 2010-01-03 11:55:58

Jaro wrote:

Warning: Call-time pass-by-reference has been deprecated

What version of PHP are you running? Probably something I should look into, thanks for the heads up.

Jaro · 2010-01-03 12:10:57

Bloke wrote:

What version of PHP are you running? Probably something I should look into, thanks for the heads up.

I’m running PHP 5.2.9-1. Let me know if I can help you any further to debug this.

Last edited by Jaro (2010-01-03 12:11:08)

Bloke · 2010-01-03 12:40:05

Jaro wrote:

I’m running PHP 5.2.9-1. Let me know if I can help you any further to debug this.

Thanks. Would you try something for me please: modify the plugin and change line 182 from:

xml_set_object($xmlparser, &$this);

to this:

xml_set_object($xmlparser, $this);

then see whether a) the warning goes away and b) the plugin still works as it did before. Ta!

Jaro · 2010-01-03 12:46:59

That fixed it. Warning is gone and the plugin works. Thanks!

Bloke · 2010-01-03 19:04:16

Try v0.2

Features:

Pagination / limit / offset (use negative offsets to start from the end of the feed)
Cached documents with variable timeout
Automatically linkify any URLs in the output, turning them into clickable hyperlinks
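Since you can't easily grab just part of an XML document, limit, offset and paging amount to slicing the list of records after the whole feed has been parsed. The Python sketch below shows one plausible reading of that, including negative offsets that count back from the end of the feed. It is not the plugin's code, and the function names slice_records and page_of are hypothetical.

```python
def slice_records(records, limit=None, offset=0):
    """Return at most `limit` records starting at `offset`.

    A negative offset counts back from the end of the list.
    """
    total = len(records)
    # Translate a negative offset into a position from the start.
    start = offset if offset >= 0 else max(total + offset, 0)
    # No limit means "everything from start onwards".
    end = total if limit is None else min(start + limit, total)
    return records[start:end]


def page_of(records, page, per_page):
    """Return the 1-based `page` of `records`, `per_page` items at a time."""
    return slice_records(records, limit=per_page, offset=(page - 1) * per_page)
```

With ten records numbered 1 to 10, slice_records(records, limit=2, offset=-3) returns [8, 9] and page_of(records, 3, 4) returns [9, 10].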

Last edited by Bloke (2010-01-03 19:04:33)

Textpattern CMS

Textpattern CMS support forum
