smd_xml : extract data from XML feeds

nardo · 2010-01-04 11:16:33

Not been on the forums for a while – still using Txp daily, hourly! – but always impressed when dropping in… this plugin looks great, Bloke… you’re a machine

P.S. I mucked around with that other plugin that takes feeds and inserts them into the DB as articles… this might be more flexible. Looking forward to someone implementing that bit ;)

photonomad · 2010-01-04 21:09:57

dang, this is so cool!

I was just giving the plugin a try and it works effortlessly with most XML feeds. I just tried it with a sample feed from artistdata.com and I can’t get it to work… I’m wondering whether it’s a problem with the ArtistData feed, with me (probably), or with something the plugin isn’t catching in the way they structure their feeds. This is the ArtistData developers’ info page

Here is my try at parsing it with the plugin:

<txp:smd_xml data="https://www.artistdata.com/dashboard/feed/ADS-B05XYB0QLMN8D15/17" record="Shows" fields="AgeLimit, City, Country, Date, DoorsTime, EventName, OtherArtist, OtherArtistURL, State, TicketPrice, TicketPurchaseURL, Time, Venue, VenueAddress, VenuePhone, VenuePostalCode, VenueURL" skip="Artist, ArtistIdentifier, ArtistURL, Genre, PartnerArtistIdentifier, ShowIdentifier, Status" linkify="OtherArtistURL, TicketPurchaseURL, VenueURL" wraptag="div" target_enc="iso-8859-1">
AgeLimit: {AgeLimit}<br />
City: {City}<br />
Country: {Country}<br />
Date: {Date}<br />
DoorsTime: {DoorsTime}<br />
EventName: {EventName}<br />
OtherArtist: {OtherArtist}<br />
OtherArtistURL: {OtherArtistURL}<br />
State: {State}<br />
TicketPrice: {TicketPrice}<br />
TicketPurchaseURL: {TicketPurchaseURL}<br />
Time: {Time}<br />
Venue: {Venue}<br />
VenueAddress: {VenueAddress}<br />
VenuePhone: {VenuePhone}<br />
VenuePostalCode: {VenuePostalCode}<br />
VenueURL: {VenueURL}
</txp:smd_xml>

I also tried it with fewer fields and it still returns nothing:

<txp:smd_xml data="https://www.artistdata.com/dashboard/feed/ADS-B05XYB0QLMN8D15/17" record="Shows" fields="Time" wraptag="div" target_enc="iso-8859-1">
Time: {Time} <br />
</txp:smd_xml>

Any ideas? I hope I haven’t just missed something obvious!

jan · 2010-01-04 21:57:54

photonomad,

I first thought that the feed URL you are referencing is only retrievable when you’re logged in, but that doesn’t seem to be the case.

Are you sure your server allows fopen wrappers (allow_url_fopen)?
And have you checked your PHP error log yet?

photonomad · 2010-01-04 23:12:23

No errors in my PHP error log, and my php.ini was set to allow_url_fopen. I just tried it with allow_url_fopen On and then switched it to Off – still no errors either way. The XML feed that I tested from upcoming.yahoo.com works fine, but the ArtistData XML feed returns nothing. I’ve got Textpattern in debugging mode and no errors show there either. Hmm…

Zanza · 2010-01-04 23:13:39

Don’t know if this is related, but the feed photonomad posted does not validate. Could that be a problem, given the way this plugin works? (Just asking, Bloke.)

Bloke · 2010-01-05 10:58:48

photonomad

Nothing you’re doing wrong; it’s me. The plugin can’t handle https:// feeds because I hard-coded the port to 80. D’oh!

If you switch on debugging in the plugin (debug="3" is the highest level and shows the feed source), you’ll see that in your case the plugin receives a 301 redirect instead of the proper feed.
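The plugin’s PHP isn’t shown in this thread, so here is only a minimal Python sketch of the general fix Bloke describes: derive the port from the URL scheme (80 for http, 443 for https) instead of hard-coding 80. Sending a plain-HTTP request to port 80 for an https:// address is commonly answered with a redirect, which fits the 301 seen in the debug output. The function name `connection_target` is hypothetical.

```python
from urllib.parse import urlsplit

# Well-known default ports per URL scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def connection_target(url: str) -> tuple[str, str, int]:
    """Return (scheme, host, port) for a feed URL.

    An explicit port in the URL wins; otherwise the port is derived
    from the scheme rather than being hard-coded to 80.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return scheme, parts.hostname, port


print(connection_target("https://www.artistdata.com/dashboard/feed/ADS-B05XYB0QLMN8D15/17"))
```

The port alone isn’t the whole story: an HTTPS fetch also needs a TLS-capable transport, which is one reason a library such as curl is attractive here.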

[ Incidentally, you don’t need to skip fields just because you’re not using them; the only time you need to skip them is if they clash with ones you want to pull out. My rule of thumb: if a sub-node inside your record contains a node with the same name as one of the fields you are pulling out, skip that sub-node. Otherwise you can save yourself some typing! ]
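To make the clash concrete, here is a small Python sketch using the standard library’s ElementTree. It assumes a “first match wins” lookup, which may not be how smd_xml resolves duplicates; either way, skipping the nested node removes the ambiguity. The `Show`/`Artist`/`Name` structure and the `extract_fields` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET


def extract_fields(record, fields, skip=()):
    """Collect the text of the first matching descendant for each field.

    Sub-trees whose tag is listed in `skip` are ignored entirely, so a
    nested node that reuses a field name cannot shadow the one you want.
    """
    found = {}

    def walk(node):
        for child in node:
            if child.tag in skip:
                continue  # ignore this sub-node and everything inside it
            if child.tag in fields and child.tag not in found:
                found[child.tag] = (child.text or "").strip()
            walk(child)

    walk(record)
    return found


# <Artist> contains its own <Name>, which clashes with the record's <Name>.
xml = "<Show><Artist><Name>Support Act</Name></Artist><Name>Main Event</Name></Show>"
record = ET.fromstring(xml)

print(extract_fields(record, {"Name"}))                   # {'Name': 'Support Act'}
print(extract_fields(record, {"Name"}, skip={"Artist"}))  # {'Name': 'Main Event'}
```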

I’ve fixed the https:// problem in v0.21. Many thanks for the report; I hope that gets you going. I’ve also added a few other attributes:

defaults allows you to set default values for any nodes that are empty
set_empty is a shortcut that makes sure that all empty nodes are set as empty instead of showing up as {Replacement Tag} in your output
transport allows you to force the HTTP transport mechanism used to fetch the feed (you shouldn’t ever need this)
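A rough Python model of how defaults and set_empty could interact when filling `{Field}` placeholders. This is an assumption about the behaviour described above, not the plugin’s code, and the `render` helper is hypothetical: an empty or missing field uses its default if one is given; otherwise set_empty blanks it, and without either the raw `{Field}` tag is left in the output.

```python
import re

# Matches replacement tags such as {Time} or {TicketPrice}.
PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_]+)\}")


def render(template, values, defaults=None, set_empty=False):
    """Fill {Field} tags from `values`, falling back to defaults / set_empty."""
    defaults = defaults or {}

    def substitute(match):
        name = match.group(1)
        value = values.get(name, "")
        if value != "":
            return value
        if name in defaults:
            return defaults[name]
        if set_empty:
            return ""
        return match.group(0)  # leave the {Field} tag untouched

    return PLACEHOLDER.sub(substitute, template)


row = {"Time": "20:00", "TicketPrice": ""}
tpl = "Time: {Time} Price: {TicketPrice}"
print(render(tpl, row))                                   # Time: 20:00 Price: {TicketPrice}
print(render(tpl, row, defaults={"TicketPrice": "Free"})) # Time: 20:00 Price: Free
print(render(tpl, row, set_empty=True))                   # Time: 20:00 Price: 
```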

photonomad · 2010-01-05 18:24:11

Bloke

You Rock!! : )

photonomad · 2010-01-05 20:03:39

I’m also curious about formatting times and dates. I wouldn’t know how to program it, but writing from a user’s point of view, might there be a way to trigger the plugin (maybe with parentheses or something) to format a field on output, something like this:

feed example:
<Date>2008-11-15</Date>
<Time>20:00:00</Time>

maybe the plugin output could be written like this:
{Date('l, M jS')} at {Time('g:ia')}

and the end result:

Saturday, Nov 15th at 8:00pm

I’m sure it is more complicated to implement, but I thought I’d throw it out here anyway!
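The format letters in the request follow PHP’s date() conventions, where `l` gives the full day name (“Saturday”) and `D` the three-letter form (“Sat”). As a sketch of the transformation itself, here is the same conversion in Python; `format_show_time` and `ordinal` are hypothetical helpers, and `%A`/`%b` assume the default C/English locale.

```python
from datetime import datetime


def ordinal(n: int) -> str:
    """Return n with an English ordinal suffix: 1st, 2nd, 3rd, 4th, 11th, 22nd..."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def format_show_time(date_str: str, time_str: str) -> str:
    """Turn <Date>2008-11-15</Date> and <Time>20:00:00</Time> into readable text."""
    dt = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    hour12 = dt.hour % 12 or 12            # like PHP 'g': 12-hour, no leading zero
    ampm = "am" if dt.hour < 12 else "pm"  # like PHP 'a': lowercase am/pm
    return f"{dt:%A}, {dt:%b} {ordinal(dt.day)} at {hour12}:{dt:%M}{ampm}"


print(format_show_time("2008-11-15", "20:00:00"))  # Saturday, Nov 15th at 8:00pm
```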

Bloke · 2010-01-06 08:31:48

photonomad wrote:

I’m also curious about formatting times and dates.

Me too. I had a look for a date-based reformatting plugin but none exists as far as I can make out. Yet… :-) The closest is <txp:smd_cal_now /> in smd_calendar. If you happen to have that installed you can probably use it to reformat (most) English-looking dates.

Sadly, the syntax you specified is rather difficult to implement. But I may be able to add an attribute to the plugin itself that allows you to format stuff, for example format="Date|datetime|M-d-Y, DoorsTime|datetime|H:i:s" and so on. I’d rather not have to, though, because it’s hard to know where to stop. In other words, where does it end? If you wanted to truncate the output you could use rvm_substr in your Form/container, but some might argue that the plugin should be able to do it natively with format="SomeField|truncate|8". Or you might want to convert a field to an integer or something like that. So if I do date manipulation, the formatting features potentially never end!

However, it is a useful facility so I might do it if I can find a way. Or I might write/find a date formatting plugin instead :-)
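For illustration, here is how a spec in the proposed shape (comma between fields, pipe between a field’s parameters) might be parsed. This is a hypothetical Python parser, not smd_xml’s implementation; the second call shows why a colon makes a poor parameter delimiter when the parameters themselves are time formats.

```python
def parse_format_spec(spec, list_delim=",", param_delim="|"):
    """Parse 'Field|transform|arg, Field2|transform|arg' into a dict.

    Returns {field: (transform, [args, ...])}. Each item needs at least a
    field and a transform, otherwise the unpacking raises ValueError.
    """
    result = {}
    for item in spec.split(list_delim):
        item = item.strip()
        if not item:
            continue
        field, transform, *args = item.split(param_delim)
        result[field.strip()] = (transform.strip(), [a.strip() for a in args])
    return result


print(parse_format_spec("Date|datetime|M-d-Y, DoorsTime|datetime|H:i:s"))
# {'Date': ('datetime', ['M-d-Y']), 'DoorsTime': ('datetime', ['H:i:s'])}

# With ':' as the parameter delimiter, the time format gets chopped up:
print(parse_format_spec("DoorsTime:datetime:H:i:s", param_delim=":"))
# {'DoorsTime': ('datetime', ['H', 'i', 's'])}
```

The same problem applies to the list delimiter: a date format such as `D, M jS` contains a comma, so some escaping or a configurable delimiter is needed either way.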

nardo · 2010-01-12 23:32:35

I’m trying to parse this feed from a Google Spreadsheet

I can get 12 names output – but no more – whether calling the file from Google or from a local version of the file saved to hard drive

stumped

Bloke · 2010-01-13 10:05:13

nardo wrote:

I can get 12 names output – but no more – whether calling the file from Google or from a local version of the file saved to hard drive

Eeeek! There are no line breaks in that feed, and the plugin had a maximum line length of 8192 characters (which is fine for 99% of feeds). I’ve now added a line_length attribute so you can set it to something huge, but that might not help you because I think PHP enforces some internal limit that you’d have to raise by mucking around with php.ini.

If you possibly can, it’s far better to switch to transport="curl" because it doesn’t have any line-length restrictions.

Here’s v0.22 anyway. Hope that helps, and thanks for the report.
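The thread doesn’t show exactly how the plugin consumed its reads, so this is only a Python illustration of the general hazard: a line reader capped at 8192 characters returns just the first 8192 characters of a document that has no newlines, and that fragment is not well-formed XML. Reading the whole body at once (as a transport without a line cap would) avoids the problem. The synthetic `feed` string is made up for the demo.

```python
import io
import xml.etree.ElementTree as ET

MAX_LINE = 8192  # the old cap mentioned above


def read_capped(stream, max_len=MAX_LINE):
    # A single readline() call returns at most max_len characters. With no
    # newlines in the document, that is only the start of the feed.
    return stream.readline(max_len)


def read_all(stream):
    # Read everything regardless of line breaks.
    return stream.read()


# A feed with no line breaks at all, 29013 characters long.
feed = "<feed>" + "<entry><name>X</name></entry>" * 1000 + "</feed>"

capped = read_capped(io.StringIO(feed))
whole = read_all(io.StringIO(feed))

print(len(capped), len(whole))    # 8192 29013
print(len(ET.fromstring(whole)))  # 1000 entries parsed
# ET.fromstring(capped) would raise ET.ParseError: the document is cut off.
```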

nardo · 2010-01-14 04:23:15

Good stuff, Bloke – the fact that there were no line breaks was annoying me (given the cruft) but didn’t clue me in to why the feed was truncated… when I move off XAMPP to a live server I’ll give curl a go… for the moment the line_length attribute is working neatly

Another question… I’m pulling in a list of names, sorted and displayed alphabetically… I’d like to provide named anchors to jump down the list to “T” or “W”… if_different makes that easy… but how do I extract the first character (e.g. “a” from “ALBERTA”) from a field’s data? … not suggesting smd_xml should be able to do it, though…
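Whatever Txp tag ends up doing the work, the underlying logic is small: take the first character, normalise its case, and start a new group whenever it changes (which is the job if_different does on output). A Python sketch of that logic, with a hypothetical `letter_index` helper and a made-up `letter-x` anchor naming scheme:

```python
from itertools import groupby


def letter_index(names):
    """Group names by first character for an A-Z jump list.

    Yields (letter, anchor_id, names_in_group). Sorting is case-insensitive,
    so 'ALBERTA' and 'alberta' land in the same group. Empty names are skipped.
    """
    ordered = sorted((n for n in names if n), key=str.upper)
    for letter, group in groupby(ordered, key=lambda n: n[0].upper()):
        yield letter, f"letter-{letter.lower()}", list(group)


names = ["WINNIPEG", "Toronto", "ALBERTA", "Tasmania"]
for letter, anchor, group in letter_index(names):
    print(f'<a href="#{anchor}">{letter}</a>', group)
```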

pieman · 2010-01-14 11:36:29

Only just got around to trying this one out. Shame on me…

Needless to say it’s another Bloke classic. This is getting boring ;-)

Bloke · 2010-01-17 12:17:34

v0.3 steps up to the plate and smacks xml documents out of the park. Features:

format attribute, which allows you to further manipulate each field in a variety of ways before it reaches the form/container. Examples are changing the case of strings, sanitizing the data, reformatting dates/times and escaping data ready for import. If anyone can think of any other transformations I’ve missed, please yell.
URL params can now be passed in the data attribute (ahem: minor oversight on my part)
linkify is deprecated: use format="field_name|link" instead. You will now receive a warning if you use linkify
the link creation regex is improved so it catches more URLs. If anyone has any problems with it, please let me know which URLs it chokes on
IMPORTANT: the param_delim default is now the pipe symbol (|) instead of the colon (:). Please update any existing smd_xml tag attributes accordingly, or add param_delim=":" to preserve the existing functionality. The colon turned up too often in too many streams, which meant you pretty much had to use param_delim in every smd_xml tag to make it useful. Plus, you often want to use colons in date/time strings, so it made sense to change it.
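For anyone curious what a URL-to-link transformation involves, here is a deliberately simple Python sketch. It is not the plugin’s regex; the pattern, the trailing-punctuation rule and the `linkify` helper are assumptions chosen to show the usual pitfalls, such as a sentence-ending full stop being swallowed into the link. It also assumes the field is plain text, so only the URL parts are HTML-escaped.

```python
import html
import re

# Simple pattern: http(s):// or www. followed by anything up to whitespace,
# angle brackets or quotes.
URL_RE = re.compile(r"""(?:https?://|www\.)[^\s<>"']+""", re.IGNORECASE)

# Punctuation that usually ends a sentence rather than a URL.
TRAILING_PUNCT = ".,;:!?)"


def linkify(text):
    """Wrap bare URLs in plain text with <a> tags."""

    def repl(match):
        url = match.group(0)
        trail = ""
        while url and url[-1] in TRAILING_PUNCT:
            trail = url[-1] + trail
            url = url[:-1]
        if url.lower().startswith(("http://", "https://")):
            href = url
        else:
            href = "http://" + url
        return f'<a href="{html.escape(href, quote=True)}">{html.escape(url)}</a>{trail}'

    return URL_RE.sub(repl, text)


print(linkify("Tickets: https://example.com/buy."))
print(linkify("See www.example.org for details"))
```

Stripping a closing parenthesis will break URLs that legitimately end in one (some Wikipedia links do), which is exactly the kind of case worth reporting if a real linkifier chokes on it.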

See how you get on. This version is much better at embedding smd_query tags inside XML streams to insert data from feeds into your TXP database. See example 6 in the help for a concrete implementation. As always, report the good, the bad and the ugly here and I’ll send the mermaids out to your oil rig to fix everything.

nardo · 2010-01-19 04:15:09

Bloke, what would happen if a data feed has been updated with additional content for existing items as well as new items, and you INSERT into the Txp database? Would it append ALL of them as new articles, or would it add the new info to the relevant Txp fields in existing articles and create new articles only for the new items in the feed?
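Nothing in this thread suggests that smd_xml or smd_query de-duplicates on its own; a plain INSERT simply adds a row per item. The usual approach is an “upsert” keyed on a stable identifier from the feed (such as a GUID or ShowIdentifier): update the row if the key exists, insert it otherwise. Here is a self-contained Python sketch against SQLite; the `articles` table and its `feed_guid` column are hypothetical (in Txp you might map the key to a custom field). On MySQL, `INSERT ... ON DUPLICATE KEY UPDATE` against a UNIQUE column does the same job in one statement.

```python
import sqlite3


def sync_items(conn, items):
    """Update existing articles and insert new ones, keyed on feed_guid.

    Returns (inserted, updated). Relies on SQLite's rowcount, which counts
    rows matched by the UPDATE's WHERE clause.
    """
    inserted = updated = 0
    for item in items:
        cur = conn.execute(
            "UPDATE articles SET title = ?, body = ? WHERE feed_guid = ?",
            (item["title"], item["body"], item["guid"]),
        )
        if cur.rowcount == 0:  # no existing row for this guid: it's new
            conn.execute(
                "INSERT INTO articles (feed_guid, title, body) VALUES (?, ?, ?)",
                (item["guid"], item["title"], item["body"]),
            )
            inserted += 1
        else:
            updated += 1
    conn.commit()
    return inserted, updated


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, "
    "feed_guid TEXT UNIQUE NOT NULL, title TEXT, body TEXT)"
)

first = [
    {"guid": "show-1", "title": "Gig one", "body": "Doors 7pm"},
    {"guid": "show-2", "title": "Gig two", "body": "Doors 8pm"},
]
second = [
    {"guid": "show-2", "title": "Gig two", "body": "Doors 8pm, now sold out"},
    {"guid": "show-3", "title": "Gig three", "body": "Doors 9pm"},
]
print(sync_items(conn, first))   # (2, 0)
print(sync_items(conn, second))  # (1, 1)
```

Note that MySQL’s affected-rows count for an UPDATE that changes nothing is 0 by default, so the update-then-insert pattern above is SQLite-specific; prefer the single-statement form there.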

Textpattern CMS

Textpattern CMS support forum
