Textpattern CMS support forum


#1 2010-11-22 17:53:54

vurt
Member
Registered: 2010-10-22
Posts: 50

[howto] For those having problems importing WP into TXP

My problem:

Articles imported via the WordPress importer were breaking/truncated. Upon inspection, I noted that each affected article always broke in the same place, at the same character.

How I fixed it:

Start with clean installs of both WordPress (in this case 3.0) and Textpattern 4.2.0.

Export existing WP site XML and import into clean WP install. (I do all of this locally with MAMP/XAMPP for the sake of convenience.)

Import WP into the clean TXP install via the TXP importer. Find a broken article and note where it terminates. Delete all imported posts and associated content.

Export the new WordPress XML via the WordPress export tool. Open the XML file in a good text editor like Notepad++ (Windows) and set it to show all characters, white space, etc.
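If no editor with a "show all characters" mode is at hand, the same inspection can be roughly approximated with a short script. The following is only a sketch, not part of the original procedure: the function names `find_suspect_chars` and `scan_file` are made up for illustration, and it assumes the export file is UTF-8. It lists every non-ASCII or control character along with its line, column, code point, and Unicode name.

```python
import unicodedata


def find_suspect_chars(text):
    """Return (line, column, codepoint, name) for each non-ASCII or control character.

    Hypothetical helper: tabs and carriage returns are treated as harmless.
    """
    hits = []
    # Split on "\n" only, so unusual line separators still show up as hits.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            is_control = ord(ch) < 32 and ch not in "\t\r"
            if is_control or ord(ch) > 126:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits


def scan_file(path):
    """Read an export file (assumed to be UTF-8) and return its suspect characters."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_suspect_chars(handle.read())
```

Calling `scan_file` on the exported XML and looking at the hits nearest the point where the article broke should narrow down which character to examine.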

Find where the offending article broke in the XML file, in the content block, and examine the characters closely. I found that sometimes spaces (this one really mystifies me, but Notepad++ makes it apparent), left and right quotes, and long dashes needed to be replaced with generic keyboard quotes, dashes, spaces, etc. Find and replace the offending characters and save the modified XML file.
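For a large export, the find-and-replace step could also be scripted. This is a minimal sketch rather than a tested fix for the importer: the mapping covers only the character types mentioned above (typographic quotes, long dashes, and non-breaking spaces, on the assumption that these are the odd "spaces"), the names `REPLACEMENTS`, `normalize_punctuation`, and `clean_export` are invented for illustration, and the file is assumed to be UTF-8.

```python
# Hypothetical mapping from typographic characters to plain keyboard equivalents.
REPLACEMENTS = {
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u00a0": " ",   # non-breaking space
}

_TABLE = str.maketrans(REPLACEMENTS)


def normalize_punctuation(text):
    """Replace each mapped character with its plain ASCII equivalent."""
    return text.translate(_TABLE)


def clean_export(src_path, dst_path):
    """Write a cleaned copy of the export file, leaving the original untouched."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(normalize_punctuation(text))
```

Writing to a separate output file keeps the original export available for comparison. Note that this replaces characters everywhere in the file, not just in post content, so it is worth checking the result before re-importing.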

Delete all WP content from new installation and import the modified XML.

Use the TXP importer to import the now updated WP.

Rinse and repeat as necessary until all of the offending characters have been eliminated.

I’m no encoding expert, and all of the character replacement can probably be accomplished via phpMyAdmin, but certain characters will signal the import to stop parsing article content.

I can’t tell you why this happens, only that it fixed it for me.

Offline

#2 2010-11-22 18:00:38

Els
Admin
From: The Netherlands
Registered: 2004-06-06
Posts: 7,458

Re: [howto] For those having problems importing WP into TXP

Thanks vurt, moved to How-tos and Examples :)

Offline

#3 2010-11-22 18:14:29

vurt
Member
Registered: 2010-10-22
Posts: 50

Re: [howto] For those having problems importing WP into TXP

Noted. Wasn’t sure where to post it. Thanks Els.

Offline

#4 2010-11-23 07:57:33

wet
Developer
From: Lenzing, Austria
Registered: 2005-06-06
Posts: 3,267
Website

Re: [howto] For those having problems importing WP into TXP

Would you mind sending me a copy of a WP export file which breaks, plus an indication of where the culprit character(s) are to be found?

Offline

#5 2010-11-29 15:11:03

vurt
Member
Registered: 2010-10-22
Posts: 50

Re: [howto] For those having problems importing WP into TXP

On its way.

Offline

#6 2010-12-07 18:03:59

wet
Developer
From: Lenzing, Austria
Registered: 2005-06-06
Posts: 3,267
Website

Re: [howto] For those having problems importing WP into TXP

Thanks for your contribution. Issue fixed.

Offline

#7 2010-12-08 02:05:30

jstubbs
Moderator
From: Hong Kong
Registered: 2004-12-13
Posts: 2,394
Website

Re: [howto] For those having problems importing WP into TXP

This would make a good TXP Tip for possible converts from WP to TXP. Vurt, would you mind emailing me a tutorial based on your post?

Offline

#8 2010-12-10 14:02:48

vurt
Member
Registered: 2010-10-22
Posts: 50

Re: [howto] For those having problems importing WP into TXP

Sure thing, but it’ll have to wait until after the holidays. I am deep in it right now.

Are you asking for a tutorial about how to import a site from WP into Txp, or how to fix it if you have problems?

Offline

#9 2011-06-21 18:47:58

web0master
New Member
From: warsaw, poland
Registered: 2011-06-21
Posts: 3

Re: [howto] For those having problems importing WP into TXP

Thanks to you, I managed to import nearly 5,000 posts from WordPress.



Offline

#10 2013-08-15 04:03:20

johnstephens
Plugin Author
From: Woodbridge, VA
Registered: 2008-06-01
Posts: 989
Website

Re: [howto] For those having problems importing WP into TXP

Robert, are you still interested in looking at WordPress data that breaks the importer?

My friend administers an outdated WP site that got hacked, and after he followed his host’s instructions for cleaning up the mess, he asked me to install Textpattern in place of the WordPress blog. I ran the importer and set the URLs to match, and spent a couple hours converting the WordPress theme into Textpattern templates.

A bunch of articles appeared in the backend, and on the site once I got the templates up, so I didn’t immediately notice the major problem: only 297 of 996 articles were imported, and nothing more recent than 2012. I’ve tried a whole lot of stuff to get it working, including the XML export/import method suggested above. When I do that, not even WordPress can import all the articles from its own XML store. I also tried the importer after upgrading the old WordPress installation to the latest version. I even cloned the wp_posts table, dropped its data, and selected year-by-year batches back into wp_posts to import into Textpattern. But even that method loses a lot of articles in transit, and I end up with the same articles for all the labor.

There are obvious encoding inconsistencies that may be the root of the problem: I can see strings of special characters that should be quotes and apostrophes and stuff, but I don’t know if there are any improper invisible characters in the mix. Also, the WordPress data includes HTML in the title field: br and i elements, mainly. I didn’t think that would affect MySQL operations, but I don’t know what’s causing the badness.

If there’s some way to sanitize the data before running Textpattern’s importer, I’d love to learn it. I’m happy to send the SQL backup file to you if you think the future of TXP could benefit from inspecting it. I welcome any other suggestions or guidance anyone might offer too.

Thank you!

Offline
