Textpattern CMS support forum
#1 2010-11-22 17:53:54
- vurt
- Member
- Registered: 2010-10-22
- Posts: 50
[howto] For those having problems importing WP into TXP
My problem:
Articles imported via the WordPress importer were breaking/truncated. Upon inspection, I noted that the articles always broke in the same place, at the same character.
How I fixed it:
Two clean installs of Wordpress (in this case 3.0) and Textpattern 4.2.0.
Export existing WP site XML and import into clean WP install. (I do all of this locally with MAMP/XAMPP for the sake of convenience.)
Import WP into clean TXP install via TXP Importer. Find a broken article and note where it terminates. Delete all imported posts and associated content.
Export the new Wordpress XML via Wordpress export tool. Open XML file in a good text editor like Notepad++ (Windows), set it to show all characters, white space, etc.
Find where the offending article broke in the XML file, in the content block, and examine the characters closely. I found that sometimes spaces (this one really mystifies me, but Notepad++ makes it apparent), left and right quotes, and long dashes needed to be replaced with generic keyboard quotes, dashes, spaces, etc. Find and replace the offending characters and save the modified XML file.
Delete all WP content from new installation and import the modified XML.
Use the TXP importer to import the now updated WP.
Rinse and repeat as necessary until all of the offending characters have been eliminated.
I’m no encoding expert, and all of the character replacement can probably be accomplished via phpMyAdmin, but certain characters will signal the importer to stop parsing article content.
I can’t tell you why this happens, only that it fixed it for me.
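For anyone who would rather script the find-and-replace step than do it by hand, here is a minimal Python sketch. It is not part of the original fix. It assumes the export file is UTF-8, and the character list is only a guess at the usual suspects (curly quotes, long dashes, and non-breaking spaces, which may or may not be the mystery spaces mentioned above). The names `REPLACEMENTS`, `normalize_punctuation` and `clean_export` are made up for illustration.

```python
# Hypothetical mapping: extend it with whatever characters your editor reveals.
REPLACEMENTS = {
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / curly apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
    "\u00a0": " ",   # non-breaking space (assumed culprit for the odd spaces)
}


def normalize_punctuation(text: str) -> str:
    """Swap typographic characters for plain keyboard equivalents."""
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    return text


def clean_export(src_path: str, dst_path: str) -> None:
    """Read an export file (assumed UTF-8) and write a cleaned copy."""
    with open(src_path, encoding="utf-8") as src:
        cleaned = normalize_punctuation(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(cleaned)
```

Something like `clean_export("wp-export.xml", "wp-export-clean.xml")` (file names are placeholders) would then produce the modified XML to re-import. Writing to a new file keeps the original export around in case a replacement turns out to be wrong.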
#2 2010-11-22 18:00:38
- els
- Moderator
- From: The Netherlands
- Registered: 2004-06-06
- Posts: 7,458
Re: [howto] For those having problems importing WP into TXP
Thanks vurt, moved to How-tos and Examples :)
#3 2010-11-22 18:14:29
- vurt
- Member
- Registered: 2010-10-22
- Posts: 50
Re: [howto] For those having problems importing WP into TXP
Noted. Wasn’t sure where to post it. Thanks Els.
Re: [howto] For those having problems importing WP into TXP
Would you mind sending me a copy of a WP export file which breaks, plus an indication of where the culprit character(s) are to be found?
#5 2010-11-29 15:11:03
- vurt
- Member
- Registered: 2010-10-22
- Posts: 50
Re: [howto] For those having problems importing WP into TXP
On its way.
Re: [howto] For those having problems importing WP into TXP
Thanks for your contribution. Issue fixed.
Re: [howto] For those having problems importing WP into TXP
This would make a good TXP Tip for possible converts from WP to TXP. Vurt, would you mind emailing me a tutorial based on your post?
#8 2010-12-10 14:02:48
- vurt
- Member
- Registered: 2010-10-22
- Posts: 50
Re: [howto] For those having problems importing WP into TXP
Sure thing, but it’ll have to wait until after the holidays. I am deep in it right now.
Are you asking for a tutorial about how to import a site from WP into Txp, or how to fix it if you have problems?
#9 2011-06-21 18:47:58
- web0master
- New Member
- From: warsaw, poland
- Registered: 2011-06-21
- Posts: 3
Re: [howto] For those having problems importing WP into TXP
Thanks to you I managed to import nearly 5,000 WordPress posts.
“Fortunately, in the sense of concrete and abstract” – that all things in common beings happy
“Fortunately, in peace and irresponsibility” – a happy life is carefree and peaceful, but the author tries to show that many philosophers before him did not include the happiness of quiet and carefree pop327
Re: [howto] For those having problems importing WP into TXP
Robert, are you still interested in looking at WordPress data that breaks the importer?
My friend administers an outdated WP site that got hacked, and after he followed his host’s instructions for cleaning up the mess, he asked me to install Textpattern in place of the WordPress blog. I ran the importer and set the URLs to match, and spent a couple hours converting the WordPress theme into Textpattern templates.
A bunch of articles appeared in the backend, and on the site once I got the templates up, so I didn’t immediately notice the major problem: only 297 of 996 articles were imported, and nothing more recent than 2012. I’ve tried a whole lot of stuff to get it working, including the XML export/import method suggested above. When I do that, not even WordPress can import all the articles from its own XML store. I also tried the importer after upgrading the old WordPress installation to the latest version. I even cloned the wp_posts table, dropped its data, and selected year-by-year batches back into wp_posts to import into Textpattern. But even that method loses a lot of articles in transit, and I end up with the same articles for all the labor.
There are obvious encoding inconsistencies that may be the root of the problem: I can see strings of special characters that should be quotes and apostrophes and stuff, but I don’t know if there are any improper invisible characters in the mix. Also, the WordPress data includes HTML in the title field: br and i elements, mainly. I didn’t think that would affect MySQL operations, but I don’t know what’s causing the badness.
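Strings of special characters where quotes and apostrophes should be often come from UTF-8 text that was read as Windows-1252 at some point. That is only an assumption about this data, not a diagnosis. If it holds, a round trip like the Python sketch below can sometimes undo the damage; `repair_mojibake` is a hypothetical helper name, and it deliberately leaves the text alone when the round trip fails.

```python
def repair_mojibake(text: str) -> str:
    """Try to undo UTF-8 text that was mistakenly decoded as Windows-1252.

    Returns the input unchanged if the round trip is not possible,
    which usually means the text was not damaged in this particular way.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return text


# "don" + the three characters a curly apostrophe turns into + "t"
damaged = "don\u00e2\u20ac\u2122t"
repaired = repair_mojibake(damaged)  # "don\u2019t", i.e. don’t
```

This works on a whole string at once, so a field that mixes damaged and correctly encoded characters will come back unchanged. Trying it on a copy of one title or post body first is safer than running it over the whole table.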
If there’s some way to sanitize the data before running Textpattern’s importer, I’d love to learn it. I’m happy to send the SQL backup file to you if you think the future of TXP could benefit from inspecting it. I welcome any other suggestions or guidance anyone might offer too.
Thank you!
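As a first step toward sanitizing, it may help to list exactly which unusual characters are in the data and where. The Python sketch below is one way to do that, assuming the post content or SQL dump has already been read into a string; `find_suspect_characters` is a made-up name, and "suspect" here just means non-ASCII or a control/format character other than tab and carriage return.

```python
import unicodedata


def find_suspect_characters(text: str):
    """Return (line, column, code point, name) for each non-ASCII or control character."""
    hits = []
    # Split on "\n" only, so unusual line separators are reported rather than swallowed.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            is_control = (
                unicodedata.category(ch) in ("Cc", "Cf") and ch not in "\t\r"
            )
            if ord(ch) > 127 or is_control:
                name = unicodedata.name(ch, "UNNAMED")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits


sample = "plain\nbad\u00a0space\u200b"
for hit in find_suspect_characters(sample):
    print(hit)
```

The report shows invisible characters by name, which makes it easier to decide which ones to replace before running the importer again.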