Textpattern CMS support forum
You are not logged in. Register | Login | Help
- Topics: Active | Unanswered
Converting character set in utf8 for MySQL =>4.1
This has been talked in the past in various way, but as far as I know never quite fully explained, covered, or made working. The goal is to convert Textpattern database content from latin1 to utf8, when one updrage his MySQl from 3.x or 4.0 to 4.1 and above. Note: this is not mandatory, TXP can work without it, even with MySQL 4.1 and above.
From this post by Maniqui:
When my hosting upgraded from MySQL 4.0 to MySQL 4.1, I did all the things needed to convert my database from latin1/iso-8859-1 to a utf-8.
…
OK, when I installed TXP for the first time, TXP determined that the connection with the database had to be handled as latin1.
But after upgraded MySQL and after I do everything to convert my DB to UTF-8, I needed to change that line.
Maniqui (or anyone who has done it), could you elaborate on that? How can one convert a latin1 (more precisely, the utf8 content stored as latin1 in MySQL) TXP database into real utf8?
I’ve done some research on this, but I can’t find a clean way of doing it for (mostly) everyone. It’s the missing step, with it (and with MySQL 5 becoming more and more standard on various hosters) one could write a full howto for Textbook.
Offline
Re: Converting character set in utf8 for MySQL =>4.1
In a couple of months I plan on moving up to MySQL 5.0 from 4.0 on my VPS. So I would also welcome any insight anyone has on this subject.
We Love TXP . TXP Themes . TXP Tags . TXP Planet . TXP Make
Offline
Re: Converting character set in utf8 for MySQL =>4.1
Hi Jeremie,
yeah, I remember that post and few others I made before about utf-8 and charsets… I was looking how to write my articles without needing to use html entities but at the same time to have them displaying nice through PHPMyAdmin or in my backed-up database dumps… I remember I was a bit obssessive with that. I read every article about UTF8 and charsets and all that stuff. Of course, I dont truly understand it as I would like to…
Well, stop Maniquí. Try to give some help.
First, there is this nice textsnippet: Convert a db to UTF8 after upgrading to MySQL 4.1
Then, you have this great article by Kusor: UTF-8 tips
The “problem” for you is that is written in spanish, but maybe between me and hcgtv (Bert, i think you also speak a bit of spanish, am I right?) can translate those resources if needed.
Also, you have this topic (spanish too) but you can read the “technical” words that are always in english…
Last edited by maniqui (2006-10-06 14:21:09)
Offline
#4 2006-10-06 03:07:11
- Mary
- Sock Enthusiast
- Registered: 2004-06-27
- Posts: 6,236
Re: Converting character set in utf8 for MySQL =>4.1
I found this:
- http://www.phpwact.org/php/i18n/utf-8/mysql
- http://dev.mysql.com/doc/refman/5.0/en/charset-convert.html (was available in 4.1 too)
Last edited by Mary (2006-10-06 03:09:45)
Offline
Re: Converting character set in utf8 for MySQL =>4.1
Ok so a simple dump should be enough, with re-importing as utf8. I’ll test it. And yup, the Spanish article are out of my league :)
Mary, the MySQL Convert way is quite impossible to understand without solid MySQL background. The page manual is über geek only :/
Offline
Re: Converting character set in utf8 for MySQL =>4.1
http://mysqldba.blogspot.com/2006/07/rebuilding-40-data-into-41-data.html – that’s the dba from flickr, so he probably knows a thing or two… ;) It’s the same situation as for textpattern, utf8-data in latin1 columns if you installed on mysql4.0 or older. There are some changes though – ignore anything about slaves, replication or restarting mysql (not relevant), and we use the myisam engine, not the innodb engine, so you can ignore that part, too. Basicaly then, you have to drop the structure and the data seperately, then edit the dump of the table-structures. reimport the tables, reimport the data.
Convert() could be a problem, as it translates between charactersets, but that’s not what we really want to do, we do not have latin1 data, we have utf8-data in a latin1 column.
Last edited by Sencer (2006-10-06 07:20:53)
Offline
Re: Converting character set in utf8 for MySQL =>4.1
Good links, there’s a lot to digest and I can test it on my local Debian server before committing the procedure to a live site.
maniqui, si hablo Espanol but I’m more comfortable with English when it comes to technical jargon.
We Love TXP . TXP Themes . TXP Tags . TXP Planet . TXP Make
Offline
Re: Converting character set in utf8 for MySQL =>4.1
First test: it seems to work fine with the glyphs, PMA confirmed it. MySQL has some trouble with duplicate keys and/or entries, I’ll have to dig more on that, but at least the non ascii glyphs are fine.
Offline
Re: Converting character set in utf8 for MySQL =>4.1
Well, I finally converted my MySQL 3.x database to 4.1’s utf-8. It seems OK locally. Here’s what I did:
- Exported the tables with phpMyAdmin
- Opened them in a text editor
- Did a find and replace for “latin1” and replaced it with utf8 (other variants of utf8 charsets weren’t working)
- Copy pasted it into an SQL query
- My utf8 were changed to utf8_general_ci, and so far, everything seems fine.
edit – Well, the textpattern table was fine, but the txp_discuss and lang tables weren’t (not sure about links – didn’t have any posted with apostrophes or any other characters). Fortunately it was pretty simple to fix. 1) Update to the latest language file (takes care of all the weird characters in admin) 2) Execute the following query (could probably be combined, but MySQL scares me):
UPDATE txp_discuss SET message = replace(message,"’","’");#closing apostrophe
UPDATE txp_discuss SET message = replace(message,"“","“");#curly opening
UPDATE txp_discuss SET message = replace(message,"â€","”");#curly closing
UPDATE txp_discuss SET message = replace(message,"–","–");#en dash
Those are the only characters I had a problem with.
Last edited by jm (2006-11-05 09:03:46)
Offline