Go to main content

Textpattern CMS support forum

You are not logged in. Register | Login | Help

#1 2006-10-06 02:00:41

Jeremie
Member
From: Provence, France
Registered: 2004-08-11
Posts: 1,578
Website

Converting character set in utf8 for MySQL =>4.1

This has been talked in the past in various way, but as far as I know never quite fully explained, covered, or made working. The goal is to convert Textpattern database content from latin1 to utf8, when one updrage his MySQl from 3.x or 4.0 to 4.1 and above. Note: this is not mandatory, TXP can work without it, even with MySQL 4.1 and above.

From this post by Maniqui:

When my hosting upgraded from MySQL 4.0 to MySQL 4.1, I did all the things needed to convert my database from latin1/iso-8859-1 to a utf-8.

OK, when I installed TXP for the first time, TXP determined that the connection with the database had to be handled as latin1.
But after upgraded MySQL and after I do everything to convert my DB to UTF-8, I needed to change that line.

Maniqui (or anyone who has done it), could you elaborate on that? How can one convert a latin1 (more precisely, the utf8 content stored as latin1 in MySQL) TXP database into real utf8?

I’ve done some research on this, but I can’t find a clean way of doing it for (mostly) everyone. It’s the missing step, with it (and with MySQL 5 becoming more and more standard on various hosters) one could write a full howto for Textbook.

Offline

#2 2006-10-06 02:27:23

hcgtv
Plugin Author
From: Key Largo, Florida
Registered: 2005-11-29
Posts: 2,722
Website

Re: Converting character set in utf8 for MySQL =>4.1

In a couple of months I plan on moving up to MySQL 5.0 from 4.0 on my VPS. So I would also welcome any insight anyone has on this subject.

Offline

#3 2006-10-06 02:57:07

maniqui
Member
From: Buenos Aires, Argentina
Registered: 2004-10-10
Posts: 3,070
Website

Re: Converting character set in utf8 for MySQL =>4.1

Hi Jeremie,

yeah, I remember that post and few others I made before about utf-8 and charsets… I was looking how to write my articles without needing to use html entities but at the same time to have them displaying nice through PHPMyAdmin or in my backed-up database dumps… I remember I was a bit obssessive with that. I read every article about UTF8 and charsets and all that stuff. Of course, I dont truly understand it as I would like to…
Well, stop Maniquí. Try to give some help.

First, there is this nice textsnippet: Convert a db to UTF8 after upgrading to MySQL 4.1

Then, you have this great article by Kusor: UTF-8 tips
The “problem” for you is that is written in spanish, but maybe between me and hcgtv (Bert, i think you also speak a bit of spanish, am I right?) can translate those resources if needed.

Also, you have this topic (spanish too) but you can read the “technical” words that are always in english…

Last edited by maniqui (2006-10-06 14:21:09)


La música ideas portará y siempre continuará

TXP Builders – finely-crafted code, design and txp

Offline

#4 2006-10-06 03:07:11

Mary
Sock Enthusiast
Registered: 2004-06-27
Posts: 6,236

Re: Converting character set in utf8 for MySQL =>4.1

I found this:

Last edited by Mary (2006-10-06 03:09:45)

Offline

#5 2006-10-06 06:00:08

Jeremie
Member
From: Provence, France
Registered: 2004-08-11
Posts: 1,578
Website

Re: Converting character set in utf8 for MySQL =>4.1

Ok so a simple dump should be enough, with re-importing as utf8. I’ll test it. And yup, the Spanish article are out of my league :)

Mary, the MySQL Convert way is quite impossible to understand without solid MySQL background. The page manual is über geek only :/

Offline

#6 2006-10-06 07:20:00

Sencer
Archived Developer
From: cgn, de
Registered: 2004-03-23
Posts: 1,803
Website

Re: Converting character set in utf8 for MySQL =>4.1

http://mysqldba.blogspot.com/2006/07/rebuilding-40-data-into-41-data.html – that’s the dba from flickr, so he probably knows a thing or two… ;) It’s the same situation as for textpattern, utf8-data in latin1 columns if you installed on mysql4.0 or older. There are some changes though – ignore anything about slaves, replication or restarting mysql (not relevant), and we use the myisam engine, not the innodb engine, so you can ignore that part, too. Basicaly then, you have to drop the structure and the data seperately, then edit the dump of the table-structures. reimport the tables, reimport the data.

Convert() could be a problem, as it translates between charactersets, but that’s not what we really want to do, we do not have latin1 data, we have utf8-data in a latin1 column.

Last edited by Sencer (2006-10-06 07:20:53)

Offline

#7 2006-10-06 14:13:25

hcgtv
Plugin Author
From: Key Largo, Florida
Registered: 2005-11-29
Posts: 2,722
Website

Re: Converting character set in utf8 for MySQL =>4.1

Good links, there’s a lot to digest and I can test it on my local Debian server before committing the procedure to a live site.

maniqui, si hablo Espanol but I’m more comfortable with English when it comes to technical jargon.

Offline

#8 2006-10-07 05:44:07

Jeremie
Member
From: Provence, France
Registered: 2004-08-11
Posts: 1,578
Website

Re: Converting character set in utf8 for MySQL =>4.1

First test: it seems to work fine with the glyphs, PMA confirmed it. MySQL has some trouble with duplicate keys and/or entries, I’ll have to dig more on that, but at least the non ascii glyphs are fine.

Offline

#9 2006-11-05 07:42:54

jm
Plugin Author
From: Missoula, MT
Registered: 2005-11-27
Posts: 1,746
Website

Re: Converting character set in utf8 for MySQL =>4.1

Well, I finally converted my MySQL 3.x database to 4.1’s utf-8. It seems OK locally. Here’s what I did:

  1. Exported the tables with phpMyAdmin
  2. Opened them in a text editor
  3. Did a find and replace for “latin1” and replaced it with utf8 (other variants of utf8 charsets weren’t working)
  4. Copy pasted it into an SQL query
  5. My utf8 were changed to utf8_general_ci, and so far, everything seems fine.

edit – Well, the textpattern table was fine, but the txp_discuss and lang tables weren’t (not sure about links – didn’t have any posted with apostrophes or any other characters). Fortunately it was pretty simple to fix. 1) Update to the latest language file (takes care of all the weird characters in admin) 2) Execute the following query (could probably be combined, but MySQL scares me):

UPDATE txp_discuss SET message = replace(message,"’","’");#closing apostrophe
UPDATE txp_discuss SET message = replace(message,"“","“");#curly opening
UPDATE txp_discuss SET message = replace(message,"”","”");#curly closing
UPDATE txp_discuss SET message = replace(message,"–","–");#en dash

Those are the only characters I had a problem with.

Last edited by jm (2006-11-05 09:03:46)

Offline

Board footer

Powered by FluxBB