Textpattern CMS support forum

#1 2006-04-06 02:53:24

marvix
Member
Registered: 2006-04-06
Posts: 27

[issue] utf8 and preg_replace bug ... ?!?

Hi …

While testing the code with Arabic and Hebrew, I found that the section name and the category name are not saved correctly in the database; they come out as ASCII codes instead. After some searching I found this line in txp_section.php:

Lines 134 and 172: $name = preg_replace("/[^[:alnum:]\-_]/", "", str_replace(" ", "-", $name));

The same line also appears in txp_category.php, at lines 223 and 279.

I couldn’t figure out what the replace is for yet, but it is surely the reason the info is not saved correctly in the database!
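
For readers following along, here is a minimal sketch of what the quoted line does to a name. The function name `sketch_clean_name` is hypothetical (it is not Textpattern's), and the locale is pinned to "C" on the assumption that `[:alnum:]` then covers only ASCII letters and digits. Under another locale the byte-wise match may treat some non-ASCII bytes as alphanumeric, which is one possible source of partly garbled names.

```php
<?php
// Assumption: pin the character-type locale so [:alnum:] means ASCII letters and digits only.
setlocale(LC_CTYPE, 'C');

// Hypothetical helper wrapping the line quoted from txp_section.php.
function sketch_clean_name(string $name): string
{
    // 1. Turn spaces into hyphens.
    // 2. Delete every byte that is not alphanumeric, "-" or "_".
    //    Without the /u modifier the pattern is applied byte by byte.
    return preg_replace("/[^[:alnum:]\-_]/", "", str_replace(" ", "-", $name));
}

echo sketch_clean_name("My Section"), "\n";   // My-Section

$arabic = "\xD9\x82\xD8\xB3\xD9\x85";         // three Arabic letters, six UTF-8 bytes
var_dump(sketch_clean_name($arabic));         // string(0) ""
```

So the line is a URL-safety filter: ASCII names pass through with spaces hyphenated, while a name written only in non-ASCII letters loses every byte.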

What is the solution? For now I have just commented the line out.

Thanks in advance.

Offline

#2 2006-04-10 19:11:38

marvix
Member
Registered: 2006-04-06
Posts: 27

Re: [issue] utf8 and preg_replace bug ... ?!?

Can no one help me with this?!

Offline

#3 2006-04-10 19:51:53

marios
Archived Plugin Author
Registered: 2005-03-12
Posts: 1,253

Re: [issue] utf8 and preg_replace bug ... ?!?

I’m afraid I won’t be of much help, but the name field of the section must always be dirified. That is expected behavior, since the name is also used in the URL scheme, and you cannot have non-ASCII characters in URLs (well, not yet at least).
That’s why you use the title field instead to display your non-ASCII characters.
Does the title field work at least?
That would also be interesting for testing, since Arabic is an RTL language.

regards, marios


Offline

#4 2006-04-10 20:51:09

Sencer
Archived Developer
From: cgn, de
Registered: 2004-03-23
Posts: 1,803
Website

Re: [issue] utf8 and preg_replace bug ... ?!?

After you create a category, you can go in and edit it; then you’ll see that each category has a name and a title. The name is used in the URL and is restricted as seen above; the title is used for display on the pages themselves and is not restricted.

What you are describing doesn’t seem to be a bug, but intended behaviour. Unless I didn’t understand what you were trying to say. :)

Offline

#5 2006-04-10 21:53:22

zem
Developer Emeritus
From: Melbourne, Australia
Registered: 2004-04-08
Posts: 2,579

Re: [issue] utf8 and preg_replace bug ... ?!?

What’s the bug? What input did you enter? What were the results? What did you expect to happen?


Alex

Offline

#6 2006-04-11 09:50:57

marvix
Member
Registered: 2006-04-06
Posts: 27

Re: [issue] utf8 and preg_replace bug ... ?!?

The title is OK, but the name is not: after adding a new section it shows just one character of the word, and I think it should show the whole word. After I removed that line it works fine, but because I don’t understand the preg and str replace I thought it was a bug. The link to this section on the public side works fine, but if the section name is 3 characters long I get about 30 ASCII characters, something like “H%%^#@HJHJ^e”, and for a friendly URL this looks strange :)
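
One possible reading of the long string described above, offered as an assumption rather than a diagnosis: when a non-ASCII name is kept, every UTF-8 byte is percent-encoded in the URL, so a short Arabic word expands considerably. A minimal sketch using PHP's `rawurlencode()`:

```php
<?php
// Three Arabic letters; each one is two bytes in UTF-8.
$name = "\xD9\x82\xD8\xB3\xD9\x85";

// rawurlencode() percent-encodes every byte outside the unreserved ASCII set.
$encoded = rawurlencode($name);

echo $encoded, "\n";          // %D9%82%D8%B3%D9%85
echo strlen($encoded), "\n";  // 18
```

Three letters become 18 characters here; any further encoding step would lengthen the result again.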

For Arabic the best encoding is “Windows-1256”, and it would be great if I could get what I need by ID, such as section id/cat id/article id!

Maybe these links will help too:

http://www.sitepoint.com/blogs/2006/03/03/us-ascii-transliterations-of-unicode-text/
http://www.sitepoint.com/blogs/2006/02/26/php-utf-8-01/

Offline

#7 2006-04-11 22:37:26

zem
Developer Emeritus
From: Melbourne, Australia
Registered: 2004-04-08
Posts: 2,579

Re: [issue] utf8 and preg_replace bug ... ?!?

I’m having trouble following your description. We need a clear, step by step description of:

  • The steps required to reproduce the problem
  • The expected result, and
  • The actual result

Textpattern uses utf-8, which works fine for virtually all languages, Arabic included. If you’re trying to force the page output encoding to Windows-1256, that could be the cause of the problem.

Last edited by zem (2006-04-11 22:39:36)


Alex

Offline

#8 2006-04-12 05:31:38

marvix
Member
Registered: 2006-04-06
Posts: 27

Re: [issue] utf8 and preg_replace bug ... ?!?

I won’t use any UTF-8 language in the name input, that’s all. I was testing the script; I’ll switch from eZ Publish and Joomla to exp … it looks great :)

But I’ll try to modify the code to get friendly URLs by ID, I mean something like “section_id/category_id/id” for all links on the site. This option is missing from the range of choices :)

Last edited by marvix (2006-04-12 06:37:29)

Offline

#9 2006-04-12 06:51:54

Sencer
Archived Developer
From: cgn, de
Registered: 2004-03-23
Posts: 1,803
Website

Re: [issue] utf8 and preg_replace bug ... ?!?

[edit: well, ok, it looks like you edited your post while I was typing my response. ]

marvix, look, we already do transliteration. It’s character-based; here’s the conversion list:

http://svn.textpattern.com/development/4.0/textpattern/lib/i18n-ascii.txt

After that has happened, we remove all non-ascii characters from the url.
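
As a rough sketch of the two steps just described (transliterate, then strip whatever is left): the function name and the tiny map below are made up for illustration and are not the contents of i18n-ascii.txt or Textpattern's actual code.

```php
<?php
// Hypothetical, tiny transliteration table for illustration only.
$sketchMap = ['ü' => 'ue', 'é' => 'e'];

// Hypothetical helper: transliterate known characters, then drop the rest.
function sketch_dirify(string $text, array $map): string
{
    $text = strtr($text, $map);              // step 1: per-character transliteration
    $text = str_replace(' ', '-', $text);    // spaces become hyphens
    // step 2: remove every byte that is not an ASCII letter, digit, "-" or "_"
    return preg_replace('/[^A-Za-z0-9\-_]/', '', $text);
}

echo sketch_dirify('tüt café', $sketchMap), "\n";  // tuet-cafe
echo sketch_dirify('tüt', ['é' => 'e']), "\n";     // tt (no entry for ü, so it is stripped)
```

In this sketch a character with no entry in the map simply disappears in step 2, so a word made up only of unmapped characters comes out empty.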

The SitePoint link has one Arabic speaker stating that such character-based transliteration doesn’t work well, or at all, for Arabic. However, you can do the transliteration manually yourself when you edit the category.

If you still want to report a concrete bug, you have to do it like this:

Action: Create category by entering “tüt”.
Expected: it should show name as “tuet”.
Actual result: it shows name as “tt”.

Of course if you want to propose that we transliterate arabic into ascii, you’ll need to be more concrete and tell us how that can be done.

Last edited by Sencer (2006-04-12 06:53:32)

Offline

#10 2006-06-15 11:42:53

Boby Dimitrov
Member
From: Sofia, Bulgaria
Registered: 2004-09-27
Posts: 76
Website

Re: [issue] utf8 and preg_replace bug ... ?!?

I don’t know where the proper place to say this is, but several of the Cyrillic letters in the transliteration are wrong, at least for the Bulgarian language (and I suppose for most Slavic languages). Here’s what I’ve changed on my blog to get readable URLs:

Before:
Х = “KH”
Ц = “TS”
Щ = “SHCH”
Ъ = “”
ъ = “”
Ь = “”
ь = “”

After:
Х = “H”
Ц = “C” (this one is not “by the book”, but most people transliterate that way)
Щ = “SHT”
Ъ = “Y”
ъ = “y”
Ь = “I”
ь = “i”

Also, I have no idea why the small letters (non-capital) are transliterated like capitals, but maybe it does not matter.
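
The change above could be expressed as a language-specific override layered on a generic table. This is only a sketch: the array names are hypothetical, the entries are just the ones listed in this post, and it does not reflect how Textpattern stores its tables.

```php
<?php
// Generic entries as listed under "Before".
$generic = ['Х' => 'KH', 'Ц' => 'TS', 'Щ' => 'SHCH', 'Ъ' => '', 'ъ' => '', 'Ь' => '', 'ь' => ''];

// Bulgarian entries as listed under "After".
$bulgarian = ['Х' => 'H', 'Ц' => 'C', 'Щ' => 'SHT', 'Ъ' => 'Y', 'ъ' => 'y', 'Ь' => 'I', 'ь' => 'i'];

// For string keys, array_merge() lets the later array win.
$map = array_merge($generic, $bulgarian);

echo strtr('ХЦЩ', $generic), "\n";  // KHTSSHCH
echo strtr('ХЦЩ', $map), "\n";      // HCSHT
```

Keeping the overrides in a separate array leaves the generic table untouched for other languages.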

Offline

#11 2006-06-15 11:49:50

Sencer
Archived Developer
From: cgn, de
Registered: 2004-03-23
Posts: 1,803
Website

Re: [issue] utf8 and preg_replace bug ... ?!?

Hello Boby,

Thanks for the info. We also have the ability to do language-specific transliteration, and already do so in some places where people have let us know about these things. The only prerequisite is that we have a translation for the language as well. Currently we have translations for the following languages:
http://rpc.textpattern.com/lang/

We welcome further translations, you can find more info here:
http://forum.textpattern.com/viewtopic.php?id=10594

Once we have that, we can also add language specific transliterations.

Offline

#12 2006-06-15 14:56:08

Boby Dimitrov
Member
From: Sofia, Bulgaria
Registered: 2004-09-27
Posts: 76
Website

Re: [issue] utf8 and preg_replace bug ... ?!?

Sencer,

I was about to translate to Bulgarian some time ago, but in the process I found plenty of strings hardcoded, so I gave up and decided to wait for better times. I’ll have a look at the progress made in the last two years… :)

Offline
