Textpattern CMS support forum
[issue] accented character and urls
I have some posts with accented characters in the title. In the URL, the accented characters are stripped away instead of being converted to plain characters (e.g. à -> a).
If the accented character is in the middle of the word the remaining part of the word is stripped away.
Is this the correct behavior (I hope not)? Do I have to set some pref? Or is it my installation that has problems? In any case, I see the same behavior in two different installations on two different servers. TXP 4.0
Thanks for any suggestions.
Zanz
Re: [issue] accented character and urls
The dirifying of url-titles is explained here:
http://forum.textpattern.com/viewtopic.php?id=9959. Because it has to be done with a manually compiled list, it likely doesn’t have every possible character in it.
I have tried with the character à you mention and it is converted into an a for the url-title.
> If the accented character is in the middle of the word the remaining part of the word is stripped away.
I have never seen that behaviour. Can you post the exact title of the post? Which PHP version are you using (you can find this in the diagnostics tab)?
In any case it is possible to manually override/change every url-title for each article.
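As a rough illustration of the list-based approach described above, here is a minimal Python sketch. It is not Textpattern's actual PHP code: the table contents and the names `TRANSLIT` and `dumb_down` are made up for the example. Characters found in a manually compiled table are replaced, and anything missing from the table passes through unchanged.

```python
# Hypothetical, hand-compiled table (an assumption for this sketch).
# The real list lives in i18n-ascii.txt and is not reproduced here.
TRANSLIT = {
    "\u00e0": "a",  # à
    "\u00e8": "e",  # è
    "\u00ec": "i",  # ì
    "\u00f2": "o",  # ò
    "\u00f9": "u",  # ù
}


def dumb_down(text, table=TRANSLIT):
    """Replace each character found in the table; leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)


print(dumb_down("verità"))      # verita
print(dumb_down("veritàvera"))  # veritavera
print(dumb_down("garçon"))      # garçon (ç is missing from this toy table)
```

The last call shows why an incomplete list matters: a character without an entry is simply left as it is, and whatever cleanup runs afterwards has to deal with it.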
Re: [issue] accented character and urls
> Sencer wrote:
> The dirifying of url-titles is explained here:
> http://forum.textpattern.com/viewtopic.php?id=9959. Because it has to be done with a manually compiled list, it likely doesn’t have every possible character in it.
Ok, thanks. It doesn’t work that way for me. :-( I’m trying to figure out why.
> I have tried with the character à you mention and it is converted into an a for the url-title.
Suppose my post is called “verità”. In the URL it comes out as “section/verit”.
> If the accented character is in the middle of the word the remaining part of the word is stripped away.
> I have never seen that behaviour. Can you post the exact title of the post? Which PHP version are you using (you can find this in the diagnostics tab)?
Title of the post (just a test, it doesn’t mean anything): veritàvera. It comes out as “section/verit”.
The PHP versions are 4.4.0 and 4.3.10.
> In any case it is possible to manually override/change every url-title for each article.
True, I forgot that, thanks. I can go with this solution until I find out what’s wrong. This only happens with post titles, not with sections or categories, thanks to the title/name fields of sections and categories.
Zanz
Re: [issue] accented character and urls
I tried to figure out what the problem is. I noticed that the file i18n-ascii.txt on my disk looks different from http://svn.textpattern.com/development/4.0/textpattern/lib/i18n-ascii.txt . I tried to copy and paste that file into another file and ftp that one. Then the accented characters are translated, but:
- every accented character becomes an ‘a’.
- I get an error message: Warning: Error parsing /home/XXX/public_html/textpattern/lib/i18n-ascii.txt on line 13 in /home/XXX/public_html/textpattern/lib/txplib_misc.php on line 613
I supposed the problem was in the encoding of the file itself.
Then I remembered that I uploaded using win98ME, that doesn’t support second.
Then I re-uploaded the lib files from Mac OS X, and most accented characters worked fine! With one exception… ‘à’!! The one that I first used.
Update:
Uhm… no, this is the normal behavior even on my other server, where I haven’t modified the file. I don’t know why, but the à character is the only one that I can’t manage to translate… :-(
Zanz
Last edited by Zanza (2005-08-27 15:13:37)
#5 2005-08-28 00:09:10
- zem
- Developer Emeritus
- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: [issue] accented character and urls
> I tried to copy and paste that file into another file and ftp that one.
Don’t copy and paste. The file contains utf-8 characters, and many text editors don’t handle those correctly.
Alex
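To show why copy and paste can break such a file, here is a minimal Python sketch. It is an illustration only: the single-entry table is made up, and which encoding a given editor or transfer tool assumes is an assumption. If the UTF-8 bytes for ‘à’ are read as Latin-1, they turn into two different characters, so a lookup keyed on ‘à’ no longer matches.

```python
# "à" (U+00E0) is stored as two bytes in UTF-8.
utf8_bytes = "\u00e0".encode("utf-8")      # b'\xc3\xa0'

# What a reader sees if it wrongly assumes Latin-1 for those bytes.
misread = utf8_bytes.decode("latin-1")

# Toy lookup table with a single entry (an assumption for this sketch).
table = {"\u00e0": "a"}

print(len(misread))                          # 2
print(misread in table)                      # False
print(utf8_bytes.decode("utf-8") in table)   # True
```

Transferring the file byte for byte, without opening and re-saving it, avoids this kind of silent re-encoding.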
Re: [issue] accented character and urls
Ok, thanks zem for the advice.
I restored the original lib files in both my installations, and I still have the problem. The problem is very strange: only the ‘à’ character is stripped out of the article title URL.
Every other accented character I tried is being parsed. ‘è’ becomes ‘e’, ‘ì’ becomes ‘i’, and so on.
Only ‘à’ is stripped away.
Even stranger: I made a PHP routine from the txplib_misc.php file, in another directory, and used the same i18n-ascii.txt file. Basically I duplicated the dumbDown function, to see if the problem exists in a new routine. Well, I can’t believe it: in my new file the ‘à’ characters are correctly translated to ‘a’!!
I don’t know what the hell the problem may be!…
Mumble…
Z-
Last edited by Zanza (2005-08-29 11:54:19)
Re: [issue] accented character and urls
Update. I made a step toward understanding this strange behavior.
When I enter a title like ‘oègià’ (fictitious.. ;-), the string that comes out of the dumbDown function (called in the stripSpace function) is ‘oègi&’ (with a numeric reference to the ampersand: & #38;). In other words, the dumbDown function turns the ‘à’ into an ampersand! I can’t guess why, but I suppose this is the reason it is stripped out by the remaining instructions of the stripSpace function.
The weird thing is: if I create a dumbDown function in another file in the same dir, and use a direct file reference to the i18n-ascii.txt file (not using txpath), when I call that file the à is not stripped out, but correctly turned into an ‘a’!
Unfortunately in the txplib_misc.php file I can’t use a direct reference to the i18n-ascii.txt file, because I get an error. No, I don’t get an error, I’m still trying to figure out what’s really happening.
I need a coffee before going on… ;-)
Z.
Last edited by Zanza (2005-08-29 13:36:20)
Re: [issue] accented character and urls
Another update:
The problem is really strange. When called from the txplib_misc.php file, the dumbDown function misbehaves, because some characters (all the ‘a’ variants, and some others, like ç) are for some reason translated into & #38; and then stripped out by the next line of code.
The same function and the same conversion files (i18n-ascii.txt or others of my own) work perfectly if called from other PHP files I made.
So one possibility is: could something in the files that include txplib_misc.php, somewhere in Textpattern, explain this mis-encoding? Maybe some mime-type problem, or a definition, or I don’t know what, that works badly in some versions of PHP (I have tried 4.3 and 4.4).
This is a real headache for me. I wonder why no one else has run into it, because it comes from the default installation on two different servers… :-( And it should be a major problem when it comes to internationalization.
If it’s my mistake, I can’t find what it is!
Bye
Z-
Last edited by Zanza (2005-08-29 16:32:24)
Re: [issue] accented character and urls
Is classTextile modified in any way? Is it the one from the release? Can you post the line that contains classTextile in the output of your high-level diagnostics?
Re: [issue] accented character and urls
> Sencer wrote:
> Is classTextile modified in any way? Is it the one from the release? Can you post the line that contains classTextile in the output of your high-level diagnostics?
You mean this?
/lib/classTextile.php: r737 (91b5c252098fec80f4f4d45f1449e244)
Anyway, no, I haven’t modified anything. This is the default behavior. By the way, this only happens to titles, that I think aren’t processed by textile, or are they?
More: when I enter a category (link, image or article) title like “accessibilità”, the resulting name is “accessibilit38”!! While the title remains “accessibilità”.
If I create a category named “accessibilit& agrave;” (without the space), then it’s correctly translated to “accessibilita” in the category name. Now I remember that this behavior also happened in a previous version of TXP, which I tested without going any further. Now I see that the problem is still here.
The problem seems to be that some characters (à, and some others), when converted to ASCII/ANSI, are incorrectly translated to & #38; by the dumbDown function (I echoed $text before and after that call).
I don’t know why. I suppose it’s some sort of Unicode problem with the file i18n-ascii.txt when it is called by Textpattern. If called from another routine outside Textpattern, the encoding is ok. Any ideas?
M-
Last edited by Zanza (2005-08-30 10:29:45)
Re: [issue] accented character and urls
> You mean this?
> /lib/classTextile.php: r737 (91b5c252098fec80f4f4d45f1449e244)
Yepp, the file from the download has a different md5-hash, though. Can you reupload the original file from the download?
> Anyway, no, I haven’t modified anything. This is the default behavior. By the way, this
> only happens to titles, that I think aren’t processed by textile, or are they?
The way the code is in 4.0, the textiled Title is dumbed down. (This is already changed in svn.) But even then, Textile will not convert utf-8 characters to numeric entities. It used to do that in the past, but that was changed a while before 4.0 was released. That’s why I was wondering whether the problem you are having might be due to changed files or a mixture of old files.
Can you do a clean install on the same server with freshly downloaded files and see if the same problem exists?
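For comparing a file on the server against the one in the download, as suggested above, an MD5 checksum can also be computed by hand. A minimal Python sketch, using only the standard `hashlib` module; the helper name `md5_of_file` and any paths passed to it are hypothetical:

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in binary mode."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling it on the server copy and on the freshly downloaded copy of classTextile.php and comparing the two strings shows whether the files are byte-for-byte identical. Reading in binary mode matters here, because a text-mode transfer or re-save can change the bytes and therefore the hash.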
Re: [issue] accented character and urls
Ok: I moved my old dir and reinstalled all textpattern folder files.
Now the classTextile line in the diagnostics is:
/lib/classTextile.php: r737 (2b88b8520af4a64c01e1638963d9adb9)
But the problem with à being converted to & #38; still remains. :-(
It remains both when creating categories (portabilità -> portabilit38) and in article titles converted to URLs (portabilità -> portabilit, because the ampersand is stripped away by preg_replace).
Hints?
Z-
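The two different results reported above (“portabilit38” for the category name, “portabilit” for the URL title) are consistent with two different cleanup steps being applied to the same entity-tainted string. A minimal Python sketch follows; both regular expressions are assumptions chosen for illustration, not the patterns Textpattern actually uses:

```python
import re

# The string as reported after dumbDown: "à" replaced by the entity for "&".
dumbed = "portabilit&#38;"

# Strategy 1 (assumed): drop every character that is not a letter or digit.
# The "&", "#" and ";" disappear, but the digits of the entity survive.
name = re.sub(r"[^A-Za-z0-9]", "", dumbed)

# Strategy 2 (assumed): remove whole entities first, then other stray characters.
url_title = re.sub(r"&#?\w+;", "", dumbed)
url_title = re.sub(r"[^A-Za-z0-9]", "", url_title)

print(name)       # portabilit38
print(url_title)  # portabilit
```

Either way the cleanup step only exposes the damage; the character has already been lost by the time the string contains & #38; instead of ‘a’.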