Textpattern CMS support forum
#1 2006-09-05 11:59:11
- zem
- Developer Emeritus

- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Relaxed transliteration for section, category names
We’re considering some changes to the way strings are transliterated and stripped for inclusion in URLs. Section and category names in particular.
At the moment, Textpattern does this:
1. Transliterate known characters to ASCII, following the rules in i18n-ascii
2. Remove and collapse punctuation and spaces
3. Remove any remaining non-ASCII characters
This causes problems with languages like Chinese that don’t have transliteration rules, since most names wind up empty.
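To make the problem concrete, here is a rough Python sketch of the three steps above. It is not Textpattern's actual PHP code. The `TRANSLIT` table is a tiny hypothetical stand-in for i18n-ascii, and `strict_url_title` is a made-up name.

```python
import re

# Hypothetical stand-in for a handful of i18n-ascii rules; the real table is
# far larger and its file format is not reproduced here.
TRANSLIT = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss", "é": "e"}

def strict_url_title(name: str) -> str:
    # Step 1: transliterate characters that have a known ASCII rule.
    text = "".join(TRANSLIT.get(ch, ch) for ch in name)
    # Step 2: collapse runs of punctuation/whitespace into one hyphen, trim the ends.
    text = re.sub(r"[\W_]+", "-", text).strip("-")
    # Step 3: drop every character that is still outside ASCII.
    return re.sub(r"[^\x00-\x7f]", "", text)

print(strict_url_title("über café"))  # ueber-cafe
print(repr(strict_url_title("中文")))  # '' (nothing survives step 3)
```

Chinese ideographs count as word characters, so they get through step 2 and are only removed in step 3. That is why the name ends up empty.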
We’re considering eliminating step (3). That would leave intact any non-ASCII word characters that have no transliteration rules; they would then be percent-encoded in URLs.
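Here is a minimal sketch of the relaxed variant, again in Python rather than PHP. The function names and the small transliteration table are hypothetical. Steps 1 and 2 are unchanged, step 3 is dropped, and the result is percent-encoded as UTF-8 only when a URL segment is built.

```python
import re
from urllib.parse import quote

# Hypothetical stand-in for a few i18n-ascii rules.
TRANSLIT = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss", "é": "e"}

def relaxed_url_title(name: str) -> str:
    # Steps 1 and 2 only; untransliterated word characters are kept.
    text = "".join(TRANSLIT.get(ch, ch) for ch in name)
    return re.sub(r"[\W_]+", "-", text).strip("-")

def url_segment(name: str) -> str:
    # Percent-encode the UTF-8 bytes of the cleaned name for use in a URL.
    return quote(relaxed_url_title(name), safe="")

print(url_segment("über café"))  # ueber-cafe
print(url_segment("中文"))        # %E4%B8%AD%E6%96%87
```

Names with transliteration rules come out exactly as they do today. Only the characters that step 3 used to throw away now appear, percent-encoded.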
Would that be a good thing, or a bad thing?
Alex
Re: Relaxed transliteration for section, category names
Would it be possible to do that only if step three gives an empty string?
How do you deal with editing the category? When you edit and save a category name that contains , the % would be removed, leaving hexcode. And if you don’t remove the % upon saving, people could try creating categories like ‘10’, which is not a valid urlencoded string.
I wonder what the effect would be on a category named ‘0’ (zero). Currently, you can’t create that category, probably because the string ‘0’ is seen as a boolean false when used in an if-statement. I suppose ‘0’ would be encoded as %30 even though it would be valid without urlencoding. Note that you can create categories named ‘1’, ‘2’ etc.
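On the encoding side, standard percent-encoders leave the unreserved characters alone: ASCII letters, digits, `-`, `.`, `_` and `~`. PHP's `rawurlencode` behaves this way, so a name like ‘0’ would most likely appear as-is rather than as %30. A quick check using Python's `urllib.parse.quote` as a stand-in, with `encode_segment` as a hypothetical helper name:

```python
from urllib.parse import quote

def encode_segment(name: str) -> str:
    # Escape everything except the unreserved set (letters, digits, "-", ".", "_", "~").
    return quote(name, safe="")

for name in ["0", "1", "10", "中文"]:
    print(repr(name), "->", encode_segment(name))
```

This doesn't touch the separate boolean issue. In PHP the string "0" is falsy, so a check for an empty name would need an explicit comparison against "" rather than a plain truthiness test.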
Last edited by ruud (2006-09-05 20:42:04)
Re: Relaxed transliteration for section, category names
ruud wrote:
How do you deal with editing the category?
It would display whatever was entered, so essentially anything the user typed would be a “valid entry”. Percent-encoding would only happen where it is needed, i.e., in the URLs that are actually generated.
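A rough Python sketch of the “store raw, encode on output” idea. The `/category/` prefix and both function names are hypothetical, not Textpattern's real URL scheme. A literal `%` in a stored name is itself escaped as `%25`, so even a name that looks like an escape sequence survives the round trip:

```python
from urllib.parse import quote, unquote

def category_url(name: str) -> str:
    # The stored name is the raw string the user typed; it is encoded only here.
    return "/category/" + quote(name, safe="")

def name_from_url(path: str) -> str:
    # Decode the last path segment back into the stored name.
    return unquote(path.rsplit("/", 1)[-1])

for name in ["10", "%10", "中文", "a b"]:
    url = category_url(name)
    print(name, "->", url, "->", name_from_url(url))
```

Here `%10` becomes `/category/%2510` in the link and decodes back to `%10`. The edit form never needs to see the encoded form.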
Re: Relaxed transliteration for section, category names
I know this isn’t an answer to this question, but it’s probably worth mentioning: there are plans to include the ICU library in PHP version 6. ICU includes great transliteration functions, which would be the ideal replacement for the i18n-ascii method. So keep this in mind for the future.
VC3 :: weblog :: my wishlist
#5 2006-09-06 00:16:03
- zem
- Developer Emeritus

- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: Relaxed transliteration for section, category names
ruud wrote:
Would it be possible to do that only if step three gives an empty string?
Possible, but I don’t think it’s an ideal solution. What if the category is entered as 中文-中文 or similar? Step three won’t give an empty string, but it’s definitely a situation where we don’t want to strip unknown characters.
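To illustrate with a Python sketch: `fallback_url_title` is a hypothetical name for the “only if step three gives an empty string” rule. Transliteration is left out because these inputs have no i18n-ascii rules anyway.

```python
import re

def strict_url_title(name: str) -> str:
    # Steps 2 and 3 of the current behaviour.
    text = re.sub(r"[\W_]+", "-", name).strip("-")
    return re.sub(r"[^\x00-\x7f]", "", text)

def fallback_url_title(name: str) -> str:
    # Keep step 3 unless it empties the string; then fall back to step 2 only.
    stripped = strict_url_title(name)
    if stripped:
        return stripped
    return re.sub(r"[\W_]+", "-", name).strip("-")

print(fallback_url_title("中文"))       # 中文 (fallback kicks in)
print(fallback_url_title("中文-中文"))  # - (step 3 leaves a bare hyphen, so no fallback)
```

The second case is the problem. A lone hyphen isn't empty, so the fallback never fires and the name is still lost.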
The basic assumption I’m making is that i18n-ascii.txt maps every character that can reasonably be transliterated. That’s probably not 100% true, but it should be the goal.
ruud wrote:
How do you deal with editing the category? When you edit and save a category name that contains , the % would be removed, leaving hexcode. And if you don’t remove the % upon saving, people could try creating categories like ‘10’ which is not a valid urlencoded string.
The name would be saved unencoded, and only encoded when the URL is actually generated.
Last edited by zem (2006-09-06 00:17:13)
Alex