Textpattern CMS support forum

#1 2006-09-05 11:59:11

zem
Developer Emeritus
From: Melbourne, Australia
Registered: 2004-04-08
Posts: 2,579

Relaxed transliteration for section, category names

We’re considering some changes to the way strings are transliterated and stripped for inclusion in URLs, section and category names in particular.

At the moment, Textpattern does this:

1. Transliterate known characters to ASCII, following the rules in i18n-ascii
2. Remove and collapse punctuation and spaces
3. Remove any remaining non-ASCII characters
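
To make the three steps concrete, here is a minimal Python sketch. It is not Textpattern's actual code: the `TRANSLIT` table is a tiny hypothetical stand-in for the i18n-ascii rules, and the lower-casing and the choice of a hyphen as separator are assumptions made for the example.

```python
import re

# Hypothetical stand-in for the i18n-ascii rules; the real table is far larger.
TRANSLIT = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss", "é": "e"}

def strict_slug(name: str) -> str:
    # Step 1: transliterate known characters to ASCII
    # (lower-casing first is an assumption of this sketch).
    text = "".join(TRANSLIT.get(ch, ch) for ch in name.lower())
    # Step 2: collapse runs of punctuation and whitespace into one hyphen.
    # In Python 3, \w is Unicode-aware, so Chinese characters survive this step.
    text = re.sub(r"[^\w]+", "-", text).strip("-")
    # Step 3: remove any remaining non-ASCII characters.
    return re.sub(r"[^\x00-\x7f]", "", text)

print(strict_slug("Über uns!"))   # ueber-uns
print(repr(strict_slug("中文")))  # '' (nothing is left)
```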

This causes problems with languages like Chinese that don’t have transliteration rules, since most names wind up empty.

We’re considering eliminating step 3. That would leave intact any non-ASCII word characters that don’t have transliteration rules. They’ll be percent-encoded in URLs.
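
As a rough illustration of the proposal, the sketch below drops the final stripping step and percent-encodes the result only when a URL is built. The `TRANSLIT` table, the function names and the `/name/` URL shape are all hypothetical; only `urllib.parse.quote` is a real library call.

```python
import re
from urllib.parse import quote

# Hypothetical stand-in for the i18n-ascii rules.
TRANSLIT = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss", "é": "e"}

def relaxed_slug(name: str) -> str:
    # Steps 1 and 2 only: transliterate, then collapse punctuation and spaces.
    text = "".join(TRANSLIT.get(ch, ch) for ch in name.lower())
    return re.sub(r"[^\w]+", "-", text).strip("-")

def section_url(name: str) -> str:
    # quote() percent-encodes the UTF-8 bytes of any non-ASCII character.
    return "/" + quote(relaxed_slug(name), safe="") + "/"

print(section_url("Über uns"))  # /ueber-uns/
print(section_url("中文"))      # /%E4%B8%AD%E6%96%87/
```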

Would that be a good thing, or a bad thing?


Alex

Offline

#2 2006-09-05 20:16:41

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068
Website

Re: Relaxed transliteration for section, category names

Would it be possible to do that only if step three gives an empty string?

How do you deal with editing the category? When you edit and save a category name that contains a percent-encoded sequence, the % would be removed, leaving only the hex code. And if you don’t remove the % upon saving, people could try creating categories like ‘10’ which is not a valid urlencoded string.

I wonder what the effect would be on a category named ‘0’ (zero). Currently, you can’t create that category, probably because the string ‘0’ is seen as a boolean false when used in an if-statement. I suppose ‘0’ would be encoded as %30 even though it would be valid without urlencoding. Note that you can create categories named ‘1’, ‘2’ etc.
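
If the guess about boolean evaluation is right, the failure would look roughly like the sketch below. Python itself treats the string '0' as true, so `php_truthy` is a hypothetical helper that mimics PHP's rule that the empty string and '0' are the only strings that evaluate to false. The two check functions are illustrative, not Textpattern code.

```python
def php_truthy(value: str) -> bool:
    # Hypothetical helper: in PHP, "" and "0" are the only falsy strings.
    return value not in ("", "0")

def loose_check(name: str) -> bool:
    # Analogous to `if ($name)`: wrongly rejects the name "0".
    return php_truthy(name)

def explicit_check(name: str) -> bool:
    # Analogous to `if ($name !== '')`: rejects only the empty name.
    return name != ""

for name in ["0", "1", ""]:
    print(repr(name), loose_check(name), explicit_check(name))
```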

Last edited by ruud (2006-09-05 20:42:04)

Offline

#3 2006-09-05 20:32:20

Sencer
Archived Developer
From: cgn, de
Registered: 2004-03-23
Posts: 1,803
Website

Re: Relaxed transliteration for section, category names

ruud wrote:

How do you deal with editing the category?

It would display exactly what was entered, so basically everything the user entered would be a “valid entry”. The percent-encoding would only happen where it is needed, i.e., in the actual URLs that are generated.
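
A small sketch of that separation, assuming the name is stored raw and encoded only when a URL segment is produced. The function names are hypothetical; `quote` and `unquote` are Python's standard library calls.

```python
from urllib.parse import quote, unquote

def url_segment(stored_name: str) -> str:
    # Encode only at URL-generation time; the stored name is never modified.
    return quote(stored_name, safe="")

def name_from_segment(segment: str) -> str:
    # Decoding an incoming URL segment recovers the stored name.
    return unquote(segment)

for name in ["中文", "10%", "0"]:
    segment = url_segment(name)
    # Every stored name survives the round trip unchanged.
    assert name_from_segment(segment) == name
    print(repr(name), "->", segment)
```

In this sketch a literal % in a stored name is encoded as %25, so it cannot be mistaken for the start of an escape sequence, and '0' passes through unchanged because digits never need encoding.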

Offline

#4 2006-09-05 21:34:44

Etz Haim
Archived Plugin Author
From: Karlstad, Sweden
Registered: 2005-01-24
Posts: 262
Website

Re: Relaxed transliteration for section, category names

I know this is not an answer to the question, but it’s probably worth mentioning: there are plans to include the ICU library in PHP version 6. ICU includes great transliteration functions, which would be the ideal replacement for the i18n-ascii method. So, keep this in mind for the future.

Offline

#5 2006-09-06 00:16:03

zem
Developer Emeritus
From: Melbourne, Australia
Registered: 2004-04-08
Posts: 2,579

Re: Relaxed transliteration for section, category names

ruud wrote:

Would it be possible to do that only if step three gives an empty string?

Possible, but I don’t think it’s an ideal solution. What if the category is entered as 中文-中文 or similar? Step three won’t give an empty string, but it’s definitely a situation where we don’t want to strip unknown characters.
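
The mixed case can be shown with a short sketch of the "only if empty" fallback. The `TRANSLIT` table and function names are hypothetical, and treating the hyphen as the separator is an assumption of the example.

```python
import re

# Hypothetical stand-in for the i18n-ascii rules.
TRANSLIT = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss", "é": "e"}

def relaxed_slug(name: str) -> str:
    # Steps 1 and 2: transliterate, then collapse punctuation and spaces.
    text = "".join(TRANSLIT.get(ch, ch) for ch in name.lower())
    return re.sub(r"[^\w]+", "-", text).strip("-")

def fallback_slug(name: str) -> str:
    # Apply step 3, but keep the relaxed form only when step 3 leaves nothing.
    relaxed = relaxed_slug(name)
    strict = re.sub(r"[^\x00-\x7f]", "", relaxed)
    return strict if strict else relaxed

print(fallback_slug("中文"))       # 中文 (the fallback kicks in)
print(fallback_slug("中文-中文"))  # -   (not empty, so the name is lost)
print(relaxed_slug("中文-中文"))   # 中文-中文
```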

The basic assumption I’m making is that i18n-ascii.txt maps every character that can reasonably be transliterated. That’s probably not 100% true, but it should be the goal.

ruud wrote:

How do you deal with editing the category? When you edit and save a category name that contains a percent-encoded sequence, the % would be removed, leaving only the hex code. And if you don’t remove the % upon saving, people could try creating categories like ‘10’ which is not a valid urlencoded string.

The name would be saved unencoded, and only encoded when the URL is actually generated.

Last edited by zem (2006-09-06 00:17:13)


Alex

Offline
