Textpattern CMS support forum

#1 2012-06-22 11:18:31

THE BLUE DRAGON
Member
From: Israel
Registered: 2007-11-16
Posts: 638

Clean URL in UTF-8

Hi,
I would like to know how to get clean URLs in the address bar, and the URL-only title, both in UTF-8.
I need this for Japanese and Hebrew, please.

Currently all I get is something like: %d7%91%d7%93%d7%99%d7%a7%d7%94
instead of: בדיקה
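For reference, that string is not garbage: it is the percent-encoded form of the UTF-8 bytes of בדיקה (two bytes per Hebrew letter, each written as %XX). A quick check in a browser console or Node.js, using the standard encodeURIComponent and decodeURIComponent functions:

```javascript
// Percent-encoding: each UTF-8 byte of a non-ASCII character becomes %XX.
const title = 'בדיקה'; // Hebrew for "test"

// encodeURIComponent emits upper-case hex; lower-case it to match the URL above.
const encoded = encodeURIComponent(title).toLowerCase();
console.log(encoded); // %d7%91%d7%93%d7%99%d7%a7%d7%94

// Decoding the same string gives back the original Hebrew word.
console.log(decodeURIComponent(encoded)); // בדיקה
```

Whether the address bar shows the decoded Hebrew or the %XX form is presumably up to the browser; the bytes sent to the server are the same either way.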

On top of that, if I add a number to the title text, the URL-only title becomes just that number, without any of the other text.

For example, if you publish an article with the Hebrew title: בדיקה
(It means “test”)
You will get: %d7%91%d7%93%d7%99%d7%a7%d7%94

But if you add a number at the end: בדיקה 2
(means “test 2”)
You will end up with only: 2
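One way to picture both symptoms is a hypothetical model (not Textpattern's actual code, which isn't quoted in this thread): letters without an ASCII transliteration are dropped, and only if nothing at all survives is the raw title percent-encoded. Under that assumption the digit is the only survivor in the second case. The function name modelUrlTitle is made up for illustration.

```javascript
// HYPOTHETICAL model of the behaviour described above; NOT Textpattern's code.
// Assumption: characters with no ASCII transliteration are dropped, and only
// when nothing survives does the raw title get percent-encoded instead.
function modelUrlTitle(title) {
  const ascii = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // any run of non-ASCII-alphanumerics becomes one dash
    .replace(/^-+|-+$/g, '');    // trim leading/trailing dashes
  return ascii !== '' ? ascii : encodeURIComponent(title.trim()).toLowerCase();
}

console.log(modelUrlTitle('בדיקה'));   // %d7%91%d7%93%d7%99%d7%a7%d7%94
console.log(modelUrlTitle('בדיקה 2')); // 2
```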

So what should I do to get these languages to work and look pretty?
I keep the TXP interface language in English.

In phpMyAdmin, when I create the database, I set both collation fields to utf8_unicode_ci.
Is that OK?

#2 2012-06-27 09:45:54

wet
Developer Emeritus
From: Schoerfling, Austria
Registered: 2005-06-06
Posts: 3,340

Re: Clean URL in UTF-8

First, a bit of explanation of what Textpattern tries to achieve…

There aren’t that many characters that are valid in URLs:

A URI is a sequence of characters from a very limited set: the letters of the basic Latin alphabet, digits, and a few special characters. (RFC 3986)
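A small JavaScript check against RFC 3986's unreserved set (ALPHA / DIGIT / "-" / "." / "_" / "~") makes the point concrete: Hebrew or Japanese letters are simply not in it, so they can only travel percent-encoded. The helper name isPlainUrlSegment is hypothetical, and the check deliberately ignores the reserved delimiters that RFC 3986 also permits in some URI components.

```javascript
// RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
// Non-ASCII letters are not in this set, so in a URI they must be percent-encoded.
// (Reserved delimiters such as "/" or ":" are intentionally not accepted here.)
const UNRESERVED = /^[A-Za-z0-9\-._~]*$/;

// Hypothetical helper: true if the segment needs no percent-encoding at all.
function isPlainUrlSegment(segment) {
  return UNRESERVED.test(segment);
}

console.log(isPlainUrlSegment('test-2'));  // true
console.log(isPlainUrlSegment('בדיקה-2')); // false
```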

Therefore Textpattern transliterates any non-ASCII character into its ASCII equivalent to both fulfil this requirement and overcome this limitation. This works fine for Cyrillic and many other character sets but obviously Textpattern does not have enough information to provide this transliteration for Hebrew.

We’d need a transliteration table like the one we have for Cyrillic and other supported character sets here. Maybe you’d want to contribute that?
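To give an idea of what such a table does, here is a sketch of table-driven transliteration in JavaScript. The HEBREW_TO_ASCII mapping is a rough, hypothetical consonant-only romanization chosen for illustration; it is not Textpattern's file format and not an agreed standard, and Hebrew vowels (usually unwritten) cannot be recovered this way.

```javascript
// HYPOTHETICAL Hebrew -> Latin table, for illustration only; not Textpattern's
// format and not a standard romanization.
const HEBREW_TO_ASCII = {
  'א': '', 'ב': 'b', 'ג': 'g', 'ד': 'd', 'ה': 'h', 'ו': 'v', 'ז': 'z',
  'ח': 'ch', 'ט': 't', 'י': 'y', 'כ': 'k', 'ך': 'k', 'ל': 'l', 'מ': 'm',
  'ם': 'm', 'נ': 'n', 'ן': 'n', 'ס': 's', 'ע': '', 'פ': 'p', 'ף': 'f',
  'צ': 'ts', 'ץ': 'ts', 'ק': 'k', 'ר': 'r', 'ש': 'sh', 'ת': 't',
};

function transliterate(title, table) {
  // Drop combining marks (e.g. Hebrew niqqud) so only base letters remain.
  const bare = title.normalize('NFD').replace(/\p{M}/gu, '');
  // Replace each mapped character; unmapped characters pass through unchanged.
  const ascii = Array.from(bare, (ch) => (ch in table ? table[ch] : ch)).join('');
  // Build an ASCII-only slug from the result.
  return ascii
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

console.log(transliterate('בדיקה 2', HEBREW_TO_ASCII)); // bdykh-2
```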

If you nevertheless want raw UTF-8 characters in URLs, you can provide a custom plugin which translates titles to URLs.

The plugin must hook into the sanitize_for_url event and return the desired URL. Textpattern will then use this returned value and not use its built-in algorithm.
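A real plugin would be written in PHP and registered on the sanitize_for_url event; that wiring isn't shown here. The sketch below (JavaScript, to match the snippet elsewhere in this thread) only illustrates one possible rule such a plugin might return: keep Unicode letters, combining marks and digits, and collapse everything else into dashes. The function name unicodeUrlTitle is hypothetical.

```javascript
// HYPOTHETICAL title -> URL rule that keeps native scripts (Hebrew, Japanese, ...).
// Requires a JS engine with Unicode property escapes (\p{...}) in regexes.
function unicodeUrlTitle(title) {
  return title
    .normalize('NFC')                      // one canonical form per character
    .toLowerCase()                         // no-op for scripts without case
    .replace(/[^\p{L}\p{M}\p{N}]+/gu, '-') // runs of other characters -> one dash
    .replace(/^-+|-+$/g, '');              // trim leading/trailing dashes
}

console.log(unicodeUrlTitle('בדיקה 2')); // בדיקה-2
console.log(unicodeUrlTitle('テスト 2')); // テスト-2
```

On the wire the browser still sends these characters percent-encoded; the gain is only in how the URL is displayed where the browser chooses to decode it.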

#3 2012-06-27 10:40:11

Dragondz
Moderator
From: Algérie
Registered: 2005-06-12
Posts: 1,548

Re: Clean URL in UTF-8

You can also copy and paste the title into the URL-only title field manually, and you get the UTF-8 link.

Cheers

#4 2012-06-27 11:48:32

THE BLUE DRAGON
Member
From: Israel
Registered: 2007-11-16
Posts: 638

Re: Clean URL in UTF-8

Thank you both,
I will check and try to understand what’s going on in the i18n-ascii.txt file,
but for now I will just use Dragondz’s suggestion of copying the title into the URL-only title field, which seems to work fine.

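// On the article edit page (#page-article), copy the title into the URL-only title field on Publish
$(function () {
	if ($('#page-article').length > 0) {
		$('.publish').on('click', function () {
			$('#url-title').val($('#title').val());
		});
	}
});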

If I run into problems, I think I will just give up and use the article ID instead.
I also need this for Japanese, which unfortunately I don’t speak.

Arigato ;)

#5 2012-08-03 10:26:38

whocarez
Plugin Author
From: Germany/Ukraine
Registered: 2007-10-08
Posts: 305

Re: Clean URL in UTF-8

Hello,
maybe this small solution will also work for you:

Last edited by whocarez (2012-09-07 12:14:23)

#6 2012-09-06 09:29:24

THE BLUE DRAGON
Member
From: Israel
Registered: 2007-11-16
Posts: 638

Re: Clean URL in UTF-8

Thanks for the plugin, I will give it a try in a future project. For now I just don’t trust UTF-8 URLs; they are causing a lot of issues with social media and social media plugins.
So for now I’m only using section/id, without any titles.
But I will give it another try in my next UTF-8 project. Who knows, maybe your plugin will save the day! :)
