Textpattern CMS support forum

#13 2011-08-21 08:06:39

Vienuolis
Member
From: Vilnius, Lithuania
Registered: 2009-06-14
Posts: 307
Website GitHub GitLab Twitter

Re: [textile] Links with umlauts don't work

Just tested URL encoding by Markdown (one of the Textile alternatives) on Tumblr.com: sweet!

Last edited by Vienuolis (2011-08-21 08:07:25)

Offline

#14 2011-08-23 10:52:27

bjornbjorn
New Member
Registered: 2011-08-23
Posts: 1

Re: [textile] Links with umlauts don't work

Hi guys, has anyone reported this as a bug? We’re also having issues with this – getting reports from users that Textile won’t mark up links with Cyrillic characters. So it looks like the same issue.

Offline

#15 2011-08-23 11:25:12

wet
Developer Emeritus
From: Schoerfling, Austria
Registered: 2005-06-06
Posts: 3,323
Website Mastodon

Re: [textile] Links with umlauts don't work

bjornbjorn wrote:

Hi guys, has anyone reported this as a bug?

This is no bug.

Offline

#16 2011-08-23 11:25:35

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068
Website

Re: [textile] Links with umlauts don't work

Please provide examples, so it becomes clear which part of the URL has Cyrillic characters.

I think the difficult part is dealing with URLs like these:

  1. http://example.com/one%20two%20three
  2. http://example.com/minus2%discount
  3. http://example.com/éë
  4. http://example.com/éë2%discount
  5. http://example.com/éë%20minus2%discount
For each of these, you’d have to do the following:
  1. no encoding needed
  2. urlencoding needed… but how easy is it to recognize this properly?
  3. urlencode
  4. urlencode (including the %)
  5. probably invalid, mix of already urlencoded and non-urlencoded. I’d probably urlencode here.
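
To make the five cases above concrete, here is a minimal Python sketch of one possible way to tell them apart. The `classify` helper and its return labels are hypothetical, not anything Textile does; the only rule it applies is that a `%` followed by two hex digits is assumed to be an existing escape.

```python
import re

# A '%' followed by two hex digits looks like an existing percent-escape.
ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def classify(path: str) -> str:
    """Guess how a URL path should be treated (hypothetical helper)."""
    has_escapes = ESCAPE.search(path) is not None
    # Whatever remains after removing plausible escapes is "raw" content.
    raw = ESCAPE.sub("", path)
    needs_encoding = "%" in raw or any(ord(c) > 127 for c in raw)
    if not needs_encoding:
        return "leave as-is"
    if has_escapes:
        return "mixed"
    return "encode"

paths = [
    "/one%20two%20three",
    "/minus2%discount",
    "/éë",
    "/éë2%discount",
    "/éë%20minus2%discount",
]
for p in paths:
    print(p, "->", classify(p))
```

The weak spot is case 2: a path such as `/50%deal` contains `%de`, which happens to be a valid escape, so this heuristic would wrongly leave it alone. That is the "how easy is it to recognize this properly?" problem in practice.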

Interesting: if I copy a URL that has non-ASCII characters from the address bar in Firefox and paste it somewhere else, it’s pasted with proper URL encoding. If people who use textile paste their URLs in the same way, there’s no need for textile to take care of the encoding.

Last edited by ruud (2011-08-23 12:05:53)

Offline

#17 2011-08-23 11:34:19

net-carver
Archived Plugin Author
Registered: 2006-03-08
Posts: 1,648

Re: [textile] Links with umlauts don't work

Yes, textile does face a challenge with the URL encoding. I’ll see if I can put some time into this.

Vienuolis wrote:

Just tested URL encoding by Markdown (one of the Textile alternatives) on Tumblr.com: sweet!

For the time being, I think Markdown has a Textile compatibility mode so you should be able to drop it into your Txp sites (renaming the file) as a replacement for Textile.


Steve

Offline

#18 2011-08-23 13:35:09

phiw13
Plugin Author
From: Japan
Registered: 2004-02-27
Posts: 3,079
Website

Re: [textile] Links with umlauts don't work

ruud wrote:

Interesting: if I copy a URL that has non-ASCII characters from the address bar in Firefox and paste it somewhere else, it’s pasted with proper URL encoding. If people who use textile paste their URLs in the same way, there’s no need for textile to take care of the encoding.

That is a Firefox-specific behaviour, I think. Camino, Opera and Safari (all on OS X 10.6.8 and 10.5) don’t do that. Nor would I expect it, to be honest. That is, I consider the Firefox behaviour a bug.

Hold on, Chrome dev channel appears to have the same problem as Firefox. :-(

---
It is a tricky problem, especially for languages that don’t use spaces to separate words (e.g. Japanese). I noticed on Wikipedia that they use roman brackets ‘)’ in URIs, as opposed to Japanese (full-width) brackets ‘）’.
example: http://ja.wikipedia.org/wiki/福島駅_(大阪府)

Last edited by phiw13 (2011-08-23 13:36:18)


Where is that emoji for a solar powered submarine when you need it ?
Sand space – admin theme for Textpattern

Offline

#19 2012-02-16 21:58:59

net-carver
Archived Plugin Author
Registered: 2006-03-08
Posts: 1,648

Re: [textile] Links with umlauts don't work

Folks

For anyone following this thread, I’ve pushed a new branch, unicode-links, that is a first-cut attempt at dealing with this for normal mode (not restricted mode). For now I’m calling rawurlencode on the URL part and then decoding the reserved characters (as defined in RFC 3986), on the assumption that in normal mode folks won’t be entering already-encoded data in their URLs. (The same will likely not be true in restricted mode, where comments may be left by any Joe H. Acker.)

Anyway, for now, this seems to let you enter the likes of…

"Übermensch":https://de.wikipedia.org/wiki/Übermensch

… getting back…

 <p><a href="https://de.wikipedia.org/wiki/%C3%9Cbermensch">Übermensch</a></p>

… which looks OK to my untrained eye and works fine in FF. I am, however, having trouble getting Japanese characters through our unicode regex — characters just like those in Philippe’s post above.
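
As a rough illustration of the encode-then-restore approach described above, here is a Python sketch. It is not the code on the branch: `urllib.parse.quote` stands in for PHP's rawurlencode, and `encode_link` and `RESERVED` are names made up for this example.

```python
from urllib.parse import quote

# RFC 3986 reserved characters: gen-delims followed by sub-delims.
RESERVED = ":/?#[]@!$&'()*+,;="

def encode_link(url: str) -> str:
    """Percent-encode a URL, then restore the reserved characters."""
    # Step 1: encode everything except unreserved characters
    # (letters, digits, '-', '.', '_', '~'), UTF-8 byte by byte.
    encoded = quote(url, safe="")
    # Step 2: turn the escapes of reserved characters back into literals.
    for ch in RESERVED:
        encoded = encoded.replace("%{:02X}".format(ord(ch)), ch)
    return encoded

print(encode_link("https://de.wikipedia.org/wiki/Übermensch"))
# prints https://de.wikipedia.org/wiki/%C3%9Cbermensch
```

Note that `%` is not a reserved character, so input that is already encoded gets encoded a second time (`%20` becomes `%2520`). That is why the approach depends on the assumption that normal-mode authors enter unencoded URLs.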


Steve

Offline

#20 2012-02-17 21:39:31

uli
Moderator
From: Cologne
Registered: 2006-08-15
Posts: 4,304

Re: [textile] Links with umlauts don't work

Steve, I’ve tortured that version a little with what I could reach on my keyboard with the help of two hands (i.e. äöüÄÖÜßçéáóúèàòùÉÁÓÚÈÀÒÙêÊôÔâÂûÛåÅœŒæÆøØëËïÏ), and none of these (umm … all of them together) broke the link.

Great job :)


In bad weather I never leave home without wet_plugout, smd_where_used and adi_form_links

Offline

#21 2012-02-17 22:06:00

net-carver
Archived Plugin Author
Registered: 2006-03-08
Posts: 1,648

Re: [textile] Links with umlauts don't work

uli wrote:

Steve, I’ve tortured that version a little with what I could reach on my keyboard with the help of two hands (i.e. äöüÄÖÜßçéáóúèàòùÉÁÓÚÈÀÒÙêÊôÔâÂûÛåÅœŒæÆøØëËïÏ), and none of these (umm … all of them together) broke the link.

Great job :)

Thanks for testing and sending the feedback Uli.


Steve

Offline

#22 2012-02-20 16:09:40

net-carver
Archived Plugin Author
Registered: 2006-03-08
Posts: 1,648

Re: [textile] Links with umlauts don't work

A quick update regarding possible Unicode (or just plain URL-encoded) input for links in restricted mode (think comments): I was thinking of simply rawurlencoding the input and then not decoding the characters from the reserved character set.

Anyone else have any input on this?


Steve

Offline

#23 2012-03-18 13:06:05

net-carver
Archived Plugin Author
Registered: 2006-03-08
Posts: 1,648

Re: [textile] Links with umlauts don't work

Hello all,

I’ve committed what I have so far into the master branch to support Unicode in links, and you are invited to test it. I am currently looking at the problems with Japanese Wikipedia style links such as http://ja.wikipedia.org/wiki/福島駅_(大阪府) which tend to end in a closing bracket ‘)’, as Philippe mentioned above. At the moment textile has trouble matching up to and including the final ‘)’, but I’m playing with extending the regex a little to allow for this.


Steve

Offline

#24 2012-03-18 13:39:26

net-carver
Archived Plugin Author
Registered: 2006-03-08
Posts: 1,648

Re: [textile] Links with umlauts don't work

There’s a new branch, ja-wiki-links, that should allow textile to handle Japanese Wikipedia style links that end with a balanced set of parentheses. However, this isn’t perfect, as it loses the ability to deal with a trailing slash.

So, using Philippe’s example…

"福島駅":http://ja.wikipedia.org/wiki/福島駅_(大阪府) this is a test -- no trailing slash.

…works using this branch, as does this…

"$":http://ja.wikipedia.org/wiki/福島駅_(大阪府) this is another test -- no trailing slash.

…but not this…

"福島駅":http://ja.wikipedia.org/wiki/福島駅_(大阪府)/ this is a test -- with a trailing slash.

…which creates the link OK but leaves the slash as a separate character at the end.
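
For comparison, here is one way the balanced-parenthesis rule could be expressed without losing the trailing slash. This is a Python sketch under assumptions, not the regex used in the ja-wiki-links branch: `extract_url` and `TRAILING_PUNCT` are hypothetical, and the input is assumed to be the text immediately after the `":` of a Textile link.

```python
import re

# Punctuation assumed to belong to the sentence, not the URL.
TRAILING_PUNCT = ".,;:!?"

def extract_url(text: str) -> str:
    """Return the URL at the start of `text`, trimming sentence punctuation."""
    match = re.match(r"\S+", text)
    if not match:
        return ""
    url = match.group(0)
    while url:
        last = url[-1]
        if last in TRAILING_PUNCT:
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # Drop a closing bracket only if it has no opening partner.
            url = url[:-1]
        else:
            break
    return url

print(extract_url("http://ja.wikipedia.org/wiki/福島駅_(大阪府)/ this is a test"))
```

A closing bracket is kept when the brackets in the URL balance and dropped when it is one too many, as in `(see http://example.com/page)`. A trailing slash is never trimmed, so it stays part of the link.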


Steve

Offline
