Re: [textile] Links with umlauts don't work
Just tested URL encoding by Markdown (one of the Textile alternatives) on Tumblr.com: sweet!
Last edited by Vienuolis (2011-08-21 08:07:25)
#14 2011-08-23 10:52:27
- bjornbjorn
- New Member
- Registered: 2011-08-23
- Posts: 1
Hi guys, has anyone reported this as a bug? We’re also having issues with this: users report that Textile won’t mark up links containing Cyrillic characters, so it looks like the same issue.
Re: [textile] Links with umlauts don't work
Please provide examples, so it becomes clear which part of the URL has Cyrillic characters.
I think the difficult part is dealing with URLs like these:
- http://example.com/one%20two%20three (no encoding needed)
- http://example.com/minus2%discount (urlencoding needed… but how easy is it to recognize this properly?)
- http://example.com/éë (urlencode)
- http://example.com/éë2%discount (urlencode, including the %)
- http://example.com/éë%20minus2%discount (probably invalid: a mix of already urlencoded and non-urlencoded parts; I’d probably urlencode here)
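To make the recognition problem in the list above concrete, here is a minimal sketch of one possible policy, written in Python rather than Textile's PHP. It keeps anything that already looks like a `%XX` escape and encodes everything else, including a stray `%`. The function name `encode_preserving_escapes` and the policy itself are assumptions for illustration, not something Textile implements.

```python
import re
from urllib.parse import quote

# RFC 3986 reserved characters, left untouched so the URL structure survives.
RESERVED = ":/?#[]@!$&'()*+,;="
# A well-formed percent-escape: '%' followed by exactly two hex digits.
ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def encode_preserving_escapes(url: str) -> str:
    """Percent-encode `url`, keeping anything that already looks like %XX.

    Hypothetical helper: one possible policy, not Textile's behaviour.
    """
    parts = []
    pos = 0
    for match in ESCAPE.finditer(url):
        # Encode the text between escapes; a stray '%' becomes '%25' here.
        parts.append(quote(url[pos:match.start()], safe=RESERVED))
        parts.append(match.group(0))  # keep the existing escape as-is
        pos = match.end()
    parts.append(quote(url[pos:], safe=RESERVED))
    return "".join(parts)


if __name__ == "__main__":
    for example in (
        "http://example.com/one%20two%20three",
        "http://example.com/minus2%discount",
        "http://example.com/éë",
        "http://example.com/éë2%discount",
        "http://example.com/éë%20minus2%discount",
    ):
        print(encode_preserving_escapes(example))
```

The heuristic cannot be fully reliable: in a path such as `20%cashback`, the sequence `%ca` happens to be a valid escape, so it would be left alone even though the author meant a literal percent sign.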
Interesting: if I copy a URL that has non-ASCII characters from the address bar in Firefox and paste it somewhere else, it’s pasted with proper URL encoding. If people who use textile paste their URLs in the same way, there’s no need for textile to take care of the encoding.
Last edited by ruud (2011-08-23 12:05:53)
#17 2011-08-23 11:34:19
- net-carver
- Archived Plugin Author
- Registered: 2006-03-08
- Posts: 1,648
Yes, textile does face a challenge with the URL encoding. I’ll see if I can put some time into this.
Vienuolis wrote:
Just tested URL encoding by Markdown (one of the Textile alternatives) on Tumblr.com: sweet!
For the time being, I think Markdown has a Textile compatibility mode so you should be able to drop it into your Txp sites (renaming the file) as a replacement for Textile.
— Steve
Re: [textile] Links with umlauts don't work
ruud wrote:
Interesting: if I copy a URL that has non-ASCII characters from the address bar in Firefox and paste it somewhere else, it’s pasted with proper URL encoding. If people who use textile paste their URLs in the same way, there’s no need for textile to take care of the encoding.
That is Firefox-specific behaviour, I think. Camino, Opera and Safari (all on OS X 10.6.8 and 10.5) don’t do that. Nor would I expect it, to be honest; I consider the Firefox behaviour a bug.
Hold on, Chrome dev channel appears to have the same problem as Firefox. :-(
—-
It is a tricky problem, especially for languages that don’t use spaces to separate words (e.g. Japanese). I noticed that Wikipedia uses roman brackets ‘)’ in URIs, as opposed to Japanese brackets ‘）’.
example: http://ja.wikipedia.org/wiki/福島駅_(大阪府)
Last edited by phiw13 (2011-08-23 13:36:18)
Where is that emoji for a solar powered submarine when you need it ?
Sand space – admin theme for Textpattern
#19 2012-02-16 21:58:59
- net-carver
- Archived Plugin Author
- Registered: 2006-03-08
- Posts: 1,648
Folks
For anyone following this thread, I’ve pushed a new branch, unicode-links, that is a first-cut attempt at dealing with this in normal mode (not restricted mode). For now I’m calling rawurlencode on the URL part and then decoding the reserved characters (as defined in RFC 3986), on the assumption that in normal mode folks won’t be entering already-encoded data in their URLs. The same will likely not be true in restricted mode, where comments may be left by any Joe H. Acker.
Anyway, for now, this seems to let you enter the likes of…
"Übermensch":https://de.wikipedia.org/wiki/Übermensch
… getting back…
<p><a href="https://de.wikipedia.org/wiki/%C3%9Cbermensch">Übermensch</a></p>
… which looks OK to my untrained eye and works fine in FF. I am, however, having trouble getting Japanese characters through our unicode regex — characters just like those in Philippe’s post above.
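The encode-then-restore idea described above can be sketched as follows. This is a Python analogue, not the code in the unicode-links branch: `urllib.parse.quote` with `safe=""` stands in for PHP's `rawurlencode`, and the function name `encode_then_restore_reserved` is made up for illustration.

```python
from urllib.parse import quote

# RFC 3986 reserved characters (gen-delims plus sub-delims).
RESERVED = ":/?#[]@!$&'()*+,;="


def encode_then_restore_reserved(url: str) -> str:
    """Percent-encode the whole URL, then turn reserved characters back.

    Hypothetical helper mirroring the two-step approach from the post.
    """
    # Step 1: encode everything except unreserved characters (A-Z a-z 0-9 - _ . ~).
    encoded = quote(url, safe="")
    # Step 2: restore each reserved character from its %XX form.
    for char in RESERVED:
        encoded = encoded.replace(quote(char, safe=""), char)
    return encoded


if __name__ == "__main__":
    print(encode_then_restore_reserved("https://de.wikipedia.org/wiki/Übermensch"))
    # Already-encoded input is encoded a second time, as the post assumes it may be.
    print(encode_then_restore_reserved("http://example.com/one%20two"))
```

Because `%` is not a reserved character, it is never restored. Input that is already encoded therefore comes out double-encoded (`%20` becomes `%2520`), which is why the approach relies on the assumption that normal-mode authors enter raw URLs.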
— Steve
#20 2012-02-17 21:39:31
- uli
- Moderator
- From: Cologne
- Registered: 2006-08-15
- Posts: 4,306
Steve, I’ve tortured that version a little with what I could reach on my keyboard with the help of two hands (i.e. äöüÄÖÜßçéáóúèàòùÉÁÓÚÈÀÒÙêÊôÔâÂûÛåÅœŒæÆøØëËïÏ), and none of these (umm … all of them together) broke the link.
Great job :)
In bad weather I never leave home without wet_plugout, smd_where_used and adi_form_links
#21 2012-02-17 22:06:00
- net-carver
- Archived Plugin Author
- Registered: 2006-03-08
- Posts: 1,648
uli wrote:
Steve, I’ve tortured that version a little with what I could reach on my keyboard with the help of two hands (i.e. äöüÄÖÜßçéáóúèàòùÉÁÓÚÈÀÒÙêÊôÔâÂûÛåÅœŒæÆøØëËïÏ), and none of these (umm … all of them together) broke the link. Great job :)
Thanks for testing and sending the feedback Uli.
— Steve
#22 2012-02-20 16:09:40
- net-carver
- Archived Plugin Author
- Registered: 2006-03-08
- Posts: 1,648
A quick update regarding possible Unicode (or just plain URL-encoded) input for links in restricted mode (think comments): I was thinking of simply rawurlencoding the input and then not decoding the characters from the reserved character set.
Anyone else have any input on this?
— Steve
#23 2012-03-18 13:06:05
- net-carver
- Archived Plugin Author
- Registered: 2006-03-08
- Posts: 1,648
Hello all,
I’ve committed what I have so far into the master branch to support unicode in links, and you are invited to test this out. I am currently looking at the problems with the Japanese Wikipedia style links, like http://ja.wikipedia.org/wiki/福島駅_(大阪府), which tend to end in a closing bracket ‘)’ as Philippe mentioned above. It’s actually a problem for textile to match up to and including the final ‘)’ at the moment, but I’m playing with extending the regex a little to allow for this.
— Steve
#24 2012-03-18 13:39:26
- net-carver
- Archived Plugin Author
- Registered: 2006-03-08
- Posts: 1,648
There’s a new branch, ja-wiki-links, that should allow textile to handle Japanese Wikipedia style links that end with a balanced set of parentheses. However, this isn’t perfect, as it loses the ability to deal with a trailing slash.
So, using Philippe’s example…
"福島駅":http://ja.wikipedia.org/wiki/福島駅_(大阪府) this is a test -- no trailing slash.
…works using this branch, as does this…
"$":http://ja.wikipedia.org/wiki/福島駅_(大阪府) this is another test -- no trailing slash.
…but not this…
"福島駅":http://ja.wikipedia.org/wiki/福島駅_(大阪府)/ this is a test -- with a trailing slash.
…which creates the link OK but leaves the slash as a separate character at the end.
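For comparison, the balanced-parenthesis problem can also be handled outside the regex, by trimming a whitespace-delimited candidate from the right. The sketch below is in Python and is not how the ja-wiki-links branch works (that branch extends Textile's regex). The helper name `trim_url_candidate` and the set of trailing punctuation are assumptions.

```python
def trim_url_candidate(candidate: str) -> str:
    """Trim trailing punctuation from a URL candidate.

    A final ')' is kept only when it closes a '(' inside the URL.
    Hypothetical helper, not part of Textile.
    """
    url = candidate
    while url:
        last = url[-1]
        if last in ".,;:!?":
            # Sentence punctuation directly after a link is not part of it.
            url = url[:-1]
        elif last == ")" and url.count("(") < url.count(")"):
            # Unbalanced closing bracket: it belongs to the surrounding text.
            url = url[:-1]
        else:
            break
    return url


if __name__ == "__main__":
    print(trim_url_candidate("http://ja.wikipedia.org/wiki/福島駅_(大阪府)"))
    print(trim_url_candidate("http://ja.wikipedia.org/wiki/福島駅_(大阪府)/"))
    print(trim_url_candidate("http://example.com/page)."))
```

Since a slash is not treated as trailing punctuation here, the trailing-slash form keeps its final `/` as part of the link.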
— Steve