Textpattern CMS support forum

#1 2008-01-17 03:29:07

guiguibonbon
Member
Registered: 2006-02-20
Posts: 296

& #38; in urls

Textpattern has the wisdom to change any ampersand in URLs into an HTML entity. That’s great. Except it would be better to use &amp; instead of &#38;. Here’s why: all programming languages have an equivalent to PHP’s htmlspecialchars_decode() function, but none of these functions (including PHP’s) seem to tackle &#38;. They only transform &amp;. Also, there’s an issue with functions such as urlencode(), which transform # into %23, thereby ruining the ampersand entity entirely. That of course does not happen with amp because it contains no special character.
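To make the decoding gap concrete, here is a minimal sketch in Python, used only as a stand-in for the PHP functions named above. `decode_named_only` is a hypothetical helper that mimics a decoder limited to the named special-character entities; it is not PHP’s actual implementation, and the example.com URLs are made up.

```python
import html

# Hypothetical stand-in for a decoder that only knows the named
# special-character entities, as the post describes.
def decode_named_only(text: str) -> str:
    # "&amp;" is handled last so that "&amp;lt;" becomes "&lt;", not "<".
    pairs = (("&lt;", "<"), ("&gt;", ">"), ("&quot;", '"'), ("&amp;", "&"))
    for entity, char in pairs:
        text = text.replace(entity, char)
    return text

named = "http://example.com/?id=5&amp;c=news"      # made-up URL, named entity
numeric = "http://example.com/?id=5&#38;c=news"    # same URL, numeric entity

print(decode_named_only(named))    # ampersand restored
print(decode_named_only(numeric))  # numeric entity left untouched
print(html.unescape(numeric))      # a full entity decoder handles both forms
```

The last line suggests a possible workaround on the consuming side: a general entity decoder (in PHP, html_entity_decode() plays that role) resolves numeric references as well as named ones.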

This poses an issue in the case of newsletter apps which track clicks. They use redirects, and just perform a regular decoding function on the URLs they find in the HTML of the email. Meaning &#38; just stays. The bad news is that URLs containing the &# combination are plainly not valid and are being cut right after the & by the servers. There’s no way of detecting it.
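One likely explanation for the cut-off is that # starts the fragment part of a URL, and the fragment is not sent to the server. A small Python sketch with a made-up example.com link shows where a standard URL parser draws that line:

```python
from urllib.parse import urldefrag

# Hypothetical tracked link whose "&#38;" was never decoded back to "&".
undecoded = "http://example.com/article?id=5&#38;c=news"

# urldefrag splits the URL at the first "#".
url_without_fragment, fragment = urldefrag(undecoded)

print(url_without_fragment)  # the part a client would request from the server
print(fragment)              # the part treated as a client-side fragment
```

Everything after the & ends up in the fragment, so the second query parameter never reaches the server.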

I emailed Campaign Monitor about the issue, and it sounds like they never had anyone complain about this. Maybe we should do like the rest of the world and use &amp;.

Last edited by guiguibonbon (2008-01-17 03:46:20)

#2 2008-01-17 13:50:30

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068
Website

Re: & #38; in urls

As I understand it, the numeric entities are used to avoid problems in RSS/Atom feeds, due to readers not being able to understand named entities.

#3 2008-01-17 14:27:22

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,200
Website GitHub Mastodon Twitter

Re: & #38; in urls

I do not know anything about the matter but just in case guiguibonbon is right re the observation, what if the numeric entities are only used in the excerpt field?


Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.

#4 2008-01-17 19:03:52

Gocom
Developer Emeritus
From: Helsinki, Finland
Registered: 2006-07-14
Posts: 4,533
Website

Re: & #38; in urls

colak wrote:

I do not know anything about the matter but just in case guiguibonbon is right re the observation, what if the numeric entities are only used in the excerpt field?

Colak, what do you mean?

The only issue is that we should use entities instead of special chars, as PHP has its own functions for that too. And special chars ain’t for URLs; entities are.

Cheers!

#5 2008-01-17 20:28:06

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068
Website

Re: & #38; in urls

htmlspecialchars() produces the escaping that ggbb wants, but TXP uses its own escape_output() function instead.
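As a rough illustration of the two escaping styles, here is a Python sketch. `html.escape` stands in for htmlspecialchars(), and `escape_numeric` is a hypothetical numeric escaper written for this example; it is not Textpattern’s actual escape_output() code.

```python
import html

# Hypothetical numeric escaper, for illustration only.
NUMERIC = {
    ord("&"): "&#38;",
    ord("<"): "&#60;",
    ord(">"): "&#62;",
    ord('"'): "&#34;",
}

def escape_numeric(text: str) -> str:
    # str.translate replaces each character in a single pass,
    # so the "&" inside "&#38;" is never escaped twice.
    return text.translate(NUMERIC)

url = "http://example.com/?id=5&c=news"  # made-up URL

print(html.escape(url))     # named form, with &amp;
print(escape_numeric(url))  # numeric form, with &#38;
```

Both outputs are equivalent to an HTML or XML parser; they only differ for tools that decode a fixed list of named entities.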

Depending on your preferences, body or excerpt is used in the feeds, so doing this just in the excerpt doesn’t help.

#6 2008-01-18 03:33:02

phiw13
Plugin Author
From: South-Western Japan
Registered: 2004-02-27
Posts: 3,418
Website

Re: & #38; in urls

ruud wrote:

As I understand it, the numeric entities are used to avoid problems in RSS/Atom feeds, due to readers not being able to understand named entities.

That would be strange. There are 5 predefined (named) entities in XML, and all XML processors are required to support those.

I’m not sure what Atom and RSS specs have to say on this.
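The XML side of this can be checked with any conforming parser. Below is a minimal Python sketch using the standard library’s ElementTree; the element names and the example.com URL are made up for the test and are not taken from any real feed.

```python
import xml.etree.ElementTree as ET

def element_text(xml_fragment: str) -> str:
    """Parse a one-element XML fragment and return its text content."""
    return ET.fromstring(xml_fragment).text

# Named and numeric forms of the ampersand parse to the same text.
named = element_text("<link>http://example.com/?id=5&amp;c=news</link>")
numeric = element_text("<link>http://example.com/?id=5&#38;c=news</link>")
print(named == numeric)

# All five predefined entities are accepted without a DTD.
print(element_text("<t>&lt;&gt;&amp;&apos;&quot;</t>"))

# A named entity outside those five is rejected unless it is declared.
try:
    element_text("<title>caf&eacute;</title>")
except ET.ParseError as err:
    print("not predefined in XML:", err)
```

So &amp; itself should be safe in a feed; the risk with named entities applies to the HTML ones, such as &eacute;, that XML does not predefine.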


Where is that emoji for a solar powered submarine when you need it ?
Sand space – admin theme for Textpattern
phiw13 on Codeberg
