Textpattern CMS support forum
attribute values in TXP tags: html escaped or not?
If I want to set a label “me & you” for the TXP category_list tag, I could do either:
<txp:category_list label="me & you">
or
<txp:category_list label="me &amp; you">
I think only the second one produces valid XHTML in TXP 4.0.4, because the label value is used as-is (attribute parsing doesn’t un-escape attribute values, nor is the label value escaped when it’s output on the page)… but is that by design or could it change in the next TXP version?
The reason why I’m posting it here is that when writing a plugin, I try to follow TXP conventions whenever possible, so the tags offered by the plugin behave similarly to core TXP tags.
#2 2007-04-11 02:24:55
- zem
- Developer Emeritus
- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: attribute values in TXP tags: html escaped or not?
Fair question. I’d guess you’ll probably find this isn’t handled consistently at present.
For plugins that allow a label attribute, I’d recommend you call the doLabel() function, and let it worry about escaping. Best to handle it centrally.
Any comments on whether or not we should change doLabel() to escape its output?
Alex
Re: attribute values in TXP tags: html escaped or not?
My considerations:
There are no more than three entities which would require escaping when they are used as an attribute value: >, <, and &. The first two of them have to be escaped prior to the point where they are fed to the parser, as otherwise the tag won’t be parsed successfully.
This leaves us with the need for a correct procedure regarding the ampersand: For the sake of consistency, I’d advise plugin users to escape attribute values, and plugin developers to leave them untouched.
doLabel() is one point to consider, but there’s more. On similar occasions, I’ve used escape_title() in some of my plugins where appropriate.
Last edited by wet (2007-04-11 06:07:00)
#4 2007-04-11 09:18:18
- zem
- Developer Emeritus
- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: attribute values in TXP tags: html escaped or not?
The first two of them have to be escaped prior to the point where they are fed to the parser, as otherwise the tag won’t be parsed successfully
I don’t think that’s true, at least not always. This parses correctly for me:
<txp:text item="1 < 2" />
<txp:text item="2 > 1" />
I think it’s reasonable to allow those characters in attributes, so I don’t think that should be considered a loophole or a bad thing. I can imagine tag syntax like <txp:if_something where="foo>=10" /> being useful. <txp:if_something where="foo&gt;=10" /> would be both counterintuitive and semantically incorrect.
The improved parser in 4.1 supports backslash escaping for this kind of situation. It was intended for syntax like <txp:foo title="with a \"quote\"" />. If there’s a definitive syntax to be decided for < and > in tag attributes, perhaps they should require backslash escaping. (Some kind of escaping is required to ensure the parser doesn’t get confused in some circumstances.)
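To make the backslash idea concrete, here is a minimal sketch of how an attribute parser could accept escaped quotes. It is written in Python purely for illustration (Textpattern itself is PHP), and the pattern and the name parse_attributes are assumptions of mine, not the 4.1 parser:

```python
import re

# Hypothetical attribute pattern: name="value", where the value may contain
# backslash-escaped characters such as \" (an assumption, not TXP's real parser).
ATTR_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_attributes(tag_source):
    """Return a dict of attribute names to values with backslash escapes removed."""
    atts = {}
    for name, raw in ATTR_RE.findall(tag_source):
        # Drop each escaping backslash and keep the character that follows it.
        atts[name] = re.sub(r'\\(.)', r'\1', raw)
    return atts


if __name__ == "__main__":
    print(parse_attributes(r'<txp:foo title="with a \"quote\"" />'))
    print(parse_attributes('<txp:if_something where="foo>=10" />'))
```

With a pattern like this, < and > inside a quoted value need no special treatment; only the quote character and the backslash itself have to be escaped.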
Last edited by zem (2007-04-11 09:25:14)
Alex
Re: attribute values in TXP tags: html escaped or not?
zem wrote:
If there’s a definitive syntax to be decided for < and > in tag attributes, perhaps they should require backslash escaping.
This sounds most reasonable.
Re: attribute values in TXP tags: html escaped or not?
zem wrote:
For plugins that allow a label attribute, I’d recommend you call the doLabel() function, and let it worry about escaping. Best to handle it centrally.
I can’t use a function that might change its output (see below), which is one of the reasons why it’s nice to know exactly how things like this are supposed to behave.
Any comments on whether or not we should change doLabel() to escape its output?
Perhaps a different question is: should it be possible to have HTML tags contained in the label attribute value? Looking at the tag parser and doLabel, I suspect that things like label="some <em>bold</em> label" would show up just like that in the HTML source (making the word ‘bold’ show up emphasized).
Although TextBook is not the official documentation, it’s unclear on whether that is allowed or just happens to work. Some parts of TextBook say label="value", which doesn’t specify whether that value is HTML or just text, while in other places it says label="label text", which seems to suggest that it doesn’t allow HTML, only plain text.
Before making any kind of decision on the doLabel output encoding, I think it is important to decide whether HTML tags should be allowed to exist within a label or not:
- YES, tags allowed: doLabel doesn’t have to escape output, but perhaps the label attribute needs some escaping (backslashes) to play nice with the tag parser
- NO, just plain text: two options
  - user html-escapes the input, doLabel does nothing
  - user uses only backslash escape codes to please the tag parser, doLabel handles html-escaping.
I’d vote for the last option: letting doLabel escape the attribute value and having the user enter the attribute value as plain text (backslash escaped if needed).
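A rough sketch of what that last option could look like, in Python for illustration only. The function render_label and its default labeltag are hypothetical stand-ins, not the actual doLabel() code:

```python
from html import escape


def render_label(label, labeltag="h3"):
    """Hypothetical label helper: takes plain text and escapes it at output time."""
    if not label:
        return ""
    # quote=False escapes only &, < and >, which is enough for element content.
    text = escape(label, quote=False)
    return f"<{labeltag}>{text}</{labeltag}>"


if __name__ == "__main__":
    print(render_label("me & you"))
    print(render_label("some <em>bold</em> label"))
```

Under this approach any markup typed into the label shows up literally on the page instead of being interpreted, which is the price of treating labels as plain text.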
If I understand the 4.0.x tag parser correctly, without backslash escaping it would only break on things like this:
<txp:tag attribute="some <hr /> thing" />
The /> part would cause problems, unless preceded by br. And of course " quotes aren’t possible within an attribute value either.
What I like about using plain text attribute values instead of HTML escaped attribute values is:
- Once you have the attribute values in the $atts array in the taghandler function, I find it easier to work with non-escaped text (comparable to disabling magic_quotes in PHP) and escaping only if and when needed.
- Letting the taghandler function deal with escaping is easier and probably faster than unescaping (if you need the unescaped attribute value).
- It makes mistakes more obvious, because if someone accidentally enters &amp; instead of &, it shows up double escaped on the website (always valid XHTML), which is easy for anyone to see, while the opposite (invalid XHTML) would probably only be detected by a validation tool.
It does, however, require the TXP user to understand that the TXP parser behaves differently from an HTML parser and therefore uses different escape techniques.
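The double-escaping point from the list above can be illustrated with a small Python sketch (again just an illustration, assuming the tag handler escapes every label exactly once on output):

```python
from html import escape, unescape

plain = "me & you"                # what the user should type
already_escaped = "me &amp; you"  # what the user typed by mistake

# The tag handler escapes whatever it receives, once.
sent_ok = escape(plain, quote=False)
sent_mistake = escape(already_escaped, quote=False)

# unescape() approximates what a browser displays for each string.
shown_ok = unescape(sent_ok)
shown_mistake = unescape(sent_mistake)

if __name__ == "__main__":
    print(sent_ok, "->", shown_ok)
    print(sent_mistake, "->", shown_mistake)
```

Both strings sent to the browser are valid markup, but in the mistaken case the visitor sees a literal entity on the page, so the error is hard to miss.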
Last edited by ruud (2007-04-11 23:07:58)
#7 2007-04-12 21:50:47
- Mary
- Sock Enthusiast
- Registered: 2004-06-27
- Posts: 6,236
Re: attribute values in TXP tags: html escaped or not?
I’ve always understood a label to be purely textual.
Re: attribute values in TXP tags: html escaped or not?
Mary, which of the two code examples in the opening post do you think shows the correct method in TXP to create a label me & you or any other attribute value for that matter?
Alex points out that this may not be handled consistently for all attributes, so the question is: how do you think the attribute values should be entered by TXP users?
Last edited by ruud (2007-04-12 23:02:30)
#9 2007-04-13 04:51:54
- Mary
- Sock Enthusiast
- Registered: 2004-06-27
- Posts: 6,236
Re: attribute values in TXP tags: html escaped or not?
Considering that most tags are meant to be used in pages and forms, you need to manually escape other entities you enter in your page/form anyway. That makes me think that you should have to do the escaping yourself.
I don’t know how I feel about entering (X)HTML/Txp tags inside Txp attribute values (<txp:tag attribute="some <hr /> thing" />). As far as I can recall, I’ve always avoided doing it myself, it just feels… dirty. ;o And it makes it kind of hard to read.
#10 2007-04-13 08:15:35
- zem
- Developer Emeritus
- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: attribute values in TXP tags: html escaped or not?
I’d vote for the last option: letting doLabel escape the attribute value and having the user enter the attribute value as plain text (backslash escaped if needed).
Yes, I think this is the most sensible option.
Generally speaking I think the data being passed around Textpattern taghandlers and library functions, stored in the database etc, should be raw utf-8 — not HTML code, entity-escaped text or some other special encoding. Escaping should be “late” and dynamic — it happens at the last minute, just before the output is sent to the browser, and the tag (or output functions) ought to decide what type of escaping to apply.
As a simple example, I could imagine the doLabel() function including the label text in the labeltag’s title attribute, e.g. <txp:foo label="my label" /> produces <h3 title="my label">my label</h3>. That wouldn’t work if tags were allowed within attributes. (Unescaping, strip_tags etc is just a hacky stopgap).
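A minimal sketch of that title idea, in Python for illustration. The function label_with_title is hypothetical and assumes the label arrives as raw text, so each output context can apply its own escaping:

```python
from html import escape


def label_with_title(label, labeltag="h3"):
    """Hypothetical helper: emit the same raw label as a title attribute and as content."""
    # Attribute context: quotes must be escaped as well as &, < and >.
    attr = escape(label, quote=True)
    # Element content: &, < and > are enough.
    text = escape(label, quote=False)
    return f'<{labeltag} title="{attr}">{text}</{labeltag}>'


if __name__ == "__main__":
    print(label_with_title("my label"))
    print(label_with_title('say "hi" & bye'))
```

If the label were allowed to carry markup such as <em>, the attribute copy would need the tags stripped first, which is the stopgap mentioned above.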
For things like <txp:sometag label="some <em>bold</em> label" />, I’m almost tempted to suggest <txp:sometag label="some *bold* label" /> as a possible future syntax.
ruud: are you saying we shouldn’t add escaping to the doLabel() function in the 4.0.x branch?
Last edited by zem (2007-04-13 08:18:57)
Alex
Re: attribute values in TXP tags: html escaped or not?
zem wrote:
For things like <txp:sometag label="some <em>bold</em> label" />, I’m almost tempted to suggest <txp:sometag label="some *bold* label" /> as a possible future syntax.
One way or the other, you’ll need escaping, to escape either from HTML or from Textile.
Re: attribute values in TXP tags: html escaped or not?
zem wrote:
Generally speaking I think the data being passed around Textpattern taghandlers and library functions, stored in the database etc, should be raw utf-8 — not HTML code, entity-escaped text or some other special encoding. Escaping should be “late” and dynamic — it happens at the last minute, just before the output is sent to the browser, and the tag (or output functions) ought to decide what type of escaping to apply
My thoughts exactly.
If the label attribute value is supposed to be plain text without html tags for markup purposes, then I think the doLabel() function should use htmlspecialchars or escape_output, so <txp:tag label="me & you" /> would send <label>me &amp; you</label> to the browser.
But Mary wrote:
Considering that most tags are meant to be used in pages and forms, you need to manually escape other entities you enter in your page/form anyway. That makes me think that you should have to do the escaping yourself.
Two developers, two opinions… pick one :)
Let me explain what triggered me to ask this question. I was looking at the zem_contact_reborn plugin code. It uses the label attribute value both in the HTML contact form and in the plain text email. There isn’t supposed to be HTML markup in the label, but ampersands and > < characters can occur. Obviously, in the email I don’t want any HTML escaped text, but when showing a label in the contact form it should be properly HTML escaped:
- If the user html-escapes the attribute values, I have to unescape for use in email
- If no escaping is done by the user, I have to html-escape the label when sending it to the browser.
Since there doesn’t seem to be a consensus on which approach to use, I’ll go for the second approach (no escaping done by the user).
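Sketched in Python for illustration, the second approach amounts to keeping one raw label and escaping only on the HTML side. The names form_field and email_line are hypothetical and are not taken from the zem_contact_reborn code:

```python
from html import escape


def form_field(label, name):
    """HTML context: escape the raw label just before it goes to the browser."""
    return f'<label for="{escape(name)}">{escape(label, quote=False)}</label>'


def email_line(label, value):
    """Plain text context: the raw label is used untouched."""
    return f"{label}: {value}"


if __name__ == "__main__":
    raw_label = "Questions & remarks"
    print(form_field(raw_label, "remarks"))
    print(email_line(raw_label, "none so far"))
```

With the first approach the email side would instead have to unescape the label, and a missed or repeated unescape would show up as stray entities in the message.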
Perhaps it’s too late to set guidelines/rules in the TXP 4.0.x series for escaping and for what types of data (text/html/textile) to use where. In any case, I think it would be a welcome improvement for the 4.1.x series. Not just for attribute values, but also for whatever is stored in places like custom fields and link/image descriptions and the article title.
This would not only save the user from having to experiment to get the desired result, but is also valuable when checking the code for missing/duplicate/wrong escaping (not knowing what the right way is, it is hard to detect bugs).
Last edited by ruud (2007-04-14 18:52:53)