Textpattern CMS support forum
#1 2005-11-07 18:54:45
- Andrew
- Plugin Author
- Registered: 2004-02-23
- Posts: 730
search engines and /?commented=1
It looks like Google and MSN (Yahoo’s always a bit behind) are now slowly indexing URLs of the form domain.tld/article/nnn/url-only-title/?commented=1 for all Textpattern sites. It’s a little annoying and I’m not sure there’s a way to avoid this… just something I noticed.
It’s bad because users are entering the article page with no comment form displayed and the text “Thank you for adding your comment.” Obviously, they’ve not yet commented and it’s not at all intuitive to the average user what is going on; they’ll likely just leave without commenting. Perhaps add a referrer check to only display if you’re coming from the same site?
For instance, something like this? (not heavily tested)
Last edited by Andrew (2005-11-07 19:09:30)
#2 2005-11-07 21:03:07
- zem
- Developer Emeritus
- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: search engines and /?commented=1
It looks like Google and MSN (Yahoo’s always a bit behind) are now slowly indexing urls for domain.tld/article/nnn/url-only-title/?commented=1 for all Textpattern sites.
That’s rather odd, given that such a URL should only be discovered after a POST. Which ought to mean that spiders never see it.
When you say “are now slowly indexing”, do you mean that you’re seeing spiders hit those pages, or that you’re seeing them show up in search results?
Alex
#3 2005-11-07 21:17:44
- Andrew
- Plugin Author
- Registered: 2004-02-23
- Posts: 730
Re: search engines and /?commented=1
It’s actually possible in more than one way:
- User posts comment, clicks on external site link (carrying the ?commented=1 referrer to a public log)
- Site displays AdSense or YPN or something. Search engine bots will typically use ad views as an opportunity to discover “new” urls, as can be seen here:
11/6 11:19:43 pm crawl-66-249-72-111.googlebot.com article/1399/?commented=1
11/6 11:19:41 pm xxx.xxx.xx.xx.some-isp.net article/1399/?commented=1
11/6 11:19:41 pm xxx.xxx.xx.xx.some-isp.net article/1399/
11/6 9:12:59 am crawl-66-249-72-111.googlebot.com article/1399/?commented=1
11/6 9:12:57 am xxx.xxx.xx.xx.some-isp.net article/1399/?commented=1
11/6 9:12:56 am xxx.xxx.xx.xx.some-isp.net article/1399/
11/5 10:06:55 am crawl-66-249-72-111.googlebot.com article/1007/
11/5 10:06:53 am xxx.xxx.xx.xx.some-isp.net article/1007/
11/5 10:05:16 am crawl-66-249-72-111.googlebot.com article/828/
11/5 10:05:14 am xxx.xxx.xx.xx.some-isp.net article/828/
Notice how googlebot hits the url immediately after the user loads the page containing the ad unit? Bots typically crawl a new url immediately after a visitor views a url — any url (not good, but the truth) — so it becomes very easy for these urls to be discovered.
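The pattern in the log excerpt can be checked mechanically. Below is a minimal sketch in Python (not part of Textpattern) that pairs each googlebot hit with a visitor request for the same URL a few seconds earlier. The function names, the 5-second window, and the year 2005 (the log omits it) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Lines copied from the log excerpt above (newest first).
LOG = """\
11/6 11:19:43 pm crawl-66-249-72-111.googlebot.com article/1399/?commented=1
11/6 11:19:41 pm xxx.xxx.xx.xx.some-isp.net article/1399/?commented=1
11/6 11:19:41 pm xxx.xxx.xx.xx.some-isp.net article/1399/
11/6 9:12:59 am crawl-66-249-72-111.googlebot.com article/1399/?commented=1
11/6 9:12:57 am xxx.xxx.xx.xx.some-isp.net article/1399/?commented=1
11/6 9:12:56 am xxx.xxx.xx.xx.some-isp.net article/1399/
11/5 10:06:55 am crawl-66-249-72-111.googlebot.com article/1007/
11/5 10:06:53 am xxx.xxx.xx.xx.some-isp.net article/1007/
11/5 10:05:16 am crawl-66-249-72-111.googlebot.com article/828/
11/5 10:05:14 am xxx.xxx.xx.xx.some-isp.net article/828/
"""


def parse(line, year=2005):
    """Split one log line into (timestamp, host, path).

    The log omits the year; 2005 is an assumption taken from the thread date.
    """
    date, clock, ampm, host, path = line.split()
    stamp = datetime.strptime(f"{year} {date} {clock} {ampm}",
                              "%Y %m/%d %I:%M:%S %p")
    return stamp, host, path


def bot_follow_ups(log_text, window=timedelta(seconds=5)):
    """Return (path, delay_in_seconds) for each bot hit that trails a visitor hit."""
    hits = sorted((parse(l) for l in log_text.splitlines() if l.strip()),
                  key=lambda hit: hit[0])
    visitors = [h for h in hits if "googlebot" not in h[1]]
    follow_ups = []
    for stamp, host, path in hits:
        if "googlebot" not in host:
            continue
        # Look for a visitor request for the same path shortly before the bot.
        for v_stamp, _, v_path in visitors:
            delay = stamp - v_stamp
            if v_path == path and timedelta(0) <= delay <= window:
                follow_ups.append((path, int(delay.total_seconds())))
                break
    return follow_ups


for path, delay in bot_follow_ups(LOG):
    print(f"{path}: bot arrived {delay}s after a visitor")
```

On this excerpt, all four googlebot hits trail a visitor request for the same URL by two seconds.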
#4 2005-11-08 20:22:27
- Andrew
- Plugin Author
- Registered: 2004-02-23
- Posts: 730
Re: search engines and /?commented=1
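# Only act when the query string begins with commented=1
RewriteCond %{QUERY_STRING} ^commented=1
# ...and the request was not referred from this site
RewriteCond %{HTTP_REFERER} !domain\.tld
# Redirect to the same path with the query string stripped
RewriteRule ^(.*)$ /$1? [R=301,L]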
This also seems to work in Apache, but I’m more comfortable with an in-application solution.
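For comparison, the same condition could be checked inside the application. This is a minimal sketch in Python rather than Textpattern's PHP; `show_thanks` and its parameters are hypothetical names, not Textpattern functions, and `domain.tld` is the placeholder host used in the rules above.

```python
from urllib.parse import parse_qs, urlparse


def show_thanks(query_string, referer, site_host="domain.tld"):
    """Decide whether to display the thank-you message.

    Returns True only when ?commented=1 is present AND the request
    was referred from a page on the same host.
    """
    flagged = parse_qs(query_string).get("commented") == ["1"]
    if not flagged:
        return False
    # A missing or empty Referer header yields hostname None.
    referer_host = urlparse(referer or "").hostname
    return referer_host == site_host


print(show_thanks("commented=1", "http://domain.tld/article/1399/"))
print(show_thanks("commented=1", "http://www.google.com/search?q=x"))
```

Unlike the rewrite rules, this sketch compares the referrer's host exactly instead of searching for the domain anywhere in the header, and it hides the message whenever no referrer is sent.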
Re: search engines and /?commented=1
Referrers are not reliable information; many “firewalls” and similar applications filter out referrers or (worse) replace them with other arbitrary strings.
When posting the comment we do a Post-Redirect-Get, which is the recommended way to do it. So the only way to pass information to the final landing page (that a comment was just entered) is via the url.
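To illustrate the Post-Redirect-Get step, here is a minimal Python sketch of the response that might follow a successful comment POST. It is not Textpattern's code: `after_comment_saved` is a hypothetical name, and the 303 status is an assumption, since the thread does not say which redirect status Textpattern sends.

```python
def after_comment_saved(article_url):
    """Build the redirect that follows a successful comment POST."""
    separator = "&" if "?" in article_url else "?"
    location = f"{article_url}{separator}commented=1"
    # 303 See Other makes the browser issue a GET for the new location,
    # so reloading the landing page does not resubmit the form.
    return 303, {"Location": location}


status, headers = after_comment_saved("/article/1399/url-only-title/")
print(status, headers["Location"])
```

The landing page is fetched with a plain GET, so the only state it receives is whatever the `Location` URL carries, which is why the flag ends up in the query string.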
I guess an in-between “splash page” (what you see in this forum after posting) is the solution then.
Are there any better suggestions?
#6 2005-11-08 22:51:08
- Andrew
- Plugin Author
- Registered: 2004-02-23
- Posts: 730
Re: search engines and /?commented=1
Hm, yeah… agreed on the unreliable referrer thing. I’m not sure about the splash page thing; I guess I’d have to see it in action. Let me post and see whatcha mean (silly I’ve never noticed here before).
#7 2005-11-08 22:57:05
- Andrew
- Plugin Author
- Registered: 2004-02-23
- Posts: 730
Re: search engines and /?commented=1
Ok, so just a splash page with
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="refresh" content="1;URL=/wherever/we're/going/" />
I could live with that; although, I’m not sure about the other users… I anticipate the need for a user pref. To me, it’s important to maintain the control of which urls are indexed by search engines. The splash page would facilitate this, while sacrificing little in usability.
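A splash page along those lines could be generated as in the following Python sketch. This is an illustration, not Textpattern code: `splash_page`, its parameters, and the "Continue" link are hypothetical, and the default message is the one quoted earlier in the thread.

```python
from html import escape


def splash_page(target_url, message="Thank you for adding your comment.", delay=1):
    """Return a minimal HTML page that shows feedback, then forwards the visitor."""
    # Escape both values so quotes in the URL or message cannot break the markup.
    safe_url = escape(target_url, quote=True)
    safe_message = escape(message)
    return (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n'
        f'<meta http-equiv="refresh" content="{delay};URL={safe_url}" />\n'
        "</head><body>\n"
        f"<p>{safe_message}</p>\n"
        # Fallback link for browsers that ignore the refresh.
        f'<p><a href="{safe_url}">Continue to the article</a></p>\n'
        "</body></html>"
    )


print(splash_page("/article/1399/url-only-title/"))
```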
Last edited by Andrew (2005-11-08 22:59:46)
Re: search engines and /?commented=1
The purpose of the splash page would be to give the user feedback on what happened (a thank-you message, a note that the comment is in the moderation queue, etc.), because once we redirect back to the canonical url there is no way to display feedback “in-page”.
#9 2005-11-08 23:45:31
- Andrew
- Plugin Author
- Registered: 2004-02-23
- Posts: 730
Re: search engines and /?commented=1
While it’s not good to have those urls floating around, I’m doubtful the average Textpattern user would be happy with a preview and a splash page during the commenting process. I’ve been trying to come up with another solution, but have still come up empty-handed.
#10 2005-11-08 23:50:07
- Andrew
- Plugin Author
- Registered: 2004-02-23
- Posts: 730
Re: search engines and /?commented=1
What if the user had the ability to override the splash from within comment_form via some tag like <txp:comment_redirect url="" />? This seems possible if it generates a hidden field, which saveComment() could pick up, yes?
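A rough Python sketch of that idea follows; it is not Textpattern's PHP and not the real saveComment(). The field name `comment_redirect`, the splash URL `/comment-received/`, and both functions are hypothetical. The restriction to site-relative paths is an added assumption, so that a tampered field cannot send commenters to another site.

```python
from html import escape


def comment_redirect_tag(url=""):
    """Render the hidden field the proposed tag might emit inside the form."""
    return (f'<input type="hidden" name="comment_redirect" '
            f'value="{escape(url, quote=True)}" />')


def landing_url(form, splash_url="/comment-received/"):
    """Choose where to send the commenter after the comment is stored."""
    target = form.get("comment_redirect", "").strip()
    # Accept only site-relative paths so the field cannot redirect off-site.
    if target.startswith("/") and not target.startswith("//"):
        return target
    return splash_url


print(comment_redirect_tag("/article/1399/url-only-title/"))
print(landing_url({"comment_redirect": "/article/1399/url-only-title/"}))
print(landing_url({}))
```

With an empty or missing field the handler falls back to the splash page, so sites that do not use the tag would keep the default behaviour.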
Re: search engines and /?commented=1
The problem is not offering options. It’s just that none are really satisfactory, no matter how you look at it.
- Current solution: after submitting, you go directly to the article page, which contains the user feedback. Drawback: the url has ?commented=1.
- Redirect straight to the canonical url, so that search engines and users who want to link are not confused into linking to the wrong url. Drawback: no user feedback (especially annoying when moderation is on).
- Redirect to a splash page, which contains the user feedback but is otherwise uninteresting (users won’t link, search engines won’t see it). Drawback: yet another page load.
And I am sure it won’t be long until some users request an option for comments to be submitted via Ajax trickery… ;)
#12 2005-11-09 15:53:17
- Andrew
- Plugin Author
- Registered: 2004-02-23
- Posts: 730
Re: search engines and /?commented=1
Yes, my thoughts exactly. In that case, I guess I’d have to say just leave it be for now since it’s not a widespread problem and I feel that the splash page would likely cause some dissent.