Textpattern CMS support forum

#1 2008-08-14 00:27:36

kkobashi
Member
Registered: 2008-01-27
Posts: 51
Website

Lack of section/category index page

So I type in: http://www.mysite.com/category

and up comes a 404 error. Why was this decision made in the core?

The same goes for section.

Further, suppose I have a section named ‘articles’.

http://www.mysite.com/articles
http://www.mysite.com/section/articles

are the same thing! There are duplicate content problems right there.

So you go and supply a sitemap to Google, Yahoo, wherever, and they get an idea of what nodes are in the application. They traverse the tree starting at the root and bang, hit the category or section index page, get a 404 and say “screw it! We aren’t traversing any further.”

Why aren’t there category and section index pages? Author has one. Why not treat section and category the same way to avoid this sort of potential problem? SEO-wise, this could kill a site in the search engines. Has anyone run into this problem on their Textpattern websites? How do your sites rank in the Google/Yahoo indexes? Are all your categories being spidered and included in the search index?

Sorry, but I have concerns about Textpattern here. Textpattern has been through how many versions and major fundamental issues like these have not been found or fixed?

Last edited by kkobashi (2008-08-14 01:36:51)


Kerry Kobashi
Kobashi Computing

#2 2008-08-14 23:40:42

kkobashi
Member
Registered: 2008-01-27
Posts: 51
Website

Re: Lack of section/category index page

Is this a red herring or not?


#3 2008-08-15 00:15:04

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,453
Website GitHub

Re: Lack of section/category index page

kkobashi wrote:

I type in: http://www.mysite.com/category and up comes a 404 error. Why is this decision made in the core?

I’d guess because categories aren’t real things. There’s no such category as '' so it rightly gives a 404. Same as file_download; it’s a virtual URL. If you gave site.com/file_download without a file you’d get a 404 because the resource was missing. It’s the same with categories.

Further, suppose I have a section named ‘articles’… There’s duplicate content problems right there.

Not necessarily. Why would you use both types of URL in a site? Surely you’d choose one or the other URL scheme and stick to it? A search engine can’t magically sniff out stuff it doesn’t know is there: it collects all links on a page and visits each one. If you use both types of link in your site the bot might think you have duplicate content; if you don’t, it won’t be any the wiser.

So you go and supply a sitemap… They tree traverse starting at the root and bang, hit the category or section index page. Get a 404 and say “screw it! we aren’t traversing any farther”

Will they? News to me. I was under the impression they won’t visit a link if you don’t have any links that go to it. For starters it can’t just “decide” to try the word “category” or “section” without you telling it; otherwise, why would it not pick the word “aardvark” or “trombone” instead? It’d be there a long while guessing words from a dictionary in the hope it might find some content worth indexing.

If you have made links in your site that go to site.com/category or site.com/section when you know there’s no content there, you would expect a 404! Besides, I think I read somewhere (can’t find it now, so I may have imagined it) that you shouldn’t let spiders index list pages; spiders should only index articles and content or they’ll see a page of links and then find the article again and think you have duplicate content. So in this case, the apparent “lack” of ‘category’ or ‘section’ index pages could be argued to help your SEO! Like I say, I might be wrong here.

Has anyone run into this problem on their Textpattern websites?

Not noticed it. My sites index fine. Not sure how I’d prove each category is being indexed, since I’d have to choose a search term that matched an article in that category and that ranked high enough in the Google results to see if it had indexed that particular category(!) But I don’t think it matters: as long as it indexes the article and its (relevant) content, why do I care if it indexes the category list page or not? I’d prefer it didn’t, since the category index page conveys no (well, little) information about the articles in my site.

I have concerns about Textpattern here. Textpattern has been through how many versions and major fundamental issues like these have not been found or fixed?

If they were fundamental issues they would have been fixed. I’m not being all “TXP is God” or anything — it has minor flaws like every system, that are being addressed — but unless someone can convince me otherwise I’m not sure these are true problems that affect a site’s searchability or ranking/importance. Getting inbound links from other relevant sites is the most important way you can help boost your ranking.

Last edited by Bloke (2008-08-15 00:20:22)


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

#4 2008-08-15 06:17:08

kkobashi
Member
Registered: 2008-01-27
Posts: 51
Website

Re: Lack of section/category index page

“Why would you use both types of URL in a site?” “A search engine can’t magically sniff out stuff it doesn’t know is there: it collects all links on a page”

And why do the links only have to come from your site? Ever thought about inbound links to both /section/whatever and /whatever from other sites? What if the pages that point to those URIs were created by competitors who want to exploit your duplicate content weakness? So Google comes along, spiders both pages and says “Whoa! Duplicate content penalty”. It’s a serious flaw and should be fixed!

“I’d guess because categories aren’t real things.”

By definition, URIs uniquely identify an abstract or physical resource/object on the site (see http://www.ietf.org/rfc/rfc3986.txt).

How does an empty category uniquely identify itself when it throws up a 404 “not found”? There are many things that are “not found”. What’s so unique about that? And what does that say about its descendants? Categories under Textpattern are URIs. Are they just for cosmetic looks? The URI exists and there should be no 404s along the path to get to the resource object. Return an empty page if you have to! But can I? No. Because Textpattern hard codes the 404. Also, category pages can be created to list the articles associated with them. They don’t have to be one-way links to articles. It can go the other way too.

“Not sure how I’d prove each category is being indexed”

Is a category under Textpattern not a URI (http://www.site.com/category/foobar)? Is foobar a resource/object on the server? It had better be, or it’s a 404. Why would this be any different from an article page? Look in Google’s cache. Review your logs to see who hit it. There are plenty of ways to know if a bot munched on it. And if it did, and it is not showing up in the Google index, then you have a problem. Is this due to a flaw in architecture or because you don’t have any links pointing at it?
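
For what it’s worth, the “review your logs” step could look something like the sketch below. It is Python rather than anything Textpattern-specific, and it assumes the server writes the common “combined” access log format. The sample lines, IP addresses and the `crawler_hits` name are made up for illustration.

```python
import re
from collections import Counter

# Pulls the request path, status code and user-agent out of a
# combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def crawler_hits(lines, prefix="/category/", bot="Googlebot"):
    """Count (path, status) pairs requested by `bot` under `prefix`."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and bot in m.group("agent") and m.group("path").startswith(prefix):
            hits[(m.group("path"), m.group("status"))] += 1
    return hits

# Hypothetical sample lines; real ones come from your own web server log.
SAMPLE = [
    '66.249.0.1 - - [15/Aug/2008:06:00:00 +0000] "GET /category/toys HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.0.1 - - [15/Aug/2008:06:00:05 +0000] "GET /category/ HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.7 - - [15/Aug/2008:06:01:00 +0000] "GET /category/toys HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux)"',
]

if __name__ == "__main__":
    for (path, status), count in sorted(crawler_hits(SAMPLE).items()):
        print(path, status, count)
```

A crawler that requested a category URL and got a 200 but never shows it in the index points to a different problem than one that only ever got 404s, so keeping the status code in the tally is the useful part.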

“why do I care if it indexes the category list page or not? I’d prefer it didn’t since the category index page conveys no (well, little) information about the articles in my site”

Site Categories
---------------
Toys article 1 article 2 … (more …)

Games article 1 article 2 … (more …)

and so on

Isn’t that more useful to a user than a 404 page?

One builds websites for users. Why not accommodate them by showing something useful, like links to all the categories, instead of a bloody useless 404? How about a tag cloud? It can serve as a useful category index page. It’s another way for users to find things. Isn’t that what a user is doing when they visit a site? Finding things? Those friendly URLs were made with a purpose: to help users remember things and navigate. Why go through all the hassle of mod_rewrite and tweaking URIs if you are going to put up a 404?!

Ok. I’m done with this thread. I appreciate the feedback.

Last edited by kkobashi (2008-08-15 06:29:20)


#5 2008-08-15 07:28:10

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,453
Website GitHub

Re: Lack of section/category index page

Fair points, well made.

Honest answer then: I don’t know. I’ll leave it to people more knowledgeable than me to find the answers you require.


#6 2008-08-15 10:11:23

Gocom
Developer Emeritus
From: Helsinki, Finland
Registered: 2006-07-14
Posts: 4,533
Website

Re: Lack of section/category index page

  1. People can just link to made-up addresses.
  2. Of course you can block those with robots.txt or redirects, as I do (btw, I block all category etc. lists that can cause duplicate content).
  3. Messy URLs? Yep, someone can link to them too. But you can redirect those as well, for example with rah_metas.
  4. Search results and the q parameter? Yep, those too. Again, robots.txt (plus a redirect for users) can block them if you want. For example, I always limit search to one section with redirects (e.g. /lol/?q=bar or /foo/?q=bar => /?q=bar).
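
A minimal sketch of the redirect idea for the /section/articles vs /articles case raised earlier in the thread. It is written in Python as a stand-in for whatever actually does the redirecting (.htaccess, a plugin, the page template). It assumes the short form is the one you want indexed; `canonical_url` and `redirect_for` are made-up names, not Textpattern functions.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url):
    """Collapse /section/<name>[/...] to /<name>[/...]; leave other URLs alone."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # "/section/articles" splits into ["", "section", "articles"]
    if len(segments) >= 3 and segments[1] == "section" and segments[2]:
        path = "/" + "/".join(segments[2:])
        return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
    return url

def redirect_for(url):
    """Return (301, target) when `url` is not the canonical form, else None."""
    target = canonical_url(url)
    return (301, target) if target != url else None

if __name__ == "__main__":
    print(redirect_for("http://www.mysite.com/section/articles"))
    print(redirect_for("http://www.mysite.com/articles"))
```

With a permanent redirect in place, a link to either form ends up at one address, whoever created the link.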

And what does that say about its descendants? Categories under Textpattern are URIs. Are they just for cosmetic looks? The URI exists and there should be no 404s along the path to get to the resource object. Return an empty page if you have to! But can I? No. Because Textpattern hard codes the 404.

You can, by customising the error template that you use (error_default / error_404), for example with conditionals that check the request URI. Or you can redirect the page.

#7 2008-08-17 03:00:59

maniqui
Member
From: Buenos Aires, Argentina
Registered: 2004-10-10
Posts: 3,070
Website

Re: Lack of section/category index page

Gocom wrote:

I block all category etc. lists that can cause duplicate content.

Please, could you explain how to do that? :)

Search results and the q? Yep, those too. Robots.txt (and redirect for users) again can block it if you want.

And that one too, please.

Thanks!


La música ideas portará y siempre continuará

TXP Builders – finely-crafted code, design and txp

#8 2008-08-17 14:01:33

Gocom
Developer Emeritus
From: Helsinki, Finland
Registered: 2006-07-14
Posts: 4,533
Website

Re: Lack of section/category index page

Please, could you explain how to do that? :)

For example, add this to your robots.txt file:

User-agent: *
Disallow: /?c=*
Disallow: /category/
Disallow: /?month=*
Disallow: /?q=*
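
To see which URLs those rules actually cover, here is a rough Python sketch of the matching a wildcard-aware crawler performs: each rule is matched from the start of the path, `*` stands for any run of characters, and a trailing `$` anchors the end. It handles Disallow lines only (no Allow, no longest-match precedence), and `rule_to_regex` and `is_blocked` are illustrative names, not part of any library.

```python
import re

def rule_to_regex(rule):
    """Translate one robots.txt path rule into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

# The Disallow rules quoted above.
DISALLOW = ["/?c=*", "/category/", "/?month=*", "/?q=*"]
RULES = [rule_to_regex(rule) for rule in DISALLOW]

def is_blocked(path_and_query):
    """True if any Disallow rule matches from the start of the path."""
    return any(rule.match(path_and_query) for rule in RULES)

if __name__ == "__main__":
    for url in ["/category/toys", "/?q=bar", "/lol/?q=bar", "/articles/my-post"]:
        print(url, is_blocked(url))
```

Because rules match from the start of the path, `/?q=*` covers `/?q=bar` but not `/lol/?q=bar`, which is one reason to pair the robots.txt rules with a redirect of section-level searches to the site root.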

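And then you could redirect all searches to one place, for example with an .htaccess file or with a few lines of PHP at the top of your page template:

<txp:php>
	// Outside the front page ('default' section), send any search to the site root.
	if(section(array()) != 'default' && gps('q')) {
		// urlencode() stops spaces, '&' or '#' in the query from breaking the URL
		header('Location: '.hu.'?q='.urlencode(gps('q')));
		exit();
	}
</txp:php>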
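
The same redirect rule as a small, testable Python sketch, in case it helps to see the logic apart from the template. `SITE_URL` is an assumed stand-in for Textpattern’s `hu` constant, and `search_redirect` is a hypothetical name.

```python
from urllib.parse import urlencode

SITE_URL = "http://www.mysite.com/"  # assumption: plays the role of `hu`

def search_redirect(section, query):
    """Return the URL to redirect to, or None if no redirect is needed."""
    if section != "default" and query:
        # urlencode escapes spaces, '&', '#' and line breaks in the search term
        return SITE_URL + "?" + urlencode({"q": query})
    return None

if __name__ == "__main__":
    print(search_redirect("lol", "bar"))
    print(search_redirect("default", "bar"))
```

Searches made on the front page are left alone; a search made from any other section is sent to the root with the same term.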

#9 2008-08-17 16:40:44

maniqui
Member
From: Buenos Aires, Argentina
Registered: 2004-10-10
Posts: 3,070
Website

Re: Lack of section/category index page

@Gocom, many thanks.

Regarding the redirection of searches, I will probably redirect them to a /search/ section.

