Textpattern CMS support forum

#1 2006-02-26 00:56:29

Jeremie
Member
From: Provence, France
Registered: 2004-08-11
Posts: 1,578

Google and File Download

Two strange things have been happening for some time. Let’s investigate.

First example:

http://shadowrun.fr/file_download/10  In Sitemap   	 Web  	  General HTTP error  	 Feb 22

This is an example of what I get from Google’s sitemap.

Second example:

2/25 9:37 am  	crawl-66-249-71-70.googlebot.com  	file_download/12#aborted-at-1%

This is from the TXP logs.

As far as I can tell, Googlebot doesn’t like the way file downloading is handled by TXP or PHP. Does anyone have a clue why?

(Edit: assumed resolved and marked as such. :) -Mary)

#2 2006-02-26 07:49:34

Sencer
Archived Developer
From: cgn, de
Registered: 2004-03-23
Posts: 1,803

Re: Google and File Download

Google doesn’t index binary data. IMHO it will request the file, and then when it sees that it is sent as “application/octet-stream” it will discard the response. The HTTP error might be explained (I am just guessing) by the MIME type of the response not matching what Googlebot sends in the Accept header of its HTTP request.

Not sure how to go about proving/checking that this is what is actually happening. So if anybody has any alternative theories…. :)
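
One way to at least reason about the Accept-header theory offline is a small matcher like the one below. It is only a sketch in Python (not TXP’s PHP): the function name `media_type_accepted` is made up, and the Accept values in the usage lines are hypothetical, since I don’t know what Googlebot actually sends. It checks whether a response’s Content-Type falls inside any media range of an Accept header, with the most specific matching range deciding.

```python
def media_type_accepted(content_type: str, accept_header: str) -> bool:
    """Return True if content_type is acceptable under accept_header.

    Hypothetical helper: the most specific matching media range wins,
    and a q value of 0 means "not acceptable".
    """
    # An absent or empty Accept header means any media type is acceptable.
    if not accept_header.strip():
        return True

    # Drop parameters such as "; charset=utf-8" from the response type.
    ctype = content_type.split(";", 1)[0].strip().lower()
    if "/" not in ctype:
        return False
    main, sub = ctype.split("/", 1)

    best_specificity = -1
    best_q = None
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if "/" not in media_range:
            continue

        # Quality defaults to 1.0 when no q parameter is given.
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0

        r_main, r_sub = media_range.split("/", 1)
        if r_main == "*" and r_sub == "*":
            specificity = 0      # matches anything
        elif r_main == main and r_sub == "*":
            specificity = 1      # matches the top-level type
        elif r_main == main and r_sub == sub:
            specificity = 2      # exact match
        else:
            continue

        if specificity > best_specificity:
            best_specificity, best_q = specificity, q

    return best_q is not None and best_q > 0


# Hypothetical Accept headers, for illustration only.
print(media_type_accepted("application/octet-stream", "text/html,application/pdf"))
print(media_type_accepted("application/pdf", "text/html,application/pdf"))
```

If the real Accept header (taken from the server’s access log, for instance) turned out to contain `*/*`, this theory would be ruled out, because every response type would be acceptable.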

#3 2006-02-26 20:52:39

Jeremie
Member
From: Provence, France
Registered: 2004-08-11
Posts: 1,578

Re: Google and File Download

One thing though: most of the files on this site are PDFs, and Google should be able to “read” them.

#4 2006-02-27 08:37:54

Sencer
Archived Developer
From: cgn, de
Registered: 2004-03-23
Posts: 1,803

Re: Google and File Download

I can only guess, since I don’t know how the Googlebots work, but it’s possible that they decide what to do based on the MIME type. Since we want the files to be downloaded, we send application/octet-stream. That means Google wouldn’t know that it is a PDF file unless they downloaded and parsed the file, which they won’t do if they decide based on the MIME type. I guess in the future we could think about configurable MIME types for downloads.
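
To make the “configurable MIME types” idea concrete, here is a rough sketch in Python (again, not TXP’s PHP, and not how TXP currently works). The function `download_headers` and the `mime_map` setting are hypothetical names. The idea is to look up the type by file extension, fall back to application/octet-stream for anything unlisted, and still force a download through `Content-Disposition: attachment`, so the browser behaviour stays the same while the real type is announced. Whether Googlebot would then index the files is untested.

```python
import os

# Fallback used today for every download.
DEFAULT_TYPE = "application/octet-stream"


def download_headers(filename: str, mime_map: dict) -> dict:
    """Build response headers for a file download.

    mime_map is a hypothetical site setting mapping lowercase
    extensions (".pdf") to MIME types.
    """
    base = os.path.basename(filename)
    ext = os.path.splitext(base)[1].lower()

    # Use the configured type if there is one, else the generic fallback.
    content_type = mime_map.get(ext, DEFAULT_TYPE)

    return {
        "Content-Type": content_type,
        # "attachment" asks the browser to save the file rather than
        # display it. No escaping of the file name is done here.
        "Content-Disposition": 'attachment; filename="%s"' % base,
    }


# Example configuration; the entries are illustrative.
site_mime_map = {".pdf": "application/pdf", ".zip": "application/zip"}
print(download_headers("/files/rules.pdf", site_mime_map))
print(download_headers("/files/archive.bin", site_mime_map))
```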

Have you tried asking on one of the Google-Groups or Google directly?
