Google and File Download

Jeremie · 2006-02-26 00:56:29

Two strange things have been happening for some time. Let’s investigate.

First example:

http://shadowrun.fr/file_download/10 | In Sitemap | Web | General HTTP error | Feb 22

This is an example of what Google Sitemaps reports for my site.

Second example:

2/25 9:37 am | crawl-66-249-71-70.googlebot.com | file_download/12#aborted-at-1%

This is from the TXP logs.

As far as I can tell, Googlebot doesn’t like the way file downloads are handled by TXP or PHP. Does anyone have a clue why?

(Edit: assumed resolved and marked as such. :) -Mary)

Sencer · 2006-02-26 07:49:34

Google doesn’t index binary data. IMHO it will request the file, and then, when it sees that the file is sent as “application/octet-stream”, it will discard the request. The HTTP error might be explained (I am just guessing) by the MIME type of the response not matching what Googlebot sends in the Accept header of its HTTP request.

Not sure how to go about proving or checking that this is what is actually happening. So if anybody has any alternative theories… :)
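To make the Accept-header guess concrete, here is a minimal Python sketch of how a client’s Accept header can be matched against a response’s Content-Type, following the HTTP media-range rules in simplified form (the most specific matching range decides the quality value; media-type parameters other than q are ignored). The function name `acceptable_quality` and the example header value are hypothetical: nobody in this thread knows what Accept header Googlebot actually sends.

```python
def acceptable_quality(accept_header: str, content_type: str) -> float:
    """Return the q-value a client's Accept header assigns to content_type.

    0.0 means the client does not accept that type. Simplified: parameters
    other than q (e.g. charset, level) are ignored.
    """
    if not accept_header.strip():
        return 1.0  # no Accept header at all: any type is acceptable

    # Strip parameters such as "; charset=utf-8" from the response type.
    ctype = content_type.split(";")[0].strip().lower()
    main, _, sub = ctype.partition("/")

    best_specificity = -1
    best_q = 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue

        # Read the q parameter, defaulting to 1.0.
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0

        # Rank how specifically this range matches the response type.
        r_main, _, r_sub = media_range.partition("/")
        if media_range == "*/*":
            specificity = 0
        elif r_main == main and r_sub == "*":
            specificity = 1
        elif r_main == main and r_sub == sub:
            specificity = 2
        else:
            continue  # range does not match this type at all

        # The most specific matching range wins, even if its q is 0.
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q

    return best_q


if __name__ == "__main__":
    # Hypothetical crawler header that lists PDF but not octet-stream.
    header = "text/html,text/plain,application/pdf;q=0.9"
    print(acceptable_quality(header, "application/octet-stream"))
    print(acceptable_quality(header, "application/pdf"))
```

If a crawler sent a header like the hypothetical one above, a file served as application/octet-stream would score 0.0 while the same file served as application/pdf would score 0.9, which is the kind of mismatch the theory describes.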

Jeremie · 2006-02-26 20:52:39

One thing, though: most of the files on this site are PDFs, and Google should be able to “read” them.

Sencer · 2006-02-27 08:37:54

I can only guess, since I don’t know how Googlebot works, but it’s possible that it decides what to do based on the MIME type. Since we want the files to be downloaded, we send application/octet-stream, which means Google wouldn’t know that a file is a PDF unless it downloaded and parsed it, and it won’t do that if it decides based on the MIME type. In the future we could think about configurable MIME types for downloads.
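One way configurable MIME types could work is sketched below in Python. Textpattern itself is PHP, and this is not its code. The idea: guess the real type from the file extension, allow per-site overrides, fall back to application/octet-stream for unknown extensions, and keep `Content-Disposition: attachment` so browsers still offer a download instead of displaying the file. The function `download_headers` and the `overrides` mapping are hypothetical, and whether Googlebot would then index the PDFs is an assumption, not something established here.

```python
import mimetypes
import os

# Fallback when the extension is unknown (what is sent for every file today).
DEFAULT_TYPE = "application/octet-stream"


def download_headers(filename, size, overrides=None):
    """Build HTTP response headers for a file download (hypothetical helper).

    overrides: optional dict mapping lowercase extensions (".pdf") to MIME
    types, standing in for a per-site "configurable MIME types" setting.
    """
    ext = os.path.splitext(filename)[1].lower()
    if overrides and ext in overrides:
        ctype = overrides[ext]
    else:
        # guess_type returns (type, encoding); type is None if unknown.
        guessed, _encoding = mimetypes.guess_type(filename)
        ctype = guessed or DEFAULT_TYPE

    # Drop any path and double quotes so the header value stays well-formed.
    safe_name = os.path.basename(filename).replace('"', "")
    return {
        "Content-Type": ctype,
        # "attachment" still asks browsers to save the file.
        "Content-Disposition": f'attachment; filename="{safe_name}"',
        "Content-Length": str(size),
    }


if __name__ == "__main__":
    for name, value in download_headers("manual.pdf", 1234).items():
        print(f"{name}: {value}")
```

The design choice is to separate the two jobs application/octet-stream was doing: Content-Type describes what the file is, and Content-Disposition tells the browser to download it.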

Have you tried asking in one of the Google Groups or asking Google directly?

Textpattern CMS

Textpattern CMS support forum
