Textpattern CMS support forum
Google and File Download
Two strange things have been happening for some time. Let’s investigate.
First example:
http://shadowrun.fr/file_download/10 In Sitemap Web General HTTP error Feb 22
This is an example of what I get from Google’s sitemap.
Second example:
2/25 9:37 am crawl-66-249-71-70.googlebot.com file_download/12#aborted-at-1%
This is from the TXP logs.
As far as I can tell, Googlebot doesn’t like the way file downloading is handled by TXP or PHP. Does anyone have a clue why?
(Edit: assumed resolved and marked as such. :) -Mary)
Re: Google and File Download
Google doesn’t index binary data. IMHO it will request the file, and then when it sees that it is sent as “application/octet-stream” it will discard the request. The HTTP error might be explained (I am just guessing) by the MIME type of the response not matching what Googlebot sends in the Accept header of its HTTP request.
Not sure how to go about proving/checking that this is what is actually happening. So if anybody has any alternative theories…. :)
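One way to start checking this guess is to request a download URL yourself and compare the Content-Type that comes back with an Accept header. Below is a minimal Python sketch. The function names `accepts` and `fetch_content_type` are made up for illustration, the matching is simplified (it ignores `q` values, including `q=0`), and the Accept header Googlebot actually sends is not known from this thread, so you would have to take it from your own server logs.

```python
from urllib.request import Request, urlopen


def accepts(accept_header: str, content_type: str) -> bool:
    """Return True if content_type matches a media range in accept_header.

    Simplified on purpose: quality values (q=...) are ignored.
    """
    if not accept_header.strip():
        # No Accept header means the client accepts any media type.
        return True
    ctype = content_type.split(";")[0].strip().lower()
    main, _, sub = ctype.partition("/")
    for item in accept_header.split(","):
        media_range = item.split(";")[0].strip().lower()
        if not media_range:
            continue
        r_main, _, r_sub = media_range.partition("/")
        if r_main in ("*", main) and r_sub in ("*", sub):
            return True
    return False


def fetch_content_type(url: str, accept: str = "*/*",
                       user_agent: str = "header-check-sketch"):
    """Request url and return (status, Content-Type) without reading the body.

    Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    The user_agent default is a placeholder, not Googlebot's real string.
    """
    req = Request(url, headers={"Accept": accept, "User-Agent": user_agent})
    with urlopen(req, timeout=10) as resp:
        return resp.status, resp.headers.get("Content-Type", "")
```

For example, `fetch_content_type("http://shadowrun.fr/file_download/10")` would show what the server sends for that file, and `accepts(...)` can then be run against whatever Accept header shows up in the logs. This only tests the header-mismatch theory; it says nothing about what Googlebot does internally.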
Re: Google and File Download
One thing though: most of the files on this site are PDFs, and Google should be able to “read” them.
Re: Google and File Download
I can only guess, since I don’t know how the Googlebots work, but it’s possible that they decide what to do based on the MIME type. Since we want the files to be downloaded, we send application/octet-stream, which means Google wouldn’t know that it is a PDF file unless it downloaded and parsed the file, which it won’t do if it decides based on the MIME type. I guess in the future we could think about configurable MIME types for downloads.
Have you tried asking on one of the Google Groups, or asking Google directly?