Textpattern CMS support forum

#1 2012-10-31 22:13:35

etc
Developer
Registered: 2010-11-11
Posts: 5,126
Website GitHub

Why is this page so attractive?

It’s not self-promotion. :) A bunch of IP addresses (mainly registered in China) repeatedly knock at this page of my site. They send a GET, then a POST, and then another GET request, all within a few seconds. They do not look like search robots, since this is the only page they visit, many times a day. These junkies continue even if I set Status to hidden, with only one GET request this time, but more frequently. I am confident in Textpattern and wouldn’t mind, if only they did not pollute the logs.

What attracts them? Is it the MySQL SELECT query in the text? Has anyone experienced the same thing?

Offline

#2 2012-11-01 05:57:23

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,054
Website GitHub Mastodon Twitter

Re: Why is this page so attractive?

My immediate reaction would be that they are attracted to the comment form, but I have the same problem with http://neme-imca.org/publications/ which has no forms in it.

Oleg, I do agree with you that the main issue is the logs.


Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.

Offline

#3 2012-11-01 09:04:04

etc
Developer
Registered: 2010-11-11
Posts: 5,126
Website GitHub

Re: Why is this page so attractive?

Comments are enabled on all pages, so there must be something else they are after. Here is a typical sequence (we care about our visitors’ privacy :)

	01 Nov 2012 09:27:01 	218.93.xxx.xxx 	218.93.xxx.xxx 	projet/etc/index.php?id=6 	  	GET 	200
	01 Nov 2012 09:26:59 	218.93.xxx.xxx 	218.93.xxx.xxx 	projet/etc/index.php?id=6 	  	POST 	200
	01 Nov 2012 09:26:58 	218.93.xxx.xxx 	218.93.xxx.xxx 	projet/etc/index.php?id=6 	  	GET 	200

There is another page (id=3) that contains SELECT ... FROM ... WHERE text; it gets similar hits too, but far fewer. And the third one (id=10) gets none at all.

Amazing: yesterday I redirected some of these IPs back home by REMOTE_ADDR, and today they have started to use HTTP_X_FORWARDED_FOR. Are they humans?

Offline

#4 2012-11-01 10:30:00

etc
Developer
Registered: 2010-11-11
Posts: 5,126
Website GitHub

Re: Why is this page so attractive?

What scares me is that I serve txp_die() with 503 status to these guys, but some of them somehow bypass it and still get a 200 response.

Offline

#5 2012-11-01 13:00:55

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,054
Website GitHub Mastodon Twitter

Re: Why is this page so attractive?

Hi Oleg,

Did you check their IPs on StopForumSpam and Project Honey Pot? When an IP is listed in either of those places and it is insistently hitting my site, I just block it with .htaccess.


Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.

Offline

#6 2012-11-01 17:38:52

etc
Developer
Registered: 2010-11-11
Posts: 5,126
Website GitHub

Re: Why is this page so attractive?

Yiannis, thank you for the links. Yes, they are listed, and will probably end up in .htaccess. But I am curious: how do they bypass txp_die? I thought the following would stop them:

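// Hook the filter into the 'pretext_end' callback.
register_callback('etc_filter', 'pretext_end');

function etc_filter()
{
  $banned = array(/*bad guys ip*/);
  // remote_addr() is Textpattern's helper for the visitor's address.
  $ip = remote_addr();
  // Refuse banned visitors before the page is served.
  if (in_array($ip, $banned)) txp_die('Unavailable');
}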

but some (not all) of them still reach their favorite page.

Offline

#7 2012-11-01 19:00:28

Gocom
Developer Emeritus
From: Helsinki, Finland
Registered: 2006-07-14
Posts: 4,533
Website

Re: Why is this page so attractive?

remote_addr() supports proxies, and as such it gets its information from the X_FORWARDED_FOR HTTP header if deemed necessary.
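
To make the idea concrete, here is a minimal sketch of a proxy-aware address lookup. It is not Textpattern's actual remote_addr() code: the function name example_client_ip() and the rule "only trust the header when the direct peer is local" are assumptions made for illustration. The point it shows is that the forwarded header is supplied by the client, so a ban list is only as reliable as the rule that decides when to believe it.

```php
<?php
// Hypothetical helper, NOT Textpattern's remote_addr() implementation.
// $server is expected to look like PHP's $_SERVER array.
function example_client_ip(array $server)
{
    $ip = isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '';
    $forwarded = isset($server['HTTP_X_FORWARDED_FOR'])
        ? $server['HTTP_X_FORWARDED_FOR']
        : '';

    // Assumption: the header is only believed when the direct peer is
    // the local machine, i.e. a reverse proxy running on the same host.
    $local = array('127.0.0.1', '::1');

    if ($forwarded !== '' && in_array($ip, $local, true)) {
        // The header may hold a comma-separated chain; take the first entry.
        $parts = explode(',', $forwarded);
        $ip = trim($parts[0]);
    }

    return $ip;
}

// Direct request: the client-supplied header is ignored.
echo example_client_ip(array(
    'REMOTE_ADDR' => '192.0.2.10',
    'HTTP_X_FORWARDED_FOR' => '198.51.100.7',
)), "\n"; // prints 192.0.2.10

// Request relayed by a local proxy: the first forwarded address is used.
echo example_client_ip(array(
    'REMOTE_ADDR' => '127.0.0.1',
    'HTTP_X_FORWARDED_FOR' => '198.51.100.7, 203.0.113.5',
)), "\n"; // prints 198.51.100.7
```

The addresses are taken from the documentation ranges and do not refer to any visitor in this thread.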

Last edited by Gocom (2012-11-01 19:01:39)

Offline

#8 2012-11-01 19:20:55

etc
Developer
Registered: 2010-11-11
Posts: 5,126
Website GitHub

Re: Why is this page so attractive?

Yes, but what troubles me is that an IP address listed in the “bad guys ip” array above still appears in my (txp and apache) logs with a 200 response status. I guess the log hit function uses the same remote_addr(), so how can this IP bypass my filter? If I set the article status to “hidden”, it gets a 404, as it should.

Offline

#9 2012-11-02 11:35:19

etc
Developer
Registered: 2010-11-11
Posts: 5,126
Website GitHub

Re: Why is this page so attractive?

False alarm, sorry; it is rather a small issue with the txp log tab. For some reason, IP addresses there contain an invisible character after each dot (Firefox on W7). Since I copied them into my “bad guys” array from the txp log tab, they did not match the actual IPs because of these invisible characters.

Edit: more precisely, it happens for the IP in the “Host” column.

Last edited by etc (2012-11-02 12:58:49)

Offline

#10 2012-11-02 19:07:01

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068
Website

Re: Why is this page so attractive?

^^ That’s due to the use of zero-width space characters to enable wrapping of long words. I think I’m to blame for them occurring in that column. Perhaps better to remove them there. I’ve had the exact same problem copying the IPnr/hostname. It takes too much time to figure out that something isn’t working due to those ZWSP characters.
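
For anyone pasting addresses out of the log tab in the meantime, a rough way to guard against this is to strip zero-width characters before comparing. The sketch below is an illustration only: example_clean_ip() is a hypothetical helper, not part of Textpattern, and the address comes from a documentation range.

```php
<?php
// Hypothetical helper: remove zero-width characters from a pasted address.
function example_clean_ip($ip)
{
    // U+200B zero-width space, U+200C/U+200D joiners, U+FEFF byte-order mark.
    $clean = preg_replace('/[\x{200B}-\x{200D}\x{FEFF}]/u', '', $ip);

    // preg_replace() returns null on invalid UTF-8; fall back to the input.
    if ($clean === null) {
        $clean = $ip;
    }

    return trim($clean);
}

// "\xE2\x80\x8B" is the UTF-8 encoding of U+200B, placed after each dot
// to imitate an address copied from the log tab.
$pasted = "192.\xE2\x80\x8B0.\xE2\x80\x8B2.\xE2\x80\x8B10";
$banned = array('192.0.2.10');

var_dump(in_array($pasted, $banned, true));                   // bool(false)
var_dump(in_array(example_clean_ip($pasted), $banned, true)); // bool(true)
```

Running the same cleanup over the ban list itself, rather than over the visitor's address, would fix a list that was already pasted with the invisible characters in it.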

Offline

#11 2012-11-02 19:54:42

Gocom
Developer Emeritus
From: Helsinki, Finland
Registered: 2006-07-14
Posts: 4,533
Website

Re: Why is this page so attractive?

You called? Lemme see: r4542.

Offline
