Textpattern CMS support forum

#1 2025-09-18 08:33:16

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,250

Referrer spam

I know that it is not GDPR compatible, but is there a way to have the IP addresses in the visitor logs in order to block referrer spam? In the past week, we had about 100,000 hits/day, most from one particular IP. I did not know that at first, of course; all I could see was that our logs ran to over 1,000 pages at 96 entries per page every time I checked. For a week I ignored them, but eventually I decided to look at the server logs via SFTP to identify the culprit.

I wonder if we could store IPs in the txp logs for 24 hours or so, for the protection of our sites.
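
For anyone on the same hunt, here is a minimal sketch of tallying hits per client IP from a server access log. It assumes the common/combined log format, where the client address is the first whitespace-separated field. The `top_clients` helper and the sample lines are made up for illustration (the addresses are documentation-range IPs, not the ones from our logs).

```python
from collections import Counter


def top_clients(log_lines, n=5):
    """Return the n busiest client IPs, assuming the IP is the first field."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # ignore blank lines
            counts[fields[0]] += 1
    return counts.most_common(n)


# Hypothetical sample lines (documentation-range addresses, made-up requests).
sample_log = [
    '192.0.2.10 - - [18/Sep/2025:08:00:01 +0000] "GET / HTTP/1.1" 200 5120',
    '192.0.2.10 - - [18/Sep/2025:08:00:01 +0000] "GET /about HTTP/1.1" 200 4096',
    '198.51.100.7 - - [18/Sep/2025:08:00:02 +0000] "GET / HTTP/1.1" 200 5120',
    '192.0.2.10 - - [18/Sep/2025:08:00:02 +0000] "GET /contact HTTP/1.1" 200 2048',
]

for ip, hits in top_clients(sample_log):
    print(f"{hits:6d}  {ip}")
```

On a real log you would pass the open file object instead of the sample list, since iterating a file yields its lines.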


Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.

#2 2025-09-18 17:37:47

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 5,000

Re: Referrer spam

Perhaps a long shot, but I don’t see why you couldn’t switch on logging and then run a script once a day to clear the entries in txp_log that are older than that day. That could take the form of:

  • a separate PHP file that you trigger at a certain time each day using a cron job on your server.
  • a txp plugin that is triggered by a visit to the site and runs once a day doing the same.

Textpattern doesn’t have a “runner” that would do it at a certain point each day, so a txp plugin would have to be triggered by a visit. Said plugin would note if it had run already that day to stop it being triggered by every visit. The “risk” here is that if you have a day with no visitors (not likely with your site), then the logs may end up being stored for longer.
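
The "note if it had run already that day" guard could look roughly like this. It is only a sketch in Python rather than a real Textpattern plugin: `DailyTask` and `on_visit` are made-up names, and a real plugin would have to persist the last-run date somewhere instead of keeping it in memory.

```python
import datetime


class DailyTask:
    """Run a task at most once per calendar day, triggered by site visits."""

    def __init__(self, task):
        self.task = task
        self.last_run = None  # date of the last run; kept in memory for this sketch only

    def on_visit(self, today=None):
        """Call on every visit; returns True only when the task actually ran."""
        today = today or datetime.date.today()
        if self.last_run == today:
            return False  # already ran today, nothing to do
        self.task()
        self.last_run = today
        return True


# Illustrative usage with explicit dates instead of the real clock.
cleanup = DailyTask(lambda: print("clearing old log entries"))
day_one = datetime.date(2025, 9, 18)
day_two = datetime.date(2025, 9, 19)
for visit_date in (day_one, day_one, day_two):
    print(visit_date, "ran" if cleanup.on_visit(visit_date) else "skipped")
```

The guard also shows the "risk" mentioned above: with no visit on a given day, `on_visit` is never called and nothing is cleared until the next visitor turns up.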

There is aks_cron (no idea if it still works) but as Jukka noted, it uses the same principle, not true cron.


TXP Builders – finely-crafted code, design and txp

#3 Yesterday 01:30:51

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,250

Re: Referrer spam

Hi Julian,
Our txp logs expire after 1 day anyway. The issue is the load on the server, not the logs. It felt like a DoS attack.



#4 Yesterday 08:11:28

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 5,000

Re: Referrer spam

Sorry, I misunderstood you. It’s specifically the IPs you’re interested in and they were removed entirely in this and this commit.

I don’t know of anything offhand, but there is a callback event log_hit (docs description) that a plugin could potentially hook into to record the hit’s IP. The IP and host columns no longer exist in the txp_log table, so a plugin would either need to restore them or store them in a table of its own. It depends how much information you need with each hit to be able to judge its (un)desirability.
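
To make the "table of its own" idea concrete, here is a rough sketch using Python and SQLite rather than Textpattern's PHP and MySQL. The `hit_ip` table, the function names and the 24-hour retention are all assumptions for illustration; none of this is Textpattern's API. Each recorded hit first prunes anything older than the retention period, so IPs never linger longer than a day.

```python
import sqlite3
import time

RETENTION_SECONDS = 24 * 60 * 60  # assumed retention: keep IPs for 24 hours


def open_store(path=":memory:"):
    """Open (or create) the hypothetical side table that holds hit IPs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS hit_ip "
        "(ts REAL NOT NULL, ip TEXT NOT NULL, page TEXT NOT NULL)"
    )
    return db


def record_hit(db, ip, page, now=None):
    """Store one hit and drop every row older than the retention period."""
    now = time.time() if now is None else now
    db.execute("DELETE FROM hit_ip WHERE ts < ?", (now - RETENTION_SECONDS,))
    db.execute("INSERT INTO hit_ip (ts, ip, page) VALUES (?, ?, ?)", (now, ip, page))
    db.commit()


def hits_per_ip(db):
    """Return (ip, hit count) pairs, busiest first."""
    return db.execute(
        "SELECT ip, COUNT(*) AS n FROM hit_ip GROUP BY ip ORDER BY n DESC, ip"
    ).fetchall()


# Illustrative usage with documentation-range addresses.
store = open_store()
record_hit(store, "192.0.2.10", "/")
record_hit(store, "192.0.2.10", "/about")
record_hit(store, "198.51.100.7", "/")
print(hits_per_ip(store))
```

Keeping the IPs in a separate table means the core log table stays untouched, at the cost of a join if you want the IP next to the rest of the hit's details.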



#5 Yesterday 15:28:39

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,481

Re: Referrer spam

colak wrote #340556:

The issue is the load on the server, not the logs. It felt like a DoS attack.

If you’re at the fat end of a DoS attack, there’s a chance the source IP(s) could be inaccurate.

If you’re running Apache, you could look into mod_ratelimit. That would give you something at the web server level that would take the edge off before Textpattern gets the brunt. Another consideration could be more aggressive caching, so you’re essentially serving flat files from a cache rather than hitting a database for each visitor.

#6 Today 02:31:57

vistopher
New Member
Registered: 2025-09-15
Posts: 3

Re: Referrer spam

colak wrote #340552:

I wonder if we could store IPs in the txp logs for 24 hours or so, for the protection of our sites.

Too bad txp removed the IP logging. I made a plugin doing what you requested:
https://github.com/thyelite/IP-Logging-for-Textpattern

Please let me know if you use it and if it helps you out.

Last edited by vistopher (Today 02:34:06)

#7 Today 04:18:16

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,250

Re: Referrer spam

vistopher wrote #340561:

Too bad txp removed the IP logging. I made a plugin doing what you requested:
https://github.com/thyelite/IP-Logging-for-Textpattern

Please let me know if you use it and if it helps you out.

Thanks so much!!! I have a number of things to do today, but I’ll install it tomorrow and let you know.

Digging into the logs, I found that all the new IPs attacking our site belong to Google, which may mean that someone is spoofing their IPs.

34.174.1.216
34.174.18.28
34.174.80.38
34.174.115.48
34.174.155.39
34.174.169.124

I’ll try to describe the severity of this. I go to the txp preferences, make logs expire in 0 days. Refresh the logs pane and there are no referrers, as expected.

I then change the expiry time to a day, refresh the logs and there are already over 400 hits. The referrer URLs are all legit sites: lynda.com, google, science.org, techcrunch.com, etc. The logs reveal that these are possibly bots, as they visit most/all pages of the site in less than 5 seconds.
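
That "most pages in under 5 seconds" pattern is easy to test for once the hits carry an IP. A minimal sketch, assuming hits arrive as (timestamp in seconds, IP, page) tuples sorted by time; the `find_burst_clients` name and the threshold of 20 distinct pages are my own assumptions, not measured values.

```python
from collections import defaultdict, deque


def find_burst_clients(hits, window=5.0, min_pages=20):
    """Flag IPs that request at least min_pages distinct pages within `window` seconds.

    hits: iterable of (timestamp_seconds, ip, page), sorted by timestamp.
    """
    recent = defaultdict(deque)  # per-IP queue of (timestamp, page) inside the window
    flagged = set()
    for ts, ip, page in hits:
        queue = recent[ip]
        queue.append((ts, page))
        # Drop requests that have fallen out of the sliding window.
        while queue and ts - queue[0][0] > window:
            queue.popleft()
        if len({p for _, p in queue}) >= min_pages:
            flagged.add(ip)
    return flagged


# Made-up data: one crawler hitting 25 pages in 2.5 s, one human reading 3 pages in a minute.
crawler = [(i * 0.1, "192.0.2.10", f"/page/{i}") for i in range(25)]
human = [(0.0, "198.51.100.7", "/"), (30.0, "198.51.100.7", "/about"), (60.0, "198.51.100.7", "/contact")]
print(find_burst_clients(sorted(crawler + human)))
```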

@Pete

I have added a ratelimit directive in the .htaccess and hope that it will work.

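<IfModule ratelimit_module>
    # Pass responses through mod_ratelimit's output filter
    SetOutputFilter RATE_LIMIT
    # Throttle each connection to 100 KiB/s
    SetEnv rate-limit 100
    # Let the first 200 KiB through at full speed before throttling starts
    SetEnv rate-initial-burst 200
</IfModule>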
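
As a rough feel for what those two values do, here is a small back-of-the-envelope sketch. It assumes the burst is delivered instantly and the rest at exactly the configured rate, which is a simplification of what the server really does; `transfer_seconds` is a made-up helper, not part of Apache.

```python
def transfer_seconds(size_kib, rate_kib_s=100, burst_kib=200):
    """Approximate minimum time to deliver a response under the limits above.

    The first burst_kib go out at full speed (treated as instantaneous here);
    everything beyond that is sent at rate_kib_s.
    """
    throttled_kib = max(0, size_kib - burst_kib)
    return throttled_kib / rate_kib_s


for size in (150, 1024, 5120):
    print(f"{size} KiB -> at least {transfer_seconds(size):.2f} s")
```

So a page under 200 KiB is not slowed at all, while a 1 MiB response takes at least about 8 seconds per connection. Note that this throttles bandwidth per connection; it does not cap the number of requests.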


#8 Today 06:09:39

vistopher
New Member
Registered: 2025-09-15
Posts: 3

Re: Referrer spam

colak wrote #340563:

Digging into the logs, I found that all the new IPs attacking our site belong to Google, which may mean that someone is spoofing their IPs.

Do you have a robots.txt set up? You might set one up; that way you can determine whether it’s Googlebot crawling aggressively or not. I have a forum that regularly got 700 concurrent users due to crawlers and AI bots before I set up bot controls with my robots.txt and Cloudflare for AI bots.
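
If it helps, here is a quick way to check what a given set of rules would tell a well-behaved crawler, using Python's standard `urllib.robotparser`. The rules are only an example I made up (one AI crawler shut out entirely, everyone kept out of an assumed `/textpattern/` admin path), not anyone's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one AI crawler completely, keep all bots out of the admin area.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /textpattern/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    ("GPTBot", "/articles/1"),
    ("Googlebot", "/articles/1"),
    ("Googlebot", "/textpattern/index.php"),
]
for agent, path in checks:
    verdict = "allowed" if parser.can_fetch(agent, path) else "disallowed"
    print(f"{agent:10s} {path:25s} {verdict}")
```

A crawler that keeps requesting paths these rules disallow is, by definition, not honouring robots.txt, which is a useful signal in itself.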

#9 Today 07:40:36

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,250

Re: Referrer spam

vistopher wrote #340564:

Do you have a robots.txt set up? You might set one up; that way you can determine whether it’s Googlebot crawling aggressively or not. I have a forum that regularly got 700 concurrent users due to crawlers and AI bots before I set up bot controls with my robots.txt and Cloudflare for AI bots.

I do have a robots.txt but, as skewray correctly observed, “evil crawlers ignore it.”

Edited to add that ratelimit_module has reduced the speed of the bots dramatically.

Last edited by colak (Today 07:43:31)



#10 Today 11:40:31

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,942

Re: Referrer spam

vistopher wrote #340561:

I made a plugin doing what you requested:
https://github.com/thyelite/IP-Logging-for-Textpattern

That’s ace, thank you. It comes a couple of days ahead of my planned changes to sneak in page-level steps to all tabular panels. We’ve had them on the Users panel for ages and I thought, while I had a few weeks’ grace, I’d roll it out to other panels. I did the Articles panel the other day. I’m working on Images. Then Files and after that was going to turn to Links, Logs and Comments. And maybe, if I get time, Sections and Themes.

Sorry you had to do a kludgy JavaScript dance to add the columns. Give me a few days and it should be easier to hook into the tables.


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

#11 Today 13:15:21

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,250

Re: Referrer spam

Bloke wrote #340569:

Sorry you had to do a kludgy JavaScript dance to add the columns. Give me a few days and it should be easier to hook into the tables.

Maybe I should wait then?

