Textpattern CMS support forum
You are not logged in. Register | Login | Help
- Topics: Active | Unanswered
#1 2007-07-14 21:52:50
- Podflower
- New Member
- Registered: 2007-05-24
- Posts: 6
Automated spam search queries: query caching?
Hi,
I have 2 sites that run on Textpattern on a 512MB VPS. Over the past month the server has been crashing daily. After much to-ing and fro-ing with our host, they have said that the MySQL is being bombarded with Spam queries:
| 26313 | latefilm | localhost | latefilm | Query | 23 | Sorting
result | select *, unix_timestamp(Posted) as uPosted, match (Title,Body)
against (‘Hi! Good site respect! <a |
| 26314 | latefilm | localhost | latefilm | Query | 66 | Sending
data | select count(*) from textpattern where 1 and Status = 4 and
Posted <= now() and (Title rlike ‘Hello! |
| 26315 | latefilm | localhost | latefilm | Query | 61 | Sending
data | select count(*) from textpattern where 1 and Status = 4 and
Posted <= now() and (Title rlike ‘Hi! Go |
| 26339 | latefilm | localhost | latefilm | Query | 18 | Sending
data | select count(*) from textpattern where 1 and Status = 4 and
Posted <= now() and (Title rlike ‘Hi! Go |
| 26343 | latefilm | localhost | latefilm | Query | 14 | Sending
data | select count(*) from textpattern where 1 and Status = 4 and
Posted <= now() and (Title rlike ‘Hi! Go |
| 26344 | admin | localhost | NULL | Query | 0 | NULL
| show processlist
|
| 26347 | latefilm | localhost | latefilm | Query | 2 | Sending
data | select count(*) from textpattern where 1 and Status = 4 and
Posted <= now() and (Title rlike ‘Hi! Go |
| 26348 | latefilm | localhost | latefilm | Query | 1 | Sending
data | select count(*) from textpattern where 1 and Status = 4 and
Posted <= now() and (Title rlike ‘Hello! |
———-+—————+—————-+—————+————-+———+————————+———————————————————————————————————————————————————
51 rows in set (0.00 sec)
That is just an example of what it looked like, the list is actually very long.
Our host told me that Textpattern doesnt allow for query caching (is this correct?) and this is something that would go a long way to fixing the issue.
I really dont know much at all about how the database works but from what I understand a query cache creates static cached pages so the spam wouldn’t call on the database for each query, therefore massively reducing the server use. Is there a pluggin that can do this or has anyone had the same issue and can suggest a way to help?
The issue has got so bad that it crashes the server (VPS 512MB 1TB bandwidth) a few times a day. The sites are just a hobby and I can’t afford any more memory on the server!
Thanks
(Edit: updated discussion topic. -Mary)
Last edited by Mary (2007-07-15 22:39:05)
Offline
#2 2007-07-14 21:55:07
- Podflower
- New Member
- Registered: 2007-05-24
- Posts: 6
Re: Automated spam search queries: query caching?
P.S They spammers are not actually leaving any comments on the sites. We banned a lot of IP’s. It seems to effect the server even though they dont leave any comments.
Offline
Re: Automated spam search queries: query caching?
Those queries are not related to comments, but they’re all about using the search feature on your website. Typically search queries are not identical so caching them doesn’t work, especially since the search results will change in time (due to new articles being published or reaching publication date).
If all those spam queries contain “from textpattern where” and “Title rlike”, then you may want to consider removing the search from your website.
Offline
Re: Automated spam search queries: query caching?
There are some caching plugins, like zem_cache or asy_jpcache that can reduce queries but (each of them) also has their own drawbacks, you could look into them. But as ruud explained, the queries you mention are unrelated to any caching.
In the faq are some entries that deal with diagnosing performance issues, take a look at the suggestions there as well, as it’ll give ideas on how to improve your templates to be more resource-friendly.
Our host told me that Textpattern doesnt allow for query caching (is this correct?) and this is something that would go a long way to fixing the issue.
I don’t what he meant by that. Query-caching could mean a function of mysql, in which case textpattern cannot have any influence on the feature. Or he might have meant that the software should cache queries – which textpattern does support with a few plugins (like zem_cache, asy_jpcache), and additionally there is 304/Last-Modified functionality built-in. The total amount of queries is mostly related to the complexity of the template though, and potentially plugins, and herefor see the above suggestion.
As for the example queries you posted – like ruud explained no amount caching or reducing complexity could affect those kinds of queries. If earch is an issue, there are ways to modify the search functionality which could improve performance but come at the expense of matching less articles – plus it would (likely) only make much of a difference if there are many thousand very longish articles. Before fiddling with it, you should irst establish whether that’s the actually the main cause of the problem, though.
Offline
#5 2007-07-15 22:37:36
- Mary
- Sock Enthusiast
- Registered: 2004-06-27
- Posts: 6,236
Re: Automated spam search queries: query caching?
You could make use of some simple slowing down measures.
For example, in all pages where you don’t support searches, you could do (before any article tags):
<txp:if_search><txp:txp_die /></txp:if_search>
That will give an error page if you’re trying to pass a search request other than through a legitimate search form you’ve set up.
And on pages where you do support search, do (before any article tags):
<txp:if_search><txp:php>sleep(1);</txp:php></txp:if_search>
That would force the server to wait 1 second before continuing. That will grind down automated searches quite a bit, and you can increase that number a little if needed (not too much though, or real visitors will be penalized for running legitimate searches).
Offline
#6 2007-07-16 16:46:23
- ehater
- New Member
- Registered: 2007-07-16
- Posts: 3
Re: Automated spam search queries: query caching?
Is there any reason to use “and Posted <= now()” in the search query? Since you’re using now(), the query cannot be cached by MySQL query cache and has to be executed every single time. If the query was cached, the effect of search/comment spam would be reduced by quite a lot
Sencer wrote:
I don’t what he meant by that. Query-caching could mean a function of mysql, in which case textpattern cannot have any influence on the feature.
….
As for the example queries you posted – like ruud explained no amount caching or reducing complexity could affect those kinds of queries. If earch is an issue, there are ways to modify the search functionality which could improve performance but come at the expense of matching less articles – plus it would (likely) only make much of a difference if there are many thousand very longish articles. Before fiddling with it, you should irst establish whether that’s the actually the main cause of the problem, though.
Last edited by ehater (2007-07-16 16:52:38)
Offline
Re: Automated spam search queries: query caching?
Textpattern allows you to set a publication date for articles. Without the “Posted <= now()” part, searching could reveal articles whose publication date has not yet been reached.
Offline
#8 2007-07-16 18:02:08
- ehater
- New Member
- Registered: 2007-07-16
- Posts: 3
Re: Automated spam search queries: query caching?
I understand the design choice, but at this point the resilience to automated spam/search queries is absolutely terrible. The server in question was bombarded with stacking queries, each search result would take longer and longer to execute. Here’s the slow query log: “# Query_time: 1186 Lock_time: 0 Rows_sent: 0 Rows_examined: 1897”. That’s 1186 seconds to send back zero results.
And there’s nothing can be done on the server side because of of the nature of the queries ( you using rlike and now() ) . Because you’re using now() MySQL server cannot cache queries and because you’re using rlike, it
doesn’t use indexes at all and has to scan the full table. Please read this to get more insight about rlike: http://forums.mysql.com/read.php?24,37136,37183#msg-37183
Here are the example queries on a completely idle server. It takes 35 seconds to execute this query, imagine what happens when there are dozens of them waiting on each other and the spambots submitting more and more. Either you have to say in system requirements that a dedicated server is needed to utilize textpattern search function or change the way you do search queries completely. You can store the precision value (10-30minutes) in a table somewhere and use that timestamp instead of now() to make queries cacheable and use fulltext search instead of rlike
Using now(); , each query gets executed for 35 seconds
mysql> select count(*) from textpattern where 1 and Status = 4 and Posted <= now() and (Title rlike ‘Hello! Perfect and very interestin wed portal and visit still <a href=\“http://seyong.info/nude-yoga/index.html\”>nude yoga</a> and <a href=\“http://seyong.info/nude-boys/index.html\”>nude boys</a> and <a href= http://seyong.info/nude-yoga/index.html >nude yoga</a> and <a href= http://seyong.info/nude-boys/index.html >nude boys</a> Thanks!’ or Body rlike ‘Hello! Perfect and very interestin wed portal and visit still <a href=\“http://seyong.info/nude-yoga/index.html\”>nude yoga</a> and <a href=\“http://seyong.info/nude-boys/index.html\”>nude boys</a> and <a href= http://seyong.info/nude-yoga/index.html >nude yoga</a> and <a href= http://seyong.info/nude-boys/index.html >nude boys</a> Thanks!’);
—————
| count(*) |
—————
| 0 |
—————
1 row in set (35.94 sec)
Not using now();, query gets executed for 32 seconds the first time and 0 seconds second time because it’s in MySQl query cache;
mysql> select count(*) from textpattern where 1 and Status = 4 and Posted <= ’2007-07-13 08:08:12’ and (Title rlike ‘Hello! Perfect and very interestin wed portal and visit still <a href=\“http://seyong.info/nude-yoga/index.html\”>nude yoga</a> and <a href=\“http://seyong.info/nude-boys/index.html\”>nude boys</a> and <a href= http://seyong.info/nude-yoga/index.html >nude yoga</a> and <a href= http://seyong.info/nude-boys/index.html >nude boys</a> Thanks!’ or Body rlike ‘Hello! Perfect and very interestin wed portal and visit still <a href=\“http://seyong.info/nude-yoga/index.html\”>nude yoga</a> and <a href=\“http://seyong.info/nude-boys/index.html\”>nude boys</a> and <a href= http://seyong.info/nude-yoga/index.html >nude yoga</a> and <a href= http://seyong.info/nude-boys/index.html >nude boys</a> Thanks!’);
—————
| count(*) |
—————
| 0 |
—————
1 row in set (32.42 sec)
mysql> select count(*) from textpattern where 1 and Status = 4 and Posted <= ’2007-07-13 08:08:12’ and (Title rlike ‘Hello! Perfect and very interestin wed portal and visit still <a href=\“http://seyong.info/nude-yoga/index.html\”>nude yoga</a> and <a href=\“http://seyong.info/nude-boys/index.html\”>nude boys</a> and <a href= http://seyong.info/nude-yoga/index.html >nude yoga</a> and <a href= http://seyong.info/nude-boys/index.html >nude boys</a> Thanks!’ or Body rlike ‘Hello! Perfect and very interestin wed portal and visit still <a href=\“http://seyong.info/nude-yoga/index.html\”>nude yoga</a> and <a href=\“http://seyong.info/nude-boys/index.html\”>nude boys</a> and <a href= http://seyong.info/nude-yoga/index.html >nude yoga</a> and <a href= http://seyong.info/nude-boys/index.html >nude boys</a> Thanks!’);
—————
| count(*) |
—————
| 0 |
—————
1 row in set (0.00 sec)
Offline
Re: Automated spam search queries: query caching?
What happens to the query times if you replace (in /textpattern/publish.php line 587):
and (Title rlike '$q' or Body rlike '$q')
with:
and match (Title,Body) against ('$q')
That should use an index and be much much faster, but could cause problems (no matches) with search queries that have keywords of less than 4 chars in length or are considered stop words (which is probably why rlike is used instead by TXP).
I’m also very curious how many articles are stored in that TXP installation and how big the main textpattern table is, because 30+ seconds is rather high.
Last edited by ruud (2007-07-16 18:39:47)
Offline
#10 2007-07-16 18:44:31
- ehater
- New Member
- Registered: 2007-07-16
- Posts: 3
Re: Automated spam search queries: query caching?
This query takes 34.95 sec and returns empty set
select *, unix_timestamp(Posted) as uPosted, match (Title,Body) against (‘Hi! Good site respect! Visit <a href=\“http://seyong.info/celebs-nude/index.html\”>celebs nude</a> and <a href=\“http://seyong.info/nude-boy/index.html\”>nude boy</a> and <a href= http://seyong.info/celebs-nude/index.html >celebs nude</a> and <a href= http://seyong.info/nude-boy/index.html >nude boy</a> Thanks!’) as score from textpattern where 1 and Status = 4 and Posted <= now() and (Title rlike ‘Hi! Good site respect! Visit <a href=\“http://seyong.info/celebs-nude/index.html\”>celebs nude</a> and <a href=\“http://seyong.info/nude-boy/index.html\”>nude boy</a> and <a href= http://seyong.info/celebs-nude/index.html >celebs nude</a> and <a href= http://seyong.info/nude-boy/index.html >nude boy</a> Thanks!’ or Body rlike ‘Hi! Good site respect! Visit <a href=\“http://seyong.info/celebs-nude/index.html\”>celebs nude</a> and <a href=\“http://seyong.info/nude-boy/index.html\”>nude boy</a> and <a href= http://seyong.info/celebs-nude/index.html >celebs nude</a> and <a href= http://seyong.info/nude-boy/index.html >nude boy</a> Thanks!’) order by score desc limit 0, 10;
This takes 0.08 seconds and returns 10 rows…. different result
select *, unix_timestamp(Posted) as uPosted, match (Title,Body) against (‘Hi! Good site respect! Visit <a href=\“http://seyong.info/celebs-nude/index.html\”>celebs nude</a> and <a href=\“http://seyong.info/nude-boy/index.html\”>nude boy</a> and <a href= http://seyong.info/celebs-nude/index.html >celebs nude</a> and <a href= http://seyong.info/nude-boy/index.html >nude boy</a> Thanks!’) as score from textpattern where 1 and Status = 4 and Posted <= now() order by score desc limit 0, 10;
Offline
Re: Automated spam search queries: query caching?
And this query:
select *, unix_timestamp(Posted) as uPosted, match (Title,Body) against
('Hi! Good site respect! Visit <a href=\"http://seyong.info/celebs-nude/index.html\">celebs nude</a> and <a href=\"http://seyong.info/nude-boy/index.html\">nude boy</a> and <a href= http://seyong.info/celebs-nude/index.html >celebs nude</a> and <a href= http://seyong.info/nude-boy/index.html >nude boy</a> Thanks!')
as score
from textpattern
where 1 and Status = 4 and Posted <= now() and (match (Title,Body) against
('Hi! Good site respect! Visit <a href=\"http://seyong.info/celebs-nude/index.html\">celebs nude</a> and <a href=\"http://seyong.info/nude-boy/index.html\">nude boy</a> and <a href= http://seyong.info/celebs-nude/index.html >celebs nude</a> and <a href= http://seyong.info/nude-boy/index.html >nude boy</a> Thanks!'))
order by score desc limit 0, 10;
This should return an empty result set but probably also takes around 0.08 seconds because both match statements are identical and therefore reused in MySQL.
Last edited by ruud (2007-07-16 19:08:57)
Offline
Re: Automated spam search queries: query caching?
So, it look slike your options are:
- Use Mary’s suggestion
- Write a little plugin that checks for urls submited in the search field, and show an error page to prevent the automated spam-queries from the irritated spambot. Should be no more than a few lines.
- Modify the code to remove/replace the parts and trade convenience/features for performance. This is trickier as it needs more know-how, and requires rework for every update.
I’d go with option 1 or two if I was you.
Offline