Textpattern CMS support forum

#13 2009-05-27 13:15:43

merz1
Member
From: Hamburg
Registered: 2006-05-04
Posts: 994

Re: Full page caching in TXP core

Regarding the server-side bottlenecks (Apache, PHP, MySQL), which play a secondary role in my request, I found the following document very interesting:

“Where did PHP go wrong?” ‘Quercus’, a Java implementation of PHP, shows 9x the performance. http://bit.ly/UBIFM [pdf] (But is Java any better? ;))

Some short comments from the bottom up (chronologically).

(hcgtv) where you can cache certain areas of a page

That’s off-topic, because this thread is about “Full page caching in TXP core”.

With today’s quad core servers, dynamic pages are plenty fast.

Whatever the server specification, serving cached files is faster.

On a very graphical site

The subject is neither proxy mechanisms nor disk I/O.

A basic solution would be a script that runs wget and mirrors your site to disk. Then you just adjust .htaccess to point your visitors to /static/index.html. A poor man’s full page caching :)

If you have basic pages and no category or tag pages, this might work, but then you might as well consider static pages from the beginning :-) (You are kidding, aren’t you?)

Let’s take a site that gets about 100 visitors a day, a cache is overkill.

Regarding the time it takes to deliver a page to a visitor, a cache is never overkill. The cache overhead only matters if the majority of visitors request a non-cached page (something the cache statistics reveal). Depending on how long cache files are kept, this is more likely to happen on low-traffic sites than on high-traffic sites.
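To make the hit-rate argument concrete, here is a small sketch of the average delivery time as a function of the cache hit rate. All timings are hypothetical parameters, not measurements, and “miss overhead” stands for whatever a miss costs on top of the normal render (looking for the cache file, then writing it).

```python
def expected_delivery_ms(hit_rate, hit_ms, render_ms, miss_overhead_ms):
    """Average time to deliver one page, given the cache hit rate.

    hit_rate:         fraction of requests served from a cache file (0.0 to 1.0)
    hit_ms:           time to send an already cached file
    render_ms:        time for a normal dynamic render
    miss_overhead_ms: extra cost of a miss (cache lookup plus writing the file)
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * (render_ms + miss_overhead_ms)


def break_even_hit_rate(hit_ms, render_ms, miss_overhead_ms):
    """Hit rate at which the cached site is exactly as fast as an uncached one.

    Solves h*hit + (1 - h)*(render + overhead) = render for h.
    """
    return miss_overhead_ms / (render_ms + miss_overhead_ms - hit_ms)


if __name__ == "__main__":
    # Hypothetical timings: 5 ms cached file, 200 ms render, 10 ms miss overhead.
    print(round(expected_delivery_ms(0.9, 5, 200, 10), 1))  # 25.5
    print(round(break_even_hit_rate(5, 200, 10), 3))        # 0.049
```

With these made-up numbers the cache already pays off once roughly 5% of requests are hits; the overhead only dominates when nearly every request is a miss.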


Get all online mentions of Textpattern via OPML subscription: TXP Info Sources: Textpattern RSS feeds as dynamic OPML

#14 2009-05-28 00:24:26

hcgtv
Plugin Author
From: Key Largo, Florida
Registered: 2005-11-29
Posts: 2,722

Re: Full page caching in TXP core

hcgtv wrote: A basic solution would be a script that runs wget and mirrors your site to disk. Then you just adjust .htaccess to point your visitors to /static/index.html. A poor man’s full page caching :)

merz1 wrote: If you have basic pages and no categories or tag pages this might work but then you might consider a static page from the beginning :-) (You are kidding, aren’t you?)

No, I’m not kidding. With the proper switches, wget can mirror a site perfectly fine. I’m not saying that this is your solution, but it can be a solution.
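As an illustration of what “the proper switches” could look like, here is a sketch that assembles a wget mirroring command. The flags are standard GNU Wget options; the target directory and URL are hypothetical, and the script only builds and prints the command rather than running it.

```python
import shlex


def build_mirror_command(site_url, target_dir, reject=()):
    """Build a GNU Wget command that mirrors site_url into target_dir."""
    cmd = [
        "wget",
        "--mirror",            # shorthand for -r -N -l inf --no-remove-listing
        "--convert-links",     # rewrite links so the copy works from disk
        "--adjust-extension",  # save text/html responses with an .html suffix
        "--page-requisites",   # also fetch CSS, images and other page assets
        "--no-parent",         # never ascend above the start URL
        f"--directory-prefix={target_dir}",
    ]
    if reject:
        # wget expects a comma-separated list of suffixes or wildcard patterns.
        cmd.append("--reject=" + ",".join(reject))
    cmd.append(site_url)
    return cmd


if __name__ == "__main__":
    cmd = build_mirror_command("http://example.com/", "/var/www/static")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```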

#15 2009-05-28 00:46:10

artagesw
Member
From: Seattle, WA
Registered: 2007-04-29
Posts: 227

Re: Full page caching in TXP core

hcgtv wrote:

No, I’m not kidding. With the proper switches, wget can mirror a site perfectly fine. I’m not saying that this is your solution, but it can be a solution.

Bert, what switches would you pass to wget to tell it to skip any pages with interactive content (such as a contact form)?

#16 2009-05-28 05:25:26

wet
Developer Emeritus
From: Schoerfling, Austria
Registered: 2005-06-06
Posts: 3,323

Re: Full page caching in TXP core

merz1 wrote:

[…]

  • Better cache control for plug-ins via core hooks.
  • Cache integrity: A single cache control center via hooks. Example: Every plug-in using external sources (or embedded PHP applications), e.g. SimplePie, the RSS parser, could read a ‘purge signal’ to delete the RSS cache.
  • A forced cache option for pages and single articles: Save and write to cache, save and don’t cache.
  • More granularity: Cache this section, don’t cache that section.
  • User cache: Own cache content/directory for logged-in users.
  • Fast preview in edit mode: Save, cache & view draft versions much faster. (The draft preview mode is awesome but the time it takes to save & view the article sometimes sucks.)
  • Collision handling for plug-ins. Example: A plug-in reading referrers and rendering parts of the page depending on a referrer could throw a warning (admin side).
  • Different cache lifetime cycles. Example: Cached CSS files only need to be updated by a forced refresh or by saving a new version.
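Several items on this list (per-section granularity, a forced “don’t cache” option, a separate path for logged-in users, different lifetimes per content type) boil down to a per-request caching policy. A minimal sketch, assuming hypothetical section names and lifetimes; a lifetime of None stands for “only refresh on a forced purge or when a new version is saved”, as in the CSS example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class CachePolicy:
    # Sections that are never written to the cache ("don't cache that section").
    uncached_sections: Set[str] = field(default_factory=set)
    # Lifetime in seconds per content type; None means "keep until purged or re-saved".
    lifetimes: Dict[str, Optional[int]] = field(default_factory=dict)
    default_lifetime: Optional[int] = 600

    def should_cache(self, section: str, logged_in: bool,
                     force_no_cache: bool = False) -> bool:
        """May this rendered page go into the shared (public) cache?"""
        if force_no_cache:  # e.g. an article saved with "save and don't cache"
            return False
        if logged_in:       # logged-in users must not share the public cache
            return False
        return section not in self.uncached_sections

    def lifetime(self, content_type: str) -> Optional[int]:
        return self.lifetimes.get(content_type, self.default_lifetime)

    def is_fresh(self, content_type: str, age_seconds: float) -> bool:
        """Is a cache entry of this type and age still usable?"""
        ttl = self.lifetime(content_type)
        return True if ttl is None else age_seconds < ttl


policy = CachePolicy(uncached_sections={"contact"},
                     lifetimes={"css": None, "html": 300})
```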

With just one user’s (well thought-out) specs, we already have a massive feature set with a multitude of configuration options. OTOH, it’s a niche requirement so I’d rather see this as an optional component than as an extra mandatory burden for small applications.

From my POV, implementing signals and hooks into the core to send and receive abstract caching hints to/from plugins would be the way to go. For the sending part, all instances of update_lastmod() might serve as a starting point. Any concrete implementation should be added by plugins.
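A minimal sketch of the signal/hook idea, written in Python rather than Textpattern’s PHP: the core emits an abstract “content changed” hint wherever it calls update_lastmod(), and each plugin decides what that means for its own cache. The registry, the signal name and the plugin are all hypothetical and are not Textpattern’s actual API.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class SignalBus:
    """Tiny publish/subscribe registry for abstract caching hints."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[..., None]]] = defaultdict(list)

    def connect(self, signal: str, handler: Callable[..., None]) -> None:
        self._handlers[signal].append(handler)

    def emit(self, signal: str, **payload) -> int:
        """Call every handler registered for signal; return how many ran."""
        handlers = self._handlers.get(signal, [])
        for handler in handlers:
            handler(**payload)
        return len(handlers)


def core_content_changed(bus: SignalBus, reason: str) -> int:
    # Hypothetical core hook, fired wherever the core calls update_lastmod().
    return bus.emit("content_changed", reason=reason)


class FeedCachePlugin:
    """Hypothetical plugin that empties its RSS cache on a purge hint."""

    def __init__(self, bus: SignalBus) -> None:
        self.cache = {"feed.xml": "<rss/>"}
        bus.connect("content_changed", self.on_content_changed)

    def on_content_changed(self, reason: str) -> None:
        self.cache.clear()


if __name__ == "__main__":
    bus = SignalBus()
    plugin = FeedCachePlugin(bus)
    print(core_content_changed(bus, "article saved"), plugin.cache)  # 1 {}
```

The core only sends the hint; the concrete cache handling stays inside plugins.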

#17 2009-05-28 11:37:12

merz1
Member
From: Hamburg
Registered: 2006-05-04
Posts: 994

Re: Full page caching in TXP core

Thanks Robert! I never said “mandatory” :)

I would like to get feedback on the individual points on the list. I don’t think they add up to too many additional feature requests.

From my POV, implementing signals and hooks into the core to send and receive abstract caching hints to/from plugins would be the way to go. For the sending part, all instances of update_lastmod() might serve as a starting point. Any concrete implementation should be added by plugins.

I agree that TXP core should have ‘signals and hooks’.

Q: I have no idea whether my main full-page cache feature request, ‘look for HTML files via an .htaccess rule first’, can be accomplished by a plug-in. Can it?
Q: Same question for ‘cache this, don’t cache that’.
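The ‘look for HTML files first’ rule amounts to the following lookup, which a web server rewrite rule would do before handing the request to index.php. A sketch under assumptions: the cache layout (one index.html per clean URL) and the function name are hypothetical, and the file-existence check is injectable so it can be tried without a real cache directory.

```python
from pathlib import Path
from typing import Callable, Optional


def cached_file_for(cache_root: Path, request_path: str,
                    exists: Callable[[Path], bool] = Path.is_file) -> Optional[Path]:
    """Return the static file to serve, or None to fall through to index.php."""
    if "?" in request_path:
        return None  # query strings (searches, forms) always go to Textpattern
    parts = [p for p in request_path.split("/") if p]
    if any(p in (".", "..") for p in parts):
        return None  # never map a request outside the cache directory
    candidate = cache_root.joinpath(*parts, "index.html")
    return candidate if exists(candidate) else None
```

When the web server performs this check itself, a cache hit never starts PHP at all; a miss simply falls through to the normal Textpattern request.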

(If it is a plug-in solution:) It would be great if the TXP core changes included a check for an installed, active cache plug-in, including the option to control its on/off state:

  1. Cache is active and can be switched off
  2. Cache is active and can be purged
  3. Cache is deactivated and can be switched on
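The three states above map onto a very small control interface that the core could call on an installed cache plug-in. A sketch with hypothetical class and method names; “purge” here simply empties an in-memory store.

```python
class CacheControl:
    """Hypothetical on/off/purge control a cache plug-in could expose to the core."""

    def __init__(self, active: bool = True) -> None:
        self.active = active
        self.store = {}

    def switch_off(self) -> None:
        # State 1 -> 3. Entries are dropped so that switching back on never
        # serves pages that went stale while the cache was off.
        self.active = False
        self.store.clear()

    def switch_on(self) -> None:
        # State 3 -> 1.
        self.active = True

    def purge(self) -> None:
        # State 2: stay active, but empty the cache.
        if not self.active:
            raise RuntimeError("cache is deactivated; nothing to purge")
        self.store.clear()

    def get(self, key):
        return self.store.get(key) if self.active else None

    def put(self, key, html) -> None:
        if self.active:
            self.store[key] = html
```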

A new full-page (or partial-page) cache solution, via plug-in or not, should at least make sure that users don’t have to edit core files such as the main index.php. This seems to be the biggest burden with asy_jpcache, both for new TXP installations and after TXP updates. A second benefit would be that the diagnostics page no longer throws a ‘changed file’ warning.


#18 2009-05-28 14:52:45

hcgtv
Plugin Author
From: Key Largo, Florida
Registered: 2005-11-29
Posts: 2,722

Re: Full page caching in TXP core

artagesw wrote:

Bert, what switches would you pass to wget to tell it to skip any pages with interactive content (such as a contact form)?

You can use the --reject switch to avoid certain patterns in file names (see “Types of Files” in the wget manual).
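For a feel of how --reject matches, here is a simplified Python emulation of the rule in the wget manual: each comma-separated entry is a file-name suffix unless it contains a wildcard character (*, ?, [ or ]), in which case it is a shell-style pattern. This is an approximation for illustration, not wget’s own code, and the example patterns are hypothetical. In this emulation a directory-style URL such as /contact/ has an empty file name and is never rejected; newer wget versions also offer --reject-regex, which matches against the whole URL.

```python
from fnmatch import fnmatchcase
from posixpath import basename
from urllib.parse import urlsplit

WILDCARDS = set("*?[]")


def rejected(url: str, rejlist: str) -> bool:
    """Approximate wget's --reject test for the file-name part of url."""
    name = basename(urlsplit(url).path)
    for entry in filter(None, (e.strip() for e in rejlist.split(","))):
        if WILDCARDS & set(entry):
            if fnmatchcase(name, entry):   # treated as a pattern
                return True
        elif name.endswith(entry):         # treated as a suffix
            return True
    return False
```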

#19 2009-05-28 17:20:04

artagesw
Member
From: Seattle, WA
Registered: 2007-04-29
Posts: 227

Re: Full page caching in TXP core

hcgtv wrote:

You can use the --reject switch to avoid certain patterns in file names (see “Types of Files” in the wget manual).

Cool. That looks fairly powerful, actually.

#20 2009-05-28 20:05:33

artagesw
Member
From: Seattle, WA
Registered: 2007-04-29
Posts: 227
Website

Re: Full page caching in TXP core

So let’s say I wanted to set up an automated self-updating static cache using wget. How would I set that up?

The main issue I see is how to keep the static cache fresh.

Let’s say you set up a cron job to wget the entire site every 10 minutes.

1. If wget is just hitting the main URL for the site in order to build the static cache, then wget would itself retrieve files from the static cache once it has been created. (It would end up getting files served from the cache just as any browser would.) So, it would never update existing files. In other words, how does the cache become invalidated?

2. Let’s say we address (1) by simply blowing away the entire static cache prior to each wget run (a crude but effective solution). Now, how often should the cron job run? A 10-minute interval would mean that, on average, a stale version of a changed page would be served for 5 minutes before being refreshed. Running it more frequently could put undesirable extra load on the web server.

3. This is clearly a lot less efficient than an intelligent caching system that can invalidate or update just the single page that has changed, and do so instantly.

4. Also, it’s a lot less user-friendly (harder to manage, etc.) than a built-in caching solution would be.
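The 5-minute figure in point 2 follows from a simple assumption: a change is equally likely at any moment t between two cron runs spaced T apart, and the crawl itself takes negligible time. The stale period is then S = T - t, and its expected length is

```latex
\mathbb{E}[S] \;=\; \frac{1}{T}\int_{0}^{T} (T - t)\,\mathrm{d}t \;=\; \frac{T}{2},
\qquad T = 10\ \text{min} \;\Rightarrow\; \mathbb{E}[S] = 5\ \text{min}.
```

Halving the interval halves the expected staleness, but doubles the number of full-site crawls, which is the load trade-off raised in the same point.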

So, although I agree you could put together a workable “poor man’s cache” with the above techniques, and it might work fine for some (smallish) sites, I don’t think it approaches the utility of a fully baked-in solution.

Just my 2 cents.

#21 2009-05-28 23:38:02

hcgtv
Plugin Author
From: Key Largo, Florida
Registered: 2005-11-29
Posts: 2,722

Re: Full page caching in TXP core

artagesw wrote:

So let’s say I wanted to set up an automated self-updating static cache using wget. How would I set that up?

Write a script and run it from a plugin in the admin area. The publisher of the site would trigger the wget script when the site is changed by either adding or changing an article or by modifying the look and feel.

You could also create a plugin that would monitor certain changes in the backend and would itself be the trigger, but what if you need to create an article and also tweak the CSS, how would the plugin know when you’re done with your changes?

Another option is to run the script via a cron job every 10 minutes. It would check for the existence of a file in a directory: if the file exists, the site has changed, so it runs wget and deletes the file when done.
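The cron variant can be sketched as a short script that cron runs every 10 minutes. The flag-file path, the rebuild callable and the function name are hypothetical: in practice the flag would be created by whatever marks the site as changed (for example an admin-side plugin), and rebuild would run the wget mirror. One deliberate variation from the description above: the flag is deleted before the crawl rather than after it, so a change made while wget is running re-creates the flag and is picked up on the next run.

```python
import tempfile
from pathlib import Path
from typing import Callable


def rebuild_if_flagged(flag: Path, rebuild: Callable[[], None]) -> bool:
    """Run rebuild() if flag exists; return True if a rebuild happened."""
    if not flag.exists():
        return False  # nothing changed since the last run
    # Remove the flag first, so a change made during the (possibly slow)
    # crawl re-creates it and triggers another rebuild next time.
    flag.unlink()
    try:
        rebuild()
    except Exception:
        flag.touch()  # restore the flag so the next cron run retries
        raise
    return True


if __name__ == "__main__":
    runs = []
    with tempfile.TemporaryDirectory() as tmp:
        flag = Path(tmp) / "site-changed.flag"
        flag.touch()  # stands in for "an article was saved"
        print(rebuild_if_flagged(flag, lambda: runs.append("wget")))  # True
        print(rebuild_if_flagged(flag, lambda: runs.append("wget")))  # False
    print(runs)  # ['wget']
```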

1. In other words, how does the cache become invalidated?

Check out how wget uses time-stamping: it compares the timestamp of the HTML file in the cache against the Last-Modified header your site returns. See the Send “Last-Modified” header? option in the advanced preferences.
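Roughly, wget -N re-downloads a file when there is no local copy, when the sizes differ, or when the server’s Last-Modified time is newer than the local file’s modification time. A simplified sketch of that decision, assuming the server sends a valid Last-Modified header; the function name is hypothetical and this is not wget’s own code.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def needs_download(local_mtime: Optional[float], local_size: Optional[int],
                   last_modified: str, remote_size: Optional[int]) -> bool:
    """Approximate wget's time-stamping decision for one file.

    local_mtime:   POSIX timestamp of the cached file, or None if there is none
    last_modified: value of the server's Last-Modified header
    remote_size:   Content-Length if known, else None
    """
    if local_mtime is None:
        return True  # no local copy yet
    if remote_size is not None and local_size != remote_size:
        return True  # size mismatch: download regardless of timestamps
    remote = parsedate_to_datetime(last_modified)
    if remote.tzinfo is None:  # a "-0000" zone parses as naive; treat it as UTC
        remote = remote.replace(tzinfo=timezone.utc)
    local = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote > local
```

Without the Last-Modified header there is nothing to compare against, which is why that Textpattern preference matters here.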

So, although I agree you could put together a workable “poor man’s cache” with the above techniques, and it might work fine for some (smallish) sites, I don’t think it approaches the utility of a fully baked-in solution.

Yes, it doesn’t come close to a full core solution but it could help a site sustain a Digg effect.

#22 2009-05-28 23:51:10

artagesw
Member
From: Seattle, WA
Registered: 2007-04-29
Posts: 227
Website

Re: Full page caching in TXP core

hcgtv wrote:

Check out how wget uses time-stamping, it checks the timestamp of the html file in cache against what your site returns via the Last-Modified header. See Send “Last-Modified” header? in the advanced preferences.

OK, so you are assuming that the “public facing” domain for the site is being served directly from the cache directory, and wget is fetching the underlying Textpattern site via a private IP? Otherwise, wget is not going to see any Last-Modified headers. If Apache is serving pages from the cache (via mod_rewrite rules), then wget is also going to be fed from the cache, not from Textpattern…

#23 2009-05-29 00:15:22

hcgtv
Plugin Author
From: Key Largo, Florida
Registered: 2005-11-29
Posts: 2,722
Website

Re: Full page caching in TXP core

artagesw wrote:

OK, so you are assuming that the “public facing” domain for the site is being served directly from the cache directory, and wget is fetching the underlying Textpattern site via a private IP?

Yes, you have to point wget at your dynamic Textpattern site so it can build the static pages, by whatever means you prefer: a private IP, a subdomain for the live site, or .htaccess rules that detect wget and feed it the live site while serving the static site to every other user agent.

You can also give wget a special user agent, to be sure the live site is fed to your own wget process rather than to someone using wget to leech your site.
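The .htaccess idea boils down to one routing decision per request. Here it is as a Python sketch rather than mod_rewrite syntax (a real rule would test the %{HTTP_USER_AGENT} variable). The user-agent string is hypothetical; you would pass the same string to wget with its --user-agent option.

```python
# Hypothetical private user agent; start wget with --user-agent=<this value>.
BUILDER_AGENT = "my-cache-builder-5f3a"


def route(user_agent: str, has_static_copy: bool) -> str:
    """Return 'dynamic' or 'static' for one request."""
    # Only our own crawler sees the live Textpattern site, so it never
    # re-crawls its own cached copies; any other wget gets the static files.
    if user_agent == BUILDER_AGENT:
        return "dynamic"
    return "static" if has_static_copy else "dynamic"
```

A user agent is trivial to spoof, so this only keeps casual leechers off the live site; it is not access control.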

#24 2009-05-29 00:30:34

artagesw
Member
From: Seattle, WA
Registered: 2007-04-29
Posts: 227
Website

Re: Full page caching in TXP core

hcgtv wrote:

Yes, you have to point wget at your dynamic Textpattern site so it can build the static pages. Whatever means you may employ, private IP, subdomain for live site or using .htaccess to detect wget and feed it the live site while serving static site to every other user agent.

OK, so now the issue of interactive pages creeps back in. If the public domain points at the static site, and we have configured wget to skip certain interactive pages (forms and such), then how do users reach those pages? With mod_rewrite rules on a per-URL basis?
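One possible answer is a short list of URL prefixes that always bypass the static copy, which is what per-URL mod_rewrite conditions would express. A sketch with a hypothetical prefix list and function name (/textpattern/ is included because it is Textpattern’s default admin directory):

```python
DYNAMIC_PREFIXES = ("/contact/", "/search/", "/textpattern/")


def must_stay_dynamic(request_path: str, method: str = "GET",
                      has_query: bool = False) -> bool:
    """True if the request must reach Textpattern instead of the static mirror."""
    if method not in ("GET", "HEAD") or has_query:
        return True  # form submissions, searches and other parameterised requests
    path = request_path if request_path.endswith("/") else request_path + "/"
    return path.startswith(DYNAMIC_PREFIXES)
```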
