Textpattern CMS support forum

#1 2008-03-20 04:40:21

Kossatsch
Member
From: St. Wolfgang
Registered: 2004-04-01
Posts: 198

[request] Broken link checker

Does anybody think such a thing could be useful for TXP?

Well, I do.

The Broken Links tab displays a list of invalid URLs found along with the relevant posts and the anchor text of the links. “View” and “Edit Post” do exactly what they say and “Discard” will remove the message about a broken link, but not the link itself (so it will show up again later unless you fix it; this plugin doesn’t modify your links).
By default all old posts/links are re-checked every 72 hours, or you can set a different time period.


txp at irox.de since spring 2004 (g1.17) & at roxomatic since 2006.

#2 2008-03-20 18:29:05

Mary
Sock Enthusiast
Registered: 2004-06-27
Posts: 6,236

Re: [request] Broken link checker

It runs CURL on all your links. That could get very intense if you have a lot of links. Why not use some of the other tools available?

#3 2008-03-20 19:50:30

Kossatsch
Member
From: St. Wolfgang
Registered: 2004-04-01
Posts: 198

Re: [request] Broken link checker

It is not about having a link checker, it is about having a link checker as part of the TXP user interface, so that you have some choice and an easy follow-up once the links are checked (something like “set this article as draft”, “delete article” or “delete link”).

Wouldn’t it be possible to run this process with pauses or at low priority?
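
One possible reading of “with pauses or at low priority” is to check links in small batches and sleep between batches. The sketch below is in Python purely for illustration (a TXP plugin would be written in PHP). The names `check_in_batches`, `fetch_status`, `batch_size` and `pause_seconds` are hypothetical, and `fetch_status` stands for whatever function actually requests a URL and returns its status code.

```python
import time


def check_in_batches(urls, fetch_status, batch_size=5, pause_seconds=2.0,
                     sleep=time.sleep):
    """Check urls a few at a time, pausing between batches to keep load low.

    fetch_status is any callable that takes a URL and returns a status code.
    sleep is injectable so the pause can be replaced in tests.
    """
    results = {}
    for start in range(0, len(urls), batch_size):
        if start:
            sleep(pause_seconds)  # pause before every batch except the first
        for url in urls[start:start + batch_size]:
            results[url] = fetch_status(url)
    return results
```

This only spreads the requests out over time; it does not reduce their total number.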

#4 2008-06-20 03:58:55

tomk
Member
Registered: 2008-05-15
Posts: 12

Re: [request] Broken link checker

As part of a project, I’m learning how to develop a plug-in for TXP.
I’m taking up your request and making a broken link checker.
My first goal is to have it extract, with a regular expression, all URLs from the Body_url field of each article and send each one to a script that returns the page’s status code (200 OK, 404 Not Found, etc.).
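
A minimal sketch of that first goal, written in Python for illustration rather than the PHP a TXP plugin would use. Everything named here (`URL_RE`, `extract_urls`, `http_status`, `broken_links`) is hypothetical, the regular expression is deliberately simple and only catches absolute http(s) URLs, and the articles are assumed to arrive as a plain mapping of article ID to body text rather than from the database.

```python
import re
import urllib.error
import urllib.request

# Simple pattern: absolute http(s) URLs, ending at whitespace, quotes or angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(body):
    """Return the distinct URLs in an article body, in order of first appearance."""
    return list(dict.fromkeys(URL_RE.findall(body)))


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 404, 500, etc. still carry a status code
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, malformed URL


def broken_links(articles, fetch_status=http_status):
    """Yield (article_id, url, status) for every link whose status is not 200.

    articles maps an article ID to its body text (an assumed input shape).
    """
    for article_id, body in articles.items():
        for url in extract_urls(body):
            status = fetch_status(url)
            if status != 200:
                yield article_id, url, status
```

Passing `fetch_status` as a parameter keeps the extraction logic testable without network access. Note that `urlopen` follows redirects, so a redirected link reports the status of its final destination.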

I’ll place a new tab under the Extensions tab in the admin interface, and the push of a button will retrieve the results: the broken links (any status other than 200), the article ID/title (I’ll see which info is relevant), and an option to go to the post.

My second goal would be just what you describe in your last post: offering to set the article as draft, delete the link, or maybe suggest a fix automatically (depending on the regex I come up with).

I’d be happy to read any comments on the approach I’ll take, and all suggestions are welcome.

#5 2008-06-20 08:29:54

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,447

Re: [request] Broken link checker

tomk wrote:

My first goal is to have it extract, with a regular expression, all URLs from the Body_url field of each article…

Nice. One thing with this approach — unless I’ve misunderstood — is I’m not sure if it would pick up URLs from Forms or Pages, Links, etc. For example, I often use a form called navbar which contains all my section links and I tend to forget to update it when I rename stuff. Would your checker notice this?

In my limited knowledge of this area, wouldn’t a link checker have to either:

  1. run through each article and render/parse them into their full (X)HTML as they are going to be seen on the client side, then check each link in turn
  2. check each article body/excerpt, form, page, link, image and file reference in all the relevant tables on the admin side/database

I don’t really know what’s involved as I’ve not thought it through. Perhaps you have a better idea that I’ve not thought of at this hour on a Friday morning!

I’d guess #1 would mean checking a lot of the same links more than once, so it would be slow unless you kept a record of which links you’d already checked and skipped them the next time you saw an identical link on another page. And if you go this route, perhaps one of the publicly available link checkers that Mary highlighted would save you having to write a plugin ;-)
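
The “keep a record of links already checked” idea could look roughly like this. It is only a sketch in Python (not plugin code), with hypothetical names `make_cached_checker` and `broken_by_page`, and it assumes the links for each rendered page have already been extracted into a list.

```python
def make_cached_checker(fetch_status):
    """Wrap fetch_status so each distinct URL is fetched only once per run."""
    seen = {}  # URL -> status code from the first (and only) real check

    def check(url):
        if url not in seen:
            seen[url] = fetch_status(url)
        return seen[url]

    return check


def broken_by_page(pages, check):
    """Map each page to its links whose status is not 200.

    pages maps a page name to the list of URLs found on it (assumed input).
    Pages without broken links are left out of the result.
    """
    report = {}
    for page, urls in pages.items():
        bad = [url for url in urls if check(url) != 200]
        if bad:
            report[page] = bad
    return report
```

A navigation link that appears on every page is then requested once, however many pages contain it.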

Now, #2 would be incredibly useful insofar as you could tell people which form/page/etc the broken link stems from instead of just saying “there’s one” and we have to then hunt through forms/pages to find where the link is generated. Trouble is, the link might be made up like this:

<txp:site_url /><txp:section />

So a regex wouldn’t necessarily pick it up. You’d probably need to parse the form in its full article context in order to work out where the URL resolved.
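
To illustrate the point, here is a small Python sketch. The `render` function is a crude stand-in for real TXP parsing (it just substitutes fixed strings), and the values in `context` are made up for the example. Nothing here is Textpattern’s actual API.

```python
import re

# Same kind of naive pattern a first-pass checker might use.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

raw_form = '<a href="<txp:site_url /><txp:section />">Section</a>'

# Hypothetical output of each tag in one particular article context.
context = {
    "<txp:site_url />": "http://example.com/",
    "<txp:section />": "articles",
}


def render(markup, values):
    """Stand-in for parsing: replace each tag with the value it would output."""
    for tag, value in values.items():
        markup = markup.replace(tag, value)
    return markup


found_raw = URL_RE.findall(raw_form)                        # nothing to match yet
found_rendered = URL_RE.findall(render(raw_form, context))  # URL exists only after rendering
```

Run against the raw form, the pattern finds no URL at all. The link only becomes visible once the tags have been resolved for a specific context.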

I’m not trying to put you off writing it, as your ideas for extending this plugin sound great. If you can pull it off successfully it’d gain a following.


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

#6 2008-06-20 13:30:32

tomk
Member
Registered: 2008-05-15
Posts: 12

Re: [request] Broken link checker

To begin with, I prefer suggestion #2, which was my original plan, as it seems more important to me to show which article the broken link was found in and offer to go and edit it (and maybe some more options).
Handling TXP tags could be an enhancement for later (which I promise I’ll revisit), but for a first version at least I want an admin user to be able to check hard-coded URLs that lead to external pages. Making this better would be the next step.
