Textpattern CMS support forum
moved urls - any advice?
With the relaunch of neme.org and the launch of a new subdomain, news.neme.org, hosting our legacy content, we are left with 1627 urls that currently return a 404 on our main domain.
Does anyone have any experience of how to deal with this? As the previous version of our main domain used the /id/title scheme, I cannot use a general rule to 301-redirect whole sections to the news site. Going the PHP way would be just as bad.
Adding over 1600 redirects to the .htaccess file would probably end in a 500 error at worst, or in massive resource usage on every page request at best.
Any advice appreciated.
Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.
Re: moved urls - any advice?
A really wild guess here:
If the id numbers are still the same as they were, perhaps you could redirect from /id/title to index.php?id=__ so that the correct page is shown, and then make sure you have rel="canonical" set up properly with the current article URL. Something like:
RewriteRule ^(\d+)/.*$ http://news.neme.org/index.php?id=$1 [R=301,L]
Might that work? Hopefully search engines will then index the canonical url, but really I’m not sure.
What do the experts say?
— —
BTW: today neme.org is reachable from the same computer where it was unreachable yesterday.
BTW II: very nice of you to publish your templates. Great gesture of sharing.
TXP Builders – finely-crafted code, design and txp
Re: moved urls - any advice?
Hi Julian and thanks so much again.
I forgot to mention one problem.
I moved all the texts to the neme.org site, where they originally resided, but they now have different ids and urls. Interestingly, and thankfully, zem_redirect redirects neme.org/82/why-have-there-been-no-great-net-artists to its new url at neme.org/texts/why-have-there-been-no-great-net-artists even though the article has a new id. I did this because the texts are one of our main resources and we really wanted them to remain on the main domain.
The rest of the content now at the news site lists major international art exhibitions and calls.
The problem I am facing is that if I use an id-based rule, anyone requesting one of those texts will be (wrongly) redirected to the news site. Many of those texts have been cited in a number of books and academic journals (under both the old and recent urls) and we would very much like to keep them reachable for our community.
On the other hand, we receive proportionally more hits through Google for the rest of the pages, which are of course far more numerous.
Commerce v. art. I have always chosen the second, but it would be nice if both could be satisfied, something I have never managed to do.
I’m glad the site is now reachable.
Re the source… I have been thinking of releasing it for some years now but only got around to it today. The code still needs beautification and optimisation, but I thought I should post it as is to push myself towards that goal.
Yiannis
Re: moved urls - any advice?
colak wrote #302358:
I forgot to mention one problem. … The problem I am facing is that if I use a rule which is going to be based on ids anyone calling one of those texts will be redirected (wrongly) to the news site.
Sounds increasingly tricky. Is there at least any way of identifying a group of id numbers that are your texts and a group that are not? If so, maybe this Stack Overflow thread holds some answers, perhaps something along the lines of answer number 4.
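For illustration: if the text ids can be listed, a single rule with an alternation of those ids might cover all of them. This is an untested sketch that assumes every text kept its old url-title under /texts/ (as the one example in this thread does) and that news articles kept theirs on news.neme.org; ids 97 and 311 are hypothetical placeholders.

```apache
# .htaccess on www.neme.org, placed above Textpattern's own rewrite rules.
RewriteEngine On

# Texts: old /{id}/{url-title} -> /texts/{url-title}.
# 82 comes from this thread; 97 and 311 are hypothetical placeholders
# for the rest of the 104 text ids.
RewriteRule "^(82|97|311)/(.+)$" "http://www.neme.org/texts/$2" [R=301,L]

# Every other /{id}/{url-title} is assumed to be news: strip the id.
RewriteRule "^\d+/(.+)$" "http://news.neme.org/$1" [R=301,L]
```

Because each id in the alternation is followed by a slash, 82 will not accidentally match an old url such as /821/….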
Another option might be to use an Apache RewriteMap if your host allows. Apparently that is better at handling large numbers of redirections, and you can prepare a from->to matching scheme as a plain text file. I have no experience of it but you might want to google it – for example. If you have any luck with that, please report back.
Re: moved urls - any advice?
Thanks! I wrote to the host to ask whether they support RewriteMaps. I will of course report back.
Yiannis
Re: moved urls - any advice?
I received the answer:
you can install a simple Apache instance and configure it to perform these redirects. More information is available here:
https://community.webfaction.com/questions/14339/installing-a-private-instance-of-the-shared-apache-centos-6
The problem is that, digging into RewriteMaps, they actually seem to rely on regex for such volumes of redirects. One way to deal with this issue might be to re-evaluate 404s and compare them with the available ids on the subdomain. These could of course be saved in a comma-separated flat file to avoid needless database requests. Basically, if the id in the requested uri is in the file, a 301 could be triggered, allowing zem_redirect to rewrite /id/ to /id/title; if not, a 404 is returned.
This is the theory. In practice I have no idea how to do it apart from producing the flat file and seeking advice from the rubber ducks here. ^^
Yiannis
Re: moved urls - any advice?
Hmmm… Nothing?
Yiannis
Re: moved urls - any advice?
Perhaps it really is worth doing further research on the RewriteMap approach. From cursory reading around, it seems to be for exactly the kind of situation you have – lots of individual redirects with a non-generalisable pattern.
As far as I can tell, the regex bit is only there to catch the pattern of the incoming url requests that need handling via the rewritemap. If you have dropped the ids from the url scheme in the new site, you only need to apply the rewritemap to urls beginning with an id, e.g. ^(\d{1,})/.*$. Those urls would then be compared against your long list of {old-id-number} http://domain.com/new-address entries in your txt-format rewritemap file and redirected (with a 301 Moved Permanently) to the new address. All other incoming url requests would bypass that and be handled by txp as usual.
If I've understood that correctly, that would have minimal performance impact on your regular site, as it applies only to /123/article-url-title style urls (those prefixed with an id), and once the search engines have updated their indexes that situation will become increasingly infrequent too – it would then apply only to old links from other sites.
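A minimal sketch of the setup described above, untested on this host. Note that RewriteMap is only allowed in the server or virtual host configuration, not in .htaccess, which may be why the host suggested a private Apache instance. The map name oldids and the file path are hypothetical; the file would hold one "old-id new-url" pair per line.

```apache
# Server or <VirtualHost> config for www.neme.org (RewriteMap is not
# permitted in .htaccess). Map name and path are hypothetical.
RewriteEngine On
RewriteMap oldids "txt:/home/neme/oldids.map"

# Only urls starting with a numeric id reach the map lookup.
# The condition lets ids missing from the map fall through
# to the normal site (and its 404 handling).
RewriteCond "${oldids:$1|NOTFOUND}" "!=NOTFOUND"
RewriteRule "^/(\d+)/" "${oldids:$1}" [R=301,L]
```

In server or virtual host context the matched path starts with a slash, hence ^/ in the pattern; the same rule in .htaccess would drop it.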
Having never tried it out, this is all “in theory” but I reckon it could get you there. I’ve no idea how you do that with webfaction’s specific setup of Apps, but you could ask them for more help with your specific situation. They can be quite helpful.
Some links for further reading:
- Plain text rewrite maps – docs on the apache site
- The rewritemap solution for multiple abbreviations – Peachpit: think id-nums instead of abbreviations here.
- Tons of 301 redirects without slowing down site – on the old webfaction community forum
- Mass redirects with the help of rewrite map – in German but the examples are code.
Re: moved urls - any advice?
jakob wrote #302411:
As far as I can tell, the regex bit is only to catch the pattern of the incoming url requests that need handling via the rewritemap. If you have dropped the ids from the url scheme in the new site, that means you only need apply the rewritemap to those urls beginning with an id, e.g. ^(\d{1,})/.*$.
If only it were that simple :(
We currently have 104 texts which are reached just fine using the zem_redirect plugin.
Yiannis
Re: moved urls - any advice?
colak wrote #302413:
If only it were that simple :(
We currently have 104 texts which are reached just fine using the zem_redirect plugin.
Am I still misunderstanding you? I don’t mean passing through the id like I mentioned earlier.
As I understand it, you have two kinds of urls that need redirecting:
- news urls
  - old: neme.org/123/news-article
  - new: news.neme.org/news-article
- texts urls (intermingled with the other ids)
  - old: neme.org/82/why-have-there-been-no-great-net-artists
  - new: neme.org/texts/why-have-there-been-no-great-net-artists (with a different id)
And the rest of the new neme.org site now just has /section/article-url-title urls.
So, what the to-be-redirected urls have in common is an /{id-number}/ component in the url, and none of the rest of the site will have that.
So my suggestion would be to divert all incoming url requests with the id-number component to your rewritemap and then create a text file that maps the old id number to the new url for all 1627 articles. Your map file would be something like:
# old id new url
123 http://news.neme.org/news-article
82 http://www.neme.org/texts/why-have-there-been-no-great-net-artists
… and so on for all the urls to be redirected (note that they are redirecting to different sites based on the old id).
A further idea: you might even be able to shorten the rewritemap to just the 104 text urls and then use either a rewrite for the remainder or the not-found option to send those to the news site via a regex pattern.
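A sketch of that shortened variant, under the same assumptions as a plain RewriteMap setup (server or virtual host config; hypothetical map name and path), and additionally assuming news articles kept their url-titles on news.neme.org, as the news.neme.org/news-article example suggests. The map's not-found default is what lets non-text ids fall through to the regex rule.

```apache
# Variant: the map holds only the 104 text ids (old id -> new url).
# Server or <VirtualHost> config; map name and path are hypothetical.
RewriteEngine On
RewriteMap texts "txt:/home/neme/texts.map"

# 1. Old text ids found in the map go to their new /texts/ address.
RewriteCond "${texts:$1|NOTFOUND}" "!=NOTFOUND"
RewriteRule "^/(\d+)/" "${texts:$1}" [R=301,L]

# 2. Any other /{id}/{url-title} is assumed to be news: strip the id.
RewriteRule "^/\d+/(.+)$" "http://news.neme.org/$1" [R=301,L]
```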
Re: moved urls - any advice?
I suppose, as a final resort, you could do the latter with htaccess too:
- make individual redirects for the 104 text pages
- after that a final line that rewrites all the urls with an id to the news site, stripped of the id.
The htaccess would still be long, but nothing like the 1600 entries. The downside is that it would be processed for every url request… (or maybe you can first add a RewriteCond for urls with an id in them, so that the regular urls skip it immediately).
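An .htaccess-only sketch of that fallback, including the skip idea: a negated rule with the S (skip) flag jumps over the redirect block for any url that does not start with an id. Only id 82 and its target come from this thread; the second text rule is a hypothetical placeholder, and the S count has to be kept in sync with the number of rules in the block.

```apache
# .htaccess on www.neme.org, placed above Textpattern's own rewrite rules.
RewriteEngine On

# Guard: if the url does NOT start with a numeric id, skip the next
# 3 rules. S must equal the number of rules below it in this block
# (on the real site: 104 text rules + 1 catch-all = 105).
RewriteRule "!^\d+/" "-" [S=3]

# One rule per text (82 is from this thread; 97 is hypothetical).
RewriteRule "^82/" "http://www.neme.org/texts/why-have-there-been-no-great-net-artists" [R=301,L]
RewriteRule "^97/" "http://www.neme.org/texts/another-text" [R=301,L]

# Catch-all: remaining ids go to the news site, with the id stripped.
RewriteRule "^\d+/(.+)$" "http://news.neme.org/$1" [R=301,L]
```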
After leaving it in place for a while, you could remove the rules again once you're satisfied that the search engines are using the new urls. Then, for the remaining links to your site on other websites, you could set up your 404 error page to suggest where the page might be, based on the requested url. There was a thread about a 404 page that took the url and automatically ran a search on it, but I can't find it off-hand. Or you could just suggest they try either of the two links: /texts/article-url-title-of-404-page or news.neme.org/article-url-title-of-404-page. One of them will work.
Re: moved urls - any advice?
Hi Julian,
Thanks for your patience. I generated the map. That was very easy using some basic txp tags:
<txp:article_custom limit="2000" break="br" wraptag="p" sort="id asc">
<txp:article_id /> <txp:permlink />
</txp:article_custom>
The problem now is to implement it on the server :)
Yiannis
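One hedged way to wire the generated list in: save the output as a plain text file (one "id url" pair per line, with the <p> and <br /> markup stripped) and reference it from the virtual host config. The keys need to be the old ids and the values the new urls; for the texts, which now have new ids and urls, those lines may need adjusting by hand. For larger maps, the Apache docs suggest converting the text file to a DBM hash with the httxt2dbm utility that ships with Apache. The paths below are hypothetical.

```apache
# Option A: plain text map, as generated (path is hypothetical).
RewriteMap oldids "txt:/home/neme/oldids.map"

# Option B: DBM hash for faster lookups, built beforehand with
#   httxt2dbm -i /home/neme/oldids.map -o /home/neme/oldids.dbm
# and used instead of option A:
# RewriteMap oldids "dbm:/home/neme/oldids.dbm"
```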