Textpattern CMS support forum
Re: Search excerpt tag
I wasn’t complaining about the split; I was congratulating TXP on splitting it without validation problems. The search looks for occurrences of “txp”, not for TXP tags, so it is doing what was asked. Also note that the example ends with <txp:, so it has managed to split two tags within a single excerpt without a problem.
I would be perfectly happy with that. :)
Do you want me to try this new code out as well?
Stuart
In a Time of Universal Deceit
Telling the Truth is Revolutionary.
Re: Search excerpt tag
You could try out that new code… but I’ve found another possible solution: simply merge excerpt parts when the end of one part is also the beginning of the next:
$q = $pretext['q'];
$result = preg_replace('/\s+/', ' ', strip_tags(str_replace('><', '> <', $thisarticle['body'])));
- preg_match_all("/\b.{1,50}".preg_quote($q).".{1,50}\b/iu", $result, $concat);
+ preg_match_all("/(^|\s|\G).{1,50}".preg_quote($q).".{1,50}(\s|$)/iu", $result, $concat, PREG_OFFSET_CAPTURE);
- for ($i = 0, $r = array(); $i < min($limit, count($concat[0])); $i++)
+ for ($end = -1, $i = 0, $r = array(); $i < min($limit, count($concat[0])); $i++)
{
- $r[] = trim($concat[0][$i]);
+ if ($end == $concat[0][$i][1])
+ {
+ array_push($r, array_pop($r) . $concat[0][$i][0]);
+ }
+ else
+ {
+ $r[] = $concat[0][$i][0];
+ }
+
+ $end = strlen($concat[0][$i][0]) + $concat[0][$i][1];
}
+ $r = array_map('trim', $r);
+
$concat = join($break.n, $r);
$concat = preg_replace('/^[^>]+>/U', '', $concat);
For this article:
txp-begin This theme will only work with Textpattern 4.0.7 or above. I make no apologies for that. The theme makes use of the new “tags-within-tags” ability of the latest parser. It also uses the new tags @<txp:variable />@, @<txp:if_variable>@, @<txp:if_keywords>@ and @<txp:modified />@.
With the exception of @<txp:if_keywords>@ all the above tags are used in the “meta-data” column for the articles in the “articles” section. end-txp
This gives me the following excerpt:
…txp-begin This theme will only work with Textpattern … the latest parser. It also uses the new tags <txp:variable />, <txp:if_variable>, <txp:if_keywords> and <txp:modified />. With the exception of <txp:if_keywords> all the above tags are used in … for the articles in the “articles” section. end-txp …
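The merging step in the patch can be tried in isolation. Below is a minimal sketch, assuming PHP 7.1+ (for `[$a, $b]` destructuring in `foreach`); the function name `merged_excerpts()` and its `$width` parameter are hypothetical, and this is not the code that ships with Textpattern. A match is glued onto the previous excerpt only when its byte offset (from `PREG_OFFSET_CAPTURE`) equals the offset where the previous match ended, which is exactly the case the `\G` alternative lets through. As in the patch, `$limit` caps the raw matches before merging.

```php
<?php
// Minimal sketch (not Textpattern's own code): cut excerpts of up to $width
// characters on each side of $q and merge a match into the previous excerpt
// when it starts at the exact byte offset where the previous match ended.
// merged_excerpts() and $width are hypothetical names for illustration.
function merged_excerpts(string $text, string $q, int $limit = 5, int $width = 50): array
{
    $pattern = '/(^|\s|\G).{1,' . $width . '}' . preg_quote($q, '/')
             . '.{1,' . $width . '}(\s|$)/iu';
    preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE);

    $parts = [];
    $end = -1; // byte offset just past the previous match; -1 = no match yet

    // $m[0] holds [matched string, byte offset] pairs for the whole match.
    foreach (array_slice($m[0], 0, $limit) as [$match, $offset]) {
        if ($offset === $end) {
            // Contiguous with the previous excerpt: glue the two together.
            $parts[] = array_pop($parts) . $match;
        } else {
            $parts[] = $match;
        }
        $end = $offset + strlen($match);
    }

    return array_map('trim', $parts);
}
```

With a tiny context width, `merged_excerpts('ab txp cd x txp ef', 'txp', 5, 3)` yields one merged excerpt, while `merged_excerpts('ab txp cd zzzzz y txp ef', 'txp', 5, 3)` yields two separate ones, because the second match does not start where the first ended.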
Re: Search excerpt tag
Heh… try searching for part of an HTML entity, like the ‘8217’ from &#8217;
… oops. This doesn’t happen on your website (because most of the entities in your text were added by Textile), but it will if you type such an entity on the Write tab yourself.
This would be a very bad way to fix it:
$q = $pretext['q'];
$result = preg_replace('/\s+/', ' ', strip_tags(str_replace('><', '> <', $thisarticle['body'])));
- preg_match_all("/\b.{1,50}".preg_quote($q).".{1,50}\b/iu", $result, $concat);
+ preg_match_all("/(^|\s|\G).{0,50}".preg_quote($q).".{0,50}(\s|$)/iu", $result, $concat, PREG_OFFSET_CAPTURE);
- for ($i = 0, $r = array(); $i < min($limit, count($concat[0])); $i++)
+ for ($end = -1, $i = 0, $r = array(); $i < min($limit, count($concat[0])); $i++)
{
- $r[] = trim($concat[0][$i]);
+ if ($end == $concat[0][$i][1])
+ {
+ array_push($r, array_pop($r) . $concat[0][$i][0]);
+ }
+ else
+ {
+ $r[] = $concat[0][$i][0];
+ }
+
+ $end = strlen($concat[0][$i][0]) + $concat[0][$i][1];
}
+ $callback = create_function('$str', "return preg_replace('/^(\w{0,10};|[^>]+>)\s*/U', '', preg_replace('/\s*&[^;]*\$/', '', trim(\$str)));");
+ $r = array_map($callback, $r);
+
+ $concat = '';
+
+ for ($i = 0; $i < count($r); $i++)
+ {
+ // The capturing group keeps the entities in $p: even indices are text, odd indices are entities.
+ $p = preg_split('/(&#?\w{0,10};)/', $r[$i], -1, PREG_SPLIT_DELIM_CAPTURE);
+
+ // Highlight the search term only in the text parts, never inside an entity.
+ for ($j = 0; $j < count($p); $j += 2)
+ {
+ $p[$j] = preg_replace('/('.preg_quote($q).')/i', "<$hilight>$1</$hilight>", $p[$j]);
+ }
+
+ $r[$i] = join('', $p);
+ }
+
$concat = join($break.n, $r);
- $concat = preg_replace('/^[^>]+>/U', '', $concat);
- $concat = preg_replace('/('.preg_quote($q).')/i', "<$hilight>$1</$hilight>", $concat);
return ($concat) ? trim($break.$concat.$break) : '';
}
because, although it creates excerpts that pass validation, it still returns a search result when you search for part of an entity… and that’s not really what you want, because the search keyword appears nowhere in the article as displayed. Hmm… surely that can be done better.
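The highlighting half of the patch depends on one `preg_split()` detail: with `PREG_SPLIT_DELIM_CAPTURE`, the delimiters are returned only if the pattern contains a capturing group, so the entity pattern has to be wrapped in parentheses. Below is a minimal sketch of highlighting only the text between entities; the helper name `highlight_outside_entities()` and the default `strong` tag are assumptions for illustration. As noted above, this only keeps the markup valid: the article is still returned as a search hit.

```php
<?php
// Minimal sketch: wrap matches of $q in <$tag>...</$tag>, but only in the text
// between HTML entities, so a query such as "8217" never touches "&#8217;".
// highlight_outside_entities() and the default tag are hypothetical.
function highlight_outside_entities(string $excerpt, string $q, string $tag = 'strong'): string
{
    // The capturing group makes preg_split keep the entities in the result:
    // even indices hold plain text, odd indices hold the entities themselves.
    $parts = preg_split('/(&#?\w{0,10};)/', $excerpt, -1, PREG_SPLIT_DELIM_CAPTURE);

    for ($i = 0; $i < count($parts); $i += 2) {
        // "$1" is not a valid PHP variable name, so it reaches preg_replace literally.
        $parts[$i] = preg_replace('/(' . preg_quote($q, '/') . ')/i', "<$tag>$1</$tag>", $parts[$i]);
    }

    return implode('', $parts);
}
```

For example, `highlight_outside_entities('It&#8217;s 8217 here', '8217')` highlights only the standalone “8217” and leaves the entity intact.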
Last edited by ruud (2008-12-30 14:11:15)
Re: Search excerpt tag
On that site I’m going to make the theme’s “About” section searchable, as I’m sure it contains articles with non-textiled entities, and see what I get.
Hang on. What happens if an article contains something like <div> on its own line, with a blank space in front of it so that it isn’t textiled? How does that get saved?
Anyway, doing a search for “div” gives two results.
The first is what I think would be termed a “partial” match which is fine.
The second only displays the article title, no excerpt(s). This particular article does not contain the word “div”, but it does contain a lot of <div> tags.
Last edited by thebombsite (2008-12-31 16:03:32)
Stuart
Re: Search excerpt tag
Tags are stripped from excerpts, so they never show up. Searching for part of an HTML entity is the only thing that really causes problems.
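A rough sketch of the first step of both patches above (the `$result = ...` line), which turns the article body into the plain text that excerpts are cut from; the function name `excerpt_source()` is hypothetical. Because tags are removed before the excerpt is built, an article that presumably matched “div” on its stored body can show up with a title but no excerpt.

```php
<?php
// Sketch of the text-preparation step shared by both patches: excerpts are
// cut from the article body only after its tags have been removed.
// excerpt_source() is a hypothetical name for illustration.
function excerpt_source(string $body): string
{
    // Put a space between adjacent tags so "</p><p>" doesn't glue words together.
    $spaced = str_replace('><', '> <', $body);

    // Drop all tags, then collapse runs of whitespace (including newlines) to one space.
    return preg_replace('/\s+/', ' ', strip_tags($spaced));
}
```

The `str_replace()` matters because `strip_tags('<p>One</p><p>Two</p>')` on its own gives `OneTwo`, while `excerpt_source()` keeps the words apart.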
The patch shown above is far too invasive for 4.0.8. I might go for a small patch like the one in post #12. Not perfect, but enough to avoid the most common problems.
Re: Search excerpt tag
OK. I’m still using the code from #10 so I’ll swap to #12 and let you know if anything odd occurs.
Stuart
Re: Search excerpt tag
Only the first part of #12.
Re: Search excerpt tag
Yep. :)
Stuart
Re: Search excerpt tag
I should just mention that I’ve included the new code on thebombsite.com and I’m not seeing any problems with it yet. Searching for “txp” returns a lot more results, and everything seems to validate. I’ll let you know if I catch anything.
Stuart