Textpattern CMS support forum

#13 2008-12-30 13:05:05

thebombsite
Archived Plugin Author
From: Exmouth, England
Registered: 2004-08-24
Posts: 3,251
Website

Re: Search excerpt tag

I wasn’t complaining about the split; I was congratulating TXP on splitting it without validation problems. The search is looking for occurrences of “txp”, not for txp tags, so it is doing what is asked for. Also note that the example ends with <txp: so it has managed to split two tags within a single excerpt without a problem.

I would be perfectly happy with that. :)

Do you want me to try this new code out as well?


Stuart

In a Time of Universal Deceit
Telling the Truth is Revolutionary.

Offline

#14 2008-12-30 13:15:20

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068
Website

Re: Search excerpt tag

You could try out that new code… but I’ve found another possible solution: simply merge excerpt parts when one part ends exactly where the following part begins:

 		$q = $pretext['q'];

 		$result = preg_replace('/\s+/', ' ', strip_tags(str_replace('><', '> <', $thisarticle['body'])));
-		preg_match_all("/\b.{1,50}".preg_quote($q).".{1,50}\b/iu", $result, $concat);
+		preg_match_all("/(^|\s|\G).{1,50}".preg_quote($q).".{1,50}(\s|$)/iu", $result, $concat, PREG_OFFSET_CAPTURE);

-		for ($i = 0, $r = array(); $i < min($limit, count($concat[0])); $i++)
+		for ($end = -1, $i = 0, $r = array(); $i < min($limit, count($concat[0])); $i++)
 		{
-			$r[] = trim($concat[0][$i]);
+			if ($end == $concat[0][$i][1])
+			{
+				array_push($r, array_pop($r) . $concat[0][$i][0]);
+			}
+			else
+			{
+				$r[] = $concat[0][$i][0];
+			}
+
+			$end = strlen($concat[0][$i][0]) + $concat[0][$i][1];
 		}

+		$r = array_map('trim', $r);
+
 		$concat = join($break.n, $r);
 		$concat = preg_replace('/^[^>]+>/U', '', $concat);

For this article:

txp-begin This theme will only work with Textpattern 4.0.7 or above. I make no apologies for that. The theme makes use of the new “tags-within-tags” ability of the latest parser. It also uses the new tags @<txp:variable />@, @<txp:if_variable>@, @<txp:if_keywords>@ and @<txp:modified />@.

With the exception of @<txp:if_keywords>@ all the above tags are used in the “meta-data” column for the articles in the “articles” section. end-txp

this gives me the following excerpt:

…txp-begin This theme will only work with Textpattern … the latest parser. It also uses the new tags <txp:variable />, <txp:if_variable>, <txp:if_keywords> and <txp:modified />. With the exception of <txp:if_keywords> all the above tags are used in … for the articles in the “articles” section. end-txp …
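
The merging step in the patch above can be tried outside Textpattern with a small standalone sketch. The function name `merged_excerpt_parts` and its `$width` parameter are hypothetical and only there to keep the example short; the regular expression and the offset comparison follow the patch.

```php
<?php
// Hypothetical helper, not part of Textpattern: collect up to $limit excerpt
// parts around $q and glue together parts that touch each other in the text.
function merged_excerpt_parts($text, $q, $limit = 5, $width = 50)
{
    $pattern = '/(^|\s|\G).{1,' . $width . '}' . preg_quote($q, '/')
             . '.{1,' . $width . '}(\s|$)/iu';

    // PREG_OFFSET_CAPTURE returns each match as array(string, byte offset).
    preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE);

    $parts = array();
    $end = -1; // byte offset just past the previous match

    foreach (array_slice($m[0], 0, $limit) as $match) {
        list($str, $offset) = $match;

        if ($offset === $end && $parts) {
            // This match starts exactly where the previous one stopped.
            $parts[count($parts) - 1] .= $str;
        } else {
            $parts[] = $str;
        }

        $end = $offset + strlen($str);
    }

    return array_map('trim', $parts);
}
```

With a narrow width of 5, `merged_excerpt_parts('one txp two abc txp four', 'txp', 5, 5)` returns a single merged part, while the same two hits separated by longer filler text stay as two parts.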

Offline

#15 2008-12-30 13:36:57

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068
Website

Re: Search excerpt tag

heh… try searching for part of an HTML entity like the ‘8217’ from &#8217; … oops. This doesn’t happen on your website (because most of the entities in your text were added by textile), but if you include such an entity on the write tab yourself, that will happen.
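
To see the problem in isolation, here is a minimal sketch of the plain highlighting replacement applied to text that contains a numeric entity. The function name `naive_highlight` is hypothetical; the replacement itself mirrors the existing `preg_replace` highlighting line in the tag.

```php
<?php
// Hypothetical helper: wrap every occurrence of $q in a <$hilight> element,
// without looking at whether the hit sits inside an HTML entity.
function naive_highlight($text, $q, $hilight = 'strong')
{
    return preg_replace(
        '/(' . preg_quote($q, '/') . ')/i',
        '<' . $hilight . '>$1</' . $hilight . '>',
        $text
    );
}

// The entity &#8217; gets cut in two by the inserted markup.
echo naive_highlight('It&#8217;s a test', '8217'), "\n";
// prints: It&#<strong>8217</strong>;s a test
```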

This would be a very bad way to fix it:

               $q = $pretext['q'];

                $result = preg_replace('/\s+/', ' ', strip_tags(str_replace('><', '> <', $thisarticle['body'])));
-               preg_match_all("/\b.{1,50}".preg_quote($q).".{1,50}\b/iu", $result, $concat);
+               preg_match_all("/(^|\s|\G).{0,50}".preg_quote($q).".{0,50}(\s|$)/iu", $result, $concat, PREG_OFFSET_CAPTURE);

-               for ($i = 0, $r = array(); $i < min($limit, count($concat[0])); $i++)
+               for ($end = -1, $i = 0, $r = array(); $i < min($limit, count($concat[0])); $i++)
                {
-                       $r[] = trim($concat[0][$i]);
+                       if ($end == $concat[0][$i][1])
+                       {
+                               array_push($r, array_pop($r) . $concat[0][$i][0]);
+                       }
+                       else
+                       {
+                               $r[] = $concat[0][$i][0];
+                       }
+
+                       $end = strlen($concat[0][$i][0]) + $concat[0][$i][1];
                }

+               $callback = create_function('$str', "return preg_replace('/^(\w{0,10};|[^>]+>)\s*/U', '', preg_replace('/\s*&[^;]*\$/', '', trim(\$str)));");
+               $r = array_map($callback, $r);
+
+               $concat = '';
+
+               for ($i = 0; $i < count($r); $i++)
+               {
+                       // The capturing group keeps the entities in $p (odd indexes), so only the text between them is highlighted
+                       $p = preg_split('/(&#?\w{0,10};)/', $r[$i], -1, PREG_SPLIT_DELIM_CAPTURE);
+
+                       for ($j = 0; $j < count($p); $j += 2)
+                       {
+                               $p[$j] = preg_replace('/('.preg_quote($q).')/i', "<$hilight>$1</$hilight>", $p[$j]);
+                       }
+
+                       $r[$i] = join('', $p);
+               }
+
                $concat = join($break.n, $r);
-               $concat = preg_replace('/^[^>]+>/U', '', $concat);
-               $concat = preg_replace('/('.preg_quote($q).')/i', "<$hilight>$1</$hilight>", $concat);

                return ($concat) ? trim($break.$concat.$break) : '';
        }

because while it creates excerpts that pass validation, it still gives search results if you search for part of an entity… and that’s not really what you want, because you don’t see such a search keyword anywhere in the article when it’s displayed. Hmm… surely, that can be done better.
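
The entity-aware highlighting idea from the patch can be sketched on its own as follows. The function name `highlight_outside_entities` is hypothetical. Note that `preg_split()` only returns the delimiters when the pattern contains a capturing group, which this sketch assumes.

```php
<?php
// Hypothetical helper: highlight $q only in the text between HTML entities,
// leaving the entities themselves untouched.
function highlight_outside_entities($text, $q, $hilight = 'strong')
{
    // With PREG_SPLIT_DELIM_CAPTURE and a capturing group, even indexes
    // hold plain text and odd indexes hold the matched entities.
    $p = preg_split('/(&#?\w{0,10};)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    for ($j = 0; $j < count($p); $j += 2) {
        $p[$j] = preg_replace(
            '/(' . preg_quote($q, '/') . ')/i',
            '<' . $hilight . '>$1</' . $hilight . '>',
            $p[$j]
        );
    }

    return join('', $p);
}

echo highlight_outside_entities('It&#8217;s 8217 here', '8217'), "\n";
// prints: It&#8217;s <strong>8217</strong> here
```

This keeps the markup valid, but as noted above it does nothing about the article being returned as a search result in the first place.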

Last edited by ruud (2008-12-30 14:11:15)

Offline

#16 2008-12-31 15:02:30

thebombsite
Archived Plugin Author
From: Exmouth, England
Registered: 2004-08-24
Posts: 3,251
Website

Re: Search excerpt tag

On that site I’m going to open up the “About” section of the theme for searching as I’m sure it contains articles that have non-textiled entities in them and see what I get.

Hang on. What happens if an article contains something like <div> on its own line but has a blank space in front of it so that it isn’t textiled? How does that get saved?

Anyway, doing a search for “div” gives two results.
The first is what I think would be termed a “partial” match which is fine.
The second only displays the article title, no excerpt(s). This particular article does not contain the word “div” but it does contain a lot of <div> tags.

Last edited by thebombsite (2008-12-31 16:03:32)


Stuart

In a Time of Universal Deceit
Telling the Truth is Revolutionary.

Offline

#17 2008-12-31 15:35:58

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068
Website

Re: Search excerpt tag

Tags are stripped from excerpts, so they never show up. Searching for part of an HTML entity is the only thing that really causes problems.
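
As a rough illustration of why a search for “div” can return an article with no excerpt, this sketch repeats the tag-stripping step from the patches above on a body that contains only <div> tags. The function name `excerpt_source` is hypothetical.

```php
<?php
// Hypothetical helper: the text that excerpts are cut from, i.e. the body
// with tags stripped and runs of whitespace collapsed to single spaces.
function excerpt_source($body)
{
    // Put a space between adjacent tags first so words do not run together.
    return preg_replace('/\s+/', ' ', strip_tags(str_replace('><', '> <', $body)));
}

$body = '<div><p>Hello</p></div><div>world</div>';
$text = excerpt_source($body);

// The stored body contains "div", but the stripped text does not,
// so there is nothing to build an excerpt from.
var_dump(stripos($body, 'div') !== false); // bool(true)
var_dump(stripos($text, 'div') !== false); // bool(false)
```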

The patch shown above is far too invasive for 4.0.8. I might go for a small patch like the one in post #12. Not perfect, but enough to avoid the most common problems.

Offline

#18 2008-12-31 16:05:26

thebombsite
Archived Plugin Author
From: Exmouth, England
Registered: 2004-08-24
Posts: 3,251
Website

Re: Search excerpt tag

OK. I’m still using the code from #10 so I’ll swap to #12 and let you know if anything odd occurs.


Stuart

In a Time of Universal Deceit
Telling the Truth is Revolutionary.

Offline

#19 2008-12-31 19:05:10

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068
Website

Re: Search excerpt tag

Only the first part of #12

Offline

#20 2008-12-31 19:14:08

thebombsite
Archived Plugin Author
From: Exmouth, England
Registered: 2004-08-24
Posts: 3,251
Website

Re: Search excerpt tag

Yep. :)


Stuart

In a Time of Universal Deceit
Telling the Truth is Revolutionary.

Offline

#21 2009-01-03 12:51:38

thebombsite
Archived Plugin Author
From: Exmouth, England
Registered: 2004-08-24
Posts: 3,251
Website

Re: Search excerpt tag

I should just mention that I’ve included the new code in thebombsite.com and I’m not seeing any problems with it yet. Doing a search for “txp” throws out a lot more results and everything seems to validate. I’ll let you know if I catch anything.


Stuart

In a Time of Universal Deceit
Telling the Truth is Revolutionary.

Offline
