Textpattern CMS support forum

#1 2007-01-22 11:18:21

mikkeX
Archived Plugin Author
Registered: 2004-02-26
Posts: 74

TinyMCE: Problem with htmlentities in search

Some of the people publishing on a Textpattern site I work with use TinyMCE to type in their articles, and some use Textile. That is okay, but my problem is that the TinyMCE text is saved in the database as HTML entities. If I search for ‘Mullsjö’, for example, I do not find the article that contains it, because in the database it is ‘Mullsj&amp;ouml;’ (that is, the entity for ö). Is there some way to make it search for both ‘Mullsjö’ and the entity-encoded form?

(Edit: updated discussion topic. -Mary)

Last edited by Mary (2007-03-22 22:32:58)

#2 2007-01-23 10:27:27

mikkeX
Archived Plugin Author
Registered: 2004-02-26
Posts: 74

Re: TinyMCE: Problem with htmlentities in search

Anybody?

#3 2007-03-21 21:27:40

Logoleptic
Plugin Author
From: Kansas, USA
Registered: 2004-02-29
Posts: 482

Re: TinyMCE: Problem with htmlentities in search

I’d be interested in this myself. Google — and some site search scripts — will look for international variations of plain-ASCII characters. I wonder how much trouble this would be to include in Txp.

#4 2007-03-22 22:32:41

Mary
Sock Enthusiast
Registered: 2004-06-27
Posts: 6,236

Re: TinyMCE: Problem with htmlentities in search

This is a problem with a plugin, not Textpattern itself. You need to configure TinyMCE to insert raw characters, rather than numeric/named entities.

#5 2007-03-23 03:31:44

hakjoon
Member
From: Arlington, VA
Registered: 2004-07-29
Posts: 1,634

Re: TinyMCE: Problem with htmlentities in search

Totally missed this thread.

One could argue that since named entities are the correct way to publish these characters on the web, and Textpattern is a web publishing tool, it should be smart enough to handle them. It does not seem that hard to run the query string through htmlentities before it is passed to the search.

Although Textile does not seem to convert these entities, so it would have to be something that can be turned on and off. Maybe it would be better off packaged as a plugin.
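
A rough sketch of that idea, written in TypeScript purely for illustration (Textpattern itself is PHP, where `htmlentities()` would do the encoding). The `NAMED_ENTITIES` table and the `toNamedEntities` and `searchVariants` helpers are hypothetical and cover only a handful of characters; the point is that the search would look for both the raw term and its entity-encoded form.

```typescript
// Hypothetical, deliberately tiny entity table; PHP's htmlentities() covers far more.
const NAMED_ENTITIES: Record<string, string> = {
  "ö": "&ouml;",
  "ä": "&auml;",
  "å": "&aring;",
  "ó": "&oacute;",
  "é": "&eacute;",
};

// Replace each character that has an entry in the table with its named entity.
// Normalizing to NFC first makes sure "o" + combining diaeresis is seen as "ö".
function toNamedEntities(text: string): string {
  return Array.from(text.normalize("NFC"))
    .map((ch) => NAMED_ENTITIES[ch] ?? ch)
    .join("");
}

// Return the distinct forms of a query that a search would need to look for.
function searchVariants(query: string): string[] {
  const raw = query.normalize("NFC");
  const encoded = toNamedEntities(raw);
  return encoded === raw ? [raw] : [raw, encoded];
}

console.log(searchVariants("Mullsjö"));
```

A search built this way would match an article whether it was typed in Textile (raw characters) or in TinyMCE (named entities), which is why it would need to be switchable.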

Regardless, here is the TinyMCE setting to turn off entity encoding; just add it to the init string on the admin page.
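
The setting itself is missing from this copy of the post. Assuming it was TinyMCE's documented `entity_encoding` option, the init options would include something like the fragment below. `tinymceInitOptions` is only an illustrative name, and the rest of the init call is omitted.

```typescript
// Assumption: the lost snippet set TinyMCE's entity_encoding option to "raw",
// so that characters such as ö are stored as-is instead of as named entities.
// (The basic XML entities for &, <, > and " are still encoded in this mode.)
const tinymceInitOptions = {
  entity_encoding: "raw",
};
```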

Last edited by hakjoon (2007-03-23 13:41:04)


Shoving is the answer – pusher robot

#6 2007-03-23 08:32:57

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068

Re: TinyMCE: Problem with htmlentities in search

When using UTF-8, there’s no need to convert to HTML entities. HTML entities are for characters that are outside the charset used to transmit the page. I think using HTML entities would cause incorrect sorting in MySQL.

Last edited by ruud (2007-03-23 08:33:55)

#7 2007-03-23 13:41:33

hakjoon
Member
From: Arlington, VA
Registered: 2004-07-29
Posts: 1,634

Re: TinyMCE: Problem with htmlentities in search

I didn’t know that about UTF-8. Learn something new every day.


#8 2007-03-23 20:48:18

Logoleptic
Plugin Author
From: Kansas, USA
Registered: 2004-02-29
Posts: 482

Re: TinyMCE: Problem with htmlentities in search

ruud wrote:

When using UTF-8, there’s no need to convert to HTML entities. HTML entities are for characters that are outside the charset used to transmit the page. I think using HTML entities would cause incorrect sorting in MySQL.

Unless I’m mistaken, this still doesn’t solve the problem of looking for the same word in both ASCII and UTF-8. Say, for example, that someone runs a Txp-powered site about movie industry rumors. Someone wants to search for Alfonso Cuarón, to find out what his latest project is. They can’t find him at all, however, because they searched for “Alfonso Cuaron.” Google will find both, whichever version you use. Will Textpattern?
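
To make the requirement concrete, here is a minimal sketch of accent-insensitive matching, in TypeScript for illustration only. It is not how Textpattern or Google actually searches; `foldAccents` and `matchesIgnoringAccents` are hypothetical helpers that strip combining marks after Unicode decomposition, so that “Cuaron” and “Cuarón” compare equal.

```typescript
// Decompose accented letters (NFD), drop the combining marks, and lowercase.
function foldAccents(text: string): string {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase();
}

// True if the folded text contains the folded query.
function matchesIgnoringAccents(text: string, query: string): boolean {
  return foldAccents(text).indexOf(foldAccents(query)) !== -1;
}

console.log(matchesIgnoringAccents("Alfonso Cuarón has a new project", "Alfonso Cuaron"));
```

Letters that have no canonical decomposition, such as ø or ß, pass through unchanged, so a real search would need extra rules for them.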

#9 2007-03-23 21:41:04

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068

Re: TinyMCE: Problem with htmlentities in search

From what I’ve read, if the MySQL tables are set to use UTF-8 (not UTF-8 stored in latin1 tables), then the search should be accent-insensitive. I’m still using MySQL 4.0.x so I can’t test this at the moment.

#10 2007-03-29 00:14:35

Logoleptic
Plugin Author
From: Kansas, USA
Registered: 2004-02-29
Posts: 482

Re: TinyMCE: Problem with htmlentities in search

ruud wrote:

From what I’ve read, if the MySQL tables are set to use UTF-8 (not UTF-8 stored in latin1 tables), then the search should be accent-insensitive. I’m still using MySQL 4.0.x so I can’t test this at the moment.

So having this feature is dependent on running 5.0 or higher? Good to know.

Anyone looking for site search with more bells and whistles than Textpattern’s built-in offering might want to check out the ONLamp.com PHP Search Engine Shoot-Out. They look at six search tools of varying ability. One that they left out, and which I’ve used with good results on a non-Txp site, is Orca Search. Among other things, it offers optional accent matching.

#11 2007-03-29 18:30:19

Logoleptic
Plugin Author
From: Kansas, USA
Registered: 2004-02-29
Posts: 482

Re: TinyMCE: Problem with htmlentities in search

ruud wrote:

MySQL 4.1 and up

Gotcha. Thanks!
