Textpattern CMS support forum
#1 2007-01-22 11:18:21
- mikkeX
- Archived Plugin Author
- Registered: 2004-02-26
- Posts: 74
TinyMCE: Problem with htmlentities in search
My problem is this: some of the people publishing on a Textpattern site I work with use TinyMCE to type in the articles, and some use Textile. That is okay, but the text is saved in the database as HTML entities, so if I search for ‘Mullsjö’, for example, I do not find the article that contains it, because in the database it is ‘Mullsj&ouml;’. Is there a way to make it search for both ‘Mullsjö’ and ‘Mullsj&ouml;’?
(Edit: updated discussion topic. -Mary)
Last edited by Mary (2007-03-22 22:32:58)
Offline
#2 2007-01-23 10:27:27
- mikkeX
- Archived Plugin Author
- Registered: 2004-02-26
- Posts: 74
Re: TinyMCE: Problem with htmlentities in search
Anybody?
Offline
#3 2007-03-21 21:27:40
- Logoleptic
- Plugin Author
- From: Kansas, USA
- Registered: 2004-02-29
- Posts: 482
Re: TinyMCE: Problem with htmlentities in search
I’d be interested in this myself. Google — and some site search scripts — will look for international variations of plain-ASCII characters. I wonder how much trouble this would be to include in Txp.
Offline
#4 2007-03-22 22:32:41
- Mary
- Sock Enthusiast
- Registered: 2004-06-27
- Posts: 6,236
Re: TinyMCE: Problem with htmlentities in search
This is a problem with a plugin, not Textpattern itself. You need to configure TinyMCE to insert raw characters, rather than numeric/named entities.
Offline
Re: TinyMCE: Problem with htmlentities in search
Totally missed this thread.
One could argue that since named entities are the correct method to publish these characters on the web and Textpattern is a web publishing tool, it should be smart enough to handle them. It does not seem that hard to run the query string through htmlentities before it gets run through search.
Although Textile does not seem to convert these entities, so it would have to be something that can be turned on and off. Maybe it would be better off packaged as a plugin.
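A rough sketch of that idea, in Python rather than Textpattern's PHP: encode the query the way PHP's htmlentities would and search for both spellings. The helper names `to_named_entities` and `search_variants` are made up for illustration and are not part of Textpattern or any plugin.

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Replace every character that has an HTML named entity with that entity."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        out.append(f"&{name};" if name else ch)
    return "".join(out)


def search_variants(query: str) -> list[str]:
    """Return the spellings to search for: the raw query, plus the
    entity-encoded form when it differs from the raw one."""
    encoded = to_named_entities(query)
    return [query] if encoded == query else [query, encoded]


print(search_variants("Mullsjö"))
```

Each variant would then be matched against the article text, so an article stored as either ‘Mullsjö’ or ‘Mullsj&ouml;’ is found. Numeric entities would need a third variant, which this sketch does not cover.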
Regardless, here is the TinyMCE setting to turn off entity encoding; just add it to the init string in the admin page.
Last edited by hakjoon (2007-03-23 13:41:04)
Shoving is the answer – pusher robot
Offline
Re: TinyMCE: Problem with htmlentities in search
When using UTF-8, there’s no need to convert to HTML entities. Html entities are for characters that are outside the charset used to transmit the page. I think using HTML entities would cause incorrect sorting in MySQL.
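For articles that were already saved with entities, one possible clean-up is to decode only the entities that stand for non-ASCII characters and leave markup-significant ones such as `&amp;` and `&lt;` alone. This is a minimal Python sketch under that assumption; the function name `decode_non_ascii_entities` is hypothetical, and it is not something Textpattern or TinyMCE ships.

```python
import html
import re

# Matches named (&ouml;), decimal (&#246;) and hexadecimal (&#xF6;) entities.
_ENTITY = re.compile(r"&(#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_non_ascii_entities(text: str) -> str:
    """Turn entities for non-ASCII characters into raw characters.

    ASCII entities (&amp;, &lt;, &gt;, &quot;) and unknown entities are kept
    as written, so stored HTML markup stays valid.
    Note: &nbsp; is also decoded, to a raw U+00A0 character.
    """
    def replace(match: re.Match) -> str:
        decoded = html.unescape(match.group(0))
        if len(decoded) == 1 and ord(decoded) > 127:
            return decoded
        return match.group(0)

    return _ENTITY.sub(replace, text)


print(decode_non_ascii_entities("<p>Mullsj&ouml; &amp; Cuar&oacute;n</p>"))
```

The decoded text can then be stored as UTF-8, where ‘ö’ is simply the two bytes `C3 B6`.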
Last edited by ruud (2007-03-23 08:33:55)
Offline
Re: TinyMCE: Problem with htmlentities in search
I didn’t know that about UTF-8. Learn something new every day.
Shoving is the answer – pusher robot
Offline
#8 2007-03-23 20:48:18
- Logoleptic
- Plugin Author
- From: Kansas, USA
- Registered: 2004-02-29
- Posts: 482
Re: TinyMCE: Problem with htmlentities in search
ruud wrote:
When using UTF-8, there’s no need to convert to HTML entities. Html entities are for characters that are outside the charset used to transmit the page. I think using HTML entities would cause incorrect sorting in MySQL.
Unless I’m mistaken, this still doesn’t solve the problem of looking for the same word in both ASCII and UTF-8. Say, for example, that someone runs a Txp-powered site about movie industry rumors. Someone wants to search for Alfonso Cuarón, to find out what his latest project is. They can’t find him at all, however, because they searched for “Alfonso Cuaron.” Google will find both, whichever version you use. Will Textpattern?
Offline
Re: TinyMCE: Problem with htmlentities in search
From what I’ve read, if the MySQL tables are set to use UTF-8 (not UTF-8 stored in latin1 tables), then the search should be accent-insensitive. I’m still using MySQL 4.0.x so I can’t test this at the moment.
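To illustrate what accent-insensitive matching means (this is not how MySQL implements its collations, only a Python sketch of the general idea), both the query and the text can be folded to an unaccented, case-folded form before comparing. The names `fold` and `matches` are hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip combining accents and fold case, e.g. 'Cuarón' -> 'cuaron'."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


def matches(query: str, document: str) -> bool:
    """True if the folded query occurs anywhere in the folded document."""
    return fold(query) in fold(document)


print(matches("Alfonso Cuaron", "The latest project from Alfonso Cuarón"))
```

With this kind of folding, a search for “Alfonso Cuaron” and a search for “Alfonso Cuarón” find the same articles.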
Offline
#10 2007-03-29 00:14:35
- Logoleptic
- Plugin Author
- From: Kansas, USA
- Registered: 2004-02-29
- Posts: 482
Re: TinyMCE: Problem with htmlentities in search
ruud wrote:
From what I’ve read, if the MySQL tables are set to use UTF-8 (not UTF-8 stored in latin1 tables), then the search should be accent-insensitive. I’m still using MySQL 4.0.x so I can’t test this at the moment.
So having this feature is dependent on running 5.0 or higher? Good to know.
Anyone looking for site search with more bells and whistles than Textpattern’s built-in offering might want to check out the ONLamp.com PHP Search Engine Shoot-Out. They look at six search tools of varying ability. One that they left out, and which I’ve used with good results on a non-Txp site, is Orca Search. Among other things, it offers optional accent matching.
Offline
#11 2007-03-29 18:30:19
- Logoleptic
- Plugin Author
- From: Kansas, USA
- Registered: 2004-02-29
- Posts: 482
Re: TinyMCE: Problem with htmlentities in search
ruud wrote:
mysql 4.1 and up
Gotcha. Thanks!
Offline