tsearch is non-multibyte-aware in a few places

by tgl@[EMAIL PROTECTED] (Tom Lane) Jun 19, 2008 at 12:29 PM

I've identified the cause of bug #4253:

            /* Trim trailing space */
            while (*pbuf && !t_isspace(pbuf))
                pbuf++;
            *pbuf = '\0';

At least on Macs, t_isspace is capable of returning "true" when pointed
at the second byte of a 2-byte UTF8 character.  This explains the report
that the letter "à" has a problem when some other ones don't.  Of
course pbuf needs to be incremented using pg_mblen not just ++.
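
To make the failure mode concrete, here is a minimal standalone sketch, not the actual tsearch code. Both helpers are hypothetical stand-ins: utf8_char_len() plays the role of pg_mblen for UTF8 only, and naive_isspace() imitates the reported misbehavior by assuming that the byte 0xA0 is classified as a space (the second byte of "à" in UTF8 is 0xA0). That assumption is for illustration only and is not a statement about what any particular platform does.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for pg_mblen(): byte length of the UTF8 character
 * starting at s, judged from the lead byte alone.
 */
static int
utf8_char_len(const char *s)
{
    unsigned char c = (unsigned char) *s;

    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;                   /* invalid lead byte: step a single byte */
}

/*
 * Hypothetical stand-in for t_isspace() that imitates the bug report:
 * it looks at one byte only and ASSUMES 0xA0 counts as a space.
 */
static int
naive_isspace(const char *s)
{
    unsigned char c = (unsigned char) *s;

    return c == ' ' || c == '\t' || c == '\n' || c == 0xA0;
}

/* The pattern from the post: cut at the first space, stepping by bytes. */
static void
truncate_bytewise(char *pbuf)
{
    while (*pbuf && !naive_isspace(pbuf))
        pbuf++;
    *pbuf = '\0';
}

/* Same loop, but stepping by whole characters. */
static void
truncate_mb(char *pbuf)
{
    while (*pbuf && !naive_isspace(pbuf))
    {
        int         len = utf8_char_len(pbuf);

        /* never step past the terminator if the string ends mid-character */
        while (len-- > 0 && *pbuf)
            pbuf++;
    }
    *pbuf = '\0';
}
```

Given the input "voilà tout", the byte-wise version tests the space predicate on the second byte of "à" and cuts the string in the middle of that character, while the character-wise version never tests a continuation byte and stops at the real space.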

I looked around for other occurrences of the same problem and found
a couple.  I also found occurrences of the same pattern for skipping
whitespace:

            while (*s && t_isspace(s))
                s++;

This is safe if and only if t_isspace is never true for multibyte
characters ... can anyone think of a counterexample?
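
One way to see what would go wrong if such a counterexample exists is the sketch below. It is only an illustration under stated assumptions: is_space_at() is a hypothetical stand-in for t_isspace that, in addition to a few ASCII spaces, accepts U+3000 IDEOGRAPHIC SPACE (E3 80 80 in UTF8), and utf8_char_len() is a hypothetical UTF8-only stand-in for pg_mblen. Whether the real t_isspace ever reports true for a character like that depends on the locale and platform, which is exactly the open question.

```c
#include <string.h>

/* Hypothetical stand-in for pg_mblen(), UTF8 only. */
static int
utf8_char_len(const char *s)
{
    unsigned char c = (unsigned char) *s;

    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;                   /* invalid lead byte: step a single byte */
}

/*
 * Hypothetical stand-in for t_isspace(): true for blank, tab, newline,
 * and (by ASSUMPTION) for the 3-byte character U+3000.
 */
static int
is_space_at(const char *s)
{
    if (*s == ' ' || *s == '\t' || *s == '\n')
        return 1;
    /* strncmp stops at a terminating NUL, so short strings are safe */
    return strncmp(s, "\xE3\x80\x80", 3) == 0;
}

/* The pattern from the post: skip whitespace one byte at a time. */
static const char *
skip_space_bytewise(const char *s)
{
    while (*s && is_space_at(s))
        s++;
    return s;
}

/* Same loop, advancing by the length of each character. */
static const char *
skip_space_mb(const char *s)
{
    while (*s && is_space_at(s))
    {
        int         len = utf8_char_len(s);

        /* never step past the terminator if the string ends mid-character */
        while (len-- > 0 && *s)
            s++;
    }
    return s;
}
```

With an input that starts with U+3000 followed by a blank, the byte-wise loop advances one byte, lands inside the 3-byte character, and stops there, so the caller is left pointing at a continuation byte. The character-wise loop skips both spaces. If t_isspace really is never true for a multibyte character, the two loops behave identically.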

			regards, tom lane
