Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql General > encoding confus...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 1 of 6 Topic 15783 of 17437
Post > Topic >>

encoding confusion

by Sim Zacks <sim@[EMAIL PROTECTED] > Jun 10, 2008 at 04:10 PM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

{BACKGROUND]
I am testing dbmail for our cor****ate email solution.

We originally tested it on mysql and now we are migrating it to
postgresql.

The messages are stored in a longblob field on mysql and a bytea field
in postgresql.

I set the database up as UTF-8, even though we get emails that are not
UTF encoded, mostly because I didn't know what else to try that would
incor****ate all the possible encodings. Examples of 3 encodings we
regularly receive are: UTF-8, Windows-1255, ISO-8859-8-I.

I transferred the entire database except for one table using the
dbi-link contrib module, connecting directly to MySQL and pulling the
data. The table that did not transfer was the Messageblks table which
has a field mentioned above.
It would not transfer through the dbi-link, so I wrote a python script
(see below) to read a row from mysql and write a row to postgresql
(using pygresql and mysqldb).
When I used pygresql's escape_bytea function to copy the data, it went
smoothly, but the data was corrupt.
When I tried the escape_string function it died because the data it was
moving was not UTF-8.

I finally got it to work by defining a database as SQL-ASCII and then
using escape_string worked. After the data was all in place, I pg_dumped
and pg_restored into a UTF-8 database and it surprisingly works now.

[CONFUSION]
What I don't understand, is that if that database can't handle the non
UTF characters, how does it allow them in when I receive an email
(tested, it works) and when I restored the backup.
I also don't understand why the data transfer didn't work to a UTF
database, but it worked to an ASCII database, if the data can go into a
UTF database from an ascii database.

Lastly, I wanted to know if anybody has experience moving data from
mysql to postgresql and if what I did is considered normal, or if there
is a better way of doing this.

Thank you
Sim

[Python script]:
im****t MySQLdb
from MySQLdb.cursors im****t *
im****t pg
im****t sys

pgdb=pg.connect(host="1.2.3.4",user="username",dbname="dbmail",
passwd="password")
mydb=MySQLdb.connect(host="localhost",user="mysqluser",passwd="mysqlpassword",
db="dbmail", cursorclass=MySQLdb.cursors.SSDictCursor,charset="utf8")
mycrs=mydb.cursor()
mycrs.execute("select * from dbmail_messageblks")
while 1:
        nextrow=mycrs.fetchone()
        nextrow["messageblk"]=pg.escape_string(nextrow["messageblk"])
        pgdb.query("insert into
dbmail_messageblks(messageblk,physmessage_id,is_header,messageblk_idnr,blocksize)
values($field$%(messageblk)s$field$,%(physmessage_id)s,%(is_header)s,%(messageblk_idnr)s,%(blocksize)s)"
% nextrow)


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkhOfUQACgkQjDX6szCBa+oyJwCghkG7XpcrHKY7ybeJgvjPA/XM
qLIAoMS4gOWpP4o2lKrRU2v0IdtaiSQj
=OhYp
-----END PGP SIGNATURE-----
 




 6 Posts in Topic:
encoding confusion
Sim Zacks <sim@[EMAIL   2008-06-10 16:10:28 
Re: encoding confusion
dev@[EMAIL PROTECTED] (R  2008-06-10 16:13:46 
Re: encoding confusion
Sim Zacks <sim@[EMAIL   2008-06-11 08:35:59 
Re: encoding confusion
dev@[EMAIL PROTECTED] (R  2008-06-11 08:03:34 
Re: encoding confusion
dalroi@[EMAIL PROTECTED]   2008-06-11 20:06:01 
Re: encoding confusion
laurenz.albe@[EMAIL PROTE  2008-06-11 07:58:16 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Sat Nov 22 16:50:32 CST 2008.