Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Interfaces Jdbc > Re: Populating ...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 5 of 7 Topic 1971 of 2064
Post > Topic >>

Re: Populating large tables with occasional bad values

by craig@[EMAIL PROTECTED] (Craig Ringer) Jun 12, 2008 at 01:22 AM

John T. Dow wrote:

> If I did this regularly, three options seem easiest.
> 
> 1 - Load locally to get clean data and then COPY. This requires the
server to have access local access to the file to be copied, and if the
server is hosted by an isp, it depends on them whether you can do this
easily.

You can COPY over a PostgreSQL network connection. See the \copy sup****t 
in psql for one example of how it works.

I've never had cause to try to use it with JDBC so I don't know how/if 
it works in that context.

However, batched inserts can be quite fast enough. If you do one INSERT 
INTO per 1000 rows you'll already be seeing a massive performance boost:

INSERT INTO mytable
VALUES
(blah, blah),
(blah, blah),
(blah, blah),
-- hundreds of rows later
(blah, blah);

.... will be a *LOT* faster. If you have to do special processing or 
error handling you can do it once you have the data in a server-side 
staging table - and you can get it there quickly with multi-row inserts 
or (at least using psql) a \copy .

> 3 - If the only problem is duplicate keys, load into a special table
without the constraint, issue update commands to rewrite the keys as
needed, then select/insert to the correct table.

This would be my preferred approach, personally, using either network 
COPY or multi-row INSERTs to get the data into the staging table.

You can do a whole lot more than fix unique constaint violations, 
though. With a PL/PgSQL function to process the staging table and do the 
inserts you can do some pretty major data surgery.

I have some conversion code that takes pretty ugly data in unconstrained 
staging tables and reprocesses it to fit a new, much stricter and better 
normalized schema. Large amounts of restructuring and fixing up are 
required. Some of the simpler conversions are done by INSERT ... SELECT 
statements, but the complicated stuff is done with PL/PgSQL functions.

I've sent you the conversion code off-list in case it's informative. 
It's pretty ugly code as it's going to be thrown away when we cut over 
to the new system, but it should show just how much data conversion & 
repair you can do on the database server side.

It's not like you can't put aside rows that need user interaction as you 
process the staging table, either. Just INSERT them into a "problem" 
table and delete them from the staging table. Then have the client scan 
through the (much smaller) problem table and ask the user to make 
decisions. When everything looks satisfactory and the user's decisions 
have been acted on, COMMIT.

--
Craig Ringer

-- 
Sent via pgsql-jdbc mailing list (pgsql-jdbc@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-jdbc
 




 7 Posts in Topic:
Populating large tables with occasional bad values
john@[EMAIL PROTECTED] (  2008-06-11 10:52:57 
Re: Populating large tables with occasional bad values
craig@[EMAIL PROTECTED]   2008-06-12 00:06:01 
Re: Populating large tables with occasional bad values
craig@[EMAIL PROTECTED]   2008-06-12 00:20:46 
Re: Populating large tables with occasional bad values
john@[EMAIL PROTECTED] (  2008-06-11 12:39:36 
Re: Populating large tables with occasional bad values
craig@[EMAIL PROTECTED]   2008-06-12 01:22:14 
Re: Populating large tables with occasional bad values
oliver@[EMAIL PROTECTED]   2008-06-12 13:23:50 
Re: Populating large tables with occasional bad values
tivvpgsqljdbc@[EMAIL PROT  2008-06-12 11:56:02 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Sat Oct 11 20:26:26 CDT 2008.