On Jan 31, 7:41 am, "set...@[EMAIL PROTECTED]
" <set...@[EMAIL PROTECTED]
> wrote:
> Hello All,
>
> I am unsure if similar question has already been posted to this group.
> If so please point me to the relevant post/group.. otherwise please
> read on...
>
> We are in the process of migrating our application from Sybase to
> Oracle. Basically for each table, we would do an bcp out of the table
> and load it into oracle via sqlldr.
>
> The source table has duplicate rows in it. However, we would not want
> to ex****t those dupe rows to our oracle table.
>
> My intention is to create a view that would pick up only the unique
> records and I can bcp the data out of the view.
>
> I can create a view something like
> select distict * from sybase_table. However, I am worried about the
> performance since I have heard that distinct clause is a costly
> operation ( I am working on about 120-140 million rows ).
>
> Is there a better way to achieve it?
>
> Thanks in advance
> Arun
Arun:
Are the duplication criteria just a few columns or all columns? If
just a few, it might be better to create a unique constraint on those
columns on the target (Oracle) side and allow SQL*Loader to handle the
rejected duplicates. Naturally, having any indexes (which is, of
course, how unique/PK constraints are enforced) on the target table
can slow loads into that table, so you should use a DIRECT PATH load
in SQL*Loader.
Another approach might be to delete the duplicate rows out of the
source table (or a copy of it if you're nervous ;-) first. Of course,
deleting from such a large table will also be costly, but maybe not as
costly as the DISTINCTing.
Finally, you may want to dump the source data to a text file, use any
myriad of (fast) text-manipulation utilities to remove the duplicates,
then use SQL*Loader to load the cleansed text file into the target.
HTH,
--Jeff


|