sql - Postgres: delete duplicate rows with bi-directional relationship -
i working postgres database has bi-directional join table. data being cleaned , no longer want duplicates.
the table has 3.4m rows, snapshot this:
sentence_id | translation_id -------------|---------------- 77 | 1276 1276 | 77 77 | 2481 2481 | 77 77 | 380381 380381 | 77 this query identifies pairs:
select s.sentence_id, x.sentence_id tmp s, tmp x s.sentence_id = x.translation_id , x.sentence_id = s.translation_id what trying delete 1 of duplicate pairs, eg, rows 1 , 2 (77, 1276 , 1276,77), want retain 1 row. doesn't matter one.
example output be:
sentence_id | translation_id -------------|---------------- 77 | 1276 77 | 2481 77 | 380381 how can handle in sql on postgres?
here 1 method: delete row if sentence_id greater translation_id and inverse relationship exists:
delete tmp tmp.translation_id > tmp.sentence_id , exists (select 1 tmp tmp2 tmp2.sentence_id = tmp.translation_id , tmp.sentence_id = tmp2.translation_id ); having said that, deleting lots of rows relatively large table can inefficient. might better stash values in temporary table, truncate first table, , re-insert values:
create table tmp_tmp select sentence_id, translation_id tmp tmp.translation_id > tmp.sentence_id , exists (select 1 tmp tmp2 tmp2.sentence_id = tmp.translation_id , tmp.sentence_id = tmp2.translation_id ); truncate table tmp; insert tmp(sentence_id, translation_id) select sentence_id, translation_id tmp_tmp;
Comments
Post a Comment