sql - Postgres: delete duplicate rows with bi-directional relationship -


i working postgres database has bi-directional join table. data being cleaned , no longer want duplicates.

the table has 3.4m rows, snapshot this:

sentence_id  |  translation_id -------------|---------------- 77           |  1276 1276         |  77 77           |  2481 2481         |  77 77           |  380381 380381       |  77 

this query identifies pairs:

select s.sentence_id, x.sentence_id tmp s, tmp x s.sentence_id = x.translation_id , x.sentence_id = s.translation_id 

what trying delete 1 of duplicate pairs, eg, rows 1 , 2 (77, 1276 , 1276,77), want retain 1 row. doesn't matter one.

example output be:

sentence_id  |  translation_id -------------|---------------- 77           |  1276 77           |  2481 77           |  380381 

how can handle in sql on postgres?

here 1 method: delete row if sentence_id greater translation_id and inverse relationship exists:

delete tmp     tmp.translation_id > tmp.sentence_id ,           exists (select 1                   tmp tmp2                   tmp2.sentence_id = tmp.translation_id ,                         tmp.sentence_id = tmp2.translation_id                  ); 

having said that, deleting lots of rows relatively large table can inefficient. might better stash values in temporary table, truncate first table, , re-insert values:

create table tmp_tmp     select sentence_id, translation_id     tmp     tmp.translation_id > tmp.sentence_id ,           exists (select 1                   tmp tmp2                   tmp2.sentence_id = tmp.translation_id ,                         tmp.sentence_id = tmp2.translation_id                  );  truncate table tmp;  insert tmp(sentence_id, translation_id)     select sentence_id, translation_id     tmp_tmp; 

Comments

Popular posts from this blog

routing - AngularJS State management ->load multiple states in one page -

python - GRASS parser() error -

json - Gson().fromJson(jsonResult, Myobject.class) return values in 0's -