×

Loading...
Ad by
  • 予人玫瑰,手有余香:加拿大新天地工作移民诚聘求职顾问&行业导师!
Ad by
  • 予人玫瑰,手有余香:加拿大新天地工作移民诚聘求职顾问&行业导师!

Topic

  • 工作学习 / 学科技术 / MySQL: Two ways of deleting records, huge difference in performance MySQL: Two ways of deleting records, huge difference in performance +1

    There are two tables T1 and T2. I want to delete those in T1 if they are already in T2.

    This is the first statement:

    delete T1.* from T1 inner join T2 on T1.col1 = T2.col1;

    The execution takes 2 seconds.

    However, if I rewrite the statement in a different way as below:

    delete from T1 where col1 in (select col1 from T2);

    Guess, how much time does it take to execute? 8 minutes!

    • 嘗試著回答:T1 有M 條記錄, T2 有 N 條記錄, T1 AND T2 索引 都是是COL1. 第一種的複雜是 M + N, 第二種的複雜的是M * LOG(N) +2
      • 只有你的回答是对的。用WHERE的时候不走indexing的B-tree造成的。
    • MySqL性能真不行,我专门试过,同一台机器上安装MYsql和sql server, MySQL 同样的情况MYSql差不少,这两个语句放在sql server 差不太多,你比较一下两个query plan就能明白差在哪里 +1
      • True. It didn't do much on the optimization. And it only provided execution plan for SELECT.
    • MySQL execution plan咋说的
      • It's not available for DELETE operation.
        • MySQL需要升级了,估计是5.7以前的
          • 10.3.28-MariaDB
    • In 是performance killer, 啥数据库都一样 +4
      • 这才是正解 +1
        • Too simple. It depends on what kind of thing in that "in"
          • 啥 IN 都是full table scan, 除非表里数据少, 一般不要用 +2
            • 有索引不怕吧?
      • 没读过源码,不知道MySQL是如何实现的,但是感觉这个in应该可以用bloom filter实现吧?
    • 纯 SELECT 操作,统计不包含在T2中的T1个数。效率很差。同学们有没有更好的办法? 纯 SELECT 操作,统计不包含在T2中的T1个数。效率很差。同学们有没有更好的办法? +1

      -- Size of T1: 200K records. -- Size of T2: 10K records. -- 6m39s select count(T1.col1) from T1 left join T2 on T1.col1 = T2.col1 where T2.col1 is null; -- 11m42s select count(T1.col1) from T1 where T1.col1 not in (select col1 from T2); -- 11m33s select count(T1.col1) from T1 where not exists (select 1 from T2 where T2.col1 = T1.col1);
      • T1.col1 和 T2.col1上面有没有index?
        • Yes, both are primary keys.
          • 嗯,那就没办法了,delete 的话用while loop 在后台慢慢删,select 用read uncommitted data 慢慢读。
      • 嘗試使用:總數 - 使用INNER JOIN 找相同的個數
        • Smartie! Just tried. It works perfectly. Less than one second!
      • 试试这个啦,select count(T1.col1) from T2 left join T1 on T1.col1 = T2.col1 where T2.col1 is null;
        • Logically it's not what I want. It immediately returns 0, which is not right either.
      • 这个问题没看到具体配置和库结构不好回答,不过,根据我这个秘书制片人转码,卡车司机傻大兵转库最简单粗暴的办法,在SELET语句末尾加一个LIMIT 语句就好,假设你只需要知道前几条符合条件的搜索结果,速度可能快很多,因为这样就不会傻乎乎整个兜底查一遍,你不妨一试 +2
      • 再抛一块温村工地的砖引缅北高科的玉,你试试看这个语句,未必完全正确,欢迎小刀订正,鹅不在江湖多年了 +3

        SELECT count(T1.col1) FROM T1

        EXCEPT

        SELECT T1.col1 FROM T1 a JOIN T2 b ON a.col1 = b.col1

        • It works great! Thank you very much! +2
          • 🤝Any time. It has been my privilege and pleasure to help. I'm a self-taught professional, in all aspects, including military, trucking, movie industry, and of course, IT. BTW,how long did it take this time? Just curious +2
    • 去问问ChatGPT吧😄
      • 去读读my SQL 源代码看看内部是怎么实现的。
      • 楼主已经问过啦,ChatGPT如果能回答,大家早就失业了 +1
        • 👍,😄,我老婆和我成天跟着CHATGPT学英语 +1
    • t1,t2中各多少记录?in select 按你MySQL版本,应该会利用索引的,但你做的delete应该是全表遍历,这个t1索引可能帮不上。第一种做法应该是利用了两个表的索引,第二种做法如果t1记录少,也不会很慢 +1
      • 200K and 10K
    • 老大,真感兴趣的话试试这个方法。如果t1中rows很多,考虑分成稍小的batch, 比如1000个ID。 老大,真感兴趣的话试试这个方法。如果t1中rows很多,考虑分成稍小的batch, 比如1000个ID。

      select group_concat(concat('"',id-to-purge,'"')) into @batch_id_purge FROM t1;

      select IFNULL(@batch_id_purge , '"DONE"') into @batch_id_purge ;

      set @delete_stmt = concat ("DELETE FROM t2 WHERE id IN (", @batch_id_purge ,")");
      prepare STMT FROM @delete_stmt;
      execute STMT;

      • 问题已解决,老大狠HAPPY,给老大点个大大的赞,👍,这也是我们共同提高的大好机会 +2

        SELECT count(T1.col1) FROM T1

        EXCEPT

        SELECT T1.col1 FROM T1 a JOIN T2 b ON a.col1 = b.col1

      • Thanks!
    • 确实如此。在SQL里我也避免用where... in。可以象老大这样用inner join代替,也可以用where exists,性能都好很多。 +1
      • 👍 +3