2011-12-16

「索引列を演算するとインデックスが使えなくなる」これはなぜなのか

例えばテーブルt1の列col1にインデックスがあり、以下のクエリでそのインデックスが参照される場合、

    select * from t1 where col1 >= 20	 	--(1)

クエリを以下のように書き換えると

    select * from t1 where col1*5 >= 100	--(2)

インデックスが使えなくなってしまう、だから索引列は極力演算したり関数に渡したりしない方がよい、という話がある。
実際、PostgreSQL9.1.2やTeradata12では(1)はインデックスシーク、(2)はテーブルスキャンになってしまう。

しかし、B-treeインデックスには列値そのものが載っているのだから、(2)でインデックスを使えない理由はないような気がする。
インデックス読み出して「col1の値が20以上か」をチェックする代わりに、「col1の値を5倍して100以上になるか」をチェックすればよいだけではないのか。

SQL Serverならインデックスを使ってくれるが...

実際、SQL Server 2008は(2)のクエリについてもインデックスを使ってくれる。
サンプルDB AdventureWorks の SalesOrderDetail テーブルのOrderQtyにインデックスを張り、まずは

    select * 
    from Sales.SalesOrderDetail
    where OrderQty >= 20

でインデックスが使用されることを確認する(実行計画右上の Index Seek)。

次にOrderQtyを×5してみると、

    select * 
    from Sales.SalesOrderDetail
    where OrderQty*5 >= 100

同じインデックスが使用されることがわかる。

ただし、掛け算することによって実行計画右上の Index Seek (Nonclustered) が Index Scan (Nonclustered) に変わってしまっている。
つまりルートノードを出発点としてB-treeインデックスを「L字型」に読むのではなく、リーフノードを端から全部読んでいるのだ。ここがわからない。
なぜシークできずスキャンするのか。リーフノードに対して可能な演算と比較が、ルートノード／ブランチノードに対してはできない事情があるようだが、それは何か？

2011-08-30

プログラマの定年が35歳ではないなら一体何歳なのか

仕事

「社外でも通用するスキル」のくだらなさについてに丁寧なコメントをいただいので、投げっぱなしにしないでもう一つ先まで考えてみる。

前回書いたことは「ポータブルスキルを『社外でも通用する技術』と考える限り、自社専用スキルとポータブルスキルのどっちを選んでも詰む」という話だった。
「自社専用スキルに特化すると会社を放り出されたときに困る。ポータブルスキルがあればクビになっても困らないはず」と思って売れ筋の技術を磨いても、技術はある年齢になると売れなくなるので、40歳になってみるとやっぱり「会社を放り出されると困る人」になっている自分を発見する。
「ポータブルスキルとは社外でも通用する技術である」という考え方が間違っているので、こういうことになる。

前回書いたことはこれだけ。
では、自社専用スキルに頼れない俺は、これからどうすればいいのか。

ミクロの対応

個人レベルでの対応ははっきりしていて、コメントで指摘いただいている「希少性」を加えて自分のポータブルスキルを組み立てるしかない。
つまり

みんながそれなりにできることを、突出して高いレベルでする
みんながやらない(やりたがらない)ことをする

のどちらかだ。1.ができればベスト、ダメなら2.の路線で行けばよい。
言うのは簡単だが、問題は自分にできるのかどうか、しかも一時だけではなく、リタイヤするまで継続できるのかどうかだ。

マクロの現実

昔からプログラマとかSEには「35歳定年説」というのがあって、これを否定するような言説（「もうそういう時代じゃない」「定年は人によって違う」「自分は努力しているので関係ない」等々）を数年に1回は見るんだけど、見るたびに危なっかしいなあと思うのは、だったら何歳が自分の定年なのか考えてないように見えることだ。
35が定年でないなら45なのか55なのか。「俺は35ぐらいで下り坂には入らない。まだまだ第一線で戦える」と考えている人の限界が仮に55歳だとして、55でリタイアした後どうするのか。
中高年になって転身するぐらいなら、長く働ける仕事を早めに見つけた方がいいんじゃないのか。整体師とか。
だったら一周して35歳定年説は正しかったということになるのではないか。

「高く売れる突出した能力があれば、何歳だろうとエンジニアとしてやっていける」というのは正しい。しかし結果としてそれを実現できる人はほんの少数だ。「突出せよ」というのは「上位数%に入れ」というのと同じだから、当然そうなる。こういう特別な人たちのことは、まあいいとして、
問題はその他大多数だ。この業界で、与えられた仕事を真面目にこなし、一方で日々の勉強も欠かさない職人たち、要するに普通の真面目な人は、最後にどうなるのか。

戦うにしろ降りるにしろ、まずは現実を見ることだろう。現実を見ないで生き残り戦略もないだろう。
個人的には、40過ぎてなお若い人と生存競争をしたくないと思う。いいかげん利他の精神で生きたいよな。皆さんもそうでしょう。

2011-08-09

「社外でも通用するスキル」のくだらなさについて

仕事

自分が長い間勘違いしてたから言うんだけど、

最後に｢Firm Specific Skill｣と｢Portable Skill｣の違いについて。前者は直訳すれば「企業特殊技能」と
いいまして、社員がその企業ならではの技能を磨き、所属企業とのコミットメントを高めながら共に進んで
いくというものです。でも、ホントに若い人が気をつけるべきなのは、自分が所属している企業外に「持ち
運び可能な技能」ではないでしょうか？

大企業で働くと毀損されるいくつかのコトについて - GoTheDistance

ここで言うポータブルスキルの正体が何なのかは、よく考えなくてはならない。
ポータブルスキル＝「社外でも通用するスキル」「転職時に潰しの効く技術」ぐらいに考えていると大損する。

大企業の人はそう言うけど

業界大手のお客さんと話をしていて、「(うちの会社名)さんは技術があるからいいですよねえ。私らは技術がないから転職もできない...」みたいに言われたことがあるけど、そう言うお客さんの給料は俺よりはるかに高いのだ。
なぜそうなるのかというと、俺が持っているといわれている「技術」は市場があって金を払えば買って来れるものでしかないのに対し、この方が持っているのは、どこにも売っていない、まさにその会社専用のスキルだからだ。

市場があれば競争があり、俺の持っている「技術」は常に安値に向かう圧力にさらされている。対してこのお客さんのスキルは外部調達できないから競争とは無縁だ。ゆえに俺の給料は安く、お客さんの給料は高い。事実はこうなのだが、
角度を変えて見ると、俺が持っているのは「社外で通用する」スキルで、お客さんが持っているのは「その企業から一歩出たら全く役に立たない」スキルだ、ということにもなる。ここで勘違いすると生涯地を這う。

ポータブルスキルの正体

「社外でも通用する技術」なんてくだらないものなのだ。理由は2つあって、
市場で求められている技術は常に価格競争圧力にさらされているので、そんなものを追っても金銭的に報われることはない、ということが一つ。
もう一つは、技術が売れなくなる日が誰にでも来ることだ。例えばITエンジニアが40歳になると、求められている技術を提供できても、買ってくれる人が激減する。IT関係の求人をチェックしてみればわかる。40歳以上のエンジニアの求人なんてほとんどない。

会社員の一生を支えるポータブルスキルの正体は技術ではない。くだらない作業を含めた全ての仕事に対して常にタイムアタックすることであったり、情報が足りない中でも即座にできる/できないの判断をして責任を取ることであったり、そういった技術自体よりも一段メタな能力を追わなくてはならない。
これらの能力は「自社の外ではまったく通用しない」仕事の渦中にあっても育てることができるから、仮に大企業でそういった仕事がアサインされてしまっても、まったくハンデにならない。結論、大企業でも別にいいんじゃないか。働いたことないけど。

2010-10-05

忘れてしまったほうがいいSQLチューニングテクニックについて

SQL

10年ぐらい前はJavaで文字列を連結するときは String ではなく StringBuffer を使ったほうが速いなんて言われてたんだけど、それがテクニックとして有効だったのは「StringBufferを使えばこれこれの内部動作になる」という関係が固定されていたからで、仮にJVMのリビジョンによってどっちが速いかがコロコロ変わったり、文字列の長さが一定以上になると突然あっちの方が速くなったり、みたいな状態だったら「文字列の連結はStringBufferの方が...」はテクニックとして成り立たなくなってしまう。
が、SQLについての言説の一部はまさにこの状態で、本来成り立たないことがチューニングテクニックとして語られていると思う。
そういうのを暗記するのはやめて実行計画を見ましょう、という話。

「INとEXISTSはどっちが速いか」の不毛

昔から「INを等価なEXISTSに書き換えると速くなる」なんて言われていて、「これは特定のRDBMSに限定されない、汎用的なテクニックだ」という嘘までセットになってたりするんだけど、実際はいろいろ条件が揃わないとEXISTSはINに勝てない。
1000万件のテーブルをメインクエリに、1000件のテーブルをサブクエリに置いて、INとEXISTSのどっちが速いか確認してみよう。

-- (1)IN
explain analyze
select count(*) 
from accounts a
where bid in
(select tid from tellers);

-- (2)EXISTS
explain analyze
select count(*) 
from accounts a
where exists
(select * from tellers where tid=a.bid);

上記2本のSQLを同じ実行計画で処理するRDBMSもあるんだけどそれでは同じ速さになってしまうので、(1)(2)で異なる実行計画を吐いてくれるPostgreSQL8.3を使って実験してみると、手元の環境ではINの方が1.5倍速い。

EXISTSを勝たせるにはどうするかというと、メインクエリのWHERE句に絞り込みの条件を付けて、サブクエリにぶつける件数を少なくすればEXISTSの方が速くなる。
手元の環境では、1000万件を1000件まで絞るとようやく逆転してくれた。

-- EXISTSの方が7%速い

explain analyze
select count(*) 
from accounts a
where bid in
(select tid from tellers)
and aid between 1 and 1000;

explain analyze
select count(*) 
from accounts a
where exists
(select * from tellers where tid=a.bid)
and aid between 1 and 1000;

ところがここで結合カラムを変更し、サブクエリ側テーブルにインデックスのない列(=bid)で結合すると、再びINの方が速くなる。

-- INの方が15%速い

explain analyze
select count(*) 
from accounts a
where bid in
(select bid from tellers)
and aid between 1 and 1000;

explain analyze
select count(*) 
from accounts a
where exists
(select * from tellers where bid=a.bid)
and aid between 1 and 1000;

以上をまとめると「メインクエリ側のWHERE句で件数を十分に絞り込み、サブクエリ側にインデックスのある列で結合する場合は、EXISTSはINより速い」ということになるがこんなこと覚えても役に立たない。これは当該サーバ・当該DBに固有の結果であって環境が変われば結果も変わる。例えば「tellersが500万件もある」とか「ストレージがSSDだ」という環境ではどのクエリでもEXISTSが圧勝するんじゃないだろうか。
要するに「INとEXISTSはどっちが速いか」という問題の立て方がよくないわけです。

実行計画を見てみる

EXISTSが勝ったり負けたりした理由は実行計画を見れば明白で、この環境では IN は Hash Join(=両テーブル1回ずつのシーケンシャルスキャン)で実行されるのに対して、EXISTS は Nested Loop Join 風の動きでtellersにランダムアクセスしていた。
3つのテスト結果はSQLの水準で考えると何だか不安定に見えるが、より低レベルに降りてみれば

ランダムアクセスを1000万回やったら1回のシーケンシャルスキャンに負けた

ランダムアクセスを1000回に減らせば1回のシーケンシャルスキャンに勝った

インデックスがないと1000回のランダムアクセスが1000回のシーケンシャルスキャンに変わるので、1回のシーケンシャルスキャンに負けた

という、ランダムアクセスとシーケンシャルスキャンの特性から推測できる話に過ぎない。この特性はあらゆる環境に共通するので、上で太字にしたTipsと違って覚えておけばどこでも使える。

つまりSQLではなく実行計画のレベルでチューニングを考えると、覚えなくてはならないことが少数の汎用的な知識だけになってありがたいわけです。
逆にSQLのレベルでチューニングテクニックを暗記すると、覚えることが増える上に効果が出ない原因が理解できずRDBが嫌になってきます。

2010-09-30

大山康晴はなぜ強かったのか

本

羽生善治「決断力 (角川oneテーマ21)」より。

史上最強の棋士が十五世名人の大山康晴先生であることは、誰もが認めるであろう。
大山先生が六十九歳で亡くなる二年前。ガンの手術をしたあと、必敗の将棋を夜中の二時まで
粘りに粘って、最後に催眠術のようなものを使って逆転勝ちしたことがある。
私もそばでずっと見ていて心から感動したものだ。

いや、それって将棋が強いのと違うような．．．

2010-09-24

Teradataの実行計画(1) - 大量データの結合の計画

Teradata

Teradataの実行計画について少しずつ整理していきます。
今回は大量データの結合の実行計画について。
こんなクエリは、どういう実行計画になっていればOKだろうか。

	select *
	from t　/* 1000万件のトランザクション */
	inner join m 　/* マスタ */
	on m.c1=t.c1

RowHash match scan ... using a merge join

m.c1とt.c1がそれぞのテーブルのPI(Primary Index)であれば、以下の実行計画が出てくるはずだ。

  4) We do an all-AMPs JOIN step from TST.m by way of a RowHash match
     scan with no residual conditions, which is joined to TST.t by way
     of a RowHash match scan with no residual conditions.  TST.m and
     TST.t are joined using a merge join, with a join condition of (
     "TST.m.c1 = TST.t.c1").  The result goes into Spool 1
     (group_amps), which is built locally on the AMPs.  The size of
     Spool 1 is estimated with low confidence to be 2,759,376 rows. 
     The estimated time for this step is 44.38 seconds.

RowHash match scan ... using a merge join は、Teradataの結合処理中で最速の実行計画。
このプランなら問題なし。

TeradataはデータをPrimary Index順にソートして格納している。
なので Primary Index 同士で結合するならば、両テーブルの先頭からレコードを順番に取り出していきなりマージ処理できる*1。
実行してみると、トランザクションを単純に全件読むのとほとんど変わらない速さでクエリが終了する。

single partition hash join

小さなテーブルのPI列と、大きなテーブルの非PI列で結合する...例えば「巨大なトランザクションと、100件程度の組織マスタを組織コードで結合する」といったパターンは、single partition hash join に帰着できれば問題ない。
マスタ側の件数が十分に少なければ(数10万件ぐらいまでか)、以下のような実行計画が出てくる。

  4) We do an all-AMPs RETRIEVE step from TST.m by way of an all-rows
     scan with no residual conditions into Spool 2 (all_amps), which is
     duplicated on all AMPs.  The size of Spool 2 is estimated with
     high confidence to be 8,340 rows.  The estimated time for this
     step is 0.02 seconds. 

  5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
     all-rows scan, which is joined to TST.t by way of an all-rows scan. 
     Spool 2 and TST.t are joined using a single partition hash_ join,
     with a join condition of ("c1 = TST.t.c1").  The
     result goes into Spool 1 (group_amps), which is built locally on
     the AMPs.  The size of Spool 1 is estimated with no confidence to
     be 2,483,439 rows.  The estimated time for this step is 22.38
     seconds.

4)で小さいマスタを全AMPにコピーし、5)でトランザクションとハッシュ結合する。
トランザクションを単純に全件読むのに比べて、せいぜい1-2割増しの時間でクエリが終了する。

マスタが大きすぎたら...

マスタが100万件もあれば、恐らく single partition hash join ではなく、以下のような hash join of N partitions ... になってしまうだろう。

  4) We do an all-AMPs RETRIEVE step from TST.m by way of an all-rows
     scan with no residual conditions into Spool 2 (all_amps) fanned
     out into 8 hash join partitions, which is duplicated on all AMPs. 
     The size of Spool 2 is estimated with low confidence to be
     2,607,120 rows.  The estimated time for this step is 0.83 seconds. 

  5) We do an all-AMPs RETRIEVE step from TST.t by way of an all-rows
     scan with a condition of ("NOT (TST.t.c1 IS NULL)") into
     Spool 3 (all_amps) fanned out into 8 hash join partitions, which
     is built locally on the AMPs.  The size of Spool 3 is estimated
     with no confidence to be 2,483,439 rows.  The estimated time for
     this step is 14.54 seconds. 

  6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
     all-rows scan, which is joined to Spool 3 (Last Use) by way of an
     all-rows scan.  Spool 2 and Spool 3 are joined using a hash join
     of 8 partitions, with a join condition of ("c1 = c1"). 
     The result goes into Spool 1 (group_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with no confidence
     to be 2,483,439 rows.  The estimated time for this step is 15.57
     seconds.

ハッシュ結合とは、小さい方のテーブル(=マスタ)をメモリ上に展開してハッシュ表とし、それに対して大きい方(=トランザクション)をぶつけるものだが、マスタの件数が多すぎて全件メモリに載らない場合は、Teradataはマスタをメモリに載るサイズに小分けし、トランザクションもそれに合わせて分割してから、複数回に分けて結合することで対応する。

上記の実行計画でやっていることは、
4)でマスタを8分割して全AMPにコピーし、
5)でトランザクションも8分割してスプールに書き出し、
6)で8回に分けて両者を結合している。
single partition hash joinに比べて、5)でトランザクションを「読んで書いて」する手間が増えているので遅い。

Teradataがどっちのhash joinを選択するかをアプリケーションからコントロールする方法はない(Teradataにhint句はない)。
ただし、ハッシュ表を展開するメモリのサイズはHTMemAllocというシステムパラメータになっているので、これを拡大して single partition hash join に倒れやすくする、ということはできるのかもしれない。やったことないけど。

*1:全件スキャンするのにブロックの順番はどうやって維持するのか。マルチブロックリードしないのか。という疑問があるかもしれませんが、それについてはまた

2010-09-13

Teradata SQLパフォーマンス改善Tips(2) - not in で比較する列にはnot null制約を付ける

Teradata

こんなSQLでパフォーマンスが出ない場合、

	SELECT *
	FROM t1
	WHERE c1 NOT IN (
		SELECT c1 FROM t2
	)

not inで比較してる列＝t1.c1, t2.c1 にnot null制約が付いているか確認してみるといい。
もしもc1にnullがあり得ないにもかかわらず、not null制約が抜け落ちているなら、制約を付けることによって、多分2倍ぐらい速くなるだろう。

t1.c1, t2.c1にnot null制約が付いている場合、上記SQLの実行計画(の核になる部分)はこうなる：

	  4) We execute the following steps in parallel. 
	       1) We do an all-AMPs RETRIEVE step from TST.t1 by way of an
	          all-rows scan with no residual conditions into Spool 2
	          (all_amps), which is built locally on the AMPs.  Then we do a
	          SORT to order Spool 2 by the hash code of (TST.t1.c1). 
	          The size of Spool 2 is estimated with low confidence to be
	          1,537,850 rows (138,406,500 bytes).  The estimated time for
	          this step is 1 minute and 11 seconds. 
	       2) We do an all-AMPs RETRIEVE step from TST.t2 by way of an
	          all-rows scan with no residual conditions into Spool 3
	          (all_amps), which is duplicated on all AMPs.  Then we do a
	          SORT to order Spool 3 by the hash code of (TST.t2.c1). 
	          The size of Spool 3 is estimated with low confidence to be
	          149,716 rows (3,293,752 bytes).  The estimated time for this
	          step is 3.34 seconds. 
	  5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
	     all-rows scan, which is joined to Spool 3 (Last Use) by way of an
	     all-rows scan.  Spool 2 and Spool 3 are joined using an exclusion
	     merge join, with a join condition of ("c1 = c1"). 
	     The result goes into Spool 1 (group_amps), which is built locally
	     on the AMPs.  The size of Spool 1 is estimated with index join
	     confidence to be 1,537,850 rows (150,709,300 bytes).  The
	     estimated time for this step is 22.97 seconds.

150万件のt1を各AMP上でc1順にソート
7.5万件のt2を全AMPにコピーしてc1順にソート
1,2の結果をexclusion merge joinして「t2にないもの」を抽出

という、素直な実行計画になっている。

ここで、t1.c1, t2.c1からnot null制約を削除すると、こんな風になってしまう：

	  4) We do an all-AMPs SUM step to aggregate from TST.t1 by way of an
	     all-rows scan with no residual conditions.  Aggregate Intermediate
	     Results are computed globally, then placed in Spool 4. 
	  5) We do an all-AMPs SUM step to aggregate from TST.t2 by way of an
	     all-rows scan with no residual conditions.  Aggregate Intermediate
	     Results are computed globally, then placed in Spool 7. 
	  6) We execute the following steps in parallel. 
	       1) We do an all-AMPs RETRIEVE step from Spool 7 (Last Use) by
	          way of an all-rows scan into Spool 2 (all_amps), which is
	          duplicated on all AMPs. 
	       2) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by
	          way of an all-rows scan into Spool 3 (all_amps), which is
	          duplicated on all AMPs. 
	  7) We execute the following steps in parallel. 
	       1) We do an all-AMPs RETRIEVE step from TST.t1 by way of an
	          all-rows scan with no residual conditions into Spool 6
	          (all_amps), which is built locally on the AMPs.  Then we do a
	          SORT to order Spool 6 by the hash code of (TST.t1.c1),
	          and null value information in Spool 3 and Spool 2.  Skip this
	          retrieve step if null exists.  The size of Spool 6 is
	          estimated with low confidence to be 1,535,724 rows (
	          138,215,160 bytes).  The estimated time for this step is 1
	          minute and 11 seconds. 
	       2) We do an all-AMPs RETRIEVE step from TST.t2 by way of an
	          all-rows scan with no residual conditions into Spool 9
	          (all_amps), which is duplicated on all AMPs.  Then we do a
	          SORT to order Spool 9 by the hash code of (TST.t2.c1),
	          and null value information in Spool 3 and Spool 2.  Skip this
	          retrieve step if null exists.  The size of Spool 9 is
	          estimated with low confidence to be 149,716 rows (3,293,752
	          bytes).  The estimated time for this step is 3.34 seconds. 
	  8) We execute the following steps in parallel. 
	       1) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of
	          an all-rows scan, which is joined to Spool 9 (Last Use) by
	          way of an all-rows scan.  Spool 6 and Spool 9 are joined
	          using an exclusion merge join, with a join condition of (
	          "c1 = c1"), and null value information in Spool
	          3 and Spool 2.  Skip this join step if null exists.  The
	          result goes into Spool 1 (group_amps), which is built locally
	          on the AMPs.  The size of Spool 1 is estimated with index
	          join confidence to be 1,535,724 rows (150,500,952 bytes). 
	          The estimated time for this step is 29.22 seconds. 
	       2) We do an all-AMPs RETRIEVE step from TST.t1 by way of an
	          all-rows scan with no residual conditions into Spool 10
	          (all_amps), which is built locally on the AMPs.  Then we do a
	          SORT to order Spool 10 by the hash code of (TST.t1.c1),
	          and null value information in Spool 3 and Spool 2.  Skip this
	          retrieve step if there is no null.  The size of Spool 10 is
	          estimated with low confidence to be 1,535,724 rows (
	          138,215,160 bytes).  The estimated time for this step is 1
	          minute and 11 seconds. 
	       3) We do an all-AMPs RETRIEVE step from TST.t2 by way of an
	          all-rows scan with no residual conditions into Spool 11
	          (all_amps), which is duplicated on all AMPs.  Then we do a
	          SORT to order Spool 11 by the hash code of (TST.t2.c1),
	          and null value information in Spool 3 and Spool 2.  Skip this
	          retrieve step if there is no null.  The size of Spool 11 is
	          estimated with low confidence to be 149,716 rows (3,293,752
	          bytes).  The estimated time for this step is 3.34 seconds. 
	  9) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of an
	     all-rows scan, which is joined to Spool 11 (Last Use) by way of an
	     all-rows scan.  Spool 10 and Spool 11 are joined using an
	     exclusion merge join, with a join condition of ("c1 =
	     c1"), and null value information in Spool 3 (Last Use) and
	     Spool 2 (Last Use).  Skip this join step if there is no null.  The
	     result goes into Spool 1 (group_amps), which is built locally on
	     the AMPs.  The size of Spool 1 is estimated with index join
	     confidence to be 1,535,724 rows (150,500,952 bytes).  The
	     estimated time for this step is 29.22 seconds.

正直、これは何をやっているのか読み切れない。が、t1, t2ともに最低2回ずつテーブルスキャンしているので、not null版よりかなり遅くなることが予想できる。
not inは、サブクエリが返す結果表に1行でもnullが含まれている場合、メインクエリの結果を0件にする必要があるので、その辺の手当てのためにStepが増えているようだ。
何やってるかを理解するのは元気なときにやることにして、とりあえず「本来あるべきnot nullの付け忘れは、not in の動作を遅くする」ということだけ覚えておこう。

ちなみに上記SQLを not exists で書き直すと、not null制約を付け忘れたままでも、速い方の実行計画が出てくる。
これは「not existsはサブクエリ側の t2.c1 にnullが混じっていても、(not nullのように)メインクエリの結果が0件にはならない」ことに対応している、のだろう。