MeCab Pluginと違って特にINSTALL PLUGINとかせずに即使える。
mysql> ALTER TABLE articles ADD FULLTEXT KEY (title, content) WITH PARSER ngram; Query OK, 0 rows affected (6 hours 25 min 20.67 sec) Records: 0 Duplicates: 0 Warnings: 0
( д ) ゚ ゚ 6時間半!? しょうがないとはいえ、MeCabのときは1時間半でお釣りが来てたのにと思うとちょっとしょんぼり。
時間もさることながら、このやり方(データをインポートしてからALTER TABLEでインデックス追加)だと容量がかなり持っていかれる。
[root@v157-7-235-254 mysql]# while true ; do > date ; df -h > echo "" > sleep 60 > done Tue Mar 10 10:13:41 JST 2015 Filesystem Size Used Avail Use% Mounted on /dev/mapper/VolGroup-lv_root 97G 16G 76G 18% / tmpfs 499M 0 499M 0% /dev/shm /dev/vda1 485M 32M 428M 7% /boot .. Tue Mar 10 16:12:45 JST 2015 Filesystem Size Used Avail Use% Mounted on /dev/mapper/VolGroup-lv_root 97G 74G 18G 81% / tmpfs 499M 0 499M 0% /dev/shm /dev/vda1 485M 32M 428M 7% /boot .. Tue Mar 10 16:44:51 JST 2015 Filesystem Size Used Avail Use% Mounted on /dev/mapper/VolGroup-lv_root 97G 26G 66G 28% / tmpfs 499M 0 499M 0% /dev/shm /dev/vda1 485M 32M 428M 7% /boot # lsof -p `pidof mysqld` COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME mysqld 3049 mysql cwd DIR 253,0 4096 919114 /usr/local/mysql/data mysqld 3049 mysql rtd DIR 253,0 4096 2 / mysqld 3049 mysql txt REG 253,0 193624979 919090 /usr/local/mysql/bin/mysqld mysqld 3049 mysql mem REG 253,0 5624 914553 /lib64/libaio.so.1.0.1 mysqld 3049 mysql DEL REG 253,0 914859 /lib64/libfreebl3.so mysqld 3049 mysql mem REG 253,0 49199027 2231548 /usr/local/mysql/lib/mecab/dic/ipadic_utf-8/sys.dic mysqld 3049 mysql mem REG 253,0 3463716 2231541 /usr/local/mysql/lib/mecab/dic/ipadic_utf-8/matrix.bin mysqld 3049 mysql mem REG 253,0 262496 2231546 /usr/local/mysql/lib/mecab/dic/ipadic_utf-8/char.bin mysqld 3049 mysql mem REG 253,0 3988451 2231491 /usr/local/mysql/lib/plugin/libpluginmecab.so mysqld 3049 mysql mem REG 253,0 65928 914006 /lib64/libnss_files-2.12.so mysqld 3049 mysql DEL REG 253,0 913936 /lib64/libc-2.12.so mysqld 3049 mysql DEL REG 253,0 915115 /lib64/libgcc_s-4.4.7-20120601.so.1.#prelink#.JjQM5q mysqld 3049 mysql DEL REG 253,0 913990 /lib64/libm-2.12.so mysqld 3049 mysql DEL REG 253,0 263139 /usr/lib64/libstdc++.so.6.0.13.#prelink#.htv5XJ mysqld 3049 mysql DEL REG 253,0 914081 /lib64/librt-2.12.so mysqld 3049 mysql DEL REG 253,0 913976 /lib64/libdl-2.12.so mysqld 3049 mysql DEL REG 253,0 913940 /lib64/libcrypt-2.12.so.#prelink#.3QGWZ8 mysqld 3049 mysql DEL REG 253,0 913960 /lib64/libpthread-2.12.so.#prelink#.xdy3QH mysqld 3049 mysql DEL REG 253,0 913930 /lib64/ld-2.12.so mysqld 3049 mysql mem REG 253,0 5690 2231545 /usr/local/mysql/lib/mecab/dic/ipadic_utf-8/unk.dic mysqld 3049 mysql 0r CHR 1,3 0t0 3788 /dev/null mysqld 3049 mysql 1w REG 253,0 15868 919127 /usr/local/mysql/data/error.log mysqld 3049 mysql 2w REG 253,0 15868 919127 /usr/local/mysql/data/error.log mysqld 3049 mysql 3uW REG 253,0 79691776 919158 /usr/local/mysql/data/ibdata1 .. mysqld 3049 mysql 45uW REG 253,0 11286872064 1305615 /usr/local/mysql/data/wikipedia/articles.ibd mysqld 3049 mysql 46uW REG 253,0 98304 1305604 /usr/local/mysql/data/wikipedia/FTS_00000000000000f7_000000000000013a_INDEX_1.ibd mysqld 3049 mysql 47uW REG 253,0 98304 1305607 /usr/local/mysql/data/wikipedia/FTS_00000000000000f7_000000000000013a_INDEX_2.ibd mysqld 3049 mysql 48uW REG 253,0 98304 1305611 /usr/local/mysql/data/wikipedia/FTS_00000000000000f7_000000000000013a_INDEX_3.ibd mysqld 3049 mysql 49uW REG 253,0 98304 1305613 /usr/local/mysql/data/wikipedia/FTS_00000000000000f7_000000000000013a_INDEX_4.ibd mysqld 3049 mysql 50uW REG 253,0 98304 1305614 /usr/local/mysql/data/wikipedia/FTS_00000000000000f7_000000000000013a_INDEX_5.ibd mysqld 3049 mysql 51uW REG 253,0 98304 1305616 /usr/local/mysql/data/wikipedia/FTS_00000000000000f7_000000000000013a_INDEX_6.ibd mysqld 3049 mysql 52uW REG 253,0 98304 1305603 /usr/local/mysql/data/wikipedia/FTS_00000000000000f7_DELETED.ibd mysqld 3049 mysql 53uW REG 253,0 98304 1305605 /usr/local/mysql/data/wikipedia/FTS_00000000000000f7_DELETED_CACHE.ibd mysqld 3049 mysql 54uW REG 253,0 98304 1305606 /usr/local/mysql/data/wikipedia/FTS_00000000000000f7_BEING_DELETED.ibd mysqld 3049 mysql 55uW REG 253,0 98304 1305610 /usr/local/mysql/data/wikipedia/FTS_00000000000000f7_BEING_DELETED_CACHE.ibd mysqld 3049 mysql 56uW REG 253,0 98304 1305612 /usr/local/mysql/data/wikipedia/FTS_00000000000000f7_CONFIG.ibd mysqld 3049 mysql 58u REG 253,0 10842275840 1046745 /data/tmp/ibDDeVbj (deleted) mysqld 3049 mysql 59u REG 253,0 696254464 1046746 /data/tmp/ibaNLTUS (deleted) mysqld 3049 mysql 60u REG 253,0 1790967808 1046747 /data/tmp/ibNm7Vmi (deleted) mysqld 3049 mysql 61u REG 253,0 1623195648 1046748 /data/tmp/ibl17OJP (deleted) mysqld 3049 mysql 62u REG 253,0 8006926336 1046749 /data/tmp/ibBoSvqQ (deleted) mysqld 3049 mysql 63u REG 253,0 8006926336 1046750 /data/tmp/ibudnROb (deleted) mysqld 3049 mysql 64u REG 253,0 10904141824 1046751 /data/tmp/ibNbFRxL (deleted) mysqld 3049 mysql 65u REG 253,0 698351616 1046752 /data/tmp/ibYchSgl (deleted) mysqld 3049 mysql 66u REG 253,0 1802502144 1046753 /data/tmp/ibYYJGHW (deleted) mysqld 3049 mysql 67u REG 253,0 1631584256 1046754 /data/tmp/ibQOsAMA (deleted) mysqld 3049 mysql 68u REG 253,0 8050966528 1046755 /data/tmp/ibiSAO6i (deleted) mysqld 3049 mysql 69u REG 253,0 8050966528 1046756 /data/tmp/ibGrGYaE (deleted) mysqld 3049 mysql 70u REG 253,0 361758720 1046757 /data/tmp/ibQvfudp (deleted) mysqld 3049 mysql 71u REG 253,0 359661568 1046758 /data/tmp/ibN0BTAJ (deleted)
最終的に元に戻るとはいえ、tmpdirに数GB単位のテンポラリーファイルをいくつもつくられてしかもそれが途中で解放されないのは結構つらい。。
まあそこはそれとして
mysql> CREATE TABLE t1 (val varchar(32)); Query OK, 0 rows affected (0.26 sec) mysql> INSERT INTO t1 VALUES ('にっこにっこにー'); Query OK, 1 row affected (0.08 sec) mysql> SELECT * FROM t1; +--------------------------+ | val | +--------------------------+ | にっこにっこにー | +--------------------------+ 1 row in set (0.00 sec)
こんなサンプルがあるじゃろ?
mysql> SELECT @@ngram_token_size; +--------------------+ | @@ngram_token_size | +--------------------+ | 2 | +--------------------+ 1 row in set (0.00 sec) mysql> ALTER TABLE t1 ADD FULLTEXT KEY idx_2gram(val) WITH PARSER ngram; Query OK, 0 rows affected (0.44 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> SET GLOBAL innodb_ft_aux_table = 'd1/t1'; Query OK, 0 rows affected (0.00 sec) mysql> OPTIMIZE TABLE t1; +-------+----------+----------+-------------------------------------------------------------------+ | Table | Op | Msg_type | Msg_text | +-------+----------+----------+-------------------------------------------------------------------+ | d1.t1 | optimize | note | Table does not support optimize, doing recreate + analyze instead | | d1.t1 | optimize | status | OK | +-------+----------+----------+-------------------------------------------------------------------+ 2 rows in set (0.36 sec) mysql> SELECT * FROM information_schema.innodb_ft_index_cache ORDER BY position; +--------+--------------+-------------+-----------+--------+----------+ | WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION | +--------+--------------+-------------+-----------+--------+----------+ | にっ | 2 | 2 | 1 | 2 | 0 | | っこ | 2 | 2 | 1 | 2 | 3 | | こに | 2 | 2 | 1 | 2 | 6 | | っこ | 2 | 2 | 1 | 2 | 9 | | にっ | 2 | 2 | 1 | 2 | 9 | | こに | 2 | 2 | 1 | 2 | 9 | | にー | 2 | 2 | 1 | 2 | 18 | +--------+--------------+-------------+-----------+--------+----------+ 7 rows in set (0.02 sec) mysql> SELECT * FROM information_schema.innodb_ft_config; +---------------------------+-------+ | KEY | VALUE | +---------------------------+-------+ | optimize_checkpoint_limit | 180 | | synced_doc_id | 0 | | stopword_table_name | | | use_stopword | 1 | +---------------------------+-------+ 4 rows in set (0.03 sec)
ngram_token_size変数でトークンのサイズを制御できる。この変数はオンライン変更できないので、my.cnfに設定してからmysqld再起動。
mysql> SET GLOBAL innodb_ft_aux_table = 'd1/t1'; Query OK, 0 rows affected (0.00 sec) mysql> OPTIMIZE TABLE t1; +-------+----------+----------+-------------------------------------------------------------------+ | Table | Op | Msg_type | Msg_text | +-------+----------+----------+-------------------------------------------------------------------+ | d1.t1 | optimize | note | Table does not support optimize, doing recreate + analyze instead | | d1.t1 | optimize | status | OK | +-------+----------+----------+-------------------------------------------------------------------+ 2 rows in set (0.31 sec) mysql> SELECT * FROM information_schema.innodb_ft_index_cache ORDER BY position; +-----------+--------------+-------------+-----------+--------+----------+ | WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION | +-----------+--------------+-------------+-----------+--------+----------+ | にっこ | 2 | 2 | 1 | 2 | 0 | | っこに | 2 | 2 | 1 | 2 | 3 | | こにっ | 2 | 2 | 1 | 2 | 6 | | にっこ | 2 | 2 | 1 | 2 | 9 | | っこに | 2 | 2 | 1 | 2 | 9 | | こにー | 2 | 2 | 1 | 2 | 15 | +-----------+--------------+-------------+-----------+--------+----------+ 6 rows in set (0.00 sec) mysql> SELECT * FROM information_schema.innodb_ft_config; +---------------------------+-------+ | KEY | VALUE | +---------------------------+-------+ | optimize_checkpoint_limit | 180 | | synced_doc_id | 0 | | stopword_table_name | | | use_stopword | 1 | +---------------------------+-------+ 4 rows in set (0.01 sec)
あ、あれ? 再作成しなくても切りなおされるの?;
0 件のコメント :
コメントを投稿