MongoDB: Various Index Types

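This article explains how to use MongoDB's special-purpose indexes. A hashed index indexes the hash of a field value, a multikey index is an index on array fields, and a geospatial index speeds up searches on geographic data. A TTL index is a special index that automatically deletes documents older than a specified age.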

MongoDB also provides text indexes, but because MongoDB does not support Japanese morphological analysis, this site does not cover them.

Prerequisites

Official documentation

The relevant official documentation is listed below.

Tested environment

Rocky Linux 8.6
MongoDB Server 6.0.2

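Hashed indexes

Generating test data

The indexes described without further qualification in "MongoDB Execution Plans and Basic Indexes" are B-tree indexes (a B-tree is a balanced multiway tree, not a binary tree). Besides B-tree indexes, MongoDB can also create indexes based on hash values. Let's generate test data to see how a hashed index behaves.

A B-tree index can serve range comparisons, but a hashed index cannot; it supports equality matches only. In MongoDB, hashed indexes exist mainly to support hashed sharding, and whether an equality lookup is actually faster than with a B-tree index depends on the data and the workload, so measure before relying on it. A hashed index is worth considering only when there is no requirement for range comparisons, now or in the future.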

db.inventory.drop()
db.createCollection('inventory')

db.inventory.insertMany([
   { _id: 0, item: "journal", qty: 25, size: { h: 14, w: 21, uom: "cm" }, status: "A" },
   { _id: 1, item: "notebook", qty: 50, size: { h: 8.5, w: 11, uom: "in" }, status: "A" },
   { _id: 2, item: "paper", qty: 100, size: { h: 8.5, w: 11, uom: "in" }, status: "D" },
   { _id: 3, item: "planner", qty: 75, size: { h: 22.85, w: 30, uom: "cm" }, status: "D" },
   { _id: 4, item: "postcard", qty: 45, size: { h: 10, w: 15.25, uom: "cm" }, status: "A" }
]);

Verification

Check the behavior before creating the index. A search by item name shows COLLSCAN (a full collection scan) in the execution plan.

test> db.inventory.find( { item : "journal" } ).explain().queryPlanner
{
  namespace: 'test.inventory',
  indexFilterSet: false,
  parsedQuery: { item: { '$eq': 'journal' } },
  queryHash: '8545567D',
  planCacheKey: '8545567D',
  maxIndexedOrSolutionsReached: false,
  maxIndexedAndSolutionsReached: false,
  maxScansToExplodeReached: false,
  winningPlan: {
    stage: 'COLLSCAN',
    filter: { item: { '$eq': 'journal' } },
    direction: 'forward'
  },
  rejectedPlans: []
}

Create the hashed index by specifying "hashed" for the field name, as shown below.

db.inventory.createIndex( { item: "hashed" } )

Examine the execution plan after creating the index. The plan has changed to IXSCAN (an index scan). The keyPattern of { item: 'hashed' } also shows that the search uses the hashed index.

test> db.inventory.find( { item : "journal" } ).explain().queryPlanner
{
  namespace: 'test.inventory',
  indexFilterSet: false,
  parsedQuery: { item: { '$eq': 'journal' } },
  queryHash: '8545567D',
  planCacheKey: '42BAD16E',
  maxIndexedOrSolutionsReached: false,
  maxIndexedAndSolutionsReached: false,
  maxScansToExplodeReached: false,
  winningPlan: {
    stage: 'FETCH',
    filter: { item: { '$eq': 'journal' } },
    inputStage: {
      stage: 'IXSCAN',
      keyPattern: { item: 'hashed' },
      indexName: 'item_hashed',
      isMultiKey: false,
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: { item: [ '[5643570574010195245, 5643570574010195245]' ] }
    }
  },
  rejectedPlans: []
}
test>
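
The indexBounds in the plan above hold a single hashed value rather than the string "journal". The following is a conceptual sketch of why such an index can answer equality matches but not range comparisons. It runs in plain Node.js without MongoDB. The function toyHash (32-bit FNV-1a) and the names hashedIndex and findByItem are assumptions made for illustration; MongoDB uses its own internal hash function and storage format.

```javascript
// Conceptual sketch only: toyHash is a stand-in, not MongoDB's hash function.
function toyHash(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the value as an unsigned 32-bit integer
  }
  return h;
}

const docs = ["journal", "notebook", "paper", "planner", "postcard"]
  .map((item, _id) => ({ _id, item }));

// Build the "index": hash(item) -> list of matching _id values.
const hashedIndex = new Map();
for (const doc of docs) {
  const key = toyHash(doc.item);
  if (!hashedIndex.has(key)) hashedIndex.set(key, []);
  hashedIndex.get(key).push(doc._id);
}

// Equality match: hash the search value, probe one point in the index,
// then re-check the real value (similar to the FETCH stage with its filter).
function findByItem(value) {
  const candidates = hashedIndex.get(toyHash(value)) || [];
  return candidates.map((id) => docs[id]).filter((doc) => doc.item === value);
}

console.log(findByItem("journal"));

// The hash values follow no alphabetical order, so a condition such as
// item >= "p" cannot be turned into one contiguous range of index keys.
console.log(docs.map((doc) => [doc.item, toyHash(doc.item)]));
```

Because neighbouring strings map to unrelated hash values, a range condition would have to examine every key, which is why only equality lookups benefit from this kind of index.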

Multikey indexes

Generating test data

A multikey index is an index that applies when a field holds an array. As an example, consider movie review data like the following, where the ratings are stored as an array.

db.movieReview.drop()
db.createCollection('movieReview')

db.movieReview.insertMany([
  { _id: 5, type: "Action", item: "Top Gun Maverick", ratings: [ 5, 8, 9, 8, 7 ] },
  { _id: 6, type: "Action", item: "Matrix Reloaded", ratings: [ 5, 9, 6, 8 ] },
  { _id: 7, type: "Action", item: "Jurassic World", ratings: [ 9, 5, 8, 7, 8 ] },
  { _id: 8, type: "Jaws", item: "Ouija Shark", ratings: [ 4, 5, 2, 4, 6 ] },
  { _id: 9, type: "Jaws", item: "Alien vs Jaws", ratings: [ 2, 1, 3, 4, 1 ] }
])

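For array data like this, the $elemMatch operator finds documents whose array contains at least one element that satisfies the condition. For example, to find movies that contain a rating of 1, run the following.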

test> db.movieReview.find( { ratings : { $elemMatch: { $eq: 1 } } } )
[
  {
    _id: 9,
    type: 'Jaws',
    item: 'Alien vs Jaws',
    ratings: [ 2, 1, 3, 4, 1 ]
  }
]
test>

To find movies that contain at least one rating from 1 through 3, search as follows.

test> db.movieReview.find( { ratings : { $elemMatch: { $gte: 1, $lte: 3 } } } )
[
  {
    _id: 8,
    type: 'Jaws',
    item: 'Ouija Shark',
    ratings: [ 4, 5, 2, 4, 6 ]
  },
  {
    _id: 9,
    type: 'Jaws',
    item: 'Alien vs Jaws',
    ratings: [ 2, 1, 3, 4, 1 ]
  }
]
test>
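
The matching rule of the two $elemMatch queries above can be sketched in plain JavaScript. This is only an illustration of the semantics under the assumption that ratings is an array of numbers; the helper name elemMatchRange is hypothetical and is not part of MongoDB.

```javascript
// Same documents as the movieReview test data (only the fields needed here).
const movies = [
  { _id: 5, ratings: [5, 8, 9, 8, 7] },
  { _id: 6, ratings: [5, 9, 6, 8] },
  { _id: 7, ratings: [9, 5, 8, 7, 8] },
  { _id: 8, ratings: [4, 5, 2, 4, 6] },
  { _id: 9, ratings: [2, 1, 3, 4, 1] },
];

// Emulates { ratings: { $elemMatch: { $gte: lo, $lte: hi } } }:
// a document matches when a single element satisfies both bounds.
function elemMatchRange(docs, lo, hi) {
  return docs.filter(
    (doc) => Array.isArray(doc.ratings) && doc.ratings.some((r) => r >= lo && r <= hi)
  );
}

console.log(elemMatchRange(movies, 1, 1).map((d) => d._id)); // rating equal to 1
console.log(elemMatchRange(movies, 1, 3).map((d) => d._id)); // rating from 1 through 3
```

The key point is that both bounds are tested against the same array element, which is what distinguishes $elemMatch from listing the two conditions directly on the array field.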

Verification

Check the behavior before creating the index. The execution plan for finding movies that have a rating of 1 or lower shows COLLSCAN (a full collection scan).

test> db.movieReview.find( { ratings : { $elemMatch: { $lte: 1 } } } ).explain().queryPlanner
{
  namespace: 'test.movieReview',
  indexFilterSet: false,
  parsedQuery: { ratings: { '$elemMatch': { '$lte': 1 } } },
  queryHash: 'ED2492B3',
  planCacheKey: 'ED2492B3',
  maxIndexedOrSolutionsReached: false,
  maxIndexedAndSolutionsReached: false,
  maxScansToExplodeReached: false,
  winningPlan: {
    stage: 'COLLSCAN',
    filter: { ratings: { '$elemMatch': { '$lte': 1 } } },
    direction: 'forward'
  },
  rejectedPlans: []
}
test>

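Create the multikey index. The operation is exactly the same as creating a single-field index; MongoDB treats the index as multikey automatically when the indexed field holds an array.

The procedure is omitted here, but a multikey index can also be combined with a compound index.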

db.movieReview.createIndex( { ratings: 1 } )

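Examine the execution plan after creating the index. The plan has changed to IXSCAN (an index scan), and isMultiKey: true shows that the index is a multikey index.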

test> db.movieReview.find( { ratings : { $elemMatch: { $lte: 1 } } } ).explain().queryPlanner
{
  namespace: 'test.movieReview',
  indexFilterSet: false,
  parsedQuery: { ratings: { '$elemMatch': { '$lte': 1 } } },
  queryHash: 'ED2492B3',
  planCacheKey: '01BDAE38',
  maxIndexedOrSolutionsReached: false,
  maxIndexedAndSolutionsReached: false,
  maxScansToExplodeReached: false,
  winningPlan: {
    stage: 'FETCH',
    filter: { ratings: { '$elemMatch': { '$lte': 1 } } },
    inputStage: {
      stage: 'IXSCAN',
      keyPattern: { ratings: 1 },
      indexName: 'ratings_1',
      isMultiKey: true,
      multiKeyPaths: { ratings: [ 'ratings' ] },
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: { ratings: [ '[-inf.0, 1]' ] }
    }
  },
  rejectedPlans: []
}
test>
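
The bounds '[-inf.0, 1]' in the plan above make sense once the index is pictured as holding one entry per array element. The sketch below is a simplified model in plain JavaScript, not MongoDB's actual storage format. The names buildMultikeyEntries and scanUpTo are hypothetical, and storing one entry per distinct value per document is an assumption of this model.

```javascript
const movies = [
  { _id: 5, ratings: [5, 8, 9, 8, 7] },
  { _id: 6, ratings: [5, 9, 6, 8] },
  { _id: 7, ratings: [9, 5, 8, 7, 8] },
  { _id: 8, ratings: [4, 5, 2, 4, 6] },
  { _id: 9, ratings: [2, 1, 3, 4, 1] },
];

// One index entry per distinct array element per document, sorted by key.
function buildMultikeyEntries(docs) {
  const entries = [];
  for (const doc of docs) {
    for (const value of new Set(doc.ratings)) {
      entries.push({ key: value, id: doc._id });
    }
  }
  return entries.sort((a, b) => a.key - b.key || a.id - b.id);
}

// Range scan over the sorted entries for keys in [-inf, upper].
function scanUpTo(entries, upper) {
  const ids = new Set(); // a document may be reached through several entries
  for (const entry of entries) {
    if (entry.key > upper) break; // entries are sorted, so the scan can stop early
    ids.add(entry.id);
  }
  return [...ids];
}

const entries = buildMultikeyEntries(movies);
console.log(entries.length);       // number of index entries in this model
console.log(scanUpTo(entries, 1)); // documents with a rating of 1 or lower
```

Because the entries are sorted by element value, the scan touches only the keys inside the bounds instead of reading every document.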

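TTL indexes

Generating test data

A TTL index watches a Date field and automatically deletes documents whose value is older than a specified period. It can be used, for example, to delete users who have not been active for three months.

Let's verify how it works. First, create a script that inserts one test document every 10 seconds. Run this script and leave it running for a while.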

#!/bin/bash

mongosh --quiet << EOF
db.student.drop()
db.createCollection('student')
EOF

seq=0

while true ; do
  mongosh --quiet << EOF
  db.student.insertOne({_id: ${seq}, name: "yamada${seq}", createdAt: new Date()})
EOF
  sleep 10
  seq=$((seq+1))
done

Verification

After leaving the script running for a while, check the number of documents and the first three documents. All of the inserted data is still present.

test> db.student.count()
DeprecationWarning: Collection.count() is deprecated. Use countDocuments or estimatedDocumentCount.
803
test> db.student.find().limit(3)
[
  {
    _id: 0,
    name: 'yamada0',
    createdAt: ISODate("2022-10-21T12:07:01.915Z")
  },
  {
    _id: 1,
    name: 'yamada1',
    createdAt: ISODate("2022-10-21T12:07:14.272Z")
  },
  {
    _id: 2,
    name: 'yamada2',
    createdAt: ISODate("2022-10-21T12:07:26.772Z")
  }
]
test>

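Create a TTL index that automatically deletes documents whose createdAt value is more than 300 seconds (5 minutes) old. Note that expiry is based on the value of the createdAt field, not on when the document was last updated.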

db.student.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 300 }
)
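
The expiry rule of this index can be sketched in plain JavaScript. This is a minimal model under stated assumptions, not MongoDB's implementation: the function name isExpired and the fixed value of now are hypothetical, and the sample timestamps are taken from the outputs in this section.

```javascript
// A document expires once createdAt + expireAfterSeconds lies in the past.
// Documents whose field is not a Date are treated as never expiring.
function isExpired(doc, now, expireAfterSeconds) {
  if (!(doc.createdAt instanceof Date)) return false;
  return doc.createdAt.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}

const now = new Date("2022-10-21T14:56:00.000Z"); // assumed "current time" for the example
const students = [
  { _id: 0, name: "yamada0", createdAt: new Date("2022-10-21T12:07:01.915Z") },
  { _id: 794, name: "yamada794", createdAt: new Date("2022-10-21T14:51:29.110Z") },
  { _id: 999, name: "noDate", createdAt: "2022-10-21" }, // a string, not a Date
];

// Keep only the documents that the TTL rule would leave in place.
const remaining = students.filter((doc) => !isExpired(doc, now, 300));
console.log(remaining.map((doc) => doc._id));
```

In this model the document created at 12:07 is removed, while the one created less than 300 seconds before now and the one without a Date value are kept.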

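Wait 2 to 3 minutes after creating the index; the background task that removes expired documents runs every 60 seconds, so deletion is not immediate. Then check the number of documents and the first three documents, and confirm that everything except roughly the last 300 seconds (5 minutes) of data has been deleted.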

test> db.student.count()
27
test> db.student.find().limit(3)
[
  {
    _id: 794,
    name: 'yamada794',
    createdAt: ISODate("2022-10-21T14:51:29.110Z")
  },
  {
    _id: 795,
    name: 'yamada795',
    createdAt: ISODate("2022-10-21T14:51:41.659Z")
  },
  {
    _id: 796,
    name: 'yamada796',
    createdAt: ISODate("2022-10-21T14:51:53.938Z")
  }
]
test>

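Geospatial indexes (2dsphere Indexes)

Generating test data

A geospatial index (2dsphere index) is built on longitude and latitude data and supports queries such as whether a location lies within a given straight-line distance of a point or inside a specified area.

As test data, register the locations of the prefectural capitals of the seven prefectures of the Kanto region. Write coordinates in decimal degrees, not in sexagesimal (degrees, minutes, seconds) notation, and list the longitude first, followed by the latitude.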

db.prefecture.drop()
db.createCollection('prefecture')

db.prefecture.insertMany( [
   {
      _id : 8,
      loc : { type: "Point", coordinates: [ 140.44667, 36.34139 ] },
      prefecture: "茨城",
      city : "水戸"
   },
   {
      _id : 9,
      loc : { type: "Point", coordinates: [ 139.88361, 36.56583 ] },
      prefecture: "栃木",
      city : "宇都宮"
   },
   {
      _id : 10,
      loc : { type: "Point", coordinates: [ 139.06083, 36.39111] },
      prefecture: "群馬",
      city : "前橋"
   },
   {
      _id : 11,
      loc : { type: "Point", coordinates: [ 139.64889, 35.85694] },
      prefecture: "埼玉",
      city : "さいたま"
   },
   {
      _id : 12,
      loc : { type: "Point", coordinates: [ 140.12333, 35.60472 ] },
      prefecture: "千葉",
      city : "千葉"
   },
   {
      _id : 13,
      loc : { type: "Point", coordinates: [ 139.69167, 35.68944 ] },
      prefecture: "東京",
      city : "新宿"
   },
   {
      _id : 14,
      loc : { type: "Point", coordinates: [ 139.6425, 35.44778 ] },
      prefecture: "神奈川",
      city : "横浜市"
   }

] )

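For example, the following query finds the prefectural capitals within 30 km of Tokyo (longitude 139.69167° E, latitude 35.68944° N). The second argument of $centerSphere is the radius of the circle around Tokyo, expressed as an angle in radians. Since the Earth's radius is about 6371 km, a 30 km radius is calculated as 30 / 6371.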

test> db.prefecture.find( { loc :
...                   { $geoWithin :
...                     { $centerSphere :
...                        [ [ 139.69167, 35.68944 ] , 30 / 6371 ]
...                 } } } )
[
  {
    _id: 11,
    loc: { type: 'Point', coordinates: [ 139.64889, 35.85694 ] },
    prefecture: '埼玉',
    city: 'さいたま'
  },
  {
    _id: 13,
    loc: { type: 'Point', coordinates: [ 139.69167, 35.68944 ] },
    prefecture: '東京',
    city: '新宿'
  },
  {
    _id: 14,
    loc: { type: 'Point', coordinates: [ 139.6425, 35.44778 ] },
    prefecture: '神奈川',
    city: '横浜市'
  }
]
test>
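
The radius conversion and the result above can be cross-checked without MongoDB. The sketch below converts 30 km to radians and computes great-circle distances with the haversine formula in plain JavaScript. It is an independent approximation, not the algorithm MongoDB uses internally; the names haversineKm and capitals and the romanized city labels are assumptions made for this example.

```javascript
const EARTH_RADIUS_KM = 6371;
const toRad = (deg) => (deg * Math.PI) / 180;

// Radius passed to $centerSphere: distance divided by the Earth's radius.
const radiusInRadians = 30 / EARTH_RADIUS_KM;
console.log(radiusInRadians);

// Great-circle distance between two [longitude, latitude] pairs (haversine formula).
function haversineKm([lon1, lat1], [lon2, lat2]) {
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Same coordinates as the prefecture test data.
const capitals = [
  { _id: 8, city: "Mito", coordinates: [140.44667, 36.34139] },
  { _id: 9, city: "Utsunomiya", coordinates: [139.88361, 36.56583] },
  { _id: 10, city: "Maebashi", coordinates: [139.06083, 36.39111] },
  { _id: 11, city: "Saitama", coordinates: [139.64889, 35.85694] },
  { _id: 12, city: "Chiba", coordinates: [140.12333, 35.60472] },
  { _id: 13, city: "Shinjuku", coordinates: [139.69167, 35.68944] },
  { _id: 14, city: "Yokohama", coordinates: [139.6425, 35.44778] },
];

const tokyo = [139.69167, 35.68944];
const within30Km = capitals.filter((c) => haversineKm(tokyo, c.coordinates) <= 30);
console.log(within30Km.map((c) => c._id));
```

The computed distances agree with the query result: Saitama and Yokohama are each less than 30 km from the Tokyo coordinates, while Chiba is roughly 40 km away.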

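Verification

Check the behavior before creating the index. The execution plan for the query above shows COLLSCAN (a full collection scan).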

test> db.prefecture.find( { loc :
...                   { $geoWithin :
...                     { $centerSphere :
...                        [ [ 139.69167, 35.68944 ] , 30 / 6371 ]
...                 } } } ).explain().queryPlanner
{
  namespace: 'test.prefecture',
  indexFilterSet: false,
  parsedQuery: {
    loc: {
      '$geoWithin': {
        '$centerSphere': [ [ 139.69167, 35.68944 ], 0.0047088369172814315 ]
      }
    }
  },
  queryHash: 'CD83596A',
  planCacheKey: 'CD83596A',
  maxIndexedOrSolutionsReached: false,
  maxIndexedAndSolutionsReached: false,
  maxScansToExplodeReached: false,
  winningPlan: {
    stage: 'COLLSCAN',
    filter: {
      loc: {
        '$geoWithin': {
          '$centerSphere': [ [ 139.69167, 35.68944 ], 0.0047088369172814315 ]
        }
      }
    },
    direction: 'forward'
  },
  rejectedPlans: []
}
test>

Create the geospatial index. To use a geospatial index, specify "2dsphere" for the field.

db.prefecture.createIndex( { loc : "2dsphere" } )

Examine the execution plan after creating the index. The plan has changed to IXSCAN (an index scan).

test> db.prefecture.find( { loc :
...                   { $geoWithin :
...                     { $centerSphere :
...                        [ [ 139.69167, 35.68944 ] , 30 / 6371 ]
...                 } } } ).explain().queryPlanner
{
  namespace: 'test.prefecture',
  indexFilterSet: false,
  parsedQuery: {
    loc: {
      '$geoWithin': {
        '$centerSphere': [ [ 139.69167, 35.68944 ], 0.0047088369172814315 ]
      }
    }
  },
  queryHash: 'CD83596A',
  planCacheKey: '207B42F4',
  maxIndexedOrSolutionsReached: false,
  maxIndexedAndSolutionsReached: false,
  maxScansToExplodeReached: false,
  winningPlan: {
    stage: 'FETCH',
    filter: {
      loc: {
        '$geoWithin': {
          '$centerSphere': [ [ 139.69167, 35.68944 ], 0.0047088369172814315 ]
        }
      }
    },
    inputStage: {
      stage: 'IXSCAN',
      keyPattern: { loc: '2dsphere' },
      indexName: 'loc_2dsphere',
      isMultiKey: false,
      multiKeyPaths: { loc: [] },
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: {
        loc: [
          '[6922032627268452352, 6922032627268452352]',
          '[6924354795826315264, 6924354795826315264]',
          '[6924357544605384705, 6924358094361198591]',
          '[6924358094361198592, 6924358094361198592]',
          '[6924359193872826368, 6924359193872826368]',
          '[6924372388012359680, 6924372388012359680]',
          '[6924376786058870784, 6924376786058870784]',
          '[6924378985082126337, 6924381184105381887]',
          '[6924381184105381889, 6924389980198404095]',
          '[6924389980198404097, 6924398776291426303]',
          '[6924398776291426305, 6924400975314681855]',
          '[6924403174337937408, 6924403174337937408]',
          '[6924405373361192961, 6924407572384448511]',
          '[6924407572384448512, 6924407572384448512]',
          '[6924407572384448513, 6924409771407704063]',
          '[6924411970430959616, 6924411970430959616]',
          '[6924415268965842944, 6924415268965842944]',
          '[6924415268965842945, 6924415818721656831]',
          '[6924416368477470721, 6924425164570492927]',
          '[6924425164570492929, 6924460348942581759]',
          '[6924477941128626176, 6924477941128626176]',
          '[6924486737221648385, 6924495533314670591]',
          '[6924495533314670592, 6924495533314670592]',
          '[6924495533314670593, 6924504329407692799]',
          '[6924513125500715008, 6924513125500715008]',
          '[6924521921593737217, 6924530717686759423]',
          '[6924530717686759425, 6924565902058848255]',
          '[6924565902058848256, 6924565902058848256]',
          '[6924583494244892672, 6924583494244892672]',
          '[6924592290337914881, 6924601086430937087]',
          '[6924636270803025920, 6924636270803025920]',
          '[6925410326988980224, 6925410326988980224]',
          '[6927169545593421824, 6927169545593421824]',
          '[6927222322151555072, 6927222322151555072]',
          '[6927235516291088384, 6927235516291088384]',
          '[6927238814825971712, 6927238814825971712]',
          '[6927239364581785601, 6927239914337599487]',
          '[6927239914337599489, 6927242113360855039]',
          '[6927244312384110592, 6927244312384110592]',
          '[6927257506523643904, 6927257506523643904]',
          '[6927310283081777152, 6927310283081777152]',
          '[6927380651825954816, 6927380651825954816]',
          '[6927662126802665472, 6927662126802665472]',
          '[6931039826523193344, 6931039826523193344]',
          '[6935543426150563840, 6935543426150563840]',
          '[6989586621679009792, 6989586621679009792]'
        ]
      }
    }
  },
  rejectedPlans: []
}
test>