qop's notes: MongoDB Indexes

2020-04-30

MongoDB Indexes

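MongoDB is conceptually close to a traditional relational database: both revolve around CRUD. Of the four operations, reading (querying) data has the most variations and is probably the most complex. Once you are comfortable with reads, the other three are relatively simple.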

Reads often involve sorting, and indexes exist to keep them fast. These include single field indexes and compound indexes.

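First, use explain() to see the plan chosen for a read, as shown below (unrelated output omitted). In queryPlanner.winningPlan the stage value is COLLSCAN, short for collection scan, meaning every document is read one by one.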

> db.inventory.find()
{ "_id" : ObjectId("5ea4715cda0c749138d46e52"), "item" : "paper", "qty" : 100 }
{ "_id" : ObjectId("5ea4715cda0c749138d46e53"), "item" : "journal", "quantity" : 25 }
{ "_id" : ObjectId("5ea4715cda0c749138d46e54"), "item" : "planner", "qty" : 75 }
{ "_id" : ObjectId("5ea4715cda0c749138d46e55"), "item" : "postcard", "qty" : 45 }
> db.inventory.find({item: 'postcard'}).explain()
{
 "queryPlanner" : {
  "winningPlan" : {
   "stage" : "COLLSCAN",
   "filter" : {
    "item" : {
     "$eq" : "postcard"
    }
   },
   "direction" : "forward"
  }
 }
}

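Next, use createIndex() to add an ascending index on the item field. numIndexesAfter is the number of indexes once the new one has been added. Running explain() again, the stage becomes FETCH with an IXSCAN input stage: the index is scanned first and only the matching documents are fetched. In principle, reads that filter on item should now be considerably faster.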

> db.inventory.createIndex({item: 1})
{
 "createdCollectionAutomatically" : false,
 "numIndexesBefore" : 1,
 "numIndexesAfter" : 2,
 "ok" : 1
}
> db.inventory.getIndexes()  // list all indexes
[
 {
  "v" : 2,
  "key" : {
   "_id" : 1
  },
  "name" : "_id_",
  "ns" : "test.inventory"
 },
 {
  "v" : 2,
  "key" : {
   "item" : 1
  },
  "name" : "item_1",
  "ns" : "test.inventory"
 }
]
> db.inventory.find({item: 'postcard'}).explain()
{
 "queryPlanner" : {
  "winningPlan" : {
   "stage" : "FETCH",
   "inputStage" : {
    "stage" : "IXSCAN",
    "keyPattern" : {
     "item" : 1
    },
    "indexName" : "item_1",
    "isMultiKey" : false,
    "multiKeyPaths" : {
     "item" : [ ]
    },
    "isUnique" : false,
    "isSparse" : false,
    "isPartial" : false,
    "indexVersion" : 2,
    "direction" : "forward",
    "indexBounds" : {
     "item" : [
      "[\"postcard\", \"postcard\"]"
     ]
    }
   }
  }
 }
}
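
When comparing explain() output before and after adding an index, it can help to reduce each winningPlan to its chain of stage names. The following is a minimal sketch in plain JavaScript; planStages is a hypothetical helper, not part of the mongo shell, and the two sample objects are trimmed copies of the outputs above. It only follows the single inputStage chain, which is enough for the simple plans in this post.

```javascript
// Hypothetical helper: walk a winningPlan object from explain() and
// collect the stage names, outermost first. Only the single-child
// "inputStage" chain is followed.
function planStages(winningPlan) {
  const stages = [];
  let node = winningPlan;
  while (node) {
    stages.push(node.stage);
    node = node.inputStage;
  }
  return stages;
}

// Trimmed copies of the two winningPlan outputs shown above.
const beforeIndex = { stage: "COLLSCAN", direction: "forward" };
const afterIndex = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", indexName: "item_1", direction: "forward" },
};

console.log(planStages(beforeIndex).join(" -> ")); // COLLSCAN
console.log(planStages(afterIndex).join(" -> "));  // FETCH -> IXSCAN
```

Seeing IXSCAN anywhere in the chain is the sign that the index on item is being used.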

For compound indexes, following the official documentation, first insert some test data and then create a compound index with {a: 1, b: 1}:

> db.compoundTest.insertMany([
...     {a: 10, b: 2}, {a: 10, b: 8}, {a: 10, b: 6},
...     {a: 70, b: 2}, {a: 70, b: 8}, {a: 70, b: 6},
...     {a: 30, b: 2}, {a: 30, b: 8}, {a: 30, b: 6},
... ])
> db.compoundTest.createIndex({a:1, b:1})
{
 "createdCollectionAutomatically" : false,
 "numIndexesBefore" : 1,
 "numIndexesAfter" : 2,
 "ok" : 1
}
> db.compoundTest.getIndexes()
[
 {
  "v" : 2,
  "key" : {
   "_id" : 1
  },
  "name" : "_id_",
  "ns" : "test.compoundTest"
 },
 {
  "v" : 2,
  "key" : {
   "a" : 1,
   "b" : 1
  },
  "name" : "a_1_b_1",
  "ns" : "test.compoundTest"
 }
]

Below are the test queries, each with the resulting queryPlanner.winningPlan.stage:

db.compoundTest.explain().aggregate({$sort:{a: 1}})  // FETCH
db.compoundTest.explain().aggregate({$sort:{a: -1}})  // FETCH
db.compoundTest.explain().aggregate({$sort:{a: 1, b:1}})  // FETCH
db.compoundTest.explain().aggregate({$sort:{a: 1, b:-1}})  // COLLSCAN
db.compoundTest.explain().aggregate({$sort:{a: -1, b:-1}})  // FETCH
db.compoundTest.explain().aggregate({$sort:{a: -1, b:1}})  // COLLSCAN

db.compoundTest.explain().aggregate({$sort:{b: 1}})  // COLLSCAN
db.compoundTest.explain().aggregate({$sort:{b: -1}})  // COLLSCAN

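The results show that field order matters in a compound index. We put a first and b second, so sorting on a alone uses the index in either direction. When sorting on both a and b, only {a: 1, b: 1} and {a: -1, b: -1} work: the first matches the index as defined, and the second is its exact reverse, which the index can serve by being traversed backward. Any sort that leads with b gets no benefit from the index, because b alone is not a prefix of the index keys, so the query falls back to scanning every document.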
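
The rule behind these results can be written down as a small check. This is only a sketch in plain JavaScript under simplifying assumptions: sortCanUseIndex is a hypothetical function, not a MongoDB API, and it looks at the sort keys alone, ignoring any query filter. It treats a sort as index-friendly when its keys form a prefix of the index keys and the directions either all match the index or are all inverted.

```javascript
// Hypothetical check: can `sort` be answered by walking `index` forward
// or backward? The sort keys must be a prefix of the index keys, with
// every direction matching the index or every direction inverted.
function sortCanUseIndex(index, sort) {
  const indexKeys = Object.entries(index);
  const sortKeys = Object.entries(sort);
  if (sortKeys.length === 0 || sortKeys.length > indexKeys.length) return false;

  let sameDirection = true;
  let reversed = true;
  for (let i = 0; i < sortKeys.length; i++) {
    const [indexField, indexDir] = indexKeys[i];
    const [sortField, sortDir] = sortKeys[i];
    if (indexField !== sortField) return false; // not a prefix of the index
    if (sortDir !== indexDir) sameDirection = false;
    if (sortDir !== -indexDir) reversed = false;
  }
  return sameDirection || reversed;
}

const index = { a: 1, b: 1 };
const sorts = [
  { a: 1 }, { a: -1 },
  { a: 1, b: 1 }, { a: 1, b: -1 },
  { a: -1, b: -1 }, { a: -1, b: 1 },
  { b: 1 }, { b: -1 },
];
for (const sort of sorts) {
  console.log(JSON.stringify(sort), sortCanUseIndex(index, sort));
}
```

For the eight sorts tested above, the check returns true exactly for the ones that reported FETCH and false for the ones that reported COLLSCAN.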
