Elasticsearch-84-Filesystem data modeling and search examples

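Data modeling

This example models data with a multi-level hierarchy, such as a filesystem, where file paths are a typical case of hierarchical data.

Adding data

First, create the index with a custom analyzer named paths, which uses the path_hierarchy tokenizer.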

PUT /fs
{
  "settings": {
    "analysis": {
      "analyzer": {
        "paths": { 
          "tokenizer": "path_hierarchy"
        }
      }
    }
  }
}

path_hierarchy tokenizer:
For example, given the path /a/b/c/d, the path_hierarchy tokenizer emits one token for each level of the hierarchy:
/a
/a/b
/a/b/c
/a/b/c/d
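
The splitting behavior can be approximated in plain Python. This is a minimal sketch for illustration only, not the tokenizer's actual implementation: the function name path_hierarchy_tokens is hypothetical, and it assumes the default "/" delimiter with no other tokenizer options set. Edge cases such as trailing or repeated delimiters may differ from what Elasticsearch produces.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Approximate the tokens emitted for a path: one token per hierarchy level."""
    tokens = []
    current = ""
    for i, part in enumerate(path.split(delimiter)):
        if i > 0:
            current += delimiter  # re-attach the delimiter between levels
        current += part
        if current:  # skip the empty piece before a leading delimiter
            tokens.append(current)
    return tokens


for token in path_hierarchy_tokens("/a/b/c/d"):
    print(token)
```

To see the real tokens on a running cluster, the _analyze API can be called against the fs index with the paths analyzer.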

Once the analyzer is defined, create the mapping manually.

PUT /fs/_mapping/file
{
  "properties": {
    "name": { 
      "type":  "keyword"
    },
    "path": { 
      "type":  "keyword",
      "fields": {
        "tree": { 
          "type":     "text",
          "analyzer": "paths"
        }
      }
    }
  }
}

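The path field is a keyword field, so it is indexed as a single unanalyzed term. It also defines a sub-field (multi-field), path.tree, which indexes the same value as text using the paths analyzer.

With the index and mapping in place, add a document.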

PUT /fs/file/1
{
  "name": "README.txt", 
  "path": "/workspace/projects/helloworld", 
  "contents": "这是我的第一个elasticsearch程序"
}
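
To make the difference between path and path.tree concrete, the sketch below models which terms end up in the index for this document and how an exact term lookup behaves against each field. It is a simplified simulation in plain Python under the assumption of default path_hierarchy settings. The helper names (path_hierarchy_tokens, indexed_terms, term_matches) are hypothetical and are not part of any Elasticsearch client.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    # Rough stand-in for the path_hierarchy tokenizer with default settings.
    parts = [p for p in path.split(delimiter) if p]
    prefix = delimiter if path.startswith(delimiter) else ""
    return [prefix + delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


def indexed_terms(path):
    # "path" is a keyword field: the whole value is kept as one term.
    # "path.tree" is a text field analyzed with the custom "paths" analyzer.
    return {
        "path": [path],
        "path.tree": path_hierarchy_tokens(path),
    }


def term_matches(terms, field, value):
    # A term query is not analyzed, so it behaves like an exact lookup.
    return value in terms[field]


terms = indexed_terms("/workspace/projects/helloworld")
print(terms["path.tree"])
print(term_matches(terms, "path", "/workspace"))       # False: only the full path is indexed
print(term_matches(terms, "path.tree", "/workspace"))  # True: every ancestor is a term
```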

Requirement 1

Find files whose contents include elasticsearch and that are located directly in the /workspace/projects/helloworld directory.

Search request:

GET /fs/file/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "contents": "elasticsearch"
          }
        },
        {
          "constant_score": {
            "filter": {
              "term": {
                "path": "/workspace/projects/helloworld"
              }
            }
          }
        }
      ]
    }
  }
}

Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.284885,
    "hits": [
      {
        "_index": "fs",
        "_type": "file",
        "_id": "1",
        "_score": 1.284885,
        "_source": {
          "name": "README.txt",
          "path": "/workspace/projects/helloworld",
          "contents": "这是我的第一个elasticsearch程序"
        }
      }
    ]
  }
}

Requirement 2

Find all files anywhere under the /workspace directory, at any depth, whose contents include elasticsearch.

GET /fs/file/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "contents": "elasticsearch"
          }
        },
        {
          "constant_score": {
            "filter": {
              "term": {
                "path.tree": "/workspace"
              }
            }
          }
        }
      ]
    }
  }
}

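This request is almost identical to the previous one. The only difference is that the filter targets the path.tree sub-field created earlier. Because path is a keyword field and is not analyzed, a term query on it matches only the complete path. path.tree, on the other hand, contains a token for every ancestor directory, including /workspace.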
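
Since the two requests differ only in the field used by the term filter, both bodies can be produced by one small helper. The following is a sketch, assuming the request body is built as a Python dict and sent with whatever HTTP or Elasticsearch client is in use. The function build_file_query and its recursive parameter are hypothetical names introduced here for illustration.

```python
import json


def build_file_query(keyword, directory, recursive=False):
    # recursive=False: exact directory match on the keyword field "path".
    # recursive=True: match any file under the directory via "path.tree".
    path_field = "path.tree" if recursive else "path"
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"contents": keyword}},
                    {
                        "constant_score": {
                            "filter": {"term": {path_field: directory}}
                        }
                    },
                ]
            }
        }
    }


exact = build_file_query("elasticsearch", "/workspace/projects/helloworld")
subtree = build_file_query("elasticsearch", "/workspace", recursive=True)
print(json.dumps(subtree, indent=2))
```

The printed body corresponds to the Requirement 2 request, and exact corresponds to the Requirement 1 request.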