Elasticsearch-22: Mapping in Detail

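What is a mapping

When you insert data directly into ES, it automatically creates the index, along with the type and its corresponding mapping.

The mapping defines the data type of each field.

Depending on its data type, a field is either matched exactly (exact value) or searched as full text (full text).

When the inverted index is built, an exact value is indexed whole, as a single term; full text first goes through analysis (tokenization and normalization), and only the resulting terms are added to the inverted index.

Searching also behaves differently for exact value fields and full text fields, mirroring how each was indexed: a search on an exact value field matches against the entire value, whereas the query string of a full text search is itself tokenized and normalized before being looked up in the inverted index.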
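
To make the difference concrete, here is a minimal Python sketch of a toy inverted index. It is not how ES is implemented: the `analyze`, `build_index`, and `search` helpers are hypothetical names made up for this illustration, and the "analysis" is just a regex split plus lowercasing.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analysis: split on non-alphanumerics (tokenization), then lowercase (normalization)."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def build_index(docs, full_text):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        # Exact value: the whole value is one term. Full text: one term per token.
        terms = analyze(value) if full_text else [value]
        for term in terms:
            index[term].add(doc_id)
    return index


def search(index, query, full_text):
    """Process the query the same way the field was indexed, then look up each term."""
    terms = analyze(query) if full_text else [query]
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits


docs = {1: "Hello World", 2: "hello there"}
exact_index = build_index(docs, full_text=False)
text_index = build_index(docs, full_text=True)

print(sorted(search(exact_index, "Hello World", full_text=False)))  # [1]
print(sorted(search(exact_index, "hello", full_text=False)))        # []
print(sorted(search(text_index, "HELLO", full_text=True)))          # [1, 2]
```

The exact value index only matches the complete original value, while the full text index matches any document sharing a normalized token with the query.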

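You can let ES's dynamic mapping create the mapping automatically, including choosing the data types, or you can create the mapping for an index's type by hand in advance and configure each field yourself: data type, indexing behavior, analyzer, and so on.

Summary: a mapping is the metadata of an index's type. Each type has its own mapping, which determines the data types, how the inverted index is built, and how searches behave.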

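Core data types

The core data types available in a mapping are:

String: string (split into text and keyword since ES 5.x)
Integer: byte, short, integer, long
Floating point: float, double
Boolean: boolean
Date: date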

Dynamic mapping: how values are mapped to data types

Data	Mapped data type
true/false	boolean
123	long
123.45	double
2017-01-01	date
"hello world"	string/text
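
The table can be read as a small decision procedure. The Python sketch below mimics it for the rows shown. It is a simplification: `infer_type` is a hypothetical helper, it only recognizes `yyyy-MM-dd` dates, and it returns `text` for other strings, whereas real date detection in ES depends on the configured date formats.

```python
import re
from datetime import datetime

# Assumption: only plain yyyy-MM-dd strings are treated as dates in this sketch.
_DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def infer_type(value):
    """Guess a field type for a JSON value, following the table above."""
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        if _DATE_PATTERN.match(value):
            try:
                datetime.strptime(value, "%Y-%m-%d")  # reject impossible dates
                return "date"
            except ValueError:
                pass
        return "text"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


for sample in (True, 123, 123.45, "2017-01-01", "hello world"):
    print(repr(sample), "->", infer_type(sample))
```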

Querying a mapping

Syntax:

GET /index/_mapping/type

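Creating a mapping manually

You can only define field mappings when creating the index, or add mappings for new fields later; the mapping of an existing field cannot be modified.

We created an index named website earlier, so let's delete it first.

DELETE /website

Now let's create the index by hand, together with its mapping.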

PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "author_id":{       // field
          "type": "long"    // data type
        },
        "title":{
          "type": "text",
          "analyzer": "english" // analyzer to use
        },
        "content":{
          "type": "text"
        },
        "post_date":{
          "type": "date"
        },
        "publisher_id":{
          "type": "string",
          "index": "not_analyzed" // not analyzed, i.e. an exact value; the type above must be string, otherwise this setting does not take effect
        }
      }
    }
  }
}

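analyzed: the value is analyzed (tokenized) before indexing
not_analyzed: the value is indexed as-is, without analysis
no: the value is not added to the inverted index at all, so it cannot be searched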
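
The same create-index request can also be assembled from code, which has the side benefit of producing strict JSON without the explanatory comments. The sketch below only builds the request and does not send it. `ES_URL` and `build_create_index_request` are assumptions for this example (a local node on port 9200), and the field settings simply repeat the mapping above.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local single-node cluster


def build_create_index_request(index, doc_type, properties):
    """Build (but do not send) a PUT request that creates an index with a mapping."""
    body = {"mappings": {doc_type: {"properties": properties}}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


properties = {
    "author_id": {"type": "long"},
    "title": {"type": "text", "analyzer": "english"},
    "content": {"type": "text"},
    "post_date": {"type": "date"},
    # Exact value field, written the same way as in the request above.
    "publisher_id": {"type": "string", "index": "not_analyzed"},
}

request = build_create_index_request("website", "article", properties)
print(request.get_method(), request.full_url)
print(json.dumps(json.loads(request.data), indent=2))
# To actually send it: urllib.request.urlopen(request)
```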

The index is created. Now let's try to modify this mapping.

PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "author_id":{
          "type": "text"
        }
      }
    }
  }
}

Response:

{
  "error": {
    "root_cause": [
      {
        "type": "index_already_exists_exception",
        "reason": "index [website/-6NKQPj3TPWDrrlxhalkmw] already exists",
        "index_uuid": "-6NKQPj3TPWDrrlxhalkmw",
        "index": "website"
      }
    ],
    "type": "index_already_exists_exception",
    "reason": "index [website/-6NKQPj3TPWDrrlxhalkmw] already exists",
    "index_uuid": "-6NKQPj3TPWDrrlxhalkmw",
    "index": "website"
  },
  "status": 400
}

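The request fails. Note that this particular error is raised because PUT /website tries to create an index that already exists; the mapping of an existing field cannot be modified in any case. We can, however, add a new field and specify its type and other settings: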

PUT /website/_mapping/article
{
  "properties": {
    "new_field":{
      "type": "string",
      "index": "not_analyzed"
    }
  }
}

Response:

{
  "acknowledged": true
}

As you can see, the new field was added successfully.
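
The rule "new fields may be added, existing fields may not be changed" can be sketched as a merge over the properties dictionary. This is a toy model only: `merge_properties` is a hypothetical helper that rejects any difference on an existing field, and ES's own merge rules are more nuanced than that.

```python
def merge_properties(existing, update):
    """Return the merged field mappings, rejecting changes to fields that already exist."""
    merged = dict(existing)
    for field, definition in update.items():
        if field in existing and existing[field] != definition:
            raise ValueError(f"cannot change mapping of existing field [{field}]")
        merged[field] = definition
    return merged


existing = {"author_id": {"type": "long"}}

# Adding a new field is accepted.
merged = merge_properties(
    existing, {"new_field": {"type": "string", "index": "not_analyzed"}}
)
print(sorted(merged))  # ['author_id', 'new_field']

# Changing the type of an existing field is rejected.
try:
    merge_properties(existing, {"author_id": {"type": "text"}})
except ValueError as error:
    print("rejected:", error)
```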

Testing the mapping

With that done, let's test how analysis behaves. content is a plain text field, so let's try it first:

GET /website/_analyze
{
  "field": "content",
  "text": "my-dogs"
}

Response:

{
  "tokens": [
    {
      "token": "my",
      "start_offset": 0,
      "end_offset": 2,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "dogs",
      "start_offset": 3,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

As you can see, my-dogs is split into my and dogs: the - is dropped, and no singular/plural conversion is applied, because the default analyzer is the standard analyzer.
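
For this input, the response above can be approximated in a few lines of Python. This is only a rough stand-in: `simple_tokens` is a hypothetical helper that splits on anything other than ASCII letters and digits and lowercases each token, while the real standard analyzer follows Unicode text segmentation rules.

```python
import re


def simple_tokens(text):
    """Split on non-alphanumerics and lowercase, recording offsets and positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append(
            {
                "token": match.group().lower(),
                "start_offset": match.start(),
                "end_offset": match.end(),
                "position": position,
            }
        )
    return tokens


for token in simple_tokens("my-dogs"):
    print(token)
```

The sketch yields my at offsets 0 to 2 and dogs at offsets 3 to 7, matching the tokens, offsets, and positions in the response above; dogs stays plural because nothing here performs stemming.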

Now let's try new_field, which we configured as a field that is not analyzed.

GET website/_analyze
{
  "field": "new_field",
  "text": "my dogs"
}

Response:

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[f57uV91][127.0.0.1:9300][indices:admin/analyze[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Can't process field [new_field], Analysis requests are only supported on tokenized fields"
  },
  "status": 400
}

The request fails, because this field is not analyzed.