Elasticsearch-19: Core Principles of the Inverted Index

Suppose we have two documents.

document1: I really liked my small dogs, and I think my mom also liked them.

document2: He never liked any dogs, so I hope that my mom will not expect me to liked him.

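The text of both documents is tokenized, and each word is recorded together with the documents that contain it, for example like this:

word	document1	document2
I	√	√
really	√
liked	√	√
my	√	√
small	√
dogs	√	√
and	√
think	√
mom	√	√
also	√
them	√
He		√
never		√
any		√
so		√
hope		√
that		√
will		√
not		√
expect		√
me		√
to		√
him		√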

If we now search for mother like little dog, we get no results at all.
The query is first split into individual terms:
mother
like
little
dog

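Matching these terms against the inverted index above finds nothing: the documents contain liked, small, dogs, and mom, but not like, little, dog, or mother. This is clearly not the search result we want.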
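The failed lookup can be reproduced with a minimal sketch. It assumes a naive tokenizer that only splits on non-letters and applies no normalization; the names tokenize, build_index, and search are illustrative helpers, not Elasticsearch APIs.

```python
import re
from collections import defaultdict

docs = {
    "document1": "I really liked my small dogs, and I think my mom also liked them.",
    "document2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

def tokenize(text):
    # Split on anything that is not a letter; tokens are kept exactly as written.
    return re.findall(r"[A-Za-z]+", text)

def build_index(documents):
    # Map each token to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    # Union of the documents matched by any query term.
    hits = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return hits

raw_index = build_index(docs)
print(sorted(raw_index["liked"]))                    # ['document1', 'document2']
print(search(raw_index, "mother like little dog"))  # set()
```

The index contains liked and dogs, so the exact terms like and dog never match.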

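In fact, building the inverted index involves one more step: normalization. Each token is processed to account for tense, plural forms, synonyms, letter case, and so on, which raises the chance that a later search finds the related documents.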
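As a rough illustration of the normalization step, the sketch below uses two hand-written lookup tables that cover only the words in this example. The tables LEMMAS and SYNONYMS and the function normalize are hypothetical; a real analyzer would rely on stemmers and synonym filters instead. Note that this sketch lowercases every token, so I and He would become i and he.

```python
# Hypothetical lookup tables covering only the words used in this example.
LEMMAS = {"liked": "like", "dogs": "dog"}        # tense and plural forms
SYNONYMS = {"small": "little", "mother": "mom"}  # synonyms

def normalize(token):
    token = token.lower()               # letter case
    token = LEMMAS.get(token, token)    # tense / plural
    token = SYNONYMS.get(token, token)  # synonyms
    return token

for word in ["liked", "Dogs", "small", "mother"]:
    print(word, "->", normalize(word))
# liked -> like
# Dogs -> dog
# small -> little
# mother -> mom
```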

The inverted index after normalization:

word	document1	document2	normalization
I	√	√
really	√
like	√	√	liked -> like
my	√	√
little	√		small -> little
dog	√	√	dogs -> dog
and	√
think	√
mom	√	√
also	√
them	√
He		√
never		√
any		√
so		√
hope		√
that		√
will		√
not		√
expect		√
me		√
to		√
him		√

Now search again with the same query, mother like little dog. The query is tokenized and normalized in the same way:
mother -> mom
like -> like
little -> little
dog -> dog

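Matching these normalized terms against the inverted index above now returns both document1 and document2: mom, like, and dog appear in both documents, and little appears in document1.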
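The whole flow can be put together in a short sketch, assuming the same normalization is applied at index time and at query time. The NORMALIZE table and the analyze and search helpers are illustrative assumptions rather than the way Elasticsearch implements analysis.

```python
import re
from collections import defaultdict

DOCS = {
    "document1": "I really liked my small dogs, and I think my mom also liked them.",
    "document2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

# Hypothetical normalization table covering only the words in this example.
NORMALIZE = {"liked": "like", "dogs": "dog", "small": "little", "mother": "mom"}

def analyze(text):
    # Lowercase, split into words, then map each word to its normalized form.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [NORMALIZE.get(token, token) for token in tokens]

# Build the normalized inverted index: term -> set of document ids.
index = defaultdict(set)
for doc_id, text in DOCS.items():
    for term in analyze(text):
        index[term].add(doc_id)

def search(query):
    # For each normalized query term, list the documents that contain it.
    return {term: sorted(index.get(term, set())) for term in analyze(query)}

matches = search("mother like little dog")
for term, doc_ids in matches.items():
    print(term, doc_ids)
# mom ['document1', 'document2']
# like ['document1', 'document2']
# little ['document1']
# dog ['document1', 'document2']
```

Because the query goes through the same analyze function as the documents, mother is rewritten to mom before the lookup, which is what makes the match possible.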