Using the ngram tokenizer in Elasticsearch (ES)

言曌, 2021-12-03 11:30:35


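1. Analyzing the standard analyzer

ES's default analyzer, standard, does not cover my current requirement. For example, I need to search for 22.doc.

Checking the analysis output:

The standard analyzer splits 22.doc into two tokens, 22 and doc.

So a search for 2. or .d finds nothing.

The requirement, however, is to behave like a fuzzy LIKE query in a database, where any character sequence in the value can be matched.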
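To make the gap concrete, here is a minimal sketch that contrasts the two behaviours. It hard-codes the tokens reported above for 22.doc rather than calling ES, and it ignores query-time analysis; the class and method names (`TermLookupSketch`, `termMatches`, `likeMatches`) are made up for illustration.

```java
import java.util.List;

class TermLookupSketch {
    // Tokens reported above for "22.doc" under the standard analyzer (hard-coded, not computed).
    static final List<String> STANDARD_TOKENS = List.of("22", "doc");

    // A token-based lookup only succeeds when the query equals an indexed token.
    static boolean termMatches(List<String> indexedTokens, String queryTerm) {
        return indexedTokens.contains(queryTerm);
    }

    // What LIKE '%x%' does in a database: plain substring containment on the raw value.
    static boolean likeMatches(String rawValue, String fragment) {
        return rawValue.contains(fragment);
    }

    public static void main(String[] args) {
        String raw = "22.doc";
        for (String query : List.of("22", "doc", "2.", ".d")) {
            System.out.println(query
                    + " -> token lookup: " + termMatches(STANDARD_TOKENS, query)
                    + ", LIKE: " + likeMatches(raw, query));
        }
    }
}
```

In a real query the search text is analyzed as well, so 2. and .d would be reduced to 2 and d. Neither of those is an indexed token either, so the outcome is the same.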

2. Analyzing the ngram tokenizer

So we need a different tokenizer. Let's look at what ngram produces for the same input.

It appears to meet our needs.
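As a rough illustration of why ngram helps, the sketch below imitates n-gram generation in plain Java. It assumes the tokenizer's documented defaults (min_gram 1, max_gram 2, every character kept) and is only an approximation of what ES does; `NgramSketch` and `ngrams` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class NgramSketch {
    // Emits every substring of length minGram..maxGram, scanning start positions left to right.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [2, 22, 2, 2., ., .d, d, do, o, oc, c]
        System.out.println(ngrams("22.doc", 1, 2));
    }
}
```

Both 2. and .d now appear as tokens of their own, which is what makes a LIKE-style search possible.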

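3. Creating the index

Here is the index creation script. It is sent as the body of the index-creation request (for example PUT /common_file, the index name used by the entity class in section 4).

I am using ES 7.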

{
    "settings": {
        "index": {
            "number_of_shards": "5",
            "number_of_replicas": "1"
        },
        "index.max_ngram_diff": 5,
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "tokenizer": "ngram_tokenizer"
                }
            },
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 1,
                    "max_gram": 5,
                    "token_chars": [
                        "letter",
                        "digit"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "repoPrefix": {
                "type": "text"
            },
            "fileName": {
                "type": "text",
                "analyzer": "ngram_analyzer"
            },
            "level": {
                "type": "long"
            },
            "leafFlag": {
                "type": "boolean"
            },
            "filePath": {
                "type": "text"
            },
            "creatorName": {
                "type": "text"
            },
            "description": {
                "type": "text",
                "analyzer": "ngram_analyzer"
            },
            "updateTime": {
                "type": "date"
            },
            "revision": {
                "type": "long"
            },
            "createTime": {
                "type": "date"
            },
            "fileSize": {
                "type": "long"
            },
            "updaterName": {
                "type": "text"
            },
            "_class": {
                "type": "keyword"
            },
            "id": {
                "type": "long"
            },
            "projectId": {
                "type": "long"
            },
            "fileType": {
                "type": "long"
            }
        }
    }
}
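One detail of the script is worth checking before relying on it: token_chars is limited to letter and digit, so characters such as . act as separators and never end up inside a token. The sketch below approximates that behaviour in plain Java with min_gram 1 and max_gram 5. It is not the ES implementation, it uses `Character.isLetterOrDigit` as a stand-in for the letter and digit classes, and `TokenCharsSketch` and `analyze` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class TokenCharsSketch {
    // Splits the text on anything that is not a letter or digit,
    // then emits the n-grams of each remaining run.
    static List<String> analyze(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            boolean keep = i < text.length() && Character.isLetterOrDigit(text.charAt(i));
            if (keep) {
                run.append(text.charAt(i));
            } else if (run.length() > 0) {
                addGrams(run.toString(), minGram, maxGram, grams);
                run.setLength(0);
            }
        }
        return grams;
    }

    // Appends every substring of length minGram..maxGram of a single run.
    static void addGrams(String run, int minGram, int maxGram, List<String> out) {
        for (int start = 0; start < run.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= run.length(); len++) {
                out.add(run.substring(start, start + len));
            }
        }
    }

    public static void main(String[] args) {
        // prints [2, 22, 2, d, do, doc, o, oc, c]
        System.out.println(analyze("22.doc", 1, 5));
    }
}
```

Under this approximation no token contains the dot, so a query for 2. or .d would be matched through the grams 2 and d rather than through the dot itself. If the dot has to be searchable, one option may be to add "punctuation" to token_chars (or leave token_chars empty); running _analyze against the real index is the reliable way to confirm.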

4. Java code

The corresponding entity class is as follows:

import lombok.Data;
// Spring Data's own @Id annotation; javax.persistence.Id belongs to JPA.
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;

import java.io.Serializable;
import java.util.Date;

/**
 * Entity mapped to the common_file index.
 * It is a plain data class, so it is not registered as a Spring @Component.
 * indexStoreType is left at its default: it configures the index store type, not a type name.
 */
@Document(indexName = "common_file")
@Data
public class EsCommonFile implements Serializable
{

    /**
     * ID
     */
    @Id
    private Long id;

    /**
     * Project ID
     */
    private Long projectId;

    /**
     * File name (analyzed with ngram_analyzer)
     */
    private String fileName;

    /**
     * File type (1: file, 2: folder)
     */
    private Integer fileType;

    /**
     * Repository prefix
     */
    private String repoPrefix;

    /**
     * File path
     */
    private String filePath;

    /**
     * File size in bytes
     */
    private Long fileSize;

    /**
     * File revision
     */
    private Long revision;

    /**
     * Account of the creator
     */
    private String creatorName;

    /**
     * Account of the last modifier
     */
    private String updaterName;

    /**
     * Description (analyzed with ngram_analyzer)
     */
    private String description;

    /**
     * Directory level: the repository root is 1, each subdirectory level adds 1
     */
    private Integer level;

    /**
     * Leaf flag: true when the node has no subdirectories, false when it has
     */
    private Boolean leafFlag;

    /**
     * Creation time
     */
    private Date createTime;

    /**
     * Last update time
     */
    private Date updateTime;

}

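No @Field annotations are needed, because the index was already created with the explicit mapping from section 3, which defines the field types and analyzers.

For the full ES integration, see my earlier article: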

https://liuyanzhao.com/1462011641964138498.html
