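Elasticsearch is a distributed, scalable, real-time search and analytics engine. It helps you search, analyze, and explore your data, often in projects where nobody anticipated needing those capabilities at the outset. Elasticsearch exists to give raw data new life.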

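Check which version of Elasticsearch you are running before anything else. Open edX currently uses Elasticsearch 0.92, which is so outdated that many of its features behave differently from the latest release.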
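One way to check the version is to read the `version.number` field from the JSON that the node returns at its root URL. The sketch below assumes a node reachable at `http://localhost:9200` (an assumption, adjust to your deployment); the helper names `parse_version` and `fetch_version` are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_version(root_response: str) -> tuple:
    """Extract (major, minor) from the JSON body returned by the node's root URL."""
    number = json.loads(root_response)["version"]["number"]
    # Keep only the first two numeric components, e.g. "0.90.11" -> (0, 90).
    return tuple(int(part) for part in number.split(".")[:2])


def fetch_version(base_url: str = "http://localhost:9200") -> tuple:
    """Ask a running node for its version. The default URL is an assumption."""
    with urlopen(base_url) as response:
        return parse_version(response.read().decode("utf-8"))
```

Calling `fetch_version()` against a live node returns a tuple you can compare, for example `fetch_version() < (1, 0)`, before relying on features from newer releases.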

{"index": {"_index": "library", "_type": "book", "_id": "1"}}
{"title": "All Quiet on the Western Front","otitle": "Im Westen nichts Neues","author": "Erich Maria Remarque","year": 1929,"characters": ["Paul Bäumer", "Albert Kropp", "Haie Westhus", "Fredrich Müller", "Stanislaus Katczinsky", "Tjaden"],"tags": ["novel"],"copies": 1, "available": true,"section" : 3}
{"index": {"_index": "library", "_type": "book", "_id": "2"}}
{"title": "Catch-22","author": "Joseph Heller","year": 1961,"characters": ["John Yossarian", "Captain Aardvark","Chaplain Tappman", "Colonel Cathcart", "Doctor Daneeka"],"tags": ["novel"],"copies": 6, "available" : false,"section" : 1}
{"index": {"_index": "library", "_type": "book", "_id": "3"}}
{"title": "The Complete Sherlock Holmes","author": "Arthur Conan Doyle","year": 1936,"characters":["Sherlock Holmes","Dr. Watson", "G. Lestrade"],"tags":[],"copies": 0, "available" : false, "section" : 12}
{"index": {"_index": "library", "_type": "book", "_id": "4"}}
{"title": "Crime and Punishment","otitle": "Преступлéние и наказáние","author": "Fyodor Dostoevsky","year": 1886,"characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],"tags": [],"copies": 0, "available" : true}
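The lines above follow the shape the Elasticsearch bulk API expects: an action line naming the index, type, and ID, followed by the document source on its own line, with the whole body ending in a newline. As a minimal sketch, a body like this could be generated from Python dictionaries instead of being written by hand; the function name `to_bulk_body` is hypothetical.

```python
import json
from typing import Any, Dict


def to_bulk_body(index: str, doc_type: str, docs: Dict[str, Dict[str, Any]]) -> str:
    """Serialize {doc_id: source} into newline-delimited bulk "index" actions."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        # Each JSON object must stay on a single line; json.dumps does not
        # emit line breaks unless an indent is requested.
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk body must be terminated by a final newline.
    return "\n".join(lines) + "\n"
```

For example, `to_bulk_body("library", "book", {"2": {"title": "Catch-22", "year": 1961}})` yields two lines that can be sent as the request body to the `_bulk` endpoint.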

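If you enable course/content search in Open edX, course data is indexed into Elasticsearch. There is currently a problem in this area: after a course is deleted, its index entries remain, so users can still find the course in search results unless the entries are removed manually. This is a bug.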

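#### Standard analyzer

The standard analyzer is the analyzer Elasticsearch uses by default. For text analysis it is the best general-purpose choice for any language (translator's note: if you have no special requirements, this analyzer is good enough for text in any language). It splits text on word boundaries as defined by the Unicode Consortium and then removes most punctuation. Finally, it lowercases all terms.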
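The three steps (split on word boundaries, drop punctuation, lowercase) can be imitated in a few lines. This is only a rough approximation: it uses the regular expression `\w+` instead of the Unicode word-boundary rules the real analyzer follows, so edge cases such as apostrophes will differ. The function name `standard_analyze_sketch` is hypothetical.

```python
import re
from typing import List

# Runs of letters, digits, and underscores stand in for "words" here.
# This is an approximation, not the Unicode segmentation algorithm.
_WORD = re.compile(r"\w+")


def standard_analyze_sketch(text: str) -> List[str]:
    """Split text into word-like tokens, discard punctuation, and lowercase."""
    return [token.lower() for token in _WORD.findall(text)]
```

Applied to one of the sample titles, `standard_analyze_sketch("All Quiet on the Western Front")` returns `['all', 'quiet', 'on', 'the', 'western', 'front']`.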

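Lucene does not understand inner objects. A Lucene document consists of a flat list of key-value pairs. To index inner objects effectively, Elasticsearch converts the document into this flat format.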
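To illustrate the idea, the sketch below flattens a nested dictionary into dotted field names, each mapped to a list of values. It illustrates the concept only and is not Elasticsearch's actual implementation; the function name `flatten` and the sample document are hypothetical.

```python
from typing import Any, Dict, List


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, List[Any]]:
    """Turn nested objects into a flat {dotted.path: [values]} mapping."""
    flat: Dict[str, List[Any]] = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        # Treat a single value as a one-element list so both cases share code.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into the inner object, extending the dotted path.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat
```

For a document such as `{"title": "Catch-22", "author": {"name": {"first": "Joseph", "last": "Heller"}}}`, the result contains the keys `title`, `author.name.first`, and `author.name.last`. Note that when a field holds a list of inner objects, their values are merged per path, so the association between values that came from the same inner object is lost in the flat form.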