Measure Zero

A Brief Walkthrough of the LightRAG Source Code

January 21, 2025, 08:00

Guo, Z., Xia, L., Yu, Y., Ao, T., & Huang, C. (2024). LightRAG: Simple and fast retrieval-augmented generation.

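The overall flow:

  • Use an LLM to extract entities and relationships from the chunks, and store them as a graph.
  • Use an LLM to extract keywords from the query, recall entities or relationships by those keywords, then find the most relevant chunks, and finally concatenate everything and hand it to the LLM to generate the answer.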
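
The query-side half of this flow can be sketched with plain dictionaries. This is only a toy illustration, not LightRAG's code: the `chunks`, `entities`, and `relations` structures and the `retrieve` and `build_prompt` helpers are hypothetical, the keywords are passed in by hand instead of coming from an LLM, and exact string matching stands in for whatever recall method the library actually uses.

```python
# Toy index. All names and structures are hypothetical, not LightRAG's API.
chunks = {
    "c1": "Alice founded Acme in Berlin.",
    "c2": "Acme builds rockets.",
}
# Entities and relationships as an indexing LLM might have extracted them
# (hand-written here), each pointing back to its source chunks.
entities = {
    "Alice": {"chunks": {"c1"}},
    "Acme": {"chunks": {"c1", "c2"}},
}
relations = {
    ("Alice", "Acme"): {"keywords": {"founding"}, "chunks": {"c1"}},
}


def retrieve(keywords):
    """Recall entities and relationships by keyword, then collect their chunks."""
    wanted = {k.lower() for k in keywords}
    hit_entities = sorted(name for name in entities if name.lower() in wanted)
    hit_relations = sorted(
        pair for pair, rel in relations.items() if rel["keywords"] & wanted
    )
    chunk_ids = set()
    for name in hit_entities:
        chunk_ids |= entities[name]["chunks"]
    for pair in hit_relations:
        chunk_ids |= relations[pair]["chunks"]
    return hit_entities, hit_relations, sorted(chunk_ids)


def build_prompt(query, keywords):
    """Concatenate everything that was recalled into one context for the LLM."""
    ents, rels, chunk_ids = retrieve(keywords)
    lines = ["Entities: " + ", ".join(ents)]
    lines.append("Relationships: " + ", ".join(f"{a} -> {b}" for a, b in rels))
    lines.append("Sources:")
    lines.extend(chunks[cid] for cid in chunk_ids)
    lines.append("Question: " + query)
    return "\n".join(lines)


prompt = build_prompt("Who founded Acme?", ["acme", "founding"])
```

In a real pipeline the keyword list would come from the LLM's reading of the query, and the final string would be sent to the LLM to produce the answer.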

Extracting entities and relationships and storing them as a graph

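The prompts live in lightrag/prompt.py. After the document is split into chunks, the LLM is asked to extract entities and relationships (plus keywords) in a specific format, and the output is then parsed and stored. Judging from the code, the content_keywords produced by step 3 below do not seem to be used anywhere.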

1. Identify all entities. 
...
Format each entity as ("entity"{tuple_delimiter}<entity_name>{tuple_delimiter}<entity_type>{tuple_delimiter}<entity_description>)

2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
...
- relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity
- relationship_keywords: one or more high-level key words that summarize the overarching nature of the relationship, focusing on concepts or themes rather than specific details
Format each relationship as ("relationship"{tuple_delimiter}<source_entity>{tuple_delimiter}<target_entity>{tuple_delimiter}<relationship_description>{tuple_delimiter}<relationship_keywords>{tuple_delimiter}<relationship_strength>)

3. Identify high-level key words that summarize the main concepts, themes, or topics of the entire text. These should capture the overarching ideas present in the document.
Format the content-level key words as ("content_keywords"{tuple_delimiter}<high_level_keywords>)

...

5. When finished, output {completion_delimiter}

Example 1:

Entity_types: [person, technology, mission, organization, location]
Text:...
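
To make the "parse the output and store it" step concrete, here is a minimal sketch of a parser for the record format requested in the prompt above. It is not the library's own parsing code. The delimiter values (`<|>`, `##`, `<|COMPLETE|>`), the assumption that records are separated by a record delimiter, and the `parse_extraction` helper with its dictionary layout are all assumptions made for illustration.

```python
import re

# Assumed delimiter values; the prompt only shows placeholders such as
# {tuple_delimiter}, so treat these as illustrative.
TUPLE_DELIMITER = "<|>"
RECORD_DELIMITER = "##"
COMPLETION_DELIMITER = "<|COMPLETE|>"


def parse_extraction(output):
    """Split raw LLM output into entities, relationships, and content keywords."""
    entities, relationships, content_keywords = [], [], []
    # Ignore anything the model wrote after the completion marker.
    body = output.split(COMPLETION_DELIMITER)[0]
    for raw in body.split(RECORD_DELIMITER):
        match = re.search(r"\((.*)\)", raw, flags=re.DOTALL)
        if match is None:
            continue  # not a well-formed record
        fields = [f.strip().strip('"') for f in match.group(1).split(TUPLE_DELIMITER)]
        kind = fields[0]
        if kind == "entity" and len(fields) >= 4:
            entities.append(
                {"name": fields[1], "type": fields[2], "description": fields[3]}
            )
        elif kind == "relationship" and len(fields) >= 6:
            try:
                strength = float(fields[5])
            except ValueError:
                strength = 1.0  # hypothetical fallback for a non-numeric score
            relationships.append(
                {
                    "source": fields[1],
                    "target": fields[2],
                    "description": fields[3],
                    "keywords": fields[4],
                    "strength": strength,
                }
            )
        elif kind == "content_keywords" and len(fields) >= 2:
            content_keywords.append(fields[1])
    return entities, relationships, content_keywords


sample = (
    '("entity"<|>"Alex"<|>"person"<|>"A character")##'
    '("entity"<|>"Taylor"<|>"person"<|>"Another character")##'
    '("relationship"<|>"Alex"<|>"Taylor"<|>"They work together"<|>"teamwork"<|>7)##'
    '("content_keywords"<|>"teamwork, discovery")<|COMPLETE|>'
)
entities, relationships, content_keywords = parse_extraction(sample)
```

Each parsed entity would become a node and each relationship an edge in the graph. The sketch also collects the content_keywords record, although, as noted above, the original code does not appear to use it.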
