
Zhang Xiaokai's Blog (张小凯的博客)

https://jasonkayzk.github.io/

https://jasonkayzk.github.io/atom.xml (RSS feed)

A Concise Tutorial on POML, the Next-Generation Prompt Engineering Language

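Traditional prompt engineering usually means writing free-form text, and as an application grows, that text becomes more and more complex. This leads to a series of problems: prompts are hard to maintain, hard to put under version control, hard to reuse across scenarios, and almost impossible to test systematically. How can these problems be solved? Microsoft offers an engineering-minded answer: POML!

This article is best read alongside the companion Colab notebook: https://colab.research.google.com/drive/1RrZyqB16XMvsFBjir90m-NCXE35kWFdy?usp=sharing

I. Introduction

POML introduces a structured approach in which prompts are written in an HTML-like format. Instead of writing plain-text prompts, users organize the intent of a prompt with semantic components such as <role>, <task>, and <example>, which is intended to give better LLM performance and easier prompt maintenance. POML also has a CSS-like styling system that separates content from the definition of its presentation.

(1) Core Architecture

POML runs on a three-layer architecture that separates concerns and supports flexible prompt development. The architecture processes a POML file in several stages:

- Parsing: converts the XML-like syntax into a structured intermediate representation;
- Processing: applies stylesheets, resolves templates, and integrates external data;
- Generation: produces the final optimized prompt in various formats.

This separation lets developers change the presentation style without touching the core logic, integrate external data sources seamlessly, and keep a consistent prompt structure across a project.

(2) Key Features

1. Structured Markup System

POML uses HTML-like syntax and semantic components to make prompts more readable and maintainable. The main components are:

- <role>: defines the role or identity the LLM should adopt;
- <task>: specifies the task the LLM needs to complete;
- <example>: provides few-shot learning examples;
- <output-format>: controls the expected response format;
- <hint>: provides additional context or constraints.

For example:

    <poml>
      <role>You are a patient teacher explaining concepts to a 10-year-old.</role>
      <task>Explain the concept of photosynthesis using the provided image as a reference.</task>
      <img src="photosynthesis.png" alt="Diagram of photosynthesis" />
      <output-format>
        Keep the explanation simple, engaging, and under 100 words.
        Start with "Hey there, future scientist!".
      </output-format>
    </poml>

2. External Data Integration

POML integrates external data through dedicated components:

- <document>: embeds text files, PDFs, or Word documents;
- <table>: integrates structured data from spreadsheets or CSV files;
- <img>: includes images with alt text, for vision-capable models;
- <audio>: handles audio files for multimodal applications.

For example:

    <hint captionStyle="header" caption="Background Knowledge">
      <Document src="assets/tom_and_jerry.docx"/>
    </hint>
    <example>
      <input>
        <img src="assets/tom_cat.jpg" alt="The image contains the Tom cat character." syntax="multimedia" />
      </input>
      <output>
        <Document src="assets/tom_introduction.txt"/>
      </output>
    </example>

3. Decoupled Presentation Styles

POML has a CSS-like styling system that separates content from presentation. For example:

    <stylesheet>
      role { verbosity: concise; format: markdown; }
      task { emphasis: strong; }
    </stylesheet>

This lets developers adjust aspects of style such as verbosity, output format, and emphasis without changing the core prompt logic, which significantly reduces the risk of format drift when a prompt is tuned. The snippet above is conceptual; the working example in section II (3) of this tutorial writes the stylesheet as JSON.

4. Template Engine

POML includes a template engine for generating prompts dynamically:

- Variables: {{ variable_name }}
- Loops: the for attribute on an element, e.g. <item for="item in items">...</item>
- Conditions: the if attribute on an element, e.g. <p if="variable > 0">...</p>
- Definitions: <let name="variable" value="expression" />

This makes it possible to create data-driven prompts that adapt to different contexts and inputs.

(3) Development Ecosystem

POML comes with a development toolkit aimed at improving productivity.

1. VSCode Extension

The Visual Studio Code extension provides:

- Syntax highlighting and language support
- Context-aware auto-completion
- Live preview
- Integrated testing against LLM providers
- Error diagnostics and validation
- A prompt library of reusable components

2. Multi-language SDKs

POML provides SDKs for Python and TypeScript/JavaScript.

Python SDK:

    from poml import load, render

    # Load and render a POML file
    prompt = load("example.poml")
    result = render(prompt, variables={"topic": "photosynthesis"})

TypeScript SDK:

    import { loadPoml, renderPoml } from 'pomljs';

    // Load and render a POML file
    const prompt = await loadPoml('example.poml');
    const result = await renderPoml(prompt, { topic: 'photosynthesis' });

These two snippets are schematic. The hands-on examples in Part II call the poml function from the Python package instead, so check the SDK documentation for the exact entry points in your installed version.

II. Basic Usage

(1) Installation

Node.js (via npm):

    npm install pomljs

Python (via pip):

    pip install poml

(2) A First Example

1. Write a POML file

Create a file named example.poml with the following content:

example.poml

    <poml>
      <role>You are a patient teacher(named {{teacher_name}}) explaining concepts to a 10-year-old.</role>
      <task>Explain the concept of photosynthesis using the provided image as a reference.</task>
      <input>
        <img src="photosynthesis.jpg" alt="Diagram of photosynthesis" syntax="multimedia"/>
      </input>
      <output-format>
        Keep the explanation simple, engaging, and under 100 words.
        Start with "Hey there, future scientist!".
      </output-format>
    </poml>

The example defines the LLM's role and task, includes an image as context, and specifies the required output format. It also contains one variable, teacher_name.

Once the file is written, you can preview the rendered prompt in the editor if you have the POML extension for Visual Studio Code installed.

2. Parse and render the POML file

With the POML toolkit, this prompt can be rendered into several formats and tested against an LLM. For example, in Python:

    from poml import poml

    # Process a POML file
    # result = poml("example.poml")

    # Process with context variables
    result = poml("example.poml", context={"teacher_name": "Jasonkay"})
    print(f"Process with context variables: {result}")

    # Get OpenAI chat format (only in later versions)
    # messages = poml("example.poml", format="openai_chat")
    # print(f"Get OpenAI chat format: {messages}")

The poml function accepts the following parameters:

- markup: the POML content (a string or a file path)
- context: optional data to inject into the template
- stylesheet: optional style customization
- format: the output format ("dict", "openai_chat", "langchain", "pydantic", or "raw")

Running the code prints:

    Process with context variables: [{'speaker': 'system', 'content': '# Role\n\nYou are a patient teacher(named Jasonkay) explaining concepts to a 10-year-old.\n\n# Task\n\nExplain the concept of photosynthesis using the provided image as a reference.'}, {'speaker': 'human', 'content': [{'type': 'image/webp', 'base64': 'UklGRg......', 'alt': 'Diagram of photosynthesis'}, '# Output Format\n\nKeep the explanation simple, engaging, and under 100 words. Start with "Hey there, future scientist!".']}]

As the output shows, the teacher_name variable has been filled in and the file has been rendered into a list of messages: the role and task go to the system speaker, while the image and the output format go to the human speaker.

3. Integrate with an LLM system (Gemini)

Finally, let's combine the prompt with an external LLM system.

At the time of writing, the latest POML SDK does not yet support rendering an openai_chat prompt through the format parameter, so this example uses the Gemini API to send the image instead.

Use the following POML file for rendering (the image is no longer embedded, because it is sent separately):

example.poml

    <poml>
      <role>You are a patient teacher(named {{teacher_name}}) explaining concepts to a 10-year-old.</role>
      <task>Explain the concept of photosynthesis using the provided image as a reference.</task>
      <output-format>
        Keep the explanation simple, engaging, and under 100 words.
        Start with "Hey there, future scientist!".
      </output-format>
    </poml>

First, install the Gemini SDK:

    pip install -U google-genai

To run the code below, you need to create a Gemini API key: https://aistudio.google.com/app/apikey

Then replace YOUR_API_KEY below with the key you generated.

    from google import genai
    from google.genai import types
    from poml import poml

    GEMINI_API_KEY = "YOUR_API_KEY"
    client = genai.Client(api_key=GEMINI_API_KEY)

    # Read the picture
    with open('photosynthesis.jpg', 'rb') as f:
        image_bytes = f.read()

    # Render the POML file
    result = poml("example.poml", context={"teacher_name": "Jasonkay"}, chat=False)
    # print(f"Process with context variables: {result}")

    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[
            result,
            types.Part.from_bytes(
                data=image_bytes,
                mime_type='image/jpeg',
            ),
        ],
    )
    print(response.text)

Running it produces output such as:

    Look at our amazing plant friend! Just like you need food, plants need to eat too! This image shows how they do it, a process called **photosynthesis**. Plants use sunlight from the sun, and "drink" water through their roots. They also breathe in a gas called CO2 (carbon dioxide) from the air, shown by the blue arrow going in. Using these, they make their own sugary food to grow! As a super cool bonus, they release O2 (oxygen) for us to breathe, shown by the blue arrow going out. Amazing, right?

(3) Using Styles

Now let's add styles to the example above to refine how the prompt is rendered.

example-2.poml

    <poml>
      <role>You are a patient teacher(named {{teacher_name}}) explaining concepts to a 10-year-old.</role>
      <task>Explain the concept of photosynthesis using the provided image as a reference.</task>
      <output-format>
        <list listStyle="dash">
          <item className="explanation">Keep the explanation simple, engaging, and under 100 words.</item>
          <item className="greeting">
            Start with "Hey there, future scientist!".
          </item>
        </list>
      </output-format>
    </poml>
    <stylesheet>
      {
        ".explanation": { "syntax": "json" },
        "list" : { "whiteSpace": "trim" }
      }
    </stylesheet>

The rendered result is as follows (the teacher's name is blank, presumably because no context was passed for teacher_name in this run):

    # Role

    You are a patient teacher(named ) explaining concepts to a 10-year-old.

    # Task

    Explain the concept of photosynthesis using the provided image as a reference.

    # Output Format

    ```json
    "Keep the explanation simple, engaging, and under 100 words."
    ```

    - Start with "Hey there, future scientist!".

The item with the class name explanation is rendered as JSON, while the greeting item remains a dash-style list entry.

For more details, see the official documentation: https://microsoft.github.io/poml/latest/language/standalone/#stylesheet

III. Further Learning

After completing the basics, you can continue with the following material for a deeper dive:

- More official examples
- Official documentation
- POML syntax structure
- POML intermediate representation
- API reference
- External system integration
- Custom components

Appendix

Companion Colab notebook: https://colab.research.google.com/drive/1RrZyqB16XMvsFBjir90m-NCXE35kWFdy?usp=sharing

References:

- https://github.com/microsoft/poml
- https://zread.ai/microsoft/poml
- https://microsoft.github.io/poml/latest/
- https://ai.google.dev/gemini-api/docs/image-understanding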
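Supplementary sketch for the "parse and render" step: the tutorial notes that the SDK version it used could not yet emit the openai_chat format, while the default render returns a list of speaker/content entries. The snippet below is a minimal, hedged illustration of how that list could be converted by hand into OpenAI-style chat messages. The helper names (SPEAKER_TO_ROLE, part_to_openai, to_openai_messages), the speaker-to-role mapping (including an "ai" speaker), and the sample data are assumptions made for illustration; they are not part of the POML SDK, and the base64 string is a placeholder rather than a real image.

```python
# Assumed mapping from POML speaker names to OpenAI chat roles (illustrative).
SPEAKER_TO_ROLE = {"system": "system", "human": "user", "ai": "assistant"}


def part_to_openai(part):
    """Convert one rendered content part into an OpenAI-style content part."""
    if isinstance(part, str):
        return {"type": "text", "text": part}
    # Image parts in the rendered output carry a MIME type and a base64 payload.
    if isinstance(part, dict) and part.get("type", "").startswith("image/"):
        url = f"data:{part['type']};base64,{part['base64']}"
        return {"type": "image_url", "image_url": {"url": url}}
    raise ValueError(f"unsupported content part: {part!r}")


def to_openai_messages(poml_messages):
    """Turn a list of {'speaker', 'content'} entries into chat messages."""
    messages = []
    for msg in poml_messages:
        content = msg["content"]
        if not isinstance(content, str):
            # Mixed content (images plus text) becomes a list of typed parts.
            content = [part_to_openai(p) for p in content]
        messages.append({"role": SPEAKER_TO_ROLE[msg["speaker"]], "content": content})
    return messages


# Shortened stand-in for the rendered output shown in the tutorial.
rendered = [
    {"speaker": "system", "content": "# Role\n\nYou are a patient teacher."},
    {"speaker": "human", "content": [
        {"type": "image/webp", "base64": "UklGRg==", "alt": "Diagram of photosynthesis"},
        "# Output Format\n\nKeep the explanation simple.",
    ]},
]

messages = to_openai_messages(rendered)
print(messages[1]["content"][0]["image_url"]["url"])
```

If a later SDK release supports format="openai_chat" directly, that built-in path is preferable to a hand-written conversion like this one.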

