Using SSE to Implement ChatGPT in Rails

中文版本 (Chinese version): https://ruby-china.org/topics/43052

Introduction

When using ChatGPT, you may notice that the response is not returned all at once after it is complete. Instead, it arrives in chunks, as if it were being typed out.

About SSE

If we check the OpenAI API documentation, we find a parameter called stream on the create chat completion API. Its description reads:

If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
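To make the description above concrete, here is a minimal sketch of reading such a stream in plain Ruby. The helper name collect_stream_text, the sample payloads, and the choices[0].delta.content shape of each chunk are assumptions made for illustration; only the data: prefix and the data: [DONE] terminator come from the quoted documentation.

```ruby
require "json"

# Hypothetical helper: collects the text deltas from a data-only SSE body.
# Assumes each chunk is JSON shaped like {"choices":[{"delta":{"content":"..."}}]}.
def collect_stream_text(raw)
  text = +""
  raw.each_line do |line|
    line = line.chomp
    # Skip blank separator lines and anything that is not a data field.
    next unless line.start_with?("data:")

    payload = line.delete_prefix("data:").strip
    # The stream is terminated by a literal [DONE] message.
    break if payload == "[DONE]"

    delta = JSON.parse(payload).dig("choices", 0, "delta", "content")
    text << delta if delta
  end
  text
end

# Made-up sample stream, used only to exercise the helper.
SAMPLE_STREAM = <<~SSE
  data: {"choices":[{"delta":{"content":"Hel"}}]}

  data: {"choices":[{"delta":{"content":"lo"}}]}

  data: [DONE]

SSE

puts collect_stream_text(SAMPLE_STREAM)
```

In a real application the chunks would arrive over the network one at a time, so each delta could be forwarded to the browser as soon as it is parsed instead of being collected into one string.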

So what is SSE?

SSE, short for “Server-Sent Events”, is a simple way to stream events from a server. It sends real-time updates from a server to a client over a single HTTP connection. With SSE, the server can push data to the client as soon as it becomes available, so the client does not need to poll the server repeatedly for updates.

SSE can be implemented through the HTTP protocol:

The client makes a GET request to the server: https://www.host.com/stream
The client sets Connection: keep-alive to establish a long-lived connection
The server sets a Content-Type: text/event-stream response header
The server starts sending events that look...
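The server side of the steps above can be sketched in plain Ruby, without Rails, by writing the response to any IO-like object. The helper name write_sse_response, the Cache-Control header, and the sample messages are assumptions for illustration. The sketch also assumes each message is a single line and uses only the data: field, followed by a blank line to end the event.

```ruby
require "stringio"

# Hypothetical helper: writes a minimal SSE response to an IO-like object.
# Assumes every message is a single line of text.
def write_sse_response(io, messages)
  io.write("HTTP/1.1 200 OK\r\n")
  # This header tells the client that the body is an event stream.
  io.write("Content-Type: text/event-stream\r\n")
  # Assumption: caching is disabled so that events are not stored or replayed.
  io.write("Cache-Control: no-cache\r\n")
  io.write("Connection: keep-alive\r\n")
  io.write("\r\n")

  messages.each do |message|
    # One data-only event per message; the blank line marks the end of the event.
    io.write("data: #{message}\n\n")
  end
end

out = StringIO.new
write_sse_response(out, ["hello", "world"])
puts out.string
```

A StringIO stands in for the socket here so the output can be inspected. In a Rails controller the events would typically be written to response.stream with ActionController::Live included, which is one possible approach rather than the only one.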
