Streaming and Real-time Data Formats: A Practical Guide

In modern web development, data is no longer just static files or single API responses. Real-time updates, logs, and massive datasets require streaming formats—ways to send and process data as it arrives, rather than waiting for the entire payload to be ready. This guide explores the most popular formats for streaming and real-time communication.

1. Line-Delimited Formats (NDJSON & JSON Lines)

When you need to stream a list of objects (like records from a database or log entries), a standard JSON array [...] is problematic because most JSON parsers need the complete document, including the closing ], before they return anything. Line-delimited formats solve this by making every line a complete record that can be parsed on its own.

NDJSON (Newline Delimited JSON)

NDJSON is a simple specification for storing or streaming data in which each line is a complete, valid JSON value (in practice, usually an object).

How it works: Records are written one per line, each terminated by a newline character: {"id":1}\n{"id":2}\n...
Key Advantage: You can parse and process each object individually as soon as the newline \n is received.
Use Case: Large database exports, structured logging, and data pipelines.
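
The incremental parsing described above can be sketched as follows. This is a minimal illustration, not a production parser: the function name iter_ndjson and the sample chunks are made up for the example, and the chunks are assumed to be already-decoded text arriving from a socket or HTTP response.

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed value per complete line, as chunks arrive."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; the tail stays buffered.
        *lines, buffer = buffer.split("\n")
        for line in lines:
            if line.strip():  # tolerate blank lines
                yield json.loads(line)
    if buffer.strip():  # final record without a trailing newline
        yield json.loads(buffer)


# Chunk boundaries deliberately fall in the middle of a record.
chunks = ['{"id":1}\n{"i', 'd":2}\n{"id"', ':3}\n']
for record in iter_ndjson(chunks):
    print(record["id"])
```

Each record is handed to the caller as soon as its newline arrives, so memory use depends on the longest line rather than on the size of the whole stream.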

JSON Lines (JSONL)

JSON Lines is essentially the same as NDJSON. It is a text-based format where each line is a valid JSON value.

Key Advantage: Compatibility with Unix tools like grep, awk, and sed.
Use Case: Dataset storage for AI/ML training and log analysis.
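
Because every record sits on its own line, line-oriented tooling works without parsing the whole file. The sketch below writes a few records as JSON Lines and then applies a grep-style text filter before parsing. The helper names dump_jsonl and grep_jsonl and the sample log records are hypothetical, and an in-memory buffer stands in for a real file.

```python
import io
import json
from typing import Any, Iterable, Iterator, TextIO


def dump_jsonl(records: Iterable[Any], fp: TextIO) -> None:
    """Write one compact JSON value per line."""
    for record in records:
        # json.dumps escapes newlines inside strings, so each record stays on one line.
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


def grep_jsonl(fp: TextIO, needle: str) -> Iterator[Any]:
    """Cheap substring pre-filter, then parse only the matching lines."""
    for line in fp:
        if needle in line:
            yield json.loads(line)


logs = [
    {"level": "info", "msg": "started"},
    {"level": "error", "msg": "disk full"},
    {"level": "info", "msg": "stopped"},
]

buffer = io.StringIO()  # stands in for open("app.jsonl", "w")
dump_jsonl(logs, buffer)
buffer.seek(0)
for entry in grep_jsonl(buffer, "error"):
    print(entry["msg"])
```

The substring filter is deliberately crude, much like grep itself: it matches the text anywhere in the line, not a specific field.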

CSV Stream

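Similar to NDJSON, a CSV Stream sends rows of comma-separated values line by line. One caveat: a quoted CSV field may itself contain a newline, so consumers should use a real CSV parser rather than splitting blindly on line breaks.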

Key Advantage: Extremely low overhead, because field names appear once in the header row instead of being repeated in every record.
Use Case: Exporting millions of rows to Excel-compatible formats in real-time.
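
A row-by-row CSV export might look like the sketch below. The generator name stream_csv and the sample rows are assumptions for illustration. In a web application, each yielded string would typically be written to the response as it is produced.

```python
import csv
import io
from typing import Iterable, Iterator, Sequence


def stream_csv(header: Sequence[str], rows: Iterable[Sequence[object]]) -> Iterator[str]:
    """Yield the header and then each row as one encoded CSV line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # handles quoting of commas, quotes and newlines

    def encode(row: Sequence[object]) -> str:
        writer.writerow(row)
        line = buffer.getvalue()
        # Reset the scratch buffer so memory use does not grow with the export.
        buffer.seek(0)
        buffer.truncate(0)
        return line

    yield encode(header)
    for row in rows:
        yield encode(row)


rows = [(1, "Alice"), (2, "Smith, Bob")]
for line in stream_csv(["id", "name"], rows):
    print(repr(line))
```

Because rows can be any iterable, it can be a database cursor or another generator, so the full result set never has to be held in memory.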

2. Server-Sent Events (SSE)

Server-Sent Events (SSE) is a web standard that lets servers push data to web pages over HTTP; browsers consume it through the EventSource API. Unlike WebSockets, it is a one-way communication channel (Server -> Client).

How it works: The server keeps an HTTP connection open and sends UTF-8 text with the content type text/event-stream. Each event is a group of "field: value" lines, and a blank line marks the end of the event.

Protocol Format:

event: user-update
data: {"name": "Alice"}

event: chat-message
data: "Hello world!"

Key Advantage: Automatic reconnection (handled by the browser's EventSource), lightweight, and works over standard HTTP/HTTPS.
Use Case: Live sports scores, stock price tickers, and social media notifications.
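
To make the wire format above concrete, here is a small sketch that serializes events and parses them back. The function names format_sse and parse_sse are made up for this example, and the parser handles only the event and data fields (it ignores id and retry), so treat it as an illustration of the format rather than a complete client.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Serialize one event; a blank line terminates it."""
    lines: List[str] = []
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines is sent as several data: lines.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group field lines into events, dispatching on each blank line."""
    event, data = "message", []  # "message" is the default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one optional space after the colon
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
            # other fields (id, retry) are ignored in this sketch


wire = format_sse('{"name": "Alice"}', event="user-update")
wire += format_sse('"Hello world!"', event="chat-message")
for evt in parse_sse(wire.splitlines()):
    print(evt["event"], evt["data"])
```

In a browser none of this parsing is needed, since EventSource does it for you. The sketch is mainly useful for understanding what a server has to emit.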

3. WebSockets and Message Formats

While SSE is for one-way streaming, WebSockets provide a full-duplex (two-way) communication channel.

WebSocket Message Formats

Because WebSockets only provide a message transport (text or binary frames) and say nothing about how the payload is structured, developers must choose a message format.

JSON: The most common choice for ease of use.
Binary (Protobuf/MessagePack): Used when low latency and small payload size are critical.
Custom Text Protocols: Sometimes used for simple commands.
Use Case: Real-time collaborative editing (Google Docs), online gaming, and chat applications.
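
The size trade-off between text and binary messages can be illustrated with the standard library alone. In the sketch below, the cursor-update message and its field layout are invented for the example, and Python's struct module stands in for a real binary format such as Protobuf or MessagePack, which would need a third-party library.

```python
import json
import struct

# Hypothetical message: a cursor position update in a collaborative editor.
message = {"type": "cursor", "user_id": 42, "x": 120, "y": 340}

# Text frame: self-describing and easy to debug, but field names travel with every message.
text_frame = json.dumps(message, separators=(",", ":")).encode("utf-8")

# Binary frame: a fixed layout that both sides must agree on in advance.
# "!" = network byte order without padding, B = 1-byte message type,
# I = 4-byte user id, H H = 2-byte x and y coordinates.
CURSOR = 1
LAYOUT = "!BIHH"
binary_frame = struct.pack(LAYOUT, CURSOR, message["user_id"], message["x"], message["y"])

# The receiver decodes the frame using the same layout.
msg_type, user_id, x, y = struct.unpack(LAYOUT, binary_frame)

print(len(text_frame), "bytes as JSON")
print(len(binary_frame), "bytes as packed binary")
```

The binary frame is much smaller, but it is opaque without the layout definition. That is the usual reason to start with JSON and move to a binary format only when payload size or latency becomes a measured problem.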

Comparison of Streaming Approaches

Format / Tech	Direction	Overhead	Reconnection	Best For
NDJSON / JSONL	Uni-directional	Low	N/A (File/Stream)	Logs, Data Exports
SSE	Server -> Client	Very Low	Automatic	Live Dashboards
WebSockets	Bi-directional	Medium	Manual	Interactive Apps
CSV Stream	Uni-directional	Minimal	N/A	Large Reports

FAQ: Frequently Asked Questions

Q: Why not just use a JSON array for streaming?

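A: Most standard JSON parsers (JSON.parse, for example) are "all-or-nothing": they need the complete text and cannot return any object until the array is closed. Incremental JSON parsers exist, but they are more complex to use. With NDJSON, each line can be parsed on its own as soon as it arrives, which saves memory and reduces latency.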
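
The difference is easy to see with a half-received payload. In this small sketch the truncated strings are invented to simulate a connection that is still open. Python's json module plays the role of a typical all-or-nothing parser.

```python
import json

# The same three records, cut off at the same point in the third one.
partial_array = '[{"id":1},{"id":2},{"id'
partial_ndjson = '{"id":1}\n{"id":2}\n{"id'

# The array cannot be parsed at all until the closing ] arrives.
try:
    array_result = json.loads(partial_array)
except json.JSONDecodeError:
    array_result = None

# With NDJSON, every line that already ends in a newline is usable right away.
complete_lines = partial_ndjson.split("\n")[:-1]
ndjson_result = [json.loads(line) for line in complete_lines]

print(array_result)
print(ndjson_result)
```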

Q: When should I use SSE instead of WebSockets?

A: Use SSE if you only need the server to push data to the client (e.g., notifications). SSE is easier to implement, reconnects automatically in the browser, and, because it is plain HTTP, tends to pass through firewalls and proxies with less trouble than WebSockets.

Q: How do I handle large NDJSON files in Node.js?

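A: Read the file line by line, for example with Node's built-in readline module over a file read stream, and parse each line with JSON.parse, or use a dedicated NDJSON library. Because only the current line is held in memory, you can process gigabytes of data with a small, roughly constant memory footprint.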
