Streaming and Real-time Data Formats: A Practical Guide
In modern web development, data is no longer just static files or single API responses. Real-time updates, logs, and massive datasets call for streaming formats: ways to send and process data as it arrives, rather than waiting for the entire payload to be ready. This guide covers the most common formats for streaming and real-time communication.
1. Line-Delimited Formats (NDJSON & JSON Lines)
When you need to stream a list of objects (like records from a database or log entries), a standard JSON array [...] is problematic because a typical parser has to wait for the closing ] before it can hand back any element. Line-delimited formats solve this by making every line independently parseable.
NDJSON (Newline Delimited JSON)
NDJSON is a convention for storing or streaming data in which each line is a complete, valid JSON value (in practice usually an object), terminated by a newline.
- How it works: `{"id":1}\n{"id":2}\n...`
- Key Advantage: You can parse and process each object individually as soon as the newline `\n` is received.
- Use Case: Large database exports, structured logging, and data pipelines.
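The line-by-line parsing described above can be sketched as a small decoder that accepts text chunks of arbitrary size, since a network chunk may end in the middle of a line. This is a minimal illustration, not a reference implementation; the class name `NdjsonDecoder` and its `push`/`flush` methods are made up for this example.

```typescript
// Incremental NDJSON decoder: feed it text chunks of any size and it
// returns the JSON values for every line completed by that chunk.
class NdjsonDecoder {
  private buffer = "";

  // Append a chunk and return the values of all lines that are now complete.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an unfinished line (or "" if the chunk ended on "\n").
    this.buffer = lines.pop() ?? "";
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as unknown);
  }

  // Call once the stream ends to parse a final line that has no trailing newline.
  flush(): unknown[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest.length > 0 ? [JSON.parse(rest) as unknown] : [];
  }
}

const ndjsonDecoder = new NdjsonDecoder();
// The second object is split across two chunks.
const firstBatch = ndjsonDecoder.push('{"id":1}\n{"id"');
const secondBatch = ndjsonDecoder.push(":2}\n");
console.log(firstBatch, secondBatch); // one object per batch: id 1, then id 2
```

The first object is available before the second has fully arrived, which is the property a plain JSON array does not offer with a typical parser.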
JSON Lines (JSONL)
JSON Lines is essentially the same as NDJSON. It is a text-based format where each line is a valid JSON value.
- Key Advantage: Compatibility with Unix tools like `grep`, `awk`, and `sed`.
- Use Case: Dataset storage for AI/ML training and log analysis.
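As a rough sketch of why line-oriented tools work well here: `JSON.stringify` escapes any line break inside a string, so each record always occupies exactly one physical line. The helper name `toJsonLines` and the sample log records are invented for illustration, and the filter at the end only imitates what a `grep` for the error level would match.

```typescript
// Serialize values as JSON Lines: one JSON text per line, each ending in "\n".
// Assumes every value is JSON-serializable (JSON.stringify returns undefined otherwise).
function toJsonLines(values: readonly unknown[]): string {
  return values.map((value) => JSON.stringify(value) + "\n").join("");
}

const jsonl = toJsonLines([
  { level: "info", msg: "started" },
  { level: "error", msg: "line one\nline two" }, // embedded newline gets escaped
]);

// Imitate a line-based filter: keep only the lines that mention the error level.
const errorLines = jsonl
  .split("\n")
  .filter((line) => line.includes('"level":"error"'));
console.log(errorLines.length); // 1
```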
CSV Stream
Similar to NDJSON, a CSV Stream sends rows of comma-separated values line by line.
- Key Advantage: Very low overhead, because field names appear once in the header row instead of being repeated in every record.
- Use Case: Exporting millions of rows to Excel-compatible formats in real-time.
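A minimal sketch of producing such a stream row by row, assuming RFC 4180 style quoting (fields containing commas, quotes, or line breaks are wrapped in double quotes, and embedded quotes are doubled). The function names `csvField` and `csvLines` are illustrative; a real export would write each yielded line to the HTTP response or file as it is produced.

```typescript
// Quote a field only when it contains a comma, a double quote, or a line break.
function csvField(value: string | number): string {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Lazily yield one CSV line per row, so no more than one row is held at a time.
function* csvLines(
  header: readonly string[],
  rows: Iterable<readonly (string | number)[]>,
): Generator<string> {
  yield header.map(csvField).join(",") + "\r\n";
  for (const row of rows) {
    yield row.map(csvField).join(",") + "\r\n";
  }
}

const csvOutput = Array.from(
  csvLines(["id", "note"], [[1, "plain"], [2, 'say "hi", bye']]),
);
console.log(csvOutput.join(""));
```

Because the generator is lazy, the rows argument can itself be a generator backed by a database cursor, which keeps memory use flat regardless of the number of rows.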
2. Server-Sent Events (SSE)
Server-Sent Events (SSE) is a standard allowing servers to push data to web pages over HTTP. Unlike WebSockets, it is a one-way communication channel (Server -> Client).
- How it works: The server keeps an HTTP connection open and sends data in a specific `text/event-stream` format.
- Protocol Format:

      event: user-update
      data: {"name": "Alice"}

      event: chat-message
      data: "Hello world!"

- Key Advantage: Automatic reconnection, lightweight, and works over standard HTTP/HTTPS.
- Use Case: Live sports scores, stock price tickers, and social media notifications.
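The wire format above can be produced with a small helper. This is a sketch under the assumption that the server writes the returned string to a response whose `Content-Type` is `text/event-stream`; the function name `formatSseEvent` is made up for this example. Note that each event ends with a blank line, and that a payload containing line breaks has to be split into several `data:` lines.

```typescript
// Serialize one message in the text/event-stream wire format.
// `event` is optional; without it the client treats the message as a plain "message" event.
function formatSseEvent(data: string, event?: string): string {
  const lines: string[] = [];
  if (event !== undefined) lines.push(`event: ${event}`);
  // Each physical line of the payload needs its own "data:" prefix.
  for (const part of data.split(/\r\n|\r|\n/)) {
    lines.push(`data: ${part}`);
  }
  // The blank line terminates the event and triggers dispatch on the client.
  return lines.join("\n") + "\n\n";
}

const sseFrame = formatSseEvent(JSON.stringify({ name: "Alice" }), "user-update");
console.log(sseFrame);
```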
3. WebSockets and Message Formats
While SSE is for one-way streaming, WebSockets provide a full-duplex (two-way) communication channel.
WebSocket Message Formats
Because WebSockets only provide a transport layer, developers must choose a message format.
- JSON: The most common choice for ease of use.
- Binary (Protobuf/MessagePack): Used when low latency and small payload size are critical.
- Custom Text Protocols: Sometimes used for simple commands.
- Use Case: Real-time collaborative editing (Google Docs), online gaming, and chat applications.
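Since the socket only carries opaque text or binary frames, applications that choose JSON usually agree on an envelope with a discriminating field. The following is one possible sketch; the `ChatMessage` type, its `type` values, and the function names are hypothetical and not part of any WebSocket API.

```typescript
// Hypothetical message types for a chat application (illustrative names only).
type ChatMessage =
  | { type: "chat"; text: string }
  | { type: "typing"; userId: string };

// Encode a message as the text payload of a WebSocket frame.
function encodeMessage(message: ChatMessage): string {
  return JSON.stringify(message);
}

// Decode and validate an incoming payload; returns null for anything unrecognized.
function decodeMessage(payload: string): ChatMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(payload);
  } catch {
    return null; // payload was not JSON at all
  }
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;
  const text = record["text"];
  const userId = record["userId"];
  if (record["type"] === "chat" && typeof text === "string") {
    return { type: "chat", text };
  }
  if (record["type"] === "typing" && typeof userId === "string") {
    return { type: "typing", userId };
  }
  return null;
}

const wirePayload = encodeMessage({ type: "chat", text: "Hello world!" });
console.log(decodeMessage(wirePayload), decodeMessage("not json"));
```

Validating on receipt matters because the other end of the socket is untrusted input; the same envelope idea carries over to binary formats, with the schema playing the role of the type check.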
Comparison of Streaming Approaches
| Format / Tech | Direction | Overhead | Reconnection | Best For |
|---|---|---|---|---|
| NDJSON / JSONL | Uni-directional | Low | N/A (File/Stream) | Logs, Data Exports |
| SSE | Server -> Client | Very Low | Automatic | Live Dashboards |
| WebSockets | Bi-directional | Medium | Manual | Interactive Apps |
| CSV Stream | Uni-directional | Minimal | N/A | Large Reports |
FAQ: Frequently Asked Questions
Q: Why not just use a JSON array for streaming?
A: Most built-in JSON parsers (for example, JSON.parse) are "all-or-nothing": they return nothing until the entire array has been received and parsed. NDJSON allows incremental parsing, one line at a time, which keeps memory use bounded and reduces the time to the first record.
Q: When should I use SSE instead of WebSockets?
A: Use SSE if you only need the server to push data to the client (e.g., notifications). SSE is easier to implement, the browser reconnects automatically after a dropped connection, and because it is ordinary HTTP it is more firewall-friendly than WebSockets.
Q: How do I handle large NDJSON files in Node.js?
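A: Read the file line by line with the built-in readline module, or use a dedicated NDJSON library, and parse each line on its own with JSON.parse. Because only one line is held in memory at a time, you can process gigabytes of data with a small, roughly constant memory footprint.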
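A minimal sketch of that approach, using Node's `readline` module over any readable stream. The function name `processNdjson` and the `onRecord` callback are invented for this example, and the file path in the usage comment is a placeholder.

```typescript
import { createInterface } from "node:readline";
import { Readable } from "node:stream";

// Read NDJSON from a readable stream one line at a time and pass each
// parsed value to `onRecord`. Resolves with the number of records seen.
async function processNdjson(
  input: Readable,
  onRecord: (record: unknown) => void,
): Promise<number> {
  // crlfDelay: Infinity makes "\r\n" count as a single line break.
  const rl = createInterface({ input, crlfDelay: Infinity });
  let count = 0;
  for await (const line of rl) {
    if (line.trim() === "") continue; // tolerate blank lines
    onRecord(JSON.parse(line));
    count += 1;
  }
  return count;
}

// Usage with a file (placeholder path):
//   import { createReadStream } from "node:fs";
//   await processNdjson(createReadStream("export.ndjson"), (r) => console.log(r));
```

A malformed line makes `JSON.parse` throw and rejects the returned promise; whether to abort or skip such lines is a design choice left to the caller.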