Users prefer seeing tokens appear immediately rather than waiting for the full response. Streaming sends tokens to the client as they're generated.
Implement streaming with Server-Sent Events (SSE) or WebSockets; SSE is the simpler choice for the one-way server-to-client case. Most inference servers support streaming out of the box, so your API returns a stream of token chunks rather than a single JSON body.
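As a minimal sketch of the SSE wire format, the server-side helpers below format each token chunk as a `data:` event and close the stream with a `[DONE]` sentinel (a convention popularized by OpenAI-style APIs). The function names and the `token` field are illustrative, not from any particular framework:

```python
import json
from typing import Iterable, Iterator

def sse_format(token: str) -> str:
    """Format one token chunk as an SSE message: 'data: <payload>\\n\\n'."""
    # JSON-encoding the token keeps whitespace and special characters unambiguous.
    return f"data: {json.dumps({'token': token})}\n\n"

def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield SSE-formatted chunks as tokens are generated, then an end marker."""
    for tok in tokens:
        yield sse_format(tok)
    yield "data: [DONE]\n\n"  # sentinel so the client knows generation finished

# Example: streaming a pre-tokenized response
chunks = list(stream_tokens(["Hello", ",", " world"]))
```

In a real server you would return this generator as the response body with `Content-Type: text/event-stream`, flushing each chunk as the model emits it.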
Streaming dramatically improves perceived latency: even if total generation time is unchanged, users see progress immediately. This is standard for production chat interfaces.