System Design: Building a Real-Time Chat Application
I have been through three system design rounds this year alone, and real-time chat comes up more than anything else. But beyond interviews, I have actually built messaging systems at two companies. The gap between a whiteboard answer and a production system is enormous. This post is what I wish someone had written for me before I started.
Problem Statement
We are building a chat application that supports:
Functional Requirements:
- One-on-one and group messaging
- Real-time message delivery (sub-second latency)
- Message persistence and history
- Online/offline presence indicators
- Read receipts
- Push notifications for offline users
Non-Functional Requirements:
- 10 million daily active users
- 99.99% availability
- Messages delivered in order within a conversation
- End-to-end latency under 200ms for online users
- Messages stored for 5 years
High-Level Architecture
Here is how the major components connect:
Clients (Web/Mobile)
        |
        |  WebSocket (STOMP)
        v
[Load Balancer (Layer 7)]
        |
        v
[Chat Server Cluster] <---> [Redis Pub/Sub] <---> [Chat Server Cluster]
        |                                                  |
        v                                                  v
[Message Queue (Kafka)]                        [Presence Service (Redis)]
        |
        v
[Message Persistence (PostgreSQL)]
        |
        v
[Push Notification Service]
The key insight: Chat servers are stateful because they hold WebSocket connections. Redis Pub/Sub bridges the gap when two users connected to different servers need to talk.
Technology Choices
| Component | Technology | Why |
|---|---|---|
| Real-time transport | WebSocket + STOMP | Full-duplex, Spring has first-class support |
| Message broker | Redis Pub/Sub | Low latency, simple, handles cross-server routing |
| Persistent queue | Apache Kafka | Durability, replay, decouples write path |
| Database | PostgreSQL | JSONB for flexible message metadata, strong consistency |
| Presence | Redis | TTL-based keys, sub-millisecond reads |
| Push notifications | Firebase Cloud Messaging | Industry standard, handles both iOS and Android |
WebSocket Implementation with Spring
Spring's STOMP over WebSocket support is production-tested and saves you from writing low-level frame handling.
WebSocket Configuration:
@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry config) {
        // In-memory broker per instance; cross-server routing happens through the
        // Redis Pub/Sub relay described later. (Spring's STOMP broker relay needs a
        // STOMP-speaking broker such as RabbitMQ -- Redis does not speak STOMP.)
        config.enableSimpleBroker("/topic", "/queue");
        config.setApplicationDestinationPrefixes("/app");
        config.setUserDestinationPrefix("/user");
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/ws/chat")
                .setAllowedOrigins("https://yourchatapp.com")
                .withSockJS(); // Fallback for older browsers
    }
}
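The message handler below reads the sender from headerAccessor.getUser(), which only works if the STOMP session carries an authenticated Principal. One way to wire that up is a ChannelInterceptor on the inbound channel that validates a token sent in the CONNECT frame. This is a minimal sketch, not part of the design above: TokenService and its resolvePrincipal method are hypothetical, and in practice you might do this through Spring Security or the HTTP handshake instead.

@Configuration
public class WebSocketAuthConfig implements WebSocketMessageBrokerConfigurer {

    private final TokenService tokenService; // hypothetical token validator

    public WebSocketAuthConfig(TokenService tokenService) {
        this.tokenService = tokenService;
    }

    @Override
    public void configureClientInboundChannel(ChannelRegistration registration) {
        registration.interceptors(new ChannelInterceptor() {
            @Override
            public Message<?> preSend(Message<?> message, MessageChannel channel) {
                StompHeaderAccessor accessor =
                        MessageHeaderAccessor.getAccessor(message, StompHeaderAccessor.class);
                if (accessor != null && StompCommand.CONNECT.equals(accessor.getCommand())) {
                    String token = accessor.getFirstNativeHeader("Authorization");
                    // Attach a Principal so getUser() works in @MessageMapping methods
                    accessor.setUser(tokenService.resolvePrincipal(token));
                }
                return message;
            }
        });
    }
}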
Message Handler:
@Controller
public class ChatController {

    private final MessagePersistenceService persistenceService;
    private final PresenceService presenceService;
    private final PushNotificationService pushService;
    private final SimpMessagingTemplate messagingTemplate;

    public ChatController(MessagePersistenceService persistenceService,
                          PresenceService presenceService,
                          PushNotificationService pushService,
                          SimpMessagingTemplate messagingTemplate) {
        this.persistenceService = persistenceService;
        this.presenceService = presenceService;
        this.pushService = pushService;
        this.messagingTemplate = messagingTemplate;
    }

    @MessageMapping("/chat.send")
    public void sendMessage(@Payload ChatMessage message,
                            SimpMessageHeaderAccessor headerAccessor) {
        String senderId = headerAccessor.getUser().getName();
        message.setSenderId(senderId);
        message.setTimestamp(Instant.now());
        message.setMessageId(UUID.randomUUID().toString()); // or a time-sortable ID (see Message Ordering below)

        // Persist asynchronously via Kafka
        persistenceService.persistAsync(message);

        // Route to recipient; in a multi-server deployment this delivery goes
        // through the Redis Pub/Sub relay described later
        String recipientId = message.getRecipientId();
        if (presenceService.isOnline(recipientId)) {
            messagingTemplate.convertAndSendToUser(recipientId, "/queue/messages", message);
        } else {
            // User offline -- queue push notification
            pushService.sendPushNotification(recipientId, message);
        }
    }

    @MessageMapping("/chat.typing")
    public void typingIndicator(@Payload TypingEvent event) {
        messagingTemplate.convertAndSendToUser(event.getRecipientId(), "/queue/typing", event);
    }
}
Message Persistence and Delivery Guarantees
Messages flow through Kafka before hitting PostgreSQL. This decouples the hot path (WebSocket delivery) from the cold path (database write). Even if the database is slow, the user sees the message instantly.
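Here is a minimal sketch of that split with Spring Kafka. The chat-messages topic name, the MessageRepository, and JSON-over-string serialization are assumptions of the sketch rather than fixed parts of the design, and the entity mapping assumes standard accessors on the Message entity shown below.

@Service
public class MessagePersistenceService {

    private final KafkaTemplate<String, String> kafkaTemplate;
    private final ObjectMapper objectMapper;

    public MessagePersistenceService(KafkaTemplate<String, String> kafkaTemplate,
                                     ObjectMapper objectMapper) {
        this.kafkaTemplate = kafkaTemplate;
        this.objectMapper = objectMapper;
    }

    // Hot path: fire-and-forget produce, keyed by conversation so every message
    // in a conversation lands on the same partition and keeps its order
    public void persistAsync(ChatMessage message) {
        try {
            kafkaTemplate.send("chat-messages", message.getConversationId(),
                    objectMapper.writeValueAsString(message));
        } catch (JsonProcessingException e) {
            throw new IllegalStateException("Could not serialize message", e);
        }
    }
}

@Service
public class MessagePersistenceConsumer {

    private final MessageRepository messageRepository; // assumed Spring Data repository
    private final ObjectMapper objectMapper;

    public MessagePersistenceConsumer(MessageRepository messageRepository,
                                      ObjectMapper objectMapper) {
        this.messageRepository = messageRepository;
        this.objectMapper = objectMapper;
    }

    // Cold path: drain the topic at whatever rate PostgreSQL can sustain
    @KafkaListener(topics = "chat-messages", groupId = "message-persistence")
    public void onMessage(String payload) throws IOException {
        ChatMessage dto = objectMapper.readValue(payload, ChatMessage.class);
        Message entity = new Message();
        entity.setMessageId(dto.getMessageId());
        entity.setConversationId(dto.getConversationId());
        entity.setSenderId(dto.getSenderId());
        entity.setContent(dto.getContent());
        entity.setTimestamp(dto.getTimestamp());
        entity.setStatus(DeliveryStatus.SENT);
        messageRepository.save(entity);
    }
}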
For delivery guarantees, I track three states:
- SENT — server received the message
- DELIVERED — recipient's client acknowledged receipt
- READ — recipient opened the conversation
@Entity
@Table(name = "messages")
public class Message {
@Id
private String messageId;
private String conversationId;
private String senderId;
private String content;
private Instant timestamp;
@Enumerated(EnumType.STRING)
private DeliveryStatus status; // SENT, DELIVERED, READ
private Instant deliveredAt;
private Instant readAt;
}
The client sends an acknowledgment back over the WebSocket when it receives a message. This flips the status from SENT to DELIVERED. Simple, reliable.
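For concreteness, the ack handler could look like the sketch below. The /chat.ack destination, the MessageAck payload, and the repository are assumptions of this sketch, and the entity is assumed to have standard accessors.

@Controller
public class AckController {

    private final MessageRepository messageRepository; // assumed Spring Data repository

    public AckController(MessageRepository messageRepository) {
        this.messageRepository = messageRepository;
    }

    // Client sends the messageId to /app/chat.ack once it has rendered the message
    @MessageMapping("/chat.ack")
    @Transactional
    public void acknowledge(@Payload MessageAck ack) {
        messageRepository.findById(ack.getMessageId()).ifPresent(message -> {
            if (message.getStatus() == DeliveryStatus.SENT) {
                message.setStatus(DeliveryStatus.DELIVERED);
                message.setDeliveredAt(Instant.now());
                // JPA dirty checking flushes the update when the transaction commits
            }
        });
    }
}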
Presence System with Redis
Presence is one of those features that looks trivial but gets tricky at scale. I use Redis TTL keys:
@Service
public class PresenceService {

    private static final Duration PRESENCE_TTL = Duration.ofSeconds(30);

    private final StringRedisTemplate redisTemplate;

    public PresenceService(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    public void markOnline(String userId, String serverId) {
        String key = "presence:" + userId;
        Map<String, String> value = Map.of(
                "serverId", serverId,
                "lastSeen", Instant.now().toString()
        );
        redisTemplate.opsForHash().putAll(key, value);
        redisTemplate.expire(key, PRESENCE_TTL);
    }

    public boolean isOnline(String userId) {
        return Boolean.TRUE.equals(redisTemplate.hasKey("presence:" + userId));
    }

    public void heartbeat(String userId) {
        redisTemplate.expire("presence:" + userId, PRESENCE_TTL);
    }
}
Clients send a heartbeat every 15 seconds. If the key expires (no heartbeat for 30 seconds), the user is considered offline. No complex state machine needed.
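How do markOnline and heartbeat get invoked? One option is Spring's STOMP session events plus a small heartbeat destination. A sketch, where the /presence.heartbeat destination and the chat.server-id property are assumptions of this example:

@Controller
public class PresenceController {

    private final PresenceService presenceService;
    private final String serverId;

    public PresenceController(PresenceService presenceService,
                              @Value("${chat.server-id}") String serverId) { // assumed property identifying this instance
        this.presenceService = presenceService;
        this.serverId = serverId;
    }

    // Mark the user online as soon as the STOMP session is established
    @EventListener
    public void onConnected(SessionConnectedEvent event) {
        if (event.getUser() != null) {
            presenceService.markOnline(event.getUser().getName(), serverId);
        }
    }

    // Clients send to /app/presence.heartbeat every 15 seconds to refresh the TTL
    @MessageMapping("/presence.heartbeat")
    public void heartbeat(SimpMessageHeaderAccessor headerAccessor) {
        presenceService.heartbeat(headerAccessor.getUser().getName());
    }
}

On disconnect there is nothing to do: the presence key simply stops being refreshed and expires after the TTL.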
Scaling WebSockets Horizontally
This is where most designs fall apart. User A is connected to Server 1, User B is connected to Server 2. How does A's message reach B?
Redis Pub/Sub solves this. Each chat server subscribes to a channel. When Server 1 receives a message for User B, it publishes to Redis. Server 2 picks it up and delivers over its local WebSocket.
@Service
public class RedisMessageRelay {

    private final StringRedisTemplate redisTemplate;
    private final SimpMessagingTemplate messagingTemplate;
    private final ObjectMapper objectMapper;

    public RedisMessageRelay(StringRedisTemplate redisTemplate,
                             SimpMessagingTemplate messagingTemplate,
                             ObjectMapper objectMapper) {
        this.redisTemplate = redisTemplate;
        this.messagingTemplate = messagingTemplate;
        this.objectMapper = objectMapper;
    }

    // Publish to the recipient's channel; whichever server holds the
    // recipient's connection picks it up
    public void relayMessage(ChatMessage message) {
        try {
            redisTemplate.convertAndSend("chat:user:" + message.getRecipientId(),
                    objectMapper.writeValueAsString(message));
        } catch (JsonProcessingException e) {
            throw new IllegalStateException("Could not serialize chat message", e);
        }
    }

    // Every chat server subscribes and delivers to its local WebSocket sessions
    @Bean
    public RedisMessageListenerContainer relayListenerContainer(RedisConnectionFactory connectionFactory) {
        RedisMessageListenerContainer container = new RedisMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);
        container.addMessageListener((message, pattern) -> {
            try {
                ChatMessage chatMessage = objectMapper.readValue(message.getBody(), ChatMessage.class);
                messagingTemplate.convertAndSendToUser(
                        chatMessage.getRecipientId(), "/queue/messages", chatMessage);
            } catch (IOException e) {
                // Skip malformed payloads instead of killing the listener thread
            }
        }, new PatternTopic("chat:user:*"));
        return container;
    }
}
For group chats with many participants, fan-out happens at the Redis layer. Each server only delivers to users connected to it.
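Sketched as a hypothetical extension of the relay above, using one channel per conversation rather than per user; memberService is an assumed helper that returns the conversation's member IDs, not something defined earlier.

// One publish per conversation, no matter how many members it has
public void relayGroupMessage(ChatMessage message) throws JsonProcessingException {
    redisTemplate.convertAndSend("chat:conversation:" + message.getConversationId(),
            objectMapper.writeValueAsString(message));
}

// Subscriber side (registered on the pattern chat:conversation:*): each server
// attempts delivery to every member, but its local broker only knows about the
// sessions it hosts, so members connected elsewhere are silently skipped here
// and delivered by their own server instead.
private void deliverToLocalMembers(ChatMessage message) {
    for (String memberId : memberService.getMemberIds(message.getConversationId())) {
        messagingTemplate.convertAndSendToUser(memberId, "/queue/messages", message);
    }
}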
Message Ordering and Consistency
Messages within a single conversation must be ordered. I use a Snowflake-like ID generator that produces time-sortable, globally unique IDs. The conversation is partitioned in Kafka by conversationId, so messages within a conversation are always processed in order.
On the client side, messages are sorted by their server-assigned timestamp, not the client's local time. Never trust the client clock.
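For reference, here is a stripped-down version of what a Snowflake-style generator looks like: timestamp bits, node bits, sequence bits. The bit split and epoch below are illustrative, not production values.

public class SnowflakeIdGenerator {

    private static final long EPOCH = 1700000000000L; // custom epoch, illustrative
    private static final long NODE_BITS = 10;
    private static final long SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long nodeId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long nodeId) {
        this.nodeId = nodeId;
    }

    // IDs sort by creation time, so ordering by ID equals ordering by time
    public synchronized long nextId() {
        long timestamp = System.currentTimeMillis();
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond -- spin until the next one
                while (timestamp <= lastTimestamp) {
                    timestamp = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = timestamp;
        return ((timestamp - EPOCH) << (NODE_BITS + SEQUENCE_BITS))
                | (nodeId << SEQUENCE_BITS)
                | sequence;
    }
}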
Database Schema
CREATE TABLE users (
user_id VARCHAR(36) PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
display_name VARCHAR(100),
avatar_url TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE conversations (
conversation_id VARCHAR(36) PRIMARY KEY,
type VARCHAR(10) NOT NULL, -- 'DIRECT' or 'GROUP'
name VARCHAR(100),
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE conversation_members (
conversation_id VARCHAR(36) REFERENCES conversations(conversation_id),
user_id VARCHAR(36) REFERENCES users(user_id),
joined_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (conversation_id, user_id)
);
CREATE TABLE messages (
message_id VARCHAR(36) PRIMARY KEY,
conversation_id VARCHAR(36) REFERENCES conversations(conversation_id),
sender_id VARCHAR(36) REFERENCES users(user_id),
content TEXT NOT NULL,
status VARCHAR(10) DEFAULT 'SENT',
created_at TIMESTAMP DEFAULT NOW(),
delivered_at TIMESTAMP,
read_at TIMESTAMP
);
-- Critical indexes for query performance
CREATE INDEX idx_messages_conversation ON messages(conversation_id, created_at DESC);
CREATE INDEX idx_conversation_members_user ON conversation_members(user_id);
Partition the messages table by created_at (monthly partitions) once you pass a few hundred million rows. Note that PostgreSQL's declarative partitioning requires the partition key to be part of the primary key, so the key becomes (message_id, created_at). Old partitions can be moved to cold storage.
Capacity Estimation
Let us do some quick math for 10 million DAU:
- Average user sends 20 messages/day
- Total: 200 million messages/day (~2,300 messages/second)
- Average message size: 200 bytes (content + metadata)
- Daily storage: 200M x 200B = 40 GB/day
- Annual storage: ~14.6 TB/year
- 5-year retention: ~73 TB
For WebSocket connections, assume the worst case of all 10M users connected concurrently at peak. At ~10KB of memory per connection, that is 100 GB of RAM across the cluster. With 16 GB allocated per server instance, roughly 7 chat server instances can hold the connections at peak. In practice, I would run 12-15 for headroom and fault tolerance.
Redis Pub/Sub handles 500K+ messages/second on a single node, so one Redis cluster with a few replicas covers us comfortably.
Push Notifications for Offline Users
When the presence check says a user is offline, the message gets routed to a notification queue:
@Service
public class PushNotificationService {

    private final FirebaseMessaging firebaseMessaging;
    private final DeviceTokenRepository tokenRepository; // stores each user's FCM registration token

    public PushNotificationService(FirebaseMessaging firebaseMessaging,
                                   DeviceTokenRepository tokenRepository) {
        this.firebaseMessaging = firebaseMessaging;
        this.tokenRepository = tokenRepository;
    }

    public void sendPushNotification(String userId, ChatMessage message) {
        String fcmToken = tokenRepository.getToken(userId);
        if (fcmToken == null) return;

        // com.google.firebase.messaging.Message, not our JPA entity
        com.google.firebase.messaging.Message notification =
                com.google.firebase.messaging.Message.builder()
                        .setToken(fcmToken)
                        .setNotification(Notification.builder()
                                .setTitle(message.getSenderName())
                                .setBody(truncate(message.getContent(), 100))
                                .build())
                        .putData("conversationId", message.getConversationId())
                        .build();
        firebaseMessaging.sendAsync(notification);
    }

    private String truncate(String text, int max) {
        return text.length() <= max ? text : text.substring(0, max) + "...";
    }
}
Batch notifications if a user has many unread messages. Nobody wants 50 separate push alerts.
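One lightweight way to do that, sketched here with a Redis counter; the unread: key, the sendPush helper (a hypothetical refactor of the FCM code above), and the extra redisTemplate dependency are all assumptions of this sketch.

// Collapse multiple unread messages into a single summary push
public void sendBatchedPush(String userId, ChatMessage message) {
    Long unread = redisTemplate.opsForValue().increment("unread:" + userId);
    String body = (unread != null && unread > 1)
            ? unread + " new messages"
            : truncate(message.getContent(), 100);
    sendPush(userId, message.getSenderName(), body); // delegate to the FCM builder shown above
}

// Reset the counter when the user opens the conversation (e.g. on a READ receipt)
public void clearUnread(String userId) {
    redisTemplate.delete("unread:" + userId);
}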
Conclusion
System design is fundamentally about trade-offs. In this chat system, we traded the simplicity of a stateless HTTP API for the complexity of stateful WebSocket connections — because sub-second latency demanded it. We added Redis as a coordination layer, accepting the operational overhead because the alternative (sticky sessions with no failover) is worse. We chose eventual consistency for read receipts because strong consistency there would crush throughput for no real user benefit.
Every decision has a cost. The skill is not in memorizing architectures — it is in understanding why each piece exists and what breaks if you remove it. That is what separates a whiteboard answer from a system that actually runs at scale.