A thought experiment: High-Availability IRC

I've been idly pondering how to create a system for high-availability IRC. I don't have any immediate intentions of implementing any of this, but I thought some people might find it interesting.

IRC clients establish long-lived TCP or TLS connections to IRC servers. If a connection is broken, this has client-visible consequences: the user is told that they have been disconnected, and they can no longer send or receive messages.

Even if reconnection occurs immediately, this momentary disconnection is irritating to the user, so it should be avoided if we're seeking to create a high-availability IRC network.

However reliable an IRC daemon or an OS is, it is safe to assume that every machine will eventually need maintenance; any box eventually needs to be rebooted, for example for kernel updates. There's also the question of hardware reliability, but for VMs, live migration can be used to drain physical hardware, so I don't consider this an issue. I also don't consider unforeseen hardware failure, as this is a much harder problem which would require constant synchronisation of state between IRC daemons. Therefore, it needs to be possible to transfer existing clients to another server before taking a server out of service, without clients ever noticing.

Unfortunately, this is where some of the mistakes of the Internet's architecture come back to bite us, such as the lack of a session layer and the tight binding of a transport association to a TCP 4-tuple (source IP, source port, destination IP, destination port). For vanilla TCP, we can't change any of these values without breaking the connection.

Therefore, any solution must fit into one of these categories:
- Multiple servers handle traffic with the same destination IP address (that is: don't change the 4-tuple). Suppose that two servers have the same IP, and “the network” (or a load balancer, and so on) can be switched to route traffic to either server. If TCP connection state can be transferred from one server to another (as well as the IRC state relating to the client, of course), then one leg of a TCP connection could essentially be moved from one server to another without the client ever knowing. Doing this would presumably require modifying the kernels of both servers so that TCP connection state can be serialized, sent over the network, and deserialized on the other side. But it is in principle possible.
- Intermediating proxies (that is: add a layer of indirection). The idea here is you would use something along the lines of HAProxy as a TCP or TLS frontend which then passes traffic to backend IRC servers. I don't believe HAProxy as it stands knows how to transfer an existing TCP connection to a new backend, so it would need to be modified to support this, but this is readily feasible. The server being shut down would have to inform the new server of its state, then tell HAProxy to hand over the connections to the new server.
  
  The problem with the intermediating proxy approach, though, is that eventually you have to take that intermediating proxy out of service for maintenance, too, so it only shunts the problem. It might still be worthwhile if IRC daemons are restarted more often (for example, due to feature enhancements), but IRC connections can easily last many months without interruption.
- MPTCP. Multipath TCP (MPTCP) is a newish extension to TCP which allows for multipathing, in a similar vein to SCTP. The stated intention of this extension is to allow the endpoints of a TCP connection to comprise multiple IPs so that multiple network paths can be used. But since this allows the IPs to which a TCP connection is bound to be changed dynamically without tearing down the connection, it could theoretically be used to move one end of a TCP connection to a new server, with the other end being none the wiser. This would be a nonstandard usage and would surely require modifying an MPTCP stack to allow state to be transferred between servers like this, but the client would only need a standard MPTCP implementation. The drawback of this approach is that MPTCP support is necessary on the client, so (at least for IRC) it's mainly of interest when using an intermediating proxy which acts as the MPTCP client. But then you run into the same drawback of intermediating proxies stated above: you've simply shunted the problem and will eventually have to take the proxy out of service for maintenance.

For this reason, it seems to me that the first solution is the optimal one. It can be implemented in any number of ways, but the most obvious is to connect the two servers as though they were routers both attached to the same virtual subnet containing the service IP, and to configure routing accordingly. Some method must be found to force all traffic to one server rather than the other when maintenance is performed.

Although the simplest solution is simply to route all traffic to one server and keep a second with the same IP as a standby, ECMP load balancing in principle should also be possible; if done consistently, traffic for a given 4-tuple should always be routed to the same server, avoiding the need for state synchronization.
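
To make the "if done consistently" part concrete, here is a minimal Python sketch of flow-consistent server selection. It is only an illustration: real routers use their own hash functions, and the names `FourTuple` and `pick_server`, the server labels and the choice of SHA-256 are all assumptions of mine.

```python
import hashlib
import ipaddress
import struct
from typing import NamedTuple, Sequence


class FourTuple(NamedTuple):
    """The TCP 4-tuple that identifies a connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def pick_server(flow: FourTuple, servers: Sequence[str]) -> str:
    """Map a flow to a server by hashing its 4-tuple.

    Stand-in for a router's ECMP hash (assumption: any deterministic
    hash over the 4-tuple gives the same per-flow stickiness).
    """
    key = (
        ipaddress.ip_address(flow.src_ip).packed
        + struct.pack("!H", flow.src_port)
        + ipaddress.ip_address(flow.dst_ip).packed
        + struct.pack("!H", flow.dst_port)
    )
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    flow = FourTuple("198.51.100.7", 50000, "203.0.113.10", 6697)
    # Every packet of this flow lands on the same server.
    print(pick_server(flow, ["irc-a", "irc-b"]))
```

Note that with a simple modulo like this, changing the set of servers remaps many existing flows, so draining a server for maintenance would still need the connection state transfer described above.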
For the hell of it, I've also contemplated the possibility of what an AWS-based IRC network would look like (indulge me). Is it even possible to implement the first solution using AWS? Most of the load balancing and high availability routing functionality provided by AWS doesn't have any way to reassign a TCP backend connection once it's been made, just as in the case of HAProxy, and indeed, since this would require special marshalling when a transfer occurs, it would be an extremely niche feature.

However, as far as I can tell, AWS does offer one service which might be adequate to the task: the “Gateway Load Balancer”. If I understand it correctly (the documentation is rather spartan), it allows traffic destined for an entire subnet to be tunneled to an instance via the GENEVE protocol. This means that you basically get a stream of UDP packets, containing tunneled Ethernet frames which in turn contain IP packets, that you can handle with a normal daemon.
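
As a rough idea of what handling those UDP packets in a normal daemon involves, here is a small Python sketch that splits a GENEVE datagram into its fixed header, options and inner payload, following the header layout in RFC 8926. The names `GeneveHeader` and `parse_geneve` are my own, and the sketch makes no claim about which options or inner protocol type AWS actually sends; the Protocol Type field is what tells you whether the payload is an Ethernet frame or a bare IP packet.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

GENEVE_UDP_PORT = 6081  # IANA-assigned UDP port for GENEVE


@dataclass
class GeneveHeader:
    version: int        # 2-bit version field
    options: bytes      # raw variable-length options, not decoded here
    protocol_type: int  # Ethertype of the inner payload (e.g. 0x0800 for IPv4)
    vni: int            # 24-bit Virtual Network Identifier


def parse_geneve(datagram: bytes) -> Tuple[GeneveHeader, bytes]:
    """Split a UDP payload into its GENEVE header and the tunneled payload."""
    if len(datagram) < 8:
        raise ValueError("datagram shorter than the fixed GENEVE header")
    first, _flags, protocol_type, vni_and_reserved = struct.unpack(
        "!BBHI", datagram[:8]
    )
    version = first >> 6
    options_len = (first & 0x3F) * 4  # Opt Len counts 4-byte words
    if len(datagram) < 8 + options_len:
        raise ValueError("datagram truncated inside the GENEVE options")
    options = datagram[8:8 + options_len]
    vni = vni_and_reserved >> 8       # low 8 bits are reserved
    header = GeneveHeader(version, options, protocol_type, vni)
    return header, datagram[8 + options_len:]
```

A daemon would bind a UDP socket to `GENEVE_UDP_PORT`, call something like `parse_geneve` on each datagram, and feed the inner payload to its userspace TCP/IP stack.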

Theoretically, you could therefore write an IRC daemon which supports GENEVE and incorporates its own userspace TCP/IP stack, complete with a way to transfer TCP connection state to another server. Both servers would have the same IP address; since the Gateway Load Balancer allows multiple backends with health checks, this means that all traffic to the given IP could be tunneled to one server, unless it ceases to report itself as healthy, in which case all traffic to the same IP is then handed to the second server. This suggests a formal handover protocol such as this:
- The IRC daemon on the first server stops transmitting to all clients temporarily, and discards any incoming TCP traffic. This is to ensure the TCP state machine doesn't mutate while the migration is in progress. To the client, it will simply look like packet loss is occurring, and it will retransmit packets.
- The IRC daemon, with its own userspace TCP/IP stack, serializes the state of its TCP connections, as well as the associated IRC user state, and sends it to the second server.
- The second server's IRC daemon deserializes this data into its TCP/IP stack and lets the first server know it's ready to handle traffic. It also starts reporting itself as healthy when the load balancer makes health checks.
- The first server stops reporting itself as healthy, and the load balancer stops forwarding traffic to it and starts forwarding traffic to the second server instead. TCP traffic, including any retransmitted packets, will now be handled by the second server.
- The first server shuts down and can be taken out of service, rebooted, etc. The process can be performed identically in reverse when the second server is to be maintained.
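
The steps above can be sketched as a toy Python simulation. Everything here is hypothetical: `TcpConn` holds only a tiny fraction of the state a real TCP stack (never mind TLS) would need to transfer, JSON stands in for whatever wire format the servers would use, and `route` is a crude stand-in for the load balancer that simply picks the first healthy server.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, int, str, int]


@dataclass
class TcpConn:
    """Hypothetical, heavily simplified per-connection state."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    snd_nxt: int         # next sequence number we will send
    snd_una: int         # oldest sequence number not yet acknowledged
    rcv_nxt: int         # next sequence number expected from the client
    nick: str            # associated IRC user state
    channels: List[str]


def conn_key(c: TcpConn) -> Key:
    return (c.src_ip, c.src_port, c.dst_ip, c.dst_port)


class Server:
    def __init__(self, name: str, healthy: bool = False) -> None:
        self.name = name
        self.healthy = healthy  # answer given to load balancer health checks
        self.frozen = False     # when True, inbound segments are discarded
        self.conns: Dict[Key, TcpConn] = {}

    def receive_segment(self, key: Key, seq: int, payload: bytes) -> bool:
        """Accept an in-order segment; return False if it was dropped."""
        if self.frozen or key not in self.conns:
            return False        # looks like packet loss to the client
        conn = self.conns[key]
        if seq != conn.rcv_nxt:
            return False        # the sketch ignores reordering entirely
        conn.rcv_nxt = (conn.rcv_nxt + len(payload)) % 2**32
        return True

    def export_state(self) -> bytes:
        self.frozen = True      # step 1: stop mutating TCP state
        return json.dumps([asdict(c) for c in self.conns.values()]).encode()

    def import_state(self, blob: bytes) -> None:
        for fields in json.loads(blob):
            conn = TcpConn(**fields)
            self.conns[conn_key(conn)] = conn
        self.healthy = True     # step 3: start passing health checks


def hand_over(old: Server, new: Server) -> None:
    blob = old.export_state()   # steps 1 and 2: freeze, then serialize
    new.import_state(blob)      # step 3: deserialize and report healthy
    old.healthy = False         # step 4: the load balancer will switch away
    old.conns.clear()           # step 5: old server can now be shut down


def route(servers: List[Server]) -> Server:
    """Stand-in for the load balancer: the first healthy server wins."""
    return next(s for s in servers if s.healthy)
```

A segment the old server discarded while frozen is simply retransmitted by the client and then accepted by the new server, because `rcv_nxt` travelled with the rest of the state.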
Of course, since this would all be implemented in userspace, it would also be an ideal opportunity to use something like io_uring for high-performance networking.

All of this comes at some substantial expense, of course. But theoretically it's possible, and I think that's interesting. The notion of moving TCP connection endpoints between machines without the other party knowing has long been an interesting idea to me, frustrated by the unfortunate design of TCP. It really is something which should be easier.

It's possible MPTCP will become more popular in the future; it seems that, for example, iOS and 5G are pushing it. If this is the case, perhaps one day it might become common enough amongst clients that it can be used to implement connection transfer without having multiple servers share a virtual IP.

There are of course other aspects of making IRC highly available besides the question of client reconnection and server restarts. IRC networks generally limit themselves to connecting servers in a tree, without redundant associations between servers. This can clearly be solved in the same manner as for computer networking, with solutions resembling the Spanning Tree Protocol or dynamic routing protocols. The issue of making IRC services highly available is another subject entirely.