Here is a branch of meek that uses the Turbo Tunnel concept: inside the HTTPS tunnel, it carries serialized QUIC packets instead of undifferentiated chunks of a data stream.
Using it works pretty much the same way as always, except you need to set `--quic-tls-cert` and `--quic-tls-key` options on the server and a `quic-tls-pubkey=` option on the client, because QUIC has its own TLS layer (totally separate from the outer HTTPS TLS layer). See the server man page and client man page for details.
This code is more than a prototype or a proof of concept. The underlying meek code has been in production for a long time, and the Turbo Tunnel support code has by now been through a few iterations (see #14). However, I don't want to merge this branch into the mainline just yet, because it's not compatible with the protocol that meek clients use now, and because in preliminary testing it appears that the Turbo Tunnel–based meek, at least in the implementation so far, is slower than the way it works already for high-bandwidth streams. More on this below.
For simplicity, in the branch I tore out the existing simple tunnel transport and replaced it with the QUIC layer. Therefore it's not backward compatible: a non–Turbo Tunnel client won't be able to talk to a Turbo Tunnel server. Of course it wouldn't take much, at the protocol level, to restore backward compatibility: have the Turbo Tunnel clients set a special request header or use a reserved `/quic` URL path or something. But I didn't do that, because the switch to Turbo Tunnel really required a rearchitecting of the code, basically turning the logic inside-out. Before, the server received HTTP requests, matched them to an upstream TCP connection using the session ID, and fed the entire request body to the upstream. Now, the server deserializes packets from request bodies and feeds them into a QUIC engine, not immediately taking any further action. The QUIC engine then calls back into the application with "new session" and "new stream" events, after it has received enough packets to make those events happen. I actually like the Turbo Tunnel–based architecture better: you have your traditional top-level listener (a QUIC listener) and an accept loop; then alongside that you have an HTTP server that just does packet I/O on behalf of the QUIC engine. Arguably it's the traditional meek architecture that's inside-out. At any rate, combining both techniques in a single program would require factoring out a common interface from both.
The key to implementing a Turbo Tunnel–like design in a protocol is building an interface between discrete packets and whatever your obfuscation transport happens to be: a `PacketConn`. On the client side, this is done by `PollingPacketConn`, which puts to-be-written packets in a queue, from where they are bundled up and sent in HTTP requests by a group of polling goroutines. Incoming packets are unbundled from HTTP responses and then placed in another queue from which the QUIC engine can handle them at its leisure. On the server side, the compatibility interface is `QueuePacketConn`, which is similar in that it puts incoming and outgoing packets in queues, but simpler because it doesn't have to do its own polling. The server side combines `QueuePacketConn` with `ClientMap`, a data structure for persisting sessions across logically separate HTTP request–response pairs. Connection migration, essentially. While connection migration is nice to have for obfs4-like protocols, it's a necessity for meek-like protocols. That's because each "connection" is a single HTTP request–response, and there needs to be something to link a sequence of them together.
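To make the interface concrete, here is a minimal sketch (not the code from the branch) of a queue-backed `net.PacketConn` in the spirit of `QueuePacketConn`. The names (`queuePacketConn`, `stringAddr`, `QueueIncoming`) are illustrative; a real implementation would also need to handle Close, deadlines, and per-client queues:

```go
package sketch

import (
	"net"
	"time"
)

// stringAddr is a trivial net.Addr wrapping a client/session identifier.
type stringAddr string

func (a stringAddr) Network() string { return "session" }
func (a stringAddr) String() string  { return string(a) }

type packet struct {
	p    []byte
	addr net.Addr
}

// queuePacketConn satisfies net.PacketConn using in-memory queues instead of
// a socket. The QUIC engine reads and writes packets through it; the HTTP
// code shuttles the queued packets inside request and response bodies.
type queuePacketConn struct {
	recv  chan packet // packets unbundled from HTTP messages, waiting for the engine
	send  chan packet // packets produced by the engine, waiting to be bundled and sent
	local net.Addr
}

// newQueuePacketConn creates a connection, e.g. newQueuePacketConn(stringAddr("server")).
func newQueuePacketConn(local net.Addr) *queuePacketConn {
	return &queuePacketConn{
		recv:  make(chan packet, 64),
		send:  make(chan packet, 64),
		local: local,
	}
}

// QueueIncoming is called by the HTTP side to hand a received packet to the engine.
func (c *queuePacketConn) QueueIncoming(p []byte, addr net.Addr) {
	buf := make([]byte, len(p))
	copy(buf, p)
	c.recv <- packet{buf, addr}
}

// ReadFrom delivers queued incoming packets to the engine.
func (c *queuePacketConn) ReadFrom(p []byte) (int, net.Addr, error) {
	pkt := <-c.recv
	return copy(p, pkt.p), pkt.addr, nil
}

// WriteTo queues an outgoing packet for the HTTP side to pick up.
func (c *queuePacketConn) WriteTo(p []byte, addr net.Addr) (int, error) {
	buf := make([]byte, len(p))
	copy(buf, p)
	c.send <- packet{buf, addr}
	return len(p), nil
}

func (c *queuePacketConn) Close() error                       { return nil }
func (c *queuePacketConn) LocalAddr() net.Addr                { return c.local }
func (c *queuePacketConn) SetDeadline(t time.Time) error      { return nil }
func (c *queuePacketConn) SetReadDeadline(t time.Time) error  { return nil }
func (c *queuePacketConn) SetWriteDeadline(t time.Time) error { return nil }
```

quic-go's listen and dial functions accept any `net.PacketConn`, so a connection like this slots in where a UDP socket normally would; the client's polling variant additionally runs goroutines that drain the send queue into HTTP request bodies and refill the receive queue from response bodies.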
Incidentally, it's conceivable that we could eliminate the need for polling in HTTP-tunnelled protocols using Server Push, a new feature in HTTP/2 that allows the server to send data without the client asking first. Unfortunately, the Go `http2` client does not support Server Push, so at this point it would not be easy to implement.
QUIC uses TLS for its own server authentication and encryption—this TLS is entirely separate from the TLS used by the outer domain-fronted HTTPS layer. In my past prototypes here and here, to avoid dealing with the TLS complication, I just turned off certificate verification. Here, I wanted to do it right. So on the server you have to generate a TLS certificate for use by the QUIC layer. (It can be a self-signed certificate and you don't have to get it signed by a CA. Again, the QUIC TLS layer has nothing to do with the HTTP TLS layer.) If you are running from torrc, for example, it will look like this:
```
ServerTransportPlugin meek exec /usr/local/bin/meek-server --acme-hostnames=meek.example.com --quic-tls-cert=quic.crt --quic-tls-key=quic.key
```
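The certificate can be generated however you like, because nothing ever checks it against a CA; the client will pin the public key directly. As an illustration only (assuming the usual PEM-encoded certificate and key files, with filenames matching the line above), a small Go program along these lines would do:

```go
// gencert.go: write a self-signed certificate and key for the QUIC layer.
// Illustrative only; any tool that produces a certificate/key pair works.
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"log"
	"math/big"
	"os"
	"time"
)

func main() {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		log.Fatal(err)
	}
	tmpl := x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "meek QUIC layer"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().AddDate(10, 0, 0),
	}
	// Self-signed: the template is its own parent.
	der, err := x509.CreateCertificate(rand.Reader, &tmpl, &tmpl, &key.PublicKey, key)
	if err != nil {
		log.Fatal(err)
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		log.Fatal(err)
	}
	writePEM("quic.crt", "CERTIFICATE", der)
	writePEM("quic.key", "EC PRIVATE KEY", keyDER)
}

func writePEM(filename, blockType string, der []byte) {
	f, err := os.Create(filename)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pem.Encode(f, &pem.Block{Type: blockType, Bytes: der}); err != nil {
		log.Fatal(err)
	}
}
```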
Then, on the client, you provide a hash of the server's public key (as in HPKP). I did it this way because it allows you to distribute public keys out-of-band and not have to rely on CAs or a PKI. (Only in the inner QUIC layer. The HTTPS layer still does certificate verification like normal.) In the client torrc, it looks like this:
```
Bridge meek 0.0.1.0:1 url=https://meek.example.com/ quic-tls-pubkey=JWF4kDsnrJxn0pSTXwKeYR0jY8rtd/jdT9FZkN6ycvc=
```
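Pinning like this is straightforward with crypto/tls. Here is a sketch of what the client-side check could look like, assuming the pin is, as in HPKP, a base64-encoded SHA-256 digest of the certificate's SubjectPublicKeyInfo (`pinnedTLSConfig` is an illustrative name, not the client's actual code):

```go
package sketch

import (
	"crypto/sha256"
	"crypto/tls"
	"crypto/x509"
	"encoding/base64"
	"errors"
)

// pinnedTLSConfig returns a tls.Config for the inner QUIC layer that accepts
// any certificate whose public key matches the expected pin, instead of
// consulting a CA.
func pinnedTLSConfig(pin string) *tls.Config {
	return &tls.Config{
		// Skip ordinary CA/hostname verification for the inner layer...
		InsecureSkipVerify: true,
		// ...and instead compare the leaf certificate's SPKI hash to the pin.
		VerifyPeerCertificate: func(rawCerts [][]byte, _ [][]*x509.Certificate) error {
			if len(rawCerts) == 0 {
				return errors.New("no certificate presented")
			}
			cert, err := x509.ParseCertificate(rawCerts[0])
			if err != nil {
				return err
			}
			sum := sha256.Sum256(cert.RawSubjectPublicKeyInfo)
			if base64.StdEncoding.EncodeToString(sum[:]) != pin {
				return errors.New("public key does not match pinned value")
			}
			return nil
		},
	}
}
```

Computing the same digest over the certificate in quic.crt should give the value to put in the `quic-tls-pubkey=` bridge line.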
The nice thing about an inner layer of QUIC providing its own crypto features is that you can send plaintext through the obfuscation tunnel, and the CDN middlebox won't be able to read it. It means you're not limited to carrying protocols like Tor that are themselves encrypted and authenticated.
The pluggable transports spec supports a feature called USERADDR that allows the PT server to inform the upstream application of the IP address of the connecting client. This is where Tor per-country bridge metrics come from, for example. The Tor bridge does a geolocation of the USERADDR IP address in order to figure out where clients are connecting from. The meek server has support for USERADDR, but I had to remove it in the Turbo Tunnel implementation because it would be complicated to do. The web server part of the server knows client IP addresses, but that's not the part that needs to provide the USERADDR. It only feeds packets into the QUIC engine, which some time later decides that a QUIC connection has started. That's the point at which we need to know the client IP address, but by then it's been lost. (It's not even a well-defined concept: there are several packets making up the QUIC handshake, and conceivably they could all have come in HTTP requests from different IP addresses—all that matters is their QUIC connection ID.) It may be possible to hack in at least heuristic USERADDR support, especially if the packet library gives a little visibility into the packet-level metadata (easier with kcp-go than with quic-go), but at this point I decided it wasn't worth the trouble.
Now about performance. I was disappointed after running a few performance tests. The turbotunnel branch was almost twice as slow as mainline meek, despite meek's slow, one-direction-at-a-time transport protocol. Here are sample runs of simultaneous upload and download of 10 MB. There's no Tor involved here, just a client–server connection through the listed tunnel.
| protocol | time |
| --- | --- |
| direct QUIC UDP | 3.7 s |
| TCP-encapsulated QUIC | 10.6 s |
| traditional meek | 23.3 s |
| meek with encapsulated QUIC | 34.9 s |
I investigated the cause, and as best I can tell, it's due to QUIC congestion control. Running the programs with `QUIC_GO_LOG_LEVEL=DEBUG` turns up many messages of the form `Congestion limited: bytes in flight 50430, window 49299`, and looking at the packet trace, there are clearly places where the client and server have data to send (and the HTTP channel has capacity to send it), but they are holding back. In short, it looks like it's the problem that @ewust anticipated here ("This is also complicated if the timing/packet size layer tries to play with timings, and your reliability layer has some constraints on retransmissions/ACK timings or tries to do things with RTT estimation").
Of course bandwidth isn't the whole story, and it's possible that the Turbo Tunnel code has lower latency because it enables the client to send at any time. But it's disappointing to have something "turbo" that's slower than what came before, no matter the cause, especially as meek already had most of the benefits that Turbo Tunnel is supposed to provide, performance being the only dimension liable to improve. I checked and it looks like quic-go doesn't provide knobs to control how congestion control works. One option is to try another one of the candidate protocols, such as KCP, in the inner layer—converting from one to another is not too difficult.
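Because everything above the packet interface sees only a `net.PacketConn`, trying KCP mostly means swapping the listen and dial calls. A sketch using kcp-go (the function names are mine; the nil/0/0 arguments disable kcp-go's optional encryption and forward error correction):

```go
package sketch

import (
	"net"

	kcp "github.com/xtaci/kcp-go/v5"
)

// listenKCP is the server-side counterpart of quic.Listen over the same
// queue-backed PacketConn. With a nil block cipher and 0/0 shards, KCP
// provides only reliability and congestion control.
func listenKCP(pconn net.PacketConn) (*kcp.Listener, error) {
	return kcp.ServeConn(nil, 0, 0, pconn)
}

// dialKCP is the client-side counterpart of quic.Dial. The remote address is
// only a label here, since real addressing happens in the HTTP layer.
func dialKCP(pconn net.PacketConn, remote net.Addr) (*kcp.UDPSession, error) {
	return kcp.NewConn2(remote, nil, 0, 0, pconn)
}
```

One caveat: used this way, KCP gives reliability but not the confidentiality and server authentication that the inner QUIC TLS layer provides, so those would have to come from somewhere else.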
Here are graphs showing the differences in bandwidth and TCP segment size. Above the axis is upload; below is download. Source code and data for the graphs are at meek-turbotunnel-test.zip.
Zooming in, we can see the characteristic ping-pong traffic pattern of meek. The client waits for a response to be received, then immediately sends off another request. (The traffic trace was taken at the client.)
In the Turbo Tunnel case, the client is freer in when it may send, but the bursts it sends are smaller, because they are limited by QUIC congestion control.