SOCKS Protocol Deep Dive

Joseph Dye, Software Engineer / Protocol Nerd

09.05.2025 | 15 minutes

Introduction to SOCKS

As part of our in-house proxy solution - Ancelotti - we implemented our own Rust-based sans-IO SOCKS5 library. In the near future we will be releasing a three part blog series discussing how to implement the entirety of the SOCKS5 protocol using different methods for representing state machines (using enums, using typestates, using coroutines).

Before doing that however, we felt the need to talk about what SOCKS5 is, why you should use it, why you shouldn't use it and what the future of proxying with SOCKS5 might look like.

The need for clear explanations of SOCKS5 becomes even more apparent when considering the quality of information currently available online. Many proxy service providers publish articles about SOCKS5 that often recycle information from previous publications without sufficient verification or understanding of the topics they're writing about.

These articles are frequently created with SEO in mind rather than educational value and sometimes encourage unnecessary upgrades and services. The content creation is often delegated to writers who lack specialized technical knowledge, making it difficult for them to evaluate or correct the source material they're referencing.

This leads to a mess of online blogs which often misrepresent technical details or are flat-out false. Let's look at some of the claims made:

SOCKS5 is faster than HTTP. Wrong. The SOCKS5 handshake requires two round trips, or three when authentication is used, whereas HTTP CONNECT requires just one. HTTP will nearly always be faster to set up.
SOCKS5 is more secure than HTTP. Wrong. Neither SOCKS5 nor HTTP proxying encrypts your traffic by itself. Security, via TLS, happens at a different layer from these protocols and is nearly entirely agnostic to the proxying protocol.
SOCKS5 lets you tunnel at a lower level than HTTP. Wrong. Once the handshakes are complete, SOCKS5 and HTTP will both provide you with a TCP connection that you can read from and write to. Anything that you can do with a tunnel established by SOCKS5, you can do with a tunnel established by HTTP.
SOCKS5 can do more than HTTP. Correct. SOCKS5 can let you proxy UDP traffic (for things like video streaming and gaming) and do things like FTP via a proxy. HTTP is not able to do either of these things.

Based on these points, it should be quite clear that HTTP has the edge over SOCKS in every way when it comes to tunneling TCP traffic. Consequently, as a general rule of thumb, use HTTP proxies over SOCKS proxies unless you have a legitimate use case (you'll know when you do) or your tools require one or the other.

Now, if you are not interested in what is, in essence, a walkthrough of the SOCKS5 RFC, then the paragraphs above are all you need to know.

The Handshake

Proxying can be broken up into two stages: the handshake and the tunnel. During the handshake information is exchanged such as the desired target and the authentication details. The handshake is, essentially, the client and the server making sure they are on the same page before the real work begins. The tunnel is where this "real work" happens and data is funneled from the client to the target, and from the target to the client.

The most common use of a proxy is for establishing TCP tunnels and there are two ways to do this.

Firstly, by sending an HTTP/1.1 request with the CONNECT method. The proxy server reads the host from this request, establishes a connection to that host, returns 200 OK to the client and then blindly funnels any traffic from the client to the target and from the target to the client.
Secondly, by using SOCKS5, which consists of two request/response pairs and an optional authentication request/response pair. Once these requests have been sent and the responses have been received, the tunnel functions identically to the tunnel formed by an HTTP CONNECT request.
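For comparison with the SOCKS5 messages that follow, here is a minimal sketch of the HTTP side. The function names `http_connect_request` and `tunnel_established` are made up for illustration, and the check only looks for the `200` status mentioned above rather than parsing the full response.

```rust
/// Build the single request an HTTP proxy needs to open a tunnel.
/// (Hypothetical helper, not part of any library.)
fn http_connect_request(host: &str, port: u16) -> String {
    format!("CONNECT {host}:{port} HTTP/1.1\r\nHost: {host}:{port}\r\n\r\n")
}

/// Check whether the status line of the proxy's response reports 200.
/// Assumption: we only inspect the first two whitespace-separated tokens.
fn tunnel_established(response_head: &str) -> bool {
    let mut parts = response_head.split_whitespace();
    matches!(
        (parts.next(), parts.next()),
        (Some(version), Some("200")) if version.starts_with("HTTP/1.")
    )
}
```

One request out, one response back, and everything after the blank line belongs to the tunnel.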

Let's take these request/response pairs and examine them in detail.

Greeting/Selection

layout for SOCKS5 greeting message
VERSION	NUMBER OF METHODS	METHODS
0x05	1 byte	1 to 255 bytes

layout for SOCKS5 selection message
VERSION	METHOD
0x05	1 byte

The purpose of this first request/response pair is to establish what, if any, authentication method is going to be used. The client sends a list of authentication methods it is able to use and the server responds with its preferred method from that list.

Both messages begin with a reserved version byte (0x05) before providing the rest of their data. The greeting provides the list of authentication methods by telling us how many items it is going to send us, and then sending us that many. This layout for variable length data occurs multiple times in the other SOCKS5 messages.
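As a rough sketch of the two layouts above, the greeting can be built and the selection parsed like this. The names `build_greeting` and `parse_selection` are hypothetical and are not taken from our library.

```rust
/// Build a SOCKS5 greeting: VERSION, NUMBER OF METHODS, METHODS.
/// Returns None if the list is empty or longer than 255 entries,
/// because the count has to fit in a single byte.
fn build_greeting(methods: &[u8]) -> Option<Vec<u8>> {
    if methods.is_empty() || methods.len() > 255 {
        return None;
    }
    let mut out = Vec::with_capacity(2 + methods.len());
    out.push(0x05); // version
    out.push(methods.len() as u8); // how many methods follow
    out.extend_from_slice(methods); // one byte per method
    Some(out)
}

/// Parse the two-byte selection message and return the chosen method.
/// Returns None if the length or the version byte is wrong.
fn parse_selection(buf: &[u8]) -> Option<u8> {
    match buf {
        [0x05, method] => Some(*method),
        _ => None,
    }
}
```

A client offering "no auth" and "user/pass" would call `build_greeting(&[0x00, 0x02])` and then feed the server's two-byte answer to `parse_selection`.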

The authentication methods themselves are each one byte long and defined in the spec:

SOCKS5 auth method options
BYTE	MEANING
0x00	no authentication required
0x01	GSSAPI
0x02	username and password
0x03 to 0x7F	IANA assigned
0x80 to 0xFE	reserved for private use
0xFF	no acceptable method

Of these methods, only "no auth" (0x00) and "user/pass" auth (0x02) are regularly used today.
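One way to model the method byte is an enum that covers every possible value, so nothing falls through the cracks. This is a sketch only; the `AuthMethod` type and its variant names are illustrative, and the IANA range is taken to start at 0x03 since the three values below it are already defined.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum AuthMethod {
    NoAuth,
    Gssapi,
    UserPass,
    IanaAssigned(u8),
    Private(u8),
    NoAcceptable,
}

impl From<u8> for AuthMethod {
    fn from(byte: u8) -> Self {
        // The ranges cover all 256 values, so no catch-all arm is needed.
        match byte {
            0x00 => AuthMethod::NoAuth,
            0x01 => AuthMethod::Gssapi,
            0x02 => AuthMethod::UserPass,
            0x03..=0x7F => AuthMethod::IanaAssigned(byte),
            0x80..=0xFE => AuthMethod::Private(byte),
            0xFF => AuthMethod::NoAcceptable,
        }
    }
}
```

Because the match is exhaustive, the compiler will complain if a range is accidentally left out.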

Auth Request/Response

layout for SOCKS5 user/pass auth request
VERSION	USERNAME LENGTH	USERNAME	PASSWORD LENGTH	PASSWORD
0x01	1 byte	1 to 255 bytes	1 byte	1 to 255 bytes

layout for SOCKS5 user/pass auth response
VERSION	STATUS
0x01	1 byte

If "no auth" is chosen as the method, this request/response can be skipped. If however "user/pass" authentication was chosen then the steps defined in RFC 1929 have to be followed. That means, the two messages above have to be sent before proceeding to the next message pair. Note that in both messages the version is 0x01, not 0x05 like you might expect.

The request provides the username and password of the client, represented in the variable length list format we saw in the previous section. The response provides a status code to the user which will be a single byte: 0x00 if authentication was successful and anything else if it failed.
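The auth exchange can be sketched the same way. `build_auth_request` and `auth_succeeded` are hypothetical names, and the code assumes the username and password are sent as raw bytes of the given strings.

```rust
/// Build a user/pass auth request: VERSION (0x01), then two
/// length-prefixed fields. Returns None if either field is empty
/// or longer than 255 bytes.
fn build_auth_request(username: &str, password: &str) -> Option<Vec<u8>> {
    let (user, pass) = (username.as_bytes(), password.as_bytes());
    if user.is_empty() || user.len() > 255 || pass.is_empty() || pass.len() > 255 {
        return None;
    }
    let mut out = Vec::with_capacity(3 + user.len() + pass.len());
    out.push(0x01); // auth sub-negotiation version, not 0x05
    out.push(user.len() as u8);
    out.extend_from_slice(user);
    out.push(pass.len() as u8);
    out.extend_from_slice(pass);
    Some(out)
}

/// Parse the two-byte auth response.
/// Some(true) on status 0x00, Some(false) on any other status,
/// None if the message is malformed.
fn auth_succeeded(response: &[u8]) -> Option<bool> {
    match response {
        [0x01, status] => Some(*status == 0x00),
        _ => None,
    }
}
```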

Proxy Request/Response

layout for SOCKS5 proxy request
VERSION	COMMAND	RESERVED	ADDRESS TYPE	ADDRESS	PORT
0x05	1 byte	0x00	1 byte	Variable	2 bytes

layout for SOCKS5 proxy response
VERSION	REPLY	RESERVED	ADDRESS TYPE	ADDRESS	PORT
0x05	1 byte	0x00	1 byte	Variable	2 bytes

In the final request/response pair the client sends a message containing a command and address. The command can have one of three values: CONNECT, BIND or ASSOCIATE (the RFC calls the last one UDP ASSOCIATE).

CONNECT is used for establishing a TCP tunnel. It's just an inefficient version of HTTP CONNECT. The address provided in the address field is that of the target to be proxied to.
BIND is used to enable target-to-client connections. A socket is opened on the server and all TCP traffic to this opened socket will be proxied to the client. BIND is mainly for things like FTP where two connections are required. One, formed by the client to the server (CONNECT), is used for controlling the FTP process (picking the files to download and sending credentials). The other, formed by the server to the client (BIND), is used for transferring the requested data.
ASSOCIATE is used to enable UDP proxying. Like with BIND, a socket is opened on the server. Unlike with BIND, this socket proxies UDP traffic. The address in the address field is where the client is going to be sending UDP datagrams from. If it is left as 0.0.0.0, we assume that it is the same IP they performed the handshake with. ASSOCIATE is used for streaming video, VoIP (voice over IP) or gaming.

With BIND and ASSOCIATE the opened sockets will remain open as long as the original connection remains open. They function similarly, differing only in the type of traffic they carry.

SOCKS5 command options
BYTE	MEANING
0x01	CONNECT
0x02	BIND
0x03	ASSOCIATE
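In code, the command byte maps naturally onto a small enum. This is an illustrative sketch; `Command`, `from_byte` and `to_byte` are names invented for this post.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Command {
    Connect,
    Bind,
    Associate,
}

impl Command {
    /// Decode the command byte of a proxy request.
    /// Any value outside the table above yields None.
    fn from_byte(byte: u8) -> Option<Command> {
        match byte {
            0x01 => Some(Command::Connect),
            0x02 => Some(Command::Bind),
            0x03 => Some(Command::Associate),
            _ => None,
        }
    }

    /// Encode the command for sending on the wire.
    fn to_byte(self) -> u8 {
        match self {
            Command::Connect => 0x01,
            Command::Bind => 0x02,
            Command::Associate => 0x03,
        }
    }
}
```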

The commands are simple. More interesting is the introduction of the Address Type and Address fields. The type can be one of three values and is used to indicate if the address is an IPv4 address, an IPv6 address or a domain. If it is IPv4 or IPv6, the Address field will be four or sixteen bytes long respectively. If it's a domain, it will be another variable length field: one length byte followed by 1 to 255 bytes.

SOCKS5 address types
BYTE	MEANING	LENGTH
0x01	IPv4	4 bytes
0x03	DOMAIN	1 byte + (1 to 255 bytes)
0x04	IPv6	16 bytes
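Here is a sketch of how the Address Type, Address and Port fields might be serialized together. The `Address` enum and `encode_address` are hypothetical names. The port is written in network byte order (big-endian), which is what RFC 1928 specifies even though the tables above don't spell it out.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

enum Address {
    V4(Ipv4Addr),
    V6(Ipv6Addr),
    Domain(String),
}

/// Encode ADDRESS TYPE + ADDRESS + PORT.
/// Returns None if a domain is empty or longer than 255 bytes.
fn encode_address(addr: &Address, port: u16) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    match addr {
        Address::V4(ip) => {
            out.push(0x01);
            out.extend_from_slice(&ip.octets()); // 4 bytes
        }
        Address::V6(ip) => {
            out.push(0x04);
            out.extend_from_slice(&ip.octets()); // 16 bytes
        }
        Address::Domain(name) => {
            let bytes = name.as_bytes();
            if bytes.is_empty() || bytes.len() > 255 {
                return None;
            }
            out.push(0x03);
            out.push(bytes.len() as u8); // length prefix
            out.extend_from_slice(bytes);
        }
    }
    out.extend_from_slice(&port.to_be_bytes()); // big-endian port
    Some(out)
}
```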

The response contains fields for a status code and an address too. The status code can be one of a few values and tells the client whether the request succeeded or why it failed. The address in the response is used similarly to how it is in the request, with a different meaning depending on the command we received. If it was a CONNECT, the field holds the address and port the server used for its outgoing connection to the target, which the client rarely needs. For BIND and ASSOCIATE it's the socket we chose to open. This socket does not have to be the same as the socket in the request and the client must use the socket we return to them for their proxying.

SOCKS5 reply options
BYTE	MEANING
0x00	success
0x01	general server failure
0x02	connection not allowed by ruleset
0x03	network unreachable
0x04	host unreachable
0x05	connection refused
0x06	ttl expired
0x07	command not supported
0x08	address type not supported
While I have described the handshake as a series of message pairs for easier understanding, the BIND command breaks this convention since it sends two responses. The first is sent after the server binds the local socket and the second after the incoming connection from the target server is established.

In the case of an error status code or a CONNECT you must make sure to provide the address in the response even though it is redundant. Most SOCKS5 implementations expect it and will break if they do not receive it.
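To make the point about the response address concrete, here is a sketch of a client-side request builder and reply parser. Both function names are hypothetical. The parser always consumes the address and port, whatever the reply byte says, which is why a server that omits them breaks clients written this way.

```rust
/// Build a CONNECT request for a domain target.
/// Returns None if the domain is empty or longer than 255 bytes.
fn build_connect_request(domain: &str, port: u16) -> Option<Vec<u8>> {
    let name = domain.as_bytes();
    if name.is_empty() || name.len() > 255 {
        return None;
    }
    // VERSION, COMMAND = CONNECT, RESERVED, ADDRESS TYPE = DOMAIN, length
    let mut out = vec![0x05, 0x01, 0x00, 0x03, name.len() as u8];
    out.extend_from_slice(name);
    out.extend_from_slice(&port.to_be_bytes()); // big-endian port
    Some(out)
}

/// Parse a reply and return (reply byte, total message length).
/// Returns None if the buffer is malformed or does not yet hold the
/// full message, including the address and port.
fn parse_reply(buf: &[u8]) -> Option<(u8, usize)> {
    if buf.len() < 5 || buf[0] != 0x05 || buf[2] != 0x00 {
        return None;
    }
    let addr_len = match buf[3] {
        0x01 => 4,                      // IPv4
        0x04 => 16,                     // IPv6
        0x03 => 1 + buf[4] as usize,    // length byte + domain
        _ => return None,
    };
    let total = 4 + addr_len + 2; // fixed header + address + port
    if buf.len() < total {
        return None;
    }
    Some((buf[1], total))
}
```

Returning the total length lets the caller know exactly where the handshake ends and the tunnelled bytes begin.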

The Tunnel

Once the handshake is complete, we can enter the Tunnel phase. In this phase the proxy server sends data to our target and relays data from the target back to us. The specifics of how this happens are a little different depending on the chosen command.

CONNECT

The CONNECT command is essentially "fire and forget" when tunneling. Data is simply transferred bidirectionally until either end of the tunnel closes the connection; there are no special cases to handle or headers to process. It's with BIND and ASSOCIATE that things become a little more complex.

BIND

BIND is also "fire and forget", funneling data between the client and the target, typically used in conjunction with a CONNECT request.

ASSOCIATE

layout for SOCKS5 UDP fragment headers
RESERVED	FRAGMENT	ADDRESS TYPE	ADDRESS	PORT
0x0000 (2 bytes)	1 byte	1 byte	Variable	2 bytes

The RFC tells us that when tunneling UDP traffic, any packets the client sends to the socket must be wrapped in the headers above. These packets will have their headers stripped and then be sent to their requested target. Any packets other machines send to the socket must arrive without these headers. These packets will then be wrapped in the headers and sent to the client. The server will only accept and forward packets from the client whilst the initial TCP connection used to send the ASSOCIATE command is open.
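The wrapping and stripping can be sketched as below. `wrap_udp` and `unwrap_udp` are hypothetical helpers that only handle IPv4 targets to keep the example short. The layout follows RFC 1928: two reserved zero bytes, the fragment byte, the address type, the address, the big-endian port and then the payload.

```rust
use std::net::Ipv4Addr;

/// Wrap a payload in the UDP header for an IPv4 target,
/// with the fragment byte set to 0x00.
fn wrap_udp(target: Ipv4Addr, port: u16, payload: &[u8]) -> Vec<u8> {
    // RESERVED (2 bytes), FRAGMENT, ADDRESS TYPE = IPv4
    let mut out = vec![0x00, 0x00, 0x00, 0x01];
    out.extend_from_slice(&target.octets());
    out.extend_from_slice(&port.to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Strip the header from a datagram addressed to an IPv4 target.
/// Returns (fragment byte, target, port, payload), or None if the
/// datagram is too short, has non-zero reserved bytes, or uses an
/// address type this sketch does not handle.
fn unwrap_udp(datagram: &[u8]) -> Option<(u8, Ipv4Addr, u16, &[u8])> {
    if datagram.len() < 10 || datagram[0] != 0 || datagram[1] != 0 || datagram[3] != 0x01 {
        return None;
    }
    let ip = Ipv4Addr::new(datagram[4], datagram[5], datagram[6], datagram[7]);
    let port = u16::from_be_bytes([datagram[8], datagram[9]]);
    Some((datagram[2], ip, port, &datagram[10..]))
}
```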

That all sounds straightforward, but unfortunately it begins to get complicated when you examine the headers we expect on the UDP packets. These headers contain a `fragment` field which is a single byte ranging from 0 to 255.

0x00 means the packet is not part of a fragmented sequence and can be forwarded immediately. "Standalone" is how the RFC describes the packets.
0x01 to 0x7F (1 to 127) means the fragment is part of a fragmented sequence and this number is the fragment's position within that greater sequence.
0x80 to 0xFF (128 to 255) means the fragmented sequence has ended and that all the fragmented packets can be sent on their way. These packets are called "end of sequence" packets by the RFC.
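The three cases above can be expressed as a small classifier. The `Frag` enum is illustrative. For end of sequence packets it keeps the low seven bits as a position, which is an assumption on my part about how to read the byte.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Frag {
    Standalone,
    Position(u8),
    EndOfSequence(u8),
}

/// Classify the fragment byte of a UDP header.
fn classify_frag(byte: u8) -> Frag {
    match byte {
        0x00 => Frag::Standalone,
        0x01..=0x7F => Frag::Position(byte),
        // High bit set: end of sequence. Assumption: the remaining
        // seven bits still carry the fragment's position.
        _ => Frag::EndOfSequence(byte & 0x7F),
    }
}
```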

Any fragmented packet we receive must be placed into a buffer and a timer of at least five seconds must be started. Additional fragments received must be added to this buffer and if the timer expires before we receive an end of sequence packet, we must drop all fragments in the buffer. In the RFC's words:

"The reassembly queue must be reinitialized and the associated fragments abandoned whenever [...] a new datagram arrives carrying a FRAG field whose value is less than the highest FRAG value processed for this fragment sequence."

The SOCKS5 RFC explicitly disallows re-ordering of the fragments. Any out of order fragment causes all fragments in that sequence to be dropped. In the case that we aren't forced to drop all of our fragments, the RFC doesn't specify what is supposed to be done with them. However, given that the name of the buffer is the "reassembly queue" it makes sense to do our best to reassemble the fragments into larger ones where possible.
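Since the RFC leaves so much open, any implementation has to pick an interpretation. The sketch below is one such reading and not a definitive one. It assumes that positions are compared on the low seven bits, that a fragment arriving after a reset starts a new sequence, and that reassembly means concatenating payloads in arrival order. `ReassemblyQueue` and its methods are hypothetical names, and the caller passes in the current time so the logic stays free of IO.

```rust
use std::time::{Duration, Instant};

struct ReassemblyQueue {
    fragments: Vec<Vec<u8>>,
    highest_frag: u8,
    deadline: Option<Instant>,
    timeout: Duration,
}

impl ReassemblyQueue {
    fn new(timeout: Duration) -> Self {
        ReassemblyQueue { fragments: Vec::new(), highest_frag: 0, deadline: None, timeout }
    }

    /// Drop everything buffered so far.
    fn reset(&mut self) {
        self.fragments.clear();
        self.highest_frag = 0;
        self.deadline = None;
    }

    /// Feed one datagram. Returns a payload to forward, if any.
    fn push(&mut self, frag: u8, payload: &[u8], now: Instant) -> Option<Vec<u8>> {
        if frag == 0x00 {
            return Some(payload.to_vec()); // standalone: forward immediately
        }
        let expired = self.deadline.map_or(false, |d| now >= d);
        let position = frag & 0x7F;
        // Timer expiry or an out-of-order fragment abandons the sequence.
        if expired || position < self.highest_frag {
            self.reset();
        }
        if self.fragments.is_empty() {
            // The timer starts with the first buffered fragment.
            self.deadline = Some(now + self.timeout);
        }
        self.highest_frag = position;
        self.fragments.push(payload.to_vec());
        if frag & 0x80 != 0 {
            // End of sequence: hand back the concatenated payloads.
            let whole = self.fragments.concat();
            self.reset();
            return Some(whole);
        }
        None
    }
}
```

Even this small sketch had to make three judgment calls the RFC does not settle, which is rather the point.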

Thankfully, the RFC also gives us a way out: "Implementation of fragmentation is optional; an implementation that does not support fragmentation MUST drop any datagram whose FRAG field is other than X'00'."

The fragmentation aspect of the SOCKS5 protocol is ambiguous and in a world where it was not vestigial, optional and hardly used, it would be the source of many bugs and considerable developer frustration.

Fortunately, fragmentation isn't used by anyone so there is no need to think about it any further. If you ever write a client or server implementation, skip it and simply drop any datagram whose fragment byte is not 0x00.

On that underwhelming note, we've walked through the entirety of the RFC and you should have a deeper understanding of the protocol than you did before and be able to continue on to our implementation blogs (releasing soon, we promise) with little issue.

If, however, simply ignoring fragmentation isn't satisfying, you can read this blog.

SOCKS6

From 2018 through to 2021, some work was done to introduce SOCKS6, which filed down some of the rough edges we've encountered, with the main focus being reducing the number of round trips required. The key takeaways from the draft SOCKS6 RFC are:

Leveraging TCP Fast Open (TFO), which is an optional TCP extension that saves a round trip when establishing a connection. It does this by letting the client attach a cookie from an earlier connection to the first packet, together with data that would otherwise have to wait for follow-up packets.
Combining all messages of the handshake into one: providing the server with the list of acceptable authentication methods and issuing the desired command all in one request.
Allowing the sending of authentication messages optimistically - which is to say, sending the authentication messages before a response to the initial message has been received. Effectively allowing the client to assume that their desired authentication method and host will be acceptable to the server.
Providing a flag within the messages to encourage the server to open the connection to the target using TFO.

Unfortunately, SOCKS6 never took off and I think there are a few reasons why.

TCP Fast Open is essentially a dead feature, and SOCKS6 was perhaps too heavily integrated with it to survive its death.
Optimistic sending of messages already works... While not being explicitly allowed in the specification, all proxy services offered by the companies I've previously mentioned support it. Our server implementation supports it without us even meaning to; the nature of properly reading and parsing messages from a stream means that most implementations support it unintentionally.
Newer and cooler technologies exist which are moving away from TCP as a whole, so a protocol which relies upon TCP to establish tunnels is outdated. With QUIC and HTTP/3 implemented atop it there's the option to establish TCP and UDP tunnels using HTTP/3 CONNECT and CONNECT-UDP respectively. In fact, it's now possible to establish UDP proxying via HTTP/1 using the same CONNECT-UDP. We have a similar "Deep Dive" blog on HTTP proxying coming soon where we will discuss these features.
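To illustrate what optimistic sending looks like in SOCKS5 terms, here is a sketch that writes the greeting, the auth request and the CONNECT request into a single buffer before any response has been read. `optimistic_handshake` is a hypothetical helper. It assumes the server will pick user/pass auth and accept the credentials; if it doesn't, the client only finds out when it reads the responses.

```rust
/// Concatenate all three client messages of a SOCKS5 handshake so
/// they can be sent in one write. Returns None if any field is empty
/// or longer than 255 bytes.
fn optimistic_handshake(user: &str, pass: &str, domain: &str, port: u16) -> Option<Vec<u8>> {
    let (user, pass, domain) = (user.as_bytes(), pass.as_bytes(), domain.as_bytes());
    // Every variable-length field carries a one-byte length prefix.
    for field in [user, pass, domain] {
        if field.is_empty() || field.len() > 255 {
            return None;
        }
    }
    let mut out = Vec::new();
    // 1. Greeting: version 5, one method offered, user/pass (0x02).
    out.extend_from_slice(&[0x05, 0x01, 0x02]);
    // 2. Auth request, queued before the selection has arrived.
    out.push(0x01);
    out.push(user.len() as u8);
    out.extend_from_slice(user);
    out.push(pass.len() as u8);
    out.extend_from_slice(pass);
    // 3. CONNECT request with a domain address and big-endian port.
    out.extend_from_slice(&[0x05, 0x01, 0x00, 0x03, domain.len() as u8]);
    out.extend_from_slice(domain);
    out.extend_from_slice(&port.to_be_bytes());
    Some(out)
}
```

The client still has to read and validate all three responses afterwards; it just doesn't wait for each one before sending the next message.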

Conclusion

SOCKS is a versatile protocol which offers more functionality than HTTP proxying but at the expense of slower initialization times. If HTTP proxying is possible, SOCKS CONNECT should rarely be used over it. The real reason to use SOCKS is for the BIND and ASSOCIATE commands, allowing server-to-client TCP connections and UDP proxying respectively. If you have a genuine use case for SOCKS, you'll know it. Otherwise use HTTP for everything unless your client requires otherwise.

If you're interested in implementing the SOCKS5 protocol in Rust, keep an eye out for our upcoming three part blog series.