Piecing together SOCKS5 Fragmentation

Joseph Dye, Software Engineer / Protocol Nerd

09.05.2025 | 10 minutes

In a different blog we discuss the SOCKS5 protocol in depth and touch upon the UDP fragmentation feature, an ambiguous and confusing addition to the protocol. This blog is a short exploration of where that feature came from and its reasons for existing. I'd suggest you read that blog first. If you can't be bothered, read the recap below.

Recap

Layout of the SOCKS5 UDP request header (field sizes in bytes)
RESERVED	FRAGMENT	ADDRESS TYPE	ADDRESS	PORT
2	1	1	Variable	2

When proxying UDP traffic, the header above is prepended to each datagram to indicate its target server. It contains a FRAGMENT field, a single byte ranging from 0 to 255.

0x00 means the packet is not part of a fragmented sequence and can be forwarded immediately. The RFC describes these packets as "standalone".
0x01 to 0x7F (1 to 127) means the packet is part of a fragmented sequence, and the number is the fragment's position within that sequence.
0x80 to 0xFF (128 to 255) means the fragmented sequence has ended and all the buffered fragments can be sent on their way. In the RFC's words, the high-order bit marks the "end-of-fragment sequence".
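To make the layout concrete, here is a minimal Rust sketch that parses the header and classifies the FRAGMENT byte. The names (`parse`, `classify`, `Frag`, `Target`, `UdpHeader`) are hypothetical, and treating the low seven bits of an end-of-sequence byte as its position is an assumption: the RFC only says the high-order bit marks the end of the sequence. The address types (0x01 IPv4, 0x03 length-prefixed domain name, 0x04 IPv6) come from RFC 1928.

```rust
use std::convert::TryInto;
use std::net::{Ipv4Addr, Ipv6Addr};

/// Meaning of the FRAGMENT byte, following the three ranges above.
#[derive(Debug, PartialEq)]
enum Frag {
    Standalone,                     // 0x00: forward immediately
    Part { position: u8 },          // 0x01..=0x7F: position in the sequence
    EndOfSequence { position: u8 }, // 0x80..=0xFF: high bit marks the end
}

/// Destination named by ADDRESS TYPE (0x01 IPv4, 0x03 domain, 0x04 IPv6).
#[derive(Debug, PartialEq)]
enum Target {
    V4(Ipv4Addr),
    Domain(String),
    V6(Ipv6Addr),
}

#[derive(Debug, PartialEq)]
struct UdpHeader<'a> {
    frag: Frag,
    target: Target,
    port: u16,
    payload: &'a [u8], // everything after the header is the datagram's data
}

fn classify(frag: u8) -> Frag {
    match frag {
        0x00 => Frag::Standalone,
        0x01..=0x7F => Frag::Part { position: frag },
        // Assumption: the low seven bits still carry the fragment's position.
        _ => Frag::EndOfSequence { position: frag & 0x7F },
    }
}

/// Returns None for truncated input, a non-zero RESERVED field or an unknown ADDRESS TYPE.
fn parse<'a>(buf: &'a [u8]) -> Option<UdpHeader<'a>> {
    if buf.len() < 4 || buf[0] != 0 || buf[1] != 0 {
        return None; // RESERVED is two bytes and must be 0x0000
    }
    let frag = classify(buf[2]);
    let (target, rest) = match buf[3] {
        0x01 => {
            let octets: [u8; 4] = buf.get(4..8)?.try_into().ok()?;
            (Target::V4(Ipv4Addr::from(octets)), &buf[8..])
        }
        0x03 => {
            // Domain names are prefixed with a one-byte length.
            let len = *buf.get(4)? as usize;
            let name = buf.get(5..5 + len)?;
            let name = String::from_utf8(name.to_vec()).ok()?;
            (Target::Domain(name), &buf[5 + len..])
        }
        0x04 => {
            let octets: [u8; 16] = buf.get(4..20)?.try_into().ok()?;
            (Target::V6(Ipv6Addr::from(octets)), &buf[20..])
        }
        _ => return None,
    };
    // PORT is two bytes in network (big-endian) order.
    let port = u16::from_be_bytes(rest.get(0..2)?.try_into().ok()?);
    Some(UdpHeader { frag, target, port, payload: &rest[2..] })
}
```

Returning `None` for anything malformed is a design choice. A server that doesn't implement fragmentation would additionally drop anything whose `frag` isn't `Frag::Standalone`, as the RFC requires.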

Any fragment we receive must be placed into a buffer (the RFC calls it a reassembly queue) and a timer of at least five seconds must be started. Additional fragments must be added to this buffer, and if the timer expires before we receive an end-of-sequence packet, we must drop every fragment in the buffer. The SOCKS5 RFC explicitly disallows re-ordering of fragments: any out-of-order fragment causes all fragments in that sequence to be dropped.
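These rules can be sketched as a small reassembly queue. This is a hypothetical design, not code from any library: it uses the five-second minimum as its timeout, follows the RFC's wording by abandoning the queue when a FRAG value lower than the highest seen so far arrives (discarding that fragment too, which is an assumption), and assumes reassembly means concatenating payloads in order, which the RFC doesn't spell out. The current time is passed in so the timer can be exercised without sleeping.

```rust
use std::time::{Duration, Instant};

/// RFC 1928: the reassembly timer MUST be no less than 5 seconds.
const REASSEMBLY_TIMEOUT: Duration = Duration::from_secs(5);

#[derive(Debug, PartialEq)]
enum Outcome {
    Buffered,          // fragment queued, waiting for the rest of the sequence
    Complete(Vec<u8>), // end of sequence reached: the reassembled payload
    Dropped,           // out-of-order fragment: the whole sequence was abandoned
}

/// Hypothetical reassembly queue for a single sequence.
struct ReassemblyQueue {
    fragments: Vec<Vec<u8>>,
    highest: u8,              // highest position accepted so far
    started: Option<Instant>, // when the first fragment of this sequence arrived
}

impl ReassemblyQueue {
    fn new() -> Self {
        ReassemblyQueue { fragments: Vec::new(), highest: 0, started: None }
    }

    fn reset(&mut self) {
        self.fragments.clear();
        self.highest = 0;
        self.started = None;
    }

    /// `frag` is a non-zero FRAGMENT byte; `now` is passed in so the timer is testable.
    fn push(&mut self, frag: u8, payload: &[u8], now: Instant) -> Outcome {
        // An expired timer abandons whatever was buffered; this fragment starts afresh.
        if let Some(start) = self.started {
            if now.duration_since(start) > REASSEMBLY_TIMEOUT {
                self.reset();
            }
        }
        let position = frag & 0x7F;
        let is_end = (frag & 0x80) != 0;
        // RFC 1928: a FRAG value lower than the highest seen abandons the queue.
        // Assumption: the offending fragment is discarded as well.
        if position < self.highest {
            self.reset();
            return Outcome::Dropped;
        }
        if self.started.is_none() {
            self.started = Some(now);
        }
        self.highest = position;
        self.fragments.push(payload.to_vec());
        if is_end {
            // Assumption: reassembly means concatenating payloads in order.
            let whole = self.fragments.concat();
            self.reset();
            Outcome::Complete(whole)
        } else {
            Outcome::Buffered
        }
    }
}
```

Note that this keeps one queue per sequence and says nothing about which target the reassembled datagram goes to; that is exactly the gap the RFC leaves open.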

If we aren't forced to drop the fragments, the RFC doesn't specify what should be done with them, nor what happens if different fragments have different targets. Do we create a fragment buffer for each target?

The SOCKS5 RFC even recommends against the fragmentation feature and makes implementation of the feature entirely optional.

It is recommended that fragmentation be avoided by applications wherever possible.

Implementation of fragmentation is optional; an implementation that does not support fragmentation MUST drop any datagram whose FRAG field is other than X'00'.

Rabbit hole

I am left wondering why this feature was ever introduced. Not only is it poorly defined, it seems to repeat work that is handled elsewhere (IP already fragments and reassembles datagrams), and it is entirely optional, which means clients would have to know the inner workings of the servers they talk to before they could even use it. If a server didn't support it, all of a client's fragmented traffic could be silently dropped with no way for the client to know.

Reviewing existing SOCKS5 libraries, I found no trace of the fragmentation feature. The Rust crate 'fast-socks5', which is used in production by AnyIP, drops every packet that attempts to use fragmentation. In the Python library 'pysocks', fragmentation is similarly unused. The same goes for 'ProxiFyre', a C++ tool for redirecting packets on Windows through a SOCKS5 proxy.

// fast-socks5's handling of fragmented UDP packets: silently discard them
if frag != 0 {
    debug!("Discard UDP frag packets silently.");
    return Ok(());
}

Since I couldn't find an example of this feature actually being implemented in an open-source library, I decided to test the services of some of the biggest proxy providers, who might have in-house protocol implementations.

After buying SOCKS5-capable IPs and attempting to use fragmentation with them, I found that Bright Data, Oxylabs and other large players do not support this feature. At Ping Proxies, we've implemented fragmentation on the off chance that someone, somewhere requires it for their use case, but we consider that highly unlikely and don't expect the feature to see any use.

Going back in time

With nobody seemingly using fragmentation, I looked through the mailing lists for discussion of the feature. After a considerable amount of digging, I found nothing. My archive doesn't go back far enough: the oldest email is from 1998.

The first draft of SOCKS5 did not contain fragmentation, but the second draft, from March 1995, did. That draft contained, nearly word for word, what would end up in the final RFC (RFC 1928, published in March 1996), so by the time of the earliest emails I have access to, there was no reason left to discuss it.

The emails did mention the "reference implementation" of the protocol, so I went looking for that. Maybe I could glean something from the codebase that implemented the feature...

The reference implementation would have been as lost to time as the mailing list if not for a single person who archived it (and shared an understandably salty open letter about its maintainers), but looking there didn't help.

Even with the source code in hand, and with the file handling UDP last modified in 2000, I was unable to learn anything about fragmentation. Why? Because the reference implementation didn't implement it either. The line where it would be handled carries the comment "unused".

static IORETTYPE lsProtoSend(S5IOHandle fd, lsProxyInfo *pri, const IOPTRTYPE msg, IOLENTYPE len, int flags, const ss *dest, int dstlen) {
    // ...
    SETVERS(header, 0);
    SETRESV(header, 0);                                 /* unused - reserved */
    SETFRAG(header, 0);                                 /* unused - fragment */
    // ...
}
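For comparison, here is roughly what always writing a zero FRAG byte looks like from the sending side. It's a minimal Rust sketch, not a port of the reference implementation: `encode_standalone` is a hypothetical name, and only IP-literal targets (ADDRESS TYPE 0x01 and 0x04) are handled, with domain names (0x03) left out for brevity.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical helper: prepend a SOCKS5 UDP request header to `payload`.
/// Like the reference implementation, it always sends FRAG = 0 (standalone).
fn encode_standalone(target: SocketAddr, payload: &[u8]) -> Vec<u8> {
    let mut out = vec![0x00, 0x00, 0x00]; // RESERVED (2 bytes) + FRAGMENT = 0
    match target.ip() {
        IpAddr::V4(ip) => {
            out.push(0x01); // ADDRESS TYPE: IPv4
            out.extend_from_slice(&ip.octets());
        }
        IpAddr::V6(ip) => {
            out.push(0x04); // ADDRESS TYPE: IPv6
            out.extend_from_slice(&ip.octets());
        }
    }
    out.extend_from_slice(&target.port().to_be_bytes()); // PORT, network order
    out.extend_from_slice(payload);
    out
}
```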

Reaching out

At this point I had no other options left. All modern libraries ignore the feature. All proxy services ignore the feature. Even the reference implementation ignores the feature. My library, which does implement it, could be the only library to do so. Ever. And I don't even know if my implementation is correct, because there are so many ambiguities.

By now my curiosity was uncontrollable. There's no need for me to understand this feature. There's no value for my company to extract. There's no learning material to use in my upcoming blogs. This was just a case of me not liking not knowing. So... I emailed one of the RFC's authors: Marcus Leech.

His LinkedIn listed an email address that is no longer valid, so I emailed his workplace and asked to be put in contact with him. I described my situation. I acknowledged that it was a stupid question. I asked very nicely...

[...] Do you possess an archive of the mailing lists, or do you perhaps recall some of the reasoning given for the introduction of fragments to the specification?

I didn't think I would receive a reply; this was a question about a small protocol from 30 years ago, so why bother? But... I did get a reply. Quickly, too: within three hours of sending the email.

Honestly, I don't remember. It was such a long long time ago, and very few installations that I'm aware of even use SOCKS5 anymore.

Unfortunately, it was what you might expect. I barely remember code I wrote a month ago, let alone the reasoning behind it. And so the reasoning behind this feature is lost to time. If any of the three people who read this blog can provide some insight, please reach out. I'd love to know.

Conclusion

In researching this, I've come to the conclusion that the internet is not forever. Digitization has made information more accessible than ever and yet, in doing so, we've made it more ephemeral than ever. Now that digital media is often the only record of something, it's a certainty that some (or, more likely, most) information will be lost to time in greater numbers and at faster rates than ever before.

A clay tablet complaining about shitty copper survived thousands of years, but discussion of a common protocol for proxying internet traffic didn't survive twenty-five...