SCTP is a message-oriented, reliable transport protocol with direct support for multihoming. It runs on top of IP or UDP and supports both IPv4 and IPv6.
Like TCP, SCTP provides reliable, connection-oriented data delivery with congestion control. Unlike TCP, SCTP also provides message boundary preservation, ordered and unordered message delivery, multi-streaming, and multi-homing. Data corruption, loss, and duplication are detected using checksums and sequence numbers. A selective retransmission mechanism corrects loss or corruption of data.
This manual describes the socket API of the SCTP user-land implementation. The API is based on RFC 6458, and this document focuses on the points where it differs from the SCTP Sockets API specified there. For all aspects of the sockets API that are not mentioned in this document, please refer to RFC 6458. Questions about SCTP itself are covered by RFC 4960.
The user-land stack has been tested on FreeBSD 10.0, Ubuntu 11.10, Windows 7, Mac OS X 10.6, and Mac OS X 10.7. The current version of the user-land stack is provided on GitHub. Download the tarball and untar it in a folder of your choice. The tarball contains all the sources to build the library libusrsctp, which has to be linked to the object file of an example program. In addition, there are two applications in the folder programs that can be built and run.
In the folder
$ ./bootstrap
$ ./configure
$ make
Now, the library
libusrsctp.la has been built in the subdirectory
usrsctplib, and the example programs are ready to run from the subdirectory
If you have root privileges or are in the sudoers group, you can install the library in /usr/local/lib and copy the header file to /usr/include with the command
$ sudo make install
On Windows you need a compiler like Microsoft Visual Studio. You can build the library and the example programs with the command line tool of the compiler by typing
$ nmake -f Makefile.nmake
in the directory
Create a directory outside the
usrsctp directory, enter it and generate files by typing
$ cmake <path-to-usrsctp-sources>
$ cmake --build .
By default CMake generates a DEBUG build with verbose output.
Several test programs are included, including a discard server and a client. You can run both to send data from the client to the server. The client reads data from stdin and sends them to the server, which prints the message in the terminal and discards it. The sources of the server are also provided here and those of the client here.
Both programs can either send data over SCTP directly or use UDP encapsulation, thus encapsulating the SCTP packet in a UDP datagram. The first mode works on loopback or in a protected setup without any NAT boxes involved. In all other cases it is better to use UDP encapsulation.
The usage of the
$ discard_server [local_encaps_port remote_encaps_port]
For UDP encapsulation the ports have to be specified. The local and remote encapsulation ports can be arbitrarily set. For example, you can call
$ ./discard_server 11111 22222
on a Unix-like OS and
$ discard_server.exe 11111 22222
on Windows.
The client needs two additional parameters, the server's address and its port. Its usage is
$ client remote_addr remote_port [local_port local_encaps_port remote_encaps_port]
The remote address is the server's address. If client and server are started on the same machine, the loopback address 127.0.0.1 can be used on Unix-like OSs and the local address on Windows. The discard port is 9, so 9 has to be used as the remote port. The encapsulation ports have to match those of the server, i.e. the server's local_encaps_port is the client's remote_encaps_port and vice versa. Thus, the client can be started with
$ ./client 127.0.0.1 9 0 22222 11111
on a Unix-like OS and
$ client.exe 192.168.0.1 9 0 22222 11111
on Windows provided your local IP address is 192.168.0.1.
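The rule that the encapsulation ports of client and server have to mirror each other can be written down as a small check. The following is a minimal sketch and not part of the example programs; the function name encaps_ports_match is hypothetical and only introduced for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if the UDP encapsulation ports of
 * server and client mirror each other, 0 otherwise.
 */
int encaps_ports_match(uint16_t server_local, uint16_t server_remote,
                       uint16_t client_local, uint16_t client_remote)
{
    return server_local == client_remote && server_remote == client_local;
}
```

For the invocations shown above, encaps_ports_match(11111, 22222, 22222, 11111) evaluates to 1.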
To send data over SCTP directly, you might need root privileges because raw sockets are used. In that case, omit the encapsulation ports and start the programs with sudo or, on Windows, from an administrator console.
Instead of asking constantly for new data, a callback API can be used that is triggered by SCTP. A callback function has to be registered that will be called whenever data is ready to be delivered to the application.
discard_server has a flag to switch between the two modes. If use_cb is set to 1, the callback API will be used. To change the setting, just set the flag and compile the program again.
All system calls start with the prefix
usrsctp_ to distinguish them from the kernel variants. Some of them are changed to account for the different demands in the userland environment.
Every application has to start with
usrsctp_init(). This function calls
sctp_init() and reserves the memory necessary to administer the data transfer. The function prototype is
void usrsctp_init(uint16_t udp_port)
As it is not always possible to send data directly over SCTP because not all NAT boxes can process SCTP packets, the data can be sent over UDP. To encapsulate SCTP into UDP a UDP port has to be specified, to which the datagrams can be sent. This local UDP port is set with the parameter
udp_port. The default value is 9899, the standard UDP encapsulation port. If UDP encapsulation is not necessary, the UDP port has to be set to 0.
At the end of the program, usrsctp_finish() should be called to free all the memory that has been allocated before. The function prototype is
int usrsctp_finish(void)
The return code is 0 on success and -1 in case of an error.
An SCTP endpoint is represented by a socket. It is created with usrsctp_socket(). The function prototype is:
struct socket * usrsctp_socket(int domain, int type, int protocol, int (*receive_cb)(struct socket *sock, union sctp_sockstore addr, void *data, size_t datalen, struct sctp_rcvinfo, int flags, void *ulp_info), int (*send_cb)(struct socket *sock, uint32_t sb_free), uint32_t sb_threshold, void *ulp_info)
The arguments taken from RFC 6458 are:
In usrsctp, a callback API can be used.
sb_threshold specifies the amount of free space in the send socket buffer before the send function in the application is called. If a send callback function is specified and sb_threshold is 0, the function is called whenever there is room in the send socket buffer.
ulp_info parameter. This value will be passed to the receive_cb when it is invoked.
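The role of sb_threshold can be illustrated by a small decision function. This is only a sketch of the rule as described above, not the code used inside the stack. The name send_cb_due is hypothetical, and it is an assumption that a non-zero threshold means "at least sb_threshold bytes free".

```c
#include <stdint.h>

/*
 * Hypothetical helper: decides whether the send callback would be
 * triggered for a given amount of free send buffer space.
 *
 * sb_threshold == 0: trigger whenever there is any room.
 * sb_threshold  > 0: trigger once at least sb_threshold bytes are free
 *                    (assumed interpretation).
 */
int send_cb_due(uint32_t sb_free, uint32_t sb_threshold)
{
    if (sb_threshold == 0) {
        return sb_free > 0;
    }
    return sb_free >= sb_threshold;
}
```

A larger threshold therefore leads to fewer invocations of the callback, each with more room to write into.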
usrsctp_socket() returns the pointer to the new socket in the
struct socket data type. It will be needed in all other system calls. In case of a failure NULL is returned and errno is set to the appropriate error code.
The function prototype of usrsctp_close() is
void usrsctp_close(struct socket *so)
Thus the only difference is the absence of a return code.
The following functions have the same functionality as their kernel counterparts. Their prototypes are described in the following subsections. For a detailed description please refer to RFC 6458.
int usrsctp_bind(struct socket *so, struct sockaddr *addr, socklen_t addrlen)
struct sockaddr_in for an IPv4 address or struct sockaddr_in6 for an IPv6 address).
usrsctp_bind() returns 0 on success and -1 in case of an error.
int usrsctp_listen(struct socket *so, int backlog)
usrsctp_listen() returns 0 on success and -1 in case of an error.
struct socket * usrsctp_accept(struct socket *so, struct sockaddr * addr, socklen_t * addrlen)
struct sockaddr_in for an IPv4 address or struct sockaddr_in6 for an IPv6 address).
usrsctp_accept() returns the accepted socket on success and NULL in case of an error.
int usrsctp_connect(struct socket *so, struct sockaddr *name, socklen_t addrlen)
struct sockaddr_in for an IPv4 address or struct sockaddr_in6 for an IPv6 address).
usrsctp_connect() returns 0 on success and -1 in case of an error.
int usrsctp_shutdown(struct socket *so, int how)
usrsctp_shutdown() returns 0 on success and -1 in case of an error.
Since the publication of RFC 6458 there is only one function for sending and one for receiving that is not deprecated. Therefore, only these two are described here.
ssize_t usrsctp_sendv(struct socket *so, const void *data, size_t len, struct sockaddr *addrs, int addrcnt, void *info, socklen_t infolen, unsigned int infotype, int flags)
struct iovec data structure, we chose to pass the data as a void pointer.
addrs can be set to NULL.
void *info. The data types
struct sctp_prinfo, and
struct sctp_sendv_spa are supported as defined in RFC 6458. Support for struct sctp_authinfo is not implemented yet. Therefore, if it is used, errno is set to EINVAL and -1 is returned.
usrsctp_sendv() returns the number of bytes sent, or -1 if an error occurred. The variable errno is then set appropriately.
ssize_t usrsctp_recvv(struct socket *so, void *dbuf, size_t len, struct sockaddr *from, socklen_t * fromlen, void *info, socklen_t *infolen, unsigned int *infotype, int *msg_flags)
usrsctp_sendv() the data is returned in a buffer.
info have to be handled in the same way as specified in RFC 6458.
*infotype is set to the type of the info buffer. The currently defined values are
MSG_NOTIFICATION). Note that this field is an in/out parameter. Options for the receive may also be passed into the value (e.g., MSG_EOR). Returning from the call, the flags' value will differ from its original value.
usrsctp_recvv() returns the number of bytes received, or -1 if an error occurred. The variable errno is then set appropriately.
Socket options are used to change the default behavior of socket calls. Their behavior is specified in RFC 6458. The functions to get or set them are
int usrsctp_getsockopt(struct socket *so, int level, int optname, void *optval, socklen_t *optlen)
int usrsctp_setsockopt(struct socket *so, int level, int optname, const void *optval, socklen_t optlen)
and the arguments are
These functions return 0 on success and -1 in case of an error.
In kernel implementations such as FreeBSD, it is possible to change parameters in the operating system. These parameters are called sysctl variables.
In usrsctp, applications can set or retrieve these variables with the functions
void usrsctp_sysctl_set_ ## (uint32_t value)
uint32_t usrsctp_sysctl_get_ ## (void)
## stands for the name of the variable.
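The naming scheme can be illustrated with a self-contained stand-in that generates one setter and one getter per variable by token pasting. This is a sketch only and does not show how the library implements these functions; the DEMO_SYSCTL macro and the demo_ prefix are hypothetical, and the defaults are the ones listed in this manual.

```c
#include <stdint.h>

/*
 * Illustration only: generates a setter/getter pair per variable,
 * following the usrsctp_sysctl_set_## / usrsctp_sysctl_get_## naming
 * scheme. Everything with a demo_ prefix is hypothetical.
 */
#define DEMO_SYSCTL(name, default_value)                                        \
    static uint32_t demo_value_##name = (default_value);                        \
    void demo_sysctl_set_##name(uint32_t value) { demo_value_##name = value; }  \
    uint32_t demo_sysctl_get_##name(void) { return demo_value_##name; }

/* Two variables from this manual with their documented defaults. */
DEMO_SYSCTL(sctp_rto_min_default, 1000)
DEMO_SYSCTL(sctp_sack_freq_default, 2)
```

With the library itself, the corresponding calls for the first variable are usrsctp_sysctl_set_sctp_rto_min_default(value) and usrsctp_sysctl_get_sctp_rto_min_default().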
In the following paragraphs a short description of the parameters will be given.
The space of the available send buffer can be changed from its default value of 262,144 bytes to a value between 0 and
2^32 - 1 bytes.
The space of the available receive buffer can be changed from its default value of 262,144 bytes to a value between 0 and
2^32 - 1 bytes.
The TCB (Transmission Control Block) hash table sizes, i.e. the size of one TCB in the hash table, can be tuned between 1 and 2^32 - 1 bytes. The default value is 1,024 bytes. A TCB contains, for instance, pointers to the socket and the endpoint, information about the association, and some statistical data.
The PCB (Protocol Control Block) hash table sizes, i.e. the size of one PCB in the hash table, can be tuned between 1 and
2^32 - 1 bytes. The default value is 256 bytes. The PCB contains all variables that characterize an endpoint.
This parameter tunes the maximum number of cached resources in the system. It can be set between 0 and 2^32 - 1. The default value is 1000.
This parameter tunes the maximum number of cached resources in an association. It can be set between 0 and 2^32 - 1. The default value is 10.
Data is stored in mbufs. Several mbufs can be chained together. This parameter sets the maximum number of small mbufs in a chain before an mbuf cluster is used. The default is 5.
TBD This parameter configures the threshold below which more space should be added to a socket send buffer. The default value is 1452 bytes.
The retransmission timeout (RTO), i.e. the time that controls the retransmission of messages, has several parameters that can be changed, for example to shorten the time before a message is retransmitted. The range of these parameters is between 0 and 2^32 - 1 ms.
The default value for the maximum retransmission timeout is 60,000 ms (60 seconds).
The default value for the minimum retransmission timeout is 1,000 ms (1 second).
The default value for the initial retransmission timeout is 3,000 ms (3 seconds). This value is only needed before the first round trip time has been calculated.
The default value for the maximum retransmission timeout for an INIT chunk is 60,000 ms (60 seconds).
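How the minimum and maximum bound the RTO can be seen in a short calculation. The sketch below follows the rule for the first RTT measurement in RFC 4960, section 6.3.1 (SRTT = R, RTTVAR = R/2, RTO = SRTT + 4 * RTTVAR) and then clamps the result. The function names are hypothetical, and the code is not taken from the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an RTO candidate into the range [rto_min_ms, rto_max_ms]. */
uint32_t rto_clamp_ms(uint64_t rto_ms, uint32_t rto_min_ms, uint32_t rto_max_ms)
{
    assert(rto_min_ms <= rto_max_ms);
    if (rto_ms < rto_min_ms) {
        return rto_min_ms;
    }
    if (rto_ms > rto_max_ms) {
        return rto_max_ms;
    }
    return (uint32_t)rto_ms;
}

/*
 * RTO after the first RTT measurement r_ms (RFC 4960, section 6.3.1):
 * SRTT = R, RTTVAR = R / 2, RTO = SRTT + 4 * RTTVAR, then clamped.
 * 64-bit arithmetic avoids overflow for large samples.
 */
uint32_t rto_after_first_rtt_ms(uint32_t r_ms, uint32_t rto_min_ms,
                                uint32_t rto_max_ms)
{
    uint64_t srtt = r_ms;
    uint64_t rttvar = r_ms / 2;

    return rto_clamp_ms(srtt + 4 * rttvar, rto_min_ms, rto_max_ms);
}
```

With the default limits, a first RTT sample of 100 ms gives a candidate of 300 ms, which is raised to the minimum of 1,000 ms. A sample of 30,000 ms gives 90,000 ms, which is cut to the maximum of 60,000 ms.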
A cookie has a specified lifetime. If it expires, the cookie is no longer valid and an ABORT is sent. The default value is 60,000 ms (60 seconds).
Set the default time between two heartbeats. The default is 30,000ms.
If a SHUTDOWN is not answered with a SHUTDOWN-ACK before the shutdown guard timer expires, the association is aborted. The default value for this timer is 180 seconds.
TBD To set the size of the packets to the highest value possible, the maximum transfer unit (MTU) of the complete path has to be known. The default time interval for the path MTU discovery is 600 seconds.
TBD The default secret lifetime of a server is 3600secs.
TBD Vtag time wait time, 0 disables it. Default: 60secs
Transmissions and retransmissions of messages might fail. To protect the system against too many retransmissions, limits have to be defined.
By default, an INIT chunk is retransmitted at most 8 times before an ABORT is sent.
This parameter sets the maximum number of failed retransmissions before the association is aborted. The default value is 10.
This parameter sets the maximum number of path failures before the association is aborted. The default value is 5. Notice that the number of paths multiplied by this value should be equal to
sctp_assoc_rtx_max_default. That means that the default configuration is good for two paths.
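The relation between the two limits is simple arithmetic and can be checked with a few lines of code. The helper names below are hypothetical and only serve to illustrate the rule stated above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of paths for which paths * path_rtx_max does not exceed
 * assoc_rtx_max (integer division).
 */
uint32_t paths_covered(uint32_t assoc_rtx_max, uint32_t path_rtx_max)
{
    assert(path_rtx_max > 0);
    return assoc_rtx_max / path_rtx_max;
}

/*
 * Value for sctp_assoc_rtx_max_default suggested by the rule
 * for a given number of paths.
 */
uint32_t assoc_rtx_max_for(uint32_t paths, uint32_t path_rtx_max)
{
    return paths * path_rtx_max;
}
```

With the defaults, paths_covered(10, 5) is 2. For an association with four paths and the default path limit, the rule suggests assoc_rtx_max_for(4, 5), i.e. 20.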
The parameter configures how many times an unlucky chunk can be retransmitted before the association aborts. The default is set to 30.
TBD Default potentially failed threshold. Default: 65535
TBD When one-2-one hits qlimit abort. Default: 0
The SACK frequency defines the number of packets that are awaited, before a SACK is sent. The default value is 2.
As a SACK (Selective Acknowledgment) is sent after every other packet, a timer is set to send a SACK in case another packet does not arrive in due time. The default value for this timer is 200 ms.
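The interplay of the SACK frequency and the delayed SACK timer can be sketched as a decision function. This is a simplified model under the assumptions stated in the comments, not the logic used by the stack; the name sack_due and its parameters are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if a SACK is due, 0 otherwise.
 *
 * unacked_packets: packets received since the last SACK was sent
 * sack_freq:       SACK frequency (default 2)
 * waited_ms:       time since the delayed SACK timer was started
 * delayed_sack_ms: delayed SACK timer (default 200 ms)
 */
int sack_due(uint32_t unacked_packets, uint32_t sack_freq,
             uint32_t waited_ms, uint32_t delayed_sack_ms)
{
    if (unacked_packets == 0) {
        return 0;                          /* nothing to acknowledge */
    }
    if (unacked_packets >= sack_freq) {
        return 1;                          /* SACK frequency reached */
    }
    return waited_ms >= delayed_sack_ms;   /* timer expired */
}
```

With the defaults, a single packet is acknowledged after 200 ms at the latest, whereas the second packet triggers a SACK immediately.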
TBD This is a flag to turn the controlling of the coherence of SACKs on or off. The default value is 1 (on).
If a slow host receives data on a lossy link, it is possible that its receiver window is full and new data can only be accepted if a chunk with a higher TSN (Transmission Sequence Number) that has previously been acknowledged is dropped. As a consequence, the sender has to keep data even after it has been acknowledged, in case it has to be retransmitted. If this behavior is not necessary, non-renegable SACKs can be turned on. By default the use of non-renegable SACKs is turned off.
In some cases it is not desirable to wait for the SACK timer to expire before a SACK is sent. In these cases a bit called SACK-IMMEDIATELY (see RFC 7053) can be set to provoke the instant sending of a SACK. By default this feature is turned off.
TBD SCTP ABC max increase per SACK (L). Default: 1
Max burst defines the maximum number of packets that may be sent in one flight.
The default value for max burst is 0, which means that the number of packets sent as a flight is not limited by this parameter, but may be by another one, see the next paragraph.
The use of max burst is based on the size of the congestion window (cwnd). This parameter is set by default.
Heartbeats are mostly used to verify a path. Their number can be limited. The default is 4.
In the state of fast retransmission the number of packet bursts can be limited. The default value is 4.
In order to keep track of the peer's advertised receiver window, the sender calculates the window by subtracting the amount of data sent. However, some OSs reduce the receiver window by the real space needed to store the data. This parameter sets the additional amount to debit the peer's receiver window per chunk sent. The default value is 256, which is the value needed by FreeBSD.
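The effect of the per-chunk debit can be shown with a short calculation. The following is a sketch under the assumption that the window estimate is clamped at zero; the function name peer_rwnd_after_send is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate of the peer's receiver window after
 * one chunk with data_len bytes of user data has been sent.
 * The debit is the user data plus the per-chunk overhead; the result
 * never drops below zero (assumption).
 */
uint32_t peer_rwnd_after_send(uint32_t peer_rwnd, uint32_t data_len,
                              uint32_t peer_chunk_oh)
{
    uint64_t debit = (uint64_t)data_len + peer_chunk_oh;

    return debit >= peer_rwnd ? 0 : (uint32_t)(peer_rwnd - debit);
}
```

With the default overhead of 256, sending 1,000 bytes into a window of 10,000 bytes leaves an estimate of 8,744 bytes instead of 9,000.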
This parameter sets the maximum number of chunks that can be queued per association. The default value is 512.
TBD The minimum size when splitting a chunk is 2904 bytes by default.
TBD This parameter can be tuned for scaling of number of chunks and messages. The default is 10.
TBD This parameter configures the minimum size of the residual data chunk in the second part of the split. The default is 1452.
The calculation of the round trip time (RTT) depends on several parameters.
TBD Shift amount for bw smoothing on rtt calc. Default: 4
TBD Shift amount for rtt smoothing on rtt calc. Default: 5
TBD What to return when rtt and bw are unchanged. Default: 0
The congestion control should protect the network against fast senders.
Explicit congestion notifications are turned on by default.
This parameter sets the default algorithm for the congestion control. Default is 0, i.e. the one specified in RFC 4960.
Set the initial congestion window in MTUs. The default is 3.
TBD Enable for RTCC CC datacenter ECN. Default: 1
TBD How many the sames it takes to try step down of cwnd. Default: 20
An important extension of SCTP is the dynamic address reconfiguration (see RFC 5061), also known as ADD-IP, which allows the changing of addresses during the lifetime of an association. For this feature the AUTH extension (see RFC 4895) is necessary.
If SCTP Auto-ASCONF is enabled, the peer is informed automatically when a new address is added or removed. This feature is enabled by default.
By default the sending of multiple ASCONFs is disabled.
The use of AUTH, which is normally turned on, can be disabled by setting this parameter to 0.
It is also possible to disable the requirement to use AUTH in conjunction with ADD-IP by setting this parameter to 1.
A prominent feature of SCTP is the possibility to use several addresses for the same association. One is the primary path, and the others are needed in case of a path failure. Using CMT the data is sent on several paths to enhance the throughput.
To turn CMT on, this parameter has to be set to 1.
To use delayed acknowledgments with CMT this parameter has to be set to 1.
For CMT it makes sense to split the send and receive buffer to have shares for each path. By default buffer splitting is turned off.
To be able to pass NAT boxes, the boxes have to handle SCTP packets in a specific way.
SCTP NAT friendly operation. Default: 1
Enable sending of the nat-friendly SCTP option on INITs. Default: 0
Set the SCTP/UDP tunneling port. Default: 9899
TBD Enable SCTP base mobility. Default: 0
TBD Enable SCTP fast handoff. Default: 0
Calculating the checksum for packets sent on loopback is turned off by default. To turn it on, set this parameter to 0.
The peer is notified about the number of outgoing streams in the INIT or INIT-ACK chunk. The default is 10.
Determines whether SCTP should respond to the drain calls. Default: 1
TBD Enforce strict data ordering, abort if control inside data. Default: 0
Set the default stream scheduling module. Implemented modules are:
TBD Default fragment interleave level. Default: 1
TBD Enable SCTP blackholing. Default: 0
Set the logging level. The default is 0.
Turn debug output on or off. It is disabled by default. To obtain debug output,
SCTP_DEBUG has to be set as a compile flag.
|sctp_sendspace||Send buffer space||1864135|
|sctp_recvspace||Receive buffer space||1864135|
|sctp_hashtblsize||Tunable for TCB hash table sizes||1024|
|sctp_pcbtblsize||Tunable for PCB hash table sizes||256|
|sctp_system_free_resc_limit||Cached resources in the system||1000|
|sctp_asoc_free_resc_limit||Cached resources in an association||10|
|sctp_rto_max_default||Default value for RTO_max||60000ms|
|sctp_rto_min_default||Default value for RTO_min||1000ms|
|sctp_rto_initial_default||Default value for RTO_initial||3000ms|
|sctp_init_rto_max_default||Default value for the maximum RTO for sending an INIT||60000ms|
|sctp_valid_cookie_life_default||Valid cookie life time||60000ms|
|sctp_init_rtx_max_default||Maximum number of INIT retransmissions||8|
|sctp_assoc_rtx_max_default||Maximum number of failed retransmissions before the association is aborted||10|
|sctp_path_rtx_max_default||Maximum number of failed retransmissions before a path fails||5|
|sctp_ecn_enable||Enabling explicit congestion notifications||1|
|sctp_strict_sacks||Control the coherence of SACKs||1|
|sctp_delayed_sack_time_default||Default delayed SACK timer||200ms|
|sctp_sack_freq_default||Default SACK frequency||2|
|sctp_nr_sack_on_off||Turn non-renegable SACKs on or off||0|
|sctp_enable_sack_immediately||Enable sending of the SACK-IMMEDIATELY bit||0|
|sctp_no_csum_on_loopback||Disable the computation of the checksum on packets sent on loopback||1|
|sctp_peer_chunk_oh||Amount to debit peers rwnd per chunk sent||256|
|sctp_max_burst_default||Default max burst for SCTP endpoints||0|
|sctp_use_cwnd_based_maxburst||Use max burst based on the size of the congestion window||1|
|sctp_hb_maxburst||Confirmation Heartbeat max burst||4|
|sctp_max_chunks_on_queue||Default max chunks on queue per asoc||512|
|sctp_min_split_point||Minimum size when splitting a chunk||2904|
|sctp_chunkscale||Tunable for Scaling of number of chunks and messages||10|
|sctp_mbuf_threshold_count||Maximum number of small mbufs in a chain||5|
|sctp_heartbeat_interval_default||Default time between two Heartbeats||30000ms|
|sctp_pmtu_raise_time_default||Default PMTU raise timer||600secs|
|sctp_shutdown_guard_time_default||Default shutdown guard timer||180secs|
|sctp_secret_lifetime_default||Default secret lifetime||3600secs|
|sctp_add_more_threshold||Threshold when more space should be added to a socket send buffer||1452|
|sctp_nr_outgoing_streams_default||Default number of outgoing streams||10|
|sctp_cmt_on_off||Turn CMT on or off.||0|
|sctp_cmt_use_dac||Use delayed acknowledgment for CMT||0|
|sctp_fr_max_burst_default||Default max burst for SCTP endpoints when fast retransmitting||4|
|sctp_auto_asconf||Enable SCTP Auto-ASCONF||1|
|sctp_multiple_asconfs||Enable SCTP Multiple-ASCONFs||0|
|sctp_asconf_auth_nochk||Disable SCTP ASCONF AUTH requirement||0|
|sctp_auth_disable||Disable SCTP AUTH function||0|
|sctp_nat_friendly||SCTP NAT friendly operation||1|
|sctp_inits_include_nat_friendly||Enable sending of the nat-friendly SCTP option on INITs.||0|
|sctp_udp_tunneling_port||Set the SCTP/UDP tunneling port||9899|
|sctp_do_drain||Determines whether SCTP should respond to the drain calls||1|
|sctp_abort_if_one_2_one_hits_limit||When one-2-one hits qlimit abort||0|
|sctp_strict_data_order||Enforce strict data ordering, abort if control inside data||0|
|sctp_min_residual||Minimum residual data chunk in second part of split||1452|
|sctp_max_retran_chunk||Maximum times an unlucky chunk can be retransmitted before the association aborts||30|
|sctp_default_cc_module||Default congestion control module||0|
|sctp_default_ss_module||Default stream scheduling module||0|
|sctp_default_frag_interleave||Default fragment interleave level||1|
|sctp_mobility_base||Enable SCTP base mobility||0|
|sctp_mobility_fasthandoff||Enable SCTP fast handoff||0|
|sctp_L2_abc_variable||SCTP ABC max increase per SACK (L)||1|
|sctp_vtag_time_wait||Vtag time wait time, 0 disables it.||60secs|
|sctp_blackhole||Enable SCTP blackholing||0|
|sctp_path_pf_threshold||Default potentially failed threshold||65535|
|sctp_rttvar_bw||Shift amount for bw smoothing on rtt calc||4|
|sctp_rttvar_rtt||Shift amount for rtt smoothing on rtt calc||5|
|sctp_rttvar_eqret||What to return when rtt and bw are unchanged||0|
|sctp_steady_step||How many the sames it takes to try step down of cwnd||20|
|sctp_use_dccc_ecn||Enable for RTCC CC datacenter ECN||1|
|sctp_buffer_splitting||Enable send/receive buffer splitting||0|
|sctp_initial_cwnd||Initial congestion window in MTUs||3|
|sctp_debug_on||Turns debug output on or off.||0|
Stream Control Transmission Protocol. RFC 4960, September 2007.
M. Tüxen, R. Stewart, P. Lei, and E. Rescorla:
Authenticated Chunks for the Stream Control Transmission Protocol (SCTP). RFC 4895, August 2007.
R. Stewart, Q. Xie, M. Tüxen, S. Maruyama, and M. Kozuka:
Stream Control Transmission Protocol (SCTP) Dynamic Address Reconfiguration. RFC 5061, September 2007.
R. Stewart, M. Tüxen, K. Poon, and V. Yasevich:
Sockets API Extensions for the Stream Control Transmission Protocol (SCTP). RFC 6458, December 2011.
R. Stewart, M. Tüxen, and P. Lei:
Stream Control Transmission Protocol (SCTP) Stream Reconfiguration. RFC 6525, February 2012.
M. Tüxen and R. Stewart:
UDP Encapsulation of Stream Control Transmission Protocol (SCTP) Packets for End-Host to End-Host Communication. RFC 6951, May 2013.
M. Tüxen, I. Rüngeler, and R. Stewart:
SACK-IMMEDIATELY Extension for the Stream Control Transmission Protocol. RFC 7053, November 2013.