Selector: Use a poll-based selector instead of an epoll-based selector.

The OpenJDK epoll-based selector suffers from a serious bug: it
can never successfully deregister keys from closed channels.
The root cause of this bug is the sequence of operations that occur when
a channel that's registered with a selector is closed:
(0) Application code calls Channel.close().
(1) The channel is "preClosed": we dup2(2) /dev/null into the channel's
file descriptor and the channel is marked as closed at the Java level.
(2) All keys associated with the channel are cancelled. Cancels are
lazy, which means that the Selectors involved won't necessarily
deregister these keys until an ongoing call to select() (if any) returns
or until the next call to select() on that selector.
(3) Once all selectors associated with the channel deregister these
cancelled keys, the channel FD is properly closed (via close(2)). Note
that an arbitrary length of time might elapse between Step 0 and this step.
This isn't a resource leak because the channel's FD is now a reference
to /dev/null (see Step 1).
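Steps 1 and 3 can be made concrete with a minimal Python sketch, assuming Linux, where `os.dup2` and `os.close` are thin wrappers over dup2(2) and close(2). The helpers `pre_close` and `final_close` are hypothetical names, and a Unix-domain `socket.socketpair()` stands in for a channel and its remote peer; this only illustrates the descriptor behavior, not the platform's actual preClose code.

```python
import os
import socket
import stat


def pre_close(fd):
    """Step 1 (hypothetical helper): dup2(2) /dev/null over fd.

    The descriptor number stays allocated but now refers to /dev/null,
    so the socket it used to refer to loses this reference.
    """
    null_fd = os.open(os.devnull, os.O_RDWR)
    try:
        os.dup2(null_fd, fd)
    finally:
        os.close(null_fd)


def final_close(fd):
    """Step 3 (hypothetical helper): really release the descriptor number."""
    os.close(fd)


# A socketpair stands in for a channel (chan) and its remote peer.
chan, peer = socket.socketpair()
fd = chan.detach()  # take ownership of the raw descriptor

pre_close(fd)
# The number is still in use, but now names a character device (/dev/null).
print(stat.S_ISCHR(os.fstat(fd).st_mode))  # True
print(os.read(fd, 16))  # b'' (EOF from /dev/null)
# Nothing else referenced the socket, so dup2 released it and the peer sees EOF.
print(peer.recv(16))  # b''

final_close(fd)
peer.close()
```

Note that the peer only sees EOF here because nothing else holds a reference to the socket's file; that detail matters for the Android case described below.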

THE PROBLEM:
The default Selector implementation on Linux 2.6 and higher uses epoll(7).
epoll can scale better than poll(2) because a lot of the state related
to the interest set (the set of descriptors we're polling on) is
maintained by the kernel. One of the side-effects of this design is that
callers must call into the kernel to make changes to the interest set
via epoll_ctl(2), e.g., by using EPOLL_CTL_ADD to add descriptors or
EPOLL_CTL_DEL to remove descriptors from the interest set. A call to
epoll_ctl with op = EPOLL_CTL_DEL is made when the selector attempts to
deregister an FD associated with a channel from the interest set (see
Step 2, above). These calls will *always fail* because the channel has
been preClosed (see Step 1). They fail because the kernel uses its own
internal file structure to maintain state, and rejects the command
because the descriptor we're passing in describes a different file
(/dev/null) that isn't selectable and isn't registered with the epoll
instance.
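The failing EPOLL_CTL_DEL can be reproduced from user space. This is an illustration under stated assumptions, not the selector's code: it uses Python's `select.epoll` on Linux (whose `unregister` issues epoll_ctl with EPOLL_CTL_DEL), a socketpair in place of a channel, and a hypothetical `pre_close` helper performing the dup2(2) from Step 1. On Linux we would expect EPERM, since /dev/null does not support polling.

```python
import errno
import os
import select
import socket


def pre_close(fd):
    """Hypothetical helper mirroring Step 1: dup2(2) /dev/null over fd."""
    null_fd = os.open(os.devnull, os.O_RDWR)
    try:
        os.dup2(null_fd, fd)
    finally:
        os.close(null_fd)


def deregister_after_pre_close():
    """Register a socket with epoll, preClose it, then attempt EPOLL_CTL_DEL.

    Returns the errno from epoll_ctl, or 0 if the call unexpectedly succeeds.
    """
    ep = select.epoll()
    chan, peer = socket.socketpair()
    fd = chan.detach()
    try:
        ep.register(fd, select.EPOLLIN)  # epoll_ctl(EPOLL_CTL_ADD)
        pre_close(fd)                    # fd now refers to /dev/null
        try:
            ep.unregister(fd)            # epoll_ctl(EPOLL_CTL_DEL)
        except OSError as e:
            return e.errno
        return 0
    finally:
        os.close(fd)
        peer.close()
        ep.close()


print(errno.errorcode.get(deregister_after_pre_close()))  # expected on Linux: EPERM
```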
This is an issue in upstream OpenJDK as well, and various select
implementations (such as Netty) have hacks to work around it. Outside
of Android, things will work OK in most cases because the kernel has its
own internal cleanup logic to deregister files from epoll instances
whenever the last *non-epoll* reference to the file has been closed;
usually this happens at the point at which the dup2(2) from Step 1
is called. However, on Android, sockets tagged with the SocketTagger
will never hit this code path because the socket tagging implementation
(qtaguid) keeps a reference to the internal file until the socket
has been untagged. In cases where sockets are closed without being
untagged, the tagger keeps a reference to it until the process dies.
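The difference between the two cases can be sketched by holding an extra reference to the socket's file. In this hypothetical Python example (Linux only), a descriptor obtained with `os.dup` merely stands in for the reference qtaguid keeps; it is not how qtaguid works internally. Without the extra reference, the kernel drops the epoll entry as soon as dup2(2) releases the socket; with it, the entry survives and keeps reporting events under a descriptor number that now refers to /dev/null.

```python
import os
import select
import socket


def pre_close(fd):
    """Hypothetical helper mirroring Step 1: dup2(2) /dev/null over fd."""
    null_fd = os.open(os.devnull, os.O_RDWR)
    try:
        os.dup2(null_fd, fd)
    finally:
        os.close(null_fd)


def events_after_pre_close(hold_extra_reference):
    """Return what epoll reports after a registered socket has been preClosed.

    If hold_extra_reference is True, a dup'd descriptor keeps the socket's
    file alive, standing in for the reference qtaguid keeps on Android.
    """
    ep = select.epoll()
    chan, peer = socket.socketpair()
    peer.send(b"x")  # make the channel's socket readable
    fd = chan.detach()
    extra = os.dup(fd) if hold_extra_reference else None
    try:
        ep.register(fd, select.EPOLLIN)
        pre_close(fd)
        return ep.poll(0)  # timeout of 0: report ready entries without blocking
    finally:
        if extra is not None:
            os.close(extra)  # last reference gone: the kernel drops the entry now
        os.close(fd)
        peer.close()
        ep.close()


print(events_after_pre_close(False))  # []: entry removed when dup2 released the socket
print(events_after_pre_close(True))   # one stale (fd, EPOLLIN) event
```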

THE SOLUTION:
We switch over to using poll(2) instead of epoll(7). One of the
advantages of poll(2) is that there's less state maintained by the
kernel. We don't need to make a syscall (analogous to epoll_ctl)
whenever we want to remove an FD from the interest set; we merely
remove it from the list of FDs passed in the next time we call
through to poll. Poll also has slightly less setup overhead and is
slightly more efficient when the number of FDs being polled is small
(which is the common case on Android).
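A minimal sketch of this idea in Python, assuming Linux; `PollSelectorSketch` and `select_keys` are hypothetical names, not the libcore implementation. Cancelled keys are recorded and only dropped from a user-space interest set at the next select, mirroring the lazy cancellation in Step 2. Because the `select.poll()` object is rebuilt from that set on every call, removing a preClosed descriptor never involves the kernel.

```python
import os
import select
import socket


class PollSelectorSketch:
    """Hypothetical poll(2)-based selector with lazily processed cancellations.

    The interest set lives entirely in user space, so deregistering a
    descriptor never needs a syscall like epoll's EPOLL_CTL_DEL.
    """

    def __init__(self):
        self._interest = {}      # fd -> requested poll events
        self._cancelled = set()  # fds cancelled since the last select

    def register(self, fd, events=select.POLLIN):
        self._interest[fd] = events

    def cancel(self, fd):
        # Like SelectionKey.cancel(): only record it; the next select deregisters.
        self._cancelled.add(fd)

    def select_keys(self, timeout_ms=0):
        # Deregistration is just leaving the fd out of the list passed to poll(2).
        for fd in self._cancelled:
            self._interest.pop(fd, None)
        self._cancelled.clear()
        poller = select.poll()
        for fd, events in self._interest.items():
            poller.register(fd, events)
        return poller.poll(timeout_ms)  # list of (fd, revents); timeout in ms


sel = PollSelectorSketch()
chan, peer = socket.socketpair()
fd = chan.detach()
sel.register(fd)
peer.send(b"x")
print(sel.select_keys())  # one (fd, POLLIN) entry: the channel is readable

# Close sequence: cancel the key, preClose with dup2(2), then select again.
sel.cancel(fd)
null_fd = os.open(os.devnull, os.O_RDWR)
os.dup2(null_fd, fd)
os.close(null_fd)
print(sel.select_keys())  # []: fd is simply no longer passed to poll(2)
os.close(fd)
peer.close()
```

Rebuilding the pollfd list on each call costs O(n) per select, which is the trade-off that makes poll(2) a good fit only when the number of FDs is small.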
We also need to make sure that all tagged sockets are untagged before
they're preClosed at the platform level. However, there's nothing we
can do about applications that abuse the public API (android.net.TrafficStats).

ALTERNATE APPROACHES:
For completeness, I'm listing a couple of other approaches that were
considered but discarded.
- Removing preClose: This has the disadvantage of increasing the amount
of time (the delta between Step 0 and Step 3) a channel's descriptor is
kept alive. This also opens up races in the rare case where a
closed FD number is reused on a different thread while we still have reads
pending.
- A synchronous call to EPOLL_CTL_DEL when a channel is removed: This is a
non-starter because of the specified order of events in
AbstractSelectableChannel; implCloseSelectableChannel must be called
before all associated keys are cancelled.
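The FD-reuse race mentioned under "Removing preClose" follows from close(2) freeing the descriptor number immediately while open(2) always returns the lowest free number. The hypothetical Python sketch below assumes Linux and a single-threaded process, with a second `os.open` standing in for another thread; it compares a plain close with the dup2(2)-based preClose, which keeps the number reserved until the deferred close.

```python
import os
import socket


def next_open_after_close(use_pre_close):
    """Return (channel_fd, fd_returned_by_the_next_open).

    Without preClose, close(2) frees the number at once, and open(2) hands
    out the lowest free descriptor, so "another thread" can receive the very
    number a reader of the old channel is still using.
    """
    chan, peer = socket.socketpair()
    fd = chan.detach()
    if use_pre_close:
        null_fd = os.open(os.devnull, os.O_RDWR)
        os.dup2(null_fd, fd)  # number stays reserved, now pointing at /dev/null
        os.close(null_fd)
    else:
        os.close(fd)  # number is free again immediately
    # Stand-in for an unrelated open() on another thread.
    other = os.open(os.devnull, os.O_RDONLY)
    os.close(other)
    if use_pre_close:
        os.close(fd)  # Step 3: the deferred real close
    peer.close()
    return fd, other


print(next_open_after_close(False))  # both numbers equal: the fd was reused
print(next_open_after_close(True))   # numbers differ while the fd is preClosed
```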
(partially cherry picked from commit 4585ee7a9ef27260cb2e2b54bb18bc68861d5584)
This version of the change preserves the original EPoll classes because
they are used for asynchronous channels.