cgroup: fix cgroup_sk_alloc() for sk_clone_lock()

[ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]

When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.

sk_clone_lock() runs in BH context anyway; the in_interrupt()
check would make this function return early if it were called
there. And on the sk_alloc() path skcd->val is always zero. So
it is safe to factor out the code to make it more readable.

The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to get
the reference counting right unless we save this bit of information
in skcd->val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() aligning cgroup pointers to at least 4 bytes;
ARCH_KMALLOC_MINALIGN is certainly larger than that.

This bug seems to have been present since the beginning; commit
d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it, but not completely. It was apparently hard to trigger
until the recent commit 090e28b229af
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Cameron Berkenpas <>
Reported-by: Peter Geis <>
Reported-by: Lu Fengqi <>
Reported-by: Daniƫl Sonck <>
Reported-by: Zhang Qiang <>
Tested-by: Cameron Berkenpas <>
Tested-by: Peter Geis <>
Tested-by: Thomas Lamprecht <>
Cc: Daniel Borkmann <>
Cc: Zefan Li <>
Cc: Tejun Heo <>
Cc: Roman Gushchin <>
Signed-off-by: Cong Wang <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
4 files changed