LSN-0107-1: Kernel Live Patch Security Notice
5 NOV 2024
Releases
Ubuntu 24.04 LTS Ubuntu 22.04 LTS Ubuntu 20.04 LTS Ubuntu 18.04 ESM Ubuntu 16.04 ESM Ubuntu 14.04 ESM
Software Description
aws - Linux kernel for Amazon Web Services (AWS) systems - (>= 4.15.0-1159, >= 5.4.0-1009, >= 5.4.0-1061, >= 5.15.0-1000, >= 6.8.0-1008, >= 4.4.0-1129, >= 4.4.0-1159)
aws-5.15 - Linux kernel for Amazon Web Services (AWS) systems - (>= 5.15.0-1000)
aws-hwe - Linux kernel for Amazon Web Services (AWS-HWE) systems - (>= 4.15.0-1126)
azure - Linux kernel for Microsoft Azure Cloud systems - (>= 5.4.0-1010, >= 5.15.0-1000, >= 6.8.0-1007, >= 4.15.0-1114)
azure-4.15 - Linux kernel for Microsoft Azure Cloud systems - (>= 4.15.0-1168)
azure-5.15 - Linux kernel for Microsoft Azure cloud systems - (>= 5.15.0-1069)
gcp - Linux kernel for Google Cloud Platform (GCP) systems - (>= 5.4.0-1009, >= 5.15.0-1000, >= 6.8.0-1007, >= 4.15.0-1118)
gcp-4.15 - Linux kernel for Google Cloud Platform (GCP) systems - (>= 4.15.0-1154)
gcp-5.15 - Linux kernel for Google Cloud Platform (GCP) systems - (>= 5.15.0-1000)
generic-4.15 - Linux hardware enablement (HWE) kernel - (>= 4.15.0-214, >= 4.15.0-143, >= 4.15.0-143)
generic-4.4 - Linux kernel - (>= 4.4.0-168, >= 4.4.0-211, >= 4.4.0-243)
generic-5.15 - Linux hardware enablement (HWE) kernel - (>= 5.15.0-0)
generic-5.4 - Linux kernel - (>= 5.4.0-150, >= 5.4.0-26)
gke - Linux kernel for Google Container Engine (GKE) systems - (>= 5.4.0-1033, >= 5.15.0-1000)
gke-5.15 - Linux kernel for Google Container Engine (GKE) systems - (>= 5.15.0-1000)
Details
In the Linux kernel, the following vulnerability has been
resolved: inet: inet_defrag: prevent sk release while still in use
ip_local_out() and other functions can pass skb->sk as function argument.
If the skb is a fragment and reassembly happens before such function call
returns, the sk must not be released. This affects skb fragments
reassembled via netfilter or similar modules, e.g. openvswitch or ct_act.c,
when run as part of tx pipeline. Eric Dumazet made an initial analysis of
this bug. Quoting Eric: Calling ip_defrag() in output path is also implying
skb_orphan(), which is buggy because output path relies on sk not
disappearing. A relevant old patch about the issue was: 8282f27449bf
(‘inet: frag: Always orphan skbs inside ip_defrag()’) [..]
net/ipv4/ip_output.c depends on skb->sk being set, and probably to an inet
socket, not an arbitrary one. If we orphan the packet in ipvlan, then
downstream things like FQ packet scheduler will not work properly. We need
to change ip_defrag() to only use skb_orphan() when really needed, ie
whenever frag_list is going to be used. Eric suggested to stash sk in
fragment queue and made an initial patch. However there is a problem with
this: If skb is refragmented again right after, ip_do_fragment() will copy
head->sk to the new fragments, and sets up destructor to sock_wfree. IOW,
we have no choice but to fix up sk_wmem accounting to reflect the fully
reassembled skb, else wmem will underflow. This change moves the orphan
down into the core, to the last possible moment. As ip_defrag_offset is aliased
with sk_buff->sk member, we must move the offset into the FRAG_CB, else
skb->sk gets clobbered. This allows delaying the orphaning long enough to
learn if the skb has to be queued or if the skb is completing the reasm
queue. In the former case, things work as before, skb is orphaned. This is
safe because skb gets queued/stolen and won’t continue past reasm engine.
In the latter case, we will steal the skb->sk reference, reattach it to the
head skb, and fix up wmem accounting when inet_frag inflates truesize. (CVE-2024-26921)
In the Linux kernel, the following vulnerability has been
resolved: af_unix: Fix garbage collector racing against connect() Garbage
collector does not take into account the risk of embryo getting enqueued
during the garbage collection. If such embryo has a peer that carries
SCM_RIGHTS, two consecutive passes of scan_children() may see a different
set of children. Leading to an incorrectly elevated inflight count, and
then a dangling pointer within the gc_inflight_list.

  sockets are AF_UNIX/SOCK_STREAM
  S is an unconnected socket
  L is a listening in-flight socket bound to addr, not in fdtable
  V's fd will be passed via sendmsg(), gets inflight count bumped

  connect(S, addr)         sendmsg(S, [V]); close(V)    unix_gc()
  ----------------         -------------------------    -----------
  NS = unix_create1()
  skb1 = sock_wmalloc(NS)
  L = unix_find_other(addr)
  unix_state_lock(L)
  unix_peer(S) = NS
                           // V count=1 inflight=0
                           NS = unix_peer(S)
                           skb2 = sock_alloc()
                           skb_queue_tail(NS, skb2[V])
                           // V became in-flight
                           // V count=2 inflight=1
                           close(V)
                           // V count=1 inflight=1
                           // GC candidate condition met
                                                        for u in gc_inflight_list:
                                                            if (total_refs == inflight_refs)
                                                                add u to gc_candidates
                                                        // gc_candidates={L, V}
                                                        for u in gc_candidates:
                                                            scan_children(u, dec_inflight)
                                                        // embryo (skb1) was not
                                                        // reachable from L yet, so V's
                                                        // inflight remains unchanged
  skb_queue_tail(L, skb1)
  unix_state_unlock(L)
                                                        for u in gc_candidates:
                                                            if (u.inflight)
                                                                scan_children(u, inc_inflight_move_tail)
                                                        // V count=1 inflight=2 (!)

If there is a GC-candidate listening socket, lock/unlock its
state. This makes GC wait until the end of any ongoing connect() to that
socket. After flipping the lock, a possibly SCM-laden embryo is already
enqueued. And if there is another embryo coming, it can not possibly carry
SCM_RIGHTS. At this point, unix_inflight() can not happen because
unix_gc_lock is already taken. Inflight graph remains unaffected. (CVE-2024-26923)
In the Linux kernel, the following vulnerability has been
resolved: mm: swap: fix race between free_swap_and_cache() and swapoff()
There was previously a theoretical window where swapoff() could run and
teardown a swap_info_struct while a call to free_swap_and_cache() was
running in another thread. This could cause, amongst other bad
possibilities, swap_page_trans_huge_swapped() (called by
free_swap_and_cache()) to access the freed memory for swap_map. This is a
theoretical problem and I haven’t been able to provoke it from a test case.
But there has been agreement based on code review that this is possible
(see link below). Fix it by using get_swap_device()/put_swap_device(),
which will stall swapoff(). There was an extra check in _swap_info_get() to
confirm that the swap entry was not free. This isn’t present in
get_swap_device() because it doesn’t make sense in general due to the race
between getting the reference and swapoff. So I’ve added an equivalent
check directly in free_swap_and_cache(). Details of how to provoke one
possible issue (thanks to David Hildenbrand for deriving this):

--8<-----

swap_entry_free() might be the last user and result in ‘count ==
SWAP_HAS_CACHE’. swapoff->try_to_unuse() will stop as soon as
si->inuse_pages==0. So the question is: could someone reclaim the folio and
turn si->inuse_pages==0, before we completed
swap_page_trans_huge_swapped(). Imagine the following: 2 MiB folio in the
swapcache. Only 2 subpages are still referenced by swap entries. Process 1
still references subpage 0 via swap entry. Process 2 still references
subpage 1 via swap entry. Process 1 quits. Calls free_swap_and_cache(). ->
count == SWAP_HAS_CACHE [then, preempted in the hypervisor etc.] Process 2
quits. Calls free_swap_and_cache(). -> count == SWAP_HAS_CACHE Process 2
goes ahead, passes swap_page_trans_huge_swapped(), and calls __try_to_reclaim_swap().
__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()-> ... WRITE_ONCE(si->inuse_pages,
si->inuse_pages - nr_entries); What stops swapoff from succeeding after
process 2 reclaimed the swap cache but before process 1 finished its call to
swap_page_trans_huge_swapped()?

--8<-----
In the Linux kernel, the following vulnerability has been
resolved: Bluetooth: Fix use-after-free bugs caused by sco_sock_timeout
When the sco connection is established and the sco socket is then
released, timeout_work is scheduled to judge whether the sco
disconnection has timed out. The sock will be deallocated later, but it is
dereferenced again in sco_sock_timeout. As a result, use-after-free
bugs will happen. The root cause is shown below:

  Cleanup Thread               |  Worker Thread
  sco_sock_release             |
    sco_sock_close             |
      __sco_sock_close         |
        sco_sock_set_timer     |
                               |  schedule_delayed_work
    sco_sock_kill              |  (wait a time)
      sock_put(sk) //FREE      |  sco_sock_timeout
                               |    sock_hold(sk) //USE

The KASAN report triggered by POC is shown below: [ 95.890016