Martin KaFai Lau says:

====================
pull-request: bpf-next 2023-08-03

We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 84 files changed, 4026 insertions(+), 562 deletions(-).

The main changes are:

1) Add SO_REUSEPORT support for TC bpf_sk_assign from Lorenz Bauer,
   Daniel Borkmann

2) Support the new instructions from cpu v4, from Yonghong Song

3) Non-atomically allocate freelist during prefill from YiFei Zhu

4) Support defragmenting IPv(4|6) packets in BPF from Daniel Xu

5) Add a tracepoint for XDP attach failures, from Leon Hwang

6) struct netdev_rx_queue and xdp.h reshuffling to reduce
   rebuild time from Jakub Kicinski

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
  net: invert the netdevice.h vs xdp.h dependency
  net: move struct netdev_rx_queue out of netdevice.h
  eth: add missing xdp.h includes in drivers
  selftests/bpf: Add testcase for xdp attaching failure tracepoint
  bpf, xdp: Add tracepoint to xdp attaching failure
  selftests/bpf: fix static assert compilation issue for test_cls_*.c
  bpf: fix bpf_probe_read_kernel prototype mismatch
  riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework
  libbpf: fix typos in Makefile
  tracing: bpf: use struct trace_entry in struct syscall_tp_t
  bpf, devmap: Remove unused dtab field from bpf_dtab_netdev
  bpf, cpumap: Remove unused cmap field from bpf_cpu_map_entry
  netfilter: bpf: Only define get_proto_defrag_hook() if necessary
  bpf: Fix an array-index-out-of-bounds issue in disasm.c
  net: remove duplicate INDIRECT_CALLABLE_DECLARE of udp[6]_ehashfn
  docs/bpf: Fix malformed documentation
  bpf: selftests: Add defrag selftests
  bpf: selftests: Support custom type and proto for client sockets
  bpf: selftests: Support not connecting client socket
  netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
  ...
====================

Link: https://lore.kernel.org/r/20230803174845.825419-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
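
As background for item 1 above: a TC program can now call bpf_sk_assign() with a socket that belongs to a SO_REUSEPORT group. A minimal sketch of the usage (hypothetical program; the hard-coded tuple and drop policy are illustrative assumptions, not code from this series):

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("tc")
int steer_to_socket(struct __sk_buff *skb)
{
	struct bpf_sock_tuple tuple = {};
	struct bpf_sock *sk;
	long err;

	/* A real program would parse the tuple from the packet headers. */
	tuple.ipv4.daddr = bpf_htonl(0x7f000001);	/* 127.0.0.1 */
	tuple.ipv4.dport = bpf_htons(4321);

	sk = bpf_sk_lookup_udp(skb, &tuple, sizeof(tuple.ipv4),
			       BPF_F_CURRENT_NETNS, 0);
	if (!sk)
		return TC_ACT_OK;

	/* With this series, sk may be part of a SO_REUSEPORT group. */
	err = bpf_sk_assign(skb, sk, 0);
	bpf_sk_release(sk);
	return err ? TC_ACT_SHOT : TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";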


@@ -140,11 +140,6 @@ A: Because if we picked one-to-one relationship to x64 it would have made
 it more complicated to support on arm64 and other archs. Also it
 needs div-by-zero runtime check.
 
-Q: Why there is no BPF_SDIV for signed divide operation?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-A: Because it would be rarely used. llvm errors in such case and
-prints a suggestion to use unsigned divide instead.
-
 Q: Why BPF has implicit prologue and epilogue?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 A: Because architectures like sparc have register windows and in general


@@ -154,24 +154,27 @@ otherwise identical operations.
 The 'code' field encodes the operation as below, where 'src' and 'dst' refer
 to the values of the source and destination registers, respectively.
 
-======== ===== ==========================================================
-code     value description
-======== ===== ==========================================================
-BPF_ADD  0x00  dst += src
-BPF_SUB  0x10  dst -= src
-BPF_MUL  0x20  dst \*= src
-BPF_DIV  0x30  dst = (src != 0) ? (dst / src) : 0
-BPF_OR   0x40  dst \|= src
-BPF_AND  0x50  dst &= src
-BPF_LSH  0x60  dst <<= (src & mask)
-BPF_RSH  0x70  dst >>= (src & mask)
-BPF_NEG  0x80  dst = -src
-BPF_MOD  0x90  dst = (src != 0) ? (dst % src) : dst
-BPF_XOR  0xa0  dst ^= src
-BPF_MOV  0xb0  dst = src
-BPF_ARSH 0xc0  sign extending dst >>= (src & mask)
-BPF_END  0xd0  byte swap operations (see `Byte swap instructions`_ below)
-======== ===== ==========================================================
+========= ===== ======= ==========================================================
+code      value offset  description
+========= ===== ======= ==========================================================
+BPF_ADD   0x00  0       dst += src
+BPF_SUB   0x10  0       dst -= src
+BPF_MUL   0x20  0       dst \*= src
+BPF_DIV   0x30  0       dst = (src != 0) ? (dst / src) : 0
+BPF_SDIV  0x30  1       dst = (src != 0) ? (dst s/ src) : 0
+BPF_OR    0x40  0       dst \|= src
+BPF_AND   0x50  0       dst &= src
+BPF_LSH   0x60  0       dst <<= (src & mask)
+BPF_RSH   0x70  0       dst >>= (src & mask)
+BPF_NEG   0x80  0       dst = -dst
+BPF_MOD   0x90  0       dst = (src != 0) ? (dst % src) : dst
+BPF_SMOD  0x90  1       dst = (src != 0) ? (dst s% src) : dst
+BPF_XOR   0xa0  0       dst ^= src
+BPF_MOV   0xb0  0       dst = src
+BPF_MOVSX 0xb0  8/16/32 dst = (s8,s16,s32)src
+BPF_ARSH  0xc0  0       sign extending dst >>= (src & mask)
+BPF_END   0xd0  0       byte swap operations (see `Byte swap instructions`_ below)
+========= ===== ======= ==========================================================
 
 Underflow and overflow are allowed during arithmetic operations, meaning
 the 64-bit or 32-bit value will wrap. If eBPF program execution would
@@ -198,33 +201,51 @@ where '(u32)' indicates that the upper 32 bits are zeroed.
   dst = dst ^ imm32
 
-Also note that the division and modulo operations are unsigned. Thus, for
-``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas
-for ``BPF_ALU64``, 'imm' is first sign extended to 64 bits and the result
-interpreted as an unsigned 64-bit value. There are no instructions for
-signed division or modulo.
+Note that most instructions have instruction offset of 0. Only three instructions
+(``BPF_SDIV``, ``BPF_SMOD``, ``BPF_MOVSX``) have a non-zero offset.
+
+The division and modulo operations support both unsigned and signed flavors.
+
+For unsigned operations (``BPF_DIV`` and ``BPF_MOD``), for ``BPF_ALU``,
+'imm' is interpreted as a 32-bit unsigned value. For ``BPF_ALU64``,
+'imm' is first sign extended from 32 to 64 bits, and then interpreted as
+a 64-bit unsigned value.
+
+For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
+'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
+is first sign extended from 32 to 64 bits, and then interpreted as a
+64-bit signed value.
+
+The ``BPF_MOVSX`` instruction does a move operation with sign extension.
+``BPF_ALU | BPF_MOVSX`` sign extends 8-bit and 16-bit operands into 32
+bit operands, and zeroes the remaining upper 32 bits.
+``BPF_ALU64 | BPF_MOVSX`` sign extends 8-bit, 16-bit, and 32-bit
+operands into 64 bit operands.
 
 Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31)
 for 32-bit operations.
 
 Byte swap instructions
-~~~~~~~~~~~~~~~~~~~~~~
+----------------------
 
-The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
-'code' field of ``BPF_END``.
+The byte swap instructions use instruction classes of ``BPF_ALU`` and ``BPF_ALU64``
+and a 4-bit 'code' field of ``BPF_END``.
 
 The byte swap instructions operate on the destination register
 only and do not use a separate source register or immediate value.
 
-The 1-bit source operand field in the opcode is used to select what byte
-order the operation convert from or to:
+For ``BPF_ALU``, the 1-bit source operand field in the opcode is used to
+select what byte order the operation converts from or to. For
+``BPF_ALU64``, the 1-bit source operand field in the opcode is reserved
+and must be set to 0.
 
-========= ===== =================================================
-source    value description
-========= ===== =================================================
-BPF_TO_LE 0x00  convert between host byte order and little endian
-BPF_TO_BE 0x08  convert between host byte order and big endian
-========= ===== =================================================
+========= ========= ===== =================================================
+class     source    value description
+========= ========= ===== =================================================
+BPF_ALU   BPF_TO_LE 0x00  convert between host byte order and little endian
+BPF_ALU   BPF_TO_BE 0x08  convert between host byte order and big endian
+BPF_ALU64 Reserved  0x00  do byte swap unconditionally
+========= ========= ===== =================================================
 
 The 'imm' field encodes the width of the swap operations. The following widths
 are supported: 16, 32 and 64.
@@ -239,6 +260,12 @@ Examples:
   dst = htobe64(dst)
 
+``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::
+
+  dst = bswap16 dst
+  dst = bswap32 dst
+  dst = bswap64 dst
+
 Jump instructions
 -----------------
@@ -249,7 +276,8 @@ The 'code' field encodes the operation as below:
 ======== ===== === =========================================== =========================================
 code     value src description                                 notes
 ======== ===== === =========================================== =========================================
-BPF_JA   0x0   0x0 PC += offset                                BPF_JMP only
+BPF_JA   0x0   0x0 PC += offset                                BPF_JMP class
+BPF_JA   0x0   0x0 PC += imm                                   BPF_JMP32 class
 BPF_JEQ  0x1   any PC += offset if dst == src
 BPF_JGT  0x2   any PC += offset if dst > src                   unsigned
 BPF_JGE  0x3   any PC += offset if dst >= src                  unsigned
@@ -278,6 +306,19 @@ Example:
 where 's>=' indicates a signed '>=' comparison.
 
+``BPF_JA | BPF_K | BPF_JMP32`` (0x06) means::
+
+  gotol +imm
+
+where 'imm' means the branch offset comes from insn 'imm' field.
+
+Note that there are two flavors of ``BPF_JA`` instructions. The
+``BPF_JMP`` class permits a 16-bit jump offset specified by the 'offset'
+field, whereas the ``BPF_JMP32`` class permits a 32-bit jump offset
+specified by the 'imm' field. A > 16-bit conditional jump may be
+converted to a < 16-bit conditional jump plus a 32-bit unconditional
+jump.
+
 Helper functions
 ~~~~~~~~~~~~~~~~
@@ -320,6 +361,7 @@ The mode modifier is one of:
 BPF_ABS       0x20  legacy BPF packet access (absolute)  `Legacy BPF Packet access instructions`_
 BPF_IND       0x40  legacy BPF packet access (indirect)  `Legacy BPF Packet access instructions`_
 BPF_MEM       0x60  regular load and store operations    `Regular load and store operations`_
+BPF_MEMSX     0x80  sign-extension load operations       `Sign-extension load operations`_
 BPF_ATOMIC    0xc0  atomic operations                    `Atomic operations`_
 ============= ===== ==================================== =============
@@ -350,9 +392,23 @@ instructions that transfer data between a register and memory.
 ``BPF_MEM | <size> | BPF_LDX`` means::
 
-  dst = *(size *) (src + offset)
+  dst = *(unsigned size *) (src + offset)
 
-Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
+Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW`` and
+'unsigned size' is one of u8, u16, u32 or u64.
+
+Sign-extension load operations
+------------------------------
+
+The ``BPF_MEMSX`` mode modifier is used to encode sign-extension load
+instructions that transfer data between a register and memory.
+
+``BPF_MEMSX | <size> | BPF_LDX`` means::
+
+  dst = *(signed size *) (src + offset)
+
+Where size is one of: ``BPF_B``, ``BPF_H`` or ``BPF_W``, and
+'signed size' is one of s8, s16 or s32.
 
 Atomic operations
 -----------------
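
To make the encodings above concrete, the following sketch hand-assembles the new cpu v4 instructions using the uapi struct bpf_insn layout; the RAW() helper macro is local to this example and the register choices are arbitrary:

#include <linux/bpf.h>

/* Local helper for this sketch only. */
#define RAW(CODE, DST, SRC, OFF, IMM)				\
	((struct bpf_insn) { .code = (CODE), .dst_reg = (DST),	\
			     .src_reg = (SRC), .off = (OFF), .imm = (IMM) })

static const struct bpf_insn cpuv4_demo[] = {
	/* r0 s/= r1 : BPF_SDIV is BPF_DIV with offset = 1 */
	RAW(BPF_ALU64 | BPF_DIV | BPF_X, BPF_REG_0, BPF_REG_1, 1, 0),
	/* r0 s%= r1 : BPF_SMOD is BPF_MOD with offset = 1 */
	RAW(BPF_ALU64 | BPF_MOD | BPF_X, BPF_REG_0, BPF_REG_1, 1, 0),
	/* r0 = (s8)r2 : BPF_MOVSX is BPF_MOV with offset = 8/16/32 */
	RAW(BPF_ALU64 | BPF_MOV | BPF_X, BPF_REG_0, BPF_REG_2, 8, 0),
	/* r0 = *(s16 *)(r3 + 0) : sign-extending load in BPF_MEMSX mode */
	RAW(BPF_LDX | BPF_MEMSX | BPF_H, BPF_REG_0, BPF_REG_3, 0, 0),
	/* r0 = bswap64 r0 : unconditional swap via BPF_ALU64 | BPF_END */
	RAW(BPF_ALU64 | BPF_END | BPF_TO_LE, BPF_REG_0, 0, 0, 64),
	/* gotol +100000 : 32-bit jump offset lives in 'imm' for BPF_JMP32 | BPF_JA */
	RAW(BPF_JMP32 | BPF_JA, 0, 0, 0, 100000),
};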


@@ -3704,7 +3704,7 @@ M: Daniel Borkmann <daniel@iogearbox.net>
 M:	Andrii Nakryiko <andrii@kernel.org>
 R:	Martin KaFai Lau <martin.lau@linux.dev>
 R:	Song Liu <song@kernel.org>
-R:	Yonghong Song <yhs@fb.com>
+R:	Yonghong Song <yonghong.song@linux.dev>
 R:	John Fastabend <john.fastabend@gmail.com>
 R:	KP Singh <kpsingh@kernel.org>
 R:	Stanislav Fomichev <sdf@google.com>
@@ -3743,7 +3743,7 @@ F: tools/lib/bpf/
 F:	tools/testing/selftests/bpf/
 
 BPF [ITERATOR]
-M:	Yonghong Song <yhs@fb.com>
+M:	Yonghong Song <yonghong.song@linux.dev>
 L:	bpf@vger.kernel.org
 S:	Maintained
 F:	kernel/bpf/*iter.c


@@ -13,6 +13,8 @@
 #include <asm/patch.h>
 #include "bpf_jit.h"
 
+#define RV_FENTRY_NINSNS 2
+
 #define RV_REG_TCC RV_REG_A6
 #define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */
@@ -241,7 +243,7 @@ static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
 	if (!is_tail_call)
 		emit_mv(RV_REG_A0, RV_REG_A5, ctx);
 	emit_jalr(RV_REG_ZERO, is_tail_call ? RV_REG_T3 : RV_REG_RA,
-		  is_tail_call ? 20 : 0, /* skip reserved nops and TCC init */
+		  is_tail_call ? (RV_FENTRY_NINSNS + 1) * 4 : 0, /* skip reserved nops and TCC init */
 		  ctx);
 }
@@ -618,32 +620,7 @@ static int add_exception_handler(const struct bpf_insn *insn,
 	return 0;
 }
 
-static int gen_call_or_nops(void *target, void *ip, u32 *insns)
-{
-	s64 rvoff;
-	int i, ret;
-	struct rv_jit_context ctx;
-
-	ctx.ninsns = 0;
-	ctx.insns = (u16 *)insns;
-
-	if (!target) {
-		for (i = 0; i < 4; i++)
-			emit(rv_nop(), &ctx);
-		return 0;
-	}
-
-	rvoff = (s64)(target - (ip + 4));
-	emit(rv_sd(RV_REG_SP, -8, RV_REG_RA), &ctx);
-	ret = emit_jump_and_link(RV_REG_RA, rvoff, false, &ctx);
-	if (ret)
-		return ret;
-	emit(rv_ld(RV_REG_RA, -8, RV_REG_SP), &ctx);
-
-	return 0;
-}
-
-static int gen_jump_or_nops(void *target, void *ip, u32 *insns)
+static int gen_jump_or_nops(void *target, void *ip, u32 *insns, bool is_call)
 {
 	s64 rvoff;
 	struct rv_jit_context ctx;
@@ -658,38 +635,35 @@ static int gen_jump_or_nops(void *target, void *ip, u32 *insns)
 	}
 
 	rvoff = (s64)(target - ip);
-	return emit_jump_and_link(RV_REG_ZERO, rvoff, false, &ctx);
+	return emit_jump_and_link(is_call ? RV_REG_T0 : RV_REG_ZERO, rvoff, false, &ctx);
 }
 
 int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
 		       void *old_addr, void *new_addr)
 {
-	u32 old_insns[4], new_insns[4];
+	u32 old_insns[RV_FENTRY_NINSNS], new_insns[RV_FENTRY_NINSNS];
 	bool is_call = poke_type == BPF_MOD_CALL;
-	int (*gen_insns)(void *target, void *ip, u32 *insns);
-	int ninsns = is_call ? 4 : 2;
 	int ret;
 
-	if (!is_bpf_text_address((unsigned long)ip))
+	if (!is_kernel_text((unsigned long)ip) &&
+	    !is_bpf_text_address((unsigned long)ip))
 		return -ENOTSUPP;
 
-	gen_insns = is_call ? gen_call_or_nops : gen_jump_or_nops;
-
-	ret = gen_insns(old_addr, ip, old_insns);
+	ret = gen_jump_or_nops(old_addr, ip, old_insns, is_call);
 	if (ret)
 		return ret;
 
-	if (memcmp(ip, old_insns, ninsns * 4))
+	if (memcmp(ip, old_insns, RV_FENTRY_NINSNS * 4))
 		return -EFAULT;
 
-	ret = gen_insns(new_addr, ip, new_insns);
+	ret = gen_jump_or_nops(new_addr, ip, new_insns, is_call);
 	if (ret)
 		return ret;
 
 	cpus_read_lock();
 	mutex_lock(&text_mutex);
-	if (memcmp(ip, new_insns, ninsns * 4))
-		ret = patch_text(ip, new_insns, ninsns);
+	if (memcmp(ip, new_insns, RV_FENTRY_NINSNS * 4))
+		ret = patch_text(ip, new_insns, RV_FENTRY_NINSNS);
 	mutex_unlock(&text_mutex);
 	cpus_read_unlock();
@@ -787,8 +761,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	int i, ret, offset;
 	int *branches_off = NULL;
 	int stack_size = 0, nregs = m->nr_args;
-	int retaddr_off, fp_off, retval_off, args_off;
-	int nregs_off, ip_off, run_ctx_off, sreg_off;
+	int retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off;
 	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
 	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
 	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
@@ -796,13 +769,27 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	bool save_ret;
 	u32 insn;
 
-	/* Generated trampoline stack layout:
+	/* Two types of generated trampoline stack layout:
 	 *
-	 * FP - 8	    [ RA of parent func	] return address of parent
+	 * 1. trampoline called from function entry
+	 * --------------------------------------
+	 * FP + 8	    [ RA to parent func	] return address to parent
 	 *					  function
-	 * FP - retaddr_off [ RA of traced func	] return address of traced
+	 * FP + 0	    [ FP of parent func ] frame pointer of parent
 	 *					  function
-	 * FP - fp_off	    [ FP of parent func ]
+	 * FP - 8	    [ T0 to traced func ] return address of traced
+	 *					  function
+	 * FP - 16	    [ FP of traced func ] frame pointer of traced
+	 *					  function
+	 * --------------------------------------
+	 *
+	 * 2. trampoline called directly
+	 * --------------------------------------
+	 * FP - 8	    [ RA to caller func ] return address to caller
+	 *					  function
+	 * FP - 16	    [ FP of caller func ] frame pointer of caller
+	 *					  function
+	 * --------------------------------------
 	 *
 	 * FP - retval_off  [ return value      ] BPF_TRAMP_F_CALL_ORIG or
 	 *					  BPF_TRAMP_F_RET_FENTRY_RET
@@ -833,14 +820,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	if (nregs > 8)
 		return -ENOTSUPP;
 
-	/* room for parent function return address */
-	stack_size += 8;
-
-	stack_size += 8;
-	retaddr_off = stack_size;
-
-	stack_size += 8;
-	fp_off = stack_size;
+	/* room of trampoline frame to store return address and frame pointer */
+	stack_size += 16;
 
 	save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
 	if (save_ret) {
@@ -867,12 +848,29 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	stack_size = round_up(stack_size, 16);
 
-	emit_addi(RV_REG_SP, RV_REG_SP, -stack_size, ctx);
-
-	emit_sd(RV_REG_SP, stack_size - retaddr_off, RV_REG_RA, ctx);
-	emit_sd(RV_REG_SP, stack_size - fp_off, RV_REG_FP, ctx);
-
-	emit_addi(RV_REG_FP, RV_REG_SP, stack_size, ctx);
+	if (func_addr) {
+		/* For the trampoline called from function entry,
+		 * the frame of traced function and the frame of
+		 * trampoline need to be considered.
+		 */
+		emit_addi(RV_REG_SP, RV_REG_SP, -16, ctx);
+		emit_sd(RV_REG_SP, 8, RV_REG_RA, ctx);
+		emit_sd(RV_REG_SP, 0, RV_REG_FP, ctx);
+		emit_addi(RV_REG_FP, RV_REG_SP, 16, ctx);
+
+		emit_addi(RV_REG_SP, RV_REG_SP, -stack_size, ctx);
+		emit_sd(RV_REG_SP, stack_size - 8, RV_REG_T0, ctx);
+		emit_sd(RV_REG_SP, stack_size - 16, RV_REG_FP, ctx);
+		emit_addi(RV_REG_FP, RV_REG_SP, stack_size, ctx);
+	} else {
+		/* For the trampoline called directly, just handle
+		 * the frame of trampoline.
+		 */
+		emit_addi(RV_REG_SP, RV_REG_SP, -stack_size, ctx);
+		emit_sd(RV_REG_SP, stack_size - 8, RV_REG_RA, ctx);
+		emit_sd(RV_REG_SP, stack_size - 16, RV_REG_FP, ctx);
+		emit_addi(RV_REG_FP, RV_REG_SP, stack_size, ctx);
+	}
 
 	/* callee saved register S1 to pass start time */
 	emit_sd(RV_REG_FP, -sreg_off, RV_REG_S1, ctx);
@@ -890,7 +888,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	/* skip to actual body of traced function */
 	if (flags & BPF_TRAMP_F_SKIP_FRAME)
-		orig_call += 16;
+		orig_call += RV_FENTRY_NINSNS * 4;
 
 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
 		emit_imm(RV_REG_A0, (const s64)im, ctx);
@@ -967,17 +965,30 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	emit_ld(RV_REG_S1, -sreg_off, RV_REG_FP, ctx);
 
-	if (flags & BPF_TRAMP_F_SKIP_FRAME)
-		/* return address of parent function */
-		emit_ld(RV_REG_RA, stack_size - 8, RV_REG_SP, ctx);
-	else
-		/* return address of traced function */
-		emit_ld(RV_REG_RA, stack_size - retaddr_off, RV_REG_SP, ctx);
-
-	emit_ld(RV_REG_FP, stack_size - fp_off, RV_REG_SP, ctx);
-	emit_addi(RV_REG_SP, RV_REG_SP, stack_size, ctx);
-
-	emit_jalr(RV_REG_ZERO, RV_REG_RA, 0, ctx);
+	if (func_addr) {
+		/* trampoline called from function entry */
+		emit_ld(RV_REG_T0, stack_size - 8, RV_REG_SP, ctx);
+		emit_ld(RV_REG_FP, stack_size - 16, RV_REG_SP, ctx);
+		emit_addi(RV_REG_SP, RV_REG_SP, stack_size, ctx);
+
+		emit_ld(RV_REG_RA, 8, RV_REG_SP, ctx);
+		emit_ld(RV_REG_FP, 0, RV_REG_SP, ctx);
+		emit_addi(RV_REG_SP, RV_REG_SP, 16, ctx);
+
+		if (flags & BPF_TRAMP_F_SKIP_FRAME)
+			/* return to parent function */
+			emit_jalr(RV_REG_ZERO, RV_REG_RA, 0, ctx);
+		else
+			/* return to traced function */
+			emit_jalr(RV_REG_ZERO, RV_REG_T0, 0, ctx);
+	} else {
+		/* trampoline called directly */
+		emit_ld(RV_REG_RA, stack_size - 8, RV_REG_SP, ctx);
+		emit_ld(RV_REG_FP, stack_size - 16, RV_REG_SP, ctx);
+		emit_addi(RV_REG_SP, RV_REG_SP, stack_size, ctx);
+
+		emit_jalr(RV_REG_ZERO, RV_REG_RA, 0, ctx);
+	}
 
 	ret = ctx->ninsns;
 out:
@@ -1691,8 +1702,8 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
 	store_offset = stack_adjust - 8;
 
-	/* reserve 4 nop insns */
-	for (i = 0; i < 4; i++)
+	/* nops reserved for auipc+jalr pair */
+	for (i = 0; i < RV_FENTRY_NINSNS; i++)
 		emit(rv_nop(), ctx);
 
 	/* First instruction is always setting the tail-call-counter
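
A quick check on the arithmetic behind the new constants, assuming 4-byte uncompressed instructions at the patch site: the old code padded with four nops and skipped 20 bytes on tail call (4 nops plus the TCC init), while the new pad is an auipc+jalr-sized pair, so the skip becomes 12 bytes:

#define RV_FENTRY_NINSNS 2

/* 2 reserved fentry nops + 1 TCC-init insn, 4 bytes each = 12 bytes */
static const int tail_call_skip = (RV_FENTRY_NINSNS + 1) * 4;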


@@ -701,6 +701,38 @@ static void emit_mov_reg(u8 **pprog, bool is64, u32 dst_reg, u32 src_reg)
 	*pprog = prog;
 }
 
+static void emit_movsx_reg(u8 **pprog, int num_bits, bool is64, u32 dst_reg,
+			   u32 src_reg)
+{
+	u8 *prog = *pprog;
+
+	if (is64) {
+		/* movs[b,w,l]q dst, src */
+		if (num_bits == 8)
+			EMIT4(add_2mod(0x48, src_reg, dst_reg), 0x0f, 0xbe,
+			      add_2reg(0xC0, src_reg, dst_reg));
+		else if (num_bits == 16)
+			EMIT4(add_2mod(0x48, src_reg, dst_reg), 0x0f, 0xbf,
+			      add_2reg(0xC0, src_reg, dst_reg));
+		else if (num_bits == 32)
+			EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x63,
+			      add_2reg(0xC0, src_reg, dst_reg));
+	} else {
+		/* movs[b,w]l dst, src */
+		if (num_bits == 8) {
+			EMIT4(add_2mod(0x40, src_reg, dst_reg), 0x0f, 0xbe,
+			      add_2reg(0xC0, src_reg, dst_reg));
+		} else if (num_bits == 16) {
+			if (is_ereg(dst_reg) || is_ereg(src_reg))
+				EMIT1(add_2mod(0x40, src_reg, dst_reg));
+			EMIT3(add_2mod(0x0f, src_reg, dst_reg), 0xbf,
+			      add_2reg(0xC0, src_reg, dst_reg));
+		}
+	}
+
+	*pprog = prog;
+}
+
 /* Emit the suffix (ModR/M etc) for addressing *(ptr_reg + off) and val_reg */
 static void emit_insn_suffix(u8 **pprog, u32 ptr_reg, u32 val_reg, int off)
 {
@@ -779,6 +811,29 @@ static void emit_ldx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off)
 	*pprog = prog;
 }
 
+/* LDSX: dst_reg = *(s8*)(src_reg + off) */
+static void emit_ldsx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off)
+{
+	u8 *prog = *pprog;
+
+	switch (size) {
+	case BPF_B:
+		/* Emit 'movsx rax, byte ptr [rax + off]' */
+		EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xBE);
+		break;
+	case BPF_H:
+		/* Emit 'movsx rax, word ptr [rax + off]' */
+		EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xBF);
+		break;
+	case BPF_W:
+		/* Emit 'movsx rax, dword ptr [rax+0x14]' */
+		EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x63);
+		break;
+	}
+	emit_insn_suffix(&prog, src_reg, dst_reg, off);
+	*pprog = prog;
+}
+
 /* STX: *(u8*)(dst_reg + off) = src_reg */
 static void emit_stx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off)
 {
@@ -1028,9 +1083,14 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 		case BPF_ALU64 | BPF_MOV | BPF_X:
 		case BPF_ALU | BPF_MOV | BPF_X:
-			emit_mov_reg(&prog,
-				     BPF_CLASS(insn->code) == BPF_ALU64,
-				     dst_reg, src_reg);
+			if (insn->off == 0)
+				emit_mov_reg(&prog,
+					     BPF_CLASS(insn->code) == BPF_ALU64,
+					     dst_reg, src_reg);
+			else
+				emit_movsx_reg(&prog, insn->off,
+					       BPF_CLASS(insn->code) == BPF_ALU64,
+					       dst_reg, src_reg);
 			break;
 
 			/* neg dst */
@@ -1134,6 +1194,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 			/* mov rax, dst_reg */
 			emit_mov_reg(&prog, is64, BPF_REG_0, dst_reg);
 
+			if (insn->off == 0) {
 				/*
 				 * xor edx, edx
 				 * equivalent to 'xor rdx, rdx', but one byte less
@@ -1143,6 +1204,16 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 				/* div src_reg */
 				maybe_emit_1mod(&prog, src_reg, is64);
 				EMIT2(0xF7, add_1reg(0xF0, src_reg));
+			} else {
+				if (BPF_CLASS(insn->code) == BPF_ALU)
+					EMIT1(0x99); /* cdq */
+				else
+					EMIT2(0x48, 0x99); /* cqo */
+
+				/* idiv src_reg */
+				maybe_emit_1mod(&prog, src_reg, is64);
+				EMIT2(0xF7, add_1reg(0xF8, src_reg));
+			}
 
 			if (BPF_OP(insn->code) == BPF_MOD &&
 			    dst_reg != BPF_REG_3)
@@ -1262,6 +1333,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 			break;
 
 		case BPF_ALU | BPF_END | BPF_FROM_BE:
+		case BPF_ALU64 | BPF_END | BPF_FROM_LE:
 			switch (imm32) {
 			case 16:
 				/* Emit 'ror %ax, 8' to swap lower 2 bytes */
@@ -1370,9 +1442,17 @@ st: if (is_imm8(insn->off))
 		case BPF_LDX | BPF_PROBE_MEM | BPF_W:
 		case BPF_LDX | BPF_MEM | BPF_DW:
 		case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
+			/* LDXS: dst_reg = *(s8*)(src_reg + off) */
+		case BPF_LDX | BPF_MEMSX | BPF_B:
+		case BPF_LDX | BPF_MEMSX | BPF_H:
+		case BPF_LDX | BPF_MEMSX | BPF_W:
+		case BPF_LDX | BPF_PROBE_MEMSX | BPF_B:
+		case BPF_LDX | BPF_PROBE_MEMSX | BPF_H:
+		case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
 			insn_off = insn->off;
 
-			if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
+			if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
+			    BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
 				/* Conservatively check that src_reg + insn->off is a kernel address:
 				 *   src_reg + insn->off >= TASK_SIZE_MAX + PAGE_SIZE
 				 *   src_reg is used as scratch for src_reg += insn->off and restored
@@ -1415,8 +1495,13 @@ st: if (is_imm8(insn->off))
 				start_of_ldx = prog;
 				end_of_jmp[-1] = start_of_ldx - end_of_jmp;
 			}
-			emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn_off);
-			if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
+			if (BPF_MODE(insn->code) == BPF_PROBE_MEMSX ||
+			    BPF_MODE(insn->code) == BPF_MEMSX)
+				emit_ldsx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn_off);
+			else
+				emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn_off);
+			if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
+			    BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
 				struct exception_table_entry *ex;
 				u8 *_insn = image + proglen + (start_of_ldx - temp);
 				s64 delta;
@@ -1730,6 +1815,8 @@ emit_cond_jmp: /* Convert BPF opcode to x86 */
 			break;
 
 		case BPF_JMP | BPF_JA:
+		case BPF_JMP32 | BPF_JA:
+			if (BPF_CLASS(insn->code) == BPF_JMP) {
 				if (insn->off == -1)
 					/* -1 jmp instructions will always jump
 					 * backwards two bytes. Explicitly handling
@@ -1740,6 +1827,12 @@ emit_cond_jmp: /* Convert BPF opcode to x86 */
 					jmp_offset = -2;
 				else
 					jmp_offset = addrs[i + insn->off] - addrs[i];
+			} else {
+				if (insn->imm == -1)
+					jmp_offset = -2;
+				else
+					jmp_offset = addrs[i + insn->imm] - addrs[i];
+			}
 
 			if (!jmp_offset) {
 				/*
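
As a reading aid, an approximate map (sketch only; REX and ModRM details elided) from the new BPF operations to the x86-64 instructions the JIT emits above:

/*
 *   BPF operation                      x86-64 emitted (approx.)
 *   ---------------------------------  ------------------------------
 *   BPF_ALU64 | BPF_MOVSX, off = 8     movsx  rdst, byte   (0f be)
 *   BPF_ALU64 | BPF_MOVSX, off = 16    movsx  rdst, word   (0f bf)
 *   BPF_ALU64 | BPF_MOVSX, off = 32    movsxd rdst, dword  (63)
 *   BPF_LDX | BPF_MEMSX | BPF_B/H/W    movsx  rdst, [rsrc + off]
 *   BPF_SDIV / BPF_SMOD (off = 1)      cdq or cqo, then idiv rsrc
 *   BPF_ALU64 | BPF_END | BPF_TO_LE    bswap family, swaps unconditionally
 */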


@@ -90,6 +90,7 @@
 #include <net/tls.h>
 #endif
 #include <net/ip6_route.h>
+#include <net/xdp.h>
 
 #include "bonding_priv.h"


@@ -14,6 +14,7 @@
 #include <linux/interrupt.h>
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>
+#include <net/xdp.h>
 #include <uapi/linux/bpf.h>
 
 #include "ena_com.h"


@@ -14,6 +14,7 @@
 #include <linux/net_tstamp.h>
 #include <linux/ptp_clock_kernel.h>
 #include <linux/miscdevice.h>
+#include <net/xdp.h>
 
 #define TSNEP "tsnep"


@@ -12,6 +12,7 @@
 #include <linux/fsl/mc.h>
 #include <linux/net_tstamp.h>
 #include <net/devlink.h>
+#include <net/xdp.h>
 
 #include <soc/fsl/dpaa2-io.h>
 #include <soc/fsl/dpaa2-fd.h>


@@ -11,6 +11,7 @@
 #include <linux/if_vlan.h>
 #include <linux/phylink.h>
 #include <linux/dim.h>
+#include <net/xdp.h>
 
 #include "enetc_hw.h"


@@ -22,6 +22,7 @@
 #include <linux/timecounter.h>
 #include <dt-bindings/firmware/imx/rsrc.h>
 #include <linux/firmware/imx/sci.h>
+#include <net/xdp.h>
 
 #if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || \
     defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARM) || \


@@ -5,6 +5,7 @@
 #include <linux/netdevice.h>
 #include <linux/u64_stats_sync.h>
+#include <net/xdp.h>
 
 /* Tx descriptor size */
 #define FUNETH_SQE_SIZE 64U


@@ -11,6 +11,7 @@
 #include <linux/netdevice.h>
 #include <linux/pci.h>
 #include <linux/u64_stats_sync.h>
+#include <net/xdp.h>
 
 #include "gve_desc.h"
 #include "gve_desc_dqo.h"


@@ -15,6 +15,7 @@
 #include <linux/net_tstamp.h>
 #include <linux/bitfield.h>
 #include <linux/hrtimer.h>
+#include <net/xdp.h>
 
 #include "igc_hw.h"


@@ -14,6 +14,7 @@
 #include <net/pkt_cls.h>
 #include <net/pkt_sched.h>
 #include <net/switchdev.h>
+#include <net/xdp.h>
 
 #include <vcap_api.h>
 #include <vcap_api_client.h>


@@ -11,6 +11,7 @@
 #include <net/checksum.h>
 #include <net/ip6_checksum.h>
+#include <net/xdp.h>
 
 #include <net/mana/mana.h>
 #include <net/mana/mana_auxiliary.h>


@@ -22,6 +22,7 @@
 #include <linux/net_tstamp.h>
 #include <linux/reset.h>
 #include <net/page_pool.h>
+#include <net/xdp.h>
 #include <uapi/linux/bpf.h>
 
 struct stmmac_resources {


@@ -6,6 +6,7 @@
 #ifndef DRIVERS_NET_ETHERNET_TI_CPSW_PRIV_H_
 #define DRIVERS_NET_ETHERNET_TI_CPSW_PRIV_H_
 
+#include <net/xdp.h>
 #include <uapi/linux/bpf.h>
 
 #include "davinci_cpdma.h"


@@ -16,6 +16,7 @@
 #include <linux/hyperv.h>
 #include <linux/rndis.h>
 #include <linux/jhash.h>
+#include <net/xdp.h>
 
 /* RSS related */
 #define OID_GEN_RECEIVE_SCALE_CAPABILITIES 0x00010203 /* query only */


@@ -22,6 +22,7 @@
 #include <net/net_namespace.h>
 #include <net/rtnetlink.h>
 #include <net/sock.h>
+#include <net/xdp.h>
 #include <linux/virtio_net.h>
 #include <linux/skb_array.h>


@@ -22,6 +22,7 @@
 #include <net/route.h>
 #include <net/xdp.h>
 #include <net/net_failover.h>
+#include <net/netdev_rx_queue.h>
 
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);


@@ -2661,6 +2661,18 @@ static inline void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr)
 }
 #endif /* CONFIG_BPF_SYSCALL */
 
+static __always_inline int
+bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr)
+{
+	int ret = -EFAULT;
+
+	if (IS_ENABLED(CONFIG_BPF_EVENTS))
+		ret = copy_from_kernel_nofault(dst, unsafe_ptr, size);
+	if (unlikely(ret < 0))
+		memset(dst, 0, size);
+	return ret;
+}
+
 void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
 			  struct btf_mod_pair *used_btfs, u32 len);
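
With the helper above available from bpf.h, the tracing-side definition reduces to a thin wrapper. Roughly, as a sketch of the kernel/trace/bpf_trace.c side of this fix:

BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
	   const void *, unsafe_ptr)
{
	return bpf_probe_read_kernel_common(dst, size, unsafe_ptr);
}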


@@ -69,6 +69,9 @@ struct ctl_table_header;
 /* unused opcode to mark special load instruction. Same as BPF_ABS */
 #define BPF_PROBE_MEM	0x20
 
+/* unused opcode to mark special ldsx instruction. Same as BPF_IND */
+#define BPF_PROBE_MEMSX	0x40
+
 /* unused opcode to mark call to interpreter with arguments */
 #define BPF_CALL_ARGS	0xe0
@@ -90,22 +93,28 @@ struct ctl_table_header;
 /* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
 
-#define BPF_ALU64_REG(OP, DST, SRC)				\
+#define BPF_ALU64_REG_OFF(OP, DST, SRC, OFF)			\
 	((struct bpf_insn) {					\
 		.code  = BPF_ALU64 | BPF_OP(OP) | BPF_X,	\
 		.dst_reg = DST,					\
 		.src_reg = SRC,					\
-		.off   = 0,					\
+		.off   = OFF,					\
 		.imm   = 0 })
 
-#define BPF_ALU32_REG(OP, DST, SRC)				\
+#define BPF_ALU64_REG(OP, DST, SRC)				\
+	BPF_ALU64_REG_OFF(OP, DST, SRC, 0)
+
+#define BPF_ALU32_REG_OFF(OP, DST, SRC, OFF)			\
 	((struct bpf_insn) {					\
 		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
 		.dst_reg = DST,					\
 		.src_reg = SRC,					\
-		.off   = 0,					\
+		.off   = OFF,					\
 		.imm   = 0 })
 
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	BPF_ALU32_REG_OFF(OP, DST, SRC, 0)
+
 /* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
 
 #define BPF_ALU64_IMM(OP, DST, IMM)				\
@@ -765,23 +774,6 @@ DECLARE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key);
 u32 xdp_master_redirect(struct xdp_buff *xdp);
 
-static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
-					    struct xdp_buff *xdp)
-{
-	/* Driver XDP hooks are invoked within a single NAPI poll cycle and thus
-	 * under local_bh_disable(), which provides the needed RCU protection
-	 * for accessing map entries.
-	 */
-	u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
-
-	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
-		if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev))
-			act = xdp_master_redirect(xdp);
-	}
-
-	return act;
-}
-
 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
 
 static inline u32 bpf_prog_insn_size(const struct bpf_prog *prog)
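
The _OFF variants exist so constant blinding can preserve a non-zero 'off', which now selects the signed flavor of the operation. For instance, a hand-built signed divide versus the plain unsigned form (sketch):

/* r0 s/= r1 : BPF_DIV with off = 1 selects BPF_SDIV semantics */
struct bpf_insn sdiv = BPF_ALU64_REG_OFF(BPF_DIV, BPF_REG_0, BPF_REG_1, 1);

/* Equivalent to the plain macro when off == 0 */
struct bpf_insn udiv = BPF_ALU64_REG(BPF_DIV, BPF_REG_0, BPF_REG_1);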


@@ -40,7 +40,6 @@
 #include <net/dcbnl.h>
 #endif
 #include <net/netprio_cgroup.h>
-#include <net/xdp.h>
 
 #include <linux/netdev_features.h>
 #include <linux/neighbour.h>
@@ -77,8 +76,12 @@ struct udp_tunnel_nic_info;
 struct udp_tunnel_nic;
 struct bpf_prog;
 struct xdp_buff;
+struct xdp_frame;
+struct xdp_metadata_ops;
 struct xdp_md;
 
+typedef u32 xdp_features_t;
+
 void synchronize_net(void);
 void netdev_set_default_ethtool_ops(struct net_device *dev,
 				    const struct ethtool_ops *ops);
@@ -783,32 +786,6 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index, u32 flow_id,
 #endif
 #endif /* CONFIG_RPS */
 
-/* This structure contains an instance of an RX queue. */
-struct netdev_rx_queue {
-	struct xdp_rxq_info		xdp_rxq;
-#ifdef CONFIG_RPS
-	struct rps_map __rcu		*rps_map;
-	struct rps_dev_flow_table __rcu	*rps_flow_table;
-#endif
-	struct kobject			kobj;
-	struct net_device		*dev;
-	netdevice_tracker		dev_tracker;
-
-#ifdef CONFIG_XDP_SOCKETS
-	struct xsk_buff_pool		*pool;
-#endif
-} ____cacheline_aligned_in_smp;
-
-/*
- * RX queue sysfs structures and functions.
- */
-struct rx_queue_attribute {
-	struct attribute attr;
-	ssize_t (*show)(struct netdev_rx_queue *queue, char *buf);
-	ssize_t (*store)(struct netdev_rx_queue *queue,
-			 const char *buf, size_t len);
-};
-
 /* XPS map type and offset of the xps map within net_device->xps_maps[]. */
 enum xps_map_type {
 	XPS_CPUS = 0,
@@ -1670,12 +1647,6 @@ struct net_device_ops {
 					       struct netlink_ext_ack *extack);
 };
 
-struct xdp_metadata_ops {
-	int	(*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp);
-	int	(*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash,
-			       enum xdp_rss_hash_type *rss_type);
-};
-
 /**
  * enum netdev_priv_flags - &struct net_device priv_flags
  *
@@ -3851,24 +3822,6 @@ static inline int netif_set_real_num_rx_queues(struct net_device *dev,
 int netif_set_real_num_queues(struct net_device *dev,
 			      unsigned int txq, unsigned int rxq);
 
-static inline struct netdev_rx_queue *
-__netif_get_rx_queue(struct net_device *dev, unsigned int rxq)
-{
-	return dev->_rx + rxq;
-}
-
-#ifdef CONFIG_SYSFS
-static inline unsigned int get_netdev_rx_queue_index(
-		struct netdev_rx_queue *queue)
-{
-	struct net_device *dev = queue->dev;
-	int index = queue - dev->_rx;
-
-	BUG_ON(index >= dev->num_rx_queues);
-	return index;
-}
-#endif
-
 int netif_get_num_default_rss_queues(void);
 
 void dev_kfree_skb_irq_reason(struct sk_buff *skb, enum skb_drop_reason reason);


@@ -11,6 +11,7 @@
 #include <linux/wait.h>
 #include <linux/list.h>
 #include <linux/static_key.h>
+#include <linux/module.h>
 #include <linux/netfilter_defs.h>
 #include <linux/netdevice.h>
 #include <linux/sockptr.h>
@@ -481,6 +482,15 @@ struct nfnl_ct_hook {
 };
 extern const struct nfnl_ct_hook __rcu *nfnl_ct_hook;
 
+struct nf_defrag_hook {
+	struct module *owner;
+	int (*enable)(struct net *net);
+	void (*disable)(struct net *net);
+};
+
+extern const struct nf_defrag_hook __rcu *nf_defrag_v4_hook;
+extern const struct nf_defrag_hook __rcu *nf_defrag_v6_hook;
+
 /*
  * nf_skb_duplicated - TEE target has sent a packet
  *
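
A consumer such as the netfilter BPF link resolves the hook under RCU, pins the owning module, then calls enable(). A simplified sketch of that pattern (not the exact link code from this series):

static int defrag_enable_sketch(struct net *net, bool v6)
{
	const struct nf_defrag_hook *hook;
	int err;

	rcu_read_lock();
	hook = rcu_dereference(v6 ? nf_defrag_v6_hook : nf_defrag_v4_hook);
	/* The real code falls back to request_module() when unset. */
	if (!hook || !try_module_get(hook->owner)) {
		rcu_read_unlock();
		return -EOPNOTSUPP;
	}
	rcu_read_unlock();

	err = hook->enable(net);
	if (err)
		module_put(hook->owner);
	return err;
}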


@@ -16,6 +16,7 @@
 #include <linux/sched/clock.h>
 #include <linux/sched/signal.h>
 #include <net/ip.h>
+#include <net/xdp.h>
 
 /*		0 - Reserved to indicate value not set
  *     1..NR_CPUS - Reserved for sender_cpu


@@ -48,6 +48,22 @@ struct sock *__inet6_lookup_established(struct net *net,
 					   const u16 hnum, const int dif,
 					   const int sdif);
 
+typedef u32 (inet6_ehashfn_t)(const struct net *net,
+			      const struct in6_addr *laddr, const u16 lport,
+			      const struct in6_addr *faddr, const __be16 fport);
+
+inet6_ehashfn_t inet6_ehashfn;
+
+INDIRECT_CALLABLE_DECLARE(inet6_ehashfn_t udp6_ehashfn);
+
+struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk,
+				    struct sk_buff *skb, int doff,
+				    const struct in6_addr *saddr,
+				    __be16 sport,
+				    const struct in6_addr *daddr,
+				    unsigned short hnum,
+				    inet6_ehashfn_t *ehashfn);
+
 struct sock *inet6_lookup_listener(struct net *net,
 				   struct inet_hashinfo *hashinfo,
 				   struct sk_buff *skb, int doff,
@@ -57,6 +73,15 @@ struct sock *inet6_lookup_listener(struct net *net,
 				   const unsigned short hnum,
 				   const int dif, const int sdif);
 
+struct sock *inet6_lookup_run_sk_lookup(struct net *net,
+					int protocol,
+					struct sk_buff *skb, int doff,
+					const struct in6_addr *saddr,
+					const __be16 sport,
+					const struct in6_addr *daddr,
+					const u16 hnum, const int dif,
+					inet6_ehashfn_t *ehashfn);
+
 static inline struct sock *__inet6_lookup(struct net *net,
 					  struct inet_hashinfo *hashinfo,
 					  struct sk_buff *skb, int doff,
@@ -78,6 +103,46 @@ static inline struct sock *__inet6_lookup(struct net *net,
 				      daddr, hnum, dif, sdif);
 }
 
+static inline
+struct sock *inet6_steal_sock(struct net *net, struct sk_buff *skb, int doff,
+			      const struct in6_addr *saddr, const __be16 sport,
+			      const struct in6_addr *daddr, const __be16 dport,
+			      bool *refcounted, inet6_ehashfn_t *ehashfn)
+{
+	struct sock *sk, *reuse_sk;
+	bool prefetched;
+
+	sk = skb_steal_sock(skb, refcounted, &prefetched);
+	if (!sk)
+		return NULL;
+
+	if (!prefetched)
+		return sk;
+
+	if (sk->sk_protocol == IPPROTO_TCP) {
+		if (sk->sk_state != TCP_LISTEN)
+			return sk;
+	} else if (sk->sk_protocol == IPPROTO_UDP) {
+		if (sk->sk_state != TCP_CLOSE)
+			return sk;
+	} else {
+		return sk;
+	}
+
+	reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff,
+					  saddr, sport, daddr, ntohs(dport),
+					  ehashfn);
+	if (!reuse_sk)
+		return sk;
+
+	/* We've chosen a new reuseport sock which is never refcounted. This
+	 * implies that sk also isn't refcounted.
+	 */
+	WARN_ON_ONCE(*refcounted);
+
+	return reuse_sk;
+}
+
 static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo,
 					      struct sk_buff *skb, int doff,
 					      const __be16 sport,
@@ -85,14 +150,20 @@ static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo,
 					      int iif, int sdif,
 					      bool *refcounted)
 {
-	struct sock *sk = skb_steal_sock(skb, refcounted);
-
+	struct net *net = dev_net(skb_dst(skb)->dev);
+	const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+	struct sock *sk;
+
+	sk = inet6_steal_sock(net, skb, doff, &ip6h->saddr, sport, &ip6h->daddr, dport,
+			      refcounted, inet6_ehashfn);
+	if (IS_ERR(sk))
+		return NULL;
 	if (sk)
 		return sk;
 
-	return __inet6_lookup(dev_net(skb_dst(skb)->dev), hashinfo, skb,
-			      doff, &ipv6_hdr(skb)->saddr, sport,
-			      &ipv6_hdr(skb)->daddr, ntohs(dport),
+	return __inet6_lookup(net, hashinfo, skb,
+			      doff, &ip6h->saddr, sport,
+			      &ip6h->daddr, ntohs(dport),
 			      iif, sdif, refcounted);
 }


@@ -379,6 +379,27 @@ struct sock *__inet_lookup_established(struct net *net,
 				       const __be32 daddr, const u16 hnum,
 				       const int dif, const int sdif);
 
+typedef u32 (inet_ehashfn_t)(const struct net *net,
+			     const __be32 laddr, const __u16 lport,
+			     const __be32 faddr, const __be16 fport);
+
+inet_ehashfn_t inet_ehashfn;
+
+INDIRECT_CALLABLE_DECLARE(inet_ehashfn_t udp_ehashfn);
+
+struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk,
+				   struct sk_buff *skb, int doff,
+				   __be32 saddr, __be16 sport,
+				   __be32 daddr, unsigned short hnum,
+				   inet_ehashfn_t *ehashfn);
+
+struct sock *inet_lookup_run_sk_lookup(struct net *net,
+				       int protocol,
+				       struct sk_buff *skb, int doff,
+				       __be32 saddr, __be16 sport,
+				       __be32 daddr, u16 hnum, const int dif,
+				       inet_ehashfn_t *ehashfn);
+
 static inline struct sock *
 	inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo,
 				const __be32 saddr, const __be16 sport,
@@ -428,6 +449,46 @@ static inline struct sock *inet_lookup(struct net *net,
 	return sk;
 }
 
+static inline
+struct sock *inet_steal_sock(struct net *net, struct sk_buff *skb, int doff,
+			     const __be32 saddr, const __be16 sport,
+			     const __be32 daddr, const __be16 dport,
+			     bool *refcounted, inet_ehashfn_t *ehashfn)
+{
+	struct sock *sk, *reuse_sk;
+	bool prefetched;
+
+	sk = skb_steal_sock(skb, refcounted, &prefetched);
+	if (!sk)
+		return NULL;
+
+	if (!prefetched)
+		return sk;
+
+	if (sk->sk_protocol == IPPROTO_TCP) {
+		if (sk->sk_state != TCP_LISTEN)
+			return sk;
+	} else if (sk->sk_protocol == IPPROTO_UDP) {
+		if (sk->sk_state != TCP_CLOSE)
+			return sk;
+	} else {
+		return sk;
+	}
+
+	reuse_sk = inet_lookup_reuseport(net, sk, skb, doff,
+					 saddr, sport, daddr, ntohs(dport),
+					 ehashfn);
+	if (!reuse_sk)
+		return sk;
+
+	/* We've chosen a new reuseport sock which is never refcounted. This
+	 * implies that sk also isn't refcounted.
+	 */
+	WARN_ON_ONCE(*refcounted);
+
+	return reuse_sk;
+}
+
 static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo,
 					     struct sk_buff *skb,
 					     int doff,
@@ -436,22 +497,23 @@ static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo,
 					     const int sdif,
 					     bool *refcounted)
 {
-	struct sock *sk = skb_steal_sock(skb, refcounted);
+	struct net *net = dev_net(skb_dst(skb)->dev);
 	const struct iphdr *iph = ip_hdr(skb);
+	struct sock *sk;
 
+	sk = inet_steal_sock(net, skb, doff, iph->saddr, sport, iph->daddr, dport,
+			     refcounted, inet_ehashfn);
+	if (IS_ERR(sk))
+		return NULL;
 	if (sk)
 		return sk;
 
-	return __inet_lookup(dev_net(skb_dst(skb)->dev), hashinfo, skb,
+	return __inet_lookup(net, hashinfo, skb,
 			     doff, iph->saddr, sport,
 			     iph->daddr, dport, inet_iif(skb), sdif,
 			     refcounted);
 }
 
-u32 inet6_ehashfn(const struct net *net,
-		  const struct in6_addr *laddr, const u16 lport,
-		  const struct in6_addr *faddr, const __be16 fport);
-
 static inline void sk_daddr_set(struct sock *sk, __be32 addr)
 {
 	sk->sk_daddr = addr; /* alias of inet_daddr */


@@ -4,6 +4,8 @@
 #ifndef _MANA_H
 #define _MANA_H
 
+#include <net/xdp.h>
+
 #include "gdma.h"
 #include "hw_channel.h"


@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_NETDEV_RX_QUEUE_H
+#define _LINUX_NETDEV_RX_QUEUE_H
+
+#include <linux/kobject.h>
+#include <linux/netdevice.h>
+#include <linux/sysfs.h>
+#include <net/xdp.h>
+
+/* This structure contains an instance of an RX queue. */
+struct netdev_rx_queue {
+	struct xdp_rxq_info		xdp_rxq;
+#ifdef CONFIG_RPS
+	struct rps_map __rcu		*rps_map;
+	struct rps_dev_flow_table __rcu	*rps_flow_table;
+#endif
+	struct kobject			kobj;
+	struct net_device		*dev;
+	netdevice_tracker		dev_tracker;
+
+#ifdef CONFIG_XDP_SOCKETS
+	struct xsk_buff_pool		*pool;
+#endif
+} ____cacheline_aligned_in_smp;
+
+/*
+ * RX queue sysfs structures and functions.
+ */
+struct rx_queue_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct netdev_rx_queue *queue, char *buf);
+	ssize_t (*store)(struct netdev_rx_queue *queue,
+			 const char *buf, size_t len);
+};
+
+static inline struct netdev_rx_queue *
+__netif_get_rx_queue(struct net_device *dev, unsigned int rxq)
+{
+	return dev->_rx + rxq;
+}
+
+#ifdef CONFIG_SYSFS
+static inline unsigned int
+get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
+{
+	struct net_device *dev = queue->dev;
+	int index = queue - dev->_rx;
+
+	BUG_ON(index >= dev->num_rx_queues);
+	return index;
+}
+#endif
+#endif
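
Code that needs the moved helpers now pulls in the dedicated header. A sketch of the access pattern (hypothetical caller, not a specific user in this series):

#include <net/netdev_rx_queue.h>

/* Fetch the xdp_rxq_info backing RX queue 0 of a device. */
static struct xdp_rxq_info *first_rxq_info(struct net_device *dev)
{
	struct netdev_rx_queue *rxq = __netif_get_rx_queue(dev, 0);

	return &rxq->xdp_rxq;
}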


@@ -2815,20 +2815,23 @@ sk_is_refcounted(struct sock *sk)
  * skb_steal_sock - steal a socket from an sk_buff
  * @skb: sk_buff to steal the socket from
  * @refcounted: is set to true if the socket is reference-counted
+ * @prefetched: is set to true if the socket was assigned from bpf
  */
 static inline struct sock *
-skb_steal_sock(struct sk_buff *skb, bool *refcounted)
+skb_steal_sock(struct sk_buff *skb, bool *refcounted, bool *prefetched)
 {
 	if (skb->sk) {
 		struct sock *sk = skb->sk;
 
 		*refcounted = true;
-		if (skb_sk_is_prefetched(skb))
+		*prefetched = skb_sk_is_prefetched(skb);
+		if (*prefetched)
 			*refcounted = sk_is_refcounted(sk);
 		skb->destructor = NULL;
 		skb->sk = NULL;
 		return sk;
 	}
+	*prefetched = false;
 	*refcounted = false;
 	return NULL;
 }


@@ -6,9 +6,10 @@
 #ifndef __LINUX_NET_XDP_H__
 #define __LINUX_NET_XDP_H__
 
-#include <linux/skbuff.h> /* skb_shared_info */
-#include <uapi/linux/netdev.h>
 #include <linux/bitfield.h>
+#include <linux/filter.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h> /* skb_shared_info */
 
 /**
  * DOC: XDP RX-queue information
@@ -45,8 +46,6 @@ enum xdp_mem_type {
 	MEM_TYPE_MAX,
 };
 
-typedef u32 xdp_features_t;
-
 /* XDP flags for ndo_xdp_xmit */
 #define XDP_XMIT_FLUSH		(1U << 0)	/* doorbell signal consumer */
 #define XDP_XMIT_FLAGS_MASK	XDP_XMIT_FLUSH
@@ -443,6 +442,12 @@ enum xdp_rss_hash_type {
 	XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP | XDP_RSS_L3_DYNHDR,
 };
 
+struct xdp_metadata_ops {
+	int	(*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp);
+	int	(*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash,
+			       enum xdp_rss_hash_type *rss_type);
+};
+
 #ifdef CONFIG_NET
 u32 bpf_xdp_metadata_kfunc_id(int id);
 bool bpf_dev_bound_kfunc_id(u32 btf_id);
@@ -474,4 +479,20 @@ static inline void xdp_clear_features_flag(struct net_device *dev)
 	xdp_set_features_flag(dev, 0);
 }
 
+static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
+					    struct xdp_buff *xdp)
+{
+	/* Driver XDP hooks are invoked within a single NAPI poll cycle and thus
+	 * under local_bh_disable(), which provides the needed RCU protection
+	 * for accessing map entries.
+	 */
+	u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+
+	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
+		if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev))
+			act = xdp_master_redirect(xdp);
+	}
+
+	return act;
+}
+
 #endif /* __LINUX_NET_XDP_H__ */
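
The helper keeps its contract after moving here. The canonical driver-side use still looks roughly like this (simplified sketch; real drivers also service the XDP_TX and XDP_REDIRECT paths):

static u32 rx_run_xdp(struct net_device *dev, struct bpf_prog *prog,
		      struct xdp_buff *xdp)
{
	u32 act = bpf_prog_run_xdp(prog, xdp);

	switch (act) {
	case XDP_PASS:
	case XDP_TX:
	case XDP_REDIRECT:
		return act;
	default:
		bpf_warn_invalid_xdp_action(dev, prog, act);
		fallthrough;
	case XDP_ABORTED:
	case XDP_DROP:
		return XDP_DROP;
	}
}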


@@ -9,6 +9,7 @@
 #include <linux/filter.h>
 #include <linux/tracepoint.h>
 #include <linux/bpf.h>
+#include <net/xdp.h>
 
 #define __XDP_ACT_MAP(FN)	\
 	FN(ABORTED)		\
@@ -404,6 +405,23 @@ TRACE_EVENT(mem_return_failed,
 	)
 );
 
+TRACE_EVENT(bpf_xdp_link_attach_failed,
+
+	TP_PROTO(const char *msg),
+
+	TP_ARGS(msg),
+
+	TP_STRUCT__entry(
+		__string(msg, msg)
+	),
+
+	TP_fast_assign(
+		__assign_str(msg, msg);
+	),
+
+	TP_printk("errmsg=%s", __get_str(msg))
+);
+
 #endif /* _TRACE_XDP_H */
 
 #include <trace/define_trace.h>
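
On the emitting side, the core mirrors the extack message into the tracepoint. A condensed sketch of the dev_xdp_attach() error-path usage added by this series (assumed shape, not the literal code):

#include <trace/events/xdp.h>

static void note_xdp_attach_failure(struct netlink_ext_ack *extack)
{
	if (extack && extack->_msg)
		trace_bpf_xdp_link_attach_failed(extack->_msg);
}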


@@ -19,6 +19,7 @@
 /* ld/ldx fields */
 #define BPF_DW		0x18	/* double word (64-bit) */
+#define BPF_MEMSX	0x80	/* load with sign extension */
 #define BPF_ATOMIC	0xc0	/* atomic memory ops - op type in immediate */
 #define BPF_XADD	0xc0	/* exclusive add - legacy name */
@@ -1187,6 +1188,11 @@ enum bpf_perf_event_type {
  */
 #define BPF_F_KPROBE_MULTI_RETURN	(1U << 0)
 
+/* link_create.netfilter.flags used in LINK_CREATE command for
+ * BPF_PROG_TYPE_NETFILTER to enable IP packet defragmentation.
+ */
+#define BPF_F_NETFILTER_IP_DEFRAG (1U << 0)
+
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
  * the following extensions:
  *
@@ -4198,9 +4204,6 @@ union bpf_attr {
 *		**-EOPNOTSUPP** if the operation is not supported, for example
 *		a call from outside of TC ingress.
 *
-*		**-ESOCKTNOSUPPORT** if the socket type is not supported
-*		(reuseport).
-*
 * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
 *	Description
 *		Helper is overloaded depending on BPF program type. This
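
From userspace the new flag is supplied at link creation. With libbpf this can look like the following sketch (option usage modeled on the selftests in this series):

#include <bpf/libbpf.h>
#include <linux/netfilter.h>

static struct bpf_link *attach_with_defrag(struct bpf_program *prog)
{
	LIBBPF_OPTS(bpf_netfilter_opts, opts,
		.pf = NFPROTO_IPV4,
		.hooknum = NF_INET_PRE_ROUTING,
		.priority = -128,
		.flags = BPF_F_NETFILTER_IP_DEFRAG);

	return bpf_program__attach_netfilter(prog, &opts);
}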

View File

@@ -29,6 +29,7 @@
#include <net/netfilter/nf_bpf_link.h>
#include <net/sock.h>
+#include <net/xdp.h>
#include "../tools/lib/bpf/relo_core.h"

/* BTF (BPF Type Format) is the meta data format which describes

View File

@@ -61,6 +61,7 @@
#define AX	regs[BPF_REG_AX]
#define ARG1	regs[BPF_REG_ARG1]
#define CTX	regs[BPF_REG_CTX]
+#define OFF	insn->off
#define IMM	insn->imm

struct bpf_mem_alloc bpf_global_ma;
@@ -372,7 +373,12 @@ static int bpf_adj_delta_to_off(struct bpf_insn *insn, u32 pos, s32 end_old,
{
	const s32 off_min = S16_MIN, off_max = S16_MAX;
	s32 delta = end_new - end_old;
-	s32 off = insn->off;
+	s32 off;
+
+	if (insn->code == (BPF_JMP32 | BPF_JA))
+		off = insn->imm;
+	else
+		off = insn->off;

	if (curr < pos && curr + off + 1 >= end_old)
		off += delta;
@@ -380,8 +386,12 @@ static int bpf_adj_delta_to_off(struct bpf_insn *insn, u32 pos, s32 end_old,
		off -= delta;
	if (off < off_min || off > off_max)
		return -ERANGE;
-	if (!probe_pass)
-		insn->off = off;
+	if (!probe_pass) {
+		if (insn->code == (BPF_JMP32 | BPF_JA))
+			insn->imm = off;
+		else
+			insn->off = off;
+	}

	return 0;
}
@@ -1271,7 +1281,7 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
	case BPF_ALU | BPF_MOD | BPF_K:
		*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
		*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
-		*to++ = BPF_ALU32_REG(from->code, from->dst_reg, BPF_REG_AX);
+		*to++ = BPF_ALU32_REG_OFF(from->code, from->dst_reg, BPF_REG_AX, from->off);
		break;

	case BPF_ALU64 | BPF_ADD | BPF_K:
@@ -1285,7 +1295,7 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
	case BPF_ALU64 | BPF_MOD | BPF_K:
		*to++ = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
		*to++ = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
-		*to++ = BPF_ALU64_REG(from->code, from->dst_reg, BPF_REG_AX);
+		*to++ = BPF_ALU64_REG_OFF(from->code, from->dst_reg, BPF_REG_AX, from->off);
		break;

	case BPF_JMP | BPF_JEQ | BPF_K:
@@ -1523,6 +1533,7 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
	INSN_3(ALU64, DIV,  X),			\
	INSN_3(ALU64, MOD,  X),			\
	INSN_2(ALU64, NEG),			\
+	INSN_3(ALU64, END, TO_LE),		\
	/* Immediate based. */			\
	INSN_3(ALU64, ADD,  K),			\
	INSN_3(ALU64, SUB,  K),			\
@@ -1591,6 +1602,7 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
	INSN_3(JMP, JSLE, K),			\
	INSN_3(JMP, JSET, K),			\
	INSN_2(JMP, JA),			\
+	INSN_2(JMP32, JA),			\
	/* Store instructions. */		\
	/*   Register based. */			\
	INSN_3(STX, MEM,  B),			\
@@ -1610,6 +1622,9 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
	INSN_3(LDX, MEM, H),			\
	INSN_3(LDX, MEM, W),			\
	INSN_3(LDX, MEM, DW),			\
+	INSN_3(LDX, MEMSX, B),			\
+	INSN_3(LDX, MEMSX, H),			\
+	INSN_3(LDX, MEMSX, W),			\
	/*   Immediate based. */		\
	INSN_3(LD, IMM, DW)
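The cpu v4 instructions reuse existing opcode space and are distinguished by the mode, off, or imm fields. A sketch of how the new encodings compose (summary derived from this series, shown as an illustrative bpf_insn initializer):

    /* New cpu v4 encodings, informally:
     *   sign-extending load:  BPF_LDX | BPF_MEMSX | BPF_B/H/W
     *   sign-extending mov:   BPF_ALU/BPF_ALU64 | BPF_MOV | BPF_X, off = 8/16/32
     *   signed div/mod:       BPF_ALU/BPF_ALU64 | BPF_DIV/BPF_MOD, off = 1
     *   unconditional bswap:  BPF_ALU64 | BPF_END | BPF_TO_LE, imm = 16/32/64
     *   32-bit jump offset:   BPF_JMP32 | BPF_JA, target delta in imm ("gotol")
     */
    struct bpf_insn sext_load = {
            .code    = BPF_LDX | BPF_MEMSX | BPF_B,
            .dst_reg = BPF_REG_0,
            .src_reg = BPF_REG_1,
            .off     = 0,   /* memory offset, exactly as for BPF_MEM loads */
    };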
@@ -1635,12 +1650,6 @@ bool bpf_opcode_in_insntable(u8 code)
}

#ifndef CONFIG_BPF_JIT_ALWAYS_ON
-u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
-{
-	memset(dst, 0, size);
-	return -EFAULT;
-}
-
/**
 *	___bpf_prog_run - run eBPF program on a given context
 *	@regs: is the array of MAX_BPF_EXT_REG eBPF pseudo-registers
@@ -1666,6 +1675,9 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
		[BPF_LDX | BPF_PROBE_MEM | BPF_H] = &&LDX_PROBE_MEM_H,
		[BPF_LDX | BPF_PROBE_MEM | BPF_W] = &&LDX_PROBE_MEM_W,
		[BPF_LDX | BPF_PROBE_MEM | BPF_DW] = &&LDX_PROBE_MEM_DW,
+		[BPF_LDX | BPF_PROBE_MEMSX | BPF_B] = &&LDX_PROBE_MEMSX_B,
+		[BPF_LDX | BPF_PROBE_MEMSX | BPF_H] = &&LDX_PROBE_MEMSX_H,
+		[BPF_LDX | BPF_PROBE_MEMSX | BPF_W] = &&LDX_PROBE_MEMSX_W,
	};
#undef BPF_INSN_3_LBL
#undef BPF_INSN_2_LBL
@@ -1733,13 +1745,36 @@ select_insn:
		DST = -DST;
		CONT;
	ALU_MOV_X:
+		switch (OFF) {
+		case 0:
			DST = (u32) SRC;
+			break;
+		case 8:
+			DST = (u32)(s8) SRC;
+			break;
+		case 16:
+			DST = (u32)(s16) SRC;
+			break;
+		}
		CONT;
	ALU_MOV_K:
		DST = (u32) IMM;
		CONT;
	ALU64_MOV_X:
+		switch (OFF) {
+		case 0:
			DST = SRC;
+			break;
+		case 8:
+			DST = (s8) SRC;
+			break;
+		case 16:
+			DST = (s16) SRC;
+			break;
+		case 32:
+			DST = (s32) SRC;
+			break;
+		}
		CONT;
	ALU64_MOV_K:
		DST = IMM;
@@ -1761,36 +1796,114 @@ select_insn:
		(*(s64 *) &DST) >>= IMM;
		CONT;
	ALU64_MOD_X:
+		switch (OFF) {
+		case 0:
			div64_u64_rem(DST, SRC, &AX);
			DST = AX;
+			break;
+		case 1:
+			AX = div64_s64(DST, SRC);
+			DST = DST - AX * SRC;
+			break;
+		}
		CONT;
	ALU_MOD_X:
+		switch (OFF) {
+		case 0:
			AX = (u32) DST;
			DST = do_div(AX, (u32) SRC);
+			break;
+		case 1:
+			AX = abs((s32)DST);
+			AX = do_div(AX, abs((s32)SRC));
+			if ((s32)DST < 0)
+				DST = (u32)-AX;
+			else
+				DST = (u32)AX;
+			break;
+		}
		CONT;
	ALU64_MOD_K:
+		switch (OFF) {
+		case 0:
			div64_u64_rem(DST, IMM, &AX);
			DST = AX;
+			break;
+		case 1:
+			AX = div64_s64(DST, IMM);
+			DST = DST - AX * IMM;
+			break;
+		}
		CONT;
	ALU_MOD_K:
+		switch (OFF) {
+		case 0:
			AX = (u32) DST;
			DST = do_div(AX, (u32) IMM);
+			break;
+		case 1:
+			AX = abs((s32)DST);
+			AX = do_div(AX, abs((s32)IMM));
+			if ((s32)DST < 0)
+				DST = (u32)-AX;
+			else
+				DST = (u32)AX;
+			break;
+		}
		CONT;
	ALU64_DIV_X:
+		switch (OFF) {
+		case 0:
			DST = div64_u64(DST, SRC);
+			break;
+		case 1:
+			DST = div64_s64(DST, SRC);
+			break;
+		}
		CONT;
	ALU_DIV_X:
+		switch (OFF) {
+		case 0:
			AX = (u32) DST;
			do_div(AX, (u32) SRC);
			DST = (u32) AX;
+			break;
+		case 1:
+			AX = abs((s32)DST);
+			do_div(AX, abs((s32)SRC));
+			if (((s32)DST < 0) == ((s32)SRC < 0))
+				DST = (u32)AX;
+			else
+				DST = (u32)-AX;
+			break;
+		}
		CONT;
	ALU64_DIV_K:
+		switch (OFF) {
+		case 0:
			DST = div64_u64(DST, IMM);
+			break;
+		case 1:
+			DST = div64_s64(DST, IMM);
+			break;
+		}
		CONT;
	ALU_DIV_K:
+		switch (OFF) {
+		case 0:
			AX = (u32) DST;
			do_div(AX, (u32) IMM);
			DST = (u32) AX;
+			break;
+		case 1:
+			AX = abs((s32)DST);
+			do_div(AX, abs((s32)IMM));
+			if (((s32)DST < 0) == ((s32)IMM < 0))
+				DST = (u32)AX;
+			else
+				DST = (u32)-AX;
+			break;
+		}
		CONT;
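The off == 1 handlers implement truncated signed division: the quotient rounds toward zero and the remainder takes the sign of the dividend, which is also what native C '/' and '%' do. A self-contained check of that semantic:

    #include <assert.h>

    int main(void)
    {
            /* C99 division truncates toward zero, matching the BPF sdiv/smod
             * semantics implemented by the off == 1 cases above. */
            assert(-7 / 2 == -3 && -7 % 2 == -1);
            assert(7 / -2 == -3 && 7 % -2 == 1);
            return 0;
    }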
	ALU_END_TO_BE:
		switch (IMM) {
@@ -1818,6 +1931,19 @@ select_insn:
			break;
		}
		CONT;
+	ALU64_END_TO_LE:
+		switch (IMM) {
+		case 16:
+			DST = (__force u16) __swab16(DST);
+			break;
+		case 32:
+			DST = (__force u32) __swab32(DST);
+			break;
+		case 64:
+			DST = (__force u64) __swab64(DST);
+			break;
+		}
+		CONT;

	/* CALL */
	JMP_CALL:
@@ -1867,6 +1993,9 @@ out:
	JMP_JA:
		insn += insn->off;
		CONT;
+	JMP32_JA:
+		insn += insn->imm;
+		CONT;
	JMP_EXIT:
		return BPF_R0;
	/* JMP */
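Unlike the BPF_ALU BPF_END forms, which are conditional on host endianness (to_be is a no-op on big-endian hosts), the new BPF_ALU64 BPF_END handler always swaps. A minimal illustration of the 64-bit case:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
            /* Unconditional byte swap, the semantic of the new insn; the
             * kernel's __swab64() in the handler above does the same. */
            uint64_t x = 0x0123456789abcdefULL;

            assert(__builtin_bswap64(x) == 0xefcdab8967452301ULL);
            return 0;
    }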
@@ -1931,7 +2060,7 @@ out:
		DST = *(SIZE *)(unsigned long) (SRC + insn->off);	\
		CONT;							\
	LDX_PROBE_MEM_##SIZEOP:						\
-		bpf_probe_read_kernel(&DST, sizeof(SIZE),		\
+		bpf_probe_read_kernel_common(&DST, sizeof(SIZE),	\
			(const void *)(long) (SRC + insn->off));	\
		DST = *((SIZE *)&DST);					\
		CONT;
@@ -1942,6 +2071,21 @@ out:
	LDST(DW, u64)
#undef LDST

+#define LDSX(SIZEOP, SIZE)						\
+	LDX_MEMSX_##SIZEOP:						\
+		DST = *(SIZE *)(unsigned long) (SRC + insn->off);	\
+		CONT;							\
+	LDX_PROBE_MEMSX_##SIZEOP:					\
+		bpf_probe_read_kernel_common(&DST, sizeof(SIZE),	\
+			(const void *)(long) (SRC + insn->off));	\
+		DST = *((SIZE *)&DST);					\
+		CONT;
+
+	LDSX(B,  s8)
+	LDSX(H, s16)
+	LDSX(W, s32)
+#undef LDSX
+
#define ATOMIC_ALU_OP(BOP, KOP)			\
		case BOP:			\
			if (BPF_SIZE(insn->code) == BPF_W) \

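The LDSX handlers load a narrow value and sign-extend it into the full 64-bit register, by virtue of dereferencing through a signed type (s8/s16/s32). In plain C terms:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
            uint8_t mem = 0x80;     /* -128 when read as s8 */
            int64_t dst;

            /* BPF_LDX | BPF_MEM   | BPF_B zero-extends: dst == 0x80
             * BPF_LDX | BPF_MEMSX | BPF_B sign-extends: dst == -128 */
            dst = *(int8_t *)&mem;
            assert(dst == -128);
            assert((uint64_t)dst == 0xffffffffffffff80ULL);
            return 0;
    }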
View File

@@ -61,8 +61,6 @@ struct bpf_cpu_map_entry {
	/* XDP can run multiple RX-ring queues, need __percpu enqueue store */
	struct xdp_bulk_queue __percpu *bulkq;

-	struct bpf_cpu_map *cmap;
-
	/* Queue with potential multi-producers, and single-consumer kthread */
	struct ptr_ring *queue;
	struct task_struct *kthread;
@@ -595,7 +593,6 @@ static long cpu_map_update_elem(struct bpf_map *map, void *key, void *value,
		rcpu = __cpu_map_entry_alloc(map, &cpumap_value, key_cpu);
		if (!rcpu)
			return -ENOMEM;
-		rcpu->cmap = cmap;
	}
	rcu_read_lock();
	__cpu_map_entry_replace(cmap, key_cpu, rcpu);

View File

@@ -65,7 +65,6 @@ struct xdp_dev_bulk_queue {
struct bpf_dtab_netdev {
	struct net_device *dev; /* must be first member, due to tracepoint */
	struct hlist_node index_hlist;
-	struct bpf_dtab *dtab;
	struct bpf_prog *xdp_prog;
	struct rcu_head rcu;
	unsigned int idx;
@@ -874,7 +873,6 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
	}

	dev->idx = idx;
-	dev->dtab = dtab;
	if (prog) {
		dev->xdp_prog = prog;
		dev->val.bpf_prog.id = prog->aux->id;

View File

@@ -87,6 +87,17 @@ const char *const bpf_alu_string[16] = {
	[BPF_END >> 4]  = "endian",
};

+const char *const bpf_alu_sign_string[16] = {
+	[BPF_DIV >> 4]  = "s/=",
+	[BPF_MOD >> 4]  = "s%=",
+};
+
+const char *const bpf_movsx_string[4] = {
+	[0] = "(s8)",
+	[1] = "(s16)",
+	[3] = "(s32)",
+};
+
static const char *const bpf_atomic_alu_string[16] = {
	[BPF_ADD >> 4]  = "add",
	[BPF_AND >> 4]  = "and",
@@ -101,6 +112,12 @@ static const char *const bpf_ldst_string[] = {
	[BPF_DW >> 3] = "u64",
};

+static const char *const bpf_ldsx_string[] = {
+	[BPF_W >> 3] = "s32",
+	[BPF_H >> 3] = "s16",
+	[BPF_B >> 3] = "s8",
+};
+
static const char *const bpf_jmp_string[16] = {
	[BPF_JA >> 4]   = "jmp",
	[BPF_JEQ >> 4]  = "==",
@@ -128,6 +145,27 @@ static void print_bpf_end_insn(bpf_insn_print_t verbose,
		insn->imm, insn->dst_reg);
}

+static void print_bpf_bswap_insn(bpf_insn_print_t verbose,
+				 void *private_data,
+				 const struct bpf_insn *insn)
+{
+	verbose(private_data, "(%02x) r%d = bswap%d r%d\n",
+		insn->code, insn->dst_reg,
+		insn->imm, insn->dst_reg);
+}
+
+static bool is_sdiv_smod(const struct bpf_insn *insn)
+{
+	return (BPF_OP(insn->code) == BPF_DIV || BPF_OP(insn->code) == BPF_MOD) &&
+	       insn->off == 1;
+}
+
+static bool is_movsx(const struct bpf_insn *insn)
+{
+	return BPF_OP(insn->code) == BPF_MOV &&
+	       (insn->off == 8 || insn->off == 16 || insn->off == 32);
+}
+
void print_bpf_insn(const struct bpf_insn_cbs *cbs,
		    const struct bpf_insn *insn,
		    bool allow_ptr_leaks)
@@ -138,7 +176,7 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
	if (class == BPF_ALU || class == BPF_ALU64) {
		if (BPF_OP(insn->code) == BPF_END) {
			if (class == BPF_ALU64)
-				verbose(cbs->private_data, "BUG_alu64_%02x\n", insn->code);
+				print_bpf_bswap_insn(verbose, cbs->private_data, insn);
			else
				print_bpf_end_insn(verbose, cbs->private_data, insn);
		} else if (BPF_OP(insn->code) == BPF_NEG) {
@@ -147,17 +185,20 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
				insn->dst_reg, class == BPF_ALU ? 'w' : 'r',
				insn->dst_reg);
		} else if (BPF_SRC(insn->code) == BPF_X) {
-			verbose(cbs->private_data, "(%02x) %c%d %s %c%d\n",
+			verbose(cbs->private_data, "(%02x) %c%d %s %s%c%d\n",
				insn->code, class == BPF_ALU ? 'w' : 'r',
				insn->dst_reg,
-				bpf_alu_string[BPF_OP(insn->code) >> 4],
+				is_sdiv_smod(insn) ? bpf_alu_sign_string[BPF_OP(insn->code) >> 4]
						   : bpf_alu_string[BPF_OP(insn->code) >> 4],
+				is_movsx(insn) ? bpf_movsx_string[(insn->off >> 3) - 1] : "",
				class == BPF_ALU ? 'w' : 'r',
				insn->src_reg);
		} else {
			verbose(cbs->private_data, "(%02x) %c%d %s %d\n",
				insn->code, class == BPF_ALU ? 'w' : 'r',
				insn->dst_reg,
-				bpf_alu_string[BPF_OP(insn->code) >> 4],
+				is_sdiv_smod(insn) ? bpf_alu_sign_string[BPF_OP(insn->code) >> 4]
						   : bpf_alu_string[BPF_OP(insn->code) >> 4],
				insn->imm);
		}
	} else if (class == BPF_STX) {
@@ -218,13 +259,15 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
			verbose(cbs->private_data, "BUG_st_%02x\n", insn->code);
		}
	} else if (class == BPF_LDX) {
-		if (BPF_MODE(insn->code) != BPF_MEM) {
+		if (BPF_MODE(insn->code) != BPF_MEM && BPF_MODE(insn->code) != BPF_MEMSX) {
			verbose(cbs->private_data, "BUG_ldx_%02x\n", insn->code);
			return;
		}
		verbose(cbs->private_data, "(%02x) r%d = *(%s *)(r%d %+d)\n",
			insn->code, insn->dst_reg,
-			bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+			BPF_MODE(insn->code) == BPF_MEM ?
+				bpf_ldst_string[BPF_SIZE(insn->code) >> 3] :
+				bpf_ldsx_string[BPF_SIZE(insn->code) >> 3],
			insn->src_reg, insn->off);
	} else if (class == BPF_LD) {
		if (BPF_MODE(insn->code) == BPF_ABS) {
@@ -279,6 +322,9 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
	} else if (insn->code == (BPF_JMP | BPF_JA)) {
		verbose(cbs->private_data, "(%02x) goto pc%+d\n",
			insn->code, insn->off);
+	} else if (insn->code == (BPF_JMP32 | BPF_JA)) {
+		verbose(cbs->private_data, "(%02x) gotol pc%+d\n",
+			insn->code, insn->imm);
	} else if (insn->code == (BPF_JMP | BPF_EXIT)) {
		verbose(cbs->private_data, "(%02x) exit\n", insn->code);
	} else if (BPF_SRC(insn->code) == BPF_X) {
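With these strings in place, the new instructions disassemble roughly as follows (derived from the format strings above; the leading opcode bytes are omitted here):

    /* r1 = bswap64 r1         ALU64 BPF_END, imm == 64
     * r2 s/= r3               signed division, off == 1
     * r2 s%= r3               signed modulo, off == 1
     * r4 = (s8)r5             sign-extending mov, off == 8
     * r6 = *(s16 *)(r7 +0)    sign-extending load, BPF_MEMSX
     * gotol pc+70000          BPF_JMP32 | BPF_JA, 32-bit offset in imm
     */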

View File

@@ -183,11 +183,11 @@ static void inc_active(struct bpf_mem_cache *c, unsigned long *flags)
	WARN_ON_ONCE(local_inc_return(&c->active) != 1);
}

-static void dec_active(struct bpf_mem_cache *c, unsigned long flags)
+static void dec_active(struct bpf_mem_cache *c, unsigned long *flags)
{
	local_dec(&c->active);
	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		local_irq_restore(flags);
+		local_irq_restore(*flags);
}

static void add_obj_to_free_list(struct bpf_mem_cache *c, void *obj)
@@ -197,16 +197,20 @@ static void add_obj_to_free_list(struct bpf_mem_cache *c, void *obj)
	inc_active(c, &flags);
	__llist_add(obj, &c->free_llist);
	c->free_cnt++;
-	dec_active(c, flags);
+	dec_active(c, &flags);
}

/* Mostly runs from irq_work except __init phase. */
-static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
+static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node, bool atomic)
{
	struct mem_cgroup *memcg = NULL, *old_memcg;
+	gfp_t gfp;
	void *obj;
	int i;

+	gfp = __GFP_NOWARN | __GFP_ACCOUNT;
+	gfp |= atomic ? GFP_NOWAIT : GFP_KERNEL;
+
	for (i = 0; i < cnt; i++) {
		/*
		 * For every 'c' llist_del_first(&c->free_by_rcu_ttrace); is
@@ -238,7 +242,7 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
		 * will allocate from the current numa node which is what we
		 * want here.
		 */
-		obj = __alloc(c, node, GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT);
+		obj = __alloc(c, node, gfp);
		if (!obj)
			break;
		add_obj_to_free_list(c, obj);
@@ -344,7 +348,7 @@ static void free_bulk(struct bpf_mem_cache *c)
			cnt = --c->free_cnt;
		else
			cnt = 0;
-		dec_active(c, flags);
+		dec_active(c, &flags);
		if (llnode)
			enque_to_free(tgt, llnode);
	} while (cnt > (c->high_watermark + c->low_watermark) / 2);
@@ -384,7 +388,7 @@ static void check_free_by_rcu(struct bpf_mem_cache *c)
		llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra_rcu))
			if (__llist_add(llnode, &c->free_by_rcu))
				c->free_by_rcu_tail = llnode;
-		dec_active(c, flags);
+		dec_active(c, &flags);
	}

	if (llist_empty(&c->free_by_rcu))
@@ -408,7 +412,7 @@ static void check_free_by_rcu(struct bpf_mem_cache *c)
	inc_active(c, &flags);
	WRITE_ONCE(c->waiting_for_gp.first, __llist_del_all(&c->free_by_rcu));
	c->waiting_for_gp_tail = c->free_by_rcu_tail;
-	dec_active(c, flags);
+	dec_active(c, &flags);

	if (unlikely(READ_ONCE(c->draining))) {
		free_all(llist_del_all(&c->waiting_for_gp), !!c->percpu_size);
@@ -429,7 +433,7 @@ static void bpf_mem_refill(struct irq_work *work)
		/* irq_work runs on this cpu and kmalloc will allocate
		 * from the current numa node which is what we want here.
		 */
-		alloc_bulk(c, c->batch, NUMA_NO_NODE);
+		alloc_bulk(c, c->batch, NUMA_NO_NODE, true);
	else if (cnt > c->high_watermark)
		free_bulk(c);
@@ -477,7 +481,7 @@ static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
	 * prog won't be doing more than 4 map_update_elem from
	 * irq disabled region
	 */
-	alloc_bulk(c, c->unit_size <= 256 ? 4 : 1, cpu_to_node(cpu));
+	alloc_bulk(c, c->unit_size <= 256 ? 4 : 1, cpu_to_node(cpu), false);
}

/* When size != 0 bpf_mem_cache for each cpu.

View File

@@ -25,6 +25,7 @@
#include <linux/rhashtable.h>
#include <linux/rtnetlink.h>
#include <linux/rwsem.h>
+#include <net/xdp.h>

/* Protects offdevs, members of bpf_offload_netdev and offload members
 * of all progs.

View File

@@ -26,6 +26,7 @@
#include <linux/poison.h>
#include <linux/module.h>
#include <linux/cpumask.h>
+#include <net/xdp.h>

#include "disasm.h"
@@ -2855,6 +2856,9 @@ static int check_subprogs(struct bpf_verifier_env *env)
			goto next;
		if (BPF_OP(code) == BPF_EXIT || BPF_OP(code) == BPF_CALL)
			goto next;
+		if (code == (BPF_JMP32 | BPF_JA))
+			off = i + insn[i].imm + 1;
+		else
			off = i + insn[i].off + 1;
		if (off < subprog_start || off >= subprog_end) {
			verbose(env, "jump out of range from insn %d to %d\n", i, off);
@@ -2867,6 +2871,7 @@ next:
		 * or unconditional jump back
		 */
		if (code != (BPF_JMP | BPF_EXIT) &&
+		    code != (BPF_JMP32 | BPF_JA) &&
		    code != (BPF_JMP | BPF_JA)) {
			verbose(env, "last insn is not an exit or jmp\n");
			return -EINVAL;
@@ -3012,8 +3017,10 @@ static bool is_reg64(struct bpf_verifier_env *env, struct bpf_insn *insn,
		}
	}

+	if (class == BPF_ALU64 && op == BPF_END && (insn->imm == 16 || insn->imm == 32))
+		return false;
+
	if (class == BPF_ALU64 || class == BPF_JMP ||
	    /* BPF_END always use BPF_ALU class. */
	    (class == BPF_ALU && op == BPF_END && insn->imm == 64))
		return true;
@@ -3421,7 +3428,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
		return 0;
	if (opcode == BPF_MOV) {
		if (BPF_SRC(insn->code) == BPF_X) {
-			/* dreg = sreg
+			/* dreg = sreg or dreg = (s8, s16, s32)sreg
			 * dreg needs precision after this insn
			 * sreg needs precision before this insn
			 */
@@ -5827,6 +5834,147 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
	__reg_combine_64_into_32(reg);
}

+static void set_sext64_default_val(struct bpf_reg_state *reg, int size)
+{
+	if (size == 1) {
+		reg->smin_value = reg->s32_min_value = S8_MIN;
+		reg->smax_value = reg->s32_max_value = S8_MAX;
+	} else if (size == 2) {
+		reg->smin_value = reg->s32_min_value = S16_MIN;
+		reg->smax_value = reg->s32_max_value = S16_MAX;
+	} else {
+		/* size == 4 */
+		reg->smin_value = reg->s32_min_value = S32_MIN;
+		reg->smax_value = reg->s32_max_value = S32_MAX;
+	}
+	reg->umin_value = reg->u32_min_value = 0;
+	reg->umax_value = U64_MAX;
+	reg->u32_max_value = U32_MAX;
+	reg->var_off = tnum_unknown;
+}
+
+static void coerce_reg_to_size_sx(struct bpf_reg_state *reg, int size)
+{
+	s64 init_s64_max, init_s64_min, s64_max, s64_min, u64_cval;
+	u64 top_smax_value, top_smin_value;
+	u64 num_bits = size * 8;
+
+	if (tnum_is_const(reg->var_off)) {
+		u64_cval = reg->var_off.value;
+		if (size == 1)
+			reg->var_off = tnum_const((s8)u64_cval);
+		else if (size == 2)
+			reg->var_off = tnum_const((s16)u64_cval);
+		else
+			/* size == 4 */
+			reg->var_off = tnum_const((s32)u64_cval);
+
+		u64_cval = reg->var_off.value;
+		reg->smax_value = reg->smin_value = u64_cval;
+		reg->umax_value = reg->umin_value = u64_cval;
+		reg->s32_max_value = reg->s32_min_value = u64_cval;
+		reg->u32_max_value = reg->u32_min_value = u64_cval;
+		return;
+	}
+
+	top_smax_value = ((u64)reg->smax_value >> num_bits) << num_bits;
+	top_smin_value = ((u64)reg->smin_value >> num_bits) << num_bits;
+
+	if (top_smax_value != top_smin_value)
+		goto out;
+
+	/* find the s64_min and s64_min after sign extension */
+	if (size == 1) {
+		init_s64_max = (s8)reg->smax_value;
+		init_s64_min = (s8)reg->smin_value;
+	} else if (size == 2) {
+		init_s64_max = (s16)reg->smax_value;
+		init_s64_min = (s16)reg->smin_value;
+	} else {
+		init_s64_max = (s32)reg->smax_value;
+		init_s64_min = (s32)reg->smin_value;
+	}
+
+	s64_max = max(init_s64_max, init_s64_min);
+	s64_min = min(init_s64_max, init_s64_min);
+
+	/* both of s64_max/s64_min positive or negative */
+	if ((s64_max >= 0) == (s64_min >= 0)) {
+		reg->smin_value = reg->s32_min_value = s64_min;
+		reg->smax_value = reg->s32_max_value = s64_max;
+		reg->umin_value = reg->u32_min_value = s64_min;
+		reg->umax_value = reg->u32_max_value = s64_max;
+		reg->var_off = tnum_range(s64_min, s64_max);
+		return;
+	}
+
+out:
+	set_sext64_default_val(reg, size);
+}
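A concrete instance of why the (s64_max >= 0) == (s64_min >= 0) guard is needed: once the narrow value straddles the sign bit, sign extension stops being order-preserving and no tight range survives. For size == 1:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
            /* A register known to be in [0x70, 0x90] straddles the s8 sign
             * bit, so the sign-extended endpoints swap order: */
            assert((int8_t)0x70 == 112);
            assert((int8_t)0x90 == -112);
            /* A naive [-112, 112] would wrongly admit values (e.g. 0) the
             * source range cannot produce, hence the fallback to the full
             * [S8_MIN, S8_MAX] default in set_sext64_default_val(). */
            return 0;
    }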
+static void set_sext32_default_val(struct bpf_reg_state *reg, int size)
+{
+	if (size == 1) {
+		reg->s32_min_value = S8_MIN;
+		reg->s32_max_value = S8_MAX;
+	} else {
+		/* size == 2 */
+		reg->s32_min_value = S16_MIN;
+		reg->s32_max_value = S16_MAX;
+	}
+	reg->u32_min_value = 0;
+	reg->u32_max_value = U32_MAX;
+}
+
+static void coerce_subreg_to_size_sx(struct bpf_reg_state *reg, int size)
+{
+	s32 init_s32_max, init_s32_min, s32_max, s32_min, u32_val;
+	u32 top_smax_value, top_smin_value;
+	u32 num_bits = size * 8;
+
+	if (tnum_is_const(reg->var_off)) {
+		u32_val = reg->var_off.value;
+		if (size == 1)
+			reg->var_off = tnum_const((s8)u32_val);
+		else
+			reg->var_off = tnum_const((s16)u32_val);
+
+		u32_val = reg->var_off.value;
+		reg->s32_min_value = reg->s32_max_value = u32_val;
+		reg->u32_min_value = reg->u32_max_value = u32_val;
+		return;
+	}
+
+	top_smax_value = ((u32)reg->s32_max_value >> num_bits) << num_bits;
+	top_smin_value = ((u32)reg->s32_min_value >> num_bits) << num_bits;
+
+	if (top_smax_value != top_smin_value)
+		goto out;
+
+	/* find the s32_min and s32_min after sign extension */
+	if (size == 1) {
+		init_s32_max = (s8)reg->s32_max_value;
+		init_s32_min = (s8)reg->s32_min_value;
+	} else {
+		/* size == 2 */
+		init_s32_max = (s16)reg->s32_max_value;
+		init_s32_min = (s16)reg->s32_min_value;
+	}
+	s32_max = max(init_s32_max, init_s32_min);
+	s32_min = min(init_s32_max, init_s32_min);
+
+	if ((s32_min >= 0) == (s32_max >= 0)) {
+		reg->s32_min_value = s32_min;
+		reg->s32_max_value = s32_max;
+		reg->u32_min_value = (u32)s32_min;
+		reg->u32_max_value = (u32)s32_max;
+		return;
+	}
+
+out:
+	set_sext32_default_val(reg, size);
+}
static bool bpf_map_is_rdonly(const struct bpf_map *map)
{
	/* A map is considered read-only if the following condition are true:
@@ -5847,7 +5995,8 @@ static bool bpf_map_is_rdonly(const struct bpf_map *map)
	       !bpf_map_write_active(map);
}

-static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val)
+static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val,
+			       bool is_ldsx)
{
	void *ptr;
	u64 addr;
@@ -5860,13 +6009,13 @@ static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val)
	switch (size) {
	case sizeof(u8):
-		*val = (u64)*(u8 *)ptr;
+		*val = is_ldsx ? (s64)*(s8 *)ptr : (u64)*(u8 *)ptr;
		break;
	case sizeof(u16):
-		*val = (u64)*(u16 *)ptr;
+		*val = is_ldsx ? (s64)*(s16 *)ptr : (u64)*(u16 *)ptr;
		break;
	case sizeof(u32):
-		*val = (u64)*(u32 *)ptr;
+		*val = is_ldsx ? (s64)*(s32 *)ptr : (u64)*(u32 *)ptr;
		break;
	case sizeof(u64):
		*val = *(u64 *)ptr;
@@ -6285,7 +6434,7 @@ static int check_stack_access_within_bounds(
 */
static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regno,
			    int off, int bpf_size, enum bpf_access_type t,
-			    int value_regno, bool strict_alignment_once)
+			    int value_regno, bool strict_alignment_once, bool is_ldsx)
{
	struct bpf_reg_state *regs = cur_regs(env);
	struct bpf_reg_state *reg = regs + regno;
@@ -6346,7 +6495,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
				u64 val = 0;

				err = bpf_map_direct_read(map, map_off, size,
-							  &val);
+							  &val, is_ldsx);
				if (err)
					return err;
@@ -6516,8 +6665,11 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
	if (!err && size < BPF_REG_SIZE && value_regno >= 0 && t == BPF_READ &&
	    regs[value_regno].type == SCALAR_VALUE) {
-		/* b/h/w load zero-extends, mark upper bits as known 0 */
-		coerce_reg_to_size(&regs[value_regno], size);
+		if (!is_ldsx)
+			/* b/h/w load zero-extends, mark upper bits as known 0 */
+			coerce_reg_to_size(&regs[value_regno], size);
+		else
+			coerce_reg_to_size_sx(&regs[value_regno], size);
	}
	return err;
}
@@ -6609,17 +6761,17 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
		 * case to simulate the register fill.
		 */
		err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
-				       BPF_SIZE(insn->code), BPF_READ, -1, true);
+				       BPF_SIZE(insn->code), BPF_READ, -1, true, false);
		if (!err && load_reg >= 0)
			err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
					       BPF_SIZE(insn->code), BPF_READ, load_reg,
-					       true);
+					       true, false);
		if (err)
			return err;

		/* Check whether we can write into the same memory. */
		err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
-				       BPF_SIZE(insn->code), BPF_WRITE, -1, true);
+				       BPF_SIZE(insn->code), BPF_WRITE, -1, true, false);
		if (err)
			return err;
@@ -6865,7 +7017,7 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
			return zero_size_allowed ? 0 : -EACCES;

		return check_mem_access(env, env->insn_idx, regno, offset, BPF_B,
-					atype, -1, false);
+					atype, -1, false, false);
	}

	fallthrough;
@@ -7237,7 +7389,7 @@ static int process_dynptr_func(struct bpf_verifier_env *env, int regno, int insn
		/* we write BPF_DW bits (8 bytes) at a time */
		for (i = 0; i < BPF_DYNPTR_SIZE; i += 8) {
			err = check_mem_access(env, insn_idx, regno,
-					       i, BPF_DW, BPF_WRITE, -1, false);
+					       i, BPF_DW, BPF_WRITE, -1, false, false);
			if (err)
				return err;
		}
@@ -7330,7 +7482,7 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
	for (i = 0; i < nr_slots * 8; i += BPF_REG_SIZE) {
		err = check_mem_access(env, insn_idx, regno,
-				       i, BPF_DW, BPF_WRITE, -1, false);
+				       i, BPF_DW, BPF_WRITE, -1, false, false);
		if (err)
			return err;
	}
@@ -9474,7 +9626,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
		 */
		for (i = 0; i < meta.access_size; i++) {
			err = check_mem_access(env, insn_idx, meta.regno, i, BPF_B,
-					       BPF_WRITE, -1, false);
+					       BPF_WRITE, -1, false, false);
			if (err)
				return err;
		}
@@ -12931,7 +13083,8 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
		} else {
			if (insn->src_reg != BPF_REG_0 || insn->off != 0 ||
			    (insn->imm != 16 && insn->imm != 32 && insn->imm != 64) ||
-			    BPF_CLASS(insn->code) == BPF_ALU64) {
+			    (BPF_CLASS(insn->code) == BPF_ALU64 &&
+			     BPF_SRC(insn->code) != BPF_TO_LE)) {
				verbose(env, "BPF_END uses reserved fields\n");
				return -EINVAL;
			}
@@ -12956,11 +13109,24 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
	} else if (opcode == BPF_MOV) {

		if (BPF_SRC(insn->code) == BPF_X) {
-			if (insn->imm != 0 || insn->off != 0) {
+			if (insn->imm != 0) {
				verbose(env, "BPF_MOV uses reserved fields\n");
				return -EINVAL;
			}

+			if (BPF_CLASS(insn->code) == BPF_ALU) {
+				if (insn->off != 0 && insn->off != 8 && insn->off != 16) {
+					verbose(env, "BPF_MOV uses reserved fields\n");
+					return -EINVAL;
+				}
+			} else {
+				if (insn->off != 0 && insn->off != 8 && insn->off != 16 &&
+				    insn->off != 32) {
+					verbose(env, "BPF_MOV uses reserved fields\n");
+					return -EINVAL;
+				}
+			}
+
			/* check src operand */
			err = check_reg_arg(env, insn->src_reg, SRC_OP);
			if (err)
@@ -12984,6 +13150,7 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
				!tnum_is_const(src_reg->var_off);

			if (BPF_CLASS(insn->code) == BPF_ALU64) {
+				if (insn->off == 0) {
					/* case: R1 = R2
					 * copy register state to dest reg
					 */
@@ -12996,6 +13163,20 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
					copy_register_state(dst_reg, src_reg);
					dst_reg->live |= REG_LIVE_WRITTEN;
					dst_reg->subreg_def = DEF_NOT_SUBREG;
+				} else {
+					/* case: R1 = (s8, s16 s32)R2 */
+					bool no_sext;
+
+					no_sext = src_reg->umax_value < (1ULL << (insn->off - 1));
+					if (no_sext && need_id)
+						src_reg->id = ++env->id_gen;
+					copy_register_state(dst_reg, src_reg);
+					if (!no_sext)
+						dst_reg->id = 0;
+					coerce_reg_to_size_sx(dst_reg, insn->off >> 3);
+					dst_reg->live |= REG_LIVE_WRITTEN;
+					dst_reg->subreg_def = DEF_NOT_SUBREG;
+				}
			} else {
				/* R1 = (u32) R2 */
				if (is_pointer_value(env, insn->src_reg)) {
@@ -13004,19 +13185,33 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
						insn->src_reg);
					return -EACCES;
				} else if (src_reg->type == SCALAR_VALUE) {
+					if (insn->off == 0) {
						bool is_src_reg_u32 = src_reg->umax_value <= U32_MAX;

						if (is_src_reg_u32 && need_id)
							src_reg->id = ++env->id_gen;
						copy_register_state(dst_reg, src_reg);
-						/* Make sure ID is cleared if src_reg is not in u32 range otherwise
-						 * dst_reg min/max could be incorrectly
+						/* Make sure ID is cleared if src_reg is not in u32
+						 * range otherwise dst_reg min/max could be incorrectly
						 * propagated into src_reg by find_equal_scalars()
						 */
						if (!is_src_reg_u32)
							dst_reg->id = 0;
						dst_reg->live |= REG_LIVE_WRITTEN;
						dst_reg->subreg_def = env->insn_idx + 1;
+					} else {
+						/* case: W1 = (s8, s16)W2 */
+						bool no_sext = src_reg->umax_value < (1ULL << (insn->off - 1));
+
+						if (no_sext && need_id)
+							src_reg->id = ++env->id_gen;
+						copy_register_state(dst_reg, src_reg);
+						if (!no_sext)
+							dst_reg->id = 0;
+						dst_reg->live |= REG_LIVE_WRITTEN;
+						dst_reg->subreg_def = env->insn_idx + 1;
+						coerce_subreg_to_size_sx(dst_reg, insn->off >> 3);
+					}
				} else {
					mark_reg_unknown(env, regs,
							 insn->dst_reg);
@@ -13047,7 +13242,8 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
	} else {	/* all other ALU ops: and, sub, xor, add, ... */

		if (BPF_SRC(insn->code) == BPF_X) {
-			if (insn->imm != 0 || insn->off != 0) {
+			if (insn->imm != 0 || insn->off > 1 ||
+			    (insn->off == 1 && opcode != BPF_MOD && opcode != BPF_DIV)) {
				verbose(env, "BPF_ALU uses reserved fields\n");
				return -EINVAL;
			}
@@ -13056,7 +13252,8 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
			if (err)
				return err;
		} else {
-			if (insn->src_reg != BPF_REG_0 || insn->off != 0) {
+			if (insn->src_reg != BPF_REG_0 || insn->off > 1 ||
+			    (insn->off == 1 && opcode != BPF_MOD && opcode != BPF_DIV)) {
				verbose(env, "BPF_ALU uses reserved fields\n");
				return -EINVAL;
			}
@@ -14600,7 +14797,7 @@ static int visit_func_call_insn(int t, struct bpf_insn *insns,
static int visit_insn(int t, struct bpf_verifier_env *env)
{
	struct bpf_insn *insns = env->prog->insnsi, *insn = &insns[t];
-	int ret;
+	int ret, off;

	if (bpf_pseudo_func(insn))
		return visit_func_call_insn(t, insns, env, true);
@@ -14648,14 +14845,19 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
		if (BPF_SRC(insn->code) != BPF_K)
			return -EINVAL;

+		if (BPF_CLASS(insn->code) == BPF_JMP)
+			off = insn->off;
+		else
+			off = insn->imm;
+
		/* unconditional jump with single edge */
-		ret = push_insn(t, t + insn->off + 1, FALLTHROUGH, env,
+		ret = push_insn(t, t + off + 1, FALLTHROUGH, env,
				true);
		if (ret)
			return ret;

-		mark_prune_point(env, t + insn->off + 1);
-		mark_jmp_point(env, t + insn->off + 1);
+		mark_prune_point(env, t + off + 1);
+		mark_jmp_point(env, t + off + 1);

		return ret;
@@ -16202,7 +16404,7 @@ static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type typ
			 * Have to support a use case when one path through
			 * the program yields TRUSTED pointer while another
			 * is UNTRUSTED. Fallback to UNTRUSTED to generate
-			 * BPF_PROBE_MEM.
+			 * BPF_PROBE_MEM/BPF_PROBE_MEMSX.
			 */
			*prev_type = PTR_TO_BTF_ID | PTR_UNTRUSTED;
		} else {
@@ -16343,7 +16545,8 @@ static int do_check(struct bpf_verifier_env *env)
			 */
			err = check_mem_access(env, env->insn_idx, insn->src_reg,
					       insn->off, BPF_SIZE(insn->code),
-					       BPF_READ, insn->dst_reg, false);
+					       BPF_READ, insn->dst_reg, false,
+					       BPF_MODE(insn->code) == BPF_MEMSX);
			if (err)
				return err;
@@ -16380,7 +16583,7 @@ static int do_check(struct bpf_verifier_env *env)
			/* check that memory (dst_reg + off) is writeable */
			err = check_mem_access(env, env->insn_idx, insn->dst_reg,
					       insn->off, BPF_SIZE(insn->code),
-					       BPF_WRITE, insn->src_reg, false);
+					       BPF_WRITE, insn->src_reg, false, false);
			if (err)
				return err;
@@ -16405,7 +16608,7 @@ static int do_check(struct bpf_verifier_env *env)
			/* check that memory (dst_reg + off) is writeable */
			err = check_mem_access(env, env->insn_idx, insn->dst_reg,
					       insn->off, BPF_SIZE(insn->code),
-					       BPF_WRITE, -1, false);
+					       BPF_WRITE, -1, false, false);
			if (err)
				return err;
@@ -16450,15 +16653,18 @@ static int do_check(struct bpf_verifier_env *env)
				mark_reg_scratched(env, BPF_REG_0);
			} else if (opcode == BPF_JA) {
				if (BPF_SRC(insn->code) != BPF_K ||
-				    insn->imm != 0 ||
				    insn->src_reg != BPF_REG_0 ||
				    insn->dst_reg != BPF_REG_0 ||
-				    class == BPF_JMP32) {
+				    (class == BPF_JMP && insn->imm != 0) ||
+				    (class == BPF_JMP32 && insn->off != 0)) {
					verbose(env, "BPF_JA uses reserved fields\n");
					return -EINVAL;
				}

+				if (class == BPF_JMP)
					env->insn_idx += insn->off + 1;
+				else
+					env->insn_idx += insn->imm + 1;
				continue;

			} else if (opcode == BPF_EXIT) {
@@ -16833,7 +17039,8 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
	for (i = 0; i < insn_cnt; i++, insn++) {
		if (BPF_CLASS(insn->code) == BPF_LDX &&
-		    (BPF_MODE(insn->code) != BPF_MEM || insn->imm != 0)) {
+		    ((BPF_MODE(insn->code) != BPF_MEM && BPF_MODE(insn->code) != BPF_MEMSX) ||
+		     insn->imm != 0)) {
			verbose(env, "BPF_LDX uses reserved fields\n");
			return -EINVAL;
		}
@@ -17304,13 +17511,13 @@ static bool insn_is_cond_jump(u8 code)
{
	u8 op;

+	op = BPF_OP(code);
	if (BPF_CLASS(code) == BPF_JMP32)
-		return true;
+		return op != BPF_JA;

	if (BPF_CLASS(code) != BPF_JMP)
		return false;

-	op = BPF_OP(code);
	return op != BPF_JA && op != BPF_EXIT && op != BPF_CALL;
}
@@ -17527,11 +17734,15 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
	for (i = 0; i < insn_cnt; i++, insn++) {
		bpf_convert_ctx_access_t convert_ctx_access;
+		u8 mode;

		if (insn->code == (BPF_LDX | BPF_MEM | BPF_B) ||
		    insn->code == (BPF_LDX | BPF_MEM | BPF_H) ||
		    insn->code == (BPF_LDX | BPF_MEM | BPF_W) ||
-		    insn->code == (BPF_LDX | BPF_MEM | BPF_DW)) {
+		    insn->code == (BPF_LDX | BPF_MEM | BPF_DW) ||
+		    insn->code == (BPF_LDX | BPF_MEMSX | BPF_B) ||
+		    insn->code == (BPF_LDX | BPF_MEMSX | BPF_H) ||
+		    insn->code == (BPF_LDX | BPF_MEMSX | BPF_W)) {
			type = BPF_READ;
		} else if (insn->code == (BPF_STX | BPF_MEM | BPF_B) ||
			   insn->code == (BPF_STX | BPF_MEM | BPF_H) ||
@@ -17590,8 +17801,12 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
		 */
		case PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED:
			if (type == BPF_READ) {
+				if (BPF_MODE(insn->code) == BPF_MEM)
					insn->code = BPF_LDX | BPF_PROBE_MEM |
						BPF_SIZE((insn)->code);
+				else
+					insn->code = BPF_LDX | BPF_PROBE_MEMSX |
+						BPF_SIZE((insn)->code);
				env->prog->aux->num_exentries++;
			}
			continue;
@@ -17601,6 +17816,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
		ctx_field_size = env->insn_aux_data[i + delta].ctx_field_size;
		size = BPF_LDST_BYTES(insn);
+		mode = BPF_MODE(insn->code);

		/* If the read access is a narrower load of the field,
		 * convert to a 4/8-byte load, to minimum program type specific
@@ -17660,6 +17876,10 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
							(1ULL << size * 8) - 1);
			}
		}
+		if (mode == BPF_MEMSX)
+			insn_buf[cnt++] = BPF_RAW_INSN(BPF_ALU64 | BPF_MOV | BPF_X,
+						       insn->dst_reg, insn->dst_reg,
+						       size * 8, 0);

		new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
		if (!new_prog)
@@ -17779,7 +17999,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
		insn = func[i]->insnsi;
		for (j = 0; j < func[i]->len; j++, insn++) {
			if (BPF_CLASS(insn->code) == BPF_LDX &&
-			    BPF_MODE(insn->code) == BPF_PROBE_MEM)
+			    (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
+			     BPF_MODE(insn->code) == BPF_PROBE_MEMSX))
				num_exentries++;
		}
		func[i]->aux->num_exentries = num_exentries;

View File

@@ -223,17 +223,6 @@ const struct bpf_func_proto bpf_probe_read_user_str_proto = {
	.arg3_type	= ARG_ANYTHING,
};

-static __always_inline int
-bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr)
-{
-	int ret;
-
-	ret = copy_from_kernel_nofault(dst, unsafe_ptr, size);
-	if (unlikely(ret < 0))
-		memset(dst, 0, size);
-	return ret;
-}
-
BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
	   const void *, unsafe_ptr)
{

View File

@@ -555,12 +555,15 @@ static int perf_call_bpf_enter(struct trace_event_call *call, struct pt_regs *re
			       struct syscall_trace_enter *rec)
{
	struct syscall_tp_t {
-		unsigned long long regs;
+		struct trace_entry ent;
		unsigned long syscall_nr;
		unsigned long args[SYSCALL_DEFINE_MAXARGS];
-	} param;
+	} __aligned(8) param;
	int i;

+	BUILD_BUG_ON(sizeof(param.ent) < sizeof(void *));
+
+	/* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */
	*(struct pt_regs **)&param = regs;
	param.syscall_nr = rec->nr;
	for (i = 0; i < sys_data->nb_args; i++)
@@ -657,11 +660,12 @@ static int perf_call_bpf_exit(struct trace_event_call *call, struct pt_regs *reg
			      struct syscall_trace_exit *rec)
{
	struct syscall_tp_t {
-		unsigned long long regs;
+		struct trace_entry ent;
		unsigned long syscall_nr;
		unsigned long ret;
-	} param;
+	} __aligned(8) param;

+	/* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */
	*(struct pt_regs **)&param = regs;
	param.syscall_nr = rec->nr;
	param.ret = rec->ret;
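The fix matters because BPF programs attached to syscall tracepoints expect the common trace_entry header first, matching the tracefs format; the perf path previously faked that slot with a bare u64 carrying the pt_regs pointer. A sketch of the BPF-side view after the change (field layout illustrative, mirroring the sys_enter format):

    struct sys_enter_ctx {
            unsigned long long unused;  /* trace_entry header / overlaid pt_regs */
            long syscall_nr;
            unsigned long args[6];
    };

    SEC("tracepoint/raw_syscalls/sys_enter")
    int on_sys_enter(struct sys_enter_ctx *ctx)
    {
            /* syscall_nr now sits at the same offset in both the perf
             * and the tracefs-derived context layouts. */
            bpf_printk("nr=%ld", ctx->syscall_nr);
            return 0;
    }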

View File

@@ -20,6 +20,7 @@
#include <linux/smp.h>
#include <linux/sock_diag.h>
#include <linux/netfilter.h>
+#include <net/netdev_rx_queue.h>
#include <net/xdp.h>
#include <net/netfilter/nf_bpf_link.h>

View File

@@ -133,6 +133,7 @@
#include <trace/events/net.h>
#include <trace/events/skb.h>
#include <trace/events/qdisc.h>
+#include <trace/events/xdp.h>
#include <linux/inetdevice.h>
#include <linux/cpu_rmap.h>
#include <linux/static_key.h>
@@ -151,6 +152,7 @@
#include <linux/pm_runtime.h>
#include <linux/prandom.h>
#include <linux/once_lite.h>
+#include <net/netdev_rx_queue.h>

#include "dev.h"
#include "net-sysfs.h"
@@ -9475,6 +9477,7 @@ int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
	struct net *net = current->nsproxy->net_ns;
	struct bpf_link_primer link_primer;
+	struct netlink_ext_ack extack = {};
	struct bpf_xdp_link *link;
	struct net_device *dev;
	int err, fd;
@@ -9502,12 +9505,13 @@ int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
		goto unlock;
	}

-	err = dev_xdp_attach_link(dev, NULL, link);
+	err = dev_xdp_attach_link(dev, &extack, link);
	rtnl_unlock();

	if (err) {
		link->dev = NULL;
		bpf_link_cleanup(&link_primer);
+		trace_bpf_xdp_link_attach_failed(extack._msg);
		goto out_put_dev;
	}

View File

@@ -7351,8 +7351,8 @@ BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags)
		return -EOPNOTSUPP;
	if (unlikely(dev_net(skb->dev) != sock_net(sk)))
		return -ENETUNREACH;
-	if (unlikely(sk_fullsock(sk) && sk->sk_reuseport))
-		return -ESOCKTNOSUPPORT;
+	if (sk_unhashed(sk))
+		return -EOPNOTSUPP;
	if (sk_is_refcounted(sk) &&
	    unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
		return -ENOENT;
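With the reuseport restriction dropped, a TC program can now assign a socket from a reuseport group; only unhashed sockets are rejected. A minimal sketch (tuple extraction and full error handling elided):

    SEC("tc")
    int assign_sk(struct __sk_buff *skb)
    {
            struct bpf_sock_tuple tuple = {};   /* assumed filled from headers */
            struct bpf_sock *sk;

            sk = bpf_skc_lookup_tcp(skb, &tuple, sizeof(tuple.ipv4),
                                    BPF_F_CURRENT_NETNS, 0);
            if (!sk)
                    return TC_ACT_SHOT;

            /* Since this series the looked-up sk may belong to a reuseport
             * group; bpf_sk_assign() only refuses unhashed sockets now. */
            bpf_sk_assign(skb, sk, 0);

            bpf_sk_release(sk);
            return TC_ACT_OK;
    }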

View File

@@ -23,6 +23,7 @@
#include <linux/of.h>
#include <linux/of_net.h>
#include <linux/cpu.h>
+#include <net/netdev_rx_queue.h>

#include "dev.h"
#include "net-sysfs.h"

View File

@@ -28,7 +28,7 @@
#include <net/tcp.h>
#include <net/sock_reuseport.h>

-static u32 inet_ehashfn(const struct net *net, const __be32 laddr,
+u32 inet_ehashfn(const struct net *net, const __be32 laddr,
		 const __u16 lport, const __be32 faddr,
		 const __be16 fport)
{
@@ -39,6 +39,7 @@ static u32 inet_ehashfn(const struct net *net, const __be32 laddr,
	return __inet_ehashfn(laddr, lport, faddr, fport,
			      inet_ehash_secret + net_hash_mix(net));
}
+EXPORT_SYMBOL_GPL(inet_ehashfn);

/* This function handles inet_sock, but also timewait and request sockets
 * for IPv4/IPv6.
@@ -332,20 +333,38 @@ static inline int compute_score(struct sock *sk, struct net *net,
	return score;
}

-static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
-					    struct sk_buff *skb, int doff,
-					    __be32 saddr, __be16 sport,
-					    __be32 daddr, unsigned short hnum)
+/**
+ * inet_lookup_reuseport() - execute reuseport logic on AF_INET socket if necessary.
+ * @net: network namespace.
+ * @sk: AF_INET socket, must be in TCP_LISTEN state for TCP or TCP_CLOSE for UDP.
+ * @skb: context for a potential SK_REUSEPORT program.
+ * @doff: header offset.
+ * @saddr: source address.
+ * @sport: source port.
+ * @daddr: destination address.
+ * @hnum: destination port in host byte order.
+ * @ehashfn: hash function used to generate the fallback hash.
+ *
+ * Return: NULL if sk doesn't have SO_REUSEPORT set, otherwise a pointer to
+ *         the selected sock or an error.
+ */
+struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk,
+				   struct sk_buff *skb, int doff,
+				   __be32 saddr, __be16 sport,
+				   __be32 daddr, unsigned short hnum,
+				   inet_ehashfn_t *ehashfn)
{
	struct sock *reuse_sk = NULL;
	u32 phash;

	if (sk->sk_reuseport) {
-		phash = inet_ehashfn(net, daddr, hnum, saddr, sport);
+		phash = INDIRECT_CALL_2(ehashfn, udp_ehashfn, inet_ehashfn,
+					net, daddr, hnum, saddr, sport);
		reuse_sk = reuseport_select_sock(sk, phash, skb, doff);
	}
	return reuse_sk;
}
+EXPORT_SYMBOL_GPL(inet_lookup_reuseport);

/*
 * Here are some nice properties to exploit here. The BSD API
@@ -369,8 +388,8 @@ static struct sock *inet_lhash2_lookup(struct net *net,
	sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) {
		score = compute_score(sk, net, hnum, daddr, dif, sdif);
		if (score > hiscore) {
-			result = lookup_reuseport(net, sk, skb, doff,
-						  saddr, sport, daddr, hnum);
+			result = inet_lookup_reuseport(net, sk, skb, doff,
+						       saddr, sport, daddr, hnum, inet_ehashfn);
			if (result)
				return result;
@@ -382,24 +401,23 @@ static struct sock *inet_lhash2_lookup(struct net *net,
	return result;
}

-static inline struct sock *inet_lookup_run_bpf(struct net *net,
-					       struct inet_hashinfo *hashinfo,
-					       struct sk_buff *skb, int doff,
-					       __be32 saddr, __be16 sport,
-					       __be32 daddr, u16 hnum, const int dif)
+struct sock *inet_lookup_run_sk_lookup(struct net *net,
+				       int protocol,
+				       struct sk_buff *skb, int doff,
+				       __be32 saddr, __be16 sport,
+				       __be32 daddr, u16 hnum, const int dif,
+				       inet_ehashfn_t *ehashfn)
{
	struct sock *sk, *reuse_sk;
	bool no_reuseport;

-	if (hashinfo != net->ipv4.tcp_death_row.hashinfo)
-		return NULL; /* only TCP is supported */
-
-	no_reuseport = bpf_sk_lookup_run_v4(net, IPPROTO_TCP, saddr, sport,
+	no_reuseport = bpf_sk_lookup_run_v4(net, protocol, saddr, sport,
					    daddr, hnum, dif, &sk);
	if (no_reuseport || IS_ERR_OR_NULL(sk))
		return sk;

-	reuse_sk = lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum);
+	reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum,
+					 ehashfn);
	if (reuse_sk)
		sk = reuse_sk;
	return sk;
@@ -417,9 +435,11 @@ struct sock *__inet_lookup_listener(struct net *net,
	unsigned int hash2;

	/* Lookup redirect from BPF */
-	if (static_branch_unlikely(&bpf_sk_lookup_enabled)) {
-		result = inet_lookup_run_bpf(net, hashinfo, skb, doff,
-					     saddr, sport, daddr, hnum, dif);
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
+	    hashinfo == net->ipv4.tcp_death_row.hashinfo) {
+		result = inet_lookup_run_sk_lookup(net, IPPROTO_TCP, skb, doff,
+						   saddr, sport, daddr, hnum, dif,
+						   inet_ehashfn);
		if (result)
			goto done;
	}
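Passing the hash function explicitly lets TCP and UDP share one reuseport helper without an indirect call penalty: INDIRECT_CALL_2() devirtualizes the two expected targets. Conceptually it expands to something like this sketch:

    /* Rough expansion of INDIRECT_CALL_2(ehashfn, udp_ehashfn, inet_ehashfn, ...):
     * compare against the known candidates so retpoline kernels take a
     * direct call on the hot paths, falling back to an indirect call.
     */
    if (ehashfn == udp_ehashfn)
            phash = udp_ehashfn(net, daddr, hnum, saddr, sport);
    else if (ehashfn == inet_ehashfn)
            phash = inet_ehashfn(net, daddr, hnum, saddr, sport);
    else
            phash = ehashfn(net, daddr, hnum, saddr, sport);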

View File

@@ -7,6 +7,7 @@
#include <linux/ip.h>
#include <linux/netfilter.h>
#include <linux/module.h>
+#include <linux/rcupdate.h>
#include <linux/skbuff.h>
#include <net/netns/generic.h>
#include <net/route.h>
@@ -113,17 +114,31 @@ static void __net_exit defrag4_net_exit(struct net *net)
	}
}

+static const struct nf_defrag_hook defrag_hook = {
+	.owner = THIS_MODULE,
+	.enable = nf_defrag_ipv4_enable,
+	.disable = nf_defrag_ipv4_disable,
+};
+
static struct pernet_operations defrag4_net_ops = {
	.exit = defrag4_net_exit,
};

static int __init nf_defrag_init(void)
{
-	return register_pernet_subsys(&defrag4_net_ops);
+	int err;
+
+	err = register_pernet_subsys(&defrag4_net_ops);
+	if (err)
+		return err;
+
+	rcu_assign_pointer(nf_defrag_v4_hook, &defrag_hook);
+	return err;
}

static void __exit nf_defrag_fini(void)
{
+	rcu_assign_pointer(nf_defrag_v4_hook, NULL);
	unregister_pernet_subsys(&defrag4_net_ops);
}


@@ -407,9 +407,9 @@ static int compute_score(struct sock *sk, struct net *net,
return score; return score;
} }
static u32 udp_ehashfn(const struct net *net, const __be32 laddr, INDIRECT_CALLABLE_SCOPE
const __u16 lport, const __be32 faddr, u32 udp_ehashfn(const struct net *net, const __be32 laddr, const __u16 lport,
const __be16 fport) const __be32 faddr, const __be16 fport)
{ {
static u32 udp_ehash_secret __read_mostly; static u32 udp_ehash_secret __read_mostly;
@@ -419,22 +419,6 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr,
udp_ehash_secret + net_hash_mix(net)); udp_ehash_secret + net_hash_mix(net));
} }
static struct sock *lookup_reuseport(struct net *net, struct sock *sk,
struct sk_buff *skb,
__be32 saddr, __be16 sport,
__be32 daddr, unsigned short hnum)
{
struct sock *reuse_sk = NULL;
u32 hash;
if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) {
hash = udp_ehashfn(net, daddr, hnum, saddr, sport);
reuse_sk = reuseport_select_sock(sk, hash, skb,
sizeof(struct udphdr));
}
return reuse_sk;
}
/* called with rcu_read_lock() */ /* called with rcu_read_lock() */
static struct sock *udp4_lib_lookup2(struct net *net, static struct sock *udp4_lib_lookup2(struct net *net,
__be32 saddr, __be16 sport, __be32 saddr, __be16 sport,
@@ -452,42 +436,36 @@ static struct sock *udp4_lib_lookup2(struct net *net,
score = compute_score(sk, net, saddr, sport, score = compute_score(sk, net, saddr, sport,
daddr, hnum, dif, sdif); daddr, hnum, dif, sdif);
if (score > badness) { if (score > badness) {
result = lookup_reuseport(net, sk, skb,
saddr, sport, daddr, hnum);
/* Fall back to scoring if group has connections */
if (result && !reuseport_has_conns(sk))
return result;
result = result ? : sk;
badness = score; badness = score;
if (sk->sk_state == TCP_ESTABLISHED) {
result = sk;
continue;
}
result = inet_lookup_reuseport(net, sk, skb, sizeof(struct udphdr),
saddr, sport, daddr, hnum, udp_ehashfn);
if (!result) {
result = sk;
continue;
}
/* Fall back to scoring if group has connections */
if (!reuseport_has_conns(sk))
return result;
/* Reuseport logic returned an error, keep original score. */
if (IS_ERR(result))
continue;
badness = compute_score(result, net, saddr, sport,
daddr, hnum, dif, sdif);
} }
} }
return result; return result;
} }
static struct sock *udp4_lookup_run_bpf(struct net *net,
struct udp_table *udptable,
struct sk_buff *skb,
__be32 saddr, __be16 sport,
__be32 daddr, u16 hnum, const int dif)
{
struct sock *sk, *reuse_sk;
bool no_reuseport;
if (udptable != net->ipv4.udp_table)
return NULL; /* only UDP is supported */
no_reuseport = bpf_sk_lookup_run_v4(net, IPPROTO_UDP, saddr, sport,
daddr, hnum, dif, &sk);
if (no_reuseport || IS_ERR_OR_NULL(sk))
return sk;
reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum);
if (reuse_sk)
sk = reuse_sk;
return sk;
}
/* UDP is nearly always wildcards out the wazoo, it makes no sense to try /* UDP is nearly always wildcards out the wazoo, it makes no sense to try
* harder than this. -DaveM * harder than this. -DaveM
*/ */
@@ -512,9 +490,11 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
goto done; goto done;
/* Lookup redirect from BPF */ /* Lookup redirect from BPF */
if (static_branch_unlikely(&bpf_sk_lookup_enabled)) { if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
sk = udp4_lookup_run_bpf(net, udptable, skb, udptable == net->ipv4.udp_table) {
saddr, sport, daddr, hnum, dif); sk = inet_lookup_run_sk_lookup(net, IPPROTO_UDP, skb, sizeof(struct udphdr),
saddr, sport, daddr, hnum, dif,
udp_ehashfn);
if (sk) { if (sk) {
result = sk; result = sk;
goto done; goto done;
@@ -2412,7 +2392,11 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
if (udp4_csum_init(skb, uh, proto)) if (udp4_csum_init(skb, uh, proto))
goto csum_error; goto csum_error;
sk = skb_steal_sock(skb, &refcounted); sk = inet_steal_sock(net, skb, sizeof(struct udphdr), saddr, uh->source, daddr, uh->dest,
&refcounted, udp_ehashfn);
if (IS_ERR(sk))
goto no_sk;
if (sk) { if (sk) {
struct dst_entry *dst = skb_dst(skb); struct dst_entry *dst = skb_dst(skb);
int ret; int ret;
@@ -2433,7 +2417,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable); sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
if (sk) if (sk)
return udp_unicast_rcv_skb(sk, skb, uh); return udp_unicast_rcv_skb(sk, skb, uh);
no_sk:
if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
goto drop; goto drop;
nf_reset_ct(skb); nf_reset_ct(skb);


@@ -39,6 +39,7 @@ u32 inet6_ehashfn(const struct net *net,
return __inet6_ehashfn(lhash, lport, fhash, fport, return __inet6_ehashfn(lhash, lport, fhash, fport,
inet6_ehash_secret + net_hash_mix(net)); inet6_ehash_secret + net_hash_mix(net));
} }
EXPORT_SYMBOL_GPL(inet6_ehashfn);
/* /*
* Sockets in TCP_CLOSE state are _always_ taken out of the hash, so * Sockets in TCP_CLOSE state are _always_ taken out of the hash, so
@@ -111,22 +112,40 @@ static inline int compute_score(struct sock *sk, struct net *net,
return score; return score;
} }
static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk, /**
* inet6_lookup_reuseport() - execute reuseport logic on AF_INET6 socket if necessary.
* @net: network namespace.
* @sk: AF_INET6 socket, must be in TCP_LISTEN state for TCP or TCP_CLOSE for UDP.
* @skb: context for a potential SK_REUSEPORT program.
* @doff: header offset.
* @saddr: source address.
* @sport: source port.
* @daddr: destination address.
* @hnum: destination port in host byte order.
* @ehashfn: hash function used to generate the fallback hash.
*
* Return: NULL if sk doesn't have SO_REUSEPORT set, otherwise a pointer to
* the selected sock or an error.
*/
struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk,
struct sk_buff *skb, int doff, struct sk_buff *skb, int doff,
const struct in6_addr *saddr, const struct in6_addr *saddr,
__be16 sport, __be16 sport,
const struct in6_addr *daddr, const struct in6_addr *daddr,
unsigned short hnum) unsigned short hnum,
inet6_ehashfn_t *ehashfn)
{ {
struct sock *reuse_sk = NULL; struct sock *reuse_sk = NULL;
u32 phash; u32 phash;
if (sk->sk_reuseport) { if (sk->sk_reuseport) {
phash = inet6_ehashfn(net, daddr, hnum, saddr, sport); phash = INDIRECT_CALL_INET(ehashfn, udp6_ehashfn, inet6_ehashfn,
net, daddr, hnum, saddr, sport);
reuse_sk = reuseport_select_sock(sk, phash, skb, doff); reuse_sk = reuseport_select_sock(sk, phash, skb, doff);
} }
return reuse_sk; return reuse_sk;
} }
EXPORT_SYMBOL_GPL(inet6_lookup_reuseport);
/* called with rcu_read_lock() */ /* called with rcu_read_lock() */
static struct sock *inet6_lhash2_lookup(struct net *net, static struct sock *inet6_lhash2_lookup(struct net *net,
@@ -143,8 +162,8 @@ static struct sock *inet6_lhash2_lookup(struct net *net,
sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) { sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) {
score = compute_score(sk, net, hnum, daddr, dif, sdif); score = compute_score(sk, net, hnum, daddr, dif, sdif);
if (score > hiscore) { if (score > hiscore) {
result = lookup_reuseport(net, sk, skb, doff, result = inet6_lookup_reuseport(net, sk, skb, doff,
saddr, sport, daddr, hnum); saddr, sport, daddr, hnum, inet6_ehashfn);
if (result) if (result)
return result; return result;
@@ -156,30 +175,30 @@ static struct sock *inet6_lhash2_lookup(struct net *net,
return result; return result;
} }
static inline struct sock *inet6_lookup_run_bpf(struct net *net, struct sock *inet6_lookup_run_sk_lookup(struct net *net,
struct inet_hashinfo *hashinfo, int protocol,
struct sk_buff *skb, int doff, struct sk_buff *skb, int doff,
const struct in6_addr *saddr, const struct in6_addr *saddr,
const __be16 sport, const __be16 sport,
const struct in6_addr *daddr, const struct in6_addr *daddr,
const u16 hnum, const int dif) const u16 hnum, const int dif,
inet6_ehashfn_t *ehashfn)
{ {
struct sock *sk, *reuse_sk; struct sock *sk, *reuse_sk;
bool no_reuseport; bool no_reuseport;
if (hashinfo != net->ipv4.tcp_death_row.hashinfo) no_reuseport = bpf_sk_lookup_run_v6(net, protocol, saddr, sport,
return NULL; /* only TCP is supported */
no_reuseport = bpf_sk_lookup_run_v6(net, IPPROTO_TCP, saddr, sport,
daddr, hnum, dif, &sk); daddr, hnum, dif, &sk);
if (no_reuseport || IS_ERR_OR_NULL(sk)) if (no_reuseport || IS_ERR_OR_NULL(sk))
return sk; return sk;
reuse_sk = lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff,
saddr, sport, daddr, hnum, ehashfn);
if (reuse_sk) if (reuse_sk)
sk = reuse_sk; sk = reuse_sk;
return sk; return sk;
} }
EXPORT_SYMBOL_GPL(inet6_lookup_run_sk_lookup);
struct sock *inet6_lookup_listener(struct net *net, struct sock *inet6_lookup_listener(struct net *net,
struct inet_hashinfo *hashinfo, struct inet_hashinfo *hashinfo,
@@ -193,9 +212,11 @@ struct sock *inet6_lookup_listener(struct net *net,
unsigned int hash2; unsigned int hash2;
/* Lookup redirect from BPF */ /* Lookup redirect from BPF */
if (static_branch_unlikely(&bpf_sk_lookup_enabled)) { if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
result = inet6_lookup_run_bpf(net, hashinfo, skb, doff, hashinfo == net->ipv4.tcp_death_row.hashinfo) {
saddr, sport, daddr, hnum, dif); result = inet6_lookup_run_sk_lookup(net, IPPROTO_TCP, skb, doff,
saddr, sport, daddr, hnum, dif,
inet6_ehashfn);
if (result) if (result)
goto done; goto done;
} }
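inet6_lookup_reuseport() and its IPv4 counterpart now take the ehashfn as a function pointer so that TCP and UDP share one implementation. INDIRECT_CALL_INET() keeps that affordable on retpoline kernels by testing the pointer against the known targets before falling back to an indirect call; simplified from include/linux/indirect_call_wrapper.h, the pattern reduces to:

/* Simplified sketch of the INDIRECT_CALL_* idea: a matched function
 * pointer becomes a direct call, which needs no retpoline; anything
 * else takes the plain indirect call. */
#define INDIRECT_CALL_1(f, f1, ...)					\
	({								\
		likely(f == f1) ? f1(__VA_ARGS__) : f(__VA_ARGS__);	\
	})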


@@ -10,6 +10,7 @@
#include <linux/module.h> #include <linux/module.h>
#include <linux/skbuff.h> #include <linux/skbuff.h>
#include <linux/icmp.h> #include <linux/icmp.h>
#include <linux/rcupdate.h>
#include <linux/sysctl.h> #include <linux/sysctl.h>
#include <net/ipv6_frag.h> #include <net/ipv6_frag.h>
@@ -96,6 +97,12 @@ static void __net_exit defrag6_net_exit(struct net *net)
} }
} }
static const struct nf_defrag_hook defrag_hook = {
.owner = THIS_MODULE,
.enable = nf_defrag_ipv6_enable,
.disable = nf_defrag_ipv6_disable,
};
static struct pernet_operations defrag6_net_ops = { static struct pernet_operations defrag6_net_ops = {
.exit = defrag6_net_exit, .exit = defrag6_net_exit,
}; };
@@ -114,6 +121,9 @@ static int __init nf_defrag_init(void)
pr_err("nf_defrag_ipv6: can't register pernet ops\n"); pr_err("nf_defrag_ipv6: can't register pernet ops\n");
goto cleanup_frag6; goto cleanup_frag6;
} }
rcu_assign_pointer(nf_defrag_v6_hook, &defrag_hook);
return ret; return ret;
cleanup_frag6: cleanup_frag6:
@@ -124,6 +134,7 @@ cleanup_frag6:
static void __exit nf_defrag_fini(void) static void __exit nf_defrag_fini(void)
{ {
rcu_assign_pointer(nf_defrag_v6_hook, NULL);
unregister_pernet_subsys(&defrag6_net_ops); unregister_pernet_subsys(&defrag6_net_ops);
nf_ct_frag6_cleanup(); nf_ct_frag6_cleanup();
} }


@@ -72,7 +72,8 @@ int udpv6_init_sock(struct sock *sk)
return 0; return 0;
} }
static u32 udp6_ehashfn(const struct net *net, INDIRECT_CALLABLE_SCOPE
u32 udp6_ehashfn(const struct net *net,
const struct in6_addr *laddr, const struct in6_addr *laddr,
const u16 lport, const u16 lport,
const struct in6_addr *faddr, const struct in6_addr *faddr,
@@ -161,24 +162,6 @@ static int compute_score(struct sock *sk, struct net *net,
return score; return score;
} }
static struct sock *lookup_reuseport(struct net *net, struct sock *sk,
struct sk_buff *skb,
const struct in6_addr *saddr,
__be16 sport,
const struct in6_addr *daddr,
unsigned int hnum)
{
struct sock *reuse_sk = NULL;
u32 hash;
if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) {
hash = udp6_ehashfn(net, daddr, hnum, saddr, sport);
reuse_sk = reuseport_select_sock(sk, hash, skb,
sizeof(struct udphdr));
}
return reuse_sk;
}
/* called with rcu_read_lock() */ /* called with rcu_read_lock() */
static struct sock *udp6_lib_lookup2(struct net *net, static struct sock *udp6_lib_lookup2(struct net *net,
const struct in6_addr *saddr, __be16 sport, const struct in6_addr *saddr, __be16 sport,
@@ -195,44 +178,35 @@ static struct sock *udp6_lib_lookup2(struct net *net,
score = compute_score(sk, net, saddr, sport, score = compute_score(sk, net, saddr, sport,
daddr, hnum, dif, sdif); daddr, hnum, dif, sdif);
if (score > badness) { if (score > badness) {
result = lookup_reuseport(net, sk, skb,
saddr, sport, daddr, hnum);
/* Fall back to scoring if group has connections */
if (result && !reuseport_has_conns(sk))
return result;
result = result ? : sk;
badness = score; badness = score;
if (sk->sk_state == TCP_ESTABLISHED) {
result = sk;
continue;
}
result = inet6_lookup_reuseport(net, sk, skb, sizeof(struct udphdr),
saddr, sport, daddr, hnum, udp6_ehashfn);
if (!result) {
result = sk;
continue;
}
/* Fall back to scoring if group has connections */
if (!reuseport_has_conns(sk))
return result;
/* Reuseport logic returned an error, keep original score. */
if (IS_ERR(result))
continue;
badness = compute_score(sk, net, saddr, sport,
daddr, hnum, dif, sdif);
} }
} }
return result; return result;
} }
static inline struct sock *udp6_lookup_run_bpf(struct net *net,
struct udp_table *udptable,
struct sk_buff *skb,
const struct in6_addr *saddr,
__be16 sport,
const struct in6_addr *daddr,
u16 hnum, const int dif)
{
struct sock *sk, *reuse_sk;
bool no_reuseport;
if (udptable != net->ipv4.udp_table)
return NULL; /* only UDP is supported */
no_reuseport = bpf_sk_lookup_run_v6(net, IPPROTO_UDP, saddr, sport,
daddr, hnum, dif, &sk);
if (no_reuseport || IS_ERR_OR_NULL(sk))
return sk;
reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum);
if (reuse_sk)
sk = reuse_sk;
return sk;
}
/* rcu_read_lock() must be held */ /* rcu_read_lock() must be held */
struct sock *__udp6_lib_lookup(struct net *net, struct sock *__udp6_lib_lookup(struct net *net,
const struct in6_addr *saddr, __be16 sport, const struct in6_addr *saddr, __be16 sport,
@@ -257,9 +231,11 @@ struct sock *__udp6_lib_lookup(struct net *net,
goto done; goto done;
/* Lookup redirect from BPF */ /* Lookup redirect from BPF */
if (static_branch_unlikely(&bpf_sk_lookup_enabled)) { if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
sk = udp6_lookup_run_bpf(net, udptable, skb, udptable == net->ipv4.udp_table) {
saddr, sport, daddr, hnum, dif); sk = inet6_lookup_run_sk_lookup(net, IPPROTO_UDP, skb, sizeof(struct udphdr),
saddr, sport, daddr, hnum, dif,
udp6_ehashfn);
if (sk) { if (sk) {
result = sk; result = sk;
goto done; goto done;
@@ -992,7 +968,11 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
goto csum_error; goto csum_error;
/* Check if the socket is already available, e.g. due to early demux */ /* Check if the socket is already available, e.g. due to early demux */
sk = skb_steal_sock(skb, &refcounted); sk = inet6_steal_sock(net, skb, sizeof(struct udphdr), saddr, uh->source, daddr, uh->dest,
&refcounted, udp6_ehashfn);
if (IS_ERR(sk))
goto no_sk;
if (sk) { if (sk) {
struct dst_entry *dst = skb_dst(skb); struct dst_entry *dst = skb_dst(skb);
int ret; int ret;
@@ -1026,7 +1006,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
goto report_csum_error; goto report_csum_error;
return udp6_unicast_rcv_skb(sk, skb, uh); return udp6_unicast_rcv_skb(sk, skb, uh);
} }
no_sk:
reason = SKB_DROP_REASON_NO_SOCKET; reason = SKB_DROP_REASON_NO_SOCKET;
if (!uh->check) if (!uh->check)


@@ -680,6 +680,12 @@ EXPORT_SYMBOL_GPL(nfnl_ct_hook);
const struct nf_ct_hook __rcu *nf_ct_hook __read_mostly; const struct nf_ct_hook __rcu *nf_ct_hook __read_mostly;
EXPORT_SYMBOL_GPL(nf_ct_hook); EXPORT_SYMBOL_GPL(nf_ct_hook);
const struct nf_defrag_hook __rcu *nf_defrag_v4_hook __read_mostly;
EXPORT_SYMBOL_GPL(nf_defrag_v4_hook);
const struct nf_defrag_hook __rcu *nf_defrag_v6_hook __read_mostly;
EXPORT_SYMBOL_GPL(nf_defrag_v6_hook);
#if IS_ENABLED(CONFIG_NF_CONNTRACK) #if IS_ENABLED(CONFIG_NF_CONNTRACK)
u8 nf_ctnetlink_has_listener; u8 nf_ctnetlink_has_listener;
EXPORT_SYMBOL_GPL(nf_ctnetlink_has_listener); EXPORT_SYMBOL_GPL(nf_ctnetlink_has_listener);


@@ -1,6 +1,8 @@
// SPDX-License-Identifier: GPL-2.0 // SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h> #include <linux/bpf.h>
#include <linux/filter.h> #include <linux/filter.h>
#include <linux/kmod.h>
#include <linux/module.h>
#include <linux/netfilter.h> #include <linux/netfilter.h>
#include <net/netfilter/nf_bpf_link.h> #include <net/netfilter/nf_bpf_link.h>
@@ -23,8 +25,90 @@ struct bpf_nf_link {
struct nf_hook_ops hook_ops; struct nf_hook_ops hook_ops;
struct net *net; struct net *net;
u32 dead; u32 dead;
const struct nf_defrag_hook *defrag_hook;
}; };
#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4) || IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
static const struct nf_defrag_hook *
get_proto_defrag_hook(struct bpf_nf_link *link,
const struct nf_defrag_hook __rcu *global_hook,
const char *mod)
{
const struct nf_defrag_hook *hook;
int err;
/* RCU protects us from races against module unloading */
rcu_read_lock();
hook = rcu_dereference(global_hook);
if (!hook) {
rcu_read_unlock();
err = request_module(mod);
if (err)
return ERR_PTR(err < 0 ? err : -EINVAL);
rcu_read_lock();
hook = rcu_dereference(global_hook);
}
if (hook && try_module_get(hook->owner)) {
/* Once we have a refcnt on the module, we no longer need RCU */
hook = rcu_pointer_handoff(hook);
} else {
WARN_ONCE(!hook, "%s has bad registration", mod);
hook = ERR_PTR(-ENOENT);
}
rcu_read_unlock();
if (!IS_ERR(hook)) {
err = hook->enable(link->net);
if (err) {
module_put(hook->owner);
hook = ERR_PTR(err);
}
}
return hook;
}
#endif
static int bpf_nf_enable_defrag(struct bpf_nf_link *link)
{
const struct nf_defrag_hook __maybe_unused *hook;
switch (link->hook_ops.pf) {
#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4)
case NFPROTO_IPV4:
hook = get_proto_defrag_hook(link, nf_defrag_v4_hook, "nf_defrag_ipv4");
if (IS_ERR(hook))
return PTR_ERR(hook);
link->defrag_hook = hook;
return 0;
#endif
#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
case NFPROTO_IPV6:
hook = get_proto_defrag_hook(link, nf_defrag_v6_hook, "nf_defrag_ipv6");
if (IS_ERR(hook))
return PTR_ERR(hook);
link->defrag_hook = hook;
return 0;
#endif
default:
return -EAFNOSUPPORT;
}
}
static void bpf_nf_disable_defrag(struct bpf_nf_link *link)
{
const struct nf_defrag_hook *hook = link->defrag_hook;
if (!hook)
return;
hook->disable(link->net);
module_put(hook->owner);
}
static void bpf_nf_link_release(struct bpf_link *link) static void bpf_nf_link_release(struct bpf_link *link)
{ {
struct bpf_nf_link *nf_link = container_of(link, struct bpf_nf_link, link); struct bpf_nf_link *nf_link = container_of(link, struct bpf_nf_link, link);
@@ -32,11 +116,11 @@ static void bpf_nf_link_release(struct bpf_link *link)
if (nf_link->dead) if (nf_link->dead)
return; return;
/* prevent hook-not-found warning splat from netfilter core when /* do not double release in case .detach was already called */
* .detach was already called if (!cmpxchg(&nf_link->dead, 0, 1)) {
*/
if (!cmpxchg(&nf_link->dead, 0, 1))
nf_unregister_net_hook(nf_link->net, &nf_link->hook_ops); nf_unregister_net_hook(nf_link->net, &nf_link->hook_ops);
bpf_nf_disable_defrag(nf_link);
}
} }
static void bpf_nf_link_dealloc(struct bpf_link *link) static void bpf_nf_link_dealloc(struct bpf_link *link)
@@ -92,6 +176,8 @@ static const struct bpf_link_ops bpf_nf_link_lops = {
static int bpf_nf_check_pf_and_hooks(const union bpf_attr *attr) static int bpf_nf_check_pf_and_hooks(const union bpf_attr *attr)
{ {
int prio;
switch (attr->link_create.netfilter.pf) { switch (attr->link_create.netfilter.pf) {
case NFPROTO_IPV4: case NFPROTO_IPV4:
case NFPROTO_IPV6: case NFPROTO_IPV6:
@@ -102,19 +188,18 @@ static int bpf_nf_check_pf_and_hooks(const union bpf_attr *attr)
return -EAFNOSUPPORT; return -EAFNOSUPPORT;
} }
if (attr->link_create.netfilter.flags) if (attr->link_create.netfilter.flags & ~BPF_F_NETFILTER_IP_DEFRAG)
return -EOPNOTSUPP; return -EOPNOTSUPP;
/* make sure conntrack confirm is always last. /* make sure conntrack confirm is always last */
* prio = attr->link_create.netfilter.priority;
* In the future, if userspace can e.g. request defrag, then if (prio == NF_IP_PRI_FIRST)
* "defrag_requested && prio before NF_IP_PRI_CONNTRACK_DEFRAG" return -ERANGE; /* sabotage_in and other warts */
* should fail. else if (prio == NF_IP_PRI_LAST)
*/ return -ERANGE; /* e.g. conntrack confirm */
switch (attr->link_create.netfilter.priority) { else if ((attr->link_create.netfilter.flags & BPF_F_NETFILTER_IP_DEFRAG) &&
case NF_IP_PRI_FIRST: return -ERANGE; /* sabotage_in and other warts */ prio <= NF_IP_PRI_CONNTRACK_DEFRAG)
case NF_IP_PRI_LAST: return -ERANGE; /* e.g. conntrack confirm */ return -ERANGE; /* cannot use defrag if prog runs before nf_defrag */
}
return 0; return 0;
} }
@@ -149,6 +234,7 @@ int bpf_nf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
link->net = net; link->net = net;
link->dead = false; link->dead = false;
link->defrag_hook = NULL;
err = bpf_link_prime(&link->link, &link_primer); err = bpf_link_prime(&link->link, &link_primer);
if (err) { if (err) {
@@ -156,8 +242,17 @@ int bpf_nf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
return err; return err;
} }
if (attr->link_create.netfilter.flags & BPF_F_NETFILTER_IP_DEFRAG) {
err = bpf_nf_enable_defrag(link);
if (err) {
bpf_link_cleanup(&link_primer);
return err;
}
}
err = nf_register_net_hook(net, &link->hook_ops); err = nf_register_net_hook(net, &link->hook_ops);
if (err) { if (err) {
bpf_nf_disable_defrag(link);
bpf_link_cleanup(&link_primer); bpf_link_cleanup(&link_primer);
return err; return err;
} }
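From userspace, requesting defragmentation is a single link_create flag. A minimal libbpf sketch, assuming a loaded program handle; the helper name and priority value are illustrative (any priority greater than NF_IP_PRI_CONNTRACK_DEFRAG and distinct from NF_IP_PRI_FIRST/NF_IP_PRI_LAST passes the check above):

#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <bpf/libbpf.h>

/* Hypothetical helper: attach prog at PRE_ROUTING with IPv4
 * defragmentation enabled; the kernel loads and pins the defrag
 * module when the link is created. */
static struct bpf_link *attach_with_defrag(struct bpf_program *prog)
{
	LIBBPF_OPTS(bpf_netfilter_opts, opts,
		.pf = NFPROTO_IPV4,
		.hooknum = NF_INET_PRE_ROUTING,
		.priority = 100,
		.flags = BPF_F_NETFILTER_IP_DEFRAG);

	return bpf_program__attach_netfilter(prog, &opts);
}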


@@ -14,6 +14,7 @@
#include <linux/types.h> #include <linux/types.h>
#include <linux/btf_ids.h> #include <linux/btf_ids.h>
#include <linux/net_namespace.h> #include <linux/net_namespace.h>
#include <net/xdp.h>
#include <net/netfilter/nf_conntrack_bpf.h> #include <net/netfilter/nf_conntrack_bpf.h>
#include <net/netfilter/nf_conntrack_core.h> #include <net/netfilter/nf_conntrack_core.h>


@@ -25,6 +25,7 @@
#include <linux/vmalloc.h> #include <linux/vmalloc.h>
#include <net/xdp_sock_drv.h> #include <net/xdp_sock_drv.h>
#include <net/busy_poll.h> #include <net/busy_poll.h>
#include <net/netdev_rx_queue.h>
#include <net/xdp.h> #include <net/xdp.h>
#include "xsk_queue.h" #include "xsk_queue.h"


@@ -19,6 +19,7 @@
/* ld/ldx fields */ /* ld/ldx fields */
#define BPF_DW 0x18 /* double word (64-bit) */ #define BPF_DW 0x18 /* double word (64-bit) */
#define BPF_MEMSX 0x80 /* load with sign extension */
#define BPF_ATOMIC 0xc0 /* atomic memory ops - op type in immediate */ #define BPF_ATOMIC 0xc0 /* atomic memory ops - op type in immediate */
#define BPF_XADD 0xc0 /* exclusive add - legacy name */ #define BPF_XADD 0xc0 /* exclusive add - legacy name */
@@ -1187,6 +1188,11 @@ enum bpf_perf_event_type {
*/ */
#define BPF_F_KPROBE_MULTI_RETURN (1U << 0) #define BPF_F_KPROBE_MULTI_RETURN (1U << 0)
/* link_create.netfilter.flags used in LINK_CREATE command for
* BPF_PROG_TYPE_NETFILTER to enable IP packet defragmentation.
*/
#define BPF_F_NETFILTER_IP_DEFRAG (1U << 0)
/* When BPF ldimm64's insn[0].src_reg != 0 then this can have /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
* the following extensions: * the following extensions:
* *
@@ -4198,9 +4204,6 @@ union bpf_attr {
* **-EOPNOTSUPP** if the operation is not supported, for example * **-EOPNOTSUPP** if the operation is not supported, for example
* a call from outside of TC ingress. * a call from outside of TC ingress.
* *
* **-ESOCKTNOSUPPORT** if the socket type is not supported
* (reuseport).
*
* long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags) * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
* Description * Description
* Helper is overloaded depending on BPF program type. This * Helper is overloaded depending on BPF program type. This
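BPF_MEMSX is the new sign-extending load mode (ldsx) from the cpu v4 instruction set. As a small illustration of why it exists, the following C, built with clang --target=bpf -mcpu=v4, can lower to a single sign-extended load instead of a zero-extending load plus an explicit shift pair (the exact codegen is a compiler assumption, not spelled out in this hunk):

/* Candidate for BPF_LDX | BPF_MEMSX | BPF_B under -mcpu=v4. */
long read_signed_byte(const signed char *p)
{
	return *p;	/* sign-extend 8 -> 64 bits */
}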


@@ -293,11 +293,11 @@ help:
@echo ' HINT: use "V=1" to enable verbose build' @echo ' HINT: use "V=1" to enable verbose build'
@echo ' all - build libraries and pkgconfig' @echo ' all - build libraries and pkgconfig'
@echo ' clean - remove all generated files' @echo ' clean - remove all generated files'
@echo ' check - check abi and version info' @echo ' check - check ABI and version info'
@echo '' @echo ''
@echo 'libbpf install targets:' @echo 'libbpf install targets:'
@echo ' HINT: use "prefix"(defaults to "/usr/local") or "DESTDIR" (defaults to "/")' @echo ' HINT: use "prefix"(defaults to "/usr/local") or "DESTDIR" (defaults to "/")'
@echo ' to adjust target desitantion, e.g. "make prefix=/usr/local install"' @echo ' to adjust target destination, e.g. "make prefix=/usr/local install"'
@echo ' install - build and install all headers, libraries and pkgconfig' @echo ' install - build and install all headers, libraries and pkgconfig'
@echo ' install_headers - install only headers to include/bpf' @echo ' install_headers - install only headers to include/bpf'
@echo '' @echo ''


@@ -13,6 +13,7 @@ test_dev_cgroup
/test_progs /test_progs
/test_progs-no_alu32 /test_progs-no_alu32
/test_progs-bpf_gcc /test_progs-bpf_gcc
/test_progs-cpuv4
test_verifier_log test_verifier_log
feature feature
test_sock test_sock
@@ -36,6 +37,7 @@ test_cpp
*.lskel.h *.lskel.h
/no_alu32 /no_alu32
/bpf_gcc /bpf_gcc
/cpuv4
/host-tools /host-tools
/tools /tools
/runqslower /runqslower


@@ -33,11 +33,16 @@ CFLAGS += -g -O0 -rdynamic -Wall -Werror $(GENFLAGS) $(SAN_CFLAGS) \
LDFLAGS += $(SAN_LDFLAGS) LDFLAGS += $(SAN_LDFLAGS)
LDLIBS += -lelf -lz -lrt -lpthread LDLIBS += -lelf -lz -lrt -lpthread
# Silence some warnings when compiled with clang
ifneq ($(LLVM),) ifneq ($(LLVM),)
# Silence some warnings when compiled with clang
CFLAGS += -Wno-unused-command-line-argument CFLAGS += -Wno-unused-command-line-argument
endif endif
# Check whether bpf cpu=v4 is supported or not by clang
ifneq ($(shell $(CLANG) --target=bpf -mcpu=help 2>&1 | grep 'v4'),)
CLANG_CPUV4 := 1
endif
# Order correspond to 'make run_tests' order # Order correspond to 'make run_tests' order
TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
test_dev_cgroup \ test_dev_cgroup \
@@ -51,6 +56,10 @@ ifneq ($(BPF_GCC),)
TEST_GEN_PROGS += test_progs-bpf_gcc TEST_GEN_PROGS += test_progs-bpf_gcc
endif endif
ifneq ($(CLANG_CPUV4),)
TEST_GEN_PROGS += test_progs-cpuv4
endif
TEST_GEN_FILES = test_lwt_ip_encap.bpf.o test_tc_edt.bpf.o TEST_GEN_FILES = test_lwt_ip_encap.bpf.o test_tc_edt.bpf.o
TEST_FILES = xsk_prereqs.sh $(wildcard progs/btf_dump_test_case_*.c) TEST_FILES = xsk_prereqs.sh $(wildcard progs/btf_dump_test_case_*.c)
@@ -383,6 +392,11 @@ define CLANG_NOALU32_BPF_BUILD_RULE
$(call msg,CLNG-BPF,$(TRUNNER_BINARY),$2) $(call msg,CLNG-BPF,$(TRUNNER_BINARY),$2)
$(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v2 -o $2 $(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v2 -o $2
endef endef
# Similar to CLANG_BPF_BUILD_RULE, but with cpu-v4
define CLANG_CPUV4_BPF_BUILD_RULE
$(call msg,CLNG-BPF,$(TRUNNER_BINARY),$2)
$(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v4 -o $2
endef
# Build BPF object using GCC # Build BPF object using GCC
define GCC_BPF_BUILD_RULE define GCC_BPF_BUILD_RULE
$(call msg,GCC-BPF,$(TRUNNER_BINARY),$2) $(call msg,GCC-BPF,$(TRUNNER_BINARY),$2)
@@ -425,7 +439,7 @@ LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.c,$(foreach skel,$(LINKED_SKELS),$($(ske
# $eval()) and pass control to DEFINE_TEST_RUNNER_RULES. # $eval()) and pass control to DEFINE_TEST_RUNNER_RULES.
# Parameters: # Parameters:
# $1 - test runner base binary name (e.g., test_progs) # $1 - test runner base binary name (e.g., test_progs)
# $2 - test runner extra "flavor" (e.g., no_alu32, gcc-bpf, etc) # $2 - test runner extra "flavor" (e.g., no_alu32, cpuv4, gcc-bpf, etc)
define DEFINE_TEST_RUNNER define DEFINE_TEST_RUNNER
TRUNNER_OUTPUT := $(OUTPUT)$(if $2,/)$2 TRUNNER_OUTPUT := $(OUTPUT)$(if $2,/)$2
@@ -453,7 +467,7 @@ endef
# Using TRUNNER_XXX variables, provided by callers of DEFINE_TEST_RUNNER and # Using TRUNNER_XXX variables, provided by callers of DEFINE_TEST_RUNNER and
# set up by DEFINE_TEST_RUNNER itself, create test runner build rules with: # set up by DEFINE_TEST_RUNNER itself, create test runner build rules with:
# $1 - test runner base binary name (e.g., test_progs) # $1 - test runner base binary name (e.g., test_progs)
# $2 - test runner extra "flavor" (e.g., no_alu32, gcc-bpf, etc) # $2 - test runner extra "flavor" (e.g., no_alu32, cpuv4, gcc-bpf, etc)
define DEFINE_TEST_RUNNER_RULES define DEFINE_TEST_RUNNER_RULES
ifeq ($($(TRUNNER_OUTPUT)-dir),) ifeq ($($(TRUNNER_OUTPUT)-dir),)
@@ -565,8 +579,8 @@ TRUNNER_EXTRA_SOURCES := test_progs.c cgroup_helpers.c trace_helpers.c \
network_helpers.c testing_helpers.c \ network_helpers.c testing_helpers.c \
btf_helpers.c flow_dissector_load.h \ btf_helpers.c flow_dissector_load.h \
cap_helpers.c test_loader.c xsk.c disasm.c \ cap_helpers.c test_loader.c xsk.c disasm.c \
json_writer.c unpriv_helpers.c json_writer.c unpriv_helpers.c \
ip_check_defrag_frags.h
TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko \ TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko \
$(OUTPUT)/liburandom_read.so \ $(OUTPUT)/liburandom_read.so \
$(OUTPUT)/xdp_synproxy \ $(OUTPUT)/xdp_synproxy \
@@ -584,6 +598,13 @@ TRUNNER_BPF_BUILD_RULE := CLANG_NOALU32_BPF_BUILD_RULE
TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS) TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS)
$(eval $(call DEFINE_TEST_RUNNER,test_progs,no_alu32)) $(eval $(call DEFINE_TEST_RUNNER,test_progs,no_alu32))
# Define test_progs-cpuv4 test runner.
ifneq ($(CLANG_CPUV4),)
TRUNNER_BPF_BUILD_RULE := CLANG_CPUV4_BPF_BUILD_RULE
TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS)
$(eval $(call DEFINE_TEST_RUNNER,test_progs,cpuv4))
endif
# Define test_progs BPF-GCC-flavored test runner. # Define test_progs BPF-GCC-flavored test runner.
ifneq ($(BPF_GCC),) ifneq ($(BPF_GCC),)
TRUNNER_BPF_BUILD_RULE := GCC_BPF_BUILD_RULE TRUNNER_BPF_BUILD_RULE := GCC_BPF_BUILD_RULE
@@ -681,7 +702,7 @@ EXTRA_CLEAN := $(TEST_CUSTOM_PROGS) $(SCRATCH_DIR) $(HOST_SCRATCH_DIR) \
prog_tests/tests.h map_tests/tests.h verifier/tests.h \ prog_tests/tests.h map_tests/tests.h verifier/tests.h \
feature bpftool \ feature bpftool \
$(addprefix $(OUTPUT)/,*.o *.skel.h *.lskel.h *.subskel.h \ $(addprefix $(OUTPUT)/,*.o *.skel.h *.lskel.h *.subskel.h \
no_alu32 bpf_gcc bpf_testmod.ko \ no_alu32 cpuv4 bpf_gcc bpf_testmod.ko \
liburandom_read.so) liburandom_read.so)
.PHONY: docs docs-clean .PHONY: docs docs-clean


@@ -98,6 +98,12 @@ bpf_testmod_test_struct_arg_8(u64 a, void *b, short c, int d, void *e,
return bpf_testmod_test_struct_arg_result; return bpf_testmod_test_struct_arg_result;
} }
noinline int
bpf_testmod_test_arg_ptr_to_struct(struct bpf_testmod_struct_arg_1 *a) {
bpf_testmod_test_struct_arg_result = a->a;
return bpf_testmod_test_struct_arg_result;
}
__bpf_kfunc void __bpf_kfunc void
bpf_testmod_test_mod_kfunc(int i) bpf_testmod_test_mod_kfunc(int i)
{ {
@@ -240,7 +246,7 @@ bpf_testmod_test_read(struct file *file, struct kobject *kobj,
.off = off, .off = off,
.len = len, .len = len,
}; };
struct bpf_testmod_struct_arg_1 struct_arg1 = {10}; struct bpf_testmod_struct_arg_1 struct_arg1 = {10}, struct_arg1_2 = {-1};
struct bpf_testmod_struct_arg_2 struct_arg2 = {2, 3}; struct bpf_testmod_struct_arg_2 struct_arg2 = {2, 3};
struct bpf_testmod_struct_arg_3 *struct_arg3; struct bpf_testmod_struct_arg_3 *struct_arg3;
struct bpf_testmod_struct_arg_4 struct_arg4 = {21, 22}; struct bpf_testmod_struct_arg_4 struct_arg4 = {21, 22};
@@ -259,6 +265,7 @@ bpf_testmod_test_read(struct file *file, struct kobject *kobj,
(void)bpf_testmod_test_struct_arg_8(16, (void *)17, 18, 19, (void)bpf_testmod_test_struct_arg_8(16, (void *)17, 18, 19,
(void *)20, struct_arg4, 23); (void *)20, struct_arg4, 23);
(void)bpf_testmod_test_arg_ptr_to_struct(&struct_arg1_2);
struct_arg3 = kmalloc((sizeof(struct bpf_testmod_struct_arg_3) + struct_arg3 = kmalloc((sizeof(struct bpf_testmod_struct_arg_3) +
sizeof(int)), GFP_KERNEL); sizeof(int)), GFP_KERNEL);


@@ -0,0 +1,90 @@
#!/bin/env python3
# SPDX-License-Identifier: GPL-2.0
"""
This script helps generate fragmented UDP packets.
While it is technically possible to generate fragmented
packets dynamically in C, such code is much harder to read
and write. `scapy` is a de facto industry standard and is
easy to read and write, so we use this script to generate a
valid C header instead. Rerun the script and commit the
generated file after any modifications.
"""
import argparse
import os
from scapy.all import *
# These constants must stay in sync with `ip_check_defrag.c`
VETH1_ADDR = "172.16.1.200"
VETH0_ADDR6 = "fc00::100"
VETH1_ADDR6 = "fc00::200"
CLIENT_PORT = 48878
SERVER_PORT = 48879
MAGIC_MESSAGE = "THIS IS THE ORIGINAL MESSAGE, PLEASE REASSEMBLE ME"
def print_header(f):
f.write("// SPDX-License-Identifier: GPL-2.0\n")
f.write("/* DO NOT EDIT -- this file is generated */\n")
f.write("\n")
f.write("#ifndef _IP_CHECK_DEFRAG_FRAGS_H\n")
f.write("#define _IP_CHECK_DEFRAG_FRAGS_H\n")
f.write("\n")
f.write("#include <stdint.h>\n")
f.write("\n")
def print_frags(f, frags, v6):
for idx, frag in enumerate(frags):
# 10 bytes per line to keep width in check
chunks = [frag[i : i + 10] for i in range(0, len(frag), 10)]
chunks_fmted = [", ".join([str(hex(b)) for b in chunk]) for chunk in chunks]
suffix = "6" if v6 else ""
f.write(f"static uint8_t frag{suffix}_{idx}[] = {{\n")
for chunk in chunks_fmted:
f.write(f"\t{chunk},\n")
f.write(f"}};\n")
def print_trailer(f):
f.write("\n")
f.write("#endif /* _IP_CHECK_DEFRAG_FRAGS_H */\n")
def main(f):
# srcip of 0 is filled in by IP_HDRINCL
sip = "0.0.0.0"
sip6 = VETH0_ADDR6
dip = VETH1_ADDR
dip6 = VETH1_ADDR6
sport = CLIENT_PORT
dport = SERVER_PORT
payload = MAGIC_MESSAGE.encode()
# Disable UDPv4 checksums to keep code simpler
pkt = IP(src=sip,dst=dip) / UDP(sport=sport,dport=dport,chksum=0) / Raw(load=payload)
# UDPv6 requires a checksum
# Also pin the ipv6 fragment header ID, otherwise it's a random value
pkt6 = IPv6(src=sip6,dst=dip6) / IPv6ExtHdrFragment(id=0xBEEF) / UDP(sport=sport,dport=dport) / Raw(load=payload)
frags = [f.build() for f in pkt.fragment(24)]
frags6 = [f.build() for f in fragment6(pkt6, 72)]
print_header(f)
print_frags(f, frags, False)
print_frags(f, frags6, True)
print_trailer(f)
if __name__ == "__main__":
dir = os.path.dirname(os.path.realpath(__file__))
header = f"{dir}/ip_check_defrag_frags.h"
with open(header, "w") as f:
main(f)


@@ -0,0 +1,57 @@
// SPDX-License-Identifier: GPL-2.0
/* DO NOT EDIT -- this file is generated */
#ifndef _IP_CHECK_DEFRAG_FRAGS_H
#define _IP_CHECK_DEFRAG_FRAGS_H
#include <stdint.h>
static uint8_t frag_0[] = {
0x45, 0x0, 0x0, 0x2c, 0x0, 0x1, 0x20, 0x0, 0x40, 0x11,
0xac, 0xe8, 0x0, 0x0, 0x0, 0x0, 0xac, 0x10, 0x1, 0xc8,
0xbe, 0xee, 0xbe, 0xef, 0x0, 0x3a, 0x0, 0x0, 0x54, 0x48,
0x49, 0x53, 0x20, 0x49, 0x53, 0x20, 0x54, 0x48, 0x45, 0x20,
0x4f, 0x52, 0x49, 0x47,
};
static uint8_t frag_1[] = {
0x45, 0x0, 0x0, 0x2c, 0x0, 0x1, 0x20, 0x3, 0x40, 0x11,
0xac, 0xe5, 0x0, 0x0, 0x0, 0x0, 0xac, 0x10, 0x1, 0xc8,
0x49, 0x4e, 0x41, 0x4c, 0x20, 0x4d, 0x45, 0x53, 0x53, 0x41,
0x47, 0x45, 0x2c, 0x20, 0x50, 0x4c, 0x45, 0x41, 0x53, 0x45,
0x20, 0x52, 0x45, 0x41,
};
static uint8_t frag_2[] = {
0x45, 0x0, 0x0, 0x1e, 0x0, 0x1, 0x0, 0x6, 0x40, 0x11,
0xcc, 0xf0, 0x0, 0x0, 0x0, 0x0, 0xac, 0x10, 0x1, 0xc8,
0x53, 0x53, 0x45, 0x4d, 0x42, 0x4c, 0x45, 0x20, 0x4d, 0x45,
};
static uint8_t frag6_0[] = {
0x60, 0x0, 0x0, 0x0, 0x0, 0x20, 0x2c, 0x40, 0xfc, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x1, 0x0, 0xfc, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2, 0x0,
0x11, 0x0, 0x0, 0x1, 0x0, 0x0, 0xbe, 0xef, 0xbe, 0xee,
0xbe, 0xef, 0x0, 0x3a, 0xd0, 0xf8, 0x54, 0x48, 0x49, 0x53,
0x20, 0x49, 0x53, 0x20, 0x54, 0x48, 0x45, 0x20, 0x4f, 0x52,
0x49, 0x47,
};
static uint8_t frag6_1[] = {
0x60, 0x0, 0x0, 0x0, 0x0, 0x20, 0x2c, 0x40, 0xfc, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x1, 0x0, 0xfc, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2, 0x0,
0x11, 0x0, 0x0, 0x19, 0x0, 0x0, 0xbe, 0xef, 0x49, 0x4e,
0x41, 0x4c, 0x20, 0x4d, 0x45, 0x53, 0x53, 0x41, 0x47, 0x45,
0x2c, 0x20, 0x50, 0x4c, 0x45, 0x41, 0x53, 0x45, 0x20, 0x52,
0x45, 0x41,
};
static uint8_t frag6_2[] = {
0x60, 0x0, 0x0, 0x0, 0x0, 0x12, 0x2c, 0x40, 0xfc, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x1, 0x0, 0xfc, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2, 0x0,
0x11, 0x0, 0x0, 0x30, 0x0, 0x0, 0xbe, 0xef, 0x53, 0x53,
0x45, 0x4d, 0x42, 0x4c, 0x45, 0x20, 0x4d, 0x45,
};
#endif /* _IP_CHECK_DEFRAG_FRAGS_H */


@@ -270,15 +270,24 @@ int connect_to_fd_opts(int server_fd, const struct network_helper_opts *opts)
opts = &default_opts; opts = &default_opts;
optlen = sizeof(type); optlen = sizeof(type);
if (opts->type) {
type = opts->type;
} else {
if (getsockopt(server_fd, SOL_SOCKET, SO_TYPE, &type, &optlen)) { if (getsockopt(server_fd, SOL_SOCKET, SO_TYPE, &type, &optlen)) {
log_err("getsockopt(SOL_TYPE)"); log_err("getsockopt(SOL_TYPE)");
return -1; return -1;
} }
}
if (opts->proto) {
protocol = opts->proto;
} else {
if (getsockopt(server_fd, SOL_SOCKET, SO_PROTOCOL, &protocol, &optlen)) { if (getsockopt(server_fd, SOL_SOCKET, SO_PROTOCOL, &protocol, &optlen)) {
log_err("getsockopt(SOL_PROTOCOL)"); log_err("getsockopt(SOL_PROTOCOL)");
return -1; return -1;
} }
}
addrlen = sizeof(addr); addrlen = sizeof(addr);
if (getsockname(server_fd, (struct sockaddr *)&addr, &addrlen)) { if (getsockname(server_fd, (struct sockaddr *)&addr, &addrlen)) {
@@ -301,6 +310,7 @@ int connect_to_fd_opts(int server_fd, const struct network_helper_opts *opts)
strlen(opts->cc) + 1)) strlen(opts->cc) + 1))
goto error_close; goto error_close;
if (!opts->noconnect)
if (connect_fd_to_addr(fd, &addr, addrlen, opts->must_fail)) if (connect_fd_to_addr(fd, &addr, addrlen, opts->must_fail))
goto error_close; goto error_close;
@@ -423,6 +433,9 @@ fail:
void close_netns(struct nstoken *token) void close_netns(struct nstoken *token)
{ {
if (!token)
return;
ASSERT_OK(setns(token->orig_netns_fd, CLONE_NEWNET), "setns"); ASSERT_OK(setns(token->orig_netns_fd, CLONE_NEWNET), "setns");
close(token->orig_netns_fd); close(token->orig_netns_fd);
free(token); free(token);


@@ -21,6 +21,9 @@ struct network_helper_opts {
const char *cc; const char *cc;
int timeout_ms; int timeout_ms;
bool must_fail; bool must_fail;
bool noconnect;
int type;
int proto;
}; };
/* ipv4 test vector */ /* ipv4 test vector */


@@ -0,0 +1,199 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2023 Isovalent */
#include <uapi/linux/if_link.h>
#include <test_progs.h>
#include <netinet/tcp.h>
#include <netinet/udp.h>
#include "network_helpers.h"
#include "test_assign_reuse.skel.h"
#define NS_TEST "assign_reuse"
#define LOOPBACK 1
#define PORT 4443
static int attach_reuseport(int sock_fd, int prog_fd)
{
return setsockopt(sock_fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF,
&prog_fd, sizeof(prog_fd));
}
static __u64 cookie(int fd)
{
__u64 cookie = 0;
socklen_t cookie_len = sizeof(cookie);
int ret;
ret = getsockopt(fd, SOL_SOCKET, SO_COOKIE, &cookie, &cookie_len);
ASSERT_OK(ret, "cookie");
ASSERT_GT(cookie, 0, "cookie_invalid");
return cookie;
}
static int echo_test_udp(int fd_sv)
{
struct sockaddr_storage addr = {};
socklen_t len = sizeof(addr);
char buff[1] = {};
int fd_cl = -1, ret;
fd_cl = connect_to_fd(fd_sv, 100);
ASSERT_GT(fd_cl, 0, "create_client");
ASSERT_EQ(getsockname(fd_cl, (void *)&addr, &len), 0, "getsockname");
ASSERT_EQ(send(fd_cl, buff, sizeof(buff), 0), 1, "send_client");
ret = recv(fd_sv, buff, sizeof(buff), 0);
if (ret < 0) {
close(fd_cl);
return errno;
}
ASSERT_EQ(ret, 1, "recv_server");
ASSERT_EQ(sendto(fd_sv, buff, sizeof(buff), 0, (void *)&addr, len), 1, "send_server");
ASSERT_EQ(recv(fd_cl, buff, sizeof(buff), 0), 1, "recv_client");
close(fd_cl);
return 0;
}
static int echo_test_tcp(int fd_sv)
{
char buff[1] = {};
int fd_cl = -1, fd_sv_cl = -1;
fd_cl = connect_to_fd(fd_sv, 100);
if (fd_cl < 0)
return errno;
fd_sv_cl = accept(fd_sv, NULL, NULL);
ASSERT_GE(fd_sv_cl, 0, "accept_fd");
ASSERT_EQ(send(fd_cl, buff, sizeof(buff), 0), 1, "send_client");
ASSERT_EQ(recv(fd_sv_cl, buff, sizeof(buff), 0), 1, "recv_server");
ASSERT_EQ(send(fd_sv_cl, buff, sizeof(buff), 0), 1, "send_server");
ASSERT_EQ(recv(fd_cl, buff, sizeof(buff), 0), 1, "recv_client");
close(fd_sv_cl);
close(fd_cl);
return 0;
}
void run_assign_reuse(int family, int sotype, const char *ip, __u16 port)
{
DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
.ifindex = LOOPBACK,
.attach_point = BPF_TC_INGRESS,
);
DECLARE_LIBBPF_OPTS(bpf_tc_opts, tc_opts,
.handle = 1,
.priority = 1,
);
bool hook_created = false, tc_attached = false;
int ret, fd_tc, fd_accept, fd_drop, fd_map;
int *fd_sv = NULL;
__u64 fd_val;
struct test_assign_reuse *skel;
const int zero = 0;
skel = test_assign_reuse__open();
if (!ASSERT_OK_PTR(skel, "skel_open"))
goto cleanup;
skel->rodata->dest_port = port;
ret = test_assign_reuse__load(skel);
if (!ASSERT_OK(ret, "skel_load"))
goto cleanup;
ASSERT_EQ(skel->bss->sk_cookie_seen, 0, "cookie_init");
fd_tc = bpf_program__fd(skel->progs.tc_main);
fd_accept = bpf_program__fd(skel->progs.reuse_accept);
fd_drop = bpf_program__fd(skel->progs.reuse_drop);
fd_map = bpf_map__fd(skel->maps.sk_map);
fd_sv = start_reuseport_server(family, sotype, ip, port, 100, 1);
if (!ASSERT_NEQ(fd_sv, NULL, "start_reuseport_server"))
goto cleanup;
ret = attach_reuseport(*fd_sv, fd_drop);
if (!ASSERT_OK(ret, "attach_reuseport"))
goto cleanup;
fd_val = *fd_sv;
ret = bpf_map_update_elem(fd_map, &zero, &fd_val, BPF_NOEXIST);
if (!ASSERT_OK(ret, "bpf_sk_map"))
goto cleanup;
ret = bpf_tc_hook_create(&tc_hook);
if (ret == 0)
hook_created = true;
ret = ret == -EEXIST ? 0 : ret;
if (!ASSERT_OK(ret, "bpf_tc_hook_create"))
goto cleanup;
tc_opts.prog_fd = fd_tc;
ret = bpf_tc_attach(&tc_hook, &tc_opts);
if (!ASSERT_OK(ret, "bpf_tc_attach"))
goto cleanup;
tc_attached = true;
if (sotype == SOCK_STREAM)
ASSERT_EQ(echo_test_tcp(*fd_sv), ECONNREFUSED, "drop_tcp");
else
ASSERT_EQ(echo_test_udp(*fd_sv), EAGAIN, "drop_udp");
ASSERT_EQ(skel->bss->reuseport_executed, 1, "program executed once");
skel->bss->sk_cookie_seen = 0;
skel->bss->reuseport_executed = 0;
ASSERT_OK(attach_reuseport(*fd_sv, fd_accept), "attach_reuseport(accept)");
if (sotype == SOCK_STREAM)
ASSERT_EQ(echo_test_tcp(*fd_sv), 0, "echo_tcp");
else
ASSERT_EQ(echo_test_udp(*fd_sv), 0, "echo_udp");
ASSERT_EQ(skel->bss->sk_cookie_seen, cookie(*fd_sv),
"cookie_mismatch");
ASSERT_EQ(skel->bss->reuseport_executed, 1, "program executed once");
cleanup:
if (tc_attached) {
tc_opts.flags = tc_opts.prog_fd = tc_opts.prog_id = 0;
ret = bpf_tc_detach(&tc_hook, &tc_opts);
ASSERT_OK(ret, "bpf_tc_detach");
}
if (hook_created) {
tc_hook.attach_point = BPF_TC_INGRESS | BPF_TC_EGRESS;
bpf_tc_hook_destroy(&tc_hook);
}
test_assign_reuse__destroy(skel);
free_fds(fd_sv, 1);
}
void test_assign_reuse(void)
{
struct nstoken *tok = NULL;
SYS(out, "ip netns add %s", NS_TEST);
SYS(cleanup, "ip -net %s link set dev lo up", NS_TEST);
tok = open_netns(NS_TEST);
if (!ASSERT_OK_PTR(tok, "netns token"))
return;
if (test__start_subtest("tcpv4"))
run_assign_reuse(AF_INET, SOCK_STREAM, "127.0.0.1", PORT);
if (test__start_subtest("tcpv6"))
run_assign_reuse(AF_INET6, SOCK_STREAM, "::1", PORT);
if (test__start_subtest("udpv4"))
run_assign_reuse(AF_INET, SOCK_DGRAM, "127.0.0.1", PORT);
if (test__start_subtest("udpv6"))
run_assign_reuse(AF_INET6, SOCK_DGRAM, "::1", PORT);
cleanup:
close_netns(tok);
SYS_NOFAIL("ip netns delete %s", NS_TEST);
out:
return;
}
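The BPF side of this test (progs/test_assign_reuse.c) is not part of this view. Conceptually, the TC program only needs to fetch the listener from sk_map and assign it; a sketch under that assumption, with packet parsing and the reuse_accept/reuse_drop reuseport programs omitted:

/* Sketch of a TC program assigning a (possibly SO_REUSEPORT) listener
 * taken from a sockmap; not the actual selftest source. */
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} sk_map SEC(".maps");

SEC("tc")
int tc_main(struct __sk_buff *skb)
{
	const __u32 zero = 0;
	struct bpf_sock *sk;
	long err;

	sk = bpf_map_lookup_elem(&sk_map, &zero);
	if (!sk)
		return TC_ACT_SHOT;

	/* With this series the assigned socket may be a reuseport
	 * listener; the final member is picked at receive time by
	 * inet_steal_sock() running the SK_REUSEPORT program. */
	err = bpf_sk_assign(skb, sk, 0);
	bpf_sk_release(sk);
	return err ? TC_ACT_SHOT : TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";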


@@ -0,0 +1,283 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include <net/if.h>
#include <linux/netfilter.h>
#include <network_helpers.h>
#include "ip_check_defrag.skel.h"
#include "ip_check_defrag_frags.h"
/*
* This selftest spins up a client and an echo server, each in their own
* network namespace. The client will send a fragmented message to the server.
* The prog attached to the server will shoot down any fragments. Thus, if
* the server is able to correctly echo back the message to the client, we will
* have verified that netfilter is reassembling packets for us.
*
* Topology:
* =========
* NS0 | NS1
* |
* client | server
* ---------- | ----------
* | veth0 | --------- | veth1 |
* ---------- peer ----------
* |
* | with bpf
*/
#define NS0 "defrag_ns0"
#define NS1 "defrag_ns1"
#define VETH0 "veth0"
#define VETH1 "veth1"
#define VETH0_ADDR "172.16.1.100"
#define VETH0_ADDR6 "fc00::100"
/* The following constants must stay in sync with `generate_udp_fragments.py` */
#define VETH1_ADDR "172.16.1.200"
#define VETH1_ADDR6 "fc00::200"
#define CLIENT_PORT 48878
#define SERVER_PORT 48879
#define MAGIC_MESSAGE "THIS IS THE ORIGINAL MESSAGE, PLEASE REASSEMBLE ME"
static int setup_topology(bool ipv6)
{
bool up;
int i;
SYS(fail, "ip netns add " NS0);
SYS(fail, "ip netns add " NS1);
SYS(fail, "ip link add " VETH0 " netns " NS0 " type veth peer name " VETH1 " netns " NS1);
if (ipv6) {
SYS(fail, "ip -6 -net " NS0 " addr add " VETH0_ADDR6 "/64 dev " VETH0 " nodad");
SYS(fail, "ip -6 -net " NS1 " addr add " VETH1_ADDR6 "/64 dev " VETH1 " nodad");
} else {
SYS(fail, "ip -net " NS0 " addr add " VETH0_ADDR "/24 dev " VETH0);
SYS(fail, "ip -net " NS1 " addr add " VETH1_ADDR "/24 dev " VETH1);
}
SYS(fail, "ip -net " NS0 " link set dev " VETH0 " up");
SYS(fail, "ip -net " NS1 " link set dev " VETH1 " up");
/* Wait for up to 5s for links to come up */
for (i = 0; i < 5; ++i) {
if (ipv6)
up = !system("ip netns exec " NS0 " ping -6 -c 1 -W 1 " VETH1_ADDR6 " &>/dev/null");
else
up = !system("ip netns exec " NS0 " ping -c 1 -W 1 " VETH1_ADDR " &>/dev/null");
if (up)
break;
}
return 0;
fail:
return -1;
}
static void cleanup_topology(void)
{
SYS_NOFAIL("test -f /var/run/netns/" NS0 " && ip netns delete " NS0);
SYS_NOFAIL("test -f /var/run/netns/" NS1 " && ip netns delete " NS1);
}
static int attach(struct ip_check_defrag *skel, bool ipv6)
{
LIBBPF_OPTS(bpf_netfilter_opts, opts,
.pf = ipv6 ? NFPROTO_IPV6 : NFPROTO_IPV4,
.priority = 42,
.flags = BPF_F_NETFILTER_IP_DEFRAG);
struct nstoken *nstoken;
int err = -1;
nstoken = open_netns(NS1);
skel->links.defrag = bpf_program__attach_netfilter(skel->progs.defrag, &opts);
if (!ASSERT_OK_PTR(skel->links.defrag, "program attach"))
goto out;
err = 0;
out:
close_netns(nstoken);
return err;
}
static int send_frags(int client)
{
struct sockaddr_storage saddr;
struct sockaddr *saddr_p;
socklen_t saddr_len;
int err;
saddr_p = (struct sockaddr *)&saddr;
err = make_sockaddr(AF_INET, VETH1_ADDR, SERVER_PORT, &saddr, &saddr_len);
if (!ASSERT_OK(err, "make_sockaddr"))
return -1;
err = sendto(client, frag_0, sizeof(frag_0), 0, saddr_p, saddr_len);
if (!ASSERT_GE(err, 0, "sendto frag_0"))
return -1;
err = sendto(client, frag_1, sizeof(frag_1), 0, saddr_p, saddr_len);
if (!ASSERT_GE(err, 0, "sendto frag_1"))
return -1;
err = sendto(client, frag_2, sizeof(frag_2), 0, saddr_p, saddr_len);
if (!ASSERT_GE(err, 0, "sendto frag_2"))
return -1;
return 0;
}
static int send_frags6(int client)
{
struct sockaddr_storage saddr;
struct sockaddr *saddr_p;
socklen_t saddr_len;
int err;
saddr_p = (struct sockaddr *)&saddr;
/* The port must be set to 0 for a raw ipv6 socket; sendto() fails otherwise */
err = make_sockaddr(AF_INET6, VETH1_ADDR6, 0, &saddr, &saddr_len);
if (!ASSERT_OK(err, "make_sockaddr"))
return -1;
err = sendto(client, frag6_0, sizeof(frag6_0), 0, saddr_p, saddr_len);
if (!ASSERT_GE(err, 0, "sendto frag6_0"))
return -1;
err = sendto(client, frag6_1, sizeof(frag6_1), 0, saddr_p, saddr_len);
if (!ASSERT_GE(err, 0, "sendto frag6_1"))
return -1;
err = sendto(client, frag6_2, sizeof(frag6_2), 0, saddr_p, saddr_len);
if (!ASSERT_GE(err, 0, "sendto frag6_2"))
return -1;
return 0;
}
void test_bpf_ip_check_defrag_ok(bool ipv6)
{
struct network_helper_opts rx_opts = {
.timeout_ms = 1000,
.noconnect = true,
};
struct network_helper_opts tx_ops = {
.timeout_ms = 1000,
.type = SOCK_RAW,
.proto = IPPROTO_RAW,
.noconnect = true,
};
struct sockaddr_storage caddr;
struct ip_check_defrag *skel;
struct nstoken *nstoken;
int client_tx_fd = -1;
int client_rx_fd = -1;
socklen_t caddr_len;
int srv_fd = -1;
char buf[1024];
int len, err;
skel = ip_check_defrag__open_and_load();
if (!ASSERT_OK_PTR(skel, "skel_open"))
return;
if (!ASSERT_OK(setup_topology(ipv6), "setup_topology"))
goto out;
if (!ASSERT_OK(attach(skel, ipv6), "attach"))
goto out;
/* Start server in ns1 */
nstoken = open_netns(NS1);
if (!ASSERT_OK_PTR(nstoken, "setns ns1"))
goto out;
srv_fd = start_server(ipv6 ? AF_INET6 : AF_INET, SOCK_DGRAM, NULL, SERVER_PORT, 0);
close_netns(nstoken);
if (!ASSERT_GE(srv_fd, 0, "start_server"))
goto out;
/* Open tx raw socket in ns0 */
nstoken = open_netns(NS0);
if (!ASSERT_OK_PTR(nstoken, "setns ns0"))
goto out;
client_tx_fd = connect_to_fd_opts(srv_fd, &tx_ops);
close_netns(nstoken);
if (!ASSERT_GE(client_tx_fd, 0, "connect_to_fd_opts"))
goto out;
/* Open rx socket in ns0 */
nstoken = open_netns(NS0);
if (!ASSERT_OK_PTR(nstoken, "setns ns0"))
goto out;
client_rx_fd = connect_to_fd_opts(srv_fd, &rx_opts);
close_netns(nstoken);
if (!ASSERT_GE(client_rx_fd, 0, "connect_to_fd_opts"))
goto out;
/* Bind rx socket to a predetermined port */
memset(&caddr, 0, sizeof(caddr));
nstoken = open_netns(NS0);
if (!ASSERT_OK_PTR(nstoken, "setns ns0"))
goto out;
if (ipv6) {
struct sockaddr_in6 *c = (struct sockaddr_in6 *)&caddr;
c->sin6_family = AF_INET6;
inet_pton(AF_INET6, VETH0_ADDR6, &c->sin6_addr);
c->sin6_port = htons(CLIENT_PORT);
err = bind(client_rx_fd, (struct sockaddr *)c, sizeof(*c));
} else {
struct sockaddr_in *c = (struct sockaddr_in *)&caddr;
c->sin_family = AF_INET;
inet_pton(AF_INET, VETH0_ADDR, &c->sin_addr);
c->sin_port = htons(CLIENT_PORT);
err = bind(client_rx_fd, (struct sockaddr *)c, sizeof(*c));
}
close_netns(nstoken);
if (!ASSERT_OK(err, "bind"))
goto out;
/* Send message in fragments */
if (ipv6) {
if (!ASSERT_OK(send_frags6(client_tx_fd), "send_frags6"))
goto out;
} else {
if (!ASSERT_OK(send_frags(client_tx_fd), "send_frags"))
goto out;
}
if (!ASSERT_EQ(skel->bss->shootdowns, 0, "shootdowns"))
goto out;
/* Receive reassembled msg on server and echo back to client */
caddr_len = sizeof(caddr);
len = recvfrom(srv_fd, buf, sizeof(buf), 0, (struct sockaddr *)&caddr, &caddr_len);
if (!ASSERT_GE(len, 0, "server recvfrom"))
goto out;
len = sendto(srv_fd, buf, len, 0, (struct sockaddr *)&caddr, caddr_len);
if (!ASSERT_GE(len, 0, "server sendto"))
goto out;
/* Expect reassembled message to be echoed back */
len = recvfrom(client_rx_fd, buf, sizeof(buf), 0, NULL, NULL);
if (!ASSERT_EQ(len, sizeof(MAGIC_MESSAGE) - 1, "client short read"))
goto out;
out:
if (client_rx_fd != -1)
close(client_rx_fd);
if (client_tx_fd != -1)
close(client_tx_fd);
if (srv_fd != -1)
close(srv_fd);
cleanup_topology();
ip_check_defrag__destroy(skel);
}
void test_bpf_ip_check_defrag(void)
{
if (test__start_subtest("v4"))
test_bpf_ip_check_defrag_ok(false);
if (test__start_subtest("v6"))
test_bpf_ip_check_defrag_ok(true);
}
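The defrag program itself (progs/ip_check_defrag.c) is likewise not shown here. A minimal sketch of the IPv4 half, assuming the skb dynptr kfuncs are usable from netfilter programs as this series enables; names mirror the test above, but this is not the actual source:

/* Hypothetical netfilter prog: if defrag works we should never see a
 * fragment here, so count and drop any that slip through. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define NF_DROP		0
#define NF_ACCEPT	1
#define IP_MF		0x2000	/* "more fragments" flag */
#define IP_OFFSET	0x1fff	/* fragment offset mask */

extern int bpf_dynptr_from_skb(struct sk_buff *skb, __u64 flags,
			       struct bpf_dynptr *ptr__uninit) __ksym;
extern void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, __u32 offset,
			      void *buffer__opt, __u32 buffer__szk) __ksym;

__u64 shootdowns = 0;

SEC("netfilter")
int defrag(struct bpf_nf_ctx *ctx)
{
	struct bpf_dynptr ptr;
	struct iphdr buf, *iph;

	if (bpf_dynptr_from_skb(ctx->skb, 0, &ptr))
		return NF_ACCEPT;
	iph = bpf_dynptr_slice(&ptr, 0, &buf, sizeof(buf));
	if (!iph)
		return NF_ACCEPT;
	if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET)) {
		shootdowns++;
		return NF_DROP;
	}
	return NF_ACCEPT;
}

char _license[] SEC("license") = "GPL";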


@@ -0,0 +1,139 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates.*/
#include <test_progs.h>
#include <network_helpers.h>
#include "test_ldsx_insn.skel.h"
static void test_map_val_and_probed_memory(void)
{
struct test_ldsx_insn *skel;
int err;
skel = test_ldsx_insn__open();
if (!ASSERT_OK_PTR(skel, "test_ldsx_insn__open"))
return;
if (skel->rodata->skip) {
test__skip();
goto out;
}
bpf_program__set_autoload(skel->progs.rdonly_map_prog, true);
bpf_program__set_autoload(skel->progs.map_val_prog, true);
bpf_program__set_autoload(skel->progs.test_ptr_struct_arg, true);
err = test_ldsx_insn__load(skel);
if (!ASSERT_OK(err, "test_ldsx_insn__load"))
goto out;
err = test_ldsx_insn__attach(skel);
if (!ASSERT_OK(err, "test_ldsx_insn__attach"))
goto out;
ASSERT_OK(trigger_module_test_read(256), "trigger_read");
ASSERT_EQ(skel->bss->done1, 1, "done1");
ASSERT_EQ(skel->bss->ret1, 1, "ret1");
ASSERT_EQ(skel->bss->done2, 1, "done2");
ASSERT_EQ(skel->bss->ret2, 1, "ret2");
ASSERT_EQ(skel->bss->int_member, -1, "int_member");
out:
test_ldsx_insn__destroy(skel);
}
static void test_ctx_member_sign_ext(void)
{
struct test_ldsx_insn *skel;
int err, fd, cgroup_fd;
char buf[16] = {0};
socklen_t optlen;
cgroup_fd = test__join_cgroup("/ldsx_test");
if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /ldsx_test"))
return;
skel = test_ldsx_insn__open();
if (!ASSERT_OK_PTR(skel, "test_ldsx_insn__open"))
goto close_cgroup_fd;
if (skel->rodata->skip) {
test__skip();
goto destroy_skel;
}
bpf_program__set_autoload(skel->progs._getsockopt, true);
err = test_ldsx_insn__load(skel);
if (!ASSERT_OK(err, "test_ldsx_insn__load"))
goto destroy_skel;
skel->links._getsockopt =
bpf_program__attach_cgroup(skel->progs._getsockopt, cgroup_fd);
if (!ASSERT_OK_PTR(skel->links._getsockopt, "getsockopt_link"))
goto destroy_skel;
fd = socket(AF_INET, SOCK_STREAM, 0);
if (!ASSERT_GE(fd, 0, "socket"))
goto destroy_skel;
optlen = sizeof(buf);
(void)getsockopt(fd, SOL_IP, IP_TTL, buf, &optlen);
ASSERT_EQ(skel->bss->set_optlen, -1, "optlen");
ASSERT_EQ(skel->bss->set_retval, -1, "retval");
close(fd);
destroy_skel:
test_ldsx_insn__destroy(skel);
close_cgroup_fd:
close(cgroup_fd);
}
static void test_ctx_member_narrow_sign_ext(void)
{
struct test_ldsx_insn *skel;
struct __sk_buff skb = {};
LIBBPF_OPTS(bpf_test_run_opts, topts,
.data_in = &pkt_v4,
.data_size_in = sizeof(pkt_v4),
.ctx_in = &skb,
.ctx_size_in = sizeof(skb),
);
int err, prog_fd;
skel = test_ldsx_insn__open();
if (!ASSERT_OK_PTR(skel, "test_ldsx_insn__open"))
return;
if (skel->rodata->skip) {
test__skip();
goto out;
}
bpf_program__set_autoload(skel->progs._tc, true);
err = test_ldsx_insn__load(skel);
if (!ASSERT_OK(err, "test_ldsx_insn__load"))
goto out;
prog_fd = bpf_program__fd(skel->progs._tc);
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "test_run");
ASSERT_EQ(skel->bss->set_mark, -2, "set_mark");
out:
test_ldsx_insn__destroy(skel);
}
void test_ldsx_insn(void)
{
if (test__start_subtest("map_val and probed_memory"))
test_map_val_and_probed_memory();
if (test__start_subtest("ctx_member_sign_ext"))
test_ctx_member_sign_ext();
if (test__start_subtest("ctx_member_narrow_sign_ext"))
test_ctx_member_narrow_sign_ext();
}


@@ -11,6 +11,7 @@
#include "verifier_bounds_deduction_non_const.skel.h" #include "verifier_bounds_deduction_non_const.skel.h"
#include "verifier_bounds_mix_sign_unsign.skel.h" #include "verifier_bounds_mix_sign_unsign.skel.h"
#include "verifier_bpf_get_stack.skel.h" #include "verifier_bpf_get_stack.skel.h"
#include "verifier_bswap.skel.h"
#include "verifier_btf_ctx_access.skel.h" #include "verifier_btf_ctx_access.skel.h"
#include "verifier_cfg.skel.h" #include "verifier_cfg.skel.h"
#include "verifier_cgroup_inv_retcode.skel.h" #include "verifier_cgroup_inv_retcode.skel.h"
@@ -24,6 +25,7 @@
#include "verifier_direct_stack_access_wraparound.skel.h" #include "verifier_direct_stack_access_wraparound.skel.h"
#include "verifier_div0.skel.h" #include "verifier_div0.skel.h"
#include "verifier_div_overflow.skel.h" #include "verifier_div_overflow.skel.h"
#include "verifier_gotol.skel.h"
#include "verifier_helper_access_var_len.skel.h" #include "verifier_helper_access_var_len.skel.h"
#include "verifier_helper_packet_access.skel.h" #include "verifier_helper_packet_access.skel.h"
#include "verifier_helper_restricted.skel.h" #include "verifier_helper_restricted.skel.h"
@@ -31,6 +33,7 @@
#include "verifier_int_ptr.skel.h" #include "verifier_int_ptr.skel.h"
#include "verifier_jeq_infer_not_null.skel.h" #include "verifier_jeq_infer_not_null.skel.h"
#include "verifier_ld_ind.skel.h" #include "verifier_ld_ind.skel.h"
#include "verifier_ldsx.skel.h"
#include "verifier_leak_ptr.skel.h" #include "verifier_leak_ptr.skel.h"
#include "verifier_loops1.skel.h" #include "verifier_loops1.skel.h"
#include "verifier_lwt.skel.h" #include "verifier_lwt.skel.h"
@@ -40,6 +43,7 @@
#include "verifier_map_ret_val.skel.h" #include "verifier_map_ret_val.skel.h"
#include "verifier_masking.skel.h" #include "verifier_masking.skel.h"
#include "verifier_meta_access.skel.h" #include "verifier_meta_access.skel.h"
#include "verifier_movsx.skel.h"
#include "verifier_netfilter_ctx.skel.h" #include "verifier_netfilter_ctx.skel.h"
#include "verifier_netfilter_retcode.skel.h" #include "verifier_netfilter_retcode.skel.h"
#include "verifier_prevent_map_lookup.skel.h" #include "verifier_prevent_map_lookup.skel.h"
@@ -51,6 +55,7 @@
#include "verifier_ringbuf.skel.h" #include "verifier_ringbuf.skel.h"
#include "verifier_runtime_jit.skel.h" #include "verifier_runtime_jit.skel.h"
#include "verifier_scalar_ids.skel.h" #include "verifier_scalar_ids.skel.h"
#include "verifier_sdiv.skel.h"
#include "verifier_search_pruning.skel.h" #include "verifier_search_pruning.skel.h"
#include "verifier_sock.skel.h" #include "verifier_sock.skel.h"
#include "verifier_spill_fill.skel.h" #include "verifier_spill_fill.skel.h"
@@ -113,6 +118,7 @@ void test_verifier_bounds_deduction(void) { RUN(verifier_bounds_deduction);
void test_verifier_bounds_deduction_non_const(void) { RUN(verifier_bounds_deduction_non_const); } void test_verifier_bounds_deduction_non_const(void) { RUN(verifier_bounds_deduction_non_const); }
void test_verifier_bounds_mix_sign_unsign(void) { RUN(verifier_bounds_mix_sign_unsign); } void test_verifier_bounds_mix_sign_unsign(void) { RUN(verifier_bounds_mix_sign_unsign); }
void test_verifier_bpf_get_stack(void) { RUN(verifier_bpf_get_stack); } void test_verifier_bpf_get_stack(void) { RUN(verifier_bpf_get_stack); }
void test_verifier_bswap(void) { RUN(verifier_bswap); }
void test_verifier_btf_ctx_access(void) { RUN(verifier_btf_ctx_access); } void test_verifier_btf_ctx_access(void) { RUN(verifier_btf_ctx_access); }
void test_verifier_cfg(void) { RUN(verifier_cfg); } void test_verifier_cfg(void) { RUN(verifier_cfg); }
void test_verifier_cgroup_inv_retcode(void) { RUN(verifier_cgroup_inv_retcode); } void test_verifier_cgroup_inv_retcode(void) { RUN(verifier_cgroup_inv_retcode); }
@@ -126,6 +132,7 @@ void test_verifier_direct_packet_access(void) { RUN(verifier_direct_packet_acces
void test_verifier_direct_stack_access_wraparound(void) { RUN(verifier_direct_stack_access_wraparound); } void test_verifier_direct_stack_access_wraparound(void) { RUN(verifier_direct_stack_access_wraparound); }
void test_verifier_div0(void) { RUN(verifier_div0); } void test_verifier_div0(void) { RUN(verifier_div0); }
void test_verifier_div_overflow(void) { RUN(verifier_div_overflow); } void test_verifier_div_overflow(void) { RUN(verifier_div_overflow); }
void test_verifier_gotol(void) { RUN(verifier_gotol); }
void test_verifier_helper_access_var_len(void) { RUN(verifier_helper_access_var_len); } void test_verifier_helper_access_var_len(void) { RUN(verifier_helper_access_var_len); }
void test_verifier_helper_packet_access(void) { RUN(verifier_helper_packet_access); } void test_verifier_helper_packet_access(void) { RUN(verifier_helper_packet_access); }
void test_verifier_helper_restricted(void) { RUN(verifier_helper_restricted); } void test_verifier_helper_restricted(void) { RUN(verifier_helper_restricted); }
@@ -133,6 +140,7 @@ void test_verifier_helper_value_access(void) { RUN(verifier_helper_value_access
void test_verifier_int_ptr(void) { RUN(verifier_int_ptr); } void test_verifier_int_ptr(void) { RUN(verifier_int_ptr); }
void test_verifier_jeq_infer_not_null(void) { RUN(verifier_jeq_infer_not_null); } void test_verifier_jeq_infer_not_null(void) { RUN(verifier_jeq_infer_not_null); }
void test_verifier_ld_ind(void) { RUN(verifier_ld_ind); } void test_verifier_ld_ind(void) { RUN(verifier_ld_ind); }
void test_verifier_ldsx(void) { RUN(verifier_ldsx); }
void test_verifier_leak_ptr(void) { RUN(verifier_leak_ptr); } void test_verifier_leak_ptr(void) { RUN(verifier_leak_ptr); }
void test_verifier_loops1(void) { RUN(verifier_loops1); } void test_verifier_loops1(void) { RUN(verifier_loops1); }
void test_verifier_lwt(void) { RUN(verifier_lwt); } void test_verifier_lwt(void) { RUN(verifier_lwt); }
@@ -142,6 +150,7 @@ void test_verifier_map_ptr_mixing(void) { RUN(verifier_map_ptr_mixing); }
void test_verifier_map_ret_val(void) { RUN(verifier_map_ret_val); } void test_verifier_map_ret_val(void) { RUN(verifier_map_ret_val); }
void test_verifier_masking(void) { RUN(verifier_masking); } void test_verifier_masking(void) { RUN(verifier_masking); }
void test_verifier_meta_access(void) { RUN(verifier_meta_access); } void test_verifier_meta_access(void) { RUN(verifier_meta_access); }
void test_verifier_movsx(void) { RUN(verifier_movsx); }
void test_verifier_netfilter_ctx(void) { RUN(verifier_netfilter_ctx); } void test_verifier_netfilter_ctx(void) { RUN(verifier_netfilter_ctx); }
void test_verifier_netfilter_retcode(void) { RUN(verifier_netfilter_retcode); } void test_verifier_netfilter_retcode(void) { RUN(verifier_netfilter_retcode); }
void test_verifier_prevent_map_lookup(void) { RUN(verifier_prevent_map_lookup); } void test_verifier_prevent_map_lookup(void) { RUN(verifier_prevent_map_lookup); }
@@ -153,6 +162,7 @@ void test_verifier_regalloc(void) { RUN(verifier_regalloc); }
void test_verifier_ringbuf(void) { RUN(verifier_ringbuf); } void test_verifier_ringbuf(void) { RUN(verifier_ringbuf); }
void test_verifier_runtime_jit(void) { RUN(verifier_runtime_jit); } void test_verifier_runtime_jit(void) { RUN(verifier_runtime_jit); }
void test_verifier_scalar_ids(void) { RUN(verifier_scalar_ids); } void test_verifier_scalar_ids(void) { RUN(verifier_scalar_ids); }
void test_verifier_sdiv(void) { RUN(verifier_sdiv); }
void test_verifier_search_pruning(void) { RUN(verifier_search_pruning); } void test_verifier_search_pruning(void) { RUN(verifier_search_pruning); }
void test_verifier_sock(void) { RUN(verifier_sock); } void test_verifier_sock(void) { RUN(verifier_sock); }
void test_verifier_spill_fill(void) { RUN(verifier_spill_fill); } void test_verifier_spill_fill(void) { RUN(verifier_spill_fill); }

View File

@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include "test_xdp_attach_fail.skel.h"
#define IFINDEX_LO 1
#define XDP_FLAGS_REPLACE (1U << 4)
@@ -85,10 +86,74 @@ out_1:
bpf_object__close(obj1);
}
#define ERRMSG_LEN 64
struct xdp_errmsg {
char msg[ERRMSG_LEN];
};
static void on_xdp_errmsg(void *ctx, int cpu, void *data, __u32 size)
{
struct xdp_errmsg *ctx_errmg = ctx, *tp_errmsg = data;
memcpy(&ctx_errmg->msg, &tp_errmsg->msg, ERRMSG_LEN);
}
static const char tgt_errmsg[] = "Invalid XDP flags for BPF link attachment";
static void test_xdp_attach_fail(const char *file)
{
struct test_xdp_attach_fail *skel = NULL;
struct xdp_errmsg errmsg = {};
struct perf_buffer *pb = NULL;
struct bpf_object *obj = NULL;
int err, fd_xdp;
LIBBPF_OPTS(bpf_link_create_opts, opts);
skel = test_xdp_attach_fail__open_and_load();
if (!ASSERT_OK_PTR(skel, "test_xdp_attach_fail__open_and_load"))
goto out_close;
err = test_xdp_attach_fail__attach(skel);
if (!ASSERT_EQ(err, 0, "test_xdp_attach_fail__attach"))
goto out_close;
/* set up perf buffer */
pb = perf_buffer__new(bpf_map__fd(skel->maps.xdp_errmsg_pb), 1,
on_xdp_errmsg, NULL, &errmsg, NULL);
if (!ASSERT_OK_PTR(pb, "perf_buffer__new"))
goto out_close;
err = bpf_prog_test_load(file, BPF_PROG_TYPE_XDP, &obj, &fd_xdp);
if (!ASSERT_EQ(err, 0, "bpf_prog_test_load"))
goto out_close;
opts.flags = 0xFF; // invalid flags to fail to attach XDP prog
err = bpf_link_create(fd_xdp, IFINDEX_LO, BPF_XDP, &opts);
if (!ASSERT_EQ(err, -EINVAL, "bpf_link_create"))
goto out_close;
/* read perf buffer */
err = perf_buffer__poll(pb, 100);
if (!ASSERT_GT(err, -1, "perf_buffer__poll"))
goto out_close;
ASSERT_STRNEQ((const char *) errmsg.msg, tgt_errmsg,
42 /* strlen(tgt_errmsg) */, "check error message");
out_close:
perf_buffer__free(pb);
bpf_object__close(obj);
test_xdp_attach_fail__destroy(skel);
}
void serial_test_xdp_attach(void)
{
if (test__start_subtest("xdp_attach"))
test_xdp_attach("./test_xdp.bpf.o");
if (test__start_subtest("xdp_attach_dynptr"))
test_xdp_attach("./test_xdp_dynptr.bpf.o");
if (test__start_subtest("xdp_attach_failed"))
test_xdp_attach_fail("./xdp_dummy.bpf.o");
}

View File

@@ -0,0 +1,104 @@
// SPDX-License-Identifier: GPL-2.0-only
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
#include "bpf_tracing_net.h"
#define NF_DROP 0
#define NF_ACCEPT 1
#define ETH_P_IP 0x0800
#define ETH_P_IPV6 0x86DD
#define IP_MF 0x2000
#define IP_OFFSET 0x1FFF
#define NEXTHDR_FRAGMENT 44
extern int bpf_dynptr_from_skb(struct sk_buff *skb, __u64 flags,
struct bpf_dynptr *ptr__uninit) __ksym;
extern void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, uint32_t offset,
void *buffer, uint32_t buffer__sz) __ksym;
volatile int shootdowns = 0;
static bool is_frag_v4(struct iphdr *iph)
{
int offset;
int flags;
offset = bpf_ntohs(iph->frag_off);
flags = offset & ~IP_OFFSET;
offset &= IP_OFFSET;
offset <<= 3;
return (flags & IP_MF) || offset;
}
static bool is_frag_v6(struct ipv6hdr *ip6h)
{
/* Simplifying assumption that there are no extension headers
* between fixed header and fragmentation header. This assumption
* is only valid in this test case. It saves us the hassle of
* searching all potential extension headers.
*/
return ip6h->nexthdr == NEXTHDR_FRAGMENT;
}
static int handle_v4(struct sk_buff *skb)
{
struct bpf_dynptr ptr;
u8 iph_buf[20] = {};
struct iphdr *iph;
if (bpf_dynptr_from_skb(skb, 0, &ptr))
return NF_DROP;
iph = bpf_dynptr_slice(&ptr, 0, iph_buf, sizeof(iph_buf));
if (!iph)
return NF_DROP;
/* Shootdown any frags */
if (is_frag_v4(iph)) {
shootdowns++;
return NF_DROP;
}
return NF_ACCEPT;
}
static int handle_v6(struct sk_buff *skb)
{
struct bpf_dynptr ptr;
struct ipv6hdr *ip6h;
u8 ip6h_buf[40] = {};
if (bpf_dynptr_from_skb(skb, 0, &ptr))
return NF_DROP;
ip6h = bpf_dynptr_slice(&ptr, 0, ip6h_buf, sizeof(ip6h_buf));
if (!ip6h)
return NF_DROP;
/* Shootdown any frags */
if (is_frag_v6(ip6h)) {
shootdowns++;
return NF_DROP;
}
return NF_ACCEPT;
}
SEC("netfilter")
int defrag(struct bpf_nf_ctx *ctx)
{
struct sk_buff *skb = ctx->skb;
switch (bpf_ntohs(skb->protocol)) {
case ETH_P_IP:
return handle_v4(skb);
case ETH_P_IPV6:
return handle_v6(skb);
default:
return NF_ACCEPT;
}
}
char _license[] SEC("license") = "GPL";
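
For reference, the frag_off handling in is_frag_v4() above follows the standard IPv4 header layout: the top three bits of the 16-bit field carry flags (including IP_MF, "more fragments") and the low 13 bits carry the fragment offset in units of 8 bytes. A plain-C sketch of the same predicate, illustrative only and not part of the patch:

#include <stdbool.h>
#include <stdint.h>

#define IP_MF     0x2000
#define IP_OFFSET 0x1fff

/* A packet is a fragment if the "more fragments" flag is set or the
 * fragment offset is non-zero; frag_off is assumed already in host order.
 */
static bool is_frag(uint16_t frag_off)
{
    return (frag_off & IP_MF) || (frag_off & IP_OFFSET);
}

The BPF version additionally shifts the offset left by 3 (converting 8-byte units to bytes), but since it only tests the value for truthiness, the shift does not change the result.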

View File

@@ -0,0 +1,142 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2023 Isovalent */
#include <stdbool.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/tcp.h>
#include <linux/udp.h>
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>
#include <linux/pkt_cls.h>
char LICENSE[] SEC("license") = "GPL";
__u64 sk_cookie_seen;
__u64 reuseport_executed;
union {
struct tcphdr tcp;
struct udphdr udp;
} headers;
const volatile __u16 dest_port;
struct {
__uint(type, BPF_MAP_TYPE_SOCKMAP);
__uint(max_entries, 1);
__type(key, __u32);
__type(value, __u64);
} sk_map SEC(".maps");
SEC("sk_reuseport")
int reuse_accept(struct sk_reuseport_md *ctx)
{
reuseport_executed++;
if (ctx->ip_protocol == IPPROTO_TCP) {
if (ctx->data + sizeof(headers.tcp) > ctx->data_end)
return SK_DROP;
if (__builtin_memcmp(&headers.tcp, ctx->data, sizeof(headers.tcp)) != 0)
return SK_DROP;
} else if (ctx->ip_protocol == IPPROTO_UDP) {
if (ctx->data + sizeof(headers.udp) > ctx->data_end)
return SK_DROP;
if (__builtin_memcmp(&headers.udp, ctx->data, sizeof(headers.udp)) != 0)
return SK_DROP;
} else {
return SK_DROP;
}
sk_cookie_seen = bpf_get_socket_cookie(ctx->sk);
return SK_PASS;
}
SEC("sk_reuseport")
int reuse_drop(struct sk_reuseport_md *ctx)
{
reuseport_executed++;
sk_cookie_seen = 0;
return SK_DROP;
}
static int
assign_sk(struct __sk_buff *skb)
{
int zero = 0, ret = 0;
struct bpf_sock *sk;
sk = bpf_map_lookup_elem(&sk_map, &zero);
if (!sk)
return TC_ACT_SHOT;
ret = bpf_sk_assign(skb, sk, 0);
bpf_sk_release(sk);
return ret ? TC_ACT_SHOT : TC_ACT_OK;
}
static int
maybe_assign_tcp(struct __sk_buff *skb, struct tcphdr *th)
{
if (th + 1 > (void *)(long)(skb->data_end))
return TC_ACT_SHOT;
if (!th->syn || th->ack || th->dest != bpf_htons(dest_port))
return TC_ACT_OK;
__builtin_memcpy(&headers.tcp, th, sizeof(headers.tcp));
return assign_sk(skb);
}
static int
maybe_assign_udp(struct __sk_buff *skb, struct udphdr *uh)
{
if (uh + 1 > (void *)(long)(skb->data_end))
return TC_ACT_SHOT;
if (uh->dest != bpf_htons(dest_port))
return TC_ACT_OK;
__builtin_memcpy(&headers.udp, uh, sizeof(headers.udp));
return assign_sk(skb);
}
SEC("tc")
int tc_main(struct __sk_buff *skb)
{
void *data_end = (void *)(long)skb->data_end;
void *data = (void *)(long)skb->data;
struct ethhdr *eth;
eth = (struct ethhdr *)(data);
if (eth + 1 > data_end)
return TC_ACT_SHOT;
if (eth->h_proto == bpf_htons(ETH_P_IP)) {
struct iphdr *iph = (struct iphdr *)(data + sizeof(*eth));
if (iph + 1 > data_end)
return TC_ACT_SHOT;
if (iph->protocol == IPPROTO_TCP)
return maybe_assign_tcp(skb, (struct tcphdr *)(iph + 1));
else if (iph->protocol == IPPROTO_UDP)
return maybe_assign_udp(skb, (struct udphdr *)(iph + 1));
else
return TC_ACT_SHOT;
} else {
struct ipv6hdr *ip6h = (struct ipv6hdr *)(data + sizeof(*eth));
if (ip6h + 1 > data_end)
return TC_ACT_SHOT;
if (ip6h->nexthdr == IPPROTO_TCP)
return maybe_assign_tcp(skb, (struct tcphdr *)(ip6h + 1));
else if (ip6h->nexthdr == IPPROTO_UDP)
return maybe_assign_udp(skb, (struct udphdr *)(ip6h + 1));
else
return TC_ACT_SHOT;
}
}
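
The TC program expects userspace to have placed a SO_REUSEPORT listener into slot 0 of sk_map before traffic arrives. A hedged sketch of that setup step (error handling omitted; publish_listener, map_fd, and the port value are placeholders, not names from the patch):

#include <stdint.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <bpf/bpf.h>

/* Illustrative only: create a reuseport TCP listener and publish it in
 * the sockmap that assign_sk() looks up above. */
static int publish_listener(int map_fd, uint16_t port)
{
    int one = 1, zero = 0;
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(port),
    };
    __u64 val;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, 1);
    val = fd; /* sockmap values are socket fds, passed as a 64-bit value */
    return bpf_map_update_elem(map_fd, &zero, &val, BPF_ANY);
}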

View File

@@ -12,6 +12,15 @@
#include <linux/ipv6.h>
#include <linux/udp.h>
/* offsetof() is used in static asserts, and the libbpf-redefined CO-RE
* friendly version breaks compilation for older clang versions <= 15
* when invoked in a static assert. Restore original here.
*/
#ifdef offsetof
#undef offsetof
#define offsetof(type, member) __builtin_offsetof(type, member)
#endif
struct gre_base_hdr {
uint16_t flags;
uint16_t protocol;
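
A two-line illustration of the failure mode the restored definition avoids (a sketch, assuming clang <= 15 and libbpf's pointer-arithmetic offsetof from bpf_helpers.h):

/* libbpf defines offsetof(type, member) roughly as
 * ((unsigned long)&((type *)0)->member), which older clang does not
 * treat as an integer constant expression, so a static assert such as:
 */
_Static_assert(offsetof(struct gre_base_hdr, protocol) == 2, "layout");
/* fails to compile on clang <= 15 unless offsetof() is restored to the
 * __builtin_offsetof() form above.
 */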

View File

@@ -0,0 +1,118 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#if defined(__TARGET_ARCH_x86) && __clang_major__ >= 18
const volatile int skip = 0;
#else
const volatile int skip = 1;
#endif
volatile const short val1 = -1;
volatile const int val2 = -1;
short val3 = -1;
int val4 = -1;
int done1, done2, ret1, ret2;
SEC("?raw_tp/sys_enter")
int rdonly_map_prog(const void *ctx)
{
if (done1)
return 0;
done1 = 1;
/* val1/val2 readonly map */
if (val1 == val2)
ret1 = 1;
return 0;
}
SEC("?raw_tp/sys_enter")
int map_val_prog(const void *ctx)
{
if (done2)
return 0;
done2 = 1;
/* val3/val4 regular read/write map */
if (val3 == val4)
ret2 = 1;
return 0;
}
struct bpf_testmod_struct_arg_1 {
int a;
};
long long int_member;
SEC("?fentry/bpf_testmod_test_arg_ptr_to_struct")
int BPF_PROG2(test_ptr_struct_arg, struct bpf_testmod_struct_arg_1 *, p)
{
/* probed memory access */
int_member = p->a;
return 0;
}
long long set_optlen, set_retval;
SEC("?cgroup/getsockopt")
int _getsockopt(volatile struct bpf_sockopt *ctx)
{
int old_optlen, old_retval;
old_optlen = ctx->optlen;
old_retval = ctx->retval;
ctx->optlen = -1;
ctx->retval = -1;
/* sign extension for ctx member */
set_optlen = ctx->optlen;
set_retval = ctx->retval;
ctx->optlen = old_optlen;
ctx->retval = old_retval;
return 0;
}
long long set_mark;
SEC("?tc")
int _tc(volatile struct __sk_buff *skb)
{
long long tmp_mark;
int old_mark;
old_mark = skb->mark;
skb->mark = 0xf6fe;
/* narrowed sign extension for ctx member */
#if __clang_major__ >= 18
/* force narrow one-byte signed load. Otherwise, compiler may
* generate a 32-bit unsigned load followed by an s8 movsx.
*/
asm volatile ("r1 = *(s8 *)(%[ctx] + %[off_mark])\n\t"
"%[tmp_mark] = r1"
: [tmp_mark]"=r"(tmp_mark)
: [ctx]"r"(skb),
[off_mark]"i"(offsetof(struct __sk_buff, mark))
: "r1");
#else
tmp_mark = (char)skb->mark;
#endif
set_mark = tmp_mark;
skb->mark = old_mark;
return 0;
}
char _license[] SEC("license") = "GPL";

View File

@@ -0,0 +1,54 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright Leon Hwang */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#define ERRMSG_LEN 64
struct xdp_errmsg {
char msg[ERRMSG_LEN];
};
struct {
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
__type(key, int);
__type(value, int);
} xdp_errmsg_pb SEC(".maps");
struct xdp_attach_error_ctx {
unsigned long unused;
/*
* bpf does not support tracepoint __data_loc directly.
*
* Actually, this field is a 32-bit integer whose value encodes
* where to find the actual data: the upper 2 bytes are the size of
* the data and the lower 2 bytes are its offset from the start of
* the tracepoint struct.
* -- https://github.com/iovisor/bpftrace/pull/1542
*/
__u32 msg; // __data_loc char[] msg;
};
/*
* Catch the error message at the tracepoint.
*/
SEC("tp/xdp/bpf_xdp_link_attach_failed")
int tp__xdp__bpf_xdp_link_attach_failed(struct xdp_attach_error_ctx *ctx)
{
char *msg = (void *)(__u64) ((void *) ctx + (__u16) ctx->msg);
struct xdp_errmsg errmsg = {};
bpf_probe_read_kernel_str(&errmsg.msg, ERRMSG_LEN, msg);
bpf_perf_event_output(ctx, &xdp_errmsg_pb, BPF_F_CURRENT_CPU, &errmsg,
ERRMSG_LEN);
return 0;
}
/*
* Reuse the XDP program in xdp_dummy.c.
*/
char LICENSE[] SEC("license") = "GPL";
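
The cast chain in the handler above can be written out as a small helper; this is a sketch for illustration only (data_loc_ptr is not a real kernel or libbpf API):

#include <linux/types.h>

/* Decode a __data_loc field: the upper 16 bits hold the size of the
 * data, the lower 16 bits hold its offset from the start of the record.
 */
static inline char *data_loc_ptr(void *record, __u32 data_loc, __u16 *size)
{
    if (size)
        *size = data_loc >> 16;
    return (char *)record + (__u16)data_loc;
}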

View File

@@ -0,0 +1,59 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"
#if defined(__TARGET_ARCH_x86) && __clang_major__ >= 18
SEC("socket")
__description("BSWAP, 16")
__success __success_unpriv __retval(0x23ff)
__naked void bswap_16(void)
{
asm volatile (" \
r0 = 0xff23; \
r0 = bswap16 r0; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("BSWAP, 32")
__success __success_unpriv __retval(0x23ff0000)
__naked void bswap_32(void)
{
asm volatile (" \
r0 = 0xff23; \
r0 = bswap32 r0; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("BSWAP, 64")
__success __success_unpriv __retval(0x34ff12ff)
__naked void bswap_64(void)
{
asm volatile (" \
r0 = %[u64_val] ll; \
r0 = bswap64 r0; \
exit; \
" :
: [u64_val]"i"(0xff12ff34ff56ff78ull)
: __clobber_all);
}
#else
SEC("socket")
__description("cpuv4 is not supported by compiler or jit, use a dummy test")
__success
int dummy_test(void)
{
return 0;
}
#endif
char _license[] SEC("license") = "GPL";
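
The expected return values above follow from unconditional byte swapping, which is what distinguishes cpu v4's BSWAP from the legacy endianness-conditional BPF_END forms. A plain C cross-check of the same constants (illustrative only):

#include <stdio.h>

int main(void)
{
    /* bswap16: 0xff23 -> 0x23ff */
    printf("%#x\n", (unsigned)__builtin_bswap16(0xff23));
    /* bswap32: 0xff23 -> 0x23ff0000 */
    printf("%#x\n", __builtin_bswap32(0xff23));
    /* bswap64: 0xff12ff34ff56ff78 -> 0x78ff56ff34ff12ff; __retval()
     * compares only the low 32 bits, hence 0x34ff12ff above. */
    printf("%#llx\n",
           (unsigned long long)__builtin_bswap64(0xff12ff34ff56ff78ull));
    return 0;
}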

View File

@@ -0,0 +1,44 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"
#if defined(__TARGET_ARCH_x86) && __clang_major__ >= 18
SEC("socket")
__description("gotol, small_imm")
__success __success_unpriv __retval(1)
__naked void gotol_small_imm(void)
{
asm volatile (" \
call %[bpf_ktime_get_ns]; \
if r0 == 0 goto l0_%=; \
gotol l1_%=; \
l2_%=: \
gotol l3_%=; \
l1_%=: \
r0 = 1; \
gotol l2_%=; \
l0_%=: \
r0 = 2; \
l3_%=: \
exit; \
" :
: __imm(bpf_ktime_get_ns)
: __clobber_all);
}
#else
SEC("socket")
__description("cpuv4 is not supported by compiler or jit, use a dummy test")
__success
int dummy_test(void)
{
return 0;
}
#endif
char _license[] SEC("license") = "GPL";

View File

@@ -0,0 +1,131 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"
#if defined(__TARGET_ARCH_x86) && __clang_major__ >= 18
SEC("socket")
__description("LDSX, S8")
__success __success_unpriv __retval(-2)
__naked void ldsx_s8(void)
{
asm volatile (" \
r1 = 0x3fe; \
*(u64 *)(r10 - 8) = r1; \
r0 = *(s8 *)(r10 - 8); \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("LDSX, S16")
__success __success_unpriv __retval(-2)
__naked void ldsx_s16(void)
{
asm volatile (" \
r1 = 0x3fffe; \
*(u64 *)(r10 - 8) = r1; \
r0 = *(s16 *)(r10 - 8); \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("LDSX, S32")
__success __success_unpriv __retval(-1)
__naked void ldsx_s32(void)
{
asm volatile (" \
r1 = 0xfffffffe; \
*(u64 *)(r10 - 8) = r1; \
r0 = *(s32 *)(r10 - 8); \
r0 >>= 1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("LDSX, S8 range checking, privileged")
__log_level(2) __success __retval(1)
__msg("R1_w=scalar(smin=-128,smax=127)")
__naked void ldsx_s8_range_priv(void)
{
asm volatile (" \
call %[bpf_get_prandom_u32]; \
*(u64 *)(r10 - 8) = r0; \
r1 = *(s8 *)(r10 - 8); \
/* r1 with s8 range */ \
if r1 s> 0x7f goto l0_%=; \
if r1 s< -0x80 goto l0_%=; \
r0 = 1; \
l1_%=: \
exit; \
l0_%=: \
r0 = 2; \
goto l1_%=; \
" :
: __imm(bpf_get_prandom_u32)
: __clobber_all);
}
SEC("socket")
__description("LDSX, S16 range checking")
__success __success_unpriv __retval(1)
__naked void ldsx_s16_range(void)
{
asm volatile (" \
call %[bpf_get_prandom_u32]; \
*(u64 *)(r10 - 8) = r0; \
r1 = *(s16 *)(r10 - 8); \
/* r1 with s16 range */ \
if r1 s> 0x7fff goto l0_%=; \
if r1 s< -0x8000 goto l0_%=; \
r0 = 1; \
l1_%=: \
exit; \
l0_%=: \
r0 = 2; \
goto l1_%=; \
" :
: __imm(bpf_get_prandom_u32)
: __clobber_all);
}
SEC("socket")
__description("LDSX, S32 range checking")
__success __success_unpriv __retval(1)
__naked void ldsx_s32_range(void)
{
asm volatile (" \
call %[bpf_get_prandom_u32]; \
*(u64 *)(r10 - 8) = r0; \
r1 = *(s32 *)(r10 - 8); \
/* r1 with s32 range */ \
if r1 s> 0x7fffFFFF goto l0_%=; \
if r1 s< -0x80000000 goto l0_%=; \
r0 = 1; \
l1_%=: \
exit; \
l0_%=: \
r0 = 2; \
goto l1_%=; \
" :
: __imm(bpf_get_prandom_u32)
: __clobber_all);
}
#else
SEC("socket")
__description("cpuv4 is not supported by compiler or jit, use a dummy test")
__success
int dummy_test(void)
{
return 0;
}
#endif
char _license[] SEC("license") = "GPL";
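
The retvals in the LDSX tests follow ordinary sign extension of the narrow value that was loaded; the same arithmetic in plain C (a sketch, not part of the patch):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* s8 load of 0x3fe reads the low byte 0xfe, sign-extended: -2 */
    printf("%d\n", (int8_t)0x3fe);
    /* s16 load of 0x3fffe reads the low half 0xfffe: -2 */
    printf("%d\n", (int16_t)0x3fffe);
    /* s32 load of 0xfffffffe: -2; the test then shifts the 64-bit
     * register right by one, and __retval() sees the low 32 bits. */
    printf("%d\n", (int32_t)0xfffffffeu);
    return 0;
}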

View File

@@ -0,0 +1,213 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"
#if defined(__TARGET_ARCH_x86) && __clang_major__ >= 18
SEC("socket")
__description("MOV32SX, S8")
__success __success_unpriv __retval(0x23)
__naked void mov32sx_s8(void)
{
asm volatile (" \
w0 = 0xff23; \
w0 = (s8)w0; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("MOV32SX, S16")
__success __success_unpriv __retval(0xFFFFff23)
__naked void mov32sx_s16(void)
{
asm volatile (" \
w0 = 0xff23; \
w0 = (s16)w0; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("MOV64SX, S8")
__success __success_unpriv __retval(-2)
__naked void mov64sx_s8(void)
{
asm volatile (" \
r0 = 0x1fe; \
r0 = (s8)r0; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("MOV64SX, S16")
__success __success_unpriv __retval(0xf23)
__naked void mov64sx_s16(void)
{
asm volatile (" \
r0 = 0xf0f23; \
r0 = (s16)r0; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("MOV64SX, S32")
__success __success_unpriv __retval(-1)
__naked void mov64sx_s32(void)
{
asm volatile (" \
r0 = 0xfffffffe; \
r0 = (s32)r0; \
r0 >>= 1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("MOV32SX, S8, range_check")
__success __success_unpriv __retval(1)
__naked void mov32sx_s8_range(void)
{
asm volatile (" \
call %[bpf_get_prandom_u32]; \
w1 = (s8)w0; \
/* w1 with s8 range */ \
if w1 s> 0x7f goto l0_%=; \
if w1 s< -0x80 goto l0_%=; \
r0 = 1; \
l1_%=: \
exit; \
l0_%=: \
r0 = 2; \
goto l1_%=; \
" :
: __imm(bpf_get_prandom_u32)
: __clobber_all);
}
SEC("socket")
__description("MOV32SX, S16, range_check")
__success __success_unpriv __retval(1)
__naked void mov32sx_s16_range(void)
{
asm volatile (" \
call %[bpf_get_prandom_u32]; \
w1 = (s16)w0; \
/* w1 with s16 range */ \
if w1 s> 0x7fff goto l0_%=; \
if w1 s< -0x80ff goto l0_%=; \
r0 = 1; \
l1_%=: \
exit; \
l0_%=: \
r0 = 2; \
goto l1_%=; \
" :
: __imm(bpf_get_prandom_u32)
: __clobber_all);
}
SEC("socket")
__description("MOV32SX, S16, range_check 2")
__success __success_unpriv __retval(1)
__naked void mov32sx_s16_range_2(void)
{
asm volatile (" \
r1 = 65535; \
w2 = (s16)w1; \
r2 >>= 1; \
if r2 != 0x7fffFFFF goto l0_%=; \
r0 = 1; \
l1_%=: \
exit; \
l0_%=: \
r0 = 0; \
goto l1_%=; \
" :
: __imm(bpf_get_prandom_u32)
: __clobber_all);
}
SEC("socket")
__description("MOV64SX, S8, range_check")
__success __success_unpriv __retval(1)
__naked void mov64sx_s8_range(void)
{
asm volatile (" \
call %[bpf_get_prandom_u32]; \
r1 = (s8)r0; \
/* r1 with s8 range */ \
if r1 s> 0x7f goto l0_%=; \
if r1 s< -0x80 goto l0_%=; \
r0 = 1; \
l1_%=: \
exit; \
l0_%=: \
r0 = 2; \
goto l1_%=; \
" :
: __imm(bpf_get_prandom_u32)
: __clobber_all);
}
SEC("socket")
__description("MOV64SX, S16, range_check")
__success __success_unpriv __retval(1)
__naked void mov64sx_s16_range(void)
{
asm volatile (" \
call %[bpf_get_prandom_u32]; \
r1 = (s16)r0; \
/* r1 with s16 range */ \
if r1 s> 0x7fff goto l0_%=; \
if r1 s< -0x8000 goto l0_%=; \
r0 = 1; \
l1_%=: \
exit; \
l0_%=: \
r0 = 2; \
goto l1_%=; \
" :
: __imm(bpf_get_prandom_u32)
: __clobber_all);
}
SEC("socket")
__description("MOV64SX, S32, range_check")
__success __success_unpriv __retval(1)
__naked void mov64sx_s32_range(void)
{
asm volatile (" \
call %[bpf_get_prandom_u32]; \
r1 = (s32)r0; \
/* r1 with s32 range */ \
if r1 s> 0x7fffffff goto l0_%=; \
if r1 s< -0x80000000 goto l0_%=; \
r0 = 1; \
l1_%=: \
exit; \
l0_%=: \
r0 = 2; \
goto l1_%=; \
" :
: __imm(bpf_get_prandom_u32)
: __clobber_all);
}
#else
SEC("socket")
__description("cpuv4 is not supported by compiler or jit, use a dummy test")
__success
int dummy_test(void)
{
return 0;
}
#endif
char _license[] SEC("license") = "GPL";
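
MOVSX differs from LDSX only in the source of the narrow value: it sign-extends a register rather than a memory load, into either a 32-bit (w) or 64-bit (r) destination. Equivalent C casts for a few of the cases above (illustrative only):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t w0 = 0xff23;

    /* w0 = (s8)w0: extend bit 7 within 32 bits -> 0x23 */
    printf("%#x\n", (uint32_t)(int8_t)w0);
    /* w0 = (s16)w0: extend bit 15 within 32 bits -> 0xffffff23 */
    printf("%#x\n", (uint32_t)(int16_t)w0);
    /* r0 = (s8)r0 with r0 = 0x1fe: low byte 0xfe extended to 64 bits -> -2 */
    printf("%lld\n", (long long)(int8_t)0x1fe);
    return 0;
}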

View File

@@ -0,0 +1,781 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"
#if defined(__TARGET_ARCH_x86) && __clang_major__ >= 18
SEC("socket")
__description("SDIV32, non-zero imm divisor, check 1")
__success __success_unpriv __retval(-20)
__naked void sdiv32_non_zero_imm_1(void)
{
asm volatile (" \
w0 = -41; \
w0 s/= 2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero imm divisor, check 2")
__success __success_unpriv __retval(-20)
__naked void sdiv32_non_zero_imm_2(void)
{
asm volatile (" \
w0 = 41; \
w0 s/= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero imm divisor, check 3")
__success __success_unpriv __retval(20)
__naked void sdiv32_non_zero_imm_3(void)
{
asm volatile (" \
w0 = -41; \
w0 s/= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero imm divisor, check 4")
__success __success_unpriv __retval(-21)
__naked void sdiv32_non_zero_imm_4(void)
{
asm volatile (" \
w0 = -42; \
w0 s/= 2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero imm divisor, check 5")
__success __success_unpriv __retval(-21)
__naked void sdiv32_non_zero_imm_5(void)
{
asm volatile (" \
w0 = 42; \
w0 s/= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero imm divisor, check 6")
__success __success_unpriv __retval(21)
__naked void sdiv32_non_zero_imm_6(void)
{
asm volatile (" \
w0 = -42; \
w0 s/= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero imm divisor, check 7")
__success __success_unpriv __retval(21)
__naked void sdiv32_non_zero_imm_7(void)
{
asm volatile (" \
w0 = 42; \
w0 s/= 2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero imm divisor, check 8")
__success __success_unpriv __retval(20)
__naked void sdiv32_non_zero_imm_8(void)
{
asm volatile (" \
w0 = 41; \
w0 s/= 2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero reg divisor, check 1")
__success __success_unpriv __retval(-20)
__naked void sdiv32_non_zero_reg_1(void)
{
asm volatile (" \
w0 = -41; \
w1 = 2; \
w0 s/= w1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero reg divisor, check 2")
__success __success_unpriv __retval(-20)
__naked void sdiv32_non_zero_reg_2(void)
{
asm volatile (" \
w0 = 41; \
w1 = -2; \
w0 s/= w1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero reg divisor, check 3")
__success __success_unpriv __retval(20)
__naked void sdiv32_non_zero_reg_3(void)
{
asm volatile (" \
w0 = -41; \
w1 = -2; \
w0 s/= w1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero reg divisor, check 4")
__success __success_unpriv __retval(-21)
__naked void sdiv32_non_zero_reg_4(void)
{
asm volatile (" \
w0 = -42; \
w1 = 2; \
w0 s/= w1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero reg divisor, check 5")
__success __success_unpriv __retval(-21)
__naked void sdiv32_non_zero_reg_5(void)
{
asm volatile (" \
w0 = 42; \
w1 = -2; \
w0 s/= w1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero reg divisor, check 6")
__success __success_unpriv __retval(21)
__naked void sdiv32_non_zero_reg_6(void)
{
asm volatile (" \
w0 = -42; \
w1 = -2; \
w0 s/= w1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero reg divisor, check 7")
__success __success_unpriv __retval(21)
__naked void sdiv32_non_zero_reg_7(void)
{
asm volatile (" \
w0 = 42; \
w1 = 2; \
w0 s/= w1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, non-zero reg divisor, check 8")
__success __success_unpriv __retval(20)
__naked void sdiv32_non_zero_reg_8(void)
{
asm volatile (" \
w0 = 41; \
w1 = 2; \
w0 s/= w1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV64, non-zero imm divisor, check 1")
__success __success_unpriv __retval(-20)
__naked void sdiv64_non_zero_imm_1(void)
{
asm volatile (" \
r0 = -41; \
r0 s/= 2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV64, non-zero imm divisor, check 2")
__success __success_unpriv __retval(-20)
__naked void sdiv64_non_zero_imm_2(void)
{
asm volatile (" \
r0 = 41; \
r0 s/= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV64, non-zero imm divisor, check 3")
__success __success_unpriv __retval(20)
__naked void sdiv64_non_zero_imm_3(void)
{
asm volatile (" \
r0 = -41; \
r0 s/= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV64, non-zero imm divisor, check 4")
__success __success_unpriv __retval(-21)
__naked void sdiv64_non_zero_imm_4(void)
{
asm volatile (" \
r0 = -42; \
r0 s/= 2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV64, non-zero imm divisor, check 5")
__success __success_unpriv __retval(-21)
__naked void sdiv64_non_zero_imm_5(void)
{
asm volatile (" \
r0 = 42; \
r0 s/= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV64, non-zero imm divisor, check 6")
__success __success_unpriv __retval(21)
__naked void sdiv64_non_zero_imm_6(void)
{
asm volatile (" \
r0 = -42; \
r0 s/= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV64, non-zero reg divisor, check 1")
__success __success_unpriv __retval(-20)
__naked void sdiv64_non_zero_reg_1(void)
{
asm volatile (" \
r0 = -41; \
r1 = 2; \
r0 s/= r1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV64, non-zero reg divisor, check 2")
__success __success_unpriv __retval(-20)
__naked void sdiv64_non_zero_reg_2(void)
{
asm volatile (" \
r0 = 41; \
r1 = -2; \
r0 s/= r1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV64, non-zero reg divisor, check 3")
__success __success_unpriv __retval(20)
__naked void sdiv64_non_zero_reg_3(void)
{
asm volatile (" \
r0 = -41; \
r1 = -2; \
r0 s/= r1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV64, non-zero reg divisor, check 4")
__success __success_unpriv __retval(-21)
__naked void sdiv64_non_zero_reg_4(void)
{
asm volatile (" \
r0 = -42; \
r1 = 2; \
r0 s/= r1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV64, non-zero reg divisor, check 5")
__success __success_unpriv __retval(-21)
__naked void sdiv64_non_zero_reg_5(void)
{
asm volatile (" \
r0 = 42; \
r1 = -2; \
r0 s/= r1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV64, non-zero reg divisor, check 6")
__success __success_unpriv __retval(21)
__naked void sdiv64_non_zero_reg_6(void)
{
asm volatile (" \
r0 = -42; \
r1 = -2; \
r0 s/= r1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD32, non-zero imm divisor, check 1")
__success __success_unpriv __retval(-1)
__naked void smod32_non_zero_imm_1(void)
{
asm volatile (" \
w0 = -41; \
w0 s%%= 2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD32, non-zero imm divisor, check 2")
__success __success_unpriv __retval(1)
__naked void smod32_non_zero_imm_2(void)
{
asm volatile (" \
w0 = 41; \
w0 s%%= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD32, non-zero imm divisor, check 3")
__success __success_unpriv __retval(-1)
__naked void smod32_non_zero_imm_3(void)
{
asm volatile (" \
w0 = -41; \
w0 s%%= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD32, non-zero imm divisor, check 4")
__success __success_unpriv __retval(0)
__naked void smod32_non_zero_imm_4(void)
{
asm volatile (" \
w0 = -42; \
w0 s%%= 2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD32, non-zero imm divisor, check 5")
__success __success_unpriv __retval(0)
__naked void smod32_non_zero_imm_5(void)
{
asm volatile (" \
w0 = 42; \
w0 s%%= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD32, non-zero imm divisor, check 6")
__success __success_unpriv __retval(0)
__naked void smod32_non_zero_imm_6(void)
{
asm volatile (" \
w0 = -42; \
w0 s%%= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD32, non-zero reg divisor, check 1")
__success __success_unpriv __retval(-1)
__naked void smod32_non_zero_reg_1(void)
{
asm volatile (" \
w0 = -41; \
w1 = 2; \
w0 s%%= w1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD32, non-zero reg divisor, check 2")
__success __success_unpriv __retval(1)
__naked void smod32_non_zero_reg_2(void)
{
asm volatile (" \
w0 = 41; \
w1 = -2; \
w0 s%%= w1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD32, non-zero reg divisor, check 3")
__success __success_unpriv __retval(-1)
__naked void smod32_non_zero_reg_3(void)
{
asm volatile (" \
w0 = -41; \
w1 = -2; \
w0 s%%= w1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD32, non-zero reg divisor, check 4")
__success __success_unpriv __retval(0)
__naked void smod32_non_zero_reg_4(void)
{
asm volatile (" \
w0 = -42; \
w1 = 2; \
w0 s%%= w1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD32, non-zero reg divisor, check 5")
__success __success_unpriv __retval(0)
__naked void smod32_non_zero_reg_5(void)
{
asm volatile (" \
w0 = 42; \
w1 = -2; \
w0 s%%= w1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD32, non-zero reg divisor, check 6")
__success __success_unpriv __retval(0)
__naked void smod32_non_zero_reg_6(void)
{
asm volatile (" \
w0 = -42; \
w1 = -2; \
w0 s%%= w1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero imm divisor, check 1")
__success __success_unpriv __retval(-1)
__naked void smod64_non_zero_imm_1(void)
{
asm volatile (" \
r0 = -41; \
r0 s%%= 2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero imm divisor, check 2")
__success __success_unpriv __retval(1)
__naked void smod64_non_zero_imm_2(void)
{
asm volatile (" \
r0 = 41; \
r0 s%%= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero imm divisor, check 3")
__success __success_unpriv __retval(-1)
__naked void smod64_non_zero_imm_3(void)
{
asm volatile (" \
r0 = -41; \
r0 s%%= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero imm divisor, check 4")
__success __success_unpriv __retval(0)
__naked void smod64_non_zero_imm_4(void)
{
asm volatile (" \
r0 = -42; \
r0 s%%= 2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero imm divisor, check 5")
__success __success_unpriv __retval(-0)
__naked void smod64_non_zero_imm_5(void)
{
asm volatile (" \
r0 = 42; \
r0 s%%= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero imm divisor, check 6")
__success __success_unpriv __retval(0)
__naked void smod64_non_zero_imm_6(void)
{
asm volatile (" \
r0 = -42; \
r0 s%%= -2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero imm divisor, check 7")
__success __success_unpriv __retval(0)
__naked void smod64_non_zero_imm_7(void)
{
asm volatile (" \
r0 = 42; \
r0 s%%= 2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero imm divisor, check 8")
__success __success_unpriv __retval(1)
__naked void smod64_non_zero_imm_8(void)
{
asm volatile (" \
r0 = 41; \
r0 s%%= 2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero reg divisor, check 1")
__success __success_unpriv __retval(-1)
__naked void smod64_non_zero_reg_1(void)
{
asm volatile (" \
r0 = -41; \
r1 = 2; \
r0 s%%= r1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero reg divisor, check 2")
__success __success_unpriv __retval(1)
__naked void smod64_non_zero_reg_2(void)
{
asm volatile (" \
r0 = 41; \
r1 = -2; \
r0 s%%= r1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero reg divisor, check 3")
__success __success_unpriv __retval(-1)
__naked void smod64_non_zero_reg_3(void)
{
asm volatile (" \
r0 = -41; \
r1 = -2; \
r0 s%%= r1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero reg divisor, check 4")
__success __success_unpriv __retval(0)
__naked void smod64_non_zero_reg_4(void)
{
asm volatile (" \
r0 = -42; \
r1 = 2; \
r0 s%%= r1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero reg divisor, check 5")
__success __success_unpriv __retval(0)
__naked void smod64_non_zero_reg_5(void)
{
asm volatile (" \
r0 = 42; \
r1 = -2; \
r0 s%%= r1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero reg divisor, check 6")
__success __success_unpriv __retval(0)
__naked void smod64_non_zero_reg_6(void)
{
asm volatile (" \
r0 = -42; \
r1 = -2; \
r0 s%%= r1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero reg divisor, check 7")
__success __success_unpriv __retval(0)
__naked void smod64_non_zero_reg_7(void)
{
asm volatile (" \
r0 = 42; \
r1 = 2; \
r0 s%%= r1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, non-zero reg divisor, check 8")
__success __success_unpriv __retval(1)
__naked void smod64_non_zero_reg_8(void)
{
asm volatile (" \
r0 = 41; \
r1 = 2; \
r0 s%%= r1; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV32, zero divisor")
__success __success_unpriv __retval(0)
__naked void sdiv32_zero_divisor(void)
{
asm volatile (" \
w0 = 42; \
w1 = 0; \
w2 = -1; \
w2 s/= w1; \
w0 = w2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SDIV64, zero divisor")
__success __success_unpriv __retval(0)
__naked void sdiv64_zero_divisor(void)
{
asm volatile (" \
r0 = 42; \
r1 = 0; \
r2 = -1; \
r2 s/= r1; \
r0 = r2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD32, zero divisor")
__success __success_unpriv __retval(-1)
__naked void smod32_zero_divisor(void)
{
asm volatile (" \
w0 = 42; \
w1 = 0; \
w2 = -1; \
w2 s%%= w1; \
w0 = w2; \
exit; \
" ::: __clobber_all);
}
SEC("socket")
__description("SMOD64, zero divisor")
__success __success_unpriv __retval(-1)
__naked void smod64_zero_divisor(void)
{
asm volatile (" \
r0 = 42; \
r1 = 0; \
r2 = -1; \
r2 s%%= r1; \
r0 = r2; \
exit; \
" ::: __clobber_all);
}
#else
SEC("socket")
__description("cpuv4 is not supported by compiler or jit, use a dummy test")
__success
int dummy_test(void)
{
return 0;
}
#endif
char _license[] SEC("license") = "GPL";
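
The expectations above match C semantics for signed division and remainder, truncating toward zero with the remainder taking the sign of the dividend; the difference is that BPF defines the zero-divisor case at runtime instead of leaving it undefined. A quick C cross-check of the non-zero cases (sketch only):

#include <stdio.h>

int main(void)
{
    printf("%d %d\n", -41 / 2, -41 % 2);  /* -20 -1 */
    printf("%d %d\n", 41 / -2, 41 % -2);  /* -20  1 */
    printf("%d %d\n", -42 / 2, -42 % 2);  /* -21  0 */
    /* For a zero divisor, C is undefined; BPF instead yields a zero
     * quotient and leaves the dividend as the remainder, which is why
     * the "zero divisor" tests expect 0 for s/= and -1 for s%=. */
    return 0;
}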

View File

@@ -176,11 +176,11 @@
.retval = 1,
},
{
- "invalid 64-bit BPF_END",
+ "invalid 64-bit BPF_END with BPF_TO_BE",
.insns = {
BPF_MOV32_IMM(BPF_REG_0, 0),
{
- .code = BPF_ALU64 | BPF_END | BPF_TO_LE,
+ .code = BPF_ALU64 | BPF_END | BPF_TO_BE,
.dst_reg = BPF_REG_0,
.src_reg = 0,
.off = 0,
@@ -188,7 +188,7 @@
},
BPF_EXIT_INSN(),
},
- .errstr = "unknown opcode d7",
+ .errstr = "unknown opcode df",
.result = REJECT,
},
{

View File

@@ -2076,7 +2076,7 @@ static void init_iface(struct ifobject *ifobj, const char *dst_mac, const char *
err = bpf_xdp_query(ifobj->ifindex, XDP_FLAGS_DRV_MODE, &query_opts);
if (err) {
- ksft_print_msg("Error querrying XDP capabilities\n");
+ ksft_print_msg("Error querying XDP capabilities\n");
exit_with_error(-err);
}
if (query_opts.feature_flags & NETDEV_XDP_ACT_RX_SG) if (query_opts.feature_flags & NETDEV_XDP_ACT_RX_SG)