Feature #6235

open

osmo-epdg: gtp tunnel management

Added by pespin 4 months ago. Updated about 23 hours ago.

Status: In Progress
Priority: Urgent
Assignee:
Target version: -
Start date: 10/25/2023
Due date:
% Done: 90%


Description

osmo-epdg needs to create GTP tunnels (GTP-U) for each session created against PGW over the S2b interface.
The GTPv2C side of things is already more or less in place for CreateSession Req+Resp, but we are not yet setting up the tunnel.

AFAIU the idea is to use netfilter rules to route traffic between the IPsec strongswan tunnel and each of the GTP tunnels, using fwmark iirc.
It's still not clear who is in charge of setting up the netfilter rules.

From https://osmocom.org/issues/5288#note-4:

It would be an idea to first have a small stand-alone erlang program that uses those libraries to create kernel GTP tunnels whose creation/modification/removal can be confirmed using libgtpnl/tools/gtp-tunnel to display the current in-kernel state.

We also need to think about the best approach to run the erlang code with NET_ADMIN capabilities, which will probably be needed to create the tunnels and set up netfilter rules.


Related issues

Related to osmo-ePDG - VoWifi Evolved Packet Data Gateway - Bug #6361: open5gs-upfd: Fix open5gs package assigning 1st IP address of the UE pool to the ogstun (Feedback, pespin, 02/15/2024)

Actions #2

Updated by pespin 4 months ago

AFAIU we need, at osmo-epdg, to create/manage one GTP tun dev for each <remote_pgw_ip_addr, APN> key tuple.

Then, on each of those, we have a complete TEID namespace, and we need to configure 1 entry for each bearer created during CreateSessionResponse, containing:
- local_teid: Selected by osmo-epdg when sending CreateSessionRequest
- remote_teid: Found in the bearer F-TEID in CreateSessionResponse
- local inner (ms) IP address: Found in UE IP in CreateSessionResponse
- local outer (gsn) IP address: Set up in config file
- remote outer (gsn) IP address: Set up in config file and perhaps updated in CreateSessionResponse

So upon receiving CreateSession Response, if we didn't yet create any tun dev for that <remote_pgw_ip_addr, APN>, we first create the tun dev. Then, we set up the tunnel config above.

Actions #3

Updated by laforge 4 months ago

On Wed, Oct 25, 2023 at 05:00:28PM +0000, pespin wrote:

We also need to think about the best approach to run the erlang code with NET_ADMIN capabilities, which will probably be needed to create the tunnels and set up netfilter rules.

I would expect the ergw folks have some solution for that?

Actions #4

Updated by pespin 4 months ago

I started trying to import gtp_u_kmod in order to use it in osmo-epdg, but I'm facing several problems which seem to be related to the public repo (and its dependencies) being unmaintained:
  • Most of them pull github links with "git://" proto, which is not supported anymore. I submitted several PRs upstream to fix some of them, but I ended up using this to work around it:
    git config --global url."https://github.com/".insteadOf git://github.com/
  • Some are pulling an old version of lager which fails to build here with OTP-26. I submitted some patches upgrading lager to last upstream 3.9.2 release.
  • gen_socket is linking against -lerl_interface, which doesn't exist apparently since OTP-23. I submitted a PR dropping that link argument, since that fixes it here locally.

So I now have my own forks of the gen_socket, gen_netlink, gtplib and gtp_u_kmod repositories, with the patches submitted as PRs plus updates to rebar.config pointing to my forked repos/branches.

With that, I can build everything inside osmo-epdg.git (branch "pespin/master"), but I'm having problems at runtime too:

18:29:57.856 [info] RegName: port_grx
18:29:57.873 [notice] VrfOpts: [{routes,[{{10,180,0,0},16}]}]
18:29:57.873 [notice] FDesc: {file_descriptor,raw_file_io_list,{file_descriptor,prim_file,#{handle => #Ref<0.114031393.2719088661.121660>,owner => <0.526.0>,r_buffer => #Ref<0.114031393.2719088642.121370>,r_ahead_size => 0}}}
terminating ss7_routes with reason shutdown
terminating ss7_links with reason shutdown
18:29:57.873 [error] CRASH REPORT Process <0.526.0> with 1 neighbours crashed with reason: no match of right hand value {file_descriptor,raw_file_io_list,{file_descriptor,prim_file,#{handle => #Ref<0.114031393.2719088661.121660>,owner => <0.526.0>,r_buffer => #Ref<0.114031393.2719088642.121370>,r_ahead_size => 0}}} in gtp_u_kernel:init/1 line 53

The notice log lines above are added in code by myself. The line failing is:

    #file_descriptor{module = prim_file,
             data   = {_Port, NsFd}} = FDesc,

My bet is that #file_descriptor record changed at some point during newer OTP releases?

Actions #5

Updated by pespin 4 months ago

I think I was able to fix the issue above with the following patch:

diff --git a/src/gtp_u_kernel.erl b/src/gtp_u_kernel.erl
index ef60038..2abd772 100644
--- a/src/gtp_u_kernel.erl
+++ b/src/gtp_u_kernel.erl
@@ -47,12 +47,11 @@ delete_pdp_context(Server, Version, SGSN, MS, LocalTEI, RemoteTEI) ->

 init([Device, FD0, FD1u, Opts]) ->
     VrfOpts = proplists:get_value(vrf, Opts, []),
-    {ok, FDesc} = get_ns_fd(VrfOpts),
+    {ok, FDesc} = get_ns_fdesc(VrfOpts),
+    NsFd = get_ns_fd(FDesc),
     lager:notice("VrfOpts: ~p~n", [VrfOpts]),
     lager:notice("FDesc: ~p~n", [FDesc]),
-    #file_descriptor{module = prim_file,
-                    data   = {_Port, NsFd}} = FDesc,
-
+    lager:notice("NsFd: ~p~n", [NsFd]),
     {RtNl, RtNlNs} = netlink_sockets(VrfOpts),
     CreateGTPLinkInfo = [{fd0, FD0}, {fd1, FD1u}, {hashsize, 131072}],
     CreateGTPData = netlink:linkinfo_enc(inet, "gtp", CreateGTPLinkInfo),
@@ -170,7 +169,7 @@ code_change(_OldVsn, State, _Extra) ->
 -define(SELF_NET_NS, "/proc/self/ns/net").
 -define(SIOCGIFINDEX, 16#8933).

-get_ns_fd(Opts) ->
+get_ns_fdesc(Opts) ->
     try
        {netns, NetNs} = lists:keyfind(netns, 1, Opts),
        {ok, _} = file:open(filename:join("/var/run/netns", NetNs), [raw, read])
@@ -179,6 +178,18 @@ get_ns_fd(Opts) ->
            {ok, _} = file:open(?SELF_NET_NS, [raw, read])
     end.

+get_ns_fd(FDesc) ->
+    lager:notice("FDesc: ~p~n", [FDesc]),
+    case FDesc of
+    #file_descriptor{module = prim_file} ->
+        #file_descriptor{data = {_, NsFd}} = FDesc,
+        NsFd;
+    #file_descriptor{module = _} ->
+        PrivFDesc = FDesc#file_descriptor.data,
+        #file_descriptor{data = #{handle := NsFd}} = PrivFDesc,
+        NsFd
+    end.
+

In summary, it seems at least my OTP-26 version is creating a tuple of type raw_file_io_list which encloses the usual file reference which is required here. This is due to the "[raw, read]" modes being passed. Maybe in older versions of OTP this didn't happen.

Actions #6

Updated by pespin 4 months ago

It seems my previous patch to take the proper FD was not good after all. The netlink code is failing at a later step because it's expecting an uint32 (unix fd) while I'm passing some other field which seems to be of another type :/

I added an error log line to print the encode_huint32 params:

[error] encode_huint32: 1 28
18:22:56.094 [error] encode_huint32: 2 29
18:22:56.094 [error] encode_huint32: 3 131072
18:22:56.094 [debug] CreateGTPReq: {rtnetlink,newlink,[create,excl,ack,request],2,0,{inet,arphrd_none,0,[up],[up],[{net_ns_fd,#Ref<0.1102821846.1330774038.183476>},{ifname,"gtp0"},{linkinfo,[{kind,"gtp"},{data,<<8,0,1,0,28,0,0,0,8,0,2,0,29,0,0,0,8,0,3,0,0,0,2,0>>}]}]}}
18:22:56.094 [error] encode_huint32: 28 #Ref<0.1102821846.1330774038.183476>
18:22:56.095 [error] CRASH REPORT Process <0.526.0> with 1 neighbours crashed with reason: bad argument in netlink:encode_huint32/2 line 534

The "#Ref<0.1102821846.1330774038.183476>" is the field I'm getting with my previous patch.

Actions #7

Updated by pespin 4 months ago

I was able to apparently fix it with a new patch version:

diff --git a/src/gtp_u_kernel.erl b/src/gtp_u_kernel.erl
index ef60038..ba6785e 100644
--- a/src/gtp_u_kernel.erl
+++ b/src/gtp_u_kernel.erl
@@ -47,12 +47,11 @@ delete_pdp_context(Server, Version, SGSN, MS, LocalTEI, RemoteTEI) ->

 init([Device, FD0, FD1u, Opts]) ->
     VrfOpts = proplists:get_value(vrf, Opts, []),
-    {ok, FDesc} = get_ns_fd(VrfOpts),
+    {ok, FDesc} = get_ns_fdesc(VrfOpts),
+    NsFd = get_ns_fd(FDesc),
     lager:notice("VrfOpts: ~p~n", [VrfOpts]),
     lager:notice("FDesc: ~p~n", [FDesc]),
-    #file_descriptor{module = prim_file,
-                    data   = {_Port, NsFd}} = FDesc,
-
+    lager:notice("NsFd: ~p~n", [NsFd]),
     {RtNl, RtNlNs} = netlink_sockets(VrfOpts),
     CreateGTPLinkInfo = [{fd0, FD0}, {fd1, FD1u}, {hashsize, 131072}],
     CreateGTPData = netlink:linkinfo_enc(inet, "gtp", CreateGTPLinkInfo),
@@ -170,7 +169,7 @@ code_change(_OldVsn, State, _Extra) ->
 -define(SELF_NET_NS, "/proc/self/ns/net").
 -define(SIOCGIFINDEX, 16#8933).

-get_ns_fd(Opts) ->
+get_ns_fdesc(Opts) ->
     try
        {netns, NetNs} = lists:keyfind(netns, 1, Opts),
        {ok, _} = file:open(filename:join("/var/run/netns", NetNs), [raw, read])
@@ -179,6 +178,19 @@ get_ns_fd(Opts) ->
            {ok, _} = file:open(?SELF_NET_NS, [raw, read])
     end.

+get_ns_fd(FDesc) ->
+    lager:notice("FDesc: ~p~n", [FDesc]),
+    case FDesc of
+    #file_descriptor{module = prim_file} ->
+        #file_descriptor{data = {_, NsFd}} = FDesc,
+        NsFd;
+    #file_descriptor{module = _} ->
+        PrivFDesc = FDesc#file_descriptor.data,
+        binary:decode_unsigned(prim_file:get_handle(PrivFDesc),little)
+        %#file_descriptor{data = #{handle := NsFd}} = PrivFDesc,
+        %prim_file:get_handle(NsFd)
+    end.
+

After applying that one, the FD looks sane (30 being allocated after previous 29 one):

19:23:28.581 [error] encode_huint32: 2 29
19:23:28.581 [error] encode_huint32: 3 131072
19:23:28.581 [debug] CreateGTPReq: {rtnetlink,newlink,[create,excl,ack,request],35,0,{inet,arphrd_none,0,[up],[up],[{net_ns_fd,30},{ifname,"gtp0"},{linkinfo,[{kind,"gtp"},{data,<<8,0,1,0,28,0,0,0,8,0,2,0,29,0,0,0,8,0,3,0,0,0,2,0>>}]}]}}
19:23:28.581 [notice] nl_simple_request do_request {rtnetlink,newlink,[create,excl,ack,request],35,0,{inet,arphrd_none,0,[up],[up],[{net_ns_fd,30},{ifname,"gtp0"},{linkinfo,[{kind,"gtp"},{data,<<8,0,1,0,28,0,0,0,8,0,2,0,29,0,0,0,8,0,3,0,0,0,2,0>>}]}]}}
19:23:28.581 [error] encode_huint32: 28 30

However, I'm still facing problems, now when submitting the CreateGTPReq (rtnetlink, newlink). So there is some problem in netlink that I still need to debug.
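
As a cross-check of the log lines above, the "data" binary of the CreateGTPReq can be decoded by hand. A minimal sketch in POSIX shell, assuming every attribute in that payload is a 4-byte header (length, type) followed by one little-endian 32-bit value, which holds for this particular payload but not for netlink attributes in general. The helper names le_hex_to_int and dump_u32_attrs are made up for illustration:

```shell
#!/bin/sh
# Convert a little-endian hex string (e.g. "1e000000") to a decimal integer.
le_hex_to_int() {
    hex=$1
    swapped=""
    # Peel off one byte (two hex digits) at a time and prepend it,
    # which reverses the byte order.
    while [ -n "$hex" ]; do
        swapped="$(printf '%s' "$hex" | cut -c1-2)$swapped"
        hex=$(printf '%s' "$hex" | cut -c3-)
    done
    printf '%d\n' "0x$swapped"
}

# Walk a flat run of netlink attributes that each carry a single u32.
# ASSUMPTION: every attribute is exactly 8 bytes (16 hex digits) long.
dump_u32_attrs() {
    data=$1
    while [ -n "$data" ]; do
        alen=$(le_hex_to_int "$(printf '%s' "$data" | cut -c1-4)")
        atype=$(le_hex_to_int "$(printf '%s' "$data" | cut -c5-8)")
        avalue=$(le_hex_to_int "$(printf '%s' "$data" | cut -c9-16)")
        printf 'type=%s len=%s value=%s\n' "$atype" "$alen" "$avalue"
        data=$(printf '%s' "$data" | cut -c17-)
    done
}

# Usage (hex form of <<8,0,1,0,28,0,0,0,8,0,2,0,29,0,0,0,8,0,3,0,0,0,2,0>>):
#   dump_u32_attrs 080001001c000000080002001d0000000800030000000200
```

For the payload in the log this yields types 1, 2 and 3 with values 28, 29 and 131072, matching the "encode_huint32" lines (fd0, fd1 and hashsize).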

Actions #8

Updated by pespin 4 months ago

  • Status changed from New to In Progress

For now I submitted upstream a PR fixing the "obtain netns FD" problem:
https://github.com/travelping/gtp_u_kmod/pull/2

Actions #9

Updated by pespin 4 months ago

I created forks under github/osmocom/ for the following repos:
https://github.com/osmocom/gen_socket/
https://github.com/osmocom/gen_netlink/
https://github.com/osmocom/gtplib/
https://github.com/osmocom/gtp_u_kmod/

All those repos have the master branch tracking the travelping upstream master branch. All our patches are on top in "osmocom/master" branch. In those branches they also have rebar.config modified to pull the dependent modules from the forked repo+branch.

My WIP code in osmo-epdg.git using those modules to set up a tunnel (still not really working due to errors) can be found in branch "pespin/master".

Actions #10

Updated by pespin 4 months ago

Further investigation so far seems to indicate there's a bug in the logic handling rtnetlink.

In summary, osmo-epdg sends the following request and processes messages until receiving an ACK for it (seqnum):

Linux netlink (cooked header)
    Link-layer address type: Netlink (824)
    Family: Route (0x0000)
Linux rtnetlink (route netlink) protocol
    Netlink message header (type: Create network interface)
        Length: 92
        Message type: Create network interface (16)
        Flags: 0x0605
            .... .... .... ...1 = Request: 1
            .... .... .... ..0. = Multipart message: 0
            .... .... .... .1.. = Ack: 1
            .... .... .... 0... = Echo: 0
            .... .... ...0 .... = Dump inconsistent: 0
            .... .... ..0. .... = Dump filtered: 0
            .... ...0 .... .... = Specify tree root: 0
            .... ..1. .... .... = Return all matching: 1
            .... .1.. .... .... = Atomic: 1
        Flags: 0x0605
            .... .... .... ...1 = Request: 1
            .... .... .... ..0. = Multipart message: 0
            .... .... .... .1.. = Ack: 1
            .... .... .... 0... = Echo: 0
            .... .... ...0 .... = Dump inconsistent: 0
            .... .... ..0. .... = Dump filtered: 0
            .... ...0 .... .... = Replace: 0
            .... ..1. .... .... = Excl: 1
            .... .1.. .... .... = Create: 1
            .... 0... .... .... = Append: 0
        Sequence: 34
        Port ID: 0
    Interface family: 2
    Device type: zero header length (65534)
    Interface index: 0
    Device flags: UP (0x00000001)
    Device change flags: 1
    Attribute: NetNs fd
        Len: 8
        Type: 0x001c, NetNs fd (28)
            0... .... .... .... = Nested: False
            .0.. .... .... .... = Network byte order: False
            Attribute type: NetNs fd (28)
        Data: 1e000000
    Attribute: Device name: gtp0
        Len: 9
        Type: 0x0003, Device name (3)
            0... .... .... .... = Nested: False
            .0.. .... .... .... = Network byte order: False
            Attribute type: Device name (3)
        Device name: gtp0
    Attribute: Link info
        Len: 40
        Type: 0x0012, Link info (18)
            0... .... .... .... = Nested: False
            .0.. .... .... .... = Network byte order: False
            Attribute type: Link info (18)
        Data: 08000100677470001c000200080001001c000000080002001d0000000800030000000200

Then, after some dump messages, the ACK comes (error type but with err_code=0, so it's just confirming everything was fine):

Linux netlink (cooked header)
    Link-layer address type: Netlink (824)
    Family: Route (0x0000)
Netlink message
    Netlink message header (type: Error)
        Length: 36
        Message type: Error (0x0002)
        Flags: 0x0100
            .... .... .... ...0 = Request: 0
            .... .... .... ..0. = Multipart message: 0
            .... .... .... .0.. = Ack: 0
            .... .... .... 0... = Echo: 0
            .... .... ...0 .... = Dump inconsistent: 0
            .... .... ..0. .... = Dump filtered: 0
        Sequence: 34
        Port ID: 261026
    Error code: Success (0)
    Netlink message header (type: 0x0010)
        Length: 92
        Message type: Protocol-specific (0x0010)
        Flags: 0x0605
            .... .... .... ...1 = Request: 1
            .... .... .... ..0. = Multipart message: 0
            .... .... .... .1.. = Ack: 1
            .... .... .... 0... = Echo: 0
            .... .... ...0 .... = Dump inconsistent: 0
            .... .... ..0. .... = Dump filtered: 0
            .... ...0 .... .... = Specify tree root: 0
            .... ..1. .... .... = Return all matching: 1
            .... .1.. .... .... = Atomic: 1
        Flags: 0x0605
            .... .... .... ...1 = Request: 1
            .... .... .... ..0. = Multipart message: 0
            .... .... .... .1.. = Ack: 1
            .... .... .... 0... = Echo: 0
            .... .... ...0 .... = Dump inconsistent: 0
            .... .... ..0. .... = Dump filtered: 0
            .... ...0 .... .... = Replace: 0
            .... ..1. .... .... = Excl: 1
            .... .1.. .... .... = Create: 1
            .... 0... .... .... = Append: 0
        Sequence: 34
        Port ID: 0

This is how gen_netlink parses the ACK message:

Response: [{rtnetlink,error,[256],34,261026,[{ifinfomsg,unspec,2,92,100990992,34},{rawdata,<<0,0,0,0>>}]}]

However, that kind of message payload for an "error" type packet is not expected at all. This fails in gtp_u_kmod when matching the response from the request:

nl_simple_response(error, {0, _}, _Response) ->
    ok;
nl_simple_response(error, {Code, _}, _Response) ->
    {error, Code};
nl_simple_response(_, _, Response) ->
    Response.

nl_simple_response(_Seq, []) ->
    continue;
nl_simple_response(Seq, [Response = #rtnetlink{type = Type, seq = Seq, msg = Msg} | Next ]) ->
    nl_simple_response(-1, Next),
    lager:debug("nl_simple_response: Matching Msg=~p", [Msg]),
    nl_simple_response(Type, Msg, Response);

As one can see, it expects the "error" type messages to only have a msg payload of "{Code, whatever}", but this message has "[{ifinfomsg,unspec,2,92,100990992,34},{rawdata,<<0,0,0,0>>}]" instead.

That's because gen_netlink is incorrectly decoding the message in nl_rt_dec() around line 983, where it calls is_rt_dump() and it wrongly returns true:

is_rt_dump(Type, Flags) ->
    (Type band 3) =:= 2 andalso Flags band ?NLM_F_DUMP =/= 0.

nl_rt_dec(Protocol, << Len:32/native-integer, Type:16/native-integer, Flags:16/native-integer, Seq:32/native-integer, Pid:32/native-integer, Data/binary >> = Msg, Acc) ->
    {DecodedMsg, Next} = case nlmsg_ok(size(Msg), Len) of
[...]
                 case is_rt_dump(Type, Flags) of
                     true ->
                     <<IfiFam:8, _Pad:8, IfiType:16/native-integer, IfiIndex:32/native-integer, IfiFlags:32/native-integer, IfiChange:32/native-integer, Filter/binary >> = PayLoad,
                     InfoMsg = #ifinfomsg{family = gen_socket:family(IfiFam),
                                  type = Type,
                                  index = IfiIndex,
                                  flags = IfiFlags,
                                  change = IfiChange},
                     {RtMsg#rtnetlink{msg = [InfoMsg | nl_dec_nla(IfiFam, fun decode_rtnetlink_link/3, Filter)]}, NextMsg};

Instead, is_rt_dump() should return false there and go through this path:

%% Error
nl_dec_payload(_Type, error, <<Error:32, Msg/binary>>) ->
    {Error, Msg};

nl_rt_dec(Protocol, << Len:32/native-integer, Type:16/native-integer, Flags:16/native-integer, Seq:32/native-integer, Pid:32/native-integer, Data/binary >> = Msg, Acc) ->
    {DecodedMsg, Next} = case nlmsg_ok(size(Msg), Len) of
[...]
                 case is_rt_dump(Type, Flags) of
[...]
                     _ ->
                     {RtMsg#rtnetlink{msg = nl_dec_payload(rtnetlink, MsgType, PayLoad)}, NextMsg}

I think the bug is due to:

%% Modifiers to GET request
-define(NLM_F_ROOT,      16#100).   %% specify tree root
-define(NLM_F_MATCH,     16#200).   %% return all matching
-define(NLM_F_ATOMIC,    16#400).   %% atomic GET
-define(NLM_F_DUMP,      (?NLM_F_ROOT bor ?NLM_F_MATCH)).

%% Modifiers to NEW request
-define(NLM_F_REPLACE,   16#100).   %% Override existing
-define(NLM_F_EXCL,      16#200).   %% Do not touch, if it exists

The received message has flags=256=0x100, so it matches NLM_F_ROOT and hence NLM_F_DUMP and finally that's why is_rt_dump() returns true.
The problem here is that afaiu, the original request which triggered the ACK is not a "GET" request, but a "NEW" request, so the flag there is simply asking to replace the device.

So I'd say so far nl_rt_dec/is_rt_dump() is buggy in gen_netlink, but I'm not sure how to easily fix the problem yet...

Actions #11

Updated by pespin 4 months ago

https://kernel.org/doc/html/next/userspace-api/netlink/intro.html
"For GET - NLM_F_ROOT and NLM_F_MATCH are combined into NLM_F_DUMP, and not used separately. NLM_F_ATOMIC is never used."

So probably checking that both are set instead of checking any set is the way to go?

Probably something like:

diff --git a/src/netlink.erl b/src/netlink.erl
index 51efc33..08a0a95 100644
--- a/src/netlink.erl
+++ b/src/netlink.erl
@@ -974,7 +974,7 @@ nl_ct_dec(_Protocol, << >>, Acc) ->
     lists:reverse(Acc).

 is_rt_dump(Type, Flags) ->
-    (Type band 3) =:= 2 andalso Flags band ?NLM_F_DUMP =/= 0.
+    (Type band 3) =:= 2 andalso Flags band ?NLM_F_DUMP =:= ?NLM_F_DUMP.
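
To see why the "=/= 0" test misfires on the ACK from note 10 (message type 2, flags 0x100) while the "=:= ?NLM_F_DUMP" test does not, here is the same bit logic as a small shell sketch. The constants are the ones quoted from gen_netlink above; the function names are illustrative only:

```shell
#!/bin/sh
NLM_F_ROOT=$((0x100))
NLM_F_MATCH=$((0x200))
NLM_F_DUMP=$((NLM_F_ROOT | NLM_F_MATCH))

# Old check: true as soon as ANY of the two dump bits is set.
is_rt_dump_any() {
    [ $(( $1 & 3 )) -eq 2 ] && [ $(( $2 & NLM_F_DUMP )) -ne 0 ]
}

# Proposed check: true only if BOTH dump bits are set.
is_rt_dump_all() {
    [ $(( $1 & 3 )) -eq 2 ] && [ $(( $2 & NLM_F_DUMP )) -eq "$NLM_F_DUMP" ]
}

# Usage: arguments are <message type> <flags>, e.g.
#   is_rt_dump_any 2 256 && echo "old check: treated as dump"
#   is_rt_dump_all 2 256 || echo "new check: not a dump"
```

With type=2 and flags=0x100 the old check succeeds (so the ACK gets decoded as an ifinfomsg), while the new one fails and lets the message fall through to the error path. A real dump with flags 0x300 still passes both.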

Actions #12

Updated by pespin 4 months ago

pespin wrote in #note-11:

https://kernel.org/doc/html/next/userspace-api/netlink/intro.html
"For GET - NLM_F_ROOT and NLM_F_MATCH are combined into NLM_F_DUMP, and not used separately. NLM_F_ATOMIC is never used."

So probably checking that both are set instead of checking any set is the way to go?

ACK to myself, I gave it a try and I'm reaching way further in the gtp device setup at startup now. I pushed the commit to github/osmocom/gen_netlink.git branch osmocom/master with the other fixes.

Actions #13

Updated by pespin 4 months ago

I submitted a PR for the is_rt_dump() bug from above here:
https://github.com/travelping/gen_netlink/pull/9

I now seem to be starting everything more or less fine with the gtp tun created (doing nothing with it yet). However, this is only when the gtp0 netif was not previously created. After the first time, when I kill the process, the netdev is still kept alive, and next time I run osmo-epdg it will fail due to netlink returning:

Netlink message
    Netlink message header (type: Error)
        Length: 112
        Message type: Error (0x0002)
        Flags: 0x0000
            .... .... .... ...0 = Request: 0
            .... .... .... ..0. = Multipart message: 0
            .... .... .... .0.. = Ack: 0
            .... .... .... 0... = Echo: 0
            .... .... ...0 .... = Dump inconsistent: 0
            .... .... ..0. .... = Dump filtered: 0
        Sequence: 34
        Port ID: 34285
    Error code: File exists (-EEXIST) (-17)
    Netlink message header (type: 0x0010)
        Length: 92
        Message type: Protocol-specific (0x0010)
        Flags: 0x0605
            .... .... .... ...1 = Request: 1
            .... .... .... ..0. = Multipart message: 0
            .... .... .... .1.. = Ack: 1
            .... .... .... 0... = Echo: 0
            .... .... ...0 .... = Dump inconsistent: 0
            .... .... ..0. .... = Dump filtered: 0
            .... ...0 .... .... = Specify tree root: 0
            .... ..1. .... .... = Return all matching: 1
            .... .1.. .... .... = Atomic: 1
        Flags: 0x0605
            .... .... .... ...1 = Request: 1
            .... .... .... ..0. = Multipart message: 0
            .... .... .... .1.. = Ack: 1
            .... .... .... 0... = Echo: 0
            .... .... ...0 .... = Dump inconsistent: 0
            .... .... ..0. .... = Dump filtered: 0
            .... ...0 .... .... = Replace: 0
            .... ..1. .... .... = Excl: 1
            .... .1.. .... .... = Create: 1
            .... 0... .... .... = Append: 0
        Sequence: 34
        Port ID: 0

So "Error code: File exists (-EEXIST) (-17)" seems acceptable given the iface already exists, but somehow gtp_u_kmod is not contemplating that possibility in gtp_u_kernel:init/1 (line 67):

ok = nl_simple_request(RtNl, ?NETLINK_ROUTE, CreateGTPReq),

I get returned:

wait_for_response: Response: [{rtnetlink,error,[],34,41847,{-17,<<92,0,0,0,16,0,5,6,34,0,0,0,0,0,0,0,2,0,254,255,0,0,0,0,1,0,0,0,1,0,0,0,8,0,28,0,30,0,0,0,9,0,3,0,103,116,112,48,0,0,0,0,40,0,18,0,8,0,1,0,103,116,112,0,28,0,2,0,8,0,1,0,28,0,0,0,8,0,2,0,29,0,0,0,8,0,3,0,0,0,2,0>>}}]

I also fixed gen_netlink incorrectly decoding the error code field in here:
https://github.com/travelping/gen_netlink/pull/9

Actions #14

Updated by pespin 4 months ago

I tried dropping the excl flag from the new_link nl message in order to allow reusing the tundev:

diff --git a/src/gtp_u_kernel.erl b/src/gtp_u_kernel.erl
index 1918fb8..257a01d 100644
--- a/src/gtp_u_kernel.erl
+++ b/src/gtp_u_kernel.erl
@@ -58,7 +58,7 @@ init([Device, FD0, FD1u, Opts]) ->
                     {linkinfo,[{kind, "gtp"},
                                {data, CreateGTPData}]}]},
     CreateGTPReq = #rtnetlink{type  = newlink,
-                             flags = [create,excl,ack,request],
+                             flags = [create,ack,request],
                              seq   = erlang:unique_integer([positive]),
                              pid   = 0,
                              msg   = CreateGTPMsg},

But now I get "Error code: Operation not supported on transport endpoint (-EOPNOTSUPP) (-95)" instead.

Actions #15

Updated by pespin 4 months ago

I think it hits this path in the kernel in __rtnl_newlink:

        if (linkinfo[IFLA_INFO_DATA]) {
            if (!ops || ops != dev->rtnl_link_ops ||
                !ops->changelink)
                return -EOPNOTSUPP;

            err = ops->changelink(dev, tb, data, extack);
            if (err < 0)
                return err;
            status |= DO_SETLINK_NOTIFY;
        }

So we need to make sure the tun device is released before starting the app.
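
One way to guarantee that, sketched as a shell helper to run before launching the app. The function name cleanup_gtp_dev is hypothetical; it only relies on "ip link show" and "ip link del", and needs enough privileges (e.g. NET_ADMIN) to delete the device:

```shell
#!/bin/sh
# Delete a stale GTP device if it is still around from a previous run.
# Does nothing (and succeeds) when the device does not exist.
cleanup_gtp_dev() {
    dev=${1:-gtp0}
    if ip link show dev "$dev" >/dev/null 2>&1; then
        ip link del dev "$dev"
    fi
}

# Usage:
#   cleanup_gtp_dev gtp0
```

Checking for the device first avoids the error message "ip link del" prints when there is nothing to delete.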

Actions #16

Updated by pespin about 1 month ago

I worked further on how gtp_u_kmod is used in my WIP osmo-epdg branch. I also fixed the gen_socket module (submitted to the osmocom/master branch in our fork) to be able to find the nif objects (shared libraries) when running code in escript mode.

I can already run the EPDG_Tests ttcn testsuite in docker with osmo-epdg creating the tunnel device at startup.

Next step is to start doing stuff on the tunnel device created.

Actions #17

Updated by laforge 23 days ago

I've described my understanding of what the user plane looks like at EPDG_implementation_plan

Please note that I had assumed IPv6 inner IP addresses as that's the default within IMS. However, given that IPv6 support in the kernel GTP driver is not yet in mainline (let alone in a debian kernel), we'll have to set up everything to use IPv4 inside the tunnels.

Actions #18

Updated by laforge 21 days ago

  • Priority changed from Normal to Urgent

Actions #19

Updated by pespin 15 days ago

Current osmo-epdg master already supports PDP Contexts being created and deleted in the gtp0 tun device upon Session establishment.

What's missing now is to set up the routing/fwmark on strongswan.

Actions #20

Updated by pespin 15 days ago

So far this is the current setup:

[ping 8.8.8.8]---[tun0][SWu-emulator]----ipsec----[strongswan_____________________________][enp1s0]--------------------[gtp0]-------gtpv1u-------[upf]
                 172.20.0.1                       epdg.osmocom.org(213.95.46.81)            172.20.0.1                   EUA=10.45.0.1            GTPv1U_ADDR=10.74.0.24

strongswan is not yet applying the fwmark, nor using the EUA allocated by SMF and forwarded by osmo-epdg to it.

I can easily now set up the above described setup running SWu-emulator on my laptop. I get a "tun0" interface which gets a local IP address 172.20.0.1 assigned (negotiated with strongswan). I tell SWu-emulator to create the "tun0" device in a netns where I can easily do "ping 8.8.8.8".

With wireshark, I see the ping packet successfully decrypted and injected by strongswan in epdg.osmocom.org as if it arrives from enp1s0, containing src_addr=172.20.0.1 and dst_addr=8.8.8.8.

I can verify what I mentioned with the following:

export MS_IPSEC_ADDR=172.20.0.1
export MS_INTERNAL_ADDR=10.45.0.1
iptables -t mangle -A PREROUTING -s "${MS_IPSEC_ADDR}" -j LOG --log-prefix "iptables: " 

Then, since strongswan is not yet applying the fwmark, I'm doing it manually in iptables for now (I verified it matches the arriving ping packets with iptables -L -v -n showing the match count):

iptables -t mangle -A PREROUTING -s "${MS_IPSEC_ADDR}" -j MARK --set-mark 2

Since strongswan doesn't yet set the proper IP address for the MS, ideally I'd have an extra iptables command to modify the src IP address for packets with fwmark=2 from $MS_IPSEC_ADDR to $MS_INTERNAL_ADDR. I still need to find out how.

Finally, I started setting up some routing rules for fwmark:

echo "2 pespin" >> /etc/iproute2/rt_tables
ip rule add fwmark 2 table pespin
ip route add default dev gtp0 table pespin
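
Side note: the "echo ... >> /etc/iproute2/rt_tables" line above appends a new entry every time it is run. A re-runnable variant could check for the table name first. This is only a sketch; ensure_rt_table is a made-up helper name, and the optional third argument exists just so it can be tried against a scratch file:

```shell
#!/bin/sh
# Append "<id> <name>" to an rt_tables file unless <name> is already listed.
ensure_rt_table() {
    id=$1
    name=$2
    file=${3:-/etc/iproute2/rt_tables}
    # Match "<number> <name>" lines only, so comment lines are ignored.
    if ! grep -qE "^[[:space:]]*[0-9]+[[:space:]]+${name}[[:space:]]*\$" "$file" 2>/dev/null; then
        printf '%s %s\n' "$id" "$name" >> "$file"
    fi
}

# Usage:
#   ensure_rt_table 2 pespin
```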

This is what I got so far:

# ip route get 8.8.8.8
8.8.8.8 via 213.95.46.1 dev enp1s0 src 213.95.46.81 uid 0
    cache
root@epdg:~# ip route get 8.8.8.8 mark 2
8.8.8.8 dev gtp0 table pespin src 213.95.46.81 mark 2 uid 0
    cache
root@epdg:~# ip route get 8.8.8.8 mark 2 from 172.20.0.1
RTNETLINK answers: Network is unreachable
root@epdg:~# ip route get 8.8.8.8 mark 2 from 10.45.0.1
RTNETLINK answers: Network is unreachable

So it seems routing towards gtp0 is fine whenever the src IP address is automatically selected from the local host, but it doesn't seem to like the routing when the src address comes from outside the host (I have net.ipv4.ip_forward=1).

Actions #21

Updated by pespin 15 days ago

If I set the input iface in "ip route get", which I forgot, the routing looks better:

root@epdg:~# ip route get 8.8.8.8 mark 2 from 10.45.0.1 iif enp1s0
8.8.8.8 from 10.45.0.1 dev gtp0 table pespin mark 2
    cache iif enp1s0
root@epdg:~# ip route get 8.8.8.8 mark 2 from 172.20.0.1 iif enp1s0
8.8.8.8 from 172.20.0.1 dev gtp0 table pespin mark 2
    cache iif enp1s0

So maybe patching the source address to match the EUA expected by gtp0 may be enough to get the first packets reaching the UPF.

Actions #22

Updated by laforge 15 days ago

On Tue, Feb 13, 2024 at 11:39:48AM +0000, pespin wrote:

Issue #6235 has been updated by pespin.

Current osmo-epdg master already supports PDP Contexts being created and deleted in the gtp0 tun device upon Session establishment.

What's missing now is to set up the routing/fwmark on strongswan.

Assuming a packet's fwmark survives the IP xfrm input decapsulation (I would expect it does), you could try to simply mark all traffic from your (outer) source IPv4 address (or even all traffic to UDP port 4500?) with one mark, and then use 'ip rule' to use a separate non-standard routing table, and have that routing table's default route point to gtp0.

At least I couldn't find any code in net/xfrm/*.c which would touch the skb fwmark, so that should be true.

Also, looking at https://upload.wikimedia.org/wikipedia/commons/3/37/Netfilter-packet-flow.svg it should be
clear that you should see both the encrypted as well as the decrypted/decapsulated packet going through
filter/input.

That would be my first try, and it should be quick to verify.

The more proper approach would likely use the nft ipsec expression (or legacy iptables policy match),
see https://thermalcircle.de/doku.php?id=blog:linux:nftables_demystifying_ipsec_expressions

This way it should be possible to have a rule that matches on all inbound ipsec packets. As all of them are
treated the same, "ipsec in" without any further details like spi or reqid should be sufficient.
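
With the legacy iptables policy match, the "match all inbound ipsec packets" idea could look roughly like the sketch below. It is untested on this setup: the function name is made up, the default mark value 2 just mirrors the one used earlier in this ticket, and it assumes the iptables "policy" match extension is available on the host:

```shell
#!/bin/sh
# Mark every packet that was decapsulated by an inbound IPsec policy,
# regardless of spi/reqid. The mark can then be used by "ip rule".
mark_ipsec_inbound() {
    fwmark=${1:-2}
    iptables -t mangle -A PREROUTING \
        -m policy --dir in --pol ipsec \
        -j MARK --set-mark "$fwmark"
}

# Usage:
#   mark_ipsec_inbound 2
```

Compared to matching on the MS source address, this would not need a per-subscriber rule.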

Actions #23

Updated by laforge 15 days ago

From my point of view this all looked great.

After some joint debugging we could narrow it down to xfrm policy related dropping in ip_forward.

It seems the local_ts is set to 172.16.24.0/24 and hence an inner IP packet with a destination outside that
subnet would be dropped, while Pau has been testing with 8.8.8.8

Tests with local_ts 0.0.0.0/8 have started
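
The traffic selector check boils down to a prefix match on the inner destination address. A small sketch to try candidate local_ts values against a test destination; the helper names are illustrative, and this only mimics the subnet comparison, not the actual xfrm policy lookup:

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_subnet <addr> <network> <prefix-len>: succeeds if addr is inside the subnet.
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    prefix=$3
    if [ "$prefix" -eq 0 ]; then
        mask=0
    else
        mask=$(( (0xffffffff << (32 - prefix)) & 0xffffffff ))
    fi
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Usage:
#   in_subnet 8.8.8.8 172.16.24.0 24 || echo "8.8.8.8 is outside the selector"
```

By this comparison 8.8.8.8 is outside 172.16.24.0/24. Note that it is also outside 0.0.0.0/8 as written above, whereas 0.0.0.0/0 would cover it.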

Actions #24

Updated by pespin 15 days ago

I got so far to the point where the ping initiated at my laptop ends up being transmitted to the UPF encapsulated in GTP, and the UPF decaps it and transmits it.

I had to change the "ip" bound to the gtp0 tun dev in osmo-epdg config in order to avoid getting errors like:

[427554.144163] gtp0: found PDP context 0000000002724036
[427554.144167] gtp0: no route to SSGN 10.74.0.24

I changed it to be a local address at osmo-epdg host from where the peer can be reached:

  {gtp_u_kmod, [
    %% grx: Name used to log by the module.
    {sockets, [{grx, [%% ip: IP Address assigned at the tunnel. TODO: not currently applied?
-                      %{ip, {192,0,2,16}},
+                  {ip, {10,74,0,11}},

I'm currently using the following setup:

export MS_IPSEC_ADDR=172.20.0.1
export MS_INTERNAL_ADDR=10.45.0.1
export FWMARK=2
export GTP_TUNDEV="gtp0" 

#iptables -t mangle -A PREROUTING -s "${MS_IPSEC_ADDR}" -j LOG --log-prefix "iptables: " 
iptables -t mangle -A PREROUTING -s "${MS_IPSEC_ADDR}" -j MARK --set-mark "${FWMARK}" 
iptables -t nat -A POSTROUTING -m mark --mark "${FWMARK}" -o "${GTP_TUNDEV}" -j SNAT --to "${MS_INTERNAL_ADDR}" 

echo "2 epdg" >> /etc/iproute2/rt_tables
ip rule add fwmark "${FWMARK}" table epdg
ip route add default dev gtp0 table epdg

Then I start osmo-epdg like this:

ip link del gtp0; sleep 1; rebar3 shell --config ./config/local.config

Note: It's important to make sure the gtp0 device doesn't exist before osmo-epdg starts, otherwise gtp_u_kmod fails to create/configure the tun device.
Note2: As "ip link del gtp0" drops the device, that means the "ip route add default dev gtp0 table epdg" line needs to be reapplied every time osmo-epdg is restarted.
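
Both notes come down to knowing whether the device exists at a given moment. A minimal sketch of such a check, assuming a Linux host with sysfs mounted; netdev_exists and the SYSFS_NET override are hypothetical names used only for this illustration:

```shell
# Succeeds if a network device with the given name is present.
# SYSFS_NET can be overridden (e.g. for testing); it defaults to /sys/class/net.
netdev_exists() {
    [ -e "${SYSFS_NET:-/sys/class/net}/$1" ]
}

GTP_TUNDEV="${GTP_TUNDEV:-gtp0}"
if netdev_exists "${GTP_TUNDEV}"; then
    echo "${GTP_TUNDEV} exists: delete it before starting osmo-epdg"
else
    echo "${GTP_TUNDEV} absent: re-add the default route in table epdg once it is created"
fi
```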

I'm now trying to figure out why ICMP responses never go back into the GTP tunnel in epc.epdg.osmocom.org (seems UPF fault). The fact that internet is not reachable there by default routing may be a related problem:

root@epc:~# ping 8.8.8.8
ping: connect: Network is unreachable

Actions #25

Updated by laforge 15 days ago

On Tue, Feb 13, 2024 at 05:11:53PM +0000, pespin wrote:

The fact that internet is not reachable there by default routing may be a related problem:

"The internet" is very well reachable, but not the IPv4 portion of it. This is what as far as I understood was requested. Back at the time when lynxis requested this setup, he stated "IPv6 only" for the IMS and EPC machines, and that's how I configured it.

The problem now is that the 'eth0' interface of those machines is directly bridged to a physical network device in a network segment behind a colocated firewall, and that network segment only has public V6 and public V4. There is no NAT in my colocation setup. So either I allocate a public v4 address to it, or we'd have to add an eth2 network that bridges to lxcbr0 on the host, which in turn then can do NAT towards the public v4.

But then, why do we need to reach public IPv4 addresses through all of this? I thought the point of the ePDG setup was to reach [only] the IMS core behind it?

Actions #26

Updated by pespin 15 days ago

  • % Done changed from 0 to 60

laforge wrote in #note-25:

On Tue, Feb 13, 2024 at 05:11:53PM +0000, pespin wrote:

The fact that internet is not reachable there by default routing may be a related problem:

"The internet" is very well reachable, but not the IPv4 portion of it. This is what as far as I understood was requested. Back at the time when lynxis requested this setup, he stated "IPv6 only" for the IMS and EPC machines, and that's how I configured it.

The problem now is that the 'eth0' interface of those machines is directly bridged to a physical network device in a network segment behind a colocated firewall, and that network segment only has public V6 and public V4. There is no NAT in my colocation setup. So either I allocate a public v4 address to it, or we'd have to add an eth2 network that bridges to lxcbr0 on the host, which in turn then can do NAT towards the public v4.

But then, why do we need to reach public IPv4 addresses through all of this? I thought the point of the ePDG setup was to reach [only] the IMS core behind it?

Fine then. I used IMS ip address as a ping target.

I had to add the following in ims.epdg.osmocom.org in order for the host to answer ping requests coming from the MS IP address pool from UPF:

# 10.45.0.0/24 is the address pool of UPF.
root@ims:~# ip route add 10.45.0.0/24 via 10.74.0.21

Also, I had to tweak the broken open5gs setup where open5gs-upfd ogstun gets assigned IP address "10.45.0.1/16", but that IP address is actually also assigned to the first MS (my SWu emulator), and that creates problems in the network stack when the inner packet is decapsulated from GTP. In order to fix it:

root@epc:~# ip addr del 10.45.0.1/16 dev ogstun
root@epc:~# ip route add 10.45.0.0/24 dev ogstun

According to lynxis this is a problem coming from open5gs package (file /etc/systemd/network/99-open5gs.network). The IP address is set in order to get the routing entry for free. Instead, it should only add the routing entry.

SWu-IKEv2]# ping 10.74.0.31
PING 10.74.0.31 (10.74.0.31) 56(84) bytes of data.
64 bytes from 10.74.0.31: icmp_seq=1 ttl=62 time=48.3 ms
64 bytes from 10.74.0.31: icmp_seq=2 ttl=62 time=47.5 ms
64 bytes from 10.74.0.31: icmp_seq=3 ttl=62 time=48.0 ms
64 bytes from 10.74.0.31: icmp_seq=4 ttl=62 time=47.3 ms
64 bytes from 10.74.0.31: icmp_seq=5 ttl=62 time=48.4 ms
64 bytes from 10.74.0.31: icmp_seq=6 ttl=62 time=47.6 ms
64 bytes from 10.74.0.31: icmp_seq=7 ttl=62 time=47.5 ms
64 bytes from 10.74.0.31: icmp_seq=8 ttl=62 time=47.6 ms
64 bytes from 10.74.0.31: icmp_seq=9 ttl=62 time=47.2 ms
64 bytes from 10.74.0.31: icmp_seq=10 ttl=62 time=48.3 ms
^C
--- 10.74.0.31 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9032ms
rtt min/avg/max/mdev = 47.159/47.756/48.403/0.428 ms

Next step: test IPv6 PDP Contexts over IPv4 gtp tunnel.

Actions #27

Updated by laforge 15 days ago

On Tue, Feb 13, 2024 at 06:38:34PM +0000, pespin wrote:

Next step: test IPv6 PDP Contexts over IPv4 gtp tunnel.

That won't work, at least not with the mainline kernel GTP module. See #6096

I think I already mentioned this in the wiki: You will have to use v4 for now in the IMS, until the kernel GTP has been fully tested (and a custom kernel module built for this installation)

Actions #28

Updated by pespin 14 days ago

I have been working on using a pre-created tundev device in osmo-epdg/gtp_u_kmod.
In order to support that, I added a new config which allows specifying the creation policy. From my gtp_u_kmod WIP patch description:

    Add config {create_mode, (nocreate|create|replace)}

    This allows configuring the tun dev creation policy:
    - nocreate: The process assumes the tundev was created previously by
      some other means (eg. previous instance of the same process or some
      external tool/service). If the tundev is not found, failure will
      occur.
    - create: The default behavior. It will create the tun dev and related
      GTPU sockets from config. If the tundev already exists, a failure from
      the kernel will occur (-EEXIST, -17, see __rtnl_newlink()).
    - replace: It will create the tun dev and related GTPU sockets from
      config. If the tundev already exists, it will be replaced.
      NOTE: The linux kernel doesn't support this feature yet, it will fail with
      -EOPNOTSUPP in __rtnl_newlink().
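
The three modes can be read as a small decision table. The sketch below only restates the patch description in shell form; create_mode_action and its output strings are made up for illustration and are not part of gtp_u_kmod.

```shell
# Usage: create_mode_action MODE EXISTS
#   MODE   is nocreate, create or replace
#   EXISTS is "yes" if the tundev is already present, "no" otherwise
# Prints the outcome described in the patch description for that combination.
create_mode_action() {
    case "$1:$2" in
        nocreate:yes) echo "use-existing" ;;
        nocreate:no)  echo "fail-not-found" ;;
        create:no)    echo "create" ;;
        create:yes)   echo "fail-EEXIST" ;;
        replace:no)   echo "create" ;;
        # Intended to replace the device; per the NOTE the kernel rejects it for now.
        replace:yes)  echo "fail-EOPNOTSUPP" ;;
        *)            echo "invalid create_mode or state: $1 $2" >&2; return 1 ;;
    esac
}

create_mode_action nocreate yes
create_mode_action create yes
```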

The idea here is that the gtp tun dev is created by external means, that is:
- Systemd: we'd need to somehow configure the UDP/GTP-U sockets in a systemd socket file and pass those to osmo-epdg. Probably not what we want to implement now.
- gtp-link: Use libgtpnl/tools/gtp-link.c to create the UDP/GTP-U sockets. So far it creates them using ANY_ADDR and standard ports. We probably want to improve the tool a bit to at least be able to make it listen on a given IP address. The call to gtp-link can be added as an ExecStartPre to the systemd service.

Actions #29

Updated by pespin 13 days ago

I submitted the changes so that osmo-epdg uses a pre-created gtp tun device in default config from osmo-epdg.git and in docker ttcn3-epdg-test here:
https://gerrit.osmocom.org/c/erlang/osmo-epdg/+/35993 Use new 'pre-create tundev' feature from gtp_u_kmod
https://gerrit.osmocom.org/c/docker-playground/+/35994 ttcn3-epdg: Use new 'pre-create tundev' feature from gtp_u_kmod

I'm now applying the same config in epdg.osmocom.org.

Actions #30

Updated by pespin 13 days ago

I now have epdg.osmocom.org setup working with ping end-to-end from emulated UE in my laptop towards ims.epdg.osmocom.org.

As a summary of what's needed to have the osmo-epdg host working:

export FWMARK=2

####
# This block here is needed until:
# 1- strongswan sets the FWMARK on its own on the decapsulated packets (see the MARK iptables cmd below). We may choose to keep it here external.
# 2- strongswan passes the correct EUA to the UE (see the SNAT iptables cmd below)
####
export GTP_TUNDEV="gtp0" 
export MS_IPSEC_ADDR=172.20.0.1
export MS_INTERNAL_ADDR=10.45.0.1
iptables -t mangle -A PREROUTING -s "${MS_IPSEC_ADDR}" -j MARK --set-mark "${FWMARK}" 
iptables -t nat -A POSTROUTING -m mark --mark "${FWMARK}" -o "${GTP_TUNDEV}" -j SNAT --to "${MS_INTERNAL_ADDR}" 
#### 

# This needs to be applied once, probably by ansible:
echo "2 epdg" >> /etc/iproute2/rt_tables
# This needs to be applied once upon startup. Not sure where's the best place to put this:
ip rule add fwmark "${FWMARK}" table epdg
# This needs to be applied every time *after* the gtp0 tundev is created (see gtp-link rant below).
# Every time the gtp0 device is deleted, this route is deleted too automatically since the iface no longer exists:
ip route add default dev gtp0 table epdg

The tun interface is to be created by external means like gtp-link. Ideally it is recreated immediately before restarting osmo-epdg, in order to clean up old tun state.

gtp-link add gtp0 --sgsn

IMPORTANT: For some unknown reason, the gtp-link needs to be kept running in its poll loop, otherwise the tundev device no longer allows setting up pdp contexts. Example: "gtp-tunnel add gtp0 v1 1876990469 16988 10.45.0.3 10.74.0.24" will return -ENODEV if the gtp-link process was stopped, even if the interface is still created and UP. laforge do you know if this is an expected behavior? Looks weird to me.

So in general, one does the following to start the whole thing every time:

gtp-link add gtp0 --sgsn &
MYPID=$!
echo "gtp-link pid ${MYPID}" 
sleep 1
# gtp-link needs to be kept running in order for osmo-epdg to be able to set up PDP contexts...
#kill ${MYPID}
ip route add default dev gtp0 table epdg
rebar3 shell --config ./config/local.config
# here setup a bash TRAP to kill MYPID
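
For the trap mentioned in the last comment, a minimal sketch could look as follows. run_with_helper and HELPER_SETTLE_SECS are hypothetical names introduced here; the helper would be the gtp-link command and the main command would be whatever re-adds the route and starts osmo-epdg.

```shell
# Usage: run_with_helper "HELPER COMMAND LINE" "MAIN COMMAND LINE"
# Starts the helper in the background, runs the main command in the
# foreground, and kills the helper once the main command has finished.
run_with_helper() {
    (
        # Both command lines are word-split on purpose, so they must not
        # rely on quoted arguments.
        $1 &
        helper_pid=$!
        echo "helper pid ${helper_pid}"
        # Runs when this subshell exits: stop the helper and reap it.
        trap 'kill "${helper_pid}" 2>/dev/null; wait "${helper_pid}" 2>/dev/null' EXIT
        # Give the helper a moment to set things up (the "sleep 1" above).
        sleep "${HELPER_SETTLE_SECS:-1}"
        $2
    )
}
```

It would be invoked along the lines of run_with_helper "gtp-link add gtp0 --sgsn" "./start-epdg.sh", where start-epdg.sh is an assumed wrapper script containing the "ip route add" and "rebar3 shell" lines. Running everything in a subshell keeps the trap from leaking into the calling shell.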

lynxis as a reminder of what needs to be updated in ansible config files:
  • The swanctl.cfg line changing the children/net/local_ts to "= 0.0.0.0/0"
  • You need to copy to your ansible files the config file available in /srv/osmo-epdg/config/local.config
Actions #31

Updated by pespin 13 days ago

  • Related to Bug #6361: open5gs-upfd: Fix open5gs package assigning 1st IP address of the UE pool to the ogstun added
Actions #32

Updated by laforge 13 days ago

On Thu, Feb 15, 2024 at 07:42:21PM +0000, pespin wrote:

iptables -t mangle -A PREROUTING -s "${MS_IPSEC_ADDR}" -j MARK --set-mark "${FWMARK}"

this is a hack that works right now for first tests. However, this should be switched to use matching on ipsec "policy", as I mentioned several times by now. That way it will continue to work for any number of MSs also in the future - not just right now.

Also, all iptables rules should switch to nftables. If we write new software in 2024 we shouldn't use a 25-year-old system that's only emulated for backwards compatibility, but the present-day system (nft/nftables). You can create your rules with iptables commands and then run 'nft list ruleset' to see how they look in nft syntax (which can be added to /etc/nftables.conf; the systemd service for nftables will then install them at boot, if enabled).

iptables -t nat -A POSTROUTING -m mark --mark "${FWMARK}" -o "${GTP_TUNDEV}" -j SNAT --to "${MS_INTERNAL_ADDR}"

This is just a temporary hack. It can stay this way until strongswan gets fixed and uses the PGW-allocated MS IP.

# This needs to be applied once, probably by ansible:
echo "2 epdg" >> /etc/iproute2/rt_tables
# This needs to be applied once upon startup. Not sure where's the best place to put this:
ip rule add fwmark "${FWMARK}" table epdg

I could imagine it would fit into an 'up' rule for the gtp0 device in /etc/network/interfaces (but then a
matching ip rule del would have to go into a 'down' rule). Alternatively it could be a separate systemd unit
started once on boot.

# This needs to be applied every time after the gtp0 tundev is created (see gtp-link rant below).
# Every time the gtp0 device is deleted, this route is deleted too automatically since the iface no longer exists:
ip route add default dev gtp0 table epdg

Would it work with an

allow-hotplug gtp0
iface gtp0 inet manual
    up ip route add default dev gtp0 table epdg

section in /etc/network/interfaces ?

IMPORTANT: For some unknown reason, the gtp-link needs to be kept running in its poll loop, otherwise the tundev device no longer allows setting up pdp contexts. Example: "gtp-tunnel add gtp0 v1 1876990469 16988 10.45.0.3 10.74.0.24" will return -ENODEV if the gtp-link process was stopped, even if the interface is still created and UP. laforge do you know if this is an expected behavior? Looks weird to me.

I don't know about that, no, sorry. Does strace show any read/write activity on the fd while you're
adding/removing the PDP contexts?

In general the idea was that this fd is owned by the "GSN" process (ggsn, upf, or now osmo-epdg). The reason to externalize its creation is privilege separation. We might come up with some smart way to drop privileges or to pass it into osmo-epdg?

Also, if you use the separate 'gtp-link' binary, how do you get the udp socket it creates into the epdg?

The epdg will need it in order to implement mandatory support for responding GTP ECHO, and possibly also
related to the weird flavour of IPv6 neighbor discovery that's spoken over GTP in case of inner IPv6 addresses (not supported today).

The kernel GTP driver passes all GTP packets which are not supported by the kernel to userspace, and
osmo-epdg needs to handle them. This could be, for example:
  • GTP ECHO requests
  • packets for unknown PDP contexts
  • packets with unknown or unsupported GTP header options

Regards,
Harald

Actions #33

Updated by pespin 13 days ago

laforge wrote in #note-32:

On Thu, Feb 15, 2024 at 07:42:21PM +0000, pespin wrote:

iptables -t mangle -A PREROUTING -s "${MS_IPSEC_ADDR}" -j MARK --set-mark "${FWMARK}"

this is a hack that works right now for first tests. However, this should be switched to use matching on ipsec "policy", as I mentioned several times by now. That way it will continue to work for any number of MSs also in the future - not just right now.

Ok, I first thought that "nft ipsec" was just a temporary solution, and that we wanted to implement setting the fwmark somehow within the strongswan process/service.
If we want to fwmark through nft as permanent solution then I will look into adding related rules.

Also, all iptables rules should switch to nftables.

ACK. Thanks for the tip, I'll convert them to nftables.

IMPORTANT: For some unknown reason, the gtp-link needs to be kept running in its poll loop, otherwise the tundev device no longer allows setting up pdp contexts. Example: "gtp-tunnel add gtp0 v1 1876990469 16988 10.45.0.3 10.74.0.24" will return -ENODEV if the gtp-link process was stopped, even if the interface is still created and UP. laforge do you know if this is an expected behavior? Looks weird to me.

I don't know about that, no, sorry. Does strace show any read/write activity on the fd while you're
adding/removing the PDP contexts?

Which fd do you mean? The netlink one? Yes, I monitored it with nlmon+wireshark and checked with strace. It's the kernel netlink response message that contains the error -ENODEV.

What I think is happening is:
  • the gtp-link process creates the FDs and sends them to the kernel. They are stored under gtp->sk0 and gtp->sk1u.
  • When the gtp-link process dies, probably those fds are removed and gtp->sk0 and gtp->sk1u become NULL in the kernel.
  • When the add_pdp_ctx request is sent over netlink, it calls kernel's gtp.c gtp_genl_new_pdp(), which does:
        rtnl_lock();
    
        gtp = gtp_find_dev(sock_net(skb->sk), info->attrs);
        if (!gtp) {
            err = -ENODEV;
            goto out_unlock;
        }
    
        if (version == GTP_V0)
            sk = gtp->sk0;
        else if (version == GTP_V1)
            sk = gtp->sk1u;
        else
            sk = NULL;
    
        if (!sk) {
            err = -ENODEV;
            goto out_unlock;
        }
    

Hence why I guess gtp->sk1u somehow was turned to NULL. Now the question is (not totally important to fix right now): does it really make sense to keep the tun iface created if the related user-space sockets are dropped?
If the tun device is somehow related to those FDs, what's the purpose of keeping the iface and not destroying it rather than create confusion? May it make sense to ask Pablo to look at this?

In general the idea was that this fd is owned by the "GSN" process (ggsn, upf, or now osmo-epdg). The reason to externalize its creation is privilege separation. We might come up with some smart way to drop privileges or to pass it into osmo-epdg?

Also, if you use the separate 'gtp-link' binary, how do you get the udp socket it creates into the epdg?

I see 3 ways of doing this fd passing between gtp-link and osmo-epdg:
  • process inheritance through open fd + env var telling osmo-epdg where to find it.
  • Unix socket
  • Systemd inheritance (not sure if this can be done: creating a gtp tun from systemd files)
So basically we could either:
  • Add to gtp-link a "--cmd" param which is run with fork() passing the fd.
  • Have a "--unix-socket" param where gtp-link listens and clients can connect to get an fd.
  • Have a systemd.socket/device file with the GTPu socket and have it passed to osmo-epdg when it is started. Not sure if this can really be done with a gtp tun though.

But I think I'll leave all this privilege escalation for later and focus on having the gtp-u echo implemented in osmo-epdg first, and other stuff like IPv6 slaac too.

Actions #34

Updated by pespin 13 days ago

pespin wrote in #note-33:

But I think I'll leave all this privilege escalation for later and focus on having the gtp-u echo implemented in osmo-epdg first, and other stuff like IPv6 slaac too.

netlink message GTP_CMD_NEWPDP is marked GENL_ADMIN_PERM so it requires CAP_NET_ADMIN in osmo-epdg anyway afaict, same as RTM_NEWLINK when creating the device.
So even if we create the tun outside of the process, we'd still require CAP_NET_ADMIN every time we want to create a pdp context...
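
A quick way to check whether a process actually holds CAP_NET_ADMIN is to look at the effective capability mask in /proc. This is a minimal sketch assuming a Linux /proc filesystem; the function names are made up for this example. CAP_NET_ADMIN is capability number 12 (see capabilities(7)).

```shell
# CAP_NET_ADMIN is bit 12 of the capability mask.
CAP_NET_ADMIN_BIT=12

# Succeeds if the hex capability mask in $1 (as printed in the CapEff line
# of /proc/PID/status) has the CAP_NET_ADMIN bit set.
mask_has_net_admin() {
    [ $(( (0x$1 >> CAP_NET_ADMIN_BIT) & 1 )) -eq 1 ]
}

# Prints the effective capability mask of the awk child process, which
# normally inherits it unchanged from the calling shell.
current_cap_eff() {
    awk '/^CapEff:/ { print $2 }' /proc/self/status
}

eff=$(current_cap_eff 2>/dev/null)
if [ -n "${eff}" ] && mask_has_net_admin "${eff}"; then
    echo "CAP_NET_ADMIN present"
else
    echo "CAP_NET_ADMIN missing (or /proc not available)"
fi
```

For a running osmo-epdg, the same CapEff line could be read from /proc/PID/status of the Erlang VM process.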

Actions #35

Updated by pespin 13 days ago

I tested adding the rules to /etc/network/interfaces and it's working fine.
lynxis I started moving all those to the ansible scripts here: https://gitea.osmocom.org/ims-volte-vowifi/ansible-prototype/src/branch/pespin/main
I'll create a PR tomorrow after I test deploying them.

Actions #36

Updated by pespin 12 days ago

I have been reading on ip xfrm and nft ipsec.
References:

I see 2 approaches:

Global match on all ipsec traffic:

Seems like we can match if the incoming packet is ipsec with:

nft add rule ip mangle INPUT meta ipsec exists meta mark set 2

By appending "counter" to it, one can enable a counter to check how many times the rule hits. For some reason still unknown, it is never hit. (I also verified this with a separate appended LOG rule to print the packet and see in dmesg whether FWMARK was applied.)

Seems we could also match on "esp" which is used by ipsec too:
https://itecnotes.com/server/linux-netfilter-how-to-mark-packet-by-reqid/

Match per-tunnel ipsec traffic

There's this way to match a given tun using eg reqid (ip xfrm state).

ipsec {in | out} [ spnum NUM ]  {reqid | spi}
ipsec {in | out} [ spnum NUM ]  {ip | ip6} {saddr | daddr}

It seems we need a nft rule like this:

nft add rule ip mangle INPUT ipsec in reqid 1 meta mark set 2

It is not possible afaict to apply it to all ipsec tunnels at once, since one of the filters must be passed (I still need to confirm this in case there's some wildcard value):

ipsec {in | out} [ spnum NUM ]  {reqid | spi}
ipsec {in | out} [ spnum NUM ]  {ip | ip6} {saddr | daddr}

So that would mean we need to add one rule per tunnel (UE). This can be done using the updown feature of strongswan:
https://docs.strongswan.org/docs/5.9/plugins/updown.html
https://wiki.strongswan.org/projects/strongswan/wiki/Updown/3

Our /usr/local/etc/swanctl/swanctl.conf already contains the line, but it points to a non-existing script:

updown = /usr/lib/ipsec/_updown iptables

An example script can be found in our strongswan git repo in strongswan.git/src/_updown/_updown.in. Other examples can be found in:

./testing/tests/ikev1/nat-virtual-ip/hosts/moon/etc/nat_updown
./testing/tests/ikev2/nat-virtual-ip/hosts/moon/etc/nat_updown
./testing/tests/ikev2/net2net-same-nets/hosts/sun/etc/mark_upd

Actions #37

Updated by laforge 12 days ago

On Fri, Feb 16, 2024 at 06:57:16PM +0000, pespin wrote:

Seems like we can match if the incoming packet is ipsec with:

> nft add rule ip mangle INPUT meta ipsec exists meta mark set 2
> 

why are you looking at the input chain/hook? Input is for packets to local
sockets only. I was under the assumption we're talking about forwarded
packets here? Received from the internet, ipsec-decapsulated and routed/forwarded
to the gtp net-device.

Actions #38

Updated by pespin 12 days ago

laforge wrote in #note-37:

why are you looking at the input chain/hook? Input is for packets to local
sockets only. I was under the assumption we're talking about forwarded
packets here? Received from the internet, ipsec-decapsulated and routed/forwarded
to the gtp net-device.

Because I was trying to match on the ipsec packet, which is not forwarded but handled by the host (see https://www.rsquare.org/wp-content/uploads/2013/01/Netfilter-packet-flow1.png xfrm lookup + decode path).

But thinking about it again, maybe the "meta ipsec exists" actually matches packets after xfrm has been applied, hence it matches the inner ipsec packets, which indeed go through FORWARD.
I'll give it a try on Monday.

Actions #39

Updated by laforge 12 days ago

On Fri, Feb 16, 2024 at 07:36:09PM +0000, pespin wrote:

But thinking about it again, maybe the "meta ipsec exists" actually matches packets after xfrm has been applied, hence it matches the inner ipsec packets, which indeed go through FORWARD.

yes, I think this is the way to go. The "meta ipsec" information is not known until the xfrm has been
applied, I guess.

Also, matching on the inner packet is really what (I think) we want here: We want to match packets that have been successfully authenticated and decapsulated. We don't want to match any random packet that somebody sent with an ESP-in-UDP header without even knowing if it passes cryptographic authentication.

I'll give it a try on Monday.

excellent.

Actions #40

Updated by pespin 9 days ago

I got it working with the following nft rule:

nft add rule ip mangle PREROUTING meta ipsec exists meta mark set 2 counter

I applied it to PREROUTING because the mark must be applied before routing decision is done, since that's where the fwmark is used as configured by ip rule.
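
To make the dependency between the nft mark and the policy routing explicit, here is a small dry-run sketch that only prints the three commands from this ticket with a shared mark value. epdg_mark_routing_cmds is a made-up helper name; nothing is executed, and the "ip mangle PREROUTING" chain is assumed to exist already, as in the rule above.

```shell
# Usage: epdg_mark_routing_cmds FWMARK TABLE DEV
# Prints (does not run) the commands that tie the mark set in PREROUTING
# to the routing table that sends marked packets into the GTP device.
epdg_mark_routing_cmds() {
    fwmark="$1"; table="$2"; dev="$3"
    # 1. Mark decapsulated ipsec packets before the routing decision.
    echo "nft add rule ip mangle PREROUTING meta ipsec exists meta mark set ${fwmark} counter"
    # 2. Select the dedicated routing table for marked packets.
    echo "ip rule add fwmark ${fwmark} table ${table}"
    # 3. Route everything in that table into the GTP tun device.
    echo "ip route add default dev ${dev} table ${table}"
}

epdg_mark_routing_cmds 2 epdg gtp0
```

Keeping the mark in a single variable avoids the nft rule and the ip rule drifting apart.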

Actions #41

Updated by pespin 9 days ago

I submitted a PR to the ansible repo. With it I can setup everything and I made sure everything is started after a reboot of epdg.osmocom.org and I can connect to it with SWu-emulator:
https://gitea.osmocom.org/ims-volte-vowifi/ansible-prototype/pulls/2

Actually, one thing still needs to be started manually: osmo-epdg, with:

root@epdg:~# cd /srv/
# ip l del dev gtp0; sleep 1; ERL_FLAGS='-config /srv/osmo-epdg/config/local.config' /srv/osmo-epdg/_build/default/bin/osmo-epdg

I'll work next on adding a systemd service installed by ansible so that it is started automatically and we can restart the service easily.

Actions #42

Updated by pespin 9 days ago

  • % Done changed from 60 to 90

I updated the pull request of ansible repo with a systemd service file.
Everything is now started automatically upon host boot, one can simply run the SWu-emulator and start using userplane with ping:

SWu-IKEv2]# pipenv run python ./swu_emulator.py -d epdg.osmocom.org -I 999421234567890 -M 999 -N 42 -K 11111111111111111111111111111111 -C 22222222222222222222222222222222 -n epdg

# ip netns exec epdg ping 10.74.0.3
Actions #43

Updated by pespin about 23 hours ago

What we miss here still is operating on the GTPv1U fd passed to the tundev, so that we handle:
- GTP-U echo req/resp
- SLAAC in IPv6?
