Cisco’s EVPN Multi-Site is a great technology that allows us to achieve massive scale in an EVPN network. With the latest release, the official scalability numbers give us something in the realm of over 12,000 VTEPs (512 VTEPs per site x 25 sites). I’m in no way suggesting that you would need such a big topology, and you should definitely segment way before you reach the limit, but still…
The main configuration requirement for the Multi-Site overlay is to have a full mesh of eBGP peering between all border gateways.
As usual, this has scalability drawbacks. Not only does each border gateway end up with an ever-growing number of peers that will soon get out of control, but, perhaps worse, every time a site is added every existing site must be touched too.
To avoid a full mesh in iBGP topologies we would use a Route Reflector, but with eBGP that’s obviously not an option. So, instead of an RR, the way to scale eBGP peerings is to leverage a Route-Server.
A Route-Server provides route-reflection-like capabilities, and as such it must ensure that NLRI attributes like the next hop and the route-targets aren’t changed. In Cisco’s EVPN implementation the auto-derived route-targets are based on ASN:VNI, so in order to keep using this simplified config, the Route-Server should also support the “rewrite-evpn-rt-asn” feature; if that’s not the case, then hard-coded and consistent route-targets must be defined across all the VTEPs in the network. Finally, the Route-Server doesn’t have to be in the data plane since it’s purely a control-plane node.
Unfortunately, for EVPN there isn’t a “route-server-client” configuration knob yet, nor can we find a configuration example in Cisco’s documentation. Fortunately, knowing the requirements, we can figure out what the config should look like.
The command “disable-connected-check” is required, otherwise the router will reject received prefixes with “DENIED due to: non-connected MP_REACH NEXTHOP”.
The command “next-hop-unchanged” has no effect in the address-family L2VPN EVPN (probably a bug), so a route-map is necessary to achieve the same result.
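Putting those pieces together, a route-server config on NX-OS could look roughly like the sketch below. The ASNs, addresses and the route-map name are placeholders of mine, so treat this as an illustration of the required knobs rather than a validated template:

nv overlay evpn
!
route-map RS-NH-UNCHANGED permit 10
  set ip next-hop unchanged
!
router bgp 65535
  router-id 192.0.2.100
  address-family l2vpn evpn
    retain route-target all            ! keep every route even though nothing is imported locally
  neighbor 10.1.0.1
    remote-as 65001
    update-source loopback0
    ebgp-multihop 5
    disable-connected-check            ! avoids the "non-connected MP_REACH NEXTHOP" rejection
    address-family l2vpn evpn
      send-community
      send-community extended
      rewrite-evpn-rt-asn              ! only if the platform/release supports it
      route-map RS-NH-UNCHANGED out    ! next-hop-unchanged alone doesn't do the job here
  ! ... one neighbor block per border gateway ...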
As for IOS-XR, the command “ignore-connected-check” is required. Unfortunately, IOS-XR doesn’t support “rewrite-evpn-rt-asn”, which means each VTEP will need the appropriate route-targets configured manually, significantly increasing configuration complexity.
Unless you have some automation backing your EVPN deployment, it probably isn’t a good idea to use IOS-XR as an EVPN Route-Server.
Do you have anything else to add? Then contact me, or leave a message below.
After discussing the architecture of our design in part 1 and the underlay configuration in part 2, today I’ll show how the overlay is configured and, hopefully, we will be able to draw our conclusions to the question: are SONiC and white box switches ready to be used in the enterprise DC?
Our two servers will be connected with LACP and trunk interfaces. One VLAN will be bridged only (no SVI), and both servers will have an interface in that VLAN so that Layer 2 can be tested. The other two VLANs will each be configured on a different pair of switches, together with an SVI, so that Layer 3 symmetric IRB can be tested.
VRF Configuration
First of all, let’s create a VRF. This VRF requires a VLAN and a Layer 3 VNI for symmetric IRB to function. The configuration is really simple, but a small caveat must not be overlooked: every VRF name must start with the prefix Vrf-.
From a configuration point of view, we have to follow the usual steps (a rough sketch follows the list):
Create a VRF
Create a Vlan and allow it to the peer-link port channel
Create a SVI interface and assign it to the VRF itself
Associate the VNI to the vlan, then map it as a L3 VNI
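As a rough sketch, the steps above map to SONiC CLI commands along these lines. The VLAN, VNI and peer-link names come from the show outputs further down, but the exact syntax (in particular the VRF-to-L3-VNI mapping) differs between SONiC builds, so treat this as an illustration rather than a copy/paste recipe:

config vrf add Vrf-prod                        # 1. the name must start with "Vrf-"
config vlan add 3800                           # 2. VLAN dedicated to the L3 VNI
config vlan member add 3800 PortChannel1       #    allowed on the peer-link
config interface vrf bind Vlan3800 Vrf-prod    # 3. SVI bound to the VRF
config vxlan map add nve1 3800 1000000         # 4. VLAN-to-VNI mapping...
config vrf add_vrf_vni_map Vrf-prod 1000000    #    ...declared as the VRF's L3 VNI (syntax assumed)
config save -y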
root@SONIC-Leaf301:/home/admin# show ip route vrf Vrf-prod
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued route, r - rejected route, # - not installed in hardware

VRF Vrf-prod:
C>* 1.1.1.0/31 is directly connected, Vlan1234, 00:00:06
B>* 1.1.1.2/31 [200/0] via 11.11.11.113, Vlan3800 onlink, 00:21:34
C>* 100.100.100.1/32 is directly connected, Loopback100, 00:07:20
B>* 100.100.100.2/32 [200/0] via 1.1.1.1, Vlan1234, 00:00:04
B>* 100.100.100.3/32 [200/0] via 11.11.11.113, Vlan3800 onlink, 00:21:34
B>* 100.100.100.4/32 [200/0] via 11.11.11.113, Vlan3800 onlink, 00:21:34
root@SONIC-Leaf301:/home/admin# ping 100.100.100.2 -I Vrf-prod
ping: Warning: source address might be selected on device other than Vrf-prod.
PING 100.100.100.2 (100.100.100.2) from 1.1.1.0 Vrf-prod: 56(84) bytes of data.
64 bytes from 100.100.100.2: icmp_seq=1 ttl=64 time=0.255 ms
64 bytes from 100.100.100.2: icmp_seq=2 ttl=64 time=0.239 ms
^C
--- 100.100.100.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1007ms
rtt min/avg/max/mdev = 0.239/0.247/0.255/0.008 ms

root@SONIC-Leaf301:/home/admin# ping 100.100.100.3 -I Vrf-prod
ping: Warning: source address might be selected on device other than Vrf-prod.
PING 100.100.100.3 (100.100.100.3) from 100.100.100.1 Vrf-prod: 56(84) bytes of data.
64 bytes from 100.100.100.3: icmp_seq=1 ttl=64 time=0.452 ms
64 bytes from 100.100.100.3: icmp_seq=2 ttl=64 time=0.301 ms
^C
--- 100.100.100.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1005ms
rtt min/avg/max/mdev = 0.301/0.376/0.452/0.077 ms

root@SONIC-Leaf301:/home/admin# ping 100.100.100.4 -I Vrf-prod
ping: Warning: source address might be selected on device other than Vrf-prod.
PING 100.100.100.4 (100.100.100.4) from 100.100.100.1 Vrf-prod: 56(84) bytes of data.
64 bytes from 100.100.100.4: icmp_seq=1 ttl=63 time=0.345 ms
64 bytes from 100.100.100.4: icmp_seq=2 ttl=63 time=0.279 ms
64 bytes from 100.100.100.4: icmp_seq=3 ttl=63 time=0.251 ms
^C
--- 100.100.100.4 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2045ms
rtt min/avg/max/mdev = 0.251/0.291/0.345/0.044 ms
I’ve also included a unique loopback on each leaf, and VRF-lite iBGP between the two MC-LAG peers across the peer-link (the reason why this is necessary is left to the reader to figure out, at least for now). Connectivity between loopbacks is also verified.
VLAN Configuration
It might have gone unnoticed, but while configuring the VRF we already configured a VLAN (VLAN 3800). Let’s give it another try anyway with the host-facing VLANs.
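For a purely bridged VLAN like our VLAN 100 (no SVI, default VRF), the sketch is even shorter; again, the vxlan map syntax below is the community sonic-utilities one and may differ on your build:

config vlan add 100
config vlan member add 100 PortChannel1    # peer-link first, host trunks later
config vxlan map add nve1 100 1000100      # L2 VNI mapping
config save -y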
The configuration of an SVI (when necessary) is also trivial; we just need to take care of enabling suppress-arp and to specify that the IP address is a Distributed Anycast Gateway (DAG):
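For VLAN 200 that means something along these lines; note that the anycast-address and neighbour-suppression commands below are assumptions on my part, since their exact names vary between SONiC builds:

config vlan add 200
config vlan member add 200 PortChannel1
config vxlan map add nve1 200 1000200
config interface vrf bind Vlan200 Vrf-prod
config interface ip anycast-address add Vlan200 10.10.200.1/24   # DAG address (command name assumed)
config neigh-suppress vlan 200 enable                            # suppress-arp (command name assumed)
config save -y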
In my case, I also want to configure DHCP relay for my servers, and I can do that with a single line (can you tell why I need to enable option 82 sub-option link selection?):
config interface ip dhcp-relay add Vlan200 10.10.10.100 10.10.10.101 -src-intf=Loopback100 -link-select=enable
On top of the previously used show commands, other commands can be used to verify the config applied:
root@SONIC-Leaf301:/home/admin# show ip static-anycast-gateway
Configured Anycast Gateway MAC address: 00:00:22:22:33:33
IPv4 Anycast Gateway MAC address: enable
Total number of gateway: 2
Total number of gateway admin UP: 2
Total number of gateway oper UP: 2
Interfaces    Gateway Address    Vrf       Admin/Oper
------------  -----------------  --------  ------------
Vlan200       10.10.200.1/24     Vrf-prod  up/up
Vlan500       10.10.10.1/24      Vrf-prod  up/up
root@SONIC-Leaf301:/home/admin# show neigh-suppress all
+----------+----------------+---------------------+
| VLAN     | STATUS         | ASSOCIATED_NETDEV   |
+==========+================+=====================+
| Vlan3800 | Not Configured | nve1-3800           |
+----------+----------------+---------------------+
| Vlan100  | Not Configured | nve1-100            |
+----------+----------------+---------------------+
| Vlan200  | Configured     | nve1-200            |
+----------+----------------+---------------------+
| Vlan500  | Configured     | nve1-500            |
+----------+----------------+---------------------+
Total count : 4
root@SONIC-Leaf301:/home/admin# show ip dhcp-relay brief
+------------------+-----------------------+
| Interface Name   | DHCP Helper Address   |
+==================+=======================+
| Vlan200          | 10.10.10.100          |
|                  | 10.10.10.101          |
+------------------+-----------------------+
SONIC-Leaf301# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
Number of L2 VNIs: 3
Number of L3 VNIs: 1
Flags: * - Kernel
  VNI        Type  RD              Import RT       Export RT       Tenant VRF
* 1000200    L2    10.0.0.11:200   65000:1000200   65000:1000200   Vrf-prod
* 1000500    L2    10.0.0.11:500   65000:1000500   65000:1000500   Vrf-prod
* 1000100    L2    10.0.0.11:100   65000:1000100   65000:1000100   default
* 1000000    L3    10.0.0.11:5096  65000:1000000   65000:1000000   Vrf-prod
We have done quite a lot of configuration by now, but of course, we cannot see anything working until we configure the ports facing our hosts.
Host port configuration
The switches I am working on have a limitation where every group of 12 ports must run at exactly the same speed. This is an issue of this specific switch, not a SONiC problem; nonetheless, we need to be aware of it.
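Purely as an illustration, changing a port’s speed in SONiC is a one-liner with the value expressed in Mbps; just remember that on this box all 12 ports of a group have to end up at the same speed:

config interface speed Ethernet9 10000    # hypothetical: drop a 25G port to 10G (repeat for the whole group)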
Now it’s time to configure our MCLAG port-channel:
config portchannel add PortChannel100 config portchannel member add PortChannel100 Ethernet9 config mclag member add 1 PortChannel100 config interface startup Ethernet9
To verify the config:
root@SONIC-Leaf301:/home/admin# show interfaces portchannel
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available, S - selected, D - deselected
  No.  Team Dev        Protocol     Ports
-----  --------------  -----------  ---------------------------
    1  PortChannel1    LACP(A)(Up)  Ethernet52(S) Ethernet48(S)
  100  PortChannel100  LACP(A)(Up)  Ethernet9(S)
  101  PortChannel101  LACP(A)(Up)  Ethernet1(S)
admin@SONIC-Leaf301:~$ sonic-cli
SONIC-Leaf301# show mclag brief

Domain ID            : 1
Role                 : active
Session Status       : up
Peer Link Status     : up
Source Address       : 10.0.0.11
Peer Address         : 10.0.0.12
Peer Link            : PortChannel1
Keepalive Interval   : 1 secs
Session Timeout      : 30 secs
System Mac           : 80:a2:35:81:dd:f0

Number of MLAG Interfaces:2
-----------------------------------------------------------
MLAG Interface       Local/Remote Status
-----------------------------------------------------------
PortChannel101       up/up
PortChannel100       up/up
And to finish, we only need to add the vlans to the trunks:
config vlan member add 100 PortChannel100
config vlan member add 200 PortChannel100
With what we have seen so far, I really believe that SONiC is mature enough to cover most common DC network requirements. Notice that, unlike other vendors’ solutions that believe they can do everything, including making you coffee or taking you to the moon, SONiC is more specialised: it does a few things and does them very well. As long as what you need to do is supported by SONiC, go ahead, it isn’t going to disappoint you.
An enterprise that is considering running SONiC should also understand the support model. SONiC itself comes without support; really, we are looking at a typical open-source situation where you can choose to operate the software completely free of charge on your own, or pay a reputable company to provide you with a patched and supported software revision (a bit like Red Hat or SUSE Linux).
From a hardware standpoint, I think that the white boxes are mature enough. For example, Edge-Core’s AS7326-56X is basically identical to Juniper’s QFX5120 (including port groups). We are really in the same world as your servers: you can get your hardware from any vendor, or you can find a trusted one like Dell. It’s up to you, really.
In short then, what are the takeaways for “standard” enterprises?
SONiC will work great if what you need to do fits the supported features
White box switches are comparable or identical to the big vendors’ hardware
You REALLY should be looking at someone to provide you with end-to-end support though, maybe someone like Broadcom or another service provider, so that you have a single point of contact for all of your possible problems
The knowledge gap can be scary at first, but it’s no longer a big obstacle. ACI, for example, was a nightmare and took me forever to learn and understand; SONiC, on the other hand, was a piece of cake.
Try and experiment: open networking is so cheap that it costs almost nothing to bring up a small lab or even a production POC.
As discussed in part 1, we are trying to configure a VXLAN-EVPN fabric using SONiC on white box switches, in order to determine whether Open Networking is ready to be deployed in most enterprise DCs.
As a small recap, below is the topology we are trying to bring online:
Familiarise with the OS
The most interesting thing about SONiC is its architecture! I’ll write a blog post just about it because it’s a fascinating topic, but in short, every single process lives inside a dedicated container.
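You can see this straight from the Linux shell. The command below is standard Docker; the container names in the comment are the typical SONiC ones, listed from memory rather than copied from this box:

admin@SONIC-Leaf301:~$ docker ps --format '{{.Names}}'
# expect containers such as: bgp (FRR), swss, syncd, teamd, lldp, snmp, pmon, dhcp_relay, database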
Below is the list of interfaces on my leaf. Notice how the naming of these interfaces can be confusing, especially for the ones that can be channelised (like 40/100 Gbps interfaces that support breakout). The primary channel is used as the interface number, as with interface Ethernet48; if that interface is broken out, the other channels are listed as Ethernet49, 50 and 51, making the next physical interface Ethernet52. Interface aliases are really interesting; unfortunately they currently act more like a description, and even after switching to “alias” as the default interface naming mode, the alias is used in very few places, making it pretty much useless as of now.
admin@SONIC-Leaf301:~$ show interfaces status
Interface Lanes Speed MTU Alias Vlan Oper Admin Type Asym PFC
----------- --------------- ------- ----- ---------------- ------ ------ ------- --------------- ----------
Ethernet0 3 25G 9100 twentyFiveGigE1 routed down down SFP/SFP+/SFP28 N/A
Ethernet1 2 25G 9100 twentyFiveGigE2 routed down down SFP/SFP+/SFP28 N/A
Ethernet2 4 25G 9100 twentyFiveGigE3 routed down down N/A N/A
Ethernet3 8 25G 9100 twentyFiveGigE4 routed down down N/A N/A
Ethernet4 7 25G 9100 twentyFiveGigE5 routed down down N/A N/A
Ethernet5 1 25G 9100 twentyFiveGigE6 routed down down N/A N/A
Ethernet6 5 25G 9100 twentyFiveGigE7 routed down down N/A N/A
Ethernet7 16 25G 9100 twentyFiveGigE8 routed down down N/A N/A
Ethernet8 6 25G 9100 twentyFiveGigE9 routed down down N/A N/A
Ethernet9 14 25G 9100 twentyFiveGigE10 routed down down SFP/SFP+/SFP28 N/A
Ethernet10 13 25G 9100 twentyFiveGigE11 routed down down N/A N/A
Ethernet11 15 25G 9100 twentyFiveGigE12 routed down down N/A N/A
Ethernet12 23 25G 9100 twentyFiveGigE13 routed down down N/A N/A
Ethernet13 22 25G 9100 twentyFiveGigE14 routed down down N/A N/A
Ethernet14 24 25G 9100 twentyFiveGigE15 routed down down N/A N/A
Ethernet15 32 25G 9100 twentyFiveGigE16 routed down down N/A N/A
Ethernet16 31 25G 9100 twentyFiveGigE17 routed down down N/A N/A
Ethernet17 21 25G 9100 twentyFiveGigE18 routed down down N/A N/A
Ethernet18 29 25G 9100 twentyFiveGigE19 routed down down N/A N/A
Ethernet19 36 25G 9100 twentyFiveGigE20 routed down down N/A N/A
Ethernet20 30 25G 9100 twentyFiveGigE21 routed down down N/A N/A
Ethernet21 34 25G 9100 twentyFiveGigE22 routed down down N/A N/A
Ethernet22 33 25G 9100 twentyFiveGigE23 routed down down N/A N/A
Ethernet23 35 25G 9100 twentyFiveGigE24 routed down down N/A N/A
Ethernet24 43 25G 9100 twentyFiveGigE25 routed down down N/A N/A
Ethernet25 42 25G 9100 twentyFiveGigE26 routed down down N/A N/A
Ethernet26 44 25G 9100 twentyFiveGigE27 routed down down N/A N/A
Ethernet27 52 25G 9100 twentyFiveGigE28 routed down down N/A N/A
Ethernet28 51 25G 9100 twentyFiveGigE29 routed down down N/A N/A
Ethernet29 41 25G 9100 twentyFiveGigE30 routed down down N/A N/A
Ethernet30 49 25G 9100 twentyFiveGigE31 routed down down N/A N/A
Ethernet31 60 25G 9100 twentyFiveGigE32 routed down down N/A N/A
Ethernet32 50 25G 9100 twentyFiveGigE33 routed down down N/A N/A
Ethernet33 58 25G 9100 twentyFiveGigE34 routed down down N/A N/A
Ethernet34 57 25G 9100 twentyFiveGigE35 routed down down N/A N/A
Ethernet35 59 25G 9100 twentyFiveGigE36 routed down down N/A N/A
Ethernet36 62 25G 9100 twentyFiveGigE37 routed down down N/A N/A
Ethernet37 63 25G 9100 twentyFiveGigE38 routed down down N/A N/A
Ethernet38 64 25G 9100 twentyFiveGigE39 routed down down N/A N/A
Ethernet39 65 25G 9100 twentyFiveGigE40 routed down down N/A N/A
Ethernet40 66 25G 9100 twentyFiveGigE41 routed down down N/A N/A
Ethernet41 61 25G 9100 twentyFiveGigE42 routed down down N/A N/A
Ethernet42 68 25G 9100 twentyFiveGigE43 routed down down N/A N/A
Ethernet43 69 25G 9100 twentyFiveGigE44 routed down down N/A N/A
Ethernet44 67 25G 9100 twentyFiveGigE45 routed down down N/A N/A
Ethernet45 71 25G 9100 twentyFiveGigE46 routed down down N/A N/A
Ethernet46 72 25G 9100 twentyFiveGigE47 routed down down N/A N/A
Ethernet47 70 25G 9100 twentyFiveGigE48 routed down down N/A N/A
Ethernet48 77,78,79,80 100G 9100 hundredGigE49 routed down down QSFP28 or later N/A
Ethernet52 85,86,87,88 100G 9100 hundredGigE50 routed down down QSFP28 or later N/A
Ethernet56 93,94,95,96 100G 9100 hundredGigE51 routed down down N/A N/A
Ethernet60 97,98,99,100 100G 9100 hundredGigE52 routed down down N/A N/A
Ethernet64 105,106,107,108 100G 9100 hundredGigE53 routed down down N/A N/A
Ethernet68 113,114,115,116 100G 9100 hundredGigE54 routed down down N/A N/A
Ethernet72 121,122,123,124 100G 9100 hundredGigE55 routed down down QSFP28 or later N/A
Ethernet76 125,126,127,128 100G 9100 hundredGigE56 routed down down QSFP28 or later N/A
Ethernet80 129 10G 9100 mgmtTenGigE57 routed down down N/A N/A
Ethernet81 128 10G 9100 mgmtTenGigE58 routed down down N/A N/A
Configuring the Underlay Routing
I’m a big fan of automation and configuration simplicity. I strongly believe that if I can automate with “notepad” using blind copy/paste, I have good templates for fancier automation. For this reason I really think that unnumbered interfaces are a great way to configure spine/leaf links.
The first step, then, is to configure all fabric interfaces with a proper MTU and IP unnumbered, as in the example below. Please note that this post isn’t meant to be a full configuration tutorial.
config loopback add Loopback0
config interface ip add Loopback0 10.0.0.1/32
config interface ip unnumbered add Ethernet120 Loopback0
config interface mtu Ethernet120 9216
config interface startup Ethernet120
... Repeat for all interfaces facing a leaf ...
config save -y
A leaf switch would be configured exactly the same way, but I also need to add a second loopback interface to be used as the VTEP source interface. As this loopback will act as the MC-LAG anycast VTEP IP, both leafs in the MC-LAG pair will have the exact same IP on their Loopback1.
config loopback add Loopback0
config interface ip add Loopback0 10.0.0.11/32
config loopback add Loopback1
config interface ip add Loopback1 11.11.11.111/32
config interface ip unnumbered add Ethernet72 Loopback0
config interface mtu Ethernet72 9216
config interface description Ethernet72 "LINK_TO_SPINE_1"
config interface startup Ethernet72
config interface ip unnumbered add Ethernet76 Loopback0
config interface mtu Ethernet76 9216
config interface description Ethernet76 "LINK_TO_SPINE_2"
config interface startup Ethernet76
config save -y
At this point we need to configure OSPF between leafs and spines. Unfortunately, advanced routing configs can only be applied inside the FRR container, so we first need to switch to the FRR shell with the command “vtysh”. From there on, there is really almost no difference from the well-known Cisco-like CLI.
The biggest downside of this lack of integration is that the FRR config needs to be saved separately from the rest of SONiC’s config, and we also need to tell SONiC to look for the routing config in a different place. To do that, we apply the “config routing_config_mode split” command and, most importantly, reboot the box as the warning message tells you. Failure to do so will cause the switch to lose the FRR config in case of a reload.
vtysh
conf t
!
bfd
!
router ospf
ospf router-id 10.0.0.11
log-adjacency-changes
auto-cost reference-bandwidth 100000
!
interface Ethernet72
ip ospf area 0.0.0.1
ip ospf bfd
ip ospf network point-to-point
!
interface Ethernet76
ip ospf area 0.0.0.1
ip ospf bfd
ip ospf network point-to-point
!
interface Loopback0
ip ospf area 0.0.0.1
!
interface Loopback1
ip ospf area 0.0.0.1
end
write memory
exit
config routing_config_mode split
config save -y
Once everything is configured, from FRR we can check our routing:
SONIC-Leaf301# show ip ospf neighbor
Neighbor ID Pri State Dead Time Address Interface RXmtL RqstL DBsmL
10.0.0.1 1 Full/DROther 33.775s 10.0.0.1 Ethernet72:10.0.0.11 0 0 0
10.0.0.2 1 Full/DROther 33.968s 10.0.0.2 Ethernet76:10.0.0.11 0 0 0
SONIC-Leaf301# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route, # - not installed in hardware
O>* 10.0.0.1/32 [110/11] via 10.0.0.1, Ethernet72 onlink, 00:03:08
O>* 10.0.0.2/32 [110/11] via 10.0.0.2, Ethernet76 onlink, 00:03:18
C * 10.0.0.11/32 is directly connected, Ethernet76, 00:05:08
C * 10.0.0.11/32 is directly connected, Ethernet72, 00:05:08
O 10.0.0.11/32 [110/10] via 0.0.0.0, Loopback0 onlink, 00:05:14
C>* 10.0.0.11/32 is directly connected, Loopback0, 00:05:15
O>* 10.0.0.12/32 [110/12] via 10.0.0.1, Ethernet72 onlink, 00:03:08
* via 10.0.0.2, Ethernet76 onlink, 00:03:08
O>* 10.0.0.13/32 [110/12] via 10.0.0.1, Ethernet72 onlink, 00:03:08
* via 10.0.0.2, Ethernet76 onlink, 00:03:08
O>* 10.0.0.14/32 [110/12] via 10.0.0.1, Ethernet72 onlink, 00:03:08
* via 10.0.0.2, Ethernet76 onlink, 00:03:08
O>* 10.10.10.2/31 [110/12] via 10.0.0.1, Ethernet72 onlink, 00:03:08
* via 10.0.0.2, Ethernet76 onlink, 00:03:08
O 11.11.11.111/32 [110/10] via 0.0.0.0, Loopback1 onlink, 00:05:14
C>* 11.11.11.111/32 is directly connected, Loopback1, 00:05:15
O>* 11.11.11.113/32 [110/12] via 10.0.0.1, Ethernet72 onlink, 00:03:08
* via 10.0.0.2, Ethernet76 onlink, 00:03:08
SONIC-Leaf301# ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.243 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.186 ms
^C
--- 10.0.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1011ms
rtt min/avg/max/mdev = 0.186/0.214/0.243/0.032 ms
SONIC-Leaf301# ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.215 ms
^C
--- 10.0.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.215/0.215/0.215/0.000 ms
admin@SONIC-Leaf301:~$ traceroute 10.0.0.13
traceroute to 10.0.0.13 (10.0.0.13), 30 hops max, 60 byte packets
1 10.0.0.1 (10.0.0.1) 0.218 ms 10.0.0.2 (10.0.0.2) 0.178 ms 10.0.0.1 (10.0.0.1) 0.126 ms
2 10.0.0.13 (10.0.0.13) 0.439 ms 0.461 ms 0.468 ms
Now that every loopback is reachable (and we can also see ECMP across the two spines), it’s time to configure MC-LAG between our leafs, as well as underlay routing across the peer-link. This step can only be done now because the MC-LAG peer has to be reachable via the fabric.
config portchannel add PortChannel1
config interface mtu PortChannel1 9216
config interface mtu Ethernet48 9216
config interface description Ethernet48 "Peer-link"
config interface startup Ethernet48
config interface mtu Ethernet52 9216
config interface description Ethernet52 "Peer-link"
config interface startup Ethernet52
config portchannel member add PortChannel1 Ethernet48
config portchannel member add PortChannel1 Ethernet52
config mclag add 1 10.0.0.11 10.0.0.12 PortChannel1
config vlan add 3965
config vlan member add 3965 PortChannel1
config mclag unique-ip add Vlan3965
config interface ip add Vlan3965 10.10.10.0/31
vtysh
conf t
!
interface Vlan3965
ip ospf area 0.0.0.1
end
write memory
exit
config save -y
Once done, we should see our additional OSPF peer and a working MC-LAG cluster:
admin@SONIC-Leaf301:~$ vtysh
Hello, this is FRRouting (version 7.2-sonic).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
SONIC-Leaf301# show ip ospf neighbor
Neighbor ID Pri State Dead Time Address Interface RXmtL RqstL DBsmL
10.0.0.1 1 Full/DROther 30.645s 10.0.0.1 Ethernet72:10.0.0.11 0 0 0
10.0.0.2 1 Full/DROther 30.899s 10.0.0.2 Ethernet76:10.0.0.11 0 0 0
10.0.0.12 1 Full/DR 36.717s 10.10.10.1 Vlan3965:10.10.10.0 0 0 0
SONIC-Leaf301# exit
admin@SONIC-Leaf301:~$ sonic-cli
SONIC-Leaf301# show mclag brief
Domain ID : 1
Role : active
Session Status : up
Peer Link Status : up
Source Address : 10.0.0.11
Peer Address : 10.0.0.12
Peer Link : PortChannel1
Keepalive Interval : 1 secs
Session Timeout : 30 secs
System Mac : 80:a2:35:81:dd:f0
Number of MLAG Interfaces:0
Everything works as expected, but we also faced yet another SONiC problem: we needed to configure interfaces and their IP addresses, OSPF and MC-LAG, and to do this we needed access to three different shells (Linux CLI, vtysh and sonic-cli), either to apply configuration or to run show commands and verify our work.
Configuring BGP-EVPN control plane
Now it’s time to configure BGP. As per our architecture, I’ll be configuring iBGP with Route Reflectors sitting on the spines. To do so I’ll need the FRR shell. The spine config will look something similar to this:
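As a minimal FRR sketch of the idea, assuming fabric ASN 65000 and dynamic neighbors (consistent with the “dynamic neighbor” counters in the summary below); the peer-group name and listen range are placeholders of mine:

router bgp 65000
 bgp router-id 10.0.0.1
 neighbor FABRIC peer-group
 neighbor FABRIC remote-as 65000
 neighbor FABRIC update-source Loopback0
 bgp listen limit 100
 bgp listen range 10.0.0.0/24 peer-group FABRIC
 !
 address-family l2vpn evpn
  neighbor FABRIC activate
  neighbor FABRIC route-reflector-client
 exit-address-family

The leafs simply peer with both spine loopbacks (update-source Loopback0), activate the l2vpn evpn address-family and enable advertise-all-vni, which is what the “Advertise All VNI flag: Enabled” line in the leaf’s show bgp l2vpn evpn vni output reflects.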
Once done, I should be able to see all peerings formed on my spines:
SONIC-Spine31# show bgp l2vpn evpn summary
BGP router identifier 10.0.0.1, local AS number 65000 vrf-id 0
BGP table version 0
RIB entries 16, using 3072 bytes of memory
Peers 4, using 82 KiB of memory
Peer groups 1, using 64 bytes of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
*10.0.0.11 4 65000 5 29 0 0 0 00:01:33 0
*10.0.0.12 4 65000 6 30 0 0 0 00:01:34 0
*10.0.0.13 4 65000 7 31 0 0 0 00:01:40 0
*10.0.0.14 4 65000 7 31 0 0 0 00:01:44 0
Total number of neighbors 4
* - dynamic neighbor
4 dynamic neighbor(s), limit 100
Total number of neighbors established
At this point, the only missing piece is to configure the VTEP on the leaf switches, as well as the anycast gateway’s MAC address; fortunately this is very simple and straightforward:
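As a sketch, using the names visible in the outputs below (nve1, nvo1, the anycast VTEP IP 11.11.11.111); the vxlan commands follow the community sonic-utilities syntax, while the anycast-gateway MAC command is an assumption, since it differs between builds:

config vxlan add nve1 11.11.11.111                    # VTEP sourced from Loopback1's anycast IP (some builds take the interface instead)
config vxlan evpn_nvo add nvo1 nve1                   # bind the EVPN NVO to the VTEP
config ip anycast-mac-address add aa:aa:bb:bb:cc:cc   # DAG MAC (command name assumed)
config save -y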
root@SONIC-Leaf301:/home/admin# show vxlan interface
VTEP Information:
VTEP Name : nve1, SIP : 11.11.11.111
NVO Name : nvo1, VTEP : nve1
Source interface : Loopback1
root@SONIC-Leaf301:/home/admin# show ip static-anycast-gateway
Configured Anycast Gateway MAC address: aa:aa:bb:bb:cc:cc
IPv4 Anycast Gateway MAC address: enable
In short…
We configured a fully functional fabric providing underlay connectivity and an EVPN control plane as follows:
A unique loopback on every switch
Each physical interface between spine and leaf as an ip unnumbered interface
OSPF area 1 within the fabric
MC-LAG and underlay peering across the peer-link
iBGP EVPN between leafs and spines, with RRs on the spines themselves
Each MC-LAG pair as a unique Virtual VTEP.
We also noticed that, while the configuration isn’t complicated by any means, the need to move between multiple shells just to apply or verify configs can be very confusing for the end user. To be fair though, the SONiC community is working on improving this and delivering a single unified shell.
The FRR config always feels familiar as it resembles Cisco’s IOS CLI; on the other hand, the basic SONiC CLI can be a bit frustrating at times, especially because it’s case sensitive, which makes typos easy to make.
In the next blog post we will look at how to actually configure VXLANs and server-facing interfaces… stay tuned!
In recent years two buzzwords began to arise: open networking and white box switches. The two often go hand in hand, and they are promoted by big names like Facebook and Microsoft. On the software side, SONiC is maybe the biggest player out there as it powers Microsoft Azure’s cloud, while on the hardware side, Accton has arguably been one of the most important vendors.
The truth though, at least in my opinion, is that while this innovation is great, it is not ready to be embraced by everyone yet. Only companies willing to make this “leap of faith” can take advantage of all of this, but what about us poor mortals? Are SONiC and white boxes ready to be widely deployed? Well, let’s give it a look!
We will be deploying a simple VXLAN-EVPN fabric like the one in the picture below, and we will be checking how difficult it is to configure and troubleshoot the fabric, but also, and most importantly, whether this common enterprise design actually works.
The Hardware
For our spines we’ll be using Edge-Core’s AS7816-64X, powered by Broadcom’s Tomahawk II chipset. This switch is a 2RU lean spine providing 64x 40/100 Gbps QSFP28 ports.
For the leafs, we’ll be using Edge-Core’s AS7326-56X, powered by Broadcom’s Trident III chipset. This switch is a 1RU TOR providing 48x 1/10/25 Gbps SFP28 and 8x 40/100 Gbps QSFP28 ports.
The Software
As for the software, we will be focusing on SONiC version 3.0.1. This version introduces support for VXLAN-EVPN among many other things which, in my opinion, make it ready for more widespread adoption.
The Architecture
Looking at SONiC’s features, we will try to implement the architecture below. Some choices though, like the use of a virtual VTEP instead of EVPN multi-homing, or ingress replication for BUM traffic, are dictated purely by what SONiC supports.
I won’t explain why I’ve chosen OSPF+iBGP; that is a discussion for another time. Suffice it to say that there is no reason to reinvent the wheel, as this design has worked perfectly for decades in the much more complex MPLS service provider space.
In short…
In this first post, I wanted to appeal to your curiosity and set expectations right. Accton switches powered by Broadcom chipsets will be our white box switches, while SONiC is our open-source operating system. In the next post we will implement the above design, look at the SONiC CLI and try to make it work.
Spoiler alert… it works, but… well… the details are a lot more interesting…