Scale Computing has developed a fun-sized fabric for edge environments (think fast-food chain or retail store) where you need resilient connectivity on a budget.
Scale uses BGP EVPN and VXLAN to build a fabric just like you would in a great big data center, but for a little cluster of compute nodes and a single switch.
Curious? Read on for the details.
Living On The Edge
When the public cloud is too far away to be a practical platform for running applications, edge computing steps in, putting computing power precisely where you need it. Scale Computing’s HC3 line includes small server clusters designed to work at the edge.
If you hear the word “server” and immediately think of something you have to rack mount, think again. You can run a Scale Computing cluster on boxes as small as Intel NUCs.
Edge computing sites aren’t often blessed with lots of space to house gear or the budget to buy it. In that context, Scale Computing recently discussed its new HC3 Edge Fabric product, which is about providing network resiliency while conserving ports on network switches.
A Little History
In the past, Scale’s architecture usually involved four cables per host.
- An active link to the outside world.
- A standby link to the outside world.
- An active link for backplane traffic, such as Scale’s SCRIBE (Scale Computing Reliable Independent Block Engine) distributed storage layer. This is inter-node traffic with specific performance requirements that shouldn’t be mixed with outside-world communications.
- A standby link for backplane traffic.
Historically, Scale also recommended two dedicated switches for the cluster: one for the outside world and one for the backplane.
Therefore, in a three-node cluster, you’d end up using six ports on each of two switches to meet Scale’s simplest network architecture requirements.
Complications At The Edge
This traditional network design can be cost-prohibitive in edge environments. For example, Scale Computing cited fast-food restaurants, gas stations, supermarkets, and industrial and manufacturing sites as the kinds of businesses that will use edge computing.
These sites likely have a limited infrastructure budget, but the business demands highly available applications. The idea, then, is to roll out the app on a three-node Scale Computing HC3 cluster built from small servers, using HC3 Edge Fabric to minimize the number of switch ports required.
Introducing HC3 Edge Fabric
With HC3 Edge Fabric, Scale Computing has created a networking architecture that reduces the cluster’s hardware requirements. There’s one less switch to worry about. There are fewer Ethernet NIC ports required on the hosts. At the same time, Scale Computing isn’t wimping out on resiliency.
The big idea is to interconnect the cluster nodes directly to each other for backplane traffic and backup links. That cuts down the need for switch ports to a single link per host, instead of the four links per host previously required.
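The port savings are easy to tally. Here’s a quick back-of-the-envelope check in Python using the per-host link counts described earlier for a three-node cluster; the helper function is just for illustration.

```python
# Back-of-the-envelope switch port count for a three-node cluster.
# Link counts come from the article; the helper is illustrative only.

def switch_ports(nodes: int, switch_links_per_host: int) -> int:
    """Total switch ports consumed when each host cables N links to switches."""
    return nodes * switch_links_per_host

NODES = 3

# Traditional design: active + standby outside-world links and
# active + standby backplane links, split across two switches.
traditional = switch_ports(NODES, 4)

# HC3 Edge Fabric: one uplink per host; backplane and backup traffic
# uses the direct node-to-node links, which consume no switch ports.
edge_fabric = switch_ports(NODES, 1)

print(f"traditional: {traditional} ports, {traditional // 2} on each of 2 switches")
print(f"edge fabric: {edge_fabric} ports on 1 switch")
# traditional: 12 ports, 6 on each of 2 switches
# edge fabric: 3 ports on 1 switch
```

The node-to-node cables still need NIC ports on the hosts, of course, but none of them land on a switch.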
The topology above, if you focus on the black lines interconnecting the three HC3 cluster nodes, is a ring. For networkers used to hub-and-spoke designs, this should feel liberating. There is no explicit topology required for HC3 Edge Fabric. Connect the nodes directly to each other however you like, although design principles and common sense still apply.
Scale Computing points out that 10GBASE-T ports are increasingly available on many of these small boxes that might be used in edge deployments. Therefore, imagine using 1Gbps ports to uplink to the switch for application consumers, while at the same time having a 10Gbps path for backplane and failover traffic to traverse.
Another way to look at it is a 1Gbps path for north-south traffic, and a 10Gbps path for east-west traffic. However you want to visualize it, the 10Gbps path doesn’t require an Ethernet switch. That’s a big deal.
Interestingly, HC3 Edge Fabric can even use Thunderbolt ports between the nodes, running IP over Thunderbolt to create a 15Gbps path. That frees up the Ethernet ports completely.
How Does HC3 Edge Fabric Work?
In a presentation, Scale Computing CTO Phil White stated, “HC3 Edge Fabric really is a distributed bridge. It’s allowing VMs to communicate to each other along that fabric without having to leave the cluster, leave the uplinks and enter the switching infrastructure.”
And if you just went, “Um…what? Those are servers probably running Linux. That’s not a distributed bridge. What am I missing?” What you’re missing is that HC3 Edge Fabric is actually running BGP EVPN.
The arbitrary topology is allowed because it is a routed layer 3 design, so there are no layer 2 loops to be concerned with. Nodes discover each other across links using BGP. Links come up as BGP unnumbered sessions over link-local addresses, so no per-link IP subnets are needed. Once the peer node is discovered, a /32 routing table entry is added. These inter-node links form an underlay fabric.
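Since the Edge Fabric code is based on FRRouting, it helps to see what a BGP unnumbered underlay with EVPN looks like in FRR’s configuration syntax. The Python sketch below renders a plausible `bgpd` stanza for one node. The ASN, router ID, and interface names are hypothetical, and this is not Scale Computing’s actual configuration, which wasn’t shown.

```python
# Render a hypothetical FRRouting bgpd stanza for one cluster node.
# ASN, router ID, and interface names are made up for illustration;
# Scale Computing's real configuration was not shown.

def render_frr_bgp(asn: int, loopback: str, fabric_ifaces: list[str]) -> str:
    lines = [
        f"router bgp {asn}",
        f" bgp router-id {loopback}",
    ]
    # BGP unnumbered: peer on the interface itself, no per-link subnet.
    # "remote-as external" accepts whatever eBGP ASN the neighbor uses.
    for iface in fabric_ifaces:
        lines.append(f" neighbor {iface} interface remote-as external")
    lines += [
        " address-family ipv4 unicast",
        # Advertise this node's /32 loopback (its VTEP address) to peers.
        f"  network {loopback}/32",
        " exit-address-family",
        " address-family l2vpn evpn",
    ]
    # Exchange EVPN routes with every fabric neighbor.
    for iface in fabric_ifaces:
        lines.append(f"  neighbor {iface} activate")
    lines += [
        # Advertise every locally configured VNI into EVPN.
        "  advertise-all-vni",
        " exit-address-family",
    ]
    return "\n".join(lines)

print(render_frr_bgp(65001, "10.0.0.1", ["eth1", "eth2"]))
```

Because the peering is keyed on interface names rather than IP addresses, you can cable the nodes however you like and the same template still applies.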
VXLAN encapsulates traffic between the VMs, with EVPN as the control plane: EVPN advertises MAC addresses as BGP routes, so the fabric effectively routes on MAC addresses. Policy per VNI defines what happens to an Ethernet frame that enters or exits the fabric from the outside world or a VM. That policy is most likely going to be a mapping between a VXLAN VNI and an Ethernet VLAN tag.
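The VNI that this per-VNI policy keys on travels in an 8-byte VXLAN header defined in RFC 7348. Here’s a minimal Python sketch of building and parsing that header. The VNI value 10100 is arbitrary, and the outer UDP, IP, and Ethernet headers that a real VTEP adds are deliberately left out.

```python
import struct

# RFC 7348 VXLAN header: flags (8 bits) | reserved (24) | VNI (24) | reserved (8).
# On the wire it rides inside UDP, destination port 4789.
VXLAN_FLAG_I = 0x08  # "I" flag: the VNI field is valid

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)

def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend a VXLAN header to an inner Ethernet frame.

    A real VTEP also adds outer UDP, IP (source/destination VTEP
    addresses), and Ethernet headers; those are omitted here.
    """
    return vxlan_header(vni) + inner_frame

def decapsulate(payload: bytes) -> tuple[int, bytes]:
    """Split a VXLAN payload back into (VNI, inner Ethernet frame)."""
    flags_word, vni_word = struct.unpack("!II", payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_I:
        raise ValueError("VXLAN I flag not set")
    return vni_word >> 8, payload[8:]

print(vxlan_header(10100).hex())  # 0800000000277400
```

The 24-bit VNI field is why VXLAN can carry roughly 16 million segments, versus 4,094 usable VLAN IDs.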
Is That…Complicated?
In short, HC3 Edge Fabric is a tiny BGP underlay with a VXLAN overlay and an EVPN control plane running between the cluster nodes. It’s a distributed bridge, just like you would have in your big fancy data center using leaf-spine, only fun-sized. How cool is that?
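To make “distributed bridge” concrete, here’s a toy model of one node’s forwarding decision in an EVPN-backed bridge. It’s a simplified sketch, not HC3 internals: local MACs are delivered on the host, MACs learned from remote EVPN MAC/IP (type-2) routes are tunneled to one VTEP, and unknown destinations are replicated to every VTEP in the VNI (as learned from type-3 routes). Port names, MACs, and addresses are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DistributedBridgeNode:
    """One node's view of an EVPN-backed distributed bridge (illustrative only)."""
    vtep_ip: str
    # (vni, mac) -> local port name, for VMs attached to this host
    local_macs: dict[tuple[int, str], str] = field(default_factory=dict)
    # (vni, mac) -> remote VTEP IP, learned from EVPN MAC/IP (type-2) routes
    remote_macs: dict[tuple[int, str], str] = field(default_factory=dict)
    # vni -> remote VTEPs in that VNI, learned from EVPN type-3 routes
    flood_lists: dict[int, set[str]] = field(default_factory=dict)

    def forward(self, vni: int, dst_mac: str) -> list[str]:
        """Return where a frame for dst_mac in this VNI should go."""
        key = (vni, dst_mac)
        if key in self.local_macs:
            return [f"local:{self.local_macs[key]}"]   # never leaves the host
        if key in self.remote_macs:
            return [f"vxlan:{self.remote_macs[key]}"]  # unicast to one VTEP
        # Unknown unicast: replicate to every VTEP participating in the VNI.
        return [f"vxlan:{ip}" for ip in sorted(self.flood_lists.get(vni, set()))]

node = DistributedBridgeNode(vtep_ip="10.0.0.1")
node.local_macs[(10100, "52:54:00:aa:00:01")] = "vm1-tap"
node.remote_macs[(10100, "52:54:00:bb:00:02")] = "10.0.0.2"
node.flood_lists[10100] = {"10.0.0.2", "10.0.0.3"}

print(node.forward(10100, "52:54:00:bb:00:02"))  # ['vxlan:10.0.0.2']
print(node.forward(10100, "52:54:00:cc:00:03"))  # ['vxlan:10.0.0.2', 'vxlan:10.0.0.3']
```

Because every node holds the same EVPN-learned view, VM-to-VM traffic can cross the node-to-node links without ever touching the uplink switch, which matches the behavior Phil White described.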
You might argue that it feels too complicated for a little cluster you’re going to stick in a closet at a fast-food place, but the benefit of an arbitrary network topology that’s fully resilient (assuming the cabling engineer uses common sense) while still eliminating a switch is a powerful value proposition.
Plus, VXLAN and EVPN are well-known technologies at this point. The industry understands them and lots of code bases support them. This isn’t bleeding edge stuff. We’ve been talking about EVPN on Packet Pushers at least as far back as July 2014.
Observations From The Demo
Phil White demonstrated HC3 Edge Fabric during his presentation. He used commands provided by their new daemon, scfabricd, to create VNIs and describe what happens to frames ingressing and egressing that VNI. This served largely to map VLAN tags to VNIs.
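The demo didn’t dwell on scfabricd’s command syntax, so the sketch below doesn’t try to reproduce it. It’s a hypothetical Python model of the VLAN-to-VNI mapping being configured: validate the pairs, then look up the VNI for frames entering the fabric and the VLAN tag for frames leaving it. The one-to-one rule and “drop if unmapped” behavior are assumptions of this sketch.

```python
from typing import Optional

def build_vlan_vni_map(pairs: list[tuple[int, int]]) -> dict[int, int]:
    """Validate (VLAN, VNI) pairs and return a VLAN -> VNI table.

    Hypothetical model of the mapping configured in the demo; this is
    not scfabricd's actual command syntax or data format.
    """
    vlan_to_vni: dict[int, int] = {}
    for vlan, vni in pairs:
        if not 1 <= vlan <= 4094:      # 12-bit 802.1Q VLAN ID; 0 and 4095 are reserved
            raise ValueError(f"invalid VLAN ID {vlan}")
        if not 1 <= vni <= 0xFFFFFF:   # 24-bit VNI; this sketch also treats 0 as unused
            raise ValueError(f"invalid VNI {vni}")
        # This sketch enforces a one-to-one mapping.
        if vlan in vlan_to_vni or vni in vlan_to_vni.values():
            raise ValueError(f"VLAN {vlan} or VNI {vni} is already mapped")
        vlan_to_vni[vlan] = vni
    return vlan_to_vni

def ingress_vni(table: dict[int, int], vlan: int) -> Optional[int]:
    """VNI for a tagged frame entering the fabric; None means drop in this sketch."""
    return table.get(vlan)

def egress_vlan(table: dict[int, int], vni: int) -> Optional[int]:
    """VLAN tag to apply when a frame leaves the fabric; None means drop."""
    return next((vlan for vlan, v in table.items() if v == vni), None)

table = build_vlan_vni_map([(100, 10100), (200, 10200)])
print(ingress_vni(table, 100), egress_vlan(table, 10200), ingress_vni(table, 300))
# 10100 200 None
```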
Then Phil stood up a new VM and made it a member of the newly created VNI. And then Phil did bad things to the wiring (very bad things), mostly in the form of plugging and unplugging cables, showing with a live ping whether the VM was reachable from the outside world. Everything in that demo worked as expected. If at least one cable was plugged in, all was well. If none were plugged in, the VM was no longer pingable, as you’d expect.
If unplugging all the cables seems a pedantic point to make, I suppose it is. However, I also think it emphasizes the need to carefully think through the cabling scheme for your specific hardware deployment.
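One way to think through a cabling scheme is to look for single points of failure: links whose loss alone would split the cluster. Here’s a minimal Python sketch that brute-forces this by removing each link in turn and checking connectivity with a breadth-first search. It assumes the plan is described as a set of node-to-node links, and the node names are hypothetical. It only models the inter-node cabling, not uplinks, NICs, or power.

```python
from collections import deque

def connected(nodes: set[str], links: set[frozenset[str]]) -> bool:
    """Breadth-first search: can every node reach every other node?"""
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        here = queue.popleft()
        for link in links:
            if here in link:
                (other,) = link - {here}
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return seen == nodes

def single_points_of_failure(nodes: set[str], links: set[frozenset[str]]) -> list[tuple[str, ...]]:
    """Return links whose loss alone would partition the cluster."""
    return sorted(
        tuple(sorted(link)) for link in links
        if not connected(nodes, links - {link})
    )

nodes = {"n1", "n2", "n3"}
ring = {frozenset(p) for p in [("n1", "n2"), ("n2", "n3"), ("n3", "n1")]}
chain = {frozenset(p) for p in [("n1", "n2"), ("n2", "n3")]}

print(single_points_of_failure(nodes, ring))   # []
print(single_points_of_failure(nodes, chain))  # [('n1', 'n2'), ('n2', 'n3')]
```

The ring shown earlier survives any single cable pull; a daisy chain that looks almost identical on paper does not.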
Also noteworthy was the speed at which topology changes would converge. This seemed undetectably fast to my eye. When asked about this, Phil said that they are monitoring the state of a particular netsocket to determine that a link is no longer available.
He did not say, “BFD.” Sounds like Scale Computing is doing their own thing for fast link failure detection, but I’m okay with that, assuming they’ve found something more efficient and/or necessary. Phil wasn’t able to get into why or why not BFD in his presentation due to time constraints.
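For a sense of what BFD would offer, its failure detection time is simple to estimate. In asynchronous mode, RFC 5880 has a system declare the session down when nothing arrives for the remote detect multiplier times the agreed interval, that interval being the slower of the local receive rate and the peer’s desired transmit rate. The timer values below are examples, not anything Scale Computing quoted.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int,
                          remote_detect_mult: int) -> int:
    """Asynchronous-mode BFD detection time, per RFC 5880.

    The session is declared down if no control packet arrives within
    the remote detect multiplier times the agreed interval, which is the
    slower of what we're willing to receive and what the peer wants to send.
    """
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms

print(bfd_detection_time_ms(300, 300, 3))  # 900
print(bfd_detection_time_ms(50, 50, 3))    # 150
```

Keep in mind that a physically unplugged cable usually shows up as carrier loss, which the host notices without BFD at all. BFD earns its keep on failures where the link stays up but stops forwarding, which the cable-pulling demo wouldn’t exercise.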
Still…I wonder. Most of the HC3 Edge Fabric code is based on FRRouting plus Linux bridging. Phil gave props to Cumulus Networks for helping FRRouting grow. And BFD is already there in the FRRouting stack.
But I digress.
What’s Next For HC3 Edge Fabric?
Next steps for Scale Computing’s HC3 Edge Fabric include an MLAG implementation based on LACP (IEEE 802.3ad). Phil pointed out that the groundwork for this is already in place. The hard part is introducing LACP code with multi-node support.
I guess I’d use that feature for the LAN-facing uplinks if it were there, although I’d think hard about it first. MLAG is often a pain in the butt, failing in odd and sometimes exciting ways, and not the good kind of exciting. I’d be more inclined to use LACP within a single cluster node, with its links multi-homed to MLAG running on the switch side.
Another road map item is true microsegmentation. This is another element where all the pieces are there, but there’s a good bit of work for Scale Computing to do to pull this off. They would need a policy engine and then the ability to write ACLs in the hypervisor vSwitch and/or in kernel forwarding tables.
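As a rough illustration of what a microsegmentation policy engine would ultimately evaluate, here’s a minimal first-match ACL with a default deny, written in Python. The rule format, addresses, and “POS VMs talking to an app VM” scenario are all hypothetical; Scale Computing hasn’t described how its policy engine would express rules.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

@dataclass(frozen=True)
class Rule:
    src: str                  # source CIDR, e.g. "10.1.0.0/24"
    dst: str                  # destination CIDR
    dst_port: Optional[int]   # None matches any destination port
    action: str               # "permit" or "deny"

def evaluate(rules: list[Rule], src: str, dst: str, dst_port: int) -> str:
    """First matching rule wins; anything unmatched is denied (default deny)."""
    for rule in rules:
        if (ip_address(src) in ip_network(rule.src)
                and ip_address(dst) in ip_network(rule.dst)
                and rule.dst_port in (None, dst_port)):
            return rule.action
    return "deny"

# Hypothetical policy: point-of-sale VMs may reach one app VM over HTTPS only.
policy = [
    Rule("10.1.0.0/24", "10.2.0.10/32", 443, "permit"),
    Rule("10.1.0.0/24", "10.2.0.0/24", None, "deny"),
]

print(evaluate(policy, "10.1.0.5", "10.2.0.10", 443))  # permit
print(evaluate(policy, "10.1.0.5", "10.2.0.10", 22))   # deny
```

In a real deployment, the output of rules like these would be compiled into vSwitch or kernel forwarding entries rather than evaluated per packet in Python.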
Although I didn’t discuss the daemon architecture in this piece, scfabricd sits in the forwarding table flow in a way that could let it program a microsegmentation policy engine’s decisions into the forwarding tables.
On the other hand, Scale Computing could look at a partnership with someone like Illumio that could, at least from my position as armchair quarterback, integrate plausibly well with their platform.
A final roadmap item Scale Computing mentioned is external EVPN integration. That would allow for seamless connectivity with data centers that already have EVPN deployments.
For More Information
Tech Field Day 20 YouTube Playlist. Scale Computing’s presentations are videos 12 through 14 in the playlist.