128 Technology (128T) makes routers. But…they don’t make routers in the way you’d normally think of. Instead, a 128T network makes security and traffic engineering features first class citizens along with routing protocols and packet forwarding.
Before you dive into this piece, you should check out Briefings In Brief Episode 81 for an 8 minute audio overview of 128T with me and Drew Conry-Murray. I’ll wait right here.
You’re back? Let’s go!
What Makes 128T Special?
When building a 128T network, you get several features that distinguish it from traditionally routed networks most network engineers are familiar with.
- Session-based. 128T devices forward sessions–not simply packets.
- Zero-trust. 128T devices do not forward sessions unless a security policy permits the session.
- Engineered. Paths are constructed for sessions through the 128T network to meet a policy-defined service level agreement (SLA).
- Encrypted. By default, all traffic payloads are encrypted via AES-256.
- Translated. 128T devices do not tunnel between themselves to create engineered forwarding paths. Instead, they rewrite source and destination IPs using network address translation (NAT).
If it’s not obvious why these features are interesting, keep reading. We’re going to go more deeply into 128T’s perspective on forwarding traffic.
How Does 128T View The World?
128T sees the network somewhat differently than others. Their point of view is described in their data model. We need to walk through this, as the terminology is crucial for understanding what happens when we get to the packet walk promised by the title.
Conductor
Conductor is 128T’s network controller. Therefore, Conductor is your primary interface to manage what’s going on in the 128T environment. You don’t have to use Conductor–you could configure all 128T devices in the network by hand if you wanted to, but 128T points out that none of their customers manage their environments without Conductor.
Conductor is all-seeing in that it has a complete view of the network, including an understanding of how all 128T devices are interconnected. Conductor also acts as the 128T monitoring platform.
The Conductor is all-knowing in that it also holds all 128T device configurations. This starts right at the beginning of a new 128T device’s life–Conductor is the main ZTP server.
For those who recall the early software defined networking model where an OpenFlow controller would get in the middle of new flows–introducing lots of latency–forget about that with Conductor. Conductor is management plane only, not data plane. Conductor is not responsible for forwarding packets through a 128T network, not even on initial session setup.
If connectivity between Conductor and the 128T devices it’s managing is disrupted, sessions will continue to be forwarded. The only impact of the disruption is that configuration changes won’t be reflected. Effectively, the management plane is down for 128T devices when the connection is interrupted to the Conductor. When the management plane connection is restored, configuration changes will catch up.
Services & Tenants
128T defines a service as anything in your network that is a destination. Think of a service in 128T parlance as something being consumed. Examples include a phone, an application, or a print server.
128T’s idea of a tenant is the complement of a 128T service. In the 128T world view, tenants are sources of traffic–the things that are consuming the services. They are the origin of sessions.
This definition of “tenant” is key to grasp, as the usual definition is tied to a customer being isolated from others in a virtual networking model. The 128T view of a tenant is much broader. If you limit your thinking of tenant to how most of the industry uses the term, you might be confused.
128T tenants, global in nature, are defined by as many as three different properties.
- A hierarchical scope. That is, several objects could be grouped hierarchically, and policy applied to the group. The group name is carried in the TLV field of 128T metadata. We’ll talk more about metadata later on.
- A network interface. This is sort of like zone-based firewalling, but doesn’t feel exactly the same to me. The idea is to make a network interface an object against which policy can be written.
- IP addresses. This is exactly what you think, where individual IP address or netblocks can become policy objects.
In summary, tenants are sources, and services are destinations. As a network engineer, you’ll define tenants, define services, and write policy that governs how tenants can (security) and should (SLA) consume services.
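To make the tenant/service/policy relationship concrete, here is a minimal Python sketch. It is not 128T's data model or configuration syntax. The class names, fields, interface names, and addresses are all hypothetical, and the hierarchical tenant scope is left out for brevity. It only illustrates the idea that tenants are sources, services are destinations, and nothing is allowed until policy says so.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Tenant:
    """A traffic source, identified by ingress interface and/or source prefixes."""
    name: str
    interface: Optional[str] = None
    prefixes: Tuple[str, ...] = ()

    def matches(self, interface: str, src_ip: str) -> bool:
        # Every property that is set on the tenant must match.
        if self.interface is not None and self.interface != interface:
            return False
        if self.prefixes:
            addr = ip_address(src_ip)
            return any(addr in ip_network(p) for p in self.prefixes)
        # No prefixes: match only if an interface was set (it matched above).
        return self.interface is not None


@dataclass(frozen=True)
class Service:
    """A traffic destination: something being consumed."""
    name: str
    prefix: str
    port: int


@dataclass
class AccessPolicy:
    """Zero-trust: nothing is allowed until a (tenant, service) pair is permitted."""
    permitted: Set[Tuple[str, str]] = field(default_factory=set)

    def permit(self, tenant: Tenant, service: Service) -> None:
        self.permitted.add((tenant.name, service.name))

    def is_allowed(self, tenant: Tenant, service: Service) -> bool:
        return (tenant.name, service.name) in self.permitted


# Hypothetical objects, for illustration only.
corp = Tenant("corp", interface="lan0", prefixes=("10.1.0.0/16",))
guest = Tenant("guest", interface="wifi0")
printer = Service("print-server", prefix="192.168.50.10/32", port=631)

policy = AccessPolicy()
policy.permit(corp, printer)

print(corp.matches("lan0", "10.1.2.3"))   # source belongs to the corp tenant
print(policy.is_allowed(corp, printer))   # explicitly permitted
print(policy.is_allowed(guest, printer))  # never permitted, so denied
```

The SLA half of policy (how a tenant should consume a service) is not modeled here. This sketch covers only the security half.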
The tenant conceptual abstraction is necessary because of mobile users and the fuzzy edges our networks have these days. Writing security policies based on IP addresses is not enough by itself. Make sense? Excellent!
Routers & Network Topology
A 128T router is an x86-based system. You can run the 128T software on bare metal if you need ultimate performance, or run it as a virtual machine, as you see fit.
128T considers a router to be made up of either one or two nodes. A two-node router is how you’d deploy for maximum redundancy, but this is not the active/passive pair that might come to your mind.
Instead, think of a two-node 128T router as sort of like a chassis and sort of like a cluster.
- The nodes are connected via Ethernet links that act as fabric interfaces.
- Traffic can flow from any router ingress port to any egress port on either node, using a fabric interface if traversing from one node to the other.
- State is synced between the nodes, and they share an identical forwarding table.
- A dedicated HA link monitors cluster member state.
- If a node failure occurs, gratuitous ARP (GARP) makes others aware of the new MAC address(es).
128T routers do not need to be physically adjacent. There can be non-128T devices in between them. This means that some different concepts apply when we speak of a 128T topology, similar to how arbitrary topologies can be created with tunnel-based overlay networks. Although, let’s remember that a 128T network NATs to get from router to router–it doesn’t tunnel.
128T uses the idea of a neighborhood to help Conductor construct network topologies. Network interfaces in the same neighborhood can work together to forward traffic. Interfaces in the same neighborhood might become adjacent, or might not, depending on the topology you’re looking for.
Perhaps you’ve defined a hub-and-spoke topology for a given neighborhood of interfaces. In this case, Conductor will configure the spoke interfaces to become adjacent to hub interfaces, but not to other spoke interfaces. However, in a mesh topology, all interfaces in the same neighborhood could become adjacent to one another. Again, Conductor takes care of making interfaces adjacent.
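Here is a rough Python sketch of the difference between the two topologies. It is my own illustration, not Conductor's actual logic. The function name, the interface names, and the assumption that hubs do not become adjacent to one another are all hypothetical.

```python
from itertools import combinations


def build_adjacencies(interfaces, topology, hubs=frozenset()):
    """Return the set of interface pairs that should become adjacent.

    Assumption: in hub-and-spoke, hubs pair with every spoke but not with
    each other, and spokes never pair with spokes.
    """
    if topology == "mesh":
        return {frozenset(pair) for pair in combinations(interfaces, 2)}
    if topology == "hub-and-spoke":
        spokes = [i for i in interfaces if i not in hubs]
        return {frozenset((hub, spoke)) for hub in hubs for spoke in spokes}
    raise ValueError(f"unknown topology: {topology}")


# Hypothetical neighborhood of four interfaces.
neighborhood = ["dc1", "branch1", "branch2", "branch3"]

hub_spoke = build_adjacencies(neighborhood, "hub-and-spoke", hubs={"dc1"})
mesh = build_adjacencies(neighborhood, "mesh")

print(len(hub_spoke))  # 3: one adjacency per spoke
print(len(mesh))       # 6: every pair of interfaces
```

The same neighborhood membership yields three adjacencies in one case and six in the other, which is the point: the neighborhood says who may work together, and the topology decides who actually becomes adjacent.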
As I understand them, 128T neighborhoods are effectively a grouping mechanism. You can apply policy to a neighborhood, so you end up with something like a VRF, as best as I can tell.
Beyond neighborhoods and mesh or hub/spoke configurations, 128T topologies define peers, adjacencies, and vectors.
- Peers are simply two interconnected 128T routers. Remember that a single 128T router could be made up of either one or two physical nodes acting as a single router. Ideally, there will be more than one physical path between two peers, which gives a 128T router the opportunity to re-route sessions according to how the network is behaving in real-time.
- An adjacency is an individual path between peers. As noted above, two peers could have, and ideally will have, more than one adjacency. 128T uses BFD to monitor adjacency paths, not only for link failure, but also for link quality by tracking jitter, latency, and loss.
- A vector is the best path for a particular session traversing the 128T network (not a really cool car from the 80’s & 90’s). Multiple interfaces combined with different services and policy define the vector for a specific session. For example, policy might require that voice traffic is sent out a dedicated T1 interface, while all other traffic uses a broadband interface–two different vectors.
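To illustrate how measured link quality and policy could combine into a per-session path choice, here is a small Python sketch. It is a guess at the general shape of the decision, not 128T's algorithm. The SLA thresholds, the measurement values, and the "preferred order, then lowest latency" tie-break are all assumptions of mine.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class Adjacency:
    """One path between two peers, with hypothetical BFD-style measurements."""
    name: str
    latency_ms: float
    jitter_ms: float
    loss_pct: float
    up: bool = True


@dataclass(frozen=True)
class Sla:
    max_latency_ms: float
    max_jitter_ms: float
    max_loss_pct: float


def meets_sla(adj: Adjacency, sla: Sla) -> bool:
    return (adj.latency_ms <= sla.max_latency_ms
            and adj.jitter_ms <= sla.max_jitter_ms
            and adj.loss_pct <= sla.max_loss_pct)


def pick_adjacency(adjacencies: Iterable[Adjacency], sla: Sla,
                   preferred: Sequence[str] = ()) -> Optional[Adjacency]:
    """Pick the first preferred adjacency that is up and within SLA.

    Falls back to the lowest-latency compliant adjacency, or None.
    """
    candidates = {a.name: a for a in adjacencies if a.up and meets_sla(a, sla)}
    if not candidates:
        return None
    for name in preferred:
        if name in candidates:
            return candidates[name]
    return min(candidates.values(), key=lambda a: a.latency_ms)


# Hypothetical measurements and policies.
t1 = Adjacency("t1", latency_ms=20, jitter_ms=2, loss_pct=0.0)
broadband = Adjacency("broadband", latency_ms=35, jitter_ms=12, loss_pct=0.5)

voice_sla = Sla(max_latency_ms=150, max_jitter_ms=5, max_loss_pct=1.0)
bulk_sla = Sla(max_latency_ms=500, max_jitter_ms=100, max_loss_pct=5.0)

print(pick_adjacency([t1, broadband], voice_sla, preferred=("t1",)).name)
print(pick_adjacency([t1, broadband], bulk_sla, preferred=("broadband",)).name)
```

With these made-up numbers, voice lands on the T1 and everything else on broadband: two sessions between the same peers, two different vectors.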
You should have at least a sense now of how traffic paths are created through a 128T network via peers, adjacencies, and vectors, but there’s one more element that’s important to understand about 128T topologies–service routes.
Service routes have been described as the glue that ties the underlay network back to services. Put another way, they define the path that will be taken for a given service. A service can be made of several components including policy and external routing domains.
- One component of the service route could be a link to a directly connected 128T peer.
- Another component could be a next hop that’s a non-128T device. 128T routers speak BGP and OSPF, and learn critical routing information to determine how to get to another 128T device across a non-128T space.
- 128T devices can perform service chaining, sending traffic to a third-party device, say for security inspection.
- Resiliency might also characterize a service route. For example, you might define a policy that dictates a specific service should have service duplication across multiple links for guaranteed delivery.
128T notes that service routes are locally significant, specific to a single router at a specific location.
How Does A Packet Flow Through A 128T Network?
All we’ve done so far is define terms. Since 128T’s terminology is helpful to understanding how they forward, we’re now at a place where we can hop on a packet and go for a ride through a 128T network.
The cornerstone technology that dresses 128T’s concept salad is Secure Vector Routing (SVR). SVR will guide us on our packet walk.
What Is SVR?
SVR is as much a philosophy as a technology. The key thing to understand is that while, yes, a 128T network forwards packets like any other network, 128T doesn’t care about packets. A 128T network cares about sessions. Every packet belongs to a session, and it is in the context of that session that a packet will be forwarded (or dropped) across some policy-determined link.
Thus, the “secure” in Secure Vector Routing is about security policy. A 128T device is, in fact, an L4 firewall as well as a router, one that starts with a zero-trust, “drop everything” posture. But, secure also means encrypted. A packet’s payload will be encrypted upon ingress to the 128T network and decrypted before it leaves the 128T network.
If your brain lit up with questions about encryption, there are a few points worth knowing.
- 128T encryption does not imply an IPsec tunnel. This is payload (and metadata payload) encryption only.
- The default encryption scheme is AES-256, but you can optionally choose AES-128 if you like.
- The solution is key-based, not certificate-based.
- A secure key manager creates random keys and securely distributes them to the routers.
- Keys live in memory only. They are not stored.
- A re-key mechanism can be enabled, and an operator can set the re-key interval. All re-keying is automated.
- All 128T devices get the keys. In theory, only the ingress and egress routers need the keys to decrypt traffic for a given session, but 128T points out that ingress and egress routers can differ over the life of a session due to a policy update or topology change.
The “vector” in Secure Vector Routing points at the unique path a session will follow through a 128T network. Different sessions made up of different applications might have different policies and SLAs assigned to them. Therefore, you could have an awful lot of paths traffic might take through a 128T network. Throw away your notion of best path as defined by a traditional routing protocol and think instead of policy routing, traffic engineered paths, and schemes such as segment routing.
128T calls intermediate 128T router hops waypoints. Think in terms of GPS. A vector could traverse several 128T router waypoints.
Vectors are directional. While a path might be symmetrical, a 128T network evaluates traffic both between a tenant and a service as well as between a service and a tenant.
With a basic understanding of SVR behind us, we can walk through a 128T network from a packet’s point of view.
A Packet Walk
Let’s start with what happens when the first packet of a session hits a 128T router.
A packet comes in, and is detected to be the first packet of a session. That is, there is no existing session in the 128T session table that matches this packet. Now, the 128T device must determine which tenants and services are associated with the packet.
To determine the packet tenant, the 128T device hits a source lookup table. The source lookup table indicates what the tenant actually is based on a number of criteria like the source network and interface.
To determine the packet service, a FIB table is checked to determine if the requested service is reachable by that tenant. Then, an access policy is checked to make sure that this tenant is allowed to access the requested service.
At this point, the 128T device knows where the packet came from (tenant), where the packet is going (service), and that the packet is allowed (security policy). However, we don’t want to go through this sanity checking for every packet in the session that’s going to follow. That would be inefficient. Therefore, for this first session packet, the 128T device will add a proprietary metadata field.
The metadata is used by subsequent 128T devices along the forwarding path to know that the packet is to be trusted, and to create a session table entry. In this way, the metadata acts as a session key, containing a five tuple of the original source and destination IP address, the original source and destination ports, and the IP protocol. The metadata also contains the tenant and service identified by the ingress 128T device.
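The first-packet logic described so far can be sketched in a few lines of Python. This is my own simplification, not 128T code. The table layouts, the function names, and the example tenants, services, and addresses are hypothetical, and the real metadata format is proprietary, so the `Metadata` class here only carries the fields mentioned above.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


@dataclass(frozen=True)
class Metadata:
    session_key: FiveTuple  # original addresses, ports, and protocol
    tenant: str
    service: str


# Hypothetical tables, for illustration only.
SOURCE_LOOKUP = [("lan0", "10.1.0.0/16", "corp"),
                 ("wifi0", "172.16.0.0/12", "guest")]
SERVICE_FIB = [("192.168.50.10/32", 631, "print-server")]
ACCESS_POLICY = {("corp", "print-server")}

session_table = {}


def lookup_tenant(interface: str, src_ip: str) -> Optional[str]:
    addr = ip_address(src_ip)
    for iface, prefix, tenant in SOURCE_LOOKUP:
        if iface == interface and addr in ip_network(prefix):
            return tenant
    return None


def lookup_service(dst_ip: str, dst_port: int) -> Optional[str]:
    addr = ip_address(dst_ip)
    for prefix, port, service in SERVICE_FIB:
        if port == dst_port and addr in ip_network(prefix):
            return service
    return None


def handle_ingress(interface: str, pkt: FiveTuple):
    """Return (action, metadata). Metadata is only built for a first packet."""
    if pkt in session_table:
        return ("forward", None)  # existing session: skip the lookups
    tenant = lookup_tenant(interface, pkt.src_ip)
    service = lookup_service(pkt.dst_ip, pkt.dst_port)
    if tenant is None or service is None or (tenant, service) not in ACCESS_POLICY:
        return ("drop", None)  # zero-trust default
    metadata = Metadata(session_key=pkt, tenant=tenant, service=service)
    session_table[pkt] = metadata
    return ("forward", metadata)


pkt = FiveTuple("10.1.2.3", "192.168.50.10", 40000, 631, 6)
print(handle_ingress("lan0", pkt))  # first packet: forwarded with metadata
print(handle_ingress("lan0", pkt))  # same session: forwarded, no metadata
```

The expensive part (source lookup, FIB lookup, policy check) runs once per session. Everything after that is a session table hit.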
With the metadata added, encryption of the packet payload and metadata payload can occur. Note that encryption of the packet payload doesn’t have to happen. For instance, why re-encrypt an already encrypted SSL packet? Policy governs the decision to encrypt or not.
For my most pedantic readers, I’m not certain whether encryption happens before or after the NAT we’re about to discuss. In theory, it could happen either way without blowing up a CRC, because only the payload and metadata payload are encrypted, not the entire packet including IP headers.
Since a 128T router does not tunnel, there is no encapsulation of the packet before forwarding to the next 128T device. Therefore, once the metadata (session key) is added, a NAT occurs. The source IP is translated to that of the sending 128T device, and the destination IP translated to that of the receiving 128T device–the next router, or waypoint, in the 128T service path.
Destination port numbers are also translated to something in the 16K to 65K range. This is important, because it means that packets flowing between 128T devices can be hashed over different links in any ECMP bundle there might be along the way. If the destination port was static, the hashing algorithm would land the entire 128T-128T flow onto a single ECMP member. And hey, we like load balancing.
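A quick Python sketch shows why the varying destination port matters. None of this is 128T or switch vendor code. The CRC32-based hash is a stand-in for whatever a real device uses, the port bounds are my reading of "16K to 65K," the sequential port allocation is an assumption, and the router addresses come from documentation ranges.

```python
import zlib

PORT_MIN, PORT_MAX = 16384, 65535  # assumed bounds for "16K to 65K"


def ecmp_member(src_ip, dst_ip, proto, src_port, dst_port, n_links):
    """Stand-in ECMP hash: CRC32 over the five tuple, modulo the link count."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links


def waypoint_port(session_index):
    """Hypothetical allocator: a different destination port per session."""
    span = PORT_MAX - PORT_MIN + 1
    return PORT_MIN + session_index % span


# After the NAT, every session between these two routers shares the same IPs.
ROUTER_A, ROUTER_B = "198.51.100.1", "203.0.113.1"
SRC_PORT, PROTO, LINKS, SESSIONS = 16384, 6, 4, 1000

static_links = {
    ecmp_member(ROUTER_A, ROUTER_B, PROTO, SRC_PORT, PORT_MIN, LINKS)
    for _ in range(SESSIONS)
}
varied_links = {
    ecmp_member(ROUTER_A, ROUTER_B, PROTO, SRC_PORT, waypoint_port(i), LINKS)
    for i in range(SESSIONS)
}

print(f"static destination port: {len(static_links)} ECMP member(s) used")
print(f"varied destination port: {len(varied_links)} ECMP member(s) used")
```

With a static port, every session hashes identically and piles onto one member. Varying the port gives the hash something to work with, so sessions spread across the bundle.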
An aside…
VxLAN tunnels vary the UDP source port by hashing some portion of the encapsulated frame for the same reason 128T varies their destination ports–ECMP friendliness. What precisely is hashed in the case of VxLAN encapsulation varies subtly by switch vendor.
Most switches can be configured to hash IP packet flows against one or more combinations of port numbers that are a part of the packet. As port numbers tend to change during each flow that makes up a session, an ECMP bundle tends to get a roughly even distribution across the links. All is well as long as either the source or destination port changes with enough regularity.
GRE tunnels do not vary their TCP or UDP ports at all, as GRE doesn’t even use TCP or UDP, but rather is IP protocol 47. (TCP is IP protocol 6, and UDP is IP protocol 17.)
GRE’s lack of ECMP friendliness is a problem for VxLAN competitor NVGRE, which recommends that “the ECMP hash is calculated either using the outer IP frame fields and entire Key field (32 bits) or the inner IP and transport frame fields.” Not many devices I am aware of support NVGRE packet hashing to improve ECMP load balancing, although there are some.
And now back to our 128T packet walk…
With encryption complete and the packet IPs and destination port re-written, the packet is forwarded to the next 128T waypoint.
When the next 128T device receives the metadata-laden packet, it understands that the packet is to be trusted because the metadata is present, creates a session in its session table, encrypts, performs NAT, and forwards however is appropriate based on service policy, retaining metadata for the benefit of the next 128T device in the path.
When the packet arrives at the final (egress) 128T device in the path, the packet reverts back to its original state. The payload is decrypted, and if metadata was present, it is removed. The original source and destination IP and ports are restored (they were preserved in the metadata, remember?), and the packet forwarded towards its final destination.
Subsequent packets in a session flow through a 128T network much the same as initial packets. The key difference is that metadata is not added to a packet if the packet matches an existing session.
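Here is how I picture the egress side, as a Python sketch. Again, this is an illustration under assumptions rather than 128T's implementation. Decryption is left out, the packet and table structures are hypothetical, and I am assuming the translated addresses and ports stay the same for the life of a session so they can key the session table.

```python
from typing import Dict, NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


class Packet(NamedTuple):
    header: FiveTuple              # translated (router-to-router) header
    metadata: Optional[FiveTuple]  # original five tuple, first packet only
    payload: bytes


# Translated header -> original header, learned from first-packet metadata.
egress_sessions: Dict[FiveTuple, FiveTuple] = {}


def egress_restore(pkt: Packet) -> Optional[Packet]:
    """Strip metadata and put the original addresses and ports back."""
    if pkt.metadata is not None:
        egress_sessions[pkt.header] = pkt.metadata
    original = egress_sessions.get(pkt.header)
    if original is None:
        return None  # no metadata and no session: nothing to trust, so drop
    return Packet(header=original, metadata=None, payload=pkt.payload)


original = FiveTuple("10.1.2.3", "192.168.50.10", 40000, 631, 6)
on_wire = FiveTuple("198.51.100.1", "203.0.113.1", 16384, 20001, 6)

first = egress_restore(Packet(on_wire, original, b"first"))
later = egress_restore(Packet(on_wire, None, b"later"))

print(first.header == original)  # restored from metadata
print(later.header == original)  # restored from the session table
```

The first packet teaches the egress router the mapping. Later packets carry no metadata and are restored purely from the session table.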
Should You Consider 128T?
Phew! A lot happened in that packet walk. Now, if that feels complicated…it is. However, there’s a lot of value 128T has packed into their routing platform. All of those features imply some complexity.
You’re getting a lot of tightly integrated–and automated–functionality in a system that would have been a great deal of work to create from separate pieces. In that sense, don’t be put off by the level of detail I chose to share here. Think instead about whether the platform, as delivered by 128T, gives you functionality you’re looking for.
The use cases for a 128T system are where the platform becomes especially interesting. As I see it, their home is anywhere in the WAN. Perhaps the chief example of this is SD-WAN, a service 128T is aptly able to deliver.
Another example is that of satellite communication links, which are relatively high latency and low bandwidth. 128T claims a ~30% improvement in session throughput over tunneled alternatives, since there is no tunnel overhead in the 128T forwarding paradigm. This has proven to be a winner in the IIoT market, where high bandwidth, terrestrial WAN links are often hard to come by.
Another aside. You might be wondering, “How well does the 30% claim hold up in the face of 128T metadata?”
The answer is that 128T metadata adds about 150-250 bytes of overhead into the initial packet of a session flow. Subsequent packets in an established session don’t get tagged with metadata. Thus, metadata overhead is truly minimal.
This is in sharp contrast to tunneling schemes where every packet has tunnel overhead added.
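Some back-of-the-envelope arithmetic in Python makes the contrast visible. The 250-byte metadata figure is the upper end of the range quoted above. Everything else is an assumption of mine: a session of 1,000 small 200-byte packets, and a tunnel adding 50 bytes to every packet. These are illustrative numbers, not measurements, and they are not meant to reproduce 128T's 30% figure.

```python
def metadata_overhead(packets: int, packet_bytes: int, metadata_bytes: int) -> float:
    """Extra bytes as a fraction of session bytes when only packet one is tagged."""
    return metadata_bytes / (packets * packet_bytes)


def tunnel_overhead(packet_bytes: int, tunnel_bytes: int) -> float:
    """Extra bytes as a fraction of session bytes when every packet is wrapped."""
    return tunnel_bytes / packet_bytes


PACKETS = 1000        # assumed session length
PACKET_BYTES = 200    # assumed small, IIoT-style packets
METADATA_BYTES = 250  # upper end of the quoted 150-250 byte range
TUNNEL_BYTES = 50     # assumed per-packet encapsulation cost

print(f"{metadata_overhead(PACKETS, PACKET_BYTES, METADATA_BYTES):.3%}")  # 0.125%
print(f"{tunnel_overhead(PACKET_BYTES, TUNNEL_BYTES):.1%}")               # 25.0%
```

The per-session cost shrinks as the session gets longer, while the per-packet cost stays constant and hurts most when packets are small.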
If you’re thinking, “I don’t care about tunnel overhead. It just doesn’t impact my network appreciably,” you might be right. Even so, two key considerations come to mind. One, your traffic mix. Two, your WAN link types. 30% is an especially big deal when you’re bandwidth-starved.
Don’t limit your thinking to just SD-WAN or even the broader WAN, though. 128T has stated that they want the entire routing market. They don’t want to be pigeonholed as an SD-WAN vendor.
I understand their aspirations, as they’ve built a compelling platform. Displacing incumbents like Cisco and Juniper is always difficult, however. If I were the one signing off on the purchase, I would need to understand ROI very clearly.
Some ROI Questions
I don’t have the answers to all of these questions, but these are topics I’d be thinking about if evaluating a business relationship with 128T.
Is it cheaper capex and/or opex to acquire and own a 128T network?
This might be the smallest consideration of a technology purchase for many organizations, but costs can tip the scales one way or the other.
What is the TCO of a 128T network if we use the benchmark of a typical ~7 year lifecycle for traditional router hardware?
This is a more nuanced version of the capex/opex question, raised because of 128T’s target of “the entire router market.”
How hard is it to train my engineering and operations teams on 128T technologies?
A 128T network is highly automated, but that doesn’t mean it’s a turnkey solution. Training will be required to become as independent as most IT organizations, especially large ones, like to be.
What quirks do I have to deal with when integrating my 128T system with my traditional network?
This might sound like a question about BGP or OSPF, but there’s more here. Think also about monitoring and reporting, as well as integration with ticketing systems.
Is there a reference architecture for a brownfield transition to 128T?
This comes to mind because of the challenges tied to transitioning WAN models. Routing gets complicated as you build out what is essentially two wide area networks, one of which forwards traffic according to its own rules as opposed to familiar routing schemes. How do you announce routes from one network to the other and avoid topology loops or blackholes? Does 128T offer a reference architecture or at least best practices to help site migrations go smoothly?
How much of a pain is it to push 128T transit traffic through non-128T middleboxes?
128T has built-in workarounds for some of these challenges, such as firewalls that drop packets containing 128T metadata.
How solid is 128T code, and what’s the time to repair on bugs?
No vendors I’m aware of are putting out good code these days, but it’s still a question worth asking.
What is my attack surface when my 128T device is exposed to the Internet?
Zero-trust sounds ideal, but there’s always more to the story. What daemons are sitting there listening? Is there a device hardening guide?
What is my support experience like?
Especially when dealing with a proprietary system, you need a vendor support organization that understands your issues quickly and can address them effectively. Many vendor support organizations are poor at both comprehension and timely resolution due to apathy and/or outsourcing. Where does 128T fit? I’m not sure, but I’d want access to someone well-versed in their unique platform if things aren’t going well.
What middleboxes can I get rid of if I roll out 128T?
A 128T router is also an L4 firewall. It can also do a form of service chaining. Does this present me with an opportunity to re-architect my edges, chuck some middleboxes, simplify operations, and save some money? If so, there’s probably an ROI win.
Does 128T make my branch office management simpler?
Is there an opportunity to get away from so many different boxes at my branches?
Can my monitoring & traffic analysis solutions effectively handle 128T-encrypted and -translated packets?
This is a potentially huge question, as 128T’s approach of translating packets rather than encapsulating packets (as most SD-WAN solutions do) means that IP sources and destinations are obscured. There’s no peeking inside the tunnel headers to grok what you can from the original packet. Only the 128T system itself is tracking that data via the metadata session key.
The Bottom Line
Those ROI questions aren’t me throwing shade at 128T, nor are they to imply that I don’t find 128 Technology a compelling company that created something quite interesting. There’s a big part of me that likes what 128T has built. But another part of me thinks of various places that I might want to use 128T, and sees other solutions that do, more or less, the same thing.
But perhaps those other solutions are uni-taskers, and not as versatile as 128T. And probably those other solutions also have their proprietary quirks. For example, if you peek under the hood of most SD-WAN solutions, there’s a lot of proprietary magic going on with controllers, overlay forwarding schemes, and cloud integration.
A lot of the decision making process might go back to the ROI points I was making above. In nearly every point I raise, 128T has a chance to shine. For many folks, 128T has indeed become a router of choice. 128T has customers with sizable deployments–over 5,000 128T devices deployed in one case, and a 10,000 router deployment coming soon.
The bottom line is that 128T is worth looking at. If you think about their routing technology and believe you have a use case, they should be able to handle about anything you can throw at them. They’ve got the features, scale, and automation to get the job done. You just need to decide what that job is.
For More Information
The information gathered here was from a Tech Field Day event held in Burlington, Massachusetts on July 23, 2019 where I attended as a delegate.
The event was video recorded. You can watch all presentations from the day here.
For this piece, most of my information came from the following two videos.
Errata
For those who know the 128T routing platform intimately and wish to correct any errors in this post, please send your corrections to [email protected]. I will make the appropriate updates.