Serial Link Equalization Primer

Tuesday, January 14, 2020

Author: Donald Telian, SiGuys - Guest Blogger

Equalization is powerful. SerDes (Serializer-Deserializer) Equalization has enabled data rates we would not have thought possible, and it continues to be the secret sauce of serial links. While good performance depends on doing basic things in hardware (Steps 1 to 6), it is arguably the configuration of SerDes Equalization Settings (SES, Step 7) that determines eye margins - as illustrated in Fixing Signal Integrity Issues in Software. As such, SI and hardware engineers who work with serial links need a basic understanding of SerDes Equalization.

The modern serial link uses a SerDes on each end of the link, each with its own Transmitter (Tx) and Receiver (Rx). By definition the two SerDes are in different components, and hence have differing amounts and types of equalization. While some serial standards specify minimum Tx and Rx capabilities, components typically provide more equalization than required. Amidst lack of standardization, the variety of implementations, and its overall complexity, understanding equalization threatens to become overwhelming.
 
What are the common types of SerDes Equalization, and is there a way to simplify things? Amidst thousands of SES options, is there a way to understand what types of equalization to use when? Thankfully the answer to both of these questions is a resounding "Yes!" However, for my explanation to make sense we need to start with some history.

The History of SerDes Equalization

SerDes Equalization began in the context of longer, higher-loss interconnects. The task of equalizing that loss originally belonged to the Tx, while the Rx focused on recovering the clock. Equalization in the Tx was fairly simple: switching bits (0 to 1, or 1 to 0) were given a higher amplitude than static bits (0 to 0, or 1 to 1). Looking at the same thing in the frequency domain, this technique boosts high frequencies and attenuates low frequencies. Some called it "pre-emphasis" (focused on the switching bit) while others called it "de-emphasis" (focused on the static bit). Either way, the end result was a flatter frequency response when you combine the transfer functions of the Tx and a lossy interconnect.
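
As a minimal sketch of that idea, the snippet below models a 2-tap Tx equalizer (main cursor plus one post-cursor tap) and evaluates it in both domains. The -25% tap weight, the normalization to a peak swing of 1.0, and the function names are illustrative assumptions, not values taken from any particular SerDes.

```python
import cmath
import math

def tx_two_tap(bits, post=-0.25):
    """Apply a 2-tap Tx FFE (main + 1st post-cursor) to NRZ bits."""
    main = 1.0 - abs(post)              # keep the peak swing normalized to 1.0
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    prev = symbols[0]                   # assume the line idled at the first bit's level
    for s in symbols:
        out.append(main * s + post * prev)
        prev = s
    return out

def gain_db(f_norm, post=-0.25):
    """Equalizer magnitude in dB at f_norm = frequency / bit rate."""
    main = 1.0 - abs(post)
    h = main + post * cmath.exp(-2j * math.pi * f_norm)
    return 20 * math.log10(abs(h))

levels = tx_two_tap([0, 0, 1, 1, 1, 0, 1])
print(levels)          # switching bits reach +/-1.0, static bits sit at +/-0.5
print(gain_db(0.0))    # low-frequency gain (attenuated)
print(gain_db(0.5))    # gain at half the bit rate (not attenuated)
```

With these assumed weights, static bits are driven at half the amplitude of switching bits, which is the same thing as roughly 6 dB of low-frequency attenuation relative to half the bit rate.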

Tx equalization worked well for a while, but then two important things changed: (1) integration caused interconnects to become shorter, and (2) PCB loss decreased by an order of magnitude (Df = 0.02 to 0.002). These two changes have caused the majority of today's interconnects to be limited more by discontinuities than by loss. Because most SerDes' Tx "default" to handling loss, many serial links are over-equalized, causing a decrease in link margin. Oddly enough, I often help customers correct system problems by simply decreasing or shutting off Tx equalization. But I'm getting ahead of myself.
 
In the field of medicine "Tx" stands for Treatment and "Rx" designates the prescription for handling the problem. And so it is with serial links. Because the SerDes' Rx is the point where the signal is determined, a powerful array of equalization techniques has been added to the Rx. These techniques are robust, efficient and self-optimizing - even to the point of informing the Tx to "calm down and let me handle it." Indeed, when such "back-channel" communication doesn't exist, many an Rx wastes power undoing excess equalization applied by the Tx.
 
Given this background and evolution of Tx/Rx capabilities, we're now in a position to understand the basics of equalization and - more importantly - how to use it effectively.
 

SerDes Equalization, the Basics

When SerDes equalization began, the IBIS "template" for the digital IO was shattered. And so IO models learned to call external executables compiled to handle equalization nuances, in the form of IBIS-AMI. Over time the types of equalization stabilized; so much so that now all simulation tool vendors offer "template" SerDes models with placeholders for the common equalization techniques: Tx FFE, and Rx CTLE and DFE. Before I decode those acronyms, it's necessary to explain the concept of "cursors" and "taps". Be forewarned, engineers use the terms UI (Unit Interval), bit time, tap, and cursor interchangeably to mean almost the same thing.

In a stream of serial data, we think of each bit in terms of the bits that surround it that might interfere with its transmission. Looking at any moment in time, a bit the Tx is attempting to transmit to the Rx correctly is called the "main cursor" and this bit occurs at "tap 0" in both the Tx and Rx. As the bit propagates, losses and discontinuities in the interconnect spread and/or reflect its energy into neighboring bit times. This energy is problematic because it makes the neighboring bits harder to decipher, and hence becomes the phenomenon we hope to correct with equalization. Because most of the spreading/reflections affect subsequent bit times, we primarily focus on correcting "post-cursor" bits or "taps +1, +2, +3…" and so on. Oddly enough, some of a bit's energy can spread into the previous bit times we refer to as "pre-cursor" or "taps -1, -2…". For an example of how a single "one" bit can spread into both pre- and post-cursor bit times, have a look at the black signal in Figure 2.
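
To make the tap numbering concrete, here is a small sketch that labels UI-spaced samples of a pulse response by tap index. The sample values are invented for illustration, and treating the largest-magnitude sample as the main cursor is a simplifying assumption.

```python
def label_cursors(samples_mv):
    """Map UI-spaced pulse-response samples (mV) to tap indices.

    Assumption: the largest-magnitude sample is the main cursor (tap 0).
    Negative indices are pre-cursors, positive indices are post-cursors.
    """
    main_idx = max(range(len(samples_mv)), key=lambda i: abs(samples_mv[i]))
    return {i - main_idx: v for i, v in enumerate(samples_mv)}

# Hypothetical samples of a single "one" bit as seen at the Rx, one per UI
pulse_mv = [2.0, 30.0, 200.0, 60.0, 25.0, -10.0]
taps = label_cursors(pulse_mv)

for tap in sorted(taps):
    kind = "pre" if tap < 0 else "main" if tap == 0 else "post"
    print(f"tap {tap:+d} ({kind}-cursor): {taps[tap]} mV")
```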
 
Every SerDes implements Tx FFE (Feed-Forward Equalization) and many also implement Rx DFE (Decision Feedback Equalization). These two are interrelated because they both correct (i.e., remove energy from) post-cursors, applying bit-time corrections in ratios called "tap weights". In fact, when a Tx FFE wastes power over-equalizing "taps", the Rx's DFE is called upon to waste power correcting the very same taps on the other end. The key differences between Tx FFE and Rx DFE are: (1) FFE reduces signal amplitude while DFE does not, (2) FFE taps affect subsequent cursors while DFE taps do not, (3) FFE guesses at proper tap weights while DFE can determine them, and (4) FFE can correct pre-cursors while DFE cannot. If you read that and decided to let the DFE handle all bit times except pre-cursors, you're on the right track.
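
The first two differences can be seen in a rough sketch that applies each equalizer to a sampled pulse response. Everything here is an illustrative assumption: the cursor values are invented, the FFE is limited to one pre-tap and one post-tap with weights normalized to sum to 1.0 in magnitude, and the DFE is idealized (perfectly adapted weights and error-free past decisions).

```python
def apply_tx_ffe(cursors, pre=0.0, post=0.0):
    """Convolve a cursor dict {tap: mV} with a 3-tap Tx FFE."""
    main = 1.0 - abs(pre) - abs(post)   # amplitude budget shared by all taps
    weights = {-1: pre, 0: main, 1: post}
    out = {}
    for k, v in cursors.items():
        for shift, w in weights.items():
            out[k + shift] = out.get(k + shift, 0.0) + w * v
    return out

def apply_rx_dfe(cursors, n_taps):
    """Idealized DFE: cancel post-cursors 1..n_taps, leave everything else."""
    return {k: (0.0 if 1 <= k <= n_taps else v) for k, v in cursors.items()}

# Hypothetical unequalized pulse response, one sample per UI (mV)
raw = {-1: 30.0, 0: 200.0, 1: 60.0, 2: 25.0, 3: -10.0}

ffe = apply_tx_ffe(raw, pre=-0.15)      # fix the pre-cursor at the Tx
dfe = apply_rx_dfe(ffe, n_taps=2)       # let the Rx clean up post-cursors

print("main cursor:", raw[0], "->", ffe[0], "->", dfe[0])
print("pre-cursor: ", raw[-1], "->", ffe[-1])
print("post-cursor 1:", raw[1], "->", ffe[1], "->", dfe[1])
```

In this toy channel the pre-cursor tap pulls tap -1 toward zero but costs main-cursor amplitude and disturbs neighboring taps, while the DFE removes post-cursors 1 and 2 without touching the main cursor.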
 
Before Rx DFE became common, many Rx implemented a "peaking filter" or CTLE (Continuous-Time Linear Equalizer). As the name implies, this equalizer is not as focused on bit times or taps; instead it simply boosts a certain frequency - primarily targeting the link's highest frequency. This boost may or may not include gain, as shown and further described in Figures 3 and 4 in this paper.
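
For a feel of what such a peaking filter does, the sketch below evaluates a one-zero, two-pole transfer function, a common simplified way to model a CTLE. The model itself, the zero and pole frequencies, and the -6 dB DC gain are assumptions chosen for a 10 Gbps example; real devices publish their own curves.

```python
import math

def ctle_gain_db(f_hz, dc_gain_db=-6.0, fz=1e9, fp1=5e9, fp2=10e9):
    """Gain (dB) of H(s) = A_dc * (1 + s/wz) / ((1 + s/wp1) * (1 + s/wp2))."""
    s = 2j * math.pi * f_hz
    wz, wp1, wp2 = (2 * math.pi * f for f in (fz, fp1, fp2))
    h = (1 + s / wz) / ((1 + s / wp1) * (1 + s / wp2))
    return dc_gain_db + 20 * math.log10(abs(h))

nyquist_hz = 5e9                        # half the bit rate at 10 Gbps
peaking_db = ctle_gain_db(nyquist_hz) - ctle_gain_db(0.0)

print(f"DC gain:      {ctle_gain_db(0.0):.2f} dB")
print(f"Nyquist gain: {ctle_gain_db(nyquist_hz):.2f} dB")
print(f"Peaking:      {peaking_db:.2f} dB")
```

Setting `dc_gain_db` at or below zero mimics a boost achieved mostly by attenuating low frequencies, while a positive value mimics a boost that includes gain.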
 
The most important thing to know about SerDes equalization is that your IC vendors have pre-configured "Default Settings" without knowing how your link is constructed or what device is on the other end. Knowing that very few customers will adapt the settings, the defaults are typically aimed at the worst-case high-loss system. And who can blame them; I would likely do the same. But the end result is over-equalization in most cases and system failures in a few. As such it's important to know the basics of tuning equalization for the day you too will fix signal integrity issues in software.
 

Tuning Equalization, Then and Now

For several years I've been predicting that an increase in data rates will be achieved through system-level optimization of equalization settings. What's not clear is if that will be done through up-front engineering and configuration, or through run-time optimization performed by the SerDes themselves. Given our aversion to this topic, it will likely be the latter. Either way, significant performance gains are available for the taking. So let's have a look at what and how to tune.

Increasingly the Tx's task is simply to inject signal amplitude into the system, providing sufficient energy for the Rx's internal equalization to recover the clock and data correctly. This idea challenges both our historic Tx-centric equalization mindset and our assumption that good SI is measured at the Rx pins. But thinking at a system level, the Rx pins have become an intermediate point because the most relevant electronics have not yet been traversed: the Rx equalization. And so increasingly we judge good Signal Integrity at the Rx's internal latch or "decision point", deeper inside the IC beyond the Rx equalization.
 
Figure 1 shows system simulations, measuring eye performance at the internal Rx latch. The three design options shown trade 1st post-cursor equalization between the Tx and Rx, from left to right. Note that turning off the Tx post-cursor increases eye height by 100%. To see plots of the same signals at the Rx pins and then how the concept scales across thousands of signals, read pages 16-19 of this paper. This paper shows that improving the eye at the Rx internal latch might cause it to degrade at the Rx pins - a side effect that may not be intuitive or desirable to some.
 

Figure 1: Trading the 1st Post-Cursor Between Tx FFE and Rx DFE, as seen at Rx Internal Latch
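
The trade shown in Figure 1 can be approximated with a toy sweep. This is only a sketch under stated assumptions: the channel cursors are invented, the Tx has a single post-cursor tap, the Rx has an idealized 1-tap DFE, and "eye opening" is taken as the main cursor minus the sum of the remaining ISI magnitudes. It does not reproduce the simulations behind the figure.

```python
def eye_at_latch(cursors, tx_post, dfe_taps=1):
    """Eye opening (mV) after a 2-tap Tx FFE and an idealized Rx DFE."""
    main_w = 1.0 - abs(tx_post)         # Tx swing is shared between the taps
    eq = {}
    for k, v in cursors.items():
        eq[k] = eq.get(k, 0.0) + main_w * v
        eq[k + 1] = eq.get(k + 1, 0.0) + tx_post * v
    # The DFE cancels taps 1..dfe_taps; everything else remains as ISI
    isi = sum(abs(v) for k, v in eq.items()
              if k != 0 and not (1 <= k <= dfe_taps))
    return eq[0] - isi

# Hypothetical low-loss channel, one sample per UI (mV)
channel = {-1: 10.0, 0: 200.0, 1: 50.0, 2: 10.0}

sweep = {post: eye_at_latch(channel, post) for post in (0.0, -0.125, -0.25)}
for post, eye in sweep.items():
    print(f"Tx post-cursor {post:+.3f}: eye at latch = {eye:.2f} mV")
```

In this made-up case the largest eye at the latch occurs with the Tx post-cursor switched off, because the DFE removes the first post-cursor without spending any main-cursor amplitude.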

System-level equalization optimization is enhanced by learning how to read a channel's pulse response. To illustrate this process, Figure 2 shows the unequalized pulse response (black, a single "one" bit as seen at the Rx) of a 10 Gbps channel plotted in terms of UI (bit times) on the X axis. Reflections in this channel significantly degrade eye performance out to 20 UI and beyond. This is a bad channel, in which the amount of eye degradation can be calculated by summing the millivolt levels found on the Y axis at each UI. Indeed, this ISI (Inter-Symbol Interference) is the energy that obscures adjacent bits or symbols.
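
A minimal sketch of that summation follows. It assumes the pulse response is the response to one isolated "one" bit, that the worst-case bit pattern lines up every interfering cursor against the main cursor (so magnitudes are summed), and it uses invented sample values rather than data read from Figure 2.

```python
def worst_case_opening(cursors):
    """Return (total ISI, remaining eye opening) in the pulse's own units."""
    main = cursors[0]
    isi = sum(abs(v) for k, v in cursors.items() if k != 0)
    return isi, main - isi

# Hypothetical pulse response sampled once per UI (mV), tap 0 = main cursor
pulse = {-1: 30.0, 0: 200.0, 1: 60.0, 2: 25.0, 3: -10.0}

isi_mv, opening_mv = worst_case_opening(pulse)
print(f"ISI: {isi_mv} mV, worst-case opening: {opening_mv} mV")
```

A negative result would indicate an eye that closes for the worst-case pattern before any equalization is applied.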
 
The other colors in Figure 2 apply different equalization schemes, with the goal of leaving signal only at UI=0 while reducing signal at all neighboring UI to 0 mV. Note that it is primarily the moment of sampling, marked by the ticks on the X axis, that we care about. Pre-cursor pulse spreading can be seen at UI=-1, which is improved by applying a tap weight of -15% to Tx FFE tap-1 (blue). Here, "improvement" means moving the pulse response closer to 0 V at UI=-1. Next compare black and blue at UI=0 (tap0, main cursor), noting that applying a pre-cursor tap also decreased the main cursor's amplitude. The blue "equalized" pulse responses then use Rx DFE to clean up the post-cursor bits, with 8-tap (dark blue) and 16-tap (light blue) options shown. Note how the DFE taps effectively pull the pulse response to 0 V at each UI until the taps expire at 8 or 16 UI. The color-coded arrows show that the additional taps produce a 30% better eye. The red pulse response shows performance with default Tx settings (tap-1=-15%, tap+1=-25%) and an 8-tap Rx DFE. While the DFE doesn't have to work as hard (DFE tap1 changes from -10% to +3% and tap2 changes from -6% to -1%, blue to red), the associated eye diagrams reveal how the decrease in amplitude at UI=0 (red) causes a 50% decrease in eye height.
 

Figure 2: Pulse Response versus UI, 10 Gbps Channel Equalization and Performance

The example above shows how a pulse response fingerprints an interconnect and reveals how it is or should be equalized. This paper (pages 7-18) explains the process in more detail, describing which types of equalization to apply when, summarizing the methods and trade-offs on slide 15. Read the full paper for a better understanding of taps, tap weights, pre- and post-cursors, and effective equalization - concepts observable on a pulse response.
 
Tuning and optimizing Tx/Rx equalization can be done pre-hardware and post-hardware. Pre-hardware simulation puts all the trade-offs in your hands, including PCB materials and layout. IBIS-AMI models capture the nuances of device-specific equalization, enabling equalization tuning in your PCB simulator. Post-hardware, many ICs offer software routines to examine how the Rx internal eye or BER changes while you manipulate SerDes Equalization Settings (SES) in hardware. As pre- and post-hardware SES optimization techniques become more understood and accessible, the ability of the ICs to manipulate and self-optimize without user intervention will also improve. Watch this space.
 

Higher Data Rates

Higher data rates bring commensurate improvements in both hardware and software. On the hardware side, new materials have improved trace loss while via, connector, and package impedances experience continuous refinement and improvement. This is imperative, because as data rates increase smaller features become relevant. However, amidst all this change and improvement - and because PCB technologies do not change at the same rate as ICs - the magnitude of system-level pulse spreading stays somewhat constant. As such, the time that occupied only one bit or tap at 2.5 Gbps now fills 10 taps at 25 Gbps. For this reason, you will notice an increasing number of taps in higher data rate devices. Furthermore, Rx designs have begun adding FFE, and automated Rx equalization optimization is increasingly becoming the prescription for functional serial links.
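
The tap arithmetic is simple enough to check directly. The sketch below assumes a fixed 400 ps of pulse spreading, which is one UI at 2.5 Gbps; the helper name and the intermediate data rates are illustrative.

```python
import math

def taps_spanned(spread_ps, rate_gbps):
    """Number of bit times (taps) covered by a fixed amount of pulse spreading."""
    ui_ps = 1000.0 / rate_gbps          # one UI in picoseconds
    return math.ceil(spread_ps / ui_ps - 1e-9)

for rate in (2.5, 5.0, 10.0, 25.0):
    print(f"{rate:>4} Gbps: UI = {1000.0 / rate:5.1f} ps, "
          f"400 ps of spreading covers {taps_spanned(400.0, rate)} tap(s)")
```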
 

Conclusion

In this "primer" we've covered serial link equalization terminology and concepts. I've detailed how the evolution of equalization caused the industry to slant toward, if not overuse, Tx equalization. Over time the bulk of equalization capabilities have been added into the Rx. This raises the importance of thinking at a system level about how Tx and Rx equalization capabilities should be combined for optimal performance. Learning to read and adapt a system's pulse response is a valuable tool to tune equalization, add margin, and avoid problematic situations. I trust the concepts presented will raise your EQ IQ, empowering you to adjust marginal links, increase throughput, and add design margin.

Donald Telian, SiGuys - Guest Blogger 1/14/2020
