For an updated overview of USB and associated topics, please also see this later article, written in 2010, the year that saw a breakthrough in the USB domain, paying special attention to part 2, “PC, USB and hi-res audio: Do the PC & USB really fit into high quality audio today?”.

The fundamentals of asynchronous USB operation, used in the Audial USB DACs released since 2010, are summed up in this topic.

S/PDIF or USB?

by Pedja Rogic

Personal computers entered the audio scene about two decades ago, but their role was mostly limited to professional environments. However, their overall growing use resulted in their adoption in the home audio scene as well. This move was made possible partly by the increasing use of USB, a user friendly external PC interface with sufficient bandwidth for audio purposes. As usual, there are some initial misunderstandings associated with this issue, and we can easily find both overly optimistic claims and unfounded denials: such extreme views in fact still dominate discussions on related topics.

One of the most important points in this regard was, and probably still is, the question of jitter, and this article will try to explain the current state in this domain. Assuming that the PC, so long as it doesn’t perform some kind of signal processing, easily preserves data integrity (a.k.a. bit perfect information), we will concentrate on this topic. This still doesn’t imply that jitter is the complete answer for digital audio performance. In fact, this article points out at least one reason why it is not.

S/PDIF (pre)history

Indeed, things may get very confusing these days. Is it the old vs. the new, the mature against the immature, or what is it about at all? Does USB bring lower or higher jitter than classic home digital audio devices do? What is the current state of things in this domain, and can anyone predict the future?

Historically, things are apparently once again going the same well-known route. At the time it was born, S/PDIF was, just as the whole digital audio concept, usually considered technically perfect. Since common high end audio wisdom has always suggested dividing units into functional blocks, people were satisfied with an interface that made an external connection between the transport and the D/A part of the player possible. The interface was able to pass the bits and preserve data integrity, even over lines hundreds of meters long, and even under quite critical (noise related) conditions. It was simply one unquestionable part of the chain. Enquiring minds of our days will probably be surprised to look back and find no trace of concern about any associated problem that could be discussed further at all.

The nineties however brought a much better understanding of the problems associated with S/PDIF. And exactly because these problems became visible, people started paying attention to them, and correcting them. This consequently brought progress and, as a result, today’s S/PDIF hardware is incomparably better than it was 20 years ago. Today’s Cirrus receivers achieve 50ps of intrinsic jitter,1 and they lock on the preamble, thus being relatively free from data induced jitter. Other manufacturers claim similar performance. Funnily enough, today S/PDIF is, more than ever, considered a big must-avoid bottleneck. Is it only because people now understand the seriousness of the associated problems much better? Or is it a plain need to sell new technologies, declaring existing ones bad and obsolete? And do we have to be afraid of audio history once again ignoring actual performance and moving in a new direction forced by “external reasons”? Indeed, once the mainstream goes one way, accepting a particular attitude, it is hard to stop it and turn things back. Remember, when digital audio knocked analogue records out, there was no way back!

Going USB

With USB, things apparently look the same. Presented to the audio community as flawless (still) new technology, it gained a stream of followers in a short time, and we can now relatively safely diagnose this as a trend. Unfortunately, the reasons behind the trend are again a bit outside the technology and its understanding. Now, however, totally opposite statements regarding the performance of this technology can be heard, both those claiming its superiority and flawlessness, and those claiming its uselessness for any serious audio purpose. And this applies to the objective technical performance! The mess surrounding its subjective, i.e. sonic, qualities is even bigger.

Before proceeding further, one additional note may be useful. Those who believe in the ultimate relationship between the perceived sound quality of a digital audio device and the quality of its clocking scheme (i.e. low sampling jitter), and who really took the “trouble” of thinking about ways to accomplish the best results in this domain, are probably aware of one plain simple fact: something, at the end of the chain, always must clock the D/A converter, and it is the quality of this clock that determines what you finally get. And to get the best results in this regard, what can be better than the classic, old fashioned, integrated CD player, where the system clock, designed without any constraints and for ultimate jitter performance, feeds the D/A converter directly, and where everything else is simply synchronized, i.e. slaved, to it? OK, in fact, you don’t regularly see such a scheme in actual CD players, but it doesn’t change the point. You can also, if you want, imagine a “CD player” as a multi box unit, since this is not about the number of boxes but about the general architecture. (One box is preferred for easier clock routing though.) What does matter is a master clock, ideally a crystal oscillator, that feeds the D/A converter directly. And not a PLL forced to retrieve the clock from coded protocols.

It should be further noted that the CD reading process is set up in such a way that it easily corrects errors, and it very rarely produces jitter of its own. Even the “pit jitter” doesn’t in fact translate into “sampling jitter”, as shown by Dennis, Dunn and Carson. [1] Jitter possibly associated with the CD transport (as a whole) rather comes from the other parts of the chain, which are in any case not avoided in any chain: the crystal oscillator itself, including its distribution scheme, supply, layout… So, the idea of the PC as an advanced or even ideal audio source because “it removes problems of data errors and jitter associated with CD reading” may be seductive as such, but is not really founded in reality.

Having said that, what are S/PDIF, USB, or any other external interface for? “For convenience” is of course a good answer, but before proceeding to analyze the current state of performance of USB based digital audio reproduction systems, it is important to understand this: nothing can have as uncompromising a clocking scheme, and thus as low jitter, as an integrated CD player can. External interfaces do bring convenience, but they don’t bring improvements in performance. With respect to performance, they rather bring only additional problems.

The convenience that a PC can offer should not be neglected though, and USB further offers a plug ‘n’ play approach with fair stability in use. (OK, PCs are used, and will always be used, for redundant purposes, but that is the other part of the story.) The best USB scheme would be the equivalent of the scheme explained above: with the system clock placed in the USB D/A device, and the PC (i.e. its USB port) slaved to this USB device. A couple of manufacturers claim to utilize such a scheme in their most recent USB audio devices; however, up to this moment no detailed info or performance data has been published. Anyhow, regularly available USB decoders don’t work this way and cannot slave the PC; they retrieve a sampling clock from the existing USB stream.

It may look as if this leaves us at the same problem as with S/PDIF, but it does not. S/PDIF uses Biphase Mark (“Manchester”) coding, which embeds the clock into the data, and for this reason its major problem is the difficulty of retrieving this clock back without it being phase modulated by the data; in other words, S/PDIF principally suffers from data related jitter.
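The biphase mark idea can be sketched in a few lines of Python (a purely illustrative model, not anyone’s actual implementation): every bit cell starts with a transition, which is what carries the clock, and a ‘1’ adds a second transition mid-cell.

```python
# Illustrative sketch of biphase mark coding as used by S/PDIF.
# Every bit cell begins with a level transition (the embedded clock);
# a '1' bit adds an extra transition in the middle of the cell.

def biphase_mark_encode(bits, level=0):
    """Return two half-cell line levels per input bit."""
    out = []
    for bit in bits:
        level ^= 1            # transition at every cell boundary (the clock)
        out.append(level)
        if bit:
            level ^= 1        # extra mid-cell transition encodes a '1'
        out.append(level)
    return out

encoded = biphase_mark_encode([1, 0, 1, 1])
# Because the data decides where the mid-cell transitions fall, the
# receiver's clock recovery PLL sees a data-dependent edge pattern --
# the root of S/PDIF's data related jitter.
```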

USB is quite different. It uses Non-Return to Zero Inverted (NRZI) coding (to save bandwidth), which essentially doesn’t embed a clock. In some ways, this may make the final results independent of the actual jitter of the USB stream. The D/A however normally needs to trigger its output by a clock that is somehow synchronized to the data, and it is up to the USB decoder to provide, i.e. to recover, this clock. In practice, unless the USB device is a master device that controls the PC host, requesting the data as needed and according to the speed of its own (free-running) clock, it will need to recover the clock from the USB stream.
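For contrast, here is the same kind of sketch for NRZI (again only an illustration): a ‘0’ toggles the line, a ‘1’ leaves it alone, so long runs of ones produce no edges at all, which is exactly why the code carries no usable clock.

```python
# Illustrative sketch of NRZI as used by USB: a '0' toggles the line
# level, a '1' leaves it unchanged. A run of ones therefore carries no
# edges at all (USB adds bit stuffing after six consecutive ones for
# exactly this reason), so no clock is embedded in the waveform.

def nrzi_encode(bits, level=1):   # assume the line idles high
    out = []
    for bit in bits:
        if bit == 0:
            level ^= 1            # transition encodes a '0'
        out.append(level)         # a '1' produces no transition
    return out

encoded = nrzi_encode([0, 1, 1, 0, 0, 1])
```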

Actual USB decoders do this task more or less successfully, and the performance of available USB decoders, in reality, differs markedly. A short overview follows.

USB decoders

So, starting from the bottom, the worst jitter performer I’ve come across so far is the Philips UDA1321. It actually showed the highest jitter I have ever measured, not only in PC audio but in audio generally: in fact, when I saw this result, my first thought was that something was wrong… but it wasn’t: its jitter is about 1ms (yes, millisecond). And it is not only a relatively high absolute value, but it is also mostly not benign 1/f jitter. (I thought this would be a good candidate for the world record, but when I posted this result a few months ago to diyhifi.org I was told that some of the first DVD players actually had even higher jitter.) Yet, this chip is used in some pieces marketed as “audiophile” ones. One will also come across enthusiastic owners reporting that they’ve sold all the expensive classic equipment they used to have, finding a USB PC system based on this device superior, and that tweaks (cryogenic treatment and mechanical damping…) “bring it even to the higher level”. It may look like a total mess, but one doesn’t have to wonder when we find the same enthusiastic owners of the UDA1321 explaining how all this PC stuff is good “because it is jitter free”.

Fig. 1: Jitter performance of UDA1321 based USB decoder

And taking such events into account, the caution regarding PC audio shown by many serious audio consumers is understandable. Fortunately, PC stuff can indeed do better.

According to the measurements posted by John Westlake to diyhifi.org in 2006, the C-Media CM108 is better, but still with a relatively high figure of 3ns, and with the probably more important problem of a bimodal recovered clock distribution.

As shown by the same author, the TUSB3200 from Texas Instruments reveals a somewhat different design approach; its overall recovered clock jitter is a bit lower @ 2ns, and it employs so-called spread spectrum.

Additionally, a measurement of the TAS1020 was submitted by Ergo Esken. This chip apparently performed worse than the TUSB3200. That being said, it is not clear to what extent it was the programming that determined the results achieved by these Texas chips.2

The PCM2704/5/6/7 units confirm Texas Instruments’ leadership in this domain. This series apparently performs better than any of the chips above, and it shows neither bimodal distribution nor spread spectrum, nor any other symptoms of an apparently poor PLL or small buffer. I ran the frequency analysis of the J-signal, using a PC running under Windows XP, and I used the Foobar2000 player with ASIO output. The result shows relatively wideband jitter content (supposedly benign) with a total amount I estimated at about 1ns (RMS), and with very low data related artifacts – the point at which a USB link apparently easily overcomes S/PDIF related issues. The shape of the wideband content is not entirely “random” (Gaussian) but includes a certain unevenness, even some parts that can be considered discrete ones, and the prominent parts in fact float a bit over time. The two curves shown below were captured within a couple of tens of seconds. (This effect is in my experience associated with a certain asynchronous operation of the source and conversion clocks. More on the internal structure of the PCM2704/5/6/7, including its “Sampling Period Adaptive Controlled Tracking System” (SpAct), can be found in [2] and [3].)

Also, the jitter content still shows a couple of low frequency artifacts, as well as a certain 1/f skirt; however, it is still much lower than that of the previously considered chips. The absolute number of 1ns is still too high to be considered a definitive solution though. Also, the total amount I estimated based on the frequency analysis didn’t completely match the result reported by John Westlake, who used the very same printed circuit board of mine to analyze the PCM2706 MCK output directly. (It isn’t necessarily the full explanation, but the difference may be caused by the use of a different PC; we’ll get back to this topic a bit later.)

Fig. 2: Jitter performance of PCM2706 based USB decoder

So, why do these things look like that?

The best feature of the USB interface, its “non-return to zero inverted”, “clockless” protocol, means that the results may be free of the source jitter, but at the same time this is its main problem: unless the host PC is slaved to the peripheral device, it is not clear how to synchronize the peripheral device to the host PC. This is of no concern for most purposes that USB is normally used for, but here it is, because conversion is performed in real time. Data buffering comes to mind as a solution, but unless complete files are buffered, the buffer will either overflow (if the PC clock is faster) or run empty (if the PC clock is slower).
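The buffer problem is easy to demonstrate numerically. The following sketch (with arbitrary illustration values for rates and buffer size, not taken from any real device) models a finite FIFO fed by the host clock and drained by the DAC clock:

```python
# Minimal model of FIFO drift between two free-running clocks: the host
# produces samples at its own rate, the DAC consumes at its own, and any
# mismatch steadily drains or fills a finite buffer.

def simulate_fifo(host_rate, dac_rate, buffer_size, seconds):
    fill = buffer_size / 2            # start half full
    for _ in range(seconds):
        fill += host_rate - dac_rate  # net samples gained per second
        if fill < 0:
            return "underrun"         # DAC clock faster: buffer runs empty
        if fill > buffer_size:
            return "overrun"          # host clock faster: buffer overflows
    return "ok"

# A mere 100 ppm clock mismatch at 44100 Hz drifts ~4.4 samples per
# second, so even a generous buffer is exhausted within a few minutes.
print(simulate_fifo(44100 * 1.0001, 44100, 2048, 600))   # prints "overrun"
```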

One way to solve this is to design the audio unit to use USB packets, which are sent every millisecond, as a time base. This, unfortunately, brings exactly what was meant to be avoided: results totally dependent on the source jitter, and very high jitter should be expected.

However, the time interval of USB packets can also be used only as a basis for a PLL, which locks to it, slightly adjusting its own frequency, and thus achieves a certain jitter immunity with regard to the actual jitter of the USB stream. Such use of a PLL, which works together with a certain amount of data buffering, is often called the “adaptive” USB mode, and is what the majority of today’s USB devices are using. Designing such a PLL is however not a trivial task. Further, it normally needs some data buffer, i.e. memory, and memory is not cheap. As reported, results achieved by different USB decoder manufacturers differ both in overall quality and in nature, depending on the approach of the particular manufacturer and their abilities, but apparently also on their understanding of the (un)important parameters.
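The adaptive idea can be sketched as a first-order control loop (a hedged toy model; the numbers and the loop structure are illustrative, not from any actual decoder): instead of following the jittery 1ms frame timing directly, the local clock is only nudged a little toward the long-term average frame rate, filtering out the fast timing noise.

```python
# Toy model of an "adaptive" clock recovery loop: the local period
# estimate is corrected by only a small fraction (the loop bandwidth)
# of each measured frame-timing error, so fast jitter is averaged out.

import random

def adaptive_clock(frame_periods_ms, bandwidth=0.01):
    period = 1.0                      # local estimate of the 1 ms frame period
    history = []
    for measured in frame_periods_ms:
        period += bandwidth * (measured - period)  # first-order loop
        history.append(period)
    return history

# Nominal 1 ms frames with heavy, random timing noise from the host:
random.seed(0)
frames = [1.0 + random.gauss(0, 0.01) for _ in range(5000)]
estimates = adaptive_clock(frames)
print(max(abs(p - 1.0) for p in estimates[1000:]))
```

The recovered period wanders far less than the raw frame timing; the narrower the loop bandwidth, the better the jitter rejection, but the more buffering is needed while the loop catches up.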

How about the PC used as an S/PDIF source?

Checking this may seem redundant at the moment, but it is not. The previous paragraph in fact points out why the final jitter is dependent on the jitter of the source. So I ran the same test again to see if this jitter is associated with the PC itself, rather than with its USB interface as such. I used the same DAC (The Model), but this time measuring both via its USB and S/PDIF inputs, the latter being fed by the soundcard (E-MU) hosted by the same PC. Indeed, the bulk of this jitter figure is actually related to the PC, since these two curves have very much in common. (Another change in the shape of the USB associated jitter displays another shift, as pointed out earlier.)

Fig. 3: Jitter of PC USB port vs S/PDIF output of E-MU soundcard

Improving on the PC side 3

Efforts to improve in this domain can be concentrated on several main problems.

Jitter

As we have just seen, the jitter performance of the USB port is relatively similar to the jitter performance of the S/PDIF output of the soundcard hosted in the given PC. So, if we were using a different PC, would this performance also be different? Can some PCs perform better than just shown?

The previous graph shows the performance of the USB input of The Model. It was made using one PC both as the USB source and as the host for the measurement A/D front end. This was done intentionally, to lower possible common mode and ground loop artifacts caused by having two PCs in the chain. However, to find the answer to this last question, I used another PC to source the USB signal to the same USB input of The Model. This second PC again runs under Windows XP and uses the Foobar2000 player. The same first PC is again used for measurement. Jitter related artifacts (red curve) dropped visibly. In fact, in most of the audio band they got down to the noise floor of the whole setup, which in turn expectedly increased as a consequence of the use of two PCs (black curve – the measurement is taken in the absence of the audio signal).

Fig. 4: Jitter performance of PCM2706 driven by another PC

One would normally want to investigate deeper into the PC architecture for the exact causes of these differences, and consequently for the possibilities to further improve on this performance, but that shall remain out of the scope of this article.

Noise

One will normally also want to further investigate possibilities to lower the noise floor, searching for its causes. Going deeper in this direction is again out of the scope of this article, but the following graph may be indicative. I first turned the source PC, i.e. its operating system, off, but it didn’t lower the noise floor significantly (blue curve). Then I turned the main PC mains switch off, and it again helped only slightly, in the 15-16kHz area (black curve). Only completely unplugging the mains cable from the wall brought the noise floor down to the limit of the measuring setup (red curve). Please note that this wasn’t a ground loop, since only the measurement PC was using the connection to the safety earth.

Fig. 5: PC PSU noise

Further investigation would tell whether some part of the supply worked even when the switch was turned off, or whether this was solely common mode noise, but this should be enough to make the importance of noise issues quite obvious. In fact, this doesn’t look like anything new but, unfortunately, despite all the PC tweaks proposed these days, the problems in this area remain more or less totally intact.

PC noise related issues however are not even remotely limited to the noise shown in the graph above, i.e. to the noise within the audio band. They are probably not even primarily related to the noise within the audio band. In fact, most audio gear has relatively reliable power supply rejection within the audio band. Consequently, many advanced modern PCI audio cards, which are both supplied by and located in the PC, achieve a vanishingly low noise level within the audio band. (Actually, the soundcard used for these measurements is exactly such a card, and its noise floor is well shown by the last curve of the last graph.) However, high frequency noise disturbs our audio circuits just the same, and many times it is the actual reason for their disappointing subjective performance.

According to the feedback I’ve had from Audial DAC users, in some cases decoupling for noise appeared more important than jitter performance. In other words, an external USB to S/PDIF converter feeding the S/PDIF input of the DAC soundwise worked better than a direct connection to the USB input of the DAC, with the USB decoding hardware being more or less equal. (As opposed to their USB inputs, the S/PDIF inputs of Audial DACs provide galvanic (transformer) decoupling.4) This may be further emphasized by the known susceptibility of the TDA1541A to high frequency noise.

The problem, as I understand it, and putting aside ground loop problems, which would have to be addressed additionally, comes from two major groups of contributors:

1. Supply noise

PCs almost without exception use switching mode power supplies, including battery supplied laptops, since these also employ DC-DC converters. The use of such supplies means major savings in big industry manufacturing. On the other hand, savings per item don’t mean much in the world of high quality audio. One may talk about the improved noise performance of the recent SMPS generation; however, there is no guarantee with regard to the actual performance of the units available. Obviously, this suggests a move to a quality linear supply. Additionally, it would solve the problem of noise caused by the PC supply fan (both the mechanical noise, and the electrical noise dumped back into the supply line). The current lack of adequate commercial products however requires some kind of DIY solution.

2. Noise generators like ICs and components with mechanically rotating parts (fans, HD)

These parts dump switching currents back into the ground, thus disturbing the system ground (“ground bounce”). As USB galvanically couples the PC ground to the USB peripherals, the DAC’s ground is directly violated. This is apparently harder to solve. The use of USB decoupling devices is suggested; however, the prices of the currently available devices significantly add to the overall system price, and reported results are not consistent. Solving the problem right at the start is recommended anyhow. The usual suggestions to use the most powerful PC machines may be wrong in this regard, since the noise generated by them is normally higher in frequency.

A lot of it, however, depends on the implementation. There is no doubt that computer hardware designers do a lot of very good work in this regard; however, the usual PC environment is still too noisy even for basic audio standards. Unfortunately, at this point there is quite a gap that severely limits a possible move forward: PC designers are usually not audio oriented, and audio guys are normally total strangers to PC architecture – and there is practically no communication between these two worlds.

Ground loops and cables

The use of two or more galvanically coupled devices, where each uses its own connection to the safety earth, is a recipe for ground loops. The use of switching supplies makes the use of the safety earth a must (otherwise the ground may shift notably above the earth voltage, sometimes even by 100 or more Volts). Optical isolators are sometimes recommended to break this loop; however, one has to be cautious with them. They are often not good jitter performers, and usual USB devices still use the USB stream to retrieve the sampling clock.

USB cables have to be as short as possible. Long cables won’t only worsen ground loops, but will also increase current induced common mode noise and bring the 1kHz USB packet frequency into the audio signal. Please see the third and fourth graphs in this post, which show the difference between 1m and 3m of the same cable.

Spread spectrum clock

This technique is used in some PCs to make it easier for them to pass FCC regulations. The graphs published here were made using PCs not employing a spread spectrum clock. The second PC however had the option to turn the spread spectrum on in the BIOS settings. So I tried turning it on to see the possible consequences. This may come as a surprise, but the use of the spread spectrum clock didn’t change anything about the USB jitter performance of this PC.

Software

Software has been among the more discussed USB PC audio issues. Like many others, I also found the software to contribute to the final sonic results but, as opposed to what you can read in some places, it has nothing to do with jitter. I went through many programs, including both those claimed to improve on jitter and those without any ambitions in this direction (WinAmp, Foobar2000, XXHighEnd, and a couple of other more or less known players), as well as through a couple of output drivers (kernel mixer, a few ASIO drivers), and found no trace of any relation between the jitter performance and the software used. In other words, unless one really screws up something on his own to make it perform unnecessarily badly, the software doesn’t change squat about jitter. I have to repeat that all these tests were performed on Windows XP and, strictly speaking, this statement would apply to the application layer and not to the operating system. However, there are no indications of things being different when it comes to the operating system. And Windows XP is what the vast majority of consumers use these days anyway.

Of course, the question is, if it is not jitter, what is it then? Unfortunately, up to now, to my knowledge, there has been no serious research in this domain, and I honestly think things here are at least as hopeless as, if not more hopeless than, those related to the hardware. For now, software thus remains practically exclusively an area for listening experiments.

Improving on the peripherals side

It will be up to the semiconductor manufacturers to improve on the performance of USB decoders, using more sophisticated PLLs and more powerful buffers, and/or software.

Nevertheless, three methods to improve on jitter performance on the side of peripheral devices are:

1. Making the USB device a master that controls the host PC, thus breaking the dependency of the final result on the PC. As said, some new USB audio devices are supposedly designed this way. Up to now, however, their performance hasn’t been published.

2. Use of a secondary PLL. This general traditional method is still not perfect, but has good merits.

3. Use of an asynchronous sample rate converter (ASRC). This general modern way to remove jitter unfortunately, rather than removing the jitter, embeds it into the data.
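Point 3 deserves a short illustration. The sketch below (a hedged toy model using linear interpolation and arbitrary numbers, far simpler than any real ASRC) shows the mechanism: the ASRC resamples the incoming data to the local clock, so input timing jitter no longer reaches the DAC clock; instead, any error in the rate-ratio estimate turns into amplitude errors baked into the sample values themselves.

```python
# Toy asynchronous resampler: output samples are interpolated from the
# input at positions advanced by an estimated rate ratio. Jitter in the
# ratio estimate shifts the interpolation positions, altering the sample
# values -- the timing error becomes part of the data.

import math

def asrc(samples, ratio_estimates):
    out, pos = [], 0.0
    for ratio in ratio_estimates:
        i = int(pos)
        frac = pos - i
        if i + 1 >= len(samples):
            break
        # linear interpolation between neighbouring input samples
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += ratio       # an error in 'ratio' shifts all later positions
    return out

sine = [math.sin(2 * math.pi * 0.01 * n) for n in range(1000)]
exact = asrc(sine, [1.0] * 900)                               # ideal ratio
jittered = asrc(sine, [1.0 + 1e-4 * math.sin(n) for n in range(900)])
```

The two outputs differ: the timing error is now in the data, where no downstream clock, however clean, can remove it.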

Audial D/A converters and USB

The first version of the Audial D/A converter The Model has been using the PCM2706 as the USB decoding engine, with no further jitter reduction or galvanic isolation. The latest Audial DAC, the AYA II, uses the same approach.

The feedback we’ve had up to now points out that in most cases it was the S/PDIF input that yielded superior performance. In some ways this is surprising, since the jitter associated with the PCM2706 decoder previously appeared to work well with the TDA1541A (as said, a similar kind of jitter is produced by the asynchronous reclocking which I’ve been using for a couple of years). Also, my original experiments with the USB interface, also based on the PCM2706 but with the TDA1543 D/A chip, brought performance that was at least equal to that of a good S/PDIF source. The main problem with the USB interface, when used with the TDA1541A, is a certain smearing, a kind of delta/sigma sonic signature. I am prone to associate such a problem rather with the sensitivity of the TDA1541A to noise coupling than with the linked jitter.

___________________

1 – The jitter performance of Cirrus S/PDIF receivers is actually better than that claimed by the manufacturer itself.

2 – According to the available data, some devices use this chip and proprietary firmware to slave the host PC.

3 – Many texts have been written on this topic in the last year or so. Most of them were, unfortunately, useless and even misleading.

4 (added in 2011) – This applies to the Audial DACs available back in 2008, The Model and AYA II. Later Audial DACs with USB inputs, D-09 and Model S USB, do isolate USB.

[1] Ian Dennis, Julian Dunn and Doug Carson – “The Numerically Identical CD Mystery: A Study in Perception versus Measurement”, AES, 1996

[2] Hitoshi Kondoh – “The D/A diaries”, Planet Analog, 2002-2005

[3] Hitoshi Kondoh (Texas Instruments Incorporated) – US Patent # 6,724,264: “Controller for a device with change in characteristics”, 2004

Copyright © 2008 Audial
