Why you should NOT buy ASUS hardware

After using my E8400-based system for almost 5 years, I decided it was time for an upgrade. I had been waiting for LGA 2011 processors for a while, and they had finally arrived. I wanted the i7-3930K. So I began looking at motherboards built with the X79 chipset.

There were a few requirements. Since I work with virtualization quite a lot, it needed VT-x, VT-d and a good amount of RAM, say 32GB, with the possibility to add even more later. So: 8 RAM slots, starting with 4x8GB. I also wanted two decent gigabit Ethernet interfaces; given the issues I've seen with Realtek chips, those were out of the question. Because of these requirements, my choice was rather limited.

So after comparing some boards and reading some reviews, I decided to go for the ASUS P9X79 WS motherboard. I wasn't 100% happy about it being ASUS, due to a few issues I'd had with their hardware before.

ASUS Striker Extreme
I once got an ASUS Striker Extreme from someone who didn't use it anymore, and I used it in a test machine at home. It was just horrible. Physically moving the case it was in was enough to make the motherboard stop booting. To fix this, I had to remove or disconnect just about everything that wasn't needed for the motherboard to boot, clear the BIOS by removing the CMOS battery, hope for it to boot, and then add the devices back one by one, trying to boot the system after each one. I wasted days of my life trying to get this broken crap to boot every other day. I did blame this on the nForce chipset, though.

ASUS M6Ne
Then there was my ASUS M6Ne laptop. I once caught this thing idling at 102C in Windows, with its Dothan 1.6 CPU. Under load, I could reach the same temperatures in Linux. To fix this, I hacked the kernel to lower the voltage for each of the CPU frequencies. I also contacted ASUS support about this issue, and they suggested bringing it in for "repair". Although I clearly stated that the problem was in the BIOS, and that lowering the voltages fixed it, all they did was apply new thermal paste and clean the dust out of the laptop. Of course this didn't fix anything. The kernel hack worked fine and I didn't want to send it in again, so I just forgot about the issue.

Unfortunately, I was stupid enough to give ASUS the benefit of the doubt, and I bought the P9X79 WS. After all, the board scored very well in several reviews. It was in the "workstation" class and came with a nice price tag, so I had hoped they had thoroughly tested it with features like VT-d.

Some time later, after assembling all the hardware and optimizing my kernel config for my new system, I ran into the first problem.

40 second hang during boot with VT-d enabled
When I enable VT-d in the BIOS and add intel_iommu=on to the kernel boot parameters, the system hangs for ~40 seconds during boot. It just sits there, doing nothing.

This kernel log snippet shows what's happening:

[ 0.873065] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 0.873088] dmar: DRHD: handling fault status reg 2
[ 0.873107] ata13: SATA link down (SStatus 0 SControl 300)
[ 0.873153] ata10: SATA link down (SStatus 0 SControl 300)
[ 0.873205] ata8: SATA link down (SStatus 0 SControl 300)
[ 0.873246] ata9: SATA link down (SStatus 0 SControl 300)
[ 0.873300] ata7: SATA link down (SStatus 0 SControl 300)
[ 0.873345] ata12: SATA link down (SStatus 0 SControl 300)
[ 0.874037] dmar: DMAR:[DMA Read] Request device [07:00.1] fault addr fff00000
[ 0.874037] DMAR:[fault reason 02] Present bit in context entry is clear
[ 0.892375] ata11: SATA link down (SStatus 0 SControl 300)
[ 0.895527] ata3.00: failed to get Identify Device Data, Emask 0x1
[ 0.895536] ata3.00: configured for UDMA/133
[ 0.895613] scsi 2:0:0:0: Direct-Access ATA OCZ-VERTEX2 1.28 PQ: 0 ANSI: 5
[ 0.895901] sd 2:0:0:0: [sdb] 351651888 512-byte logical blocks: (180 GB/167 GiB)
[ 0.896073] sd 2:0:0:0: [sdb] Write Protect is off
[ 0.896076] sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 0.896102] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 0.896164] scsi 4:0:0:0: Direct-Access ATA WDC WD10EAVS-00D 01.0 PQ: 0 ANSI: 5
[ 0.896289] sd 4:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[ 0.896304] sd 4:0:0:0: [sdc] Write Protect is off
[ 0.896305] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 0.896311] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 0.897362] sdb: sdb1 sdb2
[ 0.897622] sd 2:0:0:0: [sdb] Attached SCSI disk
[ 0.905118] scsi 5:0:0:0: CD-ROM HL-DT-ST BDDVDRW CH10LS28 1.00 PQ: 0 ANSI: 5
[ 0.906554] sda: sda1 sda2
[ 0.906703] sd 0:0:0:0: [sda] Attached SCSI disk
[ 1.344731] sdc: sdc1 sdc2
[ 1.345050] sd 4:0:0:0: [sdc] Attached SCSI disk
[ 5.873047] ata14.00: qc timeout (cmd 0xa1)
[ 5.875111] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 6.182058] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 16.182048] ata14.00: qc timeout (cmd 0xa1)
[ 16.184119] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 16.184122] ata14: limiting SATA link speed to 1.5 Gbps
[ 16.491058] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 46.491171] ata14.00: qc timeout (cmd 0xa1)
[ 46.493173] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 46.800059] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 46.802126] registered taskstats version 1

While I don't really know how all this stuff inside a computer works, I was able to figure out what the problem was. The DMAR fault line contains a PCI address, 07:00.1, but according to lspci there is no device at this address. There is one at 07:00.0, however:

07:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9128 PCIe SATA 6 Gb/s RAID controller with HyperDuo (rev 11)
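The manual check described above can be scripted. You look up the address in the DMAR fault, notice that no device exists there, and find the function that does exist on the same slot. This is a minimal sketch, not something from the original post. It assumes PCI domain 0000 and reads the kernel log via `dmesg`, which may need root on some systems. It checks `/sys/bus/pci/devices` instead of parsing `lspci` output. The function names are hypothetical.

```python
import os
import re
import subprocess

# Matches the device in lines such as:
#   DMAR:[DMA Read] Request device [07:00.1] fault addr fff00000
FAULT_RE = re.compile(r"Request device \[([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\]", re.IGNORECASE)


def faulting_devices(log_text):
    """Return bus:dev.fn addresses named in DMAR fault lines, in first-seen order, without duplicates."""
    found = []
    for match in FAULT_RE.finditer(log_text):
        addr = match.group(1).lower()
        if addr not in found:
            found.append(addr)
    return found


def present_devices(sysfs_root="/sys/bus/pci/devices", domain="0000"):
    """Return the bus:dev.fn addresses that exist in sysfs for one PCI domain (assumed 0000)."""
    if not os.path.isdir(sysfs_root):
        return set()
    prefix = domain + ":"
    return {name[len(prefix):] for name in os.listdir(sysfs_root) if name.startswith(prefix)}


def phantom_devices(log_text, present):
    """Map each faulting address with no PCI device to the functions that do exist on the same bus:dev slot."""
    result = {}
    for addr in faulting_devices(log_text):
        if addr not in present:
            slot = addr.rsplit(".", 1)[0]  # "07:00.1" -> "07:00"
            result[addr] = sorted(d for d in present if d.startswith(slot + "."))
    return result


def main():
    try:
        log = subprocess.run(["dmesg"], capture_output=True, text=True, check=False).stdout
    except OSError as exc:
        print(f"could not run dmesg: {exc}")
        return
    for addr, siblings in phantom_devices(log, present_devices()).items():
        print(f"{addr}: no such PCI device; same slot has {', '.join(siblings) or 'nothing'}")


if __name__ == "__main__":
    main()
```

On a board like this one, the script would be expected to report 07:00.1 as missing, with 07:00.0 as the only function on that slot.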

Combine this with the SATA errors in the kernel log, and we have a suspect. So I disabled the Marvell SATA controller in the BIOS, but the kernel still detected it. Then I stumbled upon this post describing the exact same problem. The only difference was that the reporter had an MSI board with the same Marvell SATA controller. So I decided to contact ASUS about it, suggesting that a simple BIOS update might fix it. It's likely a bug in the Marvell BIOS, but since that is contained in the ASUS BIOS, it's ASUS that should fix it.

Here is what I sent them:

Apply date : 9/12/2012 2:23:10 PM(UTC Time)

[Contact Information]
*Name : Stijn Tintel
*Email Address : stijn@some.domain
Phone Number : +32487xxxxxx
City : Erps-Kwerps
*Country : Belgium (netherlands)[België (Nederlands)]

[Product Information]
*Product Type : Motherboard
*Product Model : P9X79 WS
*Product S/N : ST20230169xxxxx
Place of Purchase : Alternate.be
*Date of Purchase : 2012/7/16

[Motherboard Specification]
*Motherboard Revision : Rev 1.xx
*Motherboard BIOS Revision : 3101

[VGA Card Specification]
*VGA Card Vendor : ASUS
*VGA Card Model : HD7970-3GD5
*VGA Card Chipset : HD7970
*VGA Card Driver : Catalyst 12.8

[CPU Specification]
*CPU Vendor : Intel
*CPU Type : i7 3930K
*CPU Speed : 3200MHz

[Memory Specification]
*Memory Vendor : Geil
*Memory Model : GOC332GB2400C10AQC
*Memory Capacity : 4x8GB

[HDD Specification]
HDD Vendor :
HDD Model :
HDD Capacity :

[Add-on Card Specification]
Add-on Card Vendor :
Add-on Card Type :
Add-on Card Model :

*Operating System : Linux

[Problem Description]
When enabling VT-d, and enabling Intel IOMMU in Linux, I get a ~40 second freeze right after loading the kernel during boot. This is because the integrated Marvell controller reports the wrong PCI address.

As you can see in this lspci snippet, the Marvell controller is on PCI address 07:00.0.

07:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9128 PCIe SATA 6 Gb/s RAID controller with HyperDuo (rev 11)

However, it reports 07:00.1 to the IOMMU, causing this error and the ~40s freeze:

[ 0.774162] DMAR:[DMA Read] Request device [07:00.1] fault addr fff00000
[ 0.774162] DMAR:[fault reason 02] Present bit in context entry is clear

I found this thread on the Internet, where someone with an MSI board has the same problem. It suggested contacting the manufacturer: https://lists.linux-foundation.org/pipermail/iommu/2012-January/003554.html

So this is why I am contacting you to get this issue resolved. I suspect this issue can be fixed with a BIOS upgrade.

Thanks in advance for your reply.

The next day, I got this awful reply:

Dear Mr. Tintel,

Thank you for your email.

Do notice that Asus does not formally claim Linux support.
This because Linux support is not the same as Windows. This does not mean that our motherboards support Linux or not.

Our Research and Development Deparment only use very simple check procedures on Linux.
If it is no problem to install some specified Linux versions, we will claim support Linux support. For example, we may claim that we support Redhat Linux.
But this only means if we use specified Linux version, we can install OS without any error. So it relies on the Linux vendor.

So we cannot help with problems under Linux as we do not provide support for this.

I hope to have informed you enough.
Ik hoop u hierbij voldoende te hebben geinformeerd. (Dutch: "I hope to have hereby informed you sufficiently.")

Kind regards/ vriendelijke groet,

[removed name]
Asus TSD.

I can't even begin to describe how I felt. I just spent almost 1K EUR on their hardware, and they are basically telling me to go fuck myself. Not to mention the hilariously horrible English. I really wonder how this guy even passed the selection process.

This left me with very little hope of getting the more serious issues resolved with them.

Hard lock when rebooting a VM with a GPU assigned via VT-d
One of the things I wanted to test on this hardware was a virtual workstation with a real keyboard, mouse and monitor, using a GPU assigned to the VM via VT-d. This is actually pretty straightforward to set up, and I got it working in no time. Then I rebooted the VM, and the host system locked up, hard. Even the reset button no longer responded, and there was no output at all on the serial console. Later I found that pushing the reset button makes the system power down after 15 seconds or so and then power up again. Add the ~40 second hang during boot, and you have a very annoying issue to debug.

CPU Vcore voltage too high when set to auto
This one reminds me of the issue with my laptop. This board comes with a truckload of overclocking features anyway, though I still wonder why a workstation-class motherboard needs them. So I decided to see if I could push my maximum turbo frequency a bit higher. I changed the turbo multiplier to 44 or 45, for 4400 or 4500 MHz, but left the voltage on auto. The system came up fine, and I ran "openssl speed -multi 12" to put 100% load on all 12 logical CPUs. The temperature went up rather quickly. When I saw it at 84C, I got worried and checked the VCore voltage. It was at 1.47V! This is dangerous for the processor, and it is totally unacceptable that the auto voltage setting has no reasonable upper limit. I wonder what would have happened if I didn't have a liquid cooling system.
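The post doesn't say which tool reported the VCore reading. On Linux, the hwmon sysfs interface exposes sensor values under `/sys/class/hwmon/hwmon*/`. Temperatures appear as `tempN_input` files in millidegrees Celsius, and voltages as `inN_input` files in millivolts. Below is a minimal sketch for checking these values while a load such as `openssl speed -multi 12` runs in another terminal. It assumes that a hwmon driver for the board's sensor chip is loaded. Which `inN` channel corresponds to VCore, if any, depends on the chip, and the script cannot determine it. The 80 C threshold and the function names are hypothetical.

```python
import glob
import os


def read_value(path):
    """Read an hwmon *_input file (milli-units) and return base units, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip()) / 1000.0
    except (OSError, ValueError):
        return None


def read_text(path, default):
    """Return the stripped contents of a small sysfs text file, or a default if it is missing."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return default


def hwmon_readings(kind, root="/sys/class/hwmon"):
    """Map 'hwmonN/chip/label' to a value for every <kind>*_input file.

    kind="temp" yields degrees Celsius; kind="in" yields volts.
    """
    readings = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", kind + "*_input"))):
        chip_dir = os.path.dirname(path)
        chip = read_text(os.path.join(chip_dir, "name"), "unknown")
        sensor = os.path.basename(path)[: -len("_input")]  # e.g. "temp2" or "in0"
        label = read_text(os.path.join(chip_dir, sensor + "_label"), sensor)
        value = read_value(path)
        if value is not None:
            readings["/".join([os.path.basename(chip_dir), chip, label])] = value
    return readings


def over_limit(readings, limit):
    """Return only the readings strictly above the given limit."""
    return {name: value for name, value in readings.items() if value > limit}


if __name__ == "__main__":
    TEMP_LIMIT_C = 80.0  # hypothetical threshold, for illustration only
    for name, volts in hwmon_readings("in").items():
        print(f"{name}: {volts:.3f} V")
    for name, temp in over_limit(hwmon_readings("temp"), TEMP_LIMIT_C).items():
        print(f"WARNING {name}: {temp:.1f} C")
```

This takes a single snapshot and only reports values; it does nothing to cap the voltage the BIOS applies.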

So, yet another broken ASUS BIOS. This is it: I'm done with ASUS, and I will never buy anything from them again. I suggest everyone do the same, especially if you run Linux.

The one thing I still need to decide on: keep using this board and cope with its issues, or fix it with a hammer, and buy something from a decent brand instead. I did work around the hang during boot. Should you have the same issue, have a look at this kernel bug report, and please do comment on/vote for it.

EDIT: there is a patch that solves the 40-65s hang; see patchwork.
