In the meantime, AMD has released Ryzen Pro CPUs, supposedly with official ECC support, but the dealers we buy from offer only a few of them (and no Ryzen Pro 3xxx models). So we again relied on the unofficial ECC support of the Ryzen 3900X.
- 1 x ASUS TUF B450M-Plus Gaming (90MB0YQ0-M0EAY0)
- 1 x AMD Ryzen 9 3900X, 12x 3.80GHz, boxed (100-100000023BOX)
- 2 x Kingston Server Premier DIMM 16GB, DDR4-2666, CL19-19-19, ECC (KSM26ED8/16ME)
- 1 x Samsung SSD PM883 480GB, SATA (MZ7LH480HAHQ-00005)
- 1 x Micron 5200 ECO 480GB, SATA (MTFDDAK480TDC-1AT1ZABYY)
- 1 x Intel Gigabit CT Desktop, RJ-45, PCIe 1.1 x1, bulk (EXPI9301CTBLK)
- 1 x Sapphire Radeon HD 6450, 1GB DDR3, VGA, DVI, HDMI, lite retail (11190-02-20G)
- 1 x LC-Power 2014MB, black (LC-2014MB-ON)
- 1 x Xilence Performance A+ Series 530W ATX 2.4 (XP530R8/XN061)
After several months, we upgraded the RAM by replacing both 16GB DIMMs listed above with 4 x Samsung DIMM 32GB, DDR4-2666, CL19-19-19, ECC (M391A4G43MB1-CTD), for a total of 128GB of RAM. You can get links and more information for the 128GB configuration here.
We chose the ASUS TUF B450M-Plus Gaming mainboard because of its ECC support (and to buy something different after an Asrock board died in the last iteration). Unfortunately, this board exceeds the normal power limits of the CPU: the CPU has a 105W TDP, which should limit power to ~140W as long as cooling is sufficient. Yet we saw a 190W difference in mains power between idle and full CPU-only load. Given the results below, apparently only the CPU temperature limits power consumption, and if we had used a more powerful cooler (we used the boxed one), we might have seen even higher power consumption; now combine that with a 3950X... This makes me wonder how long the board's voltage regulators can survive this load.
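As a rough cross-check of the 190W figure, the following sketch converts the CPU's package power limit into mains power. It assumes a stock package power limit (PPT) of 142W for 105W-TDP AM4 CPUs and a combined VRM and PSU efficiency of 80-90%; neither number was measured by us, and the helper name wall_estimate is ours. Because our measurement is a difference over idle, the estimate is an upper bound for what a board that respects the PPT should show.

```bash
#!/usr/bin/env bash
# Upper bound on the idle-to-load mains-power difference if the CPU stayed at
# its stock package power limit. ASSUMPTIONS (not measured here):
#   - PPT of a 105W-TDP AM4 CPU is 142W
#   - VRM and PSU together are 80-90% efficient
# The real difference must be smaller, because idle power is subtracted.

# wall_estimate PPT_W EFFICIENCY -> watts at the wall, rounded to an integer
wall_estimate() {
  awk -v p="$1" -v e="$2" 'BEGIN { printf "%.0f\n", p / e }'
}

ppt_w=142
measured_delta_w=190
for eff in 0.80 0.85 0.90; do
  echo "efficiency $eff: at most $(wall_estimate "$ppt_w" "$eff")W (measured: ${measured_delta_w}W)"
done
```

Under these assumptions the bound lands between roughly 158W and 178W, still below the 190W we measured, which is consistent with the board letting the CPU exceed its PPT.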
The board's BIOS offers no way to limit power directly. You can limit the CPU temperature: a limit of 70°C resulted in the same 190W difference at first, but it fell to a 135W difference in our build after several minutes, at 5% lower performance. A limit of 45°C cut power consumption drastically (~65W difference), but also roughly halved the performance.
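To compare these settings, the sketch below divides the mains-power difference by the relative performance, which gives the energy per unit of work (above idle) relative to the unlimited setting. It assumes the 135W steady-state figure for the 70°C limit (ignoring the initial 190W phase) and treats "roughly halved" as exactly 0.5; the helper name energy_ratio is ours.

```bash
#!/usr/bin/env bash
# Relative energy per unit of work, using the measured idle-to-load
# mains-power difference as the power figure.
# ASSUMPTIONS: steady-state power only; "roughly halved" performance = 0.5.

# energy_ratio DELTA_W PERF BASE_DELTA_W -> energy per work relative to base
energy_ratio() {
  awk -v w="$1" -v p="$2" -v b="$3" 'BEGIN { printf "%.2f\n", (w / p) / b }'
}

base=190
while read -r label watts perf; do
  echo "$label: ${watts}W, performance $perf, energy per work $(energy_ratio "$watts" "$perf" "$base")"
done <<'EOF'
unlimited 190 1.00
limit-70C 135 0.95
limit-45C 65 0.50
EOF
```

With these inputs, the 70°C limit needs about 25% less energy per job for 5% lower performance, while the 45°C limit saves only a little more (about 32%) at twice the run time.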
The board's manual is unsatisfactory: it lacks proper descriptions of the connectors.
The CPU comes with a cooler and fan with a nice (though completely unnecessary for our purposes) light show, and we used this cooler.
We chose two SSDs with power-loss protection from different manufacturers (to reduce common-mode failures).
The case is very light and probably not particularly sturdy, but good enough for our purposes. It has nice rubber feet.
Upgrading the RAM was interesting. We first put in all 4 DIMMs: 1 long and 2 short beeps (RAM problem). Then we took out two DIMMs, and 32GB were recognized. With only 1 DIMM: RAM problem. With 2 DIMMs in different slots: 64GB; after installing the third DIMM: 96GB; after installing the fourth: 128GB. My guess is that the DIMMs were not seated correctly at first (although nothing felt wrong while inserting them). Note that the board first checks the RAM for a while before producing any beep or video output.
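Once the machine boots, one way to check how many DIMMs the firmware recognized is to sum the module sizes that dmidecode reports from the SMBIOS tables. This is a sketch: the helper total_dimm_gb is hypothetical, and it assumes dmidecode's usual per-slot "Size: 32 GB" lines (older versions print "Size: 32768 MB", empty slots "Size: No Module Installed").

```bash
#!/usr/bin/env bash
# Sum the installed DIMM sizes from "dmidecode -t memory" output on stdin.
# ASSUMPTION: one "Size:" line per slot in GB, MB or TB; the anchored pattern
# skips related lines such as "Volatile Size:" or "Range Size:".
total_dimm_gb() {
  awk '
    /^[[:space:]]*Size:/ {
      if ($3 == "GB") { total += $2; dimms++ }
      else if ($3 == "MB") { total += $2 / 1024; dimms++ }
      else if ($3 == "TB") { total += $2 * 1024; dimms++ }
    }
    END { printf "%d DIMMs, %d GB\n", dimms, total }'
}
```

Usage, as root: `dmidecode -t memory | total_dimm_gb`; with all four 32GB DIMMs seated, this should report 4 DIMMs and 128 GB.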
dmesg output:

    [ 8.658747] EDAC amd64: Node 0: DRAM ECC enabled.

The Debian 10 (buster) kernel (4.19) does not report this, but the 5.4 kernel from buster-backports does.
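To check for this message without reading through the whole log, a small filter like the following can be used. It assumes the message format shown in the log line above, and we assume the driver prints "disabled" instead of "enabled" when ECC is off; the function name ecc_status is ours.

```bash
#!/usr/bin/env bash
# Report what the amd64_edac driver said about DRAM ECC at boot.
# Usage (as root, or with dmesg access): dmesg | ecc_status
# ASSUMPTION: message format "EDAC amd64: Node N: DRAM ECC enabled."
ecc_status() {
  local re='EDAC amd64: Node ([0-9]+): DRAM ECC (enabled|disabled)'
  local line found=0
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      echo "node ${BASH_REMATCH[1]}: ${BASH_REMATCH[2]}"
      found=1
    fi
  done
  if (( found == 0 )); then
    # e.g. driver not loaded, or a kernel (like 4.19) that does not log this
    echo "no amd64_edac ECC message found"
    return 1
  fi
}
```

The non-zero exit status when nothing is found makes it usable in scripts, e.g. after a kernel upgrade.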
Note that the currently available motherboards are not designed for servers, so they may lack features you are interested in. In our case, we miss slightly better ECC support and on-board graphics; we decided to live with the limited ECC support and use a discrete graphics card.
The components we use for our server are (we built two similar ones, with the components of the smaller one in parentheses, if they differ):
- CPU: Ryzen 7 1800X (Ryzen 5 1600X)
- Motherboard: Asrock A320M Pro4
- RAM: 4 Kingston ValueRAM Server Premier DIMM 16GB, DDR4-2400 (2 of these DIMMs)
- Cooler: Thermalright AXP-200R ROG
- Case and PSU: LC-Power 2002MB, 300W ATX 2.2
- Graphics: Sapphire Radeon R5 230
- Ethernet (2nd port): Intel Gigabit CT Desktop Adapter
- Mass Storage: Intel SSD DC S3520 480GB, Seagate Nytro XF1230 480GB (Western Digital WD Purple 2TB, Seagate SkyHawk 2TB)

Some comments on the components:
Motherboard: You may wonder about the low-end A320-based board, but it has everything we need (apart from a second Ethernet port). If you want to overclock, you need a B350-based board, though; but who wants to overclock a server? We chose an Asrock board because Asrock and ASUS are reported to have the best ECC support among AM4 boards.
Cooler: We decided on a relatively small case, which eliminated powerful tower coolers from the selection. This cooler fits in this configuration only with the ends of the heat pipes oriented towards the back of the case (I/O panel). The part that holds the cooler to the CPU is designed to lock with the stabilizing wires on the cooler, but that does not fit (the heat pipes collide with the mounting frame), so we went without this locking (which should not hurt, given that we don't move our servers a lot). Also, at first it looked as if the supplied back plate collided with some components on the board, but once we put the plastic washers in the right place between the board and the back plate, this proved not to be a problem.
Mass Storage: For SSDs we decided on SATA models with power-loss protection, and for both SSDs and hard disks, we chose two models per server from two different manufacturers (to be used in a RAID1; we have experienced drives from the same manufacturer failing at the same time). While PCIe M.2 SSDs are all the rage, and the board has space for two M.2 SSDs (apparently only one of them supports PCIe), we chose SATA so that we can also access the drives on our legacy machines if necessary.
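Since the point of mirroring drives from different manufacturers is to survive one of them failing, it also helps to notice promptly when a mirror has lost a member. Assuming Linux software RAID (md), the hypothetical helper raid_health below parses /proc/mdstat and flags arrays whose member map (e.g. [U_]) shows a missing device. It assumes the usual mdstat layout; mdadm's own --monitor mode is the standard tool for this job.

```bash
#!/usr/bin/env bash
# Flag degraded md arrays in /proc/mdstat-formatted input on stdin.
# ASSUMPTION: each array starts with a line "mdN : ..." and a later line ends
# with a member map like [UU] (all present) or [U_] (one missing).
raid_health() {
  awk '
    /^md[0-9]+ :/ { dev = $1 }
    match($0, /\[[U_]+\]/) {
      state = substr($0, RSTART, RLENGTH)
      if (index(state, "_")) { print dev ": DEGRADED " state; bad = 1 }
      else print dev ": ok " state
    }
    END { exit bad }'
}
```

Usage: `raid_health < /proc/mdstat`; it exits with status 1 if any array is degraded, so it can be called from cron.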
The components cost a little shy of EUR3000 (including 20% VAT) in July 2017, with the components for the big box being about EUR1900, and the components for the smaller box a little over EUR1000.
For the smaller machine (2 DIMMs), we did everything they did in Linux and got pretty much the same results, with a few differences: we changed the timings to DDR4-2400 13-13-13-13-21 to provoke correctable errors, and the machine soon crashed.
For the bigger machine (4 DIMMs), we saw the EDAC entries reporting ECC, but I had a hard time finding timings that would run, yet produce errors reported by EDAC. Eventually I found that changing the first two parameters can easily cross the border into Crashland (in one case we had to take out the CMOS battery to get back to sane BIOS settings), while varying the third and/or fourth parameters (Trcdwr, Trp) resulted in a setting that was stable enough to run, yet also produced (correctable) ECC errors; the setting I used was 14-14-11-10-21. I first tested with "stress -m 50" and (on a RAM disk) with "stress -m 50 -d 50 --hdd-bytes 100M"; this produced reports of correctable ECC errors. To test whether the correction actually works, we then ran "memtester 60G" (as root); this produced correctable-error reports at a slower rate than stress (often 5 minutes between reports), but in more than 1 hour of memtesting (with over 10 errors corrected), memtester reported no errors, so the correction appears to work.
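The EDAC core also exports per-memory-controller error counters in sysfs (ce_count for corrected, ue_count for uncorrected errors), which makes it easy to compare the numbers before and after a stress or memtester run. A minimal sketch; the function name edac_counts is ours, and the optional argument only exists to allow testing against a fake directory.

```bash
#!/usr/bin/env bash
# Print corrected (ce) and uncorrected (ue) error counts for each memory
# controller that the EDAC core exposes under sysfs.
edac_counts() {
  local root=${1:-/sys/devices/system/edac/mc} mc
  for mc in "$root"/mc[0-9]*; do
    [ -r "$mc/ce_count" ] || continue   # no EDAC controller (or glob unmatched)
    echo "$(basename "$mc"): ce=$(<"$mc/ce_count") ue=$(<"$mc/ue_count")"
  done
}
```

For example, run `edac_counts`, then `stress -m 50 --timeout 600`, then `edac_counts` again; with timings like the ones above, we would expect ce to rise while ue stays at 0.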
We have not yet done proper measurements for the 1800X box, but the first impression is 38W idle, i.e., a little less when idle (thanks to the SSDs, and obviously AMD now implements power-gating of idle cores well), and quite a bit more under load (I have seen 180W, which makes me wonder whether the CPU stays within its TDP).
Anyway, if you have this problem and want to disable SMT, you can do so at the BIOS level; alternatively, you can ask the Linux kernel to use only one logical thread per core. For our Ryzen 1xxx CPUs, you can do it like this (as root with bash):
    for i in /sys/devices/system/cpu/cpu[0-9]*; do
      if test $(( ${i##*cpu} % 2 )) = 1; then
        echo 0 >"$i/online"
      fi
    done

(Note that the logical cores are numbered differently on Intel CPUs, so you need to adapt this loop for Intel CPUs.)
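Instead of relying on a particular numbering scheme, a variant can read each logical CPU's topology/thread_siblings_list from sysfs and take offline every CPU that is not the first thread of its core. This is a sketch (the function smt_offline is hypothetical); it should cover both the AMD numbering assumed above and the Intel one, but treat it as untested on real hardware.

```bash
#!/usr/bin/env bash
# Take offline every logical CPU that is not the first thread of its core.
# Run as root; the optional argument is a sysfs root, for testing.
smt_offline() {
  local root=${1:-/sys/devices/system/cpu} cpu n list first
  for cpu in "$root"/cpu[0-9]*; do
    n=${cpu##*cpu}                          # ".../cpu12" -> "12"
    list="$cpu/topology/thread_siblings_list"
    [ -r "$list" ] || continue              # e.g. already offline
    first=$(<"$list")
    first=${first%%[,-]*}                   # "0-1" or "0,12" -> "0"
    if [ "$n" != "$first" ]; then
      echo 0 > "$cpu/online"
    fi
  done
}
```

On kernels that provide /sys/devices/system/cpu/smt/control, writing `off` to that file (or booting with the `nosmt` parameter) achieves the same without a loop.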
In June 2019 (after 23 months), the mainboard of the 1800X died. We replaced it with an Asus Prime X370-A, but had to use a bigger case for that.
On our LaTeX benchmark (the numbers are user time in seconds):
- Ryzen 5 1600X, 4000MHz, 16MB L3, Debian 9 (64-bit): 0.287
- Core i7-4790K, 4400MHz (Turbo), 8MB L3, Debian Jessie (64-bit): 0.204
- Core i7-6700K, 4200MHz (Turbo), 8MB L3, Debian Jessie (64-bit): 0.200

On the Gforth benchmarks (again, user time in seconds):
    sieve  bubble matrix fib    fft    release     CPU                         gcc
    0.093  0.099  0.042  0.104  0.030  2017-07-05  AMD Ryzen 1600X 4GHz        gcc-6.3
    0.076  0.104  0.040  0.076  0.032  2016-05-03  Intel Core i7-4790K 4.4GHz  gcc-4.9
    0.076  0.112  0.040  0.080  0.028  2015-12-26  Intel Core i7-6700K 4.0GHz  gcc-4.9
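To condense the Gforth table into one number per CPU, one can take the geometric mean of the five benchmark times; this summary statistic is our choice, not part of the benchmark. A sketch with awk, where geomean is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Geometric mean of the benchmark times per row: the first field is a label,
# the remaining fields are times in seconds (lower is faster).
geomean() {
  awk '{
    s = 0
    for (i = 2; i <= NF; i++) s += log($i)
    printf "%s %.4f\n", $1, exp(s / (NF - 1))
  }'
}

geomean <<'EOF'
Ryzen-1600X 0.093 0.099 0.042 0.104 0.030
i7-4790K 0.076 0.104 0.040 0.076 0.032
i7-6700K 0.076 0.112 0.040 0.080 0.028
EOF
```

With these numbers, the Ryzen's geometric mean comes out roughly 10% higher (slower) than that of either Intel CPU.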