Rockpro64 Experiences

The Rockpro64 is a single-board computer (SBC) from Pine64 based on the Rockchip RK3399 SoC, which has 2 Cortex-A72 and 4 Cortex-A53 cores; the board comes with 2GB or 4GB of RAM.

Links:

Getting it

The German and French distributors of Pine64 did not have all the parts in stock that we wanted, so I ordered directly from Pine64, who ship out of China. I wanted to pay by credit card, which somehow required getting a PayPal account. I ordered on 2019-01-10, got an order confirmation by email right away, got an email with a tracking number on 2019-01-16, and the package was delivered (in Austria) on 2019-01-28.

I ordered

We also added a 32GB microSD card (on our Odroid C2 we use 12GB after two years, so 32GB is a little overkill), and four screws and nuts, which we use as legs for the board.

Software

We chose the latest minimal Debian stretch image (stretch-minimal-rockpro64-0.7.11-1075-arm64, but updates might exist when you read this) from ayufan-rock64, and put it on the microSD card with Etcher (under Windows). The default username/password is rock64/rock64, and you become root with sudo su.

Fan

The fan does not start on booting with this image. We fixed this by creating an executable file /etc/rc.local with the following content:
#! /bin/sh
echo 200 >/sys/class/hwmon/hwmon0/pwm1
You can vary the value between 0 (off) and 255 (full speed); 200 cools nicely (not over 50 degrees under load) without being too noisy.
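If you want to try out different values by hand, a small wrapper that rejects out-of-range values can help. The following is only a sketch: the function name set_fan_pwm and its optional second argument are made up for illustration, and the default path is simply the one used above (the hwmon number may differ on other kernels).

```shell
#!/bin/sh
# set_fan_pwm VALUE [FILE]: write a PWM value (0..255) to the fan control file.
# FILE defaults to the path used above; the function name is hypothetical.
set_fan_pwm() {
    value=$1
    file=${2:-/sys/class/hwmon/hwmon0/pwm1}
    # reject anything that is not a non-negative integer
    case $value in
        ''|*[!0-9]*) echo "not a number: $value" >&2; return 1 ;;
    esac
    # 0 is off, 255 is full speed
    if [ "$value" -gt 255 ]; then
        echo "value out of range (0..255): $value" >&2
        return 1
    fi
    echo "$value" >"$file"
}
```

As root, set_fan_pwm 200 then does the same as the line in /etc/rc.local.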

Cores and Benchmarking

The SBC has 2 Cortex-A72 cores (CPU 4 and 5) with clocks up to 1800MHz and 4 Cortex-A53 cores (CPU 0-3) with clocks up to 1416MHz. If you want to disable and reenable some set of cores for benchmarking reasons, use the following as root:
#disable the Cortex-A72 cores
for i in 4 5; do echo 0 >/sys/devices/system/cpu/cpu$i/online; done
#reenable them
for i in 4 5; do echo 1 >/sys/devices/system/cpu/cpu$i/online; done
#disable the Cortex-A53 cores
for i in 0 1 2 3; do echo 0 >/sys/devices/system/cpu/cpu$i/online; done
#reenable them
for i in 0 1 2 3; do echo 1 >/sys/devices/system/cpu/cpu$i/online; done
The Linux scheduler places a program that fully loads one core on one of the Cortex-A72 cores, so I usually don't need to disable the A53 cores. The system uses the powersave CPU governor, so I run some dummy load for warming up before the actual benchmark run in order to avoid seeing ramp-up effects in the timings.
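The warm-up load can be anything that keeps one core busy for a moment. A minimal sketch in plain shell follows; the function name warm_up and the inner loop count are arbitrary choices of mine, not something the image provides.

```shell
#!/bin/sh
# warm_up SECONDS: keep one core busy until the clock has advanced by
# SECONDS whole seconds (so the granularity is one second).
# The function name is hypothetical.
warm_up() {
    end=$(( $(date +%s) + $1 ))
    while [ "$(date +%s)" -lt "$end" ]; do
        i=0
        # pure shell arithmetic keeps the work inside this one process
        while [ "$i" -lt 100000 ]; do i=$((i + 1)); done
    done
}
```

A run would then look like warm_up 2; ./benchmark, where ./benchmark stands for whatever you want to time. On systems with cpufreq support the active governor can usually be read from /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor.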

You can see the CPU status with

rock64_diagnostics.sh -m
Run it as root if you want to see the clocks of the cores.

perf

The image mentioned above comes with the 4.4.154-1124-rockchip-ayufan-ged3ce4d15ec1 kernel instead of a mainline kernel because, as far as I know, this kernel supports more of the SoC's hardware, thanks to the efforts of Ayufan. Unfortunately, stretch does not have the matching version of perf. One option is to install the mainline kernel, and maybe we will do this in the future, but for now I went the other way:

I added the line

deb http://ftp.debian.org/debian jessie-backports main
to /etc/apt/sources.list. Unfortunately, linux-perf-4.4 cannot be installed normally, because it depends on libperl5.20, which is not available for Debian 9 and has a dependency that conflicts with the Debian 9 version of that dependency. So I thought I would try to use linux-perf-4.4 with libperl5.24, and it seems to work fine. Here's what I did:
#refresh the package lists so that apt knows about jessie-backports
apt-get update
#install dependencies of linux-perf-4.4, but with libperl5.24
apt-get install libdw1 libnuma1 libperl5.24 libpython2.7 libunwind8
#download and install linux-perf-4.4
apt-get download linux-perf-4.4
dpkg --force-depends -i linux-perf-4.4_4.4-4~bpo8+1_arm64.deb
#edit the package status:
# search for linux-perf-4.4 and remove the dependency on libperl5.20
emacs /var/lib/dpkg/status
#now let libperl5.24 appear as libperl5.20
cd /usr/lib/aarch64-linux-gnu/
ln -s libperl.so.5.24.1 libperl.so.5.20
perf reports performance monitoring counters on the Cortex-A72, but does not count on the Cortex-A53 (in contrast to the 3.14.79+ kernel and matching perf version that we use on the Odroid C2). In order to get proper counts, I disable the Cortex-A53 cores as shown above when using perf; otherwise I sometimes get results like:
   1788312341    cycles                (83.06%)
   11407114      branch-misses         (83.06%)
which indicates that 83% of the time the performance counters were available (apparently the rest of the time was spent on an A53 core).
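To avoid forgetting to reenable the A53 cores after a perf run, the disable/reenable loops from above can be wrapped around a command. This is just a sketch: the names set_cores, with_a53_offline and CPU_SYSFS are invented here, and CPU_SYSFS only exists so that the functions can be tried on a scratch directory.

```shell
#!/bin/sh
# Root of the per-CPU sysfs directories; overridable only so that the
# functions can be tried out on a scratch directory (hypothetical variable).
CPU_SYSFS=${CPU_SYSFS:-/sys/devices/system/cpu}

# set_cores STATE CPU...: write STATE (0 = offline, 1 = online) for each CPU.
set_cores() {
    state=$1
    shift
    for i in "$@"; do
        echo "$state" >"$CPU_SYSFS/cpu$i/online"
    done
}

# with_a53_offline COMMAND...: run COMMAND with CPUs 0-3 (the Cortex-A53
# cores) offline, bring them back afterwards, and pass on COMMAND's status.
with_a53_offline() {
    set_cores 0 0 1 2 3
    status=0
    "$@" || status=$?
    set_cores 1 0 1 2 3
    return "$status"
}
```

As root, something like with_a53_offline perf stat -e cycles -e branch-misses ./benchmark would then run the measurement on the A72 cores only (./benchmark is a placeholder).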

Benchmark results

Gforth-fast small benchmark results (same binaries from Gforth 0.7.9_20190124 compiled with gcc-6.3):
sieve bubble matrix fib   fft  
0.204 0.232  0.108 0.212 0.100 RockPro64 (1800MHz Cortex-A72)
0.388 0.424  0.252 0.504 0.276 RockPro64 (1416MHz Cortex-A53)
0.350 0.390  0.240 0.470 0.280 Odroid C2 (1536MHz Cortex-A53)
LaTeX benchmark results:
                                                                   s   inst
- Rockpro64 (1416MHz Cortex A53) Debian 9 (Stretch)               3.24 3.09G
- Odroid C2 (1536MHz Cortex A53) Ubuntu 16.04                     2.32 2.64G
- Rockpro64 (1800MHz Cortex A72) Debian 9 (Stretch)               1.30 3.09G
The LaTeX installation on the Rockpro64 takes 3.09G instructions for this benchmark, compared to 2.64G for the Odroid C2 installation (a factor of 1.17). Combined with the clock speed difference, this explains a factor of 1.27 of the observed difference between the Rockpro64 and the Odroid C2 results (3.24s vs. 2.32s, a factor of 1.40), but a factor of about 1.10 remains unexplained; the additional instructions would have to run at 0.42 instructions per cycle (IPC; compared to 0.75 for the 2.64G instructions on the Odroid C2) to explain the difference. Cache sizes seem to be the same for the two A53 variants (see below), so that's probably not the problem; the benchmark does not miss the caches much, so main memory performance is probably also not the reason for the speed difference.
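The arithmetic behind these factors can be redone from the numbers in the table. The sketch below uses awk for the floating-point work; the function name latex_breakdown is made up, and the IPC of 0.75 for the Odroid C2 is taken from the text as given rather than recomputed. Small differences in the last digit against the figures in the text are due to rounding.

```shell
#!/bin/sh
# latex_breakdown: recompute the factors discussed above from the
# LaTeX benchmark table (times in s, instructions in G, clocks in GHz).
# The function name is hypothetical.
latex_breakdown() {
    LC_ALL=C awk 'BEGIN {
        rp_s = 3.24; rp_inst = 3.09; rp_ghz = 1.416   # Rockpro64, Cortex-A53
        c2_s = 2.32; c2_inst = 2.64; c2_ghz = 1.536   # Odroid C2
        c2_ipc = 0.75                                 # as stated in the text

        inst  = rp_inst / c2_inst     # more instructions on the Rockpro64
        clock = c2_ghz / rp_ghz       # lower clock on the Rockpro64
        total = rp_s / c2_s           # observed run-time ratio
        printf "instructions %.2f\n", inst
        printf "explained %.2f\n", inst * clock
        printf "observed %.2f\n", total
        printf "unexplained %.2f\n", total / (inst * clock)

        # cycles left for the extra instructions if the common 2.64G
        # instructions ran at the same IPC as on the Odroid C2
        extra_cycles = rp_s * rp_ghz - c2_inst / c2_ipc
        printf "extra_ipc %.2f\n", (rp_inst - c2_inst) / extra_cycles
    }'
}
```

Calling latex_breakdown prints one labelled value per line.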

Memory latency results (measured with bplat), in ns:

          RP64   RP64  Od.C2
    size   A72    A53    A53
    1024   2.2    2.2    1.9    
    2048   2.2    2.2    1.9    
    4096   2.2    2.2    2.0    
    8192   2.2    2.2    2.0    
   16384   2.2    2.2    2.0   
   32768   6.8    2.2    4.0   
   65536  11.1   10.0    9.1   
  131072  11.2   11.5   10.8  
  262144  13.0   11.9   11.1  
  524288  16.0   40.2   36.1  
 1048576  68.8  141.2  124.9 
 2097152 136.7  161.9  139.0 
 4194304 156.0  162.2  142.1 
 8388608 156.1  162.4  140.0 
16777216 154.6  162.4  140.0
The Cortex-A72 seems to have 4 cycles of D-cache load latency, the Cortex-A53 3 cycles. The D-cache seems to be 32KB on all three core/SoC combinations (SoC makers can configure cache sizes). The L2 cache is 1MB for the Rockpro64 Cortex-A72, and 0.5MB for the two A53 variants. Main memory performance is somewhat better on the Odroid C2; interestingly, the A72 has slightly better main memory performance than the A53 on the same board.
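The cycle counts follow from multiplying the measured latency by the clock frequency. A small sketch of that conversion, again with awk (the function name ns_to_cycles is invented for this example):

```shell
#!/bin/sh
# ns_to_cycles NS MHZ: convert a latency of NS nanoseconds into clock
# cycles at a clock of MHZ megahertz. The function name is hypothetical.
ns_to_cycles() {
    LC_ALL=C awk -v ns="$1" -v mhz="$2" 'BEGIN { printf "%.1f\n", ns * mhz / 1000 }'
}
```

For the L1 rows of the table, ns_to_cycles 2.2 1800 gives 4.0 (A72), ns_to_cycles 2.2 1416 gives 3.1 (Rockpro64 A53), and ns_to_cycles 1.9 1536 gives 2.9 (Odroid C2).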
Anton Ertl