
Node.js Benchmark on Raspberry Pi (v1)

I have experimented a bit with Node.js and Raspberry Pi lately, and I have found the performance… surprisingly bad. So I decided to run some standard tests: benchmark-octane (v9).

Octane is essentially run like:

$ npm install benchmark-octane
$ cd node_modules/benchmark-octane
$ node run.js

The distilled result of Octane is a total run time and a score. Here are a few results:

                         OS           Node.js                  Time    Score
QNAP TS-109 500MHz       Debian       v0.10.29 (self built)    3350s     N/A
Raspberry Pi v1 700MHz   OpenWrt BB   v0.10.35 (self built)    2267s     140
Raspberry Pi v1 700MHz   Raspbian     v0.6.19 (Raspbian)       2083s     N/A
Eee701 Celeron 630MHz    Xubuntu      v0.10.25 (Ubuntu)         171s    1655
MacBook Air i5@1.4GHz    Mac OS X     v0.10.35 (pkgsrc)          47s   10896
HP 2560p i7@2.7GHz       Xubuntu      v0.10.25 (Ubuntu)          41s   15450

Score N/A means that one test failed and there was no final score.

When I first saw the RPi performance I thought I had done something wrong cross-compiling Node.js myself for RPi and OpenWRT. However, Node.js on Raspbian is essentially no faster, and the RPi ARMv6 with FPU is not much faster than the QNAP ARMv5 without FPU.

I think the Eee701 serves as a good baseline here. At first glance, possible reasons for the RPi underperformance relative to the Celeron are:

  • Smaller cache (16kB of L1 cache, and the L2 cache only available to the GPU, I read) compared to the Celeron (512kB)
  • A bad or not well utilised FPU (but at least there is one on the RPi)
  • Node.js (V8) less optimized for ARM

I found that I have benchmarked those two CPUs against each other before. That time the Celeron was twice as fast as the RPi, and the FPU of the RPi performed decently. Blaming the small cache makes more sense to me than blaming the people who implemented ARM support in V8.

The conclusion is that Raspberry Pi (v1 at least) is extremely slow running Node.js. Other benchmarks indicate that RPi v2 is significantly faster.

Eee701 in 2015

My eee701 is not doing very much anymore, but sometimes it is handy to have it around. I have not upgraded it since Lubuntu 13.10, and that version is not supported anymore. I found that:

  1. Since 13.10 is abandoned, upgrading with apt-get generated errors.
  2. The 15.04 Lubuntu desktop ISO complains that the hard drive is less than 4.1GB.
  3. The 14.04.2 Lubuntu desktop ISO complains that the hard drive is less than 4.5GB.

Thus, the standard upgrade or installation paths were blocked. And I was not very interested in putting lots of effort into getting my eee701 running a current system.

Instead, I tried the Ubuntu mini-iso. That was very nice! I wrote the iso itself to a USB drive with dd (rather than the Startup Disk Creator). The installation is text (curses) based, but very guided (just like Debian). I chose a single 4GB ext4 partition for root, no swap (since I have 2GB RAM) and the “Xubuntu minimal” desktop. Keyboard, wifi and timezone were all correctly set up. When installation was complete and the system restarted I had 1.8GB used and 1.7GB available. Not even a web browser was installed, but Xubuntu itself was fine.

Not bad at all!
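For reference, writing the mini-iso to the USB drive with dd goes something like this (a sketch; the device name /dev/sdX is an assumption, double-check with lsblk so you do not overwrite the wrong disk):

$ sudo dd if=mini.iso of=/dev/sdX bs=4M
$ sync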

Seeding Xubuntu and Lubuntu

I do not do much to contribute to the Ubuntu community, so when Ubuntu 15.04 was released a few weeks ago I downloaded 4 isos with bittorrent and kept seeding them for the benefit of Ubuntu users.

Now (2015-05-10):

Image                               Size     Ratio
lubuntu-15.04-desktop-i386.iso      696MB      120
lubuntu-15.04-desktop-amd64.iso     690MB     60.0
xubuntu-15.04-desktop-i386.iso      970MB     68.5
xubuntu-15.04-desktop-amd64.iso     963MB     68.2

Does this mean anything?

If the ratio has anything to do with the popularity of the different Ubuntu versions:

  1. It surprised me that Lubuntu is so popular
  2. It surprised me that i386 is still very popular

There are, however, a number of factors that could disturb the correlation between my ratio and the true popularity of different Ubuntu flavours:

  • Lubuntu and Xubuntu may push people differently towards the torrent (rather than the direct http/ftp) download; I found it harder to find the Lubuntu torrent, though.
  • Lubuntu and Xubuntu may have different audiences, with different attitude to torrent downloads.
  • i386 and amd64 may have different audiences as well
  • If more downloaders mean more seeders, and people on average seed close to 1.0 (or even above) then my Ratio may mean very little.
  • The tracker may have identified me as a stable seeder and sent downloaders of less popular images (both to download and seed) my way, in an attempt to provide equally good service for everyone (no idea if trackers do this).

The i386-heaviness of Lubuntu suggests that there is at least some correlation between general popularity and my Ratio.

Raspberry Pi (v1), OpenWrt (14.07) and Node.js (v0.10.35 & v0.12.2)

Since I gave up running NetBSD on my Raspberry Pi I decided it was time to try OpenWrt. And, to my surprise, I also managed to cross compile Node.js!

Install OpenWrt on Raspberry Pi (v1@700MHz)
I installed OpenWrt Barrier Breaker (the currently stable release) using the standard instructions.

After you have put the image on an SD-card with dd, it is quite easy to resize the root partition (see the sketch after this list):

  1. copy the second partition to an image file using dd
  2. use fdisk to delete the second partition, and create a new, bigger one
  3. format the new partition with mkfs.ext4
  4. mount the image file using mount -o loop
  5. mount the new second partition
  6. copy all data from image file to second partition using cp -a
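
A sketch of those steps on a Linux machine (device names are assumptions; adapt them to your SD-card and double-check before deleting anything):

$ sudo dd if=/dev/sdX2 of=rootfs.img            (1. copy the old root partition)
$ sudo fdisk /dev/sdX                           (2. delete partition 2, create a bigger one)
$ sudo mkfs.ext4 /dev/sdX2                      (3. format the new partition)
$ sudo mkdir -p /mnt/img /mnt/root
$ sudo mount -o loop rootfs.img /mnt/img        (4. mount the image file)
$ sudo mount /dev/sdX2 /mnt/root                (5. mount the new partition)
$ sudo cp -a /mnt/img/. /mnt/root/              (6. copy all the data)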

If you want to, you can edit /etc/config/network while you are working with the OpenWrt root partition anyway:

#config interface 'lan'
#	option ifname 'eth0'
#	option type 'bridge'
#	option proto 'static'
#	option ipaddr '192.168.1.1'
#	option netmask '255.255.255.0'
#	option ip6assign '60'
#	option gateway '?.?.?.?'
#	option dns '?.?.?.?'
config interface 'lan'
	option ifname 'eth0'
	option proto 'dhcp'
	option macaddr 'XX:XX:XX:XX:XX:XX'
	option hostname 'rpiopenwrt'

Probably you want to disable dnsmasq, odhcpd and firewall too:

.../etc/init.d/$ chmod -x dnsmasq firewall odhcpd

OR (depending on your idea of what is the right way)

.../etc/rc.d$ sudo rm S60dnsmasq S35odhcpd K85odhcpd S19firewall

Also, it is a good idea to edit config.txt (on the DOS partition):

gpu_mem=1

I don’t know if 1 is really a legal value, but it worked for me, and I had much more memory available than when gpu_mem was not set.

Building Node.js v0.12.2
I downloaded and built Node.js v0.12.2 on a Xubuntu machine with an x64 CPU. On such a machine you can download the standard OpenWrt toolchain for Raspberry Pi.
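
The toolchain tarball is published on downloads.openwrt.org under barrier_breaker/14.07/brcm2708/generic/ (the directory is from memory, so check the listing for the exact OpenWrt-Toolchain-*.tar.bz2 name); extracting it somewhere is all it takes:

$ tar -xjf OpenWrt-Toolchain-brcm2708-*.tar.bz2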

I replaced configure and cpu.cc in the standard sources with the files from This Page (they are meant for v0.12.1 but they work equally well for v0.12.2).

I then found a gist that gave me a good start. I modified it, and ended up with:

#!/bin/sh -e

export STAGING_DIR=...path to your toolchain...

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib
export ARM_TARGET_LIB=$CSTOOLS_LIB

export TARGET_ARCH="-march=armv6j"

#Define the cross compilers on your system
export AR="arm-openwrt-linux-uclibcgnueabi-ar"
export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LINK="arm-openwrt-linux-uclibcgnueabi-g++"
export CPP="arm-openwrt-linux-uclibcgnueabi-gcc -E"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"
export AS="arm-openwrt-linux-uclibcgnueabi-as"
export CCLD="arm-openwrt-linux-uclibcgnueabi-gcc ${TARGET_ARCH} ${TARGET_TUNE}"
export NM="arm-openwrt-linux-uclibcgnueabi-nm"
export STRIP="arm-openwrt-linux-uclibcgnueabi-strip"
export OBJCOPY="arm-openwrt-linux-uclibcgnueabi-objcopy"
export RANLIB="arm-openwrt-linux-uclibcgnueabi-ranlib"
export F77="arm-openwrt-linux-uclibcgnueabi-g77 ${TARGET_ARCH} ${TARGET_TUNE}"
unset LIBC

#Define flags
export CXXFLAGS="-march=armv6j"
export LDFLAGS="-L${CSTOOLS_LIB} -Wl,-rpath-link,${CSTOOLS_LIB} -Wl,-O1 -Wl,--hash-style=gnu"
export CFLAGS="-isystem${CSTOOLS_INC} -fexpensive-optimizations -frename-registers -fomit-frame-pointer -O2"
export CPPFLAGS="-isystem${CSTOOLS_INC}"
export CCFLAGS="-march=armv6j"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux --without-npm

bash --norc

Run this script in the Node.js source directory. If everything goes well it configures the Node.js build and leaves you with a shell where you can simply run:

$ make

If compilation succeeds, you will find the node binary in the out/Release folder. Copy it to your OpenWrt Raspberry Pi.
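
Copying over ssh works fine (a sketch: rpiopenwrt is the dhcp hostname from my network config above, use the IP address if the name does not resolve):

$ scp out/Release/node root@rpiopenwrt:/usr/bin/node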

Building Node.js v0.10.35
I first successfully built Node.js v0.10.35.

The (less refined) script for configuring that I used was:

#!/bin/sh -e

export STAGING_DIR=...path to your toolchain...

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib
export ARM_TARGET_LIB=$CSTOOLS_LIB
export GYP_DEFINES="armv7=0"

#Define our target device
export TARGET_ARCH="-march=armv6"
export TARGET_TUNE="-mfloat-abi=hard"

#Define the cross compilers on your system
export AR="arm-openwrt-linux-uclibcgnueabi-ar"
export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LINK="arm-openwrt-linux-uclibcgnueabi-g++"
export CPP="arm-openwrt-linux-uclibcgnueabi-gcc -E"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"
export AS="arm-openwrt-linux-uclibcgnueabi-as"
export CCLD="arm-openwrt-linux-uclibcgnueabi-gcc ${TARGET_ARCH} ${TARGET_TUNE}"
export NM="arm-openwrt-linux-uclibcgnueabi-nm"
export STRIP="arm-openwrt-linux-uclibcgnueabi-strip"
export OBJCOPY="arm-openwrt-linux-uclibcgnueabi-objcopy"
export RANLIB="arm-openwrt-linux-uclibcgnueabi-ranlib"
export F77="arm-openwrt-linux-uclibcgnueabi-g77 ${TARGET_ARCH} ${TARGET_TUNE}"
unset LIBC

#Define flags
export CXXFLAGS="-march=armv6"
export LDFLAGS="-L${CSTOOLS_LIB} -Wl,-rpath-link,${CSTOOLS_LIB} -Wl,-O1 -Wl,--hash-style=gnu"
export CFLAGS="-isystem${CSTOOLS_INC} -fexpensive-optimizations -frename-registers -fomit-frame-pointer -O2 -ggdb3"
export CPPFLAGS="-isystem${CSTOOLS_INC}"
export CCFLAGS="-march=armv6"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux
bash --norc

Running node on the Raspberry Pi
Back on the Raspberry Pi you need to install a few packages:

# ldd ./node 
	libdl.so.0 => /lib/libdl.so.0 (0xb6f60000)
	librt.so.0 => not found
	libstdc++.so.6 => not found
	libm.so.0 => /lib/libm.so.0 (0xb6f48000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb6f34000)
	libpthread.so.0 => not found
	libc.so.0 => /lib/libc.so.0 (0xb6edf000)
	ld-uClibc.so.0 => /lib/ld-uClibc.so.0 (0xb6f6c000)
# opkg update
# opkg install librt
# opkg install libstdcpp

That is all! Now you should be ready to run node. The node binary is about 13MB (the v0.10.35 binary was 19MB, perhaps because of -ggdb3), so it is not ideal to deploy it to other, more typical OpenWrt hardware.

Final comments
I ran a few small programs to test, and they were fine. I guess some more testing would be appropriate. The performance is very comparable to Node.js built and executed on Raspbian.

I think RaspberryPi+OpenWrt+Node.js is a very interesting and competitive combination for microservices!

NetBSD on a Raspberry Pi

As a long time Linux user I have always had some kind of curiosity about the BSDs, especially NetBSD and its minimalistic approach to system design. For a while I have been thinking that perhaps NetBSD is the perfect operating system for turning a Raspberry Pi into a server.

I have read anti-BSD rants like this “BSD, the truth“, and I have also appreciated pkgsrc for Mac OS X. I felt I needed to get my own opinion. It is easy to have a romantic idea about “Old Real UNIX”, but my limited experience with IRIX and Solaris is not that positive. And BSD is another beast.

For the Raspberry Pi (Version 1, Model B) it is supposed to be possible to run both (stable) NetBSD 6.1.5 and (beta) NetBSD 7.0. It seemed, after all, that the beta 7.0 was the way to go.

At first it was fine

I followed the official instructions and installed NetBSD 7.0. I (first) used the (800MB) rpi.img. I set up my user:

# useradd zo0ok
...
# mkdir /home
# mkdir /home/zo0ok
# chown zo0ok:users /home/zo0ok
# usermod -G wheel zo0ok

Then it was time to configure pkgsrc and start installing packages.

The Disk Problem
I did a quick check to see how much space was available before installing stuff. To my surprise:

# df -h
Filesystem         Size       Used      Avail %Cap Mounted on
/dev/ld0a          650M       623M      -5.4M 100% /
/dev/ld0e           56M        14M        42M  24% /boot
kernfs             1.0K       1.0K         0B 100% /kern
ptyfs              1.0K       1.0K         0B 100% /dev/pts
procfs             8.0K       8.0K         0B 100% /proc
tmpfs              112M       8.0K       112M   0% /var/shm

It seemed like the filesystem had not been (automatically) expanded as it should have been according to the instructions above. So I followed the manual instructions to resize my root partition, with no success whatsoever.

So I ran disklabel to see if NetBSD recognized my 8GB SD-card…

# /sbin/disklabel ld0
# /dev/rld0c:
type: SCSI
disk: STORAGE DEVICE
label: fictitious
flags: removable
bytes/sector: 512
sectors/track: 32
tracks/cylinder: 64
sectors/cylinder: 2048
cylinders: 862
total sectors: 1766560
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

8 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:   1381536    385024     4.2BSD      0     0     0  # (Cyl.    188 -    862*)
 b:    262144    122880       swap                     # (Cyl.     60 -    187)
 c:   1766560         0     unused      0     0        # (Cyl.      0 -    862*)
 d:   1766560         0     unused      0     0        # (Cyl.      0 -    862*)
 e:    114688      8192      MSDOS                     # (Cyl.      4 -     59)

Clearly, NetBSD thought this SD-card was 900MB rather than 8GB, and this is why it failed to automatically resize it.

The sysinst install
In any case, I was not very comfortable with getting a preinstalled/preconfigured 800MB system with swap and everything, so I formatted the 8GB SD card with my digital camera (just to be sure the partition table did not contain anything weird), downloaded the (6MB) rpi_inst.img and wrote it to the SD card.

NetBSD installation started properly, and I was looking forward to installing over SSH. According to the instructions I was supposed to start DHCP somehow. DHCP seemed to be on (the RPi got an IP address) but SSH was off, so I installed using a keyboard.

Almost immediately I was informed that NetBSD failed to recognise the “disk geometry” properly. I tried the SD card in Linux, which reluctantly reported that it had 166 heads and 30 sectors per track (which sounds like nonsense). I gave this information to the NetBSD sysinst program, and now the SD card seemed to be 7.5GB.

Then followed a long and confusing period of trying to come up with any working partition scheme that NetBSD could accept. The right procedure turned out to be:

  1. Choose entire disk
  2. Confirm to delete the (required) 56MB dos partition
  3. Partition, pretending to be unaware of the need of a dos partition
  4. Magically, in the end, it added the dos partition

I am clearly stupid. There are no words for how confused I am about the a:, c: and e: partitions (which seem to reuse the DOS naming, but for other purposes), the empty space, the disk labels, and the BSD partitions inside a (non-existing) primary partition.

Anyway, just after I had given up, I gave it a final try and convinced sysinst to install. Then came a phase of choosing download paths, which clearly was non-trivial since I installed a beta, and I am fine with that.

Installation went on. At the end came a nice menu where I could configure things. I liked it! (I wish I knew how to start it later.) It managed to get my network settings from DHCP (except the gateway), but it failed to configure and test the network itself (even though it had downloaded everything over the network just a few minutes earlier). I configured a few other things, restarted, the network was working and I was happy… for a while.

I configured pkgsrc, and it seems ALL other systems where pkgsrc exists have been blessed with the pkgin tool, except NetBSD, where you are supposed to do all the work yourself. Well, I added PKG_PATH to the .shrc (of my user, not root) and enjoyed pkg_add.
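
For reference, that amounts to something like this (the path is a sketch from memory; browse ftp.netbsd.org for the correct architecture/version directory):

$ export PKG_PATH=ftp://ftp.netbsd.org/pub/pkgsrc/packages/NetBSD/earmv6hf/7.0/All/
$ pkg_add bash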

(not) Compiling NodeJS
I want to install node.js on my NetBSD Raspberry Pi. It is not in pkgsrc (which it is for Mac OS X, but whatever) so I had to build it myself. I am used to building node.js, and I was looking forward to fixing all the broken dependencies. If I had ever gotten there.

I downloaded the source and started unpacking it… it is about 10000 files and 100MB of data. My SD card (a SanDisk Ultra, class 10) is not super fast; dd-ing the image to it earlier wrote at a speed of 3MB/s. The unpacking speed of node.js: roughly 1 file per second. I realised I needed a (fast) USB drive or a faster SD card, so I (literally) went out to town and bought a fast USB drive (I did not find the SD card I wanted) and a few other things. When I came back, more than 8000 files had been extracted and less than 2000 remained. I started reading about how to partition and format a USB drive for NetBSD, and at some point I inserted it in the Raspberry Pi. A little later I noticed my ssh sessions were dead, and the RPi had restarted. It turns out reality was worse than the truth in “BSD, the truth”:

[…] the kernels of the BSDs are also very fault intolerant.

The best example of this is the issue with removing USBs. The problem appears when USBs are removed without unmounting them first. The result is a kernel panic. The astounding aspect of this is that this problem has been exhibited by all the major BSD variants Free, Open, Net and DragonflyBSD ever since USB support was implemented in them 5 to 6 years ago and has never ever been fixed. FreeBSD mailing lists even ban people who dare mention about it. In Linux, such things never and happen and bugs as serious as this gets fixed before a release is made.

Fact is, NetBSD 7.0 Beta for RPi, crashes, immediately, when I insert a USB drive.

This actually did not make me give up. I really did restart the system with the USB drive inserted, with the intention of treating the USB drive as a fixed disk and not inserting/removing it unless I shut the RPi down first. This was when I did give up: deleting the 16GB dos partition and creating a NetBSD filesystem was just too difficult for me. Admittedly, my patience was running out.

More on memory card performance
I found this very interesting article (linked to by the Gentoo people, of course). Without going into details: clearly a Raspberry Pi with an SD card root filesystem needs a filesystem and block device implementation that works well with actual SD cards. This is not trivial, and it means doing things very differently than for rotating media.

I did the same unpacking of the node.js source on Raspbian (I installed Raspbian on exactly the same SD card as I used for NetBSD): 22 seconds (tar: 18s, sync 4s), compared to 3h for NetBSD.

Conclusion
In theory, NetBSD would be a beautiful fit for the Raspberry Pi. The ARMv6 is not supported by standard Debian. Raspbian comes with a little “too much” for my taste (it is not a real problem), and it does not have the feeling of “Debian stable”, but more of some “unofficial Debian test” (sorry Raspbian people – I really appreciate your work!).

I have wondered why NOOBS does not come with NetBSD… but I think I know now. And sometimes I am surprised that Linux seems to work better than Mac OS X; perhaps now I know why.

My romantic idea that NetBSD would be perfect for the RPI was just plain wrong. Installing NetBSD today made me remember installing Slackware on a Compaq laptop in 1998.

Perhaps I will give Arch a try. Or put OpenWRT on the RPi.

Faking a good goto in JavaScript

There are cases where gotos are good (most possible uses of gotos are not good). I needed to write JavaScript functions (for running in NodeJS) where I wanted to call the callback function just once, at the end (to make things as clear as possible). In C that would be (this is a simplified example):

void withgoto(int x, void(*callback)(int) ) {
  int r;

  if ( (r = test1(x)) )
    goto done;

  if ( (r = test2(x)) )
    goto done;
 
  if ( (r = test3(x)) )
    goto done;
 
  r = 0;
done:
  (*callback)(r);
}

I think that looks nice! I mean the way goto controls the flow, not the syntax for function pointers.

JavaScript: multiple callbacks
The most obvious way to me to write this in JavaScript was:

var with4callbacks = function(x, callback) {
  var r

  if ( r = test1(x) ) {
    callback(r)
    return
  }

  if ( r = test2(x) ) {
    callback(r)
    return
  }

  if ( r = test3(x) ) {
    callback(r)
    return
  }
  r = 0
  callback(r)
}

This works perfectly, of course. But it is not nice to have callback in several places. It is annoying (bloated) to always write return after callback. And in other cases it can be a little unclear whether callback is called zero times, or more than once… which is basically catastrophic. What options are there?

JavaScript: abusing exceptions
My first idea was to abuse the throw/catch-construction:

var withexceptions = function(x, callback) {
  var r

  try {
    if ( r = test1(x) )
      throw null

    if ( r = test2(x) )
      throw null

    if ( r = test3(x) )
      throw null

    r = 0
  } catch(e) {
  }
  callback(r)
}

This works just perfectly. In a more real world case you would probably put some code in the catch block. Is it good style? Maybe not.

JavaScript: an internal function
With an internal (is that what it is called?) function, a return does the job:

var withinternalfunc = function(x, callback) {
  var r
  var f

  f = function() {
    if ( r = test1(x) )
      return

    if ( r = test2(x) )
      return

    if ( r = test3(x) )
      return

    r = 0
  }
  f()
  callback(r)
}

Well, this looks like JavaScript, but it is not super clear.

JavaScript: an external function
You can also do it with an external function (risking that you need to pass plenty of parameters to it, but in my simple example that is not an issue):

var externalfunc = function(x) {
  var r
  if ( r = test1(x) )
    return r

  if ( r = test2(x) )
    return r

  if ( r = test3(x) )
    return r

  return 0
}

var withexternalfunc = function(x, callback) {
  callback(externalfunc(x))
}

Do you think the readability is improved compared to the goto code? I don’t think so.

JavaScript: Break out of Block
Finally (and I got help coming up with this one), it is possible to do:

var withbreakblock = function(x, callback) {
  var r
  var f

myblock:
  {
    if ( r = test1(x) )
      break myblock

    if ( r = test2(x) )
      break myblock

    if ( r = test3(x) )
      break myblock

    r = 0
  }
  callback(r)
}

Well, that is as close to the goto construction as I get with JavaScript. Pretty nice!

JavaScript: Multiple if(done)
Using a done-variable and multiple if statements is also possible:

var with3ifs = function(x, callback) {
  var r
  var done = false

  if ( r = test1(x) )
    done = true

  if ( !done ) {
    if ( r = test2(x) )
      done = true
  }

  if ( !done ) {
    if ( r = test3(x) )
      done = true
  }

  if ( !done ) {
    r = 0
  } 
  callback(r)
}

Hardly pretty, I think. The longer the code gets (the more sequential ifs there are), the higher the penalty for the ifs.

Performance
Which one I choose may depend on performance, if the difference is big. They should all be fast, but:

  • It is quite unclear what the cost of throwing (an exception) is
  • The internal function, is it recompiled and what is the cost?

I measured performance as (millions of) calls to the function per second. The test functions are rather cheap, and x is an integer in this case.

I did three test runs:

  1. The fall through case (r=0) is relatively rare (~8%)
  2. The fall through case is very common (~92%)
  3. The fall through case is extremely common (>99.99%)

In real applications a high fallthrough rate may be the most common case, with no errors found in the input data. The benchmark environment is:

  • Mac Book Air Core i5@1.4GHz
  • C Compiler: Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn)
  • C Flags: -O2
  • Node version: v0.10.35 (installed from pkgsrc.org, x86_64 version)

Performance was rather consistent over several runs (of 1,000,000 calls each):

Fallthrough Rate           ~8%      ~92%     >99.99%
---------------------------------------------------------
     C: withgoto           66.7     76.9     83.3  Mops
NodeJS: with4callbacks     14.9     14.7     16.4  Mops
NodeJS: withexceptions      3.67     8.77    10.3  Mops
NodeJS: withinternalfunc    8.33     8.54     9.09 Mops
NodeJS: withexternalfunc   14.5     14.9     15.6  Mops
NodeJS: withbreakblock     14.9     15.4     17.5  Mops
NodeJS: with3ifs           15.2     15.6     16.9  Mops

The C code was translated row by row into the JavaScript code. The performance difference is between C/Clang and NodeJS, not thanks to the goto construction itself, of course.

On Recursion
In JavaScript it is quite natural to use recursion when you deal with callbacks. So I decided to run the same benchmarks using recursion instead of a loop. Each recursion step involves three calls ( function()->callback()->next()-> ). With this setup the maximum recursion depth was about 3×5300 (perhaps close to 16384?). That may sound like a lot, but it is not enough to produce any benchmarks. Do I need to mention that C delivered 1,000,000 recursive calls at exactly the same performance as the loop?

Conclusion
For real code, 3.7 million exceptions per second sounds pretty far-fetched. Unless you are in a tight loop (which you probably are not when you deal with callbacks), all solutions will perform well. However, breaking out of a block is clearly the most elegant way and also the most efficient, second only to the real goto, of course. I suspect the generally higher performance in the (very) high fallthrough case is because branch prediction gets more successful.

Any better ideas?

Storage and filesystem performance test

I have lately been curious about performance for low-end storage and asked myself questions like:

  1. Raspberry Pi or Banana Pi? Is the SATA of the Banana Pi a deal breaker? Especially now when the Raspberry Pi has 4 cores, and I don’t mind if one of them is mostly occupied with USB I/O overhead.
  2. For a Chromebook or a Mac Book Air where internal storage is fairly limited (or very expensive), how practical is it to use USB storage?
  3. Building OpenWRT buildroot requires a case sensitive filesystem (disqualifying the standard Mac OS X filesystem) – is it feasible to use a USB device?
  4. The journalling feature of HFS+ and ext4 is probably a good idea. How does it affect performance?
  5. For USB drives and Memory cards, what filesystems are better?
  6. Theoretical maximum throughput is usually not that interesting. I am more interested in actual performance (time to accomplish tasks), and I believe this is often limited more by latency and overhead than by throughput. Is it so?

Building OpenWRT on Mac Book Air
I tried building OpenWRT on a USB drive (with case sensitive HFS+), and it turned out to be very slow. I did some structured testing by checking out the code, putting it in a tarball (see the sketch after these commands), and repeating:

   $ cd /external/disk
1  $ time cp ~/openwrt.tar . ; time sync
2  $ time tar -xf ~/openwrt.tar ; time sync   (total 17k files)
   $ make menuconfig                          (not benchmarked)
3  $ time make tools/install                  (+38k files, +715MB)
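
The tarball itself was prepared beforehand, roughly like this (paths are examples; the repository is the same as in the toolchain guide below):

$ git clone git://git.openwrt.org/14.07/openwrt.git
$ tar -cf ~/openwrt.tar openwrt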

I did this on the internal SSD (this first step of the OpenWRT buildroot does not depend on case sensitivity), on an old external rotating 2.5″ USB drive and on a cheap USB drive. I tried a few different filesystem combinations:

$ diskutil eraseVolume hfsx  NAME /dev/diskXsY   (non journaled case sensitive)
$ diskutil eraseVolume jhfsx NAME /dev/diskXsY   (journaled case sensitive)
$ diskutil eraseVolume ExFAT NAME /dev/diskXsY   (Microsoft ExFAT)

The results were (usually just a single run):

Drive and Interface                     Filesystem        time cp   time tar      time make
Internal 128GB SSD                      Journalled HFS+      -          5.4s         16m13s
2.5″ 160GB USB2                         HFS+               3.1s         7.0s         17m44s
2.5″ 160GB USB2                         Journalled HFS+    3.1s         7.1s         17m00s
Sandisk Extreme 16GB USB Drive, USB3    HFS+               2.0s         6.9s         18m13s
Kingston DTSE9H 8GB USB Drive, USB2     HFS+             20-30s  1m40s-2m20s             1h
Kingston DTSE9H 8GB USB Drive, USB2     ExFAT             28.5s       15m52s            N/A

Findings:

  • Timings on USB drives were quite inconsistent over several runs (while internal SSD and hard drive were consistent).
  • The hard drive is clearly not the limiting factor in this scenario, when comparing internal SSD to external 2.5″ USB. Perhaps a restart between “tar xf” and “make” would have cleared the buffer caches and the internal SSD would have come out better.
  • When it comes to USB drives: WOW, you get what you pay for! Turns out the Kingston is among the slowest USB drives that money can buy.
  • ExFAT? I don’t think so!
  • For HFS+ and OS X, journalling is not much of a problem

Building OpenWRT in Linux
I decided to repeat the tests on a Linux (Ubuntu x64) machine, this time building using two CPUs (make -j 2) to stress the storage a little more. The results were:

Drive and Interface                     Filesystem   real time               user time   system time
Internal SSD                            ext4         9m40s                   11m53s      3m40s
2.5″ 160GB USB2                         ext2         8m53s                   11m54s      3m38s
2.5″ 160GB USB2 (just after reboot)     ext2         9m24s                   11m56s      3m31s
Kingston DTSE9H 8GB USB Drive, USB2     ext2         11m36s +3m48s (sync)    11m57s      3m44s

Findings:

  • The Linux block device layer almost eliminates the performance differences of the underlying storage.
  • The worse real time for the SSD is probably because of other processes taking CPU cycles.

My idea was to test connecting the 160GB drive directly via SATA, but given the results I saw no point in doing so.

More reading on flash storage performance
I found this very interesting article (linked to by the Gentoo people, of course). I think it explains a lot of what I have measured. I think even the slowest USB drives and memory cards would often be fast enough, if the OS handled them properly.

Conclusions
The results were not exactly what I expected. Clearly the I/O load during build is too low to affect performance in a significant way (except for Mac OS X with a slow USB drive). Anyway, USB2 itself has not proved to be the weak link in my tests.

Build OpenWRT Toolchain on Mac OS X

A very quick guide to building the OpenWRT buildroot or toolchain on Mac OS X (10.10).

1. Install Xcode
Install Xcode from App Store (it is free).

2. Install pkgsrc
I have used fink, macports and homebrew, but now that I have tried pkgsrc I don’t think I will consider any of the others for a while. Install pkgsrc the standard way. Note: there is an x86_64 version for Mac OS X – it is probably what you want – just replace i386 with x86_64 in the download link.

Using pkgsrc, install these packages required by OpenWRT:

$ sudo pkgin install getopt coreutils gawk gtar findutils

3. Case sensitive filesystem
Your root filesystem on your Mac is probably case insensitive, and that is supposed to cause problems when building OpenWRT. Get yourself a USB disk, or make a disk image and format it as case sensitive HFS+. If you do it from the command line you can avoid making it journaled:

$ diskutil eraseVolume hfsx OpenWRTdisk /dev/disk3s2
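
If you go for a disk image instead of a USB disk, hdiutil can create one that is case sensitive from the start (the size and names are just examples):

$ hdiutil create -size 20g -fs "Case-sensitive HFS+" -volname OpenWRTdisk openwrt.dmg
$ hdiutil attach openwrt.dmg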

4. Building
This assumes you want the toolchain from the current stable build (14.07):

$ git clone git://git.openwrt.org/14.07/openwrt.git
$ cd openwrt
$ scripts/feeds update -a
$ scripts/feeds install -a
$ make menuconfig

In menuconfig I made just two changes: 1) setting my target platform, 2) asking for the toolchain to be built. Make all the settings you want. I then ran:

$ make toolchain/install

You will now find your toolchain in staging_dir :)
If you had instead run just “make”, the entire OpenWRT firmware would have been built.

Working OpenVPN configuration

I am posting my working OpenVPN server configuration, and client configuration for Linux, Android and iOS. First a little background.

I have an OpenWRT (14.07) router running OpenVPN server. This router has a public IP address and thanks to dyn.com/dns it can be resolved using a domain name (ROUTER.PUBLIC in all configuration examples below).

My router LAN address is 192.168.8.1, the LAN network is 192.168.8.*, and the OpenVPN network is 192.168.9.* (in this range OpenVPN clients will be given an address for their vpn/tun-device). I run OpenVPN on TCP port 1143.

What I want to achieve is:
1) to access local services (like ownCloud and ssh) on computers on the LAN
2) to access the internet as if I were at home, when my internet access is somehow restricted

The Server
Essentially, this OpenWRT OpenVPN Setup Guide is very good. Follow it. I am not going to repeat everything, just post my working configurations.

root@breidablick:/etc/config# cat openvpn 

config openvpn 'myvpn'
	option enabled '1'
	option dev 'tun'
	option proto 'tcp'
	option status '/tmp/openvpn.clients'
	option log '/tmp/openvpn.log'
	option verb '3'
	option ca '/etc/openvpn/ca.crt'
	option cert '/etc/openvpn/my-server.crt'
	option key '/etc/openvpn/my-server.key'
	option server '192.168.9.0 255.255.255.0'
	option port '1143'
	option keepalive '10 120'
	option dh '/etc/openvpn/dh2048.pem'
	option push 'redirect-gateway def1'
	option push 'dhcp-option DNS 192.168.8.1'
	option push 'route 192.168.8.0 255.255.255.0'

It is a little unclear if the last three options really work for all clients. I also have:

root@breidablick:/etc/config# cat network 
.
.
.
config interface 'vpn0'
	option ifname 'tun0'
	option proto 'none'

and

root@breidablick:/etc/config# cat firewall 
.
.
.
config zone
	option name 'vpn'
	option input 'ACCEPT'
	option forward 'ACCEPT'
	option output 'ACCEPT'
	list network 'vpn0'
.
.
.
config forwarding
	option src 'lan'
	option dest 'vpn'

config forwarding
	option src 'vpn'
	option dest 'wan'
.
.
.
# may not be needed depending on your lan policies (next 2)
config rule
	option name 'Allow-lan-vpn'
	option src 'lan'
	option dest 'vpn'
	option target ACCEPT
	option family 'ipv4'

config rule
	option name 'Allow-vpn-lan'
	option src 'vpn'
	option dest 'lan'
	option target ACCEPT
	option family 'ipv4'
.
.
.
# may not be needed depending on your wan policy
config rule
	option name 'Allow-OpenVPN-from-Internet'
	option src 'wan'
	option proto 'tcp'
	option dest_port '1143'
	option target 'ACCEPT'
	option family 'ipv4'

iOS client
You need to install the OpenVPN client for iOS from the App Store. The client configuration is prepared on your computer and synced with iOS using iTunes (brilliant or braindead?). This is my working configuration:

client
dev tun
ca ca.crt
cert iphone.crt
key iphone.key
remote ROUTER.PUBLIC 1143 tcp-client
route 0.0.0.0 0.0.0.0 vpn_gateway
dhcp-option DNS 192.168.8.1
redirect-gateway def1

This route and redirect-gateway configuration makes all traffic go via VPN. Omit those lines if you want direct internet access.

Android client
For Android, you also need to install the OpenVPN client from the store. My client is “OpenVPN for Android” by Arne Schwabe. This client has a GUI that allows you to configure everything (but you need to get the certificate files to your Android device somehow). You can view the entire Generated Config in the GUI; mine looks like this (omitting GUI- and Android-specific stuff, and the certificates):

ifconfig-nowarn
client
verb 4
connect-retry-max 5
connect-retry 5
resolv-retry 60
dev tun
remote ROUTER.PUBLIC 1143 tcp-client
route 0.0.0.0 0.0.0.0 vpn_gateway
dhcp-option DNS 192.168.8.1
remote-cert-tls server
management-query-proxy

Linux client
I also connect Linux computers occasionally. The configuration is:

client
remote ROUTER.PUBLIC 1143
ca ca.crt
cert linux.crt
key linux.key
dev tun
proto tcp
nobind
auth-nocache
script-security 2
persist-key
persist-tun
user nobody
group nogroup
verb 5
# redirect-gateway local def1
log log.txt

Here the redirect-gateway is commented out, so internet traffic does not go via the VPN.

Certificates
The easy-rsa package and instructions in the OpenWRT guide above are excellent. You should have different certificates for different clients. One certificate can only be used for one connection at a time.
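
With the easy-rsa package (version 2.x at the time), generating them is roughly like this (a sketch from memory; the key names are examples, and the resulting files end up in the keys/ directory):

# build-ca                      (once: create your own CA)
# build-key-server my-server    (the server certificate and key)
# build-key iphone              (one certificate per client)
# build-key linux
# build-dh                      (the dh parameter file, size set in vars)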

Better configuration?
I don’t say this is the optimal or best way to configure OpenVPN – but it works for me. You may prefer UDP over TCP, and my reasons for running TCP are perhaps not valid for you. You may want different encryption or data compression options, different logging options and so on.

Nodejs v0.12.0 on (unsupported) PowerPC G4

Nodejs cannot be built for a G4 processor (PowerPC 7455, as found in pre-Intel Apple hardware) because of a few missing CPU instructions. IBM has made a Power/PowerPC port of V8 (the JavaScript engine of Nodejs), but it does not work with the G4.

However, there is a quite simple workaround that can probably work for other unsupported platforms (like the PowerPC G3) as well, although ARMv5 failed.

The solution is to emulate a supported (i386) CPU using Qemu. Qemu is capable of emulating an entire computer (qemu-system-i386) or just a single program/process (qemu-i386). The latter is what I do.

I am running Debian 7 on my G4 computer, and it comes with an old version of Qemu. It is old enough not to support the ‘futex’ system call (system call 240). My suggestion is to simply use Debian backports to install a much more recent version of qemu.

# Add to /etc/apt/sources.list
deb http://http.debian.net/debian wheezy-backports main

# Then run
$ sudo apt-get update
$ sudo apt-get -t wheezy-backports install qemu-user

Now you can use the command qemu-i386 to run i386 binaries. Download the i386 binary linux version of nodejs and extract it somewhere. I extracted mine in /opt and made a symlink to /opt/node for convenience.
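Something like this (the exact tarball name is an assumption; take the linux-x86 tarball that nodejs.org offers for v0.12.0):

$ cd /opt
$ sudo tar -xf ~/node-v0.12.0-linux-x86.tar.gz
$ sudo ln -s node-v0.12.0-linux-x86 node

Now: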

zo0ok@sleipnir:~$ qemu-i386 /opt/node/bin/node 
/lib/ld-linux.so.2: No such file or directory

Unless you want to build your own statically linked nodejs binary, you need to get a few libraries from an i386 linux machine. I put these in /opt/node/bin/lib:

zo0ok@sleipnir:/opt/node/bin/lib$ ls -l
total 3320
-rw-r--r-- 1 zo0ok zo0ok  134380 mar  3 21:02 ld-linux.so.2
-rw-r--r-- 1 zo0ok zo0ok 1754876 mar  3 21:13 libc.so.6
-rw-r--r-- 1 zo0ok zo0ok   13856 mar  3 21:06 libdl.so.2
-rw-r--r-- 1 zo0ok zo0ok  113588 mar  3 21:12 libgcc_s.so.1
-rw-r--r-- 1 zo0ok zo0ok  280108 mar  3 21:11 libm.so.6
-rw-r--r-- 1 zo0ok zo0ok  134614 mar  3 21:12 libpthread.so.0
-rw-r--r-- 1 zo0ok zo0ok   30696 mar  3 21:05 librt.so.1
-rw-r--r-- 1 zo0ok zo0ok  922096 mar  3 21:08 libstdc++.so.6

For your convenience, I packed them for you:
https://dl.dropboxusercontent.com/u/9061436/code/linux-i386-lib.tgz
These are from Xubuntu 14.04.1 i386. The original symlinks are eliminated and the files come from different lib-folders. I packed exactly what you need to run the precompiled node-v0.12.0 binary.

Now you should be able to actually run nodejs:

zo0ok@sleipnir:~$ qemu-i386 -L /opt/node/bin/ /opt/node/bin/node --version
v0.12.0

To make it 100% convenient I created /usr/local/bin/nodejs:

zo0ok@sleipnir:~$ cat /usr/local/bin/nodejs 
#!/bin/sh
qemu-i386 -L /opt/node/bin /opt/node/bin/node "$@"

Don’t forget to make it executable (chmod +x).

Performance is not amazing, but good enough for my purposes. It takes a few seconds to start nodejs, but once it is running it seems quite fast. I may post benchmarks in the future.