Tag Archives: Performance

Python vs C

A few days ago I wrote a post about how to convert a CSV list to a table. I spent some time writing my program in C (for fun and for performance).

Now I have written an equivalent program in Python, and I am going to present the differences.
First the source code: csv-list2table.c and csv-list2table.py.

About the C implementation
The C implementation uses only the standard C library (stdlib.h, stdio.h, string.h). The good thing is that the program is very portable and easy to compile. The bad thing is that I do not have access to powerful datatypes (hash tables, balanced trees, sets). This basically means wasting time reinventing the wheel, and not getting a perfect wheel anyway. My estimate is that the C implementation is O(n) for suitable input (fairly sorted data) and O(n·sqrt(n)) for unsuitable input (reversely sorted data). This is for fairly “square” matrices. The C implementation is not really optimized (perhaps a later exercise).

About the Python implementation
The Python implementation was written after the C implementation, and makes good use of the built-in Python set and dict datatypes. It should be O(n) or O(n log n), depending on the Python dict implementation.
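
As a rough illustration of how set and dict fit this task, here is a minimal sketch. It is not the csv-list2table.py source: it assumes each input line has the hypothetical form row,column,value, and the name list_to_table is made up for this example.

```python
import csv


def list_to_table(lines):
    """Pivot (row, column, value) records into a table (a list of rows).

    Assumption: every input line is "row,column,value".
    """
    rows, cols = set(), set()   # distinct row and column labels
    cells = {}                  # (row, column) -> value
    for row, col, value in csv.reader(lines):
        rows.add(row)
        cols.add(col)
        cells[(row, col)] = value

    col_order = sorted(cols)
    table = [[""] + col_order]  # header line, empty corner cell
    for row in sorted(rows):
        # Missing combinations become empty cells.
        table.append([row] + [cells.get((row, c), "") for c in col_order])
    return table


if __name__ == "__main__":
    for line in list_to_table(["a,x,1", "a,y,2", "b,x,3"]):
        print(",".join(line))
```

Filling the dict and the sets takes one pass over the input. Only the distinct labels are sorted, so for a fairly square table the sort is small compared to reading the data.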

The machine
The benchmarks are performed using the time command on an AMD Athlon II X2 250 machine with 4GB of RAM. The machine runs x64 Xubuntu 11.10.

The tests

| | C | Python |
|---|---|---|
| Environment | gcc 4.6.1, -O2 | Python 2.7.2+ |
| Lines of code (excl. help text) | 355 lines | 53 lines |
| Coding time | 8h | 1h |
| Execution time, 100×100 (288kB) | Sorted data: 0.014s, reversed data: 0.027s | 0.11s |
| Execution time, 200×200 (1.2MB) | 0.039s – 0.052s | 0.18s |
| Execution time, 400×400 (4.5MB) | 0.16s – 0.25s | 0.64s |
| Execution time, 800×800 (18MB) | 0.57s – 1.2s | 2.6s |
| Execution time, 1600×1600 (72MB) | 2.4s – 7.8s | 10s |
| Execution time, 3200×3200 (288MB) | 9.7s – 51s | 41s |
| Execution time, 6400×6400 (1.2GB) | 40s – 337s | Not enough RAM |
| RAM usage for 3200×3200 | 152MB | 1.4GB |
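
The complexity estimates can be checked roughly against the timings in the table above. The sketch below assumes that time grows as size**k, where size is the number of cells, and computes k for each doubling of the side length. Values near 1 suggest linear growth, and values near 1.5 suggest n·sqrt(n). The function name scaling_exponents is made up for this example, and the small inputs are dominated by startup overhead, so only the larger steps say much.

```python
import math

# Number of cells (rows x columns) and measured seconds, copied from the table.
CELLS = [side * side for side in (100, 200, 400, 800, 1600, 3200, 6400)]
C_SORTED = [0.014, 0.039, 0.16, 0.57, 2.4, 9.7, 40]
C_REVERSED = [0.027, 0.052, 0.25, 1.2, 7.8, 51, 337]
PYTHON = [0.11, 0.18, 0.64, 2.6, 10, 41]  # no 6400x6400 run (not enough RAM)


def scaling_exponents(sizes, times):
    """Return k for each consecutive pair of runs, assuming time ~ size**k."""
    pairs = list(zip(sizes, times))  # zip stops at the shorter list
    return [
        math.log(t2 / t1) / math.log(n2 / n1)
        for (n1, t1), (n2, t2) in zip(pairs, pairs[1:])
    ]


if __name__ == "__main__":
    runs = [("C sorted", C_SORTED), ("C reversed", C_REVERSED), ("Python", PYTHON)]
    for name, times in runs:
        print(name, [round(k, 2) for k in scaling_exponents(CELLS, times)])
```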

Conclusions
I draw the following conclusions:

  • Python is a very efficient language compared to C when it comes to producing code fast, in this case about 8x faster (1h instead of 8h)
  • Python is impressively fast, and even the startup overhead is reasonable
  • Python can beat C on performance when the C programmer has not found the optimal data type/algorithm (51s vs 41s for reversed data at 3200×3200). The Python datatypes make it easy to write code that is fast even for large data
  • A well-written C program uses about 10% of the RAM of the Python program, and is roughly 4x to 8x faster on suitable input. The small RAM requirement in particular is very valuable for many applications

It is tempting to optimize the C program to see if I can get 2x, 3x or 5x speedup.
It is also tempting to write a C program that uses GLib, to get access to well-implemented data types. How would it compare?

Railworks 3 Train Simulator 2012 performance tuning

A while ago I wrote a post about system requirements for Train Simulator 2012. The more I have been playing the simulator, the more I have had the feeling that I am missing something (because my visual experience is not the best possible, or even very impressive).

My computer is an AMD X2 250 (dual core 3.0GHz), 4GB RAM, GeForce GT 520, and Windows 7 64-bit edition. That is well above the requirements of the game, but far from state-of-the-art hardware. Perhaps I should buy a new computer?

I have been reading forums, and many people, many of them with very powerful hardware, have been complaining. A new computer, even on a high budget, is not guaranteed to run Train Simulator 2012 perfectly, it seems.

I decided to test different game settings, both measuring FPS and looking at screenshots. The parameters I have worked with are:

  • Screen resolution
  • Basic settings (Anti-aliasing)
  • Train Simulator 2012 Game Technology (on/off)
  • Master settings (Details)

I have been using Fraps to measure FPS. All FPS tests and screenshots are from the Bath Green Park to Templecombe scenario “A Day of Two Halves”.

Disappointed at 1280×1024
For a long while I have been playing the simulator at 1280×1024 (the native resolution of my display). With my computer, it is hard to achieve a reasonable FPS, as you can see in this table.

2012 Tech: OFF ON
Basic settings: Low Medium Low Medium
Master settings
– Lowest 50-70 33-35 25-35
– Medium 30-40 13-14 12-14
– Highest 30-40

I want 30 fps, nothing below 20 fps. So, I didn’t even bother to measure the left out cells. Obviously, at 1280×1024, I can choose between some details (like buildings), or some anti-aliasing. And there are too few meaningful combinations to really draw any conclusions.

More interesting at 800×600
I decided to test more combinations at minimum resolution (800×600), and also take screenshots to be able to decide what compromises are best. I believe with better hardware, you will get a similar pattern at higher resolution.

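FPS at 800×600 (columns: 2012 Tech / Basic settings, rows: Master settings)

| Master settings | OFF / Low | OFF / Medium | OFF / High | ON / Low | ON / Medium | ON / High |
|---|---|---|---|---|---|---|
| Lowest | 70 | 70 | 42 | 71 | 50 | 40 |
| Medium | 60 | 35 | 19 | 46 | 35 | 27 |
| Highest | 60 | 32 | 18 | 37 | 25 | 20 |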

Clicking a number should open the screenshot in a separate tab. This way, you can easily compare two different settings side by side and decide for yourself.

Conclusions
My conclusions are:

  • Resolution matters a lot when it comes to what settings you can use
  • Master setting: medium is enough, highest makes little difference
  • Basic settings: medium is a huge improvement over low
  • 2012 Tech: in some ways nicer, but in my opinion, not that much of an improvement

I think I will run the game on Medium/Medium in the future, with 2012 tech turned off, in 800×600 or 1024×768, depending on the scenario (North East Corridor seems heavier). And I might have overcome my desire to get a faster computer.

VMware Server 2 high CPU waiting time problem

The following observations are made on VMware Server 2.0.2, with a Debian 6.0 host and Windows Server 2008 R2 guests, all systems being x64. The host has a Core2Quad CPU and 8GB of RAM.

A previously healthy environment turned very slow after setting up another Windows guest on the same host. The Windows guest system “felt slow” in the way that the GUI froze for seconds, and that loading times were very slow. The Linux host appeared healthy (immediate command line response, no swap usage, plenty of disk available). The top command revealed very high “waiting time”. I first interpreted this as an I/O problem… but it was not.

The three Windows guests all had 2 virtual CPUs, making a total of 6 virtual CPUs running on four physical cores (I do believe it is four cores, not two cores with four threads, but I am not 100% sure). Assigning just one virtual CPU to each guest essentially solved the problems.

Conclusion: Avoid virtual SMP in VMware Server 2, unless you have “enough” physical cores.
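
The arithmetic behind “enough” can be written down as a small sketch. The function name vcpu_ratio is made up for this example, and the ratio is only a rough indicator, not a rule from VMware.

```python
def vcpu_ratio(vcpus_per_guest, physical_cores):
    """Return the total number of virtual CPUs divided by physical cores."""
    return sum(vcpus_per_guest) / physical_cores


if __name__ == "__main__":
    # Three guests on four cores: the slow setup, then the fixed one.
    print(vcpu_ratio([2, 2, 2], 4))
    print(vcpu_ratio([1, 1, 1], 4))
```

In my case the slow setup had more virtual CPUs than physical cores (a ratio above 1), and the working setup had fewer.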

Perhaps I will try a one-dual-cpu plus three-single-cpu configuration some day to see how that works.

Improve WiFi range and speed

I did some reading to understand what affects WiFi range, but it was hard to find really useful information. Especially, I wanted to know what range I could expect, measured in meters, under different conditions. Well, I did some experiments and here are the results.

The main base station is an Apple Airport Extreme basestation (the one introduced in 2004, without N capability). The range has been extended with WDS using either an Airport Express, or an Asus wl-520gu running Tomato firmware. An iPhone 4 was used to measure range and signal quality (yeah, the best tool for that job).

All experiments are made in a rural area with no other interfering WiFi networks.

Original setup
Originally, the range was extended with the Airport Express, extending the network from one house to another. The distance between the Airports was about 15 meters, going through a few wooden walls. The connection quality was very questionable, with lost connections, occasionally very bad bandwidth and sometimes the need to restart the network equipment. A mixed B/G network with no encryption was used.

wl-520gu with improved antenna
I got a 32cm long antenna (9dBi) for the wl-520gu. Replacing the Airport Express with this improved wl-520gu made a huge difference. The connection to the Airport Extreme was perfect.

I decided to walk around in the vicinity with my iPhone, and found good reception (3 bars out of 4) about 75 meters away. There were no obstacles between me and the house where the wl-520gu was located. As soon as I got behind a hill or something similar, the connection was completely lost.

It is worth noting that AirPort Extreme and Tomato can do WDS together (at least without encryption).

wl-520gu with standard antenna
Repeating the experiments above with the standard antenna, I got much worse results. At 30 meters away, I got worse reception on my iPhone than I had 75 meters away with the improved antenna. The included standard antennas are not the best ones for your equipment.

Experimenting with transmission power
Tomato allows you to set the WiFi transmission power (range 1-255mW). The default was 42mW and I tried raising it to 200mW. So far, I have not noticed any benefit from the higher transmission power, so I settled on 50mW. After all, this is the same frequency band that microwave ovens use.
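
To put these numbers in perspective, transmission power can be converted to dBm (decibels relative to 1mW), which is the same logarithmic scale that antenna gain in dBi adds to. The sketch below uses the standard formula dBm = 10·log10(mW). The function names are made up for this example.

```python
import math


def mw_to_dbm(milliwatts):
    """Convert power in milliwatts to dBm (decibels relative to 1 mW)."""
    return 10 * math.log10(milliwatts)


def gain_db(old_mw, new_mw):
    """Return the change in dB when going from old_mw to new_mw."""
    return mw_to_dbm(new_mw) - mw_to_dbm(old_mw)


if __name__ == "__main__":
    for mw in (42, 50, 200):
        print("%d mW = %.1f dBm" % (mw, mw_to_dbm(mw)))
    print("42 mW -> 200 mW: %.1f dB" % gain_db(42, 200))
```

Going from 42mW to 200mW is a bit less than 7dB. One plausible reason that it made no visible difference is that more power only helps in one direction: the router is heard better, but the iPhone does not transmit any stronger.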

G more stable than B
Initially the network was in mixed B/G mode. I thought perhaps the network would be more stable and have better range in the slower B mode. Wrong! It turned out that G mode is not only faster, but also more stable over longer distances, than B.

Directional antennas
I have not experimented with directional antennas. But my 9dBi antenna gave me quite good range. Obstacles seem to matter much more than distance, at least for shorter distances (up to 75 meters).

On antenna connectors
There are different connectors for WiFi antennas. A very common one is RP-SMA (also called rev-SMA). ASUS uses this connector for their routers. The Linksys WRT54GL has another connector.

VMware Server 2.0 host filesystem performance

I manage a few Linux machines that run VMware Server 2.0.2. On those I have a few Windows Server guests.

A typical host has a quad-core Intel Core 2 processor, 8GB RAM, and separate drives for the system and for the virtual machines. It runs Debian (5.0 or 6.0) and VMware Server 2.0.2.

A typical guest could be a Windows Server 2008 with 2GB RAM, 24GB C-drive, 12GB E-drive.

To get decent filesystem performance on the hosts I have used XFS and split the VMware disk images into 2GB pieces. They have been allowed to grow dynamically.

Over time I have had the feeling that performance has grown worse, and it has never been very impressive. Different things have been tried. Finally, one of the hosts was reinstalled (Debian 6.0 instead of Debian 5.0), and btrfs was used instead of XFS. Horrible!

| Filesystem | 2GB Split | Growable | Performance |
|---|---|---|---|
| XFS | Yes | Yes | Questionable (at least after 12 months) |
| btrfs | No | No | Horrible: 30min until Windows replies to ping |
| btrfs | Yes | No | Bad: replies to ping in less than three minutes, but both the physical Linux and the virtual Windows experience I/O delays of a few seconds. Very un-snappy. |
| ext2 | No | No | Excellent! Fast boot. Snappy. |

Perhaps journaling filesystems have their advantages, but I make backups of all machines nightly and do not worry much about a filesystem crash. Also, ext2 can be considered fairly mature, proven and stable.

I will probably do some migration in the next few days (reformat some XFS as ext2). Maybe I will provide some properly quantified measurements. However, just moving the virtual machines and changing their format may fix problems with fragmentation, so it is hard to make a fair before-and-after test.

Windows 7 – faster and simpler graphics

Windows 7 is arguably the best client OS from Microsoft. It can be a little slow on old computers (or netbooks). When you have enjoyed the GUI for a few days and want more productivity, at the cost of some appearance, stop and disable this service:

  Desktop Window Manager Session Manager

It still looks like Windows 7, but it looks a bit less transparent and fat.

Disable Paging in Mac OS X

Virtual memory, paging or swap used to be a way to pretend your computer had more memory than it actually had. It made it possible to run programs that would otherwise not run at all. But it was never fast. Nowadays, when modern computers have at least 2GB of RAM, swap is rarely needed. In fact, you would rather have your OS use memory to cache the contents of your hard drive, not the other way around. Linux handles this beautifully: give it swap it does not need and it will not touch it; give it more memory than needed and it will make use of it. Mac OS X, however, does not use its swap (or paging) very nicely. I found a hack that disables paging entirely in Mac OS X. I like it. Try it if you have more memory than you need. Don't use all your memory ;)

#OFF
sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.dynamic_pager.plist

#ON
sudo launchctl load -wF /System/Library/LaunchDaemons/com.apple.dynamic_pager.plist

Works on 10.5, and probably more versions. You find your swap files in

/private/var/vm

and you can delete them after turning paging off.
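
Before deleting anything, it can be useful to see how much disk space the swap files take. This is a minimal sketch, assuming the swap files in that folder have names starting with swapfile (the folder may also contain other files, such as a sleep image, which are skipped). The function name swap_usage is made up for this example.

```python
import os


def swap_usage(vm_dir="/private/var/vm", prefix="swapfile"):
    """Return (file names, total size in bytes) of the swap files in vm_dir.

    Assumption: swap files are the entries whose names start with prefix.
    """
    names = sorted(n for n in os.listdir(vm_dir) if n.startswith(prefix))
    total = sum(os.path.getsize(os.path.join(vm_dir, n)) for n in names)
    return names, total


if __name__ == "__main__":
    names, total = swap_usage()
    print("%d swap file(s), %.1f MB" % (len(names), total / 1e6))
```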