Here is a strange question I have. Which of the following would perform better on an x86 system:

1. A program written in C, compiled to ARM machine code, then run on the x86 system via qemu.
2. A program written in Python, run using cpython (the standard Python interpreter) natively (no emulation) on the same CPU.

(The CPU architectures here are, of course, just examples; any two CPU architectures could be chosen for the comparison, and I wouldn't expect the result to be different.)

Another question I could ask, of course, is how the VMs of other programming languages (like Java and C#) would compare here.
Typically emulation takes 200x more power to do the same thing.

Interpreters aren't much better, but often the program is pre-compiled into bytecodes or symbols, which just require a look-up and then executing each individual operation.
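To picture that look-up-and-execute model, here's a toy stack-machine sketch (purely illustrative; a real interpreter like CPython has a far more elaborate eval loop and instruction set):

```python
# Toy stack-based bytecode interpreter: each instruction is just a
# look-up on the opcode followed by a small native operation.
def run(bytecode):
    stack = []
    for op, arg in bytecode:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack[-1]

# (2 + 3) * 4 "compiled" to toy bytecode
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("PUSH", 4), ("MUL", None)]
```

The per-instruction dispatch overhead is exactly where an interpreter loses time compared to native code, and also what a JIT eliminates.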

Of course full emulation also has the overhead of emulating hardware, while interpreters have access to the actual hardware (when present).

If there's JIT or recompiling to native code involved, then the emulation would probably still come out ahead of the interpreter, even if it's not running fully natively.

But then there are the architecture differences: x86 is CISC (complex instruction set) vs ARM's RISC (reduced instruction set), meaning lots of specialized instructions on one side and simpler instructions that take more of them to do a job on the other. That means if you converted an ARM program directly to x86 with no changes (other than registers and workarounds), it would be slower, because the ARM instructions run in one cycle each while most of the x86 ones require anywhere from 1 to 80 cycles.

Hmmm... might need some benchmarks...
rtcvb32: Of course full emulation also has to work around trying to emulate hardware as well, while interpreters have access to actual hardware (when present).
QEMU has user space emulation; if you use qemu-arm (as opposed to qemu-system-arm), you can emulate *just* the ARM CPU without having to emulate other hardware such as storage; this would get rid of the issue.

(Something like qemu-arm can, for example, be used to fix a broken Raspbian system that won't boot on the Pi due to a problem with the userland, or even just to install software if the Pi doesn't have a (good) internet connection, for example.)
rtcvb32: If there's JIT or recompiling to native code done, then the emulation would probably be superior even if it's not running fully natively.
QEMU does do JIT compiling, if you use the tcg backend (as opposed to, say, kvm, which isn't CPU emulation at that point). Then again, the VMs of many languages, Java being the most famous example, use JIT compilers (though these days I believe even JavaScript in web browsers is JIT compiled). Even Python has a JIT implementation (PyPy).
Post edited July 28, 2019 by dtgreene
In my experience, Python consumes all available resources, and more, and is slower than you would expect for nearly everything. Other people think you can use Python for low-level high-resolution system monitoring without incurring a penalty. Then again, some people (at a place I used to work, no less) claimed, via published research, that JIT can be faster than static compilation.

As with anything, it depends on what your program is intended to do, and how many resources are available. I would go for C -> emulation simply because I hate Python with a passion. On the other hand, you might be better off with a less generic emulator than qemu for better performance. Not that I can recommend any specific one, mind you.

edit: FYI, if you had suggested Lua, rather than Python, I would've said "Use Lua" even though I avoid tome4 specifically because it is coded entirely in Lua. This is not only because I like Lua more than Python, but also because luajit can help with performance. The best you can do with Python is probably pypy, which still sucks.
Post edited July 28, 2019 by darktjm
I did a bit of testing, using a (non-memoized) recursive Fibonacci sequence calculator to calculate the 32nd term in the sequence. Here are the results:

* C++ (via QEMU) is faster than native cpython 3 by a factor of almost 5x.
* Native pypy 3 is faster than emulated C++, though not by that much (only about 20-25%). It's worth noting, however, that the gap is still larger than the time it takes to run /bin/true via qemu, which rules out start-up time as the deciding factor.
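The benchmark described above is presumably the textbook naive recursion, something like this Python sketch (a reconstruction, since the exact code wasn't posted; the `time.perf_counter` harness is my own assumption):

```python
import time

def fib(n):
    # Deliberately naive (non-memoized) recursion: the millions of
    # calls make the run time dominated by per-call interpreter or
    # emulator overhead, which is exactly what's being compared.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

start = time.perf_counter()
result = fib(32)
elapsed = time.perf_counter() - start
print(f"fib(32) = {result}, took {elapsed:.3f}s")
```

Running the same algorithm under cpython, pypy, and a QEMU-emulated native build is what produces the ratios quoted above.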
rtcvb32: Typically emulation takes 200x more power to do the same thing.

Interpreters aren't much better, but often the program is pre-compiled into bytecodes or symbols, which just require a look-up and then executing each individual operation.

Of course full emulation also has the overhead of emulating hardware, while interpreters have access to the actual hardware (when present).

If there's JIT or recompiling to native code involved, then the emulation would probably still come out ahead of the interpreter, even if it's not running fully natively.

But then there are the architecture differences: x86 is CISC (complex instruction set) vs ARM's RISC (reduced instruction set), meaning lots of specialized instructions on one side and simpler instructions that take more of them to do a job on the other. That means if you converted an ARM program directly to x86 with no changes (other than registers and workarounds), it would be slower, because the ARM instructions run in one cycle each while most of the x86 ones require anywhere from 1 to 80 cycles.

Hmmm... might need some benchmarks...
The actual difference between native and emulated performance seems to be about 8x-10x in my Fibonacci tests (this time using the Python version, so one test involves emulating the interpreter itself).

In case it matters, the CPU in question used for the tests is a modern Celeron, and I was actually emulating x86_64 rather than ARM.

Maybe I need to do these tests on a Raspberry Pi, to see if the results are any different for that architecture (and to compare how fast ARM emulation is compared to x86).
Post edited July 29, 2019 by dtgreene