Performance comparison from Delphi 6, 7, 2007, XE4 and XE6

ab · 2014-06-09 19:47:22

There have recently been some articles comparing the performance of several versions of the Delphi compiler.

IMHO there won't be any definitive statement about this.
I'm always doubtful about any conclusion drawn from this kind of benchmark.
Performance is an iterative process, always a matter of circumstances, and implementation.

Circumstances of the benchmark itself.
Each benchmark will report only information about the process it measured.
What you compare is a limited set of features, most of the time running an idealized and simplified pattern, which shares nothing with a real-world process.

Implementation is what gives performance.
Changing the compiler will only give you a few percent of difference in timing.
Identifying the true bottlenecks of an application via a profiler, then changing the implementation of those bottlenecks, may give orders of magnitude of speed improvement.
For instance, multi-threading abilities can be achieved by mORMot regression tests.
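The "profile first, then change the implementation" advice above can be sketched as follows. This is only an illustration in Python (not Delphi, and not taken from the mORMot tests), using the standard `cProfile` and `pstats` modules; the names `slow_lookup`, `cheap_setup`, `workload` and `hottest_function` are hypothetical.

```python
import cProfile
import pstats


def cheap_setup(n):
    # Builds the test data; expected to cost almost nothing.
    return list(range(n))


def slow_lookup(items, wanted):
    # Deliberately naive: every membership test is a linear scan of a list.
    hits = 0
    for w in wanted:
        if w in items:
            hits += 1
    return hits


def workload():
    items = cheap_setup(2000)
    return slow_lookup(items, range(0, 4000, 2))


def hottest_function(func):
    # Run func under the profiler and return the name of the function
    # with the largest own time (time spent excluding its callees).
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    key = max(stats.stats, key=lambda k: stats.stats[k][2])
    return key[2]
```

Here the profiler points at `slow_lookup`, so that is the routine worth rewriting (for instance with a set instead of a list), whatever compiler or interpreter is in use.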

With our huge set of regression tests, we have at hand more than 16,500,000 individual checks, covering low-level features (like numerical and text marshaling) and high-level processes (like concurrent client/server and multi-threaded database access).

You will find here some benchmarks run with Delphi 6, 7, 2007, XE4 and XE6 under Win32, and XE4 and XE6 under Win64.
In short, all compilers perform at more or less the same speed.
Win64 is a little slower than Win32, and the fastest appears to be Delphi 7, using our enhanced and optimized RTL.

This is the forum thread of http://blog.synopse.info/post/2014/06/0 … E4-and-XE6

edwinsn · 2014-06-11 10:35:32

FPC/Lazarus is very promising!

Stefan · 2014-06-11 17:17:45

edwinsn wrote:

FPC/Lazarus is very promising!

Yeah, for over ten years now...

What I am missing in all these recent performance comparisons is an exact analysis of what makes the difference, if there is one, and which parts could be improved.
For example, you say D7 is fastest because of the optimized RTL. I see a difference of about 10%, give or take, between D7 and XE6.
Is it really the RTL that is making these 10%? Or is it something different? Maybe the RTL would make 20% but the generated code got 10% worse?

So imho without further analysis we can only guess and claim that poor code generation in some parts is "slowing it down" and that compiler X or Y is sooo much better.

Last edited by Stefan (2014-06-11 17:22:47)

ab · 2014-06-11 18:18:14

I guess that it is in fact, as you wrote, 20% better RTL, but 10% worse code generation (no inlining).

For Delphi 7, we used our enhanced RTL.
With the "standard" RTL, I guess that Delphi 2007 will be the fastest platform.

I would never write that compiler X or Y is much better.
As I wrote in the conclusion, there is no big difference in the generated asm.

Stefan · 2014-06-11 20:27:46

I know you don't but some people do when it comes to these comparisons.

Yes it did not get worse. But is that good? Other compilers are getting better or faster over time and we are happy if it doesn't get worse? I know that is not true.
And it is not only the compiler but also the RTL. I know there have been some (well, let's put it politely) bugs in some places that made one routine or another dead slow.

ab · 2014-09-02 07:33:51

From the Google+ discussion:

Bill Meyer wrote:

Interesting observations, Arnaud. To be fair, we have been stuck on unimproved CPU speeds, too. Parallelism would seem to be the only path for performance increases, other than through better compiler design and better optimizations. Your summary suggests that there has been little happening on those fronts, and now XE7 sticks its toes in the water on parallel loops.
Taking a larger view, I think that either people are taking the view that performance is sufficient (never true) or perhaps that there have been too many distractions from the pursuit of performance.
Each new version of Windows offers greater consumption of resources to paint the screen, but not so much in other areas. My Ivy Bridge i7 is nice, but in general, apps open in Win7 with about the same performance as Win2K apps did in Win2K. They are larger (much larger) and have more features, but increased performance is not usually one of them. For contrast, WinXP in a VM on the same machine is very fast. If I could bother myself to install NT4 in a VM, it might be stunningly fast.
Oddly, I think there is stagnation in tools development. Not with respect to features, but with respect to fundamentals. We get more embellishments, more chrome, but we have seen long-standing defects continue to exist, and the list of defects grow longer as the new features inevitably bring new defects.
In XE6 and XE7, has ErrorInsight been repaired? Has J# been removed from the environment, or is it retained? I have seen comments regarding IDE stability improvements, and those are one of my top concerns, languishing here in D2007. But in my last big project, in DXE, I found it necessary to shut down features, as a trade-off against stability.

Indeed.
My answer:

Arnaud wrote:

There were some great pieces of code like OmniThreadLibrary (which you know well, if I remember correctly). Will there be enhancements at the language level, like a new "yield" keyword? Or would something OTL-like be bound to the RTL?
What worries me about the multi-thread efficiency of future Delphi is that its low-level RTL was never freed from its locks (e.g. at FastMM4 level: they should offer an alternative for multi-core processing), and, on the contrary, some awful implementation patterns were introduced. For instance, the [weak] attribute uses a TMonitor giant lock, so if you use it (which is mandatory with ARC and circular references) your multi-core performance won't scale.
Delphi apps are still very fast to run, using low resources, when compared to C#, Java or JavaScript, for instance. IMHO this is one huge benefit of our beloved platform. Manual memory management was never a problem to me, especially for performance sensitive applications.
The #1 rule for performance optimization is not the compiler, the language, nor the RTL. The biggest speed improvement comes from profiling, then changing algorithms. A lot of developers see the compiler as some kind of "wizard" which would magically speed things up. They focus on code. Whereas, to quote Linus Torvalds: "Bad programmers worry about the code. Good programmers worry about data structures and their relationships." Adding a lookup table, caching some data, changing the algorithm, tuning your data structures, knowing how much heap allocation costs, using arrays to favor the L1 CPU cache, even using the Alt-F7 asm-level step-by-step debugging in the IDE... all this will give you performance for real-world applications. Then if it is not enough (e.g. for picture or video processing, or for a server with a lot of concurrent clients), you may start to think about parallelism.
Like you, I always shut down Error Insight and similar buggy features in the Delphi IDE. Otherwise, you just can't work with it... Even the RTL libraries are not properly handled by the Error Insight compiler.
The slow startup time of the IDE (mostly due to the license check code, AFAIK) is also a concern to me, since I have to restart it quite often. When I compare it with SmartMobileStudio, my second IDE these days, which opens and compiles a project as quickly as Notepad starts, and see how its IDE and compiler are improving every day, I feel a bit sorry for EMB.
For instance, there is only one compiler in SMS (whereas EMB fails to maintain all the diverse compilers embedded within Delphi): you can be sure that when the IDE offers you code completion, or show you errors, it will match reality. Just another world....
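As a small illustration of the "adding a lookup table" point in the reply above, here is a minimal sketch. It is written in Python rather than Delphi and is not code from mORMot; the names `popcount_naive`, `BYTE_POPCOUNT`, `count_bits_naive`, `count_bits_table` and `compare` are hypothetical. The only claim is that both versions return the same result; the actual speed ratio depends on the runtime and the data, so measure it yourself.

```python
import timeit


def popcount_naive(value):
    # Count the set bits of one non-negative integer by shifting it out.
    count = 0
    while value:
        count += value & 1
        value >>= 1
    return count


# Precomputed once: number of set bits for every possible byte value.
BYTE_POPCOUNT = [popcount_naive(b) for b in range(256)]


def count_bits_naive(data):
    # Recomputes the bit count of each byte on every call.
    return sum(popcount_naive(b) for b in data)


def count_bits_table(data):
    # Same result, but each byte costs a single table lookup.
    table = BYTE_POPCOUNT
    return sum(table[b] for b in data)


def compare(data, repeat=20):
    # Return (naive_seconds, table_seconds) for `repeat` runs of each version.
    naive = timeit.timeit(lambda: count_bits_naive(data), number=repeat)
    table = timeit.timeit(lambda: count_bits_table(data), number=repeat)
    return naive, table
```

Calling `compare(bytes(range(256)) * 100)` gives the two timings side by side. The change is in the data structure, not in the compiler, which is the point being made in the reply.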

turrican · 2015-09-02 20:09:32

Is there any mORMot benchmark comparing the Delphi compiler with the FPC compiler?

ab · 2015-09-03 07:39:48

It depends on the part of the process.

But running our regression tests after FPC compilation is slightly slower than with Delphi.
It is IMHO mostly due to FastMM4, shipped with Delphi, which is faster than the FPC heap manager in a single-threaded application. The FPC heap manager is cross-platform, written in Object Pascal, and uses a per-thread heap, whereas FastMM4 has tuned asm versions. On a multi-threaded server, the FPC heap manager may behave better, but mORMot tries to avoid most memory allocations, so perhaps the heap is not the bottleneck.
In production, we would hardly find a noticeable performance difference between the two compilers under Win32. For Win64, I found the FPC compiler to be more efficient than Delphi's.
For mobile/ARM platforms, FPC is usually more efficient than Delphi, mainly due to the RTL, not the compiler.

In practice, both are just fine, comparable in execution speed, with a slight advantage to Delphi.
