Just open an account on Oracle Cloud and create a new compute VM: 4 ARMv8.2 cores at 3GHz and 24GB RAM (yes, 24GB).
This is an always-free VM (you can split these 4 cores and 24GB RAM across 1 to 4 VMs).
Install Ubuntu 20.04 server, then install LXDE and xrdp for remote access.
Now I have a nicely fast workstation. I installed fpcupdeluxe, then FPC 3.2.2 / Lazarus 2.0.12, all OK. The fpcup build is faster than my local PC build :-)
The mormot2 package compiles OK and mormot2_test builds OK.
mormot2test freezes at "2.7. Client server access:" and I must press ^C.
Log and screen stdout: http://halkyon.com/download/mormot2test.zip
This OCI VM can be a great mORMot application server for some projects. I don't have any connection to Oracle, I just tested their product.
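A minimal sketch for capturing a freeze like the one reported above without pressing ^C by hand, assuming GNU coreutils "timeout" is installed on the VM. The helper name run_with_limit and the binary path ./mormot2tests in the usage comment are hypothetical; adjust them to the actual test executable.

```shell
#!/bin/sh
# Sketch only: run a command under a time limit and capture its output in a log.
# run_with_limit is a hypothetical helper name, not part of mORMot or fpcupdeluxe.
run_with_limit() {
  limit=$1   # duration accepted by timeout, e.g. 600 or 10m
  log=$2     # file receiving both stdout and stderr
  shift 2
  # GNU timeout sends SIGTERM when the limit expires and then exits with 124.
  timeout "$limit" "$@" >"$log" 2>&1
}

# Usage (hypothetical binary name):
#   run_with_limit 10m mormot2tests.log ./mormot2tests || echo "stopped, see mormot2tests.log"
```

The log then holds everything printed up to the point where the run stalled, which is the part worth attaching to a report.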
Offline
Nice findings about Oracle Cloud!
FPC 3.2.2 has trouble with variant late binding, even on Intel/AMD.
https://synopse.info/forum/viewtopic.php?id=5894
But there seems indeed to be a problem with our SQLite3 / AARCH64 static linked content.
We will try to see what's wrong. Perhaps Alfred could help here also.
I just tried Oracle Cloud, but it needs a small credit card fee of 1€, which failed for my French Visa. I don't know why.
Anyway, I will boot up my Raspberry Pi and try it out!
Online
The 1€ is only for verification; try several times. I saw this problem in some YouTube videos.
Try Netherlands as home
Last edited by ttomas (2021-08-03 08:52:38)
Offline
After the VM boots, you need these commands:
sudo apt update
sudo apt upgrade
sudo apt install lxde
sudo apt install xrdp
sudo passwd ubuntu
# this sets the password for GUI login as the default ubuntu user
# Lazarus dependency
sudo apt install make binutils build-essential gdb subversion zip unzip libx11-dev libgtk2.0-dev libgdk-pixbuf2.0-dev libcairo2-dev libpango1.0-dev
Then wget the fpcupdeluxe aarch64 build and be happy
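Since the free allowance can be split over several VMs, the package steps above can be wrapped into one repeatable function. This is only a sketch: run and provision are hypothetical helper names, and apt-get with -y is swapped in for the interactive apt calls because it is the variant meant for scripts.

```shell
#!/bin/sh
# Sketch only: the package steps from the post as one repeatable function.
# run and provision are hypothetical helper names.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

provision() {
  run sudo apt-get update
  run sudo apt-get -y upgrade
  # Desktop and remote access
  run sudo apt-get -y install lxde xrdp
  # Lazarus dependencies, same list as in the post
  run sudo apt-get -y install make binutils build-essential gdb subversion \
    zip unzip libx11-dev libgtk2.0-dev libgdk-pixbuf2.0-dev libcairo2-dev \
    libpango1.0-dev
  # "sudo passwd ubuntu" stays a manual step because it prompts for the password.
}

# Usage:
#   DRY_RUN=1; provision    # print the commands only
#   DRY_RUN=0; provision    # actually install
```

The dry-run switch makes it possible to review the exact commands before touching a fresh VM.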
Yes, the RDP TCP port 3389 is not public; I use SSH port forwarding to my PC:
Linux: ssh -L 3389:localhost:3389 -i mykey ubuntu@myip
Windows: configure PuTTY under Connection > SSH > Tunnels
You can publish any port in the OCI config and the Ubuntu firewall, but that is not recommended for RDP :-)
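If local port 3389 is already taken (for instance by the local machine's own RDP service), the same tunnel can listen on another local port. A small sketch, assuming the stock OpenSSH client; tunnel_cmd is a hypothetical helper, and mykey / myip are the same placeholders as in the post.

```shell
#!/bin/sh
# Sketch only: build the ssh command for the RDP tunnel described in the post.
# tunnel_cmd is a hypothetical helper; key file and address are placeholders.
tunnel_cmd() {
  key=$1
  host=$2
  local_port=${3:-3389}   # local end of the tunnel; remote end stays on xrdp's 3389
  # -N: forward ports only, do not start a remote shell
  printf 'ssh -N -L %s:localhost:3389 -i %s ubuntu@%s\n' "$local_port" "$key" "$host"
}

# Usage: print the command, run it, then point the RDP client at localhost:13389
#   tunnel_cmd mykey myip 13389
```

Printing the command instead of running it keeps the helper harmless and easy to check before use.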
Offline
If we use the external SQLite3 library, I have almost 100% green, and amazing performance for ARM, much faster than on Raspberry Pi or mobile CPUs.
https://gist.github.com/synopse/60fe59d … b5d6294c39
Numbers are at the level of the Core i5 on my laptop.
Only problems at the SOA level remain, which we still need to investigate.
Online
I spent hours fighting the Lazarus debugger, with no luck discovering what is wrong with processing EchoRecord.
Fighting is the right term, because GDB just keeps crashing, whatever DWARF symbol level we use, and even with -O0.
If I change TServiceComplexCalculator.EchoRecord so that the record is returned through an "out" parameter instead of the function result, then there is no GPF any more.
So I guess we could just reject the use of such a result on AArch64 and require an "out" parameter instead.
Online
As an in-between test, you could add the EchoRecord function into mORMot1.18 and see if it works with that version.
(skip this post if it is already there)
Last edited by AOG (2021-08-03 18:40:37)
Offline
From the book:
Otherwise, the caller shall reserve a block of memory of sufficient size and alignment to hold the result. The
address of the memory block shall be passed as an additional argument to the function in x8. The callee may
modify the result memory block at any point during the execution of the subroutine (there is no requirement for
the callee to preserve the value stored in x8).
From the assembler:
TEST.SOA.CORE$_$TSERVICECOMPLEXCALCULATOR_$__$$_ECHORECORD$TCONSULTANAV$$TCONSULTANAV:
// [548] begin
stp x29,x30,[sp, #-16]!
mov x29,sp
// Var $self located in register x0
// Var $result located in register x8
It seems we have to take care of results passed in x8 !
Offline
What I don't understand is that https://github.com/ARM-software/abi-aa/ … r-passings states that any result > 16 bytes should move to x8 and not use regular registers.
But TServiceCustomAnswer is 24 bytes on AARCH64 and works in regular registers, whereas the documentation would rather put it into x8.
Whereas TConsultaNav is 26 bytes and seems to expect to be in x8...
The worst part is that if TConsultaNav is modified to take only 24 bytes... it also fails, whereas TServiceCustomAnswer has no problem...
And if I define it as such:
TConsultaNav = packed record
  Header: RawUtf8;
  Content: RawByteString;
  Status: cardinal;
  //MaxRows, Row0, RowCount: int64;
  //IsSqlUpdateBack, EOF: boolean;
end;
it is still 24 bytes, but it is passed in registers!
I am lost - perhaps it is an FPC bug?
Online
OK.
This is not an FPC bug, this is an FPC feature...
procedure tcpuparamanager.create_paraloc_info_intern(p : tabstractprocdef; side: tcallercallee; paras: tparalist; isvariadic: boolean);
  var
    hp: tparavarsym;
    i: longint;
  begin
    for i:=0 to paras.count-1 do
      begin
        hp:=tparavarsym(paras[i]);
        { hidden function result parameter is passed in X8 (doesn't have to
          be valid on return) according to the ABI
          -- don't follow the ABI for managed types, because
           a) they are passed in registers as parameters, so we should also
              return them in a register to be ABI-compliant (which we can't
              because the entire compiler is built around the idea that
              they are returned by reference, for ref-counting performance
              and Delphi-compatibility reasons)
           b) there are hacks in the system unit that expect that you can
              call
                function f: com_interface;
              as
                procedure p(out o: obj);
              That can only work in case we do not use x8 to return them
              from the function, but the regular first parameter register.
          As the ABI says this behaviour is ok for C++ classes with a
          non-trivial copy constructor or destructor, it seems reasonable
          for us to do this for managed types as well.}
At least it is clearly documented as such.
Online
Nice find. This is good to know! Let's try to use this "feature"...
Offline
Note that Delphi has another way of processing parameters:
if IsManaged(FReturnType) or ((FReturnType^.Kind in [tkRecord, tkMRecord]) and
   (GetTypeData(FReturnType)^.RecSize > 16)) then
begin
  FResultLoc := TParamLoc.Create(FReturnType, True, False, True);
  // In this case, ARM64 will use X8
  FResultLoc.FOffset := $10000008;
end;
So Delphi uses x8 every time for managed result types, whereas FPC never does for managed types...
Online
I have made a lot of enhancements to the AARCH64 support, on the Ampere/Oracle server.
Now all tests pass with statically linked SQLite3.
Please see the regression tests console output: https://gist.github.com/synopse/256190a … ed99f07da6
I have included optimized ARMv8 SIMD asm for crc32c, aes and sha256.
Numbers are even better than our optimized Intel/AMD SSE4.2/AES-NI code running on my Core i5 CPU: 20GB/s for crc32c, more than 1GB/s for AES, 1.5GB/s for SHA-256.
This Ampere CPU is clearly something, compared to the other ARM CPUs I tested.
The good news is that those ARMv8 optimizations should also benefit other CPUs, e.g. on the Android platform or Apple M1 laptops.
Note that I still need to publish the associated static .o files.
They will include the latest SQLite3 3.36.0 library on all targets.
Online
I just wrote a blog article about this experiment, and how we were pleased with this Ampere CPU and the Oracle Cloud platform.
Any feedback about https://blog.synopse.info/?post/2021/08 … AARM64-CPU
is welcome on this forum thread.
Online
Great finding and great job! However, my Visa credit card failed to register...
Delphi XE4 Pro on Windows 7 64bit.
Lazarus trunk built with fpcupdelux on Windows with cross-compile for Linux 64bit.
Offline
And that's why I offered to help spending their money by consuming their always-free VM
Offline
If you need support for further opcodes in the inline assembler then please report them. FPK for example added some more in fb7cdbef.
Also regarding GDB: someone could try to get FpDebug to support AArch64 too, as it appears to be more suitable and stable, e.g. on the x86 targets as well.
Free Pascal Compiler Core developer
Offline
@PascalDragon
There are a lot of opcodes missing, not just a few.
Currently, I am happy with our low-level C code, which gives very good performance results and is easy to maintain.
Yes, FpDebug may help for sure, as it does on Intel/AMD.
Online
Adding a bug report wouldn't be a bad idea nevertheless as this way we know that there is a need for more (and it won't get lost as easily).
Offline