#1 Re: Low level and performance » FPC call c library speed very slow on aarch64 linux » 2024-03-01 15:31:38

Thanks, @mpv. I generated an ASM file (-al -sh -st) using Lazarus. I have almost forgotten assembly, sadly. Please help me analyze it and see which factors affect efficiency.
The issue described in the topic title does occur on my machine.
However, uploading files to the forum seems to be prohibited?
My zip attachment includes the asm file, the C library file, and the lpr file.
The C library code is now:

//----------------------------------
#include <stdint.h>
#include <time.h>
#include <stdio.h>

static const int8_t cabac_context_init_I[1024][2] = {}; //content not important

void for_loop_time()
{
    int ii, tmp, pre;

    clock_t t1 = clock();

    /* calculate pre-state */
    for( ii= 0; ii < 1024; ii++ ) {
        pre = 2*(((cabac_context_init_I[ii][0] * 35) >>4 ) + cabac_context_init_I[ii][1]) - 127;

        pre^= pre>>31;
        if(pre > 124)
            pre= 124 + (pre&1);

        tmp =  pre;
    }

    printf("for_loop_time space %d %d\n", (int)(clock() - t1), tmp);
}

#2 Low level and performance » FPC call c library speed very slow on aarch64 linux » 2024-03-01 08:16:20

vster
Replies: 2

Calling even a simple C library from FPC is very inefficient.
My test code is below. One would expect it to execute very quickly, in a few microseconds at most,
but in reality it takes 35 microseconds (35 clock_t ticks).

1. C library code, saved as forloop.c

     //----------------------------------------------------------------------
     #include <stdint.h>
     #include <time.h>
     #include <stdio.h>
   
     void for_loop_time()
     {
         int i, tmp = 0;
         clock_t t1 = clock();

         for (i = 0; i < 1024; i++)
         {
             tmp += i;
         }
       
         printf("space %d clock_t value %d \n", (int)(clock() - t1), tmp); // print tmp so gcc cannot optimize the loop away
     }
     //-------------------------------------------------------------------------

    Compile with gcc to produce forloop.o:
    gcc -c ./forloop.c -O3

2. In Lazarus, create a simple program that statically links forloop.o
   
    //----------------------------------------
    program testforloop;
   
    {$link ./forloop.o}
    {$linklib c}
   
    uses
       CTypes, sysutils;

    procedure for_loop_time; cdecl; external;

    begin
        for_loop_time;
    end.   
    //----------------------------------------
   
   Running this FPC program prints "35 clock_t" on my aarch64 Linux machine (glibc version 2.28).
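
A single measurement like the one above may include one-time costs, and at -O3 a compiler may fold such a simple summation into a constant, so the 35 ticks are not necessarily spent in the loop itself. One way to narrow this down is to time the same workload several times in a row and compare the first run with the later ones. The following is only a sketch under assumptions: the helper names `elapsed_ns`, `sum_loop` and `time_runs` are hypothetical (they are not part of the original test), and `clock_gettime(CLOCK_MONOTONIC)` is swapped in for `clock()` to get wall-clock time at nanosecond resolution.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Nanoseconds between two CLOCK_MONOTONIC samples. */
static int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}

/* Same workload as the forum test: sum of 0..n-1.
 * The volatile accumulator keeps the compiler from folding the loop away. */
static int sum_loop(int n)
{
    volatile int tmp = 0;
    for (int i = 0; i < n; i++)
        tmp += i;
    return tmp;
}

/* Run the workload `runs` times and store each duration in out_ns[].
 * Returns the result of the last run. (Hypothetical helper.) */
static int time_runs(int n, int runs, int64_t *out_ns)
{
    int result = 0;
    for (int r = 0; r < runs; r++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        result = sum_loop(n);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        out_ns[r] = elapsed_ns(t0, t1);
    }
    return result;
}
```

Calling `time_runs(1024, 5, ns)` from both the FPC program and a plain C program, then printing the five values, would show whether only the first call is slow (pointing at start-up effects) or every call is slow.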

#3 Re: Low level and performance » decode video, Why is efficiency very low? » 2024-02-28 13:13:49

Thanks, ab.
My approach is to write the timing code into the ffmpeg C files and compile them into libraries such as libavcodec.so.
The Lazarus demo then loads libavcodec.so through LoadLibrary, runs it,
and prints the timing results. The testing process is the same for Qt and Lazarus, and the timing code is executed in both cases,
but the timing result for Qt is fine, while the Lazarus result is too slow.

#4 Re: Low level and performance » decode video, Why is efficiency very low? » 2024-02-27 06:38:41

Further debugging shows that ff_h264_init_cabac_states spends most of its time in the for loop.

#5 Re: Low level and performance » decode video, Why is efficiency very low? » 2024-02-27 05:54:42

Under Qt, ff_h264_init_cabac_states costs 6 clock_t ticks, while under FPC it costs 36.
@ab, @all, please help me. Thanks.

#6 Re: Low level and performance » decode video, Why is efficiency very low? » 2024-02-27 05:49:20

I debugged the ffmpeg source code of avcodec_decode_video2. The files involved are mainly h264dec.c, h264_slice.c, and h264_cabac.c.
The main call chain is h264_decode_frame -> decode_nal_units -> decode_slice.
The function that takes the longest is decode_slice. Its internal flow, in brief:
ff_init_cabac_decoder -> ff_h264_init_cabac_states -> ff_h264_decode_mb_cabac.

In the FPC environment, ff_h264_init_cabac_states and ff_h264_decode_mb_cabac
run at only 1/6 of the speed they reach in the Qt environment.

The source code of ff_h264_init_cabac_states is posted below; it spends most of its time in the for loop. Likewise, ff_h264_decode_mb_cabac is called
inside a large for loop, because an H.264 frame is composed of many macroblocks and ff_h264_decode_mb_cabac decodes only a single macroblock.

void ff_h264_init_cabac_states(const H264Context *h, H264SliceContext *sl)
{
    int ii;
    const int8_t (*tab)[2];
    const int slice_qp = av_clip(sl->qscale - 6*(h->ps.sps->bit_depth_luma-8), 0, 51);

    if (sl->slice_type_nos == AV_PICTURE_TYPE_I)
          tab = cabac_context_init_I;
    else                               
          tab = cabac_context_init_PB[sl->cabac_init_idc];

    //calculate pre-state
    for( ii= 0; ii < 1024; ii++ ) {
        int pre = 2*(((tab[ii][0] * slice_qp) >>4 ) + tab[ii][1]) - 127;

        pre^= pre>>31;
        if(pre > 124)
            pre= 124 + (pre&1);

        sl->cabac_state[ii] =  pre;
    }
}
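
To make the arithmetic in the loop body easier to follow, here is the per-entry computation pulled out into a standalone function. This is only an illustrative sketch: the name `cabac_pre_state` is hypothetical and does not exist in ffmpeg, and the input values used below are arbitrary examples, not a claim about the real table contents.

```c
#include <stdint.h>

/*
 * Pre-state computation for one context, following the loop body quoted above.
 * m, n:     the two int8_t entries of one table row (tab[ii][0], tab[ii][1])
 * slice_qp: the clipped QP, in the range 0..51
 *
 * Note: the >> on negative values relies on arithmetic right shift, which is
 * implementation-defined in C (gcc and clang behave this way), and on a
 * 32-bit int.
 */
static int cabac_pre_state(int8_t m, int8_t n, int slice_qp)
{
    int pre = 2 * (((m * slice_qp) >> 4) + n) - 127;

    pre ^= pre >> 31;            /* negative values become ~pre, i.e. -pre - 1 */
    if (pre > 124)
        pre = 124 + (pre & 1);   /* clamp to 124 or 125, keeping the low bit */

    return pre;
}
```

For example, with m = 20, n = -15 and slice_qp = 35: 20 * 35 = 700, 700 >> 4 = 43, 43 - 15 = 28, 2 * 28 - 127 = -71, and the XOR step turns -71 into 70. Each iteration is a handful of integer operations and one table read.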

#7 Re: Low level and performance » decode video, Why is efficiency very low? » 2024-02-20 09:33:24

It is confusing: why does calling the FFmpeg C library from Qt work fine?
My confidence in Pascal has wavered a bit.

#8 Re: Low level and performance » decode video, Why is efficiency very low? » 2024-02-20 09:21:47

Using GetTickCount for simple timing, avcodec_decode_video2 took 70 milliseconds, mainly spent decoding H.264.

#9 Re: Low level and performance » decode video, Why is efficiency very low? » 2024-02-20 09:17:18

Thanks, ab. I raised this question on the Lazarus forum before, but there has been no result so far.
https://forum.lazarus.freepascal.org/in … 953.0.html

#10 Re: Low level and performance » decode video, Why is efficiency very low? » 2024-02-20 03:44:10

The ffmpeg and SDL2 dynamic libraries are both loaded in the standard FPC way.
The key point is that avcodec_decode_video2 takes too long.

#11 Re: Low level and performance » decode video, Why is efficiency very low? » 2024-02-20 03:22:40

I suspect it may be related to how FPC compiles for arm64.
I feel it has little to do with the Lazarus/FPC version. Currently I am using Lazarus 2.2.2 + FPC 3.2.2.

#12 Re: Low level and performance » decode video, Why is efficiency very low? » 2024-02-20 03:13:20

Thank you, ab. I should explain more clearly.
1.   The goal is to use FPC to decode 1080p video files.
2.   The issue I am facing is that on an Ubuntu-based aarch64 Linux platform,
      decoding by calling the FFmpeg C library from FPC is too slow: 70 milliseconds per frame.
      For comparison, I used Qt Creator with the same FFmpeg C library and the same calling method, which gives 10 milliseconds per frame. That is fine.
3.   In my past work I mostly used Windows and Linux, both on x86.
      There I called the FFmpeg C library from FPC and completed this task without problems.
4.   I placed the code at FTP://1: Test1234@121.40.151.139 /. Please refer to readme.txt.
5.   If necessary, please let me know and I can open TeamViewer for remote control.

#13 Low level and performance » decode video, Why is efficiency very low? » 2024-02-19 08:45:34

vster
Replies: 13

Dear ab and gurus,
I am using FPC to call the FFmpeg C library on an aarch64 Linux PC, aiming to decode 1080p video.
Result: 70 milliseconds per frame.
Why is the efficiency so low? Normally it should be 10 milliseconds per frame.
On the same machine, using Qt Creator to call the FFmpeg C library for the same task gives an OK result.
My PC runs Ubuntu-based aarch64 Linux.
If necessary, I can open TeamViewer for remote control.

#14 Re: mORMot 2 » sqlite build fails on aarch64 » 2023-11-06 15:02:34

@PierceNg, thanks. The glibc version on Ubuntu 22.04 may be higher than 2.27.

#15 Re: mORMot 2 » sqlite build fails on aarch64 » 2023-11-06 14:59:18

@ab, I do trust the same static libraries.
The solution I have taken for now is to carefully upgrade glibc from 2.27 to 2.28; as a result, the problem has disappeared.
Thank you, ab, for contributing such excellent work.

#16 Re: mORMot 2 » sqlite build fails on aarch64 » 2023-11-02 09:40:28

I am on x86_64 Ubuntu 14.04 (glibc 2.19), where compilation is normal.

But I need to statically link sqlite3.o on aarch64 Linux.
What should I do so that it works without reporting the error "undefined reference to `fcntl64'"?

#17 mORMot 2 » sqlite build fails on aarch64 » 2023-11-02 09:31:30

vster
Replies: 5

I used TRestServerDB from mORMot 2, but compilation fails with "undefined reference to `fcntl64'".

Error message:

Error: /home/nvidia/desktop/component/mORMot2-2.1.stable/mORMot2-2.1.stable/static/aarch64-linux/../../static/aarch64-linux/sqlite3.o:(.data+0x268): undefined reference to `fcntl64'

My Env:

Machine: NVIDIA Xavier NX (aarch64-linux)

Operating system: Ubuntu 18.04

Glibc version: ldd (Ubuntu GLIBC 2.27-3ubuntu1.5) 2.27

Development tools: Free Pascal, Lazarus

mORMot version: mORMot2-2.1.stable

I looked into it and found that fcntl64 is only available starting with glibc 2.28.

But upgrading glibc may affect other programs on the machine.

Is there a way to avoid the "undefined reference to `fcntl64'" error?

Or can sqlite3.o be linked against `fcntl' instead?


Thank you!
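
One workaround sometimes used for a missing `fcntl64` symbol on an older glibc is a small C shim that forwards to plain `fcntl`. The sketch below only illustrates the forwarding idea under stated assumptions: it targets 64-bit Linux, where `fcntl` already works with 64-bit file offsets, and it has not been tested against mORMot's sqlite3.o. The function is named `compat_fcntl64` (a hypothetical name) so that it compiles on any glibc; an actual shim would have to export the symbol `fcntl64` itself and be linked alongside sqlite3.o.

```c
#include <fcntl.h>
#include <stdarg.h>

/*
 * Hypothetical shim: forwards every call to plain fcntl().
 * ASSUMPTION: 64-bit Linux, where fcntl() already uses 64-bit file offsets,
 * so no translation of the lock structure is needed.
 * A real shim would export this under the name fcntl64.
 */
int compat_fcntl64(int fd, int cmd, ...)
{
    va_list ap;
    void *arg;

    va_start(ap, cmd);
    arg = va_arg(ap, void *);   /* fcntl takes at most one extra argument */
    va_end(ap);

    return fcntl(fd, cmd, arg);
}
```

Whether this is safe for every fcntl command SQLite issues has not been verified here. Upgrading glibc to 2.28, as reported elsewhere in this thread, avoids the question entirely.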
