#1 2021-12-09 05:44:39

uian2000
Member
Registered: 2014-05-06
Posts: 69

Two issues about TZipRead

Hi, ab

I'm working on a project extracting zipped files online and found two issues about TZipRead.

1. TZipRead demands far too large a WorkingMem to extract content files when the file info is stored in a data descriptor.
I must set WorkingMem to the full file size to make it run; even half of that fails.

TZipRead.Create(BufZip: PByteArray; Size: PtrInt; Offset: Int64);
//  ...
    if e^.localoffs >= Offset then
    begin
      // can unzip directly from existing memory buffer
      e^.local := @BufZip[Int64(e^.localoffs) - Offset];
      with e^.local^.fileInfo do
        if flags and FLAG_DATADESCRIPTOR <> 0 then
          // crc+sizes in "data descriptor" -> call RetrieveFileInfo()
          if (zcrc32 <> 0) or
             (zzipSize <> 0) or
             (zfullSize <> 0) then
            raise ESynZip.CreateUtf8('%.Create: data descriptor (MacOS) with ' +
              'sizes for % %', [self, e^.zipName, fFileName]);
//  ...

In the constructor, BufZip must reach back as far as the first local file header in order to set up Entry.local; otherwise we must call RetrieveFileInfo to get local.

function TZipRead.RetrieveFileInfo(Index: integer;
  out Info: TFileInfoFull): boolean;
//  ...
  if e^.local = nil then
  begin
    local.DataSeek(fSource, e^.localoffs + fSourceOffset);
    if local.fileInfo.flags and FLAG_DATADESCRIPTOR <> 0 then
      raise ESynZip.CreateUtf8('%: increase WorkingMem for data descriptor ' +
        '(MacOS) support on % %', [self, e^.zipName, fFileName]);
    Info.localfileheadersize := local.Size;
  end
  else
  begin
    Info.localfileheadersize := e^.local^.Size;
    if e^.local^.fileInfo.flags and FLAG_DATADESCRIPTOR <> 0 then
//  ...

But in RetrieveFileInfo() an exception is raised precisely because Entry.local is nil!

Maybe we should try to set up Entry.local first, since that step was skipped in the constructor?
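
For readers unfamiliar with the flag involved, here is a minimal sketch of the ZIP local file header layout. It is written in Python rather than Pascal so that it does not guess at mORMot internals; the helper names (read_local_header, make_local_header) are hypothetical and only illustrate the format. When general purpose bit 3 is set, the crc and sizes in the local header are zero and the real values follow the compressed data in the data descriptor.

```python
import struct

LOCAL_SIG = 0x04034B50          # "PK\x03\x04" local file header signature
FLAG_DATADESCRIPTOR = 0x0008    # general purpose bit 3
LOCAL_FMT = "<IHHHHHIIIHH"      # fixed 30-byte part of a local file header
LOCAL_SIZE = struct.calcsize(LOCAL_FMT)


def read_local_header(buf, offset):
    """Parse the fixed part of the local file header found at `offset`."""
    (sig, _version, flags, _method, _time, _date,
     crc32, zip_size, full_size,
     name_len, extra_len) = struct.unpack_from(LOCAL_FMT, buf, offset)
    if sig != LOCAL_SIG:
        raise ValueError("no local file header at offset %d" % offset)
    return {
        # True means crc32/zip_size/full_size below are placeholders (zero)
        "has_descriptor": bool(flags & FLAG_DATADESCRIPTOR),
        "crc32": crc32,
        "zip_size": zip_size,
        "full_size": full_size,
        # compressed data starts after the variable-length name and extra field
        "data_offset": offset + LOCAL_SIZE + name_len + extra_len,
    }


def make_local_header(name, flags, crc32=0, zip_size=0, full_size=0):
    """Build a local file header (hypothetical helper, for experiments only)."""
    return struct.pack(LOCAL_FMT, LOCAL_SIG, 20, flags, 8, 0, 0,
                       crc32, zip_size, full_size, len(name), 0) + name
```

A reader that only holds the tail of the archive in memory cannot apply this parsing to the first entries without seeking back in the file, which is the situation described above.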

2. Sometimes the charset of the file names is not set correctly in zip files; in that case TZipRead.NameToIndex will not work well.
Can I specify a default encoding when I open a file, and have it used (UTF8, for instance) instead of OemToFileName when the ansi7 detection fails?

Best regards.

Last edited by uian2000 (2021-12-09 05:45:22)

#2 2021-12-09 08:05:55

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,659

Re: Two issues about TZipRead

1. This is because the ZIP was created on Mac, I guess.
There is a limitation about those files.
Can you propose a pull request?

2. We had a lot of discussion and some fixes about charset of filenames.
It appears that it is not very well defined, and sometimes some files do not follow the APPNOTE.
Check e.g. https://synopse.info/forum/viewtopic.php?id=6052
What do you propose?

#3 2021-12-09 09:49:19

uian2000
Member
Registered: 2014-05-06
Posts: 69

Re: Two issues about TZipRead

1. I'll do some tests and make a PR if I can fix it.

2. For TZipRead only, I think adding a FaverEncode param to the constructor might be a good option.
Most of the time, a zip file is built with one single charset, so letting the user fix non-standard files does make sense.
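
To make the proposal concrete, here is a rough sketch of the decoding order I have in mind, in Python so that it does not depend on mORMot. The function name and the fallback parameter are hypothetical; only the UTF-8 flag (general purpose bit 11) and the CP437 default come from the ZIP format itself.

```python
FLAG_UTF8 = 0x0800  # general purpose bit 11: the name is stored as UTF-8


def decode_zip_name(raw, flags, fallback="cp437"):
    """Decode a raw entry name, using a caller-preferred fallback encoding."""
    if flags & FLAG_UTF8:
        # the archive declares UTF-8 explicitly, so trust it
        return raw.decode("utf-8")
    if all(b < 0x80 for b in raw):
        # pure 7-bit name: every candidate encoding agrees
        return raw.decode("ascii")
    try:
        # the flag is missing and the name is not 7-bit: try the user's choice
        return raw.decode(fallback)
    except UnicodeDecodeError:
        # last resort: the OEM code page, which maps every byte
        return raw.decode("cp437")
```

With fallback="utf-8", a name written as UTF-8 by a tool that forgot to set bit 11 is still decoded properly, and a genuine OEM name still falls through to CP437.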

#4 2021-12-25 10:05:20

uian2000
Member
Registered: 2014-05-06
Posts: 69

Re: Two issues about TZipRead

I've made a PR, https://github.com/synopse/mORMot2/pull/69, please see if it works.

Regards.

#5 2021-12-25 17:55:35

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,659

Re: Two issues about TZipRead

Thanks

I will look into it in the next days!

#6 2021-12-28 09:58:19

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,659

Re: Two issues about TZipRead

Please try https://github.com/synopse/mORMot2/comm … 2f64b38f1c

I tried to reduce the memory consumption.
The pull request reads the whole file into memory just to read the last few bytes, which may be resource-consuming.

#7 2021-12-28 14:14:30

uian2000
Member
Registered: 2014-05-06
Posts: 69

Re: Two issues about TZipRead

I couldn't find a good size to reduce the memory use, so I'll try this one.

Thanks, ab.

#8 2021-12-28 16:57:24

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,659

Re: Two issues about TZipRead

The issues you reported should be fixed by https://github.com/synopse/mORMot2/commit/316f3c01

#9 2021-12-29 11:30:36

uian2000
Member
Registered: 2014-05-06
Posts: 69

Re: Two issues about TZipRead

Hi, ab.

Thanks for your fix; it is efficient and does work for me.

I've dug up a new issue.

According to TZipRead.Create(Buf...), a directory is not counted as an Entry.

constructor TZipRead.Create(BufZip: PByteArray; Size: PtrInt; Offset: Int64);
...
    if P[-1] = fZipNamePathDelim then
    begin
      h := hnext;
      continue; // ignore void folder entry
    end;
...

But when we need to search for the data descriptor of a file that is followed by a directory, the result will be the descriptor of that directory, not of the file.

[local file header n] (file n) <-- Entry[n].localoffs
[zipped file data n]
[data descriptor n]
[local file header n+1] (directory after target file)
[zipped file data n+1]
[data descriptor n+1]
[local file header n+2] (file under nearby directory)  <-- Entry[n+1].localoffs
[zipped file data n+2]
[data descriptor n+2]

In this case RetrieveFileInfo will return false.
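
A small sketch of the layout problem, in Python and independent of mORMot. It assumes a non-ZIP64 archive whose entries are stored contiguously, and the function names are hypothetical; this is one possible approach, not what TZipRead does. The idea is to take the boundary from the offsets of all local headers, directories included, and to read the descriptor just before that boundary.

```python
import struct

DESCRIPTOR_SIG = 0x08074B50  # optional "PK\x07\x08" data descriptor signature


def next_boundary(all_local_offsets, local_offset, central_dir_offset):
    """Offset where the area of the entry starting at `local_offset` ends.

    `all_local_offsets` must include directory entries; if they are left out,
    the boundary jumps past the directory and the wrong bytes are read.
    """
    later = [o for o in all_local_offsets if o > local_offset]
    return min(later) if later else central_dir_offset


def read_descriptor(buf, boundary):
    """Read (crc32, zip_size, full_size) stored just before `boundary`."""
    sig, crc32, zip_size, full_size = struct.unpack_from(
        "<IIII", buf, boundary - 16)
    if sig != DESCRIPTOR_SIG:
        # the signature is optional: fall back to the 12-byte form
        crc32, zip_size, full_size = struct.unpack_from(
            "<III", buf, boundary - 12)
    return crc32, zip_size, full_size
```

With only the file offsets, the boundary of file n would be the header of file n+2, so the bytes read would belong to the directory entry in between.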

Regards

Last edited by uian2000 (2021-12-29 11:38:21)

#10 2021-12-29 16:15:19

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,659

Re: Two issues about TZipRead

I thought a directory has no data, so no descriptor.

Anyway, I have tried to fix MacOS / DataDescriptor ZIP with folders in https://github.com/synopse/mORMot2/commit/c019517d

#11 2022-01-02 10:13:16

uian2000
Member
Registered: 2014-05-06
Posts: 69

Re: Two issues about TZipRead

I have tried this commit, and it truly works.
Thanks for your great work!

Last edited by uian2000 (2022-01-02 10:14:57)
