#1 2021-11-16 16:06:53

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,539

SynZip with UTF8 file names inside archive

The current SynZip implementation always sets the "operating system of origin" field:
- in mORMot1, to 0 (MS-DOS);
- in mORMot2, to the ZIP_OS constant, which depends on the compilation target OS, but is 0 for the Windows target.

If the "operating system of origin" is 0, most clients (verified on Windows with WinRAR, 7-Zip and Explorer, and on Linux with the zipinfo tool) display a wrong file name even when the UTF-8 flag is set for it: they ignore the UTF-8 bit and display the name as ASCII.

I made a PR for mORMot1 (https://github.com/synopse/mORMot/pull/414) which forces the OS to Unix; for mORMot2 I propose setting ZIP_OS to 3 for Windows too.
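
To make the field concrete, here is a minimal sketch in Python using the standard zipfile module (an illustration only, not SynZip code). It writes one entry with a Cyrillic name and an explicit "operating system of origin", then reads the "version made by" word and the general purpose flags back from the central directory. Python's zipfile sets bit 11 ($0800, UTF-8 names) by itself for non-ASCII names; build_zip and central_dir_fields are hypothetical helper names.

```python
import io
import struct
import zipfile

UTF8_FLAG = 0x0800  # general purpose bit 11: file name is UTF-8 encoded


def build_zip(name, os_code):
    """Return an in-memory zip with one stored entry called `name`, whose
    "operating system of origin" (high byte of "version made by") is os_code."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo(name, date_time=(2021, 11, 16, 0, 0, 0))
        info.create_system = os_code  # e.g. 0 = MS-DOS, 3 = UNIX
        zf.writestr(info, b"hello")
    return buf.getvalue()


def central_dir_fields(data):
    """Return ("version made by", general purpose flags) of the first
    central directory entry (signature PK\\x01\\x02)."""
    pos = data.find(b"PK\x01\x02")
    # After the 4-byte signature: version made by, version needed, flags
    made_by, _needed, flags = struct.unpack_from("<HHH", data, pos + 4)
    return made_by, flags


if __name__ == "__main__":
    data = build_zip("папка/новыйФайл.txt", 3)
    made_by, flags = central_dir_fields(data)
    print(hex(made_by), bool(flags & UTF8_FLAG))
```

Running it should print 0x314 True: OS 3 (UNIX) in the high byte, spec version 20 (2.0) in the low byte, and the UTF-8 flag set.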

#2 2021-11-16 18:50:25

mpv
Member

Re: SynZip with UTF8 file names inside archive

I attached 2 files to the PR, both with Cyrillic file names inside: one created before the fix and one after it.

#3 2021-11-16 21:17:13

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,205

Re: SynZip with UTF8 file names inside archive

My guess is that 10 (Windows NTFS) should be used instead.
See https://pkware.cachefly.net/webdocs/cas … PPNOTE.TXT

4.4.2.2 The current mappings are:

         0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)
         1 - Amiga                     2 - OpenVMS
         3 - UNIX                      4 - VM/CMS
         5 - Atari ST                  6 - OS/2 H.P.F.S.
         7 - Macintosh                 8 - Z-System
         9 - CP/M                     10 - Windows NTFS
        11 - MVS (OS/390 - Z/OS)      12 - VSE
        13 - Acorn Risc               14 - VFAT
        15 - alternate MVS            16 - BeOS
        17 - Tandem                   18 - OS/400
        19 - OS X (Darwin)            20 thru 255 - unused
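
The table above only covers the high byte. A sketch of how the whole 16-bit "version made by" value is put together, per APPNOTE section 4.4.2: high byte = host system from the table, low byte = supported spec version (value / 10 is the major version, value mod 10 the minor). made_by and describe are hypothetical helpers, and only a few hosts from the table are included.

```python
# A few "operating system of origin" codes from APPNOTE.TXT 4.4.2.2.
APPNOTE_HOSTS = {0: "MS-DOS and OS/2", 3: "UNIX", 10: "Windows NTFS", 19: "OS X (Darwin)"}


def made_by(os_code, spec_version=20):
    """Pack "version made by": high byte = OS of origin,
    low byte = spec version * 10 (20 means 2.0)."""
    if not (0 <= os_code <= 255 and 0 <= spec_version <= 255):
        raise ValueError("both parts must fit in one byte")
    return (os_code << 8) | spec_version


def describe(value):
    """Split a "version made by" value into (host name, "major.minor")."""
    os_code, spec = value >> 8, value & 0xFF
    host = APPNOTE_HOSTS.get(os_code, f"code {os_code}")
    return host, f"{spec // 10}.{spec % 10}"


print(hex(made_by(10)))  # 0xa14, i.e. $0A14
print(describe(0x0314))  # ('UNIX', '2.0')
```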

#4 2021-11-16 21:22:27

ab
Administrator

Re: SynZip with UTF8 file names inside archive

I have merged your pull request for mORMot 1.

And set ZIP_OS = 10 on Windows for mORMot 2.

#5 2021-11-17 11:16:44

mpv
Member

Re: SynZip with UTF8 file names inside archive

It's strange: when I set the madeBy field to $0A14, zipinfo -v fileName.zip shows the OS as TOPS-20 (I didn't even know such an OS existed), not NTFS as the spec says, but Unicode names work in this case:

$ zipinfo -v utf8.zip 
....

Central directory entry #1:
---------------------------

  папка/новыйФайл.txt

  offset of local header from start of archive:   0
  file system or operating system of origin:      TOPS-20
  version of encoding software:                   2.0
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT

In the zipinfo sources (https://github.com/LuaDist/unzip/blob/m … fo.c#L1003), NTFS is 11 ($0B); the same zipinfo.c is in Apple's repository.
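
So the two tables disagree from code 10 on: APPNOTE says 10 = Windows NTFS and 11 = MVS, while zipinfo labels 10 as TOPS-20 and 11 as NTFS. A small Python lookup makes the mismatch explicit for the values tried in this thread; the zipinfo labels for 10 and 11 are copied from the outputs in this post, while the labels for 0 and 3 are assumed from the same zipinfo table.

```python
# Labels for the "operating system of origin" codes tried in this thread.
# APPNOTE.TXT 4.4.2.2 (table quoted earlier in the thread):
APPNOTE = {0: "MS-DOS and OS/2", 3: "UNIX", 10: "Windows NTFS", 11: "MVS (OS/390 - Z/OS)"}
# Info-ZIP zipinfo labels (10 and 11 as printed here; 0 and 3 assumed):
INFOZIP = {0: "MS-DOS, OS/2 or NT FAT", 3: "Unix", 10: "TOPS-20", 11: "NTFS"}


def host_labels(made_by):
    """Return (APPNOTE label, zipinfo label) for a "version made by" value."""
    os_code = made_by >> 8  # high byte = operating system of origin
    return APPNOTE.get(os_code, "?"), INFOZIP.get(os_code, "?")


for value in (0x0014, 0x0314, 0x0A14, 0x0B14):
    appnote, infozip = host_labels(value)
    print(f"${value:04X}: APPNOTE={appnote!r} zipinfo={infozip!r}")
```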

If I set the madeBy field to $0B14, zipinfo shows the OS as NTFS, but the file name is broken:

Central directory entry #1:
---------------------------

  ╨┐╨░╨┐╨║╨░/╨╜╨╛╨▓╤Л╨╣╨д╨░╨╣╨╗.txt

  offset of local header from start of archive:   0
  file system or operating system of origin:      NTFS
  version of encoding software:                   2.0
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
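
The broken name above is exactly the UTF-8 bytes of the name decoded with a DOS code page, which is consistent with zipinfo ignoring the UTF-8 flag for host 11. A quick Python check; CP866 (the Cyrillic DOS/console code page) is an assumption about the terminal used, chosen because it reproduces the line above byte for byte.

```python
# The name as stored with the UTF-8 flag set (from the first output above).
name = "папка/новыйФайл.txt"
raw = name.encode("utf-8")

# Decode those UTF-8 bytes with a DOS code page instead of UTF-8.
# CP866 is an assumed console code page; it matches the broken output.
garbled = raw.decode("cp866")
print(garbled)  # ╨┐╨░╨┐╨║╨░/╨╜╨╛╨▓╤Л╨╣╨д╨░╨╣╨╗.txt
```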

So I really don't know which one is the source of truth.

P.S.
$03 (Unix) works as expected for both the OS info and the file name encoding.

Last edited by mpv (2021-11-17 11:20:32)

#6 2021-11-17 13:57:37

ab
Administrator

Re: SynZip with UTF8 file names inside archive

What value does WinZip emit in the "version made by" field?

#7 2023-03-22 18:56:52

mpv
Member

Re: SynZip with UTF8 file names inside archive

Another patch for non-English file names in zip: see mORMot1 MR444. We discovered that the built-in zip on the latest Windows 10 Home puts such names into the Unicode Path Extra Field.

It would be good to port it to mORMot2 as well.
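
For reference, the Unicode Path Extra Field (header ID 0x7075, APPNOTE section 4.6.9) holds a 1-byte version (1), the CRC32 of the header's File Name field, and the UTF-8 name; if the CRC does not match, the field should be ignored. A minimal Python sketch of reading it, assuming the raw extra field and raw name bytes of one header are already at hand. This is not the mORMot1 patch; unicode_path and make_unicode_path_field are hypothetical helpers, and the CP866 legacy name in the usage part is only an assumed example.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path Extra Field


def unicode_path(extra, raw_name):
    """Return the UTF-8 name from a Unicode Path extra block, or None.
    extra: raw extra field bytes; raw_name: raw File Name bytes of the same header."""
    pos = 0
    while pos + 4 <= len(extra):
        block_id, size = struct.unpack_from("<HH", extra, pos)
        payload = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
        if block_id != UNICODE_PATH_ID or len(payload) < 5:
            continue
        version, name_crc = struct.unpack_from("<BI", payload, 0)
        # Ignore the block if the header name was changed after it was written.
        if version == 1 and name_crc == zlib.crc32(raw_name):
            return payload[5:].decode("utf-8")
    return None


def make_unicode_path_field(raw_name, utf8_name):
    """Build one such extra block (hypothetical helper, e.g. for tests)."""
    data = struct.pack("<BI", 1, zlib.crc32(raw_name)) + utf8_name.encode("utf-8")
    return struct.pack("<HH", UNICODE_PATH_ID, len(data)) + data


if __name__ == "__main__":
    legacy = "новыйФайл.txt".encode("cp866")  # assumed legacy name bytes
    extra = make_unicode_path_field(legacy, "новыйФайл.txt")
    print(unicode_path(extra, legacy))          # новыйФайл.txt
    print(unicode_path(extra, b"renamed.txt"))  # None: CRC mismatch
```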

Last edited by mpv (2023-03-22 18:58:04)

#8 2023-03-23 13:14:16

mpv
Member

Re: SynZip with UTF8 file names inside archive

I attached a test zip file to https://github.com/synopse/mORMot/pull/444

#9 2023-03-23 20:43:05

ab
Administrator

Re: SynZip with UTF8 file names inside archive

Yes, I have made a similar modification in mORMot 2, but a different one, because we also need to support Zip64, whose data is also stored in an extended file info extra block.
https://github.com/synopse/mORMot2/commit/76ca69e1c3a78
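
The Zip64 Extended Information block (header ID 0x0001, APPNOTE section 4.5.3) lives in the same extra field area as the Unicode Path block: in a central directory entry it carries 64-bit values only for the fields saturated to 0xFFFFFFFF, in a fixed order. A single walk over the extra blocks therefore has to dispatch on the header ID. A minimal Python sketch of the Zip64 side only; zip64_values is a hypothetical helper, not the mORMot 2 code linked above, and the 4-byte disk start number is ignored.

```python
import struct

ZIP64_ID = 0x0001
SATURATED32 = 0xFFFFFFFF


def zip64_values(extra, file_size, compress_size, header_offset):
    """Resolve central directory values saturated to 0xFFFFFFFF from the
    Zip64 extra block. Only saturated values are present, in the order:
    original size, compressed size, local header offset (8 bytes each)."""
    values = [file_size, compress_size, header_offset]
    pos = 0
    while pos + 4 <= len(extra):
        block_id, size = struct.unpack_from("<HH", extra, pos)
        payload = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
        if block_id != ZIP64_ID:
            continue  # other blocks (e.g. 0x7075 Unicode Path) handled elsewhere
        cursor = 0
        for i, value in enumerate(values):
            if value == SATURATED32:
                if cursor + 8 > len(payload):
                    raise ValueError("truncated Zip64 extra field")
                (values[i],) = struct.unpack_from("<Q", payload, cursor)
                cursor += 8
        break
    return tuple(values)
```

Keeping both block types in one loop over the extra field avoids parsing it twice, which is presumably why the Zip64 support shapes how the Unicode Path handling is written.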
