#1 2022-04-28 15:47:57

rvk
Member
Registered: 2022-04-14
Posts: 47

Removing unwanted data from embedded subset fonts in PDF

Well, today I also got to tackle the compression of the fonts in SynPdf.

In my previous post I thought I had found a possible solution with the TTFCFP_FLAGS_COMPRESS flag. But it turns out I put that flag in the wrong place: I passed it as usSubsetFormat instead of usFlags, where it belongs. In usSubsetFormat the value $0002 acts as TTFCFP_DELTA, which creates incremental characters. It did show, however, how much extra data was getting into the PDF, because the actual characters make up only 15KB (instead of 60KB).
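To make the mix-up concrete, here is a minimal sketch of the two parameter domains. The constant values are the ones I know from the Windows SDK header FontSub.h (worth double-checking against your SDK); DescribeFlags and DescribeSubsetFormat are hypothetical helpers that exist only for this illustration.

```pascal
program FontSubFlags;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

const
  // usFlags: bit flags that can be combined
  TTFCFP_FLAGS_SUBSET    = $0001;
  TTFCFP_FLAGS_COMPRESS  = $0002;
  TTFCFP_FLAGS_TTC       = $0004;
  TTFCFP_FLAGS_GLYPHLIST = $0008;

  // usSubsetFormat: an enumeration, not bit flags
  TTFCFP_SUBSET  = 0;
  TTFCFP_SUBSET1 = 1;
  TTFCFP_DELTA   = 2;

// Hypothetical helper: meaning of a value passed as usSubsetFormat.
function DescribeSubsetFormat(Value: Word): string;
begin
  case Value of
    TTFCFP_SUBSET:  Result := 'TTFCFP_SUBSET';
    TTFCFP_SUBSET1: Result := 'TTFCFP_SUBSET1';
    TTFCFP_DELTA:   Result := 'TTFCFP_DELTA';
  else
    Result := 'unknown';
  end;
end;

// Hypothetical helper: flag bits that are set in a value passed as usFlags.
function DescribeFlags(Value: Word): string;
begin
  Result := '';
  if (Value and TTFCFP_FLAGS_SUBSET) <> 0 then Result := Result + ' SUBSET';
  if (Value and TTFCFP_FLAGS_COMPRESS) <> 0 then Result := Result + ' COMPRESS';
  if (Value and TTFCFP_FLAGS_TTC) <> 0 then Result := Result + ' TTC';
  if (Value and TTFCFP_FLAGS_GLYPHLIST) <> 0 then Result := Result + ' GLYPHLIST';
  if Result = '' then Result := ' (none)';
end;

begin
  // The same number means two different things depending on the parameter.
  Writeln('$0002 as usFlags:       ', DescribeFlags($0002));
  Writeln('$0002 as usSubsetFormat: ', DescribeSubsetFormat($0002));
end.
```

So the same $0002 reads as COMPRESS in usFlags and as TTFCFP_DELTA in usSubsetFormat, which matches the behaviour described above.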

So I investigated the TTF data that CreateFontPackage returns for Segoe Script. It contains A LOT of garbage: copyright notices, certificates, etc. I don't think these are really needed for embedding. So I needed to see what makes a TTF tick.

As I see it now, TTF fonts are made up of tables. This is the table directory of the CreateFontPackage result for the Segoe Script subset ("Hello World"):

==== Table directory
Version: 1.0, number of tables: 21
Name     Offset     Length
DSIG     160736       7620
GDEF      79464         90
GPOS      79556      26766
GSUB     106324      54340
LTSH       5800       1944
OS/2        472         96
cmap      54408        608
cvt       57892        476
fpgm      55016       2384
gasp      79448         16
glyf      58368      14488
hdmx       7744      46664
head        348         54
hhea        404         36
hmtx        568       5232
loca      72856       3882
maxp        440         32
meta     160664         72
name      76740       2675
post      79416         32
prep      57400        490

For embedding a font in a PDF we actually only need the following 10 tables:
cvt (476), fpgm (2384), prep (490), head (54), hhea (36), maxp (32), hmtx (5232), cmap (608), loca (3882) and glyf (14488).
That's 27,682 bytes instead of 167,997 bytes (before compression). I got this information from here. (I hope that is all correct.)
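The two totals can be checked with a few lines of code. This is only a sketch of the arithmetic: the lengths are copied from the table directory listing above, and SumLengths is a hypothetical helper name.

```pascal
program TableSizes;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

const
  // Lengths of all 21 tables, in the order of the directory listing.
  AllLengths: array [0 .. 20] of Integer =
    (7620, 90, 26766, 54340, 1944, 96, 608, 476, 2384, 16, 14488,
     46664, 54, 36, 5232, 3882, 32, 72, 2675, 32, 490);
  // cvt, fpgm, prep, head, hhea, maxp, hmtx, cmap, loca, glyf
  KeptLengths: array [0 .. 9] of Integer =
    (476, 2384, 490, 54, 36, 32, 5232, 608, 3882, 14488);

// Hypothetical helper: adds up a list of table lengths.
function SumLengths(const Values: array of Integer): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := Low(Values) to High(Values) do
    Inc(Result, Values[i]);
end;

begin
  Writeln('all tables:  ', SumLengths(AllLengths));  // 167997
  Writeln('kept tables: ', SumLengths(KeptLengths)); // 27682
end.
```

These sums cover table data only; the header, the table directory and the alignment padding come on top of that.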

I think 'name' and 'post' are not required for embedding. They ARE required for saving a .ttf file (but we are not doing that).

So... let's go stripping. We don't need to do the subsetting ourselves because the CreateFontPackage has already done that. We just need to remove the unwanted tables.

This is my final result. Calling code:

// subset was created successfully -> save to PDF file
SetString(TTF, SubSetData, SubSetSize);
FreeMem(SubSetData);

if fDoc.fEmbeddedSubsetCleanup then // I added this one to the interface
  CleanUpSubsetTTFTables(TTF);

// this is from the other topic to mark the fonts as subset correctly
Prefix := '';
if System.RandSeed = 0 then Randomize; // only call when needed
for i := 1 to 6 do Prefix := Prefix + Chr(65 + Random(26));
Prefix := Prefix + '+';
if fFontDescriptor.ValueByName('FontName') <> nil then
    TPdfName(fFontDescriptor.ValueByName('FontName')).Value := Prefix + TPdfName(fFontDescriptor.ValueByName('FontName')).Value;
if Data.ValueByName('BaseFont') <> nil then
    TPdfName(Data.ValueByName('BaseFont')).Value := Prefix + TPdfName(Data.ValueByName('BaseFont')).Value;

The CleanUpSubsetTTFTables looks like this. It takes the TTF string, reads it into a TMemoryStream and only outputs the tables we actually want back into TTF.

// ============================================

type
  TByte2 = array [0 .. 1] of byte; // 16-bit
  TByte4 = array [0 .. 3] of byte; // 32-bit

function WordToBytes(const Data: word): TByte2;
begin
  Result[0] := (Data shr 8) and 255;
  Result[1] := Data and 255;
end;

function CardinalToBytes(const Data: Cardinal): TByte4;
begin
  Result[0] := (Data shr 24) and 255;
  Result[1] := (Data shr 16) and 255;
  Result[2] := (Data shr 8) and 255;
  Result[3] := Data and 255;
end;

function BytesToWord(const Data: TByte2): word;
begin
  Result := (Data[0] * 256) + Data[1];
end;

function BytesToCardinal(const Data: TByte4): Cardinal;
begin
  Result := (Data[0] * 16777216) + (Data[1] * 65536) + (Data[2] * 256) + Data[3];
end;

type
  recTableDirectory = record
    sfntVersion: TByte4; // 0x00010000 for version 1.0
    numTables: TByte2; // number of tables
    searchRange: TByte2; // (Maximum power of 2 <= NumTables) x 16
    entrySelector: TByte2; // Log2(maximum power of 2 <= NumTables)
    rangeShift: TByte2; // NumTables x 16 - SearchRange
  end;

  recTableEntry = record
    Tag: array [0 .. 3] of AnsiChar; // table identifier
    CheckSum: TByte4; // checksum for this table
    offset: TByte4; // offset from start of font file
    length: TByte4; // length of this table
  end;

  recTableData = TBytes;

procedure CleanUpSubsetTTFTables(var TTF: PDFString);
const
  TablesWeWant: array [0 .. 9] of AnsiString =
    ('cvt ', 'fpgm', 'prep', 'head', 'hhea', 'maxp', 'hmtx', 'cmap', 'loca', 'glyf');
  // 'name', 'post' are not needed for embedding, they are needed for a .ttf file
var
  Input: TMemoryStream;
  Output: TMemoryStream;
  TD: recTableDirectory;
  FontEntries: array of recTableEntry;
  FontData: array of recTableData;
  numTables: word;
  i, j: integer;
  Off, Len: Cardinal;
begin
  Input := TMemoryStream.Create;
  Output := TMemoryStream.Create;
  try
    // load the subset font, then read the header and the table directory
    Input.Write(TTF[1], length(TTF));
    Input.Position := 0;
    Input.Read(TD, SizeOf(TD));
    numTables := BytesToWord(TD.numTables);
    SetLength(FontEntries, numTables);
    SetLength(FontData, numTables);
    Input.Read(FontEntries[0], numTables * SizeOf(recTableEntry));
    // read the data of every table
    for i := 0 to numTables - 1 do
    begin
      Off := BytesToCardinal(FontEntries[i].offset);
      Len := BytesToCardinal(FontEntries[i].length);
      Input.Position := Off;
      SetLength(FontData[i], Len);
      // pass the first element, so this does not depend on a TBytes overload of Read
      if Len > 0 then Input.Read(FontData[i][0], Len);
    end;
    // drop every table that is not in TablesWeWant (MatchStr comes from StrUtils)
    for i := numTables - 1 downto 0 do
    begin
      if not MatchStr(FontEntries[i].Tag, TablesWeWant) then
      begin
        for j := i + 1 to numTables - 1 do FontEntries[j - 1] := FontEntries[j];
        for j := i + 1 to numTables - 1 do FontData[j - 1] := FontData[j];
        dec(numTables);
        SetLength(FontEntries, numTables);
        SetLength(FontData, numTables);
      end;
    end;
    // write the kept tables after the space reserved for header and directory
    Output.Position := SizeOf(TD) + numTables * SizeOf(recTableEntry); // always on 4 byte boundary
    for i := 0 to numTables - 1 do
    begin
      Off := Output.Position;
      FontEntries[i].offset := CardinalToBytes(Off);
      Len := BytesToCardinal(FontEntries[i].length);
      if Len > 0 then Output.Write(FontData[i][0], Len);
      Off := 0;
      while (Output.Position mod 4 <> 0) do Output.Write(Off, 1); // align on 4 bytes boundary
    end;
    // fill in the header and the directory; searchRange, entrySelector, rangeShift
    // and the checkSumAdjustment of the head table are not recomputed here
    TD.numTables := WordToBytes(numTables);
    System.Move(TD, (PByte(Output.Memory))^, SizeOf(TD));
    System.Move(FontEntries[0], (PByte(Output.Memory) + SizeOf(TD))^, numTables * SizeOf(recTableEntry));
    SetString(TTF, PAnsiChar(Output.Memory), Output.size);
  finally
    Output.Free;
    Input.Free;
  end;
end;


// ============================================

Testing code:

procedure MakePdfSynPdf;
var
  FileTemp: string;
  Doc: TPdfDocumentGDI;
  // Page: TPdfPage;
begin
  // if CheckC39 then; // For testing I installed this font for current users

  FileTemp := 'C:\Temp\Test2.pdf';
  Doc := TPdfDocumentGDI.Create;
  try
    Doc.GeneratePDF15File := true; // smaller
    Doc.EmbeddedTTF := true;
    Doc.EmbeddedTTFIgnore.Text := MSWINDOWS_DEFAULT_FONTS;
    Doc.EmbeddedWholeTTF := false;
    Doc.EmbeddedSubsetCleanup := false;
    Doc.Root.PageLayout := plSinglePage;
    Doc.NewDoc;
    { Page := } Doc.AddPage;

    Doc.VCLCanvas.TextOut(40, 40, 'Test1');
    Doc.VCLCanvas.TextOut(60, 60, 'Test2');

    Doc.VCLCanvas.Font.Name := 'Code 3 de 9';
    Doc.VCLCanvas.Font.size := 24;
    Doc.VCLCanvas.TextOut(80, 80, '*123456789*'); // blocks

    Doc.VCLCanvas.Font.Name := 'Code 128';
    Doc.VCLCanvas.Font.size := 24;
    Doc.VCLCanvas.TextOut(120, 120, '*123456789*'); // blocks

    Doc.VCLCanvas.Font.Name := 'KIX Barcode';
    Doc.VCLCanvas.Font.size := 12;
    Doc.VCLCanvas.TextOut(160, 160, '5569LB33'); // correct

    Doc.VCLCanvas.Font.Name := 'Segoe Script';
    Doc.VCLCanvas.Font.size := 14;
    Doc.VCLCanvas.TextOut(190, 190, 'Hello World'); // correct

    Doc.SaveToFile(FileTemp);
    // ExecAssociatedApp(FileTemp);
  finally
    Doc.Free;
  end;

  FileTemp := 'C:\Temp\Test3.pdf';
  Doc := TPdfDocumentGDI.Create;
  try
    Doc.GeneratePDF15File := true; // smaller
    Doc.EmbeddedTTF := true;
    Doc.EmbeddedTTFIgnore.Text := MSWINDOWS_DEFAULT_FONTS;
    Doc.EmbeddedWholeTTF := false;
    Doc.EmbeddedSubsetCleanup := true;
    Doc.Root.PageLayout := plSinglePage;
    Doc.NewDoc;
    { Page := } Doc.AddPage;

    Doc.VCLCanvas.TextOut(40, 40, 'Test1');
    Doc.VCLCanvas.TextOut(60, 60, 'Test2');

    Doc.VCLCanvas.Font.Name := 'Code 3 de 9';
    Doc.VCLCanvas.Font.size := 24;
    Doc.VCLCanvas.TextOut(80, 80, '*123456789*'); // blocks

    Doc.VCLCanvas.Font.Name := 'Code 128';
    Doc.VCLCanvas.Font.size := 24;
    Doc.VCLCanvas.TextOut(120, 120, '*123456789*'); // blocks

    Doc.VCLCanvas.Font.Name := 'KIX Barcode';
    Doc.VCLCanvas.Font.size := 12;
    Doc.VCLCanvas.TextOut(160, 160, '5569LB33'); // correct

    Doc.VCLCanvas.Font.Name := 'Segoe Script';
    Doc.VCLCanvas.Font.size := 14;
    Doc.VCLCanvas.TextOut(190, 190, 'Hello World'); // correct

    Doc.SaveToFile(FileTemp);
    // ExecAssociatedApp(FileTemp);
  finally
    Doc.Free;
  end;

end;

The result with EmbeddedSubsetCleanup = false is 60KB (thanks to the fixed EmbeddedWholeTTF; otherwise it was 352KB).
The result with EmbeddedSubsetCleanup = true is 15KB :) :)

(Both PDFs seem to be correct in Adobe Reader.)

I hope this is all correct. It would need to be tested thoroughly (with multiple fonts), but with the EmbeddedSubsetCleanup option defaulting to false it can't hurt either.

#2 2022-04-29 12:50:12

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 13,504

Re: Removing unwanted data from embedded subset fonts in PDF

This is great work!

I have refactored your proposal to include a proper trailing header, and also recompute the checksum.
It works just fine on my side.

Please try https://github.com/synopse/mORMot/commi … b00bf31efe
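For readers who want to see what recomputing such values involves, here is a minimal sketch based on the generic sfnt definitions (the same formulas quoted in the record comments of the first post, plus the usual table checksum: the sum of the data read as big-endian 32-bit words). It is not the code from the commit; TableChecksum, EntrySelectorFor, SearchRangeFor and RangeShiftFor are hypothetical names, and the table count is assumed to be at least 1.

```pascal
program SfntHeaderSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}
{$Q-}{$R-} // the checksum relies on 32-bit wrap-around

// Sum of the data read as big-endian 32-bit words, modulo 2^32.
// A trailing partial word counts as if it were padded with zero bytes.
function TableChecksum(const Data: array of Byte): Cardinal;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to High(Data) do
    Result := Result + (Cardinal(Data[i]) shl (8 * (3 - (i mod 4))));
end;

// Log2(maximum power of 2 <= NumTables)
function EntrySelectorFor(NumTables: Word): Word;
begin
  Result := 0;
  while (2 shl Result) <= NumTables do
    Inc(Result);
end;

// (maximum power of 2 <= NumTables) x 16
function SearchRangeFor(NumTables: Word): Word;
begin
  Result := (1 shl EntrySelectorFor(NumTables)) * 16;
end;

// NumTables x 16 - SearchRange
function RangeShiftFor(NumTables: Word): Word;
begin
  Result := NumTables * 16 - SearchRangeFor(NumTables);
end;

const
  // five sample bytes: one full word plus one byte that gets zero padding
  Sample: array [0 .. 4] of Byte = ($00, $01, $00, $00, $FF);
begin
  // $00010000 + $FF000000 = $FF010000
  Writeln('checksum: ', TableChecksum(Sample)); // 4278255616
  // values for the 10 tables that remain after the cleanup
  Writeln('10 tables: ', SearchRangeFor(10), ' ', EntrySelectorFor(10), ' ',
    RangeShiftFor(10)); // 128 3 32
end.
```

For the original 21 tables the same functions give 256, 4 and 80, so these fields do change when tables are removed.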

#3 2022-04-30 21:32:08

rvk
Member
Registered: 2022-04-14
Posts: 47

Re: Removing unwanted data from embedded subset fonts in PDF

Yes. ReduceTTF does a good job as far as I can see for now :)
Thanks.
