I'm trying to see why SynPdf files don't show embedded subset fonts as Subset in Adobe Reader.
Looking at pdffonts from xpdf-tools, you can see that for Ghostscript (Test0.pdf) the fonts are not only embedded but also subsetted and Unicode.
According to the official PDF specs, the font name of an embedded subset needs to be preceded by a tag of six uppercase letters (arbitrary, usually random) followed by a + sign.
The fonts in Test1.pdf from SynPdf are not marked as subset (and are also not embedded as Unicode).
5.5.3 Font Subsets
PDF 1.1 permits documents to include subsets of Type 1 and TrueType fonts. The font and font descriptor that describe a font subset are slightly different from those of ordinary fonts. These differences allow an application to recognize font subsets and to merge documents containing different subsets of the same font. (For more information on font descriptors, see Section 5.7, “Font Descriptors.”) For a font subset, the PostScript name of the font —the value of the font’s BaseFont entry and the font descriptor’s FontName entry— begins with a tag followed by a plus sign (+). The tag consists of exactly six uppercase letters; the choice of letters is arbitrary, but different subsets in the same PDF file must have different tags. For example, EOODIA+Poetica is the name of a subset of Poetica®, a Type 1 font. (See implementation note 63 in Appendix H.)
https://ghostscript.com/~robin/pdf_reference17.pdf
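To make the naming rule quoted above concrete, here is a minimal Pascal sketch that tests whether a font name carries a subset tag (six uppercase letters followed by a plus sign). HasSubsetTag is a hypothetical helper, not part of SynPdf or xpdf, and it is only an assumption that tools such as pdffonts apply a test of this kind when they fill the "sub" column.

```pascal
program SubsetTagCheck;

{$ifdef FPC}{$mode objfpc}{$H+}{$endif}
{$ifdef MSWINDOWS}{$APPTYPE CONSOLE}{$endif}

// Hypothetical helper: True when Name starts with exactly six uppercase
// ASCII letters followed by '+' and at least one more character.
function HasSubsetTag(const Name: string): Boolean;
var
  i: Integer;
begin
  Result := False;
  // 6 tag letters + '+' + at least one character of the real font name
  if (Length(Name) < 8) or (Name[7] <> '+') then
    Exit;
  for i := 1 to 6 do
    if (Name[i] < 'A') or (Name[i] > 'Z') then
      Exit;
  Result := True;
end;

begin
  // The example name from the PDF reference carries a tag
  WriteLn('EOODIA+Poetica: ', HasSubsetTag('EOODIA+Poetica'));
  // A plain name as currently written by SynPdf does not
  WriteLn('SegoeScript: ', HasSubsetTag('SegoeScript'));
end.
```

The check is applied to the name only, so it says nothing about whether the embedded font program really is a subset.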
This one is from ghostscript:
S:\pdfs\xpdf-tools-win-4.02\bin32>pdffonts -loc c:\temp\Test0.pdf
name                                           type              emb sub uni prob object ID location
---------------------------------------------- ----------------- --- --- --- ---- --------- --------
RDZRPI+Code128                                 TrueType          yes yes yes          12  0 embedded
UFQSLH+KIXBarcode                              TrueType          yes yes yes          14  0 embedded
ZRSKVS+SegoeScript                             TrueType          yes yes yes          16  0 embedded
UFQSLH+Tahoma                                  TrueType          yes yes yes           8  0 embedded
RDZRPI+Code3de9                                TrueType          yes yes yes          10  0 embedded
This one is from SynPdf (note the "no" in the sub column, and in Adobe reader there is also no Subset keyword):
S:\pdfs\xpdf-tools-win-4.02\bin32>pdffonts -loc c:\temp\Test1.pdf
name                                           type              emb sub uni prob object ID location
---------------------------------------------- ----------------- --- --- --- ---- --------- --------
Tahoma                                         TrueType          no  no  no            6  0 external: C:\WINDOWS\Fonts\tahoma.ttf
Code3de9                                       TrueType          yes no  no            8  0 embedded
Code128                                        TrueType          yes no  no           10  0 embedded
KIXBarcode                                     TrueType          yes no  no           12  0 embedded
SegoeScript                                    TrueType          yes no  no           14  0 embedded
So I hacked the code a little to add the random characters. Of course they should not collide with other fonts and I haven't implemented that but with this code they do show as Subset.
var
  Prefix: AnsiString;
//...
  if CreateFontPackage(pointer(ttf),ttfSize,
     SubSetData,SubSetMem,SubSetSize,
     usFlags,ttcIndex,TTFMFP_SUBSET,0,
     TTFCFP_MS_PLATFORMID,TTFCFP_DONT_CARE,
     pointer(Used.Values),Used.Count,
     @lpfnAllocate,@lpfnReAllocate,@lpfnFree,nil)=0 then begin
    // subset was created successfully -> save to PDF file
    SetString(ttf,SubSetData,SubSetSize);
    FreeMem(SubSetData);
    // CleanUpSubsetTTFTables(TTF); // working on this, see future topic
    //---------
    // build a tag of six random uppercase letters followed by '+'
    Prefix := '';
    if System.RandSeed = 0 then Randomize; // only call when needed
    for i := 1 to 6 do Prefix := Prefix + Chr(65 + Random(26));
    Prefix := Prefix + '+';
    // both FontName (descriptor) and BaseFont (font dictionary) get the tag
    if fFontDescriptor.ValueByName('FontName') <> nil then
      TPdfName(fFontDescriptor.ValueByName('FontName')).Value := Prefix + TPdfName(fFontDescriptor.ValueByName('FontName')).Value;
    if Data.ValueByName('BaseFont') <> nil then
      TPdfName(Data.ValueByName('BaseFont')).Value := Prefix + TPdfName(Data.ValueByName('BaseFont')).Value;
    //---------
  end;
Result (Adobe reader also shows it correctly now):
S:\pdfs\xpdf-tools-win-4.02\bin32>pdffonts -loc "c:\temp\test3.pdf"
name                                           type              emb sub uni prob object ID location
---------------------------------------------- ----------------- --- --- --- ---- --------- --------
Tahoma                                         TrueType          no  no  no            6  0 external: C:\WINDOWS\Fonts\tahoma.ttf
OQRQQB+Code3de9                                TrueType          yes yes no            8  0 embedded
UXMNNE+Code128                                 TrueType          yes yes no           10  0 embedded
VXAITD+KIXBarcode                              TrueType          yes yes no           12  0 embedded
CAOSJQ+SegoeScript                             TrueType          yes yes no           14  0 embedded
I'm sure this bit of code can be much improved upon when officially integrated (or done in a completely different way).
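On the collision point mentioned with the hack above: one way to keep tags distinct within a file is to remember the tags already handed out and retry on a clash. The following is only a sketch under that assumption. NewSubsetTag and the UsedTags list are hypothetical names and not part of SynPdf; a real integration would keep the list in the document object.

```pascal
program UniqueSubsetTags;

{$ifdef FPC}{$mode objfpc}{$H+}{$endif}
{$ifdef MSWINDOWS}{$APPTYPE CONSOLE}{$endif}

uses
  Classes; // may need to be System.Classes in newer Delphi setups

// Hypothetical helper: returns a six-letter uppercase tag that is not yet
// in UsedTags, and records it there so later calls cannot return it again.
function NewSubsetTag(UsedTags: TStrings): string;
var
  i: Integer;
begin
  repeat
    Result := '';
    for i := 1 to 6 do
      Result := Result + Chr(Ord('A') + Random(26));
  until UsedTags.IndexOf(Result) < 0; // retry on the (rare) collision
  UsedTags.Add(Result);
end;

var
  Used: TStringList;
begin
  Randomize;
  Used := TStringList.Create;
  try
    // one tag per embedded subset in the same PDF file
    WriteLn(NewSubsetTag(Used) + '+Code128');
    WriteLn(NewSubsetTag(Used) + '+KIXBarcode');
  finally
    Used.Free;
  end;
end.
```

With 26^6 possible tags the retry loop will almost never run twice, but it makes the uniqueness explicit rather than probabilistic.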
Last edited by rvk (2022-04-28 14:30:47)
Offline
Perhaps https://synopse.info/fossil/info/8d158c3f61 is good enough.
Since we have a single subset per font, we can reuse the very same non-random prefix.
Offline
It does not seem to give a real error. I tried it with just 'ABCDEF+' for every font/subset and that worked too.
But the documentation states:
The tag consists of exactly six uppercase letters; the choice of letters is arbitrary, but different subsets in the same PDF file must have different tags.
Or do you think that by different subsets they mean multiple subsets of the same font? That wouldn't really make much sense.
I need to find some online tool that checks the validity of the PDF to make sure (but all the generators I've seen use really random characters for all subsets).
Edit: The original PDF from SynPDF (so all of its PDFs) gives an error in this online validator https://www.datalogics.com/products/pdf … f-checker/ (Ghostscript ones are fine)
Edit #2: Ah. 1.3 files are fine.
Edit #3: This way you could also potentially have trouble if merging multiple SynPdf files with the same font but different subsets, I think
Last edited by rvk (2022-04-28 16:35:48)
Offline
I guess it is about the merger to ensure the subsets are compatible.
When using our generator, each font is processed exactly once, so it is safe to use "SUBSET+" as prefix.
Offline
I meant merging with another pdf with another tool which also merges embedded subset fonts.
But I'm not sure exactly why this requirement for random prefix is there.
It says random and unique over all subsets. Not just subsets per font. (It also says clearly that the prefix tag should be unique, so not the combination of prefix with fontname).
But I'll use it like this for now.
If I find a clearer source saying that this would be against the specification, I will let you know.
Offline
Please check https://github.com/synopse/mORMot/commi … a661c973e9
Offline
I think there was still a small error in your previous change.
You had this:
// see 5.5.3 Font Subsets: begins with a tag followed by a +
TPdfName(fFontDescriptor.ValueByName('FontName')).AppendPrefix;
TPdfName(fFontDescriptor.ValueByName('BaseFont')).AppendPrefix; // <---- this line
But I think BaseFont isn't part of fFontDescriptor but of Data.
So it should be
TPdfName(Data.ValueByName('BaseFont')).AppendPrefix;
Otherwise BaseFont isn't found and isn't changed with prefix. And BOTH FontName AND BaseFont need to be prefixed.
(Adobe reader does show it as Subset when only FontName is prefixed but pdffonts.exe does not show it as subset.)
This was the result if BaseFont is not prefixed.
S:\pdfs\xpdf-tools-win-4.02\bin32>pdffonts -loc c:\temp\test2.pdf
name                                           type              emb sub uni prob object ID location
---------------------------------------------- ----------------- --- --- --- ---- --------- --------
Tahoma                                         TrueType          no  no  no            6  0 external: C:\WINDOWS\Fonts\tahoma.ttf
Code3de9                                       TrueType          yes no  no            8  0 embedded
Code128                                        TrueType          yes no  no           10  0 embedded
KIXBarcode                                     TrueType          yes no  no           12  0 embedded
SegoeScript                                    TrueType          yes no  no           14  0 embedded
When I change the line to
TPdfName(Data.ValueByName('BaseFont')).AppendPrefix;
I get
S:\pdfs\xpdf-tools-win-4.02\bin32>pdffonts -loc c:\temp\test2.pdf
name                                           type              emb sub uni prob object ID location
---------------------------------------------- ----------------- --- --- --- ---- --------- --------
Tahoma                                         TrueType          no  no  no            6  0 external: C:\WINDOWS\Fonts\tahoma.ttf
NFHHHJ+Code3de9                                TrueType          yes yes no            8  0 embedded
DABIJH+Code128                                 TrueType          yes yes no           10  0 embedded
ANHKBL+KIXBarcode                              TrueType          yes yes no           12  0 embedded
DCMDKN+SegoeScript                             TrueType          yes yes no           14  0 embedded
And that one seems to be correct.
BTW. Nice idea to use random32 and just snip 4 bits off each time for the random 6 letters
(With svn update your changes did get merged nicely locally with my already changed fEmbeddedSubsetCleanup changes. Never tried that before )
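For readers who have not looked at the commit, this is how I understand the "4 bits per letter" idea. It is my own minimal sketch, not the actual mORMot code, and TagFromRandom32 is a hypothetical name. Each letter consumes the low 4 bits of a 32-bit value, so only 'A'..'P' can appear, which matches the tags in the pdffonts output above.

```pascal
program TagFrom32Bits;

{$ifdef FPC}{$mode objfpc}{$H+}{$endif}
{$ifdef MSWINDOWS}{$APPTYPE CONSOLE}{$endif}

// Hypothetical helper: turns a 32-bit value into a six-letter tag plus '+'.
// Each letter uses 4 bits (values 0..15), so 24 of the 32 bits are used.
function TagFromRandom32(Value: Cardinal): string;
var
  i: Integer;
begin
  SetLength(Result, 7);
  for i := 1 to 6 do
  begin
    Result[i] := Chr(Ord('A') + Integer(Value and 15)); // 'A'..'P'
    Value := Value shr 4; // move on to the next 4 bits
  end;
  Result[7] := '+';
end;

begin
  // fixed input so the result is reproducible: prints ABCDEF+Tahoma
  WriteLn(TagFromRandom32($00543210) + 'Tahoma');
  // in practice the input would come from a random source
  Randomize;
  WriteLn(TagFromRandom32(Cardinal(Random(MaxInt))) + 'Tahoma');
end.
```

The trade-off is 16^6 possible tags instead of 26^6, in exchange for a single random call and no division or modulo.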
Last edited by rvk (2022-04-29 22:05:03)
Offline
You are right!
Please try https://github.com/synopse/mORMot2/commit/c857b693
Offline
ab wrote: You are right!
Please try https://github.com/synopse/mORMot2/commit/c857b693
I'm using SynPDF, not mORMot2. And SynPdf hasn't changed for me yet on GitHub (still at revision 214).
I'll try again tomorrow and/or Monday.
(The ReduceTTF is already in SynPdf trunk (r213) and seems to do a good job.)
Offline
Besides the fact that BaseFont still needs to be prefixed, I also noticed something else with CID embedding.
In Adobe reader those embedded CID fonts are marked as SubSet but with pdffonts they are not.
S:\pdfs\xpdf-tools-win-4.02\bin32>pdffonts -loc c:\temp\test2.pdf
name                                           type              emb sub uni prob object ID location
---------------------------------------------- ----------------- --- --- --- ---- --------- --------
Arial                                          TrueType          no  no  no            6  0 external: C:\WINDOWS\Fonts\arial.ttf
Arial,BoldItalic                               TrueType          no  no  no            8  0 external: C:\WINDOWS\Fonts\arialbi.ttf
GJGBGN+Code3de9                                TrueType          yes yes no           10  0 embedded
LJFFNH+Code128                                 TrueType          yes yes no           12  0 embedded
LOPGJG+KIXBarcode                              TrueType          yes yes no           14  0 embedded
EAMHGK+SegoeScript                             TrueType          yes yes no           16  0 embedded
SegoeScript                                    CID TrueType      yes no  yes          18  0 embedded // <---- this one is subset = no ??
(That last one doesn't get marked as subset)
Maybe they also need to be prefixed.
I'm not sure why it all seems to work correctly even with those font names not named correctly, but I thought I would mention it.
CID is a subset, isn't it? (Adobe says so)
Offline
ab wrote: You are right!
Please try https://github.com/synopse/mORMot2/commit/c857b693
I'm using SynPDF, not mORMot2. And SynPdf hasn't changed for me yet on GitHub (still at revision 214).
I'll try again tomorrow and/or Monday. (The ReduceTTF is already in SynPdf trunk (r213) and seems to do a good job.)
FYI (and reminder). Changes of mormot.ui.pdf.pas from Revision 3316 haven't made it to SynPDF trunk yet.
(It still uses fFontDescriptor.ValueByName('BaseFont') instead of Data.ValueByName('BaseFont'))
Also, the CID embedded subset fonts are still not prefixed (see post above).
Not a problem for me, but I thought I'd mention it before the changes grow further apart
(if this is something that usually takes more time you can forget this reminder)
Offline
About mORMot 1 backport of Data instead of fFontDescriptor bug:
https://synopse.info/fossil/info/a5e5d4c449
About subsetting of CID embedded fonts, I tried to fix it with the following:
https://synopse.info/fossil/info/1ba26a0223
but it was not successful. At least the 'BaseFont' names match for the Ansi and CID fonts, and for the descriptor.
I don't know what is required for CID fonts - I couldn't find anything in the official PDF reference manual.
Offline
ab wrote: About mORMot 1 backport of Data instead of fFontDescriptor bug:
https://synopse.info/fossil/info/a5e5d4c449
Yes, that seems to do the trick now.
ab wrote: About subsetting of CID embedded fonts, I tried to fix it with the following:
https://synopse.info/fossil/info/1ba26a0223
but it was not successful. At least the 'BaseFont' names match for the Ansi and CID fonts, and for the descriptor.
I don't know what is required for CID fonts - I couldn't find anything in the official PDF reference manual.
It's not a critical thing I guess. Adobe reader does mark both the TT and CID as Subset.
But Adobe doesn't really use the 6 random characters to differentiate between Subset and Full (it uses something else).
pdffonts gives this:
S:\pdfs\xpdf-tools-win-4.02\bin32>pdffonts c:\temp\test3.pdf
name                                           type              emb sub uni prob object ID
---------------------------------------------- ----------------- --- --- --- ---- ---------
Arial                                          TrueType          no  no  no            6  0
Arial,BoldItalic                               TrueType          no  no  no            8  0
ELKHBC+Code3de9                                TrueType          yes yes no           10  0
PPIMAN+Code128                                 TrueType          yes yes no           12  0
JIGIKN+KIXBarcode                              TrueType          yes yes no           14  0
OMPCCH+SegoeScript                             TrueType          yes yes no           16  0
SegoeScript                                    CID TrueType      yes no  yes          18  0
So the subset TT fonts are now indeed seen as sub.
The CID font is not marked as sub, although in Adobe it is.
So, it probably isn't a critical thing (it all seems to work correctly).
If I find more exact official documentation about CID I'll let you know.
Thanks.
Offline