#1 2018-10-22 11:35:46

DanielTr
Member
Registered: 2018-10-22
Posts: 5

Saving a PDF document twice

Hi,

I was trying to create several iterations of a PDF document, saving the document to a file at intermediate stages, e.g.:

fPDF := TPdfDocumentGDI.Create;
{Create some PDF content}
fPDF.SaveToFile('file1.pdf');
{Do something with fPDF, like adding a page}
fPDF.SaveToFile('file2.pdf');

Contrary to what I expected, the two files are not equivalent, and the second one is not usable: file1.pdf is valid and can be opened, but file2.pdf is only 1 kB in size and cannot be opened.

Is this expected (and if so, why)? Or is there any other way to create a correct copy of the current document?

Thanks in advance!

Daniel

Last edited by DanielTr (2019-01-14 10:20:02)

#2 2018-10-22 12:23:23

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182

Re: Saving a PDF document twice

IIRC, the saving feature is meant to be run only once.

The underlying reason is that the save process works page by page, and the memory for each page is released as soon as possible.
This allows generating a PDF with thousands of pages.
I guess this may have introduced a bug when SaveToFile() is called several times in succession.

#3 2018-10-23 10:45:17

DanielTr
Member
Registered: 2018-10-22
Posts: 5

Re: Saving a PDF document twice

Thank you for the swift response!

I see the point in trying to release the memory as soon as possible, but I don't think this approach fits the idea of saving only once. All of the pages have to be ready before saving, so the required memory grows linearly with the number of pages. Releasing the memory after saving wouldn't reduce the peak memory; it would only leave behind a document instance that is actually unusable.

Is there any chance this behavior will be changed (fixed) in the future?

#4 2018-10-23 15:05:56

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182

Re: Saving a PDF document twice

It is not so simple.

If you look at how the code works, you will see that TPdfPageGDI.FlushVCLCanvas writes the content of each page and compresses it as an individual buffer in memory.
Then the resulting PDF content is also flushed to disk in chunks.
The total memory needed by TPdfDocumentGDI.SaveToStream is therefore reduced by this iterative decompression/flush pattern.
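
A rough way to see what chunked flushing buys, as a sketch under simplifying assumptions rather than a measurement of SynPdf: let n be the number of pages, s_i the memory held for page i before saving, w_k the transient memory needed to render and compress page k, F the size of the finished PDF, and b the size of the chunk buffer (all of these symbols are introduced here for illustration). If the page data is kept until the save finishes, building the whole file in memory and flushing it in chunks compare roughly as:

$$
M_{\text{buffered}} \approx \sum_{i=1}^{n} s_i + \max_k w_k + F,
\qquad
M_{\text{flushed}} \approx \sum_{i=1}^{n} s_i + \max_k w_k + b
$$

Under this model, flushing by chunks replaces the F term, which grows with the page count, by the fixed buffer size b; the per-page terms are unchanged.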

#5 2018-11-14 10:13:03

DanielTr
Member
Registered: 2018-10-22
Posts: 5

Re: Saving a PDF document twice

Hi, sorry I couldn't get back to you earlier.

Yes, I saw that each page is handled on its own when it is stored. I still don't think this would or could save any memory.

Please correct me where I'm wrong, or comment:
- When creating a PDF document, all pages are stored uncompressed at first
- When saving a PDF document, all pages have to be available at once (saving is possible only once)
- Saving a PDF document iterates over all pages, compresses them independently and writes the compressed data to the stream/file
- The overall amount of memory required to save a PDF document is thus always at least: [Number of pages] * [Size of an uncompressed page] + 1 * [Size of a compressed page]
- Releasing the memory for a page after compressing and writing it won't lower this number; it would just release the memory a little earlier (which may be too early, since the object is broken afterwards, as described above)
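
The last point can be written out as a small derivation. This is a sketch under the assumptions listed above, not a description of SynPdf internals; p_i and c_k are symbols introduced here for illustration. Let page i occupy p_i bytes before saving, let c_k be the transient buffer needed to compress and write page k, and let n be the number of pages. If each page is released right after it is written, the memory in use while writing page k is the first expression; if nothing is released, it is the second:

$$
M_k^{\text{release}} = \sum_{i=k}^{n} p_i + c_k,
\qquad
M_k^{\text{keep}} = \sum_{i=1}^{n} p_i + c_k
$$

The peak with early release is at least $M_1^{\text{release}} = \sum_{i=1}^{n} p_i + c_1$, while the peak without release is $\sum_{i=1}^{n} p_i + \max_k c_k$, and $M_k^{\text{release}} \le M_k^{\text{keep}}$ for every k, so:

$$
0 \;\le\; \max_k M_k^{\text{keep}} - \max_k M_k^{\text{release}} \;\le\; \max_k c_k - c_1
$$

Under this model, early release lowers the peak by at most one page's compression buffer; the $\sum_i p_i$ term is paid either way.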

#6 2018-11-14 15:57:43

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182

Re: Saving a PDF document twice

No, pages can first be stored as SynLZ-compressed TMetaFile content.
Then, one by one, they are decompressed into TMetaFile binary and rendered as PDF into a page stream, which is compressed and written to the file stream.
At the end, the missing PDF objects and the trailer are appended to the file stream.
So memory consumption is much lower than what you think.

#7 2018-11-14 16:20:02

DanielTr
Member
Registered: 2018-10-22
Posts: 5

Re: Saving a PDF document twice

OK.

ab wrote:

No, pages can be stored as SynLZ-compressed TMetaFile content first.

Is this compression applied to each page separately or to the whole document? If it is per page, I would update one point of my question to:
- The overall amount of memory required to save a PDF document is thus always at least: [Number of pages] * [Size of a compressed page] + 1 * [Size of an uncompressed/rendered page]

And (as before): Is there any chance this behavior will be changed (fixed) in the future, so saving a document won't result in an invalid, but seemingly working, object?

#8 2019-01-14 09:51:57

fs999
Member
Registered: 2014-06-25
Posts: 7

Re: Saving a PDF document twice

You could do it this way:

fPDF := TPdfDocumentGDI.Create;
{Create some PDF content}
fPDF.SaveToFile('file1.pdf');
CopyFile('file1.pdf', 'file2.pdf', false);

#9 2019-01-14 10:24:33

DanielTr
Member
Registered: 2018-10-22
Posts: 5

Re: Saving a PDF document twice

Hi fs999,

Thank you for the idea, but this would only create an identical copy of the PDF file. I need to continue working on the actual TPdfDocumentGDI object (fPDF) and save it at different states (iterations), or at least save a copy of the object.
