I believe you briefly mentioned that automated PDF testing would be a great addition to avoid regressions with new code. Do you have any specific ideas or requirements in mind (such as being compatible with both FPC and Delphi, being cross-platform, not using external tools...)?
I thought about this a little, and here is what I believe could work:
- Create a repository of trusted sources that produce PDF documents covering the various features of SynPDF
- Use a tool such as ImageMagick to extract each of their pages as PNG images
- Have automated tests compare the newly generated PNGs with the reference PNGs and fail if the difference is too big
Any thoughts? Would you accept such a contribution?
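The "fail if the difference is too big" step proposed above could be sketched as follows, assuming each page has already been rasterized and loaded as an equal-length sequence of RGB tuples. The names `changed_pixel_ratio` and `pages_match`, and both tolerance parameters, are hypothetical; the default thresholds are placeholders to tune, not recommendations.

```python
def changed_pixel_ratio(ref_pixels, new_pixels, channel_tolerance=0):
    """Return the fraction of pixels whose largest per-channel delta exceeds the tolerance."""
    if len(ref_pixels) != len(new_pixels):
        raise ValueError("images must contain the same number of pixels")
    if not ref_pixels:
        return 0.0
    changed = 0
    for ref, new in zip(ref_pixels, new_pixels):
        # A pixel counts as changed if any of its channels moved by more than the tolerance
        if max(abs(a - b) for a, b in zip(ref, new)) > channel_tolerance:
            changed += 1
    return changed / len(ref_pixels)


def pages_match(ref_pixels, new_pixels, channel_tolerance=8, max_changed_ratio=0.01):
    """Pass when at most max_changed_ratio of the pixels changed noticeably."""
    ratio = changed_pixel_ratio(ref_pixels, new_pixels, channel_tolerance)
    return ratio <= max_changed_ratio
```

Using two thresholds (how much a pixel may change, and how many pixels may change) is one possible way to absorb small anti-aliasing differences while still catching missing shapes or text.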
With PNG images and bitmap rasterization from a vector format like PDF, it is very unlikely that you will be able to make proper comparisons.
In mORMot 1, we have basic SynPDF validation using a simple fixed EMF input.
And even with this, we need to validate several hashes, depending on the system it renders on.
So in mORMot 2, we have not added any such basic test yet.
But we are open to any contribution, of course.
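To illustrate the idea of validating several hashes depending on the rendering system, here is a minimal sketch. It is not the mORMot 1 test: the table `ACCEPTED_DIGESTS`, the helper names, and the use of SHA-256 are all assumptions made for the example, and the digest sets are left empty on purpose.

```python
import hashlib
import sys

# Hypothetical table: platform key -> set of digests known to be good on that platform.
ACCEPTED_DIGESTS = {
    "win32": set(),
    "linux": set(),
}


def digest_of(data: bytes) -> str:
    """Hex digest of the generated PDF content (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(data).hexdigest()


def is_accepted(data: bytes, platform: str = sys.platform, table=ACCEPTED_DIGESTS) -> bool:
    """True if the content matches one of the digests recorded for this platform."""
    return digest_of(data) in table.get(platform, set())
```

The drawback is visible in the table itself: every rendering difference between systems needs its own recorded digest, and any intentional change to the output invalidates all of them.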
Following up on the other topic: I found a directory with lots of EMF files to test.
https://github.com/kakwa/libemf2svg/tre … ources/emf
If you put these in the bugreport directory from the other topic and change the code as shown at the end of this post, it will create a PDF for every EMF file in that emf directory.
Lots of the PDFs end up corrupt, and others have strange results: 000 and 015 don't have the axes, and 040 also has lots of things wrong.
It indeed shows the need for automated tests.
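For a first triage of the corrupt outputs, a crude structural check can be scripted. This is only a sketch under stated assumptions: `looks_like_pdf` and `find_suspect_pdfs` are hypothetical helpers, and the check only looks for the `%PDF-` header and the `%%EOF` trailer marker. Passing it does not prove that a file is valid or that it renders correctly.

```python
from pathlib import Path


def looks_like_pdf(data: bytes) -> bool:
    """Crude check: header near the start and %%EOF marker near the end."""
    return b"%PDF-" in data[:1024] and b"%%EOF" in data[-1024:]


def find_suspect_pdfs(directory):
    """Return sorted relative paths of *.pdf files that fail the crude check."""
    root = Path(directory)
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob("*.pdf")
        if not looks_like_pdf(path.read_bytes())
    )
```

Files flagged here are likely truncated or empty; files that pass still need to be opened in a real PDF reader to catch the rendering problems.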
BTW, comparing images extracted from the PDFs of a test run with baseline images would be very hard, I imagine. They would never be pixel-perfect.
And before that, you would first need to fix all the problems that already exist with the above EMFs before using them for regression testing.
uses
  mormot.ui.pdf, ShellApi, System.IOUtils, System.Types;
procedure TForm1.Button1Click(Sender: TObject);
  procedure ProcessAllFilesInDirectory(const Directory: string);
  var
    Files: TStringDynArray;
    FileName: string;
  begin
    if not TDirectory.Exists(Directory) then
      exit;
    Files := TDirectory.GetFiles(Directory, '*.emf', TSearchOption.soAllDirectories);
    for FileName in Files do
    begin
      Self.DoConvertMetafileToPdf(FileName, ChangeFileExt(FileName, '.pdf'));
    end;
  end;
begin
  ProcessAllFilesInDirectory(ExtractFilePath(Application.ExeName) + 'emf');
  //Self.DoConvertMetafileToPdf(ExtractFilePath(Application.ExeName) + 'bogus.wmf', ExtractFilePath(Application.ExeName) + 'bogus.pdf');
end;
test-040.pdf
test-015.pdf
Last edited by rvk (2024-05-15 10:53:09)
As a starting point, here is a Python script that I created with the help of ChatGPT. Here is what it does:
- It extracts all pages from a reference PDF and a generated PDF as PNG images
- It compares the number of pages and fails if they differ
- It compares each page and, for each of them, outputs the difference as both a difference image and a percentage
- It outputs the final result as consistent and clear text for easy integration with automated tests
So I suppose we could create multiple small command-line programs that produce PDFs using SynPDF and test most parts of the library, including metafile conversion. Those programs would generate the PDF at a path specified by arguments, so that they can be used to generate the reference PDFs at first (and update them if needed), and to regenerate them in the correct folder during automated tests.
Then the Python script is called for each file in the reference folder and fails based on specific conditions.
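The "call it for each file in the reference folder" step could be driven by something like the sketch below. The names `pair_pdfs` and `run_suite`, the `compare` callable (assumed to return a difference in percent), and the `max_diff_percent` threshold are hypothetical and only illustrate one possible failure condition.

```python
from pathlib import Path


def pair_pdfs(reference_dir, generated_dir):
    """Yield (reference, generated) paths for every PDF found in the reference folder."""
    ref_root, gen_root = Path(reference_dir), Path(generated_dir)
    for ref in sorted(ref_root.rglob("*.pdf")):
        yield ref, gen_root / ref.relative_to(ref_root)


def run_suite(reference_dir, generated_dir, compare, max_diff_percent=0.0):
    """Return a process exit code: 0 if every pair passes, 1 otherwise."""
    failures = []
    for ref, gen in pair_pdfs(reference_dir, generated_dir):
        if not gen.exists():
            # A reference without a regenerated counterpart is a failure
            failures.append((ref.name, "missing"))
            continue
        diff = compare(ref, gen)
        if diff > max_diff_percent:
            failures.append((ref.name, f"{diff:.2f}%"))
    for name, reason in failures:
        print(f"FAIL {name}: {reason}")
    return 1 if failures else 0
```

Passing the comparison in as a callable keeps the driver independent of how pages are rendered, so the same loop works whether the comparison is done in-process or by calling an external script.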
Requirements: pip install PyMuPDF Pillow numpy termcolor
Script:
import fitz # PyMuPDF
from PIL import Image, ImageChops
import numpy as np
import os
import shutil
from termcolor import colored
# Function to clear the content of a folder or create it if it does not exist
def clear_folder(folder):
    if os.path.exists(folder):
        shutil.rmtree(folder)
    os.makedirs(folder)
# Function to convert PDF pages to PNG images and save them in the output folder
def convert_pdf_to_png(pdf_path, output_folder):
    clear_folder(output_folder)
    pdf_document = fitz.open(pdf_path)
    for page_num in range(len(pdf_document)):
        page = pdf_document.load_page(page_num)
        pix = page.get_pixmap()
        # Zero-pad the page number so that sorted() keeps pages in numeric order
        output_path = f"{output_folder}/page_{page_num + 1:04d}.png"
        pix.save(output_path)
    pdf_document.close()
# Function to compare two images and save the difference image if specified
def compare_images(img1_path, img2_path, diff_img_path=None):
    img1 = Image.open(img1_path).convert('RGB')
    img2 = Image.open(img2_path).convert('RGB')
    # Check if page sizes match
    if img1.size != img2.size:
        return False, 100.0, "Error: Page sizes do not match"
    diff = ImageChops.difference(img1, img2)
    # Save the difference image if a path is provided
    if diff_img_path:
        diff.save(diff_img_path)
    np_diff = np.array(diff)
    # Count a pixel once if any of its RGB channels differs, so the result stays within 0-100%
    diff_count = np.count_nonzero(np_diff.any(axis=2))
    total_pixels = np_diff.shape[0] * np_diff.shape[1]
    diff_percentage = (diff_count / total_pixels) * 100
    return diff_count == 0, diff_percentage, None
# Function to display the final result summary
def display_final_result_summary(all_match, total_diff_percentage, num_pages, page_results, error_message=None):
    if error_message:
        final_status = "NOT OK"
        color = 'red'
        avg_diff_percentage = 100.0
    else:
        # Guard against a division by zero when no page was compared
        avg_diff_percentage = total_diff_percentage / num_pages if num_pages else 0.0
        final_status = "OK" if all_match else "Partial"
        if any(status == "Error" for _, _, status in page_results):
            final_status = "NOT OK"
            color = 'red'
        else:
            color = 'green' if final_status == "OK" else 'yellow'
    # Output final result summary
    print("\nFinal result summary:")
    print(colored(f"Average difference percentage: {avg_diff_percentage:.2f}%", color))
    print(colored(f"Result: {final_status}", color))
    if error_message:
        print(colored(error_message, 'red'))
# Main function to handle the PDF comparison process
def main(reference_pdf, generated_pdf, output_folder):
    # Check if the reference PDF exists
    if not os.path.exists(reference_pdf):
        error_message = f"Error: Reference PDF '{reference_pdf}' not found."
        print(colored(error_message, 'red'))
        display_final_result_summary(False, 0, 0, [], error_message)
        return
    # Check if the generated PDF exists
    if not os.path.exists(generated_pdf):
        error_message = f"Error: Generated PDF '{generated_pdf}' not found."
        print(colored(error_message, 'red'))
        display_final_result_summary(False, 0, 0, [], error_message)
        return
    # Define folders for reference, generated, and difference images
    reference_folder = f"{output_folder}/reference"
    generated_folder = f"{output_folder}/generated"
    diff_folder = f"{output_folder}/differences"
    # Clear or create the folders
    clear_folder(reference_folder)
    clear_folder(generated_folder)
    clear_folder(diff_folder)
    # Convert PDFs to PNG images
    convert_pdf_to_png(reference_pdf, reference_folder)
    convert_pdf_to_png(generated_pdf, generated_folder)
    # Get the list of image files
    reference_files = sorted([f"{reference_folder}/{file}" for file in os.listdir(reference_folder)])
    generated_files = sorted([f"{generated_folder}/{file}" for file in os.listdir(generated_folder)])
    # Check if the number of pages (images) match
    if len(reference_files) != len(generated_files):
        error_message = "Error: PDFs have a different number of pages."
        print(colored(error_message, 'red'))
        print(f"Reference PDF has {len(reference_files)} pages.")
        print(f"Generated PDF has {len(generated_files)} pages.")
        display_final_result_summary(False, 0, 0, [], error_message)
        return
    all_match = True
    total_diff_percentage = 0
    page_results = []
    # Compare each page and collect results
    for i, (ref_img, gen_img) in enumerate(zip(reference_files, generated_files)):
        diff_img_path = f"{diff_folder}/diff_{os.path.basename(ref_img)}"
        match, diff_percentage, error = compare_images(ref_img, gen_img, diff_img_path)
        total_diff_percentage += diff_percentage
        if error:
            print(colored(f"Page {i + 1}: {error}", 'red'))
            all_match = False
            page_results.append((i + 1, diff_percentage, "Error"))
        else:
            page_status = "OK" if match else "Partial"
            page_results.append((i + 1, diff_percentage, page_status))
            if not match:
                all_match = False
    # Output page-by-page results
    print("Page-by-page differences:")
    for page_num, diff_percentage, status in page_results:
        if status == "OK":
            color = 'green'
        elif status == "Partial":
            color = 'yellow'
        else:
            color = 'red'
        print(colored(f"Page {page_num}: {diff_percentage:.2f}% difference - {status}", color))
    # Display final result summary
    display_final_result_summary(all_match, total_diff_percentage, len(reference_files), page_results)
if __name__ == "__main__":
    import sys
    # Ensure the correct number of arguments are provided
    if len(sys.argv) != 4:
        print("Usage: python script.py <reference_pdf> <generated_pdf> <output_folder>")
        sys.exit(1)
    # Get the input arguments
    reference_pdf = sys.argv[1]
    generated_pdf = sys.argv[2]
    output_folder = sys.argv[3]
    # Create the output folder if it does not exist
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)
    # Run the main function
    main(reference_pdf, generated_pdf, output_folder)
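Since the script reports its outcome only as text ("Result: OK", "Result: Partial" or "Result: NOT OK"), a test runner needs to turn that line into a pass/fail signal. Here is a minimal sketch of one way to do it; the function name `exit_code_from_output` and the mapping of statuses to exit codes are assumptions, not part of the script. The ANSI stripping is there because termcolor may wrap the line in color escape sequences.

```python
import re

# Matches the SGR color sequences that colored output may contain
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")
# Hypothetical mapping from the script's final status to a process exit code
EXIT_CODES = {"OK": 0, "Partial": 1, "NOT OK": 2}


def exit_code_from_output(output: str) -> int:
    """Map the final 'Result: ...' line of the captured output to an exit code."""
    plain = ANSI_ESCAPE.sub("", output)
    match = re.search(r"^Result: (OK|Partial|NOT OK)\s*$", plain, re.MULTILINE)
    if match is None:
        # No recognizable result line: treat it as a failure
        return 2
    return EXIT_CODES[match.group(1)]
```

An alternative would be to have the script itself call `sys.exit` with a status code, which avoids parsing text altogether.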