PDFExplorer Class
The PDFExplorer class provides access to the low-level PDF document structure.
Syntax
class securepdf.PDFExplorer
Remarks
The PDFExplorer class can be used to inspect the internals of a PDF document and make changes to the low-level document structure.
Object Types and Document Structure
The PDF specification defines eight object types:
- Name
- String
- Real
- Integer
- Boolean
- Array
- Dictionary
- Stream
Before accessing individual objects with the class, it is important to understand how they are structured in the document. PDFExplorer aims to distinguish between the logical and physical representations of objects.
The logical representation is that a PDF document is a tree of objects that can be traversed to extract data. For example, every document contains a document catalog that references a next-level object /Pages, which in turn references individual pages via a /Kids array. So to get a page, you would first look for the /Root object in the document trailer, then proceed to its /Pages element, and then work with the /Kids array.
Then, there is the physical structure that consists of all the objects that constitute the document. Every object is recorded as either:
- A direct (in-place) object (e.g., /Numbers [1 2 3 888]),
- An indirect (numbered) object, or
- A reference to an indirect object (e.g., /Numbers 8 0 R).
Note that most heavy objects (such as streams and dictionaries) are recorded in PDF files as indirect objects, with other objects referencing them. An indirect object is a global object that is uniquely identified by its object number followed by its generation number (e.g., 1 0 obj).
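The difference between an indirect object and a reference to it can be illustrated by scanning raw PDF text for the two notations. The snippet below is a minimal sketch that does not use the class: scan_objects and both patterns are hypothetical names, and a plain regular-expression scan is only safe on small uncompressed samples, since strings and stream data in real documents may contain look-alike text.

```python
import re

# Matches an indirect object header such as "3 0 obj".
OBJ_HEADER = re.compile(r"\b(\d+)\s+(\d+)\s+obj\b")
# Matches a reference to an indirect object such as "2 0 R".
OBJ_REF = re.compile(r"\b(\d+)\s+(\d+)\s+R\b")


def scan_objects(pdf_text: str):
    """Return (definitions, references) as lists of (object number, generation number)."""
    definitions = [(int(n), int(g)) for n, g in OBJ_HEADER.findall(pdf_text)]
    references = [(int(n), int(g)) for n, g in OBJ_REF.findall(pdf_text)]
    return definitions, references


# Illustrative sample: two indirect objects, one reference, one direct array.
sample = (
    "1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj "
    "2 0 obj << /Numbers [1 2 3 888] >> endobj"
)
definitions, references = scan_objects(sample)
```

Here definitions holds the numbered objects that exist in the sample, while references holds the places where one of them is pointed to; the direct array under /Numbers appears in neither list.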
Navigating the Document
To navigate the object tree, first provide the input document as a file (input_file), byte array (input_data), or stream (set_input_stream) and call the open method. This method populates the root_object_* properties with the objects found in the document trailer, as the trailer is considered to be the root of the logical object tree. The keys in the document trailer will typically be /Size, /Info, /Root, /ID, and, for encrypted documents, /Encrypt.
These objects can then be used as a starting point for navigating the document tree, which is done using the select method. This method and others use the following syntax for specifying objects in the document:
- Slashes separate levels of hierarchy, like in file paths.
- The "root" slash (/) points to the document trailer dictionary.
- A path that does not start with a slash specifies an indirect object in the list of global numbered objects.
- The asterisk character (*) specifies all objects at the provided path.
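The path rules above can be made concrete with a small tokenizer. This is an illustrative sketch only, not the parser used by the class: split_path and STEP are hypothetical names, and the handling of the [index] suffix is inferred from paths such as /Root/Pages/Kids[0] used elsewhere in this section.

```python
import re

# One path step: a key, "*", or an "N G obj" label, optionally followed by [index] suffixes.
STEP = re.compile(r"^(?P<name>[^\[\]]*)(?P<indices>(\[\d+\])*)$")


def split_path(path: str):
    """Split a path into (starts_at_trailer, steps).

    Each step is a string (dictionary key, "*", or indirect object label)
    or an int (zero-based array index).
    """
    starts_at_trailer = path.startswith("/")
    steps = []
    for part in path.split("/"):
        if not part:
            continue  # empty parts come from the leading root slash
        match = STEP.match(part)
        if match is None:
            raise ValueError(f"malformed path step: {part!r}")
        if match.group("name"):
            steps.append(match.group("name"))
        # Turn every "[n]" suffix into an integer index step.
        steps.extend(int(i) for i in re.findall(r"\[(\d+)\]", match.group("indices")))
    return starts_at_trailer, steps
```

For example, split_path("/Root/Pages/Kids[0]/MediaBox[2]") yields a trailer-rooted path with the steps Root, Pages, Kids, 0, MediaBox, 2, whereas split_path("3 0 obj/Type") yields a path that starts at a numbered object.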
Consider the following PDF document:
%PDF-1.4
%cmmt
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [ 3 0 R ] /Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R /MediaBox [ 0 0 612 792 ] /Resources << /ProcSet 4 0 R >> >>
endobj
4 0 obj
[ /PDF ]
endobj
xref
0 5
0000000000 65535 f
0000000015 00000 n
0000000065 00000 n
0000000125 00000 n
0000000234 00000 n
trailer
<< /Size 5 /Root 1 0 R >>
startxref
259
%%EOF
select would return the following results for the respective paths:
- / - a dictionary object that corresponds to the trailer dictionary.
- /Root - a dictionary object that corresponds to the dictionary at 1 0 obj, with its disposition set to Reference (as this is a reference to an indirect object).
- /Size - an integer object whose value is 5 and whose disposition is Direct (as this is a direct, in-place object).
- /Root/Type - a name object whose value is Catalog (disposition = Direct).
- /Root/Pages - a dictionary object that corresponds to the dictionary at 2 0 obj (disposition = Reference).
- /Root/Pages/Kids - an array object (disposition = Direct).
- /Root/Pages/Kids[0] - a dictionary object that corresponds to the dictionary at 3 0 obj (disposition = Reference).
- /Root/Pages/Kids[0]/MediaBox - an array object with four integer elements (disposition = Direct).
- /Root/Pages/Kids[0]/MediaBox[2] - an integer object whose value is 612 (disposition = Direct).
- 3 0 obj - a dictionary object that corresponds to the dictionary at 3 0 obj (disposition = Indirect).
- 3 0 obj/Type - a name object whose value is Page (disposition = Direct).
- 3 0 obj/Parent - a dictionary object that corresponds to the dictionary at 2 0 obj (disposition = Reference).
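The results listed above can be reproduced with a toy in-memory model of the sample document. The sketch below does not call the class; Ref, OBJECTS, TRAILER, and this stand-alone select function are hypothetical and exist only to show how following a reference changes the reported disposition while the path keeps descending through the logical tree.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A reference to an indirect object, such as 2 0 R."""
    number: int
    generation: int = 0


# Toy model of the sample document (names are stored without the leading slash).
OBJECTS = {
    (1, 0): {"Type": "Catalog", "Pages": Ref(2)},
    (2, 0): {"Type": "Pages", "Kids": [Ref(3)], "Count": 1},
    (3, 0): {"Type": "Page", "Parent": Ref(2), "MediaBox": [0, 0, 612, 792],
             "Resources": {"ProcSet": Ref(4)}},
    (4, 0): ["PDF"],
}
TRAILER = {"Size": 5, "Root": Ref(1)}


def select(path: str):
    """Return (value, disposition) for a path in the toy model."""
    tokens = re.findall(r"\d+ \d+ obj|[^/\[\]]+", path)
    if path.startswith("/"):
        # Assumption of this sketch: the trailer itself is labeled Direct.
        node, disposition = TRAILER, "Direct"
    else:
        number, generation, _ = tokens.pop(0).split()
        node, disposition = OBJECTS[(int(number), int(generation))], "Indirect"
    for token in tokens:
        node = node[int(token)] if isinstance(node, list) else node[token]
        disposition = "Direct"
        if isinstance(node, Ref):
            # Follow the reference, but remember how the object was addressed.
            node, disposition = OBJECTS[(node.number, node.generation)], "Reference"
    return node, disposition
```

In this model the same dictionary at 3 0 obj is reported as Indirect when addressed as 3 0 obj and as Reference when reached through /Root/Pages/Kids[0], which mirrors the distinction drawn in the list above.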
Adding and Modifying Objects
The sections below contain instructions for adding and modifying each type of object. Note that each of the following Add* methods returns the path of the newly added object in the document, making it easy to access the PDFObject object later using the select method. These objects' values can then be adjusted to ensure the PDF document meets your requirements.
Primitive Objects
A primitive object is a non-container object that represents a name, string, real (double), integer, or boolean
value. Primitive objects are typically stored in-place and referenced directly. Use the add_primitive method
to add a direct primitive object and the add_object method (with the Indirect parameter set to
True) to add an indirect primitive object:
// Adding a direct string object to the /Info dictionary
string stringPath = pdfexplorer.AddPrimitive("/Info", "Creator", "Microsoft Word");
// Adding an indirect boolean object to the root
string booleanPath = pdfexplorer.AddObject("", 5, "", "true", true);
5 0 obj << ... /Creator (Microsoft Word) >> endobj ... 6 0 obj true endobj
The value of a primitive object can then be modified if desired:
pdfexplorer.Select(stringPath, true);
pdfexplorer.SelectedObjects[0].Value = "nsoftware.SecurePDF";
pdfexplorer.Select(booleanPath, true);
pdfexplorer.SelectedObjects[0].Value = "false";
5 0 obj << ... /Creator (nsoftware.SecurePDF) >> endobj ... 6 0 obj false endobj
Array and Dictionary Objects
Unlike primitives, arrays and dictionaries are objects that contain other objects. Elements within array objects
are arranged sequentially and have implicit zero-based indices, whereas dictionary objects contain named
key-value pairs that are unordered. Use the add_container method to add a direct or indirect array or
dictionary object:
// Adding a direct array object to the first page's /Page dictionary
string arrayPath = pdfexplorer.AddContainer("/Root/Pages/Kids[0]", "CropBox", false, false);
// Adding an indirect dictionary object to the root
string dictPath = pdfexplorer.AddContainer("", "", true, true);
3 0 obj << /Type /Page ... /CropBox [ ]>> endobj ... 7 0 obj << >> endobj
An array or dictionary object can then be modified by adding elements to it. The example below populates the
/CropBox array with four integer objects and adds a /Type key to the newly created
dictionary.
string cropBox0Path = pdfexplorer.AddPrimitive(arrayPath, "", "0");
string cropBox1Path = pdfexplorer.AddPrimitive(arrayPath, "", "0");
string cropBox2Path = pdfexplorer.AddPrimitive(arrayPath, "", "612");
string cropBox3Path = pdfexplorer.AddPrimitive(arrayPath, "", "792");
string typePath = pdfexplorer.AddPrimitive(dictPath, "Type", "/SampleType");
3 0 obj << /Type /Page ... /CropBox [ 0 0 612 792 ]>> endobj ... 7 0 obj << /Type /SampleType >> endobj
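To see how nested containers map onto the PDF syntax shown in the output above, the following sketch serializes a small Python model into PDF notation. It is not how the class writes documents; to_pdf is a hypothetical helper, and treating a leading slash as the marker for a name is a convention of this example only.

```python
def to_pdf(value) -> str:
    """Serialize a small Python model of direct objects into PDF syntax."""
    if isinstance(value, bool):  # checked before int because bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        if value.startswith("/"):
            return value  # toy convention: a leading slash marks a name
        # Literal strings need backslashes and parentheses escaped.
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    if isinstance(value, list):
        return " ".join(["[", *(to_pdf(item) for item in value), "]"])
    if isinstance(value, dict):
        pairs = [f"/{key} {to_pdf(item)}" for key, item in value.items()]
        return " ".join(["<<", *pairs, ">>"])
    raise TypeError(f"unsupported value: {value!r}")
```

With this helper, to_pdf([0, 0, 612, 792]) gives the array form used for /CropBox, and to_pdf({"Type": "/SampleType"}) gives the dictionary body of 7 0 obj.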
Stream Objects
A stream object is a compound object consisting of a dictionary and a sequence of bytes. Stream objects are
always indirect and are used to store data such as images, fonts, and other resources. Use the add_stream
method to add a stream object:
// Adding a stream object to the root
byte[] image1Data = File.ReadAllBytes("image1.png");
string streamPath = pdfexplorer.AddStream("", "", image1Data);
8 0 obj << /Length 6317 >>stream ... % binary data for image1.png endstream endobj
To modify a stream object, use the set_object_data or set_object_stream method:
byte[] image2Data = File.ReadAllBytes("image2.png");
pdfexplorer.SetObjectData(streamPath, image2Data);
// or pdfexplorer.SetObjectStream(streamPath, new MemoryStream(image2Data));
8 0 obj << /Length 197 >>stream ... % binary data for image2.png endstream endobj
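The /Length entry in the outputs above changes because it records the number of bytes between the stream and endstream keywords. The sketch below builds an unfiltered stream object by hand to show that relationship; build_stream_object is a hypothetical helper and is not part of the class, which maintains the entry itself.

```python
def build_stream_object(number: int, data: bytes, generation: int = 0) -> bytes:
    """Build the bytes of an unfiltered indirect stream object."""
    header = (
        f"{number} {generation} obj\n"
        f"<< /Length {len(data)} >>\n"
        "stream\n"
    ).encode("ascii")
    # The end-of-line marker before "endstream" is not counted in /Length.
    return header + data + b"\nendstream\nendobj\n"


# Replacing the payload changes only the data and the /Length value.
first = build_stream_object(8, b"abc")
second = build_stream_object(8, b"abcdef")
```

The object and generation numbers stay the same in both results, which is why existing references to the stream remain valid after its content is replaced.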
Object References
An (indirect) object reference is a reference to an indirect object from another object. Its syntax consists of
the destination object's object number, its generation number, and R (e.g., 1 0 R). Use
the add_reference method to add a reference to an existing object:
// Creating a reference to the stream at 8 0 obj and adding it to the dictionary at 7 0 obj
string path = pdfexplorer.AddReference("7 0 obj", "Image", "8 0 obj");
7 0 obj << /Image 8 0 R /Type /SampleType >> endobj
The contents of the destination object can be modified using the path returned by add_reference in the same way as any other indirect object - the reference will remain intact because the object and generation numbers of the destination object will not be affected.
Removing Objects
The remove_object method can be used to remove an object from the document. While this method will invalidate the former path of the object itself, if it was an indirect object, any references to it will not be removed.
pdfexplorer.RemoveObject("7 0 obj/Image");
7 0 obj << /Type /SampleType >> endobj
When finished adding, modifying, or removing objects, call the close method to close the document and save the changes to either output_file, output_data, or the stream set in set_output_stream.
Property List
The following is the full list of the properties of the class with short descriptions. Click on the links for further details.
input_data | A byte array containing the PDF document to process. |
input_file | The PDF file to process. |
output_data | A byte array containing the PDF document after processing. |
output_file | The path to a local file where the output will be written. |
overwrite | Whether or not the class should overwrite files. |
root_object_count | The number of records in the RootObject arrays. |
root_object_container | Whether the object is a container for other objects (i.e., a dictionary, array, or stream). |
root_object_disposition | The method by which the object is addressed in the document. |
root_object_element_count | The number of sub-elements in the object, such as keys in the dictionary or elements in the array. |
root_object_gen_number | The generation number of the indirect (top-level) object. |
root_object_keys | A CRLF-separated list of the keys of the dictionary or indices of the array. |
root_object_object_number | The object number of the indirect (top-level) object. |
root_object_object_type | The type of the object. |
root_object_offset | The start offset of the object, in bytes, from the beginning of the PDF document. |
root_object_path | The path to the object, for example /Root/Pages. |
root_object_size | The physical length of the object in bytes. |
root_object_value | The value of the object. |
selected_object_count | The number of records in the SelectedObject arrays. |
selected_object_container | Whether the object is a container for other objects (i.e., a dictionary, array, or stream). |
selected_object_disposition | The method by which the object is addressed in the document. |
selected_object_element_count | The number of sub-elements in the object, such as keys in the dictionary or elements in the array. |
selected_object_gen_number | The generation number of the indirect (top-level) object. |
selected_object_keys | A CRLF-separated list of the keys of the dictionary or indices of the array. |
selected_object_object_number | The object number of the indirect (top-level) object. |
selected_object_object_type | The type of the object. |
selected_object_offset | The start offset of the object, in bytes, from the beginning of the PDF document. |
selected_object_path | The path to the object, for example /Root/Pages. |
selected_object_size | The physical length of the object in bytes. |
selected_object_value | The value of the object. |
Method List
The following is the full list of the methods of the class with short descriptions. Click on the links for further details.
add_container | Adds a dictionary or array object to the document. |
add_object | Inserts an object into the document. |
add_opaque | Adds an opaque piece of PDF to the document. |
add_primitive | Adds a primitive object to the document. |
add_reference | Adds an object reference to the document. |
add_stream | Adds a stream object to the document. |
close | Closes the opened document. |
config | Sets or retrieves a configuration setting. |
create_new | Creates a new PDF document. |
get_object_data | Returns the content of a stream object. |
open | Opens the document for processing. |
remove_object | Removes an object from the document. |
reset | Resets the class. |
select | Selects an object or multiple objects from the document. |
set_object_data | Sets the content of a stream object. |
Event List
The following is the full list of the events fired by the class with short descriptions. Click on the links for further details.
on_error | Fired when information is available about errors during data delivery. |
on_log | Fired once for each log message. |
Config Settings
The following is a list of config settings for the class with short descriptions. Click on the links for further details.
CloseInputStreamAfterProcessing | Whether to close the input stream after processing. |
CloseOutputStreamAfterProcessing | Whether to close the output stream after processing. |
LogLevel | The level of detail that is logged. |
OwnerPassword | The owner password to decrypt the document with. |
SaveChanges | Whether to save changes made to the document. |
StringEncoding | The encoding to use for string objects. |
TempPath | The location where temporary files are stored. |
BuildInfo | Information about the product's build. |
CodePage | The system code page used for Unicode to Multibyte translations. |
LicenseInfo | Information about the current license. |
MaskSensitiveData | Whether sensitive data is masked in log messages. |
ProcessIdleEvents | Whether the class uses its internal event loop to process events when the main thread is idle. |
SelectWaitMillis | The length of time in milliseconds the class will wait when DoEvents is called if there are no events to process. |
UseInternalSecurityAPI | Whether or not to use the system security libraries or an internal implementation. |
input_data property
A byte array containing the PDF document to process.
Syntax
def get_input_data() -> bytes: ...
def set_input_data(value: bytes) -> None: ...
input_data = property(get_input_data, set_input_data)
Remarks
This property is used to assign a byte array containing the PDF document to be processed.
input_file property
The PDF file to process.
Syntax
def get_input_file() -> str: ...
def set_input_file(value: str) -> None: ...
input_file = property(get_input_file, set_input_file)
Default Value
""
Remarks
This property is used to provide a path to the PDF document to be processed.
output_data property
A byte array containing the PDF document after processing.
Syntax
def get_output_data() -> bytes: ...
output_data = property(get_output_data, None)
Remarks
This property is used to read the byte array containing the produced output after the operation has completed. It will only be set if an output file and output stream have not been assigned via output_file and set_output_stream respectively.
This property is read-only.
output_file property
The path to a local file where the output will be written.
Syntax
def get_output_file() -> str: ...
def set_output_file(value: str) -> None: ...
output_file = property(get_output_file, set_output_file)
Default Value
""
Remarks
This property is used to provide a path where the resulting PDF document will be saved after the operation has completed.
overwrite property
Whether or not the class should overwrite files.
Syntax
def get_overwrite() -> bool: ...
def set_overwrite(value: bool) -> None: ...
overwrite = property(get_overwrite, set_overwrite)
Default Value
False
Remarks
This property indicates whether or not the class will overwrite output_file, output_data, or the stream set in set_output_stream. If set to False, an error will be thrown whenever output_file, output_data, or the stream set in set_output_stream exists before an operation.
root_object_count property
The number of records in the RootObject arrays.
Syntax
def get_root_object_count() -> int: ...
root_object_count = property(get_root_object_count, None)
Default Value
0
Remarks
This property controls the size of the following arrays:
- root_object_container
- root_object_disposition
- root_object_element_count
- root_object_gen_number
- root_object_keys
- root_object_object_number
- root_object_object_type
- root_object_offset
- root_object_path
- root_object_size
- root_object_value
This property is read-only.
root_object_container property
Whether the object is a container for other objects (i.e., a dictionary, array, or stream).
Syntax
def get_root_object_container(root_object_index: int) -> bool: ...
Default Value
False
Remarks
Whether the object is a container for other objects (i.e., a dictionary, array, or stream).
The root_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the root_object_count property.
This property is read-only.
root_object_disposition property
The method by which the object is addressed in the document.
Syntax
def get_root_object_disposition(root_object_index: int) -> int: ...
Default Value
0
Remarks
The method by which the object is addressed in the document.
Possible values are:
0 (Direct - default) | The object is recorded in-place. |
1 (Reference) | The object is recorded as a reference to an indirect object. |
2 (Indirect) | The object is an indirect object. |
Example:
5 0 obj << /KeyM (Electricity) >> ... << /KeyA (Some Value) /KeyB << /X /Y >> /KeyC 5 0 R >>
- The value of /KeyA (Some Value) is a direct string object.
- The value of /X (/Y) is a direct name object.
- The value of /KeyB is a direct dictionary object.
- The value of /KeyC is a reference to the indirect dictionary object 5 0 obj.
- The value of 5 0 obj is an indirect dictionary object.
The root_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the root_object_count property.
This property is read-only.
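Because the disposition is returned as a plain integer, client code may find it convenient to map the value onto a named constant. The following is a minimal sketch: the Disposition enumeration and describe_disposition are hypothetical helpers defined here for illustration, with member values taken from the table above.

```python
from enum import IntEnum


class Disposition(IntEnum):
    """Named constants for the documented disposition values."""
    DIRECT = 0      # recorded in-place
    REFERENCE = 1   # recorded as a reference to an indirect object
    INDIRECT = 2    # an indirect (numbered) object


def describe_disposition(code: int) -> str:
    """Translate a raw disposition code into its documented label."""
    return Disposition(code).name.capitalize()
```

An unknown code raises ValueError when converted, which can help surface unexpected values early.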
root_object_element_count property
The number of sub-elements in the object, such as keys in the dictionary or elements in the array.
Syntax
def get_root_object_element_count(root_object_index: int) -> int: ...
Default Value
0
Remarks
The number of sub-elements in the object, such as keys in the dictionary or elements in the array.
The root_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the root_object_count property.
This property is read-only.
root_object_gen_number property
The generation number of the indirect (top-level) object.
Syntax
def get_root_object_gen_number(root_object_index: int) -> int: ...
Default Value
0
Remarks
The generation number of the indirect (top-level) object.
The root_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the root_object_count property.
This property is read-only.
root_object_keys property
A CRLF-separated list of the keys of the dictionary or indices of the array.
Syntax
def get_root_object_keys(root_object_index: int) -> str: ...
Default Value
""
Remarks
A CRLF-separated list of the keys of the dictionary or indices of the array.
The root_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the root_object_count property.
This property is read-only.
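Since the keys are delivered as a single CRLF-separated string, they usually need to be split before use. The sketch below shows one way to do that; parse_keys is a hypothetical helper, and the sample strings in the usage note are made up rather than taken from the class's actual output.

```python
def parse_keys(keys: str) -> list:
    """Split a CRLF-separated key list into individual entries, dropping empty ones."""
    return [key for key in keys.split("\r\n") if key]
```

For instance, parse_keys("Size\r\nRoot") returns two entries, and an empty string returns an empty list rather than a list containing one empty key.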
root_object_object_number property
The object number of the indirect (top-level) object.
Syntax
def get_root_object_object_number(root_object_index: int) -> int: ...
Default Value
0
Remarks
The object number of the indirect (top-level) object.
The root_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the root_object_count property.
This property is read-only.
root_object_object_type property
The type of the object.
Syntax
def get_root_object_object_type(root_object_index: int) -> int: ...
Default Value
0
Remarks
The type of the object.
Possible values are:
- 0 (Undefined - default)
- 1 (Name)
- 2 (String)
- 3 (Real)
- 4 (Integer)
- 5 (Boolean)
- 6 (Array)
- 7 (Dictionary)
- 8 (Stream)
The root_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the root_object_count property.
This property is read-only.
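The integer type codes can likewise be wrapped in an enumeration so that comparisons read naturally. This is a sketch under the assumption that client code wants named constants: ObjectType, CONTAINER_TYPES, and is_container are hypothetical helpers. The member values follow the list above, and the container set follows the description of root_object_container.

```python
from enum import IntEnum


class ObjectType(IntEnum):
    """Named constants for the documented object type values."""
    UNDEFINED = 0
    NAME = 1
    STRING = 2
    REAL = 3
    INTEGER = 4
    BOOLEAN = 5
    ARRAY = 6
    DICTIONARY = 7
    STREAM = 8


# Types described as containers for other objects.
CONTAINER_TYPES = frozenset({ObjectType.ARRAY, ObjectType.DICTIONARY, ObjectType.STREAM})


def is_container(code: int) -> bool:
    """Return True if the raw type code denotes a dictionary, array, or stream."""
    return ObjectType(code) in CONTAINER_TYPES
```

A check such as is_container(7) can then guard code that reads the element count or keys, which are only meaningful for container objects.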
root_object_offset property
The start offset of the object, in bytes, from the beginning of the PDF document.
Syntax
def get_root_object_offset(root_object_index: int) -> int: ...
Default Value
0
Remarks
The start offset of the object, in bytes, from the beginning of the PDF document.
The root_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the root_object_count property.
This property is read-only.
root_object_path property
The path to the object, for example /Root/Pages.
Syntax
def get_root_object_path(root_object_index: int) -> str: ...
Default Value
""
Remarks
The path to the object, for example /Root/Pages.
The root_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the root_object_count property.
This property is read-only.
root_object_size property
The physical length of the object in bytes.
Syntax
def get_root_object_size(root_object_index: int) -> int: ...
Default Value
0
Remarks
The physical length of the object in bytes.
The root_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the root_object_count property.
This property is read-only.
root_object_value property
The value of the object.
Syntax
def get_root_object_value(root_object_index: int) -> str: ...
def set_root_object_value(root_object_index: int, value: str) -> None: ...
Default Value
""
Remarks
The value of the object.
NOTE: This property only applies to primitive objects (strings, names, integers, reals, and booleans). To access and modify contents of complex objects such as streams, use the get_object_data and set_object_data methods.
The root_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the root_object_count property.
selected_object_count property
The number of records in the SelectedObject arrays.
Syntax
def get_selected_object_count() -> int: ...
selected_object_count = property(get_selected_object_count, None)
Default Value
0
Remarks
This property controls the size of the following arrays:
- selected_object_container
- selected_object_disposition
- selected_object_element_count
- selected_object_gen_number
- selected_object_keys
- selected_object_object_number
- selected_object_object_type
- selected_object_offset
- selected_object_path
- selected_object_size
- selected_object_value
This property is read-only.
selected_object_container property
Whether the object is a container for other objects (i.e., a dictionary, array, or stream).
Syntax
def get_selected_object_container(selected_object_index: int) -> bool: ...
Default Value
False
Remarks
Whether the object is a container for other objects (i.e., a dictionary, array, or stream).
The selected_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the selected_object_count property.
This property is read-only.
selected_object_disposition property
The method by which the object is addressed in the document.
Syntax
def get_selected_object_disposition(selected_object_index: int) -> int: ...
Default Value
0
Remarks
The method by which the object is addressed in the document.
Possible values are:
0 (Direct - default) | The object is recorded in-place. |
1 (Reference) | The object is recorded as a reference to an indirect object. |
2 (Indirect) | The object is an indirect object. |
Example:
5 0 obj << /KeyM (Electricity) >> ... << /KeyA (Some Value) /KeyB << /X /Y >> /KeyC 5 0 R >>
- The value of /KeyA (Some Value) is a direct string object.
- The value of /X (/Y) is a direct name object.
- The value of /KeyB is a direct dictionary object.
- The value of /KeyC is a reference to the indirect dictionary object 5 0 obj.
- The value of 5 0 obj is an indirect dictionary object.
The selected_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the selected_object_count property.
This property is read-only.
selected_object_element_count property
The number of sub-elements in the object, such as keys in the dictionary or elements in the array.
Syntax
def get_selected_object_element_count(selected_object_index: int) -> int: ...
Default Value
0
Remarks
The number of sub-elements in the object, such as keys in the dictionary or elements in the array.
The selected_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the selected_object_count property.
This property is read-only.
selected_object_gen_number property
The generation number of the indirect (top-level) object.
Syntax
def get_selected_object_gen_number(selected_object_index: int) -> int: ...
Default Value
0
Remarks
The generation number of the indirect (top-level) object.
The selected_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the selected_object_count property.
This property is read-only.
selected_object_keys property
A CRLF-separated list of the keys of the dictionary or indices of the array.
Syntax
def get_selected_object_keys(selected_object_index: int) -> str: ...
Default Value
""
Remarks
A CRLF-separated list of the keys of the dictionary or indices of the array.
The selected_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the selected_object_count property.
This property is read-only.
selected_object_object_number property
The object number of the indirect (top-level) object.
Syntax
def get_selected_object_object_number(selected_object_index: int) -> int: ...
Default Value
0
Remarks
The object number of the indirect (top-level) object.
The selected_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the selected_object_count property.
This property is read-only.
selected_object_object_type property
The type of the object.
Syntax
def get_selected_object_object_type(selected_object_index: int) -> int: ...
Default Value
0
Remarks
The type of the object.
Possible values are:
- 0 (Undefined - default)
- 1 (Name)
- 2 (String)
- 3 (Real)
- 4 (Integer)
- 5 (Boolean)
- 6 (Array)
- 7 (Dictionary)
- 8 (Stream)
The selected_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the selected_object_count property.
This property is read-only.
selected_object_offset property
The start offset of the object, in bytes, from the beginning of the PDF document.
Syntax
def get_selected_object_offset(selected_object_index: int) -> int: ...
Default Value
0
Remarks
The start offset of the object, in bytes, from the beginning of the PDF document.
The selected_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the selected_object_count property.
This property is read-only.
selected_object_path property
The path to the object, for example /Root/Pages.
Syntax
def get_selected_object_path(selected_object_index: int) -> str: ...
Default Value
""
Remarks
The path to the object, for example /Root/Pages.
The selected_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the selected_object_count property.
This property is read-only.
selected_object_size property
The physical length of the object in bytes.
Syntax
def get_selected_object_size(selected_object_index: int) -> int: ...
Default Value
0
Remarks
The physical length of the object in bytes.
The selected_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the selected_object_count property.
This property is read-only.
selected_object_value property
The value of the object.
Syntax
def get_selected_object_value(selected_object_index: int) -> str: ...
def set_selected_object_value(selected_object_index: int, value: str) -> None: ...
Default Value
""
Remarks
The value of the object.
NOTE: This property only applies to primitive objects (strings, names, integers, reals, and booleans). To access and modify contents of complex objects such as streams, use the get_object_data and set_object_data methods.
The selected_object_index parameter specifies the index of the item in the array. The size of the array is controlled by the selected_object_count property.
add_container method
Adds a dictionary or array object to the document.
Syntax
def add_container(base_path: str, object_name: str, dictionary: bool, indirect: bool) -> str: ...
Remarks
This method is used to add a new dictionary or array object to the document at BasePath. If adding to an existing dictionary, set ObjectName to the key that the new object will be added or referenced under.
The Dictionary parameter specifies whether to create a dictionary object.
The Indirect parameter specifies whether to add the dictionary or array to the indirect (numbered) object list and reference it from BasePath instead of creating an in-place object.
This method returns the path of the new object in the document.
Please see Navigating the Document for more details about object paths.
add_object method
Inserts an object into the document.
Syntax
def add_object(base_path: str, object_type: int, object_name: str, value: str, indirect: bool) -> str: ...
Remarks
This method is used to add a new object with value Value to the document at BasePath. If adding to an existing dictionary, set ObjectName to the key that the new object will be added or referenced under.
The ObjectType parameter specifies the type of the object and can be one of the following values:
- 1 (Name)
- 2 (String)
- 3 (Real)
- 4 (Integer)
- 5 (Boolean)
- 6 (Array)
- 7 (Dictionary)
- 8 (Stream)
NOTE: This method can be particularly useful to add a primitive object (name, string, real, integer, or boolean) to the list of indirect (numbered) objects.
This method returns the path of the new object in the document.
Please see Navigating the Document for more details about object paths.
add_opaque method
Adds an opaque piece of PDF to the document.
Syntax
def add_opaque(base_path: str, object_name: str, value: str) -> str: ...
Remarks
This method is used to add an uninterpreted string of PDF objects with value Value to the document at BasePath. If adding to an existing dictionary, set ObjectName to the key that the new object will be added under.
Example:
pdfexplorer.InputFile = "input.pdf";
pdfexplorer.OutputFile = "modified.pdf";
pdfexplorer.Open();
string value = "<< /Producer (Secure PDF)\r\n" +
"/CreationDate (D:20250725102001Z00'00')\r\n" +
"/ModDate (D:20250725102001Z00'00')\r\n" +
"/Author (Edvard Grieg)\r\n" +
"/Title (In the Hall of the Mountain King)\r\n" +
">>";
string path = pdfexplorer.AddOpaque("/", "Info", value);
pdfexplorer.Close();
This method returns the path of the new object in the document.
Please see Navigating the Document for more details about object paths.
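The /CreationDate and /ModDate values in the example above use the PDF date notation (D: followed by the year, month, day, hour, minute, and second, then a time zone designator). When such values are generated rather than typed by hand, a small formatter helps. The sketch below is an illustration only: pdf_date is a hypothetical helper that always emits UTC in the same form as the example.

```python
from datetime import datetime, timezone


def pdf_date(moment: datetime) -> str:
    """Format a datetime as a PDF date string in UTC."""
    # A naive datetime is interpreted as local time by astimezone().
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("D:%Y%m%d%H%M%S") + "Z00'00'"


stamp = pdf_date(datetime(2025, 7, 25, 10, 20, 1, tzinfo=timezone.utc))
```

The resulting string can be placed between parentheses in the opaque value, as in the /CreationDate entry of the example.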
add_primitive method
Adds a primitive object to the document.
Syntax
def add_primitive(base_path: str, object_name: str, value: str) -> str: ...
Remarks
This method is used to add a new name, string, real (double), integer, or boolean object with value Value to the document at BasePath. If adding to an existing dictionary, set ObjectName to the key that the new object will be added under.
The class will automatically determine the type of the object based on the Value parameter.
Examples:
# pdfexplorer is an instance of securepdf.PDFExplorer
pdfexplorer.input_file = "input.pdf"
pdfexplorer.output_file = "modified.pdf"
pdfexplorer.open()
# Adding a name object to the dictionary at 3 0 obj
name_path = pdfexplorer.add_primitive("3 0 obj", "Type", "/Font")
# Adding a string object to the dictionary at 1 0 obj
string_path = pdfexplorer.add_primitive("1 0 obj", "Name", "John Doe")
# Adding a real object to an array
real_path = pdfexplorer.add_primitive("5 0 obj/Rect", "", "100.5")
# Adding an integer object to an array
integer_path = pdfexplorer.add_primitive("/Root/Pages/Kids[0]/MediaBox", "", "792")
# Adding a boolean object to a dictionary
boolean_path = pdfexplorer.add_primitive("/Root/AcroForm", "NeedAppearances", "true")
pdfexplorer.close()
This method returns the path of the new object in the document.
Please see Navigating the Document for more details about object paths.
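The exact rules the class uses to determine the type are not spelled out here. The following sketch shows one plausible classification that agrees with the five examples above; guess_primitive_type is a hypothetical function, and its rules (for instance, treating only lowercase "true" and "false" as booleans) are assumptions rather than the class's documented behavior.

```python
import re

_INTEGER = re.compile(r"[+-]?\d+")
_REAL = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")


def guess_primitive_type(value: str) -> str:
    """Classify a textual value as one of the five primitive PDF types."""
    if value in ("true", "false"):
        return "boolean"
    if value.startswith("/"):
        return "name"
    if _INTEGER.fullmatch(value):
        return "integer"
    if _REAL.fullmatch(value):
        return "real"
    # Anything that is not recognized above is treated as a string.
    return "string"
```

A check like this can help predict which type a value is likely to become before it is written to the document.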
add_reference method
Adds an object reference to the document.
Syntax
def add_reference(base_path: str, object_name: str, ref_path: str) -> str: ...
Remarks
This method is used to create a new reference to an existing object, such as a page dictionary, at BasePath. If adding to an existing dictionary, set ObjectName to the key that the new reference will be added under.
The RefPath parameter specifies the destination object and must point to one of the indirect (numbered) objects.
This method returns the path of the new object in the document.
Please see Navigating the Document for more details about object paths.
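Because RefPath must identify an indirect object, it can be useful to check a path before passing it in. The following sketch recognizes only the literal "N G obj" form used elsewhere in this document (e.g., 3 0 obj); parse_indirect_path is a hypothetical helper, and the class may well accept other path forms that resolve to indirect objects.

```python
import re
from typing import Optional, Tuple

_INDIRECT_PATH = re.compile(r"(\d+) (\d+) obj")


def parse_indirect_path(path: str) -> Optional[Tuple[int, int]]:
    """Return (object number, generation number) for an "N G obj" path, else None."""
    match = _INDIRECT_PATH.fullmatch(path.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```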
add_stream method
Adds a stream object to the document.
Syntax
def add_stream(base_path: str, object_name: str, value: bytes) -> str: ...
Remarks
This method is used to create a new stream object with value Value at BasePath. If adding to an existing dictionary, set ObjectName to the key that the new stream will be referenced under.
NOTE: Stream objects are always indirect (i.e., part of the numbered object list).
This method returns the path of the new object in the document.
Please see Navigating the Document for more details about object paths.
close method
Closes the opened document.
Syntax
def close() -> None: ...
Remarks
This method is used to close the previously opened document. It should always be preceded by a call to the open method.
Example:
# pdfexplorer is an instance of securepdf.PDFExplorer
pdfexplorer.input_file = "input.pdf"
pdfexplorer.open()
# Some operation
pdfexplorer.close()
If any changes are made to the document, they will be saved automatically to output_file, output_data, or the stream set in set_output_stream when this method is called. To configure this saving behavior, set the SaveChanges configuration setting.
config method
Sets or retrieves a configuration setting.
Syntax
def config(configuration_string: str) -> str: ...
Remarks
config is a generic method available in every class. It is used to set and retrieve configuration settings for the class.
These settings are similar in functionality to properties, but they are rarely used. In order to avoid "polluting" the property namespace of the class, access to these internal properties is provided through the config method.
To set a configuration setting named PROPERTY, call config("PROPERTY=VALUE"), where VALUE is the value of the setting expressed as a string. For boolean values, use the strings "True", "False", "0", "1", "Yes", or "No" (case does not matter).
To read (query) the value of a configuration setting, call config("PROPERTY"). The value will be returned as a string.
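The string conventions above can be wrapped in two small helpers. This is a minimal sketch; config_assignment and parse_config_bool are hypothetical names and are not part of the class.

```python
_TRUE = {"true", "1", "yes"}
_FALSE = {"false", "0", "no"}


def config_assignment(name: str, value: object) -> str:
    """Build the "NAME=VALUE" string that is passed to config to set a value."""
    if isinstance(value, bool):
        value = "True" if value else "False"
    return f"{name}={value}"


def parse_config_bool(text: str) -> bool:
    """Interpret a boolean configuration value, ignoring case."""
    lowered = text.strip().lower()
    if lowered in _TRUE:
        return True
    if lowered in _FALSE:
        return False
    raise ValueError(f"not a boolean configuration value: {text!r}")
```

For example, config_assignment("SaveChanges", 2) yields "SaveChanges=2", which could then be passed to the config method.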
create_new method
Creates a new PDF document.
Syntax
def create_new() -> None: ...
Remarks
This method is used to create a blank PDF document with one empty page. Having created the baseline document, use the class's methods (such as add_stream) to add objects to it.
get_object_data method
Returns the content of a stream object.
Syntax
def get_object_data(path: str) -> bytes: ...
Remarks
This method is used to retrieve the content of the PDF stream object at Path.
Please see Navigating the Document for more details about object paths.
open method
Opens the document for processing.
Syntax
def open() -> None: ...
Remarks
This method is used to open the document specified in input_file, input_data, or set_input_stream before performing some operation on it, such as accessing or modifying individual PDF objects. When finished, call close to complete or discard the operation.
It is recommended to use this method (alongside close) when performing multiple operations on the document at once.
NOTE: This method will populate the root_objects properties with the keys found in the document trailer dictionary.
remove_object method
Removes an object from the document.
Syntax
def remove_object(path: str) -> None: ...
Remarks
This method is used to remove the object at Path from the document.
Note the following peculiarities of the PDF format:
- Certain objects ("indirect objects") are global, numbered objects that can be referenced from other objects in the document. To remove an indirect object, all the references to it must be removed first, followed by the object itself.
- Indirect objects may have more than one reference. Removing such an object may inadvertently invalidate other references in the document.
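Before removing an indirect object, it helps to know how many references point to it. The following is a deliberately naive sketch that scans raw PDF bytes for "N G R" tokens; find_reference_offsets is a hypothetical helper, not part of the class. It does not see references stored inside compressed object streams, and it may report false positives inside strings or stream data, so treat its result as a hint only.

```python
import re
from typing import List


def find_reference_offsets(pdf_bytes: bytes, obj_num: int, gen_num: int = 0) -> List[int]:
    """Return byte offsets of "N G R" tokens that refer to the given object."""
    pattern = re.compile(
        rb"(?<![0-9])"                    # do not match the tail of a longer number
        + str(obj_num).encode("ascii")
        + rb"\s+"
        + str(gen_num).encode("ascii")
        + rb"\s+R(?![A-Za-z0-9])"         # R must end the token
    )
    return [match.start() for match in pattern.finditer(pdf_bytes)]
```

An empty result suggests, but does not prove, that the object is no longer referenced in the uncompressed parts of the file.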
reset method
Resets the class.
Syntax
def reset() -> None: ...
Remarks
This method is used to reset the class's properties and configuration settings to their default values.
select method
Selects an object or multiple objects from the document.
Syntax
def select(filter: str, clear_existing_selection: bool) -> None: ...
Remarks
This method is used to select objects from the document using an XPath-like language. Upon completion of this method, objects with paths matching the Filter parameter will be populated in the selected_objects properties.
The ClearExistingSelection parameter specifies whether selected_objects will be cleared before performing the select operation.
NOTE: Since streams are compound objects consisting of a dictionary and data, when selecting a stream object this method will select its dictionary. Use the get_object_data or get_object_stream methods to extract the content of stream objects.
Please see Navigating the Document for more details about object paths.
set_object_data method
Sets the content of a stream object.
Syntax
def set_object_data(path: str, value: bytes) -> None: ...
Remarks
This method is used to set the content of the PDF stream object at Path. The Value parameter specifies the data of the stream.
Please see Navigating the Document for more details about object paths.
on_error event
Fired when information is available about errors during data delivery.
Syntax
class PDFExplorerErrorEventParams(object):
  @property
  def error_code() -> int: ...
  @property
  def description() -> str: ...

# In class PDFExplorer:
@property
def on_error() -> Callable[[PDFExplorerErrorEventParams], None]: ...
@on_error.setter
def on_error(event_hook: Callable[[PDFExplorerErrorEventParams], None]) -> None: ...
Remarks
The on_error event is fired in case of exceptional conditions during message processing. Normally the class fails with an error.
The ErrorCode parameter contains an error code, and the Description parameter contains a textual description of the error. For a list of valid error codes and their descriptions, please refer to the Error Codes section.
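A handler for this event only needs to read the two documented properties. The following is a minimal sketch; StandInErrorParams is a hypothetical stand-in for PDFExplorerErrorEventParams so that the example runs without the library, and collected_errors and handle_error are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StandInErrorParams:
    """Stand-in for PDFExplorerErrorEventParams, used only in this sketch."""
    error_code: int
    description: str


collected_errors: List[Tuple[int, str]] = []


def handle_error(e) -> None:
    """Record the error code and description carried by the event."""
    collected_errors.append((e.error_code, e.description))


# With the real class, the hook would be attached through the on_error setter:
#     pdfexplorer.on_error = handle_error
handle_error(StandInErrorParams(1301, "Invalid path."))
```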
on_log event
Fired once for each log message.
Syntax
class PDFExplorerLogEventParams(object):
  @property
  def log_level() -> int: ...
  @property
  def message() -> str: ...
  @property
  def log_type() -> str: ...

# In class PDFExplorer:
@property
def on_log() -> Callable[[PDFExplorerLogEventParams], None]: ...
@on_log.setter
def on_log(event_hook: Callable[[PDFExplorerLogEventParams], None]) -> None: ...
Remarks
This event is fired once for each log message generated by the class. The verbosity is controlled by the LogLevel configuration setting.
The LogLevel parameter indicates the detail level of the message. Possible values are:
0 (None) | No messages are logged. |
1 (Info - default) | Informational events such as the basics of the chain validation procedure are logged. |
2 (Verbose) | Detailed data such as HTTP requests are logged. |
3 (Debug) | Debug data including the full chain validation procedure are logged. |
The Message parameter is the log message.
The LogType parameter identifies the type of log entry. Possible values are:
- CertValidator
- Font
- HTTP
- PDFInvalidSignature
- PDFRevocationInfo
- Timestamp
- TSL
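The three event parameters can be combined into a single readable line. The following is a minimal sketch; LOG_LEVEL_NAMES, format_log_entry, and handle_log are hypothetical names, and the level names are taken from the table above.

```python
LOG_LEVEL_NAMES = {0: "None", 1: "Info", 2: "Verbose", 3: "Debug"}


def format_log_entry(log_level: int, message: str, log_type: str) -> str:
    """Render one log event as "[Level] Type: message"."""
    level_name = LOG_LEVEL_NAMES.get(log_level, f"Level {log_level}")
    return f"[{level_name}] {log_type}: {message}"


def handle_log(e) -> None:
    """Event hook: print the formatted entry for each log event."""
    print(format_log_entry(e.log_level, e.message, e.log_type))


# With the real class, the hook would be attached through the on_log setter:
#     pdfexplorer.on_log = handle_log
```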
PDFExplorer Config Settings
The class accepts one or more of the following configuration settings. Configuration settings are similar in functionality to properties, but they are rarely used. In order to avoid "polluting" the property namespace of the class, access to these internal properties is provided through the config method.
LogLevel values:
0 (None) | No messages are logged. |
1 (Info - default) | Informational events such as the basics of the chain validation procedure are logged. |
2 (Verbose) | Detailed data such as HTTP requests are logged. |
3 (Debug) | Debug data including the full chain validation procedure are logged. |
SaveChanges values:
0 | Discard all changes. |
1 | Save the document to output_file, output_data, or the stream set in set_output_stream, even if it has not been modified. |
2 (default) | Save the document to output_file, output_data, or the stream set in set_output_stream, but only if it has been modified. |
Auto (default) | Encode the string as a hex string if no human-readable text is identified; otherwise, encode it as a literal string. |
Hex | Encode the string as a hex string (e.g., hex:48656C6C6F20776F726C6421). |
Binary | Encode the string as a literal string, converting to human-readable text when possible (e.g., Hello world!). |
Base Config Settings
The following is a list of valid code page identifiers:
Identifier | Name |
037 | IBM EBCDIC - U.S./Canada |
437 | OEM - United States |
500 | IBM EBCDIC - International |
708 | Arabic - ASMO 708 |
709 | Arabic - ASMO 449+, BCON V4 |
710 | Arabic - Transparent Arabic |
720 | Arabic - Transparent ASMO |
737 | OEM - Greek (formerly 437G) |
775 | OEM - Baltic |
850 | OEM - Multilingual Latin I |
852 | OEM - Latin II |
855 | OEM - Cyrillic (primarily Russian) |
857 | OEM - Turkish |
858 | OEM - Multilingual Latin I + Euro symbol |
860 | OEM - Portuguese |
861 | OEM - Icelandic |
862 | OEM - Hebrew |
863 | OEM - Canadian-French |
864 | OEM - Arabic |
865 | OEM - Nordic |
866 | OEM - Russian |
869 | OEM - Modern Greek |
870 | IBM EBCDIC - Multilingual/ROECE (Latin-2) |
874 | ANSI/OEM - Thai |
875 | IBM EBCDIC - Modern Greek |
932 | ANSI/OEM - Japanese, Shift-JIS |
936 | ANSI/OEM - Simplified Chinese (PRC, Singapore) |
949 | ANSI/OEM - Korean (Unified Hangul Code) |
950 | ANSI/OEM - Traditional Chinese (Taiwan; Hong Kong SAR, PRC) |
1026 | IBM EBCDIC - Turkish (Latin-5) |
1047 | IBM EBCDIC - Latin 1/Open System |
1140 | IBM EBCDIC - U.S./Canada (037 + Euro symbol) |
1141 | IBM EBCDIC - Germany (20273 + Euro symbol) |
1142 | IBM EBCDIC - Denmark/Norway (20277 + Euro symbol) |
1143 | IBM EBCDIC - Finland/Sweden (20278 + Euro symbol) |
1144 | IBM EBCDIC - Italy (20280 + Euro symbol) |
1145 | IBM EBCDIC - Latin America/Spain (20284 + Euro symbol) |
1146 | IBM EBCDIC - United Kingdom (20285 + Euro symbol) |
1147 | IBM EBCDIC - France (20297 + Euro symbol) |
1148 | IBM EBCDIC - International (500 + Euro symbol) |
1149 | IBM EBCDIC - Icelandic (20871 + Euro symbol) |
1200 | Unicode UCS-2 Little-Endian (BMP of ISO 10646) |
1201 | Unicode UCS-2 Big-Endian |
1250 | ANSI - Central European |
1251 | ANSI - Cyrillic |
1252 | ANSI - Latin I |
1253 | ANSI - Greek |
1254 | ANSI - Turkish |
1255 | ANSI - Hebrew |
1256 | ANSI - Arabic |
1257 | ANSI - Baltic |
1258 | ANSI/OEM - Vietnamese |
1361 | Korean (Johab) |
10000 | MAC - Roman |
10001 | MAC - Japanese |
10002 | MAC - Traditional Chinese (Big5) |
10003 | MAC - Korean |
10004 | MAC - Arabic |
10005 | MAC - Hebrew |
10006 | MAC - Greek I |
10007 | MAC - Cyrillic |
10008 | MAC - Simplified Chinese (GB 2312) |
10010 | MAC - Romania |
10017 | MAC - Ukraine |
10021 | MAC - Thai |
10029 | MAC - Latin II |
10079 | MAC - Icelandic |
10081 | MAC - Turkish |
10082 | MAC - Croatia |
12000 | Unicode UCS-4 Little-Endian |
12001 | Unicode UCS-4 Big-Endian |
20000 | CNS - Taiwan |
20001 | TCA - Taiwan |
20002 | Eten - Taiwan |
20003 | IBM5550 - Taiwan |
20004 | TeleText - Taiwan |
20005 | Wang - Taiwan |
20105 | IA5 IRV International Alphabet No. 5 (7-bit) |
20106 | IA5 German (7-bit) |
20107 | IA5 Swedish (7-bit) |
20108 | IA5 Norwegian (7-bit) |
20127 | US-ASCII (7-bit) |
20261 | T.61 |
20269 | ISO 6937 Non-Spacing Accent |
20273 | IBM EBCDIC - Germany |
20277 | IBM EBCDIC - Denmark/Norway |
20278 | IBM EBCDIC - Finland/Sweden |
20280 | IBM EBCDIC - Italy |
20284 | IBM EBCDIC - Latin America/Spain |
20285 | IBM EBCDIC - United Kingdom |
20290 | IBM EBCDIC - Japanese Katakana Extended |
20297 | IBM EBCDIC - France |
20420 | IBM EBCDIC - Arabic |
20423 | IBM EBCDIC - Greek |
20424 | IBM EBCDIC - Hebrew |
20833 | IBM EBCDIC - Korean Extended |
20838 | IBM EBCDIC - Thai |
20866 | Russian - KOI8-R |
20871 | IBM EBCDIC - Icelandic |
20880 | IBM EBCDIC - Cyrillic (Russian) |
20905 | IBM EBCDIC - Turkish |
20924 | IBM EBCDIC - Latin-1/Open System (1047 + Euro symbol) |
20932 | JIS X 0208-1990 & 0121-1990 |
20936 | Simplified Chinese (GB2312) |
21025 | IBM EBCDIC - Cyrillic (Serbian, Bulgarian) |
21027 | Extended Alpha Lowercase |
21866 | Ukrainian (KOI8-U) |
28591 | ISO 8859-1 Latin I |
28592 | ISO 8859-2 Central Europe |
28593 | ISO 8859-3 Latin 3 |
28594 | ISO 8859-4 Baltic |
28595 | ISO 8859-5 Cyrillic |
28596 | ISO 8859-6 Arabic |
28597 | ISO 8859-7 Greek |
28598 | ISO 8859-8 Hebrew |
28599 | ISO 8859-9 Latin 5 |
28605 | ISO 8859-15 Latin 9 |
29001 | Europa 3 |
38598 | ISO 8859-8 Hebrew |
50220 | ISO 2022 Japanese with no halfwidth Katakana |
50221 | ISO 2022 Japanese with halfwidth Katakana |
50222 | ISO 2022 Japanese JIS X 0201-1989 |
50225 | ISO 2022 Korean |
50227 | ISO 2022 Simplified Chinese |
50229 | ISO 2022 Traditional Chinese |
50930 | Japanese (Katakana) Extended |
50931 | US/Canada and Japanese |
50933 | Korean Extended and Korean |
50935 | Simplified Chinese Extended and Simplified Chinese |
50936 | Simplified Chinese |
50937 | US/Canada and Traditional Chinese |
50939 | Japanese (Latin) Extended and Japanese |
51932 | EUC - Japanese |
51936 | EUC - Simplified Chinese |
51949 | EUC - Korean |
51950 | EUC - Traditional Chinese |
52936 | HZ-GB2312 Simplified Chinese |
54936 | Windows XP: GB18030 Simplified Chinese (4 Byte) |
57002 | ISCII Devanagari |
57003 | ISCII Bengali |
57004 | ISCII Tamil |
57005 | ISCII Telugu |
57006 | ISCII Assamese |
57007 | ISCII Oriya |
57008 | ISCII Kannada |
57009 | ISCII Malayalam |
57010 | ISCII Gujarati |
57011 | ISCII Punjabi |
65000 | Unicode UTF-7 |
65001 | Unicode UTF-8 |
Identifier | Name |
1 | ASCII |
2 | NEXTSTEP |
3 | JapaneseEUC |
4 | UTF8 |
5 | ISOLatin1 |
6 | Symbol |
7 | NonLossyASCII |
8 | ShiftJIS |
9 | ISOLatin2 |
10 | Unicode |
11 | WindowsCP1251 |
12 | WindowsCP1252 |
13 | WindowsCP1253 |
14 | WindowsCP1254 |
15 | WindowsCP1250 |
21 | ISO2022JP |
30 | MacOSRoman |
10 | UTF16String |
0x90000100 | UTF16BigEndian |
0x94000100 | UTF16LittleEndian |
0x8c000100 | UTF32String |
0x98000100 | UTF32BigEndian |
0x9c000100 | UTF32LittleEndian |
65536 | Proprietary |
- Product: The product the license is for.
- Product Key: The key the license was generated from.
- License Source: Where the license was found (e.g., RuntimeLicense, License File).
- License Type: The type of license installed (e.g., Royalty Free, Single Server).
- Last Valid Build: The last valid build number for which the license will work.
Setting this configuration setting to True tells the class to use the internal implementation instead of using the system security libraries.
On Windows, this setting is set to False by default. On Linux/macOS, this setting is set to True by default.
To use the system security libraries for Linux, OpenSSL support must be enabled. For more information on how to enable OpenSSL, please refer to the OpenSSL Notes section.
PDFExplorer Errors
1301 | Invalid path. |
1302 | Unsupported object type. |
1304 | Object with this name already exists. |
1307 | Cannot add direct object to root. |
1308 | Cannot add reference to root. |
PDF Errors
804 | PDF decompression failed. |
805 | Cannot add entry to cross-reference table. |
806 | Unsupported field size. |
807 | Unsupported Encoding filter. |
808 | Unsupported predictor algorithm. |
809 | Unsupported document version. |
812 | Cannot read PDF file stream. |
813 | Cannot write to PDF file stream. |
814 | output_file already exists and overwrite is False. |
815 | Invalid parameter. |
817 | Bad cross-reference entry. |
818 | Invalid object or generation number. |
819 | Invalid object stream. |
820 | Invalid stream dictionary. |
821 | Invalid AcroForm entry. |
822 | Invalid Root entry. |
823 | Invalid annotation. |
824 | The input document is empty. |
826 | OpenType font error. The error description contains the detailed message. |
828 | Invalid CMS data. The error description contains the detailed message. |
835 | Cannot change decryption mode for opened document. |
836 | Unsupported Date string. |
838 | Cryptographic error. The error description contains the detailed message. |
840 | decryption_cert error. The error description contains the detailed message. |
841 | Encryption failed. The error description contains the detailed message. |
842 | No proper certificate for encryption found. |
846 | Unsupported revision. |
847 | Unsupported security handler SubFilter. |
848 | Failed to verify permissions. |
849 | Invalid password. |
850 | Invalid password information. |
852 | Unsupported encryption algorithm. |
859 | Cannot encrypt encrypted document. |
864 | Cannot modify document after signature update. |
868 | Cannot encrypt or decrypt object. |
869 | Invalid security handler information. |
870 | Invalid encrypted data. |
871 | Invalid block cipher padding. |
872 | Failed to reload signature. |
873 | Object is not encrypted. |
874 | Unexpected cipher information. |
877 | Invalid document. Bad document catalog. |
878 | Invalid document Id. |
880 | Invalid document. Invalid requirements dictionary. |
881 | Invalid linearization dictionary. |
882 | Invalid signature information. |
883 | Unsupported document format. |
890 | Unsupported feature. |
891 | Internal error. The error description contains the detailed message. |
892 | Unsupported color. |
893 | This operation is not supported for this PDF/A level. |
894 | Interactive features () are not supported by PDF/A. Set EnforcePDFA to False or clear the property of the field. |
895 | Font file not found. |
Parsing Errors
1001 | Bad object. |
1002 | Bad document trailer. |
1003 | Illegal stream dictionary. |
1004 | Illegal string. |
1005 | Indirect object expected. |
1007 | Invalid reference. |
1008 | Invalid reference table. |
1009 | Invalid stream data. |
1010 | Unexpected character. |
1011 | Unexpected EOF. |
1012 | Unexpected indirect object in cross-reference table. |
1013 | RDF object not found. |
1014 | Invalid RDF object. |
1015 | Cannot create element with unknown prefix. |
1021 | Invalid type in Root object list. |
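When handling errors in application code, it can help to know which of the tables above a code belongs to. The following is a minimal sketch; error_category and describe_explorer_error are hypothetical helpers, and the numeric ranges are assumptions inferred from the codes listed in this section, not documented boundaries.

```python
# Descriptions copied from the PDFExplorer Errors table above.
PDFEXPLORER_ERRORS = {
    1301: "Invalid path.",
    1302: "Unsupported object type.",
    1304: "Object with this name already exists.",
    1307: "Cannot add direct object to root.",
    1308: "Cannot add reference to root.",
}


def error_category(code: int) -> str:
    """Map an error code to the table it appears under (ranges are assumed)."""
    if 1301 <= code <= 1399:
        return "PDFExplorer"
    if 1001 <= code <= 1099:
        return "Parsing"
    if 801 <= code <= 899:
        return "PDF"
    return "Unknown"


def describe_explorer_error(code: int) -> str:
    """Return the listed description for a PDFExplorer error code, if any."""
    return PDFEXPLORER_ERRORS.get(code, "No description listed.")
```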