OfficeDoc Class
The OfficeDoc class implements support for the Open XML Packaging Format used in Office 2007 documents.
Syntax
class ipworkszip.OfficeDoc
Remarks
The class provides a way to extract information and content from an Open XML packaged document, examine the package properties, and perform basic read and update operations.
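A typical session can be sketched as follows. This is a minimal sketch, assuming `doc` is an already constructed OfficeDoc instance (construction and licensing are not covered here) and that the counts are populated once open returns. The helper name `read_package_summary` is hypothetical; only members documented on this page are used.

```python
def read_package_summary(doc, path):
    """Open the package at `path`, return a few counts, then close it.

    `doc` is assumed to be an OfficeDoc instance (an assumption of this
    sketch, not something this helper checks).
    """
    doc.package_path = path
    doc.open()
    try:
        return {
            "content_types": doc.content_type_count,
            "package_properties": doc.package_property_count,
            "relationships": doc.relationship_count,
        }
    finally:
        # Release the archive even if reading a count raised an error.
        doc.close()
```

Wrapping the reads in try/finally keeps the archive from staying open when an error is raised partway through.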
Property List
The following is the full list of the properties of the class with short descriptions.
content_type_count | The number of records in the ContentType arrays. |
content_type_is_override | Specifies if this is a default content type or an override. |
content_type_media_type | The media type for this entry, as defined by RFC2616. |
content_type_target | If it's a default content type, this will be the file extension it applies to. |
namespace_count | The number of records in the Namespace arrays. |
namespace_prefix | This property contains the Prefix for the Namespace. |
namespace_uri | This property contains the namespace URI associated with the corresponding Prefix. |
package_path | The path to the Open XML package file. |
package_property_count | The number of records in the PackageProperty arrays. |
package_property_data_type | The data type associated with this property, if the information is available. |
package_property_name | The name of this property. |
package_property_namespace | The XML Namespace URI associated with this property. |
package_property_prop_id | If this is a custom property, this will be the pid assigned to it. |
package_property_prop_set | If this is a custom property, this will be the GUID of the property set it belongs to. |
package_property_value | The value of this property. |
part_data | The contents of the currently selected part. |
part_name | The name of the currently selected part. |
relationship_count | The number of records in the Relationship arrays. |
relationship_content_type | The content type for the part referenced by this relationship, resolved from [Content_Types].xml. |
relationship_id | The unique ID of this relationship within this .rels file. |
relationship_part_name | The name of the part this relationship entry applies to. |
relationship_type_uri | The XML namespace URI that specifies the meaning of this relationship. |
validate | This property controls whether documents are validated during parsing. |
attr_count | The number of records in the Attr arrays. |
attr_name | This property provides the local name (without prefix) of the attribute. |
attr_namespace | This property contains the attribute namespace. |
attr_prefix | This property contains the attribute prefix (if any). |
attr_value | This property contains the attribute value. |
xchild_count | The number of records in the XChild arrays. |
xchild_name | The Name property provides the local name (without a prefix) of the element. |
xchild_namespace | This property contains the namespace of the element. |
xchild_prefix | This property contains the prefix of the element (if any). |
xchild_x_text | This property contains the inner text of the element. |
xelement | The name of the current element. |
xnamespace | The namespace of the current element. |
xparent | This property includes the parent of the current element. |
xpath | This property provides a way to point to a specific element in the document. |
xprefix | The prefix of the current element. |
xsub_tree | This property includes a snapshot of the current element in the document. |
xtext | The text of the current element. |
Method List
The following is the full list of the methods of the class with short descriptions.
close | Closes the Open XML package archive. |
config | Sets or retrieves a configuration setting. |
extract_part | Reads the contents of the currently selected part. |
find_part_by_type | Looks up a part in the current relationships file by its type namespace URI. |
get_property_value | Returns the value of the specified package property. |
list_parts | Lists all the parts contained in the document and their relationships. |
open | Opens the Open XML package archive. |
parse_part | Parses the specified part as XML. |
read_relationships | Reads the relationships file in the archive associated with the specified part. |
replace_part | Replaces the contents of the specified part in the package. |
reset | Resets the class. |
resolve_content_type | Returns the content type of the specified part. |
Event List
The following is the full list of the events fired by the class with short descriptions.
on_begin_file | Fired before each file is processed. |
on_characters | This event is fired for plaintext segments of the input stream. |
on_comment | This event is fired when a comment section is encountered. |
on_end_element | This event is fired when an end-element tag is encountered. |
on_end_file | Fired after each file is processed. |
on_end_prefix_mapping | This event is fired when leaving the scope of a namespace declaration. |
on_error | Fired when information is available about errors during data delivery. |
on_eval_entity | This event is fired every time an entity needs to be evaluated. |
on_ignorable_whitespace | This event is fired when a section of ignorable whitespace is encountered. |
on_meta | This event fires when a meta section is encountered. |
on_overwrite | Fired whenever a file exists and may be overwritten. |
on_pi | This event is fired when a processing instruction section is encountered. |
on_progress | Fired as progress is made. |
on_special_section | This event is fired when a special section is encountered. |
on_start_element | This event is fired when a begin-element tag is encountered in the document. |
on_start_prefix_mapping | This event is fired when entering the scope of a namespace declaration. |
Config Settings
The following is a list of config settings for the class with short descriptions.
NormalizePartName | Whether to normalize Part Names. |
RelationshipIsExternal[x] | Whether the relationship part is internal or external. |
BuildInfo | Information about the product's build. |
CodePage | The system code page used for Unicode to Multibyte translations. |
LicenseInfo | Information about the current license. |
MaskSensitiveData | Whether sensitive data is masked in log messages. |
ProcessIdleEvents | Whether the class uses its internal event loop to process events when the main thread is idle. |
SelectWaitMillis | The length of time in milliseconds the class will wait when DoEvents is called if there are no events to process. |
UseInternalSecurityAPI | Whether or not to use the system security libraries or an internal implementation. |
content_type_count Property
The number of records in the ContentType arrays.
Syntax
def get_content_type_count() -> int: ...
content_type_count = property(get_content_type_count, None)
Default Value
0
Remarks
This property controls the size of the following arrays:
- content_type_is_override
- content_type_media_type
- content_type_target
The array indices start at 0 and end at content_type_count - 1.
This property is read-only.
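The indexed ContentType getters can be walked with a plain loop. The following is a minimal sketch, assuming `doc` is an OfficeDoc instance that has already been opened; the helper name `list_content_types` is hypothetical.

```python
def list_content_types(doc):
    """Return one (kind, target, media_type) tuple per ContentType record."""
    rows = []
    for i in range(doc.content_type_count):  # valid indices: 0 .. count - 1
        kind = "override" if doc.get_content_type_is_override(i) else "default"
        rows.append(
            (kind, doc.get_content_type_target(i), doc.get_content_type_media_type(i))
        )
    return rows
```

For a default entry the target is a file extension; for an override it is a part name.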
content_type_is_override Property
Specifies if this is a default content type or an override.
Syntax
def get_content_type_is_override(content_type_index: int) -> bool: ...
Default Value
TRUE
Remarks
Specifies if this is a default content type or an override.
The content_type_index parameter specifies the index of the item in the array. The size of the array is controlled by the content_type_count property.
This property is read-only.
content_type_media_type Property
The media type for this entry, as defined by RFC2616.
Syntax
def get_content_type_media_type(content_type_index: int) -> str: ...
Default Value
""
Remarks
The media type for this entry, as defined by RFC2616.
The content_type_index parameter specifies the index of the item in the array. The size of the array is controlled by the content_type_count property.
This property is read-only.
content_type_target Property
If it's a default content type, this will be the file extension it applies to.
Syntax
def get_content_type_target(content_type_index: int) -> str: ...
Default Value
""
Remarks
If it's a default content type, this will be the file extension it applies to. Otherwise, it will be the part name.
The content_type_index parameter specifies the index of the item in the array. The size of the array is controlled by the content_type_count property.
This property is read-only.
namespace_count Property
The number of records in the Namespace arrays.
Syntax
def get_namespace_count() -> int: ...
namespace_count = property(get_namespace_count, None)
Default Value
0
Remarks
This property controls the size of the following arrays:
- namespace_prefix
- namespace_uri
The array indices start at 0 and end at namespace_count - 1.
This property is read-only.
namespace_prefix Property
This property contains the Prefix for the Namespace.
Syntax
def get_namespace_prefix(namespace_index: int) -> str: ...
Default Value
""
Remarks
This property contains the prefix for the namespace.
The namespace_index parameter specifies the index of the item in the array. The size of the array is controlled by the namespace_count property.
This property is read-only.
namespace_uri Property
This property contains the namespace URI associated with the corresponding Prefix.
Syntax
def get_namespace_uri(namespace_index: int) -> str: ...
Default Value
""
Remarks
This property contains the namespace URI associated with the corresponding namespace_prefix. This URI usually points to the XML schema for the namespace.
The namespace_index parameter specifies the index of the item in the array. The size of the array is controlled by the namespace_count property.
This property is read-only.
package_path Property
The path to the Open XML package file.
Syntax
def get_package_path() -> str: ...
def set_package_path(value: str) -> None: ...
package_path = property(get_package_path, set_package_path)
Default Value
""
Remarks
This property specifies the path and filename of the Open XML package to work on.
package_property_count Property
The number of records in the PackageProperty arrays.
Syntax
def get_package_property_count() -> int: ...
package_property_count = property(get_package_property_count, None)
Default Value
0
Remarks
This property controls the size of the following arrays:
- package_property_data_type
- package_property_name
- package_property_namespace
- package_property_prop_id
- package_property_prop_set
- package_property_value
The array indices start at 0 and end at package_property_count - 1.
This property is read-only.
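The PackageProperty arrays can be gathered into ordinary Python records. The following is a minimal sketch, assuming `doc` is an OfficeDoc instance that has already been opened; the helper name `package_properties` and the `is_custom` key are hypothetical. The `is_custom` test relies on the documented behavior that custom properties report an empty namespace.

```python
def package_properties(doc):
    """Return a list of dicts, one per PackageProperty record."""
    props = []
    for i in range(doc.package_property_count):
        namespace = doc.get_package_property_namespace(i)
        props.append({
            "name": doc.get_package_property_name(i),
            "namespace": namespace,
            "value": doc.get_package_property_value(i),
            "data_type": doc.get_package_property_data_type(i),
            # Custom properties are documented to have an empty namespace.
            "is_custom": namespace == "",
        })
    return props
```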
package_property_data_type Property
The data type associated with this property, if the information is available.
Syntax
def get_package_property_data_type(package_property_index: int) -> str: ...
Default Value
""
Remarks
The data type associated with this property, if the information is available.
The package_property_index parameter specifies the index of the item in the array. The size of the array is controlled by the package_property_count property.
This property is read-only.
package_property_name Property
The name of this property.
Syntax
def get_package_property_name(package_property_index: int) -> str: ...
Default Value
""
Remarks
The name of this property.
The package_property_index parameter specifies the index of the item in the array. The size of the array is controlled by the package_property_count property.
This property is read-only.
package_property_namespace Property
The XML Namespace URI associated with this property.
Syntax
def get_package_property_namespace(package_property_index: int) -> str: ...
Default Value
""
Remarks
The XML Namespace URI associated with this property. For custom properties, this will be an empty string.
The package_property_index parameter specifies the index of the item in the array. The size of the array is controlled by the package_property_count property.
This property is read-only.
package_property_prop_id Property
If this is a custom property, this will be the pid assigned to it.
Syntax
def get_package_property_prop_id(package_property_index: int) -> str: ...
Default Value
""
Remarks
If this is a custom property, this will be the pid assigned to it.
The package_property_index parameter specifies the index of the item in the array. The size of the array is controlled by the package_property_count property.
This property is read-only.
package_property_prop_set Property
If this is a custom property, this will be the GUID of the property set it belongs to.
Syntax
def get_package_property_prop_set(package_property_index: int) -> str: ...
Default Value
""
Remarks
If this is a custom property, this will be the GUID of the property set it belongs to.
The package_property_index parameter specifies the index of the item in the array. The size of the array is controlled by the package_property_count property.
This property is read-only.
package_property_value Property
The value of this property.
Syntax
def get_package_property_value(package_property_index: int) -> str: ...
Default Value
""
Remarks
The value of this property.
The package_property_index parameter specifies the index of the item in the array. The size of the array is controlled by the package_property_count property.
This property is read-only.
part_data Property
The contents of the currently selected part.
Syntax
def get_part_data() -> bytes: ...
def set_part_data(value: bytes) -> None: ...
part_data = property(get_part_data, set_part_data)
Default Value
""
Remarks
This property will hold the contents of the part selected by part_name after calling the extract_part method. It can also be set before calling replace_part.
part_name Property
The name of the currently selected part.
Syntax
def get_part_name() -> str: ...
def set_part_name(value: str) -> None: ...
part_name = property(get_part_name, set_part_name)
Default Value
""
Remarks
This property specifies the name of the currently selected part in the document. If null or empty, no part is currently selected.
relationship_count Property
The number of records in the Relationship arrays.
Syntax
def get_relationship_count() -> int: ...
relationship_count = property(get_relationship_count, None)
Default Value
0
Remarks
This property controls the size of the following arrays:
- relationship_content_type
- relationship_id
- relationship_part_name
- relationship_type_uri
The array indices start at 0 and end at relationship_count - 1.
This property is read-only.
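The Relationship arrays follow the same indexed pattern. The following is a minimal sketch, assuming `doc` is an OfficeDoc instance whose relationships have already been populated (for example by open, read_relationships, or list_parts); the helper name `list_relationships` is hypothetical.

```python
def list_relationships(doc):
    """Return a list of dicts, one per Relationship record."""
    return [
        {
            "id": doc.get_relationship_id(i),
            "type_uri": doc.get_relationship_type_uri(i),
            "part_name": doc.get_relationship_part_name(i),
            "content_type": doc.get_relationship_content_type(i),
        }
        for i in range(doc.relationship_count)
    ]
```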
relationship_content_type Property
The content type for the part referenced by this relationship, resolved from [Content_Types].xml.
Syntax
def get_relationship_content_type(relationship_index: int) -> str: ...
Default Value
""
Remarks
The content type for the part referenced by this relationship, resolved from [Content_Types].xml according to the Open XML packaging specification rules.
The relationship_index parameter specifies the index of the item in the array. The size of the array is controlled by the relationship_count property.
This property is read-only.
relationship_id Property
The unique ID of this relationship within this .rels file.
Syntax
def get_relationship_id(relationship_index: int) -> str: ...
Default Value
""
Remarks
The unique ID of this relationship within this .rels file.
The relationship_index parameter specifies the index of the item in the array. The size of the array is controlled by the relationship_count property.
This property is read-only.
relationship_part_name Property
The name of the part this relationship entry applies to.
Syntax
def get_relationship_part_name(relationship_index: int) -> str: ...
Default Value
""
Remarks
The name of the part this relationship entry applies to.
The relationship_index parameter specifies the index of the item in the array. The size of the array is controlled by the relationship_count property.
This property is read-only.
relationship_type_uri Property
The XML namespace URI that specifies the meaning of this relationship.
Syntax
def get_relationship_type_uri(relationship_index: int) -> str: ...
Default Value
""
Remarks
The XML namespace URI that specifies the meaning of this relationship.
The relationship_index parameter specifies the index of the item in the array. The size of the array is controlled by the relationship_count property.
This property is read-only.
validate Property
This property controls whether documents are validated during parsing.
Syntax
def get_validate() -> bool: ...
def set_validate(value: bool) -> None: ...
validate = property(get_validate, set_validate)
Default Value
TRUE
Remarks
When True (default), the document will be validated during parsing. To disable validation set validate to False. Disabling validation may be useful in cases in which data can still be parsed even if the document is not well formed.
attr_count Property
The number of records in the Attr arrays.
Syntax
def get_attr_count() -> int: ...
attr_count = property(get_attr_count, None)
Default Value
0
Remarks
This property controls the size of the following arrays:
- attr_name
- attr_namespace
- attr_prefix
- attr_value
The array indices start at 0 and end at attr_count - 1.
This property is read-only.
attr_name Property
This property provides the local name (without prefix) of the attribute.
Syntax
def get_attr_name(attr_index: int) -> str: ...
Default Value
""
Remarks
attr_name provides the local name (without prefix) of the attribute.
The attr_index parameter specifies the index of the item in the array. The size of the array is controlled by the attr_count property.
This property is read-only.
attr_namespace Property
This property contains the attribute namespace.
Syntax
def get_attr_namespace(attr_index: int) -> str: ...
Default Value
""
Remarks
This property contains the attribute namespace.
The attr_index parameter specifies the index of the item in the array. The size of the array is controlled by the attr_count property.
This property is read-only.
attr_prefix Property
This property contains the attribute prefix (if any).
Syntax
def get_attr_prefix(attr_index: int) -> str: ...
Default Value
""
Remarks
This property contains the attribute prefix (if any). If the attribute does not have a prefix, this property is empty.
The attr_index parameter specifies the index of the item in the array. The size of the array is controlled by the attr_count property.
This property is read-only.
attr_value Property
This property contains the attribute value.
Syntax
def get_attr_value(attr_index: int) -> str: ...
Default Value
""
Remarks
This property contains the attribute value.
The attr_index parameter specifies the index of the item in the array. The size of the array is controlled by the attr_count property.
This property is read-only.
xchild_count Property
The number of records in the XChild arrays.
Syntax
def get_xchild_count() -> int: ...
xchild_count = property(get_xchild_count, None)
Default Value
0
Remarks
This property controls the size of the following arrays:
- xchild_name
- xchild_namespace
- xchild_prefix
- xchild_x_text
The array indices start at 0 and end at xchild_count - 1.
This property is read-only.
xchild_name Property
The Name property provides the local name (without a prefix) of the element.
Syntax
def get_xchild_name(xchild_index: int) -> str: ...
Default Value
""
Remarks
The xchild_name property provides the local name (without a prefix) of the element.
The xchild_index parameter specifies the index of the item in the array. The size of the array is controlled by the xchild_count property.
This property is read-only.
xchild_namespace Property
This property contains the namespace of the element.
Syntax
def get_xchild_namespace(xchild_index: int) -> str: ...
Default Value
""
Remarks
This property contains the namespace of the element.
The xchild_index parameter specifies the index of the item in the array. The size of the array is controlled by the xchild_count property.
This property is read-only.
xchild_prefix Property
This property contains the prefix of the element (if any).
Syntax
def get_xchild_prefix(xchild_index: int) -> str: ...
Default Value
""
Remarks
This property contains the prefix of the element (if any). If the element does not have a prefix, this property is empty.
The xchild_index parameter specifies the index of the item in the array. The size of the array is controlled by the xchild_count property.
This property is read-only.
xchild_x_text Property
This property contains the inner text of the element.
Syntax
def get_xchild_x_text(xchild_index: int) -> str: ...
Default Value
""
Remarks
This property contains the inner text of the element.
The xchild_index parameter specifies the index of the item in the array. The size of the array is controlled by the xchild_count property.
This property is read-only.
xelement Property
The name of the current element.
Syntax
def get_xelement() -> str: ...
xelement = property(get_xelement, None)
Default Value
""
Remarks
The current element is specified via the xpath property.
This property is read-only.
xnamespace Property
The namespace of the current element.
Syntax
def get_xnamespace() -> str: ...
xnamespace = property(get_xnamespace, None)
Default Value
""
Remarks
The current element is specified via the xpath property.
This property is read-only.
xparent Property
This property includes the parent of the current element.
Syntax
def get_xparent() -> str: ...
xparent = property(get_xparent, None)
Default Value
""
Remarks
The current element is specified through the xpath property.
This property is read-only.
xpath Property
This property provides a way to point to a specific element in the document.
Syntax
def get_xpath() -> str: ...
def set_xpath(value: str) -> None: ...
xpath = property(get_xpath, set_xpath)
Default Value
""
Remarks
xpath implements a subset of the XML XPath specification, allowing you to point to specific elements in the XML documents.
The path is a series of one or more element accessors separated by '/'. The path can be absolute (starting with '/') or relative to the current xpath location.
The following are possible values for an element accessor:
name | A particular element name |
name[i] | The i-th subelement of the current element with the given name |
[i] | The i-th subelement of the current element |
[last()] | The last subelement of the current element |
[last()-i] | The subelement located at the last location minus i in the current element |
name[@attrname="attrvalue"] | The subelement containing a particular value for a given attribute (both single and double quotes are supported) |
.. | The parent of the current element |
build_dom must be set to True before parsing the document for the xpath functionality to be available.
Example. Setting xpath:
Document root | xpath = "/" |
Specific Element | xpath = "/root/SubElement1/SubElement2/" |
i-th Child | xpath = "/root/SubElement1[i]" |
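Once xpath points at an element, the XChild arrays describe its children. The following is a minimal sketch, assuming `doc` is an OfficeDoc instance on which a part has already been parsed with the DOM available; the helper name `child_elements` is hypothetical.

```python
def child_elements(doc, path):
    """Point xpath at `path` and return (name, text) for each child element."""
    doc.xpath = path
    return [
        (doc.get_xchild_name(i), doc.get_xchild_x_text(i))
        for i in range(doc.xchild_count)
    ]
```

Calling `child_elements(doc, "/")` would list the children of the document root under these assumptions.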
xprefix Property
The prefix of the current element.
Syntax
def get_xprefix() -> str: ...
xprefix = property(get_xprefix, None)
Default Value
""
Remarks
The current element is specified via the xpath property.
This property is read-only.
xsub_tree Property
This property includes a snapshot of the current element in the document.
Syntax
def get_xsub_tree() -> str: ...
xsub_tree = property(get_xsub_tree, None)
Default Value
""
Remarks
The current element is specified through the xpath property. For this property to work, CacheContent must be set to True.
This property is read-only.
xtext Property
The text of the current element.
Syntax
def get_xtext() -> str: ...
xtext = property(get_xtext, None)
Default Value
""
Remarks
The current element is specified via the xpath property.
This property is read-only.
close Method
Closes the Open XML package archive.
Syntax
def close() -> None: ...
Remarks
When this method is called, the class will close the current archive and release all resources.
config Method
Sets or retrieves a configuration setting.
Syntax
def config(configuration_string: str) -> str: ...
Remarks
config is a generic method available in every class. It is used to set and retrieve configuration settings for the class.
These settings are similar in functionality to properties, but they are rarely used. In order to avoid "polluting" the property namespace of the class, access to these internal properties is provided through the config method.
To set a configuration setting named PROPERTY, you must call config("PROPERTY=VALUE"), where VALUE is the value of the setting expressed as a string. For boolean values, use the strings "True", "False", "0", "1", "Yes", or "No" (case does not matter).
To read (query) the value of a configuration setting, you must call config("PROPERTY"). The value will be returned as a string.
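The set and query forms described above can be wrapped in two small helpers. This is a minimal sketch, assuming `doc` is an OfficeDoc instance; the helper names `set_config` and `get_config` are hypothetical, and the setting name in the usage note is taken from the Config Settings list.

```python
def set_config(doc, name, value):
    """Set a configuration setting using the "NAME=VALUE" string form."""
    if isinstance(value, bool):
        # Booleans are passed as one of the accepted strings.
        value = "True" if value else "False"
    return doc.config(f"{name}={value}")


def get_config(doc, name):
    """Query a configuration setting; the value comes back as a string."""
    return doc.config(name)
```

For example, `set_config(doc, "NormalizePartName", True)` sends the string "NormalizePartName=True".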
extract_part Method
Reads the contents of the currently selected part.
Syntax
def extract_part() -> None: ...
Remarks
If the part specified by the part_name property exists, the corresponding physical file will be extracted from the archive and will be available through the part_data property.
If the part doesn't exist, or is stored in interleaved format, an error will be raised.
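Reading a part is a three-step sequence: select it, extract it, read the bytes. The following is a minimal sketch, assuming `doc` is an OfficeDoc instance that has already been opened; the helper name `read_part` is hypothetical.

```python
def read_part(doc, part_name):
    """Select `part_name`, extract it, and return its contents as bytes."""
    doc.part_name = part_name
    doc.extract_part()  # documented to raise if the part is missing or interleaved
    return doc.part_data
```

A caller might use `read_part(doc, "/word/document.xml")`, where the part name is only an illustration of what a word-processing package usually contains.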
find_part_by_type Method
Looks up a part in the current relationships file by its type namespace URI.
Syntax
def find_part_by_type(type_uri: str) -> str: ...
Remarks
If a matching part can be found, its part name is returned. Otherwise, an empty string is returned.
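To illustrate what a lookup by relationship type involves, the following standard-library sketch scans the XML of a .rels file directly. It is not the class's implementation: the function name `find_target_by_type` is hypothetical, and it returns the raw Target attribute rather than a resolved part name.

```python
import xml.etree.ElementTree as ET

# Namespace of the Relationships element in Open Packaging Conventions .rels files.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def find_target_by_type(rels_xml, type_uri):
    """Return the Target of the first Relationship whose Type matches, else ""."""
    root = ET.fromstring(rels_xml)
    for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
        if rel.get("Type") == type_uri:
            return rel.get("Target", "")
    return ""
```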
get_property_value Method
Returns the value of the specified package property.
Syntax
def get_property_value(prop_name: str, prop_namespace: str) -> str: ...
Remarks
Looks up a package property named prop_name in namespace prop_namespace in the core and app properties tables and returns its value, if found.
If the property doesn't exist, an empty string is returned.
For custom properties, use an empty string ("") as the value of the prop_namespace parameter.
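A lookup for a core property and a custom property might look like the following. This is a minimal sketch, assuming `doc` is an OfficeDoc instance that has already been opened, and assuming the title is exposed under the local name "title" in the Dublin Core elements namespace, as it appears in a package's core properties part. The helper names are hypothetical.

```python
# Dublin Core elements namespace, used by core properties such as dc:title.
DC_NS = "http://purl.org/dc/elements/1.1/"


def document_title(doc):
    """Return the document title, or "" if the property is not present."""
    return doc.get_property_value("title", DC_NS)


def custom_property(doc, name):
    """Return a custom property; these are looked up with an empty namespace."""
    return doc.get_property_value(name, "")
```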
list_parts Method
Lists all the parts contained in the document and their relationships.
Syntax
def list_parts() -> None: ...
Remarks
When this method is called, the class will read all the relationships in the document, recursively, and populate the relationships collection.
open Method
Opens the Open XML package archive.
Syntax
def open() -> None: ...
Remarks
When this method is called, the class will attempt to open the archive specified in package_path, extract the package properties and content types, and parse the master relationships file in the archive.
parse_part Method
Parses the specified part as XML.
Syntax
def parse_part() -> None: ...
Remarks
If the part specified by part_name exists, the corresponding physical file will be extracted from the archive and parsed as XML. If build_dom is enabled, the DOM will be built internally and you can use XPath to query the resulting document, using the xpath property. If build_dom is disabled, only the XML parser-related events will be fired.
read_relationships Method
Reads the relationships file in the archive associated with the specified part.
Syntax
def read_relationships() -> None: ...
Remarks
When this method is called, the class will look for a .rels file associated with the part specified by the part_name property. If found, the relationships collection will now expose the contents of the relationships for that part.
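The .rels file for a part lives in a `_rels` folder next to the part, following the Open Packaging Conventions naming rule. The following standard-library sketch derives that name; it illustrates the convention only and is not the class's implementation. The function name `rels_part_name` is hypothetical.

```python
import posixpath


def rels_part_name(part_name):
    """Return the name of the relationships part for `part_name`.

    An empty name or "/" stands for the package itself, whose
    relationships are stored in "/_rels/.rels".
    """
    if part_name in ("", "/"):
        return "/_rels/.rels"
    folder, base = posixpath.split(part_name)
    return posixpath.join(folder, "_rels", base + ".rels")
```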
replace_part Method
Replaces the contents of the specified part in the package.
Syntax
def replace_part() -> None: ...
Remarks
If the part specified by the part_name property exists, the corresponding physical file will be replaced with the contents of the part_data property. The package file will be modified in place right away.
If the part doesn't exist, is stored in interleaved format, or part_data is null or empty, an error will be raised.
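A read-modify-write cycle combines extract_part and replace_part. The following is a minimal sketch, assuming `doc` is an OfficeDoc instance that has already been opened; the helper name `replace_text_in_part` is hypothetical. It performs a raw byte substitution, so it is only safe when the search text cannot match markup.

```python
def replace_text_in_part(doc, part_name, old, new, encoding="utf-8"):
    """Replace `old` with `new` inside a part; return True if it changed."""
    doc.part_name = part_name
    doc.extract_part()
    original = doc.part_data
    updated = original.replace(old.encode(encoding), new.encode(encoding))
    if updated == original:
        return False  # nothing to change; leave the package untouched
    doc.part_data = updated
    doc.replace_part()  # documented to modify the package file in place
    return True
```

Because the package file is changed immediately, working on a copy of the document is a reasonable precaution.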
reset Method
Resets the class.
Syntax
def reset() -> None: ...
Remarks
reset resets the state of the class. All properties will be set to their default values, and any files open will be closed.
resolve_content_type Method
Returns the content type of the specified part.
Syntax
def resolve_content_type() -> str: ...
Remarks
Applies the content type resolution rules specified in the Open XML packaging specification and returns the content type associated with part_name in the archive.
If there's no content type mapped for the part or for the extension, an empty string is returned.
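The resolution order in the packaging specification is: an Override entry for the exact part name wins, otherwise the Default entry for the part's extension applies, otherwise there is no content type. The following standalone sketch illustrates that rule over plain dictionaries. It is not the class's implementation, and the function name `resolve_media_type` and its parameters are hypothetical.

```python
def resolve_media_type(part_name, overrides, defaults):
    """Resolve a content type from Override and Default tables.

    overrides maps part names to media types; defaults maps file
    extensions (without the dot) to media types. Both comparisons
    are case-insensitive. Returns "" when nothing matches.
    """
    wanted = part_name.lower()
    for name, media_type in overrides.items():
        if name.lower() == wanted:
            return media_type
    # Fall back to the Default entry for the extension of the last segment.
    last_segment = part_name.rsplit("/", 1)[-1]
    if "." in last_segment:
        ext = last_segment.rsplit(".", 1)[1].lower()
        for default_ext, media_type in defaults.items():
            if default_ext.lower() == ext:
                return media_type
    return ""
```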
on_begin_file Event
Fired before each file is processed.
Syntax
class OfficeDocBeginFileEventParams(object):
  @property
  def index() -> int: ...
  @property
  def skip() -> bool: ...
  @skip.setter
  def skip(value) -> None: ...
# In class OfficeDoc:
@property
def on_begin_file() -> Callable[[OfficeDocBeginFileEventParams], None]: ...
@on_begin_file.setter
def on_begin_file(event_hook: Callable[[OfficeDocBeginFileEventParams], None]) -> None: ...
Remarks
on_begin_file is fired before each file is processed by the compressor or decompressor, as appropriate. Index contains the array index of the file about to be processed, and the file_compressed_name, file_decompressed_name, file_compressed_size (decompression only), and file_decompressed_size fields of the files collection for this index contain more detailed information about the file about to be processed.
When extracting, an alternate location may be specified by trapping the event, and modifying file_decompressed_name and/or extract_to_path. If file_decompressed_name is set to an empty string, the file will not be written to disk. If WriteToProgressEvent is true, the file will still be decompressed, and the data may be extracted through the on_progress event.
This event may also be trapped while compressing. file_compressed_name and file_decompressed_name may be changed.
You may set the Skip parameter to true in order to skip the file completely while compressing or extracting.
on_characters Event
This event is fired for plaintext segments of the input stream.
Syntax
class OfficeDocCharactersEventParams(object):
  @property
  def text() -> str: ...
# In class OfficeDoc:
@property
def on_characters() -> Callable[[OfficeDocCharactersEventParams], None]: ...
@on_characters.setter
def on_characters(event_hook: Callable[[OfficeDocCharactersEventParams], None]) -> None: ...
Remarks
The on_characters event provides the plaintext content of the XML document (i.e., the text inside the tags). The text is provided through the Text parameter.
The text includes white space as well as end-of-line characters, except for ignorable whitespace, which is fired through the on_ignorable_whitespace event.
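The text of a part can be accumulated by hooking on_characters before parsing. The following is a minimal sketch, assuming `doc` is an OfficeDoc instance that has already been opened and that the event parameter exposes the `text` attribute shown in the syntax above; the helper name `collect_text` is hypothetical.

```python
def collect_text(doc, part_name):
    """Parse `part_name` and return all character data joined together."""
    chunks = []
    # Each event delivers one plaintext segment through its text attribute.
    doc.on_characters = lambda e: chunks.append(e.text)
    doc.part_name = part_name
    doc.parse_part()
    return "".join(chunks)
```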
on_comment Event
This event is fired when a comment section is encountered.
Syntax
class OfficeDocCommentEventParams(object):
  @property
  def text() -> str: ...
# In class OfficeDoc:
@property
def on_comment() -> Callable[[OfficeDocCommentEventParams], None]: ...
@on_comment.setter
def on_comment(event_hook: Callable[[OfficeDocCommentEventParams], None]) -> None: ...
Remarks
The on_comment event is fired whenever a comment section (<!-- ..text... -->) is found in the document.
The full text of the comment is provided by the Text parameter.
on_end_element Event
This event is fired when an end-element tag is encountered.
Syntax
class OfficeDocEndElementEventParams(object):
  @property
  def namespace() -> str: ...
  @property
  def element() -> str: ...
  @property
  def q_name() -> str: ...
  @property
  def is_empty() -> bool: ...
# In class OfficeDoc:
@property
def on_end_element() -> Callable[[OfficeDocEndElementEventParams], None]: ...
@on_end_element.setter
def on_end_element(event_hook: Callable[[OfficeDocEndElementEventParams], None]) -> None: ...
Remarks
The on_end_element event is fired when an end-element tag is found in the document.
The element name is provided by the Element parameter.
The IsEmpty parameter is true when the event corresponds to an empty element declaration.
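Because every element produces exactly one end-element event, the event is a convenient place to tally element names. The following is a minimal sketch, assuming `doc` is an OfficeDoc instance that has already been opened and that the event parameter exposes the `element` attribute shown in the syntax above; the helper name `count_elements` is hypothetical.

```python
from collections import Counter


def count_elements(doc, part_name):
    """Parse `part_name` and count how often each element name closes."""
    counts = Counter()

    def handle_end_element(e):
        counts[e.element] += 1

    doc.on_end_element = handle_end_element
    doc.part_name = part_name
    doc.parse_part()
    return counts
```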
on_end_file Event
Fired after each file is processed.
Syntax
class OfficeDocEndFileEventParams(object):
  @property
  def index() -> int: ...
# In class OfficeDoc:
@property
def on_end_file() -> Callable[[OfficeDocEndFileEventParams], None]: ...
@on_end_file.setter
def on_end_file(event_hook: Callable[[OfficeDocEndFileEventParams], None]) -> None: ...
Remarks
on_end_file is fired after each file is processed by the compressor or decompressor, as appropriate. Index contains the array index of the file processed, and the file_compressed_name, file_decompressed_name, file_compressed_size, and file_decompressed_size fields in the files collection for this index contain more detailed information about the file processed.
on_end_prefix_mapping Event
This event is fired when leaving the scope of a namespace declaration.
Syntax
class OfficeDocEndPrefixMappingEventParams(object):
  @property
  def prefix() -> str: ...

# In class OfficeDoc:
@property
def on_end_prefix_mapping() -> Callable[[OfficeDocEndPrefixMappingEventParams], None]: ...
@on_end_prefix_mapping.setter
def on_end_prefix_mapping(event_hook: Callable[[OfficeDocEndPrefixMappingEventParams], None]) -> None: ...
Remarks
The on_end_prefix_mapping event is fired when leaving the scope of a namespace declaration. The Prefix parameter identifies the prefix whose mapping is ending.
on_error Event
Fired when information is available about errors during data delivery.
Syntax
class OfficeDocErrorEventParams(object):
  @property
  def error_code() -> int: ...
  @property
  def description() -> str: ...

# In class OfficeDoc:
@property
def on_error() -> Callable[[OfficeDocErrorEventParams], None]: ...
@on_error.setter
def on_error(event_hook: Callable[[OfficeDocErrorEventParams], None]) -> None: ...
Remarks
The on_error event is fired in case of exceptional conditions during message processing. Normally the class fails with an error.
The ErrorCode parameter contains an error code, and the Description parameter contains a textual description of the error. For a list of valid error codes and their descriptions, please refer to the Error Codes section.
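As a minimal sketch, a handler for this event might simply record what it is told. The params object is simulated here with types.SimpleNamespace so the snippet runs without the library; with a real OfficeDoc instance the handler would be assigned through the on_error setter shown in the Syntax above. The names handle_error and errors_seen, and the description text, are illustrative only.

```python
from types import SimpleNamespace

errors_seen = []  # illustrative container, not part of the class

def handle_error(e):
    """Record the code and description reported by on_error."""
    errors_seen.append((e.error_code, e.description))

# Assumed usage with the real class (not executed here):
#   doc.on_error = handle_error

# Stand-in for OfficeDocErrorEventParams, for demonstration only.
handle_error(SimpleNamespace(error_code=268, description="part not found"))
```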
on_eval_entity Event
This event is fired every time an entity needs to be evaluated.
Syntax
class OfficeDocEvalEntityEventParams(object):
  @property
  def entity() -> str: ...
  @property
  def value() -> str: ...
  @value.setter
  def value(value) -> None: ...

# In class OfficeDoc:
@property
def on_eval_entity() -> Callable[[OfficeDocEvalEntityEventParams], None]: ...
@on_eval_entity.setter
def on_eval_entity(event_hook: Callable[[OfficeDocEvalEntityEventParams], None]) -> None: ...
Remarks
The Value parameter contains a suggested value for the entity (normally the entity name itself). You may set Value to a value of your choice, which will be later passed into the text stream.
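Because Value is writable, a handler can substitute its own text for selected entities. The following is a sketch under stated assumptions: the lookup table and entity names are hypothetical, and the params object is a types.SimpleNamespace stand-in rather than a real OfficeDocEvalEntityEventParams instance.

```python
from types import SimpleNamespace

# Hypothetical replacement table; these entity names are examples only.
ENTITY_VALUES = {"company": "Example Corp", "year": "2007"}

def handle_eval_entity(e):
    """Replace known entities; keep the suggested value otherwise."""
    if e.entity in ENTITY_VALUES:
        e.value = ENTITY_VALUES[e.entity]

# Assumed usage with the real class (not executed here):
#   doc.on_eval_entity = handle_eval_entity

# Stand-ins for the event params; value starts as the entity name.
known = SimpleNamespace(entity="company", value="company")
unknown = SimpleNamespace(entity="nbsp", value="nbsp")
handle_eval_entity(known)
handle_eval_entity(unknown)
```

Entities that are not in the table keep the value suggested by the class.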
on_ignorable_whitespace Event
This event is fired when a section of ignorable whitespace is encountered.
Syntax
class OfficeDocIgnorableWhitespaceEventParams(object):
  @property
  def text() -> str: ...

# In class OfficeDoc:
@property
def on_ignorable_whitespace() -> Callable[[OfficeDocIgnorableWhitespaceEventParams], None]: ...
@on_ignorable_whitespace.setter
def on_ignorable_whitespace(event_hook: Callable[[OfficeDocIgnorableWhitespaceEventParams], None]) -> None: ...
Remarks
The ignorable whitespace section is provided by the Text parameter.
on_meta Event
This event fires when a meta section is encountered.
Syntax
class OfficeDocMetaEventParams(object):
  @property
  def text() -> str: ...

# In class OfficeDoc:
@property
def on_meta() -> Callable[[OfficeDocMetaEventParams], None]: ...
@on_meta.setter
def on_meta(event_hook: Callable[[OfficeDocMetaEventParams], None]) -> None: ...
Remarks
The on_meta event is fired whenever a meta information section (<! ..text... >) is found in the document.
The full text of the meta section is provided by the Text parameter.
on_overwrite Event
Fired whenever a file exists and may be overwritten.
Syntax
class OfficeDocOverwriteEventParams(object):
  @property
  def filename() -> str: ...
  @filename.setter
  def filename(value) -> None: ...
  @property
  def overwrite() -> bool: ...
  @overwrite.setter
  def overwrite(value) -> None: ...

# In class OfficeDoc:
@property
def on_overwrite() -> Callable[[OfficeDocOverwriteEventParams], None]: ...
@on_overwrite.setter
def on_overwrite(event_hook: Callable[[OfficeDocOverwriteEventParams], None]) -> None: ...
Remarks
on_overwrite is fired when writing a file would overwrite an existing file. The event is fired during decompression.
Filename contains the full name of the file, specified with its pathname.
Overwrite specifies whether or not the file will be overwritten. For Zip, Jar, and Tar, this is equal by default to the value of the overwrite_files property. For Gzip, this value defaults to true.
Either of the parameters may be changed when the event is fired. Changing the value of Overwrite will override the default behavior of the class, and cause the file to be overwritten or not overwritten, depending on the value set. If Filename is changed, the value of Overwrite will be ignored, and the file will be written with the specified name. If a file with the new name also exists, that file will be silently overwritten.
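Since a changed Filename is written without a further check, a handler that renames should choose a name that is not already taken. The sketch below shows one way to do that; unused_name and handle_overwrite are illustrative helpers, not part of the class, and the event params object is simulated with types.SimpleNamespace in a temporary directory.

```python
import os
import tempfile
from types import SimpleNamespace

def unused_name(path):
    """Return path, or path with a numeric suffix that does not exist yet."""
    root, ext = os.path.splitext(path)
    candidate, n = path, 1
    while os.path.exists(candidate):
        candidate = f"{root} ({n}){ext}"
        n += 1
    return candidate

def handle_overwrite(e):
    # Changing filename makes the class ignore overwrite, so pick a free
    # name; an existing file under the new name would be replaced.
    e.filename = unused_name(e.filename)

# Assumed usage with the real class (not executed here):
#   doc.on_overwrite = handle_overwrite

with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "image1.jpeg")
    open(existing, "wb").close()  # simulate a file that already exists
    params = SimpleNamespace(filename=existing, overwrite=False)
    handle_overwrite(params)
    redirected = os.path.basename(params.filename)
```

A handler that only wants to keep the existing file could instead leave Filename alone and set Overwrite to False.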
on_pi Event
This event is fired when a processing instruction section is encountered.
Syntax
class OfficeDocPIEventParams(object):
  @property
  def text() -> str: ...

# In class OfficeDoc:
@property
def on_pi() -> Callable[[OfficeDocPIEventParams], None]: ...
@on_pi.setter
def on_pi(event_hook: Callable[[OfficeDocPIEventParams], None]) -> None: ...
Remarks
The on_pi event is fired whenever a processing instruction section (<? ..text... ?>) is found in the document.
The full text of the processing instruction is provided by the Text parameter.
on_progress Event
Fired as progress is made.
Syntax
class OfficeDocProgressEventParams(object):
  @property
  def data() -> bytes: ...
  @property
  def filename() -> str: ...
  @property
  def bytes_processed() -> int: ...
  @property
  def percent_processed() -> int: ...

# In class OfficeDoc:
@property
def on_progress() -> Callable[[OfficeDocProgressEventParams], None]: ...
@on_progress.setter
def on_progress(event_hook: Callable[[OfficeDocProgressEventParams], None]) -> None: ...
Remarks
The on_progress event is automatically fired as compression or decompression is performed. When WriteToProgressEvent is true, the output data is provided through the Data parameter, allowing for it to be streamed out.
Filename contains the name of the file being written. If no file is being written, Filename will contain an empty string, and the output data will be provided exclusively through this event.
BytesProcessed contains the total number of uncompressed bytes processed. PercentProcessed contains the percent of uncompressed bytes processed, corresponding roughly to the running time of the operation.
For Gzip extraction only, BytesProcessed and PercentProcessed will reflect the number of compressed bytes extracted, as it is generally impossible to predetermine the total uncompressed size.
If WriteToProgressEvent is false, Data will contain null.
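A handler that streams output through this event has to cope with both cases: Data carrying a chunk, and Data being empty. The following sketch accumulates chunks in memory and remembers the last reported percentage. It assumes that a null Data surfaces as None in Python, and it simulates the event params with types.SimpleNamespace; buffer, progress, and handle_progress are illustrative names.

```python
import io
from types import SimpleNamespace

buffer = io.BytesIO()        # collects streamed output, illustrative only
progress = {"percent": 0}    # last percentage reported by the event

def handle_progress(e):
    """Append any streamed data and remember the progress percentage."""
    if e.data is not None:   # assumed to be None when nothing is streamed
        buffer.write(e.data)
    progress["percent"] = e.percent_processed

# Assumed usage with the real class (not executed here):
#   doc.on_progress = handle_progress

# Stand-ins for OfficeDocProgressEventParams, for demonstration only.
for chunk, done, pct in ((b"<w:document", 11, 50), (b"/>", 13, 100)):
    handle_progress(SimpleNamespace(
        data=chunk, filename="", bytes_processed=done, percent_processed=pct))
```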
on_special_section Event
This event is fired when a special section is encountered.
Syntax
class OfficeDocSpecialSectionEventParams(object):
  @property
  def section_id() -> str: ...
  @property
  def text() -> str: ...

# In class OfficeDoc:
@property
def on_special_section() -> Callable[[OfficeDocSpecialSectionEventParams], None]: ...
@on_special_section.setter
def on_special_section(event_hook: Callable[[OfficeDocSpecialSectionEventParams], None]) -> None: ...
Remarks
The on_special_section event is fired whenever a special section (such as <![CDATA[ ..text... ]]>) is found in the document.
The full text of the special section is provided by the Text parameter, and the SectionId parameter provides the section identifier (e.g., CDATA).
on_start_element Event
This event is fired when a begin-element tag is encountered in the document.
Syntax
class OfficeDocStartElementEventParams(object):
  @property
  def namespace() -> str: ...
  @property
  def element() -> str: ...
  @property
  def q_name() -> str: ...
  @property
  def is_empty() -> bool: ...

# In class OfficeDoc:
@property
def on_start_element() -> Callable[[OfficeDocStartElementEventParams], None]: ...
@on_start_element.setter
def on_start_element(event_hook: Callable[[OfficeDocStartElementEventParams], None]) -> None: ...
Remarks
The on_start_element event is fired when a begin-element tag is found in the document.
The element name is provided through the Element parameter. The attribute names and values (if any) are provided through the attr_name, attr_namespace, attr_prefix, and attr_value properties.
The IsEmpty parameter is True when the event corresponds to an empty element declaration.
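As an illustration of reading these parameters, the sketch below counts elements by local name and notes which ones were declared empty. The params objects are types.SimpleNamespace stand-ins, the namespace URI and element names are made up, and element_counts, empty_elements, and handle_start_element are illustrative names rather than part of the class.

```python
from collections import Counter
from types import SimpleNamespace

element_counts = Counter()   # local element name -> occurrences
empty_elements = []          # qualified names of empty declarations

def handle_start_element(e):
    """Tally each element and remember empty-element declarations."""
    element_counts[e.element] += 1
    if e.is_empty:
        empty_elements.append(e.q_name)

# Assumed usage with the real class (not executed here):
#   doc.on_start_element = handle_start_element

# Stand-ins for OfficeDocStartElementEventParams, for demonstration only.
for params in (
    SimpleNamespace(namespace="urn:example", element="p", q_name="w:p", is_empty=False),
    SimpleNamespace(namespace="urn:example", element="br", q_name="w:br", is_empty=True),
    SimpleNamespace(namespace="urn:example", element="p", q_name="w:p", is_empty=False),
):
    handle_start_element(params)
```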
on_start_prefix_mapping Event
This event is fired when entering the scope of a namespace declaration.
Syntax
class OfficeDocStartPrefixMappingEventParams(object):
  @property
  def prefix() -> str: ...
  @property
  def uri() -> str: ...

# In class OfficeDoc:
@property
def on_start_prefix_mapping() -> Callable[[OfficeDocStartPrefixMappingEventParams], None]: ...
@on_start_prefix_mapping.setter
def on_start_prefix_mapping(event_hook: Callable[[OfficeDocStartPrefixMappingEventParams], None]) -> None: ...
Remarks
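The on_start_prefix_mapping event is fired when entering the scope of a namespace declaration. The Prefix parameter contains the declared prefix, and the Uri parameter contains the namespace URI it maps to.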
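The two prefix-mapping events can be paired to track which namespace URI a prefix currently refers to. The sketch below keeps one stack of URIs per prefix so that a nested redeclaration restores the outer mapping when it ends. It assumes the events arrive properly nested; the URIs are invented, the params objects are types.SimpleNamespace stand-ins, and all helper names are illustrative.

```python
from collections import defaultdict
from types import SimpleNamespace

scopes = defaultdict(list)   # prefix -> stack of URIs currently in scope

def handle_start_prefix_mapping(e):
    """Push the URI declared for this prefix."""
    scopes[e.prefix].append(e.uri)

def handle_end_prefix_mapping(e):
    """Pop the innermost URI for this prefix, if any."""
    if scopes[e.prefix]:
        scopes[e.prefix].pop()

def current_uri(prefix):
    """Return the URI the prefix maps to right now, or None."""
    stack = scopes.get(prefix)
    return stack[-1] if stack else None

# Assumed usage with the real class (not executed here):
#   doc.on_start_prefix_mapping = handle_start_prefix_mapping
#   doc.on_end_prefix_mapping = handle_end_prefix_mapping

handle_start_prefix_mapping(SimpleNamespace(prefix="w", uri="urn:example:outer"))
handle_start_prefix_mapping(SimpleNamespace(prefix="w", uri="urn:example:inner"))
inner = current_uri("w")
handle_end_prefix_mapping(SimpleNamespace(prefix="w"))
outer = current_uri("w")
handle_end_prefix_mapping(SimpleNamespace(prefix="w"))
after = current_uri("w")
```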
OfficeDoc Config Settings
The class accepts one or more of the following configuration settings. Configuration settings are similar in functionality to properties, but they are rarely used. In order to avoid "polluting" the property namespace of the class, access to these internal properties is provided through the config method.
Sometimes the Part Names retrieved from a document will be of a format that is not directly usable in the part_name property when retrieving the part. For example:
/ppt/slides/../media/image1.jpeg
When this option is set to True the component will automatically normalize these Part Names so that they can be directly used in the part_name property for retrieving the part. For example, the above would become:
/ppt/media/image1.jpeg
The default is True.
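The effect of this normalization can be illustrated with the standard library. This is only a sketch of the idea using posixpath.normpath; it is not the class's implementation, and the class may treat edge cases differently. The function name normalize_part_name is hypothetical.

```python
import posixpath

def normalize_part_name(name):
    """Collapse '.' and '..' segments in a part name."""
    return posixpath.normpath(name)

normalized = normalize_part_name("/ppt/slides/../media/image1.jpeg")
print(normalized)
```

For the part name shown above, this yields /ppt/media/image1.jpeg.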
Some relationships in an Office document may be external items, such as URLs and files on disk. These relationships are not directly accessible via extract_part. This configuration option will return "1" if the relationship at index "x" of relationships is an external part. Otherwise, it will return "0".
Base Config Settings
When queried, this setting will return a string containing information about the product's build.
The default code page is Unicode UTF-8 (65001).
The following is a list of valid code page identifiers:
Identifier | Name |
037 | IBM EBCDIC - U.S./Canada |
437 | OEM - United States |
500 | IBM EBCDIC - International |
708 | Arabic - ASMO 708 |
709 | Arabic - ASMO 449+, BCON V4 |
710 | Arabic - Transparent Arabic |
720 | Arabic - Transparent ASMO |
737 | OEM - Greek (formerly 437G) |
775 | OEM - Baltic |
850 | OEM - Multilingual Latin I |
852 | OEM - Latin II |
855 | OEM - Cyrillic (primarily Russian) |
857 | OEM - Turkish |
858 | OEM - Multilingual Latin I + Euro symbol |
860 | OEM - Portuguese |
861 | OEM - Icelandic |
862 | OEM - Hebrew |
863 | OEM - Canadian-French |
864 | OEM - Arabic |
865 | OEM - Nordic |
866 | OEM - Russian |
869 | OEM - Modern Greek |
870 | IBM EBCDIC - Multilingual/ROECE (Latin-2) |
874 | ANSI/OEM - Thai |
875 | IBM EBCDIC - Modern Greek |
932 | ANSI/OEM - Japanese, Shift-JIS |
936 | ANSI/OEM - Simplified Chinese (PRC, Singapore) |
949 | ANSI/OEM - Korean (Unified Hangul Code) |
950 | ANSI/OEM - Traditional Chinese (Taiwan; Hong Kong SAR, PRC) |
1026 | IBM EBCDIC - Turkish (Latin-5) |
1047 | IBM EBCDIC - Latin 1/Open System |
1140 | IBM EBCDIC - U.S./Canada (037 + Euro symbol) |
1141 | IBM EBCDIC - Germany (20273 + Euro symbol) |
1142 | IBM EBCDIC - Denmark/Norway (20277 + Euro symbol) |
1143 | IBM EBCDIC - Finland/Sweden (20278 + Euro symbol) |
1144 | IBM EBCDIC - Italy (20280 + Euro symbol) |
1145 | IBM EBCDIC - Latin America/Spain (20284 + Euro symbol) |
1146 | IBM EBCDIC - United Kingdom (20285 + Euro symbol) |
1147 | IBM EBCDIC - France (20297 + Euro symbol) |
1148 | IBM EBCDIC - International (500 + Euro symbol) |
1149 | IBM EBCDIC - Icelandic (20871 + Euro symbol) |
1200 | Unicode UCS-2 Little-Endian (BMP of ISO 10646) |
1201 | Unicode UCS-2 Big-Endian |
1250 | ANSI - Central European |
1251 | ANSI - Cyrillic |
1252 | ANSI - Latin I |
1253 | ANSI - Greek |
1254 | ANSI - Turkish |
1255 | ANSI - Hebrew |
1256 | ANSI - Arabic |
1257 | ANSI - Baltic |
1258 | ANSI/OEM - Vietnamese |
1361 | Korean (Johab) |
10000 | MAC - Roman |
10001 | MAC - Japanese |
10002 | MAC - Traditional Chinese (Big5) |
10003 | MAC - Korean |
10004 | MAC - Arabic |
10005 | MAC - Hebrew |
10006 | MAC - Greek I |
10007 | MAC - Cyrillic |
10008 | MAC - Simplified Chinese (GB 2312) |
10010 | MAC - Romania |
10017 | MAC - Ukraine |
10021 | MAC - Thai |
10029 | MAC - Latin II |
10079 | MAC - Icelandic |
10081 | MAC - Turkish |
10082 | MAC - Croatia |
12000 | Unicode UCS-4 Little-Endian |
12001 | Unicode UCS-4 Big-Endian |
20000 | CNS - Taiwan |
20001 | TCA - Taiwan |
20002 | Eten - Taiwan |
20003 | IBM5550 - Taiwan |
20004 | TeleText - Taiwan |
20005 | Wang - Taiwan |
20105 | IA5 IRV International Alphabet No. 5 (7-bit) |
20106 | IA5 German (7-bit) |
20107 | IA5 Swedish (7-bit) |
20108 | IA5 Norwegian (7-bit) |
20127 | US-ASCII (7-bit) |
20261 | T.61 |
20269 | ISO 6937 Non-Spacing Accent |
20273 | IBM EBCDIC - Germany |
20277 | IBM EBCDIC - Denmark/Norway |
20278 | IBM EBCDIC - Finland/Sweden |
20280 | IBM EBCDIC - Italy |
20284 | IBM EBCDIC - Latin America/Spain |
20285 | IBM EBCDIC - United Kingdom |
20290 | IBM EBCDIC - Japanese Katakana Extended |
20297 | IBM EBCDIC - France |
20420 | IBM EBCDIC - Arabic |
20423 | IBM EBCDIC - Greek |
20424 | IBM EBCDIC - Hebrew |
20833 | IBM EBCDIC - Korean Extended |
20838 | IBM EBCDIC - Thai |
20866 | Russian - KOI8-R |
20871 | IBM EBCDIC - Icelandic |
20880 | IBM EBCDIC - Cyrillic (Russian) |
20905 | IBM EBCDIC - Turkish |
20924 | IBM EBCDIC - Latin-1/Open System (1047 + Euro symbol) |
20932 | JIS X 0208-1990 & 0121-1990 |
20936 | Simplified Chinese (GB2312) |
21025 | IBM EBCDIC - Cyrillic (Serbian, Bulgarian) |
21027 | Extended Alpha Lowercase |
21866 | Ukrainian (KOI8-U) |
28591 | ISO 8859-1 Latin I |
28592 | ISO 8859-2 Central Europe |
28593 | ISO 8859-3 Latin 3 |
28594 | ISO 8859-4 Baltic |
28595 | ISO 8859-5 Cyrillic |
28596 | ISO 8859-6 Arabic |
28597 | ISO 8859-7 Greek |
28598 | ISO 8859-8 Hebrew |
28599 | ISO 8859-9 Latin 5 |
28605 | ISO 8859-15 Latin 9 |
29001 | Europa 3 |
38598 | ISO 8859-8 Hebrew |
50220 | ISO 2022 Japanese with no halfwidth Katakana |
50221 | ISO 2022 Japanese with halfwidth Katakana |
50222 | ISO 2022 Japanese JIS X 0201-1989 |
50225 | ISO 2022 Korean |
50227 | ISO 2022 Simplified Chinese |
50229 | ISO 2022 Traditional Chinese |
50930 | Japanese (Katakana) Extended |
50931 | US/Canada and Japanese |
50933 | Korean Extended and Korean |
50935 | Simplified Chinese Extended and Simplified Chinese |
50936 | Simplified Chinese |
50937 | US/Canada and Traditional Chinese |
50939 | Japanese (Latin) Extended and Japanese |
51932 | EUC - Japanese |
51936 | EUC - Simplified Chinese |
51949 | EUC - Korean |
51950 | EUC - Traditional Chinese |
52936 | HZ-GB2312 Simplified Chinese |
54936 | Windows XP: GB18030 Simplified Chinese (4 Byte) |
57002 | ISCII Devanagari |
57003 | ISCII Bengali |
57004 | ISCII Tamil |
57005 | ISCII Telugu |
57006 | ISCII Assamese |
57007 | ISCII Oriya |
57008 | ISCII Kannada |
57009 | ISCII Malayalam |
57010 | ISCII Gujarati |
57011 | ISCII Punjabi |
65000 | Unicode UTF-7 |
65001 | Unicode UTF-8 |
Identifier | Name |
1 | ASCII |
2 | NEXTSTEP |
3 | JapaneseEUC |
4 | UTF8 |
5 | ISOLatin1 |
6 | Symbol |
7 | NonLossyASCII |
8 | ShiftJIS |
9 | ISOLatin2 |
10 | Unicode |
11 | WindowsCP1251 |
12 | WindowsCP1252 |
13 | WindowsCP1253 |
14 | WindowsCP1254 |
15 | WindowsCP1250 |
21 | ISO2022JP |
30 | MacOSRoman |
10 | UTF16String |
0x90000100 | UTF16BigEndian |
0x94000100 | UTF16LittleEndian |
0x8c000100 | UTF32String |
0x98000100 | UTF32BigEndian |
0x9c000100 | UTF32LittleEndian |
65536 | Proprietary |
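When text has to be decoded in application code using one of the Windows-style identifiers from the first table, the identifier must be translated to a codec name first. The sketch below uses a small hand-picked mapping to Python codec names; the mapping is an illustration and is neither complete nor part of the class.

```python
import codecs

# Partial mapping from code page identifiers to Python codec names.
# Hand-picked for illustration; extend as needed.
CODE_PAGE_TO_CODEC = {
    437: "cp437",
    1252: "cp1252",
    20127: "ascii",
    28591: "iso8859-1",
    65001: "utf-8",
}

def codec_for(code_page):
    """Return the Python codec name for a supported code page identifier."""
    return codecs.lookup(CODE_PAGE_TO_CODEC[code_page]).name

# 0xE9 is a lowercase e with acute accent in code page 1252.
decoded = b"caf\xe9".decode(codec_for(1252))
```

Identifiers missing from the mapping raise KeyError, which makes unsupported code pages visible instead of silently falling back to a default.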
When queried, this setting will return a string containing information about the license this instance of a class is using. It will return the following information:
- Product: The product the license is for.
- Product Key: The key the license was generated from.
- License Source: Where the license was found (e.g., RuntimeLicense, License File).
- License Type: The type of license installed (e.g., Royalty Free, Single Server).
- Last Valid Build: The last valid build number for which the license will work.
In certain circumstances it may be beneficial to mask sensitive data, like passwords, in log messages. Set this to True to mask sensitive data. The default is True.
This setting only works on these classes: AS3Receiver, AS3Sender, Atom, Client(3DS), FTP, FTPServer, IMAP, OFTPClient, SSHClient, SCP, Server(3DS), Sexec, SFTP, SFTPServer, SSHServer, TCPClient, TCPServer.
If set to False, the class will not fire internal idle events. Set this to False to use the class in a background thread on Mac OS. By default, this setting is True.
If there are no events to process when do_events is called, the class will wait for the amount of time specified here before returning. The default value is 20.
When set to False, the class will use the system security libraries by default to perform cryptographic functions where applicable.
Setting this configuration setting to True tells the class to use the internal implementation instead of using the system security libraries.
On Windows, this setting is set to False by default. On Linux/macOS, this setting is set to True by default.
To use the system security libraries for Linux, OpenSSL support must be enabled. For more information on how to enable OpenSSL, please refer to the OpenSSL Notes section.
OfficeDoc Errors
The following errors may be generated by the class. Note that frequently the error message will contain more specific information than what is listed here.
Note that some non-fatal errors may be trapped and explicitly ignored in the on_error event. This will allow the class to continue operation even in case of error.
268 | The specified part name could not be found on the package, or the part is stored in interleaved format. |