OfficeDoc Class

The OfficeDoc class implements support for the Open XML Packaging Format used in Office 2007 documents.

Syntax

class ipworkszip.OfficeDoc

Remarks

The class extracts information and content from an Open XML packaged document, exposes the package properties, and provides basic read and update facilities.

Property List


The following is the full list of the properties of the class with short descriptions.

content_type_count          The number of records in the ContentType arrays.
content_type_is_override    Specifies if this is a default content type or an override.
content_type_media_type     The media type for this entry, as defined by RFC2616.
content_type_target         If it's a default content type, this will be the file extension it applies to.
namespace_count             The number of records in the Namespace arrays.
namespace_prefix            The prefix for the namespace.
namespace_uri               Namespace URI associated with the corresponding prefix.
package_path                The path to the Open XML package file.
package_property_count      The number of records in the PackageProperty arrays.
package_property_data_type  The data type associated with this property, if the information is available.
package_property_name       The name of this property.
package_property_namespace  The XML Namespace URI associated with this property.
package_property_prop_id    If this is a custom property, this will be the pid assigned to it.
package_property_prop_set   If this is a custom property, this will be the GUID of the property set it belongs to.
package_property_value      The value of this property.
part_data                   The contents of the currently selected part.
part_name                   The name of the currently selected part.
relationship_count          The number of records in the Relationship arrays.
relationship_content_type   The content type for the part referenced by this relationship, resolved from [Content_Types].xml.
relationship_id             The unique ID of this relationship within this .rels file.
relationship_part_name      The name of the part this relationship entry applies to.
relationship_type_uri       The XML namespace URI that specifies the meaning of this relationship.
validate                    This property controls whether documents are validated during parsing.
attr_count                  The number of records in the Attr arrays.
attr_name                   The local name (without prefix) of the attribute.
attr_namespace              Attribute namespace.
attr_prefix                 Attribute prefix (if any).
attr_value                  Attribute value.
x_child_count               The number of records in the XChild arrays.
x_child_name                The local name (without prefix) of the element.
x_child_namespace           Namespace of the element.
x_child_prefix              Prefix of the element (if any).
x_child_x_text              The inner text of the element.
x_element                   The name of the current element.
x_namespace                 The namespace of the current element.
x_parent                    The parent of the current element.
xpath                       Provides a way to point to a specific element in the document.
x_prefix                    The prefix of the current element.
x_sub_tree                  A snapshot of the current element in the document.
x_text                      The text of the current element.

Method List


The following is the full list of the methods of the class with short descriptions.

close                 Closes the Open XML package archive.
config                Sets or retrieves a configuration setting.
extract_part          Reads the contents of the currently selected part.
find_part_by_type     Looks up a part in the current relationships file by its type namespace URI.
get_property_value    Returns the value of the specified package property.
list_parts            Lists all the parts contained in the document and their relationships.
open                  Opens the Open XML package archive.
parse_part            Parses the specified part as XML.
read_relationships    Reads the relationships file in the archive associated with the specified part.
replace_part          Replaces the contents of the specified part in the package.
reset                 Resets the class.
resolve_content_type  Returns the content type of the specified part.

Event List


The following is the full list of the events fired by the class with short descriptions.

on_begin_file            Fired before each file is processed.
on_characters            Fired for plain text segments of the input stream.
on_comment               Fired when a comment section is encountered.
on_end_element           Fired when an end-element tag is encountered.
on_end_file              Fired after each file is processed.
on_end_prefix_mapping    Fired when leaving the scope of a namespace declaration.
on_error                 Information about errors during data delivery.
on_eval_entity           Fired every time an entity needs to be evaluated.
on_ignorable_whitespace  Fired when a section of ignorable whitespace is encountered.
on_meta                  Fired when a meta section is encountered.
on_overwrite             Fired whenever a file exists and may be overwritten.
on_pi                    Fired when a processing instruction section is encountered.
on_progress              Fired as progress is made.
on_special_section       Fired when a special section is encountered.
on_start_element         Fired when a begin-element tag is encountered in the document.
on_start_prefix_mapping  Fired when entering the scope of a namespace declaration.

Config Settings


The following is a list of config settings for the class with short descriptions.

NormalizePartName          Whether to normalize Part Names.
RelationshipIsExternal[x]  Whether the relationship part is internal or external.
BuildInfo                  Information about the product's build.
CodePage                   The system code page used for Unicode to Multibyte translations.
LicenseInfo                Information about the current license.
MaskSensitive              Whether sensitive data is masked in log messages.
ProcessIdleEvents          Whether the class uses its internal event loop to process events when the main thread is idle.
SelectWaitMillis           The length of time in milliseconds the class will wait when DoEvents is called if there are no events to process.
UseInternalSecurityAPI     Whether the class uses the system security libraries or an internal implementation.

content_type_count Property

The number of records in the ContentType arrays.

Syntax

def get_content_type_count() -> int: ...

content_type_count = property(get_content_type_count, None)

Default Value

0

Remarks

This property controls the size of the following arrays:

content_type_is_override
content_type_media_type
content_type_target

The array indices start at 0 and end at content_type_count - 1.

This property is read-only.

content_type_is_override Property

Specifies if this is a default content type or an override.

Syntax

def get_content_type_is_override(content_type_index: int) -> bool: ...

Default Value

TRUE

Remarks

Specifies if this is a default content type or an override.

The content_type_index parameter specifies the index of the item in the array. The size of the array is controlled by the content_type_count property.

This property is read-only.

content_type_media_type Property

The media type for this entry, as defined by RFC2616.

Syntax

def get_content_type_media_type(content_type_index: int) -> str: ...

Default Value

""

Remarks

The media type for this entry, as defined by RFC2616.

The content_type_index parameter specifies the index of the item in the array. The size of the array is controlled by the content_type_count property.

This property is read-only.

content_type_target Property

If it's a default content type, this will be the file extension it applies to.

Syntax

def get_content_type_target(content_type_index: int) -> str: ...

Default Value

""

Remarks

If it's a default content type, this will be the file extension it applies to. Otherwise, it will be the part name.

The content_type_index parameter specifies the index of the item in the array. The size of the array is controlled by the content_type_count property.

This property is read-only.
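
As a minimal sketch of how the ContentType arrays might be walked, the helper below assumes `doc` is an OfficeDoc instance on which open has already been called. The name `list_content_types` is hypothetical and not part of the class; only the getters documented above are used.

```python
def list_content_types(doc):
    """Return (kind, target, media_type) tuples for every ContentType entry.

    `doc` is assumed to be an OfficeDoc instance that has already been opened.
    """
    entries = []
    for i in range(doc.content_type_count):
        # Overrides target a part name; defaults target a file extension.
        kind = "override" if doc.get_content_type_is_override(i) else "default"
        entries.append(
            (kind, doc.get_content_type_target(i), doc.get_content_type_media_type(i))
        )
    return entries
```

Each tuple pairs the entry kind with its target, so a caller can tell at a glance whether the target is an extension or a part name.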

namespace_count Property

The number of records in the Namespace arrays.

Syntax

def get_namespace_count() -> int: ...

namespace_count = property(get_namespace_count, None)

Default Value

0

Remarks

This property controls the size of the following arrays:

namespace_prefix
namespace_uri

The array indices start at 0 and end at namespace_count - 1.

This property is read-only.

namespace_prefix Property

The prefix for the namespace.

Syntax

def get_namespace_prefix(namespace_index: int) -> str: ...

Default Value

""

Remarks

The namespace_prefix for the namespace.

The namespace_index parameter specifies the index of the item in the array. The size of the array is controlled by the namespace_count property.

This property is read-only.

namespace_uri Property

Namespace URI associated with the corresponding prefix.

Syntax

def get_namespace_uri(namespace_index: int) -> str: ...

Default Value

""

Remarks

Namespace URI associated with the corresponding namespace_prefix. This is usually a URL pointing to the XML schema for the namespace.

The namespace_index parameter specifies the index of the item in the array. The size of the array is controlled by the namespace_count property.

This property is read-only.

package_path Property

The path to the Open XML package file.

Syntax

def get_package_path() -> str: ...
def set_package_path(value: str) -> None: ...

package_path = property(get_package_path, set_package_path)

Default Value

""

Remarks

This property specifies the path and filename of the Open XML package to work on.

package_property_count Property

The number of records in the PackageProperty arrays.

Syntax

def get_package_property_count() -> int: ...

package_property_count = property(get_package_property_count, None)

Default Value

0

Remarks

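This property controls the size of the following arrays:

package_property_data_type
package_property_name
package_property_namespace
package_property_prop_id
package_property_prop_set
package_property_value

The array indices start at 0 and end at package_property_count - 1.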

This property is read-only.

package_property_data_type Property

The data type associated with this property, if the information is available.

Syntax

def get_package_property_data_type(package_property_index: int) -> str: ...

Default Value

""

Remarks

The data type associated with this property, if the information is available.

The package_property_index parameter specifies the index of the item in the array. The size of the array is controlled by the package_property_count property.

This property is read-only.

package_property_name Property

The name of this property.

Syntax

def get_package_property_name(package_property_index: int) -> str: ...

Default Value

""

Remarks

The name of this property.

The package_property_index parameter specifies the index of the item in the array. The size of the array is controlled by the package_property_count property.

This property is read-only.

package_property_namespace Property

The XML Namespace URI associated with this property.

Syntax

def get_package_property_namespace(package_property_index: int) -> str: ...

Default Value

""

Remarks

The XML Namespace URI associated with this property. For custom properties, this will be an empty string.

The package_property_index parameter specifies the index of the item in the array. The size of the array is controlled by the package_property_count property.

This property is read-only.

package_property_prop_id Property

If this is a custom property, this will be the pid assigned to it.

Syntax

def get_package_property_prop_id(package_property_index: int) -> str: ...

Default Value

""

Remarks

If this is a custom property, this will be the pid assigned to it.

The package_property_index parameter specifies the index of the item in the array. The size of the array is controlled by the package_property_count property.

This property is read-only.

package_property_prop_set Property

If this is a custom property, this will be the GUID of the property set it belongs to.

Syntax

def get_package_property_prop_set(package_property_index: int) -> str: ...

Default Value

""

Remarks

If this is a custom property, this will be the GUID of the property set it belongs to.

The package_property_index parameter specifies the index of the item in the array. The size of the array is controlled by the package_property_count property.

This property is read-only.

package_property_value Property

The value of this property.

Syntax

def get_package_property_value(package_property_index: int) -> str: ...

Default Value

""

Remarks

The value of this property.

The package_property_index parameter specifies the index of the item in the array. The size of the array is controlled by the package_property_count property.

This property is read-only.
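
The PackageProperty arrays can be gathered into plain dictionaries for easier handling. The sketch below assumes `doc` is an already opened OfficeDoc instance. `package_properties` is a hypothetical helper name, and the `custom` flag relies on the statement above that custom properties report an empty namespace.

```python
def package_properties(doc):
    """Collect the PackageProperty arrays of `doc` into a list of dicts."""
    props = []
    for i in range(doc.package_property_count):
        namespace = doc.get_package_property_namespace(i)
        props.append(
            {
                "name": doc.get_package_property_name(i),
                "namespace": namespace,
                "value": doc.get_package_property_value(i),
                "data_type": doc.get_package_property_data_type(i),
                # Custom properties are documented to have an empty namespace.
                "custom": namespace == "",
            }
        )
    return props
```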

part_data Property

The contents of the currently selected part.

Syntax

def get_part_data() -> bytes: ...
def set_part_data(value: bytes) -> None: ...

part_data = property(get_part_data, set_part_data)

Default Value

""

Remarks

This property will hold the contents of the part selected by part_name after calling the extract_part method. It can also be set before calling replace_part.

part_name Property

The name of the currently selected part.

Syntax

def get_part_name() -> str: ...
def set_part_name(value: str) -> None: ...

part_name = property(get_part_name, set_part_name)

Default Value

""

Remarks

This property specifies the name of the currently selected part in the document. If null or empty, no part is currently selected.

relationship_count Property

The number of records in the Relationship arrays.

Syntax

def get_relationship_count() -> int: ...

relationship_count = property(get_relationship_count, None)

Default Value

0

Remarks

This property controls the size of the following arrays:

relationship_content_type
relationship_id
relationship_part_name
relationship_type_uri

The array indices start at 0 and end at relationship_count - 1.

This property is read-only.

relationship_content_type Property

The content type for the part referenced by this relationship, resolved from [Content_Types].xml.

Syntax

def get_relationship_content_type(relationship_index: int) -> str: ...

Default Value

""

Remarks

The content type for the part referenced by this relationship, resolved from [Content_Types].xml according to the Open XML packaging specification rules.

The relationship_index parameter specifies the index of the item in the array. The size of the array is controlled by the relationship_count property.

This property is read-only.

relationship_id Property

The unique ID of this relationship within this .rels file.

Syntax

def get_relationship_id(relationship_index: int) -> str: ...

Default Value

""

Remarks

The unique ID of this relationship within this .rels file.

The relationship_index parameter specifies the index of the item in the array. The size of the array is controlled by the relationship_count property.

This property is read-only.

relationship_part_name Property

The name of the part this relationship entry applies to.

Syntax

def get_relationship_part_name(relationship_index: int) -> str: ...

Default Value

""

Remarks

The name of the part this relationship entry applies to.

The relationship_index parameter specifies the index of the item in the array. The size of the array is controlled by the relationship_count property.

This property is read-only.

relationship_type_uri Property

The XML namespace URI that specifies the meaning of this relationship.

Syntax

def get_relationship_type_uri(relationship_index: int) -> str: ...

Default Value

""

Remarks

The XML namespace URI that specifies the meaning of this relationship.

The relationship_index parameter specifies the index of the item in the array. The size of the array is controlled by the relationship_count property.

This property is read-only.
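
One way to use the Relationship arrays is to group part names by relationship type. The following is a sketch only: `parts_by_type` is a hypothetical helper, and `doc` is assumed to be an OfficeDoc instance whose relationships have already been loaded (for example by open or read_relationships).

```python
from collections import defaultdict


def parts_by_type(doc):
    """Group relationship part names by their relationship type URI."""
    grouped = defaultdict(list)
    for i in range(doc.relationship_count):
        type_uri = doc.get_relationship_type_uri(i)
        grouped[type_uri].append(doc.get_relationship_part_name(i))
    # Return a plain dict so missing keys raise instead of creating entries.
    return dict(grouped)
```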

validate Property

This property controls whether documents are validated during parsing.

Syntax

def get_validate() -> bool: ...
def set_validate(value: bool) -> None: ...

validate = property(get_validate, set_validate)

Default Value

TRUE

Remarks

When True (default), the document will be validated during parsing. To disable validation, set validate to False. Disabling validation may be useful in cases where data can still be parsed even if the document is not well-formed.

attr_count Property

The number of records in the Attr arrays.

Syntax

def get_attr_count() -> int: ...

attr_count = property(get_attr_count, None)

Default Value

0

Remarks

This property controls the size of the following arrays:

attr_name
attr_namespace
attr_prefix
attr_value

The array indices start at 0 and end at attr_count - 1.

This property is read-only.

attr_name Property

The Name provides the local name (without prefix) of the attribute.

Syntax

def get_attr_name(attr_index: int) -> str: ...

Default Value

""

Remarks

The attr_name provides the local name (without prefix) of the attribute.

The attr_index parameter specifies the index of the item in the array. The size of the array is controlled by the attr_count property.

This property is read-only.

attr_namespace Property

Attribute namespace.

Syntax

def get_attr_namespace(attr_index: int) -> str: ...

Default Value

""

Remarks

Attribute namespace.

The attr_index parameter specifies the index of the item in the array. The size of the array is controlled by the attr_count property.

This property is read-only.

attr_prefix Property

Attribute prefix (if any).

Syntax

def get_attr_prefix(attr_index: int) -> str: ...

Default Value

""

Remarks

Attribute prefix (if any). If the attribute does not have a prefix, this property is empty.

The attr_index parameter specifies the index of the item in the array. The size of the array is controlled by the attr_count property.

This property is read-only.

attr_value Property

Attribute value.

Syntax

def get_attr_value(attr_index: int) -> str: ...

Default Value

""

Remarks

Attribute value.

The attr_index parameter specifies the index of the item in the array. The size of the array is controlled by the attr_count property.

This property is read-only.

x_child_count Property

The number of records in the XChild arrays.

Syntax

def get_x_child_count() -> int: ...

x_child_count = property(get_x_child_count, None)

Default Value

0

Remarks

This property controls the size of the following arrays:

x_child_name
x_child_namespace
x_child_prefix
x_child_x_text

The array indices start at 0 and end at x_child_count - 1.

This property is read-only.

x_child_name Property

The Name property provides the local name (without prefix) of the element.

Syntax

def get_x_child_name(x_child_index: int) -> str: ...

Default Value

""

Remarks

The x_child_name property provides the local name (without prefix) of the element.

The x_child_index parameter specifies the index of the item in the array. The size of the array is controlled by the x_child_count property.

This property is read-only.

x_child_namespace Property

Namespace of the element.

Syntax

def get_x_child_namespace(x_child_index: int) -> str: ...

Default Value

""

Remarks

Namespace of the element.

The x_child_index parameter specifies the index of the item in the array. The size of the array is controlled by the x_child_count property.

This property is read-only.

x_child_prefix Property

Prefix of the element (if any).

Syntax

def get_x_child_prefix(x_child_index: int) -> str: ...

Default Value

""

Remarks

Prefix of the element (if any). If the element does not have a prefix, this property is empty.

The x_child_index parameter specifies the index of the item in the array. The size of the array is controlled by the x_child_count property.

This property is read-only.

x_child_x_text Property

The inner text of the element.

Syntax

def get_x_child_x_text(x_child_index: int) -> str: ...

Default Value

""

Remarks

The inner text of the element.

The x_child_index parameter specifies the index of the item in the array. The size of the array is controlled by the x_child_count property.

This property is read-only.

x_element Property

The name of the current element.

Syntax

def get_x_element() -> str: ...

x_element = property(get_x_element, None)

Default Value

""

Remarks

The current element is specified via the xpath property.

This property is read-only.

x_namespace Property

The namespace of the current element.

Syntax

def get_x_namespace() -> str: ...

x_namespace = property(get_x_namespace, None)

Default Value

""

Remarks

The current element is specified via the xpath property.

This property is read-only.

x_parent Property

The parent of the current element.

Syntax

def get_x_parent() -> str: ...

x_parent = property(get_x_parent, None)

Default Value

""

Remarks

The current element is specified via the xpath property.

This property is read-only.

xpath Property

Provides a way to point to a specific element in the document.

Syntax

def get_xpath() -> str: ...
def set_xpath(value: str) -> None: ...

xpath = property(get_xpath, set_xpath)

Default Value

""

Remarks

xpath implements a subset of the XML XPath specification, allowing you to point to specific elements in the XML documents.

The path is a series of one or more element accessors separated by '/'. The path can be absolute (starting with '/') or relative to the current xpath location.

The following are possible values for an element accessor:

'name'                       A particular element name
name[i]                      The i-th subelement of the current element with the given name
[i]                          The i-th subelement of the current element
[last()]                     The last subelement of the current element
[last()-i]                   The subelement located at the last location minus i in the current element
name[@attrname="attrvalue"]  The subelement containing a particular value for a given attribute (supports single and double quotes)
..                           The parent of the current element

When xpath is set to a valid path, x_element points to the name of the element, with x_parent, x_namespace, x_prefix, x_child_name, x_child_namespace, x_child_prefix, x_child_x_text, and x_text providing other properties of the element. The attributes of the current element are provided in the attr_name, attr_namespace, attr_prefix, and attr_value properties.

build_dom must be set to True prior to parsing the document for the xpath functionality to be available.

Example (Setting XPath):

Document root     XML.XPath = "/"
Specific Element  XML.XPath = "/root/SubElement1/SubElement2/"
i-th Child        XML.XPath = "/root/SubElement1[i]"
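
In Python, the same navigation might look like the sketch below. It assumes `doc` is an OfficeDoc instance that has already parsed a part with the DOM available, as required above. `child_texts` is a hypothetical helper name.

```python
def child_texts(doc, path):
    """Point `doc.xpath` at `path` and return (child name, inner text) pairs."""
    doc.xpath = path
    # After xpath is set, the XChild arrays describe the children of that element.
    return [
        (doc.get_x_child_name(i), doc.get_x_child_x_text(i))
        for i in range(doc.x_child_count)
    ]
```

A list of pairs is used instead of a dict because sibling elements may share the same name.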

x_prefix Property

The prefix of the current element.

Syntax

def get_x_prefix() -> str: ...

x_prefix = property(get_x_prefix, None)

Default Value

""

Remarks

The current element is specified via the xpath property.

This property is read-only.

x_sub_tree Property

A snapshot of the current element in the document.

Syntax

def get_x_sub_tree() -> str: ...

x_sub_tree = property(get_x_sub_tree, None)

Default Value

""

Remarks

The current element is specified via the xpath property. For this property to work, CacheContent must be set to true.

This property is read-only.

x_text Property

The text of the current element.

Syntax

def get_x_text() -> str: ...

x_text = property(get_x_text, None)

Default Value

""

Remarks

The current element is specified via the xpath property.

This property is read-only.

close Method

Closes the Open XML package archive.

Syntax

def close() -> None: ...

Remarks

When this method is called, the class will close the current archive and release all resources.

config Method

Sets or retrieves a configuration setting.

Syntax

def config(configuration_string: str) -> str: ...

Remarks

config is a generic method available in every class. It is used to set and retrieve configuration settings for the class.

These settings are similar in functionality to properties, but they are rarely used. In order to avoid "polluting" the property namespace of the class, access to these internal properties is provided through the config method.

To set a configuration setting named PROPERTY, call config("PROPERTY=VALUE"), where VALUE is the value of the setting expressed as a string. For boolean values, use the strings "True", "False", "0", "1", "Yes", or "No" (case does not matter).

To read (query) the value of a configuration setting, call config("PROPERTY"). The value will be returned as a string.
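
The set-then-query pattern can be wrapped in a small helper. This is a sketch under the assumption that `doc` is an OfficeDoc instance. `set_config` is a hypothetical name, and the value the real class returns when a setting is assigned is not relied upon.

```python
def set_config(doc, name, value):
    """Set configuration setting `name` on `doc` and return the value read back."""
    if isinstance(value, bool):
        # Boolean settings accept "True"/"False" (case does not matter).
        value = "True" if value else "False"
    doc.config(f"{name}={value}")
    # Querying uses the bare setting name and returns a string.
    return doc.config(name)
```

For example, `set_config(doc, "NormalizePartName", True)` sends the string `NormalizePartName=True` and then queries `NormalizePartName`.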

extract_part Method

Reads the contents of the currently selected part.

Syntax

def extract_part() -> None: ...

Remarks

If the part specified by the part_name property exists, the corresponding physical file will be extracted from the archive and will be available through the part_data property.

If the part doesn't exist, or it's stored in interleaved format, an error will be raised.
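
A minimal sketch of the select-then-extract sequence follows. It assumes `doc` is an opened OfficeDoc instance and that the part holds text in the given encoding. `read_part_text` is a hypothetical helper, and any error raised by extract_part is left to propagate.

```python
def read_part_text(doc, part_name, encoding="utf-8"):
    """Select `part_name`, extract it, and return its contents decoded as text."""
    doc.part_name = part_name
    # Raises if the part is missing or stored in interleaved format.
    doc.extract_part()
    # part_data holds the raw bytes of the extracted part.
    return doc.part_data.decode(encoding)
```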

find_part_by_type Method

Looks up a part in the current relationships file by its type namespace URI.

Syntax

def find_part_by_type(type_uri: str) -> str: ...

Remarks

If a matching part can be found, its part name is returned. Otherwise, an empty string is returned.
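
As an illustration, the main document part of a package is normally referenced from the master relationships file with the officeDocument relationship type. The sketch below assumes `doc` is an opened OfficeDoc instance and that the package uses the transitional relationship URI shown. `find_main_part` is a hypothetical helper.

```python
# Relationship type that points at the main document part
# (assumption: the package uses the transitional Open XML URI).
OFFICE_DOCUMENT_REL = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
)


def find_main_part(doc):
    """Return the main document part name, or None when no match is found."""
    name = doc.find_part_by_type(OFFICE_DOCUMENT_REL)
    # The method reports "not found" as an empty string.
    return name or None
```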

get_property_value Method

Returns the value of the specified package property.

Syntax

def get_property_value(prop_name: str, prop_namespace: str) -> str: ...

Remarks

Looks up a package property named PropName in namespace PropNamespace in the core and app properties tables and returns its value, if found.

If the property doesn't exist, an empty string is returned.

For custom properties, use an empty string ("") as the value of the PropNamespace parameter.
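
The sketch below reads two core properties and any number of custom ones. It assumes `doc` is an opened OfficeDoc instance, that the title and creator are stored as Dublin Core elements, and that the method expects local names without a prefix. `core_summary` is a hypothetical helper.

```python
# Dublin Core elements namespace used by the core properties part.
DC_NS = "http://purl.org/dc/elements/1.1/"


def core_summary(doc, custom_names=()):
    """Return a dict with the title, creator, and requested custom properties."""
    summary = {
        "title": doc.get_property_value("title", DC_NS),
        "creator": doc.get_property_value("creator", DC_NS),
    }
    for name in custom_names:
        # Custom properties are looked up with an empty namespace.
        summary[name] = doc.get_property_value(name, "")
    return summary
```

Properties that are not present come back as empty strings, matching the behavior described above.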

list_parts Method

Lists all the parts contained in the document and their relationships.

Syntax

def list_parts() -> None: ...

Remarks

When this method is called, the class will read all the relationships in the document, recursively, and populate the relationships collection.

open Method

Opens the Open XML package archive.

Syntax

def open() -> None: ...

Remarks

When this method is called, the class will attempt to open the archive specified in package_path, extract the package properties and content types, and parse the master relationships file in the archive.
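
Because every successful open should be paired with close, a context manager is one convenient way to structure the calls. This is a sketch, not part of the class: `opened` is a hypothetical helper and `doc` is assumed to be an OfficeDoc instance.

```python
from contextlib import contextmanager


@contextmanager
def opened(doc, path):
    """Open the package at `path` on `doc` and close it when the block exits."""
    doc.package_path = path
    doc.open()
    try:
        yield doc
    finally:
        # Runs on normal exit and when the block raises.
        doc.close()
```

Usage would be along the lines of `with opened(doc, "report.docx") as d: ...`, where the file name is only an example.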

parse_part Method

Parses the specified part as XML.

Syntax

def parse_part() -> None: ...

Remarks

If the part specified by part_name exists, the corresponding physical file will be extracted from the archive and parsed as XML. If build_dom is enabled, the DOM will be built internally and you can use XPath to query the resulting document, using the xpath property. If build_dom is disabled, only the XML parser-related events will be fired.
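
When only the text content matters, the parser events can be used without querying the DOM. The sketch below assumes `doc` is an opened OfficeDoc instance. `collect_text` is a hypothetical helper that relies on the on_characters event documented later in this section.

```python
def collect_text(doc, part_name):
    """Parse `part_name` and return all character data reported by the parser."""
    chunks = []

    def handle_characters(e):
        # e.text carries one plain text segment of the document.
        chunks.append(e.text)

    doc.on_characters = handle_characters
    doc.part_name = part_name
    doc.parse_part()
    return "".join(chunks)
```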

read_relationships Method

Reads the relationships file in the archive associated with the specified part.

Syntax

def read_relationships() -> None: ...

Remarks

When this method is called, the class will look for a .rels file associated with the part specified by the part_name property. If found, the relationships collection will now expose the contents of the relationships for that part.

replace_part Method

Replaces the contents of the specified part in the package.

Syntax

def replace_part() -> None: ...

Remarks

If the part specified by the part_name property exists, the corresponding physical file will be replaced with the contents of the part_data property. The package file will be modified in place right away.

If the part doesn't exist, if it's stored in interleaved format, or if part_data is null or empty, an error will be raised.
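
A read-modify-write cycle could be sketched as follows. It assumes `doc` is an opened OfficeDoc instance and that `old` and `new` are bytes. `replace_in_part` is a hypothetical helper. Because the package is modified in place, working on a copy of the file is a sensible precaution.

```python
def replace_in_part(doc, part_name, old, new):
    """Replace `old` with `new` (both bytes) inside a part.

    Returns True when the part was rewritten, False when nothing was written.
    """
    doc.part_name = part_name
    doc.extract_part()
    original = doc.part_data
    updated = original.replace(old, new)
    # Skip the write when nothing changed or the result would be empty,
    # since replace_part raises an error for empty data.
    if updated == original or not updated:
        return False
    doc.part_data = updated
    doc.replace_part()
    return True
```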

reset Method

Resets the class.

Syntax

def reset() -> None: ...

Remarks

reset resets the state of the class. All properties will be set to their default values, and any files open will be closed.

resolve_content_type Method

Returns the content type of the specified part.

Syntax

def resolve_content_type() -> str: ...

Remarks

Applies the content type resolution rules specified in the Open XML packaging specification and returns the content type associated with part_name in the archive.

If there's no content type mapped for the part or for the extension, an empty string is returned.
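
The precedence these rules imply (an override for the part name first, then the default for the extension) can be illustrated independently of the class. The function below is a standalone sketch, not the class's implementation. `resolve` and its dict arguments are hypothetical, and case-insensitive matching is an assumption.

```python
def resolve(part_name, overrides, defaults):
    """Resolve a content type from override and default tables.

    overrides maps part names to media types; defaults maps file
    extensions (without the dot) to media types.
    """
    # 1. An override registered for this exact part name wins.
    by_name = {name.lower(): media for name, media in overrides.items()}
    key = part_name.lower()
    if key in by_name:
        return by_name[key]
    # 2. Otherwise fall back to the default for the file extension.
    last_segment = part_name.rsplit("/", 1)[-1]
    _, dot, ext = last_segment.rpartition(".")
    if dot:
        by_ext = {e.lower(): media for e, media in defaults.items()}
        return by_ext.get(ext.lower(), "")
    # 3. Nothing matched: an empty string, as the method documents.
    return ""
```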

on_begin_file Event

Fired before each file is processed.

Syntax

class OfficeDocBeginFileEventParams(object):
  @property
  def index() -> int: ...

  @property
  def skip() -> bool: ...
  @skip.setter
  def skip(value) -> None: ...

# In class OfficeDoc:
@property
def on_begin_file() -> Callable[[OfficeDocBeginFileEventParams], None]: ...
@on_begin_file.setter
def on_begin_file(event_hook: Callable[[OfficeDocBeginFileEventParams], None]) -> None: ...

Remarks

on_begin_file is fired before each file is processed by the compressor or decompressor, as appropriate. Index contains the array index of the file about to be processed, and the file_compressed_name, file_decompressed_name, file_compressed_size (decompression only), and file_decompressed_size fields of the files collection for this index contain more detailed information about the file about to be processed.

When extracting, an alternate location may be specified by trapping the event, and modifying file_decompressed_name and/or extract_to_path. If file_decompressed_name is set to an empty string, the file will not be written to disk. If WriteToProgressEvent is true, the file will still be decompressed, and the data may be extracted through the on_progress event.

This event may also be trapped while compressing. file_compressed_name and file_decompressed_name may be changed.

You may set the Skip parameter to true in order to skip the file completely while compressing or extracting.

on_characters Event

Fired for plain text segments of the input stream.

Syntax

class OfficeDocCharactersEventParams(object):
  @property
  def text() -> str: ...

# In class OfficeDoc:
@property
def on_characters() -> Callable[[OfficeDocCharactersEventParams], None]: ...
@on_characters.setter
def on_characters(event_hook: Callable[[OfficeDocCharactersEventParams], None]) -> None: ...

Remarks

The on_characters event provides the plain text content of the XML document (i.e. the text inside the tags). The text is provided through the Text parameter.

The text includes whitespace and end-of-line characters, except for ignorable whitespace, which is reported through the on_ignorable_whitespace event.

on_comment Event

Fired when a comment section is encountered.

Syntax

class OfficeDocCommentEventParams(object):
  @property
  def text() -> str: ...

# In class OfficeDoc:
@property
def on_comment() -> Callable[[OfficeDocCommentEventParams], None]: ...
@on_comment.setter
def on_comment(event_hook: Callable[[OfficeDocCommentEventParams], None]) -> None: ...

Remarks

The on_comment event is fired whenever a comment section (<!-- ..text... -->) is found in the document.

The full text of the comment is provided by the Text parameter.

on_end_element Event

Fired when an end-element tag is encountered.

Syntax

class OfficeDocEndElementEventParams(object):
  @property
  def namespace() -> str: ...

  @property
  def element() -> str: ...

  @property
  def q_name() -> str: ...

  @property
  def is_empty() -> bool: ...

# In class OfficeDoc:
@property
def on_end_element() -> Callable[[OfficeDocEndElementEventParams], None]: ...
@on_end_element.setter
def on_end_element(event_hook: Callable[[OfficeDocEndElementEventParams], None]) -> None: ...

Remarks

The on_end_element event is fired when an end-element tag is found in the document.

The element name is provided by the Element parameter.

The IsEmpty parameter is true when the event corresponds with an empty element declaration.
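
A handler for this event might tally element names and note which ones were empty declarations. The sketch below assumes `doc` is an opened OfficeDoc instance. `count_elements` is a hypothetical helper that reads the `element` and `is_empty` parameters shown in the syntax above.

```python
from collections import Counter


def count_elements(doc, part_name):
    """Parse `part_name` and return (element counts, names seen as empty)."""
    counts = Counter()
    empty_names = set()

    def handle_end_element(e):
        counts[e.element] += 1
        if e.is_empty:
            # The event corresponds to an empty element declaration.
            empty_names.add(e.element)

    doc.on_end_element = handle_end_element
    doc.part_name = part_name
    doc.parse_part()
    return counts, empty_names
```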

on_end_file Event

Fired after each file is processed.

Syntax

class OfficeDocEndFileEventParams(object):
  @property
  def index() -> int: ...

# In class OfficeDoc:
@property
def on_end_file() -> Callable[[OfficeDocEndFileEventParams], None]: ...
@on_end_file.setter
def on_end_file(event_hook: Callable[[OfficeDocEndFileEventParams], None]) -> None: ...

Remarks

on_end_file is fired after each file is processed by the compressor or decompressor, as appropriate. Index contains the array index of the file processed, and the file_compressed_name, file_decompressed_name, file_compressed_size, and file_decompressed_size fields in the files collection for this index contain more detailed information about the file processed.

on_end_prefix_mapping Event

Fired when leaving the scope of a namespace declaration.

Syntax

class OfficeDocEndPrefixMappingEventParams(object):
  @property
  def prefix() -> str: ...

# In class OfficeDoc:
@property
def on_end_prefix_mapping() -> Callable[[OfficeDocEndPrefixMappingEventParams], None]: ...
@on_end_prefix_mapping.setter
def on_end_prefix_mapping(event_hook: Callable[[OfficeDocEndPrefixMappingEventParams], None]) -> None: ...

Remarks

The on_end_prefix_mapping event is fired when leaving the scope of a namespace declaration.

on_error Event

Information about errors during data delivery.

Syntax

class OfficeDocErrorEventParams(object):
  @property
  def error_code() -> int: ...

  @property
  def description() -> str: ...

# In class OfficeDoc:
@property
def on_error() -> Callable[[OfficeDocErrorEventParams], None]: ...
@on_error.setter
def on_error(event_hook: Callable[[OfficeDocErrorEventParams], None]) -> None: ...

Remarks

The on_error event is fired in case of exceptional conditions during message processing. Normally the class fails with an error.

The ErrorCode parameter contains an error code, and the Description parameter contains a textual description of the error. For a list of valid error codes and their descriptions, please refer to the Error Codes section.
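As a rough illustration of these parameters, the sketch below records each error rather than printing it. The handler name, the seen_errors list, and the stand-in event object are hypothetical; with the real class the handler would presumably be assigned to the on_error property.

```python
from types import SimpleNamespace

seen_errors = []  # hypothetical log of (code, description) pairs


def handle_error(e):
    """Record the error code and description carried by the event."""
    seen_errors.append((e.error_code, e.description))


# Stand-in for OfficeDocErrorEventParams, used only to exercise the handler.
# With the real class (assumption): doc.on_error = handle_error
handle_error(SimpleNamespace(error_code=268, description="Part not found"))
print(seen_errors)
```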

on_eval_entity Event

Fired every time an entity needs to be evaluated.

Syntax

class OfficeDocEvalEntityEventParams(object):
  @property
  def entity() -> str: ...

  @property
  def value() -> str: ...
  @value.setter
  def value(value) -> None: ...

# In class OfficeDoc:
@property
def on_eval_entity() -> Callable[[OfficeDocEvalEntityEventParams], None]: ...
@on_eval_entity.setter
def on_eval_entity(event_hook: Callable[[OfficeDocEvalEntityEventParams], None]) -> None: ...

Remarks

The Value parameter contains a suggested value for the entity (normally the entity name itself). You may set Value to a value of your choice, which will later be passed into the text stream.
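A minimal sketch of such a handler follows. It assumes the Entity parameter holds the bare entity name (without "&" and ";"), which the text above does not state explicitly; the replacement table and the stand-in event objects are hypothetical.

```python
from types import SimpleNamespace

# Hypothetical replacement table; extend it as the document requires.
ENTITY_VALUES = {"nbsp": "\u00a0", "copy": "\u00a9"}


def handle_eval_entity(e):
    """Replace known entities; keep the suggested value for unknown ones."""
    e.value = ENTITY_VALUES.get(e.entity, e.value)


# Stand-ins for OfficeDocEvalEntityEventParams, for illustration only.
known = SimpleNamespace(entity="copy", value="copy")
unknown = SimpleNamespace(entity="foo", value="foo")
handle_eval_entity(known)
handle_eval_entity(unknown)
```

Leaving Value untouched for unknown entities keeps the suggested default in the text stream.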

on_ignorable_whitespace Event

Fired when a section of ignorable whitespace is encountered.

Syntax

class OfficeDocIgnorableWhitespaceEventParams(object):
  @property
  def text() -> str: ...

# In class OfficeDoc:
@property
def on_ignorable_whitespace() -> Callable[[OfficeDocIgnorableWhitespaceEventParams], None]: ...
@on_ignorable_whitespace.setter
def on_ignorable_whitespace(event_hook: Callable[[OfficeDocIgnorableWhitespaceEventParams], None]) -> None: ...

Remarks

The ignorable whitespace section is provided by the Text parameter.

on_meta Event

Fired when a meta section is encountered.

Syntax

class OfficeDocMetaEventParams(object):
  @property
  def text() -> str: ...

# In class OfficeDoc:
@property
def on_meta() -> Callable[[OfficeDocMetaEventParams], None]: ...
@on_meta.setter
def on_meta(event_hook: Callable[[OfficeDocMetaEventParams], None]) -> None: ...

Remarks

The on_meta event is fired whenever a meta information section (<! ..text... >) is found in the document.

The full text of the meta section is provided by the Text parameter.

on_overwrite Event

Fired whenever a file exists and may be overwritten.

Syntax

class OfficeDocOverwriteEventParams(object):
  @property
  def filename() -> str: ...
  @filename.setter
  def filename(value) -> None: ...

  @property
  def overwrite() -> bool: ...
  @overwrite.setter
  def overwrite(value) -> None: ...

# In class OfficeDoc:
@property
def on_overwrite() -> Callable[[OfficeDocOverwriteEventParams], None]: ...
@on_overwrite.setter
def on_overwrite(event_hook: Callable[[OfficeDocOverwriteEventParams], None]) -> None: ...

Remarks

on_overwrite is fired when a file is about to be written and would overwrite an existing file. The event is fired during decompression.

Filename contains the full name of the file, specified with its pathname.

Overwrite specifies whether or not the file will be overwritten. For Zip, Jar, and Tar, this is equal by default to the value of the overwrite_files property. For Gzip, this value defaults to true.

Either of the parameters may be changed when the event is fired. Changing the value of Overwrite will override the default behavior of the class, and cause the file to be overwritten or not overwritten, depending on the value set. If Filename is changed, the value of Overwrite will be ignored, and the file will be written with the specified name. If a file with the new name also exists, it will be silently overwritten.
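Because a changed Filename is written even when a file of that name exists, a handler that renames should pick a name that is not already in use. The sketch below shows one way to do that; unique_name and the " (n)" suffix scheme are hypothetical, and the event object is a stand-in for OfficeDocOverwriteEventParams.

```python
import os
import tempfile
from types import SimpleNamespace


def unique_name(path):
    """Return path, or path with a numeric suffix that does not exist yet."""
    base, ext = os.path.splitext(path)
    candidate, n = path, 1
    while os.path.exists(candidate):
        candidate = f"{base} ({n}){ext}"
        n += 1
    return candidate


def handle_overwrite(e):
    """Redirect the output to an unused name instead of overwriting."""
    e.filename = unique_name(e.filename)


# Exercise the handler against a real (temporary) existing file.
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "image1.jpeg")
    open(existing, "wb").close()
    ev = SimpleNamespace(filename=existing, overwrite=False)
    handle_overwrite(ev)
    renamed = os.path.basename(ev.filename)
```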

on_pi Event

Fired when a processing instruction section is encountered.

Syntax

class OfficeDocPIEventParams(object):
  @property
  def text() -> str: ...

# In class OfficeDoc:
@property
def on_pi() -> Callable[[OfficeDocPIEventParams], None]: ...
@on_pi.setter
def on_pi(event_hook: Callable[[OfficeDocPIEventParams], None]) -> None: ...

Remarks

The on_pi event is fired whenever a processing instruction section (<? ..text... ?>) is found in the document.

The full text of the processing instruction is provided by the Text parameter.

on_progress Event

Fired as progress is made.

Syntax

class OfficeDocProgressEventParams(object):
  @property
  def data() -> bytes: ...

  @property
  def filename() -> str: ...

  @property
  def bytes_processed() -> int: ...

  @property
  def percent_processed() -> int: ...

# In class OfficeDoc:
@property
def on_progress() -> Callable[[OfficeDocProgressEventParams], None]: ...
@on_progress.setter
def on_progress(event_hook: Callable[[OfficeDocProgressEventParams], None]) -> None: ...

Remarks

The on_progress event is automatically fired as compression or decompression is performed. When WriteToProgressEvent is true, the output data is provided through the Data parameter, allowing for it to be streamed out.

Filename contains the name of the file being written. If no file is being written, Filename will contain an empty string, and the output data will be provided exclusively through this event.

BytesProcessed contains the total number of uncompressed bytes processed. PercentProcessed contains the percent of uncompressed bytes processed, corresponding roughly to the running time of the operation.

For Gzip extraction only, BytesProcessed and PercentProcessed will reflect the number of compressed bytes extracted, as it is generally impossible to predetermine the total uncompressed size.

If WriteToProgressEvent is false, Data will contain null.
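The following sketch shows one way a handler might collect streamed output. It assumes that an absent Data value surfaces in Python as None or empty bytes; the ProgressCollector class and the stand-in events are hypothetical.

```python
import io
from types import SimpleNamespace


class ProgressCollector:
    """Accumulates streamed output and remembers the latest percentage."""

    def __init__(self):
        self.buffer = io.BytesIO()
        self.percent = 0

    def __call__(self, e):
        if e.data:  # skipped when Data is None or empty
            self.buffer.write(e.data)
        self.percent = e.percent_processed


collector = ProgressCollector()

# Stand-ins for OfficeDocProgressEventParams, for illustration only.
for ev in (
    SimpleNamespace(data=b"<w:doc", filename="", bytes_processed=6, percent_processed=50),
    SimpleNamespace(data=None, filename="", bytes_processed=6, percent_processed=50),
    SimpleNamespace(data=b"ument/>", filename="", bytes_processed=13, percent_processed=100),
):
    collector(ev)
```

A callable object is used here so that the buffer and the percentage travel with the handler rather than living in module-level state.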

on_special_section Event

Fired when a special section is encountered.

Syntax

class OfficeDocSpecialSectionEventParams(object):
  @property
  def section_id() -> str: ...

  @property
  def text() -> str: ...

# In class OfficeDoc:
@property
def on_special_section() -> Callable[[OfficeDocSpecialSectionEventParams], None]: ...
@on_special_section.setter
def on_special_section(event_hook: Callable[[OfficeDocSpecialSectionEventParams], None]) -> None: ...

Remarks

The on_special_section event is fired whenever a special section (such as <![CDATA[ ..text... ]]>) is found in the document.

The full text of the special section is provided by the Text parameter, while the SectionId parameter provides the section identifier (e.g. "CDATA").

on_start_element Event

Fired when a begin-element tag is encountered in the document.

Syntax

class OfficeDocStartElementEventParams(object):
  @property
  def namespace() -> str: ...

  @property
  def element() -> str: ...

  @property
  def q_name() -> str: ...

  @property
  def is_empty() -> bool: ...

# In class OfficeDoc:
@property
def on_start_element() -> Callable[[OfficeDocStartElementEventParams], None]: ...
@on_start_element.setter
def on_start_element(event_hook: Callable[[OfficeDocStartElementEventParams], None]) -> None: ...

Remarks

The on_start_element event is fired when a begin-element tag is found in the document.

The element name is provided through the Element parameter. The attribute names and values (if any) are provided through the attr_name, attr_namespace, attr_prefix, and attr_value properties.

The IsEmpty parameter is true when the event corresponds with an empty element declaration.
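As an illustration of the parameters above, this sketch tallies element names and notes which elements were declared empty. The handler name and the stand-in event objects are hypothetical; with the real class the handler would presumably be assigned to the on_start_element property.

```python
from collections import Counter
from types import SimpleNamespace

element_counts = Counter()  # element name -> number of start tags seen
empty_elements = []         # qualified names of empty element declarations


def handle_start_element(e):
    """Tally each element name and remember empty element declarations."""
    element_counts[e.element] += 1
    if e.is_empty:
        empty_elements.append(e.q_name)


# Stand-ins for OfficeDocStartElementEventParams, for illustration only.
for ev in (
    SimpleNamespace(namespace="", element="p", q_name="w:p", is_empty=False),
    SimpleNamespace(namespace="", element="r", q_name="w:r", is_empty=False),
    SimpleNamespace(namespace="", element="br", q_name="w:br", is_empty=True),
    SimpleNamespace(namespace="", element="r", q_name="w:r", is_empty=False),
):
    handle_start_element(ev)
```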

on_start_prefix_mapping Event

Fired when entering the scope of a namespace declaration.

Syntax

class OfficeDocStartPrefixMappingEventParams(object):
  @property
  def prefix() -> str: ...

  @property
  def uri() -> str: ...

# In class OfficeDoc:
@property
def on_start_prefix_mapping() -> Callable[[OfficeDocStartPrefixMappingEventParams], None]: ...
@on_start_prefix_mapping.setter
def on_start_prefix_mapping(event_hook: Callable[[OfficeDocStartPrefixMappingEventParams], None]) -> None: ...

Remarks

The on_start_prefix_mapping event is fired when entering the scope of a namespace declaration.
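The two prefix mapping events can be paired to track which URI a prefix currently refers to. The sketch below keeps one stack of URIs per prefix, a common approach with event-based parsers; the handler names, the resolve helper, and the stand-in events are hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace

scopes = defaultdict(list)  # prefix -> stack of URIs, innermost last


def handle_start_prefix_mapping(e):
    """Entering a declaration: the new URI shadows any outer one."""
    scopes[e.prefix].append(e.uri)


def handle_end_prefix_mapping(e):
    """Leaving a declaration: the outer URI (if any) is visible again."""
    scopes[e.prefix].pop()


def resolve(prefix):
    """Return the URI currently bound to prefix, or None if unbound."""
    stack = scopes.get(prefix)
    return stack[-1] if stack else None


# Stand-in events simulating a nested redeclaration of the "w" prefix.
handle_start_prefix_mapping(SimpleNamespace(prefix="w", uri="urn:outer"))
handle_start_prefix_mapping(SimpleNamespace(prefix="w", uri="urn:inner"))
inner = resolve("w")
handle_end_prefix_mapping(SimpleNamespace(prefix="w"))
outer = resolve("w")
handle_end_prefix_mapping(SimpleNamespace(prefix="w"))
after = resolve("w")
```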

OfficeDoc Config Settings

The class accepts one or more of the following configuration settings. Configuration settings are similar in functionality to properties, but they are rarely used. In order to avoid "polluting" the property namespace of the class, access to these internal properties is provided through the config method.

OfficeDoc Config Settings

NormalizePartName:   Whether to normalize Part Names.

Sometimes the Part Names retrieved from a document will be of a format that is not directly usable in the part_name property when retrieving the part. For example: /ppt/slides/../media/image1.jpeg

When this option is set to True, the class will automatically normalize these Part Names so that they can be used directly in the part_name property for retrieving the part. For example, the above would become: /ppt/media/image1.jpeg

The default is True.
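The effect of normalization can be reproduced with the standard library, as in the sketch below. This only illustrates the transformation on the example path; it is not a description of how the class implements it, and normalize_part_name is a hypothetical helper.

```python
import posixpath


def normalize_part_name(name):
    """Collapse '.' and '..' segments in a part name (illustrative only)."""
    return posixpath.normpath(name)


print(normalize_part_name("/ppt/slides/../media/image1.jpeg"))
# prints: /ppt/media/image1.jpeg
```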

RelationshipIsExternal[x]:   Whether the relationship part is internal or external.

Some relationships in an Office document may be external items, such as URLs and files on disk. These relationships are not directly accessible via extract_part. This configuration option will return "1" if the relationship at index "x" of the Relationship* arrays is an external part. Otherwise, it will return "0".
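A small helper can turn the "1"/"0" string into a boolean. The sketch below assumes the class's config method takes the setting string and returns its value as a string; is_external and fake_config are hypothetical, and fake_config merely stands in for that method so the example runs on its own.

```python
def is_external(config, index):
    """Return True when the relationship at index is reported as external.

    config is any callable that accepts a setting string and returns the
    setting's value as a string (assumed to match the class's config method).
    """
    return config(f"RelationshipIsExternal[{index}]") == "1"


# Hypothetical stand-in for the config method of a loaded document.
def fake_config(setting):
    values = {
        "RelationshipIsExternal[0]": "0",
        "RelationshipIsExternal[1]": "1",
    }
    return values[setting]


external = [i for i in range(2) if is_external(fake_config, i)]
```

Relationships flagged this way can then be skipped before calling extract_part.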

Base Config Settings

BuildInfo:   Information about the product's build.

When queried, this setting will return a string containing information about the product's build.

CodePage:   The system code page used for Unicode to Multibyte translations.

The default code page is Unicode UTF-8 (65001).

The following is a list of valid code page identifiers:

Identifier   Name
037   IBM EBCDIC - U.S./Canada
437   OEM - United States
500   IBM EBCDIC - International
708   Arabic - ASMO 708
709   Arabic - ASMO 449+, BCON V4
710   Arabic - Transparent Arabic
720   Arabic - Transparent ASMO
737   OEM - Greek (formerly 437G)
775   OEM - Baltic
850   OEM - Multilingual Latin I
852   OEM - Latin II
855   OEM - Cyrillic (primarily Russian)
857   OEM - Turkish
858   OEM - Multilingual Latin I + Euro symbol
860   OEM - Portuguese
861   OEM - Icelandic
862   OEM - Hebrew
863   OEM - Canadian-French
864   OEM - Arabic
865   OEM - Nordic
866   OEM - Russian
869   OEM - Modern Greek
870   IBM EBCDIC - Multilingual/ROECE (Latin-2)
874   ANSI/OEM - Thai (same as 28605, ISO 8859-15)
875   IBM EBCDIC - Modern Greek
932   ANSI/OEM - Japanese, Shift-JIS
936   ANSI/OEM - Simplified Chinese (PRC, Singapore)
949   ANSI/OEM - Korean (Unified Hangul Code)
950   ANSI/OEM - Traditional Chinese (Taiwan; Hong Kong SAR, PRC)
1026   IBM EBCDIC - Turkish (Latin-5)
1047   IBM EBCDIC - Latin 1/Open System
1140   IBM EBCDIC - U.S./Canada (037 + Euro symbol)
1141   IBM EBCDIC - Germany (20273 + Euro symbol)
1142   IBM EBCDIC - Denmark/Norway (20277 + Euro symbol)
1143   IBM EBCDIC - Finland/Sweden (20278 + Euro symbol)
1144   IBM EBCDIC - Italy (20280 + Euro symbol)
1145   IBM EBCDIC - Latin America/Spain (20284 + Euro symbol)
1146   IBM EBCDIC - United Kingdom (20285 + Euro symbol)
1147   IBM EBCDIC - France (20297 + Euro symbol)
1148   IBM EBCDIC - International (500 + Euro symbol)
1149   IBM EBCDIC - Icelandic (20871 + Euro symbol)
1200   Unicode UCS-2 Little-Endian (BMP of ISO 10646)
1201   Unicode UCS-2 Big-Endian
1250   ANSI - Central European
1251   ANSI - Cyrillic
1252   ANSI - Latin I
1253   ANSI - Greek
1254   ANSI - Turkish
1255   ANSI - Hebrew
1256   ANSI - Arabic
1257   ANSI - Baltic
1258   ANSI/OEM - Vietnamese
1361   Korean (Johab)
10000   MAC - Roman
10001   MAC - Japanese
10002   MAC - Traditional Chinese (Big5)
10003   MAC - Korean
10004   MAC - Arabic
10005   MAC - Hebrew
10006   MAC - Greek I
10007   MAC - Cyrillic
10008   MAC - Simplified Chinese (GB 2312)
10010   MAC - Romania
10017   MAC - Ukraine
10021   MAC - Thai
10029   MAC - Latin II
10079   MAC - Icelandic
10081   MAC - Turkish
10082   MAC - Croatia
12000   Unicode UCS-4 Little-Endian
12001   Unicode UCS-4 Big-Endian
20000   CNS - Taiwan
20001   TCA - Taiwan
20002   Eten - Taiwan
20003   IBM5550 - Taiwan
20004   TeleText - Taiwan
20005   Wang - Taiwan
20105   IA5 IRV International Alphabet No. 5 (7-bit)
20106   IA5 German (7-bit)
20107   IA5 Swedish (7-bit)
20108   IA5 Norwegian (7-bit)
20127   US-ASCII (7-bit)
20261   T.61
20269   ISO 6937 Non-Spacing Accent
20273   IBM EBCDIC - Germany
20277   IBM EBCDIC - Denmark/Norway
20278   IBM EBCDIC - Finland/Sweden
20280   IBM EBCDIC - Italy
20284   IBM EBCDIC - Latin America/Spain
20285   IBM EBCDIC - United Kingdom
20290   IBM EBCDIC - Japanese Katakana Extended
20297   IBM EBCDIC - France
20420   IBM EBCDIC - Arabic
20423   IBM EBCDIC - Greek
20424   IBM EBCDIC - Hebrew
20833   IBM EBCDIC - Korean Extended
20838   IBM EBCDIC - Thai
20866   Russian - KOI8-R
20871   IBM EBCDIC - Icelandic
20880   IBM EBCDIC - Cyrillic (Russian)
20905   IBM EBCDIC - Turkish
20924   IBM EBCDIC - Latin-1/Open System (1047 + Euro symbol)
20932   JIS X 0208-1990 & 0121-1990
20936   Simplified Chinese (GB2312)
21025   IBM EBCDIC - Cyrillic (Serbian, Bulgarian)
21027   Extended Alpha Lowercase
21866   Ukrainian (KOI8-U)
28591   ISO 8859-1 Latin I
28592   ISO 8859-2 Central Europe
28593   ISO 8859-3 Latin 3
28594   ISO 8859-4 Baltic
28595   ISO 8859-5 Cyrillic
28596   ISO 8859-6 Arabic
28597   ISO 8859-7 Greek
28598   ISO 8859-8 Hebrew
28599   ISO 8859-9 Latin 5
28605   ISO 8859-15 Latin 9
29001   Europa 3
38598   ISO 8859-8 Hebrew
50220   ISO 2022 Japanese with no halfwidth Katakana
50221   ISO 2022 Japanese with halfwidth Katakana
50222   ISO 2022 Japanese JIS X 0201-1989
50225   ISO 2022 Korean
50227   ISO 2022 Simplified Chinese
50229   ISO 2022 Traditional Chinese
50930   Japanese (Katakana) Extended
50931   US/Canada and Japanese
50933   Korean Extended and Korean
50935   Simplified Chinese Extended and Simplified Chinese
50936   Simplified Chinese
50937   US/Canada and Traditional Chinese
50939   Japanese (Latin) Extended and Japanese
51932   EUC - Japanese
51936   EUC - Simplified Chinese
51949   EUC - Korean
51950   EUC - Traditional Chinese
52936   HZ-GB2312 Simplified Chinese
54936   Windows XP: GB18030 Simplified Chinese (4 Byte)
57002   ISCII Devanagari
57003   ISCII Bengali
57004   ISCII Tamil
57005   ISCII Telugu
57006   ISCII Assamese
57007   ISCII Oriya
57008   ISCII Kannada
57009   ISCII Malayalam
57010   ISCII Gujarati
57011   ISCII Punjabi
65000   Unicode UTF-7
65001   Unicode UTF-8

The following is a list of valid code page identifiers for Mac OS only:

Identifier   Name
1   ASCII
2   NEXTSTEP
3   JapaneseEUC
4   UTF8
5   ISOLatin1
6   Symbol
7   NonLossyASCII
8   ShiftJIS
9   ISOLatin2
10   Unicode
11   WindowsCP1251
12   WindowsCP1252
13   WindowsCP1253
14   WindowsCP1254
15   WindowsCP1250
21   ISO2022JP
30   MacOSRoman
10   UTF16String
0x90000100   UTF16BigEndian
0x94000100   UTF16LittleEndian
0x8c000100   UTF32String
0x98000100   UTF32BigEndian
0x9c000100   UTF32LittleEndian
65536   Proprietary
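To see what the choice of code page changes in practice, the sketch below encodes the same character under two of the identifiers listed above. The mapping from identifiers to Python codec names is a hypothetical convenience for this illustration and is independent of the class.

```python
# Hypothetical mapping from a few code page identifiers to the names of
# the corresponding Python codecs.
CODEC_FOR_CODE_PAGE = {
    437: "cp437",
    1252: "cp1252",
    20127: "ascii",
    28591: "iso8859-1",
    65001: "utf-8",
}


def encode_with_code_page(text, code_page):
    """Encode text using the Python codec mapped to the given code page."""
    return text.encode(CODEC_FOR_CODE_PAGE[code_page])


# U+00E9 (e with acute accent) is one byte in 1252 and two bytes in UTF-8.
utf8_bytes = encode_with_code_page("\u00e9", 65001)
latin_bytes = encode_with_code_page("\u00e9", 1252)
```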

LicenseInfo:   Information about the current license.

When queried, this setting will return a string containing information about the license this instance of a class is using. It will return the following information:

  • Product: The product the license is for.
  • Product Key: The key the license was generated from.
  • License Source: Where the license was found (e.g., RuntimeLicense, License File).
  • License Type: The type of license installed (e.g., Royalty Free, Single Server).
  • Last Valid Build: The last valid build number for which the license will work.

MaskSensitive:   Whether sensitive data is masked in log messages.

In certain circumstances it may be beneficial to mask sensitive data, like passwords, in log messages. Set this to True to mask sensitive data. The default is True.

This setting only works on these classes: AS3Receiver, AS3Sender, Atom, Client(3DS), FTP, FTPServer, IMAP, OFTPClient, SSHClient, SCP, Server(3DS), Sexec, SFTP, SFTPServer, SSHServer, TCPClient, TCPServer.

ProcessIdleEvents:   Whether the class uses its internal event loop to process events when the main thread is idle.

If set to False, the class will not fire internal idle events. Set this to False to use the class in a background thread on Mac OS. By default, this setting is True.

SelectWaitMillis:   The length of time in milliseconds the class will wait when do_events is called if there are no events to process.

If there are no events to process when do_events is called, the class will wait for the amount of time specified here before returning. The default value is 20.

UseInternalSecurityAPI:   Whether to use the system security libraries or an internal implementation.

When set to False, the class will use the system security libraries by default to perform cryptographic functions where applicable.

Setting this configuration setting to True tells the class to use the internal implementation instead of using the system security libraries.

On Windows, this setting is set to False by default. On Linux/macOS, this setting is set to True by default.

To use the system security libraries for Linux, OpenSSL support must be enabled. For more information on how to enable OpenSSL, please refer to the OpenSSL Notes section.

OfficeDoc Errors

Errors

The following errors may be generated by the class. Note that frequently the error message will contain more specific information than what is listed here.

Note that some non-fatal errors may be trapped and explicitly ignored in the on_error event. This will allow the class to continue operation even in case of error.

OfficeDoc Errors

268   The specified part name could not be found in the package, or the part is stored in interleaved format.