XML Class

The XML class can be used to both parse and create XML documents.

Syntax

class ipworks.XML

Remarks

The XML class can operate as either a parser or a writer of XML.

Parsing XML

The XML class parses XML documents and verifies that they are well-formed. The results are provided through a set of events complying with the SAX2 specification.
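The event-driven model described above can be illustrated with Python's standard-library xml.sax module, which follows the same SAX2 pattern. This is only a sketch: it uses the standard library as a stand-in rather than the XML class itself, and the RecordingHandler class, the record_events helper, and the sample document are assumptions made for illustration.

```python
import xml.sax


class RecordingHandler(xml.sax.ContentHandler):
    """Collects SAX events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Fired when a begin-element tag is encountered.
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Text may arrive in several chunks; whitespace-only chunks are skipped.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        # Fired when an end-element tag is encountered.
        self.events.append(("end", name))


def record_events(document: bytes):
    """Parse the document and return the recorded events."""
    handler = RecordingHandler()
    # Raises xml.sax.SAXParseException if the input is not well-formed.
    xml.sax.parseString(document, handler)
    return handler.events


sample = b'<order id="7"><item>tea</item></order>'
events = record_events(sample)
```

A handler of this shape receives the data as it is parsed; nothing has to be kept in memory beyond what the handler chooses to store.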

In addition, the document structure may be queried through an x_path mechanism that supports a subset of the XPath specification.

The parser is optimized for read applications, with a very fast engine that builds internal DOM structures with close to zero heap allocations. Additionally, build_dom can be set to False, which avoids the overhead of creating the DOM and provides a fast forward-only parser that fires events to deliver the parsed data.

When parsing a document, events fire to provide information about the parsed data. After parse returns, the document may be navigated by setting x_path if build_dom is True (the default). If build_dom is False, parsed data is accessible only through the events.
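The trade-off between building a tree and forward-only parsing can be sketched with the standard-library xml.etree.ElementTree module. This is an analogy, not the XML class: ET.fromstring plays the role of parsing with a DOM, ET.iterparse plays the role of event-only parsing, and the sample data and the stream_entries helper are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

sample = b"<log><entry>a</entry><entry>b</entry><entry>c</entry></log>"

# Tree mode: the whole document is kept in memory and can be
# navigated freely after parsing has finished.
root = ET.fromstring(sample)
second = root.findall("entry")[1].text


def stream_entries(data: bytes):
    """Forward-only mode: handle each element as it completes, then drop it."""
    seen = []
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "entry":
            seen.append(elem.text)
            elem.clear()  # release the element's content once handled
    return seen


streamed = stream_entries(sample)
```

The tree form allows random access after parsing; the streaming form sees each element once, in document order, which is why navigation is unavailable in that mode.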

Events are fired only when qualifying conditions (for example, the beginning of a new element) are met. In the meantime, text will be buffered internally. The following events will fire during parsing:

If build_dom is True (the default), x_path may be set after parse returns. x_path implements a subset of the XML XPath specification, allowing you to point to specific elements in the XML document.

The path is a series of one or more element accessors separated by '/'. The path can be absolute (starting with '/') or relative to the current x_path location.

The following are possible values for an element accessor:

'name'                       A particular element name
name[i]                      The i-th subelement of the current element with the given name
[i]                          The i-th subelement of the current element
[last()]                     The last subelement of the current element
[last()-i]                   The subelement located at the last location minus i in the current element
name[@attrname="attrvalue"]  The subelement containing a particular value for a given attribute (supports single and double quotes)
..                           The parent of the current element
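To make the accessor forms above concrete, the following sketch resolves such paths against a standard-library ElementTree. It is not the class's implementation: the resolve and _select functions and the sample document are hypothetical, indices are assumed to be 1-based as in XPath, and splitting on '/' means attribute values containing '/' are not handled.

```python
import re
import xml.etree.ElementTree as ET

_STEP = re.compile(r"^(?P<name>[^\[\]]*)(?:\[(?P<pred>[^\]]+)\])?$")
_ATTR = re.compile(r"""^@(?P<attr>[\w:.-]+)=(?P<q>['"])(?P<val>.*)(?P=q)$""")
_LAST = re.compile(r"^last\(\)(?:-(?P<off>\d+))?$")


def _select(node, step):
    """Pick one child of node according to a single element accessor."""
    match = _STEP.match(step)
    if match is None:
        raise ValueError(f"unsupported accessor: {step!r}")
    name, pred = match.group("name"), match.group("pred")
    candidates = [c for c in node if not name or c.tag == name]
    index = 0  # a plain name selects the first matching child
    if pred is not None:
        last, attr = _LAST.match(pred), _ATTR.match(pred)
        if pred.isdigit():
            index = int(pred) - 1  # assumption: 1-based, as in XPath
        elif last:
            index = len(candidates) - 1 - int(last.group("off") or 0)
        elif attr:
            candidates = [c for c in candidates
                          if c.get(attr.group("attr")) == attr.group("val")]
        else:
            raise ValueError(f"unsupported predicate: {pred!r}")
    if not 0 <= index < len(candidates):
        raise LookupError(f"no element matches {step!r}")
    return candidates[index]


def resolve(root, path, current=None):
    """Resolve an absolute path, or a path relative to current."""
    doc = ET.Element("#document")  # synthetic node above the root element
    doc.append(root)
    parents = {child: parent for parent in doc.iter() for child in parent}
    node = doc if path.startswith("/") or current is None else current
    for step in filter(None, path.split("/")):
        node = parents[node] if step == ".." else _select(node, step)
    return node


root = ET.fromstring(
    '<root><SubElement1 id="a"><x>1</x></SubElement1>'
    '<SubElement1 id="b"><x>2</x><y>3</y></SubElement1><other/></root>'
)
x_in_second = resolve(root, "/root/SubElement1[2]/x")
sibling_y = resolve(root, "../y", current=x_in_second)
```

Here x_in_second is reached by an absolute path, and sibling_y by a path relative to it, mirroring the two path styles described above.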

When x_path is set to a valid path, x_element points to the name of the element, with x_parent, x_namespace, x_prefix, x_child_name, x_child_namespace, x_child_prefix, x_child_x_text, and x_text providing other properties of the element. The attributes of the current element are provided in the attr_name, attr_namespace, attr_prefix, and attr_value properties.

build_dom must be set to True prior to parsing the document for the x_path functionality to be available.

Example (Setting XPath):

Document root     XML.XPath = "/"
Specific Element  XML.XPath = "/root/SubElement1/SubElement2/"
i-th Child        XML.XPath = "/root/SubElement1[i]"

Input Properties

The class will determine the source of the input based on which properties are set.

The order in which the input properties are checked is as follows:

When a valid source is found the search stops.

When parsing multiple documents, call reset between documents to reset the parser.

An additional "relaxed" mode allows for lexical parsing of non-XML documents (e.g. HTML). This is enabled by setting validate to False. In this case, events will be fired as elements, entities, etc. are encountered, but the structure of the document will not be checked for "well-formedness", and the internal DOM structure will not be built.
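The idea of lexical parsing without a well-formedness check can be illustrated with the standard-library html.parser module, which reports tags and text as it encounters them and never verifies that tags are balanced. This is a stand-in for the relaxed mode, not the XML class; the LexicalScanner class, the scan helper, and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser


class LexicalScanner(HTMLParser):
    """Reports tags and text as encountered; document structure is not checked."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only text segments.
        if data.strip():
            self.events.append(("text", data.strip()))


def scan(markup: str):
    """Feed the markup through the scanner and return the recorded events."""
    scanner = LexicalScanner()
    scanner.feed(markup)
    scanner.close()  # flush any text still buffered
    return scanner.events


# Unclosed <p> and <br> tags would be rejected by a well-formedness check.
events = scan("<p>one<br>two<p>three")
```

The input above is not well-formed XML, yet every tag and text segment is still reported in order.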

Writing XML

To use the class, first decide whether to write to a file (output_file) or to output_data.

Output Properties

The class will determine the destination of the output based on which properties are set.

The order in which the output properties are checked is as follows:

To begin writing the XML document, optionally set xml_declaration first. If this is not set, the class will use a default XML declaration at the beginning of the document.

Next, begin adding elements to your document. Calling start_element opens an element with the specified name. To create a nested structure, continue calling start_element to open more child elements. To write a value within an element, call put_string. To close the element that was last opened, call end_element. Each time end_element is called, the element at the current level is closed. Alternatively, calling put_element writes the specified element with the specified value and also closes the element.

To write an attribute on the current element, call put_attr after calling start_element. Call put_attr multiple times to add multiple attributes.

Writing comments or CDATA can be done at any time with the put_comment and put_c_data methods.

To close your XML document, call save. You can call save at any point, and it will close any remaining open elements automatically.
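The writing sequence above can be sketched with a toy stack-based writer. SketchWriter is a hypothetical stand-in that borrows the method names from this page to show the order of calls; it is not the XML class, its constructor, return values, and fixed XML declaration are assumptions, and the real class writes to output_file or output_data rather than returning a string.

```python
from xml.sax.saxutils import escape, quoteattr


class SketchWriter:
    """Toy writer that tracks open elements on a stack (illustration only)."""

    def __init__(self):
        self._parts = ['<?xml version="1.0"?>']  # assumed default declaration
        self._open = []             # names of elements not yet closed
        self._tag_pending = False   # True while the start tag accepts attributes

    def _finish_tag(self):
        if self._tag_pending:
            self._parts.append(">")
            self._tag_pending = False

    def start_element(self, name):
        self._finish_tag()
        self._parts.append(f"<{name}")
        self._open.append(name)
        self._tag_pending = True

    def put_attr(self, name, value):
        if not self._tag_pending:
            raise ValueError("put_attr must follow start_element directly")
        self._parts.append(f" {name}={quoteattr(value)}")

    def put_string(self, text):
        self._finish_tag()
        self._parts.append(escape(text))

    def end_element(self):
        self._finish_tag()
        self._parts.append(f"</{self._open.pop()}>")

    def put_element(self, name, value):
        # Open, write the value, and close in one call.
        self.start_element(name)
        self.put_string(value)
        self.end_element()

    def save(self):
        # Close anything still open, then return the finished document.
        while self._open:
            self.end_element()
        return "".join(self._parts)


writer = SketchWriter()
writer.start_element("order")
writer.put_attr("id", "7")
writer.start_element("item")
writer.put_string("tea & milk")
writer.end_element()
writer.put_element("qty", "2")
document = writer.save()  # "order" is still open and is closed here
```

Deferring the closing '>' of a start tag until the next write is one simple way to let attribute calls follow start_element, which is the ordering the text above describes.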

Property List


The following is the full list of the properties of the class with short descriptions. Click on the links for further details.

build_dom                   When True, an internal object model of the XML document is created.
input_data                  The XML data to parse.
input_file                  The file to process.
namespace_count             The number of records in the Namespace arrays.
namespace_prefix            The Prefix for the Namespace.
namespace_uri               Namespace URI associated with the corresponding Prefix.
output_data                 The output XML after processing.
output_file                 The path to a local file where the output will be written.
overwrite                   Indicates whether or not the class should overwrite files.
validate                    When True, the parser checks that the document consists of well-formed XML.
attr_count                  The number of records in the Attr arrays.
attr_name                   The Name provides the local name (without prefix) of the attribute.
attr_namespace              Attribute namespace.
attr_prefix                 Attribute prefix (if any).
attr_value                  Attribute value.
x_child_count               The number of records in the XChild arrays.
x_child_name                The Name property provides the local name (without prefix) of the element.
x_child_namespace           Namespace of the element.
x_child_prefix              Prefix of the element (if any).
x_child_x_text              The inner text of the element.
x_comment_count             The number of records in the XComment arrays.
x_comment_text              This property holds the comment text.
x_element                   The name of the current element.
xml_declaration_encoding    This property specifies the XML encoding to use.
xml_declaration_standalone  This property indicates whether the standalone attribute is present in the declaration with a value of true.
xml_declaration_version     This property specifies the XML version.
x_namespace                 The namespace of the current element.
x_parent                    The parent of the current element.
x_path                      Provides a way to point to a specific element in the document.
x_prefix                    The prefix of the current element.
x_sub_tree                  A snapshot of the current element in the document.
x_text                      The text of the current element.

Method List


The following is the full list of the methods of the class with short descriptions. Click on the links for further details.

config           Sets or retrieves a configuration setting.
end_element      Writes the closing tag of an open XML element.
flush            Flushes the parser and checks its end state.
get_attr         Returns the value of the specified attribute.
has_x_path       Determines whether a specific element exists in the document.
load_dom         Loads the DOM from a file.
load_schema      Loads the XML schema.
parse            This method parses the specified XML data.
put_attr         Writes an XML attribute.
put_c_data       Writes an XML CDATA block.
put_comment      Writes an XML comment block.
put_element      Writes a simple XML element with a value.
put_raw          Writes a raw XML fragment.
put_string       Writes text inside an XML element.
remove_attr      Removes an attribute.
remove_children  Removes the children of the element at the specified XPath.
remove_element   Removes the element at the specified XPath.
reset            Resets the parser.
save             Closes the class writing stream.
save_dom         Saves the DOM to a file.
start_element    Writes the opening tag of an XML element.
try_x_path       Navigates to the specified XPath if it exists.

Event List


The following is the full list of the events fired by the class with short descriptions. Click on the links for further details.

on_characters            Fired for plain text segments of the input stream.
on_comment               Fired when a comment section is encountered.
on_end_element           Fired when an end-element tag is encountered.
on_end_prefix_mapping    Fired when leaving the scope of a namespace declaration.
on_error                 Information about errors during data delivery.
on_eval_entity           Fired every time an entity needs to be evaluated.
on_ignorable_whitespace  Fired when a section of ignorable whitespace is encountered.
on_meta                  Fired when a meta section is encountered.
on_pi                    Fired when a processing instruction section is encountered.
on_special_section       Fired when a special section is encountered.
on_start_element         Fired when a begin-element tag is encountered in the document.
on_start_prefix_mapping  Fired when entering the scope of a namespace declaration.
on_xml                   Fires as XML is written.

Configuration Settings


The following is a list of configuration settings for the class with short descriptions. Click on the links for further details.

CacheContent                If true, the original XML is saved in a buffer.
Charset                     Specifies the charset used when encoding data.
EOL                         The characters to use for separating lines.
ErrorOnEmptyAttr            If true, passing an invalid attribute to the Attr method will throw an exception.
ExtraNameChars              Extra characters for the parser to consider as name characters.
ExtraSpaceChars             Extra characters for the parser to consider as white space.
FlushOnEOL                  If set, the parser flushes its text buffer after every line of text.
IgnoreBadAttributePrefixes  If true, bad (unknown) attribute prefixes are ignored.
IgnoreBadElementPrefixes    If true, bad (unknown) element prefixes are ignored.
IncludeElementPrefix        Whether to include the prefix in the element name.
IncludeXMLDeclaration       Whether to include the XML declaration when writing XML.
Indent                      The characters to use for each indentation level.
Offset                      Current offset of the document being parsed.
PreserveWhitespace          If true, leading and trailing whitespace in element text is preserved.
QuoteChar                   Quote character to use for attribute values.
StringProcessingOptions     Defines options to use when processing string values.
BuildInfo                   Information about the product's build.
CodePage                    The system code page used for Unicode to Multibyte translations.
LicenseInfo                 Information about the current license.
ProcessIdleEvents           Whether the class uses its internal event loop to process events when the main thread is idle.
SelectWaitMillis            The length of time in milliseconds the class will wait when DoEvents is called if there are no events to process.
UseInternalSecurityAPI      Tells the class whether or not to use the system security libraries or an internal implementation.

Copyright (c) 2022 /n software inc. - All rights reserved.
IPWorks 2020 Python Edition - Version 20.0 [Build 8307]