IPWorks EDIFACT 2020 Python Edition

EDI Translation Introduction

EDI Documents: Segments

Segments are the primary structure of EDI documents. Each segment contains a specific set of known (expected) data. In many cases, a segment is represented as a single line. Segments are separated by a segment delimiter. A line feed is a common segment delimiter in EDI documents, but other characters can be used.

Each segment starts with a tag that identifies the kind of data the segment contains. However, tags are not unique, and the context will determine the kind of segment the tag represents. Examples of tags are UNB, UNZ, UNG, etc. Here's an example segment in which the tag of the segment is BGM and the "'" character is the segment delimiter:

BGM+220++4768+9'
A segment can be further decomposed into 0 or more data elements. The sample segment above contains 4 data elements (delimited by '+'). Elements are located within a segment by their position, which is why empty elements must be represented with successive plus signs as shown in the example above. However, if an element is optional (doesn't require a value) and no later element has a value, it may be omitted. In other words, a segment that contains 4 data elements in one document may contain fewer in another when its trailing data elements are empty.

In some cases, a data element can itself be a complex structure, split into components that are separated by a different delimiter. For example:

DTM+137:20000101:102'
This sample EDIFACT segment uses a single quote (') as the segment delimiter, '+' as the element delimiter, and ':' as the component delimiter, so it has a single data element with 3 components.
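
The decomposition described above can be sketched in plain Python. This is only an illustration of the segment, element, and component layers, not the library's parser: the function name split_segment is hypothetical, the delimiters are assumed to be the ones used in the samples, and escape characters are ignored.

```python
def split_segment(segment, element_sep="+", component_sep=":", terminator="'"):
    """Split one EDIFACT segment into its tag and a list of elements.

    Each element is returned as a list of components; a simple element
    is a list with one item. Escape characters are not handled here.
    """
    body = segment.strip()
    if body.endswith(terminator):
        body = body[: -len(terminator)]
    tag, *elements = body.split(element_sep)
    return tag, [element.split(component_sep) for element in elements]


tag, elements = split_segment("BGM+220++4768+9'")
print(tag, elements)  # BGM [['220'], [''], ['4768'], ['9']]

tag, elements = split_segment("DTM+137:20000101:102'")
print(tag, elements)  # DTM [['137', '20000101', '102']]
```

Note that the empty second element of the BGM segment is kept as an empty string, so the remaining elements stay at their expected positions.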

EDI Documents: Structure

As described previously, the structure of entire EDI documents is made out of segments, with header/footer segment pairs forming nested envelopes around the data. These envelopes are well known and organized as follows:

  • All EDI Interchanges (documents) contain interchange header and footer segments. This is the outer envelope. The interchange header segment contains information such as an interchange ID, where it comes from, and where it should go. The interchange footer segment contains information that allows you to validate that the interchange is complete.
  • An EDI Transaction (commonly called a message in EDIFACT) contains the actual EDI data. Again, each transaction contains header and footer segments that identify the type of transaction (for example, an invoice or a purchase order), the number of segments, etc.
  • An Interchange can contain EDI transactions directly, but more commonly transactions are grouped inside functional groups. Each functional group contains its own header and footer segments. The information contained in the group header and footer depends on the specification, but it generally identifies the type of transactions contained in the group and the specification version used.

In EDIFACT, the header and footer segments for interchanges, functional groups, and transactions have known tags. Below is a table that identifies the Segment Type and its corresponding tag in EDIFACT:

Segment Type EDIFACT Tag
Service String Advice UNA
Interchange Header UNB
Functional Group Header UNG
Transaction Header UNH
Transaction Footer UNT
Functional Group Footer UNE
Interchange Footer UNZ
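
As a rough illustration of how these header and footer tags pair up, the sketch below checks that every envelope opened in a list of tags is closed in the right order. It is a simplified, hypothetical helper (check_envelopes is not part of the product) and it only checks pairing; it does not verify, for example, that a transaction sits inside an interchange.

```python
# Header tag -> matching footer tag, as listed in the table above.
ENVELOPES = {"UNB": "UNZ", "UNG": "UNE", "UNH": "UNT"}
FOOTERS = set(ENVELOPES.values())


def check_envelopes(tags):
    """Return True if header/footer tags are properly nested and closed."""
    open_footers = []  # stack of footers that are still expected
    for tag in tags:
        if tag in ENVELOPES:
            open_footers.append(ENVELOPES[tag])
        elif tag in FOOTERS:
            # The footer must close the most recently opened envelope.
            if not open_footers or open_footers.pop() != tag:
                return False
    return not open_footers


print(check_envelopes(["UNB", "UNG", "UNH", "BGM", "UNT", "UNE", "UNZ"]))  # True
print(check_envelopes(["UNB", "UNH", "BGM", "UNZ"]))  # False
```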

Note: The Service String Advice segment only occurs in EDIFACT documents and may appear at the start of the document. It serves to let the parser determine the delimiters that will be used in the rest of the document. Here's an example of a Service String Advice Segment:

UNA:+.? '
This specifies that the component delimiter will be ':', the element delimiter will be '+', the escape character will be '?' and the segment delimiter will be the single quote ('). The '.' is the decimal point character, and the space ' ' is reserved for future use.
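
The positional layout of the Service String Advice can be read with a few lines of Python. The sketch below is illustrative only: Delimiters and read_delimiters are hypothetical names, and falling back to the characters from the example above when no UNA segment is present is an assumption of this sketch.

```python
from typing import NamedTuple


class Delimiters(NamedTuple):
    component: str
    element: str
    decimal: str
    escape: str
    reserved: str
    segment: str


# Assumed fallback: the same characters as in the UNA example above.
DEFAULTS = Delimiters(":", "+", ".", "?", " ", "'")


def read_delimiters(document):
    """Return the delimiters declared by a leading UNA segment, if any."""
    if document.startswith("UNA") and len(document) >= 9:
        # The six characters after the tag are read by position.
        return Delimiters(*document[3:9])
    return DEFAULTS


delims = read_delimiters("UNA:+.? 'UNB+UNOA:1")
print(delims.component, delims.element, delims.escape, delims.segment)  # : + ? '
```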

Parsing Transactions: Schemas

Within a transaction, segments can only be interpreted with prior knowledge of the structure of the transaction. In other words, if you see an arbitrary segment within a transaction (for example, an N1 segment within an X12 purchase order document (850)), there's no way to tell what it means or how it should look without knowing where in the transaction it is located. Because of this, parsing an EDI document requires a schema for the transaction being parsed, which determines the order in which segments should appear. Note that within EDI documents, segments are not unique: a schema can specify that a given segment is optional, is required but appears only once, or repeats so that it appears one or more times in a row.

A segment group is a series of segments with known tags that repeats as a unit. This is the equivalent of a loop in ANSI X12. For example, an EDIFACT purchase order document (ORDERS_D97A) needs to contain the items that are being ordered. Each item is represented as a set of segments made up of LIN, PIA, QTY, and DTM tags, where only LIN and QTY are required. Consider this part of a document:

LIN+1++28040336:MF::90
PIA+1+12345678:BP::92
QTY+21:10
DTM+10:20090515:102

LIN+2++28040357:MF::90
QTY+2:10
This sample contains 2 instances of the same segment group, one for each of 2 different items. The first instance of the segment group has 4 segments (LIN, PIA, QTY, and DTM), which contain the information for the first item in the order form. The second instance of the segment group has 2 segments (LIN and QTY), which contain the information for the second (and last) item in the order form. Notice that since the PIA and DTM segments are not required, they are left out of the second instance of the segment group.

It is important to recognize that, in the schema, the item segment group is defined only once, even though it can appear more than once in an EDI document. Also, in an EDIFACT ORDERS_D97A document for example, there can be other segment groups in other parts of the document with different meanings and different structures.
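
To make the idea of repeating group instances concrete, the following sketch collects the sample segments into one list per item. It is a simplification: a real parser learns the group layout from the schema, whereas here the trigger tag and member tags are hard-coded, and group_items is a hypothetical name.

```python
def group_items(segments, trigger="LIN", members=("PIA", "QTY", "DTM")):
    """Collect flat segments into one list per instance of the item group.

    A new instance starts at every trigger segment; member segments are
    attached to the most recent instance. Other segments are ignored.
    """
    groups = []
    for segment in segments:
        tag = segment.split("+", 1)[0]
        if tag == trigger:
            groups.append([segment])
        elif tag in members and groups:
            groups[-1].append(segment)
    return groups


segments = [
    "LIN+1++28040336:MF::90",
    "PIA+1+12345678:BP::92",
    "QTY+21:10",
    "DTM+10:20090515:102",
    "LIN+2++28040357:MF::90",
    "QTY+2:10",
]
groups = group_items(segments)
print([len(group) for group in groups])  # [4, 2]
```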

Parsing Transactions: Loading and Compiling Schemas

There is no single official standard for EDI schemas, and the components were designed with this in mind. Rather than requiring a custom schema format or a single schema type, the components support multiple common schema formats. This allows for easier integration with existing EDI processing applications.

The following schema formats are supported:

0 (schemaAutomatic - default) The schema type is automatically determined based on file extension.
1 (schemaBinary) A binary schema that was previously compiled by calling on_compile_schema.
2 (schemaBizTalk) BizTalk (XSD): http://msdn.microsoft.com/en-us/library/aa559426(v=BTS.70).aspx
3 (schemaSEF) TIBCO Standard Exchange Format (SEF): https://docs.tibco.com/products/tibco-foresight-edisim-6-18-0
5 (schemaAltova) Altova: http://www.altova.com/
6 (schemaJSON - recommended) ArcESB JSON: https://arc.cdata.com/

Before loading or writing a document with the reader or writer components, a schema will need to be loaded using the LoadSchema() method. The LoadSchema() method can parse schemas and use them for basic validation and interpretation of EDI documents at runtime.

The components also support parsing an EDIFACT document without loading a schema. In this scenario you can traverse the document using the x_path property. Note that when using this approach the organization and loop structures of the segments within the transaction are not known.

The schemas can be very big (most are around 1-2MB in size) and parsing them can be memory intensive. For this reason, once a schema is loaded into a component instance, it is rendered into an internal representation and cached there until the next time it is needed. Designing your application to create and keep one instance of the component is a good way to avoid unnecessary schema loading.
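
One way to follow the advice about keeping a single instance is to hand out a shared object per schema. The sketch below uses a stand-in class rather than the real component, so StandInReader, reader_for, and the file name are all hypothetical; it only demonstrates the reuse pattern.

```python
from functools import lru_cache


class StandInReader:
    """Hypothetical stand-in for a component that caches a loaded schema."""

    load_count = 0  # counts how many times a schema was "loaded"

    def __init__(self, schema_path):
        StandInReader.load_count += 1  # pretend this is the expensive step
        self.schema_path = schema_path


@lru_cache(maxsize=None)
def reader_for(schema_path):
    """Return one shared reader per schema path, creating it on first use."""
    return StandInReader(schema_path)


first = reader_for("ORDERS_D97A.json")
second = reader_for("ORDERS_D97A.json")
print(first is second, StandInReader.load_count)  # True 1
```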

Compiled Schemas offer another form of optimization. The components have a CompileSchema method that takes the path to an EDI Schema file and generates a .BIN file that contains a binary representation of the file. You can then load these into the component using the LoadSchema method.

The EDIFACTReader Components

The class allows you to parse an incoming EDI document. This class works in two states, loading an entire document at once, or streaming portions of the document. These states are controlled by the BuildDOM property. By default BuildDOM is set to bdEntireDocument which parses the entire document at once, allowing you to use the x_path property to navigate the document.

To save memory for larger documents, you can choose to parse only sections of the document, instead of the entire document. When build_dom is set to per interchange (bdInterchange) or per transaction (bdTransaction), the respective section of the document will be available for use with x_path from within the corresponding Start and End events. Finally, you may choose to set build_dom to bdNone, which means no DOM will be built and all data will be available only through events; this mode uses very little memory. Below are example steps to parse an entire document:

  1. First, use on_load_schema to load a schema file into the class. (Only necessary when preserving document structure).
  2. Open an EDI document or stream by setting input via input_file or input_data and calling on_parse.
  3. If build_dom is set to bdEntireDocument, the events of the class will fire as the document is parsed, and x_path may be set to access any part of the document.

    If bdInterchange or bdTransaction is specified and on_input or on_parse_file is called, the entire document will be parsed, with only the specified section held in memory at any given time. This means that if you wish to set x_path to navigate within a section of the document, you will need to do so within the events of the class, so that the document is not processed further while you access the section. When parsing is complete, only the most recently parsed section will be available for use with x_path.

    If bdNone is specified, then all document information must be obtained through the events fired during parsing.
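
The memory trade-off behind per-transaction parsing can be illustrated without the component itself. The generator below is a plain-Python sketch, not the class's implementation: it holds only the current transaction in memory and hands it to the caller before reading further. The name iter_transactions and the sample segments are made up for illustration.

```python
def iter_transactions(segments):
    """Yield one transaction (UNH through UNT) at a time from a segment stream.

    Only the segments of the current transaction are held in memory.
    """
    current = None
    for segment in segments:
        tag = segment.split("+", 1)[0]
        if tag == "UNH":
            current = [segment]
        elif current is not None:
            current.append(segment)
            if tag == "UNT":
                yield current
                current = None


# Made-up sample data, already split into segments.
stream = iter([
    "UNB+UNOA:1+SENDER+RECEIVER+200101:1200+1",
    "UNH+1+ORDERS:D:97A:UN",
    "BGM+220++4768+9",
    "UNT+3+1",
    "UNH+2+ORDERS:D:97A:UN",
    "BGM+220++4769+9",
    "UNT+3+2",
    "UNZ+2+1",
])
for transaction in iter_transactions(stream):
    # Work with the section here, before the next one replaces it.
    print(len(transaction), transaction[0])
```

As with the events described above, each section has to be used inside the loop body, because it is discarded as soon as the next transaction starts.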

During parsing, the class performs basic validation of the incoming document. If validation fails, a warning is generated (fired as an event).

The XPath navigation is done through the x_path property. For example:

EDIReader.XPath = "/IX[1]/FG[1]/TX[2]/LINLoop1[1]/QTY[1]";
This example path means the following: select the first QTY segment within the first iteration of the LIN loop, within the second transaction of the first functional group in the first interchange.
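
To show how such a path breaks down into steps, here is a small sketch that splits it into (name, index) pairs. It only handles the indexed form used in the example above, and parse_path is a hypothetical helper rather than part of the class.

```python
import re

# One step looks like NAME[INDEX], for example TX[2].
STEP = re.compile(r"^(?P<name>\w+)\[(?P<index>\d+)\]$")


def parse_path(path):
    """Split a path such as /IX[1]/FG[1] into (name, index) pairs."""
    steps = []
    for part in path.strip("/").split("/"):
        match = STEP.match(part)
        if match is None:
            raise ValueError(f"unsupported step: {part!r}")
        steps.append((match["name"], int(match["index"])))
    return steps


print(parse_path("/IX[1]/FG[1]/TX[2]/LINLoop1[1]/QTY[1]"))
# [('IX', 1), ('FG', 1), ('TX', 2), ('LINLoop1', 1), ('QTY', 1)]
```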

You can also make use of XPath conditional statements to locate the first element which matches a name=value. For example, you could use the following XPath to locate the path of the first element within any LIN segment that has a name=QTY and value=20:

EDIReader.XPath = "IX[1]/FG[1]/TX[1]/LINLoop1[QTY='20']";

Note that a conditional statement searches the children, but not the grandchildren, of the element to which it is applied. For instance, in the above example the children of LIN will be searched, but the grandchildren will not.
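
The children-only rule can be illustrated with a toy data model. In the sketch below each loop iteration is a list of (tag, value, grandchildren) tuples; this structure and the first_match helper are assumptions made for the example and do not reflect how the class stores a document.

```python
def first_match(loops, child_tag, value):
    """Return the 1-based index of the first loop whose direct child matches.

    Each loop is a list of (tag, value, grandchildren) tuples; the
    grandchildren are deliberately never inspected.
    """
    for index, children in enumerate(loops, start=1):
        for tag, child_value, _grandchildren in children:
            if tag == child_tag and child_value == value:
                return index
    return None


loops = [
    [("LIN", "1", []), ("QTY", "10", [])],
    # Here QTY=20 exists only as a grandchild, so it is not found.
    [("LIN", "2", []), ("Group", "", [("QTY", "20", [])])],
    [("LIN", "3", []), ("QTY", "20", [])],
]
print(first_match(loops, "QTY", "20"))  # 3
```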

To display the structure of the parsed document use on_display_schema_info. This is helpful when deciding how to navigate the document.

The EDIFACTWriter Components

The class allows you to create an EDI document from scratch, one segment at a time. Here's how a document would normally be created:

  1. Call on_load_schema to load the necessary schemas for the transactions that will be used.
  2. Specify where to write the output document by setting the output_file property or calling the on_set_output_stream method, or do neither and read the result from the output_data property.
  3. Create a new interchange start segment using the on_start_interchange_header method and set its properties using on_write_element_string and on_write_component_string.
  4. To write a basic element value to the current location, call the on_write_element_string method. For complex element values, there are two possibilities, elements which are split into components, and elements which repeat. To write these complex element values, use the on_start_element and on_end_element methods, with on_write_component_string and on_repeat_element methods for writing the values. (Examples available below).
  5. Create a new functional group using on_start_functional_group_header and set its properties using on_write_element_string and on_write_component_string.
  6. Create a new transaction using on_start_transaction_header and set the properties for the header segment.
  7. Write all the data for the transaction by creating new data segments using on_start_segment and providing the path of the segment to create using the schema names of the loops and segments, like /N1Loop1/N1.
  8. Once you are done with the segment, call on_end_segment.
  9. Once you are done with the transaction, call on_create_transaction_footer.
  10. Once you are done with the functional group, call on_create_functional_group_footer.
  11. Once the interchange is complete, call on_create_interchange_footer.
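
What the writer ultimately produces for each segment is a delimited string. The sketch below builds such strings by hand to show how simple and composite elements are joined and how delimiter characters inside values are escaped. It assumes the delimiters from the earlier samples, and escape and build_segment are hypothetical helpers, not methods of the class.

```python
def escape(value, specials=":+?'", release="?"):
    """Prefix every delimiter character in a value with the release character."""
    return "".join(release + ch if ch in specials else ch for ch in value)


def build_segment(tag, elements):
    """Join a tag and its elements into one terminated segment.

    An element is either a string or a list of component strings.
    """
    parts = [tag]
    for element in elements:
        components = [element] if isinstance(element, str) else element
        parts.append(":".join(escape(c) for c in components))
    return "+".join(parts) + "'"


print(build_segment("BGM", ["220", "", "4768", "9"]))      # BGM+220++4768+9'
print(build_segment("DTM", [["137", "20000101", "102"]]))  # DTM+137:20000101:102'
```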

The EDIFACTTranslator Components

The class will convert a document from the format specified by input_format to the format specified by output_format. In practice this allows for converting to XML or JSON from EDI and vice versa.

Before translating from EDI to XML or JSON it is recommended to load a schema using the on_load_schema method. This ensures additional information can be included in the XML or JSON document. If a schema is specified the XML or JSON will include types and descriptions as element attributes which are useful for interpreting the data.

EDI elements may optionally be renamed when creating XML. To define how an element is renamed add a renaming rule by calling on_add_renaming_rule.

After calling on_translate, the resulting output will contain the EDI, XML, or JSON data as defined by output_format. If the output data is XML, the on_export_xml_schema method may be called to export a schema (.xsd) defining the structure of a valid XML document. XML documents that adhere to this schema may be translated from XML to EDI.
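
For a sense of what an EDI-to-JSON conversion involves, the sketch below turns a small fragment into a JSON array using only the standard library. The JSON layout shown here is invented for the example and is not the format produced by the class; the function names are hypothetical, and the delimiters are assumed to be the ones from the earlier samples.

```python
import json


def segment_to_dict(segment):
    """Convert one segment into a JSON-friendly dictionary."""
    tag, *elements = segment.split("+")
    return {"tag": tag, "elements": [e.split(":") for e in elements]}


def edi_to_json(document):
    """Translate a small EDI fragment into a JSON array of segments."""
    segments = [s.strip() for s in document.split("'") if s.strip()]
    return json.dumps([segment_to_dict(s) for s in segments])


print(edi_to_json("BGM+220++4768+9'DTM+137:20000101:102'"))
```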

Input and Output Properties

The class will determine the source and destination of the input and output based on which properties are set.

The order in which the input properties are checked is as follows:

  • input_file
  • input_data

When a valid source is found, the search stops.

The order in which the output properties are checked is as follows:

  • output_file
  • output_data: The output data is written to this property if no other destination is specified.
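
The lookup order for input can be pictured as a simple first-match search. The function below is a hypothetical illustration of that precedence, not the class's code, and the file name is made up.

```python
def resolve_input(input_file=None, input_data=None):
    """Return (source_name, value) for the first input property that is set."""
    for name, value in (("input_file", input_file), ("input_data", input_data)):
        if value:
            return name, value  # stop at the first valid source
    raise ValueError("no input source was set")


print(resolve_input(input_file="orders.edi", input_data="UNB+..."))
# ('input_file', 'orders.edi')
```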

Copyright (c) 2022 /n software inc. - All rights reserved.
IPWorks EDIFACT 2020 Python Edition - Version 20.0 [Build 8209]