EDI Translation Introduction
EDI Documents: Segments
The primary structure of EDI documents are segments. Each segment contains a specific set of known (expected) data. In many cases, a segment is represented as a single line. Segments are separated by a segment delimiter. A line feed is a common segment delimiter in EDI documents (but it could be something else).
Each segment starts with a tag that identifies the kind of data the segment contains. However, tags are not unique, and the context will determine the kind of segment the tag represents. Examples of tags are ISA, GS, N4, etc. Here's an example X12 segment in
which the tag of the segment is BIG and the "~" character is the segment delimiter:
BIG*19971211*00001**A99999-01~A segment can be further decomposed into 0 or more data elements. The sample segment above contains 4 data elements (delimited by '*'), with one of them being empty. Elements are located within a segment by their position, which is why empty elements must be represented with successive asterisks as shown in the example above. However, if an element is optional (doesn't require a value) and there are no other elements with a value after that, it may not appear. In other words, the fact that the segment in the instance document contains 4 data elements does not mean that it always will if the remaining data elements are empty.
In some cases (more common in EDIFACT than X12), a data element itself can be a complex structure, split into components, separated with a different delimiter, for example:
DTM+137:20000101:102'This sample EDIFACT segment uses a single quote (') as the segment delimiter, '+' as the element delimiter, and ':' as the component delimiter, so it has a single data element with 3 components.
EDI Documents: Structure
As described previously, the structure of entire EDI documents is made out of segments, with header/footer segment pairs forming nested envelopes around the data. These envelopes are well known and organized as follows:
- All EDI Interchanges (documents) contain interchange header and footer segments. This is the outer envelope. The interchange header segment contains information such as an interchange ID, where it comes from, and where it should go. The interchange footer segment contains information that allows you to validate that the interchange is complete.
- An EDI Transaction (commonly called a message in EDIFACT) contains the actual EDI data. Again, each transaction contains header and footer segments that identify the type of transaction (i.e. an invoice or a purchase order), the number of segments, etc.
- An Interchange can contain EDI transactions directly, but more commonly transactions are grouped inside functional groups. Each functional group contains its own header and footer segments. The information contained in the group header and footer depends on the specification, but it generally identifies the type of transactions contained in the group and the specification version used.
In EDIFACT and X12, the header and footer segments for interchanges, functional groups, and transactions have known tags. Below is a table that identifies the Segment Type and its corresponding tag in X12 and EDIFACT:
|Segment Type||X12 Tag||EDIFACT Tag|
|Service String Advice||-||UNA|
|Functional Group Header||GS||UNG|
|Functional Group Footer||GE||UNE|
UNA:+.? 'This specifies that the component delimiter will be ':', the element delimiter will be '+', the escape character will be '?' and the segment delimiter will be the single quote ('). The '.' is the decimal point character, and the space ' ' is reserved for future use.
Parsing Transactions: Schemas
Within a transaction, segments can only be interpreted with a prior knowledge of the structure of the transaction. In other words, if you see an arbitrary segment within a transaction (for example, an N1 segment within an X12 purchase order document (850)), there's no way to tell what it means or how it should look without knowing where in the transaction it is located. Because of this, to parse EDI documents a schema is required for the transaction that is being parsed to determine the order in which segments should appear. Note that within EDI documents, segments are not unique. In fact, a schema can specify that a given segment be optional, be required but appear only once, or be repeating so that it appears one or more times in a row.
A loop is basically a series of elements with known tags that repeat as a group. For example: an X12 purchase order document (850) needs to contain the addresses of the places the items will travel through (the producer, the seller, the buyer, where the invoice should be sent, etc). Each address is represented as a set of segments, tagged N1..N4, of which only N1 is required. Consider then this part of a document:
N1*ST*BUYSNACKS PORT*9*1223334445~ N3*1000 N. SAMPLE HIGHWAY~ N4*ATHENS*GA*30603~ N1*BT*BUYSNACKS*9*1223334444~ N3*P.O. BOX 0000~ N1*RE*FOODSELLER*9*12345QQQQ~ N3*P.O. BOX 222222~ N4*DALLAS*TX*723224444~This sample actually contains 3 different N1-loops, for 3 different addresses. The first loop has 3 segments (N1, N3, and N4) and contains the Ship-To address (ST). The second loop has 2 segments (N1 and N3) and contains the Bill-To (BT) address. The last loop has 3 segments again (N1, N3, and N4) and specifies some other address.
It is important to recognize that, schema wise, these are just defined as a single, repeating loop of a set of segments (and not described one by one). Also, in an 850 document for example, there can be other loops in other parts of the document with different meanings and different structure.
Parsing Transactions: Loading and Compiling Schemas
There is no single official standard for EDI Schemas, and the components were designed with this in mind. Rather than require the use of a custom schema format, or require a single schema type, the components support multiple, common, schema formats. This allows for easier integration with existing EDI processing applications.
The following schema formats are supported:
|0 (schemaAutomatic - default)||The schema type is automatically determined based on file extension.|
|1 (schemaBinary)||A binary schema that was previously compiled by calling CompileSchema.|
|2 (schemaBizTalk)||BizTalk (XSD): http://msdn.microsoft.com/en-us/library/aa559426(v=BTS.70).aspx|
|3 (schemaSEF)||TIBCO Standard Exchange Format (SEF): https://docs.tibco.com/products/tibco-foresight-edisim-6-18-0|
|5 (schemaAltova)||Altova: http://www.altova.com/|
|6 (schemaJSON - recommended)||ArcESB JSON: http://www.arcesb.com/|
Before loading or writing a document with the reader or writer components, a schema will need to be loaded using the LoadSchema() method. The LoadSchema() method can parse schemas and use them for basic validation and interpretation of EDI documents at runtime.
The components also support parsing an EDI or X12 document without loading a schema. In this scenario you can traverse the document using the XPath property. Note that when using this approach the organization and loop structures of the segments within the transaction are not known.
The schemas can be very big (most are around 1-2MB in size) and parsing them can be memory intensive. For this reason, once a schema is loaded into a component instance, it is rendered into an internal representation and cached there until the next time it is needed. Designing your application to create and keep one instance of the component is a good way to avoid unnecessary schema loading.
Compiled Schemas offer another form of optimization. The components have a CompileSchema method that takes the path to an EDI Schema file and generates a .BIN file that contains a binary representation of the file. You can then load these into the component using the LoadSchema method.
The EDIFACTReader Components
The class allows you to parse an incoming EDI document. This class works in two states, loading an entire document at once, or streaming portions of the document. These states are controlled by the BuildDOM property. By default BuildDOM is set to bdEntireDocument which parses the entire document at once, allowing you to use the XPath property to navigate the document.
To save memory for larger documents, you can choose to parse only sections of the document, instead of the entire document. When BuildDOM is set to per interchange (bdInterchange ) or per transaction (bdTransaction), the respective section of the document will be available for use with XPath from within the corresponding Start and End events. Finally, you may choose to set BuildDOM to bdNone, which means no DOM will be built and all data will be available only through events, but also will use very little memory. Below are example steps to parse an entire document:
- First, use LoadSchema to load a schema file into the class. (Only necessary when preserving document structure).
- Open an EDI document or stream by setting input via InputFile or InputData and calling Parse.
- If BuildDOM is set to bdEntireDocument, the events of the class will fire as the document is parsed, and XPath may be set to access any part of the document.
If bdInterchange or bdTransaction are specified, and Input or ParseFile is called the entire document will be parsed, with only the specified section being saved in memory at any given time. This means if you wish to set XPath to navigate within the section of the document, you will need to do so within the events of the class to prevent further processing of the document while you access the section. When parsing is completely, only the most recently parsed section will be available for use with XPath
If bdNone is specified, then all document information must be obtained through the events fired during parsing.
During parsing, the class performs basic validation of the incoming document. If validation fails, a warning is generated (fired as an event).
The XPath navigation is done through the XPath property. For example:
EDIReader.XPath = "/IX/FG/TX/N1Loop1/N1";This example path means the following: Select the first N1 segment within the first iteration of the N1Loop1, within second transaction in the first functional group and interchange.
You can also make use of XPath conditional statements to locate the first element which matches a name=value. For example, you could use the following XPath to locate the path of the first element within any N1Loop1 that has a name=N101 and value=BT:
EDIReader.XPath = "IX/FG/TX/N1Loop1[N101='BT']";
Note that the conditional statements will search the children, but not the grand children of the element on which the conditional statement is applied. For instance in the above example the children of N1Loop1 will be searched, but the grandchildren will not.
Additionally if the schema loaded is a ArcESB JSON schema the element Id (from the schema) can be used in the conditional statement. For instance instead of N101 the following is also acceptable:
EDIReader.XPath = "IX/FG/TX/N1Loop1[98='BT']";
To display the structure of the parsed document use DisplaySchemaInfo. This is helpful when deciding how to navigate the document.
The EDIFACTWriter Components
The class allows you to create a document from scratch. The class allows you to create an EDI document one segment at a time. Here's how a document would normally be created:
- Call LoadSchema to load the necessary schemas for the transactions that will be used.
- Specify where to write the output document by setting the OutputFile property or SetOutputStream method, or set neither and check the OutputData property.
- Create a new interchange start segment using the StartInterchangeHeader method and set its properties using WriteElementString and WriteComponentString.
- To write a basic element value to the current location, call the WriteElementString method. For complex element values, there are two possibilities, elements which are split into components, and elements which repeat. To write these complex element values, use the StartElement and EndElement methods, with WriteComponentString and RepeatElement methods for writing the values. (Examples available below).
- Create a new functional group using StartFunctionalGroupHeader and set its properties using WriteElementString and WriteComponentString.
- Create a new transaction using StartTransactionHeader and set the properties for the header segment.
- Write all the data for the transaction by creating new data segments using StartSegment and providing the path of the segment to create using the schema names of the loops and segments, like /N1Loop1/N1.
- Once you are done with the segment, call EndSegment.
- Once you are done with the transaction, call CreateTransactionFooter.
- Once you are done with the functional group, call CreateFunctionalGroupFooter.
- Once the interchange is complete, call CreateInterchangeFooter.
The EDIFACTTranslator Components
The class will convert a document from the format specified by InputFormat to the format specified by OutputFormat. In practice this allows for converting to XML or JSON from EDI and vice versa.
Before translating from EDI to XML or JSON it is recommended to load a schema using the LoadSchema method. This ensures additional information can be included in the XML or JSON document. If a schema is specified the XML or JSON will include types and descriptions as element attributes which are useful for interpreting the data.
EDI elements may optionally be renamed when creating XML. To define how an element is renamed add a renaming rule by calling AddRenamingRule.
After calling Translate the resulting output will contain the EDI, XML or JSON data as defined by OutputFormat. If the output data is XML the ExportXMLSchema method may be called to export a schema (.xsd) defining the structure of a valid XML document. XML documents which adhere to this document may be translated from XML to EDI.
Input and Output Properties
The class will determine the source and destination of the input and output based on which properties are set.
The order in which the input properties are checked is as follows:
- OutputData: The output data is written to this property if no other destination is specified.