6. Bridges to Models: Generalised Transportation-data Format (GTF)


6.1. GTF Definition
6.1.1. Homogeneous Data Model
6.1.2. Cross Platform/Human Readability
6.1.3. Segmented and Self-Describing

6.2. Data needs for advanced transport models

6.3. GTF Main concepts and Data Model

6.4. Fundamental Design of GTF Translators

6.5. GTF data model & GTF-GESMES message specification
6.5.1. Using the GTF data model and the GTF GESMES format

6.6. GTF GESMES Format

 


GTF stands for the "Generalised Transportation-data Format" specification. The goal of GTF is to standardise the information used by transport modelling software for the purpose of electronic data interchange (EDI).

The conceptual structure of Bridges places GTF on the borderline between external models and the system.

At present, there is no homogeneous data format for transport models. The lack of such a standard makes the process of exchanging data between models extremely difficult and time-consuming. Even worse, there is a risk of confusion and misunderstanding of both the terminology and the topological structures used by each transport modeller. Therefore, the success of a standard format depends on two elements:

  • First, the "data model" behind the format must be complex enough to cover the database structures used by most models and
  • Second, the actual format used must be compatible with accepted standards.

Achieving standard data models and formats is the first indispensable step in the process of integrating advanced models into decision support systems.

 

6.1. GTF Definition

The GTF specification uses previously defined standards wherever possible in order to maximise its acceptability.

To accomplish this, GTF comprises the following parts:

  • A standardised definition of transport information that does not restrict the information to any specific sub-set. This is called the "GTF data model".
  • A standardised set of commands to run models and retrieve relevant data. This is called TIP (Transportation data Interchange Protocol). (This was not part of the Bridges project.)
  • A standard format for arranging data in a file used for EDI and a standard protocol for exchanging the data file. For this UN/EDIFACT's GESMES message is used.

To be able to exchange data electronically, it must be in a form that can be processed automatically. This is achieved by specifying the arrangement of the data within an EDI file and the protocol for the interpretation of each section in an EDI file. Furthermore, to ensure maximum portability across very different hardware and software platforms (e.g. the sender uses UNIX/Solaris and the receiver uses PC/Windows), the transmission files must be in ASCII. This is resolved by use of UN/EDIFACT's GESMES message. UN/EDIFACT is the specification of ASCII-based EDI messages.
The general requirements are as follows.

 

6.1.1. Homogeneous Data Model

To make sure that no valuable information is lost when data is transmitted between models, or between models and other software, a homogeneous and agreed-upon data model needs to be designed. This will enable a non-conflicting interpretation of the data by the sender and the receiver, i.e. important information like "waiting time", "storage time", components of travel time, "check-in time", "access time", "egress time", "taxiing time" etc. will be understood in exactly the same way by both.

 

6.1.2. Cross Platform/Human Readability

The cross-platform requirement means that a non-binary code must be used. ASCII is used because it presents the fewest problems when exchanged between heterogeneous platforms (for examples, see SGML, PDF, RTF). ASCII has the added benefit that a file can be viewed directly using any editor.

 

6.1.3. Segmented and Self-Describing

As the data and control information of a model needs to be put together by the system, the exchange format must be very flexible and powerful. The best way to achieve these two goals is to design the format and protocol for data interchange in a structured and segmented manner. In this way, the system will have a "language" to describe the structure and contents of the exchange file. Using the building blocks and grammar (defined by the data model) of the GTF specification, the file becomes self-describing to a translator. This again has the added benefit of being readily understandable.

Another important requirement for this specification is the capability to transfer survey data and not only model-specific data. The solution provided here is that the GTFGESMES definition developed in Bridges is complementary to the standard GESMES message defined by UN/EDIFACT. Therefore, the way to transmit both model information using GTFGESMES and survey data using GESMES is to use at least two messages, one being a GTFGESMES message and the other being a GESMES message. This is possible because the GTFGESMES message follows the format specifications defined in GESMES. Implementations of GTF Translators can thus be extended with the GESMES specification, which limits the purpose of GTFGESMES to the interchange of model data and that of GESMES to the interchange of statistical time-series data. The two messages are therefore conceptually separate but sit within the same common and homogeneous framework of the UN/EDIFACT message interchange concept.

All this will enable the GTFGESMES message to be submitted to the standardisation board of UN/EDIFACT to include the message in the list of standard UN/EDIFACT messages, like GESMES.

 

6.2. Data needs for advanced transport models

In order to understand the basic concepts and ideas behind GTF, one has to know about the paradigms of scientific modelling and their data structure requirements. This section gives a brief overview of these problems and the resulting need for a "Generalised Transportation-data Format" for Electronic Data Interchange.

Generally speaking, transport models use the following information items for their computations:

  • Zonal data: any kind of zonal description, e.g. socio-economic data, ecological data, zonal boundaries, transport data, indicators, transport matrices etc.
  • Network data: data describing relations between the elements, e.g. link characteristics, topological characteristics (a link has a starting node and an ending node), link/network clusters etc.
  • Geographical data: for viewing purposes the information which needs to be exchanged with GTF should contain information typically needed by GIS, e.g. the underlying projection of the node and its co-ordinates.

With these different kinds of information in mind, a more detailed view of the information/data categories for transport information can be developed.
Models in general, even if they are not, for example, discrete choice models, are very demanding in terms of the amount and quality of input and calibration data.
The main problems with current data and databases at European level are the following:

  • Data required by the model, e.g. for estimation, is not available. For example, a pan-European passenger transport model requires homogeneous input data from all countries at the same level of aggregation. This kind of database is not currently available, and when data (or information of interest for the model) is found, its format does not correspond to that of the other data and its content lacks some essential element.
  • The specification of the data required by the model does not match what is available and re-specification is not possible. Often, a database that was acquired for a model holds the data at an aggregation level that cannot be matched to the one needed by the model. For example, if the model requires a NUTS zonal division of the data but the acquired data has a different regionalisation (e.g. CIP-Codes, Telephone Number or Car Registration System), the zones of the two systems do not match exactly, and the data must be transformed from the other system into the NUTS regionalisation (or vice versa). This can only be done if further information is available on the percentage of the data for a zone in the other zoning system that falls into the equivalent NUTS zone. This is seldom the case. Indeed, what usually happens is that different percentages have to be taken from a set of zones in the other zoning system to create an equivalent NUTS zone (or vice versa). This makes the transformation a tedious and error-prone task.
  • The level of aggregation of the available data does not match the required level. This often means that the acquired data is aggregated to a higher level than required and cannot be disaggregated to the level needed.
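
The share-based transformation described in the second point can be sketched as follows. This is only an illustration and not part of the GTF specification: the function name `reallocate`, the zone codes and all figures are hypothetical, and the sketch assumes that the share of each source zone falling into each NUTS zone is already known, which the text notes is seldom the case.

```python
from collections import defaultdict


def reallocate(values, shares):
    """Re-allocate zonal values from a source zoning system to target zones.

    values: {source_zone: value}
    shares: {source_zone: {target_zone: fraction of the source value}}
    """
    result = defaultdict(float)
    for source_zone, value in values.items():
        allocation = shares[source_zone]
        total = sum(allocation.values())
        # Every source value must be distributed completely.
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"shares for {source_zone} sum to {total}, not 1")
        for target_zone, fraction in allocation.items():
            result[target_zone] += value * fraction
    return dict(result)


# Hypothetical figures: population by postal area, mapped onto two NUTS zones.
population = {"P1": 1000.0, "P2": 400.0}
shares = {
    "P1": {"NUTS_A": 0.75, "NUTS_B": 0.25},
    "P2": {"NUTS_B": 1.0},
}
nuts_population = reallocate(population, shares)
```

With these figures, NUTS_A receives 750 and NUTS_B receives 650. The approach is only meaningful for additive quantities such as counts; rates or averages would need a weighted treatment.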

Because of the problems described above, the structure of a "Generalised Transportation-data Format" should cover the following aspects:

  • Instead of having disparate and manifold software applications and databases, a GTF should contain all necessary elements and provide one single and homogeneous data specification and format.
  • Instead of having incompatible proprietary formats and informational contents, a "Generalised Transportation-data Format" (GTF) should be used throughout the whole system, by providing translators to/from the proprietary formats to GTF. GTF consists of a generalised data model (GTF-DM), and a standard exchange format (GTF-GESMES).

6.3. GTF Main concepts and Data Model

With GTF, the structures of the numerous software applications and databases become accessible in a homogeneous and compatible manner. A set of GTF Translators will provide a single access point to all models and data. The previously discussed problems of non-homogeneous software and data/informational structures and definitions are overcome by using the GTF Data Model (GTF-DM) specification to structure and populate databases and by using GTF GESMES for information exchange. The numerous databases can either be restructured according to the GTF Data Model, or a specific GTF Translator can be developed for each database, thus providing a single, homogeneous means of access.

The main concept for the development of a GTF-DM is that the informational units of GTF are "atomic". Therefore the informational units (the data) of any other DM (DM-X) can be decomposed according to the GTF-DM. In the GTF-DM, all pieces of information that qualify a piece of data are kept in separate entity instances which are linked through relationships to the entity instance containing the piece of raw data.
The main focus in developing the GTF specification, and subsequently the GTF Translators, was to define an abstract view of transport model information for the purpose of implementing translator software that exchanges data electronically between modelling software and other software (e.g. database systems).
The primary goal therefore was the definition of a data model for transport information (a GTF specification of information structure) and the definition of a format and syntax for electronic transfer of information (a GTF Translator syntax and the format of information data).

The fundamental information classes that form the foundation of the GTF Data Model are introduced in more detail here. They describe the general structure from which the GTF Data Model was developed.

The transport data that is covered is primarily that which is used in strategic transport models. Thus, it covers interurban, regional and international travel on all transport modes for both passengers and freight. It does not cover detailed local traffic issues, such as the representation of road junction geometry, although GTF can be extended to handle such issues or combined with more specialised data models, e.g. GDF.
The basic information captured in the GTF-DM comprises the infrastructure elements of networks (NODE and INFRASTRUCTURE-LINK) and the zonal elements needed for flow assignment (ZONE, FLOW and CONNECTOR-LINK).

These types of basic information are further sub-divided in the GTF-DM until the full level of detail required is reached. In addition, all entities of the GTF-DM have a "GIS" part and a "TYPE" part. The "GIS" part is used to capture graphical information, e.g. coordinates, vertices attached to LINKs etc. The "TYPE" part contains the information relevant to models, e.g. INFRASTRUCTURE-LINK and FLOW-LINK. Many common elements found in the inputs used by models, e.g. node and link, are generalised in the GTF-DM to provide atomic informational units.

A brief overview covering the GTF entities and relationships is depicted in Figure 2.

 

Figure 2: GTF Entities & Relationships Overview

 

6.4. Fundamental Design of GTF Translators

From the description of the requirements of the system it follows that modelling data needs to be transferred across different platforms, mainly Windows and UNIX. This is because many modelling software applications are implemented on UNIX platforms, while the envisaged default platform for a user is a PC with Windows NT as the operating system.
The consequence of the cross-platform requirement is that a non-binary code must be used. ASCII has been chosen because it is the format which presents the fewest problems when exchanged between heterogeneous platforms. ASCII has the additional benefit that a GTF file can be read using a standard editor, should any problems occur.

As the data and control information of a model needs to be put together by the system, the exchange format must be very flexible and powerful. The best way to achieve these two goals is to design the format and protocol for data interchange in a structured and segmented manner. In this way, the system will have a "language" to describe the structure and contents of the exchange file.

With such a language/protocol, the handling of sparse arrays, for example, would become much less complex, as a translator just has to read the positional information attached to a data element of the array in order to assign this data element correctly.
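
The idea of attaching positional information to each data element of a sparse array can be illustrated with a small sketch. It is not taken from the GTF specification: the function names and the (row, column, value) layout are assumptions chosen for illustration.

```python
def to_positional(matrix):
    """Return (row, column, value) triples for the non-zero cells of a matrix."""
    return [
        (r, c, value)
        for r, row in enumerate(matrix)
        for c, value in enumerate(row)
        if value != 0
    ]


def from_positional(triples, n_rows, n_cols):
    """Rebuild the dense matrix; cells without a triple stay at zero."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for r, c, value in triples:
        dense[r][c] = value
    return dense


# Hypothetical 3x3 flow matrix with only two non-zero cells.
flows = [
    [0, 12, 0],
    [0, 0, 0],
    [7, 0, 0],
]
triples = to_positional(flows)
restored = from_positional(triples, 3, 3)
```

Only two triples need to be transmitted here instead of nine cells, and the receiver can place each value without knowing the order in which the cells were written.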

A problem might arise if there is a need to compress the data. As compressed data is usually binary data, most compression utilities are of no use to the GTF translators. Only compression utilities that write ASCII output as a result of compression can be used in order to meet the cross-platform requirement.
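
One way to obtain ASCII output from a binary compressor is to follow compression with an ASCII encoding step. The sketch below uses zlib and Base64 from the Python standard library purely as an illustration. The source does not name any particular utility, so this combination is an assumption, and Base64 output contains lower-case letters that a restricted EDIFACT character set may not permit.

```python
import base64
import zlib


def pack_ascii(text):
    """Compress ASCII text and return the result as an ASCII-only string."""
    compressed = zlib.compress(text.encode("ascii"))
    return base64.b64encode(compressed).decode("ascii")


def unpack_ascii(armoured):
    """Reverse pack_ascii: decode Base64, then decompress."""
    return zlib.decompress(base64.b64decode(armoured)).decode("ascii")


# A repetitive sample, as segment-based files tend to be.
sample = "ARR+7+3'" * 50
packed = pack_ascii(sample)
```

The encoding step costs roughly a third in size compared with the raw compressed bytes, so the gain depends on how repetitive the data is.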

 

6.5. GTF data model & GTF-GESMES message specification

This section briefly introduces the standard format used for the actual exchange of a GTF file. The format is based on UN/EDIFACT's GESMES message.

More information concerning interchange structures to be used with UN/EDIFACT messages is available in the UN/EDIFACT standard directory (Part 4, Chapter 2.2 to Chapter 6, "Structures"). Information can also be found at the UN website: http://www.unece.org/trade/untdid/. The UN/EDIFACT specification defines the character sets allowed in a message transmission. The character set chosen for GTF is the Level A character set defined in the UN/EDIFACT standard directory.

 

6.5.1. Using the GTF data model and the GTF GESMES format

Basically, the GTF data model is a framework which can be used to define the information that is contained in data. The difference between "data" and "information" is crucial: "data" is not always equal to "information". The GTF data model framework therefore allows a user of the GTF specification to wrap data into information entities. These entities contain the basic data and the necessary supplementary information (meta-data) to give a meaning to the basic data. In this way, one can make sure that the input data to a transport model fits the model's information requirement. This is crucial for a model to compute valid results. If the input to a model does not fit the assumptions that were made about the information carried in the data (i.e. the meta-data associated with the input data), then the model is unlikely to produce valid results.

Here, the GTF data model comes into play. It forces a user of this specification to make the implicit information explicit by wrapping the data - data with implicit information - into entity structures - data with explicit information. These entities are then combined to represent the complete implicit information that a piece of data carries. In this way, all the data's information is made explicit and can thus be used to check whether data fits in with the model's philosophy or not.

The GTF data model provides a standard set of information pieces into which data can be wrapped and which can be combined into a larger chunk of information, just like building something using LEGO. With LEGO one can build many different things without always having to buy different components. One can use the standard set of LEGO pieces and still build a wide range of different things. One only needs a new piece of LEGO if one wants to build something that does not fit into the concepts of the currently available LEGO pieces. For example, with only square LEGO components it is impossible to build something round. In this case, the concept of "round" is totally different to the concept of "square". The "square" concept is covered by the available pieces of LEGO; the "round" concept is not. So, one uses a new "round" component to bring the concept to life. In the case of GTF, all this applies equally: the different concepts of LEGO pieces, e.g. "round", "square", are defined as entities. Each specific piece of LEGO is an entity instance. A construction made of LEGO is a GTFGESMES file with the entity instances and the definitions of the relationships between them. The main advantage of this kind of thinking is the relatively small number of different abstract concepts used to cover a very wide range of concrete objects.

The data model described in this document is complete in the sense that all parent entities required to define a child entity are also defined as separate entities in the model, although the parent entities are abstract and often included to complete the framework. The entities that are actually used in a GTFGESMES transmission have a definition of a GESMES segment following the tabular definition of the attributes associated with the entity. For example, the entity "TERMINATOR" which only captures the concept of "something that is the beginning or the end of something else" is abstract, the concrete concept is a "NODE" which is "a point in an infrastructure network". Thus, in a GTFGESMES only NODEs should be transmitted, because the TERMINATOR information is known automatically. This information is implicit in the GTFGESMES file but explicit when looking at the NODE definition in this specification. This means, that when a NODE is transmitted, the receiver (who also knows this specification) automatically knows that a TERMINATOR is the parent of the NODE. One can transmit a TERMINATOR, if needed, but a TERMINATOR is not concrete and can, therefore, be either a NODE or a ZONE. It is not possible to determine which, solely on the basis of the TERMINATOR information. The information whether a TERMINATOR's role is that of a NODE or a ZONE is explicitly contained in the corresponding entities in the data model. Thus, one TERMINATOR can be used as parent of a NODE and a ZONE, if the information to be conveyed is: "This starting/ending point has the role of an infrastructure node and it is also the input/output point of a zone, i.e. the zone's centroid, in this network".
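
The case of one TERMINATOR acting as parent of both a NODE and a ZONE can be pictured with a small sketch. The class layout, the attribute names and the helper `roles_of` are hypothetical and are not the entity definitions of the specification. Composition is used here instead of inheritance so that a single terminator instance can be shared by two concrete entities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Terminator:
    """Abstract start or end point; carries no role by itself."""
    id: int


@dataclass(frozen=True)
class Node:
    """Concrete role: a point in an infrastructure network."""
    terminator: Terminator
    name: str


@dataclass(frozen=True)
class Zone:
    """Concrete role: a zone whose input/output point is the terminator."""
    terminator: Terminator
    name: str


def roles_of(terminator, entities):
    """List the concrete roles that refer to the given terminator."""
    return sorted(type(e).__name__ for e in entities if e.terminator == terminator)


# One terminator used as an infrastructure node and as a zone centroid.
point = Terminator(id=1)
entities = [Node(point, "Karlsruhe"), Zone(point, "Karlsruhe, Stadtkreis")]
roles = roles_of(point, entities)
```

Looking at `point` alone does not reveal whether it is a node or a zone; the role only becomes visible through the concrete entities that refer to it, which mirrors the argument made above.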
The large number of defined entities (approx. 200) in the data model arises because of a combinatorial explosion. But only 8 basic entities were used to create the data model, namely:

 

  • FACTOR
  • TERMINATOR
  • LINK
  • PHYSICAL SPECIFICATION
  • VESSEL
  • SERVICE
  • ALTERNATIVE
  • UNIT

These are called "topmost entities" or "top-levels". The top-level entities ("top-levels" = all entities in a tree from the topmost entity to a "concrete") are abstract entities and are usually only to be used as the beginning of a structural tree. At the end of the trees one can find the concrete entities ("concretes"). The GTF specification defines which combination of entities should be used for actual transmission by defining GESMES Segments, e.g. the NODE's segment specification or the PHYSICAL_SPECIFICATION-TECHNICAL-ENGINEERED's segment specification etc. The top-level entities can also be used in a transmission. However, if a transmission contains both "top-levels" and "concretes", it carries redundant information, because a "concrete" always implies a parent "top-level". Transmitting the "top-levels" breaks no rule in this specification, but it is unnecessary. On the other hand, it is plausible to transmit only "top-levels" if one only wants to define a very abstract network without connection to a concrete transport network. Even the "concretes" have been kept logically abstract enough to be able to define many kinds of networks. For example, one can use the data model to define a network used only by trucks or one used only by cars.
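
The statement that a "concrete" implies its parent "top-levels" can be sketched in code. The sketch assumes that a hyphenated entity name such as UNIT-DIMENSION-LENGTH spells out the path from the topmost entity. The naming in this document suggests this but does not state it, and it does not hold for entities such as NODE, whose parent TERMINATOR is not part of the name. The function names are hypothetical.

```python
def implied_parents(entity_name):
    """Return the ancestors encoded in a hyphenated entity name."""
    parts = entity_name.split("-")
    return ["-".join(parts[:i]) for i in range(1, len(parts))]


def redundant_names(entity_names):
    """Return the transmitted names that are already implied by another name."""
    implied = set()
    for name in entity_names:
        implied.update(implied_parents(name))
    return [name for name in entity_names if name in implied]


parents = implied_parents("UNIT-DIMENSION-LENGTH")
redundant = redundant_names(
    ["UNIT", "UNIT-DIMENSION", "UNIT-DIMENSION-LENGTH", "LINK-FLOW"]
)
```

Here `parents` contains UNIT and UNIT-DIMENSION, and `redundant` flags those two names as derivable from UNIT-DIMENSION-LENGTH. Whether a translator should drop such segments is a separate decision, since the specification allows them to be transmitted.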

The "top-levels" and the "concretes" can be combined using the defined relationships. A user of this data model defines a relationship by filling in an attribute that was migrated to an entity through that relationship. For example, the "is in ZONE.ID" attribute of the NODE entity is an attribute that is migrated to the NODE because of the "is in ZONE" relationship between NODE and ZONE. This relationship can be used to record which ZONE a NODE lies in, e.g. a NODE "Hanover" (where the Expo 2000 will take place) is within the ZONE "Niedersachsen". In this data model, a relationship per se does not carry any information in the way an entity does. All information in this data model is enclosed in an entity. The relationships only associate two entities. This association is the only information a relationship adds to the data model. The data which describes a relationship (i.e. the two associated entities) is stored as an attribute of an entity. Also, a relationship defines exactly one attribute. This means that the entity attributes defined in the data model are either generic to the entity or have been added to the entity because of a relationship. The topmost relationships are:

 

  • Activity: ZONE X FACTOR
  • is in: NODE X ZONE
  • defined by: ZONE X PHYSICAL SPECIFICATION
  • technical (specification) by (vessel): VESSEL X PHYSICAL SPECIFICATION
  • purpose (defines): UNIT X PHYSICAL SPECIFICATION
  • Characteristics: PHYSICAL SPECIFICATION X LINK
  • allows usage by: VESSEL X LINK
  • begins in/ends in: TERMINATOR X LINK
  • time slot: SCHEDULE X SERVICE
  • 'real' definition: VESSEL X SERVICE
  • allowed service: SERVICE X LINK
  • modelling information: SERVICE X ALTERNATIVE
  • Definition: VESSEL X ALTERNATIVE
  • Allowed: ALTERNATIVE X LINK
  • unit definition: UNIT X ALTERNATIVE
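
How a relationship is stored as a migrated attribute can be sketched for the "is in" relationship between NODE and ZONE. The classes, the attribute name `is_in_zone_id` and the node identifier are hypothetical stand-ins for the entity definitions of the specification; the zone code and name are those used in the example message in section 6.6.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Zone:
    id: str
    name: str


@dataclass
class Node:
    id: str
    name: str
    # Attribute migrated to NODE by the "is in ZONE" relationship.
    is_in_zone_id: Optional[str] = None


def zone_of(node, zones_by_id):
    """Follow the migrated attribute from a node to its zone, if any."""
    if node.is_in_zone_id is None:
        return None
    return zones_by_id[node.is_in_zone_id]


zones_by_id = {"DE122": Zone("DE122", "Karlsruhe, Stadtkreis")}
node = Node("N1", "Karlsruhe", is_in_zone_id="DE122")  # "N1" is a made-up ID
zone = zone_of(node, zones_by_id)
```

The relationship itself has no record of its own: the only stored trace is the single attribute on the node, which matches the rule that a relationship defines exactly one attribute.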

 

6.6. GTF GESMES Format

Once it is clear which entities to use for the transmission of a piece of data, one needs to generate the corresponding GTFGESMES segments and construct a complete GTFGESMES interchange transmission file.
A GTFGESMES interchange always

  • Starts with a mandatory "UNB" segment
  • Followed by any number of messages
  • And ends with a mandatory "UNZ" segment.

Anything before or after these two segments is ignored by GTF Translators. Between these two segments any number of messages can be defined.
A message

  • Starts with a mandatory "UNH" segment
  • Followed by an optional "FNT-FTX" segment group,
  • Followed by any number of "segment blocks"
  • And ends with a mandatory "UNT" segment.

A "segment block" comprises

  • An optional "DSI" segment,
  • Optional "FTX" segments (up to 5, as defined by the GESMES specification)
  • A mandatory "ARR" segment and
  • A mandatory "IDE" segment

These are the only segments needed to transmit data structured as specified in this document:

  • The "UNB" segment is a beginning-of-interchange marker,
  • The "UNZ" segment is an end-of-interchange marker,
  • The "UNH" segment is a beginning-of-message marker,
  • The "UNT" is an end-of-message marker,
  • The "FNT-FTX" segment group is a group with textual TIP command information,
  • The "DSI" segment is a dataset identification segment for identifying the data in the subsequent ARR segments assigned by the sender of the interchange, e.g. data/time/project/scenario etc.,
  • The "FTX" segment is a textual comment to the following ARR segment,
  • The "ARR" segment is a segment containing the data - the segment's structure is defined in the entity definition,
  • The "IDE" segment is a structure identifier which identifies the structure used in the previous ARR segments, for example, if the previous ARR segments are NODE definition segments etc.

A transmission, e.g. a file, structured in this way is a valid GTFGESMES transmission. A GTF Translator that reads such a file needs to reconstruct the underlying GTF data model filled with the data in the transmission. Once this is done, the complete information is available again at the receiver's side, which was the goal of the GTF specification.
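
The envelope rules above (UNB ... UNZ around the interchange, UNH ... UNT around each message, everything outside ignored) can be checked with a short sketch. It is not the translator implemented in the project. The function names are hypothetical, the apostrophe is taken as the segment terminator as in the example below, and the sketch assumes that no terminator is escaped inside the data. The sender, receiver and reference values in the sample are made up.

```python
def extract_interchange(text):
    """Return the text from the UNB segment to the end of the UNZ segment."""
    start = text.find("UNB+")
    unz = text.rfind("UNZ+")
    if start == -1 or unz == -1 or unz < start:
        raise ValueError("missing UNB or UNZ segment")
    end = text.find("'", unz)
    if end == -1:
        raise ValueError("UNZ segment is not terminated")
    return text[start:end + 1]


def split_messages(text):
    """Split an interchange into messages, each a list of UNH..UNT segments."""
    body = extract_interchange(text)
    segments = [s.strip() for s in body.split("'") if s.strip()]
    messages, current = [], None
    for segment in segments[1:-1]:  # skip the UNB and UNZ segments
        tag = segment.split("+", 1)[0]
        if tag == "UNH":
            if current is not None:
                raise ValueError("UNH inside an open message")
            current = [segment]
        elif current is None:
            raise ValueError(f"{tag} segment outside a message")
        else:
            current.append(segment)
            if tag == "UNT":
                messages.append(current)
                current = None
    if current is not None:
        raise ValueError("message not closed by UNT")
    return messages


sample = (
    "text before the interchange\n"
    "UNB+UNOA:2+SENDER+RECEIVER+971126:1510+REF1'\n"
    "UNH+MSG1+GESMES:0:27:M6'\n"
    "ARR+5:DE122:Karlsruhe, Stadtkreis'\n"
    "IDE+Z07+15'\n"
    "UNT+4+MSG1'\n"
    "UNZ+1+REF1'\n"
    "text after the interchange"
)
messages = split_messages(sample)
```

A full translator would go on to interpret each segment block against the entity definitions; this sketch stops at the structural check.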

The following GTF message is an example that defines a piece of data according to Infostat's "Area of the zone (km2), NUTS 3" specification. It also defines two ZONEs, "Karlsruhe Stadtkreis" and "Greater London", further information, and a flow between the two, LINK-FLOW. The latter is a link with computed attributes in the PHYSICAL_SPECIFICATION entities, which specify the link attribute "distance" to be 1000 km (UNIT-DIMENSION-LENGTH - value = 3).
GTF message:

Transmission:
<connection establishment>
UNB+UNOA:2+MKmetric+MCRIT+971126:1510+GTFTIP-TEST-1'
UNH+TEST-MESSAGE-1+GESMES:0:27:M6'
BGM+ZZZ+GTFTIPV1.0'
ARR+5:DE122:Karlsruhe, Stadtkreis'
ARR+6:UK55:Greater London'
IDE+Z07+15' // ZONE
ARR+10:::2:Area of the Zone:5+1:C:50'
ARR+11:::2:Land-use type:5+2:1:17'
IDE+Z07+1' // FACTOR
ARR+7+3:1:10'
ARR+8+2:74:37'
IDE+Z07+136' // UNIT
ARR+7+3'
ARR+8+2'
IDE+Z07+184' // UNIT-DIMENSION
ARR+8+3'
IDE+Z07+192' // UNIT-DIMENSION-LENGTH
ARR+7+2'
IDE+Z07+196' // UNIT-DIMENSION-AREA
ARR+73639++3:2:5:6'
IDE+Z07+118' // LINK
ARR+73639+1'
IDE+Z07+132' // LINK-FLOW
ARR+37+3'
IDE+Z07+40' // PHYSICAL_SPECIFICATION
ARR+37+1'
IDE+Z07+79' // PHYSICAL_SPECIFICATION-MOVEMENT
ARR+37+1000'
IDE+Z07+82' // PHYSICAL_SPECIFICATION-MOVEMENT-COMPUTED
UNT+72+TEST-MESSAGE-1'
UNZ+1+GTFTIP-TEST-1'
<connection termination>
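
The way the IDE segment identifies the structure of the ARR segments before it can be seen by grouping an excerpt of the message above. The sketch is an illustration only: the function name is hypothetical, it assumes one segment per line with the explanatory "//" annotations as printed here, and it does not interpret what the individual data elements mean.

```python
def segment_blocks(message_text):
    """Group ARR segments under the IDE segment that follows them.

    Returns a list of (structure_id, arr_segments). Each ARR segment is a
    list of data elements, each data element a list of its components.
    """
    blocks, pending = [], []
    for line in message_text.splitlines():
        # Drop the "// ..." annotation and the segment terminator.
        segment = line.split("//", 1)[0].strip().rstrip("'")
        if segment.startswith("ARR+"):
            elements = segment.split("+")[1:]
            pending.append([element.split(":") for element in elements])
        elif segment.startswith("IDE+"):
            structure_id = segment.split("+")[2]
            blocks.append((structure_id, pending))
            pending = []
    return blocks


excerpt = """\
ARR+5:DE122:Karlsruhe, Stadtkreis'
ARR+6:UK55:Greater London'
IDE+Z07+15' // ZONE
ARR+7+3'
ARR+8+2'
IDE+Z07+184' // UNIT-DIMENSION
"""
blocks = segment_blocks(excerpt)
```

The first block carries structure identifier 15 (annotated as ZONE in the message) with two ARR segments, the second carries 184 (UNIT-DIMENSION) with two more. A translator would look up the entity definition for each identifier to interpret the elements.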

For a concrete proof-of-concept, ME&P implemented a translator in Visual Basic and MKmetric implemented a translator in Java. UML was used in the design of the Java translator.

Other formats, e.g. GDF, XML, ICE etc., have been scrutinised to check whether the chosen GESMES format could be replaced by any of these more modern formats. The examination has shown that using one of the other formats instead of GESMES is possible. For example, GDF, which is a very detailed format for capturing road features and which therefore has its own detailed data model, can potentially be used as an addition to GTF. The addition would specify the "road" mode of the GTF data model in much more detail, whereas models normally use aggregated information. But since GTF is a data model that comprises generic model features, like "mode", it also specifies modes other than "road", e.g. "rail", "waterway", "air". This implies that, for consistency, either a mode-specific "GDF" would have to be defined for these modes, or GTF would use the GDF definition for road and other mode-specific definitions for all other modes (which would include by default the GTF definition of each mode). It should be noted, however, that both data models, GTF and GDF, seem to be compatible. Hence, it is only a matter of mapping concepts from one data model to the other.

Another format that was analysed is XML. This specification defines a flexible and extensible format. It does not define any specific data model. Consequently, any data model can use XML as a concrete definition of the format to store and retrieve the data in the data model. In the case of GTF this means that XML can be used instead of GESMES. One disadvantage has to be noted, however: XML is not as compact as GESMES, because it uses tags to mark up ("attach") the meta-data information for a piece of data. Depending on the definition of the mark-up tags, an XML file can be much larger than an equivalent GESMES-formatted file. Since model communication generally implies a wealth of data to be transmitted, XML will in most cases result in larger amounts of data. The Unified Modelling Language (UML), which has emerged in recent years, was also considered as a data model definition notation instead of IDEF1X. At the time it was rejected, because UML did not seem to be stable enough as a data modelling language.
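
The size argument can be made concrete by writing one ZONE segment from the example message as XML. The tag names `zone`, `id`, `code` and `name` are invented for this comparison, since no XML mapping for GTF is defined in this document, and the meaning assigned to each component is a guess.

```python
import xml.etree.ElementTree as ET

# One segment taken from the example message in this section.
gesmes_segment = "ARR+5:DE122:Karlsruhe, Stadtkreis'"

# The same three components with hypothetical mark-up tags.
zone = ET.Element("zone")
ET.SubElement(zone, "id").text = "5"
ET.SubElement(zone, "code").text = "DE122"
ET.SubElement(zone, "name").text = "Karlsruhe, Stadtkreis"
xml_record = ET.tostring(zone, encoding="unicode")

# Characters needed per record in each representation.
sizes = {"gesmes": len(gesmes_segment), "xml": len(xml_record)}
```

Even with these short tag names the XML record is roughly twice as long as the segment. The ratio depends entirely on the chosen tags, which is the point made above.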

Because of these promising, emerging formats, it is advisable to follow their development and, when necessary or favourable, to add another format to the possible formats for concrete GTF files.
