6. Bridges to Models: Generalised Transportation-data Format (GTF)

6.1. GTF Definition
6.1.1. Homogeneous Data Model
6.1.2. Cross Platform/Human Readability
6.1.3. Segmented and Self-Describing
6.2. Data needs for advanced transport models
6.3. GTF Main concepts and Data Model
6.4. Fundamental Design of GTF Translators
6.5. GTF data model & GTF-GESMES message specification

GTF stands for the "Generalised Transportation-data Format" specification. The goal of GTF is to standardise the information used by transport modelling software for the purpose of electronic data interchange (EDI). The conceptual structure of Bridges places GTF on the borderline between external models and the system. At present there is no homogeneous data format for transport models. The lack of such a standard makes exchanging data between models extremely difficult and time-consuming. Worse, there is a risk of confusion and misunderstanding over both the terminology and the topological structures used by each transport modeller. Therefore, the success of a standard format depends on two elements:
Achieving standard data models and formats is the first indispensable step in the process of integrating advanced models into decision support systems.
6.1. GTF Definition
The GTF specification uses previously defined standards wherever possible in order to maximise its acceptability. To accomplish this, GTF comprises the following parts:
To exchange data electronically, the data must be in a form that can
be processed automatically. This is achieved by specifying the arrangement
of the data within an EDI file and the protocol for interpreting each
section of the file. Furthermore, to ensure maximum portability across
very different hardware and software platforms (e.g. the sender uses
UNIX/Solaris and the receiver uses PC/Windows), the transmission files must
be in ASCII. Both requirements are met by using UN/EDIFACT's GESMES message;
UN/EDIFACT is a specification for ASCII-based EDI messages.
6.1.1. Homogeneous Data Model
To make sure that no valuable information is lost when data is transmitted between models, or between models and other software, a homogeneous and agreed-upon data model needs to be designed. This enables a non-conflicting interpretation of the data by the sender and the receiver, i.e. important items of information such as "waiting time", "storage time", the components of travel time, "check-in time", "access time", "egress time" and "taxiing time" will be understood in exactly the same way by both.
6.1.2. Cross Platform/Human Readability
The consequence of the cross-platform requirement is that a non-binary encoding must be used. ASCII is used because it presents the fewest problems when files are exchanged between heterogeneous platforms (for examples, see SGML, PDF and RTF). ASCII has the added benefit that a file can be viewed directly in any text editor.
6.1.3. Segmented and Self-Describing
As the data and control information of a model need to be put together by the system, the exchange format must be very flexible and powerful. The best way to achieve these two goals is to design the format and protocol for data interchange in a structured and segmented manner. In this way, the system has a "language" for describing the structure and contents of the exchange file. Using the building blocks and grammar of the GTF specification (defined by the data model), the file becomes self-describing to a translator. As with the use of ASCII, this has the added benefit that the file is readily understandable.

A further important requirement is the capability to transfer survey data and not only model-specific data. The solution provided here is that the GTFGESMES definition developed in Bridges is complementary to the standard GESMES message defined by UN/EDIFACT. Model information and survey data are therefore transmitted using at least two messages: a GTFGESMES message for the model information and a GESMES message for the survey data. This is possible because the GTFGESMES message follows the format specifications defined in GESMES.

The result is a homogeneous EDI framework: implementations of GTF Translators are extended by the GESMES specification, the purpose of GTFGESMES is limited to the interchange of model data, and that of GESMES to the interchange of statistical time-series data. The two messages are therefore conceptually separate but sit within the same common and homogeneous framework of the UN/EDIFACT message interchange concept. This will make it possible to submit the GTFGESMES message to the UN/EDIFACT standardisation board for inclusion in the list of standard UN/EDIFACT messages, alongside GESMES.
6.2. Data needs for advanced transport models
In order to understand the basic concepts and ideas behind GTF, one has to know about modelling scientific paradigms and their data structure requirements. This section gives a brief overview of these problems and the resulting need for a "Generalised Transportation-data Format" for Electronic Data Interchange. Generally speaking, transport models use the following information items for their computations:
With these different kinds of information in mind, a more detailed view
of the information/data categories for transport information can be
developed.
Because of the problems described above, the structure of a "Generalised Transportation-data Format" should cover the following aspects:
6.3. GTF Main concepts and Data Model
With GTF, the numerous software applications and databases become accessible in a homogeneous and compatible manner. A set of GTF Translators will provide a single access point to all models and data. The problems discussed previously, namely non-homogeneous software and non-homogeneous data/information structures and definitions, are overcome by using the GTF Data Model (GTF-DM) specification to structure and populate databases, and by using GTF GESMES for information exchange. The numerous databases can either be restructured according to the GTF Data Model, or a specific GTF Translator can be developed for each database; in both cases access becomes homogeneous and goes through a single point.

The main concept behind the GTF-DM is that the informational units of GTF are "atomic". The informational units (the data) of any other data model (DM-X) can therefore be decomposed according to the GTF-DM. In the GTF-DM, all pieces of information that qualify a piece of data are kept in separate entity instances, which are linked through relationships to the entity instance containing the piece of raw data.

The fundamental information classes that form the foundation of the GTF Data Model are introduced here in more detail. They describe the general structure from which the GTF Data Model was developed.

The transport data covered is primarily that used in strategic transport models. It thus covers interurban, regional and international travel on all transport modes for both passengers and freight. It does not cover detailed local traffic issues, such as the representation of road junction geometry, although GTF can be extended to handle such issues or combined with more specialised data models, e.g. GDF.

These types of basic information are further sub-divided in the GTF-DM until the full level of detail required is reached. In addition, all entities of the GTF-DM have a "GIS" part and a "TYPE" part. The "GIS" part is used to capture graphical information, e.g. coordinates and vertices attached to LINKs. The "TYPE" part contains the information relevant to models, e.g. INFRASTRUCTURE-LINK and FLOW-LINK. Many of the usual elements found in model inputs, e.g. node and link, are generalised in the GTF-DM to provide atomic informational units. A brief overview of the GTF entities and relationships is given in Figure 2.
Figure 2: GTF Entities & Relationships Overview
6.4. Fundamental Design of GTF Translators
From the description of the system requirements it follows that modelling data needs to be transferred across different platforms, mainly Windows and UNIX. This is because many modelling software applications are implemented on UNIX platforms, while the envisaged default platform for a user is a PC running Windows NT.

As the data and control information of a model need to be put together by the system, the exchange format must be very flexible and powerful. The best way to achieve these two goals is to design the format and protocol for interchange in a structured and segmented manner. In this way, the system has a "language" for describing the structure and contents of the exchange file. With such a language/protocol, the handling of sparse arrays, for example, becomes much less complex, as a translator only has to read the positional information attached to a data element of the array in order to assign that element correctly.

A problem might arise if the data needs to be compressed. As compressed data is usually binary, most compression utilities are of no use to the GTF Translators. To meet the cross-platform requirement, only compression utilities that write ASCII output can be used.
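The remark on sparse arrays can be illustrated with a minimal sketch. It is not the GTFGESMES encoding: the line layout "row column value" and the function names are assumptions made for this illustration only. It shows that attaching positional information to each transmitted element lets a reader rebuild the array without receiving the empty cells.

```python
def encode_sparse(matrix):
    """Write one ASCII line per non-zero cell as 'row column value'.

    The layout is hypothetical; it only illustrates positional tagging.
    """
    lines = []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                lines.append(f"{r} {c} {value}")
    return "\n".join(lines)


def decode_sparse(text, n_rows, n_cols):
    """Rebuild the full array by placing each value at its stated position."""
    matrix = [[0.0] * n_cols for _ in range(n_rows)]
    for line in text.splitlines():
        r, c, value = line.split()
        matrix[int(r)][int(c)] = float(value)
    return matrix


# A small, made-up origin/destination matrix with mostly empty cells.
flows = [
    [0, 120, 0],
    [0, 0, 0],
    [35, 0, 0],
]
encoded = encode_sparse(flows)
restored = decode_sparse(encoded, 3, 3)
```

Only two of the nine cells are written, and the output is plain ASCII, so it stays within the cross-platform requirement stated above.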
6.5. GTF data model & GTF-GESMES message specification
This section briefly introduces the standard format used for the actual exchange of a GTF file. The format is based on UN/EDIFACT's GESMES message. More information on the interchange structures to be used with UN/EDIFACT messages is available in the UN/EDIFACT standard directory (Part 4, Chapter 2.2 to Chapter 6, "Structures") and on the UN website: http://www.unece.org/trade/untdid/. The UN/EDIFACT specification defines the character sets allowed in a message transmission. The character set chosen for GTF is the Level A character set defined in the UN/EDIFACT standard directory.
6.5.1. Using the GTF data model and the GTF GESMES format
Basically, the GTF data model is a framework that can be used to define the information contained in data. The difference between "data" and "information" is crucial: data is not always equal to information. The GTF data model framework therefore allows a user of the GTF specification to wrap data into information entities. These entities contain the basic data and the supplementary information (meta-data) necessary to give the basic data a meaning. In this way, one can make sure that the input data to a transport model fits the model's information requirements. This is crucial for a model to compute valid results: if the input to a model does not fit the assumptions made about the information carried in the data (i.e. the meta-data associated with the input data), the model is unlikely to produce valid results.

This is where the GTF data model comes into play. It forces a user of this specification to make implicit information explicit by wrapping the data (data with implicit information) into entity structures (data with explicit information). These entities are then combined to represent the complete information that a piece of data implicitly carries. In this way, all the data's information is made explicit and can be used to check whether the data fits the model's philosophy.

The GTF data model provides a standard set of information pieces into which data can be wrapped and which can be combined into larger chunks of information, much like building with LEGO. With LEGO one can build many different things from a standard set of pieces, without always having to buy different components. A new piece is needed only if one wants to build something that does not fit the concepts of the currently available pieces. For example, with only square LEGO components it is impossible to build something round: the concept of "round" is totally different from the concept of "square". The "square" concept is covered by the available pieces and the "round" concept is not, so a new "round" component is needed to bring the concept to life.

All this applies equally to GTF. The different concepts of LEGO pieces, e.g. "round" and "square", correspond to entities. Each specific piece of LEGO corresponds to an entity instance. A construction made of LEGO corresponds to a GTFGESMES file containing the entity instances and the definitions of the relationships between them. The main advantage of this way of thinking is that a relatively small number of abstract concepts covers a very wide range of concrete objects.

The data model described in this document is complete in the sense that all parent entities required to define a child entity are also defined as separate entities in the model, although the parent entities are abstract and are often included only to complete the framework. The entities that are actually used in a GTFGESMES transmission have a GESMES segment definition following the tabular definition of the attributes associated with the entity.

For example, the entity "TERMINATOR", which only captures the concept of "something that is the beginning or the end of something else", is abstract; the concrete concept is a "NODE", which is "a point in an infrastructure network". Thus, in a GTFGESMES transmission only NODEs should be transmitted, because the TERMINATOR information is known automatically. This information is implicit in the GTFGESMES file but explicit in the NODE definition in this specification: when a NODE is transmitted, the receiver (who also knows this specification) automatically knows that a TERMINATOR is the parent of the NODE.

One can transmit a TERMINATOR if needed, but a TERMINATOR is not concrete and can therefore be either a NODE or a ZONE. It is not possible to determine which solely on the basis of the TERMINATOR information. Whether a TERMINATOR's role is that of a NODE or a ZONE is explicitly contained in the corresponding entities in the data model. Thus, one TERMINATOR can be used as the parent of both a NODE and a ZONE if the information to be conveyed is: "This starting/ending point has the role of an infrastructure node and it is also the input/output point of a zone, i.e. the zone's centroid, in this network".
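The TERMINATOR/NODE/ZONE example can be sketched in a few lines. The sketch assumes a lookup table covering only the three entities named above; the table and the function names are hypothetical and are not part of the GTF specification. It shows how a receiver that knows the specification can derive the abstract parent from a transmitted concrete entity, and why the reverse direction is ambiguous.

```python
# Hypothetical excerpt of the entity tree: concrete entity -> abstract parent.
# Only the three entities named in the text are included.
PARENT = {
    "NODE": "TERMINATOR",
    "ZONE": "TERMINATOR",
}


def implied_parents(entity):
    """Return the chain of parent entities implied by one transmitted entity."""
    chain = []
    while entity in PARENT:
        entity = PARENT[entity]
        chain.append(entity)
    return chain


def possible_roles(entity):
    """List the concrete entities that an abstract entity could stand for."""
    return sorted(child for child, parent in PARENT.items() if parent == entity)


# Transmitting a NODE is enough: the receiver derives the TERMINATOR parent.
node_parents = implied_parents("NODE")

# Transmitting only a TERMINATOR leaves its role open: NODE or ZONE.
terminator_roles = possible_roles("TERMINATOR")
```

Because the parent can always be derived, sending it alongside the concrete entity adds nothing, whereas sending only the parent leaves the receiver with more than one possible role.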
These are called "topmost entities" or "top-levels". The top-level entities ("top-levels" = all entities in a tree from the topmost entity down to a "concrete") are abstract entities and are usually used only as the beginning of a structural tree. At the end of the trees one finds the concrete entities ("concretes"). The GTF specification defines which combination of entities should be used for actual transmission by defining GESMES segments, e.g. the NODE's segment specification or the PHYSICAL_SPECIFICATION-TECHNICAL-ENGINEERED's segment specification. One can also use the top-level entities in a transmission, but if a transmission contains both "top-levels" and "concretes", it carries redundant information, because a "concrete" always implies a parent "top-level". The "top-levels" can thus be transmitted without breaking any rule in this specification, but doing so would be unnecessary. On the other hand, it is plausible to transmit only "top-levels" if one only wants to define a very abstract network without connection to a concrete transport network. Even the "concretes" have been kept logically abstract enough to define many kinds of networks. For example, one can use the data model to define a network used only by trucks or one used only by cars.
The "top-levels" and the "concretes" can be combined using the defined relationships. A user of this data model instantiates a relationship by filling in an attribute that has been migrated to an entity through that relationship. For example, the "is in ZONE.ID" attribute of the NODE entity is migrated to the NODE because of the "is in ZONE" relationship between NODE and ZONE. This relationship can be used to record which ZONE a NODE lies within, e.g. the NODE "Hanover" (where Expo 2000 will take place) is within the ZONE "Niedersachsen".

In this data model, a relationship per se does not carry information in the way an entity does. All information in the data model is enclosed in entities; a relationship only associates two entities, and this association is the only information it adds. The data that describes a relationship (i.e. which two entities are associated) is stored as an attribute of an entity, and each relationship defines exactly one attribute. The entity attributes defined in the data model are therefore either generic to the entity or have been added to it because of a relationship. The topmost relationships are:
6.6. GTF GESMES Format
Once it is clear which entities to use for the transmission of a piece of
data, one needs to generate the corresponding GTFGESMES segments and
construct a complete GTFGESMES interchange transmission file.
Anything before or after these two segments is ignored by GTF
Translators. Between these two segments any number of messages can be
defined.
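As a rough illustration of how a reader might isolate the interchange, the sketch below assumes that the two delimiting segments are the standard UN/EDIFACT interchange header and trailer (UNB and UNZ) and that the default UN/EDIFACT service characters are in use (`'` as segment terminator, `?` as release character). The function names are hypothetical, and the segment contents in the sample are abbreviated placeholders, not a valid GTFGESMES message.

```python
def split_segments(text, terminator="'", release="?"):
    """Split text into segments at unescaped terminators.

    Text after the last terminator is not a complete segment and is dropped.
    """
    segments, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == release and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escaped pair intact
            i += 2
            continue
        if ch == terminator:
            segment = "".join(current).strip()
            if segment:
                segments.append(segment)
            current = []
        else:
            current.append(ch)
        i += 1
    return segments


def interchange_segments(text):
    """Return the segments from UNB to UNZ inclusive, ignoring the rest."""
    start = text.find("UNB+")
    if start == -1:
        return []
    kept = []
    for segment in split_segments(text[start:]):
        kept.append(segment)
        if segment.startswith("UNZ"):
            return kept
    return []  # no trailer found: the interchange is incomplete


# Placeholder interchange with surrounding text that should be ignored.
sample = (
    "free text, ignored\n"
    "UNB+UNOA:1+SENDER+RECEIVER'\n"
    "UNH+1+GESMES'\n"
    "UNT+2+1'\n"
    "UNZ+1+REF'\n"
    "trailing notes"
)
tags = [segment.split("+")[0] for segment in interchange_segments(sample)]
```

Here the text before `UNB` and after `UNZ` is discarded, and the message enclosed by `UNH` and `UNT` is what a translator would go on to interpret.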
A "segment block" comprises
These are the only segments needed to transmit data structured as specified in this document:
A transmission, e.g. a file, structured in this way is a valid GTFGESMES transmission. A GTF Translator that reads such a file needs to reconstruct the underlying GTF data model, filled with the data in the transmission. Once this is done, the complete information is available again on the receiver's side, which is the goal of the GTF specification.

The following GTF message is an example that defines a piece of data according to Infostat's "Area of the zone (km2), NUTS 3" specification. It also defines two ZONEs, "Karlsruhe Stadtkreis" and "Greater London", further information, and a flow between the two, a LINK-FLOW. The latter is specifically a link with computed attributes in the PHYSICAL-SPECIFICATION entities, which specify the link attribute "distance" to be 1000 km (UNIT-DIMENSION-LENGTH - value = 3).

Transmission:

As a concrete proof of concept, ME&P implemented a translator in Visual Basic and MKmetric implemented a translator in Java. UML was used in the design of the Java translator.

Other formats, e.g. GDF, XML and ICE, have been examined to check whether the chosen GESMES format could be replaced by any of these more modern formats. The examination identified possibilities for using one of the other formats instead of GESMES. For example, GDF, a very detailed format for capturing road features which therefore has its own detailed data model, could be used as an addition to GTF. The addition would specify the "road" mode of the GTF data model in much more detail, whereas models normally use aggregated information. However, since GTF is a data model that comprises generic model features such as "mode", it also specifies modes other than "road", e.g. "rail", "waterway" and "air". For consistency, this implies that either a mode-specific equivalent of GDF would have to be defined for each of the other modes, or GTF would use the GDF definition for road and other mode-specific definitions for all other modes (which would by default include the GTF definition of each mode). It should be noted, however, that the two data models, GTF and GDF, seem to be compatible, so it is only a matter of mapping concepts from one data model to the other.

Another format that was analysed is XML. The XML specification defines a flexible and extensible format but does not define any specific data model. Consequently, any data model can use XML as the concrete format in which to store and retrieve its data. For GTF this means that XML can be used instead of GESMES. The disadvantage is that XML is not as compact as GESMES, since it uses tags to mark up ("attach") the meta-data for each piece of data. Depending on the definition of the mark-up tags, an XML file can be much larger than an equivalent GESMES-formatted file. Given that communication between models generally involves a wealth of data, XML will in most cases result in larger amounts of data to transmit.

UML (Unified Modelling Language), which has emerged in recent years, was also considered as a data model definition notation instead of IDEF1X. It was rejected at the time because UML did not seem stable enough as a data modelling language.

Because these emerging formats are promising, it is advisable to follow their development and, when necessary or favourable, to add another format to the possible formats for concrete GTF files.
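The compactness argument about XML can be made concrete with a small comparison. Both encodings below are assumptions made for illustration: the segment tag `NOD`, the field names and the values are hypothetical, and neither output is a real GTFGESMES segment or an agreed XML schema. The sketch encodes the same record once in a positional, EDIFACT-style segment and once as XML with one named element per field.

```python
import xml.etree.ElementTree as ET


def as_segment(tag, values):
    """Encode a record positionally as TAG+v1+v2...' (no escaping handled)."""
    return tag + "+" + "+".join(values) + "'"


def as_xml(tag, fields):
    """Encode the same record as XML, one named child element per field."""
    root = ET.Element(tag)
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record; field names and values are placeholders.
record = {"id": "N1", "kind": "NODE", "zone": "Z1"}

segment = as_segment("NOD", list(record.values()))
xml_text = as_xml("NOD", record)
size_ratio = len(xml_text) / len(segment)
```

In the positional form the meaning of each field comes from the segment definition known to both sides, while the XML form repeats the field names in every record. That repetition is where the size difference comes from, and it grows with the length of the chosen tag names.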