CTM meeting notes

Kyoto, December 8-11 2007

Prof. Sam Oh tendered his resignation as editor because of the commitments involved in his new role as Chairman of SC34. Dmitry Bogachev was appointed as new editor and joins Lars Heuer, Gabriel Hopmans and Steve Pepper.

The following documents were considered by the working group:

The WG requests that the editors prepare a new draft of CTM by the end of January so that it is available to the editors of TMQL and TMCL who have been requested to produce FCD drafts by the end of February.

The WG recognizes that the number of changes proposed in this document may result in some anomalies and requests that these be discussed by the editors and, if necessary, on the WG3 mailing list.

Discussion of Norwegian comments (N945)

Delimiters for topic blocks

There was agreement on the need for a single delimiter for topic blocks. The use of a blank line was rejected for the reasons given in the comments. Instead of the period, which has low visibility, the WG recommends using curly braces to delimit the start and end of a topic block. This is familiar from Java, CSS and other languages. It also allows editors like Emacs, Eclipse and TextPad to assist the user in balancing delimiters, applying syntax colouring, etc. It is further recommended that a topic block consist of a header, in which all identifiers are defined, and a body, delimited by curly braces, containing statements. The syntax of each part is described below in further responses to comments and issues.

Delimiters for statements

There was agreement on the need for delimiters to separate statements within a topic block and the WG accepted the proposal in the Norwegian comment to use the semi-colon.

Lists of values

The WG agreed with the comment proposing to allow comma-separated lists of values for statements of the same type – and also for context-dependent template invocations (see below), prefix directives, and elsewhere.

Delimiting IRIs (1)

The WG agreed that IRIs that are the values of occurrences should always be delimited by angle brackets and should always be given as full IRIs, not as QNames. This will allow occurrences to be easily distinguished from template invocations in all but an insignificant number of cases. Example:

puccini = http://psi.ontopedia.net/Giacomo_Puccini {
   email     <mailto:gp@lucca.net>;
   homepage  <http://en.wikipedia.org/Giacomo_Puccini>;
   born-in   on:Lucca;
}

Binding subject identifiers to local identifiers

The WG agreed with the proposed syntax for binding subject identifiers to local identifiers and extended the proposal as follows:

The following code shows examples of topic headers both with and without bindings:

%prefix o = http://psi.ontopedia.net/
%prefix w = http://en.wikipedia.org/

!! single subject identifier using full IRI
http://psi.ontopedia.net/Composer {
   ! statements
}

!! single subject identifier using QName
o:Composer {
   ! statements
}

!! multiple subject identifiers
o:Composer, w:Composer {
   ! statements
}

!! subject identifier with binding to local identifier
composer = o:Composer {
   ! statements
}

!! subject locator without local identifier
~ http://www.example.com/somedir/myDocument.html {
   ! statements
}

!! subject locator with binding to local identifier
myDoc ~ http://www.example.com/somedir/myDocument.html {
   ! statements
}

!! item identifier without local identifier
# http://www.example.com/myMaps/myTopicMap.ctm#myTopic {
   ! statements
}

!! item identifier with binding to local identifier
myTopic # http://www.example.com/myMaps/myTopicMap.ctm#myTopic {
   ! statements
}

!! complex example with multiple identifiers of all kinds
myDoc = http://psi.ontopedia.net/MyDocument
      ~ http://www.example.com/somedir/myDocument.html
      # file:/usr/pepper/topicmaps/someMap.ctm#myDoc,
        file:/home/jaeho/maps/someOtherMap.xtm#thisDoc {
   ! statements
}

Reduction of cryptic delimiters

The delimiters proposed above (=, ~, #) for subject identifiers, subject locators and item identifiers were felt to be less cryptic and therefore acceptable. Their adoption requires some changes to the syntax of reification and comments as discussed below.

Discussion of Revised CTM - Issues (N934)

Item identifier syntax

This issue became obsolete as a result of the discussion of the Norwegian comments. The proposed syntax is as shown above (using the pound sign).

Multiline comments

The new proposed syntax for single-line comments is the double bang (!!), as illustrated above. The proposed syntax for multiline comments is:

!* ... *!

At first the standard Java convention (// and /* ... */) was considered, but this was rejected on the grounds that the syntax for comments should be the same in CTM and TMQL, and the slash cannot be used in TMQL because of its use in path expressions. Instead, the slash is replaced by the bang, which is not used for any other purpose in either CTM or TMQL.

Reifier syntax

Since the tilde is now used to flag a subject locator, it was decided to use the double tilde (~~) to mark a reifier. This is still reminiscent of the TMQL syntax. However, the TMQL operator (<~) is a “step”, which implies navigation. Navigation is not appropriate in CTM, and it was therefore deemed correct to have a slightly different syntax.

Template imports

The WG upheld the decision taken in Montreal to remove the template import functionality. Only include and mergemap are felt to be necessary, and the following issues were clarified regarding prefixes and templates, both in general and when including or merging maps:

In general:

Regarding %include:

Regarding %mergemap:

Finally, it was suggested that %mergemap be renamed %merge for consistency with %include. The decision on this is left to the editors.

'undef' literal

The WG considers undef to be unnecessary. It should be removed from CTM.

Angle brackets around IRIs

As noted above, it was decided that angle brackets must be used to delimit IRIs that are the values of occurrences. In addition, angle brackets must be used around so-called “rootless” IRIs, such as those belonging to schemes like mailto and urn, in the (unlikely) event that they are used as subject identifiers, subject locators or item identifiers in a topic header.

Meaningful whitespaces

This issue was resolved by the introduction of curly braces to delimit topic blocks.

Unicode escape sequences

The WG was not able to resolve this issue. The editors are requested to investigate it further and make a recommendation.

QNames vs. IRIs

IRIs conforming to “rootless” schemes (i.e., those that do not have a slash after the colon) must always be delimited by angle brackets. QNames are never delimited by angle brackets. The IRI foo:bar will be represented as <foo:bar> and interpreted as an IRI belonging to the “foo” scheme. On the other hand, foo:bar without angle brackets will be interpreted as a QName with the prefix “foo” and the local part “bar”. Since the local part of a QName cannot start with a slash, any IRI belonging to a scheme that requires a slash after the colon will always be interpreted as an IRI, whether it is delimited by angle brackets (i.e., when used as the value of an occurrence) or appears without delimiters (i.e., when used as an identifier in a topic header).
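The disambiguation rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in these notes, not parser code from any CTM implementation; the function name classify_reference is hypothetical.

```python
def classify_reference(token: str) -> str:
    """Return 'iri' or 'qname' for a token, following the rule in the notes.

    Hypothetical helper: the name and return values are assumptions
    made for illustration only.
    """
    # Anything in angle brackets is always an IRI, rootless or not.
    if token.startswith("<") and token.endswith(">"):
        return "iri"
    prefix, colon, local = token.partition(":")
    if not colon:
        raise ValueError(f"not an IRI or QName: {token!r}")
    # The local part of a QName cannot start with a slash, so an
    # undelimited token with a slash after the colon must be an IRI.
    if local.startswith("/"):
        return "iri"
    return "qname"
```

Note that an undelimited rootless IRI such as mailto:gp@lucca.net would be classified as a QName by this sketch, which is exactly why the notes require angle brackets around rootless IRIs.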

Discussion of Revised draft (N935)

The following issues related to changes made between the CD draft and the revised draft of 2007-11-16 were discussed and resolved.

Colons

With the introduction of semicolons and curly braces the syntax now provides clear visual clues about the start and end of each statement. It is therefore no longer necessary to use colons to mark the type of a statement: the type is simply whatever comes immediately after either an opening curly brace or a semicolon. Removing colons from this part of the grammar has the additional advantage of avoiding overloading since colons are now used only in QNames.
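The idea that the statement type is whatever follows an opening curly brace or a semicolon can be illustrated with a deliberately naive Python sketch. It assumes a single topic block with no quoted strings or comments containing semicolons or braces; the function name statement_types is hypothetical.

```python
def statement_types(topic_block: str) -> list:
    """Return the first token of each statement in a topic block body.

    Naive sketch: splits on every semicolon, so it does not handle
    semicolons inside strings, IRIs or comments.
    """
    # The body is everything between the first '{' and the last '}'.
    body = topic_block[topic_block.index("{") + 1 : topic_block.rindex("}")]
    types = []
    for statement in body.split(";"):
        tokens = statement.split()
        if tokens:  # skip the empty piece after the final semicolon
            types.append(tokens[0])
    return types


example = """
puccini = http://psi.ontopedia.net/Giacomo_Puccini {
   email     <mailto:gp@lucca.net>;
   homepage  <http://en.wikipedia.org/Giacomo_Puccini>;
   born-in   on:Lucca;
}
"""
print(statement_types(example))
```

No colon is needed anywhere in the body for the types to be recovered, which is the point made in the resolution above.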

Template definitions

The template definition syntax was revisited and it was decided to avoid the use of reserved words like def and end (and the consequential intrusion into the user's name space). The solution is to use “%” to flag any kind of keyword, not just directives, and to use curly braces to mark the start and end of a template body. Production [56] of the 2007-11-16 draft should therefore be changed to the following:

template  -> '%define' template-name '(' parameters?  ')' '{' template-body '}'

Template invocations

The WG reviewed the decision taken in Montreal to use colons after template invocations within topic blocks in the light of decisions taken on other issues. The intent of the Montreal decision was to achieve a more homogeneous treatment of statements within a topic block. The same intent can be realized by not using colons at all. It was therefore decided to return to the Montreal syntax – albeit without the colons – and not to approve the changes made in the 2007-11-16 draft.

The form that should be taken by template invocations within a topic block was discussed at some length and it was recognized that there are different requirements in different languages. For SVO languages, including English, it is useful to be able to assert a binary association within a topic block (where the “subject” topic is given by the context) by simply stating the association type and the identifier of the “object” topic. However, for other languages, such as Korean, this does not work very well.

It was therefore decided that there should be three forms of template invocation:

Given the following template definition:

%define born-in( $person, $place ) {
   o:born_in( o:Person : $person, o:Place : $place)
}

all four of the following invocations result in exactly the same association:

puccini = o:Giacomo_Puccini {
   born-in lucca;             !! compact invocation
   born-in( puccini, lucca ); !! context-free invocation (1)
   born-in( _, lucca );       !! context-dependent invocation
}

born-in( puccini, lucca )     !! context-free invocation (2)
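As a rough illustration of why these invocations are equivalent, the following Python sketch normalizes each form to a template name and a full argument list. It assumes, based on the discussion above, that the compact form supplies the context topic as the first argument and that the underscore stands for the context topic; the function name expand_invocation is hypothetical and the parsing is intentionally simplistic.

```python
import re

# Matches "name( arg, arg, ... )"; anything else is treated as compact form.
_CALL = re.compile(r"^\s*([^\s()]+)\s*\((.*)\)\s*$")


def expand_invocation(statement, context=None):
    """Normalize a template invocation to (template_name, arguments)."""
    match = _CALL.match(statement)
    if match:
        name = match.group(1)
        args = [arg.strip() for arg in match.group(2).split(",")]
        if "_" in args:
            # Context-dependent form: "_" is replaced by the context topic.
            if context is None:
                raise ValueError("'_' used outside a topic block")
            args = [context if arg == "_" else arg for arg in args]
    else:
        # Compact form: "template object", subject taken from the context.
        if context is None:
            raise ValueError("compact invocation used outside a topic block")
        name, obj = statement.split()
        args = [context, obj]
    return name, args


forms = [
    expand_invocation("born-in lucca", context="puccini"),
    expand_invocation("born-in( puccini, lucca )", context="puccini"),
    expand_invocation("born-in( _, lucca )", context="puccini"),
    expand_invocation("born-in( puccini, lucca )"),
]
```

Under these assumptions every entry in forms is the same pair, so each invocation would expand the template body with identical arguments.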

Discussion of other issues

A number of other issues arose during the discussion. Their resolution is described below.

General escaping mechanism for CTM delimiters

The WG agreed on the need for a general escaping mechanism for CTM delimiters, in order to solve the problem that IRIs, which are widely used in CTM documents, either in their full form or as QNames, may contain characters that CTM uses as delimiters. What is needed is something equivalent to CDATA marked sections in XML, i.e. a means of suppressing delimiter recognition.

After some discussion it was agreed to use single quoted strings for this purpose. When the parser encounters the start of such a string, delimiter recognition is turned off until the end of the string is encountered. The specification should state exactly which delimiters this applies to (at a minimum , ; {} () ~ and @). The single quote is escaped within a single quoted string by the use of three consecutive single quotes. The following example shows how the escape mechanism might be used to handle the problem of subject identifiers that contain commas:

!! escape the comma in "http://psi.example.org/Person,Composer"
!! NB without the single quotes there would be a syntax error
http://psi.example.org/'Person,Composer',
http://psi.ontopedia.net/Composer {
   - "Composer";
}
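The effect of suppressing delimiter recognition can be sketched in Python as follows. This is a minimal illustration of the mechanism described above, limited to one delimiter at a time and assuming that three consecutive single quotes inside a quoted string stand for one literal quote; the function name split_outside_quotes is hypothetical.

```python
def split_outside_quotes(text: str, delimiter: str = ",") -> list:
    """Split text on delimiter, ignoring delimiters inside single quotes."""
    parts = []
    current = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes and text.startswith("'''", i):
            # Three quotes inside a quoted string: one literal quote.
            current.append("'")
            i += 3
        elif ch == "'":
            # A lone quote opens or closes a quoted string.
            in_quotes = not in_quotes
            i += 1
        elif ch == delimiter and not in_quotes:
            parts.append("".join(current).strip())
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    if in_quotes:
        raise ValueError("unterminated single-quoted string")
    parts.append("".join(current).strip())
    return parts


header = ("http://psi.example.org/'Person,Composer',\n"
          "http://psi.ontopedia.net/Composer")
identifiers = split_outside_quotes(header)
```

With the header from the example above, the sketch yields two identifiers, the first of which keeps its embedded comma.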