ISO/IEC JTC 1/SC 34
Document Description and Processing Languages
Secretariat: Japan (JISC)
| DOC. TYPE | Other document |
| TITLE | Issue List for TMQL, CD 18048 - Information technology - Topic Maps - Query Language (TMQL) |
| SOURCE | Project Editors: Mr. Robert BARTA, Mr. Lars Marius GARSHOL |
| PROJECT | JTC 1.34.18048 |
| STATUS | For information and review by WG 3 members. |
| ACTION ID | FYI |
| DUE DATE | |
| DISTRIBUTION | P, O and L Members of ISO/IEC JTC 1/SC 34; ISO/IEC JTC 1 Secretariat; ISO/IEC ITTF |
| ACCESS LEVEL | Open |
| ISSUE NO. | 24 |
| FILE | |
Secretariat ISO/IEC JTC 1/SC 34
- IPSJ/ITSCJ (Information Processing Society of Japan/Information Technology Standards Commission of Japan)*
Room 308-3, Kikai-Shinko-Kaikan Bldg., 3-5-8, Shiba-Koen, Minato-ku, Tokyo 105-0011 Japan
*Standard Organization Accredited by JISC
Telephone: +81-3-3431-2808;
Facsimile: +81-3-3431-6493;
E-mail: kimura@itscj.ipsj.or.jp
2008/04/28
Abstract
This live document contains all current issues with the current TMQL specification. Please feel free to send feedback directly to the author or to the TMQL mailing list.
Impact on Language: low
=> What is missing from the current list?
- double-check that all tuple sequence functions are there
- dicing and slicing ... fold, unfold? ....
- vertical (columns), horizontal (lines)
=> should operators be directly mentioned in the grammar?
- '+', '-', '/', ....
- or keep there binary infix ops, unary prefix
Advantage: all information pertinent to a function is in ONE spot,
perfect for extensions an implementation might offer
Disadvantage: more conservative readers may expect a hard-coded grammar
=> do we need to say something about the precedence of operators?
- well, obviously
- this is just a 'precedence' occurrence with a value
- and the normal left to right order will apply
=> do we need unary postfix operators?
- may easily collide with path expression postfixes
=> is it a problem that all functions are "vertical"?
- all functions in TMQL take a TS and return one
- some will need to consume the whole TS and work with it
- others only need it tuple-by-tuple
- tmql:concat does it too => so what?
=> should all functions be forced to be "stable", i.e. maintain order?
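The idea noted above, that operators are not hard-coded in the grammar but looked up together with a precedence value, with left-to-right order for equal precedence, can be sketched in Python as follows. The table OPERATORS, its precedence numbers and the function evaluate are assumptions made for illustration; none of them is taken from the TMQL draft.

```python
import operator

# Hypothetical operator table: symbol -> (precedence, implementation).
# In the notes above the precedence would be an occurrence value on the
# topic describing the operator; the numbers here are invented.
OPERATORS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}

def evaluate(tokens):
    """Evaluate a flat list of numbers and binary infix operator symbols."""
    pos = 0

    def parse_atom():
        nonlocal pos
        value = tokens[pos]
        pos += 1
        return value

    def parse_expr(min_prec):
        nonlocal pos
        left = parse_atom()
        # Operators of equal precedence associate left to right.
        while pos < len(tokens) and OPERATORS[tokens[pos]][0] >= min_prec:
            prec, func = OPERATORS[tokens[pos]]
            pos += 1
            right = parse_expr(prec + 1)
            left = func(left, right)
        return left

    return parse_expr(0)

print(evaluate([2, "+", 3, "*", 4]))   # 14
print(evaluate([10, "-", 4, "-", 3]))  # 3
```

Because the parser only consults the table, an implementation could add an extension operator by adding one entry, which is the advantage listed above.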
RESOLVED
Impact on Language: low
This was already decided in Leipzig 2007, but the precise mechanism has not yet found its way into TMQL. So how should it be done? Syntactically like CTM? And which prefixes should be predefined?
%prefix dc http://purl.org/....
%prefix tmdm http://www.isotopicmaps.org/.....
RESOLVED
Impact on Language: medium
TMQL allows topic map content to be emitted. This is controlled in the RETURN clause as TM content, written in a notation which is compatible with CTM. The question is how it would look and what particular constructs mean at certain positions in the CTM stream.
for $p in // person
where
$p isa killer
return """
# CTM code
patrick
- : "Patrick Durusau"
homepage: ....
shoesize : .....
new-killer
- name: {$p / name}
{$p} # ???
{$a} # ???
"""
Ideas:
{$p} # get the "whole" topic copied over
{$p / *} # names + occs
{$p ~} # subj identifiers
{$p =} # address
{$p!} # identifier
{$p / nickname} # occurrences of this type
{ $p / name } # names of this type
Things to watch out:
Continue to use the """ ... """ wrapper, .... disambiguate between {$p} and {$p !}, de facto we introduce a new axis "give me the item identifier" ! $p in the {} context means "the WHOLE thing"
{$p} # a topic = SIs + names + occs + types
???? if possible, should be like in SELECT ????
? what should individual constructs render?
! consensus:
{$p} # get the "whole" topic copied over
{$p / *} # names + occs
{$p ~} # subj identifiers
{$p =} # address
{$p!} # item identifier(s)
{$p / nickname} # occurrences/names of this type
{ $p / name } # names of this type
{ $a -> wife } # include all the wives in the result map
{$a} # an assoc = all its roles (but nothing else)
*x # named anonymous
homepage: {$a / shoesize} # stringification, typeless
homepage: {$a / shoesize}^^xsd:URI
RESOLVED
Impact on Language: low
RESOLVED
Impact on Language: low
RESOLVED
Impact on Language: low
select $person
where
is-employed-by (employer: tm:subject, employee: $person)
# or even
* (employer: rho, employee: $person)
It is felt that it is redundant as there is a variable $_ which can serve the same purpose.
is-employed-by (employer: $_, employee: $person)
RESOLVED
Impact on Language: low
if $p > age then
return null
else
....
# create a default value if there is no homepage
select $p / name, $p / homepage || undef
where
$p isa person
Instead of the undef, any other value would do, but undef seems to be a nice standard way. Compare it with:
for $p in // person
return
for $name in $p // name
return
if $p / homepage then
($name, $p / homepage)
else
($name, "sorry no")
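The effect of the `||` default in the SELECT form can be imitated in Python as below. This is only a sketch: the helper with_default, the dictionary layout and the sample people are invented for illustration, and "undef" is represented by a plain string.

```python
def with_default(values, default):
    """Return the values unchanged, or a one-element list holding the default."""
    values = list(values)
    return values if values else [default]

# Hypothetical data: person -> characteristics (each a list of values).
people = {
    "tbl": {"name": ["Tim Berners-Lee"], "homepage": ["http://www.bigtim.com"]},
    "patrick": {"name": ["Patrick Durusau"], "homepage": []},
}

# Roughly: select $p / name, $p / homepage || undef where $p isa person
rows = [
    (name, homepage)
    for person in people.values()
    for name in person["name"]
    for homepage in with_default(person["homepage"], "undef")
]

for row in rows:
    print(row)
```

Without the default, the second person would contribute no row at all, since there is no homepage to combine with the name.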
RESOLVED
Impact on Language: high
# for Literal values:
"http://whatever.com" ~ # identifier use
"http://whatever.com" = # address use
# subject identification
http://whatever.com # IRI as subject identifier
# if containing dangerous characters
< dangerous > # IRI as subject identifier
RESOLVED
Impact on Language: high
In Leipzig 2007 LMG proposed to think about a syntax to quickly jump/traverse over assocs. While there was no proposal yet, the basic idea is to still allow the canonical
$p <- author [ ^ is-author-of ] -> opus
while also adopting a 'property' approach
$p <-> is-author-of # one step
#expands to
$p ( . <- * [ ^ is-author-of ] -> *, .) [ not $0 == $1 ] ($0)
Property navigations can be easily combined:
$p <-> ( is-author-of | employed ) # alternatives
# expands to
$p <-> is-author-of ||| $p <-> employed
And there is no need to further extend TMQL. The current semantics can host this.
The problem with all that is that TMQL has NO knowledge about properties or additional knowledge whether some properties are reflexive or transitive. This is all ontological knowledge, that was banned from TMQL. For good reasons.
Alternative A: wait for an ontology language to come along, allowing to say
is-author-of isa reflexive-property .
Alternative B: leaving it to the TMQL environment to allow to specify it
my $q = new TM::QL ('select .....', { reflexive => [ 'is-author-of' ] });
And there is also always the alternative that deployments procure special-purpose functions for traversal:
tmql:traverse ($p, 'whatever is there to describe a path');
? Fast traversing with properties, such as $p <-> ( is-author-of | employed )
+ easy to define semantics
+ ontology-neutral fallback (works even if we know nothing about the properties)
- impossible to express reflexivity and transitivity inside TMQL
+ that is a feature, not a bug
- but it means that apps need a way to signal this to a processor
- yes, but this is actually in a (reasoning) layer which the processor will NOT see
! by editorial edict: adopt it
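The ontology-neutral reading of `$p <-> is-author-of` (find associations of that type in which the topic plays any role, then return the other players) can be sketched in Python. The association layout, the sample data and the function names are assumptions for illustration, and combining alternatives is assumed here to mean concatenating the individual results.

```python
def traverse(topic, assoc_type, associations):
    """Return the players reached from `topic` through associations of `assoc_type`."""
    reached = []
    for assoc in associations:
        if assoc["type"] != assoc_type:
            continue
        players = [player for _, player in assoc["roles"]]
        if topic not in players:
            continue
        # Mirror the [ not $0 == $1 ] filter: drop the starting topic itself.
        reached.extend(p for p in players if p != topic)
    return reached

def traverse_any(topic, assoc_types, associations):
    """Alternatives: concatenate the results for each association type."""
    result = []
    for assoc_type in assoc_types:
        result.extend(traverse(topic, assoc_type, associations))
    return result

# Hypothetical map content: (role type, player) pairs per association.
associations = [
    {"type": "is-author-of", "roles": [("author", "alice"), ("opus", "book-1")]},
    {"type": "employed", "roles": [("employee", "alice"), ("employer", "acme")]},
    {"type": "is-author-of", "roles": [("author", "bob"), ("opus", "book-1")]},
]

print(traverse("alice", "is-author-of", associations))
print(traverse_any("alice", ["is-author-of", "employed"], associations))
```

Note that the role types are never inspected, so the same call works from the opus back to its authors. Nothing here knows whether a property is reflexive or transitive, which matches the position that such knowledge belongs to a separate layer.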
RESOLVED
Impact on Language: medium
TMQL allows to generate XML content, such as in
return
<whatever>
{$p}
</whatever>
The standard (4.9) says "an XTMified version 2.0" representation of the content is generated, but what does it mean (a) for a topic item or (b) for an assoc item?
Ideally, this should be the same as for the CTM case, although there are positional constraints, i.e. it makes a difference _where_ things are to be blended in. In the XML case here, it is XML left and right.
If someone needs more control on the extent of the fragment, then this is the space where extension functions have to be used:
bouvet:xhalo (.....)
There is not much TMQL can do to help here.
? XML embedding of a topic item
! XTM 2.0 serialization of that topic
= identification + types + names + occs
? XML embedding of an association item
! XTM 2.0 serialization of that association
= assoc + type + roles + scope
? XML embedding of an occurrence item
! XTM 2.0 serialization
= type + scope
? XML embedding of anything else
! as string
RESOLVED
Impact on Language: medium
We figured in Leipzig 2007 that there should be (at least) two modes under which a TMQL query can be run:
(I) interpreting "subclass of" really 'semantically' => transitively, reflexively
(II) interpreting it as normal assoc: what you see is what you get
What was not clear is whether (I) and (II) may happen concurrently in the same query.
In fact, this issue does not affect TMQL itself: any kind of taxonometric (and other) reasoning is handled in a layer underneath. How "subclassing" is interpreted is then actually a matter of how it is mapped into TMRM.
What remains is that either the API allows control over that mapping:
my $q = new TM::QL (....., mapping => 'tmrm-no-taxonomy');
or via a directive inside the query:
%interpretation as-is # so
%legend taxonometric # or so
Note that maybe there are more ways.
RESOLVED
Impact on Language: low
In LMG's issue list there was a remark that there is no "reverse reification" axis. That is not true, as reification is just like any other axis, so it can be traversed in both directions.
What is true is that there is only the "forward" way as shortcut
stalin-dictatorship ~~> -> dictator
A reverse version obviously would be <~~
stalin <- dictator <~~ # this gets the association, then the reifier
$p / name <~~ # reifier of any name(s)
RESOLVED
Impact on Language: medium
XML content can appear wherever content is allowed. LMG mused whether it is reasonable to forbid it, for instance here:
select $x from XML....
where
....
if it does not have an (obvious) meaning.
The question is also how this is different from
select $x from file:map.xml ~~>
where
....
or
select ....
from file:graph.rdf ~~>
or
select $x from <xtm:topic >....</xtm:topic>
where
....
While it is quite interesting to think how these "incoming" things can be given a TM-ish meaning, we can also take a simplistic view and interpret XML content as what it is there: a literal (or a list thereof). Of course any navigation going from there will not likely lead to anywhere. Who knows.
RESOLVED
Impact on Language: low
It was argued that in the grammar the "anchor" concept in navigation is confusing. Fact, though, is that it simplifies presentation as it allows to treat occurrences and names as one.
So maybe it could be removed, but it is unclear what the presentational costs are.
RESOLVED
Impact on Language: low
Allegedly SQL92 allows control over the "string collation", i.e. the modalities of how strings are compared. This is particularly important for language-dependent sorting where one needs to override Unicode code point ordering. The question is whether it is necessary that TMQL allows control over it.
The alternative is that vendors provide either special sorting functions, or even special string subtypes. The relationship to CXTM was mentioned but remains unclear.
RESOLVED
Impact on Language: low
At this stage TMQL has no navigational facility to reach to and into variants. There never has been a use case for this, or for variants in the first place. Maybe variants may also disappear in the future, so maybe their use should not be encouraged.
There will be functions to extract variant items.
RESOLVED
Impact on Language: medium
TMQL does use 'select', 'for', ... as "reserved" words. They are never declared as such, because there is no need to. The question arose whether this will lead to conflicts with topics of the same name.
Rephrased, the question is whether there are situations in the grammar that are ambiguous, in that a topic reference can appear where, say, a 'select' is also expected.
There are (and have to be), as a path expression can always start with any topic
select / name
The way this is resolved is in having the alternative with the 'keyword' always first. To override if you really have a topic 'select', then
( select / name )
should do the trick, as it is now explicitly a path expression.
RESOLVED
Impact on Language: low
After a loooooong discussion it was decided that CTM hosts a feature called templates (http://www.semagia.com/tmp/ctm.html#sec-template). The basic idea is that arbitrary topic map constructs can be captured with a template first, without having any impact on the map to be generated. The template can have formal parameters. Later, templates can be invoked, provided with parameters and so inject chunks of content into the generated map.
Technically templates are functions (weren't the 80ies great?), although templates cannot be recursive as there are no conditionals in CTM.
Ontologically speaking, a template is a newly introduced terminus. It is a pattern which will occur in the map as often as the template is invoked. So, it can be argued, templates are a poor-man's solution to ontology engineering. And ontology engineering should be done in an ontology engineering language.
So, as long as there is no available ontology definition language for TMs, we will run into these problems, maybe solving them in as many ways.
RESOLVED
Impact on Language: low
CTM uses the symbol ~ to indicate reification of TM items. The tilde does not incorporate a direction, so the convention is that the thing on the left of ~ is reified by a topic (ref) on the right.
Oh no! There is also a topicmap reifier which works completely differently, although it is doing the same thing:
~ whatever -: "This is a strange planet"
introduces a full topic (not only a topic-ref) whatever which reifies the map. There is certainly a good explanation for this inconsistency.
TMQL uses the /reifier/ axis, abridged by ~~> in forward direction. Yes, that's it.
RESOLVED
Impact on Language: medium
In the Leipzig meeting it was discussed that the EVERY clause is only a syntactic variation of the SOME clause. Every EVERY can be transformed into an equivalent form using SOME:
every $p in // person
satisfies $p / born
==> not
some $p in // person
satisfies not $p / born
And hence, EVERY can be removed.
Following this argumentation much of the language can be removed. Also SOME is just a variation of FLWR:
some $p in // person
satisfies $p / born
==>
for $p in // person
where
$p / born
return
$p
And FLWR expressions can be transformed into path expressions, etc.
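The two rewritings above can be checked on a small example. The following Python sketch uses invented function names and sample data; it only illustrates the logical equivalences, not how a TMQL processor would evaluate the clauses.

```python
def some(items, predicate):
    """SOME: at least one item satisfies the predicate."""
    return any(predicate(item) for item in items)

def every(items, predicate):
    """EVERY, rewritten as: not SOME item satisfies (not predicate)."""
    return not some(items, lambda item: not predicate(item))

def some_as_flwr(items, predicate):
    """SOME as a FLWR-style filter: true if the filtered sequence is non-empty."""
    return len([item for item in items if predicate(item)]) > 0

# Hypothetical persons; `born` is None when no birth date is recorded.
persons = [{"name": "a", "born": "1955"}, {"name": "b", "born": None}]

def has_born(person):
    return person["born"] is not None

print(every(persons, has_born))         # False: one person has no birth date
print(some(persons, has_born))          # True
print(some_as_flwr(persons, has_born))  # True
```

The double negation in every is exactly the part that users would otherwise have to write by hand.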
? Removal of EVERY clause
+ minimally smaller language
- users have to twist their brains to get the NOT right
! Oslo 2007: consensus: rejected
? Removal of SOME clause
+ it is just a convenience
- the symmetry is something which people expect
! Oslo 2007: vote (1/many) rejected
RESOLVED
Impact on Language: high
TMQL contains a number of sections which can be argued to actually belong in other TM standard documents. They have only been added to make TMQL work.
- Atoms and Identifiers -> CTM
- Navigation -> TMDM
- Environmental Clause -> CTM
- Predicates -> TMCL/CTM
- Ontologies -> TMCL/CTM
- Predef Types & Functions -> CTM/TMCL
If these were factored out, the TMQL specification would be reduced by roughly 25%.
? Refactor TMQL standards part into TMDM, CTM and TMCL?
+ cleaner specification landscape
- TMDM is put in stone already
+ CTM and TMCL are close to finalization, do it now or never
! Oslo 2007: refine it as below
? Refactor atom handling into CTM
= which atoms are there, what do they look like
= would mean that this is defined in CTM
= would mean that TMQL is just reusing the definitions
! Oslo 2007: accepted by consensus
? Refactor inferencing
= at the moment inferencing is "predicates", i.e. associations only
not general (inferring values for occurrences, for example)
= taxonometric reasoning is implicitly defined by TMDM, should stay in TMQL
= means: refactor into what?
= new language for ontology definition (TIMBL, TOWL), or
- may take time to jump-start
- but Patrick says, this can be relatively fast
= squeeze it into TMCL?
- constraints and ontological info not the same
- still: it means, that TMQL must ___ALLOW___ to specify onto info inline
+ huge benefit: TMQL is itself inferencing-agnostic and is operating on
virtual map
+ different people have different needs
! Oslo 2007: accepted by consensus
? Refactoring ontology
= means how to define that something is a TMish resource (can be anything)
+ yes, this does not affect TMQL at all
- but there still needs to be a way to 'access' ontologies (as topics)
= means a 'prefix' mechanics
= AND!!!! means that if it is a topic it can be treated as such
= can go into TIMBL/TMOL
! Oslo 2007: accepted by consensus
? Refactor 'Navigation'
= Navigation is effectively the expectation a TMQL processor
has about the map it navigates through.
? refactor where to?
= into TMDM annex?
- but this may be difficult as TMDM is '60.60'
= into separate document?
- does this have a value by itself?
+ maybe only for architects
! Oslo 2007: rejected by vote
= keep navigation axes semantics inside TMQL
RESOLVED
Impact on Language: low
Some approaches for a TM query language have introduced language features to extract all or a particular part of the types of a topic, usually via a distance operand:
types ( cat , 3 ) # get the type, its super types, and theirs
types ( cat, * ) # get all of them
TMQL does not support this, mainly because using distances in the type structure in a query might be brittle as the type structure might be refined later.
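Why a distance operand is brittle can be seen in a short Python sketch. The function types_within and the mapping supertypes_of are invented for illustration; a single mapping stands in for both the instance-of and the subclass-of relationships, and distance 1 is assumed to mean the direct types.

```python
def types_within(topic, distance, supertypes_of):
    """Collect the types reachable from `topic` in at most `distance` steps.

    `distance=None` plays the role of `*`: follow the hierarchy to the top.
    """
    found = []
    frontier = [topic]
    steps = 0
    while frontier and (distance is None or steps < distance):
        next_frontier = []
        for current in frontier:
            for parent in supertypes_of.get(current, []):
                if parent not in found:
                    found.append(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
        steps += 1
    return found

# Hypothetical type structure.
original = {"cat": ["feline"], "feline": ["mammal"], "mammal": ["animal"]}
# The same structure after someone refines it with an intermediate type.
refined = {"cat": ["feline"], "feline": ["carnivore"],
           "carnivore": ["mammal"], "mammal": ["animal"]}

print(types_within("cat", 3, original))  # ['feline', 'mammal', 'animal']
print(types_within("cat", 3, refined))   # ['feline', 'carnivore', 'mammal']
```

After the refinement the query with distance 3 silently stops returning animal, although nothing about cats has changed.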
RESOLVED
Impact on Language: low
TMQL allows to follow the 'reification' axis in both directions. It leads from a topic to the item it reifies or the other way 'round. For the forward direction there is already a shortcut ~~> defined.
When a reification is declared in CTM, a symbol has to be used there as well, one candidate being <~~. The question is now whether TMQL should have this as another shortcut:
<~~ ==> << reifier
RESOLVED
Impact on Language: medium
TMDM defines the concept 'subject', but it says nowhere that all things in the map (topics and associations, for instance) are instances of 'subjects'. And it also does not say that all types are subtypes of 'subject'.
The problem is that TMQL should offer users a 'catchall' concept, such that they can express a "don't care" semantics:
select $person
where is-employed-by (* : tbl, organisation: $o) &
$o isa ....
And the TMQL semantics of course also needs access to 'all the things in a map'.
What does this * amount to now? Is it identical to tm:subject? And are there specialized terms for (a) topics, (b) associations, (c) occurrences?
? Does tm:subject return all items in the map?
+ tm:subject is a great placeholder
- TMDM does not say it (or does it?)
- now it does, via the TMDM -> TMRM mapping
! fixed in TMRM appendix B
? How do topics, associations, .... relate then to tm:subject?
! defined in TMRM appendix B
RESOLVED
Impact on Language: medium
At the moment quantifiers in TMQL are only of a EVERY or SOME nature, there is no way to say "give me all hands with at least 5 fingers".
It is believed, that such a feature is required by TMCL and since TMCL will define its modelling patterns with TMQL, there is some issue here.
So the idea would be to allow
select $person
where
at least 5 $finger in $person <- * -> finger
satisfies
$finger / status == "functional"
or
select $person
where
exists at most 2 $finger in $person <- * -> finger
This functionality can be mapped onto existing TMQL, although doing so manually may be cumbersome. The first would be transformed into
# there is one finger, satisfying this
# and another finger, satisfying this
# and another
#
# and the fifth one
The other would be transformed into the logical equivalent "the person does not have at least 3 fingers satisfying the condition".
? Should TMQL get quantifiers AT MOST integer, AT LEAST integer?
+ can be mapped into canonical TMQL by the processor
- can be expensive to compute
+ TMCL would profit from that
- a function fn:count (....) >= 10 has the same effect, so why introduce new keywords
- not true: fn:count has to produce ALL results, count them and then do compare
= an 'at least' expresses more what a user needs
+ can be used for optimization
! Oslo 2007: vote (7 yes, 1 no)
= editors will present a proposal in the next draft
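The optimization argument made above (a count has to produce ALL results, whereas "at least" can stop early) can be sketched in Python. The function names are invented for illustration, and AT MOST n is treated as the negation of AT LEAST n + 1.

```python
from itertools import islice

def at_least(n, items, predicate):
    """True if at least n items satisfy the predicate; stops at the n-th match."""
    matches = (item for item in items if predicate(item))
    return len(list(islice(matches, n))) >= n

def at_most(n, items, predicate):
    """AT MOST n is the negation of AT LEAST n + 1."""
    return not at_least(n + 1, items, predicate)

def is_functional(status):
    return status == "functional"

# Hypothetical finger statuses of one person.
fingers = ["functional", "broken", "functional", "functional"]

print(at_least(3, fingers, is_functional))  # True
print(at_most(2, fingers, is_functional))   # False
```

Since islice stops pulling from the generator after the n-th match, a lazily produced sequence of candidates is only evaluated as far as needed, which a comparison against a full count cannot do.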
RESOLVED
Impact on Language: low
At the moment, TMQL does not allow to retrieve item identifiers within a query. It is still possible to write
select $person
where
$person isa person
and that will produce a sequence of items; whether these are passed as items or item identifiers back to the calling application, TMQL does not say. Also when items are used in FLWR expressions, sometimes their id is used, depending on the context where they are embedded.
What does not work (yet) is to say
select $person >> id
mainly because it is unclear whether there is a use case for it.
Inside a query, item identifiers could be used for comparing
where
$person >> id == 'whatever'
but this can be expressed less baroquely as
where
$person == whatever
What could be interesting is to learn about the item identifiers of associations and characteristics
// is-employed-at >> id
or to create a string from the item identifier
select "this:is-all-weird#" + $person >> id
RESOLVED
Impact on Language: medium
'true' and 'false' have two meanings. One is that they are just values of a data type boolean, in the same way as 3, 3.14 or "sunshine" are values of other types.
So in this sense
select $person
where
exists 3.14
will return all persons as there always exists the constant 3.14. But what about
select $person
where
exists false
Also 'false' always exists, but most users would think that the above is equivalent to
select $person
where
false
which would - in fact - return nothing.
So there is a constant 'false' outside boolean expressions, which behaves like a value. And then there is 'false' inside boolean expressions.
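The two readings can be made explicit in a small Python sketch. Both functions are assumptions for illustration: exists models "the expression produced at least one value", and condition models one possible reading of a bare boolean condition; the TMQL draft may define either differently.

```python
def exists(sequence):
    """EXISTS tests only whether the sequence has at least one value."""
    return len(sequence) > 0

def condition(sequence):
    """Assumed reading of a bare condition: non-empty and every value truthy."""
    return len(sequence) > 0 and all(bool(value) for value in sequence)

print(exists([3.14]))      # True: the constant is a value like any other
print(exists([False]))     # True: the value 'false' exists as well
print(condition([False]))  # False: as a condition, 'false' selects nothing
```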
RESOLVED
Impact on Language: medium
At the moment, TMQL allows only constants for roles when navigating to or from an association:
....
where
$person <- employee -> employer == big-bad-corp
It would be possible to generalize this and allow PEs and with them variables as roles
$person <- $whatever_role [ ^ is-employed-at ]
As path expressions and association predicates are closely connected, this also means that variables for roles are possible there too:
is-employed-at ($whatever_role : $person, ...)
This can also be extended to 'variables as assoc type':
select $person / name
where
$assoc_type ($whatever: $person, ....)
select $person, $foo
where
$person / $foo
? should full path expressions be allowed for role (types)
- are there use cases for this? Not in the 'official' list
+ should not be too expensive to implement
+ no computational complexity added
- makes it impossible to use common index structures, may be slow
! Oslo 2007, rejected by consensus
? should variables be allowed for role types?
- impedes optimization (use of indices)
! Oslo 2007: weak acceptance
? should variables be allowed for assoc types?
- impedes optimization (use of indices)
! Oslo 2007: weak acceptance
RESOLVED
Impact on Language: medium
Functions in TMQL are - as usual - invoked with parameters, such as in
if math:sqrt ($person / shoesize) < 10
then
"big square foot"
Parameters are usually computed and - because of the nature of topic maps - the number of results may vary. What if a person does not have a shoesize at all, or hundreds of them?
One way to deal with that is to generalize all functions being able to process whole sequences. If a person has shoesizes 2, 3, and 4, then
math:sqr ( $person / shoesize )
will return a sequence 4, 9, 16. That is of course cheap to implement.
This procedure would also be consistent with the case where we have several parameters, each potentially generating a sequence of values:
terrorism:arrest-them-all ($person / name, $country / name, $pretext / name)
The tuple expression may generate tuple sequences with varying length, depending on how many names exist for the things above.
? functions operate always on tuple sequences
+ straight-forward semantics
+ formal semantics already defines functions so
+ easy to implement and also fast in implementation as calls can be optimized away
- slightly unusual semantics for normal engineers, they are not used to sequence processing
- Python, Ruby, Perl and Haskell all have list comprehension, so what?
! adopted into TMQL specification and formal semantics (by editor)
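Generalizing a function on single values to one on sequences can be sketched in Python as below. The helper lift is invented for illustration, and combining several parameter sequences via their Cartesian product is an assumption about how the tuple expression expands.

```python
from itertools import product

def lift(func):
    """Turn a function on single values into one on sequences of values."""
    def lifted(*sequences):
        # One call per combination of values, in order.
        return [func(*combo) for combo in product(*sequences)]
    return lifted

sqr = lift(lambda x: x * x)

print(sqr([2, 3, 4]))  # [4, 9, 16]
print(sqr([]))         # []: a person without a shoesize yields no result
```

With this reading the "no value" and "many values" cases need no special treatment, which is the straightforward semantics argued for above.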
RESOLVED
Impact on Language: medium
TMQL is using the term 'characteristics' as subsumption for topic names and topic occurrences. This is in deviation to an earlier interpretation which also included topic roles. TMDM does not use the term 'characteristics' anymore.
The reason for re-introducing it in TMQL was that this way names and occurrences could be treated via one syntactic structure. It would equally be possible to break names and occurrences in two and introduce identical navigation mechanisms for both.
? Should the concept of characteristics be broken up into 'names' and 'occurrences'?
+ alignment with TMDM in this respect
- pretty useless duplication of navigation, it is the same
+ 'characteristics' had a different meaning in the past
! Oslo 2007: consensus, better align it with TMDM
RESOLVED
Impact on Language: medium
At the moment, TMQL treats atomic data values as exactly this: atomic. This implies that the following is NOT possible:
select $person / name
where
$person / shoesize >> types == xsd:float
[ find all persons which have as shoesize a FLOAT value ]
RESOLVED
Impact on Language: high
At the moment, the TMQL standard does not detail the errors which can occur during the static/dynamic analysis; and it also does not give the erroneous situations a name.
From a language perspective this is not necessary, but OTOH, it reduces compatibility between TMQL processors as an application has to expect potentially different sets of exceptions.
An example of a transient error would be
select $p from http://www.topicmaps-are-us.com/ ~ ~~> where $p isa person
? Should TMQL specify the mechanism?
= i.e. exception propagation, or error codes?
- this does preempt implementation styles
! Oslo 2007: consensus, maybe better not
? Should the TMQL standard name all error situations?
+ higher compatibility between TMQL implementations
- less freedom for implementors
- makes the standard take longer
- ok, maybe a small bit
- some conditions may be hard to differentiate, depending on the implementation strategy
- compatibility on the error-level would be API compatibility
! Oslo 2007: rejected by vote (2/many)
= no list
? Should TMQL name only a top-level?
= error in analysis, error in evaluation
= persistent error, transient error (Geir Ove Gronmo)
+ somewhat a middleground
! Oslo 2007: consensus: then only 'good' vs. 'bad'
RESOLVED
Impact on Language: high
TMQL allows to generate XML content.
return
<terrorists>{
for $person in // person
where
soundex ($person / name , "isimi-bin-lidin")
return
<evil-evildoer>{$person / name}</evil-evildoer>
}</terrorists>
In that, XML content is _embedded_ into the TMQL expression; this is NOT just a text template which is expanded. The advantage is that processors can include this in the optimization process.
The XML allowed here at the moment does not reflect XML in all its beauty. So, for instance, TMQL does not support CDATA, processing instructions or XML namespaces. The reasoning being that all this can be much more effectively handled with XSLT.
Other ideas are to use an XSLTish syntax additionally for some flexibility:
<tmql:comment>
sdfsfdsfd
</tmql:comment>
<tmql:element name="whatever">
<tmql:attribute name="aname">avalue</tmql:attribute>
content
</tmql:element>
=> Issue: TMQL native XML support
? TMQL should support 100% of XML?
- quite expensive to implement
- missing pieces can be added much more effectively with an XSLT processor
+ completeness is a beautiful thing
! Oslo 2007: what do we *really* need?
CDATA NO
PI NO
namespace YES
DOCTYPE NO
comments NO
variable element names YES
variable attribute names YES
RESOLVED
Impact on Language: high
In principle TMQL could do without any XML support. The only thing to be specified is that results should be generated in a fixed, defined XML structure.
The downside of this is that SELECT, FLWR and path expressions only can generate tuple sequences and that the XML format only can contain exactly these.
? TMQL should support 0% of XML
+ TMQL syntax becomes smaller by 6 rules
+ implementation is a bit simpler
- results cannot be free, deeply nested XML, but only flat
- one always will need XSLT to beat the results into proper shape
! Oslo 2007: implicit rejection, see issue native-XML-support
RESOLVED
Impact on Language: low
Sometimes it would be convenient to have a Perl-like regexp operator inside the language, such as
select $person
where $person isa person
& $person / name =~ /^Tim/i
RESOLVED
Impact on Language: very high
At the moment, TMQL has a rather arbitrary choice of primitive data types: integer, URIs, decimals, datetimes, most of them cloned from the XML schema data types
http://www.w3.org/TR/xmlschema-2/datatypes.html#typesystem
The question is which of these should be in the final standard. Maybe the following (lheuer)?
- one of the floating point numbers
- one integer datatype
- xs:string
- xs:anyURI
- xs:boolean
- xs:date
- xs:dateTime ???
- xs:time ???
And what about an 'undefined' data type to express that the result is _undefined_?
select $p / shoesize || undefined
where $p isa person
Maybe also aggregation functions?
http://www.w3.org/TR/xpath-functions/#aggregate-functions
Another alternative is not to define _any_ primitive data type and leave that to implementations. The only thing to define in the standard are minimal requirements each data type has to satisfy:
- serialisation rules (text to value, and back)
- ordering (the meaning of <= operator)
Still, in TMQL, the binary operators +, -, *, / and the unary operator - can be left between value expressions. Implementations can then detail how these operators are to be interpreted for particular data types.
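The minimal requirements named above (serialisation in both directions and an ordering) can be written down as a small interface. This Python sketch is an illustration only; the class Datatype, its field names and the INTEGER instance are invented and do not appear in any TM standard.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Datatype:
    """What an implementation-defined data type would minimally provide."""
    name: str
    parse: Callable[[str], Any]             # text -> value
    serialize: Callable[[Any], str]         # value -> text
    less_equal: Callable[[Any, Any], bool]  # the meaning of <=

# Hypothetical integer type built from Python's own int.
INTEGER = Datatype("integer", int, str, lambda a, b: a <= b)

nine, ten = INTEGER.parse("9"), INTEGER.parse("10")
print(INTEGER.less_equal(nine, ten))  # True: numeric ordering
print("9" <= "10")                    # False: plain string ordering differs
```

The last two lines show why the ordering has to be part of the data type: comparing the serialised text gives a different answer than comparing the values.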
? should all types from XSD be understood?
+ very rich collection, strong big-vendor support
+ constants can be written 'naturally' (3 instead of "3"^xsd:integer)
=> also all XPath 2.0 functions and operators have to be supported?
- very expensive to produce for small developers
- supported data types and supported functions are not strictly connected
! Oslo 2007: rejected by consensus
=> see issue 'refactoring': all predefined datatypes are named by CTM
? should NO type be native?
+ postpones decision to implementation
- reduces the compatibility between implementations
! Oslo 2007: rejected by consensus
? should there be a predefined constant 'undefined'?
+ allows to express 'undefined' value
! Oslo 2007: weak acceptance
RESOLVED
Impact on Language: low
The operator '--' subtracts two sequences, such as in
// person -- // evildoing-evildoers
Should this also be allowed (all things about the person except the shoesize):
$p / * except $p / shoesize
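What subtraction of sequences could mean is sketched below in Python. The semantics is an assumption, since the text above does not spell it out: every element of the left sequence that also occurs in the right one is dropped, and the order of the remaining elements is kept. The function name and the sample values are invented.

```python
def subtract(left, right):
    """Drop from `left` every item that occurs in `right`, keeping order."""
    excluded = set(right)
    return [item for item in left if item not in excluded]

# Hypothetical characteristic values of one person.
everything = ["Tim", "TBL", "42", "http://www.bigtim.com"]
shoesizes = ["42"]

print(subtract(everything, shoesizes))
```

Under this reading `except` would be a second spelling of the same operation.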
RESOLVED
Impact on Language: high
## Comment: NOT A TMQL, but a CTM issue
When someone writes in CTM (modulo syntax details)
tbl isa person
! name: Tim Berners Lee
! nickname: TBL
homepage: http://www.bigtim.com
then what it implicitly means is that nickname is a subclass of a name and homepage is a subclass of an occurrence.
TMDM never spells that out explicitly, because it is irrelevant there. Only in serialization syntaxes like XTM or CTM this has to be defined.
? should an occurrence type be implicitly a subtype of 'occurrence'?
+ this is what people will expect
= also that a nickname is a special name
! Oslo 2007 resolution: accepted
? where should this fact be standardized?
= TMDM
- even an annex is difficult to add
! Oslo 2007: rejected by consensus
= CTM
- why CTM and not XTM (et.al)?
+ because it would be at least one place to get it right?
! Oslo 2007: rejected
= put it into TMDM -> TMRM mapping (LarsM)
! Oslo 2007: accepted by consensus
RESOLVED
Impact on Language: very high
A function is a systematic functional dependency over particular sets of values. The age of a person, for example, is functionally dependent on (a) the person's birthdate and (b) the current time. Similarly, a predicate is a constraint on a constellation of particular values, and items within a topic map.
Accordingly, both are expressing additional knowledge about a problem domain. As the modelling of an application domain is usually done via an ontology definition language, one can argue that functions and predicates should NOT be part of TMQL, which is meant as a data access language.
? Should Functions and Predicates be included in TMQL?
- both are ontological information, should go into an ontology language
- yes, but TMCL will not offer them, neither does CTM, ...
+ they are EXTREMELY useful to organize the query, uhm knowledge
- SPARQL does not have function declarations or predicate declarations
! Oslo 2007: no, by consensus
=> all this should be hosted inside a TM 'ontology language'
=> still, TMQL must now allow statements made in this "TMOL" language
RESOLVED
Impact on Language: very high
If it is accepted that TMQL should contain features to define local functions and predicates (which are specialized functions), the question is then how this is integrated conceptually and syntactically into TMQL.
One option is to use a conventional syntax, something like
function nr_employees (organisation: $o)
return
fn:length ( $o <- employer )
'function' and 'return' would become TMQL keywords connecting together the function name with the function body. The function body - by its nature - will always be a TMQL expression.
The parameter profile of the function - here after the function name inside a () pair - would specify which variables are to be treated as constants, i.e. those for which the caller will provide values at invocation time. Additionally, a type (here organisation) can provide additional information to the TMQL processor which it may (or may not) use.
Technically it would be sufficient to write
function nr_employees ($o)
return
fn:length ( $o <- employer )
or even
function nr_employees
return
fn:length ( $o <- employer )
if a rule is introduced that all unbound variables (those not explicitly quantified with FOR, SOME or EVERY) become parameters. That rule is implicit already, as it would be redundant to have both a parameter list and a list of unbound variables. It would then be an error if the two lists were different.
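How such a rule might be checked can be sketched in Python. This is an illustration only, not part of the proposal: the helper names are hypothetical, variables are assumed to look like $name, binders are assumed to be written as "for $x", "some $x" or "every $x", and scoping is ignored (a variable bound anywhere in the body counts as bound everywhere).

```python
import re

# Assumed lexical shape of variables and of quantifying keywords.
VARIABLE = re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")
BINDER = re.compile(r"\b(?:for|some|every)\s+(\$[A-Za-z_][A-Za-z0-9_]*)",
                    re.IGNORECASE)


def unbound_variables(body):
    """Return the variables in `body` not bound by FOR, SOME or EVERY."""
    bound = set(BINDER.findall(body))
    unbound = []
    for name in VARIABLE.findall(body):
        if name not in bound and name not in unbound:
            unbound.append(name)
    return unbound


def check_parameters(declared, body):
    """Return the parameter list; raise if a declared list disagrees.

    `declared` is None when the function has no explicit parameter list.
    """
    implicit = unbound_variables(body)
    if declared is not None and set(declared) != set(implicit):
        raise ValueError(
            f"declared {sorted(declared)} but unbound {sorted(implicit)}")
    return implicit


body = "fn:length ( $o <- employer )"
parameters = check_parameters(None, body)
```

A real processor would work on the parsed expression rather than on the raw text, so that nested scopes are respected.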
A similar structure and convention can be introduced for predicates
predicate is_NGO
where
not $o <- part -> whole == // government
That checks that an organisation is not part of anything which itself is an instance of a government (the == comparison is always interpreted existentially in TMQL!).
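The existential reading of the comparison can be sketched in Python. This is only an illustration: both operands are modelled as plain lists of identifiers, the function names are hypothetical, and the sample data is made up.

```python
def exists_equal(left, right):
    """Existential comparison: true if some value occurs in both sequences."""
    right_values = set(right)
    return any(value in right_values for value in left)


def is_ngo(wholes_of_organisation, governments):
    """Mirror of the predicate body: not (wholes == governments)."""
    return not exists_equal(wholes_of_organisation, governments)


governments = ["gov-norway", "gov-austria"]
ministry = is_ngo(["gov-norway"], governments)               # part of a government
charity = is_ngo(["umbrella-association"], governments)      # part of something else
standalone = is_ngo([], governments)                         # part of nothing
```

Note that an organisation which is part of nothing at all satisfies the predicate, since an existential comparison over an empty sequence is false.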
If it is also accepted that functions and predicates are actually ontological information, then a more TMish syntax can also be chosen, especially since the only task is to bind an expression to a name:
nr_employees isa tmql:function
tmql:return : {
fn:length ( $o <- employer)
}
CTM can be used for this (AsTMa= is used here only for demonstration). Otherwise, no special syntax is necessary. The {} bracket pair is used throughout TMQL already to wrap query expressions. The only prerequisites are two predefined TMQL concepts (tmql:function and tmql:return).
There is, though, an additional requirement for CTM to allow occurrence values to be wrapped inside {} pairs. By itself this would imply that a CTM parser has to parse TMQL expressions. To avoid this burden and to keep CTM parsers independent from TMQL expressions, the following can also be allowed in CTM:
nr_employees isa tmql:function
tmql:return : {{
.... whatever, even with use of brackets {}
}}
The opening {{ (any number of brackets is possible as a prolog) must be closed by exactly one matching set }} with the same number of brackets. This is simple and fast to implement.
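A minimal Python sketch of such a scanner follows. It is an illustration under assumptions, not a proposal for CTM: the function name is hypothetical, the value is assumed to start directly with the bracket run, and the first run of the same number of closing brackets is taken as the terminator.

```python
def extract_wrapped(text, start=0):
    """Return (body, end_index) for a value wrapped in N '{' ... N '}'.

    The body is not parsed at all; single brackets inside it are allowed.
    """
    count = 0
    while start + count < len(text) and text[start + count] == "{":
        count += 1
    if count == 0:
        raise ValueError("value does not start with '{'")
    closer = "}" * count
    body_start = start + count
    end = text.find(closer, body_start)
    if end == -1:
        raise ValueError("no matching closing bracket run")
    return text[body_start:end], end + count


sample = "{{ fn:length ( $o <- employer ) {x} }}"
body, end_index = extract_wrapped(sample)
```

A body that itself contains the closing run has to be wrapped in a longer run of brackets, which is the price for not parsing the content.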
The scheme works similarly for predicates:
is_NGO isa tmql:predicate
tmql:where : not $o <- part -> whole == // government
As the syntax of the boolean expression in the WHERE clause is quite restricted (one cannot generate content with it), there would be no need for any terminators. But again, this would imply that an (embedded) CTM parser would have to understand the syntax of boolean expressions. To keep it agnostic, the same {} bracketing can be used; and as above, this can be kept optional if the end-of-line terminates occurrence values:
is_NGO isa tmql:predicate
tmql:where : {{
not $o <- part -> whole == // government
}}
? should functions, predicates and templates be first-class topics?
- many people will find this very unconventional, are used to different syntax
- "un-conventional" is a marketing argument, not a technical argument
- "it is just a fu**ing syntax"
+ maybe, but the question is: use yet another new one or re-use an existing one?
+ it is a TMish view on existing concepts
+ directly reflects a TM-view on functions, no further explanation to a user necessary
+ also predefined functions and predicates can be presented that way
- it can also be done with conventional syntax,
- yes, but then more explanation is necessary how particular syntactic elements
have to be mapped onto a TMish view of them
- yes, but we need syntax for: functions, predicates and 'prefixes'
+ reduces syntax for TMQL
- yes, but it puts some burden onto CTM
- which is minimal ({{...}}...)
- dedicated syntax has less clutter
- every dedicated syntax is _expected_ to have less clutter
- and a second step to expose functions as topics has to be defined
- parsing gets complicated when functions are subclassed (my_func iko tmql:function)
- not really, the functionality must be in a TMQL processor already
- parsing gets more complicated when scoping is used
- maybe, not sure how much of an issue this is
- with dedicated syntax the requirements on the (function and predicate) body
can be tested better by a parser
+ it is not possible to 'hide' a function using subtypes of tmql:function
+ it is not possible to abuse scope
! Oslo 2007: this question will be refactored into a TMOL language
=> see 'refactorisation' issue
=> how functions and predicates can be declared is one thing
=> how TMQL 'experiences' these functions and predicates is another thing