Mapping Topic Maps on Relational Databases

Abstract

Topic maps are a new ISO standard for describing knowledge structures and associating them with information resources. They offer a way to organize and access large, continuously growing pools of information (1).

Figure 1. An overview of the final result.
Topic Maps have been dubbed the 'GPS of the information universe'. One way to store Topic Maps persistently is in a relational database. In this document I present a possible database schema for storing Topic Maps.

There are general rules for mapping DTDs to databases (2), but here I tried to build something ad hoc for Topic Maps. The result is somewhat naive and cluttered (Figure 1). I had hoped to find a more elegant solution; if you have any suggestions or know of a better database model, please let me know.

The Document Type Definition (DTD) for XML Topic Maps (XTM)

The starting point is the Document Type Definition (3) of XML Topic Maps (4). The first obstacle: the document cannot be printed on a single sheet of paper, and it is hard to grasp a concept you cannot summarize on one sheet. For this reason I removed all the comments and restructured the document into a shorter version that I can print and keep on my desk. During the conversion I introduced some modifications, but I do not think they compromise the final result. A color version and a black-and-white version are available for printing the schema on one page. This is the remodeled structure:

topicMap
|
|--- topic *
|    |
|    |--- instanceOf *
|    |       ( topicRef | subjectIndicatorRef )
|    |--- subjectIdentity ?
|    |       ( topicRef | subjectIndicatorRef ) *
|    |       resourceRef ?
|    |--- baseName *
|    |    |
|    |    |--- baseNameString
|    |    |--- scope ?
|    |    |       ( topicRef | subjectIndicatorRef | resourceRef ) +
|    |    +--- variant *
|    |         |
|    |         |--- parameters
|    |         |       ( topicRef | subjectIndicatorRef ) +
|    |         |--- variantName ?
|    |         |       ( resourceRef | resourceData )
|    |         +--- variant *
|    |                 -> recursive
|    +--- occurrence *
|         |
|         |--- ( resourceRef | resourceData )
|         |--- instanceOf ?
|         |       ( topicRef | subjectIndicatorRef )
|         +--- scope ?
|                 ( topicRef | subjectIndicatorRef | resourceRef ) +
|
|--- association *
|    |
|    |--- instanceOf ?
|    |       ( topicRef | subjectIndicatorRef )
|    |--- scope ?
|    |       ( topicRef | subjectIndicatorRef | resourceRef ) +
|    +--- member +
|         |
|         |--- ( topicRef | subjectIndicatorRef | resourceRef ) *
|         +--- roleSpec ?
|                 ( topicRef | subjectIndicatorRef )
|
+--- mergeMap *
        ( topicRef | subjectIndicatorRef | resourceRef ) *

Legend

(none) Mandatory, single
+      Mandatory, repeatable
?      Optional, single
*      Optional, repeatable

topicRef            -> xlink:href (Reference to a Topic element)
subjectIndicatorRef -> xlink:href (Reference to a Subject Indicator)
resourceRef         -> xlink:href (Reference to a Resource)
baseNameString      -> #PCDATA
resourceData        -> #PCDATA
Let's start from the leaves of this tree. As you can see, almost all the leaves follow a few common patterns:
6 repetitions: ( topicRef | subjectIndicatorRef )
5 repetitions: ( topicRef | subjectIndicatorRef | resourceRef )
2 repetitions: ( resourceRef | resourceData )
1 repetition:  resourceRef
               baseNameString

All of them are of type 'xlink:href' except resourceData and baseNameString, which are #PCDATA. Alternatives enclosed in parentheses and separated by pipes mean that exactly one of them must appear (once for each repetition of the group, when the group is marked + or *).

There are a few ways to store such information. One is a table in which each alternative has its own column. Another is a table with two columns: the first stores the value, the second stores which alternative (which type) that value is.

To better understand these two possibilities, suppose we build the topic "Hamlet", the play by Shakespeare. This topic is an instance of the class "plays", a non-addressable subject that we identify with the subject indicator http://www.shakespeare.org/plays.html. This is how the two tables could look:

DTD
<!ELEMENT instanceOf  ( topicRef | subjectIndicatorRef ) >

First possibility

Table instanceOf
Column               Required
topicRef             No
subjectIndicatorRef  No

Example
topicRef   subjectIndicatorRef
(empty)    http://www.shakespeare.org/plays.html
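To make the first possibility concrete, here is a minimal sketch in SQLite through Python's standard sqlite3 module. The id column and the CHECK constraint are my own assumptions, not part of the schema above; the constraint mimics the DTD rule that exactly one of the two alternatives is present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One nullable column per alternative of ( topicRef | subjectIndicatorRef ).
# The CHECK (an assumption) enforces the DTD choice: exactly one column is non-NULL.
conn.execute("""
    CREATE TABLE instanceOf (
        id                  INTEGER PRIMARY KEY,  -- hypothetical surrogate key
        topicRef            TEXT,
        subjectIndicatorRef TEXT,
        CHECK ((topicRef IS NULL) <> (subjectIndicatorRef IS NULL))
    )
""")

# "Hamlet" is an instance of the non-addressable subject "plays".
conn.execute(
    "INSERT INTO instanceOf (topicRef, subjectIndicatorRef) VALUES (?, ?)",
    (None, "http://www.shakespeare.org/plays.html"),
)

# A row with both columns empty violates the choice and is rejected.
try:
    conn.execute(
        "INSERT INTO instanceOf (topicRef, subjectIndicatorRef) VALUES (NULL, NULL)"
    )
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

print(conn.execute("SELECT topicRef, subjectIndicatorRef FROM instanceOf").fetchall())
```

Every new alternative in a group would need another nullable column and a longer CHECK, which is the cost of this layout.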

Second possibility

Table instanceOf
Column  Required
value   Yes
type    Yes
          • 't' for topicRef
          • 's' for subjectIndicatorRef

Example
value                                  type
http://www.shakespeare.org/plays.html  s
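The second possibility can be sketched the same way, again with SQLite via Python's sqlite3 module. The id column and the CHECK on the type codes are assumptions on my part; the codes 't' and 's' are the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two mandatory columns: the reference itself and a one-letter code
# saying which alternative of ( topicRef | subjectIndicatorRef ) it is.
conn.execute("""
    CREATE TABLE instanceOf (
        id    INTEGER PRIMARY KEY,                      -- hypothetical surrogate key
        value TEXT NOT NULL,
        type  TEXT NOT NULL CHECK (type IN ('t', 's'))  -- 't' topicRef, 's' subjectIndicatorRef
    )
""")

conn.execute(
    "INSERT INTO instanceOf (value, type) VALUES (?, ?)",
    ("http://www.shakespeare.org/plays.html", "s"),
)

# An unknown type code is rejected by the CHECK constraint.
try:
    conn.execute("INSERT INTO instanceOf (value, type) VALUES ('x', 'q')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

print(conn.execute("SELECT value, type FROM instanceOf").fetchall())
```

The same two-column layout could serve the ( topicRef | subjectIndicatorRef | resourceRef ) groups by widening the set of allowed codes, for example with a hypothetical 'r' for resourceRef.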

I chose the second option, which keeps the database structure simpler: every group of alternatives, whatever its members, fits into the same two columns.
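With this choice, loading a topic map means turning each choice element into a (value, type) pair. Here is a minimal sketch with Python's xml.etree.ElementTree, assuming the XTM 1.0 and XLink namespaces. The function name to_value_type and the code 'r' for resourceRef are hypothetical, and the XTM fragment only encodes the Hamlet example.

```python
import xml.etree.ElementTree as ET

XTM_NS = "{http://www.topicmaps.org/xtm/1.0/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# 't' and 's' as in the table above; 'r' for resourceRef is a hypothetical extension.
TYPE_CODES = {"topicRef": "t", "subjectIndicatorRef": "s", "resourceRef": "r"}


def to_value_type(choice):
    """Turn a choice element such as <instanceOf> into a (value, type) row."""
    children = list(choice)
    # A DTD group ( a | b ) allows exactly one alternative.
    if len(children) != 1:
        raise ValueError(f"expected exactly one child, got {len(children)}")
    child = children[0]
    local_name = child.tag.split("}")[-1]  # drop the "{namespace}" prefix
    return child.get(XLINK_HREF), TYPE_CODES[local_name]


XTM_DOC = """
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="hamlet">
    <instanceOf>
      <subjectIndicatorRef xlink:href="http://www.shakespeare.org/plays.html"/>
    </instanceOf>
  </topic>
</topicMap>
"""

root = ET.fromstring(XTM_DOC.strip())
for topic in root.iter(XTM_NS + "topic"):
    for instance_of in topic.findall(XTM_NS + "instanceOf"):
        # Each pair could be inserted as a row of the instanceOf table.
        print(topic.get("id"), to_value_type(instance_of))
```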

These are all the tables in the database; the name in square brackets is the table that stores each element.

topicMap
|
|--- topic * [topic]
|    |
|    |--- instanceOf *
|    |       ( topicRef | subjectIndicatorRef ) [instanceOf]
|    |--- subjectIdentity ?
|    |       ( topicRef | subjectIndicatorRef ) * [subjectIdentityV]
|    |       resourceRef ? [subjectIdentity]
|    |--- baseName *
|    |    |
|    |    |--- baseNameString  [baseName]
|    |    |--- scope ?
|    |    |       ( topicRef | subjectIndicatorRef | resourceRef ) + [scope]
|    |    +--- variant * [variant]
|    |         |
|    |         |--- parameters
|    |         |       ( topicRef | subjectIndicatorRef ) + [parameters]
|    |         |--- variantName ?
|    |         |       ( resourceRef | resourceData ) [variantName]
|    |         +--- variant * [variantRecursive]
|    |                 -> recursive
|    +--- occurrence *
|         |
|         |--- ( resourceRef | resourceData )  [occurrence]
|         |--- instanceOf ?
|         |       ( topicRef | subjectIndicatorRef ) [instanceOf]
|         +--- scope ?
|                 ( topicRef | subjectIndicatorRef | resourceRef ) + [scope]
|
|--- association * [association]
|    |
|    |--- instanceOf ?
|    |       ( topicRef | subjectIndicatorRef ) [instanceOf]
|    |--- scope ?
|    |       ( topicRef | subjectIndicatorRef | resourceRef ) + [scope]
|    +--- member + [member]
|         |
|         |--- ( topicRef | subjectIndicatorRef | resourceRef ) * [memberV]
|         +--- roleSpec ?
|                 ( topicRef | subjectIndicatorRef )  [roleSpec]
|
+--- mergeMap * [mergeMap]
        ( topicRef | subjectIndicatorRef | resourceRef ) * [mergeMapV]
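Nested variants end up in their own table, variantRecursive. One way to read such nesting back is a recursive query. The sketch below assumes, hypothetically, that variantRecursive is a link table of (parent_id, child_id) pairs between rows of variant; the tree above names the tables but not their columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns: only the table names come from the tree above.
    CREATE TABLE variant (id INTEGER PRIMARY KEY);
    CREATE TABLE variantRecursive (
        parent_id INTEGER NOT NULL REFERENCES variant(id),
        child_id  INTEGER NOT NULL REFERENCES variant(id),
        PRIMARY KEY (parent_id, child_id)
    );
    -- Variant 1 contains variant 2, which contains variant 3.
    INSERT INTO variant VALUES (1), (2), (3);
    INSERT INTO variantRecursive VALUES (1, 2), (2, 3);
""")

# Walk the nesting downwards from variant 1, recording the depth of each level.
rows = conn.execute("""
    WITH RECURSIVE tree(id, depth) AS (
        SELECT 1, 0
        UNION ALL
        SELECT vr.child_id, tree.depth + 1
        FROM variantRecursive AS vr JOIN tree ON vr.parent_id = tree.id
    )
    SELECT id, depth FROM tree ORDER BY depth
""").fetchall()
print(rows)  # [(1, 0), (2, 1), (3, 2)]
```

A self-referencing parent column on variant would be an alternative to a separate link table; the recursive query would look almost the same.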

Database schema

And finally, here is the schema! Many of its tables are linked to the topic table, since in a topic map (almost) everything is a topic. To keep the diagram readable I did not draw these links as lines; instead, every table that can be linked to the topic table through a topicRef element has a yellow border.
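Rows whose type is 't' hold a topicRef, so they can be resolved against the topic table with a join. This is a sketch only, assuming hypothetical columns (topic.id holding the XTM id attribute, instanceOf.topic_id naming the classified topic), local topicRef values of the form '#id', and made-up example rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns; topic.id holds the XTM id attribute.
    CREATE TABLE topic (id TEXT PRIMARY KEY);
    CREATE TABLE instanceOf (
        topic_id TEXT NOT NULL REFERENCES topic(id),  -- the topic being classified
        value    TEXT NOT NULL,
        type     TEXT NOT NULL CHECK (type IN ('t', 's'))
    );
    -- Example data: Hamlet points at a subject indicator, Macbeth at a local topic.
    INSERT INTO topic VALUES ('hamlet'), ('macbeth'), ('play');
    INSERT INTO instanceOf VALUES
        ('hamlet', 'http://www.shakespeare.org/plays.html', 's'),
        ('macbeth', '#play', 't');
""")

# Only topicRef rows ('t') can be resolved against the topic table.
rows = conn.execute("""
    SELECT i.topic_id, t.id
    FROM instanceOf AS i
    JOIN topic AS t ON i.type = 't' AND i.value = '#' || t.id
""").fetchall()
print(rows)  # [('macbeth', 'play')]
```

Because the same value column also holds subject indicator URLs, the link to topic depends on the type code and cannot be declared as an ordinary foreign key.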


Useful Links and Documents

All the images and the database structure were generated with DeZign by Datanamic. A demo version can be downloaded from http://www.datanamic.com/dezign/.

For information about Topic Maps, these are good starting points:

Notes

  1. Empolis white paper on topic maps, a good introduction: www.empolis.com/download/docs/whitepapers/empolistopicmapswhitepaper_eng.pdf
  2. Mapping DTDs to Databases: www.xml.com/pub/a/2001/05/09/dtdtodbs.html
  3. DTD for XTM: www.topicmaps.org/xtm/1.0/xtm1.dtd
  4. XTM specification: www.topicmaps.org/xtm/1.0/
