Difference between revisions of "Data View"
Revision as of 23:31, 14 October 2010
- 1 The big picture
- 2 Semantic data model
- 2.1 Conceptual model
- 2.2 Client models
- 3 Semantics
- 4 Framework ontologies
- 5 Address Space
The big picture
Semantic data model
Semi-structured semantic data
A semi-structured data model is a model which represents the data and the rules about the structure of the data in the same database. The Simantics semantic data model is a semi-structured graph of statements about resources.
The following statements about the Simantics semantic data model apply
- Each resource can contain at most one attached primitive value
- The native access model is localized, e.g. listing the predicates of a subject or the objects of a (subject, predicate) pair
- Some edges in the graph have a corresponding reverse edge but this is optional
- The set of statements is partitioned and the partitions can be independently processed
- The semantics of the model are built on layered ontologies
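The statements above can be illustrated with a minimal in-memory sketch. It is not the Simantics database API: the class StatementGraphSketch, its method names and the use of long identifiers for resources are all assumptions made for illustration. It shows localized access, an optional reverse edge and at most one primitive value per resource.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory statement graph; not the Simantics database API. */
final class StatementGraphSketch {
    /** A statement is a (subject, predicate, object) triple of resource ids. */
    static final class Statement {
        final long subject, predicate, object;
        Statement(long subject, long predicate, long object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Statements are indexed by subject only, which is what makes access localized.
    private final Map<Long, List<Statement>> bySubject = new HashMap<>();
    // At most one primitive value per resource.
    private final Map<Long, Object> values = new HashMap<>();

    /** Adds a statement; when inverse is non-null the reverse edge is added too. */
    void claim(long subject, long predicate, Long inverse, long object) {
        add(new Statement(subject, predicate, object));
        if (inverse != null) {
            add(new Statement(object, inverse, subject));
        }
    }

    private void add(Statement s) {
        bySubject.computeIfAbsent(s.subject, k -> new ArrayList<>()).add(s);
    }

    /** Attaches the single primitive value of a resource, replacing any earlier one. */
    void claimValue(long resource, Object value) {
        values.put(resource, value);
    }

    Object value(long resource) {
        return values.get(resource);
    }

    /** Localized query: the predicates of all statements with the given subject. */
    Set<Long> predicates(long subject) {
        Set<Long> result = new LinkedHashSet<>();
        for (Statement s : bySubject.getOrDefault(subject, Collections.<Statement>emptyList())) {
            result.add(s.predicate);
        }
        return result;
    }

    /** Localized query: the objects of the given (subject, predicate) pair. */
    List<Long> objects(long subject, long predicate) {
        List<Long> result = new ArrayList<>();
        for (Statement s : bySubject.getOrDefault(subject, Collections.<Statement>emptyList())) {
            if (s.predicate == predicate) {
                result.add(s.object);
            }
        }
        return result;
    }
}
```

Because statements are indexed by subject only, a query in the opposite direction is possible only where a reverse edge was claimed, which matches the optional nature of reverse edges.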
Some data does not benefit from semantic representation. This is the case for example when
- Data is numeric or textual
- Data has a clear primitive structure e.g. array
- There is a very large amount of data
- The data should be treated with a different granularity of versioning than semantic data.
Simantics includes its own primitive data modelling system. The system
- Specifies primitive data type schemas
- Selected semantic information can also be embedded into the primitive data model
- Specifies powerful primitive data structure access and manipulation interfaces
- Arbitrary data sources can be exposed through these interfaces
The semantic graph can be used to store one piece of primitive data per resource. This data is manipulated atomically and versioned alongside the semantic graph.
Configuration and Valuations
A general modelling contract in Simantics involves the following concepts
- Configuration. A semantic graph which describes the structure of a model.
- Variable. An identifier constructed from the configuration.
- Valuation. An assignment of values for a set of variables.
- Value. A primitive data valued function of time. Can be sampled or accessed as a time series.
The contract is that all primitive data for a model should be accessed from some valuation.
Simantics exposes the valuations and their structure using the semantic data model. The persistency model (database, workspace or memory) of semantic valuations is application specific.
The semantic valuation is needed as a uniform interface for accessing the data model produced by a semantic model.
- The configuration can be expressive and terse
- A simple configuration can produce a complex, even infinite, set of variables. The valuation can be produced lazily.
- The complex variable formation logic is hidden behind the valuation
- The model user only operates on the results
- There are usually multiple values for a given variable
- E.g. default, known good configurations and several experiments
- The valuation directly supports this by specifying a valuation identifier which is inserted in the middle of the variable identifier
- The values can come from arbitrary sources
- E.g. semantic configuration, running experiment, historical archive, sensor
- The model user accesses data by value using supplied interfaces. There is no need to know where the data came from.
- Computational variables can be freely implemented
- E.g. expressions based on variable identities can be evaluated by value or they can even be sent to the simulator as expressions.
- The implementation for new values does not know how dependent values are produced
Some general semantics for variables are established even at the Layer0 level. These semantics can be freely extended by applications. Variables can belong to some of the following categories
- Fixed. In many cases e.g. model component names are unique and persistent across all valuations.
- Configuration. Some variables do not change value during simulation, but there can be several known configuration values e.g. initial states for experiments.
- Simulator. Some variables have their values produced by a black box simulator process, but other values for them, e.g. initial values, are also stored semantically within Simantics.
- Simulator only. Some simulator variables are only available from the black box simulator. Simantics stores initial values for these variables only as blobs, which are transferred to the simulator as black boxes.
- Derived. Some variables have values computed from other variable values.
- Dynamic. Some variables are semantically declared but their occurrence in valuations is optional.
- Fully dynamic. Some variables are not semantically declared and occur in valuations only. These variables can be analysed based on their primitive data content only.
A valuation does not have to supply values for all variables in a model configuration. A model user accesses values using a valuation stack which is guaranteed to produce values for all variables.
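A valuation stack can be sketched as follows. This is an illustration under assumptions, not Simantics code: the names ValuationStackSketch and Valuation are hypothetical, values are simplified to plain doubles rather than functions of time, and the guarantee of a value for every variable is modelled by requiring a base valuation that answers every variable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical valuation stack; values are simplified to doubles. */
final class ValuationStackSketch {
    /** A valuation may supply a value for a variable, or nothing. */
    interface Valuation {
        Optional<Double> value(String variable);
    }

    /** A partial valuation backed by a map, e.g. the overrides of one experiment. */
    static Valuation of(Map<String, Double> values) {
        return variable -> Optional.ofNullable(values.get(variable));
    }

    private final List<Valuation> layers = new ArrayList<>(); // topmost first
    private final Valuation base;

    /** The base valuation is assumed to answer every variable, e.g. with defaults. */
    ValuationStackSketch(Valuation base) {
        this.base = base;
    }

    /** Pushes a partial valuation on top of the stack. */
    void push(Valuation valuation) {
        layers.add(0, valuation);
    }

    /** Returns the value from the topmost layer that supplies one, else from the base. */
    double value(String variable) {
        for (Valuation layer : layers) {
            Optional<Double> result = layer.value(variable);
            if (result.isPresent()) {
                return result.get();
            }
        }
        return base.value(variable).orElseThrow(
            () -> new IllegalStateException("base valuation has no value for " + variable));
    }
}
```

The caller only asks for a value by variable identifier and does not need to know which layer produced it.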
Some models contain large pieces (> megabytes) of primitive data which need to be specially processed. Key questions are
- Partial reads and writes (e.g. of a terabyte file)
- Data transfer between server and client
- File access support (e.g. native state files of simulators stored in the semantic database and accessed by simulators as files)
Sessions and databases
The semantic database is accessed by using a session which provides shared access to a working copy of the database for multiple users. Key points are
- Concurrent use models. The session provides both transaction-locked concurrent use and publish/synchronize based operation.
- Authentication and access control. These are not yet supported in Simantics 1.0
Database client access interface
The database session interface allows reading and writing the semantic database using requests. Key points are
- Transaction model. The client provides a traditional read-write locking scheme. Reads are processed concurrently while writes require exclusive locking. Fairness requirements for the locking scheme are not specified.
- Caching of requests. Often-used read requests can be cached to achieve necessary read performance.
- Change listening. The client interface supports change listening with automatic dependency analysis. This mechanism allows the database to be used in reactive programming style.
- Threading model. The client supports both synchronous and asynchronous request formulation. Asynchronous model automatically distributes the computation on available processor cores.
- The connection between change sets and requests. Write requests form change sets, which can be further annotated with metadata.
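The transaction model and the link between write requests and change sets can be sketched with standard Java locking. The class SessionSketch and everything inside it are hypothetical and do not reflect the actual Simantics session interface. The database is reduced to a string map, and request caching, change listening and asynchronous requests are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical session with read-write locking; the "database" is a plain map. */
final class SessionSketch {
    /** The changes made by one write request, plus free-form metadata. */
    static final class ChangeSet {
        final List<String> changedKeys = new ArrayList<>();
        final Map<String, String> metadata = new HashMap<>();
    }

    /** Write view handed to a write request; it records every modification. */
    final class WriteGraph {
        private final ChangeSet changes = new ChangeSet();

        void put(String key, String value) {
            data.put(key, value);
            changes.changedKeys.add(key);
        }

        void annotate(String key, String value) {
            changes.metadata.put(key, value);
        }
    }

    private final Map<String, String> data = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Read requests share the read lock, so several can run concurrently. */
    <T> T read(Function<Map<String, String>, T> request) {
        lock.readLock().lock();
        try {
            return request.apply(Collections.unmodifiableMap(data));
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Write requests hold the exclusive write lock and produce a change set. */
    ChangeSet write(Consumer<WriteGraph> request) {
        WriteGraph graph = new WriteGraph();
        lock.writeLock().lock();
        try {
            request.accept(graph);
        } finally {
            lock.writeLock().unlock();
        }
        return graph.changes;
    }
}
```

A ReentrantReadWriteLock constructed without arguments is non-fair, which is consistent with the fairness requirements being unspecified.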
Runtime client extensions
The adaption of resources into Java interfaces is a key method of enforcing the semantics of the data model. The adapter model includes
- A way to declare adapter interfaces in plugins
- A way to contribute adapter implementations in plugins
- A way to retrieve an adapter for a resource
The mechanism is described in Resource Adaptation.
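The three parts of the adapter model can be sketched as a small registry. This is only an illustration: AdapterRegistrySketch, Resource and Labeled are hypothetical names, adapters are keyed by a type name instead of by semantic types, and plugin declarations are replaced by direct method calls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical adapter registry; not the Simantics adaption mechanism itself. */
final class AdapterRegistrySketch {
    /** Stand-in for a graph resource: an id and the name of its type. */
    static final class Resource {
        final long id;
        final String type;

        Resource(long id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    /** Example of an adapter interface that a plugin might declare. */
    interface Labeled {
        String label();
    }

    // type name -> adapter interface -> factory producing the implementation
    private final Map<String, Map<Class<?>, Function<Resource, ?>>> factories = new HashMap<>();

    /** Contributes an adapter implementation for resources of the given type. */
    <T> void contribute(String type, Class<T> iface, Function<Resource, ? extends T> factory) {
        factories.computeIfAbsent(type, k -> new HashMap<>()).put(iface, factory);
    }

    /** Retrieves an adapter for a resource, or fails if none was contributed. */
    <T> T adapt(Resource resource, Class<T> iface) {
        Map<Class<?>, Function<Resource, ?>> byInterface = factories.get(resource.type);
        Function<Resource, ?> factory = byInterface == null ? null : byInterface.get(iface);
        if (factory == null) {
            throw new IllegalArgumentException(
                "no " + iface.getSimpleName() + " adapter for type " + resource.type);
        }
        return iface.cast(factory.apply(resource));
    }
}
```

Client code asks for an interface and receives whatever implementation was contributed for the type of the resource, so the meaning of a type is enforced in one place.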
The Simantics database client can extend the persistent semantic graph with more transient resources and statements. This mechanism can be used to e.g. expose valuations and other workspace-persistent data.
The following statements apply
- The life cycle of transient contributions is managed by a workspace-persistent entity
- Transient resources can be persistent within the life cycle of the workspace
- This entity ensures that co-existing subgraphs share the available identifier space
- Transient contributions are dynamically attached to and detached from a database Session.
- The subgraph is applied on top of the persistent graph and below the query layer
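The layering described above can be sketched as follows, assuming that statements are represented as plain strings and that each transient contribution is identified by a name. LayeredGraphSketch and its methods are hypothetical, and identifier space management is not modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical graph with transient layers on top of a persistent layer. */
final class LayeredGraphSketch {
    private final Set<String> persistent = new LinkedHashSet<>();
    private final Map<String, Set<String>> transientLayers = new LinkedHashMap<>();

    void claimPersistent(String statement) {
        persistent.add(statement);
    }

    /** Attaches a transient contribution under the given layer id. */
    void attach(String layerId, Set<String> statements) {
        transientLayers.put(layerId, new LinkedHashSet<>(statements));
    }

    /** Detaches a transient contribution; the persistent graph is untouched. */
    void detach(String layerId) {
        transientLayers.remove(layerId);
    }

    /** What the query layer sees: persistent statements plus all attached layers. */
    List<String> statements() {
        Set<String> all = new LinkedHashSet<>(persistent);
        for (Set<String> layer : transientLayers.values()) {
            all.addAll(layer);
        }
        return new ArrayList<>(all);
    }
}
```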
Generic Relation Indexing
Layer0 Generic Relations can be used to construct per model indices of useful data, which then can be quickly searched using the Lucene search engine.
Ontology language (Layer0)
Layer0 describes the baseline semantics for Simantics applications. Layer0 involves
- Generic concepts used in all Simantics modelling
- Common activities encountered in all Simantics modelling
- Adapter interfaces, which implement the intended semantics of the concepts and the activities
- Type. A resource can be an instance of several types. The types assert structural constraints, contribute relations and specify adapters.
- Relation. All predicates in the semantic graph are part of a relation hierarchy. Structural constraints involve relations and basic queries support relation hierarchies.
- Requirement. A validity requirement, which can be used to check semantic instances against their specification.
- Data type. Specifies the structure and units of a piece of primitive data.
- User. Users are needed in change management, documentation and access control.
- Adapter. Adapter specifies a requirement for the client to supply an implementation of a certain interface.
- URI. A URI can be used to identify and locate resources within a semantic database.
- Subgraph. Specifies a set of statements for a resource. Needed in e.g. copy and delete activities.
- Library. Libraries are used to structure large amounts of data. Libraries are browsed.
- Project. Private data is organised in projects. A project specifies its required execution environment.
- Model. A model spans a variable space.
- Variable. A model determines a set of variables which have primitive data values.
- Valuation. A valuation contains values for variables.
- Label. A label is a textual representation of a resource.
- Viewpoint. A viewpoint specifies means to browse the semantic graph.
- Predicate. A logical predicate as e.g. in Prolog.
- Function. A function with n inputs and a single output.
- Index. A workspace-persistent realization of a predicate
- Activation. Used to activate and deactivate a concept.
- Operation. A runnable action, which makes a modification to the semantic graph.
- Extent. For determining a local extent of an instance. For determining subgraphs.
- Adaption. Retrieves an implementation of an interface for a resource.
- Analysis. Computes some result based on the semantic graph.
- Validation. Checks whether some constraints are satisfied.
- Access determination. Determines whether access to data is granted.
- Structure determination. Determines allowed structures in the graph.
- Shared identification. Establishes identity across semantic databases.
- Instantiation. Creates new instances based on types.
- Copying. Creates a copy of a subgraph in the same database or in some other database.
- Deletion. Deletes a subgraph.
- Activation. Transfers a concept between active and inactive states. Activation can involve displaying and initiation of certain processes.
- Mapping. Enforces rules between models.
- Editing. Transforms a model based on input.
- Variable discovery. Determination of the variable space based on the configuration structure of the model.
- Instance discovery. Finding e.g. shared instances of a type from the database.
- Browsing. Traversing the semantic graph using tree representations.
Layer0 is further specified in Layer0 ontology.
The top level organization of a Simantics database is depicted in the image below.
Subgraph transfer mechanism
The subgraph transfer mechanism is needed for identifying a set of statements to transfer from and to a database.
Use-cases, where the mechanism is needed:
- Ontology evolution
There have been at least two different prototype implementations of subgraph transfer (org.simantics.layer0.utils.extent and org.simantics.layer0.utils.extent2). Both have their own shortcomings.
There are two major technical challenges:
- How to define which statements belong to the subgraph
- First approach is based on Includes- and Propagates-relations.
- Second approach is based on classification of relations and defining parent resources for all resources.
- See also Subgraphs
- We have so far tried to find a generic solution for this problem. There are however often application specific concerns. A robust solution should probably involve some generic solution that can be customized with adapters.
- Even if adapters solve the problem of local propagation of extents, we still need a conceptual model that defines the whole extent.
- How to serialize resource references
To maximally utilize the increased semantic content of the semantic database a validation mechanism is defined for
- Checking the semantic configuration against semantic rules
- Checking the validity of the semantic implications of the semantic configuration (e.g. checking the flattening of a structural model)
- Checking real time data against semantic rules
The following statements describe the browsing solution in Simantics
- The browser builds up its representation using Contributions e.g. ViewpointContributions and Labelers.
- Viewpoints and Labelers are determined using Evaluators bound to the Java class of the input objects.
- The ontology for browsing specifies a viewpoint made up from contributions to named contexts (e.g. browser identity or model) with adapter-modelled acceptance criteria
- New contributions are described in ontologies and implemented as Java adapters.
- Each browser has its own unique identifier string
- Contributions are contributed and bound to contexts using Eclipse extension points
Concepts and their contracts
- Evaluator. A Java object which determines e.g. Viewpoints and Labelers for given input.
- Viewpoint. Determines a set of child objects for given input object.
- Labeler. Determines a label for an object.
- A set of keys and associated values which are used to steer the browsing process. A mandatory INPUT object is used in Evaluators.
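The contracts above can be sketched with two small interfaces. BrowserSketch, Viewpoint and Labeler as written here are hypothetical stand-ins and not the Simantics browsing API. Evaluators and contexts are left out, so a single viewpoint and a single labeler are applied to every node.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tree builder driven by a viewpoint and a labeler. */
final class BrowserSketch {
    /** Determines the child objects of an input object. */
    interface Viewpoint {
        List<Object> children(Object input);
    }

    /** Determines the label of an object. */
    interface Labeler {
        String label(Object input);
    }

    /** Renders the tree below input as indented lines, depth-first. */
    static List<String> render(Object input, Viewpoint viewpoint, Labeler labeler) {
        List<String> lines = new ArrayList<>();
        render(input, viewpoint, labeler, 0, lines);
        return lines;
    }

    private static void render(Object node, Viewpoint viewpoint, Labeler labeler,
                               int depth, List<String> out) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  "); // two spaces per tree level
        }
        out.add(line.append(labeler.label(node)).toString());
        for (Object child : viewpoint.children(node)) {
            render(child, viewpoint, labeler, depth + 1, out);
        }
    }
}
```

The viewpoint is assumed to describe a finite tree; a viewpoint that returns a node among its own descendants would make this sketch recurse without end.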
Mapping framework in Simantics consists of several different levels. From the lowest to the highest:
- Provides a uniform interface to mappings and other graph modifying software components. This specification defines how a mapping is attached to a resource and how it is used. Actual implementation can be written directly in Java or using higher level mechanisms.
- SCL mappings
- A Datalog-based language that is meant for writing queries and transformations.
- Data driven mappings
- A mapping defined using lower level definitions that can be customized with specific relations. For example, mapping.graph defines this kind of mapping. Graphical user interfaces can be built for defining mappings of this kind.
The structural ontology models hierarchically decomposed and connected components and their flattening.
The data model is described in Structural ontology.
Diagrams are used to configure models in various domains. The diagram model is a self-sufficient graphical model which is mapped into domain specific simulation models directly or via structural ontology.
Address Space is the structure of the data when a model is exposed in a communication interface. It consists of nodes and variables.
There are two types of address spaces:
- Configuration. The configuration model consists of initialization values, such as parameters.
- Runtime. The runtime model consists of variables and constants.
There are two structures of address spaces:
- Structural address space is a complicated model of instances and classes. The structure is a graph.
- Flattened address space is a flattened simplification of the previous model. It is a tree.
There is no structural address space for the runtime model.
The structure of the flat configuration model is equal to, or a subset of, the structure of the flat runtime model.
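The relation between the structural and the flattened address space can be sketched as follows. The names AddressSpaceSketch and ComponentType, the "/" path separator and the restriction to acyclic class structures are assumptions made for illustration. Classes may be shared by several instances, so the structural model is a graph, while flattening yields one path per variable.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical flattening of a structural model into variable paths. */
final class AddressSpaceSketch {
    /** A class in the structural model; a member with a null type is a leaf variable. */
    static final class ComponentType {
        final Map<String, ComponentType> members = new LinkedHashMap<>();

        ComponentType member(String name, ComponentType type) {
            members.put(name, type);
            return this;
        }
    }

    /** Flattens the structure below root into the set of its variable paths. */
    static Set<String> flatten(ComponentType root) {
        Set<String> paths = new TreeSet<>();
        flatten(root, "", paths);
        return paths;
    }

    // Assumes the class structure is acyclic; a cycle would recurse without end.
    private static void flatten(ComponentType type, String prefix, Set<String> out) {
        for (Map.Entry<String, ComponentType> e : type.members.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "/" + e.getKey();
            if (e.getValue() == null) {
                out.add(path);
            } else {
                flatten(e.getValue(), path, out);
            }
        }
    }
}
```

With both flat models represented as path sets, the subset relation stated above can be checked with `runtimePaths.containsAll(configurationPaths)`.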