The LEDS growth core (Wachstumskern) regularly publishes scientific papers that originate within the research project Linked Enterprise Data Services. At SEMANTiCS 2016 we will present a total of four of them in the Research Track, submitted by the participating universities, Universität Leipzig and Technische Universität Chemnitz.
“KESeDa - Knowledge Extraction from Heterogeneous Semi-Structured Data Sources” M. Krug, M. Seidel, F. Burian, M. Gaedke | TU Chemnitz
A large part of the free knowledge existing on the Web is available as heterogeneous, semi-structured data, which is only weakly interlinked and in general does not include any semantic classification. Due to the enormous amount of information, preparing this data for integration into the Web of Data requires automated processes. The extraction of knowledge from structured as well as unstructured data has already been a topic of research, but extraction solutions are missing especially for the semi-structured data format JSON, which is widely used as a data exchange format, e.g., in social networks. Based on the findings from our analysis of existing extraction methods, we present our KESeDa approach for extracting knowledge from heterogeneous, semi-structured data sources. We show how knowledge can be extracted by describing the different analysis and processing steps. The resulting semantically enriched data allows the potential of Linked Data to be exploited.
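To give a rough feel for the kind of mapping such an extraction pipeline performs, the sketch below turns a flat JSON record into N-Triples. The fixed key-to-property table and the FOAF terms are our own illustrative simplification; the actual KESeDa approach derives such mappings through its analysis and processing steps rather than a static lookup.

```python
import json

# Illustrative key-to-property table; a real extraction system would
# classify unknown keys semantically instead of using a fixed mapping.
KEY_TO_PROPERTY = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
    "knows": "http://xmlns.com/foaf/0.1/knows",
}

def json_to_ntriples(subject_iri, record):
    """Map flat JSON keys to RDF triples via the lookup table above."""
    triples = []
    for key, value in record.items():
        prop = KEY_TO_PROPERTY.get(key)
        if prop is None:
            continue  # unknown key: left out in this simplified sketch
        if isinstance(value, str) and value.startswith("http"):
            obj = f"<{value}>"          # treat URLs as resource IRIs
        else:
            obj = json.dumps(str(value))  # everything else as a quoted literal
        triples.append(f"<{subject_iri}> <{prop}> {obj} .")
    return triples

record = json.loads('{"name": "Alice", "homepage": "http://example.org/alice"}')
for t in json_to_ntriples("http://example.org/alice#me", record):
    print(t)
```

Even this toy version shows the two core decisions of such a pipeline: choosing a semantic property for each key and deciding whether a value becomes a resource or a literal.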
“Executing SPARQL queries over Mapped Document Stores with SparqlMap-M” Jörg Unbehauen, Michael Martin | Universität Leipzig
With the increasing adoption of NoSQL database systems like MongoDB or CouchDB, more and more applications store structured data according to a non-relational, document-oriented model. Exposing this structured data as Linked Data is currently inhibited by a lack of standards as well as tools and requires the implementation of custom solutions.
While recent efforts aim at expressing transformations of such data models into RDF in a standardized manner, approaches that facilitate SPARQL execution over mapped non-relational data sources are still missing. With SparqlMap-M we show how dynamic SPARQL access to non-relational data can be achieved. SparqlMap-M is an extension of our SPARQL-to-SQL rewriter SparqlMap that performs a (partial) transformation of SPARQL queries by using a relational abstraction over a document store. Furthermore, duplicated data in the document store is exploited to reduce the number of joins, and custom optimizations are introduced.
Our showcase scenario employs the Berlin SPARQL Benchmark (BSBM) with different adaptations to a document data model. We use this scenario to demonstrate the viability of our approach and compare it to different MongoDB setups and native SQL.
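The core idea behind rewriting SPARQL for a document store can be sketched as translating triple patterns into filter documents. The mapping table, predicate IRIs, and field names below are assumptions for illustration only, not SparqlMap-M's actual relational abstraction.

```python
# Illustrative mapping from predicate IRIs to document fields.
MAPPING = {
    "http://purl.org/dc/elements/1.1/title": "title",
    "http://example.org/vocab/price": "price",
}

def pattern_to_filter(predicate_iri, rdf_object):
    """Translate one triple pattern with a bound object into a
    MongoDB-style filter document on the mapped field."""
    field = MAPPING[predicate_iri]
    return {field: rdf_object}

def bgp_to_filter(patterns):
    """A basic graph pattern over one subject becomes a conjunctive
    filter (MongoDB query documents AND their fields implicitly)."""
    query = {}
    for pred, obj in patterns:
        query.update(pattern_to_filter(pred, obj))
    return query

# ?product dc:title "Widget" ; ex:price 42 .
print(bgp_to_filter([
    ("http://purl.org/dc/elements/1.1/title", "Widget"),
    ("http://example.org/vocab/price", 42),
]))
```

One appeal of this direction, which the abstract hints at with its remark on duplicate data, is that several triple patterns about the same subject can collapse into a single filter over one document, avoiding joins entirely.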
“Distributed Collaboration on RDF Datasets Using Git: Towards the Quit Store” Natanael Arndt, Norman Radtke and Michael Martin | Universität Leipzig
Collaboration is one of the most important topics regarding the evolution of the World Wide Web and thus also for the Web of Data. In scenarios of distributed collaboration on datasets it is necessary to provide support for multiple different versions to exist simultaneously, while also providing support for merging diverged datasets. In this paper we present an approach that uses SPARQL 1.1 in combination with the version control system Git and creates commits for all changes applied to an RDF dataset containing multiple named graphs. Furthermore, the operations provided by Git are used to distribute the commits among collaborators and to merge diverged versions of the dataset. We show the advantages of (public) Git repositories for RDF datasets and how this represents a way to collaborate on and to consume RDF data. With SPARQL 1.1 and Git in combination, users are given several opportunities to participate in the evolution of RDF data.
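Because an RDF graph is a set of triples, merging diverged versions can be reasoned about with set operations. The sketch below shows a simple three-way merge over triple sets, an assumption on our part for illustration; the actual merge semantics of the Quit Store for datasets with multiple named graphs are defined in the paper itself.

```python
def three_way_merge(base, ours, theirs):
    """Simple three-way merge of RDF graphs modeled as sets of triples:
    take every triple added on either side, and drop a base triple if
    either side deleted it. Illustrative only; real named-graph merge
    strategies may resolve conflicts differently."""
    added = (ours - base) | (theirs - base)
    deleted = (base - ours) | (base - theirs)
    return (base - deleted) | added

base = {("ex:alice", "foaf:name", '"Alice"')}
ours = base | {("ex:alice", "foaf:age", '"30"')}                      # we add age
theirs = {("ex:alice", "foaf:name", '"Alice Smith"')}                 # they rename
merged = three_way_merge(base, ours, theirs)
print(sorted(merged))  # keeps the added age and the renamed name
```

Unlike line-based merging of serialized files, set-based merging is insensitive to triple ordering, which is one reason a Git-backed RDF store needs its own merge logic on top of Git's distribution operations.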
“Towards Versioning of Arbitrary RDF Data” Marvin Frommhold, Ruben Navarro Piris, Natanael Arndt, Sebastian Tramp, Niklas Petersen and Michael Martin | Universität Leipzig
Coherent and consistent tracking of provenance data and in particular update history information is a crucial building block for any serious information system architecture. Version Control Systems can be part of such an architecture, enabling users to query and manipulate versioning information as well as content revisions. In this paper, we introduce an RDF versioning approach as a foundation for a full-featured RDF Version Control System. We argue that such a system needs support for all concepts of the RDF specification, including support for RDF datasets and blank nodes. Furthermore, we place special emphasis on protection against unperceived history manipulation by hashing the resulting patches. In addition to the conceptual analysis and an RDF vocabulary for representing versioning information, we present a mature implementation which captures versioning information for changes to arbitrary RDF datasets.
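The idea of protecting history by hashing patches can be illustrated with a hash chain: each patch hash covers a canonicalized serialization of its additions and deletions plus the previous patch's hash, so silently rewriting an earlier patch changes every later hash. The serialization format below is our own simplification, not the vocabulary introduced in the paper.

```python
import hashlib

def patch_hash(added, removed, previous_hash=""):
    """Hash a patch over its sorted (canonicalized) triple additions and
    deletions, chained to the previous patch's hash. Sorting makes the
    hash independent of the order in which triples were recorded."""
    lines = ["+" + t for t in sorted(added)] + ["-" + t for t in sorted(removed)]
    payload = previous_hash + "\n" + "\n".join(lines)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

h1 = patch_hash({"<ex:a> <ex:p> <ex:b> ."}, set())
h2 = patch_hash(set(), {"<ex:a> <ex:p> <ex:b> ."}, previous_hash=h1)
print(h1, h2)
```

Note that blank nodes complicate canonicalization considerably, since their labels carry no identity across serializations; handling them correctly is one reason the abstract insists on support for the full RDF specification.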