LINKED OPEN DATA TOOLS
The Copernicus App Lab will establish a proof of concept for providing data from the Copernicus Land Monitoring Service, the Copernicus Marine Environment Monitoring Service and the Copernicus Atmosphere Monitoring Service as linked open data. The goal is to promote the use of Copernicus data in mobile applications. It also aims to encourage suitable tools in any future efforts to link Copernicus data, whether those efforts come from the Copernicus Services or from developers themselves.
The concept is based on disseminating data products in a loosely coupled manner, using a framework that facilitates distributed data access and processing. For this purpose, a remote data access protocol will be tailored to Copernicus data and released as open source. It will then be possible to host multi-thematic, gridded, cloud-connected Earth science data products in a distributed fashion, either on premises or in the cloud. The distributed and complex nature of the underlying data and systems will thus be hidden from end users, who can request and consume data through the same unified APIs.
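To make the idea of a single API over distributed hosts concrete, here is a minimal sketch under stated assumptions. It is not the Copernicus App Lab protocol, and every class and product name in it (`GridBackend`, `InMemoryBackend`, `UnifiedCatalog`, `sea_surface_temperature`, `no2_column`) is hypothetical. A catalog routes each subset request to whichever backend hosts the product, so the caller uses the same call whether the data lives on premises or in the cloud.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol

Grid = List[List[float]]


class GridBackend(Protocol):
    """Anything that can return a rectangular subset of a named 2-D gridded product."""

    def subset(self, product: str, rows: slice, cols: slice) -> Grid: ...


@dataclass
class InMemoryBackend:
    """Hypothetical stand-in for an on-premises or cloud host."""

    grids: Dict[str, Grid]

    def subset(self, product: str, rows: slice, cols: slice) -> Grid:
        # Server-side subsetting: only the requested window is returned.
        return [row[cols] for row in self.grids[product][rows]]


@dataclass
class UnifiedCatalog:
    """Single entry point that routes each request to the host of the product."""

    routes: Dict[str, GridBackend] = field(default_factory=dict)

    def register(self, product: str, backend: GridBackend) -> None:
        self.routes[product] = backend

    def subset(self, product: str, rows: slice, cols: slice) -> Grid:
        # The caller makes the same call whether the data is local or remote.
        return self.routes[product].subset(product, rows, cols)


on_premises = InMemoryBackend({"sea_surface_temperature": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]})
cloud = InMemoryBackend({"no2_column": [[0.1, 0.2], [0.3, 0.4]]})

catalog = UnifiedCatalog()
catalog.register("sea_surface_temperature", on_premises)
catalog.register("no2_column", cloud)

print(catalog.subset("sea_surface_temperature", slice(0, 2), slice(1, 3)))
```

In a real deployment, the in-memory backends would be replaced by clients that speak the remote data access protocol. The caller-facing `subset` call would remain unchanged.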
The Copernicus App Lab consists of three technical pillars:
- Provision of Copernicus linked open data via a cloud infrastructure
- Tools for semantic linkage of Copernicus data with other societal or business information
- Improved data access via a streaming data library
GeoTriples is a tool for transforming geospatial data from their original formats (e.g., shapefiles or spatially-enabled relational databases) into RDF. The following input formats are supported: spatially-enabled relational databases (PostGIS and MonetDB), ESRI shapefiles and XML, GML, KML, JSON, GeoJSON and CSV documents.
GeoTriples comprises two main components: the mapping generator and the R2RML/RML mapping processor. The mapping generator takes a geospatial data source (e.g., a shapefile) as input and automatically creates an R2RML or RML mapping. This mapping transforms the input into an RDF graph that uses the GeoSPARQL vocabulary. Users may edit the generated R2RML/RML mapping document to suit their requirements (e.g., to use a vocabulary other than GeoSPARQL). The mapping processor then executes the R2RML/RML mappings to produce the output geospatial RDF graph. It comes in two forms: a single-node implementation and an implementation that uses Apache Hadoop to handle big geospatial data.
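To show what such a mapping looks like, the following hand-written Python sketch emits an R2RML mapping (W3C R2RML vocabulary, Turtle syntax) for a PostGIS table. It is not GeoTriples' code, and GeoTriples' generated mappings may be structured differently. The table, key column, geometry column and base IRI are hypothetical parameters. Each row is exposed as a `geo:Feature` whose `geo:Geometry` carries a WKT literal produced by PostGIS's `ST_AsText`.

```python
def generate_r2rml(table: str, key: str, geom: str, base: str) -> str:
    """Return an R2RML mapping (Turtle) that exposes each row of a PostGIS table
    as a geo:Feature linked to a geo:Geometry serialized as a WKT literal."""
    # "{gid}"-style column reference inside an R2RML template.
    feature_iri = f"{base}{table}/{{{key}}}"
    return f"""@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .

<#{table}Features> a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "{table}" ] ;
    rr:subjectMap [ rr:template "{feature_iri}" ; rr:class geo:Feature ] ;
    rr:predicateObjectMap [
        rr:predicate geo:hasGeometry ;
        rr:objectMap [ rr:template "{feature_iri}/geometry" ]
    ] .

<#{table}Geometries> a rr:TriplesMap ;
    rr:logicalTable [ rr:sqlQuery "SELECT {key}, ST_AsText({geom}) AS wkt FROM {table}" ] ;
    rr:subjectMap [ rr:template "{feature_iri}/geometry" ; rr:class geo:Geometry ] ;
    rr:predicateObjectMap [
        rr:predicate geo:asWKT ;
        rr:objectMap [ rr:column "wkt" ; rr:datatype geo:wktLiteral ]
    ] .
"""


mapping = generate_r2rml("parcels", "gid", "geom", "http://example.org/")
print(mapping)
```

Editing the mapping by hand follows the same idea as in the text. For example, replacing `geo:Feature` with a class from another vocabulary changes the output graph without touching the data.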
Strabon is a spatiotemporal RDF store. You can use it to store linked geospatial data that changes over time and to query that data using two popular extensions of SPARQL, stSPARQL and GeoSPARQL. Strabon supports spatial datatypes that allow geometric objects to be serialized in the OGC standard formats WKT and GML. It also offers spatial and temporal selections, spatial and temporal joins, a rich set of spatial functions similar to those of geospatial relational database systems, and support for multiple coordinate reference systems. Through its support for the valid time of triples and a rich set of temporal functions, Strabon can model temporal domains and concepts such as events and facts that change over time. Strabon is built by extending the well-known RDF store Sesame (now called RDF4J): it extends RDF4J's components to manage thematic, spatial and temporal data stored in the backend RDBMS.
The first query language supported by Strabon is stSPARQL, which queries data represented in stRDF, an extension of RDF. stRDF and stSPARQL were designed for representing and querying geospatial data that changes over time. For example, the growth of a city over the years due to new developments can be represented in stRDF and queried in stSPARQL using their valid time dimension. The expressive power of stSPARQL makes Strabon the only fully implemented RDF store with rich spatial and temporal functionality available today.
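As a small illustration of querying a geospatial RDF store over HTTP, the sketch below builds a GeoSPARQL query and encodes it as a SPARQL 1.1 Protocol "query via GET" request. It assumes that a Strabon endpoint accepts standard protocol requests. The endpoint URL is hypothetical, so check the Strabon documentation for the actual address and parameters. The query uses standard OGC GeoSPARQL terms rather than stSPARQL's temporal constructs, which are not shown here.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint address; replace with the URL of your own Strabon deployment.
ENDPOINT = "http://localhost:8080/strabon/Query"

# GeoSPARQL query: features whose WKT geometry intersects a polygon around Athens
# (longitude/latitude order, the GeoSPARQL default CRS84).
QUERY = """PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?feature ?wkt
WHERE {
  ?feature geo:hasGeometry ?g .
  ?g geo:asWKT ?wkt .
  FILTER(geof:sfIntersects(?wkt,
    "POLYGON((23.6 37.9, 23.8 37.9, 23.8 38.1, 23.6 38.1, 23.6 37.9))"^^geo:wktLiteral))
}"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a 'query via GET' request URL as defined by the SPARQL 1.1 Protocol."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = sparql_get_url(ENDPOINT, QUERY)
# Decoding the query string recovers the original query text unchanged.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(url)
```

To actually send the request, you could pass the URL to `urllib.request.Request` with an `Accept: application/sparql-results+json` header. The sketch deliberately stops before any network call.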
Ontop-spatial can create virtual geospatial RDF graphs on top of your geospatial databases (currently PostGIS, SpatiaLite and Oracle Spatial are supported). Your geometries are mapped to GeoSPARQL geometry literals using ontologies and R2RML/OBDA mappings.
Like its parent system Ontop, Ontop-spatial can be used as a standard SPARQL endpoint, in this case one that executes GeoSPARQL queries on top of geospatial databases. It can therefore be used alongside other tools that produce, manage, explore and visualize geospatial RDF data. For example, R2RML mappings generated by GeoTriples can be given as input to Ontop-spatial to create a virtual geospatial repository. The geometries of an Ontop-spatial repository can also be visualized using Sextant, a web-based tool for browsing and visualizing linked geospatial data. In addition, an Ontop-spatial endpoint can serve as a source endpoint for the linking tool Silk, which has recently been extended with geospatial features. Finally, the virtual geospatial RDF graphs produced by Ontop-spatial can be materialized and stored in a geospatial RDF store such as Strabon.
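The difference between a virtual and a materialized graph can be illustrated in a few lines of Python. This is only a conceptual sketch. Ontop-spatial does not enumerate rows this way: it rewrites each GeoSPARQL query into SQL, including spatial predicates, and runs that SQL on the database. Here SQLite, with WKT stored as plain text, stands in for a spatial database, and the `parcels` table and `example.org` IRIs are hypothetical. Triples are produced on demand from the current rows, and materialization simply drains them into N-Triples text.

```python
import sqlite3
from typing import Iterator, Tuple

GEO = "http://www.opengis.net/ont/geosparql#"
BASE = "http://example.org/parcel/"  # hypothetical IRI namespace

Triple = Tuple[str, str, str]


def virtual_triples(conn: sqlite3.Connection) -> Iterator[Triple]:
    """Yield GeoSPARQL triples lazily from the current rows of the 'parcels' table.

    No triples are stored anywhere: every call re-reads the database, so the
    graph always reflects the latest data.
    """
    for gid, wkt in conn.execute("SELECT gid, wkt FROM parcels ORDER BY gid"):
        feature = f"<{BASE}{gid}>"
        geometry = f"<{BASE}{gid}/geometry>"
        yield feature, f"<{GEO}hasGeometry>", geometry
        yield geometry, f"<{GEO}asWKT>", f'"{wkt}"^^<{GEO}wktLiteral>'


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (gid INTEGER PRIMARY KEY, wkt TEXT)")
conn.executemany("INSERT INTO parcels VALUES (?, ?)",
                 [(1, "POINT(23.72 37.98)"), (2, "POINT(22.94 40.64)")])

# Materialization: drain the virtual graph into N-Triples text, e.g. for bulk
# loading into a geospatial RDF store.
ntriples = "\n".join(" ".join(t) + " ." for t in virtual_triples(conn))
print(ntriples)
```

Because nothing is copied, a row inserted after this point appears the next time `virtual_triples` is called. The N-Triples snapshot, by contrast, stays as it was.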
Silk is an open-source framework for integrating heterogeneous data sources. The primary use cases of Silk include:
- Generating links between related data items within different Linked Data sources.
- Linked Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web.
- Applying data transformations to structured data sources.
Silk is based on the Linked Data paradigm, which is built on two simple ideas: First, RDF provides an expressive data model for representing structured information. Second, RDF links are set between entities in different data sources.
Linking Data Sources
Using the declarative Silk – Link Specification Language (Silk-LSL), developers can specify which types of RDF links should be discovered between data sources as well as which conditions data items must fulfill in order to be interlinked. These link conditions may combine various similarity metrics and can take the graph around a data item into account, which is addressed using an RDF path language. Silk accesses the data sources that should be interlinked via the SPARQL protocol and can thus be used against local as well as remote SPARQL endpoints. Link Specifications can be created using the Silk Workbench graphical user interface or manually in XML.
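Silk expresses link conditions declaratively in Silk-LSL. The Python sketch below only mirrors the idea and does not use Silk's syntax or metrics. A link condition combines two similarity metrics into a weighted score and keeps pairs above a threshold as candidate `owl:sameAs` links. The stand-in metrics are `difflib`'s string ratio and a haversine-based distance score. The weights, the 10 km cut-off, the 0.8 threshold and the sample entities are arbitrary assumptions.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Entity = Dict[str, object]  # keys assumed: "uri", "name", "lat", "lon"


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def geo_similarity(a: Entity, b: Entity, max_km: float = 10.0) -> float:
    """1.0 for identical points, falling linearly to 0.0 at max_km (haversine distance)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a["lat"], a["lon"], b["lat"], b["lon"]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    km = 2 * 6371.0 * math.asin(math.sqrt(h))
    return max(0.0, 1.0 - km / max_km)


def discover_links(source: List[Entity], target: List[Entity],
                   threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Return (source URI, target URI, score) for pairs whose weighted score passes the threshold."""
    links = []
    for s in source:
        for t in target:
            # Link condition: weighted combination of two similarity metrics.
            score = 0.6 * name_similarity(s["name"], t["name"]) + 0.4 * geo_similarity(s, t)
            if score >= threshold:
                links.append((s["uri"], t["uri"], round(score, 3)))
    return links


source = [{"uri": "http://a.example/Athens", "name": "Athens", "lat": 37.98, "lon": 23.73},
          {"uri": "http://a.example/Thessaloniki", "name": "Thessaloniki", "lat": 40.64, "lon": 22.94}]
target = [{"uri": "http://b.example/Athens", "name": "athens", "lat": 37.98, "lon": 23.73},
          {"uri": "http://b.example/Patras", "name": "Patras", "lat": 38.25, "lon": 21.73}]

for s_uri, t_uri, score in discover_links(source, target):
    print(f"<{s_uri}> <http://www.w3.org/2002/07/owl#sameAs> <{t_uri}> .  # score {score}")
```

The nested loop compares every pair, which is fine for a sketch. Practical link discovery tools avoid this quadratic cost with indexing or blocking.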
While the main part of an integration workflow lies in interlinking data sources, data sets coming from different sources sometimes require their schemata and data formats to be harmonized before interlinking. For this purpose, Silk lets the user create and execute lightweight transformation rules.
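In Silk, transformation rules are defined through its own tooling. As a language-neutral illustration, the hypothetical Python sketch below attaches a chain of normalization functions to each field and applies them in order before records are compared. The rule names (`strip_whitespace`, `remove_punctuation`, `lower_case`) and the sample record are assumptions made for the example.

```python
import re
from typing import Callable, Dict, List

Rule = Callable[[str], str]


def strip_whitespace(value: str) -> str:
    """Trim the value and collapse internal runs of whitespace to single spaces."""
    return " ".join(value.split())


def remove_punctuation(value: str) -> str:
    """Drop every character that is neither a word character nor whitespace."""
    return re.sub(r"[^\w\s]", "", value)


def lower_case(value: str) -> str:
    return value.lower()


def apply_rules(record: Dict[str, str], rules: Dict[str, List[Rule]]) -> Dict[str, str]:
    """Return a copy of the record with each field's rule chain applied in order."""
    out = dict(record)
    for field_name, chain in rules.items():
        if field_name in out:
            for rule in chain:
                out[field_name] = rule(out[field_name])
    return out


rules = {"name": [strip_whitespace, remove_punctuation, lower_case]}
original = {"name": "  St. George's   Bay ", "id": "42"}
print(apply_rules(original, rules))
```

Keeping each rule as a small, single-purpose function makes it easy to reuse the same chain on both data sources, so their values are normalized identically before linking.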
Sextant is a web-based, mobile-ready platform for visualizing, exploring and interacting with linked geospatial data. The old version of Sextant was one of the first visualization tools for linked geospatial data, but it relied heavily on end users writing SPARQL themselves. In the new approach, we redesigned and reimplemented Sextant as a user-friendly application that allows both domain experts and non-experts to take advantage of semantic web technologies. Our aim is to convince them to adopt these technologies by demonstrating, through Sextant, the benefits of the linked open geospatial Web.
The core feature of Sextant is the ability to create thematic maps by combining geospatial and temporal information from a number of heterogeneous data sources. These sources range from standard SPARQL endpoints, to SPARQL endpoints that support the GeoSPARQL standard defined by the Open Geospatial Consortium (OGC), to widely adopted geospatial file formats such as KML, GML and GeoTIFF.
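One step behind building a map layer from a SPARQL endpoint is converting query results into a format that a web map can draw. The sketch below is not Sextant's implementation. It turns results in the W3C SPARQL 1.1 Query Results JSON format into a GeoJSON `FeatureCollection`. It assumes that the geometry variable is named `wkt` and handles only 2-D `POINT` literals, optionally preceded by a CRS IRI as GeoSPARQL allows. Other geometries are skipped.

```python
import json
import re
from typing import Any, Dict, List

# Optional CRS IRI prefix (as allowed by GeoSPARQL) followed by a 2-D WKT POINT.
POINT = re.compile(
    r"^\s*(?:<[^>]*>\s*)?POINT\s*\(\s*([-+.\deE]+)\s+([-+.\deE]+)\s*\)\s*$",
    re.IGNORECASE,
)


def bindings_to_geojson(results: Dict[str, Any], geom_var: str = "wkt") -> Dict[str, Any]:
    """Convert SPARQL JSON results with WKT POINT literals into a GeoJSON FeatureCollection.

    Coordinates are copied in the order they appear in the WKT, which matches
    GeoJSON's longitude/latitude order for the GeoSPARQL default CRS84.
    """
    features: List[Dict[str, Any]] = []
    for row in results["results"]["bindings"]:
        binding = row.get(geom_var)
        match = POINT.match(binding["value"]) if binding else None
        if match is None:
            continue  # unbound or non-POINT geometry: skipped in this sketch
        x, y = float(match.group(1)), float(match.group(2))
        # Every other bound variable becomes a feature property.
        properties = {k: v["value"] for k, v in row.items() if k != geom_var}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [x, y]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


sample = {
    "head": {"vars": ["name", "wkt"]},
    "results": {"bindings": [
        {"name": {"type": "literal", "value": "Athens"},
         "wkt": {"type": "literal",
                 "datatype": "http://www.opengis.net/ont/geosparql#wktLiteral",
                 "value": "<http://www.opengis.net/def/crs/OGC/1.3/CRS84> POINT(23.73 37.98)"}},
        {"name": {"type": "literal", "value": "Attica"},
         "wkt": {"type": "literal",
                 "datatype": "http://www.opengis.net/ont/geosparql#wktLiteral",
                 "value": "POLYGON((23 37, 24 37, 24 38, 23 37))"}},
    ]},
}
print(json.dumps(bindings_to_geojson(sample), indent=2))
```

A full implementation would parse arbitrary WKT and reproject non-CRS84 geometries. The sketch keeps only the shape of the conversion.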
JedAI is an open-source, highly scalable toolkit that offers out-of-the-box solutions for data integration tasks such as Record Linkage, Entity Resolution and Link Discovery. At its core lies a set of domain-independent, state-of-the-art techniques that apply to both RDF and relational data. These techniques rely on approximate, schema-agnostic functionality based on (meta-)blocking, which is the source of their high scalability.
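The following Python sketch illustrates schema-agnostic blocking followed by meta-blocking. It is not JedAI's API, which is a Java library with many blocking and pruning variants. In token blocking, every token in any attribute value becomes a block key, regardless of the attribute name. Meta-blocking then weights each candidate pair by its number of shared blocks (the Common Blocks Scheme) and keeps pairs whose weight is at least the average (weighted edge pruning). The sample profiles are made up.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Profile = Dict[str, str]  # attribute -> value; attribute names are ignored


def token_blocking(profiles: List[Profile]) -> Dict[str, Set[int]]:
    """Schema-agnostic blocking: every token of every attribute value is a block key."""
    blocks: Dict[str, Set[int]] = defaultdict(set)
    for pid, profile in enumerate(profiles):
        for value in profile.values():
            for token in value.lower().split():
                blocks[token].add(pid)
    # Blocks holding a single profile yield no comparisons, so drop them.
    return {key: ids for key, ids in blocks.items() if len(ids) > 1}


def meta_blocking(blocks: Dict[str, Set[int]]) -> List[Tuple[int, int]]:
    """Weight each candidate pair by its number of common blocks (CBS) and keep
    pairs whose weight is at least the average weight (weighted edge pruning)."""
    weights: Dict[Tuple[int, int], int] = defaultdict(int)
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            weights[(a, b)] += 1
    if not weights:
        return []
    average = sum(weights.values()) / len(weights)
    return sorted(pair for pair, weight in weights.items() if weight >= average)


profiles = [
    {"name": "Acropolis Museum", "city": "Athens"},
    {"title": "acropolis museum athens"},
    {"label": "National Archaeological Museum", "place": "Athens"},
    {"name": "White Tower", "city": "Thessaloniki"},
]
print(meta_blocking(token_blocking(profiles)))
```

Only the surviving pairs would go on to the more expensive matching step. Blocking and pruning are what keep the number of comparisons low.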
JedAI can be used in three different ways:
- As an open-source library that implements numerous state-of-the-art methods for all steps of the end-to-end Entity Resolution (ER) workflow presented in the figure below.
- As a desktop application with an intuitive Graphical User Interface that can be used by both expert and lay users.
- As a workbench that compares the relative performance of different (configurations of) ER workflows.