The complexity of constructing knowledge graphs, and how low-code tools can help or hurt you

Abdus Salaam Muwwakkil
Knowledge Graph Digest
6 min read · Nov 18, 2020


In last week’s webinar, hosted by KgBase, Data Chef François Scharffe walked through the construction process for knowledge graphs, looking at where automation and a good UI can streamline the work.

Having built knowledge graphs for many types of organizations, from government and academia to startups and large corporations, he shared lessons on the problems and complexities that arise along the way.

Scharffe mentions two critical skill sets for an ontologist: technical and organizational. The ontologist must be:

  • A well-informed information scientist.
  • A domain expert with business expertise who can translate it into a formal model.

Central to his discussion is the character Alice, a business analyst who is investigating knowledge graphs as a solution for managing her company’s data.

The list of activities involved in building a knowledge graph is long:

  • Choosing the semantic web stack
  • Learning the standards to build knowledge graphs
  • Implementing data quality procedures
  • Performing entity matching for different sources of data
  • Mapping schemas and aligning key elements
  • Building ETL pipelines
  • Establishing ontology definitions with the team, etc.

Thesis 1: Technical aspects of ontology development can be abstracted by using the proper tools.

The business analyst or domain expert uses tools to translate her domain expertise into a machine-readable model.

One of the first issues to tackle is disparate data sources.

Knowledge Graph from François Scharffe’s Webinar

Sources of data range from internal systems (SharePoint, Salesforce, content management systems, Excel spreadsheets, and other databases) to vendor data from scraped websites, social media, and elsewhere.

These disparate data sources must be transformed and integrated into a common knowledge graph, stored in a graph database, and modeled on an agreed-upon business ontology.

The main goal should be to use efficient tools to help teams better allocate their time and resources to model the domain and find insights.

Scharffe suggests several options:

  • Look at the knowledge graph itself, embed it, and compute numerical representations of the graph to find missing relationships.
  • Use natural language processing pipelines to extract knowledge from documents, research, emails, etc.
  • Use a logical reasoner to examine the facts and infer new facts from the data and the ontology.
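
The third option is standard rule-based inference. Here is a minimal sketch of it using Python’s rdflib together with the owlrl reasoner, assuming an RDFS-level ontology; the food: namespace, its URI, and the triples are illustrative, not from the webinar.

```python
# A minimal sketch of rule-based inference over an RDFS-level ontology.
# The food: namespace and triples are illustrative assumptions.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS
import owlrl

FOOD = Namespace("http://example.org/food/")
g = Graph()
g.bind("food", FOOD)

# Ontology axiom: every Recipe is a Dish.
g.add((FOOD.Recipe, RDFS.subClassOf, FOOD.Dish))
# Data: Cassoulet is a Recipe.
g.add((FOOD.Cassoulet, RDF.type, FOOD.Recipe))

# Compute the RDFS deductive closure: the reasoner adds inferred triples.
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

# The inferred fact (Cassoulet, rdf:type, food:Dish) is now in the graph.
print((FOOD.Cassoulet, RDF.type, FOOD.Dish) in g)  # True
```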

Scharffe then discusses how to design an ontology, touching on taxonomies and on deriving formal expressions from the information.

Ontologies express information about the “meta structure” describing data in the graph.

  • Conceptualization: a description of the way we think about a domain.
  • Specification: The conceptualization, written down explicitly.
  • Formal: Axioms expressed in a formal language.
  • Shared: Community-based, reusable across all applications.

Scharffe provides a helpful example to make these concepts easier to visualize: the webinar takes us through a favorite southern French food recipe, Cassoulet.

Figure 1: Knowledge graph representation of “Cassoulet”.

The graph shows that Cassoulet has the type food:Recipe, asserted via rdf:type.

There’s a “hasIngredient” object property that defines the relationship between the recipe and each ingredient.
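
In code, those two statements are just triples. Here is a sketch using Python’s rdflib; the namespace URI and the specific ingredients are assumptions, since the figure only shows prefixed names.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

# The food: namespace URI is assumed; the figure shows only prefixed names.
FOOD = Namespace("http://example.org/food/")
g = Graph()
g.bind("food", FOOD)

# Cassoulet rdf:type food:Recipe
g.add((FOOD.Cassoulet, RDF.type, FOOD.Recipe))
# The hasIngredient object property links the recipe to its ingredients.
for ingredient in (FOOD.WhiteBeans, FOOD.DuckConfit, FOOD.Sausage):
    g.add((FOOD.Cassoulet, FOOD.hasIngredient, ingredient))

print(g.serialize(format="turtle"))
```

Serializing the graph prints the same facts back in Turtle, which is handy for checking the model by eye.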


Being aware of the complexity in taxonomies, schemas, data mapping, etc., Alice browses around for information and the tools to get started.

There are many languages and specifications to consider:

  • RDFS: A lightweight schema language
  • OWL: The ontology language
  • SKOS: A taxonomy language
  • SHACL: A constraint language for validating the shapes of data in the graph
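
To make SHACL concrete, here is a small validation sketch with the pyshacl library. The shape, which requires every food:Recipe to have at least one ingredient, and the namespaces are illustrative assumptions.

```python
# Illustrative SHACL validation with pyshacl: require every food:Recipe
# to have at least one food:hasIngredient.
from rdflib import Graph
from pyshacl import validate

shapes = Graph().parse(data="""
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix food: <http://example.org/food/> .
food:RecipeShape a sh:NodeShape ;
    sh:targetClass food:Recipe ;
    sh:property [ sh:path food:hasIngredient ; sh:minCount 1 ] .
""", format="turtle")

data = Graph().parse(data="""
@prefix food: <http://example.org/food/> .
food:Cassoulet a food:Recipe .   # no ingredients: violates the shape
""", format="turtle")

conforms, _, report = validate(data, shacl_graph=shapes)
print(conforms)  # False: Cassoulet is missing hasIngredient
print(report)
```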

The documentation, although interesting, is very dense and complex.

The advice: focus on knowledge about the business. There are, however, other aspects to manage before you implement a knowledge graph.

Establish an Ingestion Strategy to handle a data pipeline architecture:

  • Partial vs full ingestion: bring in only the data you need, or everything
  • Materialization vs virtualization: copy source data into the graph store, or leave it in place and query it on demand

Scharffe suggests transforming the existing data sources to import them into a graph database:

  • Mapping is tedious
  • It requires a domain expert and a coder (expensive)
  • For relational databases, the W3C R2RML standard applies
  • Data mapping tools exist, such as Ontotext’s OntoRefine
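
To see what such mapping tools automate, here is a hand-rolled sketch in Python: rows in a hypothetical recipes table become resources, and columns become properties, which is the essence of an R2RML-style mapping.

```python
# A hand-rolled sketch of what R2RML-style mapping automates. The recipes
# table and its columns are hypothetical.
import sqlite3
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

FOOD = Namespace("http://example.org/food/")
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE recipes (id INTEGER, name TEXT)")
db.execute("INSERT INTO recipes VALUES (1, 'Cassoulet')")

g = Graph()
g.bind("food", FOOD)
for row_id, name in db.execute("SELECT id, name FROM recipes"):
    subject = FOOD[f"recipe/{row_id}"]           # table -> concept
    g.add((subject, RDF.type, FOOD.Recipe))
    g.add((subject, RDFS.label, Literal(name)))  # column -> attribute

print(g.serialize(format="turtle"))
```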

This is an important process to undertake when expanding the knowledge graph with new data sources.

Also, consider Entity Matching, which ensures that entities in a new data source are linked to the entities they correspond to in the existing knowledge graph.

This is usually done at ingestion time, or afterward as a reconciliation process (see tools like Alteryx or Informatica).
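
As a toy illustration of reconciliation, the sketch below matches incoming names against existing entity labels using only the Python standard library; real tools use much richer matching features, and the 0.85 threshold is an arbitrary assumption.

```python
# A toy reconciliation pass: match incoming names against existing entity
# labels by string similarity. All names and the threshold are illustrative.
from difflib import SequenceMatcher

existing = {"Cassoulet": "food:Cassoulet", "Duck Confit": "food:DuckConfit"}
incoming = ["cassoulet", "duck confit (canned)", "ratatouille"]

for name in incoming:
    best, score = max(
        ((label, SequenceMatcher(None, name.lower(), label.lower()).ratio())
         for label in existing),
        key=lambda pair: pair[1],
    )
    if score >= 0.85:
        print(f"{name!r} -> {existing[best]} (score {score:.2f})")
    else:
        print(f"{name!r} -> new entity (best score {score:.2f})")
```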

Also, consider Entity Information Provenance, where priority rules are specified for attribute values (a minimal sketch of such rules follows the list below).

  • Provide a simple framework, no complex external software
  • Provide a clean UI to define priorities and property values
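
A minimal sketch of such priority rules, with source names and rankings that are illustrative assumptions: when several sources supply the same attribute, keep the value from the most trusted source.

```python
# Provenance-based priority rules: higher-ranked sources win.
# Source names, priorities, and observations are illustrative.
SOURCE_PRIORITY = {"crm": 3, "erp": 2, "web_scrape": 1}

observed = [
    ("employee_count", "crm", 1200),
    ("employee_count", "web_scrape", 950),
    ("hq_city", "erp", "Lyon"),
]

resolved = {}
for attribute, source, value in observed:
    rank = SOURCE_PRIORITY.get(source, 0)
    if attribute not in resolved or rank > resolved[attribute][0]:
        resolved[attribute] = (rank, source, value)

for attribute, (_, source, value) in resolved.items():
    print(f"{attribute} = {value!r} (from {source})")
```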

Scharffe notes that current software solutions are not ideal, but they beat having matching rules and identifiers buried in code.

Scharffe gives two proposals to reduce the burdens of constructing a knowledge graph.

Proposal 1: Nano Schemas as Knowledge Graph Building Blocks

Nano schemas are minimal units of ontological commitment, focused on a class and a set of properties. They have common use cases, connect to each other, are accessible through libraries, and can be used in property or RDF graphs.

Nano schemas can help with modeling and can be grounded in more formal languages, such as OWL. They can also be mapped to common ontologies and vocabularies, such as schema.org.
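
Scharffe doesn’t prescribe a concrete format, so the sketch below is only one guess at what a nano schema could look like as a reusable component: a single class, its properties, and a grounding in schema.org terms.

```python
# One way a nano schema might look in code. The structure is an assumption
# for illustration, not Scharffe's specification.
from dataclasses import dataclass, field

@dataclass
class NanoSchema:
    cls: str                      # the single class this unit commits to
    properties: list[str]         # its set of properties
    schema_org: dict[str, str] = field(default_factory=dict)  # grounding

recipe = NanoSchema(
    cls="Recipe",
    properties=["hasIngredient", "cookTime"],
    schema_org={"Recipe": "https://schema.org/Recipe",
                "cookTime": "https://schema.org/cookTime"},
)
print(recipe.cls, recipe.properties)
```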

Proposal 2: Plug-’n’-Play Data Integration

A ton of work goes into mapping new and existing data sources coming from enterprise databases, APIs, knowledge graphs, etc.

It’s a very manual process to take a relational database and map tables to concepts, columns to attributes, and so on. It becomes even more tedious to identify entities in the new data source and map them to other entities in the knowledge graph.

Once the graph is built, applications and visualizations can be built off of it, but they still need to be made accessible to others on the team.

No surprises that this requires significant development work.

Well, where do we go from here?

Many tools that exist for implementing knowledge graphs are too powerful, making them extremely complex. They’ve inherited a lot of technical artifacts from the history of knowledge graphs, creating an overwhelming experience for many users.

There is a better way: make the tools for constructing knowledge graphs easier to use.

Below highlights the main points, as suggested by François Scharffe, Data Chef at The Knowledge Graph Conference:

Abstract the ontology language using a modern GUI

  • Hide everything that is not directly relevant (URIs/IRIs, namespaces)
  • Focus on the simple constructs covering 99% of the use cases (mainly RDFS)
  • Give access to more complex rules in an advanced mode
  • Offer a library of reusable ontology components, linked with the tool

Data visualization

  • Data should be simple to analyze
  • Use standard tools such as Thinknum, Tableau, or Power BI without having to export the knowledge graph into a relational database
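
One way to make graph data consumable by such tabular tools is to expose SPARQL results as data frames. The sketch below uses rdflib and pandas; the food: namespace and triples are illustrative.

```python
# Feed graph data to tabular tools: run SPARQL over the graph and load the
# result into a pandas DataFrame, with no relational export step.
import pandas as pd
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

FOOD = Namespace("http://example.org/food/")
g = Graph()
g.add((FOOD.Cassoulet, RDF.type, FOOD.Recipe))
g.add((FOOD.Cassoulet, FOOD.hasIngredient, FOOD.WhiteBeans))
g.add((FOOD.Cassoulet, FOOD.hasIngredient, FOOD.Sausage))

rows = g.query(
    "SELECT ?recipe ?ingredient "
    "WHERE { ?recipe food:hasIngredient ?ingredient }",
    initNs={"food": FOOD},
)
df = pd.DataFrame([(str(r.recipe), str(r.ingredient)) for r in rows],
                  columns=["recipe", "ingredient"])
print(df)
```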

Schema Mapping

  • Use automation to suggest field names and data types and to analyze data values (a toy version is sketched after this list)
  • Use GUIs to map fields and tables to the ontology
  • Integrate the nano schemas with existing vendor dataset schemas
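
A toy version of that auto-suggestion step: match incoming column names against ontology property names by string similarity. The column and property names are made up for illustration.

```python
# Toy mapping auto-suggestion using only the standard library.
from difflib import get_close_matches

ontology_properties = ["hasIngredient", "cookTime", "servings", "recipeName"]
incoming_columns = ["ingredient", "cooking_time", "n_servings", "author"]

for column in incoming_columns:
    # Normalize separators and case before comparing names.
    candidate = column.replace("_", "").lower()
    matches = get_close_matches(
        candidate, [p.lower() for p in ontology_properties], n=1, cutoff=0.6)
    print(column, "->", matches[0] if matches else "no suggestion")
```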

If you haven’t seen François Scharffe’s webinar yet, here ya go: No Code Knowledge Graphs Webinar

If you are looking for new tools to tackle graph databases, check out our knowledge graph solution.

If my writing didn’t just bore you to death, please read my perspective on how companies can operationalize data to gain a competitive advantage.


Abdus helps data leaders and innovation teams deploy problem solving solutions to unlock the talents of their people and establish competitive advantages.