What is a knowledge graph?

A knowledge graph is a structured collection of information. It is built by extracting facts from a knowledge base and converting them into entities and relationships, so the graph embodies the information held in that knowledge base. Each extracted fact is expressed in the Subject-Predicate-Object format.

The following diagram illustrates a very simple knowledge graph. Note how real-world entities are the nodes of this graph (for convenience, small icons are shown in place of circles), and how the nodes are connected by edges. These edges represent the relationships between the entities. For example, Maria “watched” the movie “Jojo Rabbit.” Although the graph below is very simple, we can use knowledge graphs to model complex information in a human- and machine-readable format.
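
As a minimal sketch of the Subject-Predicate-Object format, the fact from the example can be written as a triple in plain Python. The Triple type is a hypothetical helper, and the second fact is an illustrative assumption that may not appear in the diagram.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact in Subject-Predicate-Object form (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# The first fact comes from the example above; the second is an
# illustrative assumption added to give the graph a second edge.
facts = [
    Triple("Maria", "watched", "Jojo Rabbit"),
    Triple("Jojo Rabbit", "directed_by", "Taika Waititi"),
]

# Nodes are every distinct subject or object; edges are the triples themselves.
nodes = {t.subject for t in facts} | {t.obj for t in facts}

for t in facts:
    print(f"({t.subject}) -[{t.predicate}]-> ({t.obj})")
```

Because "Jojo Rabbit" is the object of one triple and the subject of another, the two facts join into a single connected graph.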

How do knowledge graphs work?

The following diagram shows the steps involved in constructing a knowledge graph. We follow them in order:

Acquire data: We collect data from sources such as databases, files, and websites.
Identify entities: After collecting the data, we identify the entities it mentions.
Extract relationships: Next, we determine the relationships between the identified entities.
Develop an ontology: Then, we define a schema, called an ontology, that organizes the properties of the entities and the relations between them.
Store data: We store the knowledge graph in a database that can handle graph data.
Query and infer: We use a graph query language to search and explore relationships in the graph data. We can also carry out more advanced tasks, such as inferring new connections and detecting inconsistencies within the knowledge graph.
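
The later steps can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not a production pipeline: the facts, the relation names, and the helper functions query, infer_transitive, and find_unknown_relations are all hypothetical, and a Python set stands in for a graph database.

```python
# Hypothetical facts standing in for the output of the acquisition,
# entity identification, and relationship extraction steps.
triples = {
    ("aspirin", "is_a", "nsaid"),
    ("nsaid", "is_a", "drug"),
    ("aspirin", "treats", "headache"),
    ("aspirin", "treat", "fever"),  # misspelled relation, on purpose
}

# A toy ontology: the only relation names the graph is meant to contain.
ALLOWED_RELATIONS = {"is_a", "treats"}


def query(facts, head=None, relation=None, tail=None):
    """Return the facts matching a pattern; None acts as a wildcard."""
    return sorted(
        (h, r, t)
        for (h, r, t) in facts
        if (head is None or h == head)
        and (relation is None or r == relation)
        and (tail is None or t == tail)
    )


def infer_transitive(facts, relation):
    """Add (a, relation, c) whenever (a, relation, b) and (b, relation, c) hold."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(inferred):
            for (b2, r2, c) in list(inferred):
                new_fact = (a, relation, c)
                if r1 == r2 == relation and b == b2 and new_fact not in inferred:
                    inferred.add(new_fact)
                    changed = True
    return inferred


def find_unknown_relations(facts, allowed):
    """Flag facts whose relation is not declared in the ontology."""
    return sorted(f for f in facts if f[1] not in allowed)


closed = infer_transitive(triples, "is_a")
print(query(closed, head="aspirin", relation="is_a"))
print(find_unknown_relations(triples, ALLOWED_RELATIONS))
```

The query on the inferred set returns the new fact that aspirin is a drug alongside the stored one, and the ontology check flags the triple with the misspelled relation.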

Advantages of knowledge graphs

Knowledge graphs facilitate data integration by linking information from diverse sources, enabling structured data sharing across organizations. For example, a knowledge graph could connect customer data from a CRM system with product data from an inventory database in an e-commerce company.
They improve the comprehension of a knowledge base by presenting entities and their relationships in a format that is easily understandable by both humans and machines. For instance, a knowledge graph can represent the connections between symptoms, diseases, and treatments in the healthcare domain.
Knowledge graphs enhance search functionality by providing more relevant and accurate results based on the relationships between entities. For instance, a search for “healthy recipes” could yield better results by considering ingredients, nutritional values, and user preferences within the knowledge graph.
They offer flexibility as they can be tailored to suit the requirements of various applications. For example, a knowledge graph in the financial sector can be customized to handle diverse data types such as market trends, customer profiles, and regulatory information.
Knowledge graphs support inference tasks, enabling the discovery of new relationships and the identification of data inconsistencies. For example, analyzing the connections between weather patterns, crop yields, and agricultural practices could reveal insights for optimizing farming techniques.
Knowledge graphs are scalable, making them suitable for handling large-scale applications and massive datasets. For instance, a knowledge graph powering a smart city infrastructure can efficiently manage diverse data streams from sensors, transportation systems, and public services.
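
To make the data integration point concrete, here is a hedged sketch of the e-commerce example. The identifiers, relation names, and the objects helper are hypothetical; the only assumption that matters is that both systems refer to a product by the same identifier.

```python
# Hypothetical exports from two separate systems.
crm_triples = [
    ("customer:42", "has_name", "Dana"),
    ("customer:42", "purchased", "product:sku-7"),
]
inventory_triples = [
    ("product:sku-7", "has_title", "Espresso machine"),
    ("product:sku-7", "stocked_in", "warehouse:east"),
]

# Integration is a union of facts; the shared identifier
# "product:sku-7" is what links the two sources together.
graph = set(crm_triples) | set(inventory_triples)


def objects(facts, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return sorted(o for (s, p, o) in facts if s == subject and p == predicate)


# A two-hop question that neither source can answer alone:
# which warehouses hold products bought by customer 42?
warehouses = sorted({
    warehouse
    for product in objects(graph, "customer:42", "purchased")
    for warehouse in objects(graph, product, "stocked_in")
})
print(warehouses)
```

In practice the two systems rarely share identifiers out of the box, so an entity-matching step usually comes before a union like this.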

Limitations of knowledge graphs

Knowledge graphs have difficulty generalizing entities, particularly when an entity crosses the boundaries of a single domain or context.
Distinguishing between similar entities, such as "Washington," the state, and "Washington," the person, can be challenging within knowledge graphs.
Setting and maintaining boundaries in different scenarios can also be difficult.
Knowledge graphs can become cluttered when the relations between entities have long labels, which often consist of multi-word phrases.
Keeping track of relations becomes increasingly challenging as the complexity of the graph grows, especially with larger datasets.
Complex knowledge graphs containing many relations can be hard for both humans and machines to interpret. In such cases, relational databases might offer a better alternative to knowledge graphs.
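
One common way to ease the "Washington" problem is to key each entity by a unique identifier and keep the human-readable name as an attribute. The sketch below assumes that approach. The identifiers, the type names, and the resolve helper are hypothetical, and it assumes that "Washington," the person, refers to George Washington.

```python
# Hypothetical identifiers: each entity gets a unique ID, and the
# human-readable name is stored as an attribute rather than used as the key.
entities = {
    "place:washington_state": {"label": "Washington", "type": "State"},
    "person:george_washington": {"label": "Washington", "type": "Person"},
}

# Facts refer to IDs, so the two "Washington" nodes never merge.
triples = [
    ("person:george_washington", "held_office", "office:us_president"),
    ("place:washington_state", "located_in", "country:united_states"),
]


def resolve(label, expected_type):
    """Return the IDs of entities whose label and type both match."""
    return sorted(
        entity_id
        for entity_id, attrs in entities.items()
        if attrs["label"] == label and attrs["type"] == expected_type
    )


print(resolve("Washington", "Person"))
print(resolve("Washington", "State"))
```

This moves the difficulty rather than removing it: something still has to decide which identifier a mention of "Washington" in the raw data refers to.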

Real-world applications of knowledge graphs

Now, let’s look at some real-world applications of knowledge graphs.

Semantic search

Knowledge graphs capture the context of entities on the web and the relations between them. Search engines can therefore use them to return more relevant results to users.

Chatbot

Knowledge graphs can recognize relevant information and relationships between entities. Therefore, they are suitable for chatbot applications, facilitating question-answering.

Fraud detection

Knowledge graphs can also surface unusual behavior and relationships within large datasets, which helps pinpoint fraud and security threats. These may include suspicious transactions, fake or hacked accounts, and abnormal activity.
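
As a rough illustration of how a relationship pattern can hint at fraud, the sketch below flags any resource that is shared by several accounts. The data, the logged_in_from relation, the shared_resources helper, and the threshold of three accounts are all assumptions chosen for the example, not an established detection rule.

```python
from collections import defaultdict

# Hypothetical (account, relation, resource) facts.
triples = [
    ("acct:1", "logged_in_from", "device:A"),
    ("acct:2", "logged_in_from", "device:A"),
    ("acct:3", "logged_in_from", "device:A"),
    ("acct:4", "logged_in_from", "device:B"),
]


def shared_resources(facts, relation, min_accounts=3):
    """Map each resource to its accounts when at least `min_accounts` share it."""
    users = defaultdict(set)
    for head, rel, tail in facts:
        if rel == relation:
            users[tail].add(head)
    return {
        resource: sorted(accounts)
        for resource, accounts in users.items()
        if len(accounts) >= min_accounts
    }


suspicious = shared_resources(triples, "logged_in_from")
print(suspicious)
```

A shared device is only a signal worth reviewing, since households and offices share devices legitimately.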

Biomedical research

In biomedical research, knowledge graphs can model complex relationships between proteins, genes, and drugs. Therefore, they help in drug development and provide new insights to researchers.

Implementation

The following Python code shows how to create a knowledge graph from drug testing data.

import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

head_node_values = ['exp_drug_A', 'exp_drug_B', 'exp_drug_C',
        'exp_drug_D', 'exp_drug_A', 'exp_drug_C', 
        'exp_drug_D', 'exp_drug_E', 'exp_gene_13', 
        'exp_gene_2','exp_gene_666', 'exp_gene_4', 
        'exp_gene_6', 'exp_gene_2', 'exp_gene_666',
        'exp_gene_4']

relationship_values = ['will treat', 'will treat', 'will treat', 
            'will treat', 'will inhibit', 'will inhibit',
            'will inhibit', 'will inhibit', 'is associated with', 
            'is associated with', 'is associated with', 'is associated with', 
            'is associated with', 'interacts with', 'interacts with', 
            'interacts with']

tail_nodes_values = ['covid', 'back pain', 'lung cancer', 
        'headache', 'exp_gene_13', 'exp_gene_2', 'exp_gene_4',
        'exp_gene_20', 'weight gain', 'cardiac arrest',
        'sore throat', 'bleeding', 'brain tumor', 
        'exp_gene_13', 'exp_gene_20', 'exp_gene_6']

educatives_knowledge_graph = nx.Graph()
data_dictionary = {'head_node': head_node_values, 'relationship': relationship_values, 'tail_node': tail_nodes_values}
educatives_dataframe = pd.DataFrame(data_dictionary)
educatives_dataframe

for i, my_row in educatives_dataframe.iterrows():
    educatives_knowledge_graph.add_edge(my_row['head_node'], my_row['tail_node'], label=my_row['relationship'])

educative_position = nx.spring_layout(educatives_knowledge_graph, seed=47, k=3.6)
plt.figure(figsize=(11, 10))
nx.draw(educatives_knowledge_graph, educative_position, with_labels=True, node_size=666, node_color='green', edge_color='black', alpha=0.9)
nx.draw_networkx_edge_labels(educatives_knowledge_graph, educative_position, edge_labels=nx.get_edge_attributes(educatives_knowledge_graph, 'label'), label_pos=0.5, verticalalignment='baseline')
plt.show()

Working example

Code explanation

Lines 1–3: We import the required packages: pandas to create a data frame from the knowledge graph’s data, networkx to create the graph, and matplotlib to display it.
Lines 5–24: We define three lists: head_node_values holds the head nodes, relationship_values holds the relationship from each head node to its tail node, and tail_nodes_values holds the tail nodes.
Lines 25–31: We create an empty, undirected graph with the Graph class. Then, we build a dictionary whose keys are column names and whose values are the lists declared on lines 5, 12, and 19. On line 27, we create a data frame from this dictionary, and on line 28, we display our drug testing data. Note that each row in the data frame represents a triple in our knowledge graph: the head connected to the tail via a relationship. Lastly, we iterate over the data frame and use the add_edge method to add each row as an edge, storing the relationship in the edge's label attribute.
Lines 33–37: We use the spring_layout function to compute the positions of the nodes in our graph. The seed argument fixes the random state so that the node layout is deterministic, while k sets the optimal distance between nodes. We create a new figure using plt.figure() and then draw the graph with the nx.draw() function, passing it the educatives_knowledge_graph graph and educative_position, which stores the node positions. The rest of the arguments are self-explanatory except for alpha, which sets the opacity of the graph. Using the nx.draw_networkx_edge_labels function, we add the edge labels to the graph, reading them with the get_edge_attributes function and positioning them with the label_pos and verticalalignment arguments. Finally, we display the graph with plt.show().
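
Because the graph above is undirected, it does not record which node of an edge is the head and which is the tail. As a hedged variation, not part of the code above, the sketch below loads three of the same triples into a networkx DiGraph so that direction is preserved and a node's outgoing facts can be listed. The names triples and directed_graph are introduced here for illustration.

```python
import networkx as nx

# Three triples copied from the lists in the implementation above.
triples = [
    ("exp_drug_A", "will treat", "covid"),
    ("exp_drug_A", "will inhibit", "exp_gene_13"),
    ("exp_gene_13", "is associated with", "weight gain"),
]

# A directed graph keeps the head -> tail orientation of each triple.
directed_graph = nx.DiGraph()
for head, relation, tail in triples:
    directed_graph.add_edge(head, tail, label=relation)

# The outgoing edges of a node are the facts in which it is the head.
for tail in directed_graph.successors("exp_drug_A"):
    relation = directed_graph.edges["exp_drug_A", tail]["label"]
    print(f"exp_drug_A --{relation}--> {tail}")

# The reverse edge does not exist, so direction is not lost.
print(directed_graph.has_edge("covid", "exp_drug_A"))
```

The same drawing calls accept a DiGraph, in which case nx.draw renders the edges with arrows.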
