A knowledge graph is a structured collection of information. By extracting facts from a knowledge base and converting them into entities and relationships, a knowledge graph represents the information held within that knowledge base. It expresses each extracted fact as a Subject-Predicate-Object triple.
The following diagram illustrates a very simple knowledge graph. Note how real-world entities (for convenience, shown as small icons instead of circles) are the nodes of this graph, connected by edges that represent the relationships between them. For example, Maria “watched” the movie “Jojo Rabbit.” Although this graph is very simple, knowledge graphs can model complex information in a format that is both human- and machine-readable.
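To make the Subject-Predicate-Object idea concrete, here is a minimal sketch in plain Python, assuming facts are stored as (subject, predicate, object) tuples. Only "Maria watched Jojo Rabbit" comes from the diagram; the other facts are made up for illustration, and build_graph is a hypothetical helper, not a library function.

```python
from collections import defaultdict

# Facts as (subject, predicate, object) triples. Only the first comes from the
# diagram; the other two are made-up examples.
triples = [
    ("Maria", "watched", "Jojo Rabbit"),
    ("Maria", "lives in", "Berlin"),
    ("Jojo Rabbit", "is a", "movie"),
]


def build_graph(triples):
    """Hypothetical helper: turn triples into a node set and an adjacency map.

    Subjects and objects become nodes; each predicate becomes a labeled edge,
    stored as (predicate, object) under its subject.
    """
    nodes = set()
    edges = defaultdict(list)
    for subject, predicate, obj in triples:
        nodes.update((subject, obj))
        edges[subject].append((predicate, obj))
    return nodes, edges


nodes, edges = build_graph(triples)
print(sorted(nodes))   # ['Berlin', 'Jojo Rabbit', 'Maria', 'movie']
print(edges["Maria"])  # [('watched', 'Jojo Rabbit'), ('lives in', 'Berlin')]
```

Each subject and object becomes a node, and each predicate becomes a labeled edge, which is exactly the picture the diagram draws.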
The following diagram shows the steps involved in constructing a knowledge graph.
To create a knowledge graph, we follow these steps:
Acquire data: We gather data from sources such as databases, files, and websites.
Identify entities: After collecting the data, we identify the entities in it.
Extract relationships: Next, we determine the relationships between the identified entities.
Develop an ontology: Then, we create a structure called an ontology that organizes the entity types, their properties, and the relations between them.
Store data: We store the knowledge graph in a database that can handle graph data.
Query and infer: We use a graph query language, such as Cypher or SPARQL, to search and explore relationships in the graph data. We can also carry out more advanced tasks, such as inferring new connections and pinpointing inconsistencies within the knowledge graph.
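The storing and querying steps can be sketched with a toy in-memory triple store. This is an illustration under simplifying assumptions, not a graph database: TripleStore is a hypothetical class, a query is a (subject, predicate, object) pattern in which None acts as a wildcard, and the only inference rule is transitivity of a chosen predicate. Apart from "Maria watched Jojo Rabbit," the facts are illustrative.

```python
class TripleStore:
    """Hypothetical toy store; a real system would use a graph database."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return matching triples, sorted; None matches any value."""
        return sorted(
            (s, p, o)
            for (s, p, o) in self.triples
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
            and (obj is None or o == obj)
        )

    def infer_transitive(self, predicate):
        """Add (a, predicate, c) for every chain (a, predicate, b), (b, predicate, c).

        Repeats until no new triple appears; returns the set of inferred triples.
        """
        inferred = set()
        while True:
            pairs = [(s, o) for (s, p, o) in self.triples if p == predicate]
            new = {
                (a, predicate, c)
                for (a, b) in pairs
                for (b2, c) in pairs
                if b == b2
            } - self.triples
            if not new:
                return inferred
            self.triples |= new
            inferred |= new


store = TripleStore()
store.add("Maria", "watched", "Jojo Rabbit")
store.add("Jojo Rabbit", "is a", "comedy-drama")
store.add("comedy-drama", "is a", "film")
store.add("film", "is a", "creative work")

print(store.query(subject="Maria"))  # [('Maria', 'watched', 'Jojo Rabbit')]
print(sorted(store.infer_transitive("is a")))
```

The inference step derives, for example, that Jojo Rabbit "is a" creative work even though that fact was never stored. In a production system, the same pattern query would typically be written in a graph query language such as Cypher or SPARQL.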
Knowledge graphs facilitate data integration by linking information from diverse sources, enabling structured data sharing across organizations. For example, a knowledge graph could connect customer data from a CRM system with product data from an inventory database in an e-commerce company.
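A minimal sketch of that integration, assuming both systems share a product identifier (here a SKU). The records, field names, and the integrate helper are hypothetical; the point is that once both sources are expressed as triples, a question spanning both becomes a lookup through the shared nodes.

```python
# Hypothetical records from two separate systems.
crm_customers = [
    {"customer_id": "C1", "name": "Maria", "purchased_sku": "SKU-42"},
    {"customer_id": "C2", "name": "Ahmed", "purchased_sku": "SKU-7"},
]
inventory = [
    {"sku": "SKU-42", "product": "Running shoes", "in_stock": 12},
    {"sku": "SKU-7", "product": "Water bottle", "in_stock": 0},
]


def integrate(customers, products):
    """Convert both sources into (subject, predicate, object) triples.

    The SKU appears in both sources, so it becomes the node that links them.
    """
    triples = []
    for c in customers:
        triples.append((c["customer_id"], "has name", c["name"]))
        triples.append((c["customer_id"], "purchased", c["purchased_sku"]))
    for p in products:
        triples.append((p["sku"], "has product name", p["product"]))
        triples.append((p["sku"], "units in stock", p["in_stock"]))
    return triples


triples = integrate(crm_customers, inventory)

# Cross-source question: which customers bought a product that is now out of stock?
stock = {s: o for s, p, o in triples if p == "units in stock"}
out_of_stock_buyers = [s for s, p, o in triples if p == "purchased" and stock.get(o) == 0]
print(out_of_stock_buyers)  # ['C2']
```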
They improve the comprehension of a knowledge base by presenting entities and their relationships in a format that is easily understandable by both humans and machines. For instance, a knowledge graph can represent the connections between symptoms, diseases, and treatments in the healthcare domain.
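As a small illustration, a two-hop lookup (symptom → disease → treatment) shows how such connections can be followed. The triples are a tiny illustrative sample, not medical guidance, and the objects and treatments_for_symptom helpers are hypothetical names.

```python
# Tiny illustrative sample of (subject, relation, object) triples.
triples = [
    ("fever", "symptom of", "influenza"),
    ("cough", "symptom of", "influenza"),
    ("fever", "symptom of", "malaria"),
    ("influenza", "treated by", "oseltamivir"),
    ("malaria", "treated by", "artemisinin"),
]


def objects(triples, subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return {o for s, r, o in triples if s == subject and r == relation}


def treatments_for_symptom(triples, symptom):
    """Follow two hops: symptom -> disease, then disease -> treatment."""
    diseases = objects(triples, symptom, "symptom of")
    return sorted({t for d in diseases for t in objects(triples, d, "treated by")})


print(treatments_for_symptom(triples, "fever"))  # ['artemisinin', 'oseltamivir']
print(treatments_for_symptom(triples, "cough"))  # ['oseltamivir']
```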
Knowledge graphs enhance search functionality by providing more relevant and accurate results based on the relationships between entities. For instance, a search for “healthy recipes” could yield better results by considering ingredients, nutritional values, and user preferences within the knowledge graph.
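A rough sketch of relationship-aware search, assuming recipes link to ingredients and ingredients link to nutrition tags. The data, the tag names, and the scoring rule (count the "healthy" tags reachable from a recipe, skipping recipes with ingredients the user avoids) are illustrative assumptions, not how any particular search engine ranks results.

```python
# Illustrative graph edges: recipe -> ingredients, ingredient -> nutrition tags.
recipe_ingredients = {
    "lentil soup": ["lentils", "carrot", "onion"],
    "cheese pizza": ["dough", "cheese"],
    "quinoa salad": ["quinoa", "spinach", "feta"],
}
ingredient_tags = {
    "lentils": {"high fiber", "high protein"},
    "carrot": {"high fiber"},
    "spinach": {"high fiber"},
    "quinoa": {"high protein"},
    "cheese": {"high fat"},
    "feta": {"high fat"},
}
HEALTHY_TAGS = {"high fiber", "high protein"}  # assumed meaning of "healthy"


def search_healthy(avoid=()):
    """Rank recipes by the healthy tags reachable through their ingredients."""
    results = []
    for recipe, ingredients in recipe_ingredients.items():
        if any(i in avoid for i in ingredients):
            continue  # respect the user's preferences
        score = sum(len(ingredient_tags.get(i, set()) & HEALTHY_TAGS) for i in ingredients)
        if score > 0:
            results.append((recipe, score))
    return sorted(results, key=lambda r: (-r[1], r[0]))


print(search_healthy())                # [('lentil soup', 3), ('quinoa salad', 2)]
print(search_healthy(avoid={"feta"}))  # [('lentil soup', 3)]
```

A keyword match on "healthy" would find none of these recipes; following recipe-to-ingredient-to-tag relationships is what surfaces them.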
They offer flexibility as they can be tailored to suit the requirements of various applications. For example, a knowledge graph in the financial sector can be customized to handle diverse data types such as market trends, customer profiles, and regulatory information.
Knowledge graphs support inference tasks, enabling the discovery of new relationships and the identification of data inconsistencies. For example, analyzing the connections between weather patterns, crop yields, and agricultural practices could reveal insights for optimizing farming techniques.
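One simple kind of inconsistency check can be sketched as follows, assuming some relations are single-valued (each subject may have at most one object for them). The farm data and the find_conflicts helper are hypothetical; ontology languages such as OWL can declare such relations as functional properties.

```python
# Hypothetical agricultural facts; "field_7" was recorded in two regions by mistake.
triples = [
    ("field_7", "planted with", "wheat"),
    ("field_7", "located in", "Region A"),
    ("field_7", "located in", "Region B"),
    ("field_8", "located in", "Region A"),
    ("field_8", "planted with", "maize"),
    ("field_8", "planted with", "beans"),
]

# Relations assumed to allow only one object per subject.
SINGLE_VALUED = {"located in"}


def find_conflicts(triples, single_valued):
    """Report subjects that have two different objects for a single-valued relation."""
    first_seen = {}
    conflicts = []
    for s, r, o in triples:
        if r not in single_valued:
            continue
        key = (s, r)
        if key not in first_seen:
            first_seen[key] = o
        elif first_seen[key] != o:
            conflicts.append((s, r, first_seen[key], o))
    return conflicts


print(find_conflicts(triples, SINGLE_VALUED))
# [('field_7', 'located in', 'Region A', 'Region B')]
```

Note that field_8 being planted with two crops is not flagged, because "planted with" is not declared single-valued.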
Knowledge graphs can scale to large applications and massive datasets. For instance, a knowledge graph powering smart city infrastructure can integrate diverse data streams from sensors, transportation systems, and public services.
Knowledge graphs have limitations in generalizing entities, particularly when an entity crosses domain boundaries.
Distinguishing between similar entities, such as "Washington" the state and "Washington" the person, can be challenging within knowledge graphs.
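A common way to keep such entities apart is to give every node a unique identifier and store the human-readable name as a property. Here is a minimal sketch; the ent: identifiers and the candidates helper are hypothetical naming choices.

```python
# Each node has a unique ID; the shared label "Washington" is just a property.
entities = {
    "ent:washington_state": {"label": "Washington", "type": "US state"},
    "ent:george_washington": {"label": "Washington", "type": "person"},
    "ent:seattle": {"label": "Seattle", "type": "city"},
}

# Relationships point at IDs, so there is no ambiguity about which Washington is meant.
triples = [("ent:seattle", "located in", "ent:washington_state")]


def candidates(label, expected_type=None):
    """Return IDs of entities with this label, optionally filtered by type."""
    return sorted(
        eid
        for eid, props in entities.items()
        if props["label"] == label
        and (expected_type is None or props["type"] == expected_type)
    )


print(candidates("Washington"))              # ['ent:george_washington', 'ent:washington_state']
print(candidates("Washington", "US state"))  # ['ent:washington_state']
```

This keeps stored relationships unambiguous, but deciding which candidate a raw mention of "Washington" refers to still requires context, which is why the problem remains hard in practice.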
Setting and maintaining boundaries in different scenarios can also be difficult.
Knowledge graphs can become cluttered when relations between entities have long labels, often multi-word phrases.
Keeping track of relations becomes increasingly challenging as the complexity of the graph grows, especially with larger datasets.
Complex knowledge graphs with many types of relations can confuse both humans and machines. In such cases, a relational database might be a better choice than a knowledge graph.
Now, let’s look at some real-world applications of knowledge graphs.
Knowledge graphs capture the context of entities on the web and the relations between them, which lets search engines return more relevant results to users.
Because knowledge graphs organize relevant information and the relationships between entities, they are well suited to chatbot applications, where they support question answering.
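A minimal, template-based sketch of question answering over triples. The question patterns, data, and the answer function are hypothetical; production chatbots typically use more robust language understanding to map questions to graph queries.

```python
import re

# Illustrative facts.
triples = {
    ("Maria", "watched", "Jojo Rabbit"),
    ("Maria", "watched", "Parasite"),
    ("Ahmed", "watched", "Jojo Rabbit"),
}

# Hypothetical templates: each maps a question to a (subject, predicate, object)
# pattern, with None marking the unknown the question asks for.
TEMPLATES = [
    (re.compile(r"What did (?P<s>\w+) watch\?"), lambda m: (m["s"], "watched", None)),
    (re.compile(r"Who watched (?P<o>.+)\?"), lambda m: (None, "watched", m["o"])),
]


def answer(question):
    """Match the question to a template, query the triples, and return the unknowns."""
    for pattern, to_pattern in TEMPLATES:
        match = pattern.fullmatch(question.strip())
        if match is None:
            continue
        s, p, o = to_pattern(match)
        hits = [
            t for t in triples
            if (s is None or t[0] == s) and t[1] == p and (o is None or t[2] == o)
        ]
        # Return the slot the question left open: objects if o is None, else subjects.
        return sorted(t[2] if o is None else t[0] for t in hits)
    return []


print(answer("What did Maria watch?"))     # ['Jojo Rabbit', 'Parasite']
print(answer("Who watched Jojo Rabbit?"))  # ['Ahmed', 'Maria']
```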
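One graph-based fraud signal is a group of accounts connected through shared devices or payment cards. The sketch below finds such groups with a union-find over account and attribute nodes; the data, the relation names, and the threshold of three accounts are hypothetical, and a real system would combine this with many other signals.

```python
from collections import defaultdict

# Hypothetical (account, relation, attribute) edges.
edges = [
    ("acct_1", "uses device", "device_A"),
    ("acct_2", "uses device", "device_A"),
    ("acct_2", "uses card", "card_9"),
    ("acct_3", "uses card", "card_9"),
    ("acct_4", "uses device", "device_B"),
]


def account_rings(edges, min_accounts=3):
    """Group accounts linked through shared attributes (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for account, _, attribute in edges:
        union(account, attribute)

    groups = defaultdict(set)
    for account, _, _ in edges:
        groups[find(account)].add(account)
    return [sorted(g) for g in groups.values() if len(g) >= min_accounts]


print(account_rings(edges))  # [['acct_1', 'acct_2', 'acct_3']]
```

No single account here looks suspicious on its own; the ring only becomes visible through the chain of shared device and card nodes.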
In biomedical research, knowledge graphs can model complex relationships between proteins, genes, and drugs. Therefore, they help in drug development and provide new insights to researchers.
The following Python code shows how to create a knowledge graph from drug testing data.
```python
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

head_node_values = ['exp_drug_A', 'exp_drug_B', 'exp_drug_C',
                    'exp_drug_D', 'exp_drug_A', 'exp_drug_C',
                    'exp_drug_D', 'exp_drug_E',
                    'exp_gene_13', 'exp_gene_2',
                    'exp_gene_666', 'exp_gene_4',
                    'exp_gene_6', 'exp_gene_2',
                    'exp_gene_666', 'exp_gene_4']
relationship_values = ['will treat', 'will treat',
                       'will treat', 'will treat',
                       'will inhibit', 'will inhibit',
                       'will inhibit', 'will inhibit',
                       'is associated with', 'is associated with', 'is associated with',
                       'is associated with', 'is associated with',
                       'interacts with', 'interacts with', 'interacts with']
tail_node_values = ['covid', 'back pain', 'lung cancer', 'headache',
                    'exp_gene_13', 'exp_gene_2', 'exp_gene_4',
                    'exp_gene_20', 'weight gain',
                    'cardiac arrest', 'sore throat', 'bleeding',
                    'brain tumor', 'exp_gene_13',
                    'exp_gene_20', 'exp_gene_6']
educatives_knowledge_graph = nx.Graph()  # empty, undirected graph
data_dictionary = {'head_node': head_node_values, 'relationship': relationship_values, 'tail_node': tail_node_values}
educatives_dataframe = pd.DataFrame(data_dictionary)
print(educatives_dataframe)  # each row is one (head, relationship, tail) triple

for _, my_row in educatives_dataframe.iterrows():
    educatives_knowledge_graph.add_edge(my_row['head_node'], my_row['tail_node'], label=my_row['relationship'])

educative_position = nx.spring_layout(educatives_knowledge_graph, seed=47, k=3.6)  # fixed seed -> same layout every run
plt.figure(figsize=(11, 10))
nx.draw(educatives_knowledge_graph, educative_position, with_labels=True, node_size=666, node_color='green', edge_color='black', alpha=0.9)
nx.draw_networkx_edge_labels(educatives_knowledge_graph, educative_position, edge_labels=nx.get_edge_attributes(educatives_knowledge_graph, 'label'), label_pos=0.5, verticalalignment='baseline')
plt.show()
```
Lines 1–3: First, we import the required packages: pandas to create a data frame from the knowledge graph's data, networkx to create the graph, and matplotlib to display it.
Lines 5–24: We define three lists: head_node_values holds the head nodes, tail_node_values holds the tail nodes, and relationship_values holds the relationship from each head node to its tail node.
Lines 25–31: We create an empty, undirected graph with the Graph class. Then, we create a dictionary whose keys are the column names and whose values are the lists declared on lines 5, 12, and 19. On line 27, we create a data frame from this dictionary and then display our drug testing data. Note that each row in the data frame represents a triple in our knowledge graph: the head connected to the tail via a relationship. Lastly, we iterate over the data frame and add each row to the graph as an edge with the add_edge method, storing the relationship as the edge's label attribute.
Lines 33–37: We use the spring_layout function to compute the positions of the nodes in our graph. The seed argument fixes the random state so that the layout is the same on every run, and k sets the optimal distance between nodes (larger values push them farther apart). We create a new figure with plt.figure() and draw the graph with nx.draw(), passing it the educatives_knowledge_graph graph and educative_position, which stores the node positions. The remaining arguments are self-explanatory, except for alpha, which sets the opacity of the drawing. Using the nx.draw_networkx_edge_labels function, we add the edge labels, which we extract with nx.get_edge_attributes; label_pos=0.5 places each label at the midpoint of its edge, and verticalalignment sets the label's vertical alignment. Finally, plt.show() displays the graph.
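Because nx.Graph is undirected, the drug testing example does not record which node is the head and which is the tail of a relationship. If direction matters, one option (an alternative, not what the example does) is networkx's DiGraph class, sketched here with two of the example's triples.

```python
import networkx as nx

# Two triples from the drug testing data: (head, relationship, tail).
triples = [
    ("exp_drug_A", "will treat", "covid"),
    ("exp_gene_2", "interacts with", "exp_gene_13"),
]

directed_kg = nx.DiGraph()  # edges point from head to tail
for head, relationship, tail in triples:
    directed_kg.add_edge(head, tail, label=relationship)

print(directed_kg.has_edge("exp_drug_A", "covid"))  # True
print(directed_kg.has_edge("covid", "exp_drug_A"))  # False
print(list(directed_kg.successors("exp_gene_2")))   # ['exp_gene_13']
```

Passing a DiGraph to nx.draw renders its edges as arrows, and the same get_edge_attributes and draw_networkx_edge_labels calls work unchanged.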