In this tutorial, we explore how to use the PyBEL ecosystem to build and analyze rich biological knowledge graphs directly in Google Colab. We start by installing all the necessary packages, including PyBEL, NetworkX, Matplotlib, Seaborn, and Pandas. We then demonstrate how to define proteins, processes, and modifications using the PyBEL DSL. From there, we guide you through the creation of an Alzheimer's disease pathway, showing how to encode causal relationships, protein-protein interactions, and phosphorylation events. Beyond graph construction, we introduce advanced network analyses, including centrality measures, node classification, and subgraph extraction, as well as techniques for extracting citation and evidence data. By the end of this tutorial, you will have a fully annotated BEL graph ready for visualization and downstream enrichment analysis, laying a solid foundation for interactive exploration of biological knowledge.
!pip install pybel pybel-tools networkx matplotlib seaborn pandas -q
import pybel
import pybel.dsl as dsl
from pybel import BELGraph
from pybel.io import to_pickle, from_pickle
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from collections import Counter
import warnings
warnings.filterwarnings('ignore')
print("PyBEL Advanced Tutorial: Biological Expression Language Ecosystem")
print("=" * 65)
We start by installing PyBEL and its dependencies directly in Colab, ensuring that all the necessary libraries (NetworkX, Matplotlib, Seaborn, and Pandas) are available for our analysis. Once they are installed, we import the core modules and suppress warnings to keep our notebook clean and focused on the results.
print("\n1. Building a Biological Knowledge Graph")
print("-" * 40)
graph = BELGraph(
    name="Alzheimer's Disease Pathway",
    version="1.0.0",
    description="Example pathway showing protein interactions in AD",
    authors="PyBEL Tutorial"
)
app = dsl.Protein(name="APP", namespace="HGNC")
abeta = dsl.Protein(name="Abeta", namespace="CHEBI")
tau = dsl.Protein(name="MAPT", namespace="HGNC")
gsk3b = dsl.Protein(name="GSK3B", namespace="HGNC")
inflammation = dsl.BiologicalProcess(name="inflammatory response", namespace="GO")
apoptosis = dsl.BiologicalProcess(name="apoptotic process", namespace="GO")
graph.add_increases(app, abeta, citation="PMID:12345678", evidence="APP cleavage produces Abeta")
graph.add_increases(abeta, inflammation, citation="PMID:87654321", evidence="Abeta triggers neuroinflammation")
# Variants must be passed as a list of modification objects.
tau_phosphorylated = dsl.Protein(name="MAPT", namespace="HGNC",
                                 variants=[dsl.ProteinModification("Ph")])
graph.add_increases(gsk3b, tau_phosphorylated, citation="PMID:11111111", evidence="GSK3B phosphorylates tau")
graph.add_increases(tau_phosphorylated, apoptosis, citation="PMID:22222222", evidence="Hyperphosphorylated tau causes cell death")
graph.add_increases(inflammation, apoptosis, citation="PMID:33333333", evidence="Inflammation promotes apoptosis")
graph.add_association(abeta, tau, citation="PMID:44444444", evidence="Abeta and tau interact synergistically")
print(f"Created BEL graph with {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges")
We initialize a BELGraph with metadata for an Alzheimer's disease pathway and define proteins and processes using the PyBEL DSL. By adding causal relationships, protein modifications, and associations, we build a small network that captures key molecular interactions.
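To make the mapping between the DSL objects and BEL text more concrete, here is a minimal sketch that formats two of the statements above as BEL-style strings. The helpers `protein`, `bioprocess`, and `statement` are hypothetical and are not part of PyBEL; the output approximates BEL syntax, and PyBEL's own serialization may differ in detail.

```python
# Hypothetical helpers (not part of PyBEL) that format BEL-style terms as text.

def protein(namespace, name, pmod=None):
    """Return a BEL-style protein term, optionally with a protein modification."""
    if pmod is None:
        return f"p({namespace}:{name})"
    return f"p({namespace}:{name}, pmod({pmod}))"


def bioprocess(namespace, name):
    """Return a BEL-style biological-process term; the name is quoted."""
    return f'bp({namespace}:"{name}")'


def statement(subject, relation, obj):
    """Join subject, relation, and object into one BEL-style statement."""
    return f"{subject} {relation} {obj}"


tau_ph = protein("HGNC", "MAPT", pmod="Ph")
statements = [
    statement(protein("HGNC", "GSK3B"), "increases", tau_ph),
    statement(tau_ph, "increases", bioprocess("GO", "apoptotic process")),
]
for line in statements:
    print(line)
```

Reading the edges as subject, relation, object triples like this can make it easier to check that each `add_increases` call encodes the intended direction of causality.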
print("\n2. Advanced Network Analysis")
print("-" * 30)
degree_centrality = nx.degree_centrality(graph)
betweenness_centrality = nx.betweenness_centrality(graph)
closeness_centrality = nx.closeness_centrality(graph)
most_central = max(degree_centrality, key=degree_centrality.get)
print(f"Most connected node: {most_central}")
print(f"Degree centrality: {degree_centrality[most_central]:.3f}")
We calculate degree, betweenness, and closeness centrality to quantify the importance of each node in the graph. Identifying the most connected node gives us a first view of potential hubs that may drive disease mechanisms.
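To show what degree centrality measures, the following sketch computes it by hand with the standard library only. It assumes a simplified edge list in which each statement is a single edge and uses plain string labels such as `"MAPT-Ph"` (illustrative names, not PyBEL nodes). PyBEL may add extra nodes and edges internally, so the values it reports can differ.

```python
from collections import Counter

# Simplified edge list for the tutorial pathway (assumption: one edge per
# statement; the labels are illustrative strings, not PyBEL node objects).
edges = [
    ("APP", "Abeta"),
    ("Abeta", "inflammatory response"),
    ("GSK3B", "MAPT-Ph"),
    ("MAPT-Ph", "apoptotic process"),
    ("inflammatory response", "apoptotic process"),
    ("Abeta", "MAPT"),
]

# Count every edge endpoint, so degree = in-degree + out-degree.
degree = Counter()
for source, target in edges:
    degree[source] += 1
    degree[target] += 1

# Normalize by n - 1, the largest number of distinct neighbors a node can have.
n = len(degree)
centrality = {node: count / (n - 1) for node, count in degree.items()}

hub = max(centrality, key=centrality.get)
print(hub, round(centrality[hub], 3))
```

Under these simplifying assumptions, Abeta touches three of the six edges and comes out as the most connected node.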
print("\n3. Biological Entity Classification")
print("-" * 35)
node_types = Counter()
for node in graph.nodes():
    node_types[node.function] += 1
print("Node distribution:")
for func, count in node_types.items():
    print(f" {func}: {count}")
We classify each node by its function, such as protein or biological process, and count how many nodes fall into each class. This breakdown helps us understand the composition of our network at a glance.
print("\n4. Pathway Analysis")
print("-" * 20)
# Use list comprehensions so that len() works on the results.
proteins = [node for node in graph.nodes() if node.function == 'Protein']
processes = [node for node in graph.nodes() if node.function == 'BiologicalProcess']
print(f"Proteins in pathway: {len(proteins)}")
print(f"Biological processes: {len(processes)}")
edge_types = Counter()
for u, v, data in graph.edges(data=True):
    edge_types[data.get('relation')] += 1
print("\nRelationship types:")
for rel, count in edge_types.items():
    print(f" {rel}: {count}")
We separate the proteins from the biological processes to gauge the scope and complexity of the pathway. Counting the different relationship types also reveals which interactions, such as increases or associations, dominate our model.
print("\n5. Literature Evidence Analysis")
print("-" * 32)
citations = []
evidences = []
for _, _, data in graph.edges(data=True):
    if 'citation' in data:
        citations.append(data['citation'])
    if 'evidence' in data:
        evidences.append(data['evidence'])
print(f"Total citations: {len(citations)}")
print(f"Unique citations: {len(set(citations))}")
print(f"Evidence statements: {len(evidences)}")
We extract citation identifiers and evidence strings from each edge to assess how well our graph is grounded in published research. Summarizing total and unique citations lets us gauge the breadth of the supporting literature.
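A natural follow-up is to group evidence strings by their citation, which shows how many statements rest on each source. The sketch below uses only the standard library and a hypothetical list of edge annotation dictionaries (`edge_data`) with plain string citations. It assumes the same `citation` and `evidence` keys as the loop above, but the records themselves are made up for illustration.

```python
from collections import defaultdict

# Hypothetical edge annotations for illustration; the PMIDs are placeholders.
edge_data = [
    {"citation": "PMID:12345678", "evidence": "APP cleavage produces Abeta"},
    {"citation": "PMID:87654321", "evidence": "Abeta triggers neuroinflammation"},
    {"citation": "PMID:12345678", "evidence": "A second statement from the same paper"},
    {"evidence": "An edge with no citation recorded"},
]

evidence_by_citation = defaultdict(list)
uncited = []
for data in edge_data:
    evidence = data.get("evidence")
    if evidence is None:
        continue  # nothing to record for this edge
    if "citation" in data:
        evidence_by_citation[data["citation"]].append(evidence)
    else:
        uncited.append(evidence)

# Report the sources in a stable order, with the number of statements each supports.
for citation in sorted(evidence_by_citation):
    print(citation, len(evidence_by_citation[citation]))
print("Uncited statements:", len(uncited))
```

Tracking uncited statements separately makes gaps in curation visible instead of silently dropping them from the totals.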
print("\n6. Subgraph Analysis")
print("-" * 22)
inflammation_nodes = [inflammation]
inflammation_neighbors = list(graph.predecessors(inflammation)) + list(graph.successors(inflammation))
inflammation_subgraph = graph.subgraph(inflammation_nodes + inflammation_neighbors)
print(f"Inflammation subgraph: {inflammation_subgraph.number_of_nodes()} nodes, {inflammation_subgraph.number_of_edges()} edges")
We isolate the inflammation subgraph by collecting the node's direct neighbors, giving a focused view of inflammatory crosstalk. This subnetwork highlights how inflammation interfaces with other pathological processes.
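The neighborhood extraction can also be sketched without any graph library, which makes the meaning of an induced subgraph explicit: only edges whose two endpoints are both in the selected node set are kept. The edge list and string labels below are simplified assumptions, not the PyBEL graph itself.

```python
# Simplified directed edge list (assumption: illustrative string labels).
edges = [
    ("APP", "Abeta"),
    ("Abeta", "inflammatory response"),
    ("GSK3B", "MAPT-Ph"),
    ("MAPT-Ph", "apoptotic process"),
    ("inflammatory response", "apoptotic process"),
    ("Abeta", "MAPT"),
]
focus = "inflammatory response"

# Direct neighbors: nodes with an edge into the focus node or out of it.
predecessors = {source for source, target in edges if target == focus}
successors = {target for source, target in edges if source == focus}
selected = predecessors | successors | {focus}

# Induced subgraph: keep only edges with both endpoints in the selection.
sub_edges = [(s, t) for s, t in edges if s in selected and t in selected]

print(sorted(selected))
print(sub_edges)
```

Note that the edge from `MAPT-Ph` to `apoptotic process` is dropped, because `MAPT-Ph` is not a direct neighbor of the focus node.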
print("\n7. Advanced Graph Querying")
print("-" * 28)
try:
    paths = list(nx.all_simple_paths(graph, app, apoptosis, cutoff=3))
    print(f"Paths from APP to apoptosis: {len(paths)}")
    if paths:
        # Paths are not guaranteed to come back in length order, so take the minimum.
        print(f"Shortest path length: {min(len(path) for path in paths) - 1}")
except nx.NetworkXNoPath:
    print("No paths found between APP and apoptosis")
apoptosis_inducers = list(graph.predecessors(apoptosis))
print(f"Factors that increase apoptosis: {len(apoptosis_inducers)}")
We enumerate simple paths between APP and apoptosis to explore mechanistic routes and identify key intermediates. Listing all predecessors of apoptosis also shows us which factors can trigger cell death.
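For intuition about what a cutoff-limited simple-path search does, here is a small depth-first sketch in plain Python. The function `simple_paths` is a hypothetical helper written for this illustration, not NetworkX's implementation, and the adjacency dictionary is a simplified, directed view of the pathway with illustrative string labels.

```python
def simple_paths(adjacency, source, target, cutoff):
    """Yield paths from source to target with at most `cutoff` edges and no repeated node."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 >= cutoff:
            continue  # extending this path would exceed the edge limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in path:  # a simple path never revisits a node
                stack.append((neighbor, path + [neighbor]))


# Simplified directed adjacency (assumption: illustrative string labels).
adjacency = {
    "APP": ["Abeta"],
    "Abeta": ["inflammatory response", "MAPT"],
    "GSK3B": ["MAPT-Ph"],
    "MAPT-Ph": ["apoptotic process"],
    "inflammatory response": ["apoptotic process"],
}

paths = list(simple_paths(adjacency, "APP", "apoptotic process", cutoff=3))
for path in paths:
    print(" -> ".join(path), f"({len(path) - 1} edges)")
```

In this simplified view, the only route from APP to apoptosis runs through Abeta and inflammation and uses exactly three edges, so lowering the cutoff to 2 would return no paths.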
print("\n8. Data Export and Visualization")
print("-" * 35)
adj_matrix = nx.adjacency_matrix(graph)
node_labels = [str(node) for node in graph.nodes()]
plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
pos = nx.spring_layout(graph, k=2, iterations=50)
nx.draw(graph, pos, with_labels=False, node_color="lightblue",
        node_size=1000, font_size=8, font_weight="bold")
plt.title("BEL Network Graph")
plt.subplot(2, 2, 2)
centralities = list(degree_centrality.values())
plt.hist(centralities, bins=10, alpha=0.7, color="green")
plt.title("Degree Centrality Distribution")
plt.xlabel("Centrality")
plt.ylabel("Frequency")
plt.subplot(2, 2, 3)
functions = list(node_types.keys())
counts = list(node_types.values())
plt.pie(counts, labels=functions, autopct="%1.1f%%", startangle=90)
plt.title("Node Type Distribution")
plt.subplot(2, 2, 4)
relations = list(edge_types.keys())
rel_counts = list(edge_types.values())
plt.bar(relations, rel_counts, color="orange", alpha=0.7)
plt.title("Relationship Types")
plt.xlabel("Relation")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
We prepare the adjacency matrix and node labels for downstream use and generate a multi-panel figure showing the network structure, the degree centrality distribution, the proportions of node types, and the counts of edge types. These visualizations bring our BEL graph to life and support deeper biological interpretation.
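The adjacency matrix and node labels are computed above but not written anywhere. As one possible way to use them downstream, the sketch below builds a labeled adjacency matrix from a simplified edge list and serializes it as CSV text using only the standard library. The node names, the edge list, and the choice of CSV are assumptions made for illustration.

```python
import csv
import io

# Simplified pathway fragment (assumption: illustrative string labels).
nodes = ["APP", "Abeta", "inflammatory response", "apoptotic process"]
edges = [
    ("APP", "Abeta"),
    ("Abeta", "inflammatory response"),
    ("inflammatory response", "apoptotic process"),
]

# Row i, column j counts the edges from nodes[i] to nodes[j].
index = {node: i for i, node in enumerate(nodes)}
matrix = [[0] * len(nodes) for _ in nodes]
for source, target in edges:
    matrix[index[source]][index[target]] += 1

# Write a header row of column labels, then one labeled row per node.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([""] + nodes)
for node, row in zip(nodes, matrix):
    writer.writerow([node] + row)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the labels alongside the matrix matters because a bare matrix loses the link between row positions and biological entities.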
In this tutorial, we have demonstrated the power and flexibility of PyBEL for modeling complex biological systems. We have shown how easily we can build a curated, white-box graph of Alzheimer's disease interactions, run network-level analyses to identify key hub nodes, and extract biologically meaningful subgraphs for focused study. We have also covered essential practices for mining literature evidence and preparing data structures for clear visualizations. As a next step, we encourage you to extend this framework to your own pathways, integrate additional omics data, run enrichment tests, or couple the graph with machine learning workflows.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, Sana brings a fresh perspective to the intersection of AI and real-life solutions.
