Wednesday, November 26, 2025

HITS for Authority and Hub identification

The HITS (Hyperlink-Induced Topic Search) algorithm makes use of the mutual reinforcing relationship between authorities and hubs to evaluate and rank a set of linked entities. It assigns ranking scores to the vertices, aimed to assess the quality of information and references in linked structures.  Instructively, a node with large in-degree is viewed as an authority. If a node points to a considerable number of authoritative nodes, it is referred to as a hub.

 We will use the following Social Network to demonstrate the HITS algorithm

Here we have State official, citizens and journalists in a State. 

As illustrated in the graph above, State Officials represent good authorities, while Journalists represent good hubs. Observe that Hubs and Authorities exhibit a mutually reinforcing relationship: a good hub points to many good authorities; a good authority is pointed to by many good hubs.


Let's start by creating sample data:



create table users (username varchar(20));
create table follows (follower varchar(20), followee varchar(20));

insert into users values 
  ('governor')
  , ('mayor')
  , ('senator')
  , ('journalist_1')
  , ('journalist_2')
  , ('journalist_3')
  , ('g')
  , ('h')
  , ('i')
  , ('j')
  , ('k')
  , ('l')
  , ('m')
  , ('n')
  , ('o')
  , ('p')
  , ('q')
  , ('r')
  , ('s')
  , ('t')
  , ('u')
  , ('v')
  , ('w')
  , ('x')
  , ('y')
  , ('z');


insert into follows values 
  ('z', 'governor')
  , ('y', 'governor')  
  , ('x', 'governor')  
  , ('w', 'governor')    
  , ('v', 'governor')  
  , ('journalist_1', 'governor')  
  , ('journalist_2', 'governor')  

  , ('u', 'mayor')  
  , ('t', 'mayor')  
  , ('s', 'mayor')        
  , ('r', 'mayor')
  , ('q', 'mayor')
  , ('journalist_1', 'mayor')        
  , ('journalist_2', 'mayor')        
  , ('journalist_3', 'mayor')        

  , ('p', 'senator')        
  , ('o', 'senator')
  , ('n', 'senator')
  , ('m', 'senator')
  , ('l', 'senator')
  , ('journalist_2', 'senator')
  , ('journalist_3', 'senator');                 


Next create a PROPERTY GRAPH to represet the Following relationship in this Social Graph

CREATE OR REPLACE PROPERTY GRAPH social_network
  VERTEX TABLES (
    users KEY (username) PROPERTIES ARE ALL COLUMNS
  )
  EDGE TABLES ( 
    follows
      key (follower, followee) 
      SOURCE KEY (follower) REFERENCES users(username)
      DESTINATION KEY(followee) REFERENCES users(username)
      PROPERTIES ARE ALL COLUMNS
    
);
We can query this PROPERTY GRAPH using the following SQL/PGQ

SELECT distinct 
   follower
   , followee
   , v1
   , v2
   , e1
FROM GRAPH_TABLE (
       social_network
       MATCH   (follower  IS users) - [e is follows] -> (followee is users)
       COLUMNS (
               follower.username  as follower
               , followee.username  as followee
               , vertex_id(follower) as v1
               , vertex_id(followee) as v2
               , edge_id(e) as e1

        )
);
Next we will run the HITS algorithm on this PROPERTY GRAPH

%python-pgx
graph = session.read_graph_by_name("SOCIAL_NETWORK", "pg_sql")

hits = analyst.hits(graph, auth='authorities', hubs='hubs')

result_set = graph.query_pgql(

    "SELECT x.USERNAME, x.authorities, x.hubs "

    "MATCH (x) ORDER BY x.authorities DESC, x.hubs DESC")

result_set.print()


+---------------------------------------------------------+
| x.USERNAME   | x.authorities      | x.hubs              |
+---------------------------------------------------------+
| mayor        | 0.7071067811865475 | 0.0                 |
| senator      | 0.5000000000000001 | 0.0                 |
| governor     | 0.5000000000000001 | 0.0                 |
| journalist_2 | 0.0                | 0.5187737569752051  |
| journalist_1 | 0.0                | 0.3668284414587895  |
| journalist_3 | 0.0                | 0.3668284414587895  |
| u            | 0.0                | 0.21488312594237396 |
| t            | 0.0                | 0.21488312594237396 |
| s            | 0.0                | 0.21488312594237396 |
| r            | 0.0                | 0.21488312594237396 |
| q            | 0.0                | 0.21488312594237396 |
| l            | 0.0                | 0.1519453155164156  |
| m            | 0.0                | 0.1519453155164156  |
| n            | 0.0                | 0.1519453155164156  |
| o            | 0.0                | 0.1519453155164156  |
| p            | 0.0                | 0.1519453155164156  |
| v            | 0.0                | 0.1519453155164156  |
| w            | 0.0                | 0.1519453155164156  |
| x            | 0.0                | 0.1519453155164156  |
| y            | 0.0                | 0.1519453155164156  |
| z            | 0.0                | 0.1519453155164156  |
| k            | 0.0                | 0.0                 |
| j            | 0.0                | 0.0                 |
| i            | 0.0                | 0.0                 |
| h            | 0.0                | 0.0                 |
| g            | 0.0                | 0.0                 |
+---------------------------------------------------------+
 
Note: Nodes with no in-links are assigned an authority weight of 0, while nodes with no out-links are assigned a hub weight of 0. State Officials get Authority Rankings because many Hubs point to them. Where Journalist get high Hub Ranking as they point to many Hubs.

Eigenvector Centrality Algorithm

Eigenvector centrality is an established measure of global connectivity, from which the importance and influence of nodes can be inferred. Eigenvector centrality quantifies a node's influence within a graph. A node's importance is determined by its neighbors—it is both influenced by them and exerts influence on them. However, not all connections are equal; a node's centrality increases if it is connected to other highly influential nodes. In short, Eigenvector centrality gets the centrality of the vertices in an intricate way using neighbors, allowing to find well-connected vertices.

  


 Let's start by creating some sample data:

 


    create table website (website_url varchar(50));
    create table links(link_from varchar(50), link_to varchar(50), weight float);

    insert into website values ('web1')
    , ('web2')
    , ('web3')
    , ('web4')
    , ('web5')
    , ('web6');

    insert into links values ('web1', 'web1', .2),
        ('web1', 'web2', .3),
        ('web2', 'web3', .4),
        ('web3', 'web1', .4),
        ('web3', 'web2', .4),
        ('web3', 'web4', .4),
        ('web3', 'web5', .4),
        ('web5', 'web3', .4),
        ('web6', 'web6', .4);

Next create a PROPERTY GRAPH to represent Web Graph


CREATE OR REPLACE PROPERTY GRAPH web_graph
  VERTEX TABLES (
    website KEY (website_url) PROPERTIES ARE ALL COLUMNS
  )
  EDGE TABLES ( 
    links
      key (link_from, link_to) 
      SOURCE KEY (link_from) REFERENCES website(website_url)
      DESTINATION KEY(link_to) REFERENCES website(website_url)
      PROPERTIES ARE ALL COLUMNS
    
);
You can use the following SQL/PGQ to query the above PROPERTY GRAPH

SELECT distinct 
   website_a
   , website_b
   , v1
   , v2
   , e1
FROM GRAPH_TABLE (
       web_graph
       MATCH   (link_from  IS website) - [e is links] -> (link_to is website)
       COLUMNS (
               link_from.website_url  as website_a
               , link_to.website_url  as website_b
               --, e_path.relationship as e_path_relationship
               , vertex_id(link_from) as v1
               , vertex_id(link_to) as v2
               , edge_id(e) as e1

        )
);
Finally run eigenvector_centrality algorithm on the above PROPERTY GRAPH

%python-pgx


print(session)

# Write your codeprint(session)
graph = session.read_graph_by_name("WEB_GRAPH", "pg_sql")
print(graph)
analyst.pagerank(graph)
analyst.degree_centrality(graph)
analyst.in_degree_centrality(graph)
analyst.out_degree_centrality(graph)
analyst.vertex_betweenness_centrality(graph)
analyst.local_clustering_coefficient(graph)
analyst.eigenvector_centrality(graph)
analyst.communities_label_propagation(graph, 100, 'label_propagation')

print(cycle)
graph.query_pgql("""
SELECT 
  a.website_url
  , a.eigenvector
  , a.label_propagation
FROM MATCH (a)
""").print()

Output:

PgxSession(id: 5ef30734-4ebb-4f22-87fe-457c3e5ccc84, name: ADMIN)
PgxGraph(name: WEB_GRAPH, v: 6, e: 9, directed: True, memory(Mb): 0)
PgxPath(graph: WEB_GRAPH, src: PgxVertex[provider=WEBSITE,key=web1,ID=WEBSITE(web1)], dst: PgxVertex[provider=WEBSITE,key=web1,ID=WEBSITE(web1)], num. edges: 1 cost: 1.0)
+---------------------------------------------------------+
| website_url | eigenvector           | label_propagation |
+---------------------------------------------------------+
| web1        | 0.24693143869692635   | 0                 |
| web2        | 0.19814795883595723   | 1                 |
| web3        | 0.3567709398214408    | 1                 |
| web4        | 0.0                   | 1                 |
| web5        | 0.19814795883595723   | 1                 |
| web6        | 1.7038097185306344E-6 | 2                 |
+---------------------------------------------------------+

Tuesday, November 25, 2025

Louvain for Community detection

Louvain algorithm is a widely used and and well-regarded method for community detection in graphs. The algorithm begins with a singleton partition, where each node belongs to its own community. It then proceeds iteratively through multiple passes, each consisting of two distinct phases.

We will use the Louvain Algorithm in Oracle Graph Server to identify communities in the following Mentoring relationship graph

 

Let's start by generating some sample data:

 

 



create table users (username varchar(10));
create table mentors (mentor varchar(10), mentee varchar(10), weight float);

insert into users values 
  ('a')
  , ('b')
  , ('c')
  , ('d')
  , ('e')
  , ('f')
  , ('g')
  , ('h')
  , ('i')
  , ('j')
  , ('k')
  , ('l')
  , ('m')
  , ('n')

  ;


insert into mentors values 
  ('a', 'b', 0.3)
  , ('a', 'c', 0.4)  
  , ('a', 'd', 0.1)  
  , ('a', 'e', 0.4)    
  , ('b', 'g', 0.5)  
  , ('g', 'f', 0.2)  
  , ('f', 'a', 0.5)  
  , ('f', 'h', 0.2)  
  , ('f', 'i', 0.1)  
  , ('f', 'j', 0.5)        
  , ('f', 'k', 0.9)
  , ('k', 'a', 0.1)        
  , ('k', 'm', 0.2)        
  , ('k', 'n', 0.6)
  , ('k', 'l', 0.3);                 


Next create a PROPERTY GRAPH to reflect the Mentor -> Mentee relationships
CREATE OR REPLACE PROPERTY GRAPH social_network
  VERTEX TABLES (
    users KEY (username) PROPERTIES ARE ALL COLUMNS
  )
  EDGE TABLES ( 
    mentors
      key (mentor, mentee) 
      SOURCE KEY (mentor) REFERENCES users(username)
      DESTINATION KEY(mentee) REFERENCES users(username)
      PROPERTIES ARE ALL COLUMNS
    
);

Now let's run the Louvain algorithm on this Property Graph 

%python-pgx

# Write your code here - e.g.: graph = session.read_graph_with_prgraph = ....
graph = session.read_graph_by_name("SOCIAL_NETWORK", "pg_sql")

weight = graph.get_edge_property(name="WEIGHT")

print(weight)

partition = analyst.louvain(graph, weight,  max_iter=100, nbr_pass=3, tol=0.0001, community='community')

result_set = graph.query_pgql(

    "SELECT x.USERNAME, x.community MATCH (x) ORDER BY x.community DESC")

result_set.print()
print(partition)

session.close()

Output:

EdgeProperty(name: WEIGHT, type: double, graph: SOCIAL_NETWORK)
+--------------------------+
| x.USERNAME | x.community |
+--------------------------+
| i          | 10          |
| j          | 10          |
| k          | 10          |
| l          | 10          |
| m          | 10          |
| n          | 10          |
| f          | 10          |
| h          | 10          |
| a          | 4           |
| c          | 4           |
| d          | 4           |
| e          | 4           |
| b          | 1           |
| g          | 1           |
+--------------------------+

PgxPartition(graph: SOCIAL_NETWORK, components: 14)
 
 
 

Wednesday, November 12, 2025

Degree Centrality in Oracle Graph (SQL/PGQ)

Betweenness centrality measures the likelihood of a node being on the shortest paths between any two other nodes. This metric effectively identifies "bridge" nodes that facilitate connectivity between different parts of a graph. This algorithm assigns a score to each node- higher scores indicating nodes that exert greater influence over the flow and connectivity of the network.

 In this blogpost we will show examples of running the Betweenness Centrality algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide in how to make use of the algorithm in a real setting. We will do this on a small social network graph of a handful nodes connected in a particular pattern. The example graph looks like this:


 

We will run the Betweenness Centrality algorithm on this node and compute the Node. Can you guess which node will score the highest. Yes, in this case it is easy to visually inspect and guestimate. However in a large graph this will not be possible without the use of some algorithm.

Let's start by generating some sample data:

 



create table users (user_id int, user_name varchar(256));
create table knows (user_id_from int, user_id_to int, relationship varchar(256));

insert into users values (1, 'Fatima')
  , (2, 'Uroosa')
  , (3, 'Hannah')
  , (4, 'Maryam')
  , (5, 'Zainab')
  , (6, 'Urwa')
  , (7, 'Zoha');


insert into knows values (1, 2, 'knows')
  , (1, 5, 'knows')
  , (3, 1, 'knows')
  , (4, 3, 'knows')
  , (4, 6, 'knows')
  , (6, 5, 'knows')
;


Next create a Property Graph to define the relationships in the Social Network


CREATE OR REPLACE PROPERTY GRAPH social_graph
  VERTEX TABLES (
    users KEY (user_id) PROPERTIES ARE ALL COLUMNS
  )
  EDGE TABLES ( 
    knows
      key (user_id_from, user_id_to) 
      SOURCE KEY (user_id_from) REFERENCES users(user_id)
      DESTINATION KEY(user_id_to) REFERENCES users(user_id)
      PROPERTIES ARE ALL COLUMNS
    
);


We can query the the Property Graph as follows:



SELECT distinct 
   *
FROM GRAPH_TABLE (
       social_graph
       MATCH   (user_id_from  IS users) - [e_path is knows] -> (user_id_to is users)
       COLUMNS (
               user_id_from.user_name  as user_name_a
               , user_id_to.user_name  as user_name_b
               , vertex_id(user_id_from) as v1
               , vertex_id(user_id_to) as v2
               , edge_id(e_path) as e1

        )
) --as recommendations
;



   Next let's run the Betweenness Centrality on this Network Graph


print(session)
graph = session.read_graph_by_name("SOCIAL_GRAPH", "pg_sql")
print(graph)
analyst.vertex_betweenness_centrality(graph)
graph.query_pgql("""
SELECT 
  a.user_name
  , a.betwenness
FROM MATCH (a)
""").print()


+-------------------------+
| user_name | betweenness |
+-------------------------+
| Fatima    | 3.0         |
| Uroosa    | 0.0         |
| Hannah    | 2.0         |
| Maryam    | 0.0         |
| Zainab    | 0.0         |
| Urwa      | 1.0         |
| Zoha      | 0.0         |
+-------------------------+
 

We note that the 'Fatima' node has the highest score, followed by the 'Hannah' node. Studying the example graph we can see that these nodes are in bottleneck positions in the graph. The 'Fatima' node connects the 'Uroosa' nodes to all other nodes, which increases its score. In particular, the shortest path from 'Uroosa' to any other reachable node passes through 'Fatima'. 

Conversely, there are no shortest paths that pass through either of the nodes 'Uroosa', 'Maryam', or 'Zainab' which causes their betweenness centrality score to be zero.

 

Tuesday, October 28, 2025

Pagerank in Oracle Graph (SQL/PGQ)

Oracle 26ai has support for running advance Graphs Algorithms like Pagerank on Property Graphs (SQL/PGQ) stored the in database. While Graphs represent relationships, Graph algorithms provide additional features that quantitatively capture how an entity is connected to other entities.

This blog is intended to explore the Graph Algorithms and how to use then Oracle 26ai. In this post we will look at the Pagerank Algorithm.

When analyzing Graphs, we are often interested in what is the value of a vertex (node) in relation to other vertices? This is where the Centrality, PageRank, Betweenness Centrality measures become useful.  In pagerank, we measure importance of nodes by incoming edges, importance of neighboring nodes.

Pagerank Algorithm:

  1. Give the initial rank to the starting node only
  2. Each node gets ranks from incoming edges, and distribute own ranks to outgoing edges
  3. Iterate until the ranks are settled  

Let's start with a simple example where have a Beneficial Ownership graph of some fictitious entities:

Let's first create some sample data and a Property Graph that captures the Beneficial Ownership relationship.

 


create table entity(entity_name varchar(80));
create table ownership(entity_name varchar(80), owner_name varchar(80), percentage float);


insert into entity values 
  ('Uroosa'), ('Fatima'), ('Zainab')
  , ('Maryam'), ('Hannah'), ('Acme Corp.')
  , ('Northwind'), ('Fourth Coffee'), ('Tasmanian Traders')
  , ('Blue Yonder'), ('Contoso'), ('Hotel Kitchen Sink'), ('Coffee Corp.');
insert into OWNERSHIP values 
  ('Acme Corp.', 'Uroosa', 0.80)
  , ('Acme Corp.', 'Fatima', 0.20)
  , ('Coffee Corp.', 'Maryam', 1)
  , ('Hotel Kitchen Sink', 'Hannah', 0.25)
  , ('Hotel Kitchen Sink', 'Uroosa', 0.15)
  , ('Fourth Coffee', 'Zainab', 0.19)
  , ('Fourth Coffee', 'Acme Corp.', 0.30)
  , ('Hotel Kitchen Sink', 'Northwind', 0.09)
  , ('Fourth Coffee', 'Blue Yonder', 0.51)
  , ('Hotel Kitchen Sink', 'Fourth Coffee', 0.51) 
  , ('Blue Yonder', 'Coffee Corp.', 0.51)
  , ('Blue Yonder', 'Tasmanian Traders', 0.49);

Next create a Property Graph to capture the Beneficial Ownership relationships. In CREATE PROPERTY GRAPH statement VERTEX TABLES clause and EDGE TABLES clause list the source tables. This creates a graph with materialized vertices and edges.
CREATE OR REPLACE PROPERTY GRAPH beneficial_owner_graph
  VERTEX TABLES (
    entity KEY ( entity_name ) PROPERTIES ARE ALL COLUMNS
)
  EDGE TABLES (
   ownership AS ownership_percentage
     key (entity_name, owner_name)
     SOURCE KEY ( owner_name ) REFERENCES entity( entity_name )
     DESTINATION KEY ( entity_name ) REFERENCES entity( entity_name )
     PROPERTIES ARE ALL COLUMNS
);
  

Next run the pagerank algorithm on this graph.


%python-pgx

print(session)
graph = session.read_graph_by_name("BENEFICIAL_OWNER_GRAPH", "pg_sql")
print(graph)
analyst.pagerank(graph)
analyst.degree_centrality(graph)
analyst.in_degree_centrality(graph)
analyst.out_degree_centrality(graph)
analyst.vertex_betweenness_centrality(graph)

After running algorithms, the results are stored into the graph, e.g. each node gets a new property "pagerank". Users can access the results using pgql:

graph.query_pgql("""
SELECT 
 a.degree
 , a.in_degree
 , a.out_degree
 , a.betweenness
FROM MATCH (a)
ORDER BY a.pagerank DESC
LIMIT 10
""").print()

 Here is the output from the Pagerank algorithm: 


PgxSession(id: 9a0e419f-733f-415a-a8a7-388aa2f48d4a, name: ADMIN)
PgxGraph(name: BENEFICIAL_OWNER_GRAPH_6, v: 13, e: 12, directed: True, memory(Mb): 0)
+-------------------------------------------------------------------------------------------+
| ENTITY_NAME        | pagerank             | degree | in_degree | out_degree | betweenness |
+-------------------------------------------------------------------------------------------+
| Hotel Kitchen Sink | 0.10169935096153847  | 4      | 4         | 0          | 0.0         |
| Fourth Coffee      | 0.07722548076923078  | 4      | 3         | 1          | 7.0         |
| Blue Yonder        | 0.03949038461538462  | 3      | 2         | 1          | 6.0         |
| Acme Corp.         | 0.026250000000000006 | 3      | 2         | 1          | 3.0         |
| Coffee Corp.       | 0.021346153846153848 | 2      | 1         | 1          | 3.0         |
| Northwind          | 0.01153846153846154  | 1      | 0         | 1          | 0.0         |
| Zainab             | 0.01153846153846154  | 1      | 0         | 1          | 0.0         |
| Hannah             | 0.01153846153846154  | 1      | 0         | 1          | 0.0         |
| Tasmanian Traders  | 0.01153846153846154  | 1      | 0         | 1          | 0.0         |
| Maryam             | 0.01153846153846154  | 1      | 0         | 1          | 0.0         |
+-------------------------------------------------------------------------------------------+

Each vertex has a pagerank value between 0 and 1. Higher the value, more important the node. This is additional information about your data. It is explicilitly not present in the data, but derived from how data is connected.

Monday, October 20, 2025

Dark Mode for Oracle Graph Studio Notebooks?

 Is there a way to enable Dark Mode for Oracle Graph Studio Notebooks?

 

Support for running Graph Algorithm on the database server in Oracle 26ai

I was really hoping that Oracle 26ai would bring the support for running Graph Algorithms on the Oracle Database server as part of the SQL/PGQ support. This did not happen. We still need the Oracle Graph Server to run the Graph Algorithms on Property Graphs store in Oracle 26ai.

Hoping this comes soon. Running Graph Algorithms on a Graph Server while the data resides on Oracle 26ai is rather cumbersome.