alibaba/GraphScope

[BUG] Running clustering app on graphscope.nx is slower than networkx over p2p dataset

Open

#2934 opened on June 26, 2023

Labels: component:networkx, good first issue, performance

Description

  import os
  import time

  import graphscope.nx as gs_nx
  import networkx as nx

  # Load the p2p-31 edge list with networkx and time it.
  start = time.time()
  g1 = nx.read_edgelist(
      os.path.expandvars('./p2p-31.e'),
      nodetype=int,
      data=False,
      create_using=nx.Graph
  )
  print(type(g1))  # networkx.classes.graph.Graph
  print("networkx = ", time.time() - start)

  # Load the same edge list with graphscope.nx and time it.
  start = time.time()
  g2 = gs_nx.read_edgelist(
      os.path.expandvars('./p2p-31.e'),
      nodetype=int,
      data=False,
      create_using=gs_nx.Graph
  )
  print(type(g2))
  print("gs = ", time.time() - start)

  # Time the clustering app on networkx.
  start = time.time()
  ret_nx = nx.clustering(g1)
  print("networkx = ", time.time() - start)
  # 0.91s

  # Time the clustering app on graphscope.nx.
  start = time.time()
  ret_gs = gs_nx.clustering(g2)
  print("gs = ", time.time() - start)
  # 2.12s

  # compare the results
  print(ret_gs == ret_nx)

In addition, our blog reports that graphscope.nx is over 25x faster than networkx on the Twitter dataset, but on my testbed graphscope.nx is only about 7x faster than networkx.
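Single-shot timings like the ones above can be noisy, and they cannot tell a one-time startup cost apart from steady-state runtime. A minimal sketch of a repeated-measurement helper follows, using only the Python standard library. `time_call` and `summarize` are hypothetical names introduced here for illustration; they are not part of networkx or GraphScope, and whether the first call to graphscope.nx really carries extra overhead is an assumption this would help check, not an established fact.

```python
import statistics
import time


def time_call(fn, *args, repeats=5, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times; return (last result, timings)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return result, timings


def summarize(label, timings):
    """Report the first run separately from the median of all runs."""
    median = statistics.median(timings)
    return f"{label}: first = {timings[0]:.3f}s, median = {median:.3f}s"


# Hypothetical usage with the graphs from the script above:
#   ret_nx, t_nx = time_call(nx.clustering, g1)
#   ret_gs, t_gs = time_call(gs_nx.clustering, g2)
#   print(summarize("networkx", t_nx))
#   print(summarize("gs", t_gs))
```

If the first graphscope.nx run is much slower than its median, the gap is likely dominated by fixed per-call cost rather than by the clustering computation itself, which would matter more on a small graph such as p2p-31 than on the Twitter dataset.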
