alibaba/GraphScope

[BUG] Running clustering app on graphscope.nx is slower than networkx over p2p dataset

Open

Opened on Jun 26, 2023

Labels: component:networkx, good first issue, performance

Description

  import os
  import time

  import graphscope.nx as gs_nx
  import networkx as nx


  # Load the p2p edge list with networkx.
  start = time.time()
  g1 = nx.read_edgelist(
      os.path.expandvars('./p2p-31.e'),
      nodetype=int,
      data=False,
      create_using=nx.Graph
  )
  print(type(g1))
  # networkx.classes.graph.Graph
  print("networkx load = ", time.time() - start)

  # Load the same edge list with graphscope.nx.
  start = time.time()
  g2 = gs_nx.read_edgelist(
      os.path.expandvars('./p2p-31.e'),
      nodetype=int,
      data=False,
      create_using=gs_nx.Graph
  )
  print(type(g2))
  print("gs load = ", time.time() - start)

  # Run clustering with networkx.
  start = time.time()
  ret_nx = nx.clustering(g1)
  print("networkx clustering = ", time.time() - start)
  # 0.91s

  # Run clustering with graphscope.nx.
  start = time.time()
  ret_gs = gs_nx.clustering(g2)
  print("gs clustering = ", time.time() - start)
  # 2.12s

  # compare the results
  print(ret_gs == ret_nx)
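
The final `ret_gs == ret_nx` check compares floating-point coefficients exactly, so it could print `False` even when the two results agree up to rounding. A tolerance-based comparison may be more informative. The sketch below assumes both results are dict-like mappings from node to coefficient; `clustering_results_match` is a hypothetical helper name, not part of graphscope or networkx.

```python
import math


def clustering_results_match(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if both mappings cover the same nodes and their
    coefficients agree within the given tolerances.

    Assumption: a and b map node -> float, as nx.clustering(G) does
    when called on a whole graph.
    """
    # Different node sets can never match.
    if set(a) != set(b):
        return False
    # Compare each coefficient with a tolerance instead of exact equality.
    return all(
        math.isclose(a[node], b[node], rel_tol=rel_tol, abs_tol=abs_tol)
        for node in a
    )
```

With the script above, this would be called as `print(clustering_results_match(ret_gs, ret_nx))`.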

In addition, our blog reports that on the Twitter dataset graphscope.nx is over 25x faster than networkx, but on my testbed graphscope.nx is only about 7x faster than networkx.
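
Since the numbers above come from single runs, they may include one-time setup cost in either library. A best-of-N measurement could make the comparison more stable. The following is only a sketch using the standard library; `best_of` and `speedup` are hypothetical helper names, and the repeat count is an arbitrary choice.

```python
import time


def best_of(fn, repeats=3):
    """Call fn() `repeats` times; return (fastest wall time in seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline (below 1 means slower)."""
    return baseline_seconds / candidate_seconds
```

For the clustering case this could be used as `t_nx, ret_nx = best_of(lambda: nx.clustering(g1))` and `t_gs, ret_gs = best_of(lambda: gs_nx.clustering(g2))`, followed by `print(speedup(t_nx, t_gs))`. With the single-run timings reported above (0.91s and 2.12s), the ratio is about 0.43, i.e. graphscope.nx is slower on this dataset.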
