Add more popular datasets to graphscope built-in datasets · alibaba/GraphScope#1015

Repository metrics

Stars: 2,401
PR merge metrics: average merge time 1m; 8 PRs merged in 30d

Description

We have several built-in datasets that can each be loaded in one line. The data files are located in the dataset directory of the Aliyun OSS bucket graphscope, and the corresponding utility functions for loading them are located in python/graphscope/dataset/. We plan to keep enriching the datasets.

The procedure for adding a new dataset is:

Find a popular and appropriate dataset, and adapt its format to a property graph if necessary.
Put all data files inside a folder and give the folder a meaningful name.
Compress the folder, then upload the compressed file together with the original folder to the dataset folder of the OSS bucket. For example, if you have a folder named foo/ containing two files, foo/nodes.csv and foo/edge.csv, then after this step the bucket will have the following file structure:

dataset
|-- foo.tar.gz
|-- foo
    |-- nodes.csv
    |-- edge.csv

Write the loading function load_foo in a new file named python/graphscope/dataset/foo.py.
A corresponding unit test is appreciated!
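
The packaging and loading steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the repository's actual loader: pack_dataset is a hypothetical helper name, the labels "node" and "edge" are invented for illustration, and graphscope.g(), add_vertices and add_edges are assumed to accept plain file paths and to read ids from the first CSV columns. Uploading to the OSS bucket and downloading the archive are left out.

```python
import os
import tarfile


def pack_dataset(folder):
    """Compress `folder` (e.g. "foo") into "foo.tar.gz" beside it."""
    folder = os.path.normpath(folder)
    archive = folder + ".tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store entries under the folder name so extraction recreates foo/.
        tar.add(folder, arcname=os.path.basename(folder))
    return archive


def load_foo(prefix, sess=None):
    """Hypothetical loader for the example foo/ dataset.

    `prefix` is a local directory that contains nodes.csv and edge.csv.
    """
    prefix = os.path.expandvars(prefix)
    if sess is None:
        # Assumption: with no session given, use the default one via
        # graphscope.g(); imported lazily so the module loads without it.
        import graphscope

        graph = graphscope.g()
    else:
        graph = sess.g()
    # Assumption: column 0 of nodes.csv is the vertex id, and columns 0
    # and 1 of edge.csv are the source and destination vertex ids.
    graph = graph.add_vertices(os.path.join(prefix, "nodes.csv"), label="node")
    graph = graph.add_edges(
        os.path.join(prefix, "edge.csv"),
        label="edge",
        src_label="node",
        dst_label="node",
    )
    return graph
```

Because load_foo only calls g(), add_vertices and add_edges on the object it is given, a unit test can pass a small stub session and check the recorded calls without starting an engine. The existing loaders in python/graphscope/dataset/ remain the reference for the real signature and download handling.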

Contributor guide

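Approach: This issue provides clear step-by-step instructions for adding a new dataset. First, find a popular graph dataset and adapt it to the property graph format if necessary. Next, create a folder named after the dataset that contains the CSV or other files, compress it as a .tar.gz, and upload both the compressed file and the original folder to the dataset folder of the Aliyun OSS bucket. Then create a new file under python/graphscope/dataset/ and write the Python loading function, following the pattern of the existing loaders. Finally, add a unit test for the new loader. Use the existing dataset loaders in the repository (such as those in the same directory) as a reference.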
Tech stack: python
Area: backend, data
Issue type: feature
Difficulty: 3
Estimated time: 1-2 days
Activity: active
Clarity: clear
Prerequisites: Python, Git
Beginner friendliness: 80

