apache/beam

[Task]: Create a script to train sklearn model for IT test.

Open

#24903 opened on Jan 5, 2023

19 comments, 0 reactions, 4 assignees
Labels: P2, good first issue, python, run-inference, task

Description

What needs to happen?

Sklearn does not guarantee that a model saved with one version will load correctly in a newer version. The Sklearn IT tests use models that were trained manually, so these models can become outdated whenever sklearn is updated.

To tackle this, we need a script that trains the sklearn models on the data and publishes them to a GCS bucket. Once this is done, the Sklearn IT test can run against the freshly published model.
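A minimal sketch of what such a script could look like, offered as a starting point rather than the final design. Everything named here is an assumption for illustration: the Iris dataset and `svm.SVC` stand in for whatever data and models the IT tests actually use, and `model_blob_name`, `train_and_pickle`, `upload_to_gcs`, and the `sklearn_models/` prefix are hypothetical names. Tagging the object name with the installed sklearn version is one possible way to keep models for different releases apart.

```python
import pickle
from pathlib import Path


def model_blob_name(model_name: str, sklearn_version: str) -> str:
    """Build a version-tagged object name (hypothetical naming scheme)."""
    return f"sklearn_models/{model_name}_sklearn_{sklearn_version}.pkl"


def train_and_pickle(output_dir: str) -> Path:
    """Train a small classifier and pickle it; return the local file path."""
    # Imported here so the helpers above stay usable without sklearn installed.
    import sklearn
    from sklearn import datasets, svm

    # Assumption: Iris is a stand-in for the real IT test training data.
    features, labels = datasets.load_iris(return_X_y=True)
    model = svm.SVC()
    model.fit(features, labels)

    blob_name = model_blob_name("svm_iris", sklearn.__version__)
    local_path = Path(output_dir) / Path(blob_name).name
    with open(local_path, "wb") as handle:
        pickle.dump(model, handle)
    return local_path


def upload_to_gcs(local_path: Path, bucket_name: str, blob_name: str) -> None:
    """Upload the pickled model to a GCS bucket (requires google-cloud-storage)."""
    from google.cloud import storage

    client = storage.Client()  # uses the default application credentials
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(str(local_path))
```

A run would call `train_and_pickle` with a scratch directory and pass the returned path to `upload_to_gcs` together with the target bucket name, which is left unspecified here because the issue does not name one.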

Issue Priority

Priority: 2 (default / most normal work should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
