apache/hudi

Use Command line options instead of positional arguments when launching spark applications from various CLI commands

Open

#14477 opened on Nov 30, 2025

Labels: component:cli, from-jira, good first issue, priority:high, priority:medium, status:pr-available, type:devtask

Description

Hoodie CLI commands such as compaction, rollback, repair, savepoints, and parquet-import rely on launching a Spark application to perform their operations (see SparkMain.java).

SparkMain (see SparkMain.main()) relies on positional arguments to pass the various CLI options. Instead, we should define proper CLI options in SparkMain and parse them with JCommander, to improve readability and avoid accidental errors at call sites. For example, see com.uber.hoodie.utilities.HoodieCompactor.
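The difference between the two styles can be sketched in plain Java. This is a hypothetical illustration, not Hudi's SparkMain. The option names (`--base-path`, `--table-name`, `--instant-time`, `--parallelism`), the `CompactConfig` holder, and the default parallelism are assumptions. The small hand-written parser stands in for JCommander so that the example runs with only the JDK. With JCommander, each field would instead carry an annotation such as `@Parameter(names = {"--base-path"}, required = true)` and would be populated by `JCommander.newBuilder().addObject(cfg).build().parse(args)`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, NOT Hudi's SparkMain: contrasts positional arguments with
// named options using only the JDK. A real implementation would annotate the
// fields below with JCommander's @Parameter instead of hand-parsing.
class CompactArgsSketch {

  // Hypothetical option holder for a "compact" command.
  static final class CompactConfig {
    String basePath;
    String tableName;
    String instantTime;
    int parallelism = 200; // hypothetical default
  }

  // Positional style: meaning depends only on the index, so swapping two strings
  // at a call site silently produces a wrong but "valid" config.
  static CompactConfig parsePositional(String[] args) {
    CompactConfig cfg = new CompactConfig();
    cfg.basePath = args[0];
    cfg.tableName = args[1];
    cfg.instantTime = args[2];
    cfg.parallelism = Integer.parseInt(args[3]);
    return cfg;
  }

  private static final Set<String> KNOWN = new HashSet<>(
      Arrays.asList("--base-path", "--table-name", "--instant-time", "--parallelism"));

  // Named style: each value is tied to its option name, order does not matter,
  // and unknown or missing options fail fast.
  static CompactConfig parseNamed(String[] args) {
    Map<String, String> opts = new HashMap<>();
    for (int i = 0; i < args.length; i += 2) {
      String key = args[i];
      if (!KNOWN.contains(key)) {
        throw new IllegalArgumentException("Unknown option: " + key);
      }
      if (i + 1 >= args.length) {
        throw new IllegalArgumentException("Missing value for " + key);
      }
      opts.put(key, args[i + 1]);
    }
    CompactConfig cfg = new CompactConfig();
    cfg.basePath = require(opts, "--base-path");
    cfg.tableName = require(opts, "--table-name");
    cfg.instantTime = require(opts, "--instant-time");
    if (opts.containsKey("--parallelism")) {
      cfg.parallelism = Integer.parseInt(opts.get("--parallelism"));
    }
    return cfg;
  }

  // Returns the value of a required option or fails with a readable message.
  private static String require(Map<String, String> opts, String key) {
    String value = opts.get(key);
    if (value == null) {
      throw new IllegalArgumentException("Missing required option: " + key);
    }
    return value;
  }

  public static void main(String[] args) {
    // Options given in a different order are still mapped to the right fields.
    CompactConfig cfg = parseNamed(new String[] {
        "--instant-time", "20190504170700",
        "--base-path", "/tmp/hoodie/trips",
        "--table-name", "trips"});
    // prints: /tmp/hoodie/trips trips 20190504170700 200
    System.out.println(cfg.basePath + " " + cfg.tableName + " " + cfg.instantTime + " " + cfg.parallelism);
  }
}
```

Passing the same values in a different order still yields the intended config with the named parser, whereas the positional version accepts swapped strings without complaint, which is the class of call-site error the issue describes.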

Comments

04/May/19 17:07;abhioncbr;Some of the commands in SparkMain.java invoke methods of other classes that already have JCommander configs. To make all of them easy to use, I am thinking of adding a package to the hoodie-common sub-project, say com.uber.hoodie.common.jobConfigs, which would hold the config classes of all the Main launcher classes. This will help us consolidate job configs and make them easier to manage.

[~vbalaji], please share your thoughts on my approach.;;;


04/May/19 20:29;abhioncbr;Also, if we pursue the above approach, is it OK to specify a class name as a string, for example 'com.uber.hoodie.utilities.sources.JsonDFSSource'?;;;
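Accepting a class name as a plain string option is workable as long as the launcher resolves it on the classpath and checks its type before use. The sketch below shows one way to do that with standard reflection. The `instantiate` helper and its behavior are hypothetical, and it assumes a public no-argument constructor, which real source classes may not have. `java.util.ArrayList` and `java.util.List` stand in for a source class and its interface so the example runs with only the JDK.

```java
// Hypothetical sketch: resolve a class named on the command line (for example a
// source class such as com.uber.hoodie.utilities.sources.JsonDFSSource) and check
// that it implements the expected type before instantiating it.
class ClassNameOption {

  static <T> T instantiate(String className, Class<T> expectedType) {
    final Class<?> raw;
    try {
      raw = Class.forName(className);
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("Class not found on classpath: " + className, e);
    }
    // Fail early with a clear message instead of a ClassCastException later on.
    if (!expectedType.isAssignableFrom(raw)) {
      throw new IllegalArgumentException(className + " does not implement " + expectedType.getName());
    }
    try {
      // Assumption of this sketch: a public no-arg constructor exists.
      return expectedType.cast(raw.getDeclaredConstructor().newInstance());
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }

  public static void main(String[] args) {
    java.util.List<?> list = instantiate("java.util.ArrayList", java.util.List.class);
    System.out.println(list.getClass().getName()); // prints: java.util.ArrayList
  }
}
```

A mistyped or incompatible class name then fails at argument-parsing time with a readable error, rather than deep inside the Spark job.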


11/May/19 21:56;abhioncbr;PR raised: https://github.com/apache/incubator-hudi/pull/673;;;


08/Aug/19 04:15;vinoth;[~abhioncbr] are you still working on this patch? ;;;


30/Aug/19 16:20;vinoth;Moving the ticket back to Open due to inactivity;;;


03/Jan/20 11:12;Pratyaksh;[~vbalaji] I have resumed the work for this ticket and have tried to address most of the comments that you already gave. I have raised a fresh PR for this. Please have a look and let me know your thoughts. Here is the PR - [https://github.com/apache/incubator-hudi/pull/1174].;;;


07/Sep/21 21:07;githubbot;vinothchandar closed pull request #1174: URL: https://github.com/apache/hudi/pull/1174

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at: users@infra.apache.org ;;;


07/Sep/21 21:07;githubbot;vinothchandar commented on pull request #1174: URL: https://github.com/apache/hudi/pull/1174#issuecomment-914630077

Closing due to inactivity


Contributor guide