Use command-line options instead of positional arguments when launching Spark applications from CLI commands
#14477 opened on Nov 30, 2025
Description
Hoodie CLI commands such as compaction, rollback, repair, savepoints, and parquet-import rely on launching a Spark application to perform their operations (see SparkMain.java).
SparkMain (see SparkMain.main()) relies on positional arguments to pass the various CLI options. Instead, we should define proper CLI options in SparkMain and parse them with JCommander, to improve readability and avoid accidental errors at call sites. For an example, see com.uber.hoodie.utilities.HoodieCompactor.
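To illustrate why named options are safer at call sites than positional arguments, here is a minimal, dependency-free sketch. It is not SparkMain's code and does not use JCommander. The class name `CompactionOptions` and the option names `--base-path`, `--table-name`, and `--parallelism` are hypothetical, chosen only for this example. In an actual patch, the fields would presumably carry JCommander `@Parameter` annotations rather than being filled by a hand-written parser.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical options holder for a single command; all names are illustrative.
class CompactionOptions {
    // Option names this sketch accepts (assumed, not taken from SparkMain).
    private static final Set<String> KNOWN = new HashSet<>(
        Arrays.asList("--base-path", "--table-name", "--parallelism"));

    String basePath;      // with positional arguments this might be args[1]
    String tableName;     // ... and this args[2], which is easy to swap by mistake
    int parallelism = 1;  // optional, with a default

    // Parses "--name value" pairs, so the order of arguments no longer matters.
    static CompactionOptions parse(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("Expected --name value pairs");
        }
        Map<String, String> named = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!KNOWN.contains(args[i])) {
                throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
            named.put(args[i], args[i + 1]);
        }
        CompactionOptions opts = new CompactionOptions();
        opts.basePath = require(named, "--base-path");
        opts.tableName = require(named, "--table-name");
        if (named.containsKey("--parallelism")) {
            opts.parallelism = Integer.parseInt(named.get("--parallelism"));
        }
        return opts;
    }

    // Fails fast with a readable message when a mandatory option is absent.
    private static String require(Map<String, String> named, String name) {
        String value = named.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing required option " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // The caller may list the options in any order.
        CompactionOptions opts = parse(
            new String[] {"--table-name", "trips", "--base-path", "/tmp/trips"});
        System.out.println(opts.basePath + " " + opts.tableName + " " + opts.parallelism);
    }
}
```

With this shape, a call site that swaps two values or omits one gets an explicit error naming the option, instead of silently passing the wrong value into the Spark job.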
JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-96
- Type: Task
- Epic: https://issues.apache.org/jira/browse/HUDI-1388
Comments
04/May/19 17:07;abhioncbr;Some of the commands in SparkMain.java invoke methods of other classes that already have JCommander configs. To make all of them easy to use, I am thinking of adding a package in the hoodie-common sub-project, say com.uber.hoodie.common.jobConfigs, that holds the config classes of all the Main launcher classes. This would help us consolidate the job configs and make them easier to manage.
[~vbalaji], please share your thoughts on my approach.;;;
04/May/19 20:29;abhioncbr;Also, if we pursue the above approach, is it OK to refer to a class name as a string, for example 'com.uber.hoodie.utilities.sources.JsonDFSSource'?;;;
11/May/19 21:56;abhioncbr;PR raised: https://github.com/apache/incubator-hudi/pull/673;;;
08/Aug/19 04:15;vinoth;[~abhioncbr] are you still working on this patch? ;;;
30/Aug/19 16:20;vinoth;Moving ticket back to opened due to inactivity;;;
03/Jan/20 11:12;Pratyaksh;[~vbalaji] I have resumed the work for this ticket and have tried to address most of the comments that you already gave. I have raised a fresh PR for this. Please have a look and let me know your thoughts. Here is the PR - [https://github.com/apache/incubator-hudi/pull/1174].;;;
07/Sep/21 21:07;githubbot;vinothchandar closed pull request #1174: URL: https://github.com/apache/hudi/pull/1174
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at: users@infra.apache.org ;;;
07/Sep/21 21:07;githubbot;vinothchandar commented on pull request #1174: URL: https://github.com/apache/hudi/pull/1174#issuecomment-914630077
Closing due to inactivity
;;;