Mystery: Why doesn't SPARK_MASTER_IP accept actual IP addresses?
#43 opened on Nov 5, 2015
This is a mystery that someone can take on for fun or for glory.
If I change these two blocks of code from this:

    master_host=master_instance.public_dns_name,
    slave_hosts=[i.public_dns_name for i in slave_instances],

to this:

    master_host=master_instance.ip_address,
    slave_hosts=[i.ip_address for i in slave_instances],

then Spark fails to launch. In particular, master_host is substituted into SPARK_MASTER_IP in this template, and that substitution seems to be what triggers the failure.
For whatever reason, DNS names work but IP addresses don't. I'm not sure why. Spark's documentation suggests that IP addresses should work.
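One way to narrow this down is sketched below. It rests on an untested assumption that is not established anywhere in this issue: that the failure comes from the master process being unable to bind a socket to the address it is handed. The helper name `can_bind` is hypothetical and is not part of Spark or this project. It only checks whether the local machine can bind a TCP socket to a given host string.

```python
import socket


def can_bind(host, port=0):
    """Return True if this machine can bind a TCP socket to `host`.

    Hypothetical diagnostic helper. port=0 asks the OS for any free
    port, so the check does not collide with a running service.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, port))
        return True
    except OSError:
        # Covers both name-resolution failures and addresses that are
        # not assigned to any local interface.
        return False
```

Running `can_bind` on the master instance, once with the value of `master_instance.ip_address` and once with `master_instance.public_dns_name`, would show whether the two values behave differently at the socket level, independent of Spark.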
I've probably misunderstood something about how to configure Spark. Another possibility is that there is a documentation or code bug in Spark itself that needs to be fixed.
One clue I've come across but not tested: SPARK_MASTER_HOST is checked here, even though it is not mentioned anywhere else in the Spark codebase. I suspect that SPARK_MASTER_HOST should instead be SPARK_MASTER_IP.
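To illustrate why a name mismatch like this would matter, here is a minimal sketch. It does not reproduce Spark's actual code. `resolve_master_host` is a hypothetical stand-in for any lookup that consults only SPARK_MASTER_HOST, and the address is a documentation-range placeholder. If the configuration sets SPARK_MASTER_IP but the lookup reads a different name, the configured value is silently ignored and the default wins.

```python
def resolve_master_host(env, default_host):
    """Hypothetical lookup that consults only SPARK_MASTER_HOST."""
    return env.get("SPARK_MASTER_HOST", default_host)


# The configuration sets SPARK_MASTER_IP, but the lookup never reads it.
configured = {"SPARK_MASTER_IP": "203.0.113.10"}
print(resolve_master_host(configured, "fallback-hostname"))
# prints "fallback-hostname"
```

Whether this is what actually happens in Spark is exactly the open question. The sketch only shows the failure mode the suspicion describes.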
What I can say for certain is that this file is where some of the master's configuration gets set, and I have traced the code path there from start-master.sh. So it's probably a good place to start digging.