HttpWaitStrategy does not retry to re-validate ResponsePredicate
#2,516 创建于 2020年4月3日
描述
using the HttpWaitStrategy with a forResponsePredicate does only try it once, although the responseBody will change when startup is completed.
expectation would be that the strategy re-evaluated until timeout is reached and until it is successful.
test setup with a HDFS cluster where the NameNode is in safemode on startup and will change into normal operation mode only after DataNode is successfully registered:
public static final GenericContainer hadoopContainer = new GenericContainer("sequenceiq/hadoop-docker:2.7.1")
.withExposedPorts(50070, 50075, 50090, 9000)
.waitingFor(new WaitAllStrategy()
.withStrategy(Wait.forLogMessage(".*localhost: starting nodemanager.*\n", 1))
.withStrategy(Wait.forHttp("/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo")
.forPort(50070)
.forResponsePredicate(s -> !s.contains("Safe mode is ON.")))
.withStartupTimeout(Duration.ofMinutes(2))
);
the log output clearly only tries it once to validate the predicate:
2020-04-03 10:52:31.339 INFO [ontainers.wait.strategy.HttpWaitStrategy] [ducttape-0] - /magical_euler: Waiting for 120 seconds for URL: http://localhost:33077/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo
2020-04-03 10:52:32.438 TRACE[ontainers.wait.strategy.HttpWaitStrategy] [ducttape-1] - Get response code 200
2020-04-03 10:52:32.440 TRACE[ontainers.wait.strategy.HttpWaitStrategy] [ducttape-1] - Get response { "beans" : [ { "name" : "Hadoop:service=NameNode,name=NameNodeInfo", "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem", "UpgradeFinalized" : true, "ClusterId" : "CID-5e691286-4de5-4dde-800b-c02a7a8bf44a", "Version" : "2.7.1, r15ecc87ccf4a0228f35af08fc56de536e6ce657a", "Used" : 335872, "Free" : 29158596608, "Safemode" : "Safe mode is ON. The reported blocks 31 has reached the threshold 0.9990 of total blocks 31. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 17 seconds.", "NonDfsUsedSpace" : 33566855168, "PercentUsed" : 5.354608E-4, "BlockPoolUsedSpace" : 335872, "PercentBlockPoolUsed" : 5.354608E-4, "PercentRemaining" : 46.48582, "CacheCapacity" : 0, "CacheUsed" : 0, "TotalBlocks" : 31, "TotalFiles" : 35, "NumberOfMissingBlocks" : 0, "NumberOfMissingBlocksWithReplicationFactorOne" : 0, "LiveNodes" : "{\"localhost:50010\":{\"infoAddr\":\"127.0.0.1:50075\",\"infoSecureAddr\":\"127.0.0.1:0\",\"xferaddr\":\"127.0.0.1:50010\",\"lastContact\":0,\"usedSpace\":335872,\"adminState\":\"In Service\",\"nonDfsUsedSpace\":33566855168,\"capacity\":62725787648,\"numBlocks\":31,\"version\":\"2.7.1\",\"used\":335872,\"remaining\":29158596608,\"blockScheduled\":0,\"blockPoolUsed\":335872,\"blockPoolUsedPercent\":5.354608E-4,\"volfails\":0}}", "DeadNodes" : "{}", "DecomNodes" : "{}", "BlockPoolId" : "BP-1961412683-172.17.0.32-1450036414523", "NameDirStatuses" : "{\"failed\":{},\"active\":{\"/tmp/hadoop-root/dfs/name\":\"IMAGE_AND_EDITS\"}}", "NodeUsage" : "{\"nodeUsage\":{\"min\":\"0.00%\",\"median\":\"0.00%\",\"max\":\"0.00%\",\"stdDev\":\"0.00%\"}}", "NameJournalStatus" : "[{\"stream\":\"EditLogFileOutputStream(/tmp/hadoop-root/dfs/name/current/edits_inprogress_0000000000000000192)\",\"manager\":\"FileJournalManager(root=/tmp/hadoop-root/dfs/name)\",\"required\":\"false\",\"disabled\":\"false\"}]", "JournalTransactionInfo" : "{\"LastAppliedOrWrittenTxId\":\"192\",\"MostRecentCheckpointTxId\":\"191\"}", "NNStarted" : "Fri Apr 03 04:52:15 EDT 2020", "CompileInfo" : "2015-06-29T06:04Z by jenkins from (detached from 15ecc87)", "CorruptFiles" : "[]", "DistinctVersionCount" : 1, "DistinctVersions" : [ { "key" : "2.7.1", "value" : 1 } ], "SoftwareVersion" : "2.7.1", "RollingUpgradeStatus" : null, "Threads" : 35, "Total" : 62725787648 } ]}
2020-04-03 10:54:12.156 ERROR[🐳 [sequenceiq/hadoop-docker:2.7.1] ] [main] - Could not start container
org.rnorth.ducttape.TimeoutException: java.util.concurrent.TimeoutException
at org.rnorth.ducttape.timeouts.Timeouts.callFuture(Timeouts.java:70) ~[duct-tape-1.0.8.jar:?]
at org.rnorth.ducttape.timeouts.Timeouts.doWithTimeout(Timeouts.java:60) ~[duct-tape-1.0.8.jar:?]
at org.testcontainers.containers.wait.strategy.WaitAllStrategy.waitUntilReady(WaitAllStrategy.java:53) ~[testcontainers-1.13.0.jar:?]
at org.testcontainers.containers.GenericContainer.waitUntilContainerStarted(GenericContainer.java:876) ~[testcontainers-1.13.0.jar:?]
at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:428) ~[testcontainers-1.13.0.jar:?]
at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:317) ~[testcontainers-1.13.0.jar:?]
at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81) [duct-tape-1.0.8.jar:?]
at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:315) [testcontainers-1.13.0.jar:?]
at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:302) [testcontainers-1.13.0.jar:?]
[...]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_242]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) [?:1.8.0_242]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) [?:1.8.0_242]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) [?:1.8.0_242]
at org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:217) [junit-4.12.jar:4.12]
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.createTest(SpringJUnit4ClassRunner.java:226) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner$1.runReflectiveCall(SpringJUnit4ClassRunner.java:289) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.12.jar:4.12]
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.methodBlock(SpringJUnit4ClassRunner.java:291) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:246) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) [junit-4.12.jar:4.12]
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) [junit-4.12.jar:4.12]
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) [junit-4.12.jar:4.12]
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) [junit-4.12.jar:4.12]
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) [junit-4.12.jar:4.12]
at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
at org.junit.runners.ParentRunner.run(ParentRunner.java:363) [junit-4.12.jar:4.12]
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190) [spring-test-5.2.3.RELEASE.jar:5.2.3.RELEASE]
at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) [junit-rt.jar:?]
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33) [junit-rt.jar:?]
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230) [junit-rt.jar:?]
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58) [junit-rt.jar:?]
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_242]
at org.rnorth.ducttape.timeouts.Timeouts.callFuture(Timeouts.java:65) ~[duct-tape-1.0.8.jar:?]
... 34 more
i am not sure if i am doing something wrong or this is actually a bug. (the response does change after ~10seconds and does not include the safemode info anymore, so that the predicate should succeed)
otherwise after looking into source code i suspect that the retryUntilScucess has to be replaced with retryUntilTrue (at least from description the former only does until success)