I keep encountering an error when running Nutch on Hadoop YARN:
AttemptID:attempt_1423062241884_9970_m_000009_0 Timed out after 600 secs
Some info on my setup: I'm running a 64-node cluster with Hadoop 2.4.1. Each node has 4 cores, 1 disk, and 24 GB of RAM; the namenode/resourcemanager has the same specs but with 8 cores.
I am pretty sure one of these parameters is related to the threshold I'm hitting:
yarn.am.liveness-monitor.expiry-interval-ms
yarn.nm.liveness-monitor.expiry-interval-ms
yarn.resourcemanager.nm.liveness-monitor.interval-ms
but I would like to understand why.
The issue usually appears under heavier load, and most of the time the task succeeds on the next attempt. Also, if I restart the Hadoop cluster, the error goes away for some time.
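From what I can tell so far, the 600-second figure matches the default of mapreduce.task.timeout (600000 ms), which kills a task attempt that reports no progress within that window; the three yarn.* liveness properties above govern ApplicationMaster/NodeManager heartbeats rather than task progress, so I'm not sure they apply here. As a workaround I could raise the task timeout in mapred-site.xml (the 1800000 value below is just an example, not a recommendation):

```xml
<!-- mapred-site.xml: raise the per-attempt progress timeout from the
     default 600000 ms (600 s). This masks the symptom rather than
     explaining why the tasks stall under load. -->
<property>
  <name>mapreduce.task.timeout</name>
  <value>1800000</value>
</property>
```

But I'd rather understand what is stalling the attempts than just widen the window.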