
Re: [condor-users] Dagman stalling with shadow exception messages?



Hi Michael,

-- ShadowLog on submit host:
4/6 21:00:27 Initializing a VANILLA shadow
4/6 21:00:27 (22190.0) (7173): Request to run on <192.168.1.111:32771> was ACCEPTED
4/6 21:00:27 (22190.0) (7173): ERROR "Can no longer talk to condor_starter on execute machine (192.168.1.111)" at line 63 in file NTreceivers.C
----------------------
Note: The above message is repeated for any render host that gets matched, and the hosts are definitely up and visible to the submit host. In addition, that same render host will happily render other jobs from other dags in other people's queues.





You say that the execute node is "visible" to the submit node: to what extent? Whenever I have seen these error messages, the cause was a firewall somewhere blocking traffic back to the submit node. Is your environment firewalled, or does the submit node run its own firewall (pfilter, etc.)?
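
One rough way to check for a firewall problem is to try a plain TCP connection in each direction between the two machines. The snippet below is only a sketch, assuming Python is available on both hosts; the helper name can_connect is made up for illustration and is not part of Condor. Condor daemons generally listen on dynamically assigned ports, so the port you test is an assumption: take it from the address shown in the relevant log rather than from this example.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration only; it checks basic TCP
    reachability and says nothing about the Condor protocol itself.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call from the submit host, using the address in the ShadowLog
# excerpt (the port is only valid while that daemon is still listening):
#   can_connect("192.168.1.111", 32771)
```

A successful connection from the submit host to the execute node does not rule out a firewall, since the failure described here concerns traffic going the other way. It is worth running the same check from the execute node towards the submit host as well.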


Just my two cents' worth...

Mark

--
Dr Mark Calleja Department of Earth Sciences, University of Cambridge
Downing Street, Cambridge CB2 3EQ, UK
Tel. (+44/0) 1223 333408, Fax (+44/0) 1223 333450
http://www.esc.cam.ac.uk/~mcal00


