
Re: [HTCondor-users] Bug STARTER at <ip> failed to send file(s) to <<ip>:9618>; remaps resulted in a cycle:



We encountered this issue during normal operation of the cluster. While tracking it down, we found this limit.

Thanks for your help.

Thanks & Regards,
Vikrant Aggarwal


On Mon, Apr 6, 2020 at 7:08 PM Zach Miller <zmiller@xxxxxxxxxxx> wrote:
Hello,

No, this won't affect the scalability or performance of a pool.

Just curious, did you run into this because you actually had a 20-deep directory hierarchy? Or just testing the limits of HTCondor?

Anyhow, thanks for the report. Given that the performance impact is very minimal, perhaps we should bump the default ourselves.


Cheers,
-zach


On 4/6/20, 2:31 AM, "HTCondor-users on behalf of ervikrant06@xxxxxxxxx" <htcondor-users-bounces@xxxxxxxxxxx on behalf of ervikrant06@xxxxxxxxx> wrote:

  Hello Zach,


  Thanks for the quick response.


  Are there any known implications of bumping this value on a cluster that consists of 500 nodes?

  Thanks & Regards,
  Vikrant Aggarwal









  On Mon, Apr 6, 2020 at 11:08 AM Zach Miller <zmiller@xxxxxxxxxxx> wrote:


  Hello Vikrant,

  This is a circuit breaker in the code to prevent infinite recursion when doing filename remaps during transfer.

  The default (as you found) is that 20 directories deep means something might be going wrong. Luckily, there is a config setting you can change that determines how deep HTCondor will go before it considers it a problem. In your configuration, set:
      MAX_REMAP_RECURSIONS = 100 # Or any number you like if you need to go deeper

  Let me know if that works for you.
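
The circuit breaker described above can be illustrated with a minimal sketch. This is not HTCondor's actual code: the function name, the exception, and the dictionary of remap rules are all hypothetical, and only the idea of a configurable cap (here named after MAX_REMAP_RECURSIONS, with the default of 20 mentioned above) comes from this thread.

```python
# Illustrative only: a generic depth-limited remap lookup, not HTCondor's implementation.
MAX_REMAP_RECURSIONS = 20  # assumed default, as described in this thread


class RemapCycleError(Exception):
    """Raised when remapping does not settle within the allowed number of steps."""


def resolve_remap(name, remaps, max_recursions=MAX_REMAP_RECURSIONS):
    """Follow remap rules until a name has no further remap, or give up."""
    current = name
    for _ in range(max_recursions):
        if current not in remaps:
            return current
        current = remaps[current]
    # The cap is reached: accept only if the name has finally settled.
    if current not in remaps:
        return current
    raise RemapCycleError(f"remaps resulted in a cycle: {name}")
```

A genuine cycle (a -> b -> a) trips the breaker at any cap, while a long but finite chain only trips it when the cap is smaller than the chain, which is why raising the limit lets deep but legitimate cases through.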


  Cheers,
  -zach



  From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of "ervikrant06@xxxxxxxxx" <ervikrant06@xxxxxxxxx>
  Reply-To: HTCondor-Users List <htcondor-users@xxxxxxxxxxx>
  Date: Monday, April 6, 2020 at 12:15 AM
  To: HTCondor-Users List <htcondor-users@xxxxxxxxxxx>
  Subject: [HTCondor-users] Bug STARTER at <ip> failed to send file(s) to <<ip>:9618>; remaps resulted in a cycle:

  Hello Experts,

  Jobs went into held status with the message shown in the subject line. This happens when the path given for the output/log attribute has more than 19 nested directories.


  Not working: "/tmp/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/out.txt"

  Working: "/tmp/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/out.txt"
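
To check how deep a given output or log path is before submitting, a small helper like the following may be useful. It is a hypothetical sketch (the function name is made up for illustration) that simply counts the directory components above the file name. How HTCondor itself counts levels against its limit is not confirmed here.

```python
from pathlib import PurePosixPath


def directory_depth(path):
    """Count directory components above the file name, excluding the root '/'."""
    parts = PurePosixPath(path).parent.parts
    return len([p for p in parts if p != "/"])


failing = "/tmp/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/out.txt"
working = "/tmp/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/out.txt"
print(directory_depth(failing))  # 21
print(directory_depth(working))  # 19
```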


  Thanks & Regards,
  Vikrant Aggarwal


  _______________________________________________
  HTCondor-users mailing list
  To unsubscribe, send a message to
  htcondor-users-request@xxxxxxxxxxx <mailto:htcondor-users-request@xxxxxxxxxxx> with a
  subject: Unsubscribe
  You can also unsubscribe by visiting
  https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

  The archives can be found at:
  https://lists.cs.wisc.edu/archive/htcondor-users/




