
[HTCondor-users] HTCondor, Docker, AWS ECR Login Issues



Morning list,

I am running HTCondor on several EC2 instances on AWS: one control node and one or more worker nodes. I am using the docker universe and am struggling to get the execute nodes to pull the Docker container image from AWS ECR. Here is the setup:
  • Amazon ECR credential helper is installed on the worker AMI: https://github.com/awslabs/amazon-ecr-credential-helper
    • Tested: I can get credentials without sudo.
  • IAM role attached to worker node(s) has read/write permissions on the ECR repository.
  • I can ssh/SSM into the worker node and run "sudo docker pull" to get the image. The image is then cached, and subsequent jobs run as expected.
  • When I submit a job without a locally cached image, I get a login error: Error: Head "https://##account##.dkr.ecr.us-west-2.amazonaws.com/v2/hidtm-htcondor-repository/manifests/1.0.0": no basic auth credentials. (This is an ECR error, not a Condor error.)
  • I am fairly confident that the jobs execute as the "nobody" user, given how the cluster was set up.
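For reference, the amazon-ecr-credential-helper README describes enabling the helper through the Docker client's config.json, by mapping the registry host to "ecr-login" under "credHelpers" (the client then calls the docker-credential-ecr-login binary). The Docker client reads that file from $DOCKER_CONFIG, or otherwise from ~/.docker/ of whichever user runs docker, so a config set up for my login user would not be visible to a docker process running as root or as another user. A minimal Python sketch of merging that entry into a given config file, assuming the caller supplies the config path and registry host (the function name is my own, hypothetical):

```python
import json
from pathlib import Path


def add_ecr_cred_helper(config_path: str, registry_host: str) -> dict:
    """Merge a credHelpers entry for registry_host into a Docker config.json.

    Existing keys (auths, other helpers, ...) are preserved; the file and its
    parent directory are created if they do not exist. Returns the new config.
    """
    path = Path(config_path)
    config = {}
    if path.exists():
        with path.open() as f:
            config = json.load(f)
    # Map this registry host to the docker-credential-ecr-login binary.
    config.setdefault("credHelpers", {})[registry_host] = "ecr-login"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        json.dump(config, f, indent=2)
    return config
```

For example, add_ecr_cred_helper("/root/.docker/config.json", "##account##.dkr.ecr.us-west-2.amazonaws.com") would apply if docker turns out to be invoked as root; which user that actually is on the execute node is exactly what I have not confirmed.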
Does anyone have experience using HTCondor on AWS with the docker universe? Does HTCondor run docker with sudo (i.e., as root)? I have noticed that the ECR credential helper needs to be run without elevated permissions (i.e., without sudo).
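One way to narrow this down is to check which credential helper the Docker client would select for the image's registry, given the config.json of the user that HTCondor's docker invocation actually uses. A rough Python sketch of that lookup as I understand the Docker client rules (a per-registry "credHelpers" entry wins over a global "credsStore", and the helper binary is named docker-credential-<suffix>); the function names are hypothetical, Docker Hub images are deliberately not handled, and this is not HTCondor or Docker code:

```python
from typing import Optional


def registry_host(image: str) -> Optional[str]:
    """Return the registry host of an image reference, or None for Docker Hub.

    The first path component is treated as a registry host only if it
    contains '.' or ':' or is 'localhost' (the usual Docker convention).
    Docker Hub images are not handled in this sketch.
    """
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return None


def helper_for(config: dict, image: str) -> Optional[str]:
    """Name of the credential-helper binary docker would call, or None."""
    host = registry_host(image)
    # A per-registry credHelpers entry takes precedence over credsStore.
    suffix = None
    if host is not None:
        suffix = config.get("credHelpers", {}).get(host)
    suffix = suffix or config.get("credsStore")
    return f"docker-credential-{suffix}" if suffix else None
```

If helper_for returns None for the relevant user's config and there is no matching "auths" entry, a "no basic auth credentials" error like the one above would be the expected outcome.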

Best,
Jay