
Re: [HTCondor-users] Changes in configuration: 8.6.13 to 8.8.12



Hi Vikrant.
Nope, I didn't get any information.
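
In case it helps anyone repeating the comparison quoted below, here is a minimal sketch of how the listing can be narrowed to the variables whose default value changed, ignoring the ones that were only added or removed. It assumes the input is in the same `<` / `>` format as the diff in [1], with one `NAME = value` per line. The names `changed_defaults` and `SAMPLE` are purely illustrative and not part of any HTCondor tool.

```python
import re

# One listing line: "<" (old release) or ">" (new release), then "NAME = value".
# The value may be empty, as in "< DATABASE_PURGE_INTERVAL =".
LINE_RE = re.compile(r"^([<>])\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$")


def changed_defaults(diff_text):
    """Return {name: (old, new)} for variables present on both sides with different values."""
    old, new = {}, {}
    for raw in diff_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # blank lines, headers, wrapped continuation lines
        side, name, value = match.groups()
        (old if side == "<" else new)[name] = value.strip()
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


# A few lines copied from the diff in [1]; the last two appear on one side only.
SAMPLE = """\
< NEGOTIATOR_MAX_TIME_PER_SCHEDD = 31536000
> NEGOTIATOR_MAX_TIME_PER_SCHEDD = 120
< NEGOTIATOR_PREFETCH_REQUESTS = false
< NEGOTIATOR_PREFETCH_REQUESTS_MAX_TIME = 120
> NEGOTIATOR_PREFETCH_REQUESTS_MAX_TIME = 60
> NEGOTIATOR_PREFETCH_REQUESTS = true
> NEGOTIATOR_SOCKET_CACHE_SIZE = 500
< MAX_XML_LOG = 1900000000
"""

if __name__ == "__main__":
    for name, (before, after) in changed_defaults(SAMPLE).items():
        print(f"{name}: {before!r} -> {after!r}")
```

The script only reports which defaults differ between the two releases. It says nothing about what each variable does, so the documentation question below is still open.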

On Wed, 11 Aug 2021 at 7:23, <ervikrant06@xxxxxxxxx> wrote:
>
> Hello,
>
> Did you manage to collect any information about these settings? We have upgraded from 8.5.8 to 8.8.5, and a strange issue has been reported with one schedd: a job remains in the idle state for a long time. We have flocking configured, but despite available resources the job is not getting scheduled.
>
> # grep '2238959.0' /var/log/condor/SchedLog
> 08/11/21 01:43:19 (pid:9386) job_transforms for 2238959.0: 1 considered, 1 applied (SetTeam)
> 08/11/21 01:43:26 (pid:9386) Request was NOT accepted for claim slot1@xxxxxxxxxxxxxxxxxxxxxxxx<xx.xx.86.63:9618?addrs=xx.xx.86.63-9618&noUDP&sock=9535_cb9e_3> for testuser1 2238959.0
> 08/11/21 01:43:26 (pid:9386) Match record (slot1@xxxxxxxxxxxxxxxxxxxxxxxx<xx.xx.86.63:9618?addrs=xx.xx.86.63-9618&noUDP&sock=9535_cb9e_3> for testuser1, 2238959.0) deleted
> 08/11/21 02:03:44 (pid:9386) Request was NOT accepted for claim slot1@xxxxxxxxxxxxxxxxxxxxxxxx<xx.xx.84.136:9618?addrs=xx.xx.84.136-9618&noUDP&sock=515534_05ae_3> for testuser1 2238959.0
> 08/11/21 02:03:44 (pid:9386) Match record (slot1@xxxxxxxxxxxxxxxxxxxxxxxx<xx.xx.84.136:9618?addrs=xx.xx.84.136-9618&noUDP&sock=515534_05ae_3> for testuser1, 2238959.0) deleted
> 08/11/21 02:03:44 (pid:9386) Starting add_shadow_birthdate(2238959.0)
> 08/11/21 02:03:44 (pid:9386) Started shadow for job 2238959.0 on slot1@xxxxxxxxxxxxxxxxxxxxxxxx<xx.xx.87.240:9618?addrs=xx.xx.87.240-9618&noUDP&sock=57028_0b07_3> for testuser1, (shadow pid = 4127832)
>
> While going through the release notes I found this bug:
>
> https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=7754 which looks related to the settings mentioned in this thread. Would disabling them help us avoid the bug?
>
> The following link doesn't provide much information either.
>
> https://github.com/htcondor/htcondor/blob/master/src/condor_utils/param_info.in
>
> Thanks & Regards,
> Vikrant Aggarwal
>
>
> On Tue, Jul 13, 2021 at 3:17 PM <jcaballero.hep@xxxxxxxxx> wrote:
>>
>> Hi again,
>>
>> I have been checking many of those variables in the documentation.
>> Focusing only on the ones that have changed value (not the ones that
>> have been removed or added), I saw several of them are not for the
>> Central Managers, so I guess I can ignore them.
>> It all narrows down to a very short set of variables:
>> * NEGOTIATOR_MAX_TIME_PER_SCHEDD
>> * NEGOTIATOR_RESOURCE_REQUEST_LIST_SIZE
>> * NEGOTIATOR_PREFETCH_REQUESTS
>> * NEGOTIATOR_PREFETCH_REQUESTS_MAX_TIME
>>
>> My problem now is with the last 2, as I am not able to find any
>> documentation about them. I am not sure what they do, so I have no
>> idea of the consequences of the change in their values.
>> What are they about?
>>
>> Cheers,
>> Jose
>>
>> On Wed, 7 Jul 2021 at 15:32, Jose Caballero
>> (<jcaballero.hep@xxxxxxxxx>) wrote:
>> >
>> > Hi,
>> >
>> >
>> > We are in the process of upgrading our Central Managers from version
>> > 8.6.13 to 8.8.12.
>> > As expected, there are a few changes in the configuration, in
>> > particular in the default settings. Some of the previous ones are
>> > gone, and some new ones have appeared.
>> >
>> > I have already filtered a bunch of variables that don't apply to a
>> > Linux-based Grid site.
>> > But I still have a list of variables to inspect.
>> >
>> > While I check the RELEASE NOTES, I was wondering if the developers, or
>> > someone who has done this upgrade recently, has any advice on which
>> > variables I should focus on, which new values should be replaced back
>> > by previous ones in order to keep functionalities, etc.
>> > I am just concerned that some of the new defaults may drastically
>> > change the behaviour of our pool.
>> >
>> > Here [1] is the reduced diff, and here [2] is a link to a temporary
>> > google doc with the same content in an easier to read format :)
>> >
>> >
>> > Thanks a lot in advance.
>> > Cheers,
>> > Jose
>> >
>> >
>> > =======================================================
>> > [1]
>> >
>> > diff 8.6.13 8.8.12
>> >
>> > > C_GAHP_CONTACT_SCHEDD_DELAY = 5
>> > > C_GAHP_MAX_FILE_REQUESTS = 10
>> > > CGAHP_SCHEDD_INTERACTION_TIME = 5
>> > > COLLECTOR_HOST_FOR_NEGOTIATOR = $(FULL_HOSTNAME)
>> > > COLLECTOR_QUERY_MAX_WORKTIME = 0
>> > > COLLECTOR_QUERY_WORKERS_PENDING = 50
>> > > COLLECTOR_QUERY_WORKERS_RESERVE_FOR_HIGH_PRIO = 1
>> > > CONDOR_Q_SHOW_OLD_SUMMARY = false
>> >
>> > < CURB_MATCHMAKING = false
>> > > CURB_MATCHMAKING = RecentDaemonCoreDutyCycle > 0.98
>> >
>> > > DAGMAN_AGGRESSIVE_SUBMIT = false
>> > > DAGMAN_ALLOW_ANY_NODE_NAME_CHARACTERS = false
>> >
>> > < DAGMAN_ALLOW_LOG_ERROR = false
>> >
>> > < DAGMAN_MAX_SUBMITS_PER_INTERVAL = 5
>> > > DAGMAN_MAX_SUBMITS_PER_INTERVAL = 100
>> >
>> > > DAGMAN_QUEUE_UPDATE_INTERVAL = 300
>> > > DAGMAN_REPORT_GRAPH_METRICS = false
>> >
>> > < DATABASE_PURGE_INTERVAL =
>> > < DATABASE_REINDEX_INTERVAL =
>> > < DBMSD = $(SBIN)/condor_dbmsd
>> > < DBMSMANAGER_NAME =
>> >
>> > < DEFAULT_JOB_MAX_RETRIES = 10
>> > > DEFAULT_JOB_MAX_RETRIES = 2
>> >
>> > > DEFRAG_DRAINING_START_EXPR = FALSE
>> > > DELEGATE_FULL_JOB_GSI_CREDENTIALS = false
>> >
>> > < EMAIL_NOTIFICATION_CC =
>> > < ENABLE_ADDRESS_REWRITING = true
>> >
>> > > ENABLE_HTTP_PUBLIC_FILES = false
>> > > ENABLE_MULTIFILE_TRANSFER_PLUGINS = false
>> >
>> > < ENABLE_WEB_SERVER = false
>> >
>> > < FILE_TRANSFER_DISK_LOAD_THROTTLE =
>> > > FILE_TRANSFER_DISK_LOAD_THROTTLE = 2.0
>> >
>> > > FILE_TRANSFER_STATS_LOG = $(LOG)/transfer_history
>> > > GAHP_SSL_CADIR =
>> > > GAHP_SSL_CAFILE =
>> >
>> > < HISTORY_HELPER_MAX_CONCURRENCY = 2
>> > > HISTORY_HELPER_MAX_CONCURRENCY = 50
>> >
>> > > HTTP_PUBLIC_FILES_ADDRESS = 127.0.0.1:80
>> > > HTTP_PUBLIC_FILES_ROOT_DIR = /usr/share/nginx/html
>> > > HTTP_PUBLIC_FILES_STALE_AGE = 604800
>> >
>> > < IS_OWNER = (START =?= False)
>> > > IS_OWNER = False
>> >
>> > > JOB_DEFAULT_LEASE_DURATION = 2400
>> >
>> > < JOB_PROXY_OVERRIDE_FILE =
>> >
>> > > JOB_ROUTER_ROUND_ROBIN_SELECTION = false
>> > > KEYRING_SESSION_CREATION_TIMEOUT = 20
>> >
>> > < MASTER_SQLLOG =
>> >
>> > > MAX_ACCEPTS_PER_CYCLE = 8
>> > > MAX_CONCURRENT_DOWNLOADS = 100
>> > > MAX_CONCURRENT_UPLOADS = 100
>> > > MAX_DRAINING_ACTIVATION_DELAY = 20
>> > > MAX_PENDING_STARTD_CONTACTS = 0
>> > > MAX_REAPS_PER_CYCLE = 0
>> > > MAX_REMAP_RECURSIONS = 128
>> >
>> > < MAX_RUNNING_SCHEDULER_JOBS_PER_OWNER =
>> > > MAX_RUNNING_SCHEDULER_JOBS_PER_OWNER = 200
>> >
>> > > MAX_TIMER_EVENTS_PER_CYCLE = 3
>> > > MAX_UDP_MSGS_PER_CYCLE = 1
>> >
>> > < MAX_XML_LOG = 1900000000
>> >
>> > < MOUNT_UNDER_SCRATCH =
>> > > MOUNT_UNDER_SCRATCH = "/tmp,/var/tmp"
>> >
>> > > NEGOTIATOR_DEPTH_FIRST = false
>> > > NEGOTIATOR_JOB_CONSTRAINT =
>> > > NEGOTIATOR_MAX_TIME_PER_CYCLE = 1200
>> >
>> > < NEGOTIATOR_MAX_TIME_PER_SCHEDD = 31536000
>> > > NEGOTIATOR_MAX_TIME_PER_SCHEDD = 120
>> >
>> > < NEGOTIATOR_PREFETCH_REQUESTS = false
>> > < NEGOTIATOR_PREFETCH_REQUESTS_MAX_TIME = 120
>> > > NEGOTIATOR_PREFETCH_REQUESTS_MAX_TIME = 60
>> > > NEGOTIATOR_PREFETCH_REQUESTS = true
>> >
>> > < NEGOTIATOR_RESOURCE_REQUEST_LIST_SIZE = 20
>> > > NEGOTIATOR_RESOURCE_REQUEST_LIST_SIZE = 200
>> >
>> > > NEGOTIATOR_SOCKET_CACHE_SIZE = 500
>> >
>> > < OBITUARY_LOG_LENGTH = 20
>> > > OBITUARY_LOG_LENGTH = 200
>> >
>> > < PREEN_MAX_SCHEDD_CONNECTION_TIME = 20
>> >
>> > > SCHEDD_ALLOW_LATE_MATERIALIZE = true
>> > > SEC_CREDENTIAL_REFRESH_INTERVAL = -1
>> >
>> > < SHARED_PORT_ADDRESS_REWRITING = false
>> > < STARTD_COMPUTE_AVAIL_STATS = false
>> > < STARTD_MAX_AVAIL_PERIOD_SAMPLES = 100
>> >
>> > < START_SCHEDULER_UNIVERSE = TotalSchedulerJobsRunning < 200
>> > > START_SCHEDULER_UNIVERSE = TotalSchedulerJobsRunning < 500
>> >
>> > > SUBMIT_DEFAULT_SHOULD_TRANSFER_FILES =
>> >
>> > < SUBMIT_SKIP_FILECHECKS =
>> > > SUBMIT_SKIP_FILECHECKS = true
>> >
>> > < SYSTEM_VALID_SPOOL_FILES = job_queue.log, job_queue.log.tmp, history, Accountant.log, Accountantnew.log, local_univ_execute, .quillwritepassword, .pgpass, .schedd_address, .schedd_address.super, .schedd_classad, OfflineLog
>> > > SYSTEM_VALID_SPOOL_FILES = job_queue.log, job_queue.log.tmp, history, Accountant.log, Accountantnew.log, local_univ_execute, .pgpass, .schedd_address, .schedd_address.super, .schedd_classad, OfflineLog
>> >
>> > > TRUST_LOCAL_UID_DOMAIN = true
>> >
>> > < WARN_ON_UNUSED_SUBMIT_FILE_MACROS =
>> > < WEB_ROOT_DIR =
>> >
>> > > WARN_ON_UNUSED_SUBMIT_FILE_MACROS = true
>> >
>> >
>> > [2]
>> > https://docs.google.com/document/d/1x43ksFjfDGLozJsVbMABJmtZeGR4i2XFgzGYqlXr5YA/edit?usp=sharing
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/