Architecture¶
Basic overview¶
Submitting an interactive job to the RCE by issuing the following command:
rce_submit.py -r -a shell
performs the following tasks:
- Creates a classad for the job and submits this classad to the central manager. A truncated version of such a generated classad is reproduced below.
HMDCUseXpra = true
Email = "evansarm@gmail.com"
AccountingGroup = "group_interactive.esarmien"
User = "esarmien@hmdc.harvard.edu"
OnExitHold = false
MyType = "Job"
PeriodicHold = false
PeriodicRemove = false
Err =
strcat("/nfs/home/E/esarmien/.HMDC/jobs/interactive","/","shell_2.31.3","_",ClusterId,"_","201510271445984022","/err.txt")
ProcId = 0
HMDCApplicationName = "shell"
AcctGroupUser = "esarmien"
JobUniverse = 5
In = "/dev/null"
HMDCApplicationVersion = "2.31.3"
Requirements = true && TARGET.OPSYS == "LINUX" && TARGET.ARCH ==
"X86_64" && TARGET.FileSystemDomain == MY.FileSystemDomain &&
TARGET.Disk >= RequestDisk && TARGET.Memory >= RequestMemory
LocalJobDir =
strcat("/nfs/home/E/esarmien/.HMDC/jobs/interactive","/","shell_2.31.3","_",ClusterId,"_","201510271445984022")
PublicClaimId = "<10.0.0.34:9619>#1442840844#313#..."
WhenToTransferOutput = "ON_EXIT"
Environment = "_='/usr/bin/rce_submit.py'
XAUTHORITY='/nfs/home/E/esarmien/.Xauthority'
MY_INTERACTIVE_JOB_DIR='/nfs/home/E/esarmien/.HMDC/jobs/interactive'
LOCATE_PATH=':/nfs/tools/lib/locate/locate.db::/nfs/tools/lib/locate/locate.db::/nfs/tools/lib/locate/locate.db'
KDE_IS_PRELINKED='1'
HMDC_PROD_KEYS_PATH='/nfs/home/E/esarmien/.hmdc_prod_keys'
rvm_version='1.26.11 (latest)' NX_ROOT='/nfs/home/E/esarmien/.nx'
COLORTERM='gnome-terminal' LINES='43'
rvm_path='/nfs/home/E/esarmien/.rvm' LESSOPEN='||/usr/bin/lesspipe.sh
%s' RUBY_VERSION='ruby-2.2.1'
_CONDOR_ENTITLEMENTS='\"admin,mail_manager_root,hmdcOpsview,rt,cvs,jabber,desktopadmin,jenkins,login_manager_rce,bomgar\"'
LOGNAME='esarmien' USER='esarmien' HOME='/nfs/home/E/esarmien'
PATH='/opt/bin:/usr/local/bin:/opt/bin:/usr/local/bin:/nfs/home/E/esarmien/.rvm/gems/ruby-2.2.1/bin:/nfs/home/E/esarmien/.rvm/gems/ruby-2.2.1@global/bin:/nfs/home/E/esarmien/.rvm/rubies/ruby-2.2.1/bin:/opt/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/HMDC/sbin:/usr/local/HMDC/bin:/nfs/home/E/esarmien/bin:/nfs/home/E/esarmien/.rvm/bin:/nfs/home/E/esarmien/.rvm/bin:/nfs/home/E/esarmien/.rvm/bin:/nfs/home/E/esarmien/bin:/usr/local/HMDC/sbin:/usr/local/HMDC/bin:/nfs/home/E/esarmien/bin:/usr/local/HMDC/sbin:/usr/local/HMDC/bin:/nfs/home/E/esarmien/bin:/nfs/home/E/esarmien/.rvm/bin:/nfs/home/E/esarmien/.rvm/bin:/nfs/home/E/esarmien/.rvm/bin'
MFINPUTS=':/usr/share/texlive/texmf-dist/fonts/source/public/mathabx::/usr/share/texlive/texmf-dist/fonts/source/public/mathabx::/usr/share/texlive/texmf-dist/fonts/source/public/mathabx:'
HMDC_DEV_KEYS_PATH='/nfs/home/E/esarmien/.hmdc_dev_keys'
_CONDOR_EMAIL='\"esarmien@g.harvard.edu\"'
LD_LIBRARY_PATH='/opt/lib64:/opt/lib:/nfs/home/E/esarmien/lib'
SSH_AGENT_PID='17080' LANG='en_US.UTF-8' TERM='xterm' SHELL='/bin/bash'
CVS_RSH='ssh'
XDG_SESSION_COOKIE='eb5986295c70101fb32f63f90000001a-1445872821.791472-194473865'
SESSION_MANAGER='local/unix:@/tmp/.ICE-unix/17079,unix/unix:/tmp/.ICE-unix/17079'
SHLVL='2' _system_arch='x86_64' NXDIR='/usr/NX' G_BROKEN_FILENAMES='1'
NX_CLIENT='/usr/NX/bin/nxclient' HISTSIZE='1000' WINDOWID='41943043'
ORBIT_SOCKETDIR='/scratch/orbit-esarmien' XMODIFIERS='@im=none'
IMSETTINGS_INTEGRATE_DESKTOP='yes' GIO_LAUNCHED_DESKTOP_FILE_PID='3453'
NX_SYSTEM='/usr/NX'
MY_BATCH_JOB_DIR='/nfs/home/E/esarmien/.HMDC/jobs/batch'
GEM_PATH='/nfs/home/E/esarmien/.rvm/gems/ruby-2.2.1:/nfs/home/E/esarmien/.rvm/gems/ruby-2.2.1@global'
rvm_bin_path='/nfs/home/E/esarmien/.rvm/bin' USERNAME='esarmien'
IMSETTINGS_MODULE='none' GTK_IM_MODULE='gtk-im-context-simple'
GIO_LAUNCHED_DESKTOP_FILE='/usr/share/applications/gnome-terminal.desktop'
_system_version='6' rvm_prefix='/nfs/home/E/esarmien'
HMDC_ADMIN_PATH='/nfs/home/E/esarmien/git/hmdc-admin' NX_TEMP='/tmp'
T1FONTS=':/usr/share/texlive/texmf-dist/fonts/type1/public/mathabx-type1::/usr/share/texlive/texmf-dist/fonts/type1/public/mathabx-type1::/usr/share/texlive/texmf-dist/fonts/type1/public/mathabx-type1:'
SSH_AUTH_SOCK='/scratch/keyring-11bgaB/socket.ssh'
IRBRC='/nfs/home/E/esarmien/.rvm/rubies/ruby-2.2.1/.irbrc'
_system_type='Linux'
MY_RUBY_HOME='/nfs/home/E/esarmien/.rvm/rubies/ruby-2.2.1'
ACRO_ENABLE_FONT_CONFIG='1' COLUMNS='166' SPSSTMPDIR='/scratch'
_system_name='CentOS' NX_SESSION_ID='2FB0CB7AC10E341D8605AEC17D60E014'
TMPDIR='/scratch'
TEXINPUTS=':/usr/local/share/texlive/texmf:/usr/share/texlive/texmf-dist:/usr/share/texmf:/usr/local/share/texmf/hmdc/misc:/usr/share/texlive/texmf-dist/tex/latex/powerdot:/usr/local/share/texmf/hmdc/imsart:/usr/share/texlive/texmf-dist/tex/generic/mathabx::/usr/local/share/texlive/texmf:/usr/share/texlive/texmf-dist:/usr/share/texmf:/usr/local/share/texmf/hmdc/misc:/usr/share/texlive/texmf-dist/tex/latex/powerdot:/usr/local/share/texmf/hmdc/imsart:/usr/share/texlive/texmf-dist/tex/generic/mathabx::/usr/local/share/texlive/texmf:/usr/share/texlive/texmf-dist:/usr/share/texmf:/usr/local/share/texmf/hmdc/misc:/usr/share/texlive/texmf-dist/tex/latex/powerdot:/usr/local/share/texmf/hmdc/imsart:/usr/share/texlive/texmf-dist/tex/generic/mathabx::/usr/share/texmf/tex/latex/ppower4:/usr/lib/R/share/texmf:/usr/share/latex2html/texinputs::/usr/share/texmf/tex/latex/ppower4:/usr/lib/R/share/texmf:/usr/share/latex2html/texinputs::/usr/share/texmf/tex/latex/ppower4:/usr/lib/R/share/texmf:/usr/share/latex2html/texinputs:'
CVSEDITOR='vi' KDEDIRS='/usr' JAVA_HOME='/usr/lib/jvm/java-1.8.0/jre'
DISPLAY=':1004.0' NX_CUPS_BIN='/usr/bin'
BIBINPUTS='.:/nfs/home/E/esarmien/gkbibtex:.:/nfs/home/E/esarmien/gkbibtex:.:/nfs/home/E/esarmien/gkbibtex::'
OLDPWD='/nfs/home/E/esarmien/projects/rce-interactive-tools'
HOSTNAME='dev-rce6-2.hmdc.harvard.edu'
BSTINPUTS='.:/nfs/home/E/esarmien/gkbibtex:.:/nfs/home/E/esarmien/gkbibtex:.:/nfs/home/E/esarmien/gkbibtex::'
TEXMFLOCAL=':/usr/share/texlive/texmf:/usr/share/texlive/texmf-dist:/usr/share/texmf:/usr/local/share/texmf::/usr/share/texlive/texmf:/usr/share/texlive/texmf-dist:/usr/share/texmf:/usr/local/share/texmf::/usr/share/texlive/texmf:/usr/share/texlive/texmf-dist:/usr/share/texmf:/usr/local/share/texmf:'
HISTCONTROL='ignoredups' PWD='/nfs/home/E/esarmien' QT_IM_MODULE='xim'
GTK_RC_FILES='/etc/gtk/gtkrc:/nfs/home/E/esarmien/.gtkrc-1.2-gnome2'
MAIL='/var/spool/mail/esarmien'
LS_COLORS='rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:'
GEM_HOME='/nfs/home/E/esarmien/.rvm/gems/ruby-2.2.1'
TFMFONTS=':/usr/share/texlive/texmf-dist/fonts/tfm/public/mathabx::/usr/share/texlive/texmf-dist/fonts/tfm/public/mathabx::/usr/share/texlive/texmf-dist/fonts/tfm/public/mathabx:'"
TargetType = "Machine"
LeaveJobInQueue = false
JobNotification = 1
Owner = "esarmien"
CondorPlatform = "$CondorPlatform: x86_64_RedHat6 $"
JobLeaseDuration = 1200
RecentBlockWriteKbytes = 0
TransferIn = false
ExitStatus = 0
RootDir = "/"
NumJobMatches = 1
JobCurrentStartDate = 1445969625
HMDCInteractive = true
Args = false
CondorVersion = "$CondorVersion: 8.2.9 Aug 13 2015 BuildID: 335839 $"
Out =
strcat("/nfs/home/E/esarmien/.HMDC/jobs/interactive","/","shell_2.31.3","_",ClusterId,"_","201510271445984022","/out.txt")
ShouldTransferFiles = "NO"
FileSystemDomain = "hmdc.harvard.edu"
JobPrio = 0
NumCkpts = 0
DebugPrepareJobHook = true
BufferBlockSize = 32768
ImageSize = 325000
StatsLifetimeStarter = 3286729
Cmd = "/usr/bin/gnome-terminal"
Iwd = "/nfs/home/E/esarmien"
AcctGroup = "group_interactive"
Entitlements = "admin mail_manager_root hmdcOpsview rt cvs jabber
desktopadmin jenkins login_manager_rce bomgar"
- Some of these classad elements are added by rce_submit.py and have effects on job scheduling.
- Upon submission, rce_submit.py polls the central manager every five seconds to check whether the job has started, for a maximum of 90 seconds. The POLL_TIMEOUT is set in hmdccondor/HMDCCondor.py.
- When the poller finds that the submitted classad has JobStatus == 2 (running), rce_submit.py launches Xpra to connect to the running job’s Xpra server.
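The polling step can be sketched as follows. This is a minimal illustration, not the code in hmdccondor/HMDCCondor.py: the function name wait_until_running and the get_job_status callable (a stand-in for the central manager query) are hypothetical, and only the five-second interval, the 90-second timeout, and the JobStatus == 2 check come from the description above.

```python
import time

POLL_INTERVAL = 5   # seconds between queries, as described above
POLL_TIMEOUT = 90   # stop polling after this many seconds, as described above
RUNNING = 2         # HTCondor JobStatus value for a running job


def wait_until_running(get_job_status, interval=POLL_INTERVAL,
                       timeout=POLL_TIMEOUT, sleep=time.sleep):
    """Poll until the job reports RUNNING or the timeout elapses.

    get_job_status is a hypothetical callable that returns the job's
    current JobStatus; the real query to the central manager is not shown.
    """
    waited = 0
    while waited <= timeout:
        if get_job_status() == RUNNING:
            return True   # the caller would now launch the Xpra client
        sleep(interval)
        waited += interval
    return False          # the job did not start within the timeout
```

Passing the sleep function as a parameter keeps the loop testable without waiting in real time.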
Server-side operations¶
Starting a job¶
Most of the work is performed by HTCondor startd or execute nodes.
When a job submitted using rce_submit.py starts running, the execute node runs HMDC_interactive_prepare_job, as configured in hmdc-admin/templates/etc-condor-config.d/compute.config.erb:
HMDC_HOOK_UPDATE_JOB_INFO = $(PIP_BINDIR)/HMDC_periodic_job_is_idle.py
HMDC_HOOK_PREPARE_JOB = $(PIP_BINDIR)/HMDC_interactive_prepare_job.py
HMDC_HOOK_JOB_EXIT = $(PIP_BINDIR)/HMDC_clean_up.py
STARTER_JOB_HOOK_KEYWORD = HMDC
HMDC_interactive_prepare_job performs the following tasks:
- Creates a job directory under $HOME/.HMDC/jobs/interactive, specified by the classad element LocalJobDir, to house stdout, stderr, and console output from the job. When the Xpra server is started, its output is also written to this directory.
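As a rough sketch of this step, the snippet below builds a directory name with the same shape as the strcat() expression in the LocalJobDir classad element and creates it. The function names are hypothetical and the real hook may do more (for example, setting ownership or permissions); the out.txt and err.txt file names are taken from the Out and Err elements of the sample classad.

```python
import os


def local_job_dir(base, app_name, app_version, cluster_id, timestamp):
    """Mirror the strcat() expression shown in the LocalJobDir element."""
    return "{0}/{1}_{2}_{3}_{4}".format(
        base, app_name, app_version, cluster_id, timestamp)


def prepare_job_dir(path):
    """Create the job directory if needed; return the (stdout, stderr) paths."""
    if not os.path.isdir(path):
        os.makedirs(path)
    return os.path.join(path, "out.txt"), os.path.join(path, "err.txt")
```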
After HMDC_interactive_prepare_job successfully completes, HMDC_job_wrapper.py executes the command specified in the job classad by:
- Setting ulimits on the executing job based on the slot’s memory and CPU allocation.
- Executing an Xpra server which runs the command specified in the job classad.
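One way to apply a memory ulimit from Python is the standard resource module. The sketch below is an assumption about how such a limit could be set, not a copy of HMDC_job_wrapper.py: it caps only the address space (RLIMIT_AS), leaves CPU limits out, and assumes the slot's memory is given in megabytes, as HTCondor reports the Memory attribute. The function names are hypothetical.

```python
import resource


def memory_limit_bytes(slot_memory_mb):
    """Convert a slot memory allocation in megabytes to bytes."""
    return int(slot_memory_mb) * 1024 * 1024


def apply_memory_limit(slot_memory_mb):
    """Lower the soft address-space limit of this process and its children."""
    limit = memory_limit_bytes(slot_memory_mb)
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # The soft limit may not exceed the existing hard limit.
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
    return limit
```

A wrapper would typically call apply_memory_limit() just before starting the Xpra server, so that the limit is inherited by the job's command.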
Counting idle time¶
HMDC_periodic_job_is_idle.py is run periodically for each job running on an execute node, as configured in hmdc-admin/templates/etc-condor-config.d/compute.config.erb. HMDC_periodic_job_is_idle.py performs the following functions:
- Opens $TEMP/.idletime and reads the integer representing the total time a job was idle. Here, $TEMP is the job’s execute directory under /tmp/condor/execute.
- If the job is currently idle, HMDC_periodic_job_is_idle.py subtracts the mtime of $TEMP/.idletime from the current time and adds this value to the idle time value stored in $TEMP/.idletime.
- If the job is currently active, HMDC_periodic_job_is_idle.py writes 0 to $TEMP/.idletime.
- If the job is currently idle and has been idle for two or more days, HMDC_periodic_job_is_idle.py sends a notification to the job owner of the job’s impending preemptibility.
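The bookkeeping described above can be sketched as follows. This is an illustration under stated assumptions rather than the actual script: update_idletime and should_notify are hypothetical names, how a job is judged idle is not covered here (it is passed in as job_is_idle), and the counter is assumed to be in seconds because it is derived from mtime differences.

```python
import os
import time

TWO_DAYS = 2 * 24 * 60 * 60  # notification threshold from the text, in seconds


def update_idletime(path, job_is_idle, now=None):
    """Update the idle-time counter stored in path and return the new total."""
    now = time.time() if now is None else now
    try:
        with open(path) as handle:
            total = int(handle.read().strip() or 0)
        last_update = os.path.getmtime(path)
    except (IOError, OSError, ValueError):
        # First run or unreadable counter: start counting from now.
        total, last_update = 0, now
    if job_is_idle:
        # Add the time elapsed since the counter file was last written.
        total += int(now - last_update)
    else:
        # Any activity resets the counter.
        total = 0
    with open(path, "w") as handle:
        handle.write(str(total))
    os.utime(path, (now, now))
    return total


def should_notify(total_idle_seconds):
    """True once a job has been idle for two or more days."""
    return total_idle_seconds >= TWO_DAYS
```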
HMDC_periodic_job_is_idle.py calculates idle time for a job, but a different script propagates this value to the HTCondor collector.
Every five minutes, cron runs /usr/bin/HMDC_startd_cron_idle_generator.py. /usr/bin/HMDC_startd_cron_idle_generator.py performs the following tasks:
- For every running job, opens and reads the value in $TMP/.idletime.
- If idletime is greater than zero, generates a string composed of all Job IDs and idle times and writes this as a script to /usr/bin/HMDC_startd_cron_idle.sh, for example:
[root@dev-cod6-1 bin]# /usr/bin/HMDC_startd_cron_idle.sh
HMDCIdleJobs = "110.0,3702664 113.0,3637730 114.0,0 187.0,105483 192.0,105480 198.0,105484"
Every ten seconds, HTCondor executes /usr/bin/HMDC_startd_cron_idle.sh, which publishes the HMDCIdleJobs machine classad to the collector, as configured by the following settings:
STARTD_CRON_IDLEJOBS_AUTOPUBLISH = If_Changed
STARTD_CRON_IDLEJOBS_EXECUTABLE = /usr/bin/HMDC_startd_cron_idle.sh
STARTD_CRON_IDLEJOBS_MODE = Periodic
STARTD_CRON_IDLEJOBS_PERIOD = 10s
STARTD_CRON_JOBLIST = idlejobs
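The string produced by the generator can be sketched like this. The function names are hypothetical, and the layout of the generated shell script is an assumption; only the HMDCIdleJobs line format (space-separated "JobID,idletime" pairs) is taken from the sample output above. Jobs with an idle time of zero are kept, since the sample output includes one.

```python
def format_idle_jobs(idle_times):
    """Build the HMDCIdleJobs line from a {job_id: idle_seconds} mapping."""
    pairs = ["{0},{1}".format(job_id, seconds)
             for job_id, seconds in sorted(idle_times.items())]
    return 'HMDCIdleJobs = "{0}"'.format(" ".join(pairs))


def render_script(idle_times):
    """Return the text of a shell script that prints the classad line.

    Assumed layout: a one-line echo. The line is wrapped in single quotes
    so that its double quotes reach stdout unchanged.
    """
    return "#!/bin/sh\necho '{0}'\n".format(format_idle_jobs(idle_times))
```

Because STARTD_CRON_IDLEJOBS_AUTOPUBLISH is set to If_Changed, a deterministic ordering of the pairs (here, sorted by job ID string) avoids republishing when nothing has changed.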
Note
Unfortunately, both a system cron job and an HTCondor cron job are required, although this is not desirable. The .idletime file created by HMDC_periodic_job_is_idle.py is owned by the executing user, whereas scripts executed by STARTD_CRON run as user daemon and are unable to read .idletime files. Therefore, a root system cron job reads these files so that the STARTD_CRON job can access their contents.
ClassAd Elements¶
ClassAd Element | Accepted values | Description | Effect |
---|---|---|---|
HMDCUseXpra | True, False | Determines whether a job should use XPRA | If True, execute node treats this job as an XPRA-enabled interactive job, launching an XPRA server to execute the job’s command. |
AccountingGroup | group_interactive.$(Owner) or group_batch.$(Owner) | Places a job into an accounting group | If set to group_interactive.$(Owner), user’s job is limited by group_interactive’s quota. If set to group_batch.$(Owner), user’s job is limited by group_batch’s quota. In January 2016, quotas will be disabled in favor of multi-slot pre-emption and this ClassAd element will become deprecated. |
Err | Fully qualified path to stderr output | Location of stderr output file. | The running job’s stderr output will be redirected to this file. |
HMDCApplicationName | A human readable string denoting the running job’s application | A human readable string denoting the running job’s application | No effect, useful for statistics |
HMDCApplicationVersion | A string denoting the running job’s version | A string denoting the running job’s version | No effect, useful for statistics |
LocalJobDir | Fully qualified path of a directory | This directory will store stdout and stderr output from the running job. | LocalJobDir will be created by the HTCondor execute node, and stderr and stdout output from the job will be stored in this directory. |
Environment | A string of x=y pairs separated by spaces | The job’s environment | This is a standard HTCondor ClassAd element populated by rce_submit.py with the user’s shell environment, excluding GNOME and DBUS environment variables. |
HMDCInteractive | True, False | Determines whether a job should be treated as an interactive job | HMDCInteractive, when set to True, influences a number of HTCondor policy decisions regarding preemption. |
Entitlements | A string of entitlements separated by spaces | The user’s eduPersonEntitlements | Entitlements is populated by rce_submit.py by querying LDAP or, when using condor_submit, through the environment variable $_CONDOR_ENTITLEMENTS created by /etc/profile.d/Condor_group.sh or /etc/profile.d/Condor_group.csh. |
Out | Fully qualified path to stdout output | Location of stdout output file | The running job’s stdout output will be redirected to this file. |
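The Environment row describes dropping GNOME and DBUS variables before the environment is stored in the classad. A minimal sketch of that idea follows; the function names and the exact prefix list are assumptions, since the filter used by rce_submit.py is not shown here. The output format follows the NAME='value' pairs seen in the sample classad, with single quotes inside a value doubled as HTCondor's environment syntax requires; escaping of double quotes is omitted.

```python
def filter_environment(environ, excluded_prefixes=("GNOME", "DBUS")):
    """Return a copy of environ without variables matching the prefixes.

    The prefix list is an assumption based on the table description.
    """
    return dict((name, value) for name, value in environ.items()
                if not name.startswith(excluded_prefixes))


def to_condor_environment(environ):
    """Format a mapping as space-separated NAME='value' pairs."""
    return " ".join(
        "{0}='{1}'".format(name, value.replace("'", "''"))
        for name, value in sorted(environ.items()))
```

In practice the input would be something like dict(os.environ), filtered first and then formatted.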