Single GPU benchmark scripts #15514
davidrohr left a comment:
I didn't check anything in detail, but these are the things that immediately came to mind:
| # ROCm library injection is only useful for HIP runs. Keep it off by default for CUDA/NVIDIA containers, | ||
| # because mixed AMD/NVIDIA hosts can otherwise leak ROCm libraries into LD_LIBRARY_PATH. | ||
| if [[ "${GPUTYPE:-}" == "HIP" && "0$BENCH_AUTO_ROCM_LIBS" == "01" ]]; then |
With newer bash versions you can just use $BENCH_AUTO_ROCM_LIBS == 1.
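To make the suggestion concrete, here is a minimal sketch comparing the two forms. Inside `[[ ... ]]` bash does not word-split the expansion, so an empty or unset variable simply compares unequal to 1 and the "0" prefix is not needed. This assumes `set -u` is not in effect; the function names `rocm_libs_prefixed` and `rocm_libs_plain` are made up for this illustration and are not part of the PR.

```shell
#!/usr/bin/env bash
# Form used in the diff: the "0" prefix keeps the left operand non-empty.
rocm_libs_prefixed() {
  [[ "${GPUTYPE:-}" == "HIP" && "0$BENCH_AUTO_ROCM_LIBS" == "01" ]]
}

# Simplified form: [[ ]] copes with an empty or unset operand on its own
# (assumption: the script does not run under `set -u`).
rocm_libs_plain() {
  [[ "${GPUTYPE:-}" == "HIP" && $BENCH_AUTO_ROCM_LIBS == 1 ]]
}

# Compare both forms for an empty value, 0 and 1.
GPUTYPE=HIP
for BENCH_AUTO_ROCM_LIBS in "" 0 1; do
  if rocm_libs_prefixed; then prefixed=yes; else prefixed=no; fi
  if rocm_libs_plain; then plain=yes; else plain=no; fi
  echo "value='$BENCH_AUTO_ROCM_LIBS' prefixed=$prefixed plain=$plain"
done
```

Both functions should agree for every value in the loop, which is the point of the comment: the shorter test behaves the same for these inputs.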
| export DPL_REPORT_PROCESSING="${DPL_REPORT_PROCESSING:-1}" | ||
| export FST_TMUX_NO_EPN="${FST_TMUX_NO_EPN:-1}" |
Not needed, since start_tmux.sh is not used.
| # ---------------------------------------------------------------------------------------------------------------------- | ||
| # Locate original workflow script. Keep the original untouched. | ||
| : "${GEN_TOPO_MYDIR:=$(dirname "$(realpath "$0")")}" |
Why don't you simply use $O2_ROOT/dpl-workflow.sh?
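A rough sketch of what resolving the script through the environment could look like, instead of deriving it from the location of the benchmark script. The path is taken verbatim from the comment above; the helper name `locate_workflow_script` and the error messages are hypothetical and only serve the illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the workflow script under $O2_ROOT,
# or fail with a message if the environment is not set up as expected.
locate_workflow_script() {
  if [[ -z "${O2_ROOT:-}" ]]; then
    echo "O2_ROOT is not set; load the O2 environment first" >&2
    return 1
  fi
  # Path as written in the review comment.
  local script="$O2_ROOT/dpl-workflow.sh"
  if [[ ! -f "$script" ]]; then
    echo "workflow script not found: $script" >&2
    return 1
  fi
  printf '%s\n' "$script"
}
```

A caller could then use `WORKFLOW_SCRIPT="$(locate_workflow_script)" || exit 1` and leave the original script untouched, which matches the intent stated in the diff comment.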
| export WORKFLOW_PARAMETERS="${WORKFLOW_PARAMETERS:-GPU,CTF}" | ||
| export GPUTYPE="${GPUTYPE:-CUDA}" | ||
| export NGPUS=1 | ||
| export NUMAGPUIDS=1 |
NUMAGPUIDS and NUMAID should not be set if NUMA pinning is not used.
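One way to follow this is to touch the two variables only when pinning is explicitly requested. The sketch below assumes an opt-in switch named `BENCH_NUMA_PINNING`, which is hypothetical and does not exist in the PR. The `NUMAID` fallback of 0 is likewise an assumption for illustration; only `NUMAGPUIDS=1` comes from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export the NUMA-related variables only when the
# (made-up) switch BENCH_NUMA_PINNING is set to 1; otherwise leave them
# unset so the defaults of the sourced environment scripts apply.
configure_numa_pinning() {
  if [[ "${BENCH_NUMA_PINNING:-0}" == 1 ]]; then
    export NUMAGPUIDS=1            # value taken from the diff
    export NUMAID="${NUMAID:-0}"   # assumption: NUMA domain 0 as fallback
  else
    unset NUMAGPUIDS NUMAID
  fi
}
```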
| export EPNSYNCMODE="${EPNSYNCMODE:-0}" | ||
| export SYNCMODE="${SYNCMODE:-1}" | ||
| export SYNCRAWMODE="${SYNCRAWMODE:-0}" | ||
| export TIMEFRAME_RATE_LIMIT="${TIMEFRAME_RATE_LIMIT:-5}" | ||
| export GEN_TOPO_NO_TF_RATE_UPSCALING="${GEN_TOPO_NO_TF_RATE_UPSCALING:-1}" | ||
| export DISABLE_ROOT_OUTPUT="${DISABLE_ROOT_OUTPUT:-1}" | ||
| # Double pipeline requires zsraw input. Therefore default to raw TF input, not CTF. | ||
| export CTFINPUT="${CTFINPUT:-0}" | ||
| export RAWTFINPUT="${RAWTFINPUT:-1}" | ||
| export DIGITINPUT="${DIGITINPUT:-0}" | ||
| export EXTINPUT="${EXTINPUT:-0}" |
Why do you redefine all the defaults that come from setenv.sh?
I would set only the settings you actually need.
That should be:
SYNCMODE=1
TIMEFRAME_RATE_LIMIT=5
RAWTFINPUT=1
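As a sketch, the reduced block could look like the following, assuming the three settings listed above are really the only ones the benchmark has to override and everything else is left to setenv.sh. The `${VAR:-default}` form is kept from the original diff so a caller can still override each value; the wrapper function name `set_benchmark_defaults` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: set only the benchmark-specific defaults.
# All other settings are left to setenv.sh.
set_benchmark_defaults() {
  export SYNCMODE="${SYNCMODE:-1}"
  export TIMEFRAME_RATE_LIMIT="${TIMEFRAME_RATE_LIMIT:-5}"
  export RAWTFINPUT="${RAWTFINPUT:-1}"
}

set_benchmark_defaults
```

Running, for example, `TIMEFRAME_RATE_LIMIT=10 ./benchmark.sh` (script name made up) would then keep the caller's value, while an unset variable falls back to the benchmark default.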
| source "$PWD/local_env.sh" | ||
| fi | ||
| export ALICE_O2_FST="${ALICE_O2_FST:-1}" |
This is a hack for running on MI100; I would not put it in this script.
| export ALICE_O2_FST="${ALICE_O2_FST:-1}" | ||
| if [[ -f "$GEN_TOPO_MYDIR/setenv.sh" ]]; then |
dpl-workflow.sh already sources setenv.sh, so why do you source it here?
| # Let O2/core dumps land in the benchmark run directory, not in the original working directory. | ||
| export CORE_DUMP_DIR="${CORE_DUMP_DIR:-$RUNDIR}" | ||
| export O2_CORE_DUMP_DIR="${O2_CORE_DUMP_DIR:-$RUNDIR}" | ||
| export FAIRMQ_SHM_MONITOR_CONFIG="${FAIRMQ_SHM_MONITOR_CONFIG:-}" |
We do not run the SHM MONITOR, so why do you need this?
This PR adds two scripts that benchmark single-GPU performance.