HPL .dat config file

ASC18

The following is the HPL .dat configuration file template from ASC18.

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
67200 65280 62976 65280 96000 65280 38400 96000 102400 168960 153600 76800   142848 153600 142848 124416 96256 142848 124416 115200 110592 96256 Ns
1             # of NBs
384 768 384 768 1024 768 896 768 1024 512 384 640 768 896 960 1024 1152 1280 384 640 960 768 640 256  960 512 768 1152         NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2 1 2 1        Ps
1 2 2 4        Qs
16.0         threshold
1            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
2 8          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
2 0 2          BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
1            SWAP (0=bin-exch,1=long,2=mix)
192          swapping threshold
1            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

ASC20

The following is the HPL .dat configuration file template from ASC20. Machine Spec : 8 Tesla V100

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
2            # of problems sizes (N)
175104 178176 165888 168960 172032 175104 Ns
2             # of NBs
384 256 128 256 384 192 288 320 384 384 768 1024 768 896 768 1024 512 384 640 768 896 960 1024 1152 1280 384 640 960 768 640 256  960 512 768 1152         NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
4 2 8 1 2 1        Ps
4 8 2 2 4        Qs
16.0         threshold
1            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
2 8          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
2 0 2          BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
1            SWAP (0=bin-exch,1=long,2=mix)
192          swapping threshold
1            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

SC21

The following is the HPL .dat configuration file template from SC20. Machine Spec : 8 Tesla A100

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
2            # of problems sizes (N)
346122 348122 352122 Ns
2             # of NBs
384 256 128 NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
4 2 8 1 2 1        Ps
4 8 2 2 4        Qs
16.0         threshold
1            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
2 8          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
2 0 2          BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
1            SWAP (0=bin-exch,1=long,2=mix)
192          swapping threshold
1            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

Binder

#!/bin/bash
cd $1
# Global settings

export UCX_RNDV_SCHEME=put_zcopy
export UCX_IB_PCI_RELAXED_ORDERING=on
export UCX_MEMTYPE_CACHE=n
export UCX_MAX_RNDV_RAILS=1
export UCX_RNDV_THRESH=8192

APP="$2"
me=`hostname`
lrank=$OMPI_COMM_WORLD_LOCAL_RANK

case ${lrank} in
0)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    source ../source.sh
    export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
1)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
2)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
3)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
4)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
5)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
6)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;
7)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;
8)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP"
   source ../source.sh
    export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
9)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
10)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
11)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
12)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
13)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
14)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;
15)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;
16)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    source ../source.sh
    export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
17)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
18)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
19)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
20)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
21)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
22)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;
23)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;

24)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    source ../source.sh
    export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
25)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
26)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
27)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
28)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
29)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
30)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;
31)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;
32)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    source ../source.sh
    export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
33)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
34)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
35)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
36)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
37)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
38)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;
39)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;
40)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    source ../source.sh
    export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
41)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
42)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
43)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
44)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
45)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
46)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;
47)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;
48)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    source ../source.sh
    export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
49)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
50)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
51)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
52)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
53)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
54)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;
55)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;
56)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    source ../source.sh
    export CUDA_VISIBLE_DEVICES=2; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
57)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP"
    export CUDA_VISIBLE_DEVICES=3; numactl --cpunodebind=0 taskset -c 0-23 $APP
  ;;
58)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=7; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
59)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP"
    export CUDA_VISIBLE_DEVICES=6; numactl --cpunodebind=2  taskset -c 48-71 $APP
  ;;
60)
#ldd $APP
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
61)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP"
    export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 taskset -c 24-47 $APP
  ;;
62)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=5; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;
63)
echo "host=$me rank= $OMPI_COMM_WORLD_RANK lrank = $lrank cores=$CPU_CORES_PER_RANK bin=$APP"
  #set GPU and CPU affinity of local rank
    echo "export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP"
    export CUDA_VISIBLE_DEVICES=4; numactl --cpunodebind=3 taskset -c 72-95 $APP
  ;;

esac