ENVI: ENVI software for processing and analyzing geospatial data

Overview of package
1. General usage
Availability of package by cluster
Running IDL in batch mode
Multithreading and IDL

Overview of package

General information about package
Package:	ENVI
Description:	ENVI software for processing and analyzing geospatial data
For more information:	https://www.l3harrisgeospatial.com/Software-Technology/Enterprise-Solutions
Categories:	SoftwareDevelopment NumericalMethods Visualization
License:	Proprietary

General usage information

ENVI (ENvironment for Visualizing Images) is a software application used to process and analyze geospatial imagery. It is commonly used by remote sensing professionals and image analysts. ENVI, version 5.7, includes IDL (Interactive Data Language), version 8.9 This module will add the envi, idl and related commands to your path. NOTE: If running on an HPCC and requesting less than all cores on a node, you should be setting IDL_CPU_TPOOL_NTHREADS environmental variable to be equal to SLURM_NTASKS for best performance; for more information see https://hpcc.umd.edu/hpcc/help/software/envi.html

Available versions of the package ENVI, by cluster

This section lists the available versions of the package ENVIon the different clusters.

Available versions of ENVI on the Zaratan cluster

Available versions of ENVI on the Zaratan cluster
Version	Module tags	CPU(s) optimized for	GPU ready?
5.7	envi/5.7	x86_64	Y

Running IDL in batch mode

Although for short computations on a personal workstation the interactive mode of IDL is nice, sometimes one wishes to have IDL work in a batch-style mode. This is a requirement for using IDL on the HPC clusters, where computationally intensive processes must be submitted to the scheduler for running on the compute nodes.

The easiest method is probably to invoke idl in batch mode on a simple script file which then uses the .RUN IDL executive command to run a main program file containing the real code you wish to run. I.e., create a main program file the code you wish to run. For example, the following is a simple main program to print factorials, which we will place in a file called factorial_test.pro:

f = 1
max = 7
for k=1,max do begin
        f = k * f
        print, k, f
endfor
end

You should also create a simple IDL batch script to invoke this program, e.g. the batch_test.pro file below:

.run factorial_test.pro
exit

Be sure to include the exit command if you wish for IDL to terminate when the main program is finished. This is especially important if submitting to the HPC compute nodes via sbatch, as otherwise your job will not terminate when the calculation is finished, but wait until it hits the wall time limit, wasting CPU cycles (and charging your allocation account)

You can then invoke your main program from the unix command line with something like idl batch_test.pro (NOTE: you can omit the .pro extension in the idl and .run commands if so desired; the .pro extension will be used by default.) To use with sbatch, a job script like

#!/bin/bash
#SBATCH -ntasks=1
#SBATCH -t 15
#SBATCH --mem-per-cpu=1

. ~/.profile

module load idl

echo "Starting job ..."
idl batch_test

The two separate .pro files are required in general because in IDL batch mode, which batch_test.pro runs under, each line is read and executed immediately. Control statements, e.g. the for loop in our factorial_test example, often span multiple lines --- although you can use ampersands and dollar signs to continue lines, this quickly becomes messy. In main program parsing mode, such as used for factorial_test, the entire program unit is read and compiled as a single unit. Since IDL code run in batch mode or submitted to the HPC compute nodes is assumed to be complicated, it is probably best to use this two file approach.

a name="mthread">

Multithreading and IDL

Recent versions of IDL support multi-threading, at least to some degree. This means that on systems with more than one CPU and/or multiple cores per CPU socket, IDL will use multiple threads to do work in parallel when the application determines it is advantageous to do so. This is automatic, and invisible to the user except for hopefully improved performance.

By default, when IDL encounters a calculation that would benefit from multi-threading, it will generate a thread for every CPU core on the system it is running. This is probably the desired behavior when IDL is the only (or the main) program running on a system. E.g., on a desktop or dedicated compute node.

But on some HPC systems, the available cores per node can be somewhat large (e.g. nodes on DT2 have at least 20 cores), and might be more than the optimal degree of parallelization for certain problems. On these systems, for certain problems, you might not wish to allocation all the cores on the node to the system (since you will be charged for the time on those cores). However, if you restrict the number of cores that you are requesting, you must tell this restriction to IDL as well, because otherwise IDL will just try to use all the cores anyway, which could adversely affect performance.

E.g., assume that you have determined through trial and error that for the particular type of problem you are working on, the greatest efficiency occurs for 6 cores (e.g., a 6 core job is 40% faster than a 4 core job, but an 8 core job is only a few percent faster than a 6 core job). So you submit a bunch of 6 core idl jobs, with the --share flag so that you are not charged for all the cores on the node. And suppose that 3 of your jobs end up on the same 20 core node (since you told slurm these are 6 core jobs, three will fit on a 20 core node). If you do not tell IDL to restrict itself to 6 cores, each jobs will determine that there are 20 cores on the node, and assume it can use all of them, and multithreaded calculations will be split into 20 tasks. In addition to any performance hit from using too many tasks for each job, you also have up to 60 tasks trying to run on a 20 core system, which will further degrade performance.

To tell IDL to use less than all the cores it finds on the system, you need to set the IDL_CPU_TPOOL_NTHREADS environmental variable. By default, it is 0, which means use all cores on the system. You should set it equal to the number of cores your requested from Slurm; we recommend that you set it to the SLURM_NTASKS to ensure consistency between what you requested from Slurm and what you tell IDL. E.g.,

#!/bin/tcsh
#SBATCH -n 8
#SBATCH --share
#SBATCH -t 2:00
#SBATCH --mem=8000

module load idl
setenv IDL_CPU_TPOOL_NTHREADS $SLURM_NTASKS

idl -e "myprogram.idl"

If you change the number of cores (the value after -n), the value of IDL_CPU_TPOOL_NTHREADS will automatically agree.