SLIDE 1



 Introduction to SDSC systems and data analytics software packages


  • Mahidhar Tatineni (mahidhar@sdsc.edu)

SDSC Summer Institute August 05, 2013

SLIDE 2

Getting Started

  • System Access – Logging in
    • Linux/Mac – use available ssh clients (an example login is shown after this list).
    • ssh clients for Windows – PuTTY, Cygwin
      http://www.chiark.greenend.org.uk/~sgtatham/putty/
  • Login hosts for the machines: gordon.sdsc.edu, trestles.sdsc.edu
  • For NSF resources, users can also log in via the XSEDE user portal:
    https://portal.xsede.org/
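
For example, from a Linux/Mac terminal (replace train40 with your own training or XSEDE account name; the hostnames are the login hosts listed above):

$ ssh train40@gordon.sdsc.edu
$ ssh train40@trestles.sdsc.edu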
SLIDE 3

Access Via Science Gateways (XSEDE)

  • A community-developed set of tools, applications, and data integrated via a portal.
  • Enables researchers in particular communities to use HPC resources through portals without having to become familiar with the hardware and software details, allowing them to focus on their scientific goals.
  • The CIPRES gateway, hosted by SDSC PIs, enables large-scale phylogenetic reconstructions using applications such as MrBayes, RAxML, and GARLI. It enabled ~200 publications in 2012 and accounts for a significant fraction of XSEDE users.
  • The NSG portal, hosted by SDSC PIs, enables HPC jobs for neuroscientists.

SLIDE 4

Data Transfer (scp, globus-url-copy)

  • scp is fine for simple transfers of small files (<1 GB). Example:

$ scp w.txt train40@gordon.sdsc.edu:/home/train40/
w.txt                                        100%   15KB  14.6KB/s   00:00
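
To copy a whole directory, scp can be used recursively (mydata is a hypothetical local directory; adjust the paths and account name as needed):

$ scp -r mydata train40@gordon.sdsc.edu:/home/train40/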

  • globus-url-copy is for large-scale data transfers between XD resources (and local machines with a globus client).
    • Uses your XSEDE-wide username and password.
    • Retrieves your certificate proxies from the central server.
    • Offers the highest performance between XSEDE sites; uses striping across multiple servers and multiple threads on each server.

SLIDE 5

Data Transfer – globus-url-copy

  • Step 1: Retrieve certificate proxies:

$ module load globus
$ myproxy-logon -l xsedeusername
Enter MyProxy pass phrase:
A credential has been received for user xsedeusername in /tmp/x509up_u555555.

  • Step 2: Initiate globus-url-copy:

$ globus-url-copy -vb -stripe -tcp-bs 16m -p 4 \
    gsiftp://gridftp.ranger.tacc.teragrid.org:2811///scratch/00342/username/test.tar \
    gsiftp://trestles-dm2.sdsc.xsede.org:2811///oasis/scratch/username/temp_project/test-gordon.tar
Source: gsiftp://gridftp.ranger.tacc.teragrid.org:2811///scratch/00342/username/
Dest:   gsiftp://trestles-dm2.sdsc.xsede.org:2811///oasis/scratch/username/temp_project/
  test.tar  ->  test-gordon.tar

SLIDE 6

Data Transfer – Globus Online

  • Works from Windows/Linux/Mac via the Globus Online website:
    https://www.globusonline.org
  • Gordon, Trestles, and Triton endpoints already exist. Authentication can be done using the XSEDE-wide username and password for the NSF resources.
  • The Globus Connect application (available for Windows/Linux/Mac) can turn your laptop/desktop into an endpoint.

SLIDE 7

Data Transfer – Globus Online

  • Step 1: Create a globus online account

SLIDE 8

Data Transfer – Globus Online

SLIDE 9

Data Transfer – Globus Online

  • Step 2: Set up the local machine as an endpoint using Globus Connect.

SLIDE 10

Data Transfer – Globus Online

  • Step 3: Pick Endpoints and Initiate Transfers!

SLIDE 11

Data Transfer – Globus Online

SLIDE 12

SDSC HPC Resources: 
 Running Jobs

SLIDE 13

Running Batch Jobs

  • All clusters use the TORQUE/PBS resource manager for running jobs. TORQUE allows the user to submit one or more jobs for execution, using parameters specified in a job script (the basic commands are sketched after this list).
  • The NSF resources use the Catalina scheduler to control the workload.
  • Copy the hands-on examples directory:

cp -r /home/diag/SI2013 .
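
As a quick reference, the typical TORQUE/PBS workflow looks like this (the script name and job ID below are illustrative; the actual hands-on scripts appear on the later slides):

$ qsub hello_native.cmd      # submit a job script; the job ID is printed
845444.gordon-fe2.local
$ qstat -u $USER             # check the status of your queued/running jobs
$ qdel 845444                # delete the job if needed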

SLIDE 14

Gordon : Filesystems

  • Lustre filesystems – good for scalable, large-block I/O (a quick way to check these mounts is sketched after this list)
    • Accessible from both native and vSMP nodes.
    • /oasis/scratch/gordon – 1.6 PB; peak measured performance ~50 GB/s on reads and writes.
    • /oasis/projects – 400 TB
  • SSD filesystems
    • /scratch, local to each native compute node – 300 GB each.
    • /scratch on the vSMP node – 4.8 TB of SSD-based filesystem.
  • NFS filesystems (/home)
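
A simple way to confirm what is mounted and how much space is available (output and exact mount points may vary; shown purely as an illustration):

$ df -h /oasis/scratch/gordon /oasis/projects /home
$ ls /scratch          # node-local SSD scratch, visible on a compute node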

SLIDE 15

Gordon – Compiling/Running Jobs

  • Copy the SI2013 directory:

cp -r /home/diag/SI2013 ~/

  • Change to workshop directory:

cd ~/SI2013

  • Verify modules loaded:

$ module li
Currently Loaded Modulefiles:
  1) binutils/2.22   2) intel/2011   3) mvapich2_ib/1.8a1p1
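
If the compiler and MPI modules are not already loaded in your environment, they can be loaded explicitly (module names taken from the listing above; check "module avail" for the exact versions on the system):

$ module load intel mvapich2_ib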

  • Compile the MPI hello world code:

mpif90 -o hello_world hello_mpi.f90

  • Verify the executable has been created:

ls -lt hello_world
-rwxr-xr-x 1 mahidhar hpss 735429 May 15 21:22 hello_world
SLIDE 16

Gordon: Compiling/Running Jobs

  • Job queue basics:
    • Gordon uses the TORQUE/PBS resource manager with the Catalina scheduler to define and manage job queues.
    • Native/regular compute (non-vSMP) nodes are accessible via the “normal” queue.
    • The vSMP node is accessible via the “vsmp” queue.
  • Workshop examples illustrate use of both the native and vSMP nodes:
    • hello_native.cmd – script for running the hello world example on native nodes (using MPI).
    • hello_vsmp.cmd – script for running the hello world example on the vSMP node (using OpenMP).
  • The hands-on section of the tutorial has several scenarios.
SLIDE 17

Gordon: Hello World on native (non-vSMP) nodes

  • The submit script (located in the workshop directory) is hello_native.cmd
#!/bin/bash
#PBS -q normal
#PBS -N hello_native
#PBS -l nodes=4:ppn=1:native
#PBS -l walltime=0:10:00
#PBS -o hello_native.out
#PBS -e hello_native.err
#PBS -V
##PBS -M youremail@xyz.edu
##PBS -m abe
#PBS -A gue998

cd $PBS_O_WORKDIR
mpirun_rsh -hostfile $PBS_NODEFILE -np 4 ./hello_world

SLIDE 18

Gordon: Output from Hello World

  • Submit the job using “qsub hello_native.cmd”:

$ qsub hello_native.cmd
845444.gordon-fe2.local

  • Output:

$ more hello_native.out
node 2 : Hello world
node 1 : Hello world
node 3 : Hello world
node 0 : Hello world
Nodes: gcn-15-58 gcn-15-62 gcn-15-63 gcn-15-68

SLIDE 19

Compiling OpenMP Example

  • Change to the SI2013 directory:

cd ~/SI2013

  • Compile using the -openmp flag:

ifort -o hello_vsmp -openmp hello_vsmp.f90

  • Verify the executable was created:

ls -lt hello_vsmp
-rwxr-xr-x 1 train61 gue998 786207 May  9 10:31 hello_vsmp

SLIDE 20

Hello World on vSMP node (using OpenMP)

  • hello_vsmp.cmd

#!/bin/bash
#PBS -q vsmp
#PBS -N hello_vsmp
#PBS -l nodes=1:ppn=16:vsmp
#PBS -l walltime=0:10:00
#PBS -o hello_vsmp.out
#PBS -e hello_vsmp.err
#PBS -V
##PBS -M youremail@xyz.edu
##PBS -m abe
#PBS -A gue998

cd $PBS_O_WORKDIR
export LD_PRELOAD=/opt/ScaleMP/libvsmpclib/0.1/lib64/libvsmpclib.so
export PATH="/opt/ScaleMP/numabind/bin:$PATH"
export KMP_AFFINITY=compact,verbose,0,`numabind --offset 8`
export OMP_NUM_THREADS=8
./hello_vsmp

SLIDE 21

Hello World on vSMP node (using OpenMP)

  • Code written using OpenMP:

      PROGRAM OMPHELLO
      INTEGER TNUMBER
      INTEGER OMP_GET_THREAD_NUM
!$OMP PARALLEL DEFAULT(PRIVATE)
      TNUMBER = OMP_GET_THREAD_NUM()
      PRINT *, 'HELLO FROM THREAD NUMBER = ', TNUMBER
!$OMP END PARALLEL
      STOP
      END

SLIDE 22

vSMP OpenMP binding info 
 (from hello_vsmp.err file)

… …

OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {504}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {505}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {506}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {507}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {508}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {509}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {511}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {510}

SLIDE 23

Hello World (OpenMP version) Output

HELLO FROM THREAD NUMBER = 1
HELLO FROM THREAD NUMBER = 6
HELLO FROM THREAD NUMBER = 5
HELLO FROM THREAD NUMBER = 4
HELLO FROM THREAD NUMBER = 3
HELLO FROM THREAD NUMBER = 2
HELLO FROM THREAD NUMBER = 0
HELLO FROM THREAD NUMBER = 7
Nodes: gcn-3-11

SLIDE 24

Running on vSMP nodes - Guidelines

  • Identify the type of job – serial (large memory), threaded (pthreads, OpenMP), or MPI.
  • The workshop directory has examples for the different scenarios. The hands-on section will walk through the different types.
  • Use affinity in conjunction with the automatic process placement utility (numabind).
  • An optimized MPI (mpich2 tuned for vSMP) is available.
SLIDE 25

vSMP Guidelines for Threaded Codes

SLIDE 26

OpenMP Matrix Multiply Example

#!/bin/bash
#PBS -q vsmp
#PBS -N openmp_mm_vsmp
#PBS -l nodes=1:ppn=16:vsmp
#PBS -l walltime=0:10:00
#PBS -o openmp_mm_vsmp.out
#PBS -e openmp_mm_vsmp.err
#PBS -V
##PBS -M youremail@xyz.edu
##PBS -m abe
#PBS -A gue998

cd $PBS_O_WORKDIR

# Setting stacksize to unlimited.
ulimit -s unlimited

# ScaleMP preload library that throttles down unnecessary system calls.
export LD_PRELOAD=/opt/ScaleMP/libvsmpclib/0.1/lib64/libvsmpclib.so

source ./intel.sh
export MKL_VSMP=1

# Path to NUMABIND.
export PATH=/opt/ScaleMP/numabind/bin:$PATH

np=8
tag=`date +%s`

# Dynamic binding of OpenMP threads using numabind.
export KMP_AFFINITY=compact,verbose,0,`numabind --offset $np`
export OMP_NUM_THREADS=$np

/usr/bin/time ./openmp-mm > log-openmp-nbind-$np-$tag.txt 2>&1

SLIDE 27

Using SSD Scratch (Native Nodes)

#!/bin/bash
#PBS -q normal
#PBS -N ior_native
#PBS -l nodes=1:ppn=16:native
#PBS -l walltime=00:25:00
#PBS -o ior_scratch_native.out
#PBS -e ior_scratch_native.err
#PBS -V
##PBS -M youremail@xyz.edu
##PBS -m abe
#PBS -A gue998

cd /scratch/$USER/$PBS_JOBID
mpirun_rsh -hostfile $PBS_NODEFILE -np 4 $HOME/SI2013/IOR-gordon -i 1 -F -b 16g -t 1m -v -v > IOR_native_scratch.log
cp /scratch/$USER/$PBS_JOBID/IOR_native_scratch.log $PBS_O_WORKDIR/
SLIDE 28

Using SSD Scratch (Native Nodes)

  • Snapshot on the node during the run:

$ pwd
/scratch/mahidhar/72251.gordon-fe2.local
$ ls -lt
total 22548292
-rw-r--r-- 1 mahidhar hpss 5429526528 May 15 23:48 testFile.00000001
-rw-r--r-- 1 mahidhar hpss 6330253312 May 15 23:48 testFile.00000003
-rw-r--r-- 1 mahidhar hpss 5532286976 May 15 23:48 testFile.00000000
-rw-r--r-- 1 mahidhar hpss 5794430976 May 15 23:48 testFile.00000002
-rw-r--r-- 1 mahidhar hpss       1101 May 15 23:48 IOR_native_scratch.log

  • Performance from a single node (in the log file copied back):
    Max Write: 250.52 MiB/sec (262.69 MB/sec)
    Max Read:  181.92 MiB/sec (190.76 MB/sec)

SLIDE 29

Running Jobs on Trestles

  • All nodes on Trestles are identical. However, nodes have 32 cores and can be shared.
  • The scheduler is again PBS + Catalina.
  • Two queue options (an illustrative request for each is sketched after this list):
    • normal – Exclusive access to compute nodes. Allocation is charged for 32 cores per node.
    • shared – Shared access. Allocation is charged based on the number of cores requested.
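
For example, the resource request lines in a Trestles job script might look like one of the following (queue names from above; the node and core counts are illustrative):

# Exclusive access in the normal queue (charged for all 32 cores on the node):
#PBS -q normal
#PBS -l nodes=1:ppn=32

# Shared access, charged only for the cores requested (here, 8):
#PBS -q shared
#PBS -l nodes=1:ppn=8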

SLIDE 30

Data Intensive Computing & Viz Stack

  • Gordon was designed to enable data-intensive computing (details in the following slides). Additionally, some of the Triton nodes have large memory (up to 512 GB) to aid in such processing.
  • All clusters have access to the high-speed Lustre filesystem (Data Oasis; details in a separate presentation) with an aggregate peak measured data rate of 100 GB/s.
  • Several libraries and packages have been installed to enable data-intensive computing and visualization (see the sketch after this list):
    • R – software environment for statistical computing and graphics.
    • Weka – tools for data analysis and predictive modeling.
    • RapidMiner – environment for machine learning, data mining, text mining, and predictive analytics.
    • Octave
    • MATLAB
    • VisIt
    • ParaView
  • The myHadoop infrastructure was developed to enable the use of Hadoop for distributed data-intensive analysis.
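
As a rough sketch of how one of these packages might be used in a batch job (the module name R and the script analysis.R are assumptions for illustration; check "module avail" on the system for the actual module names):

#!/bin/bash
#PBS -q normal
#PBS -l nodes=1:ppn=16:native
#PBS -l walltime=0:30:00
#PBS -V

cd $PBS_O_WORKDIR
module load R                          # assumed module name; verify with "module avail"
R CMD BATCH analysis.R analysis.Rout   # run the (hypothetical) R script non-interactively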

SLIDE 31

Hands On Example - Hadoop

  • Examples are in /home/diag/SI2013/hadoop
  • Simple benchmark examples:
    • TestDFS_2.cmd – TestDFS example to benchmark HDFS performance.
    • TeraSort_2.cmd – sorting performance benchmark.
SLIDE 32

TestDFS Example

  • PBS directives part:

#!/bin/bash
#PBS -q normal
#PBS -N hadoop_job
#PBS -l nodes=2:ppn=1
#PBS -o hadoop_dfstest_2.out
#PBS -e hadoop_dfstest_2.err
#PBS -V

SLIDE 33

TestDFS Example

  • Set up Hadoop environment variables:

# Set this to the location of myHadoop on Gordon
export MY_HADOOP_HOME="/opt/hadoop/contrib/myHadoop"

# Set this to the location of Hadoop on Gordon
export HADOOP_HOME="/opt/hadoop"

#### Set this to the directory where Hadoop configs should be generated
# Don't change the name of this variable (HADOOP_CONF_DIR) as it is
# required by Hadoop - all config files will be picked up from here
#
# Make sure that this is accessible to all nodes
export HADOOP_CONF_DIR="/home/$USER/config"

SLIDE 34

TestDFS Example

#### Set up the configuration
# Make sure the number of nodes is the same as what you have requested from PBS
# usage: $MY_HADOOP_HOME/bin/configure.sh -h
echo "Set up the configurations for myHadoop"

### Create a hadoop hosts file, change to ibnet0 interfaces - DO NOT REMOVE
sed 's/$/.ibnet0/' $PBS_NODEFILE > $PBS_O_WORKDIR/hadoophosts.txt
export PBS_NODEFILEZ=$PBS_O_WORKDIR/hadoophosts.txt

### Copy over configuration files
$MY_HADOOP_HOME/bin/configure.sh -n 2 -c $HADOOP_CONF_DIR

### Point hadoop temporary files to local scratch - DO NOT REMOVE
sed -i 's@HADDTEMP@'$PBS_JOBID'@g' $HADOOP_CONF_DIR/hadoop-env.sh

SLIDE 35

TestDFS Example

#### Format HDFS, if this is the first time or not a persistent instance
echo "Format HDFS"
$HADOOP_HOME/bin/hadoop --config $HADOOP_CONF_DIR namenode -format
echo

sleep 1m

#### Start the Hadoop cluster
echo "Start all Hadoop daemons"
$HADOOP_HOME/bin/start-all.sh
#$HADOOP_HOME/bin/hadoop dfsadmin -safemode leave
echo

SLIDE 36

TestDFS Example

#### Run your jobs here
echo "Run some test Hadoop jobs"
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-test-1.0.3.jar TestDFSIO -write -nrFiles 8 -fileSize 1024 -bufferSize 1048576
sleep 30s
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-test-1.0.3.jar TestDFSIO -read -nrFiles 8 -fileSize 1024 -bufferSize 1048576
echo

#### Stop the Hadoop cluster
echo "Stop all Hadoop daemons"
$HADOOP_HOME/bin/stop-all.sh
echo

SLIDE 37

Running the TestDFS example

  • Submit the job:

qsub TestDFS_2.cmd

  • Check that the job is running (qstat).
  • Once the job is running, the hadoophosts.txt file is created. For example, on a sample run:

$ more hadoophosts.txt
gcn-13-11.ibnet0
gcn-13-12.ibnet0
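
Once the job completes, the output ends up in the PBS stdout/stderr files named in the script; the benchmark report may appear in either file (illustrative commands):

$ qstat -u $USER               # wait until the job is no longer listed
$ more hadoop_dfstest_2.out    # stdout from the job script
$ more hadoop_dfstest_2.err    # stderr, including Hadoop/TestDFSIO progress messages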

SLIDE 38

Summary, Q/A

  • Access options – ssh clients, XSEDE User Portal.
  • Data transfer options – scp, globus-url-copy (GridFTP), Globus Online, and the XSEDE User Portal File Manager.
  • Two queues on Gordon – normal (native, non-vSMP) and vsmp.
  • Follow the guidelines for serial, OpenMP, pthreads, and MPI jobs on the vSMP nodes.
  • Use SSD local scratch where possible. Excellent for codes like Gaussian and Abaqus.
