IMPLEMENTING COLLOCATION GROUPS

SLIDE 1

IMPLEMENTING COLLOCATION GROUPS

SLIDE 2

About Draper Lab

  • An independent, not-for-profit corporation dedicated to applied research, engineering development, education, and technology transfer

– Spun off from the Massachusetts Institute of Technology in 1973
– Expertise in guidance, navigation, and control systems
– Early applications: the U.S. Navy's Fleet Ballistic Missile Program and NASA's Apollo Program

SLIDE 3

Agenda

  • Why collocation groups?
  • ITSM code components
  • Additional tools
  • A process to move 40TB
  • Conclusions

SLIDE 4

Why do I want collocation groups?

  • Number of nodes vs. number of slots

– 1. Nodes < slots: collocate by node or filespace
– 2. Nodes > slots: can't collocate
– If collocation is on there is no control of node mixing, and migration still needs 1 mount per node

  • Node size vs. tape capacity

– 1. Size > tape capacity: collocation fills the tape
– 2. Size < tape capacity: collocation wastes tape

  • Collocation by group makes "supernodes" which work for both case 1's
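The "supernode" effect can be shown with a toy greedy packing. All node names and sizes below are invented, and the 200 GB capacity is the compressed-LTO2 figure used later in this deck; this is an illustration, not the deck's actual tooling.

```python
# Toy illustration (hypothetical numbers): pack small nodes into
# "supernodes" big enough to fill a tape, so collocation neither
# wastes tape (size < capacity) nor runs out of slots (nodes > slots).
TAPE_CAP_GB = 200  # assumed per-tape capacity with client compression

nodes = {"n01": 30, "n02": 55, "n03": 20, "n04": 90,
         "n05": 45, "n06": 70, "n07": 25, "n08": 60}  # GB, invented

def make_groups(nodes, cap_gb):
    """Greedy first-fit: pack nodes into groups of at most cap_gb."""
    groups = []  # list of [total_gb, [node names]]
    for name, size in sorted(nodes.items(), key=lambda kv: -kv[1]):
        for g in groups:
            if g[0] + size <= cap_gb:
                g[0] += size
                g[1].append(name)
                break
        else:
            groups.append([size, [name]])
    return groups

groups = make_groups(nodes, TAPE_CAP_GB)
# 8 nodes (395 GB total) collapse into 3 groups, two of them close
# to one tape's worth of data.
```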

SLIDE 5

Server Configuration

Acronym   Function          DB size (pages)   Number of files   Physical TB
SD2       For desktops      4,300,000         33,000,000        2
SD        For desktops      45,400,000        230,000,000       40
SS        For servers       8,500,000         47,500,000        18
LM        Library manager   5,500             5,000             6

  • Sun V480, 4 processors, Solaris 9
  • Raw disk for DB, log, and backup pool; no RAID
  • TSM server code at 5.3.1.3

SLIDE 6

The starting SD server mess

  • Volumes

– 417 to process
– Average nodes / volume is 188
– Max is 713
– 25 are over 500

  • Nodes

– 1635 nodes
– Average 48 volumes / node
– Max is 132
– 25 are over 100

SLIDE 7

New server commands

  • Def, del, upd, query collocgroup

– Names and describes the group

  • Def, del collocmember

– Adds a node to a group

  • Query nodedata

– Very fast!!
– Lists tapes which have files for a node or group; no separation by filespace

  • Upd stgpool colloc=group

SLIDE 8

The secret Perl scripts

  • 4 scripts in the bin directory, not documented
  • Used only defgroups.pl

– Analyzes 'q occ' data and creates the define statements to build the groups
– Execution:

  • ./defgroups.pl id pwd domain size [execute]
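A minimal Python sketch of what defgroups.pl does, assuming it reads 'q occ'-style (node_name, physical_mb) data and packs nodes into groups no larger than the 'size' argument, in MB. The group-name prefix, the sample occupancy numbers, and the packing heuristic are assumptions for illustration; only the define collocgroup / define collocmember command names come from the slides.

```python
# Greedy first-fit packing of nodes (by occupancy MB) into collocation
# groups, emitting the server define statements the groups need.
def build_define_statements(occupancy, size_mb, prefix="CG"):
    ordered = sorted(occupancy.items(), key=lambda kv: -kv[1])
    groups, totals = [], []
    for node, mb in ordered:
        for i, tot in enumerate(totals):
            if tot + mb <= size_mb:
                totals[i] += mb
                groups[i].append(node)
                break
        else:
            totals.append(mb)
            groups.append([node])
    stmts = []
    for i, members in enumerate(groups, 1):
        stmts.append(f"define collocgroup {prefix}{i:03d}")
        for node in members:
            stmts.append(f"define collocmember {prefix}{i:03d} {node}")
    return stmts

# Invented occupancy data; size_mb=800,000 is the deck's 4-tape goal.
occ = {"alpha": 500_000, "beta": 350_000, "gamma": 300_000, "delta": 100_000}
stmts = build_define_statements(occ, 800_000)
```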

SLIDE 9

Fix the defgroups.pl SQL

  • Eliminate the stgpool subselect

– Change the stgpool subselect to an in-list; name your tape stgpool

  • Eliminate the join between nodes & occupancy

– Check domain_name with a subselect

  • Eliminate the check for a collocgroup

– It is always null while implementing

  • Runtime drops from "beyond the limits of my patience" to 5 minutes

SLIDE 10

Using Query Nodedata

  • SQL generates a command for each node

– Also 'q nodedata * stg=pool_name'

  • Run the file from step 1, direct output to a 2nd file

– 'q nodedata' doesn't have a corresponding SQL table (the very expensive volumeusage table is close)

  • Edit the output to get only node name and volume name
  • Load into MySQL
  • Analyze
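The edit step above can be sketched as a small filter that reduces raw 'q nodedata' output to tab-separated (node, volume) pairs ready for MySQL loading. The column layout and the sample lines below are assumptions; check your server's actual output before trusting the slicing.

```python
# Reduce 'q nodedata' style output to "node<TAB>volume" rows.
# Assumed columns: node_name  volume_name  physical_mb.
def to_tsv(raw_lines):
    rows = []
    for line in raw_lines:
        parts = line.split()
        # skip headers/rules; keep lines with at least two columns
        if len(parts) >= 2 and not line.lstrip().startswith(("Node", "-")):
            rows.append(f"{parts[0]}\t{parts[1]}")
    return rows

sample = [
    "Node Name   Volume Name   Physical MB",   # header, skipped
    "----------  ------------  -----------",   # rule, skipped
    "ALPHA       A00017        12,345.6",
    "ALPHA       A00093        1,201.0",
    "BETA        A00017        987.4",
]
print("\n".join(to_tsv(sample)))
```

The resulting file can then be pulled into a scratch MySQL table (for example with LOAD DATA INFILE) for the join-heavy analysis the next slide describes.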

SLIDE 11

Tools

  • MySQL desktop development server

– Very handy to have!
– No select for nodedata
– Do complex joins without killing the server
– http://www.mysql.com/

  • UltraEdit editor

– Sorting, column editing, hex editing
– http://www.ultraedit.com/

SLIDE 12

Preliminaries

  • Decide the target number of tapes in each group

– Convert it to 'size in megs' for defgroups.pl
– Goal is 4 tapes
– We compress at the client, so LTO2 capacity is 200G and 'size' is 800,000

  • Run defgroups.pl on the domain(s)
  • Execute the commands from defgroups.pl
  • 'Update stgpool <name> colloc=group'
  • Mark all current tapes readonly

– Stops migration to uncollocated filling tapes
– Makes the SQL easier

  • Have as many scratch tapes as groups
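The tapes-to-'size' conversion above is simple arithmetic; a one-liner makes the units explicit (decimal MB per GB assumed, matching the deck's 200G to 800,000 figure):

```python
# 'size' argument for defgroups.pl: tapes per group times per-tape
# capacity, expressed in MB (decimal, 1 GB = 1000 MB assumed).
def group_size_mb(tapes_per_group, tape_capacity_gb):
    return tapes_per_group * tape_capacity_gb * 1000

# 4 tapes of 200 GB (LTO2 with client-side compression)
size = group_size_mb(4, 200)   # -> 800000
```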

SLIDE 13

A process to minimize tape mounts

  • By turning on collocation by group, a move or reclaim within the tapepool will need an output tape mount for each collocgroup on the input tape

– Potentially very slow, and stressful for the tape drives

  • The solution is to move data from tape to devt=file pools on disk, where files are put into groups, then migrate back to tape

SLIDE 14

Storage pools

  • 3 sequential pools on disk

– seqdisk3, seqdisk4, seqdisk5

  • 2 pools receive data from tapes

– seqdisk3 & 4 each have 2 69G volumes
– Not collocated; moves don't reconstruct

  • seqdisk5 receives data from seqdisk3 & 4

– 170 8GB volumes on 10 146GB drives, each with its own file system
– Collocated by group; moves reconstruct

SLIDE 15

The schedules and scripts

  • Each script is executed every 10 minutes by a schedule

– 6 similar schedules for each script

  • For script a, run at 00:00, 00:10, 00:20, etc.
  • T4_VOLUMES_ODD moves odd-numbered volumes to seqdisk3
  • T4_VOLUMES_EVEN moves even-numbered volumes to seqdisk4
  • T4_MOVES moves seqdisk3 & 4 volumes to seqdisk5
  • T4_MIGRATES starts migration of seqdisk5 to tape
  • T4_VOLUMES_DIRECT moves some tapes direct to tape

SLIDE 16

SQL to make the scripts

  • Use a file as a macro to create the script
  • The T4_VOLUMES* script has a prolog with logic

– Checks if backuppool migration is running; exit if yes
– Checks if SEQDISK3 is being used; exit if yes
– Checks for space in SEQDISK3; if yes, then run

  • Run SQL to select odd/even volumes ordered by pct_utilized and append it to the file
  • For each volume, 4 lines are needed in the script

– Test if the volume is still full or filling
– Goto the script lines that issue a move command
– Issue the move command
– Exit
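The four-lines-per-volume pattern can be sketched as a small generator. The select text, the rc_ok condition, and the label syntax are assumptions about TSM server-script grammar and should be checked against your admin reference; the structure (test, goto, move, exit) is the slide's.

```python
# Emit a server-script body: per volume, a still-full/filling test plus
# a goto, then a labeled move command followed by exit, so one run of
# the script moves at most one volume.
def build_macro(volumes, target="SEQDISK3"):
    tests, moves = [], []
    for n, vol in enumerate(volumes, 1):
        tests.append(f"select volume_name from volumes "
                     f"where volume_name='{vol}' and status in ('FULL','FILLING')")
        tests.append(f"if (rc_ok) goto mv{n}")          # condition name assumed
        moves.append(f"mv{n}: move volume {vol} stgpool={target}")
        moves.append("exit")
    return tests + ["exit"] + moves                     # fall-through exit

lines = build_macro(["A00017", "A00093"])
```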

SLIDE 17

Other methods to move all that data

  • Direct tape to tape within the pool

– Not as bad as I had feared!
– Analyzed which tapes had the fewest groups on them and moved them tape to tape
– Of 278 tapes, 219 have 30 or more groups (42 max)

  • Move nodedata direct, tapes to tapes

– Move nodedata list-of-all-the-nodes-in-group
– Need extra scratch tapes because source tapes aren't emptied quickly
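The fewest-groups analysis is a simple count over the (volume, group) data already loaded into MySQL. A sketch, with an invented threshold and invented sample data:

```python
# Given volume -> set of collocation groups (derived from the MySQL
# copy of 'q nodedata' joined to group membership), pick tapes with
# few enough groups to be worth moving tape-to-tape directly.
def direct_candidates(vol_groups, max_groups=5):
    return sorted(v for v, gs in vol_groups.items() if len(gs) <= max_groups)

vol_groups = {
    "A00017": {"CG001", "CG002"},                    # 2 groups: direct
    "A00093": {"CG%03d" % i for i in range(1, 31)},  # 30 groups: via disk
    "A00112": {"CG004"},                             # 1 group: direct
}
print(direct_candidates(vol_groups))   # -> ['A00017', 'A00112']
```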

SLIDE 18

The results so far

  • Started on Aug-5; results as of Sep-8
  • Volumes

– 160 to process
– Average nodes / volume is 188
– Max is 485
– 10 are over 400

  • Nodes

– 1629 nodes
– Average 22 volumes / node
– Max is 63
– 4 are over 50

SLIDE 19

Summary

  • Match your process to your resources

– Does your disk write speed match your tape read speed?

  • The more groups you have, the longer a tape to tape move or reclaim will take
  • Do 2 processes?

– Few cg's on a tape: do tape to tape
– Lots of cg's on a tape: do tape to file