IMPLEMENTING COLLOCATION GROUPS #1
IMPLEMENTING COLLOCATION GROUPS #2
About Draper Lab
- An independent, not-for-profit corporation dedicated to applied research, engineering development, education, and technology transfer
– Spun off from the Massachusetts Institute of Technology in 1973
– Expertise in guidance, navigation and control systems
– Early applications: U.S. Navy's Fleet Ballistic Missile Program and NASA's Apollo Program
IMPLEMENTING COLLOCATION GROUPS #3
Agenda
- Why collocation groups?
- ITSM code components
- Additional tools
- A process to move 40TB
- Conclusions
IMPLEMENTING COLLOCATION GROUPS #4
Why do I want Collocation groups?
- Number of nodes vs. number of slots
– 1. Nodes < slots; collocate by node or filespace
– 2. Nodes > slots; can't collocate
– If collocate is on, there is no control of node mixing, and migration still needs 1 mount per node
- Node size vs. tape capacity
– 1. Size > tape capacity; collocation fills the tape
– 2. Size < tape capacity; collocation wastes tape
- Collocation by group makes "supernodes": fewer effective nodes than slots, and groups big enough to fill a tape, so both comparisons land in case 1
IMPLEMENTING COLLOCATION GROUPS #5
Server Configuration
Acronym  Function         DB size (pages)  Number of files  Physical TB
SD2      For desktops     4,300,000        33,000,000       2
SD       For desktops     45,400,000       230,000,000      40
SS       For servers      8,500,000        47,500,000       18
LM       Library manager  5,500            5,000            6
- Sun v480 4 processors, Solaris 9
- Raw disk for db, log, backuppool, no raid
- TSM server code at 5.3.1.3
IMPLEMENTING COLLOCATION GROUPS #6
The starting SD server mess
- Volumes
– 417 to process
– Average nodes / volume is 188
– Max is 713
– 25 are over 500
- Nodes
– 1635 nodes
– Average 48 volumes / node
– Max is 132
– 25 are over 100
IMPLEMENTING COLLOCATION GROUPS #7
New server commands
- Def, del, upd, query collocgroup
– names and describes the group
- Def, del collocmember
– adds a node to a group
- Query nodedata
– Very fast!!
– Lists tapes which have files for a node or group, no separation by filespace
- Upd stgpool colloc=group
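A minimal sketch of the new commands in sequence, assuming a hypothetical group GROUP01, node NODE_A, and tape pool TAPEPOOL:

  /* name and describe a group */
  define collocgroup group01 description="Desktop group 01"
  /* add a node to the group */
  define collocmember group01 node_a
  /* list the tapes holding data for the node (no filespace breakdown) */
  query nodedata node_a stgpool=tapepool
  /* turn on collocation by group for the tape pool */
  update stgpool tapepool collocate=group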
IMPLEMENTING COLLOCATION GROUPS #8
The secret perl scripts
- 4 scripts in the bin directory, not documented
- Used only defgroups.pl
– Analyzes ‘q occ’ data, creates define statements to build the groups
– Execution: ./defgroups.pl id pwd domain size [execute]
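For example, a hypothetical run against a DESKTOPS domain with the 800,000 MB group size used later (slide 12), first without the execute flag (which presumably just prints the define statements) and then with it:

  ./defgroups.pl admin secret DESKTOPS 800000
  ./defgroups.pl admin secret DESKTOPS 800000 execute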
IMPLEMENTING COLLOCATION GROUPS #9
Fix the defgroups.pl SQL
- Eliminate the stgpool subselect
– Change it to an IN list naming your tape stgpool
- Eliminate the join between nodes & occupancy
– Check domain_name with a subselect instead
- Eliminate the check for a collocgroup
– It is always null while implementing
- Runtime drops from “beyond the limits of my patience” to 5 minutes
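A sketch of the reworked query after those three changes, assuming a tape pool named TAPEPOOL and a DESKTOPS domain; the grouping logic around it stays in the perl:

  select node_name, sum(physical_mb) as mb
    from occupancy
    where stgpool_name in ('TAPEPOOL')
      and node_name in
        (select node_name from nodes where domain_name='DESKTOPS')
    group by node_name
    order by 2 desc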
IMPLEMENTING COLLOCATION GROUPS #10
Using Query Nodedata
- SQL generates a command for each node (sketch after this list)
– Also 'q nodedata * stg=pool_name'
- Run the file from step 1, direct output to a 2nd file
– ‘q nodedata’ doesn’t have a corresponding SQL table (the very expensive volumeusage table is close)
- Edit the output to get only node name and volume name
- Load into MySQL
- Analyze
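The command file and the MySQL analysis might look like this, with TAPEPOOL and the table layout as assumptions; the load expects the edited output to be tab-separated node/volume pairs:

  /* step 1: emit one 'q nodedata' command per node, run the output as a macro */
  select 'query nodedata ' || node_name || ' stgpool=TAPEPOOL' from nodes

  -- MySQL: load the pairs, then count the nodes sharing each volume
  create table nodedata (node_name varchar(64), volume_name varchar(32));
  load data local infile 'nodedata.txt' into table nodedata;
  select volume_name, count(distinct node_name) as nodes
    from nodedata group by volume_name order by nodes desc;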
IMPLEMENTING COLLOCATION GROUPS #11
Tools
- MySQL desktop development server
– Very handy to have!
– No select for nodedata
– Do complex joins without killing the server
– http://www.mysql.com/
- UltraEdit editor
– Sorting, column editing, hex editing
– http://www.ultraedit.com/
IMPLEMENTING COLLOCATION GROUPS #12
Preliminaries
- Decide the target number of tapes in each group
– Convert it to 'size in megs' for defgroups.pl
– Goal is 4 tapes
– We compress at the client, so LTO2 capacity is 200G; 'size' is 800,000
- Run defgroups.pl on the domain(s)
- Execute the commands from defgroups.pl
- 'Update stgpool <name> colloc=group'
- Mark all current tapes readonly (one-line command below)
– Stops migration onto uncollocated filling tapes
– Makes the SQL easier
- Have as many scratch tapes as groups
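The readonly step can be done in one command, assuming the tape pool is named TAPEPOOL:

  /* mark every full and filling tape in the pool readonly */
  update volume * access=readonly wherestgpool=tapepool wherestatus=full,filling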
IMPLEMENTING COLLOCATION GROUPS #13
A process to minimize tape mounts
- By turning on collocation by group, a move or reclaim within the tapepool will need an output tape mount for each collocgroup on the input tape
– Potentially very slow, and stressful for the tape drives
- The solution is to move data from tape to devt=file pools on disk, where files are put into groups, then migrate back to tape
IMPLEMENTING COLLOCATION GROUPS #14
Storage pools
- 3 sequential pools on disk
– seqdisk3, seqdisk4, seqdisk5
- 2 pools receive data from tapes
– Seqdisk3 & 4 each have 2 69G volumes
– Not collocated, so moves don’t reconstruct
- Seqdisk5 receives data from seqdisk3 & 4
– 170 8GB volumes on 10 146GB drives, each with its own file system
– Collocated by group, so moves reconstruct
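A sketch of how seqdisk5 might be defined, with the devclass name and directory paths as hypothetical stand-ins; maxcapacity gives the 8GB volume size and collocate=group drives the grouping:

  /* FILE device class spread over the per-drive file systems (paths hypothetical) */
  define devclass seqfile devtype=file maxcapacity=8g mountlimit=20 directory=/sd5/fs01,/sd5/fs02,/sd5/fs03
  /* collocate by group and migrate onward to the tape pool */
  define stgpool seqdisk5 seqfile maxscratch=170 collocate=group nextstgpool=tapepool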
IMPLEMENTING COLLOCATION GROUPS #15
The schedules and scripts
- Each script is executed every 10 minutes by a schedule
– 6 similar schedules for each script (see the sketch after this list)
– For script A: run at 00:00, 00:10, 00:20, etc.
- T4_VOLUMES_ODD moves odd-numbered volumes to seqdisk3
- T4_VOLUMES_EVEN moves even-numbered volumes to seqdisk4
- T4_MOVES moves seqdisk3 & 4 volumes to seqdisk5
- T4_MIGRATES starts migration of seqdisk5 to tape
- T4_VOLUMES_DIRECT moves some tape volumes directly to other tapes
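The six-per-script pattern might be defined like this (schedule names hypothetical); each schedule fires hourly at a fixed offset, so together they cover every 10 minutes:

  define schedule t4_odd_00 type=administrative cmd="run t4_volumes_odd" active=yes starttime=00:00 period=1 perunits=hours
  define schedule t4_odd_10 type=administrative cmd="run t4_volumes_odd" active=yes starttime=00:10 period=1 perunits=hours
  /* ...through t4_odd_50, then the same six for each of the other scripts */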
IMPLEMENTING COLLOCATION GROUPS #16
SQL to make the scripts
- Use a file as a macro to create the script
- The T4_VOLUMES* scripts have a prolog with logic
– Checks if backuppool migration is running; exit if yes
– Checks if SEQDISK3 is being used; exit if yes
– Checks for space in SEQDISK3; if yes, then run
- Run SQL to select odd/even volumes ordered by pct_utilized and append them to the file
- For each volume, 4 lines are needed in the script (fragment below)
– Test if the volume is still full or filling
– Goto the next volume's lines if not
– Issue the move command
– Exit
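A fragment of what a generated T4_VOLUMES_ODD might contain, with hypothetical volume names; TSM server scripts allow if/goto on return codes, which gives the four lines per volume:

  /* prolog sketch: exit if a migration process is already running */
  select process_num from processes where process='Migration'
  if (rc_ok) exit
  /* volume 1: move it only if it is still full or filling */
  select volume_name from volumes where volume_name='A00001' and status in ('FULL','FILLING')
  if (rc_notfound) goto vol2
  move data A00001 stgpool=seqdisk3 wait=yes
  exit
  vol2: select volume_name from volumes where volume_name='A00003' and status in ('FULL','FILLING')
  if (rc_notfound) goto vol3
  move data A00003 stgpool=seqdisk3 wait=yes
  exit
  /* ...and so on for each remaining volume */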
IMPLEMENTING COLLOCATION GROUPS #17
Other methods to move all that data
- Direct tape to tape within the pool
– Not as bad as I had feared!
– Analyzed which tapes had the fewest groups on them and moved them tape to tape
– Of 278 tapes, 219 have 30 or more groups (42 max)
- Move nodedata directly, tapes to tapes (example below)
– Move nodedata list-of-all-the-nodes-in-group
– Need extra scratch tapes because source tapes aren't emptied quickly
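For one group, that command might look like this, with hypothetical node names; leaving out tostgpool keeps the data in the same pool, now written in group order:

  move nodedata node_a,node_b,node_c fromstgpool=tapepool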
IMPLEMENTING COLLOCATION GROUPS #18
The results so far
- Started on Aug-5, results as of Sep-8
- Volumes
– 160 to process
– Average nodes / volume is 188
– Max is 485
– 10 are over 400
- Nodes
– 1629 nodes
– Average 22 volumes / node
– Max is 63
– 4 are over 50
IMPLEMENTING COLLOCATION GROUPS #19
Summary
- Match your process to your resources
– Does your disk write speed match your tape read speed?
- The more groups you have, the longer a tape to tape move or reclaim will take.
- Do 2 processes?