Using OSCAR to Win the Cluster Challenge
University of Alberta
Paul Greidanus, Gordon Klok
The Cluster Challenge
• New challenge event introduced at Supercomputing 2007 (SC07).
• Teams of six members, none yet holding an undergraduate degree, plus a faculty coach from the institution.
• Competition consisted of running the HPCC benchmarks and three applications: GAMESS, POP (Parallel Ocean Program), and POV-Ray.
• Power limit of 26 Amps.
Team Alberta
From left to right: Gordon Klok, Chris Kuethe, Paul Greidanus, Stephen Portillo, Andrew Nisbet, Paul Lu, Antoine Filion
Not pictured: Bob Beck, Cameron Macdonell
Our Cluster
• Our vendor partner SGI supplied 5 Altix XE310 servers.
• Each Altix XE310 1U chassis contains two nodes sharing a single power supply. Each node consisted of:
  – Two quad-core Intel Xeon 5355 CPUs running at 2.67 GHz with 8 MB of L2 cache.
  – A 250 GB SATA drive.
  – 16 GB of RAM; we later added 8 GB to two of the nodes, bringing them to 24 GB.
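To put the node specs and the 26 Amp power cap in perspective, here is a back-of-the-envelope sketch of the aggregate core count, theoretical double-precision peak, and per-node power budget. It is an illustration only: the 120 V circuit voltage, the 4 flops/core/cycle figure, the use of 8 nodes (as listed in the competitor table that follows), and the even split of power across nodes are assumptions, not figures from the slides.

```python
# Rough capacity and power-budget figures for the cluster described above.
# Assumptions (not from the slides): 8 competing nodes, a 120 V circuit behind
# the 26 A limit, 4 DP flops/core/cycle for these Clovertown-era Xeons, and
# power shared evenly across nodes.

nodes = 8
sockets_per_node = 2
cores_per_socket = 4
clock_ghz = 2.66          # nominal Xeon 5355 clock
flops_per_cycle = 4       # one 128-bit SSE add + one 128-bit SSE multiply per cycle (assumed)

cores = nodes * sockets_per_node * cores_per_socket
peak_gflops = cores * clock_ghz * flops_per_cycle

amps_limit = 26
volts = 120               # assumed circuit voltage
watts_total = amps_limit * volts
watts_per_node = watts_total / nodes

print(f"cores: {cores}")                                   # 64
print(f"theoretical peak: {peak_gflops:.0f} GFLOPS")        # ~681 GFLOPS
print(f"power: {watts_total} W total, {watts_per_node:.0f} W per node")
```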
The competitors: SC07 Cluster Challenge

Team        Sponsor        Interconnect              Chip             Nodes  Sockets  Cores  Mem/node  Operating system
Alberta     SGI            20 Gbit Infiniband        Xeon 2.66 GHz    8      16       64     16 GB     SL (Scientific Linux)
Colorado    Aspen Systems  Dual 10 Gbit Infiniband   Xeon 2.66 GHz    6      12       48     8 GB      CentOS
Indiana     Apple          Myrinet 10G over 10GE     Xeon 3 GHz       9      18       36     8 GB      OS X
NTHU        ASUS           10 Gbit Infiniband        Xeon 2.83 GHz    6      12       48     12 GB     CentOS
Purdue      HP             20 Gbit Infiniband        Opteron 2.2 GHz  14     28       54     4 GB      CentOS
Stonybrook  Dell           5 Gbit Infiniband         Xeon 1.86 GHz    13     26       100    8 GB      Debian

Courtesy: Brent Gorda
Why OSCAR?
• OSCAR allowed us to deploy the cluster quickly and focus on the important thing: applications.
  – Not everyone used a product like OSCAR.
• Changes can be pushed to the nodes quickly (a sketch using OSCAR's C3 tools follows below).
• Dealt with all the details: SGE, Ganglia, SystemImager.
• Used Ganglia as part of our visualization strategy.
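As a minimal sketch of "pushing changes to the nodes quickly", the snippet below wraps the C3 tools that ship with OSCAR (cpush to copy a file to every node, cexec to run a command on every node). The wrapper function, file path, and follow-up command are hypothetical illustrations, not the team's actual workflow.

```python
# Sketch: distribute a config change to all nodes with OSCAR's C3 tools.
import subprocess

def push_and_apply(local_file: str, remote_path: str, apply_cmd: str) -> None:
    """Copy a file to all cluster nodes, then run a command on each of them."""
    # cpush copies a local file out to the given path on every node in the C3 config
    subprocess.run(["cpush", local_file, remote_path], check=True)
    # cexec runs a shell command on every node
    subprocess.run(["cexec", apply_cmd], check=True)

if __name__ == "__main__":
    # Hypothetical example: distribute an updated hosts file and confirm it arrived.
    push_and_apply("/etc/hosts", "/etc/hosts", "md5sum /etc/hosts")
```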
Cluster preparation
• Installed the head node using Scientific Linux 4.5. OSCAR 5.0 was used to build the client image and push it out to the nodes.
  – Sun Grid Engine (SGE) chosen over Maui/Torque (a sample SGE submission is sketched below).
• Not perfect: needed a new kernel and a SystemImager update.
• No support for 3rd-party compiled MPI libraries, and no Infiniband.
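For readers unfamiliar with SGE, here is a minimal sketch of submitting an MPI run through it. The parallel-environment name ("mpi"), the slot count, and the hpcc binary path are assumptions for illustration; they depend entirely on how SGE and MPI were configured on the cluster.

```python
# Sketch: submit an MPI job to Sun Grid Engine from Python.
import subprocess

job_script = """#!/bin/sh
#$ -N hpcc_run            # job name
#$ -cwd                   # run from the submission directory
#$ -j y                   # merge stdout and stderr
#$ -pe mpi 64             # request 64 slots in the (assumed) 'mpi' parallel environment
mpirun -np $NSLOTS ./hpcc
"""

# qsub accepts a job script on stdin; print the "Your job ..." acknowledgement.
result = subprocess.run(["qsub"], input=job_script, text=True,
                        capture_output=True, check=True)
print(result.stdout.strip())
```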
OSCAR Limitations and Future Features
• C3
  – Timeouts with dead nodes
• SGE limitations
  – Does not work after a reimage
• MPI limitations
  – No Infiniband, no OFED, no alternative compilers
• IPMI support
  – Reboot nodes, predict failures (see the sketch below)
• Application checkpoint/restart
  – Linux is weak here out of the box
  – This could be a killer feature.
• No non-head-node /home NFS possible.
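The kind of IPMI automation this wish-list item points at is sketched below with the standard ipmitool CLI: power-cycling a hung node and pulling sensor readings to watch for impending failures. The BMC hostnames, username, and password are hypothetical placeholders, and this is not an OSCAR feature, just an illustration of what built-in support could wrap.

```python
# Sketch: reboot nodes and read sensors over IPMI with ipmitool.
import subprocess

BMC_USER = "admin"        # placeholder credentials
BMC_PASS = "changeme"

def ipmi(bmc_host: str, *args: str) -> str:
    """Run an ipmitool command against one node's BMC over the LAN interface."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", bmc_host,
           "-U", BMC_USER, "-P", BMC_PASS, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    # Hypothetical hosts: power-cycle a dead node, check temperatures on another.
    print(ipmi("node3-bmc", "chassis", "power", "cycle"))
    print(ipmi("node5-bmc", "sdr", "type", "Temperature"))
```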
Concluding remarks
• Clusters have evolved; the tools need to keep evolving.
• Students can use tools like OSCAR to build clusters. It's not rocket surgery.
• Support for clusters with high-performance interconnects and non-standard configurations needs to be addressed.
Acknowledgements
Brent Gorda and the Cluster Challenge organizers.
SGI: Dan St.-Germain, Jimmy Scoi, Martin Pinard, Marcel Bourque, John Baron, Kah-Song Cho, corporate.
Computing Science at the University of Alberta: Cam Macdonell, Yang Wang, Neil Burch, Steve Sutphen, Jonathan Schaeffer, Jill Bagwe, Sheryl Mayko, Carol Smith, Ruth Oh.
University of Alberta: Alex Brown, Paul Myers, Mariusz Klobukowski, Ron Senda, Mark Gordon, Greg Lukeman, Keith Thompson, Yimin Liu, Asia Embroidery.
SC08 Cluster Challenge: showcasing the Next Generation of High-Performance Computing Professionals
Austin Convention Center, November 15-21, 2008

This year, SC08 invites teams of undergraduate students to rise to a new Cluster Challenge.

The SC08 Cluster Challenge is a showcase event in which teams of next-generation high performance computing talent harness the incredible power of current-generation cluster hardware. This challenge brings together an international field of teams that share a "need for speed" and a willingness to earn the top prize. The event promises to be exciting, educational and a truly rewarding experience for all involved.

Taking place Nov. 15-21, 2008, at the Austin Convention Center in Austin, TX, six teams of undergraduates working with a faculty adviser and cluster vendors will assemble, test and tune their machines until the green flag drops on Monday night as the Exhibit Opening Gala is winding down. The race now begins as the teams are given data sets for the contest. With CPUs roaring, teams will be off to analyze and optimize the workload to achieve maximum points over the next two days.

In full view of conference attendees, teams will execute the prescribed workload while showing progress and science visualization output on large displays in their areas. As they race to the finish, the team with the most points will earn the checkered flag, presented at the awards ceremony on Thursday. After the checkered flag drops, teams are invited to partake in the side-show, where they can spin their wheels and show off what they've learned and what they can do with the equipment, with demonstrations that defy gravity, simulate blood flow, visualize earthquakes, search the genome, or perhaps even model a cure for AIDS.

Are you up to the challenge?

To be considered for one of the six teams:
1) Visit http://sc08.supercomputing.org for rules and entry details.
2) Form a team of up to six undergraduates, plus a faculty supervisor.
3) Contact a cluster vendor for equipment and support.
4) Submit proposal.

Important dates:
Entries due: July 31, 2008
Notification: August 15, 2008

Questions: visit http://sc08.supercomputing.org or email cluster-challenge@info.supercomputing.org

SC08 Sponsors: IEEE Computer Society, ACM SIGARCH