

slide-1
SLIDE 1

Ryota Yasudo, Koji Nakano, Yasuaki Ito (Hiroshima University, Japan) Masaru Tatekawa, Ryota Katsuki, Takashi Yazane, Yoko Inaba (NTT DATA Corporation, Japan)

Adaptive Bulk Search: Solving Quadratic Unconstrained Binary Optimization Problems on Multiple GPUs

Presentation at ICPP2020


slide-2
SLIDE 2

Quadratic Unconstrained Binary Optimization (QUBO)

  • Input: an n × n symmetric weight matrix W = (W_ij)
  • Minimize: the energy

$$E(X) = X^{T} W X = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} W_{ij} x_i x_j$$

QUBO is the NP-hard problem of finding an n-bit vector X = x_0 x_1 ⋯ x_{n−1} (x_i ∈ {0, 1}) with the minimum energy.

  • Example (n = 4), weight matrix W:

     1  −1   2  −2
    −1   5   3   4
     2   3  −3  −5
    −2   4  −5   4

For X = 1011, the term W_ij x_i x_j is nonzero only when x_i = x_j = 1:

$$E(1011) = 1 - 0 + 2 - 2 - 0 + 0 + 0 + 0 + 2 + 0 - 3 - 5 - 2 + 0 - 5 + 4 = -8$$
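The definition can be transcribed directly. The following minimal Python sketch (the helper names `energy` and `brute_force_min` are illustrative, not from the slides) reproduces E(1011) = −8 for the example matrix. For such a tiny n it can also find the minimum by trying all 2^n vectors, which is exactly what becomes infeasible for large n.

```python
from itertools import product

def energy(W, x):
    """E(X) = sum_i sum_j W[i][j] * x_i * x_j for a 0/1 list x = [x_0, ..., x_{n-1}]."""
    n = len(x)
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def brute_force_min(W):
    """Exhaustive search over all 2^n vectors; only practical for very small n."""
    n = len(W)
    return min((energy(W, list(x)), list(x)) for x in product((0, 1), repeat=n))

# The 4 x 4 example weight matrix from the slide
W = [
    [ 1, -1,  2, -2],
    [-1,  5,  3,  4],
    [ 2,  3, -3, -5],
    [-2,  4, -5,  4],
]

print(energy(W, [1, 0, 1, 1]))  # -8
print(brute_force_min(W))       # (-9, [0, 0, 1, 1])
```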
slide-3
SLIDE 3

Why is QUBO important?

  • Solving QUBO can be an alternative to quantum annealing (QA).
– A QUBO instance is equivalent to an Ising model, which is the model used in quantum annealing.
– In the Ising model, each spin takes the value 1 or −1, and there is an interaction between two adjacent spins.
  • Many NP-hard problems, such as MAX-CUT and TSP, can easily be converted to QUBO.
– A QUBO solver is therefore a solver for a wide range of combinatorial optimization problems.
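The equivalence with the Ising model can be checked numerically. This sketch assumes the substitution x_i = (1 + s_i)/2 with s_i ∈ {−1, 1} and writes the Ising energy as Σ_{i≠j} J_ij s_i s_j + Σ_i h_i s_i + c; sign conventions for J and h vary between references, and `qubo_to_ising` is a hypothetical helper, not part of ABS. For symmetric W, expanding (1 + s_i)(1 + s_j)/4 gives J_ij = W_ij/4, h_i = (1/2)Σ_j W_ij, and a constant offset c.

```python
from fractions import Fraction
from itertools import product

def qubo_energy(W, x):
    n = len(x)
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def qubo_to_ising(W):
    """Return (J, h, c) with E(x) = sum_{i != j} J[i][j] s_i s_j + sum_i h[i] s_i + c,
    where s_i = 2 x_i - 1. Assumes W is symmetric; Fractions keep the result exact."""
    n = len(W)
    J = [[Fraction(W[i][j], 4) if i != j else Fraction(0) for j in range(n)] for i in range(n)]
    h = [Fraction(sum(W[i]), 2) for i in range(n)]
    # Constant: all W_ij / 4, plus the diagonal terms W_ii * s_i^2 / 4 = W_ii / 4
    c = Fraction(sum(map(sum, W)), 4) + Fraction(sum(W[i][i] for i in range(n)), 4)
    return J, h, c

def ising_energy(J, h, c, s):
    n = len(s)
    return (sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
            + sum(h[i] * s[i] for i in range(n)) + c)

W = [[1, -1, 2, -2], [-1, 5, 3, 4], [2, 3, -3, -5], [-2, 4, -5, 4]]
J, h, c = qubo_to_ising(W)
for x in product((0, 1), repeat=4):
    s = [2 * xi - 1 for xi in x]          # spin +1 for bit 1, spin -1 for bit 0
    assert ising_energy(J, h, c, s) == qubo_energy(W, x)
print("QUBO and Ising energies agree on all 16 vectors")
```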

slide-4
SLIDE 4

Our main contributions

  • Propose Adaptive Bulk Search (ABS), a new search method for solving QUBO
– O(1) cost for evaluating the energy of each solution
  • Present an implementation of ABS on multiple GPUs and show results for MAX-CUT, TSP, and random QUBO instances
– 1.2T search rate (it can evaluate 1.2 × 10^12 solutions per second)

slide-5
SLIDE 5

Local search

  • For an n-bit vector X = x_0 x_1 ⋯ x_{n−1}, let us define

$$\mathrm{flip}_k(X) = x_0 x_1 \cdots x_{k-1}\, \overline{x_k}\, x_{k+1} \cdots x_{n-1},$$

in which the k-th bit is flipped.
  • We can use this function to obtain neighbor solutions.

[Figure: the neighbors of X = 10110, each obtained by flipping one bit (00110, 11110, 10010, 10100, 10111), and their own neighbors in turn; e.g., flip_2(X) = 10010]
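A minimal sketch of flip_k and the 1-bit neighborhood on bit strings (the function names are illustrative assumptions), reproducing the five neighbors of 10110 from the figure:

```python
def flip(X, k):
    """flip_k(X): return the bit string X with its k-th bit x_k inverted."""
    return X[:k] + ("1" if X[k] == "0" else "0") + X[k + 1:]

def neighbors(X):
    """All n solutions at Hamming distance 1 from X, for k = 0, 1, ..., n-1."""
    return [flip(X, k) for k in range(len(X))]

print(flip("10110", 2))    # 10010
print(neighbors("10110"))  # ['00110', '11110', '10010', '10100', '10111']
```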

slide-6
SLIDE 6

Naive Evaluation of Energy E(X)

  • Evaluating E(X) directly requires O(n²) cost, since all n² terms of the sum are computed:

$$E(X) = X^{T} W X = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} W_{ij} x_i x_j$$

– Evaluating all n neighbors of X in this way therefore costs O(n³).

[Figure: a 5 × 5 weight matrix W and the neighbors of X = 10110, each evaluated at O(n²) cost]
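To make the O(n²) per-neighbor cost concrete, this sketch (hypothetical helper names) recomputes the energy of every neighbor from scratch; with n neighbors, each summing n² terms, the total is O(n³). It uses the 4 × 4 example matrix from the QUBO definition.

```python
def energy(W, x):
    """Direct evaluation: sums all n^2 terms W_ij x_i x_j, so it costs O(n^2)."""
    n = len(x)
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def naive_neighbor_energies(W, x):
    """E(flip_k(X)) for k = 0, ..., n-1, each recomputed from scratch: O(n^3) in total."""
    result = []
    for k in range(len(x)):
        y = list(x)
        y[k] ^= 1                      # y = flip_k(X)
        result.append(energy(W, y))
    return result

W = [[1, -1, 2, -2], [-1, 5, 3, 4], [2, 3, -3, -5], [-2, 4, -5, 4]]
print(naive_neighbor_energies(W, [1, 0, 1, 1]))  # [-9, 9, 1, 2]
```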
slide-7
SLIDE 7

Incremental computation for E(X)

  • Incremental computation decreases the cost of evaluating the energy. For symmetric W, the energy difference caused by flipping the k-th bit is

$$\Delta_k(X) = E(\mathrm{flip}_k(X)) - E(X) = \pm\Big(2\sum_{j \neq k} W_{kj} x_j + W_{kk}\Big),$$

where the sign is + if x_k = 0 and − if x_k = 1.
  • With this difference computation, the cost of evaluating a neighbor is reduced from O(n²) to O(n), because E(flip_k(X)) = E(X) + Δ_k(X).

[Figure: the neighbors of X = 10110 evaluated as E(X) + Δ_k(X), each at O(n) cost; e.g., E(flip_2(X)) = E(X) + Δ_2(X)]
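The Δ_k formula translates into an O(n) function. In this sketch (helper names are illustrative; W is assumed symmetric), E(X) is computed once and each neighbor's energy is then E(X) + Δ_k(X); on the 4 × 4 example it matches the from-scratch values E(0011) = −9, E(1111) = 9, E(1001) = 1, and E(1010) = 2.

```python
def energy(W, x):
    n = len(x)
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def delta(W, x, k):
    """Delta_k(X) = E(flip_k(X)) - E(X) in O(n); assumes W is symmetric.
    The sign is + when x_k = 0 (the bit turns on) and - when x_k = 1 (it turns off)."""
    s = W[k][k] + 2 * sum(W[k][j] * x[j] for j in range(len(x)) if j != k)
    return s if x[k] == 0 else -s

W = [[1, -1, 2, -2], [-1, 5, 3, 4], [2, 3, -3, -5], [-2, 4, -5, 4]]
x = [1, 0, 1, 1]
e = energy(W, x)                                # -8, computed once in O(n^2)
print([e + delta(W, x, k) for k in range(4)])   # [-9, 9, 1, 2], O(n) per neighbor
```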
slide-8
SLIDE 8

Incremental computation for Δ(X)

  • Let us consider how Δ_k changes after x_i is flipped (e.g., n = 4, k = 2, i = 1):

$$\Delta_k(\mathrm{flip}_i(X)) = E(\mathrm{flip}_k(\mathrm{flip}_i(X))) - E(\mathrm{flip}_i(X)) = \begin{cases} -\Delta_k(X) & \text{if } i = k \\ \Delta_k(X) \pm 2W_{ki} & \text{otherwise} \end{cases}$$

– If i ≠ k, only the term W_ki x_i of the sum in Δ_k changes when x_i is flipped; the sign is + if x_i = x_k and − if x_i ≠ x_k (bits of X before the flip).
  • Evaluating all n neighbors costs O(n), and the cost is reduced to O(1) per solution by storing Δ_k for all n bits in memory: after a flip, each stored Δ_k is updated in O(1).

[Figure: X = 10110 and Y = flip_2(X), with the stored values Δ_0(X), …, Δ_4(X) updated to Δ_0(Y), …, Δ_4(Y)]
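Storing the Δ values and applying the update rule after every flip can be sketched as follows. The names `init_deltas` and `flip_and_update` are hypothetical; the sign test compares x_k with x_i before x_i is flipped, which is what the ± in the update rule encodes.

```python
def init_deltas(W, x):
    """Delta_k(X) for every k from scratch: O(n) each, O(n^2) in total (W symmetric)."""
    n = len(x)
    deltas = []
    for k in range(n):
        s = W[k][k] + 2 * sum(W[k][j] * x[j] for j in range(n) if j != k)
        deltas.append(s if x[k] == 0 else -s)
    return deltas

def flip_and_update(W, x, deltas, i):
    """Flip x_i in place and update all stored Deltas in O(n), i.e. O(1) per neighbor."""
    for k in range(len(x)):
        if k == i:
            deltas[k] = -deltas[k]           # flipping the same bit back undoes the change
        elif x[k] == x[i]:                   # bits compared *before* x_i is flipped
            deltas[k] += 2 * W[k][i]
        else:
            deltas[k] -= 2 * W[k][i]
    x[i] ^= 1

W = [[1, -1, 2, -2], [-1, 5, 3, 4], [2, 3, -3, -5], [-2, 4, -5, 4]]
x = [1, 0, 1, 1]
deltas = init_deltas(W, x)       # [-1, 17, 9, 10]
flip_and_update(W, x, deltas, 2)
print(x, deltas)                 # [1, 0, 0, 1] [3, 11, -9, 0]
```

The final check in practice is that the incrementally updated list equals `init_deltas` recomputed on the new vector.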

slide-9
SLIDE 9

Selection policy

  • Store Δ_i for all n bits; we can then select a bit from all n bits based on the Δ values.
  • Hill climbing: select the best bit from all n bits.
– A good solution is always selected (low temperature), so we cannot escape from a local minimum.
  • Our proposed policy: select the best bit from a segment of the bits.
– A bad solution is sometimes selected (high temperature), which is analogous to simulated annealing (SA).

[Figure: energy landscapes for both policies, and the stored values Δ_0(X), Δ_1(X), …, Δ_{n−1}(X) divided into segments]

slide-10
SLIDE 10

Our selection policy

  • 1. Compare Δ of the bits in a segment.
  • 2. Flip the bit with the minimum Δ in the segment.
  • 3. Repeat the same operations in the following segment.

Advantages
1. No random number generation (e.g., cuRAND)
2. A flipped bit is not re-flipped in the following several iterations.
3. A bit-flipping operation is always performed in every iteration; in SA, a bit may not be flipped.

[Figure: iterations 1, 2, and 3 on the vector 0110110110011100; each iteration flips one bit in the current segment, and bits outside it are never flipped at that iteration]
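The three steps can be sketched on top of stored Δ values. This is a hypothetical CPU sketch, not the ABS CUDA kernel; it assumes contiguous segments visited cyclically (with a shorter final segment when the segment size does not divide n) and always flips a bit, even when the smallest Δ in the segment is positive.

```python
def init_deltas(W, x):
    n = len(x)
    deltas = []
    for k in range(n):
        s = W[k][k] + 2 * sum(W[k][j] * x[j] for j in range(n) if j != k)
        deltas.append(s if x[k] == 0 else -s)
    return deltas

def flip_and_update(W, x, deltas, i):
    for k in range(len(x)):
        if k == i:
            deltas[k] = -deltas[k]
        elif x[k] == x[i]:
            deltas[k] += 2 * W[k][i]
        else:
            deltas[k] -= 2 * W[k][i]
    x[i] ^= 1

def segment_search(W, x, segment_size, iterations):
    """In each iteration, flip the bit with the minimum Delta in the current segment
    (even if that Delta is positive), then move on to the next segment cyclically."""
    n = len(x)
    x = list(x)
    deltas = init_deltas(W, x)
    e = sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    best_x, best_e = list(x), e
    start = 0
    for _ in range(iterations):
        segment = range(start, min(start + segment_size, n))
        k = min(segment, key=lambda j: deltas[j])   # deterministic: no random numbers
        e += deltas[k]                               # energy of the new solution in O(1)
        flip_and_update(W, x, deltas, k)
        if e < best_e:
            best_x, best_e = list(x), e
        start = start + segment_size if start + segment_size < n else 0
    return best_x, best_e

W = [[1, -1, 2, -2], [-1, 5, 3, 4], [2, 3, -3, -5], [-2, 4, -5, 4]]
print(segment_search(W, [1, 0, 1, 1], segment_size=2, iterations=8))  # ([0, 0, 1, 1], -9)
```

On the 4-bit example with segments {0, 1} and {2, 3}, the first iteration flips x_0 and reaches 0011 (E = −9); the second iteration must flip x_2 or x_3 even though both increase the energy, which is how the policy moves away from local minima. Under this sketch, a larger segment makes each choice greedier, closer to hill climbing.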

slide-11
SLIDE 11
Concept of Adaptive Bulk Search (ABS)

  • Perform a significant number of local searches
– cost efficient (the O(1) algorithm described before)
– full utilization of GPU computation resources
– adaptive local search (the search algorithm can be changed)
– combination with a global search (genetic algorithm)
  • The global search on the host (CPU) and the local searches on the GPUs operate asynchronously.
– Solution pool: vectors X_0, X_1, X_2, … with their energies E(X_0), E(X_1), E(X_2), …
– 1. Crossover: randomly select two vectors, then randomly select either one for each bit.
– 2. Mutation: randomly flip bits with a certain probability.
– 3. Local search
slide-12
SLIDE 12

Overview of ABS

  • Host (CPU): keeps the solution pool X_0, X_1, X_2, … with their energies E(X_0), E(X_1), E(X_2), … and runs the GA (crossover and mutation) to create target solutions.
  • Devices (GPUs): each GPU has a target buffer and a solution buffer; each CUDA block performs a search in parallel and writes its best solution and energy to the solution buffer.
  • A host and a GPU communicate asynchronously.

[Figure: the host writes targets into the target buffer of each GPU and reads best solutions and energies back from the solution buffers]
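The host/device data flow can be mimicked on a CPU. In this sketch, Python threads stand in for CUDA blocks and `queue.Queue` objects stand in for the target and solution buffers; the function names, pool size, mutation probability 0.1, and the placeholder one-step search inside each block are all assumptions, not the actual ABS kernel.

```python
import queue
import random
import threading

def energy(W, x):
    n = len(x)
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def cuda_block(W, target_buffer, solution_buffer):
    """Stand-in for one CUDA block: process targets until a None sentinel arrives."""
    while True:
        target = target_buffer.get()
        if target is None:
            return
        # Placeholder search: the best of the target and its 1-bit neighbors
        candidates = [target] + [target[:k] + [target[k] ^ 1] + target[k + 1:]
                                 for k in range(len(target))]
        best = min(candidates, key=lambda y: energy(W, y))
        solution_buffer.put((energy(W, best), best))

def host(W, num_blocks=4, num_targets=32, pool_size=8, seed=0):
    """Hypothetical host loop: GA on a pool of (energy, vector) pairs, asynchronous I/O."""
    rng = random.Random(seed)
    n = len(W)
    pool = []
    for _ in range(pool_size):
        x = [rng.randint(0, 1) for _ in range(n)]
        pool.append((energy(W, x), x))
    target_buffer, solution_buffer = queue.Queue(), queue.Queue()
    blocks = [threading.Thread(target=cuda_block, args=(W, target_buffer, solution_buffer))
              for _ in range(num_blocks)]
    for t in blocks:
        t.start()

    def absorb():
        # Merge every solution that has already arrived; never wait for the blocks.
        while not solution_buffer.empty():
            pool.append(solution_buffer.get_nowait())
        pool.sort()
        del pool[pool_size:]          # keep the pool_size lowest-energy solutions

    for _ in range(num_targets):
        (_, p1), (_, p2) = rng.sample(pool, 2)
        child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]   # crossover
        child = [c ^ 1 if rng.random() < 0.1 else c for c in child]        # mutation
        target_buffer.put(child)
        absorb()

    for _ in blocks:
        target_buffer.put(None)       # one stop sentinel per block
    for t in blocks:
        t.join()
    absorb()
    return pool[0]                    # (best energy, best vector)

W = [[1, -1, 2, -2], [-1, 5, 3, 4], [2, 3, -3, -5], [-2, 4, -5, 4]]
best_e, best_x = host(W)
print(best_e, best_x)
```

Because the host only drains whatever has already arrived, the result can vary from run to run with thread scheduling; that non-determinism is inherent to the asynchronous design being illustrated.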

slide-13
SLIDE 13

π‘ˆ π‘ˆ π‘ˆ target buffer solution buffer iteration i iteration i + 1 iteration i + 2 time

local search

B

<latexit sha1_base64="rCx6YZKyFarkehd6/kmCVau7Ixc=">AB8nicbVDLSgMxFM3UV62vqks3wSK4KjNV0GWpG5cV7AOmQ8mkmTY0kwzJHaEM/Qw3LhRx69e482/MtLPQ1gOBwzn3knNPmAhuwHW/ndLG5tb2Tnm3srd/cHhUPT7pGpVqyjpUCaX7ITFMcMk6wEGwfqIZiUPBeuH0Lvd7T0wbruQjzBIWxGQsecQpASv5g5jAhBKRtebDas2tuwvgdeIVpIYKtIfVr8FI0TRmEqgxviem0CQEQ2cCjavDFLDEkKnZMx8SyWJmQmyReQ5vrDKCEdK2ycBL9TfGxmJjZnFoZ3MI5pVLxf/8/wUotsg4zJgUm6/ChKBQaF8/vxiGtGQcwsIVRzmxXTCdGEgm2pYkvwVk9eJ91G3buqNx6ua81WUcZnaFzdIk8dIOa6B61UQdRpNAzekVvDjgvzrvzsRwtOcXOKfoD5/MHcyGRXA=</latexit>

B

<latexit sha1_base64="rCx6YZKyFarkehd6/kmCVau7Ixc=">AB8nicbVDLSgMxFM3UV62vqks3wSK4KjNV0GWpG5cV7AOmQ8mkmTY0kwzJHaEM/Qw3LhRx69e482/MtLPQ1gOBwzn3knNPmAhuwHW/ndLG5tb2Tnm3srd/cHhUPT7pGpVqyjpUCaX7ITFMcMk6wEGwfqIZiUPBeuH0Lvd7T0wbruQjzBIWxGQsecQpASv5g5jAhBKRtebDas2tuwvgdeIVpIYKtIfVr8FI0TRmEqgxviem0CQEQ2cCjavDFLDEkKnZMx8SyWJmQmyReQ5vrDKCEdK2ycBL9TfGxmJjZnFoZ3MI5pVLxf/8/wUotsg4zJgUm6/ChKBQaF8/vxiGtGQcwsIVRzmxXTCdGEgm2pYkvwVk9eJ91G3buqNx6ua81WUcZnaFzdIk8dIOa6B61UQdRpNAzekVvDjgvzrvzsRwtOcXOKfoD5/MHcyGRXA=</latexit>

B

<latexit sha1_base64="rCx6YZKyFarkehd6/kmCVau7Ixc=">AB8nicbVDLSgMxFM3UV62vqks3wSK4KjNV0GWpG5cV7AOmQ8mkmTY0kwzJHaEM/Qw3LhRx69e482/MtLPQ1gOBwzn3knNPmAhuwHW/ndLG5tb2Tnm3srd/cHhUPT7pGpVqyjpUCaX7ITFMcMk6wEGwfqIZiUPBeuH0Lvd7T0wbruQjzBIWxGQsecQpASv5g5jAhBKRtebDas2tuwvgdeIVpIYKtIfVr8FI0TRmEqgxviem0CQEQ2cCjavDFLDEkKnZMx8SyWJmQmyReQ5vrDKCEdK2ycBL9TfGxmJjZnFoZ3MI5pVLxf/8/wUotsg4zJgUm6/ChKBQaF8/vxiGtGQcwsIVRzmxXTCdGEgm2pYkvwVk9eJ91G3buqNx6ua81WUcZnaFzdIk8dIOa6B61UQdRpNAzekVvDjgvzrvzsRwtOcXOKfoD5/MHcyGRXA=</latexit>

best solution & energy target solution

…

13

Ο(n2)

Incremental computation is not possible.

Ο(n2) Ο(n2) CUDA block CUDA block

… … Devices (GPUs)

target buffer solution buffer

CUDA block CUDA block

… …

target buffer solution buffer

…

00110β‹― 10100β‹― 01110β‹―

𝐹 𝐹 𝐹

energy

11101β‹― 𝐹 01010β‹― 𝐹 10110β‹― 10100β‹― 00110β‹― 10100β‹― 00001β‹― 10110β‹― 10100β‹― 00110β‹― 10100β‹― 00001β‹― 00110β‹― 10100β‹― 01110β‹―

𝐹 𝐹 𝐹

energy

11101β‹― 𝐹 01010β‹― 𝐹

slide-14
SLIDE 14

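Straight Search

  • Starting from the current solution X, move toward a target solution T one bit at a time; the best neighbor is selected at each step.
– k-neighbor: a solution at Hamming distance k
– Bits in which X already agrees with T are prohibited from being flipped.
  • Evaluating all n neighbors costs O(n) per step, and reaching T takes d steps, where d is the Hamming distance between X and T. The total computational cost, O(dn), is therefore less than O(n²) whenever d < n.

[Figure: straight search from X = 00100101 to the target solution T = 10010111 through intermediate solutions]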
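A hypothetical sketch of straight search that reuses stored Δ values: at each step, only bits in which x still differs from the target may be flipped, and the allowed bit with the minimum Δ is chosen. The helper names and the example target are assumptions.

```python
def energy(W, x):
    n = len(x)
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def init_deltas(W, x):
    """Delta_k(X) for all k from scratch: O(n^2) in total (W symmetric)."""
    n = len(x)
    deltas = []
    for k in range(n):
        s = W[k][k] + 2 * sum(W[k][j] * x[j] for j in range(n) if j != k)
        deltas.append(s if x[k] == 0 else -s)
    return deltas

def flip_and_update(W, x, deltas, i):
    """Flip x_i in place and update all stored Deltas in O(n)."""
    for k in range(len(x)):
        if k == i:
            deltas[k] = -deltas[k]
        elif x[k] == x[i]:
            deltas[k] += 2 * W[k][i]
        else:
            deltas[k] -= 2 * W[k][i]
    x[i] ^= 1

def straight_search(W, x, deltas, e, target):
    """Walk from x to target one bit per step. Bits where x already equals target are
    prohibited; the allowed bit with the minimum Delta is flipped. x and deltas are
    updated in place; returns the energies of all visited solutions."""
    visited = [e]
    while True:
        allowed = [k for k in range(len(x)) if x[k] != target[k]]
        if not allowed:
            return visited
        k = min(allowed, key=lambda j: deltas[j])
        e += deltas[k]
        flip_and_update(W, x, deltas, k)    # O(n) per step, d steps in total
        visited.append(e)

W = [[1, -1, 2, -2], [-1, 5, 3, 4], [2, 3, -3, -5], [-2, 4, -5, 4]]
x = [1, 0, 1, 1]
deltas = init_deltas(W, x)
path = straight_search(W, x, deltas, energy(W, x), target=[0, 1, 0, 0])
print(path)   # [-8, -9, -3, 0, 5]
```

In this example the walk passes through 0011 (E = −9), so intermediate solutions are evaluated along the way, and at the end the stored Δ values of the target are ready for the next local search without an O(n²) recomputation.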

slide-15
SLIDE 15

[Figure: the same timeline with straight search added; on receiving a new target, a CUDA block first performs a straight search to the target, at less than O(n²) cost, and then a local search, and outputs its best solution and energy]

slide-16
SLIDE 16

System configuration

  • 18-core Intel Core i9-9980XE CPU (3.00 GHz)
  • NVIDIA GeForce RTX 2080 Ti GPU × 4 (Turing architecture, compute capability 7.5)
– 11 GB global memory (GDDR6 SDRAM), 64 KB shared memory, 64K 32-bit registers per multiprocessor, 1024 threads per block, and 68 multiprocessors
  • CUDA (v10.0)

slide-17
SLIDE 17

Evaluation methods

  • Evaluate two metrics
– Time-to-solution: the time to obtain good solutions (e.g., best-known solutions)
– Search rate (throughput): the number of energy evaluations per second
  • Three benchmarks for QUBO problems of up to 32k bits
– MAX-CUT (maximum cut problem)
– TSP (travelling salesman problem)
– Randomly generated matrix instances

slide-18
SLIDE 18

Time-to-solution (MAX-CUT)

Graph  n      Type    Edge weight  Target value       Time (s)
G1     800    random  +1           Best-known         0.0723
G6     800    random  ±1           Best-known         0.106
G22    2000   random  +1           99% of best-known  0.110
G27    2000   random  ±1           99% of best-known  0.721
G35    2000   planar  +1           99% of best-known  0.208
G39    2000   planar  ±1           99% of best-known  1.89
G55    5000   random  +1           95% of best-known  0.150
G70    10000  random  +1           95% of best-known  0.360

  • G-set

– https://web.stanford.edu/~yyye/yyye/Gset/

[Figure: an example graph with vertices 0–4 and its 5 × 5 weight matrix W, whose diagonal entries are −2, −3, −2, −3, −2 and whose off-diagonal entries are 1 for each edge]
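A common way to build W for MAX-CUT, assumed here rather than taken from the slides, starts from cut(X) = Σ_{(i,j)∈E} w_ij (x_i + x_j − 2x_i x_j) and minimizes E(X) = −cut(X): each diagonal entry becomes the negative weighted degree, and each edge contributes w_ij to both W_ij and W_ji. This appears consistent with the example weight matrix on this slide (negative degrees on the diagonal, 1 for each edge). The 5-vertex graph below is hypothetical.

```python
from itertools import product

def maxcut_to_qubo(n, edges):
    """W such that E(X) = -(weight of the cut), for edges given as (i, j, w).
    x_i = 0/1 says on which side of the cut vertex i lies."""
    W = [[0] * n for _ in range(n)]
    for i, j, w in edges:
        W[i][i] -= w          # diagonal: minus the (weighted) degree
        W[j][j] -= w
        W[i][j] += w          # W_ij + W_ji = 2w cancels the edge when both ends are 1
        W[j][i] += w
    return W

def energy(W, x):
    n = len(x)
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def cut_weight(edges, x):
    return sum(w for i, j, w in edges if x[i] != x[j])

# A hypothetical 5-vertex graph with unit edge weights
edges = [(0, 1, 1), (0, 2, 1), (1, 3, 1), (1, 4, 1), (2, 3, 1), (3, 4, 1)]
W = maxcut_to_qubo(5, edges)
print(all(energy(W, x) == -cut_weight(edges, x) for x in product((0, 1), repeat=5)))  # True
print(min(energy(W, x) for x in product((0, 1), repeat=5)))  # -5, a maximum cut of 5 edges
```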

slide-19
SLIDE 19

Time-to-solution (TSP)

Problem    #city  n     Target            Time (s)
ulysses16  16     225   Best-known        0.11
bayg29     29     784   Best-known        0.69
dantzig42  42     1681  Best-known + 5%   1.25
berlin52   52     2601  Best-known + 5%   1.79
st70       70     4761  Best-known + 10%  4.19

  • TSPLIB

– http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/

  • An N-city problem can be converted into an (N − 1)²-bit QUBO; e.g., n = 15² = 225 for ulysses16.

[Figure: a 5-city problem (cities A–E) and its 16 × 16 weight matrix W]
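One common (N − 1)²-bit encoding, assumed here since the slide does not give the formulation, fixes city 0 as the start and uses one bit for each (city, step) pair of the remaining N − 1 cities. One-hot penalties with weight A enforce that each city appears at exactly one step and each step holds exactly one city. With nonnegative distances, any A larger than the optimal tour length makes every invalid bit vector worse than the optimal tour, since an invalid vector pays at least A in penalties. The 4-city instance and A = 20 are hypothetical.

```python
from itertools import product

def tsp_to_qubo(D, A):
    """(N-1)^2-bit QUBO for an N-city TSP with city 0 fixed as the start (assumed
    formulation). Bit idx(c, t) = 1 means city c (1..N-1) is visited at step t (1..N-1).
    For a valid tour, E(X) = tour length - 2A(N-1)."""
    N = len(D)
    m = N - 1

    def idx(c, t):
        return (c - 1) * m + (t - 1)

    W = [[0.0] * (m * m) for _ in range(m * m)]
    for c in range(1, N):
        for t in range(1, N):
            p = idx(c, t)
            W[p][p] -= 2 * A                 # linear parts of the two one-hot penalties
            for u in range(1, N):
                if u != t:
                    W[p][idx(c, u)] += A     # city c at two different steps
                if u != c:
                    W[p][idx(u, t)] += A     # two different cities at step t
    for c in range(1, N):
        W[idx(c, 1)][idx(c, 1)] += D[0][c]   # leave city 0 at step 1
        W[idx(c, m)][idx(c, m)] += D[c][0]   # return to city 0 after step N-1
        for c2 in range(1, N):
            if c2 != c:
                for t in range(1, m):
                    p, q = idx(c, t), idx(c2, t + 1)
                    W[p][q] += D[c][c2] / 2  # split so that W stays symmetric
                    W[q][p] += D[c][c2] / 2
    return W

def energy(W, x):
    n = len(x)
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

D = [[0, 1, 4, 3], [1, 0, 2, 5], [4, 2, 0, 1], [3, 5, 1, 0]]   # hypothetical 4-city instance
A = 20                                                         # penalty weight (assumption)
W = tsp_to_qubo(D, A)
best = min(product((0, 1), repeat=9), key=lambda x: energy(W, x))
print(len(W), energy(W, best) + 2 * A * 3)   # 9 7.0  (tour 0-1-2-3-0 has length 7)
```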

slide-20
SLIDE 20

Time-to-solution (random)

Problem  n      Target             Time (s)
Random   1024   Best-known         0.0172
Random   2048   Best-known         0.0413
Random   4096   Best-known         1.04
Random   16384  99% of best-known  0.417
Random   32768  99% of best-known  1.79

  • Every weight is selected uniformly at random from all 16-bit integers (−32768 to 32767).
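A sketch of the instance generator described above, assuming the upper triangle is drawn and mirrored so that W stays symmetric (the slide does not say how symmetry is obtained); `random_qubo` is a hypothetical name.

```python
import random

def random_qubo(n, seed=None):
    """Symmetric n x n weight matrix whose entries are drawn uniformly from the 16-bit
    integers -32768..32767 (upper triangle drawn, then mirrored: an assumption)."""
    rng = random.Random(seed)
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            W[i][j] = W[j][i] = rng.randint(-32768, 32767)   # randint includes both ends
    return W

W = random_qubo(8, seed=1)
print(W[0])
```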

slide-21
SLIDE 21

Search rate (#evaluation/second)

n    Search rate [T/s]
1k   1.24
2k   1.01
4k   0.732
8k   0.537
16k  0.578
32k  0.439

  • 60x faster than prior work [Matsubara et al., "Ising-Model Optimizer with Parallel-Trial Bit-Sieve Engine", Int. Conf. on Complex, Intelligent, and Software Intensive Systems 2017]
  • Scalability: linearly improved

slide-22
SLIDE 22

Conclusions

  • A QUBO solver is potentially a solver for a wide range of problems.
  • We have proposed Adaptive Bulk Search (ABS), a framework for solving Quadratic Unconstrained Binary Optimization (QUBO) problems.
– Efficient algorithm for evaluating energy
– Local search, GA, and straight search
  • We have implemented ABS on four GPUs; the implementation supports QUBO problems of up to 32k bits.
– ABS finds approximate solutions for MAX-CUT, TSP, and random QUBO problems.
– ABS attains a 1.2T search rate for 1k-bit QUBO problems.