OpenLoops 2 M. F. Zoller in collaboration with F. Buccioni, J.-N. - - PowerPoint PPT Presentation



slide-1
SLIDE 1

OpenLoops 2

  • M. F. Zoller

in collaboration with

  • F. Buccioni, J.-N. Lang, J. Lindert, P. Maierhöfer, S. Pozzorini and H. Zhang

[arxiv:1907.13071]

LoopFest XVIII – Fermilab – 08/14/2019

slide-2
SLIDE 2

OpenLoops

[Höche]

OpenLoops is a fully automated numerical tool for the tree and one-loop computation of hard scattering amplitudes required in Monte-Carlo simulations of scattering events

  • Full NLO QCD and NLO EW corrections available
  • Strong CPU performance and excellent numerical stability

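Scattering probability densities in perturbation theory:

$W_{00} = \sum_{\mathrm{hel}}\sum_{\mathrm{col}} |\mathcal{M}_0|^2, \qquad W_{01} = \sum_{\mathrm{hel}}\sum_{\mathrm{col}} 2\,\mathrm{Re}\left[\mathcal{M}_0^{*}\mathcal{M}_1\right], \qquad W_{11} = \sum_{\mathrm{hel}}\sum_{\mathrm{col}} |\mathcal{M}_1|^2$

computed from sums of $l$-loop Feynman diagrams: $\mathcal{M}_0$ = [tree diagrams] + …, $\mathcal{M}_1$ = [one-loop diagrams] + …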
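As a minimal sketch of how the three probability densities are assembled, the snippet below sums over a flat list of helicity/colour configurations. The amplitude values and the function name `probability_densities` are purely illustrative assumptions, not OpenLoops output or API.

```python
def probability_densities(m0, m1):
    """Return (W00, W01, W11) from tree (m0) and one-loop (m1) amplitudes.

    Each list entry is the complex amplitude for one helicity/colour
    configuration; the sums over hel and col are flattened into one loop.
    """
    w00 = sum(abs(a) ** 2 for a in m0)
    w01 = sum(2.0 * (a.conjugate() * b).real for a, b in zip(m0, m1))
    w11 = sum(abs(b) ** 2 for b in m1)
    return w00, w01, w11


# Hypothetical toy amplitudes for two configurations.
M0 = [1.0 + 0.0j, 0.0 + 1.0j]
M1 = [0.5 + 0.5j, 0.0 + 2.0j]
W00, W01, W11 = probability_densities(M0, M1)
print(W00, W01, W11)
```

W01 is the Born-loop interference entering NLO predictions, while W11 is the squared one-loop amplitude relevant for loop-induced processes.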

slide-3
SLIDE 3

Applications of OpenLoops

Interfaces to many Monte Carlo programs (unchanged from OpenLoops 1):

  • Sherpa [Höche, Krauss, Schönherr, Siegert et al.] → NLO matching and merging
  • Munich/Matrix [Grazzini, Kallweit, Rathlev, Wiesemann] → (N)NLO parton-level MC
  • Powheg [Nason, Oleari et al.]
  • Herwig [Gieseke, Plätzer et al.]
  • Geneva [Alioli, Bauer, Tackmann et al.]
  • Whizard [Kilian, Ohl, Reuter et al.]

Many OpenLoops applications:

  • NLO QCD and NLO+PS for any 2 → 2, 3, 4 SM process at LHC and future colliders
  • NLO EW for any 2 → 2, 3, 4 SM process [Kallweit, Lindert, Maierhöfer, Pozzorini, Schönherr]
  • NNLO QCD for pp → V, V V with Matrix [Grazzini, Kallweit, Rathlev, Wiesemann]

First OpenLoops 2 applications (2019):

  • NNLO QCD spin correlations in $t\bar{t}$ production [Behring, Czakon, Mitov, Papanastasiou, Poncelet]
  • NLO QCD $t\bar{t}b\bar{b}$+jet production [Buccioni, Kallweit, Pozzorini, M.Z.]

slide-4
SLIDE 4

Outline

  • I. Program structure of OpenLoops 2
  • II. Full NLO QCD and EW corrections in the SM

– Power counting in αS and α
– Input schemes and parameters
– Evaluation of amplitudes → Automated scale variations
– Colour- and spin-correlators

  • III. The on-the-fly algorithm

– Recursive construction of tree and loop diagrams
– On-the-fly reduction, merging and helicity summation
– On-the-fly stability system → Numerical stability and performance

  • IV. Summary and Outlook

slide-5
SLIDE 5
  • I. The structure of OpenLoops 2
  • OpenLoops program (public): User interfaces and process-independent OpenLoops routines.

Available as a tarball from https://openloops.hepforge.org or from the git repository: git clone https://gitlab.com/openloops/OpenLoops.git

  • Process generator (not public): Performs analytical steps (e.g. colour factors) and generates process-dependent code for the numerical calculation → stored in process libraries
  • Process libraries (public): automatically downloaded by the user.

A process library contains all partonic channels for a process class.

Example: ppjj contains $d\bar{d} \to d\bar{d}$, $u\bar{u} \to d\bar{d}$, $d\bar{d} \to gg$, $gg \to gg$, etc. and the real corrections $d\bar{d} \to d\bar{d}g$, $u\bar{u} \to d\bar{d}g$, etc.

Also: particle permutations + channel maps, e.g. $b\bar{b} \to gg$ mapped to $d\bar{d} \to gg$ for $M_B = 0$.

More than 200 process libraries available for all relevant SM processes (+ HEFT) → see https://openloops.hepforge.org

Additional libraries provided upon user request

  • Third-party tools for integral evaluation (included): Collier 1.2.2 [Denner, Dittmaier, Hofer ’16], OneLoop 3.6.1 [van Hameren ’10]

slide-6
SLIDE 6
  • II. Full NLO QCD and EW corrections in the SM

EW corrections enhanced by soft/collinear logarithms from virtual EW bosons:

  • Order $\frac{\alpha}{\pi s_w^2}\ln^2(Q^2/M_W^2) \sim 25\% > \alpha_S$ in observables at the TeV scale

⇒ EW corrections crucial for SM tests and BSM searches at the LHC

But also more challenging than NLO QCD!

  • Virtual corrections involve more particles and masses (γ, Z, W, H, b, t)
  • V → lepton decays: Effective particle multiplicity increased due to final-state interactions and non-factorisation, e.g. pp → ZZ is 2 → 2 in QCD and 2 → 4 with EW (×400 more diagrams)

[Diagrams: quark-gluon loop with virtual γ, Z, W exchange; pp → Z/γ* → ℓ⁺ℓ⁻ topologies]

slide-7
SLIDE 7

Power counting: Nontrivial QCD-EW interplay

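Simple example: $q\bar{q} \to q\bar{q}$ at Born level:

$\mathcal{M}_0 \sim O(e^2) + O(g_S^2) \;\Rightarrow\; \sigma_{q\bar{q}\to q\bar{q}} \sim W_{00} \sim O(\alpha_S^2)\,[\text{QCD}] + O(\alpha_S^1\alpha^1)\,[\text{EW–QCD interf.}] + O(\alpha^2)\,[\text{EW}]$

NLO EW corrections of $O(\alpha_S^2\alpha^1)$ for $q\bar{q} \to q\bar{q}$:

  • EW corrections to QCD Born [diagrams with virtual γ, Z exchange]
  • QCD corrections to EW–QCD interference [diagrams with γ, Z exchange]

→ only the full $O(\alpha_S^2\alpha^1)$ is IR finite
→ O(α) corrections can involve emissions of γ and g, q, $\bar{q}$

In general (e.g. pp → X+jets):

$\mathcal{M}_0 = \sum_{k=0}^{\tilde{n}_{q\bar{q}}} g_S^{\,n-2k}\, e^{\,m+2k}\, \mathcal{M}_0^{(k)}, \qquad \text{where } g_S^{\,n} e^{\,m} \mathcal{M}_0^{(0)} = \mathcal{M}_0\big|_{\text{LO QCD}}$

$\tilde{n}_{q\bar{q}} = n_{q\bar{q}} - 1$ if $n_{q\bar{q}} \geq 1$ (number of external $q\bar{q}$ pairs), else $\tilde{n}_{q\bar{q}} = 0$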

slide-8
SLIDE 8

NLO EW corrections in OpenLoops 2

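$\mathcal{M}_0 = \sum_{k=0}^{\tilde{n}_{q\bar{q}}} g_S^{\,n-2k}\, e^{\,m+2k}\, \mathcal{M}_0^{(k)}$

$\Rightarrow\; W_{00} = \langle \mathcal{M}_0 | \mathcal{M}_0 \rangle \sim O(\alpha_S^{n}\alpha^{m}) + O(\alpha_S^{n-1}\alpha^{m+1}) + \ldots + O(\alpha_S^{n-k}\alpha^{m+k})$

This is an alternating series of dominant contributions involving $|\mathcal{M}_0^{(k)}|^2$ and suppressed pure interference terms involving $\langle \mathcal{M}_0^{(k)} | \mathcal{M}_0^{(k')} \rangle$ with $k \neq k'$.

[Diagram: tower of perturbative orders in the (αS, α) plane. LO: $\alpha_S^{n}\alpha^{m}$, $\alpha_S^{n-1}\alpha^{m+1}$, $\alpha_S^{n-2}\alpha^{m+2}$. NLO: $\alpha_S^{n+1}\alpha^{m}$, $\alpha_S^{n}\alpha^{m+1}$]

⇒ Mixed α–αS power counting with non-trivial interference contributions
⇒ OpenLoops provides any desired order $O(\alpha_S^{n}\alpha^{m})$ in a fully automated way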
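The Born-level power counting can be made concrete with a short enumeration. This is a sketch under the assumption that the Born amplitude has the expansion in $g_S^{n-2k} e^{m+2k}$ quoted on the slides; the function name `born_orders` is hypothetical and unrelated to the OpenLoops interface.

```python
from collections import defaultdict


def born_orders(n, m, n_qqbar):
    """Map (power of alpha_s, power of alpha) -> kinds of terms in W00.

    n, m: powers of alpha_s and alpha of the leading-QCD squared Born.
    n_qqbar: number of external quark-antiquark pairs.
    """
    n_tilde = n_qqbar - 1 if n_qqbar >= 1 else 0
    orders = defaultdict(set)
    for k in range(n_tilde + 1):
        for k_prime in range(n_tilde + 1):
            shift = k + k_prime  # each step trades one alpha_s for one alpha
            kind = "squared" if k == k_prime else "interference"
            orders[(n - shift, m + shift)].add(kind)
    return dict(orders)


# q qbar -> q qbar: n = 2, m = 0, two external q-qbar pairs.
QQ_ORDERS = born_orders(2, 0, 2)
for (p_s, p_ew), kinds in sorted(QQ_ORDERS.items(), reverse=True):
    print(f"alpha_s^{p_s} alpha^{p_ew}: {sorted(kinds)}")
```

For the $q\bar{q} \to q\bar{q}$ example this reproduces the three Born orders of the previous slide: a squared QCD term, a pure EW–QCD interference term, and a squared EW term.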

slide-9
SLIDE 9

Input schemes and parameters

  • Three EW schemes implemented:

  scheme          input parameters                            value of 1/α
  α(0)            α(0), MW, MZ, MH + fermion masses            ≈ 137
  Gµ (default)    Gµ, MW, MZ, MH + fermion masses              ≈ 132
  α(MZ)           α(MZ), MW, MZ, MH + fermion masses           ≈ 128

derived parameters: $\cos^2\theta_w = \mu_W^2/\mu_Z^2$, …

⊲ α(0)-scheme: pure QED interactions at scales $Q^2 \ll M_W^2$, production of on-shell photons
⊲ Gµ-scheme: optimal description of W-interactions at EW scale
⊲ α(MZ)-scheme: hard EW interactions at EW scale (optimal for QED, decent for SU(2))

  • External photons in process $A \to B + n\,\gamma$ (on-shell) $+\; n^{*}\gamma^{*}$ (off-shell) $(+\,\gamma$ from real emission$)$

⇒ rescale with ratios of input α and

$\alpha_{\rm on} = \alpha(0), \qquad \alpha_{\rm off} = \begin{cases} \alpha|_{G_\mu} & \text{if } \alpha = \alpha(0), \\ \alpha & \text{if } \alpha = \alpha|_{G_\mu} \text{ or } \alpha = \alpha(M_Z) \end{cases}$

$\Rightarrow\; W \to \left(\frac{\alpha_{\rm on}}{\alpha}\right)^{n} \left(\frac{\alpha_{\rm off}}{\alpha}\right)^{n^{*}} W$ (no rescaling for real emission)

Optimal scale choice for external on-shell, off-shell and real-emission photons
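The photon rescaling rule can be sketched numerically as follows. The values of 1/α are the approximate ones from the scheme table; the scheme labels and the function name `photon_rescaling` are illustrative assumptions, not OpenLoops parameter names.

```python
# Approximate 1/alpha per input scheme, as quoted in the scheme table.
INV_ALPHA = {"alpha0": 137.0, "Gmu": 132.0, "alphaMZ": 128.0}


def photon_rescaling(scheme, n_on_shell, n_off_shell):
    """Factor multiplying W for n on-shell and n* off-shell external photons."""
    alpha = 1.0 / INV_ALPHA[scheme]
    alpha_on = 1.0 / INV_ALPHA["alpha0"]  # on-shell photons couple with alpha(0)
    # Off-shell photons: alpha|Gmu in the alpha(0) scheme, otherwise the input alpha.
    alpha_off = 1.0 / INV_ALPHA["Gmu"] if scheme == "alpha0" else alpha
    return (alpha_on / alpha) ** n_on_shell * (alpha_off / alpha) ** n_off_shell


# One on-shell photon in the default Gmu scheme: factor alpha(0)/alpha|Gmu.
print(photon_rescaling("Gmu", 1, 0))
```

In the α(0) scheme on-shell photons need no correction, while in the Gµ and α(MZ) schemes off-shell photons are left untouched.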

slide-10
SLIDE 10

Complex masses, Scale variations and Renormalisation

  • Consistent treatment of resonances with the complex mass scheme at 1-loop [Denner, Dittmaier]

→ complex mass $\mu_p^2 = M_p^2 - i M_p\Gamma_p$ from real physical mass $M_p$ and width $\Gamma_p$ as input
⇒ implemented in a flexible way, i.e. mix between on-shell and off-shell massive particles allowed
⇒ Consistent calculation of e.g. $pp \to t\bar{t}Z \to t\bar{t}\,\ell^+\ell^-$ with off-shell Z at NLO EW

  • Different renormalisation schemes implemented, e.g. on-shell or $\overline{\mathrm{MS}}$ for quark masses; different flavour schemes for αS

  • Efficient QCD scale variations: if scattering amplitudes are re-evaluated multiple times with different values of µr and αS (all other input and kinematic parameters fixed):

→ For each new phase-space point, matrix elements are computed and stored in a cache.
→ For (µr, αS) variations, only µr-dependent QCD counterterms are explicitly re-computed and the bare amplitude from the cache is re-scaled according to its αS-dependence.
⇒ Highly efficient, fully automated algorithm for scale variations
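A minimal numerical sketch of the complex mass scheme is given below. The masses and widths are rough illustrative values chosen for this example (assumptions, not OpenLoops defaults), and `complex_mass_squared` is a hypothetical helper.

```python
def complex_mass_squared(mass, width):
    """Complex squared mass mu^2 = M^2 - i*M*Gamma."""
    return complex(mass ** 2, -mass * width)


# Illustrative inputs in GeV (assumed values, not program defaults).
M_W, GAMMA_W = 80.4, 2.1
M_Z, GAMMA_Z = 91.2, 2.5

MU2_W = complex_mass_squared(M_W, GAMMA_W)
MU2_Z = complex_mass_squared(M_Z, GAMMA_Z)

# Derived parameter: the weak mixing angle becomes complex as well.
COS2_THETA_W = MU2_W / MU2_Z
SIN2_THETA_W = 1.0 - COS2_THETA_W
print(COS2_THETA_W, SIN2_THETA_W)
```

Setting a width to zero recovers the real on-shell mass, which is how on-shell and off-shell massive particles can coexist in one calculation.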

slide-11
SLIDE 11

More OpenLoops 2 features

  • Colour and charge correlators → IR subtraction methods, e.g.

$\langle \mathcal{M}_L | T^a_j T^a_k | \mathcal{M}_L \rangle \big|_{\alpha_S^p\alpha^q}$ and $\langle \mathcal{M}_L | Q_j Q_k | \mathcal{M}_L \rangle \big|_{\alpha_S^p\alpha^q}$ for L = 0, 1 (exchange of soft gluon/photon between external legs j, k)

  • Spin and spin-colour correlators → IR subtraction methods, e.g.

$B^{\mu\nu}_j = \langle \mathcal{M} | \mu, j \rangle \langle \nu, j | \mathcal{M} \rangle$ and $B^{(p,q|jk|\mu\nu)}_{LL,LO} = \langle \mathcal{M}_L | T^a_j T^a_k | \mu, j \rangle \langle \nu, j | \mathcal{M}_L \rangle \big|_{\alpha_S^p\alpha^q}$ (soft-collinear radiation of external gluons/photons) for L = 0, 1 (all L = 0 correlators already available in OpenLoops 1)

  • Catani-Seymour I-operator → subtraction of IR poles
  • Selection of helicity states → polarised initial or final states

⇒ Ingredients for a wide range of applications available

slide-12
SLIDE 12
  • III. The OpenLoops algorithm: Tree level

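Tree-level amplitudes are constructed recursively from sub-trees. For example, $\mathcal{M}_0$ = [tree diagram] + … → split into sub-trees.

Numerical recursion step:

$w_a^{\alpha} = \frac{X^{\alpha}_{\beta\gamma}(k_b, k_c)}{k_a^2 - m_a^2}\; w_b^{\beta}\, w_c^{\gamma}$

Here $X^{\alpha}_{\beta\gamma}/(k_a^2 - m_a^2)$ is a universal building block from the Feynman rules, and $w_b$, $w_c$ are the sub-trees attached to it.

[Generic depiction: sub-tree $w_a$ with momentum $k_a$ built from sub-trees $w_b$, $w_c$ with momenta $k_b$, $k_c$ ($k_i$ external momenta)]

Highly efficient: Sub-trees constructed only once for multiple tree and loop diagrams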
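The recursion and the reuse of sub-trees can be illustrated with a toy model. The sketch below assumes a massless scalar theory with a single cubic coupling, so that the building block X reduces to a constant; the momenta, coupling and function names are hypothetical and have nothing to do with the actual OpenLoops code.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical external momenta (E, px, py, pz), all massless.
MOMENTA = {
    1: (1.0, 0.0, 0.0, 1.0),
    2: (1.0, 0.0, 0.0, -1.0),
    3: (1.0, 1.0, 0.0, 0.0),
}
COUPLING = 1.0   # toy cubic coupling, plays the role of X
MASS_SQ = 0.0    # massless propagators


def minkowski_sq(k):
    return k[0] ** 2 - k[1] ** 2 - k[2] ** 2 - k[3] ** 2


def total_momentum(legs):
    return tuple(sum(MOMENTA[i][c] for i in legs) for c in range(4))


@lru_cache(maxsize=None)
def subtree(legs):
    """Off-shell sub-tree w_a for a frozenset of external legs."""
    if len(legs) == 1:
        return 1.0  # external scalar wave function
    members = sorted(legs)
    first, rest = members[0], members[1:]
    total = 0.0
    # Visit each unordered split {b, c} once by keeping `first` in b.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            b = frozenset((first,) + extra)
            c = legs - b
            total += COUPLING * subtree(b) * subtree(c)
    return total / (minkowski_sq(total_momentum(legs)) - MASS_SQ)


W_12 = subtree(frozenset({1, 2}))
W_123 = subtree(frozenset({1, 2, 3}))
print(W_12, W_123, subtree.cache_info())
```

The `lru_cache` decorator stands in for the statement that each sub-tree is constructed only once: the two-leg sub-tree computed first is reused when the three-leg one is built.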

slide-13
SLIDE 13
  • III. The OpenLoops algorithm: One-loop amplitudes

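High complexity in loop diagrams due to analytical structure in loop momentum q:

$\mathcal{M}_1^{\rm diag} = C_{\rm diag} \int d^Dq\; \frac{S_1(q)\cdots S_N(q)}{D_0\cdots D_{N-1}}$

[Diagram: one-loop diagram with loop momentum q, sub-trees $w_1, \ldots, w_N$ and propagators $D_0, \ldots, D_{N-1}$]

Scalar propagators $D_i(q) = (q + p_i)^2 - m_i^2$

Factorisation into colour factor $C_{\rm diag}$ and loop segments

$S_i(q) = X^{i}_{\alpha}(k_i, p_i, q)\, w_i^{\alpha}$ (universal building block × sub-tree(s)), connecting the loop-line indices $\beta_{i-1}$ and $\beta_i$

Open the loop diagram at $D_0$ → dress the open loop recursively:

$\mathcal{N}_k(q) = \prod_{i=1}^{k} S_i(q) = \mathcal{N}_{k-1}(q)\, S_k(q) = \sum_{r=0}^{k} \mathcal{N}^{(r)}_{\mu_1\ldots\mu_r}\, q^{\mu_1}\cdots q^{\mu_r}$

[Diagram: open loop with indices $\beta_0 \ldots \beta_N$; segments $w_1, \ldots, w_k$ dressed, segments $w_{k+1}, \ldots, w_N$ undressed]

Completely generic and highly efficient algorithm

Remaining tasks: Reduction of tensor in $q^\mu$, evaluation of q-integrals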
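The dressing recursion can be mimicked with a toy polynomial in the four components of q. This sketch suppresses the loop-line indices β and uses arbitrary made-up segment coefficients; `dress` and `toy_segment` are hypothetical names. It only illustrates how the symmetric tensor coefficients accumulate with each dressing step.

```python
from collections import defaultdict


def dress(numerator, segment):
    """Multiply an open-loop polynomial in q^mu by one segment.

    Polynomials are dicts mapping a sorted tuple of Lorentz indices
    (one index per power of q) to a coefficient.
    """
    out = defaultdict(float)
    for idx_n, c_n in numerator.items():
        for idx_s, c_s in segment.items():
            out[tuple(sorted(idx_n + idx_s))] += c_n * c_s
    return dict(out)


def toy_segment(seed):
    """Rank-1 segment a + b_mu q^mu with arbitrary positive coefficients."""
    segment = {(): 1.0 + seed}
    for mu in range(4):
        segment[(mu,)] = 0.5 * (mu + 1) + seed
    return segment


numerator = {(): 1.0}  # undressed open loop
COEFF_COUNTS = []
for step in range(1, 4):
    numerator = dress(numerator, toy_segment(step))
    COEFF_COUNTS.append(len(numerator))
print(COEFF_COUNTS)
```

Each dressing step with a rank-1 segment raises the maximal tensor rank by one, so the number of independent symmetric coefficients grows quickly with the number of segments.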

slide-15
SLIDE 15

Tensor integral reduction

Challenge: High complexity in loop diagram ∼ $\sum_{\mu_1=0}^{3} \cdots \sum_{\mu_N=0}^{3} \mathcal{N}_{\mu_1\ldots\mu_N}\, q^{\mu_1}\cdots q^{\mu_N}$

OpenLoops 1: A-posteriori reduction. External tools used for reduction:

  • Collier [Denner, Dittmaier, Hofer ’16]
  • CutTools [Ossola, Papadopoulos, Pittau ’08]

⇒ High complexity in intermediate results. Avoided entirely in OpenLoops 2 for Born-loop interference amplitudes

Growth of tensor coefficients in OpenLoops 1 (dressing, followed by tensor reduction):

  dressing steps (= rank)     1    2    3    4    5    6    7
  independent coefficients    5   15   35   70  126  210  330

slide-18
SLIDE 18

The On-the-fly method

[Buccioni, Pozzorini, M.Z., 2018]

On-the-fly reduction via integrand-level identities [del Aguila, Pittau ’05]:

$q^\mu q^\nu = A^{\mu\nu} + \sum_{\lambda=0}^{3} B^{\mu\nu}_{\lambda}\, q^{\lambda}$

with

$A^{\mu\nu} = A^{\mu\nu}_{-1} + A^{\mu\nu}_{0} D_0(q), \qquad B^{\mu\nu}_{\lambda} = B^{\mu\nu}_{-1,\lambda} + \sum_{i=0}^{3} B^{\mu\nu}_{i,\lambda} D_i(q)$

Reconstructed $D_i$ cancel in the full integrand:

$\frac{S_1(q) S_2(q)}{D_0\cdots D_{N-1}} \prod_{i=3}^{N} S_i = \frac{\mathcal{N}^{\mu\nu} q_\mu q_\nu + \mathcal{N}^{\mu} q_\mu + \mathcal{N}}{D_0\cdots D_{N-1}} \prod_{i=3}^{N} S_i$

Dressing and reduction of the amplitude in a single recursion

Huge reduction in complexity (rank ≤ 2 at all stages)

[Figure: independent coefficients versus dressing steps. OpenLoops 1: 5, 15, 35, 70, 126, 210, 330 for rank 1 to 7. OpenLoops 2: rank ≤ 2 throughout]
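The gain in complexity can be sketched by tracking the number of independent coefficients per open loop through the dressing steps. The count of symmetric tensor coefficients up to rank R in four dimensions is the binomial coefficient C(R+4, 4), which reproduces the numbers on the slide. The model below is a simplification under stated assumptions: every segment is taken to raise the rank by one, and the extra pinched topologies created by each reduction step are not counted.

```python
from math import comb


def n_coefficients(rank):
    """Independent symmetric tensor coefficients up to the given rank in 4D."""
    return comb(rank + 4, 4)


def a_posteriori_profile(n_steps):
    """Coefficients after each dressing step when the rank grows unchecked."""
    return [n_coefficients(step) for step in range(1, n_steps + 1)]


def on_the_fly_profile(n_steps):
    """Coefficients after each dressing step when rank 2 is reduced at once."""
    rank, profile = 0, []
    for _ in range(n_steps):
        rank += 1                       # dressing with a rank-1 segment
        profile.append(n_coefficients(rank))
        if rank == 2:
            rank = 1                    # integrand reduction q^mu q^nu -> rank 1
    return profile


OL1_PROFILE = a_posteriori_profile(7)
OL2_PROFILE = on_the_fly_profile(7)
print(OL1_PROFILE)
print(OL2_PROFILE)
```

In this toy picture the on-the-fly strategy never handles more than 15 coefficients per open loop, whereas the a-posteriori strategy reaches 330 after seven dressing steps.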

slide-19
SLIDE 19

On-the-fly reduction, merging and helicity summation

  • New topologies with pinched propagators in every reduction step:

$\frac{\mathcal{N}^{\mu\nu} q_\mu q_\nu}{D_0\cdots D_{N-1}} = \sum_{i=-1}^{3} \frac{\mathcal{N}_i^{\mu} q_\mu + \mathcal{N}_i}{D_0 \cdots \not{D}_i \cdots D_{N-1}}$

  • Merging of open loops with the same topology and the same undressed segments, exploiting the factorisation of the dressed part $\mathcal{N}^{(\alpha)}$ and the undressed segments:

$\sum_{\alpha} \mathcal{N}^{(\alpha)}\, S_{n+2} \cdots S_N = \mathcal{N}\, S_{n+2} \cdots S_N$

⇒ No extra cost for pinched topologies

  • On-the-fly helicity summation of dressed segments interfered with Born ⇒ Factor 2–5 gain in CPU efficiency

[Diagrams: an open loop with rank-2 numerator $\mathcal{N}^{\mu\nu} q_\mu q_\nu$ and segments $w_1, w_2, w_3, \ldots, w_N$ is decomposed into the unpinched term $\mathcal{N}_{-1}^{\mu} q_\mu$ and the pinched terms $\mathcal{N}_{0}^{\mu} q_\mu, \ldots, \mathcal{N}_{3}^{\mu} q_\mu$; open loops $\mathcal{N}^{(1)}, \mathcal{N}^{(2)}, \ldots$ sharing the undressed segments $w_{n+2}, \ldots, w_N$ are merged into a single open loop $\mathcal{N}$]

slide-20
SLIDE 20

Numerical stability in OpenLoops

Spurious singularities due to inverse Gram determinants in the reduction can lead to large uncertainties in a small fraction of phase-space points, potentially spoiling the precision of the full MC run.

Stability system in OpenLoops 1:

  • Numerical accuracy of double precision (dp) result determined from rescaling test
  • Re-evaluation of critical phase-space points in quad precision (qp) → O(100) CPU cost wrt dp

Stability system in OpenLoops 2:

  • Ensure minimal need for quad precision

→ Exploit analytical properties in on-the-fly reduction formulas and use targeted expansions (to any order)
⇒ Instabilities postponed to the last steps of the algorithm or avoided entirely

  • Ensure minimal use of quad precision

→ Hybrid precision mode: Targeted use of qp only in critical steps
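The idea behind a rescaling test can be sketched with a toy quantity. A function that is homogeneous in the momenta must scale with a known power when all momenta are multiplied by a common factor, and any mismatch between the two evaluations estimates the numerical accuracy. The example below is an assumption-laden illustration: it uses the rank-2 Gram determinant (homogeneous of degree 4) as a stand-in for a matrix element, with made-up momenta, and does not reproduce the OpenLoops implementation.

```python
import math


def dot(a, b):
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def gram(p1, p2):
    """Rank-2 Gram determinant (p1.p2)^2 - p1^2 p2^2, degree 4 in momenta."""
    return dot(p1, p2) ** 2 - dot(p1, p1) * dot(p2, p2)


def rescaling_digits(func, momenta, degree, xi=1.2345):
    """Estimate correct digits by comparing func(p) with func(xi*p)/xi^degree."""
    reference = func(*momenta)
    scaled = func(*[tuple(xi * c for c in p) for p in momenta]) / xi ** degree
    if reference == scaled:
        return 16.0  # agreement to full double precision
    denom = max(abs(reference), abs(scaled))
    return -math.log10(abs(reference - scaled) / denom)


# Well-separated momenta: no cancellation in the Gram determinant.
HARD = [(1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, -1.0)]
# Nearly parallel momenta: the two terms cancel almost completely.
NEAR_PARALLEL = [(1.0, 0.0, 0.0, 0.5), (1.0, 1.0e-7, 0.0, 0.5)]

DIGITS_HARD = rescaling_digits(gram, HARD, 4)
DIGITS_UNSTABLE = rescaling_digits(gram, NEAR_PARALLEL, 4)
print(DIGITS_HARD, DIGITS_UNSTABLE)
```

Points flagged by such an estimate as inaccurate are the candidates for re-evaluation in higher precision.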

slide-21
SLIDE 21

Hybrid precision mode in OpenLoops 2

Upgrade of dp objects to qp only triggered in a few final steps, while the bulk of the calculation is in dp

[Diagram with labels: dressing, reduction, double precision, quadruple precision]

  • CPU cost O(1%) of full qp calculation
  • Excellent numerical stability at only O(10%) additional cost wrt pure dp

slide-26
SLIDE 26

Numerical stability improvements for hard kinematics (NLO QCD)

Probability to encounter an event with accuracy $A_{\min}$ or less for a 2 → 4 process ($\sqrt{\hat{s}}$ = 1 TeV, $10^6$ events)

[Figure: fraction of events ($10^{-6}$ to $10^{0}$) versus accuracy $A_{\min}$ (−32 to −4) for $gg \to t\bar{t}gg$ at $O(\alpha_s^5)$. Curves: OL1+CutTools dp, OL1+Collier dp, OL2 dp, OL2 hp 8 digits, OL2 hp 11 digits, OL2 qp]

  • OL1+CutTools: 1% of points highly unstable
  • OL1+Collier: $O(10^3)$ improvement
  • OL2 dp: extra O(10) improvement and 2–3 times faster
  • OL2 hp: extra O(100) improvement w.r.t. dp (always ≥ 7 digits) with +8% CPU time → hp target precision can be tuned (trigger for qp upgrade)
  • OL2 qp: always 17–32 digits with 80 times more CPU time than in dp

slide-27
SLIDE 27

Numerical stability improvements for hard kinematics (NLO EW)

Probability to encounter an event with accuracy $A_{\min}$ or less for a 2 → 4 process ($\sqrt{\hat{s}}$ = 1 TeV, $10^6$ events)

[Figure: fraction of events ($10^{-6}$ to $10^{0}$) versus accuracy $A_{\min}$ (−32 to −4) for $\bar{u}u \to e^+e^-\mu^+\mu^-$ at $O(\alpha^5)$. Curves: OL1+CutTools dp, OL1+Collier dp, OL2 dp, OL2 hp 8 digits, OL2 hp 11 digits, OL2 qp]

Similar improvements for a wide range of tested processes with NLO QCD and NLO EW corrections as well as in hard, soft and collinear phase-space regions

slide-28
SLIDE 28
  • IV. Summary and Outlook

OpenLoops 2: Fully automated numerical tool for tree and one-loop scattering amplitudes

  • Highly flexible and simple to interface to any MC generator (same interfaces as OL1)
  • Full NLO QCD and NLO EW corrections in the SM available (+ HEFT)
  • Excellent performance and numerical precision due to

– On-the-fly reduction, merging and helicity summation
– On-the-fly stability system with hybrid precision

Short-term and mid-term projects:

  • On-the-fly generation of process-dependent code
  • Numerical stability improvements in deep IR regions
  • Full NNLO automation

slide-29
SLIDE 29

Backup


slide-30
SLIDE 30

Installation of OpenLoops 2

Checkout from git repository (takes ∼ 3 sec):

    git clone https://gitlab.com/openloops/OpenLoops.git

or download as tar from https://openloops.hepforge.org/downloads

Compilation after checkout or download (takes 1–2 min, ∼ 60 MB disk space):

    cd OpenLoops
    ./scons

Default requirements: Python ≥ 2.4, gfortran ≥ 4.6; alternative compiler: ifort

Change default options in file OpenLoops/openloops.cfg:

    [OpenLoops]
    fortran_compiler = ifort
    . . .

Update program and (if necessary) all process libraries with

    ./openloops update

slide-31
SLIDE 31

Process libraries

Download and compile process libraries or collections of process libraries:

    ./openloops libinstall <processes>
    ./openloops libinstall <collection>.coll

for example

    ./openloops libinstall pptt ppttj ppttjj

Predefined collections:

  <collection>   Description
  bornloop       all public NLO QCD born-loop libraries
  loop2          all public NLO QCD loop-induced libraries
  lhc            essential LHC processes
  all            every lib from the public process repository

Time and disk space required for download+compilation of process libraries:

  library/collection     time       disk space [MB]
  pptt                   4 sec      3.6
  ppttj                  10 sec     19.7
  ppttjj                 2 min      288
  lhc.coll (27 libs)     24 min     3500
  all.coll (156 libs)    105 min    13900

slide-32
SLIDE 32

Example for QCD and EW power counting

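Channels and orders, e.g. W + multi-jet [Kallweit, Lindert, Maierhöfer, Pozzorini, Schönherr ’14]

Orders for pp → W + n jets @LO: $\alpha_s^{n}\alpha$, $\alpha_s^{n-1}\alpha^2$, $\alpha_s^{n-2}\alpha^3$, $\alpha_s^{n-3}\alpha^4$

Orders for pp → W + n jets @NLO: $\alpha_s^{n+1}\alpha$, $\alpha_s^{n}\alpha^2$, $\alpha_s^{n-1}\alpha^3$, $\alpha_s^{n-2}\alpha^4$, $\alpha_s^{n-3}\alpha^5$

Channels:

  • $u_i\bar{d}_i \to W + n\,g$
  • $u_i\bar{d}_i \to W + q\bar{q} + (n-2)\,g$
  • $\gamma u_i \to d_i W + (n-1)\,g$
  • $\gamma u_i \to d_i W + q\bar{q} + (n-3)\,g$
  • $\gamma\gamma \to \bar{u}_i d_i W + (n-2)\,g$
  • $u_i\bar{d}_i \to W + q\bar{q}q'\bar{q}' + (n-3)\,g$
  • …

[Table: grid marking which channel contributes at which order; the cell assignments are not recoverable from the extracted text]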

slide-33
SLIDE 33

Computing details: Performance and memory consumption

Timing ratios OL1/OL2 and timings normalised to the 2 → 2 process of the same class (sample of $10^5$ random psp):

  Process                                          Correction   OL1/OL2   (2 → n)/(2 → 2)
  $gg \to t\bar{t}$                                NLO QCD      2.0       1
  $gg \to t\bar{t} + g$                            NLO QCD      2.2       23
  $gg \to t\bar{t} + gg$                           NLO QCD      2.8       700
  $q\bar{q} \to t\bar{t}$                          NLO QCD      1.5       1
  $q\bar{q} \to t\bar{t} + g$                      NLO QCD      2.4       12
  $q\bar{q} \to t\bar{t} + gg$                     NLO QCD      2.9       300
  $u\bar{u} \to e^-\bar{\nu}_e\bar{\mu}\nu_\mu$    NLO EW       4.3       -
  $uu \to W^+W^+dd$                                NLO EW       2.3       -

Complexity and CPU time grow exponentially with the number of external particles

Performance gain of a factor 2–4 w.r.t. OpenLoops 1 for non-trivial processes

Maximum memory allocated in RAM during calculation (RSS, Peak):

  Process          $q\bar{q} \to t\bar{t}g$   $gg \to t\bar{t}g$   $gg \to t\bar{t} + gg$
  RSS Peak [MB]    54                         68                   420

⇒ 70 MB for 2 → 3, 420 MB for 2 → 4

Peak RAM usage up to a factor 2 lower than in OpenLoops 1

slide-34
SLIDE 34

Stability of OpenLoops: 2 → 3 process with single soft gluon

with $\xi = 10^{-7}$, where $\xi \sim Q^2_{\rm soft}/\hat{s} \sim E_{\rm soft}/\sqrt{\hat{s}}$ ($\sqrt{\hat{s}}$ = 1 TeV, $10^5$ events)

[Figure: fraction of events ($10^{-5}$ to $10^{0}$) versus instability $A_{\min}$ (−32 to 4) for $gg \to t\bar{t}g$. Curves: OL1+Collier dp *, OL1+CutTools dp *, OL2 dp *, OL2 hybrid *, OL2 qp (rescaling); * wrt OL2 qp benchmark]

Stability: OL1+CutTools ∼ 5% of points with zero digits; OL2 → no points with less than 11 (8) digits in hp (dp)

Performance: OL2 (hp) 5 times slower than OL2 (dp) → Major speed-up possible

slide-35
SLIDE 35

Stability of OpenLoops: 2 → 3 process with single soft gluon

Dependence of stability on $\xi \sim Q^2_{\rm soft}/\hat{s} \sim E_{\rm soft}/\sqrt{\hat{s}}$ ($\sqrt{\hat{s}}$ = 1 TeV, hard kinematics fixed)

[Figure: accuracy A (4 to −36) versus ξ ($10^{-1}$ to $10^{-9}$) for $gg \to t\bar{t}g$. Curves: OL1+CutTools dp *, OL1+Collier dp *, OL2 dp *, OL2 hybrid *, OL2 qp (rescaling); * wrt OL2 qp benchmark]

Stability: at least 10 digits from OL2 (hp) in the ultra-soft region; at least 24 digits from OL2 (qp) → excellent benchmark

slide-36
SLIDE 36

Stability of OpenLoops: 2 → 3 process with collinear gluon pair

with $\xi = 10^{-7}$, where $\xi \sim Q^2_{\rm coll}/\hat{s} \sim k_T^2/\hat{s}$ ($\sqrt{\hat{s}}$ = 1 TeV, $10^5$ events)

[Figure: fraction of events ($10^{-5}$ to $10^{0}$) versus instability $A_{\min}$ (−32 to 4) for $gg \to t\bar{t}g$. Curves: OL1+Collier dp *, OL1+CutTools dp *, OL2 dp *, OL2 hybrid *, OL2 qp (rescaling); * wrt OL2 qp benchmark]

Stability: OL1+CutTools ∼ 100% of points with zero digits; OL2 → no points with less than 10 (7) digits in hp (dp)

Performance: OL2 (hp) 8 times slower than OL2 (dp) → Major speed-up possible

slide-37
SLIDE 37

Stability of OpenLoops: 2 → 3 process with collinear gluon pair

Dependence of stability on $\xi \sim Q^2_{\rm coll}/\hat{s} \sim k_T^2/\hat{s}$ ($\sqrt{\hat{s}}$ = 1 TeV, hard kinematics fixed)

[Figure: accuracy A (4 to −32) versus ξ ($10^{-1}$ to $10^{-9}$) for $gg \to t\bar{t}g$. Curves: OL1+CutTools dp *, OL1+Collier dp *, OL2 dp *, OL2 hybrid *, OL2 qp (rescaling); * wrt OL2 qp benchmark]

Stability: at least 9 digits from OL2 (hp) in the extremely collinear region; at least 23 digits from OL2 (qp) → excellent benchmark

slide-38
SLIDE 38

Gram determinant instabilities

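$q^\mu q^\nu = \left[A^{\mu\nu}_{-1} + A^{\mu\nu}_{0} D_0\right] + \left[B^{\mu\nu}_{-1,\lambda} + \sum_{i=0}^{3} B^{\mu\nu}_{i,\lambda} D_i\right] q^{\lambda}, \qquad D_i = (q + p_i)^2 - m_i^2, \quad p_0 = 0$

$A^{\mu\nu}_{i}$, $B^{\mu\nu}_{i,\lambda}$ constructed from three external momenta $p_1$, $p_2$, $p_3$.

  • Strong dependence on inverse rank-2 Gram determinant $\Delta_{12} = (p_1\cdot p_2)^2 - p_1^2 p_2^2$
  • Moderate dependence on inverse rank-3 Gram determinant $\sqrt{\Delta_{123}} \sim p_3 \cdot l(p_1, p_2)$ → Can become dominant in soft and collinear regions.

$A^{\mu\nu}_{i} = \frac{1}{\Delta_{12}}\, a^{\mu\nu}_{i}, \qquad B^{\mu\nu}_{i,\lambda} = \frac{1}{\Delta_{12}^2} \frac{1}{\sqrt{\Delta_{123}}} \left[b^{(1)}_{i,\lambda}\right]^{\mu\nu} + \frac{1}{\Delta_{12}} \left[b^{(2)}_{i,\lambda}\right]^{\mu\nu}$

  • Severe numerical instabilities for $\Delta_{12} \to 0$
  • Numerical instabilities for $\Delta_{123} \to 0$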

slide-39
SLIDE 39

Numerical instabilities due to small rank-2 Gram determinants

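$A^{\mu\nu}_{i} = \frac{1}{\Delta_{12}}\, a^{\mu\nu}_{i}, \qquad B^{\mu\nu}_{i,\lambda} = \frac{1}{\Delta_{12}^2} \left[\tilde{b}^{(1)}_{i,\lambda}\right]^{\mu\nu} + \frac{1}{\Delta_{12}} \left[b^{(2)}_{i,\lambda}\right]^{\mu\nu}$

Severe numerical instabilities for $\Delta_{12} \to 0$

  • for N ≥ 4: Exploit freedom to choose $p_1$, $p_2$ by re-ordering at runtime:

$\{D_1, D_2, D_3\} \to \{D_{i_1}, D_{i_2}, D_{i_3}\}$ such that parameter ∼ $|\Delta_{i_1 i_2}|$ is maximal
⇒ avoid small rank-2 Gram determinants until triangle reduction!

  • for N = 3: Identify problematic kinematic configurations → for hard kinematics only one case:

[Diagram: triangle with loop momenta $q$, $q + p_1$, $q + p_2$ and external momenta $p_1$, $p_2 - p_1$, $-p_2$]

$p_1^2 = -p^2 < 0, \qquad p_2^2 = -p^2(1 + \delta), \quad 0 \leq \delta \ll 1, \qquad (p_2 - p_1)^2 = 0 \;\Rightarrow\; |\Delta_{12}| \propto p^4\delta^2$

Master integrals: $C_0(p_1^2, p_2^2) \sim \int d^Dq\, \frac{1}{D_0 D_1 D_2}$ and $B_0(p_1^2) \sim \int d^Dq\, \frac{1}{D_0 D_1}$

Reduction formulas exhibit poles in δ, e.g. for the massless rank-1 topology:

$C^{\mu} = \frac{1}{\delta^2} B_0(-p^2)\,[\ldots] + \frac{1}{\delta^2} B_0\!\left(-p^2(1+\delta)\right)[\ldots] + \frac{1}{\delta}\, C_0\!\left(-p^2, -p^2(1+\delta)\right)[\ldots]$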
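The quadratic vanishing of the rank-2 Gram determinant for this configuration can be checked in two lines. The derivation below is a worked example that assumes the definition $\Delta_{12} = (p_1\cdot p_2)^2 - p_1^2 p_2^2$ from the previous slide; the overall sign and normalisation may differ from the convention used in the talk.

```latex
\begin{align}
(p_2 - p_1)^2 = 0 \;&\Rightarrow\; 2\,p_1\cdot p_2 = p_1^2 + p_2^2 = -p^2\,(2 + \delta), \\
\Delta_{12} = (p_1\cdot p_2)^2 - p_1^2\,p_2^2
  \;&=\; p^4\left(1 + \tfrac{\delta}{2}\right)^2 - p^4\,(1 + \delta)
  \;=\; \frac{p^4\,\delta^2}{4}.
\end{align}
```

The terms of order $\delta^0$ and $\delta^1$ cancel exactly, so an inverse power of $\Delta_{12}$ translates into a pole of second order in δ, in line with the $1/\delta^2$ terms of the reduction formula.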

slide-40
SLIDE 40

Any-order expansions

  • Expand master integrals $B_0$, $C_0$ in δ (the parameter that controls $\Delta_{12} \to 0$) ⇒ $1/\delta$ poles cancel (also for higher rank):

$C^{\mu} = \frac{p_1^{\mu} + p_2^{\mu}}{2p^2} \left[-B_0(-p^2) + 1\right] + \delta\, \frac{p_1^{\mu} + 2p_2^{\mu}}{6p^2} \left[B_0(-p^2)\right] + \ldots$

  • General and compact formulas derived for $\left(\frac{\partial}{\partial\delta}\right)^{m} B_0$ and $\left(\frac{\partial}{\partial\delta}\right)^{m} C_0$ (all QCD topologies)

→ extremely fast implementation
⇒ Expansion of $B_0$, $C_0$ to any order M in order to reach 16, 32 or more digits
⇒ Uncertainty due to truncation of series avoided entirely.

[Figure: three panels showing accuracy A (−15 to 10) against a variable built from $Q_{12}^4/\Delta$ ($10^5$ to $10^{30}$), with event counts from $10^0$ to $10^3$. Panels: "On-the-fly reduction, no stability improvements"; "(D1, D2, D3)-permutation, no expansion"; "(D1, D2, D3)-permutation + any-order expansions"]
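The mechanism of pole cancellation through expansions can be illustrated with a toy function. The sketch below uses $f(\delta) = [\ln(1+\delta) - \delta]/\delta^2$, which has an apparent $1/\delta^2$ pole but a finite limit of −1/2. It is chosen only because differences of logarithms are typical of such integrals; it is not one of the actual OpenLoops expansion formulas.

```python
import math


def naive(delta):
    """Direct evaluation: suffers from cancellation amplified by 1/delta^2."""
    return (math.log(1.0 + delta) - delta) / delta ** 2


def expanded(delta, tol=1.0e-17, max_order=200):
    """Series sum_{n>=2} (-1)^(n+1) delta^(n-2) / n, summed until converged."""
    total = 0.0
    for n in range(2, max_order + 2):
        term = (-1.0) ** (n + 1) * delta ** (n - 2) / n
        total += term
        if abs(term) < tol * abs(total):
            break  # further terms no longer change the double-precision result
    return total


DELTA = 1.0e-9
NAIVE_VALUE = naive(DELTA)
SERIES_VALUE = expanded(DELTA)
print(NAIVE_VALUE, SERIES_VALUE)
```

The series is summed until the next term falls below the requested tolerance, so no fixed truncation order has to be chosen in advance, whereas the direct evaluation loses all significant digits for small δ.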

slide-41
SLIDE 41

Treating numerical instabilities in soft and collinear regions

  • for N ≥ 5: Exploit freedom to choose the 3 propagators for the reduction from $D_1$, $D_2$, $D_3$, $D_4$ such that parameter ∼ $|\Delta_{i_1 i_2}|$ is maximal (as before) and parameter ∼ $|\Delta_{i_1 i_2 i_3}|$ is maximal.

⇒ Avoid small rank-3 (rank-2) Gram determinants until box (triangle) reduction!

  • Analytical treatment of large cancellations between quasi on-shell self-energy insertions and counterterms in unresolved regions
  • Numerically stable evaluation of kinematics (momenta, invariants, propagators, ...)
  • Numerically stable construction of reduction basis → coefficients in reduction formulas
  • Improvements in Collier 1.2.2 (scalar boxes, triangles) [Denner, Dittmaier]

→ requires analytical understanding of the origin of instabilities and their cancellation

slide-45
SLIDE 45

Hybrid precision

Idea: Promote only open loops enhanced by $\Delta^{-n}$ terms, and their subsequent dressing, merging and reduction, to quad precision (qp) → bulk of the calculation still in double precision (dp)

Example:

[Diagram sequence: open loops with segments $w_1, w_2, w_3, \ldots, w_N$, each labelled dp or qp, pass through the steps reduce, merge and dress; the qp label is carried along by the open loops that descend from a qp object, up to the final large cancellation]