OpenLoops 2
- M. F. Zoller
in collaboration with
- F. Buccioni, J.-N. Lang, J. Lindert, P. Maierhöfer, S. Pozzorini and H. Zhang
[arxiv:1907.13071]
LoopFest XVIII – Fermilab – 08/14/2019
[Höche]
OpenLoops is a fully automated numerical tool for the tree and one-loop computation of hard scattering amplitudes required in Monte-Carlo simulations of scattering events
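Scattering probability densities in perturbation theory:
$W_{00} = |\mathcal{M}_0|^2$, $W_{01} = 2\,\mathrm{Re}\left(\mathcal{M}_0^{*}\mathcal{M}_1\right)$, $W_{11} = |\mathcal{M}_1|^2$,
computed from sums of $l$-loop Feynman diagrams: $\mathcal{M}_0$ = [tree diagrams] + …, $\mathcal{M}_1$ = [one-loop diagrams] + …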
Interfaces to many Monte Carlo programs (unchanged from OpenLoops 1):
- Sherpa [Höche, Krauss, Schönherr, Siegert et al.] → NLO matching and merging
- Munich/Matrix [Grazzini, Kallweit, Rathlev, Wiesemann] → (N)NLO parton-level MC
- Powheg [Nason, Oleari et al.]
- Herwig [Gieseke, Plätzer et al.]
- Geneva [Alioli, Bauer, Tackmann et al.]
- Whizard [Kilian, Ohl, Reuter et al.]
Many OpenLoops applications:
First OpenLoops 2 applications (2019):
- $t\bar t$ production [Behring, Czakon, Mitov, Papanastasiou, Poncelet]
- $t\bar t b\bar b$+jet production [Buccioni, Kallweit, Pozzorini, M.Z.]
– Power counting in $\alpha_S$ and $\alpha$
– Input schemes and parameters
– Evaluation of amplitudes → Automated scale variations
– Colour- and Spin-correlators

– Recursive construction of tree and loop diagrams
– On-the-fly reduction, merging and helicity summation
– On-the-fly stability system → Numerical stability and performance
Available as tar from https://openloops.hepforge.org or from git repository:
    git clone https://gitlab.com/openloops/OpenLoops.git
Process-dependent code for numerical calculation → stored in process libraries
A process library contains all partonic channels for a process class.
Example: ppjj contains $d\bar d \to d\bar d$, $u\bar u \to d\bar d$, $d\bar d \to gg$, $gg \to gg$, etc. and real corrections $d\bar d \to d\bar d g$, $u\bar u \to d\bar d g$, etc.
Also: particle permutations + channel maps, e.g. $b\bar b \to gg$ mapped to $d\bar d \to gg$ for $M_B = 0$.
More than 200 process libraries available for all relevant SM processes (+ HEFT) → see https://openloops.hepforge.org
Additional libraries provided upon user request
OneLoop 3.6.1 [van Hameren ’10]
EW corrections enhanced by soft/collinear logarithms from virtual EW bosons:
$\frac{\alpha}{\pi s_w^2}\,\ln^2\!\left(Q^2/M_W^2\right) \sim 25\% > \alpha_S$ in observables at the TeV scale
⇒ EW corrections crucial for SM tests and BSM searches at the LHC
But also more challenging than NLO QCD!
[Diagrams: virtual $\gamma$, $Z$, $W$ exchange in $q_i g$ scattering with $\ell^+\ell^-$ in the final state]
Non-factorisation, e.g. $pp \to ZZ$ is 2 → 2 in QCD and 2 → 4 with EW (×400 more diagrams)
[Diagrams: $pp \to \ell^-\ell^+ + Z/\gamma^*$ topologies]
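Simple example: $q\bar q \to q\bar q$ at Born level: $\mathcal{M}_0 \sim \mathcal{O}(e^2) + \mathcal{O}(g_S^2)$
⇒ $\sigma_{q\bar q\to q\bar q} \sim W_{00} \sim \mathcal{O}(\alpha_S^2) + \mathcal{O}(\alpha_S^1\alpha^1) + \mathcal{O}(\alpha^2)$
NLO EW corrections of $\mathcal{O}(\alpha_S^2\alpha^1)$ for $q\bar q \to q\bar q$:
[Diagrams: one-loop and real-emission contributions with $\gamma$, $Z$ exchange]
→ only full $\mathcal{O}(\alpha_S^2\alpha^1)$ IR finite
→ $\mathcal{O}(\alpha)$ corrections can involve emissions of $\gamma$ and $g$, $q$, $\bar q$
In general (e.g. $pp \to X$+jets):
$\mathcal{M}_0 = \sum_{k=0}^{\tilde n_{q\bar q}} g_S^{n-2k}\, e^{m+2k}\, \mathcal{M}_0^{(k)}$, where $g_S^{n} e^{m} \mathcal{M}_0^{(0)}$ is the $k = 0$ term of $\mathcal{M}_0$,
$\tilde n_{q\bar q} = n_{q\bar q} - 1$ if $n_{q\bar q} \ge 1$ (number of external $q\bar q$ pairs), else $\tilde n_{q\bar q} = 0$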
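$\mathcal{M}_0 = \sum_{k=0}^{\tilde n_{q\bar q}} g_S^{n-2k}\, e^{m+2k}\, \mathcal{M}_0^{(k)}$
⇒ $W_{00} = \langle\mathcal{M}_0|\mathcal{M}_0\rangle \sim \mathcal{O}(\alpha_S^{n}\alpha^{m}) + \mathcal{O}(\alpha_S^{n-1}\alpha^{m+1}) + \dots + \mathcal{O}(\alpha_S^{n-k}\alpha^{m+k})$
alternating series of dominant contributions involving $|\mathcal{M}_0^{(k)}|^2$ and suppressed pure interference terms involving $\langle\mathcal{M}_0^{(k)}|\mathcal{M}_0^{(k')}\rangle$ with $k \neq k'$.
[Figure: towers of orders in $\alpha_S$ and $\alpha$. LO: $\alpha_S^{n}\alpha^{m}$, $\alpha_S^{n-1}\alpha^{m+1}$, $\alpha_S^{n-2}\alpha^{m+2}$, …; NLO: $\alpha_S^{n+1}\alpha^{m}$, $\alpha_S^{n}\alpha^{m+1}$, …]
⇒ Mixed $\alpha$–$\alpha_S$ power counting with non-trivial interference contributions
⇒ OpenLoops provides any desired order $\mathcal{O}(\alpha_S^{n}\alpha^{m})$ in a fully automated way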
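The power counting above can be made concrete with a small enumeration. The following is a minimal sketch, not OpenLoops code: the function name `born_orders` and its return format are hypothetical, and it only tracks coupling powers, using that a product of Born terms with $g_S^{a}e^{b}$ and $g_S^{a'}e^{b'}$ contributes at order $\alpha_S^{(a+a')/2}\alpha^{(b+b')/2}$.

```python
from collections import defaultdict


def born_orders(n, m, n_qq):
    """Enumerate the orders (alpha_S power, alpha power) in W_00.

    n, m  : powers of g_S and e in the k = 0 Born term
    n_qq  : number of external q qbar pairs
    Returns {(pow_alpha_s, pow_alpha): [kinds]}, where each kind is
    "squared" (k = k') or "interference" (k != k').
    """
    n_tilde = max(n_qq - 1, 0)
    # Coupling powers (g_S, e) of the Born terms M_0^(k), k = 0..n_tilde
    terms = [(n - 2 * k, m + 2 * k) for k in range(n_tilde + 1)]
    orders = defaultdict(list)
    for k, (gs_k, e_k) in enumerate(terms):
        for kp in range(k, len(terms)):
            gs_kp, e_kp = terms[kp]
            # alpha_S ~ g_S^2 and alpha ~ e^2, so halve the summed powers
            key = ((gs_k + gs_kp) // 2, (e_k + e_kp) // 2)
            orders[key].append("squared" if k == kp else "interference")
    return dict(orders)


# q qbar -> q qbar: two q qbar pairs, Born terms O(g_S^2) and O(e^2)
qq_orders = born_orders(n=2, m=0, n_qq=2)
```

For the $q\bar q \to q\bar q$ example this gives squared terms at $\alpha_S^2$ and $\alpha^2$ and a pure interference term at $\alpha_S\alpha$, matching the alternating pattern described above.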
scheme            input parameters                                  value of 1/α
α(0)              α(0), MW, MZ, MH + fermion masses                 ≈ 137
Gµ (default)      Gµ, MW, MZ, MH + fermion masses                   ≈ 132
α(MZ)             α(MZ), MW, MZ, MH + fermion masses                ≈ 128
derived parameters: $\cos^2\theta_w = \mu_W^2/\mu_Z^2$, …
⊲ α(0)-scheme: pure QED interactions at scales $Q^2 \ll M_W^2$, production of on-shell photons
⊲ Gµ-scheme: optimal description of W-interactions at EW scale
⊲ α(MZ)-scheme: hard EW interactions at EW scale (optimal for QED, decent for SU(2))
Processes with $n$ external on-shell photons $\gamma$ + $n^*$ off-shell photons $\gamma^*$ (+ real-emission $\gamma$)
⇒ rescale with ratios of input $\alpha$ and
$\alpha_{\rm on} = \alpha(0)$, $\alpha_{\rm off} = \alpha|_{G_\mu}$ if $\alpha = \alpha(0)$, $\alpha_{\rm off} = \alpha$ if $\alpha = \alpha|_{G_\mu}$ or $\alpha = \alpha(M_Z)$
⇒ $W \to \left(\frac{\alpha_{\rm on}}{\alpha}\right)^{n} \left(\frac{\alpha_{\rm off}}{\alpha}\right)^{n^*} W$
(No rescaling for real emission)
Optimal scale choice for external on-shell, off-shell and real-emission photons
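As an illustration of the photon rescaling rule, here is a minimal sketch under stated assumptions: the function `photon_rescaling`, the scheme labels and the default values $1/137$ and $1/132$ (the approximate values quoted in the table above) are hypothetical choices for this example, not the OpenLoops interface.

```python
def photon_rescaling(alpha_input, scheme, n_on, n_off,
                     alpha_0=1.0 / 137.0, alpha_gmu=1.0 / 132.0):
    """Factor multiplying W for n_on on-shell and n_off off-shell external photons.

    scheme is one of "alpha0", "gmu", "alphaMZ" (labels assumed for this sketch).
    Real-emission photons are not rescaled and therefore do not enter.
    """
    alpha_on = alpha_0
    # Off-shell photons: use alpha|G_mu if the input scheme is alpha(0),
    # otherwise keep the input alpha.
    alpha_off = alpha_gmu if scheme == "alpha0" else alpha_input
    return (alpha_on / alpha_input) ** n_on * (alpha_off / alpha_input) ** n_off


# G_mu scheme with one on-shell external photon: W is rescaled by alpha(0)/alpha|G_mu
factor_gmu = photon_rescaling(1.0 / 132.0, "gmu", n_on=1, n_off=0)

# alpha(0) scheme with one off-shell photon: W is rescaled by alpha|G_mu/alpha(0)
factor_a0 = photon_rescaling(1.0 / 137.0, "alpha0", n_on=0, n_off=1)
```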
→ complex mass $\mu_p^2 = M_p^2 - i\,M_p\Gamma_p$ from real physical mass $M_p$ and width $\Gamma_p$ as input
⇒ implemented in a flexible way, i.e. mix between on-shell and off-shell massive particles allowed
⇒ Consistent calculation of e.g. $pp \to t\bar t Z \to t\bar t\,l^+l^-$ with off-shell $Z$ at NLO EW
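A short numerical sketch of the complex-mass definition and the derived mixing angle $\cos^2\theta_w = \mu_W^2/\mu_Z^2$ is given below. The mass and width values are illustrative inputs chosen for this example, and the helper name `complex_mass_sq` is hypothetical.

```python
def complex_mass_sq(mass, width):
    """Complex squared mass mu^2 = M^2 - i M Gamma (all quantities in GeV)."""
    return complex(mass ** 2, -mass * width)


# Illustrative input values (assumptions of this sketch, in GeV)
M_W, GAMMA_W = 80.379, 2.085
M_Z, GAMMA_Z = 91.1876, 2.4952

mu_w_sq = complex_mass_sq(M_W, GAMMA_W)
mu_z_sq = complex_mass_sq(M_Z, GAMMA_Z)

# Derived parameter: the weak mixing angle becomes complex as well
cos2_theta_w = mu_w_sq / mu_z_sq

# Setting the width to zero recovers an on-shell (real-mass) treatment
mu_w_sq_on_shell = complex_mass_sq(M_W, 0.0)
```

Setting a width to zero for selected particles mirrors the mixed on-shell/off-shell treatment mentioned above.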
Different flavour schemes for $\alpha_S$
If scattering amplitudes are re-evaluated multiple times with different values of $\mu_R$ and $\alpha_S$ (all other input and kinematic parameters fixed):
→ For each new phase-space point, matrix elements are computed and stored in a cache.
→ For $(\mu_R, \alpha_S)$ variations, only $\mu_R$-dependent QCD counterterms are explicitly re-computed, and the bare amplitude from the cache is re-scaled according to its $\alpha_S$-dependence.
⇒ Highly efficient, fully automated algorithm for scale variations
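The caching strategy for scale variations can be sketched as follows. This is a toy model, not the OpenLoops implementation: the class `ScaleVariationCache` and the two callables are hypothetical, and the bare contribution is assumed to carry a single overall power of $\alpha_S$.

```python
import math


class ScaleVariationCache:
    """Cache the bare result per phase-space point and rescale it in alpha_S."""

    def __init__(self, bare_amplitude, counterterm, alpha_s_power):
        self.bare_amplitude = bare_amplitude   # expensive, mu_R-independent part
        self.counterterm = counterterm         # cheap, mu_R-dependent part
        self.alpha_s_power = alpha_s_power     # overall alpha_S power of the bare part
        self._cache = {}
        self.bare_calls = 0

    def evaluate(self, point, mu_r, alpha_s):
        key = tuple(point)
        if key not in self._cache:
            # New phase-space point: compute the bare part once and store it
            self.bare_calls += 1
            self._cache[key] = (self.bare_amplitude(point, alpha_s), alpha_s)
        bare_ref, alpha_s_ref = self._cache[key]
        # Rescale the cached bare part according to its alpha_S dependence
        bare = bare_ref * (alpha_s / alpha_s_ref) ** self.alpha_s_power
        # Only the mu_R-dependent counterterm is recomputed explicitly
        return bare + self.counterterm(point, mu_r, alpha_s)


# Toy stand-ins for the bare amplitude and the counterterm
def toy_bare(point, alpha_s):
    return alpha_s ** 3 * sum(point)


def toy_ct(point, mu_r, alpha_s):
    return alpha_s ** 3 * math.log(mu_r ** 2)


cache = ScaleVariationCache(toy_bare, toy_ct, alpha_s_power=3)
central = cache.evaluate((1.0, 2.0), mu_r=91.0, alpha_s=0.118)
varied = cache.evaluate((1.0, 2.0), mu_r=182.0, alpha_s=0.108)
```

In this toy setup the second call reuses the cached bare result, so the expensive function is evaluated only once per phase-space point.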
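→ IR subtraction methods, e.g. $\langle\mathcal{M}_L|\mathbf{T}^a_j\mathbf{T}^a_k|\mathcal{M}_L\rangle\big|_{\alpha_S^p\alpha^q}$ and $\langle\mathcal{M}_L|Q_jQ_k|\mathcal{M}_L\rangle\big|_{\alpha_S^p\alpha^q}$ for $L = 0, 1$
(exchange of soft gluon/photon between external legs $j$, $k$)
→ IR subtraction methods, e.g. $B_j^{\mu\nu} = \langle\mathcal{M}|\mu, j\rangle\langle\nu, j|\mathcal{M}\rangle$ and $B^{(p,q|jk|\mu\nu)}_{LL,\mathrm{LO}} = \langle\mathcal{M}_L|\mathbf{T}^a_j\mathbf{T}^a_k|\mu, j\rangle\langle\nu, j|\mathcal{M}_L\rangle\big|_{\alpha_S^p\alpha^q}$
(soft-collinear radiation of external gluons/photons)
for $L = 0, 1$ (all $L = 0$ correlators already available in OpenLoops 1)
⇒ Ingredients for a wide range of applications available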
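Tree-level amplitudes constructed recursively from sub-trees
For example $\mathcal{M}_0$ = [diagram] + … → split into sub-trees
Numerical recursion step:
$w_a^{\alpha}(k_a) = \frac{X^{\alpha}_{\beta\gamma}(k_b, k_c)}{k_a^2 - m_a^2}\, w_b^{\beta}(k_b)\, w_c^{\gamma}(k_c)$
with $X^{\alpha}_{\beta\gamma}$ from Feynman rules and sub-trees $w_b$, $w_c$
[Generic depiction: sub-tree $w_a^{\alpha}$ with momentum $k_a$ built from sub-trees $w_b$, $w_c$ with momenta $k_b$, $k_c$ ($k_i$ external momenta)]
Highly efficient: Sub-trees constructed only once for multiple tree and loop diagrams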
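The reuse of sub-trees can be illustrated with a toy recursion. The sketch below is not the OpenLoops algorithm: it assumes a scalar theory with a single cubic vertex, sets the vertex factor $X$ to 1, and uses memoisation (`functools.lru_cache`) so that each sub-tree is evaluated only once. The names `make_subtree` and `w` are hypothetical.

```python
from functools import lru_cache
from itertools import combinations


def make_subtree(momenta, mass=0.0):
    """Return a memoised function w(legs) for a toy scalar cubic theory.

    momenta: dict mapping leg label -> (E, px, py, pz)
    legs   : sorted tuple of external leg labels attached to the sub-tree
    """

    def mom_sq(legs):
        # Minkowski square of the summed momentum of the given legs
        e, x, y, z = (sum(momenta[i][c] for i in legs) for c in range(4))
        return e * e - x * x - y * y - z * z

    @lru_cache(maxsize=None)
    def w(legs):
        if len(legs) == 1:
            return 1.0  # external scalar wave function
        first, rest = legs[0], legs[1:]
        total = 0.0
        # Split legs into two non-empty sub-trees b and c; `first` always
        # goes into b so that each split is counted once.
        for r in range(len(rest)):
            for extra in combinations(rest, r):
                b = (first,) + extra
                c = tuple(i for i in rest if i not in extra)
                total += w(b) * w(c)  # vertex factor X set to 1
        return total / (mom_sq(legs) - mass ** 2)

    return w


toy_momenta = {1: (1, 0, 0, 1), 2: (1, 0, 0, -1), 3: (1, 0, 1, 0)}
w = make_subtree(toy_momenta)
w12 = w((1, 2))       # single propagator with (k1 + k2)^2 = 4
w123 = w((1, 2, 3))   # reuses the cached sub-tree w((1, 2))
```

Because `w` is cached per set of legs, a sub-tree that appears in several diagrams is computed a single time, which is the efficiency argument made above.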
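High complexity in loop diagrams due to analytical structure in loop momentum $q$:
$\mathcal{M}_1^{\rm diag} = C_{\rm diag} \int d^Dq\, \frac{\mathrm{Tr}[\mathcal{N}(q)]}{D_0 \cdots D_{N-1}}$
[Diagram: one-loop diagram with sub-trees $w_1, w_2, \dots, w_{N-1}, w_N$ attached to propagators $D_0, D_1, D_2, \dots, D_{N-1}$]
Scalar propagators $D_i(q) = (q + p_i)^2 - m_i^2$
Factorisation into colour factor $C_{\rm diag}$ and loop segments
$[S_i(q)]_{\beta_{i-1}}^{\beta_i} = X^{i}_{\alpha}(k_i, p_i, q)\, w_i^{\alpha}$ (universal building block × sub-tree(s))
Open loop: diagram cut open at $D_0$
→ Dress open loop recursively:
$\mathcal{N}_k(q) = \prod_{i=1}^{k} S_i(q) = \sum_{r=0}^{k} \mathcal{N}^{(k)}_{\mu_1\dots\mu_r}\, q^{\mu_1} \cdots q^{\mu_r}$
[Diagram: open loop with segments $w_1/D_1, \dots, w_k/D_k$ dressed (open indices $\beta_0$, $\beta_k$) and segments $w_{k+1}/D_{k+1}, \dots, w_N/D_0$ still undressed]
Completely generic and highly efficient algorithm
Remaining tasks: Reduction of tensor in $q^{\mu}$, evaluation of $q$-integrals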
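The dressing recursion $\mathcal{N}_k = \mathcal{N}_{k-1}\, S_k$ acts on polynomial coefficients in $q$. The following toy sketch makes simplifying assumptions that are not part of the talk: the loop momentum is a single number rather than a four-vector, each segment has the form $S_k(q) = Y_k + Z_k\, q$ with small matrices $Y_k$, $Z_k$ in the open indices, and all function names are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(a, b):
    """Element-wise sum of two matrices given as nested lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def dress(coeffs, y, z):
    """One dressing step: multiply N(q) = sum_r coeffs[r] q^r by S(q) = y + z q."""
    n = len(y)
    zero = [[0.0] * n for _ in range(n)]
    new = [zero] * (len(coeffs) + 1)
    for r, c in enumerate(coeffs):
        new[r] = matadd(new[r], matmul(c, y))          # rank stays r
        new[r + 1] = matadd(new[r + 1], matmul(c, z))  # rank increases to r + 1
    return new


def evaluate(coeffs, q):
    """Evaluate the matrix-valued polynomial at a numerical value of q."""
    n = len(coeffs[0])
    total = [[0.0] * n for _ in range(n)]
    for r, c in enumerate(coeffs):
        total = matadd(total, [[x * q ** r for x in row] for row in c])
    return total


identity = [[1.0, 0.0], [0.0, 1.0]]
y1, z1 = [[1, 2], [0, 1]], [[0, 1], [1, 0]]
y2, z2 = [[2, 0], [1, 1]], [[1, 0], [0, 1]]

n1 = dress([identity], y1, z1)   # open loop after the first segment
n2 = dress(n1, y2, z2)           # after the second segment: rank 2
value_at_2 = evaluate(n2, 2.0)   # equals S_1(2) S_2(2)
```

Each dressing step raises the maximal rank by one, which is the growth in complexity that the reduction step is meant to control.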
Challenge: High complexity in loop diagrams
OpenLoops 1: A-posteriori reduction
External tools used for reduction
⇒ High complexity in intermediate results
Avoided entirely in OpenLoops 2 for Born-loop interference amplitudes
[Buccioni, Pozzorini, M.Z., 2018]
On-the-fly reduction via integrand-level identities [del Aguila, Pittau ’05]:
$q^{\mu}q^{\nu} = A^{\mu\nu} + B^{\mu\nu}_{\lambda}\, q^{\lambda}$
with $A^{\mu\nu} = A^{\mu\nu}_{-1} + A^{\mu\nu}_{0}\, D_0(q)$, $B^{\mu\nu}_{\lambda} = B^{\mu\nu}_{-1,\lambda} + \sum_{i=0}^{3} B^{\mu\nu}_{i,\lambda}\, D_i(q)$
Reconstructed $D_i$ cancel in full integrand
[Diagram: open loop $S_1(q)S_2(q)/(D_0 \cdots D_{N-1})$ reduced on the fly; comparison of OpenLoops 1 and OpenLoops 2]
Dressing and reduction of amplitude in a single recursion
Huge reduction in complexity (rank ≤ 2 at all stages)
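In every reduction step:
$\frac{\mathcal{N}^{\mu\nu} q_{\mu} q_{\nu}}{D_0 \cdots D_{N-1}} = \sum_{i=-1}^{3} \frac{\mathcal{N}^{\mu}_{i}\, q_{\mu} + \mathcal{N}_{i}}{D_0 \cdots \not{D}_i \cdots D_{N-1}}$
[Diagram: rank-2 open loop $\mathcal{N}^{\mu\nu}q_{\mu}q_{\nu}$ with sub-trees $w_1, w_2, w_3, \dots, w_N$ expressed as a sum of rank-1 open loops $\mathcal{N}^{\mu}_{-1}q_{\mu}$, $\mathcal{N}^{\mu}_{0}q_{\mu}$, $\mathcal{N}^{\mu}_{1}q_{\mu}$, $\mathcal{N}^{\mu}_{2}q_{\mu}$, $\mathcal{N}^{\mu}_{3}q_{\mu}$]
Merging of open loops with the same topology and same undressed segments, exploiting factorisation of dressed part $\mathcal{N}^{(\alpha)}$ and undressed segments
⇒ No extra cost for pinched topologies
[Diagram: open loops $\mathcal{N}^{(1)}$, $\mathcal{N}^{(2)}$, … with identical undressed segments $w_{n+1}/D_{n+1}, w_{n+2}, \dots, w_N$ merged into a single open loop $\mathcal{N}$]
On-the-fly helicity summation: segments interfered with Born
⇒ Factor 2–5 gain in CPU efficiency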
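The merging step relies on linearity: if two partially dressed open loops still have to be multiplied by the same undressed segments, their coefficients can be added first and dressed once. A minimal sketch of this bookkeeping follows, with a hypothetical data layout in which each open loop is a pair of a key (topology label, tuple of remaining segments) and a list of scalar polynomial coefficients in $q$.

```python
def merge_open_loops(open_loops):
    """Sum the coefficients of open loops that share the same key.

    open_loops: iterable of (key, coeffs), where key identifies the topology
    and the remaining undressed segments, and coeffs[r] multiplies q^r.
    """
    merged = {}
    for key, coeffs in open_loops:
        if key not in merged:
            merged[key] = list(coeffs)
            continue
        old = merged[key]
        size = max(len(old), len(coeffs))
        # Pad the shorter coefficient list with zeros and add term by term
        merged[key] = [
            (old[r] if r < len(old) else 0.0) + (coeffs[r] if r < len(coeffs) else 0.0)
            for r in range(size)
        ]
    return merged


loops = [
    (("box", ("w3", "w4")), [1.0, 2.0]),   # parent diagram, rank 1
    (("box", ("w3", "w4")), [3.0]),        # another diagram, same remaining segments
    (("triangle", ("w4",)), [0.5, 0.5]),   # different topology: kept separate
]
merged = merge_open_loops(loops)
```

After merging, the subsequent dressing steps are performed once per key instead of once per diagram.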
Spurious singularities due to inverse Gram determinants in the reduction can lead to large uncertainties in a small fraction of phase-space points, potentially spoiling the precision of the full MC run.
→ Quadruple precision (qp): O(100) CPU cost w.r.t. double precision (dp)
→ Exploit analytical properties in on-the-fly reduction formulas and use targeted expansions (to any order)
⇒ Instabilities postponed to the last steps of the algorithm or avoided entirely
→ Hybrid precision mode: Targeted use of qp only in critical steps
Upgrade of dp objects to qp only triggered in a few final steps, while the bulk of the calculation is in dp
[Diagram: sequence of dressing and reduction steps, mostly in double precision, with the last steps in quadruple precision]
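The idea of a targeted precision upgrade can be illustrated on a simple cancellation-prone function. The sketch below is only an analogy, not the OpenLoops stability system: it uses Python's `decimal` module with 34 significant digits as a stand-in for quadruple precision, and the function names, the toy function $(\sqrt{1+\delta}-1)/\delta$ and the digit-loss estimate are assumptions made for this example.

```python
import math
from decimal import Decimal, localcontext


def f_dp(delta):
    """Double-precision evaluation; loses digits for small delta."""
    return (math.sqrt(1.0 + delta) - 1.0) / delta


def f_qp(delta):
    """Same expression with 34 significant digits (quad-like precision)."""
    with localcontext() as ctx:
        ctx.prec = 34
        d = Decimal(delta)
        return float(((1 + d).sqrt() - 1) / d)


def f_hybrid(delta, target_digits=11):
    """Use dp unless the estimated accuracy falls below the target."""
    # sqrt(1 + delta) - 1 ~ delta / 2, so roughly -log10|delta| digits cancel
    digits_lost = max(0.0, -math.log10(abs(delta)))
    if 16.0 - digits_lost >= target_digits:
        return f_dp(delta), "dp"
    return f_qp(delta), "qp"


safe_value, safe_mode = f_hybrid(1e-3)       # mild cancellation: stays in dp
risky_value, risky_mode = f_hybrid(1e-12)    # severe cancellation: upgraded
```

The threshold `target_digits` plays the role of a tunable trigger: only the evaluations that would otherwise miss the target accuracy pay the cost of higher precision.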
Numerical stability improvements for hard kinematics (NLO QCD)
Probability to encounter an event with accuracy $A_{\min}$ or less for a 2 → 4 process ($\sqrt{\hat s} = 1$ TeV, $10^6$ events)
[Figure: fraction of events ($10^{-6}$ to $10^{0}$) versus accuracy $A_{\min}$ (−32 to −4) for $gg \to t\bar t gg$ at $\mathcal{O}(\alpha_s^5)$; curves: OL1+CutTools dp, OL1+Collier dp, OL2 dp, OL2 hp 8 digits, OL2 hp 11 digits, OL2 qp]
OL1+CutTools: 1% of points highly unstable
→ OL1+Collier: $\mathcal{O}(10^3)$ improvement
OL2 dp: extra $\mathcal{O}(10)$ improvement and 2–3 times faster
OL2 hp: extra $\mathcal{O}(100)$ improvement w.r.t. dp (always ≥ 7 digits) with +8% CPU time
→ hp target precision can be tuned (trigger for qp upgrade)
OL2 qp: always 17–32 digits with 80 times more CPU time than in dp
Numerical stability improvements for hard kinematics (NLO EW)
Probability to encounter an event with accuracy $A_{\min}$ or less for a 2 → 4 process ($\sqrt{\hat s} = 1$ TeV, $10^6$ events)
[Figure: fraction of events ($10^{-6}$ to $10^{0}$) versus accuracy $A_{\min}$ (−32 to −4) for $\bar u u \to e^+e^-\mu^+\mu^-$ at $\mathcal{O}(\alpha^5)$; curves: OL1+CutTools dp, OL1+Collier dp, OL2 dp, OL2 hp 8 digits, OL2 hp 11 digits, OL2 qp]
Similar improvements for a wide range of tested processes with NLO QCD and NLO EW corrections as well as in hard, soft and collinear phase-space regions
OpenLoops 2: Fully automated numerical tool for tree and one-loop scattering amplitudes
– On-the-fly reduction, merging and helicity summation
– On-the-fly stability system with hybrid precision
Short-term and mid-term projects:
Checkout from git repository (takes ∼ 3 sec):
    git clone https://gitlab.com/openloops/OpenLoops.git
Compilation after checkout or download (takes 1–2 min, ∼ 60 MB disk space):
    cd OpenLoops
    ./scons
Default requirements: Python ≥ 2.4, gfortran ≥ 4.6; alternative compiler: ifort
Change default options in file OpenLoops/openloops.cfg:
    [OpenLoops]
    fortran_compiler = ifort
    . . .
Update program and (if necessary) all process libraries with
    ./openloops update
Download and compile process libraries or collections of process libraries:
    ./openloops libinstall <processes>
    ./openloops libinstall <collection>.coll
for example
    ./openloops libinstall pptt ppttj ppttjj
Predefined collections:
    <collection>   Description
    bornloop       all public NLO QCD born-loop libraries
    loop2          all public NLO QCD loop-induced libraries
    lhc            essential LHC processes
    all            every lib from the public process repository
Time and disk space required for download + compilation of process libraries:
    library/collection     time      disk space [MB]
    pptt                   4 sec     3.6
    ppttj                  10 sec    19.7
    ppttjj                 2 min     288
    lhc.coll (27 libs)     24 min    3500
    all.coll (156 libs)    105 min   13900
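Channels and orders, e.g. W + multi-jet [Kallweit, Lindert, Maierhöfer, Pozzorini, Schönherr ’14]
$pp \to W + n$ jets @LO, orders: $\alpha_s^{n}\alpha$, $\alpha_s^{n-1}\alpha^{2}$, $\alpha_s^{n-2}\alpha^{3}$, $\alpha_s^{n-3}\alpha^{4}$
$pp \to W + n$ jets @NLO, orders: $\alpha_s^{n+1}\alpha$, $\alpha_s^{n}\alpha^{2}$, $\alpha_s^{n-1}\alpha^{3}$, $\alpha_s^{n-2}\alpha^{4}$, $\alpha_s^{n-3}\alpha^{5}$
[Table: partonic channels versus contributing orders (marked ×). Channels include $u_i\bar d_i \to W + n\,g$, channels with $W + q\bar q + (n-2)\,g$, and channels with $W + q\bar q q'\bar q' + (n-3)\,g$, . . .]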
Computing details: Performance and memory consumption
Timing ratios OL1/OL2 and timings normalised to the 2 → 2 process of the same class:
    Process                                      Correction   OL1/OL2   (2 → n)/(2 → 2)
    $gg \to t\bar t$                             NLO QCD      2.0       1
    $gg \to t\bar t + g$                         NLO QCD      2.2       23
    $gg \to t\bar t + gg$                        NLO QCD      2.8       700
    $q\bar q \to t\bar t$                        NLO QCD      1.5       1
    $q\bar q \to t\bar t + g$                    NLO QCD      2.4       12
    $q\bar q \to t\bar t + gg$                   NLO QCD      2.9       300
    $u\bar u \to e^-\bar\nu_e\,\mu^+\nu_\mu$     NLO EW       4.3
                                                 NLO EW       2.3
Complexity and CPU time grow exponentially with number of external particles
Performance gain of factor 2–4 w.r.t. OpenLoops 1 for non-trivial processes
Maximum memory allocated in RAM during calculation (RSS, Peak):
    Process          $q\bar q \to t\bar t g$   $gg \to t\bar t g$   $gg \to t\bar t + gg$
    RSS Peak [MB]    54                        68                   420
⇒ 70 MB for 2 → 3, 420 MB for 2 → 4
Peak RAM usage up to factor 2 lower than in OpenLoops 1
Stability of OpenLoops: 2 → 3 process with single soft gluon with $\xi = 10^{-7}$, where $\xi \sim Q^2_{\rm soft}/\hat s \sim E_{\rm soft}/\sqrt{\hat s}$ ($\sqrt{\hat s} = 1$ TeV, $10^5$ events)
[Figure: fraction of events ($10^{-5}$ to $10^{0}$) versus instability $A_{\min}$ (−32 to 4) for $gg \to t\bar t g$; curves: OL1+Collier dp *, OL1+CutTools dp *, OL2 dp *, OL2 hybrid *, OL2 qp (rescaling); * w.r.t. OL2 qp benchmark]
Stability: OL1+CutTools ∼ 5% points with zero digits
OL2 → no points with less than 11 (8) digits in hp (dp)
Performance: OL2 (hp) 5 times slower than OL2 (dp) → Major speed-up possible
Stability of OpenLoops: 2 → 3 process with single soft gluon
Dependence of stability on $\xi \sim Q^2_{\rm soft}/\hat s \sim E_{\rm soft}/\sqrt{\hat s}$ ($\sqrt{\hat s} = 1$ TeV, hard kinematics fixed)
[Figure: accuracy $A$ (4 to −36) versus $\xi$ ($10^{-1}$ to $10^{-9}$) for $gg \to t\bar t g$; curves: OL1+CutTools dp *, OL1+Collier dp *, OL2 dp *, OL2 hybrid *, OL2 qp (rescaling); * w.r.t. OL2 qp benchmark]
Stability: at least 10 digits from OL2 (hp) in ultra-soft region
at least 24 digits from OL2 (qp) → excellent benchmark
Stability of OpenLoops: 2 → 3 process with collinear gluon pair with $\xi = 10^{-7}$, where $\xi \sim Q^2_{\rm coll}/\hat s \sim k_T^2/\hat s$ ($\sqrt{\hat s} = 1$ TeV, $10^5$ events)
[Figure: fraction of events ($10^{-5}$ to $10^{0}$) versus instability $A_{\min}$ (−32 to 4) for $gg \to t\bar t g$; curves: OL1+Collier dp *, OL1+CutTools dp *, OL2 dp *, OL2 hybrid *, OL2 qp (rescaling); * w.r.t. OL2 qp benchmark]
Stability: OL1+CutTools ∼ 100% points with zero digits
OL2 → no points with less than 10 (7) digits in hp (dp)
Performance: OL2 (hp) 8 times slower than OL2 (dp) → Major speed-up possible
Stability of OpenLoops: 2 → 3 process with collinear gluon pair
Dependence of stability on $\xi \sim Q^2_{\rm coll}/\hat s \sim k_T^2/\hat s$ ($\sqrt{\hat s} = 1$ TeV, hard kinematics fixed)
[Figure: accuracy $A$ (4 to −32) versus $\xi$ ($10^{-1}$ to $10^{-9}$) for $gg \to t\bar t g$; curves: OL1+CutTools dp *, OL1+Collier dp *, OL2 dp *, OL2 hybrid *, OL2 qp (rescaling); * w.r.t. OL2 qp benchmark]
Stability: at least 9 digits from OL2 (hp) in extremely collinear region
at least 23 digits from OL2 (qp) → excellent benchmark
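$q^{\mu}q^{\nu} = \left[A^{\mu\nu}_{-1} + A^{\mu\nu}_{0}\, D_0\right] + \left[B^{\mu\nu}_{-1,\lambda} + \sum_{i=0}^{3} B^{\mu\nu}_{i,\lambda}\, D_i\right] q^{\lambda}$, with $D_i = (q + p_i)^2 - m_i^2$, $p_0 = 0$
$A^{\mu\nu}_{i}$, $B^{\mu\nu}_{i,\lambda}$ constructed from three external momenta $p_1$, $p_2$, $p_3$.
Rank-2 Gram determinant: $\Delta_{12} = (p_1\cdot p_2)^2 - p_1^2\, p_2^2$
→ Can become dominant in soft and collinear regions.
$A^{\mu\nu}_{i} = \frac{1}{\Delta_{12}}\, a^{\mu\nu}_{i}$, $B^{\mu\nu}_{i,\lambda} = \frac{1}{\Delta_{12}^{2}}\, \frac{1}{\sqrt{\Delta_{123}}}\, \left[b^{(1)}_{i,\lambda}\right]^{\mu\nu} + \frac{1}{\Delta_{12}}\, \left[b^{(2)}_{i,\lambda}\right]^{\mu\nu}$
Instabilities for $\Delta_{12} \to 0$ and $\Delta_{123} \to 0$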
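$A^{\mu\nu}_{i} = \frac{1}{\Delta_{12}}\, a^{\mu\nu}_{i}$, $B^{\mu\nu}_{i,\lambda} = \frac{1}{\Delta_{12}^{2}}\, \left[\tilde b^{(1)}_{i,\lambda}\right]^{\mu\nu} + \frac{1}{\Delta_{12}}\, \left[b^{(2)}_{i,\lambda}\right]^{\mu\nu}$
Severe numerical instabilities for $\Delta_{12} \to 0$
Permutation $\{D_1, D_2, D_3\} \to \{D_{i_1}, D_{i_2}, D_{i_3}\}$ such that parameter $\sim |\Delta_{i_1 i_2}|$ is maximal
⇒ avoid small rank-2 Gram determinants until triangle reduction!
[Diagram: triangle with loop momenta $q$, $q + p_1$, $q + p_2$ and external momenta $p_1$, $p_2 - p_1$, $-p_2$]
$p_1^2 = -p^2 < 0$, $p_2^2 = -p^2(1 + \delta)$, $0 \le \delta \ll 1$, $(p_2 - p_1)^2 = 0$ ⇒ $\Delta \propto \delta^2$
Master integrals: $C_0(p_1^2, p_2^2) \sim \int \frac{1}{D_0 D_1 D_2}$ and $B_0(p_1^2) \sim \int \frac{1}{D_0 D_1}$
Reduction formulas exhibit poles in $\delta$, e.g. for massless rank-1 topology:
$C^{\mu} = \frac{1}{\delta^2}\, B_0(-p^2)\, [\dots] + \frac{1}{\delta^2}\, B_0\, [\dots] + \frac{1}{\delta}\, C_0\, [\dots]$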
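The vanishing of the Gram determinant for this kinematic configuration, and the loss of digits when it is evaluated naively in double precision, can be checked directly. This sketch assumes the convention $\Delta_{12} = (p_1\cdot p_2)^2 - p_1^2 p_2^2$ and computes $p_1\cdot p_2$ from the invariants; the function name and the choice $p^2 = 1$ are made for the example.

```python
from fractions import Fraction


def gram12(p1_sq, p2_sq, p21_sq):
    """Rank-2 Gram determinant from invariants, with p21_sq = (p2 - p1)^2."""
    p1_dot_p2 = (p1_sq + p2_sq - p21_sq) / 2
    return p1_dot_p2 ** 2 - p1_sq * p2_sq


def example_gram(delta, p_sq=1):
    """Kinematics p1^2 = -p^2, p2^2 = -p^2 (1 + delta), (p2 - p1)^2 = 0."""
    return gram12(-p_sq, -p_sq * (1 + delta), 0 * p_sq)


# Exact rational arithmetic: Delta = p^4 delta^2 / 4
delta_exact = Fraction(1, 10 ** 9)
gram_exact = example_gram(delta_exact)

# Same formula in double precision: the result comes from cancelling
# two numbers of order one, so essentially no correct digits survive
gram_double = example_gram(1e-9, p_sq=1.0)
relative_error = abs(gram_double - float(gram_exact)) / float(gram_exact)
```

With exact arithmetic the determinant scales as $\delta^2$, while the double-precision value is dominated by rounding noise, which illustrates why $1/\Delta$ factors in the reduction coefficients need special care.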
$\delta$-poles cancel (also for higher rank):
$C^{\mu} = \frac{p_1^{\mu} + p_2^{\mu}}{2p^2}\, [\dots] + \frac{p_1^{\mu} + 2p_2^{\mu}}{6p^2}\, [\dots] + \dots$
Expansions built from $\left(\frac{\partial}{\partial\delta}\right)^{m} B_0$ and $\left(\frac{\partial}{\partial\delta}\right)^{m} C_0$ (all QCD topologies)
→ extremely fast implementation
⇒ Expansion of $B_0$, $C_0$ to any order $M$ in order to reach 16, 32 or more digits
⇒ Uncertainty due to truncation of series avoided entirely.
[Figure: three panels showing accuracy $A$ versus a Gram-determinant ratio ($10^{5}$ to $10^{30}$): (i) on-the-fly reduction, no stability improvements; (ii) $(D_1, D_2, D_3)$-permutation, no expansion; (iii) $(D_1, D_2, D_3)$-permutation + any-order expansions]
Permutation such that parameter $\sim |\Delta_{i_1 i_2}|$ is maximal (as before) and parameter $\sim |\Delta_{i_1 i_2 i_3}|$ is maximal.
⇒ Avoid small rank-3 (rank-2) Gram determinants until box (triangle) reduction!
counterterms in unresolved regions
→ requires analytical understanding of the origin of instabilities and their cancellation
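The selection of the propagator pair with the largest rank-2 Gram determinant can be sketched in a few lines. The following is an illustration under assumptions: it uses $|\Delta_{ij}| = |(p_i\cdot p_j)^2 - p_i^2 p_j^2|$ directly as the selection parameter (the talk only states that the parameter is proportional to $|\Delta_{i_1 i_2}|$), and the function names and momenta are hypothetical.

```python
from itertools import combinations


def mdot(a, b):
    """Minkowski product with metric (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def gram(p_i, p_j):
    """Rank-2 Gram determinant of two four-momenta."""
    return mdot(p_i, p_j) ** 2 - mdot(p_i, p_i) * mdot(p_j, p_j)


def best_pair(momenta):
    """Return the labels (i1, i2) with maximal |Delta_{i1 i2}|.

    momenta: dict mapping propagator label -> four-momentum p_i
    """
    return max(
        combinations(sorted(momenta), 2),
        key=lambda pair: abs(gram(momenta[pair[0]], momenta[pair[1]])),
    )


# p1 and p2 are parallel, so Delta_12 vanishes; p3 is linearly independent
toy_momenta = {
    1: (0, 0, 0, 1),
    2: (0, 0, 0, 2),
    3: (0, 1, 0, 0),
}
chosen = best_pair(toy_momenta)
```

Reordering the propagators so that the chosen pair plays the role of $(D_1, D_2)$ keeps the degenerate combination out of the $1/\Delta_{12}$ factors in the reduction step.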
Idea: Promote only open loops enhanced by $\Delta^{-n}$ terms and their subsequent dressing, merging and reduction to quad precision (qp),
→ bulk of the calculation still in double precision (dp)
Example:
[Diagram: open loops with sub-trees $w_1, w_2, w_3, \dots, w_N$ tagged dp or qp. A reduction step produces dp and qp open loops; subsequent merge and dress steps act separately on the dp and qp channels, so that only the qp-tagged open loops are processed in quad precision. Large cancellations occur in the qp channel.]