Semi-automatic implementation of the complementary error function


Anastasia Volkova (Intel & Inria), Jean-Michel Muller (CNRS). ARITH-26, June 11, 2019.


SLIDE 1

intel cnrs - inria

Semi-automatic implementation of the complementary error function

Anastasia Volkova (Intel & Inria) Jean-Michel Muller (CNRS)

ARITH-26 June 11, 2019

SLIDE 3

Don’t write code, generate it.

  • Mathematical functions are costly
    → rich trade-off possibilities
  • Standard libm is not enough
  • A “flavor” per application/target platform
    → high human resource consumption

Our approach:

  • Automate
  • Generate code on-demand
  • Adapt for specific context

PERFORMANCE
GUARANTEED ACCURACY

A. Volkova, J.-M. Muller, Semi-automatic code generation for the erfc(x) function

SLIDE 5

Metalibm

code generator for libm and beyond

Inputs: function, domain, target error, …
Steps: property detection, domain splitting, polynomial approximation (fpminimax), code gen/optim
Outputs: C code, Gappa certificate

  • Easy to use
  • Performance comparable to handwritten code
  • Deals with a variety of elementary functions
    ...but special functions remain a challenge

www.metalibm.org
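The domain-splitting step named on this slide can be illustrated with a toy sketch. This is not Metalibm's algorithm: it swaps fpminimax for plain Chebyshev-node interpolation, estimates the error by sampling instead of producing a certified bound, and uses an absolute error target. The names `split_domain`, `chebyshev_nodes`, `interpolate` and `max_abs_error`, and the degree and target values, are assumptions made for illustration only.

```python
import math

def chebyshev_nodes(a, b, count):
    """Return `count` Chebyshev nodes mapped from [-1, 1] to [a, b]."""
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    return [mid + half * math.cos((2 * k + 1) * math.pi / (2 * count))
            for k in range(count)]

def interpolate(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, xi in enumerate(xs):
        term = ys[i]
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def max_abs_error(f, a, b, xs, ys, samples=200):
    """Estimate (not certify) the maximum absolute error on [a, b] by sampling."""
    worst = 0.0
    for k in range(samples + 1):
        x = a + (b - a) * k / samples
        worst = max(worst, abs(interpolate(xs, ys, x) - f(x)))
    return worst

def split_domain(f, a, b, degree, target, max_depth=20):
    """Bisect [a, b] until a polynomial of the given degree meets `target` on each piece."""
    xs = chebyshev_nodes(a, b, degree + 1)
    ys = [f(x) for x in xs]
    if max_abs_error(f, a, b, xs, ys) <= target:
        return [(a, b)]
    if max_depth == 0:
        raise RuntimeError("target error not reached; raise degree or max_depth")
    mid = 0.5 * (a + b)
    return (split_domain(f, a, mid, degree, target, max_depth - 1)
            + split_domain(f, mid, b, degree, target, max_depth - 1))

# Hypothetical settings: erf on [0, 6], degree 6, absolute error target 1e-9.
pieces = split_domain(math.erf, 0.0, 6.0, degree=6, target=1e-9)
for lo, hi in pieces:
    print(f"[{lo:.4f}, {hi:.4f}]")
```

The trade-off the sketch exposes is the one a generator has to resolve: a lower degree means cheaper evaluation per piece but more pieces (and a larger table), while a higher degree does the opposite.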

SLIDE 8

Erf and erfc

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt, \qquad \operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\,dt$$

Some properties:

  • erfc(x) = 1 − erf(x)
  • erfc(−x) = 2 − erfc(x)

[Figure: plots of erf(x) and erfc(x) for x from −4 to 4]

Metalibm with binary64 target accuracy:

  • deals with erf(x) on [0; 6] within 49 sec
  • fails with erfc(x) on [0; 28] even after 3 h
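The two identities on this slide can be checked numerically, and doing so hints at why erfc needs its own implementation. The sketch below uses Python's standard `math.erf` and `math.erfc` as references; the helper names `erfc_naive` and `erfc_reflect` are illustrative assumptions, not part of the talk.

```python
import math

def erfc_naive(x):
    """erfc via 1 - erf(x); cancels badly once erf(x) is close to 1."""
    return 1.0 - math.erf(x)

def erfc_reflect(x):
    """Use erfc(-x) = 2 - erfc(x) to reduce a negative input to a positive one."""
    return 2.0 - math.erfc(-x) if x < 0.0 else math.erfc(x)

for x in (0.5, 3.0, 6.0):
    ref = math.erfc(x)
    rel = abs(erfc_naive(x) - ref) / ref
    print(f"x = {x}: erfc(x) = {ref:.3e}, relative error of 1 - erf(x) = {rel:.1e}")

print("reflection check at x = -1:", erfc_reflect(-1.0), math.erfc(-1.0))
```

For small x the identity erfc(x) = 1 − erf(x) is harmless, but as erf(x) approaches 1 the subtraction leaves few or no correct digits, so a relative error bound on erfc cannot be met this way for large x.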

Issues:

  • erfc(x) is too “flat”
  • not close enough to the asymptotic expression $e^{-x^2}/(x\sqrt{\pi})$
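How far erfc(x) is from its asymptotic expression can be measured directly. The sketch below compares `math.erfc` with the leading term exp(−x²)/(x√π); the helper names `erfc_asymptotic` and `relative_gap` and the sample points are assumptions chosen for illustration.

```python
import math

def erfc_asymptotic(x):
    """Leading asymptotic term exp(-x^2) / (x * sqrt(pi)), meant for large x > 0."""
    return math.exp(-x * x) / (x * math.sqrt(math.pi))

def relative_gap(x):
    """Relative distance between the asymptotic term and math.erfc(x)."""
    return abs(erfc_asymptotic(x) / math.erfc(x) - 1.0)

for x in (5.0, 10.0, 20.0, 25.0):
    print(f"x = {x:4.1f}: relative gap = {relative_gap(x):.2e}")
print(f"binary64 unit roundoff = {2.0 ** -53:.2e}")
```

The gap shrinks as x grows but stays many orders of magnitude above the binary64 unit roundoff over this range, which is consistent with the slide's point that the asymptotic expression alone is not accurate enough.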

slide-10
SLIDE 10

intel cnrs - inria

Code generation for the erfc(x)

Input: relative error bound δ Output: C code using binary64 data/arithmetic Our approach: “Easy”zones:

  • directly use Metalibm

“Difficult”zone:

  • asymptotic expression
  • correction back to erfc(x)
  • re-partition of the error budget

0.5 1 1.5 2 −4 −2 2 4 erfc(x)

“easy”

| {z }

<latexit sha1_base64="ApdTJyNfridNM9ys58lzxcuadHc=">ACAnicbVDLSsNAFJ3UV62vqCtxM1gEVyURQZdFNy4r2Ac0oUwmN+3QySTMTIQSiht/xY0LRdz6Fe78GydtFtp6YJjDOfemXuClDOlHefbqysrq1vVDdrW9s7u3v2/kFHJZmk0KYJT2QvIAo4E9DWTHPopRJIHDoBuObwu8+gFQsEfd6koIfk6FgEaNEG2lgH3mZCEGklDIvZFKi9ul8XQ6sOtOw5kBLxO3JHVUojWwv7woVkMQlNOlOq7Tqr9nEjNKIdpzcsUmPFjMoS+oYLEoPx8tsIUnxolxFEizREaz9TfHTmJlZrEgamMiR6pRa8Q/P6mY6u/JyJNMg6PyhKONYJ7jIA4dMAtV8Ygihkpm/YjoiJg5tUquZENzFlZdJ57zhOg37qLevC7jqKJjdILOkIsuURPdohZqI4oe0TN6RW/Wk/VivVsf89KVfYcoj+wPn8AGRGX3Q=</latexit><latexit sha1_base64="ApdTJyNfridNM9ys58lzxcuadHc=">ACAnicbVDLSsNAFJ3UV62vqCtxM1gEVyURQZdFNy4r2Ac0oUwmN+3QySTMTIQSiht/xY0LRdz6Fe78GydtFtp6YJjDOfemXuClDOlHefbqysrq1vVDdrW9s7u3v2/kFHJZmk0KYJT2QvIAo4E9DWTHPopRJIHDoBuObwu8+gFQsEfd6koIfk6FgEaNEG2lgH3mZCEGklDIvZFKi9ul8XQ6sOtOw5kBLxO3JHVUojWwv7woVkMQlNOlOq7Tqr9nEjNKIdpzcsUmPFjMoS+oYLEoPx8tsIUnxolxFEizREaz9TfHTmJlZrEgamMiR6pRa8Q/P6mY6u/JyJNMg6PyhKONYJ7jIA4dMAtV8Ygihkpm/YjoiJg5tUquZENzFlZdJ57zhOg37qLevC7jqKJjdILOkIsuURPdohZqI4oe0TN6RW/Wk/VivVsf89KVfYcoj+wPn8AGRGX3Q=</latexit><latexit sha1_base64="ApdTJyNfridNM9ys58lzxcuadHc=">ACAnicbVDLSsNAFJ3UV62vqCtxM1gEVyURQZdFNy4r2Ac0oUwmN+3QySTMTIQSiht/xY0LRdz6Fe78GydtFtp6YJjDOfemXuClDOlHefbqysrq1vVDdrW9s7u3v2/kFHJZmk0KYJT2QvIAo4E9DWTHPopRJIHDoBuObwu8+gFQsEfd6koIfk6FgEaNEG2lgH3mZCEGklDIvZFKi9ul8XQ6sOtOw5kBLxO3JHVUojWwv7woVkMQlNOlOq7Tqr9nEjNKIdpzcsUmPFjMoS+oYLEoPx8tsIUnxolxFEizREaz9TfHTmJlZrEgamMiR6pRa8Q/P6mY6u/JyJNMg6PyhKONYJ7jIA4dMAtV8Ygihkpm/YjoiJg5tUquZENzFlZdJ57zhOg37qLevC7jqKJjdILOkIsuURPdohZqI4oe0TN6RW/Wk/VivVsf89KVfYcoj+wPn8AGRGX3Q=</latexit><latexit 
sha1_base64="ApdTJyNfridNM9ys58lzxcuadHc=">ACAnicbVDLSsNAFJ3UV62vqCtxM1gEVyURQZdFNy4r2Ac0oUwmN+3QySTMTIQSiht/xY0LRdz6Fe78GydtFtp6YJjDOfemXuClDOlHefbqysrq1vVDdrW9s7u3v2/kFHJZmk0KYJT2QvIAo4E9DWTHPopRJIHDoBuObwu8+gFQsEfd6koIfk6FgEaNEG2lgH3mZCEGklDIvZFKi9ul8XQ6sOtOw5kBLxO3JHVUojWwv7woVkMQlNOlOq7Tqr9nEjNKIdpzcsUmPFjMoS+oYLEoPx8tsIUnxolxFEizREaz9TfHTmJlZrEgamMiR6pRa8Q/P6mY6u/JyJNMg6PyhKONYJ7jIA4dMAtV8Ygihkpm/YjoiJg5tUquZENzFlZdJ57zhOg37qLevC7jqKJjdILOkIsuURPdohZqI4oe0TN6RW/Wk/VivVsf89KVfYcoj+wPn8AGRGX3Q=</latexit>

| {z }

<latexit sha1_base64="R6BnFnPk32NTysE2kjWImDiyOVc=">ACBXicbVDLSsNAFJ3UV62vqEtdBIvgKiRF0WXRjcsK9gFNKJPJTt0MgkzE6GEbtz4K25cKOLWf3Dn3zhps9DWA8MczrmXe+8JUkalcpxvo7Kyura+Ud2sbW3v7O6Z+wcdmWSCQJskLBG9AEtglENbUcWglwrAcCgG4xvCr/7AELShN+rSQp+jIecRpRgpaWBexlPAQRCEwg90YyLf6G3bg8XQ6MOuO7cxgLRO3JHVUojUwv7wIVkMXBGpey7Tqr8HAtFCYNpzcsk6AljPIS+phzHIP18dsXUOtVKaEWJ0I8ra6b+7shxLOUkDnRljNVILnqF+J/Xz1R05eUp5kCTuaDoxZKrGKSKyQCiCKTBRFC9q0VGWCeidHA1HYK7ePIy6TRs17Hdu/N687qMo4qO0Ak6Qy6RE10i1qojQh6RM/oFb0ZT8aL8W58zEsrRtlziP7A+PwBiEqYkQ=</latexit><latexit sha1_base64="R6BnFnPk32NTysE2kjWImDiyOVc=">ACBXicbVDLSsNAFJ3UV62vqEtdBIvgKiRF0WXRjcsK9gFNKJPJTt0MgkzE6GEbtz4K25cKOLWf3Dn3zhps9DWA8MczrmXe+8JUkalcpxvo7Kyura+Ud2sbW3v7O6Z+wcdmWSCQJskLBG9AEtglENbUcWglwrAcCgG4xvCr/7AELShN+rSQp+jIecRpRgpaWBexlPAQRCEwg90YyLf6G3bg8XQ6MOuO7cxgLRO3JHVUojUwv7wIVkMXBGpey7Tqr8HAtFCYNpzcsk6AljPIS+phzHIP18dsXUOtVKaEWJ0I8ra6b+7shxLOUkDnRljNVILnqF+J/Xz1R05eUp5kCTuaDoxZKrGKSKyQCiCKTBRFC9q0VGWCeidHA1HYK7ePIy6TRs17Hdu/N687qMo4qO0Ak6Qy6RE10i1qojQh6RM/oFb0ZT8aL8W58zEsrRtlziP7A+PwBiEqYkQ=</latexit><latexit sha1_base64="R6BnFnPk32NTysE2kjWImDiyOVc=">ACBXicbVDLSsNAFJ3UV62vqEtdBIvgKiRF0WXRjcsK9gFNKJPJTt0MgkzE6GEbtz4K25cKOLWf3Dn3zhps9DWA8MczrmXe+8JUkalcpxvo7Kyura+Ud2sbW3v7O6Z+wcdmWSCQJskLBG9AEtglENbUcWglwrAcCgG4xvCr/7AELShN+rSQp+jIecRpRgpaWBexlPAQRCEwg90YyLf6G3bg8XQ6MOuO7cxgLRO3JHVUojUwv7wIVkMXBGpey7Tqr8HAtFCYNpzcsk6AljPIS+phzHIP18dsXUOtVKaEWJ0I8ra6b+7shxLOUkDnRljNVILnqF+J/Xz1R05eUp5kCTuaDoxZKrGKSKyQCiCKTBRFC9q0VGWCeidHA1HYK7ePIy6TRs17Hdu/N687qMo4qO0Ak6Qy6RE10i1qojQh6RM/oFb0ZT8aL8W58zEsrRtlziP7A+PwBiEqYkQ=</latexit><latexit 
sha1_base64="R6BnFnPk32NTysE2kjWImDiyOVc=">ACBXicbVDLSsNAFJ3UV62vqEtdBIvgKiRF0WXRjcsK9gFNKJPJTt0MgkzE6GEbtz4K25cKOLWf3Dn3zhps9DWA8MczrmXe+8JUkalcpxvo7Kyura+Ud2sbW3v7O6Z+wcdmWSCQJskLBG9AEtglENbUcWglwrAcCgG4xvCr/7AELShN+rSQp+jIecRpRgpaWBexlPAQRCEwg90YyLf6G3bg8XQ6MOuO7cxgLRO3JHVUojUwv7wIVkMXBGpey7Tqr8HAtFCYNpzcsk6AljPIS+phzHIP18dsXUOtVKaEWJ0I8ra6b+7shxLOUkDnRljNVILnqF+J/Xz1R05eUp5kCTuaDoxZKrGKSKyQCiCKTBRFC9q0VGWCeidHA1HYK7ePIy6TRs17Hdu/N687qMo4qO0Ak6Qy6RE10i1qojQh6RM/oFb0ZT8aL8W58zEsrRtlziP7A+PwBiEqYkQ=</latexit>

“difficult” xBIG

<latexit sha1_base64="PRTl3bHpm6zAJxCaoakvt/Wuqc=">AB/3icbVDLSsNAFJ34rPUVFdy4GSyCq5KIoLgqulB3FewDmhAm0k7dGYSZibSErPwV9y4UMStv+HOv3HaZqGtBy4czrmXe+8JE0aVdpxva2FxaXltbRWXt/Y3Nq2d3abKk4lJg0cs1i2Q6QIo4I0NWMtBNJEA8ZaYWDq7HfeiBS0Vjc61FCfI56gkYUI2kwN7PhkHm8TAeZp6mYgQvb6/zPA/silN1JoDzxC1IBRSoB/aX141xyonQmCGlOq6TaD9DUlPMSF72UkUShAeoRzqGCsSJ8rPJ/Tk8MkoXRrE0JTScqL8nMsSVGvHQdHKk+2rWG4v/eZ1UR+d+RkWSaiLwdFGUMqhjOA4DdqkWLORIQhLam6FuI8kwtpEVjYhuLMvz5PmSdV1qu7daV2UcRAgfgEBwDF5yBGrgBdAGDyCZ/AK3qwn68V6tz6mrQtWMbMH/sD6/AGICpZi</latexit><latexit sha1_base64="PRTl3bHpm6zAJxCaoakvt/Wuqc=">AB/3icbVDLSsNAFJ34rPUVFdy4GSyCq5KIoLgqulB3FewDmhAm0k7dGYSZibSErPwV9y4UMStv+HOv3HaZqGtBy4czrmXe+8JE0aVdpxva2FxaXltbRWXt/Y3Nq2d3abKk4lJg0cs1i2Q6QIo4I0NWMtBNJEA8ZaYWDq7HfeiBS0Vjc61FCfI56gkYUI2kwN7PhkHm8TAeZp6mYgQvb6/zPA/silN1JoDzxC1IBRSoB/aX141xyonQmCGlOq6TaD9DUlPMSF72UkUShAeoRzqGCsSJ8rPJ/Tk8MkoXRrE0JTScqL8nMsSVGvHQdHKk+2rWG4v/eZ1UR+d+RkWSaiLwdFGUMqhjOA4DdqkWLORIQhLam6FuI8kwtpEVjYhuLMvz5PmSdV1qu7daV2UcRAgfgEBwDF5yBGrgBdAGDyCZ/AK3qwn68V6tz6mrQtWMbMH/sD6/AGICpZi</latexit><latexit sha1_base64="PRTl3bHpm6zAJxCaoakvt/Wuqc=">AB/3icbVDLSsNAFJ34rPUVFdy4GSyCq5KIoLgqulB3FewDmhAm0k7dGYSZibSErPwV9y4UMStv+HOv3HaZqGtBy4czrmXe+8JE0aVdpxva2FxaXltbRWXt/Y3Nq2d3abKk4lJg0cs1i2Q6QIo4I0NWMtBNJEA8ZaYWDq7HfeiBS0Vjc61FCfI56gkYUI2kwN7PhkHm8TAeZp6mYgQvb6/zPA/silN1JoDzxC1IBRSoB/aX141xyonQmCGlOq6TaD9DUlPMSF72UkUShAeoRzqGCsSJ8rPJ/Tk8MkoXRrE0JTScqL8nMsSVGvHQdHKk+2rWG4v/eZ1UR+d+RkWSaiLwdFGUMqhjOA4DdqkWLORIQhLam6FuI8kwtpEVjYhuLMvz5PmSdV1qu7daV2UcRAgfgEBwDF5yBGrgBdAGDyCZ/AK3qwn68V6tz6mrQtWMbMH/sD6/AGICpZi</latexit><latexit 
sha1_base64="PRTl3bHpm6zAJxCaoakvt/Wuqc=">AB/3icbVDLSsNAFJ34rPUVFdy4GSyCq5KIoLgqulB3FewDmhAm0k7dGYSZibSErPwV9y4UMStv+HOv3HaZqGtBy4czrmXe+8JE0aVdpxva2FxaXltbRWXt/Y3Nq2d3abKk4lJg0cs1i2Q6QIo4I0NWMtBNJEA8ZaYWDq7HfeiBS0Vjc61FCfI56gkYUI2kwN7PhkHm8TAeZp6mYgQvb6/zPA/silN1JoDzxC1IBRSoB/aX141xyonQmCGlOq6TaD9DUlPMSF72UkUShAeoRzqGCsSJ8rPJ/Tk8MkoXRrE0JTScqL8nMsSVGvHQdHKk+2rWG4v/eZ1UR+d+RkWSaiLwdFGUMqhjOA4DdqkWLORIQhLam6FuI8kwtpEVjYhuLMvz5PmSdV1qu7daV2UcRAgfgEBwDF5yBGrgBdAGDyCZ/AK3qwn68V6tz6mrQtWMbMH/sD6/AGICpZi</latexit>

xMID

<latexit sha1_base64="qRnj04GDTG6mI6G3y1uoqtZEK2Y=">AB/3icbVDLSsNAFJ34rPUVFdy4GSyCq5KIoLgq6EIXQgX7gCaEyXTSDp2ZhJmJtMQs/BU3LhRx62+482+ctlo64ELh3Pu5d57woRpR3n21pYXFpeWS2tldc3Nre27Z3dpopTiUkDxyW7RApwqgDU01I+1EsRDRlrh4HLstx6IVDQW93qUEJ+jnqARxUgbKbD3s2GQeTyMh5mnqRjB25urPM8Du+JUnQngPHELUgEF6oH95XVjnHIiNGZIqY7rJNrPkNQUM5KXvVSRBOEB6pGOoQJxovxscn8Oj4zShVEsTQkNJ+rviQxpUY8NJ0c6b6a9cbif14n1dG5n1GRpJoIPF0UpQzqGI7DgF0qCdZsZAjCkpbIe4jibA2kZVNCO7sy/OkeVJ1nap7d1qpXRxlMABOATHwAVnoAauQR0AaP4Bm8gjfryXqx3q2PaeuCVczsgT+wPn8AlFiWag=</latexit><latexit sha1_base64="qRnj04GDTG6mI6G3y1uoqtZEK2Y=">AB/3icbVDLSsNAFJ34rPUVFdy4GSyCq5KIoLgq6EIXQgX7gCaEyXTSDp2ZhJmJtMQs/BU3LhRx62+482+ctlo64ELh3Pu5d57woRpR3n21pYXFpeWS2tldc3Nre27Z3dpopTiUkDxyW7RApwqgDU01I+1EsRDRlrh4HLstx6IVDQW93qUEJ+jnqARxUgbKbD3s2GQeTyMh5mnqRjB25urPM8Du+JUnQngPHELUgEF6oH95XVjnHIiNGZIqY7rJNrPkNQUM5KXvVSRBOEB6pGOoQJxovxscn8Oj4zShVEsTQkNJ+rviQxpUY8NJ0c6b6a9cbif14n1dG5n1GRpJoIPF0UpQzqGI7DgF0qCdZsZAjCkpbIe4jibA2kZVNCO7sy/OkeVJ1nap7d1qpXRxlMABOATHwAVnoAauQR0AaP4Bm8gjfryXqx3q2PaeuCVczsgT+wPn8AlFiWag=</latexit><latexit sha1_base64="qRnj04GDTG6mI6G3y1uoqtZEK2Y=">AB/3icbVDLSsNAFJ34rPUVFdy4GSyCq5KIoLgq6EIXQgX7gCaEyXTSDp2ZhJmJtMQs/BU3LhRx62+482+ctlo64ELh3Pu5d57woRpR3n21pYXFpeWS2tldc3Nre27Z3dpopTiUkDxyW7RApwqgDU01I+1EsRDRlrh4HLstx6IVDQW93qUEJ+jnqARxUgbKbD3s2GQeTyMh5mnqRjB25urPM8Du+JUnQngPHELUgEF6oH95XVjnHIiNGZIqY7rJNrPkNQUM5KXvVSRBOEB6pGOoQJxovxscn8Oj4zShVEsTQkNJ+rviQxpUY8NJ0c6b6a9cbif14n1dG5n1GRpJoIPF0UpQzqGI7DgF0qCdZsZAjCkpbIe4jibA2kZVNCO7sy/OkeVJ1nap7d1qpXRxlMABOATHwAVnoAauQR0AaP4Bm8gjfryXqx3q2PaeuCVczsgT+wPn8AlFiWag=</latexit><latexit 
sha1_base64="qRnj04GDTG6mI6G3y1uoqtZEK2Y=">AB/3icbVDLSsNAFJ34rPUVFdy4GSyCq5KIoLgq6EIXQgX7gCaEyXTSDp2ZhJmJtMQs/BU3LhRx62+482+ctlo64ELh3Pu5d57woRpR3n21pYXFpeWS2tldc3Nre27Z3dpopTiUkDxyW7RApwqgDU01I+1EsRDRlrh4HLstx6IVDQW93qUEJ+jnqARxUgbKbD3s2GQeTyMh5mnqRjB25urPM8Du+JUnQngPHELUgEF6oH95XVjnHIiNGZIqY7rJNrPkNQUM5KXvVSRBOEB6pGOoQJxovxscn8Oj4zShVEsTQkNJ+rviQxpUY8NJ0c6b6a9cbif14n1dG5n1GRpJoIPF0UpQzqGI7DgF0qCdZsZAjCkpbIe4jibA2kZVNCO7sy/OkeVJ1nap7d1qpXRxlMABOATHwAVnoAauQR0AaP4Bm8gjfryXqx3q2PaeuCVczsgT+wPn8AlFiWag=</latexit>

xSMALL

<latexit sha1_base64="mfgeADO2zkTd2xPe9t81NpfKe/Q=">ACAXicbVDLSsNAFJ34rPUVdSO4GSyCq5KIoLiquHFRoaJ9QBPCZDph04mYWYiDSFu/BU3LhRx61+482+ctlo64ELh3Pu5d57/JhRqSzr21hYXFpeWS2tldc3Nre2zZ3dlowSgUkTRywSHR9JwignTUVI51YEBT6jLT94dXYbz8QIWnE71UaEzdEfU4DipHSkmfuZyMvc0I/GmWOojyFdzeX9Xqe5ZsarWBHCe2AWpgAINz/xyehFOQsIVZkjKrm3Fys2QUBQzkpedRJIY4SHqk6mHIVEutnkgxweaUHg0jo4gpO1N8TGQqlTENfd4ZIDeSsNxb/87qJCs7djPI4UYTj6aIgYVBFcBwH7FBsGKpJgLqm+FeIAEwkqHVtYh2LMvz5PWSdW2qvbtaV2UcRAgfgEBwDG5yBGrgGDdAEGDyCZ/AK3own48V4Nz6mrQtGMbMH/sD4/AHg7pcd</latexit><latexit sha1_base64="mfgeADO2zkTd2xPe9t81NpfKe/Q=">ACAXicbVDLSsNAFJ34rPUVdSO4GSyCq5KIoLiquHFRoaJ9QBPCZDph04mYWYiDSFu/BU3LhRx61+482+ctlo64ELh3Pu5d57/JhRqSzr21hYXFpeWS2tldc3Nre2zZ3dlowSgUkTRywSHR9JwignTUVI51YEBT6jLT94dXYbz8QIWnE71UaEzdEfU4DipHSkmfuZyMvc0I/GmWOojyFdzeX9Xqe5ZsarWBHCe2AWpgAINz/xyehFOQsIVZkjKrm3Fys2QUBQzkpedRJIY4SHqk6mHIVEutnkgxweaUHg0jo4gpO1N8TGQqlTENfd4ZIDeSsNxb/87qJCs7djPI4UYTj6aIgYVBFcBwH7FBsGKpJgLqm+FeIAEwkqHVtYh2LMvz5PWSdW2qvbtaV2UcRAgfgEBwDG5yBGrgGDdAEGDyCZ/AK3own48V4Nz6mrQtGMbMH/sD4/AHg7pcd</latexit><latexit sha1_base64="mfgeADO2zkTd2xPe9t81NpfKe/Q=">ACAXicbVDLSsNAFJ34rPUVdSO4GSyCq5KIoLiquHFRoaJ9QBPCZDph04mYWYiDSFu/BU3LhRx61+482+ctlo64ELh3Pu5d57/JhRqSzr21hYXFpeWS2tldc3Nre2zZ3dlowSgUkTRywSHR9JwignTUVI51YEBT6jLT94dXYbz8QIWnE71UaEzdEfU4DipHSkmfuZyMvc0I/GmWOojyFdzeX9Xqe5ZsarWBHCe2AWpgAINz/xyehFOQsIVZkjKrm3Fys2QUBQzkpedRJIY4SHqk6mHIVEutnkgxweaUHg0jo4gpO1N8TGQqlTENfd4ZIDeSsNxb/87qJCs7djPI4UYTj6aIgYVBFcBwH7FBsGKpJgLqm+FeIAEwkqHVtYh2LMvz5PWSdW2qvbtaV2UcRAgfgEBwDG5yBGrgGDdAEGDyCZ/AK3own48V4Nz6mrQtGMbMH/sD4/AHg7pcd</latexit><latexit 
sha1_base64="mfgeADO2zkTd2xPe9t81NpfKe/Q=">ACAXicbVDLSsNAFJ34rPUVdSO4GSyCq5KIoLiquHFRoaJ9QBPCZDph04mYWYiDSFu/BU3LhRx61+482+ctlo64ELh3Pu5d57/JhRqSzr21hYXFpeWS2tldc3Nre2zZ3dlowSgUkTRywSHR9JwignTUVI51YEBT6jLT94dXYbz8QIWnE71UaEzdEfU4DipHSkmfuZyMvc0I/GmWOojyFdzeX9Xqe5ZsarWBHCe2AWpgAINz/xyehFOQsIVZkjKrm3Fys2QUBQzkpedRJIY4SHqk6mHIVEutnkgxweaUHg0jo4gpO1N8TGQqlTENfd4ZIDeSsNxb/87qJCs7djPI4UYTj6aIgYVBFcBwH7FBsGKpJgLqm+FeIAEwkqHVtYh2LMvz5PWSdW2qvbtaV2UcRAgfgEBwDG5yBGrgGDdAEGDyCZ/AK3own48V4Nz6mrQtGMbMH/sD4/AHg7pcd</latexit> <latexit sha1_base64="+kYPchScNyik1n1ulpTEzE1fdeA=">AB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEqHgqePHYgv2ANpTNdtKu3WzC7kYob/AiwdFvPqTvPlv3LY5aOuDgcd7M8zMCxLBtXHdb6ewsbm1vVPcLe3tHxwelY9P2jpOFcMWi0WsugHVKLjEluFGYDdRSKNAYCeY3M39zhMqzWP5YKYJ+hEdSR5yRo2Vmu6gXHGr7gJknXg5qUCOxqD81R/GLI1QGiao1j3PTYyfUWU4Ezgr9VONCWUTOsKepZJGqP1sceiMXFhlSMJY2ZKGLNTfExmNtJ5Gge2MqBnrVW8u/uf1UhPe+BmXSWpQsuWiMBXExGT+NRlyhcyIqSWUKW5vJWxMFWXGZlOyIXirL6+T9lXVc6te87pSv83jKMIZnMleFCDOtxDA1rAOEZXuHNeXRenHfnY9lacPKZU/gD5/MHdrGMrA=</latexit><latexit sha1_base64="+kYPchScNyik1n1ulpTEzE1fdeA=">AB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEqHgqePHYgv2ANpTNdtKu3WzC7kYob/AiwdFvPqTvPlv3LY5aOuDgcd7M8zMCxLBtXHdb6ewsbm1vVPcLe3tHxwelY9P2jpOFcMWi0WsugHVKLjEluFGYDdRSKNAYCeY3M39zhMqzWP5YKYJ+hEdSR5yRo2Vmu6gXHGr7gJknXg5qUCOxqD81R/GLI1QGiao1j3PTYyfUWU4Ezgr9VONCWUTOsKepZJGqP1sceiMXFhlSMJY2ZKGLNTfExmNtJ5Gge2MqBnrVW8u/uf1UhPe+BmXSWpQsuWiMBXExGT+NRlyhcyIqSWUKW5vJWxMFWXGZlOyIXirL6+T9lXVc6te87pSv83jKMIZnMleFCDOtxDA1rAOEZXuHNeXRenHfnY9lacPKZU/gD5/MHdrGMrA=</latexit><latexit 
sha1_base64="+kYPchScNyik1n1ulpTEzE1fdeA=">AB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEqHgqePHYgv2ANpTNdtKu3WzC7kYob/AiwdFvPqTvPlv3LY5aOuDgcd7M8zMCxLBtXHdb6ewsbm1vVPcLe3tHxwelY9P2jpOFcMWi0WsugHVKLjEluFGYDdRSKNAYCeY3M39zhMqzWP5YKYJ+hEdSR5yRo2Vmu6gXHGr7gJknXg5qUCOxqD81R/GLI1QGiao1j3PTYyfUWU4Ezgr9VONCWUTOsKepZJGqP1sceiMXFhlSMJY2ZKGLNTfExmNtJ5Gge2MqBnrVW8u/uf1UhPe+BmXSWpQsuWiMBXExGT+NRlyhcyIqSWUKW5vJWxMFWXGZlOyIXirL6+T9lXVc6te87pSv83jKMIZnMleFCDOtxDA1rAOEZXuHNeXRenHfnY9lacPKZU/gD5/MHdrGMrA=</latexit><latexit sha1_base64="+kYPchScNyik1n1ulpTEzE1fdeA=">AB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEqHgqePHYgv2ANpTNdtKu3WzC7kYob/AiwdFvPqTvPlv3LY5aOuDgcd7M8zMCxLBtXHdb6ewsbm1vVPcLe3tHxwelY9P2jpOFcMWi0WsugHVKLjEluFGYDdRSKNAYCeY3M39zhMqzWP5YKYJ+hEdSR5yRo2Vmu6gXHGr7gJknXg5qUCOxqD81R/GLI1QGiao1j3PTYyfUWU4Ezgr9VONCWUTOsKepZJGqP1sceiMXFhlSMJY2ZKGLNTfExmNtJ5Gge2MqBnrVW8u/uf1UhPe+BmXSWpQsuWiMBXExGT+NRlyhcyIqSWUKW5vJWxMFWXGZlOyIXirL6+T9lXVc6te87pSv83jKMIZnMleFCDOtxDA1rAOEZXuHNeXRenHfnY9lacPKZU/gD5/MHdrGMrA=</latexit>

Split points: $x_{\mathrm{SMALL}}$, $0$, $x_{\mathrm{MID}}$, $5$, $x_{\mathrm{LARGE}}$, $x_{\mathrm{BIG}}$

Approximation technique

Easier-to-approximate function: $g(x) = \dfrac{1}{x\,e^{x^2}\,\mathrm{erfc}(x)} - 2$

  • decreasing on $[5; x_{\mathrm{BIG}}]$
  • $|g(x)| \le 2 - \sqrt{\pi} \le 0.228$

[Figure: plot of g(x) for x from 5 to x_BIG]

Correction: $\mathrm{erfc}(x) = \dfrac{e^{-x^2}}{2x + x\,g(x)}$

Evaluation:

  • approximate exp and g
  • recover erfc

Issue:

  • $-x^2 \in [-741.256, -25]$
  • exp will underflow
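
The relation between g and erfc can be checked numerically. The following Python sketch is an illustration only, not the generated code: it evaluates g naively from `math.erfc`, which works only while `exp(x*x)` does not overflow, and then recovers erfc through the correction formula. The function names `g` and `erfc_from_g` are hypothetical.

```python
import math
import sys

def g(x):
    """g(x) = 1 / (x * exp(x^2) * erfc(x)) - 2, evaluated naively."""
    return 1.0 / (x * math.exp(x * x) * math.erfc(x)) - 2.0

def erfc_from_g(x):
    """Recover erfc(x) = exp(-x^2) / (2x + x*g(x))."""
    return math.exp(-x * x) / (2.0 * x + x * g(x))

if __name__ == "__main__":
    for x in (5.0, 10.0, 20.0):
        print(x, g(x), erfc_from_g(x), math.erfc(x))
    # Near the upper end of the domain, exp(-x^2) falls below the smallest
    # normal binary64 number, which is why a scaling step is needed.
    print(math.exp(-741.0) < sys.float_info.min)
```

The last line illustrates the issue stated above: for arguments close to −741 the exponential is already in the subnormal range, so its relative accuracy is lost.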
What is the best way to scale?

Choose a scaling $s$ to be within $[-708.396\cdots;\ 670.96\cdots]$:

$e^{-x^2} = e^{-x^2+s} \cdot e^{-s}$

Taking $s = k \ln 2$ gives $e^{-x^2} = e^{-x^2 + k \ln 2} \cdot 2^{-k}$.

Search for $k \in \mathbb{Z}$ that minimizes $|s - \circ(k \ln 2)|$:

  • for FP representation: $k = 61$, $\Delta_s = 0.2583u$, $-x^2 + \hat{s} \in [-698.9\cdots, 17.2\cdots]$
  • for DD representation: $k = 1021$, $\Delta_s = 0.0289u^2$, $-x^2 + \hat{s} \in [-33.5\cdots, 682.7\cdots]$
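
The search for k can be sketched as follows. This is a minimal Python illustration under stated assumptions: Δs is read here as the absolute error |k ln 2 − ∘(k ln 2)| expressed in units of u = 2^-53, ∘ is taken to be rounding to the nearest binary64 number, and the candidate range for k is hypothetical. Only the FP (single binary64) representation is covered, not the DD one.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80            # enough digits to act as an "exact" reference
LN2 = Decimal(2).ln()
U = Decimal(2) ** -53             # unit roundoff of binary64

def delta_s_in_u(k):
    """|k*ln2 - round_to_binary64(k*ln2)| in units of u."""
    s = k * LN2                   # high-precision reference value
    s_hat = float(s)              # nearest binary64 number
    return float(abs(Decimal(s_hat) - s) / U)

def best_k(candidates):
    """Return the candidate k whose rounded k*ln2 is closest to the exact value."""
    return min(candidates, key=delta_s_in_u)

if __name__ == "__main__":
    k = best_k(range(1, 129))     # hypothetical search range
    print(k, delta_s_in_u(k))
    print(61, delta_s_in_u(61))   # value to compare with the slide's FP case
```

In practice the candidates would also have to keep $-x^2 + \hat{s}$ inside the range where exp neither underflows nor overflows; that filter is left out of this sketch.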

Error analysis and repartition

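Task: ensure a relative error $\delta$ and deduce the accuracy of each step in

$\mathrm{erfc}(x) = 2^{-k}\,\dfrac{e^{-x^2+\hat{s}}}{2x + x\,g(x)}$

Notation:

  • $y(x) = 2^{-k}\,a(x)/d(x)$
  • $a(x) = e^{t(x)}$
  • $t(x) = -x^2 + \hat{s}$
  • $d(x) = 2x + r(x)$
  • $r(x) = x\,g(x)$

With $\hat{a}(x) = a(x)(1+\varepsilon_a)$ and $\hat{d}(x) = d(x)/(1+\varepsilon_d)$:

$\hat{y}(x) = 2^{-k}\,\dfrac{a(x)}{d(x)}\,(1+\varepsilon_{\mathrm{DIV}})(1+\varepsilon_a)(1+\varepsilon_d) = 2^{-k}\,\dfrac{a(x)}{d(x)}\,(1+\varepsilon_y)$

To ensure $|\varepsilon_y| \le \varepsilon$ it suffices to ensure $|\varepsilon_a| \le \varepsilon/4$, $|\varepsilon_d| \le \varepsilon/4$, $|\varepsilon_{\mathrm{DIV}}| \le \varepsilon/4$.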
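
The three-way split of the error budget can be verified with exact rational arithmetic. The sketch below is only an illustration. The function names are hypothetical, and the sample values of ε in the usage are assumptions.

```python
from fractions import Fraction

def combined_error(eps_a, eps_d, eps_div):
    """Return eps_y such that (1+eps_div)(1+eps_a)(1+eps_d) = 1 + eps_y."""
    return (1 + eps_div) * (1 + eps_a) * (1 + eps_d) - 1

def split_is_sufficient(eps):
    """Check |eps_y| <= eps when each factor uses the budget eps/4."""
    q = eps / 4
    # The product is monotone in each factor, so the extremes are reached
    # when all three errors sit at the same end of their interval.
    worst_up = combined_error(q, q, q)
    worst_down = combined_error(-q, -q, -q)
    return abs(worst_up) <= eps and abs(worst_down) <= eps

if __name__ == "__main__":
    for eps in (Fraction(1, 2**32), Fraction(1, 2**46)):
        print(eps, split_is_sufficient(eps))
```

The margin left by using ε/4 rather than ε/3 absorbs the second-order terms of the product.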

For the exponential, $\hat{t}(x) = t(x) + \Delta_t$, so

$\hat{a}(x) = \mathrm{EXP}(t(x) + \Delta_t) = e^{t(x)+\Delta_t}\,(1+\varepsilon_{\mathrm{EXP}}) = e^{t(x)}\,(1 + e^{\Delta_t} - 1)\,(1+\varepsilon_{\mathrm{EXP}}) = e^{t(x)}\,(1+\varepsilon_a)$

To ensure $|\varepsilon_a| \le \varepsilon$ it suffices to ensure $|\varepsilon_{\mathrm{EXP}}| \le \varepsilon/4$, $|\Delta_t| \le \ln(1+\varepsilon/4)$.
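
The same kind of check applies to the exponential step. When Δt sits at its limit ln(1 + ε/4), the factor e^{Δt} equals 1 + ε/4 exactly, so the bound on ε_a reduces to rational arithmetic. This Python sketch is an illustration only, and its function names are hypothetical.

```python
from fractions import Fraction

def eps_a_upper(eps):
    """eps_a when Delta_t = ln(1 + eps/4) and eps_EXP = eps/4.

    Here exp(Delta_t) = 1 + eps/4, so 1 + eps_a = (1 + eps/4)**2.
    """
    q = eps / 4
    return (1 + q) * (1 + q) - 1

def eps_a_lower(eps):
    """eps_a when Delta_t = -ln(1 + eps/4) and eps_EXP = -eps/4."""
    q = eps / 4
    return (1 - q) / (1 + q) - 1

if __name__ == "__main__":
    eps = Fraction(1, 2**34)
    print(eps_a_upper(eps) <= eps, abs(eps_a_lower(eps)) <= eps)
```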

Generic error bounds

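| Computation step | Error | Requirement | Example: $\delta = 2^{-32}$ | Example: $\delta = 2^{-46}$ |
|---|---|---|---|---|
| | $\varepsilon_y$ | $\delta$ | $2^{-32}$ | $2^{-46}$ |
| $y(x) = 2^{-k}\,a(x)/d(x)$ | $\varepsilon_{\mathrm{DIV}}$ | $\delta/4$ | $2^{-34}$ | $2^{-48}$ |
| $a(x) = e^{t(x)}$ | $\varepsilon_{\mathrm{EXP}}$ | $\delta/16$ | $2^{-36}$ | $2^{-50}$ |
| $t(x) = -x^2 + k \ln 2$ | $\Delta_t$ | $\ln(1+\delta/16)$ | $1.99 \cdot 2^{-37}$ | $1.99 \cdot 2^{-51}$ |
| $d(x) = 2x + r(x)$ | $\varepsilon_{\mathrm{ADD}}$ | $\delta/8$ | $2^{-35}$ | $2^{-49}$ |
| $r(x) = x\,g(x)$ | $\varepsilon_{\mathrm{MUL}}$ | $\dfrac{\delta}{4\alpha(8+\delta)}$ | $1.94 \cdot 2^{-35}$ | $1.94 \cdot 2^{-49}$ |
| $g(x)$ | $\varepsilon_g$ | $\dfrac{\delta}{4\alpha(8+\delta)}$ | $1.94 \cdot 2^{-35}$ | $1.94 \cdot 2^{-49}$ |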
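
The example columns follow from the formulas by direct arithmetic. This Python sketch recomputes the rows that depend only on δ and writes each value as m·2^e with 1 ≤ m < 2. The rows involving α are omitted because α is not defined in this section. The function names are hypothetical.

```python
import math

def as_pow2(v):
    """Return (m, e) with v = m * 2**e and 1 <= m < 2."""
    m, e = math.frexp(v)          # v = m * 2**e with 0.5 <= m < 1
    return 2.0 * m, e - 1

def generic_bounds(delta):
    """Error budget per step, as listed in the table (alpha rows omitted)."""
    return {
        "eps_DIV": delta / 4,
        "eps_EXP": delta / 16,
        "Delta_t": math.log1p(delta / 16),
        "eps_ADD": delta / 8,
    }

if __name__ == "__main__":
    for delta in (2.0 ** -32, 2.0 ** -46):
        for name, value in generic_bounds(delta).items():
            m, e = as_pow2(value)
            print(f"{name}: {m:.4f} * 2^{e}")
```

The Δt entry comes out marginally below a power of two, which is why the table lists it as 1.99 times the next lower power.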

... but what happens in double precision?

When straightforward binary64 is used

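| Computation step | Error | Bounds |
|---|---|---|
| | $\lvert\varepsilon_y\rvert$ | $\delta$ |
| $y(x) = 2^{-k}\,a(x)/d(x)$ | $\lvert\varepsilon_{\mathrm{DIV}}\rvert$ | $u$ |
| $a(x) = e^{t(x)}$ | $\lvert\varepsilon_{\mathrm{EXP}}\rvert$ | $1.\cdots\,\delta - 1024.2584u$ |
| $t(x) = -x^2 + k \ln 2$ | $\lvert\Delta_t\rvert$ | $1024.2583u$ |
| $d(x) = 2x + r(x)$ | $\lvert\varepsilon_{\mathrm{ADD}}\rvert$ | $u$ |
| $r(x) = x\,g(x)$ | $\lvert\varepsilon_{\mathrm{MUL}}\rvert$ | $u$ |
| $g(x)$ | $\lvert\varepsilon_g\rvert$ | $1.7\delta - 9.6u$ |

  • Arithmetic with relative error $u$
  • Can adapt only the accuracy of exp(x) and g(x)
  • Restriction on the relative error: $\delta > 5.002 \cdot 2^{-38}$

Must be more accurate in critical parts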
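
The size of Δt in plain binary64 can be probed experimentally. The sketch below is an illustration built on assumptions, not the authors' analysis: it takes k = 61 from the FP case of the scaling step, evaluates t(x) = −x² + ŝ with two rounded operations, and measures the distance to the exact −x² + k ln 2 in units of u = 2^-53. The sample points are arbitrary and all names are hypothetical.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 80
K = 61                                   # scaling exponent, taken from the FP case
LN2 = Decimal(2).ln()                    # high-precision ln 2
S_EXACT = K * Fraction(LN2)              # reference value of k*ln2
S_HAT = float(K * LN2)                   # binary64 constant used by the code
U = Fraction(1, 2 ** 53)

def delta_t_in_u(x):
    """Absolute error of the naive binary64 evaluation of t(x), in units of u."""
    t_naive = -(x * x) + S_HAT           # one rounded multiply, one rounded add
    t_exact = -Fraction(x) ** 2 + S_EXACT
    return float(abs(Fraction(t_naive) - t_exact) / U)

if __name__ == "__main__":
    for x in (5.0, 10.3, 20.7, 27.2):
        print(x, delta_t_in_u(x))
```

The measured values can be compared with the 1024.2583u bound in the table. Because the error is absolute and x² reaches several hundred, it is large relative to u, which motivates the more accurate evaluation of the critical parts.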

Exploiting double-word arithmetic

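  • Evaluate $t(x)$ as a double-word $t_h + t_\ell$

      Method 1: $|\Delta_t| \le 32.259u$, 6 FP operations
      Method 2: $|\Delta_t| \le 0.2584u$, 10 FP operations

  • Evaluate exponential: $a(x) = e^{t_h} e^{t_\ell} e^{\Delta_t}$

      $e^{t_h} e^{t_\ell} e^{\Delta_t} \;\to\; e^{t_h} e^{t_\ell}\,(1+\varepsilon_t) \;\to\; e^{t_h}\,(1+t_\ell)(1+\varepsilon_{E_\ell})(1+\varepsilon_{\mathrm{FMA}})(1+\varepsilon_t) \;\to\; e^{t_h}\,(1+t_\ell)(1+1.259u)(1+\varepsilon_{E_h})$

      where the factor $(1+1.259u)(1+\varepsilon_{E_h})$ is grouped as $(1+\varepsilon_a)$, and

      $|\varepsilon_t| \le 0.2585u$, $|\varepsilon_{E_\ell}| \le (1089 \cdot 2^{-44})u$, $|\varepsilon_{\mathrm{FMA}}| \le u$

Result: can satisfy an error up to $\delta = 0.76 \cdot 2^{-50}$ (vs. $2^{-38}$)

Cost: 13 FP operations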
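
A double-word evaluation of t(x) can be sketched with classical error-free transformations. This Python illustration is an assumption, not the presentation's Method 1 or Method 2: it uses Dekker's product, so that no fused multiply-add is required, together with Knuth's TwoSum. `S_HAT` is a stand-in binary64 constant near 61 ln 2, not necessarily the correctly rounded one. The last function mirrors the $e^{t_h}(1+t_\ell)$ step with ordinary rounded operations.

```python
import math
from fractions import Fraction

S_HAT = 61 * math.log(2.0)        # stand-in for the rounded constant k*ln2, k = 61

def split(a):
    """Veltkamp splitting: a = hi + lo with halves short enough for exact products."""
    c = 134217729.0 * a           # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi

def two_prod(a, b):
    """Dekker's product: returns (p, e) with p + e == a*b exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e

def two_sum(a, b):
    """Knuth's TwoSum: returns (s, e) with s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def t_double_word(x):
    """Return (th, tl) with th + tl close to S_HAT - x*x."""
    p, e = two_prod(x, x)         # x*x = p + e
    th, r = two_sum(S_HAT, -p)    # S_HAT - p = th + r
    tl = r - e                    # the only rounding affects the low part
    return th, tl

def exp_double_word(th, tl):
    """Approximate exp(th + tl) as exp(th) * (1 + tl)."""
    eh = math.exp(th)
    return eh + eh * tl

if __name__ == "__main__":
    th, tl = t_double_word(20.7)
    print(th, tl, exp_double_word(th, tl))
```

With this arrangement the evaluation error of t is far below u, so the remaining error in t is dominated by how well the constant approximates k ln 2.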

Numerical results – 1

Implementation:

  • Semi-automatic approximation choice for Metalibm
  • Code generation in C

Testing:

  • Reference implementation: GNU libm with gcc 6.3.0
  • Target accuracy: $2^{-32}$, $2^{-46}$, $0.76 \cdot 2^{-50}$

Results:

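| Target accuracy | $[0; 5]$ abs | $[0; 5]$ rel | $[5; x_{\mathrm{LARGE}}]$ abs | $[5; x_{\mathrm{LARGE}}]$ rel | $[x_{\mathrm{LARGE}}, x_{\mathrm{BIG}}]$ abs |
|---|---|---|---|---|---|
| GNU libm | 4 ulp | 6.34 u | 3 ulp | 3.98 u | 1.5 ulp |
| $0.76 \cdot 2^{-50}$ (6.08u) | 2 ulp | 3.84 u | 4 ulp | 4.02 u | 1.5 ulp |
| $2^{-46}$ (128u) | 18 ulp | 21.07 u | 15 ulp | 16.6 u | 1.5 ulp |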

Numerical results – 2

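[Figure: number of cycles (vertical axis, up to 1,000) as a function of x (horizontal axis, 5 to 30), with curves for GNU libm, $\delta = 2^{-32}$, $\delta = 2^{-46}$, and $\delta = 0.76 \cdot 2^{-50}$]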

Intel Xeon Gold 6136 CPU, -march=native -O3

Conclusion

  • Partly-automated implementation that offers
      • a priori target accuracy
      • guaranteed error bounds
      • exploration of a large design space
  • Asymptotic expression + correction
  • Double-word arithmetic for critical parts when in binary64

Perspectives

  • Optimize error budget repartition
  • Achieve higher accuracy
  • Adapt for other functions with similar behavior