Blog Archives

Controlling FPU rounding modes with Python

Hey folks! In the previous post I talked a little about the theory behind the floating-point arithmetic and rounding methods. This time I’ll try a more practical approach showing how to control the rounding modes that FPUs (Floating-Point Units) employ to perform their operations. For that I’ll use Python. But, first let me introduce the <fenv.h> header of the ANSI C99 standard.

The ANSI C99 standard establishes the <fenv.h> header to control the floating-point environment of the machine. A lot of functions and constants allowing to configure the properties of the floating-point system (including rounding modes) are defined in this header. In our case, the important definitions are:

Functions

int fegetround() – Returns the current rounding mode
int fesetround(int mode) – Sets the rounding mode returning 0 if all went ok and some other value otherwise

Constants

FE_TOWARDZERO – Flag for round to zero
FE_DOWNWARD – Flag for round toward minus infinity
FE_UPWARD – Flag for round toward plus infinity
FE_TONEAREST – Flag for round to nearest

Unfortunately, not all compilers fully implement the ANSI C99 standard. Therefore, there is no guarantee on the portability of code that uses the <fenv.h> header. Considering this caveat, the GCC compiler supports it, which makes it possible to build a Python extension wrapping the features of <fenv.h>. However, this is not a smart way to reach our goal since it would require forcing users to compile the source code of the extension. An alternative is to use the ctypes Python module to instantiate the libm (part of the standard C library used by GCC, typically glibc, that implements <fenv.h>):

from ctypes import cdll
from ctypes.util import find_library
libm = cdll.LoadLibrary(find_library('m'))

The constants that identify the rounding modes are usually defined as macros in <fenv.h> and, therefore, are not accessible via ctypes. We need to redefine them in Python. The problem is that they vary according the processor so that it is necessary to establish some logic on the result of the function platform.processor(). Right now, however, I just have a x86 processor to test, so I will use the constants for it only:

FE_TOWARDZERO = 0xc00
FE_DOWNWARD = 0x400
FE_UPWARD = 0x800
FE_TONEAREST = 0

Now, just call the appropriate functions:

>>> back_rounding_mode = libm.fegetround()
>>> libm.fesetround(FE_DOWNWARD)
0
>>> 1.0/10.0
0.099999999999999992
>>> libm.fesetround(FE_UPWARD)
0
>>> 1.0/10.0
0.10000000000000001
>>> libm.fesetround(back_rounding_mode)

But, what about the MS Visual Studio? Well… the standard C library of this compiler, msvcrt (MS Visual Studio C Run-Time Library), does not support <fenv.h>. Nonetheless, MS Visual Studio implements a set of non-standard (quite typical…) constants and functions specialized in manipulating the properties of the machine FPU. These constants and functions are defined in the <float.h> header distributed with that compiler. The following definitions are of particular interest to us:

Functions

unsigned int _controlfp(unsigned int new, unsigned int mask) – Sets/gets the control vector of the floating-point system (more information here)

Constants

_MCW_RC = 0x300 – Control vector mask for information about rounding modes
_RC_CHOP = 0x300 – Control vector value for round to zero mode
_RC_UP = 0x200 – Control vector value for round toward plus infinity mode
_RC_DOWN = 0x100 – Control vector value for round toward minus infinity mode
_RC_NEAR = 0 – Control vector value for round to nearest mode

Analogous to the previous case:

>>> _MCW_RC = 0x300
>>> _RC_UP = 0x200
>>> _RC_DOWN = 0x100
>>> from ctypes import cdll
>>> msvcrt = cdll.msvcrt
>>> back_rounding_mode = msvcrt._controlfp(0, 0)
>>> msvcrt._controlfp(_RC_DOWN, _MCW_RC)
590111
>>> 1.0/10.0
0.099999999999999992
>>> msvcrt._controlfp(_RC_UP, _MCW_RC)
590367
>>> 1.0/10.0
0.10000000000000001
>>> msvcrt._controlfp(back_rounding_mode, _MCW_RC)
589855

For sure, it is not convenient to type all this every time you want to change the rounding mode. Nor it is appropriate to guess the standard C library of the system. But you can always create functions encapsulating this behavior. Something like this:

def _start_libm():
    global TO_ZERO, TOWARD_MINUS_INF, TOWARD_PLUS_INF, TO_NEAREST
    global set_rounding, get_rounding
    from ctypes import cdll
    from ctypes.util import find_library
    libm = cdll.LoadLibrary(find_library('m'))
    set_rounding, get_rounding = libm.fesetround, libm.fegetround
    # x86
    TO_ZERO = 0xc00
    TOWARD_MINUS_INF = 0x400
    TOWARD_PLUS_INF = 0x800
    TO_NEAREST = 0

def _start_msvcrt():
    global TO_ZERO, TOWARD_MINUS_INF, TOWARD_PLUS_INF, TO_NEAREST
    global set_rounding, get_rounding
    from ctypes import cdll
    msvcrt = cdll.msvcrt
    set_rounding = lambda mode: msvcrt._controlfp(mode, 0x300)
    get_rounding = lambda: msvcrt._controlfp(0, 0)
    TO_ZERO = 0x300
    TOWARD_MINUS_INF = 0x100
    TOWARD_PLUS_INF = 0x200
    TO_NEAREST = 0

for _start_rounding in _start_libm, _start_msvcrt:
    try:
        _start_rounding()
        break
    except:
        pass
else:
    print "ERROR: You couldn't start the FPU module"

In this case, the constants and functions to be called, as well as the standard C library instantiated, are abstracted. Just use the constants TO_ZERO, TOWARD_MINUS_INF, TOWARD_PLUS_INF, TO_NEAREST and the functions set_rounding, get_rounding.

So, that’s it… I hope this has been useful to you somehow (I doubt it… :P) or, at least, that it was interesting… Until next time…

Posted in Floating-Point Numbers, Python, Scientific Computing

2 Comments

Tags: floating-point, fpu, python, rounding

Aritmética de ponto flutuante

Mar 14

Posted by Rafael Barreto

Opa pessoal! Depois das apresentações, um post com alguma coisa útil… ou não :P. Vou falar sobre algo em que trabalhei há um tempinho atrás na universidade e que acho muito interessante: o sistema de ponto flutuante. Para alguns esse assunto pode ser bastante massante, mas um conhecimento básico sobre ponto flutuante é de extrema importância para qualquer pessoa que trabalhe de forma séria com informática porque problemas derivados do seu uso eventualmente se tornam grandes armadilhas. Enfim… deixa eu parar com a enrolação :).

Os números de ponto flutuante ( $\mathbb{F}$ ) são uma tentativa de representar os números reais ( $\mathbb{R}$ ) nos computadores. Entretanto, dada a finitude inerente aos dispositivos computacionais, há perda de informação — e, portanto, $\mathbb{F}$ é subconjunto próprio de $\mathbb{R}$ — de modo que uma das maiores preocupações matemáticas ao se trabalhar com o sistema de ponto flutuante é a falta de fecho aritmético. Isso corresponde a dizer que, para $\star \in \{+, -, \cdot, /\}$ , nem sempre é verdade que $x \star y \in \mathbb{F}$ . De fato, isso é um problema. Afinal de contas, se o resultado de uma operação não é representável no conjunto de trabalho, a aritmética perde em robustez tornando-se inconsistente e, para todos os fins práticos, inútil.

Uma solução é redefinir os operadores sobre $\mathbb{F}$ de tal forma a criar o fecho. A idéia é aplicar uma função (arredondamento) aos resultados para fazê-los “caber” no conjunto. Formalmente, seja $\bigcirc:\mathbb{R} \rightarrow \mathbb{F}$ . Dizemos que $\bigcirc$ é um arredondamento se obedece às seguintes propriedades:

$\forall \; x \in \mathbb{F} \Rightarrow \bigcirc(x) = x$
$\forall \; x, y \in \mathbb{R} \text{ e } x \le y \Rightarrow \bigcirc(x) \le \bigcirc(y)$

Ao contrário do que esse formalismo sugere, o significado de tais propriedades é bastante simples. A primeira delas estabelece que todos os elementos do conjunto dos números de ponto flutuante são fixados pela função de arredondamento. Ou seja, o arredondamento de um número de ponto flutuante é o próprio número. A segunda garante a preservação da relação de ordem (monoticidade), o que era de se esperar. Em conjunto, essas propriedades implicam no resultado mais importante a respeito da natureza do arredondamento: a característica de máxima qualidade. O que se quer dizer com isso é que $\nexists \; y \in \mathbb{F} \; | \; y \in (\min\{\bigcirc(x), x\}, \max\{\bigcirc(x), x\})$ , ou seja, entre um número real e o seu ponto flutuante correspondente não existem intermediários pertencentes a $\mathbb{F}$ .

Tudo bem, mas qual é a utilidade disso no estabelecimento de uma aritmética de ponto flutuante útil e robusta? Dado $\star \in \{+, -, \cdot, /\}$ , seja $\bullet \in \{\oplus, \ominus, \odot, \oslash\}$ o operador correspondente em $\mathbb{F}$ . Nós definimos uma aritmética de ponto flutuante como:

$\forall \; x, y \in \mathbb{F} \Rightarrow x \bullet y = \bigcirc(x \star y)$

Nesse ponto aparece uma questão de ordem prática bastante relevante. Ora, se eu preciso aplicar a função $\bigcirc$ ao resultado da operação $\star$ , então de alguma forma eu precisarei representar este resultado na máquina. Mas, não é precisamente esse problema que estamos tentando resolver?! A circularidade é apenas aparente, entretanto. Com efeito, é possível mostrar que, se $x$ e $y$ estão armazenados em registradores com $p$ dígitos de precisão, o valor exato da operação $x \star y$ pode ser armazenado em um intermediário com $p + 3$ dígitos – para detalhes a respeito da técnica e do porquê de seu funcionamento, recomendo [1]. Procede-se então o arredondamento adaptando o resultado ao registrador ordinário de $p$ dígitos.

Agora que nós definimos uma aritmética de ponto flutuante abstrata, é necessário concretizá-la. No contexto, isto significa que nós precisamos definir o comportamento da função $\bigcirc$ , o que pode ser feito de 4 formas diferentes como descrito abaixo.

Round to zero

Este é o método de arredondamento mais simples que existe. Formalmente, a função round to zero $\square_z : \mathbb{R} \rightarrow \mathbb{F}$ é definida como

$\square_z(x) = sign(x) \; max \; \lbrace{y \in \mathbb{F} : y \le |x|\rbrace}$

onde $sign(x)$ é o que o próprio nome diz. Este arredondamento é o mais facilmente implementável. De fato, se $\mathbb{F}$ tem precisão $p$ , é suficiente descartar os dígitos além da posição $p-1$ . Por causa disso, este método também é conhecido como truncagem.

Round toward minus infinity

Este é um dos modos de arredondamento pertencentes à classe dos direcionados. Um arredondamento direcionado deve resultar no ponto flutuante imediatamente anterior (ou posterior) ao número real correspondente. Formalmente, ele deve satisfazer uma das seguintes propriedades:

$x \in \mathbb{R} \Rightarrow \bigcirc(x) \le x$
$x \in \mathbb{R} \Rightarrow \bigcirc(x) \ge x$

Particularmente, o round toward minus infinity obedece a primeira propriedade e é definido por:

$\bigtriangledown(x) = max \; \lbrace{y \in \mathbb{F} : y \le x\rbrace}$

Note as semelhanças entre este modo e o round to zero. Apesar de parecerem idênticos, existe uma diferença sutil. Por exemplo, se nós tivermos um sistema de ponto flutuante com 2 dígitos de precisão e tentarmos arredondar -1.234 usando o round to zero, nós obteremos -1.23. Por outro lado, se usarmos o round to minus infinity, o resultado será -1.24. De forma geral, é verdade que $\square_z(x) = sign(x) \; \bigtriangledown(|x|)$ .

Round toward plus infinity

O round toward plus infinity é o análogo simétrico do round toward minus infinity. Portanto, este arredondamento satisfaz a segunda das duas propriedades citadas acima e é definido como:

$\bigtriangleup(x) = min \; \lbrace{y \in \mathbb{F} : y \ge x\rbrace}$

Note uma propriedade interessante dos arredondamentos direcionados: do que foi dito, nós podemos concluir que o intervalo $[\bigtriangledown(x), \bigtriangleup(x)]$ é o menor cujos extremos são números em $\mathbb{F}$ e contém $x$ . Então, o diâmetro deste intervalo é o erro máximo cometido pela função arredondamento. Esta propriedade é usada na Aritmética Intervalar de Precisão Máxima.

Round to nearest

Uma metodologia mais inteligente de arredondamento é considerar o ponto médio $\mu = \frac{1}{2}(\bigtriangledown(x)+\bigtriangleup(x))$ do intervalo $[\bigtriangledown(x), \bigtriangleup(x)]$ . Desta forma, se $x < \mu$ , retorne $\bigtriangledown(x)$ , caso contrário, se $x > \mu$ , então retorne $\bigtriangleup(x)$ . Se $x = \mu$ , nós temos que decidir o que fazer. Esta decisão cria diferentes variantes do arredondamento. O round to nearest implementa exatamente esse método. Em geral, o erro cometido por este modo é menor que o dos outros.

Bem, por agora é isso :). Desculpe se tudo isso foi maçante. De fato, esta é uma abordagem muito teórica para um problema muito prático, mas eu gosto de alguma teoria antes das questões práticas. Isso ajuda a dar fundamentação a possíveis soluções. Em algum ponto no futuro irei escrever sobre o erro de arredondamento considerando o padrão de ponto flutuante IEEE 754 de um ponto de vista prático. Então, até a próxima… :)

Posted in Floating-Point Numbers, Scientific Computing

Rafael Barreto

Random thoughts about technical stuff

Blog Archives

Controlling FPU rounding modes with Python

Functions

Constants

Functions

Constants

Aritmética de ponto flutuante

Round to zero

Round toward minus infinity

Round toward plus infinity

Round to nearest

Categories

Recent Posts

Archives

Buzz on Twitter