Maneira eficiente de arredondar números de precisão dupla para uma precisão menor dada em número de bits

Question

Jan 11, 2013, 08:41 PM

Maneira eficiente de arredondar números de precisão dupla para uma precisão menor dada em número de bits

Em C #, eu quero arredondar o dobro para uma precisão menor para que eu possa armazená-los em baldes de tamanho variável em um array associativo. Ao contrário do arredondamento usual, quero arredondar para um número de bits significativos. Assim, grandes números irão mudar em termos absolutos muito mais do que pequenos números, mas tenderão a mudar o mesmo proporcionalmente. Então, se eu quiser arredondar para 10 dígitos binários, eu acho os dez bits mais significativos, e zerarei todos os bits menores, possivelmente adicionando um número pequeno para arredondar.

Eu prefiro que números "intermediários" sejam arredondados.

Se fosse um tipo inteiro, aqui seria um algoritmo possível:

  1. Find: zero-based index of the most significant binary digit set H.
  2. Compute: B = H - P, 
       where P is the number of significant digits of precision to round
       and B is the binary digit to start rounding, where B = 0 is the ones place, 
       B = 1 is the twos place, etc. 
  3. Add: x = x + 2^B 
       This will force a carry if necessary (we round halfway values up).
  4. Zero out: x = x mod 2^(B+1). 
       This clears the B place and all lower digits.

O problema é encontrar uma maneira eficiente de encontrar o maior conjunto de bits. Se eu estivesse usando números inteiros, existem hacks legais para encontrar o MSB. Eu não quero chamar Round (Log2 (x)) se eu puder ajudar. Esta função será chamada muitos milhões de vezes.

Nota: Eu li esta pergunta SO:

O que é uma boa maneira de arredondar valores de precisão dupla para uma precisão (um pouco) menor?

Ele funciona para C ++. Eu estou usando c #.

ATUALIZAR:

Este é o código (modificado do que o respondente forneceu) como eu estou usando:

/// <summary>
/// Round numbers to a specified number of significant binary digits.
/// 
/// For example, to 3 places, numbers from zero to seven are unchanged, because they only require 3 binary digits,
/// but larger numbers lose precision:
/// 
///      8    1000 => 1000   8
///      9    1001 => 1010  10
///     10    1010 => 1010  10
///     11    1011 => 1100  12
///     12    1100 => 1100  12
///     13    1101 => 1110  14
///     14    1110 => 1110  14
///     15    1111 =>10000  16
///     16   10000 =>10000  16
///     
/// This is different from rounding in that we are specifying the place where rounding occurs as the distance to the right
/// in binary digits from the highest bit set, not the distance to the left from the zero bit.
/// </summary>
/// <param name="d">Number to be rounded.</param>
/// <param name="digits">Number of binary digits of precision to preserve. </param>
public static double AdjustPrecision(this double d, int digits)
{
    // TODO: Not sure if this will work for both normalized and denormalized doubles. Needs more research.
    var shift = 53 - digits; // IEEE 754 doubles have 53 bits of significand, but one bit is "implied" and not stored.
    ulong significandMask = (0xffffffffffffffffUL >> shift) << shift;
    var local_d = d;
    unsafe
    {
        // double -> fixed point (sorta)
        ulong toLong = *(ulong*)(&local_d);
        // mask off your least-sig bits
        var modLong = toLong & significandMask;
        // fixed point -> float (sorta)
        local_d = *(double*)(&modLong);
    }
    return local_d;
}

ATUALIZAÇÃO 2: Algoritmo de Dekker

Eu deduzi isso do algoritmo de Dekker, graças ao outro entrevistado. Ele arredonda para o valor mais próximo, em vez de truncar como o código acima, e usa apenas código seguro:

private static double[] PowersOfTwoPlusOne;

static NumericalAlgorithms()
{
    PowersOfTwoPlusOne = new double[54];
    for (var i = 0; i < PowersOfTwoPlusOne.Length; i++)
    {
        if (i == 0)
            PowersOfTwoPlusOne[i] = 1; // Special case.
        else
        {
            long two_to_i_plus_one = (1L << i) + 1L;
            PowersOfTwoPlusOne[i] = (double)two_to_i_plus_one;
        }
    }
}

public static double AdjustPrecisionSafely(this double d, int digits)
{
    double t = d * PowersOfTwoPlusOne[53 - digits];
    double adjusted = t - (t - d);
    return adjusted;
}

ATUALIZAÇÃO 2: TIMINGS

Eu fiz um teste e descobri que o algoritmo de Dekker é melhor que TWICE tão rápido!

Número de chamadas em teste: 100.000.000
Tempo Inseguro = 1.922 (seg)
Tempo Seguro = 0.799 (seg)