¿Fuga de memoria en gcc 4.8.1 al usar thread_local?

Valgrind está reportando bloques filtrados, aparentemente uno por hilo, en el siguiente código:

#include <iostream>
#include <thread>
#include <mutex>
#include <list>
#include <chrono>

std::mutex cout_mutex;

struct Foo
{
    Foo() 
    { 
        std::lock_guard<std::mutex> lock( cout_mutex );
        std::cout << __PRETTY_FUNCTION__ << '\n'; 
    }

    ~Foo() 
    { 
        std::lock_guard<std::mutex> lock( cout_mutex );
        std::cout << __PRETTY_FUNCTION__ << '\n'; 
    }

    void 
    hello_world() 
    { 
        std::lock_guard<std::mutex> lock( cout_mutex );
        std::cout << __PRETTY_FUNCTION__ << '\n'; 
    }
};

void
hello_world_thread()
{
    thread_local Foo foo;

    // must access, or the thread local variable may not be instantiated
    foo.hello_world();

    // keep the thread around momentarily
    std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
}

int main()
{
    for ( int i = 0; i < 100; ++i )
    {
        std::list<std::thread> threads;

        for ( int j = 0; j < 10; ++j )
        {
            std::thread thread( hello_world_thread );
            threads.push_back( std::move( thread ) );
        }

        while ( ! threads.empty() )
        {
            threads.front().join();
            threads.pop_front();
        }
    }
}

Versión del compilador:

$ g++ --version
g++ (GCC) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Opciones de construcción de GCC:

--enable-shared
--enable-threads=posix
--enable-__cxa_atexit
--enable-clocale=gnu
--enable-cxx-flags='-fno-omit-frame-pointer -g3'
--enable-languages=c,c++
--enable-libstdcxx-time=rt
--enable-checking=release
--enable-build-with-cxx
--disable-werror
--disable-multilib
--disable-bootstrap
--with-system-zlib

Opciones de compilación del programa:

g++ -std=gnu++11 -Og -g3 -Wall -Wextra -fno-omit-frame-pointer thread_local.cc

versión valgrind:

$ valgrind --version
valgrind-3.8.1

Opciones de Valgrind:

valgrind --leak-check=full --verbose ./a.out > /dev/null

Final de cola de salida de valgrind:

==1786== HEAP SUMMARY:
==1786==     in use at exit: 24,000 bytes in 1,000 blocks
==1786==   total heap usage: 3,604 allocs, 2,604 frees, 287,616 bytes allocated
==1786== 
==1786== Searching for pointers to 1,000 not-freed blocks
==1786== Checked 215,720 bytes
==1786== 
==1786== 24,000 bytes in 1,000 blocks are definitely lost in loss record 1 of 1
==1786==    at 0x4C29969: operator new(unsigned long, std::nothrow_t const&) (vg_replace_malloc.c:329)
==1786==    by 0x4E8E53E: __cxa_thread_atexit (atexit_thread.cc:119)
==1786==    by 0x401036: hello_world_thread() (thread_local.cc:34)
==1786==    by 0x401416: std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (functional:1732)
==1786==    by 0x4EE4830: execute_native_thread_routine (thread.cc:84)
==1786==    by 0x5A10E99: start_thread (pthread_create.c:308)
==1786==    by 0x573DCCC: clone (clone.S:112)
==1786== 
==1786== LEAK SUMMARY:
==1786==    definitely lost: 24,000 bytes in 1,000 blocks
==1786==    indirectly lost: 0 bytes in 0 blocks
==1786==      possibly lost: 0 bytes in 0 blocks
==1786==    still reachable: 0 bytes in 0 blocks
==1786==         suppressed: 0 bytes in 0 blocks
==1786== 
==1786== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)
--1786-- 
--1786-- used_suppression:      2 dl-hack3-cond-1
==1786== 
==1786== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)

Los constructores y destructores se ejecutaron una vez para cada hilo:

$ ./a.out | grep 'Foo::Foo' | wc -l
1000

$ ./a.out | grep hello_world | wc -l
1000

$ ./a.out | grep 'Foo::~Foo' | wc -l
1000

Notas:

Si cambia el número de subprocesos creados, el número de bloques filtrados coincide con el número de subprocesos.El código está estructurado de tal manera quepodría Permitir la reutilización de recursos (es decir, el bloque filtrado) si GCC se implementó.Desde el valtrind stacktrace, thread_local.cc:34 es la línea:thread_local Foo foo;Debido a la llamada sleep_for (), una ejecución del programa tarda unos 10 segundos aproximadamente.

¿Alguna idea de si esta pérdida de memoria está en GCC, es el resultado de mis opciones de configuración o hay algún error en mi programa?

Respuestas a la pregunta(2)

Su respuesta a la pregunta