numba guvectorize target='parallel' slower than target='cpu'

Question

Feb 11, 2016, 10:45 PM

python numexpr numba parallel-processing

I have been trying to optimize a piece of Python code that involves large multi-dimensional array calculations, and I am getting counterintuitive results with Numba. I am running on a MacBook Pro (mid 2015, 2.5 GHz quad-core i7), OS X 10.10.5, Python 2.7.11. Consider the following:

 import numpy as np
 from numba import jit, vectorize, guvectorize
 import numexpr as ne
 import timeit

 def add_two_2ds_naive(A,B,res):
     for i in range(A.shape[0]):
         for j in range(B.shape[1]):
             res[i,j] = A[i,j]+B[i,j]

 @jit
 def add_two_2ds_jit(A,B,res):
     for i in range(A.shape[0]):
         for j in range(B.shape[1]):
             res[i,j] = A[i,j]+B[i,j]

 @guvectorize(['float64[:,:],float64[:,:],float64[:,:]'],
    '(n,m),(n,m)->(n,m)',target='cpu')
 def add_two_2ds_cpu(A,B,res):
     for i in range(A.shape[0]):
         for j in range(B.shape[1]):
             res[i,j] = A[i,j]+B[i,j]

 @guvectorize(['(float64[:,:],float64[:,:],float64[:,:])'],
    '(n,m),(n,m)->(n,m)',target='parallel')
 def add_two_2ds_parallel(A,B,res):
     for i in range(A.shape[0]):
         for j in range(B.shape[1]):
             res[i,j] = A[i,j]+B[i,j]

 def add_two_2ds_numexpr(A,B,res):
     # write into the caller's array; a plain `res = ...` would only rebind the local name
     ne.evaluate('A+B', out=res)

 if __name__=="__main__":
     np.random.seed(69)
     A = np.random.rand(10000,100)
     B = np.random.rand(10000,100)
     res = np.zeros((10000,100))

I can now run timeit on the various functions:

%timeit add_two_2ds_jit(A,B,res)
1000 loops, best of 3: 1.16 ms per loop

%timeit add_two_2ds_cpu(A,B,res)
1000 loops, best of 3: 1.19 ms per loop

%timeit add_two_2ds_parallel(A,B,res)
100 loops, best of 3: 6.9 ms per loop

%timeit add_two_2ds_numexpr(A,B,res)
1000 loops, best of 3: 1.62 ms per loop
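
The timings above use IPython's %timeit magic. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The helper name best_of and the stand-in function add_lists are hypothetical and only illustrate the measurement pattern; in practice one would pass add_two_2ds_jit and the other variants with A, B and res.

```python
import timeit

def best_of(func, args=(), repeat=3, number=100):
    """Return the best per-call time in seconds over `repeat` runs."""
    # timeit.repeat returns one total time (for `number` calls) per run
    totals = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(totals) / number

def add_lists(a, b, res):
    # hypothetical stand-in for the array functions in the question
    for i in range(len(a)):
        res[i] = a[i] + b[i]

a = [1.0] * 1000
b = [2.0] * 1000
res = [0.0] * 1000

t = best_of(add_lists, (a, b, res))
print("best of 3: %.3f ms per loop" % (t * 1e3))
```

For compiled functions it may be worth calling each one once before timing, so that compilation time is not included in the measurement.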

It seems that 'parallel' is not even using the majority of a single core: top shows Python at about 40% CPU for 'parallel', about 100% for 'cpu', and about 300% for numexpr.
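
One thing that may be worth checking here is how many independent kernel calls the layout string leaves to distribute. Under NumPy's generalized-ufunc rule, the trailing dimensions named in the layout are "core" dimensions handled inside one call, and only the remaining leading dimensions are looped over. The sketch below models just that shape rule in plain Python. It does not call Numba, the helper names are hypothetical, and whether a '(m),(m)->(m)' layout actually runs faster with target='parallel' on a given machine is an assumption to verify by benchmarking.

```python
def split_gufunc_shape(shape, core_ndim):
    """Split an operand shape into (loop dims, core dims)."""
    if core_ndim > len(shape):
        raise ValueError("operand has fewer dimensions than the core signature")
    cut = len(shape) - core_ndim
    return tuple(shape[:cut]), tuple(shape[cut:])

def n_kernel_calls(shape, core_ndim):
    """Number of times the kernel body runs for this operand shape."""
    loop_dims, _ = split_gufunc_shape(shape, core_ndim)
    n = 1
    for d in loop_dims:
        n *= d
    return n

shape = (10000, 100)  # same shape as A and B in the question

# layout '(n,m),(n,m)->(n,m)': both dimensions are core dimensions
calls_2d = n_kernel_calls(shape, 2)

# layout '(m),(m)->(m)': only the last dimension is a core dimension
calls_1d = n_kernel_calls(shape, 1)

print(calls_2d, calls_1d)
```

With the layout used in the question, the whole 10000 x 100 array is a single kernel call, so there is only one unit of work to hand out. A one-dimensional core layout would leave 10000 row-wise calls that could, in principle, be spread across threads.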

