CUDA shared memory array variable

Question

Feb 08, 2012, 05:36 AM

I am trying to declare a variable for matrix multiplication as follows:

__shared__ float As[BLOCK_SIZE][BLOCK_SIZE];

I want the user to be able to enter the size of the matrices to multiply, which means BLOCK_SIZE can no longer be a compile-time constant. When I made it a variable, I got the compiler error "error: constant value is not known". I looked into it, and it seems similar to this thread. So I tried:

__shared__ int buf [];

But then I get: "error: incomplete type is not allowed"
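For context, here is a minimal sketch of how an unsized shared array is normally declared in CUDA: with `extern`, and with the size in bytes given as the third kernel launch parameter. The kernel `reverseInBlock` and the helper `reverseOnDevice` are hypothetical names made up for this illustration, and the sketch assumes a single block whose thread count equals the array length (at most 1024 elements).

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: reverses n ints inside a single block.
__global__ void reverseInBlock(int* data, int n)
{
    // Unsized array: `extern` defers the size to the kernel launch.
    extern __shared__ int buf[];

    int t = threadIdx.x;
    if (t < n) buf[t] = data[t];
    __syncthreads();                 // all loads finish before any read
    if (t < n) data[t] = buf[n - 1 - t];
}

// Hypothetical host helper: returns the reversed copy,
// or an empty vector if the input is empty or any CUDA call fails.
std::vector<int> reverseOnDevice(const std::vector<int>& in)
{
    int n = static_cast<int>(in.size());
    if (n == 0) return {};
    size_t bytes = n * sizeof(int);

    int* d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return {};
    if (cudaMemcpy(d, in.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);
        return {};
    }

    // Third launch parameter: bytes of dynamic shared memory per block.
    reverseInBlock<<<1, n, bytes>>>(d, n);
    bool ok = (cudaGetLastError() == cudaSuccess);

    std::vector<int> out(n);
    ok = ok && (cudaMemcpy(out.data(), d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess);
    cudaFree(d);
    return ok ? out : std::vector<int>{};
}
```

Without `extern`, the compiler has to know the array size at compile time, which is consistent with the two errors quoted above.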

Thanks, Dan

Update with code (I pretty much followed this guide and went through the CUDA guide): the block size is obtained by asking the user for the size of the matrix. They enter x and y. The block size is taken from x only, so for now x and y have to be the same.

__global__ void matrixMul(float* C, float* A, float* B, int wA, int wB, size_t block_size)
{
    // Block index
    int bx = blockIdx.x;
    int by = blockIdx.y;

    // Thread index
    int tx = threadIdx.x;
    int ty = threadIdx.y;

    // Index of the first sub-matrix of A processed 
    // by the block
    int aBegin = wA * block_size * by;

    // Index of the last sub-matrix of A processed 
    // by the block
    int aEnd   = aBegin + wA - 1;

    // Step size used to iterate through the 
    // sub-matrices of A
    int aStep  = block_size;

    // Index of the first sub-matrix of B processed 
    // by the block
    int bBegin = block_size * bx;

    // Step size used to iterate through the 
    // sub-matrices of B
    int bStep  = block_size * wB;
    float Csub=0;
    // Loop over all the sub-matrices of A and B
    // required to compute the block sub-matrix
    for (int a = aBegin, b = bBegin; a <= aEnd; a += aStep, b += bStep) 
    {
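        // Single dynamically sized shared memory buffer. Its size in
        // bytes is supplied as the third kernel launch parameter.
        // All unsized extern __shared__ arrays start at the same
        // address, so As and Bs are not declared separately: the
        // sub-matrix of A occupies the first block_size*block_size
        // floats and the sub-matrix of B the next block_size*block_size.
        extern __shared__ float smem[];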

        // Load the matrices from global memory
        // to shared memory; each thread loads
        // one element of each matrix
        smem[ty*block_size+tx] = A[a + wA * ty + tx];
        //cuPrintf("\n\nWhat are the memory locations?\n");
        //cuPrintf("The shared memory(A) is: %.2f\n",smem[ty*block_size+tx]);
        smem[block_size*block_size+ty*block_size+tx]  = B[b + wB * ty + tx];
        //cuPrintf("The shared memory(B) is: %.2f\n",smem[block_size*block_size+ty*block_size+tx]);
        // Synchronize to make sure the matrices 
        // are loaded
        __syncthreads();

        // Multiply the two matrices together;
        // each thread computes one element
        // of the block sub-matrix
        for (int k = 0; k < block_size; ++k)
        {

            Csub += smem[ty*block_size+k] * smem[block_size*block_size+k*block_size+tx] ;
            //cuPrintf("Csub is currently: %.2f\n",Csub);
        }
        //cuPrintf("\n\n\n");
        // Synchronize to make sure that the preceding
        // computation is done before loading two new
        // sub-matrices of A and B in the next iteration
        //cuPrintf("the results are csub: %.2f\n",Csub);
        __syncthreads();
    }
    // Write the block sub-matrix to device memory;
    // each thread writes one element
    int c = wB * block_size * by + block_size * bx;
    C[c + wB * ty + tx] = Csub;


}
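The kernel above only works if the launch reserves enough dynamic shared memory for both tiles. Below is a hedged sketch of what that could look like on the host side. The names `tileSharedBytes`, `matrixMulDyn` and `launchMatrixMul` are hypothetical, the kernel is a compact variant of the one above rather than the original, and the sketch assumes that the matrix dimensions are multiples of `block_size` and that `block_size` is at most 32 (so a block has at most 1024 threads).

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Bytes of dynamic shared memory for one tile of A plus one tile of B.
inline size_t tileSharedBytes(int block_size)
{
    return 2 * static_cast<size_t>(block_size) * block_size * sizeof(float);
}

// Compact variant of the tiled kernel (hypothetical name).
// A is hA x wA, B is wA x wB, C is hA x wB, all row-major.
// Assumes hA, wA and wB are multiples of block_size.
__global__ void matrixMulDyn(float* C, const float* A, const float* B,
                             int wA, int wB, int block_size)
{
    extern __shared__ float smem[];
    float* As = smem;                            // tile of A
    float* Bs = smem + block_size * block_size;  // tile of B

    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * block_size + ty;
    int col = blockIdx.x * block_size + tx;
    float Csub = 0.0f;

    for (int t = 0; t < wA / block_size; ++t) {
        // Each thread loads one element of each tile.
        As[ty * block_size + tx] = A[row * wA + t * block_size + tx];
        Bs[ty * block_size + tx] = B[(t * block_size + ty) * wB + col];
        __syncthreads();

        for (int k = 0; k < block_size; ++k)
            Csub += As[ty * block_size + k] * Bs[k * block_size + tx];
        __syncthreads();  // finish reading before the next tiles overwrite
    }
    C[row * wB + col] = Csub;
}

// Hypothetical launcher; d_A, d_B and d_C are already-allocated device pointers.
cudaError_t launchMatrixMul(float* d_C, const float* d_A, const float* d_B,
                            int hA, int wA, int wB, int block_size)
{
    dim3 threads(block_size, block_size);
    dim3 grid(wB / block_size, hA / block_size);
    // Third launch parameter: dynamic shared memory size in bytes.
    matrixMulDyn<<<grid, threads, tileSharedBytes(block_size)>>>(
        d_C, d_A, d_B, wA, wB, block_size);
    return cudaGetLastError();
}
```

With `block_size = 16`, for example, the launch requests 2 * 16 * 16 * 4 = 2048 bytes of shared memory per block.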
