Amelioration of GPU speed using pycuda functions

elafrit
Hi,

I wonder whether I can improve the speed of PyCUDA code by changing the maximum number of threads in gpuarray.py.
Also, I don't understand what really happens when I use the gpuarray methods to multiply a matrix by a scalar. Is the scalar sent to the GPU for each element of the matrix, or only once? And is it sent as a scalar or as a gpuarray?

Thanks

Re: Amelioration of GPU speed using pycuda functions

Andreas Kloeckner
Dear Marie,

On Sun, 13 Mar 2011 16:04:06 -0700 (PDT), elafrit <[hidden email]> wrote:
> I wonder if I can ameliorate the pycuda code by editing the number of maximum
> threads in the gpuarray.py ?

The only way to find out is to try. If you do find a way to improve the
speed, please do let the list know.
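One way to "try" is to time the scalar multiply before and after changing the thread count. The following is only a rough sketch: the function names, array size, and repeat count are made up for illustration, and it assumes a working PyCUDA installation with a CUDA device. It uses CUDA events to time `x_gpu * 2.0` and converts the result into an approximate memory bandwidth figure.

```python
def effective_bandwidth_gb_s(n, itemsize, ms):
    """Approximate bandwidth in GB/s for one read and one write per element."""
    bytes_moved = 2 * n * itemsize
    return bytes_moved / 1e9 / (ms / 1e3)


def time_scalar_multiply(n=1 << 22, repeats=20):
    """Average time in milliseconds of `x_gpu * 2.0` (hypothetical helper)."""
    # Imports are kept inside the function so the module loads without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray

    x_gpu = gpuarray.to_gpu(np.random.rand(n).astype(np.float32))
    x_gpu * 2.0  # warm-up so kernel compilation is not included in the timing

    start, end = drv.Event(), drv.Event()
    start.record()
    for _ in range(repeats):
        z_gpu = x_gpu * 2.0
    end.record()
    end.synchronize()
    return start.time_till(end) / repeats


if __name__ == "__main__":
    n = 1 << 22
    ms = time_scalar_multiply(n)
    print("%.3f ms, about %.1f GB/s" % (ms, effective_bandwidth_gb_s(n, 4, ms)))
```

An elementwise scale is usually limited by memory bandwidth, so comparing the reported figure against the device's peak bandwidth gives a hint of how much room is left.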

I imagine that a better approach might be to try to introduce some
instruction-level parallelism (or at least to create some wiggle room for
the instruction scheduler in ptxas). That, unfortunately, is somewhat difficult.
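As a rough illustration of what instruction-level parallelism could look like here, the sketch below has each thread handle four elements per loop trip, so that four independent loads and multiplies are in flight at once. This is not how PyCUDA's own kernels are written; the kernel name `scale_ilp`, the helper names, the block size, and the unroll factor are all assumptions made for the example, and whether it is actually faster has to be measured.

```python
import math

ILP = 4        # elements per thread per loop trip; must match the kernel body
THREADS = 256  # threads per block (illustrative choice)

KERNEL_SRC = r"""
__global__ void scale_ilp(float a, const float *x, float *z, unsigned n)
{
    unsigned stride = gridDim.x * blockDim.x;
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;

    // Main loop: four independent loads, then four independent stores.
    for (; i + 3 * stride < n; i += 4 * stride) {
        float x0 = x[i];
        float x1 = x[i + stride];
        float x2 = x[i + 2 * stride];
        float x3 = x[i + 3 * stride];
        z[i]              = a * x0;
        z[i + stride]     = a * x1;
        z[i + 2 * stride] = a * x2;
        z[i + 3 * stride] = a * x3;
    }
    // Remainder loop: one element per trip.
    for (; i < n; i += stride)
        z[i] = a * x[i];
}
"""


def grid_size(n, threads=THREADS, ilp=ILP, max_blocks=1024):
    """Number of blocks so that each thread gets roughly `ilp` elements."""
    return max(1, min(max_blocks, math.ceil(n / (threads * ilp))))


def scale_ilp(values, a):
    """Multiply `values` by the scalar `a` on the GPU (hypothetical helper)."""
    # Imports are kept inside the function so the module loads without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    func = SourceModule(KERNEL_SRC).get_function("scale_ilp")
    x_gpu = gpuarray.to_gpu(np.asarray(values, dtype=np.float32))
    z_gpu = gpuarray.empty_like(x_gpu)
    n = x_gpu.size
    func(np.float32(a), x_gpu.gpudata, z_gpu.gpudata, np.uint32(n),
         block=(THREADS, 1, 1), grid=(grid_size(n), 1))
    return z_gpu.get()
```

The strided layout keeps neighbouring threads reading neighbouring addresses, so the loads within each of the four groups stay coalesced.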

> And I can't understand what's really happening when I use the methods of
> gpuarray to multiply a matrix with a scalar ? Is the scalar sent to the GPU
> for each element of the matrix or it's sent only the first time ? And is it
> sent as scalar or as gpuarray ?

CPU scalars are sent as kernel parameters, which is a fairly efficient
way of broadcasting to all thread blocks.
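To make the "kernel parameter" point concrete, here is a small sketch using `ElementwiseKernel` from `pycuda.elementwise`. It is a hand-written stand-in, not the code gpuarray itself runs; the kernel name and the helper `scale_on_gpu` are made up for the example. The scalar `a` appears in the argument list as a plain `float`, not as a pointer, so it is passed once with the launch and every thread reads the same value.

```python
# Argument list and per-element operation for the elementwise kernel.
# `a` is a by-value scalar parameter; `x` and `z` are device pointers.
SCALE_ARGS = "float a, float *x, float *z"
SCALE_OP = "z[i] = a * x[i]"


def scale_on_gpu(values, a):
    """Return `a * values`, computed on the GPU (hypothetical helper)."""
    # Imports are kept inside the function so the module loads without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.gpuarray as gpuarray
    from pycuda.elementwise import ElementwiseKernel

    scale = ElementwiseKernel(SCALE_ARGS, SCALE_OP, "scale_by_scalar")

    x_gpu = gpuarray.to_gpu(np.asarray(values, dtype=np.float32))
    z_gpu = gpuarray.empty_like(x_gpu)

    # The host scalar goes in as a launch argument; only the array data
    # was copied to device memory beforehand.
    scale(np.float32(a), x_gpu, z_gpu)
    return z_gpu.get()


if __name__ == "__main__":
    print(scale_on_gpu([1.0, 2.0, 3.0], 2.0))
```

No separate host-to-device copy is needed for the scalar, and it is not wrapped in a gpuarray.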

HTH,
Andreas