[PyCUDA] GPUArray class gives negative "s" with large size gpuarray

takayanagi.tetsuya
Hi all,
I have developed a Lattice Boltzmann Method code with PyCUDA at our company for simulating air flow, so I need to handle large gpuarrays indexed as arr[velocity][Z][Y][X] for 3-dimensional fluid flow.
My code runs correctly for relatively small gpuarrays such as shape (27, 300, 300, 300).

But changing the gpuarray shape from (27, 300, 300, 300) to (27, 450, 450, 450) gives the following error:

Error message:
OverflowError: can't convert negative int to unsigned

To debug this, I am testing the following simple code, which also raises the error if I specify a large array shape such as (27, 450, 450, 450).

#
# sample code start
#

import math
import numpy as np
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule
import pycuda.autoinit

module = SourceModule("""
__global__ void plus_one_3d(int nx, int ny, int nz, int nv, float *arr){
    const int x = threadIdx.x + blockDim.x * blockIdx.x;
    const int y = threadIdx.y + blockDim.y * blockIdx.y;
    const int z = threadIdx.z + blockDim.z * blockIdx.z;
    const int nxyz = nx * ny * nz;             // cells per velocity component
    const int ijk = nx * ny * z + nx * y + x;  // linear cell index
    if (x < nx && y < ny && z < nz){
        for (int c = 0; c < nv; c++){
            arr[nxyz * c + ijk] += 1.0f;       // int indexing; overflows for very large arrays
        }
    }
}
""")

plus_one = module.get_function("plus_one_3d")

num_x, num_y, num_z = np.int32(450), np.int32(450), np.int32(450)
nv = np.int32(27)
arr_gpu = gpuarray.zeros([nv, num_z, num_y, num_x], dtype=np.float32)

threads_per_block = (6, 6, 6)
block_x = math.ceil(num_x / threads_per_block[0])
block_y = math.ceil(num_y / threads_per_block[1])
block_z = math.ceil(num_z / threads_per_block[2])
blocks_per_grid = (block_x, block_y, block_z)

plus_one(num_x, num_y, num_z, nv, arr_gpu, block=threads_per_block, grid=blocks_per_grid)

arr = arr_gpu.get()

print("result :", arr)

#
# sample code end
#

Debugging with PyCharm shows that the variable "s" of the GPUArray class becomes negative when I specify shape (27, 450, 450, 450) for the gpuarray, but "s" is calculated correctly when I specify shape (27, 300, 300, 300). I think the data type of "s" is somehow wrong.

Any advice?

Best wishes,
t-tetsuya

[PyCUDA] Re: GPUArray class gives negative "s" with large size gpuarray

Andreas Kloeckner
[hidden email] writes:

> [...]
> num_x, num_y, num_z = np.int32(450), np.int32(450), np.int32(450)
Your shape can't consist of int32's. Convert them to (Python) int before
using them in the array shape.

Andreas
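
A minimal sketch of that conversion, reusing the values from the post (note the full array takes roughly 9.8 GB of device memory, so this assumes a suitably large GPU):

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

# Sized scalars, as in the original post; fine as kernel arguments.
num_x, num_y, num_z = np.int32(450), np.int32(450), np.int32(450)
nv = np.int32(27)

# Build the shape from plain Python ints, which are unbounded,
# so the element count 27 * 450**3 = 2460375000 cannot wrap negative.
shape = (int(nv), int(num_z), int(num_y), int(num_x))
arr_gpu = gpuarray.zeros(shape, dtype=np.float32)
print(arr_gpu.size)  # 2460375000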


[PyCUDA] Re: GPUArray class gives negative "s" with large size gpuarray

takayanagi.tetsuya
Thank you for your reply.
Actually, I had not checked the concrete maximum value of np.int32. The fact that np.iinfo(np.int32).max / (27 * 450 * 450 * 450) is smaller than 1 tells me the issue comes from overflow: 27 * 450**3 = 2,460,375,000, which exceeds np.iinfo(np.int32).max = 2,147,483,647 (see the sketch below).
Changing "np.int32" to "np.int64" and "int" to "long long" fixes the bug. I confirmed that the following code runs correctly for a numpy array of shape (27, 450, 450, 450).
Thank you very much.
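
A minimal sketch of that check in plain NumPy (no GPU required; recent NumPy versions also emit a RuntimeWarning on the overflowing multiply):

import numpy as np

count = 27 * 450 * 450 * 450           # 2460375000 as an unbounded Python int
print(np.iinfo(np.int32).max)          # 2147483647
print(np.iinfo(np.int32).max / count)  # ~0.873 < 1, so the count does not fit in int32

# Multiplying int32 scalars wraps around instead; this is what made "s" negative:
s = np.int32(27)
for d in (450, 450, 450):
    s = s * np.int32(d)                # stays int32 and wraps past 2**31 - 1
print(s)                               # -1834592296

# 64-bit integers have plenty of headroom:
print(np.int64(27) * 450 * 450 * 450)  # 2460375000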

Best regards,
t-tetsuya

#
# sample code start
#

import math
import numpy as np
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule
import pycuda.autoinit
module = SourceModule("""
__global__ void plus_one_3d(long long nx, long long ny, long long nz, long long nv, float *arr){
    const long long x = threadIdx.x + blockDim.x * blockIdx.x;
    const long long y = threadIdx.y + blockDim.y * blockIdx.y;
    const long long z = threadIdx.z + blockDim.z * blockIdx.z;
    const long long nxyz = nx * ny * nz;             // cells per velocity component
    const long long ijk = nx * ny * z + nx * y + x;  // linear cell index
    if (x < nx && y < ny && z < nz){
        for (long long c = 0; c < nv; c++){
            const long long idx = nxyz * c + ijk;    // 64-bit index cannot wrap here
            arr[idx] += 1.0f;
        }
    }
}
""")

plus_one = module.get_function("plus_one_3d")

num_x, num_y, num_z = np.int64(450), np.int64(450), np.int64(450)
nv = np.int64(27)
arr_gpu = gpuarray.zeros([nv, num_z, num_y, num_x], dtype=np.float32)

threads_per_block = (6, 6, 6)
block_x = math.ceil(num_x / threads_per_block[0])
block_y = math.ceil(num_y / threads_per_block[1])
block_z = math.ceil(num_z / threads_per_block[2])
blocks_per_grid = (block_x, block_y, block_z)

plus_one(num_x, num_y, num_z, nv, arr_gpu, block=threads_per_block, grid=blocks_per_grid)

arr = arr_gpu.get()

print("result :", arr)

#
# sample code end
#
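
As a quick sanity check after the run (a sketch; the host copy of the array is also about 9.8 GB, so this assumes enough RAM), every element should have been incremented exactly once:

assert arr.shape == (27, 450, 450, 450)
assert np.all(arr == 1.0)  # each (c, z, y, x) element was touched exactly once
print("all elements incremented exactly once")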