[QUESTION]: How can I restore the program after a CUDA kernel error? #1268

Open
delverOne25 opened this issue Aug 12, 2024 · 1 comment

delverOne25 commented Aug 12, 2024

Question

private static void test(Index1D d, ArrayView<int> view)
{
    view[-d] = 3 / d; /// error
}

public static unsafe void Main(string[] args)
{
    var ctx = Context.Create(c => c.AllAccelerators().EnableAlgorithms().Optimize(OptimizationLevel.O2).Inlining(InliningMode.Aggressive));
    var a = ctx.CreateCudaAccelerator(0);
    var ttt = a.LoadAutoGroupedKernel<Index1D, ArrayView<int>>(test);
    var ccc = a.Allocate1D<int>(10);
    try
    {
        // Launch 1000 threads against a 10-element buffer, then wait for the result.
        ttt(a.DefaultStream, 1000, ccc.View);
        a.DefaultStream.Synchronize();
    }
    catch (AcceleratorException e)
    {
        // Attempt to recover by destroying the context and creating a new accelerator.
        CudaAPI.CurrentAPI.DestroyContext((a as CudaAccelerator).NativePtr);
        a = ctx.CreateCudaAccelerator(0); /// ILGPU.Runtime.Cuda.CudaException: "an illegal memory access was encountered"

        a.Dispose();
        return;
    }
}

Environment

  • ILGPU version: [e.g., 1.5.1]
  • .NET version: [e.g., .NET 8]
  • Operating system: [e.g., Windows 10]
  • Hardware (if GPU-related): [e.g., NVIDIA GeForce GTX 1080]

Additional context

No response

hez2010 commented Oct 16, 2024

I don't think it's possible. In short, CUDA devices don't support any form of exception handling. If an error happens on the device side, there's no way to recover from it.
The only solution is to prevent the error before the kernel runs on the device.
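
For illustration, here is a minimal sketch of what "preventing the error" could look like for the kernel in the question. It is not an official ILGPU recipe. The names `GuardedKernelExample` and `GuardedKernel` are hypothetical, and the guard conditions are my assumption about what the kernel was meant to do: skip any thread whose index is outside the buffer or would cause a division by zero.

```csharp
using System;
using ILGPU;
using ILGPU.Runtime;

public static class GuardedKernelExample
{
    // Hypothetical guarded variant of the kernel from the question.
    // Threads with an out-of-range index (or a zero divisor) return early
    // instead of touching memory they do not own.
    private static void GuardedKernel(Index1D index, ArrayView<int> view)
    {
        int i = index;
        if (i <= 0 || i >= view.IntLength)
            return;

        view[i] = 3 / i;
    }

    public static void Main()
    {
        using var context = Context.CreateDefault();

        // Assumption: any accelerator is fine for this sketch.
        // Passing preferCPU: true runs the kernel on the CPU accelerator instead.
        using var accelerator = context
            .GetPreferredDevice(preferCPU: false)
            .CreateAccelerator(context);

        // Allocate from a managed array so the buffer starts out zeroed.
        using var buffer = accelerator.Allocate1D(new int[10]);

        var kernel = accelerator
            .LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>>(GuardedKernel);

        // Same oversized launch as in the question (1000 threads, 10 elements).
        // The guard inside the kernel keeps the extra threads harmless.
        kernel(1000, buffer.View);
        accelerator.Synchronize();

        int[] result = buffer.GetAsArray1D();
        Console.WriteLine(string.Join(", ", result));
    }
}
```

Sizing the launch to the buffer (for example `kernel((int)buffer.Length, buffer.View)`) removes most of the wasted threads, but keeping the check inside the kernel still protects against a mismatched extent.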
