[Fixed] Constant game crash during gameplay (GPU warranty replacement)

[EDIT] Final edit: after running out of troubleshooting ideas, I sent my pre-built PC to MSI under warranty and they replaced the GPU. It now runs perfectly and smoothly: ray tracing mode at a stable 175 FPS, and no crashes in 50 hours of playtime so far.

Dear support,
I am running the PC version through GOG, without mods, on my first playthrough.
I have a very good rig (3 months old) and FPS are excellent, but the game crashes randomly anywhere between 1 second and 5 minutes into normal gameplay.
It also sometimes crashes during the benchmark started from the display settings.

Rig is MSI MPG INFINITE X2 14NUG7-416AT:
- CPU = i7-14700KF (has factory OC)
- GPU = GeForce RTX 4080 16GB VENTUS 3X (latest driver 565.90 installed with DDU clean install)
- BIOS flashed with the most recent MSI BIOS in an attempt to solve the 14th-gen CPU issue
- Windows 11 Version 23H2 for x64
- RAM 32 GB DDR5 (no OC) (tested with MemTest86: 100% pass, not faulty)

I tried the following:
- Preset: Ray tracing high (avg. 160 FPS) -> crash
- Preset: High with resolution scaling OFF & ray tracing OFF (~170 FPS) -> crash
- Using default keybindings, since I read that changing some keybinds hardcoded in the engine can crash the game
- I cleared the following before new tests:
- - %USERPROFILE%\AppData\Local\CD Projekt Red
- - %USERPROFILE%\AppData\Local\REDEngine
- - %USERPROFILE%\AppData\Local\NVIDIA\GLCache
- - the DirectX shader cache (via Disk Cleanup)
- Repaired file integrity in GOG
- I disabled the GOG overlay and the Xbox Game Bar overlay
- I verified the game is not blocked by the antivirus
- Flashed the latest MSI BIOS for this computer model
- Clean DDU install of the latest stable NVIDIA GeForce driver for the RTX 4080
- Reset the CPU and GPU to default factory settings (instead of the MSI overclocked values)
- No overheating observed (max 70 °C during the benchmark on Ultra with ray tracing)
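For anyone repeating the cache-clearing step, here is a minimal Python sketch that removes the three AppData\Local folders listed above in one go. It assumes Windows, where the LOCALAPPDATA environment variable points at your AppData\Local folder. The function name `clear_caches` is just an illustration: it defaults to a dry run that only lists what it would delete, and it does not touch the DirectX shader cache, which still goes through Disk Cleanup. Close the game before running it.

```python
import os
import shutil
from pathlib import Path

# Cache folders from the list above, relative to %LOCALAPPDATA%
# (normally C:\Users\<you>\AppData\Local).
CACHE_DIRS = [
    Path("CD Projekt Red"),
    Path("REDEngine"),
    Path("NVIDIA") / "GLCache",
]


def clear_caches(local_appdata: Path, dry_run: bool = True) -> list[Path]:
    """Remove each cache folder that exists under local_appdata.

    Returns the folders that were removed (or would be, when dry_run is True).
    """
    found = []
    for rel in CACHE_DIRS:
        target = local_appdata / rel
        if target.is_dir():
            if not dry_run:
                # Delete the folder and everything inside it.
                shutil.rmtree(target)
            found.append(target)
    return found


def main() -> None:
    local = os.environ.get("LOCALAPPDATA")
    if not local:
        print("LOCALAPPDATA is not set; this sketch is meant for Windows.")
        return
    for path in clear_caches(Path(local), dry_run=True):
        print("would remove:", path)


if __name__ == "__main__":
    main()
```

Once the printed paths look right, call `clear_caches(..., dry_run=False)` to actually delete them.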

On other games:
- No crashes at all in Elden Ring in over 200 hours
- A similar number of crashes in Electronic Arts games (It Takes Two, Unravel 2, A Way Out): they also crash every 1 second to 5 minutes

The errors:
- Crashes randomly during gameplay or the benchmark, even when FPS is very high (170) and smooth
- From Windows Event Viewer:
The description for Event ID 153 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.
The following information was included with the event:
\Device\Video3
Error occurred on GPUID: 100
- From REDengine, the error is always the following, except that the step where the breadcrumbs change from Finished to In progress differs between crashes:
Error reason: Assert
Expression: <Unknown>
Message: Gpu Crash for unknown reasons! Callstack here is probably irrelevant. Check if Breadcrumbs or Aftermath logged anything useful.
File: E:\R6.Release\dev\src\common\gpuApi\src\dx12\gpuApiDX12Error.cpp(42)

Dumping Breadcrumbs
Breadcrumbs: Command list '[ 36292: 21] [00000229449AA190 CopyAsync] CopyQueueUpload': Finished [2]
Breadcrumbs: ----
Breadcrumbs: Command list '[ 36293: 22] [0000022977D019E0 Default] DepthPrepass': Finished [2]
Breadcrumbs: Command list '[ 36293: 23] [0000022955E21CA0 Default] GBufferEarly_Velocity_Weapon': Finished [2]
Breadcrumbs: Command list '[ 36293: 24] [000002296EAD4EE0 Default] GBuffer_Solid': Finished [2]
Breadcrumbs: Command list '[ 36293: 25] [000002296D46D0F0 Default] GBuffer_Solid': In progress [2]
Breadcrumbs: > COMMANDLIST_SCOPE [marker=1]
Breadcrumbs: > FinalFlushBarriers [marker=1]
Breadcrumbs: Command list '[ 36293: 26] [000002296D46E220 Default] GBuffer_Solid': Not started [2]
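To compare the breadcrumbs across several crash logs, a small Python sketch like the one below pulls out which command list was "In progress" when the GPU crashed. The regular expression is an assumption based only on the line format shown above, not on any documented REDengine log format, and the helper names are hypothetical. If the in-progress pass keeps changing from crash to crash, as described here, that is at least consistent with a general GPU or stability fault rather than a problem in one specific render pass.

```python
import re

# Pattern inferred from the breadcrumb lines above (an assumption, not a
# documented format), e.g.:
# Breadcrumbs: Command list '[ 36293: 25] [000002296D46D0F0 Default] GBuffer_Solid': In progress [2]
LINE_RE = re.compile(
    r"Breadcrumbs: Command list '\[\s*(?P<frame>\d+):\s*(?P<index>\d+)\] "
    r"\[(?P<addr>[0-9A-Fa-f]+) (?P<queue>\w+)\] (?P<name>[^']+)': "
    r"(?P<status>Finished|In progress|Not started)"
)


def parse_breadcrumbs(log_text: str) -> list[dict]:
    """Return one dict per 'Command list' line; other lines are ignored."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            entry = match.groupdict()
            entry["frame"] = int(entry["frame"])
            entry["index"] = int(entry["index"])
            entries.append(entry)
    return entries


def describe_crash_point(log_text: str) -> str:
    """Summarize the command list(s) still in progress when the GPU crashed."""
    active = [e for e in parse_breadcrumbs(log_text) if e["status"] == "In progress"]
    if not active:
        return "no command list was in progress"
    return "; ".join(
        f"{e['name']} (frame {e['frame']}, list {e['index']}, {e['queue']} queue)"
        for e in active
    )
```

Paste the "Dumping Breadcrumbs" section of each crash log into `describe_crash_point` and compare the results side by side.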

Any recommendations on what else I could try?

Thanks for any input.
 
"Message: Gpu Crash for unknown reasons!"

I've experienced this in 3 cases, and in all of them 1 or 2 components turned out to be faulty and were replaced. After the replacement it no longer occurred, so I would recommend having your rig examined.
 
Two things come to mind.

First, I see you didn't attempt to perform a clean re-install of your drivers using DDU. Try that.

Second, as @LeKill3rFou pointed out, there are some well-known issues with 13th/14th-gen CPUs. Read through the material, regardless of whether DDU fixes your issue. Read it.

I don't know how long you've had this computer, but the reality is that the voltage issues with 13th/14th-gen CPUs often caused errors that made it look like other parts were failing. Once the CPU is damaged, there is no going back. At least Intel has extended the warranty on those CPUs.

I also want to be clear: I am not saying that this is your issue with any degree of certainty. We know way too little about your situation to make that kind of call yet. You just need to be aware that it is a possibility.

Other than that, are you experiencing any kind of issues with other demanding games?

You may also want to roll back your GPU drivers if DDU doesn't fix it. Sometimes newer versions are less stable on certain systems, whereas older versions are perfectly fine.
 
Hello @LeKill3rFou and @GrimReaper801, following your recommendation I updated the BIOS. There was indeed a new BIOS that supposedly fixes issues related to 14th-gen CPUs.
Unfortunately, it did not fix the issue.

I also performed a clean install of the GPU driver with the help of DDU.

I also ran a full MemTest86 memory stress test: it passed 100%, with no RAM errors.

The rig is only 3 months old. As for other games:
- No crashes in over 200 hours of Elden Ring at max settings
- Tons of crashes in some Electronic Arts co-op games (It Takes Two, Unravel 2, A Way Out): they also crash anywhere between 10 seconds and 5 minutes in

I noticed in the Windows Event Viewer that the crash logs this error:
The description for Event ID 153 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.
\Device\Video3
Error occurred on GPUID: 100
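To check whether each crash lines up with one of these events without scrolling through Event Viewer, the built-in Windows `wevtutil` tool can query the log for nvlddmkm Event ID 153. The Python sketch below only builds and runs that query. It assumes Windows and that the events sit in the System log, which is where display driver errors normally appear; the helper names are hypothetical.

```python
import shutil
import subprocess


def build_query(event_id: int = 153, provider: str = "nvlddmkm") -> str:
    """XPath filter selecting events with this ID from this provider."""
    return f"*[System[Provider[@Name='{provider}'] and (EventID={event_id})]]"


def build_command(count: int = 10) -> list[str]:
    """wevtutil qe <log> /q:<xpath> /c:<n> /rd:true /f:text

    /rd:true lists the newest events first; /f:text prints readable text.
    """
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_query()}",
        f"/c:{count}",
        "/rd:true",
        "/f:text",
    ]


def main() -> None:
    if shutil.which("wevtutil") is None:
        print("wevtutil not found; this sketch only works on Windows.")
        return
    result = subprocess.run(build_command(), capture_output=True, text=True)
    print(result.stdout or result.stderr)


if __name__ == "__main__":
    main()
```

Each printed event includes its timestamp, which can be matched against the time of each crash.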
 
About that event in Windows Event Viewer: I assume it happened after the DDU reinstall?

The nvlddmkm error isn't unknown to me; I have a bit of experience with it.

Two things I'd like you to try:

Turn on debug mode in your Nvidia Control Panel: open the Nvidia Control Panel, click the Help menu at the top, and toggle Debug Mode on. Then try the game.

What's the exact make and model of your GPU? I'm asking because a lot of manufacturers overclock their cards right out of the box, and usually it won't show as OC'ed on your end. Check your specific model against Nvidia's stock version. Regardless, try downclocking to just a tad under Nvidia's factory settings.
 
@GrimReaper801 Yes, the Windows Event ID 153 from source nvlddmkm is logged on every crash, including after the DDU reinstall.

I enabled debug mode, but where does it log more? A new Event ID 153 contained no additional details.

The exact GPU model is GeForce RTX 4080 16GB VENTUS 3X and the rig is "MSI MPG INFINITE X2 14NUG7-416AT".
 
The exact GPU model is GeForce RTX 4080 16GB VENTUS 3X and the rig is "MSI MPG INFINITE X2 14NUG7-416AT".
I'm not an expert, so I could be wrong, but even if your card doesn't show as OC'ed, from what I saw on the MSI website (RTX 4080 Super Ventus 3X), I think it is (hence the "OC" in the name), as @GrimReaper801 pointed out. So try setting the clock rate to the "default" stock Nvidia RTX 4080 Super value.

Also keep this in mind and check it, just in case:
I did have the problem, but it is resolved: without even checking the heat, I only needed to reduce the P-cores from x55 to x52. This kind of issue causes crashes anywhere from startup to 5 to 10 minutes into the game, in any zone. (Side note: without the P-core reduction to x52, Hogwarts refuses to run at all, stating a lack of memory.)
 
@GrimReaper801 Yes, the Windows Event ID 153 from source nvlddmkm is logged on every crash, including after the DDU reinstall.

I enabled debug mode, but where does it log more? A new Event ID 153 contained no additional details.

The exact GPU model is GeForce RTX 4080 16GB VENTUS 3X and the rig is "MSI MPG INFINITE X2 14NUG7-416AT".

As far as I know, debug mode not logging anything is completely normal. Simply put, I've run into this particular error three times in my life, and in two cases debug mode fixed it right up and the system worked for years afterward. The third case was just a corrupt driver. I was hoping it would work for you, but no such luck. You can turn debug mode off.

As for your GPU being OC'ed, I'm having trouble locating your exact model. Most listings point to your card being OC'ed, but I've found a few instances of your specific PC model potentially shipping with a non-OC'ed card. Pre-builts can be weird about how they list their specs.

I'm leaning towards it being OC'ed, since most listings I saw indicated it is factory OC'ed, and that would make perfect sense for an MSI card in an MSI pre-built. Ultimately, compare your card's base clock speed with Nvidia's. If it's OC'ed, turn it down to Nvidia's specs and see what happens. CP2077 is infamous for dealing poorly with any kind of OC.

The P-core suggestion that @LeKill3rFou pointed out is worth trying but, if I recall correctly, the CPU update fixed that, or at least was meant to help. Still, it doesn't hurt to try.

Side note: I just realized you've been updating your OP. Thank you for updating it as you go along and try new things. I wish everyone who asks for help was as organized about their troubleshooting as you are.

And because of that, I just realized there is no mention of your temperatures. Are you monitoring your temperatures when these crashes happen? What are we looking at, normal? Both CPU and GPU.
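To answer the temperature question with data rather than a glance at an overlay, `nvidia-smi` (installed with the NVIDIA driver) can log GPU temperature, clock and load once per second to a CSV file; after a crash, the last rows show what the GPU was doing just before it. The Python sketch below builds that command and summarizes the resulting file. It covers the GPU only (CPU temperatures need a separate tool), and the helper names are assumptions for illustration.

```python
import csv
import io

# Query fields as listed by `nvidia-smi --help-query-gpu`.
FIELDS = ["timestamp", "temperature.gpu", "clocks.gr", "utilization.gpu"]


def logging_command(path: str, interval_s: int = 1) -> list[str]:
    """nvidia-smi command that writes one CSV row per interval to `path` until stopped."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
        "-l", str(interval_s),
        "-f", path,
    ]


def summarize(csv_text: str) -> dict:
    """Peak temperature plus the last sample before the log stopped (e.g. at a crash)."""
    rows = [
        [cell.strip() for cell in row]
        for row in csv.reader(io.StringIO(csv_text))
        if len(row) == len(FIELDS)
    ]
    # Skip rows where the driver reported [N/A] instead of a number.
    rows = [r for r in rows if r[1].isdigit() and r[2].isdigit()]
    if not rows:
        raise ValueError("no usable samples in log")
    last = rows[-1]
    return {
        "samples": len(rows),
        "max_temp_c": max(int(r[1]) for r in rows),
        "last_timestamp": last[0],
        "last_temp_c": int(last[1]),
        "last_clock_mhz": int(last[2]),
    }


if __name__ == "__main__":
    # Print the command so it can be pasted into a terminal before starting the game.
    print(" ".join(logging_command("gpu_log.csv")))
```

Start the printed command in a terminal before launching the game, stop it with Ctrl+C after a crash, then pass the contents of gpu_log.csv to `summarize`.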
 
The temperatures are very low; to be honest, it doesn't have time to heat up.
Both CPU and GPU are at 30 °C, even during the crash.

Your assumption was correct: the RTX 4080 Ventus 3X was factory OC'ed at the maximum, 2505 MHz versus the 2205 MHz factory default.

Also, the CPU's P-cores were all at an x55/x56 ratio.

Using MSI Center, I reduced the P-cores to x52 while leaving the voltage at "Auto".

I also downclocked the GPU to 2205 MHz.
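For a sense of scale, here is roughly how large those two changes are. This assumes the stock 100 MHz base clock (BCLK) of Intel desktop CPUs, where each core's clock is its ratio times BCLK, and uses the 2505/2205 MHz GPU figures quoted in this post.

```latex
\begin{aligned}
f_{\text{P-core}} &= \text{ratio} \times f_{\text{BCLK}}, \qquad f_{\text{BCLK}} = 100\,\text{MHz (assumed stock)} \\
55 \times 100\,\text{MHz} &= 5.5\,\text{GHz} \;\rightarrow\; 52 \times 100\,\text{MHz} = 5.2\,\text{GHz}, \qquad \tfrac{5.5 - 5.2}{5.5} \approx 5.5\%\ \text{lower} \\
\tfrac{2505 - 2205}{2505} &\approx 12.0\%\ \text{lower GPU clock}
\end{aligned}
```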

Unfortunately, this did not work at all; I actually crashed even faster, within 1 second in the benchmark.

There is also a new type of error now, saying that the GPU was disconnected.

I assume that the new settings are unstable.

I will reset the OC to a default profile in MSI Center and run a CPU stress test and a GPU stress test to see whether the fault is in the CPU or the GPU.

At this stage, I also assume it's most likely a faulty component and that I need to use the warranty.
 
I will reset the OC to a default profile in MSI Center and run a CPU stress test and a GPU stress test to see whether the fault is in the CPU or the GPU.

At this stage, I also assume it's most likely a faulty component and that I need to use the warranty.

That's unfortunate, but I also believe you're most likely looking at a faulty component. Downclocking wouldn't normally make things less stable.

Luckily, in my experience, MSI is great with warranty exchanges, though I've only dealt with them to RMA individual components, so I have no idea how they handle pre-built PCs. I hope everything goes well with them!
 