So a bit of time has passed since my previous post. During that time I dug deep into the issue and tested a lot of different theories. In short, I fixed it, but it wasn't the solution I thought it was going to be. Skip to the end for the answer.
1) started by digging through the event viewer and noticed a kernel power event
2) moved over to "view reliability history" since it appears to be a cleaner view of the event viewer data. This showed the same kernel power events, but also highlighted other key events such as a hardware failure event. Also, all events are laid out in a nice timeline. Highly recommend using this.
3) ran GPU, CPU, Power, and Mem stress tests using OCCT. All PASSED (which is crazy when you read the fix). But the system would still crash less than a minute into the Heaven benchmark.
4) downgraded from an RTX 2070 Super to a GTX 1070 hoping a lower power card would help. Still crashed.
5) swapped around memory sticks and ran Windows Memory Diagnostic. Passed.
6) since the system was still crashing I got a screwdriver and started taking the computer apart. Replaced thermal paste.
7) with the motherboard out I inspected caps to look for anything noticeably bad. Thankfully, everything was good.
THE FIX: Checked the PSU in detail. I have a modular power supply. One of the sockets used for the GPU cable was 'pulled' out about 1 mm. I took a closer look at the cable and noticed part of the plastic pin shroud on the PSU side of the cable was cracked. Swapped to a different cable and avoided the bent socket. Boom! It works. I've been running Heaven for the last 30 minutes. I clearly missed this when swapping GPUs since the issue was on the other side of the cable.
So that's it. A ~$20 power cable and a bent PSU socket. Not what I expected at all. What a wild ride.
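For anyone who wants to repeat step 1 without clicking through Event Viewer, here is a rough sketch of how the same Kernel-Power events could be pulled from a script. It just wraps the built-in Windows `wevtutil` tool. The helper name `kernel_power_query` is made up for this example, and the default event ID 41 is my assumption (that is the usual "rebooted without cleanly shutting down" entry); check your own log for the ID you actually see.

```python
import subprocess
import sys


def kernel_power_query(event_id=41, count=5):
    """Build a wevtutil command that lists the newest Kernel-Power events.

    event_id=41 is an assumption (the common unexpected-shutdown entry);
    pass whatever ID shows up in your own System log.
    """
    xpath = (
        "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] "
        f"and (EventID={event_id})]]"
    )
    return [
        "wevtutil", "qe", "System",  # query events from the System log
        f"/q:{xpath}",               # XPath filter on provider and event ID
        f"/c:{count}",               # maximum number of events to return
        "/rd:true",                  # reverse direction: newest first
        "/f:text",                   # human-readable output instead of XML
    ]


if __name__ == "__main__":
    cmd = kernel_power_query()
    if sys.platform == "win32":
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)
    else:
        # wevtutil only exists on Windows, so just show the command elsewhere.
        print("Windows only; would run:", " ".join(cmd))
```

Keep in mind this only tells you that the machine lost power or reset unexpectedly. In my case it pointed at "something power related" but could not say it was a cracked cable shroud, so treat it as a starting point only.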