Random crashes of graphics device under Linux while playing

Ask here if you experience technical problems with X4: Foundations.

Moderator: Moderators for English X Forum

-=FL=- UniversE
Posts: 1198
Joined: Sat, 31. Dec 05, 13:36
x4

Random crashes of graphics device under Linux while playing

Post by -=FL=- UniversE »

> Version and language

7.10 Hotfix 1, English, Steam version.

> Whether or not your game is modified using any third party scripts or mods (see note below).

No.

> Game start being played.

It happens in sandbox starts and also in Timelines.

> Exact nature of the problem, where and when it occurs and what you were doing at the time.

Random crash of the graphics device. It doesn't have to be a graphics-intensive scene; it also happened once while docked. Every time it happens, the following shows up in the journal:

Code:

Sep 04 21:58:59 uapc03 kernel: pcieport 0000:00:03.1: pciehp: Slot(0): Link Down
Sep 04 21:58:59 uapc03 kernel: pcieport 0000:00:03.1: pciehp: Slot(0): Card not present
Sep 04 21:58:59 uapc03 kernel: snd_hda_intel 0000:07:00.1: Unable to change power state from D3hot to D0, device inaccessible
Sep 04 21:59:00 uapc03 kernel: snd_hda_intel 0000:07:00.1: Unable to change power state from D3cold to D0, device inaccessible
This is sometimes, but not always, followed by:

Code:

Aug 30 23:02:11 uapc03 kernel: NVRM: GPU at PCI:0000:07:00: GPU-210b1c70-28b4-0f9f-b657-9e7f7f5c5c90
Aug 30 23:02:11 uapc03 kernel: NVRM: GPU Board Serial Number: PMVQU0A9VGY005
Aug 30 23:02:11 uapc03 kernel: NVRM: Xid (PCI:0000:07:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Aug 30 23:02:11 uapc03 kernel: NVRM: GPU 0000:07:00.0: GPU has fallen off the bus.
There are no earlier entries in the journal that might indicate the cause.
Don't be confused by the snd_hda_intel entries: bus 07 belongs to the graphics card (07:00.1 is its High Definition Audio function).
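A minimal sketch for pulling just these entries out of the kernel log of the boot that crashed, assuming a persistent journal (so that `journalctl -b -1` still holds the previous boot). The function name `gpu_bus_events` is made up for illustration; the patterns are taken from the messages quoted above.

```shell
# Hypothetical helper: keep only the PCIe hotplug, power-state and NVRM
# "fallen off the bus" messages quoted above; reads kernel log lines on stdin.
gpu_bus_events() {
  grep -E 'pciehp: .*(Link Down|Card not present)|Unable to change power state|NVRM: (Xid|GPU .*fallen off the bus)'
}

# Usage (kernel messages of the previous boot, i.e. the one that crashed):
#   journalctl -k -b -1 --no-pager | gpu_bus_events
```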

> Any possibly relevant changes you have made to your game, system, or software before the issue occurred.

Hard to pinpoint. It might be related to the upgrade to Fedora 40, because the crashes did not happen before. However, they still don't happen in other applications, only in X4.

> Where appropriate, additional symptoms, error messages, links to saves, screenshots and crash dump files (see this Wiki entry).

It is independent of the in-game settings and happens anywhere, including Timelines scenarios.

> Your system specifications in the form of a DxDiag report and vulkaninfo (see this Wiki entry).

No DxDiag, since this is a Linux system: Fedora 40, all updates installed. The vulkaninfo output is here: https://uap-core.de/misc/vulkaninfo


I know this is not much to work with. If there is anything else I can do to record more data, e.g. enable some tracing while playing, please let me know.
Also, if you know a good way to provoke this issue outside of X4, please tell me, so I can check whether it's really game-related.
CBJ
EGOSOFT
Posts: 54239
Joined: Tue, 29. Apr 03, 00:56
x4

Re: Random crashes of graphics device under Linux while playing

Post by CBJ »

In the understandable absence of a DxDiag, could you give us some basic info about the rest of your system specs? The vulkaninfo output does a good job of describing your graphics subsystem but doesn't tell us anything about things like your CPU and memory.
-=FL=- UniversE
Posts: 1198
Joined: Sat, 31. Dec 05, 13:36
x4

Re: Random crashes of graphics device under Linux while playing

Post by -=FL=- UniversE »

Sure, I hope this helps:

Linux Kernel and Distribution:

Code:

$ uname -a
Linux uapc03 6.10.6-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Aug 19 14:09:30 UTC 2024 x86_64 GNU/Linux
$ cat /etc/fedora-release 
Fedora release 40 (Forty)
CPU specs:

Code:

$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          48 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   24
  On-line CPU(s) list:    0-23
Vendor ID:                AuthenticAMD
  Model name:             AMD Ryzen 9 5900X 12-Core Processor
    CPU family:           25
    Model:                33
    Thread(s) per core:   2
    Core(s) per socket:   12
    Socket(s):            1
    Stepping:             2
    Frequency boost:      enabled
    CPU(s) scaling MHz:   50%
    CPU max MHz:          4950.1948
    CPU min MHz:          2200.0000
    BogoMIPS:             7386.22
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
                          mca cmov pat pse36 clflush mmx fxsr sse sse2 ht
                          syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm
                          constant_tsc rep_good nopl xtopology nonstop_tsc cpuid
                          extd_apicid aperfmperf rapl pni pclmulqdq monitor
                          ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes
                          xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic
                          cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw
                          ibs skinit wdt tce topoext perfctr_core perfctr_nb
                          bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate
                          ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2
                          smep bmi2 erms invpcid cqm rdt_a rdseed adx smap
                          clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves
                          cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
                          user_shstk clzero irperf xsaveerptr rdpru wbnoinvd
                          arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean
                          flushbyasid decodeassists pausefilter pfthreshold avic
                          v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes
                          vpclmulqdq rdpid overflow_recov succor smca fsrm
                          debug_swap
Virtualization features:  
  Virtualization:         AMD-V
Caches (sum of all):      
  L1d:                    384 KiB (12 instances)
  L1i:                    384 KiB (12 instances)
  L2:                     6 MiB (12 instances)
  L3:                     64 MiB (2 instances)
NUMA:                     
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-23
Vulnerabilities:          
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Vulnerable: Safe RET, no microcode
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer
                          sanitization
  Spectre v2:             Mitigation; Retpolines; IBPB conditional; IBRS_FW;
                          STIBP always-on; RSB filling; PBRSB-eIBRS Not affected;
                          BHI Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected
PCI Devices

Code:

$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 7
01:00.0 Non-Volatile memory controller: Sandisk Corp WD PC SN810 / Black SN850 NVMe SSD (rev 01)
02:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset USB 3.1 XHCI Controller
02:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset SATA Controller
02:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset Switch Upstream Port
03:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
03:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
03:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
04:00.0 Non-Volatile memory controller: Sandisk Corp WD PC SN810 / Black SN850 NVMe SSD (rev 01)
05:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8852AE 802.11ax PCIe Wireless Network Adapter
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 16)
07:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3080 Lite Hash Rate] (rev a1)
07:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
08:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
09:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
09:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
09:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
USB Devices

Code:

$ lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 1a40:0101 Terminus Technology Inc. Hub
Bus 001 Device 003: ID 0bda:2852 Realtek Semiconductor Corp. Bluetooth Radio
Bus 001 Device 004: ID 0557:8021 ATEN International Co., Ltd Hub
Bus 001 Device 005: ID 103c:84fd HP TracerLED
Bus 001 Device 006: ID 0461:554a Primax Electronics, Ltd HP 125 Wired Keyboard
Bus 001 Device 007: ID 046d:0aaa Logitech, Inc. Logitech G PRO X Gaming Headset
Bus 001 Device 008: ID 06a3:0762 Saitek PLC Saitek X52 Pro Flight Control System
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 002: ID 045b:0209 Hitachi, Ltd 
Bus 003 Device 003: ID 131d:0158 Natural Point TrackIR 5 Pro Head Tracker
Bus 003 Device 004: ID 056d:4014 EIZO Corp. FlexScan EV2750
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 004 Device 002: ID 045b:0210 Hitachi, Ltd 
Memory

Code:

$ lsmem
RANGE                                  SIZE  STATE REMOVABLE  BLOCK
0x0000000000000000-0x00000000afffffff  2.8G online       yes   0-21
0x0000000100000000-0x000000083fffffff   29G online       yes 32-263

Memory block size:       128M
Total online memory:    31.8G
Total offline memory:      0B
Kernel Modules

Code:

$ lsmod
Module                  Size  Used by
rfcomm                102400  16
snd_seq_dummy          12288  0
snd_hrtimer            12288  1
nf_conntrack_netbios_ns    12288  1
nf_conntrack_broadcast    12288  1 nf_conntrack_netbios_ns
nft_fib_inet           12288  1
nft_fib_ipv4           12288  1 nft_fib_inet
nft_fib_ipv6           12288  1 nft_fib_inet
nft_fib                12288  3 nft_fib_ipv6,nft_fib_ipv4,nft_fib_inet
nft_reject_inet        12288  10
nf_reject_ipv4         12288  1 nft_reject_inet
nf_reject_ipv6         20480  1 nft_reject_inet
nft_reject             12288  1 nft_reject_inet
nft_ct                 28672  8
nft_chain_nat          12288  3
nf_nat                 65536  1 nft_chain_nat
nf_conntrack          192512  4 nf_nat,nft_ct,nf_conntrack_netbios_ns,nf_conntrack_broadcast
nf_defrag_ipv6         24576  1 nf_conntrack
nf_defrag_ipv4         12288  1 nf_conntrack
ip_set                 69632  0
nf_tables             409600  298 nft_ct,nft_reject_inet,nft_fib_ipv6,nft_fib_ipv4,nft_chain_nat,nft_reject,nft_fib,nft_fib_inet
nvidia_drm            135168  10
nvidia_modeset       1650688  11 nvidia_drm
nvidia_uvm           6844416  0
qrtr                   57344  2
bnep                   36864  2
lm75                   28672  0
nvidia              72577024  133 nvidia_uvm,nvidia_modeset
sunrpc                897024  1
binfmt_misc            28672  1
vfat                   24576  1
fat                   114688  1 vfat
rtw89_8852ae           12288  0
rtw89_8852a           716800  1 rtw89_8852ae
rtw89_pci             114688  1 rtw89_8852ae
snd_hda_codec_realtek   208896  1
snd_hda_codec_generic   131072  1 snd_hda_codec_realtek
rtw89_core            950272  2 rtw89_pci,rtw89_8852a
snd_hda_codec_hdmi    102400  1
snd_hda_scodec_component    20480  1 snd_hda_codec_realtek
amd_atl                53248  1
intel_rapl_msr         20480  0
intel_rapl_common      57344  1 intel_rapl_msr
snd_usb_audio         598016  2
snd_hda_intel          69632  4
snd_intel_dspcfg       40960  1 snd_hda_intel
snd_intel_sdw_acpi     16384  1 snd_intel_dspcfg
edac_mce_amd           40960  0
mac80211             1753088  2 rtw89_core,rtw89_pci
snd_usbmidi_lib        57344  1 snd_usb_audio
snd_hda_codec         225280  4 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec_realtek
snd_ump                40960  1 snd_usb_audio
snd_rawmidi            57344  2 snd_usbmidi_lib,snd_ump
snd_hda_core          155648  5 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_codec_realtek
mc                     90112  1 snd_usb_audio
kvm_amd               217088  0
btusb                  86016  0
snd_hwdep              20480  2 snd_usb_audio,snd_hda_codec
btrtl                  36864  1 btusb
snd_seq               135168  7 snd_seq_dummy
btintel                65536  1 btusb
libarc4                12288  1 mac80211
snd_seq_device         16384  3 snd_seq,snd_ump,snd_rawmidi
btbcm                  24576  1 btusb
btmtk                  12288  1 btusb
joydev                 32768  0
snd_pcm               196608  5 snd_hda_codec_hdmi,snd_hda_intel,snd_usb_audio,snd_hda_codec,snd_hda_core
kvm                  1445888  1 kvm_amd
cfg80211             1421312  3 rtw89_core,mac80211,rtw89_8852a
bluetooth            1056768  44 btrtl,btmtk,btintel,btbcm,bnep,btusb,rfcomm
hp_wmi                 32768  0
snd_timer              53248  3 snd_seq,snd_hrtimer,snd_pcm
sparse_keymap          12288  1 hp_wmi
r8169                 131072  0
platform_profile       12288  1 hp_wmi
snd                   159744  28 snd_hda_codec_generic,snd_seq,snd_seq_device,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_usb_audio,snd_usbmidi_lib,snd_hda_codec,snd_hda_codec_realtek,snd_timer,snd_ump,snd_pcm,snd_rawmidi
wmi_bmof               12288  0
rfkill                 40960  8 hp_wmi,bluetooth,cfg80211
gpio_amdpt             16384  0
soundcore              16384  1 snd
video                  81920  1 nvidia_modeset
i2c_piix4              40960  0
realtek                45056  1
amd_pmc                57344  0
rapl                   20480  0
k10temp                16384  0
acpi_cpufreq           32768  0
acpi_tad               20480  0
pcspkr                 12288  0
gpio_generic           20480  1 gpio_amdpt
loop                   45056  0
nfnetlink              24576  4 nf_tables,ip_set
zram                   40960  1
crct10dif_pclmul       12288  1
crc32_pclmul           12288  0
crc32c_intel           16384  3
polyval_clmulni        12288  0
polyval_generic        12288  1 polyval_clmulni
nvme                   69632  0
ghash_clmulni_intel    16384  0
sha512_ssse3           53248  0
nvme_core             245760  1 nvme
sha256_ssse3           36864  0
ccp                   180224  1 kvm_amd
sha1_ssse3             32768  0
sp5100_tco             20480  0
nvme_auth              28672  1 nvme_core
wmi                    32768  3 hp_wmi,video,wmi_bmof
ip6_tables             28672  0
ip_tables              28672  0
fuse                  233472  5
PGeyer-Ego
EGOSOFT
Posts: 56
Joined: Thu, 9. Jun 22, 14:37
x4

Re: Random crashes of graphics device under Linux while playing

Post by PGeyer-Ego »

Hi there,

I have tried to reproduce this in my local Fedora 40 install and have been unable to so far.

The NVIDIA forums suggest this may be caused by your 30-series card spiking in power draw beyond what your PSU can handle.

Can you check whether this might be the issue? If not, I can investigate further.

Thanks
PG
-=FL=- UniversE
Posts: 1198
Joined: Sat, 31. Dec 05, 13:36
x4

Re: Random crashes of graphics device under Linux while playing

Post by -=FL=- UniversE »

It is a stock PC (HP OMEN GT21-0000ng) with an 800 W PSU and factory settings. I did not build it myself and don't change the settings, precisely to avoid issues. That doesn't mean HP didn't mess it up, though.

The two most recent crashes happened in two consecutive attempts at the Omicron Lyrae Timelines scenario. I don't think there is anything special about this scenario and assume it was just bad luck, but I thought I'd mention it anyway, just in case. I will try a third time tomorrow and see if I can complete it.

I will check how I can measure power consumption (and maybe temperature too, just to be sure) while playing, in a way that keeps the logs available even after the hard reset this issue forces. If it's only about spikes, I don't know whether I can find a sample rate that captures them without wearing out the SSD with logs.

It may take a while before I come back with measurements, because I first have to figure out how exactly to collect them.

Update: it looks like it's simpler than I thought. I will use nvidia-smi -q -d POWER,TEMPERATURE to collect the data and come back when I have something.
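A sketch of a logger that addresses both concerns above, assuming nvidia-smi's `--query-gpu` CSV interface instead of the `-q` report: each call appends one line and then flushes that file to disk with coreutils `sync FILE`, so samples written before a hard reset should survive. The function name and the `NVIDIA_SMI` override variable are hypothetical.

```shell
# Hypothetical helper: append one CSV sample (time, temperature, power draw,
# power limit) to the log file given as $1, then flush that file to disk so
# it should survive a hard reset. NVIDIA_SMI is a hypothetical override for
# the binary (handy for testing); it defaults to nvidia-smi.
log_gpu_sample() {
  log=$1
  "${NVIDIA_SMI:-nvidia-smi}" \
    --query-gpu=timestamp,temperature.gpu,power.draw,power.limit \
    --format=csv,noheader >> "$log" || return
  sync "$log"   # coreutils sync with a file operand flushes just that file
}

# Usage sketch, one sample per second until interrupted:
#   while :; do log_gpu_sample gpumon.csv; sleep 1; done
```

As a rough estimate for the SSD concern: at one sample per second, each line is about 50 bytes, roughly 180 KB per hour. Even if every flush rewrites a full 4 KiB filesystem block, that is on the order of 15 MB per hour before filesystem metadata, which is small compared with typical SSD write endurance.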
-=FL=- UniversE
Posts: 1198
Joined: Sat, 31. Dec 05, 13:36
x4

Re: Random crashes of graphics device under Linux while playing

Post by -=FL=- UniversE »

It seems to be quite reproducible in the Omicron Lyrae Timelines scenario.

The first time I played it, it happened in the first "cut scene" when the other destroyers spawn in.

The second time it happened in the "it starts firing" scene.

The third time, it happened again in the first cut scene.

The crash happened at 22:32:33 and I got a reading just one second before. These are the last two readings (sampled at a 10 s interval):

Code:

Timestamp                                 : Sun Sep  8 22:32:22 2024
        GPU Current Temp                  : 80 C
        Power Draw                        : 306.68 W
        Max                               : 346.37 W
Timestamp                                 : Sun Sep  8 22:32:32 2024
        GPU Current Temp                  : 79 C
        Power Draw                        : 286.65 W
        Max                               : 312.12 W
The moment right before the crash looks fine: the 312 W max is below the 320 W limit. But ten seconds earlier a max of 346 W was recorded, which is well above the power limit. The 3080 Ti and the "bigger" 3080 have a power limit of 350 W, but mine should not go beyond 320 W.
Still, the 800 W PSU should be able to handle it. Also, the "Power Draw" reading never exceeded 320 W; only the Max reading went higher, up to 353.82 W about 40 seconds before the crash.

The temperature looks okay: it's 15 °C below the slowdown temperature, and I have read that around 80 °C is quite normal for NVIDIA cards.

I am not an expert on this, but since the "worst" reading was 40 seconds before the incident and another "bad" reading six minutes before it, I don't think it's related to power or temperature. Maybe I'll play a session of Cyberpunk tomorrow with my monitoring enabled, to see whether it produces similar temperature and power readings without crashing.
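To find every sample whose Max reading exceeded the 320 W limit (and by how much), a small awk filter over the log is enough. A sketch, assuming the log format produced by the gpumon.sh script in the update below (a `Timestamp` line followed by indented `Name : value` lines); `over_limit` is a hypothetical name.

```shell
# Hypothetical helper: report samples whose "Max" power reading exceeds a
# limit in watts (first argument). Reads a gpumon.log-style log on stdin.
over_limit() {
  awk -v lim="$1" '
    /^Timestamp/ { ts = $0; sub(/^[^:]*: /, "", ts) }   # remember sample time
    $1 == "Max" && $2 == ":" {
      w = $3 + 0                                       # numeric field, e.g. 346.37
      if (w > lim + 0)
        printf "%s: %.2f W (%.1f%% over %d W)\n", ts, w, (w - lim) / lim * 100, lim
    }
  '
}
```

Usage: `over_limit 320 < gpumon.log`. For the two readings quoted above it reports only the 22:32:22 sample, since (346.37 - 320) / 320 ≈ 8.2 % over the limit, while 312.12 W stays below it.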

Update:
In case anyone is interested in how I measured this: put the following into gpumon.sh, make it executable and run it with "watch -n 10 ./gpumon.sh":

Code:

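#!/bin/sh
# gpumon.sh: append one GPU power/temperature reading to gpumon.log.
# sed keeps lines 4, 11, 20 and 29 of the nvidia-smi report (on this
# driver: Timestamp, GPU Current Temp, Power Draw and the sampled Max)
# and stops reading after line 29.

nvidia-smi -q -d power,temperature \
  | sed -n '4p;11p;20p;29{p;q}' \
  | tee -a gpumon.log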
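Selecting lines by number ties the script to one driver's report layout. A sketch of a label-based alternative, assuming each wanted line has the form `Name : value` and that the first occurrence of each name is the one of interest (an assumption; a report may contain more than one "Power Draw" line). `pick_fields` is a hypothetical name.

```shell
# Hypothetical helper: from an `nvidia-smi -q` report on stdin, print the
# first "Timestamp", "GPU Current Temp", "Power Draw" and "Max" line.
# Assumes lines of the form "Name : value"; later duplicates are dropped.
pick_fields() {
  awk '
    {
      key = $0
      sub(/[ \t]*:.*$/, "", key)   # drop " : value" from the first colon on
      sub(/^[ \t]+/, "", key)      # drop indentation
    }
    (key == "Timestamp" || key == "GPU Current Temp" ||
     key == "Power Draw" || key == "Max") && !seen[key]++
  '
}
```

With it, `nvidia-smi -q -d power,temperature | pick_fields | tee -a gpumon.log` would replace the `sed` stage in gpumon.sh.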
-=FL=- UniversE
Posts: 1198
Joined: Sat, 31. Dec 05, 13:36
x4

Re: Random crashes of graphics device under Linux while playing

Post by -=FL=- UniversE »

I can now confirm that I can reliably reproduce the issue in the Omicron Lyrae Timelines scenario. This time it crashed in the "it's about to fire" scene again.

It also seems confirmed that it only happens reliably during the cut scenes.

My guess is that the cut scenes are less rendering-intensive than everything else, which somehow makes PCI power management think it can put the graphics card into the D3 state, but then it fails to bring the card back to D0 quickly enough (where it is supposed to stay the whole time while the game is running).

But that's only a wild guess. And since the crash also happened in the open universe (although extremely rarely and not really reproducibly), I am not sure my theory holds.
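One way to probe the D3 theory is to look at the runtime power-management state the kernel keeps for the GPU (07:00.0) and for the port that reported "Link Down" (00:03.1). A minimal sketch reading the standard sysfs attributes `power/control`, `power/runtime_status` and `d3cold_allowed`; the function name is hypothetical and missing attributes are skipped.

```shell
# Hypothetical helper: print the runtime power-management attributes that
# sysfs exposes for a PCI device directory ($1); missing ones are skipped.
pci_pm_status() {
  dev=$1
  for attr in power/control power/runtime_status d3cold_allowed; do
    if [ -r "$dev/$attr" ]; then
      printf '%s: %s\n' "$attr" "$(cat "$dev/$attr")"
    fi
  done
}

# Usage: the GPU and the PCIe port that logged "Link Down"
#   pci_pm_status /sys/bus/pci/devices/0000:07:00.0
#   pci_pm_status /sys/bus/pci/devices/0000:00:03.1
```

Writing `on` to `power/control` (e.g. `echo on | sudo tee /sys/bus/pci/devices/0000:07:00.0/power/control`) stops the kernel from runtime-suspending that device until the next reboot. If the crashes persist with that setting on both the GPU and the port, runtime D3 becomes a less likely cause; since the NVIDIA driver may also manage power states itself, treat this as a hint rather than proof.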
Rinne
Posts: 53
Joined: Sat, 18. Aug 12, 13:06
x4

Re: Random crashes of graphics device under Linux while playing

Post by Rinne »

Your suspicion is plausible. I had a similar issue with my Vega 56, which would crash during frequency changes because the voltage couldn't keep up.
This can come with silicon aging, though it shouldn't happen yet with a 30-series card if everything is stock.

According to the NVIDIA and other forums, it can also be a BIOS issue. A new BIOS for your computer was released in May this year. Maybe check it out:
OMEN 40L Desktop GT21-0000ng Bundle PC (675H8EA) Software

System integrators sadly often pair inadequate hardware. To make matters worse, a PSU's wattage rating says little about how well it handles spikes: even a poor 1200 W PSU can fail to reliably supply a 30-series card.
Unfortunately, I don't know much about HP's selection of PSUs, but it looks like an in-house solution (which doesn't have to be bad).

One thing you can try is to force a power level for the GPU. I don't know how that works for NVIDIA, or whether it is even possible without enabling overclocking options.
On AMD, you can force the DPM performance level, e.g.:

Code:

# "sudo echo low > file" fails: the redirection runs unprivileged. tee runs as root.
echo low | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
Forces the GPU to stay in its lowest power state.

Code:

echo high | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
Forces it into its highest power state.

Code:

echo auto | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
Either this or a simple reboot resets it to dynamic scaling.

If it doesn't crash on low but crashes on high, it's likely a power delivery issue.
If it crashes on neither low nor high but only on auto, the power delivery probably can't keep up during frequency changes (that was my issue, and it was fixed with a slight voltage increase on the previously underclocked GPU).
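The decision rule above, written out as a tiny sketch. The helper is hypothetical; each argument records whether a session with the performance level forced to low, forced to high, or left on auto ended in a crash, and any other combination is reported as inconclusive.

```shell
# Hypothetical helper: arguments are "crash" or "ok" for sessions played
# with the performance level forced to low, forced to high, and on auto.
interpret_power_test() {
  low=$1 high=$2 auto=$3
  if [ "$low" = ok ] && [ "$high" = crash ]; then
    echo "likely a power delivery issue"
  elif [ "$low" = ok ] && [ "$high" = ok ] && [ "$auto" = crash ]; then
    echo "likely power delivery not keeping up during frequency changes"
  else
    echo "inconclusive"
  fi
}
```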

I'm not 100% sure, but I think low shouldn't see (harsh) spikes, while high still can. Either way, it should help to get a better idea of the underlying issue.
Unfortunately, as mentioned, the above commands only work on AMD. You'd have to check your NVIDIA driver for similar settings.
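On the NVIDIA side, a possible counterpart (an assumption, not something tested in this thread) is locking the graphics clock range with nvidia-smi's `--lock-gpu-clocks` and undoing it with `--reset-gpu-clocks`; both need root and a Volta or newer GPU such as the RTX 3080. A sketch; the `gpu_clocks` helper, the `gpu_clocks.sh` file name and the `NVIDIA_SMI` override are hypothetical, and MIN/MAX must be picked from the card's supported clocks (`nvidia-smi -q -d SUPPORTED_CLOCKS`).

```shell
# Hypothetical helper: lock the graphics clock to MIN..MAX MHz, or reset it.
# NVIDIA_SMI is a hypothetical override for the binary (handy for testing).
gpu_clocks() {
  smi=${NVIDIA_SMI:-nvidia-smi}
  case "$1" in
    lock)  "$smi" --lock-gpu-clocks="$2,$3" ;;
    reset) "$smi" --reset-gpu-clocks ;;
    *)     echo "usage: gpu_clocks lock MIN MAX | reset" >&2; return 2 ;;
  esac
}

# Usage sketch (values are placeholders, not recommendations):
#   sudo sh -c '. ./gpu_clocks.sh; gpu_clocks lock 1000 1000'
#   sudo sh -c '. ./gpu_clocks.sh; gpu_clocks reset'
```

Locking MIN and MAX to the same low clock roughly corresponds to "low", locking to the top of the supported range roughly to "high", and `reset` returns to the driver's default dynamic clocking, comparable to "auto".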

The reports I found online also tied the issue to specific Linux kernel versions rather than NVIDIA driver versions, so possibly something changed in the kernel as well.
-=FL=- UniversE
Posts: 1198
Joined: Sat, 31. Dec 05, 13:36
x4

Re: Random crashes of graphics device under Linux while playing

Post by -=FL=- UniversE »

I think I have also found a location where the issue can be reproduced in the Open Universe.

Since the issue appeared reliably in the Omicron Lyrae Timelines scenario, I did not bother trying again and again with the same result (that would be the definition of insanity, wouldn't it? ;-) and instead continued to play in the Open Universe.

For several days that went well, without the GPU falling off the bus.

That changed when I investigated a particular Data Vault in a space suit. Yesterday it happened almost immediately after I exited my Katana (after several hours of gameplay). Today I loaded an autosave from a few minutes before, went to the data vault and managed to unlock three or four doors before it happened again (about five minutes into the session).

I made a quicksave before the data vault which can be downloaded here: https://uap-core.de/misc/x4bug/quicksave.xml.gz

On the one hand I hope it will happen again (so that we have a savegame to work with); on the other hand I hope it does not, because that would mean I am blocked in both Timelines and the Open Universe now :(
Update: in the meantime I managed to get past that point without a crash. So maybe the savegame is not as useful as I thought just because it happened twice in a row. I might be back to square one...

To summarize what we know so far:
  • it happens in the Omicron Lyrae Timelines scenario, but (so far) only in the cut scenes
  • it happened twice in a row near the data vault in the uploaded savegame (update: but cannot be reliably reproduced)
  • it is extremely rare in the open universe (and perhaps impossible to pinpoint / reliably reproduce)
  • it does not matter how many hours the game has already been running (in other words: how "hot" the hardware already is)
  • it seems to be related to PCI power management, probably because the device is wrongly sent into the D3 state and then lacks the power to do the job (okay, that we don't know, but this is my theory)
I am still theorizing that it might have something to do with particular Vulkan calls or rarely invoked shaders that seldom appear in the open universe but definitely appear in the Omicron Lyrae Timelines cut scenes.
