Single, not duel-threaded CPU usage?

Verge
Posts: 4
Joined: Tue, 26. Nov 13, 10:46
x4

Single, not duel-threaded CPU usage?

Post by Verge » Sat, 21. Sep 19, 09:02

Game uses only 1 thread, all other threads are at below 25% usage in task manager. I know this game is dual-threaded and that multi-threaded support is limited, but I'm not even getting dual-core performance here, just pure single-thread. I'm gonna blame AMD but yeah: is there a workaround? Anyway, specs:

3800X / X570
RTX 2070 (OC, definitely not bottleneck according to task manager)
DDR4-3333C16 (OC, possibly unstable? Stable everywhere else)
Game installed on SU800 and Windows on 660p

Full parts list if I'm missing anything: https://pcpartpicker.com/list/Lctr9J

Alan Phipps
Moderator (English)
Posts: 30436
Joined: Fri, 16. Apr 04, 19:21
x4

Re: Single, not duel-threaded CPU usage?

Post by Alan Phipps » Sat, 21. Sep 19, 13:51

There are two aspects to this. Firstly, the game is multi-threaded and typically uses threads on 4 real cores (and, to a lesser extent, any further cores that are available), while background apps, the OS, drivers etc. use those and other cores as well. Secondly, your system/OS should be balancing work across all available cores (periodically switching hard-working cores not specifically due to in-game re-threading) to keep the heat, stress and damage build-up across the cores more balanced.

The first is a demonstrable fact and the second is something that should happen unless you have set up specific application/core affinities.
A dog has a master; a cat has domestic staff.

LennStar
Posts: 879
Joined: Fri, 1. Apr 05, 15:22
x4

Re: Single, not duel-threaded CPU usage?

Post by LennStar » Sat, 21. Sep 19, 18:29

Try disabling SMT. I definitely get better fps without. (Ryzen 1700)
:idea: BUG REPORT INFO: I play X4 vanilla. You can find all my bug report files in there:
All X4 files: https://www.dropbox.com/sh/83j3cjfhkdlf ... w6HLa?dl=0

Imperial Good
Moderator (English)
Posts: 4764
Joined: Fri, 21. Dec 18, 18:23
x4

Re: Single, not duel-threaded CPU usage?

Post by Imperial Good » Sat, 21. Sep 19, 19:42

Verge wrote:
Sat, 21. Sep 19, 09:02
Game uses only 1 thread, all other threads are at below 25% usage in task manager.
This is a contradictory statement. Either the game uses 1 thread so 1 core is heavily loaded and the rest are at 0%, or the game is multi threaded and is loading 1 core more than the others with the rest at 0 to 25%. What you describe is what I would expect, since one thread will be heavier than the others with the rest loading much less.

Please make sure you are running Windows 10 1903 with an AGESA ABBA BIOS installed on your motherboard and the Ryzen Balanced power plan from the latest chipset drivers. With anything older than that, third generation Ryzen processors will not operate correctly, which can show up as incorrect OS thread scheduling or incorrect boosting behaviour.

As an example I use a Ryzen 9 3900X. The heaviest thread is always placed on Core 3 or Core 2. Any others fill up in the order specified by Ryzen Master based on the binning of the CPU (this varies between individual processors). Cores on die 1 (cores 1 to 6) fill up in preference to spilling over onto die 2 (cores 7 to 12), for best thread synchronization performance. The loaded cores can boost up to 4.51 GHz depending on the workload (probably thermally limited), which will give even an Intel Core I9 9900K a good run for gaming performance.
Verge wrote:
Sat, 21. Sep 19, 09:02
DDR4-3333C16 (OC, possibly unstable? Stable everywhere else)
If the memory were unstable, the computer would generally be unstable too because of memory errors. As such this is likely not the case.

Try running tests using Ryzen DRAM calculator. It is possible your memory performance is less than optimal if the infinity fabric ratio is not 1:1 with the actual memory clock (half the transfer rate). A transfer rate of 3333 is very unusual and might not correspond to a stable infinity fabric clock, which would result in the infinity fabric being down-clocked and performance being reduced. For example with DDR4 3200 (in spec operation, the bare minimum I recommend) one wants an infinity fabric speed of 1,600 MHz since the DRAM is operating at 1,600 MHz (half the transfer rate of 3,200 MT/s).

Usually Ryzen DRAM calculator gives decent recommendations for timings. Importantly it also provides a set of tools for measuring memory performance, such as the critical access latency. One has to run these tools before the memory OC (at stock XMP settings) and after it to see if there are any real performance improvements.
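The arithmetic above can be checked with a minimal Python sketch. The function names and the 1 MHz tolerance are made up for illustration and do not come from any AMD tool; the only facts used are the ones stated above, namely that the real memory clock is half the transfer rate and that the infinity fabric should run 1:1 with it.

```python
def memory_clock_mhz(transfer_rate: float) -> float:
    """DDR transfers data twice per clock, so the real clock is half the rate."""
    return transfer_rate / 2


def is_one_to_one(transfer_rate: float, fabric_clock_mhz: float,
                  tolerance_mhz: float = 1.0) -> bool:
    """True when the fabric clock matches the memory clock (1:1 ratio).

    tolerance_mhz is an arbitrary illustrative allowance for rounding.
    """
    return abs(memory_clock_mhz(transfer_rate) - fabric_clock_mhz) <= tolerance_mhz


for rate in (3200, 3333):
    print(f"DDR4-{rate}: memory clock {memory_clock_mhz(rate):.1f} MHz")
```

For DDR4-3333 this gives a memory clock of roughly 1,666 MHz, so a fabric that has fallen back to 1,600 MHz would no longer be 1:1.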
Alan Phipps wrote:
Sat, 21. Sep 19, 13:51
Secondly, your system/OS should be balancing work across all available cores (periodically switching hard-working cores not specifically due to in-game re-threading) to keep the heat, stress and damage build-up across the cores more balanced.
I do not think this is true. Although the part about heat build up is true, it makes no difference, as anything below 90 degrees Celsius is perfectly acceptable operation which will allow the CPU to function correctly for many decades, well past its useful life (i.e. until it is replaced by something faster).

From what I can tell the OS scheduler has 2 behaviours based on CPU vendor.
  • Intel: Due to finite boost duration the OS scheduler will shift threads between cores. This allows individual cores to recover their boost duration timer, allowing the CPU to operate at boost speeds for longer, possibly indefinitely, in lightly threaded workloads where most of the cores are idle at any given time. This is literally the only explanation I have for this behaviour, which I observed on my I7 920 and some other Intel processors. This did not use to happen with Windows Vista, which means that it was added in Windows 7 and later, likely to deal with the modern boosting behaviour of Intel processors even if not really applicable to older ones like the 920.
  • AMD (Zen2, I have not experienced anything else): Due to each core having a specific boost behaviour from the binning process the OS scheduler will shift heavy threads to the highest performance (best binned) core, prioritizing filling single chiplets at a time (39##X and Threadripper only as they have multiple chiplets). This is the total inverse of what you describe and what I observed on Intel, since on my Ryzen 9 3900X the OS will actively move a single heavy thread to core 2 or 3 (my best cores on chiplet 1) and park it there for an effectively infinite amount of time. This is also the behaviour AMD officially states it should be doing.
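As a toy model of the two behaviours described above (not how the Windows scheduler is actually implemented), the difference can be sketched in Python. The function names and the per-core boost figures are hypothetical and only serve to show "rotate across cores" versus "park on the best core".

```python
from itertools import cycle, islice


def rotate_policy(core_ids, ticks):
    """Intel-style behaviour as theorised above: hop to the next core each tick."""
    return list(islice(cycle(core_ids), ticks))


def best_core_policy(boost_mhz_by_core, ticks):
    """Zen 2-style behaviour: park the heavy thread on the best-binned core."""
    best = max(boost_mhz_by_core, key=boost_mhz_by_core.get)
    return [best] * ticks


# Hypothetical per-core boost figures, for illustration only.
boosts = {0: 4400, 1: 4450, 2: 4500, 3: 4510}
print(rotate_policy(list(boosts), 6))   # [0, 1, 2, 3, 0, 1]
print(best_core_policy(boosts, 6))      # [3, 3, 3, 3, 3, 3]
```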
LennStar wrote:
Sat, 21. Sep 19, 18:29
Try disabling SMT. I definitely get better fps without. (Ryzen 1700)
This should make no difference, since with Windows 10 1903 the scheduler prioritizes loading real cores before SMT siblings. It can make a difference, but only with applications that scale badly yet still try to use SMT threads as if they were real cores.

Alan Phipps
Moderator (English)
Posts: 30436
Joined: Fri, 16. Apr 04, 19:21
x4

Re: Single, not duel-threaded CPU usage?

Post by Alan Phipps » Sat, 21. Sep 19, 20:50

@ Imperial Good: I bow to your far, far more recent technical knowledge than mine. Here be dinosaurs! :D
A dog has a master; a cat has domestic staff.

Scoob
Posts: 10082
Joined: Thu, 27. Feb 03, 22:28
x4

Re: Single, not duel-threaded CPU usage?

Post by Scoob » Mon, 23. Sep 19, 17:31

Imperial Good wrote:
Sat, 21. Sep 19, 19:42
Intel: Due to finite boost duration the OS scheduler will shift threads between cores. This allows individual cores to recover their boost duration timer, allowing the CPU to operate at boost speeds for longer, possibly indefinitely, in lightly threaded workloads where most of the cores are idle at any given time. This is literally the only explanation I have for this behaviour, which I observed on my I7 920 and some other Intel processors. This did not use to happen with Windows Vista, which means that it was added in Windows 7 and later, likely to deal with the modern boosting behaviour of Intel processors even if not really applicable to older ones like the 920.
AMD (Zen2, I have not experienced anything else): Due to each core having a specific boost behaviour from the binning process the OS scheduler will shift heavy threads to the highest performance (best binned) core, prioritizing filling single chiplets at a time (39##X and Threadripper only as they have multiple chiplets). This is the total inverse of what you describe and what I observed on Intel, since on my Ryzen 9 3900X the OS will actively move a single heavy thread to core 2 or 3 (my best cores on chiplet 1) and park it there for an effectively infinite amount of time. This is also the behaviour AMD officially states it should be doing.
This is interesting. On my Intel CPU - an older 2600k - thread loading characteristics are a bit different. Invariably in X4, once the game has picked the "busy" thread and placed it on a Core, that busy thread will stay there. So, I launch the game one time and Core #1 (either thread #2 or #3) will be the busy thread that entire session. I launch it another time and Core #2 (either thread #4 or #5) will be busy. The scheduler is good enough to know not to place the second heaviest thread on the same CPU - I remember when it didn't in the early days!

Conversely I've been reading and watching a lot of videos on the new Ryzen 3000 (Zen 2) architecture, as I'm interested in potentially upgrading very soon. In numerous videos it's been remarked upon how Zen 2 CPUs will move a single-threaded work load around a LOT during execution. So, observing live, you'd see each core/thread being busy, but not all at once, as the load is moved around. If however you're just looking at historical "max" data, you'd see that all cores, at some point, likely hit their maximum boost state, suggesting the CPU was perhaps running a heavily-threaded work load when, in fact, it was just moving that load around.

It's really interesting stuff!

So, for complete clarity, I'm running a 2600k with a 4.7 GHz all core OC using Windows 10 1903. That all-core clock may well be the reason that I don't see the core hopping that others do. Though, certainly in the videos I watched, no one commented on seeing this behaviour on Intel, only AMD. Personally, it sounds like a good thing to be able to spread a single-threaded load like that, but I wonder at the potential overhead incurred.

Note: the one weird exception on my CPU when it comes to single threaded load is Cinebench R20. I usually use the multi-threaded test as a short stability and heat check. I did do a couple of single threaded tests to compare my ageing CPU's performance vs. the 3900X - it's over twice as fast lol. However, at NO point during the test were any of my Cores / threads anything above 50% and they were constantly changing. This is, in part at least, likely due to how HT is "seen" by monitoring tools - i.e. 50% = actually 100% as HT doesn't mean double the resources for a given core of course. The other part is that the single-threaded workload was indeed moving around, yet all my Cores boost the same.

I am very interested in your experiences with the 3900X @Imperial Good - I did consider one myself, then decided to hold off until the 3950X as, other than X4, all the titles I play run just fine. Oh, quick question... There's contradicting information out there for the 3900X. Some say it's two six core chiplets, others say it's one eight core (full) and one four core. The former makes more sense to me.

Btw: I find it interesting that the 3900X (and 3950X when it's out) actually have double the memory write bandwidth vs. the single chiplet designs for a given RAM module. Read performance is the same. I wonder how much this might impact gaming, if at all?

Scoob.

Imperial Good
Moderator (English)
Posts: 4764
Joined: Fri, 21. Dec 18, 18:23
x4

Re: Single, not duel-threaded CPU usage?

Post by Imperial Good » Mon, 23. Sep 19, 19:59

Scoob wrote:
Mon, 23. Sep 19, 17:31
This is interesting. On my Intel CPU - an older 2600k - thread loading characteristics are a bit different. Invariably in X4, once the game has picked the "busy" thread and placed it on a Core, that busy thread will stay there. So, I launch the game one time and Core #1 (either thread #2 or #3) will be the busy thread that entire session. I launch it another time and Core #2 (either thread #4 or #5) will be busy. The scheduler is good enough to know not to place the second heaviest thread on the same CPU - I remember when it didn't in the early days!
The 2600k is a quad core CPU. It is highly likely that X4 loads it enough that any Intel stock boost behaviour cannot be maintained. At this stage the scheduler should stop moving threads between cores as that decreases cache efficiency and generates extra overhead and hence reduces performance. That said, whether it would benefit from this with lesser workloads is questionable given the age of the CPU, and even with modern CPUs one can overclock by disabling or greatly increasing the boost duration limits.
Scoob wrote:
Mon, 23. Sep 19, 17:31
Conversely I've been reading and watching a lot of videos on the new Ryzen 3000 (Zen 2) architecture, as I'm interested in potentially upgrading very soon. In numerous videos it's been remarked upon how Zen 2 CPUs will move a single-threaded work load around a LOT during execution. So, observing live, you'd see each core/thread being busy, but not all at once, as the load is moved around. If however you're just looking at historical "max" data, you'd see that all cores, at some point, likely hit their maximum boost state, suggesting the CPU was perhaps running a heavily-threaded work load when, in fact, it was just moving that load around.
These videos are incorrect. They were likely made with old Windows, Chipset and BIOS versions. Specifically one needs to use Windows 10 1903 or newer as that contains scheduler improvements which enable better cores to be prioritized. Additionally many bugs have been fixed in more recent AGESA and chipset releases.

For example when I run the Cinebench R20 single thread test on my Ryzen 9 3900X the single thread will hop between core 2 and core 3. It will not hop to any other cores, and according to Ryzen Master core 3 is the fastest and core 2 the second fastest of that chiplet. Although it will stay on one or the other for an extended period, it occasionally swaps over; this is likely the result of both cores being very close in binning, so at any given point in time one might become faster than the other due to different core thermals. Both reach a speed of ~4.51 GHz, out of the chip's maximum rated 4.6 GHz boost.

Some people are reporting that the scheduler is not working correctly. Specifically that the OS moves threads to the wrong cores (not fastest) or keeps some cores boosted incorrectly when not loaded. If this is the case then a fix will likely come in a month or so.
Scoob wrote:
Mon, 23. Sep 19, 17:31
So, for complete clarity, I'm running a 2600k with a 4.7 GHz all core OC using Windows 10 1903. That all-core clock may well be the reason that I don't see the core hopping that others do.
It could be because of the improved scheduler with 1903. It could also be that, if what I am saying above about moving threads to share boost duration is true, the scheduler is aware of the overclock and hence turns off that behaviour.
Scoob wrote:
Mon, 23. Sep 19, 17:31
Personally, it sounds like a good thing to be able to spread a single-threaded load like that, but I wonder at the potential overhead incurred.
It incurs a context switching overhead, a pipeline stall overhead (due to shutting down the currently executing pipeline), a synchronization overhead as state is pushed out to higher tiers of cache, and then a cache miss overhead as the state gets pulled towards the other core for loading and resuming. I think the measured overhead of this sort of thing is between 2 and 5% of total performance depending on how often it occurs, so it can be significant and is suboptimal.

However if this happens on Intel chips to try to extend boost duration then that overhead can be easily masked by the higher average clock speeds that boost provides. This is why it is my theory (I lack strong proof) for why this happens.
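The trade-off described above can be put into numbers with a rough Python sketch. It assumes performance scales linearly with clock speed, which is a simplification; the function names and the clock figures are hypothetical, and only the 2 to 5% overhead range comes from the estimate above.

```python
def effective_mhz(clock_mhz: float, migration_overhead: float) -> float:
    """Clock speed discounted by the fraction of performance lost to core hopping."""
    return clock_mhz * (1.0 - migration_overhead)


def hopping_pays_off(sustained_mhz: float, boost_mhz: float,
                     migration_overhead: float) -> bool:
    """True if hopping to keep boosting beats staying put at the sustained clock."""
    return effective_mhz(boost_mhz, migration_overhead) > sustained_mhz


# Hypothetical clocks: staying put sustains 3600 MHz, hopping holds a boost clock.
print(hopping_pays_off(3600, 4000, 0.05))  # True: 5% overhead is masked by the boost
print(hopping_pays_off(3600, 3700, 0.05))  # False: the boost is too small to cover it
```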
Scoob wrote:
Mon, 23. Sep 19, 17:31
I am very interested in your experiences with the 3900X @Imperial Good - I did consider one myself, then decided to hold off until the 3950X as, other than X4, all the titles I play run just fine. Oh, quick question... There's contradicting information out there for the 3900X. Some say it's two six core chiplets, others say it's one eight core (full) and one four core. The former makes more sense to me.
Well I cannot speak for all cases, but my 3900X and all demonstrations from AMD consist of 2 Core Complex Dies, each containing 2 core complexes with 3 cores each. Hence it is 2x3 cores per die with 2 dies for 12 cores.

Since all the core complex dies are binned, it is unlikely an 8 core die will be used in a 3900X for logical reasons. That same die could much rather be used as a 3700X or 3800X or, if exceptional, a 3950X. I speculate the core complex dies are sorted into 6 or 8 cores (with either 3 or 4 cores per core complex) and then depending on the quality of the binning they get allocated to either the 3900X, 3600X or 3600 for 6 cores and 3950X, 3800X or 3700X for 8 cores. This means that the 3900X has the best binned 6 core dies while the 3950X will have the best binned 8 core dies and hence the best binned dies in general. Yield of well binned 6 core dies will logically be higher than that of 8 core dies since any 8 core die can be turned into a 6 core die by disabling the slowest core of each core complex, especially if those cores are outliers.

Not using a uniform layout would likely incur some performance overhead, which is not something AMD would want for their high end consumer CPUs like the 3900X, which already is more expensive to own than a Core I9 9900K, especially due to the X570 chipset. Specifically if it was 8+4 cores then 16 threads on one core complex die would share the same amount of cache and internal bandwidth as just 8 threads on the other, which would cause significantly more cache misses for them and bandwidth related bottlenecking.
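The thread counts behind that argument can be worked out with a short Python sketch. The function name is made up for illustration, and the 8+4 layout is the hypothetical alternative discussed above, not a real product configuration; the sketch assumes 2 core complexes per die and 2 SMT threads per core.

```python
SMT_THREADS_PER_CORE = 2


def threads_per_die(cores_per_ccx: int, ccx_per_die: int = 2) -> int:
    """Hardware threads that share one core complex die's cache and bandwidth."""
    return cores_per_ccx * ccx_per_die * SMT_THREADS_PER_CORE


uniform = [threads_per_die(3), threads_per_die(3)]  # 6 + 6 cores
uneven = [threads_per_die(4), threads_per_die(2)]   # 8 + 4 cores (hypothetical)

print(uniform, sum(uniform))  # [12, 12] 24
print(uneven, sum(uneven))    # [16, 8] 24
```

Both layouts give 24 threads in total, but in the uneven one a single die has to serve twice as many threads as the other from the same cache and internal bandwidth.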
Scoob wrote:
Mon, 23. Sep 19, 17:31
Note: the one weird exception on my CPU when it comes to single threaded load is Cinebench R20. I usually use the multi-threaded test as a short stability and heat check. I did do a couple of single threaded tests to compare my ageing CPU's performance vs. the 3900X - it's over twice as fast lol. However, at NO point during the test were any of my Cores / threads anything above 50% and they were constantly changing. This is, in part at least, likely due to how HT is "seen" by monitoring tools - i.e. 50% = actually 100% as HT doesn't mean double the resources for a given core of course. The other part is that the single-threaded workload was indeed moving around, yet all my Cores boost the same.
This again supports my theory that on Intel CPUs the scheduler will try to share single threaded workloads between cores to extend boost duration by allowing all cores an equal period of time to idle. With my 3900X the Cinebench thread gets pushed to either Core 2 or Core 3 of core complex die 1 core complex 1 and it remains there most of the time. It preferentially chooses core 3 (fastest core of the core complex die) over core 2 but will occasionally swap to core 2 for a few seconds.

What you think about hyperthreading is not correct. The OS should not load hyperthreading cores at all until all cores are mostly loaded. It is completely possible for each thread of a hyper threaded core to reach 100% load as the utilization report is based on scheduling and not actual core performance. If a value of 50% is reported then that thread (nothing to do with core) has nothing scheduled on it for half the time. If one schedules 2 threads onto the same core using hyperthreading then both the threads will execute slower than when executing on separate cores however the utilization report mechanic remains the same, based purely on the amount of time the OS scheduler keeps something scheduled to that execution unit thread. Potentially both will report 100%, or less if the threads are not scheduled all the time.
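To make the point about the utilization report concrete, here is a minimal Python sketch. The function name and the interval representation are assumptions for illustration (real monitoring tools read OS counters instead), and the intervals are assumed not to overlap.

```python
def utilization_percent(scheduled_intervals, window_s: float) -> float:
    """Percent of the window during which the OS had something scheduled.

    scheduled_intervals: non-overlapping (start, end) times in seconds for one
    hardware thread. Says nothing about how fast the work actually ran.
    """
    busy = sum(end - start for start, end in scheduled_intervals)
    return 100.0 * busy / window_s


# A hardware thread with work scheduled for half of a 1 second window.
print(utilization_percent([(0.0, 0.25), (0.5, 0.75)], 1.0))  # 50.0

# Two SMT siblings that are both scheduled the whole time each report 100%,
# even though each runs slower than it would alone on the core.
print(utilization_percent([(0.0, 1.0)], 1.0))  # 100.0
```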

Hyperthreading and SMT give a core two execution threads which share the core's functional units, clock and bandwidth. Modern HT/SMT core design is so parallel that it is statistically unlikely that the two threads will contend for the same units. The main limiting factors for performance are the shared cache, bandwidth and current/power limits (since more of the die is active, more power is used). This is how a Ryzen 3600 can effortlessly beat an Intel Core I5 9500 in multi threaded tasks by a huge margin despite having lower clocked cores and the same core count. This is also why it is likely that Intel's next generation of consumer CPUs will feature HT down the full stack, like AMD does.
Scoob wrote:
Mon, 23. Sep 19, 17:31
Btw: I find it interesting that the 3900X (and 3950X when it's out) actually have double the memory write bandwidth vs. the single chiplet designs for a given RAM module. Read performance is the same. I wonder how much this might impact gaming, if at all?
It will have next to no impact. It might at best mean a percent or so of performance, but that is unlikely for most games and is in the range that one could easily compensate for by tightening DRAM timings over base XMP. This is because game workloads use so little memory write bandwidth anyway that the only time one would notice it is with artificial or very specific workloads. The 6 and 8 core variants do have half the write bandwidth, but they also have half the core count of the 12 and 16 core variants which have the full write bandwidth, hence they will generally execute roughly half as much code that needs to write data within the same time and hence use roughly half the bandwidth.
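The per-core arithmetic behind that last point, as a small Python sketch. The bandwidth values are relative units (1 for a single chiplet part, 2 for a dual chiplet part), not measured figures, and the function name is made up for illustration.

```python
from fractions import Fraction


def write_bandwidth_per_core(relative_bandwidth: int, cores: int) -> Fraction:
    """Share of the relative write bandwidth available to each core."""
    return Fraction(relative_bandwidth, cores)


# Half the bandwidth with half the cores leaves the same share per core.
print(write_bandwidth_per_core(1, 6) == write_bandwidth_per_core(2, 12))  # True
print(write_bandwidth_per_core(1, 8) == write_bandwidth_per_core(2, 16))  # True
```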

Scoob
Posts: 10082
Joined: Thu, 27. Feb 03, 22:28
x4

Re: Single, not duel-threaded CPU usage?

Post by Scoob » Mon, 23. Sep 19, 20:40

That all makes sense. There was quite the mix of hardware, software, and (sometimes flaky) firmware in the various videos I've watched since launch. The general theme though is that things are generally improving for Ryzen 3 with subsequent updates. I'm long overdue a new build, and it's a toss up between going the "just a gamer" route, so a 9700k, or something I can tinker with and do more, hence the 3950X interest. The appeal of the 3950X was that it has two full chiplets, plus there's talk / speculation that these might be the better binned parts the 3800X gets. Though how much actual additional mileage that gives remains to be seen.

Regarding hyperthreading, it's more of a quirk of the reporting tools than anything. I.e. I can enable HT and see a reported overall load of typically 25-30% in a given test or game - fairly evenly spread over all eight Threads. If I turn off HT, I then see a load of 50-60%, spread fairly evenly over all four threads. This sort of thing appears typical in various games and tools I've tested this with. For the record I use Process Explorer as my primary monitoring tool for this sort of thing, though I have tried various others and seen the same results. It's like the tool thinks it has more resources available (two Full Cores) than it really does (one true Core, two threads) and can make a CPU that's quite heavily taxed look like it has more resources to spare. Perhaps this is more prevalent on older Intel systems, but it's something that appears fairly consistent for me.

From my observations, many benchmarking and stress-testing tools are very good at making use of HT, certain games not so much. I know my CPU is old now, and four real Cores are really starting to be entry-level. However, other than X4, I simply don't get performance issues in the other titles I play...yet. Some are equally heavy on the CPU as X4, or even more so, but they seem more balanced over the available Cores / Threads. That said the general trend seems to be that games are making more use of greater than four threads. Hence why my 2600k overtook the slightly better clocked 2500k in newer titles some time back now...plus the extra Cache couldn't have hurt.

Good to hear the minimal impact of the odd RAM config in gaming, that's pretty much what I expected, though I understand certain productivity tasks can be impacted somewhat. I have actually read up on tweaking memory for Ryzen 3000, it looks like it can make quite the difference in certain scenarios. It's a bit contrary to what I'm used to, as with Intel of my chip's generation it was pick your RAM, enable XMP and you're done. Minimal fine tuning unless you wanted to overclock.

Got to say, while I've been Intel for a while, I'm quite impressed by Zen and Zen2 in particular, despite the early teething problems. If they can keep up with 3900X and future 3950X demands and the prices remain near those AMD listed, I'm certainly interested. For the record, I don't do new gaming builds very often, I stick with one Mobo and CPU for years. I went from an AMD X2 4400+ to a Q6600 and then to the 2500k as my main gaming systems. I then got a second-hand 2600k, motherboard and RAM dirt cheap about six years ago, so migrated to that. Each time I've had a huge jump in CPU power, and I'd like that again. IPC-wise, in Cinebench R20 at least, the 3900X - and one assumes Ryzen 3000 series in general - near doubles my Single Core scores. If that even came close to translating to the same amount in gaming, then I'd be on for another huge jump with any of the Zen 2 lineup.

Scoob.

Scoob
Posts: 10082
Joined: Thu, 27. Feb 03, 22:28
x4

Re: Single, not duel-threaded CPU usage?

Post by Scoob » Mon, 23. Sep 19, 21:07

Just wanted to add:

I've disabled HT again on my 2600k, with everything else unchanged. X4's reported CPU load basically doubles as expected. However, what I didn't expect was the much more even CPU load over ALL four threads, I'm not seeing one dominant thread (on one Core, with the other thread on that core near idle) along with a second not quite so busy Core and the rest near idle. The load is still very spiky, changing from second to second by 30-40% per core, but load is much more even. This is on my large station with lots of traffic. Perhaps these are quirks of the older architecture in my ageing 2600k.

Performance is upwards of 40fps, which isn't terrible considering the size of the station. There is periodic combat going on with my largish wings of Corvettes and fighters engaging random Xenon, Kha'ak and the odd hostile SCA ships. GPU load tops out about 55%

Scoob.

Imperial Good
Moderator (English)
Posts: 4764
Joined: Fri, 21. Dec 18, 18:23
x4

Re: Single, not duel-threaded CPU usage?

Post by Imperial Good » Tue, 24. Sep 19, 00:53

Scoob wrote:
Mon, 23. Sep 19, 20:40
I'm long overdue a new build, and it's a toss up between going the "just a gamer" route, so a 9700k, or something I can tinker with and do more, hence the 3950X interest. The appeal of the 3950X was that it has two full chiplets, plus there's talk / speculation that these might be the better binned parts the 3800X gets. Though how much actual additional mileage that gives remains to be seen.
If you just plan to game get a 9900K. It will run most games better than the 3950X and is significantly cheaper (cheaper than a 3900X). Only reason to get AMD at that price range is if you need more cores or if their better power efficiency grabs your attention.
Scoob wrote:
Mon, 23. Sep 19, 20:40
It's like the tool thinks it has more resources available (two Full Cores) than it really does (one true Core, two threads) and can make a CPU that's quite heavily taxed look like it has more resources to spare. Perhaps this is more prevalent on older Intel systems, but it's something that appears fairly consistent for me.
The tools report based on scheduling. Enabling HT doubles the number of hardware threads the OS can schedule onto, and hence the same workload will keep each of them busy for half as much of the time. The CPUs still have a significant amount of resources to spare, which is why HT/SMT was developed to start with.

If you use that sort of argument against HT/SMT then one can use it against multiple cores as well, since as more CPU cores get loaded the average clock of each core decreases and hence the average performance of each core decreases. This applies to both modern Intel and AMD, with the exception of the upcoming I9 9900KS which should hit 5 GHz irrespective of core loading. For example my Ryzen 9 3900X can hit about 4.54 GHz when running a single heavy thread, but that falls to just 4.1 GHz or less when running all cores. Additionally memory bandwidth and cache are shared between threads, so 2 separate cores running at once can degrade each other's performance beyond that.

The reason it is recommended to turn off HT or SMT with some workloads is that some workloads scale very badly with thread count. For example they might throw twice the thread count at a task to gain 30% more performance. In such a case the reduction in per-thread performance from SMT can degrade performance compared with not having those extra threads. This is why Ryzen Master offers Game Mode, which will limit the CPU to 8 or 16 threads max. Properly written software should not suffer from this and should always gain from SMT/HT.
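The effect on the reported overall load can be illustrated with a short Python sketch. It assumes, as a simplification, that the workload is scheduled for the same total amount of time with HT on and off; the function name and the figure of 2.2 busy thread-seconds per second are hypothetical.

```python
def overall_load_percent(busy_thread_seconds: float, hardware_threads: int,
                         window_s: float = 1.0) -> float:
    """Overall load as reported by a tool: scheduled time over available time."""
    return 100.0 * busy_thread_seconds / (hardware_threads * window_s)


# Same hypothetical workload on a quad core, with and without HT.
print(f"HT on  (8 threads): {overall_load_percent(2.2, 8):.1f}%")
print(f"HT off (4 threads): {overall_load_percent(2.2, 4):.1f}%")
```

With these made-up numbers the tool reports roughly 27.5% with HT on and 55% with HT off, which is the same pattern as the 25-30% versus 50-60% figures quoted above.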
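A crude way to see when the extra threads hurt, sketched in Python. The model simply multiplies the gain from doubling the thread count by how fast each thread runs when it has to share a core; the function name and the 0.7 per-thread factor are hypothetical, and only the 30% figure comes from the example above.

```python
def net_speedup(scaling_gain: float, per_thread_speed_with_smt: float) -> float:
    """Speed relative to running without the extra SMT threads.

    scaling_gain: speedup the software gets from doubling its thread count
        when every thread runs at full speed (1.3 means 30% more performance).
    per_thread_speed_with_smt: fraction of full speed each thread achieves
        when two threads share a core (hypothetical value).
    """
    return scaling_gain * per_thread_speed_with_smt


print(net_speedup(1.3, 0.7) < 1.0)  # True: badly scaling software ends up slower
print(net_speedup(2.0, 0.7) > 1.0)  # True: well scaling software still gains
```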
Scoob wrote:
Mon, 23. Sep 19, 20:40
four real Cores are really starting to be entry-level
They have fallen below that now. 6 cores is entry level, with 4 cores being mobile, extreme budget or low power. The minimum core count of Ryzen third gen CPUs aimed at desktop users is 6 cores. The lower core count parts are mobile processors which often come with an integrated graphics processor.
Scoob wrote:
Mon, 23. Sep 19, 20:40
It's a bit contrary to what I'm used to, as with Intel of my chip's generation it was pick your RAM, enable XMP and you're done. Minimal fine tuning unless you wanted to overclock.
This is the case for low core count parts only. Even with the I9 9900K memory timing and clock speed makes a significant difference, similar to Ryzen third gen.

In the last few years core counts have doubled or even tripled. DDR4 speed has not, and all these processors are still dual channel. The result is that memory speed and latency have become increasingly important. There was a reason why my I7 920 back in 2009 had triple channel memory instead of just dual channel... For comparison Intel HEDT and Xeon have anywhere from 6 to 8 channels of memory to feed the large number of cores. AMD's Epyc (and thus Threadripper) is also going to be 6 to 8 channel.

The 3900X and especially 3950X are really pushing the limits of dual channel memory. If not for the more expensive motherboards it would make much more sense to release them as quad channel. This is why you should use 3200MHz DDR4 minimum in dual channel mode, and optimizing timings further can give many percent improvement in performance.

I plan to optimize my memory timings eventually. However for now AMD is still pushing out updates which offer significant performance gains so there is little point, especially seeing how the process must be repeated each time the BIOS is updated.
Scoob wrote:
Mon, 23. Sep 19, 20:40
For the record, I don't do new gaming builds very often, I stick with one Mobo and CPU for years. I went from an AMD X2 4400+ to a Q6600 and then to the 2500k as my main gaming systems. I then got a second-hand 2600k, motherboard and RAM dirt cheap about six years ago, so migrated to that.
Neither do I. I was using a Core I7 920 until last month, when the PSU murdered either the motherboard or CPU. Instead of trying to salvage the system I decided to refresh the motherboard, memory and CPU while keeping the same GPU and legacy bulk storage. Result is this kind of silly system with a Ryzen 9 3900X but NVidia GTX 760. Needless to say the GPU almost always is a bottleneck unlike before. So far I am not disappointed with the processor, although in retrospect I am not sure if I am getting good value from it as even heavy games generally load it below 10% CPU usage, at least for now.

Intel 10nm is looking to be good and would most certainly make AMD sweat, but too bad the first processors for that will only hit the market around 2021. Until then Intel will yet again refresh the same core designs with high frequencies, to the point that they might as well be space heaters compared with AMD's, even if they have better single thread performance. Hence for anyone building new now I can only recommend AMD, unless you really want to min-max gaming performance, in which case it is the I9 9900K/KS.
Scoob wrote:
Mon, 23. Sep 19, 20:40
IPC-wise, in Cinebench R20 at least, the 3900X - and one assumes Ryzen 3000 series in general - near doubles my Single Core scores. If that even came close to translating to the same amount in gaming, then I'd be on for another huge jump with any of the Zen 2 lineup.
Honestly I am not too certain you will get double the performance. The uplift will be significant, but benchmarks such as those from Gamers Nexus show that moving from an OCed 2600K to even an I9 9900K (best gaming CPU ATM) is not double the FPS. On newer games that use more threads, maybe, but not for a lot of existing games. In my case the performance jump from the I7 920 to the Ryzen 9 3900X was huge, but I never OCed my 920 and so it was very slow at around 3 GHz and without AVX. The 2600K has AVX, is OCed to 4.7 GHz and is quad core, which is enough for most games currently. The memory bandwidth from DDR4 is a lot higher, but few games are bottlenecked by that.
Scoob wrote:
Mon, 23. Sep 19, 21:07
X4's reported CPU load basically doubles as expected.
Yes, because with HT enabled half of the execution threads are there for the OS to leave unscheduled, and those get reported as idle. Turn HT off and the same work is measured against half as many threads.
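
As a rough illustration of the arithmetic, assuming the overall figure is simply busy threads divided by the number of logical processors (the function name is made up for this example):

```python
def reported_total_load(busy_threads: float, logical_processors: int) -> float:
    """Overall CPU load in percent as a simple average over logical processors."""
    return 100.0 * busy_threads / logical_processors


# Same workload (two fully busy threads) on a quad core such as the 2600K
with_ht = reported_total_load(2, 8)     # 4 cores, 8 threads with HT on
without_ht = reported_total_load(2, 4)  # 4 cores, 4 threads with HT off

print(f"HT on:  {with_ht:.0f}%")
print(f"HT off: {without_ht:.0f}%")
```

The game is not doing more work with HT off, the denominator just got smaller.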
Scoob wrote:
Mon, 23. Sep 19, 21:07
However, what I didn't expect was the much more even CPU load over ALL four threads, I'm not seeing one dominant thread (on one Core, with the other thread on that core near idle) along with a second not quite so busy Core and the rest near idle. The load is still very spiky, changing from second to second by 30-40% per core, but load is much more even. This is on my large station with lots of traffic. Perhaps these are quirks of the older architecture in my ageing 2600k.
The OS scheduler is aware of HT/SMT, so it might change its scheduling algorithm depending on whether the feature is enabled, which would explain these observed differences. Specifically, with HT/SMT enabled the scheduler has to avoid loading the odd (or even, I have seen that happen sometimes) threads, as those share resources with the other thread on the same physical core. It will load 1 thread on each physical core first before scheduling additional threads on the HT/SMT siblings. With it disabled the scheduler does not have to worry about that, so it might instead focus on load balancing, especially if my theory on boost duration is correct.
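
A toy model of that placement order, purely for illustration. This is not how the actual Windows scheduler is implemented; it assumes logical processors 2k and 2k+1 share physical core k, and `place_threads` is a made-up name.

```python
def place_threads(n_runnable: int, physical_cores: int, smt_enabled: bool) -> list[int]:
    """Return the logical processor IDs chosen for n_runnable busy threads."""
    if not smt_enabled:
        # One logical processor per core, so any free core will do
        return list(range(min(n_runnable, physical_cores)))
    # Assumption: even IDs are the first thread of each core, odd IDs the siblings
    primary = [2 * core for core in range(physical_cores)]
    siblings = [2 * core + 1 for core in range(physical_cores)]
    # Fill every physical core once before touching any sibling thread
    return (primary + siblings)[:n_runnable]


print(place_threads(3, 4, smt_enabled=True))
print(place_threads(5, 4, smt_enabled=True))
print(place_threads(3, 4, smt_enabled=False))
```

With SMT on, three busy threads land on every other logical processor, which is the pattern of busy and near idle threads you see in task manager.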

The spiky load from X4 is due to script tasks running. All OoS logic is staggered in a semi-random way (literally a random wait interval in the script...) and so will fluctuate over time and can come in bursts.
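
A small simulation shows why random wait intervals give a bursty load. This is only a sketch, not X4's actual script engine: the script count, the wait range and counting in "frames" are all assumptions made up for the example.

```python
import random


def simulate_script_load(n_scripts=200, frames=300, min_wait=10, max_wait=60, seed=1):
    """Count how many scripts wake up on each frame when each waits a random interval."""
    rng = random.Random(seed)
    # Each script starts with a random offset, then re-arms with a random wait
    next_run = [rng.randint(0, max_wait) for _ in range(n_scripts)]
    per_frame = []
    for frame in range(frames):
        ran = 0
        for i in range(n_scripts):
            if next_run[i] <= frame:
                ran += 1
                next_run[i] = frame + rng.randint(min_wait, max_wait)
        per_frame.append(ran)
    return per_frame


load = simulate_script_load()
print("quietest frame:", min(load), "scripts")
print("busiest frame: ", max(load), "scripts")
print("average:       ", sum(load) / len(load), "scripts")
```

Nothing synchronizes the scripts, so some frames get a cluster of wake-ups and others get almost none, even though the long-run average is steady.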
Scoob wrote:
Mon, 23. Sep 19, 21:07
Performance is upwards of 40fps, which isn't terrible considering the size of the station. There is periodic combat going on with my largish wings of Corvettes and fighters engaging random Xenon, Kha'ak and the odd hostile SCA ships. GPU load tops out about 55%
Low attention combat should use minimal resources. High attention combat uses a lot of resources due to the physics involved. One can likely have 400 ships duking it out OoS in a huge mess and frame rate should barely drop at all.
