CBJ, what I do is processors down at the RTL level (Verilog) and also compiler optimizations, so the low-hanging fruit from my POV is possibly different from yours.
In processor land, random memory access (i.e. too much indirection) is the enemy of high performance computing. Period! (I/O is the other killer, but at least most people are aware of it)
We can try to compensate for that mainly by increasing cache sizes, since DRAM access latency is a major bottleneck, one that can make certain algorithms take about the same time on a 166MHz Pentium 2 as on a 4GHz Core i9.
Compiler optimizations help too, but in the end the main engineering idiom stands: you can't foolproof against talented fools (e.g. Python).
Optimizing the data structures is good, but optimizing the memory accesses is (to me) the real low-hanging fruit, and it is far too often overlooked.
To illustrate the point, take this example code:
Code: Select all
#include <stdio.h>
#include <stdlib.h>
#include <chrono>

int* data = nullptr;
#define DATA_SZ (1024*1024*128)

void initData();
int doWork(bool random);
int _rand();

int main()
{
    using std::chrono::high_resolution_clock;
    using std::chrono::duration;
    duration<double, std::micro> t_delta;
    int ret;

    initData();
    printf("data init\n");

    auto t_start = high_resolution_clock::now();
    for (int j = 0; j < 5; j++) {
        t_start = high_resolution_clock::now();
        ret = doWork(false); // use and print the return value so the optimizer can't drop the work
        t_delta = high_resolution_clock::now() - t_start;
        printf("0x%8.8X seq run: %.2fus\n", ret, t_delta.count());
    }
    for (int j = 0; j < 5; j++) {
        t_start = high_resolution_clock::now();
        ret = doWork(true); // use and print the return value so the optimizer can't drop the work
        t_delta = high_resolution_clock::now() - t_start;
        printf("0x%8.8X rnd run: %.2fus\n", ret, t_delta.count());
    }

    if (data != nullptr)
        free(data);
    return 0;
}

int doWork(bool random)
{
    int ret = 0;
    for (int i = 0; i < DATA_SZ; i++)
    {
        int rnd = _rand() & (DATA_SZ - 1);
        ret += rnd;                   // tally the rand value so the optimizer can't drop it
        int addr = random ? rnd : i;  // either walk the array in order or jump around
        ret += data[addr];            // force the data access
    }
    return ret;
}

void initData()
{
    data = (int*)malloc(sizeof(int) * DATA_SZ);
    if (data == nullptr)
        exit(1);
    for (int i = 0; i < DATA_SZ; i++)
    {
        data[i] = _rand();
    }
}

// Two precessing LFSRs instead of CRT rand(), since rand() has nondeterministic performance.
unsigned int rnd_seed1 = 0xDEADBEEF;
unsigned int rnd_seed2 = 0x1EE17889;
int _rand()
{
    unsigned int feed = ((rnd_seed1 & (1u << 2)) >> 2) ^ ((rnd_seed1 & (1u << 30)) >> 30);
    rnd_seed1 = (rnd_seed1 << 1) | feed;
    feed = ((rnd_seed2 & (1u << 1)) >> 1) ^ ((rnd_seed2 & (1u << 28)) >> 28);
    rnd_seed2 = (rnd_seed2 << 1) | feed;
    return (int)(rnd_seed1 ^ rnd_seed2);
}
All it does is make an array of random ints and then do a whole bunch of memory accesses, some sequential, some random.
Running it gives me the following results for sequential (same-page) vs. random accesses:
Code: Select all
data init
0x5B959E7D seq run: 272732.10us
0x88610FC5 seq run: 275153.10us
0xEBA323E0 seq run: 271510.10us
0xDA428E09 seq run: 283839.20us
0xECA3E5AC seq run: 273258.40us
0x320BE4DF rnd run: 3825904.80us
0xB52BC756 rnd run: 3831637.00us
0xAC8D19B7 rnd run: 4047264.80us
0xCDD836A0 rnd run: 3846358.00us
0xBF6C3A38 rnd run: 4055526.60us
The randomized access result is an example of what happens when you have a complex data structure with multiple levels of object instances and data residing all over the place.
A super complex data structure is one that looks like
Code: Select all
obj->blah->foo->bar->more->stuff->Process()
All those pointer dereferences (or the C++ sugar that hides them) compound when you're running this inside a loop and each "obj" has its own unique "blah", "foo", ...
i.e. the execution sequence becomes the following if all the memory objects have not been thoughtfully allocated (the braces show each object's index in a memory pool)
Code: Select all
obj{0}->blah{300}->foo{1}->bar{523}->more{1}->stuff{87}->Process();
obj{1}->blah{0}->foo{634}->bar{3}->more{54}->stuff{1}->Process();
obj{2}->blah{109}->foo{32}->bar{276}->more{400}->stuff{543}->Process();
obj{3}->blah{239}->foo{298}->bar{102}->more{0}->stuff{200}->Process();
...
Even though the root "obj" elements are sequential in memory and likely don't trigger cache misses between accesses, the "blah" and other child elements do.
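Roughly, that kind of layout looks like the sketch below (the types and names are made up to mirror the example, not taken from any real codebase): every child is a separate heap allocation, so every "->" is a dependent load that can land on a different cache line or DRAM page.
Code: Select all
struct Stuff { int payload; int Process() { return payload; } };
struct More  { Stuff* stuff; };
struct Bar   { More*  more;  };
struct Foo   { Bar*   bar;   };
struct Blah  { Foo*   foo;   };
struct Obj   { Blah*  blah;  };

// Each iteration walks a chain of dependent pointer loads before Process()
// can even start; if the children were allocated separately over time, they
// end up scattered all over the heap and each hop is a likely cache miss.
int runAll(Obj** objs, int count)
{
    int sum = 0;
    for (int i = 0; i < count; i++)
        sum += objs[i]->blah->foo->bar->more->stuff->Process();
    return sum;
}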
One resolution is to lay out blah and the other children right after obj in memory, which effectively flattens obj.
Alternatively, re-sort and reorganize the child elements so that the blah of obj{0} sits right next to the blah of obj{1}, and so on for the other children.
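Here's a rough sketch of both approaches, again with hypothetical types purely for illustration: either embed the children by value so the whole chain for one object is a single contiguous block, or keep each child type in its own contiguous pool indexed in parent order.
Code: Select all
#include <vector>

// Plain child payloads for illustration (hypothetical, mirroring the names above).
struct Blah  { int payload; };
struct Foo   { int payload; };
struct Bar   { int payload; };
struct More  { int payload; };
struct Stuff { int payload; };

// Fix 1: flatten obj - the children live by value inside the parent, so
// obj{i} and all of its children share one contiguous block of memory.
struct FlatObj
{
    Blah  blah;
    Foo   foo;
    Bar   bar;
    More  more;
    Stuff stuff;
    int Process() { return blah.payload + stuff.payload; } // no pointer chasing
};

// Fix 2: keep each child type in its own pool, indexed in the same order as
// the parents, so the blah of obj{0} sits right next to the blah of obj{1}.
struct ObjPools
{
    std::vector<Blah>  blahs;  // blahs[i] belongs to object i
    std::vector<Foo>   foos;
    std::vector<Bar>   bars;
    std::vector<More>  mores;
    std::vector<Stuff> stuffs;

    int Process(int i) { return blahs[i].payload + stuffs[i].payload; }
};
Either way, a loop over all the objects now marches through memory mostly sequentially, like the "seq run" numbers above, instead of hopping around like the "rnd run" ones.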
(in processor land, a memory page is the same thing as a cache page, not to be confused with a virtual page, which is an OS thing)