Now we're speaking my language
Imperial Good wrote: ↑Mon, 3. Jan 22, 06:05
I do not see where in your snippet serialization is meant to occur. A big limitation on save performance is probably due to serialization, since a complex entity component graph has to be squished into a flat form with well-defined ordering. When saving, iterating this graph is apparently what is taking most of the time.
I couldn't have said it better myself, and the writeup on the original sample has a section called
"This isn't actually saving anything to disk..."
Honestly, I hid the thread differentiation more than I should have in the original example in the name of getting it down to 100 lines (enough to fit comfortably on a vertical monitor) -- again, designed for general consumption.
In both samples, the last thread performs double duty as a standard processing thread until a save state is flagged, then switches to dedicated save mode... but no serialization / data marshalling occurs, simply because it wasn't the point of the exercise. It could spend 20 minutes serializing to XML if you wanted it to (with a little spinner in the corner of the screen, of course) and it would make no difference, other than the inability to quick-save again in the meantime -- which technically could be alleviated by using a stack instead of a hard pair, but again, that's outside the scope of a focused example.
Point being, serialization should be occurring on a temporarily dedicated thread from snapshot data -- it shouldn't matter how long serialization takes.
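To make the "double duty" idea concrete, here's a minimal sketch of a thread that does normal per-tick processing until a save is flagged, then flips into dedicated save mode and serializes a snapshot while the rest of the pool keeps simulating. All the names (`save_worker`, `world_snapshot`, and so on) are illustrative, not from the original sample.

```python
import json
import threading

save_requested = threading.Event()
world_snapshot = {}            # captured atomically when the save is flagged
results = []

def save_worker():
    # Double duty: behave as a standard processing thread until the
    # flag goes up.
    while not save_requested.wait(timeout=0.01):
        pass                   # ... one tick of ordinary processing ...
    # Dedicated save mode: however long this takes (XML, spinner and
    # all), it only reads the snapshot, never the live world state.
    results.append(json.dumps(world_snapshot, sort_keys=True))

t = threading.Thread(target=save_worker)
t.start()
world_snapshot.update({"entities": [{"id": 1, "hp": 100}]})
save_requested.set()
t.join()
print(results[0])
```

The point is the hand-off: once the flag is set, the main loop never touches the snapshot again, so serialization time is irrelevant to frame time.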
Imperial Good wrote: ↑Mon, 3. Jan 22, 06:05
The problem fundamentally stems from resolving relationships. An object, such as an entity, could be referenced by multiple other objects, such as component states. For performance these references are usually kept in a machine friendly way such as address pointers or offsets into a chunk of virtual memory. During serialisation such references to an object are all resolved to the same unique identifier which can then be used to find and rebuild the references to the object during deserialisation.
Standard entity model -- so now I'm assuming we're using a table/document model, as opposed to a simple owner hierarchy.
Imperial Good wrote: ↑Mon, 3. Jan 22, 06:05
Trying to multithread such a process efficiently is non-trivial, since the resulting object reference identifiers from all threads must be consistent and no object should be duplicated. Trying to join the results from multiple threads would likely have performance limited by the joining thread which ends up doing similar work to a less threaded implementation.
Ok, pause here for a second.
Either you're telling me the entire modeled dataset is managed in real time on a single thread, or you're at least saying it's problematic to access said machine-friendly structure from multiple threads.
I can't imagine the latter is true. It shouldn't be.
A standard entity is effectively just an ID or a pointer to the object's memory location -- if it's being used for serialization, I presume it's the former.
In either case, object references within offset data should be equally accessible to any thread so long as said access is synchronized.
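For the record, the reference-resolution step being described is mechanically simple: at save time, every in-memory object reference is mapped to a stable unique ID so the graph can be flattened and rebuilt on load. A hedged sketch, with entirely illustrative names:

```python
# Two-pass pointer-to-ID resolution: first assign each object a stable
# ID (its save-order index), then emit flat records with raw references
# replaced by those IDs.

class Entity:
    def __init__(self, name):
        self.name = name
        self.target = None     # raw in-memory reference to another Entity

def serialize(entities):
    # Pass 1: stable unique ID per object.
    ids = {id(e): i for i, e in enumerate(entities)}
    # Pass 2: flat records, references resolved to IDs.
    return [
        {"id": ids[id(e)], "name": e.name,
         "target": ids[id(e.target)] if e.target else None}
        for e in entities
    ]

a, b = Entity("station"), Entity("ship")
b.target = a                   # ship references station via a raw pointer
print(serialize([a, b]))
# [{'id': 0, 'name': 'station', 'target': None},
#  {'id': 1, 'name': 'ship', 'target': 0}]
```

Nothing about that mapping requires a single thread, so long as ID assignment (or access to the table) is synchronized.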
Entities have their own, let's say, "hierarchical chain of data processors".
I'd expect they're grouped into or processed by threads according to priority or locality (e.g. the distant world is less important "right now" than nearby entities), and should be managing their own data blocks with priority-level timing -- or at least sending messages to a dedicated state-write thread at said priority level (but god, that's messy).
Imperial Good wrote: ↑Mon, 3. Jan 22, 06:05
Trying to have the threads coordinate with each other likely results in such lock bottlenecking and overhead that less threaded solutions are faster.
On general principle, agree to disagree.
In my experience, so long as most of the world is read, not written, per cycle (be that state, physics, rendering, whatever), shared locking across as many available threads as possible is more efficient.
But my experience has no bearing on code I can't see, so I'll gladly cede that likelihood in the name of finding the right solution.
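Here's what I mean by read-mostly shared locking. Python's stdlib has no reader-writer lock, so this builds a minimal one from a `Condition`; a real engine would reach for something like `std::shared_mutex` instead. A sketch of the pattern, not anyone's actual code:

```python
import threading

class SharedLock:
    # Many concurrent readers, one exclusive writer.
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def acquire_read(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()

lock = SharedLock()
state = {"tick": 0}
seen = []

def reader():
    lock.acquire_read()        # any number of these can hold the lock at once
    seen.append(state["tick"])
    lock.release_read()

threads = [threading.Thread(target=reader) for _ in range(8)]
lock.acquire_write()           # the rare writer excludes everyone
state["tick"] = 1
lock.release_write()
for t in threads: t.start()
for t in threads: t.join()
print(seen)
```

If writes are rare per cycle, readers almost never block each other, which is the whole efficiency argument.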
Imperial Good wrote: ↑Mon, 3. Jan 22, 06:05
Not all data has such complex relationships. For example bulk data in Minecraft or Factorio representing a chunk of terrain. In such case it is certainly possible to multi thread the serialisation of such data ...
The only data in X4 I can think of which might benefit from this is the saving of current yield data, something which is likely so small that trivial time is spent serialising it anyway, especially if the data is organised efficiently.
Maybe our thought trains are traveling in opposite directions here.
I'm not suggesting multi-threading serialization.
I mean, if you wanted to break down entity groups and you're drawing off a thread pool and there are some free threads that you could divide them between, go for it.
But that's not what I'm suggesting -- quite the opposite.
I'm suggesting keeping the serialization on one (or more) predetermined background thread(s) that can read from a temporary snapshot state, and leave the rest to keep the game running.
Granted, you're down (or up) a thread, and the sooner the save completes, the sooner delta memory gets freed, but I get the feeling most of the game's memory consumption doesn't consist of low-level, numeric, compressed, offset, real-time data blocks anyway -- it's assets. That is to say, I presume the non-aggregate low level numeric delta over a 60 second period isn't 4+ GB.
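The snapshot-plus-background-serialization idea can be sketched like so: once a save is flagged, writes are redirected into an overlay (delta) layer while a background thread serializes the untouched base -- the overlayfs analogy. A toy sketch under assumed names (`SnapshotState`, `merge`, etc.), not X4's actual code:

```python
import json
import threading

class SnapshotState:
    def __init__(self, data):
        self.base = data       # frozen at save time
        self.overlay = {}      # all writes land here during the save

    def write(self, key, value):
        self.overlay[key] = value

    def read(self, key):
        # Current-state read: overlay wins, base is the fallback.
        return self.overlay.get(key, self.base.get(key))

    def merge(self):
        # Save finished: fold the delta back into its original slots.
        self.base.update(self.overlay)
        self.overlay.clear()

state = SnapshotState({"credits": 100, "hull": 80})
saved = []
t = threading.Thread(
    target=lambda: saved.append(json.dumps(state.base, sort_keys=True)))
state.write("credits", 150)    # game keeps running mid-save
t.start(); t.join()
state.merge()                  # delta memory freed once the save completes
print(saved[0], state.read("credits"))
```

The saved file sees the world as it was at the moment of the flag; the live game never stalls; and the overlay is exactly the "delta memory" that gets freed when the save completes.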
Imperial Good wrote: ↑Mon, 3. Jan 22, 06:05
This seems like an attempt to imitate Linux process forking...
Not what I was going for, but glad you pegged the Linux bit.
The copy-on-write bit aligns -- Overlayfs and AUFS came to mind when I styled this particular in-place snapshot example -- suppose it's also why I mentioned whiteouts and tombstones.
You trailed off a bit there from "like a fork" to "an actual fork", and I'm certainly not proposing a proc fork -- both because of page copy (the biggest problem with casper/tmpfs+overlayfs, and btrfs to a smaller degree) and because it's entirely unnecessary here.
The main process already has the data to be saved -- it's not on disk, it's in available memory, yes?
We know when we want to save it.
We have to keep writing world state changes.
All I'm saying is we don't have to write them to the same place if we can imitate the desired read.
If we're talking about block offsets, sure, that adds a layer of complexity.
It's like asking for the (i)th array element every time, and then, when a switch is flipped, suddenly saying -- you know what? Forget that array; instead, read (n) bytes from arbitrary location (j).
If you're stuck with twiddling inside pre-allocated blocks, it'd suck to guess at sizes and pre-allocate a scratch space...
And yet, that's precisely what I'd propose... and then merge those deltas to their original slots when the save's complete.
While it takes a bit more mental gymnastics, it's really no different than working with a multi-dimensional linked list.
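For the pre-allocated-block case specifically, the same trick works by offset: a scratch space captures writes keyed by block offset while the save reads the original slots, and the deltas get merged back afterwards. Entirely illustrative names, and trivially small sizes:

```python
block = bytearray(b"\x01\x02\x03\x04")   # the "real" pre-allocated slots
delta = {}                               # offset -> new byte value

def write(i, value, saving):
    if saving:
        delta[i] = value                 # redirect into scratch space
    else:
        block[i] = value

def read(i):
    return delta.get(i, block[i])        # delta wins over the original slot

def merge_deltas():
    for i, v in delta.items():           # save done: fold back into place
        block[i] = v
    delta.clear()

write(2, 0x99, saving=True)              # world keeps changing mid-save
assert bytes(block) == b"\x01\x02\x03\x04"   # snapshot slots untouched
assert read(2) == 0x99                   # live reads see the new value
merge_deltas()
assert block[2] == 0x99                  # original layout restored
```

The indirection costs one dictionary lookup per flagged read -- the linked-list-style mental gymnastics, with the original contiguous layout restored the moment the save completes.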