In my last post I talked about some issues that come with using UDP and some approaches to dealing with them. In this post I'm going to describe my solution, a message dependency graph.
On the sending side, each guaranteed message has a sequence number associated with it. The sequence number is incremented each time a message is sent. When a message is sent, its sequence number is stored in a user-provided MessageDependency object. Later, when you send a message that depends on it, you call addDependency() and pass in that MessageDependency instance. You might wonder why I have a MessageDependency object instead of just using the sequence number directly. There is a good reason, which will be important later on.
Included in each message's header is the sequence number of that message as well as a list of dependencies. In my implementation, sequence numbers are 2-byte unsigned integers in the range [0,32767] which wrap around when advancing past their maximum value. I didn't use the full 16-bit range because the first bit is used as a "list termination" flag - if the first bit is 0, it signals the end of the list.
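To make the header layout concrete, here is a rough C++ sketch. It is illustrative only and not the actual implementation: it assumes the "first bit" is the most significant bit of each 16-bit entry, that entries are written big-endian, and that the message's own sequence number is the first entry in the flagged list. The names nextSeq, encodeHeader and decodeHeader are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sequence numbers use the low 15 bits; the top bit of each entry is the flag.
constexpr std::uint16_t kSeqMask  = 0x7FFF;
constexpr std::uint16_t kMoreFlag = 0x8000; // 1 = another entry follows, 0 = end of list

// Advance a sequence number, wrapping from 32767 back to 0.
std::uint16_t nextSeq(std::uint16_t seq) {
    return static_cast<std::uint16_t>((seq + 1) & kSeqMask);
}

// Append one 16-bit entry to the buffer (big-endian is an assumption).
static void putEntry(std::vector<std::uint8_t>& out, std::uint16_t seq, bool more) {
    assert(seq <= kSeqMask);
    const std::uint16_t v = static_cast<std::uint16_t>(seq | (more ? kMoreFlag : 0));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
}

// Assumed layout: [own seq][dep 0][dep 1]... where only the last entry has flag 0.
std::vector<std::uint8_t> encodeHeader(std::uint16_t seq,
                                       const std::vector<std::uint16_t>& deps) {
    std::vector<std::uint8_t> out;
    putEntry(out, seq, !deps.empty());
    for (std::size_t i = 0; i < deps.size(); ++i)
        putEntry(out, deps[i], i + 1 < deps.size());
    return out;
}

// Returns false if the buffer ends before an entry with flag 0 is seen.
bool decodeHeader(const std::vector<std::uint8_t>& in,
                  std::uint16_t& seq, std::vector<std::uint16_t>& deps) {
    deps.clear();
    bool first = true;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        const std::uint16_t v = static_cast<std::uint16_t>((in[i] << 8) | in[i + 1]);
        const std::uint16_t value = static_cast<std::uint16_t>(v & kSeqMask);
        if (first) { seq = value; first = false; }
        else       { deps.push_back(value); }
        if ((v & kMoreFlag) == 0) return true; // terminating entry
    }
    return false;
}
```

With this layout a message with n dependencies carries 2+2n bytes of header, and a message with no dependencies carries just the 2-byte sequence number.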
The following algorithm is used on the receiving side. When a new message arrives, a node in the graph is added for that message (if it doesn't already exist), and the node is initially flagged as "not executed". Next, the message nodes for each dependency are examined (if a dependency node doesn't yet exist, it is added and flagged as "not executed"). A directed edge is added pointing from each not-yet-executed dependency to the new message node. If the dependency has already been executed, then we don't add an edge between the two nodes.
Once the node has been added to the graph, we attempt to recursively execute any ready nodes. The algorithm for this is: start at node N. If N has any incoming edges, terminate, as this means it still has pending dependencies. Otherwise, if it has no incoming edges, execute it and flag it as "executed". Then, remove all outgoing edges from that node so that its successors know that it's been executed. Then recursively perform this algorithm on each successor.
Here is some pseudocode:
void addMessage( seqNumber, dependencies[] )
    n = getNode( seqNumber )
    if n == null
        n = addNode( seqNumber )
        n.executed = false
    // don't do anything if we've already executed the message
    if n.executed
        return
    // add each dependency node
    for depSeqNumber in dependencies
        d = getNode( depSeqNumber )
        if d == null
            d = addNode( depSeqNumber )
            d.executed = false
        // only unexecuted dependencies get an edge
        if !d.executed
            addEdge( d, n )

void tryExecuteMessage( seqNumber )
    // check to see if all dependencies have been completed
    n = getNode( seqNumber )
    // never execute the same message twice (e.g. a duplicate arrival)
    if n.executed
        return
    // if there are no pending dependencies, this message is ready to be executed
    if n.predecessorList is empty
        // execute the message
        executeMessage( n )
        n.executed = true
        // try executing all messages dependent on this one
        // (iterate over a copy, since removeEdge modifies the successor list)
        for depSequenceNumber in copy of n.successorList
            d = getNode( depSequenceNumber )
            removeEdge( n, d )
            tryExecuteMessage( depSequenceNumber )
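For readers who prefer something compilable, the same algorithm can be sketched in C++ roughly as follows. This is one possible shape under my own assumptions, not the actual implementation: the class name DependencyGraph, the std::map/std::set storage, and the isExecuted() helper are all invented for the example, and the execute callback stands in for executeMessage().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <utility>
#include <vector>

class DependencyGraph {
public:
    using Seq = std::uint16_t;

    // The callback plays the role of executeMessage() in the pseudocode.
    explicit DependencyGraph(std::function<void(Seq)> execute)
        : execute_(std::move(execute)) {
        assert(execute_);
    }

    void addMessage(Seq seq, const std::vector<Seq>& deps) {
        Node& n = nodes_[seq];           // creates a "not executed" node if missing
        if (n.executed) return;          // duplicate of an already executed message
        for (Seq dep : deps) {
            Node& d = nodes_[dep];       // placeholder if the dependency hasn't arrived
            if (!d.executed) {           // edge d -> n only for pending dependencies
                d.successors.insert(seq);
                n.predecessors.insert(dep);
            }
        }
    }

    void tryExecuteMessage(Seq seq) {
        auto it = nodes_.find(seq);
        if (it == nodes_.end()) return;
        Node& n = it->second;
        if (n.executed || !n.predecessors.empty()) return;
        execute_(seq);
        n.executed = true;
        // Detach the successor set before recursing so we never iterate
        // a container that is being modified.
        std::set<Seq> successors;
        successors.swap(n.successors);
        for (Seq s : successors) {
            nodes_[s].predecessors.erase(seq);
            tryExecuteMessage(s);
        }
    }

    bool isExecuted(Seq seq) const {
        auto it = nodes_.find(seq);
        return it != nodes_.end() && it->second.executed;
    }

private:
    struct Node {
        bool executed = false;
        std::set<Seq> successors;    // messages waiting on this one
        std::set<Seq> predecessors;  // pending dependencies of this one
    };
    std::map<Seq, Node> nodes_;      // std::map keeps references stable on insert
    std::function<void(Seq)> execute_;
};
```

On the receiving side you would call addMessage() followed by tryExecuteMessage() for each arriving message. If message 2 (depending on 1) arrives first, nothing runs until message 1 arrives, at which point 1 and then 2 are executed.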
One detail is that you should remove nodes from the graph eventually. Not only does this prevent a memory leak, but it's also required because sequence numbers wrap around. If you don't remove nodes, then after a while ("after a while", the mark of a bug full of pain and suffering!) you'll start dropping messages when the wrap occurs because the graph thinks they've already been executed. I chose to start removing nodes after they've been around for the connection timeout limit plus a second of padding. I also made it so that if an unexecuted node reaches this removal timeout, a connection error is signaled.
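A minimal sketch of that expiry rule in C++, assuming each node records when it was created. The TimedNode struct and pruneExpired() function are hypothetical names for illustration; a full version would also have to drop any edges that point at the removed nodes.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>

using Clock = std::chrono::steady_clock;

struct TimedNode {
    bool executed = false;
    Clock::time_point created;
};

// Removes every node older than connectionTimeout + padding.
// Returns false if any removed node was still unexecuted, which the
// caller should treat as a connection error.
bool pruneExpired(std::map<std::uint16_t, TimedNode>& nodes,
                  Clock::time_point now,
                  Clock::duration connectionTimeout,
                  Clock::duration padding = std::chrono::seconds(1)) {
    assert(connectionTimeout >= Clock::duration::zero());
    bool ok = true;
    for (auto it = nodes.begin(); it != nodes.end();) {
        if (now - it->second.created >= connectionTimeout + padding) {
            if (!it->second.executed) ok = false; // expired while still pending
            it = nodes.erase(it);                 // erase returns the next iterator
        } else {
            ++it;
        }
    }
    return ok;
}
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the rule easy to test.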
So far the implementation is pretty good. But there are still two major issues with it. The first issue is subtle, but fundamental. The problem is "old dependencies". Here's an example: suppose you send a message to spawn an enemy, and the message has sequence number 10. Then the player leaves the area, so the enemy goes idle. After an hour, the player returns and kills the enemy. The "kill enemy" message has a dependency on the "spawn enemy" message. But since it's been an hour, the sequence numbers have wrapped around, and "10" now refers to a completely different message. In fact, most likely, "10" wouldn't refer to any message at all, since the node containing message 10 has most likely been removed from the graph. Clearly this will not work at all.
The second issue is something that all network game programmers should be concerned about: bandwidth. Each guaranteed message with n dependencies is an additional 2+2n bytes. Using the plain old ordered channels method, each guaranteed message will cost only an additional 2 bytes. We need to find a way to cut down on all the extra dependency data.
Fortunately, there's something we can do that can solve both of these problems at once! The idea is to determine, on the sending side, when a message must have been executed on the receiving side. Once we've determined that a message has been executed, we can simply remove any dependencies on that message. Remember that the reason we have dependencies in the first place, such as "B depends on A", is to ensure that A gets executed before B. And if we've determined that A has already been executed, we've already met that requirement!
To do this, we keep a local copy of the message dependency graph on the sending side in addition to the one on the receiving side. In the local graph, a node is added whenever a message is sent, but that node is initially marked as "not executed" (since when we first send it, it hasn't been executed on the receiving side yet). When we send a packet containing guaranteed messages, we keep a list of the sequence numbers of the messages in that packet. Then, when we receive an ACK for that packet, we call the tryExecuteMessage() function for each guaranteed message that was in it. This will cause the local dependency graph to be evaluated just as it is on the receiving side.
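The packet bookkeeping described here could look something like the following C++ sketch. The AckTracker class, the 32-bit packet ID, and the onAcked callback are assumptions made for the example; the callback is where the call to tryExecuteMessage() on the local graph would go.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

class AckTracker {
public:
    using PacketId = std::uint32_t; // hypothetical packet identifier
    using Seq = std::uint16_t;

    // onAcked stands in for tryExecuteMessage() on the sender's local graph.
    explicit AckTracker(std::function<void(Seq)> onAcked)
        : onAcked_(std::move(onAcked)) {
        assert(onAcked_);
    }

    // Remember which guaranteed messages went out in this packet.
    void packetSent(PacketId id, std::vector<Seq> seqs) {
        inFlight_[id] = std::move(seqs);
    }

    // On ACK, evaluate each message that was in the packet, then forget it.
    void packetAcked(PacketId id) {
        auto it = inFlight_.find(id);
        if (it == inFlight_.end()) return; // unknown or duplicate ACK
        const std::vector<Seq> seqs = std::move(it->second);
        inFlight_.erase(it);
        for (Seq s : seqs) onAcked_(s);
    }

    std::size_t inFlightCount() const { return inFlight_.size(); }

private:
    std::map<PacketId, std::vector<Seq>> inFlight_;
    std::function<void(Seq)> onAcked_;
};
```

If a guaranteed message is resent in a later packet, an ACK for any packet that carried it is enough to evaluate it locally, so the graph side needs to tolerate being asked about the same sequence number more than once.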
Locally, when evaluating messages in the dependency graph, we don't want to actually evaluate the data sent along with the message as we would on the receiver. Instead, whenever a message is executed, we want to remove future dependencies on it. (Implement executeMessage() as a function pointer to be able to easily interchange this functionality). Remember how above I talked about using a MessageDependency object instead of passing around the raw sequence numbers? This now becomes important.
First of all, take a look at my endMessage() method's signature:
bool endMessage( MessageDependency * dependency = NULL );
The important thing to note is that rather than returning a dependency object, I pass in an existing one to be filled out (or optionally none). Internally, when the function is called, a pointer to the dependency object passed in is stored along with the message. This is because the MessageDependency contains not only the message sequence number, but also an "executed" flag. Whenever a message is executed in the local graph, the corresponding dependency object's "executed" flag is set, so that the next time the object is passed into a call to addDependency(), it is simply ignored. Everything is completely transparent to the user!
(The implementation of this takes a bit of care; for example, the MessageDependency class should not be copyable, and its destructor needs to make sure to clear the external reference to it.)
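To illustrate the kind of care involved, here is a small C++ sketch of a non-copyable dependency object whose destructor clears the external pointer to it. Everything beyond the MessageDependency name is an assumption for the example: SentMessage is a hypothetical per-message record on the sender, and attach(), markExecuted() and isPending() are invented method names. The sketch also assumes the SentMessage record outlives the attachment.

```cpp
#include <cassert>
#include <cstdint>

class MessageDependency;

// Hypothetical sender-side record, one per in-flight guaranteed message.
struct SentMessage {
    MessageDependency* dependency = nullptr; // not owned; may be cleared at any time
};

class MessageDependency {
public:
    MessageDependency() = default;
    MessageDependency(const MessageDependency&) = delete;            // not copyable
    MessageDependency& operator=(const MessageDependency&) = delete;
    ~MessageDependency() { detach(); } // clear the external reference to us

    // What endMessage() might do: bind this object to the message just sent.
    void attach(SentMessage& owner, std::uint16_t seq) {
        detach();
        owner_ = &owner;
        seq_ = seq;
        executed_ = false;
        owner.dependency = this;
    }

    // Called when the local graph executes the message.
    void markExecuted() {
        executed_ = true;
        detach();
    }

    // addDependency() would skip any object for which this returns false.
    bool isPending() const { return owner_ != nullptr && !executed_; }
    std::uint16_t sequenceNumber() const { return seq_; }

private:
    void detach() {
        if (owner_ != nullptr) {
            assert(owner_->dependency == this);
            owner_->dependency = nullptr;
            owner_ = nullptr;
        }
    }

    SentMessage* owner_ = nullptr; // assumed to outlive the attachment
    std::uint16_t seq_ = 0;
    bool executed_ = false;
};
```

Deleting the copy operations matters because the sender stores a raw pointer to one specific object; a copy would carry the sequence number but would never see the "executed" update.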
Let's reexamine each of the above issues with this new local graph solution. The first issue was "old dependencies". E.g. message A is "spawn enemy", message B is "kill enemy". Message B is dependent on message A but occurs a long time after. With the local graph solution, an ACK for message A will be quickly received (usually in the connection round trip time), and A's dependency object will be marked as "executed". When message B is sent a long time later, the dependency won't even be added. First problem solved.
The second issue was bandwidth. Since we're quickly removing dependencies, the 2n term in the number of additional bytes needed per message will rapidly disappear, and we'll just be left with 2 bytes for the sequence number, which is just as good as having an ordered channel. Of course, this won't be the case when rapidly sending a ton of guaranteed messages with many interdependencies, but I can't imagine very many scenarios where this would occur, so I think it's safe to call the bandwidth issue "solved".
So far my implementation has been working great. I'd be interested in hearing your thoughts on the idea, or results if anyone has used something like it before.