Sunday, March 30, 2008

Circular ring of Buffers for Serial In/Out - fast!

In reviewing the waveforms (Logic Analyzer capture pictures in previous posts) we note that we are passing a character at a time from our serial receive Cog to its listening Cog, which is the worst performing of the Transmit and Receive sides. This hand-off originally limited the code to running at 230,400 baud. After some instruction choice changes, and after making the code more specific to interacting with the FTDI FT232R chip, I was able to improve this to working fairly reliably at 460,800 baud but no better. My goal in re-writing the Serial objects is to reduce this Cog interaction to only the needed boundaries, thereby allowing the serial transfer rates to be much faster and, hopefully, to attain our needed 1.5M baud.

Our study of the serial traffic has shown us that this application has string length maximums and an overall performance need. Let's make use of this knowledge and only transfer from Cog to Hub a long (four characters) at a time, within command (max string length) boundaries, rather than on character boundaries. In the highest traffic cases we will then have 6 transfers where we were seeing 22 transfers earlier. Let's also not make our Cogs wait on each other: by providing more than one buffer, one buffer can be transmitted while another is being filled. Likewise, a buffer can be filled by the receiving Cog while another is being acted upon by the command processing Cog.
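To make the 22-versus-6 arithmetic concrete, here is a minimal sketch in Python (not the Propeller code itself). The helper name `pack_into_longs` and the zero padding of the last long are assumptions made for illustration; the little-endian packing matches the Propeller's byte order.

```python
import struct

MAX_COMMAND_CHARS = 22  # longest command/message, per the traffic study
BYTES_PER_LONG = 4      # a Propeller long is 32 bits

def pack_into_longs(data: bytes) -> list[int]:
    """Pad data to a multiple of 4 bytes and pack it into 32-bit longs.

    Hypothetical helper: zero padding of the final long is an assumption.
    """
    padding = -len(data) % BYTES_PER_LONG
    padded = data + b"\x00" * padding
    count = len(padded) // BYTES_PER_LONG
    # "<" selects little-endian packing, "I" is an unsigned 32-bit value
    return list(struct.unpack(f"<{count}I", padded))

command = b"A" * MAX_COMMAND_CHARS   # stand-in for a worst-case command
longs = pack_into_longs(command)
print(len(command), "byte transfers vs", len(longs), "long transfers")
# prints: 22 byte transfers vs 6 long transfers
```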

In our serial transmit case we can transmit a buffer at full speed and not break stride until we next need another buffer. If another has been prepared we simply switch to it and start sending it; again at full speed.

In our serial receive case the original object was limited by our ability to dump characters into Main RAM. This severely impacted our ability to be ready to receive the next character. Since we now know that the characters streaming at us are broken into strings of at most the max command/message size, we can defer writing into Main RAM until we hit one of those natural boundaries where we have more time. This means that we can receive serial data from programs controlling the PropCAN at full serial rate without missing characters!

The command protocol for PropCAN ensures that activities are gated by acknowledgements. This means that only one, or sometimes a couple, of commands will arrive before an acknowledgement must be sent, at which time the sender will stop sending and wait for a response. This is great in that it allows our new routines to perform well without ever reaching the performance limit that they, too, have.

These new routines are built around a new data structure, the contents of which control the actions of the single producer and single consumer of the data contained within the structure. The new data structure is simple. We have an array of pointers to fixed size buffers. The number of pointers in the array is adjustable. Each fixed size buffer contains a length/flag byte, and the rest of the space is for the data (to be received or to be transmitted). The fixed length of these buffers is the same for all buffers pointed to by the array but is adjustable. For the PropCAN device we use 32-byte fixed length buffers, and the array pointing to them consists of 4 entries pointing to four unique 32-byte buffers. We chose 32 because it is the next power of two greater than our maximum 22-character command/message. We chose 4 rather arbitrarily (read: based on "no real data") but it can be adjusted separately for Transmit and for Receive as we discover what our real depth needs to be. The Transmit side and the Receive side each have their own independent data structure instance (array of pointers and the set of buffers to which they point).
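The layout described above can be sketched as follows. This is a Python model, not the Propeller code: the class name `BufferRing`, the use of a `bytearray` to stand in for Main RAM, and the placement of the length/flag byte at offset 0 of each buffer are all assumptions made for illustration.

```python
BUFFER_SIZE = 32    # bytes per buffer: next power of two above 22
BUFFER_COUNT = 4    # entries in the pointer array (chosen arbitrarily)

class BufferRing:
    """Hypothetical model of one instance: pointer array plus its buffers."""

    def __init__(self, count: int = BUFFER_COUNT, size: int = BUFFER_SIZE):
        self.size = size
        self.memory = bytearray(count * size)   # stands in for Main RAM
        # "Pointers" are offsets into memory, calculated once at startup
        self.pointers = [i * size for i in range(count)]

    def length(self, index: int) -> int:
        # Assumption: the length/flag byte is the first byte of the buffer
        return self.memory[self.pointers[index]]

    def capacity(self) -> int:
        # One byte of each buffer is reserved for the length/flag
        return self.size - 1

# Transmit and Receive each get an independent instance, and the depth
# of each can be tuned separately (6 here is just an example value).
tx_ring = BufferRing()
rx_ring = BufferRing(count=6)
print(tx_ring.pointers)     # [0, 32, 64, 96]
print(tx_ring.capacity())   # 31
```

With 31 data bytes available per buffer, the 22-character maximum command/message fits with room to spare.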

Did I catch you on my having an array of pointers to fixed length buffers? Isn't a concatenated set of fixed length buffers also an array? Why have the array of pointers to the array of buffers? The answer is really quite simple. It's the old standard trade-off between memory and performance. Think of this array of pointers as a pre-calculated set of answers. With the array of pointers I have much less code, and therefore much less execution time, in calculating which buffer will be used next. I simply move the preparation of these pre-calculated answers out of the critical path, from when I need to access buffers to when I'm starting up the PropCAN device, which is a much less time-critical point.

I mentioned that our length byte in each fixed buffer is really length and flag (dual purpose). Let's look at why I think this. The array of pointers to these buffers lets us easily treat the set of buffers as a circular list. So we'll let the producer start at array[0]'s buffer, and when the buffer is filled the last value written will be the length byte. The consumer will also start at array[0] and will not consume the buffer until the length byte becomes non-zero. When the consumer is finally done with the buffer it zeros out the length byte, moves on to array[1]'s buffer and waits for it to have a non-zero length. Likewise, our producer sets a length in array[0]'s buffer and then moves on to filling array[1]'s buffer. When either one finishes with the last buffer in the array (in this case array[3]'s) it wraps its index back to zero and starts again with array[0]'s buffer. So we have the buffers being used in a circular fashion, and we have the length field within each buffer being used as a "buffer is empty - can be filled" or "buffer is full - can be emptied" flag.
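The hand-off described above can be sketched in Python. This is a single-threaded model rather than the two-Cog Propeller code: the `Ring` class, its `try_put`/`try_get` names, the non-blocking return values and the length byte at offset 0 are all assumptions made for illustration. On the real device each Cog would keep its own index and would wait on the flag rather than return.

```python
class Ring:
    """Hypothetical model of the circular set of fixed size buffers."""

    def __init__(self, count: int = 4, size: int = 32):
        self.count, self.size = count, size
        self.memory = bytearray(count * size)          # stands in for Main RAM
        self.pointers = [i * size for i in range(count)]
        self.put_index = 0   # owned by the producer only
        self.get_index = 0   # owned by the consumer only

    def try_put(self, payload: bytes) -> bool:
        if not 0 < len(payload) < self.size:
            raise ValueError("payload must be 1..size-1 bytes")
        base = self.pointers[self.put_index]
        if self.memory[base] != 0:
            return False                 # buffer is full: cannot be filled yet
        self.memory[base + 1:base + 1 + len(payload)] = payload
        self.memory[base] = len(payload)  # length written LAST marks it full
        self.put_index = (self.put_index + 1) % self.count   # wrap to zero
        return True

    def try_get(self):
        base = self.pointers[self.get_index]
        length = self.memory[base]
        if length == 0:
            return None                  # buffer is empty: nothing to consume
        payload = bytes(self.memory[base + 1:base + 1 + length])
        self.memory[base] = 0            # zeroing the length marks it empty
        self.get_index = (self.get_index + 1) % self.count   # wrap to zero
        return payload

ring = Ring()
print([ring.try_put(b"msg%d" % i) for i in range(5)])
# [True, True, True, True, False]  (the fifth finds array[0]'s buffer full)
print(ring.try_get())        # b'msg0'
print(ring.try_put(b"msg4")) # True: array[0]'s buffer was just released
```

Because the length byte is the only thing both sides touch, and each side writes it only when it is done with the buffer, no separate lock or shared index is needed in this model.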

Well, it's time to end this post but first let me describe the state of the code since my Wednesday post. I've implemented and tested the new "circular-fixed size buffer" handling Serial Receive and Serial Transmit objects. I've run them at baud rates from 19,200 to 923,076 without error or data loss. Tomorrow I'll update my FTDI driver installation on my Windows XP PC so I can test them at 1.5M baud and 2M baud (I know they won't run at 3M baud so I won't go to the FTDI 3M baud during my testing...<sigh>)

My next post will show Logic Analyzer waveforms of the areas shown in earlier posts where we saw the serial traffic taking so long. We hope to see the serial traffic no longer dominating the waveform as it did in the past. We hope now to be taking up only as much time as we really need!
