
Sunday, March 30, 2008

Circular ring of Buffers for Serial In/Out - fast!

In reviewing the waveforms (the Logic Analyzer captures in previous posts) we note that passing a character at a time from our serial receive Cog to its listening Cog is the worst-performing part of the Transmit and Receive sides. This hand-off originally limited the code to running at 230,400 baud. After some instruction-choice changes and making the code more specific to interacting with the FTDI FT232R chip I was able to get it working fairly reliably at 460,800 baud, but no better. My goal in re-writing the Serial objects is to reduce this Cog interaction to only the boundaries where it is needed, thereby allowing the serial transfer rates to be much faster and, hopefully, attaining our needed 1.5M baud.

Our study of the serial traffic has shown us that this application has a maximum string length and an overall performance need. Let's make use of this knowledge and only transfer from Cog to Hub a long at a time, within command (max string length) boundaries, not character boundaries. In the highest-traffic cases we will then have 6 transfers where we were seeing 22 transfers earlier. Let's also not make our Cogs wait on each other: by providing more than one buffer, one buffer can be transmitted while another is being filled. Likewise, one buffer can be filled by the receiving Cog while another is being acted upon by the command-processing Cog.

In our serial transmit case we can transmit a buffer at full speed and not break stride until we next need another buffer. If another has been prepared we simply switch to it and start sending it; again at full speed.

In our serial receive case the original object was limited by our ability to dump characters into Main RAM. This severely impacted our ability to be ready to receive the next character. Since we now know that the characters streaming at us are broken into max command/message-size strings, we can defer writing into Main RAM until we hit one of those natural boundaries where we have more time. This means that we can receive serial data from programs controlling the PropCAN at the full serial rate without missing characters!

The command protocol for PropCAN ensures that activities are gated by acknowledgements, which means that only one, or sometimes a couple, of commands will arrive before an acknowledgement must be sent, at which time the sender stops sending and waits for a response. This is great in that it keeps the traffic well within what our new routines can handle, leaving them performance headroom of their own.

These new routines are built around a new data structure whose contents control the actions of the single producer and the single consumer of the data held within it. The structure is simple: an array of pointers to fixed-size buffers. The number of pointers in the array is adjustable. Each fixed-size buffer contains a length/flag byte, and the rest of the space is for the data (to be received or to be transmitted). The fixed length is the same for all buffers pointed to by the array but is also adjustable. For the PropCAN device we use 32-byte buffers, and the array pointing to them has 4 entries pointing to four unique 32-byte buffers. We chose 32 because it is the next power of two greater than our maximum 22-character command/message. We chose 4 rather arbitrarily (read: based on "no real data"), but it can be adjusted separately for Transmit and for Receive as we discover what our real depth needs to be. The Transmit and Receive sides each have their own independent instance of this structure (array of pointers and the set of buffers to which they point).
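
To make that concrete, here is a minimal sketch of the structure in C. The real objects are written in Spin/PASM for the Propeller; names like bufferRing_t and ringInit are mine for illustration, not the actual PropCAN code, and only the 4-buffer / 32-byte sizing comes from the description above.

  #include <stdint.h>
  #include <string.h>

  #define NBR_BUFFERS  4     /* ring depth: adjustable, separately for Tx and Rx */
  #define BUFFER_SIZE  32    /* next power of two above our 22-char max command  */

  /* One fixed-size buffer: byte [0] is the length/flag byte, the rest is data. */
  typedef struct {
      volatile uint8_t lengthFlag;      /* 0 = empty, non-zero = count of data bytes
                                           (volatile: polled by the other side)    */
      uint8_t data[BUFFER_SIZE - 1];
  } fixedBuffer_t;

  /* The structure itself: an array of pointers to the fixed-size buffers.
   * Transmit and Receive each get their own independent instance. */
  typedef struct {
      fixedBuffer_t *bufferPtr[NBR_BUFFERS];
      fixedBuffer_t  buffers[NBR_BUFFERS];
  } bufferRing_t;

  /* Pre-calculate the buffer addresses once, at start-up, so that no address
   * arithmetic is left in the time-critical send/receive paths. */
  static void ringInit(bufferRing_t *ring)
  {
      memset(ring, 0, sizeof(*ring));
      for (int i = 0; i < NBR_BUFFERS; i++) {
          ring->bufferPtr[i] = &ring->buffers[i];
      }
  }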

Did I catch you wondering about my having an array of pointers to fixed-length buffers? Isn't a concatenated set of fixed-length buffers also an array? Why have an array of pointers to an array of buffers? The answer is really quite simple: it's the old standard trade-off between memory and performance. Think of the array of pointers as a pre-calculated set of answers. With it I have much less code, and therefore much less execution time, spent calculating which buffer will be used next. I simply move the preparation of these pre-calculated answers out of the critical path, from when I need to access buffers to when I'm starting up the PropCAN device, a much less time-critical point.

I mentioned that the length byte in each fixed buffer is really both a length and a flag (dual purpose). Let's look at why. The array of pointers to these buffers lets us easily treat the set of buffers as a circular list. The producer starts at array[0]'s buffer, and when that buffer is filled the last value written is the length byte. The consumer also starts at array[0] and will not consume the buffer until the length byte becomes non-zero. When the consumer is finally done with the buffer it zeroes out the length byte and moves on to array[1]'s buffer, waiting for it to have a non-zero length. Likewise, our producer sets the length in array[0]'s buffer and then moves on to filling array[1]'s buffer. When either one finishes the last buffer in the array (in this case array[3]'s) it wraps its index back to zero and starts again with array[0]'s buffer. So the buffers are used in a circular fashion, and the length field within each buffer acts as a "buffer is empty - can be filled" or "buffer is full - can be emptied" flag.
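
Building on the sketch above, the producer and consumer walks might look like the following in C. Again, this is illustrative code rather than the Spin/PASM in PropCAN; the busy-wait loops stand in for a Cog spinning on the length byte in Hub RAM, and len is assumed to fit in the 31 data bytes.

  /* Producer side (e.g. the serial receive Cog): fill the next empty buffer,
   * writing the non-zero length byte LAST so it doubles as the "full" flag. */
  static void producerPut(bufferRing_t *ring, const uint8_t *msg, uint8_t len)
  {
      static int putIdx = 0;                  /* which buffer we fill next      */
      fixedBuffer_t *buf = ring->bufferPtr[putIdx];

      while (buf->lengthFlag != 0) {          /* still full: consumer not done  */
          /* spin */
      }
      memcpy(buf->data, msg, len);
      buf->lengthFlag = len;                  /* non-zero length = "full"       */
      putIdx = (putIdx + 1) % NBR_BUFFERS;    /* after array[3], wrap to [0]    */
  }

  /* Consumer side (e.g. the command-processing Cog): wait for the next buffer
   * to fill, use it, then release it by zeroing the length byte. */
  static uint8_t consumerGet(bufferRing_t *ring, uint8_t *msgOut)
  {
      static int getIdx = 0;                  /* which buffer we drain next     */
      fixedBuffer_t *buf = ring->bufferPtr[getIdx];

      while (buf->lengthFlag == 0) {          /* still empty: producer not done */
          /* spin */
      }
      uint8_t len = buf->lengthFlag;
      memcpy(msgOut, buf->data, len);
      buf->lengthFlag = 0;                    /* zero length = "empty" again    */
      getIdx = (getIdx + 1) % NBR_BUFFERS;    /* after array[3], wrap to [0]    */
      return len;
  }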

Well, it's time to end this post, but first let me describe the state of the code since my Wednesday post. I've implemented and tested the new "circular fixed-size buffer" Serial Receive and Serial Transmit objects. I've run them at baud rates from 19,200 to 923,076 without error or data loss. Tomorrow I'll update my FTDI driver installation on my Windows XP PC so I can test them at 1.5M baud and 2M baud (I know they won't run at 3M baud, so I won't go to the FTDI 3M baud during my testing... <sigh>).

My next post will show Logic Analyzer waveforms of the areas shown in earlier posts where we saw the serial traffic taking so long. We hope to see the serial traffic no longer dominating the waveform as it did in the past, and to now be taking up only as much time as we really need!

Performance - a quick review

I've not posted since Wednesday because I've been off writing new Propeller objects. Both serial transmit and serial receive were limited to 460,800 baud after my last rewrite of them. PropCAN needs to be able to run at ~1,500,000 baud if we want to keep up with the best possible rate of messages arriving from the CAN bus. These numbers motivated the re-writes. Let's quickly review two of our key performance criteria.

The AMSAT CAN bus runs at 800,000 bps. We have three CAN message sizes in the protocol: 0-byte payloads, 2-byte payloads, and 8-byte payloads. If the bus traffic were a maximum-rate stream of each of these message sizes, the traffic rates would be as follows:
  • 0-byte payload: min. of 47 bits per message, max. of 17,021 messages/sec.
  • 2-byte payload: min. of 63 bits per message, max. of 12,698 messages/sec.
  • 8-byte payload: min. of 111 bits per message, max. of 7,207 messages/sec.

PropCAN translates these messages into ASCII strings and sends them up the USB interface to the connected PC. After formatting in ASCII, let's look at what happens to the message size:

  • 0-byte payload: tiiiL<CR> (6 chars x 10 bits x 17,021/sec = 1,021,277 bits per sec.)
  • 2-byte payload: tiiiL0011<CR> (10 chars x 10 bits x 12,698/sec = 1,269,841 bits per sec.)
  • 8-byte payload: tiiiL0011223344556677<CR> (22 chars x 10 bits x 7,207/sec = 1,585,586 bits per sec.)
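
If you want to check my arithmetic, the numbers in both lists fall out of a few lines of C, assuming 10 bits on the wire per ASCII character (start bit + 8 data bits + stop bit):

  #include <stdio.h>

  int main(void)
  {
      const double canBps       = 800000.0;         /* AMSAT CAN bus rate            */
      const int    frameBits[]  = { 47, 63, 111 };  /* min bits: 0-, 2-, 8-byte msgs */
      const int    asciiChars[] = { 6, 10, 22 };    /* tiiiL<CR> ... 22 chars        */

      for (int i = 0; i < 3; i++) {
          double msgsPerSec = canBps / frameBits[i];
          double serialBps  = msgsPerSec * asciiChars[i] * 10.0;  /* 10 bits/char */
          printf("%3d-bit frame: %6.0f msgs/sec -> %8.0f serial bits/sec\n",
                 frameBits[i], msgsPerSec, serialBps);
      }
      return 0;
  }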

OK, that's a lot of numbers, but it shows us something interesting: the two extreme message formats cause different ends of PropCAN to work the hardest (at the fastest rate), and NOT at the same time. The 0-byte payload messages (the shortest ones) make our SPI offload run at maximum performance (1.8 Mbits/sec) while the demand on the Serial Transmit routines is lower. The 8-byte payload messages, however, don't load the SPI offload routines as much, but now our Serial Transmit routines must run at ~1.5 Mbps! And now you see where the ~1,500,000 baud quoted in my opening paragraph came from.

Now we've reminded ourselves of the max SPI offload rate, the max message-handling rate, and the max Serial transmit rate we need. Our goal is to not let our I/O run slower than these rates at any point in the design, which would force us to buffer messages and then, ultimately, fail to keep up with traffic arrival rates. We know we've got work to do to make this system work at our needed performance.

Now that we've clarified our goals, let's move on to the topic for the next post: my now working code for the new Serial Transmit and Receive Objects.

Tuesday, January 15, 2008

Anatomy of a full transaction - Sio-CAN-CAN-Sio

So here it is... (if you look closely ;-). At the left edge of the screen we have the serial data arriving at PropCAN requesting that a CAN message be sent. At the extreme right edge we have the CAN traffic arriving, being formatted as ASCII, and being returned via serial to the PC (over USB).

This is a fairly impractical view, but let's look at a couple of numbers just before we zoom in on the left edge (then the right).
The view shows the request-to-response time (M1 to M2) as 409.79 mSec, with 401.48 mSec of "dead air" (nothing happening) in between (M3 to M4).

This second waveform shows the left edge, where we can see the serial data arriving, followed by the packet being sent to the MCP 2515, which is then followed by the packet being sent from the MCP 2515 via CAN. Let's look in more detail.

On the SIO RxD/TxD lines we see the request to send a message arriving (M1) and being acknowledged (M2). Next (at M3) we see the message being loaded into the transmit buffer of the MCP 2515, followed by a request to send the message now in the buffer. On the bottom two lines, CAN Rx/Tx, we see the message being transmitted over the CAN bus. We see the packet on both the Tx and Rx CAN lines because our own receiver listens to all traffic we send, both so that it can tell when it is allowed to send (collision detection) and so that it can tell when other CAN devices acknowledge receipt of the message it sends.

If you look at the /INT line (the MCP 2515 interrupt-request line) we see that immediately after the CAN message is sent the interrupt line is asserted. This tells us that the transmission has completed and transmit status is now available (and the transmit buffer is now empty).

The next three SPI commands (the first two SPI commands, earlier, were (1) load tx buffer and (2) send tx buffer) interrogate the MCP 2515 to determine the health of the transmit; the last of them clears the Tx-complete interrupt.
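
For reference, that post-transmit housekeeping looks roughly like the sketch below. The MCP 2515 opcodes and register addresses are from its datasheet, but spi_transfer()/spi_select() are hypothetical helpers and this is not the actual PropCAN code (which drives SPI from a Cog in PASM).

  #include <stdint.h>

  /* MCP2515 SPI instruction opcodes (from the datasheet) */
  #define MCP_READ        0x03
  #define MCP_BIT_MODIFY  0x05
  #define MCP_READ_STATUS 0xA0

  /* Registers of interest */
  #define REG_TXB0CTRL    0x30    /* transmit buffer 0 control/status */
  #define REG_CANINTF     0x2C    /* interrupt flags                  */
  #define TX0IF           0x04    /* CANINTF bit: TX buffer 0 empty   */

  /* Stand-ins for whatever the platform provides: shift one byte out,
   * return the byte shifted in; assert/deassert chip select.          */
  extern uint8_t spi_transfer(uint8_t out);
  extern void    spi_select(int asserted);

  static uint8_t readRegister(uint8_t addr)
  {
      spi_select(1);
      spi_transfer(MCP_READ);
      spi_transfer(addr);
      uint8_t value = spi_transfer(0x00);   /* clock in the register value */
      spi_select(0);
      return value;
  }

  /* After /INT asserts at end of transmit: check health, then clear the flag. */
  static void handleTxComplete(void)
  {
      spi_select(1);                        /* 1: READ STATUS quick poll     */
      spi_transfer(MCP_READ_STATUS);
      uint8_t quickStatus = spi_transfer(0x00);
      spi_select(0);
      (void)quickStatus;

      (void)readRegister(REG_TXB0CTRL);     /* 2: did the transmit succeed?  */

      spi_select(1);                        /* 3: BIT MODIFY, clear TX0IF    */
      spi_transfer(MCP_BIT_MODIFY);
      spi_transfer(REG_CANINTF);
      spi_transfer(TX0IF);                  /* mask: only touch TX0IF        */
      spi_transfer(0x00);                   /* data: clear the bit           */
      spi_select(0);
  }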

Looking at the measured times, we see that there is a lot of overhead in causing this message to be sent. A large portion of this overhead does not matter: a delay between when we choose to send and when the send actually occurs is acceptable, as long as we can send rapidly once we start sending.

However, a few general observations should guide our further efforts as we increase our device performance through code rewrites. These are:
  1. The time from interrupt assert to interrupt clear can be shortened and should be since this directly affects how rapidly we can send subsequent messages.
  2. The time to generate a response on the serial interface and the time it takes to hand off the message to be written via SPI to the MCP 2515 are related in that one Cog is handling all of this. We can probably overlap these functions by splitting this effort amongst two Cogs.
  3. A significant portion of the M1 to M2 response time is due to bytes arriving one at a time from the Serial Rx Cog. If these transfers were a <CR> delimited line at a time this would be much shorter.

Next post? Let's look at the right edge (CAN message arriving) followed by the end of the right edge where we finally see the ASCII interpretation of the message being sent to the PC via USB.