Monday, March 31, 2008

New Serial Driver testing results

As promised, I've a new waveform to speak to today (yes, actual measurements!) but I'll get to that later in this post.

I spent my time today testing and making some late adjustments to code so that it would run correctly. As you remember in my last post I noted that I was running Rx and Tx at what Windows calls 921,600 baud and it was working well. However, next I had to modify my Windows FTDI driver installation to get to higher baud rates so I could continue testing.

After a few false starts (read- two driver reinstallations and a first incorrect attempt at modifying the 3 alias baud rate entries...), I managed to get my FTDI drivers configured correctly. What's fun is how I know I did. In my last efforts yesterday I upgraded my interaction with the autobaud routines to show detailed findings when asked and I also added a sign-on transmission which happens immediately after the autobaud completes. This is where it gets fun! I'll show you.

First, here are my speed aliases for my updated serial drivers:
  • 300 baud -> 3M baud
  • 1,200 baud -> 1.5M baud
  • 2,400 baud -> 2M baud

I chose these aliases so that I would have some hope (in my older age ;-) of remembering the new alias values. I also have no desire to be running this device at those original baud rates.

When I first start PropCAN now and send it a $0d <CR> is responds with the following message:

* PropCAN connected at 923076 baud, (92307 chars/sec) *

(It's pretty easy to tell when things are working well enough for it to send this <grin>.)

So I select 1200 baud (intending to get 1.5M baud) and connect to PropCAN and I see the following:

* PropCAN connected at 1500000 baud, (150000 chars/sec) *

I'm smiles all over! So I ask it it's current configuration by entering ?<CR> and get the following:


D PropCAN - A Propeller-based USB to CAN Controller by KZ0Q
D0 F/W Debug: OFF
VA201
O? CAN Closed
S7 CAN at 800 Kbps
X0 AUTO Fwd: OFF
Z0 Time-stamp: OFF


I move on to 2400 (new 2.0 Mbaud) and try once more:
* PropCAN connected at 2000000 baud, (200000 chars/sec) *

Now I'm grinning ear to ear.. but then I entered the ? command and got nothing. Then I'm moved up to 300 baud (3M baud) and I got nothing... Hmm, this is an amazing start to my testing but I've got work to do.

Now remember, I've an analyzer connected which is currently sampling at a 5 nS rate so I can pretty much accurately determine if what is being sent and received is actually at the bit rates I think it should be at. I verified each of these speeds and yes the driver is correctly sending at the 1.5, 2.0 and 3.0 bit rates!


This is where I had to go back and adjust things. My transmit side needed some code tweaking as it was exhibiting some really fat start-bits at these three speeds. I was able to make better instruction choices to bring this into alignment really well. On the receive side it took some studying. However, eventually I replaced a couple of test and jmp loops with waitpne/waitpeq and the receive side snapped to as well. After these changes, I was able to fully interact at 2.0M baud and I also retried my 300 baud ;-) test with the following results:
* PropCAN connected at 3000000 baud, (300000 chars/sec) *

(No I cannot interact with it at this speed, tho' I did try... I also knew by my calculating the instruction timings of my code that it would not work. This is why I posted the question I did at our Propeller Forum...)



So, in the end, this has been a great day. Now let's look at what's happened to our overall timing picture. I measured a full can transmit and the reply of the CAN Node's response sent back via USB. (I've no markers in this picture, so I'll talk in signal color and location instead...)


At the left, just up from the bottom we see to signals (red and Yellow) which are labelled as PropSIO RxD and TxD. I make the red color receive so these are easy to remember. The red descending blip at the left is the packet in ASCII coming in over serial "T1430<CR>" and the yellow blip following it is the <CR> reply back via USB. Moving on to the bottom two lines (red and green this time, 2515-CAN labels) we see our CAN message transmission in green and we see our CAN receiver seeing our full outgoing message (our receiver always sees our own transmissions in the MCP 2515.) Then we see the CAN Node replying to the message with a longer (8-byte payload) message. This is the next red burst. Just near the end of the red burst you see a green blip. This is the MCP 2515 ACK'ing the message (as each listener does on the CAN bus.) Now, look all the way to the right and you see the ASCII representation of the 8-byte payload message being sent to the PC via USB: "t36286263623726040034<CR>". Finally, and less important but so you know, the burst of activity above the CAN signals we just spoke of is the SPI traffic with interrupts.


The important part of this is seeing the relative time now taken for each of these actions. Just "eyeballing" the picture you can see that our serial transmission (in Yellow, and at 2M baud) is about the same width (takes about the same amount of time) as the CAN message arriving from the CAN Node (in Red). Contrast this against this 2nd picture originally shown in the Jan 22 post "Anatomy of a full transaction - Part 2". In this older picture we see the serial transmission burst (this time in Red) again to the right but look how much time it was taking! That's 1.2 mSec! vs. the 128.9 uSec of the CAN message arriving.

I think you'll agree this is exactly the improvement we've needed to meet our timing goals for this project. We needed 1.5M baud and we've got a functioning 2.0 M baud giving us timing margin and in fact our transmit side works to 3.0M baud according to our testing.

This has been some fun work on serial drivers for PropCAN! We see the autobaud object working well at all speeds of the FT232R. We see the serial transmit object working with all speeds as well. And finally we have the serial receive object working at the supported speeds up through 2.0M baud. This is all but the highest 3.0M baud speed. We've done ok.

Now it's time to work on the next performance upgrade for the SPI subsystem. I'm off to study what to do next there... I'll be posting more later this week after I've made some progress.

Sunday, March 30, 2008

Circular ring of Buffers for Serial In/Out - fast!

In reviewing the waveforms (Logic Analyzer capture pictures in previous posts) we note that we are passing a character at a time from our serial receive Cog to its listening Cog which is the wost performing of the Transmit and Receive sides. This hand-off originally limited the code to running at 230,400 baud. After some instruction choice changes and making it more specific to interacting with the FTDI FT322R chip I was able to improve this to working fairly reliably at 460,800 baud but no better. My goal in re-writing the Serial objects is to reduce this cog interaction to needed boundaries thereby allowing the serial transfer rates to be much faster and to, hopefully, to attain our needed 1.5M baud.

Our study of the serial traffic has shown us that this application has string length maximums and an overall performance need. Let's make use of this knowledge and only transfer from Cog to Hub on each long within command (max string length) boundaries, not character boundaries. In the highest traffic cases we will then have 6 transfers were were seeing 22 transfers earlier. Let's also not make our Cogs wait on each other by providing more than one buffer so that a buffer can be being transmitted while another is being filled. Likewise, a buffer can be filled by the receiving Cog while another is being acted up by the command processing Cog.

In our serial transmit case we can transmit a buffer at full speed and not break stride until we next need another buffer. If another has been prepared we simply switch to it and start sending it; again at full speed.

In our serial receive case the original object was limited by our ability to dump characters into Main RAM. This severely impacted our ability to be ready to receive the next character. Since we now know that our characters streaming at us are broken into max command/message size strings,we can defer writing into Main RAM until we hit one of those natural boundaries where we have more time. This means that we can receive serial data from programs controlling the PropCAN at full serial rate without missing characters!

The command protocol for PropCAN ensures that activities are gated by acknowledgements which means that only one or sometimes a couple of commands will arrive before an acknowledgement must be sent at which time the sender will stop sending and wait for a response. This is great in that it allows our new routines to perform well but not reach the limit in performance they too have.

These new routines are built around a new data structure the contents of which controls the actions of the single producer and single consumer of data contained within the structure. The new data structure is simple. We have an array of pointers to fixed size buffers. The number of pointers in the array is adjustable. The fixed size buffers contain a length/flag byte and the rest of the space is for the data (to be received or to be transmitted). The fixed length of these buffers is the same for all buffers pointed to by the array but is adjustable. For the PropCAN device we use 32-byte fixed length buffers and the array pointing to them consists of 4 entries pointing to four unique 32-byte buffers. We choose the 32 because it is the next power of two greater than our maximum 22-character command/message. We chose the 4 rather arbitrarily (read- based on "no real data") but it can be adjusted separately for Transmit and for Receive as we discover what our real depth needs to be. The Transmit side and the Receive sides each have their own independent data structure instance (array of pointers and set of buffers to which they point.)

Did I catch you on my having an array of pointers to fixed length buffers? Isn't a concatenated set of fixed length buffers also an array? Why have the array of pointers to the array of buffers? The answer is really quite simple. It's the old standard trade-off between memory and performance. Think of this array of pointers as a pre-calculated set of answers. With the array of pointers I have much less code and therefore much less execution time in calculating which buffer will be used next. I simply move the preparation of these pre-calculated answers out of the critical path of when i needed to access buffers to when I'm starting up the PropCAN device; a much less time critical point in time.

I mentioned that our length byte in each fixed buffer is really length and flag (dual purpose). Let's look at why I think this. The array of pointers to these buffers lets us easily treat the set of buffers as a circular list. So we'll let the producer start at array[0]'s buffer and when it is filled the last value written will be the length byte. The consumer will also start at array[0] and will not consume the buffer until the length byte becomes non-zero. When the consumer is finally done with the buffer it zeros out the length byte and moves on to array[1]'s buffer and waits for it to have a non-zero length. Likewise, our producer sets a length in array[0]'s buffer and then moves on to filling array[1]'s buffer. As each tried to locate the next buffer after finishing the last in the array (in this case array[3]) then they wrapped the index back to zero and started again with array[0]'s buffer. So we have the buffers being used in a circular fashion and we have the length field within each buffer being used as a "buffer is empty - can be filled" or "buffer is full - can be emptied" flag.

Well, it's time to end this post but first let me describe the state of the code since my Wednesday post. I've implemented and tested the new "circular-fixed size buffer" handling Serial Receive and Serial Transmit objects. I've run them at baud-rates from 19,200 to 923,076 baud rates without error or data loss. Tomorrow I'll update my FTDI driver installation on my Windows XP PC so I can test them at 1.5M baud and 2M baud (I know they won't run at 3M baud so I won't go to the FTDI 3M baud during my testing...<sigh>)

My next post will be showing Logic Analyzer waveforms of the areas shown in earlier posts were we saw the serial traffic taking so long. We hope to see the serial traffic no longer dominating the waveform as it was in the past. We hope now to be taking up only as much time as we really need!

Performance a quick review

I've not posted since Wednesday because I've been off writing new propeller objects. Both serial transmit and serial receive were limited to 460,800 baud after my last rewrite of them. PropCAN needs to be able to run at ~1,500,000 baud if we want to meet best possible rate of messages arriving from the CAN bus. These numbers motivated the re-writes. Let's quickly review two of our key performance criteria.

The AMSAT CAN bus runs at 800,000bps. We have three CAN message sizes in the protocol: 0-byte payloads, 2-byte payloads, and 8-byte payloads. If the bus traffic was the maximum load of each of these messages the traffic rates would be as follows:
  • 0-byte payload: min. of 47-bits per message, max 17,021 messages / sec.
  • 2-byte payload: min. of 63-bits per message, max 12,698 messages / sec.
  • 8-byte payload: min of 111-bits per message, max of 7,207 messages / sec.

PropCAN translates these messages into ASCII strings and sends them up the USB interface to the connected PC. After formatting in ASCII let's look as what happens to the message size:

  • 0-byte payload: tiiiL<CR> (6-chars x 17,021 /sec = 1,021,277 bits per sec.)
  • 2-byte payload: tiiiL0011<CR> (10-chars x 12,698 /sec = 1,269,841 bits per sec.)
  • 8-byte payload: tiiiL0011223344556677<CR> (22-chars x 7,207 /sec = 1,585,586 bits per sec.)

OK, that's a lot of numbers but it shows us something interesting. It shows us that two message formats cause different ends of PropCAN to be working the hardest (at the fastest rate) and NOT at the same time. The 0-byte payload messages (the shortest ones) make our SPI offload run at maximum performance (1.8 Mbits/Sec) but the demand on the Serial Transmit routines is lower. However, the 8-byte payload messages don't load the SPI offload routines as much but now our Serial Transmit routines must run at 1.5Mbps! And now you see where I came up with my ~1,500,000 baud quoted in my opening paragraph.

Now we've reminded ourselves of Max SPI offload rate we need, the max message handling rate we need and the maximum Serial transmit rate we need. Our goal is to not let our I/O performance run slower at some point in our design which would cause us to need to buffer messages and then, ultimately, to not keep up with traffic arrival rates. We know we've got work to do to make this system work at our needed performance.

Now that we've clarified our goals, let's move on to the topic for the next post: my now working code for the new Serial Transmit and Receive Objects.

Wednesday, March 26, 2008

Internal message storage formats

Well hi there! It's good to be having fun working on this project once again!

When last we discussed the internals, we were musing at all the "tuning" ahead of us yet in terms of the performance of our existing "proof of concept drivers". It is this driver tuning which will be the subject of my next posts but let's get into how we are choosing to move data through this system in this post. Our reducing data movement and transformation is our first step in tuning this system's performance.

Our goal is "best possible throughput" and to meet this we must be efficient. This means we want a minimum of format translations and a minimum of copying data from one location to another within propCAN. Let's look at our USB and CAN interfaces to see in what natural forms our traffic exists and this should lead us to defining what storage forms we will use. First let's look at the USB side.

Messages to be sent via CAN arrive as ASCII strings via USB. These strings are anywhere from 5 to 22 bytes for standard mode (11-bit ID) CAN packets and 11 to 27 bytes for extended mode (29-bit ID) packets. In addition to messages, commands to propCAN and status strings from propCAN are also sent via USB but these are all smaller than our 27-byte longest message. So this then is our USB side definition:
  • USB I/F: strings are sent and received which are 2-bytes to 27-bytes in length.

Now let's look at our CAN side. The MicroChip MCP 2515 receives packets into 13-byte buffers with an additional (optional) byte of status at the end. Both our standard and extended messages fit into this size buffer. In order to send a message the 2515 also wants the message formatted into this layout for best efficiency hand-off from the Propeller to the 2515. Also, we have an additional requirement to time-stamp arriving messages (relative time from packet to packet) so we'll add another two bytes to this length. So here we have our CAN side definition:
  • CAN I/F: messages are in 14-byte buffer with two byte time-stamp (16-bytes total).

If we round our needed buffer sizes up to multiples of longs (our largest native Propeller data size) we end up with 32-byte buffers on the USB side and 16-byte buffers on the CAN side. Let's look at the resulting minimum transforms we end up with:

  • SEND side: a message arrives as an ASCII string received by our Serial Receive COG. It transfers the message into a 32-byte buffer in Main RAM. A pointer to this 32-byte message is handed eventually to validation and then conversion code. This conversion code grabs a 16-byte buffer, again in Main RAM, and re-formats the ASCII text into an MCP 2515 layout message. The pointer to this 16-byte buffer is then passed to the SPI send routines at which time the buffer is accessed by the SPI cog and bytes from it are sent to the MCP 2515 device. Since the layout of the 13 early bytes in the buffer are exactly what the 2515 most wants the transmission to the 2515 is efficient: command followed by 13-bytes of data (actually, less bytes need be sent if the message payload is less than the full 8 bytes.)
  • RECEIVE side: this is pretty much the same effort but in reverse sequence. The 2515 receives the message and then sends an interrupt to the Propeller. Upon receiving the interrupt a 16-byte buffer is populated with the data offloaded from the 2515. We append the time-stamp of the unload effort as the last couple of bytes and the rest is the reverse of the send side effort.

Now we are beginning to understand the data flow through the propCAN system. We are not moving data around or reformatting it unless we need to. We've normalized our system buffers in Main RAM to 16 and 32-byte objects a convenient size for the Propeller.

Next we'll look at an approach to getting our serial drivers to be much better performing.

Monday, March 17, 2008

interference

Ah the best laid plans... I was actually thinking in late January that I'd be able to stay focused on this project. Alas, this has not proven to be so. But, I'm back... I'm spinning up now on where I left off and am deciding which aspects to work next. I'm actually getting anxious to get into propeller work again as I've been away too long. I've been reading articles and reading the Forum posts to wet my whistle for the project again. (deSilva, I can't thank you enough for all the posts and your articles that have brought me up to speed so quickly and for your always helpful posts to my questions on the Forum! You've been nothing but the best of help -IMHO. I sincerely hope you'll continue to lurk and offer us the benefit of your expertise. Good luck in your ongoing endeavors.)

Watch for new posts to start happening here next week if not late this week. I wonder what I'll work on first??? ;-)