Monday, January 21, 2008

Anatomy of a full transaction - Part 2

In the last post, we talked of the transmit side of things. In this post, we look at the receive side and the final sending of the received traffic to the PC via USB.

In the first waveform I've zoomed into the receiving of the message via CAN Bus and then the unloading of the message from the receive buffer within the MCP 2515.

Let's study the events in the first waveform. At M1 we see the message arriving over the CAN Rx line (in red). Then at M3, we see the MCP 2515 acknowledging the message by writing a bit in the ACK field (yellow CAN Tx line near bottom). At M4, we see the interrupt line being asserted (green line, /INT, going low). After a period of time, we see traffic over the SPI bus. This is our firmware responding to the interrupt assertion and requesting the unload of the receive buffer. At the completion of the unload, we see the interrupt line de-asserted.

I marked three times of note for this first waveform:

  1. M1 to M2 (617 uSec) shows the time from starting to receive the CAN message to the time it has been unloaded from the MCP 2515.
  2. M3 to M4 (8.6 uSec) shows the time from the message being completely received to the time the MCP 2515 asserts the /INT line.
  3. M4 to M5 (437 uSec) shows the time it takes for the propCAN firmware to respond to the interrupt by unloading the received message.

In the 2nd waveform I've included all we see on the first, but zoomed out so that we can now see the received message being reformatted in ASCII and sent via serial to the FT232R chip, which in turn sends it via USB to the attached PC. With this wider view of the receive activity we can now compare how long it takes to move this same message over each of the I/O channels. Our message arrives by CAN Bus at 800 kbps. It is unloaded over SPI at a much higher rate, and then finally we see it sent over serial at a very slow rate, one character at a time.

A few of the times measured in this waveform are worth pointing out:

  1. M1 to M4 (3.96 mSec) is how long it takes from when the message starts arriving at our receiver to when we start sending it out in ASCII form over the Serial Tx line.
  2. M4 to M2 (1.2 mSec) is how long it takes to send the ASCII message over Serial.
  3. M1 to M5 (128.86 uSec) is how long it takes for the message to arrive over the CAN Bus.
  4. M6 to M3 (42.53 uSec) is how long it takes to unload the message over SPI.

These waveforms show us how our current firmware is working and what transfer rates we are achieving over our various interfaces. In general, looking at this post and the previous one, we see that our firmware is taking way too long when transmitting over serial. We see, too, that we are taking too long to respond to the interrupts. Overall, all of our interfaces need attention.

Next post? Let's go over our internal message data storage format (messages in transit between interfaces), and we'll go over what approaches we'll take as we make adjustments so that we can get as close as possible to our desired message throughput.

Tuesday, January 15, 2008

Anatomy of a full transaction - Sio-CAN-CAN-Sio

So here it is... (if you look closely ;-) At the left edge of the screen we have the serial data arriving at propCAN requesting a CAN message be sent. At the extreme right edge we have the CAN traffic arriving and then being formatted as ASCII and being returned via serial to the PC (over USB).

This is a fairly impractical view but let's look at a couple of numbers just before we zoom in on the left edge (then the right).
The view shows the request to response (M1 to M2) taking 409.79 mSec, of which 401.48 mSec (M3 to M4) is "dead air" (nothing happening) in between.

This second waveform shows the left edge where we can see the serial data arriving followed by the packet being sent to the MCP 2515 which is then followed by the packet being sent from the MCP 2515 via CAN. Let's look in more detail.

On the SIO RxD/TxD lines we see the request to send a message arriving (M1) and being acknowledged (M2). Next (at M3) we see the message being loaded into the transmit buffer of the MCP 2515, followed by a request to send the message now in the buffer. On the bottom two lines, CAN Rx/Tx, we see the message being transmitted over the CAN bus. We see the packet on both the Tx and Rx CAN lines because our own receiver listens to all traffic we send, both so that it can tell when it is allowed to send (collision detection) and so that it can tell when other CAN devices acknowledge receipt of the message it sends.

If you look at the /INT line (MCP 2515 Interrupt request line) we see that immediately after the CAN message is sent the interrupt line is asserted. This is telling us that the transmission has completed and transmit status is now available (and the transmit buffer is now empty).

After the first two SPI commands ((1) load Tx buffer and (2) send Tx buffer), the next three SPI commands interrogate the MCP 2515 to determine the health of the transmit; the last of them clears the Tx-complete interrupt.
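The transmit side described above can be pictured as a short list of SPI frames, one per /CS assertion. The sketch below is hypothetical, not the propCAN firmware: it assumes transmit buffer TXB0, an 11-bit (standard) identifier, and the opcode and register values from the MCP2515 datasheet. The health-check reads are left as a comment because the post does not say which registers they interrogate.

```python
# Hypothetical sketch: SPI frames for one MCP2515 transmit from TXB0.
# Opcodes and register addresses follow the MCP2515 datasheet.
LOAD_TXB0_SIDH = 0x40  # LOAD TX BUFFER, starting at TXB0SIDH
RTS_TXB0 = 0x81        # REQUEST TO SEND for TXB0
BIT_MODIFY = 0x05      # BIT MODIFY: opcode, address, mask, data
CANINTF = 0x2C         # interrupt flag register
TX0IF = 0x04           # CANINTF bit 2: TXB0 has finished transmitting


def load_txb0(std_id: int, data: bytes) -> bytes:
    """LOAD TX BUFFER frame: opcode + SIDH, SIDL, EID8, EID0, DLC, D0..D7."""
    if not 0 <= std_id <= 0x7FF:
        raise ValueError("standard identifier must fit in 11 bits")
    if len(data) > 8:
        raise ValueError("a classic CAN data frame carries at most 8 bytes")
    sidh = std_id >> 3              # SID10..SID3
    sidl = (std_id & 0x07) << 5     # SID2..SID0 in bits 7..5; EXIDE = 0
    dlc = len(data)                 # RTR bit clear: this is a data frame
    body = data.ljust(8, b"\x00")   # always clock 13 bytes after the opcode
    return bytes([LOAD_TXB0_SIDH, sidh, sidl, 0x00, 0x00, dlc]) + body


def transmit_frames(std_id: int, data: bytes) -> list:
    """One entry per /CS assertion, in the order the firmware would send them."""
    return [
        load_txb0(std_id, data),                    # (1) load Tx buffer
        bytes([RTS_TXB0]),                          # (2) send Tx buffer
        # ...status reads after /INT asserts would go here...
        bytes([BIT_MODIFY, CANINTF, TX0IF, 0x00]),  # clear the Tx-complete flag
    ]
```

Only the first DLC data bytes go out on the bus; padding to a fixed 13-byte body simply keeps the load in one "send 1 byte, send 13 bytes" shape.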

Looking at the measured times we see that there is a lot of overhead in causing this message to be sent. A large portion of this overhead does not matter, though: a delay between when we choose to send and when the send occurs is acceptable as long as we can send rapidly once we start to send.

However, a few general observations should guide our further efforts as we increase our device performance through code rewrites. These are:
  1. The time from interrupt assert to interrupt clear can be shortened, and should be, since this directly affects how rapidly we can send subsequent messages.
  2. The time to generate a response on the serial interface and the time it takes to hand off the message to be written via SPI to the MCP 2515 are related in that one Cog is handling all of this. We can probably overlap these functions by splitting this effort between two Cogs.
  3. A significant portion of the M1 to M2 response time is due to bytes arriving one at a time from the Serial Rx Cog. If these transfers instead handed over a whole <CR>-delimited line at a time, this would be much shorter.
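Observation 3 can be sketched as a small line assembler on the serial Rx side, so the command parser receives one <CR>-delimited line per hand-off instead of one byte at a time. This is a hypothetical Python illustration of the idea, not propCAN code; in the device the equivalent logic would run in a Cog and pass lines through hub memory.

```python
class LineAssembler:
    """Accumulate received bytes and release complete <CR>-terminated lines."""

    CR = 0x0D

    def __init__(self) -> None:
        self._pending = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Consume raw bytes; return every line completed by this chunk (CR stripped)."""
        lines = []
        for byte in chunk:
            if byte == self.CR:
                lines.append(bytes(self._pending))  # one hand-off per line
                self._pending.clear()
            else:
                self._pending.append(byte)
        return lines
```

A bare <CR> yields an empty line, which the parser can treat as the synchronization request.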

Next post? Let's look at the right edge (CAN message arriving) followed by the end of the right edge where we finally see the ASCII interpretation of the message being sent to the PC via USB.

Thursday, January 10, 2008

First look at the MCP2515-specialized SPI Engine

When last we visited the SPI back-end of propCAN, we saw that the SPI demonstration code worked great, but that we can, and need to, run with an MCP 2515-specialized version of the code for the reasons we cited. This post describes the performance increase we are seeing with this first draft of the specialized engine.

The SPI code results described in this post are those attained with the object as posted to our Propeller Object Exchange at http://obex.parallax.com/objects/227/. Let's look at the new performance.

The captured trace shows a slightly different SPI transaction this time. The transaction writes two bytes to the MCP 2515 and then reads a byte back from it.

Let me back up a minute and present what we've done in the assembly code in this specialized engine; then we'll review the numbers shown in the waveform.

This first draft of the engine implements the "entire command" model in assembly. When studying the overall MCP 2515 SPI command set, we find that we can accomplish all we need with roughly seven command types which correspond to:
  • send 1 byte, read none
  • send 1 byte, read 1 byte
  • send 2 bytes, read 1 byte
  • send 3 bytes, read none
  • send 4 bytes, read none
  • send 1 byte, send 13 bytes (yes, send then send)
  • send 1 byte, read 13 bytes

Therefore, early in the assembly code you now see a jump table for these seven command forms. Also, in this first draft you see that control of the /CS line is now handled by the assembly code as well.
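The seven command forms map naturally onto MCP2515 SPI instructions. The table below pairs each form with an instruction of that shape; the pairing and the `xfer` byte-exchange callback are assumptions for illustration, and the dispatcher only mimics in Python what the jump table does in Cog assembly.

```python
from typing import Callable, NamedTuple, Optional


class Form(NamedTuple):
    send: int                 # bytes clocked out right after /CS falls
    extra: int                # bytes in the second phase of the command
    extra_dir: Optional[str]  # "send", "read", or None
    example: str              # an MCP2515 instruction with this shape (assumed pairing)


FORMS = [
    Form(1, 0, None, "RESET 0xC0, RTS 0x81-0x87"),
    Form(1, 1, "read", "READ STATUS 0xA0, RX STATUS 0xB0"),
    Form(2, 1, "read", "READ 0x03 + address"),
    Form(3, 0, None, "WRITE 0x02 + address + data"),
    Form(4, 0, None, "BIT MODIFY 0x05 + address + mask + data"),
    Form(1, 13, "send", "LOAD TX BUFFER 0x40-0x45 + 13 bytes"),
    Form(1, 13, "read", "READ RX BUFFER 0x90-0x96 + 13 bytes"),
]


def execute(form_index: int, out: bytes, xfer: Callable[[int], int]) -> bytes:
    """Run one command form; xfer(byte) is a hypothetical full-duplex SPI byte exchange."""
    form = FORMS[form_index]
    n_out = form.send + (form.extra if form.extra_dir == "send" else 0)
    n_in = form.extra if form.extra_dir == "read" else 0
    if len(out) != n_out:
        raise ValueError(f"form {form_index} expects {n_out} output bytes, got {len(out)}")
    # /CS would be driven low here (the specialized engine now owns /CS).
    for byte in out:
        xfer(byte)
    reply = bytes(xfer(0x00) for _ in range(n_in))  # clock in the read phase
    # /CS would be driven high here.
    return reply
```

Index 2 is the "send 2 bytes, read 1 byte" form measured below.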

Let's get back to our diagram. The command we are using is the third form: send 2 bytes, read 1 byte. The /CS markers (M5-violet and M6-cyan) show that this command now takes only 11.62 uSec. We have improved, somewhat.

There is a slight difference in read versus write speeds so let's now look at these. The read-bit time is shown with the red markers (M1 dark-red and M2 light-red). These single bit reads look to be fairly regular (eye-balling the waveform) and the M1-M2 measurement shows this to be 440 nSec. (or a bit rate of 2.27 Mbps).

The write side is slightly faster and is shown with the green markers (M3-dark-green and M4-light-green). Here we see a bit time of 380 nSec (or a bit rate of 2.63 Mbps).

So, we have improved the handling of the SPI interface to the MCP 2515. I have been careful to call this a "command handling" back-end, though, because of the final point we made in the earlier post: the hand-off time from one Cog to this dedicated SPI Cog is still slowing us more than we really want. So, even with the vast improvement shown in this very specialized SPI Engine, we will still need to move to "transaction" handling in the assembly code; that is, small sequences of commands handled within the assembly engine, not just single commands.

Next post? Serial is working, the interrupt/status lines are working, and we've improved on the basic SPI performance. So in the next post I'll discuss an overall transaction (from a request to send a CAN message received via serial-Rx, all the way to the response CAN message sent back over serial-Tx), which will be _very_ telling as to what we next need to improve as we head towards meeting the desired performance of propCAN.

Proving working Auto-baud (serial rate detection)

In the prior post, we learned why we needed automatic baud-rate detection (AutoBaud). In this post, we show the measurement of the new working AutoBaud spin object and we discuss how it works.

The captured analyzer trace shows one character arriving via the serial Rx line from the FT232RL and then two characters being transmitted (after a short delay) in response over the serial Tx line. If all is working, we should see that the bit widths are roughly the same.

When the propCAN first starts it loads the AutoBaud Cog and waits for it to make a measurement and return the results. Then with the measured baud rate in hand the AutoBaud Cog is stopped and the serial Rx and serial Tx Cogs (1 each) are started at the measured baud rate.

You may notice (if you have had to spend any time looking at serial data) that the character being received (red line in waveform) is a carriage-return (<CR>, hex $0D, binary %00001101).

Serial characters are sent least-significant bit first, with a preceding start-bit and a trailing stop-bit. (In this case we use no parity and one stop-bit, not two.) Therefore, we should see (in waveform order) %0101100001: our ten bits. Also, since the receive (Rx) line idles at "1" (or high), the trailing one is represented by the final rising-edge and a single bit time thereafter.

This $0d pattern is an "excellent" pattern to use for a number of reasons the first of which is it is easy to remember. From a bit-width discovery perspective, in this one character we've got a start bit, followed by two opposite-polarity single bit times followed by a double-bit time and finally ending with a quad-bit time. Gosh how accurate do we want to get? ;-)
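The bit pattern above is easy to reproduce. This Python sketch builds the wire-order bits of one 8N1 character (start bit, eight data bits LSB first, stop bit) and lists the run lengths that make $0D handy for bit-width discovery; it is an illustration, not code from the AutoBaud object.

```python
def frame_8n1(ch: int) -> str:
    """Wire-order bits for one 8N1 character: start (0), data LSB first, stop (1)."""
    data_bits = [(ch >> i) & 1 for i in range(8)]
    return "".join(str(bit) for bit in [0] + data_bits + [1])


def run_lengths(bits: str) -> list:
    """Lengths of consecutive equal-bit runs, in order."""
    if not bits:
        return []
    runs = [1]
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            runs[-1] += 1
        else:
            runs.append(1)
    return runs
```

For $0D the runs are 1, 1, 1, 2, 4, 1. The final run is the stop bit, which on the wire merges into the idle-high line.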

In my case the initial AutoBaud object just measures the width of the least-significant bit, the $01 bit, of the $0d character. Then it looks up the measured value in a table of values representing the extended set of rates that the FT232R will do and picks the closest one. The table is preloaded with calculated values based on the clock frequency at which the Cog is running. I've offset the values by some small percentage in order to increase the capture-range of the measurement. I used a spreadsheet to double-check that my offset-ranges did not create any overlap (at the highest bit rates, smallest values in the table) so that I don't accidentally pick a rate just above or below the desired one.
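A minimal sketch of the table approach, assuming an 80 MHz system clock, a hypothetical list of rates, and a hypothetical +/-5% window (the post does not give the actual offsets). It works in clock ticks per bit, builds a capture window around each nominal value, checks that no windows overlap (the spreadsheet check described above), and picks the rate whose window holds the measurement.

```python
CLKFREQ = 80_000_000  # assumed system clock: 5 MHz crystal with PLL16
# Hypothetical subset of FT232R rates; the real table covers the chip's extended set.
RATES = [9600, 19200, 38400, 57600, 115200, 230400, 460800, 921600]
OFFSET = 0.05         # hypothetical +/-5% capture window around each nominal value


def build_table(clkfreq=CLKFREQ, rates=RATES, offset=OFFSET):
    """(rate, nominal, low, high) in clock ticks per bit, fastest rate (fewest ticks) first."""
    table = []
    for rate in sorted(rates, reverse=True):
        nominal = clkfreq / rate
        table.append((rate, nominal, nominal * (1 - offset), nominal * (1 + offset)))
    return table


def windows_overlap(table):
    """True if any window reaches into the next slower rate's window."""
    return any(a[3] >= b[2] for a, b in zip(table, table[1:]))


def pick_rate(measured_ticks, table):
    """Closest nominal rate whose window contains the measured width, else None."""
    hits = [(abs(measured_ticks - nominal), rate)
            for rate, nominal, low, high in table if low <= measured_ticks <= high]
    return min(hits)[1] if hits else None
```

The overlap risk sits at the fast end, where nominal tick counts are small and neighboring windows are closest in absolute terms.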

Let's look back at the diagram. M1-yellow and M2-violet show the width of 9 of the ten bits (remembering that the trailing rising-edge is just the start of the stop-bit). We show these nine bits to be 19.51 uSec wide, or 2.168 uSec / bit. This is 461,301 bits/sec, which we recognize as the 460,800 bps setting of HyperTerminal.

Moving to the Tx side now, we see that M3-green and M4-cyan mark the first 9 bits of the transmitted response. This is measured at 19.58 uSec, or 2.175 uSec / bit. This is 459,652 bits/sec, which again corresponds to our 460,800 bps, so we see that our AutoBaud routine has in fact chosen the correct baud-rate.
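The arithmetic behind both measurements is easy to script. This sketch converts a span of several bit times into a bit rate and reports the nearest rate from a hypothetical candidate list with the percentage deviation; the candidate list is an assumption, while the spans are the ones measured above.

```python
CANDIDATES = [115200, 230400, 460800, 921600]  # hypothetical candidate rates


def rate_from_span(span_us: float, bit_count: int) -> float:
    """Bits per second implied by bit_count bit times spanning span_us microseconds."""
    return bit_count / (span_us * 1e-6)


def nearest_rate(measured: float, candidates=CANDIDATES):
    """Closest candidate and the measured rate's percentage deviation from it."""
    best = min(candidates, key=lambda r: abs(r - measured))
    return best, 100.0 * (measured - best) / best


rx = rate_from_span(19.51, 9)   # M1-M2 on serial Rx
tx = rate_from_span(19.58, 9)   # M3-M4 on serial Tx
```

Both land within about a quarter of a percent of 460,800 bps, well inside normal UART tolerance.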

The command language for the propCAN says that when a carriage-return <CR>, $0D, is sent it simply responds with one. This is so that it is easy, programmatically, to ensure that the controlling program is in synchronization with the command parser in the propCAN device. Given this, then, we now add the additional operational requirement that when first starting communication with propCAN, we slowly send <CR>'s until we begin receiving them. Then we can start commanding the propCAN device with confidence.
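On the host side, the start-up rule above might look like this. It is a hedged sketch: `port` stands for an already-open serial port (a pySerial `Serial` opened with a read timeout behaves as assumed), and the retry count and spacing are invented for illustration.

```python
import time


def sync_propcan(port, attempts: int = 10, gap_s: float = 0.1) -> bool:
    """Send <CR> slowly until propCAN echoes a <CR>; True once in sync.

    `port` is any object with write(bytes) and read(n) -> bytes that returns
    b"" on timeout. The attempt count and gap are hypothetical values.
    """
    for _ in range(attempts):
        port.write(b"\r")
        reply = port.read(1)
        if reply == b"\r":
            return True
        time.sleep(gap_s)  # give AutoBaud time to measure and start the serial Cogs
    return False
```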

Turning on Serial -or- Who changed the baud-rate?

PropCAN is a self-contained little box (roughly 2.5" by 1.5" by 3/4") with a DE9P connector at one end for CAN and a mini-USB connector at the other. Oh, and some LEDs show at various places in this little box, too.

Why am I mentioning this? Well it has to just work when it is plugged in to USB for the first time. And I have found an issue which I didn't expect. (Ok maybe I should have or not, I'm not really sure.)

In my first tests of serial I set up the separate Tx and Rx drivers and set the baud rate to a fixed default (kind-of mid-range). I then set my terminal program to that baud rate and, after a few false starts and measuring of the baud rate seen by my analyzer (remember I'm probing the serial lines, too), it started working and all was well. I made good progress on implementing my command handler (the ASCII command set is presented in the draft manual found at the propCAN web-site).

Then on some sort of whim I changed the baud rate at my terminal program and... wait... hmm... nothing serial is working now! Huh?

I change it back; it works. I move to the new rate; it doesn't. What is going on? It always seems to take me a few moments to get over the initial shock of these events. Finally I remember that I am already set up to measure the serial data to see if I'm even still sending data to the device.

So I take a trace, set my markers, and then calculate the baud-rate I'm now seeing -and- I have to pause and recheck. Sure enough, the FTDI chip is actually being affected by the baud rate the software is configuring. When I changed the baud rate in HyperTerm, it changed the back-end baud-rate of the serial Tx/Rx lines coming out of the FTDI chip and going to the Propeller. I totally was not expecting this behavior. Usually, in my experience, once a USB device is involved the baud rates become meaningless. In this case I now have a device which can be running at any of the FTDI-supported baud-rates when the Propeller powers on. Hmmm...

Now you know the origin of the Auto-baud spin object in my software organization chart (earlier post). The manual now simply says that when first connecting to propCAN one must send a couple of carriage-returns before any other traffic so that propCAN can measure the currently configured baud-rate and set itself up appropriately.

Next post? Let's go over the Auto-baud turn-on (with analyzer trace) and then we'll get back to the New MCP 2515 SPI object turn-on.

Wednesday, January 9, 2008

Remembering the first day...

Earlier I stated that working with the propeller was "fun". I think back to first power-on of this device (late Nov 2007, Rev A1 board). The night before, I had completed the soldering of all the parts, studied the board for bad/missed solder joints... and then I cleaned and dried the board.

After letting it sit over-night I finished work, came home and grabbed the new board and cabled it up (plugged it into USB on my Dell XPS Gen4). I keep the audio on when I'm working with new USB devices so I heard the now overly-familiar "bing" when the new device was added and I saw that the FTDI chip was recognized. Watching my new device I happily see the USB Tx/Rx LEDs flashing. All of a sudden I stop. "What code can I run to exercise something?"

After a moment of thought, I remembered the simple code for LED flashing found in the Propeller Manual, so I reached for the book and started thumbing through it. After a little bit of hurried keying, and looking up which ports I had attached the error and warning LEDs to, I had some code which should work.

About this time my son walks in about to ask me something (it's always something technical, but on some topic I never seem able to predict ;-). I hold up my hand in a "please-wait" plea and I tell the Propeller tool to download the code to RAM and run it.

Both of us were amazed when it did and then it really did. It found the device and downloaded and verified the code and then it really did toggle the LED! Now that's fun! Having the board turn on so quickly!

In the midst of this "rush" of success I then pressed ^F11 to download to EEPROM. Again, it worked! I gotta say, for my own cobbled together schematic, doing board layout after carefully choosing parts, ordering the parts and boards, cutting out one of the boards from the rest in the panel and then soldering all the parts, to then see it "just work"... I was stunned. This all happened so fast that I wasn't prepared to do anything next. I turned around to my son, we both reveled in how fast this came up, and then I talked with him about the subject he had come into my work-room to ask about in the first place.

So, yes this propeller is fun! Many aspects of it are new (e.g., architecture, tool-set, microcode-like assembly language) and it's easy to interface devices to it. What a great "day two" with the Rev A.1 board! Thanks Parallax, this is great fun!

Hmmm, now that it is working what do I really have?

Ok, now we have the SPI engine running. Now let's measure the performance to see how close it is to what we need.

In this view of the traffic I've zoomed in to one SPI transaction. That is, we assert /CS, send one or more commands, maybe read data and then we de-assert /CS.

Referring to our violet /CS line you see that I've zoomed in so that /CS asserted is about the whole width of the waveform window. You'll also see that I've added a few markers (colored vertical dashed lines) denoting signal rising/falling edges of interest. At the top of the waveform you see three measurements in micro-seconds. Let me discuss each of these and what these times now tell us.

Let's work our way in from the outside. If you look at the /CS line you'll see that I've placed a marker (M1, yellow) at the falling edge (signal asserted) and (M2, violet) at the rising edge. The first measurement at the top shows that M1 to M2 = 389.19 uS, which shows fairly precisely how long /CS is asserted.

Now this is a simple transaction. I'm asserting /CS, writing 4 bytes to the MCP 2515 and then de-asserting /CS.

The starting SPI engine code is built to do one byte at a time, handing each byte from one Cog to the assembly language back-end Cog. To measure how long it takes to write a single byte, let it complete and then write another, I've placed another two markers: (M5 - Red, and M6 - Blue). The measurement at the top in this case shows M5 to M6 = 103.01 uS. So now we know that handing a byte to the assembly engine, waiting for it to be sent and then handing over the next takes ~100 uSec per byte.

Now, I wanted to ask one more question of this waveform. How long does it take to send the one byte once the Assembly back-end Cog starts to send it? In this case I've added two more markers (M3 - green, and M4 cyan). The measurement at the top for these shows M3 to M4 = 4.33 uS.

So, in this case I'm seeing a bit-rate (within the byte) of 1/4.33 uS, or ~231 kbits / sec.

However, since our byte-rate is much slower, this degrades to 8 bits / 103 uS, or 1/(103/8 uS) = ~77.7 kbits / sec.
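The rates in this measurement follow directly from the marker spans. This sketch assumes, as the 1/4.33 uS calculation above does, that M3 to M4 spans one bit period, and it adds the whole /CS window (four bytes in 389.19 uS) for comparison.

```python
def bits_per_sec(bits: float, seconds: float) -> float:
    return bits / seconds


within_byte = bits_per_sec(1, 4.33e-6)      # M3-M4: one bit period inside a byte
per_handoff = bits_per_sec(8, 103.01e-6)    # M5-M6: one byte per Cog hand-off
cs_window = bits_per_sec(4 * 8, 389.19e-6)  # M1-M2: four bytes under one /CS

for name, bps in [("within byte", within_byte),
                  ("per hand-off", per_handoff),
                  ("whole /CS window", cs_window)]:
    print(f"{name:>16}: {bps / 1e3:6.1f} kbits/sec")
```

Set side by side, the per-byte hand-off, not the bit clock, is what holds the transaction near 80 kbits/sec.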

There are a couple of issues we'll have to address now that we've looked at this performance:
  1. Handing one byte at a time between Cogs will not allow us to meet our desired performance given what we're seeing.
  2. Looking at the assembly code, we can dramatically increase its bit time performance if we move to a specialized driver (not the general demonstration driver we are starting with).
    The assembly is built to handle many variations of SPI, once we target to one device we can remove this all purpose code in favor of only enough code to do exactly what we need.
  3. A first attempt can be made to move the boundary to a single MCP2515 command but we will likely need to go all the way to handling full transactions in the assembly code not just single commands, we'll see.

NOTE: as a result of this study I came up with and posted the MCP 2515 SPI Engine specialized object: http://obex.parallax.com/objects/227/ which does indeed show that I'll have to go to full transactions in the back-end Cog but that's material for another upcoming post...

In the next post I'll fall back to the initial device turn-on and then move on to aspects of serial Tx and Rx.

Turning on the SPI Engine communication with the CAN controller

With the firmware organization figured out, we start to cobble together the first pieces of code which others have developed for us to test and learn from. In this case I'm starting with the SPI Engine 1.0. If you remember from the first post where I show which lines are probed, you know that I've probed the SPI bus and a few extra lines to/from the CAN Controller. The logic analyzer capture shows many bytes being sent to the MCP 2515, including commands for toggling the /Rx0BF, /Rx1BF, and /INT lines, so we can see that the lines assert when the MCP 2515 wants to signal the Propeller that it needs servicing.

From top down the diagram shows the following signals:
  • /Rx0BF
  • /Rx1BF
  • /INT
  • SPI CLK
  • SPI data to CAN
  • SPI data from CAN
  • /CS
  • /Reset
  • (the remainder can be ignored.)

A quick key to reading this: when the violet line is low (/CS asserted) we are commanding or reading data from the MCP 2515. The dark-red line is data from the Propeller to the CAN device, while the dark-green line is data coming from the CAN device. At the top in bright red/yellow/green are the handshake lines coming from the CAN controller, each of which has a specific meaning when controlled by the MCP 2515. In this test, however, I've set them up as general purpose I/O so I can toggle them and prove them to be working.

I'm seeing expected wiggles on all the right lines so for now (many software iterations to get to this point) we are having a great day!

Tuesday, January 8, 2008

Which Cog is doing what?

The propeller is "fun" in that we have eight identical CPUs on the one chip. So how should the work be divided amongst them? That's what this post is about.

Each CPU is called a Cog. As I'm starting this effort I'm attempting to smartly allocate just enough Cogs to create a reasonable separation of function, while leaving enough Cogs unused to run monitoring/debug functions.

The picture on the right shows my current draft functional breakdown and marks Cog assignments with the Blue Gear icon (each blue gear on the diagram means a different Cog is assigned to accomplish that function.) Major I/O pins are shown routed to these Cogs as well to show that each Cog interacts with a different part of the external hardware.

You can also see that I use queueing (Tx and Rx Q's) to stage traffic for handling by the various Cogs.
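The Tx and Rx Q's can be pictured as fixed-size ring buffers shared between a producing Cog and a consuming Cog. This is a generic single-producer/single-consumer sketch in Python with a hypothetical capacity and item type; it is not the propCAN queue layout.

```python
class RingQueue:
    """Fixed-size single-producer/single-consumer queue; one slot is kept empty."""

    def __init__(self, capacity: int) -> None:
        self._slots = [None] * (capacity + 1)  # spare slot distinguishes full from empty
        self._head = 0  # next slot to read; only the consumer advances it
        self._tail = 0  # next slot to write; only the producer advances it

    def put(self, item) -> bool:
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False  # full: the producer must retry later
        self._slots[self._tail] = item
        self._tail = nxt
        return True

    def get(self):
        if self._head == self._tail:
            return None  # empty
        item = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        return item
```

Because each index has exactly one writer, the producer and consumer never update the same field, which is what makes this layout attractive between Cogs.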

Finally, please note that I reuse one Cog by first loading the auto-baud detection code into the Cog and then when we know the Serial baud-rate set by the driver communicating with the propCAN device then the Cog is stopped and the serial receive code (or another task) is loaded into the same Cog since the auto-baud code is no longer needed.

So, here you have the initial proposed software organization. This mechanism is basically working today. However, we now have some issues to which we need to attend.

In the next post I'll show measurement of the current through-put as seen from a logic analyzer which is watching serial Tx/Rx, SPI bus and CAN Tx/Rx and some extra pins used for debug output.

Let's talk performance expectation

When I started this project expecting to use the Propeller chip and the FTDI chip I had to "run some numbers" to see if this could be done.

One aspect of using the Propeller chip is that there is a lot more software simulation of communication than with other microcontrollers. In the Propeller there is no USART, no SPI or I2C hardware. We have to do all of this in software. There are rudimentary proof-of-concept objects available for Spin (the propeller programming language) but they do not get near the speeds (in bytes per second throughput) that we need. How do I know what I need? Let's look at the CAN bus traffic to determine this.

This USB to CAN controller is for general CAN use but is being created so we can control or interact with a "constellation" of "Widgets" on the CAN Bus. See http://can-do.moraco.info/ for a description of the "Widget" including full user manual and Programmer's Guide. This Widget is designed to be the point of integration for each payload to the satellite's house-keeping computer. This "constellation" of "Widgets" is really the collection of payloads in the satellite all communicating with the house-keeping computer over the CAN Bus.

The protocol for communication with these Widgets is a small subset of the possible range of CAN messages. The actual messages used are called out in full detail in the appendix of the User's Guide at the aforementioned web-site.

In order to "run the numbers" I had to select the most useful way to view the traffic. That is, CAN messages consist of a header and an optional payload followed by a few more fields (checksum, ack, etc.). The message, from the start-of-frame bit through the CRC sequence, also contains "stuff bits", which the transmitter injects so that there are never more than 5 consecutive ones or zeros. This facilitates clock recovery from the serial data stream. You can read more detail in the CAN spec, but the important part for us is that there is a minimum and a maximum bit length for each payload size (0-8 bytes). For performance measurement, then, I chose to use the minimum (which we mostly can't attain) so that we'd get the best saturation of the CAN Bus.
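Bit stuffing itself is mechanical. This sketch applies the CAN rule to a string of bits: after five consecutive identical bits the transmitter inserts one bit of the opposite value, and that stuff bit counts toward the next run. It ignores which frame fields are stuffed and is meant only to show why a frame's length on the wire depends on its content.

```python
def stuff(bits: str) -> str:
    """Insert a complementary bit after every run of five identical bits."""
    out = []
    run_bit, run_len = None, 0
    for b in bits:
        out.append(b)
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:
            stuffed = "1" if b == "0" else "0"
            out.append(stuffed)
            run_bit, run_len = stuffed, 1  # the stuff bit starts the next run
    return "".join(out)
```

Alternating data adds nothing, while long runs of equal bits add one bit per five, which is the spread between the minimum and maximum frame lengths.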

We have 0-byte, 2-byte and 8-byte payloads in the AMSAT protocol so my numbers yield the following as goals:

AMSAT CAN Bus rate: 800k bps (800,000 Hz)

Minimum bit lengths for CAN messages:

  • 0-byte payload: 47 bits for a max rate of 17,021 messages/sec
  • 2-byte payload: 63 bits for a max rate of 12,698 messages/sec
  • 8-byte payload: 111 bits for a max rate of 7,207 messages/sec
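One field-by-field accounting that reproduces these minimums, assuming standard 11-bit identifiers, no stuff bits, and counting the 3-bit intermission between frames, is sketched below.

```python
BUS_BPS = 800_000  # AMSAT CAN bus rate

# Standard (11-bit ID) data frame with no stuff bits, plus the inter-frame gap.
FIELDS = {
    "SOF": 1, "identifier": 11, "RTR": 1, "IDE": 1, "r0": 1, "DLC": 4,
    "CRC": 15, "CRC delimiter": 1, "ACK slot": 1, "ACK delimiter": 1,
    "EOF": 7, "intermission": 3,
}


def min_frame_bits(payload_bytes: int) -> int:
    return sum(FIELDS.values()) + 8 * payload_bytes


def max_messages_per_sec(payload_bytes: int, bus_bps: int = BUS_BPS) -> int:
    # Whole messages only, so round down.
    return bus_bps // min_frame_bits(payload_bytes)
```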
Oddly enough these numbers inform our serial (propeller to FTDI USB chip) data-rates by the length of the 8-byte payload packets and inform our SPI (MCP2515 to propeller) data-rates by the max rate of zero-byte payload message arrivals from the CAN Bus. These numbers are then:

  • Max Serial Tx rate: 1.51 Mbits/sec in binary megabits, about 1.59 million bits/sec (10-bit characters at 22 chars/message)
  • Max SPI rate: 1.8 Mbits/sec in binary megabits, about 1.91 million bits/sec (17,021 messages per second at 14 bytes per message)
    [actually this might be slightly higher based on SPI transactions used]
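Both requirements are products of message rate, units per message, and bits per unit. This sketch runs the numbers with the 22-character and 14-byte message sizes from the list above; the listed 1.51 and 1.8 figures match megabits of 2^20 bits.

```python
def required_bps(messages_per_sec: int, units_per_message: int, bits_per_unit: int) -> int:
    return messages_per_sec * units_per_message * bits_per_unit


serial_tx = required_bps(7207, 22, 10)  # 8-byte payloads as 22 ASCII chars of 10 bits each
spi = required_bps(17021, 14, 8)        # 0-byte payload arrivals, 14 SPI bytes each

for name, bps in [("serial Tx", serial_tx), ("SPI", spi)]:
    print(f"{name}: {bps:,} bits/sec = {bps / 1e6:.2f} x 10^6 = {bps / 2**20:.2f} x 2^20")
```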

So, there we are. We have pretty high maximum data rates for both Serial Tx and SPI.

Now, let's move on to discussing the first firmware functional organization.

Monday, January 7, 2008

And so it starts...


Ok, I've a lot to do to enable this device. It's basically working but it lacks performance in many areas. I'll eventually describe each area and see if we can address them one by one. Hey, if you have constructive ideas, "I'm all ears". ;-)

I hope you enjoy this trek...

Oh, yes... what is propCAN? The diagram shows the major blocks of this device, the protocols being used and which signals are probed by the logic analyzer so we can verify the software operation.

For device pictures (my hand soldering of surface mount parts ;-), board layout, logic analyzer probe attachment, and further device description see: http://propcan.moraco.us/

Next up... let's talk general design goals followed by software organization (with links and pics, of course)