
libkakapo is now LGPL

After a bit of discussion, I came to the realisation that while I am keen to make sure that libkakapo is open source, I only care about how others improve the library, rather than what they do with it.

Pragmatically, I don’t mind if libkakapo is used by a closed-source project, so long as any great improvements that project makes end up in the main source.

I had chosen GPL for the library more or less by default. But using GPL for the library means that your projects must be under the GPL as well – the linking clause requires it. This makes no sense for a library where I’m only interested in the drivers, not your project source.

LGPL means that open source projects can use these drivers (which is good, because they can’t use Atmel’s ASF), and closed source projects can as well. The only obligations closed source projects have are to distribute the library source with their firmware (or generally make the library source available), and to include any modifications they have made to it.

I think this is a nicer balance and I know a few people have asked about how to use this library on closed-source projects they’re being paid to develop for customers. I’m actually very supportive of that, because it indirectly funds improvements to the library. And that’s definitely a good thing.

So there we are: libkakapo is now LGPL.

Support all the things

I started out libkakapo as a basic set of drivers for one specific board, with the intention that there would be little abstraction of the real hardware configuration. The point behind doing this is to allow users to step away from a heavily-abstracted framework like Arduino into something which exposes a bit more of how things work.

The drivers in libkakapo are intended to be boilerplate, rather than abstracting. But the early implementation of it was too focused on one specific board with one specific MCU. This resulted in using a simple integer for what instance of a specific digital interface we wanted (eg, the “first” USART was 0.. whatever “first” meant).

I wasn’t very happy with this, especially given that “first” was unclear, as was “second” and so forth. It made no sense to hide the hardware names when I was not trying to hide many other details, but a solution that was both friendly and fast wasn’t forthcoming.

The fast thing is important. If you look at many abstracted interfaces, they boil down to mapping between a friendly name or identifier and the actual resource. digitalWrite() in Arduino hides the real port and pin names, so it needs mappings to deal with this.

If a mapping is inefficient, that’s fine in setup but a real problem in regularly called code, especially in interrupts. Mapping from a number to a port is easy, but mapping a port to a number is harder. It seemed to boil down to either an expensive series of if statements or an excessive number of defines.

I had kicked the idea of fixing that into touch, and got on with other drivers. But recently, during beta testing with Nicegear, it’s become a bit more obvious that the simple integer doesn’t work. I’ve built a beta board for them with a different MCU from the expected production run, using a 128A4U instead of a 64D4. The board physically supports it, so it was an easy thing to do.

But now libkakapo is stuck: do I fork it for different MCUs, or support many in one library? And then there’s still the problem with that simple integer.

In the end, the best choice was to go back to the decisions I was making about abstraction. Exposing the hardware names was the right thing; I just needed a way to map them efficiently in both directions.

The solution was to use an enum type which listed all the hardware names (actually a similar name, so it wouldn’t overlap with avr-libc defines), and use this as an index into an array. The whole explanation is in this github ticket.

The upshot is we get to use hardware names, and the ISRs which depend on it can select the right port handler immediately, instead of a long series of if statements. Now it looks a lot better and an annoying bit of abstraction is removed. And it’s much easier to support almost any XMEGA MCU now.
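To make the idea concrete, here is a minimal sketch of an enum used as an array index. It is not the actual libkakapo code: the type names, the enum values and the contents of the per-port struct are all made up for illustration, and it is written so it also compiles off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: the enum values mirror the hardware names but are
 * spelled differently so they cannot collide with avr-libc's USARTC0 etc. */
typedef enum {
    usart_c0 = 0,
    usart_c1,
    usart_d0,
    usart_d1,
    USART_COUNT /* number of entries, not a real port */
} usart_portname_t;

/* Per-port driver state (assumed layout). A real driver would also hold a
 * pointer to the hardware registers and its buffers. */
typedef struct {
    uint8_t  in_use;
    uint32_t rx_count;
} usart_port_t;

/* The enum value is the array index, so name -> state is a single lookup. */
static usart_port_t usart_ports[USART_COUNT];

usart_port_t *usart_lookup(usart_portname_t name) {
    assert(name < USART_COUNT);
    return &usart_ports[name];
}

/* Shared receive handler. Each ISR passes its own compile-time constant,
 * so no chain of if statements is needed to find the right port. */
void usart_rx_handler(usart_portname_t name) {
    usart_lookup(name)->rx_count++;
}

/* On the real target each vector would then be a one-liner, for example:
 *   ISR(USARTC0_RXC_vect) { usart_rx_handler(usart_c0); }
 */
```

Supporting a different MCU then mostly means changing which names appear in the enum, since the array sizes itself from `USART_COUNT`.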

XMega Clock System

The change between the AVR Mega and XMega architectures that I like the most is the clock system. It would be fair to say I have little love for the way that the Mega implements clock options, and given the changes in how it works in the XMega I don’t think I was alone.

On a Mega, the choice of system clock is set by fuses. It generally isn’t intended that you change fuses from your main code, instead they should be changed mostly by external programming or when asked by a bootloader. That is, you set fuses when doing a brand new build on an out of the box chip and never at any other time.

Mega fuses are a pain to work with. Apart from being inverted in sense (ie, 0 and 1 are called programmed and unprogrammed, and programmed means “active”) a single incorrect write to them can brick the chip and disable important programming interfaces. If you wanted to run a Mega at full clock speed, you had to carefully ensure an external crystal was present and then write the appropriate fuses. And hope.

(Pedant alert: technically there was one clocking option you could change from within your main code, and that was the system clock prescaler. It’s initialised from a fuse (CKDIV8) to divide by 8 for a safe default from a variety of possible sources, but you could switch that to undivided in your main code quite easily and non-destructively. That still limited you to the internal 8MHz RC oscillator, however.)

The XMega does away with this disaster, and instead allows you to change the system clock source from within your main code at any time. It allows you to run the chips at their full speed without external components. You can use external components of a variety of speeds and multiply the clock in addition to dividing it. It’s a vastly better design.

By default, an XMega will always boot off the internal 2MHz RC oscillator, with the PLL multiplier disabled and no prescaler divisions, giving us a nice, simple, reliable default clock to work with. Should any clock option you set later fail (and, yes, it has clock failure detection) the chip will always force itself back to the 2MHz RC oscillator. (It also generates a non-maskable interrupt for this situation, so your main code can do something about it.)

This means, in general, whatever you do to the chip is a simple flash of some new code to apply a different set of clocking options, even if you have pieces missing off the board you’ve built.

All XMegas are capable of running at 32MHz @ 3.3V and there’s a variety of ways to get there. The easiest to start with is using the internal 32MHz RC oscillator and running everything at the same clock rate. The only trick to enabling this is to ensure the oscillator is running before attempting to use it, and then to change the clock via a protection mechanism:

OSC.CTRL |= OSC_RC32MEN_bm; /* start 32MHz RC oscillator, leave the others as they are */
while (!(OSC.STATUS & OSC_RC32MRDY_bm)); /* wait for ready */
CCP = CCP_IOREG_gc; /* allow changing CLK.CTRL */
CLK.CTRL = CLK_SCLKSEL_RC32M_gc; /* system clock is internal 32MHz RC */

This does mean all peripheral modules are running at 32MHz as well, which all modules will accept by default. Some modules (EBI, HiRes Timer Extensions) run off faster clocks, which can be twice (PER2) or four times (PER4) the maximum system clock speed. I might cover setting up the clock system for those at a later date.

Otherwise, that’s all which is required to get an XMega with no external components running at full clock speed. A lot nicer than a Mega!

An external crystal is not much harder. Again, you need to start the appropriate oscillator, wait for it to be ready, and change the clock via a protection mechanism. But in this case, we’re going to start with an 8MHz crystal and multiply it up to 32MHz using the PLL:

OSC.XOSCCTRL = OSC_FREQRANGE_2TO9_gc | OSC_XOSCSEL_XTAL_256CLK_gc; /* configure the XTAL input */
OSC.CTRL |= OSC_XOSCEN_bm; /* start XTAL */
while (!(OSC.STATUS & OSC_XOSCRDY_bm)); /* wait until ready */
OSC.PLLCTRL = OSC_PLLSRC_XOSC_gc | 0x4; /* XTAL->PLL, 4x multiplier */
OSC.CTRL |= OSC_PLLEN_bm; /* start PLL */
while (!(OSC.STATUS & OSC_PLLRDY_bm)); /* wait until ready */
CCP = CCP_IOREG_gc; /* allow changing CLK.CTRL */
CLK.CTRL = CLK_SCLKSEL_PLL_gc; /* use PLL output as system clock */

While this is quite a lot more code than using a crystal clock on a Mega, it is still better since we get flexibility about what kind of crystal we’re using and the desired real clock rate we want from it. We could have done this with just a 32MHz crystal, and avoided the PLL. But only needing a cheaper 8MHz one and being able to multiply it up to the desired clock rate is a nice feature.
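The 0x4 in the PLL setup above is just 32MHz divided by 8MHz. As a rough sketch of that arithmetic for other crystals, here is a hypothetical helper (the function name and the macro are mine, not part of avr-libc or libkakapo). The only hardware fact it relies on is that the multiplication factor in OSC.PLLCTRL is a 5-bit field, so it accepts 1 to 31.

```c
#include <assert.h>
#include <stdint.h>

/* The PLL factor field in OSC.PLLCTRL is 5 bits wide: usable values 1..31. */
#define PLL_FACTOR_MAX 31u

/* Hypothetical helper: return the PLL multiplication factor that turns
 * source_hz into target_hz, or 0 if no whole-number factor in range exists. */
uint8_t pll_factor_for(uint32_t source_hz, uint32_t target_hz) {
    if (source_hz == 0 || target_hz % source_hz != 0) {
        return 0; /* no whole-number multiplier exists */
    }
    uint32_t factor = target_hz / source_hz;
    if (factor < 1 || factor > PLL_FACTOR_MAX) {
        return 0; /* does not fit in the 5-bit field */
    }
    assert(source_hz * factor == target_hz); /* sanity check on the arithmetic */
    return (uint8_t)factor;
}
```

With an 8MHz crystal and a 32MHz target this returns 4, which would be OR-ed into OSC.PLLCTRL in place of the literal 0x4. It only checks the multiplier itself; any other limits the datasheet places on PLL input and output frequencies still need checking by hand.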

It would help if I knew what done looked like

Clearly posting about the NTP Server v1.4 board being done was a mistake, as I ended up launching myself into a major overhaul of it anyway. Yep, 1.4c was forked into 1.4d and has many many changes.

Part of the problem is I’m not settled on what “done” looks like. I was originally planning on not migrating from a simple watch crystal RTC to the DS3234 RTC until after 1.4 was completed. But working with the XMega RTC has given me a lot of niggly issues around getting it to behave well with aligning the clock.

The problems with the XMega RTC are a mix of the consequences of using a 20ppm crystal (ie, you get 20ppm accuracy, and that’s not so cool) and some interesting choices about how it’s built.

Some AVR Megas have an async clock timer, usually the single 8-bit timer. It’s pretty basic, but 8-bit is actually a good choice: you can cover off the rest of the needed bits for a 32.768kHz crystal in an interrupt handler, and one interrupt every 256 ticks (128 per second) is not a heavy workload. The Xmega doesn’t have an async clock timer; instead there’s an explicit 16-bit RTC with a crystal input.
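As a rough illustration of covering the upper bits from the overflow interrupt on a Mega, here is a sketch that keeps the high part of the count in software. The function and variable names are hypothetical, and the hardware count is passed in as a parameter so the logic can be tried off-target; on a real part it would be read from the async timer's count register.

```c
#include <stdint.h>

#define CRYSTAL_HZ 32768u /* watch crystal frequency */

/* Upper bits of the tick count, maintained in software. */
static volatile uint32_t overflow_count;

/* Would be called from the timer overflow ISR, once every 256 ticks
 * (128 times per second with a 32.768kHz crystal). */
void on_timer_overflow(void) {
    overflow_count++;
}

/* Combine the software upper bits with the 8-bit hardware count.
 * hw_count stands in for the timer's count register. Real code must also
 * guard against an overflow landing between the two reads, and this
 * 32-bit tick count wraps after 2^32 ticks (about 36 hours). */
uint32_t ticks_now(uint8_t hw_count) {
    return (overflow_count << 8) | hw_count;
}

/* Whole seconds elapsed since the counter started. */
uint32_t seconds_now(uint8_t hw_count) {
    return ticks_now(hw_count) / CRYSTAL_HZ;
}
```

After 128 overflows the tick count reaches 32768, which is exactly one second.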

At first (mostly when reading the datasheet) this looks pretty good. But there are a few annoyances. A number of Megas (and, more specifically, the ones I’ve been using like the 1284P) allow not only a 32.768kHz crystal but also an external clock input. This allows you to use a variety of other clock options. The Xmega RTC has no such option, not in the A and D series anyway, and that’s almost all of the ones actually shipping. (You can brute force an external clock into a crystal input but I’d prefer to stay within what the datasheet says the chip is expecting.)

The other issue with the RTC is the way you access it. Because it’s running in its own clock domain – specifically off whatever 32-ish-kHz source you chose – you have to jump through a lot of busywaiting to get configuration pushed into it. Busywait. Set something. Busywait. Set something else. And so on. The same problem plagues accessing the current count: if you want to change it, that’s yet more busywaiting.

You also don’t get any capture options with the RTC. It has just compare and period registers, and that’s it.

This makes it quite difficult to align the RTC to a fine degree. Sure, it’s great if you just need a simple timer to fire, say, a 1Hz interrupt to do something else in your code. It’s actually very good at that. But if you actually care what “now” really is, then it doesn’t cut it.

Falling out of that has been a large redesign, bringing in the DS3234 RTC to replace the simple crystal. Thankfully, while they did take away async clocking for a timer, they replaced it with a much more awesome system, and the result is that any of the normal 16-bit timers can provide a better replacement with finer control and more reliable capture. I’ll probably post about that some other time.

I’ll still be using the Xmega RTC, but it’ll just be there as a system clock for trivial event timing that we don’t really care about alignment for. Hell, 1% RC oscillators used as an RTC will be fine for many other timing needs in the code.