Sustainable Suburbia

If you've got a problem, don't care what it is

If you need a hand, I can assure you this

I can help, I've got two strong arms, I can help

It would sure do me good to do you good

Let me help

Few people today are aware of Problem Oriented Languages, a term coined by Charles H. Moore just over 45 years ago, in June 1970.

Here is his unpublished book, retyped into HTML some years ago, which I came across again today, this time on Frank Carver's Raspberry Alpha Omega blog.

Frank is clearly an experienced and talented software engineer, and understands the inner workings of computer software in ways that I just struggle to grasp. So it was a pleasant surprise to find that Frank and I have been following virtually parallel journeys for the last couple of years: he from a software perspective, and I from a hardware perspective.

This post is not specifically about Problem Oriented Languages, but rather about how they are still relevant today and can be applied to a range of fields of computer science.

A Problem Aired...

My immediate problem is that I am now working with a range of microcontrollers, both AVRs and ARMs of several different flavours, and I need to find some common ground between them and establish a comfortable development environment founded on shared territory. Throw into the mix some stack-based soft cores running on FPGAs, and the chance of finding my Nirvana rapidly fades into dust.

All of these processors fall within the 20MHz to 200MHz clock-speed range, and really don't have enough memory to support an operating system like Linux. Some of the larger parts can be programmed in MicroPython or JavaScript, but that's not much use for the soft cores or the smaller parts.

I wondered for a while whether Arduino might become the lingua franca, at least for "hobby" projects, because as well as on the ATmega, it has been implemented on the STM32F ARM Cortex-M3 and M4 parts and on the ZPUino soft core, hosted on the Spartan 6 of the Gadget Factory Papilio Duo.

So for the moment, you have a range of different microcontrollers, with which you can share Arduino sketches and libraries.

A word about FPGAs and Soft Core CPUs

In the last few years, FPGAs have become commonplace, and the toolchains needed to develop applications on them have become free to use. Several tech suppliers are offering low-cost dev boards based around a small FPGA, with enough supporting hardware to allow programming via serial or JTAG, plus support chips such as external SRAM, SDRAM and configuration flash.

So far my journey has taken me to the Xilinx Spartan 6, as there are several hobby boards based around this device. This part also appears to have a lot of support from the emerging FPGA community.

Jack Gasset and his team at Gadget Factory have democratised the FPGA, in a similar way to how the Arduino Team democratised the ATmega and C++.

By porting an open source 32-bit soft core CPU, plus a library of other useful hardware resources such as sound chips and music synthesisers, they have made it relatively easy to create your own hardware designs and control it all with the ZPUino soft core, executing Arduino sketches at 100MHz.

The great thing about the ZPUino FPGA soft core for projects is that there is a hardware VGA generator, plus a port of the Adafruit graphics library, which together support 800x600 VGA. You just plug in a VGA monitor and you can draw graphics to your heart's content, all in a few lines of Arduino code. This is a super-quick Arduino, with colourful graphic output that you are totally in control of. Other libraries allow you to add a PS/2 keyboard and mouse, so you have the makings of a complete development environment on an FPGA. It's the computing equivalent of an early 1990s workstation, but unconstrained by any operating system: effectively a blank canvas onto which you paint your computing dreams!

If you want to look further - this was built using a Papilio Duo FPGA board fitted with a Computing Shield. If you want to go this route, make sure you buy the Papilio Duo with the 2MB SRAM - as the cheaper 512KB part will severely limit your graphics play.

So far: more interesting options, but no single solution. And the allure of FPGA soft core processors was becoming a great distraction.

J1 - A Forth CPU

Almost exactly 3 years ago to the week, I was in New York for the Open Hardware Summit. One speaker was James Bowman, who introduced us to GameDuino, a bolt-on video system for the Arduino. In the form of a shield with VGA and audio connectors, it handled all of the video graphics, based on a neat Forth processor that James had designed, known as the J1. James had developed the J1 for a robotic video system based on a Xilinx Spartan 3E FPGA, which has more than enough grunt to handle the sprites, backgrounds and tone generators of a typical 1980s arcade video game.

The original J1, presented at EuroForth 2010, is described here: http://www.excamera.com/files/j1.pdf
Further links to the J1 and GameDuino are here.

James has since ported it to the Spartan 6, made a few demon tweaks, and it can now run on the Papilio boards. The J1 can be instantiated as either a 16-bit or a 32-bit processor, and clock speeds approaching 100MHz are achievable.

In addition to developing the J1 hardware, James has created a version of Forth to run on it known as SwapForth, and has leveraged some open source FPGA tools to create an open core processor, running open source software - developed with open source tools. This video gives a quick demo.

I am still very much digesting the last 3 years of posts on Frank's blog. The next post here will try to make sense of it all.

This is a follow-on from yesterday's post, where I described some of the CPU technology I currently deal with, and how great it would be if there were a common language that would run on them all.

I am currently working with STM32Fxxx ARM Cortex-M3, M4 and M7 devices, having designed half a dozen PCBs based around them in the last couple of years. I have also worked with ATmega parts, mostly Arduino derivatives, and now I am branching out into FPGA soft core CPUs.

I am not a natural or native software engineer, and I feel the need to simplify the processes to the point where I understand what is going on and am in total control of the device. This low-level approach rather constrains the scope of what I can achieve, but it gives me the satisfaction of understanding the whole picture. It has led me into the territory of bare-metal programming, something that I am slowly starting to get a taste for.

To make life easier, we surround ourselves with familiar, useful objects and tools. As we become accustomed to these new things and learn how to use them efficiently, they improve the quality of our output and enrich our (working) lives. It's been the same for the last 50 years, ever since computers first started to appear in quantity.

But more computers doesn't always mean better. I have about 7 within arm's reach, and one in my pocket. Fortunately 6 are switched off, and the one in my pocket has not disturbed me for at least 30 minutes. There's the laptop I am writing this on, which I bought about 5 years ago and which takes 5 minutes to boot, and the Android in my pocket, which I use for making calls and sending texts. Most of my interactions with modern operating systems are solely as a user of pre-packaged applications, whose inner workings I know nothing about and probably don't have the time to care about.

I must confess I don't know how to drive a Linux machine through its command line, I only powered up a Raspberry Pi for the first time in April this year, and I am totally lost on a Mac. To make things worse, I have no idea why my unbranded Android keeps popping up text box warnings in Chinese!

So - as a dinosaur of the digital age, I will stick to what I know and remain in my comfort zone - which is as close to the bare metal as possible.

Most microcontrollers are programmed in C or one of its close cousins, with the language tools normally residing on a much more powerful machine, such as a laptop, running whatever flavour of operating system suits the user. As confessed above, I have grown up with Windows, as have most of the software tools I use in my day-to-day work and play. It's taken me several years to master one PCB design package, so why should I struggle like an infant trying to learn another, just because it's the flavour of the month? My "muscle memories" are tuned to driving EagleCAD, so when out of interest I tried KiCAD, it made for two of the most frustrating, unproductive hours of my adult life. I am beginning to think that what they say about old dogs and new tricks is not only true, but just a polite way of saying "move over, Grandad, you had your chance".

So this is increasingly why tech people of my generation cling to the foundations, and avoid climbing the towers of Babel that have been built, ever more skyward, on top of them.

This is probably not what I intended to write this time - but I feel much better for getting it off my chest.

Following my previous post, which turned out to be a rant about the ever-increasing complexity of software, I decided to dig out a draft post from earlier this year which describes my continuing investigation into how the lower levels of computing languages are implemented.

The purpose is to create a set of low level tools which allow application code to be developed easily and interactively on a microprocessor or soft CPU with limited memory resources.

The microprocessor may be an ARM or a soft core running on an FPGA, and with no C compiler available, how do you start to bootstrap the processor from first principles?

Booting from Scratch

In bygone generations of computer hardware, there would be a set of toggle switches which could be set to represent the binary instruction words. These words would be individually deposited into memory and the program counter advanced. A very short routine, consisting of only a couple of dozen instructions, would be manually toggled into memory, allowing a paper tape to be loaded. Great ingenuity was applied to the hardware architecture so that these loader routines were short.

Daniel Bailey has devised a fun project: his C88 computer is a home-brew CPU on an FPGA which takes one back to the very early methods of booting a CPU from scratch. This blog post and video explain it nicely. Daniel will be presenting his C88 at the forthcoming OSHcamp, at the end of September in Hebden Bridge.

As technology progressed and IC memories became available, interchangeable EPROMs were used to allow the program code to be quickly modified and re-run. However, there might be a 20 minute delay between detecting a bug and programming another EPROM with the corrected code. To avoid the time needed to erase and reprogram EPROMs, a non-volatile RAM was often used to emulate the EPROM and hold the program code. I spent a summer in my late teens programming a Z80 dev board in this manner.

These days almost all microcontrollers have on-chip flash memory and some method of in-circuit serial programming (ICSP). As a PC or laptop is invariably used for program development and code compilation, virtually all of the development tools are hosted on the PC, and very few systems have the means to edit and recompile code on the target system itself.

There has to be another way

There is, however, another way of working, where a minimal interactive environment is loaded onto the target system, allowing programs to be developed and debugged on the target itself. Very little additional support is needed other than a serial interface. The code can be developed in a text editor running on the microcontroller. This is no different from the way that programs in interactive languages such as BASIC were developed on home computers in the early to mid 1980s.

Interactive development is a completely different method of working from the usual edit, compile, load, test approach that has arisen out of the almost universal use of compiled languages, in particular C and its derivatives. It was only when PCs gained more RAM, disks and CPU speed that C compilation became practical.

Forth was one of the first interactive languages, developed in the 1960s by Charles H. Moore, and it is Forth that influences my investigations here.

Forth might be compared to assembly language, as it deals with low level operations involving snippets of machine language that are threaded together to create some program function. However it is much more than just an assembly language, as it provides the means to edit, compile and assemble source code, in an extensible manner, using a very compact language.

Forth does away with much of the clutter of other languages, and gives the programmer direct access to the resources of the processor. Forth offers high-speed execution, especially on FPGA hardware designed specifically to match the primitive instruction set of the Forth language. Modern, low-cost FPGAs can run a Forth engine at a clock frequency of some 50 to 200MHz. At this speed the interactive nature of the language becomes extremely fast, and the edit-compile-execute cycle is cut to just edit-execute. This is a significant saving in time: ideas can be tested and rejected in the time otherwise spent compiling a large application.

Forth uses "words" to perform program functions. A Forth word ultimately causes a series of machine language subroutines to be executed. In some respects, a Forth word may be likened to a LEGO block, which can be used with other blocks to make a more complicated structure. These structures may be further combined to build up the entire object: in our analogy, the program application.

Forth words are threaded together, like beads on a string, with the end of execution of one word, rapidly jumping to the beginning of the next.

Brad Rodriguez has an excellent article "Moving Forth" which explains how this threading process is achieved. Although written over 20 years ago - the fundamentals have not changed.

SIMPL

In an attempt to understand the fundamentals of Forth, I studied another threaded, interpreted language, SIMPL - the Serial Interpreted Minimal Programming Language. I have blogged several times about SIMPL since May 2013 - when I first came across it. Refer back to previous posts for details.

SIMPL takes some aspects of Forth and packages them in an easy to understand manner which may be adapted to run on a variety of microprocessor platforms. I first used SIMPL on an ATmega328 Arduino, but have subsequently ported it to ARM and FPGA soft core processor targets. SIMPL is written in more or less standard C, which makes it portable between processors, without the complexity of dealing with the machine language of each target processor. It is a stepping stone towards a real Forth.

SIMPL provides an easy way of sequencing blocks of code functions, and doing it in an interactive manner from the serial keyboard interface. It's not so much a programming language - more a way of automating the access to routines - and done in an interactive manner.

As an example:

Suppose you have just written a function in C to plot a graphical character on the screen, and you wish to see what it looks like on a VGA graphics adaptor or a TFT LCD screen. To test this function, you need to supply it with parameters for the x,y position at which you want it plotted, and possibly the foreground and background colours. If you hardcoded these parameters and got them wrong, you would have to go back, edit and recompile until you got the effect you were looking for.

SIMPL provides the mechanism to enter these parameters and execute the function interactively. If you don't like the first result, you can change the parameters with a few keystrokes and test again. You have virtually eliminated the edit-recompile cycle from the development of new code functions.

The Mechanics of SIMPL

SIMPL consists of three short functions that provide a complete serial interpreter, or shell, to accompany the rest of your program. Calling the SIMPL interpreter each time the main loop executes provides the means to interact with the rest of the program.

The SIMPL interpreter consists of the following 3 functions executed sequentially within a loop.

txtRead() takes characters from the serial input and loads them into a buffer beginning at address buf, which has been allocated a length of 64 characters.

txtChk() looks at the first character to see if it is a colon (:). If so, it copies the input text string to a specific address to form a "colon definition", an idea borrowed from Forth.

txtEval(addr) is the interpreter. It is pointed to the start address of the text string and evaluates the string a character at a time. The SIMPL interpreter has a few rules on how it treats each character:

If it finds numerical characters, it converts them to a 16-bit integer that it places on the stack.

If the characters are lowercase letters or punctuation characters, it will execute a given function each time that character is encountered. For example, you may have allocated the character "g" to plot out your new graphic symbol each time g is encountered.

Uppercase characters are used as addressing pointers - to point to additional character strings, either in Flash or RAM. The colon definition allows you to compose a new sequence of characters, and access it every time the uppercase character is typed.

Because the text is always loaded into a buffer - it is executed at full speed - and not influenced by the serial baudrate.

In short, SIMPL provides an easy interactive means to call your code subroutines (functions) in a sequence from the keyboard, and provide serial printed output back to the serial terminal.

SIMPL is essentially providing a subroutine address lookup, and call to that subroutine, encoded into the ASCII value of a character. Sequences of characters provide sequences of function calls. Numerical parameters may be passed to those subroutines using an elementary stack based method.

Extending SIMPL

SIMPL is not a fully blown language, but a method of automating the calling of functions according to a stored sequence of characters or tokens. It treats every ASCII character as a new instruction, so you cannot use multi-character words such as ADD to convey a meaning, as this would be interpreted as sequence A followed by sequence D followed by sequence D. This is where it deviates from Forth.

Its control structures are limited to simple loops, controlled by a single down-counting loop counter.

SIMPL is the stringing together of subroutines in Flash that have been created from compiled C code. A 16-bit ADD in SIMPL is going to be somewhat slower than the same operation in a directly executed subroutine, because each character must first be fetched and decoded by the interpreter.

SIMPL does provide some pointers to how a more Forth like version could be implemented.

It already has the means to interpret an integer number and place it on a stack.

It also has the means to take a sequence of tokens (in this case ASCII characters), look each one up in a jump table to retrieve the start address of a code subroutine, execute that subroutine, and then return to the interpreter to decode the next token. Each 7-bit token is an index into a jump table of addresses, from which the address of the next word to be executed is retrieved.

To extend SIMPL, it will be necessary to create some additional text handling code, so that typed words can be processed, added to a dictionary and have a code field associated with them. How exactly this will be achieved will be the subject of a future post.

In the earlier post "Building from the ground up" I wrote about how Forth could be used to provide a low-level development environment for an unfamiliar processor for which a C compiler was either not available, or not desirable to use.

The first task is to write a simple virtual machine for the target processor, and then use this VM to run your application code. This is similar to the way Java bytecode is run by the Java Virtual Machine.

The virtual machine need only have a handful of instructions, including basic arithmetic, logical operations and the ability to access and manipulate memory. From these primitive instructions, all subsequent instructions may be synthesised.

This is the approach described in "The Elements of Computing Systems" (TECS), which is also known as "From Nand to Tetris". The simple machine described consists of little more than an ALU, a program counter, two registers and a memory interface capable of accessing a 64K x 16 bit address space.

Designing a Virtual Machine

If it is possible to code up a simple virtual machine on any choice of microcontroller or FPGA soft core CPU, then it becomes practical to program all of them using the same virtual machine language.

The virtual machine will be a stack machine, with a data stack and a return stack. In a software virtual machine, these stacks will probably be created in RAM. If the machine were then implemented in FPGA hardware, the stacks would be separate RAM blocks, with a means of fast access that does not go through main memory.

The ALU has access to the top and second items of the stack, and will conduct arithmetic and logical operations on these elements, leaving the result on the top of the stack. If operands are required from memory, they will first have to be loaded into the top and 2nd stack locations.

Assume that the virtual machine ALU can perform the following operations. The arithmetic and logical operations are performed on the top 2 members of the data stack, returning the result to the top.

ADD
SUB
MUL
DIV

Multiply and Divide can be synthesised from shift and add/sub - but are time consuming without additional specialist hardware.

AND Bitwise AND
OR
XOR
NEG

SLL Shift Left Logical
SRL Shift Right Logical

@ Fetch a word from memory and place on the top of stack
! Store a word from the top of stack to memory

BRZ Branch if zero
JMP Unconditional Jump
CALL Call subroutine
RET Return from subroutine

LIT Put a literal number onto the stack

So with approximately 20 instructions, we have the basis of a stack machine that can do useful work.

The virtual machine to perform these operations can be written in a high-level language such as C, or actually implemented as a soft core on an FPGA. One such stack processor that lends itself to this is James Bowman's J1 Forth CPU. This compact, no-frills CPU can be defined in under 100 lines of C, or synthesised in FPGA hardware from under 200 lines of Verilog code.

The J1 is a practical CPU, designed for simplicity and optimised for high speed execution of stack based languages. It offers most of the instructions outlined above and forms the basis of an exploration into stack based CPUs. Initially it can be simulated on any processor in C code for experimentation and later blown into real high speed FPGA logic.

Even though the instruction set is very small, the 16-bit instruction word does not lend itself to easy or memorable coding in machine language. It has several instruction fields, and these have to be populated with the correct bit patterns if we are to make any progress off the starting blocks. The instructions do, however, map very well onto the Forth language, but there are other alternatives which could be explored.

At the bare minimum, we need an assembler to synthesise the instruction words from the various fields, and once we have a list of 16-bit hex instructions, we need to load them into RAM and have the simulator step through them.

An assembler typically scans through a text file containing instruction mnemonics, such as ADD, JMP, CALL etc and converts these into the instruction opcodes. It also looks for numerical constants, variables and addresses and assembles these into the instruction. Additionally, it looks for labels that identify subroutine addresses for jumps and includes these in the code.

A relatively simple program: Mnemonics in - machine language out

Another Option

However, there might be a different, more interactive way to do this, and this is where Forth or one of its near cousins comes in. For the purpose of my exploration, I want to see if the SIMPL interpreter can be used as a means to perform this assembly step.

As we know, the SIMPL interpreter will read a series of characters from a buffer, one at a time, and execute code associated with that character.

So to add 2 numbers (say 45 and 63) in SIMPL and print out their sum:

45 63+p

45 is interpreted as a literal to be pushed onto the stack
The space is used in SIMPL to push the stack down
63 can then be pushed onto the stack
+ adds the top two members of the stack leaving the result on the top
p prints the result to the serial terminal

This is almost Forth, except in SIMPL there is only a limited stack structure and the space is needed to command the stack to push down to accept another number.

Instead of executing the SIMPL instructions directly, we can hijack the SIMPL interpreter to synthesise the assembly language and the machine language needed for the virtual machine.

So 45 63+p is translated to assembly language:

LIT 45
LIT 63
ADD
CALL PRINT

Where PRINT is a subroutine that outputs the top of stack contents as a printed integer to the serial port

But translating to traditional assembly language with 3-letter mnemonics is an unnecessary intermediate step. The SIMPL interpreter can easily produce the instruction fields and generate the machine language directly:

802D LIT 45
803F LIT 63
6203 ADD
4100 CALL 100 (PRINT is at 0x0100)

So by this process of translation, the SIMPL language is the Assembly Language - we can find enough of the SIMPL character set to map directly onto the J1 instruction set, and any of the other command characters (like p) will invoke calls to subroutines.

Remember that SIMPL uses lowercase letters and punctuation characters as primitives, numbers are converted to literals, and capitals are reserved for the application words. This means that our little language can have approximately 60 primitive instructions, which is enough to do real work, yet takes up a fraction of the space used by the 170 words of a typical Forth system. Fewer words, less typing, yet still more readable than assembly language or machine code.

So let's look at the primitive words and how they might map onto the assembly language:

+ ADD
- SUB
* MUL
/ DIV
% MOD
& AND
| OR
^ XOR
~ INV
# LIT

@ FETCH
! STORE

: CALL
; RET

< LT
= EQ
> GT

j JMP
z JPZ

I find stack manipulations are never easy to remember in Forth (words like DUP, OVER, SWAP, NIP and DROP), but as these are an essential part of the language, they need a syntax to code them into the assembly language. Perhaps it might be possible to express the top two stack items as a two-member array between parentheses.

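DUP (1,1)
OVER (2,1)
SWAP (2,1)
NIP (1,3)
DROP (2,3)

Here each pair gives the original positions (1 = top, 2 = second, 3 = third) of the items that end up as the new top and second. OVER and SWAP produce the same pair; they differ only in that OVER deepens the stack by one item, so the notation would also need to record the change in stack depth.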

Text Editor and Assembler

In order to start writing code for our new processor, we need a few simple tools to help us. We need a means of entering and displaying text - usually a serial terminal interface, so that we can enter machine instructions into memory and have the virtual machine execute them.

In the early days, microprocessor development kits often had a hexadecimal keypad, 7-segment displays for address and data, and a monitor program. This allowed machine instructions to be entered directly into memory and then run from a given address. It was quite primitive, frustrating and subject to a lot of human typing mistakes. Nowadays, we can use the power and resources of a laptop to help get new systems up and running.

The first tool we need is a simple text editor. This will take in text from the keyboard, display it in a terminal window and allow basic editing, listing and text file storage and retrieval.

Secondly, no one wants to code in machine language, so a very basic assembler that converts text mnemonics and numbers, on a line-by-line basis, to machine language would also be an asset.

For this we need simple text parsing - and in traditional Forth this was done by looking for individual words or numbers separated by spaces.

Forth would take each new word and look it up in the dictionary for a match. The match was based on the first three characters of the word and its length. This is quick to do and suitable for most purposes.

Numbers (integers up to 65535 on a 16-bit system) were generally not stored in the dictionary; they were converted from ASCII to an integer and then placed on the stack.

The text editor and assembler can exist as one program - as they share a lot of features. Principally we will need the means to parse through the entered text looking for keywords and numbers. As there are only 20 or so keywords required to implement the instruction set of the virtual machine, the task of programming these into the assembler is not too difficult.

I have chosen mnemonics that simplify this task - the first 2 characters are unique to each mnemonic, and no mnemonic is more than 4 characters long. We can combine the first two characters to produce a unique number, and then use this number in a series of Switch-Case statements to perform the action needed.

The 16 bit virtual machine can handle integer numbers up to 65535. We need a means of detecting a number within the entered text string, and converting the ASCII characters to an integer. In a similar way to how we uniquely defined the mnemonics by the first 2 characters, we can do a quick test on the string to see if it is a number.

The assembler will convert our input mnemonics one by one into the machine instructions of the virtual machine. More detail on how the assembler should operate is outlined in Chapter 6 of TECS.

Sometimes it is easier working in binary or hexadecimal, so additional assembler directives, for example BIN, HEX and DEC, could be used to instruct the assembler which base to use to interpret the numerical strings.

Assemblers make use of other directives such as ORG, and of labels to refer to points in the listing. Assemblers can be single-pass or two-pass. A simple single-pass assembler will require you to keep track of your own labels, which can be quite difficult if the assembly listing is rich in subroutines. This is possibly one reason why Forth evolved as a language: it has its roots in assembly language programming, but the Forth system of "words" provided an efficient, automated means of keeping track of labels and subroutines. It had to, as Forth is composed almost entirely of subroutines.

Charles Moore created Forth as an efficient, automated way of creating a common programming environment on each of the wide variety of mainframe machines that he encountered during the 1960s and 70s. His virtual machine first had to be coded in the native machine language of each processor, but with the availability of C compilers, the VM can now be coded in C.

Moore realised that the tasks of text editor, assembler, compiler and interactive interpreter could be bundled up into one efficient package, which he called Forth. How exactly this was done was at first fairly obscure, and Moore initially kept much of it to himself, before branching out around 1970 and sharing the inner secrets of Forth within the Radio Astronomy community.

If the primitive assembler can only accept valid mnemonics and numbers, then any other unrecognised text string could be considered to be a new word. A word that cannot be found in the dictionary is treated as a new definition, composed from the primitive instructions.

This is similar to the use of a macro label within a two-pass assembler listing. Once a macro has been encountered and given an address, it can be used again.

So the combination of a simple text editor and line assembler will help us to build up the various Forth word definitions from the primitives. Whilst this C program is not a complete Forth system, it is a tool that helps us create the Forth system.

Disassembler Window

Over the last couple of days - spare time permitting, I have written a simple application to assist in the development of code for a FPGA soft core processor.

So far, this consists of a memory view, a register view, stacks and a disassembler window. The windows into memory are animated such that the actions of the instruction set on memory and registers may be viewed whilst single-stepping through the code.

The novel thing about this simple application, is that it has been written in Arduino C++ code, and once compiled, it runs on a ZPUino softcore processor hosted on a FPGA. Additionally, the hardware which generates the 800x600 VGA display is also hosted on the FPGA. So we have a complete computer system consisting of cpu, memory and video generation hardware supplied on the Papilio Duo FPGA board.

The first part of this exercise was to get the graphical parts of the user interface working. These consist of the hex memory dump window and the various stack, register and disassembler windows.

Now, whilst the ZPUino is itself a stack based processor, and these tools will eventually be used to examine its operation, I decided initially to use the ZPUino to emulate an even simpler stack processor. The candidate is James Bowman's J1 Forth Processor (also available as a softcore for FPGA use) - which has the advantage of a very small instruction set, and a processor behaviour that is easily modelled in C code.

This may appear a somewhat round-about route but was chosen for the following reasons:

1. The ZPUino can be programmed in "Arduino code" using DesignLab - the Papilio Duo IDE
2. The ZPUino interfaces in hardware to the 800x600 VGA engine
3. Adafruit's GFX graphics library has been ported to ZPUino
4. A compact C model, and sufficient documentation exist for the J1 processor
5. I wanted to understand how the J1 works, and what its limitations are
6. This is a programming project that meets my elementary coding skills

So my approach is to make use of the tools available. James Bowman is working on an implementation of the J1 to run specifically on the Papilio Duo board, and make use of its 2Mbyte of SRAM. Whether he will develop it to the point where a VGA engine is supported is unknown - so for the moment I have to be content emulating the J1 with the ZPUino in C, with the heavy burden of the GFX library calls.

If we can develop sufficient momentum, then there might be a strong case to put a fast Forth soft core on a FPGA with VGA. This however is beyond my coding skills - but on my wish list for the future.

Disassembler Window

This does a very simple disassembly of the instructions in memory. The jump, branch, call and ALU instructions are decoded to their mnemonics for easier reading. The animated display shows the instructions highlighted in cyan as they are executed by the processor emulator.

More Interaction

So far only the graphical code has been prototyped - just enough to see the animation of the J1 processor emulation. For complete user interaction, it will require more code, in particular that to support keyboard, mouse and a text editor window.

I have ordered a Classic Computing shield for the Papilio Duo from Gadget Factory in Denver. This includes sockets for PS2 keyboard and mouse, VGA output, microSD card and a pair of Atari style joystick connectors. This will allow keyboard and mouse interaction to be developed, plus program and data storage on the microSD card. The thought had occurred to me that the Atari ports might be useful to accept switch presses from some form of custom keypad - a bit like Chuck Moore uses with his OKAD and colorForth environment.

Text Editor

The existing graphical layout allows for a text window of about 90 character columns by 75 rows. This should be sufficient for 80 column mode plus a few clickable buttons. The mouse will be used extensively for click and drag type operations - so a routine that links mouse position to the position of objects on the screen will be central to the user interaction.
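A minimal sketch of that mouse-to-text mapping, assuming an 8 x 8 pixel character cell and a text window whose top-left corner is at pixel (0, 0); neither value comes from the GFX library, and the names are hypothetical. With 8-pixel-high characters, 75 rows fill exactly the 600 lines of the display.

```c
#define CHAR_W    8    /* assumed glyph width in pixels */
#define CHAR_H    8    /* assumed glyph height in pixels */
#define TEXT_COLS 90
#define TEXT_ROWS 75

typedef struct { int col; int row; } Cell;

/* Convert a mouse position (pixels, relative to the top-left of the text
   window) into a character cell. Returns 0 if the point is outside the window. */
static int mouse_to_cell(int x, int y, Cell *out)
{
    if (x < 0 || y < 0)
        return 0;
    int col = x / CHAR_W;
    int row = y / CHAR_H;
    if (col >= TEXT_COLS || row >= TEXT_ROWS)
        return 0;
    out->col = col;
    out->row = row;
    return 1;
}
```

Buttons and other clickable objects can be handled the same way, by testing the cell (or pixel) against each object's bounding box.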

As most coding languages are text based - the efficient manipulation of text leads to high productivity whilst programming. The use of colour text and highlighting of selected areas will enhance the user experience. The text window will additionally be used for serial output and command line input, and the use of the microSD card will allow source code to be saved and retrieved from "disk".

Assembler and Compiler

The J1 is a "Forth" processor, inasmuch as it is stack based, and almost all of its instructions are Forth primitives. This allows it to execute the Forth language efficiently. However, a modern Forth consists of about 200 definitions, and these have to be encoded in the native instruction set of the processor. Fortunately, this is something that Bowman, and others have already done.

Background

Forth is an interactive, low level language which has a lot in common with machine code. It allows low level access to the processor and its resources and can therefore be quick and powerful - in the right hands.
It allows a degree of interaction which has now been lost in the higher level compiled languages, but for the right applications it provides all the flexibility needed.

The following videos illustrate some aspects of Forth when used for controlling hardware.

Open Firmware

It Was Twenty Years Ago Today.......

In the mid-1990s Chuck Moore, Jeff Fox and others worked towards Forth computing engines that would achieve burst speeds of around 500MIPS.

Chuck Moore developed custom VLSI devices - a series of processors where the machine language instructions were essentially Forth primitives. These processors all used a minimal instruction set - and were known as MISC processors.

Dr C.H. Ting had also shown, with his eForth model, that a working Forth could be composed from just 31 Forth primitives, and that all other definitions could be assembled from this core set. Thus a processor with a 5 bit instruction length could potentially be used for Forth execution. Dr. Ting explored this further with a series of chip designs - where 5-bit instructions were packed into 16 bit or 32 bit words - allowing 3 or 6 instructions to be fetched from memory at a time, which better suited the slower RAM access.
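The packing arithmetic can be sketched in C. This assumes the first instruction occupies the most significant slot and the spare top bit is left unused; Dr. Ting's actual bit layouts may differ. The same idea with 30 of the 32 bits in use gives six slots per 32-bit word.

```c
#include <stdint.h>

/* Pack three 5-bit opcodes (0..31) into one 16-bit word.
   Slot 0 is bits 14-10, slot 1 is bits 9-5, slot 2 is bits 4-0; bit 15 is unused. */
static uint16_t pack3(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint16_t)(((a & 0x1F) << 10) | ((b & 0x1F) << 5) | (c & 0x1F));
}

/* Extract the opcode held in slot 0, 1 or 2 */
static uint8_t unpack_slot(uint16_t word, int slot)
{
    return (uint8_t)((word >> (10 - 5 * slot)) & 0x1F);
}
```

One memory fetch therefore supplies three instructions, which the processor can then execute from an internal register without touching RAM again.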

Keeping the speed up...

When Forth is implemented on a register based load-store architecture, such as the ARM, the overheads of running the Forth inner interpreter (in particular NEXT) mean that around 10 machine instructions need to be executed for each Forth primitive. This suggests that an ARM clocked at 100MHz will only achieve around 10 Forth MIPS.
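To make the NEXT overhead concrete, here is a small call-threaded inner interpreter in C. It is a simplified stand-in for a real Forth NEXT, not how any particular ARM Forth does it, and the primitives are hypothetical. Every primitive, even a one-line add, costs a fetch, a pointer increment, an indirect call and a return, and on a load-store CPU each of those is several machine instructions.

```c
#include <stdint.h>

typedef void (*Prim)(void);

static int32_t stack[32];     /* data stack */
static int sp;                /* number of items on the data stack */
static const Prim *ip;        /* instruction pointer into the threaded code */
static int running;

/* A few hypothetical primitives */
static void lit2(void) { stack[sp++] = 2; }
static void lit3(void) { stack[sp++] = 3; }
static void add(void)  { sp--; stack[sp - 1] += stack[sp]; }
static void halt(void) { running = 0; }

/* Inner interpreter: the loop body plays the role of NEXT */
static void run(const Prim *code)
{
    ip = code;
    running = 1;
    while (running) {
        Prim next = *ip++;    /* fetch the next primitive and advance ip */
        next();               /* dispatch to it */
    }
}
```

A stack processor such as the J1 removes this loop entirely: each primitive is one machine instruction, fetched and executed in a single cycle.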

Forth requires the right processor architecture in order to execute the Forth primitives efficiently - preferably as single cycle instructions. The processor should have a stack-based architecture, and its machine instructions should map directly onto the Forth primitives. Using this approach, a simple Forth processor can be designed as a soft-core cpu for a FPGA and maintain a performance of around 50 to 100 million Forth instructions per second (Forth MIPS).

Affordable FPGAs

Whilst much of this work was done about 20 years ago using custom VLSI chips, progressive improvements to FPGAs, falling memory prices and greater access to sophisticated design and simulation tools have brought FPGA soft-core microcontrollers within reach of the hobbyist. Low cost FPGA dev-boards are available in the $50 to $80 price range.

There have however been a number of stack machine cpu designs developed over recent years, several of which have been implemented on a low cost FPGA. Notably ZPUino - by Alvaro Lopez, and J1 - by James Bowman, although several others exist.

James Bowman's J1 design is of interest because it is close in architectural design to Chuck Moore's 1985 Novix NC4000, but much simpler because the data and return stacks are implemented in on-chip RAM. This gives it the potential for 100 Forth MIPS when implemented in a Xilinx Spartan 3E, and it is described in fewer than 200 lines (about 160) of Verilog code.

J1 is incorporated into the Gameduino Shield - a gaming graphics and sound generator for Arduino. Versions are also available from Olimex, which include a PS2 keyboard connector and an additional 32MB of SDRAM for extended resolution - although Olimex leave you high and dry when it comes to implementing firmware to make full use of the extra 32MB!

https://www.olimex.com/Products/Modules/Video/MOD-VGA-32MB/open-source-hardware

The J1 Processor Model

The J1 processor is simple enough that it may be modelled in about 100 lines of C code. I used this model, available from ddb's GitHub repository.

The J1 model is created from James Bowman's original documentation "J1: a small Forth CPU Core for FPGAs" and is very similar to the Verilog code that defines the J1 implementation in hardware.

More documentation is available at James's J1 site. As can be seen there, the J1 has been used in a variety of projects including the Gameduino shield - which is a graphics engine in the form of an Arduino shield.

The J1 has just 5 categories of instruction coded up into a 16 bit instruction word:

Literal - a 15 bit literal pushed onto the data stack
Jump - jump to a 13 bit target address
Conditional Jump - jump to a 13 bit target address if T is zero
Call - call a subroutine at a 13 bit target address
ALU - ALU and stack operations
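Decoding the class of an instruction only needs the top three bits. The following C sketch follows the encoding used throughout this post (bit 15 set for a literal; otherwise bits 14-13 select jump, conditional jump, call or ALU); the type and function names are my own.

```c
#include <stdint.h>

typedef enum { J1_LIT, J1_JMP, J1_JZ, J1_CALL, J1_ALU } J1Class;

/* Work out which of the five instruction classes a 16-bit word belongs to */
static J1Class j1_class(uint16_t insn)
{
    if (insn & 0x8000)               /* bit 15 set: literal */
        return J1_LIT;
    switch ((insn >> 13) & 0x3) {    /* bits 14-13 select the other classes */
    case 0:  return J1_JMP;
    case 1:  return J1_JZ;
    case 2:  return J1_CALL;
    default: return J1_ALU;
    }
}

/* The 15-bit value carried by a literal */
static uint16_t j1_literal(uint16_t insn) { return insn & 0x7FFF; }

/* The 13-bit target carried by a jump, conditional jump or call */
static uint16_t j1_target(uint16_t insn)  { return insn & 0x1FFF; }
```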

The ALU uses a 4 bit field to determine its action, and there are additional bit fields to control access to the stacks and memory.

T -> N Copy T to Next
R -> PC Put the return stack into the PC to get a free Return
N -> [T] Store Next at the location addressed by T
T -> R Copy T to Return stack

Additionally there are two 2-bit fields that allow the data stack pointer and the return stack pointer to be incremented or decremented - this enables items placed further down the stack to be accessed.
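A sketch of pulling those fields out of an ALU instruction in C is shown below. The bit positions follow the J1 layout (R -> PC in bit 12, the ALU op in bits 11-8, the three transfer flags in bits 7-5, and the return and data stack fields in bits 3-2 and 1-0). It assumes, as in the J1 Verilog, that the 2-bit pointer fields are signed, so binary 11 means -1; the struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    int r_to_pc;   /* bit 12: R -> PC (return) */
    int op;        /* bits 11-8: ALU operation */
    int t_to_n;    /* bit 7: T -> N */
    int t_to_r;    /* bit 6: T -> R */
    int n_to_mem;  /* bit 5: N -> [T] */
    int rdelta;    /* bits 3-2: signed return stack adjustment, -2..+1 */
    int ddelta;    /* bits 1-0: signed data stack adjustment, -2..+1 */
} J1Alu;

/* Sign-extend a 2-bit field: 0 -> 0, 1 -> +1, 2 -> -2, 3 -> -1 */
static int sext2(unsigned v) { return (v & 2) ? (int)v - 4 : (int)v; }

static J1Alu j1_alu_fields(uint16_t insn)
{
    J1Alu f;
    f.r_to_pc  = (insn >> 12) & 1;
    f.op       = (insn >> 8) & 0xF;
    f.t_to_n   = (insn >> 7) & 1;
    f.t_to_r   = (insn >> 6) & 1;
    f.n_to_mem = (insn >> 5) & 1;
    f.rdelta   = sext2((insn >> 2) & 3);
    f.ddelta   = sext2(insn & 3);
    return f;
}
```

For example, 0x6203 decodes as op 2 (ADD) with the data stack pointer moving down by one, which is how the J1 pops the second operand.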

Simulation

The instruction set of any proposed processor may be simulated in software. Once a model of the various stacks, registers and memory has been devised, it becomes a relatively straightforward task to create a C program, with text output, that simulates the operation of the cpu and instruction execution. Whilst the output of the simulator is either text or graphics, the process can be further developed to the point where any processor can emulate the instruction set of another - but with a vast speed penalty.

Fortunately the relatively simple J1 processor may be quite easily simulated in C - even using an Arduino.

The model consists of a 512 word memory (1K bytes), as an Arduino Uno only has 2K bytes of on-chip RAM.

Snippets of machine language are loaded into the RAM during the setup() function - for example

m[0] = 0x6000; // NOP
m[1] = 0x8020; // LIT 0x20
m[2] = 0x8010; // LIT 0x10
m[3] = 0x6200; // ADD
m[4] = 0x6600; // INV (bitwise NOT)
m[5] = 0x6000; // NOP
m[6] = 0x0001; // JMP 0001

In this trivial example two literals are loaded onto the stack, added together and inverted, and the whole process is repeated as an endless loop by the unconditional jump back to address 1. This is definitely not a particularly good example to illustrate Forth, but it's a good test case to show that the processor model is correctly fetching, decoding and executing code, and that the "alu" and pc are working properly together.

Each machine instruction carries one of the following kinds of information:

Numerical constants or literals. These are 15 bits packaged into a word that has bit 15 set - i.e. 0x8000 to 0xFFFF in hexadecimal.

Target Addresses - these are 13 bit absolute addresses, which are used to call a subroutine or to jump to a new address. The jump is unconditional, but the branch may be conditional - in that the top of the stack needs to equal zero for the branch to be taken. A 13 bit target lets the processor reach any of 8192 word addresses.

ALU Instructions.

The ALU has 16 possible operations, selected by a 4-bit field. Instructions of the form 0x6X00 are ALU instructions, where X is the 4-bit operation code.

Code is Code

It might be worth stating that the entire operation of the processor is controlled by the various fields coded within the instruction. This is what makes machine language very powerful, and yet very easy to make mistakes with. A single mistake in a field might send your processor off into an unintended area of RAM, where it can misinterpret your stored data as a program, and then start indiscriminately writing to RAM. Invariably this ends up as a system crash.

As writing in machine code has always been a thankless task and prone to mistakes, it is best to spend time writing an assembler to help "assemble" programs from the processor's instruction set.

Assemblers use human readable mnemonics such as ADD, OR, JMP and allow numbers to be entered in decimal or hex. The assembler will use a text file which contains the source code, and which can be edited using a text editor. This can then be processed by the assembler to produce a binary or hex file that may then be loaded into the RAM of the processor.

Assembly language is the first layer of abstraction above the processor's own machine language. As a tool it makes programming simpler, faster and less prone to mistakes.

On home computers these tools started becoming widely available in the early 1980s, and early 8-bit home micros often had an assembler/disassembler as part of their toolkit.

David Salomon has written an excellent reference book on assemblers.

Forth as an Assembly Language

In the early 1960s, Charles Moore, the creator of Forth, realised that there might be a better way of writing programs than the traditional assembler or high level language compiler method.

He knew that any program consisted of small snippets of code, each performing some small function within the program. These functions and routines would be stitched together with calls and jumps to form the structure of the program.

He came up with the concept of the Forth word, where the word is the name of such a function - for example SQUARE.

Running on the processor was a small interpreter program, which could take the text input and compile it into executable machine code.

The word SQUARE could be written at the keyboard, or typed into a text file, and every time it was encountered it would perform the function of calculating the square of a number.

For this to work, SQUARE had to be created using the colon definition method of defining new words - which is written like this:

: SQUARE DUP * ;

: - the colon is the word that tells the interpreter that this is a new definition
SQUARE - the name of our new word, which will be put into the dictionary
DUP - a Forth word that duplicates the top item on the stack
* - multiplies the top two entries on the stack, leaving the product on the stack
; - the semi-colon denotes the end of the definition and a return to the inner interpreter
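For readers more at home in C, the same idea can be sketched with a tiny data stack: SQUARE is simply a new function made by calling the two existing ones. The stack size and function names here are illustrative, and there is no overflow checking.

```c
#include <stdint.h>

static int32_t ds[16];    /* data stack */
static int dsp;           /* number of items on the data stack */

static void push(int32_t v) { ds[dsp++] = v; }

/* DUP ( n -- n n ) */
static void dup_word(void) { ds[dsp] = ds[dsp - 1]; dsp++; }

/* * ( a b -- a*b ) */
static void star(void) { dsp--; ds[dsp - 1] *= ds[dsp]; }

/* : SQUARE DUP * ;  - a new word built only from existing words */
static void square(void) { dup_word(); star(); }
```

Pushing 7 and calling square() leaves 49 as the only item on the stack, exactly as typing `7 SQUARE` would in Forth.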

For a much fuller explanation of how this works - have a read of Brad Rodriguez' excellent article "Moving Forth"

Suffice to say, that the Forth system provides the assembly, compilation and run-time execution environment needed for a self contained system, and it does it in a user interactive manner.

This video shows a typical Forth work session.

N.I.G.E Machine

In another post, I describe my project to combine a simulation of the J1 Processor with a set of simple graphical tools to allow assembly, disassembly and memory viewing.

WTF!

Emulation is a useful technique - especially when you don't actually have the processor that you are writing code for.

In the spring of 1975, 19-year-old William Gates III did not possess an 8080 microprocessor, but he and his friend Paul Allen had committed to writing a BASIC interpreter for the company supplying the new 8080 based Altair microcomputer.

Fortunately Paul Allen had written an emulator for the similar 8008, which ran on a PDP-10 mainframe at Harvard, and working nights for several weeks on the PDP-10, they managed to produce the first Microsoft BASIC product - and the rest is history.

Back in April, when I had a little spare time, I started to work on a program to emulate James Bowman's J1 Forth CPU - and it was the subject of my post "One Song to the Tune of Another".
At that time I had it running on a FPGA soft core - the ZPUino, and it was complete with a VGA display. I am now taking a step back to just isolate the J1 simulation part of that project, so that I can build it into a set of simple tools that I am developing.

Now as my thought processes are starting to converge, I thought I'd dust off the code and start to see how it will fit into my grand scheme for a stand alone code development system based on a J1 running on a FPGA.

James has put a lot of effort into writing his "swapforth" for the J1, but I am treating this as a learning exercise, so rather than use James's swapforth, I am setting about writing my own tiny language - it's the journey, not the destination I am interested in at the moment.

Not being as ambitious (precocious) as Bill Gates, I set my sights a little lower and in just 200 lines of code, I have a J1 emulator that runs on an Arduino. The code I am using has been adapted for the Arduino from Samawati's J1 simulator on GitHub.

Slow Forth

Not renowned for high speed or vast resources, the Arduino munches through the J1 code at a pedestrian 63,000 instructions per second. That's about 1600 times slower than an actual J1.
Slow, but nevertheless useful. I can now write snippets of assembly language to run on my "J1" and test them out.

The J1 machine code is stored in an array of 16 bit integers m[ ] set up in memory. As the ATmega328 only has 2K bytes of RAM, I kept the array size down to 768 words.

Here is the first J1 program - a simple counter

// Load up a simple count program into first 7 locations of the memory array m[ ]

m[0] = 0x8020; // LIT 0x20 (0x20 is the address of the count variable)
m[1] = 0x6C00; // Fetch [0020]
m[2] = 0x8001; // LIT 1 We are going to add 1
m[3] = 0x6200; // ADD
m[4] = 0x8020; // LIT 0x20
m[5] = 0x6020; // Store
m[6] = 0x0000; // JMP 0000

Translating these 7 instructions into Forth we get

32 @ 1 + 32 ! followed by a jump back to the first instruction
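That one-to-one correspondence can be sketched as a tiny Forth-to-J1 translator in C. It reuses, as-is, the encodings from the counter listing above (numbers become 0x8000 | n, and @, + and ! become 0x6C00, 0x6200 and 0x6020) and handles nothing else; the function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Translate one Forth token into one J1 word; returns -1 if unknown */
static int32_t compile_token(const char *tok)
{
    if (strcmp(tok, "@") == 0) return 0x6C00;   /* fetch */
    if (strcmp(tok, "+") == 0) return 0x6200;   /* add */
    if (strcmp(tok, "!") == 0) return 0x6020;   /* store */
    char *end;
    long n = strtol(tok, &end, 10);
    if (end != tok && *end == '\0' && n >= 0 && n <= 0x7FFF)
        return 0x8000 | (int32_t)n;             /* 15-bit literal */
    return -1;
}

/* Translate a space-separated line into m[]; returns the word count or -1 */
static int compile_line(const char *src, uint16_t m[], int max)
{
    char buf[128];
    int n = 0;
    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *tok = strtok(buf, " "); tok != NULL && n < max; tok = strtok(NULL, " ")) {
        int32_t w = compile_token(tok);
        if (w < 0)
            return -1;
        m[n++] = (uint16_t)w;
    }
    return n;
}
```

Feeding it "32 @ 1 + 32 !" reproduces the first six words of the listing; the closing JMP 0000 would still have to be appended by hand.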

Forth is clearly a little easier than assembly language, but note how the J1 instructions translate on a one to one basis into Forth, so validating the idea from yesterday's post about using the SIMPL interpreter to create assembly language - this is the next step.

Slightly Quicker Forth

Further experiments with a STM32F407 Discovery board (an ARM Cortex M4 clocked at 168MHz) showed that the emulator would run at approximately 700,000 J1 instructions per second - under 1% of the speed of the proposed hardware.

Background

Inspired by Frank Carver's blog Raspberry Alpha Omega, I decided to dig up some of the work I did earlier in the year - to create a tiny Forth-like language that would run on virtually any processor.

My ambition is to have a language nucleus that resides in about 2K of memory which provides a means to debug and bootstrap an application with limited tools or resources. I imagine it as a common core, which can be accessed via a serial UART, and can be used right from the start of a project and form a foundation onto which an extendable application can be built.

Whilst the usual image of code development is typing into a text editor or IDE and then compiling before flashing the machine code into the microcontroller, my plan is to take a huge chapter out of Charles Moore's book and make my language interpreted and have the means to compile and edit code right there on the microcontroller itself. Indeed very Forthlike.

However over the years and through the various ANSI standardisation processes, Forth has become large and bloated - and that was never what Chuck Moore intended or wanted. So I am going to pick and choose from the characteristics of Forth, and come up with something very much simpler.

The plan is to have a compact language kernel which resides on the microcontroller - regardless of whether it is an AVR or ARM or a specialist stack processor burned into an FPGA. In each case, it will present me with the same user interface and experience - for low level hacking or code development.

From a hardware developer's perspective, every microcontroller I work on needs to have the means to print to a terminal and waggle a port pin - right from the get-go.

However, this language need not just be for human interaction. As the commands are very compact, they lend themselves to being packetised, and sent from machine to machine by whatever appropriate communications channel - be it wireless, BLE, TCP/IP or 140 characters at a time via SMS or Twitter. It also allows a microcontroller such as that on the Raspberry Pi, to communicate with other task specific hardware - solely using a UART connection - the speed of control and interaction is not restricted to the speed of a few characters a second that a human can type.

Creating a Virtual Machine

For all this to work, we need to establish a virtual machine on the chosen microcontroller. The virtual machine could initially be coded in C, to run on the target, but later it can be created as a specialist soft core processor on a FPGA. On the Arduino, the virtual machine codes into about 2Kbytes - or 3K when you add the Serial.begin() function for UART output.

Once the virtual machine has been installed it will happily execute its way through the memory on its own, until it crashes or is reset. The challenge now becomes writing the low level inner interpreter application code in the assembly language of the virtual machine. This step is something that I will put off until I have generated a means of creating and assembling the language.

Txtzyme Revisited

To create an assembler, I am going to use the tiny Txtzyme interpreter, written by Ward Cunningham, which was the original inspiration for this project. It allows very basic parsing of a text buffer and then performs one of a series of function calls depending on the character typed, or read from the buffer. Numerical characters are converted into an integer and placed in a variable x.
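A minimal sketch of that parsing style, loosely modelled on Txtzyme rather than copied from it: digits accumulate into x, 'p' prints x, and text between underscores is echoed. Output is collected in a buffer rather than sent to a UART so that it can be checked on a PC; everything here beyond the x variable is an illustrative assumption.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static long x;   /* the single numeric register, as in Txtzyme */

/* Interpret buf one character at a time, appending any output to out */
static void interpret(const char *buf, char *out, size_t outsz)
{
    size_t len = 0;
    out[0] = '\0';
    for (const char *p = buf; *p; p++) {
        char ch = *p;
        if (isdigit((unsigned char)ch)) {
            /* accumulate a decimal number into x */
            x = ch - '0';
            while (isdigit((unsigned char)p[1]))
                x = x * 10 + (*++p - '0');
            continue;
        }
        switch (ch) {
        case 'p':   /* print x */
            if (len < outsz)
                len += (size_t)snprintf(out + len, outsz - len, "%ld\n", x);
            break;
        case '_':   /* echo text up to the closing '_' */
            while (p[1] && p[1] != '_') {
                p++;
                if (len + 1 < outsz) { out[len++] = *p; out[len] = '\0'; }
            }
            if (p[1])
                p++;    /* skip the closing '_' */
            break;
        default:    /* spaces and unknown characters are ignored */
            break;
        }
    }
}
```

Adding a command is just another case in the switch, which is what makes this structure such a convenient base for an assembler.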

For a simple implementation of an assembler using txtzyme, let's consider that each instruction word belongs to one of 5 classes, each identified by its class field:

Class Class Field

Literal 0x8000
Jump 0x0000
Jump if zero 0x2000
Call 0x4000
ALU 0x6000

For the first four of these - the assembler just needs to OR the class field with the literal number or the target address. It may be worth ensuring that the literal is constrained to 15 bits and the target address is constrained to 13 bits.
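That OR-and-mask step can be sketched in C as below, using the class fields from the table above. The helper names are hypothetical, and masking silently truncates out-of-range values; a friendlier assembler would report them as errors instead.

```c
#include <stdint.h>

/* Class fields from the table above */
#define CLASS_LIT  0x8000
#define CLASS_JMP  0x0000
#define CLASS_JZ   0x2000
#define CLASS_CALL 0x4000

/* Literal: class field OR'ed with a value constrained to 15 bits */
static uint16_t enc_lit(uint16_t n)
{
    return (uint16_t)(CLASS_LIT | (n & 0x7FFF));
}

/* Jump, conditional jump or call: class field OR'ed with a 13-bit target */
static uint16_t enc_branch(uint16_t cls, uint16_t target)
{
    return (uint16_t)(cls | (target & 0x1FFF));
}
```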

We will use the following characters to define the instruction class

# literal
j unconditional jump
z conditional jump when T=0
: call

For the ALU instruction there are 4 more sub-fields to populate depending on the nature of the instruction.

1. ALU op-code
2. Transfer field
3. Pointer field
4. Return field

ALU op-code

The 2nd nibble of the instruction word controls the ALU. Its 16 instructions are decoded thus:

0 t NOP
1 n COPY (T=N)
2 + ADD
3 & AND
4 | OR
5 ^ XOR
6 ~ INV
7 = T = (N == T) Sets T to status of EQ flag
8 > T = (N < T) Sets T to status of GT flag (T greater than N)
9 / RShift
A d DEC (T= T-1)
B r r-fetch
C @ fetch
D * LShift
E d depth (shows dsp +1)
F u U<

So we have about 20 instructions - that fall into one of 5 categories

LIT - load the included 15 bit literal onto the top of the stack
CALL - call the subroutine at the enclosed 13 bit address
JMP - non-conditional jump to the enclosed 13 bit address
JPZ - conditional jump - only if the top=0
ALU - ALU and stack operations

Literal Instructions take the form 8xxx to Fxxx (in hex)
Jumps 0xxx or 1xxx
JPZ 2xxx or 3xxx
Calls 4xxx or 5xxx
ALU 6xxx or 7xxx if you include the "return"

So the plan is to adapt the txtzyme interpreter to convert text input into machine language in the form of the 16 bit instruction words.

The 3rd nibble of the instruction word insn controls the data flow from the stack to memory

N Insn[7] Top transfers to Next (2nd)
R Insn[6] Top transfers to Return
@ Insn[5] Next transfers to address pointed by Top
_ Insn[4] Not used

The lower nibble of the instruction word is used to control the incrementing or decrementing of the data and return stack pointers dsp and rsp. Pushes to the stack involve incrementing the dsp, whilst popping from the stack means that the dsp needs to be decremented. Some actions are stack neutral, and involve no net gain or loss in stack items.

The parentheses can be conveniently used to represent push and pop operations, which is memorable as you start with a push (left bracket) and end with a pop (right bracket).

( push ds
) pop ds

[ push rs
] pop rs

ds field (bits 1-0)

1 dsp++
3 dsp-- (the field is a signed 2-bit value, so binary 11 means -1)

rs field (bits 3-2)

1 rsp++
3 rsp--
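Putting the ALU sub-fields together might look like the following C sketch. The bit positions follow the J1 layout (R -> PC in bit 12, op-code in bits 11-8, transfer flags in bits 7-5, return and data stack fields in bits 3-2 and 1-0). It assumes, as in the J1 hardware, that the stack fields are signed 2-bit values, so a decrement is stored as binary 11 (3); the function name and argument order are my own.

```c
#include <stdint.h>

/* Build a J1 ALU instruction from its sub-fields.
   rdelta and ddelta are stack pointer adjustments in the range -2..+1. */
static uint16_t enc_alu(unsigned op, int t_to_n, int t_to_r, int n_to_mem,
                        int r_to_pc, int rdelta, int ddelta)
{
    uint16_t insn = 0x6000;                             /* ALU class */
    if (r_to_pc)  insn |= 0x1000;                       /* R -> PC (return) */
    insn |= (uint16_t)((op & 0xFu) << 8);               /* ALU op-code */
    if (t_to_n)   insn |= 0x0080;                       /* T -> N */
    if (t_to_r)   insn |= 0x0040;                       /* T -> R */
    if (n_to_mem) insn |= 0x0020;                       /* N -> [T] */
    insn |= (uint16_t)(((unsigned)rdelta & 3u) << 2);   /* return stack field */
    insn |= (uint16_t)((unsigned)ddelta & 3u);          /* data stack field */
    return insn;
}
```

With this, ADD that also pops its second operand is `enc_alu(2, 0, 0, 0, 0, 0, -1)`, giving 0x6203, and DUP is `enc_alu(0, 1, 0, 0, 0, 0, 1)`, giving 0x6081.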

The basic core of the assembler, which accepts the text input and generates instructions as 16-bit hex words, fits into under 200 lines of C.

Assembler Instruction set Summary

Implemented so far:

Arithmetic/Logical

t NOP
n Copy

+ ADD
& AND
| OR
^ XOR
~ INV

Comparison

<
=
>

r Right Shift
l left shift

d T - 1 (Decrement)

Memory

@ Fetch
! Store

Data Transfer

N T-> N
R T -> R
A T-> A

Stack Ops

( Push Data Stack
) Pop Data Stack
[ Push Return Stack
] Pop Return Stack

Literals

Keeping it SIMPL

Since May 2013, I have been slowly developing a tiny interpreted language that can be used to initialise and exercise hardware when developing with a new processor.

SIMPL is primarily intended to be a very low overhead language, requiring only a serial uart (or bit banged serial) for communication to a PC hosted terminal program.

Commands are in plain, human readable ascii text - with an emphasis on being easy to remember.

SIMPL is based on Ward Cunningham's Txtzyme interpreter - originally for Arduino - but ported onto several other microcontrollers - as it is written mainly in C.

The kernel or SIMPL interpreter needs only a few resources:

2K bytes of program memory (Flash)
35 bytes of RAM
UART getchar and putchar functions
microsecond delay
millisecond delay

On the Arduino these delays are provided by the delay() and delayMicroseconds() functions but can be provided with simple delay loops.

Once you have this 2K of code on-board, you can then start to add more functionality to it, tailored to your particular application.

Slimming Down the Interpreter Kernel

As originally written, Ward Cunningham's Txtzyme compiles to 5032 bytes of flash and 209 bytes of RAM. (The exact number of compiled bytes may vary depending on which version of the Arduino IDE you are using.)

As it made use of several of the high level functions available in Arduino - such as Serial.print, digitalWrite etc, - it was certainly not optimised for codesize.

I rewrote and enhanced the interpreter - so that now it fits into just short of 2048 bytes, and is written in more generic standard C for easier porting to other processors.

I have also added more functions including arithmetic, bitwise logic and memory operations.

I am sure that if the routines were handcoded in AVR assembly language, that further reductions in codesize could be achieved. However, I wanted a useful kernel that would fit in 2K and was easy to understand.

I have placed the SIMPL kernel here as a GitHub Gist.

Growing the Kernel

It has long been my intention to make SIMPL an extensible language, and so for this approach I have chosen to use some of the ideas used in Forth.

The kernel can easily be extended from some 30 basic functions to about 85, just by extending the switch/case statement that forms the basic subroutine calling mechanism at the heart of the kernel.

In keeping with Charles Moore's philosophy of "Problem Oriented Languages", the kernel of SIMPL may be extended in whatever way is needed to solve the problem, and as such should be considered a minimum common starting point for any cpu.

Once the 2K core of the kernel was established, it was time to add in the extra functionality that allows users to add their own functions. This is done in the spirit of Forth - but with certain limitations to keep the code size down. However, with the added functionality - the code grew from 2Kbytes to 3982 bytes. The main difference is in the amount of RAM that is used - the extra code allocates a User RAM array of 1248 bytes.

If you would like to look at the code and try it out on an Arduino - I have created a GitHub Gist here.

If you are using a standard Arduino with the LED on Pin 13 change line 64 to:

int d = 13; // d is used to denote the digital port pin for LED operation

As this is a work in progress - more details will emerge in a later post.

In the last post, I explained how I had slimmed down the kernel of SIMPL - at the same time removing much of the Arduino specific code so that it would fit into 2Kbytes of Flash plus a lower RAM requirement. This also makes it highly portable to other microcontrollers.

My intention was to be able to put an image of SIMPL onto any microcontroller target system that I happened to be working on at the time - and give myself a friendly, predictable environment with which to exercise the hardware. In some cases, SIMPL could even be loaded into the bootloader space of a processor - so that it was always accessible.

SIMPL fundamentally allows interaction with the microcontroller, because of its interpreted nature. The interpreter is flexible enough to form the basis of a series of simple tools, such as cross assemblers, debuggers and simulators. It is whatever you want it to be - you have absolute control over what action the cpu performs in response to your key strokes.

Kernel Functionality

SIMPL communicates with a PC using a serial UART interface. It can be driven from any terminal application.

It really only needs getchar() and putchar() routines that interface with the on-chip UART.

These, together with a printnum() function which prints out a 16 bit unsigned number, are all that is needed to communicate with the PC in a meaningful manner. It's old-school, but it works, and it is easy to set up on almost any microcontroller or SoC device.
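A printnum() along these lines can be written without any library formatting. The sketch below assumes a hypothetical uart_putchar() routine; here it simply collects characters in a buffer so the result can be checked on a PC, whereas on a microcontroller it would write to the UART transmit register.

```c
#include <stdint.h>

/* Host stand-in for the UART transmit routine: characters are collected
   in tx_buf instead of being sent down the serial line. */
static char tx_buf[32];
static int tx_len;

static void uart_putchar(char c)
{
    if (tx_len < (int)sizeof tx_buf - 1) {
        tx_buf[tx_len++] = c;
        tx_buf[tx_len] = '\0';
    }
}

/* Print a 16-bit unsigned number in decimal using only uart_putchar() */
static void printnum(uint16_t n)
{
    char digits[5];                            /* 65535 needs at most 5 digits */
    int i = 0;
    do {
        digits[i++] = (char)('0' + n % 10);    /* least significant digit first */
        n /= 10;
    } while (n != 0);
    while (i > 0)
        uart_putchar(digits[--i]);             /* send most significant first */
}
```

Widening the argument to a 32-bit type and the digit buffer to 10 characters is all that a 32-bit version needs.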

SIMPL is a low overhead program - a kind of interactive tiny OS, that only takes a few Kbytes, yet provides all the means of accessing and controlling the micro.

A brief list of functionality.

The digital I/O is limited to writing to or reading from a single I/O pin. In most cases this will be one that supports a LED. The I/O functions can be extended to whatever is needed by the application - for example, in one application, an LED chaser display, I needed to write a 13 bit number to an array of LEDs, each connected to an output pin of the Arduino.

Analogue Input (ADC) and output PWM functions may be enabled if required - but these will add approximately a further 600 bytes to the kernel code.

The kernel uses the delay() and delayMicroseconds() functions to allow accurate timing of I/O operations. With these the microcontroller can generate pulse sequences (up to 100kHz on Arduino), generate musical tones, sound effects or animate LED displays.

As well as the functions that interact with the hardware peripherals, SIMPL also has a range of arithmetic and bitwise logic operators to allow simple integer maths and logical decision making.

There is a simple looping function which permits a block of code to be repeated - up to 32K times.

Recently added functions allow the printing of strings and the manipulation of ascii characters.

Extensibility

On top of the 2K kernel core is some further code which allows the user to define their own functions and store them in RAM. Up to 26 user functions can be defined under the current system. It's not exactly Forth - but borrows a whole lot of ideas from that remarkable language.
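One plausible way to hold those 26 user functions is sketched below, assuming each definition is stored as text in an equal slot of the 1248-byte User RAM array (26 x 48 = 1248 bytes) and named by a capital letter A-Z. The layout, the example body and the function names are assumptions for illustration, not the actual SIMPL code.

```c
#include <string.h>

#define N_USER 26
#define SLOT   48                /* 26 * 48 = 1248 bytes of user RAM */

static char user_ram[N_USER][SLOT];

/* Store the text of user function 'name' (A-Z); returns 0 on success */
static int define_user(char name, const char *body)
{
    if (name < 'A' || name > 'Z' || strlen(body) >= SLOT)
        return -1;
    strcpy(user_ram[name - 'A'], body);
    return 0;
}

/* Return the stored text so the interpreter can execute it */
static const char *user_body(char name)
{
    if (name < 'A' || name > 'Z')
        return NULL;
    return user_ram[name - 'A'];
}
```

When the interpreter meets a capital letter it can simply interpret the stored text, which is how a small fixed table gives Forth-like extensibility without a full dictionary.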

The system could be extended to include SPI or I2C functions to exercise specific peripheral chips or access a microSD card for program storage.

One of my designs, "WiNode", is an Arduino-compatible target board with a 433MHz wireless module, external SRAM, RTC, motor/speaker driver and microSD card. SIMPL may be used to exercise and interact with all of these peripherals.

32bit Math Extensions

This was remarkably easy to implement. Retyping the x and y variables to long forced all of the arithmetic routines to 32 bits. Whilst this pushed the code size up by about 900 bytes, some of this was offset by rewriting the printnum() function as a 32-bit printlong() and deleting the original printnum(). The code now stands at 4826 bytes and can be found on this GitHub Gist.

This means that SIMPL can do 32-bit integer maths straight from the can.
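Changing one type can widen every operator because the interpreter's working registers all share that type. The sketch below shows what the change buys. The typedef names are hypothetical. The 16-bit case is modelled as unsigned so that the wrap-around is well defined C. On AVR, `int` is 16 bits and `long` is 32 bits.

```c
#include <stdint.h>

/* Hypothetical register types before and after the change. */
typedef uint16_t simpl16_t;   /* 16-bit working registers */
typedef int32_t  simpl32_t;   /* 32-bit ('long' on AVR) working registers */

/* Multiply in 32 bits, then truncate: models a 16-bit register
   wrapping modulo 65536. */
simpl16_t mul16(simpl16_t x, simpl16_t y)
{
    return (simpl16_t)((uint32_t)x * y);
}

/* The same operation with 32-bit registers: no wrap for results
   that fit in 32 bits. */
simpl32_t mul32(simpl32_t x, simpl32_t y)
{
    return x * y;
}
```

For example, 1000 × 1000 wraps to 16960 with 16-bit registers, but gives 1000000 with 32-bit ones.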

Whatever Next?

SIMPL has been an ongoing project for over 2 years, and as it has developed - so have my C coding skills. As the code becomes larger, things become easier - as the switch from 16 bit to 32 bit integer maths has proven - it was literally a 10 minute hack.

I am very aware that SIMPL is not yet a fully fledged language - it can flap its wings and tweet a bit - but is not ready to leap out of the nest and fly. Perhaps I am a bad mother bird - too eager to experiment with new ideas rather than concentrate on the basics. Time will tell.

I have ported SIMPL to STM32Fxxx ARM microcontrollers and seen a 25X increase in speed. There are now ARMs that run at 240 and 300MHz (Atmel SAM E7), which will give an even bigger performance boost.

The final intention is to create a SIMPL virtual machine (SVM) that can be hosted on nearly any micro, including FPGA soft-core stack processors such as James Bowman's J1b. With these we hope to see a large leap in performance.

In the meanwhile, I still have an Arduino plugged into my laptop - as my preferred development platform - if it will run on Arduino - it will run on everything else a whole lot better!

Next Time - more uses for the SIMPL Interpreter.

The First Rays of October Sunshine Bathe The Llanfrothen Bothy

This week I have been off work and attending various events around the country.

The weather has been exceptional. I was so lucky to have picked a week off work with such fine, sunny autumnal weather.

Last Thursday I went to an ARM Cortex M7 programming course near Cambridge.

On Friday, I hung out with Andrew Back and Omer Kilic in Hebden Bridge and discussed plans for a new video frame store and imaging system for his 1985 Cambridge Instruments Scanning Electron Microscope.

Saturday was oshCamp 2015 (Open Source Hardware Camp) - 2 days of tech talks and workshop sessions at Hebden Bridge Town Hall. This was the first event of the week long @wutheringbytes Digital Festival held in Hebden Bridge.

On Monday I attended the morning session of "Open for Business" and then spent the early afternoon teaching a 75 year old pensioner the basics of programming Arduino.

In the late afternoon I drove to North Wales to a remote rural bothy near the village of Llanfrothen. I then spent nearly 3 days developing a micro solar inverter with good friend Trystan Lea of @openenergymon.

I have returned to Redhill, Surrey, and expect to return to work tomorrow for a well-deserved rest.

More posts about the above to follow.

Micro-Solar - A New Approach to Increasing the Installed PV Capacity

Introduction.

Recent changes in the solar grant scheme have had a very negative effect on the UK's fledgling solar pv industry. The rug has been pulled out from under the feet of all those who set up installation businesses. The FITs have been decimated, so there is now virtually no reason why anyone would make a large, long-term investment in a permanent pv installation.

Feed in tariffs (FITs) have been reduced to the point where there is no longer an incentive to invest in a photovoltaic solar installation. Our current Government seem to be more interested in kickstarting the UK fracking industry and selling off large sections of our critical electricity generation infrastructure to the Chinese.

The reduction in FITs to just 1.63p/kWh in January 2016 goes beyond miserly. At such levels it completely removes any economic reason to export to the grid.

Without the grid export option, the pv output can only be used locally, and if used efficiently will help reduce the amount of power consumed from the grid.

In this post, I propose a new way to look at solar pv, in a way that could appeal to a much greater customer base, and at a price that is much more affordable.

A single 250W pv panel could reduce your electricity bills by up to 10%. If this were repeated by millions of consumers around the country, rather than tens of thousands, it would significantly increase the installed solar capacity.

There are many potential customers for micro-solar. Those who can afford an expenditure of up to £1000, but not the £6K - £10K for a full system. With the FITs gone, the market for larger systems will have evaporated.

Sources of PV Panels

A recent web-search revealed small solar systems being made and marketed in China for about £750, complete with inverter, battery and controller.

If you choose to shop around on Taobao (the Chinese eBay), you can find 250W panels for about £60 each. Even if this price doubled by the time the panels had been shipped to the UK, it would still be only about £0.50 per peak watt.

If you want to search for your own panels - the Chinese term is 250W 太阳能板 - Happy Hunting!

Economics.

A unit from the grid (Southern Electric October 2015) is currently 14.0385p (incl VAT). My annual consumption is approximately 2400 kWh.

A small system consisting of 4 x 250Wp panels would yield between 850 and 900kWh per annum, displacing about 35% of my grid consumption.

Assuming that 800kWh of this can be used in the home, it could reduce the incoming electricity bill by about £112 per year.

Importing 250W panels from China at £0.50 per peak watt means that the system could recover its costs within 5 years. That is without a grant or FITs, and without the additional expense of a grid-tied inverter and professional MCS installation.
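The figures above can be checked with a simple payback calculation. It uses the Southern Electric unit price quoted earlier and counts only the cost of four panels at £0.50 per peak watt. Any batteries, converters or mounting hardware would push the payback further out.

```latex
\begin{aligned}
\text{annual saving} &= 800\ \text{kWh} \times \text{£}0.140385/\text{kWh} \approx \text{£}112.31 \\
\text{panel cost} &= 4 \times 250\ \text{Wp} \times \text{£}0.50/\text{Wp} = \text{£}500 \\
\text{simple payback} &= \frac{\text{£}500}{\text{£}112.31\ \text{per year}} \approx 4.5\ \text{years}
\end{aligned}
```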

The Case for Microsolar

I will define a Micro-Solar system as an installed system of 1000Wp or less. With currently available panels this would consist of up to four 250Wp panels. These panels are typically 1.6m x 1m and weigh about 21kg.

The emphasis is that microsolar should be small, cheap and portable. It is portable in the sense that it's not a permanent installation. It can be deployed on a wall-mounted bracket if roof access is not available, and it can be moved from property to property as required, which is a benefit for young adults in the rented sector. As it is a small system, it would typically be a DIY installation, not requiring specialist tools or equipment. One attractive use would be on a south-facing balcony, with the panel securely clamped to the balcony rail.

As the system is small, it is essential to get the best use from it, and the best conversion efficiency. This will entail the use of high efficiency dc/dc converters and LiFePO4 batteries used for energy storage.

Whilst we are all familiar with ac mains electricity, and almost all of our appliances and consumer electronics products are intended for ac plug-in use, for some products the ac is an inconvenience and a huge source of inefficiency. Portable computers, smart phones and other mobile gadgets increasingly charge from 5V, using a standard "USB" style charger.

If you are interested in USB chargers - this excellent post covers them in great detail, in terms of efficiency, power quality and safety.

The underlying message is that small ac powered USB chargers are at best only 65 to 80% efficient, and this is a figure that has a lot of room for improvement. Direct dc/dc conversion could improve this considerably. Starting with a 12V input, this can be converted with 92% efficiency to 5V using a synchronous switching converter - such as this one from ON Semiconductor.

With the growth of electric bike technology, LiPo battery packs are available with high capacities at low cost. The real benefit of lithium battery chemistry is its very high charge efficiency. Packs of welded lithium cells are typically available in capacities from 350Wh to 1kWh.

What can you power with 1kWh per day?

Several years ago, when I was working from home, I mused on the idea of a home-office workspace that ran on just 200W. At the end of that post, I speculated that perhaps even 100W was possible. In the 10 years since that post we have had the benefit of more power efficient laptops and netbooks, LCD monitors, LiPo batteries and LED lighting. I now believe that I can run the same work environment on an average power budget of just 100W - and that puts home working well within the reach of a microsolar installation.

Whilst thinking about the e-bike battery packs, it occurred to me that a lot more people are cycling these days, particularly within our cities. An e-bike typically consumes about 20Wh per mile, so a 36V 10Ah pack (360Wh) could offer a range of about 15 to 18 miles between charges, depending on terrain and how much you pedal. By using several e-bike battery packs as the basis of a modular battery store, it is possible to have a freshly charged pack every morning. Three or four interchangeable packs would form the system, so on any day there is always about 1kWh of storage.

When you arrive home after a day at the office, the battery is fully charged and ready for the evening's use. This would cover recharging portable computing devices and smart phones, and running LED lighting.

A quick check of the specs of a 43" Samsung smart TV showed a 51W consumption. So 6 hours of TV, gaming or web-browsing in the evening (about 300Wh) is well within the capability of a microsolar system.
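The numbers in the last few paragraphs can be put together as a rough daily energy budget. This takes the e-bike pack as 36V and 10Ah and uses the 20Wh per mile figure. The lower 15 mile range reflects hillier riding or less pedalling.

```latex
\begin{aligned}
E_{\text{pack}} &= 36\ \text{V} \times 10\ \text{Ah} = 360\ \text{Wh} \\
\text{range} &\approx \frac{360\ \text{Wh}}{20\ \text{Wh/mile}} = 18\ \text{miles} \\
E_{\text{store}} &= 3 \times 360\ \text{Wh} \approx 1.08\ \text{kWh} \\
E_{\text{TV}} &= 51\ \text{W} \times 6\ \text{h} = 306\ \text{Wh}
\end{aligned}
```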

What about the Winter?

Microsolar systems will run at much reduced capacity during the Winter months.

The graph below shows monthly output, for south facing panels, in southern England, averaged over 3 years (2012-2014) - scaled to reflect the output of a single 250 Wp panel.

Normalised output for a 250Wp panel

It shows that useful output is available from March until September - approaching 1kWh per day, but for 4 months of the year, you are getting only about a third of the summer peak. If you are reliant on the solar contribution to charge your bike pack, then you will need to find an efficient means to recharge your pack from the ac mains during the winter months.

Fortunately, the designers of switched mode power supplies have made significant advances over the years in improving power supply efficiency. This is particularly important in server farm and telecoms applications, where efficiency and reliability are paramount. These efficiency improvements have filtered down to PC power supplies, and you can now buy a desktop PC supply with better than 95% efficiency.

Combining a high efficiency psu with a LiPo battery bank, means that you can top up your store when required, at maximum efficiency. It also helps maintain your output on cloudy days.

In summary, all the components for a high efficiency microsolar system are available and affordable.

In the next post I will go into some more detail of the proposed system.

Bothy-Hack - A Micro-solar inverter based on Arduino

About once a year, I get the opportunity to spend some time with my friends from @openenergymon in North Wales. This year, having attended oshCamp in Hebden Bridge last weekend, I took advantage of the glorious late September sunshine to cross over to southern Snowdonia. I went to a rural bothy outside the village of Llanfrothen to spend a few days working on some new projects with Trystan Lea.

Trystan had expressed an interest in building a low-power inverter from scratch. This would take the dc output from a low-wattage solar pv panel and create a stable 50Hz, 230Vac mains supply, suitable for powering small items of equipment. So after a couple of beers and some tech discussion over a pub meal on Monday evening, we set about beginning our micro-solar inverter project.

Open Source - Easily Built, Easily Repaired

There have been several inverter designs published on the web, but they are generally crude square-wave or modified square-wave designs based on beefy bipolar transistors.

The intention was to make the design easily accessible to others by using familiar and easily sourced components that are available to hobbyists everywhere. The project was going to be open-sourced, hopefully with professional pcbs coming a bit later, so that others could follow our work.

We wanted to make a design that uses readily obtainable N-type FETs and an Arduino (more strictly, an ATmega328P-PU on a breadboard) to generate the PWM signals and provide simple circuit protection and load sensing. With the PWM signals generated in firmware, it can easily be modified for 50Hz or 60Hz operation, for either 115V or 230V output, and for a wide range of battery input voltages.

We imagined that the final design could consist of an Arduino, an "Inverter Shield" containing FETs and driver ICs configured in an H-bridge, and some voltage and current monitoring circuits. To make the inverter, a 12V or 24V battery (or PV panel) and a 12V (or 24V) toroidal transformer would be added.

As we really only had 2 days to work on the design, we decided to make a simple proof of concept prototype, which could later be refined.

We are happy to receive suggestions from the wider community - in the hope that the basic design will evolve into an efficient unit.

Planning

The steps of the primary project were planned as follows, the time available was about two and a half days:

1. Use a breadboard "Arduino" to generate the 50Hz sinusoidal pwm waveforms needed to drive the FETs.

2. Breadboard the FET driver ICs and the 40A 55V FETs for initial testing with a 4VA step-up transformer.

3. Build up the FETs on stripboard - with substantial current handling tracks and heatsinks.

4. A series of tests with different ac loads, with both 12V battery and pv input power.

5. Documentation and blogposts.

A secondary project was to build a simple energy monitor - again using a "breadboard Arduino" which would measure the dc output of the pv panel, and allow us to perform efficiency tests on the micro-solar inverter.

As a fall-back position, I had brought along a dc motor driver board I have been developing at work that uses an ARM Cortex M4 processor and a 12A 24V H-bridge. I wanted to have a go at repurposing this board to make a simple 50Hz inverter (and succeeded!).

Implementation.

Step 1 was fairly quick to achieve, because I already had some Arduino code to generate an 8-bit sine waveform, using "Fast-PWM" - which appears as complementary pwm outputs on Arduino Digital Pins 3 and 11.
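For readers who want to try this before the Gist is to hand, here is a minimal sketch of the same idea. It is not the original sketch. It assumes a 16MHz ATmega328 using Timer2 in Fast PWM mode with no prescaler, which gives 16MHz / 256 = 62.5kHz PWM on OC2A (digital pin 11) and OC2B (digital pin 3). A 32-bit phase accumulator steps through a sine table once per PWM period. The 32-entry table is chosen only to keep the example short. The portable part compiles on a PC. The register setup is compiled only for AVR.

```c
#include <stdint.h>

/* 32-sample, 8-bit sine table: 128 + 127*sin(2*pi*i/32), rounded.
   A real inverter would use a longer table (e.g. 256 entries). */
static const uint8_t sine32[32] = {
    128, 153, 177, 199, 218, 234, 245, 253,
    255, 253, 245, 234, 218, 199, 177, 153,
    128, 103,  79,  57,  38,  22,  11,   3,
      1,   3,  11,  22,  38,  57,  79, 103
};

/* Phase-accumulator increment for output frequency f_out when the table
   is stepped once per PWM period at f_sample (rounded to nearest). */
uint32_t phase_step(uint32_t f_out, uint32_t f_sample)
{
    return (uint32_t)((((uint64_t)f_out << 32) + f_sample / 2) / f_sample);
}

/* Advance the phase and return the next duty value. The top 5 bits of
   the 32-bit phase select one of the 32 table entries. */
uint8_t next_duty(uint32_t *phase, uint32_t step)
{
    *phase += step;
    return sine32[*phase >> 27];
}

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>

static uint32_t isr_phase;
static uint32_t isr_step;

/* OC2B runs in inverting mode, so pins 11 and 3 carry complementary
   waveforms from one duty value. No dead time is inserted here. */
void sine_pwm_begin(void)
{
    DDRB |= _BV(PB3);                   /* digital pin 11 = OC2A */
    DDRD |= _BV(PD3);                   /* digital pin 3  = OC2B */
    TCCR2A = _BV(COM2A1)                /* OC2A non-inverting */
           | _BV(COM2B1) | _BV(COM2B0)  /* OC2B inverting */
           | _BV(WGM21) | _BV(WGM20);   /* Fast PWM, TOP = 0xFF */
    TCCR2B = _BV(CS20);                 /* clk/1: 62.5 kHz at 16 MHz */
    isr_step = phase_step(50, 62500);   /* 50 Hz output */
    TIMSK2 = _BV(TOIE2);                /* interrupt once per PWM period */
    sei();
}

ISR(TIMER2_OVF_vect)
{
    uint8_t d = next_duty(&isr_phase, isr_step);
    OCR2A = d;
    OCR2B = d;
}
#endif
```

For 50Hz at a 62.5kHz update rate, phase_step() returns 3435974. Because the 32-bit phase wraps naturally, the table repeats without any explicit bounds check.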

Trystan had already built up a "breadboard Arduino", so it was relatively simple to program this with an FTDI cable. We then tested the outputs for frequency, using a low pass filter to reconstitute the sine waveform for the oscilloscope.

The next task was to build the 4 FETs that form the H-bridge onto a breadboard and wire them up to the IR2110 driver ICs. These ICs produce a level-shifted drive waveform, so that N-FETs can be used in the upper arms of the H-bridge. N-FETs have lower on-resistance than P-FETs, and so give better efficiency. The driver ICs are designed to source and sink the high currents required to turn the FETs on and off quickly.

Breadboard construction is not ideal for building fast switching power electronics, and getting the driver ICs to work reliably was probably the biggest challenge of the project.

However, by 9:45pm on the first night, Trystan had the inverter running and lighting an ac powered LED bulb.

The LED lamp in Trystan's right hand was the first sign of a working inverter!

The ac waveforms on the scope were not in great shape, so we added a couple of 0.33uF 250V capacitors, connected in series across the ac output - that cleaned things up a lot!

Scope waveform - not great at first

Until we added 0.33uF capacitors across the mains output

Even with a 4VA step up transformer - the LED lamp was ultrabright!

So we retired at the end of Day One - with a working inverter, and the plan to characterise it and improve it on Day Two.

In the next part, I look in more detail at the design and performance. Later there will be links to the schematics, pcb layout and the Arduino code used to drive the FETs.

In the first part of this "Open Inverter" series, I described how Trystan and I had cooked-up a simple inverter based on a mosfet H-bridge, an "Arduino" and a 12V-230V mains transformer.

Before going into too much technical detail (as I am still documenting it), I first wish to explain two things. First, I think that the combination of microcontroller and H-bridge is an essential building block in modern power electronics. Second, the ability to efficiently transform ac to dc, dc to dc, and dc to ac is paramount to the renewable energy sector.

FETs capable of switching moderate power levels are available surprisingly cheaply. The ones we used in our inverter were under £1 each. The driver ICs (IR2110 or similar) are a couple of quid each.

So, it's possible to make up the H-bridge stage and drivers for under £10 - and that includes some heatsinks.

However, the ubiquitous H-bridge is also available in the form of an IC - or rather 2 ICs - as most implementations appear to use half H-bridges.

Infineon make a range of these - intended for automotive motor control, and so can handle high currents, but generally at 28V maximum. This makes them suitable for 24V battery systems.

The BTS7960 is typical of the Infineon range. It has a maximum voltage of 28V, but with correct heatsinking can switch up to 43A. Theoretically, a pair of these devices would be capable of running a 1kW inverter - but I would be happier in the 250W to 500W range.

It has over-temperature, over-voltage and current-limiting protection built in. It also outputs an analogue signal proportional to the output current, which can be used to monitor performance.

Ebay is a good source for ready built modules containing a pair of BTS7960 devices. I bought a pair of these for about £8 each.

A low cost BTS7960 H-bridge module from Ebay or TaoBao

The BTS7960 is also available as an Arduino shield called the MegaMoto shield. It holds a pair of BTS7960s plus jumpers for easy selection of which pwm outputs drive them. At £32 it's a bit overpriced, but handy if you are already working with an Arduino platform.

Other BTS7960 boards are available from Taobao, of varying design and quality, with or without heatsinks. I consider adequate heatsinking to be essential.

This application note for the BTN8962 - a newer, related family member - gives good details on how to get the best from these devices.

Making the H-Bridge Work for Us.

If you look at a typical FET H-bridge, you will see that each FET is bypassed with a reverse-biased diode. This is sometimes called the body diode, and it comes for free as part of the process of implementing a FET on the silicon substrate. It is tremendously important in protecting the FET from overvoltage spikes caused by switching inductive loads, as it returns them safely to the supply rails. It can also work in our favour, allowing easy implementation of rectifier and boost converter topologies.

A half bridge like this one may easily be turned into a boost converter by supplying dc power into the terminal marked OUT, via a series inductor, and extracting the boosted voltage from the terminal marked VS.

Conversely, a buck converter can be made by applying power between VS and GND, and extracting a reduced voltage between OUT and GND - again through a series inductor.

So with two of these half bridges and a couple of inductors, you have all the makings of a boost-buck dc/dc converter.

Why would you first boost a voltage, only to buck it down again? Well if it was the varying output voltage of a solar panel which drifted depending on clouds - it would be handy to boost it so as to properly charge a battery pack at the correct charge voltage. Then you might want a stable 12V or 5V supply for powering some equipment - in which case you would buck the voltage down again.
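The duty cycle that each half bridge needs follows from the textbook relations for an ideal synchronous converter: lossless, in continuous inductor conduction. A buck converter gives Vout = D × Vin, where D is the high-side on-time fraction. A boost converter gives Vout = Vin / (1 − D), where D is the low-side on-time fraction. The sketch below encodes those relations. The function names are hypothetical, and a real converter needs slightly more duty than this to make up for losses.

```c
/* Buck: power in on VS, out on OUT through the inductor.
   Ideal relation Vout = D * Vin (D = high-side on-time fraction). */
double buck_duty(double vin, double vout)
{
    return vout / vin;
}

/* Boost: power in on OUT through the inductor, out on VS.
   Ideal relation Vout = Vin / (1 - D) (D = low-side on-time fraction). */
double boost_duty(double vin, double vout)
{
    return 1.0 - vin / vout;
}

/* Convert a duty fraction (0..1) to an 8-bit PWM compare value,
   clamping out-of-range requests. */
unsigned char duty_to_pwm8(double d)
{
    if (d <= 0.0) return 0;
    if (d >= 1.0) return 255;
    return (unsigned char)(d * 255.0 + 0.5);
}
```

For example, boosting a 16V panel output to 28V needs D ≈ 0.43 on the boost stage. Bucking that 28V back down to 12V also needs D ≈ 0.43.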

This ability to transform dc power up and down, with high efficiency, or match the varying dc output of a pv panel - so as to capture peak power, is very important - and it can all be done with the H-bridge controlled by an 8-bit Arduino.
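The post does not say which peak power tracking algorithm would be used. One common choice is "perturb and observe". You nudge the converter's duty cycle, measure the panel power, and keep moving in whichever direction made the power rise. Whether raising the duty raises or lowers the panel voltage depends on the converter, but the hill-climbing logic doesn't care. Here is a minimal sketch, with a hypothetical state struct, step size and duty limits:

```c
/* Perturb-and-observe MPPT state (all names hypothetical). */
typedef struct {
    double duty;        /* current duty-cycle command, 0..1 */
    double step;        /* size of each perturbation */
    double last_power;  /* panel power measured on the previous call */
    int    direction;   /* +1 or -1 */
} mppt_state;

/* Call once per control period with the measured panel voltage and
   current; returns the new duty-cycle command. */
double mppt_update(mppt_state *s, double v_panel, double i_panel)
{
    double p = v_panel * i_panel;

    /* If the last perturbation lowered the power, reverse direction. */
    if (p < s->last_power)
        s->direction = -s->direction;
    s->last_power = p;

    s->duty += s->direction * s->step;
    if (s->duty < 0.05) s->duty = 0.05;   /* hypothetical safe limits */
    if (s->duty > 0.95) s->duty = 0.95;
    return s->duty;
}
```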

The Tasks of the Arduino.

Over the last 10 years, Arduino has become a familiar and accessible microcontroller platform. Even though it is only an 8-bit, 16MHz device, it can still be used to great effect in power electronic applications.

We built up "breadboard Arduinos" which closely follow Cefn Hoile's Shrimp design. This is essentially an ATmega328 IC with crystal, reset and an FTDI cable header.

Shrimp - a minimal breadboard "Arduino" - by Cefn Hoile

The Arduino has to generate complementary PWM in order to drive the H-bridge. In some applications, independent PWM channels may be needed to control each side of the H-bridge separately.

In addition to pwm generation, the Arduino should also monitor current, voltage and load regulation.

By making use of "Fast PWM" and "Fast ADC" on the Arduino, the ATmega328 can achieve quite a lot of control whilst generating the sinusoidal pwm.

Our first task on the Arduino is to generate a sinusoidal signal using "Fast PWM".

For those who are eager to experiment, I have created a Github Gist containing the sinusoid pwm generation sketch. This produces complementary pwm on digital pins 3 and 11, which is just what you need for driving H-bridges.

To test this routine, make a low pass filter from a 10K resistor and 100nF capacitor and attach to either digital Pin 3 or 11. This will reconstruct the sine wave from the digital pwm waveform - and give you a scope trace similar to that at the start of this post.
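Why these component values work can be seen from the filter's cutoff frequency. The calculation below assumes the PWM runs at 62.5kHz, the Fast PWM rate of a 16MHz ATmega328 with no timer prescaler. The 50Hz fundamental passes almost unchanged, while the PWM carrier is attenuated by roughly 52dB.

```latex
\begin{aligned}
f_c &= \frac{1}{2\pi R C} = \frac{1}{2\pi \times 10\,\text{k}\Omega \times 100\,\text{nF}} \approx 159\ \text{Hz} \\
|H(50\ \text{Hz})| &= \frac{1}{\sqrt{1 + (50/159)^2}} \approx 0.95 \\
|H(62.5\ \text{kHz})| &\approx \frac{159}{62\,500} \approx 0.0025 \quad (\approx -52\ \text{dB})
\end{aligned}
```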

In the next part - I will have the schematics for the FET version of the inverter for eagleCAD. In the meantime I encourage readers to try and get the Arduino or Shrimp to produce a 50Hz sine waveform.

A 12A 24V motor drive H-bridge pcb repurposed into a 64W micro-solar inverter

Further Thoughts

Welcome to Part 3 of this series of posts regarding the Open Inverter - a micro-solar inverter.

The H-bridge is fundamentally a 2-port power control device, similar to its cousin the bridge rectifier. Instead of passive rectifiers, the H-bridge consists of 4 semiconductor switches that can be controlled by firmware, and as such it is much more versatile than the diode bridge. Unlike the diode bridge, it has electrical symmetry, so power can flow in both directions, and we can use this unique ability to our advantage. The H-bridge becomes a versatile power transformation device.

Consider the H-bridge to be like a black box that, in its simplest form, has 2 ports A and B, to which power sources or sinks can be connected.

In the photo above, dc power (15.949V 3.97A) from a 150W solar panel enters from the right hand side on the red and black wires.

It is then converted by the H-bridge, under firmware control, into a pwm 50Hz ac signal. This feeds the mains transformer, which is connected on the left hand side by white and black wires, and is then transformed up into 230V ac mains.

Thus a board designed for dc motor control has been repurposed into being a micro-solar inverter.

Depending on how the H-bridge is controlled, it can:

1. Transfer power in either direction between ports A and B,
2. Rectify ac to dc
3. Synthesise ac from dc
4. Transform dc up and down in voltage and current
5. Provide a variable impedance load - for load matching and peak power tracking.

All from a $20 module - Wow!

As I document the project so far and put my thoughts down into words it's becoming increasingly apparent that a versatile H-bridge controlled by a low cost microcontroller has a multitude of uses amongst the hobbyist community - a few here:

Solar Inverters
Step Up (Boost) dc/dc converters
Step Down (Buck) dc/dc converters
Boost-Buck Converters
Split Pi Converters
Synchronous rectifiers
Solar PV - peak power tracking
Battery chargers - high efficiency charging of consumer electronics and portable PCs
Load balancing
DC Motor Control for pumps, solar trackers, machine tools, vehicles (bikes etc)

However, the basic H-bridge design can then be further extended to 4 or more ports. By adding an extra half-H Bridge, 3 phase applications become practical:

3 phase ac or brushless dc motor drives
Solar boost peak power tracking inverter

Furthermore, each half-bridge can be regarded as a power port - where power may be supplied or removed to/from the system. This means that the H-bridge can be seen as a 3 port device - and in this mode it has applications in some bi-directional boost-buck dc/dc converter topologies - such as the split-pi converter.

If it is made in a modular fashion that can be extended to cope with more sophisticated or power hungry applications - then it will be a lot more versatile.

So it sounds like the world is in need of a low cost, open source, versatile H-bridge power converter. If it can be made for under $20, it can appeal to a whole variety of price sensitive applications.

A Modular Approach

An H-Bridge Module like this, connected to an Arduino is very versatile

Today it is time to make some fundamental design decisions regarding the inverter power stage.

There are 2 main options:

1. Traditional N-FETs with driver ICs.

2. Half-Bridge ICs - such as the Infineon BTN8962

Whilst there are many mosfet driver ICs, only a few work at 30V, which is essential for a nominal 24V battery supply. One example is the Microchip TC4431/32.

These are available in DIP for easy self-assembly. The advantage with option 1 is that you can fit whatever FETs you have available - depending on your maximum voltage and current requirements.

Option 2 uses the BTN8962 or BTS7960 integrated driver and half-bridge ICs from Infineon. These are available as ready made modules from China, at a price cheaper than they could be made here - and might appeal to some experimenters.

So, in order to make a design decision and move the project forward, I am going to suggest a traditional FET board that is footprint-compatible with the Chinese module.

Regarding the "Arduino" part of the design, this could be built on stripboard sized approximately 5cm x 5cm which is then stacked underneath the FET power module on hexagonal spacers. Header connectors would connect up to the power module. I believe that this would be a suitable option for home construction.

Additionally, I am looking at a pcb layout for the inverter's mcu section. This would use the ATmega328 on a pcb sized about 5cm x 5cm so that it stacks below either the discrete FET board or the Chinese BTS7960 motor drive module.

5 x 5 cm is a good size. It leaves room for additional circuitry, connectors and a 5V regulator, and 5 x 5 cm boards are very cheap from dirty-pcbs.com. Using the standard 5 x 5 or 10 x 10cm boards, these can be made stackable with as many power stages as required. In theory, an inverter could be built up, stage by stage, to allow for possibly 1000W.

Wireless Control and Monitoring.

For several years I have been using designs that incorporate an RFM12B or RFM69 wireless module. These modules make use of Jean-Claude Wippler's JeeLib, a library and wireless protocol devised for communication between low-cost wireless nodes.

JeeLib has been adopted by my friends at Open Energy Monitor for communication between their wireless sensors, energy monitors and a base station. The base station is often web-connected to emonCMS, their cloud-based analysis and energy visualisation package.

By including a wireless module on the Open Inverter MCU board - it ensures emonCMS compatibility - allowing remote monitoring and control of the power transferred by the inverter - using emonCMS.

As well as the micro-solar inverter, the Open Inverter boards could be used for pv peak power tracking, LiPo battery charging/monitoring, and dc/dc conversion for the various voltage outlets of the "dc ring main". They can be used anywhere that power is generated or converted and report back the individual power transfers to emonCMS.
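OpenEnergyMonitor nodes typically send a small C struct of 16-bit integers as their radio payload. A hypothetical payload for the Open Inverter might look like the one below. The field names, the 0.01 scaling and the low-byte-first order are assumptions. In practice the packed bytes would be handed to the radio library for sending. Serialising field by field keeps the on-air layout independent of compiler padding.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical inverter telemetry, all values as 16-bit integers. */
typedef struct {
    int16_t dc_volts_x100;   /* battery/panel voltage, 0.01 V units */
    int16_t dc_amps_x100;    /* input current, 0.01 A units */
    int16_t ac_watts;        /* estimated output power, W */
    int16_t heatsink_degc;   /* heatsink temperature, deg C */
} inverter_payload;

/* Write the four fields low byte first (the AVR's native byte order)
   into out[0..7]. Returns the number of bytes written. */
size_t pack_payload(const inverter_payload *p, uint8_t *out)
{
    const int16_t fields[4] = {
        p->dc_volts_x100, p->dc_amps_x100, p->ac_watts, p->heatsink_degc
    };
    for (int i = 0; i < 4; i++) {
        uint16_t u = (uint16_t)fields[i];
        out[2 * i]     = (uint8_t)(u & 0xFF);
        out[2 * i + 1] = (uint8_t)(u >> 8);
    }
    return 8;
}
```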

Designing the H-Bridge

In today's post, the fourth in this series I start to look at some of the practical aspects of the Open Inverter design - starting with the proposed H-bridge and the choice of driver IC and heatsinks.

Sketching out the schematic, choosing the components and laying out the pcb is about 2 days work.

A typical N-FET H-Bridge with HIP4082 Driver IC and current sensing

The circuit above shows a typical H-bridge arrangement based on N type FETs and a readily available H-bridge driver IC - the HIP4082 from Intersil.

The circuit will fit easily on a 5 x 5cm pcb, and with the correct choice of FETs - and heatsinks will handle 50V and 50A .

The FETs are in standard TO220 packages, and there is a wide range of FETs that could be selected depending on application.

The 2 main parameters for FET selection are the maximum Drain-Source voltage Vds, and the maximum Drain Current Id.

For safe operation at 48V, it would be worth using 100V FETs with 70A current handling, such as the Infineon 70N10.

It should be noted that the gate circuit of each FET contains a pair of resistors and a diode. The resistors limit the current supplied to the gate and control the turn-on time, while the diode improves the turn-off time. These components also help protect the driver IC from damage and prevent oscillation of the gate drive outputs.

The bill of materials for this H-bridge will be around £10 in 1 off.

The Intersil HIP4082 is an economical way to drive an N FET, H-bridge

Above is the minimum circuit required for the HIP4082, this one with optional current sensing. The current sense resistor Rsh is typically 10 milliohm or even 1 mOhm - if you are working with high currents.

It should be a 1206 package, or several resistors in parallel: a current of 10A through 10 mOhm dissipates 1W in the resistor. Special precision metal current-sense resistor links are available for this purpose. The gain of the op-amp should be chosen to suit the full-scale input of the ADC on the microcontroller (typically 3V3 or 5V).
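Choosing the op-amp gain and scaling the ADC reading back to amps are both one-line calculations. This sketch assumes a 10 mOhm shunt, a 20A full-scale current and a 5V ADC reference. The shunt develops 0.2V at 20A, so a gain of 25 brings that up to 5V. The 10-bit resolution matches the ATmega328's ADC. The function names are hypothetical.

```c
/* Op-amp gain needed so that i_max through r_shunt reaches the ADC's
   full-scale voltage v_fs. */
double sense_gain(double i_max, double r_shunt, double v_fs)
{
    return v_fs / (i_max * r_shunt);
}

/* Convert a raw ADC reading back to amps, for an ADC with 'bits' of
   resolution (ADC = Vin * 2^bits / Vref). */
double adc_to_amps(unsigned counts, unsigned bits,
                   double v_fs, double gain, double r_shunt)
{
    double full_scale = (double)(1u << bits);   /* 1024 for 10 bits */
    double v_adc = counts * v_fs / full_scale;
    return v_adc / (gain * r_shunt);
}

/* Power dissipated in the shunt: P = I^2 * R. */
double shunt_power(double i, double r_shunt)
{
    return i * i * r_shunt;
}
```

With those values, a mid-scale reading of 512 corresponds to 10A, and the shunt is then dissipating 1W.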

Typically the FETs will have about 10 mOhm drain-source resistance when turned on. This RDSon accounts for most of the power dissipated in the FET. Remember that the conduction loss in the FET is the product of the square of the drain current Id and RDSon.

For example, 20A passing through 10 mOhm will dissipate 4W in that FET, but 50A will dissipate 25W, plus another 25W in the FET that forms the other active device in the H-bridge. The heatsink must be capable of removing this heat without the die of the FET overheating. A heatsink of about 3 degrees C per watt will be needed to safely handle these levels of heat dissipation.
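These numbers can be turned into a quick thermal check. The heatsink temperature depends on the total power flowing into that heatsink. Each junction then sits above the heatsink by its own power times the junction-to-case and case-to-heatsink (insulating pad) thermal resistances. This sketch assumes 25W into one 3°C/W heatsink, and placeholder values of 0.5°C/W each for junction-to-case and the pad. Take real values from the FET datasheet and the pad's specification.

```c
/* Conduction loss in one FET: P = Id^2 * RDSon. */
double conduction_loss(double id, double rds_on)
{
    return id * id * rds_on;
}

/* Heatsink temperature for the total power from all devices mounted
   on it: Ts = Ta + Ptotal * Rth_sa. */
double heatsink_temp(double t_ambient, double total_power, double rth_sa)
{
    return t_ambient + total_power * rth_sa;
}

/* Junction temperature of one device on that heatsink:
   Tj = Ts + P * (Rth_jc + Rth_cs). */
double junction_temp(double t_sink, double power,
                     double rth_jc, double rth_cs)
{
    return t_sink + power * (rth_jc + rth_cs);
}
```

With a 25°C ambient, 25W into a 3°C/W heatsink puts the heatsink at about 100°C. The junction would then be around 125°C under the placeholder thermal resistances.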

This one from Farnell (below) is fairly economical, and both the upper and lower FETs can be bolted to it - provided that they are insulated from each other and from the heatsink. The thermal performance is 2.6C per Watt provided the heatsink is placed vertically.

The heatsinks are quite chunky, and I have had to exploit the 17.02mm wide channel for the placement of the gate drive and current sensing components. Placing the two mosfets back to back can make for a reasonably compact arrangement.

Heatsink Farnell 1699368 has 2.6 C/W

Putting it into Practice

First draft pcb layout in eagleCAD. Size is 50mm x 50mm

The circuit at the top of this post has now been laid out on a 50mm x 50mm 2 layer pcb. Special attention was paid to maximising the copper areas that carry the high currents. These were laid out as polygons rather than traditional tracks. There are partial ground planes on both the top and bottom layers and these are "stitched" together with many vias which help carry the large currents - and in the case of the voltage regulator help dissipate the heat from the topside to the bottom.

The layout proved to be quite tight, and a dual op-amp such as an LM358 might be preferable to the quad LM324 making a bit more room in the centre of the pcb.

Two heatsinks like the one shown in the drawing above are fitted. The heatsinks overhang the top and bottom edges of the pcb by about 10mm.

The outputs of the inverter (to the 24V-230V transformer) are on the left. The battery or solar panel inputs are on the right.

A 10 way connector, PL1 allows the "Arduino" to be connected.

The connector in the centre of the right hand side is to accept an external capacitor C1, as only 1000uF at 50V will fit on the board.

As the driver chip runs on 12V, a 78M12 regulator in a TO252 package is included. The maximum input voltage for this regulator is 35V.

Today was a day in the lab to try out the various H-bridge modules I have access to, plus the toroidal transformers, which are more efficient than the E-core types that Trystan and I were using last week.

Low Cost IBT-2 Modules

Some months ago, I bought a pair of the IBT-2 H-bridge modules from an ebay seller. These are advertised as 43A and 24V. They use the now obsolete Infineon BTS7960 half H-bridge IC, that I discussed in Part 3.

The IBT-2 Module uses the Infineon BTS7960 half-bridge module

These boards are 50mm x 50mm and the holes (3.2mm) are on 45mm centres - this is an ideal size to take advantage of the low cost 5 x 5 cm pcb services being offered. I paid about £7 each for these - including free shipping from China! There are several variants of this module around, as it has been widely copied in China by various manufacturers.

It uses two of the Infineon BTS7960 half bridge modules, and a 74HC244D buffer-driver IC to provide some isolation between the "Arduino" and the H-bridge.

Not entirely visible is the anodised aluminium heatsink that is screwed to the underside of this module - to take the heat from the BTS7960 devices.

The BTS7960 is quite easy to drive - it has a single PWM pin and an /INHibit pin (active low). Only 4 port lines are required from the Arduino to drive this module.

At first I was getting some very unusual waveforms from the output tabs of the ICs. I soon found out that they are limited to a 25kHz pwm signal - and I was using 62.5kHz. I made a quick change to the PWM timer control register to reduce the PWM frequency to 7.8125kHz, and all was well. I was getting good clean sinusoidal signals from the IC tabs, with the scope set to its 1kHz low pass filter mode.

Connecting the Transformer

I happen to work for a company that uses a lot of toroidal transformers of various VA capacities. Today I selected our smallest and cheapest - which is 120VA with a nominal 24V rms secondary and 5A current.

Our standard 120VA toroidal transformer has a nominal 24Vrms secondary winding @ 5A

The transformer is intended for both 115V and 230V operation - so it is a case of connecting the split primary windings in series in order to get 230Vrms output. If you get them the wrong way round, the phases will cancel out and you will see no output. If you want 115V, you have to connect the primaries the correct way around in parallel - otherwise you will short them out - which is bad :-(

In initial tests, I found that the "Magnetising Current" for this particular transformer was 0.22A from a 24V dc supply. So the inverter burns 5.28W when idle - before an ac load is attached.

The BTN8960 H-Bridge

My company makes products that use brushed dc motor drives of between 100W and 600W.
Earlier this year I designed an experimental board to use the newer Infineon BTN8960 half-H driver IC. This replaces the BTS7960, which is now end of life and becoming harder to find.

On the left is an "Arduino" providing the pwm drive signals to the BTN8960 H-bridge board on the right.

The "Arduino" board is an ATmega328P-AU with 16MHz crystal, reset circuit and FTDI connection. Most of the I/O is brought out to connectors for easy plugging, and Timer 2 is used to generate the 50Hz complementary pwm sine waveforms.

The board on the right is an experimental motor drive board I cooked up earlier this year to evaluate the BTN8960 devices for dc motor control. The BTN8960 devices, IC1, IC2 are located in the middle of the upper and lower edges of the pcb - with the large dc-link capacitor located between them. A thermistor (thin yellow wires) allows the temperature of the upper BTN8960 to be measured. This board has no external heatsinking - but relies heavily on large "flag" copper areas both on the upper and lower surfaces of the pcb to dissipate the heat.

The 24V dc input to the board is on the lower left, and the output to the toroidal transformer is on the right edge of the pcb. The orange device is a relay which allows the toroid to be disconnected from the H-bridge. The board also includes an LM2576 5V "simple switcher" voltage regulator for powering the microcontroller.

Here we see the full set-up.

The Driver Board, Toroidal Transformer and switched socket outlet complete the prototype Inverter.

So that was the state of play at 6pm this evening. I had the opportunity to connect my Weller soldering iron station to the output of the transformer. Off load, the mains ac output was 240Vrms, dropping to 238Vrms when the soldering iron was plugged in. Powering the iron drew 1.15A from the 24V dc bench supply.

More Testing Tomorrow.

Today I was lacking in suitable 230V loads to try. Tomorrow I have a bunch of 240V 60W incandescent lightbulbs to try out and slowly push this inverter up in power output to characterise its performance and efficiency.

130 VA Toroidal Transformer Inverter using BTN8960 H-Bridge ICs

It's been a frenetic week of progress on the Open Inverter project. Both Trystan and I have managed to cobble together inverters from FETs, H-bridge ICs and other random, readily available modules - bought cheaply from China.

I have powered up the inverter with a DC input of 24V at close to 5A and successfully powered a couple of 60W 230V lightbulbs. The efficiency looks promising, with neither the H-bridge nor the transformer getting anything more than warm.

In this post, I pause for thought to decide upon the future direction of the project based on our findings, so far in this week of discovery.

Here's the current wish-list:

1. An Open Source Inverter, of modular construction that is scalable in blocks of 125W or 250W.
2. Built from readily available, understandable, low cost electronics.
3. Rugged, robust, reliable - delivering reasonable efficiency and power quality.
4. Grid synchronisable - if required - will synchronise to external source.
5. Built in power monitoring, with wireless communications compatible with emonCMS monitoring
6. "Arduino" or similar microcontroller for hackability.
7. Supports a variety of power conversion topologies, including boost, buck, peak power tracking and split-pi.
8. Uses include micro-solar, LiFePO4 battery charging, dc ring main schemes etc.
9. Under $20 for primary building block.
10. Easy to build, easy to repair, extendable, hackable.

Expanding on some of the above points.

The proposed inverter will be built from low cost modules that can be plugged together as required, depending on the application.

The basic design will consist of a microcontroller, one or more H-bridge power boards and a 125VA or 250VA toroidal transformer.

Choice of Microcontroller.

The microcontroller could be an Arduino or derivative, or one of the very low cost, easily breadboarded STM32F103 ARM boards - based on the Maple Mini - which can be programmed with Arduino code using STM32Duino.

The breadboardable Baite STM32F103 "Maple Mini"

The advantage of using the STM32F103 is that it has more I/O and Flash/RAM than the standard Arduino and runs at about 5 times the speed. It has more versatile and numerous pwm outputs and higher resolution AD converters. Remarkably these little boards are available very cheaply (<£5) from numerous vendors on ebay.

Using STM32Duino, these boards can be programmed from the Arduino IDE, using an additional board file which caters for the STM32Fxxx range of ARM devices. This allows sketches developed for Arduino to be readily converted to run on the much higher performance STM32 range.

Choice of H-Bridge.

This handy power board is one I designed earlier in the year to drive a 100W DC Motor, it uses 2 x BTN8960 ICs

The H-bridge board can either be based on standard N-channel FETs with driver ICs, or use the more sophisticated BTN8960 H-bridge ICs.

The FET solution may be more hackable and applicable to other projects, but the H-bridge based on the BTN8960 is a quick and cheap solution. I had already designed a power board to drive a low voltage dc motor, and it was very easy to adapt it to drive the secondary of the 120VA toroidal transformer, using an Arduino-like mcu to generate the 50Hz sine waves.

I am currently testing both solutions so that I can give more informed advice based on my findings.

The BTN8960 power board runs very efficiently with a 120W load. However, the IC is limited to 25kHz switching frequency - and this is not an easy frequency to create using the fast pwm options on the standard Arduino - short of running it on an 8MHz or 12MHz crystal.

The tabs of the H-bridge ICs are soldered to large copper areas - approximately 30 square cm each. This allows them to run cool even at 120W power - without additional heatsinking.

Choice of Toroidal Transformer.

This can be any toroidal transformer in the 50VA to 250VA range - with the secondary voltage chosen to be approximately that expected from the solar panel, or the battery. For convenience I use a 130VA toroid with a 24V secondary.

The toroidal transformer does two things for us, it conveniently steps-up the output voltage from the H-bridge to that of the ac mains, and effectively isolates the low voltage power electronics from the high voltage mains - so that high voltages are contained in the transformer, and not on the H-bridge board - making for a much safer project.

The toroidal transformer is also fairly efficient at converting the low voltage to mains - depending on its VA rating about 91 - 95% is typical.

In the UK, a suitable Vigortronix 120VA transformer (0-12V, 0-12V), stocked by Rapid Electronics as 88-3814, can be had for £15 or less on ebay.

Putting it in a box.

As the inverter has mains voltages present, it is recommended that it is put into a plastic or metal enclosure.

The largest, heaviest component is the toroidal transformer, which should be securely mounted to the case. The 120VA inverter should fit in a case of about 100 x 160 x 50mm, and some of the extrusions used for Eurocard-sized pcbs could be used to advantage. The Vero 14-1003 or the Hammond aluminium extrusion cases (Rapid 30-1574 or 30-1535), at around 105 x 165 x 60mm, would be ideal.

Using components sourced from the UK, a DIY 120VA inverter in a case could be made for about £50.

A few points on efficiency.

The losses in a toroidal transformer are the sum of the Iron Loss and the Copper Loss. The iron loss is the power needed to set up the magnetic field in the core, which is lost as hysteresis and eddy currents; it is supplied by the magnetising current. For a 230V 120VA transformer, this magnetising current is about 9mA and the iron loss is 0.98W. The iron loss remains constant at all loads.

The copper loss is the sum of the I2R losses in both the primary and secondary.

For a 2 x 12V 120VA transformer running with 5A secondary current and 0.5A primary current

Secondary loss = (5 x 5 x 0.24) = 6W

Primary loss = (0.5 x 0.5 x 14.6) = 3.65W

The copper losses will rise with temperature because of the increase in the resistance of the windings with temperature.

In total there will be about 11W lost in the transformer when working at its rated power.

Losses in the FETs.

These are outlined in the BTN8960 datasheet. At 25C, the total path resistance is 14.2 milliohms. With a 5A drain current, the I2R losses in the FETs are (5 x 5 x 0.0142) = 0.355W. There will also be some switching losses, plus the power needed for the remainder of the circuit.

I measured the input current to the H-bridge board, which also powers the Arduino, a relay and a 24V to 5V "simple switcher" voltage regulator. When not driving the transformer, the circuit consumed 50mA at 24.25V - a loss of 1.21W.

With no load on the transformer primary, the current into the inverter was 0.23A. This puts the no load driver losses as (24.25 x 0.23) = 5.58W.

Adding all the losses, the best estimate (unconfirmed) for system losses is

No load losses 5.58 W
FET I2R losses 0.355 W
Iron losses 0.98 W
Copper Losses 9.65 W

Total 16.565 W

Regarding overall efficiency

Inverter Efficiency (120-16.565)/120 = 86.2%.

Room for Improvement

The proposed micro-inverter is intended to be used when there is no alternative to using ac mains - for certain low wattage devices. The second part of this proposal is to use dc for direct charging of consumer electronics and mobile computing devices.

By increasing the system voltage to, say, 48V, the current switched by the FETs is halved, and so the I2R FET losses can be quartered; however, a transformer with a 48V secondary will have more winding resistance.

The losses in the toroidal transformer are more or less fixed for a given core size. However, by using a larger core than is actually needed, it will have lower resistance windings in both the primary and secondary - so the copper losses will be reduced, but the iron losses will rise a little.

For example using a 250VA toroid - but running it at 5A

Secondary loss = (5 x 5 x 0.08) = 2W (previously 6W)

Primary loss = (0.5 x 0.5 x 6.1) = 1.525W (previously 3.65W)

Iron Loss = 1.62W (previously 0.98W)

No load losses 5.58 W
FET I2R losses 0.355 W
Iron losses 1.62 W
Copper Losses 3.525 W

Total 11.08 W

Inverter Efficiency (120-11.08)/120 = 90.77 %.

So a half loaded 250VA transformer will run cooler, with about half the transformer losses of the fully loaded 120VA toroid (about 5.1W against 10.6W). This raises the overall efficiency of the inverter from about 86% to about 91%. It also allows for some extra capacity when other loads are switched in, and the voltage droop under load will be less.

In May 2013, I learned about Ward Cunningham's Txtzyme language - a very simple serial command interpreter which allowed basic control of microcontroller function using a serial interface.

I wrote about it here:

Txtzyme - A minimal interpretive language and thoughts on simple extensions

Txtzyme was about as easy as a "language" could get: it offered digital input and output, analogue input, rudimentary timing in µs and ms, and simple repetitive loops. It was so simple, and offered so many opportunities for extension, that I decided to write some new functions - and called the extended language SIMPL - serial interpreted minimal programming language.

In late May 2013, I described the extensions here:

SIMPL - A simple programming language based on Txtzyme

In the last 30 months I have ported SIMPL to Arduino, ARM and FPGA soft core processors. I have also used the Txtzyme interpreter to help to create assembly language for an entirely new soft core FPGA cpu.

Very often, during initial development, we need a low level human interaction with a new microcontroller, and SIMPL seems to provide the coding framework that allows more complicated applications to be tested and coded.

SIMPL is coded in C - which allows good portability between micros, but recently I have been wondering whether SIMPL would be better coded in Forth, and act as a human interface layer between a Forth virtual machine and the human programmer.

Forth can be a difficult language to master - partly because it involves considerable manipulation of the data stack, and the need to keep track of the stack presents quite a challenge to the new programmer.

Standard Forth consists of a vocabulary of about 180 compound words, which can be entirely synthesised from about 30 primitives. When ported to an FPGA soft core CPU optimised for Forth, most of those 30 primitives are actual native instructions. That makes it fast in execution, but still not the easiest of languages to grasp.

Can we use SIMPL to access a sufficient subset of the Forth vocabulary to allow real programs to be written, but without having to tie our brains in knots over keeping track of the stack?

The beauty of Forth is that you can compile a new word and give it any name you wish. In SIMPL terms, this name would be a single alpha or punctuation character. Lower case letters are reserved for the keywords, whilst capital letters can be used for user defined words.

This gives us access to

26 lower case primitives
26 Upper case User words
33 punctuation characters

This gives us a subset of 85 possible words - which has reduced the scope and complexity of the standard Forth language by a factor of 2.

Forth text entry consists solely of words and whitespace. This is intentional because it makes it more readable from a human point of view, and the spaces between words allows the compiler to know where one word ends and the next begins. The carriage return is usually used to denote the end of a line, and thus signal the compiler to start compilation of the contents of the input buffer.

SIMPL borrows from these ideas, but attempts to simplify by removing the whitespace. In fact a space character (ascii 32) may have its own unique meaning to the SIMPL interpreter.

Numbers also present a burden to the traditional Forth interpreter - which has to search the dictionary only to find that a number is not listed, and then assume that it is a number. In SIMPL we assume that numbers are all decimal unless prefixed with a character that changes the base to hexadecimal.

As the compiler is only dealing with 85 possible input characters, the dictionary is simplified to an address lookup. For example if the interpreter encounters the character J, it looks up the address associated with the function called J, executes that code and then returns so as to interpret the next character in the buffer.

There are only 85 characters to interpret - so only 85 action routines to execute.

Here are some of the primitives

+ ADD
- SUB
* MUL
/ DIV

& AND
| OR
^ XOR
~ INV

@ FETCH
! STORE

< LT
= EQ
> GT

j JMP
z JPZ
: CALL
; RET

JMP3JPZ3CAL4RET3LT 2EQ 2GT 2MOD3DEC3HEX3PUS4R@ 3POP3

# LIT
% MOD
£ DEC
$ HEX

( Push R >R
\ R@
) Pop R R>
[
]
{
}
"
'

? Query
_ Text String

a ALLOT
b BEGIN
c CONSTANT
d DO
e ELSE
f FOR
g
h
i IF
k
l LOOP
m
n
o OVER
p PRINT
q
r REPEAT
s SWAP
t THEN
u UNTIL
v VARIABLE
w WHILE
x
y

Recently I read Chuck Moore's paper on Problem Oriented Languages and it inspired me to take a fresh minimalist approach to how I view my engineering design tools.

As an electronic engineer, I use various computer tools on a daily basis to help me design electronic products. These include:

Circuit Simulation
Schematic Capture
PCB Layout
2D/3D Mechanical CAD
3D printer Applications

Most of these packages have their roots in the early 1980s, if not earlier, and over the intervening decades have grown to match the available growth in computing power. Unfortunately, this has resulted in significant bloatware, and so these tools are really not much better than they were 30 years ago.

So this got me thinking about what the common features of these packages are, and whether they could be rewritten for minimum bloat, so that they can run with acceptable performance on low specification machines.

So after some investigation I found a few open-source EDA (electronic design automation) tools, including KiCAD and gEDA.

At the heart of all these tools is a graphics drawing library, with line, polygon and other primitive drawing functions. These objects are represented in memory by their vertex coordinates and other parameters, such as colour and layer. Objects, once defined in memory, may then be assigned to a group, allowing them to be manipulated as a single object - eg a pcb layout footprint of an IC package. Finally there is a routine to manipulate the position of these objects, giving the means to move them around the screen.

What may appear to be a very sophisticated package has as its basis three simple programs, and these programs could be shared across all the tools I have listed above.

Whilst a pcb layout package is often thought of as a simple 2D design tool, pcbs are built as a series of stacked layers, and so the package has the ability to create these various layers and manipulate them as a stack. Similarly, a 2D drawing package can be extended into 3D by extruding 2D shapes along the z-axis. What is a 3D printed part, other than a whole series of 2D slices, stacked layer upon layer?

If you examine the way in which ICs are designed, they are also a series of 2D features, built into multiple layers of silicon and metalisation, and created by selective exposure of photo-resist by the use of photographic masks. In the same way that we design pcbs with layers of copper, ICs are designed with tools that lay down transistor structures in the silicon and connect them with metal interconnect layers.

This is a large oversimplification, but the point I am making is that the tools have very similar foundations, and as such could be written in a way that purposefully makes use of this common codebase. Once you have established a common graphical object representation and manipulation structure, the user interface and the file interchange format could also be standardised. In this way, possibly about 75% of the tool is common code, with an application layer built on top. The application layer would customise the tool for a specific task, such as schematic capture, pcb layout or 3D modelling, and as such would export the design data in an industry standard format - such as netlist, Gerber or G-code.

So a complete tool box of design tools could be written with a common open-source core, and standard open source file exchange formats.

The common core could be written to be fast and efficient, even on small resource limited systems such as the Raspberry Pi - so that for minimum outlay, youngsters could have their own open-source CAD workstation.

Harking back, in the very early 1990s I did schematic capture and pcb layout design on a 25MHz i486 platform - a Pi-2 should be approximately 100 times faster than that! (based on figures from Roy Longbottom's PC Benchmark Site).

With open source CAD tools running on a low cost platform, engineering CAD would be opened up to a much wider population, including those that arguably need it most, such as developing countries. They have so far been denied access to modern CAD tools on the grounds of cost, yet their economies would benefit so much from access to modern manufacturing methods such as community 3D printing and low cost pcb manufacture.

Conversely, the graphics core software could be ported to run (as OpenCL) on the ultra-quick GPU of a modern gaming platform - and allow blistering performance when rendering models in 3D.

This post may have described some Utopian outcome, but all the building blocks for good integrated open source CAD tools are in place. It just needs a group of collaborative software developers to make it happen.
