Channel: Sustainable Suburbia

Autumn Almanac

Early morning in South Snowdonia 2nd November 2015
Part 1 of a series of posts describing open source activities in the Autumn quarter of 2015.

This last week I have been travelling my "Autumn Tour" which incorporated a trip to the Isle of Man to see family and friends, then on to Liverpool for Oggcamp 2015, and then to Snowdonia, to spend a few days at the Open Energy Monitor premises - for further development of the Open Inverter. Finally home by way of Birmingham's NEC to spend a few hours at the Advanced Engineering Exhibition and trade show.

Unfortunately because of the timing of the Isle of Man ferry service, I missed a big chunk of Oggcamp on the Saturday, but made up for it on Saturday evening having a good chance to talk to a range of people at the excellent Halloween Night social gathering.

Sailing into the sunset-en route to the Isle of Man 28th Oct 2015

On Sunday, I helped out at the "Hardware Hacking" workshop - and engaged some local youngsters with some simple hardware hacks - where we added some extra LEDs to some "blaster guns" bought from the local Poundshop.

After leaving Oggcamp 15, it was a quick buzz down the Wirral's M53 and along the North Wales coast, where I encountered the weather extremes of a beautiful sunset - and also "peasoup" fog  - on what was the warmest November day on record - in mid Wales.

Meeting up with Trystan Lea in Llanberis, we had a chat over a beer about open source design tools, before attending a meeting about an innovative project to restore a former chapel in Caernarfon and convert it into a community workspace.
In the hills above Ysbyty Ifan heading for the A5

Returning that evening with Trystan to his family home in south Snowdonia, we had 3 days ahead of open source hardware and software hacking - and time to push forward with the Open Inverter project.

The Bothy - with potential micro-hydro stream in the foreground
The Open Inverter forms just one small part of a small scale renewable energy system.

Some of the other things that are under consideration in the wider project are :

1. A low voltage DC "ring main" with programmable charger outlets for powering portable devices.

2. A battery storage system based on Li Ion electric bike batteries.

3. Simple, practical methods of mounting a 250W pv panel on a variety of properties.

4. A motorised solar tracker-mount for a single 250W panel.

5. A low water volume shower (35 litres), with direct heating as a solar dump load.

6. Integrated energy monitoring of micro-solar systems within the domestic environment

7. A micro-hydro system used to extract dc power from a nearby stream.


In the last month I have experimented with BTN8960 half-H-Bridge ICs and more traditional H-bridges using off the shelf N-FETs and driver ICs.  After my Snowdon trip, we hope to finalise the design of a simple, low cost H-bridge pcb and stackable microcontroller pcb - so that anyone can experiment with the open-inverter design.

This work is ongoing and we hope to have a few prototype pcbs ready for the next Snowdonia meetup - that is planned for the weekend before Christmas.

Looking forwards to 2016, there is a meetup at the British Computer Society premises (just off the Strand) as part of the Open Source Hardware User Group (OSHUG) "Open Energy Interoperability" hacking workshop planned for Thursday 18th February.  This will be an all day workshop event with lightning talks, workshop sessions and an OSHUG meeting in the evening to disseminate the day's proceedings.






Open Inverter - Part 7 - In search of Sine Waves

In Search of Sine Waves


A high quality sine wave, synthesised using an Arduino


The ability to generate a high quality sinusoidal waveform at a specific frequency and amplitude is at the heart of the Open Inverter project.

Most microcontrollers these days have on-chip timers that can synthesise these waveforms  from a look-up table stored in flash memory.
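
As a rough illustration of the look-up table approach, the sketch below steps a 16-bit phase accumulator through a 32 entry sine table and returns an 8-bit PWM duty value for each PWM period. The table size and the names sine_table, phase_step and next_duty are my own assumptions for illustration - this is not the code used in the Open Inverter project.

```c
#include <stdint.h>

#define TABLE_SIZE 32u

/* One full cycle of 128 + 127*sin(x), rounded to 8 bits. */
static const uint8_t sine_table[TABLE_SIZE] = {
    128, 153, 177, 199, 218, 234, 245, 253,
    255, 253, 245, 234, 218, 199, 177, 153,
    128, 103,  79,  57,  38,  22,  11,   3,
      1,   3,  11,  22,  38,  57,  79, 103
};

/* Phase increment per PWM period for a given output frequency.
   Returns 0 if the PWM frequency is given as zero. */
static uint16_t phase_step(uint32_t f_out_hz, uint32_t f_pwm_hz)
{
    if (f_pwm_hz == 0)
        return 0;
    return (uint16_t)((65536ull * f_out_hz + f_pwm_hz / 2) / f_pwm_hz);
}

/* Returns the duty value for the current phase, then advances the phase.
   The top 5 bits of the 16-bit phase select one of the 32 table entries. */
static uint8_t next_duty(uint16_t *phase, uint16_t step)
{
    uint8_t duty = sine_table[*phase >> 11];
    *phase = (uint16_t)(*phase + step);
    return duty;
}
```

With a 15.625 kHz PWM clock and a 50Hz output, phase_step(50, 15625) gives 210, so the table is traversed about once every 312 PWM periods. On a real target, next_duty would typically be called from the timer interrupt and its result written to the PWM compare register.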

The early developments of sinusoidal waveform generation were done on Arduino using "Fast PWM", but the pwm control registers could only produce a limited number of PWM clock frequencies:

62.5 kHz
31.25 kHz
7812.5 Hz


My BTN8960 ICs really needed something above the audio threshold frequency of 16kHz but below 25kHz.  My initial experiments were done at 7800Hz - but this produces a painful howl from the transformer windings, and is really not ideal for transformer efficiency.  With the Arduino I had a couple of options left - reduce the crystal frequency to either 4MHz, 8MHz or 12MHz giving me access to 15.625kHz or 23.4375kHz, or write some custom code.  These will be tried at some point when I have the right crystals available.

At the latest open inverter workshop, we reconfigured the Arduino clock to 8MHz allowing us a 15.625 kHz pwm clocked sinewave output. This worked well with the BTN8960 driver ICs.

An ARM Solution

In the meantime, I was keen to press on and get some sinusoid generation code at 25kHz running on at least one of my available STM32Fxxx dev-boards.

At the latest open-inverter hack session held in Snowdonia in early November, I managed to get the STM32F103 sine wave generation code running, and successfully drove the inverter using the BTN8960 half H-bridge ICs.

I must point out at this stage that I have been actively designing pcbs for the STM32Fxxx range of ARM microcontrollers for the last 2 years, and I have several designs that I could adapt to the purpose. The cheapest is the $5 Maple Mini clone from Baite Electronics which uses an STM32F103, but I also have boards with STM32F373, STM32F407 and STM32F746 at hand.

Of all of these, the STM32F373 is probably the best suited, as it has 3 ADCs with 16 bit resolution, and a lot of useful analogue peripherals - ideal for monitoring a 3-phase inverter.

However, there is quite a following behind the cheaper STM32F103 boards - so I think this is where I will start.

It is hoped that there will soon be some modular pcbs available  allowing either a dual BTN8960 power board or conventional FET power board to be stacked with  AVR and ARM microcontroller boards.

Watch this space.

The Subtle Art of Substitution - Part 1

A simple text interpreter that allows code to be invoked by natural language words.     Part 1.

Over the weekend and in various bits of spare time I have been developing a tiny text interpreter in C, as part of the larger project of creating some low-overhead tools to run on various microcontroller targets.  The toolset will eventually include assemblers and compilers for some custom soft-core processors - but first I need the means to interpret typed text words and execute blocks of code if the word is recognised.

Why this is useful

This text interpreter is intended to provide a more human friendly interface layer to my SIMPL interactive programming language.  Writing in high level, more meaningful natural language will greatly enhance the speed at which SIMPL code can be generated.

A natural language interface makes programming tasks much easier.  Real words are more memorable than individual ascii characters, and it all makes for more readable code listings. Whilst SIMPL might use lower case "a" to initiate the analog read function, typing "analog" is a lot more reader friendly. An interpreter that follows a few simple parsing rules can offer a much increased speed of programming, yet be modest in the amount of on-chip resources utilised to do this.  The code to implement the interpreter is about a 2K to 3K overhead on top of SIMPL - but that will include listing, editing and file handling utilities too.

Substitution and Assemblers

A text interpreter and its ability to execute blocks of code based on parsing the text commands or file it receives is a fundamental part of utility programmes such as assemblers and compilers. Here a set of keyword mnemonics representing instructions can be interpreted and used to assemble machine code instructions by direct substitution.

With a simple text interpreter we can move out from the realms of numerical machine language, and implement the likes of assemblers, disassemblers and even compilers.

In the case of an assembler, the wordset will consist of the mnemonics used by the target processor - and the interpreter will merely replace each human readable mnemonic with the numerical opcode of the machine instruction.

For example, a certain processor may have an instruction set including mnemonics such as ADD, AND, SUB, XOR etc. The role of the text interpreter is to find these words within the text buffer or input file and create a table consisting of direct machine code instructions, subroutine call addresses and other variables to be passed via the stack to those subroutines.
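
The substitution step can be sketched in a few lines of C. The opcode values below are made up purely for illustration (they do not belong to any real processor), and the names op_table, assemble_word and assemble_line are my own assumptions.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcodes for illustration only - not a real instruction set. */
typedef struct {
    const char *mnemonic;
    uint16_t    opcode;
} op_entry;

static const op_entry op_table[] = {
    { "ADD", 0x0001 },
    { "AND", 0x0002 },
    { "SUB", 0x0003 },
    { "XOR", 0x0004 },
};

#define OP_COUNT (sizeof op_table / sizeof op_table[0])

/* Returns 1 and writes the opcode if the mnemonic is known, else 0. */
static int assemble_word(const char *word, uint16_t *opcode_out)
{
    for (size_t i = 0; i < OP_COUNT; i++) {
        if (strcmp(word, op_table[i].mnemonic) == 0) {
            *opcode_out = op_table[i].opcode;
            return 1;
        }
    }
    return 0;
}

/* Assembles a whitespace separated line (modified in place by strtok).
   Returns the number of opcodes written, or -1 on an unknown mnemonic
   or a full output buffer. */
static int assemble_line(char *line, uint16_t *out, int max_out)
{
    int n = 0;
    for (char *w = strtok(line, " \t\r\n"); w != NULL;
         w = strtok(NULL, " \t\r\n")) {
        if (n >= max_out || !assemble_word(w, &out[n]))
            return -1;
        n++;
    }
    return n;
}
```

A real assembler would also have to deal with operands, labels and subroutine addresses, which this sketch leaves out.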

At a level above the assembler is the compiler.  This also takes text based input and generates machine code to run on a specific processor.  However, compilers are very complex pieces of software, and it is more likely that I will find an alternative solution  - given long enough.


Why do this?

The purpose of the text interpreter is to provide a natural language text interface for a small, resource limited microcontroller - in a similar style to what was provided with the various BASICs of the late 1970s. It's remarkable to think that some fully functioning BASICs fitted into 4K ROM and 1K of RAM - solely by some very clever programming tricks - in raw assembly language.

Fortunately most embedded programming these days does not have to resort to raw assembler, and C has become the preferred interchange language layer for portability.  C code written for an Arduino may be fairly easily ported to an ARM - provided that low level I/O routines - such as getchar and putchar - are available for the target processor.

Coding up a text interpreter is a good exercise in C coding - and as I am not a natural born coder, any meaningful coding exercise is good practice.  I also enjoy the challenge of making something work from scratch block by block - rather than being over reliant on other people's code, that I don't even pretend to understand.


Requirements

As a bare minimum, we assume that the microcontroller target can provide a serial UART interface for communicating with a terminal program. I have recoded Serial.print and its associated functions to use a much smaller routine - which saves memory space.

Ideally the microcontroller should have at least a couple of kilo-bytes of RAM for holding the dictionary and headers making it possible to implement it on anything from an Arduino upwards.

The text interpreter is an extension of the SIMPL interpreter, and can be used for programming tools such as text editors, assemblers, compilers and disassemblers. It provides the means to input text, analyse it for recognised keywords and build up a dictionary and jump table.

Borrowing from Forth experience, the text interpreter (or scanner) will look for a match on the first 3 characters of the input and the length of the word.  As a word is typed in, it will initiate a search of the dictionary (of already known words). If a match is found, the word will be replaced by a 4 digit (16 bit) jump address. If the word is not matched, it will be added in full to the dictionary table.

This sounds all very Forth-like, and indeed it is, because it is a proven means to input new text data into a processor's memory using a minimum of overhead. The dictionary structure is simple enough that it can easily be parsed for a word-match, and also processed for editing and printing.

As each Forth definition is effectively just a line of text it can easily be  handled with a line-editor - again a simple task for a resource limited processor.

Numbers are handled as literals. A quick scan of the text with "is_a_num" will reveal whether it is numerical text - if so it should be converted to a signed integer and put onto the stack.
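
A minimal sketch of such a number check is given here. It assumes decimal input with an optional leading minus sign and a 16-bit signed result - the signature (returning a flag and writing the value through a pointer) is my assumption, not that of the actual routine, which would put the value onto the stack.

```c
#include <stdint.h>

/* Returns 1 if word is an optionally signed decimal number that fits in
   16 bits, storing it in *value; returns 0 otherwise. */
static int is_a_num(const char *word, int16_t *value)
{
    int32_t acc = 0;
    int negative = 0;
    const char *p = word;

    if (*p == '-') {
        negative = 1;
        p++;
    }
    if (*p == '\0')
        return 0;                   /* empty text or a lone '-' */

    for (; *p != '\0'; p++) {
        if (*p < '0' || *p > '9')
            return 0;               /* not a digit, so not a number */
        acc = acc * 10 + (*p - '0');
        if (acc > 32768)
            return 0;               /* outside the 16-bit range */
    }
    if (negative)
        acc = -acc;
    if (acc > 32767)
        return 0;
    *value = (int16_t)acc;
    return 1;
}
```

Anything that fails this check would then be treated as a word and passed to the dictionary search.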

The output of the text interpreter should be a series of call addresses relating to the functional blocks of code that perform the routines associated with the keyword.  In the case of the assembler example, the mnemonics can be translated directly using a look-up table which converts them directly into the machine instruction of the target processor - this is especially relevant if the target is a stack machine - such as the J1 forth processor.

Charles Moore struck on the idea of a language that was designed for solving problems.  He envisioned having separate vocabularies for each problem he wanted to solve.  For example his assembler application would use a vocabulary tailored to that application - namely the mnemonics as keywords, similarly the SIMPL language would utilise a vocabulary that supported the SIMPL command set. Thus by pointing to a different vocabulary in flash, the processor can readily swap between contexts.


Hop, Skip and Jump - the Mechanics of Text Interpretation

In place of a flow chart, the description below sets out the operation of the text interpreter.

The text interpreter will parse through lines of text, taking each "word" as defined by a group of characters terminated by white-space, and check through a list of dictionary words for a match. If there is a match, then the newly scanned word is either a system keyword or a new one that the user has previously added to the dictionary.

If the word does not generate a match with any existing keywords then it is added to the end of the dictionary - thus allowing a match the next time it is used.

In addition to the dictionary, there is a separate array of records that will be known as the "headers". The headers consist of a shortform record of all of the words in the dictionary.  The purpose of the headers is to allow an efficient search to be performed on the dictionary entries - as words are listed in the headers by their first three characters and their length.  A match on the first 3 characters and the length was proven many years ago to be an effective and efficient means of word recognition - see section 3.2.3 here

Once the header of a scanned word has been deemed to match the header of one already in the header table, a jump address pointer can easily be calculated - it's actually generated as part of the matching routine.  This jump address pointer is decoded by a look up table to generate an actual 16-bit jump address.
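
The decode step from match index to executable code might look something like the sketch below. On a desktop compiler a table of function pointers stands in for the table of 16-bit jump addresses; the routines do_on and do_off and the name execute_match are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

typedef void (*word_fn)(void);

static int led_state = 0;    /* stand-in for real hardware state */
static int call_count = 0;

static void do_on(void)  { led_state = 1; call_count++; }
static void do_off(void) { led_state = 0; call_count++; }

/* Entry n holds the routine belonging to the n-th header. */
static const word_fn jump_table[] = { do_on, do_off };

#define JUMP_COUNT (sizeof jump_table / sizeof jump_table[0])

/* Calls the routine for a match index. Returns 1 if a routine was
   called, or 0 if the index is outside the table. */
static int execute_match(uint8_t match_address)
{
    if (match_address >= JUMP_COUNT)
        return 0;
    jump_table[match_address]();
    return 1;
}
```

Because the index fits in a single byte, the table can never need more than 256 entries.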

For compactness and efficiency, the word matching routine is limited to a maximum vocabulary of 255 words - which is more than enough for most applications.

The text interpreter deals with lines of code.  At some point there will be an accompanying package that implements a line editor, as the first step towards a full screen editor.

The input buffer of a terminal program may be some 250 to 300 characters long. This is more than adequate space to define most sequences of command words.  Indeed - it may be beneficial to restrict the input buffer to say 128 characters - as this is what can be displayed sensibly on an 800 x 600 VGA screen.


Word storage format

The shortform entries stored in the dictionary headers can be saved as a group of 4 bytes, consisting of the first 3 characters and the length byte. The routine that searches the headers for the match automatically generates the jump address pointer allowing a lookup to the actual jump address from a table.

Byte     0       1       2       3
         Char1   Char2   Char3   Len

So a word can be expanded by knowing its length and the dictionary pointer to its 1st character
The jump address is shortened to a single byte fast look-up from a table.
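
As a sketch of this storage format, the 4 byte header can be expressed as a small C struct. The names header_t, make_header and header_equal are my own, and padding short words with spaces is an assumption made for the sketch.

```c
#include <stdint.h>
#include <string.h>

/* Four byte shortform header: first three characters plus length. */
typedef struct {
    char    ch[3];
    uint8_t len;
} header_t;

/* Builds a header from a NUL terminated word. Words shorter than three
   characters are padded with spaces (a choice made for this sketch). */
static header_t make_header(const char *word)
{
    header_t h;
    size_t n = strlen(word);
    for (size_t i = 0; i < 3; i++)
        h.ch[i] = (i < n) ? word[i] : ' ';
    h.len = (uint8_t)n;
    return h;
}

/* Two headers match when all three characters and the length agree. */
static int header_equal(const header_t *a, const header_t *b)
{
    return memcmp(a->ch, b->ch, 3) == 0 && a->len == b->len;
}
```

Note that two different words with the same first three characters and the same length produce identical headers, so word names have to be chosen with that in mind.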

Implementation

It's taken a bit longer than expected, but after an intensive day thinking, re-thinking, then coding, the tiny (2K) text interpreter is now starting to take shape.

I have put an interim version (#45) on this github gist

The interpreter is written in fairly standard C so it can be ported to a number of devices.  If implemented on Arduino using the Serial.print library it uses about 4142 bytes of flash and 1897 of RAM.  By using much more efficient custom UART routines for serial input and output, this can be massively reduced to just 2002 bytes of Flash and 1710 bytes of RAM.


Part 2 of this posting will look further at features of the text interpreter and the SIMPL toolset.

A Minimal Text Interpreter - Part 2

A Text Interpreter to run on a Resource Limited Microcontroller  - Part 2

In the previous post, I described the basics of a tiny text interpreter, written in C, intended for use on resource limited microcontrollers. The text interpreter would offer a natural language user interface, allowing programming and command line control of various microcontroller projects.

It will also form the basis of a wider range of self-written computing tools, including assembler and compiler, editor and file handler - all of which could be hosted, if necessary on a resource limited target board.

However, for the moment, and for ease of experimentation, the intention was to get the interpreter to run with only 2K of RAM (as per the Arduino Uno).

I envisioned the text interpreter as being a universal resident utility programme (almost akin to a bootloader) that would be initially flashed onto the microcontroller thus allowing a serial command interface and the means to exercise the hardware or develop small interactive programmes.

At work and at home, there are many instances of when I want some degree of interactive control over a small microcontroller project - even if it is just manipulating some port lines or sending and receiving a few serial responses on a terminal programme.

Some Practical Limitations

In order to keep the demands on the interpreter program reasonable it is necessary to put some limits on its capabilities - in particular, on the number of words it can recognise and create jump addresses for. For convenience I used a look up table to hold the jump addresses.  If the look up table is to remain reasonably compact - then a limit of 256 entries seems reasonable.  Restricting the word capacity will also help keep the dictionary and its headers to a manageable size in RAM. This is important when you only have 2K to play with!

As the 4 byte header is in fact a shortform, or compact coding convenience that represents the dictionary, it could be said that in very RAM limited systems that it is not actually a requirement to keep the dictionary in RAM on chip.  The only role that the dictionary performs is to allow the header entries to be expanded to the full word at times of listing.

As small micros generally have plenty of Flash available, then the dictionary for all the standard words could be programmed into flash - as indeed could their headers.  If necessary, a shell hosted by a PC application could be used to host the various dictionaries and source code files needed for particular applications. However, the original aim is that this interpreter vastly increases the user-friendliness of the microcontroller target - even with just a serial terminal as the user interface.

Additionally, I have imposed a word length limit of 16 characters.  Imposing this limit means that the word length can be coded as a single hexadecimal digit - which makes it displayable in ascii and human readable. If you can't name something uniquely in 16 characters then you are probably of German extraction.

Vocabularies

Different tasks need different tools, and as the interpreter will be used for a variety of tasks, then it seems reasonable that it can be augmented or tailored towards a particular task. This can be done very conveniently with the use of vocabularies - with a particular vocab being used for a particular task.  A vocab that contains the mnemonics of a particular processor's instruction set would be one sensible use when using the interpreter within an assembler, compiler or disassembler.  

Forthright

Those of you that are familiar with Forth, will say that I am just creating a portable Forth-like environment, but rather than being coded in the native machine language of the target processor, it has been written in C for convenience and portability.

This is indeed partly true, as the utility I am creating has been inspired by the Forth language - especially in its compactness and low resource requirements.  Even in the 1960s Charles Moore was concerned how the tools provided for computing at that time hampered progress, and so set about redefining the whole man-machine interface. He compressed the previously separate editor, compiler, and interpreter programmes (none of which could be co-resident in memory at the same time) into a single compact, convenient package that did the job of all three.

When Forth was first introduced in the late 1960s, mini computers had sub-MHz clock frequencies, and very little RAM, and so benefited greatly from a moderately fast and compact language like Forth. Nowadays, typical microcontrollers have clock frequencies in the 20MHz to 200MHz range and so are not so hindered by what is essentially a virtual machine implementation of a stack processor written in C.

Virtual and Real Stack Machines

I have embarked on this journey because of my wider interest in soft-core and open core processors implemented on FPGAs. Several of these cores are based on stack machines, partly because they may be readily implemented in surprisingly few lines of VHDL or Verilog. Indeed James Bowman's J1 Forth processor is fully described in fewer than 200 lines of verilog.

Whilst a virtual stack machine might not be the easiest fit for a register based processor without performance penalties, it is a wonderful fit for a real stack machine.  A number of open-core processors including the ZPUino and James Bowman's J1 are true stack machines.  Here the instruction set of the virtual machine has a near one to one direct mapping to the machine instructions of the stack processor.  In this case the text interpreter can be rewritten in the native assembly language of these cpus, to benefit from the vast increase in speed of running without an additional layer of virtual machine.

In order to do this an Assembler will be required  that is tailored to the instruction set of the Forth Processor, and this is one of the first tasks that the text interpreter will be used for - the assembly of machine code for a custom processor.

One of the reasons why I am concerning myself with such low level primitive tools, is the need to understand them from the ground up so that they can be implemented on a variety of non-conventional processors.

Whilst the ZPUino will execute Arduino code directly (albeit very inefficiently - because of the C to stack machine programming conflicts), the J1 will need the tools to write its own language from the ground up - and if you already have the mechanisms of a language in place, plus an easily customisable assembler, then it makes the job a lot easier.

In a later post, I will give an update on the text interpreter and its application to custom code assemblers.





Minimal Text Interpreter - Part 3

The operation and main routines of a minimal text interpreter  - Part 3

This post is merely a description of the first implementation of the text interpreter looking at the principal routines. It's so I can remember what I did in 6 months' time.

Currently only the basics have been implemented - by way of a proof of concept, and running on a 2K RAM Arduino. Later this will be ported to various ARM Cortex parts, the FPGA - softcore ZPUino and ultimately the J1 Forth processor.

There are probably many ways in which this could be implemented - some giving even more codespace and memory efficiency.  As a rookie C programmer, I have stuck to really basic coding methods - that I understand. A more experienced programmer would probably find a neater solution using arrays, pointers and the strings library - but for the moment I have kept it simple.

The interpreter resides in a continuous while(1) loop and consists of the following routines:

txt_read  

Reads the text from the UART into a 128 character buffer using u_getchar.
Checks that the character is printable - i.e. resides between space (32) and tilde ~ (126) in the ascii table - and stores it in the buffer.
Keeps accepting text until it hits the buffer limit of 128 characters or breaks out of this if it sees a return or newline  \r or \n character.
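
The behaviour described for txt_read could be sketched as follows. To keep it runnable on a desktop, the character source is passed in as a function pointer and a string-backed fake_getchar stands in for u_getchar. Both of these, the NUL terminator (which costs one byte of the 128) and the decision to drop non-printable characters silently are my assumptions.

```c
#include <stddef.h>

#define BUF_SIZE 128

typedef int (*getc_fn)(void);

/* Host-side stand-in for u_getchar: reads from a string, -1 at the end. */
static const char *input = "";
static int fake_getchar(void)
{
    return (*input != '\0') ? (unsigned char)*input++ : -1;
}

/* Reads printable characters into buf until CR, LF, end of input or a
   full buffer. Returns the number of characters stored; buf is always
   NUL terminated when size is at least 1. */
static size_t txt_read(getc_fn get, char *buf, size_t size)
{
    size_t n = 0;
    if (size == 0)
        return 0;
    while (n + 1 < size) {
        int c = get();
        if (c < 0 || c == '\r' || c == '\n')
            break;
        if (c >= 32 && c <= 126)      /* space to tilde: printable */
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return n;
}
```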

colon_check

This checks if the text starts with a colon, and so is going to be a new colon definition.
sets flag colon=1
calls the build_buffer function

word_scan

If the leading character is not a colon, this function determines that the word is either within the body of the definition, or it is for immediate execution.  It calls build_buffer, but only builds the header to allow a word match. If it gets a match, the word is already in the dictionary and should not be added again.

build_buffer

This checks the first 3 characters of the word and puts them into a new header slot in the headers table.
It also calculates the word length by counting the characters as it stores them into the dictionary table, which it continues until it sees a terminating space character.
It increments the dictionary pointer ready for the next word

word_match

This compares the 4 bytes of the header (3 characters plus the length byte) of the newly input word with all the headers in the header table.
If all 4 bytes match then it drops out with a match_address (for the jump address look-up table) and sets a match flag match=1.
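
A linear search of this kind can be sketched as below. Returning -1 for "no match" instead of setting a separate match flag, and the helper add_header, are my own simplifications rather than the way the actual routine works.

```c
#include <stdint.h>
#include <string.h>

#define MAX_WORDS 255

static uint8_t headers[MAX_WORDS][4];   /* char1, char2, char3, len */
static uint8_t header_count = 0;

/* Returns the index of the first matching header, or -1 if none match. */
static int word_match(const uint8_t h[4])
{
    for (int i = 0; i < header_count; i++) {
        if (memcmp(headers[i], h, 4) == 0)
            return i;
    }
    return -1;
}

/* Appends a header to the table. Returns its index, or -1 if full. */
static int add_header(const uint8_t h[4])
{
    if (header_count >= MAX_WORDS)
        return -1;
    memcpy(headers[header_count], h, 4);
    return header_count++;
}
```

With at most 255 entries of 4 bytes each, a full scan touches about 1K of memory at worst, which is why a simple linear search is adequate here.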


header_list

This is a utility routine which prints out a list of all the headers in the order they are stored in the headers table.

dictionary_list

This is a utility routine which prints out a list of all the words in the dictionary in the order they were stored in the dictionary table.

txt_eval

This is the main character interpretation function which implements the SIMPL language core

is_a_word

Not yet implemented.  Returns true if it finds a word and invokes build_buffer and word_match

is_a_num

Not yet implemented.  Converts the ascii text to a signed integer and stores it in a parameter table.
Might possibly use ascii 0x7F (DEL) to signify to the header builder that the following bytes are a number.  Will need a conversion routine to go between printable and internal storage formats.

UART Routines

These provide getchar and putchar support directly to the ATmega328 UART. Saves a huge amount of codespace compared to Serial.print etc

uart_init

Initialises the ATmega328 UART to the correct baudrate and format.

u_putchar

Waits until the Tx register is empty and then transmits the next character

u_getchar

Waits until a character is present in the UART receive register and returns with it

Printing Routines

Having banished Serial.print - I had to implement some really basic functions

printnum()

Sends a 16 bit integer to the UART for serial output

printlong()

Sends a 32 bit integer to the UART for serial output
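
As an illustration of how little code such a routine needs, here is a sketch of a decimal printnum. It assumes a signed 16-bit value and takes the output routine as a function pointer so that it can be tried on a desktop - on the target this would simply be u_putchar. The capture_putchar helper exists only for testing.

```c
#include <stdint.h>

typedef void (*putc_fn)(char c);

/* Host-side stand-in for u_putchar: collects output in a small buffer. */
static char out_buf[16];
static int  out_len = 0;
static void capture_putchar(char c)
{
    if (out_len < 15)
        out_buf[out_len++] = c;
    out_buf[out_len] = '\0';
}

/* Sends a signed 16 bit integer as decimal text, one character at a time. */
static void printnum(int16_t value, putc_fn put)
{
    char digits[5];                 /* 32768 has five digits */
    int n = 0;
    uint16_t mag;

    if (value < 0) {
        put('-');
        mag = (uint16_t)(-(int32_t)value);  /* widen so -32768 is safe */
    } else {
        mag = (uint16_t)value;
    }
    do {                            /* digits come out lowest first */
        digits[n++] = (char)('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);
    while (n > 0)                   /* so send them in reverse order */
        put(digits[--n]);
}
```

A printlong along the same lines would only need a wider magnitude type and a ten character digit buffer.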




A J1 Virtual Machine - Gimme some Jips!

BOB is no slouch when it comes to simulating a virtual stack cpu!
Historical Note.

Way back in 1991 when I was half the age I am now, I did my pcb design work using OrCAD on a 25MHz 486 desktop. The picture above is of my latest experimental pcb - a breakout board for the 216MHz STM32F746 ARM Cortex M7 microcontroller.  BOB (above) can emulate a 16 bit minimal instruction set processor faster than the 25MHz '486 box - and for about $20!  Now that's progress.

Implementing a Stack Processor as a Virtual Machine

This post examines the role of a virtual machine, created to run on a given processor for the purpose of simulating another processor, for performing operations that the host processor might not readily do easily. One example was Steve Wozniak's "Sweet 16" - a 16 bit bytecode interpreter he wrote to run on the 6502, to allow the Apple II to readily perform 16 bit maths and 16 bit memory transfers.

In his closing remarks, Woz wrote:

"And as a final thought, the ultimate modification for those who do not use the 6502 processor would be to implement a version of SWEET16 for some other microprocessor design. The idea of a low level interpretive processor can be fruitfully implemented for a number of purposes, and achieves a limited sort of machine independence for the interpretive execution strings. I found this technique most useful for the implementation of much of the software of the Apple II computer. I leave it to readers to explore further possibilities for SWEET16."

The main limitation of the VM approach is that the execution speed is often one or two orders of magnitude slower than the host running native machine code, but with processors now available with clock-speeds of 200MHz - this is not so much of a problem.

It is more than offset by the ability to design a processor with an instruction set that is hand-crafted for a particular application, or the means to explore different architectures and instruction sets, and to simulate these in software, before committing to FPGA hardware.

Stack Machines

Whilst Woz's Sweet 16 was a 16 bit register based machine, I had ideas more along the lines of a stack machine, because of its simpler architecture and low hardware resource requirement.

I had become interested in an interpreted bytecode language that I believed would be a good fit for a stack machine, and so in order to get the ball rolling, I needed a virtual stack machine to try out the language.

Earlier this year, I invested in a Papilio Duo FPGA board, and with this came access to a ZPUino soft-core stack processor - devised and much enhanced from an existing design, by Alvie Lopez. The advantage of the ZPUino was that it was one of the few soft core processors that had GCC available, and so the task of porting the Arduino flavour of C++ to it was not over arduous (for those accustomed to that sort of task - not me!).

However, porting C to a stack machine is never a very successful fit - as C prefers an architecture with lots of registers - such as ARM.

As a result, the ZPUino, whilst clocked at 6 times the speed of the standard Arduino, only achieved about twice the performance when running a Dhrystone Benchmark test - written in C.  The other factor limiting  ZPUino is that it executes code from the external RAM - and there is a time overhead in fetching instructions.

Despite these limitations, the ZPUino has been a useful tool to run simulators, as it supports VGA hardware and the Adafruit Graphics library - allowing text and video output from an Arduino-like environment.

The other stack processor that caught my attention is James Bowman's J1 Forth processor.  This became available as an implementation on the Papilio Duo  in early September to run on readily available FPGA hardware at speeds of up to 180MHz. So I have been working towards trying it out - first using a software simulator.

A J1 Simulator - written in C - and tried on a number of processors.

Back in the spring, I found a bit of C code that allowed a J1 processor to be run as a virtual machine on almost any processor.

Initially, I implemented it on Arduino, but I quickly moved to the faster ZPUino - which, as stated above, is a stack based processor implemented on a FPGA.  This was a stop-gap, whilst I was waiting for James to release his J1 in a form that I could use.

The simulator is about 100 lines of standard C code, and implements a 16-bit processor with integer maths and a 64K word addressing space.

I then wrote a test routine, in J1 assembler, consisting of just  a simple loop - executing 7 instructions and incrementing (by 1) a 16-bit memory location, every time around the loop.
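
To give a flavour of this kind of test, the sketch below is a toy 16-bit stack machine with a 7 instruction loop that increments a memory cell. I must stress that the instruction encoding here is invented for illustration and is NOT the J1 instruction set, nor is this the simulator I used - the names vm_t, step, run and loop_code are all hypothetical.

```c
#include <stdint.h>

/* Invented encoding: top bit set = push 15-bit literal,
   otherwise bits 14..12 = opcode and bits 11..0 = argument. */
enum { OP_FETCH = 0, OP_STORE = 1, OP_ADD = 2, OP_JMP = 3 };

#define LIT(v)      ((uint16_t)(0x8000u | ((v) & 0x7FFFu)))
#define OP(op, arg) ((uint16_t)(((op) << 12) | ((arg) & 0x0FFFu)))

typedef struct {
    uint16_t pc;
    uint8_t  sp;
    uint16_t stack[16];
    uint16_t data[16];
} vm_t;

static void     push(vm_t *vm, uint16_t v) { vm->stack[vm->sp++ & 15] = v; }
static uint16_t pop(vm_t *vm)              { return vm->stack[--vm->sp & 15]; }

/* Executes a single instruction. */
static void step(vm_t *vm, const uint16_t *code)
{
    uint16_t insn = code[vm->pc++];
    if (insn & 0x8000u) {
        push(vm, insn & 0x7FFFu);
        return;
    }
    uint16_t arg = insn & 0x0FFFu;
    switch (insn >> 12) {
    case OP_FETCH: push(vm, vm->data[pop(vm) & 15]); break;
    case OP_STORE: { uint16_t addr = pop(vm); vm->data[addr & 15] = pop(vm); } break;
    case OP_ADD:   { uint16_t b = pop(vm); push(vm, (uint16_t)(pop(vm) + b)); } break;
    case OP_JMP:   vm->pc = arg; break;
    default:       break;
    }
}

/* Seven instructions per pass: data[0] = data[0] + 1, then loop. */
static const uint16_t loop_code[7] = {
    LIT(0), OP(OP_FETCH, 0), LIT(1), OP(OP_ADD, 0),
    LIT(0), OP(OP_STORE, 0), OP(OP_JMP, 0)
};

/* Runs a fixed number of instructions and returns how many were run. */
static uint32_t run(vm_t *vm, const uint16_t *code, uint32_t steps)
{
    for (uint32_t i = 0; i < steps; i++)
        step(vm, code);
    return steps;
}
```

Timing a call to run with the target's own clock, and dividing the instruction count by the elapsed time, gives an instructions-per-second figure of the same kind as the Jips numbers quoted here.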

Running this test code - the standard 16MHz Arduino managed  67,000 J1 instructions per second. (Jips).

I then transferred the sketch to the ZPUino, running on the Papilio Duo board.  This provides a useful boost in performance to about 152,000 Jips.

A 72MHz STM32F103 running the same code under STM32-Duino managed 404,000 Jips - about 6 times the speed of the Arduino - a healthy performance boost.

The difference in performance between the 8-bit Arduino and the 32 bit STM32F103 can be explained partly by the 4.5 times increase in clock speed, and partly by the fact that a 32 bit microcontroller can implement a 16 bit virtual machine somewhat more efficiently than an 8-bit device - giving an additional 30% boost over clock speed scaling alone.
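The arithmetic behind that comparison can be sketched as follows. The figures are the ones quoted above; the function names are just for illustration.

```c
/* Measured figures quoted in the text. */
#define ARDUINO_JIPS  67000.0
#define ARDUINO_MHZ   16.0
#define F103_JIPS     404000.0
#define F103_MHZ      72.0

/* Throughput per MHz of host clock. */
static double jips_per_mhz(double jips, double mhz)
{
    return jips / mhz;
}

/* Speed-up of host B over host A that is left once the
   difference in clock speed has been divided out. */
static double residual_speedup(double jips_a, double mhz_a,
                               double jips_b, double mhz_b)
{
    return jips_per_mhz(jips_b, mhz_b) / jips_per_mhz(jips_a, mhz_a);
}
```

Calling `residual_speedup(ARDUINO_JIPS, ARDUINO_MHZ, F103_JIPS, F103_MHZ)` gives a little over 1.3 - the STM32F103 does roughly a third more work per clock cycle than the AVR on this test.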

In addition, the test code only added one to the memory cell. If this were say adding a 16 bit value into that location - the 16 bit transfer would slow the 8-bit AVR down considerably.

I then proceeded to port the simulator to a 168MHz  STM32F407 Discovery board. The 168MHz STM32F407 returned a slightly puzzling 764,000 Jips.

Based on the increase in clock speed it should have been about  940,000 Jips. This appeared to be a bit slow.  In theory it should be running at 2.33 times the speed of the 72MHz part.  This needs further checking to ensure that it is not a compiler optimisation issue that is holding it back.

I tried again with the various optimisation levels of the  GCC compiler:

Optimisation -O0           733,333    Jips
Optimisation -O1           3,083,333 Jips
Optimisation -O2           3,333,333 Jips
Optimisation -O3           3,583,333 Jips

With only modest optimisation the '407 is returning around 3 million Jips!

Meet BOB - the fastest, newest kid on the block.

Back in the summer I made up a break out board BOB for the 216MHz STM32F746  Cortex M7 microcontroller.  Whilst ST Microelectronics had released their $50 F7 Discovery board - complete with LCD, I wanted a very simple board, with the same pin-out as the previous F4 Discovery to try out relative performance checks.

So, it's now time to port the J1 simulator onto the STM32F746 - and see how it performs.

The '746 is an M7 ARM and has a six-stage dual issue pipeline - which means that it can effectively issue two instructions per clock cycle.  This feature and the higher clock frequency gives it a 2.2 times speed advantage over the '407.

With all this working, the 746 BOB board - should be able to simulate the J1 at around 7.8 million J1 instructions per second  - welcome back to the 1980's in terms of performance!

Whilst we can emulate the J1 in C at around  8 Million Jips, the real J1 should manage nearly 200 Million Jips, so when I get real J1 hardware up and running - it should really fly!

Update

After a long day and half a night of battling with compilers, I just got the figures for the STM32F746 running the J1 interpreter at 216 MHz. Initial measurements suggest that it's running at close to 15 million Jips with minor optimisation and about 27 million Jips with the most aggressive optimisation!

Optimisation -O0        9,000,000 JIPS
Optimisation -O1        15,000,000 JIPS
Optimisation -O3        27,000,000 JIPS

Beating the Bloat

Aaaaargh.......!

This post is by way of a minor rant about the current state of the tools and methods we use to produce embedded firmware.

In order to perform the benchmark tests on the series of processors yesterday, I had to use 4 individual IDEs and spend 12 hours of my life fighting the flab of blobby bloatware that is the embodiment of the modern IDE.

My grief really started when I wanted to port the J1 simulator to the Cortex M7. For this I needed a "professional" tool chain.

The Long and Winding Road.......

In order to blink an LED on my STM32F746 breakout board, I had to install the 32K codesize limited version of Keil's uVision 5 and their ARM MDK. This takes about an hour to install and set up.

Then I had to find an example project of something that was close to what I wanted to do - i.e. blink a LED. I found their generic Blinky example - and then found that it had been tailored for a couple of commercial dev boards - and the files that set up the port allocation were locked from editing within the IDE.

So I opened the files in Notepad++, edited the dozen or so lines of code that controlled the GPIO port allocation, and then wrote my edited version in place of the original - so far, so good.

Had I known that at 6pm I was still about 2 hours away from blinking a LED, I would have probably thrown in the towel and gone to the pub.  I eventually tracked down the problem to my particular port pin being re-assigned as an input in the example code, immediately after I had set it up as an output. There was also a minor problem with the clock generation set up for the wrong PLL ratio - that prevented the code from running.

Now I have learnt that ARM processors are fairly complex beasts - and the peripherals take a fair time to set up with the myriad of different options - but when I looked at the project files to blink a LED, I saw that it was taking about 100 code modules to set up the peripherals - and some of those modules were each 1000+ lines of code.

However - as a fairly recent newcomer to the Keil compiler and the ST Microelectronics hardware abstraction layer - who was I to know which of the 100 files I needed and which I didn't?

This leads me nicely on to  Shotgun, Voodoo and Cargo Cult coding practices. I'll let the interested follow up the definitions, but the point that I am making is that the modern IDE and methods of using a hardware abstraction layer do absolutely nothing to help simplify the problem or reduce the amount of bloat that has to be compiled - regardless of whether it is being used or not.

In order to flash a LED on and off, a single bit in a register needs to change state - why then do I need to compile 10,000 lines of somebody else's code into a 9.5k byte binary, in order to make this happen?
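To make the point concrete, here is a minimal sketch of what "changing a single bit in a register" amounts to in C. The helper name `toggle_bit` is hypothetical. On real hardware the pointer would be the address of the relevant GPIO output data register, which is device specific and deliberately left out here - and the port clock and pin mode would still need to be set up first.

```c
#include <stdint.h>

/* Toggle one bit of a memory-mapped register.
   `reg` stands in for a GPIO output data register; its real address
   comes from the device's reference manual and is not given here. */
static inline void toggle_bit(volatile uint32_t *reg, unsigned bit)
{
    *reg ^= (UINT32_C(1) << bit);
}
```

Pointed at an ordinary variable, the same function can be tried on a PC - two calls with the same bit number set and then clear that bit again.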

Compilation times of over a minute really do nothing to boost one's productivity. Yet we persist with this madness making our compilation tools even more sophisticated - with the excuse that the processors that we are compiling for are getting more complex - and the commercial suppliers of compilation tools - need to be seen to be keeping ahead of the competition.

It has been ever thus for about the last 50 years or more - with the computing industry peddling us over-bloated, over-expensive tools that we neither want nor need.

HAL: Just what do you think you're doing, Dave? 

Well perhaps Dave should be asking HAL just WTF he thinks he is doing.  

And in this case, HAL is the new hardware abstraction layer - cooked up by the teams of clever code monkeys at ST Microelectronics.  I understand that as code gets more complex then it needs to be better managed, and that somewhere out there, someone writing code for a Cortex M0 may have an epiphany moment and realise that he should port his code to a Cortex M7......  

However, it appears that ST Microelectronics has employed a million monkeys with typewriters to undertake the mammoth task of writing the HAL modules - put them in separate rooms (or countries) and made it difficult for them to talk to one another.

Not surprisingly, the HAL reference manual runs to 963 pages - and took another team of our simian chums to cook that one up. This link is actually for the STM32F4xx Cortex M4 processors - because it appears that the M7 version has not been published yet.

So in reverence to the computer Holly, from Red Dwarf  - I will call this code behemoth HOL  - or the hardware obfuscation layer - as that is exactly what it does.  It makes it difficult to know what your hardware is doing, or what you need to do in order to make it work for you.

There has to be a better way - and if Carlsberg wrote compilation tool chains - they would probably be the best in the World.

OK  - time for the pub...........



NAND to Tetris - A Personal Journey



From NAND to Tetris (N2T) is the popular name for an open study Computing Science course devised by Shimon Schocken and Noam Nisan.  It is accompanied by a book, by the same authors "The Elements Of Computing Systems - Building a Modern Computer from First Principles" and a series of online and downloadable study materials.

For anyone who wishes to get a more in depth understanding of the interaction between hardware, operating system and software application layers of a modern computer, or consolidate existing knowledge - then I would highly recommend purchasing this book, following the course materials and supporting this project - as a whole.  We need a whole new generation of Computer Science and Electronic Engineers - who understand this stuff from first principles.

After first hearing about the course from contacts at the London Hackspace, I bought the book last year and I am slowly working my way through it.  By this, I mean that I am making my own personal tour of the country that it describes  - and not necessarily by the direct linear route outlined in the book.  I dip into it occasionally, rather like a travel guide, as if I were planning a trip to the next major city.  I believe that I will reach the final destination, but it will be the wealth of experience gained from meandering on the journey, rather than the final destination, that currently is my driving factor.

I embarked on the book, having spent most of my career in digital hardware design, but with very little real experience of writing software tools. Whilst I found the chapters on hardware were fairly easy to follow, I hoped that the book would lead me gently into picking up new software skills.

The first 5 chapters of the book illustrate and reinforce the principles of combinatorial and sequential digital logic, by having the student design the logic function of the various "chips" that go to make up a simple cpu.  From basic gates you build ever more complex designs to make up the arithmetic logic unit (ALU), the program counter and the various registers that make up the cpu.

A hardware design language package allows the design, simulation and testing of the various logical components and gives the student confidence that their design meets the test spec of the required item.  It soon becomes apparent that there is no one way to implement the logic of the ALU - but some ways are quicker, more flexible or have a more efficient use of silicon.

I completed the hardware design chapter exercises of the book during an intensive week of evenings in spring last year.  I then got more than a little bogged down in the software section, as I realised at the time I did not have the programming skills in any language to do justice to the demands of the software exercises - beginning at Chapter 6 "Assembler".

Rather than be defeated by a complete road-block, I have spent the last year surveying the surrounding territory for an alternative route to complete the mission.  In this, I have invested in FPGA hardware, designed pcbs for ARM processors and written simulator code for simple stack based processors.  I have now got to the point where the next logical step is to write an Assembler.

I have picked up enough C skills to put together a simple text interpreter and use it to parse through tables of mnemonics looking for a match and associating a machine instruction with that scanned mnemonic.  It is the basis of a "poor man's" assembler, but it has the flexibility to be applied to whatever novel processor's instruction set I wish to explore.  I can now go back to Chapter 6 - with my new knowledge and software tool and make new progress.

In the intervening year - and at this stage in life we view projects in terms of years of involvement - I have also learned a bit of Verilog and done a bit of FPGA logic design. These are skills I will need to develop if I am to keep up with the modern world. And whilst I may no longer be able to see (without glasses) some of the hardware I am working with, I can still type, and I have the option of increasing the font size. That should keep me viable in the workplace for the next decade or so - although I do increasingly have my "dinosaur days".

This move was partly inspired by the N2T book, and also my desire to get involved in the new wave of low cost FPGAs that have now become available to the hobbyist.  I might be so bold as to say that in 2015, they are to the enthusiast what the 6502 was in 1975, and the Arduino was in 2005.  User friendly FPGA hardware is definitely going to be a growth area for the next few years.

FPGAs allow you to design your own custom hardware, or recreate vintage or retro-hardware computers from years ago.  Soft core processors, featuring custom instruction sets are one area of involvement - and these will require software tools to simulate operation and allow code to be written.

In addition, I have moved on from being constrained by just 1 or 2 microcontrollers. I am now experiencing the portability of software written in C, and discovering how easy it can be to switch between processors - even though I have some concerns about the complexity of modern IDEs.

One of the tasks I set this year was to benchmark several microcontrollers with dhrystone and whetstone benchmarks - in an attempt to get a better understanding of how they perform under different applications.

By characterising the relative performance and resources of a few common cpus - I am now able to make informed decisions about which might be more suitable for a particular job. Currently I am impressed with  the ARM Cortex M7,  and I am eagerly awaiting 400MHz versions of this M7 core - expected in late 2016-2017.

Whilst 400MHz might appear puny to those who regularly use twin-core 1GHz parts in their mobile phone or Raspberry Pi, to them I offer the challenge of writing from scratch an Assembler!






Results of J1 Simulation

First Up  - Some Results

In the last post I looked at running the J1 simulator on various platforms from the humble 16MHz Arduino to the red-hot STM32F746 - running at 216MHz.  Here are the results of those tests in J1 instructions per second - or JIPS as I call them.

Arduino             16MHz    ATmega328                67,000 JIPS
ZPUino              96MHz    Soft CPU                   152,000 JIPS
STM32F103     72MHz   ARM Cortex M3         404,000 JIPS
STM32F407   168MHz   ARM Cortex M4    3,000,000+ JIPS *
STM32F746   216MHz   ARM Cortex M7    9,000,000+ JIPS *

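*  The last 2 results are based on low levels of compiler optimisation - in the earlier tables the '407 figure corresponds to optimisation level 1 and the '746 figure to optimisation level 0.  With more aggressive optimisation, the '746 was returning 27 million JIPS.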

So now that we have a means of simulating the processor at about 1/20th of full speed, the time has come to decide exactly how we are going to port a useful high level language onto this processor model.

James Bowman has done excellent work porting Forth onto his J1 soft core, but I am not quite ready to plunge into Forth - for me it's about the journey of exploration in reaching a high level language implementation - under my own steam.

A small revelation

At this point it is interesting to note - that if the 27 Million JIPS is indeed correct - then the 216MHz Cortex M7 core is taking about 8 clock cycles for every emulated J1 instruction - in this particular (non demanding) test program.  So it would probably be fair to say that most modern ARM processors (M7 and above) would achieve a similar decimation ratio whilst simulating the J1.

If this is the case, then a 1GHz ARM could simulate a 100MHz J1 - or put the other way, then a 100MHz J1 would have a similar overall performance to a 1GHz ARM - that was executing some sort of stack based Virtual Machine bytecode language - i.e. Java.
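The ratio is easy to check. A minimal sketch, using the figures quoted above and function names made up for the purpose:

```c
/* Host clock cycles spent per emulated J1 instruction. */
static double cycles_per_jip(double host_hz, double jips)
{
    return host_hz / jips;
}

/* Emulated J1 instruction rate that a host of a given clock speed
   would reach, assuming the same cycles-per-instruction ratio holds. */
static double emulated_rate(double host_hz, double cycles_per_insn)
{
    return host_hz / cycles_per_insn;
}
```

`cycles_per_jip(216e6, 27e6)` comes out at 8, and at that ratio `emulated_rate(1e9, 8.0)` gives 125 million J1 instructions per second for a 1GHz host - the same ballpark as the round figure of 100MHz used above. The assumption that the ratio carries over unchanged to other ARM cores is exactly that - an assumption.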

As a lot of applications are written in Java  (eg Arduino IDE), then the overhead of running a virtual stack machine on a register based cpu slows it down by a factor of 10.  If however the Java bytecode were translated into an intermediate form (possibly J1 Forth) it would likely run appreciably faster.

The point I am making is that a customised soft core stack cpu, tailored to Java bytecode and running on an FPGA, could make Java run a lot faster on less powerful hardware. Some ARM ICs already have the ability to run Java bytecode directly - known as Jazelle. This is how some games are written, in order to run faster on small platforms - such as mobile phones.

Running the J1 Simulator on ZPUino.

The ZPUino has shown itself to be a convenient and useful 32 bit processor, implemented on FPGA hardware. It is Arduino code compatible, runs my simulations at about twice the speed of an Arduino, and allows easy use of the Adafruit GFX graphics library, which permits 800 x 600 VGA text and graphics to be displayed on a flat screen monitor.

Whilst not a particularly fast processor, ZPUino does allow easy and unrestricted access to the graphics library - such that it is easy to create a series of animated display screens for displaying high level output, using what is effectively an Arduino sketch. This technique is particularly flexible, and allows you to creatively interact with the particular problem - rather than get bogged down in someone else's system calls and drivers.

I took the very short J1 test program as used in the simulations - a simple loop consisting of 7 instructions, and used the ZPUino to run this test program as an animated simulator - which graphically showed the contents of memory - as a hex dump, plus the main J1 registers, the stack and the instructions as they were stepped through. Repeated re-drawing of the hex dump memory display slowed the execution right down to about 1 instruction per second - about 100 millionth of real J1 execution speed.

The Missing Assembler

What was missing from this exercise was the ability to easily write J1 test programs in J1 machine code - and this rather hampered progress. So it is for this reason that the first application of the SIMPL text interpreter will be at the core of the J1 cross assembler.

Whilst the J1 is intended to run Forth, and has the tools to support it, my Forth skills are not great, and anyway I'm trying to challenge myself to learning C to a reasonable standard.  So a coding project written in C, that taxes my language and thinking skills is a good way to learn, and achieve something useful.

The interpreter can take a set of mnemonics, tailored for the J1 processor and by the process of direct substitution, create the series of 16 bit instructions that can then be run on the J1 virtual machine. I really want this to be an interactive process working in a Forth-like manner - so that small snippets or blocks of J1 assembly language can be assembled and tested individually as an iterative process.
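The direct substitution described above can be sketched in a few lines of C. This is not the SIMPL interpreter itself - just a minimal illustration of the table lookup idea. The mnemonic names are hypothetical, and the opcode values are derived from my reading of the published J1 ALU encoding, so treat them as illustrative until checked against the J1 sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const char *mnemonic; uint16_t opcode; } op_entry;

/* Illustrative table only: names are hypothetical, opcode values are
   an assumption based on the published J1 ALU instruction format. */
static const op_entry op_table[] = {
    { "DUP",  0x6081 },
    { "DROP", 0x6103 },
    { "ADD",  0x6203 },
    { "RET",  0x700C },
};

/* Look up one mnemonic; returns 1 and stores the opcode on a match,
   otherwise returns 0. */
static int assemble_word(const char *word, uint16_t *out)
{
    for (size_t i = 0; i < sizeof op_table / sizeof op_table[0]; i++) {
        if (strcmp(word, op_table[i].mnemonic) == 0) {
            *out = op_table[i].opcode;
            return 1;
        }
    }
    return 0;
}

/* Assemble a space-separated line into `code`. Returns the number of
   words emitted, or -1 on an unknown mnemonic or if `code` is full.
   Lines longer than the local buffer are truncated. */
static int assemble_line(const char *line, uint16_t *code, int max)
{
    char buf[128];
    int n = 0;

    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (char *tok = strtok(buf, " \t"); tok != NULL;
         tok = strtok(NULL, " \t")) {
        if (n >= max || !assemble_word(tok, &code[n]))
            return -1;
        n++;
    }
    return n;
}
```

Because each line is assembled on its own, a snippet such as "DUP ADD RET" can be turned into three 16 bit words and handed straight to the virtual machine - which suits the interactive, block at a time way of working described above. Literals, labels and jumps would need more than plain substitution.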

It's many years since I wrote any code in assembler - and that was Z80 which had a reasonable mix of registers to play with.

Writing in a minimal instruction set language, is going to be interesting.

In order to gen up on the processes involved within a typical assembler - I returned to "NAND to Tetris" Chapter 6.  There is a good description of what is needed there.  I then went on to refresh myself on the contents of Chapter 7 -  "Virtual Machine I - Stack Arithmetic" and Chapter 8 "Virtual Machine II - Program Flow".  Having re-read these chapters, in the fresh light of a new day, I believe that my musings about the J1 cpu - are not only very relevant - but completely on track with the content and approach outlined in "NAND to Tetris".

More on this in a later post.

















A New Compact Microcontroller Board

The new board shown fitted with a 40 pin DIL Package  - eg ATmega1284


A Low Cost Generic Microcontroller and FPGA Board

The Arduino board format is now looking dated with its bulky footprint and only 20 useful I/O lines.

I realised that there was an opportunity to redesign the board and make it more useful for prototyping or developing with larger pin-count microcontrollers - yet retain nominal compatibility with the Arduino connector format, and therefore will also accept most original Arduino shields.

The proposed new board footprint is just 70% of the original area yet provides up to 58 I/O pins, direct USB programming and on board wireless communications.

The pcb makes use of a standard 50mm x 50 mm board footprint - which are now manufactured very cheaply (as little as $14 for 10) by various low cost board houses. 

The board format may also be used as a basis of a 50mm x 50 mm expansion shield.

Pin Naming.

Arduino started life  with 6 Analogue inputs and 14 Digital I/O pins. Over the years these have often been labelled A for analogue and D for digital.

The naming convention I have settled upon keeps the A and D headers for backwards compatibility, but adds extra headers  - labelled B, C, E and F.  Alphabetical port names make sense.

These additional 0.1" pitch headers are placed in-board of the existing headers - which give an inner row of headers on a 1.70" width, which makes these entirely compatible with most breadboards and 50mm x70mm 0.1" prototyping boards.

Header A is 6 pins - Arduino standard  - providing analogue inputs
Header B is 6 pins - and provides additional lines with higher resolution analogue capability.
Header C is 8 pins - providing a mix of analogue, digital, communication and timer functions.
Header D is digital and has been extended to include the extra two I2C pins
Header E is 16 pins  - for Expansion - and is exclusively digital GPIO
Header F is also for Future and may provide up to 5 GPIO lines

The layout of the headers has been chosen so as not to be entirely symmetrical - this hopefully prevents any shield from being plugged in back to front.


Making it Compatible with Arduino Shields.

A brief word about Arduino.  Arduino originally offered 6 analogue input pins and 14 digital pins. Unfortunately due to a CAD error, the digital pins are not on a standard consecutive 0.1" spacing - the spacing between D7 and D8 is 0.160", which leaves the second digital header 0.060" off the grid.

The first task was to come up with a shield footprint that could be compatible with this layout - yet  fit into the narrower width of a 50mm square pcb.  This was done by careful customisation of the size and shape of the header pads - so that they will just fit into a 50mm width.

The second task was to provide some additional 16 pin header strips, inboard of the original Arduino headers, which would give access to an additional 32 GPIO lines.

This was done in a way that would also allow two M3 fixing holes in opposite corners.  Finally, 4 additional signals - not present on the original Arduino headers - were added, to give the two I2C pins and the 2 extra pins on the R3 power header.

The 50x50 pcb fitted with 100 pin LQFP and mini-USB connector 


Choice of Processor.

The 50 x 50 board layout could be used for any microcontroller that offers around 50 to 60  I/O lines and can be readily adapted to suit various packages - up to 100 pin LQFP (Like the STM32F746).  For most projects it is a good match with 48 pin or 64 pin LQFP packages.

It may also be used with DIL footprint ICs - and it is just possible to shoehorn a 40 PIN DIL onto the pcb - such as ATmega1284 etc.

However because my recent experience lies with the STM32Fxxx range of ARM Cortex M3 and M4 microcontrollers, these were the obvious first choice.

Conveniently a board designed for one particular variant, can also be populated with another close family member  - so I chose the STM32F103 workhorse, and the STM32F373 - which has a faster M4 core , a floating point unit and significantly more analogue ADC capability - in terms of ADC resolution and signal lines.

Each of these processors has a maximum of 51 or 52 GPIO lines, but once you remove two for the crystal, two for the USB, two for the ST-link and two for the RTC  - you are down to a more manageable 44 lines.

The designation "PA" refers to the physical pins of GPIO Port PA on the STM32 mcu package - and not the A pins on the header. I hope this does not cause undue confusion.

Port PA   12 signals
Port PB   12 Signals
Port PC   14 Signals
Port PD     2 Signals
Port PE     2 Signals
Port PF     2 Signals

Total        44

This is 24 more than the original Arduino, so at least an additional 3 x 8 pin headers will be needed to accommodate these.
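As a quick sanity check on the pin budget, the sums above can be written out as follows. The numbers are those given in the text; the function names are just for illustration.

```c
/* Usable GPIO signals per STM32 port, as listed above (PA..PF). */
static const int port_signals[] = { 12, 12, 14, 2, 2, 2 };

/* Sum the usable signals across all ports. */
static int total_signals(void)
{
    int total = 0;
    for (unsigned i = 0; i < sizeof port_signals / sizeof port_signals[0]; i++)
        total += port_signals[i];
    return total;
}

/* Lines left once the pins with fixed functions have been reserved. */
static int usable_lines(int gpio_max)
{
    const int reserved = 2   /* crystal */
                       + 2   /* USB     */
                       + 2   /* ST-link */
                       + 2;  /* RTC     */
    return gpio_max - reserved;
}
```

Both routes agree - `total_signals()` and `usable_lines(52)` each come to 44, which is 24 more than the Arduino's 20.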

The problem is how best to map the various GPIO ports on the ARM to the physical pins of the connectors - in a way that makes sense and clusters them by function.  Separating into nominally analogue and digital is a good starting point.

Layout of the Ports.

In addition to the Arduino's A0-A5, the proposed board offers a further 10 analogue inputs  - allowing A0 - A15 and 6 additional analogue or digital lines C0 to C5

These are provided on a 16 pin header on the same side as the existing analogue and power headers.

On the "digital side" of the board there is also an additional 16 pin connector.  This is the Expansion, or Extra port  - and is designated E0 to E15. If you want a 16 bit bus connected  - say for a FPGA project, then this would be good use of  port E.

Furthermore, later Arduino UNO R3 models offer two pins for I2C devices  - these are added as D14 and D15.

Conclusion

The proposed 50 x 50 board size is convenient, compact and versatile.  It has sufficient pins for the more demanding applications, and sufficient board area to allow plug in modules to be added.

The board can sensibly accept microcontrollers or FPGAs up to about 144 pin LQFP - which makes it viable for projects incorporating the STM32F7xx Cortex M7 or the Xilinx Spartan 6 range of FPGAs - both of which are available in LQFP - and thus solderable by the hobbyist/enthusiast.



Revelation Time!


The STM32F7 Discovery board is a remarkable board for the money, providing a rich mix of hardware, communication interfaces and memory. With its 216MHz STM32F746 ARM Cortex M7 microcontroller and 4.3" capacitive touchscreen display, it makes a very capable platform to develop embedded applications on.  It represents probably the fastest system available that allows true bare metal programming - unhindered by an operating system. For anyone developing embedded systems requiring touchscreen display, ethernet, audio and a fast 32 bit processor, the F7 Discovery is worth considering. All of the principal ICs on the F7 Discovery are available in LQFP or TSSOP packages - which means that the core hardware design could be recreated on a simple double sided pcb - suitable for hobbyist construction.

When I first looked at the STM32F7 Discovery board, and saw the Arduino headers on the underside of the pcb - I thought that such a powerful board, having been design restricted to just 20 feeble I/O lines, was a bit lame - kind of added at the last moment - as an Arduino afterthought.

However, when I took a closer look at what signals had actually been routed out to the Arduino headers - I discovered that it is possible to get all but 1 of the signals for an RGB  2:3:2 display!  It almost looks as if the signals were planted on the Arduino headers on purpose - for a backdoor VGA display  - of up to 128 colours.  Now that did get my attention!
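The 128 colour figure follows directly from the 2:3:2 split - 7 bits in all. A minimal sketch of how a pixel might be packed; the bit order (RRGGGBB) and the function names are chosen purely for illustration and are not taken from the board's LCD controller configuration.

```c
#include <stdint.h>

/* Pack red (2 bits), green (3 bits) and blue (2 bits) into one 7-bit
   pixel. The RRGGGBB ordering is an arbitrary choice for illustration. */
static uint8_t pack_rgb232(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(((r & 0x3) << 5) | ((g & 0x7) << 2) | (b & 0x3));
}

/* Number of distinct colours an n-bit pixel can express. */
static unsigned colour_count(unsigned bits)
{
    return 1u << bits;
}
```

`colour_count(2 + 3 + 2)` gives 128, and full white - `pack_rgb232(3, 7, 3)` - comes out as 0x7F.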

A VGA shield could be easily built on a proto shield or even stripboard.

Not only that - but two SPI ports and an additional UART appear on the headers - allowing for a PS2 mouse and keyboard, and an auxiliary serial debug port!

Whoever laid out the F7 Discovery board was either a genius or a bluffer!

A Pocket Workstation

With the possibility of quite an easy hack to get 7 bit VGA from the Arduino headers, and add a PS2 keyboard and mouse - it occurred to me that the Discovery board had just redeemed itself as a prime contender for the proposed retro-workstation project.   It would certainly work well as a target board - to allow ideas to be tried before committing to further hardware.

A simple shield carrying the VGA resistor networks, a VGA connector and 2 PS2 sockets could be made up very easily - on the new preferred 50x50mm board format.

The Discovery board provides  8Mbyte of SDRAM,  16MByte of Nor Flash and a microSD card. This would be sufficient for much of the experimentation I want to do - after all I am looking for minimal systems with modest resources. In terms of computing power - the Discovery board represents a respectable mid-1990s desktop.

Other useful hardware is the ethernet connectivity, the OTG high speed USB and the audio interface.

mbed on the STM32F7 Discovery

With mbed available for code development - this seems to be a step up from the humble Arduino - and definitely a very exposed platform for bare metal code development.   The hardware is totally accessible - and not hiding behind Linux, and there is no part of this hardware design that I cannot recreate using LQFP packages on a self designed pcb, adding more SRAM or SDRAM as required.

The hardware is quick too - it looks like it will be about 30 times the speed of the ZPUino - plus more resources - and a more flexible VGA or LCD system.  There is scope for Li Po battery operation - and to explore some of the low power modes of the STM32F7 micro.


Discovery F7 VGA Output Hardware Details


Here's the hardware connection details for the proposed "Arduino" VGA/ PS2 shield.

Colour                      Port                       Arduino Pin

R7                             PG6                       Dig 2
R6                             PA8                       Dig 10

G7                             PI2                        Dig 8
G6                             PI1                        Dig 13
G5                             PI0                        Dig 5

B7                             PB9                       Dig 14
B6                             PB8                       Dig 15

H_SYNC                  PC6                       Dig 1  
V_SYNC                  TBD                      Available on camera FPC

The missing V_sync is probably not too much of an issue.  It could be generated by a Timer - clocked by H_sync

LCD_CLK                PG7                       Dig 4

TIM3 CH1                PB4                       Dig 3
                                 PH6                       Dig 6
TIM2 CH1                PA15                     Dig 9
                                 PB15                     Dig 11
                                 PB14                     Dig 12


For serial communication - we have access to UART7 on PF6 and PF7 - which appear as AN4 and AN5. This will allow a FTDI cable to be plugged in as an auxiliary/ debug port.
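Going back to the missing V_sync - the idea of deriving it from a counter clocked by H_sync can be sketched in software as below. The post does not commit to a display resolution, so the vertical timings used here are an assumption: the commonly quoted line counts for 640 x 480 at 60Hz. On the real part this would be a hardware timer rather than a function call, and the names are invented for illustration.

```c
/* Vertical timing in lines. Assumed values: the commonly quoted
   figures for 640 x 480 at 60Hz - not taken from the post. */
enum {
    V_VISIBLE = 480,
    V_FRONT   = 10,
    V_SYNC    = 2,
    V_BACK    = 33,
    V_TOTAL   = V_VISIBLE + V_FRONT + V_SYNC + V_BACK
};

typedef struct { int line; } vsync_gen;

/* Call once per H_sync pulse. Returns 1 while V_sync should be
   asserted for the current line, otherwise 0. The electrical
   polarity of the sync output is left to the caller. */
static int vsync_on_hsync(vsync_gen *g)
{
    int line = g->line;
    g->line = (line + 1) % V_TOTAL;   /* wrap at the end of the frame */
    return line >= V_VISIBLE + V_FRONT &&
           line <  V_VISIBLE + V_FRONT + V_SYNC;
}
```

Over one full frame of 525 H_sync pulses, the function reports V_sync active for exactly 2 of them, and the counter wraps back to line 0 ready for the next frame.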

So the full line up - rearranged

H_SYNC                  PC6                        Dig 1
R7                             PG6                       Dig 2
TIM3 CH1                PB4                       Dig 3
LCD_CLK                PG7                       Dig 4
G5                             PI0                        Dig 5
TIM12 CH2              PH6                       Dig 6
Spare                        PI3                         Dig 7
G7                             PI2                        Dig 8
TIM2 CH1                PA15                     Dig 9
R6                             PA8                       Dig 10
TIM12 CH2              PB15                     Dig 11
TIM1/8 CH2N          PB14                     Dig 12
G6                             PI1                        Dig 13
B7                             PB9                       Dig 14
B6                             PB8                       Dig 15

                               PA0                       AN0
                               PF10                      AN1
SPI5_MOSI            PF9                        AN2
SPI5_MISO            PF8                        AN3
SPI5_SCK              PF7                        AN4
UART7 TX             PF6                        AN5

Experiments with STM32F7 Discovery

In this post I look at some possible uses for the STM32F7 Discovery board.  I look at the possibilities of getting it to generate a video display for an external VGA monitor.  I also look at an innovative means of creating a custom GUI - based on a low cost IC.

In late September I traveled up to Cambridge, to attend an ARM course hosted by ST Microelectronics as a vehicle to showcase their new Cortex M7 microcontroller.

The course described in detail some of the key features of the M7 architecture, and how its performance has been enhanced over the existing M4.  In the afternoon there were practical introductions to some of its DSP capabilities.  And of course, the main reason for going was to get a free STM32F7 Discovery board - worth about £40.

The F7 Discovery board comes with a load of hardware bundled in, notably a 4.3" capacitive touchscreen, ethernet, stereo audio, microSD card,  camera interface,  dual MEMS microphones - for directional sound analysis, and external 8Mbyte SDRAM and Flash ( to demonstrate code execution from various external memory devices).

I had put the board to one side for a couple of months, whilst waiting for mbed to support this feature rich board.  Now at last, mbed has stable support - and I can begin experimentation.

As with all mbed platforms, there are a few code examples supplied to get you started  - in this case - writing text and graphics to the LCD, and reading the capacitive touchscreen.  The examples were easy to follow, and easy to modify to suit my own purposes.

I decided that it would be interesting to run my J1 Simulation on the Discovery board, by way of benchmarking it.  If the J1 Sim ran reasonably quickly on this extremely portable platform then it would be useful to try out some of my J1 assembler development on it - written in mbed C++.

The benchmark was reasonable - about 4.65 million J1 instructions per second.  This should be quite fast enough for now to develop the next stage of the project.

LCD and VGA Displays

Both  the STM32F42x M4 and the STM32F7xx M7 processors contain an LCD controller.  This generates the RGB parallel 8 bit video data and the horizontal and vertical sync signals for most modern LCD displays.   The  sync generators are entirely programmable, and will handle display resolutions of up to 1024 x 768.

It is a fairly trivial matter to combine the digital data lines using a weighted value resistor network - which acts as a simple D to A converter and allows an analogue video signal to be recreated.   This is exactly what VGA monitors require - so it should be possible to get the M7 to generate reasonably good graphics on a flat screen monitor.  As much as I like the touchscreen LCD - text on it is a little small.
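As a rough sketch of how such a weighted resistor network behaves, the snippet below works out the voltage at the summing node for every input code. The resistor values, the 3.3V logic level and the function name are illustrative assumptions, not values from any board described here. The only fixed points are that a VGA input presents a 75 ohm termination and expects roughly 0.7V at full brightness.

```python
def dac_voltage(code, resistors, v_high=3.3, r_load=75.0):
    """Output voltage of a binary-weighted resistor DAC driving a resistive load.

    code      -- integer pixel value; bit 0 is the least significant bit
    resistors -- one resistor per bit in ohms, most significant bit first
    """
    n = len(resistors)
    current = 0.0               # sum of V_i / R_i (numerator of Millman's theorem)
    conductance = 1.0 / r_load  # the monitor's termination loads the summing node
    for i, r in enumerate(resistors):
        bit = (code >> (n - 1 - i)) & 1
        current += (v_high if bit else 0.0) / r
        conductance += 1.0 / r  # a pin driven low still loads the node
    return current / conductance


# Illustrative 3-bit network: each resistor is double the one before it.
RESISTORS = [500.0, 1000.0, 2000.0]

for code in range(8):
    print(code, round(dac_voltage(code, RESISTORS), 3))
```

With these assumed values the full-scale output lands just under 0.7V, and the steps are evenly spaced because the total conductance at the node does not change with the code.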

This post to a forum confirmed the technique is valid - by modifying a '429 Discovery board 

To put this plan into action, I will need to create another breakout board, in order to get at the necessary signals  - unfortunately the F7 Discovery uses a BGA packaged processor - and virtually none of the required signals are accessible.

The board can use either a STM32F439 or a STM32F746 - as they are pin identical.  There should be at least 2MB of fully static SRAM and a block of SDRAM.

The new breakout board will have a modified Arduino MEGA footprint - to allow it to be compatible with my Papilio Duo and Computing shield hardware. These provide PS2 and VGA break-out connectors.

An ARM/FPGA Duo ?

Since getting my Papilio Duo FPGA boards earlier this year, it has opened up experimentation with computer hardware and softcore processors.

One of my goals is to put an FPGA and a reasonably powerful ARM Cortex M7 processor together on the same pcb. This will allow exploration of the FPGA hardware - but with a fairly standard ARM to provide system hosting.  It is likely that they will share some dual ported RAM.

For debugging the ARM will provide a user interface - allowing keyboard, mouse, USB and ethernet interfaces.  Code can be simulated at about 1/30th of real speed on the ARM, before being ported across to the FPGA.  This is going to be a challenging brain stretching project.

The intention is to create a workstation-like environment, with 1024 x 768 graphics on a large screen monitor.

Prototyping

It should be possible to prototype a lot of this before having to commit to a lot of new hardware.

The 100 pin STM32F746 BOB board allows most of the RGB lines to be accessed - enough to allow a full 8 bits of green, and 6 bits each of red and blue to be connected.  This will at least allow the system timing to be tested on a VGA monitor.  The Computing shield has a break off section which allows VGA combination of 4 bit RGB signals.

The STM32F746 has 320K of SRAM - and if we are using 4 bits of each RGB - then we cannot display much of a picture.  Instead - for experimentation - pack up the RGB as 3:3:2 into a single byte - that would then allow a 640 x 480 display.
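The memory budget can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are made up for illustration, and it assumes that 4 bits per colour (12 bits) would be stored as two bytes per pixel.

```python
SRAM_BYTES = 320 * 1024  # on-chip SRAM quoted above for the STM32F746


def frame_bytes(width, height, bits_per_pixel):
    """Bytes needed for one full frame, rounding each pixel up to whole bytes."""
    bytes_per_pixel = (bits_per_pixel + 7) // 8
    return width * height * bytes_per_pixel


def pack_rgb332(r, g, b):
    """Pack 8-bit red, green and blue into one byte laid out as RRRGGGBB."""
    return (r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6)


for name, bpp in (("4:4:4", 12), ("3:3:2", 8)):
    size = frame_bytes(640, 480, bpp)
    print(name, size, "bytes", "fits" if size <= SRAM_BYTES else "does not fit")
```

At one byte per pixel a 640 x 480 frame needs 307,200 bytes, which fits in 320K but leaves only about 20K for everything else, so this is strictly an experimental arrangement.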

A lot of work has already been done in getting microcontrollers to output video.  Cliffle's Blog is a good place to start.

A Graphics Co-Processor? 

The idea of offloading the burden of driving a video display to a custom graphic processor has been around since the earliest days of the PC.  Traditionally the graphics system would involve a frame buffer RAM, into which the cpu would paint the pixels required for the video image. The cpu would usually use the vertical blanking period (about 50 blanked lines or 3.2ms) to update the data in the frame buffer.

Loading the frame buffer with the data for text characters and shapes is computationally intensive - and if it could be offloaded to a special graphics engine, then it frees up huge resources in the cpu.  The other way of looking at this, is that a much slower, simpler cpu could be used, if it no longer has the graphics overhead.

This is the approach adopted by FTDI, who have launched a series of embedded video engine (EVE) ICs - which are finding their way into embedded systems.  EVE does not just handle the video for the LCD, it also has a touchscreen controller interface and an audio generator - with a range of pre-recorded sounds.

EVE treats all the items that go to make up the video display as objects.  There are numerous in-built objects - such as buttons and sliders - that go to make up a modern GUI.  The host processor need only build up a display list of the objects that it wants to display, in terms of size, position, colour and other attributes - and this list is passed to the EVE device, whenever the mcu needs to update the display.

As the display list is a fraction of the size of  the objects that it represents, it means that it takes very few resources to generate and update it. This means that a simple 8 bit microcontroller can easily generate a display list for a very slick looking GUI.  FTDI have emphasised this capability - by basing their development boards around the ubiquitous ATmega328p  - commonly known as Arduino.
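To get a feel for the size difference, here is a small sketch that packs three GUI objects into a list and compares it with a full frame buffer. The object codes and the 13 byte record layout are invented for this illustration and are not EVE's real command encoding. The 480 x 272 panel at 16 bits per pixel is also an assumption.

```python
import struct

# Object codes invented for this sketch; they are not EVE's real commands.
BUTTON, SLIDER, TEXT = 1, 2, 3


def encode_object(kind, x, y, width, height, rgb):
    """Pack one object as 13 bytes: kind, x, y, width, height, 24-bit colour."""
    return struct.pack("<BHHHHI", kind, x, y, width, height, rgb)


display_list = b"".join([
    encode_object(BUTTON, 20, 200, 120, 40, 0x2060C0),
    encode_object(SLIDER, 160, 210, 280, 20, 0xC0C0C0),
    encode_object(TEXT, 20, 20, 440, 32, 0xFFFFFF),
])

# Assumed panel: 480 x 272 pixels at 16 bits per pixel.
framebuffer_bytes = 480 * 272 * 2

print(len(display_list), "bytes of display list")
print(framebuffer_bytes, "bytes of frame buffer")
```

The host only has to send a few tens of bytes to move a button or change its colour, rather than repainting hundreds of kilobytes of pixels.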

There are several key advantages to using this approach:


  • Future proofs the GUI,  will work with any display up to 800 x 600 pixels.
  • Changes can easily be made  - eg change position, font, colour etc of display objects
  • Speeds up the development time of slick looking GUIs
  • GUI is decoupled from the microcontroller resources - freeing up time and memory
  • GUI object lists can easily be ported to different microcontrollers.
  • Capacitive or resistive touch screen interface -if required
  • jpegs can be easily rendered from microSD
  • GUIs can be knocked up easily - "Try before you Buy" to meet sales & marketing aspirations


  • The EVE chips may be purchased in 1 off from Farnell - for about £5.

    The Frustrations of a Casual C Coder

    mbed to the Rescue!

    Last weekend I had all the frustration of trying to develop code for the STM32F746, and was forced to use Keil's code size limited IDE, and an unwelcome foray into STM's hardware abstraction layer (HAL).

    Previously my benchmark tests had been developed on a mix of the Arduino, STM32-Arduino, CooCox and Keil MDK IDEs. Trying to juggle so many IDEs and HAL as well was becoming counter productive and not at all a rewarding experience. There has to be a better way.

    During the week, I started to look again at using the STM32F7 Discovery board for a small project, and was pleased to see that this target platform is now fully supported by mbed.

    I had dabbled with mbed last year when building an experimental motor control board and found it a useful means to put together small applications quickly - so I thought I'd give it another go using the STM32F7 Discovery board.

    I can now say that I was pleased with the progress made using mbed for this board - so I have decided that from now on I will try to standardise my ARM project developments on the mbed compiler.

    The mbed online IDE presents a clean modern look, is easy to use and removes a lot of the complex clutter that a conventional multi-platform IDE presents. The only downside is that the online compiler does not directly support debugging - but a separate project GCC4MBED makes use of the CMSIS-DAP debugging and programming output via a USB port.

    It appears that if you want to work across several targets, that there is still not yet one solution that fits all, and it will always be necessary to do some juggling between different toolchains.

    The rest of this post is an autobiographical account of my various dealings with microcontrollers - over the last 35 years.

    Early Beginnings.

    My first introduction to personal computers was over 35 years ago when my secondary school bought an early Z80 machine, the Research Machines RM-380Z, housed in a 19" rack. These machines were bought by the Isle of Man Board of Education, under a nationwide government initiative to introduce computers into schools on the Isle of Man - a full 2 years ahead of the BBC Micro initiative.

    So my early experience of personal computers was sitting at a black and white monitor, writing in fairly standard, interactive BASIC, and this is more or less how it remained for the next 5 or 6 years or so - through a progression of machines including ZX81, Spectrum, BBC B, Apple II, Tatung Einstein and several others.

    Other experiences included writing a fair amount of assembly language - mostly in Z80, and mostly without the benefit of a machine that actually ran a Z80 assembler!   Assembly by hand, using paper, pencil and a table of mnemonics and opcodes is brutally tough, especially when you then have to use an Apple II to create an eprom from the hand assembled code you have just written.  With a code, program, erase cycle of some 20 minutes - coding mistakes were frustratingly painful and time consuming.

    However - as a naive 18 year old in the technology backwaters of the Isle of Man, I was unaware that tools like hex editors actually existed, and the possibilities of getting a Z80 assembler on a Z80 machine - in a small company that had invested a lot in an Apple II - seemed somewhat remote.

    Sometime later, I wrote a Z80 disassembler in ZX Basic, and I also adapted a Z80 monitor program to allow hex dump and hex editing - and that was the extent of my tool chains up until the late 1980s. I believe the Tatung Einstein, a CP/M machine I bought half price, end of line, was the first machine to offer these tools as part of the CP/M package.

    Forth

    Forth is a fascinating language that has captivated me since I was a teenager at school.  I probably first heard a rumour about it around 1982 - and that it had been used to control radio telescopes 10 years earlier. Byte Magazine ran a special edition on Forth in 1980 - which I later found in a technical library.

    My first copy was ZX81 ROM-Forth - and it was a copy.  As students in an electronic engineering department we had access to an eeprom programmer, and a good friend, Hadyn and I bought a legitimate ROM between us from an advert in the back of a computer magazine, and then ran off a copy at the soonest practical opportunity. ZX81 ROM Forth was a sophisticated yet quirky product, somewhat ahead of its time, but sufficient to teach me the basics of the language. It was also much faster than interpreted ZX81 BASIC.  This was probably my first introduction to the notion of swapping ROMs to get a computer to do a completely different task.

    In the Summer of '84 Hadyn and I took a day trip to London (from North Wales) to trawl the electronic music shops (electrosynth was all the rage) and electronic surplus stores off the Tottenham Court Road. There in "Henry's Radio" I spotted a Jupiter Ace - again end of line - so I snapped that up cheaply.

    Forth was somewhat frustrating on those early UK 8-bit machines. The only means of saving a program was on an audio cassette, which was not entirely successful every time.

    By the late 1980's I had more or less given up programming personal computers, became a user of other people's applications, and dropped high level coding entirely for about the next decade.  By the time I got back to it the world had moved on significantly....

    I wrote some 68000 assembly language for a company that made scientific instruments, but this was all so alien to me compared to Z80, that I was kind of "in at the deep-end" and struggling to keep above water.  From hand-assembling a few lines of code, to working on a multi-module project was a big leap - and not one that I managed to achieve successfully.  I left that company shortly afterwards....

    PIC Practice

    I had known about PICs since about 1995, but had no practical experience of them. I joined a telecoms company in late spring of 1998 and I worked on some telephone dialler products.  During this time I worked out some PIC machine code routines to perform many of the basic telephone signalling tones, including DTMF send and receive, V23 modem send and receive, American payphone "Nickel Tones" and several others.  This was a productive time for me, and with nothing but a PIC and an R-2R resistor ladder network as a primitive audio DAC - I developed code that in the wrong hands could create network chaos.  However wireline telecom development was very much on the way out, so there was another career change on the horizon.

    A Baptism of C

    I took a management role for a company in the midlands that were doing early asset tracking and wireless telematics - using a combination of GPS receiver coupled to a GSM modem module.  Their code was also running on a PIC, but this time in C.  In my year as a manager there, I watched other developers struggle with C, without having the experience to assist.

    In 2005 I dabbled with a budget C compiler for PIC.  The language was still so alien to anything I had experienced, I put C firmly on the back burner for another few years.

    A colleague urged me to have another look - telling me that "it was not too bad once you get into it". This would happen a few years further on - when I was first introduced to Arduino.

    I took a job with a company in central London that was developing a smart energy monitor.  The source code for this was in PIC C for the receiver and 8051 C for the transmitter unit.  I was involved in product testing, and we built a test system and other gadgets, based around the Arduino.

    Arduino was the first product that sufficiently de-mystified the C language for me to begin to make some progress with it.

    My first exposure and early experiences with Arduino gave me sufficient confidence to stick at C - which conveniently brings us up to 2010.  The last 5 years of this rambling account, will be the subject of a later post.






    Colour Coding - Part 1

    The 800 x 600 VGA display is partitioned into 16 "Keyholes" or code windows

    Colour Coding was a quick Sunday afternoon hack to see if there might be a better way of presenting a coding environment to the human user.

    The code was written in "Arduino" and running on a ZPUino soft core processor hosted on a Papilio Duo Spartan 6 FPGA board.

    This combination was chosen because the ZPUino supports VGA generation hardware, and can make use of the Adafruit GFX common graphics library.

    This is just a trial mockup of how code might look if presented in a radically different format on a VGA screen - plus it's a means of trying out some ideas - of what might work, and what won't.

    These big bold, bright displays can easily be generated with modest hardware and limited processing power.  However they have their application for FPGA projects that need to generate more than just a UART output - but create an interactive graphical user interface on a big LCD monitor.

    Historical Note 

    The traditional screen editor has been around for about 45 years - ever since serial terminals could rapidly update a whole page of text and show screen cursor and editor operations.

    Expressing source code as a linear text file has always been accepted as one of the easiest means of getting source code into a compiler, so that the various functions may be compiled in the order that they appear in the text file. It's a throwback to paper tape and punched cards that were fed in and read in sequence until the end of the tape or card deck was reached.

    This might not be the best way to organise source code for display - especially, as in the case of Forth, and certain object oriented languages, where the source consists of a lot of very short functions.

    Forth traditionally organised its source into 1024 byte blocks - as these were seen to be a convenient size to edit, update and store on the disc.  A 1024 character block of source code is just 32 lines of 32 characters, and on a modern 1024 x 768 display - at a comfortable text size - this occupies about 1/15 of the screen viewing area.


    The text in the white keyhole is 32 x 16 characters



    Colourful Times

    Text has traditionally been in monochrome, but more modern editors have started to add colour to the text in order to signify meaning.

    Colour is something that is so simple to overlay over monochrome data - even in chunky VGA text, that it is surprising that more use has not been made of it. Indeed when I edit pcb layouts in EagleCAD - the only way I can distinguish the top layer tracks from those of the bottom layer is by the use of bold colours.

    The same process could be applied to text - making it more readable and conveying meaning through the colour attributes.

    The text is almost the same as a screenful of ZX81 BASIC


    Micro-Windows or "Keyholes"

    Perhaps there might be a  better way of presenting source code on the screen, rather than having to constantly scroll up and down the text file looking for the function that you wish to edit.

    Using standard 1024 character Forth blocks, it would be practical to display up to 15 blocks arranged as a 5 x 3 array. The background colour of each block would be set to be different to that of its neighbours, for ease of visibility, and using the mouse or touch pad, each of these "micro-windows" could be brought into focus, to allow editing, compilation or execution of that block.
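The layout arithmetic can be sketched in a few lines. The function names are made up for illustration, and the 6 x 8 pixel character cell is an assumption (it is the cell size of the default Adafruit GFX font).

```python
def keyhole_grid(screen_w, screen_h, cols, rows):
    """Return one (x, y, width, height) rectangle per keyhole, row by row."""
    w, h = screen_w // cols, screen_h // rows
    return [(c * w, r * h, w, h) for r in range(rows) for c in range(cols)]


def text_fits(keyhole, chars_per_line, lines, font_w=6, font_h=8):
    """True if a block of text fits inside the keyhole at the given font size."""
    _, _, w, h = keyhole
    return chars_per_line * font_w <= w and lines * font_h <= h


grid = keyhole_grid(1024, 768, cols=5, rows=3)
print(len(grid), grid[0], text_fits(grid[0], 32, 32))
```

On a 1024 x 768 screen each of the 15 keyholes comes out at 204 x 256 pixels, which is just enough for 32 lines of 32 characters in a 6 x 8 cell with no room to spare vertically.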

    Each micro-window would have access to its own toolset: edit, compile, execute, save etc accessed from buttons along the base of the window.

    Where a high level code definition was dependent on certain lower definitions, then those lower definitions could be colour-tagged to show the dependency tree.

    Micro-Windows is such a dreadfully constructed name, perhaps "porthole" or "spyhole" or even "keyhole" may be more appropriate.

    Small Objects of Desire.

    Much of modern application development is done in object oriented languages,  where objects are created from data structures, and those structures are manipulated by methods. An object plotted to a screen might take the form of a Red Ball,  located at a certain x,y position on the screen and with a certain diameter and colour.  Further attributes could be used to describe its shading, transparency or what graphics layer it resides on.  The use of keyholes might be a good way to present the attributes to the developer, such that they could be edited.  Likewise the method scripts that manipulate the objects could be viewable through the porthole system.

    Mouselection   

    Another questionable made up word, however, one that describes the selection and subsequent viewing of a particular block of code based on its colour.  A great deal should be possible by partitioning the overall screen view into zones, and sub-zones defined by colour.  When the mouse hovers over or lands in a zone or subzone the options are immediately highlighted.

    Multi Processor Arrays

    The array of keyholes might be a good way to view tasks that are distributed across arrays of multi processors.


    TBC

    A Little Bit of 1980's Nostalgia

    The ever enduring ZX Spectrum


    Almost like the real thing!

    First - a Historical Note

    The first true stored programme computer nicknamed "Baby" was built in Manchester - and first ran code in July 1948.

    Some 34 years later - Clive Sinclair and his Cambridge team produced the Iconic ZX Spectrum - a 1980s design classic.

    Here we are another 34 years later (almost) and the Spectrum - and its close cousins, descendants or cyborg clones are still very much alive.

    Spectrums have been around for virtually half the time that there have been electronic computers in existence!

    So What's New?

    At the weekend I stumbled across the Recreated Spectrum - a life sized replica that appears to be a good match to the 1980s Sinclair ZX Spectrum. It has been released by Elite Systems - the electronic games company behind some of the original Spectrum games.

    However this keyboard is not what it appears - it is perhaps something more useful.

    Rather than recreating the Z80 based hardware of the original Spectrum - based on an FPGA, this is actually a Bluetooth keyboard, intended to work with various Spectrum games apps running on a PC, Apple or Android.

    The keyboard has 2 modes selectable by a switch at the back - Position A is for Spectrum mode - where each key down and key up generates a pair of "Sinclair" key codes.  This is the game playing mode - and works with the online and downloadable apps.

    The second mode is for a standard qwerty keyboard - or as standard as you can get - given the limitation of 40 keys.

    What I hadn't realised is that the keyboard arrives "locked" in game mode - and you have to go to the online app and run the unlocking script.  This records the electronic serial number of the keyboard and returns an unlock code - which you have to enter.

    Once unlocked - it functions more or less as expected.

    In the online app you have the option to access a plethora of original ZX Spectrum games  - often recreated by the original authors.

    In addition there is emulation of Sinclair 48K and 128K BASICs  - so kids can have a go at learning some BASIC instead of Scratch or Minecraft this Christmas.

    The keyboard feels just like the old one - and needs a good thump on every key  - just to be certain that it's actually sent the character.  There will be no awards for speed - touch typing on this one.

    The world has moved on a lot since 1982 -and we have all got used to real keyboards, a multitude of function keys - and mice with scroll wheels.

    I was never a games player back in the early 1980s - so I doubt I will be doing much gaming on this modern recreation. However if you have a spare £90 or so  - it might make some sad, middle aged "Bedroom Warriors" very happy this Christmas.

    Z80 Emulation and FPGAs

    The Z80 was the first microprocessor I encountered properly, in my last 3 years of senior high school  - and I spent an inordinate amount of time in my youth working with it.

    Z80's are quite rare now - and the old 16K x1 or 64K x1 DRAM memories that they were connected to are no longer commercially available.

    For this reason, generally the Z80 experience has been recreated either in software on a modern PC, or in hardware on a FPGA.  Both of these routes are being voraciously pursued by a community of enthusiasts  - hell bent on keeping the 1980's  8-bit experience alive.

    The key to it is creating a machine - virtual or real, which can munch its way correctly through Z80 machine code  - in a similar way to how the Java Virtual Machine eats Java bytecode for  breakfast, dinner and tea!

    A lot of the original games - that at the time really existed in cassette tape format have been transcribed into hex files or rereleased.

    The Z80 appeared a year or so after the 6502, and was a more sophisticated beast.  For a start, the designers purposefully made it backwardly compatible with 8080 machine code - in order to make use of the large wealth of CP/M software that existed at that time.  The Z80 and Z80A  in their time were like an 8080 on speed!   The Z80 also had a rich mix of registers - that could be paired together to handle 16 bit operations. There was a DRAM refresh controller on chip too  - that greatly simplified the refreshing cycles of  dynamic RAM.

    It was a great little chip, and thanks to the prolific games programmer fraternity - an awful lot was squeezed out of a pint pot.

    So I decided to have a trawl of the Z80/Spectrum community websites and there were 3 real strands of development that impressed me.

    Doing it in Hardware - Soviet Style

    The Spectrum was very widely cloned and copied in fact - more so than probably any other 8 bit computer in the world - especially in the former Soviet bloc.  To this day there exists an open source computer called the "Pentagon" - a kind of ZX Spectrum 128 copied by enthusiasts, and still available after several generations of hardware. It replaces the Z80 with a faster  Z280 and an FPGA to provide memory interface and video generation logic.  There is a standard VGA video socket,  two PS2 ports for keyboard and mouse - and several other bits of hardware that were not around in the 1980's.  The latest iteration  - the "Speccy 2010" is well covered in detail in retro hardware hacker, Matt West's blog.

    Here's a couple of incarnations of the Speccy 2010




    Doing it in Software - Spanish Style

    The second interesting thing I came across is a very compact emulation of the Z80  - written in '86 assembly language.  At just 4Kbytes long - it rattles its way through Z80 machine code and produces cycle accurately timed Z80 behaviour.   It has been used to good effect on several software Spectrum emulators - in particular  "Bacteria".  Here's a great links page

    There's a lot more to these emulators - with several to choose from.   Several different early 1980's Z80 home computers have been emulated in this way - including several offering online emulation in Javascript.

    Doing it with FPGAs

    This is definitely where the technology is heading  - just recreate the whole 1980's machine complete with Z80 softcore,  ROM, RAM, peripherals and video generation hardware all on the one FPGA.
    This link to the ZX-UNO will get you started.



    So - in this little round-up I have shown that there's more than one way to skin a cat, and the Spectrum despite its senior years  - refuses to lie down and die.



    Colour Coding - Part II




    An early VGA experiment using the FPGA "ZPUino" soft core

    Last week I wrote about some ideas to use colour to enhance the coding experience.  This week I round-up some of the hardware that makes colourful graphics possible on low resource systems.

    The 32-Bit Mass Migration

    The hobbyist market is moving towards more sophisticated microcontrollers, and I guess that within a couple of years we shall all be programming 32 bit ARM devices.

    These will be literally "as cheap as chips", and it will be the 25 year old, 8-bit processors (PICs, AVRs, 8051), manufactured in process geometries that are rapidly becoming obsolete, that become disproportionately expensive - as we rush to geometries around the 10nm mark.

    Atmel have tried to counter this 8-bit stagnation, by releasing some new AVR parts, that are made in a 130nm CMOS process but the writing appears to be on the wall for the humble 8-bit parts. STMicroelectronics are making some of their older processes available to Universities for custom chip design - for about $2400/mm2.

    Atmel have recently launched a range of 300MHz Cortex M7 processors using a 65nm process.  These will eventually migrate to a 40nm process, and have clock frequencies of around 400MHz.  ST Microelectronics have also indicated that they are pushing their M7 processors to 400MHz.

    Whilst single board computers, such as BeagleBone and Raspberry Pi, are now offering multi-core 1GHz processors, are generally intended to work with the Linux O/S, and have achieved large sales success - it is the basic 32-bit microcontroller, programmed in bare metal C (without operating system framework), that is rapidly playing catch-up.

    Within a year or so, 400MHz, 32bit ARM processors will be available on boards - designed for ease of use to the hobbyist and hacker community.  With a vast array of on-chip peripherals, these devices will find their way into flight controllers, software defined radios, games consoles, robotics and anywhere the microcontroller has to perform cycle-accurate real-time control.

    Video for All

    The one key factor that fuelled the microcomputer revolution 35 years ago has to be colour graphics on high street affordable PCs. Few perhaps remember when the only interaction with a computer was via a monochrome serial terminal. The use of colour, brings all levels of computing alive - even for the most basic of microcontroller projects, where a touch screen LCD is now a realistic practicality.

    Traditionally, embedded microcontrollers have had fairly minimal user interfaces - perhaps a character LCD and a few switch button inputs. In reality, a lot of embedded projects have user interfaces that have not progressed much beyond the 1970's calculator era.

    This is perhaps not for the want of processing power, as microcontrollers now have clock frequencies in the hundreds of MHz, and sufficient RAM to support a modest graphics buffer.  The fact is, that until recently, displays have been an expensive luxury, and product designers have at best, been conservative in their designs - in order to keep overall costs down.

    There is now, however, no real need for this frugality, as the price of colour TFT displays has plummeted in recent years as a result of the SmartPhone and Tablet revolution.  The displays used in last year's smartphone are now available very cheaply - virtually dumped on the market, when the SmartPhone Circus moves on to its next destination.

    This is good news for the hobbyist, as there are now a whole bunch of very cheap LCDs and OLEDs available - starting at a couple of dollars. The open source community has stepped up and produced drivers for these displays - making the whole business of adding a colour LCD or OLED to your project a whole lot simpler and cheaper.

    RGB Video Hardware

    From a hardware point of view, generating colour video is really not very difficult - especially when some of the newer ARM Cortex M processors have on-chip LCD controllers that do all the hard work of getting the pixels out of RAM, and sending them to the screen with the right timing and in the right order.

    By offloading the graphics generation to a secondary IC, it can free up a lot of resources from the main processor - allowing stunning graphical performance from what perhaps is a fairly modest processor.

    For  the hobbyist wanting to experiment with DIY graphics hardware - there are currently 3 or 4 options of interest. I am currently evaluating 1-3 below - and have hardware on order to allow me to investigate 4 below.

    1.  STM32 Cortex M4 and Cortex M7 - some parts have on chip LCD controller.
    2.  FPGA - use a soft core processor like the ZPUino -  which is connected to an 800x600 VGA engine
    3.  FTDI FT81x - an embedded video engine IC, known affectionately as EVE, for about £5
    4.  Raspberry Pi class SBCs normally have VGA, composite or HDMI graphics on board

    Getting a VGA Output 

    I have now reached the age where I rely on reading glasses to read small text - and so reading text from a smartphone LCD means reaching for my glasses every time.  Large, flat screen monitors - perhaps of 24" or 27" diagonal - are now available cheaply, and so this would be my preferred choice for viewing any graphical output.  On a 27" diagonal screen, a single pixel of an 800 x 600 image is about 0.75mm square.
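
    As a rough check of that pixel size, here is a minimal Python sketch. It assumes a 16:9 panel with the 800 x 600 image stretched to fill the whole screen (my assumption, the aspect ratio is not stated above); the function name is just for illustration.

```python
import math

def pixel_pitch_mm(diagonal_in, aspect_w, aspect_h, h_pixels, v_pixels):
    """Return (width, height) in mm of one image pixel when the image fills the panel."""
    diag_units = math.hypot(aspect_w, aspect_h)
    width_mm = diagonal_in * 25.4 * aspect_w / diag_units
    height_mm = diagonal_in * 25.4 * aspect_h / diag_units
    return width_mm / h_pixels, height_mm / v_pixels

# Assumed: 27" diagonal, 16:9 panel, 800 x 600 image scaled to full screen.
w, h = pixel_pitch_mm(27, 16, 9, 800, 600)
print(f"{w:.2f} mm wide x {h:.2f} mm high")
```

    Under those assumptions each pixel comes out at roughly three quarters of a millimetre wide, and a little less than that in height because a 4:3 image is being stretched onto a wider panel.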

    Most flat screen monitors will accept an analogue VGA input, with a range of resolutions handled by the multi-sync display.

    The VGA signals consist of a Red, Green and Blue analogue voltage, accompanied by horizontal and vertical synchronisation signals on 2 separate pins.  These signals are relatively easy to create from either an FPGA or an LCD controller - just with some simple resistor networks to reconstruct the analogue voltage from the parallel RGB digital outputs.
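
    To illustrate how such a resistor network can be sized, here is a hedged Python sketch of a binary-weighted DAC for one colour channel. The 3.3V logic level, the 0.7V full-scale video level and the 75 ohm monitor termination are assumptions on my part, and real designs round the results to standard resistor values.

```python
V_LOGIC = 3.3   # assumed logic-high voltage of the digital colour outputs
V_FULL = 0.7    # assumed full-scale video level at the monitor input
R_LOAD = 75.0   # assumed termination resistance inside the monitor

def weighted_resistors(n_bits):
    """Resistor value per bit (LSB first) giving V_FULL when every bit is high."""
    r_parallel = R_LOAD * (V_LOGIC / V_FULL - 1.0)  # all resistors in parallel
    g_total = 1.0 / r_parallel
    full_code = (1 << n_bits) - 1
    # Each bit's conductance is proportional to its binary weight.
    return [full_code / (g_total * (1 << bit)) for bit in range(n_bits)]

def dac_output(code, resistors):
    """Node voltage by Millman's theorem; low bits pull towards 0V."""
    drive = sum(V_LOGIC / r for bit, r in enumerate(resistors) if (code >> bit) & 1)
    conductance = sum(1.0 / r for r in resistors) + 1.0 / R_LOAD
    return drive / conductance

resistors = weighted_resistors(3)
for code in range(8):
    print(code, round(dac_output(code, resistors), 3))
```

    With push-pull outputs the steps are evenly spaced between 0V and the full-scale level, which is why a handful of resistors per colour is enough.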

    Offloading the graphics generation to specialist hardware means that even a low resource microcontroller can have great looking graphics.  Indeed there has been much interest in recent years in recreating some of the classic home computers from the 1980s using FPGA technology. The original Gameduino (2011) - an add-on video shield for the Arduino to allow 8 bit arcade games to be played - ran a VGA video engine inside an FPGA.

    The Gameduino 2 uses an embedded video engine (EVE) device from FTDI to create the graphics, and displays them on a 4.3" LCD.

    The EVE device has been enhanced (EVE2) - and will now support up to 800 x 600 displays. It is likely that a number of products based on EVE 2 will appear in the near future.

    Since the Spring I have been using the ZPUino soft core processor running on an FPGA to generate VGA graphics. My platform of choice is the Gadget Factory Papilio Duo - an FPGA board based on the Xilinx Spartan 6 FPGA coupled to a fast 2MB SRAM. When fitted with a "Computing Shield", which provides a VGA connector and PS/2 connectors for keyboard and mouse, it makes a compact platform. This combination allows up to 800 x 600 VGA - and it is quick and easy to programme in "Arduino" C++ using the Adafruit GFX Graphics library.

    The ability to write coloured text, individual pixels or graphics easily to anywhere on a screen from within an Arduino sketch - makes the whole experience very rewarding. The ZPUino core can readily animate the graphics with simple code.

    The final method I touched on to generate video is to use the on-chip graphics engine.  Some of the STM32F4xx Cortex M4 and STM32F7xx Cortex M7 microcontrollers have this as an on-chip peripheral.  It can produce stunning graphics, but will take a lot of C programming, and will consume a fair percentage of the processor's throughput to produce a good display.

    Revelations II

    The ability to see information, code or data, displayed in a graphical manner is one of the most powerful techniques within the whole field of human-machine-interface design.  In this post I describe a prototype pcb which will allow colourful graphics to be added to even the humblest of 8-bit micros, in a manner that will reveal their true capabilities - as inspired by this YouTube video:



    Andrew's demonstration above uses the microVGA III - graphics engine which is about $60.

    Other off the shelf video hardware is available - including the more economically priced (but less capable) uVGA - for about $30.

    However, I am looking for something a bit cheaper, and more flexible - that may be applied to almost any microcontroller.

    Last weekend I looked at a couple of ways of producing VGA graphics using simple hardware - and I decided that the two approaches worth following up are:

    1. The use of the on-chip LCD controller on the STM32F746 microcontroller displaying 1024 x 768.

    2. The FTDI FT81x  EVE2 embedded video engine that provides 800 x 600 + touchscreen + sound.

    Having looked at the "Arduino" headers on the STM32F7 Discovery board, I have worked out that I can get an RGB 2:3:2 output from the on-chip LCD graphics controller via the Arduino expansion headers.  Whilst this is one red bit short of the more standard RGB 3:3:2 format, it will be good enough for simple coloured graphics and text.
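
    To get a feel for what seven bits of colour means, here is a small Python sketch that quantises an ordinary 8-bit-per-channel colour down to two red, three green and two blue bits. The packing order within the byte (red in the top bits) and the function names are my own assumptions for illustration; on the real hardware the LCD controller simply drives the top bits of each channel onto the pins.

```python
def pack_rgb232(r, g, b):
    """Keep the top 2 red, 3 green and 2 blue bits of an 8-bit-per-channel colour."""
    return ((r >> 6) << 5) | ((g >> 5) << 2) | (b >> 6)

def unpack_rgb232(value):
    """Expand a packed 7-bit colour back to approximate 8-bit channels."""
    r2 = (value >> 5) & 0x3
    g3 = (value >> 2) & 0x7
    b2 = value & 0x3
    return (r2 * 255 // 3, g3 * 255 // 7, b2 * 255 // 3)

orange = pack_rgb232(255, 165, 0)
print(orange, unpack_rgb232(orange))
print("distinct colours:", 1 << 7)
```

    That gives 128 distinct colours, which is plenty for coloured text and simple charts.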

    Also during the week I have taken delivery of the FTDI VM810 - an  Eve2 development board - which drives a 5" LCD. I have found code that will run on the STM32F7 (and STM32F4) Discovery boards - to allow me to drive the FT810 from the STM32.

    The only problem is that the Eve dev-board does not give sensible access to the LCD colour signals, as they all terminate in a very narrow 0.5mm pitch 40 way FPC (flexible printed circuit) connector - a soldering nightmare.

    The solution was obvious - make up a quick pcb that breaks out the Eve IC and provides VGA and PS/2 mouse and keyboard ports.

    The VGA Adaptor Board with PS/2 Mouse & Keyboard Ports
    A 5050 Design.

    Some weeks back I decided on a new 50mm square board format - that is quick and cheap to make yet breaks out sufficient pins to make it useful for prototyping with SMT IC parts.

    The VGA board was going to be the first test of the new format. The image above shows the VGA connector at the top, the FT812/813 embedded video IC below it and the two PS/2 connectors for mouse and keyboard along the bottom edge of the pcb.

    The FT812/13 is a 56 pin QFN.   24 pins are used to generate the 24 bit RGB 8:8:8 video - and the banks of resistors that form the networks are seen surrounding the central IC.

    A General Purpose Board.

    Whilst intended to match with Arduino style headers - this board may be used to provide 800 x 600 VGA output, PS/2 keyboard and mouse to a variety of different development boards.  The pin-headers allow Arduino, Nucleo, Discovery F7 and a variety of other boards to be used.

    The interface between the microcontroller and the board is not at all complex - requiring just an SPI bus (at 3V3 levels), chip select, +5V power and ground to operate - the full list is below:

    The EVE chip requires just 7 (or 8) interconnecting wires to implement the SPI interface and provide power

    SPI_MOSI
    SPI_MISO
    SPI_CLK
    SPI_CS
    /PD         Power Down
    /INT        Interrupt  - optional
    +5V      
    0V
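
    To give a flavour of what travels over those SPI wires, here is a hedged Python sketch that builds the address headers for memory reads and writes. It assumes the framing described in FTDI's programming documentation as I understand it: a 22-bit address sent in three bytes, with the top two bits 00 for a read (followed by one dummy byte) and 10 for a write, and multi-byte data sent little-endian. The example address is only a placeholder, so please check everything against the datasheet before relying on it.

```python
ADDRESS_BITS = 22

def _check(addr):
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address must fit in 22 bits")

def read_header(addr):
    """Three address bytes (top bits 00) plus one dummy byte before data arrives."""
    _check(addr)
    return bytes([(addr >> 16) & 0x3F, (addr >> 8) & 0xFF, addr & 0xFF, 0x00])

def write_header(addr):
    """Three address bytes with the top two bits set to 10."""
    _check(addr)
    return bytes([0x80 | ((addr >> 16) & 0x3F), (addr >> 8) & 0xFF, addr & 0xFF])

def write32(addr, value):
    """Complete byte stream for writing one 32-bit value, little-endian."""
    return write_header(addr) + value.to_bytes(4, "little")

EXAMPLE_ADDR = 0x302000  # placeholder register address, verify against the datasheet
print(read_header(EXAMPLE_ADDR).hex(), write32(EXAMPLE_ADDR, 0x12345678).hex())
```

    The chip select line is held low for the whole of each transaction, so on a real microcontroller these bytes would be sent between asserting and releasing SPI_CS.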

    Code examples are available on the FTDI site, including a driver library for Arduino and code for STM32 and PIC targets.

    Development boards are available cheaply from FTDI (via Farnell, RS, Mouser etc) - but these are oriented towards driving an LCD.  My design is specifically aimed at creating a VGA output suitable for a large flat screen monitor.





    Learning from the Past


    We can learn a lot from looking back at computer systems of the past - that were by their very nature much simpler than those we use today, constrained by memory and the cost of the hardware they were implemented in.  The early computing pioneers had to find practical ways to program and run their machines using very basic technology.

    It's now more than 65 years since EDSAC was designed and constructed at Cambridge University.  It was a serial arithmetic processor, as this consumed vastly less hardware than a parallel machine. Whilst it was really quite a simple, resource limited machine, it was capable of serious mathematical calculation - assisted by the collective genius of the Cambridge maths department.

    EDSAC had some interesting features that are worth noting, as they are still very valid to this day.

    The first was the limited instruction set or "Order Codes" - which had to be input into the machine memory from a 5-bit paper tape. This was standard telegraphic practice at the time, and by way of a couple of shift characters, was able to input capital letters, numerals and some punctuation symbols. With this limited character set of around 60 symbols - the whole instruction set was coded - and this naturally made it a minimal instruction set machine.

    The capital alphabetical characters were used by way of instruction mnemonics - mainly because they were easy to read and remember.

    The second notable feature was that despite the limited instruction set, there were specific codes for paper tape input (I)  and teleprinter output (O).  These operations were central to the whole functioning of the machine, and were given their own high prominence instructions.

    The boot up sequence of EDSAC was referred to as "Initial Orders".  This was effectively a hard coded bootloader routine that allowed paper tape to be read direct into the main memory.  It was hardwired using uniselectors - that were a common electromechanical data-selector (mux - demux) used in telephone exchanges.

    EDSAC had limited memory - which consisted of 32 mercury delay line "tanks" which could store initially 256  36 bit words. This was later extended to 1024 words.

    Instructions were executed at a rate of approximately 600 per second.

    Details of the instruction set are given in this poster.

    EDSAC is currently being recreated from original drawings, photographs and contemporary notebooks at the National Museum of Computing in Bletchley Park.

    Applying it Today

    The value of studying a simple machine like EDSAC is that it illustrates how the core of the machine, the instruction decoder, the arithmetic unit and the memory work together to make a functioning computer.   These techniques may be applied to the design of any simple processor - translated into whatever hardware is available - today FPGA is the most likely choice - allowing a small fast and agile machine.

    It is clear that the machine language for EDSAC was compact, memorable, human readable  and with a little practice could be written down on paper,  and the various jumps worked out manually.  The first programs would have to be done this way  - because there was no other solution. Assemblers and Compilers were still a long way off.
     
    Notable from this design was the minimum instruction set and the use of alphabetical character as mnemonics.  The 5 bit character was decoded directly into the instruction field.  After the instruction came the 10 bit memory address on which the instruction operated.  It appeared that literals were not coded in the instruction - however - some constants like 2 and 10 were placed in low address memory locations.
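
    As a simplified illustration of that instruction layout, here is a Python sketch that packs and unpacks a 5 bit operation code and a 10 bit address. It deliberately ignores any other bits in the real EDSAC word, and the numeric opcode values used are arbitrary examples, not EDSAC's actual teleprinter codes.

```python
OPCODE_BITS = 5
ADDRESS_BITS = 10
OPCODE_MASK = (1 << OPCODE_BITS) - 1
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1

def encode_order(opcode, address):
    """Pack a 5-bit opcode and a 10-bit address into one 15-bit value."""
    if not 0 <= opcode <= OPCODE_MASK:
        raise ValueError("opcode must fit in 5 bits")
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address must fit in 10 bits")
    return (opcode << ADDRESS_BITS) | address

def decode_order(word):
    """Split a packed order back into (opcode, address)."""
    return (word >> ADDRESS_BITS) & OPCODE_MASK, word & ADDRESS_MASK

word = encode_order(21, 1000)   # arbitrary example values
print(f"{word:015b}", decode_order(word))
```

    Because the operation code is simply the character read from the tape, decoding an order needs nothing more than a shift and a mask, which is part of why such a machine maps so neatly onto simple hardware.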
     
    A machine like EDSAC could be readily ported to an FPGA, with the core of the cpu described in about 100 lines of Verilog.  This was the approach taken by James Bowman with his J1 Forth cpu - a simple machine, implemented in a small FPGA and intended for high speed operation.  Many of the features of the EDSAC are apparent if you look closely at the J1 design - although fundamentally one is a load-store architecture and the other a stack architecture.

    FPGAs have block RAM resources that are arranged in 9 bit wide arrays - the extra bit being available for parity or other purposes.  This means that 18, 36 and 72 bit structures are all possible in the high speed block RAM.

    Even in a small FPGA there are sufficient logic blocks to implement multiple processor cores, linked by shared dual port RAM. Two or four EDSAC cores could fit into a Spartan 6 LX9. These small cores are designed for efficient execution of code, within a small address range - so suited for small applications.

    The Fabric of Computing

    This week Jean Claude Wippler has been running a week long series of posts discussing what he has termed The Fabric of Computing (TFoC).

    He opens by asking whatever happened to the golden days of interactive computing, when great things were achieved on processors that were very much resource limited compared to those we use today.

    He looked at how 40 years ago, some very creative programming led to the creation of some clever applications such as Tiny Basic - which at a pinch, could be coded into about 2500 bytes of memory. This was a complete workable programming language, line editor and interpreter - shoehorned into the smallest of memory maps.

    Jean Claude then compared this with an application that many of us are familiar with  - the Arduino IDE. This now has a disk image of some 370Mbytes - just so that we can program an ever increasing range of microcontrollers in C++. He concluded that there has to be a better way - and I tend to agree with him.

    I too am a fan of minimal computing, and I have even ventured into the design of an interpreted language that can run on virtually any microcontroller - and offer a range of commands, aimed specifically at exercising the hardware.

    LED flashing, audio tones, PWM, ADC reading, digital input and output form just a few of the commands available directly to the user via a serial terminal. Loops, integer arithmetic, conditional branches, millisecond and microsecond delays, and easy extension from the keyboard terminal are key features of this minimal user interface.


    The language directly interprets the serial characters input from the terminal (or a file) and executes a series of subroutines. 

    The characters are chosen for ease of memorability - not unlike the "Order Codes" that were used in the 1949 EDSAC instruction set - see poster. 


    The "language" was inspired by Ward Cunningham's Txtzyme Interpreter - which I have built upon and greatly enhanced to include integer maths, defining new words, compilation, and a host of other features. 
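
    To show the general idea of interpreting characters one at a time, here is a tiny Python sketch in the spirit of Txtzyme. It is not the SIMPL source (which is written in C) and supports only a hypothetical subset: digits load a number into x, p records x as printed output, and { } repeats the enclosed commands x times.

```python
DIGITS = "0123456789"

def find_closing_brace(program, open_index):
    """Return the index of the '}' matching the '{' at open_index."""
    depth = 0
    for j in range(open_index, len(program)):
        if program[j] == "{":
            depth += 1
        elif program[j] == "}":
            depth -= 1
            if depth == 0:
                return j
    raise ValueError("unbalanced '{' in program")

def run(program, output=None):
    """Interpret one line of commands; returns the list of printed values."""
    output = [] if output is None else output
    x = 0
    i = 0
    while i < len(program):
        ch = program[i]
        if ch in DIGITS:
            # Accumulate a decimal number into x.
            x = 0
            while i < len(program) and program[i] in DIGITS:
                x = x * 10 + int(program[i])
                i += 1
            continue
        if ch == "p":
            output.append(x)
        elif ch == "{":
            close = find_closing_brace(program, i)
            body = program[i + 1:close]
            for _ in range(x):
                run(body, output)
            i = close
        i += 1  # unknown characters are simply skipped
    return output

print(run("3{7p}"))
```

    Each command character maps to one small action, so adding a new command is just a matter of adding another branch - on a microcontroller that branch would call a subroutine that touches the hardware.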

    The language "SIMPL" has been continually evolving since May 2013 - so I refer you to my original blog post, and you can follow the progress from there.

    At the moment it is coded in C for portability, and it sets up a virtual machine on the target processor. I have ported it to everything from the Arduino ATmega to STM32xxx and MSP430.

    SIMPL is based on a virtual stack machine - which makes it very suitable for "Forth - like" languages. Many of the ideas in SIMPL were borrowed from Forth, and the SIMPL vocabulary is almost a subset of Forth. 

    I am now working towards porting SIMPL onto the J1b Forth Processor - to get blistering performance from a specialised, but simple open core cpu.


    In the future, I see SIMPL as almost equivalent to a bootloader - in that it is a tiny program, loaded into the target microcontroller once at the start of the debug session - and from that point on it gives you access to a series of debugging resources, a user interface and the means to write interactive interpreted code to exercise whatever code that you are working on.


    Any microcontroller, once programmed with the SIMPL kernel, will be able to interpret the SIMPL commands - a form of bytecode - either typed from the terminal or loaded from a file.


    What Next?

    Today I received another Papilio Duo FPGA board from my friends at Gadget Factory. This is the Spartan 6 LX9 FPGA connected to 2MB of fast static RAM.
    I opted for the offer of the free "Computing Shield" - that was available as a special deal a couple of weeks ago.
    The computing shield allows connection of the following hardware:
    VGA output  RGB 4:4:4
    PS/2 Keyboard and Mouse
    micro SDcard
    RS232 COM Port
    2 Atari Games controllers
    2 Audio channels
    4 User LEDs
    4 User Switches
    Reset Switch
    1 "Grove" connector
    With this hardware it should be possible to recreate a comfortable computing environment.



    The J1b Forth CPU - on a Papilio Duo

    The Papilio Duo - A Spartan 6 FPGA board with 2MB SRAM


    Today, for the first time, I have a working softcore processor running in an FPGA on the Papilio Duo board. It's a J1b Forth cpu, which is a 32 bit, minimum instruction set stack processor.

    Thanks go to James Bowman, the J1 designer at Excamera Labs, for supplying the bitfile of a slightly speed-reduced variant that now programs and runs on my newest Papilio Duo Spartan 6 FPGA board.

    James has supplied his "SwapForth" that is an ANS compatible Forth  - with some extensions to suit the Papilio hardware.

    I am still having some beginner's teething troubles with Python, so I am running the serial comms using Termite - a terminal application.  It is important to ensure that DTR is set low - otherwise the board is held in reset.

    Forth is an interesting language  - with an unusual Reverse Polish syntax - that slowly grows on you, the more you use it.

    Forth is also a fast executing language - especially on a processor that has been optimised to execute Forth almost as its native machine language.

    This build (80MHz) of the  J1b can execute 10 million empty DO LOOPs per second and can sum a sequence of 100,000,000 integers in about 23 seconds.

    When it comes to toggling an I/O pin under processor control, the J1b can do about 8MHz using its io! word to toggle the pin.
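
    Those figures can be turned into approximate clock cycles per operation with a little arithmetic. This Python sketch assumes the 80MHz clock quoted above and treats the 8MHz toggle figure as 8 million io! writes per second, which is my interpretation.

```python
CLOCK_HZ = 80_000_000  # clock rate of this J1b build, as quoted above

def cycles_per_op(ops_per_second):
    """Average clock cycles spent on each operation."""
    return CLOCK_HZ / ops_per_second

empty_loop = cycles_per_op(10_000_000)       # empty DO LOOP iterations
sum_loop = cycles_per_op(100_000_000 / 23)   # summing 100,000,000 integers in 23 s
pin_toggle = cycles_per_op(8_000_000)        # io! pin writes (assumed meaning of 8MHz)

print(f"empty loop: {empty_loop:.1f} cycles")
print(f"sum loop:   {sum_loop:.1f} cycles")
print(f"pin toggle: {pin_toggle:.1f} cycles")
```

    So an empty loop costs about 8 clock cycles per iteration, and a loop that also does an addition costs roughly 18.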

    The J1 is a soft core processor waiting to be discovered. James's SwapForth makes it relatively easy to program.




    Colour Coding - Part III

    This is the first of my 5050 boards
    that has gone to meet its makers....

    As Frankie Howerd used to say....

    The Prologue

    In the last week, I have designed a compact VGA generation pcb, which will provide a test bed for FTDI's latest second generation embedded video engine IC ("Eve") - either FT812 or FT813.

    This board is in the form of a 50mm x 50mm shield - that will work with Arduino compatible devices - provided that they have a 3.3V system voltage (NOT 5V!!). The EVE IC  is not 5V tolerant!

    This includes all STM Nucleo boards, STM32F7 Discovery board - and my own design "Piano Forte" board which is STM32F1xx, STM32F3xx or STM32F4xx with Arduino Headers.

    The board also includes an interface for a PS/2 Keyboard and Mouse.

    I have ordered a small batch of these boards from Ragworm - a UK PCB vendor, and hope to make some progress over the Christmas break.

    Some Details

    The VGA board uses an FT812 to generate the VGA signals in 24-bit colour at a resolution of 800 x 600 pixels.

    The FT812 or "Eve" chip (embedded video engine) is a very capable graphics co-processor with 1MB of on-chip graphics RAM.  It can provide a low resource microcontroller with all the elements of a graphical user interface for just a few dollars.
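
    For reference, the timing budget of an 800 x 600 display can be worked out in a few lines. The Python sketch below uses the usual VESA 800 x 600 at 60Hz figures (40MHz pixel clock) as an assumption about the mode the board will run, and compares the pixel clock with the 60MHz PCLK limit quoted on the FT81x datasheet; the class and constant names are purely illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AxisTiming:
    """Active, front porch, sync and back porch counts for one axis."""
    active: int
    front_porch: int
    sync: int
    back_porch: int

    @property
    def total(self) -> int:
        return self.active + self.front_porch + self.sync + self.back_porch

# Assumed mode: standard VESA 800 x 600 at 60Hz.
SVGA_H = AxisTiming(active=800, front_porch=40, sync=128, back_porch=88)  # pixels
SVGA_V = AxisTiming(active=600, front_porch=1, sync=4, back_porch=23)     # lines
PIXEL_CLOCK_HZ = 40_000_000
EVE_MAX_PCLK_HZ = 60_000_000  # PCLK limit from the datasheet feature list

def refresh_rate_hz(h, v, pixel_clock_hz):
    """Frames per second for a given pair of axis timings and pixel clock."""
    return pixel_clock_hz / (h.total * v.total)

print(SVGA_H.total, "clocks per line,", SVGA_V.total, "lines per frame")
print(f"refresh: {refresh_rate_hz(SVGA_H, SVGA_V, PIXEL_CLOCK_HZ):.2f} Hz")
print("within PCLK limit:", PIXEL_CLOCK_HZ <= EVE_MAX_PCLK_HZ)
```

    The 40MHz pixel clock sits comfortably inside the chip's limit, which is why 800 x 600 is a practical target for this board.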

    The FT812 is connected to the host processor "Arduino" using a very conventional SPI interface, along with an interrupt line (optional) and a Power Down signal.

    The FT812 provides 8 digital outputs for each of the RGB colours, and these are weighted and summed together using a very simple resistor network to produce an analogue video signal of red, green and blue components.

    Whilst the board is arranged with Arduino style headers, it can be used with any other 3V3 dev board using jumper leads - as only 8 connections are needed to interface to it.

    As this board is purely a VGA testbed - none of the LCD specific signals are brought to connectors.

    The PCB supports a serial connection using an FTDI cable, plus a variety of different break-out options.

    A set of optional resistors fitted to the underside of the PCB allows it to be used solely as a passive VGA adaptor (without the FT812 fitted) to work with the STM32F7 Discovery board - allowing up to 1024 x 768 in 7 bit colour.

    The Hardware Set Up.

    When used as a VGA generation shield for a 3V3  Arduino - like device - the following pins are used to access the FT812


    D3   Keyboard Clock
    D4   Keyboard Data
    D6   Mouse Clock
    D7   Mouse Data
    D9   SPI_CS
    D11  SPI_MOSI
    D12  SPI_MISO
    D13  SPI_SCK

    A1   /INT
    A2   /PD

    Power is supplied to the board via the 5V power pin, and is regulated down to 3V3 by IC2 - an MCP1702 regulator, which can supply a maximum of 250mA.

    An FTDI serial cable can be fitted into connector JP4 (next to the analogue inputs connector) and using a pair of jumpers JP2 and JP3  allows access to the D0 and D1 Rx and Tx pins.


    Other Connectors

    The remaining un-jumpered pins of JP5 provide breakout for the GPIO pins plus PD and /INT of the FT812. The additional pins - on the end of the Arduino "power" header give access to the resistive touchscreen sensing network - and could be used as such, or used carefully as 10 bit resistive analogue inputs.  


    The backlight pin provides a 7-bit duty cycle PWM signal, with a frequency between 250Hz and 10kHz.

    Pixel Clock, Data Enable and Disp have not been routed out on the first prototype boards.
    The Audio pin has not been routed out on the first prototype boards.

    This board will hopefully provide the VGA graphics, keyboard and mouse interface to a variety of dev boards, thus expanding their capabilities manifold.  If the basics of the board are sound, it can later be augmented to cater for audio and a microSD card.

    Introducing Eve.

    The EVE chip has an impressive specification - here's a copy of the 1st page of the datasheet

    The FT81x is a series of easy to use graphic controllers targeted at embedded applications to generate high-quality Human Machine Interfaces (HMIs).

     It has the following features:

    •  Advanced Embedded Video Engine(EVE) with high resolution graphics and video playback
    •  FT81x functionality includes graphic control, audio control, and touch control interface
    •  Pinout backward compatible with FT800 (FT810) and FT801 (FT811).
    •  Support multiple widgets for simplified design implementation
    •  Built-in graphics operations allow users with little expertise to create high-quality displays
    •  Support 4-wire resistive touch screen (FT810/FT812) 
    •  Support capacitive touch screen with up to 5 touches detection (FT811/FT813)
    •  Hardware engine can recognize touch tags and track touch movement. Provides notification for up to 255 touch tags.
    •  Enhanced sketch processing
    •  Programmable interrupt controller provides interrupts to host MCU
    •  Built-in 12MHz crystal oscillator with PLL providing programmable system clock up to 60MHz
    •  Clock switch command for internal or external clock source. External 12MHz crystal or clock input can be used for higher accuracy.
    •  Video RGB parallel output; configurable to support PCLK up to 60MHz and R/G/B output of 1 to 8 bits
    •  Programmable timing to adjust HSYNC and VSYNC timing, enabling interface to numerous displays
    •  Support for LCD display with resolution up to SVGA (800x600) and formats with data enable (DE) mode or VSYNC/HSYNC mode 
    • Support landscape and portrait orientations 
    • Display enable control output to LCD panel
    • Integrated 1MByte graphics RAM, no frame buffer RAM required
    • Support playback of motion-JPEG encoded AVI videos
    • Mono audio channel output with PWM output
    • Built-in sound synthesizer
    • Audio wave playback for mono 8-bit linear PCM, 4-bit ADPCM and µ-Law coding format at sampling frequencies from 8kHz to 48kHz. Built-in digital filter reduces the system design complexity of external filtering
    • PWM output for display backlight dimming control 
    • Advanced object oriented architecture enables low cost MPU/MCU as system host using SPI interfaces
    • Support SPI data lines in single, dual or quad mode; SPI clock up to 30MHz 
    • Power mode control allows the chip to be put in power down, sleep and standby states 
    • Supports I/O voltage from 1.8V to 3.3V
    • Internal voltage regulator supplies 1.2V to the digital core 
    • Built-in Power-on-reset circuit
    • -40°C to 85°C extended operating temperature range
    • Available in a compact Pb-free, VQFN-48 and VQFN-56 package, RoHS compliant



    PCB - Second Function  - As a Passive VGA network for the Discovery F7 

    The shield may also be used to fit to a STM32F7 Discovery board - which also has Arduino style connectors. The STM32F746 has an on-chip video generation engine, which synthesises the signals needed to run a colour LCD. Conveniently several of these higher bit colour signals and H-sync appear on the Arduino headers. 

    When used in this mode - the following pins can be configured to have RGB data on them.

    D0        Green 6
    D1        H-Sync
    D2        Red 7 
    D5        Green 5
    D8        Green 7 
    D10      Red 6
    D14      Blue 7
    D15      Blue 6

    VSYNC is missing, but can be synthesised by using H-Sync to clock a timer input.
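
    The idea of deriving VSYNC by counting H-Sync pulses can be modelled in a few lines. This Python sketch is only a simulation of the counting logic, not STM32 timer code, and its default line counts assume standard 800 x 600 at 60Hz timing (628 lines per frame, with a 4 line sync pulse starting after 600 active lines and 1 front porch line); other modes would need different numbers.

```python
class VsyncFromHsync:
    """Counts HSYNC pulses and reports when VSYNC should be asserted."""

    def __init__(self, total_lines=628, sync_start=601, sync_lines=4):
        self.total_lines = total_lines
        self.sync_start = sync_start    # first line (counting from 0) of the sync pulse
        self.sync_lines = sync_lines
        self.line = 0

    def on_hsync(self):
        """Call once per HSYNC pulse; returns True while VSYNC is active."""
        in_sync = self.sync_start <= self.line < self.sync_start + self.sync_lines
        self.line = (self.line + 1) % self.total_lines
        return in_sync

counter = VsyncFromHsync()
states = [counter.on_hsync() for _ in range(628 * 2)]  # simulate two frames
print("sync lines per frame:", sum(states) // 2)
```

    On the microcontroller the same job would fall to a hardware timer clocked by H-Sync, with a compare output producing the pulse, so that no processor time is spent on it.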

    To utilise this mode of operation it is necessary to fit the resistor network to the underside of the pcb and fit the jumper headers JP1 and JP5.  Ten jumper links are needed to connect every pin on JP1 across to its neighbour on JP5.

    If using an F7 Discovery board, there is an additional SPI port and UART available as alternative functions on the Analogue Input pins. The UART can be jumper selected using JP3 and JP4 so that it is accessible from the FTDI connector.  All of these pins accept analogue inputs of 12 bits.

    A0   PA0      UART4_TX
    A1   PF10
    A2   PF9     SPI5_MOSI
    A3   PF8     SPI5_MISO
    A4   PF7     SPI5_SCK        UART7_TX
    A5   PF6     SPI5_NSS        UART7_RX


    Applications

    Colour graphics really makes computers come alive - and a simple video interface is an asset to any microcontroller.

    It has been seen that the Gameduino and Gameduino2 provide a spectacular graphical environment for even the 8-bit ATmega328 Arduino.

    Having a colour text output, that can be displayed on a large screen monitor - independent of an IDE- will give a whole set of new dimensions to developing and debugging code on any microcontroller.


    The addition of a keyboard and mouse also makes for a better computing environment.
