---------------------------------------------------------------------------

Library:      Atomthreads
Author:       Kelvin Lawson <info@atomthreads.com>
Website:      http://atomthreads.com
License:      BSD Revised

---------------------------------------------------------------------------

AVR/ATMEGA PORT

This folder contains a port of the Atomthreads real time kernel for the
AVR/ATmega processor architecture.

All of the cross-platform kernel code is contained in the top-level 
'kernel' folder, while ports to specific CPU architectures are contained in
the 'ports' folder tree. A port to a CPU architecture can comprise just one
or two modules which provide the architecture-specific functionality, such
as the context-switch routine which saves and restores processor registers
on a thread switch. In this case, the kernel port is split into two files:

 * atomport.c: Those functions which can be written in C
 * atomport-asm.s: The main register save/restore assembler routines

Each Atomthreads port requires also a header file which describes various
architecture-specific details such as appropriate types for 8-bit, 16-bit
etc variables, the port's system tick frequency, and macros for performing
interrupt lockouts / critical sections:

 * atomport.h: Port-specific header required by the kernel for each port

A couple of additional source files are also included here:

 * tests-main.c: Main application file (used for launching automated tests)
 * uart.c: Simple driver for the ATmega UART

Atomthreads includes a suite of automated tests which prove the key OS
functionality, and can be used with any architecture ports. This port
provides an easy mechanism for building, downloading and running the test
suite to prove the OS on your target. You may also use the simavr
simulator to run the entire test suite and prove the OS without real
hardware.

The port was carried out and tested on both an ATmega16 and ATmega32
running within an STK500 board, utilising the gcc-avr tools. It is possible
to use it with other processors in the ATmega range, as well as other
hardware platforms and compilers, with minimal changes. Platform and
compiler specific code has been kept to an absolute minimum.


---------------------------------------------------------------------------

PREREQUISITES

The port works out-of-the-box with the GCC tools (for building) and UISP
(for programming). It can be built on any OS for which gcc-avr is
available, but by way of example you can install all of the required tools
on Ubuntu/Linux as follows:

 * sudo apt-get install gcc-avr binutils-avr avr-libc uisp

Use with alternative compiler tools may require some modification, but you
can easily replace UISP by your own favourite programmer if required.


---------------------------------------------------------------------------

BUILDING THE SOURCE

A Makefile is provided for building the kernel, port and automated tests.
The full build is carried out using the following (replacing PART by the
ATmega device you are using):

 * make PART=atmega128

All objects are built into the 'build' folder under ports/avr. The build
process builds separate target applications for each automated test, and
appropriate ELF or HEX files can be found in the build folder ready for
downloading to and running on the target. Because of the limited resources
on the ATmega, and the large amount of automated tests, each test is built
and run as a separate application.


All built objects etc can be cleaned using:

 * make clean


The Atomthreads sources are documented using Doxygen markup. You can build
both the kernel and AVR port documentation from this folder using:

 * make doxygen


---------------------------------------------------------------------------

PROGRAMMING TO THE TARGET DEVICE

Application HEX files which are built into the build folder can be
downloaded to the target using:

 * make PART=<cpu> program app=<appname>

For example to download the 'sem1.hex' test application to an ATmega128
target use:

 * make PART=atmega128 program app=sem1

This uses UISP which will write the application into flash and reset the
CPU to start running the program automatically.


---------------------------------------------------------------------------

STK500 SPECIFICS

If the STK500 is used, you should connect your serial programming cable to
the "RS232 CTRL" connector.

The test applications make use of a LED to indicate test pass/fail status.
This is currently configured to use a bit in PORTB, which on the STK500
maps to a LED on the carrier board. You may change the port and register
bit in tests-main.c to utilise a different pin on other hardware platforms.
You may also completely omit the LED flashing in the test application if
you prefer to use the UART for monitoring test status.

The test applications also make use of the UART to print out pass/fail
indications and other information. For this you should connect a serial
debug cable to the STK500 "RS232 SPARE" connector. You should also connect
PD0/1 to "RS232 SPARE RXD/TXD" respectively using a two-wire link. Use of
a UART is not required if you prefer to use the LED or some other method
of notifying test pass/fail status.


---------------------------------------------------------------------------

RUNNING THE AUTOMATED TESTS

Atomthreads contains a set of generic kernel tests which can be run on any
port to prove that all core functionality is working on your target. In
order to accommodate a full testing regime, a large number of test threads
are spawned which on ATmega platforms requires at least 1KB and possibly
more RAM. Bear this in mind if you wish to run all of the automated tests
on your target platform.

The full set of tests can be found in the top-level 'tests' folder. Each of
these tests is built as an independent application in the 'build' folder.
Run them individually using:

 * make PART=<cpu> program app=<testname>

For example to run the 'kern1.c' test on an ATmega128 device use:

 * make PART=atmega128 program app=kern1

Before running the program and data size for the application is printed
out on the terminal. You can use this to verify that your platform has
enough program and data space to run the application.

To view the test results, connect a serial debug cable to your target
platform (defaults to 9600bps 8N1). On starting, the test applications
print out "Go" on the UART. Once the test is complete they will print
out "Pass" or "Fail", along with other information if the test failed.

Most of the tests complete within a few seconds, but some (particularly
the stress tests) can take minutes, so be patient. 

Note that some tests utilise four test threads, which together with the
idle thread and main test thread can consume significant RAM resource.
Those tests which utilise four threads will generally only run on
platforms with greater than 1KB RAM. They do currently run on a system
with 1KB RAM but only with stack-checking disabled and this may change
in future as the RAM usage changes. For platforms with only 1KB internal
SRAM you should disable stack-checking by disabling STACK_CHECK in the
Makefile, or ensuring that the ATOM_STACK_CHECKING macro is not defined.
Platforms with only 512 bytes RAM will be unable to run many of the
automated tests.

The test application also toggles a bit on PORTB to indicate test success
or failure (once per second if the test was successful and once every
1/8 second if the test failed). On the STK500 this can be connected to an
LED for a visual indication of a passed test without the UART. If you do
not wish to use the UART you may use the LED to indicate passed tests; a
UART is not a requirement for running the tests.

The full suite of tests endeavours to exercise as much of the kernel code
as possible, and can be used for quick confirmation of core OS
functionality if you ever need to make a change to the kernel or port.

The test application main() is contained in tests-main.c. This initialises
the OS, creates a main thread, and calls out to the test modules. It also
initialises the UART driver and redirects stdout via the UART.


---------------------------------------------------------------------------

RUNNING TESTS WITHIN THE SIMAVR SIMULATOR

It is also possible to run the full automated test suite in a simulator
without programming the test applications into real hardware. This is very
useful for quick verification of the entire test suite after making any
software changes, and is much faster than download each test application to
a real target.

A single command runs every single test application, and checks the
(simulated) UART output to verify that each test case passes.

This requires two applications on your development PC: expect and the
simavr simulator. You can edit the SIMAVR variable in the Makefile to point
it at the simavr install location on your PC and then run:

 * make PART=atmega128 simtests

This will run every single test application within the simulator and quit
immediately if any one test fails. You should pick an ATmega device that is
supported by the simavr simulator: atmega128 is a good choice.

The ability to run these automated tests in one command (and without real
hardware) allows you to easily include the OS test suite in your nightly
build or continous integration system and quickly find out if any of your
local changes have caused any of the operating system tests to fail.


---------------------------------------------------------------------------

WRITING APPLICATIONS

The easiest way to start a new application which utilises the Atomthreads
scheduler is to base your main application startup on tests-main.c. This
initialises the OS, sets up a UART and calls out to the test module entry
functions. You can generally simply replace the call to the test modules by
a call to your own application startup code.


---------------------------------------------------------------------------

PORTING TO OTHER HARDWARE PLATFORMS

If you are using a CPU other than the ATmega16, change the PART definition
in the Makefile to your own CPU, or specify the PART on the make command
line using:

 * make PART=atmega128

On CPUs with multiple UARTs, the port uses UART0 to output debug 
information. If you wish to use an alternative UART you may change the
registers in uart.c. Note that a UART is not vital for Atomthreads
operation anyway, it is only used by the test applications for indicating
test pass/fail status. If you do not wish to use a UART you may instead
flash a LED or use some other indication mechanism.

The Atomthreads port uses Timer 1 to drive the system tick. If you wish
to use some other timer then you may do so by modifying atomport.c.

Note that the build scripts make no assumptions about the RAM and program
space available on your platform, partly because you may have external RAM
over and above the ATmega device's internal SRAM. You must, therefore,
ensure that you have sufficient RAM and program space available for the
applications that you build. This information can be obtained by passing
the application ELF filename to the avr-size utility. 


---------------------------------------------------------------------------

RAM FOOTPRINT & STACK USAGE

The Atomthreads kernel is written in well-structured pure C which is highly
portable and not targeted at any particular compiler or CPU architecture.
For this reason it is not highly optimised for the AVR architecture, and by
its nature will likely have a higher text and data footprint than an RTOS
targeted at the AVR architecture only.

It is very lightweight compared to many of the alternatives, but if you
have very limited RAM resources then you may prefer to use a more 
AVR-specific RTOS such as those with shared stacks or built entirely from
assembler. The emphasis here is on C-based portable, readable and 
maintainable code which can run on any CPU architecture, from the 8-bitters
up.

A good rule of thumb when using Atomthreads on the AVR architecture is that
a minimum of 1KB RAM is required in order to support an application with 4
or 5 threads and the idle thread. If a minimum of approximately 128 bytes
per thread stack is acceptable then you will benefit from the easy-to-read,
portable implementation of an RTOS herein.

The major consumer of RAM when using Atomthreads is your thread stacks.
Functionality that is shared between several kernel modules is farmed out
to separate functions, resulting in readable and maintainable code but
with some associated stack cost of calling out to subroutines. Further,
each thread stack is used for saving its own registers on a context
switch, and there is no separate interrupt stack which means that each
thread stack has to be able to cope with the maximum stack usage of the
kernel (and application) interrupt handlers.

Clearly the stack requirement for each thread depends on what your
application code does, and what memory model is used etc, but generally
you should find that 128 bytes is enough to allow for the thread to be
switched out (and thus save its registers) while deep within a kernel
or application call stack, and similarly enough to provide stack for
interrupt handlers interrupting while the thread is deep within a kernel
or application call stack. You will need to increase this depending on
what level of stack the application code in question requires.

At this time the maximum stack consumed by the test threads within the
automated test modules is 124 bytes of stack, and the main test thread has
been seen to consume 198 bytes of stack. At this time the queue9 test is
the largest consumer of test thread stack (124 bytes) and the mutex2 and
mutex5 tests consume the largest main thread stack (198 bytes). If your
applications have large amounts of local data or call several subroutines
then you may find that you need larger than 128 bytes.

You may monitor the stack usage of your application threads during runtime
by defining the macro ATOM_STACK_CHECKING and calling
atomThreadStackCheck(). This macro is defined by default in the Makefile
so that the automated test modules can check for stack overflows, but you
may wish to undefine this in your application Makefiles when you are happy
that the stack usage is acceptable. Enabling ATOM_STACK_CHECKING will
increase the size of your threads' TCBs slightly, and will incur a minor
CPU cycles overhead whenever threads are created due to prefilling the
thread stack with a known value.

With careful consideration and few threads it would be possible to use
a platform with 512 bytes RAM, but not all of the automated test suite
would run on such a platform (some of the test modules use 6 threads: a
main thread together with 4 test threads and the idle thread).

The default stack area used by GCC is at the top of RAM (RAMEND), so when
your application's main() function is called the stack pointer is
initialised to RAMEND. This initial stack is only required during startup
for the main() function, and as soon as the OS is started the stack
pointer switches to the stack areas for the application threads. Once the
OS is started the initial startup stack is no longer used at all. You may
continue to use the top of RAM for your startup stack, but in the test
applications for this port we used another area of RAM. This is an
optimisation to allow all of the automated tests to run on platforms with
1KB RAM. The area reused is the idle thread's stack, which is not
required until the OS is started. Ideally you should provide your own
area for your applications, but this was an optimisation that allowed all
of the automated tests to run on ATmega devices with 1KB RAM and did not
require any AVR-specific changes to the automated test modules. This
startup stack does not require a large amount of RAM, probably less than
64 bytes.

The application's data starts at the bottom of RAM, and this includes all 
of the thread stacks which are statically allocated arrays. The idle
thread, main thread, and automated test thread stacks are allocated here.

This RAM layout is only the one utilised by the test applications. You
may choose whatever layout you like.


---------------------------------------------------------------------------

HANDLING CPUS WITH LARGE PROGRAM SPACES

Some devices such as the ATmega2560/2561 support program memory larger than
128KB. GCC defines __AVR_3_BYTE_PC__ for such devices, which is detected in
the architecture port wherever special handling is required. GCC does not
at this time support function pointers greater than 16-bits, however, which
means that the thread_shell() routine which is the entry point for all
threads must be located in the bottom 128KB of program space. You may need
to force this using a linker directive.

Note that on the AVR architecture program memory is referenced in multiples
of 2 bytes, which is how 16-bits can reference 128KB of program memory
rather than the 64KB which you might expect.


---------------------------------------------------------------------------