Learn the techniques that reduce memory usage allowing your programs to run smaller Arduino based systems.

• 14,443 views
• 22 respects

Components and supplies

 Arduino UNO
×1
 MAX7219 LED 8x8 Dot Matrix Module Assembled
×1

Apps and online services

 Arduino IDE

There maybe times that you are resigned to using a more expensive Arduino not because the you need the IO pins, but you need the extra memory for your program. I will use the Knights Tour puzzle to show how this can be moved from a Arduino Mega 2560 to a Arduino UNO or Arduino Pro Mini system.

The original code base for this puzzle can be found at https://www.hackster.io/krishnalalith/knight-s-tour-3f1ae4. This article is not a critique on coding style rather it is to show techniques that you can use to reduce your memory footprint.

1. Make sure variables only use the minimum size they require:

In the MEGA version, the array that holds the moves is defined as:

`int sol[N + 1][N + 1]`

In C and C++, arrays are zero based, that is the first element in an array is 0 and the last element in an array should be N-1. While it is usually easier to think in a range of 1 to N than 0 to N-1, defining the array as 0 to N means that element 0 is never used in this case. So the 8x8 array is now 9x9 and thus 17 elements in the array are not used.

Secondly, each array element will hold a number between -1 and 64. Integers defined using the int type are 16 bits or 2 bytes. This means it will hold a number between -32768 and 32767. But we don't need this, by defining the array using int8_t, it will half the size of the array and be able to store numbers between -128 and 127. This easily meets the requirements.

In the UNO version the array can now defined as:

`int8_t sol[N][N]`

Bytes saved is 98 (Around 5% of RAM)

Looking at the other variables, we can reduce the following to 8 bit numbers:

`int i, j, step_count, x_move[], y_move[], cnt, k`

only the arrays need to be signed, the others are unsigned

`uint8_t i, j, step_count, cnt, k`

`int8_t x_move[], y_move[]`

While this won't seem to reduce RAM in this case because they are local variables (the compiler only shows global RAM usage), they are stored in a structure called the Stack. The stack is part of RAM and grows from end of memory down to the start of memory. It holds temporary variables, stores registers and the return address for functions. If the stack gets too large, it will start to overwrite your global variables and your program will ultimately crash.

2. Use compiler constants rather than constant variables

In general, all constants should be compiler constants and not defined as variables.

`const float pi = 3.14`

will use 4 bytes where as

`#define pi 3.14`

uses zero bytes.

Don't assume that the optimization done during compile time will automatically do this for you.

Update: The problem with compiler constants is they don't specify a data type. For example:

`#define one 1 `

Here the constant "one" has no data type. To overcome this you can include a data-type as follows:

`#define one (uint8_t)1 `

In the MEGA version, the `x_move[]`, `y_move[]` and `sol[][]` arrays are defined in the function `start_knight_tour`. The `x_move[]` and `y_move[]` arrays never change. The problem with defining arrays as local variables is that you must pass the array (address of array in this case) as a parameter on any function that is going to access that array. This adds extra overhead to the stack. Because the stack is still part of RAM, you actually use an extra 2 bytes every time you pass it as a parameter. In this case you can save stack space by defining them as global variables.

`const int8_t x_move[] = {2, 1, -1, -2, -2, -1, 1, 2};`

`const int8_t y_move[] = {1, 2, 2, 1, -1, -2, -2, -1};`

`int8_t sol[N][N];`

3. Put constant arrays in flash memory

Normally any data you define is stored in RAM. If the data you are storing never changes, you can store it in flash memory by adding the PROGMEM attribute,

`const int8_t x_move[] = {2, 1, -1, -2, -2, -1, 1, 2};`

`next_i = i + x_move[k];`

becomes

`const int8_t x_move[] PROGMEM = {2, 1, -1, -2, -2, -1, 1, 2};`

`next_i = i + pgm_read_byte_near(x_move + k);`

This is slower but the arrays are now stored in flash memory saving precious RAM.

Note: You can also write the above code line as:

`next_i = i + pgm_read_byte_near(&x_move[k]);`

`The` & character in C means "The address of". In this case the address of x_move[k] which is the same as adding k to the start of the x_move array.

4. Dealing with recursion

Recursion is the ability of a function to call itself. There are some programming solutions that are easier to understand using recursion rather than standard iteration methods. The classic example is calculating a factorial:

`int factorial(int x) {`

`if (x <= 1)`

`return 1`

`else`

`return x * factorial(x - 1);`

`}`

When the factorial function is invoked, the calling function will put the parameter values onto the stack (in this case x). Next it puts the return address (the address of the next statement to execute after the function returns) on the stack. Then it jumps to the start of the function. At the end of the function, the return address is obtained from the stack, the stack pointer is adjusted to skip the parameters as these are no longer required and the program continues at the statement after the function.

With recursion, let's say we call `factorial(10)`, the number 10 is placed on the stack, the return address is placed on the stack, the function is executed. In the function we call `factorial(x - 1)` so now the number 9 is placed on the stack, the return address is placed on the stack and the function is executed again. This keeps going until x reaches 1. So 10 numbers and 10 return addresses are saved on the stack. In this case 40 bytes required to calculate 10!. 80 bytes required to calculate 20! and so on.

In general recursion on micro-controllers is not recommended as RAM size is so limited. The example I have given is very simplistic. In practice a lot more overhead is used to invoke a function. Not only are parameters placed on the stack but so to are local variables. It is not uncommon for a function to use hundreds of bytes of stack space.

In the Knight's Tour example, it will try every square on the board. This means it will go to 64 levels deep. Now let's look at how much memory is required on the `knight_tour` function.

MEGA version (20 bytes per call):

`int knight_tour(int sol[N + 1][N + 1], int i, int j, int step_count, int x_move[], int y_move[])`

2 bytes for return address, 2 bytes for address of `sol[N+1][N+1]` array, 2 bytes `int i`, 2 bytes `int j`, 2 bytes `int step_count`, 2 bytes for address of `x_move[]` array, 2 bytes for address of `y_move[]` array,

Local variables `int k`, `next_i`, `next_j` take 6 bytes

Total stack space required for parameters and local variables is 20 * 64 or 1280 bytes.

UNO version (4 bytes per call):

`bool knight_tour(int8_t xy)`

2 bytes for return address, 1 byte for `int8_t xy`

Local variables `int8_t k` takes 1 byte

Total stack space required for parameters and local variables is 4 * 64 or 256 bytes.

5. Understand compiler optimization

Compiler optimization helps to speed up your program. It will look at the code and may decide that rather than transfer a variable in and out of memory to one of its registers, it will use a spare register to store it. This is common with the index of a `for` loop. However, while it is faster, it uses more memory. Not only is there memory allocated for the variable, but now the register it is using needs to be saved on the stack in case the function being called overwrites it. Adding the attribute to turn off optimization on the `knight_tour` function saved 10 bytes of stack every time the function is invoked (64 x 10 = 640 bytes saved!)

`__attribute__((optimize("O0")))`

`bool knight_tour(int8_t xy)`

6. Code specific optimizations

No compiler optimization system will ever compete with the programmer who understands how the code actually works. Take for example this code snippet:

MEGA version:

`int knight_tour(..., int step_count, ...) {`

`:`

`if (knight_tour(..., step_count + 1, ...))`

`:`

`}`

Here you see that step_count is increased by one each time the `knight_tour` function is recursively invoked. Because step_count is passed by value, on the return from the function its value will be the same. In effect 2 bytes of stack space is wasted every time the function is recursively called.

UNO version:

`uint8_t step_count` = 0`;`

`:`

`bool knight_tour(int8_t xy) {`

`:`

`step_count++;`

`solved = knight_tour(next_xy);`

`step_count--;`

`:`

}

Under normal circumstances, using a global variable instead of a local variable is generally frowned upon. However in this case it saves 128 bytes of stack for a cost of 1 byte.

Another code specific optimization was to combine the `x` and `y` variables into a single `xy` variable. Since both `x` and `y` must have a value between 0 and 7, it is easy to obtain the `x` and `y` value from `xy` using the following formula

`x = xy & 0x07;`

`y = xy >> 3;`

`xy = (y << 3) | x ;`

There is no doubt that this is slower but it means only one parameter needs to passed to the recursive `knight_tour` function.

Because the function is recursive, using less local variables is paramount. In the MEGA version it has three local variables. In the UNO version only one is required. The other two are recalculated after returning from the knight_tour function.

Under normal circumstances you wouldn't write code like this. However in this case it was a necessity required to reduce the amount of data stored on the stack.

Without looking at the assembly language generated by the compiler, my code still uses 24 bytes of stack on each recursive call of the `knight_tour` function. This equates to 24 * 64 = 1536 bytes of stack usage. This only leaves 512 bytes for global variables etc. I can't account for all of the extra 20 bytes that the compiler requires to setup the stack frame when calling the function. If anyone has any thoughts on the matter, please share them.

Code

KnightsTourV2.inoC/C++
Knights Tour puzzle rewritten to run on a ATMega328 system such as a Arduino UNO or Arduino Pro Mini
```/*
* Knights Tour
* Original code base: Krishna Lalith (https://www.hackster.io/krishnalalith/knight-s-tour-3f1ae4)
*
* Changes:
* Reduced memory footprint to work in ATMega328 (2474 Flash, 187 SRAM)
* Moved local variables from knight_tour function
* Reduced parameters on knight_tour function
* Moved solution animation from setup to loop
* Added timer and debug routines
* Added 1 second flash on LED to show it is thinking (Takes 7 minutes to find solution)
*/
//#define DEBUG
#include <LedControl.h>

#define CLOCK 11
#define CS 10
#define DIN 12
LedControl lc = LedControl(DIN, CLOCK, CS, 1);

#define N 8
#define N2 64

const int8_t x_move[] = {2, 1, -1, -2, -2, -1, 1, 2};
const int8_t y_move[] = {1, 2, 2, 1, -1, -2, -2, -1};
int8_t sol[N2];
int8_t m_level = 0;
bool solved = false;
bool thinking = false;
#ifdef DEBUG
uint8_t dot_count = 0;
#endif
unsigned long timeout;
int8_t step_count;
uint8_t next_xy;
uint8_t this_x;
uint8_t this_y;

//Initialise Matrix Display
void setup()
{
#ifdef DEBUG
Serial.begin(115200);
#endif
lc.shutdown(0, false);
lc.setIntensity(0, 2);
lc.clearDisplay(0);
lc.setLed(0, 0, 0, true);

//Timer used to flash LED while thinking
unsigned long start_time = millis();
timeout = start_time + 1000;
solved = start_knight_tour();
#ifdef DEBUG
Serial.println(((solved) ? "Found " : "Failed ") + String((millis() - start_time) / 1000) + "s");
#endif
}

//Clear board and start the recursive walk through of the knight from bottom left corner
bool start_knight_tour()
{
//Clear board
for (int8_t xy = 0; xy < N2; xy++)
{
sol[xy] = -1;
}

//Placing knight at cell(1, 1)
sol[0] = 0;
step_count = 1;
return (knight_tour(0));
}

//Recursive function to test all possible moves. Can go to 64 levels deep.
//Not optimizing saves 10 bytes of stack space per recursive call
//Also reducing local variables and function parameters reduce stack usage, but they need
//to be recalculated when the recursive function returns.
//Total used 24 bytes per call - Not sure why the compiler makes such a huge stack frame - should only be around 10 bytes
__attribute__((optimize("O0")))
bool knight_tour(int8_t xy)
{
int8_t k;

if (millis() > timeout)
{
lc.setLed(0, 0, 0, thinking);
thinking = !thinking;
timeout = millis() + 1000;

#ifdef DEBUG
//Note: this will have a serious impact on speed
Serial.print(".");
dot_count++;
if (dot_count == 60)
{
dot_count = 0;
Serial.println();
}
}
if (step_count > m_level)
{
m_level = step_count;
Serial.println("L:" + String(m_level) + ", S:" + String(SP));
#endif
}

if (step_count == N2)
return true;

for (k = 0; k < N; k++)
{
this_y = (xy >> 3) + y_move[k];
if (this_y >= 0 && this_y < N)
{
this_x = (xy & 0x07) + x_move[k];
if (this_x >= 0 && this_x < N)
{
next_xy = (this_y << 3) | this_x;
if (sol[next_xy] == -1)
{
sol[next_xy] = step_count;
step_count++;
solved = knight_tour(next_xy);
step_count--;
if (solved)
return true;
//Recalculate global variables since recursion kills them and they aren't stored on the stack to keep stack size to a minimum
this_x = (xy & 0x07) + x_move[k];
this_y = (xy >> 3) + y_move[k];
next_xy = (this_y << 3) | this_x;
sol[next_xy] = -1; // backtracking
}
}
}
}
return false;
}

//Display the solution.
void loop()
{
lc.clearDisplay(0);
if (solved)
{
int8_t cnt = 0;
while (cnt < N2)
{
for (int8_t xy = 0; xy < N2; xy++)
{
if (sol[xy] == cnt)
{
delay(1000);
lc.setLed(0, xy & 0x07, xy >> 3, true);
cnt = cnt + 1;
}
}
}
}
delay(2000);
}
```

Schematics

CLOCK -> D11
CS -> D10
DIN - D12

• 16 projects
• 44 followers

Published on

December 20, 2019

Members who respect this project

and 14 others

See similar projects
you might like

Make an Arduino Memory Game

Project tutorial by Jeremie

• 21,601 views
• 46 respects

Memory game

Project showcase by Anton Nazarenko

• 3,208 views
• 9 respects

Smart Analysis Using Live Usage Statistics

Project tutorial by Team Deodates

• 923 views
• 2 respects

• 35,509 views
• 32 respects

Arduino UNO with 8 Times More Memory

Project showcase by T.F. Mc Temucin

• 20,837 views