Intel Math Kernel Library on Arduino

Arduino Web Editor

UP2 Board

Note: This tutorial may be outdated; please go <a href="https://docs.arduino.cc/tutorials/generic/intel-math-kernel-library-on-arduino">here</a> for a more current version. In this tutorial we'll learn how to integrate your sketch with the super-optimized Intel libraries for heavy mathematical computation (MKL to its friends). First of all, you need a suitable IoT gateway with a couple of GB of free disk space (libraries can sometimes be heavy). Follow the Getting Started section of Create to get your gateway up and running (choose the Ubuntu distribution, for example).
<h3 >Why do I need so much space?</h3>(AKA: how I learned to love shared libraries) In the Arduino world, a library is a collection of code that helps you interact with a certain piece of hardware or perform particular operations. On Linux, a library serves the same purpose, but it can also be shared between multiple programs. This saves a lot of space, since a single copy of the functions it contains can be used by many different processes instead of being duplicated in each one.
<h3 >Setting up the board</h3>Libraries are usually shipped via the package manager or with an installer (we'll use the latter in this example). First, access your board via SSH using its IP address and the username/password you provided during installation. To do this, you can use PuTTY on Windows, or the <code >ssh</code> command from a terminal on Linux and macOS. Make sure the port is set to 22, the default SSH port. Now we need to download the MKL package. Open your browser, go to <a href="https://software.intel.com/en-us/mkl">https://software.intel.com/en-us/mkl</a>, click &#34;Free Download&#34; and complete the registration process.
Select &#34;Intel Performance Libraries for Linux&#34;, right-click on &#34;Intel Math Kernel Library&#34; and select &#34;Copy link address&#34; (or similar, depending on your browser).<figure><img src=https://projects.arduinocontent.cc/3dc68c6f-e1fd-4ba7-a95f-9093ba577528.png /><figcaption> </figcaption></figure>Now go back to the SSH shell and type: <pre ><code >wget </code></pre>followed by the link you just copied. Press [Enter] and the download should start. Once it has finished, extract the package by typing:<pre ><code >tar xvf l_mkl_2017* </code></pre>Let's <code >cd</code> into the extracted folder (typically named like the downloaded package, without the extension) and type:<pre ><code >./install.sh </code></pre>Follow the on-screen instructions, and after a couple of minutes your system will be ready.
<h3 >Time to code!</h3>Open Create with the provided example. We're going to demonstrate a very handy feature of the MKL library: it parallelizes code execution without you having to manage threads yourself. In the example, a matrix multiplication is performed using the optimized function <code >cblas_dgemm</code> included in MKL. The function is optimized for a variety of Intel hardware platforms and uses the most advanced vector instruction set extensions available on the target CPU (AVX, SSE4 and so on). But what happens if we have a multicore architecture?
If the library runs on a single thread, we are wasting a lot of power, because the problem it must solve can be split into multiple smaller problems, which makes it a perfect candidate for parallelization. Using <code >mkl_set_num_threads</code> we can instruct the library to run on multiple threads (and cores) with no additional programming effort. The example executes the same computation with thread counts from 1 up to the number of cores of the target CPU (double that if Hyper-Threading is active) and benchmarks each run.
<h3 >Let's unleash the monster</h3>When ready, open the Monitor on the left panel, press &#34;Upload&#34; and wait a couple of seconds for the sketch to upload and start. The program's output will be printed on the Monitor. <figure><img src=https://projects.arduinocontent.cc/d8667d26-40fc-4ba2-9582-f46608a7897e.png /><figcaption> </figcaption></figure>
<h3 >What did we learn?</h3>Looking at the results, running on two threads nearly doubles performance compared with a single thread (on a dual-core processor, of course). The speedup is not exactly <code >x2</code> because launching each additional thread carries some overhead, and this penalty weighs more heavily the shorter the execution time is. If most of the time is spent crunching numbers, the speedup approaches the theoretical maximum.


Learn how to integrate the Intel MKL library into a streamlined Arduino workflow.