Cloud-based services are going to be a central part of IoT (Internet of Things) development. This project explores a potential real-world application, automobile engine data collection and analysis. All vehicles include an OBD-II diagnostics port that provides access to a wealth of internal data. This project collects some of that data, sends it to the Microsoft Azure IoT Hub, analyzes it with Azure Stream Analytics and Machine Learning, and stores the results in Azure Blob storage.
One such analysis is early detection of maintenance issues; in this prototype system we monitor engine coolant temperature and look for abnormalities. The machine learning algorithm learns the normal temperature behavior and then classifies out-of-normal conditions. For richer and more accurate modeling we include month information, allowing the system to learn seasonal changes in behavior.
This project uses the Arduino MKR1000 board with an ARM processor and integrated WiFi. It was developed as part of the "World's Largest Arduino Maker Challenge" sponsored by Arduino.cc and Microsoft. It is called Car Smart as a reference to the use of Machine Learning for data analysis.
The device is a "black box" that is installed in the vehicle with no human interface. You can see in the image above the ODB-II interface board on top and the Arduino underneath. Because the focus of this project is on moving intelligence to the cloud, the in-vehicle device is a simple data logger and does no data analysis of its own.
A Microsoft Azure Stream Analytics job takes these incoming messages, processes them with a custom Azure Machine Learning Web Service, and writes the results to a CSV file in Azure Blob Storage. Here the Stream Analytics dashboard shows a burst of activity as the events come in.
The result for now is a simple CSV file containing the recorded temperatures and the Machine Learning classification of LOW, NORMAL, or HIGH. Note the change to a HIGH classification when the temperature goes higher than expected based on the training data.
The machine learning algorithm shown below is a Multiclass Decision Forest. A spreadsheet of training data was uploaded to train the model; it contains several hundred samples, each with a LOW/NORMAL/HIGH classification. After training, the trained model can be deployed as a web service and used to classify new samples of the data.
You must first create a Microsoft Azure account. Azure has a free trial and a price-by-usage plan. Then you need to create the following Azure services:
1. An IoT Hub
Give it a name; mine was LovegroveVehicle, making my hostname LovegroveVehicle.azure-devices.net.
Available under the keys icon is a shared access key and a connection string, which are not repeated here for security.
Next create a device in your IoT Hub. I did this using the free Device Explorer program which comes in the IoT Hub SDK and runs on Windows 10. Give Device Explorer the hub connection string for access. The IoT Hub uses Shared Access Signatures for security, and this program will create an SAS token for you to include in your device. This program also has a Data tab which will display incoming data from your device.
2. A Storage Account
Note that Azure services are deployed to geographic regions, and they do not always default to the same one. Put all your services in the same region for the best results.
3. A Machine Learning Workspace
Machine Learning has its own separate portal, the Machine Learning Studio, where you build your experiments for training and then publish the trained model as a web service. Building this web service is described in detail later, but do it before you create the Stream Analytics job, which requires it.
4. A Stream Analytics Job
A Stream Analytics job has four parts:
- An input. This is a data stream from your IoT Hub
- Functions. Add your Machine Learning web service as a function, making it available to the Stream Analytics query. The alias you pick becomes the function name in the query.
- A Query to select the input data, apply the function, and produce the output.
- An output. In our prototype it is simply a CSV file in blob storage, giving us several options for downloading and further analysis.
Below is the current query.
```sql
WITH subquery AS (
    SELECT Temp, tempstate(Month, Temp) AS result
    FROM carsmart
)
SELECT Temp, result.[Scored Labels]
INTO carsmartoutput
FROM subquery
```
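For reference, the query assumes each incoming event carries at least `Month` and `Temp` fields, which are passed to the `tempstate` Machine Learning function. A minimal sketch of such an event (the field names come from the query; the values are purely illustrative):

```json
{ "Month": 7, "Temp": 92 }
```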
Machine learning requires a set of training data. In the future the system could learn car characteristics in real time, but for this demonstration we will pre-train the model with some already-classified data we supply. For this purpose we created a spreadsheet of random data calculated according to a distribution we hope the ML system will learn. This spreadsheet was created in Excel 2016 under Windows 10 and then saved as a CSV file in the Azure storage account. The spreadsheet can be found in the code repository. The input data is month and temperature. The classes are LOW, NORMAL, and HIGH temperature.
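The spreadsheet's labeling rule can be sketched in code. The thresholds below are illustrative assumptions, not the actual values used in the workbook (those are in the code repository); the point is simply that the LOW/NORMAL/HIGH boundary is allowed to shift with the month, which is the seasonal behavior we want the model to learn.

```cpp
#include <string>

// Hypothetical labeling rule of the kind used to generate the training
// spreadsheet. Coolant temperature is in degrees Celsius; the threshold
// values here are illustrative only.
std::string tempState(int month, double temp) {
    // Assume the normal operating band shifts a little with the seasons:
    // warmer months tolerate slightly higher coolant temperatures.
    bool summer = (month >= 6 && month <= 9);   // simplistic seasonal split
    double low  = summer ? 80.0 : 75.0;
    double high = summer ? 105.0 : 100.0;
    if (temp < low)  return "LOW";
    if (temp > high) return "HIGH";
    return "NORMAL";
}
```

A trained classifier that recovers a rule like this from samples alone is what the Multiclass Decision Forest is expected to produce.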
With the training data in place, create a new machine learning "experiment." The experiment is illustrated below and is built out of the following components:
- The training data set
- A split data function to split the data into a training and testing subset
- The machine learning algorithm of choice; I chose the Multiclass Decision Forest. I hope it will learn our three classes of month/temperature pairs.
- A "Train Model" module, with the algorithm and one data subset as inputs. Double-click on the module and select the column representing the correct classification; in my data it is the State column.
- A "Score Model" and an "Evaluate Model" module to test the model on the other training subset and report the results.
Run this experiment and check the results by clicking the output terminal of the Evaluate Model box and selecting Visualize. In this case the overall accuracy is over 0.99, and all the confusion is in the HIGH class, probably due to too few samples of HIGH temperatures.
If the experiment is successful you are ready to set up a Predictive Web Service. The result is as shown below.
This service must be modified to take only month and temperature as input and produce the state as output. The result is as follows:
Run this model, then deploy the web service. The subsequent dashboard will contain a "TEST" button where you can manually enter data and test your service. It is now ready to include in your Stream Analytics job.
The MKR1000 uses a SparkFun OBD-II UART Board to get data from the vehicle. The two boards are connected through a three-wire RS-232-style serial port. For the prototype system these are the only hardware components required, so they were installed in a project box and the three signals wire-wrapped together.
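To give a flavor of the serial exchange: the board is sent the OBD-II request for PID 0x05 (engine coolant temperature) and replies with an ELM327-style hex string such as "41 05 7B", where per SAE J1979 the data byte A encodes the temperature as A − 40 in °C. The helper below is a hypothetical sketch of the decoding, not the project's actual source:

```cpp
#include <cstdlib>
#include <string>

// Decode an ELM327-style reply to the coolant-temperature request (PID 05).
// Expected form: "41 05 AA", where AA is the data byte in hex.
// Returns temperature in degrees Celsius, or -1000 on a malformed reply.
int parseCoolantTemp(const std::string& reply) {
    if (reply.size() < 8 || reply.rfind("41 05", 0) != 0) return -1000;
    int a = static_cast<int>(strtol(reply.substr(6, 2).c_str(), nullptr, 16));
    return a - 40;  // SAE J1979: temperature = A - 40
}
```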
A SparkFun-supplied cable connects the OBD-II board's DB9 connector to the vehicle's OBD-II port.
The MKR1000 in the prototype is powered through its USB port from a 12 V-to-5 V vehicle USB power adapter.
The MKR1000's built-in WiFi is programmed to connect to my home WiFi and upload data. The philosophy is that the system will store data and upload it over home WiFi when the vehicle is at home, eliminating the need for expensive internet access in the vehicle. The software uses the standard Wifi101 library for WiFi access.
Data can be sent to the IoT Hub in a variety of ways. For this prototype the simplest was to use HTTPS to send POST messages with the data in JSON format. Note that Azure requires HTTPS, and the MKR1000 needs its firmware updated with the correct SSL certificate; directions on how to do this are on the web. The Arduino GitHub libraries include a tool called WiFi101-FirmwareUpdater which does the job easily; the matching Arduino FirmwareUpdater sketch must be loaded on the board first.
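The POST request itself can be assembled as a plain string before being written to the TLS (port 443) connection opened with the WiFi101 client. The sketch below is illustrative: `buildEventPost` is a hypothetical helper, the api-version value may differ from what the project actually uses, and the host, device ID, and SAS token are placeholders for your own IoT Hub values.

```cpp
#include <string>

// Assemble a device-to-cloud event POST for the Azure IoT Hub REST
// endpoint /devices/{deviceId}/messages/events. The SAS token created
// by Device Explorer goes in the Authorization header verbatim.
std::string buildEventPost(const std::string& host,
                           const std::string& deviceId,
                           const std::string& sasToken,
                           const std::string& json) {
    std::string path = "/devices/" + deviceId +
                       "/messages/events?api-version=2016-02-03";
    std::string req;
    req += "POST " + path + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    req += "Authorization: " + sasToken + "\r\n";
    req += "Content-Type: application/json\r\n";
    req += "Content-Length: " + std::to_string(json.size()) + "\r\n";
    req += "\r\n" + json;
    return req;
}
```

On the device the returned string is simply written to the connected `WiFiSSLClient` and the HTTP status line read back to confirm delivery.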
The complete Arduino source code is available in the code repository.