This project covers the deployment of a simple architecture for real-time and batch processing of temperature sensor data from an Arduino board, using open source technologies from the Big Data ecosystem. The purpose of the solution is to exemplify the flow of data through the different tools, from capture to transformation and insight generation.
Within the presented architecture the services of publication, transfer and storage of data are agnostic to the format in which the data is sent by the Arduino board. This drives the idea of building a centralized service for the distribution of messages from different sending devices to many clients or services capable of consuming this data.
Under this premise, the range of applications for this architecture is limited mainly by the creativity applied to building the information-emitting devices.
- Arduino IDE 1.8.5
- Hive 1.2.1
- Kafka 0.10.0
- Spark 1.6.2
- Zeppelin Notebook 0.6.0
- NiFi 1.2.0
1.- Data generation from humidity/temperature sensor
- The code loaded in the Arduino platform takes readings through the DHT sensor every 3 seconds, capturing:
- Percentage of humidity in the environment.
- Temperature in Celsius (°C)
- Temperature in Fahrenheit (°F)
- The Heat Index is calculated. This measure determines how people perceive the temperature according to the humidity of the environment.
- A request is made to an external web service to determine the time of reading according to a predefined time zone.
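The heat index step above can be sketched in plain C++. This is a minimal sketch, not the project's actual code: it assumes the widely used NOAA approach (a simple estimate for mild conditions, switching to the Rothfusz regression when that estimate exceeds 79 °F) and omits the extra corrections for very dry or very humid air. The function names are hypothetical; the project's sketch may simply rely on a helper from the DHT sensor library instead.

```cpp
// Convert a Celsius reading to Fahrenheit.
double celsiusToFahrenheit(double celsius) {
    return celsius * 9.0 / 5.0 + 32.0;
}

// Convert a Fahrenheit value back to Celsius.
double fahrenheitToCelsius(double fahrenheit) {
    return (fahrenheit - 32.0) * 5.0 / 9.0;
}

// Approximate heat index in Fahrenheit.
// tempF: air temperature in Fahrenheit; humidity: relative humidity in percent.
// Assumption: low/high humidity adjustments are left out for brevity.
double heatIndexFahrenheit(double tempF, double humidity) {
    // Simple estimate, adequate for mild conditions.
    double hi = 0.5 * (tempF + 61.0 + (tempF - 68.0) * 1.2 + humidity * 0.094);

    if (hi > 79.0) {
        // Rothfusz regression for hot conditions.
        hi = -42.379
             + 2.04901523 * tempF
             + 10.14333127 * humidity
             - 0.22475541 * tempF * humidity
             - 0.00683783 * tempF * tempF
             - 0.05481717 * humidity * humidity
             + 0.00122874 * tempF * tempF * humidity
             + 0.00085282 * tempF * humidity * humidity
             - 0.00000199 * tempF * tempF * humidity * humidity;
    }
    return hi;
}
```

To obtain the heat index in Celsius, convert the Celsius reading with `celsiusToFahrenheit`, pass it to `heatIndexFahrenheit`, and convert the result back with `fahrenheitToCelsius`.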
2.- Data publication to the MQTT server
- The message or payload that will be sent to the MQTT server is built:
- The payload is in JSON format.
- It contains the data captured by the sensor, the calculated information, the date/time of the reading, the number of milliseconds elapsed since the Arduino platform started and a unique identifier of the transmitter device.
- The Internet and MQTT broker connections are verified.
- The payload is published to the MQTT broker on a specific topic under a predefined username and password.
- The MQTT broker has a list of permissions that defines which users can publish information over existing topics.
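As an illustration of the payload described above, the following minimal sketch builds a JSON string from one reading. The struct, the function and every JSON field name (`device`, `datetime`, `millis`, and so on) are assumptions made for this example; the real payload's field names are defined in the project's sketch. The connection checks and the publish call itself are left out.

```cpp
#include <cstdio>
#include <string>

// Hypothetical container for one reading; field names are illustrative only.
struct Reading {
    std::string deviceId;    // unique identifier of the transmitter device
    std::string timestamp;   // date/time of the reading, already formatted
    unsigned long uptimeMs;  // milliseconds since the board started
    double humidity;         // relative humidity in percent
    double tempC;            // temperature in Celsius
    double tempF;            // temperature in Fahrenheit
    double heatIndexC;       // heat index in Celsius
};

// Serialize a reading as a flat JSON object.
// Assumption: deviceId and timestamp contain no characters that need escaping.
std::string buildPayload(const Reading& r) {
    char buffer[256];
    std::snprintf(buffer, sizeof(buffer),
                  "{\"device\":\"%s\",\"datetime\":\"%s\",\"millis\":%lu,"
                  "\"humidity\":%.2f,\"temp_c\":%.2f,\"temp_f\":%.2f,"
                  "\"heat_index_c\":%.2f}",
                  r.deviceId.c_str(), r.timestamp.c_str(), r.uptimeMs,
                  r.humidity, r.tempC, r.tempF, r.heatIndexC);
    return std::string(buffer);
}
```

Because the downstream services are agnostic to the message format, fields can be added to or removed from this object without changing the broker, NiFi or Kafka configuration.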
3, 4 & 5.- Real-time data capture
- The Apache NiFi service has an organized set of instructions that orchestrate the flow of data as they are captured:
- NiFi connects or subscribes to the Mosquitto topic and captures messages in real time.
- NiFi complements the received messages (JSON string) by defining new fields outside the message related to technical aspects of the message and the MQTT broker.
- NiFi inserts messages and the new fields into the Hive data store.
- NiFi publishes the original message in Kafka.
- Hive and Kafka store the data:
- Hive allows batch processing of historical data.
- Kafka allows real-time processing of data sent by the Arduino platform.
6 & 7.- Data processing
- Zeppelin runs code blocks (Scala and SQL):
- It is possible to query the data stored in the data warehouse.
- It is possible to subscribe in real time to the Kafka topic to process the messages under different time windows.
- The code is executed on Spark.
- The data obtained in each time window are transformed and stored in Hive tables.
- The average of the captured temperatures is calculated for each window.
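The per-window average can be illustrated independently of Spark. The sketch below is plain C++ rather than the Scala code that runs in the notebook, and it assumes fixed-size, non-overlapping (tumbling) windows keyed by their start time; the notebook may use different window and slide durations. The `Sample` struct and `averageByWindow` function are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// One temperature reading with its capture time.
struct Sample {
    std::int64_t timestampMs;  // reading time in milliseconds (non-negative)
    double tempC;              // temperature in Celsius
};

// Group samples into tumbling windows of windowMs milliseconds and return
// a map from window start time (ms) to the mean temperature in that window.
// Assumption: windowMs > 0 and timestamps are non-negative.
std::map<std::int64_t, double> averageByWindow(const std::vector<Sample>& samples,
                                               std::int64_t windowMs) {
    std::map<std::int64_t, double> sums;
    std::map<std::int64_t, int> counts;

    for (const Sample& s : samples) {
        // Integer division truncates to the start of the enclosing window.
        std::int64_t windowStart = (s.timestampMs / windowMs) * windowMs;
        sums[windowStart] += s.tempC;
        counts[windowStart] += 1;
    }

    // Turn each accumulated sum into a mean.
    for (auto& entry : sums) {
        entry.second /= counts[entry.first];
    }
    return sums;
}
```

With readings every 3 seconds and a 10-second window, each window holds three or four samples, so a single noisy reading has a limited effect on the stored average.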
- Adafruit Unified Sensor 1.0.2
- DHT sensor library 1.3.0
- PubSubClient 2.6.0
- Time 1.5.0
- NTPClient 3.1.0
- Board: "ESP8266 Generic Module"
- Flash Mode: "DIO"
- Flash Size: "512K (64 SPIFFS)"
- Debug port: "Disabled"
- Debug level: "None"
- Reset Method: "ck"
- Crystal Frequency: "26 MHz"
- Flash Frequency: "40 MHz"
- CPU Frequency: "80 MHz"
- Upload Speed: "115200"
- Programmer: "AVRISP mkII"
- Both NL & CR
- 115200 baud
In this project, instructions are loaded onto the ESP8266 module rather than the Arduino board, since it is this module that manipulates, transforms and sends the data.
To load instructions onto the WiFi module, it must enter Flash Mode at startup, which is achieved through the pin configuration shown in the Pinout diagram (Flash Mode).
It is recommended that the Arduino board has no sketch loaded when uploading code to the ESP8266 module.
In the serial monitor we can observe the process of connection, capture and publication of messages.
If we subscribe to the Mosquitto topic we can see how the messages are published by the Arduino board in real time.
NiFi publishes the captured messages to Kafka and Hive. In the latter, additional fields related to the MQTT server are recorded in the table.
Once the NiFi template is started, if we subscribe to the Kafka topic to which we redirect the messages, we will be able to observe how the messages are published practically instantly when they are received at Mosquitto. In the following image we can see the reception of messages in the Mosquitto topic (left) and the Kafka topic (right).
On the other hand, if we query the Hive table periodically, we will notice that the number of records increases as NiFi captures messages.
The notebook, developed in Scala, is in JSON format and can be imported into Zeppelin. It is divided into 7 paragraphs:
2.- Data capture.
3.- Calculation of temperature averages by window.
4.- Kmeans model creation and training.
5.- Data classification (window)
6.- Data classification (random data)
7.- Data inspection.
This project lacks the following characteristics that could increase the value of the possible applications for these technologies:
- Arduino board integration with different types of sensors.
- Multiplexing of signals sent to the WiFi module.
- Development of status and control indicators (LEDs, alerts, alarms).
- Flash Mode activation/deactivation button.
Detailed instructions for replicating this project can be found in this GitHub repository, along with the sketches, templates, notebooks and test data used.
At the moment it is in Spanish, so until it is translated into English you can practice your skills in that language ;).