An IoT sensor streaming solution on AWS

By Mihir Satokar, 30 August 2022

Overview

In this Proof of Concept (POC), we'll demonstrate an event-driven, serverless solution for streaming IoT sensor data.

Multiple water-quality multi-sensor probes are deployed around Port Phillip Bay and the Yarra River in Melbourne. These probes continuously stream readings (such as pH, temperature, salinity and dissolved oxygen) to AWS IoT. The status of each sensor can also be monitored on a dashboard.

The sensors are represented as dots which change colour based on the data received. Clicking on a sensor displays its detailed statistics.

The dashboard receives live (near real-time) sensor data and updates the dashboard map and sensor graph.
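Under the hood the dashboard is driven by AWS AppSync (described in the Data value realisation section below). As a rough sketch of the client side, a GraphQL subscription via the Amplify library might look like the following; the schema, field names and the `updateDashboard` helper are hypothetical and only illustrate the pattern, they are not taken from the original solution.

```javascript
// Dashboard-side sketch: subscribe to sensor updates pushed through AWS AppSync.
// Schema and field names are hypothetical; assumes the Amplify JavaScript client is configured.
import { API, graphqlOperation } from 'aws-amplify';

const onSensorUpdated = /* GraphQL */ `
  subscription OnSensorUpdated {
    onSensorUpdated {
      sensorId
      ph
      temperature
      salinity
      dissolvedO2
      timestamp
    }
  }
`;

API.graphql(graphqlOperation(onSensorUpdated)).subscribe({
  next: ({ value }) => {
    const reading = value.data.onSensorUpdated;
    updateDashboard(reading); // hypothetical helper: recolour the map dot and refresh the graph
  },
  error: (err) => console.error('Subscription error:', err),
});
```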

Solution Architecture Diagram


 

Data acquisition and ingestion

  1.  All sensors are registered as devices/things in AWS IoT Core, and each device is issued a unique X.509 certificate with an attached policy.
  2.  Certificates allow registered devices to authenticate when they connect. Policies restrict a device so that it can only subscribe/publish to specific topics.
  3.  All devices connect to AWS IoT Core over the MQTT protocol, with IoT Core acting as the message broker. All sensors are clients that follow the publish–subscribe pattern.
  4.  A sample Node.js sensor simulator, configured and connected to AWS IoT, is sketched after this list.
  5.  The same simulator publishes sensor readings to its topic; see the publish call in the sketch below.
  6.  In the AWS console, the MQTT test client can be used to subscribe to the topic and verify that data is being received.
  7.  IoT rules route the event stream to downstream consumers. A SQL statement can be executed in the rule to filter or enrich the payload before it is forwarded; a sample rule statement is given after the sketch below.
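The original post illustrated these steps with screenshots. As a stand-in, here is a minimal sketch of a Node.js sensor simulator that connects to AWS IoT Core over MQTT using its device certificate and publishes readings; the endpoint, certificate paths, topic and payload fields are illustrative.

```javascript
// Minimal Node.js sensor simulator (illustrative sketch, not the original code).
// Uses the aws-iot-device-sdk package and the device certificate issued by AWS IoT Core.
const awsIot = require('aws-iot-device-sdk');

const device = awsIot.device({
  keyPath:  './certs/sensor-01.private.key',                        // hypothetical paths
  certPath: './certs/sensor-01.cert.pem',
  caPath:   './certs/AmazonRootCA1.pem',
  clientId: 'water-quality-sensor-01',
  host:     'xxxxxxxxxxxxxx-ats.iot.ap-southeast-2.amazonaws.com'   // your account's IoT endpoint
});

// Once connected, publish a simulated reading every 5 seconds.
device.on('connect', () => {
  console.log('Connected to AWS IoT Core');
  setInterval(() => {
    const reading = {
      sensorId: 'sensor-01',
      timestamp: Date.now(),
      ph: 7.8 + Math.random() * 0.4,
      temperature: 16 + Math.random() * 3,   // °C
      salinity: 35 + Math.random(),          // PSU
      dissolvedO2: 8 + Math.random()         // mg/L
    };
    device.publish('sensors/water-quality/sensor-01', JSON.stringify(reading));
  }, 5000);
});

device.on('error', (err) => console.error('IoT connection error:', err));
```

On the routing side, a rule statement along the lines of `SELECT * FROM 'sensors/water-quality/#'` would match every message published under that topic filter and forward the payload to the rule's actions (for example a Lambda function, an IoT Analytics channel and an S3 bucket). The actual topic names and rule used in the POC are not reproduced here.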

Data value realisation

When the IoT rule invokes a Lambda function, the function executes a GraphQL mutation in AppSync; the dashboard web app, which subscribes to the corresponding updates, receives the new value and reflects it on the map and graphs. A sketch of such a Lambda function is shown below.
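This minimal sketch assumes the AppSync API accepts API-key authentication, a Node.js 18+ runtime (for the global `fetch`), and the same hypothetical `updateSensor` schema used in the subscription sketch earlier; none of these details come from the original post.

```javascript
// Lambda handler sketch: forward the payload delivered by the IoT rule to AppSync as a mutation.
// APPSYNC_URL and APPSYNC_API_KEY are assumed to be set as environment variables.
const APPSYNC_URL = process.env.APPSYNC_URL;
const APPSYNC_API_KEY = process.env.APPSYNC_API_KEY;

const updateSensorMutation = /* GraphQL */ `
  mutation UpdateSensor($input: SensorReadingInput!) {
    updateSensor(input: $input) {
      sensorId
      ph
      temperature
      salinity
      dissolvedO2
      timestamp
    }
  }
`;

exports.handler = async (event) => {
  // The IoT rule passes the published sensor payload through as the event.
  const response = await fetch(APPSYNC_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': APPSYNC_API_KEY,
    },
    body: JSON.stringify({
      query: updateSensorMutation,
      variables: { input: event },
    }),
  });

  const result = await response.json();
  if (result.errors) {
    throw new Error(`AppSync mutation failed: ${JSON.stringify(result.errors)}`);
  }
  return result.data.updateSensor;
};
```

Because the mutation goes through AppSync, every dashboard client holding the corresponding subscription is pushed the new reading automatically.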

Data analytics and BI

AWS IoT Analytics provides several components for running analytics on large volumes of IoT data. The overall flow is made up of the following components:

  1.  A channel is the source of the data.
  2.  Pipelines transform and filter the data.
  3.  A data store is the destination of the processed data.
  4.  Standard SQL statements can be used to query the data store and produce a dataset; a sample query is sketched after this list.
  5.  Datasets can be delivered to S3 and used as part of a data lake.
  6.  Jupyter notebooks and SageMaker can also be used to run ML tasks on the datasets.
  7.  Datasets can be fed into BI tools such as Power BI and QuickSight for further analytics and visualisation.
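The original post showed the dataset query as a screenshot. As a stand-in, the sketch below creates an IoT Analytics dataset whose content is produced by a standard SQL query over the data store; the data store name, dataset name and column names are illustrative.

```javascript
// Sketch: define an IoT Analytics dataset backed by a SQL query over the data store.
// Resource and column names are illustrative; assumes the AWS SDK for JavaScript v3.
const {
  IoTAnalyticsClient,
  CreateDatasetCommand,
} = require('@aws-sdk/client-iotanalytics');

const client = new IoTAnalyticsClient({ region: 'ap-southeast-2' });

async function createAverageReadingsDataset() {
  await client.send(new CreateDatasetCommand({
    datasetName: 'water_quality_averages',
    actions: [
      {
        actionName: 'average_readings_query',
        queryAction: {
          // Standard SQL executed against the data store when the dataset is refreshed.
          sqlQuery: `
            SELECT sensorid,
                   avg(ph)          AS avg_ph,
                   avg(temperature) AS avg_temperature,
                   avg(salinity)    AS avg_salinity,
                   avg(dissolvedo2) AS avg_dissolved_o2
            FROM water_quality_datastore
            GROUP BY sensorid`,
        },
      },
    ],
  }));
}

createAverageReadingsDataset().catch(console.error);
```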

Data Archive and Data Lake

  1.  S3 is the standard choice for building a data lake on AWS. When an IoT rule saves data to S3, the rule can specify the key for the S3 object; in our solution the object key is `${topic()}/${timestamp()}`, so objects are organised by topic and timestamp.
  2.  With AWS Glue, we can generate a database table schema over the whole S3 bucket.
  3.  Using Amazon Athena, we can execute SQL queries against the table registered in the Glue Data Catalog; a sample query is sketched below.
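The original query and its results were shown as a screenshot. The sketch below runs a comparable Athena query through the AWS SDK; the database, table, column and result-bucket names are illustrative.

```javascript
// Sketch: run an Athena query against the Glue Data Catalog table built over the S3 data lake.
// Database, table and output bucket names are illustrative; assumes the AWS SDK for JavaScript v3.
const {
  AthenaClient,
  StartQueryExecutionCommand,
  GetQueryExecutionCommand,
  GetQueryResultsCommand,
} = require('@aws-sdk/client-athena');

const athena = new AthenaClient({ region: 'ap-southeast-2' });

async function queryRecentReadings() {
  // Start the query; Athena writes results to the given S3 output location.
  const { QueryExecutionId } = await athena.send(new StartQueryExecutionCommand({
    QueryString: `
      SELECT sensorid, ph, temperature, salinity, dissolvedo2, timestamp
      FROM iot_sensor_data
      WHERE ph < 6.5
      ORDER BY timestamp DESC
      LIMIT 100`,
    QueryExecutionContext: { Database: 'iot_data_lake' },
    ResultConfiguration: { OutputLocation: 's3://my-athena-results-bucket/' },
  }));

  // Poll until the query finishes (simplified; production code would back off and handle failures).
  let state = 'QUEUED';
  while (state === 'QUEUED' || state === 'RUNNING') {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    const { QueryExecution } = await athena.send(
      new GetQueryExecutionCommand({ QueryExecutionId })
    );
    state = QueryExecution.Status.State;
  }
  if (state !== 'SUCCEEDED') throw new Error(`Athena query finished with state ${state}`);

  return athena.send(new GetQueryResultsCommand({ QueryExecutionId }));
}

queryRecentReadings()
  .then((results) => console.log(JSON.stringify(results.ResultSet.Rows, null, 2)))
  .catch(console.error);
```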