An IoT sensor streaming solution on AWS

By Mihir Satokar, 30 August 2022

Overview

In this Proof of Concept (POC), we'll demonstrate an event-driven, serverless solution for streaming IoT sensor data.

Multiple water-quality multi-sensor probes are deployed around Port Phillip Bay and the Yarra River in Melbourne. These probes continuously stream readings (such as pH, temperature, salinity and dissolved oxygen) to AWS IoT. The status of each sensor can also be monitored on a dashboard.

The sensors are represented as dots which change colour based on the data received. Clicking on a sensor displays its detailed statistics.

The dashboard receives live (near real-time) sensor data and updates the dashboard map and sensor graph.
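Under the hood the dashboard is driven by AWS AppSync (described in the Data value realisation section below). As a rough sketch of the client side, a GraphQL subscription via the Amplify library might look like the following; the schema, field names and the `updateDashboard` helper are hypothetical and only illustrate the pattern, they are not taken from the original solution.

```javascript
// Dashboard-side sketch: subscribe to sensor updates pushed through AWS AppSync.
// Schema and field names are hypothetical; assumes the Amplify JavaScript client is configured.
import { API, graphqlOperation } from 'aws-amplify';

const onSensorUpdated = /* GraphQL */ `
  subscription OnSensorUpdated {
    onSensorUpdated {
      sensorId
      ph
      temperature
      salinity
      dissolvedO2
      timestamp
    }
  }
`;

API.graphql(graphqlOperation(onSensorUpdated)).subscribe({
  next: ({ value }) => {
    const reading = value.data.onSensorUpdated;
    updateDashboard(reading); // hypothetical helper: recolour the map dot and refresh the graph
  },
  error: (err) => console.error('Subscription error:', err),
});
```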

Solution Architecture Diagram


 

Data acquisition and ingestion

  1.  All sensors are registered as devices/things in AWS IoT Core, and each device is issued a unique X.509 certificate with an attached policy.
  2.  Certificates allow registered devices to authenticate when they connect. Policies restrict a device so that it can only subscribe/publish to specific topics.
  3.  All devices connect to AWS IoT Core over the MQTT protocol, with IoT Core acting as the message broker. All sensors are clients that follow the publish–subscribe pattern.
  4.  A sample Node.js sensor simulator, configured and connected to AWS IoT, is sketched after this list.
  5.  The same simulator publishes sensor readings to its topic; see the publish call in the sketch below.
  6.  In the AWS console, the MQTT test client can be used to subscribe to the topic and verify that data is being received.
  7.  IoT rules route the event stream to downstream consumers. A SQL statement can be executed in the rule to filter or enrich the payload before it is forwarded; a sample rule statement is given after the sketch below.
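The original post illustrated these steps with screenshots. As a stand-in, here is a minimal sketch of a Node.js sensor simulator that connects to AWS IoT Core over MQTT using its device certificate and publishes readings; the endpoint, certificate paths, topic and payload fields are illustrative.

```javascript
// Minimal Node.js sensor simulator (illustrative sketch, not the original code).
// Uses the aws-iot-device-sdk package and the device certificate issued by AWS IoT Core.
const awsIot = require('aws-iot-device-sdk');

const device = awsIot.device({
  keyPath:  './certs/sensor-01.private.key',                        // hypothetical paths
  certPath: './certs/sensor-01.cert.pem',
  caPath:   './certs/AmazonRootCA1.pem',
  clientId: 'water-quality-sensor-01',
  host:     'xxxxxxxxxxxxxx-ats.iot.ap-southeast-2.amazonaws.com'   // your account's IoT endpoint
});

// Once connected, publish a simulated reading every 5 seconds.
device.on('connect', () => {
  console.log('Connected to AWS IoT Core');
  setInterval(() => {
    const reading = {
      sensorId: 'sensor-01',
      timestamp: Date.now(),
      ph: 7.8 + Math.random() * 0.4,
      temperature: 16 + Math.random() * 3,   // °C
      salinity: 35 + Math.random(),          // PSU
      dissolvedO2: 8 + Math.random()         // mg/L
    };
    device.publish('sensors/water-quality/sensor-01', JSON.stringify(reading));
  }, 5000);
});

device.on('error', (err) => console.error('IoT connection error:', err));
```

On the routing side, a rule statement along the lines of `SELECT * FROM 'sensors/water-quality/#'` would match every message published under that topic filter and forward the payload to the rule's actions (for example a Lambda function, an IoT Analytics channel and an S3 bucket). The actual topic names and rule used in the POC are not reproduced here.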

Data value realisation

When the IoT rule invokes a Lambda function, the function executes a GraphQL mutation in AppSync; the dashboard web app, which subscribes to the corresponding updates, receives the new value and reflects it on the map and graphs. A sketch of such a Lambda function is shown below.
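This minimal sketch assumes the AppSync API accepts API-key authentication, a Node.js 18+ runtime (for the global `fetch`), and the same hypothetical `updateSensor` schema used in the subscription sketch earlier; none of these details come from the original post.

```javascript
// Lambda handler sketch: forward the payload delivered by the IoT rule to AppSync as a mutation.
// APPSYNC_URL and APPSYNC_API_KEY are assumed to be set as environment variables.
const APPSYNC_URL = process.env.APPSYNC_URL;
const APPSYNC_API_KEY = process.env.APPSYNC_API_KEY;

const updateSensorMutation = /* GraphQL */ `
  mutation UpdateSensor($input: SensorReadingInput!) {
    updateSensor(input: $input) {
      sensorId
      ph
      temperature
      salinity
      dissolvedO2
      timestamp
    }
  }
`;

exports.handler = async (event) => {
  // The IoT rule passes the published sensor payload through as the event.
  const response = await fetch(APPSYNC_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': APPSYNC_API_KEY,
    },
    body: JSON.stringify({
      query: updateSensorMutation,
      variables: { input: event },
    }),
  });

  const result = await response.json();
  if (result.errors) {
    throw new Error(`AppSync mutation failed: ${JSON.stringify(result.errors)}`);
  }
  return result.data.updateSensor;
};
```

Because the mutation goes through AppSync, every dashboard client holding the corresponding subscription is pushed the new reading automatically.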

Data analytics and BI

AWS IoT Analytics provides several components for running analytics on large volumes of IoT data. The overall flow is made up of the following components:

  1.  A channel is the source of the data.
  2.  Pipelines transform and filter the data.
  3.  A data store is the destination of the processed data.
  4.  Standard SQL statements can be used to query the data store and produce a dataset; a sample query is sketched after this list.
  5.  Datasets can be delivered to S3 and used as part of a data lake.
  6.  Jupyter notebooks and SageMaker can also be used to run ML tasks on the datasets.
  7.  Datasets can be fed into BI tools such as Power BI and QuickSight for further analytics and visualisation.
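The original post showed the dataset query as a screenshot. As a stand-in, the sketch below creates an IoT Analytics dataset whose content is produced by a standard SQL query over the data store; the data store name, dataset name and column names are illustrative.

```javascript
// Sketch: define an IoT Analytics dataset backed by a SQL query over the data store.
// Resource and column names are illustrative; assumes the AWS SDK for JavaScript v3.
const {
  IoTAnalyticsClient,
  CreateDatasetCommand,
} = require('@aws-sdk/client-iotanalytics');

const client = new IoTAnalyticsClient({ region: 'ap-southeast-2' });

async function createAverageReadingsDataset() {
  await client.send(new CreateDatasetCommand({
    datasetName: 'water_quality_averages',
    actions: [
      {
        actionName: 'average_readings_query',
        queryAction: {
          // Standard SQL executed against the data store when the dataset is refreshed.
          sqlQuery: `
            SELECT sensorid,
                   avg(ph)          AS avg_ph,
                   avg(temperature) AS avg_temperature,
                   avg(salinity)    AS avg_salinity,
                   avg(dissolvedo2) AS avg_dissolved_o2
            FROM water_quality_datastore
            GROUP BY sensorid`,
        },
      },
    ],
  }));
}

createAverageReadingsDataset().catch(console.error);
```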

Data Archive and Data Lake

  1.  S3 is the standard choice for building a data lake on AWS. When an IoT rule saves data to S3, the rule can specify the key for the S3 object; in our solution the object key is `${topic()}/${timestamp()}`, so objects are organised by topic and timestamp.
  2.  With AWS Glue, we can generate a database table schema over the whole S3 bucket.
  3.  Using Amazon Athena, we can execute SQL queries against the table registered in the Glue Data Catalog; a sample query is sketched below.
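The original query and its results were shown as a screenshot. The sketch below runs a comparable Athena query through the AWS SDK; the database, table, column and result-bucket names are illustrative.

```javascript
// Sketch: run an Athena query against the Glue Data Catalog table built over the S3 data lake.
// Database, table and output bucket names are illustrative; assumes the AWS SDK for JavaScript v3.
const {
  AthenaClient,
  StartQueryExecutionCommand,
  GetQueryExecutionCommand,
  GetQueryResultsCommand,
} = require('@aws-sdk/client-athena');

const athena = new AthenaClient({ region: 'ap-southeast-2' });

async function queryRecentReadings() {
  // Start the query; Athena writes results to the given S3 output location.
  const { QueryExecutionId } = await athena.send(new StartQueryExecutionCommand({
    QueryString: `
      SELECT sensorid, ph, temperature, salinity, dissolvedo2, timestamp
      FROM iot_sensor_data
      WHERE ph < 6.5
      ORDER BY timestamp DESC
      LIMIT 100`,
    QueryExecutionContext: { Database: 'iot_data_lake' },
    ResultConfiguration: { OutputLocation: 's3://my-athena-results-bucket/' },
  }));

  // Poll until the query finishes (simplified; production code would back off and handle failures).
  let state = 'QUEUED';
  while (state === 'QUEUED' || state === 'RUNNING') {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    const { QueryExecution } = await athena.send(
      new GetQueryExecutionCommand({ QueryExecutionId })
    );
    state = QueryExecution.Status.State;
  }
  if (state !== 'SUCCEEDED') throw new Error(`Athena query finished with state ${state}`);

  return athena.send(new GetQueryResultsCommand({ QueryExecutionId }));
}

queryRecentReadings()
  .then((results) => console.log(JSON.stringify(results.ResultSet.Rows, null, 2)))
  .catch(console.error);
```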