IoT Analytics and Big Data with AWS

Collect, store and visualise/analyse data. Data ingested into AWS IoT can be forwarded to other AWS services by topic rules.



What is...

Exercise: Data Analysis with Amazon Athena


Store sensor data on Amazon S3 with Amazon Kinesis Firehose

To store sensor data in S3 with Amazon Kinesis Firehose, an S3 bucket must be created first. Then a Firehose delivery stream is provisioned, and an IoT topic rule is created to transfer data from AWS IoT into the Amazon Kinesis Firehose stream.
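As a rough illustration of what flows through this pipeline, the sketch below builds one sensor message with the five attributes selected later by the topic rule and shows how a newline separator yields one JSON object per line in S3. The device names, the readings, and the datetime format are made-up assumptions, and `make_message` and `to_firehose_records` are hypothetical helper names.

```python
import json
from datetime import datetime, timezone


def make_message(device, pressure, temperature, humidity, when=None):
    """Build one sensor reading with the attributes the topic rule selects."""
    when = when or datetime.now(timezone.utc)
    return {
        "device": device,
        "pressure": pressure,
        "temperature": temperature,
        "humidity": humidity,
        # Assumed string format; the exercise only says datetime is a string.
        "datetime": when.strftime("%Y-%m-%d %H:%M:%S"),
    }


def to_firehose_records(messages, separator="\n"):
    """Serialize messages as they would land in S3: one JSON object per line."""
    return "".join(json.dumps(m) + separator for m in messages)


if __name__ == "__main__":
    # Made-up sample readings, for illustration only.
    sample = [
        make_message("sensor-01", 1013.2, 21.5, 40.1),
        make_message("sensor-02", 1009.8, 19.7, 55.3),
    ]
    print(to_firehose_records(sample), end="")
```

Newline-delimited JSON of this shape is what a line-oriented JSON reader, such as the table used for querying later, expects to find in the bucket.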

Create an S3 bucket

Go to the AWS S3 console

Create bucket
Enter bucket name -> choose region -> Next
Create bucket
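The same bucket can also be created from a script instead of the console. The following is a minimal sketch using boto3, assuming credentials are already configured; the bucket name and region in the usage note are placeholders, and `bucket_params` and `create_bucket` are hypothetical helper names.

```python
def bucket_params(bucket_name, region):
    """Build the keyword arguments for the S3 create_bucket call."""
    params = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be sent as a constraint.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params


def create_bucket(bucket_name, region):
    """Create the bucket in the given region and return the API response."""
    import boto3  # imported lazily so bucket_params works without boto3

    s3 = boto3.client("s3", region_name=region)
    return s3.create_bucket(**bucket_params(bucket_name, region))
```

A call such as `create_bucket("my-sensor-data-bucket", "eu-west-1")` would then replace the console steps; bucket names are globally unique, so the name has to be your own.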

Create a Kinesis Firehose stream

Go to the AWS Kinesis console

Go to the Firehose console
Create Delivery Stream
Destination: Amazon S3 -> Delivery stream name: sensor-data-to-s3 -> S3 bucket: <the bucket that was created in the previous step> -> Next
Buffer interval: 60 -> IAM role: Create/Update existing IAM role -> choose role -> Next
Policy Name: Create a new Role Policy -> Allow
Create Delivery Stream
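For reference, the delivery stream can be sketched with boto3 as well. Unlike the console wizard, the API does not create the IAM role, so the sketch assumes a role that allows Firehose to write to the bucket already exists and that its ARN is passed in. The 5 MB buffer size is an assumed value, and the helper names are hypothetical.

```python
import time


def delivery_stream_params(stream_name, bucket_name, role_arn, interval_seconds=60):
    """Build keyword arguments for the Firehose create_delivery_stream call."""
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": f"arn:aws:s3:::{bucket_name}",
            # Buffer interval of 60 seconds as in the console steps; size assumed.
            "BufferingHints": {"IntervalInSeconds": interval_seconds, "SizeInMBs": 5},
        },
    }


def create_and_wait(firehose, params, poll_seconds=10, timeout_seconds=600):
    """Create the stream, then poll until its status becomes ACTIVE."""
    firehose.create_delivery_stream(**params)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        desc = firehose.describe_delivery_stream(
            DeliveryStreamName=params["DeliveryStreamName"]
        )
        status = desc["DeliveryStreamDescription"]["DeliveryStreamStatus"]
        if status == "ACTIVE":
            return status
        if status == "CREATING_FAILED":
            raise RuntimeError("delivery stream creation failed")
        time.sleep(poll_seconds)
    raise TimeoutError("delivery stream did not become ACTIVE in time")
```

Here `firehose` would be `boto3.client("firehose")`, and `params` the result of `delivery_stream_params("sensor-data-to-s3", <your bucket>, <your role ARN>)`.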

When the status of the delivery stream becomes ACTIVE, create the IoT topic rule:

Go to the AWS IoT console

Rules -> Create
Name: RuleSensorsToS3 -> Attribute: device, pressure, temperature, humidity, datetime -> Topic filter: sdk/test/Python -> Add action
Send messages to an Amazon Kinesis Firehose stream -> Configure action -> Stream name: sensor-data-to-s3 -> Separator: \n (newline) -> Create a new role -> IAM role name: IoTSensorsToFirehoseRole -> Create a new role -> Choose a role: Choose newly created role -> Update role -> Add action
Create rule
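The rule configured above corresponds roughly to the following boto3 sketch. It assumes the IAM role that lets AWS IoT put records into the delivery stream already exists (the console creates it for you) and that its ARN is supplied by the caller. The helper names are hypothetical.

```python
def rule_sql(attributes, topic_filter):
    """Build the AWS IoT SQL statement from the attribute list and topic filter."""
    return "SELECT {} FROM '{}'".format(", ".join(attributes), topic_filter)


def topic_rule_payload(sql, stream_name, role_arn, separator="\n"):
    """Build the topicRulePayload with a single Firehose action."""
    return {
        "sql": sql,
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {
                "firehose": {
                    "roleArn": role_arn,
                    "deliveryStreamName": stream_name,
                    # Newline separator so each message becomes its own line in S3.
                    "separator": separator,
                }
            }
        ],
    }


def create_rule(iot, rule_name, payload):
    """Create the rule; iot is a boto3 IoT client, e.g. boto3.client("iot")."""
    return iot.create_topic_rule(ruleName=rule_name, topicRulePayload=payload)
```

With the values from the steps above, the SQL comes from `rule_sql(["device", "pressure", "temperature", "humidity", "datetime"], "sdk/test/Python")` and the rule name is `RuleSensorsToS3`.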

Query data stored in S3 with Amazon Athena

To query the sensor data stored in S3, first create a database in Amazon Athena, then create a table that points to the S3 bucket where the data are stored.

Go to the Amazon Athena console

Get Started
Skip the tutorial window
Query Editor: "CREATE DATABASE IF NOT EXISTS iotdata;" -> Run Query
In the left menu under DATABASE select iotdata

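Use the following SQL statement to create a table for the sensor data. Replace the LOCATION placeholder with the bucket that was created earlier; the JSON SerDe shown assumes the messages are stored as one JSON object per line.

CREATE EXTERNAL TABLE IF NOT EXISTS iotdata.sensors (
  device string,
  pressure double,
  temperature double,
  humidity double,
  datetime string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
)
LOCATION 's3://<the bucket that was created earlier>/';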

Now we are ready to query data on S3. Some examples:

SELECT max(humidity) AS max_hum from iotdata.sensors;

SELECT count(*) AS num_datasets from iotdata.sensors;
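Queries like these can also be run from a script. The sketch below uses the boto3 Athena client to start a query, wait for it to finish, and turn the first page of results into dictionaries. The output location is an assumption (Athena needs an S3 path of your choosing for result files), and `run_query` and `rows_to_dicts` are hypothetical helper names.

```python
import time


def rows_to_dicts(rows):
    """Convert Athena result rows; for SELECT queries the first row is the header."""
    cells = [[c.get("VarCharValue") for c in r["Data"]] for r in rows]
    if not cells:
        return []
    header, body = cells[0], cells[1:]
    return [dict(zip(header, row)) for row in body]


def run_query(athena, sql, database, output_location, poll_seconds=1):
    """Run one query and return the first page of results as dictionaries."""
    start = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = start["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError("query ended in state " + state)
        time.sleep(poll_seconds)
    result = athena.get_query_results(QueryExecutionId=query_id)
    return rows_to_dicts(result["ResultSet"]["Rows"])
```

Here `athena` would be `boto3.client("athena")`, `database` would be `iotdata`, and all values come back as strings, so numeric results such as `max_hum` need converting before use.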

Demo for realtime analytics

Demo from the example architecture above.