Local Data Collection and Command Distribution

27 Nov 2021, 00:00

UDP / technical / networking / linux / IoT

I’m revisiting why I don’t like MQTT, and what I am doing about it.

I wrote in June about my thoughts on MQTT. Since then my MQTT broker has gone down again. The Ubuntu host it was running on did some stupid Canonical auto-update and hung with no working network. A different control system was hit this time, though: in June it was the bug zappers that weren’t getting controlled, and this time it was the outside lights. Hello darkness, my old friend. As in, I could not see in my own backyard.

This is not that big a deal, really. Quick fix. But it set off all my design instincts and I wanted to throw things again. Single Point of Failure (SPoF): my whole reliability posture rests on the uptime of a single host. My network is pretty solid (Ubiquiti gear), with more than one AP for coverage, powered by separate PoE switches. My network is redundant. The host running my MQTT broker is my weakest link.

Supposedly there is a commercial broker that has a failover option. So what. I’m not paying for proprietary software to solve this. Besides, I am increasingly convinced that MQTT is simply not the answer. What does it give us? Decoupling, and one-to-many delivery. The sender publishes a message once and any number of subscribers receive it (and if the message is retained, readers that subscribe later still get the last value). For sensor data, readers may take immediate action or shuttle the data off to an indexed data store. For command messages, the device reads the command and acts. The sender and readers are never directly coupled, as they would be in an HTTP request/response.

A Better Solution

I can get the decoupling in a far easier way. I’ve been playing with this idea for a long time and now is the time to really build it out.

Broadcast UDP.

Your reaction to reading that was almost certainly negative. Oh, it’s unreliable. Oh, you have to do extra work. Bah. Let me explain.

Advantage: Lightweight, Low Impact

Sending a broadcast UDP packet is lightweight. The network itself becomes your broker. Yes, because the packets are broadcast, every node on the subnet has to process them. But even if I send one reading per SECOND, it’s really no traffic at all. Let’s say I have 50 devices (I don’t). 50 packets per second at 1K payload each is basically nothing. On my network I have a VLAN set up just for my IoT devices, so a broadcast only reaches the devices on that VLAN, which makes the impact even smaller.
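Putting a rough number on “basically nothing,” assuming 1K means 1,024 bytes of payload and ignoring the 42 bytes of Ethernet (14), IPv4 (20) and UDP (8) headers each packet also carries:

```latex
\[
50\ \tfrac{\text{packets}}{\text{s}} \times 1024\ \tfrac{\text{bytes}}{\text{packet}} \times 8\ \tfrac{\text{bits}}{\text{byte}} = 409{,}600\ \text{bit/s} \approx 0.41\ \text{Mbit/s}
\]
\[
\frac{0.41\ \text{Mbit/s}}{100\ \text{Mbit/s}} \approx 0.41\%, \qquad \frac{0.41\ \text{Mbit/s}}{1000\ \text{Mbit/s}} \approx 0.041\%
\]
```

Adding the headers back in (about 4% more per packet) still leaves the load well under 1% of even a 100 Mbit/s link.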

If you have been reading my blog, you know that I’ve done a lot of playing around with ESP8266 and ESP32 these past few years. I’m moving away from those in favor of actual Linux boards (like the Beaglebone that I wrote about recently). But I am also super fond of Tasmota. It’s written in the Arduino framework, but it’s very lightweight. I bet that I can wedge UDP support into even that code base; UDP requires almost no code and it’s easy.

Risk: Data Loss

Some UDP packets may get lost. Not likely on my pretty good network, but yes, they might. Even with 3% packet loss I wouldn’t be worried. If I am sending a packet per second and lose 2 seconds worth of data, it’s not going to change anything. And as I explain below, losing a command packet won’t matter either, because the declarative control system will fix it. That’s what it’s designed to do.
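A quick sanity check on that 3% figure at one reading per second, assuming losses are independent (real networks tend to drop packets in bursts, so treat this as a rough guide rather than a prediction):

```latex
\[
\text{lost readings per hour} = 0.03 \times 3600 = 108
\]
\[
P(\text{two consecutive readings lost}) = 0.03^2 = 9 \times 10^{-4}, \qquad 3599 \times 9 \times 10^{-4} \approx 3.2\ \text{per hour}
\]
\[
P(\text{three consecutive readings lost}) = 0.03^3 = 2.7 \times 10^{-5}, \qquad 3598 \times 2.7 \times 10^{-5} \approx 0.1\ \text{per hour}
\]
```

So two-second gaps would show up a few times an hour and three-second gaps roughly once every ten hours, which for slowly changing readings like temperature is noise.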

Prove It: UDP Code is Easy

I pulled together a simple C file yesterday while watching TV: 63 lines of code for a CLI program that sends a broadcast packet. Read the code here.

beaglebone:~> ./sendudp 8000 "This is just a test"

And the output in tcpdump:

mars:~> sudo tcpdump -i enxd0374505d925 -n "broadcast" and port 8000 -s 1024 -X
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enxd0374505d925, link-type EN10MB (Ethernet), capture size 1024 bytes
06:51:50.676886 IP 192.168.2.104.43532 > 255.255.255.255.8000: UDP, length 19
    0x0000:  4500 002f 99db 4000 4011 ddd2 c0a8 0268  E../..@.@......h
    0x0010:  ffff ffff aa0c 1f40 001b 29c3 5468 6973  .......@..).This
    0x0020:  2069 7320 6a75 7374 2061 2074 6573 7400  .is.just.a.test.
    0x0030:  0000                                     ..
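The original 63-line program is linked rather than reproduced here, so what follows is only a minimal sketch of what a sender like `sendudp` can look like; the function names `send_udp` and `send_broadcast` are hypothetical. The essential detail is `SO_BROADCAST`: Linux rejects a `sendto` to a broadcast address with `EACCES` unless that option is set. The dump above also shows why no trailing NUL needs to be sent: after the 20-byte IPv4 header, `1f40` is destination port 8000 and `001b` is the UDP length, 27 bytes, which is the 8-byte UDP header plus the 19 characters of the message.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send msg as one UDP datagram to ip:port with broadcast permitted.
   Returns the number of bytes sent, or -1 on any error. */
ssize_t send_udp(const char *ip, unsigned short port, const char *msg)
{
    struct sockaddr_in to;
    memset(&to, 0, sizeof to);
    to.sin_family = AF_INET;
    to.sin_port = htons(port);                 /* network byte order: 8000 -> 1f40 */
    if (inet_pton(AF_INET, ip, &to.sin_addr) != 1)
        return -1;                             /* not a dotted-quad IPv4 address */

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Without SO_BROADCAST, Linux fails a send to a broadcast address with EACCES. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }

    /* Send strlen(msg) bytes, no trailing NUL: the UDP length field carries the size. */
    ssize_t n = sendto(fd, msg, strlen(msg), 0, (struct sockaddr *)&to, sizeof to);
    close(fd);
    return n;
}

/* Hypothetical equivalent of `./sendudp 8000 "..."`: limited broadcast on the local subnet. */
ssize_t send_broadcast(unsigned short port, const char *msg)
{
    return send_udp("255.255.255.255", port, msg);
}
```

A `main` that parses `argv[1]` as the port and passes `argv[2]` as the message would reproduce the command line shown above. On a host with several interfaces, the limited broadcast address 255.255.255.255 typically leaves through whichever interface the routing table picks, so a subnet broadcast address such as 192.168.2.255 (assuming a /24) can be the more predictable choice.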

Why in C?

I tried to do this simple example in TypeScript, and even in plain JavaScript. Wow. The Beaglebone is really low power, and it’s slow. Developing on it with a scripting language is painful, and I can only see doom in my future if I try to do multiple things on it in a scripting language. C is still my favorite language, and it’s showing why again: it was easy to build, it’s insanely fast, and it was fun to write.

Also, this approach adheres to the Unix Philosophy. You should stop and go read that. Seriously. I’ll wait. We need more of that thinking these days.

How Does this Solve Redundancy?

All by itself, it doesn’t. What do I have now with MQTT? A bunch of devices, and a broker. The new model is the same, with the network itself being the broker. Only now I can add a second program to read the packets, ideally on a different host, and it operates completely independently of the first.

Listeners

For data coming from sensors it’s simple. The sensor broadcasts its data, and both listeners “hear” it and log it away where it needs to go (I’m thinking of using Amazon CloudWatch for this, or maybe Amazon Timestream). Either way, I can run duplicate listeners and decide upstream whether to keep duplicate data sets or filter out duplicates after collection.
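A listener is the mirror image of `sendudp`. This sketch (again with hypothetical names, `open_listener` and `recv_reading`) binds to the port on all interfaces, which on Linux also receives datagrams sent to 255.255.255.255. Setting `SO_REUSEADDR` lets a second listener on the same host bind the same port, and on Linux each socket bound that way gets its own copy of every broadcast; listeners on separate hosts need nothing special at all.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Bind a UDP socket to port on every interface. Returns the fd, or -1 on error. */
int open_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Allow other listeners on this host to bind the same port; on Linux each
       such socket receives its own copy of every broadcast datagram. */
    int on = 1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* also receives 255.255.255.255 */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until one datagram arrives, NUL-terminate it in buf, and write the
   sender's IPv4 address into src. Returns the payload length, or -1 on error.
   Datagrams longer than len - 1 bytes are truncated. */
ssize_t recv_reading(int fd, char *buf, size_t len, char *src, size_t srclen)
{
    struct sockaddr_in from;
    socklen_t fromlen = sizeof from;
    if (len == 0)
        return -1;
    ssize_t n = recvfrom(fd, buf, len - 1, 0, (struct sockaddr *)&from, &fromlen);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    if (srclen > 0 && inet_ntop(AF_INET, &from.sin_addr, src, (socklen_t)srclen) == NULL)
        src[0] = '\0';
    return n;
}
```

A collector would call `open_listener(8000)` once and then loop on `recv_reading`, forwarding each payload to CloudWatch, Timestream, or wherever it needs to go; that forwarding step is deliberately left out here.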

Actuators

Sending commands to actuators as UDP packets is equally easy. I can have more than one command node and it will “just work.” Today I have a cron job sending MQTT messages to the broker at specified times. It’s not ideal. Not only is the broker a SPoF, as I mentioned above, but as a control system it’s really too simple. What I want is a declarative system where I say “I want this device to be in this state over these periods” and the controller actively checks the device state and commands it into that state if it isn’t already there. This also solves another problem with UDP: a lost command is no big deal, because the next time the controller checks the device and finds it isn’t in the desired state, it will send the command again. Self-healing (this is how Kubernetes works, by the way). This is theoretically possible on top of MQTT, but then why do I need that complex message bus?
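The declarative loop itself can be sketched in a few lines. Everything here is hypothetical: the post doesn’t define a schedule format, so this assumes time windows in minutes after midnight (allowed to wrap past midnight, like outdoor lights from 18:00 to 06:00), and it assumes the controller learns each device’s observed state from the device’s own broadcasts.

```c
#include <stddef.h>

/* Hypothetical types: the post does not define a schedule format. */
typedef enum { STATE_OFF = 0, STATE_ON = 1 } dev_state;

typedef struct {
    int start_min;    /* minutes after midnight, inclusive */
    int end_min;      /* minutes after midnight, exclusive; may be < start_min to wrap */
    dev_state state;  /* desired state while inside the window */
} window;

static int in_window(const window *w, int now_min)
{
    if (w->start_min <= w->end_min)
        return now_min >= w->start_min && now_min < w->end_min;
    return now_min >= w->start_min || now_min < w->end_min;  /* wraps past midnight */
}

/* Desired state at now_min: the first matching window wins, otherwise fallback. */
dev_state desired_state(const window *sched, size_t n, int now_min, dev_state fallback)
{
    for (size_t i = 0; i < n; i++)
        if (in_window(&sched[i], now_min))
            return sched[i].state;
    return fallback;
}

/* One reconcile pass. Returns 1 and sets *cmd when a command must be sent,
   0 when the device already matches. Because every pass re-checks, a lost
   UDP command is simply re-sent on the next pass. */
int reconcile(dev_state observed, dev_state desired, dev_state *cmd)
{
    if (observed == desired)
        return 0;
    *cmd = desired;
    return 1;
}
```

A controller would run this on a timer (every 30 seconds, say): compute `desired_state` for each device, call `reconcile` against the last state it heard, and broadcast the command whenever `reconcile` returns 1. There is no retry logic anywhere; a dropped packet is just a pass that hasn’t converged yet.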

Addressing

The astute reader will wonder “what about addressing?” The UDP broadcast goes to everything, so each device needs a way to determine “is this for me?” Today every device already has a name on MQTT, so I don’t think it would be too hard to use that name to address directed messages. The data is there to use; it’s just a matter of deciding on the formatting.
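One possible convention, purely as an illustration: prefix each directed message with the device name and a space, as in `backyard-lights ON`. The format and the `command_for` name are assumptions (the post leans toward JSON later on); the point is only that the receiver must match the whole name, so `backyard-lights` doesn’t also react to messages meant for `backyard-lights-2`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical addressing check: payload is "<device-name> <command>".
   Returns a pointer to the command text if the message is addressed to
   my_name, or NULL if it is meant for some other device. */
const char *command_for(const char *payload, const char *my_name)
{
    size_t n = strlen(my_name);
    /* Match the full name and require a space right after it, so a device
       named "lights" doesn't accept messages for "lights-2". */
    if (strncmp(payload, my_name, n) != 0 || payload[n] != ' ')
        return NULL;
    return payload + n + 1;
}
```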

Security

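MQTT supports options for encryption and authentication. Those are easy to add to UDP too. Each packet could be encrypted or signed (HMAC) with a shared key, giving protection comparable to what MQTT offers today. Two caveats: a signature alone authenticates a packet but doesn’t hide its contents, and since anyone on the VLAN can capture and re-send a broadcast, the signed payload should include a counter or timestamp so receivers can reject replays.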
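Signing is the simpler of the two options. Below is a self-contained HMAC-SHA256 sketch (the RFC 2104 construction) with hypothetical function names. SHA-256 is inlined only so the example compiles without dependencies; a real deployment should call a vetted library such as OpenSSL or libsodium instead. The sketch assumes the sender appends the 32-byte tag to the payload, and that payloads stay under a hypothetical 512-byte cap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SHA-256 round constants (FIPS 180-4). */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

#define ROR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define MAX_PAYLOAD 512 /* hypothetical cap on the signed payload; keeps buffers on the stack */

/* Compress one 64-byte block into the running hash state h. */
static void sha256_block(uint32_t h[8], const uint8_t *p)
{
    uint32_t w[64], v[8];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
               (uint32_t)p[4 * i + 2] << 8 | p[4 * i + 3];
    for (int i = 16; i < 64; i++)
        w[i] = w[i - 16] + w[i - 7] +
               (ROR(w[i - 15], 7) ^ ROR(w[i - 15], 18) ^ (w[i - 15] >> 3)) +
               (ROR(w[i - 2], 17) ^ ROR(w[i - 2], 19) ^ (w[i - 2] >> 10));
    memcpy(v, h, sizeof v);                      /* v = a, b, c, d, e, f, g, h */
    for (int i = 0; i < 64; i++) {
        uint32_t t1 = v[7] + (ROR(v[4], 6) ^ ROR(v[4], 11) ^ ROR(v[4], 25)) +
                      ((v[4] & v[5]) ^ (~v[4] & v[6])) + K[i] + w[i];
        uint32_t t2 = (ROR(v[0], 2) ^ ROR(v[0], 13) ^ ROR(v[0], 22)) +
                      ((v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]));
        memmove(v + 1, v, 7 * sizeof v[0]);      /* h=g, g=f, ..., b=a */
        v[4] += t1;                              /* e = d + t1 */
        v[0] = t1 + t2;                          /* a = t1 + t2 */
    }
    for (int i = 0; i < 8; i++) h[i] += v[i];
}

/* One-shot SHA-256: pad with 0x80, zeros, then the 64-bit big-endian bit length. */
void sha256(const uint8_t *msg, size_t len, uint8_t out[32])
{
    uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                     0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    uint8_t block[64];
    size_t total = (len + 9 + 63) / 64 * 64;     /* padded length, a multiple of 64 */
    for (size_t off = 0; off < total; off += 64) {
        for (size_t i = 0; i < 64; i++) {
            size_t p = off + i;
            block[i] = p < len ? msg[p]
                     : p == len ? 0x80
                     : p >= total - 8 ? (uint8_t)(((uint64_t)len * 8) >> (8 * (total - 1 - p)))
                     : 0;
        }
        sha256_block(h, block);
    }
    for (int i = 0; i < 32; i++) out[i] = (uint8_t)(h[i / 4] >> (24 - 8 * (i % 4)));
}

/* HMAC-SHA256: H((K ^ opad) || H((K ^ ipad) || msg)). Returns -1 if msg is too long. */
int hmac_sha256(const uint8_t *key, size_t klen, const uint8_t *msg, size_t mlen,
                uint8_t tag[32])
{
    uint8_t k[64] = {0}, buf[64 + MAX_PAYLOAD], inner[32];
    if (mlen > MAX_PAYLOAD) return -1;
    if (klen > 64)
        sha256(key, klen, k);                    /* long keys are hashed first */
    else
        memcpy(k, key, klen);
    for (int i = 0; i < 64; i++) buf[i] = k[i] ^ 0x36;
    memcpy(buf + 64, msg, mlen);
    sha256(buf, 64 + mlen, inner);
    for (int i = 0; i < 64; i++) buf[i] = k[i] ^ 0x5c;
    memcpy(buf + 64, inner, 32);
    sha256(buf, 64 + 32, tag);
    return 0;
}

/* Receiver side: recompute and compare every byte, so timing doesn't reveal where a forgery differs. */
int hmac_verify(const uint8_t *key, size_t klen, const uint8_t *msg, size_t mlen,
                const uint8_t tag[32])
{
    uint8_t expect[32], diff = 0;
    if (hmac_sha256(key, klen, msg, mlen, expect) != 0) return 0;
    for (int i = 0; i < 32; i++) diff |= expect[i] ^ tag[i];
    return diff == 0;
}
```

The sender would transmit the payload followed by the tag; a receiver splits off the last 32 bytes, calls `hmac_verify` with the shared key, and drops the packet if it returns 0.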

Tasmota Changes?

I am very fond of Tasmota, and especially of the Shelly 1. Today Tasmota takes commands over serial, HTTP, or MQTT. To make my dream system come true I’ll have to look at adding a UDP option.

Incremental Development?

Some new ideas are basically “all or nothing,” and you find yourself in a chicken-and-egg scenario. This one isn’t: I can start the UDP solution with my swimming pool sensor problem. I’m already working on using a Beaglebone Black to collect all that sensor data, so I can start there and migrate more devices as I make progress with Tasmota.

Practical Example

On my Beaglebone I wrote a script that reads a temperature from hwmon and broadcasts it:

beaglebone:~> cat ./sendtemp.sh
#!/bin/sh
# temp1_input is in millidegrees Celsius (e.g. 23437 means 23.437)
TEMP=$(cat /sys/class/hwmon/hwmon0/temp1_input)
TEMPC=$(awk -v t="$TEMP" 'BEGIN { print t / 1000 }')
echo "$TEMPC"
./sendudp 8000 "$TEMPC"

and if you run it you can get the data over the network:

mars:~> sudo tcpdump -i enxd0374505d925 -n "broadcast" and port 8000 -s 1024 -X
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enxd0374505d925, link-type EN10MB (Ethernet), capture size 1024 bytes
08:26:56.897086 IP 192.168.2.104.51066 > 255.255.255.255.8000: UDP, length 6
	0x0000:  4500 0022 9b82 4000 4011 dc38 c0a8 0268  E.."..@.@..8...h
	0x0010:  ffff ffff c77a 1f40 000e c268 3233 2e34  .....z.@...h23.4
	0x0020:  3337 0000 0000 0000 0000 0000 0000 0000  37..............
	0x0030:  0000                                     ..

Next Steps

Now I need to write a data collector for the UDP packets and come up with a common payload format (one that includes addressing and multiple data elements). I’m leaning towards a JSON payload format, to be honest, and then using Node.js for the controller. The host for the controller software will most likely be an Intel NUC, or maybe another lower-end Linux box like a Beaglebone. The NUCs are wonderful, but they use about 14W last time I checked. A Beaglebone supposedly uses 6W. I don’t have my power meter handy (I had to order a new one), so I’m going off the spec sheet. As I dial my power use back more and more, I may just shift to Beaglebones for these tasks.
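Since the payload format is still open, here is one hypothetical shape a device could send, built with plain `snprintf`: `{"device":"pool","data":{"temp_c":23.437,"ph":7.2}}`. The field names, the `format_payload` helper and the two-level layout are all assumptions. It does no JSON string escaping, so it assumes device names and keys are plain identifiers, and it doesn’t guard against NaN, which `%g` would print as invalid JSON.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reading: one named numeric value. */
typedef struct {
    const char *key;
    double value;
} field_t;

/* Write {"device":"<device>","data":{"<key>":<value>,...}} into buf.
   Returns the length written, or -1 if buf is too small. Keys and the
   device name are NOT escaped: they must be plain identifiers. */
int format_payload(char *buf, size_t len, const char *device, const field_t *f, size_t n)
{
    int w = snprintf(buf, len, "{\"device\":\"%s\",\"data\":{", device);
    if (w < 0 || (size_t)w >= len)
        return -1;
    size_t used = (size_t)w;
    for (size_t i = 0; i < n; i++) {
        /* %g drops trailing zeros: 23.437 -> "23.437", 7.2 -> "7.2" */
        w = snprintf(buf + used, len - used, "%s\"%s\":%g", i ? "," : "", f[i].key, f[i].value);
        if (w < 0 || (size_t)w >= len - used)
            return -1;
        used += (size_t)w;
    }
    w = snprintf(buf + used, len - used, "}}");
    if (w < 0 || (size_t)w >= len - used)
        return -1;
    return (int)(used + (size_t)w);
}
```

A payload like this stays far below the 1K per packet budgeted earlier, and the `device` field doubles as the address for directed messages.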

At a high level the controller won’t be all that hard. It will basically listen for data packets and forward them off, probably to Amazon CloudWatch. I did a lot of work with Prometheus a few years back, but I really don’t like its pull model of data collection; it’s the opposite of what I am doing in my new approach. So the aggregation/visualization part still needs some thought.