RackRanger
Server Rack Monitoring with microcontrollers.
Miika, Justus, Valentin and I built RackRanger as a challenge project for HackUPC 2024, participating in the Grafana challenge. As you might know, I use Grafana, so I was quite eager to see them when the sponsors were announced. The first thing I did, before the hackathon even started, was visit their booth. They were super friendly, and they had lots of hardware at the booth! A multimeter, a scope, a Pinecil and a bunch of other cool stuff, which got me hyped.
Grafana asked us for an IoT observability project, and they even provided some hardware: an ESP32, a breadboard, jumpers, LEDs and a bunch of sensors. On top of that, apart from the European Space Agency’s challenge (which btw, my friend arf won!! I’m so proud of him!!), all the others involved AI, which none of us really wanted to work with. The whole team was set on Grafana, so we jumped in.
We started brainstorming instantly, coming up with ideas, debating them and refining them. I’ve got to mention, Justus is super experienced and his goal was to win, so it was quite different from my last experience. Last time I was with a bunch of friends whose goal was to have as much fun as possible. We found an idea that seemed fun and doable enough, so we didn’t think any further. This year, we really put a lot of effort into the ideation phase. We wanted something unique, useful and impressive. We settled on server rack environment monitoring - observing simple stuff like the temperature inside the rack, but also more interesting metrics like noise levels, which could help detect drive or fan failures as soon as possible. We also added tamper detection, which you can see a video demonstration of here:
Yeah, Justus built an awesome rack mock-up with cardboard x3. I had to go to an office supply store to get tape, a marker and some scissors for it. But anyway - as you can see, not only are we graphing all those metrics with Grafana, but we also have alert logs with Loki and notifications on chat services with Grafana alerts. Just as an extra, we added infrared level monitoring (maybe to detect fires early) and humidity, just because the temperature sensor included it. We did what we could with the sensors Grafana gave us! But other sensors could also be added to monitor stuff like power consumption or vibration of the rack. Here’s a picture of the project homepage with the Grafana dashboard:
Not much, but for the hardware we had and for demo purposes, it’s fair enough.
You may have noticed - there’s more than one ESP32 here. And that’s right, I took advantage of the trip to the office supply store to pass by the train station, where my flatmate brought me a box with some of my hardware. By building more “monitors”, we could show the judges how easy it is to scale. We could add as many “monitors” to the Prometheus job as we wanted, and the Grafana dashboard would automatically split the panels to show the data from all the monitors.
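To give a rough idea of what "adding a monitor to the Prometheus job" can look like, here is a minimal Python sketch that builds a scrape configuration with one job and a list of targets. It is not our actual config: the job name `rackranger`, the IP addresses and the helper names `build_scrape_config` and `add_monitor` are all made up for illustration. The keys themselves (`scrape_configs`, `job_name`, `scrape_interval`, `static_configs`, `targets`) are standard Prometheus configuration fields.

```python
import json


def build_scrape_config(job_name, targets, interval="15s"):
    """Return a Prometheus-style scrape config: one job covering every monitor."""
    return {
        "scrape_configs": [
            {
                "job_name": job_name,
                "scrape_interval": interval,
                # Each monitor is just one more "host:port" entry in this list.
                "static_configs": [{"targets": sorted(targets)}],
            }
        ]
    }


def add_monitor(config, target):
    """Add one more monitor to the first job, ignoring duplicates."""
    targets = config["scrape_configs"][0]["static_configs"][0]["targets"]
    if target not in targets:
        targets.append(target)
    return config


if __name__ == "__main__":
    # Hypothetical addresses for two ESP32 monitors on the local network.
    config = build_scrape_config("rackranger", ["192.168.1.50:80", "192.168.1.51:80"])
    add_monitor(config, "192.168.1.52:80")  # scaling up is a one-line change
    print(json.dumps(config, indent=2))
```

Since every monitor lives under the same job, a dashboard that groups by the target label can pick up new monitors without any changes on the Grafana side.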
Miika and I ended up building the firmware with PlatformIO. The idea was to make it with Rust and Embassy, but I just couldn’t get it to work. Compiling for Xtensa (the ESP32’s architecture) was an absolute pain in the bum, and after hours and hours of fighting with cross-compilation, the mentors told me to cut my losses and try something else. While I was the one who built most of the firmware, I’ve really got to thank Miika. When I was exhausted from working on the firmware and just really really didn’t feel like implementing alert pushes to Loki, he went ahead and, despite never really having worked with microcontrollers, implemented it himself. Meanwhile, I took a break from code and did some small things here and there, like this graphic for the website:
As you can tell, it’s a diagram explaining how all the components integrate.
We have many “monitors” (microcontrollers, MCUs) in each datacenter that all get scraped by a single lightweight Linux server (like a single-board computer) running Prometheus. That keeps the MCU-to-database communication confined to the local network, so we didn’t have to worry about making the MCUs super secure. Then, Miika set up communication from the databases to a remote server running Grafana with a tunnel/reverse proxy, keeping the communication secure. He also made everything super easy to deploy by writing an Ansible playbook that defines the whole Docker stack.
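For anyone wondering what Prometheus actually reads when it scrapes a monitor: it fetches a plain-text page in the Prometheus text exposition format, one line per sample. The sketch below renders such a page in Python purely to illustrate the format. Our real firmware was built with PlatformIO, not Python, and the metric names, the `monitor` label and the `render_metrics` helper are hypothetical.

```python
def render_metrics(monitor_id, readings):
    """Render sensor readings as Prometheus text exposition lines (all gauges)."""
    lines = []
    for name, value in sorted(readings.items()):
        metric = f"rack_{name}"  # hypothetical naming scheme
        lines.append(f"# TYPE {metric} gauge")
        # The label tells the dashboard which monitor a sample came from.
        lines.append(f'{metric}{{monitor="{monitor_id}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up readings for a single monitor.
    page = render_metrics(
        "rack-a",
        {"temperature_celsius": 24.5, "humidity_percent": 41.0},
    )
    print(page, end="")
```

Because the monitor only has to serve this text over HTTP inside the local network, all the heavier lifting (storage, queries, alerting) stays on the server side.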
The hardest lesson I learnt was knowing when to swallow my pride, cut my losses and start over. It was thanks to the mentors that I realized I was digging myself into a hole and that I had to try something else.
One thing that had me bummed was coming in a close second. The winner of the Grafana challenge wasn’t on the Devpost, and we were pretty sure we were going to win. But we weren’t called to the stage. A way less complex but more creative project won. The people judging for Grafana contacted us and told us we were neck and neck, and that they really liked our project. I’d almost rather not have known. I think knowing we were thaaat close made me more bummed out. But oh well, it’s over now.