2025/07/31

What I did in my Chicago Years - BigData

Chicago was hard, but it really reconnected me to what I love.

On one gig, I worked at a fairly new startup. My years of real-time market systems mattered here: they were taking in near-real-time IoT data, and at a million messages per minute on my starting day, it was going to be entertaining.

I was working under one of the CTO's leads, but I was there to troubleshoot, and I loved the relationship I had with him.

Most of it was built as a multi-stage pipeline, and I was mostly involved with the front-end ingestion, before anything got passed to the analytics engine.

The front end was based on Apache NiFi. I have no idea where that product wound up, but it was designed to take in everything from a UDP feed to scanning FTP sites every minute to grab more data if it had already run out.

NiFi would then let you do decent filtering and transformation. But it was designed to scale for both raw ingestion and ETL, so you could shift workloads and expand capacity on demand. It had a module that would reconfigure AWS EC2 scaling to provision what it wanted rather than praying AWS autoscaling rules could figure it out.

It could shift resources dynamically between ingestion and ETL because it decided where the code ran, and if it needed more it would bring more cluster members online, though a new member wasn't going to be available within the next minute, more like five.

So I worked with two teams on NiFi: the one tuning the ingestion front end and the one handling the ETL stuff. I was always bouncing between them, because once we fixed the latest bottleneck it was on to the next thing needing attention.

Then there was the BigData team. I had worked with the lead at another gig, so we had a decent friendship already, and he had built a great team. They were storing data in Cassandra, running a 2400-node, 800-shard environment just for production.

It wasn't just the data, because on the same nodes ran the infrastructure for Apache Spark. I forget the name of the more generic worker model, but Spark was a functional programming model over languages like Python, C++, or a few others.

So I reached into the back shelf of my mind for my C++ skills, plus everything I had learned at the previous gig from the same man about functional programming, and put it to use in a highly distributed environment where your code is brought to the data rather than the data being brought to the code.

It was a fun learning experience structuring the data to best fit the analytics patterns.
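
To give a flavor of that style, here's a minimal PySpark sketch. The fields and the filter/rollup are invented for illustration; it's the shape of the work, not the actual production code:

# Minimal sketch of the bring-the-code-to-the-data style in PySpark.
# Fields, values, and the rollup are invented for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("synthetics-sketch").getOrCreate()

# Stand-in for the raw IoT readings; in the real system these partitions
# lived in Cassandra, co-located with the Spark workers.
readings = spark.createDataFrame(
    [("sensor-1", 1012, 72.5), ("sensor-1", 1013, 73.1), ("sensor-2", 1012, 68.0)],
    ["device_id", "minute", "value"],
)

# The functions ship to the executors that hold the data; only the small
# rolled-up "synthetic" result comes back.
synthetics = (
    readings
    .filter(readings.value > 70.0)      # drop readings below a threshold
    .groupBy("device_id", "minute")     # roll up per device, per minute
    .avg("value")
)

synthetics.show()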

The resulting synthetics were stored in another dataset within Cassandra, but on a smaller subset of nodes, so we could improve performance for the next layer downstream.

The downstream workload ran on Mesosphere, which became DC/OS; it was a container scheduler like AWS ECS that also handled all of the EC2 build details. Like NiFi, it could adjust cluster size based on workload.

I was mostly involved at different levels with each of the teams in that layer. Some were well skilled,

2025/01/25

My journey with HomeAssistant

In the early '10s I decided to switch from having an x86 desktop to #LivinOnAPiLife. Most of my work life has me on cloud apps anyway, plus a few installed apps and the command shell. I've always been a Linux geek, and I abandoned Windows for the most part (gaming/3D printing being the other part).

I bought a bunch of the early stuff back in the RPi2B days when this started, tried things like the Pine64 until the RPi3 became my platform for a few years, and the last one was my RPi4/8GB.

But throughout the adventure, speed/capability was rarely the issue. Getting a tool like FreeCAD installed could be a challenge, and even when installed, the Linux version was always missing some feature that worked just fine on Windows.

Plus, my favorite space sim (Elite Dangerous) kinda abandoned the PS4/5, my clan moved to PC, and they've been bugging me to join them.

So I have an HP mini running Windows 10 for that. After installing an NVMe drive and swapping the HDD for an SSD, it's got enough speed and GPU for what I need.

When I set up my first Home Assistant rig, I had an unused but decent non-RPi board, the Pine A64. Great little device in the years between the RPi2 and 3. I'm a computer geek anyway, and while there's convenience in using HASS and just flashing an image to the SD card, I decided to go the Docker container route.

A major advantage of this, being able to just change the image tag to specify what version I want to run, became my saving grace in Dec '24 when HA blew the community up with some breaking changes.

I knew I was going to make enough changes that the OS and I would eventually get a divorce. So I built using DietPi (a great no-GUI variant with text-based config tools).

Launching the container in network = host mode eliminated 90% of future issues and a bunch of annoying things. 

Other than mapping the ConBee2 USB device through, it was straightforward. I'm no stranger to the CLI, so after sorting out permissions, mapping container paths to the file system, and setting up backups to the NAS, I got down to building my first HA.
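
For flavor, here's roughly what that launch looks like, sketched with the Docker SDK for Python (back then it was literally a shell script around docker run). The image tag, config path, and device node are illustrative, not my exact setup:

# Sketch of the Home Assistant container launch via the Docker SDK for
# Python. Image tag, config path, and device node are illustrative only.
import docker

client = docker.from_env()

client.containers.run(
    "ghcr.io/home-assistant/home-assistant:2024.11.3",  # pin a tag, never :latest
    name="homeassistant",
    network_mode="host",                            # the network = host trick
    volumes={"/srv/ha/config": {"bind": "/config", "mode": "rw"}},
    devices=["/dev/ttyACM0:/dev/ttyACM0"],          # the ConBee2 stick passed through
    restart_policy={"Name": "unless-stopped"},
    detach=True,
)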

Eventually I decided that the native Zigbee integration (ZHA) didn't give me the control I desired, so I decided to move to Zigbee2MQTT. I love MQTT, and I thought it would be the right next move.

Then we moved our home, and it was just easier to start over, so Next Gen was launched.

I used Eclipse Mosquitto over RabbitMQ for the broker. RMQ is amazing for scalability, but its config just annoys me, and I don't have hundreds of things connecting to it.

So I had to deploy a few more containers. I switched from a shell script with a docker run command to docker-compose and launched the other two containers. 

I quickly stopped letting the other two run in bridge mode, because localhost ain't what you think it is inside a bridge network.

Got z2m talking to MQTT and then paired a bulb. After that, I installed the MQTT integration into HA and the new light was born. I quickly discovered the power of groups in z2m.

You tell me where in your home you don't have two or more bulbs that 90% of the time you just want in sync. Yes, ZHA has the ability to do some Zigbee magic, but it's just so much easier in z2m.
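
Everything in z2m is driven over MQTT topics, so you can script all of this too. Here's a minimal sketch with paho-mqtt; the broker address, group, and device names are made up, and the bridge topics are from my memory of the z2m API, so double-check them against your version:

# Sketch of scripting Zigbee2MQTT over MQTT with paho-mqtt. Names are made
# up and the bridge topics are from memory of the z2m API, so verify them.
import json
import paho.mqtt.client as mqtt

client = mqtt.Client()  # paho-mqtt 2.x wants mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.connect("192.168.1.10", 1883)  # the Mosquitto broker

# Create a group and add two bulbs to it.
client.publish("zigbee2mqtt/bridge/request/group/add",
               json.dumps({"friendly_name": "kitchen_lights"}))
for bulb in ("kitchen_bulb_1", "kitchen_bulb_2"):
    client.publish("zigbee2mqtt/bridge/request/group/members/add",
                   json.dumps({"group": "kitchen_lights", "device": bulb}))

# One publish now drives every bulb in the group in sync.
client.publish("zigbee2mqtt/kitchen_lights/set",
               json.dumps({"state": "ON", "brightness": 128}))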

So I grouped up everything and hid the individual devices, but didn't disable them, because in my overnight routine I shut just one of them off in places; they just don't show in cards etc. by default. I made a few more tweaks to improve reporting, and after a few weeks I had the basics of our home back up and running.

At this point I started down the ESPHome approach again. Before moving I had built a few gadgets, including a 3-speed fan controller for some box fans. Now, with a Windows PC, I could get into some PCB design; I came up with a modular board and got 30 of them from JLCPCB for under 10 bucks. It can directly drive up to RGBWW strips or control addressable strings like the WS28xx, plus handle some I2C or simple sensors. Armed with a bunch of new devices, I started adding new things into my HA environment.

But the whole "all of my services on a single node sharing the same IP address" thing started to become complicated. So I did some research into how to split my containers out and give them each their own IP address. Down the rabbit hole I went, and I returned with macvlan.

I created another docker-compose file to build my networks and then changed my main file to assign IP addresses to each container.
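
For the shape of it, here's the same idea sketched with the Docker SDK for Python instead of compose. The parent NIC, subnet, and addresses are examples, not my real network:

# Sketch of a macvlan network with a per-container IP via the Docker SDK
# for Python (my real setup does this in docker-compose). The parent NIC,
# subnet, and addresses are examples only.
import docker

client = docker.from_env()

lan = client.networks.create(
    "lan",
    driver="macvlan",
    options={"parent": "eth0"},  # host NIC the container MACs hang off
    ipam=docker.types.IPAMConfig(
        pool_configs=[docker.types.IPAMPool(
            subnet="192.168.0.0/22",
            gateway="192.168.0.1",
        )],
    ),
)

# The broker gets its own LAN address, visible to every other host.
mosquitto = client.containers.create("eclipse-mosquitto:2", name="mosquitto")
lan.connect(mosquitto, ipv4_address="192.168.2.10")
mosquitto.start()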

This brings us to the modern day. It's time to convert off this fragile Pi (yes, I have a spare if it breaks). I need a new NAS, and I want a place to host Plex and Frigate with hardware acceleration for object detection, so I'm in the process of building a HomeLab.

Once the new gear is online, I'm going to migrate those containers to run over on the HomeLab, but still keep the Docker container approach.

For the HomeLab I've decided on the HP 800 mini chassis, which allows for 2x 3.5" and 1x 2.5" drives plus two M.2 slots (short and long), with 32GB RAM on a quad-core Gen 7 x86.

Each node gets 32GB RAM, a 256GB NVMe, a 512GB 2.5" SSD, and 2x 12TB HDDs, plus a 2.5GbE NIC in addition to the onboard 1GbE. Two of the three get a Coral TPU in the other M.2 slot for Frigate (though one is really just a hot spare; Frigate is confined to one of those two nodes for execution).

I built three of these plus a 4-port switch for less than $1,000, with $600 of that just for the HDDs.

My lab is based on Proxmox, and I decided to use Ceph, the distributed storage system, to mirror the disks between hosts, so not only do I have HDD protection, but if something happens to a chassis I'm also good.

Going down this route also has other advantages. The first is that Ceph itself is basically a block storage system that Proxmox knows how to use for images, templates, and containers.

I chose to set up a three-node mirrored storage pool pointed at the SSDs, so all of my running machines' storage stays in sync at all times without having to play snapshot-replication games, and when I want to move a resource to another node, migration takes less than 15 seconds most of the time.

But the second advantage is CephFS. It creates a Linux file system on top of the Ceph volumes and distributes everything from mount points to file access, so there is really no master, just a quorum of managers. Each volume is available as a mount on each Proxmox node, so exposing them to containers is trivial.

This is also where the 2.5GbE network comes into play. When I set Ceph up, I put all the replication traffic on that separate network to improve performance.

So with CephFS I set up that volume to have at least two copies distributed across two different nodes, and I get assurance that if anything happens to any one device I'm OK. Unlike all the HomeLab pics I see of everything in a nice rack, my three nodes are in different places on the property, on different circuits, so if something were to happen to one it's highly unlikely the other two would be affected.

I started with Proxmox containers for some things, like my Samba node, but then built a container with Docker inside, so each of my Proxmox containers is just running a single Docker container and nothing more.

Passing the GPU and TPU through to Frigate wasn't overly complicated, and it's been great so far.

Still have some more work to do, and I'm thinking I'm going to switch from using docker-compose to Portainer to manage my Docker fleet.

And just to add some context: I already use OpenWrt to manage my network, and last year I switched from a /24 mask to a /22, because between ESPHome devices and containers I'm going to have about 300 IPs on my network in the coming year.
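
The quick math on that, using Python's ipaddress module (the subnet is just an example):

# Why /24 wasn't going to cut it; the subnet itself is just an example.
import ipaddress

for mask in ("/24", "/22"):
    net = ipaddress.ip_network("192.168.0.0" + mask)
    # num_addresses counts the network and broadcast addresses, hence the -2
    print(net, "usable hosts:", net.num_addresses - 2)

# Prints:
# 192.168.0.0/24 usable hosts: 254
# 192.168.0.0/22 usable hosts: 1022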

2021/11/19

Using sub-slicing in Python: passing a "slice spec" as a string

Python's built-in [start:stop:step] notation makes slicing sequences very easy.

But if you want to create a function that accepts optional params letting the caller manipulate the return value, you're left to create your own interface for specifying how to slice the returned data.

In the sample below, two params are passed to tester:

theList - a list object to be sliced
sliceSpec - a string in Python slice notation

Python provides a built-in helper called "slice" which accepts three params (start, stop, step), the components of an implied slice.

But given that 0 (zero) is a legitimate value for any of the arguments, when an argument is skipped (as in ":5") the missing value must be represented as None.

Also, the values passed to slice() must be integers (or None), while split() returns a list of strings.

So the example below takes "sliceSpec" and does the following:

  1. If sliceSpec is None, replace it with an empty string,
        otherwise split(":") fails.
  2. Traverse the list returned by split():
    • substitute blank strings with None
    • convert any other values to integers
  3. Pass the transformed results from split() as the arguments to slice()
  4. Use the resulting slice object to pull the sub-list from "theList"

Example Code

def tester(theList, sliceSpec=None):

    # Split the spec on ":", map blank pieces to None and the rest to int,
    # then unpack the pieces as the arguments to slice().
    parts = (int(i) if i != '' else None for i in (sliceSpec or "").split(":"))

    returnList = theList[slice(*parts)]

    print(f"'{sliceSpec}' = {returnList}\n")


mylist = [ 1,2,3,4,5,6,7,8,9 ]

tester(mylist)
tester(mylist, None)
tester(mylist, "")
tester(mylist, ":5")
tester(mylist, "0:1")
tester(mylist, ":1")
tester(mylist, "1:-2")
tester(mylist, "::-1")

Results

$ python3 test

'None' = [1, 2, 3, 4, 5, 6, 7, 8, 9]

'None' = [1, 2, 3, 4, 5, 6, 7, 8, 9]

'' = [1, 2, 3, 4, 5, 6, 7, 8, 9]

':5' = [1, 2, 3, 4, 5]

'0:1' = [1]

':1' = [1]

'1:-2' = [2, 3, 4, 5, 6, 7]

'::-1' = [9, 8, 7, 6, 5, 4, 3, 2, 1]