Sunday, August 14, 2022

The Ideal Software Law

In science, we make abstractions that are simplified models of reality, then we try to describe them with equations that let us make accurate predictions given the conditions assumed by the model. In this post I attempt to do that for software projects.

The Ideal Gas Law

In physics, the behavior of an idealized gas is described by the ideal gas law: PV=nRT, where P is pressure, V is volume, n is the quantity of gas, R is a constant, and T is the absolute temperature. While real gases don't follow this law exactly, it can be used to make pretty good predictions. It can help you understand how steam engines, refrigerators, and hot air balloons work.

A key insight that follows from this equation is that you can't hold three of the four parameters fixed and change just one parameter. If you have a fixed amount of gas at a given pressure, volume, and temperature, and you increase the temperature, then either the pressure goes up, the volume goes up, or both. If, with the same starting conditions, you decrease the volume, then either the pressure must go up, or the temperature must go down, or both. You can keep any two parameters fixed and change the other two in fixed relationships, but you simply can't hold three of the parameters fixed and change just one. If you try to do that, you will invariably fail: one or more of the other parameters will, perforce, also change.

The Ideal Software Law

We can use a similar equation to convey the relationships among the parameters of software development. Instead of PV=nRT, we have:
FQ=nST
where F is functionality, Q is quality, n is development resources, S is a constant, and T is the amount of time to complete development. As with the ideal gas law, this equation does not precisely apply to real software projects, but it can be used to make predictions and gain insights. In particular, we can see in this formulation the same basic insight as with the ideal gas law: it is not possible to hold all but one of the parameters fixed and change only one parameter. If you try to do so, one or more of the other parameters will, perforce, also change.

The Parameters

Let's take a look at what the parameters in our equation mean and how we might measure them.

Functionality (F)

Functionality represents what our software can do. There are defined ways to measure the functional size of software, such as COSMIC function points, but we would like something simpler that still allows us to understand the relationships between the parameters of the equation. For our purposes, a reasonable proxy for functionality is lines of code (LoC).

We are not claiming that lines of code is a good general metric for measuring productivity. Some people write denser code than others, and so can implement more functionality in the same number of lines. Some research has concluded that people write roughly the same number of lines of code per day regardless of language, but a higher-level language expresses more per line, so the same line count delivers more functionality than it would in a lower-level language. And some projects have a more difficult environment than others, so developers in those environments produce fewer lines of code per day.

However, we are using LoC slightly differently in this case. We are not using it to compare productivity or functionality between projects and teams, but only within the team and project for which we are measuring functionality. We assume that all of the factors mentioned above that affect the LoC metric are constant within the project and time span of interest, so that twice as many lines of code will provide twice as much functionality.

Quality (Q)

For quality, we could use a sophisticated quality model such as ISO/IEC 25010, but for this exercise we will use the simpler Defect Management approach.

Intuitively, it makes sense that higher quality software will have fewer bugs (also called defects). We also expect a larger project to have more total bugs than a smaller project. Roughly speaking, then, we can think of the number of bugs per line of code as being a proxy for the level of quality of a software project. We can call this the bug density (or defect density). We want our parameter to be larger for higher quality software, so we use the reciprocal of the bug density. The reciprocal of density for materials is called specific volume, so we will call this measure bug specific volume (or defect specific volume), and use that as our measure of quality. Our units for quality are thus LoC/bug.
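As a tiny illustration (a Javascript sketch of my own; the function name and interface are invented for this post), converting a bug count into our quality measure is just division:

```javascript
// Hypothetical helper: turn a bug count and a code size into our
// quality measure Q, the "bug specific volume" (LoC per bug),
// which is the reciprocal of the bug density.
function bugSpecificVolume(totalBugs, linesOfCode) {
  return linesOfCode / totalBugs;
}

// 50 (normalized) bugs in 10,000 LoC is 5 bugs per 1000 LoC,
// or a bug specific volume of 200 LoC/bug.
console.log(bugSpecificVolume(50, 10000)); // 200
```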

We recognize that there are some practical problems with this measure. Firstly, bugs come in different sizes. For our purpose we will assume some kind of "normalized" bug units, and assign more serious bugs more than one bug unit. Secondly, we don't know how many bugs are in a piece of software until well after it is delivered. We assume those bugs exist and will be revealed over time, at a rate which depends on factors such as how much use the software gets, so although we don't know the number in advance, we can still use this concept in our abstraction to understand the relation of quality to the other parameters.

Resources (n)

Resources, as in Human Resources, refers to the people we have available to work on the project. To a first approximation, n is the number of people developing the project. Many studies have shown that different people have different levels of productivity. For this idealization we assume that there is a baseline developer and that we know the productivity multiplier for each of our developers as compared to that baseline, even though in practice this might be difficult to determine, and the factor could vary with circumstances. We then define n as the number of baseline developers on the project. If we add a developer who we believe is three times as productive as our baseline, that increases n by three. Our units for n are thus baseline developers, but for simplicity, we will sometimes just refer to the units for n as people.

Our idealized equation assumes that we could do our project in half the time if we had twice the resources. We recognize that we are blatantly ignoring the problems of the mythical man-month.

Time (T)

Time refers to how much time it will take to complete the project. This is the most straightforward dimension to measure, and because of that it is often the dimension that gets the most attention during project planning. We choose to use days as our units, as that is a commonly used unit for other aspects of software development.

The Software Constant

The units we have selected for the four parameters define the units of the constant S.

F (LoC) × Q (LoC/bug) = n (people) × S (??) × T (days)

Therefore the units for S must be (LoC^2)/(bug × person × day). We can also write this as (LoC/bug) × (LoC/person/day). LoC/bug is a bug specific volume (our quality measure), and LoC/person/day is a development velocity for our baseline developer, so S is the product of a bug specific volume and a per-person development velocity. We can think of S as the "quality velocity" for one baseline developer. A higher value of S means higher productivity: more functionality or quality in a given amount of time, per developer.

So what value should we use for S? Some people (such as Brooks in The Mythical Man-Month) say a programmer writes about 10 lines of production code per day. Other sources use different numbers, but as a baseline we will go with Brooks's value of 10 LoC/person/day.

For bug density, various studies have come up with numbers ranging from 3 to 50 defects per 1000 LoC. As a starting point, I will select 10 bugs per 1000 LoC, or a bug specific volume of 100 LoC/bug. Combining these two values gives 10 * 100 = 1000 as the value of S. This means our baseline developer could, for example, write 10 lines of code with 10 bugs per 1000 LoC in one day, or 20 lines of code with 20 bugs per 1000 LoC.

In reality, different collections of people, different development environments, and different project attributes will all lead to different values of S. Organizations should always be looking for ways to increase the value of S for their projects, but for this analysis I am assuming that they have already done this in all the easy ways, and the remaining opportunities to increase S require larger investments and time to have an effect on the project. Thus when analyzing our equation to see what predictions it makes for a particular project, we will assume S is constant.

The form of the equation

The Ideal Gas Law was created by assembling a number of simpler laws that were derived from empirical observations. Each of these simpler laws demonstrated the relationship between two parameters when the other two were held constant.
Our Ideal Software Law is similarly assembled from simpler guidelines. We don't have previously stated laws, so we rely on our intuition to guide us.
  • All other things being equal, functionality is proportional to resources: F ∝ n
  • All other things being equal, functionality is proportional to time: F ∝ T
  • All other things being equal, quality will be higher with more resources
  • All other things being equal, quality will be higher with more time
Because quality is hard to define and measure, we don't actually know how close to being proportional to the other variables it is. For simplicity, we assume that it is proportional to both resources and time, the same as functionality: Q ∝ n and Q ∝ T.

These four rules, when assembled, give us the form of the equation for the Ideal Software Law shown above.

Example

Let's make a concrete example. Let's assume we have a project with the following parameters:
  • The functionality we desire requires 10,000 lines of code
  • Our quality bar is 5 bugs per 1000 lines of code (better than baseline), so 200 LoC/bug
  • We have 10 people on our team, all operating at baseline
  • Our team software constant S is 1000, as calculated above.
How many days should we expect this project to take to complete? From the Ideal Software Law, we have:

10,000 (LoC) * 200 (LoC/bug) = 10 (people) * 1000 (LoC^2/(bug*people*days)) * d (days)

Solving for d, we get d = (10,000 * 200)/(10 * 1000) = 200 days. A project team, given the assumptions above (although perhaps not stated so explicitly), might deliver this estimate to management when asked how long the project will take.
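The same arithmetic can be captured in a few lines of code. This is just a restatement of the equation in Javascript (the names are mine), handy for playing with the parameters:

```javascript
// The Ideal Software Law, F*Q = n*S*T, solved for time.
// Units: f in LoC, q in LoC/bug, n in baseline developers,
// s in LoC^2/(bug * person * day); the result is in days.
const S = 1000; // the software constant derived above

function projectDays(f, q, n, s) {
  return (f * q) / (n * s);
}

console.log(projectDays(10000, 200, 10, S)); // 200 days
```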

Analysis

Now let's play with the parameters and see what happens.

The typical scenario is that management comes back to the team and says "That estimate is too long. We need to deliver sooner. Make it happen faster." What options does the team have?

Looking at the Ideal Software Law equation, if we want to make T smaller, we have four options:
  • Make F smaller (less functionality)
  • Make Q smaller (less quality)
  • Make n larger (more developers)
  • Make S larger (higher velocity)
Clearly making S larger would be good, but, as mentioned above, when considering the schedule for a single project, this is unlikely to be a short-term option. That leaves us with three other parameters that can be changed.

We could make n larger by adding more developers to the team. This can be effective if there are people available, but practically speaking is difficult because of limited budgets, the difficulty of finding appropriate developers, and the time-cost of bringing a new team member up to speed. All of those factors make this choice possible but unlikely.

Now we are down to two parameters: functionality and quality. The development team will typically propose to make F smaller, also called a reduction in scope, by removing features from the project. If this is acceptable to management, then the reduced value of T can be balanced by the reduced value of F.

In many cases, however, management insists on not cutting any features. Now we are left with only one parameter: quality. Because this is the hardest parameter to measure, it is also the one that most often is ignored. In this situation, when T is made smaller and F, n, and S are unchanged, Q must, perforce, be made smaller by the same fraction as T was reduced.

The choice to reduce quality is sometimes made consciously, and could come with a commitment to go back later and improve quality. This is often referred to as taking on technical debt, which is expected to be paid back by improving the code later. The word "debt" is used here in intentional analogy to financial debt: there is a carrying cost to debt in the form of interest, making the total cost continue to go up the longer it remains unpaid. In software, this manifests as more time spent fixing bugs after product release, until such time as the debt is repaid by cleaning up the code to bring its quality back up.

If, however, a decision is made to reduce project time without changing functionality or resources, without consciously recognizing that there will be a reduction in quality, this is effectively like borrowing money without realizing it or having a plan to pay it back. The interest payments will still be there, in the form of more time spent fixing bugs and more time required to add new features, and that will negatively impact the team's schedule on future projects.

Limitations of the abstraction

All abstractions will eventually break down when the parameters go outside the valid range of the abstraction.
  • Newton's law of gravity elegantly describes the paths of the planets, but starts to break down in strong gravitational fields
  • The constant period of a pendulum of a given length starts to change when the pendulum swings too far from its center position
  • The Ideal Gas Law becomes less accurate at lower temperatures, higher pressures, and with larger gas molecules
Understanding the limitations of an abstraction allows us to improve our predictions. In the Parameters section above, I discuss some of the assumptions about each parameter. When we recognize that an assumption does not hold, we can bend the results of our formula to try to compensate.

For example, our formula tells us we can get the same functionality in half the time by doubling our resources. But we know that it takes time to bring a new developer up to speed on a project, so we won't actually be able to cut our time in half. By estimating how much reality deviates from our assumption, we can improve the accuracy of the predictions made by the formula despite the fact that the assumptions behind the formula are not entirely accurate.
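One way to bend the formula for that case is to charge each new developer a fixed ramp-up cost. The zero-output-during-ramp-up model below is my own simplification for illustration, not something with empirical backing:

```javascript
// Assume each new hire contributes nothing for rampDays, then works
// at baseline. Solving F*Q = S*(veterans*T + newHires*(T - rampDays))
// for T gives the adjusted schedule.
function effectiveDays(f, q, s, veterans, newHires, rampDays) {
  const personDays = (f * q) / s; // total work required
  return (personDays + newHires * rampDays) / (veterans + newHires);
}

// 10 veterans, no new hires: 200 days, matching the example above.
console.log(effectiveDays(10000, 200, 1000, 10, 0, 0)); // 200
// Doubling the team with 10 hires who each need 30 days of ramp-up:
// the naive prediction is 100 days; the adjusted prediction is 115.
console.log(effectiveDays(10000, 200, 1000, 10, 10, 30)); // 115
```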

Conclusion

By abstracting the parameters of software development and creating an equation, we can make practical predictions about those parameters. We can make such predictions even when the assumptions behind our formula are not completely true.

One of the most important predictions is this:
If you insist on reducing the time available to complete a software project, and you don't increase the number of people on the project or cut some features, the quality of the delivered software will decrease proportionally to the reduction in time.

Sunday, May 1, 2022

Home Automation for a Hot Water Recirculating Pump

My bathroom is pretty far from the water heater. It took over a minute of running the hot water for it to actually get hot. That's a lot of water wasted every time I waited for hot water. I wanted hot water faster.

Recirculating Hot Water

Last year, as part of a bathroom remodel, I had a hot water recirculating system installed. This consisted of a return pipe from the bathroom and a recirculating pump at the water heater to pull water from the return pipe, thus bringing hot water to the bathroom without having to run water down the drain waiting for it to warm up.

Once the system was installed, I learned that the pump is not supposed to run all the time. In addition, the pump, while not terribly noisy, produced enough noise to be annoying, especially in the parts of the house adjacent to the garage where the water heater and pump were located. So I didn't want to run it all the time for that reason.

The installer gave me a timer. I set it up to run in the morning and the evening. My schedule wasn't predictable enough to run the timer for just a short window, so I had it set up to run for about an hour. This didn't work very well: besides the noise issue mentioned above, the temperature of the water dropped a noticeable amount during this period. I needed another solution.

Home Automation

My solution was to set up a home automation system with some outlets and some battery-powered pushbuttons and program it so that when one of the pushbuttons was pressed, it would turn on the outlet for a couple of minutes to run the recirculating pump. This has worked well.

Years ago I used a bunch of X10 switches and outlets. I even installed a blocker to isolate the X10 signals in my house from the incoming power line and a coupler to ensure the X10 signals from one 120V leg made it to devices on the other 120V leg. I eventually stopped using those devices and had not installed any other home automation until now.

After looking at what was available, I decided to use the following technologies for my new home automation system:
  • Home Assistant as the controller
    I chose this for two reasons:
    1. I don't want my system to depend on the cloud or to be sending data out to anyone. Home Assistant allows me to do everything myself and be isolated from the internet. My automation won't stop working when my internet connection or someone else's computers or software go down.
    2. I like to tinker. Home Assistant is highly customizable - as long as you are willing to fiddle with it.
  • Zigbee 3.0 devices
    • I looked at Zigbee and Z-wave and decided Zigbee looked like the better choice for number of compatible available devices.
    • I specifically did not want to use wifi devices.
Having made those two choices, the next choice was where to run Home Assistant and how to connect the Zigbee devices to it. I figured I would use a USB Zigbee coordinator. For the Home Assistant host, I considered running it on my desktop (which is always on), on my Synology NAS, or on a bespoke device such as a Raspberry Pi. I learned that Synology announced they would be removing support for external USB devices other than disks, so I eliminated that choice. I started looking into using a Raspberry Pi and read multiple comments about high failure rates of the SD cards. Someone suggested attaching a USB SSD, which seemed like a good idea, but that would have required more research and figuring out how to mount everything. About this time I discovered Home Assistant Blue (HA Blue), a nice little device based on the Odroid-N2 with 128GB of on-board eMMC, 4 USB ports, ethernet, and HDMI, all in a good-looking extruded aluminum case, and pre-loaded with Home Assistant. It's a little more expensive than some other options, but for me the added convenience of a pre-installed system and the nice case were worth the price.

Note: Home Assistant Blue has been discontinued and is being superseded by Home Assistant Yellow, which has a built-in Zigbee radio and more expansion slots.

Even after deciding on Zigbee, there were a few different available ways to set up the communication between the Zigbee devices and Home Assistant. After doing some reading, I settled on using zigbee2mqtt. It seems like one of the newer solutions, and one where I would have less trouble integrating a wider variety of devices.

Hardware

For my initial foray into home automation and based on my decisions above, I bought the following:
  • HA Blue bespoke Home Assistant controller pre-loaded with Home Assistant
  • SmartLight Zigbee CC2652P Coordinator v4 USB Adapter preflashed with CC2652P_E72_20210319 firmware to support zigbee2mqtt
  • Some Sonoff S31 Lite Zigbee outlet plugs
  • Sonoff SNZB-01 Zigbee switch
  • Some Linkind Zigbee switches and outlets
I used the Blakadder compatibility list to find devices that were compatible with zigbee2mqtt, then looked at which ones I could get and what they cost. The outlets and switches I bought were on the less expensive end of the range, costing less than $10 each, although the price has since gone up.

Initial Setup

Setting up the HA Blue system was straightforward:
  1. Plug it in to power and ethernet
  2. Look in my DHCP log to see what IP address it was assigned
  3. Open my web browser to port 8123 at that IP address
  4. Wait for it to run through its first-boot setup (about 10 minutes)
  5. Create an account for myself
I set up the Zigbee USB adapter, following a YouTube video (but beware: there have been some changes since that video was made):
  1. Plug in the Zigbee USB adapter
  2. Log into HA Blue using my account
  3. Enable Advanced mode in my profile
  4. Create user "mqtt" to handle mqtt stuff
  5. From the Add-on store, install Mosquitto broker
  6. Configure Mosquitto broker by adding the mqtt user, and start it
Once the Zigbee adapter was in place, I set up zigbee2mqtt:
  1. In the Add-on store screen, from the "..." menu, select Repository and add the URL for the zigbee2mqtt repository, then find the Zigbee2mqtt Hass.io Add-on near the bottom and select it
  2. Find the USB port the Zigbee adapter is connected to: in Supervisor, System, Host box, three-dot menu, Hardware is a list of devices in /dev; by plugging and unplugging the Zigbee adapter I could see that it shows up as device 1-1.2 with path /dev/bus/usb/001/004 and as /dev/ttyUSB0. Or you can just assume /dev/ttyUSB0.
  3. Edit the configuration of the zigbee2mqtt module, change the default port from /dev/ttyACM0 to /dev/ttyUSB0, and change the username to mqtt
  4. Start the module
I also set up ssh to simplify future customizations:
  1. Install the Terminal & SSH Add-on and start it
  2. Open the Terminal & SSH Web UI, which is a web terminal, usable as an alternative to ssh
  3. In the Terminal & SSH Config network page, specify port 22
  4. In the Terminal & SSH Config page, add my public key to the authorized_keys array in single quotes
  5. Save, and restart the module
  6. ssh to the HA Blue as root
At this point I rebooted the HA Blue and looked in the Log for each module to make sure it was working properly.

The above description of setting up zigbee2mqtt is condensed, as I actually had a bit of trouble setting it up, including using an old zigbee2mqtt repository that I later replaced with the newer repository URL given above.

Adding Devices

With Zigbee configured on my HA system, I was ready to add my Zigbee switches and outlets.

In order to add a new Zigbee device to the network, the zigbee2mqtt module must be configured to permit devices to join. Initially I was doing this by directly editing the configuration of the zigbee2mqtt module and changing the value of the permit_join attribute to true. Once the new device had been added, I then edited the configuration again and changed permit_join back to false. Later, I discovered I could just use the Web UI for the zigbee2mqtt module and click on the "Permit join" button, which enables joining for 255 seconds with a count-down timer, after which it automatically turns it off.

With the HA Blue system beside me, I enabled permit-join. The LED in the Zigbee adapter started flashing green to indicate that it was in permit-join mode.

The first device I attached was a SONOFF SNZB-01 button:
  1. Pry off the back of the button, remove the paper battery insulation sheet, then replace the battery and the back
  2. Using a paper clip, press and hold the reset button for 5 seconds, until the red light flashes
  3. After a couple more seconds, the tile for Mosquitto Broker shows "1 device and 3 entities"
  4. Click on "1 device" to open a list of devices
  5. Click on the device to open its details page
  6. Click on the pencil icon by the hex name at the top of the page and rename the device and the entity IDs
  7. Press the button, it briefly shows "single" by the "action" line
  8. Double-click, it briefly shows "double" by the "action" line
Yay, my first Zigbee device is working!

I added a few more devices with basically the same process. Sometimes they would join just by enabling permit-join, but sometimes I also had to reset the device. I had some Sonoff devices and some Linkind devices, and I got them all working, although I did have one unexpected hiccup.

I had purchased a few Linkind outlets. The first one successfully joined my network, but the second one did not. After a few tries, I finally looked at the zigbee2mqtt log and saw that there were error messages saying the unit was not supported. (Lesson: if a new device doesn't join right away, look in the log file for errors!) Although the two outlets were sold under the same product name and looked the same, it turned out they had different model numbers: the unit that worked was ZS190000118 and the unit that failed to join was ZS190000108.

In order to add support for this slightly different flavor of Linkind outlet, I found and followed some instructions to support a new device.
  1. ssh into my HA Blue as root
  2. cd to config/zigbee2mqtt
  3. Create the new file ZS190000108.js
  4. In a web browser, open https://github.com/Koenkk/zigbee-herdsman-converters/blob/master/devices/linkind.js, look for Linkind ZS190000118, and copy that stanza into my new js file (this assumed the description was compatible, which turned out to be true)
  5. Change zigbeeModel to ['ZB_ONOFFPlug_D0008'] (from the zigbee2mqtt log)
  6. Change model to 'ZS190000108' (from the zigbee2mqtt log)
  7. Add the rest of the boilerplate as specified in step 2 of the instructions
  8. Write out the new file
  9. Update the zigbee2mqtt config to add the new device: set advanced:log_level: debug (was warn); set external_converters: - ZS190000108.js
  10. Save, Restart

Programming

Once the hardware was all in place and working, the next step was to set up the programming. It looks like there are multiple ways this can be done, and as a programmer I figured it wouldn't be too hard to write some automation code, but then I discovered Node-RED, a graphical flow-programming editor available as a Home Assistant add-on.

I installed Node-RED from the Community section of the AddOns menu. I had a bit of trouble with the certificate stuff, but eventually got that working. I then created a flow such that when I pressed one of my buttons, it would turn on the pump for two minutes. I spent too much time trying to figure out how to do the whole thing using standard components, but eventually decided the standard components were not quite up to the task. I ended up using a few function components, in which I wrote a bit of Javascript code.

My buttons are connected to the input of the Add Time function component, which adds time to a counter each time a button is pressed, with a max value. The buttons are also wired to an on-outlet component that turns on the recirculating pump.

Here is the Add Time code:
// On Start
flow.set("max_count", 120);       // 2 minutes
flow.set("button_increment", 80); // 1 minute and 20 seconds

// On Message
max_count = flow.get("max_count");
button_increment = flow.get("button_increment");
c = flow.get("counter");
if (c < 0) { c = 0; }
c = c + button_increment;
if (c > max_count) { c = max_count; }
node.status({fill:"blue", shape:"dot", text:"count:" + c});
flow.set("counter", c);
return msg;
Once time has been added to the timer, there is another function that counts down to zero, the Count Down function. The input of the Count Down function is connected to a Ticker component that ticks once per second. The output of the Count Down function is connected to an off-outlet component that turns off the recirculating pump.

Here is the Count Down code:
// On Start
flow.set("counter", 0);

// On Message
c = flow.get("counter");
c = c - 1;
flow.set("counter", c);
if (c > 0) {
    node.status({fill:"green", shape:"dot", text:"count:" + c});
    return {payload:{counter:c}};
} else if (c == 0) {
    node.status({fill:"yellow", shape:"dot", text:"stop"});
    return {payload:"stop"};
} else {
    node.status({fill:"red", shape:"dot", text:"stopped"});
    return {payload:"stopped"};
}
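To sanity-check the two functions outside of Node-RED, the flow context and the ticker can be simulated with plain Javascript. The stand-ins below are mine and much simpler than Node-RED's real flow and node objects:

```javascript
// Minimal stand-ins for Node-RED's flow context and node status API.
const ctx = {};
const flow = {
  get: (k) => ctx[k],
  set: (k, v) => { ctx[k] = v; },
};
const node = { status: () => {} }; // ignore status updates here

const MAX_COUNT = 120;       // 2 minutes
const BUTTON_INCREMENT = 80; // 1 minute and 20 seconds

// Same logic as the Add Time "On Message" handler.
function addTime() {
  let c = flow.get("counter") || 0;
  if (c < 0) c = 0;
  c = Math.min(c + BUTTON_INCREMENT, MAX_COUNT);
  flow.set("counter", c);
}

// Same logic as the Count Down "On Message" handler.
function tick() {
  let c = flow.get("counter") - 1;
  flow.set("counter", c);
  if (c > 0) return { payload: { counter: c } };
  if (c === 0) return { payload: "stop" }; // pump turns off here
  return { payload: "stopped" };
}

flow.set("counter", 0);
addTime();                        // one button press
console.log(flow.get("counter")); // 80
addTime();                        // second press is capped at the max
console.log(flow.get("counter")); // 120
let last;
for (let i = 0; i < 120; i++) last = tick();
console.log(flow.get("counter")); // 0
console.log(last.payload);        // "stop"
```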
This worked well, but I wanted some kind of feedback so I knew when the pump was on. To get that, I added another smart outlet, into which I plugged a guide light. I then added a function component that monitored the state of the pump switch with a state-changed component, such that when the pump outlet turned on or off, the function would turn on or off the outlet with the guide light. The function also set the node status within Home Assistant so I could see on the Node-RED schematic when it was on or off.

Here is the Outlet State code:
// On Start
flow.set("counter", 0);

// On Message
state = msg.payload;
if (state == "on") {
    node.status({fill:"green", shape:"dot", text:"on"});
} else if (state == "off") {
    node.status({fill:"red", shape:"dot", text:"off"});
}
return msg;
After getting this all set up, I spent some time testing with different pump-on times and tweaked the values to be just long enough to get the initial hot water to the bathroom sinks. I'm pretty happy with how it is working now.