<h1>The Ideal Software Law</h1>
Jim McBeath, <i>Coding and Life</i>, 2022-08-14
<br/><br/>
In science, we make abstractions that are simplified models of reality,
then we try to describe them with equations that let us make accurate
predictions given the conditions assumed by the model.
In this post I attempt to do that for software projects.
<h2>Contents</h2>
<ul>
<li><a href="#ideal-gas-law">The Ideal Gas Law</a>
<li><a href="#ideal-software-law">The Ideal Software Law</a>
<li><a href="#parameters">The Parameters</a>
<ul>
<li><a href="#functionality">Functionality</a>
<li><a href="#quality">Quality</a>
<li><a href="#resources">Resources</a>
<li><a href="#time">Time</a>
<li><a href="#software-constant">The Software Constant</a>
</ul>
<li><a href="#form-equation">The form of the equation</a>
<li><a href="#example">Example</a>
<li><a href="#analysis">Analysis</a>
<li><a href="#limitations-abstraction">Limitations of the abstraction</a>
<li><a href="#conclusion">Conclusion</a>
</ul>
<a name="ideal-gas-law">
<h2>The Ideal Gas Law</h2>
</a>
In physics, the behavior of an idealized gas is described by the
<a href="https://en.wikipedia.org/wiki/Ideal_gas_law">ideal gas law</a>:
<b><i>PV=nRT</i></b>, where <b><i>P</i></b> is pressure, <b><i>V</i></b> is volume, <b><i>n</i></b> is the quantity
of gas, <b><i>R</i></b> is a <a href="https://en.wikipedia.org/wiki/Gas_constant">constant</a>,
and <b><i>T</i></b> is the absolute
temperature. While real gases don't follow this law exactly, it can be
used to make pretty good predictions. It can help you understand how
steam engines, refrigerators, and hot air balloons work.
<br/><br/>
A key insight that follows from this equation is that you can't hold
three of the four parameters fixed and change just one parameter. If you
have a fixed amount of gas at a given pressure, volume, and temperature,
and you increase the temperature, then either the pressure goes up,
the volume goes up, or both. If, with the same starting conditions,
you decrease the volume, then either the pressure must go up, or the
temperature must go down, or both. You can keep any two parameters fixed
and change the other two in fixed relationships, but you simply can't
hold three of the parameters fixed and change just one. If you try to
do that, you will invariably fail: one or more of the other parameters
will, perforce, also change.
<a name="ideal-software-law">
<h2>The Ideal Software Law</h2>
</a>
We can use a similar equation to convey the relationships among the
parameters of software development. Instead of <b><i>PV=nRT</i></b>, we have:
<blockquote>
<font size="+2">
<b><i>FQ=nST</i></b>
</font>
</blockquote>
where <b><i>F</i></b> is functionality, <b><i>Q</i></b> is quality, <b><i>n</i></b> is development
resources, <b><i>S</i></b> is a constant, and <b><i>T</i></b> is the amount of time to complete
development. As with the ideal gas law, this equation does not precisely
apply to real software projects, but it can be used to make predictions
and gain insights. In particular, we can see in this formulation the
same basic insight as with the ideal gas law: it is not possible to hold
all but one of the parameters fixed and change only one parameter. If
you try to do so, one or more of the other parameters will, perforce,
also change.
<a name="parameters">
<h2>The Parameters</h2>
</a>
Let's take a look at what the parameters in our equation mean
and how we might measure them.
<a name="functionality">
<h3>Functionality (F)</h3>
</a>
Functionality represents what our software can do.
There are <a href="https://www.iso.org/standard/71197.html">defined ways</a>
to measure the <a href="https://en.wikipedia.org/wiki/Software_measurement">functional size</a>
of software, such as <a href="https://en.wikipedia.org/wiki/COSMIC_functional_size_measurement">COSMIC function points</a>,
but we would like something simpler that still allows us to understand the relationships
between the parameters of the equation.
For our purposes, a reasonable proxy for functionality is lines of code (LoC).
<br/><br/>
We are not claiming that lines of code is a good general metric for measuring productivity.
Some people write denser code than others, and so can implement more
functionality in the same number of lines of code.
Some research has concluded that people can write the same number of lines of code
per day independent of language, but a higher-level language can express more with the
same number of lines of code, so could be used to implement more functionality
in the same number of lines of code as compared to a lower-level language.
Some projects have a more difficult environment than others, so developers
produce fewer lines of code per day in that environment.
<br/><br/>
However, we are using LoC slightly
differently in this case. We are not using it to compare productivity or functionality
between projects and teams,
but only within the team and project for which we are measuring functionality.
We assume that all of the factors mentioned above that affect the LoC metric
are constant within the project and time span of interest, so that twice as many
lines of code will provide twice as much functionality.
<a name="quality">
<h3>Quality (Q) </h3>
</a>
For quality, we could use a sophisticated quality model
such as <a href="https://www.iso.org/standard/35733.html">ISO/IEC 25010</a>,
but for this exercise we will use the simpler
<a href="https://asq.org/quality-resources/software-quality#:~:text=SOFTWARE%20QUALITY%20DEFECT%20MANAGEMENT%20APPROACH">Defect Management</a>
approach.
<br/><br/>
Intuitively, it makes sense that higher quality software will have
fewer bugs (also called defects). We also expect a larger project to have more total bugs
than a smaller project. Roughly speaking, then, we can think of the
number of bugs per line of code as being a proxy for the level of
quality of a software project. We can call this the bug density (or defect density).
We want our parameter to be larger for higher quality software,
so we use the reciprocal of the bug density. The reciprocal of density
for materials is called
<a href="https://en.wikipedia.org/wiki/Specific_volume">specific volume</a>,
so we will call this measure bug specific volume (or defect specific volume), and use
that as our measure of quality.
Our units for quality are thus LoC/bug.
<br/><br/>
We recognize that there are some practical problems with this measure.
Firstly, bugs come in different sizes. For our purpose
we will assume some kind of "normalized" bug units, and assign
more serious bugs more than one bug unit.
Secondly, we don't know how many bugs are in a piece of software
until well after it is delivered. We assume those bugs exist and
will be revealed over time, at a rate which depends on factors such
as how much use the software gets, so although we don't know the
number in advance, we can still use this concept in our abstraction
to understand the relation of quality to the other parameters.
<a name="resources">
<h3>Resources (n)</h3>
</a>
Resources, as in Human Resources, refers to the people we have available to work
on the project.
To a first approximation, n is the number of people developing the project.
Many studies have shown that different people have
<a href="https://www.construx.com/blog/productivity-variations-among-software-developers-and-teams-the-origin-of-10x/">different levels of productivity</a>.
For this idealization we assume that there is a baseline developer and that
we know what the productivity multiplier is for each of our developers
as compared to that baseline developer,
even though in practice
<a href="https://insights.sei.cmu.edu/blog/programmer-moneyball-challenging-the-myth-of-individual-programmer-productivity/">this might be difficult</a>,
and the factor could be different depending on circumstances.
We then define n as the number of baseline developers on the project.
If we have a developer who we believe is three times as productive as our baseline,
that would increase n by three.
Our units for n are thus baseline developers, but
for simplicity, we will sometimes just refer to the units for n as people.
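<br/><br/>
For example, computing n for a mixed team is just a matter of summing the
multipliers (a minimal sketch in JavaScript; the multipliers below are invented
for illustration, not measured values):

```javascript
// n is the sum of each developer's productivity multiplier
// relative to the baseline developer.
// These multipliers are illustrative, not measured values.
const multipliers = [1, 1, 0.5, 3]; // two baseline, one junior, one 3x
const n = multipliers.reduce((sum, m) => sum + m, 0);
console.log(n); // 5.5 baseline developers
```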
<br/><br/>
Our idealized
equation assumes that we could do our project in half the time if we had
twice the resources. We recognize that we are blatantly ignoring the problems of
<a href="https://en.wikipedia.org/wiki/The_Mythical_Man-Month">the mythical man-month</a>.
<a name="time">
<h3>Time (T)</h3>
</a>
Time refers to how much time it will take to complete the project.
This is the most straightforward dimension to measure, and because of that it is
often the dimension
that gets the most attention during project planning.
We choose to use days as our units, as that is a commonly used unit
for other aspects of software development.
<a name="software-constant">
<h3>The Software Constant</h3>
</a>
The units we have selected for the four
parameters define the units of the constant S.
<br/><br/>
F (LoC) * Q (LoC/bug) = n (person) * S (??) * T (days)
<br/><br/>
Therefore the
units for S must be (LoC^2)/(bug*person*day).
We can also write this as (LoC/bug)*(LoC/person/day).
LoC/bug is a bug specific volume (our quality measure), and LoC/person/day is a
<a href="https://www.bunnyshell.com/blog/what-development-velocity">development velocity</a>
for our baseline developer,
so S is the product of a
bug specific volume and a
per-person development velocity.
We can think of S as the "quality velocity" for one baseline developer.
A higher value of S means higher
productivity: more functionality or quality from a given amount of time, per developer.
<br/><br/>
So what value should we use for S?
Some people (such as Brooks in <i>The Mythical Man-Month</i>) say a programmer
can write about 10 lines of production code per day. Other sources use
different numbers, but as a baseline we will go with Brooks's value of 10 LoC/person/day.
<br/><br/>
For bug density,
<a href="https://www.mayerdan.com/ruby/2012/11/11/bugs-per-line-of-code-ratio?ref=hackernoon.com#:~:text=Bug%20to%20code%20ratios">various studies</a>
have come up with numbers ranging from 3 to 50 defects per 1000 LoC.
As a starting point, I will select 10 bugs per 1000 LoC,
or a bug specific volume of 100 LoC/bug.
Combining these two values gives 10 * 100 = 1000 as the value of S.
This means our baseline developer could, for example, write 10 lines of code
with 10 bugs per 1000 LoC in one day, or 20 lines of code with
20 bugs per 1000 LoC.
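<br/><br/>
As a quick sanity check, the arithmetic above can be written out in a few lines
of JavaScript (a sketch; the variable names are mine):

```javascript
// Baseline values from the text: Brooks's 10 LoC/person/day,
// and 10 bugs per 1000 LoC (a bug specific volume of 100 LoC/bug).
const velocity = 10;                 // LoC/person/day
const specificVolume = 1000 / 10;    // LoC/bug
const S = velocity * specificVolume; // the software constant, 1000

// One baseline developer (n=1) in one day (T=1) can "spend" F*Q = 1000:
// 10 LoC at 100 LoC/bug, or 20 LoC at 50 LoC/bug, and so on.
console.log(S, 10 * 100 === S, 20 * 50 === S);
```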
<br/><br/>
In reality, different collections of people, different development environments,
and different project attributes will all lead to different values of S.
Organizations should always be looking for ways to increase the value of S
for their projects, but for this analysis I am assuming that they have
already done this in all the easy ways, and the remaining opportunities
to increase S require larger investments and time to have an effect on the project.
Thus when analyzing our equation to see what predictions it makes for
a particular project, we will assume S is constant.
<a name="form-equation">
<h2>The form of the equation</h2>
The Ideal Gas Law was created by assembling a number of simpler laws that were
derived from empirical observations. Each of these simpler laws demonstrated the
relationship between two parameters when the other two were held constant.
<ul>
<li><a href="https://en.wikipedia.org/wiki/Boyle%27s_law">Boyle's Law</a>:
P ∝ 1/V when n and T are held constant
<li><a href="https://en.wikipedia.org/wiki/Charles%27s_law">Charles's Law</a>:
V ∝ T when P and n are held constant
<li><a href="https://en.wikipedia.org/wiki/Avogadro%27s_law">Avogadro's Law</a>:
V ∝ n when P and T are held constant
<li><a href="https://en.wikipedia.org/wiki/Gay-Lussac%27s_law">Gay-Lussac's Law</a>:
P ∝ T when V and n are held constant
</ul>
Our Ideal Software Law is similarly assembled from simpler guidelines.
We don't have previously stated laws, so we rely on our intuition to guide us.
<ul>
<li>All other things being equal, functionality is proportional to resources: F ∝ n
<li>All other things being equal, functionality is proportional to time: F ∝ T
<li>All other things being equal, quality will be higher with more resources
<li>All other things being equal, quality will be higher with more time
</ul>
Because quality is hard to define and measure, we don't actually know how close
its relationship to the other parameters is to being proportional.
For simplicity, we assume that it is proportional to both resources and time,
the same as functionality: Q ∝ n and Q ∝ T.
<br/><br/>
These four rules, when assembled, give us the form of the equation for
the Ideal Software Law shown above.
<a name="example">
<h2>Example</h2>
</a>
Let's make a concrete example.
Let's assume we have a project with the following parameters:
<ul>
<li>The functionality we desire requires 10,000 lines of code
<li>Our quality bar is 5 bugs per 1000 lines of code (better than baseline), so 200 LoC/bug
<li>We have 10 people on our team, all operating at baseline
<li>Our team software constant S is 1000, as calculated <a href="#software-constant">above</a>.
</ul>
How many days should we expect this project to take to complete?
From the Ideal Software Law, we have:
<br/><br/>
10,000 (LoC) * 200 (LoC/bug) = 10 (person) * 1000 (LoC^2/(bug*person*day)) * d (days)
<br/><br/>
Solving for d, we get d = (10,000*200)/(10*1000) = 200 days. A project team, working
from the assumptions above (though perhaps not so explicitly), might deliver this estimate to
management when asked how long the project will take.
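<br/><br/>
The same calculation can be wrapped in a small JavaScript helper (a sketch; the
function name is mine, not a standard API):

```javascript
// Solve the Ideal Software Law F*Q = n*S*T for the time T, in days.
// F: functionality (LoC), Q: quality (LoC/bug),
// n: baseline developers, S: the software constant.
function daysToComplete(F, Q, n, S) {
    return (F * Q) / (n * S);
}

const days = daysToComplete(10000, 200, 10, 1000);
console.log(days); // 200 days, matching the calculation above
```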
<a name="analysis">
<h2>Analysis</h2>
</a>
Now let's play with the parameters and see what happens.
<br/><br/>
The typical scenario is that management comes back to the team and says
"That estimate is too long. We need to deliver sooner. Make it happen faster."
What options does the team have?
<br/><br/>
Looking at the Ideal Software Law equation, if we want to make T smaller, we have four options:
<ul>
<li>Make F smaller (less functionality)
<li>Make Q smaller (less quality)
<li>Make n larger (more developers)
<li>Make S larger (higher velocity)
</ul>
Clearly making S larger would be good, but, as mentioned above, when considering the schedule for
a single project, this is unlikely to be a short-term option. That leaves us with three other
parameters that can be changed.
<br/><br/>
We could make n larger by adding more developers to the team. This can be effective if there are
people available, but practically speaking is difficult because of limited budgets, the difficulty
of finding appropriate developers, and the time-cost of bringing a new team member up to speed.
All of those factors make this choice possible but unlikely.
<br/><br/>
Now we are down to two parameters: functionality and quality. The developer team will typically
propose to make F smaller, also called a reduction in scope, by removing features from the project.
If this is acceptable to management, then the reduced value of T can be balanced by the reduced
value of F.
<br/><br/>
In many cases, however, management insists on not cutting any features. Now we are left with only
one parameter: quality. Because this is the hardest parameter to measure, it is also the one that
most often is ignored. In this situation, when T is made smaller and F, n, and S are unchanged,
Q must, perforce, be made smaller by the same fraction as T was reduced.
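<br/><br/>
To put a number on that quality hit, we can solve the equation for Q with the
example project's values (a sketch; the compressed 150-day deadline is my own
illustration):

```javascript
// With F, n, and S fixed, quality is forced to Q = n*S*T / F.
function forcedQuality(F, n, S, T) {
    return (n * S * T) / F; // LoC per bug
}

const F = 10000, n = 10, S = 1000;
console.log(forcedQuality(F, n, S, 200)); // 200 LoC/bug (5 bugs/kLoC), as planned
console.log(forcedQuality(F, n, S, 150)); // 150 LoC/bug (~6.7 bugs/kLoC):
                                          // T cut by 25%, so Q drops by 25% too
```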
<br/><br/>
The choice to reduce quality is sometimes made consciously, and could come with a commitment to go back
later and improve quality. This is often referred to as taking on
<a href="https://en.wikipedia.org/wiki/Technical_debt">technical debt</a>,
which is expected to
be paid back by improving the code later. The word "debt" is used here in intentional analogy to financial debt:
there is a carrying cost to debt in the form of interest, making the total cost continue to go up the
longer it remains unpaid. In software, this manifests as more time spent fixing bugs after product release,
until such time as the debt is repaid by cleaning up the code to bring its quality back up.
<br/><br/>
If, however, a decision is made to reduce project time without changing functionality or resources, without
consciously recognizing that there will be a reduction in quality, this is effectively like borrowing
money without realizing it or having a plan to pay it back. The interest payments will still be there,
in the form of more time spent fixing bugs and more time required to add new features,
and that will negatively impact the team's schedule on future projects.
<a name="limitations-abstraction">
<h2>Limitations of the abstraction</h2>
</a>
All abstractions will eventually break down when the parameters go outside the valid range of the abstraction.
<ul>
<li>Newton's law of gravity elegantly describes the paths of the planets, but starts to break
down in <a href="https://aether.lbl.gov/www/classes/p10/gr/PrecessionperihelionMercury.htm">strong gravitational fields</a>
<li>The constant-time swing of a <a href="https://qsstudy.com/the-laws-of-a-simple-pendulum/">pendulum</a>
of a given length <a href="https://www.acs.psu.edu/drussell/Demos/Pendulum/Pendula.html">starts to change</a>
when the pendulum swings too far from its center position
<li>The Ideal Gas Law
<a href="https://en.wikipedia.org/wiki/Ideal_gas_law#Deviations_from_ideal_behavior_of_real_gases">becomes less accurate</a>
at lower temperatures, higher pressures, and with larger gas molecules
</ul>
Understanding the limitations of an abstraction allows us to improve our predictions.
In the <a href="#parameters">Parameters</a> section above, I discuss some of the assumptions
about each parameter. When we recognize that an assumption does not hold, we can bend the
results of our formula to try to compensate.
<br/><br/>
For example, our formula tells us we can get the same functionality in half the time
by doubling our resources. But we know that it takes time to bring a new developer up to speed
on a project, so we won't actually be able to cut our time in half. By estimating how much
reality deviates from our assumption, we can improve the accuracy of the predictions made
by the formula despite
the fact that the assumptions behind the formula are not entirely accurate.
<a name="conclusion">
<h2>Conclusion</h2>
</a>
By abstracting the parameters of software development and creating an equation, we can make
practical predictions about those parameters.
We can make such predictions even when the assumptions behind our formula are not completely true.
<br/><br/>
One of the most important predictions is this:
<blockquote>
If you insist on reducing the time available to complete a software project, and you don't increase
the number of people on the project or cut some features, the quality of the delivered software will
decrease proportionally to the reduction in time.
</blockquote>
<h1>Home Automation for a Hot Water Recirculating Pump</h1>
Jim McBeath, 2022-05-01
<br/><br/>
My bathroom is pretty far from the water heater. It took over
a minute of running the hot water for it to actually get hot. That's
a lot of water wasted every time I waited for hot water.
I wanted hot water faster.
<h2>Contents</h2>
<ul>
<li><a href="#recirculating-hot-water">Recirculating Hot Water</a>
<li><a href="#home-automation">Home Automation</a>
<li><a href="#hardware">Hardware</a>
<li><a href="#initial-setup">Initial Setup</a>
<li><a href="#adding-devices">Adding Devices</a>
<li><a href="#programming">Programming</a>
</ul>
<a name="recirculating-hot-water">
<h2>Recirculating Hot Water</h2>
</a>
Last year, as part of a bathroom remodel, I had a hot water recirculating
system installed. This consisted of a return pipe from the bathroom
and a recirculating pump at the water heater to
pull water from the return pipe, thus bringing hot water to the bathroom
without having to run water down the drain waiting for it to warm up.
<br/><br/>
Once the system was installed, I learned that the pump is not supposed to
run all the time. In addition,
the pump, while not terribly noisy, produced enough noise to be
annoying, especially in the parts of the house adjacent to the garage
where the water heater and pump were located. So I didn't want to run
it all the time for that reason.
<br/><br/>
The installer gave me a timer. I set it up to run in the morning and the
evening. My schedule wasn't precise
enough to run the timer for just a short amount of time, so I had it
set up to run for about an hour.
This didn't work very well:
besides the noise issue mentioned above,
the temperature of the water dropped a noticeable amount during this period.
I needed another solution.
<a name="home-automation">
<h2>Home Automation</h2>
</a>
My solution was to set up a home automation system with some outlets and
some battery-powered pushbuttons and program it so that when one of the
pushbuttons was pressed, it would turn on the outlet for a couple of minutes
to run the recirculating pump. This has worked well.
<br/><br/>
Years ago I used a bunch of
<a href="https://en.wikipedia.org/wiki/X10_(industry_standard)">X10</a>
switches and outlets. I even
installed a blocker to isolate the X10 signals in my house from the incoming
power line and a coupler to ensure the X10 signals from one 120V leg made it
to devices on the other 120V leg.
I eventually stopped using those devices and had not installed any other
home automation until now.
<br/><br/>
After looking at what was available, I decided to use the following technologies
for my new home automation system:
<ul>
<li>Home Assistant as the controller
<br/>I chose this for two reasons:
<ol>
<li>I don't want my system to depend on the cloud or to be sending data
out to anyone. Home Assistant allows me to do everything myself and
be isolated from the internet. My automation won't stop working when
my internet connection or someone else's computers or software go down.
<li>I like to tinker. Home Assistant is highly customizable - as long as
you are willing to fiddle with it.
</ol>
<li>Zigbee 3.0 devices
<ul>
<li>I looked at Zigbee and Z-wave and decided Zigbee looked like the better
choice for number of compatible available devices.
<li>I specifically did not want to use wifi devices.
</ul>
</ul>
Having made those two choices, the next choice was where to run Home Assistant and
how to connect the Zigbee devices to it.
I figured I would use a USB Zigbee coordinator.
For the Home Assistant host, I considered running it on my desktop (which is always on),
on my Synology NAS, or on a bespoke device such as a Raspberry Pi. I learned that
Synology announced they would be
<a href="https://macandegg.com/2021/06/synology-dsm-7-0-ends-support-for-usb-devices/"
>removing support for external USB devices</a>
other than disks,
so I eliminated that choice. I started looking into using a Raspberry Pi and read
multiple comments about high failure rates of the SD cards. Someone suggested attaching
a USB SSD, which seemed like a good idea, but that would require more research and figuring
out how to mount everything. About this time I discovered
<a href="https://www.home-assistant.io/blue/">HA Blue</a>,
a nice little device based on the Odroid-N2 with 128GB of on-board eMMC,
4 USB ports, ethernet, and HDMI, all in a good-looking extruded aluminum case,
and pre-loaded with Home Assistant.
It's a little more expensive than some other options, but for me the
added convenience of a pre-installed system and the nice case were
worth the price.
<br/><br/>
Note: Home Assistant Blue has been discontinued and is being superseded by
<a href="https://www.crowdsupply.com/nabu-casa/home-assistant-yellow">Home Assistant Yellow</a>,
which has a built-in Zigbee radio and more expansion slots.
<br/><br/>
Even after deciding on Zigbee, there were a few different available ways to set up
the communication between the Zigbee devices and
Home Assistant. After doing some reading, I settled on using zigbee2mqtt.
It seems like one of the newer solutions, and one where I would have less
trouble integrating a wider variety of devices.
<a name="hardware">
<h2>Hardware</h2>
</a>
For my initial foray into home automation and based on my decisions above, I bought the following:
<ul>
<li><a href="https://www.home-assistant.io/blue/">HA Blue</a>
bespoke Home Assistant controller pre-loaded with Home Assistant
<li>SmartLight Zigbee CC2652P Coordinator v4 USB Adapter preflashed with
CC2652P_E72_20210319 firmware to support zigbee2mqtt
<li>Some Sonoff S31 Lite Zigbee outlet plugs
<li>Sonoff SNZB-01 Zigbee switch
<li>Some Linkind Zigbee switches and outlets
</ul>
I used the <a href="https://zigbee.blakadder.com/">Blakadder compatibility list</a>
to find devices that were compatible with zigbee2mqtt, then looked at which ones I
could get and what they cost. The outlets and switches I bought were on the less
expensive end of the range, costing less than $10 each, although the price
has since gone up.
<a name="initial-setup">
<h2>Initial Setup</h2>
</a>
Setting up the HA Blue system was straightforward:
<ol>
<li>Plug it in to power and ethernet
<li>Look in my DHCP log to see what IP address it was assigned
<li>Open my web browser to port 8123 at that IP address
<li>Wait for it to run through its first-boot setup (about 10 minutes)
<li>Create an account for myself
</ol>
I set up the Zigbee USB adapter, following a
<a href="https://www.youtube.com/watch?v=1uxRvbbd0fc">YouTube video</a>
(but beware, there have been some changes since that video was made):
<ol>
<li>Plug in the Zigbee USB adapter
<li>Log into HA Blue using my account
<li>Enable Advanced mode in my profile
<li>Create user "mqtt" to handle mqtt stuff
<li>From the Add-on store, install Mosquitto Broker
<li>Configure Mosquitto Broker by adding the mqtt user, and start it
</ol>
Once the Zigbee adapter was in place, I set up zigbee2mqtt:
<ol>
<li>In the Add-on store screen, from the "..." menu, select Repository
and add the URL for the
<a href="https://github.com/zigbee2mqtt/hassio-zigbee2mqtt">zigbee2mqtt repository</a>,
then find the Zigbee2mqtt Hass.io Add-on near the bottom and select it
<li>Find the USB port the Zigbee adapter is connected to: in
Supervisor, System, Host box, three-dot menu, Hardware is a list of
devices in /dev; by plugging and unplugging the Zigbee adapter
I could see that it shows up as device 1-1.2 with path /dev/bus/usb/001/004
and as /dev/ttyUSB0. Or you can just assume /dev/ttyUSB0.
<li>Edit the configuration on the zigbee2mqtt module and change the default port
from /dev/ttyACM0 to /dev/ttyUSB0, and change the username to mqtt
<li>Start the module
</ol>
I also set up ssh to simplify future customizations:
<ol>
<li>Install the Terminal & SSH Add-on and start it
<li>Open the Terminal & SSH Web UI, which is a web terminal, usable as an alternative to ssh
<li>In the Terminal & SSH Config network page, specify port 22
<li>In the Terminal & SSH Config page, add my public key
to the authorized_keys array in single quotes
<li>Save, and restart the module
<li>ssh to the HA Blue as root
</ol>
At this point I rebooted the HA Blue and looked in the Log for each module
to make sure it was working properly.
<br/><br/>
The above description of setting up zigbee2mqtt is condensed, as I actually
had a bit of trouble setting it up, including using an old zigbee2mqtt repository
that I later replaced with the newer repository URL given above.
<a name="adding-devices">
<h2>Adding Devices</h2>
</a>
With Zigbee configured on my HA system, I was ready to add my Zigbee switches and outlets.
<br/><br/>
In order to add a new Zigbee device to the network, the zigbee2mqtt module must be configured to
permit devices to join. Initially I was doing this by directly editing the configuration
of the zigbee2mqtt module and changing the value of the permit-join attribute to true.
Once the new device had been added, I then edited the configuration again and changed
permit-join back to false. Later, I discovered I could just use the Web UI for the
zigbee2mqtt module and click on the "Permit join" button, which enables permit-join
for 255 seconds with a count-down timer, after which it automatically turns it off.
<br/><br/>
With the HA Blue system beside me, I enabled permit-join. The LED in the Zigbee adapter started
flashing green to indicate that it was in permit-join mode.
<br/><br/>
The first device I attached was a SONOFF SNZB-01 button:
<ol>
<li>Pry off the back of the button, remove the paper battery insulation sheet,
then reinstall the battery and the back cover
<li>Using a paper clip, press and hold the reset button for 5 seconds, until the red light flashes
<li>After a couple more seconds, the tile for Mosquitto Broker shows "1 device and 3 entities"
<li>Click on "1 device" to open a list of devices
<li>Click on the device to open its details page
<li>Click on the pencil icon by the hex name at the top of the page and rename the device
and the entity IDs
<li>Press the button, it briefly shows "single" by the "action" line
<li>Double-click, it briefly shows "double" by the "action" line
</ol>
Yay, my first Zigbee device is working!
<br/><br/>
I added a few more devices with basically the same process. Sometimes they would
join just by enabling permit-join, but sometimes I also had to reset the device.
I had some Sonoff devices and some Linkind devices, and I got them all working,
although I did have one unexpected hiccup.
<br/><br/>
I had purchased a few Linkind outlets. The first one successfully joined my
network, but the second one did not. After a few tries, I finally looked at the
zigbee2mqtt log and saw that there were error messages saying the unit was
not supported. (Lesson: if a new device doesn't join right away, look in the
log file for errors!) Although the two outlets were sold under the same product name
and looked the same, it turned out they had different model numbers:
the unit that worked was ZS190000118 and the unit that failed to join was
ZS190000108.
<br/><br/>
In order to add support for this slightly different flavor of Linkind outlet,
I found and followed some instructions to
<a href="https://www.zigbee2mqtt.io/how_tos/how_to_support_new_devices.html">support a new device</a>.
<ol>
<li>ssh into my HA Blue as root
<li>cd to <code>config/zigbee2mqtt</code>
<li>Create the new file <code>ZS190000108.js</code>, named for the unsupported model
<li>In a web browser, open https://github.com/Koenkk/zigbee-herdsman-converters/blob/master/devices/linkind.js,
look for Linkind ZS190000118, and copy that stanza into my new file
(this assumed the description was compatible, which turned out to be true)
<li>Change zigbeeModel to ['ZB_ONOFFPlug_D0008'] (from the zigbee2mqtt log)
<li>Change model to 'ZS190000108' (from the zigbee2mqtt log)
<li>Add the rest of the boilerplate as specified in step 2 of the instructions
<li>Write out the new file
<li>Update the zigbee2mqtt config to add the new device:
set advanced:log_level: debug (was warn);
set external_converters: - ZS190000108.js
<li>Save, Restart
</ol>
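The resulting external converter file ended up looking roughly like this (a
sketch from memory, not the exact file; the commented-out lines stand in for
the converter imports that the real zigbee-herdsman-converters stanza uses):

```javascript
// Sketch of config/zigbee2mqtt/ZS190000108.js. The zigbeeModel and
// model strings are the values from the zigbee2mqtt log, per the
// steps above; the rest follows the shape of the copied Linkind stanza.
// const fz = require('zigbee-herdsman-converters/converters/fromZigbee');
// const tz = require('zigbee-herdsman-converters/converters/toZigbee');

const definition = {
    zigbeeModel: ['ZB_ONOFFPlug_D0008'], // from the zigbee2mqtt log
    model: 'ZS190000108',                // the model that failed to join
    vendor: 'Linkind',
    description: 'Zigbee smart outlet',
    // fromZigbee: [fz.on_off],
    // toZigbee: [tz.on_off],
};

module.exports = [definition];
```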
<a name="programming">
<h2>Programming</h2>
</a>
Once the hardware was all in place and working, the next step was to set up
the programming. It looks like there are multiple ways this can be done, and
as a programmer I figured it wouldn't be too hard to write some automation
code, but then I discovered Node-RED, a graphical editor plugin.
<br/><br/>
I installed Node-RED from the Community section of the Add-ons menu.
I had a bit of trouble with the certificate stuff, but eventually got
that working. I then created a flow such that when I pressed one of
my buttons, it would turn on the pump for two minutes.
I spent too much time trying to figure out how to do the whole thing
using standard components, but eventually decided the standard components
were not quite up to the task. I ended up using a few function components,
in which I wrote a bit of Javascript code.
<br/><br/>
My buttons are connected to the input of the Add Time function component,
which adds time to a counter each time a button is pressed, with a
max value. The buttons are also wired to an on-outlet component
that turns on the recirculating pump.
<br/><br/>
Here is the Add Time code:
<pre><div class="code"
>// On Start
flow.set("max_count", 120);       // cap: 2 minutes of pump run time
flow.set("button_increment", 80); // each press adds 1 minute 20 seconds
// On Message
let max_count = flow.get("max_count");
let button_increment = flow.get("button_increment");
let c = flow.get("counter") || 0; // default to 0 if not yet set
if (c < 0) {
    c = 0;
}
c = c + button_increment;
if (c > max_count) {
    c = max_count;
}
node.status({fill:"blue", shape:"dot", text:"count:" + c});
flow.set("counter", c);
return msg;
</div></pre>
Once time has been added to the timer, there is another function
that counts down to zero, the Count Down function.
The input of the Count Down function is connected to a Ticker
component that ticks once per second.
The output of the Count Down function is connected to an off-outlet
component that turns off the recirculating pump.
<br/><br/>
Here is the Count Down code:
<pre><div class="code"
>// On Start
flow.set("counter", 0);
// On Message
let c = flow.get("counter");
c = c - 1;
flow.set("counter", c);
if (c > 0) {
    node.status({fill:"green", shape:"dot", text:"count:" + c});
    return {payload: {counter: c}};
} else if (c == 0) {
    node.status({fill:"yellow", shape:"dot", text:"stop"});
    return {payload: "stop"};
} else {
    node.status({fill:"red", shape:"dot", text:"stopped"});
    return {payload: "stopped"};
}
</div></pre>
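Outside of Node-RED, the combined behavior of the Add Time and Count Down functions can be sketched as plain JavaScript, with the flow context replaced by an ordinary object (a simplified model, not the actual Node-RED runtime):

```javascript
// Standalone sketch of the Add Time / Count Down pair. The flow
// context is modeled as a plain object; all values are seconds.
const flow = { counter: 0, max_count: 120, button_increment: 80 };

// Runs on each button press: add time, clamped to max_count.
function addTime() {
    let c = Math.max(flow.counter, 0) + flow.button_increment;
    flow.counter = Math.min(c, flow.max_count);
}

// Runs once per second, driven by the Ticker component.
function tick() {
    flow.counter -= 1;
    if (flow.counter > 0) return { counter: flow.counter }; // pump stays on
    if (flow.counter === 0) return "stop";                  // turn pump off
    return "stopped";                                       // already off
}

// Two quick presses: 80 + 80 seconds, clamped to 120.
addTime();
addTime();
```

Pressing the button twice in quick succession clamps the counter at two minutes, after which 120 ticks count it back down to the "stop" message that turns the pump off.
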
This worked well, but I wanted some kind of feedback so I knew when
the pump was on. To get that, I added another smart outlet, into which
I plugged a
<a href="https://www.amazon.com/Westek-NL-Orbs-2-Specialty-Lites-White/dp/B07CP745WB/">guide light</a>.
I then added a function component that monitored the state of the pump switch
with a state-changed component, such that when the pump outlet turned on or off,
the function would turn on or off the outlet with the guide light.
The function also set the node status within Home Assistant so I could see
on the Node-RED schematic when it was on or off.
<br/><br/>
Here is the Outlet State code:
<pre><div class="code"
>// On Start
flow.set("counter", 0)
// On Message
state = msg.payload;
if (state == "on") {
node.status({fill:"green",shape:"dot",text:"on"});
} else if (state == "off") {
node.status({fill:"red",shape:"dot",text:"off"});
}
return msg
</div></pre>
After getting this all set up, I spent some time testing with
different pump-on times and tweaked the values to be just long
enough to get the initial hot water to the bathroom sinks.
I'm pretty happy with how it is working now.
<h2>From Counting to Complex by Inverse and Closure</h2>
(2021-11-21) Walking the path from counting numbers to complex numbers.
<h3>Contents</h3>
<ul>
<li><a href="#preface">Preface</a>
<li><a href="#intro">Introduction</a>
<ul>
<li><a href="#concepts">Concepts</a>
<li><a href="#preview">Preview</a>
</ul>
<li><a href="#counting">Counting</a>
<ul>
<li><a href="#equals">Equals</a>
<li><a href="#natural-numbers">Natural Numbers</a>
<li><a href="#greater-than">Greater Than</a>
</ul>
<li><a href="#addition">Addition</a>
<ul>
<li><a href="#addition-associative">Associative</a>
<li><a href="#addition-commutative">Commutative</a>
<li><a href="#addition-identity">Identity</a>
<li><a href="#addition-algebra">Algebra</a>
</ul>
<li><a href="#subtraction">Subtraction</a>
<ul>
<li><a href="#subtraction-associative">Associative</a>
</ul>
<li><a href="#negative-numbers">Negative Numbers</a>
<ul>
<li><a href="#negative-addition">Addition</a>
<li><a href="#negative-subtraction">Subtraction</a>
<li><a href="#negative-algebra">Algebra</a>
</ul>
<li><a href="#multiplication">Multiplication</a>
<ul>
<li><a href="#multiplication-identity">Identity and Zero</a>
<li><a href="#multiplication-distributive">Distributive</a>
<li><a href="#multiplication-associative">Associative</a>
<li><a href="#multiplication-commutative">Commutative</a>
<li><a href="#multiplication-algebra">Algebra</a>
</ul>
<li><a href="#division">Division</a>
<ul>
<li><a href="#division-associative">Associative</a>
</ul>
<li><a href="#rational-numbers">Rational Numbers</a>
<ul>
<li><a href="#rational-algebra">Algebra</a>
</ul>
<li><a href="#exponentiation">Exponentiation</a>
<li><a href="#logarithms">Logarithms</a>
<li><a href="#principal-values">Principal Values</a>
<li><a href="#irrational-numbers">Irrational Numbers</a>
<ul>
<li><a href="#decimal-notation">Decimal Notation</a>
</ul>
<li><a href="#imaginary-numbers">Imaginary Numbers</a>
<li><a href="#complex-numbers">Complex Numbers</a>
<ul>
<li><a href="#complex-cartesian">Cartesian Coordinates</a>
<li><a href="#eulers-formula">Euler's Formula</a>
<li><a href="#complex-exponentiation">Complex Exponentiation</a>
<li><a href="#eulers-identity">Euler's Identity</a>
</ul>
<li><a href="#final-closure">Final Closure</a>
</ul>
<a name="preface"></a>
<h3>Preface</h3>
Many years ago I read that Richard Feynman
gave a talk to a room full of scientists
in which he rederived basic abstract algebra on real numbers
in under an hour.
I have since found that Feynman gives this derivation in a discussion of Algebra
in his Lectures on Physics, linked a few paragraphs below.
<br/><br/>
I'm not going to compete with Feynman,
but doing this derivation seemed like a fun challenge to undertake.
Below I present my explanation of how one gets to complex
numbers based on a few simple concepts: repetition,
<a href="http://en.wikipedia.org/wiki/Inverse_function">inverse</a> and
<a href="http://en.wikipedia.org/wiki/Closure_(mathematics)">closure</a>.
Along the way I try to throw in a few comments about
<a href="http://en.wikipedia.org/wiki/Algebraic_structure">abstract algebra</a>.
By the end, we will look at
<a href="https://en.wikipedia.org/wiki/Euler%27s_identity">Euler's Identity</a>,
<code>e<sup><i>i</i>π</sup>+1=0</code>,
and maybe make it a little less mystical than it might appear.
<br/><br/>
It is not necessary for you to understand all of the references
to math terms, so you don't need to follow those links unless you
want to learn about that concept.
Similarly, it is not necessary for you to follow and understand in
detail every proof.
Hopefully you can simply ignore any parts you don't
immediately understand and yet still get something out
of the overall presentation.
<br/><br/>
I walked this path mostly for my own entertainment, but I thought
perhaps others might get something out of it.
It is quite long and likely contains some errors,
so <i>caveat lector</i>.
<br/><br/>
Here are a couple of other documents that discuss Algebra that you might find interesting:
<ul>
<li><a href="https://www.feynmanlectures.caltech.edu/I_22.html"
>Feynman Lectures on Physics, chapter 22: Algebra</a>, including a discussion of Euler's Formula,
which Feynman referred to as
"one of the most remarkable, almost astounding, formulas in all of mathematics."
<li><a href="https://books.google.com/books?id=zqsXAAAAIAAJ&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false">Elementary Algebra</a> by J. H. Tanner, PhD, 1904
</ul>
<a name="intro"></a>
<h3>Introduction</h3>
Imagine that none of this stuff exists, so we are making it all up
as we go. We are going to define our numbering system from the ground up,
gradually building up a structure of definitions and operations
that all manage to work together nicely.
It's not just by random chance that things work nicely:
we are defining our numbers and operations precisely to make
them work together nicely.
<br/><br/>
In the code blocks below, I label each assumption (or definition)
with a name such as A1 enclosed in square brackets,
like this: [A1].
Lemmas (things which can be proved from the assumptions and are
used in later proofs) are labeled
similarly but with L rather than A.
Other intermediate steps in a proof which are not referenced outside
of that proof are labeled similarly but with I.
These names may be referenced later to build up additional lemmas.
The references look the same,
but appear in the text or in comments after an equation rather than before.
<a name="concepts"></a>
<h4>Concepts</h4>
There are three basic ways we will be extending our system:
<ul>
<li>Repetition: performing the same operation many times.
For example, multiplication is repeated addition.
<li>Inverse: an operation that has the opposite effect of some
other operation.
For example, subtraction is the inverse of addition.
<li>Closure: the results of an operation are in the same set
as the operands.
For example, the natural numbers (or positive integers) are closed under
addition, because you can add any two natural numbers
and get another natural number;
but they are not closed under
subtraction, because there are some expressions on natural numbers
using subtraction whose results are not natural numbers,
such as (3 - 5).
</ul>
<a name="preview"></a>
<h4>Preview</h4>
Here is a quick preview of how we will move from counting to complex:
<ul>
<li>start with zero and the successor function
<li>repeated successors yields counting and the natural numbers
<li>repeated counting yields addition
<li>inverse of addition yields subtraction
<li>closure on subtraction yields negative numbers
<li>repeated addition yields multiplication
<li>inverse of multiplication yields division
<li>closure on division yields rational numbers
<li>repeated multiplication yields exponentiation
<li>inverse of exponentiation yields logarithms
<li>closure on exponentiation with positive rational
numbers yields real numbers
<li>closure on exponentiation with negative rational
numbers yields complex numbers
<li>all of our operations on complex numbers are already closed, so we are done
</ul>
If you enjoy playing with math you might want to try doing
all of these derivations yourself before reading my derivations.
<a name="counting"></a>
<h3>Counting</h3>
At the most basic level, we start with some simple assumptions,
which happen to be a subset of the Peano axioms.
<br/><br/>
We define a starting point for counting.
Historically, people typically started with one,
but for later simplicity in this exercise we start with zero.
We define a successor function s(x) that takes a number x
and produces the next number, which by definition is distinct
from x.
<pre><div class="code"
>[A1] zero exists
[A2] given x, s(x) generates another number, where s(x) is not the same as x
</div></pre>
<a name="equals"></a>
<h4>Equals</h4>
We define an equals operator (=) so that the statement a=a is true,
and the statement a=b
means that, for any true statement containing a, we can replace any or all
instances of a by b and the resulting statement will also be true.
We further assume that if a=b is false, then the same replacements as
described above will generally (but not always) yield a false statement.
<pre><div class="code"
>[A3] a=a is true for all a
[A4] a=b is a replacement rule (described above)
</div></pre>
The equals operator is:
<ul>
<li>Reflexive: a=a (by definition)
<li>Symmetric: if a=b then b=a. Starting with the true statement a=a
and the predicate a=b,
by our definition of equals we can replace any instance of a by b
in a=a and still have a true statement; we chose to replace
the first a by b, yielding b=a.
<li>Transitive: if a=b and b=c, then a=c.
Taking the assumed true statement a=b, and applying our equals rule
using the second statement b=c, we replace b by c in the first
statement, yielding a=c.
</ul>
<pre><div class="code"
>[L5.1] if a=b then b=a (demonstrated above)
[L5.2] if a=b and b=c then a=c (demonstrated above)
</div></pre>
For convenience, we define the not-equals operator != to be false
whenever equals on the same values is true, and vice versa.
<br/><br/>
The above definition also leads almost directly to one of the
common ways of solving algebraic equations: performing the same
operation to both sides of an equation, such as adding the same
number to both sides of an equation, or multiplying both sides
by the same number.
Here's an example of adding the same amount to both sides of an equation.
<pre><div class="code"
>a = b Assume this is our starting equation we are working with
a + c = a + c True by definition [A3]
a + c = b + c From [A4]
</div></pre>
Note that this works for any function:
<pre><div class="code"
>[I6.1] a = b Assume this is our starting equation we are working with
[I6.2] f(a) = f(a) True by definition [A3]
[I6.3] f(a) = f(b) From [A4] using [I6.2] as a starting equation
and [I6.1] as our replacement rule
[L6.4] if a = b then f(a) = f(b) for any f defined for a
</div></pre>
f(x) might be 2*x, x+3, sin(x), or anything else we desire.
Thus we can start with any true equation, perform the same valid
operation on both sides, and still have a true equation.
<a name="natural-numbers"></a>
<h4>Natural Numbers</h4>
Given our previously defined starting point of zero,
we now define the natural numbers:
<pre><div class="code"
>[A7.0] 0=zero
[A7.1] 1=s(0)
[A7.2] 2=s(1)
[A7.3] 3=s(2)
etc. to infinity.
</div></pre>
By definition, s(x)!=x, so 1!=0, 2!=1, etc.
Note that we did not assume that repeated application of s(x) would not
eventually give us the same number.
Without that assumption it is possible that, for example, s(s(s(x)))=x,
or in other words, 3=0.
This yields a "modulo" system, which can be useful.
But for this particular exposition, I want to use the "normal" numbers,
so we will add the assumption that s(x) is never equal to any previous
value in the sequence.
More precisely, we assume:
<pre><div class="code"
>[A8] For any x, repeated application of the successor function
any number of times will never generate x.
</div></pre>
We have now defined an unending stream of distinct numbers, each of which is
a successor to one other number.
<a name="greater-than"></a>
<h4>Greater Than</h4>
We next define the relational operators less than (<)
and greater than (>) with the
following statements:
<pre><div class="code"
>[A9] s(a) > a
[A10] if (a > b) and (b > c) then (a > c)
[A11] (b < a) always has the same truth value as (a > b)
</div></pre>
We are now at the point where we can count and know
(by definition) that each time we
count we get a number that is greater than all of the previous numbers.
We can start with any number and count up from there by repeated
application of the successor function.
For example, if we start with 4 (which is s(s(s(s(zero))))) we can
count up from there by three by repeated application of
the successor function three
times to get s(s(s(4))), which we can calculate is 7.
This gets unwieldy pretty fast.
To make this simpler, let's define an "addition" operator + that gives us
the same results as repeated counting.
<a name="addition"></a>
<h3>Addition</h3>
We define the addition operator (<code>+</code>) as follows:
<pre><div class="code"
>[A21] a + 0 = a
[A22] a + s(b) = s(a + b)
</div></pre>
Some quick examples:
<pre><div class="code"
>[L23.1] a + 1 = a + s(0) = s(a + 0) = s(a)
[L23.2] a + 2 = a + s(1) = s(a + 1) = s(s(a))
</div></pre>
Since s(a) = a+1, we also have
<pre><div class="code"
>[L23.3] a + s(b) = a + (b+1)
[L23.4] s(a + b) = (a + b) + 1
</div></pre>
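As a side note, the recursive definitions [A21] and [A22] translate directly into code. Here is a small sketch in JavaScript; the representation of numbers as nested successor wrappers is my own illustration, not part of the construction in the text:

```javascript
// Numbers as iterated applications of a successor function:
// zero is a sentinel, s(x) wraps x one level deeper.
const zero = null;                  // [A1] zero exists
const s = (x) => ({ pred: x });     // [A2] successor

// add implements the two defining axioms of addition:
function add(a, b) {
    if (b === zero) return a;       // [A21] a + 0 = a
    return s(add(a, b.pred));       // [A22] a + s(b) = s(a + b)
}

// Helper for display: count successor applications.
function toInt(n) { return n === zero ? 0 : 1 + toInt(n.pred); }

const two = s(s(zero));
const three = s(s(s(zero)));
// toInt(add(two, three)) evaluates to 5
```
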
For some of what we want to do below,
we are going to need to use the rule of induction:
<pre><div class="code"
>[A24] If an equation is true for a known value of n,
and it can be demonstrated to be true for n+1 for any n when true for n,
then it is true for all natural numbers x where x >= n.
</div></pre>
<a name="addition-associative"></a>
<h4>Associative</h4>
We now show that our addition operator is associative.
We want to prove that (a+b)+n = a+(b+n) for all n.
We start by showing this is true for n=1,
then use induction:
<pre><div class="code"
>[L25.1] a + (b + 1) = (a + b) + 1 From [A22], [L23.3] and [L23.4]
[I25.2] a + (b + n) = (a + b) + n Inductive assumption, true for n=1
a + (b + (n + 1))
= a + ((b + n) + 1) From [L25.1] on (b+(n+1))
= (a + (b + n)) + 1 From [L25.1] with (b+n) for b
= ((a + b) + n) + 1 From [I25.2] applied to (a+(b+n))
= (a + b) + (n + 1) From [L25.1] in reverse with (a+b) for a and n for b
[L26] a + (b + c) = (a + b) + c Above lines summarized, with c for n+1
</div></pre>
Thus by induction we have our proof of associativity.
<a name="addition-commutative"></a>
<h4>Commutative</h4>
We use a similar approach to show that addition is commutative,
such that a+b=b+a.
We start by showing that 0 commutes with a for any a.
<pre><div class="code"
>[I27.1] 0 + 0 = 0 From [A21] with 0 for a
0 + 1 = 0 + s(0) From [L23.1]
= s(0 + 0) From [A22] with 0 for a and b
= s(0) From [I27.1]
= 1
[L27.2] 0 + 1 = 1 Summary of the above few lines
[I27.3] 0 + n = n Inductive assumption, true for n=1 from [L27.2]
[I27.4] 0 + (n + 1) = (0 + n) + 1 From [L26]
[I27.5] 0 + (n + 1) = n + 1 By induction from [I27.3] and [I27.4]
[L27.6] 0 + a = a From [I27.5] with a for n+1
[I27.7] 0 + a = a = a + 0 From [L27.6] and [A21]
[L27.8] 0 + a = a + 0 From [L5.2]
</div></pre>
Now we show that 1 commutes with any number by induction.
<pre><div class="code"
>1 + (n + 1)
= 1 + s(n) From [L23.1] on (n+1) with n for a
= s(1 + n) From [A22] with 1 for a and n for b
= s(n + 1) From inductive assumption that 1 commutes with n, known true for n=0
= n + s(1) From [A22] in reverse with n for a and 1 for b
= n + (1 + 1) From [L23.1] on s(1) with 1 for a
= (n + 1) + 1 From [L25.1]
[L28] 1 + a = a + 1 Summary of the above with a for n+1
</div></pre>
Finally, we use induction again to show that any two numbers commute.
<pre><div class="code"
>a + (n + 1)
= (a + n) + 1 From [L25.1]
= (n + a) + 1 From inductive assumption that a commutes with n, known true for n=1 [L28]
= n + (a + 1) From [L25.1]
= n + (1 + a) From [L28]
= (n + 1) + a From [L25.1]
[L29] a + b = b + a Summary of the above with b for n+1
</div></pre>
As a final note for addition, since we have demonstrated that
(a+b)+c=a+(b+c), we can omit the parentheses when adding multiple
terms without creating any ambiguity.
<pre><div class="code"
>[A30] a + b + c = (a + b) + c = a + (b + c)
</div></pre>
Repeated application of this rule can be used for addition with
four or more terms without parentheses.
By combining this rule with the commutative law [L29],
we can see that we can take an expression with multiple terms
added together, such as
<code>a + b + c + d + e</code>
and rearrange and group the terms any way we want.
<br/><br/>
The associative rule also makes it easy to calculate our addition facts.
We already know that 1=0+1, 2=1+1, 3=2+1 etc from our definitions [A7]
with [L23.1].
That lets us fill in the first row of our addition fact table.
We can then calculate all of the n+2 values based on the n+1 values,
and repeat ad infinitum for the rest of the numbers.
<pre><div class="code"
>n + 2 = n + (1 + 1) = (n + 1) + 1
n + 3 = n + (2 + 1) = (n + 2) + 1
n + 4 = n + (3 + 1) = (n + 3) + 1
</div></pre>
Wikipedia has
<a href="http://en.wikipedia.org/wiki/Addition_of_natural_numbers/Proofs">
proofs of associativity and commutativity</a>
of addition, which are similar to mine but
actually a little more concise,
and
<a href="http://www.dpmms.cam.ac.uk/~wtg10/addcomm.html">here</a>
is a proof of commutativity that does not rely on associativity -
but I wanted to think through these derivations
myself and present them here in-line with the rest of my exposition.
<a name="addition-identity"></a>
<h4>Identity</h4>
At this point we know that a+0=a [A21] and 0+a=a [L27.6],
or in other words adding zero to any number (on either side,
since we showed addition is commutative) yields that number.
This is an interesting enough fact that we will give this
number a special name: the <b>Identity</b> for addition.
<br/><br/>
It's easy to show that there is only one identity for addition.
<pre><div class="code"
>Assume two identity values e and f.
Consider the expression e+f.
Because e is an identity, e+f=f.
Because f is an identity, e+f=e.
Therefore e=f.
[L31] Since this is true for any two identities,
all are in fact the same one identity.
</div></pre>
<a name="addition-algebra"></a>
<h4>Algebra</h4>
We have built up our concepts in layers, like building a house:
we set a foundation with zero and the successor function,
put in some rim joists with the natural numbers,
and laid on some flooring with the addition operator and
its identity element.
We have created a little structure from our concepts.
Whereas a house is a physical structure, this is an
algebraic structure.
<br/><br/>
It turns out that this algebraic structure is useful enough
that mathematicians have given this kind of structure a name:
a <a href="http://en.wikipedia.org/wiki/Monoid">monoid</a>.
A monoid has these characteristics (with our case in parentheses):
<ul>
<li>It has a set of elements (the natural numbers).
<li>It has a binary operation on those elements (the + operator).
<li>The operation is associative (+ is associative).
<li>The operation is closed (adding two natural numbers always
produces another natural number).
<li>It has an identity element (zero).
</ul>
There are a few rules from the above section that we will use often enough
that we want to reference them by name rather than lemma number.
We use the first letter of the name of the characteristic, followed
by the operator character.
<pre><div class="code"
>[a+] a + (b + c) = (a + b) + c [L26] Associativity of addition
[c+] a + b = b + a [L29] Commutativity of addition
[i+] a + 0 = 0 + a = a [A21], [L27.6] Identity for addition
</div></pre>
<a name="subtraction"></a>
<h3>Subtraction</h3>
At this point we have the ability to perform addition, which allows us
to calculate a value for x in such equations as <code>x = a + b</code>.
But we don't yet have the ability to solve for x in the equation
<code>a + x = b</code>.
We want to add an operation that is the opposite of addition.
In other words, if we start with a and add b to it, we want to be
able to take the result and perform another operation using b
in order to get back to a.
An operator that has this characteristic is called an inverse.
We are going to define an operation that is the inverse of addition.
We will call that operation subtraction,
and we will use the dash character (<code>-</code>) as the operator.
<br/><br/>
Before we defined addition, we already had the successor function [A2]
and we defined the numbers [A7] in terms of the successor function.
We defined addition with two axioms [A21] and [A22], then showed that
adding 1 to any number is the same [L23] as applying the successor function.
Including the successor function and the definitions of the numbers in
terms of the successor function, we really had four pieces going into
the definition of addition.
<br/><br/>
We could follow the same path and define a predecessor function that is
the inverse of the successor function, but instead we will skip that step
and work in terms of adding and subtracting 1 instead of successor and
predecessor functions.
<br/><br/>
We define our subtraction operator (<code>-</code>) recursively,
similarly to how we defined the addition operator, using an additional
axiom [A41.1] in place of defining a predecessor function p(x):
<pre><div class="code"
>[A41] a - 0 = a
[A41.1] (a + 1) - 1 = a
[A42] a - (b + 1) = (a - b) - 1
</div></pre>
So let's see how this works:
<pre><div class="code"
>3 - 0 = 3 From [A41]
3 - 1 = (2 + 1) - 1 = 2 From [A41.1], and since 3 is the successor to 2 (i.e. 3=2+1)
3 - 2 = 3 - (1 + 1) = (3 - 1) - 1 = 2 - 1 = (1 + 1) - 1 = 1
</div></pre>
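Continuing the earlier coding sketch of the successor representation (again my own illustration), subtraction can be written by peeling one successor off each operand, which follows from [A41.1] and [A42]. Note what happens when the result would not be a natural number: the recursion has no rule left to apply, previewing the closure problem taken up under Negative Numbers below.

```javascript
// Sketch of natural-number subtraction in the successor
// representation (an illustration, not the text's construction).
const zero = null;
const s = (x) => ({ pred: x });
function toInt(n) { return n === zero ? 0 : 1 + toInt(n.pred); }

// sub peels one successor off each operand until b is exhausted:
// (a + 1) - (b + 1) = a - b, per [A41.1] and [A42].
function sub(a, b) {
    if (b === zero) return a;       // [A41] a - 0 = a
    if (a === zero) {
        // No rule reduces 0 - 1: naturals are not closed under subtraction.
        throw new Error("not a natural number");
    }
    return sub(a.pred, b.pred);
}

const two = s(s(zero));
const three = s(s(s(zero)));
// toInt(sub(three, two)) evaluates to 1; sub(two, three) throws
```
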
<a name="subtraction-associative"></a>
<h4>Associative</h4>
We want to prove the associative laws for subtraction so we know how
we can transform various combinations of parentheses and operators.
We already know about <code>a + (b + c)</code>,
so there are three other possible combinations of + and - with the
parentheses in the same position:
<ul>
<li><code>a - (b + c)</code>
<li><code>a + (b - c)</code>
<li><code>a - (b - c)</code>
</ul>
We start with <code>a - (b + c)</code>.
<pre><div class="code"
>[L43.1] a - (b + n) = (a - b) - n Inductive assumption, true for n=1 from [A42]
a - (b + (n + 1))
= a - ((b + n) + 1) From [a+]
= (a - (b + n)) - 1 From [A42]
= ((a - b) - n) - 1 From [L43.1] on (a-(b+n))
= (a - b) - (n + 1) From [A42] with (a-b) for a and n for b
[L43.2] a - (b + c) = (a - b) - c Above lines summarized, with c for n+1
</div></pre>
Next we do <code>a + (b - c)</code>, which we do by induction
after first doing <code>a + (b - 1)</code>.
<pre><div class="code"
>(a + (n + 1)) - 1
= ((a + n) + 1) - 1 From [a+]
= a + n From [A41.1] with a+n for a
= a + ((n + 1) - 1) From [A41.1] with n for a
[L44] (a + b) - 1 = a + (b - 1) Above lines summarized, with b for n+1
</div></pre>
<pre><div class="code"
>[L45.1] a + b = a + (b - 0) From [A41] with b for a
[L45.2] a + b = (a + b) - 0 From [A41] with (a+b) for a
[L45.3] a + (b - 0) = (a + b) - 0 From [L45.1] and [L45.2] by [A4]
[L45.4] a + (b - n) = (a + b) - n Inductive assumption, true for n=0 by [L45.3]
a + (b - (n + 1))
= a + (b - (1 + n)) From [c+] with n for a and 1 for b
= a + ((b - 1) - n) From [L43.2] on b-(1+n)
= (a + (b - 1)) - n From [L45.4] with b-1 for b
= ((a + b) - 1) - n From [L44]
= (a + b) - (1 + n) From [L43.2] with a+b for a, 1 for b, n for c
= (a + b) - (n + 1) From [c+] with n for a and 1 for b
[L45.5] a + (b - c) = (a + b) - c Above lines summarized, with c for n+1
</div></pre>
Finally we tackle <code>a - (b - c)</code>,
which we build up to through quite a few lemmas.
<pre><div class="code"
>[L46.1] 0 - 0 = 0 [A41] with 0 for a
[L46.2] (0 + 1) - 1 = 0 [A41.1] with 0 for a
[L46.3] 1 - 1 = 0 From [L27.2] on 0+1
[L46.4] n - n = 0 Inductive assumption, true for n=1 from [L46.3]
(n + 1) - (n + 1)
= (n + 1) - (1 + n) From [c+]
= ((n + 1) - 1) - n From [L43.2] with n+1 for a, 1 for b, n for c
= n - n From [A41.1] on (n+1)-1 with n+1 for a
= 0 From [L46.4]
[L46.5] a - a = 0 Above lines summarized, with a for n+1
</div></pre>
<pre><div class="code"
>a - b
= a - (b + 0) From [A21] (a+0=a) with b for a
= a - (b + (n - n)) From [L46.5] (a-a=0) with n for a
= a - ((b + n) - n) From [L45.5] with b for a, n for b and c
= a - ((n + b) - n) From [c+]
= a - (n + (b - n)) From [L45.5]
= (a - n) - (b - n) From [L43.2]
[L47] a - b = (a - n) - (b - n)
</div></pre>
<pre><div class="code"
>Substituting a = (c + n), b = (d + n) in [L47] yields
[L48.1] (c + n) - (d + n) = ((c + n) - n) - ((d + n) - n) = c - d
[L48.2] c - d = (c + n) - (d + n) [L48.1] last and first parts
</div></pre>
<pre><div class="code"
>(a - n) + n
= n + (a - n) From [c+]
= (n + a) - n From [L45.5]
= (a + n) - n From [c+]
= a + (n - n) From [L45.5]
= a + 0 From [L46.5]
= a From [i+]
[L49] (a - n) + n = a Above lines summarized
</div></pre>
<pre><div class="code"
>a - (b - c)
= (a + c) - ((b - c) + c) From [L48.2] with a for c, b-c for d, c for n
= (a + c) - b From [L49] with c for n
= (c + a) - b From [c+] on a+c
= c + (a - b) From [L45.5]
= (a - b) + c From [c+]
[L50] a - (b - c) = (a - b) + c Above lines summarized
</div></pre>
We now have all of our rules of association for addition and subtraction.
The following four equations, repeated from above, show all eight
possible combinations of + and - operators and grouping of three
variables.
<pre><div class="code"
>[L26] a + (b + c) = (a + b) + c
[L43.2] a - (b + c) = (a - b) - c
[L45.5] a + (b - c) = (a + b) - c
[L50] a - (b - c) = (a - b) + c
</div></pre>
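As a quick numeric sanity check, all four rules can be verified on sample integers. This spot check leans on the host language's integer arithmetic, so it illustrates the rules rather than proving them:

```javascript
// Spot-check the four association rules [L26], [L43.2], [L45.5], [L50]
// on a triple of integers. A check on examples, not a proof.
function associationRulesHold(a, b, c) {
    return (
        a + (b + c) === (a + b) + c &&  // [L26]
        a - (b + c) === (a - b) - c &&  // [L43.2]
        a + (b - c) === (a + b) - c &&  // [L45.5]
        a - (b - c) === (a - b) + c     // [L50]
    );
}
// associationRulesHold(5, 3, 2) is true, as for any integer triple
```
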
Earlier we saw that, because of [L26], we can write <code>a + b + c</code>
and know that it is unambiguous.
But that is not true if we write <code>a - b - c</code>, because
the statement <code>(a - b) - c = a - (b - c)</code>
is not in general true.
In order to be able to write fewer parentheses, we arbitrarily choose
to have <code>a - b - c</code> mean the same thing as <code>(a - b) - c</code>.
<pre><div class="code"
>[A51] a - b - c = (a - b) - c
</div></pre>
We have specified that the middle variable (b in our equation),
following the <code>-</code> operator, should be
grouped with the variable on its left,
so we call the <code>-</code> operator left-associative;
but we generally say it is not associative,
meaning it does not associate both ways as does addition.
<br/><br/>
Unlike addition, subtraction is not commutative,
and it has no identity.
More precisely, we could say that zero is a
<a href="http://en.wikipedia.org/wiki/Identity_element">
right identity</a> for subtraction,
but since it is not also a left identity,
it is not a simple identity and we usually don't mention it.
<a name="negative-numbers"></a>
<h3>Negative Numbers</h3>
You may already have noticed that adding the subtraction operator
to our structure has created a bit of a problem:
we are now able to write expressions which we can not evaluate
within our structure.
For example, the expression <code>2 - 4</code> can not be reduced to
a single natural number.
When we reduce this equation according to our rules, we eventually
get to the point where we need to solve for <code>0 - 1</code>,
and we have no rule to reduce that any further.
In other words, our system is no longer a closed system:
to state the problem more precisely,
the natural numbers are not closed under subtraction.
<blockquote>
<div style="background-color: lightyellow; border: medium ridge black;
padding: 0.6em; margin-top: 1em; margin-bottom: 1em;">
A pet peeve of mine: elementary school math teachers who tell their
students "You cannot subtract 5 from 3."
This statement is misleading in its imprecision, since it can be solved with
the use of negative numbers.
Math is a precise field.
The correct statement should include that qualification:
"You cannot subtract 5 from 3 using the counting numbers we are studying."
<br/><br/>
Likewise for other incorrect statements such as
"You can not divide 3 by 2" and
"You can not take the square root of -4."
</div>
</blockquote>
We would like to be able to solve any equation we can write with our
subtraction operator, so we will define new numbers that we can use
for that purpose.
We call these numbers negative numbers.
We choose to write them using the same digits as we write our natural
numbers, with a leading <code>-</code> character, such as -1 and -2.
<br/><br/>
In our house-building analogy, so far we have built a little
house from the foundation upwards,
and now we realize we need some more support in order to finish subtraction.
Adding negative numbers is like adding another room to that house:
in order to have a solid structure, we need to extend our foundation.
To save on design work,
we are going to reuse the same basic plan as we used
when we built up the natural numbers.
This is like using the same blueprint for the second room
of our house as for the first,
except in mirror image because we find symmetry pleasing.
Here is a little diagram:
<pre><div class="code"
>
+-----+ +-----+
/ 3 \ / 3 \
+----+ +----+----+ +----+----+
| 2 | | 2 | | 5 | 2 |
+------+ +----+-+ +----+-+ +-+----+----+-+
| 1 | | 1 | | 1 | | 4 | 1 |
+------+ +------+ +------+ +------+------+
1. Natural 2. Addition 3. Subtraction 4. Negative Numbers
Numbers on Naturals Oops! 5. Addition on Negatives
6. Completion of Subtraction
</div></pre>
Thus we go back to the beginning of our derivation of natural numbers.
To distinguish our original numbers from our newly defined negative
numbers, we will call all of the numbers generated by our successor
function (that would be all numbers 1 and above) the positive numbers.
We will call the collection of all of these numbers
(positive, negative and zero) the integers.
We will call the characteristic of being
"positive" and "negative" the sign of the number.
<br/><br/>
Since we want our rules to apply to all integers, we start by stating
that in any of our previous assumptions and derivations, a variable
name can refer to any integer unless the specific proof or assumption
states otherwise (such as for induction proofs).
<br/><br/>
We started by defining a successor operator s(x) [A2],
and we now define a corresponding predecessor operator p(x)
that generates our negative numbers in a way which is symmetric to s(x):
<pre><div class="code"
>[A61] given x, p(x) generates another number, where p(x) is not the same as x
</div></pre>
We define the predecessor function as the inverse of the successor function
and vice-versa.
In other words:
<pre><div class="code"
>[A62.1] p(s(a)) = a
[A62.2] s(p(a)) = a
</div></pre>
We define our negative numbers in the same way as we defined
our natural (positive) numbers [A7]:
<pre><div class="code"
>[A63.1] -1 = p(0)
[A63.2] -2 = p(-1)
[A63.3] -3 = p(-2)
etc. to negative infinity.
</div></pre>
We take our no-duplicates assumption [A8] on the successor function
and state it for the predecessor function:
<pre><div class="code"
>[A64] For any x, repeated application of the predecessor function
any number of times will never generate x.
</div></pre>
For the relational operators, we can derive their meaning relative to
the predecessor operator:
<pre><div class="code"
> s(a) > a [A9]
p(s(a)) > p(a) Apply p(x) to both sides [L6.6]
a > p(a) From [A62.1]
[L65] p(a) < a               Previous line with < and > exchanged
</div></pre>
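As a quick sanity check, we can model these two operators in Python, using the built-in integers as a stand-in for the structure we are building (so this is an illustration of the rules, not a derivation of them):
<pre><div class="code"
>
```python
def s(x):
    """Successor operator [A2]."""
    return x + 1

def p(x):
    """Predecessor operator [A61]."""
    return x - 1

# Check [A62.1], [A62.2] and [L65] over a range of integers.
for a in range(-5, 6):
    assert p(s(a)) == a   # [A62.1]
    assert s(p(a)) == a   # [A62.2]
    assert p(a) < a       # [L65]
```
</div></pre>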
<a name="negative-addition"></a>
<h4>Addition</h4>
We add to our definition of Addition ([A21] and [A22]) to handle
negative numbers,
and we extend our induction assumption [A24] to negative numbers:
<pre><div class="code"
>[A71] a + p(b) = p(a + b)
[A72] If an equation is true for a known value of n,
and it can be demonstrated to be true for n+(-1) for any n when true for n,
then it is true for all integers x where x < n.
</div></pre>
For each of our original assumptions through addition, we have now
added similar assumptions to handle our negative numbers.
All of our assumptions are completely symmetrical:
take any of the original assumptions, replace successor by predecessor,
replace 1 by -1, and exchange < with >,
and you will get the equivalent assumption for our negative numbers.
Because all of our other proofs in those sections are based on those
assumptions, the symmetric proofs for negative numbers follow from
the symmetric assumptions in exactly the same way as for the natural
numbers.
Thus all of the results and conclusions in those sections
are valid for addition of negative numbers:
commutative, associative, identity, algebra.
<br/><br/>
We list the results of one lemma here,
leaving the details of the derivation as an exercise for the reader:
<pre><div class="code"
>[L73] a + -1 = p(a)
</div></pre>
We derive a couple of other useful results:
<pre><div class="code"
> p(s(a)) = a [A62.1]
p(a + 1) = a [L23.1]
(a + 1) + -1 = a [L73]
a + (1 + -1) = a [a+]
 (1 + -1) = 0                Since 0 is the only x with a + x = a [L31]
[L74] -1 + 1 = 0 [c+]
</div></pre>
<pre><div class="code"
> (1 + -1) = 0 [L74]
n + -n = 0 Inductive assumption, true for n=1 [L74]
(n + -n) + (1 + -1) = 0 From [i+] because (1 + -1) = 0
 (n + 1) + (-n + -1) = 0     From [a+] and [c+]
(n + 1) + (-(n+1)) = 0 From p(x) defn
[L75] a + -a = 0 Above lines summarized, with a for n+1
</div></pre>
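Here is a sketch of this extended addition in Python, defined only in terms of the successor and predecessor operators. It assumes, as the surrounding text suggests, that [A21] and [A22] are <code>a + 0 = a</code> and <code>a + s(b) = s(a + b)</code>:
<pre><div class="code"
>
```python
def s(x):
    """Successor operator [A2]."""
    return x + 1

def p(x):
    """Predecessor operator [A61]."""
    return x - 1

def add(a, b):
    """Addition defined only via s and p."""
    if b == 0:
        return a                # assumed [A21]: a + 0 = a
    if b > 0:
        return s(add(a, p(b)))  # assumed [A22]: a + s(b) = s(a + b)
    return p(add(a, s(b)))      # [A71]: a + p(b) = p(a + b)

# Check [L73] and [L75] over a range of integers.
for a in range(-4, 5):
    assert add(a, -1) == p(a)   # [L73]
    assert add(a, -a) == 0      # [L75]
```
</div></pre>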
The above statement says that, for any element a in our set of natural
numbers, there is an element -a (a negative number, negative a)
which can be added
to that natural number to produce zero (our identity element).
We call negative a the
<a href="http://en.wikipedia.org/wiki/Inverse_element">inverse element</a>
of a, and likewise a is the inverse element of -a.
<pre><div class="code"
 -a + a = 0                  [L75] and [c+]
(-a + a) - a = 0 - a Subtract a from each side
-a + (a - a) = 0 - a [L45.5]
[L76] -a = 0 - a [L46.5] and [i+]
</div></pre>
<pre><div class="code"
> a + -a = 0 [L75]
(a + -a) - -a = 0 - -a Subtract -a from each side
a + (-a - -a) = 0 - -a [L45.5]
a = 0 - -a [L46.5]
[L76.1] a = -(-a) [L76]
</div></pre>
<pre><div class="code"
>a + -b
= a + (0 - b) [L76]
= (a + 0) - b [L45.5]
= a - b [i+]
[L77] a + -b = a - b
</div></pre>
<a name="negative-subtraction"></a>
<h4>Subtraction</h4>
As with addition, we note that we can
create a set of symmetric assumptions using negative numbers in place
of positive numbers,
so that all of our results and conclusions of subtraction on positive
numbers also work on negative numbers.
<br/><br/>
For improved symmetry with the definition of addition,
we restate our assumptions defining subtraction to use the
successor and predecessor functions,
and we add a symmetric assumption that covers negative numbers.
We no longer need <code>(a+1)-1=a</code> [A41.1]
as an assumption for subtraction,
because it is equivalent to <code>p(s(a))=a</code> [A62.1].
Since these assumptions are just a rewriting of our original
assumptions for subtraction, all of our derivations remain the same.
<pre><div class="code"
>[A41] a - 0 = a Repeat of original [A41]
[A81] a - s(b) = p(a - b) [A42] restated in terms of s and p
[A82] a - p(b) = s(a - b) Symmetric assumption to [A81]
</div></pre>
<a name="negative-algebra"></a>
<h4>Algebra</h4>
With the addition of negative numbers to our structure,
our set is closed with respect to subtraction.
We now have a set (the integers)
with an associative binary operator (+) with an identity (0)
and inverse elements (the negative numbers).
This algebraic structure is called a
<a href="http://en.wikipedia.org/wiki/Group_(mathematics)">group</a>.
Because our operator (addition) is commutative,
our algebraic structure is an
<a href="http://en.wikipedia.org/wiki/Abelian_group">abelian group</a>.
The group, however, ignores the subtraction operator.
<a name="multiplication"></a>
<h3>Multiplication</h3>
Once we start using addition for real tasks, we find that we are often
adding the same number many times, such as 3+3+3+3.
Because this is so common, we would like to define a shortcut -
a new operator - that means the same thing.
We call this operation multiplication.
<br/><br/>
There are various conventions for how the multiplication operator
is written: x, * and dot are common, and in some cases a convention
is adopted that two variables written next to each other with no
operator between them are to be multiplied.
Most computer programming languages use the asterisk character (*),
and I will use that here.
<br/><br/>
In order to have as much symmetry as we can, and to minimize our design work,
we will define multiplication using a similar approach as we did when
we defined addition:
<pre><div class="code"
>[A101] a * 0 = 0
[A102] a * (b + 1) = (a * b) + a
[A103] a * (b - 1) = (a * b) - a
</div></pre>
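We can sketch this definition in Python, leaning on the built-in integers only for the comparison with zero and for the already-defined addition and subtraction:
<pre><div class="code"
>
```python
def mul(a, b):
    """Multiplication by repeated addition, per [A101]-[A103]."""
    if b == 0:
        return 0                  # [A101]
    if b > 0:
        return mul(a, b - 1) + a  # [A102] with b-1 for b
    return mul(a, b + 1) - a      # [A103] with b+1 for b

# Check that the definition agrees with the built-in operator,
# including the sign behavior for negative operands.
for a in range(-4, 5):
    for b in range(-4, 5):
        assert mul(a, b) == a * b
```
</div></pre>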
We could equivalently have used a slightly different formulation
for [A103] in which we add -1 rather than subtracting 1,
as supported by [L77]:
<pre><div class="code"
>a * (-1)
= a * (0 - 1) [L76]
= (a * 0) - a [A103]
= 0 - a [A101]
= -a [L76]
[L104.1] a * -1 = -a Above lines summarized
</div></pre>
<pre><div class="code"
>a * (b + -1)
= a * (b - 1) [L77]
= (a * b) - a [A103]
= (a * b) + -a [L77]
= (a * b) + (a * -1) [L104.1]
[L104.2] a * (b + -1) = (a * b) + (a * -1) Above lines summarized
</div></pre>
If the second operand is negative, we can factor that out and we see
that it changes the sign of the result.
<pre><div class="code"
>a * -n = -(a * n) Inductive assumption, true for n=1
a * -(n + 1)
= a * (-n - 1)
= (a * -n) - a
= -(a * n) - a
= 0 - (a * n) - a
= 0 - ((a * n) + a)
= 0 - (a * (n + 1))
= -(a * (n + 1))
[L104.3] a * -b = -(a * b) Above summarized, with b for n+1
[L104.4] -a * b = -(a * b) Swap a with b and use [c*]
</div></pre>
<pre><div class="code"
>-a * -b = -(-a * b) [L104.3]
= -(-(a * b)) [L104.3] again
= a * b [L76.1]
[L104.5] -a * -b = a * b Above lines summarized
</div></pre>
<a name="multiplication-identity"></a>
<h4>Identity and Zero</h4>
By setting b=0 in [A102], we see that 1 is a right-identity for
multiplication:
<pre><div class="code"
> a * (0 + 1) = (a * 0) + a From [A102] with 0 for b
a * 1 = 0 + a From [i+] on LHS, [A101] on RHS
[L105] a * 1 = a
</div></pre>
We show by induction that zero multiplied on either side gives zero:
<pre><div class="code"
>[L106.1] 0 * 0 = 0 From [A101] with 0 for a
[L106.2] 0 * n = 0 Inductive assumption, true for n=0
[L106.3] 0 * (n + 1) = (0 * n) + 0 From [A102] with 0 for a, n for b
[L106.4] 0 * (n + 1) = 0 + 0 From [L106.2]
[L106.5] 0 * (n + 1) = 0
[L106.6] 0 * a = 0 Above summarized with a for n+1
</div></pre>
By doing the same proof using [A103] we can conclude that [L106.6]
holds for all integers.
<br/><br/>
We show that 1 is a left identity:
<pre><div class="code"
>1 * 1 = 1 From [L105] with a=1
1 * n = n Inductive assumption, true for n=1
1 * (n + 1)
= (1 * n) + 1 From [A102] with a=1 and b=n
= n + 1 From Inductive assumption
[L106.8] 1 * a = a Above summarized, with a for n+1
</div></pre>
Since 1 is both a left identity and a right identity,
we can drop the handedness and just refer to it as an identity.
<br/><br/>
With addition we had one special number, 0, which when added to any
number yielded that number.
With multiplication we see that we have two special numbers:
the number 1 is an identity for multiplication,
but 0 is also special, since anything multiplied by 0 yields 0.
We choose to use the word "zero", when associated with a specific
operation such as multiplication, to mean a value that, when given
as an operand to that operator, always yields zero.
Our multiplication operator has only one zero, but other systems
and operators may have more than one zero.
<br/><br/>
By the same argument [L31] as for the additive identity, we can
see that there is only one multiplicative identity and only one
multiplicative zero.
<a name="multiplication-distributive"></a>
<h4>Distributive</h4>
We show that multiplication is distributive over addition by induction:
<pre><div class="code"
>[L107.1] a * (b + 0) = a * b = (a * b) + 0 = (a * b) + (a * 0)
a * (b + 1) = (a * b) + a [A102]
a * (b + 1) = (a * b) + (a * 1) From [L105] on rightmost a
[L107.2] a * (b + n) = (a * b) + (a * n) Inductive assumption, true for n=1
a * (b + (n + 1))
= a * ((b + n) + 1) From [a+]
= (a * (b + n)) + a From [A102]
= ((a * b) + (a * n)) + a From [L107.2]
= (a * b) + ((a * n) + a) From [a+]
= (a * b) + (a * (n + 1)) From [A102]
[L107.3] a * (b + c) = (a * b) + (a * c) Above summarized, with c for n+1
</div></pre>
The above proof can be repeated using -1 instead of 1 (by [L104.2]),
so [L107.3] covers all integers.
<br/><br/>
Using the same proof steps using [A103] rather than [A102]
demonstrates that multiplication distributes over subtraction as well.
Since by [L77] subtraction is the equivalent of adding the negative
of a number, this is consistent.
<pre><div class="code"
>[L107.4] a * (b - c) = (a * b) - (a * c)
</div></pre>
<pre><div class="code"
> 2 * 1 = 2 = 1 + 1
2 * n = n + n Inductive assumption, true for n=1
2 * (n + 1)
= 2 * n + 2
= (n + n) + (1 + 1)
= (n + 1) + (n + 1)
2 * a = a + a
1 * b = b
(0 * 1) * b = (0 * b) + b
(n + 1) * b = (n * b) + b Inductive assumption, true for n=0
(n + 2) * b = (n * b) + b + b Inductive assumption, true for n=0
((n + 1) + 1) * b
= (n + 2) * b
= (n * b) + b + b
= ((n + 1) * b) + b
(a + 1) * b = (a * b) + b
</div></pre>
<a name="multiplication-associative"></a>
<h4>Associative</h4>
We show multiplication is associative by induction:
<pre><div class="code"
>[L108.1] (a * b) * 0 = 0 = a * 0 = a * (b * 0)
[L108.2] (a * b) * 1 = a * b = a * (b * 1) From [L105] on each side
[L108.3] (a * b) * n = a * (b * n)  Inductive assumption, true for n=1
 (a * b) * (n + 1)
 = ((a * b) * n) + (a * b)          From [A102]
= (a * (b * n)) + (a * b) From [L108.3]
= a * ((b * n) + b) From [L107.3] with b*n for b, b for c
= a * (b * (n + 1)) From [A102] with b for a, n for b
[L108.4] (a * b) * c = a * (b * c) Above lines summarized, with c for n+1
</div></pre>
As with the distributive law, we can replace 1 by -1 to show that
our conclusion covers negative numbers as well.
<a name="multiplication-commutative"></a>
<h4>Commutative</h4>
<pre><div class="code"
>m * n = n * m Inductive assumption, true for m=0 or 1 and n=0 or 1
(m + 1) * (n + 1)
= (m + 1) * n + (n + 1) From [A102]
= (m * n) + m + (n + 1) From [(a+1)*b = a*b+b]
= (n * m) + n + (m + 1) From Inductive assumption and [a+]
= (n + 1) * m + (m + 1) From [same as two lines up]
= (n + 1) * (m + 1) From [A102]
[L109] a * b= b * a
</div></pre>
As with addition, the fact that multiplication is associative [L108.4]
means that, if we have an expression that is a string of values multiplied
together, we can drop the parentheses from the expression without
creating any ambiguity; and the fact that it is commutative means that
we can rearrange all of those multiplied values to any order we want.
<a name="multiplication-algebra"></a>
<h4>Algebra</h4>
We have added a second operator to our repertoire
that, like addition, is an associative binary operator with an identity.
With two such operators, where one distributes over the other, we have a
<a href="http://en.wikipedia.org/wiki/Ring_(mathematics)">ring</a>
(for a more precise definition, follow the link).
In the same way that the group ignores subtraction,
the ring ignores the division operator.
As with addition,
there are a few rules from the above section that we will use often enough
that we want to reference them by name rather than lemma number.
<pre><div class="code"
>[a*] a * (b * c) = (a * b) * c [L108.4] Associativity of multiplication
[c*] a * b = b * a [L109] Commutativity of multiplication
[z*] a * 0 = 0 * a = 0 [L106.6] Zero for multiplication
[i*] a * 1 = 1 * a = a [L106.8] Identity for multiplication
[d*] a * (b + c) = (a * b) + (a * c) [L107.3] Distributivity of multiplication over addition
</div></pre>
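Python's integers obey these named rules, so we can spot-check them over a small range of values:
<pre><div class="code"
>
```python
# Spot-check the named multiplication rules on small integers.
vals = range(-3, 4)
for a in vals:
    assert a * 0 == 0 * a == 0                       # [z*]
    assert a * 1 == 1 * a == a                       # [i*]
    for b in vals:
        assert a * b == b * a                        # [c*]
        for c in vals:
            assert a * (b * c) == (a * b) * c        # [a*]
            assert a * (b + c) == (a * b) + (a * c)  # [d*]
```
</div></pre>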
<a name="division"></a>
<h3>Division</h3>
As when we defined subtraction to be the inverse operation of addition,
we want an inverse operation to multiplication so that we can solve for <code>x</code>
in equations such as <code>a * x = b</code>.
<br/><br/>
We call our inverse operation division.
As with multiplication, there are a number of common ways this
operation is expressed.
For use in this presentation, we choose to use the slash character (/)
to represent the division operation.
We want division and multiplication each to be the inverse of the other,
as is the case with addition and subtraction,
so we have two candidate definitions:
<pre><div class="code"
>[A120.1] (a * b) / b = a for all a and b except b=0
[A120.2] (a / b) * b = a for all a and b except b=0
</div></pre>
Our definitions exclude zero because we already have a rule that says
anything times zero is zero, so we know <i>a priori</i> that we can't make
these new rules work for all a when b is zero.
<br/><br/>
The fact that we can't divide by zero is the first time we have
encountered a special case in our structure, where we have to add a
qualification to one of our rules stating that you can't do something
rather than extending our structure to make it possible to do that.
When, in building our structure of numbers, we realized that we could
not answer the question "what is 3 - 5?", we expanded the structure to
allow us to answer that question ("negative 2"). In this case, we can't
answer the question "what is 5 / 0?", but, for the first time,
instead of trying to expand
our structure to be able to answer that question, we make the statement
"you can't do that".
As we will see later, the further we go in defining our structure,
the more such exceptions and caveats we need to make.
<br/><br/>
We check that the two assumptions above are compatible by starting with one
and converting it into the other.
<pre><div class="code"
>(a * b) / b = a [A120.1]
((a * b) / b) * b = a * b Right-multiply both sides by b
(c / b) * b = c Previous line with c for a*b; this is [A120.2]
</div></pre>
We can quickly get some useful lemmas by plugging in a few different
values for a and b:
<pre><div class="code"
>[L121] a / 1 = a From [A120.1 or 2] with b=1, after a*1=a
[L122] b / b = 1 From [A120.1] with a=1, after b*1=b
[L123] (1/b)*b = 1 From [A120.2] with a=1
[L124] 0 / b = 0 From [A120.1] with a=0, after 0*b=0
[L124.2] a / a = 1 From [A120.1] with a=1 and b=a
</div></pre>
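We can spot-check these lemmas in Python. Rational numbers have not yet been introduced at this point in the text, so we borrow Python's Fraction type to stand in for exact division:
<pre><div class="code"
>
```python
from fractions import Fraction as F

# b ranges over nonzero values only, since division by zero
# is excluded by [A120.1] and [A120.2].
for a in [F(-3), F(0), F(2), F(7)]:
    for b in [F(-2), F(1), F(5)]:
        assert (a * b) / b == a   # [A120.1]
        assert (a / b) * b == a   # [A120.2]
        assert b / b == 1         # [L122], [L124.2]
        assert (1 / b) * b == 1   # [L123]
        assert F(0) / b == 0      # [L124]
    assert a / 1 == a             # [L121]
```
</div></pre>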
If we are looking at the equation
<pre><div class="code"
>[I125] a = c / b
</div></pre>
what does that mean?
If we assume
<pre><div class="code"
>[A126] c = a * b
</div></pre>
then [I125] becomes
<pre><div class="code"
>[I127] a = (a * b) / b
</div></pre>
which is [A120.1]. This is true by definition, so our assumption [A126]
is a valid assumption to use in solving [I125].
What we are saying here is that the solution (a) to [I125] is the value
that, when multiplied by b, gives c.
<pre><div class="code"
>[L128] If a = c / b, then c = a * b, and vice-versa (from [I125] and [A126])
</div></pre>
<a name="division-associative"></a>
<h4>Associative</h4>
As we did with subtraction,
we want to prove the associative laws for division so we know how
we can transform various combinations of parentheses and the
multiplication and division operations.
We already know about <code>a * (b * c)</code>,
so there are three other possible combinations of * and / with the
parentheses in the same position:
<ul>
<li><code>a / (b * c)</code>
<li><code>a * (b / c)</code>
<li><code>a / (b / c)</code>
</ul>
<pre><div class="code"
>[I129.1] a / (b * c) = d Given
a = d * (b * c) From [L128]
a = (d * c) * b From [a*] and [c*]
a / b = d * c From [L128]
[I129.2] (a / b) / c = d From [L128]
[L129.3] a / (b * c) = (a / b) / c From [I129.1] and [I129.2]
</div></pre>
<pre><div class="code"
>[I130.1] a * (b / c) = d Given
a * (b / c) * c = d * c Multiply both sides by c
 a * b = d * c               Reduce (b / c) * c = b by [A120.2]
[I130.2] (a * b) / c = d From [L128]
[L130.3] a * (b / c) = (a * b) / c  From [I130.1] and [I130.2]
</div></pre>
<pre><div class="code"
>[I131.1] a / (b / c) = d Given
a = d * (b / c) From [L128]
= (d * b) / c From [L130.3]
a * c = d * b From [L128]
c * a = d * b From [c*]
(c * a) / b = d From [L128]
c * (a / b) = d From [L130.3]
[I131.2] (a / b) * c = d From [c*]
[L131.3] a / (b / c) = (a / b) * c From [I131.1] and [I131.2]
</div></pre>
We now have all of our rules of association for multiplication and division.
The following four equations, repeated from above, show all eight
possible combinations of * and / operators and grouping of three
variables.
Note that this table is identical to the table of rules of association
for addition and subtraction, with * instead of + and / instead of -.
<pre><div class="code"
>[a*] a * (b * c) = (a * b) * c
[L129.3] a / (b * c) = (a / b) / c
[L130.3] a * (b / c) = (a * b) / c
[L131.3] a / (b / c) = (a / b) * c
</div></pre>
We derive a few more useful lemmas.
<pre><div class="code"
> a / b
= (a * 1) / b From [i*]
 = a * (1 / b)               From [L130.3]
[L132] a / b = a * (1 / b) Summary of the above lines
</div></pre>
<pre><div class="code"
> 1 / (a / b)
= (1 / a) * b From [L131.3]
= b * (1 / a) From [c*]
= b / a From [L132]
[L133] 1 / (a / b) = b / a Summary of the above lines
</div></pre>
<pre><div class="code"
> (a / b) * (c / d)
 = ((a / b) * c) / d         From [L130.3]
 = (c * (a / b)) / d         From [c*]
 = ((c * a) / b) / d         From [L130.3]
= (c * a) / (b * d) From [L129.3]
= (a * c) / (b * d) From [c*]
[L134] (a / b) * (c / d) = (a * c) / (b * d) Summary of the above lines
</div></pre>
<pre><div class="code"
> (a / b) / (c / d)
= ((a / b) * 1) / (c / d) From [i*]
= (a / b) * (1 / (c / d)) From [L130.3]
= (a / b) * (d / c) From [L133]
= (a * d) / (b * c) From [L134]
[L135] (a / b) / (c / d) = (a * d) / (b * c) Summary of the above lines
</div></pre>
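A quick Python spot-check of the last four lemmas, again using the Fraction type for exact arithmetic:
<pre><div class="code"
>
```python
from fractions import Fraction as F

# Arbitrary nonzero sample values.
a, b, c, d = F(2), F(3), F(5), F(7)

assert a / b == a * (1 / b)                     # [L132]
assert 1 / (a / b) == b / a                     # [L133]
assert (a / b) * (c / d) == (a * c) / (b * d)   # [L134]
assert (a / b) / (c / d) == (a * d) / (b * c)   # [L135]
```
</div></pre>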
<a name="rational-numbers"></a>
<h3>Rational Numbers</h3>
You may have noticed in the above section about the division operation
that we discussed things like <code>1 / a</code> without commenting on the
fact that our number system, which up to now includes only integers,
does not in general include the numbers that can represent that.
The proper sequence would have been to introduce rational numbers first,
but I wanted to finish the discussion about the properties of the division
operation before discussing rational numbers.
With that out of the way, let's turn to rational numbers.
<br/><br/>
We can easily build a table for specific values of a, b and c for equation
[I125] by taking all pairs of integer values for a and b, generating c as their
product, and defining the value of c/b to be a for all of those triplets.
For example, 2*3=6, therefore 6/3=2.
<br/><br/>
Our division table does not include all possible
combinations of <code>c/b</code>,
so there are some division equations for which the
answer can not be found in our tables.
For example, 3/2 does not appear in our table because,
in our system of numbers up to this point, which is all integers,
there is no number that, when multiplied by 2, yields 3.
<br/><br/>
In order for our numbers to be closed under division, we have to add some
new numbers, which are the numbers needed to solve the equation
<code>c/b</code> when
there is no integer number <code>a</code> such that <code>a*b=c</code>.
We call these numbers rational numbers,
because they are the ratio of two integers,
and we choose to represent them
as a fraction using the division operator.
In other words, when we ask what is the answer to the equation <code>c/b</code>,
we are simply defining the answer to be <code>c/b</code> and stating that
that value is a number.
We will then examine how to manipulate these numbers.
<br/><br/>
We have defined rational numbers as numbers of the form <code>c/b</code>.
We also know from our table-based enumeration of division equations that,
for any number c which can be written as <code>a*b</code>, the value of
the division equation <code>c/b</code> is a.
We define the value of our rational number that we write as <code>c/b</code>
to be consistent with the known solutions of our division equations written
the same way.
Thus the value of the rational number 6/3 is defined to be 2, etc.
<a name="rational-algebra"></a>
<h4>Algebra</h4>
With division as the inverse of multiplication, the multiplicative identity 1,
and rational numbers, our ring is now a
<a href="http://en.wikipedia.org/wiki/Field_(mathematics)">field</a>.
<br/><br/>
This is as far as we will go with algebra. When we continue with
exponentiation to derive real numbers and then complex numbers,
those structures are still fields.
<a name="operator-precedence"></a>
<h4>Operator Precedence</h4>
Up to now, we have been using parentheses to ensure that the order of
application of operators in an expression is unambiguous. We noted
earlier that we don't need those parentheses in an expression that
consists solely of a number of values added together, and likewise that we
don't need parentheses in an expression that consists solely of a number
of values multiplied together. This is nice because it reduces the amount
of writing we need to do.
<br/><br/>
We can further reduce the need for parentheses by defining a rule that
tells us which operations to evaluate first when there are no parentheses
to guide us. When we start with an operation and then define a second
operation as the repeated application of the first operation,
we can think of that second operation as being more powerful than the
first operation. We then give priority to the more powerful operator,
defining our rule of precedence to be that, in an expression in which
the order of evaluation would otherwise be ambiguous, we will evaluate
the more powerful operators first.
<br/><br/>
We define addition (+) and subtraction (-) to be at the first level,
and multiplication (*) and division (/) to be at the second level and
higher power than the first level.
Thus, for example, the expression <code>a + b * c</code> will be
equal to <code>a + (b * c)</code>, and the expression
<code>a / b - c</code> will be equal to <code>(a / b) - c</code>.
<br/><br/>
In cases where there are multiple operators of the same power,
we define the order of evaluation to be left to right.
Thus, for example, the expression <code>a / b * c</code> will be
equal to <code>(a / b) * c</code>, and the expression
<code>a - b + c</code> will be equal to <code>(a - b) + c</code>.
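Python happens to use these same precedence and left-to-right rules, so we can illustrate them directly:
<pre><div class="code"
>
```python
# Python evaluates * and / before + and -, and operators at the
# same level left to right, matching the rules described above.

assert 1 + 2 * 3 == 1 + (2 * 3) == 7    # * before +
assert 8 / 4 - 1 == (8 / 4) - 1 == 1    # / before -
assert 8 / 4 * 2 == (8 / 4) * 2 == 4    # same level: left to right
assert 5 - 2 + 1 == (5 - 2) + 1 == 4    # same level: left to right
```
</div></pre>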
<a name="exponentiation"></a>
<h3>Exponentiation</h3>
Up to this point the structure we have built is pretty clean. With
rational numbers and our four operators (+, -, *, /), we have a system
that is closed and mostly complete and consistent, with the only exception
being that we can't divide by zero. Other than that one exception,
operations are well-defined, we have a nice set of rules including our
commutative, associative, and distributive rules, and we have a host of
identities and lemmas we can apply to our rational numbers.
<br/><br/>
Once we add exponentiation, things get a lot messier: we will have
expressions that have multiple values, bigger swaths of undefined
operations, and many places where our lemmas and rules of manipulation
no longer apply. It might seem like it's hardly worth trading our nice
clean rational numbers for this mess. But despite all of the rough
edges, there are enough useful things you can do with real and complex
numbers that it is worth carefully defining where those rough edges are
and avoiding them. So, let's forge ahead.
<br/><br/>
As with addition, once we start using multiplication for real problems,
we often find we want to multiply the same number together many times,
such as <code>3*3*3*3</code>.
As we did when defining multiplication, we define a new operator that
means the same as repeated multiplication.
We call this new operation exponentiation.
In programming languages this is sometimes written using the caret (^)
as an operator, but since this is HTML we have the luxury of using the
standard notation, which is to write the exponent as a superscript.
For example the expression <code>3<sup>4</sup></code>
means the product of four factors of 3,
or <code>3 * 3 * 3 * 3</code>.
We call the number on the left the base, and the superscript number
the exponent.
The operation of exponentiation is also referred to as taking a base to a power,
where the power is the exponent.
<br/><br/>
In line with our precedence rules by which we evaluate higher-power
operations first, we will evaluate exponentiation before multiplication,
division, addition, and subtraction, when there are no parentheses to
otherwise indicate the order of evaluation.
<br/><br/>
From [a*] we know we can group repeated multiplication any way we want,
so for example <code>3 * 3 * 3 * 3 = (3 * 3 * 3) * 3 = (3 * 3) * (3 * 3)</code>.
Using our new superscript notation, we can write this as
<code>3<sup>4</sup> = (3<sup>3</sup>) * (3<sup>1</sup>) = (3<sup>2</sup>) * (3<sup>2</sup>)</code>.
More generally, we can see these things
from our definition of exponentiation and [a*]:
<pre><div class="code"
>[L201.1] a<sup>(b + c)</sup> = a<sup>b</sup> * a<sup>c</sup>
[L201.2] a<sup>1</sup> = a
[L201.3] (a<sup>b</sup>)<sup>c</sup> = a<sup>(b * c)</sup>
[L201.4] (a<sup>b</sup>)<sup>c</sup> = a<sup>b*c</sup> = a<sup>c*b</sup> = (a<sup>c</sup>)<sup>b</sup> From [L201.3] and [c*]
</div></pre>
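A spot-check of these lemmas on small positive integers in Python:
<pre><div class="code"
>
```python
# Verify [L201.1]-[L201.4] for a few small bases and exponents.
for a in [2, 3, 5]:
    assert a**1 == a                          # [L201.2]
    for b in range(1, 4):
        for c in range(1, 4):
            assert a**(b + c) == a**b * a**c  # [L201.1]
            assert (a**b)**c == a**(b * c)    # [L201.3]
            assert (a**b)**c == (a**c)**b     # [L201.4]
```
</div></pre>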
We can figure out how to deal with <code>(a * b)<sup>n</sup></code>
by starting with n=2:
<pre><div class="code"
> (a * b)<sup>2</sup>
= (a * b) * (a * b)
= a * b * a * b From [a*]
= a * a * b * b From [c*]
= a<sup>2</sup> * b<sup>2</sup>
[L201.5] (a * b)<sup>2</sup> = a<sup>2</sup> * b<sup>2</sup> Summary of above lines
</div></pre>
Then we use induction for the general case:
<pre><div class="code"
>Assume (a * b)<sup>n</sup> = a<sup>n</sup> * b<sup>n</sup> for some n
(a * b)<sup>(n + 1)</sup>
= (a * b)<sup>n</sup> * (a * b)<sup>1</sup> From [L201.1]
= (a<sup>n</sup> * b<sup>n</sup>) * (a * b) From [L201.5]
= a<sup>n</sup> * a * b<sup>n</sup> * b From [a*] and [c*]
= a<sup>(n + 1)</sup> * b<sup>(n + 1)</sup>
True when n=2 from [L201.5], so by induction true for all positive n
[L201.6] (a * b)<sup>n</sup> = a<sup>n</sup> * b<sup>n</sup>
</div></pre>
Unlike addition and multiplication, we can quickly see from
counterexamples that exponentiation is neither commutative:
<pre><div class="code"
>2<sup>3</sup> = 2 * 2 * 2 = 8
3<sup>2</sup> = 3 * 3 = 9
8 != 9, so 2<sup>3</sup> != 3<sup>2</sup>
</div></pre>
nor associative:
<pre><div class="code"
>2<sup>(3<sup>2</sup>)</sup> = 2<sup>(3 * 3)</sup> = 2<sup>9</sup> = 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 = 512
(2<sup>3</sup>)<sup>2</sup> = (2 * 2 * 2)<sup>2</sup> = 8<sup>2</sup> = 8 * 8 = 64
512 != 64, so 2<sup>(3<sup>2</sup>)</sup> != (2<sup>3</sup>)<sup>2</sup>
</div></pre>
These initial lemmas are based on our intuitive definition of exponentiation
as repeated multiplication, which provides obvious answers only in the case
where the exponent is a counting number (strictly positive integer).
Let's extend our definition to cover other numbers in our algebra.
<pre><div class="code"
>[A202.1] d = b + c Starting assumption
[I202.2] b = d - c
a<sup>d</sup> = a<sup>b</sup> * a<sup>c</sup> From [A202.1] and [L201.1]
a<sup>d</sup> / a<sup>c</sup> = a<sup>b</sup> * a<sup>c</sup> / a<sup>c</sup> Assuming a<sup>c</sup>!=0
[I202.3] a<sup>b</sup> = a<sup>d</sup> / a<sup>c</sup>
[L202.4] a<sup>(d - c)</sup> = a<sup>d</sup> / a<sup>c</sup> Substitute b from [I202.2]
</div></pre>
We can't divide by zero, so the above is not valid when <code>a<sup>c</sup></code>
is zero. When is that expression zero? From the definition of
exponentiation, this expression represents repeated multiplication of
<code>a</code>. What number when multiplied by itself is zero?
There is only one such number: zero. So [L202.4] is not valid when
<code>a = 0</code>, but it is valid for any other base.
<br/><br/>
Let's look at two special cases of [L202.4].
<pre><div class="code"
> a<sup>0</sup> = a<sup>(1 - 1)</sup> From [L46.5], a!=0
= a<sup>1</sup> / a<sup>1</sup> From [L202.4]
= a / a From [L201.2]
= 1 From [L124.2]
[L203] a<sup>0</sup> = 1 Above lines summarized, a!=0
</div></pre>
<pre><div class="code"
> a<sup>-b</sup> = a<sup>(0 - b)</sup> From [L76], a!=0
= a<sup>0</sup> / a<sup>b</sup> From [L202.4], b!=0
= 1 / a<sup>b</sup> From [L203]
[L204] a<sup>-b</sup> = 1 / a<sup>b</sup> Above lines summarized, a!=0, b!=0
[L204.1] a<sup>-1</sup> = 1 / a From [L204] with b = 1, and [L201.2]
</div></pre>
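We can check the extended definitions exactly in Python, since the Fraction type supports negative integer exponents for a nonzero base:
<pre><div class="code"
>
```python
from fractions import Fraction as F

# Check [L203], [L204] and [L204.1] for nonzero bases.
for a in [F(2), F(-3), F(5, 7)]:
    assert a**0 == 1              # [L203]
    assert a**-1 == 1 / a         # [L204.1]
    for b in range(1, 4):
        assert a**-b == 1 / a**b  # [L204]
```
</div></pre>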
The above extends our exponentiation operator to all integer exponents
and all bases other than zero. What about rational exponents?
<br/><br/>
Remember that our goal is to define a set of consistent and useful
operations. To that end, we want to ask ourselves how we can define
exponentiation using a rational exponent such that it is consistent with
the rest of our algebra.
Rational numbers are equivalent to division using integers, which is
the inverse of multiplication. Our exponentiation rule [L201.3]
includes multiplication, from which we can derive a rule for division.
<pre><div class="code"
> a = a<sup>1</sup> [L201.2]
= a<sup>(b / b)</sup> From [L124.2], b!=0
= a<sup>(b * 1/b)</sup> From [L132]
= a<sup>(1/b * b)</sup> From [c*]
= (a<sup>1/b</sup>)<sup>b</sup> From [L201.3]
[L205] (a<sup>1/b</sup>)<sup>b</sup> = a Summary of the above lines
</div></pre>
What the above says is that the value of <code>a<sup>1/b</sup></code> is the
number that, when raised to the power b, is equal to a.
For example, the number <code>a<sup>1/2</sup></code> is the number that,
when raised to the power 2, is equal to a.
We call <code>a<sup>1/b</sup></code> the b-th root of a.
The case where b is 2 or 3 is common enough that we define special names:
we call <code>a<sup>2</sup></code> a squared and
<code>a<sup>1/2</sup></code> the square root of a;
we call <code>a<sup>3</sup></code> a cubed and
<code>a<sup>1/3</sup></code> the cube root of a.
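In Python we can check [L205] numerically. Floating point makes roots approximate, so we compare with math.isclose rather than exact equality:
<pre><div class="code"
>
```python
import math

# (a**(1/b))**b should recover a, up to rounding error [L205].
for a in [2.0, 9.0, 10.0]:
    for b in [2, 3, 4]:
        root = a ** (1.0 / b)              # the b-th root of a
        assert math.isclose(root ** b, a)  # [L205]

assert math.isclose(9.0 ** 0.5, 3.0)           # square root of 9
assert math.isclose(8.0 ** (1.0 / 3.0), 2.0)   # cube root of 8
```
</div></pre>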
<br/><br/>
Previously when we added a new operation to represent repeated application
of an earlier operation (addition as repeated counting and
multiplication as repeated addition), we did not encounter closure
problems until we added an inverse operation to the newly added
operation (subtraction, division).
As we will see below, this is not the case for exponentiation:
here we will run into closure problems even without an inverse
operation. But to keep the flow the same as with the other operators,
I will discuss the inverse operation before getting back to closure.
<a name="logarithms"></a>
<h3>Logarithms</h3>
As when we defined division to be the inverse operation of multiplication,
we want an inverse operation to exponentiation so that we can solve for <code>x</code>
in equations such as <code>a<sup>x</sup> = b</code>.
<br/><br/>
We call our inverse operation logarithm.
<blockquote>
<div style="background-color: lightyellow; border: medium ridge black;
padding: 0.6em; margin-top: 1em; margin-bottom: 1em;">
There is a curious hole in math terminology about logarithms.
Our other operations all have names: we talk about performing
addition, multiplication, or exponentiation.
We do addition by adding two addends to get a sum.
But we don't "do logarithm": we "take a logarithm".
The word <i>logarithm</i> refers to one of the elements in
that operation, similar to how the word <i>exponent</i>
refers to one of the elements in the operation of exponentiation.
There seems to be no single word for logarithms that corresponds
to the operation names such as addition, multiplication, and exponentiation.
Talking about logarithms is like talking about sums rather than addition.
</div>
</blockquote>
<pre><div class="code"
>[A221.1] log<sub>a</sub>(a<sup>b</sup>) = b for all b, and all a except a=0 or a=1
[A221.2] a<sup>log<sub>a</sub>b</sup> = b for all b>0, and all a except a=0 or a=1
</div></pre>
We can derive a few lemmas for log.
<pre><div class="code"
>[L222.1] log<sub>a</sub>(a) = log<sub>a</sub>(a<sup>1</sup>) = 1 [L201.2] and [A221.1] with b=1
[L222.2] log<sub>a</sub>(1) = log<sub>a</sub>(a<sup>0</sup>) = 0 [L203] and [A221.1] with b=0
[L222.3] log<sub>a</sub>(1/a) = log<sub>a</sub>(a<sup>-1</sup>) = -1 [L204.1] and [A221.1] with b=-1
</div></pre>
<pre><div class="code"
>[I223.1] log<sub>a</sub>(a<sup>c</sup>) = c [A221.1] using c instead of b
[I223.2] log<sub>a</sub>(a<sup>d</sup>) = d [A221.1] using d instead of b
[I223.3] log<sub>a</sub>(a<sup>c</sup>) + log<sub>a</sub>(a<sup>d</sup>) = c + d Add left sides and right sides of [I223.1] and [I223.2]
[I223.4] log<sub>a</sub>(a<sup>c+d</sup>) = c+d [A221.1] using c+d instead of b
[L223.5] log<sub>a</sub>(a<sup>c+d</sup>) = log<sub>a</sub>(a<sup>c</sup>) + log<sub>a</sub>(a<sup>d</sup>) Transitive equals on [I223.3] and [I223.4]
</div></pre>
<pre><div class="code"
>[I224.1] log<sub>a</sub>(a<sup>c</sup>) - log<sub>a</sub>(a<sup>d</sup>) = c - d Subtract left sides and right sides of [I223.1] and [I223.2]
[I224.2] log<sub>a</sub>(a<sup>c-d</sup>) = c-d [A221.1] using c-d instead of b
[L224.3] log<sub>a</sub>(a<sup>c-d</sup>) = log<sub>a</sub>(a<sup>c</sup>) - log<sub>a</sub>(a<sup>d</sup>) Transitive equals on [I224.1] and [I224.2]
</div></pre>
<pre><div class="code"
>[I225.1] log<sub>a</sub>(a<sup>c+d</sup>) = log<sub>a</sub>(a<sup>c</sup>*a<sup>d</sup>) [L201.1]
[I225.2] log<sub>a</sub>(a<sup>c+d</sup>) = log<sub>a</sub>(a<sup>c</sup>) + log<sub>a</sub>(a<sup>d</sup>) [L223.5]
[I225.3] log<sub>a</sub>(a<sup>c</sup>*a<sup>d</sup>) = log<sub>a</sub>(a<sup>c</sup>) + log<sub>a</sub>(a<sup>d</sup>) Transitive equals on [I225.1] and [I225.2]
[L225.4] log<sub>a</sub>(x*y) = log<sub>a</sub>(x) + log<sub>a</sub>(y) Substitute x for a<sup>c</sup> and y for a<sup>d</sup>
</div></pre>
<pre><div class="code"
>[I226.1] log<sub>a</sub>(a<sup>c-d</sup>) = log<sub>a</sub>(a<sup>c</sup>/a<sup>d</sup>) [L202.4], a<sup>d</sup>!=0
[I226.2] log<sub>a</sub>(a<sup>c-d</sup>) = log<sub>a</sub>(a<sup>c</sup>) - log<sub>a</sub>(a<sup>d</sup>) [L224.3]
[I226.3] log<sub>a</sub>(a<sup>c</sup>/a<sup>d</sup>) = log<sub>a</sub>(a<sup>c</sup>) - log<sub>a</sub>(a<sup>d</sup>) Transitive equals on [I226.1] and [I226.2]
[L226.4] log<sub>a</sub>(x/y) = log<sub>a</sub>(x) - log<sub>a</sub>(y) Substitute x for a<sup>c</sup> and y for a<sup>d</sup>, y!=0
</div></pre>
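These rules can be checked numerically. A small Python sketch (illustrative only; <code>math.log(x, a)</code> computes the base-a logarithm):

```python
import math

a, x, y = 10.0, 50.0, 4.0

# [A221.1]: log_a(a^b) = b
assert abs(math.log(a ** 3, a) - 3) < 1e-9
# [L225.4]: the log of a product is the sum of the logs
assert abs(math.log(x * y, a) - (math.log(x, a) + math.log(y, a))) < 1e-9
# [L226.4]: the log of a quotient is the difference of the logs
assert abs(math.log(x / y, a) - (math.log(x, a) - math.log(y, a))) < 1e-9
```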
<a name="principal-values"></a>
<h3>Principal Values</h3>
Previously, we noted that, when we added division to our
algebraic structure, we had to add a small complication in that
we can't divide by zero.
When we add square root (or, more generally, exponentiation with any non-integer exponent),
we run into another kind of special case
where we have to take additional care: multivalued functions.
We note that every number has two square roots:
for example, the square root of 4 is 2 or -2, because either of
those numbers, when multiplied by itself, is equal to 4.
With multivalued functions like square root,
we can run into trouble
if we are not careful about choosing which value to use.
Here's an example of this problem:
<pre><div class="code"
>(4<sup>1/2</sup>)<sup>2</sup> = 4
4<sup>1/2</sup> * 4<sup>1/2</sup> = 4
2 * 4<sup>1/2</sup> = 4 Substitute 2 as the first square root
2 * -2 = 4 Substitute -2 as the second square root
-4 = 4 Wrong!
</div></pre>
The bad substitution in the above sequence may be easy to spot and understand,
but as we go further into building our algebra, problems of this
nature become subtler and harder to recognize.
<br/><br/>
We can reduce the probability of running into this kind of problem by
carefully selecting which of these multiple values to use. When we
have one preferred value for a multivalued function, we call that the
<a href="https://en.wikipedia.org/wiki/Principal_value">principal value</a>
of the function.
For example, the principal value of sqrt(4) is 2.
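Programming languages typically build this convention in. A small Python illustration (not from the original derivation):

```python
import math

# math.sqrt returns only the principal (non-negative) square root.
r = math.sqrt(4)
assert r == 2.0            # never -2.0

# Both candidates square back to 4; the principal value is just
# an agreed-upon, consistent choice between them.
assert r * r == 4.0
assert (-r) * (-r) == 4.0
```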
<a name="irrational-numbers"></a>
<h3>Irrational Numbers</h3>
The ancient Greeks knew that <code>2<sup>1/2</sup></code>
(the square root of two) is not a rational number.
There are <a href="https://www.cut-the-knot.org/proofs/sq_root.shtml">a
lot of proofs</a> of this.
I happen to like this one that demonstrates that all roots
(square root, cube root, and others) that are not integers
are not rational.
<pre><div class="code"
>Assume a<sup>b</sup> = c (b=2 for square root, b=3 for cube root, etc)
and a = d/e, e!=1
where d/e is reduced to the lowest form, so they have no prime factors in common.
Then a<sup>b</sup> = (d/e)<sup>b</sup> = d<sup>b</sup>/e<sup>b</sup> = c = c/1
But d<sup>b</sup> has no prime factors that are not in d,
and e<sup>b</sup> has no prime factors that are not in e,
so d<sup>b</sup> and e<sup>b</sup> have no prime factors in common,
and the fraction can not be reduced at all,
and in particular can not be reduced to c/1,
therefore it can not be equal to c.
Since there is no rational number satisfying the original assumption,
any solution must not be a rational number,
except in the case that e=1, which means the root is an integer.
</div></pre>
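The key step in the proof, that d<sup>b</sup> and e<sup>b</sup> share no prime factors when d and e share none, can be spot-checked with Python's <code>math.gcd</code> (a numerical illustration, not a substitute for the proof):

```python
from math import gcd

# If d/e is fully reduced (gcd(d, e) == 1), then d^b / e^b is also
# fully reduced: exponentiation introduces no new prime factors.
for d, e, b in [(3, 2, 2), (5, 4, 3), (7, 6, 5)]:
    assert gcd(d, e) == 1
    assert gcd(d ** b, e ** b) == 1   # cannot reduce to c/1 when e != 1
```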
In order for our numbering system to be closed under exponentiation,
we need to extend our numbers to include these values that are not
rational numbers. We call them irrational numbers.
<br/><br/>
When we added negative numbers and rational numbers, that was after
we had added not only an operation defined by repetition, but also
its inverse. In this case, we had to extend our numbers to provide
closure even without having yet added that inverse operation.
<blockquote>
<div style="background-color: lightyellow; border: medium ridge black;
padding: 0.6em; margin-top: 1em; margin-bottom: 1em;">
A brief aside about infinity:
before adding irrational numbers, our set of numbers was always
countably infinite, which means there was always a way to map the
entire set of numbers onto the counting numbers.
For example, we can count off all the integers, both positive and negative,
by ordering them like this: 0, 1, -1, 2, -2, 3, -3, and so on.
We can count off all the rational numbers by ordering them according to
the sum of the numerator and denominator and alternating positive and
negative, like this: 0, 1/1, -1/1, 1/2, -1/2, 2/1, -2/1, 1/3, -1/3,
2/2, -2/2, 3/1, -3/1, 1/4, and so on, then removing duplicates
(any fraction that is not reduced).
But once we add all the irrational numbers
we can no longer come up with a counting order like this, which is why
we say the set of all irrational numbers is
<a href="https://en.wikipedia.org/wiki/Georg_Cantor%27s_first_set_theory_article">uncountable</a>.
<br/><br/>
For a proof of this assertion, look up Cantor's
<a href="https://en.wikipedia.org/wiki/Cantor%27s_diagonal_argument">diagonalization argument</a>.
</div>
</blockquote>
<a name="decimal-notation"></a>
<h4>Decimal Notation</h4>
When we introduced rational numbers, such as 1/2, we defined their values
in terms of the division operation, but did not provide any other
representation. This was perhaps acceptable, as we can easily manipulate
rational numbers in order to answer questions about them.
<br/><br/>
With irrational numbers, it is not quite so easy. How can we tell, for example,
which of <code>2<sup>1/2</sup></code>, <code>3<sup>1/3</sup></code>,
or 723/510 is the largest?
We would like a representation that allows us to do real-world
calculations with these values.
<br/><br/>
When counting up with integers, we use a place-notation system in
which each digit, as we move to the left, represents a value that is
ten times as much as the digit just to its right. For example,
1234 means 1 * 1000 + 2 * 100 + 3 * 10 + 4.
We extend this sequence by defining each place to the
right of the ones digit as having a place value of one tenth of the digit
to its left. In order to unambiguously know which place is the ones place,
we put a decimal point (.) just to the right of the ones digit
(in America, that is; in some other parts of the world people
use a comma (,) instead).
For example, 0.5678 means 5 * 1/10 + 6 * 1/100 + 7 * 1/1000 + 8 * 1/10000.
<br/><br/>
We can convert fractions to decimal form such as
<code>a.bcde</code> by remembering that that means
<code>a + b/10 + c/100 + d/1000 + e/10000</code>
<pre><div class="code"
>723/510 = (510 + 213) / 510
= 510/510 + 213/510
= 1 + 213/510
= 1 + 10 * 213/510 / 10
= 1 + 2130/510 / 10
= 1 + (2040 + 90)/510 / 10
= 1 + 2040/510 / 10 + 90/510 / 10
= 1 + 4/10 + 10 * 90/510 / 100
= 1 + 4/10 + 900/510 / 100
= 1 + 4/10 + (510 + 390)/510 / 100
= 1 + 4/10 + (510/510 + 390/510) / 100
= 1 + 4/10 + 1/100 + 390/510 / 100
= 1 + 4/10 + 1/100 + 10 * 390/510 / 1000
= 1 + 4/10 + 1/100 + 3900/510 / 1000
= 1 + 4/10 + 1/100 + (3570 + 330)/510 / 1000
= 1 + 4/10 + 1/100 + (3570/510 + 330/510) / 1000
= 1 + 4/10 + 1/100 + 7/1000 + 330/510 / 1000
= 1.417 + more digits from 330/510 / 1000
</div></pre>
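The hand computation above is just long division carried past the decimal point. A short Python sketch of the same procedure (illustrative):

```python
# Produce the whole part and decimal digits of num/den by repeated
# long division: multiply the remainder by 10 and divide again,
# exactly as in the expansion above.
def decimal_digits(num, den, places):
    whole, rem = divmod(num, den)
    digits = []
    for _ in range(places):
        rem *= 10
        digit, rem = divmod(rem, den)
        digits.append(digit)
    return whole, digits

# 723/510 = 1.4176..., matching the digits derived above.
assert decimal_digits(723, 510, 4) == (1, [4, 1, 7, 6])
```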
Figuring out the decimal representation for a number such as
<code>2<sup>1/2</sup></code> is not quite as straightforward,
but we can start by the brute-force approach of trial and error
to get an estimate.
<pre><div class="code"
>1<sup>2</sup> = 1, 1&lt;2
2<sup>2</sup> = 4, 4>2, so our number must start with 1
1.1<sup>2</sup> = 1.21
1.2<sup>2</sup> = 1.44
1.3<sup>2</sup> = 1.69
1.4<sup>2</sup> = 1.96
1.5<sup>2</sup> = 2.25 so our number must start with 1.4
1.41<sup>2</sup> = 1.9881
1.42<sup>2</sup> = 2.0164 so our number must start with 1.41
1.411<sup>2</sup> = 1.990921
1.412<sup>2</sup> = 1.993744
1.413<sup>2</sup> = 1.996569
1.414<sup>2</sup> = 1.999396
1.415<sup>2</sup> = 2.002225 so our number must start with 1.414
</div></pre>
From this much we can determine that <code>2<sup>1/2</sup></code>
is less than <code>723/510</code>. We don't have an exact answer,
but for real world questions we often don't need to go to
very many decimal digits to get the answer.
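The trial-and-error search above can be automated. This Python sketch (illustrative) works in integers to avoid floating-point noise: it finds the largest k whose square does not exceed the target, one decimal place at a time:

```python
# Digit-by-digit square root by trial and error, as in the table
# above, but using integer arithmetic throughout.
def sqrt_digits(n, places):
    target = n * 100 ** places        # shift so the answer is an integer
    k = 0
    step = 10 ** places               # try one decimal place at a time
    while step > 0:
        while (k + step) ** 2 <= target:
            k += step
        step //= 10
    return k                          # digits of the truncated root

# 1414 means 1.414, agreeing with the table above.
assert sqrt_digits(2, 3) == 1414
```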
<br/><br/>
Our decimal notation is a sum of fractions, so any finite decimal number
can be converted to a rational number. Conversely, irrational numbers can
not be exactly represented as a decimal number, we can only approximate
them when using decimal notation.
If we want to maintain an exact representation of an irrational number
such as <code>2<sup>1/2</sup></code>, we have to keep it in that notation
or something similar.
<a name="imaginary-numbers"></a>
<h3>Imaginary Numbers</h3>
Adding irrational numbers extends our numbers to include the value of
<code>2<sup>1/2</sup></code> and other fractional roots of positive
numbers, but it doesn't cover everything. In particular, our numbers
don't yet include a value for the expression <code>(-1)<sup>1/2</sup></code>.
This is the square root of negative 1, which is equal to the number that,
when multiplied by itself, equals negative 1.
But any positive number multiplied by itself is a positive number, and
from [L104.5] any negative number multiplied by itself is also
a positive number, so we don't have any numbers that are candidates to
be the square root of negative 1. In order to have exponentiation be closed
for negative bases, we need to extend our numbers.
We need to add a set of numbers that, when multiplied by themselves,
produce negative numbers.
<br/><br/>
When we added negative numbers, we used our existing counting numbers with
an added character (-) in front to indicate a negative number.
We will do something similar here, using our existing counting numbers with
an added character, in this case the letter <i>i</i>, following the number to
indicate the new kind of numbers we are adding.
We define <code>1<i>i</i></code> (or just <code><i>i</i></code>) to
be the number such that <code><i>i</i><sup>2</sup> = -1</code>,
and given a number <code>a</code>, we define
<code>a<i>i</i> = a * <i>i</i></code>
(which is consistent with a common convention of defining
<code>ab = a * b</code>).
<br/><br/>
We need to pick a name to distinguish these new numbers from what we had
before, and "the square root of negative one" is too unwieldy, so we pick
a shorter name and call them imaginary numbers.
<br/><br/>
When we defined negative numbers, we might have instead called them
imaginary numbers, because you can't have negative lengths or a negative
number of apples in the real world, so those numbers are not real, right?
In the sense that they are highly useful for certain mathematical
calculations, imaginary numbers are no more "imaginary" than negative
numbers. It is unfortunate that we are stuck with a name that causes
some people to get distracted from thinking about these new numbers
as simply the next step in expanding our numbering system to be closed
under exponentiation.
<br/><br/>
To distinguish them from our newly added imaginary numbers, we go back
and lump together our previously defined rational and irrational numbers
and call those real numbers.
Having made the distinction between real and imaginary numbers, we note
that we can have imaginary rational numbers, such as <code>(1/2)<i>i</i></code>,
or imaginary irrational numbers, such as <code>2<sup>1/2</sup><i>i</i></code>,
as well as negative imaginary numbers such as <code>-4<i>i</i></code>
or negative irrational imaginary numbers such as
<code>-2<sup>1/2</sup><i>i</i></code>.
<br/><br/>
If we work through the mechanics of addition and subtraction with
imaginary numbers, we find that they work the same as real numbers but
with that extra <i>i</i> everywhere. To put it another way, imaginary
numbers are closed under addition and subtraction.
This is not the case with multiplication: imaginary numbers are not
closed under multiplication, since <code>i * i = -1</code>, which
is not an imaginary number.
Similarly, imaginary numbers are not closed under division, since
<code>i / i = 1</code>, which is not imaginary.
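Python has complex numbers built in, writing the imaginary unit as <code>1j</code>, so these closure observations are easy to try out (a quick illustration):

```python
a = 3j
b = 5j

# Imaginary numbers are closed under addition and subtraction...
assert a + b == 8j
assert a - b == -2j

# ...but multiplication and division escape to the real numbers.
assert a * b == -15        # 3i * 5i = 15 * i^2 = -15
assert a / b == 0.6        # the i's cancel
```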
<a name="complex-numbers"></a>
<h3>Complex Numbers</h3>
Since we defined imaginary numbers as being a different set of numbers from
real numbers, we can't convert from one to the other, so if we try to add
a real number <code>a</code> and an imaginary number <code>b<i>i</i></code>
together, we can't reduce that, so we just write it as
<code>a + b<i>i</i></code>.
We call this kind of number a complex number,
and since a or b could be zero, we note that all real numbers and all
imaginary numbers are complex numbers.
<br/><br/>
We are, in a sense, cheating when we use the + symbol to enumerate
the real and imaginary parts of a complex number, because, as just stated,
we can't actually do anything with that operator to reduce the number.
In that sense, we could have used any special character in that location.
But we choose to use the + sign because it turns out the rules we have
that deal with the + operator on real numbers also work with complex numbers:
commutative, associative, and distributive rules all work consistently
when applied to complex numbers when we use a + sign between the real and
imaginary parts.
<br/><br/>
As with square root, complex numbers come with multivalued functions,
some with an infinite number of solutions.
It's easy to get bad results if you're not careful, so it's important
to define a principal value for these functions and consistently use it.
<a name="complex-cartesian"></a>
<h4>Cartesian Coordinates</h4>
Since real and imaginary numbers can't be reduced to each other and are
thus orthogonal, we can represent them on the plane. We choose real to
be the X axis and imaginary to be the Y axis.
<br/><br/>
With this cartesian environment, we can represent complex numbers in
polar coordinates using the standard conversion:
<code>(r, θ) = (sqrt(x<sup>2</sup> + y<sup>2</sup>), arctan(y/x))</code>,
where x is the real part and y is the imaginary part
(and with the appropriate sign adjustments for quadrants
other than I).
Converting the other way, we have
<code>(x, y) = (r * cos(θ), r * sin(θ))</code>.
Sometimes we refer to a complex number as <code>z</code>,
where we can decompose it either by real and imaginary parts,
written as <code>x = Re(z), y = Im(z)</code>,
or by polar coordinates,
written as <code>r = |z|, θ = Arg(z)</code>,
where <code>|z|</code> is the
<a href="https://en.wikipedia.org/wiki/Magnitude_(mathematics)#Complex_numbers">magnitude</a> of <code>z</code>
and <code>Arg(z)</code> is the
<a href="https://en.wikipedia.org/wiki/Argument_(complex_analysis)">argument</a> of <code>z</code>.
More precisely, <code>arg(z)</code> is the argument of <code>z</code>,
and <code>Arg(z)</code> is the principal argument of <code>z</code>.
<code>arg(z)</code> is a multi-valued function equal to
<code>Arg(z) + n*2*π</code> for all integer values of <code>n</code>.
<br/><br/>
We can treat our complex numbers as vectors in the two dimensional
complex plane, so that adding two complex numbers can be displayed
in our plane as vector addition.
More interesting is multiplication, where we can see that when we
use polar coordinates we get this nice result:
<code>(r1,θ1) * (r2,θ2) = (r1*r2, θ1+θ2)</code>.
<pre><div class="code"
>(r1,θ1) * (r2,θ2) = (r1*cos(θ1) + r1*sin(θ1)<i>i</i>) * (r2*cos(θ2) + r2*sin(θ2)<i>i</i>)
= r1*(cos(θ1) + sin(θ1)<i>i</i>) * r2*(cos(θ2) + sin(θ2)<i>i</i>)
= r1*r2 * (cos(θ1) + sin(θ1)<i>i</i>) * (cos(θ2) + sin(θ2)<i>i</i>)
= r1*r2 * (cos(θ1)*cos(θ2) + cos(θ1)*sin(θ2)<i>i</i> + sin(θ1)*cos(θ2)<i>i</i> + sin(θ1)*sin(θ2)*<i>i</i><sup>2</sup>)
= r1*r2 * ((cos(θ1)*cos(θ2) - sin(θ1)*sin(θ2)) + (cos(θ1)*sin(θ2) + sin(θ1)*cos(θ2))<i>i</i>)
= r1*r2 * (cos(θ1+θ2) + sin(θ1+θ2)<i>i</i>)
= r1*r2*cos(θ1+θ2) + r1*r2*sin(θ1+θ2)<i>i</i>
= (r1*r2, θ1+θ2)
[L301] (r1,θ1) * (r2,θ2) = (r1*r2, θ1+θ2) The above summarized
</div></pre>
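[L301] can be checked with Python's <code>cmath</code> module, whose <code>rect</code> and <code>polar</code> functions convert between Cartesian and polar forms (illustrative sketch):

```python
import cmath

# Build two complex numbers from polar coordinates (r, θ).
w = cmath.rect(2.0, 0.5)
z = cmath.rect(3.0, 0.7)

# [L301]: multiplying complex numbers multiplies the magnitudes
# and adds the angles.
r, theta = cmath.polar(w * z)
assert abs(r - 2.0 * 3.0) < 1e-9
assert abs(theta - (0.5 + 0.7)) < 1e-9
```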
<a name="eulers-formula"></a>
<h4>Euler's Formula</h4>
Here is Euler's Formula:
<pre><div class="code"
>e<sup>iθ</sup> = cos(θ) + <i>i</i>*sin(θ)
</div></pre>
Feynman calls this
"one of the most remarkable, almost astounding, formulas in all of mathematics"
and refers to it as an "amazing jewel".
<br/><br/>
As described in an article at <a href="https://brilliant.org/wiki/eulers-formula/">Brilliant</a>,
Euler's Formula can be derived using the series expansions of sin(x), cos(x), and e<sup>x</sup>:
<pre><div class="code"
>cos(x) = 1 - x<sup>2</sup>/2! + x<sup>4</sup>/4! - ...
sin(x) = x - x<sup>3</sup>/3! + x<sup>5</sup>/5! - ...
e<sup>x</sup> = 1 + x + x<sup>2</sup>/2! + x<sup>3</sup>/3! + ...
</div></pre>
so:
<pre><div class="code"
>e<sup><i>i</i>*x</sup> = 1 + <i>i</i>*x + (<i>i</i>*x)<sup>2</sup>/2! + (<i>i</i>*x)<sup>3</sup>/3! + (<i>i</i>*x)<sup>4</sup>/4! + (<i>i</i>*x)<sup>5</sup>/5! + ...
= 1 + <i>i</i>*x - x<sup>2</sup>/2! - <i>i</i>*x<sup>3</sup>/3! + x<sup>4</sup>/4! + <i>i</i>*x<sup>5</sup>/5! - ...
= (1 - x<sup>2</sup>/2! + x<sup>4</sup>/4! - ...) + <i>i</i>*(x - x<sup>3</sup>/3! + x<sup>5</sup>/5! - ...)
= cos(x) + <i>i</i>*sin(x)
</div></pre>
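We can watch the series converge numerically. A Python sketch (illustrative) sums the first thirty terms of the series for e<sup><i>i</i>*x</sup> and compares the result against cos(x) + <i>i</i>*sin(x):

```python
import cmath
import math

x = 1.3  # an arbitrary real angle

# Partial sum of the series for e^(i*x).
s = sum((1j * x) ** n / math.factorial(n) for n in range(30))

# Euler's Formula: e^(i*x) = cos(x) + i*sin(x).
assert abs(s - (math.cos(x) + 1j * math.sin(x))) < 1e-12
assert abs(s - cmath.exp(1j * x)) < 1e-12
```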
In the section on Cartesian Coordinates above, we noted that any complex
number can be represented in polar coordinates using r and theta, but
we didn't have a good place to put the <i>i</i>.
With Euler's Formula, we can now unambiguously represent any complex number
<code>z = x + <i>i</i>*y</code> as
<code>|z| * e<sup><i>i</i>*arg(z)</sup></code>
where <code>|z|</code> is the magnitude of <code>z</code>
and <code>arg(z)</code> is the argument of <code>z</code>.
<a name="complex-exponentiation"></a>
<h4>Complex Exponentiation</h4>
Given <code>w = u + <i>i</i>*v</code> and <code>z = x + <i>i</i>*y</code>,
how do we calculate <code>w<sup>z</sup></code>?
<br/><br/>
We would like <code>w<sup>z</sup></code> to satisfy the
rules of exponentiation that we derived for real numbers, such as
<code>k<sup>a+b</sup> = k<sup>a</sup> * k<sup>b</sup></code>.
We will assume that we can apply this rule to complex exponentiation
and see how that works out.
<br/><br/>
From the discussion of Euler's Formula above we know that we can represent
any nonzero complex number <code>w</code> as <code>|w|*e<sup><i>i</i>*arg(w)</sup></code>,
and we can represent the real number <code>|w|</code> as <code>e<sup>ln(|w|)</sup></code>.
Let's see where that takes us.
<pre><div class="code"
>w<sup>z</sup> = (|w|*e<sup>(<i>i</i>*arg(w))</sup>)<sup>z</sup> Expand w
= (e<sup>ln(|w|)</sup>*e<sup><i>i</i>*arg(w)</sup>)<sup>z</sup> Use exp form for magnitude of w
= (e<sup>ln(|w|)+<i>i</i>*arg(w)</sup>)<sup>z</sup> e<sup>a</sup> * e<sup>b</sup> = e<sup>a+b</sup>
= e<sup>(ln(|w|)+<i>i</i>*arg(w))*z</sup> (e<sup>a</sup>)<sup>b</sup> = e<sup>a*b</sup>
= e<sup>(ln(|w|)+<i>i</i>*arg(w))*(x+<i>i</i>*y)</sup> Expand z to real and imaginary parts
= e<sup>ln(|w|)*x + ln(|w|)*<i>i</i>*y + <i>i</i>*arg(w)*x + <i>i</i>*arg(w)*<i>i</i>*y</sup> (a+b)*(c+d)=ac+ad+bc+bd
= e<sup>(ln(|w|)*x - arg(w)*y) + <i>i</i>*(ln(|w|)*y + arg(w)*x)</sup> i<sup>2</sup>=-1 and rearrange terms
[L310] w<sup>z</sup> = e<sup>(ln(|w|)*x - arg(w)*y) + <i>i</i>*(ln(|w|)*y + arg(w)*x)</sup> The above summarized
</div></pre>
This gives us a number of the form <code>r * e<sup><i>i</i>*θ</sup></code>
where <code>r = e<sup>(ln(|w|)*x - arg(w)*y)</sup></code>
and <code>θ = ln(|w|)*y + arg(w)*x</code>,
both of which we can evaluate.
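Python's built-in complex power uses the principal value of arg, so we can check [L310] against it directly (an illustrative sketch; <code>cmath.phase</code> returns Arg):

```python
import cmath
import math

w = 2 + 1j
z = 0.5 - 0.3j
x, y = z.real, z.imag
mag, arg = abs(w), cmath.phase(w)      # |w| and Arg(w)

# [L310]: w^z = e^((ln|w|*x - arg(w)*y) + i*(ln|w|*y + arg(w)*x))
rhs = cmath.exp((math.log(mag) * x - arg * y)
                + 1j * (math.log(mag) * y + arg * x))
assert abs(w ** z - rhs) < 1e-12
```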
<br/><br/>
Note that the above result includes <code>arg(w)</code> in two places,
once multiplied by <code>x</code> and once multiplied by <code>y</code>.
<code>arg</code> is a multi-valued function, and thus complex exponentiation is
also multi-valued for all exponents except zero.
<br/><br/>
If we are raising to a real power, then <code>y</code> is zero, so [L310] reduces to
<pre><div class="code"
>w<sup>x</sup> = e<sup>(ln(|w|)*x) + <i>i</i>*(arg(w)*x)</sup> [L310] with y=0
= |w|<sup>x</sup> * e<sup><i>i</i>*arg(w)*x</sup> For real x and all w
</div></pre>
This equation says the magnitude of the result is the magnitude of <code>w</code>
raised to the <code>x</code> power
and the <code>arg</code> of the result is the arg of <code>w</code> multiplied by <code>x</code>. If, for example,
we are squaring and thus <code>x</code> is 2, we square the magnitude of the number and
double the angle.
This result is consistent with our earlier observation that, when multiplying two
complex numbers, we can multiply the magnitudes and add the angles.
<br/><br/>
If <code>y</code> is zero and <code>x</code> is an integer, then <code>e<sup><i>i</i>*arg(w)*x</sup></code>
gives the same result for all of the multiple values of <code>arg(w)</code>, so the overall function
is single-valued. If <code>x</code> is not an integer, this is not the case.
For example, if <code>x</code> is 1/2, then we get two different answers by plugging in
<code>Arg(w)</code> and <code>Arg(w) + 2*π</code>. These are the two square roots of
a number: they always have the same magnitude and differ in angle by π.
<br/><br/>
If we consider the path that would be traced out for powers of some fixed <code>w</code> as we
change the real exponent, we can see that it generates a circle or a spiral.
Here is a nice visualization of <code>z<sup>x</sup></code> from
<a href="http://www.suitcaseofdreams.net/powers_complex.htm">Suitcase of Dreams</a>
for when <code>|z|>1</code>:
<br/>
<img src="http://www.suitcaseofdreams.net/Images/TF/spiral5.gif">
<br/><br/>
If we are raising to an imaginary power, then <code>x</code> is zero, so [L310] reduces to
<pre><div class="code"
>[L311] w<sup><i>i</i>*y</sup> = e<sup>(-arg(w)*y + <i>i</i>*ln(|w|)*y)</sup> [L310] with x=0
</div></pre>
Let's evaluate <i>i</i><sup><i>i</i></sup>.
We use [L311] with <code>w=<i>i</i></code> and <code>y=1</code>:
<pre><div class="code"
><i>i</i><sup><i>i</i></sup> = e<sup>(-arg(w) + <i>i</i>*ln(|w|))</sup> [L311] with w=<i>i</i> and y=1
 = e<sup>-π/2</sup> * e<sup><i>i</i> * 0</sup> Arg(<i>i</i>)=π/2, |w|=1, ln(1) is 0
= e<sup>-π/2</sup> Imaginary part drops out completely!
= 0.207879...
</div></pre>
Surprisingly, <code><i>i</i><sup><i>i</i></sup></code> is a real number, a little larger than one fifth.
At least, that's one answer. We can use any of the answers <code>e<sup>-π/2 + k*2π</sup></code>
for any integer <code>k</code>.
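Python agrees, using the principal value (a quick illustration):

```python
import math

# The principal value of i^i is the real number e^(-π/2).
val = 1j ** 1j
assert abs(val.imag) < 1e-12                          # no imaginary part
assert abs(val.real - math.exp(-math.pi / 2)) < 1e-12
print(round(val.real, 5))  # 0.20788
```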
<br/><br/>
We see that we can represent any nonzero complex number in the form
<code>e<sup><i>i</i>*z</sup></code>, given <code>z = x + <i>i</i>*y</code>.
<pre><div class="code"
>e<sup><i>i</i>*z</sup> = e<sup><i>i</i>*(x+<i>i</i>*y)</sup>
= e<sup><i>i</i>*x + <i>i</i>*<i>i</i>*y</sup>
= e<sup>-y + <i>i</i>*x</sup>
= e<sup>-y</sup> * e<sup><i>i</i>*x</sup>
</div></pre>
One interesting thing we can do now is to extend Euler's Formula
from real theta to complex theta, which allows us to define
<code>sin</code> and <code>cos</code> for the entire complex plane:
<pre><div class="code"
>e<sup><i>i</i>*z</sup> = cos(z) + <i>i</i>*sin(z)
e<sup><i>-i</i>*z</sup> = cos(z) - <i>i</i>*sin(z) cos is an even function, sin is an odd function
e<sup><i>i</i>*z</sup> + e<sup><i>-i</i>*z</sup> = 2*cos(z)
cos(z) = 1/2 (e<sup><i>i</i>*z</sup> + e<sup><i>-i</i>*z</sup>)
e<sup><i>i</i>*z</sup> - e<sup><i>-i</i>*z</sup> = 2*<i>i</i>*sin(z)
sin(z) = 1/(2*<i>i</i>) (e<sup><i>i</i>*z</sup> - e<sup><i>-i</i>*z</sup>)
</div></pre>
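These two formulas give working definitions of cos and sin for any complex argument. A Python sketch (illustrative), checked against the built-ins in <code>cmath</code>:

```python
import cmath

# cos and sin defined from the exponential formulas above.
def ccos(z):
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2

def csin(z):
    return (cmath.exp(1j * z) - cmath.exp(-1j * z)) / (2 * 1j)

z = 0.8 + 0.4j  # an arbitrary complex argument
assert abs(ccos(z) - cmath.cos(z)) < 1e-12
assert abs(csin(z) - cmath.sin(z)) < 1e-12
```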
<a name="eulers-identity"></a>
<h4>Euler's Identity</h4>
We evaluate Euler's Formula with theta set to pi:
<pre><div class="code"
>e<sup><i>i</i>*π</sup> = cos(π) + <i>i</i>*sin(π)
= -1 + 0
= -1
</div></pre>
We add one to both sides to get the typical presentation, <code>e<sup><i>i</i>*π</sup> + 1 = 0</code>.
<br/><br/>
Not only does this identity tie together five of the key values of algebra
(e, π, <i>i</i>, 1, and 0), it does it with one each of the key operations
we derived above (equality, addition, multiplication, exponentiation).
That's a pretty sweet equation.
<a name="final-closure"></a>
<h3>Final Closure</h3>
Throughout this presentation, we have expanded our system of numbers as we
defined new operators and discovered our system of numbers was not closed
under the new operators. But with complex numbers, we have reached a point
where we don't need to define any new number types. Complex numbers are
sufficient to solve all algebraic equations.
This is one of the interpretations of the
<a href="https://en.wikipedia.org/wiki/Fundamental_theorem_of_algebra">Fundamental Theorem of Algebra</a>,
but the proofs are pretty difficult, so I'm not going to try to prove it here.
<h2>You Are Not Alone</h2>
Imagine that you live alone.
I don't mean living by yourself in an apartment or house, I mean
imagine you are the only person in the world.
Furthermore, imagine that no other people have ever touched the world, so that
you are living in a wilderness without any of the artifacts of humanity.
Kind of like Brian in <a href="https://en.wikipedia.org/wiki/Hatchet_(novel)">Hatchet</a>,
but without the hatchet.
And without any clothes or any other manufactured items.
<br/><br/>
Think about what you have to do to survive:
<ul>
<li>Gather or hunt your own food and prepare it
<li>Protect yourself from predators and parasites
<li>Create protection from the environment, such as clothes and a shelter
<li>Care for your own injuries and illnesses
</ul>
In that situation, how much could you accomplish? What could you create?
What wealth could you accumulate?
<h2>The Only Human</h2>
But let's go one step further. Think about all of the knowledge you have
that you learned from someone else rather than from direct experience.
Now imagine that you did not know any of that. You only know the things
that you have learned through your own interactions with the world.
We'll be generous and say you also know things that you might reasonably
have discovered on your own.
<br/><br/>
Now how much could you accomplish?
<br/><br/>
Remember, you can only work with available natural materials such as wood
and stone. You can't use metal, ceramic, plastic, rubber, or cloth unless
and until you can make it yourself. Remember also, we are assuming you don't
have any knowledge except that which you have learned through direct experience.
You are unlikely to even know that any of those materials exist or are
possible, let alone know how to create them.
<br/><br/>
Would you be able to survive? Would you have any time left over to start the
long process of discovering, learning about, and making any of the unavailable materials just mentioned?
Compared to what you own today, and the accomplishments of your real life so far, how much could you have
collected or accomplished in our imaginary situation?
<h2>Knowledge is Power</h2>
Let's ease up on the restrictions a bit and allow you to retain all of the
knowledge you have. In fact, let's take it one step further and make available
to you all of the collected knowledge and experience of humankind. Basically
let's say you have internet access. Now you can look up anything you want,
even if you have never thought about it before.
You can read about and watch videos on how to make a bow and arrow, or how to knap flint
to make an arrowhead, or how to make steel, or how a computer works.
<br/><br/>
Of course, reading about how to do something and actually being able to do it
are not the same thing. If you want to make an arrowhead, first you'll have to
find and identify some flint, then you'll have to practice, practice, practice
knapping before you get a decent arrowhead.
You should eventually be able to make your flint arrowhead and an
arrow to attach it to, and with a lot more work you'll be able to make a functional bow.
Your internet connection will provide you with many details that would take
much longer to get right if you had to figure them out yourself, such as
what kind of wood to use,
how to fletch and nock the arrow, how to make string, how to make glue,
and how to string your bow.
<br/><br/>
The knowledge you can get from your internet connection will help you much
more quickly learn how to identify edible and poisonous plants,
skin and cure animal hides,
make fire
(it's harder than you might think; rubbing two sticks
together is not an effective approach),
make and fire ceramic
(clay), and maybe, if you are lucky enough to find some copper ore (which
your internet knowledge can help you identify), create some metal tools.
<br/><br/>
There are many things you will not be able to create by yourself,
even with a long and healthy life and with access to all that information.
As examples, producing integrated circuits and stainless steel require far more
prerequisite infrastructure than you could create in one lifetime.
But having access to the distilled knowledge of millions of lifetimes of
exploration and experimentation will allow you to create much more than
you could if, as in our initial supposition, you had to learn everything
yourself.
<br/><br/>
With all that knowledge available to you, how much could you create and
accomplish in a world without other people and their creations
as compared to your current life?
<br/><br/>
We've seen how much more you would likely be able to create if you had
access to the knowledge of humankind via the internet. In the real world as well,
we use that knowledge to help us accomplish much more than we could
without it. We don't have to rely solely on what we have directly learned
from our own experience. We benefit from the experiences and knowledge
collected by many other people.
<h2>The Wealth of the World</h2>
What if, in addition to the knowledge humankind has collected,
you also had access to the physical things humankind has created?
Let's now assume that the world exists just as it does today, with all of
its roads, factories, and other infrastructure, but with no other people.
What could you accomplish?
<br/><br/>
The first question is, how long will all that infrastructure continue to
operate without any people? How long will you continue to have
electricity, water, communications, or the internet?
If you were to apply your time and energy towards keeping those systems up,
how much difference would it make?
Probably not a lot. Those systems are too big, there are too many, and
they require too much experience for your efforts as one person to make
much difference.
Without the continuing work of a very large number of people, all of
these systems, which we rely on in the ordinary course of our lives,
would likely fail relatively quickly.
<br/><br/>
If all those systems fail, what could you accomplish?
You could perhaps figure out how to generate some electricity, but keeping
that system running would certainly take some of your time.
And you would still have to spend some time collecting and preparing food.
For a while you could live off canned and preserved food that
you could raid from a grocery store, but eventually you'd have to
start gathering or hunting again, and that would cut into the time you
have available for doing other work.
<br/><br/>
But for our imaginary scenario, let's say all of those systems continued to
work. Let's even take it a step further, and stipulate that all of the factories
and supply chains continue to operate.
We'll even say you can order stuff online.
So basically, everything works as it does in the real world, except that
you don't have the ability to communicate or collaborate with any people.
Now we are essentially asking, how much can you create or accomplish in
the real world if you do not collaborate with anyone else or specifically
ask anyone else to do some custom work for you?
<h2>People Power</h2>
This is not that much different from the way many people operate, and
some people can create amazing things. One person can create a wonderful
piece of art, or a fun computer program, or an elegant piece of furniture.
But most of the things in the world, and all of the most complex and
sophisticated things, are made by groups of people,
sometimes very large groups of people,
collaborating towards a common goal.
<br/><br/>
I hope that this exercise has helped you see how much all of us rely on the
work of other people to accomplish what we do.
In all of our lives, there are innumerable people who have helped us get
to where we are and whose labors continue to contribute to our success.
There is no person walking this earth who has not been helped by someone
else at some point.
As babies, we would have died if there were no one feeding us and caring for us.
We have all learned things from teachers, friends, strangers, and, through media,
from people we have never met.
We have all inherited wealth from our ancestors, whether it is a personal
mansion or the use of our public streets, bridges, and other infrastructure.
We use knowledge from around the world and across time.
We benefit from the factories and other capital created by our ancestors that
provide us with better and less expensive goods.
We rely on the labor of others to provide us with food, clean water, electricity,
and many other things, so that we can focus on our own specialty.
For large projects, we collaborate with others to get more done, and even for
small projects we may solicit some piece of custom work from someone else.
In all of these ways, the work of other people, both past and present,
makes it possible for us to own more, do more, and produce more than
we could without them.
<br/><br/>
The next time you think "I did it all myself", please remember to be grateful
for all the people who helped you do it:
all the people who kept you alive and cared for you as a baby or beyond,
all the people who gained the knowledge of the world,
all the people who helped you learn some of it,
all the people who built the world around you,
all the people who made things that you now have,
and all the people who are still providing goods and services to you.
You are not alone.
Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com0
tag:blogger.com,1999:blog-7045524330253482541.post-84168984892650632902021-05-23T15:32:00.000-07:002021-05-23T15:32:31.670-07:00
Recreating Butter Streusel
Long ago, while I was living for one year in Heidelberg, Germany,
I frequented a bakery on the Hauptstrasse that sold a baked good
called Butter Streusel. It was halfway between a cookie and a
coffee cake: not quite an inch tall, about half of that being the
streusel topping, with a base that was denser and crisper than cake.
I never found it at any other bakery, and when I returned to Heidelberg
some years later, that bakery no longer sold it either.
<br/><br/>
A lot of people baked bread during the pandemic. I decided it was
a good time to make some Butter Streusel. I tried a couple of
recipes from the web, but they didn't quite match my memory.
I have to admit, however, that after so many years, it's possible
that what I remember never existed. As Mark Twain may or may not
have said,
"The older I get, the more clearly I remember things that never happened."
<br/><br/>
I started with a
<a href="https://www.chefkoch.de/rezepte/1596591266914632/Streuselkuchen.html">yeast-based recipe</a>
with a photo that looked somewhat like what I remembered,
but I wanted something more dense, and I decided I didn't like the
yeast taste in something that should taste more like a cake or cookie
than bread. I looked through a bunch of recipes for streuselkuchen,
shortbread, short cake, pound cake, vanilla cake, and biscuits,
and started experimenting.
<br/><br/>
One of my goals was that it should be easy to make, so some of my
experimentation was not only about what ingredients to use, but how
to mix them together.
On trial #13 I had something I liked. A friend said it was "amazing."
Here it is.
<h2>Butter Streusel</h2>
Preheat oven to 375F.
<br/>
Get a 9x12x2in baking pan (a shorter pan or even a cookie sheet should work)
and parchment paper, but don't put the paper in the pan yet.
<h3>Base</h3>
Prepare the base dough:
<ul>
<li>140g softened butter (salted) (10 tbsp)
<li>60g sugar
<li>2 tsp vanilla
</ul>
<ul>
<li>200g flour
<li>1 tsp baking powder (double acting)
<li>1/4 tsp salt
</ul>
<ul>
<li>100ml milk
</ul>
Blend together butter, sugar, and vanilla in a bowl large enough for all the base dough ingredients.
<br/>
Mix flour, baking powder, and salt (you can use a big bowl for this and reuse
it for the topping), then mix into butter mixture.
<br/>
Add milk and mix to smooth consistency, knead for about 2 minutes, then let rest for 10 minutes.
<br/>
(I used an electric mixer for all of the above steps, except for pre-mixing
the dry ingredients and for the last two minutes of kneading.)
<h3>Topping</h3>
While the base dough is resting, prepare the streusel topping:
<ul>
<li>160g flour
<li>120g sugar
<li>120g butter, melted (melted makes the topping lumpier, which is good)
</ul>
Mix all ingredients together, leaving some lumps.
<h3>Assemble and Bake</h3>
Get a piece of parchment paper a few inches larger than the pan. Lay the
parchment paper on the counter, and spread the base dough on it using
a couple of tablespoons until it is the size of the bottom of the pan.
Place the parchment paper with dough into the pan and push down the
sides and corners to make it flat. Spread the streusel over the top.
<br/>
Bake at 375F for 35 minutes.
<br/>
Remove from oven. Lift parchment paper with contents and place on a cutting board to cool.
Cool about 20 minutes, then cut into 2x3in bars.
<br/><br/>
Yield: 18 bars about 3/4in thick.
Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com0
tag:blogger.com,1999:blog-7045524330253482541.post-20932022002026234682020-09-04T17:42:00.000-07:002020-09-04T17:42:33.320-07:00
Transferring MiniDV Tapes to Linux
Downloading miniDV tapes from my Sony DCR-TRV22
camcorder to my Fedora 32 Linux system with a Thunderbolt 3 port was
easy, using <code>dvgrab</code> and a couple of Apple converters to go from FireWire to
Thunderbolt 3.
<br/><br/>
Many years ago I transferred all my VHS home videos to disk through the somewhat painful process
of first recording them onto DVDs using a DVD recorder, then ripping the DVDs on my computer.
My next video transfer project was to transfer my more recent home videos from miniDV to disk.
There was always other work to do, and transferring the tapes was never a critical task, so it
was easy to put off. I was thinking it would probably be time consuming but not too difficult,
since my Linux computer had an IEEE 1394 (FireWire) port, so I wasn't too worried about it.
<br/><br/>
When the lockdown started earlier this year, that presented a good opportunity for me to start
my tape transfer project. I grabbed my miniDV camcorder and my box of tapes, then went to get a
cable to connect the camcorder to my computer. It was only then that I remembered that I upgraded
to a new computer at the beginning of this year and gave away the old computer. The old computer,
from 2010, had the IEEE 1394 port, but the new one did not. Oops! I had waited a bit too long
for this supposedly easy job.
<br/><br/>
My new computer has a ton of ports of various flavors, so it seemed possible that it might still work,
if I could get the right cables and converters. After some digging, it looked like it should be
possible to use the USB-C Thunderbolt port on my new computer. But I couldn't find much information about
whether it would work when run through converters on a current version of Linux. The required
converters are pretty expensive, but I decided to take a chance and buy them.
<br/><br/>
My Sony DCR-TRV22 camcorder has a 4-pin FireWire 400 jack, and I had a FireWire 400 to 800 cable.
I purchased an
<a href="https://www.amazon.com/gp/product/B00SQ2CJUS/">Apple Thunderbolt to FireWire Adapter</a>
for $29 and an
<a href="https://www.amazon.com/gp/product/B01MQ26QIY">Apple Thunderbolt 3 (USB-C) to Thunderbolt 2 Adapter</a>
for $49, and for good measure I also purchased a
<a href="https://www.amazon.com/gp/product/B003L4P872">FireWire 400 to 800 Adapter</a>
for $10 (in case I had to use a different cable), which I ended up not using.
I connected the cable to the camcorder, connected the other end of the cable to the
FireWire to Thunderbolt adapter, plugged the FireWire to Thunderbolt adapter into the
Thunderbolt 2 to Thunderbolt 3 adapter, and plugged the Thunderbolt 2 to Thunderbolt 3 adapter
into the USB-C Thunderbolt 3 port on my computer. Then I ran <code>dvgrab</code>, which I had installed earlier.
And... it did not see the camera. Rats.
<pre>
# lsmod | grep -i fire
(nothing)
# lspci | grep -i fire
(nothing)
</pre>
Fortunately, it turned out to be an easy fix. I was able to determine that the Thunderbolt to FireWire
adapter was visible by looking in <code>/sys/bus/thunderbolt</code>:
<pre>
# cat /sys/bus/thunderbolt/devices/0-3/device_name
Thunderbolt to FireWire Adapter
</pre>
I found the solution in an <a href="https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1726299/comments/6">Ubuntu bug report</a>:
the Thunderbolt device had to be authorized.
(Note that your device number might be different.)
<pre>
# cat /sys/bus/thunderbolt/devices/0-3/authorized
0
# echo 1 > /sys/bus/thunderbolt/devices/0-3/authorized
# lspci | grep -i fire
40:00.0 FireWire (IEEE 1394): LSI Corporation FW643 [TrueFire] PCIe 1394b Controller (rev 08)
# lsmod | grep -i fire
firewire_ohci 45056 0
firewire_core 81920 1 firewire_ohci
crc_itu_t 16384 1 firewire_core
</pre>
At this point I was able to insert a tape into the camcorder and test it:
<pre>
$ dvgrab foo-
</pre>
This created the file <code>foo-001.dv</code>.
By installing the <code>mediainfo</code> program, I was able to see the datestamp of the recording:
<pre>
$ mediainfo foo-001.dv | grep date
Recorded date : 2015-12-25 10:32:28.000
</pre>
The actual command I used to download the tape is:
<pre>
$ dvgrab --autosplit --timestamp --size 0 --rewind --showstatus dv-
</pre>
At this point I could just put a tape in the camcorder, rewind it, run the above <code>dvgrab</code> command,
come back an hour or so later when it was done, then put in the next tape and repeat.
It took a long time to get through all my miniDV tapes, but not much work.
Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com1
tag:blogger.com,1999:blog-7045524330253482541.post-53044745525710930172019-11-28T06:59:00.001-08:002019-11-28T07:34:04.359-08:00
Go Composition vs Inheritance
Go does not support inheritance, but sometimes using embedded structs can look
a little like inheritance. I explore that feature to see how it differs.
<h2>Contents</h2>
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#baseclass">Base class</a></li>
<li><a href="#subclass">Subclass</a></li>
<li><a href="#main">Main and test</a></li>
<li><a href="#overriding">Overriding</a></li>
<li><a href="#downcall">Downcall</a></li>
<li><a href="#promotion">Method promotion</a></li>
<li><a href="#solution">Solution</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
<a name="intro">
<h2>Introduction</h2>
</a>
In lieu of inheritance, the Go language encourages composition
by allowing one struct to be <a href="https://golang.org/doc/effective_go.html#embedding">embedded</a> in another struct in a way that allows
calling methods defined on the embedded struct as if they are defined on the
containing struct.
<br/><br/>
<b>Note:</b> In this post I occasionally use object-oriented terminology such as base class,
subclass, and override. Please remember that Go does <i>not</i> support these concepts; I am
using those terms here to show how thinking that way with Go can lead to problems.
<br/><br/>
For the examples that follow, I assume we are building a graphical editor that
allows manipulating visual objects on the screen. We want to be able to draw
those objects, and we want to be able to transform them with operations such
as rotate, so we define an interface with those methods:
<br/><br/>
<b>Note:</b> For convenience, the final collected code used in this post is available
on <a href="https://play.golang.org/p/JA3zCjyeCCn">play.golang.org</a>.
<pre name="hlcode" class="go"
><div class="code">type shape interface {
    draw()
    rotate(radians float64)
    // translate and scale omitted for simplicity
}
</div></pre>
We write a function that will draw all our shapes:
<pre name="hlcode" class="go"
><div class="code">func drawShapes(shapes []shape) {
    for _, s := range shapes {
        s.draw()
    }
}
</div></pre>
<a name="baseclass">
<h2>Base class</h2>
</a>
We define our "base class", called <code>polygon</code>, where we implement a <code>draw</code> method
that we can invoke from our "subclasses":
<pre name="hlcode" class="go"
><div class="code">type polygon struct {
    sides int
    angle float64
}
func (p *polygon) draw() {
    fmt.Printf("draw polygon with sides=%d\n", p.sides)
    vertexDelta := 2 * math.Pi / float64(p.sides)
    vertexAngle := p.angle
    x0 := math.Cos(vertexAngle)
    y0 := math.Sin(vertexAngle)
    for i := 0; i < p.sides; i++ {
        // Draw one side within the unit circle, offset by p.angle.
        vertexAngle += vertexDelta
        x1 := math.Cos(vertexAngle)
        y1 := math.Sin(vertexAngle)
        fmt.Printf("draw from (%.3f, %.3f) to (%.3f, %.3f)\n", x0, y0, x1, y1)
        x0 = x1
        y0 = y1
    }
}
func (p *polygon) rotate(radians float64) {
    p.angle += radians
}
</div></pre>
<a name="subclass">
<h2>Subclass</h2>
</a>
We define a couple of "subclasses", <code>triangle</code> and <code>square</code>,
that "extend" our "base class",
along with functions to create instances of those types:
<pre name="hlcode" class="go"
><div class="code">type triangle struct {
    polygon
}
type square struct {
    polygon
}
func createTriangle() *triangle {
    return &triangle{
        polygon{
            sides: 3,
        },
    }
}
func createSquare() *square {
    return &square{
        polygon{
            sides: 4,
        },
    }
}
</div></pre>
<a name="main">
<h2>Main and test</h2>
</a>
Finally, we write a couple of test functions to create a list of shapes and
draw them, and a one-line <code>main</code> function that calls our test function.
<pre name="hlcode" class="go"
><div class="code">package main
import (
"fmt"
"math"
)
func createTestShapes() []shape {
shapes := make([]shape, 0)
shapes = append(shapes, createTriangle())
shapes = append(shapes, createSquare())
return shapes
}
func testDrawShapes() {
drawShapes(createTestShapes())
}
func main() { testDrawShapes() }
</div></pre>
When we run this program, it produces the expected output:
<pre
><div class="code">draw polygon with sides=3
draw from (1.000, 0.000) to (-0.500, 0.866)
draw from (-0.500, 0.866) to (-0.500, -0.866)
draw from (-0.500, -0.866) to (1.000, -0.000)
draw polygon with sides=4
draw from (1.000, 0.000) to (0.000, 1.000)
draw from (0.000, 1.000) to (-1.000, 0.000)
draw from (-1.000, 0.000) to (-0.000, -1.000)
draw from (-0.000, -1.000) to (1.000, -0.000)
</div></pre>
Note that we have not defined any methods on the
<code>triangle</code> and <code>square</code> types,
yet the compiler accepts them as implementing <code>shape</code>, as seen by the fact that
we can store them in a slice of <code>shape</code> and we can invoke <code>draw</code> on them.
Because we embedded <code>polygon</code> in <code>triangle</code> and <code>square</code>, without giving them field
names, Go has promoted all of the methods in <code>polygon</code> into the namespaces of
<code>triangle</code> and <code>square</code>, allowing <code>draw</code> to be called directly on an
instance of type <code>triangle</code> or <code>square</code>.
<br/><br/>
So far, relying on an object-oriented mental model has not caused us problems.
Let's keep going and see when it does.
<a name="overriding">
<h2>Overriding</h2>
</a>
We add a <code>typeName</code> method to our <code>shape</code> interface
and our "base class", <code>polygon</code>,
and we "override" that method in our "subclasses", <code>triangle</code> and <code>square</code>:
<pre name="hlcode" class="go"
><div class="code">type shape interface {
    draw()
    rotate(radians float64)
    // translate and scale omitted for simplicity
    typeName() string
}
func (p *polygon) typeName() string {
    return "polygon"
}
func (p *triangle) typeName() string {
    return "triangle"
}
func (p *square) typeName() string {
    return "square"
}
</div></pre>
We can test our <code>typeName</code> methods by pointing our <code>main</code> to
a different test function:
<pre name="hlcode" class="go"
><div class="code">func printShapeNames(shapes []shape) {
    for _, s := range shapes {
        fmt.Println(s.typeName())
    }
}
func testShapeNames() {
    printShapeNames(createTestShapes())
}
func main() { testShapeNames() }
</div></pre>
This outputs:
<pre
><div class="code">triangle
square
</div></pre>
No problems yet.
<a name="downcall">
<h2>Downcall</h2>
</a>
Let's add a method to our interface and "base class" that invokes the method that we
are overriding, and a new test function to call it.
This is sometimes referred to as a downcall, in that a superclass calls into the
overriding method of a subclass that is below it in the class hierarchy.
<pre name="hlcode" class="go"
><div class="code">type shape interface {
    draw()
    rotate(radians float64)
    // translate and scale omitted for simplicity
    typeName() string
    nameAndSides() string
}
func (p *polygon) nameAndSides() string {
    return fmt.Sprintf("%s (%d)", p.typeName(), p.sides)
}
func printShapeNamesAndSides(shapes []shape) {
    for _, s := range shapes {
        fmt.Println(s.nameAndSides())
    }
}
func testShapeNamesAndSides() {
    printShapeNamesAndSides(createTestShapes())
}
func main() { testShapeNamesAndSides() }
</div></pre>
This outputs:
<pre
><div class="code">polygon (3)
polygon (4)
</div></pre>
Well, that doesn't look right.
We wanted it to print triangle and square instead of polygon both times.
Thinking of this as inheritance has led us astray.
<a name="promotion">
<h2>Method promotion</h2>
</a>
So, what happened here?
Why did <code>printShapeNames</code> work, but <code>printShapeNamesAndSides</code> did not?
Let's dig into that.
<br/><br/>
The return value of <code>createTestShapes</code> is <code>[]shape</code>, which is a slice of objects that implement
the <code>shape</code> interface. Since the <code>triangle</code> and <code>square</code> types implement that interface, we can store
instances of those types
in that slice. But how is it that those types implement that interface when we didn't write those
methods for those types?
The answer is method promotion.
<br/><br/>
When we embed one type inside another without giving the internal type a field name,
Go automatically promotes all unambiguous names from the embedded type to the containing type.
Effectively, for each method in the embedded type whose name does not conflict with a method
in the containing type or in any other embedded type within that container,
Go creates a method on the containing type that turns around and calls that method on
the embedded type. For example, when we embed <code>polygon</code> in <code>triangle</code>,
the compiler effectively creates this code:
<pre name="hlcode" class="go"
><div class="code">func (t *triangle) typeName() string {
    return t.polygon.typeName()
}
</div></pre>
If the embedded type satisfies an interface, and there are no ambiguous
method names, this promotion of all the methods of the embedded type
makes the containing type also satisfy that interface.
Let's explore this method promotion behavior.
We create another struct type called <code>thing</code> that
has a <code>typeName</code> method,
embed it along with our previously defined <code>polygon</code>,
which also has a <code>typeName</code> method, in a new type <code>polygonThing</code>,
then try to assign an instance of that to a variable of type <code>shape</code>.
<pre name="hlcode" class="go"
><div class="code">type thing struct{}
func (t *thing) typeName() string { return "thing" }
type polygonThing struct {
    polygon
    thing
}
func testPolygonThing() {
    p := &polygonThing{}
    p.draw()
    fmt.Println(p.typeName())
    var s shape = p
    fmt.Println(s.typeName())
}
func main() { testPolygonThing() }
</div></pre>
When we compile this, we get these errors:
<pre
><div class="code">./comp.go:130:16: ambiguous selector p.typeName
./comp.go:131:7: polygonThing.typeName is ambiguous
./comp.go:131:7: cannot use p (type *polygonThing) as type shape in assignment:
*polygonThing does not implement shape (missing typeName method)
</div></pre>
where line 131 is the line where we are assigning to <code>s</code>.
<br/><br/>
From this error we can see that Go did not promote the <code>typeName</code> method from either
of the embedded structs into <code>polygonThing</code>. But there was no error message about the
call to <code>draw</code>, so it did promote that method from <code>polygon</code>, since it is
not ambiguous.
<br/><br/>
If we comment out the embedded <code>thing</code> line from the definition of <code>polygonThing</code>,
the code compiles.
If, instead, we comment out the embedded <code>polygon</code> line, we get different errors:
<pre
><div class="code">./comp.go:129:4: p.draw undefined (type *polygonThing has no field or method draw)
./comp.go:131:7: cannot use p (type *polygonThing) as type shape in assignment:
*polygonThing does not implement shape (missing draw method)
</div></pre>
If we want to keep both embedded structs in our composite struct,
there are a couple of ways we can resolve the ambiguity of <code>typeName</code>
appearing in both embedded structs.
The simplest is to assign a name to one
of the embedded structs, converting it to a regular field. Instead of writing
<code>thing</code> in the definition of <code>polygonThing</code>, we can write <code>t thing</code>.
Go then does not attempt to promote the methods from <code>thing</code> into <code>polygonThing</code>,
and the promotion of <code>typeName</code> from <code>polygon</code> into
<code>polygonThing</code> is no longer ambiguous, so it succeeds.
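As a minimal, self-contained sketch of that first fix (using simplified stand-in types rather than the full example code from this post):

```go
package main

import "fmt"

type polygon struct{ sides int }

func (p *polygon) typeName() string { return "polygon" }

type thing struct{}

func (t *thing) typeName() string { return "thing" }

// Giving thing a field name (t) stops Go from promoting its methods,
// so typeName is promoted from polygon without ambiguity.
type polygonThing struct {
	polygon
	t thing
}

func main() {
	p := &polygonThing{}
	fmt.Println(p.typeName())   // "polygon": promoted from the embedded polygon
	fmt.Println(p.t.typeName()) // "thing": reached explicitly via the named field
}
```

The tradeoff of this approach is that <code>thing</code>'s methods are no longer part of <code>polygonThing</code>'s method set at all; callers must go through the named field.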
<br/><br/>
Another possibility is to resolve the ambiguity by defining a <code>typeName</code> method
directly on <code>polygonThing</code>. In this case, Go does not attempt to promote <code>typeName</code>
from either of the embedded structs. We can call a method in an embedded struct
by referring to that embedded struct as if it were a named field.
<pre name="hlcode" class="go"
><div class="code">func (t *polygonThing) typeName() string {
    return t.polygon.typeName() + "Thing"
}
</div></pre>
With this definition, the program compiles and runs, outputting
<pre
><div class="code">draw polygon with sides=0
polygonThing
polygonThing
</div></pre>
<a name="solution">
<h2>Solution</h2>
</a>
Now that we understand how embedded structs work in Go, let's go back and reconsider what happened
with our <code>printShapeNamesAndSides</code> function.
<br/><br/>
Assume one of the elements in our slice of <code>shape</code> is an instance of <code>triangle</code>.
We call <code>nameAndSides</code> with that <code>triangle</code> as the receiver. Since we did not define <code>nameAndSides</code>
on <code>triangle</code>, that calls the promoted version of that method. That promoted method turns around and calls
<code>nameAndSides</code> on the embedded <code>polygon</code>, passing the embedded <code>polygon</code> as the receiver.
In <code>polygon.nameAndSides</code>, it calls <code>p.typeName</code>, but <code>p</code> here is the receiver of the
<code>nameAndSides</code> method, which is the <code>polygon</code>, not the <code>triangle</code>.
So the call from <code>nameAndSides</code> to <code>typeName</code>
invokes the <code>typeName</code> method on <code>polygon</code> rather than on <code>triangle</code>.
<br/><br/>
With this understanding, let's update our code to make "overriding" work.
The difference between the behavior we are seeing and what we would expect from a system
with inheritance and overriding
is that here our "base class" does not, by default, make calls to methods of the "subclass".
It can't because the method in the "base class" has no reference to the type of the containing object.
In order to implement a call to a method in an instance of a "subclass" from <code>polygon.nameAndSides</code>, we need a reference to
that instance, such as a <code>triangle</code>.
We will do this by explicitly passing our <code>shape</code> as an argument, then calling the <code>typeName</code> method on
that <code>shape</code> rather than on the receiver.
By calling a method on a passed-in argument rather than the receiver,
it is clear, when looking at that method in the "base class", that the call may
be going to a different type of object than <code>polygon</code>.
<pre name="hlcode" class="go"
><div class="code">type shape interface {
    ...
    nameAndSides(s shape) string
}
func (p *polygon) nameAndSides(s shape) string {
    return fmt.Sprintf("%s (%d)", s.typeName(), p.sides)
}
func printShapeNamesAndSides(shapes []shape) {
    for _, s := range shapes {
        fmt.Println(s.nameAndSides(s))
    }
}
</div></pre>
With these changes, we get the expected output:
<pre
><div class="code">triangle (3)
square (4)
</div></pre>
<a name="conclusion">
<h2>Conclusion</h2>
</a>
The way Go promotes methods of embedded structs gives it some of the characteristics of
inheritance as defined in object-oriented programming. In particular, it allows methods to
be automatically promoted to the containing struct, and thus allows the containing struct to
automatically satisfy any interface that the embedded struct satisfies. One key difference is that, when you "override" one of those
promoted methods in the containing struct, the code in the embedded struct does not automatically call the overriding
method in the containing struct, as happens in some object-oriented languages such as Java.
<br/><br/>
You may have heard of the <a href="https://en.wikipedia.org/wiki/Fragile_base_class">fragile base class</a> problem.
A related issue that can arise when there are downcalls from a superclass to an overridden method in a subclass,
similar to the example here where I "overrode" the <code>typeName</code> method,
might be termed the fragile subclass problem.
If you are interested in digging into that, you can read
<a href="http://www.cs.ucf.edu/~leavens/tech-reports/ISU/TR00-05/TR.pdf">Safely Creating Correct Subclasses without Seeing Superclass Code</a>,
a paper from OOPSLA 2000 that examines that issue. See section 4.
The designers of Go chose not to implement inheritance, but instead to
<a href="https://en.wikipedia.org/wiki/Composition_over_inheritance">favor composition</a>.
Although some Go constructs can look a little like inheritance, it's better to
start thinking about designing in Go using composition rather than trying to bend
Go to do something like inheritance.
Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com0
tag:blogger.com,1999:blog-7045524330253482541.post-61402247762103693592019-06-11T21:34:00.000-07:002019-06-11T21:34:59.312-07:00
A Future Telescope
This post describes an idea for a telescope
that can see where heavenly objects will be in the future.
This may sound crazy, like something out of a science-fiction story,
but I believe it is based on solid theory. Unless, of course, I have
misinterpreted something. Read on if you enjoy considering surprising
extrapolations of theory.
<h2>Contents</h2>
<ul>
<li><a href="#collective_electrodynamics">Collective Electrodynamics</a></li>
<li><a href="#interpreting">Interpreting the Theory</a></li>
<li><a href="#big_idea">The Big Idea</a></li>
<li><a href="#details">The Details</a></li>
<li><a href="#invitation">An Invitation</a></li>
</ul>
<a name="collective_electrodynamics"></a>
<h2>Collective Electrodynamics</h2>
Carver Mead's book
<a href="https://mitpress.mit.edu/books/collective-electrodynamics"
>Collective Electrodynamics</a>, first published in 2002,
puts forth a theory of electrodynamics based on
<a href="https://en.wikipedia.org/wiki/Four-vector">four-vectors</a>. As with many
other low-level aspects of physics, this theory is time-symmetric, making no
claims about how to distinguish between the past and the future.
<br/><br/>
I found Carver's theory and his exposition of it to be elegant and convincing.
Even if you don't agree with my interpretation and conclusions in this post,
I recommend you read this book if you are generally interested in physics.
<br/><br/>
Carver's description of the process of photon emission and absorption
includes a few comments noting that a photon will not be emitted
without a destination that will absorb
the photon at some point in the future, because the emitter and absorber
are a coupled pair forming a single resonator.
<ul>
<li>
In section 4.8: "Any energy leaving one resonator is transferred to some
other resonator, somewhere in the universe."
</li>
<li>
In section 4.12: "The spectral density of distant resonators
acting as absorbers is, of necessity, identical to that of the
resonators producing the local random field, because they are the
same resonators."
</li>
<li>
In the Epilogue: "It is by now a common experimental fact that an atom,
if sufficiently isolated from the rest of the universe, can stay in
an excited state for an arbitrarily long period. ... The mechanism for
initiating an atomic transition is not present in the isolated atom;
it is the direct result of coupling with the rest of the universe."
</li>
</ul>
Part 5 describes how two atoms couple electromagnetically as resonators.
<a name="interpreting"></a>
<h2>Interpreting the Theory</h2>
As a thought experiment, suppose we were in a part of the universe
with no matter at all in one direction. We would not be able to shine
a flashlight in that direction: with nothing to absorb the photons,
they would not be emitted. If we were able to measure all of the other energy going into
or out of the flashlight, we would be able to notice that energy leaves
the flashlight when we point it towards other things, but not when we
point it towards truly empty space.
<br/><br/>
Coming back to our current location in the universe, there is a finite
amount of matter between us and the
<a href="https://en.wikipedia.org/wiki/Hubble_volume">Hubble sphere</a>.
Consider a line
segment from our location to a point on the Hubble sphere. If there are
no atoms on the intersection of said line segment and our future light
cone, then it should not be possible to emit a photon in that direction.
More restrictively, if there are no atoms in that intersection that are
capable of absorbing a photon of the frequency our source atom is
attempting to emit, then we will not be able to emit that photon in
that direction.
<a name="big_idea"></a>
<h2>The Big Idea</h2>
Assume, then, that we have a highly directional monochromatic light
source that we can point accurately, and that we can accurately know
how much light we are emitting based on energy input measurements. What
would happen if we were to provide that light with a suitable
input power signal, then scan the sky? If there are any differences
in the density of atoms in different directions that are capable of
absorbing photons of the frequency we are sending, would we be able to
produce a map of the sky showing those differences? Would there be any
anisotropy, as there is for the cosmic microwave background radiation?
<br/><br/>
Given how much matter there is in the universe, I suspect it would be
hard to find one of those line segments out to the Hubble sphere without
a single atom capable of absorbing one of our photons, but perhaps if we
are trying to send out a great many photons, there will be enough of a
statistical variation to measure.
<br/><br/>
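To put a rough number on that, here is a back-of-the-envelope sketch (in Python; the dip size, laser power, and wavelength are made-up illustration values, not anything from the theory) of how many photons a shot-noise-limited measurement would need in order to detect a tiny deficit in emitted power:

```python
import math

def photons_needed(fractional_dip, n_sigma=5.0):
    """Photon count needed to detect a fractional dip in emission rate
    at n_sigma significance, assuming only Poisson (shot) noise.

    With N expected photons the fluctuation is sqrt(N), so a dip of
    f*N photons stands out once f*N > n_sigma*sqrt(N), i.e.
    N > (n_sigma / f)**2.
    """
    return (n_sigma / fractional_dip) ** 2

# A one-part-per-billion deficit at 5 sigma:
N = photons_needed(1e-9)                     # 2.5e19 photons

# A 1 W laser at 532 nm emits roughly 2.7e18 photons per second:
photon_energy = 6.626e-34 * 3.0e8 / 532e-9   # E = h*c/lambda (joules)
rate = 1.0 / photon_energy                   # photons/s at 1 W
seconds = N / rate                           # ~10 s of integration
```

So, at least in this idealized picture, shot noise alone would not rule the measurement out; systematic effects in the source seem like the harder problem.
<br/><br/>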
The thing that I find fascinating about this is that, if it did in fact
work, we would be "seeing the future", because whatever map we produced
would be a function of where the absorbing atoms are going to be when
the light we emit reaches them.
For planets in our solar system that would be minutes or hours in the future,
but for distant nebulae that could be millions or billions of years from now.
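<br/><br/>
To put scales on "minutes or hours" versus "millions of years", a quick sketch of one-way light travel times (distances approximate):

```python
# One-way light travel times to a few destinations, i.e. how far into
# the future the map would be looking in each direction.
C = 299_792_458                      # speed of light, m/s
AU = 1.495978707e11                  # astronomical unit, m
LIGHT_YEAR = C * 365.25 * 86400      # m

mars_minutes = 0.52 * AU / C / 60    # Mars at closest approach: ~4 minutes
neptune_hours = 30 * AU / C / 3600   # Neptune: ~4 hours
andromeda_years = 2.5e6 * LIGHT_YEAR / C / (365.25 * 86400)  # ~2.5 million years
```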
<a name="details"></a>
<h2>The Details</h2>
The devil is in the details. Even if, in principle, the theory supports this
conclusion, would it be possible to build such a device?
<br/><br/>
In addition to the statements of theory, I make two assumptions above:
<ol>
<li>
We can accurately point our light source, such that we can perform a raster
scan on a portion of the sky.
</li>
<li>
We can determine how much light energy is leaving our light source by measuring
the input energy to that source.
</li>
</ol>
The first assumption seems straightforward: the optics involved in sending out a
beam of light to a small portion of the sky should be the same as receiving light
from a small portion of sky, which we do on a regular basis to form images of space.
But I am not an astronomer, so I may be missing something. For example, I know that
some modern telescopes use a
<a href="https://en.wikipedia.org/wiki/Laser_guide_star">guide laser</a>
shining up through the atmosphere to allow for
dynamic adjustments to the mirrors to compensate for atmospheric distortion.
Would this also work when sending out a signal beam alongside the reference beam?
I don't know why not, but, as mentioned, this is not my area of expertise.
<br/><br/>
I think the second assumption may require more effort to solve. The typical advice
for powering a laser is to use a
<a href="https://www.teamwavelength.com/laser-diode-driver-basics/">current source</a>
in order to get a stable output.
For my experiment, however, I specifically don't want a stable source. Instead, I
want a source that can output more or less light based on how much the
space into which it is shining can accept.
<br/><br/>
Since I can't directly measure the light output, I also need a light source where
I can accurately judge how much light is being output by measuring the input power.
This means I need to know the power transfer characteristics of the light source.
How much of the input power is transformed into light, and how much into heat or
other forms of energy? Is that relationship constant over time, or might it vary
such that at one point in time I get x% of the input turning into heat, and moments
later I get 2x% turning into heat? Alas, I am not a solid-state physicist (assuming
my light source is a solid-state laser), so I don't know the answers to these questions.
<a name="invitation"></a>
<h2>An Invitation</h2>
So, what do you think?
Is there a fatal flaw to my understanding of the theory?
A fundamental reason why it would not be possible to build such a
"future telescope"?
A technical limitation making it not currently possible?
<br/><br/>
I have talked to a few people about this idea, and the ones who I know have
a good understanding of Carver's theory have said that, in principle, they
don't see anything wrong with my reasoning.
<br/><br/>
As I mentioned above, I'm not an astronomer or solid-state physicist,
so I don't have the background
to take this concept to the practical stage.
But perhaps someone else does.
<br/><br/>
This seems like it would be a very exciting thing if it worked,
but I think it would require a significant investment of time
and access to some expensive equipment to take the next step.
Would anyone like to give it a try?
If you do, I'd love to hear about it.
<h2>Wormhole Musings</h2>
<i>2018-11-27</i>
<br/><br/>
I have questions about how wormhole portals in science fiction stories work.
<br/><br/>
Recently I started reading another science fiction novel where wormholes allow
instantaneous travel between distant points. In books that use this mechanism, the author typically explores
how the ability to travel easily and quickly between the stars shapes the course of history.
<br/><br/>
But I always get hung up thinking about all the other ways in which a portal might possibly be
used, for good or evil, in ways much less grand but potentially more disruptive
than distant travel. Of course, since the use of wormholes in these books does not rely on
our currently generally accepted science, these questions do not have well-defined answers.
That's why I muse.
<br/><br/>
In this article, I ask some questions about how some of our currently accepted principles of physics
apply (or don't apply) to wormholes, and ponder the ways in which one might use (or misuse) a
wormhole based on the answer to those questions.
<br/><br/>
<b>Caveat lector:</b>
If you want to keep reading wormhole stories without being distracted by questions like these,
you might want to stop reading now.
Because once you read these questions, you won't be able to unread them.
<h2>My Questions</h2>
<ul>
<li><a href="#expense">How big and expensive is the equipment required to create and maintain a wormhole?</a>
<li><a href="#energy">How much energy is required to create and maintain a wormhole?</a>
<li><a href="#shape">What shape is a wormhole portal?</a>
<li><a href="#size">Can I make a wormhole as large or as small as I want?</a>
<li><a href="#control">How do you control the location of the wormhole portals?</a>
<li><a href="#energy_conserved">Is energy conserved when traversing a wormhole?</a>
<li><a href="#momentum_conserved">Is momentum conserved when traversing a wormhole?</a>
<li><a href="#forces">How do physical forces propagate through a wormhole?</a>
<li><a href="#geometry">What is the geometry of the wormhole connection?</a>
<li><a href="#time">In what reference frame is traversal of the wormhole instantaneous?</a>
</ul>
<ul>
<li><a href="#answers">Potential Answers</a>
</ul>
<a name="expense"></a>
<h2>How big and expensive is the equipment required to create and maintain a wormhole?</h2>
Mainly what I want to know for this question is whether the equipment is small and inexpensive enough
that an individual can own one. If they are within the reach of many people, that makes
it much more likely that there will be some people who will use it for unexpected purposes.
<br/><br/>
I once read a story in which someone had invented a personal flying belt that anyone could get
for five dollars. With such easy personal mobility, border control suddenly became much more difficult,
which of course led to some interesting problems. If anyone could buy and control a wormhole for
five dollars, that would be a very different situation than if there were only a few wormholes
controlled by a few rich and powerful entities.
<a name="energy"></a>
<h2>How much energy is required to create and maintain a wormhole?</h2>
Although science fiction wormholes don't rely on any currently known physics, my feeling is that
any scientifically plausible mechanism for a wormhole would require a prohibitive amount of energy
to use. And I mean the word prohibitive literally: the amount of energy required would be so high,
it would effectively prohibit the possibility of using a wormhole.
<br/><br/>
Since that doesn't make for good science fiction stories, we have to assume that the energy
requirement is modest enough that we are able to produce and use wormholes. The question then
becomes, how much energy is required? This question is related to the earlier question about
cost, in that if a wormhole requires a relatively large amount of energy to operate, that could
restrict its operation to a small number of controlling entities. Whereas if I can run it with
a D-cell battery, there would be many more interesting things I could do with it.
<br/><br/>
It may not matter how much energy it requires to operate a wormhole, because, as discussed in some
of my comments below, it seems likely that once you have a wormhole you could get as much
free energy as you want.
<a name="shape"></a>
<h2>What shape is a wormhole portal?</h2>
In most stories, wormhole portals are portrayed as circular areas that you step through, much like
the entrance to a common tunnel. This is very convenient for imagining things like train lines
that run through wormholes, and for thinking about the equipment that might be required to
hold open a wormhole portal. That equipment is sometimes described as a torus with massive
structures around it.
<br/><br/>
I think it is more likely that a wormhole would be spherical. You could enter it from any direction,
and you would exit in a direction based on the direction you entered. This is a bit harder to visualize,
which may be one reason it is not often described this way.
<br/><br/>
If a wormhole portal is a sphere, how does that impact the equipment required to maintain it?
It would be tough to have equipment symmetrically on all sides and still have something that
allows easy access. But maybe it doesn't all have to be completely symmetrical, so you can leave a few
holes to let the trains get through the equipment so they can enter the portal.
<a name="size"></a>
<h2>Can I make a wormhole as large or as small as I want?</h2>
In most stories, wormholes are of a size that makes them convenient to step through,
or drive a car or train through. Is this an essential feature of wormholes, or just
the size that happens to be most convenient? Could we make them any size if
we wanted to? Perhaps big wormholes would be harder, but I would think smaller wormholes
would actually be easier to make.
And I can think of lots of interesting uses for small wormholes, depending on the answers
to the other questions.
<br/><br/>
One example of a good use for a tiny wormhole would be to shine a laser through it and have
a high capacity communication channel.
<a name="control"></a>
<h2>How do you control the location of the wormhole portals?</h2>
Some stories postulate that maintaining a wormhole portal requires physical
equipment at both ends. In this case, the question of how to control the
location of the portal is clear: you have to move the equipment to move
the portal.
<br/><br/>
In other stories, the two ends of the wormhole are created at
one location, after which one end can be moved to another location.
In considering the geometry of wormholes, I would guess that
it is possible to move one end of a wormhole through another
wormhole, but perhaps only if the wormhole being transported is
sufficiently smaller than the one it is being moved through.
<br/><br/>
If equipment is required at both ends of the wormhole,
establishing a wormhole from A to B requires
first traveling from A to B through normal space to deliver the necessary equipment,
or possibly from C to B if the two ends of the wormhole don't need to be created in one place.
This constrains the expansion of an interstellar civilization to the speed of light,
which is annoyingly slow to some authors.
<br/><br/>
The more interesting case, as postulated in some stories, is that you can project the
other end of the wormhole to a desired location without first having to get there some other way.
This is, of course, a much-preferred mechanism if you want to quickly expand your network
of gates, since who wants to wait many years while the slowship takes your gate to the
next star?
But what could we do if we could project the other end of our portal to anywhere
we wanted in space?
<br/><br/>
If I can project tiny wormholes, I could perform incision-free surgery.
Mining would be much cheaper, as I could just project a wormhole down to where the ore is
without having to tunnel or strip-mine down to it.
I could make a great vacuum pump by putting one end out in space.
<br/><br/>
At a more banal level, I could eat as much as I want and not gain weight. I just need to
project a tiny wormhole into my stomach and remove the food I just ate before my body
digests it. I get all the pleasure of eating without suffering the problems of obesity.
<br/><br/>
I read one story in which a little wormhole was located on the bottom of a drinking glass, with
the other end at the bottom of a vat of beer, wine, or whatever drink was selected. Each time
the glass was set down, the wormhole would open to fill the glass, then close once
the glass was full.
<br/><br/>
If I put on my black hat, the most obvious nefarious deed is, I project the other end of my wormhole
into a bank vault and walk off with the cash. Or into a collection of classified documents
and walk off with the secret plans. Or into my enemy's bedroom and kidnap him or
kill him. I really only need to project a tiny wormhole, big enough
for a bullet, to do a dastardly deed. Or so small it's only big enough for a packet of
viruses that I inject into his bloodstream without him even knowing it.
<br/><br/>
If we can project one end of our wormhole to any desired location in space, perhaps
we could project both ends. This would allow us to establish a wormhole between any
two points anywhere in space, without having to have equipment at either end.
This could actually be an interesting premise for a story, as it would allow for the
case where there is a single wormhole-generating facility that creates all of the
wormholes used throughout the civilization. That facility would presumably be
controlled by some now-very-powerful entity, and would be both
heavily secured and heavily attacked, so there are lots of opportunities for story lines.
<br/><br/>
The ability to create a wormhole between any two other points in space also opens up
lots of additional opportunities for mischief. One could create a pretty effective
weapon of mass destruction by creating a wormhole with one end in the middle of the
sun and the other end where you want the destruction. Or put one end
in the middle of a magma reservoir, or deep in the ocean, depending on the type of
destruction desired. Or put one end in space to suck everything into the vacuum.
<br/><br/>
On the positive side, one could create a really nice package delivery system.
Open a wormhole between the package source and destination, drop the package in
for instant delivery, and close the wormhole.
<br/><br/>
Assuming we have the ability to create a wormhole portal anywhere in space, there is
still the question of how we figure out where it gets created. Do we have to use
trial and error to place the wormhole in just the right place? If we are trying to
create a wormhole portal in a distant location, do we have to worry about the precision
of our equipment, in the same way that launching a spaceship to land on Mars requires
more precise equipment than launching one to land on the moon? Can we create the
remote wormhole portal and then move it around at will, and if so, can we move it
faster than the speed of light?
<a name="energy_conserved"></a>
<h2>Is energy conserved when traversing a wormhole?</h2>
In most wormhole stories, one can step through a wormhole to get from
one end to the other with no more effort than walking across the room.
There is no explicit discussion of conservation of energy,
and my assumption is that the authors don't worry about it because that detail
doesn't advance the story.
But I worry about it.
<br/><br/>
If I open a wormhole between Earth and its moon, there is a pretty big difference in
the gravitational potential energy between those two points. When I want to put something in
the wormhole portal on Earth and have it come out on the moon, do I need to supply the difference in
energy between those two points? That would mean supplying a whole lot of energy to move in that
direction. Conversely, if I step through the wormhole from the moon back to the Earth, what happens
to all that gravitational potential energy?
<br/><br/>
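A rough estimate of what is at stake, sketched in Python using standard Newtonian potentials (the numbers are textbook approximations):

```python
G = 6.674e-11                         # gravitational constant, m^3 kg^-1 s^-2
M_EARTH, R_EARTH = 5.972e24, 6.371e6  # kg, m
M_MOON,  R_MOON  = 7.342e22, 1.737e6  # kg, m
D_EARTH_MOON = 3.844e8                # mean distance, m

def potential(m, r):
    """Newtonian gravitational potential (J/kg) at distance r from mass m."""
    return -G * m / r

# Potential at Earth's surface (the Moon's contribution is negligible here):
phi_earth = potential(M_EARTH, R_EARTH)
# Potential at the Moon's surface (Moon's field plus Earth's at that distance):
phi_moon = potential(M_MOON, R_MOON) + potential(M_EARTH, D_EARTH_MOON)

delta = phi_moon - phi_earth          # ~6e7 J/kg to climb Earth -> Moon
```

That is roughly 60 megajoules per kilogram, so an 80 kg traveler stepping through "for free" represents about 5 gigajoules of unaccounted-for energy. That is why the question matters.
<br/><br/>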
If I can move from one end of a wormhole to the other end without having to supply that extra
energy, then I can get free energy. Here's one way: go find a big dam with a hydro generating
plant and install a wormhole with the entrance portal under the water at the bottom of the dam,
just past the outflow of the generator, and with the exit portal just above the surface of the lake
at the top of the dam. Since the entrance portal is underwater and the exit is above, water flows
into the entrance portal and comes out at the exit portal. Thus the lake is ever refilled and our
hydroelectric generators can keep running.
<br/><br/>
Maybe the wormhole technology works like a battery with regenerative braking on electric cars: it supplies the
energy needed when traveling in one direction, and absorbs the excess energy when traveling in
the other direction.
<a name="momentum_conserved"></a>
<h2>Is momentum conserved when traversing a wormhole?</h2>
If I am in New York City, the Earth's rotation is moving me at about 700 miles per hour relative
to the center of the Earth. At the same time, Sydney is also moving at about 700 miles per hour,
but in roughly the opposite direction, as it is almost on the opposite side of the Earth.
If I open a wormhole between New York City and Sydney, and I step through, what happens to that
1400 miles per hour difference? Do I splat into the nearest wall at supersonic speed, or do I
casually step through and continue walking to my destination?
<br/><br/>
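The numbers check out roughly. Here is a sketch (the post's 700 miles per hour is a round figure, and since the two cities are not exactly antipodal the two velocities are only approximately opposite):

```python
import math

R_EARTH = 6.371e6                      # m
SIDEREAL_DAY = 86164                   # s
OMEGA = 2 * math.pi / SIDEREAL_DAY     # Earth's rotation rate, rad/s

def rotation_speed_mph(latitude_deg):
    """Eastward surface speed due to Earth's rotation at a given latitude."""
    return OMEGA * R_EARTH * math.cos(math.radians(latitude_deg)) * 2.23694

nyc = rotation_speed_mph(40.7)         # New York City: ~790 mph
sydney = rotation_speed_mph(-33.9)     # Sydney: ~860 mph

# Kinetic energy of the velocity mismatch, per kilogram of rock:
v_rel = 1400 / 2.23694                 # 1400 mph in m/s
ke_per_kg = 0.5 * v_rel**2             # ~0.2 MJ per kilogram per pass
```
<br/><br/>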
If momentum is conserved, then I would be moving at a high speed relative to the exit point of
the wormhole. If I put the appropriate mechanical devices next to the wormhole exit, I could send
through a rock, catch it moving at 1400 miles per hour, and convert that kinetic energy to
electricity. Then I could toss the rock back and do the same thing on the other side. Free energy.
<br/><br/>
The question of conservation of momentum is subtler than it first appears. If I want to conserve
momentum, I come out of the wormhole in Sydney with that supersonic velocity relative to the city.
But what does that mean for the angular momentum of the system? If I just moved that mass over to
a new location and nothing else changed, then I have changed the angular momentum of the system.
If the whole earth moves a tiny bit in the other direction, to keep the same center of mass, that
could take care of that issue, but why should the whole Earth move when I use a wormhole?
Would that happen if I were in an airplane? In a spaceship in low orbit? In a spaceship in high
orbit? In a spaceship at the orbit of the moon, or beyond?
<br/><br/>
As with conservation of energy, perhaps the wormhole portals absorb or supply momentum as needed,
transferring it to the surrounding masses. This could mean that wormhole portals would most effectively
be placed on large masses such that they had a reservoir of momentum to transfer to or from.
The larger the masses that were transferred through a wormhole, and the larger the relative velocity
of the portals, the more momentum would have to be transferred, and the larger the attached
mass would have to be.
<a name="forces"></a>
<h2>How do physical forces propagate through a wormhole?</h2>
In every wormhole story I have read, light traverses a wormhole with no problems.
I assume that means all forms of electromagnetic radiation traverse a wormhole equally easily.
This presents another opportunity for a good energy source: put a wormhole portal in close orbit around
the sun, then put the other wormhole portal
on Earth. Stream that high-intensity light through and use it to drive solar cells for direct
production of electricity, or as a heat source for standard steam turbines.
If no equipment is required at the solar end of the wormhole, you're all set.
If equipment is required, you might have to build some kind of refrigerator
that brings that heat back to Earth and keeps the equipment cool.
<br/><br/>
How about gravity? How does that propagate through a wormhole?
Most wormhole stories I have read describe travelers stepping through a wormhole
and experiencing a discontinuity in the gravity field, meaning gravity is not
propagating through the wormhole. This seems odd to me. Why would light
propagate through a wormhole but not gravity?
<br/><br/>
The intensity of light from a point source drops off proportionally to the distance squared, which
makes sense because the light is spreading out at that rate, and a fixed-size
object intercepting the light will thus get less of it when it is further away.
Because of this behavior, it makes sense to me that the amount of light that would
come through a wormhole would be proportional to its size. If the wormhole is very small,
only a small amount of light would come through.
<br/><br/>
Gravity also drops off proportionally to the distance squared, but not quite for the
same reason. Given a particular mass, the gravitational force on that mass is independent of whether
it is small and dense, or larger and less dense. The amount of area covered by the mass
is not important, only its mass and its distance from another mass.
If there is a tiny wormhole and I can measure a distance through that wormhole from my object to
a large mass, wouldn't that mean the gravitational force is inversely proportional to the square of that distance?
<br/><br/>
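The contrast can be made concrete: the light admitted by a wormhole mouth scales with its aperture area, while the Newtonian force on a test mass depends only on mass and distance. A sketch with illustrative numbers:

```python
import math

G = 6.674e-11

def light_fraction(aperture_radius, distance):
    """Fraction of an isotropic source's light entering a circular
    aperture of the given radius at the given distance (small-angle)."""
    return (math.pi * aperture_radius**2) / (4 * math.pi * distance**2)

def newton_accel(mass, distance):
    """Newtonian gravitational acceleration at the given distance;
    note that no aperture size appears anywhere in the formula."""
    return G * mass / distance**2

# A 1 cm wormhole mouth at one Earth radius from Earth's mass:
f = light_fraction(0.01, 6.371e6)      # ~6e-19 of the light gets through
g = newton_accel(5.972e24, 6.371e6)    # ~9.8 m/s^2, regardless of aperture
```

A naive "distance measured through the wormhole" reading would deliver the full 9.8 m/s^2 through a centimeter-sized hole, which is the oddity being pointed at here.
<br/><br/>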
If gravity does propagate through a wormhole, perhaps I could make a null-gravity region
by creating a pair of wormhole portals, then putting each one slightly above the surface of the Earth and
upside down from each other. If you were
to stand under one portal and look up, you would see the Earth above you. You
have one Earth gravity below you and one above, so they cancel out and you have no gravity.
A nice tourist attraction.
Then again, the two Earths would also be exerting a gravitational pull on each other,
so whatever is holding up each wormhole portal might be carrying the weight of the world.
<br/><br/>
On the other hand, given that General Relativity says
that mass causes curvature of space, and thus gravity, and wormholes are usually described as some
way of warping space, that seems to imply that being able to control wormholes means being able
to control the curvature of space and thus being able to control gravity. So perhaps based on
that we can choose how we want gravity to propagate through wormholes for our stories.
<br/><br/>
If you can turn wormholes on and off at will, you might be able to use this effect to
get some free energy.
You turn on a wormhole, have it pull up a weight, then turn it off, let the weight fall,
and use that to generate energy.
<a name="geometry"></a>
<h2>What is the geometry of the wormhole connection?</h2>
A wormhole is usually described as a connection that goes through a higher dimension than
the three dimensions in which we live. Those higher dimensions may present degrees of freedom
that can lead to some curious and unpleasant results. Let me try to explain with a flatland analogy.
<br/><br/>
If I live in a two dimensional space, I can create a wormhole by folding that sheet of space
until two points meet, then punching out a circle around those two points, and sewing those
two circles together. This is topologically equivalent to attaching a hose that stretches up from
a circle around one of those points and comes down at a circle around the other,
with the assumption that the hose represents no distance (or a
very short distance). A 2D creature could move from regular space onto the surface of that
hose (assuming the hose diameter is much larger than the creature),
then to regular space on the other end, then return to its original location via regular
space, and all is well.
<br/><br/>
Now consider what happens if I take that same hose, but instead of going up from the first point
and down at the second, I go up from the first point, then go around to the under side of the plane
(which I can do without going through the plane if I have yet another dimension) and come up from
the bottom side of the plane to meet the second point. Consider again what happens to that 2D creature
who travels into the wormhole, out the other end, and returns to its starting point in normal 2D space.
The result is that it comes back inverted. What was left is now right, and vice-versa.
<br/><br/>
I once read an old science fiction story in which there was a place deep within
the Amazon where, if you navigated a certain course, it would reverse everything left to right.
An enterprising businessman heard this and figured he could more efficiently make shoes
by manufacturing only left shoes,
then shipping half of them around this circuit, so he went exploring to find it. After going around
the course, he looked at his sample left shoes, but they were all still left. Frustrated, he threw
them all away, destroyed the worthless maps, and returned to civilization - only to discover that in
fact the trick had worked, but he had not recognized it because he, too, had been reversed. But he
could never find the place again.
<br/><br/>
Getting your body flipped left to right would probably be fatal. Almost all of our body chemistry
is chiral, so you would not be able to extract any nutrition from most foods, and you would
starve to death or die of malnutrition.
<br/><br/>
If there is an extra dimension in which a wormhole exists, why not two extra dimensions?
If there are two or more extra dimensions, you now have the issue described above, and you
will need to make sure you get the two ends of your wormhole attached with the right geometry,
or things that move through the wormhole might not come out quite as expected.
<br/><br/>
Of course, a black-hat could surely come up with evil things that could be done with that kind
of wormhole.
<br/><br/>
When considering wormhole geometry, another potential problem is the curvature of space in the
wormhole. In General Relativity, spacetime curvature manifests as tidal acceleration: nearby points are accelerated differently.
Too much curvature can lead to disastrous gravitational tidal effects that can tear things apart.
Small wormholes would be most likely to have this problem.
Larger wormholes, like <a href="https://en.wikipedia.org/wiki/South_Pass_(Wyoming)">South Pass</a>
through the Rockies, would allow that curvature to be
spread out enough to be hardly noticeable.
<a name="time"></a>
<h2>In what reference frame is traversal of the wormhole instantaneous?</h2>
This is the issue which to me is the killer.
<br/><br/>
Einstein's Theory of
<a href="https://www.google.com/search?q=special+relativity">Special Relativity</a>
is quite well supported by experimental evidence.
According to that theory, there is no such thing as universal simultaneity,
so we have to ask what instantaneous travel means.
<br/><br/>
You may have heard that, according to Special Relativity, if observer A with clock A in spaceship A
is moving near the speed of light relative to observer B, clock A will run more slowly than observer B's clock B,
according to observer B, due to <a href="https://en.wikipedia.org/wiki/Time_dilation">time dilation</a>.
But at the same time, according to observer A, observer B with clock B is moving near the
speed of light relative to A, so observer A sees clock B as moving more slowly. This effect is
the core of the
<a href="https://en.wikipedia.org/wiki/Twin_paradox">twin paradox</a>, where one twin gets
on a spaceship from Earth, flies away at near light speed, and returns, while the other stays
on Earth.
<br/><br/>
The twin paradox is resolved by noting that there is an asymmetry between the twins: one stays
at rest on Earth, whereas the other accelerates three times during the trip (takeoff, turnaround,
and landing). This difference is the key to understanding the paradox and determining that the
twin on the spaceship ages more slowly than the one left on earth.
<br/><br/>
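The size of the effect follows directly from the standard Lorentz factor; a quick sketch:

```python
import math

C = 299_792_458.0      # speed of light, m/s

def gamma(v):
    """Lorentz factor for speed v in m/s."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

# A twin cruising at 0.9c while 10 years pass on Earth ages only:
earth_years = 10.0
traveler_years = earth_years / gamma(0.9 * C)   # ~4.4 years
```
<br/><br/>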
In 1971 a couple of scientists
<a href="https://en.wikipedia.org/wiki/Hafele%E2%80%93Keating_experiment">ran an experiment</a>
where they flew atomic clocks on commercial flights
around the world and confirmed that their elapsed times differed from those of the
stationary atomic clocks left behind, just as predicted by Special Relativity
(combined with General Relativity, which adds time dilation due to gravitational differences).
<br/><br/>
For instantaneous travel between wormholes, it seems like we can set up a symmetric situation
so that we can't resolve our paradox the same way as for the twin paradox.
Consider the situation where we have a wormhole between two spaceships (or planets, if you prefer) A and B
that are moving at near
the speed of light relative to each other. As noted above, the observer in each location observes
the clock moving more slowly at the other location.
If person C with clock C steps from spaceship A to B through the wormhole, spends a bit of time on spaceship B, then comes back to
spaceship A, observer A will calculate that clock C will be behind clock A, having moved more slowly than clock A
while it was on spaceship B.
If person D with clock D steps from spaceship B to A through the wormhole, spends a bit of time
on spaceship A, then goes back to spaceship B,
observer A will calculate that clock D will be ahead of clock B, having moved more quickly than clock B
while it was on spaceship A.
But in this symmetric situation, observer B will calculate that
clock C will be ahead of clock A, and clock D will be behind clock B, the opposite of what
observer A calculates.
So which is it?
<br/><br/>
The problem here is the statement that travel between wormholes is instantaneous.
According to Special Relativity,
two events that occur at the same time but different locations in one reference frame
will occur at different times in a reference frame that is moving with respect to the first.
For our example, this means that if observer A sees person C moving instantaneously through
the wormhole from A to B, observer B does not see person C moving instantaneously through
the wormhole except when A and B are right next to each other. And since A and B are
moving with respect to each other, they will not be right next to each other for at least one
leg of the wormhole round trip. When A and B are not right next to each other, what appears as simultaneous
in one reference frame is not simultaneous in the other reference frame.
<br/><br/>
The only way I know of that is consistent with Special Relativity that would allow wormhole
travel to be instantaneous according to both ends of the wormhole would be to constrain wormholes
to be stationary relative to each other. But this would be a pretty strong constraint for stories,
since essentially everything in the universe is moving relative to each other,
and even the rotation of a planet is enough velocity variation to cause measurable time issues
across the kind of distances wormholes sometimes connect.
<br/><br/>
But wait, it gets crazier.
By the laws of Special Relativity, if you have <i>any</i> mechanism that lets you
move between two points faster than the speed of light, in any arbitrary frame of reference,
you can use that mechanism to travel backwards in time.
The <a href="https://en.wikipedia.org/wiki/Tachyonic_antitelephone">tachyonic antitelephone</a>
is an example of how being able to send a message faster than
light allows sending it backwards in time, and the same principle applies to
sending an object rather than a message.
<br/><br/>
One way to explain this is based on the assertion of Special Relativity that two events that
are not at the same location in space that occur simultaneously in a frame of reference A will not be
simultaneous in a frame of reference B that is moving with respect to A. In frame B, one of
those two events will happen before the other. Let's assume that we have a wormhole with a
pair of distant portals that are stationary in frame A, and another wormhole with portals
stationary in frame B, moving with respect to frame A in the direction from one of the A portals to the other.
We arrange the portals such that wormhole portal B2 is immediately adjacent to wormhole portal A2
at the starting time of our experiment
according to observer A located at A1,
and we arrange that B1 and B2 are adjacent to A1 and A2, respectively,
at the same time in frame B.
At the starting time in A, we step from portal A1 to A2.
Since we arranged for B2 to be adjacent to A2 at this time, we can immediately move over
to B2 and step through to B1, which we assume is instantaneous in frame B.
Because we have arranged that B1 is adjacent to A1 at the same moment as B2 is adjacent to A2
in frame B, when we exit B1 we can then hop back over to A1 and complete our circuit in space.
Since our trip through the wormhole B is instantaneous
in frame B, it will not be instantaneous in frame A.
For the traveler, all four legs of the trip
are nearly instantaneous, but for an observer who remains in A only three legs are,
with the leg through wormhole B not being instantaneous.
Depending on which direction a traveler takes around this loop, they will return to A1
either well after or well <i>before</i> the time they left.
<br/><br/>
The amount of time is proportional to the distance traveled through the wormholes
and is related to the velocity of one frame with respect to the other.
If frame B is traveling near the speed of light relative to A, the amount of time will be close to
the light-distance between the two ends of the portal, so even if you are "just" traveling to
<a href="https://en.wikipedia.org/wiki/Proxima_Centauri_b">Proxima Centauri b</a> in the Alpha Centauri system,
the closest star system to Earth, about four light years away,
you could travel up to four years into the future or the past.
The effect is less pronounced, but still present, at lower speeds.
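The size of this offset comes from the relativity-of-simultaneity term of the Lorentz transformation, Δt = vL/c², for two locations a distance L apart. A quick sketch, with illustrative numbers chosen to match the Proxima Centauri example above:

```go
package main

import "fmt"

// simultaneityOffsetYears returns Delta-t = v*L/c^2 in years for two
// locations distanceLy light years apart, with relative speed beta given
// as a fraction of c. In units of years and light years, c = 1, so the
// formula reduces to beta * distance.
func simultaneityOffsetYears(beta, distanceLy float64) float64 {
	return beta * distanceLy
}

func main() {
	// Portals about four light years apart (roughly the distance to
	// Proxima Centauri), with the frames at 0.99c relative to each other.
	fmt.Printf("offset: %.2f years\n", simultaneityOffsetYears(0.99, 4.0)) // 3.96 years
}
```

As the relative speed approaches c, the offset approaches the full light-travel distance of four years; at lower speeds it shrinks proportionally but never vanishes.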
<br/><br/>
Note that Special Relativity itself doesn't preclude faster-than-light messages or travel;
it just says that being able to do so allows sending a message or traveling backwards in time,
as demonstrated above.
Our current theories do not say this is not possible, but most
people believe in causality and thus find time travel problematic.
<br/><br/>
If you want to get a better intuitive feel for some of the weird things that happen when you start moving
at near the speed of light, check out the free video game
<a href="http://gamelab.mit.edu/games/a-slower-speed-of-light/">A Slower Speed of Light</a> from MIT.
<a name="answers"></a>
<h2>Potential Answers</h2>
Given that typical science fiction wormholes are based on new science beyond our current theories,
we have a lot of leeway in deciding how that science works so as to create the conditions that
best advance our story.
We could say that managing wormholes requires an amount of money and energy that are only available to large
organizations,
or we could say that, once the science is known, wormholes are easy and cheap and anybody can make
them, and see what kind of havoc is wreaked.
We could say that small wormholes are easy to make, or that larger wormholes are easier.
We could choose the geometry of the wormhole and portals to be troublesome or trivial.
We could say that wormhole portals require equipment to maintain, or that we can cast them
anywhere with ease.
<br/><br/>
All of the above choices are pretty easy in the sense that they are about the fictional new
wormhole science and don't conflict with our existing science. Things get a little harder when
we try to decide how conservation of energy and momentum work with wormholes, but even there we
should be able to postulate something that allows us to remain consistent with known science,
such as the wormhole absorbing or supplying the difference, or perhaps even requiring an exchange
of equal mass from either end of the wormhole.
<br/><br/>
Propagation of gravity through a wormhole seems to me a little more difficult to deal with.
As mentioned above, you might be able to claim that wormhole technology allows controlling the
curvature of space. But another view of mass and space is that mass <i>is</i> the curvature of
space, in which case making space curve is equivalent to creating mass, and at that point we get
into all the questions of conservation of mass and energy and where it comes from when curving
space for a wormhole.
<br/><br/>
The one that I really can't figure out how to make consistent is, as mentioned above, the question of time.
The main reason wormholes are typically introduced is to allow faster-than-light travel,
which, as described above, is what leads directly to the potential of time travel, according
to Special Relativity. For all of the other questions, it seems like it may be possible to define
some new science that answers those questions in a way that does not require us to discard any of
our current well-established scientific theories, but for faster-than-light travel, I don't see
any way to do this.
<br/><br/>
I can't even just assume that Special Relativity doesn't apply in that universe. There is a deep
connection between having the
<a href="https://physweb.bgu.ac.il/COURSES/test/LEC_BGU/LEC_BGU/LTs.pdf">same laws of physics everywhere</a>,
<a href="http://wtamu.edu/~cbaird/sq/2016/02/18/how-is-a-magnetic-field-just-an-electric-field-with-relativity-applied/">electromagnetism</a>,
and <a href="https://physics.stackexchange.com/q/231639">having a maximum velocity</a>
for any matter or information.
Special Relativity builds on the work of
<a href="http://rsta.royalsocietypublishing.org/content/366/1871/1861">Newton and Maxwell</a>,
and discarding it would require some other significant changes to the way the universe works.
<br/><br/>
A science fiction author might choose to focus on how wormholes allow time travel, as
Robert L. Forward does in some of his stories.
For the other stories, the ones that don't mention time travel, I just have to suspend my understanding
of Special Relativity and enjoy the story as told.
<h1>Golang Web Server Auth</h1>
An example of authentication and authorization in a simple
web server written in go.
<h2>Contents</h2>
<ul>
<li><a href="#background">Background</a>
<li><a href="#beforeauth">Before Auth</a>
<li><a href="#authentication">Adding Authentication</a>
<li><a href="#authorization">Adding Authorization</a>
<li><a href="#summary">Summary</a>
</ul>
<a name="background">
<h2>Background</h2>
</a>
As described in my
<a href="http://jim-mcbeath.blogspot.com/2018/03/golang-server-polymer-typescript-client.html">previous blog post</a>,
I recently rewrote my image viewer desktop app as a web app,
for which I wrote the web server in go.
<br/><br/>
Since I was adding a new potential attack vector, I wanted to add security;
but since this is only available on my internal network, and it's not
critically valuable data, I did not need enterprise-grade security.
In this post I describe how I implemented a relatively simple
authentication and authorization mechanism, in particular highlighting
the features of go I used that made that easy to do.
For a simple app such as this one, the third of the
<a href="http://jim-mcbeath.blogspot.com/2012/10/role-based-authorization.html#intro">three As</a> of security, auditing,
can be done with simple logging if desired.
<br/><br/>
The code I present here is taken from the github repo for my
<a href="http://www.github.com/jimmc/mimsrv">mimsrv</a> project,
with links to specific commits and versions of various files.
You can visit that project if you'd like to see more of the code
than I present in this post.
<a name="beforeauth">
<h2>Before Auth</h2>
</a>
Go has good support for writing simple web servers. The
<a href="https://golang.org/pkg/net/http/">net/http</a>
package allows setting up a web server that routes
requests based on path to specific functions.
In the first commit for mimsrv, before there was any code for
authentication or authorization, the http processing code looked
like this:
<br/><br/>
In <a href="https://github.com/jimmc/mimsrv/blob/f7c7cf29d9e47b98aa26fbc2b23aa6ad4fa5a38e/mimsrv.go#L29">mimsrv.go</a>:
<pre name="hlcode" class="go"
><div class="code">func main() {
  ...
  mux := http.NewServeMux()
  ...
  mux.Handle("/api/", api.NewHandler(...))
  ...
  log.Fatal(http.ListenAndServe(":8080", mux))
}
</div></pre>
In <a href="https://github.com/jimmc/mimsrv/blob/f7c7cf29d9e47b98aa26fbc2b23aa6ad4fa5a38e/api/api.go#L32">api/api.go</a>:
<pre name="hlcode" class="go"
><div class="code">func NewHandler(c *Config) http.Handler {
  h := handler{config: c}
  mux := http.NewServeMux()
  mux.HandleFunc(h.apiPrefix("list"), h.list)
  mux.HandleFunc(h.apiPrefix("image"), h.image)
  mux.HandleFunc(h.apiPrefix("text"), h.text)
  return mux
}

func (h *handler) list(w http.ResponseWriter, r *http.Request) {
  ...
}
</div></pre>
The above two functions set up the routing and start the web server.
The code in mimsrv.go creates a top-level router (mux) that routes
any request with a path starting with "/api/" to the api handler that
is created by the NewHandler function in api.go. The top-level router
also defines routes for other top-level paths, such as "/ui/" for
delivering the UI files.
<br/><br/>
The api code in turn
sets up the second-level routing for all of the paths within /api
(the h.apiPrefix function adds "/api/" to its argument).
So when I make a request with the path /api/list, the main mux passes
the request to the api mux, which then calls the h.list function.
<a name="authentication">
<h2>Adding Authentication</h2>
</a>
To <a href="https://github.com/jimmc/mimsrv/commit/44a6029b2bb42b15ccec09ff38a96d2bf278fb2f#diff-d8db984a4e139676adfee0fe21c6dc52">implement authentication</a>
in mimsrv, I added a new "auth" package with three files, and
modified mimsrv.go to use that new auth package. The most interesting
part of this change is that it implements the enforcement of the
constraint that all requests to any path starting with "/api/" must
be authenticated, yet I did not have to make any changes to any of
the api code that services those requests.
<br/><br/>
When I originally wrote my request routing code,
it could have been simpler if I had defined everything in one mux.
I didn't do that because I think the approach I took provides better
modularity, but in addition, that structure
made it easy for me to require authentication for all of the api calls.
<br/><br/>
The authentication code itself is not trivial, but wiring that code into
the request routing to enforce authentication for whole chunks of the
request path space was. I wrote a wrapper function and inserted it in the
middle of the request-handling flow for requests where I wanted to require
authentication.
<br/><br/>
To wire in the authentication requirement for all requests starting
with "/api/", I
<a href="https://github.com/jimmc/mimsrv/commit/44a6029b2bb42b15ccec09ff38a96d2bf278fb2f#diff-d8db984a4e139676adfee0fe21c6dc52L37">changed mimsrv.go</a>
to replace this line:
<pre name="hlcode" class="go"><div class="code"
>mux.Handle("/api/", api.NewHandler(...))
</div></pre>
with these lines:
<pre name="hlcode" class="go"><div class="code"
>apiHandler := api.NewHandler(...)
mux.Handle("/api/", authHandler.RequireAuth(apiHandler))
</div></pre>
Here is the RequireAuth method from the newly added
<a href="https://github.com/jimmc/mimsrv/blob/44a6029b2bb42b15ccec09ff38a96d2bf278fb2f/auth/authapi.go#L22">auth.go</a>:
<pre name="hlcode" class="go"><div class="code"
>func (h *Handler) RequireAuth(httpHandler http.Handler) http.Handler {
  return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    token := cookieValue(r, tokenCookieName)
    idstr := clientIdString(r)
    if isValidToken(token, idstr) {
      httpHandler.ServeHTTP(w, r)
    } else {
      // No token, or token is not valid
      http.Error(w, "Invalid token", http.StatusUnauthorized)
    }
  })
}
</div></pre>
The RequireAuth function looks at a cookie to see if the user is currently
logged in (which means the user has been authenticated).
If so, RequireAuth calls the handler it was passed, which in this case
is the one created by api.NewHandler. If not, then RequireAuth calls
http.Error, which prevents the request from being fulfilled and instead
returns an authorization error to the web caller.
When the mimsrv client gets this error it displays a login dialog.
<br/><br/>
The other code I added handles things like login,
logout, and cookie renewal and expiration, but all of
that code other than RequireAuth is specific to
my implementation of authentication. You could instead, for example,
use OAuth to authenticate, in which case you would have a completely
different mechanism for authenticating a user, but you could still
use a function similar to RequireAuth and wire it in the same way.
<a name="authorization">
<h2>Adding Authorization</h2>
</a>
Wrapping selected request paths as described above makes it so that
authentication provides authorization for those requests.
This coarse-grained authorization is a good start, but for mimsrv I
wanted to be able to use fine-grained authorization as well.
As this is a simple program with a very small number of users,
I don't need anything sophisticated such as
<a href="http://jim-mcbeath.blogspot.com/2012/10/role-based-authorization.html">role-based authorization</a>.
I chose to implement a model in which I only define permissions for
global actions, then assign those permissions directly to users.
<br/><br/>
For this simple permissions model, I needed to be able to define permissions,
assign them to users, and check them at run-time before performing an
action that requires authorization. My permissions are simple strings,
stored in a column in the CSV file that defines my users. To give a
permission to a user, I manually edit that CSV file, and to check for
authorization before taking an action, the code looks for that permission
string in the set of permissions for the current user.
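As a sketch of what such a check can look like, here is one way to represent the permission strings and look them up. The field names and the comma-separated storage are illustrative, not necessarily mimsrv's actual representation:

```go
package main

import (
	"fmt"
	"strings"
)

// User is a minimal illustration of a user record loaded from a CSV file.
type User struct {
	Username    string
	Permissions string // the permissions column from the CSV, e.g. "edit,delete"
}

// HasPermission reports whether the user was granted the named permission.
func (u *User) HasPermission(perm string) bool {
	for _, p := range strings.Split(u.Permissions, ",") {
		if strings.TrimSpace(p) == perm {
			return true
		}
	}
	return false
}

func main() {
	u := &User{Username: "jim", Permissions: "edit, delete"}
	fmt.Println(u.HasPermission("edit"))  // true
	fmt.Println(u.HasPermission("admin")) // false
}
```

Granting a permission is then just editing the CSV column, and checking one is a single method call.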
<br/><br/>
The one piece that is not obvious is how to pass the user's permissions
to the code that needs to check them.
The reason this is not obvious is because the http routing package
defines the function signature for the functions that process an http
request, and that function signature includes only the request and
a writer for the response.
You can't simply add another argument in which you pass your user
information, so you have to dig a little deeper to figure out how
to pass along that information.
<br/><br/>
The solution relies on the fact that there is a Context attached to
the Request that is passed to the handler function. By adding the user
info to the Context, you can then extract that information further along
in the processing when you need to check the permission.
<br/><br/>
The RequireAuth function validates that the user making the request is
authenticated, so it already has information about who the user is, and this
is the point at which we want to add the user info to the Context.
We do this in our RequireAuth function by
<a href="https://github.com/jimmc/mimsrv/commit/545b636f536d950ee63facb38a4886e757b369ab#diff-60e27fe84e5a45397cf4fa550ca867acL33">replacing</a>
this line:
<pre name="hlcode" class="go"><div class="code"
> httpHandler.ServeHTTP(w, r)
</div></pre>
with these lines:
<pre name="hlcode" class="go"><div class="code"
>  user := userFromToken(token)
  mimRequest := requestWithContextUser(r, user)
  httpHandler.ServeHTTP(w, mimRequest)
</div></pre>
along with this new helper function:
<pre name="hlcode" class="go"><div class="code"
>func requestWithContextUser(r *http.Request, user *users.User) *http.Request {
  mimContext := context.WithValue(r.Context(), ctxUserKey, user)
  return r.WithContext(mimContext)
}
</div></pre>
When the code needs to know whether the current user is authorized for
an action, it can call the new
<a href="https://github.com/jimmc/mimsrv/commit/545b636f536d950ee63facb38a4886e757b369ab#diff-60e27fe84e5a45397cf4fa550ca867acR56">CurrentUser</a> function,
which retrieves the user info from the Context attached to the Request,
from which the code can query the user's permissions:
<pre name="hlcode" class="go"><div class="code"
>func CurrentUser(r *http.Request) *users.User {
  v := r.Context().Value(ctxUserKey)
  if v == nil {
    return nil
  }
  return v.(*users.User)
}
</div></pre>
<a name="summary">
<h2>Summary</h2>
</a>
While implementing authentication and authorization in a web server takes
more than just a few lines of code, the part that ties it into the http
processing in go is only a few lines. Even so, it took me a while to dig
around and find exactly how to do that piece. I hope this article can save
other people a bit of time when doing their own research on how to add auth
to a go web server.
<h1>Golang server, Polymer Typescript client</h1>
Finally, a web development environment I enjoy using.
<h3>Contents</h3>
<ul>
<li><a href="#background">Background</a></li>
<li><a href="#mimsrv">Mimsrv</a></li>
<li><a href="#what-i-like">What I Like</a></li>
<ul>
<li><a href="#offline-development">Offline Development</a></li>
<li><a href="#simple-mental-model">Simple Mental Model</a></li>
<li><a href="#simple-dependency-management">Simple Dependency Management</a></li>
<li><a href="#simple-compilation">Simple Compilation</a></li>
<li><a href="#type-safety">Type Safety</a></li>
<li><a href="#separation-of-concerns">Separation of Concerns</a></li>
<li><a href="#go-http-support">Go http support</a></li>
</ul>
<li><a href="#room-for-improvement">Room for Improvement</a></li>
<ul>
<li><a href="#polymer-typescript-mismatch">Polymer/Typescript type mismatch</a></li>
<li><a href="#debugging-typescript">Debugging Typescript</a></li>
</ul>
</ul>
<h3>TL;DR</h3>
I have found Go to be a nice tool for developing a small web server,
and Polymer + Typescript to be a nice combination for developing a web UI.
The Go server acts as both the API server and the static content server
delivering the UI pages.
If you think you might want to try this approach, you can look at my
<a href="https://github.com/jimmc/mimsrv">mimsrv</a> program on github
as an example.
If it looks too complicated, browse in the git history back to some of
the earliest commits, such as the
<a href="https://github.com/jimmc/mimsrv/commit/6a9c1172a70e2c6d23a362b0655c39f428c13105">first ui commit</a>
and the
<a href="https://github.com/jimmc/mimsrv/commit/f7c7cf29d9e47b98aa26fbc2b23aa6ad4fa5a38e">first api commit</a>,
to see how things looked at a simpler time.
<a name="background"></a>
<h3>Background</h3>
I have been developing web pages and apps for a long time,
since the earliest days of HTML when there were no tools
more sophisticated than a text editor, and server-side scripts
were the only form of executable web code.
In 1994 I wrote
<a href="http://alumnus.caltech.edu/~jimmc/ftp/htimp/">htimp</a>,
an experiment in how to attach a web browser to an interactive program
with a lifetime longer than a single message.
<br/><br/>
Over the years I tried many technologies, including
<a href="https://docs.oracle.com/javaee/5/tutorial/doc/bnagy.html">JavaServer Pages</a>,
<a href="https://en.wikipedia.org/wiki/JavaServer_Faces">JavaServer Faces</a>,
<a href="http://www.php.net/">PHP</a>,
<a href="https://jquery.com/">jQuery</a>, and others I have forgotten.
Some were better than others (more accurately, some were bad
and some were excruciating),
but I never felt any of them provided a reasonable mental
model for how to put together an application.
<br/><br/>
I was away from the web UI scene for a while, and when I got back to
doing some web development a couple of years ago, things seemed to have
improved quite a bit. In the last year, I have been introduced to a few
technologies that, in combination, provide me with a development
environment with a working mental model of how to put together a program,
and a set of tools that makes it easy to do that at a good clip.
<br/><br/>
The three technologies that together have brought pleasure back to my web
programming are:
<ol>
<li>The <a href="https://golang.org/">Go</a>
language and development environment
<li>The <a href="https://www.typescriptlang.org/">Typescript</a> language
<li><a href="https://www.polymer-project.org/">Polymer-2</a>
(and <a href="https://www.webcomponents.org/">Web Components</a>)
with decorators
</ol>
Below I describe the project on which I tried out these technologies,
followed by a discussion of what I liked about them.
<a name="mimsrv"></a>
<h3>Mimsrv</h3>
<a href="https://github.com/jimmc/mimsrv">Mimsrv</a>
is a web server and UI to view a collection of photos.
It is a replacement for
<a href="https://github.com/jimmc/mimprint">mimprint</a>,
which is a desktop app that I
originally wrote starting in 2001 in Java, and converted to Scala
starting in 2008.
<br/><br/>
A couple of years ago I started looking into
rewriting mimprint once again, this time as a web app.
As a web app, I would no longer have to worry about distributing a
desktop application to the various machines I have on which I wanted to view
my photos.
I also thought I should be able to leverage the web browser's
media capabilities so that I would not have to develop or support
that whole chunk of code.
<br/><br/>
The tools I tried were never nice enough to pull
me in and get me going on that replacement, and I had moved my
rewrite-mimprint project way down on my TODO-list.
<br/><br/>
At Google last year I worked on the open-source
<a href="https://github.com/googledatalab/datalab">Datalab</a> project.
When I started on it, we were using jQuery and Javascript.
I liked it when we converted to Polymer-2 and Typescript,
and I liked it more when we switched to using Polymer decorators.
<br/><br/>
I started learning Go in order to review code from my teammates. It took
a little getting used to, but the more I learned, the more it made sense
to me. I felt it was much easier to understand the existing Go
codebase than similar codebases I had looked at in other languages. It
grew on me, and after I started adding my own Go code to the project, I
was surprised at how much I liked using it, and I felt that I was making
pretty good coding progress.
<br/><br/>
I thought the combination of Go for the server, and Polymer and Typescript
with decorators for the client, worked quite well, and I decided to try it
for my personal project.
So far that combination has worked well for me, and I have been quite
happy with it.
<a name="what-i-like"></a>
<h3>What I Like</h3>
<a name="offline-development"></a>
<h4>Offline Development</h4>
One of my requirements is that I be able to develop when I am offline. I
insist on this because one of the situations in which I have the most
amount of time available for programming on my personal projects is when I
am traveling and often don't have network access.
<br/><br/>
In a previous attempt at putting together a collection of technologies
for developing web apps, some of the pieces used maven, and I was unable
to figure out how to convince it not to go out looking for new versions
of the snapshots I needed every time it compiled.
<br/><br/>
After using Go on a project at work and being pleasantly surprised at how
much I enjoyed using it, I decided to see it if would work for my
personal projects. When I downloaded and installed it, I was delighted to
discover that, not only did the installation provide everything I needed
to compile and run my programs, but it also included all of the documentation
and the Go Tour, so those would all be available to me offline!
<br/><br/>
Similarly, the Typescript and Polymer tools allow just building the
code, without attempting to do any dependency resolution, so can
easily be used offline.
<a name="simple-mental-model"></a>
<h4>Simple Mental Model</h4>
There are a couple of changes to the web app landscape that have made for a
much simpler mental model than in the old days.
The main one is the Single Page Application (SPA).
With the old approach of having to move to a new page every time the
user took an action, saving state across those page changes
required mental and technical gyrations.
With a SPA, you make AJAX calls to the server using XMLHttpRequest,
and just keep your state in variables as in any other program.
<br/><br/>
The SPA model also allows for a clean separation of responsibility
between the server and the client. With Polymer, all of the UI
manipulation is handled in the client, so the server doesn't need to
deal with any kind of templating of client-side functionality.
This means the server can focus on the API and on just delivering the
UI code to the client, and the client can focus on managing the UI
and making API calls.
<br/><br/>
The other big change on the client side is the progress that has been made
on the asynchronous programming model.
At first we had to pass around success and failure callbacks,
which requires splitting code up in unwieldy ways around every
asynchronous call.
The introduction of Promises provided a nice way to avoid
the "callback hell" of deeply nested callbacks,
but still requires chopping your code up around every
asynchronous call.
Lastly, the introduction of the async and await keywords
made asynchronous programming almost as straightforward as
synchronous programming.
I'm particularly impressed that you can do things like have an
if-statement with synchronous code on one side and asynchronous
code on the other side, or a loop with an asynchronous call in it.
This is so much simpler to reason about than if you had to figure out
how to do that with callbacks or even Promises.
<a name="simple-dependency-management"></a>
<h4>Simple Dependency Management</h4>
The few times I had to deal with Maven were unpleasant.
I found it hard to control, hard to configure, and hard to understand
what it was doing. Perhaps it's just that, with the march of time,
people have figured out how to make dependency management better,
but I found the dependency management in both Go and Polymer to be
pleasant to use.
<br/><br/>
In Go, when you need a package, you just say <code>go get <i>package</i></code>,
and it downloads that package and all its dependencies. Assuming you follow
the Go conventions when naming and locating your package, when someone
then wants to download your package, they do the same thing, and Go will
also download all of your dependencies to their system.
<br/><br/>
Polymer-2 uses bower for its package management, and it is almost as easy
to use. The <code>bower.json</code> file lists the packages needed,
and running <code>bower install</code> installs those packages and their
dependencies. When you add a new dependency to one of your Polymer components,
you just run
<code>bower install --save <i>new-package</i></code> to download that
new package, and you're done.
Not quite as effortless as go, but much better than my
experience with maven.
<br/><br/>
For both Go and bower, they don't attempt to download anything except
when you explicitly tell them to with <code>go get</code> or
<code>bower install</code>, which is good for offline development.
<a name="simple-compilation"></a>
<h4>Simple Compilation</h4>
For pretty much my whole programming career, I have been accustomed
to using some kind of build tool that requires a configuration file:
make, ant, maven, sbt, grunt, bazel, gradle, and others.
<br/><br/>
Go is different: it is so opinionated about where you have to put your
packages and how you have to name stuff, that it has all the dependency
information it needs by looking at the source files. You just tell it
to build your program with the command <code>go build</code>, and it
does it. No build config file required.
<br/><br/>
The Typescript compiler and Polymer build commands do require config
files, but they were pretty simple to set up and understand,
and seldom need to be modified. Running <code>tsc</code> compiles all
the Typescript files to Javascript, and running <code>polymer build</code>
packages all the Polymer Javascript and HTML files into a directory
where they are served by the Go server.
<a name="type-safety"></a>
<h4>Type Safety</h4>
I like the compiler to catch as many errors in my code as possible.
Using compile-time types allows the compiler to spot more errors.
This is why I greatly prefer Typescript over Javascript.
<br/><br/>
Go is also a compiled and typed language, so it catches a lot of
problems before execution time.
<a name="separation-of-concerns"></a>
<h4>Separation of Concerns</h4>
While I don't think having to use multiple languages is a benefit,
the ability to select the best tools for different parts of the problem
is.
Go works very well as a web server for API calls and static content.
Most people using Polymer embed Javascript code in their HTML file,
but I prefer using Typescript and am happy putting that in a separate
file from the HTML, where my editor understands it better.
<a name="go-http-support"></a>
<h4>Go http support</h4>
Go has a nice <code>http</code> package that makes it easy to define web routing
and implement handler functions.
<br/><br/>
Because Go supports functions as first-class
values, it's easy to define a function that can take a function as
an argument and return another function. In my case, I used that
approach to create a function that I could use to specify that certain
parts of my API required authentication.
<br/><br/>
I wrote my http handlers to do only the marshaling and unmarshaling
of data and then call the underlying routine that implements the
requested functionality. This made it easy to write unit tests of the
underlying function.
But Go also provides a nice testing package for http handlers that makes it
relatively easy to test the http handler as well.
<a name="room-for-improvement"></a>
<h3>Room for Improvement</h3>
I'm pretty happy with this collection of technologies, but there are
a couple of things I would like to see improved.
<a name="polymer-typescript-mismatch"></a>
<h4>Polymer/Typescript type mismatch</h4>
Polymer decorators are a nice improvement over the previous approach, as
there is now much less boilerplate and repeated code. But I still have to
specify a type in each <code>Polymer.decorators.property</code> line, and
that type is not quite the same as the Typescript type (for example,
string vs String, any vs Object).
<br/><br/>
I suppose this is not that surprising, given that Typescript is not
officially supported by Polymer. Official Typescript support is really
what I would like to see happen.
<a name="debugging-typescript"></a>
<h4>Debugging Typescript</h4>
Writing Typescript rather than Javascript is nice, but when it gets
loaded into the browser it's Javascript, so debugging in the browser
uses the transpiled Javascript.
The Javascript is usually close enough to the source Typescript that it's
manageable, but it would be nice to be able to debug with the Typescript
source code.
<br/><br/>
Maybe this situation will get better when
<a href="http://webassembly.org/">WebAssembly</a> gets implemented.
<h2>FiOS - A Cautionary Tale</h2>
<i>Jim McBeath, 2017-06-22</i>
<br/><br/>
I delayed signing up for Frontier FiOS because I was concerned they
might screw things up. I should have been more concerned.
<br/><br/>
This is a long post. Consider it entertainment.
Or just skip to the <a href="#answers">Answers</a>.
<h3>Contents</h3>
<ul>
<li><a href="#speed">The Need for Speed</a></li>
<li><a href="#situation">My Unusual Situation</a></li>
<li><a href="#questions">Questions</a></li>
<li><a href="#research">Research</a></li>
<li><a href="#ordering">Ordering</a></li>
<li><a href="#trouble">Trouble</a></li>
<li><a href="#mistakes">Mistakes</a></li>
<li><a href="#good-stuff">Good Stuff</a></li>
<li><a href="#answers">Answers</a></li>
<li><a href="#frontiers-problems">Frontier's Problems</a></li>
<li><a href="#timeline">Timeline</a></li>
<li><a href="#quotes">Selected Quotes</a></li>
</ul>
<a name="speed">
<h3>The Need for Speed</h3>
</a>
I have had internet connectivity for decades,
starting back with modems so slow that I knew people who had to pause
in their typing to let the modem catch up.
I appreciated every doubling of speed as each generation of modem
arrived.
I was surprised when modem speeds reached 4800 and then 9600 baud -
how could you get more bits per second than the 3 kHz bandwidth of a phone
line? - and I was astounded by the jump to 56K modems.
<br/><br/>
When DSL came out, I waited impatiently for it to be available in my
neighborhood, and signed up as soon as I could. After years of using a
56K modem, my 740Kbps DSL line was satisfyingly fast.
<br/><br/>
I lived with 740Kbps for six years, until one day my DSL modem broke.
While researching new modems, I learned that I could have my service
switched from Frame Relay to ATM and bump up my speed to 3Mbps.
Normally this would mean my service would be out for 10 days while they
did that, but since it was already out, it seemed like a good time
to make the switch.
The speed bump from 740Kbps to 3Mbps was a mere 4x, far less than the
13x increase from the 56K modem to 740Kbps DSL, but still, 3Mbps
was satisfyingly fast.
<br/><br/>
Verizon actually offered FiOS in my neighborhood fairly early, but
I was pretty happy with my DSL service, and I wasn't doing anything that
I thought needed more than 3Mbps bandwidth. I remained happy with my
3Mbps for over a decade. But technology marches on; I bought some HDTVs,
started watching more YouTube, and started working from home more often
using bandwidth-hungry remote desktop applications. My 3Mbps
connection was not sufficient to stream HDTV movies and YouTube clips,
and my remote desktop experience was annoyingly slow. I was finally
feeling the bandwidth squeeze.
<br/><br/>
Still, I delayed upgrading to FiOS. I had heard that I would have to
give up my copper-wire land lines, which I was not keen to do. Some years
ago our power was out for over a week; batteries everywhere ran down -
even the local cell towers ran out of juice after a few days, so there
was no cell service in our neighborhood - but, with our copper wires, we
had phone service the whole time. I liked that.
<br/><br/>
In addition, by this time Verizon had sold to Frontier, and based on
my experience and anecdotes I read, I was concerned that Frontier would mess
something up when dealing with my service change request, particularly
since my situation was rather unusual.
<a name="situation">
<h3>My Unusual Situation</h3>
</a>
In my case, there were a number of things about my situation that gave
me pause when thinking about asking for any kind of change.
<ol>
<li>I have a land line. This has become increasingly rare, and it seems
Frontier is deprioritizing phone service so they can focus
on providing internet and television service. It seems they want
to provide packages that include everything, or at least include both
internet and television.
<li>Actually, I have two land lines. I'm not sure I know anybody else who
has two land lines at home any more. I used to have three, but finally
got rid of the third line after disposing of my last FAX machine years ago.
Although I have two lines, they have not both shown up on my monthly
bill for many years now. Oh, I am paying for two lines, it's just that
the second line is not itemized anywhere. If you didn't know a priori
that I had two phone lines, you would hardly be able to tell that by looking
at my phone bill. Based on conversations I have had with support and
billing people at Frontier, it's not obvious to them either, although
after I point it out, and with enough digging, some of them
could figure it out.
<li>I had a DSL line. As I mentioned, I delayed for quite some time in
switching from DSL to FiOS. The longer I delayed, the fewer people had
DSL lines, and the less Frontier cared about them. For this particular
problem, I suppose my delaying upgrading perhaps made things worse.
<li>My DSL service provider was not Frontier. This caused a fair amount
of frustration any time I had a service issue with my DSL line.
</ol>
My DSL service was perhaps the most unusual part of my situation.
Back in 1999, when I originally ordered DSL, GTE (yes, it was that long
ago!) had partnerships with Internet Service Providers who provided the
actual internet service. These ISPs are known as CLECs (Competitive Local
Exchange Carriers). So GTE provided the line, and my selected ISP
provided the internet service. Originally I paid GTE directly for the line
and I paid the ISP for the internet service. But when I switched from
Frame Relay to ATM service, my billing also changed so that I paid
everything to the CLEC, and they paid Verizon for the line.
<br/><br/>
Back then many of the carrier ISPs had
annoying policies such as blocking some ports, so it was nice to be
a customer of a smaller ISP that was more interested in making its
customers happy. The downside was that, whenever there was a service
problem, I had to deal with two companies, and they each tended to say it
was the other company's problem.
<br/><br/>
By the time I was considering switching
from DSL to FiOS this year, it had become perhaps comically bad:
when I talked to support and billing people at Frontier, they were
completely unaware that I had DSL service on my Frontier phone line,
and even with a lot of digging, nobody I talked to at Frontier this year
was ever able to find even a trace of information about my DSL line.
<br/><br/>
On the other side, my ISP had been acquired multiple times over the years,
each time by a larger and more remote company, until by this year they
were no longer in the DSL business and no longer in the residential ISP
business. Somehow through all this, my residential DSL line kept working,
but I did start to feel I was skating on ever-thinning ice.
<br/><br/>
It was time to take the plunge and upgrade.
<a name="questions">
<h3>Questions</h3>
</a>
Before ordering FiOS service, I wanted to get the answers to four questions:
<ol>
<li>What service options do I have?
<li>What equipment will be installed and where?
<li>What is the installation process?
<li>How much will it cost?
</ol>
How does one answer questions like these in today's world?
Hit up the internet, of course.
<a name="research">
<h3>Research</h3>
</a>
<a name="website">
<h4>Frontier's Web Site</h4>
</a>
I started by browsing <a href="https://frontier.com">Frontier's web site</a>
looking for information about their service offerings.
<br/><br/>
NOTE: Frontier's offerings are regional, so you may see different web pages
than what I describe.
<br/><br/>
Their <a href="https://frontier.com/shop/internet/fios">FiOS</a> page
shows four levels of service: 50Mbps, 75Mbps, 100Mbps or 150Mbps.
Since I wanted both internet and phone service, I headed over to the
<a href="https://frontier.com/shop/bundles/">bundles</a> page to see
what I could get. I don't want their television service, so
I unchecked the "Video" box. This shows three bundles: two that include
30Mbps internet service (30? that's not one of the speeds listed on
their FiOS page!) and one that includes 50Mbps.
Do they offer bundles that include internet service faster
than 50Mbps? Their web site doesn't say.
<br/><br/>
Their <a href="https://frontier.com/shop/phone/">Phone</a> page shows
me information about copper-line phone service (lower in the page it says
"Our reliable copper power stays on even when the power goes out or
in an emergency"), where they list two plans that differ by $3.
Confusingly, this phone service - you know,
<a href="https://en.wikipedia.org/wiki/Plain_old_telephone_service">POTS</a>
using analog signals on copper wires - is called "Digital Essentials."
Are there any other optional add-ons? There are a fair number of features
bundled with the basic phone service, and a lot more bundled with that
extra $3, but is that it? Like, I currently have an unlisted phone number,
what's the charge for that? Sorry, that kind of stuff is not on their
web site.
Ah, here on the
<a href="https://frontier.com/shop/phone/phone-challenger/dpu-challenger">
Digital Phone Unlimited</a> page, it says
"Optional international calling packages are available for great savings",
so apparently there are other options available - but it's not a link,
so I have no idea what kind of packages they might offer.
<br/><br/>
How about a second phone line, how much does that cost? Sorry, that's
not on the web page. VoIP? Oh, maybe you mean FiOS Digital Voice.
Beats me what the scoop is on that.
If you go to Frontier's
<a href="https://frontier.com/shop/bundles/fios">FiOS Bundles</a> page,
where it says the phone service in their bundles is Digital Phone Unlimited,
and you click on the Learn More button for the phone service,
it takes you to
<a href="https://frontier.com/shop/phone/phone-challenger">that phone page</a>
I mentioned above - you know, the one that says
"Our reliable copper power stays on even when the power goes out or in an emergency."
So, if I get a bundle that includes FiOS internet, does that bundle include
Digital Phone Unlimited running on copper wires?
<br/><br/>
The details of the above web pages are what they look like now, in June 2017.
I believe they have changed since I did my initial research a few months ago,
but the gist is the same: I was unable to figure out
what options were available to me by reading their web site.
<br/><br/>
Besides looking at Frontier's web pages,
I did a lot of Googling and browsing of other web sites.
I learned a lot in general about equipment, but it was hard to know
how much of it would apply to my situation.
Although I did not record the time I spent browsing Frontier's and
others' web sites, I estimate it was probably about five hours.
<br/><br/>
It was time to move on to online chat to get more answers.
<a name="chat">
<h4>Online Chat</h4>
</a>
I had six online chats with Frontier, totaling about 4 1/2 hours.
Between each chat I did more online research, looking for details about the
equipment and the installation experience both inside and outside the house.
It was difficult to get a good handle on these details, particularly since
I sometimes got conflicting answers from the Frontier people I chatted with.
<br/><br/>
For example, one of my questions was whether I could keep my copper phone
lines, or whether I would be required to switch my phone service to fiber.
One of the people I chatted with said this:
<blockquote>You would have to switch to a digital phone service ! Voip. Which basically means Voice over Internet
</blockquote>
Another one said I could keep my phone service on copper and get their
"Simply FiOS" service, which is fiber with only internet service.
<a name="ordering">
<h3>Ordering</h3>
</a>
Once I reached the point where I felt I had answers as good as I could
get - which admittedly were not always very good - it was time to place
my order.
<br/><br/>
On April 9 I called Frontier to place my order.
I would say the fact that it took me well over an hour to place my
order was the first hint of <a href="#trouble">trouble</a>, but in
truth there were plenty of hints during the many <a href="#chat">chats</a>
I had, where I was not getting consistent answers.
<br/><br/>
Part of the reason the phone call took so long was due to my
<a href="#situation">unusual situation</a>.
The DSL was not much of an issue during ordering, since it was
completely invisible to them and they couldn't do anything about it.
The real trouble was that second phone line. Figuring out how to deal
with that took probably 45 minutes.
<br/><br/>
When I asked if I was required to switch my phone service from copper
to fiber, the service rep first said no, but then went and asked someone
else, came back and said yes, I would have to switch. I would have preferred
to keep my phones on copper (and especially I would have preferred it given
how much trouble I have had with the switch), but I was not given that option.
So I placed the order to switch both of my phone lines over to fiber.
<br/><br/>
At some point I learned that each phone number at Frontier is on a separate
account. This was completely invisible to me because both of my phone lines
are billed on the account for my primary number, so that's the only account
I see. Some of the Frontier people I talked to were able to find the separate
account for the secondary line, but it always seemed to take them a while.
In the end, I think that the fact that the secondary phone was actually a
separate account has saved me some hassle with it: because it was on a
separate account, the order to change the second phone over to fiber was
done with a separate work order, scheduled for the day following the
primary work order. Once the trouble started, I was able to cancel that
second work order before anything was done to the second line; but the
work had already started on the first line, and that has been the headache.
I wonder now if there was any way I could have convinced them to just
treat the internet service as a new internet-only account and so leave
the phone lines and their account completely untouched.
<br/><br/>
I was pleased that my installation was scheduled very quickly, just two
days later, on April 11. I should not have been. As it turned out, I did
not actually get my FiOS service until April 18.
<br/><br/>
I wonder, had I known then what I know now, what I might have been able
to do to avoid any of the troubles I have had.
<a name="trouble">
<h3>Trouble</h3>
</a>
On the morning of April 11, I was a bit surprised that the installer did
not call first to confirm I was home before coming by. When he arrived,
I learned why: although he had two different phone numbers for me, somehow
he had typos in both of them. These two numbers were for my
two Frontier phone lines. I would have thought the computer would have
just copied those numbers into the work order, but I assume now that a
person manually put in those phone numbers, and somehow got them both wrong.
<br/><br/>
Unfortunately, the person who took my order scheduled my installer visit
without first scheduling the preceding two steps of the
<a href="#installation-process">installation process</a>.
As a result, when the installer came out for the April 11 appointment,
he was unable to do his work, and had to leave having done nothing.
<br/><br/>
Before that first installer left, he told me he would call in the work
orders to do the steps that should have been done before he got there.
He might have done this right away, but when I called Frontier a little
bit later that day, I was still unable to reschedule the installation
because they didn't have the notes from that day's work order yet.
So I had to wait and call back a couple of days later.
<br/><br/>
I was disappointed that I would have to wait longer to get my
fast internet service, but that was just a mild disappointment.
What was more annoying was that my DSL service went out on April 13,
two days after that original installation date.
<br/><br/>
As I mentioned above, due to my unusual DSL situation, it was very
difficult for me to get anyone to take any action on my DSL line.
I called my ISP, and they said everything looked fine to them. I called
Frontier and they couldn't help me at all; they had absolutely zero
visibility into my DSL service. One tech said he would run a DSL line
check on my line, but the computer wouldn't let him because it said there
was no DSL service on my line.
<br/><br/>
My ISP suggested that my DSL modem may have died, and while I admit that is a
possibility, the timing of the outage, plus the fact that the modem lights
indicated no DSL carrier, leads me to believe that the work order to switch
my copper line to fiber triggered some follow-on internal work order to
turn off the DSL on that line, and because my DSL service was invisible to
everyone who looked at my account, they had no way to manage that internal
work order.
<br/><br/>
After a few frustrating and fruitless phone calls trying to get my DSL line
fixed, I decided to forget it and hope that my new fiber internet connection
would be running soon. In the meantime, I tethered my computer to my phone
when I wanted to use the internet, so I did not have to suffer internet
withdrawal while waiting for FiOS. Ironically, this gave me a faster
connection than my 3Mbps DSL line, although I never got it working as
a gateway for my entire LAN, so I could only use it on one computer at a time.
<br/><br/>
The first step in the installation process is for the utility locators to
come out and spray lines marking the location of the existing utilities so
that the people burying the fiber don't damage any existing buried utilities.
Two days after the aborted initial installation appointment,
on the same day my DSL service went out,
various
<a href="https://en.wikipedia.org/wiki/Utility_location#Color-coding"
>colored lines</a>
started appearing in my front yard marking the utilities.
The following morning the fiber installers came out and buried the fiber
cable running from the curb to my house (yay!).
BUT - that afternoon, yet another utility locator came out to locate more
utilities. So the fiber installers jumped the gun by installing the fiber
before all of the utilities were located. Fortunately, they did not
damage any of the unlocated utilities, so although they did not follow
the prescribed procedure, at least no harm resulted from that mistake.
<br/><br/>
On April 18, now that the fiber was in place, the second installer came out to
finish the installation. In about two hours he installed all the equipment
and got the FiOS internet service working (yay!). For much of the next hour
he worked over the phone with a technician trying to get the primary phone
line working over fiber. After some discussion with me, they finally gave up and
moved the phone line back to copper.
<br/><br/>
I was perfectly happy keeping my phone service on copper, as that's what I had
originally wanted anyway. If only it had been so easy.
<br/><br/>
I learned from the installer that the second phone line was on a separate
work order, to be moved from copper to fiber the next day. Given that they
were unable to move the first line, and were willing to keep it on copper
(I thought), I called and canceled the service call that was scheduled for
the next day. I'm pretty sure doing that has saved me a lot of grief on
my second phone line, as so far I have not had any problems with it, and
it has continued working just fine on copper, as well as being billed properly.
<br/><br/>
On April 25, one week after the FiOS installation, I learned that my
primary phone was not working properly. It may be that it stopped working
a day or two sooner, but this is the day I realized it. It was broken in
a strange way: I could place outgoing calls, and I could receive incoming
calls from another phone number in the same exchange, such as my second
phone line, but calls from outside the exchange would not go through.
When I called from my mobile phone, which has a different area code, I
could hear a ringback on my mobile, but my landline never rang. When I
called from my wife's mobile phone, which is in the same area code but not
in the same exchange, I immediately got a message saying "Your call can
not be completed." I spent a couple of hours on the phone with Frontier
over this.
<br/><br/>
On April 30, five days later, they finally managed to get the phone
working again. We got a call at 8:15am that Sunday morning from a repair
man testing to see if the line was working. Fortunately, we were already
awake.
<br/><br/>
Two days later, on May 2, the phone service went out again, in the same
way. Another hour on the phone with Frontier, and this time it "only" took
them two days to get it fixed.
So far, from then until now (mid-June), the phone service has not gone out
again, so I am hopeful that they really have fixed it.
<br/><br/>
On May 8, I received my first bill from Frontier since getting my new
FiOS service. It had a couple of minor errors on it, which I was able to
deal with on the phone to Frontier in about 15 minutes.
<br/><br/>
On June 7, I received my second bill from Frontier since getting my new
FiOS service. This one had more serious problems, and I spent closer to
an hour on the phone with Frontier.
The most significant problem is that, although my phone service never
got switched over to fiber, which also would have included switching to a new
service plan, the billing <i>did</i> get switched to the new plan.
My old plan was $18.90/month, the new plan is $30.99/month. So I am
being charged an extra $12.09 for <i>exactly the same service</i>
that I was getting before the FiOS installation. The billing person
I talked to told me she was unable to change my phone service back to
the old plan because I had been grandfathered in at that old rate.
I assume the computer did not provide her any way to go back to that
grandfathered rate.
<br/><br/>
So here I am, two months after ordering FiOS, trying to figure out what
I should do about my phone service.
Try harder to get it back to the old rate? Try to get it changed to
the service plan I am now being forced to pay for?
<br/><br/>
Or maybe I should just cancel it.
Who has land lines these days anyway?
<a name="mistakes">
<h3>Mistakes</h3>
</a>
Here is a list of what I believe are the mistakes Frontier made that
led to the above trouble.
<ul>
<li>When taking my order, the service rep scheduled the equipment installation
without first scheduling utility location and fiber installation
<li>Both of my phone numbers were entered incorrectly in the original
work order
<li>The fiber installers buried the fiber before all of the utilities
were located
<li>When the original installation was postponed, the order to disconnect
my DSL service was not also postponed
<li>When the installer was unable to move the phone service to fiber,
and kept it on copper, he should have canceled the rest of the service
order for moving the phone service (although I suspect the computer
would not have let him do that, since he had already done some of the
work on it)
<li>When the phone went out the first time, and the repair man got it working
again, he must have missed some piece of the puzzle, since it went out
again two days later
<li>Given that the phone service never actually got switched to the new
plan on fiber, the billing likewise should not have changed
</ul>
<a name="good-stuff">
<h3>Good Stuff</h3>
</a>
While I think far more has gone badly than is reasonable, not everything
has gone wrong. In fairness, I list here some good things.
<ul>
<li>The fiber installer did a very nice job burying the fiber line from
the curb to the house. We could hardly see where they ran it,
including through sod, and even where they had to run it under a bed
of solid pachysandra, they only damaged a strip a few inches wide.
<li>The equipment installer cheerfully ran ethernet cable from the ONT,
across the ceiling in my garage, through a wall, into my network
equipment closet, and to a wall-mounted jack.
<li>My 100/100 internet service came up smoothly on the (second) scheduled
date, and has been working well ever since.
It is satisfyingly fast.
<li>When I run speed tests, I consistently do get 100Mbps both up and down.
<li>The Arris wifi router they included in the installation was actually
pretty nice (although it would be better if there were some documentation
available somewhere). If I were a less technically demanding customer,
I would probably still be using it.
<li>Both installers who came to my house were friendly and competent.
A few of the tech support people I talked to also seemed quite competent.
<li>Almost everyone I have communicated with at Frontier has been friendly
and has (as far as I can tell) tried their best to help me.
They always let me stay on the line asking questions as long as I wanted
to; I never felt anyone was trying to get me to hang up.
<li>I have not had any trouble getting credits applied to my bill.
</ul>
<a name="answers">
<h3>Answers</h3>
</a>
This section lists what I think are the answers to the
<a href="#questions">four questions</a> I started with.
<a href="https://www.google.com/search?q=ymmv">YMMV</a>:
service, equipment, processes and prices may vary across
regions and over time, and depending on your situation.
<h4>What service options do I have?</h4>
Sadly, I can't give you good answers here, so you will probably have to
call or chat with Frontier and experience your own frustration at getting
a different answer each time.
<br/><br/>
I do, however, have a few things to point out.
<br/><br/>
One point, that was always unclear to me when researching
FiOS, is that
there is no technical reason you cannot keep your copper-wire phone
along with FiOS.
The fiber line is installed completely independently of the copper wires,
and the service is likewise independent.
Frontier may tell you that you must switch your phone service over to
fiber service (either
<a href="https://en.wikipedia.org/wiki/Time-division_multiplexing">TDM</a>,
in which the phone signal is sent over the
fiber separately from the Internet signal, or
<a href="https://en.wikipedia.org/wiki/Voice_over_IP">VoIP</a>, where it is sent
on top of the Internet signal), but that is purely a business issue.
<br/><br/>
A possible sticking point is the way Frontier handles their accounts: if
your phone service is on the same account as your FiOS internet service,
they are constrained as to what the computer will let them do with that
phone service. If you want to keep your copper phone lines and they are
telling you you can't, perhaps you can ask to put the Internet service on
a separate account. You can then ask to have both accounts billed
together. But you might lose out on some bundling discounts this way.
<br/><br/>
One of the differences between POTS over copper wires and VoIP is that
POTS is regulated phone service, but VoIP is not.
More specifically, under the
<a href="https://en.wikipedia.org/wiki/Telecommunications_Act_of_1996"
>Telecommunications Act of 1996</a>
VoIP is considered an information service rather than a communications service,
the upshot being that you don't have the same level of guarantees as POTS,
which is regulated as a communications service.
However, IANAL, and I was unable to determine whether or how later laws
may have modified this situation,
or whether those regulations are still being enforced,
so this may be a moot point.
<a name="equipment">
<h4>What equipment will be installed and where?</h4>
</a>
Not including the fiber from the street to your house that gets buried as
part of the <a href="#installation-process">installation process</a>,
the installer installs three pieces of equipment:
<ol>
<li>The
<a href="https://en.wikipedia.org/wiki/Network_interface_device#Optical_network_terminals">ONT</a>
(Optical Network Terminal), which converts between the optical
signal carried on the fiber and the electrical signals used in the house.
The ONT has the following connections:
<ul>
<li>An optical connection that gets connected to the fiber from the street
<li>Two <a href="https://en.wikipedia.org/wiki/Modular_connector#8P8C">8P8C</a>
(RJ45) ethernet jacks for the internet connection
<li>Two
<a href="https://en.wikipedia.org/wiki/Registered_jack#RJ11.2C_RJ14.2C_RJ25_wiring">RJ-11</a>
jacks for phone connections
<li>A coaxial connector for the cable connection
</ul>
The ONT can be configured to provide internet service either through the
8P8C connector on a standard ethernet cable, or through the coaxial
cable using
<a href="https://en.wikipedia.org/wiki/Multimedia_over_Coax_Alliance">MOCA</a>.
<br/>
The ONT is typically mounted on the outside of the garage.
The fiber from the street is routed first into a holding box, typically
mounted behind the actual ONT, where the excess cable is wrapped in big
loops to take up all the slack, then from there it enters the ONT.
<li>A power supply that includes a small battery backup for the ONT.
This is typically mounted inside the garage, ideally just opposite where
the ONT is mounted on the outside, and near a power outlet. The installer
will then drill a hole through the garage wall to feed through the
power wire from the supply to the ONT, and possibly another to bring
the ethernet and coaxial cables into the garage if they will be routed
through the garage.
By default, the battery backup provides power only for the phone lines.
It can be hacked to provide power for the internet portion of the ONT,
or you can just buy your own
<a href="https://www.amazon.com/s/?field-keywords=ups">UPS</a>
and plug the ONT power supply into that
(although Frontier recommends plugging the ONT power supply directly
into an outlet).
<li>A MOCA-capable router. In my case this was an
<a href="http://www.arris.com/">Arris</a> NVG468MQ, which is
a reasonably nice wireless router, except that they didn't give me a
manual, and I was unable to find anything of substance online.
The router has the following connections:
<ul>
<li>A WAN ethernet port
<li>Four LAN ethernet ports
<li>A coax connector in case the internet signal is being
supplied using MOCA
<li>A four-wire RJ-11 phone jack for up to two phone lines
</ul>
If you have a good installer, they should be willing to let you decide
where you want to put your router, and run ethernet cable (or coax if
using MOCA) to that location, including drilling holes and installing
a wall jack.
</ol>
The internet signal from the ONT to the router can run either over an
ethernet cable or over a coax cable. If you are getting TV service, they
will have to run a coax cable for it. If your internet service
is slower than 100/100, it is possible to run the internet service over that
same cable to the MOCA-capable router. If your internet service is 100/100
or faster, you probably want to run it over an ethernet cable; and since you
might want to upgrade to 100/100 or faster service later, you should
probably have them install that ethernet cable now anyway and have
them run the internet signal through it to the router.
Plus, that gives you the option of replacing their router with one of
your own choice that doesn't do MOCA.
<a name="installation-process">
<h4>What is the installation process?</h4>
</a>
Installation of new FiOS service - not including preliminary research,
placing the order, and post-installation followup to correct problems -
consists of three sequential steps:
<ol>
<li>Locate existing utilities: one or more people come out with metal
detectors that they use to locate existing utilities such as power,
water, sewer, gas, phone, and cable, and paint different
<a href="https://en.wikipedia.org/wiki/Utility_location#Color-coding"
>colored lines</a>
marking those locations so that the fiber installers don't accidentally
damage the existing utilities.
<li>Bury fiber from curb to house: a fiber installer puts in that last piece
of fiber from the drop point (by the street near your house) to your
house, typically to the garage. In the other direction, the fiber at
the curb runs to a nearby junction box, where the installer connects
it to an available port.
At this point a signal is available at the fiber end by the house.
<li>Install equipment outside and inside the house: an equipment installer
installs the
<a href="#equipment">equipment</a> on the outside of your house and
inside your house, and connects everything up.
If you have existing POTS service and
are switching to FiOS phone service, the phone lines that lead into
the house are disconnected from the old copper lines and connected to
the output of the ONT.
The installer calls the plant and works with them to bring up
the services you have ordered.
</ol>
<a name="cost">
<h4>How much will it cost?</h4>
</a>
Perhaps because I am a long-time customer, Frontier did not charge me any
kind of installation fee, which was nice. I don't know if that is standard.
One person told me the regular installation fee is $80.
<br/><br/>
For the monthly fees, it may cost significantly more than you expect.
<br/><br/>
Frontier
<a href="https://frontier.com/shop/internet/fios/simply-100">advertises</a>
their 100/100 internet service as $60 per month.
They have not yet managed to send me a clean monthly bill since my
upgrade, but based on my estimate of what that monthly amount is going to
be, I believe the effective cost of my 100/100 service is actually
over $100 per month.
Here's how that breaks down:
<ul>
<li>The $60 rate is only if you sign a two year contract and only for the
first six months. This is stated in the fine print on their web page,
along with "Equip. and other fees apply."
I did not sign a contract, so my monthly fee is $85.
<li>After Frontier told me I was required to change my phone service to
a new plan, and then failed to deliver that plan, my old grandfathered-in
rate of $18.90 disappeared and was replaced by the $30.99 rate for
<a href="https://frontier.com/shop/phone/phone-challenger">Digital Phone Unlimited</a>,
despite the fact that I don't actually have that service.
So I am currently paying an additional $12.09 per month for exactly
the same phone service that I had before ordering FiOS internet service.
<li>Taxes look like they will be about an additional $6 per month.
</ul>
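To see how the advertised $60 turns into over $100, here is a
back-of-the-envelope check (a Python sketch using the figures above; the
tax amount is my rough estimate, not a number from Frontier):

```python
# Rough effective monthly cost of the "advertised $60" 100/100 service.
# All figures come from the breakdown above; taxes are an estimate.
internet_no_contract = 85.00  # no-contract rate instead of the $60 promo
phone_increase = 30.99 - 18.90  # new Digital Phone rate minus old rate
taxes_estimate = 6.00  # approximate additional taxes per month

effective_monthly = internet_no_contract + phone_increase + taxes_estimate
print(f"${effective_monthly:.2f}")  # → $103.09, i.e. over $100/month
```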
One other annoyance relating to cost: Frontier offered me a $100 gift card
for signing up with them for FiOS internet. When I went to activate the
gift card on their web site, I was presented with a terms and conditions
screen requiring me to agree to a new 1 year term agreement. I had chosen
not to sign a contract and to pay $85/month rather than $60/month, so it
felt kind of like they were trying to pull a fast one on me by hoping I
would activate the gift card without reading the fine print.
<a name="frontiers-problems">
<h3>Frontier's Problems</h3>
</a>
<ul>
<li>Frontier's web site does not provide very good information about
what service options are available.
<li>If you call their Customer Service outside of their working hours,
you get a message telling you they are closed, but that message does
not tell you when they are open, and it's not an easy thing to find
on their web site.
<li>Different people at Frontier will give you different answers to the same
questions. For example, I asked whether I would
need to upgrade my copper-wire phone service to fiber; some said
yes, some said no. Or sometimes first one answer then the other.
One person suggested I put my phone service on a separate account
from my internet service; another told me I could not do that.
<li>Frontier's phone bills provide tons of details about taxes, but almost
no details about regular charges. For example, I have two phone lines,
and for most of the last few years they were billed as one line item
labeled "Residence Line", with no indication that there were two lines.
<li>Frontier's computers significantly constrain what their people can see
and do. Or maybe their programs are just really hard to use.
The customer service reps can't see the details of service calls,
and the service techs can't see the account details. It is apparently
not obvious when a customer has multiple accounts being billed together.
And nobody could see anything about my DSL line.
</ul>
<a name="timeline">
<h3>Timeline</h3>
</a>
<table border=1 style="table-layout: fixed;">
<tr>
<th style="width:7em">Date</th>
<th>Event</th>
</tr>
<tr>
<td>2017-03-02 Th</td>
<td>Online chat #1 with Frontier (43 minutes)</td>
</tr>
<tr>
<td>2017-03-06 Mo</td>
<td>Online chat #2 with Frontier (23 minutes)</td>
</tr>
<tr>
<td>2017-03-15 We</td>
<td>Online chat #3 with Frontier (55 minutes)</td>
</tr>
<tr>
<td>2017-03-18 Sa</td>
<td>Online chat #4 with Frontier (20 minutes, then cut off)</td>
</tr>
<tr>
<td>2017-03-20 Mo</td>
<td>Online chat #5 with Frontier (estimated 20 minutes)</td>
</tr>
<tr>
<td>2017-03-21 Tu</td>
<td>Online chat #6 with Frontier (1 hour and 38 minutes)</td>
</tr>
<tr>
<td>2017-04-09 Su</td>
<td>Phone call with Frontier to order FiOS, service scheduled for Apr 11
(1 hour and 17 minutes)
</td>
</tr>
<tr>
<td>2017-04-11 Tu</td>
<td>Installer came out, but couldn't do anything because the fiber from
the curb to the house had not yet been buried
</td>
</tr>
<tr>
<td>2017-04-11 Tu</td>
<td>Called Frontier to reschedule installation, was told the current
installer has not yet entered his notes, please call back in 24 hours
(12 minutes)
</td>
</tr>
<tr>
<td>2017-04-13 Th</td>
<td>DSL service died at about 12:30pm
</td>
</tr>
<tr>
<td>2017-04-13 Th</td>
<td>Utility locators started painting colored lines where existing services
are buried
</td>
</tr>
<tr>
<td>2017-04-13 Th</td>
<td>Called Frontier to try to get DSL line fixed (24 minutes)
</td>
</tr>
<tr>
<td>2017-04-14 Fr</td>
<td>Fiber installers installed the curb-to-house fiber (before all the
locators had painted their lines)
</td>
</tr>
<tr>
<td>2017-04-14 Fr</td>
<td>Another locator came out to paint lines; when I pointed out that
the fiber had already been installed, he stopped painting, took
his final photos, and left
</td>
</tr>
<tr>
<td>2017-04-14 Fr</td>
<td>Called ISP to try to get DSL line fixed (12 minutes)
</td>
</tr>
<tr>
<td>2017-04-14 Fr</td>
<td>Called Frontier (multiple times) to check on status of FiOS order
(the fiber was installed this morning, but they said
the order had not yet been updated to show that)
(8 minutes + 13 minutes + 12 minutes + 25 minutes)
</td>
</tr>
<tr>
<td>2017-04-15 Sa</td>
<td>Called Frontier to check on the status of my FiOS order
(8 minutes)
</td>
</tr>
<tr>
<td>2017-04-18 Tu</td>
<td>Installer came out and completed the physical installation of
the equipment, got the FiOS internet service working.
He was unable to get the phones working over fiber, so switched
everything back to copper and left, with everything working
(3 hours and 10 minutes)
</td>
</tr>
<tr>
<td>2017-04-18 Tu</td>
<td>Called Frontier, canceled the remaining order to move the second
line over to fiber (scheduled for tomorrow)
(7 minutes)
</td>
</tr>
<tr>
<td>2017-04-25 Tu</td>
<td>Our main line stopped working and could not be reached from
outside our exchange
</td>
</tr>
<tr>
<td>2017-04-25 Tu</td>
<td>Called Frontier to report our main phone line not working
(44 minutes)</td>
</tr>
<tr>
<td>2017-04-26 We</td>
<td>Called Frontier to continue discussions about non-working phone
(1 hour and 35 minutes)</td>
</tr>
<tr>
<td>2017-04-30 Su</td>
<td>Received a call from Frontier at about 8:15am this morning
on the main line, he said it was now fixed (1 minute)</td>
</tr>
<tr>
<td>2017-05-01 Mo</td>
<td>Phone seems to have been working today, we received at least one
incoming phone call</td>
</tr>
<tr>
<td>2017-05-02 Tu</td>
<td>Called Frontier in the morning because my main phone was not working
again (39 minutes, then was cut off)</td>
</tr>
<tr>
<td>2017-05-02 Tu</td>
<td>My wife called Frontier mid-day about the non-working phone
(15 minutes)</td>
</tr>
<tr>
<td>2017-05-02 Tu</td>
<td>Called Frontier in the evening to continue the call from this morning
(14 minutes)</td>
</tr>
<tr>
<td>2017-05-04 Th</td>
<td>Frontier called, the line is working again</td>
</tr>
<tr>
<td>2017-05-08 Mo</td>
<td>Called Frontier to have them correct errors on my April bill
(the first received since I started FiOS service) (12 minutes)</td>
</tr>
<tr>
<td>2017-06-07 We</td>
<td>Received second bill since switching to FiOS - still wrong</td>
</tr>
<tr>
<td>2017-06-14 We</td>
<td>Called Frontier to deal with problems on my May bill (48 minutes)</td>
</tr>
</table>
<br/>
Total time (as of June 14): 20.3 hours
<ul>
<li>Web research: 5 hours
<li>Chat: 4.3 hours
<li>Place order: 1.3 hours
<li>Installer: 3.2 hours
<li>Followup phone calls (through June 14): 6.5 hours
</ul>
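Those line items do add up to the stated total:

```python
# Sanity check: the per-category hours above sum to the 20.3-hour total.
hours = {
    "Web research": 5.0,
    "Chat": 4.3,
    "Place order": 1.3,
    "Installer": 3.2,
    "Followup phone calls": 6.5,
}
total = round(sum(hours.values()), 1)
print(total)  # → 20.3
```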
<a name="quotes">
<h3>Selected Quotes</h3>
</a>
I took notes on all my phone calls with Frontier, including writing down
certain things verbatim. For your entertainment, I present here some of those
quotes, in no particular order. I will let you imagine the context.
<ul>
<li>That is very confusing.
<li>Why can't I see that one?
<li>I don't know why they didn't just leave it alone.
<li>The program is wrong.
<li>How are you an R-U out of Washington?
<li>... and that's what I'm not seeing.
<li>We don't do these very often.
<li>Within our system we have nine different portals where we have to test things.
<li>This is very new to me, I have never dealt with two lines like this.
<li>Sorry this is taking so long, we'll get it figured out for you.
<li>It's not giving me anything.
</ul>
<br/><br/>
<h2>The Rule Of Law</h2>
<i>2014-12-21</i>
<br/><br/>
A layman's view of The Rule of Law. IANAL.
<br />
<h3>
Contents</h3>
<ul>
<li><a href="#this-i-believe">This I Believe</a>
</li>
<li><a href="#importance-of-law">The Importance of Law</a>
</li>
<li><a href="#prescriptive-and-proscriptive-law">Prescriptive and Proscriptive Law</a>
</li>
<li><a href="#law-vs-convention">Law versus Convention</a>
</li>
<li><a href="#multiple-systems-of-law">Multiple Systems of Law</a>
</li>
<li><a href="#meta-law">Meta Law</a>
</li>
<li><a href="#law-and-software">Law and Software</a>
</li>
<li><a href="#law-and-oop">Law and Object-Oriented Programming</a>
</li>
<li><a href="#law-and-mind">Law and Mind</a>
</li>
</ul>
<a href="https://www.blogger.com/null" name="this-i-believe"></a>
<h3>
This I Believe</h3>
Some years ago
<a href="http://www.npr.org/">NPR</a>
started running a series called
<a href="http://www.npr.org/templates/story/story.php?storyId=4538138">
This I Believe</a> as a tribute to
<a href="http://en.wikipedia.org/wiki/Edward_R._Murrow">Edward R. Murrow</a> and his
original 1951 radio program of the same name.
As I commuted I would occasionally catch an episode and hear an essay
about the topic in which a contributor believed.
I would listen to an essay on a weighty topic such as
<a href="http://thisibelieve.org/essay/10/">God</a>,
<a href="http://thisibelieve.org/essay/13/">Love</a>,
<a href="http://thisibelieve.org/essay/8/">Funerals</a>,
<a href="http://thisibelieve.org/essay/12/">Good and Evil</a> or
<a href="http://thisibelieve.org/essay/15/">Public Service</a>
and think, "no", "maybe", or "yeah, sure".
Then one day I heard Michael Mullane's essay on
<a href="http://thisibelieve.org/essay/9574/">The Rule Of Law</a>
and I thought "Yes! <i>This</i> I believe!"
<br />
<br />
I particularly liked Michael's point that the
<a href="http://www.austlii.edu.au/au/journals/MqLJ/2004/7.html">
Tinkerbell effect</a> applies to the Rule of Law.
As he says, God exists (or does not exist) whether or not you or
I believe that to be so,
but with the Rule of Law, it can only exist if almost all of us
believe in it and follow it.
As Death says to a human near the end of Terry Pratchett's
<a href="http://www.terrypratchettbooks.com/index.php/us/books">Discworld</a> book
<a href="http://books.google.com/books?id=HeVsZo0E7ZkC&printsec=frontcover">Hogfather</a>,
"Y<span>OU NEED TO BELIEVE IN THINGS THAT AREN'T TRUE</span>.
H<span>OW ELSE CAN THEY <i>BECOME</i></span>?"
(Death always talks <span>IN CAPITAL LETTERS</span>.)
<a href="https://www.blogger.com/null" name="importance-of-law"></a>
<br />
<h3>
The Importance of Law</h3>
Why is law important?
The American
<a href="http://www.archives.gov/exhibits/charters/declaration_transcript.html">Declaration of Independence</a>
asserts that "all men are created equal"
and
<a href="http://www.un.org/en/documents/udhr/">
The Universal Declaration of Human Rights</a>
asserts that "all human beings are born free and equal in dignity and rights."
To support that position we need a system of law that in fact
treats all people equally.
But even if the law does not protect all of the fundamental human rights,
it can provide an important benefit to its society:
<a href="http://www.utexas.edu/law/conferences/measuring/The%20Papers/Rule%20of%20Law%20Conference.crosslindquist.pdf">stability</a>
through
<a href="http://en.wikipedia.org/wiki/Legal_certainty">predictability</a>.
<br />
<br />
To be predictable, the system of laws must be:
<br />
<ul>
<li>Understandable - the laws can be understood by most people.
</li>
<li>Consistent - individual laws do not conflict with each other.
</li>
<li>Extensive - the laws cover all common situations and a large portion
of less common situations.
</li>
</ul>
There have been many successful nations that followed the Rule of Law
with different laws for different classes of people,
including
<a href="http://en.wikipedia.org/wiki/Social_class_in_ancient_Rome">Rome</a>
and
<a href="http://en.wikipedia.org/wiki/Slavery_in_ancient_Greece#Origins_of_slavery">Greece</a>.
A system of law can provide stability and a foundation for an orderly
and effective society without treating all people equally.
<a href="https://www.blogger.com/null" name="prescriptive-and-proscriptive-law"></a>
<br />
<h3>
Prescriptive and Proscriptive Law</h3>
Prescriptive laws are those that tell us what we must do, such as
<a href="http://en.wikipedia.org/wiki/Honour_thy_father_and_thy_mother">
Honour thy father and thy mother</a>.
Proscriptive laws are those that tell us what we must not do, such as
"Thou shalt not kill".
<br />
<br />
You can think of prescriptive law as
additive manufacturing:
you can start with nothing, and add pieces until you get something useful,
like building up a sculpture by adding little pieces of clay, or
<a href="http://en.wikipedia.org/wiki/3D_printing">3D printing</a>.
<br />
<br />
Proscriptive law is more like
subtractive manufacturing:
you start with a block of something and carve away pieces until
you get the desired result,
like starting with a chunk of marble and carving a sculpture out of it, or
<a href="http://en.wikipedia.org/wiki/Machining">machining</a>.
<br />
<br />
(But don't try searching for additive law or subtractive law
unless you are working with primary colors. :-) )
<br />
<br />
Given the assumption of freedom
in both of the Declarations <a href="#importance-of-law">above</a>,
it's easier to start by saying people can do anything, then add
proscriptive laws specifying what they can't do.
Compared to the complete freedom and anarchy of a society with no laws,
you can get pretty far down the road to stability just with proscriptive laws.
Of the
<a href="http://en.wikipedia.org/wiki/Ten_Commandments">Ten Commandments</a>,
eight are proscriptive and only two are prescriptive.
<a href="https://www.blogger.com/null" name="law-vs-convention"></a>
<br />
<h3>
Law versus Convention</h3>
While the Rule of Law normally refers to the explicit and codified laws
on the books, which can be enforced by the state, there is another set
of rules that most of us live by which are not legally mandated.
These conventions include social guidelines that prescribe how to
behave and communicate, including when and how it is appropriate to
touch (such as shaking hands or a pat on the back),
to ask for something (with "please" and "thank you"),
to offer advice
("<a href="http://jim-mcbeath.blogspot.com/2008/09/true-kind-necessary.html">true/kind/necessary</a>")
or <a href="http://jim-mcbeath.blogspot.com/2008/12/apology-abcs.html">apologies</a>,
and many other behaviors.
<br />
<br />
These conventions don't have the force of law.
If you break these rules, you won't be sent to jail or be forced
to pay someone monetary damages -
but you might find that you are a little less successful and your
life might be a little less pleasant.
Like laws, conventions are only useful if most of us agree on them,
and like laws, a widely accepted and understood set of conventions
helps make the world a little bit more predictable, which in turn
makes it a little bit easier for people to make plans and be successful.
<br />
<br />
In effect, social conventions are simply another layer of "laws"
that sit below the constitutional laws and the statute laws
(and in reality the American legal system has many other levels than
just those two).
<a href="https://www.blogger.com/null" name="multiple-systems-of-law"></a>
<br />
<h3>
Multiple Systems of Law</h3>
I am intrigued by the fact that we have so many different implementations
of the Rule of Law.
Every nation on Earth that abides by the Rule of Law has its own
system of law.
The ways in which the laws of nations interact is as varied as the
relationships between the nations.
For example, American Law has specific sections dealing with the
fact that there are Native American "domestic dependent nations"
within its borders that have their own laws.
<br />
<br />
Similarly, every nation has a different set of social conventions,
those unwritten rules that lubricate our everyday interactions.
<br />
<br />
On top of all those different systems of law, we have
<a href="http://www.hg.org/international-law.html">International Law</a>,
with the intent of providing structure for interactions between nations
when those nations have different and possibly incompatible systems of laws.
Two aspects of International Law that I find particularly thought-provoking
are the
<a href="http://en.wikipedia.org/wiki/Law_of_war">Law of War</a>, and
<a href="http://www.law.cornell.edu/wex/jurisdiction">Jurisdiction</a>.
<br />
<br />
(For an interesting bit of history about Jurisdiction, read about
<a href="http://en.wikipedia.org/wiki/Peine_forte_et_dure">Peine forte et dure</a>.)
<a href="https://www.blogger.com/null" name="meta-law"></a>
<br />
<h3>
Meta Law</h3>
In order to be predictable, the laws must be stable and not change often;
but the laws must sometimes be changed in order to cover new situations
or to correct problems in existing laws.
One approach to improving the predictability of the system of laws
while still allowing for change is
to use a layered approach, where some laws are considered more important
than others and are thus harder to change.
The set of harder-to-change laws typically includes
the rules on how to change the laws.
This is the basis of the constitutional model, as is used in the United States,
in which the most important laws are embodied in the constitution,
with rules that make those laws much harder to change than regular laws.
A constitution will typically include rules on how both
"normal" rules and the rules embodied in the constitution can be changed.
<br />
<br />
Back in 1982, a "constitution" game by Peter Suber called Nomic
appeared in
Douglas R. Hofstadter's column, "Metamagical Themas," in
<i>Scientific American</i>.
In this game, players take turns proposing changes to the rules of the game.
The rules start out in two categories, "immutable" and "mutable",
corresponding to the simple two-level "constitutional" and "statute" law
that Americans are taught in civics classes.
The rules of the game tell how a player wins the game, and also tell
how the rules can be changed - including how to change the rules that
tell how to win and how to change the rules.
The Nomic game is intended to illustrate the mechanisms and possibilities
described in Peter Suber's book
<i>The Paradox of Self-Amendment</i>, available
<a href="http://www.earlham.edu/~peters/writing/psa/index.htm">online</a>.
For the quickest read on the game, you can jump straight to the
<a href="http://www.earlham.edu/~peters/writing/nomic.htm#101">rules</a>,
but the
<a href="http://www.earlham.edu/~peters/writing/nomic.htm">game description</a>,
although somewhat lengthy, is also interesting.
<br />
<br />
In Suber's book he starts by asking how a legal system can deal with
paradox, when there are laws that directly contradict each other, and he
<a href="http://www.earlham.edu/~peters/writing/psa/pref1.htm">notes</a> that
"paradoxes come and go without much notice and are dealt with without much ado."
<br />
<br />
Given that systems of laws seem always to be self-referential (since they
include rules about how to change the rules), attempting to craft a
system of laws that is also complete and consistent would seem to run into
a version of
<a href="http://math.stanford.edu/~feferman/papers/Godel-IAS.pdf">
Gödel's Incompleteness Theorem</a>.
In practice, systems of laws are not really complete and still
blithely violate consistency, yet manage to be quite useful despite
their flaws.
<a href="https://www.blogger.com/null" name="law-and-software"></a>
<br />
<h3>
Law and Software</h3>
The title of this section might refer to laws that affect software,
such as copyright law,
or it might refer to the use of software to assist in the application
of law, such as computerized law indexes or
<a href="http://www.yalelawjournal.org/the-yale-law-journal/content-pages/regulation-by-software/">
Regulation by Software</a>;
but in fact, I am referring to the use of law as a concept in defining
how software works.
<br />
<br />
As in a society, a programming language is built on a set of rules that
describe how statements in the language are interpreted by the computer.
The developer uses his knowledge of these rules to create a program that
instructs the computer to do something that is useful to the developer.
<br />
<br />
Imagine trying to program in a computer language with no rules.
How could you get anything done?
You could never predict the results of a statement, so you could never
make a program that produced anything predictable.
<br />
<br />
Just as different societies each have their own set of rules,
different programming languages each have their own set of rules.
And just as with social conventions, different groups of programmers
typically adopt programming conventions that are not enforced by the
compiler but are intended to make life a bit simpler for the
developers in the group.
<br />
<br />
In fact, all of the concepts discussed above are applicable to software.
Keep that in mind as we take a look at what it means
to define software in terms of laws.
<a href="https://www.blogger.com/null" name="law-and-oop"></a>
<br />
<h3>
Law and Object-Oriented Programming</h3>
Back in 1987
<a href="http://scholar.google.com/citations?user=VLgJXtQAAAAJ&hl=en">Naftaly Minsky</a> and David Rozenshtein published
<a href="http://scholar.google.com/citations?view_op=view_citation&hl=en&user=VLgJXtQAAAAJ&citation_for_view=VLgJXtQAAAAJ:_FxGoFyzp5QC">"A Law-Based Approach to Object-Oriented Programming"</a>
(available for purchase
<a href="http://dl.acm.org/citation.cfm?id=38851">on-line</a>)
in which they discussed how an object-oriented system can be described
in terms of the laws that control the exchange of messages between objects.
(Minsky has published quite a few
<a href="http://scholar.google.com/scholar?hl=en&q=naftaly+minsky">
other papers</a>
on related topics concerning law and software.)
<br /><br />
They start by defining objects as containing state and program, with
four primitive messages (prefixed by the octothorpe character, #)
to create (#new) and destroy (#kill) objects and
to get (#get) and set (#mutate) state.
Messages are defined as a triplet of sender, message text, and target.
Message delivery goes through the law system, which can take one of
three actions:
<br />
<ol>
<li>The message can be delivered to its target.
</li>
<li>The message text and/or target can be modified and then delivered.
</li>
<li>The message can be blocked and thus not delivered.
</li>
</ol>
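As a rough illustration (my own sketch in Python, not Minsky and
Rozenshtein's actual formalism), the triplet-plus-law mechanism might look
like this; only the #get and #mutate primitives are sketched, and the
"modify" case is handled by the law returning an altered text or target:

```python
# Sketch of law-based message delivery: every message is a
# (sender, text, target) triplet, and a "law" function decides whether
# to deliver it, rewrite it, or block it.

class Obj:
    """A trivial object holding state; a real system would also hold a program."""
    def __init__(self, name, state=None):
        self.name = name
        self.state = state or {}

    def receive(self, sender, text):
        verb, *args = text
        if verb == "#get":
            return self.state.get(args[0])
        if verb == "#mutate":
            self.state[args[0]] = args[1]
            return None
        raise ValueError(f"unknown primitive {verb}")

def permissive_law(sender, text, target):
    # Any object may send any message to any other object.
    return ("deliver", text, target)

def encapsulation_law(sender, text, target):
    # Only an object itself may touch its own state with #get/#mutate.
    if text[0] in ("#get", "#mutate") and sender is not target:
        return ("block", None, None)
    return ("deliver", text, target)

def send(law, sender, text, target):
    action, new_text, new_target = law(sender, text, target)
    if action == "block":
        raise PermissionError(f"law blocked {text[0]} from {sender.name}")
    return new_target.receive(sender, new_text)

a = Obj("a", {"x": 1})
b = Obj("b")
print(send(permissive_law, b, ("#get", "x"), a))    # delivered: prints 1
send(encapsulation_law, a, ("#mutate", "x", 2), a)  # self-access allowed
try:
    send(encapsulation_law, b, ("#get", "x"), a)    # blocked by the law
except PermissionError as e:
    print(e)
```

Swapping the law function changes the character of the whole system
without touching the objects, which is how a single mechanism can yield
encapsulation, delegation, or security checks.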
With these definitions and a permissive law that allows any object to
send any message to any other object, the system does not exhibit
many of the characteristics typically associated with object-oriented
systems.
<br /><br />
They then examine the effect of different kinds of laws,
such as allowing primitive messages to be sent only by the same object.
Through this approach they show how to implement common
object-oriented features such as
encapsulation, inheritance, and class variables
as well as less common features such as multiple inheritance,
exclusion of methods from inheritance, and triggers.
<br /><br />
Given that the program is part of the state of the object,
it can be modified with a #mutate message,
so it is possible to describe self-modifying programs within this framework.
The laws of the system control whether and how this message is
allowed to be sent.
<br /><br />
By defining the laws in objects that
are themselves part of the system, those laws can then be changed.
The system could start with a separate subset of laws that control
how the laws can be changed, making this approach look very much like
Suber's Nomic game.
<br /><br />
The law system allows the laws to modify the content of
a message or redirect it to a different target,
allowing for the implementation of security checks and other
forms of enforced delegation.
<br /><br />
I am not aware of a production system that directly uses this style of
law-based control of message passing, but there are some systems that
use a conceptually similar method of applying a set of rules to some messages
to control their delivery.
For example, in the
<a href="https://docs.oracle.com/javase/tutorial/essential/environment/security.html">
Java security model</a>
different environments can
have different implementations of the SecurityManager, each with its
own definition of the security policy (i.e. rules) that controls
whether certain actions are allowed to be taken, which can be viewed as
allowing the messages requesting those actions to be delivered.
The
<a href="http://moi.vonos.net/java/osgi-security/">OSGi security model</a>
goes further towards being a general law-based system,
including the ability to specify rules via a string and
to compose multiple security policies.
<a href="https://www.blogger.com/null" name="law-and-mind"></a>
<h3>
Law and Mind</h3>
For both societies and software, laws are rules telling us what we must and
must not do, and conventions are rules telling us what we should and
should not do.
By following these rules and conventions, a society or a software system
can be far more productive than one with the same underlying capabilities
but where the rules and laws are less cohesive and effective
or are not followed.
<br /><br />
Could it be that the same is true for our minds?
According to Marvin Minsky's theory of the mind as set forth in
<a href="http://www.amazon.com/The-Society-Mind-Marvin-Minsky/dp/0671657135">
Society of Mind</a>,
our minds are composed of many small agents communicating with each other.
Minsky's agents are very small pieces, and the communication between them
is below our level of awareness.
Perhaps our minds use something like Naftaly Minsky's law-based message
delivery mechanism to monitor and control these low-level communications
between agents.
<br /><br />
Maybe the biggest difference between people who are productive and
those who are not is in the different internal rules the two minds follow,
and not so much a difference in raw underlying capability.
Maybe productive people have a better set of mental rules controlling
the messages within their minds.
And if that is true, that leads to an interesting question:
to what extent is it possible for people to rewrite
their own low-level internal communication rules
to improve their performance,
and how might that be accomplished?
<br/><br/>
<h2>From Obvious To Agile</h2>
<i>2014-07-06</i>
<br/><br/>
What do you do when obvious isn't?
<h3>Installing new fence posts</h3>
Many years ago I had a fence that needed to be repaired.
I got a recommendation for a fence repair man from a friend and had him
come out to take a look.
He said the panels between the posts were fine and did not need to be
replaced, I just needed new posts.
He quoted me a price for installing new fence posts
that seemed quite reasonable, and I accepted his bid.
<br/><br/>
A few days later he came back to do the job.
After he had been out there working for a while, I went out to take a look.
I was surprised when I saw how he had installed the new fence posts.
He had not removed the old posts and put new posts in their places,
as I had assumed; instead, he simply planted a new post next to each
old post and strapped them together.
I was flabbergasted, and complained to him that my expectation was that
he was going to take out the old posts and replace them with new posts.
He was nonplussed. "I told you I would install new posts," he said.
"Taking out the old posts would be way more work, and I would have to
charge you more."
<br/><br/>
Well, he had me: he had indeed said only that he would install new posts.
I was the one who assumed he would take out the old posts.
I grumbled, paid him extra to replace a few of the old posts where it was
particularly troublesome to have an extra post sticking out,
and had the whole fence replaced the right way a few years later.
<h3>Keep using gmail</h3>
One of the startups at which I worked used gmail and was acquired by
a large company that used Exchange.
Concerned about the possibility of having to move to
what we felt was a worse system,
we asked what would happen with email.
We were relieved when they said we could keep using gmail.
<br/><br/>
On the very first day that we were officially part of the new company,
we were all told that we now had Exchange email accounts.
"Hey!," we said, "you told us we could keep our gmail accounts."
"Yes, you can," came the response, "but you also need to have an
Exchange account for all official company email."
<br/><br/>
This was, of course, not what we had expected when we asked if we could
keep our gmail accounts.
But, as with the new fence posts, they had in fact kept their word and
let us keep our gmail accounts;
it was we who assumed that that would continue to be our only email account.
<h3>Everything under SCCS</h3>
At one of the places I worked, we hired a contractor to work on a subsystem.
At one point we became concerned about how he was managing his source code,
so we asked how he was doing that.
"Everything is under sccs," he said.
(This was well before the days of
<a href="http://git-scm.com/">git</a>,
<a href="http://subversion.apache.org/">subversion</a>,
<a href="http://www.nongnu.org/cvs/">cvs</a>, or even
<a href="https://www.gnu.org/software/rcs/rcs.html">rcs</a>; at the time,
<a href="http://en.wikipedia.org/wiki/Source_Code_Control_System">sccs</a>
(Source Code Control System)
was what most people in our industry were using.)
When he finally delivered the source code to us, we were annoyed to discover
that he simply had a directory named "sccs", and all of his source code
was contained in that directory; there was in fact no versioning or history.
<br/><br/>
Once again, this was not what we had expected.
When he said "sccs" we assumed he was talking about the source code
control system, when in fact he was just referring to a directory name;
and when he said "under" we assumed he meant "managed by", when in fact
he just meant "contained in."
<h3>A new and improved version of Android</h3>
My first smart phone was an Android phone running version 2.2.
I watched as the newer versions of Android came out, filled with
interesting new features.
Finally, an over-the-air update was available for my phone.
I eagerly updated and started playing with the new features.
My first disappointment was with the new and definitely not improved
performance: my phone was slow and laggy, and it no longer lasted
even one day on a full charge.
<br/><br/>
I was even more dismayed to discover that they had removed USB
<a href="http://www.phonescoop.com/glossary/term.php?gid=356">Mass Storage Mode</a>
(MSC or UMS)
and replaced it with a significantly less functional
alternative,
<a href="http://www.phonescoop.com/glossary/term.php?gid=505">MTP</a>
(Media Transfer Protocol).
In my case, it was completely non-functional for my use, because my
home desktop machine was running Linux, and at the time there was not
a working Linux driver for MTP mode.
<br/><br/>
I was, as you might expect, pretty ticked off.
I had assumed without thinking about it
that they would not remove a significant feature from
a new version of the software, but they never said that.
<h3>Alternate Interpretations</h3>
Ask yourself: when reading the above anecdotes, did you realize in advance
of the denouement what the problem would be in each of them?
If it had been you, would you have made the same assumptions as I did?
<br/><br/>
Sometimes something seems so obvious to us that it does not even cross
our minds that there might be an alternate interpretation.
<br/><br/>
I don't think it is possible for us to see these alternative interpretations
in every case; often the situation is one with which we have had no experience,
and so one we could not be expected to anticipate.
We do, of course, sometimes consider alternative interpretations.
In the future, if someone tells me they will install new fence posts,
I will be sure to ask for more details.
But we have to make assumptions as we deal with the world every day.
If we examined every statement and every experience
for alternative interpretations, that would
consume all of our time, and we would not have any time left to
pursue new thoughts.
We learn to make instant and unconscious judgment calls:
as long as what we hear and see has a high enough probability of an
unambiguous interpretation, the possibility that there is an alternate
interpretation does not bubble up to our conscious minds.
Overall this is a very effective strategy that lets us focus
our mental energies on situations where an unusual outcome is
more likely.
But this does mean that every once in a while we will miss something,
with undesired results.
<h3>Going beyond obvious</h3>
I have already given my recommendation to
<a href="http://jim-mcbeath.blogspot.com/2008/10/state-obvious.html">State The Obvious</a>.
However, as you can see from the above anecdotes, this is not always enough.
But what else can we do?
<br/><br/>
If you consider the anecdotes above,
you might notice that, in most of them,
by the time I realized that I had made an incorrect assumption,
the deed was done and I was stuck with an undesired result.
But the fence post story was a little different:
in that case, I checked up on the work before it was done.
Because I discovered the problem while it was happening,
I was able to ask for changes and get a result that
was closer to what I wanted.
<h3>Software Development</h3>
Not all of my blog posts are about software development,
but in this case the application is obvious.
Well, it seems obvious to me, but just in case it is not obvious
to everyone, I will follow my own advice and explain in detail.
<br/><br/>
In the traditional
<a href="http://en.wikipedia.org/wiki/Waterfall_model">waterfall</a> process,
a complete and detailed specification of the desired system is created
before doing any of the implementation work.
Once that spec is done, the system is built to match it.
But, as we have seen from the anecdotes above,
even a very simple spec, such as "install new fence posts", might be
interpreted in a bizarre way that still matches the letter of the specification.
In this case, the result might be something that arguably
matches what was specified, but is not what was wanted.
<br/><br/>
Based on my personal experience and anecdotes I have heard from others,
I believe that it is <i>very</i> difficult to write a good spec
for something new,
and impossible to
<a href="http://www.navair.navy.mil/nawctsd/Resources/Library/Acqguide/SpecWrit.htm">write a spec</a>
that can not be interpreted by somebody
in some bizarre way that satisfies the spec but is not the desired result.
<br/><br/>
Given that we can't guarantee that we can write a spec that will not be
misinterpreted, what is the alternative?
I think the only alternative is to do what I did in the fence-post case:
check up on the work and make corrections along the way.
This is embodied in a couple of the value statements in
<a href="http://agilemanifesto.org/">The Agile Manifesto</a>:
"Customer collaboration over contract negotiation" and
"Responding to change over following a plan".
<br/><br/>
If you are asking someone to create something that is very similar to
things that have been created before,
and through previous common experience there is already a shared
vocabulary sufficient to describe how the desired result compares to
those previous creations,
then you can perhaps write a spec that will get you what you want.
The closer the new thing is to those previously created things, the
easier that will be.
But in software development, where the goal is often specifically to
create something novel,
this is particularly difficult.
In that situation, I think that creating and then relying solely on a detailed
spec is less likely to result in a satisfactory outcome;
I believe an agreement on direction and major points, followed by
keeping a close eye on progress,
paying particular attention when something is being done for the first time,
is the key to good results.
<h3>Writing a Spec</h3>
I'm not saying
<a href="https://gettingreal.37signals.com/ch11_Theres_Nothing_Functional_about_a_Functional_Spec.php">
don't write a spec</a>.
I'm saying you need to recognize that a spec
<a href="http://blog.codinghorror.com/dysfunctional-specifications/">
won't take you all the way</a>,
and a poorly written spec can
<a href="http://yarchive.net/comp/linux/specs.html">hinder your progress</a>.
Writing a spec is like looking at a map and planning your route:
often necessary but seldom sufficient.
You need to be prepared for construction closures, blocking accidents,
or even additional interesting sights you might decide to see along the way.
For any of these diversions, you will need to reexamine your route in
the middle of the trip and select an alternative.
For a short trip, you might not run into any such problems and thus
not need to modify your route,
but the longer the journey the more likely that at some point you will
need or want to deviate from your original route.
<br/><br/>
If you are familiar with the roads and have a clear destination,
you might be able to dispense with the initial route planning completely:
just head in the right direction and follow the signs.
Or if you are on a discovery road trip and don't have a specific
destination, then heading out without a planned route is fine.
In most cases, though, some level of advance route planning will save time.
You just need to stay agile and be prepared to change your route
along the way.
<h2>Code Guidelines</h2>
<i>2013-11-03</i>
<br/><br/>
A list of basic goals for creating code.
<br />
<br />
In our team project at work, we wanted to have a set of style guidelines
to allow everyone to more easily and quickly read the codebase and to
avoid spurious code reformatting changes.
As you might expect, there were different opinions on many points.
To avoid fruitless "my way is just better" discussions,
I wanted to step back and make sure we could all agree on some
general goals.
With that agreement in place, we could at least ask people to
explain how their preferred style on some point supports our
general goals.
If nobody can provide an argument to support a favored construct,
we might as well flip a coin.
<br />
<br />
Below are the goals I proposed and with which the team agreed.
I think many of these are obvious, but then I usually believe in
<a href="http://jim-mcbeath.blogspot.com/2008/10/state-obvious.html">
stating the obvious</a>.
The first two criteria below are also listed in my post on
<a href="http://jim-mcbeath.blogspot.com/2008/10/software-quality-dimensions.html">
Software Quality Dimensions</a>.
Your team may choose slightly different guiding principles, but I think
having the team agree on and write down their principles and asking
people to justify their proposed standards against those principles can
help short-circuit disagreements that might otherwise take longer to resolve.
<br />
<h3>
Goals</h3>
In order of priority, with the most important criteria first:
<br/><br/>
<b>First</b>, we want our code to be correct.
<br />
This means that the code must:
<br />
<ul>
<li>perform the desired primary behavior.
</li>
<li>behave in a defined way for expected error conditions.
</li>
<li>not have undesirable side-effects.
</li>
<li>not have security vulnerabilities such as buffer overflows or injections.
</li>
<li>not have memory problems such as leaks or use of released or uninitialized memory.
</li>
<li>run fast enough for the intended use cases
(but without <a href="http://c2.com/cgi/wiki?PrematureOptimization">premature optimization</a>).
</li>
</ul>
<b>Second</b>, we want our code to be robust.
<br />
This means that the code should be written in such a way as to minimize the probability of incorrect behavior under a wide range of conditions, including when:
<br />
<ul>
<li>it receives unexpected, corrupted, or no input data
(<a href="http://searchnetworking.techtarget.com/definition/graceful-degradation">graceful degradation</a>).
</li>
<li>a programmer unfamiliar with the code makes changes to it.
</li>
<li>the functionality of neighboring code changes.
</li>
<li>the development environment or toolset changes.
</li>
</ul>
<b>Third</b>, we want our developers to be as productive as possible.
<br />
This means the code should be written such that:
<br />
<ul>
<li>developers are unlikely to misunderstand what the code does
(<a href="http://c2.com/cgi/wiki?PrincipleOfLeastSurprise">principle of least surprise</a>).
</li>
<li>developers can read and understand the code quickly.
</li>
</ul>
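To make these goals concrete, here is a small hypothetical example of the robustness goals in action: a utility function whose behavior is defined for every kind of bad input it might receive. The function name and default value are invented for illustration.

```python
def parse_port(value, default=8080):
    """Parse a TCP port from a string, degrading gracefully on bad input.

    Defined behavior for expected error conditions: None, empty,
    non-numeric, or out-of-range input all fall back to the default
    rather than raising, so callers see no surprising failures
    (principle of least surprise).
    """
    try:
        port = int(value)
    except (TypeError, ValueError):
        return default
    # Ports outside the valid range count as bad input too.
    if not 1 <= port <= 65535:
        return default
    return port
```

A programmer unfamiliar with this code can read the docstring, see the full set of behaviors in a few lines, and change it with little risk of surprise.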
<h2>Role-Based Authorization</h2>
<i>2012-10-24</i>
<br/><br/>
A simple, uniform, powerful and extensible authorization model.
<ul>
<li><a href="#intro">Introduction</a>
<li><a href="#separation-of-concerns">Separation of Concerns</a>
<li><a href="#users">Users</a>
<li><a href="#actions">Actions</a>
<li><a href="#objects">Objects</a>
<li><a href="#roles">Roles</a>
<li><a href="#role-activation">Role Activation</a>
<li><a href="#role-hierarchies">Role Hierarchies</a>
<li><a href="#alternate-hierarchy-implementation">Alternate Hierarchy Implementations</a>
<li><a href="#interlude">Interlude</a>
<li><a href="#tasks">Tasks</a>
<li><a href="#domains">Domains</a>
<li><a href="#intermediate-summary">Intermediate Summary</a>
<li><a href="#times">Times, Periods and Schedules</a>
<li><a href="#locations">Locations, Areas and Regions</a>
<li><a href="#denials">Denials</a>
<li><a href="#exceptions">Exceptions</a>
<li><a href="#prioritization">Prioritization</a>
<li><a href="#summary">Summary</a>
</ul>
<h3><a name="intro">Introduction</a></h3>
The "three As" of security are:
<ul>
<li>Authentication - assuring that the user is who he says he is.
<li>Authorization - allowing each authenticated user to perform selected
privileged actions.
<li>Audit - recording privileged actions to allow review of changes
or potential abuse of privileges.
</ul>
Given authentication and auditing, it is pretty simple to add a bit
more monitoring that is very useful for billing purposes and
resource management, so you more often see the combination
AAA (<a href="http://en.wikipedia.org/wiki/AAA_protocol">Authentication, Authorization, Accounting</a>) or
AAAA (Authentication, Authorization, Audit, Accounting).
<br/><br/>
In this post I discuss only authorization.
Authentication and auditing are each big topics,
so I won't try to cover them here.
Similarly, I assume that the code and data are themselves secure.
In particular, I do not cover the issue of multiple security domains
and the problem of having lower security code make requests to
higher security code.
<br/><br/>
With my focus only on authorization,
in the discussion below I assume that the user has been authenticated
so that we can trust that piece of data within the application.
<br/><br/>
I will use the language of relational databases in this post
because it is well-known and precise.
An implementation of this model can use some other mechanism to
store and query the authorization data.
The SQL examples provide precision to the discussion, but you should
be able to skip the SQL code and still gain a basic understanding of the model.
<br/><br/>
In the SQL example code I indicate replacement variables within braces;
for example the string <code>{user}</code> in a SQL statement indicates
that the application should plug in the user name at that point
in the expression.
For a real implementation, the actual syntax would depend on the
database access package in use.
<br/><br/>
I have run into some authorization systems intended to provide
a powerful set of capabilities for a complex situation
that were, unfortunately, themselves so complex as to make it
difficult to understand how they were supposed to work, and
even after having it explained, difficult to remember because
there was not a simple underlying model to tie it all together.
<br/><br/>
In this post I present an approach to authorization
that I believe provides a very high
level of power with a model that is relatively simple to understand
and to extend as needed.
This model initially implements a
<a href="http://en.wikipedia.org/wiki/Role-based_access_control">
Role-Based Access Control</a> (RBAC) mechanism,
a widely used approach to security that is now a
<a href="http://csrc.nist.gov/groups/SNS/rbac/">NIST standard</a>.
I add a few extensions to the common model that make it start to look
more like an
<a href="http://www.axiomatics.com/beyond-rbac.html">
Attribute-Based Access Control</a> (ABAC) model.
<h3><a name="separation-of-concerns">Separation of Concerns</a></h3>
In an authorization system, we want to
<a href="http://effectivesoftwaredesign.com/2012/02/05/separation-of-concerns/">
separate</a>
the management of authorization from the application.
The application should ask the authorization system for permission
to do what it wants to do.
All management of the granting of authorizations is
handled within the authorization system,
completely outside of the application.
If you build a system in which any of the abstractions used in the
management of authorizations,
such as roles, appear in the application, then, as they say,
<a href="http://lostechies.com/derickbailey/2011/05/24/dont-do-role-based-authorization-checks-do-activity-based-checks/">
you are doing it wrong</a>.
<br/><br/>
In this post I focus only on the part of the system that determines
whether to grant authorization.
A separate system is required to maintain the data that is used by
the authorization system.
That maintenance can become quite complex in enterprise systems,
but I will not be discussing it further in this post except to
mention that the authorization mechanism described here can be
applied to the system that maintains the authorization data in order
to control who is allowed to modify what parts of that data.
<h3><a name="users">Users</a></h3>
Let's start with perhaps the simplest useful authorization model possible.
We begin with a one-column <i>user</i> table containing user names.
<br/><br/>
<input id="hlButton" type="submit" value="Highlight Syntax" onclick="highlightSyntaxFlipButton()"/>
<pre name="hlcode" class="sql"
>create table user(name varchar(32) primary key);
</pre>
When the application wants to check for our sole authorization,
it takes a passed-in authenticated user name
and calls the authorization function with that value.
The authorization function just checks to
see if that user exists in the table.
If so, the user is authorized and the authorization function returns true;
if not, the user is not authorized and the authorization function returns false.
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from user where name={user};
</pre>
The user-only model is too simple for most applications.
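As a sketch of what the authorization function might look like in application code, here is a minimal Python version using an in-memory SQLite database; the function name and sample user are hypothetical, and a real implementation would use whatever database access package is in use.

```python
import sqlite3

# In-memory database standing in for the real authorization store;
# the sample user is invented for illustration.
db = sqlite3.connect(":memory:")
db.execute("create table user(name varchar(32) primary key)")
db.execute("insert into user(name) values ('alice')")

def is_authorized(user):
    # The authenticated user is authorized if present in the user table.
    # Parameter binding plays the role of the {user} replacement variable.
    (count,) = db.execute(
        "select count(*) from user where name=?", (user,)).fetchone()
    return count > 0
```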
<h3><a name="actions">Actions</a></h3>
The next step is to add a one-column <i>action</i> table containing actions.
We will assume each action is represented by a string name,
although for performance reasons some might choose a different representation.
<pre name="hlcode" class="sql"
>create table action(name varchar(32) primary key);
</pre>
We add one row to this table for each restricted action;
for example, we might have entries for
<i>login</i>, <i>reboot_system</i>, and <i>view_system_users</i>.
<br/><br/>
With the addition of the <i>action</i> table we can no longer just
look up users in the <i>user</i> table.
We add a third table called <i>grant</i>
(or <i>auth_grant</i>, since <i>grant</i> is typically
a reserved word in SQL) with two columns
that are foreign-key columns to the <i>user</i> and <i>action</i> tables.
Each row of the <i>grant</i> table refers to a user and an action,
with the meaning that that user is granted authorization
to perform that action.
<pre name="hlcode" class="sql"
>create table auth_grant(
user varchar(32) not null,
action varchar(32) not null,
constraint FK_grant_user foreign key(user)
references user(name),
constraint FK_grant_action foreign key(action)
references action(name)
);
</pre>
Our authorization function will now accept a combination of values.
We will refer to this combination as the requested <i>operation</i>
(the NIST standard uses <i>transaction</i> as the unit for
which permissions are granted).
When an application wants to perform a potentially restricted operation,
it takes the passed-in authenticated user name,
adds the action it wants to perform,
and passes that data to the authorization function.
The authorization function takes the passed-in user and action arguments
and looks in the <i>grant</i> table
for a row in which the passed-in values for user and action
match the values in the corresponding columns in the table.
That row defines a <i>permission</i> to execute the requested operation.
If that row exists, the operation is authorized;
if that row does not exist, the operation is not authorized.
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
user={user} and
action={action};
</pre>
The user+action model is sufficient for many simple systems, such as
granting login rights to some users and admin rights to other users.
<h3><a name="objects">Objects</a></h3>
With just users and actions, each action granted to a user effectively
has global scope within the system.
This is fine for actions such as login which truly are intended to be
global in scope, but we would also like to be able to specify that
certain actions can be performed on specific objects.
Modern operating systems include mechanisms to grant different
access rights, such as read-file or write-file,
to specific files based on the user.
<br/><br/>
We add a one-column <i>object</i> table containing references to the
objects in our system for which we want to be able to issue grants,
with one row for each such object.
We are making the simplifying assumption that each object already has
a unique identifier that can be stored in our database.
<pre name="hlcode" class="sql"
>create table object(name varchar(32) primary key);
</pre>
We add a third column to our <i>grant</i> table that is a foreign-key
column to the <i>object</i> table, exactly analogous to the existing
references to the <i>user</i> and <i>action</i> tables.
Each row of the <i>grant</i> table now refers to a user, an action and
an object, with the meaning that that user is granted authorization
to perform that action on that object.
<pre name="hlcode" class="sql"
>create table auth_grant(
user varchar(32) not null,
action varchar(32) not null,
object varchar(32) not null,
constraint FK_grant_user foreign key(user)
references user(name),
constraint FK_grant_action foreign key(action)
references action(name),
constraint FK_grant_object foreign key(object)
references object(name)
);
</pre>
If we still want to have actions with global scope,
such as the example of a login action in the user+action model,
we can add a special <i>system</i> object that can be used in that situation.
<br/><br/>
Our authorization requests from the application
now include three pieces of data.
We modify our function for authorizing a restricted operation to take
an argument specifying the object, along with the user and action
arguments that we already have.
The authorization function looks in the <i>grant</i> table as before,
but it now must find a row that matches all three fields rather than
only user and action.
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
user={user} and
action={action} and
object={object};
</pre>
The user+action+object model presented here is used in many databases,
with the objects being database tables or views and the actions being
the four database actions of select, insert, update and delete.
There may also be additional actions such as grant (the ability to
create additional grants on an object) or actions that allow creating
and modifying users or databases.
<h3><a name="roles">Roles</a></h3>
In order to simplify the maintenance of grants when we have a large
number of users, we add a mechanism that allows us to group users together
and grant permissions to a group of users rather than just to a single user.
Users are grouped according to the roles they play; example roles are
<i>user</i>, <i>administrator</i>, and <i>superuser</i>.
<br/><br/>
We add a <i>role</i> table with one row for each role we define.
(We will look at other possible implementations later,
but this choice serves well for explaining the concepts.)
<pre name="hlcode" class="sql"
>create table role(name varchar(32) primary key);
</pre>
In order to indicate which users have been granted (assigned) which roles,
we add a <i>user_role</i> table with two columns:
the <i>user</i> column is a foreign key to the <i>user</i> table
that references the user, and
the <i>role</i> column is a foreign key to the <i>role</i> table.
A user having a role is indicated by adding a row to the
<i>user_role</i> table referencing that user and that role.
When granting authorization, a user will receive authorization
for all roles he has.
<pre name="hlcode" class="sql"
>create table user_role(
user varchar(32) not null,
role varchar(32) not null,
constraint FK_userrole_user foreign key(user)
references user(name),
constraint FK_userrole_role foreign key(role)
references role(name)
);
</pre>
We also add a <i>role</i> column to our <i>grant</i> table.
This column is a foreign key to the one column in our <i>role</i> table.
A row in the <i>grant</i> table can now refer either to a user or
to a role.
It must reference one or the other; while it might be possible to set up
a structure to enforce that constraint directly in the database,
we will skip that exercise and instead suggest that this constraint
could be enforced by an application-level database consistency check.
<pre name="hlcode" class="sql"
>create table auth_grant(
user varchar(32),
role varchar(32),
action varchar(32) not null,
object varchar(32) not null,
constraint FK_grant_user foreign key(user)
references user(name),
constraint FK_grant_role foreign key(role)
references role(name),
constraint FK_grant_action foreign key(action)
references action(name),
constraint FK_grant_object foreign key(object)
references object(name)
);
</pre>
The addition of roles is entirely an abstraction within the
authorization system; the application is not aware of roles.
An operation is defined by the same three values as before, and
the application calls the authorization function in the same way as before
to see if an operation is authorized,
but the authorization function has to do a little more work now.
<br/><br/>
The application still passes the user, action and object arguments to the
authorization function, and the authorization function still looks
in the <i>grant</i> table to see if that combination of user, action
and object is authorized.
But now, in addition to looking for a row
that exactly matches those three values, it also looks up all of the
roles the specified user has and looks for
a row in the <i>grant</i> table whose action and object values
exactly match the values passed in and whose role
is one of the roles the user has.
If the authorization function finds
a row that exactly matches the action and object
and that exactly matches either the user or any of the user's roles
then the action is authorized; if no such matching row is found
then the action is not authorized.
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
(user={user} or role in (select role from user_role where user={user})) and
action={action} and
object={object};
</pre>
The (user+role)+action+object model presented here has been used
in the
<a href="http://en.wikipedia.org/wiki/Filesystem_permissions#Traditional_Unix_permissions">Unix filesystem</a>
for many years, with the objects being files
and directories, the actions being read, write and execute/search,
and the roles called groups.
<br/><br/>
In the NIST RBAC model permissions can only be assigned to roles,
not to users.
A strict implementation of this aspect could easily be implemented
by dropping the user check in our authorization test
(which also means we can drop the <i>user</i> column in the
<i>grant</i> table):
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
role in (select role from user_role where user={user}) and
action={action} and
object={object};
</pre>
Alternatively, we could think of each user as automatically being
assigned a unique role whose name is the same as the user name.
Or, we can choose never to assign any permissions to a user,
only assigning them to roles.
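The role-only check above can be sketched end to end in Python with an in-memory SQLite database. The sample user, role and grant are invented, and the foreign-key constraints and unused tables are omitted for brevity.

```python
import sqlite3

# In-memory database standing in for the real authorization store.
db = sqlite3.connect(":memory:")
db.executescript("""
create table user_role(user varchar(32), role varchar(32));
create table auth_grant(role varchar(32), action varchar(32), object varchar(32));
insert into user_role values ('alice', 'administrator');
insert into auth_grant values ('administrator', 'reboot_system', 'system');
""")

def is_authorized(user, action, obj):
    # Role-only check, as in the NIST RBAC model: the operation is
    # authorized if any role assigned to the user has a matching grant.
    (count,) = db.execute(
        """select count(*) from auth_grant where
           role in (select role from user_role where user=?) and
           action=? and object=?""",
        (user, action, obj)).fetchone()
    return count > 0
```

Note that the application never mentions roles; it passes only the user, action and object, and the role indirection stays inside the authorization function.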
<h3><a name="role-activation">Role Activation</a></h3>
The NIST RBAC standard includes a concept called Role Activation
(or Role Authorization).
When a user logs in, some subset of his roles can be activated.
Allowing a user to activate and deactivate his assigned roles
gives the user a way to ensure that he (or some program he is running)
does not perform a privileged operation when he is not expecting it.
Permissions are only granted for active roles,
so even if a user has been given permissions through a role, a program will
not be able to take advantage of them unless the user
has activated a role that grants those permissions.
<br/><br/>
We can implement role activation globally by adding an <i>is_active</i>
column to the <i>user_role</i> table.
<pre name="hlcode" class="sql"
>create table user_role(
user varchar(32) not null,
role varchar(32) not null,
is_active boolean not null default false,
constraint FK_userrole_user foreign key(user)
references user(name),
constraint FK_userrole_role foreign key(role)
references role(name)
);
</pre>
When checking for authorization, we only include roles that are
active for that user.
If we continue to allow user-based permissions, then we would need to
add an <i>is_active</i> flag for those permissions as well.
When using activation it is simpler to exclude user-based permissions,
as is done in the NIST RBAC model.
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
role in (select role from user_role where user={user} and is_active) and
action={action} and
object={object};
</pre>
The NIST RBAC standard uses session-based activation rather than
global activation.
This allows a user to have multiple sessions open simultaneously
with different roles active for each session.
To implement this, rather than adding an <i>is_active</i>
column to the <i>user_role</i> table, we create a <i>session</i>
table that keeps track of our sessions and a
<i>session_role</i> table that lists the roles that are active for
each session.
<pre name="hlcode" class="sql"
>create table session(
id varchar(32) primary key,
user varchar(32) not null,
constraint FK_session_user foreign key(user)
references user(name)
);
create table session_role(
session_id varchar(32) not null,
role varchar(32) not null,
constraint FK_sessionrole_sessionid foreign key(session_id)
references session(id),
constraint FK_sessionrole_role foreign key(role)
references role(name)
);
</pre>
When testing for authorization we only want to use roles that are
both assigned (in the <i>user_role</i> table) and active (in the
<i>session_role</i> table).
Assuming the mechanism that maintains active roles in the <i>session_role</i>
table ensures that the only roles appearing in that table are in the
<i>user_role</i> table
(i.e. only an assigned role from the <i>user_role</i> table
can be active in the <i>session_role</i> table),
then we can
modify the authorization function to accept an additional
argument which is the session_id, and change our implementation SQL:
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
role in (select role from session_role where session_id={session_id}) and
action={action} and
object={object};
</pre>
At this point our model includes the capabilities of RBAC0,
the first level of the NIST RBAC standard
(although the NIST model does not include <i>action</i> and
<i>object</i> as presented above).
However, in order to keep the discussion of the other aspects of
the model less cluttered, I will generally not be including role
activation in the remainder of this discussion except where noted.
<h3><a name="role-hierarchies">Role Hierarchies</a></h3>
Given the ability to group users into roles and thus simplify the number
of grants we need to create, we can generalize on that concept by also
allowing roles to be grouped into other roles.
<br/><br/>
In the discussion of Roles above, we added a <i>user_role</i> table
that allowed us to assign roles to users.
We now add a <i>role_hierarchy</i> table with <i>parent</i> and <i>child</i>
columns that allows us to assign roles (children)
to other roles (parents).
<pre name="hlcode" class="sql"
>create table role_hierarchy(
parent varchar(32),
child varchar(32),
constraint FK_rolehierarchy_parent foreign key(parent)
references role(name),
constraint FK_rolehierarchy_child foreign key(child)
references role(name)
);
</pre>
When collecting the list of roles for a user,
we now have to recursively consult the <i>role_hierarchy</i> table
to collect all of the child roles for any role the user has.
How this is actually done is heavily dependent on the implementation.
Some SQL databases support recursive queries (for example via
recursive common table expressions), but not all do.
<br/><br/>
We hide this implementation detail inside a view that collects
the closure of the role-role relationships, effectively
flattening our hierarchy.
Defining this flattening in a view
allows us to change how we collect the closure of the roles
without affecting the queries that invoke this view.
In this particular example, our view is defined using a
non-recursive query that will suffice for a hierarchy
of limited depth.
<pre name="hlcode" class="sql"
>-- not a full closure if the hierarchy is too deep
create view role_closure as
select user, role from user_role
union
select user_role.user, a1.child from user_role
join role_hierarchy as a1 on user_role.role=a1.parent
union
select user_role.user, a2.child from user_role
join role_hierarchy as a1 on user_role.role=a1.parent
join role_hierarchy as a2 on a1.child=a2.parent
union
select user_role.user, a3.child from user_role
join role_hierarchy as a1 on user_role.role=a1.parent
join role_hierarchy as a2 on a1.child=a2.parent
join role_hierarchy as a3 on a2.child=a3.parent
;
</pre>
We can now use the <i>role_closure</i> view
in place of the <i>user_role</i> table:
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
(user={user} or role in (select role from role_closure where user={user})) and
action={action} and
object={object};
</pre>
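On databases that do support recursive queries, the depth limitation can be
avoided by defining the view with a recursive common table expression instead.
This is only a sketch: whether a view may be defined over a recursive CTE,
and the exact syntax, varies by database.
<pre name="hlcode" class="sql"
>create view role_closure as
with recursive rc(user, role) as (
select user, role from user_role
union
select rc.user, h.child from rc
join role_hierarchy as h on rc.role=h.parent
)
select user, role from rc;
</pre>
Because the recursive step uses <i>union</i> rather than <i>union all</i>,
duplicate rows are discarded and the query terminates even if the role
graph contains cycles.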
If we want to use session-based activation, we can do that by
modifying our <i>role_closure</i> view to be based on the
<i>session_role</i> table rather than the <i>user_role</i> table:
<pre name="hlcode" class="sql"
>-- not a full closure if the hierarchy is too deep
create view role_closure as
select session.id as session_id, session.user, session_role.role
from session join session_role on session.id=session_role.session_id
union
select session.id, session.user, a1.child
from session join session_role on session.id=session_role.session_id
join role_hierarchy as a1 on session_role.role=a1.parent
union
select session.id, session.user, a2.child
from session join session_role on session.id=session_role.session_id
join role_hierarchy as a1 on session_role.role=a1.parent
join role_hierarchy as a2 on a1.child=a2.parent
union
select session.id, session.user, a3.child
from session join session_role on session.id=session_role.session_id
join role_hierarchy as a1 on session_role.role=a1.parent
join role_hierarchy as a2 on a1.child=a2.parent
join role_hierarchy as a3 on a2.child=a3.parent
;
</pre>
As above when adding session-based role activation,
the authorization SQL includes the session-id and we no longer
allow user-based permissions:
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
role in (select role from role_closure where user={user} and session_id={session_id}) and
action={action} and
object={object};
</pre>
This change can be made in any of the authorization SQL statements
given below to add session-based authorization where it is otherwise
not included.
<br/><br/>
Note that although we stated that the role parent/child relationships
form a hierarchy, there is actually no reason to limit it to that,
and our design does not preclude defining role relationships
that form a more complex graph.
We do want to avoid cycles in our role graph, as a graph with cycles
would not provide us any useful benefits, and we need to ensure
that our implementation does not blow up if the role graph happens
to have some cycles.
If we use the <i>role_closure</i> view implementation provided above,
an incidental benefit is that the closure mechanism is so simple and limited
that cycles will not cause any problems other than wasting a bit of
processing power.
<br/><br/>
The NIST RBAC standard defines both general and restricted forms of hierarchy
as part of the RBAC1 level.
The restricted form is a tree structure and
the general form is an arbitrary partial order.
Our model above supports the general form.
<br/><br/>
NIST RBAC levels RBAC2 and RBAC3 add Constraints (to ensure support of
Separation of Duties)
and Symmetry (the ability to review permission-role assignments as well as
user-role assignments).
With the simple database implementation presented here, the review queries
are straightforward, and constraints can be enforced by the code that
modifies the assignment tables.
<h3><a name="alternate-hierarchy-implementation">
Alternate Hierarchy Implementations</a></h3>
In the implementation of user roles and role hierarchies above
we added a <i>role</i> table, a <i>user_role</i> table
and a <i>role_hierarchy</i> table,
we added a <i>role</i> column to the grant table,
we added a <i>role_closure</i> view
and we modified our example SQL select statement for checking authorization
to use that view.
In this section I present
three alternate approaches to this step when using a relational database;
there are of course other approaches, not discussed here,
that are not based on a relational database.
These implementation alternatives do not affect the basic model
being developed.
<br/><br/>
In the first alternate approach, after defining the <i>role</i> table
we next define the <i>user_or_role</i> view that is the union of those
two tables.
<pre name="hlcode" class="sql"
>create view user_or_role as
(select name from user)
union all
(select name from role)
;
</pre>
In the <i>grant</i> table, rather than adding a <i>role</i> column and
having the <i>user</i>
column be a foreign key to the <i>user</i> table, we make the <i>user</i>
column a foreign key to the <i>user_or_role</i> view.
Unfortunately, it is typically not possible to declare a foreign key
to a view, in which case this foreign key relationship would
have to remain implicit and not enforced by the database
(it could be part of our application-level database consistency checks).
Nonetheless, the SQL statements that join using this foreign key will
work the same as if the foreign key were declared,
although performance may be an issue if the <i>user_or_role</i> view cannot
be indexed.
By using a materialized view it might be possible to index the view
and have a foreign key refer to it,
but then we would need to deal with rematerializing the view every
time we changed the contents of the <i>user</i> or <i>role</i> tables.
<br/><br/>
Instead of creating a <i>role_hierarchy</i> table,
we do the same thing to the <i>user</i> column of the <i>user_role</i>
table as we did to the <i>grant</i> table, making it a foreign key
to the <i>user_or_role</i> view rather than to the <i>user</i> table.
This allows the <i>user_role</i> table to represent which roles have
other roles as well as which roles users have directly been given.
<br/><br/>
In our second alternate implementation, we start by defining
<i>user_or_role</i> as a table that contains the records for
both users and roles,
with an <i>is_role</i> column that indicates
whether a row represents a user or a role.
We then create <i>user</i> and <i>role</i>
as appropriate views into that table.
<pre name="hlcode" class="sql"
>create table user_or_role (
name varchar(32) primary key,
is_role boolean not null default false
);
create view user as
select name from user_or_role where not is_role;
create view role as
select name from user_or_role where is_role;
</pre>
As in our first alternate implementation, the <i>grant</i> table
points to the <i>user_or_role</i> table, as does the <i>user</i>
column in the <i>user_role</i> table.
<pre name="hlcode" class="sql"
>create table auth_grant(
user_or_role varchar(32) not null,
action varchar(32) not null,
object varchar(32) not null,
constraint FK_grant_user foreign key(user_or_role)
references user_or_role(name),
constraint FK_grant_action foreign key(action)
references action(name),
constraint FK_grant_object foreign key(object)
references object(name)
);
create table user_role(
user varchar(32) not null,
role varchar(32) not null,
constraint FK_userrole_user foreign key(user)
references user_or_role(name),
constraint FK_userrole_role foreign key(role)
references role(name)
);
</pre>
Many databases, including MySQL,
do not allow indexes or foreign keys on views, so neither of
the above two alternate implementations will work very well on those
databases,
and the table statements would have to be modified not to declare
foreign keys to view columns.
<br/><br/>
If we want to use indexes and foreign keys, we have to compromise our
data model a bit and not use views when we need foreign keys,
which leads us to our final alternative.
<br/><br/>
In our third alternate implementation, we don't have a
separate <i>role</i> table or view.
Instead, we use the <i>user_or_role</i> approach as in
the second alternative above:
we place the role names into the
<i>user</i> table and add an <i>is_role</i> column that indicates
whether a row represents a user or a role.
<pre name="hlcode" class="sql"
>create table user (
name varchar(32) primary key,
is_role boolean not null default false
);
</pre>
In our <i>user_role</i> table, in which the <i>role</i> column
was a foreign key to the <i>role</i> table, we make that column
instead be a foreign key to the <i>user</i> table, where we are
now storing our role names.
<pre name="hlcode" class="sql"
>create table user_role(
user varchar(32) not null,
role varchar(32) not null,
constraint FK_userrole_user foreign key(user)
references user(name),
constraint FK_userrole_role foreign key(role)
references user(name)
);
</pre>
We don't need a <i>role_hierarchy</i> table because we can now
represent those role-to-role relationships in the <i>user_role</i> table.
In our <i>role_closure</i> view we replace the <i>role_hierarchy</i>
references with <i>user_role</i> references.
<pre name="hlcode" class="sql"
>create view role_closure as
select distinct a0.user, a3.role from user_role as a0
join user_role as a1 on a0.role=a1.user or
(a0.user=a1.user and a0.role=a1.role)
join user_role as a2 on a1.role=a2.user or
(a0.user=a2.user and a0.role=a2.role)
join user_role as a3 on a2.role=a3.user or
(a0.user=a3.user and a0.role=a3.role)
;
</pre>
Because we are now storing our roles in the <i>user</i> table,
the <i>user</i> column in our <i>grant</i> table can refer to either
a user or a role, depending on what we are storing in the <i>user</i> table,
so we don't need the <i>role</i> column and we can go back to the
previous definition that did not have that column.
<br/><br/>
With this implementation our foreign key constraints all
work because we are not dealing with any views, and our table structure
is simpler because we have combined users and roles into one table.
Although we are putting roles into the <i>user</i> table, we need
to remember that this is just a convenient fiction, adopted to simplify our
implementation because there are some situations in which we want to
treat users and roles the same.
If we forget about that difference and start treating them the same
in other situations, we can easily start getting absurd behavior from
our system.
<br/><br/>
(I have a mental image of our legal system as having a <i>people</i>
table, and a <i>law</i> table with a foreign key to the <i>people</i> table.
At some early point, someone wanted some laws that applied to corporations
as well as people, so they said, "I know, let's just add an
<i>is_corporation</i> flag to the <i>people</i> table and put the
corporations in there,
then our foreign keys from the <i>law</i> table will still work
and we won't need to add a bunch more structure to our law schema!"
With the passage of time, law programmers who should have been paying
attention to the <i>is_corporation</i> flag started ignoring it more
and more often, until finally the law programmers were saying,
"Well, those corporations are in the people table, so they must be people."
If you are concerned that this kind of situation might happen to you,
you might not want to put roles into the <i>user</i> table.)
<br/><br/>
For the remainder of this discussion, we will use this third
alternate implementation approach.
<h3><a name="interlude">Interlude</a></h3>
In the above discussions, I have been assuming that the names of users,
actions, objects and roles are also their key values.
This implies that each of those names is unique.
Given that I have discussed a couple of implementations in which users
and roles have been mixed together, you might wonder whether it would
cause problems to add a user whose name is the same as a role.
In the above simple implementation the answer is "yes", and the system
would have to disallow that.
A real system is likely to be a bit more complex, using unique IDs as
primary keys rather than names.
The problem of having unique names thus gets moved from a database
issue to an application-level issue.
The system implementer must decide under what circumstances it is
acceptable to have duplicate names,
and there must be a way to distinguish those duplicates to someone
operating the system.
<br/><br/>
We have reached a point in the development of our authorization model
that is similar in power to many existing systems.
People who need more flexibility than this model provides might diverge
at this point into custom authorization systems with various forms of
exceptions and extensions that rapidly start adding complexity to
the model.
<br/><br/>
There are still a number of extensions we can make to our
authorization model that will improve its power while adding only
a small amount to the cognitive load of understanding how it all works.
Let's get back to our model and add some more power to it.
<h3><a name="tasks">Tasks</a></h3>
In the same way that we allow specifying a group of users having a role,
we add the ability to specify a group of actions, which we call a task.
The relation between tasks and actions is exactly analogous to the
relation between users and roles.
Each action can be assigned to multiple tasks,
a task can be assigned other tasks,
and an authorization grant can refer either to
an action or to a task.
<br/><br/>
Analogous with our second alternative implementation above,
in which we added an <i>is_role</i> column to the <i>user</i> table
and put roles into the <i>user</i> table,
for the equivalent addition of tasks we
add an <i>is_task</i> column to the <i>action</i> table,
add an <i>action_task</i> table with columns
<i>action</i> and <i>task</i> both being foreign key references to
the <i>action</i> table,
and add a <i>task_closure</i> view.
<pre name="hlcode" class="sql"
>create table action(
name varchar(32) primary key,
is_task boolean not null
);
create table action_task(
action varchar(32) not null,
task varchar(32) not null,
constraint FK_actiontask_action foreign key(action)
references action(name),
constraint FK_actiontask_task foreign key(task)
references action(name)
);
create view task_closure as
select distinct a0.action, a3.task as task from action_task as a0
join action_task as a1 on a0.task=a1.action or
(a0.action=a1.action and a0.task=a1.task)
join action_task as a2 on a1.task=a2.action or
(a0.action=a2.action and a0.task=a2.task)
join action_task as a3 on a2.task=a3.action or
(a0.action=a3.action and a0.task=a3.task)
;
</pre>
We expand our authorization query to look for tasks in the same way as we
expanded it to handle roles, with the same caveats about
hierarchy depth.
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
(user={user} or user in (select role from role_closure where user={user})) and
(action={action} or action in (select task from task_closure where action={action})) and
object={object};
</pre>
<h3><a name="domains">Domains</a></h3>
Roles and tasks give us the ability to group users and actions.
We complete the pattern by adding the ability to group objects
into groups that we call domains (not to be confused with internet
domain names).
As with the tasks example above,
we add the <i>is_domain</i> column to the <i>object</i> table,
create the <i>object_domain</i> table to allow defining groups of objects,
create the <i>domain_closure</i> view,
and modify the authorization function to check for either objects
or domains in the same way as we modified it to check for
either actions or tasks.
All of these steps are exactly analogous to what we did when
we added tasks.
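Since the steps mirror the task pattern exactly, the corresponding
definitions look like this (a sketch; the names simply follow the
description above):
<pre name="hlcode" class="sql"
>create table object(
name varchar(32) primary key,
is_domain boolean not null
);
create table object_domain(
object varchar(32) not null,
domain varchar(32) not null,
constraint FK_objectdomain_object foreign key(object)
references object(name),
constraint FK_objectdomain_domain foreign key(domain)
references object(name)
);
create view domain_closure as
select distinct a0.object, a3.domain as domain from object_domain as a0
join object_domain as a1 on a0.domain=a1.object or
(a0.object=a1.object and a0.domain=a1.domain)
join object_domain as a2 on a1.domain=a2.object or
(a0.object=a2.object and a0.domain=a2.domain)
join object_domain as a3 on a2.domain=a3.object or
(a0.object=a3.object and a0.domain=a3.domain)
;
</pre>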
<h3><a name="intermediate-summary">Intermediate Summary</a></h3>
Let's take stock of what our model looks like:
<ul>
<li>There are three dimensions: user, action, and object.
<li>The handling of the three dimensions is completely symmetric
(unless role activation is being used, in which case the
user dimension has that extra wrinkle).
<li>The application passes those three values to the authorization function,
which returns true if that operation is authorized, false if not.
<li>For each dimension, there is a grouping mechanism:
role for user group, task for action group, domain for object group.
<li>The grouping mechanism supports a hierarchy of groups, or more generally
a (directed acyclic) graph of groups (a partial ordering).
<li>To determine if a request should be authorized, take each dimension,
collect the closure of the groups for that dimension,
and look for a grant in which each dimension of the grant matches
any of the items in the closure for that dimension.
</ul>
The model presented above is easy to understand,
but despite its simplicity it is quite powerful.
Yet it does not suffice for everyone.
Let's see how we can continue to enhance its power without
significantly increasing its complexity.
<h3><a name="times">Times, Periods and Schedules</a></h3>
In some systems it is desirable to allow some operations only at
specified times.
For example, one might want to allow users to log in to the system
only during their work shift.
<br/><br/>
We define another dimension, the <i>time</i> dimension,
and we define a time range as a <i>period</i>,
where a period is an interval of time such as 8AM to 5PM, or Sunday,
or 8AM to 5PM on weekdays.
We add the time dimension to our definition of an operation, so
when the application calls the authorization function, it must now
pass the current time as a fourth argument.
<br/><br/>
The dimensions we have defined previously are all discrete dimensions,
with only one matching value for each definition.
The time dimension is different in that it is a continuous dimension:
there are multiple time values that can match a period.
This makes the authorization function a little more difficult to
write, but it does not add much complexity to the user's conceptual model.
<br/><br/>
The other dimensions all have groups, so it would not add to the complexity
of the model to add groups of periods.
In fact, the model would be more complex if we did <i>not</i> add
groups of periods, as that would make this dimension different from
all the others in that aspect,
which would be an additional detail that the user
would have to factor into his mental model.
<br/><br/>
We add a group called <i>schedule</i>.
As with all the other groups, a period can be included in any number
of schedules, and schedules can contain other schedules.
When checking authorization,
we collect all the periods that match the current time
and the closure of all the schedules for those periods,
and we search for grants that include any of those in the <i>period</i> column.
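These groups can be sketched with the same table pattern as the other
dimensions; how a period's interval is actually represented (start time,
end time, recurrence, and so on) is system-specific, so the interval
columns below are only placeholders:
<pre name="hlcode" class="sql"
>create table period(
name varchar(32) primary key,
is_schedule boolean not null default false
-- plus system-specific columns defining the interval
);
create table period_schedule(
period varchar(32) not null,
schedule varchar(32) not null,
constraint FK_periodschedule_period foreign key(period)
references period(name),
constraint FK_periodschedule_schedule foreign key(schedule)
references period(name)
);
</pre>
A <i>schedule_closure</i> view can then be defined in the same way as
the other closure views.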
<h3><a name="locations">Locations, Areas and Regions</a></h3>
By now the pattern should be pretty clear.
If the system requires other dimensions, they are easy to add
by following the same pattern.
By keeping to the pattern, the complexity of the model that the
user must work with to understand the system is kept low,
even when there is some small difference for the new dimension,
as there was for the <i>time</i> dimension when
compared to the three previously defined dimensions.
When a dimension does introduce a small model extension,
we can leverage that concept when adding other dimensions.
<br/><br/>
Location is a system-specific concept.
For some systems it might be a logical location,
such as "console", "secure terminal", or "dial up".
Since these are discrete values, it would suffice to have a
<i>location</i> table, group locations in <i>region</i>s,
and handle it in the same manner as the other discrete dimensions
such as <i>user</i>.
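For the discrete case, a sketch following the same combined-table pattern
we used for users and roles:
<pre name="hlcode" class="sql"
>create table location(
name varchar(32) primary key,
is_region boolean not null default false
);
create table location_region(
location varchar(32) not null,
region varchar(32) not null,
constraint FK_locationregion_location foreign key(location)
references location(name),
constraint FK_locationregion_region foreign key(region)
references location(name)
);
</pre>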
<br/><br/>
For other systems a location might mean a physical location specified by
one or more continuous values, such as latitude and longitude,
in which case we define an <i>area</i> analogously to a period,
where one area includes a range of locations.
The area might be defined with a center point and radius, it might be defined
with a bounding box, it might be defined as a polygon, using splines,
or in some other even more complex way.
As with periods, the complexity of the definition of an area has
an effect on the difficulty of implementing the authorization function
that has to determine whether a location is or is not in an area,
but has little effect on the complexity of the user's mental model
of the authorization.
For the user, it is sufficient to know that
a given location will be either contained in or not contained in an area,
and that grants are based on areas.
<br/><br/>
Our group for an area is a <i>region</i>, and it groups together
areas and other regions in the same way as the groups in the other dimensions.
<h3><a name="denials">Denials</a></h3>
The approach described above is essentially a "whitelist" approach,
which is the standard approach to authorization.
If an operation is listed in the <i>grant</i> table then it is allowed;
any operation which is not listed is not allowed.
<br/><br/>
It is also possible to use a "blacklist" approach:
rather than allowing what is listed and denying everything else,
we can deny what is listed and allow everything else.
In this case we would create a <i>denial</i> table that is exactly
like the <i>grant</i> table except that it contains operations to
be denied rather than operations to be allowed.
The authorization function would do the same search as before,
except that it would deny the operation if any matching records
were found, and allow the operation otherwise.
<br/><br/>
Using a blacklist approach to authorization as just described
is generally not recommended
(in fact the NIST RBAC standard specifically recommends against
"Negative permissions", although it does not outright disallow them).
Since the default action is to
allow an operation, if a new operation is added to the system
and through oversight the appropriate denials are not added, then
there is no protection for the new operations.
<h3><a name="exceptions">Exceptions</a></h3>
We can combine the original <i>grant</i> approach and the
<i>denial</i> approach described just above to give us the ability
to have both a whitelist and a blacklist.
We start with our original <i>grant</i> table approach,
following the recommended position that the default is to
deny any operation unless it is explicitly granted;
on top of that, we add the <i>denial</i> table as exceptions to the grants.
<br/><br/>
Our authorization function first looks in the <i>denial</i> table;
if a matching record is found, then the request is denied.
If no matching record is found, then the function looks in
the <i>grant</i> table; if a matching record is found, then
the request is granted; otherwise it is denied.
<br/><br/>
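That two-step check can be sketched in SQL, assuming a <i>denial</i>
table with the same columns as the grant table:
<pre name="hlcode" class="sql"
>-- step 1: if count>0 the operation is denied
select count(*) from denial where
(user={user} or user in (select role from role_closure where user={user})) and
(action={action} or action in (select task from task_closure where action={action})) and
(object={object} or object in (select domain from domain_closure where object={object}));
-- step 2: otherwise, authorized only if count>0
select count(*) from auth_grant where
(user={user} or user in (select role from role_closure where user={user})) and
(action={action} or action in (select task from task_closure where action={action})) and
(object={object} or object in (select domain from domain_closure where object={object}));
</pre>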
This allows the admin to think in terms of exceptions:
grant privileges to all of X, except for Y.
In some situations this allows expressing the intended grants
more simply than if one is restricted to just additive grants.
<br/><br/>
We could also flip the <i>grant</i> and <i>denial</i> tables around,
first looking in the <i>grant</i> table for a match, then looking
in the <i>denial</i> table for a match, then granting if nothing
is found.
As discussed in the previous section, this is not recommended,
but understanding that it is possible is conceptually useful,
and leads us to our last enhancement.
<h3><a name="prioritization">Prioritization</a></h3>
The <i>grant</i> and <i>denial</i> tables have identical structure,
and their contents are checked in the same way, with the
only difference being an inversion of the interpretation of the results
in one case as compared to the other.
We can easily combine both of these tables into a single <i>auth</i>
table that includes an additional <i>allow</i> column that is
true for all records from the <i>grant</i> table and <i>false</i> for
all the records from the <i>denial</i> table.
We can also add a <i>priority</i> column that we use to determine which
records we should attend to first.
<pre name="hlcode" class="sql"
>create table auth(
id integer auto_increment primary key,
allow boolean not null default true,
priority integer not null default 0, -- higher values take precedence
user varchar(32) not null,
action varchar(32) not null,
object varchar(32) not null,
period varchar(32) not null,
area varchar(32) not null,
constraint FK_auth_user foreign key(user)
references user(name),
constraint FK_auth_action foreign key(action)
references action(name),
constraint FK_auth_object foreign key(object)
references object(name),
constraint FK_auth_period foreign key(period)
references period(name),
constraint FK_auth_area foreign key(area)
references area(name)
);
</pre>
If we define the priority value such that higher values are more
important than lower values, then we can get the same behavior as
described in the first part of the previous section
by setting the priority on all the <i>denial</i> records to 2 and
setting the priority on all the <i>grant</i> records to 1.
Our authorization function then looks in the <i>auth</i> table
for the matching record with the highest <i>priority</i> value
and looks at the <i>allow</i> value for that record.
<br/><br/>
If we wanted to get the (non-recommended) behavior as described at
the end of the previous section, we could do that by setting the
priority of all the <i>grant</i> records to 2 and setting the
priority of all the <i>denial</i> records to 1,
plus making the default behavior (when no matching rows are found)
to allow the operation.
<br/><br/>
Given this structure, we can of course put in records with any priority
value.
This allows building up a series of toggling exceptions, much the way
leap years in the Gregorian calendar are
<a href="http://en.wikipedia.org/wiki/Leap_year#Algorithm">defined</a>
(each year has 365 days, except every 4th year is a leap year with 366 days,
except every 100 years is not a leap year, except every 400 years is
a leap year).
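By way of illustration, a stack of toggling exceptions might look like
this (all of the names and priority values here are hypothetical):
<pre name="hlcode" class="sql"
>-- staff may edit reports, except contractors, except trusted contractors
insert into auth(allow, priority, user, action, object, period, area)
values (true, 1, 'Staff', 'Edit', 'Reports', 'Anytime', 'Anywhere');
insert into auth(allow, priority, user, action, object, period, area)
values (false, 2, 'Contractors', 'Edit', 'Reports', 'Anytime', 'Anywhere');
insert into auth(allow, priority, user, action, object, period, area)
values (true, 3, 'TrustedContractors', 'Edit', 'Reports', 'Anytime', 'Anywhere');
</pre>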
<br/><br/>
Since we can stack up alternating <i>grant</i> and <i>denial</i>
records, the only distinction between the "whitelist" and "blacklist"
approaches discussed earlier is the question of what the default is when no
matching records are found in the <i>auth</i> table
(the default for whitelisting is deny, the default for
blacklisting is grant).
Given that using a default of allow is not recommended,
we define the system to use a default of deny,
but we provide a way that the system can effectively be set up with
a default of allow if desired.
<br/><br/>
To simulate a default of allow, the admin can create a group for each
of the dimensions in our authorization model (user, action, etc)
that includes all elements of that dimension.
Thus there would be an AllUsers role, an AllActions task,
an AllObjects domain, etc.
The admin then creates a rule that includes all of these groups
with <i>allow</i> set to true and <i>priority</i> set to zero.
Since the rule has been defined to include all elements of every dimension,
it will always match every operation,
so there will never be a case where there are no matches and the
system default of deny is used.
Assuming all other <i>priority</i> values are greater than zero,
this rule will be the lowest priority,
so it will only have an effect if there are no other matches,
and thus it acts as the default.
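A sketch of that catch-all rule, assuming the all-inclusive groups have
been created and populated as described (AllPeriods and AllAreas are
assumed analogues for the remaining dimensions):
<pre name="hlcode" class="sql"
>-- lowest-priority rule matching every operation
insert into auth(allow, priority, user, action, object, period, area)
values (true, 0, 'AllUsers', 'AllActions', 'AllObjects', 'AllPeriods', 'AllAreas');
</pre>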
<br/><br/>
As described above, there is one more potential ambiguity to resolve:
what happens if there are two rules with the same <i>priority</i>
but opposite <i>allow</i> values?
(Two rules with the same <i>priority</i> and the same <i>allow</i>
are not a problem, as they both give the same result.)
We resolve this ambiguity by defining the <i>denial</i> records to
take precedence over the <i>grant</i> records when they have
the same <i>priority</i> value.
This definition reduces nicely to the desired behavior for the
simplest denial+grant case when all records have the same priority.
<br/><br/>
Our authorization function thus looks for all matching records in the
<i>auth</i> table, sorts first by <i>priority</i> then by <i>allow</i>,
picks the first one, and uses its <i>allow</i> value to determine
whether to allow the operation.
If no matching records are found, the operation is not allowed.
<br/><br/>
Ignoring for now the more complicated portions of the WHERE clause
for selecting time and location,
here is our SQL statement for determining if an operation is authorized:
<pre name="hlcode" class="sql"
>-- The single selected value is true if authorized; if false or no records, not authorized
select allow from auth where
(user={user} or user in (select role from role_closure where user={user})) and
(action={action} or action in (select task from task_closure where action={action})) and
(object={object} or object in (select domain from domain_closure where object={object}))
order by priority desc, allow asc
limit 1;
</pre>
Adding prioritization like this introduces a new concept to the authorization
model, but it provides a good amount of additional power relative
to the extra mental load required to understand the model.
However, creating well-structured rules using prioritization
is trickier than it seems at first glance.
It has the same essential problem as for the blacklist approach
described above:
mistakes in setting up
the conceptual layers of the different levels of prioritization
can result in unexpected security holes.
If you can figure out how to set up your authorizations using grants
only, without denials, you should do that.
But if the grant-only model is not sufficient,
then adding prioritization as described in this section is a reasonable
way to take the model to the next level of power -
just remember that you have to be more careful in how you
set up your rules.
<h3><a name="summary">Summary</a></h3>
With the addition of prioritization in the previous section,
our authorization model is complete. Let's review the complete model.
<ul>
<li>There are two kinds of dimensions: discrete and continuous.
<li>There are five dimensions: user, action, object, time and location.
<li>User, action and object are discrete;
time is continuous;
location can be either discrete or continuous,
depending on how the system defines it.
<li>Additional dimensions can be added if necessary, following
the pattern of the existing dimensions.
<li>The handling of every discrete dimension is completely symmetrical
with every other discrete dimension
(unless session-based role activation is included, in which
case the user dimension is a little different);
the handling of each continuous dimension is close to completely
symmetrical with the other continuous dimensions;
and there is a high level of symmetry between the discrete and
the continuous dimensions.
<li>The application passes a value for each dimension
to the authorization function.
This collection of dimension values is the operation for which
the application is requesting authorization.
The authorization function returns true if that operation is authorized,
false if not.
<li>For each continuous dimension, there is a range that serves as the basic
match:
a period for time, an area for location.
<li>For each dimension, there is a grouping mechanism:
role for user group, task for action group, domain for object group,
schedule for period group, region for area or location group.
<li>The grouping mechanism supports a hierarchy of groups, or more generally
a (directed acyclic) graph of groups.
<li>There is a set of rules that is used to determine whether an operation
is authorized.
Each rule includes a set of comparison values, one for each dimension,
a priority,
and an <i>allow</i> flag that tells whether that rule specifies
that authorization for a matching operation should be granted or denied.
<li>To determine if a request is authorized,
take the value for each dimension in the request,
collect the closure of the groups for that value,
and collect the records in which each dimension of the grant matches
any of the items in the closure for that dimension.
Pick the record with the highest priority, giving preference to
<i>deny</i> records over <i>grant</i> records, and use the <i>allow</i>
value of that record to determine whether to authorize or deny
the operation.
If no matching records are found, the operation is denied.
</ul>
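The resolution procedure in that last bullet can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the system described above; the record layout and the <code>closures</code> helper are my assumptions.

```python
def authorize(request, rules, closures):
    """Return True if the requested operation is authorized.

    request  -- dict mapping each dimension name to the request's value,
                e.g. {"user": "alice", "action": "delete", "object": "doc1"}
    rules    -- list of dicts, each with one comparison value per dimension,
                a numeric "priority", and a boolean "allow" flag
    closures -- function (dimension, value) -> set containing the value
                plus every group whose closure includes it
    """
    # A rule matches when, for every dimension, its comparison value is in
    # the closure of the request's value for that dimension.
    matches = [r for r in rules
               if all(r[dim] in closures(dim, val)
                      for dim, val in request.items())]
    if not matches:
        return False  # system default: deny
    # Highest priority wins; on a priority tie, deny (allow=False)
    # takes precedence over grant (allow=True).
    best = min(matches, key=lambda r: (-r["priority"], r["allow"]))
    return best["allow"]
```

A lowest-priority match-everything grant, as described earlier, then acts as the default whenever no other rule matches.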
This conceptual model is no longer trivial, but the above rules are
still relatively concise and easy to understand.
The model is general enough and powerful enough that it should be
suitable for a wide variety of applications.
<br/><br/>
In our model the application passes in a set of values to the authorization
function, which uses its abstractions (in the form of groups) and rules
(in the form of prioritization) to determine whether or not to grant
permission for an operation.
If we need more power,
the application can pass in additional information, whether it is
additional attribute information about the user, the environment, or
other aspects of the operation,
and the authorization system can apply even more complex rules.
This is the approach used by
Attribute-Based Access Control,
with a rules engine used in place of the mechanisms described here.
<br/><br/>
<h2>Git Rebase Across Many Commits</h2>
<i>2012-04-30</i>
<br/><br/>
Not all git merge conflicts are real.
<h3>Contents</h3>
<ul>
<li><a href="#scenario">The Scenario</a>
<li><a href="#problem">The Problem</a>
<li><a href="#solution">The Solution</a>
</ul>
<a name="scenario"></a>
<h3>The Scenario</h3>
In both my personal and my work projects I prefer to use
<a href="http://book.git-scm.com/4_rebasing.html"><i>git rebase</i></a>
to keep my commit histories simple and readable.
To make this work in a team setting, we never work on the master
branch, instead always working on a feature branch in our local repositories.
Our process flow looks something like this:
<br/>
<pre name="hlcode" class="bash"
>$ git branch feature #create the working branch
$ git checkout feature #do all development work on that branch
#Edit files, etc.
$ git commit -m "Implement Feature"
#Repeat the above as desired during development.
#When ready to merge to master, do the following:
$ git checkout master
$ git pull #update master from shared repository
$ git checkout feature
$ git rebase master #optionally with -i if squashing is desired
$ git checkout master
$ git merge feature
$ git push origin master
$ git branch -d feature
</pre>
Because we never use our local master branch for development, the
<a href="http://book.git-scm.com/3_distributed_workflows.html">
<i>git pull</i></a> on master is always a
<a href="http://nathaniel.themccallums.org/2010/10/18/using-git-fast-forward-merging-to-keep-branches-in-sync/">fast-forward merge</a>.
Likewise, because we have just rebased the feature branch against the master
right before we merge that feature branch back into master, that merge is also
always a fast-forward merge.
Looking at it another way, we don't have any merge conflicts when
updating or merging master because we resolve all of the merge
conflicts when we rebase the feature branch against the latest master.
<a name="problem"></a>
<h3>The Problem</h3>
At work, we have a large codebase and a handful of active developers who
typically merge feature branches to the master using the above workflow
multiple times each day. Sometimes somebody has a feature branch that
takes a long time to finish, so that between the time that branch was
started and the time it is ready to go into master, there may
have been 40 or 50 other commits made to master.
In this situation we occasionally rebase our local
feature branch against the latest master during feature
development, but inevitably there are times when a large rebase
across many commits ends up being done.
<br/><br/>
Even if there are many commits on the master branch,
if none of those commits touched any of the same code as the commits
on the feature branch, then there should be no merge conflicts when
rebasing the feature branch against the updated master branch.
However, in my experience this has not always been the case.
Sometimes <i>git rebase</i> reports merge conflicts when I think there
should not be any.
Since I don't generally know exactly what code the other team members have
edited, I can't immediately tell if the merge conflicts make sense.
<br/><br/>
The normal advice for how to handle merge conflicts is to edit the named
file, look for the conflict markers, inspect the conflicting code fragments,
determine what to keep, edit out what is not being kept along with the
conflict markers, <i>git add</i> the repaired file, and
<i>git rebase --continue</i> to let it tell you about the next merge conflict.
<br/><br/>
That's a lot of work, and it might all be completely unnecessary.
<a name="solution"></a>
<h3>The Solution</h3>
It seems that git sometimes just gets confused when doing a rebase across
a large number of commits.
Sometimes if you rebase in smaller steps, git will happily rebase each
smaller step with no merge conflicts, until you have stepped all the way
up to the latest master, at which point your rebase is done.
<br/><br/>
You could rebase against every single commit and work your way up to master,
but that, too, is a lot of work.
Here's what I do when the initial rebase of the feature branch against
the latest master tells me there are merge conflicts.
<br/><br/>
When the initial <i>git rebase</i> reports a merge conflict,
I immediately do <i>git rebase --abort</i> to undo that rebase attempt.
Using <i>gitk --all</i> to view the commit tree, I can see
the master branch and the commit at which my feature branch
diverges from it. I then select a commit on the master branch
about halfway between those two points.
I copy the commit ID and paste it into a rebase command that looks
something like this:
<pre name="hlcode" class="bash"
>$ git rebase 8bc85584989e4435c2d98b13447bcab37648ba7f
</pre>
If this rebase reports no merge conflicts, then I try rebasing
against master and repeat the process.
<br/><br/>
If there are merge conflicts, then I abort the rebase and pick another
commit halfway again toward the branch point.
I repeat this until either the rebase succeeds or I am trying to
rebase across a single commit.
At that point, if there are still merge conflicts, they are real
and I address them in the normal way.
Since the conflict is only across a single commit, it is easier to
see the cause of the conflict and to resolve it.
<br/><br/>
After resolving the conflict across that one commit,
I go back to the first step and try rebasing against master again,
repeating the process.
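The whole procedure is essentially a binary search for the first commit whose rebase genuinely conflicts. Here is an abstract sketch in Python; <code>conflicts_between</code> stands in for "attempt the rebase, then <i>git rebase --abort</i> on failure", and all of the names here are mine, not git commands.

```python
def rebase_in_steps(commits, conflicts_between):
    """Rebase a feature branch across `commits` in binary-halving steps.

    commits -- master-branch commits ordered from the feature branch's
               branch point (index 0) up to the latest master (last index)
    conflicts_between(lo, hi) -- stand-in for "rebasing from commits[lo]
               onto commits[hi] reports a merge conflict"

    Returns the commits whose conflicts were real, i.e. still conflicted
    when crossed as a single step and so had to be resolved by hand.
    """
    real_conflicts = []
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        step = hi
        # On a conflict, abort and retry against a commit half as far away.
        while conflicts_between(lo, step) and step > lo + 1:
            step = lo + (step - lo) // 2
        if step == lo + 1 and conflicts_between(lo, step):
            real_conflicts.append(commits[step])  # real conflict: resolve it
        lo = step  # this step rebased cleanly (or was resolved); keep going
    return real_conflicts
```

In a real repository, each <code>conflicts_between</code> probe corresponds to a <i>git rebase &lt;commit&gt;</i> followed by <i>git rebase --abort</i> if it fails.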
<br/><br/>
I have followed this process a number of times.
In the majority of cases, after binary-dividing the commits
a few times and stepping through them piecemeal,
I reach the latest master without ever having to resolve
any conflicts.
The other times I typically have to resolve one or two small conflicts,
after which I can rebase against master.
<br/><br/>
The next time you do a rebase across more than one commit and git
tells you there are merge conflicts, try this approach.
You might save yourself a lot of work.
<br/><br/>
<h2>Levels of Expertise</h2>
<i>2011-12-08</i>
<br/><br/>
An attempt to improve the objectivity of skill self-ratings.
<h3>Contents</h3>
<ul>
<li><a href="#discussion">Discussion</a>
<li><a href="#scale">Scale</a>
<li><a href="#references">References</a>
</ul>
<a name="discussion"></a>
<h3>Discussion</h3>
We are often asked to rate things on a scale, typically 1 to 5 or 1 to 10.
Rarely is there an attempt to define what those different numbers mean.
From a statistician's point of view, this makes the values useful for the
sole purpose of comparing a single individual's ratings against other ratings
of that individual.
In particular, without a good definition of what the various levels mean,
I don't see how there can be any effective communication from one person to
another of the meaning of such a rating.
<br/><br/>
When my doctor asks me to tell him how much something hurts on a scale
of 1 to 10, I have no idea what information he expects to get when I say
"3" or "7".
<br/><br/>
I once asked an acquaintance to rate, on a scale of 1 (bad) to 10 (good),
a movie he had just seen. He said it was a 9. I was suspicious of this
answer, so I asked him how he would rate Star Wars, which I knew to be
his all-time favorite movie, on the same 1-to-10 scale. He said 12.
<br/><br/>
I personally consider it an aspect of innumeracy, but people often try to
emphasize something by using numbers that are outside of the valid range.
We may chuckle when Nigel says he likes his amp better because it
<a href="http://www.spinaltapfan.com/atozed/TAP00160.HTM">
goes to 11</a>, but how often have you heard someone talking in all
seriousness about putting in a "110% effort"?
What does that actually
<a href="http://www.bepress.com/jqas/vol7/iss2/2/">mean</a>?
How would you know if someone were
<a href="http://www.fakingnews.com/2010/05/sc-strikes-down-demands-of-110-percent-from-employees-as-unconstitutional/">putting in</a>
110% versus 100%?
If 110% is a valid number, then presumably
<a href="http://blog.moneysavingexpert.com/2011/07/15/you-cannot-give-110-effort-%E2%80%93-an-explosion-of-pent-up-nerd-rage/">so is 120%</a>,
so anyone
suggesting a <a href="http://edge.ebaumsworld.com/mediaFiles/picture/460723/982951.png">mere 110%</a>
is clearly not asking for enough effort.
<br/><br/>
People tend to
<a href="http://www.apa.org/monitor/feb03/overestimate.aspx">overestimate</a>
<a href="http://en.wikipedia.org/wiki/Illusory_superiority">how good</a>
they are at all sorts of things,
including cognitive, social and physical skills.
If we all overrate ourselves by the same amount, I suppose that could all cancel
out and you could still compare people's ratings -
but without knowing <i>a priori</i>
what their ratings should be, we don't know how much they might be
overrating themselves.
<br/><br/>
When people consider their own expertise, it is common for those with less
expertise to overvalue themselves more than people with more expertise.
With more expertise comes more awareness of what one could do better.
Einstein
<a href="http://www.notable-quotes.com/e/einstein_albert.html">said</a>,
"As our circle of knowledge expands, so does the circumference of darkness surrounding it."
Relative beginners easily fall into the
<a href="http://www2.merriam-webster.com/cgi-bin/mwdictsn?va=sophomoric">Sophomore Illusion</a>
of thinking they
know a lot because the circumference of their knowledge is not yet large
enough for them to recognize the size of the surrounding darkness.
<br/><br/>
In 1989, psychologist
<a href="http://www.psy.cmu.edu/faculty/hayes/index.html">John Hayes</a>
at <a href="http://www.cmu.edu/">Carnegie Mellon University</a>
identified what is now called the "ten-year rule"
(although there are
<a href="http://blog.enkerli.com/2008/12/23/expertise-quest/">earlier commenters</a>,
including <a href="http://www.cs.cmu.edu/simon/">Herbert Simon</a>,
who was also at CMU).
As
<a href="http://www.its.caltech.edu/~len/">Leonard Mlodinow</a>
says in
"<a href="http://books.google.com/books?id=UJxRLCq9l3IC">The Drunkard's Walk</a>",
"Experts often speak of the
'<a href="http://www.selfgrowth.com/articles/10-Year_Rule_to_Become_an_Expert.html">ten</a>-<a href="http://rogercostello.wordpress.com/category/10-year-rule-for-exceptional-performance/">year</a>
<a href="http://creativity.netslova.ru/Ten-year_rule.html">rule</a>,'
meaning that it takes at least a decade of hard work, patience and striving
to become highly successful in most endeavors." (links mine)
The ten-year rule is related to the idea that it takes about 10,000 hours
of practice at something to become an expert; with 5 hours of practice
per business day and 200 business days per year,
it would take ten years to rack up that many hours.
If you find yourself thinking how wonderfully expert you are in something
that you have practiced for only a few years, perhaps you should
consider the ten-year rule and temper your evaluation.
<br/><br/>
Given that people are so bad at these ratings, it seems to me that the only
way to get any useful information from someone when asking this kind of
self-rating question is to have an objective definition
of what each level means.
<br/><br/>
One way to think about a scale is by how many people fall into each level.
There are currently
<a href="http://www.nytimes.com/2011/11/01/world/united-nations-reports-7-billion-humans-but-others-dont-count-on-it.html">7 billion people</a>
in the world,
or almost 10 to the 10th power.
This conveniently maps to a logarithmic scale from 0 to 10,
allowing us to define eleven levels, starting with level 0
containing everyone (rounding the world population up to 10 billion),
with each higher level containing one tenth
as many people as the level just below it.
If the descriptions of a level are hard to interpret,
perhaps the size of that level will help give an indication
of whether a person should be rated there.
<br/><br/>
Years ago, during a job interview, I was asked to rate my level
of expertise in various subjects, such as programming languages
and development tools.
This was not an unusual question; I had been asked it
before and have been asked it since.
What was different that time was that the interviewer included
a scale with some relatively objective descriptions for determining
level of expertise.
I rather liked the scale, so
although I don't recall the exact definition of his levels,
I have tried to reproduce that concept here,
using descriptions somewhat similar to those given by that interviewer.
Unfortunately, I don't remember who introduced that scale
to me, so I am unable to give credit.
<br/><br/>
There are many reasons one might want a scale of expertise,
including rating potential employees or creating a summary
of the amount of expertise within a company.
The scale I present here is intended to be very general;
given its logarithmic nature that can include the
entire world population, it is capable of allowing comparison of
expertise across everyone in the world.
You might think that would make it suboptimal for
rating (potential) employee expertise,
but I think there are enough levels to make it useful for that purpose.
<a name="scale"></a>
<h3>Scale</h3>
The scale below includes the following columns:
<ul>
<li>Level: a number for the level, from 0 to 10,
with 10 being the highest level of expertise.
<li>Name: a name for the level.
These are taken from a set of expertise level names proposed by the
<a href="https://we.riseup.net/tsolife+tsolife-goes-ruby/levels-of-expertise">
Traveling School of Life</a>.
My use of them probably doesn't quite match their intent,
but I liked the names and thought the ten words matched my
levels pretty well, so I applied them to my levels
and added "ignorant" for level 0.
<li>Description: a brief description of the level.
The descriptions are worded as if for a technical tool;
for application to other areas or concepts, modify accordingly.
Comments referring to companies assume a large company (10,000+ people)
with large divisions (1000+ people);
being a company-wide guru in a company with 100 people
might not get you past level 6.
<li>Size: the approximate number of people expected to be at that level
worldwide.
As mentioned above, this is a simple logarithmic scale.
The number of people in a level is 10<sup>10-L</sup> where
L is the level number.
<li>Practice: the approximate amount of practice that could be required
to reach that level of expertise.
Putting in that many hours does not guarantee reaching that level,
and reaching that level does not necessarily require
putting in that many hours.
The conversion factors are 1,000 hours per year or 5 hours per day.
</ul>
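The Size and Practice columns follow directly from the factors just described. A quick Python check of the arithmetic (the function names are mine):

```python
def level_size(level):
    """Approximate worldwide population at a given expertise level (0-10):
    a logarithmic scale with 10**(10 - L) people at level L."""
    return 10 ** (10 - level)

def practice_years(hours, hours_per_year=1000):
    """Convert practice hours to years, at roughly 1,000 practice hours
    per year (5 hours per business day, 200 business days per year)."""
    return hours / hours_per_year
```

For example, <code>level_size(7)</code> gives 1,000 people worldwide at the "accomplished" level, and <code>practice_years(10000)</code> gives the ten years of the ten-year rule.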
All of these different factors are rough estimates,
not intended as absolutes but merely as guidelines to help
people rank themselves in a way that allows for more meaningful results.
I don't have any research to show how well my guesses about
Description, Size and Practice correlate;
if anyone knows of something along those lines,
that would be interesting.
<br/><br/>
<a name="scale-table"></a>
<table class="scale-table" border=1>
<tr>
<th>Level</th>
<th>Name</th>
<th width="55%">Description</th>
<th>Size</th>
<th>Practice</th>
</tr>
<tr><td>0</td>
<td>ignorant</td>
<td>I have never heard of it.</td>
<td>10,000,000,000</td>
<td>none</td>
</tr>
<tr><td>1</td>
<td>interested</td>
<td>I have heard a little about it, but don't know much.</td>
<td>1,000,000,000</td>
<td>1 hour</td>
</tr>
<tr><td>2</td>
<td>pursuing</td>
<td>I have read an article or two about it and understand the basics
of what it is, but nothing in depth.</td>
<td>100,000,000</td>
<td>1 day (5 hours)</td>
</tr>
<tr><td>3</td>
<td>beginner</td>
<td>I have read an in-depth article, primer, or how-to book,
and/or have played with it a bit.</td>
<td>10,000,000</td>
<td>1 week (25 hours)</td>
</tr>
<tr><td>4</td>
<td>apprentice</td>
<td>I have used it for at least a few months and have successfully
completed a small project using it.</td>
<td>1,000,000</td>
<td>3 months (250 hours)</td>
</tr>
<tr><td>5</td>
<td>intermediate</td>
<td>I have used it for a year or more on a daily or regular basis,
and am comfortable using it in moderately complex projects.</td>
<td>100,000</td>
<td>1 year (1,000 hours)</td>
</tr>
<tr><td>6</td>
<td>advanced</td>
<td>I have been using it for many years, know all of the basic aspects,
and am comfortable using it as a key element in complex projects.
People in my group come to me with their questions.</td>
<td>10,000</td>
<td>5 years (5,000 hours)</td>
</tr>
<tr><td>7</td>
<td>accomplished</td>
<td>I am a local expert, with ten or more years of solid experience.
People in my division come to me with their questions.</td>
<td>1,000</td>
<td>10 years (10,000 hours)</td>
</tr>
<tr><td>8</td>
<td>master</td>
<td>I am a company-wide guru with twenty or more years of experience;
people from other divisions come to me with their questions.</td>
<td>100</td>
<td>20 years (20,000 hours)</td>
</tr>
<tr><td>9</td>
<td>grandmaster</td>
<td>I am a recognized international authority on it.</td>
<td>10</td>
<td>30 years (30,000 hours)</td>
</tr>
<tr><td>10</td>
<td>great-grandmaster</td>
<td>I created it, and am the number 1 expert in the world.</td>
<td>1</td>
<td>50 years (50,000 hours)</td>
</tr>
</table>
<a name="references"></a>
<h3>References</h3>
Other scales of expertise:
<ul>
<li>Ted Neward describes four levels in his
<a href="http://blogs.tedneward.com/2008/08/14/The+NeverEnding+Debate+Of+Specialist+V+Generalist.aspx">
post of August 14</a>:
Apprentice, Journeyman, Master, Adept.
<li><a href="http://www.sld.demon.co.uk/dreyfus.pdf">The Dreyfus model of skill acquisition</a>:
Novice, (Advanced) Beginner, Competent, Proficient, Expert.
<li><a href="http://www.performancemattersinc.com/posts/stages-of-expertise/">
Paul Schempp's take</a> on Dreyfus's five levels,
with "Capable" rather than "Advanced Beginner".
<li>The <a href="http://en.wikipedia.org/wiki/Four_stages_of_competence">
Four Stages of Competence</a> of Thomas Gordon:
Unconscious Incompetence, Conscious Incompetence,
Conscious Competence, Unconscious Competence.
And how they might apply to
<a href="http://devthought.com/2009/02/24/the-four-stages-of-programming-competence/">programming</a>.
</ul>
Other articles:
<ul>
<li><a href="http://norvig.com/21-days.html">
Teach Yourself Programming in Ten Years</a>,
by Peter Norvig, 2001.
<li><a href="http://www.ascue.org/files/proceedings/2009/p52.pdf">
From Novice to Expert: Harnessing the Stages of
Expertise Development in the Online World</a>,
by Douglas A Kranach, in the 2009 ASCUE Proceedings.
<li><a href="http://www.scribd.com/doc/52366153/16/Expertise">
A College Student's Guide to Computers in Education</a>,
by Dave Moursund;
Chapter 3,
"Expertise and Problem Solving", page 25,
with a discussion of expertise as related to hours of study and practice.
<li>A 2006 excerpt from
<a href="http://seedmagazine.com/content/article/how_to_get_to_carnegie_hall/">
Jonah Lehrer</a>
on the importance of practice for Mozart and Tiger Woods.
</ul>
<br/><br/>
<h2>Debugging Scala Parser Combinators</h2>
<i>2011-07-28</i>
<br/><br/>
Two simple mechanisms for debugging parsers written using Scala's parser combinators.<br />
<br />
<h3>Contents</h3><ul><li><a href="#intro">Introduction</a>
<li><a href="#example">Example Parser</a>
<li><a href="#calling-parsers">Calling Individual Parsers</a>
<li><a href="#tracing">Tracing</a>
<li><a href="#updated-example">Updated Example</a>
</ul>
<a name="intro"></a>
<h3>Introduction</h3>In a recent comment on my 2008
<a href="http://jim-mcbeath.blogspot.com/2008/09/scala-parser-combinators.html">
blog post</a>
about Scala's parser combinators,
a reader asked how one might go about debugging such a parser.
As
<a href="http://www.quanttec.com/fparsec/users-guide/debugging-a-parser.html">
one post</a>
says,
"Debugging a parser implemented with the help of a combinator library
has its special challenges."
You may have trouble
<a href="http://lorgonblog.wordpress.com/2007/12/12/monadic-parser-combinators-part-seven/">
setting breakpoints</a>,
and stack traces can be
difficult to interpret.
<br/><br/>
The two techniques I show here may not provide you with the kind of
visibility you might be used to when single-stepping through problem code,
but I hope they provide at least a little more visibility than you might
otherwise have.
<a name="example"></a>
<h3>Example Parser</h3>
As an example parser I will use an integer-only version of the
four-function arithmetic parser I
built for my 2008 parser combinator post.
The code consists of a set of case classes to represent the parsed results
and a parser class that contains the parsing rules and a few helper methods.
You can copy this code into a file and either compile it or load it into
the Scala <a href="http://www.scala-lang.org/node/2097">REPL</a>.
<br/><br/>
<pre name="hlcode" class="scala"
>import scala.util.parsing.combinator.syntactical.StandardTokenParsers
sealed abstract class Expr {
def eval():Int
}
case class EConst(value:Int) extends Expr {
def eval():Int = value
}
case class EAdd(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval + right.eval
}
case class ESub(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval - right.eval
}
case class EMul(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval * right.eval
}
case class EDiv(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval / right.eval
}
case class EUMinus(e:Expr) extends Expr {
def eval():Int = -e.eval
}
object ExprParser extends StandardTokenParsers {
lexical.delimiters ++= List("+","-","*","/","(",")")
def value = numericLit ^^ { s => EConst(s.toInt) }
def parens:Parser[Expr] = "(" ~> expr <~ ")"
def unaryMinus:Parser[EUMinus] = "-" ~> term ^^ { EUMinus(_) }
def term = ( value | parens | unaryMinus )
def binaryOp(level:Int):Parser[((Expr,Expr)=>Expr)] = {
level match {
case 1 =>
"+" ^^^ { (a:Expr, b:Expr) => EAdd(a,b) } |
"-" ^^^ { (a:Expr, b:Expr) => ESub(a,b) }
case 2 =>
"*" ^^^ { (a:Expr, b:Expr) => EMul(a,b) } |
"/" ^^^ { (a:Expr, b:Expr) => EDiv(a,b) }
case _ => throw new RuntimeException("bad precedence level "+level)
}
}
val minPrec = 1
val maxPrec = 2
def binary(level:Int):Parser[Expr] =
if (level>maxPrec) term
else binary(level+1) * binaryOp(level)
def expr = ( binary(minPrec) | term )
def parse(s:String) = {
val tokens = new lexical.Scanner(s)
phrase(expr)(tokens)
}
def apply(s:String):Expr = {
parse(s) match {
case Success(tree, _) => tree
case e: NoSuccess =>
throw new IllegalArgumentException("Bad syntax: "+s)
}
}
def test(exprstr: String) = {
parse(exprstr) match {
case Success(tree, _) =>
println("Tree: "+tree)
val v = tree.eval()
println("Eval: "+v)
case e: NoSuccess => Console.err.println(e)
}
}
//A main method for testing
def main(args: Array[String]) = test(args(0))
}
</pre>
In the <code>ExprParser</code> object, the lines up to and including the
definition of the <code>expr</code> method define the parsing rules,
whereas the methods from <code>parse</code> onwards are helper methods.
<a name="calling-parsers"></a>
<h3>Calling Individual Parsers</h3>
In our example parser we can easily ask it to parse a string by calling
our <code>ExprParser.test</code> method, which parses the string using our
<code>parse</code> method, prints the resulting parse, and
(if the parse was successful) evaluates the parse tree and prints that value.
<br/><br/>
The last line of <code>parse</code>
parses a string using our expression parser:
<pre name="hlcode" class="scala"
>phrase(expr)(tokens)
</pre>
<code>phrase</code> is a method in
<a href="http://www.scala-lang.org/api/current/scala/util/parsing/combinator/syntactical/StandardTokenParsers.html">
<code>StandardTokenParsers</code></a>
that parses an input stream using the specified parser.
The only thing special about our <code>expr</code> method is that we
happen to have selected it as our top-level parser -
but we could just as easily have picked one of our other parsers
as our top-level parser.
<br/><br/>
Let's add another version of the <code>test</code> method that lets us
specify which parser to use as the top-level parser.
We want to print out the results in the same way as for the existing
<code>test</code> method, so we first
refactor that existing method:
<pre name="hlcode" class="scala"
>def test(exprstr: String) =
printParseResult(parse(exprstr))
def printParseResult(pr:ParseResult[Expr]) = {
pr match {
case Success(tree, _) =>
println("Tree: "+tree)
val v = tree.eval()
println("Eval: "+v)
case e: NoSuccess => Console.err.println(e)
}
}
</pre>
Now we add a new <code>parse</code> method that accepts a parser as
an argument, and we call that from our new <code>test</code> method:
<pre name="hlcode" class="scala"
>def parse(p:Parser[Expr], s:String) = {
val tokens = new lexical.Scanner(s)
phrase(p)(tokens)
}
def test(p:Parser[Expr], exprstr: String) =
printParseResult(parse(p,exprstr))
</pre>
We can run the Scala REPL, load our modified file using the ":load" command,
then manually call the top-level parser by calling our <code>test</code>
method.
To reduce typing, we import everything from <code>ExprParser</code>.
In the examples below, text in <b>bold</b> is what we type,
the rest is printed by the REPL.
<pre name="hlcode" class="scala"
>scala> <b>import ExprParser._</b>
import ExprParser._
scala> <b>test("1+2")</b>
Tree: EAdd(EConst(1),EConst(2))
Eval: 3
scala> <b>test("1+2*3")</b>
Tree: EAdd(EConst(1),EMul(EConst(2),EConst(3)))
Eval: 7
scala> <b>test("(1+2)*3")</b>
Tree: EMul(EAdd(EConst(1),EConst(2)),EConst(3))
Eval: 9
</pre>
We can also call the <code>test</code> method that takes a parser as an
argument, allowing us to specifically test one particular parsing rule
at a time.
If we pass in <code>expr</code> as the parser, we will get the same
results as above;
but if we pass in a different parser, we may get different results.
<pre name="hlcode" class="scala"
>scala> <b>test(expr,"1+2*3")</b>
Tree: EAdd(EConst(1),EMul(EConst(2),EConst(3)))
Eval: 7
scala> <b>test(binary(1),"1+2*3")</b>
Tree: EAdd(EConst(1),EMul(EConst(2),EConst(3)))
Eval: 7
scala> <b>test(binary(2),"1+2*3")</b>
[1.2] failure: ``/'' expected but `+' found
1+2*3
^
scala> <b>test(parens,"1+2")</b>
[1.1] failure: ``('' expected but 1 found
1+2
^
scala> <b>test(parens,"(1+2)")</b>
Tree: EAdd(EConst(1),EConst(2))
Eval: 3
scala> <b>test(parens,"(1+2)*3")</b>
[1.6] failure: end of input expected
(1+2)*3
^
</pre>
<a name="tracing"></a>
<h3>Tracing</h3>
If you have a larger parser that is not behaving and you are not quite
sure where the problem lies, it can be tedious to directly call
individual parsers until you find which one is misbehaving.
Being able to trace the progress of the whole parser running on an
input known to cause the problem might be helpful, but sprinkling
<code>println</code> statements throughout your parser can be tricky.
This section provides an approach that allows you to do some tracing
with minimal changes to your code.
The output can get pretty verbose, but
at least this will give you a starting point from which you may be
able to devise your own improved debugging.
<br/><br/>
The idea behind this approach is to wrap some or all of the individual
parsers in a debugging parser that delegates its <code>apply</code> action
to the wrapped parser, while also printing out some debugging information.
The <code>apply</code> action is called during the act of parsing.
<br/><br/>
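The delegate-and-trace idea can be sketched independently of the parser library. Here is a generic illustration (not code from the parser itself) of a wrapper that forwards to the wrapped function and prints what happened, the same way <code>Wrap</code> below forwards to the wrapped parser's <code>apply</code>:

```scala
// Generic sketch of the Wrap idea: delegate to the wrapped function,
// printing the argument and result on the way through.
def traced[A, B](name: String)(f: A => B): A => B = { a =>
  val b = f(a)                          // delegate, as Wrap delegates to parser.apply
  println(name + "(" + a + ") = " + b)  // the debugging output
  b
}

val double = traced("double")((x: Int) => x * 2)
println(double(3))  // prints the trace line "double(3) = 6", then "6"
```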
<b>Note:</b> this code relies on the fact that the code
for the various combinators in
the <code>Parser</code> class in Scala's
<code>StandardTokenParsers</code>
(which is implemented as an inner class in
<code>scala.util.parsing.combinator.Parsers</code>)
does not override any <code>Parser</code>
method other than <code>apply</code>.
<br/><br/>
This code could be added directly to the <code>ExprParser</code> class,
but it is presented here as a separate class to make it easier to reuse.
Add this <code>DebugStandardTokenParsers</code> class
to the file containing <code>ExprParser</code>.
<pre name="hlcode" class="scala"
>trait DebugStandardTokenParsers extends StandardTokenParsers {
class Wrap[+T](name:String,parser:Parser[T]) extends Parser[T] {
def apply(in: Input): ParseResult[T] = {
val first = in.first
val pos = in.pos
val offset = in.offset
val t = parser.apply(in)
println(name+".apply for token "+first+
" at position "+pos+" offset "+offset+" returns "+t)
t
}
}
}
</pre>
The <code>Wrap</code> class provides the hook into the <code>apply</code>
method that we need in order to print out our trace information as the
parser runs.
Once this class is in place, we modify <code>ExprParser</code> to
inherit from it rather than from <code>StandardTokenParsers</code>:
<pre name="hlcode" class="scala"
>object ExprParser extends DebugStandardTokenParsers { ... }
</pre>
So far we have not changed the behavior of the parser, since we have not
yet wired in the <code>Wrap</code> class.
To do so, we can take any of the existing parsers and wrap it in a
<code>new Wrap</code>.
For example, with the top-level <code>expr</code> parser
we could do this,
with the added code highlighted in <b>bold</b>:
<pre name="hlcode" class="scala"
>def expr = <b>new Wrap("expr",</b> ( binary(minPrec) | term ) <b>)</b>
</pre>
We can make this a bit easier to edit and read by using implicits.
In <code>DebugStandardTokenParsers</code> we add this method:
<pre name="hlcode" class="scala"
>implicit def toWrapped(name:String) = new {
def !!![T](p:Parser[T]) = new Wrap(name,p)
}
</pre>
Now we can wrap our <code>expr</code> method like this:
<pre name="hlcode" class="scala"
>def expr = <b>"expr" !!!</b> ( binary(minPrec) | term )
</pre>
If you don't like using <code>!!!</code> as an operator, you are free
to pick something more to your taste, or you can leave out the implicit
and just use the <code>new Wrap</code> approach.
<br/><br/>
At this point you must modify your source code by adding the above syntax
to each parsing rule that you want to trace.
You can go through and do them all, or you can just pick out the ones
you think are the most likely culprits and wrap those.
Note that you can wrap any parser this way, including those that appear
as pieces in the middle of other parsers.
The following example shows how some of the parsers in the <code>term</code>
and <code>binaryOp</code> methods can be wrapped:
<pre name="hlcode" class="scala"
> def term = <b>"term" !!!</b> ( value | <b>"term-parens" !!!</b> parens | unaryMinus )
def binaryOp(level:Int):Parser[((Expr,Expr)=>Expr)] = {
level match {
case 1 =>
<b>"add" !!!</b> "+" ^^^ { (a:Expr, b:Expr) => EAdd(a,b) } |
<b>"sub" !!!</b> "-" ^^^ { (a:Expr, b:Expr) => ESub(a,b) }
case 2 =>
<b>"mul" !!!</b> "*" ^^^ { (a:Expr, b:Expr) => EMul(a,b) } |
<b>"div" !!!</b> "/" ^^^ { (a:Expr, b:Expr) => EDiv(a,b) }
case _ => throw new RuntimeException("bad precedence level "+level)
}
}
</pre>
Assuming we have wrapped the <code>expr</code>, <code>term</code> and
<code>binaryOp</code> methods as in the above examples, here is what the
output looks like for a few tests.
As in the previous REPL example, user input is in <b>bold</b>.
If you are using the REPL and reload the file, remember to
run <code>import ExprParser._</code> again to pick up the
newer definitions.
<pre name="hlcode" class="scala"
>scala> <b>test("1")</b>
term.apply for token 1 at position 1.1 offset 0 returns [1.2] parsed: EConst(1)
add.apply for token EOF at position 1.2 offset 1 returns [1.2] failure: ``+'' expected but EOF found
1
^
sub.apply for token EOF at position 1.2 offset 1 returns [1.2] failure: ``-'' expected but EOF found
1
^
expr.apply for token 1 at position 1.1 offset 0 returns [1.2] parsed: EConst(1)
Tree: EConst(1)
Eval: 1
scala> <b>test("(1+2)*3")</b>
term.apply for token 1 at position 1.2 offset 1 returns [1.3] parsed: EConst(1)
add.apply for token `+' at position 1.3 offset 2 returns [1.4] parsed: +
term.apply for token 2 at position 1.4 offset 3 returns [1.5] parsed: EConst(2)
add.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``+'' expected but `)' found
(1+2)*3
^
sub.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``-'' expected but `)' found
(1+2)*3
^
expr.apply for token 1 at position 1.2 offset 1 returns [1.5] parsed: EAdd(EConst(1),EConst(2))
term-parens.apply for token `(' at position 1.1 offset 0 returns [1.6] parsed: EAdd(EConst(1),EConst(2))
term.apply for token `(' at position 1.1 offset 0 returns [1.6] parsed: EAdd(EConst(1),EConst(2))
term.apply for token 3 at position 1.7 offset 6 returns [1.8] parsed: EConst(3)
add.apply for token EOF at position 1.8 offset 7 returns [1.8] failure: ``+'' expected but EOF found
(1+2)*3
^
sub.apply for token EOF at position 1.8 offset 7 returns [1.8] failure: ``-'' expected but EOF found
(1+2)*3
^
expr.apply for token `(' at position 1.1 offset 0 returns [1.8] parsed: EMul(EAdd(EConst(1),EConst(2)),EConst(3))
Tree: EMul(EAdd(EConst(1),EConst(2)),EConst(3))
Eval: 9
scala> <b>test(parens,"(1+2)")</b>
term.apply for token 1 at position 1.2 offset 1 returns [1.3] parsed: EConst(1)
mul.apply for token `+' at position 1.3 offset 2 returns [1.3] failure: ``*'' expected but `+' found
(1+2)
^
div.apply for token `+' at position 1.3 offset 2 returns [1.3] failure: ``/'' expected but `+' found
(1+2)
^
add.apply for token `+' at position 1.3 offset 2 returns [1.4] parsed: +
term.apply for token 2 at position 1.4 offset 3 returns [1.5] parsed: EConst(2)
mul.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``*'' expected but `)' found
(1+2)
^
div.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``/'' expected but `)' found
(1+2)
^
add.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``+'' expected but `)' found
(1+2)
^
sub.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``-'' expected but `)' found
(1+2)
^
expr.apply for token 1 at position 1.2 offset 1 returns [1.5] parsed: EAdd(EConst(1),EConst(2))
Tree: EAdd(EConst(1),EConst(2))
Eval: 3
</pre>
As you can see, even for these very short input strings
the output is pretty verbose.
It does, however, show you what token it is trying to parse
and where in the input stream that token is, so by paying attention
to the position and offset numbers you can see where it is backtracking.
<br/><br/>
When you have found the problem and are done debugging, you can remove
the <code>DebugStandardTokenParsers</code> class and take out all of the
<code>!!!</code> wrapping operations, or you can leave everything in place
and disable the wrapper output by changing the
definition of the implicit <code>!!!</code> operator to this:
<pre name="hlcode" class="scala"
>def !!![T](p:Parser[T]) = p
</pre>
Or, if you want to make it possible to enable debugging output later,
change <code>!!!</code> to return either <code>p</code> or
<code>new Wrap(name,p)</code> depending on some debugging configuration value.
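That last variation can be sketched generically as well. The <code>DebugConfig</code> object here is hypothetical, and the wrapper is shown as a plain function rather than a <code>Parser</code>:

```scala
// Hypothetical configuration flag controlling whether tracing happens at all.
object DebugConfig { var traceParsers = false }

// With tracing off, return the function unchanged; with it on, wrap it.
def wrap[A, B](name: String)(f: A => B): A => B =
  if (!DebugConfig.traceParsers) f
  else { a =>
    val b = f(a)
    println(name + "(" + a + ") = " + b)
    b
  }
```

In the real code, <code>!!!</code> would make the same choice, returning either <code>p</code> unchanged or <code>new Wrap(name,p)</code> based on the flag.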
<a name="updated-example"></a>
<h3>Updated Example</h3>
Below is the complete program with all of the above changes.
<pre name="hlcode" class="scala"
>import scala.util.parsing.combinator.syntactical.StandardTokenParsers
sealed abstract class Expr {
def eval():Int
}
case class EConst(value:Int) extends Expr {
def eval():Int = value
}
case class EAdd(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval + right.eval
}
case class ESub(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval - right.eval
}
case class EMul(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval * right.eval
}
case class EDiv(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval / right.eval
}
case class EUMinus(e:Expr) extends Expr {
def eval():Int = -e.eval
}
trait DebugStandardTokenParsers extends StandardTokenParsers {
class Wrap[+T](name:String,parser:Parser[T]) extends Parser[T] {
def apply(in: Input): ParseResult[T] = {
val first = in.first
val pos = in.pos
val offset = in.offset
val t = parser.apply(in)
println(name+".apply for token "+first+
" at position "+pos+" offset "+offset+" returns "+t)
t
}
}
implicit def toWrapped(name:String) = new {
def !!![T](p:Parser[T]) = new Wrap(name,p) //for debugging
//def !!![T](p:Parser[T]) = p //for production
}
}
object ExprParser extends DebugStandardTokenParsers {
lexical.delimiters ++= List("+","-","*","/","(",")")
def value = numericLit ^^ { s => EConst(s.toInt) }
def parens:Parser[Expr] = "(" ~> expr <~ ")"
def unaryMinus:Parser[EUMinus] = "-" ~> term ^^ { EUMinus(_) }
def term = "term" !!! ( value | "term-parens" !!! parens | unaryMinus )
def binaryOp(level:Int):Parser[((Expr,Expr)=>Expr)] = {
level match {
case 1 =>
"add" !!! "+" ^^^ { (a:Expr, b:Expr) => EAdd(a,b) } |
"sub" !!! "-" ^^^ { (a:Expr, b:Expr) => ESub(a,b) }
case 2 =>
"mul" !!! "*" ^^^ { (a:Expr, b:Expr) => EMul(a,b) } |
"div" !!! "/" ^^^ { (a:Expr, b:Expr) => EDiv(a,b) }
case _ => throw new RuntimeException("bad precedence level "+level)
}
}
val minPrec = 1
val maxPrec = 2
def binary(level:Int):Parser[Expr] =
if (level>maxPrec) term
else binary(level+1) * binaryOp(level)
def expr = "expr" !!! ( binary(minPrec) | term )
def parse(s:String) = {
val tokens = new lexical.Scanner(s)
phrase(expr)(tokens)
}
def parse(p:Parser[Expr], s:String) = {
val tokens = new lexical.Scanner(s)
phrase(p)(tokens)
}
def apply(s:String):Expr = {
parse(s) match {
case Success(tree, _) => tree
case e: NoSuccess =>
throw new IllegalArgumentException("Bad syntax: "+s)
}
}
def test(exprstr: String) =
printParseResult(parse(exprstr))
def test(p:Parser[Expr], exprstr: String) =
printParseResult(parse(p,exprstr))
def printParseResult(pr:ParseResult[Expr]) = {
pr match {
case Success(tree, _) =>
println("Tree: "+tree)
val v = tree.eval()
println("Eval: "+v)
case e: NoSuccess => Console.err.println(e)
}
}
//A main method for testing
def main(args: Array[String]) = test(args(0))
}
</pre>Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com5tag:blogger.com,1999:blog-7045524330253482541.post-630423692802904732011-07-19T16:49:00.000-07:002011-07-19T16:49:32.115-07:00Multithread Coroutine Scheduler<h1>Multithread Coroutine Scheduler</h1>
A scheduler that uses multiple worker threads
for continuations-based Scala coroutines.
<br/><br/>
In my recent series of posts that
<a href="http://jim-mcbeath.blogspot.com/2011/04/java-nio-complete-scala-server.html">
ended</a> with a complete Scala server
that uses continuations-based coroutines to store per-client state,
I asserted that the single-threaded scheduler implementation in that example
could relatively easily be replaced by a scheduler
that uses multiple threads.
In this post I provide a simple working example of such a
multithread scheduler.
<h3>Contents</h3>
<ul>
<li><a href="#overview">Overview</a>
<li><a href="#tasks">Managing Tasks</a>
<li><a href="#scheduler">Scheduler</a>
<li><a href="#synchronization">Synchronization</a>
</ul>
<a name="overview"></a>
<h3>Overview</h3>
We can use the standard
<a href="http://en.wikipedia.org/wiki/Thread_pool_pattern">thread-pool</a>
approach in which we have a pool
of worker threads that independently pull from a common task queue.
Java 1.5 introduced a set of classes and interfaces in the
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/package-summary.html">
<code>java.util.concurrent</code> package</a>
to support various kinds of thread pools
or potentially other task scheduling mechanisms.
Rather than writing our own, we will use an
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/Executor.html">
<code>Executor</code></a>
from that package.
<br/><br/>
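The basic pattern is small: submit <code>Runnable</code> tasks to a pool and let it manage the worker threads. A minimal sketch, separate from the scheduler classes below:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

val pool = Executors.newFixedThreadPool(2)  // two worker threads
val done = new AtomicInteger(0)
for (i <- 1 to 4)                           // submit four tasks
  pool.execute(new Runnable { def run() = done.incrementAndGet() })
pool.shutdown()                             // stop accepting new tasks
pool.awaitTermination(5, TimeUnit.SECONDS)  // wait for queued tasks to finish
println(done.get)                           // prints 4
```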
We have an additional requirement that makes our situation a little bit
more complex than the typical thread-pool: our collection of tasks includes
both tasks that are ready to run and tasks that are currently blocked
but will become ready to run at some point in the future.
<br/><br/>
We will implement a new scheduler class <code>JavaExecutorCoScheduler</code>
that maintains a list of blocked tasks and
uses a Java <code>Executor</code> to manage runnable tasks.
<br/><br/>
The updated complete source code for this post is available
on github in my <a href="https://github.com/jimmc/nioserver">nioserver</a>
project under the tag
<a href="https://github.com/jimmc/nioserver/tree/blog-executor">
blog-executor</a>.
<a name="tasks"></a>
<h3>Managing Tasks</h3>
As mentioned above, we need to deal with two kinds of tasks:
tasks that are ready to run and tasks that are blocked.
The standard
<code>Executor</code>
class allows us to submit a task for execution, but does not handle
blocked tasks.
Since we don't want to submit blocked tasks to the <code>Executor</code>,
we have to queue them up ourselves.
We have two issues to attend to:
<ol>
<li>When our scheduler is passed a task, we must put it into our own
queue of blocked tasks if it is not currently ready to run.
<li>When a previously blocked task becomes ready to run,
we must remove it from our queue of
blocked tasks and pass it to the <code>Executor</code>.
</ol>
The first issue is straightforward, as our framework already allows us to
test the blocker for a task and see if the task is ready to run.
In order to properly take care of the second issue, we will make a small
change to our framework to allow us to notice when a blocker has probably
stopped blocking so that we can run the corresponding task.
We do this by modifying our <code>CoScheduler</code> class to add
a method to notify it that a blocker has probably become unblocked:
<br/>
<pre name="hlcode" class="scala"
> def unblocked(b:Blocker):Unit
</pre>
We call this method from <code>CoQueue</code> in the two places where
we previously called <code>scheduler.coNotify</code>:
in the <code>blockingEnqueue</code> method after we have enqueued an item
to notify the scheduler that the dequeue side is probably unblocked,
and in the <code>blockingDequeue</code> method after we have dequeued an item
to notify the scheduler that the enqueue side is probably unblocked.
Those two methods in <code>CoQueue</code> now look like this:
<pre name="hlcode" class="scala"
> def blockingEnqueue(x:A):Unit @suspendable = {
enqueueBlocker.waitUntilNotBlocked
enqueue(x)
<b>scheduler.unblocked(dequeueBlocker)</b>
}
def blockingDequeue():A @suspendable = {
dequeueBlocker.waitUntilNotBlocked
val x = dequeue
<b>scheduler.unblocked(enqueueBlocker)</b>
x
}
</pre>
The implementation of <code>unblocked</code> in our default scheduler
<code>DefaultCoScheduler</code> is just a call to <code>coNotify</code>,
so the behavior of that system will remain the same as it was before we added
the calls to <code>unblocked</code>.
<br/><br/>
Because we need to ensure that all of our NIO read and write operations
are handled sequentially, we continue to manage those tasks separately
with our <code>NioSelector</code> class,
where all of the reads are executed on one thread and all of the writes
are executed on another thread.
<a name="scheduler"></a>
<h3>Scheduler</h3>
We already have a scheduler framework that defines a <code>CoScheduler</code>
class as the parent class for our scheduler implementations,
which requires that we implement the methods
<code>setRoutineContinuation</code>, <code>runNextUnblockedRoutine</code>
and the newly added <code>unblocked</code>.
<br/><br/>
In our <code>JavaExecutorCoScheduler</code>,
our <code>setRoutineContinuation</code> method is responsible for storing
or executing the task.
It checks to see if the task is currently blocked, storing it
in our list of blocked tasks if so.
Otherwise, it passes it to the thread pool (which is managed by an
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ExecutorService.html">
<code>ExecutorService</code></a>),
which takes care of managing the threads and running the task.
We define a simple case class, <code>RunnableCont</code>, to turn our task
into a <code>Runnable</code> that is usable by the pool.
<br/><br/>
Our <code>unblocked</code> method gets passed a blocker which is probably
now unblocked.
We test that, and if in fact it is still blocked we do nothing.
If it is unblocked, then we remove it from our list of blocked tasks
and pass it to the pool.
<br/><br/>
The <code>runNextUnblockedRoutine</code> method in this scheduler doesn't
actually do anything, since the pool is taking care of running everything.
We just return <code>SomeRoutinesBlocked</code> so that the caller goes
into a wait state.
<br/><br/>
In addition to the above three methods, we will have our thread pool,
a lock that we use when managing our blocked and runnable tasks,
and a set of blocked tasks waiting to become unblocked.
For this implementation we choose to use a thread pool of a fixed size,
thus the call to
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool(int)">
<code>Executors.newFixedThreadPool</code></a>.
<br/><br/>
Here is our complete <code>JavaExecutorCoScheduler</code> class:
<pre name="hlcode" class="scala"
>package net.jimmc.scoroutine
import java.lang.Runnable
import java.util.concurrent.Executors
import java.util.concurrent.ExecutorService
import scala.collection.mutable.LinkedHashMap
import scala.collection.mutable.SynchronizedMap
class JavaExecutorCoScheduler(numWorkers:Int) extends CoScheduler {
type Task = Option[Unit=>Unit]
case class RunnableCont(task:Task) extends Runnable {
def run() = task foreach { _() }
}
private val pool = Executors.newFixedThreadPool(numWorkers)
private val lock = new java.lang.Object
private val blockedTasks = new LinkedHashMap[Blocker,Task] with
SynchronizedMap[Blocker,Task]
private[scoroutine] def setRoutineContinuation(b:Blocker,task:Task) {
lock.synchronized {
if (b.isBlocked) {
blockedTasks(b) = task
} else {
pool.execute(RunnableCont(task))
coNotify
}
}
}
def unblocked(b:Blocker):Unit = {
lock.synchronized {
if (!b.isBlocked)
blockedTasks.remove(b) foreach { task =>
pool.execute(RunnableCont(task)) }
}
coNotify
}
def runNextUnblockedRoutine():RunStatus = SomeRoutinesBlocked
}
</pre>
<a name="synchronization"></a>
<h3>Synchronization</h3>
Although not necessitated by the above changes,
I added one more change to <code>CoScheduler</code>
to improve its synchronization behavior.
<br/><br/>
While exploring various multi-threading mechanisms as alternatives to
using <code>Executor</code>,
I wrote a scheduler called <code>MultiThreadCoScheduler</code>
in which I implemented my own thread pool
and in which the master thread directly
allocated tasks to the worker threads in the pool.
Although that scheduler was quite a bit larger than the one presented
above, it provided much more control over the threads, allowing me to
change the number of worker threads on the fly
and to be able to tell in my master
thread whether there were any running worker threads.
<br/><br/>
In <code>MultiThreadCoScheduler</code>,
the main thread would call <code>coWait</code>
to wait until it needed to wake up and hand out another task,
and the worker threads would call <code>coNotify</code> when they were
done processing a task and were ready to be assigned the next task.
Similarly, a call to <code>coNotify</code> would be issued whenever
a new task was placed into the task queue.
<br/><br/>
Unfortunately, Java's <code>wait</code> and
<code>notify</code> methods,
which are the calls underlying our <code>coWait</code>
and <code>coNotify</code> methods,
do not quite behave the way we would like.
If we compare those calls to the Java NIO
<code>select</code> and <code>wakeup</code> calls,
we note that if a call is made to <code>wakeup</code> <i>before</i>
a call to <code>select</code>,
the <code>select</code> call will return immediately.
The <code>wait</code>/<code>notify</code> calls do not behave this way;
if a call is made to <code>notify</code> when there is no thread waiting
in a <code>wait</code> call on that
<a href="http://www.artima.com/insidejvm/ed2/threadsynch.html">
monitor</a>, the <code>notify</code> call
does nothing, and the following call to <code>wait</code> will wait until
the next call to <code>notify</code>.
<br/><br/>
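The lost-notification behavior is easy to demonstrate with plain <code>wait</code> and <code>notify</code>. A timed wait is used here so the demo terminates; an untimed <code>wait</code> would hang forever:

```scala
val lock = new Object
lock.synchronized { lock.notify() }  // nobody is waiting: the notification is lost
val t0 = System.currentTimeMillis
lock.synchronized { lock.wait(200) } // does not return immediately; it times out instead
val elapsed = System.currentTimeMillis - t0
println(elapsed)                     // roughly 200 ms, despite the earlier notify
```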
This small difference in semantics actually makes a pretty big difference
in behavior, because it means that, when using <code>wait</code> and
<code>notify</code>, you must be concerned with which happens first.
Let's see how that works.
<br/><br/>
In a typical scenario we have a resource with a boolean state that
indicates when a thread can access that resource,
for example, a queue with a boolean state of "has some data" that indicates when
a reader thread can pull an item from the queue (and perhaps another boolean
state of "queue is full" that indicates when a writer thread can put an item
into the queue).
In the case of <code>MultiThreadCoScheduler</code>
we have a task with a "ready" flag that tells us when we can
assign that task to a worker,
and a worker with an "idle" flag that tells us when we can
assign a task to that worker.
When a task becomes ready to run, we want a thread
(other than the master, since it may be waiting)
to add the task to our queue of
tasks and then notify the master that a task is available.
Meanwhile, when the master is looking for an available task to assign
to an idle worker, it will query to
see if a task is available, and if not it will then wait until one becomes
available.
The problem sequence would be if the master checks for available tasks,
finds none, then before the master executes its wait, the non-master puts
a ready task into the queue and issues a notify to the master.
The result of this sequence would be a ready task in the queue, but a
master waiting for a notify.
<br/><br/>
When all of the synchronization is done within a single class, you can
ensure that the above problem sequencing of operations does not happen
by arranging that the code that places a ready task into the queue and
notifies the master happens within one <code>synchronized</code> block,
and the code used by the master to query the queue for a ready task and
then to wait happens within one <code>synchronized</code> block on the same
monitor.
But when dealing with subclasses, we run into the
"<a href="http://www.scala-lang.org/node/9811">inheritance anomaly</a>"
(or "inheritance-synchronization anomaly").
The essence of this problem is that the base class provides a method
that is synchronized, but the subclass would like to include more
functionality within that synchronized block.
If, as is often the case, the subclass does not have access to the monitor
being used by the base class to control its synchronization,
there is no way for it to do this.
<br/><br/>
In our case, we can implement something that is sufficient for our
current needs by
making a small change to our <code>coWait</code>
and <code>coNotify</code> methods in <code>CoScheduler</code>
so that they behave in the same manner as
<code>select</code> and <code>wakeup</code>:
if a call to <code>coNotify</code> is made before a call to <code>coWait</code>,
the call to <code>coWait</code> will return immediately.
We do this by changing the implementation of <code>coWait</code> and
<code>coNotify</code> in <code>CoScheduler</code> from this:
<pre name="hlcode" class="scala"
> def coWait():Unit = {
defaultLock.synchronized {
defaultLock.wait()
}
}
def coNotify():Unit = {
defaultLock.synchronized {
defaultLock.notify
}
}
</pre>
to this:
<pre name="hlcode" class="scala"
> private var notified = false
def coWait():Unit = {
defaultLock.synchronized {
if (!notified)
defaultLock.wait()
notified = false
}
}
def coNotify():Unit = {
defaultLock.synchronized {
notified = true
defaultLock.notify
}
}
</pre>
With the above change to our base class, our subclass no longer needs to
be concerned about the problem sequence described above,
because the call to <code>coWait</code> will return immediately if there
was a call to <code>coNotify</code> since the most recent previous call
to <code>coWait</code>.
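The changed semantics can be checked in isolation. This self-contained copy of the logic above shows that a notify issued before a wait makes the following wait return immediately instead of hanging:

```scala
// Standalone copy of the coWait/coNotify logic with the notified flag.
class NotifyFirstLock {
  private val lock = new Object
  private var notified = false
  def coWait(): Unit = lock.synchronized {
    if (!notified) lock.wait()  // skip the wait if a notify already arrived
    notified = false            // consume the notification
  }
  def coNotify(): Unit = lock.synchronized {
    notified = true
    lock.notify()
  }
}

val l = new NotifyFirstLock
l.coNotify()  // notify first...
l.coWait()    // ...and this returns immediately rather than blocking
println("coWait returned")
```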
Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com2tag:blogger.com,1999:blog-7045524330253482541.post-45002944780859077342011-06-25T07:46:00.000-07:002011-06-25T07:46:15.757-07:00Sledgehammer WordsWords are tools that we use to
clarify our concepts, express our emotions and
persuade others to our positions.
We use those tools to craft mental models which we deliver to our listener.
The better the job we do with those tools,
the more effectively we can communicate our message.
<br/><br/>
The words we use every day are our basic tools.
Like screwdrivers and pliers, these words are simple but versatile,
performing adequately for most tasks.
Occasionally we might want to use a more esoteric word for
a specific task, as we might pull out a pair of
<a href="http://www.grainger.com/Grainger/PROTO-Bent-Needle-Nose-Pliers-3R209">
bent needle nose pliers</a>
when that tool is just right for the job.
<br/><br/>
The better your selection of tools, the better job you can do at making
a beautiful and effective work.
In a pinch you can use a slot-head screwdriver to set a Phillips screw,
but you stand a higher chance of damaging the screw head and it is more
difficult to set it just right.
Similarly but more subtly, you may be able to use a Phillips
screwdriver to set a
<a href="http://www.sizes.com/tools/screw_drive.htm#Frearson">
Frearson</a> screw, but you will be able to do
a better job if you have a Frearson driver.
Most of us will probably not need this level of distinction and can get
by with just a Phillips, or indeed perhaps with just a slot-head driver,
but if you want to be able to craft the best results over the widest
range of projects, having that Frearson screwdriver in your toolbox
will provide one more area in which you can do things better.
<br/><br/>
Swear words are the sledgehammers of our verbal toolbox.
Like a sledgehammer, a swear word can pack a lot of punch,
and like a sledgehammer it lacks precision.
Sometimes a sledgehammer is the right tool for the job:
when you need to smash a hole in something, one good whack with a
sledgehammer can be far more effective than trying to use pliers
and screwdrivers to do the same thing.
<br/><br/>
But for most of us, most of the time, that's not the job we are trying to do.
Most of the time we are more interested in making a neat hole, and
we should pull out the electric drill, or the hole saw, or even the
Sawzall to do the job; or we just need to tap in a small nail,
where a standard hammer would work nicely.
If we smash it with a sledgehammer, it's likely that we will then need
to spend a lot of time cleaning things up afterwards, which would probably
be more work than using one of the other tools in the first place.
<br/><br/>
Some people seem to have a very small toolbox
and are constantly swinging around that sledgehammer.
They use it for almost everything; rather than pulling out a
screwdriver to set a screw, they whack it with their sledgehammer.
To me, everything these people say seems like a pile of smashed rubble.
I doubt that's really the message they want to deliver.
<br/><br/>
Even a single use of a sledgehammer word can derail
any kind of nuance or subtlety,
and casual use will likely overwhelm everything else in the message.
<br/><br/>
So go ahead and use a sledgehammer when it is appropriate,
but do so deliberately and fully conscious of your intended result.
Make an effort to add a good assortment of tools to your toolbox,
understand what you are trying to accomplish,
learn to use the best tool for the job and use it well.Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com1tag:blogger.com,1999:blog-7045524330253482541.post-4182892246329874822011-04-15T11:08:00.000-07:002011-04-15T11:08:34.093-07:00Java Nio Complete Scala ServerThe capstone to this series of posts:
a complete multi-client stateful application server in Scala
using Java NIO non-blocking IO for both reading and writing,
and delimited continuations as coroutines
for both IO and application processing.
<h3>Contents</h3>
<ul>
<li><a href="#background">Background</a>
<li><a href="#nioapplication">NioApplication</a>
<li><a href="#nioserver">NioServer</a>
<li><a href="#niolistener">NioListener</a>
<li><a href="#nioconnection">NioConnection</a>
<li><a href="#echoserver">EchoServer</a>
<li><a href="#three-questions-server">ThreeQuestionsServer</a>
<li><a href="#limitations">Limitations</a>
</ul>
<a name="background"></a>
<h3>Background</h3>
In the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html">
initial post</a>
of this series on
<a href="http://download.oracle.com/javase/1.5.0/docs/guide/nio/">
Java NIO</a>
in Scala I mentioned a set of
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#limitations">
Limitations</a>
of the first example server.
In the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-for-character-decoding-in.html">
next</a>
<a href="http://jim-mcbeath.blogspot.com/2011/04/java-nio-and-scala-coroutines.html">
three</a>
<a href="http://jim-mcbeath.blogspot.com/2011/04/java-nio-for-writing.html">
posts</a>
after that initial post
I addressed some of those limitations.
In this post I address the remaining limitation in that original list:
the application code (an echo loop in the example) is buried in the
<code>NioConnection</code> class,
which makes that application code more difficult to maintain and
makes the server code not directly reusable as a library.
<br/><br/>
With the changes described in the next section,
all of the application-specific behavior will be
encapsulated in an instance of an application-specific
subclass of a new class, <code>NioApplication</code>.
Since the remainder of the classes presented so far will now be
independent of the application and reusable
without any modifications for multiple applications,
they will be moved into a separate package, <code>net.jimmc.nio</code>.
<br/><br/>
Other than adding <code>package net.jimmc.nio</code>,
there were no changes to <code>LineDecoder</code>
and <code>NioSelector</code>,
and there were no changes to the coroutine package
<code>net.jimmc.scoroutine</code>
for this latest set of changes.
For the files that were changed, listed below,
the listings show the complete new version of the file,
with changes from the previous version highlighted in <b>bold</b>.
<br/><br/>
The complete source for this series of posts is available on github in my
<a href="https://github.com/jimmc/nioserver">nioserver</a> project,
with the specific version after the changes specified in this post tagged as
<a href="https://github.com/jimmc/nioserver/tree/blog-complete">
blog-complete</a>.
<a name="nioapplication"></a>
<h3>NioApplication</h3>
Extracting the application-specific code out of <code>NioConnection</code>
is pretty simple:
in <code>NioConnection.startApp</code>,
rather than starting up a built-in echo loop,
we add a hook that allows us to call back to an application-specific
method that implements whatever behavior the application wants for
dealing with a connection.
To do this, we define a new abstract class <code>NioApplication</code>
that includes a <code>runConnection</code> method that we can call
from <code>NioConnection.startApp</code>.
<br/><br/>
We will also use the <code>NioApplication</code> class as a convenience
class where we can bundle up some of the arguments that get passed
around a lot, in particular the coroutine scheduler and the
read and write selectors.
This gives us the opportunity to override the coroutine scheduler
with one more appropriate for the application,
although we will not do so in this example.
<br/><br/>
<input id="hlButton" type="submit" value="Highlight Syntax" onclick="highlightSyntaxFlipButton()"/>
<pre name="hlcode" class="scala"
><b>package net.jimmc.nio
import net.jimmc.scoroutine.DefaultCoScheduler
import scala.util.continuations._
abstract class NioApplication {
val readSelector = new NioSelector()
val writeSelector = new NioSelector()
val sched = new DefaultCoScheduler
def runConnection(conn:NioConnection):Unit @suspendable
}</b>
</pre>
<a name="nioserver"></a>
<h3>NioServer</h3>
We simplify the <code>NioServer</code> class by removing
<code>object NioServer</code>, which will instead be in the application
main object.
We replace three parameters in the constructor with the
single <code>app</code> parameter
and likewise replace three arguments in the call to
<code>NioListener</code> with the single <code>app</code> argument.
<pre name="hlcode" class="scala"
><b>package net.jimmc.nio</b>
import net.jimmc.scoroutine.DefaultCoScheduler
import java.net.InetAddress
class NioServer(<b>app:NioApplication,</b> hostAddr:InetAddress, port:Int) {
val listener = new NioListener(<b>app,</b> hostAddr, port)
def start() {
listener.start(true)
//run the NIO read and write selectors each on its own thread
(new Thread(<b>app.</b>writeSelector,"WriteSelector")).start
(new Thread(<b>app.</b>readSelector,"ReadSelector")).start
Thread.currentThread.setName("CoScheduler")
<b>app.</b>sched.run //run the coroutine scheduler on our thread, renamed
}
}
</pre>
<a name="niolistener"></a>
<h3>NioListener</h3>
Three parameters in the constructor have been replaced by the single
<code>app</code> parameter.
<pre name="hlcode" class="scala"
><b>package net.jimmc.nio</b>
import net.jimmc.scoroutine.CoScheduler
import java.net.{InetAddress,InetSocketAddress}
import java.nio.channels.{ServerSocketChannel,SocketChannel}
import java.nio.channels.SelectionKey
import scala.util.continuations._
class NioListener(<b>app:NioApplication,</b> hostAddr:InetAddress, port:Int) {
val serverChannel = ServerSocketChannel.open()
serverChannel.configureBlocking(false);
val isa = new InetSocketAddress(hostAddr,port)
serverChannel.socket.bind(isa)
def start(continueListening: =>Boolean):Unit = {
reset {
while (continueListening) {
val socket = accept()
NioConnection.newConnection(<b>app,</b> socket)
}
}
}
private def accept():SocketChannel @suspendable = {
shift { k =>
<b>app.</b>readSelector.register(serverChannel,SelectionKey.OP_ACCEPT, {
val conn = serverChannel.accept()
conn.configureBlocking(false)
k(conn)
})
}
}
}
</pre>
<a name="nioconnection"></a>
<h3>NioConnection</h3>
We modify the constructor and the companion to replace three parameters
with the single <code>app</code> parameter, and we replace our echo loop
in <code>startApp</code> with a call to the application
<code>runConnection</code> method,
followed by a call to our <code>close</code> method to make sure we
close the socket when the application is done with it.
<pre name="hlcode" class="scala"
><b>package net.jimmc.nio</b>
import net.jimmc.scoroutine.{CoQueue,CoScheduler}
import java.nio.ByteBuffer
import java.nio.channels.SelectionKey
import java.nio.channels.SocketChannel
import scala.util.continuations._
object NioConnection {
def newConnection(<b>app:NioApplication,</b> socket:SocketChannel) {
val conn = new NioConnection(<b>app,</b> socket)
conn.start()
}
}
class NioConnection(<b>app:NioApplication,</b> socket:SocketChannel) {
private val buffer = ByteBuffer.allocateDirect(2000)
private val lineDecoder = new LineDecoder
private val inQ = new CoQueue[String](<b>app.</b>sched, 10)
private val outQ = new CoQueue[String](<b>app.</b>sched, 10)
def start():Unit = {
startReader
startWriter
startApp
}
private def startApp() {
reset {
<b>app.runConnection(this)
close()</b>
}
}
private def startReader() {
reset {
while (socket.isOpen)
readWait
}
}
private def readWait<b>:Unit @suspendable</b> = {
buffer.clear()
val count = read(buffer)
if (count<1) {
socket.close()
shiftUnit[Unit,Unit,Unit]()
} else {
buffer.flip()
lineDecoder.processBytes(buffer, inQ.blockingEnqueue(_))
}
}
private def read(b:ByteBuffer):Int @suspendable = {
if (!socket.isOpen)
-1 //indicate EOF
else shift { k =>
<b>app.</b>readSelector.register(socket, SelectionKey.OP_READ, {
val n = socket.read(b)
k(n)
})
}
}
def readLine():String @suspendable = inQ.blockingDequeue
private def startWriter() {
reset {
while (socket.isOpen)
writeWait
}
}
private def write(b:ByteBuffer):Int @suspendable = {
if (!socket.isOpen)
-1 //indicate EOF
else shift { k =>
<b>app.</b>writeSelector.register(socket, SelectionKey.OP_WRITE, {
val n = socket.write(b)
k(n)
})
}
}
private def writeBuffer(b:ByteBuffer):Unit @suspendable = {
write(b)
if (b.remaining>0 && socket.isOpen)
writeBuffer(b)
else
shiftUnit[Unit,Unit,Unit]()
}
private def writeWait():Unit @suspendable = {
val str = outQ.blockingDequeue
if (str eq closeMarker) {
socket.close
shiftUnit[Unit,Unit,Unit]()
} else
writeBuffer(ByteBuffer.wrap(str.getBytes("UTF-8")))
}
def writeLine(s:String) = write(s+"\n")
def write(s:String) = outQ.blockingEnqueue(s)
def isOpen = socket.isOpen
private val closeMarker = new String("")
def close():Unit @suspendable = write(closeMarker)
}
</pre>
<a name="echoserver"></a>
<h3>EchoServer</h3>
We move the application-specific main object out of <code>NioServer</code>
and place it into our sample application class, which we call
<code>EchoServer</code>, along with a subclassed <code>NioApplication</code>
that provides our application behavior.
<br/><br/>
Highlighted differences are as compared to the previous version
of <code>NioServer</code>.
<pre name="hlcode" class="scala"
><b>import net.jimmc.nio.{NioApplication,NioConnection,NioServer}</b>
import net.jimmc.scoroutine.DefaultCoScheduler
import java.net.InetAddress
<b>import scala.util.continuations._</b>
object <b>EchoServer</b> {
def main(args:Array[String]) {
<b>val app = new EchoApplication</b>
val hostAddr:InetAddress = null //listen on local connection
val port = 1234
val server = new NioServer(<b>app,</b>hostAddr,port)
server.start()
}
}
<b>class EchoApplication extends NioApplication {
def runConnection(conn:NioConnection):Unit @suspendable = {
while (conn.isOpen) {
conn.writeLine(conn.readLine)
}
}
}</b>
</pre>
The above class is the complete application definition for our
echo server when built on top of our generic nio package.
After compiling, run with this command:
<pre name="hlcode" class="bash"
>$ scala EchoServer
</pre>
With all the above changes, we have once again internally transformed
our application, but besides starting it up with a different name
its external behavior is still the same.
However, we have reached the point where defining a new server-based
application is easy.
<a name="three-questions-server"></a>
<h3>ThreeQuestionsServer</h3>
The example in this section shows a slightly more complex application
that maintains some local per-client state as it progresses through a
short series of steps interacting with the client.
In this simple application, the server asks up to
<a href="http://www.imdb.com/title/tt0071853/quotes#qt0470601">
three questions</a>
of the client and collects responses,
with each next question sometimes depending on the previous answers.
The per-client state is contained both in local variables and in
the location of execution within the application.
Each time the processing for a client is suspended the state for that
client is captured in a continuation to be restored when the next piece
of input is available.
The continuation includes all of the above per-client state information,
so we don't have to write any application-specific
code to save and restore that data.
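<br/><br/>
For contrast, here is a rough plain-Java sketch (my own illustration, not code from this project) of the explicit state machine one would have to write without continuations: every local variable becomes a field, and the current location of execution becomes an explicit state value that must be saved between lines of input. (Only part of the question logic is shown.)

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not from the post): the explicit per-client state
// machine that continuations let us avoid writing.  Every "local
// variable" becomes a field, and the position in the dialog becomes an
// explicit state value preserved between inputs.
public class ThreeQuestionsStateMachine {
    enum State { ASK_NAME, ASK_QUEST, ASK_Q3, DONE }
    private State state = State.ASK_NAME;
    private String name;  // manually preserved across inputs
    private final List<String> out = new ArrayList<>();

    public ThreeQuestionsStateMachine() {
        out.add("What is your name?");
    }

    /** Feed one line of client input; responses accumulate in out. */
    public void onLine(String line) {
        switch (state) {
            case ASK_NAME:
                name = line.toLowerCase();
                out.add("What is your quest?");
                state = State.ASK_QUEST;
                break;
            case ASK_QUEST:
                if (line.toLowerCase().contains("seek the holy grail")) {
                    out.add("What is your favorite color?");
                    state = State.ASK_Q3;
                } else {
                    out.add("you: Auuuuuuuugh!");
                    state = State.DONE;
                }
                break;
            case ASK_Q3:
                out.add("You may pass");  // (the real logic also checks name)
                state = State.DONE;
                break;
            case DONE:
                break;
        }
    }

    public List<String> output() { return out; }

    public static void main(String[] args) {
        ThreeQuestionsStateMachine m = new ThreeQuestionsStateMachine();
        m.onLine("lancelot");
        m.onLine("to seek the holy grail");
        m.onLine("blue");
        System.out.println(m.output());
    }
}
```

With continuations, all of the fields above collapse back into ordinary local variables and straight-line code.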
<br/><br/>
By defining the <code>ReaderWriter</code> interface trait,
the application is written so as to be able to run either in server mode
using an instance of <code>ConnReader</code>,
in which case it accepts connections from clients,
or in standalone mode using an instance of <code>SysReader</code>,
in which case it only interacts with the console.
<br/><br/>
When our application running in server mode finishes handling a client
and exits from the
<code>run</code> method,
control returns to <code>NioConnection</code>,
which closes the connection.
<pre name="hlcode" class="scala"
>import net.jimmc.nio.{NioApplication,NioServer,NioConnection}
import java.io.{BufferedReader,InputStreamReader,PrintWriter}
import java.net.InetAddress
import scala.util.continuations._
object ThreeQuestionsConsole {
def main(args:Array[String]) {
val in = new BufferedReader(new InputStreamReader(System.in))
val out = new PrintWriter(System.out)
val io = new SysReader(in,out)
reset {
(new ThreeQuestions(io)).run
}
}
}
object ThreeQuestionsServer {
def main(args:Array[String]) {
val app = new ThreeQuestionsApp
val hostAddr:InetAddress = null //localhost
val port = 1234
val server = new NioServer(app,hostAddr,port)
server.start()
}
}
class ThreeQuestionsApp extends NioApplication {
def runConnection(conn:NioConnection):Unit @suspendable = {
val io = new ConnReader(conn)
(new ThreeQuestions(io)).run
}
}
trait ReaderWriter {
def readLine():String @suspendable
def writeLine(s:String):Unit @suspendable
}
class SysReader(in:BufferedReader,out:PrintWriter) extends ReaderWriter {
def readLine() = in.readLine
def writeLine(s:String) = { out.println(s); out.flush() }
}
class ConnReader(conn:NioConnection) extends ReaderWriter {
def readLine():String @suspendable = conn.readLine
def writeLine(s:String):Unit @suspendable = conn.writeLine(s)
}
class ThreeQuestions(io:ReaderWriter) {
def run():Unit @suspendable = {
val RxArthur = ".*arthur.*".r
val RxGalahad = ".*galahad.*".r
val RxLauncelot = ".*(launcelot|lancelot).*".r
val RxRobin = ".*robin.*".r
val RxHolyGrail = ".*seek the holy grail.*".r
val RxSwallow = ".*african or european.*".r
val RxAssyriaCapital =
".*(assur|shubat.enlil|kalhu|calah|nineveh|dur.sharrukin).*".r
val name = ask("What is your name?").toLowerCase
val quest = ask("What is your quest?").toLowerCase
val holy = quest match {
case RxHolyGrail() => true
case _ => false
}
if (holy) {
val q3Type = name match {
case RxRobin() => 'capital
case RxArthur() => 'swallow
case _ => 'color
}
val a3 = (q3Type match {
case 'capital => ask("What is the capital of Assyria?")
case 'swallow => ask("What is the air-speed velocity of an unladen swallow?")
case 'color => ask("What is your favorite color?")
}).toLowerCase
(q3Type,a3,name) match {
//Need to use an underscore in regex patterns with alternates
case ('capital,RxAssyriaCapital(_),_) => accept
case ('capital,_,_) => reject
case ('swallow,RxSwallow(),_) => rejectMe
case ('swallow,_,_) => reject
case ('color,"blue",RxLauncelot(_)) => accept
case ('color,_,RxLauncelot(_)) => reject
case ('color,"yellow",RxGalahad()) => accept
case ('color,_,RxGalahad()) => reject
case ('color,_,_) => accept
}
} else {
reject
}
}
def ask(s:String):String @suspendable = { io.writeLine(s); io.readLine }
def accept:Unit @suspendable = io.writeLine("You may pass")
def reject:Unit @suspendable = io.writeLine("you: Auuuuuuuugh!")
def rejectMe:Unit @suspendable = io.writeLine("me: Auuuuuuuugh!")
}
</pre>
To run in console or server mode, use one of the following two commands:
<pre name="hlcode" class="bash"
>$ scala ThreeQuestionsConsole
$ scala ThreeQuestionsServer
</pre>
<a name="limitations"></a>
<h3>Limitations</h3>
I am calling this version complete because it addresses all of the issues
in the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#limitations">
Limitations</a> section of my original post,
but it is far from production-ready.
Before putting this code into production I would address the following issues.
<ul>
<li>Although the application now uses more than one thread, it still runs
all of the application code on a single thread.
The scheduler should be replaced by one that can choose how many
threads to use and distribute the execution of the coroutines among
those threads.
<li>This version still has not addressed all of the issues raised in the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-for-character-decoding-in.html#limitations">
Limitations</a> section of the second post in this series,
on character decoding. In particular:
<ul>
<li>Error handling should be improved.
<li>It only supports UTF-8 encoding.
</ul>
For an example of this problem, type a Control-C into your telnet
window when connected to the EchoServer application.
<li>The application should parse its command line arguments so that
it has the flexibility to, for example, use a different port number
without requiring a code change.
<li>The application should
<a href="http://jim-mcbeath.blogspot.com/2010/01/reload-that-config-file.html">
read a configuration file</a>.
<li>Error handling in general needs to be improved.
<li>Logging should be added.
</ul>
<h2>Java NIO for Writing</h2>
<i>Jim McBeath, 2011-04-08</i>
<br/><br/>
Using Java NIO non-blocking IO for writing as well as reading
is almost - but not quite - straightforward.
<h3>Contents</h3>
<ul>
<li><a href="#background">Background</a>
<li><a href="#implementation">Implementation</a>
<li><a href="#two-selectors">Two Selectors</a>
<li><a href="#close">Close</a>
<li><a href="#summary">Summary</a>
</ul>
<a name="background"></a>
<h3>Background</h3>
One of the limitations pointed out in the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#limitations">
Limitations</a>
section of the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html">
original post</a> in this series
was that we were still directly writing our output data to the socket
rather than using
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#java-nonblocking-io">
non-blocking IO</a> and
<a href="http://jim-mcbeath.blogspot.com/2010/08/delimited-continuations.html">
continuations</a> as we were doing
when reading our input data.
If a client stops reading its input
(or if there is sufficient network congestion that it looks that way
from our end)
then our socket output buffer
may fill up.
If that happens, then one of two things will happen when we try to write
our data to that socket: either the call will block, or the data will
not all be written.
If the call blocks, then we have a blocked thread that we can not use
for processing other clients until it is unblocked.
If there are many clients who are not reading their input,
we could have many blocked threads.
Since one of the goals of this exercise is to be able to run many clients
on a relatively small number of threads, having blocked threads is bad.
To avoid this problem, we use non-blocking output and continuations
for writing to the output,
just as we did for reading the input.
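<br/><br/>
For reference, the underlying drain pattern can be sketched in plain Java without continuations (my illustration, not the project's code; a real server would register for <code>OP_WRITE</code> and suspend rather than spin when the write cannot complete):

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

public class DrainDemo {
    /**
     * Writes every byte of b to a non-blocking channel.  A non-blocking
     * write may move fewer bytes than remain in the buffer, so we loop
     * until the buffer is drained.  (Sketch only: a real server would
     * register for OP_WRITE and suspend instead of spinning.)
     */
    static void writeFully(Pipe.SinkChannel sink, ByteBuffer b) throws Exception {
        while (b.hasRemaining()) {
            sink.write(b);  // returns the number of bytes actually written
        }
    }

    /** Round-trips an ASCII string through a non-blocking in-process pipe. */
    static String roundTrip(String s) throws Exception {
        Pipe pipe = Pipe.open();
        pipe.sink().configureBlocking(false);
        writeFully(pipe.sink(), ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8)));
        ByteBuffer in = ByteBuffer.allocate(s.length() + 8);
        pipe.source().read(in);
        in.flip();
        return StandardCharsets.UTF_8.decode(in).toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello"));  // prints hello
    }
}
```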
<br/><br/>
The complete source for this series of posts is available on github in my
<a href="https://github.com/jimmc/nioserver">nioserver</a> project,
with the specific version after the changes specified in this post tagged as
<a href="https://github.com/jimmc/nioserver/tree/blog-write">
blog-write</a>.
<a name="implementation"></a>
<h3>Implementation</h3>
We model the output code on the input code by making these changes:
<ul>
<li>We write a suspending <code>write</code> method that registers
our interest in writing to the output socket connection.
<li>We add an output queue to receive data from the application.
<li>We modify the <code>writeLine</code>
method to add a line to the output queue rather than writing
directly to the output socket.
<li>We run a separate control loop that reads from the output queue
and writes to the output socket.
</ul>
<br/>
<input id="hlButton" type="submit" value="Highlight Syntax" onclick="highlightSyntaxFlipButton()"/>
<pre name="hlcode" class="scala"
>//In class NioConnection
<b>private val outQ = new CoQueue[String](sched, 10)</b>
def start():Unit = {
startReader
<b>startWriter</b>
startApp
}
<b>private def startWriter() {
reset {
while (socket.isOpen)
writeWait
}
}
private def write(b:ByteBuffer):Int @suspendable = {
if (!socket.isOpen)
-1 //indicate EOF
else shift { k =>
selector.register(socket, SelectionKey.OP_WRITE, {
val n = socket.write(b)
k(n)
})
}
}
private def writeBuffer(b:ByteBuffer):Unit @suspendable = {
write(b)
if (b.remaining>0 && socket.isOpen)
writeBuffer(b)
else
shiftUnit[Unit,Unit,Unit]()
}
private def writeWait:Unit @suspendable = {
val str = outQ.blockingDequeue
writeBuffer(ByteBuffer.wrap(str.getBytes("UTF-8")))
}</b>
def writeLine(s:String)<b>:Unit @suspendable = write(s+"\n")
def write(s:String):Unit @suspendable = outQ.blockingEnqueue(s)</b>
</pre>
This seems pretty straightforward, but unfortunately it doesn't work.
The problem is that we have attempted to register our channel twice
(once for read and once for write) with the same selector.
The documentation for
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/SelectableChannel.html">
<code>SelectableChannel</code></a> says,
"<i>A channel may be registered at most once with any particular selector.</i>"
If we call
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/SelectableChannel.html#register(java.nio.channels.Selector,%20int,%20java.lang.Object)">
<code>register</code></a>
for our channel for write when it is
already registered for read, the read registration is overwritten by
the write registration and is lost.
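<br/><br/>
This replacement behavior is easy to demonstrate against the real API (my sketch, not code from the post): the second <code>register</code> call returns the same key with the interest set replaced, and OR-ing the flags together is the single-selector workaround.

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class RegisterDemo {
    /** Returns {sameKey(0/1), interest after 2nd register, interest after OR}. */
    static int[] demo() throws Exception {
        try (Selector sel = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                client.configureBlocking(false);
                // A second register with the same selector replaces the
                // interest set -- it does not add to it.
                SelectionKey k1 = client.register(sel, SelectionKey.OP_READ);
                SelectionKey k2 = client.register(sel, SelectionKey.OP_WRITE);
                int afterSecond = k2.interestOps();  // OP_READ is lost here
                // Single-selector workaround: OR the flags together.
                k2.interestOps(SelectionKey.OP_READ | SelectionKey.OP_WRITE);
                return new int[]{ k1 == k2 ? 1 : 0, afterSecond, k2.interestOps() };
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] r = demo();
        System.out.println(r[0] == 1);                      // same key object
        System.out.println(r[1] == SelectionKey.OP_WRITE);  // read interest gone
        System.out.println(r[2]
            == (SelectionKey.OP_READ | SelectionKey.OP_WRITE));
    }
}
```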
<br/><br/>
In his
<a href="http://rox-xmlrpc.sourceforge.net/niotut/">
Rox Java NIO Tutorial</a> James Greenfield
<a href="http://rox-xmlrpc.sourceforge.net/niotut/#General%20principles">
explicitly recommends</a> that you
"<i>Use a single selecting thread</i>" and
"<i>Modify the selector from the selecting thread only.</i>"
We could take this approach,
adding some code to combine the read and write
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/SelectionKey.html#field_summary">
interest flags</a>
when both registrations are wanted, but
unlike in James' case
we would also need
to add some code to demultiplex the separate callbacks for read and
write.
Instead, we use a different approach:
we use separate selectors for reading and writing, and we give each of
them its own thread.
<a name="two-selectors"></a>
<h3>Two Selectors</h3>
Depending on the implementation, using two selectors and two threads this
way could cause problems.
However, based on my understanding of the documentation,
<a href="http://www.docjar.com/html/api/sun/nio/ch/EPollSelectorImpl.java.html">
the code</a> in
the Sun implementation and the operation of the
<a href="http://linux.die.net/man/2/select">
POSIX select</a>
operation,
I believe this approach should work (at least on POSIX systems).
This would need to be tested on all supported
platforms for a production system.
<br/><br/>
To use separate read and write selectors, we replace the current
<code>selector</code> parameter in <code>NioConnection</code> with
two parameters <code>readSelector</code> and <code>writeSelector</code>
of the same type.
<pre name="hlcode" class="scala"
>//In object NioConnection:
def newConnection(sched:CoScheduler, <b>readSelector</b>:NioSelector,
<b>writeSelector:NioSelector,</b> socket:SocketChannel) {
val conn = new NioConnection(sched,<b>readSelector</b>,
<b>writeSelector,</b>socket)
conn.start()
}
class NioConnection(sched:CoScheduler, <b>readSelector</b>:NioSelector,
<b>writeSelector:NioSelector,</b> socket:SocketChannel) {
...
private def read(b:ByteBuffer):Int @suspendable = {
if (!socket.isOpen)
-1 //indicate EOF
else shift { k =>
<b>readSelector</b>.register(socket, SelectionKey.OP_READ, {
val n = socket.read(b)
k(n)
})
}
}
private def write(b:ByteBuffer):Int @suspendable = {
if (!socket.isOpen)
-1 //indicate EOF
else shift { k =>
<b>writeSelector</b>.register(socket, SelectionKey.OP_WRITE, {
val n = socket.write(b)
k(n)
})
}
}
...
}
</pre>
We also change <code>NioListener</code> to pass through those
two arguments, and we choose to use the <code>readSelector</code>
to handle our <code>accept</code> calls.
<pre name="hlcode" class="scala"
>//In NioListener
class NioListener(sched:CoScheduler, <b>readSelector</b>:NioSelector,
<b>writeSelector:NioSelector,</b> hostAddr:InetAddress, port:Int) {
...
def start(continueListening: =>Boolean):Unit = {
reset {
while (continueListening) {
val socket = accept()
NioConnection.newConnection(sched,
<b>readSelector,writeSelector</b>,socket)
}
}
}
private def accept():SocketChannel @suspendable = {
shift { k =>
<b>readSelector</b>.register(serverChannel,SelectionKey.OP_ACCEPT, {
val conn = serverChannel.accept()
conn.configureBlocking(false)
k(conn)
})
}
}
}
</pre>
Finally, we instantiate the new write selector in <code>NioServer</code>,
pass it in to <code>NioListener</code>, and start it running
in a new thread.
<pre name="hlcode" class="scala"
>//In NioServer
class NioServer(hostAddr:InetAddress, port:Int) {
val <b>readSelector</b> = new NioSelector()
<b>val writeSelector = new NioSelector()</b>
val sched = new DefaultCoScheduler
val listener = new NioListener(sched,
<b>readSelector, writeSelector,</b> hostAddr, port)
def start() {
listener.start(true)
//run the NIO <b>read and write selectors each</b> on its own thread
<b>(new Thread(writeSelector,"WriteSelector")).start</b>
(new Thread(<b>readSelector,"ReadSelector"</b>)).start
Thread.currentThread.setName("CoScheduler")
sched.run //run the coroutine scheduler on our thread, renamed
}
}
</pre>
<a name="close"></a>
<h3>Close</h3>
Our current example has no terminating condition, so it never attempts to
close the connection.
Looking ahead, we expect to have applications that will want to do that,
so we add a <code>close</code> method to <code>NioConnection</code>,
and an <code>isOpen</code> method that allows us to see when it is closed.
<br/><br/>
We can't just add a close method that directly closes the socket,
because there may still be output data waiting to be written.
Thus we need an implementation that somehow waits until all of the queued
output data has been written to the output before closing the socket.
<br/><br/>
One easy way to do this is to have a special marker string that we put
into the output queue when the application requests to close the socket.
When our socket output code sees that marker, we know it has already written
out all of the data that came before that marker in the output queue,
so we can close the socket.
By doing the socket close in the same method that does the writes to
the socket, and by ensuring that that method is called on the
(write) selection thread,
we also ensure that the close happens on the selection thread.
<br/><br/>
The compiler shares constant strings, so to make sure we have a unique
string for our marker that can't be passed in by any code outside of
our <code>close</code> method, we use <code>new String()</code>.
In <code>writeWait</code>, where we check for that marker,
we use the identity comparison <code>eq</code> when checking for the marker,
and we add a call to <code>shiftUnit</code> to make both sides of the
<code>if</code> statement be CPS.
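<br/><br/>
The same trick can be illustrated in plain Java, where <code>==</code> on references plays the role of Scala's <code>eq</code> (my sketch, not code from this project):

```java
public class CloseMarker {
    // new String("") guarantees a reference distinct from the interned ""
    // literal, so no string arriving from outside code can ever be
    // identical (==) to the marker, even though it may be equal (equals).
    static final String CLOSE_MARKER = new String("");

    static boolean isCloseRequest(String queued) {
        return queued == CLOSE_MARKER;  // identity comparison, not equals
    }

    public static void main(String[] args) {
        System.out.println(isCloseRequest(CLOSE_MARKER));  // true
        System.out.println(isCloseRequest(""));            // false: equal, not identical
        System.out.println("".equals(CLOSE_MARKER));       // true: equals can't distinguish
    }
}
```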
<br/><br/>
A call to our <code>close</code> method will return right away,
but the socket will not get closed until after all of the data in
the output queue has been written to the output socket.
The application can tell when the socket has actually been closed
by calling the <code>isOpen</code> method.
<pre name="hlcode" class="scala"
>//In NioConnection
private def writeWait():Unit @suspendable = {
val str = outQ.blockingDequeue
<b>if (str eq closeMarker) {
socket.close
shiftUnit[Unit,Unit,Unit]()
} else</b>
writeBuffer(ByteBuffer.wrap(str.getBytes("UTF-8")))
}
<b>def isOpen = socket.isOpen
private val closeMarker = new String("")
def close():Unit @suspendable = write(closeMarker)</b>
</pre>
<a name="summary"></a>
<h3>Summary</h3>
As in the previous two posts, we have modified the program to make an
internal improvement that has not changed its basic external behavior.
We have, however, changed its behavior for one of the corner cases -
in this case what happens when an output socket fills up, such as might
happen when there is excessive network latency - which is a necessary
improvement for a production application, particularly if one expects
the kind of high volume that would make those corner cases more likely.Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com0tag:blogger.com,1999:blog-7045524330253482541.post-25931942176677056392011-04-02T18:33:00.000-07:002011-04-02T18:33:27.882-07:00Java NIO and Scala CoroutinesI present a multi-client server in Scala that uses coroutines
to allow modularization of stateful client processing
in a way that is independent of threads.
<h3>Contents</h3>
<ul>
<li><a href="#background">Background</a>
<li><a href="#coroutines">Coroutines</a>
<li><a href="#architecture">Architecture</a>
<li><a href="#nioselector">NioSelector</a>
<li><a href="#coscheduler">CoScheduler</a>
<li><a href="#coqueue">CoQueue</a>
<li><a href="#nioconnection">NioConnection</a>
<li><a href="#linedecoder">LineDecoder</a>
<li><a href="#niolistener">NioListener</a>
<li><a href="#nioserver">NioServer</a>
<li><a href="#summary">Summary</a>
<li><a href="#caveats">Caveats</a>
</ul>
<a name="background"></a>
<h3>Background</h3>
In my
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html">
previous</a>
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-for-character-decoding-in.html">
two</a>
posts I presented a server in Scala that uses
<a href="http://download.oracle.com/javase/1.5.0/docs/guide/nio/">
Java NIO</a> non-blocking IO and
<a href="http://jim-mcbeath.blogspot.com/2010/08/delimited-continuations.html">
continuations</a> to allow
scaling to a large number of clients.
As I pointed out in the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#limitations">
Limitations</a>
section of that first post,
that example used one thread for all execution.
On a multi-core machine, as is common today,
we would prefer to have multiple threads running to allow
us to take advantage of all of the processing power available to us,
yet we don't want to allocate a thread to every client.
<br/><br/>
It would be nice if we could add our own
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/SelectableChannel.html">
<code>SelectableChannel</code></a>
types to the set of NIO channel types that we can use with the
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/Selector.html#select()">
<code>select</code></a>
call so that we
could have one place where we do all our scheduling,
but that feature is not available.
We thus have to come up with another mechanism for handling all of the
other potentially blocking tasks we will want to do.
Fortunately, we already have such a mechanism: coroutines.
<a name="coroutines"></a>
<h3>Coroutines</h3>
Coroutines
provide a separation of the maintenance of task state from
the execution of code for that task,
allowing us to bind execution of the task to different threads as we desire.
When one of our task coroutines becomes blocked waiting for an unavailable
resource, we suspend it by storing its continuation, allowing us to
use that thread for another purpose, such as to restore and run
a different previously stored continuation that is now runnable.
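<br/><br/>
As a toy model of this idea (my own sketch, far simpler than the scoroutine package), a "continuation" can be represented as a <code>Runnable</code> that a single-threaded scheduler parks against the resource it is waiting for:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: a blocked task stores its continuation keyed by the resource
// it is waiting for; signaling the resource moves the continuation back
// onto the run queue.  One thread drives all tasks.
class ToyScheduler {
    private final Deque<Runnable> runnable = new ArrayDeque<>();
    private final Map<String, Runnable> blocked = new HashMap<>();
    final List<String> log = new ArrayList<>();

    void spawn(Runnable task) { runnable.add(task); }

    /** Called by a task: park continuation k until resource is signaled. */
    void await(String resource, Runnable k) { blocked.put(resource, k); }

    /** Unblock the task waiting on resource, if any. */
    void signal(String resource) {
        Runnable k = blocked.remove(resource);
        if (k != null) runnable.add(k);
    }

    void run() {
        while (!runnable.isEmpty()) runnable.poll().run();
    }
}

public class ToyDemo {
    public static void main(String[] args) {
        ToyScheduler s = new ToyScheduler();
        s.spawn(() -> {
            s.log.add("A: waiting for data");
            s.await("data", () -> s.log.add("A: resumed with data"));
        });
        s.spawn(() -> {
            s.log.add("B: producing data");
            s.signal("data");
        });
        s.run();
        System.out.println(s.log);
    }
}
```

Task A suspends without holding a thread; the same thread runs task B, whose signal makes A's stored continuation runnable again.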
<br/><br/>
In my
<a href="http://jim-mcbeath.blogspot.com/2010/09/scala-coroutines.html">
earlier post</a> on coroutines I presented an implementation
of a coroutine package that included a scheduler (<code>CoScheduler</code>)
and a blocking queue (<code>CoQueue</code>).
We will modify the server implementation
of my previous two "Java NIO" posts
to make use of those classes.
<br/><br/>
As pointed out in that earlier coroutines post,
the default scheduler implementation in the example can easily be
replaced by another implementation with no other changes to the code.
In particular,
that new implementation could use a thread pool or a group of actors
to execute the coroutines that are ready to run,
assuming the coroutine code itself is multi-thread safe.
We will not write that multi-thread scheduler for this post,
but will assume that it can be written later.
<a name="architecture"></a>
<h3>Architecture</h3>
At a high level, we want to modify our server so that we have a queue
between our socket reader and the application that will eventually
consume the data.
We can then set up a small processing loop that reads the socket data,
converts it to a string and writes it to that queue.
The application will read the contents of the queue, process it,
and write back its results to the connection.
We will let the socket reader continue to run on the select thread, but
we will run the application on a separate thread (or threads),
ensuring that the select loop can quickly get to all
connections and preventing the application processing of any one connection
from delaying the IO of other connections.
<br/><br/>
With this architecture we have two processing loops:
<ol>
<li>Read data from socket, write to queue.
<li>Read data from queue, process it, write data (to socket, for now).
</ol>
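In plain Java the two loops can be sketched with a <code>java.util.concurrent.ArrayBlockingQueue</code> standing in for <code>CoQueue</code> and real threads standing in for coroutines (my illustration only; a bounded queue blocks the producer when full, giving back-pressure):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class TwoLoops {
    /** Runs the reader loop and the application loop on separate threads. */
    static List<String> pump(String[] lines) throws InterruptedException {
        // Bounded queue between the loops: put() blocks when it is full.
        BlockingQueue<String> inQ = new ArrayBlockingQueue<>(10);
        List<String> results = new ArrayList<>();

        // Loop 1: read "socket" data (canned here), write to the queue.
        Thread reader = new Thread(() -> {
            try { for (String line : lines) inQ.put(line); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        // Loop 2: read from the queue, process, write the result back.
        Thread app = new Thread(() -> {
            try {
                for (int i = 0; i < lines.length; i++)
                    results.add("echo: " + inQ.take());
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        reader.start(); app.start();
        reader.join(); app.join();   // join gives safe visibility of results
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pump(new String[]{"hello", "world"}));
        // prints [echo: hello, echo: world]
    }
}
```

The coroutine version replaces the two threads with suspendable loops, so a blocked put or take parks a continuation instead of a thread.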
Given that for now we are writing directly to the connection socket on output
(and ignoring the possibility that the output socket might be blocked),
the second loop only has one potential blocking point:
if there is no data in the queue, it will block when trying to read
from the queue.
The first loop has two potential blocking points:
when it reads data from the socket (if there is no data available),
and when it writes data to the queue (if the queue is full).
The difficulty here is that the potentially blocking socket read must be
handled by the NIO select call,
but the potentially blocking write to the queue can't be handled by
the NIO select call and thus must be handled by our own scheduler.
<br/><br/>
Having one processing loop that when blocked is sometimes managed by one
scheduler (NIO select) and sometimes by another (our coroutine scheduler)
is not necessarily a problem.
Each scheduler just sees a blocking resource that has a
continuation associated with it; when the blocking resource becomes
available, the continuation is called and the process continues.
The new issue that arises when trying to combine two schedulers like
this is that an action by one scheduler can potentially unblock a task
that is currently controlled by (i.e. in a wait state on) the other scheduler.
Every time we perform an action that might unblock a task
we need to ensure that the appropriate scheduler is not stuck waiting
on the other tasks.
In other words, we need to wake up or notify the schedulers at appropriate
points in our code.
<br/><br/>
In this post, code which has changed is highlighted in <b>bold</b>
(when not using Syntax Highlighting).
Changes for <code>CoScheduler</code> and <code>CoQueue</code> are
as compared to the code in my
<a href="http://jim-mcbeath.blogspot.com/2010/09/scala-coroutines.html">
post</a>
on coroutines;
changes to
<code>NioSelector</code>,
<code>NioConnection</code>,
<code>LineDecoder</code>,
<code>NioListener</code> and
<code>NioServer</code> are as compared to the code in my
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html">
previous</a>
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-for-character-decoding-in.html">
two</a>
posts.
<br/><br/>
The complete source for this post is available on github in my
<a href="https://github.com/jimmc/nioserver">nioserver</a> project,
with the specific version used in this post tagged as
<a href="https://github.com/jimmc/nioserver/tree/blog-coroutines">
blog-coroutines</a>.
There are also tags for the previous two posts, so you can compare
using those tags to see the changes between the versions as used
in each post.
<a name="nioselector"></a>
<h3>NioSelector</h3>
As mentioned above,
we have to cooperate with the coroutine scheduler.
In particular, we must handle the situation in which we are blocked
in a select call because there are no active connections,
and another thread then registers interest in an operation.
The documentation for the
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/Selector.html#selop">
<code>select</code></a> call states:
<blockquote>
Changes made to the interest sets of a selector's keys while a selection
operation is in progress have no effect upon that operation; they will
be seen by the next selection operation.
</blockquote>
To terminate the select operation early so that it retries with the
newly registered channel, we add a call to
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/Selector.html#wakeup()">
<code>wakeup</code></a>
just after registering our interest.
<br/><br/>
Unfortunately, this is not enough.
The documentation for the <code>select</code> call
is not very precise about
whether it is actually safe to call <code>register</code>
from another thread while the <code>select</code> call is
blocked waiting for a previously registered channel to become active.
The documentation for
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/SelectableChannel.html">
<code>SelectableChannel</code></a>
does explicitly say
"<i>Selectable channels are safe for use by multiple concurrent threads</i>",
but the documentation for the
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/SelectableChannel.html#register(java.nio.channels.Selector,%20int,%20java.lang.Object)">
<code>register</code></a>
method says
"<i>This method will then synchronize on the selector's key set and
therefore may block if invoked concurrently with another registration
or selection operation involving the same selector.</i>"
In fact, the standard Sun implementation does quite a bit of
synchronization, so it can easily become deadlocked when used from
multiple threads.
In particular, the OS-level select call in the Java <code>select</code>
method is inside a pair of <code>synchronized</code> blocks that lock
the set of <code>SelectionKey</code>s associated with that selector.
If, while the first thread is blocked on the select,
a second thread calls <code>SelectableChannel.register</code>,
it locks the channel, then attempts to synchronize on the key set to which
that channel is being added, so it blocks.
If a third thread then tries to register that channel with a second
selector, which the documentation implies is allowed,
the third thread will attempt to lock the channel, which will
block until the second thread unblocks and releases its lock on the channel.
<br/><br/>
In his
<a href="http://rox-xmlrpc.sourceforge.net/niotut/">
Rox Java NIO Tutorial</a> James Greenfield
<a href="http://rox-xmlrpc.sourceforge.net/niotut/#General%20principles">
explicitly recommends</a> that you
"<i>Use a single selecting thread</i>" and
"<i>Modify the selector from the selecting thread only.</i>"
From the description of how <code>register</code> works above, you can see why.
<br/><br/>
To get around this problem and ensure that all changes to the selection keys
happen on the thread that is calling select,
we modify <code>NioSelector.register</code> so that,
rather than calling <code>SelectableChannel.register</code> directly,
it packages the arguments up and puts them into a queue.
The selection thread processes that queue,
making all of the calls to <code>SelectableChannel.register</code> itself,
just before it calls <code>select</code>.
<br/><br/>
Fortunately, the semantics of the <code>wakeup</code> call ensure that
we won't get ourselves into a position where we have put our registration
request into the queue but the <code>select</code> call doesn't see it
and blocks on all the other channels.
This is because <code>wakeup</code> is defined such that a call to it
that happens while the selector is not currently in a select operation
will cause the next <code>select</code> to wake up immediately.
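This latched behavior can be demonstrated in isolation. The following standalone sketch (not part of the server code) calls <code>wakeup</code> while nothing is selecting, then shows that the subsequent <code>select</code> returns immediately rather than blocking:

```scala
import java.nio.channels.Selector

// A wakeup() issued while no select is in progress is remembered,
// so the next select() returns immediately instead of blocking.
object WakeupDemo {
  def run(): Int = {
    val selector = Selector.open()
    try {
      selector.wakeup()  // nothing is selecting yet; the wakeup is latched
      selector.select()  // would block forever here without the wakeup
    } finally {
      selector.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println("select returned " + run())  // 0: no channels are ready
  }
}
```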
<br/><br/>
With this change, all of the key set operations happen on the selection thread
and, since the socket read operation is in a callback that gets executed
by the selection thread in <code>NioSelector.executeCallbacks</code>,
all socket reads (and likewise accepts) will happen on the
selection thread.
<br/><br/>
<input id="hlButton" type="submit" value="Highlight Syntax" onclick="highlightSyntaxFlipButton()"/>
<pre name="hlcode" class="scala"
>//In class NioSelector
<b>import scala.collection.mutable.SynchronizedQueue</b>
<b>private case class RegistrationRequest(
channel:SelectableChannel,op:Int,callback:Function0[Unit])
private val regQ = new SynchronizedQueue[RegistrationRequest]</b>
def register(channel:SelectableChannel, op:Int, body: => Unit) {
val callback:Function0[Unit] = { () => { body }}
<b>regQ.enqueue(RegistrationRequest(channel,op,callback))
selector.wakeup()</b>
}
def selectOnce(timeout:Long) {
<b>while (regQ.size>0) {
val req = regQ.dequeue()
req.channel.register(selector,req.op,req.callback)
}</b>
...
}
}
</pre>
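The same pattern can be exercised outside the server. In this standalone sketch (illustrative names, not the nioserver code, with a <code>ConcurrentLinkedQueue</code> standing in for the <code>SynchronizedQueue</code> above), a registration queued from any thread takes effect only when the selecting thread drains the queue before selecting:

```scala
import java.net.InetSocketAddress
import java.nio.channels.{SelectableChannel, SelectionKey, Selector, ServerSocketChannel}
import java.util.concurrent.ConcurrentLinkedQueue

object DrainThenSelect {
  private case class Req(channel: SelectableChannel, op: Int)
  private val regQ = new ConcurrentLinkedQueue[Req]

  // Safe to call from any thread: just queues the request.
  def register(channel: SelectableChannel, op: Int): Unit =
    regQ.add(Req(channel, op))

  // Called only on the selecting thread, immediately before select:
  // performs the actual SelectableChannel.register calls.
  def drainAndCount(selector: Selector): Int = {
    var req = regQ.poll()
    while (req != null) {
      req.channel.register(selector, req.op)
      req = regQ.poll()
    }
    selector.keys.size  // number of keys now registered
  }

  def main(args: Array[String]): Unit = {
    val selector = Selector.open()
    val server = ServerSocketChannel.open()
    server.configureBlocking(false)
    server.socket.bind(new InetSocketAddress(0))  // any free port
    register(server, SelectionKey.OP_ACCEPT)      // queued, not yet registered
    println(drainAndCount(selector))              // 1: registration took effect
    server.close()
    selector.close()
  }
}
```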
<a name="coscheduler"></a>
<h3>CoScheduler</h3>
For our coroutine scheduler,
we have to be able to deal with the situation that
we have no coroutines that are currently runnable,
then at some point one of those coroutines becomes runnable
by the actions of another thread.
In the architecture described above, this can happen when new data
that has been read from a connection is placed into the input queue.
To allow us to wait for this kind of event and to be awakened when
it happens, we use Java's
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Object.html#wait()">
<code>wait</code>/<code>notify</code></a>
model. We can't override those methods, since <code>notify</code>
is final, so we define our own versions,
which we call <code>coWait</code> and <code>coNotify</code>.
Given those methods, we also extend <code>Runnable</code>
and replace the old <code>run</code> method with one
that runs coroutines until none are available to run,
then waits until we are notified and continues the loop.
<pre name="hlcode" class="scala"
>trait CoScheduler <b>extends Runnable</b> { cosched =>
//we add the following items
<b>private val defaultLock = new java.lang.Object
def coWait():Unit = { defaultLock.synchronized { defaultLock.wait() } }
def coNotify():Unit = { defaultLock.synchronized { defaultLock.notify } }
def run {
while (true) {
runUntilBlockedOrDone
coWait
}
}</b>
}
</pre>
A <code>coNotify</code> method that accepts as an argument the coroutine
or blocker that has potentially changed state would allow for a more
efficient implementation, but for now we choose the simple implementation
given above that does not attempt that optimization.
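A hypothetical sketch of that optimization (not in the scoroutine library as posted; the names here are invented for illustration) might record which blocker changed, so the scheduler only re-examines the coroutines waiting on it:

```scala
import scala.collection.mutable

// Hypothetical refinement: coNotify records *which* blocker changed
// state, so a scheduler need only re-examine the coroutines waiting
// on that blocker rather than rescanning all of them.
trait TargetedNotify {
  private val lock = new AnyRef
  private val changed = mutable.Set[AnyRef]()

  def coNotify(blocker: AnyRef): Unit = lock.synchronized {
    changed += blocker
    lock.notify()
  }

  // Blocks until at least one blocker has changed state since the last
  // call, then returns the set of changed blockers and resets it.
  // The while loop guards against spurious wakeups, and a notify that
  // arrives before the wait is not lost: changed is already non-empty.
  def coWaitChanged(): Set[AnyRef] = lock.synchronized {
    while (changed.isEmpty) lock.wait()
    val result = changed.toSet
    changed.clear()
    result
  }
}
```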
<a name="coqueue"></a>
<h3>CoQueue</h3>
We use an instance of <code>CoQueue</code> as the queue between the
socket read loop and the application processing loop.
The socket read loop calls <code>blockingEnqueue</code> to place an
item into the queue, and the application processing loop calls
<code>blockingDequeue</code> to take an element out of the queue.
The result of either of these actions could be to unblock another
coroutine, so we modify those methods to add a call to <code>coNotify</code>
in case they are being called from a coroutine that is not currently
being managed by our coroutine scheduler.
Since we are calling the enqueue and dequeue methods from different threads,
we use a
<a href="http://www.scala-lang.org/api/current/scala/collection/mutable/SynchronizedQueue.html">
<code>SynchronizedQueue</code></a>
rather than a plain
<a href="http://www.scala-lang.org/api/current/scala/collection/mutable/Queue.html">
<code>Queue</code></a>.
Those two methods now look like this:
<pre name="hlcode" class="scala"
>import scala.collection.mutable.<b>Synchronized</b>Queue
class CoQueue ... extends <b>Synchronized</b>Queue[A] { ...
def blockingEnqueue(x:A):Unit @suspendable = {
enqueueResource.waitUntilNotBlocked
enqueue(x)
<b>dequeueResource.coNotify</b>
}
def blockingDequeue():A @suspendable = {
dequeueResource.waitUntilNotBlocked
<b>val x =</b> dequeue
<b>enqueueResource.coNotify</b>
<b>x</b>
}
</pre>
<a name="nioconnection"></a>
<h3>NioConnection</h3>
We add a <code>CoQueue</code> which we use as our input queue between
the socket reader loop and the application loop.
For this example, we pick an arbitrary limit of 10;
if our application gets behind by more than 10 items,
the socket reader code will suspend when attempting to write to the queue.
If more data arrives while that code is thus suspended,
it will back up in the system's input buffer for that connection,
and eventually the client will get an error when trying to write
to its output connection.
<br/><br/>
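The resulting backpressure is analogous to a plain bounded queue between two threads. In this thread-based sketch (using <code>ArrayBlockingQueue</code> in place of <code>CoQueue</code>), the producer can no longer add an item once the consumer is 10 items behind:

```scala
import java.util.concurrent.ArrayBlockingQueue

// Thread-based analogy for the coroutine backpressure: with a capacity
// of 10, a producer blocks (or, with offer, is refused) once the
// consumer falls 10 items behind, just as blockingEnqueue suspends
// the socket-reader coroutine when the queue is full.
object BackpressureSketch {
  def main(args: Array[String]): Unit = {
    val q = new ArrayBlockingQueue[String](10)
    for (i <- 1 to 10) q.put("line" + i)  // fills the queue to capacity
    // offer returns false instead of blocking; a put here would block
    // until the consumer removed an item.
    println(q.offer("line11"))  // false: the queue is full
    q.take()                    // the consumer catches up by one item
    println(q.offer("line11"))  // true: there is room again
  }
}
```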
In order to initialize the <code>CoQueue</code>
we need to pass in a <code>CoScheduler</code>, so we add that
parameter to our constructor and to the convenience method
in our companion object.
<pre name="hlcode" class="scala"
><b>import net.jimmc.scoroutine.{CoQueue,CoScheduler}</b>
//In object NioConnection
def newConnection(<b>sched:CoScheduler,</b> selector:NioSelector, socket:SocketChannel) {
val conn = new NioConnection(<b>sched,</b>selector,socket)
}
class NioConnection(<b>sched:CoScheduler,</b> selector:NioSelector, socket:SocketChannel) {
//Add CoQueue
<b>private val inQ = new CoQueue[String](sched, 10)</b>
}
</pre>
Now that we have a queue, we modify our socket reader code to place our
input data (after conversion to a Java string) into our queue rather than
writing it straight to the output socket.
We want to block when the queue is full, so we call the
<code>blockingEnqueue</code> method.
Since we now know that's the only action we will be taking,
we fold the <code>readAction</code> method back into <code>readWait</code>.
Because <code>blockingEnqueue</code> is suspendable,
the <code>else</code> branch of the
<code>if (count<1)</code> code block is suspendable, so we need to
make the <code>if</code> branch suspendable as well.
We do this by adding a <code>shiftUnit</code> call as the final value
in the <code>if</code> branch.
The <code>readWait</code> method now looks like this:
<pre name="hlcode" class="scala"
> private def readWait = {
buffer.clear()
val count = read(buffer)
if (count<1) <b>{</b>
socket.close()
<b>shiftUnit[Unit,Unit,Unit]()</b>
<b>}</b> else <b>{</b>
<b>//Moved here from readAction</b>
buffer.flip()
lineDecoder.processBytes(buffer, <b>inQ.blockingEnqueue(_)</b>)
<b>}</b>
}
</pre>
We now have input data going into our queue, but nobody is
reading it.
For this example, we implement a simple echo loop that reads
from the input queue using a new <code>readLine</code> method
and writes to the output using our existing <code>writeLine</code> method.
We do this inside a <code>reset</code> block so that
it becomes another coroutine that can be managed by our
coroutine scheduler.
Our previous <code>start</code> method started up the socket reader loop.
We rename that one to <code>startReader</code>, add a
<code>startApp</code> method that starts up our echo loop,
and call both of those from a new <code>start</code> method.
Our <code>start</code> method now looks like this:
<pre name="hlcode" class="scala"
>//In class NioConnection
def start():Unit = {
<b>startReader
startApp
}
private def startApp() {
reset {
while (socket.isOpen)
writeLine(readLine())
}
}
private def startReader() {</b>
reset {
while (socket.isOpen)
readWait
}
}
<b>def readLine():String @suspendable = inQ.blockingDequeue</b>
</pre>
<a name="linedecoder"></a>
<h3>LineDecoder</h3>
Our <code>processBytes</code> method is now getting passed a callback
that is suspendable, so we need to modify the signature of our
method to accept that.
It passes that callback to <code>processChars</code>, so that
signature needs to be changed in the same way.
Since <code>processChars</code> is now calling a suspendable method,
it too is suspendable, so its return signature
needs to be modified to note that,
and since <code>processBytes</code> calls <code>processChars</code>,
it too needs to be modified to have a suspendable return signature.
<pre name="hlcode" class="scala"
>//In class LineDecoder
<b>import scala.util.continuations._</b>
def processBytes(b:ByteBuffer,
lineHandler:(String)=>Unit <b>@suspendable</b>):Unit <b>@suspendable</b> = ...
private def processChars(cb:CharBuffer,
lineHandler:(String)=>Unit <b>@suspendable</b>)<b>:Unit @suspendable =</b> { ... }
</pre>
<a name="niolistener"></a>
<h3>NioListener</h3>
<code>NioListener</code> calls <code>NioConnection.newConnection</code>,
and that call now requires a <code>CoScheduler</code> argument,
so we add that to our constructor and pass it through when we call
<code>newConnection</code>.
<pre name="hlcode" class="scala"
><b>import net.jimmc.scoroutine.CoScheduler</b>
class NioListener(<b>sched:CoScheduler,</b> selector:NioSelector, hostAddr:InetAddress, port:Int) {
def start(continueListening: =>Boolean):Unit = {
reset {
while (continueListening) {
val socket = accept()
NioConnection.newConnection(<b>sched,</b>selector,socket)
}
}
}
}
</pre>
<a name="nioserver"></a>
<h3>NioServer</h3>
<code>NioServer</code> instantiates the <code>NioListener</code>, so
we need to pass it an instance of <code>CoScheduler</code>.
We create an instance of <code>DefaultCoScheduler</code> and pass that in.
We now need two threads, one for our coroutine scheduler and one
for the NIO scheduler.
In our <code>start</code> method,
we create and start a second <code>Thread</code> for the NIO scheduler,
then rename our own thread and run the coroutine scheduler on it.
<pre name="hlcode" class="scala"
><b>import net.jimmc.scoroutine.DefaultCoScheduler</b>
class NioServer(hostAddr:InetAddress, port:Int) {
val selector = new NioSelector()
<b>val sched = new DefaultCoScheduler</b>
val listener = new NioListener(<b>sched,</b> selector, hostAddr, port)
def start() {
listener.start(true)
<b>//run the NIO selector on its own thread
(new Thread(selector,"NioSelector")).start
Thread.currentThread.setName("CoScheduler")
sched.run //run the coroutine scheduler on our thread, renamed</b>
}
}
</pre>
<a name="summary"></a>
<h3>Summary</h3>
As in the previous post, we have once again transformed our example
application in a way which provides an internal improvement - in this
case the ability to use multiple threads - but which
has not changed its basic external behavior:
we still have a simple echo server.
We also have not yet addressed all of the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#limitations">Limitations</a>
from the first post in this series.
Stay tuned for more.
<a name="caveats"></a>
<h3>Caveats</h3>
<ul>
<li>Although I have asserted that it is possible to write a multi-threaded
scheduler conforming to the <code>CoScheduler</code> API,
I have not yet actually done so.
It is possible that this may be more difficult than I expect.
<li>Multi-threaded code is generally tricky stuff.
I have not spent a lot of time running this example code,
so it is certainly possible that there are race conditions or other
concurrency problems.
</ul>
Jim McBeath
<h2>Java NIO for Character Decoding in Scala</h2>
<i>2011-03-28</i>
<br/><br/>
The Java NIO package includes some handy character encoding and
decoding methods that can be used from Scala.
<h3>Contents</h3>
<ul>
<li><a href="#background">Background</a>
<li><a href="#java-nio-character-coders">Java NIO Character Coders</a>
<li><a href="#linedecoder">LineDecoder</a>
<li><a href="#nioconnection">NioConnection</a>
<li><a href="#limitations">Limitations</a>
</ul>
<a name="background"></a>
<h3>Background</h3>
In my
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html">
previous post</a>
I described a simple Scala server using NIO and continuations,
and mentioned in the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#limitations">
Limitations</a>
section that the example did not convert the data bytes to characters.
In this post I show how that can easily be added by using another
feature of the
<a href="http://download.oracle.com/javase/1.5.0/docs/guide/nio/">
Java NIO</a> package:
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/package-summary.html">
character-set encoders and decoders</a>.
<a name="java-nio-character-coders"></a>
<h3>Java NIO Character Coders</h3>
The <code>java.nio.charset</code> package includes a
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/Charset.html">
<code>Charset</code></a>
class that represents a mapping between the 16-bit Unicode
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Character.html#unicode">
code-units</a> that Java uses for its internal representation for
characters and strings,
and a sequence of bytes as are stored in a file or transmitted
through a socket connection.
Each such mapping is represented by a separate instance of the
<code>Charset</code> class.
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/Charset.html#iana">
Standard character mappings</a> such as "UTF-8" and "ISO-8859-1"
can be retrieved using the static
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/Charset.html#forName(java.lang.String)">
<code>forName</code></a>
method.
<br/><br/>
Given an instance of <code>Charset</code>,
a
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/CharsetEncoder.html">
<code>CharsetEncoder</code></a>
for that character mapping can
be retrieved by calling the
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/Charset.html#newEncoder()">
<code>newEncoder</code></a>
method on that instance.
That encoder can then be used to convert a Java string into a sequence
of bytes suitable for writing to a file or connection.
<br/><br/>
Similarly, the
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/Charset.html#newDecoder()">
<code>newDecoder</code></a>
method on <code>Charset</code> retrieves a
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/CharsetDecoder.html">
<code>CharsetDecoder</code></a>
that can be used for the complementary task of
converting bytes from a file or connection into a Java string.
<br/><br/>
The encoding and decoding methods convert data between a
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/CharBuffer.html">
<code>CharBuffer</code></a>
and a
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/ByteBuffer.html">
<code>ByteBuffer</code></a>.
Since the <code>java.nio</code> socket I/O calls we are using read and write
their data to and from <code>ByteBuffer</code>s,
it is convenient for the encoding and decoding to use those objects.
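As a minimal standalone sketch of that round trip (not part of the server code), a string can be pushed out through an encoder as a <code>ByteBuffer</code>, as it would be when writing to a socket, and recovered through a decoder:

```scala
import java.nio.{ByteBuffer, CharBuffer}
import java.nio.charset.Charset

// Encode a String to UTF-8 bytes and decode it back, using the
// convenience methods on CharsetEncoder and CharsetDecoder.
object CharsetRoundTrip {
  def roundTrip(s: String): String = {
    val utf8 = Charset.forName("UTF-8")
    val bytes: ByteBuffer = utf8.newEncoder.encode(CharBuffer.wrap(s))
    utf8.newDecoder.decode(bytes).toString
  }

  def main(args: Array[String]): Unit =
    println(roundTrip("héllo, wörld"))  // survives the byte round trip intact
}
```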
<a name="linedecoder"></a>
<h3>LineDecoder</h3>
Using the <code>java.nio.charset</code> classes described above,
we write a <code>LineDecoder</code>
class containing a <code>processBytes</code> method that takes as input a
<code>ByteBuffer</code>
(which is what we have to read into when using a
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/SocketChannel.html">
<code>SocketChannel</code></a>)
and converts that byte data to Java characters.
For this example, we also break up that character data into separate lines
when we see line break characters,
converting each line of characters to a Java <code>String</code>.
One buffer of data might contain multiple lines of character data,
so rather than returning a set of lines,
our method accepts a callback to which we pass each line
as we decode it.
<br/><br/>
<input id="hlButton" type="submit" value="Highlight Syntax" onclick="highlightSyntaxFlipButton()"/>
<pre name="hlcode" class="scala"
>import java.nio.{ByteBuffer,CharBuffer}
import java.nio.charset.{Charset,CharsetDecoder,CharsetEncoder,CoderResult}
import scala.annotation.tailrec
class LineDecoder {
//Encoders and decoders are not multi-thread safe, so create one
//for each connection in case we are using multiple threads.
val utf8Charset = Charset.forName("UTF-8")
val utf8Encoder = utf8Charset.newEncoder
val utf8Decoder = utf8Charset.newDecoder
def processBytes(b:ByteBuffer, lineHandler:(String)=>Unit):Unit =
processChars(utf8Decoder.decode(b),lineHandler)
@tailrec
private def processChars(cb:CharBuffer, lineHandler:(String)=>Unit) {
val len = lengthOfFirstLine(cb)
if (len>=0) {
val ca = new Array[Char](len)
cb.get(ca,0,len)
eatLineEnding(cb)
val line = new String(ca)
lineHandler(line)
processChars(cb, lineHandler) //handle multiple lines
}
}
//Assuming the first character in the buffer is an eol char,
//consume it and a possible matching CR or LF in case the EOL is 2 chars.
private def eatLineEnding(cb:CharBuffer) {
//Eat the first character and see what it is
cb.get match {
case '\n' => if (cb.remaining>0 && cb.charAt(0)=='\r') cb.get
case '\r' => if (cb.remaining>0 && cb.charAt(0)=='\n') cb.get
case _ => //ignore everything else
}
}
private def lengthOfFirstLine(cb:CharBuffer):Int = {
(0 until cb.remaining) find { i =>
List('\n','\r').indexOf(cb.charAt(i))>=0 } getOrElse -1
}
}
</pre>
Here is an imperative version of <code>lengthOfFirstLine</code>
that does the same thing as the functional version above.
<pre name="hlcode" class="scala"
> private def lengthOfFirstLine(cb:CharBuffer):Int = {
var cbLen = cb.remaining
for (i <- 0 until cbLen) {
val ch = cb.charAt(i)
if (ch == '\n' || ch == '\r')
return i
}
return -1
}
</pre>
<a name="nioconnection"></a>
<h3>NioConnection</h3>
One of the classes shown in my previous post was the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#nioconnection">
NioConnection</a>
class,
whose responsibilities include processing input data from the client.
It does this in the method <code>readAction</code>,
which initially looks like this:
<pre name="hlcode" class="scala"
>//The old version
private def readAction(b:ByteBuffer) {
b.flip()
socket.write(b)
b.clear()
}
</pre>
We replace the direct call to <code>socket.write</code>
with a call to <code>LineDecoder.processBytes</code>,
which is responsible for decoding the input data,
and we pass it our new <code>writeLine</code> method
that accepts a line of characters
and writes it back to the client.
Also, we no longer need the call to <code>b.clear</code> here:
since <code>readAction</code> runs effectively at the bottom of our
<code>readWhile</code> loop, and <code>b.clear</code> is already called
at the top of that loop, the call would be redundant.
<pre name="hlcode" class="scala"
> private val lineDecoder = new LineDecoder
private def readAction(b:ByteBuffer) {
b.flip()
lineDecoder.processBytes(b, writeLine)
}
def writeLine(line:String) {
socket.write(ByteBuffer.wrap((line+"\n").getBytes("UTF-8")))
}
</pre>
Now when we receive some input data, it gets passed to
<code>LineDecoder.processBytes</code>,
which converts it to characters, breaks it up into separate lines,
and calls our <code>writeLine</code> method for each line.
The <code>writeLine</code> method uses
<code>String.getBytes</code>
to convert the characters in the line back to bytes,
wraps those bytes into a <code>ByteBuffer</code>
and writes them directly to the output channel.
<br/><br/>
As compared to the example in the previous post, this example should
behave the same externally,
but we are now passing around Java strings rather than NIO buffers,
which, assuming we want to deal with string data rather than binary data,
will make it simpler to write the rest of the real application.
<a name="limitations"></a>
<h3>Limitations</h3>
<ul>
<li>As with the example in the previous post,
the current example only shows how to use the NIO calls
on the read side of the connection.
We could use a <code>CharsetEncoder</code> on the write side
rather than using <code>String.getBytes</code> and
<code>ByteBuffer.wrap</code>.
<li>Partial input lines (characters not terminated by an EOL character)
are ignored by this implementation.
<li>The example uses the convenience method version of
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/CharsetDecoder.html#decode(java.nio.ByteBuffer)">
<code>decode</code></a>,
which assumes that the input <code>ByteBuffer</code> contains complete
character sequences.
It is possible that a multi-byte character sequence will be
split such that only the first part of that sequence appears at the
end of the input buffer,
with the remainder of the sequence appearing at the start of the next
buffer of input data.
The above implementation will not properly handle this situation.
The underlying
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/CharsetDecoder.html#decode(java.nio.ByteBuffer,%20java.nio.CharBuffer,%20boolean)">
<code>decode</code></a> method does handle this situation properly,
but the remaining code in this example is not set up for this situation.
<li>The <code>decode</code>
convenience method throws exceptions rather than returning
a status code as the full <code>decode</code> method does.
Since these exceptions are nowhere caught in the code, such an
exception would cause that task to abort.
A more robust solution would have a mechanism to catch exceptions or
restart an aborted task.
<li>The example assumes UTF-8 encoding.
</ul>
Jim McBeath