<h1>The Ideal Software Law</h1>
Jim McBeath, <i>Coding and Life</i>, 2022-08-14
<br/><br/>
In science, we make abstractions that are simplified models of reality,
then we try to describe them with equations that let us make accurate
predictions given the conditions assumed by the model.
In this post I attempt to do that for software projects.
<h2>Contents</h2>
<ul>
<li><a href="#ideal-gas-law">The Ideal Gas Law</a>
<li><a href="#ideal-software-law">The Ideal Software Law</a>
<li><a href="#parameters">The Parameters</a>
<ul>
<li><a href="#functionality">Functionality</a>
<li><a href="#quality">Quality</a>
<li><a href="#resources">Resources</a>
<li><a href="#time">Time</a>
<li><a href="#software-constant">The Software Constant</a>
</ul>
<li><a href="#form-equation">The form of the equation</a>
<li><a href="#example">Example</a>
<li><a href="#analysis">Analysis</a>
<li><a href="#limitations-abstraction">Limitations of the abstraction</a>
<li><a href="#conclusion">Conclusion</a>
</ul>
<a name="ideal-gas-law">
<h2>The Ideal Gas Law</h2>
</a>
In physics, the behavior of an idealized gas is described by the
<a href="https://en.wikipedia.org/wiki/Ideal_gas_law">ideal gas law</a>:
<b><i>PV=nRT</i></b>, where <b><i>P</i></b> is pressure, <b><i>V</i></b> is volume, <b><i>n</i></b> is the quantity
of gas, <b><i>R</i></b> is a <a href="https://en.wikipedia.org/wiki/Gas_constant">constant</a>,
and <b><i>T</i></b> is the absolute
temperature. While real gases don't follow this law exactly, it can be
used to make pretty good predictions. It can help you understand how
steam engines, refrigerators, and hot air balloons work.
<br/><br/>
A key insight that follows from this equation is that you can't hold
three of the four parameters fixed and change just one parameter. If you
have a fixed amount of gas at a given pressure, volume, and temperature,
and you increase the temperature, then either the pressure goes up,
the volume goes up, or both. If, with the same starting conditions,
you decrease the volume, then either the pressure must go up, or the
temperature must go down, or both. You can keep any two parameters fixed
and change the other two in fixed relationships, but you simply can't
hold three of the parameters fixed and change just one. If you try to
do that, you will invariably fail: one or more of the other parameters
will, perforce, also change.
<a name="ideal-software-law">
<h2>The Ideal Software Law</h2>
</a>
We can use a similar equation to convey the relationships among the
parameters of software development. Instead of <b><i>PV=nRT</i></b>, we have:
<blockquote>
<font size="+2">
<b><i>FQ=nST</i></b>
</font>
</blockquote>
where <b><i>F</i></b> is functionality, <b><i>Q</i></b> is quality, <b><i>n</i></b> is development
resources, <b><i>S</i></b> is a constant, and <b><i>T</i></b> is the amount of time to complete
development. As with the ideal gas law, this equation does not precisely
apply to real software projects, but it can be used to make predictions
and gain insights. In particular, we can see in this formulation the
same basic insight as with the ideal gas law: it is not possible to hold
all but one of the parameters fixed and change only one parameter. If
you try to do so, one or more of the other parameters will, perforce,
also change.
<a name="parameters">
<h2>The Parameters</h2>
</a>
Let's take a look at what the parameters in our equation mean
and how we might measure them.
<a name="functionality">
<h3>Functionality (F)</h3>
</a>
Functionality represents what our software can do.
There are <a href="https://www.iso.org/standard/71197.html">defined ways</a>
to measure the <a href="https://en.wikipedia.org/wiki/Software_measurement">functional size</a>
of software, such as <a href="https://en.wikipedia.org/wiki/COSMIC_functional_size_measurement">COSMIC function points</a>,
but we would like something simpler that still allows us to understand the relationships
between the parameters of the equation.
For our purposes, a reasonable proxy for functionality is lines of code (LoC).
<br/><br/>
We are not claiming that lines of code is a good general metric for measuring productivity.
Some people write denser code than others, and so can implement more
functionality in the same number of lines of code.
Some research has concluded that people can write the same number of lines of code
per day independent of language, but a higher-level language can express more with the
same number of lines of code, so could be used to implement more functionality
in the same number of lines of code as compared to a lower-level language.
Some projects have a more difficult environment than others, so developers
produce fewer lines of code per day in that environment.
<br/><br/>
However, we are using LoC slightly
differently in this case. We are not using it to compare productivity or functionality
between projects and teams,
but only within the team and project for which we are measuring functionality.
We assume that all of the factors mentioned above that affect the LoC metric
are constant within the project and time span of interest, so that twice as many
lines of code will provide twice as much functionality.
<a name="quality">
<h3>Quality (Q) </h3>
</a>
For quality, we could use a sophisticated quality model
such as <a href="https://www.iso.org/standard/35733.html">ISO/IEC 25010</a>,
but for this exercise we will use the simpler
<a href="https://asq.org/quality-resources/software-quality#:~:text=SOFTWARE%20QUALITY%20DEFECT%20MANAGEMENT%20APPROACH">Defect Management</a>
approach.
<br/><br/>
Intuitively, it makes sense that higher quality software will have
fewer bugs (also called defects). We also expect a larger project to have more total bugs
than a smaller project. Roughly speaking, then, we can think of the
number of bugs per line of code as being a proxy for the level of
quality of a software project. We can call this the bug density (or defect density).
We want our parameter to be larger for higher quality software,
so we use the reciprocal of the bug density. The reciprocal of density
for materials is called
<a href="https://en.wikipedia.org/wiki/Specific_volume">specific volume</a>,
so we will call this measure bug specific volume (or defect specific volume), and use
that as our measure of quality.
Our units for quality are thus LoC/bug.
<br/><br/>
We recognize that there are some practical problems with this measure.
Firstly, bugs come in different sizes. For our purpose
we will assume some kind of "normalized" bug units, and assign
more serious bugs more than one bug unit.
Secondly, we don't know how many bugs are in a piece of software
until well after it is delivered. We assume those bugs exist and
will be revealed over time, at a rate which depends on factors such
as how much use the software gets, so although we don't know the
number in advance, we can still use this concept in our abstraction
to understand the relation of quality to the other parameters.
<a name="resources">
<h3>Resources (n)</h3>
</a>
Resources, as in Human Resources, refers to the people we have available to work
on the project.
To a first approximation, n is the number of people developing the project.
Many studies have shown that different people have
<a href="https://www.construx.com/blog/productivity-variations-among-software-developers-and-teams-the-origin-of-10x/">different levels of productivity</a>.
For this idealization we assume that there is a baseline developer and that
we know what the productivity multiplier is for each of our developers
as compared to that baseline developer,
even though in practice
<a href="https://insights.sei.cmu.edu/blog/programmer-moneyball-challenging-the-myth-of-individual-programmer-productivity/">this might be difficult</a>,
and the factor could be different depending on circumstances.
We then define n as the number of baseline developers on the project.
If we have a developer who we believe is three times as productive as our baseline,
that would increase n by three.
Our units for n are thus baseline developers, but
for simplicity, we will sometimes just refer to the units for n as people.
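<br/><br/>
For example, computing n for a mixed team is just a matter of summing the
multipliers (a minimal sketch in JavaScript; the multipliers below are invented
for illustration, not measured values):

```javascript
// n is the sum of each developer's productivity multiplier
// relative to the baseline developer.
// These multipliers are illustrative, not measured values.
const multipliers = [1, 1, 0.5, 3]; // two baseline, one junior, one 3x
const n = multipliers.reduce((sum, m) => sum + m, 0);
console.log(n); // 5.5 baseline developers
```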
<br/><br/>
Our idealized
equation assumes that we could do our project in half the time if we had
twice the resources. We recognize that we are blatantly ignoring the problems of
<a href="https://en.wikipedia.org/wiki/The_Mythical_Man-Month">the mythical man-month</a>.
<a name="time">
<h3>Time (T)</h3>
</a>
Time refers to how much time it will take to complete the project.
This is the most straightforward dimension to measure, and because of that it is
often the dimension
that gets the most attention during project planning.
We choose to use days as our units, as that is a commonly used unit
for other aspects of software development.
<a name="software-constant">
<h3>The Software Constant</h3>
</a>
The units we have selected for the four
parameters define the units of the constant S.
<br/><br/>
F (LoC) * Q (LoC/bug) = n (person) * S (??) * T (days)
<br/><br/>
Therefore the
units for S must be (LoC^2)/(bug*person*day).
We can also write this as (LoC/bug)*(LoC/person/day).
LoC/bug is a bug specific volume (our quality measure), and LoC/person/day is a
<a href="https://www.bunnyshell.com/blog/what-development-velocity">development velocity</a>
for our baseline developer,
so S is the product of a
bug specific volume and a
per-person development velocity.
We can think of S as the "quality velocity" for one baseline developer.
A higher value of S means higher
productivity: more functionality or quality from a given amount of time, per developer.
<br/><br/>
So what value should we use for S?
Some people (such as Brooks in <i>The Mythical Man-Month</i>) say a programmer
can write about 10 lines of production code per day. Other sources use
different numbers, but as a baseline we will go with Brooks's value of 10 LoC/person/day.
<br/><br/>
For bug density,
<a href="https://www.mayerdan.com/ruby/2012/11/11/bugs-per-line-of-code-ratio?ref=hackernoon.com#:~:text=Bug%20to%20code%20ratios">various studies</a>
have come up with numbers ranging from 3 to 50 defects per 1000 LoC.
As a starting point, I will select 10 bugs per 1000 LoC,
or a bug specific volume of 100 LoC/bug.
Combining these two values gives 10 * 100 = 1000 as the value of S.
This means our baseline developer could, for example, write 10 lines of code
with 10 bugs per 1000 LoC in one day, or 20 lines of code with
20 bugs per 1000 LoC.
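<br/><br/>
As a quick sanity check, the arithmetic above can be written out in a few lines
of JavaScript (a sketch; the variable names are mine):

```javascript
// Baseline values from the text: Brooks's 10 LoC/person/day,
// and 10 bugs per 1000 LoC (a bug specific volume of 100 LoC/bug).
const velocity = 10;                 // LoC/person/day
const specificVolume = 1000 / 10;    // LoC/bug
const S = velocity * specificVolume; // the software constant, 1000

// One baseline developer (n=1) in one day (T=1) can "spend" F*Q = 1000:
// 10 LoC at 100 LoC/bug, or 20 LoC at 50 LoC/bug, and so on.
console.log(S, 10 * 100 === S, 20 * 50 === S);
```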
<br/><br/>
In reality, different collections of people, different development environments,
and different project attributes will all lead to different values of S.
Organizations should always be looking for ways to increase the value of S
for their projects, but for this analysis I am assuming that they have
already done this in all the easy ways, and the remaining opportunities
to increase S require larger investments and time to have an effect on the project.
Thus when analyzing our equation to see what predictions it makes for
a particular project, we will assume S is constant.
<a name="form-equation">
<h2>The form of the equation</h2>
The Ideal Gas Law was created by assembling a number of simpler laws that were
derived from empirical observations. Each of these simpler laws demonstrated the
relationship between two parameters when the other two were held constant.
<ul>
<li><a href="https://en.wikipedia.org/wiki/Boyle%27s_law">Boyle's Law</a>:
P ∝ 1/V when n and T are held constant
<li><a href="https://en.wikipedia.org/wiki/Charles%27s_law">Charles's Law</a>:
V ∝ T when P and n are held constant
<li><a href="https://en.wikipedia.org/wiki/Avogadro%27s_law">Avogadro's Law</a>:
V ∝ n when P and T are held constant
<li><a href="https://en.wikipedia.org/wiki/Gay-Lussac%27s_law">Gay-Lussac's Law</a>:
P ∝ T when V and n are held constant
</ul>
Our Ideal Software Law is similarly assembled from simpler guidelines.
We don't have previously stated laws, so we rely on our intuition to guide us.
<ul>
<li>All other things being equal, functionality is proportional to resources: F ∝ n
<li>All other things being equal, functionality is proportional to time: F ∝ T
<li>All other things being equal, quality will be higher with more resources
<li>All other things being equal, quality will be higher with more time
</ul>
Because quality is hard to define and measure, we don't actually know how close
its relationship to the other parameters is to being proportional.
For simplicity, we assume that it is proportional to both resources and time,
the same as functionality: Q ∝ n and Q ∝ T.
<br/><br/>
These four rules, when assembled, give us the form of the equation for
the Ideal Software Law shown above.
<a name="example">
<h2>Example</h2>
</a>
Let's make a concrete example.
Let's assume we have a project with the following parameters:
<ul>
<li>The functionality we desire requires 10,000 lines of code
<li>Our quality bar is 5 bugs per 1000 lines of code (better than baseline), so 200 LoC/bug
<li>We have 10 people on our team, all operating at baseline
<li>Our team software constant S is 1000, as calculated <a href="#software-constant">above</a>.
</ul>
How many days should we expect this project to take to complete?
From the Ideal Software Law, we have:
<br/><br/>
10,000 (LoC) * 200 (LoC/bug) = 10 (person) * 1000 (LoC^2/(bug*person*day)) * d (days)
<br/><br/>
Solving for d, we get d = (10,000*200)/(10*1000) = 200 days. A project team, working
from the assumptions above (though perhaps not so explicitly), might deliver this estimate to
management when asked how long the project will take.
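<br/><br/>
The same calculation can be wrapped in a small JavaScript helper (a sketch; the
function name is mine, not a standard API):

```javascript
// Solve the Ideal Software Law F*Q = n*S*T for the time T, in days.
// F: functionality (LoC), Q: quality (LoC/bug),
// n: baseline developers, S: the software constant.
function daysToComplete(F, Q, n, S) {
    return (F * Q) / (n * S);
}

const days = daysToComplete(10000, 200, 10, 1000);
console.log(days); // 200 days, matching the calculation above
```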
<a name="analysis">
<h2>Analysis</h2>
</a>
Now let's play with the parameters and see what happens.
<br/><br/>
The typical scenario is that management comes back to the team and says
"That estimate is too long. We need to deliver sooner. Make it happen faster."
What options does the team have?
<br/><br/>
Looking at the Ideal Software Law equation, if we want to make T smaller, we have four options:
<ul>
<li>Make F smaller (less functionality)
<li>Make Q smaller (less quality)
<li>Make n larger (more developers)
<li>Make S larger (higher velocity)
</ul>
Clearly making S larger would be good, but, as mentioned above, when considering the schedule for
a single project, this is unlikely to be a short-term option. That leaves us with three other
parameters that can be changed.
<br/><br/>
We could make n larger by adding more developers to the team. This can be effective if there are
people available, but practically speaking is difficult because of limited budgets, the difficulty
of finding appropriate developers, and the time-cost of bringing a new team member up to speed.
All of those factors make this choice possible but unlikely.
<br/><br/>
Now we are down to two parameters: functionality and quality. The developer team will typically
propose to make F smaller, also called a reduction in scope, by removing features from the project.
If this is acceptable to management, then the reduced value of T can be balanced by the reduced
value of F.
<br/><br/>
In many cases, however, management insists on not cutting any features. Now we are left with only
one parameter: quality. Because this is the hardest parameter to measure, it is also the one that
most often is ignored. In this situation, when T is made smaller and F, n, and S are unchanged,
Q must, perforce, be made smaller by the same fraction as T was reduced.
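<br/><br/>
To put a number on that quality hit, we can solve the equation for Q with the
example project's values (a sketch; the compressed 150-day deadline is my own
illustration):

```javascript
// With F, n, and S fixed, quality is forced to Q = n*S*T / F.
function forcedQuality(F, n, S, T) {
    return (n * S * T) / F; // LoC per bug
}

const F = 10000, n = 10, S = 1000;
console.log(forcedQuality(F, n, S, 200)); // 200 LoC/bug (5 bugs/kLoC), as planned
console.log(forcedQuality(F, n, S, 150)); // 150 LoC/bug (~6.7 bugs/kLoC):
                                          // T cut by 25%, so Q drops by 25% too
```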
<br/><br/>
The choice to reduce quality is sometimes made consciously, and could come with a commitment to go back
later and improve quality. This is often referred to as taking on
<a href="https://en.wikipedia.org/wiki/Technical_debt">technical debt</a>,
which is expected to
be paid back by improving the code later. The word "debt" is used here in intentional analogy to financial debt:
there is a carrying cost to debt in the form of interest, making the total cost continue to go up the
longer it remains unpaid. In software, this manifests as more time spent fixing bugs after product release,
until such time as the debt is repaid by cleaning up the code to bring its quality back up.
<br/><br/>
If, however, a decision is made to reduce project time without changing functionality or resources, without
consciously recognizing that there will be a reduction in quality, this is effectively like borrowing
money without realizing it or having a plan to pay it back. The interest payments will still be there,
in the form of more time spent fixing bugs and more time required to add new features,
and that will negatively impact the team's schedule on future projects.
<a name="limitations-abstraction">
<h2>Limitations of the abstraction</h2>
</a>
All abstractions will eventually break down when the parameters go outside the valid range of the abstraction.
<ul>
<li>Newton's law of gravity elegantly describes the paths of the planets, but starts to break
down in <a href="https://aether.lbl.gov/www/classes/p10/gr/PrecessionperihelionMercury.htm">strong gravitational fields</a>
<li>The constant-time swing of a <a href="https://qsstudy.com/the-laws-of-a-simple-pendulum/">pendulum</a>
of a given length <a href="https://www.acs.psu.edu/drussell/Demos/Pendulum/Pendula.html">starts to change</a>
when the pendulum swings too far from its center position
<li>The Ideal Gas Law
<a href="https://en.wikipedia.org/wiki/Ideal_gas_law#Deviations_from_ideal_behavior_of_real_gases">becomes less accurate</a>
at lower temperatures, higher pressures, and with larger gas molecules
</ul>
Understanding the limitations of an abstraction allows us to improve our predictions.
In the <a href="#parameters">Parameters</a> section above, I discuss some of the assumptions
about each parameter. When we recognize that an assumption does not hold, we can bend the
results of our formula to try to compensate.
<br/><br/>
For example, our formula tells us we can get the same functionality in half the time
by doubling our resources. But we know that it takes time to bring a new developer up to speed
on a project, so we won't actually be able to cut our time in half. By estimating how much
reality deviates from our assumption, we can improve the accuracy of the predictions made
by the formula despite
the fact that the assumptions behind the formula are not entirely accurate.
<a name="conclusion">
<h2>Conclusion</h2>
</a>
By abstracting the parameters of software development and creating an equation, we can make
practical predictions about those parameters.
We can make such predictions even when the assumptions behind our formula are not completely true.
<br/><br/>
One of the most important predictions is this:
<blockquote>
If you insist on reducing the time available to complete a software project, and you don't increase
the number of people on the project or cut some features, the quality of the delivered software will
decrease proportionally to the reduction in time.
</blockquote>
<h1>Home Automation for a Hot Water Recirculating Pump</h1>
Jim McBeath, 2022-05-01
<br/><br/>
My bathroom is pretty far from the water heater. It took over
a minute of running the hot water for it to actually get hot. That's
a lot of water wasted every time I waited for hot water.
I wanted hot water faster.
<h2>Contents</h2>
<ul>
<li><a href="#recirculating-hot-water">Recirculating Hot Water</a>
<li><a href="#home-automation">Home Automation</a>
<li><a href="#hardware">Hardware</a>
<li><a href="#initial-setup">Initial Setup</a>
<li><a href="#adding-devices">Adding Devices</a>
<li><a href="#programming">Programming</a>
</ul>
<a name="recirculating-hot-water">
<h2>Recirculating Hot Water</h2>
</a>
Last year, as part of a bathroom remodel, I had a hot water recirculating
system installed. This consisted of a return pipe from the bathroom
and a recirculating pump at the water heater to
pull water from the return pipe, thus bringing hot water to the bathroom
without having to run water down the drain waiting for it to warm up.
<br/><br/>
Once the system was installed, I learned that the pump is not supposed to
run all the time. In addition,
the pump, while not terribly noisy, produced enough noise to be
annoying, especially in the parts of the house adjacent to the garage
where the water heater and pump were located. So I didn't want to run
it all the time for that reason.
<br/><br/>
The installer gave me a timer. I set it up to run in the morning and the
evening. My schedule wasn't precise
enough to run the timer for just a short amount of time, so I had it
set up to run for about an hour.
This didn't work very well:
besides the noise issue mentioned above,
the temperature of the water dropped a noticeable amount during this period.
I needed another solution.
<a name="home-automation">
<h2>Home Automation</h2>
</a>
My solution was to set up a home automation system with some outlets and
some battery-powered pushbuttons and program it so that when one of the
pushbuttons was pressed, it would turn on the outlet for a couple of minutes
to run the recirculating pump. This has worked well.
<br/><br/>
Years ago I used a bunch of
<a href="https://en.wikipedia.org/wiki/X10_(industry_standard)">X10</a>
switches and outlets. I even
installed a blocker to isolate the X10 signals in my house from the incoming
power line and a coupler to ensure the X10 signals from one 120V leg made it
to devices on the other 120V leg.
I eventually stopped using those devices and had not installed any other
home automation until now.
<br/><br/>
After looking at what was available, I decided to use the following technologies
for my new home automation system:
<ul>
<li>Home Assistant as the controller
<br/>I chose this for two reasons:
<ol>
<li>I don't want my system to depend on the cloud or to be sending data
out to anyone. Home Assistant allows me to do everything myself and
be isolated from the internet. My automation won't stop working when
my internet connection or someone else's computers or software go down.
<li>I like to tinker. Home Assistant is highly customizable - as long as
you are willing to fiddle with it.
</ol>
<li>Zigbee 3.0 devices
<ul>
<li>I looked at Zigbee and Z-wave and decided Zigbee looked like the better
choice for number of compatible available devices.
<li>I specifically did not want to use wifi devices.
</ul>
</ul>
Having made those two choices, the next choice was where to run Home Assistant and
how to connect the Zigbee devices to it.
I figured I would use a USB Zigbee coordinator.
For the Home Assistant host, I considered running it on my desktop (which is always on),
on my Synology NAS, or on a bespoke device such as a Raspberry Pi. I learned that
Synology announced they would be
<a href="https://macandegg.com/2021/06/synology-dsm-7-0-ends-support-for-usb-devices/"
>removing support for external USB devices</a>
other than disks,
so I eliminated that choice. I started looking into using a Raspberry Pi and read
multiple comments about high failure rates of the SD cards. Someone suggested attaching
a USB SSD, which seemed like a good idea, but that would require more research and figuring
out how to mount everything. About this time I discovered
<a href="https://www.home-assistant.io/blue/">HA Blue</a>,
a nice little device based on the Odroid-N2 with 128GB of on-board eMMC,
4 USB ports, ethernet, and HDMI, all in a good-looking extruded aluminum case,
and pre-loaded with Home Assistant.
It's a little more expensive than some other options, but for me the
added convenience of a pre-installed system and the nice case were
worth the price.
<br/><br/>
Note: Home Assistant Blue has been discontinued and is being superseded by
<a href="https://www.crowdsupply.com/nabu-casa/home-assistant-yellow">Home Assistant Yellow</a>,
which has a built-in Zigbee radio and more expansion slots.
<br/><br/>
Even after deciding on Zigbee, there were a few different available ways to set up
the communication between the Zigbee devices and
Home Assistant. After doing some reading, I settled on using zigbee2mqtt.
It seems like one of the newer solutions, and one where I would have less
trouble integrating a wider variety of devices.
<a name="hardware">
<h2>Hardware</h2>
</a>
For my initial foray into home automation and based on my decisions above, I bought the following:
<ul>
<li><a href="https://www.home-assistant.io/blue/">HA Blue</a>
bespoke Home Assistant controller pre-loaded with Home Assistant
<li>SmartLight Zigbee CC2652P Coordinator v4 USB Adapter preflashed with
CC2652P_E72_20210319 firmware to support zigbee2mqtt
<li>Some Sonoff S31 Lite Zigbee outlet plugs
<li>Sonoff SNZB-01 Zigbee switch
<li>Some Linkind Zigbee switches and outlets
</ul>
I used the <a href="https://zigbee.blakadder.com/">Blakadder compatibility list</a>
to find devices that were compatible with zigbee2mqtt, then looked at which ones I
could get and what they cost. The outlets and switches I bought were on the less
expensive end of the range, costing less than $10 each, although the price
has since gone up.
<a name="initial-setup">
<h2>Initial Setup</h2>
</a>
Setting up the HA Blue system was straightforward:
<ol>
<li>Plug it in to power and ethernet
<li>Look in my DHCP log to see what IP address it was assigned
<li>Open my web browser to port 8123 at that IP address
<li>Wait for it to run through its first-boot setup (about 10 minutes)
<li>Create an account for myself
</ol>
I set up the Zigbee USB adapter, following a
<a href="https://www.youtube.com/watch?v=1uxRvbbd0fc">YouTube video</a>
(but beware, there have been some changes since that video was made):
<ol>
<li>Plug in the Zigbee USB adapter
<li>Log into HA Blue using my account
<li>Enable Advanced mode in my profile
<li>Create user "mqtt" to handle mqtt stuff
<li>From the Add-on store, install Mosquitto Broker
<li>Configure Mosquitto Broker by adding the mqtt user, and start it
</ol>
Once the Zigbee adapter was in place, I set up zigbee2mqtt:
<ol>
<li>In the Add-on store screen, from the "..." menu, select Repository
and add the URL for the
<a href="https://github.com/zigbee2mqtt/hassio-zigbee2mqtt">zigbee2mqtt repository</a>,
then find the Zigbee2mqtt Hass.io Add-on near the bottom and select it
<li>Find the USB port the Zigbee adapter is connected to: in
Supervisor, System, Host box, three-dot menu, Hardware is a list of
devices in /dev; by plugging and unplugging the Zigbee adapter
I could see that it shows up as device 1-1.2 with path /dev/bus/usb/001/004
and as /dev/ttyUSB0. Or you can just assume /dev/ttyUSB0.
<li>Edit the configuration on the zigbee2mqtt module and change the default port
from /dev/ttyACM0 to /dev/ttyUSB0, and change the username to mqtt
<li>Start the module
</ol>
I also set up ssh to simplify future customizations:
<ol>
<li>Install the Terminal & SSH Add-on and start it
<li>Open the Terminal & SSH Web UI, which is a web terminal, usable as an alternative to ssh
<li>In the Terminal & SSH Config network page, specify port 22
<li>In the Terminal & SSH Config page, add my public key
to the authorized_keys array in single quotes
<li>Save, and restart the module
<li>ssh to the HA Blue as root
</ol>
At this point I rebooted the HA Blue and looked in the Log for each module
to make sure it was working properly.
<br/><br/>
The above description of setting up zigbee2mqtt is condensed, as I actually
had a bit of trouble setting it up, including using an old zigbee2mqtt repository
that I later replaced with the newer repository URL given above.
<a name="adding-devices">
<h2>Adding Devices</h2>
</a>
With Zigbee configured on my HA system, I was ready to add my Zigbee switches and outlets.
<br/><br/>
In order to add a new Zigbee device to the network, the zigbee2mqtt module must be configured to
permit devices to join. Initially I was doing this by directly editing the configuration
of the zigbee2mqtt module and changing the value of the permit-join attribute to true.
Once the new device had been added, I then edited the configuration again and changed
permit-join back to false. Later, I discovered I could just use the Web UI for the
zigbee2mqtt module and click on the "Permit join" button, which enables permit-join
for 255 seconds with a count-down timer, after which it automatically turns it off.
<br/><br/>
With the HA Blue system beside me, I enabled permit-join. The LED in the Zigbee adapter started
flashing green to indicate that it was in permit-join mode.
<br/><br/>
The first device I attached was a SONOFF SNZB-01 button:
<ol>
<li>Pry off the back of the button, remove the paper battery insulation sheet,
then reinstall the battery and the back cover
<li>Using a paper clip, press and hold the reset button for 5 seconds, until the red light flashes
<li>After a couple more seconds, the tile for Mosquitto Broker shows "1 device and 3 entities"
<li>Click on "1 device" to open a list of devices
<li>Click on the device to open its details page
<li>Click on the pencil icon by the hex name at the top of the page and rename the device
and the entity IDs
<li>Press the button, it briefly shows "single" by the "action" line
<li>Double-click, it briefly shows "double" by the "action" line
</ol>
Yay, my first Zigbee device is working!
<br/><br/>
I added a few more devices with basically the same process. Sometimes they would
join just by enabling permit-join, but sometimes I also had to reset the device.
I had some Sonoff devices and some Linkind devices, and I got them all working,
although I did have one unexpected hiccup.
<br/><br/>
I had purchased a few Linkind outlets. The first one successfully joined my
network, but the second one did not. After a few tries, I finally looked at the
zigbee2mqtt log and saw that there were error messages saying the unit was
not supported. (Lesson: if a new device doesn't join right away, look in the
log file for errors!) Although the two outlets were sold under the same product name
and looked the same, it turned out they had different model numbers:
the unit that worked was ZS190000118 and the unit that failed to join was
ZS190000108.
<br/><br/>
In order to add support for this slightly different flavor of Linkind outlet,
I found and followed some instructions to
<a href="https://www.zigbee2mqtt.io/how_tos/how_to_support_new_devices.html">support a new device</a>.
<ol>
<li>ssh into my HA Blue as root
<li>cd to <code>config/zigbee2mqtt</code>
<li>Create the new file <code>ZS190000108.js</code>, named for the unsupported model
<li>In a web browser, open https://github.com/Koenkk/zigbee-herdsman-converters/blob/master/devices/linkind.js,
look for Linkind ZS190000118, and copy that stanza into my new file
(this assumed the description was compatible, which turned out to be true)
<li>Change zigbeeModel to ['ZB_ONOFFPlug_D0008'] (from the zigbee2mqtt log)
<li>Change model to 'ZS190000108' (from the zigbee2mqtt log)
<li>Add the rest of the boilerplate as specified in step 2 of the instructions
<li>Write out the new file
<li>Update the zigbee2mqtt config to add the new device:
set advanced:log_level: debug (was warn);
set external_converters: - ZS190000108.js
<li>Save, Restart
</ol>
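The resulting external converter file ended up looking roughly like this (a
sketch from memory, not the exact file; the commented-out lines stand in for
the converter imports that the real zigbee-herdsman-converters stanza uses):

```javascript
// Sketch of config/zigbee2mqtt/ZS190000108.js. The zigbeeModel and
// model strings are the values from the zigbee2mqtt log, per the
// steps above; the rest follows the shape of the copied Linkind stanza.
// const fz = require('zigbee-herdsman-converters/converters/fromZigbee');
// const tz = require('zigbee-herdsman-converters/converters/toZigbee');

const definition = {
    zigbeeModel: ['ZB_ONOFFPlug_D0008'], // from the zigbee2mqtt log
    model: 'ZS190000108',                // the model that failed to join
    vendor: 'Linkind',
    description: 'Zigbee smart outlet',
    // fromZigbee: [fz.on_off],
    // toZigbee: [tz.on_off],
};

module.exports = [definition];
```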
<a name="programming">
<h2>Programming</h2>
</a>
Once the hardware was all in place and working, the next step was to set up
the programming. It looks like there are multiple ways this can be done, and
as a programmer I figured it wouldn't be too hard to write some automation
code, but then I discovered Node-RED, a graphical editor plugin.
<br/><br/>
I installed Node-RED from the Community section of the Add-ons menu.
I had a bit of trouble with the certificate stuff, but eventually got
that working. I then created a flow such that when I pressed one of
my buttons, it would turn on the pump for two minutes.
I spent too much time trying to figure out how to do the whole thing
using standard components, but eventually decided the standard components
were not quite up to the task. I ended up using a few function components,
in which I wrote a bit of Javascript code.
<br/><br/>
My buttons are connected to the input of the Add Time function component,
which adds time to a counter each time a button is pressed, with a
max value. The buttons are also wired to an on-outlet component
that turns on the recirculating pump.
<br/><br/>
Here is the Add Time code:
<pre><div class="code"
>// On Start
flow.set("max_count", 120);       // cap: 2 minutes of pump run time
flow.set("button_increment", 80); // each press adds 1 minute 20 seconds
// On Message
let max_count = flow.get("max_count");
let button_increment = flow.get("button_increment");
let c = flow.get("counter") || 0; // default to 0 if not yet set
if (c < 0) {
    c = 0;
}
c = c + button_increment;
if (c > max_count) {
    c = max_count;
}
node.status({fill:"blue", shape:"dot", text:"count:" + c});
flow.set("counter", c);
return msg;
</div></pre>
Once time has been added to the timer, there is another function
that counts down to zero, the Count Down function.
The input of the Count Down function is connected to a Ticker
component that ticks once per second.
The output of the Count Down function is connected to an off-outlet
component that turns off the recirculating pump.
<br/><br/>
Here is the Count Down code:
<pre><div class="code"
>// On Start
flow.set("counter", 0);
// On Message
let c = flow.get("counter");
c = c - 1;
flow.set("counter", c);
if (c > 0) {
    node.status({fill:"green", shape:"dot", text:"count:" + c});
    return {payload: {counter: c}};
} else if (c == 0) {
    node.status({fill:"yellow", shape:"dot", text:"stop"});
    return {payload: "stop"};
} else {
    node.status({fill:"red", shape:"dot", text:"stopped"});
    return {payload: "stopped"};
}
</div></pre>
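Outside of Node-RED, the combined behavior of the Add Time and Count Down functions can be sketched as plain JavaScript, with the flow context replaced by an ordinary object (a simplified model, not the actual Node-RED runtime):

```javascript
// Standalone sketch of the Add Time / Count Down pair. The flow
// context is modeled as a plain object; all values are seconds.
const flow = { counter: 0, max_count: 120, button_increment: 80 };

// Runs on each button press: add time, clamped to max_count.
function addTime() {
    let c = Math.max(flow.counter, 0) + flow.button_increment;
    flow.counter = Math.min(c, flow.max_count);
}

// Runs once per second, driven by the Ticker component.
function tick() {
    flow.counter -= 1;
    if (flow.counter > 0) return { counter: flow.counter }; // pump stays on
    if (flow.counter === 0) return "stop";                  // turn pump off
    return "stopped";                                       // already off
}

// Two quick presses: 80 + 80 seconds, clamped to 120.
addTime();
addTime();
```

Pressing the button twice in quick succession clamps the counter at two minutes, after which 120 ticks count it back down to the "stop" message that turns the pump off.
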
This worked well, but I wanted some kind of feedback so I knew when
the pump was on. To get that, I added another smart outlet, into which
I plugged a
<a href="https://www.amazon.com/Westek-NL-Orbs-2-Specialty-Lites-White/dp/B07CP745WB/">guide light</a>.
I then added a function component that monitored the state of the pump switch
with a state-changed component, such that when the pump outlet turned on or off,
the function would turn on or off the outlet with the guide light.
The function also set the node status within Home Assistant so I could see
on the Node-RED schematic when it was on or off.
<br/><br/>
Here is the Outlet State code:
<pre><div class="code"
>// On Start
flow.set("counter", 0)
// On Message
state = msg.payload;
if (state == "on") {
node.status({fill:"green",shape:"dot",text:"on"});
} else if (state == "off") {
node.status({fill:"red",shape:"dot",text:"off"});
}
return msg
</div></pre>
After getting this all set up, I spent some time testing with
different pump-on times and tweaked the values to be just long
enough to get the initial hot water to the bathroom sinks.
I'm pretty happy with how it is working now.
<h2>From Counting to Complex by Inverse and Closure</h2>
(2021-11-21) Walking the path from counting numbers to complex numbers.
<h3>Contents</h3>
<ul>
<li><a href="#preface">Preface</a>
<li><a href="#intro">Introduction</a>
<ul>
<li><a href="#concepts">Concepts</a>
<li><a href="#preview">Preview</a>
</ul>
<li><a href="#counting">Counting</a>
<ul>
<li><a href="#equals">Equals</a>
<li><a href="#natural-numbers">Natural Numbers</a>
<li><a href="#greater-than">Greater Than</a>
</ul>
<li><a href="#addition">Addition</a>
<ul>
<li><a href="#addition-associative">Associative</a>
<li><a href="#addition-commutative">Commutative</a>
<li><a href="#addition-identity">Identity</a>
<li><a href="#addition-algebra">Algebra</a>
</ul>
<li><a href="#subtraction">Subtraction</a>
<ul>
<li><a href="#subtraction-associative">Associative</a>
</ul>
<li><a href="#negative-numbers">Negative Numbers</a>
<ul>
<li><a href="#negative-addition">Addition</a>
<li><a href="#negative-subtraction">Subtraction</a>
<li><a href="#negative-algebra">Algebra</a>
</ul>
<li><a href="#multiplication">Multiplication</a>
<ul>
<li><a href="#multiplication-identity">Identity and Zero</a>
<li><a href="#multiplication-distributive">Distributive</a>
<li><a href="#multiplication-associative">Associative</a>
<li><a href="#multiplication-commutative">Commutative</a>
<li><a href="#multiplication-algebra">Algebra</a>
</ul>
<li><a href="#division">Division</a>
<ul>
<li><a href="#division-associative">Associative</a>
</ul>
<li><a href="#rational-numbers">Rational Numbers</a>
<ul>
<li><a href="#rational-algebra">Algebra</a>
</ul>
<li><a href="#exponentiation">Exponentiation</a>
<li><a href="#logarithms">Logarithms</a>
<li><a href="#principal-values">Principal Values</a>
<li><a href="#irrational-numbers">Irrational Numbers</a>
<ul>
<li><a href="#decimal-notation">Decimal Notation</a>
</ul>
<li><a href="#imaginary-numbers">Imaginary Numbers</a>
<li><a href="#complex-numbers">Complex Numbers</a>
<ul>
<li><a href="#complex-cartesian">Cartesian Coordinates</a>
<li><a href="#eulers-formula">Euler's Formula</a>
<li><a href="#complex-exponentiation">Complex Exponentiation</a>
<li><a href="#eulers-identity">Euler's Identity</a>
</ul>
<li><a href="#final-closure">Final Closure</a>
</ul>
<a name="preface"></a>
<h3>Preface</h3>
Many years ago I read that Richard Feynman
gave a talk to a room full of scientists
in which he rederived basic abstract algebra on real numbers
in under an hour.
I have since found that Feynman gives this derivation in a discussion of Algebra
in his Lectures on Physics, linked a few paragraphs below.
<br/><br/>
I'm not going to compete with Feynman,
but doing this derivation seemed like a fun challenge to undertake.
Below I present my explanation of how one gets to complex
numbers based on a few simple concepts: repetition,
<a href="http://en.wikipedia.org/wiki/Inverse_function">inverse</a> and
<a href="http://en.wikipedia.org/wiki/Closure_(mathematics)">closure</a>.
Along the way I try to throw in a few comments about
<a href="http://en.wikipedia.org/wiki/Algebraic_structure">abstract algebra</a>.
By the end, we will look at
<a href="https://en.wikipedia.org/wiki/Euler%27s_identity">Euler's Identity</a>,
<code>e<sup><i>i</i>π</sup>+1=0</code>,
and maybe make it a little less mystical than it might appear.
<br/><br/>
It is not necessary for you to understand all of the references
to math terms, so you don't need to follow those links unless you
want to learn about that concept.
Similarly, it is not necessary for you to follow and understand in
detail every proof.
Hopefully you can simply ignore any parts you don't
immediately understand and yet still get something out
of the overall presentation.
<br/><br/>
I walked this path mostly for my own entertainment, but I thought
perhaps others might get something out of it.
It is quite long and likely contains some errors,
so <i>caveat lector</i>.
<br/><br/>
Here are a couple of other documents that discuss Algebra that you might find interesting:
<ul>
<li><a href="https://www.feynmanlectures.caltech.edu/I_22.html"
>Feynman Lectures on Physics, chapter 22: Algebra</a>, including a discussion of Euler's Formula,
which Feynman referred to as
"one of the most remarkable, almost astounding, formulas in all of mathematics."
<li><a href="https://books.google.com/books?id=zqsXAAAAIAAJ&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false">Elementary Algebra</a> by J. H. Tanner, PhD, 1904
</ul>
<a name="intro"></a>
<h3>Introduction</h3>
Imagine that none of this stuff exists, so we are making it all up
as we go. We are going to define our numbering system from the ground up,
gradually building up a structure of definitions and operations
that all manage to work together nicely.
It's not just by random chance that things work nicely:
we are defining our numbers and operations precisely to make
them work together nicely.
<br/><br/>
In the code blocks below, I label each assumption (or definition)
with a name such as A1 enclosed in square brackets,
like this: [A1].
Lemmas (things which can be proved from the assumptions and are
used in later proofs) are labeled
similarly but with L rather than A.
Other intermediate steps in a proof which are not referenced outside
of that proof are labeled similarly but with I.
These names may be referenced later to build up additional lemmas.
The references look the same,
but appear in the text or in comments after an equation rather than before.
<a name="concepts"></a>
<h4>Concepts</h4>
There are three basic ways we will be extending our system:
<ul>
<li>Repetition: performing the same operation many times.
For example, multiplication is repeated addition.
<li>Inverse: an operation that has the opposite effect of some
other operation.
For example, subtraction is the inverse of addition.
<li>Closure: the results of an operation are in the same set
as the operands.
For example, the natural numbers (or positive integers) are closed under
addition, because you can add any two natural numbers
and get another natural number;
but they are not closed under
subtraction, because there are some expressions on natural numbers
using subtraction whose results are not natural numbers,
such as (3 - 5).
</ul>
<a name="preview"></a>
<h4>Preview</h4>
Here is a quick preview of how we will move from counting to complex:
<ul>
<li>start with zero and the successor function
<li>repeated successors yields counting and the natural numbers
<li>repeated counting yields addition
<li>inverse of addition yields subtraction
<li>closure on subtraction yields negative numbers
<li>repeated addition yields multiplication
<li>inverse of multiplication yields division
<li>closure on division yields rational numbers
<li>repeated multiplication yields exponentiation
<li>inverse of exponentiation yields logarithms
<li>closure on exponentiation with positive rational
numbers yields real numbers
<li>closure on exponentiation with negative rational
numbers yields complex numbers
<li>all of our operations on complex numbers are already closed, so we are done
</ul>
If you enjoy playing with math you might want to try doing
all of these derivations yourself before reading my derivations.
<a name="counting"></a>
<h3>Counting</h3>
At the most basic level, we start with some simple assumptions,
which happen to be a subset of the Peano axioms.
<br/><br/>
We define a starting point for counting.
Historically, people typically started with one,
but for later simplicity in this exercise we start with zero.
We define a successor function s(x) that takes a number x
and produces the next number, which by definition is distinct
from x.
<pre><div class="code"
>[A1] zero exists
[A2] given x, s(x) generates another number, where s(x) is not the same as x
</div></pre>
<a name="equals"></a>
<h4>Equals</h4>
We define an equals operator (=) so that the statement a=a is true,
and the statement a=b
means that, for any true statement containing a, we can replace any or all
instances of a by b and the resulting statement will also be true.
We further assume that if a=b is false, then the same replacements as
described above will generally (but not always) yield a false statement.
<pre><div class="code"
>[A3] a=a is true for all a
[A4] a=b is a replacement rule (described above)
</div></pre>
The equals operator is:
<ul>
<li>Reflexive: a=a (by definition)
<li>Symmetric: if a=b then b=a. Starting with the true statement a=a
and the predicate a=b,
by our definition of equals we can replace any instance of a by b
in a=a and still have a true statement; we chose to replace
the first a by b, yielding b=a.
<li>Transitive: if a=b and b=c, then a=c.
Taking the assumed true statement a=b, and applying our equals rule
using the second statement b=c, we replace b by c in the first
statement, yielding a=c.
</ul>
<pre><div class="code"
>[L5.1] if a=b then b=a (demonstrated above)
[L5.2] if a=b and b=c then a=c (demonstrated above)
</div></pre>
For convenience, we define the not-equals operator != to be false
whenever equals on the same values is true, and vice versa.
<br/><br/>
The above definition also leads almost directly to one of the
common ways of solving algebraic equations: performing the same
operation to both sides of an equation, such as adding the same
number to both sides of an equation, or multiplying both sides
by the same number.
Here's an example of adding the same amount to both sides of an equation.
<pre><div class="code"
>a = b Assume this is our starting equation we are working with
a + c = a + c True by definition [A3]
a + c = b + c From [A4]
</div></pre>
Note that this works for any function:
<pre><div class="code"
>[I6.1] a = b Assume this is our starting equation we are working with
[I6.2] f(a) = f(a) True by definition [A3]
[I6.3] f(a) = f(b) From [A4] using [I6.2] as a starting equation
and [I6.1] as our replacement rule
[L6.4] if a = b then f(a) = f(b) for any f defined for a
</div></pre>
f(x) might be 2*x, x+3, sin(x), or anything else we desire.
Thus we can start with any true equation, perform the same valid
operation on both sides, and still have a true equation.
<a name="natural-numbers"></a>
<h4>Natural Numbers</h4>
Given our previously defined starting point of zero,
we now define the natural numbers:
<pre><div class="code"
>[A7.0] 0=zero
[A7.1] 1=s(0)
[A7.2] 2=s(1)
[A7.3] 3=s(2)
etc. to infinity.
</div></pre>
By definition, s(x)!=x, so 1!=0, 2!=1, etc.
Note that we did not assume that repeated application of s(x) would not
eventually give us the same number.
Without that assumption it is possible that, for example, s(s(s(x)))=x,
or in other words, 3=0.
This yields a "modulo" system, which can be useful.
But for this particular exposition, I want to use the "normal" numbers,
so we will add the assumption that s(x) is never equal to any previous
value in the sequence.
More precisely, we assume:
<pre><div class="code"
>[A8] For any x, repeated application of the successor function
any number of times will never generate x.
</div></pre>
We have now defined an unending stream of distinct numbers, each of which is
a successor to one other number.
<a name="greater-than"></a>
<h4>Greater Than</h4>
We next define the relational operators less than (<)
and greater than (>) with the
following statements:
<pre><div class="code"
>[A9] s(a) > a
[A10] if (a > b) and (b > c) then (a > c)
[A11] (b < a) always has the same truth value as (a > b)
</div></pre>
We are now at the point where we can count and know
(by definition) that each time we
count we get a number that is greater than all of the previous numbers.
We can start with any number and count up from there by repeated
application of the successor function.
For example, if we start with 4 (which is s(s(s(s(zero))))) we can
count up from there by three by repeated application of
the successor function three
times to get s(s(s(4))), which we can calculate is 7.
This gets unwieldy pretty fast.
To make this simpler, let's define an "addition" operator + that gives us
the same results as repeated counting.
<a name="addition"></a>
<h3>Addition</h3>
We define the addition operator (<code>+</code>) as follows:
<pre><div class="code"
>[A21] a + 0 = a
[A22] a + s(b) = s(a + b)
</div></pre>
Some quick examples:
<pre><div class="code"
>[L23.1] a + 1 = a + s(0) = s(a + 0) = s(a)
[L23.2] a + 2 = a + s(1) = s(a + 1) = s(s(a))
</div></pre>
Since s(a) = a+1, we also have
<pre><div class="code"
>[L23.3] a + s(b) = a + (b+1)
[L23.4] s(a + b) = (a + b) + 1
</div></pre>
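As a side note, the recursive definitions [A21] and [A22] translate directly into code. Here is a small sketch in JavaScript; the representation of numbers as nested successor wrappers is my own illustration, not part of the construction in the text:

```javascript
// Numbers as iterated applications of a successor function:
// zero is a sentinel, s(x) wraps x one level deeper.
const zero = null;                  // [A1] zero exists
const s = (x) => ({ pred: x });     // [A2] successor

// add implements the two defining axioms of addition:
function add(a, b) {
    if (b === zero) return a;       // [A21] a + 0 = a
    return s(add(a, b.pred));       // [A22] a + s(b) = s(a + b)
}

// Helper for display: count successor applications.
function toInt(n) { return n === zero ? 0 : 1 + toInt(n.pred); }

const two = s(s(zero));
const three = s(s(s(zero)));
// toInt(add(two, three)) evaluates to 5
```
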
For some of what we want to do below,
we are going to need to use the rule of induction:
<pre><div class="code"
>[A24] If an equation is true for a known value of n,
and it can be demonstrated to be true for n+1 for any n when true for n,
then it is true for all natural numbers x where x >= n.
</div></pre>
<a name="addition-associative"></a>
<h4>Associative</h4>
We now show that our addition operator is associative.
We want to prove that (a+b)+n = a+(b+n) for all n.
We start by showing this is true for n=1,
then use induction:
<pre><div class="code"
>[L25.1] a + (b + 1) = (a + b) + 1 From [A22], [L23.3] and [L23.4]
[I25.2] a + (b + n) = (a + b) + n Inductive assumption, true for n=1
a + (b + (n + 1))
= a + ((b + n) + 1) From [L25.1] on (b+(n+1))
= (a + (b + n)) + 1 From [L25.1] with (b+n) for b
= ((a + b) + n) + 1 From [I25.2] applied to (a+(b+n))
= (a + b) + (n + 1) From [L25.1] in reverse with (a+b) for a and n for b
[L26] a + (b + c) = (a + b) + c Above lines summarized, with c for n+1
</div></pre>
Thus by induction we have our proof of associativity.
<a name="addition-commutative"></a>
<h4>Commutative</h4>
We use a similar approach to show that addition is commutative,
such that a+b=b+a.
We start by showing that 0 commutes with a for any a.
<pre><div class="code"
>[I27.1] 0 + 0 = 0 From [A21] with 0 for a
0 + 1 = 0 + s(0) From [L23.1]
= s(0 + 0) From [A22] with 0 for a and b
= s(0) From [I27.1]
= 1
[L27.2] 0 + 1 = 1 Summary of the above few lines
[I27.3] 0 + n = n Inductive assumption, true for n=1 from [L27.2]
[I27.4] 0 + (n + 1) = (0 + n) + 1 From [L26]
[I27.5] 0 + (n + 1) = n + 1 By induction from [I27.3] and [I27.4]
[L27.6] 0 + a = a From [I27.5] with a for n+1
[I27.7] 0 + a = a = a + 0 From [L27.6] and [A21]
[L27.8] 0 + a = a + 0 From [L5.2]
</div></pre>
Now we show that 1 commutes with any number by induction.
<pre><div class="code"
>1 + (n + 1)
= 1 + s(n) From [L23.1] on (n+1) with n for a
= s(1 + n) From [A22] with 1 for a and n for b
= s(n + 1) From inductive assumption that 1 commutes with n, known true for n=0
= n + s(1) From [A22] in reverse with n for a and 1 for b
= n + (1 + 1) From [L23.1] on s(1) with 1 for a
= (n + 1) + 1 From [L25.1]
[L28] 1 + a = a + 1 Summary of the above with a for n+1
</div></pre>
Finally, we use induction again to show that any two numbers commute.
<pre><div class="code"
>a + (n + 1)
= (a + n) + 1 From [L25.1]
= (n + a) + 1 From inductive assumption that a commutes with n, known true for n=1 [L28]
= n + (a + 1) From [L25.1]
= n + (1 + a) From [L28]
= (n + 1) + a From [L25.1]
[L29] a + b = b + a Summary of the above with b for n+1
</div></pre>
As a final note for addition, since we have demonstrated that
(a+b)+c=a+(b+c), we can omit the parentheses when adding multiple
terms without creating any ambiguity.
<pre><div class="code"
>[A30] a + b + c = (a + b) + c = a + (b + c)
</div></pre>
Repeated application of this rule can be used for addition with
four or more terms without parentheses.
By combining this rule with the commutative law [L29],
we can see that we can take an expression with multiple terms
added together, such as
<code>a + b + c + d + e</code>
and rearrange and group the terms any way we want.
<br/><br/>
The associative rule also makes it easy to calculate our addition facts.
We already know that 1=0+1, 2=1+1, 3=2+1 etc from our definitions [A7]
with [L23.1].
That lets us fill in the first row of our addition fact table.
We can then calculate all of the n+2 values based on the n+1 values,
and repeat ad infinitum for the rest of the numbers.
<pre><div class="code"
>n + 2 = n + (1 + 1) = (n + 1) + 1
n + 3 = n + (2 + 1) = (n + 2) + 1
n + 4 = n + (3 + 1) = (n + 3) + 1
</div></pre>
Wikipedia has
<a href="http://en.wikipedia.org/wiki/Addition_of_natural_numbers/Proofs">
proofs of associativity and commutativity</a>
of addition, which are similar to mine but
actually a little more concise,
and
<a href="http://www.dpmms.cam.ac.uk/~wtg10/addcomm.html">here</a>
is a proof of commutativity that does not rely on associativity -
but I wanted to think through these derivations
myself and present them here in-line with the rest of my exposition.
<a name="addition-identity"></a>
<h4>Identity</h4>
At this point we know that a+0=a [A21] and 0+a=a [L27.6],
or in other words adding zero to any number (on either side,
since we showed addition is commutative) yields that number.
This is an interesting enough fact that we will give this
number a special name: the <b>Identity</b> for addition.
<br/><br/>
It's easy to show that there is only one identity for addition.
<pre><div class="code"
>Assume two identity values e and f.
Consider the expression e+f.
Because e is an identity, e+f=f.
Because f is an identity, e+f=e.
Therefore e=f.
[L31] Since this is true for any two identities,
all are in fact the same one identity.
</div></pre>
<a name="addition-algebra"></a>
<h4>Algebra</h4>
We have built up our concepts in layers, like building a house:
we set a foundation with zero and the successor function,
put in some rim joists with the natural numbers,
and laid on some flooring with the addition operator and
its identity element.
We have created a little structure from our concepts.
Whereas a house is a physical structure, this is an
algebraic structure.
<br/><br/>
It turns out that this algebraic structure is useful enough
that mathematicians have given this kind of structure a name:
a <a href="http://en.wikipedia.org/wiki/Monoid">monoid</a>.
A monoid has these characteristics (with our case in parentheses):
<ul>
<li>It has a set of elements (the natural numbers).
<li>It has a binary operation on those elements (the + operator).
<li>The operation is associative (+ is associative).
<li>The operation is closed (adding two natural numbers always
produces another natural number).
<li>It has an identity element (zero).
</ul>
There are a few rules from the above section that we will use often enough
that we want to reference them by name rather than lemma number.
We use the first letter of the name of the characteristic, followed
by the operator character.
<pre><div class="code"
>[a+] a + (b + c) = (a + b) + c [L26] Associativity of addition
[c+] a + b = b + a [L29] Commutativity of addition
[i+] a + 0 = 0 + a = a [A21], [L27.6] Identity for addition
</div></pre>
<a name="subtraction"></a>
<h3>Subtraction</h3>
At this point we have the ability to perform addition, which allows us
to calculate a value for x in such equations as <code>x = a + b</code>.
But we don't yet have the ability to solve for x in the equation
<code>a + x = b</code>.
We want to add an operation that is the opposite of addition.
In other words, if we start with a and add b to it, we want to be
able to take the result and perform another operation using b
in order to get back to a.
An operator that has this characteristic is called an inverse.
We are going to define an operation that is the inverse of addition.
We will call that operation subtraction,
and we will use the dash character (<code>-</code>) as the operator.
<br/><br/>
Before we defined addition, we already had the successor function [A2]
and we defined the numbers [A7] in terms of the successor function.
We defined addition with two axioms [A21] and [A22], then showed that
adding 1 to any number is the same [L23] as applying the successor function.
Including the successor function and the definitions of the numbers in
terms of the successor function, we really had four pieces going into
the definition of addition.
<br/><br/>
We could follow the same path and define a predecessor function that is
the inverse of the successor function, but instead we will skip that step
and work in terms of adding and subtracting 1 instead of successor and
predecessor functions.
<br/><br/>
We define our subtraction operator (<code>-</code>) recursively,
similarly to how we defined the addition operator, using an additional
axiom [A41.1] in place of defining a predecessor function p(x):
<pre><div class="code"
>[A41] a - 0 = a
[A41.1] (a + 1) - 1 = a
[A42] a - (b + 1) = (a - b) - 1
</div></pre>
So let's see how this works:
<pre><div class="code"
>3 - 0 = 3 From [A41]
3 - 1 = (2 + 1) - 1 = 2 From [A41.1], and since 3 is the successor to 2 (i.e. 3=2+1)
3 - 2 = 3 - (1 + 1) = (3 - 1) - 1 = 2 - 1 = (1 + 1) - 1 = 1
</div></pre>
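Continuing the earlier coding sketch of the successor representation (again my own illustration), subtraction can be written by peeling one successor off each operand, which follows from [A41.1] and [A42]. Note what happens when the result would not be a natural number: the recursion has no rule left to apply, previewing the closure problem taken up under Negative Numbers below.

```javascript
// Sketch of natural-number subtraction in the successor
// representation (an illustration, not the text's construction).
const zero = null;
const s = (x) => ({ pred: x });
function toInt(n) { return n === zero ? 0 : 1 + toInt(n.pred); }

// sub peels one successor off each operand until b is exhausted:
// (a + 1) - (b + 1) = a - b, per [A41.1] and [A42].
function sub(a, b) {
    if (b === zero) return a;       // [A41] a - 0 = a
    if (a === zero) {
        // No rule reduces 0 - 1: naturals are not closed under subtraction.
        throw new Error("not a natural number");
    }
    return sub(a.pred, b.pred);
}

const two = s(s(zero));
const three = s(s(s(zero)));
// toInt(sub(three, two)) evaluates to 1; sub(two, three) throws
```
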
<a name="subtraction-associative"></a>
<h4>Associative</h4>
We want to prove the associative laws for subtraction so we know how
we can transform various combinations of parentheses and operators.
We already know about <code>a + (b + c)</code>,
so there are three other possible combinations of + and - with the
parentheses in the same position:
<ul>
<li><code>a - (b + c)</code>
<li><code>a + (b - c)</code>
<li><code>a - (b - c)</code>
</ul>
We start with <code>a - (b + c)</code>.
<pre><div class="code"
>[L43.1] a - (b + n) = (a - b) - n Inductive assumption, true for n=1 from [A42]
a - (b + (n + 1))
= a - ((b + n) + 1) From [a+]
= (a - (b + n)) - 1 From [A42]
= ((a - b) - n) - 1 From [L43.1] on (a-(b+n))
= (a - b) - (n + 1) From [A42] with (a-b) for a and n for b
[L43.2] a - (b + c) = (a - b) - c Above lines summarized, with c for n+1
</div></pre>
Next we do <code>a + (b - c)</code>, which we do by induction
after first doing <code>a + (b - 1)</code>.
<pre><div class="code"
>(a + (n + 1)) - 1
= ((a + n) + 1) - 1 From [a+]
= a + n From [A41.1] with a+n for a
= a + ((n + 1) - 1) From [A41.1] with n for a
[L44] (a + b) - 1 = a + (b - 1) Above lines summarized, with b for n+1
</div></pre>
<pre><div class="code"
>[L45.1] a + b = a + (b - 0) From [A41] with b for a
[L45.2] a + b = (a + b) - 0 From [A41] with (a+b) for a
[L45.3] a + (b - 0) = (a + b) - 0 From [L45.1] and [L45.2] by [A4]
[L45.4] a + (b - n) = (a + b) - n Inductive assumption, true for n=0 by [L45.3]
a + (b - (n + 1))
= a + (b - (1 + n)) From [c+] with n for a and 1 for b
= a + ((b - 1) - n) From [L43.2] on b-(1+n)
= (a + (b - 1)) - n From [L45.4] with b-1 for b
= ((a + b) - 1) - n From [L44]
= (a + b) - (1 + n) From [L43.2] with a+b for a, 1 for b, n for c
= (a + b) - (n + 1) From [c+] with n for a and 1 for b
[L45.5] a + (b - c) = (a + b) - c Above lines summarized, with c for n+1
</div></pre>
Finally we tackle <code>a - (b - c)</code>,
which we build up to through quite a few lemmas.
<pre><div class="code"
>[L46.1] 0 - 0 = 0 [A41] with 0 for a
[L46.2] (0 + 1) - 1 = 0 [A41.1] with 0 for a
[L46.3] 1 - 1 = 0 From [L27.2] on 0+1
[L46.4] n - n = 0 Inductive assumption, true for n=1 from [L46.3]
(n + 1) - (n + 1)
= (n + 1) - (1 + n) From [c+]
= ((n + 1) - 1) - n From [L43.2] with n+1 for a, 1 for b, n for c
= n - n From [A41.1] on (n+1)-1 with n+1 for a
= 0 From [L46.4]
[L46.5] a - a = 0 Above lines summarized, with a for n+1
</div></pre>
<pre><div class="code"
>a - b
= a - (b + 0) From [A21] (a+0=a) with b for a
= a - (b + (n - n)) From [L46.5] (a-a=0) with n for a
= a - ((b + n) - n) From [L45.5] with b for a, n for b and c
= a - ((n + b) - n) From [c+]
= a - (n + (b - n)) From [L45.5]
= (a - n) - (b - n) From [L43.2]
[L47] a - b = (a - n) - (b - n)
</div></pre>
<pre><div class="code"
>Substituting a = (c + n), b = (d + n) in [L47] yields
[L48.1] (c + n) - (d + n) = ((c + n) - n) - ((d + n) - n) = c - d
[L48.2] c - d = (c + n) - (d + n) [L48.1] last and first parts
</div></pre>
<pre><div class="code"
>(a - n) + n
= n + (a - n) From [c+]
= (n + a) - n From [L45.5]
= (a + n) - n From [c+]
= a + (n - n) From [L45.5]
= a + 0 From [L46.5]
= a From [i+]
[L49] (a - n) + n = a Above lines summarized
</div></pre>
<pre><div class="code"
>a - (b - c)
= (a + c) - ((b - c) + c) From [L48.2] with a for c, b-c for d, c for n
= (a + c) - b From [L49] with c for n
= (c + a) - b From [c+] on a+c
= c + (a - b) From [L45.5]
= (a - b) + c From [c+]
[L50] a - (b - c) = (a - b) + c Above lines summarized
</div></pre>
We now have all of our rules of association for addition and subtraction.
The following four equations, repeated from above, show all eight
possible combinations of + and - operators and grouping of three
variables.
<pre><div class="code"
>[L26] a + (b + c) = (a + b) + c
[L43.2] a - (b + c) = (a - b) - c
[L45.5] a + (b - c) = (a + b) - c
[L50] a - (b - c) = (a - b) + c
</div></pre>
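As a quick numeric sanity check, all four rules can be verified on sample integers. This spot check leans on the host language's integer arithmetic, so it illustrates the rules rather than proving them:

```javascript
// Spot-check the four association rules [L26], [L43.2], [L45.5], [L50]
// on a triple of integers. A check on examples, not a proof.
function associationRulesHold(a, b, c) {
    return (
        a + (b + c) === (a + b) + c &&  // [L26]
        a - (b + c) === (a - b) - c &&  // [L43.2]
        a + (b - c) === (a + b) - c &&  // [L45.5]
        a - (b - c) === (a - b) + c     // [L50]
    );
}
// associationRulesHold(5, 3, 2) is true, as for any integer triple
```
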
Earlier we saw that, because of [L26], we can write <code>a + b + c</code>
and know that it is unambiguous.
But that is not true if we write <code>a - b - c</code>, because
the statement <code>(a - b) - c = a - (b - c)</code>
is not in general true.
In order to be able to write fewer parentheses, we arbitrarily choose
to have <code>a - b - c</code> mean the same thing as <code>(a - b) - c</code>.
<pre><div class="code"
>[A51] a - b - c = (a - b) - c
</div></pre>
We have specified that the middle variable (b in our equation),
following the <code>-</code> operator, should be
grouped with the variable on its left,
so we call the <code>-</code> operator left-associative;
but we generally say it is not associative,
meaning it does not associate both ways as does addition.
<br/><br/>
Unlike addition, subtraction is not commutative,
and it has no identity.
More precisely, we could say that zero is a
<a href="http://en.wikipedia.org/wiki/Identity_element">
right identity</a> for subtraction,
but since it is not also a left identity,
it is not a simple identity and we usually don't mention it.
<a name="negative-numbers"></a>
<h3>Negative Numbers</h3>
You may already have noticed that adding the subtraction operator
to our structure has created a bit of a problem:
we are now able to write expressions which we can not evaluate
within our structure.
For example, the expression <code>2 - 4</code> can not be reduced to
a single natural number.
When we reduce this equation according to our rules, we eventually
get to the point where we need to solve for <code>0 - 1</code>,
and we have no rule to reduce that any further.
In other words, our system is no longer a closed system:
to state the problem more precisely,
the natural numbers are not closed under subtraction.
<blockquote>
<div style="background-color: lightyellow; border: medium ridge black;
padding: 0.6em; margin-top: 1em; margin-bottom: 1em;">
A pet peeve of mine: elementary school math teachers who tell their
students "You cannot subtract 5 from 3."
This statement is misleading in its imprecision, since it can be solved with
the use of negative numbers.
Math is a precise field.
The correct statement should include that qualification:
"You cannot subtract 5 from 3 using the counting numbers we are studying."
<br/><br/>
Likewise for other incorrect statements such as
"You can not divide 3 by 2" and
"You can not take the square root of -4."
</div>
</blockquote>
We would like to be able to solve any equation we can write with our
subtraction operator, so we will define new numbers that we can use
for that purpose.
We call these numbers negative numbers.
We choose to write them using the same digits as we write our natural
numbers, with a leading <code>-</code> character, such as -1 and -2.
<br/><br/>
In our house-building analogy, so far we have built a little
house from the foundation upwards,
and now we realize we need some more support in order to finish subtraction.
Adding negative numbers is like adding another room to that house:
in order to have a solid structure, we need to extend our foundation.
To save on design work,
we are going to reuse the same basic plan as we used
when we built up the natural numbers.
This is like using the same blueprint for the second room
of our house as for the first,
except in mirror image because we find symmetry pleasing.
Here is a little diagram:
<pre><div class="code"
>
+-----+ +-----+
/ 3 \ / 3 \
+----+ +----+----+ +----+----+
| 2 | | 2 | | 5 | 2 |
+------+ +----+-+ +----+-+ +-+----+----+-+
| 1 | | 1 | | 1 | | 4 | 1 |
+------+ +------+ +------+ +------+------+
1. Natural 2. Addition 3. Subtraction 4. Negative Numbers
Numbers on Naturals Oops! 5. Addition on Negatives
6. Completion of Subtraction
</div></pre>
Thus we go back to the beginning of our derivation of natural numbers.
To distinguish our original numbers from our newly defined negative
numbers, we will call all of the numbers generated by our successor
function (that would be all numbers 1 and above) the positive numbers.
We will call the collection of all of these numbers
(positive, negative and zero) the integers.
We will call the characteristic of being
"positive" and "negative" the sign of the number.
<br/><br/>
Since we want our rules to apply to all integers, we start by stating
that in any of our previous assumptions and derivations, a variable
name can refer to any integer unless the specific proof or assumption
states otherwise (such as for induction proofs).
<br/><br/>
We started by defining a successor operator s(x) [A2],
and we now define a corresponding predecessor operator p(x)
that generates our negative numbers in a way which is symmetric to s(x):
<pre><div class="code"
>[A61] given x, p(x) generates another number, where p(x) is not the same as x
</div></pre>
We define the predecessor function as the inverse of the successor function
and vice-versa.
In other words:
<pre><div class="code"
>[A62.1] p(s(a)) = a
[A62.2] s(p(a)) = a
</div></pre>
We define our negative numbers in the same way as we defined
our natural (positive) numbers [A7]:
<pre><div class="code"
>[A63.1] -1 = p(0)
[A63.2] -2 = p(-1)
[A63.3] -3 = p(-2)
etc. to negative infinity.
</div></pre>
We take our no-duplicates assumption [A8] on the successor function
and state it for the predecessor function:
<pre><div class="code"
>[A64] For any x, repeated application of the predecessor function
any number of times will never generate x.
</div></pre>
For the relational operators, we can derive their meaning relative to
the predecessor operator:
<pre><div class="code"
> s(a) > a [A9]
p(s(a)) > p(a) Apply p(x) to both sides [L6.6]
a > p(a) From [A62.1]
[L65] p(a) < a               Previous line with < and > exchanged
</div></pre>
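As a quick sanity check, we can model these two operators in Python, using the built-in integers as a stand-in for the structure we are building (so this is an illustration of the rules, not a derivation of them):
<pre><div class="code"
>
```python
def s(x):
    """Successor operator [A2]."""
    return x + 1

def p(x):
    """Predecessor operator [A61]."""
    return x - 1

# Check [A62.1], [A62.2] and [L65] over a range of integers.
for a in range(-5, 6):
    assert p(s(a)) == a   # [A62.1]
    assert s(p(a)) == a   # [A62.2]
    assert p(a) < a       # [L65]
```
</div></pre>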
<a name="negative-addition"></a>
<h4>Addition</h4>
We add to our definition of Addition ([A21] and [A22]) to handle
negative numbers,
and we extend our induction assumption [A24] to negative numbers:
<pre><div class="code"
>[A71] a + p(b) = p(a + b)
[A72] If an equation is true for a known value of n,
and it can be demonstrated to be true for n+(-1) for any n when true for n,
then it is true for all integers x where x < n.
</div></pre>
For each of our original assumptions through addition, we have now
added similar assumptions to handle our negative numbers.
All of our assumptions are completely symmetrical:
take any of the original assumptions, replace successor by predecessor,
replace 1 by -1, and exchange < with >,
and you will get the equivalent assumption for our negative numbers.
Because all of our other proofs in those sections are based on those
assumptions, the symmetric proofs for negative numbers follow from
the symmetric assumptions in exactly the same way as for the natural
numbers.
Thus all of the results and conclusions in those sections
are valid for addition of negative numbers:
commutative, associative, identity, algebra.
<br/><br/>
We list the results of one lemma here,
leaving the details of the derivation as an exercise for the reader:
<pre><div class="code"
>[L73] a + -1 = p(a)
</div></pre>
We derive a couple of other useful results:
<pre><div class="code"
> p(s(a)) = a [A62.1]
p(a + 1) = a [L23.1]
(a + 1) + -1 = a [L73]
a + (1 + -1) = a [a+]
 (1 + -1) = 0                Since 0 is the only x with a + x = a [L31]
[L74] -1 + 1 = 0 [c+]
</div></pre>
<pre><div class="code"
> (1 + -1) = 0 [L74]
n + -n = 0 Inductive assumption, true for n=1 [L74]
(n + -n) + (1 + -1) = 0 From [i+] because (1 + -1) = 0
 (n + 1) + (-n + -1) = 0     From [a+] and [c+]
(n + 1) + (-(n+1)) = 0 From p(x) defn
[L75] a + -a = 0 Above lines summarized, with a for n+1
</div></pre>
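Here is a sketch of this extended addition in Python, defined only in terms of the successor and predecessor operators. It assumes, as the surrounding text suggests, that [A21] and [A22] are <code>a + 0 = a</code> and <code>a + s(b) = s(a + b)</code>:
<pre><div class="code"
>
```python
def s(x):
    """Successor operator [A2]."""
    return x + 1

def p(x):
    """Predecessor operator [A61]."""
    return x - 1

def add(a, b):
    """Addition defined only via s and p."""
    if b == 0:
        return a                # assumed [A21]: a + 0 = a
    if b > 0:
        return s(add(a, p(b)))  # assumed [A22]: a + s(b) = s(a + b)
    return p(add(a, s(b)))      # [A71]: a + p(b) = p(a + b)

# Check [L73] and [L75] over a range of integers.
for a in range(-4, 5):
    assert add(a, -1) == p(a)   # [L73]
    assert add(a, -a) == 0      # [L75]
```
</div></pre>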
The above statement says that, for any element a in our set of natural
numbers, there is an element -a (a negative number, negative a)
which can be added
to that natural number to produce zero (our identity element).
We call negative a the
<a href="http://en.wikipedia.org/wiki/Inverse_element">inverse element</a>
of a, and likewise a is the inverse element of -a.
<pre><div class="code"
 -a + a = 0                  [L75] and [c+]
(-a + a) - a = 0 - a Subtract a from each side
-a + (a - a) = 0 - a [L45.5]
[L76] -a = 0 - a [L46.5] and [i+]
</div></pre>
<pre><div class="code"
> a + -a = 0 [L75]
(a + -a) - -a = 0 - -a Subtract -a from each side
a + (-a - -a) = 0 - -a [L45.5]
a = 0 - -a [L46.5]
[L76.1] a = -(-a) [L76]
</div></pre>
<pre><div class="code"
>a + -b
= a + (0 - b) [L76]
= (a + 0) - b [L45.5]
= a - b [i+]
[L77] a + -b = a - b
</div></pre>
<a name="negative-subtraction"></a>
<h4>Subtraction</h4>
As with addition, we note that we can
create a set of symmetric assumptions using negative numbers in place
of positive numbers,
so that all of our results and conclusions of subtraction on positive
numbers also work on negative numbers.
<br/><br/>
For improved symmetry with the definition of addition,
we restate our assumptions defining subtraction to use the
successor and predecessor functions,
and we add a symmetric assumption that covers negative numbers.
We no longer need <code>(a+1)-1=a</code> [A41.1]
as an assumption for subtraction,
because it is equivalent to <code>p(s(a))=a</code> [A62.1].
Since these assumptions are just a rewriting of our original
assumptions for subtraction, all of our derivations remain the same.
<pre><div class="code"
>[A41] a - 0 = a Repeat of original [A41]
[A81] a - s(b) = p(a - b) [A42] restated in terms of s and p
[A82] a - p(b) = s(a - b) Symmetric assumption to [A81]
</div></pre>
<a name="negative-algebra"></a>
<h4>Algebra</h4>
With the addition of negative numbers to our structure,
our set is closed with respect to subtraction.
We now have a set (the integers)
with an associative binary operator (+) with an identity (0)
and inverse elements (the negative numbers).
This algebraic structure is called a
<a href="http://en.wikipedia.org/wiki/Group_(mathematics)">group</a>.
Because our operator (addition) is commutative,
our algebraic structure is an
<a href="http://en.wikipedia.org/wiki/Abelian_group">abelian group</a>.
The group, however, ignores the subtraction operator.
<a name="multiplication"></a>
<h3>Multiplication</h3>
Once we start using addition for real tasks, we find that we are often
adding the same number many times, such as 3+3+3+3.
Because this is so common, we would like to define a shortcut -
a new operator - that means the same thing.
We call this operation multiplication.
<br/><br/>
There are various conventions for how the multiplication operator
is written: x, * and dot are common, and in some cases a convention
is adopted that two variables written next to each other with no
operator between them are to be multiplied.
Most computer programming languages use the asterisk character (*),
and I will use that here.
<br/><br/>
In order to have as much symmetry as we can, and to minimize our design work,
we will define multiplication using a similar approach as we did when
we defined addition:
<pre><div class="code"
>[A101] a * 0 = 0
[A102] a * (b + 1) = (a * b) + a
[A103] a * (b - 1) = (a * b) - a
</div></pre>
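We can sketch this definition in Python, leaning on the built-in integers only for the comparison with zero and for the already-defined addition and subtraction:
<pre><div class="code"
>
```python
def mul(a, b):
    """Multiplication by repeated addition, per [A101]-[A103]."""
    if b == 0:
        return 0                  # [A101]
    if b > 0:
        return mul(a, b - 1) + a  # [A102] with b-1 for b
    return mul(a, b + 1) - a      # [A103] with b+1 for b

# Check that the definition agrees with the built-in operator,
# including the sign behavior for negative operands.
for a in range(-4, 5):
    for b in range(-4, 5):
        assert mul(a, b) == a * b
```
</div></pre>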
We could equivalently have used a slightly different formulation
for [A103] in which we add -1 rather than subtracting 1,
as supported by [L77]:
<pre><div class="code"
>a * (-1)
= a * (0 - 1) [L76]
= (a * 0) - a [A103]
= 0 - a [A101]
= -a [L76]
[L104.1] a * -1 = -a Above lines summarized
</div></pre>
<pre><div class="code"
>a * (b + -1)
= a * (b - 1) [L77]
= (a * b) - a [A103]
= (a * b) + -a [L77]
= (a * b) + (a * -1) [L104.1]
[L104.2] a * (b + -1) = (a * b) + (a * -1) Above lines summarized
</div></pre>
If the second operand is negative, we can factor that out and we see
that it changes the sign of the result.
<pre><div class="code"
>a * -n = -(a * n) Inductive assumption, true for n=1
a * -(n + 1)
= a * (-n - 1)
= (a * -n) - a
= -(a * n) - a
= 0 - (a * n) - a
= 0 - ((a * n) + a)
= 0 - (a * (n + 1))
= -(a * (n + 1))
[L104.3] a * -b = -(a * b) Above summarized, with b for n+1
[L104.4] -a * b = -(a * b) Swap a with b and use [c*]
</div></pre>
<pre><div class="code"
>-a * -b = -(-a * b) [L104.3]
= -(-(a * b)) [L104.3] again
= a * b [L76.1]
[L104.5] -a * -b = a * b Above lines summarized
</div></pre>
<a name="multiplication-identity"></a>
<h4>Identity and Zero</h4>
By setting b=0 in [A102], we see that 1 is a right-identity for
multiplication:
<pre><div class="code"
> a * (0 + 1) = (a * 0) + a From [A102] with 0 for b
a * 1 = 0 + a From [i+] on LHS, [A101] on RHS
[L105] a * 1 = a
</div></pre>
We show by induction that zero multiplied on either side gives zero:
<pre><div class="code"
>[L106.1] 0 * 0 = 0 From [A101] with 0 for a
[L106.2] 0 * n = 0 Inductive assumption, true for n=0
[L106.3] 0 * (n + 1) = (0 * n) + 0 From [A102] with 0 for a, n for b
[L106.4] 0 * (n + 1) = 0 + 0 From [L106.2]
[L106.5] 0 * (n + 1) = 0
[L106.6] 0 * a = 0 Above summarized with a for n+1
</div></pre>
By doing the same proof using [A103] we can conclude that [L106.6]
holds for all integers.
<br/><br/>
We show that 1 is a left identity:
<pre><div class="code"
>1 * 1 = 1 From [L105] with a=1
1 * n = n Inductive assumption, true for n=1
1 * (n + 1)
= (1 * n) + 1 From [A102] with a=1 and b=n
= n + 1 From Inductive assumption
[L106.8] 1 * a = a Above summarized, with a for n+1
</div></pre>
Since 1 is both a left identity and a right identity,
we can drop the handedness and just refer to it as an identity.
<br/><br/>
With addition we had one special number, 0, which when added to any
number yielded that number.
With multiplication we see that we have two special numbers:
the number 1 is an identity for multiplication,
but 0 is also special, since anything multiplied by 0 yields 0.
We choose to use the word "zero", when associated with a specific
operation such as multiplication, to mean a value that, when given
as an operand to that operator, always yields zero.
Our multiplication operator has only one zero, but other systems
and operators may have more than one zero.
<br/><br/>
By the same argument [L31] as for the additive identity, we can
see that there is only one multiplicative identity and only one
multiplicative zero.
<a name="multiplication-distributive"></a>
<h4>Distributive</h4>
We show that multiplication is distributive over addition by induction:
<pre><div class="code"
>[L107.1] a * (b + 0) = a * b = (a * b) + 0 = (a * b) + (a * 0)
a * (b + 1) = (a * b) + a [A102]
a * (b + 1) = (a * b) + (a * 1) From [L105] on rightmost a
[L107.2] a * (b + n) = (a * b) + (a * n) Inductive assumption, true for n=1
a * (b + (n + 1))
= a * ((b + n) + 1) From [a+]
= (a * (b + n)) + a From [A102]
= ((a * b) + (a * n)) + a From [L107.2]
= (a * b) + ((a * n) + a) From [a+]
= (a * b) + (a * (n + 1)) From [A102]
[L107.3] a * (b + c) = (a * b) + (a * c) Above summarized, with c for n+1
</div></pre>
The above proof can be repeated using -1 instead of 1 (by [L104.2]),
so [L107.3] covers all integers.
<br/><br/>
Using the same proof steps using [A103] rather than [A102]
demonstrates that multiplication distributes over subtraction as well.
Since by [L77] subtraction is the equivalent of adding the negative
of a number, this is consistent.
<pre><div class="code"
>[L107.4] a * (b - c) = (a * b) - (a * c)
</div></pre>
<pre><div class="code"
> 2 * 1 = 2 = 1 + 1
2 * n = n + n Inductive assumption, true for n=1
2 * (n + 1)
= 2 * n + 2
= (n + n) + (1 + 1)
= (n + 1) + (n + 1)
2 * a = a + a
1 * b = b
(0 * 1) * b = (0 * b) + b
(n + 1) * b = (n * b) + b Inductive assumption, true for n=0
(n + 2) * b = (n * b) + b + b Inductive assumption, true for n=0
((n + 1) + 1) * b
= (n + 2) * b
= (n * b) + b + b
= ((n + 1) * b) + b
(a + 1) * b = (a * b) + b
</div></pre>
<a name="multiplication-associative"></a>
<h4>Associative</h4>
We show multiplication is associative by induction:
<pre><div class="code"
>[L108.1] (a * b) * 0 = 0 = a * 0 = a * (b * 0)
[L108.2] (a * b) * 1 = a * b = a * (b * 1) From [L105] on each side
[L108.3] (a * b) * n = a * (b * n)  Inductive assumption, true for n=1
 (a * b) * (n + 1)
 = ((a * b) * n) + (a * b)          From [A102]
= (a * (b * n)) + (a * b) From [L108.3]
= a * ((b * n) + b) From [L107.3] with b*n for b, b for c
= a * (b * (n + 1)) From [A102] with b for a, n for b
[L108.4] (a * b) * c = a * (b * c) Above lines summarized, with c for n+1
</div></pre>
As with the distributive law, we can replace 1 by -1 to show that
our conclusion covers negative numbers as well.
<a name="multiplication-commutative"></a>
<h4>Commutative</h4>
<pre><div class="code"
>m * n = n * m Inductive assumption, true for m=0 or 1 and n=0 or 1
(m + 1) * (n + 1)
= (m + 1) * n + (n + 1) From [A102]
= (m * n) + m + (n + 1) From [(a+1)*b = a*b+b]
= (n * m) + n + (m + 1) From Inductive assumption and [a+]
= (n + 1) * m + (m + 1) From [same as two lines up]
= (n + 1) * (m + 1) From [A102]
[L109] a * b= b * a
</div></pre>
As with addition, the fact that multiplication is associative [L108.4]
means that, if we have an expression that is a string of values multiplied
together, we can drop the parentheses from the expression without
creating any ambiguity; and the fact that it is commutative means that
we can rearrange all of those multiplied values to any order we want.
<a name="multiplication-algebra"></a>
<h4>Algebra</h4>
We have added a second operator to our repertoire
that, like addition, is an associative binary operator with an identity.
With two such operators, where one distributes over the other, we have a
<a href="http://en.wikipedia.org/wiki/Ring_(mathematics)">ring</a>
(for a more precise definition, follow the link).
In the same way that the group ignores subtraction,
the ring ignores the division operator.
As with addition,
there are a few rules from the above section that we will use often enough
that we want to reference them by name rather than lemma number.
<pre><div class="code"
>[a*] a * (b * c) = (a * b) * c [L108.4] Associativity of multiplication
[c*] a * b = b * a [L109] Commutativity of multiplication
[z*] a * 0 = 0 * a = 0 [L106.6] Zero for multiplication
[i*] a * 1 = 1 * a = a [L106.8] Identity for multiplication
[d*] a * (b + c) = (a * b) + (a * c) [L107.3] Distributivity of multiplication over addition
</div></pre>
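Python's integers obey these named rules, so we can spot-check them over a small range of values:
<pre><div class="code"
>
```python
# Spot-check the named multiplication rules on small integers.
vals = range(-3, 4)
for a in vals:
    assert a * 0 == 0 * a == 0                       # [z*]
    assert a * 1 == 1 * a == a                       # [i*]
    for b in vals:
        assert a * b == b * a                        # [c*]
        for c in vals:
            assert a * (b * c) == (a * b) * c        # [a*]
            assert a * (b + c) == (a * b) + (a * c)  # [d*]
```
</div></pre>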
<a name="division"></a>
<h3>Division</h3>
As when we defined subtraction to be the inverse operation of addition,
we want an inverse operation to multiplication so that we can solve for <code>x</code>
in equations such as <code>a * x = b</code>.
<br/><br/>
We call our inverse operation division.
As with multiplication, there are a number of common ways this
operation is expressed.
For use in this presentation, we choose to use the slash character (/)
to represent the division operation.
We want division and multiplication each to be the inverse of the other,
as is the case with addition and subtraction,
so we have two candidate definitions:
<pre><div class="code"
>[A120.1] (a * b) / b = a for all a and b except b=0
[A120.2] (a / b) * b = a for all a and b except b=0
</div></pre>
Our definitions exclude zero because we already have a rule that says
anything times zero is zero, so we know <i>a priori</i> that we can't make
these new rules work for all a when b is zero.
<br/><br/>
The fact that we can't divide by zero is the first time we have
encountered a special case in our structure, where we have to add a
qualification to one of our rules stating that you can't do something
rather than extending our structure to make it possible to do that.
When, in building our structure of numbers, we realized that we could
not answer the question "what is 3 - 5?", we expanded the structure to
allow us to answer that question ("negative 2"). In this case, we can't
answer the question "what is 5 / 0?", but, for the first time,
instead of trying to expand
our structure to be able to answer that question, we make the statement
"you can't do that".
As we will see later, the further we go in defining our structure,
the more such exceptions and caveats we need to make.
<br/><br/>
We check that the two assumptions above are compatible by starting with one
and converting it into the other.
<pre><div class="code"
>(a * b) / b = a [A120.1]
((a * b) / b) * b = a * b Right-multiply both sides by b
(c / b) * b = c Previous line with c for a*b; this is [A120.2]
</div></pre>
We can quickly get some useful lemmas by plugging in a few different
values for a and b:
<pre><div class="code"
>[L121] a / 1 = a From [A120.1 or 2] with b=1, after a*1=a
[L122] b / b = 1 From [A120.1] with a=1, after b*1=b
[L123] (1/b)*b = 1 From [A120.2] with a=1
[L124] 0 / b = 0 From [A120.1] with a=0, after 0*b=0
[L124.2] a / a = 1 From [A120.1] with a=1 and b=a
</div></pre>
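We can spot-check these lemmas in Python. Rational numbers have not yet been introduced at this point in the text, so we borrow Python's Fraction type to stand in for exact division:
<pre><div class="code"
>
```python
from fractions import Fraction as F

# b ranges over nonzero values only, since division by zero
# is excluded by [A120.1] and [A120.2].
for a in [F(-3), F(0), F(2), F(7)]:
    for b in [F(-2), F(1), F(5)]:
        assert (a * b) / b == a   # [A120.1]
        assert (a / b) * b == a   # [A120.2]
        assert b / b == 1         # [L122], [L124.2]
        assert (1 / b) * b == 1   # [L123]
        assert F(0) / b == 0      # [L124]
    assert a / 1 == a             # [L121]
```
</div></pre>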
If we are looking at the equation
<pre><div class="code"
>[I125] a = c / b
</div></pre>
what does that mean?
If we assume
<pre><div class="code"
>[A126] c = a * b
</div></pre>
then [I125] becomes
<pre><div class="code"
>[I127] a = (a * b) / b
</div></pre>
which is [A120.1]. This is true by definition, so our assumption [A126]
is a valid assumption to use in solving [I125].
What we are saying here is that the solution (a) to [I125] is the value
that, when multiplied by b, gives c.
<pre><div class="code"
>[L128] If a = c / b, then c = a * b, and vice-versa (from [I125] and [A126])
</div></pre>
<a name="division-associative"></a>
<h4>Associative</h4>
As we did with subtraction,
we want to prove the associative laws for division so we know how
we can transform various combinations of parentheses and the
multiplication and division operations.
We already know about <code>a * (b * c)</code>,
so there are three other possible combinations of * and / with the
parentheses in the same position:
<ul>
<li><code>a / (b * c)</code>
<li><code>a * (b / c)</code>
<li><code>a / (b / c)</code>
</ul>
<pre><div class="code"
>[I129.1] a / (b * c) = d Given
a = d * (b * c) From [L128]
a = (d * c) * b From [a*] and [c*]
a / b = d * c From [L128]
[I129.2] (a / b) / c = d From [L128]
[L129.3] a / (b * c) = (a / b) / c From [I129.1] and [I129.2]
</div></pre>
<pre><div class="code"
>[I130.1] a * (b / c) = d Given
a * (b / c) * c = d * c Multiply both sides by c
 a * b = d * c               Reduce (b / c) * c = b by [A120.2]
[I130.2] (a * b) / c = d From [L128]
[L130.3] a * (b / c) = (a * b) / c  From [I130.1] and [I130.2]
</div></pre>
<pre><div class="code"
>[I131.1] a / (b / c) = d Given
a = d * (b / c) From [L128]
= (d * b) / c From [L130.3]
a * c = d * b From [L128]
c * a = d * b From [c*]
(c * a) / b = d From [L128]
c * (a / b) = d From [L130.3]
[I131.2] (a / b) * c = d From [c*]
[L131.3] a / (b / c) = (a / b) * c From [I131.1] and [I131.2]
</div></pre>
We now have all of our rules of association for multiplication and division.
The following four equations, repeated from above, show all eight
possible combinations of * and / operators and grouping of three
variables.
Note that this table is identical to the table of rules of association
for addition and subtraction, with * instead of + and / instead of -.
<pre><div class="code"
>[a*] a * (b * c) = (a * b) * c
[L129.3] a / (b * c) = (a / b) / c
[L130.3] a * (b / c) = (a * b) / c
[L131.3] a / (b / c) = (a / b) * c
</div></pre>
We derive a few more useful lemmas.
<pre><div class="code"
> a / b
= (a * 1) / b From [i*]
 = a * (1 / b)               From [L130.3]
[L132] a / b = a * (1 / b) Summary of the above lines
</div></pre>
<pre><div class="code"
> 1 / (a / b)
= (1 / a) * b From [L131.3]
= b * (1 / a) From [c*]
= b / a From [L132]
[L133] 1 / (a / b) = b / a Summary of the above lines
</div></pre>
<pre><div class="code"
> (a / b) * (c / d)
 = ((a / b) * c) / d         From [L130.3]
 = (c * (a / b)) / d         From [c*]
 = ((c * a) / b) / d         From [L130.3]
= (c * a) / (b * d) From [L129.3]
= (a * c) / (b * d) From [c*]
[L134] (a / b) * (c / d) = (a * c) / (b * d) Summary of the above lines
</div></pre>
<pre><div class="code"
> (a / b) / (c / d)
= ((a / b) * 1) / (c / d) From [i*]
= (a / b) * (1 / (c / d)) From [L130.3]
= (a / b) * (d / c) From [L133]
= (a * d) / (b * c) From [L134]
[L135] (a / b) / (c / d) = (a * d) / (b * c) Summary of the above lines
</div></pre>
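A quick Python spot-check of the last four lemmas, again using the Fraction type for exact arithmetic:
<pre><div class="code"
>
```python
from fractions import Fraction as F

# Arbitrary nonzero sample values.
a, b, c, d = F(2), F(3), F(5), F(7)

assert a / b == a * (1 / b)                     # [L132]
assert 1 / (a / b) == b / a                     # [L133]
assert (a / b) * (c / d) == (a * c) / (b * d)   # [L134]
assert (a / b) / (c / d) == (a * d) / (b * c)   # [L135]
```
</div></pre>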
<a name="rational-numbers"></a>
<h3>Rational Numbers</h3>
You may have noticed in the above section about the division operation
that we discussed things like <code>1 / a</code> without commenting on the
fact that our number system, which up to now includes only integers,
does not in general include the numbers that can represent that.
The proper sequence would have been to introduce rational numbers first,
but I wanted to finish the discussion about the properties of the division
operation before discussing rational numbers.
With that out of the way, let's turn to rational numbers.
<br/><br/>
We can easily build a table for specific values of a, b and c for equation
[I125] by taking all pairs of integer values for a and b, generating c as their
product, and defining the value of c/b to be a for all of those triplets.
For example, 2*3=6, therefore 6/3=2.
<br/><br/>
Our division table does not include all possible
combinations of <code>c/b</code>,
so there are some division equations for which the
answer can not be found in our tables.
For example, 3/2 does not appear in our table because,
in our system of numbers up to this point, which is all integers,
there is no number that, when multiplied by 2, yields 3.
<br/><br/>
In order for our numbers to be closed under division, we have to add some
new numbers, which are the numbers needed to solve the equation
<code>c/b</code> when
there is no integer number <code>a</code> such that <code>a*b=c</code>.
We call these numbers rational numbers,
because they are the ratio of two integers,
and we choose to represent them
as a fraction using the division operator.
In other words, when we ask what is the answer to the equation <code>c/b</code>,
we are simply defining the answer to be <code>c/b</code> and stating that
that value is a number.
We will then examine how to manipulate these numbers.
<br/><br/>
We have defined rational numbers as numbers of the form <code>c/b</code>.
We also know from our table-based enumeration of division equations that,
for any number c which can be written as <code>a*b</code>, the value of
the division equation <code>c/b</code> is a.
We define the value of our rational number that we write as <code>c/b</code>
to be consistent with the known solutions of our division equations written
the same way.
Thus the value of the rational number 6/3 is defined to be 2, etc.
<a name="rational-algebra"></a>
<h4>Algebra</h4>
With division as the inverse of multiplication, the multiplicative identity 1,
and rational numbers, our ring is now a
<a href="http://en.wikipedia.org/wiki/Field_(mathematics)">field</a>.
<br/><br/>
This is as far as we will go with algebra. When we continue with
exponentiation to derive real numbers and then complex numbers,
those structures are still fields.
<a name="operator-precedence"></a>
<h4>Operator Precedence</h4>
Up to now, we have been using parentheses to ensure that the order of
application of operators in an expression is unambiguous. We noted
earlier that we don't need those parentheses in an expression that
consists solely of a number of values added together, and likewise that we
don't need parentheses in an expression that consists solely of a number
of values multiplied together. This is nice because it reduces the amount
of writing we need to do.
<br/><br/>
We can further reduce the need for parentheses by defining a rule that
tells us which operations to evaluate first when there are no parentheses
to guide us. When we start with an operation and then define a second
operation as the repeated application of the first operation,
we can think of that second operation as being more powerful than the
first operation. We then give priority to the more powerful operator,
defining our rule of precedence to be that, in an expression in which
the order of evaluation would otherwise be ambiguous, we will evaluate
the more powerful operators first.
<br/><br/>
We define addition (+) and subtraction (-) to be at the first level,
and multiplication (*) and division (/) to be at the second level and
higher power than the first level.
Thus, for example, the expression <code>a + b * c</code> will be
equal to <code>a + (b * c)</code>, and the expression
<code>a / b - c</code> will be equal to <code>(a / b) - c</code>.
<br/><br/>
In cases where there are multiple operators of the same power,
we define the order of evaluation to be left to right.
Thus, for example, the expression <code>a / b * c</code> will be
equal to <code>(a / b) * c</code>, and the expression
<code>a - b + c</code> will be equal to <code>(a - b) + c</code>.
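Python happens to use these same precedence and left-to-right rules, so we can illustrate them directly:
<pre><div class="code"
>
```python
# Python evaluates * and / before + and -, and operators at the
# same level left to right, matching the rules described above.

assert 1 + 2 * 3 == 1 + (2 * 3) == 7    # * before +
assert 8 / 4 - 1 == (8 / 4) - 1 == 1    # / before -
assert 8 / 4 * 2 == (8 / 4) * 2 == 4    # same level: left to right
assert 5 - 2 + 1 == (5 - 2) + 1 == 4    # same level: left to right
```
</div></pre>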
<a name="exponentiation"></a>
<h3>Exponentiation</h3>
Up to this point the structure we have built is pretty clean. With
rational numbers and our four operators (+, -, *, /), we have a system
that is closed and mostly complete and consistent, with the only exception
being that we can't divide by zero. Other than that one exception,
operations are well-defined, we have a nice set of rules including our
commutative, associative, and distributive rules, and we have a host of
identities and lemmas we can apply to our rational numbers.
<br/><br/>
Once we add exponentiation, things get a lot messier: we will have
expressions that have multiple values, bigger swaths of undefined
operations, and many places where our lemmas and rules of manipulation
no longer apply. It might seem like it's hardly worth trading our nice
clean rational numbers for this mess. But despite all of the rough
edges, there are enough useful things you can do with real and complex
numbers that it is worth carefully defining where those rough edges are
and avoiding them. So, let's forge ahead.
<br/><br/>
As with addition, once we start using multiplication for real problems,
we often find we want to multiply the same number together many times,
such as <code>3*3*3*3</code>.
As we did when defining multiplication, we define a new operator that
means the same as repeated multiplication.
We call this new operation exponentiation.
In programming languages this is sometimes written using the caret (^)
as an operator, but since this is HTML we have the luxury of using the
standard notation, which is to write the exponent as a superscript.
For example the expression <code>3<sup>4</sup></code>
means the product of four factors of 3,
or <code>3 * 3 * 3 * 3</code>.
We call the number on the left the base, and the superscript number
the exponent.
The operation of exponentiation is also referred to as taking a base to a power,
where the power is the exponent.
<br/><br/>
In line with our precedence rules by which we evaluate higher-power
operations first, we will evaluate exponentiation before multiplication,
division, addition, and subtraction, when there are no parentheses to
otherwise indicate the order of evaluation.
<br/><br/>
From [a*] we know we can group repeated multiplication any way we want,
so for example <code>3 * 3 * 3 * 3 = (3 * 3 * 3) * 3 = (3 * 3) * (3 * 3)</code>.
Using our new superscript notation, we can write this as
<code>3<sup>4</sup> = (3<sup>3</sup>) * (3<sup>1</sup>) = (3<sup>2</sup>) * (3<sup>2</sup>)</code>.
More generally, we can see these things
from our definition of exponentiation and [a*]:
<pre><div class="code"
>[L201.1] a<sup>(b + c)</sup> = a<sup>b</sup> * a<sup>c</sup>
[L201.2] a<sup>1</sup> = a
[L201.3] (a<sup>b</sup>)<sup>c</sup> = a<sup>(b * c)</sup>
[L201.4] (a<sup>b</sup>)<sup>c</sup> = a<sup>b*c</sup> = a<sup>c*b</sup> = (a<sup>c</sup>)<sup>b</sup> From [L201.3] and [c*]
</div></pre>
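A spot-check of these lemmas on small positive integers in Python:
<pre><div class="code"
>
```python
# Verify [L201.1]-[L201.4] for a few small bases and exponents.
for a in [2, 3, 5]:
    assert a**1 == a                          # [L201.2]
    for b in range(1, 4):
        for c in range(1, 4):
            assert a**(b + c) == a**b * a**c  # [L201.1]
            assert (a**b)**c == a**(b * c)    # [L201.3]
            assert (a**b)**c == (a**c)**b     # [L201.4]
```
</div></pre>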
We can figure out how to deal with <code>(a * b)<sup>n</sup></code>
by starting with n=2:
<pre><div class="code"
> (a * b)<sup>2</sup>
= (a * b) * (a * b)
= a * b * a * b From [a*]
= a * a * b * b From [c*]
= a<sup>2</sup> * b<sup>2</sup>
[L201.5] (a * b)<sup>2</sup> = a<sup>2</sup> * b<sup>2</sup> Summary of above lines
</div></pre>
Then we use induction for the general case:
<pre><div class="code"
>Assume (a * b)<sup>n</sup> = a<sup>n</sup> * b<sup>n</sup> for some n
(a * b)<sup>(n + 1)</sup>
= (a * b)<sup>n</sup> * (a * b)<sup>1</sup> From [L201.1]
= (a<sup>n</sup> * b<sup>n</sup>) * (a * b) From [L201.5]
= a<sup>n</sup> * a * b<sup>n</sup> * b From [a*] and [c*]
= a<sup>(n + 1)</sup> * b<sup>(n + 1)</sup>
True when n=2 from [L201.5], so by induction true for all positive n
[L201.6] (a * b)<sup>n</sup> = a<sup>n</sup> * b<sup>n</sup>
</div></pre>
Unlike addition and multiplication, we can quickly see from
counterexamples that exponentiation is neither commutative:
<pre><div class="code"
>2<sup>3</sup> = 2 * 2 * 2 = 8
3<sup>2</sup> = 3 * 3 = 9
8 != 9, so 2<sup>3</sup> != 3<sup>2</sup>
</div></pre>
nor associative:
<pre><div class="code"
>2<sup>(3<sup>2</sup>)</sup> = 2<sup>(3 * 3)</sup> = 2<sup>9</sup> = 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 = 512
(2<sup>3</sup>)<sup>2</sup> = (2 * 2 * 2)<sup>2</sup> = 8<sup>2</sup> = 8 * 8 = 64
512 != 64, so 2<sup>(3<sup>2</sup>)</sup> != (2<sup>3</sup>)<sup>2</sup>
</div></pre>
These initial lemmas are based on our intuitive definition of exponentiation
as repeated multiplication, which provides obvious answers only in the case
where the exponent is a counting number (strictly positive integer).
Let's extend our definition to cover other numbers in our algebra.
<pre><div class="code"
>[A202.1] d = b + c Starting assumption
[I202.2] b = d - c
a<sup>d</sup> = a<sup>b</sup> * a<sup>c</sup> From [A202.1] and [L201.1]
a<sup>d</sup> / a<sup>c</sup> = a<sup>b</sup> * a<sup>c</sup> / a<sup>c</sup> Assuming a<sup>c</sup>!=0
[I202.3] a<sup>b</sup> = a<sup>d</sup> / a<sup>c</sup>
[L202.4] a<sup>(d - c)</sup> = a<sup>d</sup> / a<sup>c</sup> Substitute b from [I202.2]
</div></pre>
We can't divide by zero, so the above is not valid when <code>a<sup>c</sup></code>
is zero. When is that expression zero? From the definition of
exponentiation, this expression represents repeated multiplication of
<code>a</code>. What number when multiplied by itself is zero?
There is only one such number: zero. So [L202.4] is not valid when
<code>a = 0</code>, but it is valid for any other base.
<br/><br/>
Let's look at two special cases of [L202.4].
<pre><div class="code"
> a<sup>0</sup> = a<sup>(1 - 1)</sup> From [L46.5], a!=0
= a<sup>1</sup> / a<sup>1</sup> From [L202.4]
= a / a From [L201.2]
= 1 From [L124.2]
[L203] a<sup>0</sup> = 1 Above lines summarized, a!=0
</div></pre>
<pre><div class="code"
> a<sup>-b</sup> = a<sup>(0 - b)</sup> From [L76], a!=0
= a<sup>0</sup> / a<sup>b</sup> From [L202.4], b!=0
= 1 / a<sup>b</sup> From [L203]
[L204] a<sup>-b</sup> = 1 / a<sup>b</sup> Above lines summarized, a!=0, b!=0
[L204.1] a<sup>-1</sup> = 1 / a From [L204] with b = 1, and [L201.2]
</div></pre>
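We can check the extended definitions exactly in Python, since the Fraction type supports negative integer exponents for a nonzero base:
<pre><div class="code"
>
```python
from fractions import Fraction as F

# Check [L203], [L204] and [L204.1] for nonzero bases.
for a in [F(2), F(-3), F(5, 7)]:
    assert a**0 == 1              # [L203]
    assert a**-1 == 1 / a         # [L204.1]
    for b in range(1, 4):
        assert a**-b == 1 / a**b  # [L204]
```
</div></pre>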
The above extends our exponentiation operator to all integer exponents
and all bases other than zero. What about rational exponents?
<br/><br/>
Remember that our goal is to define a set of consistent and useful
operations. To that end, we want to ask ourselves how we can define
exponentiation using a rational exponent such that it is consistent with
the rest of our algebra.
Rational numbers are equivalent to division using integers, which is
the inverse of multiplication. Our exponentiation rule [L201.3]
includes multiplication, from which we can derive a rule for division.
<pre><div class="code"
> a = a<sup>1</sup> [L201.2]
= a<sup>(b / b)</sup> From [L124.2], b!=0
= a<sup>(b * 1/b)</sup> From [L132]
= a<sup>(1/b * b)</sup> From [c*]
= (a<sup>1/b</sup>)<sup>b</sup> From [L201.3]
[L205] (a<sup>1/b</sup>)<sup>b</sup> = a Summary of the above lines
</div></pre>
What the above says is that the value of <code>a<sup>1/b</sup></code> is the
number that, when raised to the power b, is equal to a.
For example, the number <code>a<sup>1/2</sup></code> is the number that,
when raised to the power 2, is equal to a.
We call <code>a<sup>1/b</sup></code> the b-th root of a.
The case where b is 2 or 3 is common enough that we define special names:
we call <code>a<sup>2</sup></code> a squared and
<code>a<sup>1/2</sup></code> the square root of a;
we call <code>a<sup>3</sup></code> a cubed and
<code>a<sup>1/3</sup></code> the cube root of a.
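In Python we can check [L205] numerically. Floating point makes roots approximate, so we compare with math.isclose rather than exact equality:
<pre><div class="code"
>
```python
import math

# (a**(1/b))**b should recover a, up to rounding error [L205].
for a in [2.0, 9.0, 10.0]:
    for b in [2, 3, 4]:
        root = a ** (1.0 / b)              # the b-th root of a
        assert math.isclose(root ** b, a)  # [L205]

assert math.isclose(9.0 ** 0.5, 3.0)           # square root of 9
assert math.isclose(8.0 ** (1.0 / 3.0), 2.0)   # cube root of 8
```
</div></pre>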
<br/><br/>
Previously when we added a new operation to represent repeated application
of an earlier operation (addition as repeated counting and
multiplication as repeated addition), we did not encounter closure
problems until we added an inverse operation to the newly added
operation (subtraction, division).
As we will see below, this is not the case for exponentiation:
here we will run into closure problems even without an inverse
operation. But to keep the flow the same as with the other operators,
I will discuss the inverse operation before getting back to closure.
<a name="logarithms"></a>
<h3>Logarithms</h3>
As when we defined division to be the inverse operation of multiplication,
we want an inverse operation to exponentiation so that we can solve for <code>x</code>
in equations such as <code>a<sup>x</sup> = b</code>.
<br/><br/>
We call our inverse operation logarithm.
<blockquote>
<div style="background-color: lightyellow; border: medium ridge black;
padding: 0.6em; margin-top: 1em; margin-bottom: 1em;">
There is a curious hole in math terminology about logarithms.
Our other operations all have names: we talk about performing
addition, multiplication, or exponentiation.
We do addition by adding two addends to get a sum.
But we don't "do logarithm": we "take a logarithm".
The word <i>logarithm</i> refers to one of the elements in
that operation, similar to how the word <i>exponent</i>
refers to one of the elements in the operation of exponentiation.
There seems to be no single word for logarithms that corresponds
to the operation names such as addition, multiplication, and exponentiation.
Talking about logarithms is like talking about sums rather than addition.
</div>
</blockquote>
<pre><div class="code"
>[A221.1] log<sub>a</sub>(a<sup>b</sup>) = b for all b, and all a except a=0 or a=1
[A221.2] a<sup>log<sub>a</sub>b</sup> = b for all b>0, and all a except a=0 or a=1
</div></pre>
We can derive a few lemmas for log.
<pre><div class="code"
>[L222.1] log<sub>a</sub>(a) = log<sub>a</sub>(a<sup>1</sup>) = 1 [L201.2] and [A221.1] with b=1
[L222.2] log<sub>a</sub>(1) = log<sub>a</sub>(a<sup>0</sup>) = 0 [L203] and [A221.1] with b=0
[L222.3] log<sub>a</sub>(1/a) = log<sub>a</sub>(a<sup>-1</sup>) = -1 [L204.1] and [A221.1] with b=-1
</div></pre>
<pre><div class="code"
>[I223.1] log<sub>a</sub>(a<sup>c</sup>) = c [A221.1] using c instead of b
[I223.2] log<sub>a</sub>(a<sup>d</sup>) = d [A221.1] using d instead of b
[I223.3] log<sub>a</sub>(a<sup>c</sup>) + log<sub>a</sub>(a<sup>d</sup>) = c + d Add left sides and right sides of [I223.1] and [I223.2]
[I223.4] log<sub>a</sub>(a<sup>c+d</sup>) = c+d [A221.1] using c+d instead of b
[L223.5] log<sub>a</sub>(a<sup>c+d</sup>) = log<sub>a</sub>(a<sup>c</sup>) + log<sub>a</sub>(a<sup>d</sup>) Transitive equals on [I223.3] and [I223.4]
</div></pre>
<pre><div class="code"
>[I224.1] log<sub>a</sub>(a<sup>c</sup>) - log<sub>a</sub>(a<sup>d</sup>) = c - d Subtract left sides and right sides of [I223.1] and [I223.2]
[I224.2] log<sub>a</sub>(a<sup>c-d</sup>) = c-d [A221.1] using c-d instead of b
[L224.3] log<sub>a</sub>(a<sup>c-d</sup>) = log<sub>a</sub>(a<sup>c</sup>) - log<sub>a</sub>(a<sup>d</sup>) Transitive equals on [I224.1] and [I224.2]
</div></pre>
<pre><div class="code"
>[I225.1] log<sub>a</sub>(a<sup>c+d</sup>) = log<sub>a</sub>(a<sup>c</sup>*a<sup>d</sup>) [L201.1]
[I225.2] log<sub>a</sub>(a<sup>c+d</sup>) = log<sub>a</sub>(a<sup>c</sup>) + log<sub>a</sub>(a<sup>d</sup>) [L223.5]
[I225.3] log<sub>a</sub>(a<sup>c</sup>*a<sup>d</sup>) = log<sub>a</sub>(a<sup>c</sup>) + log<sub>a</sub>(a<sup>d</sup>) Transitive equals on [I225.1] and [I225.2]
[L225.4] log<sub>a</sub>(x*y) = log<sub>a</sub>(x) + log<sub>a</sub>(y) Substitute x for a<sup>c</sup> and y for a<sup>d</sup>
</div></pre>
<pre><div class="code"
>[I226.1] log<sub>a</sub>(a<sup>c-d</sup>) = log<sub>a</sub>(a<sup>c</sup>/a<sup>d</sup>) [L202.4], a<sup>d</sup>!=0
[I226.2] log<sub>a</sub>(a<sup>c-d</sup>) = log<sub>a</sub>(a<sup>c</sup>) - log<sub>a</sub>(a<sup>d</sup>) [L224.3]
[I226.3] log<sub>a</sub>(a<sup>c</sup>/a<sup>d</sup>) = log<sub>a</sub>(a<sup>c</sup>) - log<sub>a</sub>(a<sup>d</sup>) Transitive equals on [I226.1] and [I226.2]
[L226.4] log<sub>a</sub>(x/y) = log<sub>a</sub>(x) - log<sub>a</sub>(y) Substitute x for a<sup>c</sup> and y for a<sup>d</sup>, y!=0
</div></pre>
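These rules can be checked numerically. A small Python sketch (illustrative only; <code>math.log(x, a)</code> computes the base-a logarithm):

```python
import math

a, x, y = 10.0, 50.0, 4.0

# [A221.1]: log_a(a^b) = b
assert abs(math.log(a ** 3, a) - 3) < 1e-9
# [L225.4]: the log of a product is the sum of the logs
assert abs(math.log(x * y, a) - (math.log(x, a) + math.log(y, a))) < 1e-9
# [L226.4]: the log of a quotient is the difference of the logs
assert abs(math.log(x / y, a) - (math.log(x, a) - math.log(y, a))) < 1e-9
```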
<a name="principal-values"></a>
<h3>Principal Values</h3>
Previously, we noted that, when we added division to our
algebraic structure, we had to add a small complication in that
we can't divide by zero.
When we add square root (or, more generally, exponentiation with any non-integer exponent),
we run into another kind of special case
where we have to take additional care: multivalued functions.
We note that every number has two square roots:
for example, the square root of 4 is 2 or -2, because either of
those numbers, when multiplied by itself, is equal to 4.
With multivalued functions like square root,
we can run into trouble
if we are not careful about choosing which value to use.
Here's an example of this problem:
<pre><div class="code"
>(4<sup>1/2</sup>)<sup>2</sup> = 4
4<sup>1/2</sup> * 4<sup>1/2</sup> = 4
2 * 4<sup>1/2</sup> = 4 Substitute 2 as the first square root
2 * -2 = 4 Substitute -2 as the second square root
-4 = 4 Wrong!
</div></pre>
The bad substitution in the above sequence may be easy to spot and understand,
but as we go further into building our algebra, problems of this
nature become subtler and harder to recognize.
<br/><br/>
We can reduce the probability of running into this kind of problem by
carefully selecting which of these multiple values to use. When we
have one preferred value for a multivalued function, we call that the
<a href="https://en.wikipedia.org/wiki/Principal_value">principal value</a>
of the function.
For example, the principal value of sqrt(4) is 2.
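Programming languages typically build this convention in. A small Python illustration (not from the original derivation):

```python
import math

# math.sqrt returns only the principal (non-negative) square root.
r = math.sqrt(4)
assert r == 2.0            # never -2.0

# Both candidates square back to 4; the principal value is just
# an agreed-upon, consistent choice between them.
assert r * r == 4.0
assert (-r) * (-r) == 4.0
```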
<a name="irrational-numbers"></a>
<h3>Irrational Numbers</h3>
The ancient Greeks knew that <code>2<sup>1/2</sup></code>
(the square root of two) is not a rational number.
There are <a href="https://www.cut-the-knot.org/proofs/sq_root.shtml">a
lot of proofs</a> of this.
I happen to like this one that demonstrates that all roots
(square root, cube root, and others) that are not integers
are not rational.
<pre><div class="code"
>Assume a<sup>b</sup> = c (b=2 for square root, b=3 for cube root, etc)
and a = d/e, e!=1
where d/e is reduced to the lowest form, so they have no prime factors in common.
Then a<sup>b</sup> = (d/e)<sup>b</sup> = d<sup>b</sup>/e<sup>b</sup> = c = c/1
But d<sup>b</sup> has no prime factors that are not in d,
and e<sup>b</sup> has no prime factors that are not in e,
so d<sup>b</sup> and e<sup>b</sup> have no prime factors in common,
and the fraction can not be reduced at all,
and in particular can not be reduced to c/1,
therefore it can not be equal to c.
Since there is no rational number satisfying the original assumption,
any solution must not be a rational number,
except in the case that e=1, which means the root is an integer.
</div></pre>
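The key step in the proof, that d<sup>b</sup> and e<sup>b</sup> share no prime factors when d and e share none, can be spot-checked with Python's <code>math.gcd</code> (a numerical illustration, not a substitute for the proof):

```python
from math import gcd

# If d/e is fully reduced (gcd(d, e) == 1), then d^b / e^b is also
# fully reduced: exponentiation introduces no new prime factors.
for d, e, b in [(3, 2, 2), (5, 4, 3), (7, 6, 5)]:
    assert gcd(d, e) == 1
    assert gcd(d ** b, e ** b) == 1   # cannot reduce to c/1 when e != 1
```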
In order for our numbering system to be closed under exponentiation,
we need to extend our numbers to include these values that are not
rational numbers. We call them irrational numbers.
<br/><br/>
When we added negative numbers and rational numbers, that was after
we had added not only an operation defined by repetition, but also
its inverse. In this case, we had to extend our numbers to provide
closure even without having yet added that inverse operation.
<blockquote>
<div style="background-color: lightyellow; border: medium ridge black;
padding: 0.6em; margin-top: 1em; margin-bottom: 1em;">
A brief aside about infinity:
before adding irrational numbers, our set of numbers was always
countably infinite, which means there was always a way to map the
entire set of numbers onto the counting numbers.
For example, we can count off all the integers, both positive and negative,
by ordering them like this: 0, 1, -1, 2, -2, 3, -3, and so on.
We can count off all the rational numbers by ordering them according to
the sum of the numerator and denominator and alternating positive and
negative, like this: 0, 1/1, -1/1, 1/2, -1/2, 2/1, -2/1, 1/3, -1/3,
2/2, -2/2, 3/1, -3/1, 1/4, and so on, then removing duplicates
(any fraction that is not reduced).
But once we add all the irrational numbers
we can no longer come up with a counting order like this, which is why
we say the set of all irrational numbers is
<a href="https://en.wikipedia.org/wiki/Georg_Cantor%27s_first_set_theory_article">uncountable</a>.
<br/><br/>
For a proof of this assertion, look up Cantor's
<a href="https://en.wikipedia.org/wiki/Cantor%27s_diagonal_argument">diagonalization argument</a>.
</div>
</blockquote>
<a name="decimal-notation"></a>
<h4>Decimal Notation</h4>
When we introduced rational numbers, such as 1/2, we defined their values
in terms of the division operation, but did not provide any other
representation. This was perhaps acceptable, as we can easily manipulate
rational numbers in order to answer questions about them.
<br/><br/>
With irrational numbers, it is not quite so easy. How can we tell, for example,
which of <code>2<sup>1/2</sup></code>, <code>3<sup>1/3</sup></code>,
or 723/510 is the largest?
We would like a representation that allows us to do real-world
calculations with these values.
<br/><br/>
When counting up with integers, we use a place-notation system in
which each digit, as we move to the left, represents a value that is
ten times as much as the digit just to its right. For example,
1234 means 1 * 1000 + 2 * 100 + 3 * 10 + 4.
We extend this sequence by defining each place to the
right of the ones digit as having a place value of one tenth of the digit
to its left. In order to unambiguously know which place is the ones place,
we put a decimal point (.) just to the right of the ones digit
(in America, that is; in some other parts of the world people
use a comma (,) instead).
For example, 0.5678 means 5 * 1/10 + 6 * 1/100 + 7 * 1/1000 + 8 * 1/10000.
<br/><br/>
We can convert fractions to decimal form such as
<code>a.bcde</code> by remembering that that means
<code>a + b/10 + c/100 + d/1000 + e/10000</code>
<pre><div class="code"
>723/510 = (510 + 213) / 510
= 510/510 + 213/510
= 1 + 213/510
= 1 + 10 * 213/510 / 10
= 1 + 2130/510 / 10
= 1 + (2040 + 90)/510 / 10
= 1 + 2040/510 / 10 + 90/510 / 10
= 1 + 4/10 + 10 * 90/510 / 100
= 1 + 4/10 + 900/510 / 100
= 1 + 4/10 + (510 + 390)/510 / 100
= 1 + 4/10 + (510/510 + 390/510) / 100
= 1 + 4/10 + 1/100 + 390/510 / 100
= 1 + 4/10 + 1/100 + 10 * 390/510 / 1000
= 1 + 4/10 + 1/100 + 3900/510 / 1000
= 1 + 4/10 + 1/100 + (3570 + 330)/510 / 1000
= 1 + 4/10 + 1/100 + (3570/510 + 330/510) / 1000
= 1 + 4/10 + 1/100 + 7/1000 + 330/510 / 1000
= 1.417 + more digits from 330/510 / 1000
</div></pre>
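The hand computation above is just long division carried past the decimal point. A short Python sketch of the same procedure (illustrative):

```python
# Produce the whole part and decimal digits of num/den by repeated
# long division: multiply the remainder by 10 and divide again,
# exactly as in the expansion above.
def decimal_digits(num, den, places):
    whole, rem = divmod(num, den)
    digits = []
    for _ in range(places):
        rem *= 10
        digit, rem = divmod(rem, den)
        digits.append(digit)
    return whole, digits

# 723/510 = 1.4176..., matching the digits derived above.
assert decimal_digits(723, 510, 4) == (1, [4, 1, 7, 6])
```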
Figuring out the decimal representation for a number such as
<code>2<sup>1/2</sup></code> is not quite as straightforward,
but we can start by the brute-force approach of trial and error
to get an estimate.
<pre><div class="code"
>1<sup>2</sup> = 1, 1&lt;2
2<sup>2</sup> = 4, 4>2, so our number must start with 1
1.1<sup>2</sup> = 1.21
1.2<sup>2</sup> = 1.44
1.3<sup>2</sup> = 1.69
1.4<sup>2</sup> = 1.96
1.5<sup>2</sup> = 2.25 so our number must start with 1.4
1.41<sup>2</sup> = 1.9881
1.42<sup>2</sup> = 2.0164 so our number must start with 1.41
1.411<sup>2</sup> = 1.990921
1.412<sup>2</sup> = 1.993744
1.413<sup>2</sup> = 1.996569
1.414<sup>2</sup> = 1.999396
1.415<sup>2</sup> = 2.002225 so our number must start with 1.414
</div></pre>
From this much we can determine that <code>2<sup>1/2</sup></code>
is less than <code>723/510</code>. We don't have an exact answer,
but for real world questions we often don't need to go to
very many decimal digits to get the answer.
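The trial-and-error search above can be automated. This Python sketch (illustrative) works in integers to avoid floating-point noise: it finds the largest k whose square does not exceed the target, one decimal place at a time:

```python
# Digit-by-digit square root by trial and error, as in the table
# above, but using integer arithmetic throughout.
def sqrt_digits(n, places):
    target = n * 100 ** places        # shift so the answer is an integer
    k = 0
    step = 10 ** places               # try one decimal place at a time
    while step > 0:
        while (k + step) ** 2 <= target:
            k += step
        step //= 10
    return k                          # digits of the truncated root

# 1414 means 1.414, agreeing with the table above.
assert sqrt_digits(2, 3) == 1414
```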
<br/><br/>
Our decimal notation is a sum of fractions, so any finite decimal number
can be converted to a rational number. Conversely, irrational numbers can
not be exactly represented as a decimal number, we can only approximate
them when using decimal notation.
If we want to maintain an exact representation of an irrational number
such as <code>2<sup>1/2</sup></code>, we have to keep it in that notation
or something similar.
<a name="imaginary-numbers"></a>
<h3>Imaginary Numbers</h3>
Adding irrational numbers extends our numbers to include the value of
<code>2<sup>1/2</sup></code> and other fractional roots of positive
numbers, but it doesn't cover everything. In particular, our numbers
don't yet include a value for the expression <code>(-1)<sup>1/2</sup></code>.
This is the square root of negative 1, which is equal to the number that,
when multiplied by itself, equals negative 1.
But any positive number multiplied by itself is a positive number, and
from [L104.5] any negative number multiplied by itself is also
a positive number, so we don't have any numbers that are candidates to
be the square root of negative 1. In order to have exponentiation be closed
for negative bases, we need to extend our numbers.
We need to add a set of numbers that, when multiplied by themselves,
produce negative numbers.
<br/><br/>
When we added negative numbers, we used our existing counting numbers with
an added character (-) in front to indicate a negative number.
We will do something similar here, using our existing counting numbers with
an added character, in this case the letter <i>i</i>, following the number to
indicate the new kind of numbers we are adding.
We define <code>1<i>i</i></code> (or just <code><i>i</i></code>) to
be the number such that <code><i>i</i><sup>2</sup> = -1</code>,
and given a number <code>a</code>, we define
<code>a<i>i</i> = a * <i>i</i></code>
(which is consistent with a common convention of defining
<code>ab = a * b</code>).
<br/><br/>
We need to pick a name to distinguish these new numbers from what we had
before, and "the square root of negative one" is too unwieldy, so we pick
a shorter name and call them imaginary numbers.
<br/><br/>
When we defined negative numbers, we might have instead called them
imaginary numbers, because you can't have negative lengths or a negative
number of apples in the real world, so those numbers are not real, right?
In the sense that they are highly useful for certain mathematical
calculations, imaginary numbers are no more "imaginary" than negative
numbers. It is unfortunate that we are stuck with a name that causes
some people to get distracted from thinking about these new numbers
as simply the next step in expanding our numbering system to be closed
under exponentiation.
<br/><br/>
To distinguish them from our newly added imaginary numbers, we go back
and lump together our previously defined rational and irrational numbers
and call those real numbers.
Having made the distinction between real and imaginary numbers, we note
that we can have imaginary rational numbers, such as <code>(1/2)<i>i</i></code>,
or imaginary irrational numbers, such as <code>2<sup>1/2</sup><i>i</i></code>,
as well as negative imaginary numbers such as <code>-4<i>i</i></code>
or negative irrational imaginary numbers such as
<code>-2<sup>1/2</sup><i>i</i></code>.
<br/><br/>
If we work through the mechanics of addition and subtraction with
imaginary numbers, we find that they work the same as real numbers but
with that extra <i>i</i> everywhere. To put it another way, imaginary
numbers are closed under addition and subtraction.
This is not the case with multiplication: imaginary numbers are not
closed under multiplication, since <code>i * i = -1</code>, which
is not an imaginary number.
Similarly, imaginary numbers are not closed under division, since
<code>i / i = 1</code>, which is not imaginary.
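Python has complex numbers built in, writing the imaginary unit as <code>1j</code>, so these closure observations are easy to try out (a quick illustration):

```python
a = 3j
b = 5j

# Imaginary numbers are closed under addition and subtraction...
assert a + b == 8j
assert a - b == -2j

# ...but multiplication and division escape to the real numbers.
assert a * b == -15        # 3i * 5i = 15 * i^2 = -15
assert a / b == 0.6        # the i's cancel
```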
<a name="complex-numbers"></a>
<h3>Complex Numbers</h3>
Since we defined imaginary numbers as being a different set of numbers from
real numbers, we can't convert from one to the other, so if we try to add
a real number <code>a</code> and an imaginary number <code>b<i>i</i></code>
together, we can't reduce that, so we just write it as
<code>a + b<i>i</i></code>.
We call this kind of number a complex number,
and since a or b could be zero, we note that all real numbers and all
imaginary numbers are complex numbers.
<br/><br/>
We are, in a sense, cheating when we use the + symbol to enumerate
the real and imaginary parts of a complex number, because, as just stated,
we can't actually do anything with that operator to reduce the number.
In that sense, we could have used any special character in that location.
But we choose to use the + sign because it turns out the rules we have
that deal with the + operator on real numbers also work with complex numbers:
commutative, associative, and distributive rules all work consistently
when applied to complex numbers when we use a + sign between the real and
imaginary parts.
<br/><br/>
As with square root, complex numbers come with multivalued functions,
some with an infinite number of solutions.
It's easy to get bad results if you're not careful, so it's important
to define a principal value for these functions and consistently use it.
<a name="complex-cartesian"></a>
<h4>Cartesian Coordinates</h4>
Since real and imaginary numbers can't be reduced to each other and are
thus orthogonal, we can represent them on the plane. We choose real to
be the X axis and imaginary to be the Y axis.
<br/><br/>
With this cartesian environment, we can represent complex numbers in
polar coordinates using the standard conversion:
<code>(r, θ) = (sqrt(x<sup>2</sup> + y<sup>2</sup>), arctan(y/x))</code>,
where x is the real part and y is the imaginary part
(and with the appropriate sign adjustments for quadrants
other than I).
Converting the other way, we have
<code>(x, y) = (r * cos(θ), r * sin(θ))</code>.
Sometimes we refer to a complex number as <code>z</code>,
where we can decompose it either by real and imaginary parts,
written as <code>x = Re(z), y = Im(z)</code>,
or by polar coordinates,
written as <code>r = |z|, θ = Arg(z)</code>,
where <code>|z|</code> is the
<a href="https://en.wikipedia.org/wiki/Magnitude_(mathematics)#Complex_numbers">magnitude</a> of <code>z</code>
and <code>Arg(z)</code> is the
<a href="https://en.wikipedia.org/wiki/Argument_(complex_analysis)">argument</a> of <code>z</code>.
More precisely, <code>arg(z)</code> is the argument of <code>z</code>,
and <code>Arg(z)</code> is the principal argument of <code>z</code>.
<code>arg(z)</code> is a multi-valued function equal to
<code>Arg(z) + n*2*π</code> for all integer values of <code>n</code>.
<br/><br/>
We can treat our complex numbers as vectors in the two dimensional
complex plane, so that adding two complex numbers can be displayed
in our plane as vector addition.
More interesting is multiplication, where we can see that when we
use polar coordinates we get this nice result:
<code>(r1,θ1) * (r2,θ2) = (r1*r2, θ1+θ2)</code>.
<pre><div class="code"
>(r1,θ1) * (r2,θ2) = (r1*cos(θ1) + r1*sin(θ1)<i>i</i>) * (r2*cos(θ2) + r2*sin(θ2)<i>i</i>)
= r1*(cos(θ1) + sin(θ1)<i>i</i>) * r2*(cos(θ2) + sin(θ2)<i>i</i>)
= r1*r2 * (cos(θ1) + sin(θ1)<i>i</i>) * (cos(θ2) + sin(θ2)<i>i</i>)
= r1*r2 * (cos(θ1)*cos(θ2) + cos(θ1)*sin(θ2)<i>i</i> + sin(θ1)*cos(θ2)<i>i</i> + sin(θ1)*sin(θ2)*<i>i</i><sup>2</sup>)
= r1*r2 * ((cos(θ1)*cos(θ2) - sin(θ1)*sin(θ2)) + (cos(θ1)*sin(θ2) + sin(θ1)*cos(θ2))<i>i</i>)
= r1*r2 * (cos(θ1+θ2) + sin(θ1+θ2)<i>i</i>)
= r1*r2*cos(θ1+θ2) + r1*r2*sin(θ1+θ2)<i>i</i>
= (r1*r2, θ1+θ2)
[L301] (r1,θ1) * (r2,θ2) = (r1*r2, θ1+θ2) The above summarized
</div></pre>
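[L301] can be checked with Python's <code>cmath</code> module, whose <code>rect</code> and <code>polar</code> functions convert between Cartesian and polar forms (illustrative sketch):

```python
import cmath

# Build two complex numbers from polar coordinates (r, θ).
w = cmath.rect(2.0, 0.5)
z = cmath.rect(3.0, 0.7)

# [L301]: multiplying complex numbers multiplies the magnitudes
# and adds the angles.
r, theta = cmath.polar(w * z)
assert abs(r - 2.0 * 3.0) < 1e-9
assert abs(theta - (0.5 + 0.7)) < 1e-9
```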
<a name="eulers-formula"></a>
<h4>Euler's Formula</h4>
Here is Euler's Formula:
<pre><div class="code"
>e<sup>iθ</sup> = cos(θ) + <i>i</i>*sin(θ)
</div></pre>
Feynman calls this
"one of the most remarkable, almost astounding, formulas in all of mathematics"
and refers to it as an "amazing jewel".
<br/><br/>
As described in an article at <a href="https://brilliant.org/wiki/eulers-formula/">Brilliant</a>,
Euler's Formula can be derived using the series expansions of sin(x), cos(x), and e<sup>x</sup>:
<pre><div class="code"
>cos(x) = 1 - x<sup>2</sup>/2! + x<sup>4</sup>/4! - ...
sin(x) = x - x<sup>3</sup>/3! + x<sup>5</sup>/5! - ...
e<sup>x</sup> = 1 + x + x<sup>2</sup>/2! + x<sup>3</sup>/3! + ...
</div></pre>
so:
<pre><div class="code"
>e<sup><i>i</i>*x</sup> = 1 + <i>i</i>*x + (<i>i</i>*x)<sup>2</sup>/2! + (<i>i</i>*x)<sup>3</sup>/3! + (<i>i</i>*x)<sup>4</sup>/4! + (<i>i</i>*x)<sup>5</sup>/5! + ...
= 1 + <i>i</i>*x - x<sup>2</sup>/2! - <i>i</i>*x<sup>3</sup>/3! + x<sup>4</sup>/4! + <i>i</i>*x<sup>5</sup>/5! - ...
= (1 - x<sup>2</sup>/2! + x<sup>4</sup>/4! - ...) + <i>i</i>*(x - x<sup>3</sup>/3! + x<sup>5</sup>/5! - ...)
= cos(x) + <i>i</i>*sin(x)
</div></pre>
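We can watch the series converge numerically. A Python sketch (illustrative) sums the first thirty terms of the series for e<sup><i>i</i>*x</sup> and compares the result against cos(x) + <i>i</i>*sin(x):

```python
import cmath
import math

x = 1.3  # an arbitrary real angle

# Partial sum of the series for e^(i*x).
s = sum((1j * x) ** n / math.factorial(n) for n in range(30))

# Euler's Formula: e^(i*x) = cos(x) + i*sin(x).
assert abs(s - (math.cos(x) + 1j * math.sin(x))) < 1e-12
assert abs(s - cmath.exp(1j * x)) < 1e-12
```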
In the section on Cartesian Coordinates above, we noted that any complex
number can be represented in polar coordinates using r and theta, but
we didn't have a good place to put the <i>i</i>.
With Euler's Formula, we can now unambiguously represent any complex number
<code>z = x + <i>i</i>*y</code> as
<code>|z| * e<sup><i>i</i>*arg(z)</sup></code>
where <code>|z|</code> is the magnitude of <code>z</code>
and <code>arg(z)</code> is the argument of <code>z</code>.
<a name="complex-exponentiation"></a>
<h4>Complex Exponentiation</h4>
Given <code>w = u + <i>i</i>*v</code> and <code>z = x + <i>i</i>*y</code>,
how do we calculate <code>w<sup>z</sup></code>?
<br/><br/>
We would like <code>w<sup>z</sup></code> to satisfy the
rules of exponentiation that we derived for real numbers, such as
<code>k<sup>a+b</sup> = k<sup>a</sup> * k<sup>b</sup></code>.
We will assume that we can apply this rule to complex exponentiation
and see how that works out.
<br/><br/>
From the discussion of Euler's Formula above we know that we can represent
any nonzero complex number <code>w</code> as <code>|w|*e<sup><i>i</i>*arg(w)</sup></code>,
and we can represent the real number <code>|w|</code> as <code>e<sup>ln(|w|)</sup></code>.
Let's see where that takes us.
<pre><div class="code"
>w<sup>z</sup> = (|w|*e<sup>(<i>i</i>*arg(w))</sup>)<sup>z</sup> Expand w
= (e<sup>ln(|w|)</sup>*e<sup><i>i</i>*arg(w)</sup>)<sup>z</sup> Use exp form for magnitude of w
= (e<sup>ln(|w|)+<i>i</i>*arg(w)</sup>)<sup>z</sup> e<sup>a</sup> * e<sup>b</sup> = e<sup>a+b</sup>
= e<sup>(ln(|w|)+<i>i</i>*arg(w))*z</sup> (e<sup>a</sup>)<sup>b</sup> = e<sup>a*b</sup>
= e<sup>(ln(|w|)+<i>i</i>*arg(w))*(x+<i>i</i>*y)</sup> Expand z to real and imaginary parts
= e<sup>ln(|w|)*x + ln(|w|)*<i>i</i>*y + <i>i</i>*arg(w)*x + <i>i</i>*arg(w)*<i>i</i>*y</sup> (a+b)*(c+d)=ac+ad+bc+bd
= e<sup>(ln(|w|)*x - arg(w)*y) + <i>i</i>*(ln(|w|)*y + arg(w)*x)</sup> i<sup>2</sup>=-1 and rearrange terms
[L310] w<sup>z</sup> = e<sup>(ln(|w|)*x - arg(w)*y) + <i>i</i>*(ln(|w|)*y + arg(w)*x)</sup> The above summarized
</div></pre>
This gives us a number of the form <code>r * e<sup><i>i</i>*θ</sup></code>
where <code>r = e<sup>(ln(|w|)*x - arg(w)*y)</sup></code>
and <code>θ = ln(|w|)*y + arg(w)*x</code>,
both of which we can evaluate.
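Python's built-in complex power uses the principal value of arg, so we can check [L310] against it directly (an illustrative sketch; <code>cmath.phase</code> returns Arg):

```python
import cmath
import math

w = 2 + 1j
z = 0.5 - 0.3j
x, y = z.real, z.imag
mag, arg = abs(w), cmath.phase(w)      # |w| and Arg(w)

# [L310]: w^z = e^((ln|w|*x - arg(w)*y) + i*(ln|w|*y + arg(w)*x))
rhs = cmath.exp((math.log(mag) * x - arg * y)
                + 1j * (math.log(mag) * y + arg * x))
assert abs(w ** z - rhs) < 1e-12
```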
<br/><br/>
Note that the above result includes <code>arg(w)</code> in two places,
once multiplied by <code>x</code> and once multiplied by <code>y</code>.
<code>arg</code> is a multi-valued function, and thus complex exponentiation is
also multi-valued for all exponents except zero.
<br/><br/>
If we are raising to a real power, then <code>y</code> is zero, so [L310] reduces to
<pre><div class="code"
>w<sup>x</sup> = e<sup>(ln(|w|)*x) + <i>i</i>*(arg(w)*x)</sup> [L310] with y=0
= |w|<sup>x</sup> * e<sup><i>i</i>*arg(w)*x</sup> For real x and all w
</div></pre>
This equation says the magnitude of the result is the magnitude of <code>w</code>
raised to the <code>x</code> power
and the <code>arg</code> of the result is the arg of <code>w</code> multiplied by <code>x</code>. If, for example,
we are squaring and thus <code>x</code> is 2, we square the magnitude of the number and
double the angle.
This result is consistent with our earlier observation that, when multiplying two
complex numbers, we can multiply the magnitudes and add the angles.
<br/><br/>
If <code>y</code> is zero and <code>x</code> is an integer, then <code>e<sup><i>i</i>*arg(w)*x</sup></code>
gives the same result for all of the multiple values of <code>arg(w)</code>, so the overall function
is single-valued. If <code>x</code> is not an integer, this is not the case.
For example, if <code>x</code> is 1/2, then we get two different answers by plugging in
<code>Arg(w)</code> and <code>Arg(w) + 2*π</code>. These are the two square roots of
a number: they always have the same magnitude and differ in angle by π.
<br/><br/>
If we consider the path that would be traced out for powers of some fixed <code>w</code> as we
change the real exponent, we can see that it generates a circle or a spiral.
Here is a nice visualization of <code>z<sup>x</sup></code> from
<a href="http://www.suitcaseofdreams.net/powers_complex.htm">Suitcase of Dreams</a>
for when <code>|z|>1</code>:
<br/>
<img src="http://www.suitcaseofdreams.net/Images/TF/spiral5.gif">
<br/><br/>
If we are raising to an imaginary power, then <code>x</code> is zero, so [L310] reduces to
<pre><div class="code"
>[L311] w<sup><i>i</i>*y</sup> = e<sup>(-arg(w)*y + <i>i</i>*ln(|w|)*y)</sup> [L310] with x=0
</div></pre>
Let's evaluate <i>i</i><sup><i>i</i></sup>.
We use [L311] with <code>w=<i>i</i></code> and <code>y=1</code>:
<pre><div class="code"
><i>i</i><sup><i>i</i></sup> = e<sup>(-arg(w) + <i>i</i>*ln(|w|))</sup> [L311] with w=<i>i</i> and y=1
 = e<sup>-π/2</sup> * e<sup><i>i</i> * 0</sup> Arg(<i>i</i>)=π/2, |w|=1, ln(1) is 0
= e<sup>-π/2</sup> Imaginary part drops out completely!
= 0.207879...
</div></pre>
Surprisingly, <code><i>i</i><sup><i>i</i></sup></code> is a real number, a little larger than one fifth.
At least, that's one answer. We can use any of the answers <code>e<sup>-π/2 + k*2π</sup></code>
for any integer <code>k</code>.
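Python agrees, using the principal value (a quick illustration):

```python
import math

# The principal value of i^i is the real number e^(-π/2).
val = 1j ** 1j
assert abs(val.imag) < 1e-12                          # no imaginary part
assert abs(val.real - math.exp(-math.pi / 2)) < 1e-12
print(round(val.real, 5))  # 0.20788
```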
<br/><br/>
We see that we can represent any nonzero complex number in the form
<code>e<sup><i>i</i>*z</sup></code>, given <code>z = x + <i>i</i>*y</code>.
<pre><div class="code"
>e<sup><i>i</i>*z</sup> = e<sup><i>i</i>*(x+<i>i</i>*y)</sup>
= e<sup><i>i</i>*x + <i>i</i>*<i>i</i>*y</sup>
= e<sup>-y + <i>i</i>*x</sup>
= e<sup>-y</sup> * e<sup><i>i</i>*x</sup>
</div></pre>
One interesting thing we can do now is to extend Euler's Formula
from real theta to complex theta, which allows us to define
<code>sin</code> and <code>cos</code> for the entire complex plane:
<pre><div class="code"
>e<sup><i>i</i>*z</sup> = cos(z) + <i>i</i>*sin(z)
e<sup><i>-i</i>*z</sup> = cos(z) - <i>i</i>*sin(z) cos is an even function, sin is an odd function
e<sup><i>i</i>*z</sup> + e<sup><i>-i</i>*z</sup> = 2*cos(z)
cos(z) = 1/2 (e<sup><i>i</i>*z</sup> + e<sup><i>-i</i>*z</sup>)
e<sup><i>i</i>*z</sup> - e<sup><i>-i</i>*z</sup> = 2*<i>i</i>*sin(z)
sin(z) = 1/(2*<i>i</i>) (e<sup><i>i</i>*z</sup> - e<sup><i>-i</i>*z</sup>)
</div></pre>
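These two formulas give working definitions of cos and sin for any complex argument. A Python sketch (illustrative), checked against the built-ins in <code>cmath</code>:

```python
import cmath

# cos and sin defined from the exponential formulas above.
def ccos(z):
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2

def csin(z):
    return (cmath.exp(1j * z) - cmath.exp(-1j * z)) / (2 * 1j)

z = 0.8 + 0.4j  # an arbitrary complex argument
assert abs(ccos(z) - cmath.cos(z)) < 1e-12
assert abs(csin(z) - cmath.sin(z)) < 1e-12
```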
<a name="eulers-identity"></a>
<h4>Euler's Identity</h4>
We evaluate Euler's Formula with theta set to pi:
<pre><div class="code"
>e<sup><i>i</i>*π</sup> = cos(π) + <i>i</i>*sin(π)
= -1 + 0
= -1
</div></pre>
We add one to both sides to get the typical presentation, <code>e<sup><i>i</i>*π</sup> + 1 = 0</code>.
<br/><br/>
Not only does this identity tie together five of the key values of algebra
(e, π, <i>i</i>, 1, and 0), it does it with one each of the key operations
we derived above (equality, addition, multiplication, exponentiation).
That's a pretty sweet equation.
<a name="final-closure"></a>
<h3>Final Closure</h3>
Throughout this presentation, we have expanded our system of numbers as we
defined new operators and discovered our system of numbers was not closed
under the new operators. But with complex numbers, we have reached a point
where we don't need to define any new number types. Complex numbers are
sufficient to solve all algebraic equations.
This is one of the interpretations of the
<a href="https://en.wikipedia.org/wiki/Fundamental_theorem_of_algebra">Fundamental Theorem of Algebra</a>,
but the proofs are pretty difficult, so I'm not going to try to prove it here.
<h2>You Are Not Alone</h2>
Imagine that you live alone.
I don't mean living by yourself in an apartment or house, I mean
imagine you are the only person in the world.
Furthermore, imagine that no other people have ever touched the world, so that
you are living in a wilderness without any of the artifacts of humanity.
Kind of like Brian in <a href="https://en.wikipedia.org/wiki/Hatchet_(novel)">Hatchet</a>,
but without the hatchet.
And without any clothes or any other manufactured items.
<br/><br/>
Think about what you have to do to survive:
<ul>
<li>Gather or hunt your own food and prepare it
<li>Protect yourself from predators and parasites
<li>Create protection from the environment, such as clothes and a shelter
<li>Care for your own injuries and illnesses
</ul>
In that situation, how much could you accomplish? What could you create?
What wealth could you accumulate?
<h2>The Only Human</h2>
But let's go one step further. Think about all of the knowledge you have
that you learned from someone else rather than from direct experience.
Now imagine that you did not know any of that. You only know the things
that you have learned through your own interactions with the world.
We'll be generous and say you also know things that you might reasonably
have discovered on your own.
<br/><br/>
Now how much could you accomplish?
<br/><br/>
Remember, you can only work with available natural materials such as wood
and stone. You can't use metal, ceramic, plastic, rubber, or cloth unless
and until you can make it yourself. Remember also, we are assuming you don't
have any knowledge except that which you have learned through direct experience.
You are unlikely to even know that any of those materials exist or are
possible, let alone know how to create them.
<br/><br/>
Would you be able to survive? Would you have any time left over to start the
long process of discovering, learning about, and making any of the unavailable materials just mentioned?
Compared to what you own today, and the accomplishments of your real life so far, how much could you have
collected or accomplished in our imaginary situation?
<h2>Knowledge is Power</h2>
Let's ease up on the restrictions a bit and allow you to retain all of the
knowledge you have. In fact, let's take it one step further and make available
to you all of the collected knowledge and experience of humankind. Basically
let's say you have internet access. Now you can look up anything you want,
even if you have never thought about it before.
You can read about and watch videos on how to make a bow and arrow, or how to knap flint
to make an arrowhead, or how to make steel, or how a computer works.
<br/><br/>
Of course, reading about how to do something and actually being able to do it
are not the same thing. If you want to make an arrowhead, first you'll have to
find and identify some flint, then you'll have to practice, practice, practice
knapping before you get a decent arrowhead.
You should eventually be able to make your flint arrowhead and an
arrow to attach it to, and with a lot more work you'll be able to make a functional bow.
Your internet connection will provide you with many details that would take
much longer to get right if you had to figure them out yourself, such as
what kind of wood to use,
how to fletch and nock the arrow, how to make string, how to make glue,
and how to string your bow.
<br/><br/>
The knowledge you can get from your internet connection will help you much
more quickly learn how to identify edible and poisonous plants,
skin and cure animal hides,
make fire
(it's harder than you might think; rubbing two sticks
together is not an effective approach),
make and fire ceramic
(clay), and maybe, if you are lucky enough to find some copper ore (which
your internet knowledge can help you identify), create some metal tools.
<br/><br/>
There are many things you will not be able to create by yourself,
even with a long and healthy life and with access to all that information.
As examples, producing integrated circuits and stainless steel require far more
prerequisite infrastructure than you could create in one lifetime.
But having access to the distilled knowledge of millions of lifetimes of
exploration and experimentation will allow you to create much more than
you could if, as in our initial supposition, you had to learn everything
yourself.
<br/><br/>
With all that knowledge available to you, how much could you create and
accomplish in a world without other people and their creations
as compared to your current life?
<br/><br/>
We've seen how much more you would likely be able to create if you had
access to the knowledge of humankind via the internet. In the real world as well,
we use that knowledge to help us accomplish much more than we could
without it. We don't have to rely solely on what we have directly learned
from our own experience. We benefit from the experiences and knowledge
collected by many other people.
<h2>The Wealth of the World</h2>
What if, in addition to the knowledge humankind has collected,
you also had access to the physical things humankind has created?
Let's now assume that the world exists just as it does today, with all of
its roads, factories, and other infrastructure, but with no other people.
What could you accomplish?
<br/><br/>
The first question is, how long will all that infrastructure continue to
operate without any people? How long will you continue to have
electricity, water, communications, or the internet?
If you were to apply your time and energy towards keeping those systems up,
how much difference would it make?
Probably not a lot. Those systems are too big, there are too many, and
they require too much experience for your efforts as one person to make
much difference.
Without the continuing work of a very large number of people, all of
these systems, which we rely on in the ordinary course of our lives,
would likely fail relatively quickly.
<br/><br/>
If all those systems fail, what could you accomplish?
You could perhaps figure out how to generate some electricity, but keeping
that system running would certainly take some of your time.
And you would still have to spend some time collecting and preparing food.
For a while you could live off canned and preserved food that
you could raid from a grocery store, but eventually you'd have to
start gathering or hunting again, and that would cut into the time you
have available for doing other work.
<br/><br/>
But for our imaginary scenario, let's say all of those systems continued to
work. Let's even take it a step further, and stipulate that all of the factories
and supply chains continue to operate.
We'll even say you can order stuff online.
So basically, everything works as it does in the real world, except that
you don't have the ability to communicate or collaborate with any people.
Now we are essentially asking, how much can you create or accomplish in
the real world if you do not collaborate with anyone else or specifically
ask anyone else to do some custom work for you?
<h2>People Power</h2>
This is not that much different from the way many people operate, and
some people can create amazing things. One person can create a wonderful
piece of art, or a fun computer program, or an elegant piece of furniture.
But most of the things in the world, and all of the most complex and
sophisticated things, are made by groups of people,
sometimes very large groups of people,
collaborating towards a common goal.
<br/><br/>
I hope that this exercise has helped you see how much all of us rely on the
work of other people to accomplish what we do.
In all of our lives, there are innumerable people who have helped us get
to where we are and whose labors continue to contribute to our success.
There is no person walking this earth who has not been helped by someone
else at some point.
As babies, we would have died if there were no one feeding us and caring for us.
We have all learned things from teachers, friends, strangers, and, through media,
from people we have never met.
We have all inherited wealth from our ancestors, whether it is a personal
mansion or the use of our public streets, bridges, and other infrastructure.
We use knowledge from around the world and across time.
We benefit from the factories and other capital created by our ancestors that
provide us with better and less expensive goods.
We rely on the labor of others to provide us with food, clean water, electricity,
and many other things, so that we can focus on our own specialty.
For large projects, we collaborate with others to get more done, and even for
small projects we may solicit some piece of custom work from someone else.
In all of these ways, the work of other people, both past and present,
makes it possible for us to own more, do more, and produce more than
we could without them.
<br/><br/>
The next time you think "I did it all myself", please remember to be grateful
for all the people who helped you do it:
all the people who kept you alive and cared for you as a baby or beyond,
all the people who gained the knowledge of the world,
all the people who helped you learn some of it,
all the people who built the world around you,
all the people who made things that you now have,
and all the people who are still providing goods and services to you.
You are not alone.
Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com0
tag:blogger.com,1999:blog-7045524330253482541.post-84168984892650632902021-05-23T15:32:00.000-07:002021-05-23T15:32:31.670-07:00
Recreating Butter Streusel
Long ago, while I was living for one year in Heidelberg, Germany,
I frequented a bakery on the Hauptstrasse that sold a baked good
called Butter Streusel. It was halfway between a cookie and a
coffee cake: not quite an inch tall, about half of that being the
streusel topping, with a base that was denser and crisper than cake.
I never found it at any other bakery, and when I returned to Heidelberg
some years later, that bakery no longer sold it either.
<br/><br/>
A lot of people baked bread during the pandemic. I decided it was
a good time to make some Butter Streusel. I tried a couple of
recipes from the web, but they didn't quite match my memory.
I have to admit, however, that after so many years, it's possible
that what I remember never existed. As Mark Twain may or may not
have said,
"The older I get, the more clearly I remember things that never happened."
<br/><br/>
I started with a
<a href="https://www.chefkoch.de/rezepte/1596591266914632/Streuselkuchen.html">yeast-based recipe</a>
with a photo that looked somewhat like what I remembered,
but I wanted something more dense, and I decided I didn't like the
yeast taste in something that should taste more like a cake or cookie
than bread. I looked through a bunch of recipes for streuselkuchen,
shortbread, short cake, pound cake, vanilla cake, and biscuits,
and started experimenting.
<br/><br/>
One of my goals was that it should be easy to make, so some of my
experimentation was not only about what ingredients to use, but how
to mix them together.
On trial #13 I had something I liked. A friend said it was "amazing."
Here it is.
<h2>Butter Streusel</h2>
Preheat oven to 375F.
<br/>
Get a 9x12x2in baking pan (a shorter pan or even a cookie sheet should work)
and parchment paper, but don't put the paper in the pan yet.
<h3>Base</h3>
Prepare the base dough:
<ul>
<li>140g softened butter (salted) (10 tbsp)
<li>60g sugar
<li>2 tsp vanilla
</ul>
<ul>
<li>200g flour
<li>1 tsp baking powder (double acting)
<li>1/4 tsp salt
</ul>
<ul>
<li>100ml milk
</ul>
Blend together butter, sugar, and vanilla in a bowl large enough for all the base dough ingredients.
<br/>
Mix flour, baking powder, and salt (you can use a big bowl for this and reuse
it for the topping), then mix into butter mixture.
<br/>
Add milk and mix to smooth consistency, knead for about 2 minutes, then let rest for 10 minutes.
<br/>
(I used an electric mixer for all of the above steps, except for pre-mixing
the dry ingredients and for the last two minutes of kneading.)
<h3>Topping</h3>
While the base dough is resting, prepare the streusel topping:
<ul>
<li>160g flour
<li>120g sugar
<li>120g butter, melted (melted makes the topping lumpier, which is good)
</ul>
Mix all ingredients together, leaving some lumps.
<h3>Assemble and Bake</h3>
Get a piece of parchment paper a few inches larger than the pan. Lay the
parchment paper on the counter, and spread the base dough on it using
a couple of tablespoons until it is the size of the bottom of the pan.
Place the parchment paper with dough into the pan and push down the
sides and corners to make it flat. Spread the streusel over the top.
<br/>
Bake at 375F for 35 minutes.
<br/>
Remove from oven. Lift parchment paper with contents and place on a cutting board to cool.
Cool about 20 minutes, then cut into 2x3in bars.
<br/><br/>
Yield: 18 bars about 3/4in thick.
Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com0
tag:blogger.com,1999:blog-7045524330253482541.post-20932022002026234682020-09-04T17:42:00.000-07:002020-09-04T17:42:33.320-07:00
Transferring MiniDV Tapes to Linux
Downloading miniDV tapes from my Sony DCR-TRV22
camcorder to my Fedora 32 Linux system with a Thunderbolt 3 port was
easy, using <code>dvgrab</code> and a couple of Apple converters to go from FireWire to
Thunderbolt 3.
<br/><br/>
Many years ago I transferred all my VHS home videos to disk through the somewhat painful process
of first recording them onto DVDs using a DVD recorder, then ripping the DVDs on my computer.
My next video transfer project was to transfer my more recent home videos from miniDV to disk.
There was always other work to do, and transferring the tapes was never a critical task, so it
was easy to put off. I was thinking it would probably be time consuming but not too difficult,
since my Linux computer had an IEEE 1394 (FireWire) port, so I wasn't too worried about it.
<br/><br/>
When the lockdown started earlier this year, that presented a good opportunity for me to start
my tape transfer project. I grabbed my miniDV camcorder and my box of tapes, then went to get a
cable to connect the camcorder to my computer. It was only then that I remembered that I upgraded
to a new computer at the beginning of this year and gave away the old computer. The old computer,
from 2010, had the IEEE 1394 port, but the new one did not. Oops! I had waited a bit too long
for this supposedly easy job.
<br/><br/>
My new computer has a ton of ports of various flavors, so it seemed possible that it might still work,
if I could get the right cables and converters. After some digging, it looked like it should be
possible to use the USB-C Thunderbolt port on my new computer. But I couldn't find much information about
whether it would work when run through converters on a current version of Linux. The required
converters are pretty expensive, but I decided to take a chance and buy them.
<br/><br/>
My Sony DCR-TRV22 camcorder has a 4-pin FireWire 400 jack, and I had a FireWire 400 to 800 cable.
I purchased an
<a href="https://www.amazon.com/gp/product/B00SQ2CJUS/">Apple Thunderbolt to FireWire Adapter</a>
for $29 and an
<a href="https://www.amazon.com/gp/product/B01MQ26QIY">Apple Thunderbolt 3 (USB-C) to Thunderbolt 2 Adapter</a>
for $49, and for good measure I also purchased a
<a href="https://www.amazon.com/gp/product/B003L4P872">FireWire 400 to 800 Adapter</a>
for $10 (in case I had to use a different cable), which I ended up not using.
I connected the cable to the camcorder, connected the other end of the cable to the
FireWire to Thunderbolt adapter, plugged the FireWire to Thunderbolt adapter into the
Thunderbolt 2 to Thunderbolt 3 adapter, and plugged the Thunderbolt 2 to Thunderbolt 3 adapter
into the USB-C Thunderbolt 3 port on my computer. Then I ran <code>dvgrab</code>, which I had installed earlier.
And... it did not see the camera. Rats.
<pre>
# lsmod | grep -i fire
(nothing)
# lspci | grep -i fire
(nothing)
</pre>
Fortunately, it turned out to be an easy fix. I was able to determine that the Thunderbolt to FireWire
adapter was visible by looking in <code>/sys/bus/thunderbolt</code>:
<pre>
# cat /sys/bus/thunderbolt/devices/0-3/device_name
Thunderbolt to FireWire Adapter
</pre>
I found the solution in an <a href="https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1726299/comments/6">Ubuntu bug report</a>:
the Thunderbolt device had to be authorized.
(Note that your device number might be different.)
<pre>
# cat /sys/bus/thunderbolt/devices/0-3/authorized
0
# echo 1 > /sys/bus/thunderbolt/devices/0-3/authorized
# lspci | grep -i fire
40:00.0 FireWire (IEEE 1394): LSI Corporation FW643 [TrueFire] PCIe 1394b Controller (rev 08)
# lsmod | grep -i fire
firewire_ohci 45056 0
firewire_core 81920 1 firewire_ohci
crc_itu_t 16384 1 firewire_core
</pre>
At this point I was able to insert a tape into the camcorder and test it:
<pre>
$ dvgrab foo-
</pre>
This created the file <code>foo-001.dv</code>.
By installing the <code>mediainfo</code> program, I was able to see the datestamp of the recording:
<pre>
$ mediainfo foo-001.dv | grep date
Recorded date : 2015-12-25 10:32:28.000
</pre>
The actual command I used to download the tape is:
<pre>
$ dvgrab --autosplit --timestamp --size 0 --rewind --showstatus dv-
</pre>
At this point I could just put a tape in the camcorder, rewind it, run the above <code>dvgrab</code> command,
come back an hour or so later when it was done, then put in the next tape and repeat.
It took a long time to get through all my miniDV tapes, but not much work.
Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com1
tag:blogger.com,1999:blog-7045524330253482541.post-53044745525710930172019-11-28T06:59:00.001-08:002019-11-28T07:34:04.359-08:00
Go Composition vs Inheritance
Go does not support inheritance, but sometimes using embedded structs can look
a little like inheritance. I explore that feature to see how it differs.
<h2>Contents</h2>
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#baseclass">Base class</a></li>
<li><a href="#subclass">Subclass</a></li>
<li><a href="#main">Main and test</a></li>
<li><a href="#overriding">Overriding</a></li>
<li><a href="#downcall">Downcall</a></li>
<li><a href="#promotion">Method promotion</a></li>
<li><a href="#solution">Solution</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
<a name="intro">
<h2>Introduction</h2>
</a>
In lieu of inheritance, the Go language encourages composition
by allowing one struct to be <a href="https://golang.org/doc/effective_go.html#embedding">embedded</a> in another struct in a way that allows
calling methods defined on the embedded struct as if they are defined on the
containing struct.
<br/><br/>
<b>Note:</b> In this post I occasionally use object-oriented terminology such as base class,
subclass, and override. Please remember that Go does <i>not</i> support these concepts; I am
using those terms here to show how thinking that way with Go can lead to problems.
<br/><br/>
For the examples that follow, I assume we are building a graphical editor that
allows manipulating visual objects on the screen. We want to be able to draw
those objects, and we want to be able to transform them with operations such
as rotate, so we define an interface with those methods:
<br/><br/>
<b>Note:</b> For convenience, the final collected code used in this post is available
on <a href="https://play.golang.org/p/JA3zCjyeCCn">play.golang.org</a>.
<pre name="hlcode" class="go"
><div class="code">type shape interface {
    draw()
    rotate(radians float64)
    // translate and scale omitted for simplicity
}
</div></pre>
We write a function that will draw all our shapes:
<pre name="hlcode" class="go"
><div class="code">func drawShapes(shapes []shape) {
    for _, s := range shapes {
        s.draw()
    }
}
</div></pre>
<a name="baseclass">
<h2>Base class</h2>
</a>
We define our "base class", called <code>polygon</code>, where we implement a <code>draw</code> method
that we can invoke from our "subclasses":
<pre name="hlcode" class="go"
><div class="code">type polygon struct {
    sides int
    angle float64
}
func (p *polygon) draw() {
    fmt.Printf("draw polygon with sides=%d\n", p.sides)
    vertexDelta := 2 * math.Pi / float64(p.sides)
    vertexAngle := p.angle
    x0 := math.Cos(vertexAngle)
    y0 := math.Sin(vertexAngle)
    for i := 0; i < p.sides; i++ {
        // Draw one side within the unit circle, offset by p.angle.
        vertexAngle += vertexDelta
        x1 := math.Cos(vertexAngle)
        y1 := math.Sin(vertexAngle)
        fmt.Printf("draw from (%.3f, %.3f) to (%.3f, %.3f)\n", x0, y0, x1, y1)
        x0 = x1
        y0 = y1
    }
}
func (p *polygon) rotate(radians float64) {
    p.angle += radians
}
</div></pre>
<a name="subclass">
<h2>Subclass</h2>
</a>
We define a couple of "subclasses", <code>triangle</code> and <code>square</code>,
that "extend" our "base class",
along with functions to create instances of those types:
<pre name="hlcode" class="go"
><div class="code">type triangle struct {
    polygon
}
type square struct {
    polygon
}
func createTriangle() *triangle {
    return &triangle{
        polygon{
            sides: 3,
        },
    }
}
func createSquare() *square {
    return &square{
        polygon{
            sides: 4,
        },
    }
}
</div></pre>
<a name="main">
<h2>Main and test</h2>
</a>
Finally, we write a couple of test functions to create a list of shapes and
draw them, and a one-line <code>main</code> function that calls our test function.
<pre name="hlcode" class="go"
><div class="code">package main
import (
"fmt"
"math"
)
func createTestShapes() []shape {
shapes := make([]shape, 0)
shapes = append(shapes, createTriangle())
shapes = append(shapes, createSquare())
return shapes
}
func testDrawShapes() {
drawShapes(createTestShapes())
}
func main() { testDrawShapes() }
</div></pre>
When we run this program, it produces the expected output:
<pre
><div class="code">draw polygon with sides=3
draw from (1.000, 0.000) to (-0.500, 0.866)
draw from (-0.500, 0.866) to (-0.500, -0.866)
draw from (-0.500, -0.866) to (1.000, -0.000)
draw polygon with sides=4
draw from (1.000, 0.000) to (0.000, 1.000)
draw from (0.000, 1.000) to (-1.000, 0.000)
draw from (-1.000, 0.000) to (-0.000, -1.000)
draw from (-0.000, -1.000) to (1.000, -0.000)
</div></pre>
Note that we have not defined any methods on the
<code>triangle</code> and <code>square</code> types,
yet the compiler accepts them as implementing <code>shape</code>, as seen by the fact that
we can store them in a slice of <code>shape</code> and we can invoke <code>draw</code> on them.
Because we embedded <code>polygon</code> in <code>triangle</code> and <code>square</code>, without giving them field
names, Go has promoted all of the methods in <code>polygon</code> into the namespaces of
<code>triangle</code> and <code>square</code>, allowing <code>draw</code> to be called directly on an
instance of type <code>triangle</code> or <code>square</code>.
<br/><br/>
So far, relying on an object-oriented mental model has not caused us problems.
Let's keep going and see when it does.
<a name="overriding">
<h2>Overriding</h2>
</a>
We add a <code>typeName</code> method to our <code>shape</code> interface
and our "base class", <code>polygon</code>,
and we "override" that method in our "subclasses", <code>triangle</code> and <code>square</code>:
<pre name="hlcode" class="go"
><div class="code">type shape interface {
    draw()
    rotate(radians float64)
    // translate and scale omitted for simplicity
    typeName() string
}
func (p *polygon) typeName() string {
    return "polygon"
}
func (p *triangle) typeName() string {
    return "triangle"
}
func (p *square) typeName() string {
    return "square"
}
</div></pre>
We can test our <code>typeName</code> methods by pointing our <code>main</code> to
a different test function:
<pre name="hlcode" class="go"
><div class="code">func printShapeNames(shapes []shape) {
    for _, s := range shapes {
        fmt.Println(s.typeName())
    }
}
func testShapeNames() {
    printShapeNames(createTestShapes())
}
func main() { testShapeNames() }
</div></pre>
This outputs:
<pre
><div class="code">triangle
square
</div></pre>
No problems yet.
<a name="downcall">
<h2>Downcall</h2>
</a>
Let's add a method to our interface and "base class" that invokes the method that we
are overriding, and a new test function to call it.
This is sometimes referred to as a downcall, in that a superclass calls into the
overriding method of a subclass that is below it in the class hierarchy.
<pre name="hlcode" class="go"
><div class="code">type shape interface {
    draw()
    rotate(radians float64)
    // translate and scale omitted for simplicity
    typeName() string
    nameAndSides() string
}
func (p *polygon) nameAndSides() string {
    return fmt.Sprintf("%s (%d)", p.typeName(), p.sides)
}
func printShapeNamesAndSides(shapes []shape) {
    for _, s := range shapes {
        fmt.Println(s.nameAndSides())
    }
}
func testShapeNamesAndSides() {
    printShapeNamesAndSides(createTestShapes())
}
func main() { testShapeNamesAndSides() }
</div></pre>
This outputs:
<pre
><div class="code">polygon (3)
polygon (4)
</div></pre>
Well, that doesn't look right.
We wanted it to print triangle and square instead of polygon both times.
Thinking of this as inheritance has led us astray.
<a name="promotion">
<h2>Method promotion</h2>
</a>
So, what happened here?
Why did <code>printShapeNames</code> work, but <code>printShapeNamesAndSides</code> did not?
Let's dig into that.
<br/><br/>
The return value of <code>createTestShapes</code> is <code>[]shape</code>, which is a slice of objects that implement
the <code>shape</code> interface. Since the <code>triangle</code> and <code>square</code> types implement that interface, we can store
instances of those types
in that slice. But how is it that those types implement that interface when we didn't write those
methods for those types?
The answer is method promotion.
<br/><br/>
When we embed one type inside another without giving the internal type a field name,
Go automatically promotes all unambiguous names from the embedded type to the containing type.
Effectively, for each method in the embedded type whose name does not conflict with a method
in the containing type or in any other embedded type within that container,
Go creates a method on the containing type that turns around and calls that method on
the embedded type. For example, when we embed <code>polygon</code> in <code>triangle</code>,
the compiler effectively creates this code:
<pre name="hlcode" class="go"
><div class="code">func (t *triangle) typeName() string {
    return t.polygon.typeName()
}
</div></pre>
If the embedded type satisfies an interface, and there are no ambiguous
method names, this promotion of all the methods of the embedded type
makes the containing type also satisfy that interface.
Let's explore this method promotion behavior.
We create another struct type called <code>thing</code> that
has a <code>typeName</code> method,
embed it along with our previously defined <code>polygon</code>,
which also has a <code>typeName</code> method, in a new type <code>polygonThing</code>,
then try to assign an instance of that to a variable of type <code>shape</code>.
<pre name="hlcode" class="go"
><div class="code">type thing struct{}
func (t *thing) typeName() string { return "thing" }
type polygonThing struct {
    polygon
    thing
}
func testPolygonThing() {
    p := &polygonThing{}
    p.draw()
    fmt.Println(p.typeName())
    var s shape = p
    fmt.Println(s.typeName())
}
func main() { testPolygonThing() }
</div></pre>
When we compile this, we get these errors:
<pre
><div class="code">./comp.go:130:16: ambiguous selector p.typeName
./comp.go:131:7: polygonThing.typeName is ambiguous
./comp.go:131:7: cannot use p (type *polygonThing) as type shape in assignment:
*polygonThing does not implement shape (missing typeName method)
</div></pre>
where line 131 is the line where we are assigning to <code>s</code>.
<br/><br/>
From this error we can see that Go did not promote the <code>typeName</code> method from either
of the embedded structs into <code>polygonThing</code>. But there was no error message about the
call to <code>draw</code>, so it did promote that method from <code>polygon</code>, since it is
not ambiguous.
<br/><br/>
If we comment out the embedded <code>thing</code> line from the definition of <code>polygonThing</code>,
the code compiles.
If, instead, we comment out the embedded <code>polygon</code> line, we get different errors:
<pre
><div class="code">./comp.go:129:4: p.draw undefined (type *polygonThing has no field or method draw)
./comp.go:131:7: cannot use p (type *polygonThing) as type shape in assignment:
*polygonThing does not implement shape (missing draw method)
</div></pre>
If we want to keep both embedded structs in our composite struct,
there are a couple of ways we can resolve the ambiguity of <code>typeName</code>
appearing in both embedded structs.
The simplest is to assign a name to one
of the embedded structs, converting it to a regular field. Instead of writing
<code>thing</code> in the definition of <code>polygonThing</code>, we can write <code>t thing</code>.
Go then does not attempt to promote the methods from <code>thing</code> into <code>polygonThing</code>,
and the promotion of <code>typeName</code> from <code>polygon</code> into
<code>polygonThing</code> is no longer ambiguous, so it succeeds.
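As a minimal, self-contained sketch of that first fix (using simplified stand-in types rather than the full example code from this post):

```go
package main

import "fmt"

type polygon struct{ sides int }

func (p *polygon) typeName() string { return "polygon" }

type thing struct{}

func (t *thing) typeName() string { return "thing" }

// Giving thing a field name (t) stops Go from promoting its methods,
// so typeName is promoted from polygon without ambiguity.
type polygonThing struct {
	polygon
	t thing
}

func main() {
	p := &polygonThing{}
	fmt.Println(p.typeName())   // "polygon": promoted from the embedded polygon
	fmt.Println(p.t.typeName()) // "thing": reached explicitly via the named field
}
```

The tradeoff of this approach is that <code>thing</code>'s methods are no longer part of <code>polygonThing</code>'s method set at all; callers must go through the named field.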
<br/><br/>
Another possibility is to resolve the ambiguity by defining a <code>typeName</code> method
directly on <code>polygonThing</code>. In this case, Go does not attempt to promote <code>typeName</code>
from either of the embedded structs. We can call a method in an embedded struct
by referring to that embedded struct as if it were a named field.
<pre name="hlcode" class="go"
><div class="code">func (t *polygonThing) typeName() string {
    return t.polygon.typeName() + "Thing"
}
</div></pre>
With this definition, the program compiles and runs, outputting
<pre
><div class="code">draw polygon with sides=0
polygonThing
polygonThing
</div></pre>
<a name="solution">
<h2>Solution</h2>
</a>
Now that we understand how embedded structs work in Go, let's go back and reconsider what happened
with our <code>printShapeNamesAndSides</code> function.
<br/><br/>
Assume one of the elements in our slice of <code>shape</code> is an instance of <code>triangle</code>.
We call <code>nameAndSides</code> with that <code>triangle</code> as the receiver. Since we did not define <code>nameAndSides</code>
on <code>triangle</code>, that calls the promoted version of that method. That promoted method turns around and calls
<code>nameAndSides</code> on the embedded <code>polygon</code>, passing the embedded <code>polygon</code> as the receiver.
In <code>polygon.nameAndSides</code>, it calls <code>p.typeName</code>, but <code>p</code> here is the receiver of the
<code>nameAndSides</code> method, which is the <code>polygon</code>, not the <code>triangle</code>.
So the call from <code>nameAndSides</code> to <code>typeName</code>
invokes the <code>typeName</code> method on <code>polygon</code> rather than on <code>triangle</code>.
<br/><br/>
With this understanding, let's update our code to make "overriding" work.
The difference between the behavior we are seeing and what we would expect from a system
with inheritance and overriding
is that here our "base class" does not, by default, make calls to methods of the "subclass".
It can't because the method in the "base class" has no reference to the type of the containing object.
In order to implement a call to a method in an instance of a "subclass" from <code>polygon.nameAndSides</code>, we need a reference to
that instance, such as a <code>triangle</code>.
We will do this by explicitly passing our <code>shape</code> as an argument, then calling the <code>typeName</code> method on
that <code>shape</code> rather than on the receiver.
By calling a method on a passed-in argument rather than the receiver,
it is clear, when looking at that method in the "base class", that the call may
be going to a different type of object than <code>polygon</code>.
<pre name="hlcode" class="go"
><div class="code">type shape interface {
    ...
    nameAndSides(s shape) string
}
func (p *polygon) nameAndSides(s shape) string {
    return fmt.Sprintf("%s (%d)", s.typeName(), p.sides)
}
func printShapeNamesAndSides(shapes []shape) {
    for _, s := range shapes {
        fmt.Println(s.nameAndSides(s))
    }
}
</div></pre>
With these changes, we get the expected output:
<pre
><div class="code">triangle (3)
square (4)
</div></pre>
<a name="conclusion">
<h2>Conclusion</h2>
</a>
The way Go promotes methods of embedded structs gives it some of the characteristics of
inheritance as defined in object-oriented programming. In particular, it allows methods to
be automatically promoted to the containing struct, and thus allows the containing struct to
automatically satisfy any interface that the embedded struct satisfies. One key difference is that, when you "override" one of those
promoted methods in the containing struct, the code in the embedded struct does not automatically call the overriding
method in the containing struct, as happens in some object-oriented languages such as Java.
<br/><br/>
You may have heard of the <a href="https://en.wikipedia.org/wiki/Fragile_base_class">fragile base class</a> problem.
A related issue that can arise when there are downcalls from a superclass to an overridden method in a subclass,
similar to the example here where I "overrode" the <code>typeName</code> method,
might be termed the fragile subclass problem.
If you are interested in digging into that, you can read
<a href="http://www.cs.ucf.edu/~leavens/tech-reports/ISU/TR00-05/TR.pdf">Safely Creating Correct Subclasses without Seeing Superclass Code</a>,
a paper from OOPSLA 2000 that examines that issue. See section 4.
The designers of Go chose not to implement inheritance, but instead to
<a href="https://en.wikipedia.org/wiki/Composition_over_inheritance">favor composition</a>.
Although some Go constructs can look a little like inheritance, it's better to
start thinking about designing in Go using composition rather than trying to bend
Go to do something like inheritance.
Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com0
tag:blogger.com,1999:blog-7045524330253482541.post-61402247762103693592019-06-11T21:34:00.000-07:002019-06-11T21:34:59.312-07:00
A Future Telescope
This post describes an idea for a telescope
that can see where heavenly objects will be in the future.
This may sound crazy, like something out of a science-fiction story,
but I believe it is based on solid theory. Unless, of course, I have
misinterpreted something. Read on if you enjoy considering surprising
extrapolations of theory.
<h2>Contents</h2>
<ul>
<li><a href="#collective_electrodynamics">Collective Electrodynamics</a></li>
<li><a href="#interpreting">Interpreting the Theory</a></li>
<li><a href="#big_idea">The Big Idea</a></li>
<li><a href="#details">The Details</a></li>
<li><a href="#invitation">An Invitation</a></li>
</ul>
<a name="collective_electrodynamics"></a>
<h2>Collective Electrodynamics</h2>
Carver Mead's book
<a href="https://mitpress.mit.edu/books/collective-electrodynamics"
>Collective Electrodynamics</a>, first published in 2002,
puts forth a theory of electrodynamics based on
<a href="https://en.wikipedia.org/wiki/Four-vector">four-vectors</a>. As with many
other low-level aspects of physics, this theory is time-symmetric, making no
claims about how to distinguish between the past and the future.
<br/><br/>
I found Carver's theory and his exposition of it to be elegant and convincing.
Even if you don't agree with my interpretation and conclusions in this post,
I recommend you read this book if you are generally interested in physics.
<br/><br/>
Carver's description of the process of photon emission and absorption
includes a few comments noting that a photon will not be emitted
without a destination that will absorb
the photon at some point in the future, because the emitter and absorber
are a coupled pair forming a single resonator.
<ul>
<li>
In section 4.8: "Any energy leaving one resonator is transferred to some
other resonator, somewhere in the universe."
</li>
<li>
In section 4.12: "The spectral density of distant resonators
acting as absorbers is, of necessity, identical to that of the
resonators producing the local random field, because they are the
same resonators."
</li>
<li>
In the Epilogue: "It is by now a common experimental fact that an atom,
if sufficiently isolated from the rest of the universe, can stay in
an excited state for an arbitrarily long period. ... The mechanism for
initiating an atomic transition is not present in the isolated atom;
it is the direct result of coupling with the rest of the universe."
</li>
</ul>
Part 5 describes how two atoms couple electromagnetically as resonators.
<a name="interpreting"></a>
<h2>Interpreting the Theory</h2>
As a thought experiment, suppose we were in a part of the universe
with no matter at all in one direction. We would not be able to shine
a flashlight in that direction: with nothing to absorb the photons,
they would not be emitted. If we were able to measure all of the other energy going into
or out of the flashlight, we would be able to notice that energy leaves
the flashlight when we point it towards other things, but not when we
point it towards truly empty space.
<br/><br/>
Coming back to our current location in the universe, there is a finite
amount of matter between us and the
<a href="https://en.wikipedia.org/wiki/Hubble_volume">Hubble sphere</a>.
Consider a line
segment from our location to a point on the Hubble sphere. If there are
no atoms on the intersection of said line segment and our future light
cone, then it should not be possible to emit a photon in that direction.
More restrictively, if there are no atoms in that intersection that are
capable of absorbing a photon of the frequency our source atom is
attempting to emit, then we will not be able to emit that photon in
that direction.
<a name="big_idea"></a>
<h2>The Big Idea</h2>
Assume, then, that we have a highly directional monochromatic light
source that we can point accurately, and that we can accurately know
how much light we are emitting based on energy input measurements. What
would happen if we were to provide that light with a suitable
input power signal, then scan the sky? If there are any differences
in the density of atoms in different directions that are capable of
absorbing photons of the frequency we are sending, would we be able to
produce a map of the sky showing those differences? Would there be any
anisotropy, as there is for the cosmic microwave background radiation?
<br/><br/>
Given how much matter there is in the universe, I suspect it would be
hard to find one of those line segments out to the Hubble sphere without
a single atom capable of absorbing one of our photons, but perhaps if we
are trying to send out a great many photons, there will be enough of a
statistical variation to measure.
<br/><br/>
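To put a rough number on that, here is a back-of-the-envelope sketch (in Python; the dip size, laser power, and wavelength are made-up illustration values, not anything from the theory) of how many photons a shot-noise-limited measurement would need in order to detect a tiny deficit in emitted power:

```python
import math

def photons_needed(fractional_dip, n_sigma=5.0):
    """Photon count needed to detect a fractional dip in emission rate
    at n_sigma significance, assuming only Poisson (shot) noise.

    With N expected photons the fluctuation is sqrt(N), so a dip of
    f*N photons stands out once f*N > n_sigma*sqrt(N), i.e.
    N > (n_sigma / f)**2.
    """
    return (n_sigma / fractional_dip) ** 2

# A one-part-per-billion deficit at 5 sigma:
N = photons_needed(1e-9)                     # 2.5e19 photons

# A 1 W laser at 532 nm emits roughly 2.7e18 photons per second:
photon_energy = 6.626e-34 * 3.0e8 / 532e-9   # E = h*c/lambda (joules)
rate = 1.0 / photon_energy                   # photons/s at 1 W
seconds = N / rate                           # ~10 s of integration
```

So, at least in this idealized picture, shot noise alone would not rule the measurement out; systematic effects in the source seem like the harder problem.
<br/><br/>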
The thing that I find fascinating about this is that, if it did in fact
work, we would be "seeing the future", because whatever map we produced
would be a function of where the absorbing atoms are going to be when
the light we emit reaches them.
For planets in our solar system that would be minutes or hours in the future,
but for distant nebulae that could be millions or billions of years from now.
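<br/><br/>
To put scales on "minutes or hours" versus "millions of years", a quick sketch of one-way light travel times (distances approximate):

```python
# One-way light travel times to a few destinations, i.e. how far into
# the future the map would be looking in each direction.
C = 299_792_458                      # speed of light, m/s
AU = 1.495978707e11                  # astronomical unit, m
LIGHT_YEAR = C * 365.25 * 86400      # m

mars_minutes = 0.52 * AU / C / 60    # Mars at closest approach: ~4 minutes
neptune_hours = 30 * AU / C / 3600   # Neptune: ~4 hours
andromeda_years = 2.5e6 * LIGHT_YEAR / C / (365.25 * 86400)  # ~2.5 million years
```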
<a name="details"></a>
<h2>The Details</h2>
The devil is in the details. Even if, in principle, the theory supports this
conclusion, would it be possible to build such a device?
<br/><br/>
In addition to the statements of theory, I make two assumptions above:
<ol>
<li>
We can accurately point our light source, such that we can perform a raster
scan on a portion of the sky.
</li>
<li>
We can determine how much light energy is leaving our light source by measuring
the input energy to that source.
</li>
</ol>
The first assumption seems straightforward: the optics involved in sending out a
beam of light to a small portion of the sky should be the same as receiving light
from a small portion of sky, which we do on a regular basis to form images of space.
But I am not an astronomer, so I may be missing something. For example, I know that
some modern telescopes use a
<a href="https://en.wikipedia.org/wiki/Laser_guide_star">guide laser</a>
shining up through the atmosphere to allow for
dynamic adjustments to the mirrors to compensate for atmospheric distortion.
Would this also work when sending out a signal beam alongside the reference beam?
I don't know why not, but, as mentioned, this is not my area of expertise.
<br/><br/>
I think the second assumption may require more effort to solve. The typical advice
for powering a laser is to use a
<a href="https://www.teamwavelength.com/laser-diode-driver-basics/">current source</a>
in order to get a stable output.
For my experiment, however, I specifically don't want a stable source. Instead, I
want a source that can output more or less light based on how much the
space into which it is shining can accept.
<br/><br/>
Since I can't directly measure the light output, I also need a light source where
I can accurately judge how much light is being output by measuring the input power.
This means I need to know the power transfer characteristics of the light source.
How much of the input power is transformed into light, and how much into heat or
other forms of energy? Is that relationship constant over time, or might it vary
such that at one point in time I get x% of the input turning into heat, and moments
later I get 2x% turning into heat? Alas, I am not a solid-state physicist (assuming
my light source is a solid-state laser), so I don't know the answers to these questions.
<a name="invitation"></a>
<h2>An Invitation</h2>
So, what do you think?
Is there a fatal flaw to my understanding of the theory?
A fundamental reason why it would not be possible to build such a
"future telescope"?
A technical limitation making it not currently possible?
<br/><br/>
I have talked to a few people about this idea, and the ones who I know have
a good understanding of Carver's theory have said that, in principle, they
don't see anything wrong with my reasoning.
<br/><br/>
As I mentioned above, I'm not an astronomer or solid-state physicist,
so I don't have the background
to take this concept to the practical stage.
But perhaps someone else does.
<br/><br/>
This seems like it would be a very exciting thing if it worked,
but I think it would require a significant investment of time
and access to some expensive equipment to take the next step.
Would anyone like to give it a try?
If you do, I'd love to hear about it.
<h2>Wormhole Musings</h2>
<i>2018-11-27</i>
<br/><br/>
I have questions about how wormhole portals in science fiction stories work.
<br/><br/>
Recently I started reading another science fiction novel where wormholes allow
instantaneous travel between distant points. In books that use this mechanism, the author typically explores
how the ability to travel easily and quickly between the stars shapes the course of history.
<br/><br/>
But I always get hung up thinking about all the other ways in which a portal might possibly be
used, for good or evil, in ways much less grand but potentially more disruptive
than distant travel. Of course, since the use of wormholes in these books does not rely on
our currently generally accepted science, these questions do not have well-defined answers.
That's why I muse.
<br/><br/>
In this article, I ask some questions about how some of our currently accepted principles of physics
apply (or don't apply) to wormholes, and ponder the ways in which one might use (or misuse) a
wormhole based on the answer to those questions.
<br/><br/>
<b>Caveat lector:</b>
If you want to keep reading wormhole stories without being distracted by questions like these,
you might want to stop reading now.
Because once you read these questions, you won't be able to unread them.
<h2>My Questions</h2>
<ul>
<li><a href="#expense">How big and expensive is the equipment required to create and maintain a wormhole?</a>
<li><a href="#energy">How much energy is required to create and maintain a wormhole?</a>
<li><a href="#shape">What shape is a wormhole portal?</a>
<li><a href="#size">Can I make a wormhole as large or as small as I want?</a>
<li><a href="#control">How do you control the location of the wormhole portals?</a>
<li><a href="#energy_conserved">Is energy conserved when traversing a wormhole?</a>
<li><a href="#momentum_conserved">Is momentum conserved when traversing a wormhole?</a>
<li><a href="#forces">How do physical forces propagate through a wormhole?</a>
<li><a href="#geometry">What is the geometry of the wormhole connection?</a>
<li><a href="#time">In what reference frame is traversal of the wormhole instantaneous?</a>
</ul>
<ul>
<li><a href="#answers">Potential Answers</a>
</ul>
<a name="expense"></a>
<h2>How big and expensive is the equipment required to create and maintain a wormhole?</h2>
Mainly what I want to know for this question is whether the equipment is small and inexpensive enough
that an individual can own one. If they are within the reach of many people, that makes
it much more likely that there will be some people who will use it for unexpected purposes.
<br/><br/>
I once read a story in which someone had invented a personal flying belt that anyone could get
for five dollars. With such easy personal mobility, border control suddenly became much more difficult,
which of course led to some interesting problems. If anyone could buy and control a wormhole for
five dollars, that would be a very different situation than if there were only a few wormholes
controlled by a few rich and powerful entities.
<a name="energy"></a>
<h2>How much energy is required to create and maintain a wormhole?</h2>
Although science fiction wormholes don't rely on any currently known physics, my feeling is that
any scientifically plausible mechanism for a wormhole would require a prohibitive amount of energy
to use. And I mean the word prohibitive literally: the amount of energy required would be so high,
it would effectively prohibit the possibility of using a wormhole.
<br/><br/>
Since that doesn't make for good science fiction stories, we have to assume that the energy
requirement is modest enough that we are able to produce and use wormholes. The question then
becomes, how much energy is required? This question is related to the earlier question about
cost, in that if a wormhole requires a relatively large amount of energy to operate, that could
restrict its operation to a small number of controlling entities. Whereas if I can run it with
a D-cell battery, there would be many more interesting things I could do with it.
<br/><br/>
It may not matter how much energy it requires to operate a wormhole, because, as discussed in some
of my comments below, it seems likely that once you have a wormhole you could get as much
free energy as you want.
<a name="shape"></a>
<h2>What shape is a wormhole portal?</h2>
In most stories, wormhole portals are portrayed as circular areas that you step through, much like
the entrance to a common tunnel. This is very convenient for imagining things like train lines
that run through wormholes, and for thinking about the equipment that might be required to
hold open a wormhole portal. That equipment is sometimes described as a torus with massive
structures around it.
<br/><br/>
I think it is more likely that a wormhole would be spherical. You could enter it from any direction,
and you would exit in a direction based on the direction you entered. This is a bit harder to visualize,
which may be one reason it is not often described this way.
<br/><br/>
If a wormhole portal is a sphere, how does that impact the equipment required to maintain it?
It would be tough to have equipment symmetrically on all sides and still have something that
allows easy access. But maybe it doesn't all have to be completely symmetrical, so you can leave a few
holes to let the trains get through the equipment so they can enter the portal.
<a name="size"></a>
<h2>Can I make a wormhole as large or as small as I want?</h2>
In most stories, wormholes are of a size that makes them convenient to step through,
or drive a car or train through. Is this an essential feature of wormholes, or just
the size that happens to be most convenient? Could we make them any size if
we wanted to? Perhaps big wormholes would be harder, but I would think smaller wormholes
would actually be easier to make.
And I can think of lots of interesting uses for small wormholes, depending on the answers
to the other questions.
<br/><br/>
One example of a good use for a tiny wormhole would be to shine a laser through it and have
a high capacity communication channel.
<a name="control"></a>
<h2>How do you control the location of the wormhole portals?</h2>
Some stories postulate that maintaining a wormhole portal requires physical
equipment at both ends. In this case, the question of how to control the
location of the portal is clear: you have to move the equipment to move
the portal.
<br/><br/>
In other stories, the two ends of the wormhole are created at
one location, after which one end can be moved to another location.
In considering the geometry of wormholes, I would guess that
it is possible to move one end of a wormhole through another
wormhole, but perhaps only if the wormhole being transported is
sufficiently smaller than the one it is being moved through.
<br/><br/>
If equipment is required at both ends of the wormhole,
establishing a wormhole from A to B requires
first traveling from A to B through normal space to deliver the necessary equipment,
or possibly from C to B if the two ends of the wormhole don't need to be created in one place.
This constrains the expansion of an interstellar civilization to the speed of light,
which is annoyingly slow to some authors.
<br/><br/>
The more interesting case, as postulated in some stories, is that you can project the
other end of the wormhole to a desired location without first having to get there some other way.
This is, of course, a much-preferred mechanism if you want to quickly expand your network
of gates, since who wants to wait many years while the slowship takes your gate to the
next star?
But what could we do if we could project the other end of our portal to anywhere
we wanted in space?
<br/><br/>
If I can project tiny wormholes, I could perform incision-free surgery.
Mining would be much cheaper, as I could just project a wormhole down to where the ore is
without having to tunnel or strip-mine down to it.
I could make a great vacuum pump by putting one end out in space.
<br/><br/>
At a more banal level, I could eat as much as I want and not gain weight. I just need to
project a tiny wormhole into my stomach and remove the food I just ate before my body
digests it. I get all the pleasure of eating without suffering the problems of obesity.
<br/><br/>
I read one story in which a little wormhole was located on the bottom of a drinking glass, with
the other end at the bottom of a vat of beer, wine, or whatever drink was selected. Each time
the glass was set down, the wormhole would open to fill the glass, then close once
the glass was full.
<br/><br/>
If I put on my black hat, the most obvious nefarious deed is, I project the other end of my wormhole
into a bank vault and walk off with the cash. Or into a collection of classified documents
and walk off with the secret plans. Or into my enemy's bedroom and kidnap him or
kill him. I really only need to project a tiny wormhole, big enough
for a bullet, to do a dastardly deed. Or so small it's only big enough for a packet of
viruses that I inject into his bloodstream without him even knowing it.
<br/><br/>
If we can project one end of our wormhole to any desired location in space, perhaps
we could project both ends. This would allow us to establish a wormhole between any
two points anywhere in space, without having to have equipment at either end.
This could actually be an interesting premise for a story, as it would allow for the
case where there is a single wormhole-generating facility that creates all of the
wormholes used throughout the civilization. That facility would presumably be
controlled by some now-very-powerful entity, and would be both
heavily secured and heavily attacked, so there are lots of opportunities for story lines.
<br/><br/>
The ability to create a wormhole between any two other points in space also opens up
lots of additional opportunities for mischief. One could create a pretty effective
weapon of mass destruction by creating a wormhole with one end in the middle of the
sun and the other end where you want the destruction. Or put one end
in the middle of a magma reservoir, or deep in the ocean, depending on the type of
destruction desired. Or put one end in space to suck everything into the vacuum.
<br/><br/>
On the positive side, one could create a really nice package delivery system.
Open a wormhole between the package source and destination, drop the package in
for instant delivery, and close the wormhole.
<br/><br/>
Assuming we have the ability to create a wormhole portal anywhere in space, there is
still the question of how we figure out where it gets created. Do we have to use
trial and error to place the wormhole in just the right place? If we are trying to
create a wormhole portal in a distant location, do we have to worry about the precision
of our equipment, in the same way that launching a spaceship to land on Mars requires
more precise equipment than launching one to land on the moon? Can we create the
remote wormhole portal and then move it around at will, and if so, can we move it
faster than the speed of light?
<a name="energy_conserved"></a>
<h2>Is energy conserved when traversing a wormhole?</h2>
In most wormhole stories, one can step through a wormhole to get from
one end to the other with no more effort than walking across the room.
There is no explicit discussion of conservation of energy,
and my assumption is that the authors don't worry about it because that detail
doesn't advance the story.
But I worry about it.
<br/><br/>
If I open a wormhole between Earth and its moon, there is a pretty big difference in
the gravitational potential energy between those two points. When I want to put something in
the wormhole portal on Earth and have it come out on the moon, do I need to supply the difference in
energy between those two points? That would mean supplying a whole lot of energy to move in that
direction. Conversely, if I step through the wormhole from the moon back to the Earth, what happens
to all that gravitational potential energy?
<br/><br/>
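A rough estimate of what is at stake, sketched in Python using standard Newtonian potentials (the numbers are textbook approximations):

```python
G = 6.674e-11                         # gravitational constant, m^3 kg^-1 s^-2
M_EARTH, R_EARTH = 5.972e24, 6.371e6  # kg, m
M_MOON,  R_MOON  = 7.342e22, 1.737e6  # kg, m
D_EARTH_MOON = 3.844e8                # mean distance, m

def potential(m, r):
    """Newtonian gravitational potential (J/kg) at distance r from mass m."""
    return -G * m / r

# Potential at Earth's surface (the Moon's contribution is negligible here):
phi_earth = potential(M_EARTH, R_EARTH)
# Potential at the Moon's surface (Moon's field plus Earth's at that distance):
phi_moon = potential(M_MOON, R_MOON) + potential(M_EARTH, D_EARTH_MOON)

delta = phi_moon - phi_earth          # ~6e7 J/kg to climb Earth -> Moon
```

That is roughly 60 megajoules per kilogram, so an 80 kg traveler stepping through "for free" represents about 5 gigajoules of unaccounted-for energy. That is why the question matters.
<br/><br/>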
If I can move from one end of a wormhole to the other end without having to supply that extra
energy, then I can get free energy. Here's one way: go find a big dam with a hydro generating
plant and install a wormhole with the entrance portal under the water at the bottom of the dam,
just past the outflow of the generator, and with the exit portal just above the surface of the lake
at the top of the dam. Since the entrance portal is underwater and the exit is above, water flows
into the entrance portal and comes out at the exit portal. Thus the lake is ever refilled and our
hydroelectric generators can keep running.
<br/><br/>
Maybe the wormhole technology works like a battery with regenerative braking on electric cars: it supplies the
energy needed when traveling in one direction, and absorbs the excess energy when traveling in
the other direction.
<a name="momentum_conserved"></a>
<h2>Is momentum conserved when traversing a wormhole?</h2>
If I am in New York City, the Earth's rotation is moving me at about 700 miles per hour relative
to the center of the Earth. At the same time, Sydney is also moving at about 700 miles per hour,
but in roughly the opposite direction, as it is almost on the opposite side of the Earth.
If I open a wormhole between New York City and Sydney, and I step through, what happens to that
1400 miles per hour difference? Do I splat into the nearest wall at supersonic speed, or do I
casually step through and continue walking to my destination?
<br/><br/>
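The numbers check out roughly. Here is a sketch (the post's 700 miles per hour is a round figure, and since the two cities are not exactly antipodal the two velocities are only approximately opposite):

```python
import math

R_EARTH = 6.371e6                      # m
SIDEREAL_DAY = 86164                   # s
OMEGA = 2 * math.pi / SIDEREAL_DAY     # Earth's rotation rate, rad/s

def rotation_speed_mph(latitude_deg):
    """Eastward surface speed due to Earth's rotation at a given latitude."""
    return OMEGA * R_EARTH * math.cos(math.radians(latitude_deg)) * 2.23694

nyc = rotation_speed_mph(40.7)         # New York City: ~790 mph
sydney = rotation_speed_mph(-33.9)     # Sydney: ~860 mph

# Kinetic energy of the velocity mismatch, per kilogram of rock:
v_rel = 1400 / 2.23694                 # 1400 mph in m/s
ke_per_kg = 0.5 * v_rel**2             # ~0.2 MJ per kilogram per pass
```
<br/><br/>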
If momentum is conserved, then I would be moving at a high speed relative to the exit point of
the wormhole. If I put the appropriate mechanical devices next to the wormhole exit, I could send
through a rock, catch it moving at 1400 miles per hour, and convert that kinetic energy to
electricity. Then I could toss the rock back and do the same thing on the other side. Free energy.
<br/><br/>
The question of conservation of momentum is subtler than it first appears. If I want to conserve
momentum, I come out of the wormhole in Sydney with that supersonic velocity relative to the city.
But what does that mean for the angular momentum of the system? If I just moved that mass over to
a new location and nothing else changed, then I have changed the angular momentum of the system.
If the whole earth moves a tiny bit in the other direction, to keep the same center of mass, that
could take care of that issue, but why should the whole Earth move when I use a wormhole?
Would that happen if I were in an airplane? In a spaceship in low orbit? In a spaceship in high
orbit? In a spaceship at the orbit of the moon, or beyond?
<br/><br/>
As with conservation of energy, perhaps the wormhole portals absorb or supply momentum as needed,
transferring it to the surrounding masses. This could mean that wormhole portals would most effectively
be placed on large masses such that they had a reservoir of momentum to transfer to or from.
The larger the masses that were transferred through a wormhole, and the larger the relative velocity
of the portals, the more momentum would have to be transferred, and the larger the attached
mass would have to be.
<a name="forces"></a>
<h2>How do physical forces propagate through a wormhole?</h2>
In every wormhole story I have read, light traverses a wormhole with no problems.
I assume that means all forms of electromagnetic radiation traverse a wormhole equally easily.
This presents another opportunity for a good energy source: put a wormhole portal in close orbit around
the sun, then put the other wormhole portal
on Earth. Stream that high-intensity light through and use it to drive solar cells for direct
production of electricity, or as a heat source for standard steam turbines.
If no equipment is required at the solar end of the wormhole, you're all set.
If equipment is required, you might have to build some kind of refrigerator
that brings that heat back to Earth and keeps the equipment cool.
<br/><br/>
How about gravity? How does that propagate through a wormhole?
Most wormhole stories I have read describe travelers stepping through a wormhole
and experiencing a discontinuity in the gravity field, meaning gravity is not
propagating through the wormhole. This seems odd to me. Why would light
propagate through a wormhole but not gravity?
<br/><br/>
The intensity of light from a point source drops off proportionally to the distance squared, which
makes sense because the light is spreading out at that rate, and a fixed-size
object intercepting the light will thus get less of it when it is further away.
Because of this behavior, it makes sense to me that the amount of light that would
come through a wormhole would be proportional to its size. If the wormhole is very small,
only a small amount of light would come through.
<br/><br/>
Gravity also drops off proportionally to the distance squared, but not quite for the
same reason. Given a particular mass, the gravitational force on that mass is independent of whether
it is small and dense, or larger and less dense. The amount of area covered by the mass
is not important, only its mass and its distance from another mass.
If there is a tiny wormhole and I can measure a distance through that wormhole from my object to
a large mass, wouldn't that mean the gravitational force is inversely proportional to the square of that distance?
<br/><br/>
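The contrast can be made concrete: the light admitted by a wormhole mouth scales with its aperture area, while the Newtonian force on a test mass depends only on mass and distance. A sketch with illustrative numbers:

```python
import math

G = 6.674e-11

def light_fraction(aperture_radius, distance):
    """Fraction of an isotropic source's light entering a circular
    aperture of the given radius at the given distance (small-angle)."""
    return (math.pi * aperture_radius**2) / (4 * math.pi * distance**2)

def newton_accel(mass, distance):
    """Newtonian gravitational acceleration at the given distance;
    note that no aperture size appears anywhere in the formula."""
    return G * mass / distance**2

# A 1 cm wormhole mouth at one Earth radius from Earth's mass:
f = light_fraction(0.01, 6.371e6)      # ~6e-19 of the light gets through
g = newton_accel(5.972e24, 6.371e6)    # ~9.8 m/s^2, regardless of aperture
```

A naive "distance measured through the wormhole" reading would deliver the full 9.8 m/s^2 through a centimeter-sized hole, which is the oddity being pointed at here.
<br/><br/>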
If gravity does propagate through a wormhole, perhaps I could make a null-gravity region
by creating a pair of wormhole portals, then putting each one slightly above the surface of the Earth and
upside down from each other. If you were
to stand under one portal and look up, you would see the Earth above you. You
have one Earth gravity below you and one above, so they cancel out and you have no gravity.
A nice tourist attraction.
Then again, the two Earths would also be exerting a gravitational pull on each other,
so whatever is holding up each wormhole portal might be carrying the weight of the world.
<br/><br/>
On the other hand, given that General Relativity says
that mass causes curvature of space, and thus gravity, and wormholes are usually described as some
way of warping space, that seems to imply that being able to control wormholes means being able
to control the curvature of space and thus being able to control gravity. So perhaps based on
that we can choose how we want gravity to propagate through wormholes for our stories.
<br/><br/>
If you can turn wormholes on and off at will, you might be able to use this effect to
get some free energy.
You turn on a wormhole, have it pull up a weight, then turn it off, let the weight fall,
and use that to generate energy.
<a name="geometry"></a>
<h2>What is the geometry of the wormhole connection?</h2>
A wormhole is usually described as a connection that goes through a higher dimension than
the three dimensions in which we live. Those higher dimensions may present degrees of freedom
that can lead to some curious and unpleasant results. Let me try to explain with a flatland analogy.
<br/><br/>
If I live in a two dimensional space, I can create a wormhole by folding that sheet of space
until two points meet, then punching out a circle around those two points, and sewing those
two circles together. This is topologically equivalent to attaching a hose that stretches up from
a circle around one of those points and comes down at a circle around the other,
with the assumption that the hose represents no distance (or a
very short distance). A 2D creature could move from regular space onto the surface of that
hose (assuming the hose diameter is much larger than the creature),
then to regular space on the other end, then return to its original location via regular
space, and all is well.
<br/><br/>
Now consider what happens if I take that same hose, but instead of going up from the first point
and down at the second, I go up from the first point, then go around to the under side of the plane
(which I can do without going through the plane if I have yet another dimension) and come up from
the bottom side of the plane to meet the second point. Consider again what happens to that 2D creature
who travels into the wormhole, out the other end, and returns to its starting point in normal 2D space.
The result is that it comes back inverted. What was left is now right, and vice-versa.
<br/><br/>
I once read an old science fiction story in which there was a place deep within
the Amazon where, if you navigated a certain course, it would reverse everything left to right.
An enterprising businessman heard this and figured he could more efficiently make shoes
by manufacturing only left shoes,
then shipping half of them around this circuit, so he went exploring to find it. After going around
the course, he looked at his sample left shoes, but they were all still left. Frustrated, he threw
them all away, destroyed the worthless maps, and returned to civilization - only to discover that in
fact the trick had worked, but he had not recognized it because he, too, had been reversed. But he
could never find the place again.
<br/><br/>
Getting your body flipped left to right would probably be fatal. Almost all of our body chemistry
is chiral, so you would not be able to extract any nutrition from most foods, and you would
starve to death or die of malnutrition.
<br/><br/>
If there is an extra dimension in which a wormhole exists, why not two extra dimensions?
If there are two or more extra dimensions, you now have the issue described above, and you
will need to make sure you get the two ends of your wormhole attached with the right geometry,
or things that move through the wormhole might not come out quite as expected.
<br/><br/>
Of course, a black-hat could surely come up with evil things that could be done with that kind
of wormhole.
<br/><br/>
When considering wormhole geometry, another potential problem is the curvature of space in the
wormhole. In General Relativity, spacetime curvature manifests as tidal acceleration: nearby points are accelerated differently.
Too much curvature can lead to disastrous gravitational tidal effects that can tear things apart.
Small wormholes would be most likely to have this problem.
Larger wormholes, like <a href="https://en.wikipedia.org/wiki/South_Pass_(Wyoming)">South Pass</a>
through the Rockies, would allow that curvature to be
spread out enough to be hardly noticeable.
<a name="time"></a>
<h2>In what reference frame is traversal of the wormhole instantaneous?</h2>
This is the issue which to me is the killer.
<br/><br/>
Einstein's Theory of
<a href="https://www.google.com/search?q=special+relativity">Special Relativity</a>
is quite well supported by experimental evidence.
According to that theory, there is no such thing as universal simultaneity,
so we have to ask what instantaneous travel means.
<br/><br/>
You may have heard that, according to Special Relativity, if observer A with clock A in spaceship A
is moving near the speed of light relative to observer B, clock A will run more slowly than observer B's clock B,
according to observer B, due to <a href="https://en.wikipedia.org/wiki/Time_dilation">time dilation</a>.
But at the same time, according to observer A, observer B with clock B is moving near the
speed of light relative to A, so observer A sees clock B as moving more slowly. This effect is
the core of the
<a href="https://en.wikipedia.org/wiki/Twin_paradox">twin paradox</a>, where one twin gets
on a spaceship from Earth, flies away at near light speed, and returns, while the other stays
on Earth.
<br/><br/>
The twin paradox is resolved by noting that there is an asymmetry between the twins: one stays
at rest on Earth, whereas the other accelerates three times during the trip (takeoff, turnaround,
and landing). This difference is the key to understanding the paradox and determining that the
twin on the spaceship ages more slowly than the one left on earth.
<br/><br/>
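The size of the effect follows directly from the standard Lorentz factor; a quick sketch:

```python
import math

C = 299_792_458.0      # speed of light, m/s

def gamma(v):
    """Lorentz factor for speed v in m/s."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

# A twin cruising at 0.9c while 10 years pass on Earth ages only:
earth_years = 10.0
traveler_years = earth_years / gamma(0.9 * C)   # ~4.4 years
```
<br/><br/>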
In 1971 a couple of scientists
<a href="https://en.wikipedia.org/wiki/Hafele%E2%80%93Keating_experiment">ran an experiment</a>
where they flew atomic clocks on commercial flights
around the world and confirmed that their elapsed times differed from those of the
stationary atomic clocks left behind, just as predicted by Special Relativity
(combined with General Relativity, which adds time dilation due to gravitational differences).
<br/><br/>
For instantaneous travel between wormholes, it seems like we can set up a symmetric situation
so that we can't resolve our paradox the same way as for the twin paradox.
Consider the situation where we have a wormhole between two spaceships (or planets, if you prefer) A and B
that are moving at near
the speed of light relative to each other. As noted above, the observer in each location observes
the clock moving more slowly at the other location.
If person C with clock C steps from spaceship A to B through the wormhole, spends a bit of time on spaceship B, then comes back to
spaceship A, observer A will calculate that clock C will be behind clock A, having moved more slowly than clock A
while it was on spaceship B.
If person D with clock D steps from spaceship B to A through the wormhole, spends a bit of time
on spaceship A, then goes back to spaceship B,
observer A will calculate that clock D will be ahead of clock B, having moved more quickly than clock B
while it was on spaceship A.
But in this symmetric situation, observer B will calculate that
clock C will be ahead of clock A, and clock D will be behind clock B, the opposite of what
observer A calculates.
So which is it?
<br/><br/>
The problem here is the statement that travel between wormholes is instantaneous.
According to Special Relativity,
two events that occur at the same time but different locations in one reference frame
will occur at different times in a reference frame that is moving with respect to the first.
For our example, this means that if observer A sees person C moving instantaneously through
the wormhole from A to B, observer B does not see person C moving instantaneously through
the wormhole except when A and B are right next to each other. And since A and B are
moving with respect to each other, they will not be right next to each other for at least one
leg of the wormhole round trip. When A and B are not right next to each other, what appears as simultaneous
in one reference frame is not simultaneous in the other reference frame.
<br/><br/>
The only way I know of that is consistent with Special Relativity that would allow wormhole
travel to be instantaneous according to both ends of the wormhole would be to constrain wormholes
to be stationary relative to each other. But this would be a pretty strong constraint for stories,
since essentially everything in the universe is moving relative to each other,
and even the rotation of a planet is enough velocity variation to cause measurable time issues
across the kind of distances wormholes sometimes connect.
<br/><br/>
But wait, it gets crazier.
By the laws of Special Relativity, if you have <i>any</i> mechanism that lets you
move between two points faster than the speed of light, in any arbitrary frame of reference,
you can use that mechanism to travel backwards in time.
The <a href="https://en.wikipedia.org/wiki/Tachyonic_antitelephone">tachyonic antitelephone</a>
is an example of how being able to send a message faster than
light allows sending it backwards in time, and the same principle applies to
sending an object rather than a message.
<br/><br/>
One way to explain this is based on the assertion of Special Relativity that two events that
are not at the same location in space that occur simultaneously in a frame of reference A will not be
simultaneous in a frame of reference B that is moving with respect to A. In frame B, one of
those two events will happen before the other. Let's assume that we have a wormhole with a
pair of distant portals that are stationary in frame A, and another wormhole with portals
stationary in frame B, moving with respect to frame A in the direction from one of the A portals to the other.
We arrange the portals such that wormhole portal B2 is immediately adjacent to wormhole portal A2
at the starting time of our experiment
according to observer A located at A1,
and we arrange that B1 and B2 are adjacent to A1 and A2, respectively,
at the same time in frame B.
At the starting time in A, we step from portal A1 to A2.
Since we arranged for B2 to be adjacent to A2 at this time, we can immediately move over
to B2 and step through to B1, which we assume is instantaneous in frame B.
Because we have arranged that B1 is adjacent to A1 at the same moment as B2 is adjacent to A2
in frame B, when we exit B1 we can then hop back over to A1 and complete our circuit in space.
Since our trip through the wormhole B is instantaneous
in frame B, it will not be instantaneous in frame A.
For the traveler, all four legs of the trip
are nearly instantaneous, but for an observer who remains in A only three legs are,
with the leg through wormhole B not being instantaneous.
Depending on which direction a traveler takes around this loop, they will return to A1
either well after or well <i>before</i> the time they left.
<br/><br/>
The amount of time is proportional to the distance traveled through the wormholes
and is related to the velocity of one frame with respect to the other.
If frame B is traveling near the speed of light relative to A, the amount of time will be close to
the light-distance between the two ends of the portal, so even if you are "just" traveling to
<a href="https://en.wikipedia.org/wiki/Proxima_Centauri_b">Proxima Centauri b</a> in the Alpha Centauri system,
the closest star system to Earth, about four light years away,
you could travel up to four years into the future or the past.
The effect is less pronounced, but still present, at lower speeds.
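The size of this offset comes from the relativity-of-simultaneity term of the Lorentz transformation, Δt = vL/c², for two locations a distance L apart. A quick sketch, with illustrative numbers chosen to match the Proxima Centauri example above:

```go
package main

import "fmt"

// simultaneityOffsetYears returns Delta-t = v*L/c^2 in years for two
// locations distanceLy light years apart, with relative speed beta given
// as a fraction of c. In units of years and light years, c = 1, so the
// formula reduces to beta * distance.
func simultaneityOffsetYears(beta, distanceLy float64) float64 {
	return beta * distanceLy
}

func main() {
	// Portals about four light years apart (roughly the distance to
	// Proxima Centauri), with the frames at 0.99c relative to each other.
	fmt.Printf("offset: %.2f years\n", simultaneityOffsetYears(0.99, 4.0)) // 3.96 years
}
```

As the relative speed approaches c, the offset approaches the full light-travel distance of four years; at lower speeds it shrinks proportionally but never vanishes.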
<br/><br/>
Note that Special Relativity itself doesn't preclude faster-than-light messages or travel;
it just says that being able to do so allows sending a message or traveling backwards in time,
as demonstrated above.
Our current theories do not say this is not possible, but most
people believe in causality and thus find time travel problematic.
<br/><br/>
If you want to get a better intuitive feel for some of the weird things that happen when you start moving
at near the speed of light, check out the free video game
<a href="http://gamelab.mit.edu/games/a-slower-speed-of-light/">A Slower Speed of Light</a> from MIT.
<a name="answers"></a>
<h2>Potential Answers</h2>
Given that typical science fiction wormholes are based on new science beyond our current theories,
we have a lot of leeway in deciding how that science works so as to create the conditions that
best advance our story.
We could say that managing wormholes requires an amount of money and energy that are only available to large
organizations,
or we could say that, once the science is known, wormholes are easy and cheap and anybody can make
them, and see what kind of havoc is wreaked.
We could say that small wormholes are easy to make, or that larger wormholes are easier.
We could choose the geometry of the wormhole and portals to be troublesome or trivial.
We could say that wormhole portals require equipment to maintain, or that we can cast them
anywhere with ease.
<br/><br/>
All of the above choices are pretty easy in the sense that they are about the fictional new
wormhole science and don't conflict with our existing science. Things get a little harder when
we try to decide how conservation of energy and momentum work with wormholes, but even there we
should be able to postulate something that allows us to remain consistent with known science,
such as the wormhole absorbing or supplying the difference, or perhaps even requiring an exchange
of equal mass from either end of the wormhole.
<br/><br/>
Propagation of gravity through a wormhole seems to me a little more difficult to deal with.
As mentioned above, you might be able to claim that wormhole technology allows controlling the
curvature of space. But another view of mass and space is that mass <i>is</i> the curvature of
space, in which case making space curve is equivalent to creating mass, and at that point we get
into all the questions of conservation of mass and energy and where it comes from when curving
space for a wormhole.
<br/><br/>
The one that I really can't figure out how to make consistent is, as mentioned above, the question of time.
The main reason wormholes are typically introduced is to allow faster-than-light travel,
which, as described above, is what leads directly to the potential of time travel, according
to Special Relativity. For all of the other questions, it seems like it may be possible to define
some new science that answers those questions in a way that does not require us to discard any of
our current well-established scientific theories, but for faster-than-light travel, I don't see
any way to do this.
<br/><br/>
I can't even just assume that Special Relativity doesn't apply in that universe. There is a deep
connection between having the
<a href="https://physweb.bgu.ac.il/COURSES/test/LEC_BGU/LEC_BGU/LTs.pdf">same laws of physics everywhere</a>,
<a href="http://wtamu.edu/~cbaird/sq/2016/02/18/how-is-a-magnetic-field-just-an-electric-field-with-relativity-applied/">electromagnetism</a>,
and <a href="https://physics.stackexchange.com/q/231639">having a maximum velocity</a>
for any matter or information.
Special Relativity builds on the work of
<a href="http://rsta.royalsocietypublishing.org/content/366/1871/1861">Newton and Maxwell</a>,
and discarding it would require some other significant changes to the way the universe works.
<br/><br/>
A science fiction author might choose to focus on how wormholes allow time travel, as
Robert L. Forward does in some of his stories.
For the other stories, the ones that don't mention time travel, I just have to suspend my understanding
of Special Relativity and enjoy the story as told.
<h1>Golang Web Server Auth</h1>
An example of authentication and authorization in a simple
web server written in go.
<h2>Contents</h2>
<ul>
<li><a href="#background">Background</a>
<li><a href="#beforeauth">Before Auth</a>
<li><a href="#authentication">Adding Authentication</a>
<li><a href="#authorization">Adding Authorization</a>
<li><a href="#summary">Summary</a>
</ul>
<a name="background">
<h2>Background</h2>
</a>
As described in my
<a href="http://jim-mcbeath.blogspot.com/2018/03/golang-server-polymer-typescript-client.html">previous blog post</a>,
I recently rewrote my image viewer desktop app as a web app,
for which I wrote the web server in go.
<br/><br/>
Since I was adding a new potential attack vector, I wanted to add security;
but since this is only available on my internal network, and it's not
critically valuable data, I did not need enterprise-grade security.
In this post I describe how I implemented a relatively simple
authentication and authorization mechanism, in particular highlighting
the features of go I used that made that easy to do.
For a simple app such as this one, the third of the
<a href="http://jim-mcbeath.blogspot.com/2012/10/role-based-authorization.html#intro">three As</a> of security, auditing,
can be done with simple logging if desired.
<br/><br/>
The code I present here is taken from the github repo for my
<a href="http://www.github.com/jimmc/mimsrv">mimsrv</a> project,
with links to specific commits and versions of various files.
You can visit that project if you'd like to see more of the code
than I present in this post.
<a name="beforeauth">
<h2>Before Auth</h2>
</a>
Go has good support for writing simple web servers. The
<a href="https://golang.org/pkg/net/http/">net/http</a>
package allows setting up a web server that routes
requests based on path to specific functions.
In the first commit for mimsrv, before there was any code for
authentication or authorization, the http processing code looked
like this:
<br/><br/>
In <a href="https://github.com/jimmc/mimsrv/blob/f7c7cf29d9e47b98aa26fbc2b23aa6ad4fa5a38e/mimsrv.go#L29">mimsrv.go</a>:
<pre name="hlcode" class="go"
><div class="code">func main() {
  ...
  mux := http.NewServeMux()
  ...
  mux.Handle("/api/", api.NewHandler(...))
  ...
  log.Fatal(http.ListenAndServe(":8080", mux))
}
</div></pre>
In <a href="https://github.com/jimmc/mimsrv/blob/f7c7cf29d9e47b98aa26fbc2b23aa6ad4fa5a38e/api/api.go#L32">api/api.go</a>:
<pre name="hlcode" class="go"
><div class="code">func NewHandler(c *Config) http.Handler {
  h := handler{config: c}
  mux := http.NewServeMux()
  mux.HandleFunc(h.apiPrefix("list"), h.list)
  mux.HandleFunc(h.apiPrefix("image"), h.image)
  mux.HandleFunc(h.apiPrefix("text"), h.text)
  return mux
}

func (h *handler) list(w http.ResponseWriter, r *http.Request) {
  ...
}
</div></pre>
The above two functions set up the routing and start the web server.
The code in mimsrv.go creates a top-level router (mux) that routes
any request with a path starting with "/api/" to the api handler that
is created by the NewHandler function in api.go. The top-level router
also defines routes for other top-level paths, such as "/ui/" for
delivering the UI files.
<br/><br/>
The api code in turn
sets up the second-level routing for all of the paths within /api
(the h.apiPrefix function adds "/api/" to its argument).
So when I make a request with the path /api/list, the main mux passes
the request to the api mux, which then calls the h.list function.
<a name="authentication">
<h2>Adding Authentication</h2>
</a>
To <a href="https://github.com/jimmc/mimsrv/commit/44a6029b2bb42b15ccec09ff38a96d2bf278fb2f#diff-d8db984a4e139676adfee0fe21c6dc52">implement authentication</a>
in mimsrv, I added a new "auth" package with three files, and
modified mimsrv.go to use that new auth package. The most interesting
part of this change is that it implements the enforcement of the
constraint that all requests to any path starting with "/api/" must
be authenticated, yet I did not have to make any changes to any of
the api code that services those requests.
<br/><br/>
When I originally wrote my request routing code,
it could have been simpler if I had defined everything in one mux.
I didn't do that because I think the approach I took provides better
modularity, but in addition, that structure
made it easy for me to require authentication for all of the api calls.
<br/><br/>
The authentication code itself is not trivial, but wiring that code into
the request routing to enforce authentication for whole chunks of the
request path space was. I wrote a wrapper function and inserted it in the
middle of the request-handling flow for requests where I wanted to require
authentication.
<br/><br/>
To wire in the authentication requirement for all requests starting
with "/api/", I
<a href="https://github.com/jimmc/mimsrv/commit/44a6029b2bb42b15ccec09ff38a96d2bf278fb2f#diff-d8db984a4e139676adfee0fe21c6dc52L37">changed mimsrv.go</a>
to replace this line:
<pre name="hlcode" class="go"><div class="code"
>mux.Handle("/api/", api.NewHandler(...))
</div></pre>
with these lines:
<pre name="hlcode" class="go"><div class="code"
>apiHandler := api.NewHandler(...)
mux.Handle("/api/", authHandler.RequireAuth(apiHandler))
</div></pre>
Here is the RequireAuth method from the newly added
<a href="https://github.com/jimmc/mimsrv/blob/44a6029b2bb42b15ccec09ff38a96d2bf278fb2f/auth/authapi.go#L22">auth.go</a>:
<pre name="hlcode" class="go"><div class="code"
>func (h *Handler) RequireAuth(httpHandler http.Handler) http.Handler {
  return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    token := cookieValue(r, tokenCookieName)
    idstr := clientIdString(r)
    if isValidToken(token, idstr) {
      httpHandler.ServeHTTP(w, r)
    } else {
      // No token, or token is not valid
      http.Error(w, "Invalid token", http.StatusUnauthorized)
    }
  })
}
</div></pre>
The RequireAuth function looks at a cookie to see if the user is currently
logged in (which means the user has been authenticated).
If so, RequireAuth calls the handler it was passed, which in this case
is the one created by api.NewHandler. If not, then RequireAuth calls
http.Error, which prevents the request from being fulfilled and instead
returns an authorization error to the web caller.
When the mimsrv client gets this error it displays a login dialog.
<br/><br/>
The other code I added handles things like login,
logout, and cookie renewal and expiration, but all of
that code other than RequireAuth is specific to
my implementation of authentication. You could instead, for example,
use OAuth to authenticate, in which case you would have a completely
different mechanism for authenticating a user, but you could still
use a function similar to RequireAuth and wire it in the same way.
<a name="authorization">
<h2>Adding Authorization</h2>
</a>
Wrapping selected request paths as described above makes it so that
authentication provides authorization for those requests.
This coarse-grained authorization is a good start, but for mimsrv I
wanted to be able to use fine-grained authorization as well.
As this is a simple program with a very small number of users,
I don't need anything sophisticated such as
<a href="http://jim-mcbeath.blogspot.com/2012/10/role-based-authorization.html">role-based authorization</a>.
I chose to implement a model in which I only define permissions for
global actions, then assign those permissions directly to users.
<br/><br/>
For this simple permissions model, I needed to be able to define permissions,
assign them to users, and check them at run-time before performing an
action that requires authorization. My permissions are simple strings,
stored in a column in the CSV file that defines my users. To give a
permission to a user, I manually edit that CSV file, and to check for
authorization before taking an action, the code looks for that permission
string in the set of permissions for the current user.
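As a sketch of what such a check can look like, here is one way to represent the permission strings and look them up. The field names and the comma-separated storage are illustrative, not necessarily mimsrv's actual representation:

```go
package main

import (
	"fmt"
	"strings"
)

// User is a minimal illustration of a user record loaded from a CSV file.
type User struct {
	Username    string
	Permissions string // the permissions column from the CSV, e.g. "edit,delete"
}

// HasPermission reports whether the user was granted the named permission.
func (u *User) HasPermission(perm string) bool {
	for _, p := range strings.Split(u.Permissions, ",") {
		if strings.TrimSpace(p) == perm {
			return true
		}
	}
	return false
}

func main() {
	u := &User{Username: "jim", Permissions: "edit, delete"}
	fmt.Println(u.HasPermission("edit"))  // true
	fmt.Println(u.HasPermission("admin")) // false
}
```

Granting a permission is then just editing the CSV column, and checking one is a single method call.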
<br/><br/>
The one piece that is not obvious is how to pass the user's permissions
to the code that needs to check them.
The reason this is not obvious is because the http routing package
defines the function signature for the functions that process an http
request, and that function signature includes only the request and
a writer for the response.
You can't simply add another argument in which you pass your user
information, so you have to dig a little deeper to figure out how
to pass along that information.
<br/><br/>
The solution relies on the fact that there is a Context attached to
the Request that is passed to the handler function. By adding the user
info to the Context, you can then extract that information further along
in the processing when you need to check the permission.
<br/><br/>
The RequireAuth function validates that the user making the request is
authenticated, so it already has information about who the user is, and this
is the point at which we want to add the user info to the Context.
We do this in our RequireAuth function by
<a href="https://github.com/jimmc/mimsrv/commit/545b636f536d950ee63facb38a4886e757b369ab#diff-60e27fe84e5a45397cf4fa550ca867acL33">replacing</a>
this line:
<pre name="hlcode" class="go"><div class="code"
> httpHandler.ServeHTTP(w, r)
</div></pre>
with these lines:
<pre name="hlcode" class="go"><div class="code"
>  user := userFromToken(token)
  mimRequest := requestWithContextUser(r, user)
  httpHandler.ServeHTTP(w, mimRequest)
</div></pre>
along with this new helper function:
<pre name="hlcode" class="go"><div class="code"
>func requestWithContextUser(r *http.Request, user *users.User) *http.Request {
  mimContext := context.WithValue(r.Context(), ctxUserKey, user)
  return r.WithContext(mimContext)
}
</div></pre>
When the code needs to know whether the current user is authorized for
an action, it can call the new
<a href="https://github.com/jimmc/mimsrv/commit/545b636f536d950ee63facb38a4886e757b369ab#diff-60e27fe84e5a45397cf4fa550ca867acR56">CurrentUser</a> function,
which retrieves the user info from the Context attached to the Request,
from which the code can query the user's permissions:
<pre name="hlcode" class="go"><div class="code"
>func CurrentUser(r *http.Request) *users.User {
  v := r.Context().Value(ctxUserKey)
  if v == nil {
    return nil
  }
  return v.(*users.User)
}
</div></pre>
<a name="summary">
<h2>Summary</h2>
</a>
While implementing authentication and authorization in a web server takes
more than just a few lines of code, the part that ties it into the http
processing in go is only a few lines. Even so, it took me a while to dig
around and find exactly how to do that piece. I hope this article can save
other people a bit of time when doing their own research on how to add auth
to a go web server.
<h1>Golang server, Polymer Typescript client</h1>
Finally, a web development environment I enjoy using.
<h3>Contents</h3>
<ul>
<li><a href="#background">Background</a></li>
<li><a href="#mimsrv">Mimsrv</a></li>
<li><a href="#what-i-like">What I Like</a></li>
<ul>
<li><a href="#offline-development">Offline Development</a></li>
<li><a href="#simple-mental-model">Simple Mental Model</a></li>
<li><a href="#simple-dependency-management">Simple Dependency Management</a></li>
<li><a href="#simple-compilation">Simple Compilation</a></li>
<li><a href="#type-safety">Type Safety</a></li>
<li><a href="#separation-of-concerns">Separation of Concerns</a></li>
<li><a href="#go-http-support">Go http support</a></li>
</ul>
<li><a href="#room-for-improvement">Room for Improvement</a></li>
<ul>
<li><a href="#polymer-typescript-mismatch">Polymer/Typescript type mismatch</a></li>
<li><a href="#debugging-typescript">Debugging Typescript</a></li>
</ul>
</ul>
<h3>TL;DR</h3>
I have found Go to be a nice tool for developing a small web server,
and Polymer + Typescript to be a nice combination for developing a web UI.
The Go server acts as both the API server and the static content server
delivering the UI pages.
If you think you might want to try this approach, you can look at my
<a href="https://github.com/jimmc/mimsrv">mimsrv</a> program on github
as an example.
If it looks too complicated, browse in the git history back to some of
the earliest commits, such as the
<a href="https://github.com/jimmc/mimsrv/commit/6a9c1172a70e2c6d23a362b0655c39f428c13105">first ui commit</a>
and the
<a href="https://github.com/jimmc/mimsrv/commit/f7c7cf29d9e47b98aa26fbc2b23aa6ad4fa5a38e">first api commit</a>,
to see how things looked at a simpler time.
<a name="background"></a>
<h3>Background</h3>
I have been developing web pages and apps for a long time,
since the earliest days of HTML when there were no tools
more sophisticated than a text editor, and server-side scripts
were the only form of executable web code.
In 1994 I wrote
<a href="http://alumnus.caltech.edu/~jimmc/ftp/htimp/">htimp</a>,
an experiment in how to attach a web browser to an interactive program
with a lifetime longer than a single message.
<br/><br/>
Over the years I tried many technologies, including
<a href="https://docs.oracle.com/javaee/5/tutorial/doc/bnagy.html">JavaServer Pages</a>,
<a href="https://en.wikipedia.org/wiki/JavaServer_Faces">JavaServer Faces</a>,
<a href="http://www.php.net/">PHP</a>,
<a href="https://jquery.com/">jQuery</a>, and others I have forgotten.
Some were better than others (more accurately, some were bad
and some were excruciating),
but I never felt any of them provided a reasonable mental
model for how to put together an application.
<br/><br/>
I was away from the web UI scene for a while, and when I got back to
doing some web development a couple of years ago, things seemed to have
improved quite a bit. In the last year, I have been introduced to a few
technologies that, in combination, provide me with a development
environment with a working mental model of how to put together a program,
and a set of tools that makes it easy to do that at a good clip.
<br/><br/>
The three technologies that together have brought pleasure back to my web
programming are:
<ol>
<li>The <a href="https://golang.org/">Go</a>
language and development environment
<li>The <a href="https://www.typescriptlang.org/">Typescript</a> language
<li><a href="https://www.polymer-project.org/">Polymer-2</a>
(and <a href="https://www.webcomponents.org/">Web Components</a>)
with decorators
</ol>
Below I describe the project on which I tried out these technologies,
followed by a discussion of what I liked about them.
<a name="mimsrv"></a>
<h3>Mimsrv</h3>
<a href="https://github.com/jimmc/mimsrv">Mimsrv</a>
is a web server and UI to view a collection of photos.
It is a replacement for
<a href="https://github.com/jimmc/mimprint">mimprint</a>,
which is a desktop app that I
originally wrote starting in 2001 in Java, and converted to Scala
starting in 2008.
<br/><br/>
A couple of years ago I started looking into
rewriting mimprint once again, this time as a web app.
As a web app, I would no longer have to worry about distributing a
desktop application to the various machines I have on which I wanted to view
my photos.
I also thought I should be able to leverage the web browser's
media capabilities so that I would not have to develop or support
that whole chunk of code.
<br/><br/>
The tools I tried were never nice enough to pull
me in and get me going on that replacement, and I had moved my
rewrite-mimprint project way down on my TODO-list.
<br/><br/>
At Google last year I worked on the open-source
<a href="https://github.com/googledatalab/datalab">Datalab</a> project.
When I started on it, we were using jQuery and Javascript.
I liked it when we converted to Polymer-2 and Typescript,
and I liked it more when we switched to using Polymer decorators.
<br/><br/>
I started learning Go in order to review code from my teammates. It took
a little getting used to, but the more I learned, the more it made sense
to me. I felt it was much easier to understand the existing Go
codebase than similar codebases I had looked at in other languages. It
grew on me, and after I started adding my own Go code to the project, I
was surprised at how much I liked using it, and I felt that I was making
pretty good coding progress.
<br/><br/>
I thought the combination of Go for the server, and Polymer and Typescript
with decorators for the client, worked quite well, and I decided to try it
for my personal project.
So far that combination has worked well for me, and I have been quite
happy with it.
<a name="what-i-like"></a>
<h3>What I Like</h3>
<a name="offline-development"></a>
<h4>Offline Development</h4>
One of my requirements is that I be able to develop when I am offline. I
insist on this because one of the situations in which I have the most
amount of time available for programming on my personal projects is when I
am traveling and often don't have network access.
<br/><br/>
In a previous attempt at putting together a collection of technologies
for developing web apps, some of the pieces used maven, and I was unable
to figure out how to convince it not to go out looking for new versions
of the snapshots I needed every time it compiled.
<br/><br/>
After using Go on a project at work and being pleasantly surprised at how
much I enjoyed using it, I decided to see it if would work for my
personal projects. When I downloaded and installed it, I was delighted to
discover that, not only did the installation provide everything I needed
to compile and run my programs, but it also included all of the documentation
and the Go Tour, so those would all be available to me offline!
<br/><br/>
Similarly, the Typescript and Polymer tools allow just building the
code, without attempting to do any dependency resolution, so can
easily be used offline.
<a name="simple-mental-model"></a>
<h4>Simple Mental Model</h4>
There are a couple of changes to the web app landscape that have made for a
much simpler mental model than in the old days.
The main one is the Single Page Application (SPA).
With the old approach of having to move to a new page every time the
user took an action, saving state across those page changes
required mental and technical gyrations.
With a SPA, you make AJAX calls to the server using XMLHttpRequest,
and just keep your state in variables as in any other program.
<br/><br/>
The SPA model also allows for a clean separation of responsibility
between the server and the client. With Polymer, all of the UI
manipulation is handled in the client, so the server doesn't need to
deal with any kind of templating of client-side functionality.
This means the server can focus on the API and on just delivering the
UI code to the client, and the client can focus on managing the UI
and making API calls.
<br/><br/>
The other big change on the client side is the progress that has been made
on the asynchronous programming model.
At first we had to pass around success and failure callbacks,
which requires splitting code up in unwieldy ways around every
asynchronous call.
The introduction of Promises provided a nice way to avoid
the "callback hell" of deeply nested callbacks,
but still requires chopping your code up around every
asynchronous call.
Lastly, the introduction of the async and await keywords
made asynchronous programming almost as straightforward as
synchronous programming.
I'm particularly impressed that you can do things like have an
if-statement with synchronous code on one side and asynchronous
code on the other side, or a loop with an asynchronous call in it.
This is so much simpler to reason about than if you had to figure out
how to do that with callbacks or even Promises.
<a name="simple-dependency-management"></a>
<h4>Simple Dependency Management</h4>
The few times I had to deal with Maven were unpleasant.
I found it hard to control, hard to configure, and hard to understand
what it was doing. Perhaps it's just that, with the march of time,
people have figured out how to make dependency management better,
but I found the dependency management in both Go and Polymer to be
pleasant to use.
<br/><br/>
In Go, when you need a package, you just say <code>go get <i>package</i></code>,
and it downloads that package and all its dependencies. Assuming you follow
the Go conventions when naming and locating your package, when someone
then wants to download your package, they do the same thing, and Go will
also download all of your dependencies to their system.
<br/><br/>
Polymer-2 uses bower for its package management, and it is almost as easy
to use. The <code>bower.json</code> file lists the packages needed,
and running <code>bower install</code> installs those packages and their
dependencies. When you add a new dependency to one of your Polymer components,
you just run
<code>bower install --save <i>new-package</i></code> to download that
new package, and you're done.
Not quite as effortless as go, but much better than my
experience with maven.
<br/><br/>
For both Go and bower, they don't attempt to download anything except
when you explicitly tell them to with <code>go get</code> or
<code>bower install</code>, which is good for offline development.
<a name="simple-compilation"></a>
<h4>Simple Compilation</h4>
For pretty much my whole programming career, I have been accustomed
to using some kind of build tool that requires a configuration file:
make, ant, maven, sbt, grunt, bazel, gradle, and others.
<br/><br/>
Go is different: it is so opinionated about where you have to put your
packages and how you have to name stuff, that it has all the dependency
information it needs by looking at the source files. You just tell it
to build your program with the command <code>go build</code>, and it
does it. No build config file required.
<br/><br/>
The Typescript compiler and Polymer build commands do require config
files, but they were pretty simple to set up and understand,
and seldom need to be modified. Running <code>tsc</code> compiles all
the Typescript files to Javascript, and running <code>polymer build</code>
packages all the Polymer Javascript and HTML files into a directory
where they are served by the Go server.
<a name="type-safety"></a>
<h4>Type Safety</h4>
I like the compiler to catch as many errors in my code as possible.
Using compile-time types allows the compiler to spot more errors.
This is why I greatly prefer Typescript over Javascript.
<br/><br/>
Go is also a compiled and typed language, so it catches a lot of
problems before execution time.
<a name="separation-of-concerns"></a>
<h4>Separation of Concerns</h4>
While I don't think having to use multiple languages is a benefit,
the ability to select the best tools for different parts of the problem
is.
Go works very well as a web server for API calls and static content.
Most people using Polymer embed Javascript code in their HTML file,
but I prefer using Typescript and am happy putting that in a separate
file from the HTML, where my editor understands it better.
<a name="go-http-support"></a>
<h4>Go http support</h4>
Go has a nice <code>http</code> package that makes it easy to define web routing
and implement handler functions.
<br/><br/>
Because Go supports functions as first-class
values, it's easy to define a function that can take a function as
an argument and return another function. In my case, I used that
approach to create a function that I could use to specify that certain
parts of my API required authentication.
<br/><br/>
I wrote my http handlers to do only the marshaling and unmarshaling
of data and then call the underlying routine that implements the
requested functionality. This made it easy to write unit tests of the
underlying function.
But Go also provides a nice testing package for http handlers that makes it
relatively easy to test the http handler as well.
<a name="room-for-improvement"></a>
<h3>Room for Improvement</h3>
I'm pretty happy with this collection of technologies, but there are
a couple of things I would like to see improved.
<a name="polymer-typescript-mismatch"></a>
<h4>Polymer/Typescript type mismatch</h4>
Polymer decorators are a nice improvement over the previous approach, as
there is now much less boilerplate and repeated code. But I still have to
specify a type in each <code>Polymer.decorators.property</code> line, and
that type is not quite the same as the Typescript type (for example,
string vs String, any vs Object).
<br/><br/>
I suppose this is not that surprising, given that Typescript is not
officially supported by Polymer. Official Typescript support is really
what I would like to see happen.
<a name="debugging-typescript"></a>
<h4>Debugging Typescript</h4>
Writing Typescript rather than Javascript is nice, but when it gets
loaded into the browser it's Javascript, so debugging in the browser
uses the transpiled Javascript.
The Javascript is usually close enough to the source Typescript that it's
manageable, but it would be nice to be able to debug with the Typescript
source code.
<br/><br/>
Maybe this situation will get better when
<a href="http://webassembly.org/">WebAssembly</a> gets implemented.
<h2>FiOS - A Cautionary Tale</h2>
<i>Jim McBeath, 2017-06-22</i>
<br/><br/>
I delayed signing up for Frontier FiOS because I was concerned they
might screw things up. I should have been more concerned.
<br/><br/>
This is a long post. Consider it entertainment.
Or just skip to the <a href="#answers">Answers</a>.
<h3>Contents</h3>
<ul>
<li><a href="#speed">The Need for Speed</a></li>
<li><a href="#situation">My Unusual Situation</a></li>
<li><a href="#questions">Questions</a></li>
<li><a href="#research">Research</a></li>
<li><a href="#ordering">Ordering</a></li>
<li><a href="#trouble">Trouble</a></li>
<li><a href="#mistakes">Mistakes</a></li>
<li><a href="#good-stuff">Good Stuff</a></li>
<li><a href="#answers">Answers</a></li>
<li><a href="#frontiers-problems">Frontier's Problems</a></li>
<li><a href="#timeline">Timeline</a></li>
<li><a href="#quotes">Selected Quotes</a></li>
</ul>
<a name="speed">
<h3>The Need for Speed</h3>
</a>
I have had internet connectivity for decades,
starting back with modems so slow that I knew people who had to pause
in their typing to let the modem catch up.
I appreciated every doubling of speed as each generation of modem
arrived.
I was surprised when modem speeds reached 4800 and then 9600 baud -
how could you get more bits per second than the 3 kHz bandwidth of a phone
line? - and I was astounded by the jump to 56K modems.
<br/><br/>
When DSL came out, I waited impatiently for it to be available in my
neighborhood, and signed up as soon as I could. After years of using a
56K modem, my 740Kbps DSL line was satisfyingly fast.
<br/><br/>
I lived with 740Kbps for six years, until one day my DSL modem broke.
While researching new modems, I learned that I could have my service
switched from Frame Relay to ATM and bump up my speed to 3Mbps.
Normally this would mean my service would be out for 10 days while they
did that, but since it was already out, it seemed like a good time
to make the switch.
The speed bump from 740Kbps to 3Mbps was a mere 4x, far less than the
13x increase from the 56K modem to 740Kbps DSL, but still, 3Mbps
was satisfyingly fast.
<br/><br/>
Verizon actually offered FiOS in my neighborhood fairly early, but
I was pretty happy with my DSL service, and I wasn't doing anything that
I thought needed more than 3Mbps bandwidth. I remained happy with my
3Mbps for over a decade. But technology marches on; I bought some HDTVs,
started watching more YouTube, and started working from home more often
using bandwidth-hungry remote desktop applications. My 3Mbps
connection was not sufficient to stream HDTV movies and YouTube clips,
and my remote desktop experience was annoyingly slow. I was finally
feeling the bandwidth squeeze.
<br/><br/>
Still, I delayed upgrading to FiOS. I had heard that I would have to
give up my copper-wire land lines, which I was not keen to do. Some years
ago our power was out for over a week; batteries everywhere ran down -
even the local cell towers ran out of juice after a few days, so there
was no cell service in our neighborhood - but, with our copper wires, we
had phone service the whole time. I liked that.
<br/><br/>
In addition, by this time Verizon had sold to Frontier, and based on
my experience and anecdotes I read, I was concerned that Frontier would mess
something up when dealing with my service change request, particularly
since my situation was rather unusual.
<a name="situation">
<h3>My Unusual Situation</h3>
</a>
In my case, there were a number of things about my situation that gave
me pause when thinking about asking for any kind of change.
<ol>
<li>I have a land line. This has become increasingly rare, and it seems
Frontier is deprioritizing phone service so they can focus
on providing internet and television service. It seems they want
to provide packages that include everything, or at least include both
internet and television.
<li>Actually, I have two land lines. I'm not sure I know anybody else who
has two land lines at home any more. I used to have three, but finally
got rid of the third line after disposing of my last FAX machine years ago.
Although I have two lines, they have not both shown up on my monthly
bill for many years now. Oh, I am paying for two lines, it's just that
the second line is not itemized anywhere. If you didn't know a priori
that I had two phone lines, you would hardly be able to tell that by looking
at my phone bill. Based on conversations I have had with support and
billing people at Frontier, it's not obvious to them either, although
after I point it out, and with enough digging, some of them
could figure it out.
<li>I had a DSL line. As I mentioned, I delayed for quite some time in
switching from DSL to FiOS. The longer I delayed, the fewer people had
DSL lines, and the less Frontier cared about them. For this particular
problem, I suppose my delaying upgrading perhaps made things worse.
<li>My DSL service provider was not Frontier. This caused a fair amount
of frustration any time I had a service issue with my DSL line.
</ol>
My DSL service was perhaps the most unusual part of my situation.
Back in 1999, when I originally ordered DSL, GTE (yes, it was that long
ago!) had partnerships with Internet Service Providers who provided the
actual internet service. These ISPs are known as CLECs (Competitive Local
Exchange Carriers). So GTE provided the line, and my selected ISP
provided the internet service. Originally I paid GTE directly for the line
and I paid the ISP for the internet service. But when I switched from
Frame Relay to ATM service, my billing also changed so that I paid
everything to the CLEC, and they paid Verizon for the line.
<br/><br/>
Back then many of the carrier ISPs had
annoying policies such as blocking some ports, so it was nice to be
a customer of a smaller ISP that was more interested in making its
customers happy. The downside was that, whenever there was a service
problem, I had to deal with two companies, and they each tended to say it
was the other company's problem.
<br/><br/>
By the time I was considering switching
from DSL to FiOS this year, it had become perhaps comically bad:
when I talked to support and billing people at Frontier, they were
completely unaware that I had DSL service on my Frontier phone line,
and even with a lot of digging, nobody I talked to at Frontier this year
was ever able to find even a trace of information about my DSL line.
<br/><br/>
On the other side, my ISP had been acquired multiple times over the years,
each time by a larger and more remote company, until by this year they
were no longer in the DSL business and no longer in the residential ISP
business. Somehow through all this, my residential DSL line kept working,
but I did start to feel I was skating on ever-thinning ice.
<br/><br/>
It was time to take the plunge and upgrade.
<a name="questions">
<h3>Questions</h3>
</a>
Before ordering FiOS service, I wanted to get the answers to four questions:
<ol>
<li>What service options do I have?
<li>What equipment will be installed and where?
<li>What is the installation process?
<li>How much will it cost?
</ol>
How does one answer questions like these in today's world?
Hit up the internet, of course.
<a name="research">
<h3>Research</h3>
</a>
<a name="website">
<h4>Frontier's Web Site</h4>
</a>
I started by browsing <a href="https://frontier.com">Frontier's web site</a>
looking for information about their service offerings.
<br/><br/>
NOTE: Frontier's offerings are regional, so you may see different web pages
than what I describe.
<br/><br/>
Their <a href="https://frontier.com/shop/internet/fios">FiOS</a> page
shows four levels of service: 50Mbps, 75Mbps, 100Mbps or 150Mbps.
Since I wanted both internet and phone service, I headed over to the
<a href="https://frontier.com/shop/bundles/">bundles</a> page to see
what I could get. I don't want their television service, so
I unchecked the "Video" box. This shows three bundles: two that include
30Mbps internet service (30? that's not one of the speeds listed on
their FiOS page!) and one that includes 50Mbps.
Do they offer bundles that include internet service faster
than 50Mbps? Their web site doesn't say.
<br/><br/>
Their <a href="https://frontier.com/shop/phone/">Phone</a> page shows
me information about copper-line phone service (lower in the page it says
"Our reliable copper power stays on even when the power goes out or
in an emergency"), where they list two plans that differ by $3.
Confusingly, this phone service - you know,
<a href="https://en.wikipedia.org/wiki/Plain_old_telephone_service">POTS</a>
using analog signals on copper wires - is called "Digital Essentials."
Are there any other optional add-ons? There are a fair number of features
bundled with the basic phone service, and a lot more bundled with that
extra $3, but is that it? Like, I currently have an unlisted phone number,
what's the charge for that? Sorry, that kind of stuff is not on their
web site.
Ah, here on the
<a href="https://frontier.com/shop/phone/phone-challenger/dpu-challenger">
Digital Phone Unlimited</a> page, it says
"Optional international calling packages are available for great savings",
so apparently there are other options available - but it's not a link,
so I have no idea what kind of packages they might offer.
<br/><br/>
How about a second phone line, how much does that cost? Sorry, that's
not on the web page. VoIP? Oh, maybe you mean FiOS Digital Voice.
Beats me what the scoop is on that.
If you go to Frontier's
<a href="https://frontier.com/shop/bundles/fios">FiOS Bundles</a> page,
where it says the phone service in their bundles is Digital Phone Unlimited,
and you click on the Learn More button for the phone service,
it takes you to
<a href="https://frontier.com/shop/phone/phone-challenger">that phone page</a>
I mentioned above - you know, the one that says
"Our reliable copper power stays on even when the power goes out or in an emergency."
So, if I get a bundle that includes FiOS internet, does that bundle include
Digital Phone Unlimited running on copper wires?
<br/><br/>
The details of the above web pages are what they look like now, in June 2017.
I believe they have changed since I did my initial research a few months ago,
but the gist is the same: I was unable to figure out
what options were available to me by reading their web site.
<br/><br/>
Besides looking at Frontier's web pages,
I did a lot of Googling and browsing of other web sites.
I learned a lot in general about equipment, but it was hard to know
how much of it would apply to my situation.
Although I did not record the time I spent browsing Frontier's and
others' web sites, I estimate it was probably about five hours.
<br/><br/>
It was time to move on to online chat to get more answers.
<a name="chat">
<h4>Online Chat</h4>
</a>
I had six online chats with Frontier, totaling about 4 1/2 hours.
Between each chat I did more online research, looking for details about the
equipment and the installation experience both inside and outside the house.
It was difficult to get a good handle on these details, particularly since
I sometimes got conflicting answers from the Frontier people I chatted with.
<br/><br/>
For example, one of my questions was whether I could keep my copper phone
lines, or whether I would be required to switch my phone service to fiber.
One of the people I chatted with said this:
<blockquote>You would have to switch to a digital phone service ! Voip. Which basically means Voice over Internet
</blockquote>
Another one said I could keep my phone service on copper and get their
"Simply FiOS" service, which is fiber with only internet service.
<a name="ordering">
<h3>Ordering</h3>
</a>
Once I reached the point where I felt I had answers as good as I could
get - which admittedly were not always very good - it was time to place
my order.
<br/><br/>
On April 9 I called Frontier to place my order.
I would say the fact that it took me well over an hour to place my
order was the first hint of <a href="#trouble">trouble</a>, but in
truth there were plenty of hints during the many <a href="#chat">chats</a>
I had, where I was not getting consistent answers.
<br/><br/>
Part of the reason the phone call took so long was due to my
<a href="#situation">unusual situation</a>.
The DSL was not much of an issue during ordering, since it was
completely invisible to them and they couldn't do anything about it.
The real trouble was that second phone line. Figuring out how to deal
with that took probably 45 minutes.
<br/><br/>
When I asked if I was required to switch my phone service from copper
to fiber, the service rep first said no, but then went and asked someone
else, came back and said yes, I would have to switch. I would have preferred
to keep my phones on copper (and especially I would have preferred it given
how much trouble I have had with the switch), but I was not given that option.
So I placed the order to switch both of my phone lines over to fiber.
<br/><br/>
At some point I learned that each phone number at Frontier is on a separate
account. This was completely invisible to me because both of my phone lines
are billed on the account for my primary number, so that's the only account
I see. Some of the Frontier people I talked to were able to find the separate
account for the secondary line, but it always seemed to take them a while.
In the end, I think that the fact that the secondary phone was actually a
separate account has saved me some hassle with it: because it was on a
separate account, the order to change the second phone over to fiber was
done with a separate work order, scheduled for the day following the
primary work order. Once the trouble started, I was able to cancel that
second work order before anything was done to the second line; but the
work had already started on the first line, and that has been the headache.
I wonder now if there was any way I could have convinced them to just
treat the internet service as a new internet-only account and so leave
the phone lines and their account completely untouched.
<br/><br/>
I was pleased that my installation was scheduled very quickly, just two
days later, on April 11. I should not have been. As it turned out, I did
not actually get my FiOS service until April 18.
<br/><br/>
I wonder, had I known then what I know now, what I might have been able
to do to avoid any of the troubles I have had.
<a name="trouble">
<h3>Trouble</h3>
</a>
On the morning of April 11, I was a bit surprised that the installer did
not call first to confirm I was home before coming by. When he arrived,
I learned why: although he had two different phone numbers for me, somehow
he had typos in both of them. These two numbers were for my
two Frontier phone lines. I would have thought the computer would have
just copied those numbers into the work order, but I assume now that a
person manually put in those phone numbers, and somehow got them both wrong.
<br/><br/>
Unfortunately, the person who took my order scheduled my installer visit
without first scheduling the preceding two steps of the
<a href="#installation-process">installation process</a>.
As a result, when the installer came out for the April 11 appointment,
he was unable to do his work, and had to leave having done nothing.
<br/><br/>
Before that first installer left, he told me he would call in the work
orders to do the steps that should have been done before he got there.
He might have done this right away, but when I called Frontier a little
bit later that day, I was still unable to reschedule the installation
because they didn't have the notes from that day's work order yet.
So I had to wait and call back a couple of days later.
<br/><br/>
I was disappointed that I would have to wait longer to get my
fast internet service, but that was just a mild disappointment.
What was more annoying was that my DSL service went out on April 13,
two days after that original installation date.
<br/><br/>
As I mentioned above, due to my unusual DSL situation, it was very
difficult for me to get anyone to take any action on my DSL line.
I called my ISP, and they said everything looked fine to them. I called
Frontier and they couldn't help me at all; they had absolutely zero
visibility into my DSL service. One tech said he would run a DSL line
check on my line, but the computer wouldn't let him because it said there
was no DSL service on my line.
<br/><br/>
My ISP suggested that my DSL modem may have died, and while I admit that is a
possibility, the timing of the outage, plus the fact that the modem lights
indicated no DSL carrier, leads me to believe that the work order to switch
my copper line to fiber triggered some follow-on internal work order to
turn off the DSL on that line, and because my DSL service was invisible to
everyone who looked at my account, they had no way to manage that internal
work order.
<br/><br/>
After a few frustrating and fruitless phone calls trying to get my DSL line
fixed, I decided to forget it and hope that my new fiber internet connection
would be running soon. In the meantime, I tethered my computer to my phone
when I wanted to use the internet, so I did not have to suffer internet
withdrawal while waiting for FiOS. Ironically, this gave me a faster
connection than my 3Mbps DSL line, although I never got it working as
a gateway for my entire LAN, so I could only use it on one computer at a time.
<br/><br/>
The first step in the installation process is for the utility locators to
come out and spray lines marking the location of the existing utilities so
that the people burying the fiber don't damage any existing buried utilities.
Two days after the aborted initial installation appointment,
on the same day my DSL service went out,
various
<a href="https://en.wikipedia.org/wiki/Utility_location#Color-coding"
>colored lines</a>
started appearing in my front yard marking the utilities.
The following morning the fiber installers came out and buried the fiber
cable running from the curb to my house (yay!).
BUT - that afternoon, yet another utility locator came out to locate more
utilities. So the fiber installers jumped the gun by installing the fiber
before all of the utilities were located. Fortunately, they did not
damage any of the unlocated utilities, so although they did not follow
the prescribed procedure, at least no harm resulted from that mistake.
<br/><br/>
On April 18, now that the fiber was in place, the second installer came out to
finish the installation. In about two hours he installed all the equipment
and got the FiOS internet service working (yay!). For much of the next hour
he worked over the phone with a technician trying to get the primary phone
line working over fiber. After some discussion with me, they finally gave up and
moved the phone line back to copper.
<br/><br/>
I was perfectly happy keeping my phone service on copper, as that's what I had
originally wanted anyway. If only it had been so easy.
<br/><br/>
I learned from the installer that the second phone line was on a separate
work order, to be moved from copper to fiber the next day. Given that they
were unable to move the first line, and were willing to keep it on copper
(I thought), I called and canceled the service call that was scheduled for
the next day. I'm pretty sure doing that has saved me a lot of grief on
my second phone line, as so far I have not had any problems with it, and
it has continued working just fine on copper, as well as being billed properly.
<br/><br/>
On April 25, one week after the FiOS installation, I learned that my
primary phone was not working properly. It may be that it stopped working
a day or two sooner, but this is the day I realized it. It was broken in
a strange way: I could place outgoing calls, and I could receive incoming
calls from another phone number in the same exchange, such as my second
phone line, but calls from outside the exchange would not go through.
When I called from my mobile phone, which has a different area code, I
could hear a ringback on my mobile, but my landline never rang. When I
called from my wife's mobile phone, which is in the same area code but not
in the same exchange, I immediately got a message saying "Your call can
not be completed." I spent a couple of hours on the phone with Frontier
over this.
<br/><br/>
On April 30, five days later, they finally managed to get the phone
working again. We got a call at 8:15am that Sunday morning from a repair
man testing to see if the line was working. Fortunately, we were already
awake.
<br/><br/>
Two days later, on May 2, the phone service went out again, in the same
way. Another hour on the phone with Frontier, and this time it "only" took
them two days to get it fixed.
So far, from then until now (mid-June), the phone service has not gone out
again, so I am hopeful that they really have fixed it.
<br/><br/>
On May 8, I received my first bill from Frontier since getting my new
FiOS service. It had a couple of minor errors on it, which I was able to
deal with on the phone to Frontier in about 15 minutes.
<br/><br/>
On June 7, I received my second bill from Frontier since getting my new
FiOS service. This one had more serious problems, and I spent closer to
an hour on the phone with Frontier.
The most significant problem is that, although my phone service never
got switched over to fiber, which also would have included switching to a new
service plan, the billing <i>did</i> get switched to the new plan.
My old plan was $18.90/month, the new plan is $30.99/month. So I am
being charged an extra $12.09 for <i>exactly the same service</i>
that I was getting before the FiOS installation. The billing person
I talked to told me she was unable to change my phone service back to
the old plan because I had been grandfathered in at that old rate.
I assume the computer did not provide her any way to go back to that
grandfathered rate.
<br/><br/>
So here I am, two months after ordering FiOS, trying to figure out what
I should do about my phone service.
Try harder to get it back to the old rate? Try to get it changed to
the service plan I am now being forced to pay for?
<br/><br/>
Or maybe I should just cancel it.
Who has land lines these days anyway?
<a name="mistakes">
<h3>Mistakes</h3>
</a>
Here is a list of what I believe are the mistakes Frontier made that
led to the above trouble.
<ul>
<li>When taking my order, the service rep scheduled the equipment installation
without first scheduling utility location and fiber installation
<li>Both of my phone numbers were entered incorrectly in the original
work order
<li>The fiber installers buried the fiber before all of the utilities
were located
<li>When the original installation was postponed, the order to disconnect
my DSL service was not also postponed
<li>When the installer was unable to move the phone service to fiber,
and kept it on copper, he should have canceled the rest of the service
order for moving the phone service (although I suspect the computer
would not have let him do that, since he had already done some of the
work on it)
<li>When the phone went out the first time, and the repair man got it working
again, he must have missed some piece of the puzzle, since it went out
again two days later
<li>Given that the phone service never actually got switched to the new
plan on fiber, the billing likewise should not have changed
</ul>
<a name="good-stuff">
<h3>Good Stuff</h3>
</a>
While I think far more has gone badly than is reasonable, not everything
has gone wrong. In fairness, I list here some good things.
<ul>
<li>The fiber installer did a very nice job burying the fiber line from
the curb to the house. We could hardly see where they ran it,
including through sod, and even where they had to run it under a bed
of solid pachysandra, they only damaged a strip a few inches wide.
<li>The equipment installer cheerfully ran ethernet cable from the ONT,
across the ceiling in my garage, through a wall, into my network
equipment closet, and to a wall-mounted jack.
<li>My 100/100 internet service came up smoothly on the (second) scheduled
date, and has been working well ever since.
It is satisfyingly fast.
<li>When I run speed tests, I consistently do get 100Mbps both up and down.
<li>The Arris wifi router they included in the installation was actually
pretty nice (although it would be better if there were some documentation
available somewhere). If I were a less technically demanding customer,
I would probably still be using it.
<li>Both installers who came to my house were friendly and competent.
A few of the tech support people I talked to also seemed quite competent.
<li>Almost everyone I have communicated with at Frontier has been friendly
and has (as far as I can tell) tried their best to help me.
They always let me stay on the line asking questions as long as I wanted
to; I never felt anyone was trying to get me to hang up.
<li>I have not had any trouble getting credits applied to my bill.
</ul>
<a name="answers">
<h3>Answers</h3>
</a>
This section lists what I think are the answers to the
<a href="#questions">four questions</a> I started with.
<a href="https://www.google.com/search?q=ymmv">YMMV</a>:
service, equipment, processes and prices may vary across
regions and over time, and depending on your situation.
<h4>What service options do I have?</h4>
Sadly, I can't give you good answers here, so you will probably have to
call or chat with Frontier and experience your own frustration at getting
a different answer each time.
<br/><br/>
I do, however, have a few things to point out.
<br/><br/>
One point, that was always unclear to me when researching
FiOS, is that
there is no technical reason you cannot keep your copper-wire phone
along with FiOS.
The fiber line is installed completely independently of the copper wires,
and the service is likewise independent.
Frontier may tell you that you must switch your phone service over to
fiber service (either
<a href="https://en.wikipedia.org/wiki/Time-division_multiplexing">TDM</a>,
in which the phone signal is sent over the
fiber separately from the Internet signal, or
<a href="https://en.wikipedia.org/wiki/Voice_over_IP">VoIP</a>, where it is sent
on top of the Internet signal), but that is purely a business issue.
<br/><br/>
A possible sticking point is the way Frontier handles their accounts: if
your phone service is on the same account as your FiOS internet service,
they are constrained as to what the computer will let them do with that
phone service. If you want to keep your copper phone lines and they are
telling you you can't, perhaps you can ask to put the Internet service on
a separate account. You can then ask to have both accounts billed
together. But you might lose out on some bundling discounts this way.
<br/><br/>
One of the differences between POTS over copper wires and VoIP is that
POTS is regulated phone service, but VoIP is not.
More specifically, under the
<a href="https://en.wikipedia.org/wiki/Telecommunications_Act_of_1996"
>Telecommunications Act of 1996</a>
VoIP is considered an information service rather than a communications service,
the upshot being that you don't have the same level of guarantees as POTS,
which is regulated as a communications service.
However, IANAL, and I was unable to determine whether or how later laws
may have modified this situation,
or whether those regulations are still being enforced,
so this may be a moot point.
<a name="equipment">
<h4>What equipment will be installed and where?</h4>
</a>
Not including the fiber from the street to your house that gets buried as
part of the <a href="#installation-process">installation process</a>,
the installer installs three pieces of equipment:
<ol>
<li>The
<a href="https://en.wikipedia.org/wiki/Network_interface_device#Optical_network_terminals">ONT</a>
(Optical Network Terminal), which converts between the optical
signal carried on the fiber and the electrical signals used in the house.
The ONT has the following connections:
<ul>
<li>An optical connection that gets connected to the fiber from the street
<li>Two <a href="https://en.wikipedia.org/wiki/Modular_connector#8P8C">8P8C</a>
(RJ45) ethernet jacks for the internet connection
<li>Two
<a href="https://en.wikipedia.org/wiki/Registered_jack#RJ11.2C_RJ14.2C_RJ25_wiring">RJ-11</a>
jacks for phone connections
<li>A coaxial connector for the cable connection
</ul>
The ONT can be configured to provide internet service either through the
8P8C connector on a standard ethernet cable, or through the coaxial
cable using
<a href="https://en.wikipedia.org/wiki/Multimedia_over_Coax_Alliance">MOCA</a>.
<br/>
The ONT is typically mounted on the outside of the garage.
The fiber from the street is routed first into a holding box, typically
mounted behind the actual ONT, where the excess cable is wrapped in big
loops to take up all the slack, then from there it enters the ONT.
<li>A power supply that includes a small battery backup for the ONT.
This is typically mounted inside the garage, ideally just opposite where
the ONT is mounted on the outside, and near a power outlet. The installer
will then drill a hole through the garage wall to feed through the
power wire from the supply to the ONT, and possibly another to bring
the ethernet and coaxial cables into the garage if they will be routed
through the garage.
By default, the battery backup provides power only for the phone lines.
It can be hacked to provide power for the internet portion of the ONT,
or you can just buy your own
<a href="https://www.amazon.com/s/?field-keywords=ups">UPS</a>
and plug the ONT power supply into that
(although Frontier recommends plugging the ONT power supply directly
into an outlet).
<li>A MOCA-capable router. In my case this was an
<a href="http://www.arris.com/">Arris</a> NVG468MQ, which is
a reasonably nice wireless router, except that they didn't give me a
manual, and I was unable to find anything of substance online.
The router has the following connections:
<ul>
<li>A WAN ethernet port
<li>Four LAN ethernet ports
<li>A coax connector in case the internet signal is being
supplied using MOCA
<li>A four-wire RJ-11 phone jack for up to two phone lines
</ul>
If you have a good installer, they should be willing to let you decide
where you want to put your router, and run ethernet cable (or coax if
using MOCA) to that location, including drilling holes and installing
a wall jack.
</ol>
The internet signal from the ONT to the router can run either over an
ethernet cable or over a coax cable. If you are getting TV service, they
will have to run a coax cable for it. If your internet service
is slower than 100/100, it is possible to run the internet service over that
same cable to the MOCA-capable router. If your internet service is 100/100
or faster, you probably want to run it over an ethernet cable; and since you
might want to upgrade to 100/100 or faster service later, you should
probably have them install that ethernet cable now anyway and have
them run the internet signal through it to the router.
Plus, that gives you the option of replacing their router with one of
your own choice that doesn't do MOCA.
<a name="installation-process">
<h4>What is the installation process?</h4>
</a>
Installation of new FiOS service - not including preliminary research,
placing the order, and post-installation followup to correct problems -
consists of three sequential steps:
<ol>
<li>Locate existing utilities: one or more people come out with metal
detectors that they use to locate existing utilities such as power,
water, sewer, gas, phone, and cable, and paint different
<a href="https://en.wikipedia.org/wiki/Utility_location#Color-coding"
>colored lines</a>
marking those locations so that the fiber installers don't accidentally
damage the existing utilities.
<li>Bury fiber from curb to house: a fiber installer puts in that last piece
of fiber from the drop point (by the street near your house) to your
house, typically to the garage. In the other direction, the fiber at
the curb runs to a nearby junction box, where the installer connects
it to an available port.
At this point a signal is available at the fiber end by the house.
<li>Install equipment outside and inside the house: an equipment installer
installs the
<a href="#equipment">equipment</a> on the outside of your house and
inside your house, and connects everything up.
If you have existing POTS service and
are switching to FiOS phone service, the phone lines that lead into
the house are disconnected from the old copper lines and connected to
the output of the ONT.
The installer calls the plant and works with them to bring up
the services you have ordered.
</ol>
<a name="cost">
<h4>How much will it cost?</h4>
</a>
Perhaps because I am a long-time customer, Frontier did not charge me any
kind of installation fee, which was nice. I don't know if that is standard.
One person told me the regular installation fee is $80.
<br/><br/>
For the monthly fees, it may cost significantly more than you expect.
<br/><br/>
Frontier
<a href="https://frontier.com/shop/internet/fios/simply-100">advertises</a>
their 100/100 internet service as $60 per month.
They have not yet managed to send me a clean monthly bill since my
upgrade, but based on my estimate of what that monthly amount is going to
be, I believe the effective cost of my 100/100 service is actually
over $100 per month.
Here's how that breaks down:
<ul>
<li>The $60 rate is only if you sign a two year contract and only for the
first six months. This is stated in the fine print on their web page,
along with "Equip. and other fees apply."
I did not sign a contract, so my monthly fee is $85.
<li>After Frontier told me I was required to change my phone service to
a new plan, and then failed to deliver that plan, my old grandfathered-in
rate of $18.90 disappeared and was replaced by the $30.99 rate for
<a href="https://frontier.com/shop/phone/phone-challenger">Digital Phone Unlimited</a>,
despite the fact that I don't actually have that service.
So I am currently paying an additional $12.09 per month for exactly
the same phone service that I had before ordering FiOS internet service.
<li>Taxes look like they will be about an additional $6 per month.
</ul>
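To see how the advertised $60 turns into over $100, here is a
back-of-the-envelope check (a Python sketch using the figures above; the
tax amount is my rough estimate, not a number from Frontier):

```python
# Rough effective monthly cost of the "advertised $60" 100/100 service.
# All figures come from the breakdown above; taxes are an estimate.
internet_no_contract = 85.00  # no-contract rate instead of the $60 promo
phone_increase = 30.99 - 18.90  # new Digital Phone rate minus old rate
taxes_estimate = 6.00  # approximate additional taxes per month

effective_monthly = internet_no_contract + phone_increase + taxes_estimate
print(f"${effective_monthly:.2f}")  # → $103.09, i.e. over $100/month
```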
One other annoyance relating to cost: Frontier offered me a $100 gift card
for signing up with them for FiOS internet. When I went to activate the
gift card on their web site, I was presented with a terms and conditions
screen requiring me to agree to a new 1 year term agreement. I had chosen
not to sign a contract and to pay $85/month rather than $60/month, so it
felt kind of like they were trying to pull a fast one on me by hoping I
would activate the gift card without reading the fine print.
<a name="frontiers-problems">
<h3>Frontier's Problems</h3>
</a>
<ul>
<li>Frontier's web site does not provide very good information about
what service options are available.
<li>If you call their Customer Service outside of their working hours,
you get a message telling you they are closed, but that message does
not tell you when they are open, and it's not an easy thing to find
on their web site.
<li>Different people at Frontier will give you different answers to the same
questions. For example, I asked whether I would
need to upgrade my copper-wire phone service to fiber; some said
yes, some said no. Or sometimes first one answer then the other.
One person suggested I put my phone service on a separate account
from my internet service; another told me I could not do that.
<li>Frontier's phone bills provide tons of details about taxes, but almost
no details about regular charges. For example, I have two phone lines,
and for most of the last few years they were billed as one line item
labeled "Residence Line", with no indication that there were two lines.
<li>Frontier's computers significantly constrain what their people can see
and do. Or maybe their programs are just really hard to use.
The customer service reps can't see the details of service calls,
and the service techs can't see the account details. It is apparently
not obvious when a customer has multiple accounts being billed together.
And nobody could see anything about my DSL line.
</ul>
<a name="timeline">
<h3>Timeline</h3>
</a>
<table border=1 style="table-layout: fixed;">
<tr>
<th style="width:7em">Date</th>
<th>Event</th>
</tr>
<tr>
<td>2017-03-02 Th</td>
<td>Online chat #1 with Frontier (43 minutes)</td>
</tr>
<tr>
<td>2017-03-06 Mo</td>
<td>Online chat #2 with Frontier (23 minutes)</td>
</tr>
<tr>
<td>2017-03-15 We</td>
<td>Online chat #3 with Frontier (55 minutes)</td>
</tr>
<tr>
<td>2017-03-18 Sa</td>
<td>Online chat #4 with Frontier (20 minutes, then cut off)</td>
</tr>
<tr>
<td>2017-03-20 Mo</td>
<td>Online chat #5 with Frontier (estimated 20 minutes)</td>
</tr>
<tr>
<td>2017-03-21 Tu</td>
<td>Online chat #6 with Frontier (1 hour and 38 minutes)</td>
</tr>
<tr>
<td>2017-04-09 Su</td>
<td>Phone call with Frontier to order FiOS, service scheduled for Apr 11
(1 hour and 17 minutes)
</td>
</tr>
<tr>
<td>2017-04-11 Tu</td>
<td>Installer came out, but couldn't do anything because the fiber from
the curb to the house had not yet been buried
</td>
</tr>
<tr>
<td>2017-04-11 Tu</td>
<td>Called Frontier to reschedule installation, was told the current
installer has not yet entered his notes, please call back in 24 hours
(12 minutes)
</td>
</tr>
<tr>
<td>2017-04-13 Th</td>
<td>DSL service died at about 12:30pm
</td>
</tr>
<tr>
<td>2017-04-13 Th</td>
<td>Utility locators started painting colored lines where existing services
are buried
</td>
</tr>
<tr>
<td>2017-04-13 Th</td>
<td>Called Frontier to try to get DSL line fixed (24 minutes)
</td>
</tr>
<tr>
<td>2017-04-14 Fr</td>
<td>Fiber installers installed the curb-to-house fiber (before all the
locators had painted their lines)
</td>
</tr>
<tr>
<td>2017-04-14 Fr</td>
<td>Another locator came out to paint lines; when I pointed out that
the fiber had already been installed, he stopped painting, took
his final photos, and left
</td>
</tr>
<tr>
<td>2017-04-14 Fr</td>
<td>Called ISP to try to get DSL line fixed (12 minutes)
</td>
</tr>
<tr>
<td>2017-04-14 Fr</td>
<td>Called Frontier (multiple times) to check on status of FiOS order
(the fiber was installed this morning, but they said
the order had not yet been updated to show that)
(8 minutes + 13 minutes + 12 minutes + 25 minutes)
</td>
</tr>
<tr>
<td>2017-04-15 Sa</td>
<td>Called Frontier to check on the status of my FiOS order
(8 minutes)
</td>
</tr>
<tr>
<td>2017-04-18 Tu</td>
<td>Installer came out and completed the physical installation of
the equipment, got the FiOS internet service working.
He was unable to get the phones working over fiber, so switched
everything back to copper and left, with everything working
(3 hours and 10 minutes)
</td>
</tr>
<tr>
<td>2017-04-18 Tu</td>
<td>Called Frontier, canceled the remaining order to move the second
line over to fiber (scheduled for tomorrow)
(7 minutes)
</td>
</tr>
<tr>
<td>2017-04-25 Tu</td>
<td>Our main line stopped working and could not be reached from
outside our exchange
</td>
</tr>
<tr>
<td>2017-04-25 Tu</td>
<td>Called Frontier to report our main phone line not working
(44 minutes)</td>
</tr>
<tr>
<td>2017-04-26 We</td>
<td>Called Frontier to continue discussions about non-working phone
(1 hour and 35 minutes)</td>
</tr>
<tr>
<td>2017-04-30 Su</td>
<td>Received a call from Frontier at about 8:15am this morning
on the main line, he said it was now fixed (1 minute)</td>
</tr>
<tr>
<td>2017-05-01 Mo</td>
<td>Phone seems to have been working today, we received at least one
incoming phone call</td>
</tr>
<tr>
<td>2017-05-02 Tu</td>
<td>Called Frontier in the morning because my main phone was not working
again (39 minutes, then was cut off)</td>
</tr>
<tr>
<td>2017-05-02 Tu</td>
<td>My wife called Frontier mid-day about the non-working phone
(15 minutes)</td>
</tr>
<tr>
<td>2017-05-02 Tu</td>
<td>Called Frontier in the evening to continue the call from this morning
(14 minutes)</td>
</tr>
<tr>
<td>2017-05-04 Th</td>
<td>Frontier called, the line is working again</td>
</tr>
<tr>
<td>2017-05-08 Mo</td>
<td>Called Frontier to have them correct errors on my April bill
(the first received since I started FiOS service) (12 minutes)</td>
</tr>
<tr>
<td>2017-06-07 We</td>
<td>Received second bill since switching to FiOS - still wrong</td>
</tr>
<tr>
<td>2017-06-14 We</td>
<td>Called Frontier to deal with problems on my May bill (48 minutes)</td>
</tr>
</table>
<br/>
Total time (as of June 14): 20.3 hours
<ul>
<li>Web research: 5 hours
<li>Chat: 4.3 hours
<li>Place order: 1.3 hours
<li>Installer: 3.2 hours
<li>Followup phone calls (through June 14): 6.5 hours
</ul>
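Those line items do add up to the stated total:

```python
# Sanity check: the per-category hours above sum to the 20.3-hour total.
hours = {
    "Web research": 5.0,
    "Chat": 4.3,
    "Place order": 1.3,
    "Installer": 3.2,
    "Followup phone calls": 6.5,
}
total = round(sum(hours.values()), 1)
print(total)  # → 20.3
```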
<a name="quotes">
<h3>Selected Quotes</h3>
</a>
I took notes on all my phone calls with Frontier, including writing down
certain things verbatim. For your entertainment, I present here some of those
quotes, in no particular order. I will let you imagine the context.
<ul>
<li>That is very confusing.
<li>Why can't I see that one?
<li>I don't know why they didn't just leave it alone.
<li>The program is wrong.
<li>How are you an R-U out of Washington?
<li>... and that's what I'm not seeing.
<li>We don't do these very often.
<li>Within our system we have nine different portals where we have to test things.
<li>This is very new to me, I have never dealt with two lines like this.
<li>Sorry this is taking so long, we'll get it figured out for you.
<li>It's not giving me anything.
</ul>
<br/><br/>
<h2>The Rule Of Law</h2>
<i>2014-12-21</i>
<br/><br/>
A layman's view of The Rule of Law. IANAL.
<br />
<h3>
Contents</h3>
<ul>
<li><a href="#this-i-believe">This I Believe</a>
</li>
<li><a href="#importance-of-law">The Importance of Law</a>
</li>
<li><a href="#prescriptive-and-proscriptive-law">Prescriptive and Proscriptive Law</a>
</li>
<li><a href="#law-vs-convention">Law versus Convention</a>
</li>
<li><a href="#multiple-systems-of-law">Multiple Systems of Law</a>
</li>
<li><a href="#meta-law">Meta Law</a>
</li>
<li><a href="#law-and-software">Law and Software</a>
</li>
<li><a href="#law-and-oop">Law and Object-Oriented Programming</a>
</li>
<li><a href="#law-and-mind">Law and Mind</a>
</li>
</ul>
<a href="https://www.blogger.com/null" name="this-i-believe"></a>
<h3>
This I Believe</h3>
Some years ago
<a href="http://www.npr.org/">NPR</a>
started running a series called
<a href="http://www.npr.org/templates/story/story.php?storyId=4538138">
This I Believe</a> as a tribute to
<a href="http://en.wikipedia.org/wiki/Edward_R._Murrow">Edward R. Murrow</a> and his
original 1951 radio program of the same name.
As I commuted I would occasionally catch an episode and hear an essay
about the topic in which a contributor believed.
I would listen to an essay on a weighty topic such as
<a href="http://thisibelieve.org/essay/10/">God</a>,
<a href="http://thisibelieve.org/essay/13/">Love</a>,
<a href="http://thisibelieve.org/essay/8/">Funerals</a>,
<a href="http://thisibelieve.org/essay/12/">Good and Evil</a> or
<a href="http://thisibelieve.org/essay/15/">Public Service</a>
and think, "no", "maybe", or "yeah, sure".
Then one day I heard Michael Mullane's essay on
<a href="http://thisibelieve.org/essay/9574/">The Rule Of Law</a>
and I thought "Yes! <i>This</i> I believe!"
<br />
<br />
I particularly liked Michael's point that the
<a href="http://www.austlii.edu.au/au/journals/MqLJ/2004/7.html">
Tinkerbell effect</a> applies to the Rule of Law.
As he says, God exists (or does not exist) whether or not you or
I believe that to be so,
but with the Rule of Law, it can only exist if almost all of us
believe in it and follow it.
As Death says to a human near the end of Terry Pratchett's
<a href="http://www.terrypratchettbooks.com/index.php/us/books">Discworld</a> book
<a href="http://books.google.com/books?id=HeVsZo0E7ZkC&printsec=frontcover">Hogfather</a>,
"Y<span>OU NEED TO BELIEVE IN THINGS THAT AREN'T TRUE</span>.
H<span>OW ELSE CAN THEY <i>BECOME</i></span>?"
(Death always talks <span>IN CAPITAL LETTERS</span>.)
<a href="https://www.blogger.com/null" name="importance-of-law"></a>
<br />
<h3>
The Importance of Law</h3>
Why is law important?
The American
<a href="http://www.archives.gov/exhibits/charters/declaration_transcript.html">Declaration of Independence</a>
asserts that "all men are created equal"
and
<a href="http://www.un.org/en/documents/udhr/">
The Universal Declaration of Human Rights</a>
asserts that "all human beings are born free and equal in dignity and rights."
To support that position we need a system of law that in fact
treats all people equally.
But even if the law does not protect all of the fundamental human rights,
it can provide an important benefit to its society:
<a href="http://www.utexas.edu/law/conferences/measuring/The%20Papers/Rule%20of%20Law%20Conference.crosslindquist.pdf">stability</a>
through
<a href="http://en.wikipedia.org/wiki/Legal_certainty">predictability</a>.
<br />
<br />
To be predictable, the system of laws must be:
<br />
<ul>
<li>Understandable - the laws can be understood by most people.
</li>
<li>Consistent - individual laws do not conflict with each other.
</li>
<li>Extensive - the laws cover all common situations and a large portion
of less common situations.
</li>
</ul>
There have been many successful nations that followed the Rule of Law
with different laws for different classes of people,
including
<a href="http://en.wikipedia.org/wiki/Social_class_in_ancient_Rome">Rome</a>
and
<a href="http://en.wikipedia.org/wiki/Slavery_in_ancient_Greece#Origins_of_slavery">Greece</a>.
A system of law can provide stability and a foundation for an orderly
and effective society without treating all people equally.
<a href="https://www.blogger.com/null" name="prescriptive-and-proscriptive-law"></a>
<br />
<h3>
Prescriptive and Proscriptive Law</h3>
Prescriptive laws are those that tell us what we must do, such as
<a href="http://en.wikipedia.org/wiki/Honour_thy_father_and_thy_mother">
Honour thy father and thy mother</a>.
Proscriptive laws are those that tell us what we must not do, such as
"Thou shalt not kill".
<br />
<br />
You can think of prescriptive law as
additive manufacturing:
you can start with nothing, and add pieces until you get something useful,
like building up a sculpture by adding little pieces of clay, or
<a href="http://en.wikipedia.org/wiki/3D_printing">3D printing</a>.
<br />
<br />
Proscriptive law is more like
subtractive manufacturing:
you start with a block of something and carve away pieces until
you get the desired result,
like starting with a chunk of marble and carving a sculpture out of it, or
<a href="http://en.wikipedia.org/wiki/Machining">machining</a>.
<br />
<br />
(But don't try searching for additive law or subtractive law
unless you are working with primary colors. :-) )
<br />
<br />
Given the assumption of freedom
in both of the Declarations <a href="#importance-of-law">above</a>,
it's easier to start by saying people can do anything, then add
proscriptive laws specifying what they can't do.
Compared to the complete freedom and anarchy of a society with no laws,
you can get pretty far down the road to stability just with proscriptive laws.
Of the
<a href="http://en.wikipedia.org/wiki/Ten_Commandments">Ten Commandments</a>,
eight are proscriptive and only two are prescriptive.
<a href="https://www.blogger.com/null" name="law-vs-convention"></a>
<br />
<h3>
Law versus Convention</h3>
While the Rule of Law normally refers to the explicit and codified laws
on the books, which can be enforced by the state, there is another set
of rules that most of us live by which are not legally mandated.
These conventions include social guidelines that prescribe how to
behave and communicate, including when and how it is appropriate to
touch (such as shaking hands or a pat on the back),
to ask for something (with "please" and "thank you"),
to offer advice
("<a href="http://jim-mcbeath.blogspot.com/2008/09/true-kind-necessary.html">true/kind/necessary</a>")
or <a href="http://jim-mcbeath.blogspot.com/2008/12/apology-abcs.html">apologies</a>,
and many other behaviors.
<br />
<br />
These conventions don't have the force of law.
If you break these rules, you won't be sent to jail or be forced
to pay someone monetary damages -
but you might find that you are a little less successful and your
life might be a little less pleasant.
Like laws, conventions are only useful if most of us agree on them,
and like laws, a widely accepted and understood set of conventions
helps make the world a little bit more predictable, which in turn
makes it a little bit easier for people to make plans and be successful.
<br />
<br />
In effect, social conventions are simply another layer of "laws"
that sit below the constitutional laws and the statute laws
(and in reality the American legal system has many other levels than
just those two).
<a href="https://www.blogger.com/null" name="multiple-systems-of-law"></a>
<br />
<h3>
Multiple Systems of Law</h3>
I am intrigued by the fact that we have so many different implementations
of the Rule of Law.
Every nation on Earth that abides by the Rule of Law has its own
system of law.
The ways in which the laws of nations interact is as varied as the
relationships between the nations.
For example, American Law has specific sections dealing with the
fact that there are Native American "domestic dependent nations"
within its borders that have their own laws.
<br />
<br />
Similarly, every nation has a different set of social conventions,
those unwritten rules that lubricate our everyday interactions.
<br />
<br />
On top of all those different systems of law, we have
<a href="http://www.hg.org/international-law.html">International Law</a>,
with the intent of providing structure for interactions between nations
when those nations have different and possibly incompatible systems of laws.
Two aspects of International Law that I find particularly thought-provoking
are the
<a href="http://en.wikipedia.org/wiki/Law_of_war">Law of War</a>, and
<a href="http://www.law.cornell.edu/wex/jurisdiction">Jurisdiction</a>.
<br />
<br />
(For an interesting bit of history about Jurisdiction, read about
<a href="http://en.wikipedia.org/wiki/Peine_forte_et_dure">Peine forte et dure</a>.)
<a href="https://www.blogger.com/null" name="meta-law"></a>
<br />
<h3>
Meta Law</h3>
In order to be predictable, the laws must be stable and not change often;
but the laws must sometimes be changed in order to cover new situations
or to correct problems in existing laws.
One approach to improving the predictability of the system of laws
while still allowing for change is
to use a layered approach, where some laws are considered more important
than others and are thus harder to change.
The set of harder-to-change laws typically includes
the rules on how to change the laws.
This is the basis of the constitutional model, as is used in the United States,
in which the most important laws are embodied in the constitution,
with rules that make those laws much harder to change than regular laws.
A constitution will typically include rules on how both
"normal" rules and the rules embodied in the constitution can be changed.
<br />
<br />
Back in 1982, a "constitution" game by Peter Suber called Nomic
appeared in
Douglas R. Hofstadter's column, "Metamagical Themas," in
<i>Scientific American</i>.
In this game, players take turns proposing changes to the rules of the game.
The rules start out in two categories, "immutable" and "mutable",
corresponding to the simple two-level "constitutional" and "statute" law
that Americans are taught in civics classes.
The rules of the game tell how a player wins the game, and also tell
how the rules can be changed - including how to change the rules that
tell how to win and how to change the rules.
The Nomic game is intended to illustrate the mechanisms and possibilities
described in Peter Suber's book
<i>The Paradox of Self-Amendment</i>, available
<a href="http://www.earlham.edu/~peters/writing/psa/index.htm">online</a>.
For the quickest read on the game, you can jump straight to the
<a href="http://www.earlham.edu/~peters/writing/nomic.htm#101">rules</a>,
but the
<a href="http://www.earlham.edu/~peters/writing/nomic.htm">game description</a>,
although somewhat lengthy, is also interesting.
<br />
<br />
In Suber's book he starts by asking how a legal system can deal with
paradox, when there are laws that directly contradict each other, and he
<a href="http://www.earlham.edu/~peters/writing/psa/pref1.htm">notes</a> that
"paradoxes come and go without much notice and are dealt with without much ado."
<br />
<br />
Given that systems of laws seem always to be self-referential (since they
include rules about how to change the rules), attempting to craft a
system of laws that is also complete and consistent would seem to run into
a version of
<a href="http://math.stanford.edu/~feferman/papers/Godel-IAS.pdf">
Gödel's Incompleteness Theorem</a>.
In practice, systems of laws are not really complete and still
blithely violate consistency, yet manage to be quite useful despite
their flaws.
<a href="https://www.blogger.com/null" name="law-and-software"></a>
<br />
<h3>
Law and Software</h3>
The title of this section might refer to laws that affect software,
such as copyright law,
or it might refer to the use of software to assist in the application
of law, such as computerized law indexes or
<a href="http://www.yalelawjournal.org/the-yale-law-journal/content-pages/regulation-by-software/">
Regulation by Software</a>;
but in fact, I am referring to the use of law as a concept in defining
how software works.
<br />
<br />
As in a society, a programming language is built on a set of rules that
describe how statements in the language are interpreted by the computer.
The developer uses his knowledge of these rules to create a program that
instructs the computer to do something that is useful to the developer.
<br />
<br />
Imagine trying to program in a computer language with no rules.
How could you get anything done?
You could never predict the results of a statement, so you could never
make a program that produced anything predictable.
<br />
<br />
Just as different societies each have their own set of rules,
different programming languages each have their own set of rules.
And just as with social conventions, different groups of programmers
typically adopt programming conventions that are not enforced by the
compiler but are intended to make life a bit simpler for the
developers in the group.
<br />
<br />
In fact, all of the concepts discussed above are applicable to software.
Keep that in mind as we take a look at what it means
to define software in terms of laws.
<a href="https://www.blogger.com/null" name="law-and-oop"></a>
<br />
<h3>
Law and Object-Oriented Programming</h3>
Back in 1987
<a href="http://scholar.google.com/citations?user=VLgJXtQAAAAJ&hl=en">Naftaly Minsky</a> and David Rozenshtein published
<a href="http://scholar.google.com/citations?view_op=view_citation&hl=en&user=VLgJXtQAAAAJ&citation_for_view=VLgJXtQAAAAJ:_FxGoFyzp5QC">"A Law-Based Approach to Object-Oriented Programming"</a>
(available for purchase
<a href="http://dl.acm.org/citation.cfm?id=38851">on-line</a>)
in which they discussed how an object-oriented system can be described
in terms of the laws that control the exchange of messages between objects.
(Minsky has published quite a few
<a href="http://scholar.google.com/scholar?hl=en&q=naftaly+minsky">
other papers</a>
on related topics concerning law and software.)
<br /><br />
They start by defining objects as containing state and program, with
four primitive messages (prefixed by the octothorpe character, #)
to create (#new) and destroy (#kill) objects and
to get (#get) and set (#mutate) state.
Messages are defined as a triplet of sender, message text, and target.
Message delivery goes through the law system, which can take one of
three actions:
<br />
<ol>
<li>The message can be delivered to its target.
</li>
<li>The message text and/or target can be modified and then delivered.
</li>
<li>The message can be blocked and thus not delivered.
</li>
</ol>
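As a rough illustration (my own sketch in Python, not Minsky and
Rozenshtein's actual formalism), the triplet-plus-law mechanism might look
like this; only the #get and #mutate primitives are sketched, and the
"modify" case is handled by the law returning an altered text or target:

```python
# Sketch of law-based message delivery: every message is a
# (sender, text, target) triplet, and a "law" function decides whether
# to deliver it, rewrite it, or block it.

class Obj:
    """A trivial object holding state; a real system would also hold a program."""
    def __init__(self, name, state=None):
        self.name = name
        self.state = state or {}

    def receive(self, sender, text):
        verb, *args = text
        if verb == "#get":
            return self.state.get(args[0])
        if verb == "#mutate":
            self.state[args[0]] = args[1]
            return None
        raise ValueError(f"unknown primitive {verb}")

def permissive_law(sender, text, target):
    # Any object may send any message to any other object.
    return ("deliver", text, target)

def encapsulation_law(sender, text, target):
    # Only an object itself may touch its own state with #get/#mutate.
    if text[0] in ("#get", "#mutate") and sender is not target:
        return ("block", None, None)
    return ("deliver", text, target)

def send(law, sender, text, target):
    action, new_text, new_target = law(sender, text, target)
    if action == "block":
        raise PermissionError(f"law blocked {text[0]} from {sender.name}")
    return new_target.receive(sender, new_text)

a = Obj("a", {"x": 1})
b = Obj("b")
print(send(permissive_law, b, ("#get", "x"), a))    # delivered: prints 1
send(encapsulation_law, a, ("#mutate", "x", 2), a)  # self-access allowed
try:
    send(encapsulation_law, b, ("#get", "x"), a)    # blocked by the law
except PermissionError as e:
    print(e)
```

Swapping the law function changes the character of the whole system
without touching the objects, which is how a single mechanism can yield
encapsulation, delegation, or security checks.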
With these definitions and a permissive law that allows any object to
send any message to any other object, the system does not exhibit
many of the characteristics typically associated with object-oriented
systems.
<br /><br />
They then examine the effect of different kinds of laws,
such as allowing primitive messages to be sent only by the same object.
Through this approach they show how to implement common
object-oriented features such as
encapsulation, inheritance, and class variables
as well as less common features such as multiple inheritance,
exclusion of methods from inheritance, and triggers.
<br /><br />
Given that the program is part of the state of the object,
it can be modified with a #mutate message,
so it is possible to describe self-modifying programs within this framework.
The laws of the system control whether and how this message is
allowed to be sent.
<br /><br />
By defining the laws in objects that
are themselves part of the system, those laws can then be changed.
The system could start with a separate subset of laws that control
how the laws can be changed, making this approach look very much like
Suber's Nomic game.
<br /><br />
The law system allows the laws to modify the content of
a message or redirect it to a different target,
allowing for the implementation of security checks and other
forms of enforced delegation.
<br /><br />
I am not aware of a production system that directly uses this style of
law-based control of message passing, but there are some systems that
use a conceptually similar method of applying a set of rules to some messages
to control their delivery.
For example, in the
<a href="https://docs.oracle.com/javase/tutorial/essential/environment/security.html">
Java security model</a>
different environments can
have different implementations of the SecurityManager, each with its
own definition of the security policy (i.e. rules) that controls
whether certain actions are allowed to be taken, which can be viewed as
allowing the messages requesting those actions to be delivered.
The
<a href="http://moi.vonos.net/java/osgi-security/">OSGi security model</a>
goes further towards being a general law-based system,
including the ability to specify rules via a string and
to compose multiple security policies.
<a href="https://www.blogger.com/null" name="law-and-mind"></a>
<h3>
Law and Mind</h3>
For both societies and software, laws are rules telling us what we must and
must not do, and conventions are rules telling us what we should and
should not do.
By following these rules and conventions, a society or a software system
can be far more productive than one with the same underlying capabilities
but where the rules and laws are less cohesive and effective
or are not followed.
<br /><br />
Could it be that the same is true for our minds?
According to Marvin Minsky's theory of the mind as set forth in
<a href="http://www.amazon.com/The-Society-Mind-Marvin-Minsky/dp/0671657135">
Society of Mind</a>,
our minds are composed of many small agents communicating with each other.
Minsky's agents are very small pieces, and the communication between them
is below our level of awareness.
Perhaps our minds use something like Naftaly Minsky's law-based message
delivery mechanism to monitor and control these low-level communications
between agents.
<br /><br />
Maybe the biggest difference between people who are productive and
those who are not is in the different internal rules the two minds follow,
and not so much a difference in raw underlying capability.
Maybe productive people have a better set of mental rules controlling
the messages within their minds.
And if that is true, that leads to an interesting question:
to what extent is it possible for people to rewrite
their own low-level internal communication rules
to improve their performance,
and how might that be accomplished?
<br/><br/>
<h2>From Obvious To Agile</h2>
<i>2014-07-06</i>
<br/><br/>
What do you do when obvious isn't?
<h3>Installing new fence posts</h3>
Many years ago I had a fence that needed to be repaired.
I got a recommendation for a fence repair man from a friend and had him
come out to take a look.
He said the panels between the posts were fine and did not need to be
replaced, I just needed new posts.
He quoted me a price for installing new fence posts
that seemed quite reasonable, and I accepted his bid.
<br/><br/>
A few days later he came back to do the job.
After he had been out there working for a while, I went out to take a look.
I was surprised when I saw how he had installed the new fence posts.
He had not removed the old posts and put new posts in their places,
as I had assumed; instead, he simply planted a new post next to each
old post and strapped them together.
I was flabbergasted, and complained to him that my expectation was that
he was going to take out the old posts and replace them with new posts.
He was nonplussed. "I told you I would install new posts," he said.
"Taking out the old posts would be way more work, and I would have to
charge you more."
<br/><br/>
Well, he had me: he had indeed said only that he would install new posts.
I was the one who assumed he would take out the old posts.
I grumbled, paid him extra to replace a few of the old posts where it was
particularly troublesome to have an extra post sticking out,
and had the whole fence replaced the right way a few years later.
<h3>Keep using gmail</h3>
One of the startups at which I worked used gmail and was acquired by
a large company that used Exchange.
Concerned about the possibility of having to move to
what we felt was a worse system,
we asked what would happen with email.
We were relieved when they said we could keep using gmail.
<br/><br/>
On the very first day that we were officially part of the new company,
we were all told that we now had Exchange email accounts.
"Hey!," we said, "you told us we could keep our gmail accounts."
"Yes, you can," came the response, "but you also need to have an
Exchange account for all official company email."
<br/><br/>
This was, of course, not what we had expected when we asked if we could
keep our gmail accounts.
But, as with the new fence posts, they had in fact kept their word and
let us keep our gmail accounts;
it was we who assumed that that would continue to be our only email account.
<h3>Everything under SCCS</h3>
At one of the places I worked, we hired a contractor to work on a subsystem.
At one point we became concerned about how he was managing his source code,
so we asked how he was doing that.
"Everything is under sccs," he said.
(This was well before the days of
<a href="http://git-scm.com/">git</a>,
<a href="http://subversion.apache.org/">subversion</a>,
<a href="http://www.nongnu.org/cvs/">cvs</a>, or even
<a href="https://www.gnu.org/software/rcs/rcs.html">rcs</a>; at the time,
<a href="http://en.wikipedia.org/wiki/Source_Code_Control_System">sccs</a>
(Source Code Control System)
was what most people in our industry were using.)
When he finally delivered the source code to us, we were annoyed to discover
that he simply had a directory named "sccs", and all of his source code
was contained in that directory; there was in fact no versioning or history.
<br/><br/>
Once again, this was not what we had expected.
When he said "sccs" we assumed he was talking about the source code
control system, when in fact he was just referring to a directory name;
and when he said "under" we assumed he meant "managed by", when in fact
he just meant "contained in."
<h3>A new and improved version of Android</h3>
My first smart phone was an Android phone running version 2.2.
I watched as the newer versions of Android came out, filled with
interesting new features.
Finally, an over-the-air update was available for my phone.
I eagerly updated and started playing with the new features.
My first disappointment was with the new and definitely not improved
performance: my phone was slow and laggy, and it no longer lasted
even one day on a full charge.
<br/><br/>
I was even more dismayed to discover that they had removed USB
<a href="http://www.phonescoop.com/glossary/term.php?gid=356">Mass Storage Mode</a>
(MSC or UMS)
and replaced it with a significantly less functional
alternative,
<a href="http://www.phonescoop.com/glossary/term.php?gid=505">MTP</a>
(Media Transfer Protocol).
In my case, it was completely non-functional for my use, because my
home desktop machine was running Linux, and at the time there was not
a working Linux driver for MTP mode.
<br/><br/>
I was, as you might expect, pretty ticked off.
I had assumed without thinking about it
that they would not remove a significant feature from
a new version of the software, but they never said that.
<h3>Alternate Interpretations</h3>
Ask yourself: when reading the above anecdotes, did you realize in advance
of the denouement what the problem would be in each of them?
If it had been you, would you have made the same assumptions as I did?
<br/><br/>
Sometimes something seems so obvious to us that it does not even cross
our minds that there might be an alternate interpretation.
<br/><br/>
I don't think it is possible for us to see these alternative interpretations
in every case; often the situation is one with which we have had no experience,
and so one we could not be expected to anticipate.
We do, of course, sometimes consider alternative interpretations.
In the future, if someone tells me they will install new fence posts,
I will be sure to ask for more details.
But we have to make assumptions as we deal with the world every day.
If we examined every statement and every experience
for alternative interpretations, that would
consume all of our time, and we would not have any time left to
pursue new thoughts.
We learn to make instant and unconscious judgment calls:
as long as what we hear and see has a high enough probability of an
unambiguous interpretation, the possibility that there is an alternate
interpretation does not bubble up to our conscious minds.
Overall this is a very effective strategy that lets us focus
our mental energies on situations where an unusual outcome is
more likely.
But this does mean that every once in a while we will miss something,
with undesired results.
<h3>Going beyond obvious</h3>
I have already given my recommendation to
<a href="http://jim-mcbeath.blogspot.com/2008/10/state-obvious.html">State The Obvious</a>.
However, as you can see from the above anecdotes, this is not always enough.
But what else can we do?
<br/><br/>
If you consider the anecdotes above,
you might notice that, in most of them,
by the time I realized that I had made an incorrect assumption,
the deed was done and I was stuck with an undesired result.
But the fence post story was a little different:
in that case, I checked up on the work before it was done.
Because I discovered the problem while it was happening,
I was able to ask for changes and get a result that
was closer to what I wanted.
<h3>Software Development</h3>
Not all of my blog posts are about software development,
but in this case the application is obvious.
Well, it seems obvious to me, but just in case it is not obvious
to everyone, I will follow my own advice and explain in detail.
<br/><br/>
In the traditional
<a href="http://en.wikipedia.org/wiki/Waterfall_model">waterfall</a> process,
a complete and detailed specification of the desired system is created
before doing any of the implementation work.
Once that spec is done, the system is built to match it.
But, as we have seen from the anecdotes above,
even a very simple spec, such as "install new fence posts", might be
interpreted in a bizarre way that still matches the letter of the specification.
In this case, the result might be something that arguably
matches what was specified, but is not what was wanted.
<br/><br/>
Based on my personal experience and anecdotes I have heard from others,
I believe that it is <i>very</i> difficult to write a good spec
for something new,
and impossible to
<a href="http://www.navair.navy.mil/nawctsd/Resources/Library/Acqguide/SpecWrit.htm">write a spec</a>
that can not be interpreted by somebody
in some bizarre way that satisfies the spec but is not the desired result.
<br/><br/>
Given that we can't guarantee that we can write a spec that will not be
misinterpreted, what is the alternative?
I think the only alternative is to do what I did in the fence-post case:
check up on the work and make corrections along the way.
This is embodied in a couple of the value statements in
<a href="http://agilemanifesto.org/">The Agile Manifesto</a>:
"Customer collaboration over contract negotiation" and
"Responding to change over following a plan".
<br/><br/>
If you are asking someone to create something that is very similar to
things that have been created before,
and through previous common experience there is already a shared
vocabulary sufficient to describe how the desired result compares to
those previous creations,
then you can perhaps write a spec that will get you what you want.
The closer the new thing is to those previously created things, the
easier that will be.
But in software development, where the goal is often specifically to
create something novel,
this is particularly difficult.
In that situation, I think that creating and then relying solely on a detailed
spec is less likely to result in a satisfactory outcome;
I believe an agreement on direction and major points, followed by
keeping a close eye on progress,
paying particular attention when something is being done for the first time,
is the key to good results.
<h3>Writing a Spec</h3>
I'm not saying
<a href="https://gettingreal.37signals.com/ch11_Theres_Nothing_Functional_about_a_Functional_Spec.php">
don't write a spec</a>.
I'm saying you need to recognize that a spec
<a href="http://blog.codinghorror.com/dysfunctional-specifications/">
won't take you all the way</a>,
and a poorly written spec can
<a href="http://yarchive.net/comp/linux/specs.html">hinder your progress</a>.
Writing a spec is like looking at a map and planning your route:
often necessary but seldom sufficient.
You need to be prepared for construction closures, blocking accidents,
or even additional interesting sights you might decide to see along the way.
For any of these diversions, you will need to reexamine your route in
the middle of the trip and select an alternative.
For a short trip, you might not run into any such problems and thus
not need to modify your route,
but the longer the journey the more likely that at some point you will
need or want to deviate from your original route.
<br/><br/>
If you are familiar with the roads and have a clear destination,
you might be able to dispense with the initial route planning completely:
just head in the right direction and follow the signs.
Or if you are on a discovery road trip and don't have a specific
destination, then heading out without a planned route is fine.
In most cases, though, some level of advance route planning will save time.
You just need to stay agile and be prepared to change your route
along the way.
<h2>Code Guidelines</h2>
<i>2013-11-03</i>
<br/><br/>
A list of basic goals for creating code.
<br />
<br />
In our team project at work, we wanted to have a set of style guidelines
to allow everyone to more easily and quickly read the codebase and to
avoid spurious code reformatting changes.
As you might expect, there were different opinions on many points.
To avoid fruitless "my way is just better" discussions,
I wanted to step back and make sure we could all agree on some
general goals.
With that agreement in place, we could at least ask people to
explain how their preferred style on some point supports our
general goals.
If nobody can provide an argument to support a favored construct,
we might as well flip a coin.
<br />
<br />
Below are the goals I proposed and with which the team agreed.
I think many of these are obvious, but then I usually believe in
<a href="http://jim-mcbeath.blogspot.com/2008/10/state-obvious.html">
stating the obvious</a>.
The first two criteria below are also listed in my post on
<a href="http://jim-mcbeath.blogspot.com/2008/10/software-quality-dimensions.html">
Software Quality Dimensions</a>.
Your team may choose slightly different guiding principles, but I think
having the team agree on and write down their principles and asking
people to justify their proposed standards against those principles can
help short-circuit disagreements that might otherwise take longer to resolve.
<br />
<h3>
Goals</h3>
In order of priority, with the most important criteria first:
<br/><br/>
<b>First</b>, we want our code to be correct.
<br />
This means that the code must:
<br />
<ul>
<li>perform the desired primary behavior.
</li>
<li>behave in a defined way for expected error conditions.
</li>
<li>not have undesirable side-effects.
</li>
<li>not have security vulnerabilities such as buffer overflows or injections.
</li>
<li>not have memory problems such as leaks or use of released or uninitialized memory.
</li>
<li>run fast enough for the intended use cases
(but without <a href="http://c2.com/cgi/wiki?PrematureOptimization">premature optimization</a>).
</li>
</ul>
<b>Second</b>, we want our code to be robust.
<br />
This means that the code should be written in such a way as to minimize the probability of incorrect behavior under a wide range of conditions, including when:
<br />
<ul>
<li>it receives unexpected, corrupted, or no input data
(<a href="http://searchnetworking.techtarget.com/definition/graceful-degradation">graceful degradation</a>).
</li>
<li>a programmer unfamiliar with the code makes changes to it.
</li>
<li>the functionality of neighboring code changes.
</li>
<li>the development environment or toolset changes.
</li>
</ul>
<b>Third</b>, we want our developers to be as productive as possible.
<br />
This means the code should be written such that:
<br />
<ul>
<li>developers are unlikely to misunderstand what the code does
(<a href="http://c2.com/cgi/wiki?PrincipleOfLeastSurprise">principle of least surprise</a>).
</li>
<li>developers can read and understand the code quickly.
</li>
</ul>
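To make these goals concrete, here is a small hypothetical example of the robustness goals in action: a utility function whose behavior is defined for every kind of bad input it might receive. The function name and default value are invented for illustration.

```python
def parse_port(value, default=8080):
    """Parse a TCP port from a string, degrading gracefully on bad input.

    Defined behavior for expected error conditions: None, empty,
    non-numeric, or out-of-range input all fall back to the default
    rather than raising, so callers see no surprising failures
    (principle of least surprise).
    """
    try:
        port = int(value)
    except (TypeError, ValueError):
        return default
    # Ports outside the valid range count as bad input too.
    if not 1 <= port <= 65535:
        return default
    return port
```

A programmer unfamiliar with this code can read the docstring, see the full set of behaviors in a few lines, and change it with little risk of surprise.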
<h2>Role-Based Authorization</h2>
<i>2012-10-24</i>
<br/><br/>
A simple, uniform, powerful and extensible authorization model.
<ul>
<li><a href="#intro">Introduction</a>
<li><a href="#separation-of-concerns">Separation of Concerns</a>
<li><a href="#users">Users</a>
<li><a href="#actions">Actions</a>
<li><a href="#objects">Objects</a>
<li><a href="#roles">Roles</a>
<li><a href="#role-activation">Role Activation</a>
<li><a href="#role-hierarchies">Role Hierarchies</a>
<li><a href="#alternate-hierarchy-implementation">Alternate Hierarchy Implementations</a>
<li><a href="#interlude">Interlude</a>
<li><a href="#tasks">Tasks</a>
<li><a href="#domains">Domains</a>
<li><a href="#intermediate-summary">Intermediate Summary</a>
<li><a href="#times">Times, Periods and Schedules</a>
<li><a href="#locations">Locations, Areas and Regions</a>
<li><a href="#denials">Denials</a>
<li><a href="#exceptions">Exceptions</a>
<li><a href="#prioritization">Prioritization</a>
<li><a href="#summary">Summary</a>
</ul>
<h3><a name="intro">Introduction</a></h3>
The "three As" of security are:
<ul>
<li>Authentication - assuring that the user is who he says he is.
<li>Authorization - allowing each authenticated user to perform selected
privileged actions.
<li>Audit - recording privileged actions to allow review of changes
or potential abuse of privileges.
</ul>
Given authentication and auditing, it is pretty simple to add a bit
more monitoring that is very useful for billing purposes and
resource management, so you more often see the combination
AAA (<a href="http://en.wikipedia.org/wiki/AAA_protocol">Authentication, Authorization, Accounting</a>) or
AAAA (Authentication, Authorization, Audit, Accounting).
<br/><br/>
In this post I discuss only authorization.
Authentication and auditing are each big topics,
so I won't try to cover them here.
Similarly, I assume that the code and data are themselves secure.
In particular, I do not cover the issue of multiple security domains
and the problem of having lower security code make requests to
higher security code.
<br/><br/>
With my focus only on authorization,
in the discussion below I assume that the user has been authenticated
so that we can trust that piece of data within the application.
<br/><br/>
I will use the language of relational databases in this post
because it is well-known and precise.
An implementation of this model can use some other mechanism to
store and query the authorization data.
The SQL examples provide precision to the discussion, but you should
be able to skip the SQL code and still gain a basic understanding of the model.
<br/><br/>
In the SQL example code I indicate replacement variables within braces;
for example the string <code>{user}</code> in a SQL statement indicates
that the application should plug in the user name at that point
in the expression.
For a real implementation, the actual syntax would depend on the
database access package in use.
<br/><br/>
I have run into some authorization systems intended to provide
a powerful set of capabilities for a complex situation
that were, unfortunately, themselves so complex as to make it
difficult to understand how they were supposed to work, and
even after having it explained, difficult to remember because
there was not a simple underlying model to tie it all together.
<br/><br/>
In this post I present an approach to authorization
that I believe provides a very high
level of power with a model that is relatively simple to understand
and to extend as needed.
This model initially implements a
<a href="http://en.wikipedia.org/wiki/Role-based_access_control">
Role-Based Access Control</a> (RBAC) mechanism,
a widely used approach to security that is now a
<a href="http://csrc.nist.gov/groups/SNS/rbac/">NIST standard</a>.
I add a few extensions to the common model that make it start to look
more like an
<a href="http://www.axiomatics.com/beyond-rbac.html">
Attribute-Based Access Control</a> (ABAC) model.
<h3><a name="separation-of-concerns">Separation of Concerns</a></h3>
In an authorization system, we want to
<a href="http://effectivesoftwaredesign.com/2012/02/05/separation-of-concerns/">
separate</a>
the management of authorization from the application.
The application should ask the authorization system for permission
to do what it wants to do.
All management of the granting of authorizations is
handled within the authorization system,
completely outside of the application.
If you build a system in which any of the abstractions used in the
management of authorizations,
such as roles, appear in the application, then, as they say,
<a href="http://lostechies.com/derickbailey/2011/05/24/dont-do-role-based-authorization-checks-do-activity-based-checks/">
you are doing it wrong</a>.
<br/><br/>
In this post I focus only on the part of the system that determines
whether to grant authorization.
A separate system is required to maintain the data that is used by
the authorization system.
That maintenance can become quite complex in enterprise systems,
but I will not be discussing it further in this post except to
mention that the authorization mechanism described here can be
applied to the system that maintains the authorization data in order
to control who is allowed to modify what parts of that data.
<h3><a name="users">Users</a></h3>
Let's start with perhaps the simplest useful authorization model possible.
We begin with a one-column <i>user</i> table containing user names.
<br/><br/>
<input id="hlButton" type="submit" value="Highlight Syntax" onclick="highlightSyntaxFlipButton()"/>
<pre name="hlcode" class="sql"
>create table user(name varchar(32) primary key);
</pre>
When the application wants to check for our sole authorization,
it takes a passed-in authenticated user name
and calls the authorization function with that value.
The authorization function just checks to
see if that user exists in the table.
If so, the user is authorized and the authorization function returns true;
if not, the user is not authorized and the authorization function returns false.
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from user where name={user};
</pre>
The user-only model is too simple for most applications.
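As a sketch of what the authorization function might look like in application code, here is a minimal Python version using an in-memory SQLite database; the function name and sample user are hypothetical, and a real implementation would use whatever database access package is in use.

```python
import sqlite3

# In-memory database standing in for the real authorization store;
# the sample user is invented for illustration.
db = sqlite3.connect(":memory:")
db.execute("create table user(name varchar(32) primary key)")
db.execute("insert into user(name) values ('alice')")

def is_authorized(user):
    # The authenticated user is authorized if present in the user table.
    # Parameter binding plays the role of the {user} replacement variable.
    (count,) = db.execute(
        "select count(*) from user where name=?", (user,)).fetchone()
    return count > 0
```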
<h3><a name="actions">Actions</a></h3>
The next step is to add a one-column <i>action</i> table containing actions.
We will assume each action is represented by a string name,
although for performance reasons some might choose a different representation.
<pre name="hlcode" class="sql"
>create table action(name varchar(32) primary key);
</pre>
We add one row to this table for each restricted action;
for example, we might have entries for
<i>login</i>, <i>reboot_system</i>, and <i>view_system_users</i>.
<br/><br/>
With the addition of the <i>action</i> table we can no longer just
look up users in the <i>user</i> table.
We add a third table called <i>grant</i>
(or <i>auth_grant</i>, since <i>grant</i> is typically
a reserved word in SQL) with two columns
that are foreign-key columns to the <i>user</i> and <i>action</i> tables.
Each row of the <i>grant</i> table refers to a user and an action,
with the meaning that that user is granted authorization
to perform that action.
<pre name="hlcode" class="sql"
>create table auth_grant(
user varchar(32) not null,
action varchar(32) not null,
constraint FK_grant_user foreign key(user)
references user(name),
constraint FK_grant_action foreign key(action)
references action(name)
);
</pre>
Our authorization function will now accept a combination of values.
We will refer to this combination as the requested <i>operation</i>
(the NIST standard uses <i>transaction</i> as the unit for
which permissions are granted).
When an application wants to perform a potentially restricted operation,
it takes the passed-in authenticated user name,
adds the action it wants to perform,
and passes that data to the authorization function.
The authorization function takes the passed-in user and action arguments
and looks in the <i>grant</i> table
for a row in which the passed-in values for user and action
match the values in the corresponding columns in the table.
That row defines a <i>permission</i> to execute the requested operation.
If that row exists, the operation is authorized;
if that row does not exist, the operation is not authorized.
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
user={user} and
action={action};
</pre>
The user+action model is sufficient for many simple systems, such as
granting login rights to some users and admin rights to other users.
<h3><a name="objects">Objects</a></h3>
With just users and actions, each action granted to a user effectively
has global scope within the system.
This is fine for actions such as login which truly are intended to be
global in scope, but we would also like to be able to specify that
certain actions can be performed on specific objects.
Modern operating systems include mechanisms to grant different
access rights, such as read-file or write-file,
to specific files based on the user.
<br/><br/>
We add a one-column <i>object</i> table containing references to the
objects in our system for which we want to be able to issue grants,
with one row for each such object.
We are making the simplifying assumption that each object already has
a unique identifier that can be stored in our database.
<pre name="hlcode" class="sql"
>create table object(name varchar(32) primary key);
</pre>
We add a third column to our <i>grant</i> table that is a foreign-key
column to the <i>object</i> table, exactly analogous to the existing
references to the <i>user</i> and <i>action</i> tables.
Each row of the <i>grant</i> table now refers to a user, an action and
an object, with the meaning that that user is granted authorization
to perform that action on that object.
<pre name="hlcode" class="sql"
>create table auth_grant(
user varchar(32) not null,
action varchar(32) not null,
object varchar(32) not null,
constraint FK_grant_user foreign key(user)
references user(name),
constraint FK_grant_action foreign key(action)
references action(name),
constraint FK_grant_object foreign key(object)
references object(name)
);
</pre>
If we still want to have actions with global scope,
such as the example of a login action in the user+action model,
we can add a special <i>system</i> object that can be used in that situation.
<br/><br/>
Our authorization requests from the application
now include three pieces of data.
We modify our function for authorizing a restricted operation to take
an argument specifying the object, along with the user and action
arguments that we already have.
The authorization function looks in the <i>grant</i> table as before,
but it now must find a row that matches all three fields rather than
only user and action.
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
user={user} and
action={action} and
object={object};
</pre>
The user+action+object model presented here is used in many databases,
with the objects being database tables or views and the actions being
the four database actions of select, insert, update and delete.
There may also be additional actions such as grant (the ability to
create additional grants on an object) or actions that allow creating
and modifying users or databases.
<h3><a name="roles">Roles</a></h3>
In order to simplify the maintenance of grants when we have a large
number of users, we add a mechanism that allows us to group users together
and grant permissions to a group of users rather than just to a single user.
Users are grouped according to the roles they play; example roles are
<i>user</i>, <i>administrator</i>, and <i>superuser</i>.
<br/><br/>
We add a <i>role</i> table with one row for each role we define.
(We will look at other possible implementations later,
but this choice serves well for explaining the concepts.)
<pre name="hlcode" class="sql"
>create table role(name varchar(32) primary key);
</pre>
In order to indicate which users have been granted (assigned) which roles,
we add a <i>user_role</i> table with two columns:
the <i>user</i> column is a foreign key to the <i>user</i> table
that references the user, and
the <i>role</i> column is a foreign key to the <i>role</i> table.
A user having a role is indicated by adding a row to the
<i>user_role</i> table referencing that user and that role.
When granting authorization, a user will receive authorization
for all roles he has.
<pre name="hlcode" class="sql"
>create table user_role(
user varchar(32) not null,
role varchar(32) not null,
constraint FK_userrole_user foreign key(user)
references user(name),
constraint FK_userrole_role foreign key(role)
references role(name)
);
</pre>
We also add a <i>role</i> column to our <i>grant</i> table.
This column is a foreign key to the one column in our <i>role</i> table.
A row in the <i>grant</i> table can now refer either to a user or
to a role.
It must reference one or the other; while it might be possible to set up
a structure to enforce that constraint directly in the database,
we will skip that exercise and instead suggest that this constraint
could be enforced by an application-level database consistency check.
<pre name="hlcode" class="sql"
>create table auth_grant(
user varchar(32),
role varchar(32),
action varchar(32) not null,
object varchar(32) not null,
constraint FK_grant_user foreign key(user)
references user(name),
constraint FK_grant_role foreign key(role)
references role(name),
constraint FK_grant_action foreign key(action)
references action(name),
constraint FK_grant_object foreign key(object)
references object(name)
);
</pre>
The addition of roles is entirely an abstraction within the
authorization system; the application is not aware of roles.
An operation is defined by the same three values as before, and
the application calls the authorization function in the same way as before
to see if an operation is authorized,
but the authorization function has to do a little more work now.
<br/><br/>
The application still passes the user, action and object arguments to the
authorization function, and the authorization function still looks
in the <i>grant</i> table to see if that combination of user, action
and object is authorized.
But now, in addition to looking for a row
that exactly matches those three values, it also looks up all of the
roles the specified user has and looks for
a row in the <i>grant</i> table whose action and object values
exactly match the values passed in and whose role
is one of the roles the user has.
If the authorization function finds
a row that exactly matches the action and object
and that exactly matches either the user or any of the user's roles
then the action is authorized; if no such matching row is found
then the action is not authorized.
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
(user={user} or role in (select role from user_role where user={user})) and
action={action} and
object={object};
</pre>
The (user+role)+action+object model presented here has been used
in the
<a href="http://en.wikipedia.org/wiki/Filesystem_permissions#Traditional_Unix_permissions">Unix filesystem</a>
for many years, with the objects being files
and directories, the actions being read, write and execute/search,
and the roles called groups.
<br/><br/>
In the NIST RBAC model permissions can only be assigned to roles,
not to users.
A strict implementation of this aspect could easily be implemented
by dropping the user check in our authorization test
(which also means we can drop the <i>user</i> column in the
<i>grant</i> table):
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
role in (select role from user_role where user={user}) and
action={action} and
object={object};
</pre>
Alternatively, we could think of each user as automatically being
assigned a unique role whose name is the same as the user name.
Or, we can choose never to assign any permissions to a user,
only assigning them to roles.
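The role-only check above can be sketched end to end in Python with an in-memory SQLite database. The sample user, role and grant are invented, and the foreign-key constraints and unused tables are omitted for brevity.

```python
import sqlite3

# In-memory database standing in for the real authorization store.
db = sqlite3.connect(":memory:")
db.executescript("""
create table user_role(user varchar(32), role varchar(32));
create table auth_grant(role varchar(32), action varchar(32), object varchar(32));
insert into user_role values ('alice', 'administrator');
insert into auth_grant values ('administrator', 'reboot_system', 'system');
""")

def is_authorized(user, action, obj):
    # Role-only check, as in the NIST RBAC model: the operation is
    # authorized if any role assigned to the user has a matching grant.
    (count,) = db.execute(
        """select count(*) from auth_grant where
           role in (select role from user_role where user=?) and
           action=? and object=?""",
        (user, action, obj)).fetchone()
    return count > 0
```

Note that the application never mentions roles; it passes only the user, action and object, and the role indirection stays inside the authorization function.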
<h3><a name="role-activation">Role Activation</a></h3>
The NIST RBAC standard includes a concept called Role Activation
(or Role Authorization).
When a user logs in, some subset of his roles can be activated.
Allowing a user to activate and deactivate his assigned roles
gives the user a way to ensure that he (or some program he is running)
does not perform a privileged operation when he is not expecting it.
Permissions are only granted for active roles,
so even if a user has been given permissions through a role, a program will
not be able to take advantage of them unless the user
has activated a role that grants those permissions.
<br/><br/>
We can implement role activation globally by adding an <i>is_active</i>
column to the <i>user_role</i> table.
<pre name="hlcode" class="sql"
>create table user_role(
user varchar(32) not null,
role varchar(32) not null,
is_active boolean not null default false,
constraint FK_userrole_user foreign key(user)
references user(name),
constraint FK_userrole_role foreign key(role)
references role(name)
);
</pre>
When checking for authorization, we only include roles that are
active for that user.
If we continue to allow user-based permissions, then we would need to
add an <i>is_active</i> flag for those permissions as well.
When using activation it is simpler to exclude user-based permissions,
as is done in the NIST RBAC model.
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
role in (select role from user_role where user={user} and is_active) and
action={action} and
object={object};
</pre>
The NIST RBAC standard uses session-based activation rather than
global activation.
This allows a user to have multiple sessions open simultaneously
with different roles active for each session.
To implement this, rather than adding an <i>is_active</i>
column to the <i>user_role</i> table, we create a <i>session</i>
table that keeps track of our sessions and a
<i>session_role</i> table that lists the roles that are active for
each session.
<pre name="hlcode" class="sql"
>create table session(
id varchar(32) primary key,
user varchar(32) not null,
constraint FK_session_user foreign key(user)
references user(name)
);
create table session_role(
session_id varchar(32) not null,
role varchar(32) not null,
constraint FK_sessionrole_sessionid foreign key(session_id)
references session(id),
constraint FK_sessionrole_role foreign key(role)
references role(name)
);
</pre>
When testing for authorization we only want to use roles that are
both assigned (in the <i>user_role</i> table) and active (in the
<i>session_role</i> table).
Assuming the mechanism that maintains active roles in the <i>session_role</i>
table ensures that the only roles appearing in that table are in the
<i>user_role</i> table
(i.e. only an assigned role from the <i>user_role</i> table
can be active in the <i>session_role</i> table),
then we can
modify the authorization function to accept an additional
argument which is the session_id, and change our implementation SQL:
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
role in (select role from session_role where session_id={session_id}) and
action={action} and
object={object};
</pre>
At this point our model includes the capabilities of RBAC0,
the first level of the NIST RBAC standard
(although the NIST model does not include <i>action</i> and
<i>object</i> as presented above).
However, in order to keep the discussion of the other aspects of
the model less cluttered, I will generally not be including role
activation in the remainder of this discussion except where noted.
<h3><a name="role-hierarchies">Role Hierarchies</a></h3>
Given the ability to group users into roles and thus simplify the number
of grants we need to create, we can generalize on that concept by also
allowing roles to be grouped into other roles.
<br/><br/>
In the discussion of Roles above, we added a <i>user_role</i> table
that allowed us to assign roles to users.
We now add a <i>role_hierarchy</i> table with <i>parent</i> and <i>child</i>
columns that allows us to assign roles (children)
to other roles (parents).
<pre name="hlcode" class="sql"
>create table role_hierarchy(
parent varchar(32),
child varchar(32),
constraint FK_rolehierarchy_parent foreign key(parent)
references role(name),
constraint FK_rolehierarchy_child foreign key(child)
references role(name)
);
</pre>
When collecting the list of roles for a user,
we now have to recursively consult the <i>role_hierarchy</i> table
to collect all of the child roles for any role the user has.
How this is actually done is heavily dependent on the implementation.
Some SQL databases support recursive queries (for example via
recursive common table expressions), but not all do.
<br/><br/>
We hide this implementation detail inside a view that collects
the closure of the role-role relationships, effectively
flattening our hierarchy.
Defining this flattening in a view
allows us to change how we collect the closure of the roles
without affecting the queries that invoke this view.
In this particular example, our view is defined using a
non-recursive query that will suffice for a hierarchy
of limited depth.
<pre name="hlcode" class="sql"
>-- not a full closure if the hierarchy is too deep
create view role_closure as
select user, role from user_role
union
select user_role.user, a1.child from user_role
join role_hierarchy as a1 on user_role.role=a1.parent
union
select user_role.user, a2.child from user_role
join role_hierarchy as a1 on user_role.role=a1.parent
join role_hierarchy as a2 on a1.child=a2.parent
union
select user_role.user, a3.child from user_role
join role_hierarchy as a1 on user_role.role=a1.parent
join role_hierarchy as a2 on a1.child=a2.parent
join role_hierarchy as a3 on a2.child=a3.parent
;
</pre>
We can now use the <i>role_closure</i> view
in place of the <i>user_role</i> table:
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
(user={user} or role in (select role from role_closure where user={user})) and
action={action} and
object={object};
</pre>
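On databases that do support recursive queries, the depth limitation can be
avoided by defining the view with a recursive common table expression instead.
This is only a sketch: whether a view may be defined over a recursive CTE,
and the exact syntax, varies by database.
<pre name="hlcode" class="sql"
>create view role_closure as
with recursive rc(user, role) as (
select user, role from user_role
union
select rc.user, h.child from rc
join role_hierarchy as h on rc.role=h.parent
)
select user, role from rc;
</pre>
Because the recursive step uses <i>union</i> rather than <i>union all</i>,
duplicate rows are discarded and the query terminates even if the role
graph contains cycles.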
If we want to use session-based activation, we can do that by
modifying our <i>role_closure</i> view to be based on the
<i>session_role</i> table rather than the <i>user_role</i> table:
<pre name="hlcode" class="sql"
>-- not a full closure if the hierarchy is too deep
create view role_closure as
select session.id as session_id, session.user, session_role.role
from session join session_role on session.id=session_role.session_id
union
select session.id, session.user, a1.child
from session join session_role on session.id=session_role.session_id
join role_hierarchy as a1 on session_role.role=a1.parent
union
select session.id, session.user, a2.child
from session join session_role on session.id=session_role.session_id
join role_hierarchy as a1 on session_role.role=a1.parent
join role_hierarchy as a2 on a1.child=a2.parent
union
select session.id, session.user, a3.child
from session join session_role on session.id=session_role.session_id
join role_hierarchy as a1 on session_role.role=a1.parent
join role_hierarchy as a2 on a1.child=a2.parent
join role_hierarchy as a3 on a2.child=a3.parent
;
</pre>
As above when adding session-based role activation,
the authorization SQL includes the session-id and we no longer
allow user-based permissions:
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
role in (select role from role_closure where user={user} and session_id={session_id}) and
action={action} and
object={object};
</pre>
This change can be made in any of the authorization SQL statements
given below to add session-based authorization where it is otherwise
not included.
<br/><br/>
Note that although we stated that the role parent/child relationships
form a hierarchy, there is actually no reason to limit it to that,
and our design does not preclude defining role relationships
that form a more complex graph.
We do want to avoid cycles in our role graph, as a graph with cycles
would not provide us any useful benefits, and we need to ensure
that our implementation does not blow up if the role graph happens
to have some cycles.
If we use the <i>role_closure</i> view implementation provided above,
an incidental benefit is that the closure mechanism is so simple and limited
that cycles will not cause any problems other than wasting a bit of
processing power.
<br/><br/>
The NIST RBAC standard defines both general and restricted forms of hierarchy
as part of the RBAC1 level.
The restricted form is a tree structure and
the general form is an arbitrary partial order.
Our model above supports the general form.
<br/><br/>
NIST RBAC levels RBAC2 and RBAC3 add Constraints (to ensure support of
Separation of Duties)
and Symmetry (the ability to review permission-role assignments as well as
user-role assignments).
With the simple database implementation presented here, the review queries
are straightforward, and constraints can be enforced by the code that
modifies the assignment tables.
<h3><a name="alternate-hierarchy-implementation">
Alternate Hierarchy Implementations</a></h3>
In the implementation of user roles and role hierarchies above
we added a <i>role</i> table, a <i>user_role</i> table
and a <i>role_hierarchy</i> table,
we added a <i>role</i> column to the grant table,
we added a <i>role_closure</i> view
and we modified our example SQL select statement for checking authorization
to use that view.
In this section I present
three alternate approaches to this step when using a relational database;
there are of course other approaches, not discussed here,
that are not based on a relational database.
These implementation alternatives do not affect the basic model
being developed.
<br/><br/>
In the first alternate approach, after defining the <i>role</i> table
we next define the <i>user_or_role</i> view that is the union of those
two tables.
<pre name="hlcode" class="sql"
>create view user_or_role as
(select name from user)
union all
(select name from role)
;
</pre>
In the <i>grant</i> table, rather than adding a <i>role</i> column and
having the <i>user</i>
column be a foreign key to the <i>user</i> table, we make the <i>user</i>
column a foreign key to the <i>user_or_role</i> view.
Unfortunately, it is typically not possible to declare a foreign key
to a view, in which case this foreign key relationship would
have to remain implicit and not enforced by the database
(it could be part of our application-level database consistency checks).
Nonetheless, the SQL statements that join using this foreign key will
work the same as if the foreign key were declared,
although performance may be an issue if the <i>user_or_role</i> view cannot
be indexed.
By using a materialized view it might be possible to index the view
and have a foreign key refer to it,
but then we would need to deal with rematerializing the view every
time we changed the contents of the <i>user</i> or <i>role</i> tables.
<br/><br/>
Instead of creating a <i>role_hierarchy</i> table,
we do the same thing to the <i>user</i> column of the <i>user_role</i>
table as we did to the <i>grant</i> table, making it a foreign key
to the <i>user_or_role</i> view rather than to the <i>user</i> table.
This allows the <i>user_role</i> table to represent which roles have
other roles as well as which roles users have directly been given.
<br/><br/>
In our second alternate implementation, we start by defining
<i>user_or_role</i> as a table that contains the records for
both users and roles,
with an <i>is_role</i> column that indicates
whether a row represents a user or a role.
We then create <i>user</i> and <i>role</i>
as appropriate views into that table.
<pre name="hlcode" class="sql"
>create table user_or_role (
name varchar(32) primary key,
is_role boolean not null default false
);
create view user as
select name from user_or_role where not is_role;
create view role as
select name from user_or_role where is_role;
</pre>
As in our first alternate implementation, the <i>grant</i> table
points to the <i>user_or_role</i> table, as does the <i>user</i>
column in the <i>user_role</i> table.
<pre name="hlcode" class="sql"
>create table auth_grant(
user_or_role varchar(32) not null,
action varchar(32) not null,
object varchar(32) not null,
constraint FK_grant_user foreign key(user_or_role)
references user_or_role(name),
constraint FK_grant_action foreign key(action)
references action(name),
constraint FK_grant_object foreign key(object)
references object(name)
);
create table user_role(
user varchar(32) not null,
role varchar(32) not null,
constraint FK_userrole_user foreign key(user)
references user_or_role(name),
constraint FK_userrole_role foreign key(role)
references role(name)
);
</pre>
Many databases, including MySQL,
do not allow indexes or foreign keys on views, so neither of
the above two alternate implementations will work very well on those
databases,
and the table statements would have to be modified not to declare
foreign keys to view columns.
<br/><br/>
If we want to use indexes and foreign keys, we have to compromise our
data model a bit and not use views when we need foreign keys,
which leads us to our final alternative.
<br/><br/>
In our third alternate implementation, we don't have a
separate <i>role</i> table or view.
Instead, we use the <i>user_or_role</i> approach as in
the second alternative above:
we place the role names into the
<i>user</i> table and add an <i>is_role</i> column that indicates
whether a row represents a user or a role.
<pre name="hlcode" class="sql"
>create table user (
name varchar(32) primary key,
is_role boolean not null default false
);
</pre>
In our <i>user_role</i> table, in which the <i>role</i> column
was a foreign key to the <i>role</i> table, we make that column
instead be a foreign key to the <i>user</i> table, where we are
now storing our role names.
<pre name="hlcode" class="sql"
>create table user_role(
user varchar(32) not null,
role varchar(32) not null,
constraint FK_userrole_user foreign key(user)
references user(name),
constraint FK_userrole_role foreign key(role)
references user(name)
);
</pre>
We don't need a <i>role_hierarchy</i> table because we can now
represent those role-to-role relationships in the <i>user_role</i> table.
In our <i>role_closure</i> view we replace the <i>role_hierarchy</i>
references with <i>user_role</i> references.
<pre name="hlcode" class="sql"
>create view role_closure as
select distinct a0.user, a3.role from user_role as a0
join user_role as a1 on a0.role=a1.user or
(a0.user=a1.user and a0.role=a1.role)
join user_role as a2 on a1.role=a2.user or
(a0.user=a2.user and a0.role=a2.role)
join user_role as a3 on a2.role=a3.user or
(a0.user=a3.user and a0.role=a3.role)
;
</pre>
Because we are now storing our roles in the <i>user</i> table,
the <i>user</i> column in our <i>grant</i> table can refer to either
a user or a role, depending on what we are storing in the <i>user</i> table,
so we don't need the <i>role</i> column and we can go back to the
previous definition that did not have that column.
<br/><br/>
With this implementation our foreign key constraints all
work because we are not dealing with any views, and our table structure
is simpler because we have combined users and roles into one table.
Although we are putting roles into the <i>user</i> table, we need
to remember that this is just a convenient fiction, adopted to simplify our
implementation because there are some situations in which we want to
treat users and roles the same.
If we forget about that difference and start treating them the same
in other situations, we can easily start getting absurd behavior from
our system.
<br/><br/>
(I have a mental image of our legal system as having a <i>people</i>
table, and a <i>law</i> table with a foreign key to the <i>people</i> table.
At some early point, someone wanted some laws that applied to corporations
as well as people, so they said, "I know, let's just add an
<i>is_corporation</i> flag to the <i>people</i> table and put the
corporations in there,
then our foreign keys from the <i>law</i> table will still work
and we won't need to add a bunch more structure to our law schema!"
With the passage of time, law programmers who should have been paying
attention to the <i>is_corporation</i> flag started ignoring it more
and more often, until finally the law programmers were saying,
"Well, those corporations are in the people table, so they must be people."
If you are concerned that this kind of situation might happen to you,
you might not want to put roles into the <i>user</i> table.)
<br/><br/>
For the remainder of this discussion, we will use this third
alternate implementation approach.
<h3><a name="interlude">Interlude</a></h3>
In the above discussions, I have been assuming that the names of users,
actions, objects and roles are also their key values.
This implies that each of those names is unique.
Given that I have discussed a couple of implementations in which users
and roles have been mixed together, you might wonder whether it would
cause problems to add a user whose name is the same as a role.
In the above simple implementation the answer is "yes", and the system
would have to disallow that.
A real system is likely to be a bit more complex, using unique IDs as
primary keys rather than names.
The problem of having unique names thus gets moved from a database
issue to an application-level issue.
The system implementer must decide under what circumstances it is
acceptable to have duplicate names,
and there must be a way to distinguish those duplicates to someone
operating the system.
<br/><br/>
We have reached a point in the development of our authorization model
that is similar in power to many existing systems.
People who need more flexibility than this model provides might diverge
at this point into custom authorization systems with various forms of
exceptions and extensions that rapidly start adding complexity to
the model.
<br/><br/>
There are still a number of extensions we can make to our
authorization model that will improve its power while adding only
a small amount to the cognitive load of understanding how it all works.
Let's get back to our model and add some more power to it.
<h3><a name="tasks">Tasks</a></h3>
In the same way that we allow specifying a group of users having a role,
we add the ability to specify a group of actions, which we call a task.
The relation between tasks and actions is exactly analogous to the
relation between users and roles.
Each action can be assigned to multiple tasks,
a task can be assigned other tasks,
and an authorization grant can refer either to
an action or to a task.
<br/><br/>
Analogous with our second alternative implementation above,
in which we added an <i>is_role</i> column to the <i>user</i> table
and put roles into the <i>user</i> table,
for the equivalent addition of tasks we
add an <i>is_task</i> column to the <i>action</i> table,
add an <i>action_task</i> table with columns
<i>action</i> and <i>task</i> both being foreign key references to
the <i>action</i> table,
and add a <i>task_closure</i> view.
<pre name="hlcode" class="sql"
>create table action(
name varchar(32) primary key,
is_task boolean not null
);
create table action_task(
action varchar(32) not null,
task varchar(32) not null,
constraint FK_actiontask_action foreign key(action)
references action(name),
constraint FK_actiontask_task foreign key(task)
references action(name)
);
create view task_closure as
select distinct a0.action, a3.task as task from action_task as a0
join action_task as a1 on a0.task=a1.action or
(a0.action=a1.action and a0.task=a1.task)
join action_task as a2 on a1.task=a2.action or
(a0.action=a2.action and a0.task=a2.task)
join action_task as a3 on a2.task=a3.action or
(a0.action=a3.action and a0.task=a3.task)
;
</pre>
We expand our authorization query to look for tasks in the same way as we
expanded it to handle roles, with the same caveats about
hierarchy depth.
<pre name="hlcode" class="sql"
>-- authorized if count>0
select count(*) from auth_grant where
(user={user} or user in (select role from role_closure where user={user})) and
(action={action} or action in (select task from task_closure where action={action})) and
object={object};
</pre>
<h3><a name="domains">Domains</a></h3>
Roles and tasks give us the ability to group users and actions.
We complete the pattern by adding the ability to group objects
into groups that we call domains (not to be confused with internet
domain names).
As with the tasks example above,
we add the <i>is_domain</i> column to the <i>object</i> table,
create the <i>object_domain</i> table to allow defining groups of objects,
create the <i>domain_closure</i> view,
and modify the authorization function to check for either objects
or domains in the same way as we modified it to check for
either actions or tasks.
All of these steps are exactly analogous to what we did when
we added tasks.
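Since the steps mirror the task pattern exactly, the corresponding
definitions look like this (a sketch; the names simply follow the
description above):
<pre name="hlcode" class="sql"
>create table object(
name varchar(32) primary key,
is_domain boolean not null
);
create table object_domain(
object varchar(32) not null,
domain varchar(32) not null,
constraint FK_objectdomain_object foreign key(object)
references object(name),
constraint FK_objectdomain_domain foreign key(domain)
references object(name)
);
create view domain_closure as
select distinct a0.object, a3.domain as domain from object_domain as a0
join object_domain as a1 on a0.domain=a1.object or
(a0.object=a1.object and a0.domain=a1.domain)
join object_domain as a2 on a1.domain=a2.object or
(a0.object=a2.object and a0.domain=a2.domain)
join object_domain as a3 on a2.domain=a3.object or
(a0.object=a3.object and a0.domain=a3.domain)
;
</pre>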
<h3><a name="intermediate-summary">Intermediate Summary</a></h3>
Let's take stock of what our model looks like:
<ul>
<li>There are three dimensions: user, action, and object.
<li>The handling of the three dimensions is completely symmetric
(unless role activation is being used, in which case the
user dimension has that extra wrinkle).
<li>The application passes those three values to the authorization function,
which returns true if that operation is authorized, false if not.
<li>For each dimension, there is a grouping mechanism:
role for user group, task for action group, domain for object group.
<li>The grouping mechanism supports a hierarchy of groups, or more generally
a (directed acyclic) graph of groups (a partial ordering).
<li>To determine if a request should be authorized, take each dimension,
collect the closure of the groups for that dimension,
and look for a grant in which each dimension of the grant matches
any of the items in the closure for that dimension.
</ul>
The model presented above is easy to understand,
but despite its simplicity it is quite powerful.
Yet it does not suffice for everyone.
Let's see how we can continue to enhance its power without
significantly increasing its complexity.
<h3><a name="times">Times, Periods and Schedules</a></h3>
In some systems it is desirable to allow some operations only at
specified times.
For example, one might want to allow users to log in to the system
only during their work shift.
<br/><br/>
We define another dimension, the <i>time</i> dimension,
and we define a time range as a <i>period</i>,
where a period is an interval of time such as 8AM to 5PM, or Sunday,
or 8AM to 5PM on weekdays.
We add the time dimension to our definition of an operation, so
when the application calls the authorization function, it must now
pass the current time as a fourth argument.
<br/><br/>
The dimensions we have defined previously are all discrete dimensions,
with only one matching value for each definition.
The time dimension is different in that it is a continuous dimension:
there are multiple time values that can match a period.
This makes the authorization function a little more difficult to
write, but it does not add much complexity to the user's conceptual model.
<br/><br/>
The other dimensions all have groups, so it would not add to the complexity
of the model to add groups of periods.
In fact, the model would be more complex if we did <i>not</i> add
groups of periods, as that would make this dimension different from
all the others in that aspect,
which would be an additional detail that the user
would have to factor into his mental model.
<br/><br/>
We add a group called <i>schedule</i>.
As with all the other groups, a period can be included in any number
of schedules, and schedules can contain other schedules.
When checking authorization,
we collect all the periods that match the current time
and the closure of all the schedules for those periods,
and we search for grants that include any of those in the <i>period</i> column.
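These groups can be sketched with the same table pattern as the other
dimensions; how a period's interval is actually represented (start time,
end time, recurrence, and so on) is system-specific, so the interval
columns below are only placeholders:
<pre name="hlcode" class="sql"
>create table period(
name varchar(32) primary key,
is_schedule boolean not null default false
-- plus system-specific columns defining the interval
);
create table period_schedule(
period varchar(32) not null,
schedule varchar(32) not null,
constraint FK_periodschedule_period foreign key(period)
references period(name),
constraint FK_periodschedule_schedule foreign key(schedule)
references period(name)
);
</pre>
A <i>schedule_closure</i> view can then be defined in the same way as
the other closure views.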
<h3><a name="locations">Locations, Areas and Regions</a></h3>
By now the pattern should be pretty clear.
If the system requires other dimensions, they are easy to add
by following the same pattern.
By keeping to the pattern, the complexity of the model that the
user must work with to understand the system is kept low,
even when there is some small difference for the new dimension,
as there was for the <i>time</i> dimension when
compared to the three previously defined dimensions.
When a dimension does introduce a small model extension,
we can leverage that concept when adding other dimensions.
<br/><br/>
Location is a system-specific concept.
For some systems it might be a logical location,
such as "console", "secure terminal", or "dial up".
Since these are discrete values, it would suffice to have a
<i>location</i> table, group locations in <i>region</i>s,
and handle it in the same manner as the other discrete dimensions
such as <i>user</i>.
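For the discrete case, a sketch following the same combined-table pattern
we used for users and roles:
<pre name="hlcode" class="sql"
>create table location(
name varchar(32) primary key,
is_region boolean not null default false
);
create table location_region(
location varchar(32) not null,
region varchar(32) not null,
constraint FK_locationregion_location foreign key(location)
references location(name),
constraint FK_locationregion_region foreign key(region)
references location(name)
);
</pre>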
<br/><br/>
For other systems a location might mean a physical location specified by
one or more continuous values, such as latitude and longitude,
in which case we define an <i>area</i> analogously to a period,
where one area includes a range of locations.
The area might be defined with a center point and radius, it might be defined
with a bounding box, it might be defined as a polygon, using splines,
or in some other even more complex way.
As with periods, the complexity of the definition of an area has
an effect on the difficulty of implementing the authorization function
that has to determine whether a location is or is not in an area,
but has little effect on the complexity of the user's mental model
of the authorization.
For the user, it is sufficient to know that
a given location will be either contained in or not contained in an area,
and that grants are based on areas.
<br/><br/>
Our group for an area is a <i>region</i>, and it groups together
areas and other regions in the same way as the groups in the other dimensions.
<h3><a name="denials">Denials</a></h3>
The approach described above is essentially a "whitelist" approach,
which is the standard approach to authorization.
If an operation is listed in the <i>grant</i> table then it is allowed;
any operation which is not listed is not allowed.
<br/><br/>
It is also possible to use a "blacklist" approach:
rather than allowing what is listed and denying everything else,
we can deny what is listed and allow everything else.
In this case we would create a <i>denial</i> table that is exactly
like the <i>grant</i> table except that it contains operations to
be denied rather than operations to be allowed.
The authorization function would do the same search as before,
except that it would deny the operation if any matching records
were found, and allow the operation otherwise.
<br/><br/>
Using a blacklist approach to authorization as just described
is generally not recommended
(in fact the NIST RBAC standard specifically recommends against
"Negative permissions", although it does not outright disallow them).
Since the default action is to
allow an operation, if a new operation is added to the system
and through oversight the appropriate denials are not added, then
there is no protection for the new operations.
<h3><a name="exceptions">Exceptions</a></h3>
We can combine the original <i>grant</i> approach and the
<i>denial</i> approach described just above to give us the ability
to have both a whitelist and a blacklist.
We start with our original <i>grant</i> table approach,
following the recommended position that the default is to
deny any operation unless it is explicitly granted;
on top of that, we add the <i>denial</i> table as exceptions to the grants.
<br/><br/>
Our authorization function first looks in the <i>denial</i> table;
if a matching record is found, then the request is denied.
If no matching record is found, then the function looks in
the <i>grant</i> table; if a matching record is found, then
the request is granted; otherwise it is denied.
<br/><br/>
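That two-step check can be sketched in SQL, assuming a <i>denial</i>
table with the same columns as the grant table:
<pre name="hlcode" class="sql"
>-- step 1: if count>0 the operation is denied
select count(*) from denial where
(user={user} or user in (select role from role_closure where user={user})) and
(action={action} or action in (select task from task_closure where action={action})) and
(object={object} or object in (select domain from domain_closure where object={object}));
-- step 2: otherwise, authorized only if count>0
select count(*) from auth_grant where
(user={user} or user in (select role from role_closure where user={user})) and
(action={action} or action in (select task from task_closure where action={action})) and
(object={object} or object in (select domain from domain_closure where object={object}));
</pre>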
This allows the admin to think in terms of exceptions:
grant privileges to all of X, except for Y.
In some situations this allows expressing the intended grants
more simply than if one is restricted to just additive grants.
<br/><br/>
We could also flip the <i>grant</i> and <i>denial</i> tables around,
first looking in the <i>grant</i> table for a match, then looking
in the <i>denial</i> table for a match, then granting if nothing
is found.
As discussed in the previous section, this is not recommended,
but understanding that it is possible is conceptually useful,
and leads us to our last enhancement.
<h3><a name="prioritization">Prioritization</a></h3>
The <i>grant</i> and <i>denial</i> tables have identical structure,
and their contents are checked in the same way, with the
only difference being an inversion of the interpretation of the results
in one case as compared to the other.
We can easily combine both of these tables into a single <i>auth</i>
table that includes an additional <i>allow</i> column that is
true for all records from the <i>grant</i> table and <i>false</i> for
all the records from the <i>denial</i> table.
We can also add a <i>priority</i> column that we use to determine which
records we should attend to first.
<pre name="hlcode" class="sql"
>create table auth(
id integer auto_increment primary key,
allow boolean not null default true,
priority integer not null default 0, -- higher values take precedence
user varchar(32) not null,
action varchar(32) not null,
object varchar(32) not null,
period varchar(32) not null,
area varchar(32) not null,
constraint FK_auth_user foreign key(user)
references user(name),
constraint FK_auth_action foreign key(action)
references action(name),
constraint FK_auth_object foreign key(object)
references object(name),
constraint FK_auth_period foreign key(period)
references period(name),
constraint FK_auth_area foreign key(area)
references area(name)
);
</pre>
If we define the priority value such that higher values are more
important than lower values, then we can get the same behavior as
described in the first part of the previous section
by setting the priority on all the <i>denial</i> records to 2 and
setting the priority on all the <i>grant</i> records to 1.
Our authorization function then looks in the <i>auth</i> table
for the matching record with the highest <i>priority</i> value
and looks at the <i>allow</i> value for that record.
<br/><br/>
If we wanted to get the (non-recommended) behavior as described at
the end of the previous section, we could do that by setting the
priority of all the <i>grant</i> records to 2 and setting the
priority of all the <i>denial</i> records to 1,
plus making the default behavior (when no matching rows are found)
to allow the operation.
<br/><br/>
Given this structure, we can of course put in records with any priority
value.
This allows building up a series of toggling exceptions, much the way
leap years in the Gregorian calendar are
<a href="http://en.wikipedia.org/wiki/Leap_year#Algorithm">defined</a>
(each year has 365 days, except every 4th year is a leap year with 366 days,
except every 100 years is not a leap year, except every 400 years is
a leap year).
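By way of illustration, a stack of toggling exceptions might look like
this (all of the names and priority values here are hypothetical):
<pre name="hlcode" class="sql"
>-- staff may edit reports, except contractors, except trusted contractors
insert into auth(allow, priority, user, action, object, period, area)
values (true, 1, 'Staff', 'Edit', 'Reports', 'Anytime', 'Anywhere');
insert into auth(allow, priority, user, action, object, period, area)
values (false, 2, 'Contractors', 'Edit', 'Reports', 'Anytime', 'Anywhere');
insert into auth(allow, priority, user, action, object, period, area)
values (true, 3, 'TrustedContractors', 'Edit', 'Reports', 'Anytime', 'Anywhere');
</pre>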
<br/><br/>
Since we can stack up alternating <i>grant</i> and <i>denial</i>
records, the only distinction between the "whitelist" and "blacklist"
approaches discussed earlier is the question of what the default is when no
matching records are found in the <i>auth</i> table
(the default for whitelisting is deny, the default for
blacklisting is grant).
Given that using a default of allow is not recommended,
we define the system to use a default of deny,
but we provide a way that the system can effectively be set up with
a default of allow if desired.
<br/><br/>
To simulate a default of allow, the admin can create a group for each
of the dimensions in our authorization model (user, action, etc)
that includes all elements of that dimension.
Thus there would be an AllUsers role, an AllActions task,
an AllObjects domain, etc.
The admin then creates a rule that includes all of these groups
with <i>allow</i> set to true and <i>priority</i> set to zero.
Since the rule has been defined to include all elements of every dimension,
it will always match every operation,
so there will never be a case where there are no matches and the
system default of deny is used.
Assuming all other <i>priority</i> values are greater than zero,
this rule will be the lowest priority,
so it will only have an effect if there are no other matches,
and thus it acts as the default.
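A sketch of that catch-all rule, assuming the all-inclusive groups have
been created and populated as described (AllPeriods and AllAreas are
assumed analogues for the remaining dimensions):
<pre name="hlcode" class="sql"
>-- lowest-priority rule matching every operation
insert into auth(allow, priority, user, action, object, period, area)
values (true, 0, 'AllUsers', 'AllActions', 'AllObjects', 'AllPeriods', 'AllAreas');
</pre>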
<br/><br/>
As described above, there is one more potential ambiguity to resolve:
what happens if there are two rules with the same <i>priority</i>
but opposite <i>allow</i> values?
(Two rules with the same <i>priority</i> and the same <i>allow</i>
are not a problem, as they both give the same result.)
We resolve this ambiguity by defining the <i>denial</i> records to
take precedence over the <i>grant</i> records when they have
the same <i>priority</i> value.
This definition reduces nicely to the desired behavior for the
simplest denial+grant case when all records have the same priority.
<br/><br/>
Our authorization function thus looks for all matching records in the
<i>auth</i> table, sorts first by <i>priority</i> then by <i>allow</i>,
picks the first one, and uses its <i>allow</i> value to determine
whether to allow the operation.
If no matching records are found, the operation is not allowed.
<br/><br/>
Ignoring for now the more complicated portions of the WHERE clause
for selecting time and location,
here is our SQL statement for determining if an operation is authorized:
<pre name="hlcode" class="sql"
>-- The single selected value is true if authorized; if false or no records, not authorized
select allow from auth where
(user={user} or user in (select role from role_closure where user={user})) and
(action={action} or action in (select task from task_closure where action={action})) and
(object={object} or object in (select domain from domain_closure where object={object}))
order by priority desc, allow asc
limit 1;
</pre>
Adding prioritization like this introduces a new concept to the authorization
model, but it provides a good amount of additional power relative
to the extra mental load required to understand the model.
However, creating well-structured rules using prioritization
is trickier than it seems at first glance.
It has the same essential problem as for the blacklist approach
described above:
mistakes in setting up
the conceptual layers of the different levels of prioritization
can result in unexpected security holes.
If you can figure out how to set up your authorizations using grants
only, without denials, you should do that.
But if the grant-only model is not sufficient,
then adding prioritization as described in this section is a reasonable
way to take the model to the next level of power -
just remember that you have to be more careful in how you
set up your rules.
<h3><a name="summary">Summary</a></h3>
With the addition of prioritization in the previous section,
our authorization model is complete. Let's review the complete model.
<ul>
<li>There are two kinds of dimensions: discrete and continuous.
<li>There are five dimensions: user, action, object, time and location.
<li>User, action and object are discrete;
time is continuous;
location can be either discrete or continuous,
depending on how the system defines it.
<li>Additional dimensions can be added if necessary, following
the pattern of the existing dimensions.
<li>The handling of every discrete dimension is completely symmetrical
with every other discrete dimension
(unless session-based role activation is included, in which
case the user dimension is a little different);
the handling of each continuous dimension is close to completely
symmetrical with the other continuous dimensions;
and there is a high level of symmetry between the discrete and
the continuous dimensions.
<li>The application passes a value for each dimension
to the authorization function.
This collection of dimension values is the operation for which
the application is requesting authorization.
The authorization function returns true if that operation is authorized,
false if not.
<li>For each continuous dimension, there is a range that serves as the basic
match:
a period for time, an area for location.
<li>For each dimension, there is a grouping mechanism:
role for user group, task for action group, domain for object group,
schedule for period group, region for area or location group.
<li>The grouping mechanism supports a hierarchy of groups, or more generally
a (directed acyclic) graph of groups.
<li>There is a set of rules that is used to determine whether an operation
is authorized.
Each rule includes a set of comparison values, one for each dimension,
a priority,
and an <i>allow</i> flag that tells whether that rule specifies
that authorization for a matching operation should be granted or denied.
<li>To determine if a request is authorized,
take the value for each dimension in the request,
collect the closure of the groups for that value,
and collect the records in which each dimension of the grant matches
any of the items in the closure for that dimension.
Pick the record with the highest priority, giving preference to
<i>deny</i> records over <i>grant</i> records, and use the <i>allow</i>
value of that record to determine whether to authorize or deny
the operation.
If no matching records are found, the operation is denied.
</ul>
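The resolution procedure in that last bullet can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the system described above; the record layout and the <code>closures</code> helper are my assumptions.

```python
def authorize(request, rules, closures):
    """Return True if the requested operation is authorized.

    request  -- dict mapping each dimension name to the request's value,
                e.g. {"user": "alice", "action": "delete", "object": "doc1"}
    rules    -- list of dicts, each with one comparison value per dimension,
                a numeric "priority", and a boolean "allow" flag
    closures -- function (dimension, value) -> set containing the value
                plus every group whose closure includes it
    """
    # A rule matches when, for every dimension, its comparison value is in
    # the closure of the request's value for that dimension.
    matches = [r for r in rules
               if all(r[dim] in closures(dim, val)
                      for dim, val in request.items())]
    if not matches:
        return False  # system default: deny
    # Highest priority wins; on a priority tie, deny (allow=False)
    # takes precedence over grant (allow=True).
    best = min(matches, key=lambda r: (-r["priority"], r["allow"]))
    return best["allow"]
```

A lowest-priority match-everything grant, as described earlier, then acts as the default whenever no other rule matches.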
This conceptual model is no longer trivial, but the above rules are
still relatively concise and easy to understand.
The model is general enough and powerful enough that it should be
suitable for a wide variety of applications.
<br/><br/>
In our model the application passes in a set of values to the authorization
function, which uses its abstractions (in the form of groups) and rules
(in the form of prioritization) to determine whether or not to grant
permission for an operation.
If we need more power,
the application can pass in additional information, whether it is
additional attribute information about the user, the environment, or
other aspects of the operation,
and the authorization system can apply even more complex rules.
This is the approach used by
Attribute-Based Access Control,
with a rules engine used in place of the mechanisms described here.
<br/><br/>
<h2>Git Rebase Across Many Commits</h2>
<i>2012-04-30</i>
<br/><br/>
Not all git merge conflicts are real.
<h3>Contents</h3>
<ul>
<li><a href="#scenario">The Scenario</a>
<li><a href="#problem">The Problem</a>
<li><a href="#solution">The Solution</a>
</ul>
<a name="scenario"></a>
<h3>The Scenario</h3>
In both my personal and my work projects I prefer to use
<a href="http://book.git-scm.com/4_rebasing.html"><i>git rebase</i></a>
to keep my commit histories simple and readable.
To make this work in a team setting, we never work on the master
branch, instead always working on a feature branch in our local repositories.
Our process flow looks something like this:
<br/>
<pre name="hlcode" class="bash"
>$ git branch feature #create the working branch
$ git checkout feature #do all development work on that branch
#Edit files, etc.
$ git commit -m "Implement Feature"
#Repeat the above as desired during development.
#When ready to merge to master, do the following:
$ git checkout master
$ git pull #update master from shared repository
$ git checkout feature
$ git rebase master #optionally with -i if squashing is desired
$ git checkout master
$ git merge feature
$ git push origin master
$ git branch -d feature
</pre>
Because we never use our local master branch for development, the
<a href="http://book.git-scm.com/3_distributed_workflows.html">
<i>git pull</i></a> on master is always a
<a href="http://nathaniel.themccallums.org/2010/10/18/using-git-fast-forward-merging-to-keep-branches-in-sync/">fast-forward merge</a>.
Likewise, because we have just rebased the feature branch against the master
right before we merge that feature branch back into master, that merge is also
always a fast-forward merge.
Looking at it another way, we don't have any merge conflicts when
updating or merging master because we resolve all of the merge
conflicts when we rebase the feature branch against the latest master.
<a name="problem"></a>
<h3>The Problem</h3>
At work, we have a large codebase and a handful of active developers who
typically merge feature branches to the master using the above workflow
multiple times each day. Sometimes somebody has a feature branch that
takes a long time to finish, so that between the time that branch was
started and the time it is ready to go into master, there may
have been 40 or 50 other commits made to master.
In this situation we occasionally rebase our local
feature branch against the latest master during feature
development, but inevitably there are times when a large rebase
across many commits ends up being done.
<br/><br/>
Even if there are many commits on the master branch,
if none of those commits touched any of the same code as the commits
on the feature branch, then there should be no merge conflicts when
rebasing the feature branch against the updated master branch.
However, in my experience this has not always been the case.
Sometimes <i>git rebase</i> reports merge conflicts when I think there
should not be any.
Since I don't generally know exactly what code the other team members have
edited, I can't immediately tell if the merge conflicts make sense.
<br/><br/>
The normal advice for how to handle merge conflicts is to edit the named
file, look for the conflict markers, inspect the conflicting code fragments,
determine what to keep, edit out what is not being kept along with the
conflict markers, <i>git add</i> the repaired file, and
<i>git rebase --continue</i> to let it tell you about the next merge conflict.
<br/><br/>
That's a lot of work, and it might all be completely unnecessary.
<a name="solution"></a>
<h3>The Solution</h3>
It seems that git sometimes just gets confused when doing a rebase across
a large number of commits.
Sometimes if you rebase in smaller steps, git will happily rebase each
smaller step with no merge conflicts, until you have stepped all the way
up to the latest master, at which point your rebase is done.
<br/><br/>
You could rebase against every single commit and work your way up to master,
but that, too, is a lot of work.
Here's what I do when the initial rebase of the feature branch against
the latest master tells me there are merge conflicts.
<br/><br/>
When the initial <i>git rebase</i> reports a merge conflict,
I immediately do <i>git rebase --abort</i> to undo that rebase attempt.
Using <i>gitk --all</i> to view the commit tree, I can see
the master branch and the commit at which my feature branch
diverges from it. I then select a commit on the master branch
about halfway between those two points.
I copy the commit ID and paste it into a rebase command that looks
something like this:
<pre name="hlcode" class="bash"
>$ git rebase 8bc85584989e4435c2d98b13447bcab37648ba7f
</pre>
If this rebase reports no merge conflicts, then I try rebasing
against master and repeat the process.
<br/><br/>
If there are merge conflicts, then I abort the rebase and pick another
commit halfway again toward the branch point.
I repeat this until either the rebase succeeds or I am trying to
rebase across a single commit.
At that point, if there are still merge conflicts, they are real
and I address them in the normal way.
Since the conflict is only across a single commit, it is easier to
see the cause of the conflict and to resolve it.
<br/><br/>
After resolving the conflict across that one commit,
I go back to the first step and try rebasing against master again,
repeating the process.
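The whole procedure is essentially a binary search for the first commit whose rebase genuinely conflicts. Here is an abstract sketch in Python; <code>conflicts_between</code> stands in for "attempt the rebase, then <i>git rebase --abort</i> on failure", and all of the names here are mine, not git commands.

```python
def rebase_in_steps(commits, conflicts_between):
    """Rebase a feature branch across `commits` in binary-halving steps.

    commits -- master-branch commits ordered from the feature branch's
               branch point (index 0) up to the latest master (last index)
    conflicts_between(lo, hi) -- stand-in for "rebasing from commits[lo]
               onto commits[hi] reports a merge conflict"

    Returns the commits whose conflicts were real, i.e. still conflicted
    when crossed as a single step and so had to be resolved by hand.
    """
    real_conflicts = []
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        step = hi
        # On a conflict, abort and retry against a commit half as far away.
        while conflicts_between(lo, step) and step > lo + 1:
            step = lo + (step - lo) // 2
        if step == lo + 1 and conflicts_between(lo, step):
            real_conflicts.append(commits[step])  # real conflict: resolve it
        lo = step  # this step rebased cleanly (or was resolved); keep going
    return real_conflicts
```

In a real repository, each <code>conflicts_between</code> probe corresponds to a <i>git rebase &lt;commit&gt;</i> followed by <i>git rebase --abort</i> if it fails.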
<br/><br/>
I have followed this process a number of times.
In the majority of cases, after binary-dividing the commits
a few times and stepping through them piecemeal,
I reach the latest master without ever having to resolve
any conflicts.
The other times I typically have to resolve one or two small conflicts,
after which I can rebase against master.
<br/><br/>
The next time you do a rebase across more than one commit and git
tells you there are merge conflicts, try this approach.
You might save yourself a lot of work.
<br/><br/>
<h2>Levels of Expertise</h2>
<i>2011-12-08</i>
<br/><br/>
An attempt to improve the objectivity of skill self-ratings.
<h3>Contents</h3>
<ul>
<li><a href="#discussion">Discussion</a>
<li><a href="#scale">Scale</a>
<li><a href="#references">References</a>
</ul>
<a name="discussion"></a>
<h3>Discussion</h3>
We are often asked to rate things on a scale, typically 1 to 5 or 1 to 10.
Rarely is there an attempt to define what those different numbers mean.
From a statistician's point of view, this makes the values useful for the
sole purpose of comparing a single individual's ratings against other ratings
of that individual.
In particular, without a good definition of what the various levels mean,
I don't see how there can be any effective communication from one person to
another of the meaning of such a rating.
<br/><br/>
When my doctor asks me to tell him how much something hurts on a scale
of 1 to 10, I have no idea what information he expects to get when I say
"3" or "7".
<br/><br/>
I once asked an acquaintance to rate, on a scale of 1 (bad) to 10 (good),
a movie he had just seen. He said it was a 9. I was suspicious of this
answer, so I asked him how he would rate Star Wars, which I knew to be
his all-time favorite movie, on the same 1-to-10 scale. He said 12.
<br/><br/>
I personally consider it an aspect of innumeracy, but people often try to
emphasize something by using numbers that are outside of the valid range.
We may chuckle when Nigel says he likes his amp better because it
<a href="http://www.spinaltapfan.com/atozed/TAP00160.HTM">
goes to 11</a>, but how often have you heard someone talking in all
seriousness about putting in a "110% effort"?
What does that actually
<a href="http://www.bepress.com/jqas/vol7/iss2/2/">mean</a>?
How would you know if someone were
<a href="http://www.fakingnews.com/2010/05/sc-strikes-down-demands-of-110-percent-from-employees-as-unconstitutional/">putting in</a>
110% versus 100%?
If 110% is a valid number, then presumably
<a href="http://blog.moneysavingexpert.com/2011/07/15/you-cannot-give-110-effort-%E2%80%93-an-explosion-of-pent-up-nerd-rage/">so is 120%</a>,
so anyone
suggesting a <a href="http://edge.ebaumsworld.com/mediaFiles/picture/460723/982951.png">mere 110%</a>
is clearly not asking for enough effort.
<br/><br/>
People tend to
<a href="http://www.apa.org/monitor/feb03/overestimate.aspx">overestimate</a>
<a href="http://en.wikipedia.org/wiki/Illusory_superiority">how good</a>
they are at all sorts of things,
including cognitive, social and physical skills.
If we all overrate ourselves by the same amount, I suppose that could all cancel
out and you could still compare people's ratings -
but without knowing <i>a priori</i>
what their ratings should be, we don't know how much they might be
overrating themselves.
<br/><br/>
When people consider their own expertise, it is common for those with less
expertise to overvalue themselves more than people with more expertise.
With more expertise comes more awareness of what one could do better.
Einstein
<a href="http://www.notable-quotes.com/e/einstein_albert.html">said</a>,
"As our circle of knowledge expands, so does the circumference of darkness surrounding it."
Relative beginners easily fall into the
<a href="http://www2.merriam-webster.com/cgi-bin/mwdictsn?va=sophomoric">Sophomore Illusion</a>
of thinking they
know a lot because the circumference of their knowledge is not yet large
enough for them to recognize the size of the surrounding darkness.
<br/><br/>
In 1989, psychologist
<a href="http://www.psy.cmu.edu/faculty/hayes/index.html">John Hayes</a>
at <a href="http://www.cmu.edu/">Carnegie Mellon University</a>
identified what is now called the "ten-year rule"
(although there are
<a href="http://blog.enkerli.com/2008/12/23/expertise-quest/">earlier commenters</a>,
including <a href="http://www.cs.cmu.edu/simon/">Herbert Simon</a>,
who was also at CMU).
As
<a href="http://www.its.caltech.edu/~len/">Leonard Mlodinow</a>
says in
"<a href="http://books.google.com/books?id=UJxRLCq9l3IC">The Drunkard's Walk</a>",
"Experts often speak of the
'<a href="http://www.selfgrowth.com/articles/10-Year_Rule_to_Become_an_Expert.html">ten</a>-<a href="http://rogercostello.wordpress.com/category/10-year-rule-for-exceptional-performance/">year</a>
<a href="http://creativity.netslova.ru/Ten-year_rule.html">rule</a>,'
meaning that it takes at least a decade of hard work, patience and striving
to become highly successful in most endeavors." (links mine)
The ten-year rule is related to the idea that it takes about 10,000 hours
of practice at something to become an expert; with 5 hours of practice
per business day and 200 business days per year,
it would take ten years to rack up that many hours.
If you find yourself thinking how wonderfully expert you are in something
that you have practiced for only a few years, perhaps you should
consider the ten-year rule and temper your evaluation.
<br/><br/>
Given that people are so bad at these ratings, it seems to me that the only
way to get any useful information from someone when asking this kind of
self-rating question is to have an objective definition
of what each level means.
<br/><br/>
One way to think about a scale is by how many people fall into each level.
There are currently
<a href="http://www.nytimes.com/2011/11/01/world/united-nations-reports-7-billion-humans-but-others-dont-count-on-it.html">7 billion people</a>
in the world,
or almost 10 to the 10th power.
This conveniently maps to a logarithmic scale from 0 to 10,
allowing us to define eleven levels, starting with level 0
containing everyone (rounding the world population up to 10 billion),
with each higher level containing one tenth
as many people as the level just below it.
If the descriptions of a level are hard to interpret,
perhaps the size of that level will help give an indication
of whether a person should be rated there.
<br/><br/>
Years ago, during a job interview, I was asked to rate my level
of expertise in various subjects, such as programming languages
and development tools.
This was not an unusual question; I had been asked it
before and have been asked it since.
What was different that time was that the interviewer included
a scale with some relatively objective descriptions for determining
level of expertise.
I rather liked the scale, so
although I don't recall the exact definition of his levels,
I have tried to reproduce that concept here,
using descriptions somewhat similar to those given by that interviewer.
Unfortunately, I don't remember who introduced that scale
to me, so I am unable to give credit.
<br/><br/>
There are many reasons one might want a scale of expertise,
including rating potential employees or creating a summary
of the amount of expertise within a company.
The scale I present here is intended to be very general;
given its logarithmic nature that can include the
entire world population, it is capable of allowing comparison of
expertise across everyone in the world.
You might think that would make it suboptimal for
rating (potential) employee expertise,
but I think there are enough levels to make it useful for that purpose.
<a name="scale"></a>
<h3>Scale</h3>
The scale below includes the following columns:
<ul>
<li>Level: a number for the level, from 0 to 10,
with 10 being the highest level of expertise.
<li>Name: a name for the level.
These are taken from a set of expertise level names proposed by the
<a href="https://we.riseup.net/tsolife+tsolife-goes-ruby/levels-of-expertise">
Traveling School of Life</a>.
My use of them probably doesn't quite match their intent,
but I liked the names and thought the ten words matched my
levels pretty well, so I applied them to my levels
and added "ignorant" for level 0.
<li>Description: a brief description of the level.
The descriptions are worded as if for a technical tool;
for application to other areas or concepts, modify accordingly.
Comments referring to companies assume a large company (10,000+ people)
with large divisions (1000+ people);
being a company-wide guru in a company with 100 people
might not get you past level 6.
<li>Size: the approximate number of people expected to be at that level
worldwide.
As mentioned above, this is a simple logarithmic scale.
The number of people in a level is 10<sup>10-L</sup> where
L is the level number.
<li>Practice: the approximate amount of practice that could be required
to reach that level of expertise.
Putting in that many hours does not guarantee reaching that level,
and reaching that level does not necessarily require
putting in that many hours.
The conversion factors are 1,000 hours per year or 5 hours per day.
</ul>
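The Size and Practice columns follow directly from the factors just described. A quick Python check of the arithmetic (the function names are mine):

```python
def level_size(level):
    """Approximate worldwide population at a given expertise level (0-10):
    a logarithmic scale with 10**(10 - L) people at level L."""
    return 10 ** (10 - level)

def practice_years(hours, hours_per_year=1000):
    """Convert practice hours to years, at roughly 1,000 practice hours
    per year (5 hours per business day, 200 business days per year)."""
    return hours / hours_per_year
```

For example, <code>level_size(7)</code> gives 1,000 people worldwide at the "accomplished" level, and <code>practice_years(10000)</code> gives the ten years of the ten-year rule.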
All of these different factors are rough estimates,
not intended as absolutes but merely as guidelines to help
people rank themselves in a way that allows for more meaningful results.
I don't have any research to show how well my guesses about
Description, Size and Practice correlate;
if anyone knows of something along those lines,
that would be interesting.
<br/><br/>
<a name="scale-table"></a>
<table class="scale-table" border=1>
<tr>
<th>Level</th>
<th>Name</th>
<th width="55%">Description</th>
<th>Size</th>
<th>Practice</th>
</tr>
<tr><td>0</td>
<td>ignorant</td>
<td>I have never heard of it.</td>
<td>10,000,000,000</td>
<td>none</td>
</tr>
<tr><td>1</td>
<td>interested</td>
<td>I have heard a little about it, but don't know much.</td>
<td>1,000,000,000</td>
<td>1 hour</td>
</tr>
<tr><td>2</td>
<td>pursuing</td>
<td>I have read an article or two about it and understand the basics
of what it is, but nothing in depth.</td>
<td>100,000,000</td>
<td>1 day (5 hours)</td>
</tr>
<tr><td>3</td>
<td>beginner</td>
<td>I have read an in-depth article, primer, or how-to book,
and/or have played with it a bit.</td>
<td>10,000,000</td>
<td>1 week (25 hours)</td>
</tr>
<tr><td>4</td>
<td>apprentice</td>
<td>I have used it for at least a few months and have successfully
completed a small project using it.</td>
<td>1,000,000</td>
<td>3 months (250 hours)</td>
</tr>
<tr><td>5</td>
<td>intermediate</td>
<td>I have used it for a year or more on a daily or regular basis,
and am comfortable using it in moderately complex projects.</td>
<td>100,000</td>
<td>1 year (1,000 hours)</td>
</tr>
<tr><td>6</td>
<td>advanced</td>
<td>I have been using it for many years, know all of the basic aspects,
and am comfortable using it as a key element in complex projects.
People in my group come to me with their questions.</td>
<td>10,000</td>
<td>5 years (5,000 hours)</td>
</tr>
<tr><td>7</td>
<td>accomplished</td>
<td>I am a local expert, with ten or more years of solid experience.
People in my division come to me with their questions.</td>
<td>1,000</td>
<td>10 years (10,000 hours)</td>
</tr>
<tr><td>8</td>
<td>master</td>
<td>I am a company-wide guru with twenty or more years of experience;
people from other divisions come to me with their questions.</td>
<td>100</td>
<td>20 years (20,000 hours)</td>
</tr>
<tr><td>9</td>
<td>grandmaster</td>
<td>I am a recognized international authority on it.</td>
<td>10</td>
<td>30 years (30,000 hours)</td>
</tr>
<tr><td>10</td>
<td>great-grandmaster</td>
<td>I created it, and am the number 1 expert in the world.</td>
<td>1</td>
<td>50 years (50,000 hours)</td>
</tr>
</table>
<a name="references"></a>
<h3>References</h3>
Other scales of expertise:
<ul>
<li>Ted Neward describes four levels in his
<a href="http://blogs.tedneward.com/2008/08/14/The+NeverEnding+Debate+Of+Specialist+V+Generalist.aspx">
post of August 14</a>:
Apprentice, Journeyman, Master, Adept.
<li><a href="http://www.sld.demon.co.uk/dreyfus.pdf">The Dreyfus model of skill acquisition</a>:
Novice, (Advanced) Beginner, Competent, Proficient, Expert.
<li><a href="http://www.performancemattersinc.com/posts/stages-of-expertise/">
Paul Schempp's take</a> on Dreyfus's five levels,
with "Capable" rather than "Advanced Beginner".
<li>The <a href="http://en.wikipedia.org/wiki/Four_stages_of_competence">
Four Stages of Competence</a> of Thomas Gordon:
Unconscious Incompetence, Conscious Incompetence,
Conscious Competence, Unconscious Competence.
And how they might apply to
<a href="http://devthought.com/2009/02/24/the-four-stages-of-programming-competence/">programming</a>.
</ul>
Other articles:
<ul>
<li><a href="http://norvig.com/21-days.html">
Teach Yourself Programming in Ten Years</a>,
by Peter Norvig, 2001.
<li><a href="http://www.ascue.org/files/proceedings/2009/p52.pdf">
From Novice to Expert: Harnessing the Stages of
Expertise Development in the Online World</a>,
by Douglas A Kranach, in the 2009 ASCUE Proceedings.
<li><a href="http://www.scribd.com/doc/52366153/16/Expertise">
A College Student's Guide to Computers in Education</a>,
by Dave Moursund;
Chapter 3,
"Expertise and Problem Solving", page 25,
with a discussion of expertise as related to hours of study and practice.
<li>A 2006 excerpt from
<a href="http://seedmagazine.com/content/article/how_to_get_to_carnegie_hall/">
Jonah Lehrer</a>
on the importance of practice for Mozart and Tiger Woods.
</ul>
<br/><br/>
<h2>Debugging Scala Parser Combinators</h2>
<i>2011-07-28</i>
<br/><br/>
Two simple mechanisms for debugging parsers written using Scala's parser combinators.<br />
<br />
<h3>Contents</h3><ul><li><a href="#intro">Introduction</a>
<li><a href="#example">Example Parser</a>
<li><a href="#calling-parsers">Calling Individual Parsers</a>
<li><a href="#tracing">Tracing</a>
<li><a href="#updated-example">Updated Example</a>
</ul>
<a name="intro"></a>
<h3>Introduction</h3>In a recent comment on my 2008
<a href="http://jim-mcbeath.blogspot.com/2008/09/scala-parser-combinators.html">
blog post</a>
about Scala's parser combinators,
a reader asked how one might go about debugging such a parser.
As
<a href="http://www.quanttec.com/fparsec/users-guide/debugging-a-parser.html">
one post</a>
says,
"Debugging a parser implemented with the help of a combinator library
has its special challenges."
You may have trouble
<a href="http://lorgonblog.wordpress.com/2007/12/12/monadic-parser-combinators-part-seven/">
setting breakpoints</a>,
and stack traces can be
difficult to interpret.
<br/><br/>
The two techniques I show here may not provide you with the kind of
visibility you might be used to when single-stepping through problem code,
but I hope they provide at least a little more visibility than you might
otherwise have.
<a name="example"></a>
<h3>Example Parser</h3>
As an example parser I will use an integer-only version of the
four-function arithmetic parser I
built for my 2008 parser combinator post.
The code consists of a set of case classes to represent the parsed results
and a parser class that contains the parsing rules and a few helper methods.
You can copy this code into a file and either compile it or load it into
the Scala <a href="http://www.scala-lang.org/node/2097">REPL</a>.
<br/><br/>
<pre name="hlcode" class="scala"
>import scala.util.parsing.combinator.syntactical.StandardTokenParsers
sealed abstract class Expr {
def eval():Int
}
case class EConst(value:Int) extends Expr {
def eval():Int = value
}
case class EAdd(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval + right.eval
}
case class ESub(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval - right.eval
}
case class EMul(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval * right.eval
}
case class EDiv(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval / right.eval
}
case class EUMinus(e:Expr) extends Expr {
def eval():Int = -e.eval
}
object ExprParser extends StandardTokenParsers {
lexical.delimiters ++= List("+","-","*","/","(",")")
def value = numericLit ^^ { s => EConst(s.toInt) }
def parens:Parser[Expr] = "(" ~> expr <~ ")"
def unaryMinus:Parser[EUMinus] = "-" ~> term ^^ { EUMinus(_) }
def term = ( value | parens | unaryMinus )
def binaryOp(level:Int):Parser[((Expr,Expr)=>Expr)] = {
level match {
case 1 =>
"+" ^^^ { (a:Expr, b:Expr) => EAdd(a,b) } |
"-" ^^^ { (a:Expr, b:Expr) => ESub(a,b) }
case 2 =>
"*" ^^^ { (a:Expr, b:Expr) => EMul(a,b) } |
"/" ^^^ { (a:Expr, b:Expr) => EDiv(a,b) }
case _ => throw new RuntimeException("bad precedence level "+level)
}
}
val minPrec = 1
val maxPrec = 2
def binary(level:Int):Parser[Expr] =
if (level>maxPrec) term
else binary(level+1) * binaryOp(level)
def expr = ( binary(minPrec) | term )
def parse(s:String) = {
val tokens = new lexical.Scanner(s)
phrase(expr)(tokens)
}
def apply(s:String):Expr = {
parse(s) match {
case Success(tree, _) => tree
case e: NoSuccess =>
throw new IllegalArgumentException("Bad syntax: "+s)
}
}
def test(exprstr: String) = {
parse(exprstr) match {
case Success(tree, _) =>
println("Tree: "+tree)
val v = tree.eval()
println("Eval: "+v)
case e: NoSuccess => Console.err.println(e)
}
}
//A main method for testing
def main(args: Array[String]) = test(args(0))
}
</pre>
In the <code>ExprParser</code> object, the lines up to and including the
definition of the <code>expr</code> method define the parsing rules,
whereas the methods from <code>parse</code> onwards are helper methods.
<a name="calling-parsers"></a>
<h3>Calling Individual Parsers</h3>
In our example parser we can easily ask it to parse a string by calling
our <code>ExprParser.test</code> method, which parses the string using our
<code>parse</code> method, prints the resulting parse, and
(if the parse was successful) evaluates the parse tree and prints that value.
<br/><br/>
The last line of <code>parse</code>
parses a string using our expression parser:
<pre name="hlcode" class="scala"
>phrase(expr)(tokens)
</pre>
<code>phrase</code> is a method in
<a href="http://www.scala-lang.org/api/current/scala/util/parsing/combinator/syntactical/StandardTokenParsers.html">
<code>StandardTokenParsers</code></a>
that parses an input stream using the specified parser.
The only thing special about our <code>expr</code> method is that we
happen to have selected it as our top-level parser -
but we could just as easily have picked one of our other parsers
as our top-level parser.
<br/><br/>
Let's add another version of the <code>test</code> method that lets us
specify which parser to use as the top-level parser.
We want to print out the results in the same way as for the existing
<code>test</code> method, so we first
refactor that existing method:
<pre name="hlcode" class="scala"
>def test(exprstr: String) =
printParseResult(parse(exprstr))
def printParseResult(pr:ParseResult[Expr]) = {
pr match {
case Success(tree, _) =>
println("Tree: "+tree)
val v = tree.eval()
println("Eval: "+v)
case e: NoSuccess => Console.err.println(e)
}
}
</pre>
Now we add a new <code>parse</code> method that accepts a parser as
an argument, and we call that from our new <code>test</code> method:
<pre name="hlcode" class="scala"
>def parse(p:Parser[Expr], s:String) = {
val tokens = new lexical.Scanner(s)
phrase(p)(tokens)
}
def test(p:Parser[Expr], exprstr: String) =
printParseResult(parse(p,exprstr))
</pre>
We can run the Scala REPL, load our modified file using the ":load" command,
then manually call the top-level parser by calling our <code>test</code>
method.
To reduce typing, we import everything from <code>ExprParser</code>.
In the examples below, text in <b>bold</b> is what we type,
the rest is printed by the REPL.
<pre name="hlcode" class="scala"
>scala> <b>import ExprParser._</b>
import ExprParser._
scala> <b>test("1+2")</b>
Tree: EAdd(EConst(1),EConst(2))
Eval: 3
scala> <b>test("1+2*3")</b>
Tree: EAdd(EConst(1),EMul(EConst(2),EConst(3)))
Eval: 7
scala> <b>test("(1+2)*3")</b>
Tree: EMul(EAdd(EConst(1),EConst(2)),EConst(3))
Eval: 9
</pre>
We can also call the <code>test</code> method that takes a parser as an
argument, allowing us to specifically test one particular parsing rule
at a time.
If we pass in <code>expr</code> as the parser, we will get the same
results as above;
but if we pass in a different parser, we may get different results.
<pre name="hlcode" class="scala"
>scala> <b>test(expr,"1+2*3")</b>
Tree: EAdd(EConst(1),EMul(EConst(2),EConst(3)))
Eval: 7
scala> <b>test(binary(1),"1+2*3")</b>
Tree: EAdd(EConst(1),EMul(EConst(2),EConst(3)))
Eval: 7
scala> <b>test(binary(2),"1+2*3")</b>
[1.2] failure: ``/'' expected but `+' found
1+2*3
^
scala> <b>test(parens,"1+2")</b>
[1.1] failure: ``('' expected but 1 found
1+2
^
scala> <b>test(parens,"(1+2)")</b>
Tree: EAdd(EConst(1),EConst(2))
Eval: 3
scala> <b>test(parens,"(1+2)*3")</b>
[1.6] failure: end of input expected
(1+2)*3
^
</pre>
<a name="tracing"></a>
<h3>Tracing</h3>
If you have a larger parser that is not behaving and you are not quite
sure where the problem lies, it can be tedious to directly call
individual parsers until you find which one is misbehaving.
Being able to trace the progress of the whole parser running on an
input known to cause the problem might be helpful, but sprinkling
<code>println</code> statements throughout your parser can be tricky.
This section provides an approach that allows you to do some tracing
with minimal changes to your code.
The output can get pretty verbose, but
at least this will give you a starting point from which you may be
able to devise your own improved debugging.
<br/><br/>
The idea behind this approach is to wrap some or all of the individual
parsers in a debugging parser that delegates its <code>apply</code> action
to the wrapped parser, while also printing out some debugging information.
The <code>apply</code> action is called during the act of parsing.
<br/><br/>
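The delegate-and-trace idea can be sketched independently of the parser library. Here is a generic illustration (not code from the parser itself) of a wrapper that forwards to the wrapped function and prints what happened, the same way <code>Wrap</code> below forwards to the wrapped parser's <code>apply</code>:

```scala
// Generic sketch of the Wrap idea: delegate to the wrapped function,
// printing the argument and result on the way through.
def traced[A, B](name: String)(f: A => B): A => B = { a =>
  val b = f(a)                          // delegate, as Wrap delegates to parser.apply
  println(name + "(" + a + ") = " + b)  // the debugging output
  b
}

val double = traced("double")((x: Int) => x * 2)
println(double(3))  // prints the trace line "double(3) = 6", then "6"
```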
<b>Note:</b> this code relies on the fact that the code
for the various combinators in
the <code>Parser</code> class in Scala's
<code>StandardTokenParsers</code>
(which is implemented as an inner class in
<code>scala.util.parsing.combinator.Parsers</code>)
does not override any <code>Parser</code>
method other than <code>apply</code>.
<br/><br/>
This code could be added directly to the <code>ExprParser</code> class,
but it is presented here as a separate class to make it easier to reuse.
Add this <code>DebugStandardTokenParsers</code> class
to the file containing <code>ExprParser</code>.
<pre name="hlcode" class="scala"
>trait DebugStandardTokenParsers extends StandardTokenParsers {
class Wrap[+T](name:String,parser:Parser[T]) extends Parser[T] {
def apply(in: Input): ParseResult[T] = {
val first = in.first
val pos = in.pos
val offset = in.offset
val t = parser.apply(in)
println(name+".apply for token "+first+
" at position "+pos+" offset "+offset+" returns "+t)
t
}
}
}
</pre>
The <code>Wrap</code> class provides the hook into the <code>apply</code>
method that we need in order to print out our trace information as the
parser runs.
Once this class is in place, we modify <code>ExprParser</code> to
inherit from it rather than from <code>StandardTokenParsers</code>:
<pre name="hlcode" class="scala"
>object ExprParser extends DebugStandardTokenParsers { ... }
</pre>
So far we have not changed the behavior of the parser, since we have not
yet wired in the <code>Wrap</code> class.
To do so, we can take any of the existing parsers and wrap it in a
<code>new Wrap</code>.
For example, with the top-level <code>expr</code> parser
we could do this,
with the added code highlighted in <b>bold</b>:
<pre name="hlcode" class="scala"
>def expr = <b>new Wrap("expr",</b> ( binary(minPrec) | term ) <b>)</b>
</pre>
We can make this a bit easier to edit and read by using implicits.
In <code>DebugStandardTokenParsers</code> we add this method:
<pre name="hlcode" class="scala"
>implicit def toWrapped(name:String) = new {
def !!![T](p:Parser[T]) = new Wrap(name,p)
}
</pre>
Now we can wrap our <code>expr</code> method like this:
<pre name="hlcode" class="scala"
>def expr = <b>"expr" !!!</b> ( binary(minPrec) | term )
</pre>
If you don't like using <code>!!!</code> as an operator, you are free
to pick something more to your taste, or you can leave out the implicit
and just use the <code>new Wrap</code> approach.
<br/><br/>
At this point you must modify your source code by adding the above syntax
to each parsing rule that you want to trace.
You can go through and do them all, or you can just pick out the ones
you think are the most likely culprits and wrap those.
Note that you can wrap any parser this way, including those that appear
as pieces in the middle of other parsers.
The following example shows how some of the parsers in the <code>term</code>
and <code>binaryOp</code> methods can be wrapped:
<pre name="hlcode" class="scala"
> def term = <b>"term" !!!</b> ( value | <b>"term-parens" !!!</b> parens | unaryMinus )
def binaryOp(level:Int):Parser[((Expr,Expr)=>Expr)] = {
level match {
case 1 =>
<b>"add" !!!</b> "+" ^^^ { (a:Expr, b:Expr) => EAdd(a,b) } |
<b>"sub" !!!</b> "-" ^^^ { (a:Expr, b:Expr) => ESub(a,b) }
case 2 =>
<b>"mul" !!!</b> "*" ^^^ { (a:Expr, b:Expr) => EMul(a,b) } |
<b>"div" !!!</b> "/" ^^^ { (a:Expr, b:Expr) => EDiv(a,b) }
case _ => throw new RuntimeException("bad precedence level "+level)
}
}
</pre>
Assuming we have wrapped the <code>expr</code>, <code>term</code> and
<code>binaryOp</code> methods as in the above examples, here is what the
output looks like for a few tests.
As in the previous REPL example, user input is in <b>bold</b>.
If you are using the REPL and reload the file, remember to
run <code>import ExprParser._</code> again to pick up the
newer definitions.
<pre name="hlcode" class="scala"
>scala> <b>test("1")</b>
term.apply for token 1 at position 1.1 offset 0 returns [1.2] parsed: EConst(1)
add.apply for token EOF at position 1.2 offset 1 returns [1.2] failure: ``+'' expected but EOF found
1
^
sub.apply for token EOF at position 1.2 offset 1 returns [1.2] failure: ``-'' expected but EOF found
1
^
expr.apply for token 1 at position 1.1 offset 0 returns [1.2] parsed: EConst(1)
Tree: EConst(1)
Eval: 1
scala> <b>test("(1+2)*3")</b>
term.apply for token 1 at position 1.2 offset 1 returns [1.3] parsed: EConst(1)
add.apply for token `+' at position 1.3 offset 2 returns [1.4] parsed: +
term.apply for token 2 at position 1.4 offset 3 returns [1.5] parsed: EConst(2)
add.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``+'' expected but `)' found
(1+2)*3
^
sub.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``-'' expected but `)' found
(1+2)*3
^
expr.apply for token 1 at position 1.2 offset 1 returns [1.5] parsed: EAdd(EConst(1),EConst(2))
term-parens.apply for token `(' at position 1.1 offset 0 returns [1.6] parsed: EAdd(EConst(1),EConst(2))
term.apply for token `(' at position 1.1 offset 0 returns [1.6] parsed: EAdd(EConst(1),EConst(2))
term.apply for token 3 at position 1.7 offset 6 returns [1.8] parsed: EConst(3)
add.apply for token EOF at position 1.8 offset 7 returns [1.8] failure: ``+'' expected but EOF found
(1+2)*3
^
sub.apply for token EOF at position 1.8 offset 7 returns [1.8] failure: ``-'' expected but EOF found
(1+2)*3
^
expr.apply for token `(' at position 1.1 offset 0 returns [1.8] parsed: EMul(EAdd(EConst(1),EConst(2)),EConst(3))
Tree: EMul(EAdd(EConst(1),EConst(2)),EConst(3))
Eval: 9
scala> <b>test(parens,"(1+2)")</b>
term.apply for token 1 at position 1.2 offset 1 returns [1.3] parsed: EConst(1)
mul.apply for token `+' at position 1.3 offset 2 returns [1.3] failure: ``*'' expected but `+' found
(1+2)
^
div.apply for token `+' at position 1.3 offset 2 returns [1.3] failure: ``/'' expected but `+' found
(1+2)
^
add.apply for token `+' at position 1.3 offset 2 returns [1.4] parsed: +
term.apply for token 2 at position 1.4 offset 3 returns [1.5] parsed: EConst(2)
mul.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``*'' expected but `)' found
(1+2)
^
div.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``/'' expected but `)' found
(1+2)
^
add.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``+'' expected but `)' found
(1+2)
^
sub.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``-'' expected but `)' found
(1+2)
^
expr.apply for token 1 at position 1.2 offset 1 returns [1.5] parsed: EAdd(EConst(1),EConst(2))
Tree: EAdd(EConst(1),EConst(2))
Eval: 3
</pre>
As you can see, even for these very short input strings
the output is pretty verbose.
It does, however, show you what token it is trying to parse
and where in the input stream that token is, so by paying attention
to the position and offset numbers you can see where it is backtracking.
<br/><br/>
When you have found the problem and are done debugging, you can remove
the <code>DebugStandardTokenParsers</code> class and take out all of the
<code>!!!</code> wrapping operations, or you can leave everything in place
and disable the wrapper output by changing the
definition of the implicit <code>!!!</code> operator to this:
<pre name="hlcode" class="scala"
>def !!![T](p:Parser[T]) = p
</pre>
Or, if you want to make it possible to enable debugging output later,
change <code>!!!</code> to return either <code>p</code> or
<code>new Wrap(name,p)</code> depending on some debugging configuration value.
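That last variation can be sketched generically as well. The <code>DebugConfig</code> object here is hypothetical, and the wrapper is shown as a plain function rather than a <code>Parser</code>:

```scala
// Hypothetical configuration flag controlling whether tracing happens at all.
object DebugConfig { var traceParsers = false }

// With tracing off, return the function unchanged; with it on, wrap it.
def wrap[A, B](name: String)(f: A => B): A => B =
  if (!DebugConfig.traceParsers) f
  else { a =>
    val b = f(a)
    println(name + "(" + a + ") = " + b)
    b
  }
```

In the real code, <code>!!!</code> would make the same choice, returning either <code>p</code> unchanged or <code>new Wrap(name,p)</code> based on the flag.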
<a name="updated-example"></a>
<h3>Updated Example</h3>
Below is the complete program with all of the above changes.
<pre name="hlcode" class="scala"
>import scala.util.parsing.combinator.syntactical.StandardTokenParsers
sealed abstract class Expr {
def eval():Int
}
case class EConst(value:Int) extends Expr {
def eval():Int = value
}
case class EAdd(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval + right.eval
}
case class ESub(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval - right.eval
}
case class EMul(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval * right.eval
}
case class EDiv(left:Expr, right:Expr) extends Expr {
def eval():Int = left.eval / right.eval
}
case class EUMinus(e:Expr) extends Expr {
def eval():Int = -e.eval
}
trait DebugStandardTokenParsers extends StandardTokenParsers {
class Wrap[+T](name:String,parser:Parser[T]) extends Parser[T] {
def apply(in: Input): ParseResult[T] = {
val first = in.first
val pos = in.pos
val offset = in.offset
val t = parser.apply(in)
println(name+".apply for token "+first+
" at position "+pos+" offset "+offset+" returns "+t)
t
}
}
implicit def toWrapped(name:String) = new {
def !!![T](p:Parser[T]) = new Wrap(name,p) //for debugging
//def !!![T](p:Parser[T]) = p //for production
}
}
object ExprParser extends DebugStandardTokenParsers {
lexical.delimiters ++= List("+","-","*","/","(",")")
def value = numericLit ^^ { s => EConst(s.toInt) }
def parens:Parser[Expr] = "(" ~> expr <~ ")"
def unaryMinus:Parser[EUMinus] = "-" ~> term ^^ { EUMinus(_) }
def term = "term" !!! ( value | "term-parens" !!! parens | unaryMinus )
def binaryOp(level:Int):Parser[((Expr,Expr)=>Expr)] = {
level match {
case 1 =>
"add" !!! "+" ^^^ { (a:Expr, b:Expr) => EAdd(a,b) } |
"sub" !!! "-" ^^^ { (a:Expr, b:Expr) => ESub(a,b) }
case 2 =>
"mul" !!! "*" ^^^ { (a:Expr, b:Expr) => EMul(a,b) } |
"div" !!! "/" ^^^ { (a:Expr, b:Expr) => EDiv(a,b) }
case _ => throw new RuntimeException("bad precedence level "+level)
}
}
val minPrec = 1
val maxPrec = 2
def binary(level:Int):Parser[Expr] =
if (level>maxPrec) term
else binary(level+1) * binaryOp(level)
def expr = "expr" !!! ( binary(minPrec) | term )
def parse(s:String) = {
val tokens = new lexical.Scanner(s)
phrase(expr)(tokens)
}
def parse(p:Parser[Expr], s:String) = {
val tokens = new lexical.Scanner(s)
phrase(p)(tokens)
}
def apply(s:String):Expr = {
parse(s) match {
case Success(tree, _) => tree
case e: NoSuccess =>
throw new IllegalArgumentException("Bad syntax: "+s)
}
}
def test(exprstr: String) =
printParseResult(parse(exprstr))
def test(p:Parser[Expr], exprstr: String) =
printParseResult(parse(p,exprstr))
def printParseResult(pr:ParseResult[Expr]) = {
pr match {
case Success(tree, _) =>
println("Tree: "+tree)
val v = tree.eval()
println("Eval: "+v)
case e: NoSuccess => Console.err.println(e)
}
}
//A main method for testing
def main(args: Array[String]) = test(args(0))
}
</pre>Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com5tag:blogger.com,1999:blog-7045524330253482541.post-630423692802904732011-07-19T16:49:00.000-07:002011-07-19T16:49:32.115-07:00Multithread Coroutine Scheduler<h1>Multithread Coroutine Scheduler</h1>
A scheduler that uses multiple worker threads
for continuations-based Scala coroutines.
<br/><br/>
In my recent series of posts that
<a href="http://jim-mcbeath.blogspot.com/2011/04/java-nio-complete-scala-server.html">
ended</a> with a complete Scala server
that uses continuations-based coroutines to store per-client state,
I asserted that the single-threaded scheduler implementation in that example
could relatively easily be replaced by a scheduler
that uses multiple threads.
In this post I provide a simple working example of such a
multithread scheduler.
<h3>Contents</h3>
<ul>
<li><a href="#overview">Overview</a>
<li><a href="#tasks">Managing Tasks</a>
<li><a href="#scheduler">Scheduler</a>
<li><a href="#synchronization">Synchronization</a>
</ul>
<a name="overview"></a>
<h3>Overview</h3>
We can use the standard
<a href="http://en.wikipedia.org/wiki/Thread_pool_pattern">thread-pool</a>
approach in which we have a pool
of worker threads that independently pull from a common task queue.
Java 1.5 introduced a set of classes and interfaces in the
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/package-summary.html">
<code>java.util.concurrent</code> package</a>
to support various kinds of thread pools
or potentially other task scheduling mechanisms.
Rather than writing our own, we will use an
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/Executor.html">
<code>Executor</code></a>
from that package.
<br/><br/>
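The basic pattern is small: submit <code>Runnable</code> tasks to a pool and let it manage the worker threads. A minimal sketch, separate from the scheduler classes below:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

val pool = Executors.newFixedThreadPool(2)  // two worker threads
val done = new AtomicInteger(0)
for (i <- 1 to 4)                           // submit four tasks
  pool.execute(new Runnable { def run() = done.incrementAndGet() })
pool.shutdown()                             // stop accepting new tasks
pool.awaitTermination(5, TimeUnit.SECONDS)  // wait for queued tasks to finish
println(done.get)                           // prints 4
```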
We have an additional requirement that makes our situation a little bit
more complex than the typical thread-pool: our collection of tasks includes
both tasks that are ready to run and tasks that are currently blocked
but will become ready to run at some point in the future.
<br/><br/>
We will implement a new scheduler class <code>JavaExecutorCoScheduler</code>
that maintains a list of blocked tasks and
uses a Java <code>Executor</code> to manage runnable tasks.
<br/><br/>
The updated complete source code for this post is available
on github in my <a href="https://github.com/jimmc/nioserver">nioserver</a>
project under the tag
<a href="https://github.com/jimmc/nioserver/tree/blog-executor">
blog-executor</a>.
<a name="tasks"></a>
<h3>Managing Tasks</h3>
As mentioned above, we need to deal with two kinds of tasks:
tasks that are ready to run and tasks that are blocked.
The standard
<code>Executor</code>
class allows us to submit a task for execution, but does not handle
blocked tasks.
Since we don't want to submit blocked tasks to the <code>Executor</code>,
we have to queue them up ourselves.
We have two issues to attend to:
<ol>
<li>When our scheduler is passed a task, we must put it into our own
queue of blocked tasks if it is not currently ready to run.
<li>When a previously blocked task becomes ready to run,
we must remove it from our queue of
blocked tasks and pass it to the <code>Executor</code>.
</ol>
The first issue is straightforward, as our framework already allows us to
test the blocker for a task and see if the task is ready to run.
In order to properly take care of the second issue, we will make a small
change to our framework to allow us to notice when a blocker has probably
stopped blocking so that we can run the corresponding task.
We do this by modifying our <code>CoScheduler</code> class to add
a method to notify it that a blocker has probably become unblocked:
<br/>
<pre name="hlcode" class="scala"
> def unblocked(b:Blocker):Unit
</pre>
We call this method from <code>CoQueue</code> in the two places where
we previously called <code>scheduler.coNotify</code>:
in the <code>blockingEnqueue</code> method after we have enqueued an item
to notify the scheduler that the dequeue side is probably unblocked,
and in the <code>blockingDequeue</code> method after we have dequeued an item
to notify the scheduler that the enqueue side is probably unblocked.
Those two methods in <code>CoQueue</code> now look like this:
<pre name="hlcode" class="scala"
> def blockingEnqueue(x:A):Unit @suspendable = {
enqueueBlocker.waitUntilNotBlocked
enqueue(x)
<b>scheduler.unblocked(dequeueBlocker)</b>
}
def blockingDequeue():A @suspendable = {
dequeueBlocker.waitUntilNotBlocked
val x = dequeue
<b>scheduler.unblocked(enqueueBlocker)</b>
x
}
</pre>
The implementation of <code>unblocked</code> in our default scheduler
<code>DefaultCoScheduler</code> is just a call to <code>coNotify</code>,
so the behavior of that system will remain the same as it was before we added
the calls to <code>unblocked</code>.
<br/><br/>
Because we need to ensure that all of our NIO read and write operations
are handled sequentially, we continue to manage those tasks separately
with our <code>NioSelector</code> class,
where all of the reads are executed on one thread and all of the writes
are executed on another thread.
<a name="scheduler"></a>
<h3>Scheduler</h3>
We already have a scheduler framework that defines a <code>CoScheduler</code>
class as the parent class for our scheduler implementations,
which requires that we implement the methods
<code>setRoutineContinuation</code>, <code>runNextUnblockedRoutine</code>
and the newly added <code>unblocked</code>.
<br/><br/>
In our <code>JavaExecutorCoScheduler</code>,
our <code>setRoutineContinuation</code> method is responsible for storing
or executing the task.
It checks to see if the task is currently blocked, storing it
in our list of blocked tasks if so.
Otherwise, it passes it to the thread pool (which is managed by an
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ExecutorService.html">
<code>ExecutorService</code></a>),
which takes care of managing the threads and running the task.
We define a simple case class, <code>RunnableCont</code>, to turn our task
into a <code>Runnable</code> that is usable by the pool.
<br/><br/>
Our <code>unblocked</code> method gets passed a blocker which is probably
now unblocked.
We test that, and if in fact it is still blocked we do nothing.
If it is unblocked, then we remove it from our list of blocked tasks
and pass it to the pool.
<br/><br/>
The <code>runNextUnblockedRoutine</code> method in this scheduler doesn't
actually do anything, since the pool is taking care of running everything.
We just return <code>SomeRoutinesBlocked</code> so that the caller goes
into a wait state.
<br/><br/>
In addition to the above three methods, we will have our thread pool,
a lock that we use when managing our blocked and runnable tasks,
and a set of blocked tasks waiting to become unblocked.
For this implementation we choose to use a thread pool of a fixed size,
thus the call to
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool(int)">
<code>Executors.newFixedThreadPool</code></a>.
<br/><br/>
Here is our complete <code>JavaExecutorCoScheduler</code> class:
<pre name="hlcode" class="scala"
>package net.jimmc.scoroutine
import java.lang.Runnable
import java.util.concurrent.Executors
import java.util.concurrent.ExecutorService
import scala.collection.mutable.LinkedHashMap
import scala.collection.mutable.SynchronizedMap
class JavaExecutorCoScheduler(numWorkers:Int) extends CoScheduler {
type Task = Option[Unit=>Unit]
case class RunnableCont(task:Task) extends Runnable {
def run() = task foreach { _() }
}
private val pool = Executors.newFixedThreadPool(numWorkers)
private val lock = new java.lang.Object
private val blockedTasks = new LinkedHashMap[Blocker,Task] with
SynchronizedMap[Blocker,Task]
private[scoroutine] def setRoutineContinuation(b:Blocker,task:Task) {
lock.synchronized {
if (b.isBlocked) {
blockedTasks(b) = task
} else {
pool.execute(RunnableCont(task))
coNotify
}
}
}
def unblocked(b:Blocker):Unit = {
lock.synchronized {
if (!b.isBlocked)
blockedTasks.remove(b) foreach { task =>
pool.execute(RunnableCont(task)) }
}
coNotify
}
def runNextUnblockedRoutine():RunStatus = SomeRoutinesBlocked
}
</pre>
<a name="synchronization"></a>
<h3>Synchronization</h3>
Although not necessitated by the above changes,
I added one more change to <code>CoScheduler</code>
to improve its synchronization behavior.
<br/><br/>
While exploring various multi-threading mechanisms as alternatives to
using <code>Executor</code>,
I wrote a scheduler called <code>MultiThreadCoScheduler</code>
in which I implemented my own thread pool
and in which the master thread directly
allocated tasks to the worker threads in the pool.
Although that scheduler was quite a bit larger than the one presented
above, it provided much more control over the threads, allowing me to
change the number of worker threads on the fly
and to be able to tell in my master
thread whether there were any running worker threads.
<br/><br/>
In <code>MultiThreadCoScheduler</code>,
the main thread would call <code>coWait</code>
to wait until it needed to wake up and hand out another task,
and the worker threads would call <code>coNotify</code> when they were
done processing a task and were ready to be assigned the next task.
Similarly, a call to <code>coNotify</code> would be issued whenever
a new task was placed into the task queue.
<br/><br/>
Unfortunately, Java's <code>wait</code> and
<code>notify</code> methods,
which are the calls underlying our <code>coWait</code>
and <code>coNotify</code> methods,
do not quite behave the way we would like.
If we compare those calls to the Java NIO
<code>select</code> and <code>wakeup</code> calls,
we note that if a call is made to <code>wakeup</code> <i>before</i>
a call to <code>select</code>,
the <code>select</code> call will return immediately.
The <code>wait</code>/<code>notify</code> calls do not behave this way;
if a call is made to <code>notify</code> when there is no thread waiting
in a <code>wait</code> call on that
<a href="http://www.artima.com/insidejvm/ed2/threadsynch.html">
monitor</a>, the <code>notify</code> call
does nothing, and the following call to <code>wait</code> will wait until
the next call to <code>notify</code>.
<br/><br/>
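The lost-notification behavior is easy to demonstrate with plain <code>wait</code> and <code>notify</code>. A timed wait is used here so the demo terminates; an untimed <code>wait</code> would hang forever:

```scala
val lock = new Object
lock.synchronized { lock.notify() }  // nobody is waiting: the notification is lost
val t0 = System.currentTimeMillis
lock.synchronized { lock.wait(200) } // does not return immediately; it times out instead
val elapsed = System.currentTimeMillis - t0
println(elapsed)                     // roughly 200 ms, despite the earlier notify
```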
This small difference in semantics actually makes a pretty big difference
in behavior, because it means that, when using <code>wait</code> and
<code>notify</code>, you must be concerned with which happens first.
Let's see how that works.
<br/><br/>
In a typical scenario we have a resource with a boolean state that
indicates when a thread can access that resource,
for example, a queue with a boolean state of "has some data" that indicates when
a reader thread can pull an item from the queue (and perhaps another boolean
state of "queue is full" that indicates when a writer thread can put an item
into the queue).
In the case of <code>MultiThreadCoScheduler</code>
we have a task with a "ready" flag that tells us when we can
assign that task to a worker,
and a worker with an "idle" flag that tells us when we can
assign a task to that worker.
When a task becomes ready to run, we want a thread
(other than the master, since it may be waiting)
to add the task to our queue of
tasks and then notify the master that a task is available.
Meanwhile, when the master is looking for an available task to assign
to an idle worker, it will query to
see if a task is available, and if not it will then wait until one becomes
available.
The problem sequence would be if the master checks for available tasks,
finds none, then before the master executes its wait, the non-master puts
a ready task into the queue and issues a notify to the master.
The result of this sequence would be a ready task in the queue, but a
master waiting for a notify.
<br/><br/>
When all of the synchronization is done within a single class, you can
ensure that the above problem sequencing of operations does not happen
by arranging that the code that places a ready task into the queue and
notifies the master happens within one <code>synchronized</code> block,
and the code used by the master to query the queue for a ready task and
then to wait happens within one <code>synchronized</code> block on the same
monitor.
But when dealing with subclasses, we run into the
"<a href="http://www.scala-lang.org/node/9811">inheritance anomaly</a>"
(or "inheritance-synchronization anomaly").
The essence of this problem is that the base class provides a method
that is synchronized, but the subclass would like to include more
functionality within that synchronized block.
If, as is often the case, the subclass does not have access to the monitor
being used by the base class to control its synchronization,
there is no way for it to do this.
<br/><br/>
In our case, we can implement something that is sufficient for our
current needs by
making a small change to our <code>coWait</code>
and <code>coNotify</code> methods in <code>CoScheduler</code>
so that they behave in the same manner as
<code>select</code> and <code>wakeup</code>:
if a call to <code>coNotify</code> is made before a call to <code>coWait</code>,
the call to <code>coWait</code> will return immediately.
We do this by changing the implementation of <code>coWait</code> and
<code>coNotify</code> in <code>CoScheduler</code> from this:
<pre name="hlcode" class="scala"
> def coWait():Unit = {
defaultLock.synchronized {
defaultLock.wait()
}
}
def coNotify():Unit = {
defaultLock.synchronized {
defaultLock.notify
}
}
</pre>
to this:
<pre name="hlcode" class="scala"
> private var notified = false
def coWait():Unit = {
defaultLock.synchronized {
if (!notified)
defaultLock.wait()
notified = false
}
}
def coNotify():Unit = {
defaultLock.synchronized {
notified = true
defaultLock.notify
}
}
</pre>
With the above change to our base class, our subclass no longer needs to
be concerned about the problem sequence described above,
because the call to <code>coWait</code> will return immediately if there
was a call to <code>coNotify</code> since the most recent previous call
to <code>coWait</code>.
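The changed semantics can be checked in isolation. This self-contained copy of the logic above shows that a notify issued before a wait makes the following wait return immediately instead of hanging:

```scala
// Standalone copy of the coWait/coNotify logic with the notified flag.
class NotifyFirstLock {
  private val lock = new Object
  private var notified = false
  def coWait(): Unit = lock.synchronized {
    if (!notified) lock.wait()  // skip the wait if a notify already arrived
    notified = false            // consume the notification
  }
  def coNotify(): Unit = lock.synchronized {
    notified = true
    lock.notify()
  }
}

val l = new NotifyFirstLock
l.coNotify()  // notify first...
l.coWait()    // ...and this returns immediately rather than blocking
println("coWait returned")
```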
Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com2tag:blogger.com,1999:blog-7045524330253482541.post-45002944780859077342011-06-25T07:46:00.000-07:002011-06-25T07:46:15.757-07:00Sledgehammer WordsWords are tools that we use to
clarify our concepts, express our emotions and
persuade others to our positions.
We use those tools to craft mental models which we deliver to our listener.
The better the job we do with those tools,
the more effectively we can communicate our message.
<br/><br/>
The words we use every day are our basic tools.
Like screwdrivers and pliers, these words are simple but versatile,
performing adequately for most tasks.
Occasionally we might want to use a more esoteric word for
a specific task, as we might pull out a pair of
<a href="http://www.grainger.com/Grainger/PROTO-Bent-Needle-Nose-Pliers-3R209">
bent needle nose pliers</a>
when that tool is just right for the job.
<br/><br/>
The better your selection of tools, the better job you can do at making
a beautiful and effective work.
In a pinch you can use a slot-head screwdriver to set a Phillips screw,
but you stand a higher chance of damaging the screw head and it is more
difficult to set it just right.
Similarly but more subtly, you may be able to use a Phillips
screwdriver to set a
<a href="http://www.sizes.com/tools/screw_drive.htm#Frearson">
Frearson</a> screw, but you will be able to do
a better job if you have a Frearson driver.
Most of us will probably not need this level of distinction and can get
by with just a Phillips, or indeed perhaps with just a slot-head driver,
but if you want to be able to craft the best results over the widest
range of projects, having that Frearson screwdriver in your toolbox
will provide one more area in which you can do things better.
<br/><br/>
Swear words are the sledgehammers of our verbal toolbox.
Like a sledgehammer, a swear word can pack a lot of punch,
and like a sledgehammer it lacks precision.
Sometimes a sledgehammer is the right tool for the job:
when you need to smash a hole in something, one good whack with a
sledgehammer can be far more effective than trying to use pliers
and screwdrivers to do the same thing.
<br/><br/>
But for most of us, most of the time, that's not the job we are trying to do.
Most of the time we are more interested in making a neat hole, and
we should pull out the electric drill, or the hole saw, or even the
Sawzall to do the job; or we just need to tap in a small nail,
where a standard hammer would work nicely.
If we smash it with a sledgehammer, it's likely that we will then need
to spend a lot of time cleaning things up afterwards, which would probably
be more work than using one of the other tools in the first place.
<br/><br/>
Some people seem to have a very small toolbox
and are constantly swinging around that sledgehammer.
They use it for almost everything; rather than pulling out a
screwdriver to set a screw, they whack it with their sledgehammer.
To me, everything these people say seems like a pile of smashed rubble.
I doubt that's really the message they want to deliver.
<br/><br/>
Even a single use of a sledgehammer word can derail
any kind of nuance or subtlety,
and casual use will likely overwhelm everything else in the message.
<br/><br/>
So go ahead and use a sledgehammer when it is appropriate,
but do so deliberately and fully conscious of your intended result.
Make an effort to add a good assortment of tools to your toolbox,
understand what you are trying to accomplish,
learn to use the best tool for the job and use it well.Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com1tag:blogger.com,1999:blog-7045524330253482541.post-4182892246329874822011-04-15T11:08:00.000-07:002011-04-15T11:08:34.093-07:00Java Nio Complete Scala ServerThe capstone to this series of posts:
a complete multi-client stateful application server in Scala
using Java NIO non-blocking IO for both reading and writing,
and delimited continuations as coroutines
for both IO and application processing.
<h3>Contents</h3>
<ul>
<li><a href="#background">Background</a>
<li><a href="#nioapplication">NioApplication</a>
<li><a href="#nioserver">NioServer</a>
<li><a href="#niolistener">NioListener</a>
<li><a href="#nioconnection">NioConnection</a>
<li><a href="#echoserver">EchoServer</a>
<li><a href="#three-questions-server">ThreeQuestionsServer</a>
<li><a href="#limitations">Limitations</a>
</ul>
<a name="background"></a>
<h3>Background</h3>
In the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html">
initial post</a>
of this series on
<a href="http://download.oracle.com/javase/1.5.0/docs/guide/nio/">
Java NIO</a>
in Scala I mentioned a set of
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#limitations">
Limitations</a>
of the first example server.
In the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-for-character-decoding-in.html">
next</a>
<a href="http://jim-mcbeath.blogspot.com/2011/04/java-nio-and-scala-coroutines.html">
three</a>
<a href="http://jim-mcbeath.blogspot.com/2011/04/java-nio-for-writing.html">
posts</a>
after that initial post
I addressed some of those limitations.
In this post I address the remaining limitation in that original list:
the application code (an echo loop in the example) is buried in the
<code>NioConnection</code> class,
which makes that application code more difficult to maintain and
makes the server code not directly reusable as a library.
<br/><br/>
With the changes described in the next section,
all of the application-specific behavior will be
encapsulated in an instance of an application-specific
subclass of a new class, <code>NioApplication</code>.
Since the remainder of the classes presented so far will now be
independent of the application and reusable
without any modifications for multiple applications,
they will be moved into a separate package, <code>net.jimmc.nio</code>.
<br/><br/>
Other than adding <code>package net.jimmc.nio</code>,
there were no changes to <code>LineDecoder</code>
and <code>NioSelector</code>,
and there were no changes to the coroutine package
<code>net.jimmc.scoroutine</code>
for this latest set of changes.
For the files that were changed, listed below,
the listings show the complete new version of the file,
with changes from the previous version highlighted in <b>bold</b>.
<br/><br/>
The complete source for this series of posts is available on github in my
<a href="https://github.com/jimmc/nioserver">nioserver</a> project,
with the specific version after the changes specified in this post tagged as
<a href="https://github.com/jimmc/nioserver/tree/blog-complete">
blog-complete</a>.
<a name="nioapplication"></a>
<h3>NioApplication</h3>
Extracting the application-specific code out of <code>NioConnection</code>
is pretty simple:
in <code>NioConnection.startApp</code>,
rather than starting up a built-in echo loop,
we add a hook that allows us to call back to an application-specific
method that implements whatever behavior the application wants for
dealing with a connection.
To do this, we define a new abstract class <code>NioApplication</code>
that includes a <code>runConnection</code> method that we can call
from <code>NioConnection.startApp</code>.
<br/><br/>
We will also use the <code>NioApplication</code> class as a convenience
class where we can bundle up some of the arguments that get passed
around a lot, in particular the coroutine scheduler and the
read and write selectors.
This gives us the opportunity to override the coroutine scheduler
with one more appropriate for the application,
although we will not do so in this example.
<br/><br/>
<input id="hlButton" type="submit" value="Highlight Syntax" onclick="highlightSyntaxFlipButton()"/>
<pre name="hlcode" class="scala"
><b>package net.jimmc.nio
import net.jimmc.scoroutine.DefaultCoScheduler
import scala.util.continuations._
abstract class NioApplication {
val readSelector = new NioSelector()
val writeSelector = new NioSelector()
val sched = new DefaultCoScheduler
def runConnection(conn:NioConnection):Unit @suspendable
}</b>
</pre>
<a name="nioserver"></a>
<h3>NioServer</h3>
We simplify the <code>NioServer</code> class by removing
<code>object NioServer</code>, which will instead be in the application
main object.
We replace three parameters in the constructor with the
single <code>app</code> parameter
and likewise replace three arguments in the call to
<code>NioListener</code> with the single <code>app</code> argument.
<pre name="hlcode" class="scala"
><b>package net.jimmc.nio</b>
import net.jimmc.scoroutine.DefaultCoScheduler
import java.net.InetAddress
class NioServer(<b>app:NioApplication,</b> hostAddr:InetAddress, port:Int) {
val listener = new NioListener(<b>app,</b> hostAddr, port)
def start() {
listener.start(true)
//run the NIO read and write selectors each on its own thread
(new Thread(<b>app.</b>writeSelector,"WriteSelector")).start
(new Thread(<b>app.</b>readSelector,"ReadSelector")).start
Thread.currentThread.setName("CoScheduler")
<b>app.</b>sched.run //run the coroutine scheduler on our thread, renamed
}
}
</pre>
<a name="niolistener"></a>
<h3>NioListener</h3>
Three parameters in the constructor have been replaced by the single
<code>app</code> parameter.
<pre name="hlcode" class="scala"
><b>package net.jimmc.nio</b>
import net.jimmc.scoroutine.CoScheduler
import java.net.{InetAddress,InetSocketAddress}
import java.nio.channels.{ServerSocketChannel,SocketChannel}
import java.nio.channels.SelectionKey
import scala.util.continuations._
class NioListener(<b>app:NioApplication,</b> hostAddr:InetAddress, port:Int) {
val serverChannel = ServerSocketChannel.open()
serverChannel.configureBlocking(false);
val isa = new InetSocketAddress(hostAddr,port)
serverChannel.socket.bind(isa)
def start(continueListening: =>Boolean):Unit = {
reset {
while (continueListening) {
val socket = accept()
NioConnection.newConnection(<b>app,</b> socket)
}
}
}
private def accept():SocketChannel @suspendable = {
shift { k =>
<b>app.</b>readSelector.register(serverChannel,SelectionKey.OP_ACCEPT, {
val conn = serverChannel.accept()
conn.configureBlocking(false)
k(conn)
})
}
}
}
</pre>
<a name="nioconnection"></a>
<h3>NioConnection</h3>
We modify the constructor and the companion to replace three parameters
with the single <code>app</code> parameter, and we replace our echo loop
in <code>startApp</code> with a call to the application
<code>runConnection</code> method,
followed by a call to our <code>close</code> method to make sure we
close the socket when the application is done with it.
<pre name="hlcode" class="scala"
><b>package net.jimmc.nio</b>
import net.jimmc.scoroutine.{CoQueue,CoScheduler}
import java.nio.ByteBuffer
import java.nio.channels.SelectionKey
import java.nio.channels.SocketChannel
import scala.util.continuations._
object NioConnection {
def newConnection(<b>app:NioApplication,</b> socket:SocketChannel) {
val conn = new NioConnection(<b>app,</b> socket)
conn.start()
}
}
class NioConnection(<b>app:NioApplication,</b> socket:SocketChannel) {
private val buffer = ByteBuffer.allocateDirect(2000)
private val lineDecoder = new LineDecoder
private val inQ = new CoQueue[String](<b>app.</b>sched, 10)
private val outQ = new CoQueue[String](<b>app.</b>sched, 10)
def start():Unit = {
startReader
startWriter
startApp
}
private def startApp() {
reset {
<b>app.runConnection(this)
close()</b>
}
}
private def startReader() {
reset {
while (socket.isOpen)
readWait
}
}
private def readWait<b>:Unit @suspendable</b> = {
buffer.clear()
val count = read(buffer)
if (count<1) {
socket.close()
shiftUnit[Unit,Unit,Unit]()
} else {
buffer.flip()
lineDecoder.processBytes(buffer, inQ.blockingEnqueue(_))
}
}
private def read(b:ByteBuffer):Int @suspendable = {
if (!socket.isOpen)
-1 //indicate EOF
else shift { k =>
<b>app.</b>readSelector.register(socket, SelectionKey.OP_READ, {
val n = socket.read(b)
k(n)
})
}
}
def readLine():String @suspendable = inQ.blockingDequeue
private def startWriter() {
reset {
while (socket.isOpen)
writeWait
}
}
private def write(b:ByteBuffer):Int @suspendable = {
if (!socket.isOpen)
-1 //indicate EOF
else shift { k =>
<b>app.</b>writeSelector.register(socket, SelectionKey.OP_WRITE, {
val n = socket.write(b)
k(n)
})
}
}
private def writeBuffer(b:ByteBuffer):Unit @suspendable = {
write(b)
if (b.remaining>0 && socket.isOpen)
writeBuffer(b)
else
shiftUnit[Unit,Unit,Unit]()
}
private def writeWait():Unit @suspendable = {
val str = outQ.blockingDequeue
if (str eq closeMarker) {
socket.close
shiftUnit[Unit,Unit,Unit]()
} else
writeBuffer(ByteBuffer.wrap(str.getBytes("UTF-8")))
}
def writeLine(s:String) = write(s+"\n")
def write(s:String) = outQ.blockingEnqueue(s)
def isOpen = socket.isOpen
private val closeMarker = new String("")
def close():Unit @suspendable = write(closeMarker)
}
</pre>
<a name="echoserver"></a>
<h3>EchoServer</h3>
We move the application-specific main object out of <code>NioServer</code>
and place it into our sample application class, which we call
<code>EchoServer</code>, along with a subclassed <code>NioApplication</code>
that provides our application behavior.
<br/><br/>
Highlighted differences are as compared to the previous version
of <code>NioServer</code>.
<pre name="hlcode" class="scala"
><b>import net.jimmc.nio.{NioApplication,NioConnection,NioServer}</b>
import net.jimmc.scoroutine.DefaultCoScheduler
import java.net.InetAddress
<b>import scala.util.continuations._</b>
object <b>EchoServer</b> {
def main(args:Array[String]) {
<b>val app = new EchoApplication</b>
val hostAddr:InetAddress = null //listen on local connection
val port = 1234
val server = new NioServer(<b>app,</b>hostAddr,port)
server.start()
}
}
<b>class EchoApplication extends NioApplication {
def runConnection(conn:NioConnection):Unit @suspendable = {
while (conn.isOpen) {
conn.writeLine(conn.readLine)
}
}
}</b>
</pre>
The above class is the complete application definition for our
echo server when built on top of our generic nio package.
After compiling, run with this command:
<pre name="hlcode" class="bash"
>$ scala EchoServer
</pre>
With all the above changes, we have once again internally transformed
our application, but besides starting it up with a different name
its external behavior is still the same.
However, we have reached the point where defining a new server-based
application is easy.
<a name="three-questions-server"></a>
<h3>ThreeQuestionsServer</h3>
The example in this section shows a slightly more complex application
that maintains some local per-client state as it progresses through a
short series of steps interacting with the client.
In this simple application, the server asks up to
<a href="http://www.imdb.com/title/tt0071853/quotes#qt0470601">
three questions</a>
of the client and collects responses,
with each next question sometimes depending on the previous answers.
The per-client state is contained both in local variables and in
the location of execution within the application.
Each time the processing for a client is suspended the state for that
client is captured in a continuation to be restored when the next piece
of input is available.
The continuation includes all of the above per-client state information,
so we don't have to write any application-specific
code to save and restore that data.
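<br/><br/>
For contrast, here is a rough plain-Java sketch (my own illustration, not code from this project) of the explicit state machine one would have to write without continuations: every local variable becomes a field, and the current location of execution becomes an explicit state value that must be saved between lines of input. (Only part of the question logic is shown.)

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not from the post): the explicit per-client state
// machine that continuations let us avoid writing.  Every "local
// variable" becomes a field, and the position in the dialog becomes an
// explicit state value preserved between inputs.
public class ThreeQuestionsStateMachine {
    enum State { ASK_NAME, ASK_QUEST, ASK_Q3, DONE }
    private State state = State.ASK_NAME;
    private String name;  // manually preserved across inputs
    private final List<String> out = new ArrayList<>();

    public ThreeQuestionsStateMachine() {
        out.add("What is your name?");
    }

    /** Feed one line of client input; responses accumulate in out. */
    public void onLine(String line) {
        switch (state) {
            case ASK_NAME:
                name = line.toLowerCase();
                out.add("What is your quest?");
                state = State.ASK_QUEST;
                break;
            case ASK_QUEST:
                if (line.toLowerCase().contains("seek the holy grail")) {
                    out.add("What is your favorite color?");
                    state = State.ASK_Q3;
                } else {
                    out.add("you: Auuuuuuuugh!");
                    state = State.DONE;
                }
                break;
            case ASK_Q3:
                out.add("You may pass");  // (the real logic also checks name)
                state = State.DONE;
                break;
            case DONE:
                break;
        }
    }

    public List<String> output() { return out; }

    public static void main(String[] args) {
        ThreeQuestionsStateMachine m = new ThreeQuestionsStateMachine();
        m.onLine("lancelot");
        m.onLine("to seek the holy grail");
        m.onLine("blue");
        System.out.println(m.output());
    }
}
```

With continuations, all of the fields above collapse back into ordinary local variables and straight-line code.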
<br/><br/>
By defining the <code>ReaderWriter</code> interface trait,
the application is written so as to be able to run either in server mode
using an instance of <code>ConnReader</code>,
in which case it accepts connections from clients,
or in standalone mode using an instance of <code>SysReader</code>,
in which case it only interacts with the console.
<br/><br/>
When our application running in server mode finishes handling a client
and exits from the
<code>run</code> method,
control returns to <code>NioConnection</code>,
which closes the connection.
<pre name="hlcode" class="scala"
>import net.jimmc.nio.{NioApplication,NioServer,NioConnection}
import java.io.{BufferedReader,InputStreamReader,PrintWriter}
import java.net.InetAddress
import scala.util.continuations._
object ThreeQuestionsConsole {
def main(args:Array[String]) {
val in = new BufferedReader(new InputStreamReader(System.in))
val out = new PrintWriter(System.out)
val io = new SysReader(in,out)
reset {
(new ThreeQuestions(io)).run
}
}
}
object ThreeQuestionsServer {
def main(args:Array[String]) {
val app = new ThreeQuestionsApp
val hostAddr:InetAddress = null //localhost
val port = 1234
val server = new NioServer(app,hostAddr,port)
server.start()
}
}
class ThreeQuestionsApp extends NioApplication {
def runConnection(conn:NioConnection):Unit @suspendable = {
val io = new ConnReader(conn)
(new ThreeQuestions(io)).run
}
}
trait ReaderWriter {
def readLine():String @suspendable
def writeLine(s:String):Unit @suspendable
}
class SysReader(in:BufferedReader,out:PrintWriter) extends ReaderWriter {
def readLine() = in.readLine
def writeLine(s:String) = { out.println(s); out.flush() }
}
class ConnReader(conn:NioConnection) extends ReaderWriter {
def readLine():String @suspendable = conn.readLine
def writeLine(s:String):Unit @suspendable = conn.writeLine(s)
}
class ThreeQuestions(io:ReaderWriter) {
def run():Unit @suspendable = {
val RxArthur = ".*arthur.*".r
val RxGalahad = ".*galahad.*".r
val RxLauncelot = ".*(launcelot|lancelot).*".r
val RxRobin = ".*robin.*".r
val RxHolyGrail = ".*seek the holy grail.*".r
val RxSwallow = ".*african or european.*".r
val RxAssyriaCapital =
".*(assur|shubat.enlil|kalhu|calah|nineveh|dur.sharrukin).*".r
val name = ask("What is your name?").toLowerCase
val quest = ask("What is your quest?").toLowerCase
val holy = quest match {
case RxHolyGrail() => true
case _ => false
}
if (holy) {
val q3Type = name match {
case RxRobin() => 'capital
case RxArthur() => 'swallow
case _ => 'color
}
val a3 = (q3Type match {
case 'capital => ask("What is the capital of Assyria?")
case 'swallow => ask("What is the air-speed velocity of an unladen swallow?")
case 'color => ask("What is your favorite color?")
}).toLowerCase
(q3Type,a3,name) match {
//Need to use an underscore in regex patterns with alternates
case ('capital,RxAssyriaCapital(_),_) => accept
case ('capital,_,_) => reject
case ('swallow,RxSwallow(),_) => rejectMe
case ('swallow,_,_) => reject
case ('color,"blue",RxLauncelot(_)) => accept
case ('color,_,RxLauncelot(_)) => reject
case ('color,"yellow",RxGalahad()) => accept
case ('color,_,RxGalahad()) => reject
case ('color,_,_) => accept
}
} else {
reject
}
}
def ask(s:String):String @suspendable = { io.writeLine(s); io.readLine }
def accept:Unit @suspendable = io.writeLine("You may pass")
def reject:Unit @suspendable = io.writeLine("you: Auuuuuuuugh!")
def rejectMe:Unit @suspendable = io.writeLine("me: Auuuuuuuugh!")
}
</pre>
To run in console or server mode, use one of the following two commands:
<pre name="hlcode" class="bash"
>$ scala ThreeQuestionsConsole
$ scala ThreeQuestionsServer
</pre>
<a name="limitations"></a>
<h3>Limitations</h3>
I am calling this version complete because it addresses all of the issues
in the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#limitations">
Limitations</a> section of my original post,
but it is far from production-ready.
Before putting this code into production I would address the following issues.
<ul>
<li>Although the application now uses more than one thread, it still runs
all of the application code on a single thread.
The scheduler should be replaced by one that can choose how many
threads to use and distribute the execution of the coroutines among
those threads.
<li>This version still has not addressed all of the issues raised in the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-for-character-decoding-in.html#limitations">
Limitations</a> section of the second post in this series,
on character decoding. In particular:
<ul>
<li>Error handling should be improved.
<li>It only supports UTF-8 encoding.
</ul>
For an example of this problem, type a Control-C into your telnet
window when connected to the EchoServer application.
<li>The application should parse its command line arguments so that
it has the flexibility to, for example, use a different port number
without requiring a code change.
<li>The application should
<a href="http://jim-mcbeath.blogspot.com/2010/01/reload-that-config-file.html">
read a configuration file</a>.
<li>Error handling in general needs to be improved.
<li>Logging should be added.
</ul>
<h2>Java NIO for Writing</h2>
<i>Jim McBeath, 2011-04-08</i>
<br/><br/>
Using Java NIO non-blocking IO for writing as well as reading
is almost - but not quite - straightforward.
<h3>Contents</h3>
<ul>
<li><a href="#background">Background</a>
<li><a href="#implementation">Implementation</a>
<li><a href="#two-selectors">Two Selectors</a>
<li><a href="#close">Close</a>
<li><a href="#summary">Summary</a>
</ul>
<a name="background"></a>
<h3>Background</h3>
One of the limitations pointed out in the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#limitations">
Limitations</a>
section of the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html">
original post</a> in this series
was that we were still directly writing our output data to the socket
rather than using
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#java-nonblocking-io">
non-blocking IO</a> and
<a href="http://jim-mcbeath.blogspot.com/2010/08/delimited-continuations.html">
continuations</a> as we were doing
when reading our input data.
If a client stops reading its input
(or if there is sufficient network congestion that it looks that way
from our end)
then our socket output buffer
may fill up.
If that happens, then one of two things will happen when we try to write
our data to that socket: either the call will block, or the data will
not all be written.
If the call blocks, then we have a blocked thread that we can not use
for processing other clients until it is unblocked.
If there are many clients who are not reading their input,
we could have many blocked threads.
Since one of the goals of this exercise is to be able to run many clients
on a relatively small number of threads, having blocked threads is bad.
To avoid this problem, we use non-blocking output and continuations
for writing to the output,
just as we did for reading the input.
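<br/><br/>
For reference, the underlying drain pattern can be sketched in plain Java without continuations (my illustration, not the project's code; a real server would register for <code>OP_WRITE</code> and suspend rather than spin when the write cannot complete):

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

public class DrainDemo {
    /**
     * Writes every byte of b to a non-blocking channel.  A non-blocking
     * write may move fewer bytes than remain in the buffer, so we loop
     * until the buffer is drained.  (Sketch only: a real server would
     * register for OP_WRITE and suspend instead of spinning.)
     */
    static void writeFully(Pipe.SinkChannel sink, ByteBuffer b) throws Exception {
        while (b.hasRemaining()) {
            sink.write(b);  // returns the number of bytes actually written
        }
    }

    /** Round-trips an ASCII string through a non-blocking in-process pipe. */
    static String roundTrip(String s) throws Exception {
        Pipe pipe = Pipe.open();
        pipe.sink().configureBlocking(false);
        writeFully(pipe.sink(), ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8)));
        ByteBuffer in = ByteBuffer.allocate(s.length() + 8);
        pipe.source().read(in);
        in.flip();
        return StandardCharsets.UTF_8.decode(in).toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello"));  // prints hello
    }
}
```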
<br/><br/>
The complete source for this series of posts is available on github in my
<a href="https://github.com/jimmc/nioserver">nioserver</a> project,
with the specific version after the changes specified in this post tagged as
<a href="https://github.com/jimmc/nioserver/tree/blog-write">
blog-write</a>.
<a name="implementation"></a>
<h3>Implementation</h3>
We model the output code on the input code by making these changes:
<ul>
<li>We write a suspending <code>write</code> method that registers
our interest in writing to the output socket connection.
<li>We add an output queue to receive data from the application.
<li>We modify the <code>writeLine</code>
method to add a line to the output queue rather than writing
directly to the output socket.
<li>We run a separate control loop that reads from the output queue
and writes to the output socket.
</ul>
<br/>
<input id="hlButton" type="submit" value="Highlight Syntax" onclick="highlightSyntaxFlipButton()"/>
<pre name="hlcode" class="scala"
>//In class NioConnection
<b>private val outQ = new CoQueue[String](sched, 10)</b>
def start():Unit = {
startReader
<b>startWriter</b>
startApp
}
<b>private def startWriter() {
reset {
while (socket.isOpen)
writeWait
}
}
private def write(b:ByteBuffer):Int @suspendable = {
if (!socket.isOpen)
-1 //indicate EOF
else shift { k =>
selector.register(socket, SelectionKey.OP_WRITE, {
val n = socket.write(b)
k(n)
})
}
}
private def writeBuffer(b:ByteBuffer):Unit @suspendable = {
write(b)
if (b.remaining>0 && socket.isOpen)
writeBuffer(b)
else
shiftUnit[Unit,Unit,Unit]()
}
private def writeWait:Unit @suspendable = {
val str = outQ.blockingDequeue
writeBuffer(ByteBuffer.wrap(str.getBytes("UTF-8")))
}</b>
def writeLine(s:String)<b>:Unit @suspendable = write(s+"\n")
def write(s:String):Unit @suspendable = outQ.blockingEnqueue(s)</b>
</pre>
This seems pretty straightforward, but unfortunately it doesn't work.
The problem is that we have attempted to register our channel twice
(once for read and once for write) with the same selector.
The documentation for
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/SelectableChannel.html">
<code>SelectableChannel</code></a> says,
"<i>A channel may be registered at most once with any particular selector.</i>"
If we call
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/SelectableChannel.html#register(java.nio.channels.Selector,%20int,%20java.lang.Object)">
<code>register</code></a>
for our channel for write when it is
already registered for read, the read registration is overwritten by
the write registration and is lost.
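<br/><br/>
This replacement behavior is easy to demonstrate against the real API (my sketch, not code from the post): the second <code>register</code> call returns the same key with the interest set replaced, and OR-ing the flags together is the single-selector workaround.

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class RegisterDemo {
    /** Returns {sameKey(0/1), interest after 2nd register, interest after OR}. */
    static int[] demo() throws Exception {
        try (Selector sel = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                client.configureBlocking(false);
                // A second register with the same selector replaces the
                // interest set -- it does not add to it.
                SelectionKey k1 = client.register(sel, SelectionKey.OP_READ);
                SelectionKey k2 = client.register(sel, SelectionKey.OP_WRITE);
                int afterSecond = k2.interestOps();  // OP_READ is lost here
                // Single-selector workaround: OR the flags together.
                k2.interestOps(SelectionKey.OP_READ | SelectionKey.OP_WRITE);
                return new int[]{ k1 == k2 ? 1 : 0, afterSecond, k2.interestOps() };
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] r = demo();
        System.out.println(r[0] == 1);                      // same key object
        System.out.println(r[1] == SelectionKey.OP_WRITE);  // read interest gone
        System.out.println(r[2]
            == (SelectionKey.OP_READ | SelectionKey.OP_WRITE));
    }
}
```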
<br/><br/>
In his
<a href="http://rox-xmlrpc.sourceforge.net/niotut/">
Rox Java NIO Tutorial</a> James Greenfield
<a href="http://rox-xmlrpc.sourceforge.net/niotut/#General%20principles">
explicitly recommends</a> that you
"<i>Use a single selecting thread</i>" and
"<i>Modify the selector from the selecting thread only.</i>"
We could take this approach,
adding some code to combine the read and write
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/SelectionKey.html#field_summary">
interest flags</a>
when both registrations are wanted, but
unlike in James' case
we would also need
to add some code to demultiplex the separate callbacks for read and
write.
Instead, we use a different approach:
we use separate selectors for reading and writing, and we give each of
them its own thread.
<a name="two-selectors"></a>
<h3>Two Selectors</h3>
Depending on the implementation, using two selectors and two threads this
way could cause problems.
However, based on my understanding of the documentation,
<a href="http://www.docjar.com/html/api/sun/nio/ch/EPollSelectorImpl.java.html">
the code</a> in
the Sun implementation and the operation of the
<a href="http://linux.die.net/man/2/select">
POSIX select</a>
operation,
I believe this approach should work (at least on POSIX systems).
This would need to be tested on all supported
platforms for a production system.
<br/><br/>
To use separate read and write selectors, we replace the current
<code>selector</code> parameter in <code>NioConnection</code> with
two parameters <code>readSelector</code> and <code>writeSelector</code>
of the same type.
<pre name="hlcode" class="scala"
>//In object NioConnection:
def newConnection(sched:CoScheduler, <b>readSelector</b>:NioSelector,
<b>writeSelector:NioSelector,</b> socket:SocketChannel) {
val conn = new NioConnection(sched,<b>readSelector</b>,
<b>writeSelector,</b>socket)
conn.start()
}
class NioConnection(sched:CoScheduler, <b>readSelector</b>:NioSelector,
<b>writeSelector:NioSelector,</b> socket:SocketChannel) {
...
private def read(b:ByteBuffer):Int @suspendable = {
if (!socket.isOpen)
-1 //indicate EOF
else shift { k =>
<b>readSelector</b>.register(socket, SelectionKey.OP_READ, {
val n = socket.read(b)
k(n)
})
}
}
private def write(b:ByteBuffer):Int @suspendable = {
if (!socket.isOpen)
-1 //indicate EOF
else shift { k =>
<b>writeSelector</b>.register(socket, SelectionKey.OP_WRITE, {
val n = socket.write(b)
k(n)
})
}
}
...
}
</pre>
We also change <code>NioListener</code> to pass through those
two arguments, and we choose to use the <code>readSelector</code>
to handle our <code>accept</code> calls.
<pre name="hlcode" class="scala"
>//In NioListener
class NioListener(sched:CoScheduler, <b>readSelector</b>:NioSelector,
<b>writeSelector:NioSelector,</b> hostAddr:InetAddress, port:Int) {
...
def start(continueListening: =>Boolean):Unit = {
reset {
while (continueListening) {
val socket = accept()
NioConnection.newConnection(sched,
<b>readSelector,writeSelector</b>,socket)
}
}
}
private def accept():SocketChannel @suspendable = {
shift { k =>
<b>readSelector</b>.register(serverChannel,SelectionKey.OP_ACCEPT, {
val conn = serverChannel.accept()
conn.configureBlocking(false)
k(conn)
})
}
}
}
</pre>
Finally, we instantiate the new write selector in <code>NioServer</code>,
pass it in to <code>NioListener</code>, and start it running
in a new thread.
<pre name="hlcode" class="scala"
>//In NioServer
class NioServer(hostAddr:InetAddress, port:Int) {
val <b>readSelector</b> = new NioSelector()
<b>val writeSelector = new NioSelector()</b>
val sched = new DefaultCoScheduler
val listener = new NioListener(sched,
<b>readSelector, writeSelector,</b> hostAddr, port)
def start() {
listener.start(true)
//run the NIO <b>read and write selectors each</b> on its own thread
<b>(new Thread(writeSelector,"WriteSelector")).start</b>
(new Thread(<b>readSelector,"ReadSelector"</b>)).start
Thread.currentThread.setName("CoScheduler")
sched.run //run the coroutine scheduler on our thread, renamed
}
}
</pre>
<a name="close"></a>
<h3>Close</h3>
Our current example has no terminating condition, so it never attempts to
close the connection.
Looking ahead, we expect to have applications that will want to do that,
so we add a <code>close</code> method to <code>NioConnection</code>,
and an <code>isOpen</code> method that allows us to see when it is closed.
<br/><br/>
We can't just add a close method that directly closes the socket,
because there may still be output data waiting to be written.
Thus we need an implementation that somehow waits until all of the queued
output data has been written to the output before closing the socket.
<br/><br/>
One easy way to do this is to have a special marker string that we put
into the output queue when the application requests to close the socket.
When our socket output code sees that marker, we know it has already written
out all of the data that came before that marker in the output queue,
so we can close the socket.
By doing the socket close in the same method that does the writes to
the socket, and by ensuring that that method is called on the
(write) selection thread,
we also ensure that the close happens on the selection thread.
<br/><br/>
The compiler shares constant strings, so to make sure we have a unique
string for our marker that can't be passed in by any code outside of
our <code>close</code> method, we use <code>new String()</code>.
In <code>writeWait</code>, where we check for that marker,
we use the identity comparison <code>eq</code> when checking for the marker,
and we add a call to <code>shiftUnit</code> to make both sides of the
<code>if</code> statement be CPS.
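<br/><br/>
The same trick can be illustrated in plain Java, where <code>==</code> on references plays the role of Scala's <code>eq</code> (my sketch, not code from this project):

```java
public class CloseMarker {
    // new String("") guarantees a reference distinct from the interned ""
    // literal, so no string arriving from outside code can ever be
    // identical (==) to the marker, even though it may be equal (equals).
    static final String CLOSE_MARKER = new String("");

    static boolean isCloseRequest(String queued) {
        return queued == CLOSE_MARKER;  // identity comparison, not equals
    }

    public static void main(String[] args) {
        System.out.println(isCloseRequest(CLOSE_MARKER));  // true
        System.out.println(isCloseRequest(""));            // false: equal, not identical
        System.out.println("".equals(CLOSE_MARKER));       // true: equals can't distinguish
    }
}
```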
<br/><br/>
A call to our <code>close</code> method will return right away,
but the socket will not get closed until after all of the data in
the output queue has been written to the output socket.
The application can tell when the socket has actually been closed
by calling the <code>isOpen</code> method.
<pre name="hlcode" class="scala"
>//In NioConnection
private def writeWait():Unit @suspendable = {
val str = outQ.blockingDequeue
<b>if (str eq closeMarker) {
socket.close
shiftUnit[Unit,Unit,Unit]()
} else</b>
writeBuffer(ByteBuffer.wrap(str.getBytes("UTF-8")))
}
<b>def isOpen = socket.isOpen
private val closeMarker = new String("")
def close():Unit @suspendable = write(closeMarker)</b>
</pre>
<a name="summary"></a>
<h3>Summary</h3>
As in the previous two posts, we have modified the program to make an
internal improvement that has not changed its basic external behavior.
We have, however, changed its behavior for one of the corner cases -
in this case what happens when an output socket fills up, such as might
happen when there is excessive network latency - which is a necessary
improvement for a production application, particularly if one expects
the kind of high volume that would make those corner cases more likely.Jim McBeathhttp://www.blogger.com/profile/10541190774989580614noreply@blogger.com0tag:blogger.com,1999:blog-7045524330253482541.post-25931942176677056392011-04-02T18:33:00.000-07:002011-04-02T18:33:27.882-07:00Java NIO and Scala CoroutinesI present a multi-client server in Scala that uses coroutines
to allow modularization of stateful client processing
in a way that is independent of threads.
<h3>Contents</h3>
<ul>
<li><a href="#background">Background</a>
<li><a href="#coroutines">Coroutines</a>
<li><a href="#architecture">Architecture</a>
<li><a href="#nioselector">NioSelector</a>
<li><a href="#coscheduler">CoScheduler</a>
<li><a href="#coqueue">CoQueue</a>
<li><a href="#nioconnection">NioConnection</a>
<li><a href="#linedecoder">LineDecoder</a>
<li><a href="#niolistener">NioListener</a>
<li><a href="#nioserver">NioServer</a>
<li><a href="#summary">Summary</a>
<li><a href="#caveats">Caveats</a>
</ul>
<a name="background"></a>
<h3>Background</h3>
In my
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html">
previous</a>
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-for-character-decoding-in.html">
two</a>
posts I presented a server in Scala that uses
<a href="http://download.oracle.com/javase/1.5.0/docs/guide/nio/">
Java NIO</a> non-blocking IO and
<a href="http://jim-mcbeath.blogspot.com/2010/08/delimited-continuations.html">
continuations</a> to allow
scaling to a large number of clients.
As I pointed out in the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#limitations">
Limitations</a>
section of that first post,
that example used one thread for all execution.
On a multi-core machine, as is common today,
we would prefer to have multiple threads running to allow
us to take advantage of all of the processing power available to us,
yet we don't want to allocate a thread to every client.
<br/><br/>
It would be nice if we could add our own
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/SelectableChannel.html">
<code>SelectableChannel</code></a>
types to the set of NIO channel types that we can use with the
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/Selector.html#select()">
<code>select</code></a>
call so that we
could have one place where we do all our scheduling,
but that feature is not available.
We thus have to come up with another mechanism for handling all of the
other potentially blocking tasks we will want to do.
Fortunately, we already have such a mechanism: coroutines.
<a name="coroutines"></a>
<h3>Coroutines</h3>
Coroutines
provide a separation of the maintenance of task state from
the execution of code for that task,
allowing us to bind execution of the task to different threads as we desire.
When one of our task coroutines becomes blocked waiting for an unavailable
resource, we suspend it by storing its continuation, allowing us to
use that thread for another purpose, such as to restore and run
a different previously stored continuation that is now runnable.
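<br/><br/>
As a toy model of this idea (my own sketch, far simpler than the scoroutine package), a "continuation" can be represented as a <code>Runnable</code> that a single-threaded scheduler parks against the resource it is waiting for:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: a blocked task stores its continuation keyed by the resource
// it is waiting for; signaling the resource moves the continuation back
// onto the run queue.  One thread drives all tasks.
class ToyScheduler {
    private final Deque<Runnable> runnable = new ArrayDeque<>();
    private final Map<String, Runnable> blocked = new HashMap<>();
    final List<String> log = new ArrayList<>();

    void spawn(Runnable task) { runnable.add(task); }

    /** Called by a task: park continuation k until resource is signaled. */
    void await(String resource, Runnable k) { blocked.put(resource, k); }

    /** Unblock the task waiting on resource, if any. */
    void signal(String resource) {
        Runnable k = blocked.remove(resource);
        if (k != null) runnable.add(k);
    }

    void run() {
        while (!runnable.isEmpty()) runnable.poll().run();
    }
}

public class ToyDemo {
    public static void main(String[] args) {
        ToyScheduler s = new ToyScheduler();
        s.spawn(() -> {
            s.log.add("A: waiting for data");
            s.await("data", () -> s.log.add("A: resumed with data"));
        });
        s.spawn(() -> {
            s.log.add("B: producing data");
            s.signal("data");
        });
        s.run();
        System.out.println(s.log);
    }
}
```

Task A suspends without holding a thread; the same thread runs task B, whose signal makes A's stored continuation runnable again.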
<br/><br/>
In my
<a href="http://jim-mcbeath.blogspot.com/2010/09/scala-coroutines.html">
earlier post</a> on coroutines I presented an implementation
of a coroutine package that included a scheduler (<code>CoScheduler</code>)
and a blocking queue (<code>CoQueue</code>).
We will modify the server implementation
of my previous two "Java NIO" posts
to make use of those classes.
<br/><br/>
As pointed out in that earlier coroutines post,
the default scheduler implementation in the example can easily be
replaced by another implementation with no other changes to the code.
In particular,
that new implementation could use a thread pool or a group of actors
to execute the coroutines that are ready to run,
assuming the coroutine code itself is multi-thread safe.
We will not write that multi-thread scheduler for this post,
but will assume that it can be written later.
<a name="architecture"></a>
<h3>Architecture</h3>
At a high level, we want to modify our server so that we have a queue
between our socket reader and the application that will eventually
consume the data.
We can then set up a small processing loop that reads the socket data,
converts it to a string and writes it to that queue.
The application will read the contents of the queue, process it,
and write back its results to the connection.
We will let the socket reader continue to run on the select thread, but
we will run the application on a separate thread (or threads),
ensuring that the select loop can quickly get to all
connections and preventing the application processing of any one connection
from delaying the IO of other connections.
<br/><br/>
With this architecture we have two processing loops:
<ol>
<li>Read data from socket, write to queue.
<li>Read data from queue, process it, write data (to socket, for now).
</ol>
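In plain Java the two loops can be sketched with a <code>java.util.concurrent.ArrayBlockingQueue</code> standing in for <code>CoQueue</code> and real threads standing in for coroutines (my illustration only; a bounded queue blocks the producer when full, giving back-pressure):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class TwoLoops {
    /** Runs the reader loop and the application loop on separate threads. */
    static List<String> pump(String[] lines) throws InterruptedException {
        // Bounded queue between the loops: put() blocks when it is full.
        BlockingQueue<String> inQ = new ArrayBlockingQueue<>(10);
        List<String> results = new ArrayList<>();

        // Loop 1: read "socket" data (canned here), write to the queue.
        Thread reader = new Thread(() -> {
            try { for (String line : lines) inQ.put(line); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        // Loop 2: read from the queue, process, write the result back.
        Thread app = new Thread(() -> {
            try {
                for (int i = 0; i < lines.length; i++)
                    results.add("echo: " + inQ.take());
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        reader.start(); app.start();
        reader.join(); app.join();   // join gives safe visibility of results
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pump(new String[]{"hello", "world"}));
        // prints [echo: hello, echo: world]
    }
}
```

The coroutine version replaces the two threads with suspendable loops, so a blocked put or take parks a continuation instead of a thread.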
Given that for now we are writing directly to the connection socket on output
(and ignoring the possibility that the output socket might be blocked),
the second loop only has one potential blocking point:
if there is no data in the queue, it will block when trying to read
from the queue.
The first loop has two potential blocking points:
when it reads data from the socket (if there is no data available),
and when it writes data to the queue (if the queue is full).
The difficulty here is that the potentially blocking socket read must be
handled by the NIO select call,
but the potentially blocking write to the queue can't be handled by
the NIO select call and thus must be handled by our own scheduler.
<br/><br/>
Having one processing loop that when blocked is sometimes managed by one
scheduler (NIO select) and sometimes by another (our coroutine scheduler)
is not necessarily a problem.
Each scheduler just sees a blocking resource that has a
continuation associated with it; when the blocking resource becomes
available, the continuation is called and the process continues.
The new issue that arises when trying to combine two schedulers like
this is that an action by one scheduler can potentially unblock a task
that is currently controlled by (i.e. in a wait state on) the other scheduler.
Every time we perform an action that might unblock a task
we need to ensure that the appropriate scheduler is not stuck waiting
on the other tasks.
In other words, we need to wake up or notify the schedulers at appropriate
points in our code.
<br/><br/>
In this post, code which has changed is highlighted in <b>bold</b>
(when not using Syntax Highlighting).
Changes for <code>CoScheduler</code> and <code>CoQueue</code> are
as compared to the code in my
<a href="http://jim-mcbeath.blogspot.com/2010/09/scala-coroutines.html">
post</a>
on coroutines;
changes to
<code>NioSelector</code>,
<code>NioConnection</code>,
<code>LineDecoder</code>,
<code>NioListener</code> and
<code>NioServer</code> are as compared to the code in my
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html">
previous</a>
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-for-character-decoding-in.html">
two</a>
posts.
<br/><br/>
The complete source for this post is available on github in my
<a href="https://github.com/jimmc/nioserver">nioserver</a> project,
with the specific version used in this post tagged as
<a href="https://github.com/jimmc/nioserver/tree/blog-coroutines">
blog-coroutines</a>.
There are also tags for the previous two posts, so you can compare
using those tags to see the changes between the versions as used
in each post.
<a name="nioselector"></a>
<h3>NioSelector</h3>
As mentioned above,
we have to cooperate with the coroutine scheduler.
In particular, we must handle the situation in which we are blocked
in a select call because there are no active connections,
and another thread then registers interest in an operation.
The documentation for the
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/Selector.html#selop">
<code>select</code></a> call states:
<blockquote>
Changes made to the interest sets of a selector's keys while a selection
operation is in progress have no effect upon that operation; they will
be seen by the next selection operation.
</blockquote>
To terminate the select operation early so that it retries with the
newly registered channel, we add a call to
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/Selector.html#wakeup()">
<code>wakeup</code></a>
just after registering our interest.
<br/><br/>
Unfortunately, this is not enough.
The documentation for the <code>select</code> call
is not very precise about
whether it is actually safe to call <code>register</code>
from another thread while the <code>select</code> call is
blocked waiting for a previously registered channel to become active.
The documentation for
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/SelectableChannel.html">
<code>SelectableChannel</code></a>
does explicitly say
"<i>Selectable channels are safe for use by multiple concurrent threads</i>",
but the documentation for the
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/SelectableChannel.html#register(java.nio.channels.Selector,%20int,%20java.lang.Object)">
<code>register</code></a>
method says
"<i>This method will then synchronize on the selector's key set and
therefore may block if invoked concurrently with another registration
or selection operation involving the same selector.</i>"
In fact, the standard Sun implementation does quite a bit of
synchronization, so it can easily become deadlocked when used from
multiple threads.
In particular, the OS-level select call in the Java <code>select</code>
method is inside a pair of <code>synchronized</code> blocks that lock
the set of <code>SelectionKey</code>s associated with that selector.
If, while the first thread is blocked on the select,
a second thread calls <code>SelectableChannel.register</code>,
it locks the channel, then attempts to synchronize on the key set to which
that channel is being added, so it blocks.
If a third thread then tries to register that channel with a second
selector, which the documentation implies is allowed,
the third thread will attempt to lock the channel, which will
block until the second thread unblocks and releases its lock on the channel.
<br/><br/>
In his
<a href="http://rox-xmlrpc.sourceforge.net/niotut/">
Rox Java NIO Tutorial</a> James Greenfield
<a href="http://rox-xmlrpc.sourceforge.net/niotut/#General%20principles">
explicitly recommends</a> that you
"<i>Use a single selecting thread</i>" and
"<i>Modify the selector from the selecting thread only.</i>"
From the description of how <code>register</code> works above, you can see why.
<br/><br/>
To get around this problem and ensure that all changes to the selection keys
happen on the thread that is calling select,
we modify <code>NioSelector.register</code> so that,
rather than calling <code>SelectableChannel.register</code> directly,
it packages the arguments up and puts them into a queue.
The selection thread processes that queue,
making all of the calls to <code>SelectableChannel.register</code> itself,
just before it calls <code>select</code>.
<br/><br/>
Fortunately, the semantics of the <code>wakeup</code> call ensure that
we won't get ourselves into a position where we have put our registration
request into the queue but the <code>select</code> call doesn't see it
and blocks on all the other channels.
This is because <code>wakeup</code> is defined such that a call to it
that happens while the selector is not currently in a select operation
will cause the next <code>select</code> to wake up immediately.
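This latched behavior can be demonstrated in isolation. The following standalone sketch (not part of the server code) calls <code>wakeup</code> while nothing is selecting, then shows that the subsequent <code>select</code> returns immediately rather than blocking:

```scala
import java.nio.channels.Selector

// A wakeup() issued while no select is in progress is remembered,
// so the next select() returns immediately instead of blocking.
object WakeupDemo {
  def run(): Int = {
    val selector = Selector.open()
    try {
      selector.wakeup()  // nothing is selecting yet; the wakeup is latched
      selector.select()  // would block forever here without the wakeup
    } finally {
      selector.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println("select returned " + run())  // 0: no channels are ready
  }
}
```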
<br/><br/>
With this change, all of the key set operations happen on the selection thread
and, since the socket read operation is in a callback that gets executed
by the selection thread in <code>NioSelector.executeCallbacks</code>,
all socket reads (and likewise accepts) will happen on the
selection thread.
<br/><br/>
<input id="hlButton" type="submit" value="Highlight Syntax" onclick="highlightSyntaxFlipButton()"/>
<pre name="hlcode" class="scala"
>//In class NioSelector
<b>import scala.collection.mutable.SynchronizedQueue</b>
<b>private case class RegistrationRequest(
channel:SelectableChannel,op:Int,callback:Function0[Unit])
private val regQ = new SynchronizedQueue[RegistrationRequest]</b>
def register(channel:SelectableChannel, op:Int, body: => Unit) {
val callback:Function0[Unit] = { () => { body }}
<b>regQ.enqueue(RegistrationRequest(channel,op,callback))
selector.wakeup()</b>
}
def selectOnce(timeout:Long) {
<b>while (regQ.size>0) {
val req = regQ.dequeue()
req.channel.register(selector,req.op,req.callback)
}</b>
...
}
}
</pre>
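The same pattern can be exercised outside the server. In this standalone sketch (illustrative names, not the nioserver code, with a <code>ConcurrentLinkedQueue</code> standing in for the <code>SynchronizedQueue</code> above), a registration queued from any thread takes effect only when the selecting thread drains the queue before selecting:

```scala
import java.net.InetSocketAddress
import java.nio.channels.{SelectableChannel, SelectionKey, Selector, ServerSocketChannel}
import java.util.concurrent.ConcurrentLinkedQueue

object DrainThenSelect {
  private case class Req(channel: SelectableChannel, op: Int)
  private val regQ = new ConcurrentLinkedQueue[Req]

  // Safe to call from any thread: just queues the request.
  def register(channel: SelectableChannel, op: Int): Unit =
    regQ.add(Req(channel, op))

  // Called only on the selecting thread, immediately before select:
  // performs the actual SelectableChannel.register calls.
  def drainAndCount(selector: Selector): Int = {
    var req = regQ.poll()
    while (req != null) {
      req.channel.register(selector, req.op)
      req = regQ.poll()
    }
    selector.keys.size  // number of keys now registered
  }

  def main(args: Array[String]): Unit = {
    val selector = Selector.open()
    val server = ServerSocketChannel.open()
    server.configureBlocking(false)
    server.socket.bind(new InetSocketAddress(0))  // any free port
    register(server, SelectionKey.OP_ACCEPT)      // queued, not yet registered
    println(drainAndCount(selector))              // 1: registration took effect
    server.close()
    selector.close()
  }
}
```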
<a name="coscheduler"></a>
<h3>CoScheduler</h3>
For our coroutine scheduler,
we have to be able to deal with the situation that
we have no coroutines that are currently runnable,
then at some point one of those coroutines becomes runnable
by the actions of another thread.
In the architecture described above, this can happen when new data
that has been read from a connection is placed into the input queue.
To allow us to wait for this kind of event and to be awakened when
it happens, we use Java's
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Object.html#wait()">
<code>wait</code>/<code>notify</code></a>
model. We can't override those methods, since <code>notify</code>
is final, so we define our own versions,
which we call <code>coWait</code> and <code>coNotify</code>.
Given those methods, we also extend <code>Runnable</code>
and replace the old <code>run</code> method with one
that runs coroutines until none are available to run,
then waits until we are notified and continues the loop.
<pre name="hlcode" class="scala"
>trait CoScheduler <b>extends Runnable</b> { cosched =>
//we add the following items
<b>private val defaultLock = new java.lang.Object
def coWait():Unit = { defaultLock.synchronized { defaultLock.wait() } }
def coNotify():Unit = { defaultLock.synchronized { defaultLock.notify } }
def run {
while (true) {
runUntilBlockedOrDone
coWait
}
}</b>
}
</pre>
A <code>coNotify</code> method that accepts as an argument the coroutine
or blocker that has potentially changed state would allow for a more
efficient implementation, but for now we choose the simple implementation
given above that does not attempt that optimization.
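A hypothetical sketch of that optimization (not in the scoroutine library as posted; the names here are invented for illustration) might record which blocker changed, so the scheduler only re-examines the coroutines waiting on it:

```scala
import scala.collection.mutable

// Hypothetical refinement: coNotify records *which* blocker changed
// state, so a scheduler need only re-examine the coroutines waiting
// on that blocker rather than rescanning all of them.
trait TargetedNotify {
  private val lock = new AnyRef
  private val changed = mutable.Set[AnyRef]()

  def coNotify(blocker: AnyRef): Unit = lock.synchronized {
    changed += blocker
    lock.notify()
  }

  // Blocks until at least one blocker has changed state since the last
  // call, then returns the set of changed blockers and resets it.
  // The while loop guards against spurious wakeups, and a notify that
  // arrives before the wait is not lost: changed is already non-empty.
  def coWaitChanged(): Set[AnyRef] = lock.synchronized {
    while (changed.isEmpty) lock.wait()
    val result = changed.toSet
    changed.clear()
    result
  }
}
```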
<a name="coqueue"></a>
<h3>CoQueue</h3>
We use an instance of <code>CoQueue</code> as the queue between the
socket read loop and the application processing loop.
The socket read loop calls <code>blockingEnqueue</code> to place an
item into the queue, and the application processing loop calls
<code>blockingDequeue</code> to take an element out of the queue.
The result of either of these actions could be to unblock another
coroutine, so we modify those methods to add a call to <code>coNotify</code>
in case they are being called from a coroutine that is not currently
being managed by our coroutine scheduler.
Since we are calling the enqueue and dequeue methods from different threads,
we use a
<a href="http://www.scala-lang.org/api/current/scala/collection/mutable/SynchronizedQueue.html">
<code>SynchronizedQueue</code></a>
rather than a plain
<a href="http://www.scala-lang.org/api/current/scala/collection/mutable/Queue.html">
<code>Queue</code></a>.
Those two methods now look like this:
<pre name="hlcode" class="scala"
>import scala.collection.mutable.<b>Synchronized</b>Queue
class CoQueue ... extends <b>Synchronized</b>Queue[A] { ...
def blockingEnqueue(x:A):Unit @suspendable = {
enqueueResource.waitUntilNotBlocked
enqueue(x)
<b>dequeueResource.coNotify</b>
}
def blockingDequeue():A @suspendable = {
dequeueResource.waitUntilNotBlocked
<b>val x =</b> dequeue
<b>enqueueResource.coNotify</b>
<b>x</b>
}
</pre>
<a name="nioconnection"></a>
<h3>NioConnection</h3>
We add a <code>CoQueue</code> which we use as our input queue between
the socket reader loop and the application loop.
For this example, we pick an arbitrary limit of 10;
if our application gets behind by more than 10 items,
the socket reader code will suspend when attempting to write to the queue.
If more data arrives while that code is thus suspended,
it will back up in the system's input buffer for that connection,
and eventually the client will get an error when trying to write
to its output connection.
<br/><br/>
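The resulting backpressure is analogous to a plain bounded queue between two threads. In this thread-based sketch (using <code>ArrayBlockingQueue</code> in place of <code>CoQueue</code>), the producer can no longer add an item once the consumer is 10 items behind:

```scala
import java.util.concurrent.ArrayBlockingQueue

// Thread-based analogy for the coroutine backpressure: with a capacity
// of 10, a producer blocks (or, with offer, is refused) once the
// consumer falls 10 items behind, just as blockingEnqueue suspends
// the socket-reader coroutine when the queue is full.
object BackpressureSketch {
  def main(args: Array[String]): Unit = {
    val q = new ArrayBlockingQueue[String](10)
    for (i <- 1 to 10) q.put("line" + i)  // fills the queue to capacity
    // offer returns false instead of blocking; a put here would block
    // until the consumer removed an item.
    println(q.offer("line11"))  // false: the queue is full
    q.take()                    // the consumer catches up by one item
    println(q.offer("line11"))  // true: there is room again
  }
}
```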
In order to initialize the <code>CoQueue</code>
we need to pass in a <code>CoScheduler</code>, so we add that
parameter to our constructor and to the convenience method
in our companion object.
<pre name="hlcode" class="scala"
><b>import net.jimmc.scoroutine.{CoQueue,CoScheduler}</b>
//In object NioConnection
def newConnection(<b>sched:CoScheduler,</b> selector:NioSelector, socket:SocketChannel) {
val conn = new NioConnection(<b>sched,</b>selector,socket)
}
class NioConnection(<b>sched:CoScheduler,</b> selector:NioSelector, socket:SocketChannel) {
//Add CoQueue
<b>private val inQ = new CoQueue[String](sched, 10)</b>
}
</pre>
Now that we have a queue, we modify our socket reader code to place our
input data (after conversion to a Java string) into our queue rather than
writing it straight to the output socket.
We want to block when the queue is full, so we call the
<code>blockingEnqueue</code> method.
Since we now know that's the only action we will be taking,
we fold the <code>readAction</code> method back into <code>readWait</code>.
Because <code>blockingEnqueue</code> is suspendable,
the <code>else</code> branch of the
<code>if (count<1)</code> code block is suspendable, so we need to
make the <code>if</code> branch suspendable as well.
We do this by adding a <code>shiftUnit</code> call as the final value
in the <code>if</code> branch.
The <code>readWait</code> method now looks like this:
<pre name="hlcode" class="scala"
> private def readWait = {
buffer.clear()
val count = read(buffer)
if (count<1) <b>{</b>
socket.close()
<b>shiftUnit[Unit,Unit,Unit]()</b>
<b>}</b> else <b>{</b>
<b>//Moved here from readAction</b>
buffer.flip()
lineDecoder.processBytes(buffer, <b>inQ.blockingEnqueue(_)</b>)
<b>}</b>
}
</pre>
We now have input data going into our queue, but nobody is
reading it.
For this example, we implement a simple echo loop that reads
from the input queue using a new <code>readLine</code> method
and writes to the output using our existing <code>writeLine</code> method.
We do this inside a <code>reset</code> block so that
it becomes another coroutine that can be managed by our
coroutine scheduler.
Our previous <code>start</code> method started up the socket reader loop.
We rename that one to <code>startReader</code>, add a
<code>startApp</code> method that starts up our echo loop,
and call both of those from a new <code>start</code> method.
Our <code>start</code> method now looks like this:
<pre name="hlcode" class="scala"
>//In class NioConnection
def start():Unit = {
<b>startReader
startApp
}
private def startApp() {
reset {
while (socket.isOpen)
writeLine(readLine())
}
}
private def startReader() {</b>
reset {
while (socket.isOpen)
readWait
}
}
<b>def readLine():String @suspendable = inQ.blockingDequeue</b>
</pre>
<a name="linedecoder"></a>
<h3>LineDecoder</h3>
Our <code>processBytes</code> method is now getting passed a callback
that is suspendable, so we need to modify the signature of our
method to accept that.
It passes that callback to <code>processChars</code>, so that
signature needs to be changed in the same way.
Since <code>processChars</code> is now calling a suspendable method,
it too is suspendable, so its return signature
needs to be modified to note that,
and since <code>processBytes</code> calls <code>processChars</code>,
it too needs to be modified to have a suspendable return signature.
<pre name="hlcode" class="scala"
>//In class LineDecoder
<b>import scala.util.continuations._</b>
def processBytes(b:ByteBuffer,
lineHandler:(String)=>Unit <b>@suspendable</b>):Unit <b>@suspendable</b> = ...
private def processChars(cb:CharBuffer,
lineHandler:(String)=>Unit <b>@suspendable</b>)<b>:Unit @suspendable =</b> { ... }
</pre>
<a name="niolistener"></a>
<h3>NioListener</h3>
<code>NioListener</code> calls <code>NioConnection.newConnection</code>,
and that call now requires a <code>CoScheduler</code> argument,
so we add that to our constructor and pass it through when we call
<code>newConnection</code>.
<pre name="hlcode" class="scala"
><b>import net.jimmc.scoroutine.CoScheduler</b>
class NioListener(<b>sched:CoScheduler,</b> selector:NioSelector, hostAddr:InetAddress, port:Int) {
def start(continueListening: =>Boolean):Unit = {
reset {
while (continueListening) {
val socket = accept()
NioConnection.newConnection(<b>sched,</b>selector,socket)
}
}
}
}
</pre>
<a name="nioserver"></a>
<h3>NioServer</h3>
<code>NioServer</code> instantiates the <code>NioListener</code>, so
we need to pass it an instance of <code>CoScheduler</code>.
We create an instance of <code>DefaultCoScheduler</code> and pass that in.
We now need two threads, one for our coroutine scheduler and one
for the NIO scheduler.
In our <code>start</code> method,
we create and start a second <code>Thread</code> for the NIO scheduler,
then rename our own thread and run the coroutine scheduler on it.
<pre name="hlcode" class="scala"
><b>import net.jimmc.scoroutine.DefaultCoScheduler</b>
class NioServer(hostAddr:InetAddress, port:Int) {
val selector = new NioSelector()
<b>val sched = new DefaultCoScheduler</b>
val listener = new NioListener(<b>sched,</b> selector, hostAddr, port)
def start() {
listener.start(true)
<b>//run the NIO selector on its own thread
(new Thread(selector,"NioSelector")).start
Thread.currentThread.setName("CoScheduler")
sched.run //run the coroutine scheduler on our thread, renamed</b>
}
}
</pre>
<a name="summary"></a>
<h3>Summary</h3>
As in the previous post, we have once again transformed our example
application in a way which provides an internal improvement - in this
case the ability to use multiple threads - but which
has not changed its basic external behavior:
we still have a simple echo server.
We also have not yet addressed all of the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#limitations">Limitations</a>
from the first post in this series.
Stay tuned for more.
<a name="caveats"></a>
<h3>Caveats</h3>
<ul>
<li>Although I have asserted that it is possible to write a multi-threaded
scheduler conforming to the <code>CoScheduler</code> API,
I have not yet actually done so.
It is possible that this may be more difficult than I expect.
<li>Multi-threaded code is generally tricky stuff.
I have not spent a lot of time running this example code,
so it is certainly possible that there are race conditions or other
concurrency problems.
</ul>
Jim McBeath
<h2>Java NIO for Character Decoding in Scala</h2>
<i>2011-03-28</i>
<br/><br/>
The Java NIO package includes some handy character encoding and
decoding methods that can be used from Scala.
<h3>Contents</h3>
<ul>
<li><a href="#background">Background</a>
<li><a href="#java-nio-character-coders">Java NIO Character Coders</a>
<li><a href="#linedecoder">LineDecoder</a>
<li><a href="#nioconnection">NioConnection</a>
<li><a href="#limitations">Limitations</a>
</ul>
<a name="background"></a>
<h3>Background</h3>
In my
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html">
previous post</a>
I described a simple Scala server using NIO and continuations,
and mentioned in the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#limitations">
Limitations</a>
section that the example did not convert the data bytes to characters.
In this post I show how that can easily be added by using another
feature of the
<a href="http://download.oracle.com/javase/1.5.0/docs/guide/nio/">
Java NIO</a> package:
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/package-summary.html">
character-set encoders and decoders</a>.
<a name="java-nio-character-coders"></a>
<h3>Java NIO Character Coders</h3>
The <code>java.nio.charset</code> package includes a
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/Charset.html">
<code>Charset</code></a>
class that represents a mapping between the 16-bit Unicode
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Character.html#unicode">
code-units</a> that Java uses for its internal representation for
characters and strings,
and a sequence of bytes as are stored in a file or transmitted
through a socket connection.
Each such mapping is represented by a separate instance of the
<code>Charset</code> class.
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/Charset.html#iana">
Standard character mappings</a> such as "UTF-8" and "ISO-8859-1"
can be retrieved using the static
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/Charset.html#forName(java.lang.String)">
<code>forName</code></a>
method.
<br/><br/>
Given an instance of <code>Charset</code>,
a
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/CharsetEncoder.html">
<code>CharsetEncoder</code></a>
for that character mapping can
be retrieved by calling the
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/Charset.html#newEncoder()">
<code>newEncoder</code></a>
method on that instance.
That encoder can then be used to convert a Java string into a sequence
of bytes suitable for writing to a file or connection.
<br/><br/>
Similarly, the
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/Charset.html#newDecoder()">
<code>newDecoder</code></a>
method on <code>Charset</code> retrieves a
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/CharsetDecoder.html">
<code>CharsetDecoder</code></a>
that can be used for the complementary task of
converting bytes from a file or connection into a Java string.
<br/><br/>
The encoding and decoding methods convert data between a
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/CharBuffer.html">
<code>CharBuffer</code></a>
and a
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/ByteBuffer.html">
<code>ByteBuffer</code></a>.
Since the <code>java.nio</code> socket I/O calls we are using read and write
their data to and from <code>ByteBuffer</code>s,
it is convenient for the encoding and decoding to use those objects.
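As a minimal standalone sketch of that round trip (not part of the server code), a string can be pushed out through an encoder as a <code>ByteBuffer</code>, as it would be when writing to a socket, and recovered through a decoder:

```scala
import java.nio.{ByteBuffer, CharBuffer}
import java.nio.charset.Charset

// Encode a String to UTF-8 bytes and decode it back, using the
// convenience methods on CharsetEncoder and CharsetDecoder.
object CharsetRoundTrip {
  def roundTrip(s: String): String = {
    val utf8 = Charset.forName("UTF-8")
    val bytes: ByteBuffer = utf8.newEncoder.encode(CharBuffer.wrap(s))
    utf8.newDecoder.decode(bytes).toString
  }

  def main(args: Array[String]): Unit =
    println(roundTrip("héllo, wörld"))  // survives the byte round trip intact
}
```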
<a name="linedecoder"></a>
<h3>LineDecoder</h3>
Using the <code>java.nio.charset</code> classes described above,
we write a <code>LineDecoder</code>
class containing a <code>processBytes</code> method that takes as input a
<code>ByteBuffer</code>
(which is what we have to read into when using a
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/channels/SocketChannel.html">
<code>SocketChannel</code></a>)
and converts that byte data to Java characters.
For this example, we also break up that character data into separate lines
when we see line break characters,
converting each line of characters to a Java <code>String</code>.
One buffer of data might contain multiple lines of character data,
so rather than returning a set of lines,
our method accepts a callback to which we pass each line
as we decode it.
<br/><br/>
<input id="hlButton" type="submit" value="Highlight Syntax" onclick="highlightSyntaxFlipButton()"/>
<pre name="hlcode" class="scala"
>import java.nio.{ByteBuffer,CharBuffer}
import java.nio.charset.{Charset,CharsetDecoder,CharsetEncoder,CoderResult}
import scala.annotation.tailrec
class LineDecoder {
//Encoders and decoders are not multi-thread safe, so create one
//for each connection in case we are using multiple threads.
val utf8Charset = Charset.forName("UTF-8")
val utf8Encoder = utf8Charset.newEncoder
val utf8Decoder = utf8Charset.newDecoder
def processBytes(b:ByteBuffer, lineHandler:(String)=>Unit):Unit =
processChars(utf8Decoder.decode(b),lineHandler)
@tailrec
private def processChars(cb:CharBuffer, lineHandler:(String)=>Unit) {
val len = lengthOfFirstLine(cb)
if (len>=0) {
val ca = new Array[Char](len)
cb.get(ca,0,len)
eatLineEnding(cb)
val line = new String(ca)
lineHandler(line)
processChars(cb, lineHandler) //handle multiple lines
}
}
//Assuming the first character in the buffer is an eol char,
//consume it and a possible matching CR or LF in case the EOL is 2 chars.
private def eatLineEnding(cb:CharBuffer) {
//Eat the first character and see what it is
cb.get match {
case '\n' => if (cb.remaining>0 && cb.charAt(0)=='\r') cb.get
case '\r' => if (cb.remaining>0 && cb.charAt(0)=='\n') cb.get
case _ => //ignore everything else
}
}
private def lengthOfFirstLine(cb:CharBuffer):Int = {
(0 until cb.remaining) find { i =>
List('\n','\r').indexOf(cb.charAt(i))>=0 } getOrElse -1
}
}
</pre>
Here is an imperative version of <code>lengthOfFirstLine</code>
that does the same thing as the functional version above.
<pre name="hlcode" class="scala"
> private def lengthOfFirstLine(cb:CharBuffer):Int = {
var cbLen = cb.remaining
for (i <- 0 until cbLen) {
val ch = cb.charAt(i)
if (ch == '\n' || ch == '\r')
return i
}
return -1
}
</pre>
<a name="nioconnection"></a>
<h3>NioConnection</h3>
One of the classes shown in my previous post was the
<a href="http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html#nioconnection">
NioConnection</a>
class,
whose responsibilities include processing input data from the client.
It does this in the method <code>readAction</code>,
which initially looks like this:
<pre name="hlcode" class="scala"
>//The old version
private def readAction(b:ByteBuffer) {
b.flip()
socket.write(b)
b.clear()
}
</pre>
We replace the direct call to <code>socket.write</code>
with a call to <code>LineDecoder.processBytes</code>,
which is responsible for decoding the input data,
and we pass it our new <code>writeLine</code> method
that accepts a line of characters
and writes it back to the client.
Also, we no longer need the call to <code>b.clear</code> here:
since <code>readAction</code> runs effectively at the bottom of our
<code>readWhile</code> loop, and <code>b.clear</code> is already called
at the top of that loop, the call would be redundant.
<pre name="hlcode" class="scala"
> private val lineDecoder = new LineDecoder
private def readAction(b:ByteBuffer) {
b.flip()
lineDecoder.processBytes(b, writeLine)
}
def writeLine(line:String) {
socket.write(ByteBuffer.wrap((line+"\n").getBytes("UTF-8")))
}
</pre>
Now when we receive some input data, it gets passed to
<code>LineDecoder.processBytes</code>,
which converts it to characters, breaks it up into separate lines,
and calls our <code>writeLine</code> method for each line.
The <code>writeLine</code> method uses
<code>String.getBytes</code>
to convert the characters in the line back to bytes,
wraps those bytes into a <code>ByteBuffer</code>
and writes them directly to the output channel.
<br/><br/>
As compared to the example in the previous post, this example should
behave the same externally,
but we are now passing around Java strings rather than NIO buffers,
which, assuming we want to deal with string data rather than binary data,
will make it simpler to write the rest of the real application.
<a name="limitations"></a>
<h3>Limitations</h3>
<ul>
<li>As with the example in the previous post,
the current example only shows how to use the NIO calls
on the read side of the connection.
We could use a <code>CharsetEncoder</code> on the write side
rather than using <code>String.getBytes</code> and
<code>ByteBuffer.wrap</code>.
<li>Partial input lines (characters not terminated by an EOL character)
are ignored by this implementation.
<li>The example uses the convenience method version of
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/CharsetDecoder.html#decode(java.nio.ByteBuffer)">
<code>decode</code></a>,
which assumes that the input <code>ByteBuffer</code> contains complete
character sequences.
It is possible that a multi-byte character sequence will be
split such that only the first part of that sequence appears at the
end of the input buffer,
with the remainder of the sequence appearing at the start of the next
buffer of input data.
The above implementation will not properly handle this situation.
The underlying
<a href="http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/CharsetDecoder.html#decode(java.nio.ByteBuffer,%20java.nio.CharBuffer,%20boolean)">
<code>decode</code></a> method does handle this situation properly,
but the remaining code in this example is not set up for this situation.
<li>The <code>decode</code>
convenience method throws exceptions rather than returning
a status code as the full <code>decode</code> method does.
Since these exceptions are nowhere caught in the code, such an
exception would cause that task to abort.
A more robust solution would have a mechanism to catch exceptions or
restart an aborted task.
<li>The example assumes UTF-8 encoding.
</ul>
Jim McBeath