Sunday, August 14, 2022

The Ideal Software Law

In science, we make abstractions that are simplified models of reality, then we try to describe them with equations that let us make accurate predictions given the conditions assumed by the model. In this post I attempt to do that for software projects.

The Ideal Gas Law

In physics, the behavior of an idealized gas is described by the ideal gas law: PV=nRT, where P is pressure, V is volume, n is the quantity of gas, R is a constant, and T is the absolute temperature. While real gases don't follow this law exactly, it can be used to make pretty good predictions. It can help you understand how steam engines, refrigerators, and hot air balloons work.

A key insight that follows from this equation is that you can't hold three of the four parameters fixed and change just one parameter. If you have a fixed amount of gas at a given pressure, volume, and temperature, and you increase the temperature, then either the pressure goes up, the volume goes up, or both. If, with the same starting conditions, you decrease the volume, then either the pressure must go up, or the temperature must go down, or both. You can keep any two parameters fixed and change the other two in fixed relationships, but you simply can't hold three of the parameters fixed and change just one. If you try to do that, you will invariably fail: one or more of the other parameters will, perforce, also change.

The Ideal Software Law

We can use a similar equation to convey the relationships among the parameters of software development. Instead of PV=nRT, we have:
FQ=nST
where F is functionality, Q is quality, n is development resources, S is a constant, and T is the amount of time to complete development. As with the ideal gas law, this equation does not precisely apply to real software projects, but it can be used to make predictions and gain insights. In particular, we can see in this formulation the same basic insight as with the ideal gas law: it is not possible to hold all but one of the parameters fixed and change only one parameter. If you try to do so, one or more of the other parameters will, perforce, also change.

The Parameters

Let's take a look at what the parameters in our equation mean and how we might measure them.

Functionality (F)

Functionality represents what our software can do. There are defined ways to measure the functional size of software, such as COSMIC function points, but we would like something simpler that still allows us to understand the relationships between the parameters of the equation. For our purposes, a reasonable proxy for functionality is lines of code (LoC).

We are not claiming that lines of code is a good general metric for productivity. Some people write denser code than others, so they can implement more functionality in the same number of lines of code. Some research has concluded that people write about the same number of lines of code per day regardless of language, but a higher-level language can express more with the same number of lines, so it can deliver more functionality in the same number of lines of code than a lower-level language. And some projects have a more difficult environment than others, so developers produce fewer lines of code per day in that environment.

However, we are using LoC slightly differently in this case. We are not using it to compare productivity or functionality between projects and teams, but only within the team and project for which we are measuring functionality. We assume that all of the factors mentioned above that affect the LoC metric are constant within the project and time span of interest, so that twice as many lines of code will provide twice as much functionality.

Quality (Q)

For quality, we could use a sophisticated quality model such as ISO/IEC 25010, but for this exercise we will use the simpler Defect Management approach.

Intuitively, it makes sense that higher quality software will have fewer bugs (also called defects). We also expect a larger project to have more total bugs than a smaller project. Roughly speaking, then, we can think of the number of bugs per line of code as being a proxy for the level of quality of a software project. We can call this the bug density (or defect density). We want our parameter to be larger for higher quality software, so we use the reciprocal of the bug density. The reciprocal of density for materials is called specific volume, so we will call this measure bug specific volume (or defect specific volume), and use that as our measure of quality. Our units for quality are thus LoC/bug.

We recognize that there are some practical problems with this measure. Firstly, bugs come in different sizes. For our purpose we will assume some kind of "normalized" bug units, and assign more serious bugs more than one bug unit. Secondly, we don't know how many bugs are in a piece of software until well after it is delivered. We assume those bugs exist and will be revealed over time, at a rate which depends on factors such as how much use the software gets, so although we don't know the number in advance, we can still use this concept in our abstraction to understand the relation of quality to the other parameters.

Resources (n)

Resources, as in Human Resources, refers to the people we have available to work on the project. To a first approximation, n is the number of people developing the project. Many studies have shown that different people have different levels of productivity. For this idealization we assume that there is a baseline developer and that we know the productivity multiplier for each of our developers relative to that baseline, even though in practice this might be difficult to determine, and the multiplier could vary with circumstances. We then define n as the number of baseline developers on the project. If we have a developer who we believe is three times as productive as our baseline, that developer increases n by three. Our units for n are thus baseline developers, but for simplicity, we will sometimes just refer to the units for n as people.

Our idealized equation assumes that we could do our project in half the time if we had twice the resources. We recognize that we are blatantly ignoring the problems of the mythical man-month.

Time (T)

Time refers to how much time it will take to complete the project. This is the most straightforward dimension to measure, and because of that it is often the dimension that gets the most attention during project planning. We choose to use days as our units, as that is a commonly used unit for other aspects of software development.

The Software Constant

The units we have selected for the four parameters define the units of the constant S.

F (LoC) * Q (LoC/bug) = n (person) * S (??) * T (days)

Therefore the units for S must be (LoC^2)/(bug*person*days). We can also write this as (LoC/bug)*(LoC/person/day). LoC/bug is a bug specific volume (our quality measure), and LoC/person/day is a development velocity for our baseline developer, so S is the product of a bug specific volume and a per-person development velocity. We can think of S as the "quality velocity" for one baseline developer. A higher value of S means higher productivity: more functionality or quality from a given amount of time, per developer.

So what value should we use for S? Some people (such as Brooks in The Mythical Man-Month) say a programmer can write about 10 lines of production code per day. Other sources use different numbers, but as a baseline we will go with Brooks' value of 10 LoC/person/day.

For bug density, various studies have come up with numbers ranging from 3 to 50 defects per 1000 LoC. As a starting point, I will select 10 bugs per 1000 LoC, or a bug specific volume of 100 LoC/bug. Combining these two values gives 10 * 100 = 1000 as the value of S. This means our baseline developer could, for example, write 10 lines of code per day at 10 bugs per 1000 LoC, or 20 lines of code per day at 20 bugs per 1000 LoC.

In reality, different collections of people, different development environments, and different project attributes will all lead to different values of S. Organizations should always be looking for ways to increase the value of S for their projects, but for this analysis I am assuming that they have already done this in all the easy ways, and the remaining opportunities to increase S require larger investments and time to have an effect on the project. Thus when analyzing our equation to see what predictions it makes for a particular project, we will assume S is constant.

The form of the equation

The Ideal Gas Law was created by assembling a number of simpler laws that were derived from empirical observations. Each of these simpler laws demonstrated the relationship between two parameters when the other two were held constant.
Our Ideal Software Law is similarly assembled from simpler guidelines. We don't have previously stated laws, so we rely on our intuition to guide us.
  • All other things being equal, functionality is proportional to resources: F ∝ n
  • All other things being equal, functionality is proportional to time: F ∝ T
  • All other things being equal, quality will be higher with more resources
  • All other things being equal, quality will be higher with more time
Because quality is hard to define and measure, we don't actually know how close to being proportional to the other variables it is. For simplicity, we assume that it is proportional to both resources and time, the same as functionality: Q ∝ n and Q ∝ T.

These four rules, when assembled, give us the form of the equation for the Ideal Software Law shown above.

Example

Let's make a concrete example. Let's assume we have a project with the following parameters:
  • The functionality we desire requires 10,000 lines of code
  • Our quality bar is 5 bugs per 1000 lines of code (better than baseline), so 200 LoC/bug
  • We have 10 people on our team, all operating at baseline
  • Our team software constant S is 1000, as calculated above.
How many days should we expect this project to take to complete? From the Ideal Software Law, we have:

10,000 (LoC) * 200 (LoC/bug) = 10 (person) * 1000 (LoC^2/(bug*person*days)) * d (days)

Solving for d, we get d = (10,000*200)/(10*1000) = 200 days. A project team, given the assumptions above (although perhaps not stated so explicitly), might deliver this estimate to management when asked how long the project will take.
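
To make the arithmetic easy to repeat, here is a tiny JavaScript sketch (my own illustration; the function name is mine, not part of any standard) that solves the Ideal Software Law for time:

// Ideal Software Law: F * Q = n * S * T, solved for T.
//   F: functionality (LoC), Q: quality (LoC/bug),
//   n: resources (baseline developers), S: quality velocity constant.
function daysToComplete(F, Q, n, S) {
    return (F * Q) / (n * S);
}

// The example project above:
console.log(daysToComplete(10000, 200, 10, 1000)); // 200 days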

Analysis

Now let's play with the parameters and see what happens.

The typical scenario is that management comes back to the team and says "That estimate is too long. We need to deliver sooner. Make it happen faster." What options does the team have?

Looking at the Ideal Software Law equation, if we want to make T smaller, we have four options:
  • Make F smaller (less functionality)
  • Make Q smaller (less quality)
  • Make n larger (more developers)
  • Make S larger (higher velocity)
Clearly making S larger would be good, but, as mentioned above, when considering the schedule for a single project, this is unlikely to be a short-term option. That leaves us with three other parameters that can be changed.

We could make n larger by adding more developers to the team. This can be effective if there are people available, but practically speaking is difficult because of limited budgets, the difficulty of finding appropriate developers, and the time-cost of bringing a new team member up to speed. All of those factors make this choice possible but unlikely.

Now we are down to two parameters: functionality and quality. The developer team will typically propose to make F smaller, also called a reduction in scope, by removing features from the project. If this is acceptable to management, then the reduced value of T can be balanced by the reduced value of F.

In many cases, however, management insists on not cutting any features. Now we are left with only one parameter: quality. Because this is the hardest parameter to measure, it is also the one that most often is ignored. In this situation, when T is made smaller and F, n, and S are unchanged, Q must, perforce, be made smaller by the same fraction as T was reduced.
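
Continuing the sketch from the Example section, we can solve the same equation for Q to see the size of that quality hit (the 25% schedule cut here is just an illustration of mine):

// Solve F * Q = n * S * T for Q instead of T.
function predictedQuality(F, n, S, T) {
    return (n * S * T) / F;
}

console.log(predictedQuality(10000, 10, 1000, 200)); // 200 LoC/bug, as planned
console.log(predictedQuality(10000, 10, 1000, 150)); // 150 LoC/bug after a 25% cut

Cutting the schedule by 25% cuts quality by the same 25%: from 5 bugs per 1000 LoC to about 6.7.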

The choice to reduce quality is sometimes made consciously, and could come with a commitment to go back later and improve quality. This is often referred to as taking on technical debt, which is expected to be paid back by improving the code later. The word "debt" is used here in intentional analogy to financial debt: there is a carrying cost to debt in the form of interest, making the total cost continue to go up the longer it remains unpaid. In software, this manifests as more time spent fixing bugs after product release, until such time as the debt is repaid by cleaning up the code to bring its quality back up.

If, however, a decision is made to reduce project time without changing functionality or resources, without consciously recognizing that there will be a reduction in quality, this is effectively like borrowing money without realizing it or having a plan to pay it back. The interest payments will still be there, in the form of more time spent fixing bugs and more time required to add new features, and that will negatively impact the team's schedule on future projects.

Limitations of the abstraction

All abstractions will eventually break down when the parameters go outside the valid range of the abstraction.
  • Newton's law of gravity elegantly describes the paths of the planets, but starts to break down in strong gravitational fields
  • The constant-time swing of a pendulum of a given length starts to change when the pendulum swings too far from its center position
  • The Ideal Gas Law becomes less accurate at lower temperatures, higher pressures, and with larger gas molecules
Understanding the limitations of an abstraction allows us to improve our predictions. In the Parameters section above, I discuss some of the assumptions about each parameter. When we recognize that an assumption does not hold, we can bend the results of our formula to try to compensate.

For example, our formula tells us we can get the same functionality in half the time by doubling our resources. But we know that it takes time to bring a new developer up to speed on a project, so we won't actually be able to cut our time in half. By estimating how much reality deviates from our assumption, we can improve the accuracy of the predictions made by the formula despite the fact that the assumptions behind the formula are not entirely accurate.

Conclusion

By abstracting the parameters of software development and creating an equation, we can make practical predictions about those parameters. We can make such predictions even when the assumptions behind our formula are not completely true.

One of the most important predictions is this:
If you insist on reducing the time available to complete a software project, and you don't increase the number of people on the project or cut some features, the quality of the delivered software will decrease proportionally to the reduction in time.

Sunday, May 1, 2022

Home Automation for a Hot Water Recirculating Pump

My bathroom is pretty far from the water heater. It took over a minute of running the hot water for it to actually get hot. That's a lot of water wasted every time I waited for hot water. I wanted hot water faster.

Recirculating Hot Water

Last year, as part of a bathroom remodel, I had a hot water recirculating system installed. This consisted of a return pipe from the bathroom and a recirculating pump at the water heater to pull water from the return pipe, thus bringing hot water to the bathroom without having to run water down the drain waiting for it to warm up.

Once the system was installed, I learned that the pump is not supposed to run all the time. In addition, the pump, while not terribly noisy, produced enough noise to be annoying, especially in the parts of the house adjacent to the garage where the water heater and pump were located. So I didn't want to run it all the time for that reason.

The installer gave me a timer. I set it up to run in the morning and the evening. My schedule wasn't precise enough to run the timer for just a short amount of time, so I had it set up to run for about an hour. This didn't work very well: besides the noise issue mentioned above, the temperature of the water dropped a noticeable amount during this period. I needed another solution.

Home Automation

My solution was to set up a home automation system with some outlets and some battery-powered pushbuttons and program it so that when one of the pushbuttons was pressed, it would turn on the outlet for a couple of minutes to run the recirculating pump. This has worked well.

Years ago I used a bunch of X10 switches and outlets. I even installed a blocker to isolate the X10 signals in my house from the incoming power line and a coupler to ensure the X10 signals from one 120V leg made it to devices on the other 120V leg. I eventually stopped using those devices and had not installed any other home automation until now.

After looking at what was available, I decided to use the following technologies for my new home automation system:
  • Home Assistant as the controller
    I chose this for two reasons:
    1. I don't want my system to depend on the cloud or to be sending data out to anyone. Home Assistant allows me to do everything myself and be isolated from the internet. My automation won't stop working when my internet connection or someone else's computers or software go down.
    2. I like to tinker. Home Assistant is highly customizable - as long as you are willing to fiddle with it.
  • Zigbee 3.0 devices
    • I looked at Zigbee and Z-wave and decided Zigbee looked like the better choice for number of compatible available devices.
    • I specifically did not want to use wifi devices.
Having made those two choices, the next choice was where to run Home Assistant and how to connect the Zigbee devices to it. I figured I would use a USB Zigbee coordinator. For the Home Assistant host, I considered running it on my desktop (which is always on), on my Synology NAS, or on a bespoke device such as a Raspberry Pi. I learned that Synology announced they would be removing support for external USB devices other than disks, so I eliminated that choice. I started looking into using a Raspberry Pi and read multiple comments about high failure rates of the SD cards. Someone suggested attaching a USB SSD, which seemed like a good idea, but that would require more research and figuring out how to mount everything.

About this time I discovered HA Blue, a nice little device based on the Odroid-N2 with 128GB of on-board eMMC, 4 USB ports, ethernet, and HDMI, all in a good-looking extruded aluminum case, and pre-loaded with Home Assistant. It's a little more expensive than some other options, but for me the added convenience of a pre-installed system and the nice case were worth the price.

Note: Home Assistant Blue has been discontinued and is being superseded by Home Assistant Yellow, which has a built-in Zigbee radio and more expansion slots.

Even after deciding on Zigbee, there were a few different available ways to set up the communication between the Zigbee devices and Home Assistant. After doing some reading, I settled on using zigbee2mqtt. It seems like one of the newer solutions, and one where I would have less trouble integrating a wider variety of devices.

Hardware

For my initial foray into home automation and based on my decisions above, I bought the following:
  • HA Blue bespoke Home Assistant controller pre-loaded with Home Assistant
  • SmartLight Zigbee CC2652P Coordinator v4 USB Adapter preflashed with CC2652P_E72_20210319 firmware to support zigbee2mqtt
  • Some Sonoff S31 Lite Zigbee outlet plugs
  • Sonoff SNZB-01 Zigbee switch
  • Some Linkind Zigbee switches and outlets
I used the Blakadder compatibility list to find devices that were compatible with zigbee2mqtt, then looked at which ones I could get and what they cost. The outlets and switches I bought were on the less expensive end of the range, costing less than $10 each, although the price has since gone up.

Initial Setup

Setting up the HA Blue system was straightforward:
  1. Plug it in to power and ethernet
  2. Look in my DHCP log to see what IP address it was assigned
  3. Open my web browser to port 8123 at that IP address
  4. Wait for it to run through its first-boot setup (about 10 minutes)
  5. Create an account for myself
I set up the Zigbee USB adapter following a YouTube video (but beware: there have been some changes since that video was made):
  1. Plug in the Zigbee USB adapter
  2. Log into HA Blue using my account
  3. Enable Advanced mode in my profile
  4. Create user "mqtt" to handle mqtt stuff
  5. From the Add-on store, install Mosquitto Broker
  6. Configure Mosquitto Broker by adding the mqtt user, and start it
Once the Zigbee adapter was in place, I set up zigbee2mqtt:
  1. In the Add-on store screen, from the "..." menu, select Repository and add the URL for the zigbee2mqtt repository, then find the Zigbee2mqtt Hass.io Add-on near the bottom and select it
  2. Find the USB port the Zigbee adapter is connected to: in Supervisor, System, Host box, three-dot menu, Hardware is a list of devices in /dev; by plugging and unplugging the Zigbee adapter I could see that it shows up as device 1-1.2 with path /dev/bus/usb/001/004 and as /dev/ttyUSB0. Or you can just assume /dev/ttyUSB0.
  3. Edit the configuration on the zigbee2mqtt module and change the default port from /dev/ttyACM0 to /dev/ttyUSB0, and change the username to mqtt
  4. Start the module
I also set up ssh to simplify future customizations:
  1. Install the Terminal & SSH Add-on and start it
  2. Open the Terminal & SSH Web UI, which is a web terminal, usable as an alternative to ssh
  3. In the Terminal & SSH Config network page, specify port 22
  4. In the Terminal & SSH Config page, add my public key to the authorized_keys array in single quotes
  5. Save, and restart the module
  6. ssh to the HA Blue as root
At this point I rebooted the HA Blue and looked in the Log for each module to make sure it was working properly.

The above description of setting up zigbee2mqtt is condensed, as I actually had a bit of trouble setting it up, including using an old zigbee2mqtt repository that I later replaced with the newer repository URL given above.

Adding Devices

With Zigbee configured on my HA system, I was ready to add my Zigbee switches and outlets.

In order to add a new Zigbee device to the network, the zigbee2mqtt module must be configured to permit devices to join. Initially I was doing this by directly editing the configuration of the zigbee2mqtt module and changing the value of the permit-join attribute to true. Once the new device had been added, I then edited the configuration again and changed permit-join back to false. Later, I discovered I could just use the Web UI for the zigbee2mqtt module and click on the "Permit join" button, which enables permit-join for 255 seconds with a count-down timer, after which it automatically turns it off.

With the HA Blue system beside me, I enabled permit-join. The LED in the Zigbee adapter started flashing green to indicate that it was in permit-join mode.

The first device I attached was a SONOFF SNZB-01 button:
  1. Pry off the back of the button, remove the paper battery insulation sheet, replace the battery and back
  2. Using a paper clip, press and hold the reset button for 5 seconds, until the red light flashes
  3. After a couple more seconds, the tile for Mosquitto Broker shows "1 device and 3 entities"
  4. Click on "1 device" to open a list of devices
  5. Click on the device to open its details page
  6. Click on the pencil icon by the hex name at the top of the page and rename the device and the entity IDs
  7. Press the button, it briefly shows "single" by the "action" line
  8. Double-click, it briefly shows "double" by the "action" line
Yay, my first Zigbee device is working!

I added a few more devices with basically the same process. Sometimes they would join just by enabling permit-join, but sometimes I also had to reset the device. I had some Sonoff devices and some Linkind devices, and I got them all working, although I did have one unexpected hiccup.

I had purchased a few Linkind outlets. The first one successfully joined my network, but the second one did not. After a few tries, I finally looked at the zigbee2mqtt log and saw that there were error messages saying the unit was not supported. (Lesson: if a new device doesn't join right away, look in the log file for errors!) Although the two outlets were sold under the same product name and looked the same, it turned out they had different model numbers: the unit that worked was ZS190000118 and the unit that failed to join was ZS190000108.

In order to add support for this slightly different flavor of Linkind outlet, I found and followed some instructions to support a new device.
  1. ssh into my HA Blue as root
  2. cd to config/zigbee2mqtt
  3. edit the new file ZS190000108.js
  4. In web browser, open https://github.com/Koenkk/zigbee-herdsman-converters/blob/master/devices/linkind.js, look for Linkind ZS190000118, and copy that stanza into my new .js file (this assumed the description was compatible, which turned out to be true)
  5. Change zigbeeModel to ['ZB_ONOFFPlug_D0008'] (from the zigbee2mqtt log)
  6. Change model to 'ZS190000108' (from the zigbee2mqtt log)
  7. Add the rest of the boilerplate as specified in step 2 of the instructions
  8. Write out the new file
  9. Update the zigbee2mqtt config to add the new device: set advanced:log_level: debug (was warn); set external_converters: - ZS190000108.js
  10. Save, Restart
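
For reference, the finished converter file ends up with roughly this shape. This is an illustrative sketch based on the zigbee2mqtt documentation for external converters, not a verbatim copy of my file; the actual fromZigbee/toZigbee/exposes entries (and the vendor and description strings) come from the linkind.js stanza copied in step 4:

// ZS190000108.js - external converter for the unsupported Linkind outlet
const fz = require('zigbee-herdsman-converters/converters/fromZigbee');
const tz = require('zigbee-herdsman-converters/converters/toZigbee');
const exposes = require('zigbee-herdsman-converters/lib/exposes');
const e = exposes.presets;

const definition = {
    zigbeeModel: ['ZB_ONOFFPlug_D0008'],  // from the zigbee2mqtt log
    model: 'ZS190000108',                 // from the zigbee2mqtt log
    vendor: 'Linkind',
    description: 'Zigbee smart outlet',
    fromZigbee: [fz.on_off],
    toZigbee: [tz.on_off],
    exposes: [e.switch()],
};

module.exports = definition;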

Programming

Once the hardware was all in place and working, the next step was to set up the programming. It looks like there are multiple ways this can be done, and as a programmer I figured it wouldn't be too hard to write some automation code, but then I discovered Node-RED, a graphical flow editor available as a Home Assistant add-on.

I installed Node-RED from the Community section of the AddOns menu. I had a bit of trouble with the certificate stuff, but eventually got that working. I then created a flow such that when I pressed one of my buttons, it would turn on the pump for two minutes. I spent too much time trying to figure out how to do the whole thing using standard components, but eventually decided the standard components were not quite up to the task. I ended up using a few function components, in which I wrote a bit of Javascript code.

My buttons are connected to the input of the Add Time function component, which adds time to a counter each time a button is pressed, up to a maximum value. The buttons are also wired to an on-outlet component that turns on the recirculating pump.

Here is the Add Time code:
// On Start flow.set("max_count", 120); // 2 minutes flow.set("button_increment", 80); // 1 minute and 20 seconds // On Message max_count = flow.get("max_count"); button_increment = flow.get("button_increment"); c = flow.get("counter") if (c < 0) { c = 0; } c = c + button_increment; if (c > max_count) { c = max_count; } node.status({fill:"blue",shape:"dot",text:"count:"+c}); flow.set("counter", c); return msg;
Once time has been added to the timer, there is another function that counts down to zero, the Count Down function. The input of the Count Down function is connected to a Ticker component that ticks once per second. The output of the Count Down function is connected to an off-outlet component that turns off the recirculating pump.

Here is the Count Down code:
// On Start flow.set("counter", 0) // On Message c = flow.get("counter") c = c - 1 flow.set("counter", c) if (c > 0) { node.status({fill:"green",shape:"dot",text:"count:"+c}); return {payload:{counter:c}}; } else if (c == 0) { node.status({fill:"yellow",shape:"dot",text:"stop"}); return {payload:"stop"}; } else { node.status({fill:"red",shape:"dot",text:"stopped"}); return {payload:"stopped"}; }
This worked well, but I wanted some kind of feedback so I knew when the pump was on. To get that, I added another smart outlet, into which I plugged a guide light. I then added a function component that monitored the state of the pump switch with a state-changed component, such that when the pump outlet turned on or off, the function would turn on or off the outlet with the guide light. The function also set the node status within Home Assistant so I could see on the Node-RED schematic when it was on or off.

Here is the Outlet State code:
// On Start flow.set("counter", 0) // On Message state = msg.payload; if (state == "on") { node.status({fill:"green",shape:"dot",text:"on"}); } else if (state == "off") { node.status({fill:"red",shape:"dot",text:"off"}); } return msg
After getting this all set up, I spent some time testing with different pump-on times and tweaked the values to be just long enough to get the initial hot water to the bathroom sinks. I'm pretty happy with how it is working now.

Sunday, November 21, 2021

From Counting to Complex by Inverse and Closure

Walking the path from counting numbers to complex numbers.

Preface

Many years ago I read that Richard Feynman gave a talk to a room full of scientists in which he rederived basic abstract algebra on real numbers in under an hour. I later found that Feynman gives this derivation in a discussion of Algebra in his Lectures on Physics.

I'm not going to compete with Feynman, but doing this derivation seemed like a fun challenge to undertake. Below I present my explanation of how one gets to complex numbers based on a few simple concepts: repetition, inverse and closure. Along the way I try to throw in a few comments about abstract algebra. By the end, we will look at Euler's Identity, e^(iπ) + 1 = 0, and maybe make it a little less mystical than it might appear.

It is not necessary for you to understand all of the references to math terms, so you don't need to follow those links unless you want to learn about that concept. Similarly, it is not necessary for you to follow and understand in detail every proof. Hopefully you can simply ignore any parts you don't immediately understand and yet still get something out of the overall presentation.

I walked this path mostly for my own entertainment, but I thought perhaps others might get something out of it. It is quite long and likely contains some errors, so caveat lector.

Introduction

Imagine that none of this stuff exists, so we are making it all up as we go. We are going to define our numbering system from the ground up, gradually building up a structure of definitions and operations that all manage to work together nicely. It's not just by random chance that things work nicely: we are defining our numbers and operations precisely to make them work together nicely.

In the code blocks below, I label each assumption (or definition) with a name such as A1 enclosed in square brackets, like this: [A1]. Lemmas (things which can be proved from the assumptions and are used in later proofs) are labeled similarly but with L rather than A. Other intermediate steps in a proof which are not referenced outside of that proof are labeled similarly but with I. These names may be referenced later to build up additional lemmas. The references look the same, but appear in the text or in comments after an equation rather than before.

Concepts

There are three basic ways we will be extending our system:
  • Repetition: performing the same operation many times. For example, multiplication is repeated addition.
  • Inverse: an operation that has the opposite effect of some other operation. For example, subtraction is the inverse of addition.
  • Closure: the results of an operation are in the same set as the operands. For example, the natural numbers (or positive integers) are closed under addition, because you can add any two natural numbers and get another natural number; but they are not closed under subtraction, because there are some expressions on natural numbers using subtraction whose results are not natural numbers, such as (3 - 5).

Preview

Here is the quick preview of how we will move from counting to complex:
  • start with zero and the successor function
  • repeated successors yields counting and the natural numbers
  • repeated counting yields addition
  • inverse of addition yields subtraction
  • closure on subtraction yields negative numbers
  • repeated addition yields multiplication
  • inverse of multiplication yields division
  • closure on division yields rational numbers
  • repeated multiplication yields exponentiation
  • inverse of exponentiation yields logarithms
  • closure on exponentiation with positive rational numbers yields real numbers
  • closure on exponentiation with negative rational numbers yields complex numbers
  • all of our operations on complex numbers are already closed, so we are done
If you enjoy playing with math you might want to try doing all of these derivations yourself before reading my derivations.

Counting

At the most basic level, we start with some simple assumptions, which happen to be a subset of the Peano axioms.

We define a starting point for counting. Historically, people typically started with one, but for later simplicity in this exercise we start with zero. We define a successor function s(x) that takes a number x and produces the next number, which by definition is distinct from x.
[A1] zero exists
[A2] given x, s(x) generates another number, where s(x) is not the same as x
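As a very loose illustration (my own, not part of the formal development), these two assumptions can be modeled in a few lines of JavaScript, where a number is just a chain of successor wrappers around zero:

// [A1] zero exists; [A2] s(x) generates a new, distinct number
const zero = null;
const s = (x) => ({ succOf: x });

const three = s(s(s(zero))); // the number we will name 3 below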

Equals

We define an equals operator (=) so that the statement a=a is true, and the statement a=b means that, for any true statement containing a, we can replace any or all instances of a by b and the resulting statement will also be true. We further assume that if a=b is false, then the same replacements as described above will generally (but not always) yield a false statement.
[A3] a=a is true for all a
[A4] a=b is a replacement rule (described above)
The equals operator is:
  • Reflexive: a=a (by definition)
  • Symmetric: if a=b then b=a. Starting with the true statement a=a and the predicate a=b, by our definition of equals we can replace any instance of a by b in a=a and still have a true statement; we chose to replace the first a by b, yielding b=a.
  • Transitive: if a=b and b=c, then a=c. Taking the assumed true statement a=b, and applying our equals rule using the second statement b=c, we replace b by c in the first statement, yielding a=c.
[L5.1] if a=b then b=a            (demonstrated above)
[L5.2] if a=b and b=c then a=c    (demonstrated above)
For convenience, we define the not-equals operator != to be false whenever equals on the same values is true, and vice versa.

The above definition also leads almost directly to one of the common ways of solving algebraic equations: performing the same operation to both sides of an equation, such as adding the same number to both sides of an equation, or multiplying both sides by the same number. Here's an example of adding the same amount to both sides of an equation.
a = b            Assume this is our starting equation we are working with
a + c = a + c    True by definition [A3]
a + c = b + c    From [A4]
Note that this works for any function:
[I6.1] a = b          Assume this is our starting equation we are working with
[I6.2] f(a) = f(a)    True by definition [A3]
[I6.3] f(a) = f(b)    From [A4] using [I6.2] as a starting equation and [I6.1] as our replacement rule
[L6.4] if a = b then f(a) = f(b) for any f defined for a
f(x) might be 2*x, x+3, sin(x), or anything else we desire. Thus we can start with any true equation, perform the same valid operation on both sides, and still have a true equation.

Natural Numbers

Given our previously defined starting point of zero, we now define the natural numbers:
[A7.0] 0=zero
[A7.1] 1=s(0)
[A7.2] 2=s(1)
[A7.3] 3=s(2)
etc. to infinity.
By definition, s(x)!=x, so 1!=0, 2!=1, etc. Note that we did not assume that repeated application of s(x) would not eventually give us the same number. Without that assumption it is possible that, for example, s(s(s(x)))=x, or in other words, 3=0. This yields a "modulo" system, which can be useful. But for this particular exposition, I want to use the "normal" numbers, so we will add the assumption that s(x) is never equal to any previous value in the sequence. More precisely, we assume:
[A8] For any x, repeated application of the successor function any number of times will never generate x.
We have now defined an unending stream of distinct numbers, each of which is a successor to one other number.

Greater Than

We next define the relational operators less than (<) and greater than (>) with the following statements:
[A9] s(a) > a
[A10] if (a > b) and (b > c) then (a > c)
[A11] (b < a) always has the same truth value as (a > b)
We are now at the point where we can count and know (by definition) that each time we count we get a number that is greater than all of the previous numbers. We can start with any number and count up from there by repeated application of the successor function. For example, if we start with 4 (which is s(s(s(s(zero))))) we can count up by three by applying the successor function three times to get s(s(s(4))), which we can calculate is 7. This gets unwieldy pretty fast. To make this simpler, let's define an "addition" operator + that gives us the same results as repeated counting.

Addition

We define the addition operator (+) as follows:
[A21] a + 0 = a
[A22] a + s(b) = s(a + b)
Some quick examples:
[L23.1] a + 1 = a + s(0) = s(a + 0) = s(a)
[L23.2] a + 2 = a + s(1) = s(a + 1) = s(s(a))
Since s(a) = a+1, we also have
[L23.3] a + s(b) = a + (b+1)
[L23.4] s(a + b) = (a + b) + 1
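As an aside, the recursive definition [A21]/[A22] translates almost line-for-line into code. Continuing the toy JavaScript model from the Counting section (again purely my own illustration):

// [A21] a + 0 = a
// [A22] a + s(b) = s(a + b)
const add = (a, b) => (b === zero) ? a : s(add(a, b.succOf));

const two = s(s(zero));
add(two, two); // s(s(s(s(zero)))), i.e. 4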
For some of what we want to do below, we are going to need to use the rule of induction:
[A24] If an equation is true for a known value of n, and it can be demonstrated to be true for n+1 for any n when true for n, then it is true for all natural numbers x where x > n.

Associative

We now show that our addition operator is associative. We want to prove that (a+b)+n = a+(b+n) for all n. We start by showing this is true for n=1, then use induction:
[L25.1] a + (b + 1) = (a + b) + 1               From [A22], [L23.3] and [L23.4]
[I25.2] a + (b + n) = (a + b) + n               Inductive assumption, true for n=1
        a + (b + (n + 1)) = a + ((b + n) + 1)   From [L25.1] on (b+(n+1))
                          = (a + (b + n)) + 1   From [L25.1] with (b+n) for b
                          = ((a + b) + n) + 1   From [I25.2] applied to (a+(b+n))
                          = (a + b) + (n + 1)   From [L25.1] in reverse with (a+b) for a and n for b
[L26]   a + (b + c) = (a + b) + c               Above lines summarized, with c for n+1
Thus by induction we have our proof of associativity.

Commutative

We use a similar approach to show that addition is commutative, such that a+b=b+a. We start by showing that 0 commutes with a for any a.
[I27.1] 0 + 0 = 0                    From [A21] with 0 for a
        0 + 1 = 0 + s(0)             From [L23.1]
              = s(0 + 0)             From [A22] with 0 for a and b
              = s(0)                 From [I27.1]
              = 1
[L27.2] 0 + 1 = 1                    Summary of the above few lines
[I27.3] 0 + n = n                    Inductive assumption, true for n=1 from [L27.2]
[I27.4] 0 + (n + 1) = (0 + n) + 1    From [L26]
[I27.5] 0 + (n + 1) = n + 1          By induction from [I27.3] and [I27.4]
[L27.6] 0 + a = a                    From [I27.5] with a for n+1
[I27.7] 0 + a = a = a + 0            From [L27.6] and [A21]
[L27.8] 0 + a = a + 0                From [L5.2]
Now we show that 1 commutes with any number by induction.
1 + (n + 1) = 1 + s(n)      From [L23.1] on (n+1) with n for a
            = s(1 + n)      From [A22] with 1 for a and n for b
            = s(n + 1)      From inductive assumption that 1 commutes with n, known true for n=0
            = n + s(1)      From [A22] with n for a and 1 for b
            = n + (1 + 1)   From [L23.1] on s(1) with 1 for a
            = (n + 1) + 1   From [L25.1]
[L28] 1 + a = a + 1         Summary of the above with a for n+1
Finally, we use induction again to show that any two numbers commute.
a + (n + 1) = (a + n) + 1   From [L25.1]
            = (n + a) + 1   From inductive assumption that a commutes with n, known true for n=1 [L28]
            = n + (a + 1)   From [L25.1]
            = n + (1 + a)   From [L28]
            = (n + 1) + a   From [L25.1]
[L29] a + b = b + a         Summary of the above with b for n+1
As a final note for addition, since we have demonstrated that (a+b)+c=a+(b+c), we can omit the parentheses when adding multiple terms without creating any ambiguity.
[A30] a + b + c = (a + b) + c = a + (b + c)
Repeated application of this rule can be used for addition with four or more terms without parentheses. By combining this rule with [L29] commutative law, we can see that we can take an expression with multiple terms added together, such as a + b + c + d + e and rearrange and group the terms any way we want.

The associative rule also makes it easy to calculate our addition facts. We already know that 1=0+1, 2=1+1, 3=2+1 etc. from our definitions [A7] with [L23.1]. That lets us fill in the first row of our addition fact table. We can then calculate all of the n+2 values based on the n+1 values, and repeat ad infinitum for the rest of the numbers.
n + 2 = n + (1 + 1) = (n + 1) + 1
n + 3 = n + (2 + 1) = (n + 2) + 1
n + 4 = n + (3 + 1) = (n + 3) + 1
Wikipedia has proofs of associativity and commutativity of addition, which are similar to mine but actually a little more concise, and here is a proof of commutativity that does not rely on associativity - but I wanted to think through these derivations myself and present them here in-line with the rest of my exposition.

Identity

At this point we know that a+0=a [A21] and 0+a=a [L27.6], or in other words adding zero to any number (on either side, since we showed addition is commutative) yields that number. This is an interesting enough fact that we will give this number a special name: the Identity for addition.

It's easy to show that there is only one identity for addition.
Assume two identity values e and f. Consider the expression e+f.
Because e is an identity, e+f=f.
Because f is an identity, e+f=e.
[L31] Therefore e=f.
Since this is true for any two identities, all are in fact the same one identity.

Algebra

We have built up our concepts in layers, like building a house: we set a foundation with zero and the successor function, put in some rim joists with the natural numbers, and laid on some flooring with the addition operator and its identity element. We have created a little structure from our concepts. Whereas a house is a physical structure, this is an algebraic structure.

It turns out that this algebraic structure is useful enough that mathematicians have given this kind of structure a name: a monoid. A monoid has these characteristics (with our case in parentheses):
  • It has a set of elements (the natural numbers).
  • It has a binary operation on those elements (the + operator).
  • The operation is associative (+ is associative).
  • The operation is closed (adding two natural numbers always produces another natural number).
  • It has an identity element (zero).
There are a few rules from the above section that we will use often enough that we want to reference them by name rather than lemma number. We use the first letter of the name of the characteristic, followed by the operator character.
[a+] a + (b + c) = (a + b) + c    [L26]             Associativity of addition
[c+] a + b = b + a                [L29]             Commutativity of addition
[i+] a + 0 = 0 + a = a            [A21], [L27.6]    Identity for addition

Subtraction

At this point we have the ability to perform addition, which allows us to calculate a value for x in such equations as x = a + b. But we don't yet have the ability to solve for x in the equation a + x = b. We want to add an operation that is the opposite of addition. In other words, if we start with a and add b to it, we want to be able to take the result and perform another operation using b in order to get back to a. An operator that has this characteristic is called an inverse. We are going to define an operation that is the inverse of addition. We will call that operation subtraction, and we will use the dash character (-) as the operator.

Before we defined addition, we already had the successor function [A2] and we defined the numbers [A7] in terms of the successor function. We defined addition with two axioms [A21] and [A22], then showed that adding 1 to any number is the same [L23] as applying the successor function. Including the successor function and the definitions of the numbers in terms of the successor function, we really had four pieces going into the definition of addition.

We could follow the same path and define a predecessor function that is the inverse of the successor function, but instead we will skip that step and work in terms of adding and subtracting 1 instead of successor and predecessor functions.

We define our subtraction operator (-) recursively, similarly to how we defined the addition operator, using an additional axiom [A41.1] in place of defining a predecessor function p(x):
[A41] a - 0 = a
[A41.1] (a + 1) - 1 = a
[A42] a - (b + 1) = (a - b) - 1
So let's see how this works:
3 - 0 = 3                  From [A41]
3 - 1 = (2 + 1) - 1 = 2    From [A41.1], and since 3 is the successor to 2 (i.e. 3=2+1)
3 - 2 = 3 - (1 + 1) = (3 - 1) - 1 = 2 - 1 = (1 + 1) - 1 = 1
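In the toy JavaScript model, the same rules become (my own illustration again):

// [A41]   a - 0 = a
// [A41.1] (a + 1) - 1 = a: peel one successor off
const sub1 = (a) => a.succOf;
// [A42]   a - (b + 1) = (a - b) - 1
const sub = (a, b) => (b === zero) ? a : sub1(sub(a, b.succOf));

sub(s(s(s(zero))), s(zero)); // 3 - 1 = s(s(zero)) = 2
// sub(two, s(two)) fails: sub1 eventually hits zero (null), which is
// exactly the closure problem discussed under Negative Numbers below.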

Associative

We want to prove the associative laws for subtraction so we know how we can transform various combinations of parentheses and operators. We already know about a + (b + c), so there are three other possible combinations of + and - with the parentheses in the same position:
  • a - (b + c)
  • a + (b - c)
  • a - (b - c)
We start with a - (b + c).
[L43.1] a - (b + n) = (a - b) - n               Inductive assumption, true for n=1 from [A42]
        a - (b + (n + 1)) = a - ((b + n) + 1)   From [a+]
                          = (a - (b + n)) - 1   From [A42]
                          = ((a - b) - n) - 1   From [L43.1] on (a-(b+n))
                          = (a - b) - (n + 1)   From [A42] with (a-b) for a and n for b
[L43.2] a - (b + c) = (a - b) - c               Above lines summarized, with c for n+1
Next we do a + (b - c), which we do by induction after first doing a + (b - 1).
(a + (n + 1)) - 1 = ((a + n) + 1) - 1   From [a+]
                  = a + n               From [A41.1] with a+n for a
                  = a + ((n + 1) - 1)   From [A41.1] with n for a
[L44] (a + b) - 1 = a + (b - 1)         Above lines summarized, with b for n+1
[L45.1] a + b = a + (b - 0)           From [A41] with b for a
[L45.2] a + b = (a + b) - 0           From [A41] with (a+b) for a
[L45.3] a + (b - 0) = (a + b) - 0     From [L45.1] and [L45.2] by [A4]
[L45.4] a + (b - n) = (a + b) - n     Inductive assumption, true for n=0 by [L45.3]
        a + (b - (n + 1)) = a + (b - (1 + n))   From [c+] with n for a and 1 for b
                          = a + ((b - 1) - n)   From [L43.2] on b-(1+n)
                          = (a + (b - 1)) - n   From [L45.4] with b-1 for b
                          = ((a + b) - 1) - n   From [L44]
                          = (a + b) - (1 + n)   From [L43.2] with a+b for a, 1 for b, n for c
                          = (a + b) - (n + 1)   From [c+] with n for a and 1 for b
[L45.5] a + (b - c) = (a + b) - c     Above lines summarized, with c for n+1
Finally we tackle a - (b - c), which we build up to through quite a few lemmas.
[L46.1] 0 - 0 = 0              [A41] with 0 for a
[L46.2] (0 + 1) - 1 = 0        [A41.1] with 0 for a
[L46.3] 1 - 1 = 0              From [L27.2] on 0+1
[L46.4] n - n = 0              Inductive assumption, true for n=1 from [L46.3]
        (n + 1) - (n + 1) = (n + 1) - (1 + n)   From [c+]
                          = ((n + 1) - 1) - n   From [L43.2] with n+1 for a, 1 for b, n for c
                          = n - n               From [A41.1] on (n+1)-1 with n for a
                          = 0                   From [L46.4]
[L46.5] a - a = 0              Above lines summarized, with a for n+1
a - b = a - (b + 0)         From [A21] with b for a
      = a - (b + (n - n))   From [L46.5] with n for a
      = a - ((b + n) - n)   From [L45.5] with b for a, n for b and c
      = a - ((n + b) - n)   From [c+]
      = a - (n + (b - n))   From [L45.5]
      = (a - n) - (b - n)   From [L43.2]
[L47] a - b = (a - n) - (b - n)
Substituting a = (c + n), b = (d + n) in [L47] yields
[L48.1] (c + n) - (d + n) = ((c + n) - n) - ((d + n) - n) = c - d
[L48.2] c - d = (c + n) - (d + n)   [L48.1] last and first parts
(a - n) + n = n + (a - n)   From [c+]
            = (n + a) - n   From [L45.5]
            = (a + n) - n   From [c+]
            = a + (n - n)   From [L45.5]
            = a + 0         From [L46.5]
            = a             From [i+]
[L49] (a - n) + n = a       Above lines summarized
a - (b - c) = (a + c) - ((b - c) + c)   From [L48.2] with a for c, b-c for d, c for n
            = (a + c) - b               From [L49] with c for n
            = (c + a) - b               From [c+] on a+c
            = c + (a - b)               From [L45.5]
            = (a - b) + c               From [c+]
[L50] a - (b - c) = (a - b) + c         Above lines summarized
We now have all of our rules of association for addition and subtraction. The following four equations, repeated from above, show all eight possible combinations of + and - operators and grouping of three variables.
[L26]   a + (b + c) = (a + b) + c
[L43.2] a - (b + c) = (a - b) - c
[L45.5] a + (b - c) = (a + b) - c
[L50]   a - (b - c) = (a - b) + c
Earlier we saw that, because of [L26], we can write a + b + c and know that it is unambiguous. But that is not true if we write a - b - c, because the statement (a - b) - c = a - (b - c) is not in general true. In order to be able to write fewer parentheses, we arbitrarily choose to have a - b - c mean the same thing as (a - b) - c.
[A51] a - b - c = (a - b) - c
We have specified that the middle variable (b in our equation), following the - operator, should be grouped with the variable on its left, so we call the - operator left-associative; but we generally say it is not associative, meaning it does not associate both ways as does addition.

Unlike addition, subtraction is not commutative, and it has no identity. More precisely, we could say that zero is a right identity for subtraction, but since it is not also a left identity, it is not a simple identity and we usually don't mention it.

Negative Numbers

You may already have noticed that adding the subtraction operator to our structure has created a bit of a problem: we are now able to write expressions which we can not evaluate within our structure. For example, the expression 2 - 4 can not be reduced to a single natural number. When we reduce this equation according to our rules, we eventually get to the point where we need to solve for 0 - 1, and we have no rule to reduce that any further. In other words, our system is no longer a closed system: to state the problem more precisely, the natural numbers are not closed under subtraction.
A pet peeve of mine: elementary school math teachers who tell their students "You cannot subtract 5 from 3." This statement is misleading in its imprecision, since the subtraction can be done with the use of negative numbers. Math is a precise field. The correct statement should include that qualification: "You cannot subtract 5 from 3 using the counting numbers we are studying."

Likewise for other incorrect statements such as "You can not divide 3 by 2" and "You can not take the square root of -4."
We would like to be able to solve any equation we can write with our subtraction operator, so we will define new numbers that we can use for that purpose. We call these numbers negative numbers. We choose to write them using the same digits as we write our natural numbers, with a leading - character, such as -1 and -2.

In our house-building analogy, so far we have built a little house from the foundation upwards, and now we realize we need some more support in order to finish subtraction. Adding negative numbers is like adding another room to that house: in order to have a solid structure, we need to extend our foundation. To save on design work, we are going to reuse the same basic plan as we used when we built up the natural numbers. This is like using the same blueprint for the second room of our house as for the first, except in mirror image because we find symmetry pleasing. Here is a little diagram:
[Diagram: a row of little house cross-sections illustrating the stages: 1. Natural Numbers; 2. Addition on Naturals; 3. Subtraction - Oops!; 4. Negative Numbers; 5. Addition on Negatives; and the completion of Subtraction as the mirror-image room.]
Thus we go back to the beginning of our derivation of natural numbers. To distinguish our original numbers from our newly defined negative numbers, we will call all of the numbers generated by our successor function (that would be all numbers 1 and above) the positive numbers. We will call the collection of all of these numbers (positive, negative and zero) the integers. We will call the characteristic of being "positive" and "negative" the sign of the number.

Since we want our rules to apply to all integers, we start by stating that in any of our previous assumptions and derivations, a variable name can refer to any integer unless the specific proof or assumption states otherwise (such as for induction proofs).

We started by defining a successor operator s(x) [A2], and we now define a corresponding predecessor operator p(x) that generates our negative numbers in a way which is symmetric to s(x):
[A61] given x, p(x) generates another number, where p(x) is not the same as x
We define the predecessor function as the inverse of the successor function and vice versa. In other words:
[A62.1] p(s(a)) = a
[A62.2] s(p(a)) = a
We define our negative numbers in the same way as we defined our natural (positive) numbers [A7]:
[A63.1] -1 = p(0)
[A63.2] -2 = p(-1)
[A63.3] -3 = p(-2)
etc. to negative infinity.
We take our no-duplicates assumption [A8] on the successor function and state it for the predecessor function:
[A64] For any x, repeated application of the predecessor function any number of times will never generate x.
For the relational operators, we can derive their meaning relative to the predecessor operator:
s(a) > a          [A9]
p(s(a)) > p(a)    Apply p(x) to both sides [L6.6]
a > p(a)          From [A62.1]
[L65] p(a) < a    From [A11]

Addition

We add to our definition of Addition ([A21] and [A22]) to handle negative numbers, and we extend our induction assumption [A24] to negative numbers:
[A71] a + p(b) = p(a + b)
[A72] If an equation is true for a known value of n, and it can be demonstrated to be true for n+(-1) for any n when true for n, then it is true for all integers x where x < n.
For each of our original assumptions through addition, we have now added similar assumptions to handle our negative numbers. All of our assumptions are completely symmetrical: take any of the original assumptions, replace successor by predecessor, replace 1 by -1, and exchange < with >, and you will get the equivalent assumption for our negative numbers. Because all of our other proofs in those sections are based on those assumptions, the symmetric proofs for negative numbers follow from the symmetric assumptions in exactly the same way as for the natural numbers. Thus all of the results and conclusions in those sections are valid for addition of negative numbers: commutative, associative, identity, algebra.

We list the results of one lemma here, leaving the details of the derivation as an exercise to the reader:
[L73] a + -1 = p(a)
We derive a couple of other useful results:
p(s(a)) = a         [A62.1]
p(a + 1) = a        [L23.1]
(a + 1) + -1 = a    [L73]
a + (1 + -1) = a    [a+]
(1 + -1) = 0
[L74] -1 + 1 = 0    [c+]
(1 + -1) = 0                [L74]
n + -n = 0                  Inductive assumption, true for n=1 [L74]
(n + -n) + (1 + -1) = 0     From [i+] because (1 + -1) = 0
(n + 1) + (-n + -1) = 0
(n + 1) + (-(n+1)) = 0      From p(x) defn
[L75] a + -a = 0            Above lines summarized, with a for n+1
The above statement says that, for any element a in our set of natural numbers, there is an element -a (a negative number, negative a) which can be added to that natural number to produce zero (our identity element). We call negative a the inverse element of a, and likewise a is the inverse element of -a.
-a + a = 0              [L75]
(-a + a) - a = 0 - a    Subtract a from each side
-a + (a - a) = 0 - a    [L45.5]
[L76] -a = 0 - a        [L46.5] and [i+]
a + -a = 0                [L75]
(a + -a) - -a = 0 - -a    Subtract -a from each side
a + (-a - -a) = 0 - -a    [L45.5]
a = 0 - -a                [L46.5]
[L76.1] a = -(-a)         [L76]
a + -b = a + (0 - b)    [L76]
       = (a + 0) - b    [L45.5]
       = a - b          [i+]
[L77] a + -b = a - b

Subtraction

As with addition, we note that we can create a set of symmetric assumptions using negative numbers in place of positive numbers, so that all of our results and conclusions of subtraction on positive numbers also work on negative numbers.

For improved symmetry with the definition of addition, we restate our assumptions defining subtraction to use the successor and predecessor functions, and we add a symmetric assumption that covers negative numbers. We no longer need (a+1)-1=a [A41.1] as an assumption for subtraction, because it is equivalent to p(s(a))=a [A62.1]. Since these assumptions are just a rewriting of our original assumptions for subtraction, all of our derivations remain the same.
[A41] a - 0 = a                Repeat of original [A41]
[A81] a - s(b) = p(a - b)      [A42] restated in terms of s and p
[A82] a - p(b) = s(a - b)      Symmetric assumption to [A81]

Algebra

With the addition of negative numbers to our structure, our set is closed with respect to subtraction. We now have a set (the integers) with an associative binary operator (+) with an identity (0) and inverse elements (the negative numbers). This algebraic structure is called a group. Because our operator (addition) is commutative, our algebraic structure is an abelian group. The group, however, ignores the subtraction operator.

Multiplication

Once we start using addition for real tasks, we find that we are often adding the same number many times, such as 3+3+3+3. Because this is so common, we would like to define a shortcut - a new operator - that means the same thing. We call this operation multiplication.

There are various conventions for how the multiplication operator is written: x, * and dot are common, and in some cases a convention is adopted that two variables written next to each other with no operator between them are to be multiplied. Most computer programming languages use the asterisk character (*), and I will use that here.

In order to have as much symmetry as we can, and to minimize our design work, we will define multiplication using a similar approach as we did when we defined addition:
[A101] a * 0 = 0
[A102] a * (b + 1) = (a * b) + a
[A103] a * (b - 1) = (a * b) - a
We could equivalently have used a slightly different formulation for [A103] in which we add -1 rather than subtracting 1, as supported by [L77]:
         a * (-1) = a * (0 - 1)     [L76]
                  = (a * 0) - a     [A103]
                  = 0 - a           [A101]
                  = -a              [L76]
[L104.1] a * -1 = -a                Above lines summarized
         a * (b + -1) = a * (b - 1)           [L77]
                      = (a * b) - a           [A103]
                      = (a * b) + -a          [L77]
                      = (a * b) + (a * -1)    [L104.1]
[L104.2] a * (b + -1) = (a * b) + (a * -1)    Above lines summarized
If the second operand is negative, we can factor that out and we see that it changes the sign of the result.
         a * -n = -(a * n)             Inductive assumption, true for n=1
         a * -(n + 1) = a * (-n - 1)
                      = (a * -n) - a
                      = -(a * n) - a
                      = 0 - (a * n) - a
                      = 0 - ((a * n) + a)
                      = 0 - (a * (n + 1))
                      = -(a * (n + 1))
[L104.3] a * -b = -(a * b)             Above summarized, with b for n+1
[L104.4] -a * b = -(a * b)             Swap a with b and use [c*]
         -a * -b = -(-a * b)       [L104.3]
                 = -(-(a * b))     [L104.3] again
                 = a * b           [L76.1]
[L104.5] -a * -b = a * b           Above lines summarized

Identity and Zero

By setting b=0 in [A102], we see that 1 is a right-identity for multiplication:
       a * (0 + 1) = (a * 0) + a     From [A102] with 0 for b
       a * 1 = 0 + a                 From [i+] on LHS, [A101] on RHS
[L105] a * 1 = a
We show by induction that zero multiplied on either side gives zero:
[L106.1] 0 * 0 = 0                     From [A101] with 0 for a
[L106.2] 0 * n = 0                     Inductive assumption, true for n=0
[L106.3] 0 * (n + 1) = (0 * n) + 0     From [A102] with 0 for a, n for b
[L106.4] 0 * (n + 1) = 0 + 0           From [L106.2]
[L106.5] 0 * (n + 1) = 0
[L106.6] 0 * a = 0                     Above summarized with a for n+1
By doing the same proof using [A103] we can conclude that [L106.6] holds for all integers.

We show that 1 is a left identity:
         1 * 1 = 1                     From [L105] with a=1
         1 * n = n                     Inductive assumption, true for n=1
         1 * (n + 1) = (1 * n) + 1     From [A102] with a=1 and b=n
                     = n + 1           From the inductive assumption
[L106.8] 1 * a = a                     Above summarized, with a for n+1
Since 1 is both a left identity and a right identity, we can drop the handedness and just refer to it as an identity.

With addition we had one special number, 0, which when added to any number yielded that number. With multiplication we see that we have two special numbers: the number 1 is an identity for multiplication, but 0 is also special, since anything multiplied by 0 yields 0. We choose to use the word "zero", when associated with a specific operation such as multiplication, to mean a value that, when given as an operand to that operator, always yields zero. Our multiplication operator has only one zero, but other systems and operators may have more than one zero.

By the same argument [L31] as for the additive identity, we can see that there is only one multiplicative identity and only one multiplicative zero.

Distributive

We show that multiplication is distributive over addition by induction:
[L107.1] a * (b + 0) = a * b = (a * b) + 0 = (a * b) + (a * 0)
         a * (b + 1) = (a * b) + a                     [A102]
         a * (b + 1) = (a * b) + (a * 1)               From [L105] on rightmost a
[L107.2] a * (b + n) = (a * b) + (a * n)               Inductive assumption, true for n=1
         a * (b + (n + 1)) = a * ((b + n) + 1)         From [a+]
                           = (a * (b + n)) + a         From [A102]
                           = ((a * b) + (a * n)) + a   From [L107.2]
                           = (a * b) + ((a * n) + a)   From [a+]
                           = (a * b) + (a * (n + 1))   From [A102]
[L107.3] a * (b + c) = (a * b) + (a * c)               Above summarized, with c for n+1
The above proof can be repeated using -1 instead of 1 (by [L104.2]), so [L107.3] covers all integers.

Using the same proof steps using [A103] rather than [A102] demonstrates that multiplication distributes over subtraction as well. Since by [L77] subtraction is the equivalent of adding the negative of a number, this is consistent.
[L107.4] a * (b - c) = (a * b) - (a * c)
For use in the commutativity proof below, we first show that 2 * a = a + a, and then that (a + 1) * b = (a * b) + b:
      2 * 1 = 2 = 1 + 1
      2 * n = n + n                    Inductive assumption, true for n=1
      2 * (n + 1) = 2 * n + 2
                  = (n + n) + (1 + 1)
                  = (n + 1) + (n + 1)
      2 * a = a + a                    Above summarized, with a for n+1

      1 * b = b                        [L106.8]
      (0 + 1) * b = (0 * b) + b        Base case: both sides equal b, by [L106.6] and [L106.8]
      (n + 1) * b = (n * b) + b        Inductive assumption, true for n=0
      (n + 2) * b = (n * b) + b + b    Extend using 2 * b = b + b
      ((n + 1) + 1) * b = (n + 2) * b = ((n * b) + b) + b = ((n + 1) * b) + b
      (a + 1) * b = (a * b) + b        Above summarized, with a for n+1

Associative

We show multiplication is associative by induction:
[L108.1] (a * b) * 0 = 0 = a * 0 = a * (b * 0)
[L108.2] (a * b) * 1 = a * b = a * (b * 1)              From [L105] on each side
[L108.3] (a * b) * n = a * (b * n)                      Inductive assumption, true for n=1 by [L108.2]
         (a * b) * (n + 1) = ((a * b) * n) + (a * b)    From [A102] with a*b for a, n for b
                           = (a * (b * n)) + (a * b)    From [L108.3]
                           = a * ((b * n) + b)          From [L107.3] with b*n for b, b for c
                           = a * (b * (n + 1))          From [A102] with b for a, n for b
[L108.4] (a * b) * c = a * (b * c)                      Above lines summarized, with c for n+1
As with the distributive law, we can replace 1 by -1 to show that our conclusion covers negative numbers as well.

Commutative

We show multiplication is commutative by induction on both operands:
       m * n = n * m                                  Inductive assumption, true for m=0 or 1 and n=0 or 1
       (m + 1) * (n + 1) = ((m + 1) * n) + (m + 1)    From [A102]
                         = (m * n) + n + (m + 1)      From [(a+1)*b = (a*b)+b]
                         = (n * m) + m + (n + 1)      From the inductive assumption, [a+], and [c+]
                         = ((n + 1) * m) + (n + 1)    From [(a+1)*b = (a*b)+b]
                         = (n + 1) * (m + 1)          From [A102]
[L109] a * b = b * a
As with addition, the fact that multiplication is associative [L108.4] means that, if we have an expression that is a string of values multiplied together, we can drop the parentheses from the expression without creating any ambiguity; and the fact that it is commutative means that we can rearrange all of those multiplied values to any order we want.

Algebra

We have added a second operator to our repertoire that, like addition, is an associative binary operator with an identity. With two such operators, where one distributes over the other, we have a ring (for a more precise definition, follow the link). In the same way that group ignores subtraction, the ring ignores the division operator. As with addition, there are a few rules from the above section that we will use often enough that we want to reference them by name rather than lemma number.
[a*] a * (b * c) = (a * b) * c            [L108.4] Associativity of multiplication
[c*] a * b = b * a                        [L109] Commutativity of multiplication
[z*] a * 0 = 0 * a = 0                    [L106.6] Zero for multiplication
[i*] a * 1 = 1 * a = a                    [L106.8] Identity for multiplication
[d*] a * (b + c) = (a * b) + (a * c)      [L107.3] Distributivity of multiplication over addition

Division

As when we defined subtraction to be the inverse operation of addition, we want an inverse operation to multiplication so that we can solve for x in equations such as a * x = b.

We call our inverse operation division. As with multiplication, there are a number of common ways this operation is expressed. For use in this presentation, we choose to use the slash character (/) to represent the division operation. We want division and multiplication each to be the inverse of the other, as is the case with addition and subtraction, so we have two candidate definitions:
[A120.1] (a * b) / b = a     for all a and b except b=0
[A120.2] (a / b) * b = a     for all a and b except b=0
Our definitions exclude zero because we already have a rule that says anything times zero is zero, so we know a priori that we can't make these new rules work for all a when b is zero.

The fact that we can't divide by zero is the first time we have encountered a special case in our structure, where we have to add a qualification to one of our rules stating that you can't do something rather than extending our structure to make it possible to do that. When, in building our structure of numbers, we realized that we could not answer the question "what is 3 - 5?", we expanded the structure to allow us to answer that question ("negative 2"). In this case, we can't answer the question "what is 5 / 0?", but, for the first time, instead of trying to expand our structure to be able to answer that question, we make the statement "you can't do that". As we will see later, the further we go in defining our structure, the more such exceptions and caveats we need to make.

We check that the two assumptions above are compatible by starting with one and converting it into the other.
      (a * b) / b = a              [A120.1]
      ((a * b) / b) * b = a * b    Right-multiply both sides by b
      (c / b) * b = c              Previous line with c for a*b; this is [A120.2]
We can quickly get some useful lemmas by plugging in a few different values for a and b:
[L121] a / 1 = a          From [A120.1 or 2] with b=1, after a*1=a
[L122] b / b = 1          From [A120.1] with a=1, after 1*b=b
[L123] (1/b)*b = 1        From [A120.2] with a=1
[L124] 0 / b = 0          From [A120.1] with a=0, after 0*b=0
[L124.2] a / a = 1        From [A120.1] with a=1 and b=a
If we are looking at the equation
[I125] a = c / b
what does that mean? If we assume
[A126] c = a * b
then [I125] becomes
[I127] a = (a * b) / b
which is [A120.1]. This is true by definition, so our assumption [A126] is a valid assumption to use in solving [I125]. What we are saying here is that the solution (a) to [I125] is the value that, when multiplied by b, gives c.
[L128] If a = c / b, then c = a * b, and vice-versa (from [I125] and [A126])

Associative

As we did with subtraction, we want to prove the associative laws for division so we know how we can transform various combinations of parentheses and the multiplication and division operations. We already know about a * (b * c), so there are three other possible combinations of * and / with the parentheses in the same position:
  • a / (b * c)
  • a * (b / c)
  • a / (b / c)
[I129.1] a / (b * c) = d             Given
         a = d * (b * c)             From [L128]
         a = (d * c) * b             From [a*] and [c*]
         a / b = d * c               From [L128]
[I129.2] (a / b) / c = d             From [L128]
[L129.3] a / (b * c) = (a / b) / c   From [I129.1] and [I129.2]
[I130.1] a * (b / c) = d             Given
         a * (b / c) * c = d * c     Multiply both sides by c
         a * b = d * c               Reduce (b / c) * c = b by [A120.2]
[I130.2] (a * b) / c = d             From [L128]
[L130.3] a * (b / c) = (a * b) / c   From [I130.1] and [I130.2]
[I131.1] a / (b / c) = d             Given
         a = d * (b / c)             From [L128]
           = (d * b) / c             From [L130.3]
         a * c = d * b               From [L128]
         c * a = d * b               From [c*]
         (c * a) / b = d             From [L128]
         c * (a / b) = d             From [L130.3]
[I131.2] (a / b) * c = d             From [c*]
[L131.3] a / (b / c) = (a / b) * c   From [I131.1] and [I131.2]
We now have all of our rules of association for multiplication and division. The following four equations, repeated from above, show all eight possible combinations of * and / operators and grouping of three variables. Note that this table is identical to the table of rules of association for addition and subtraction, with * instead of + and / instead of -.
[a*]     a * (b * c) = (a * b) * c
[L129.3] a / (b * c) = (a / b) / c
[L130.3] a * (b / c) = (a * b) / c
[L131.3] a / (b / c) = (a / b) * c
We derive a few more useful lemmas.
       a / b = (a * 1) / b     From [i*]
             = a * (1 / b)     From [L130.3]
[L132] a / b = a * (1 / b)     Summary of the above lines
       1 / (a / b) = (1 / a) * b     From [L131.3]
                   = b * (1 / a)     From [c*]
                   = b / a           From [L132]
[L133] 1 / (a / b) = b / a           Summary of the above lines
       (a / b) * (c / d) = ((a / b) * c) / d     From [L130.3]
                         = (c * (a / b)) / d     From [c*]
                         = ((c * a) / b) / d     From [L130.3]
                         = (c * a) / (b * d)     From [L129.3]
                         = (a * c) / (b * d)     From [c*]
[L134] (a / b) * (c / d) = (a * c) / (b * d)     Summary of the above lines
       (a / b) / (c / d) = ((a / b) * 1) / (c / d)     From [i*]
                         = (a / b) * (1 / (c / d))     From [L130.3]
                         = (a / b) * (d / c)           From [L133]
                         = (a * d) / (b * c)           From [L134]
[L135] (a / b) / (c / d) = (a * d) / (b * c)           Summary of the above lines
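These rules are easy to spot-check numerically. Here's a quick Python sketch, using exact rational arithmetic (which gets slightly ahead of ourselves, as the next section admits); the sample values are arbitrary nonzero numbers:

    from fractions import Fraction as F   # exact arithmetic, so == comparisons are safe

    a, b, c, d = F(7), F(3), F(5), F(11)  # arbitrary nonzero sample values

    assert a / (b * c) == (a / b) / c                # [L129.3]
    assert a * (b / c) == (a * b) / c                # [L130.3]
    assert a / (b / c) == (a / b) * c                # [L131.3]
    assert a / b == a * (F(1) / b)                   # [L132]
    assert F(1) / (a / b) == b / a                   # [L133]
    assert (a / b) * (c / d) == (a * c) / (b * d)    # [L134]
    assert (a / b) / (c / d) == (a * d) / (b * c)    # [L135]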

Rational Numbers

You may have noticed in the above section about the division operation that we discussed things like 1 / a without commenting on the fact that our number system, which up to now includes only integers, does not in general include a number that can represent that value. The proper sequence would have been to introduce rational numbers first, but I wanted to finish the discussion about the properties of the division operation before discussing rational numbers. With that out of the way, let's turn to rational numbers.

We can easily build a table for specific values of a, b and c for equation [I125] by taking all pairs of integer values for a and b, generating c as their product, and defining the value of c/b to be a for all of those triplets. For example, 2*3=6, therefore 6/3=2.

Our division table does not include all possible combinations of c/b, so there are some division equations for which the answer can not be found in our tables. For example, 3/2 does not appear in our table because, in our system of numbers up to this point, which is all integers, there is no number that, when multiplied by 2, yields 3.

In order for our numbers to be closed under division, we have to add some new numbers, which are the numbers needed to solve the equation c/b when there is no integer number a such that a*b=c. We call these numbers rational numbers, because they are the ratio of two integers, and we choose to represent them as a fraction using the division operator. In other words, when we ask what is the answer to the equation c/b, we are simply defining the answer to be c/b and stating that that value is a number. We will then examine how to manipulate these numbers.

We have defined rational numbers as numbers of the form c/b. We also know from our table-based enumeration of division equations that, for any number c which can be written as a*b, the value of the division equation c/b is a. We define the value of our rational number that we write as c/b to be consistent with the known solutions of our division equations written the same way. Thus the value of the rational number 6/3 is defined to be 2, etc.

Algebra

With division as the inverse of multiplication, the multiplicative identity 1, and rational numbers, our ring is now a field.

This is as far as we will go with algebra. When we continue with exponentiation to derive real numbers and then complex numbers, those structures are still fields.

Operator Precedence

Up to now, we have been using parentheses to ensure that the order of application of operators in an expression is unambiguous. We noted earlier that we don't need those parentheses in an expression that consists solely of a number of values added together, and likewise that we don't need parentheses in an expression that consists solely of a number of values multiplied together. This is nice because it reduces the amount of writing we need to do.

We can further reduce the need for parentheses by defining a rule that tells us which operations to evaluate first when there are no parentheses to guide us. When we start with an operation and then define a second operation as the repeated application of the first operation, we can think of that second operation as being more powerful than the first operation. We then give priority to the more powerful operator, defining our rule of precedence to be that, in an expression in which the order of evaluation would otherwise be ambiguous, we will evaluate the more powerful operators first.

We define addition (+) and subtraction (-) to be at the first level, and multiplication (*) and division (/) to be at the second level and higher power than the first level. Thus, for example, the expression a + b * c will be equal to a + (b * c), and the expression a / b - c will be equal to (a / b) - c.

In cases where there are multiple operators of the same power, we define the order of evaluation to be left to right. Thus, for example, the expression a / b * c will be equal to (a / b) * c, and the expression a - b + c will be equal to (a - b) + c.
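Most programming languages adopt these same precedence and left-to-right rules, so we can check the examples directly. A quick Python sketch (sample values arbitrary):

    a, b, c = 7.0, 3.0, 2.0            # arbitrary sample values

    assert a + b * c == a + (b * c)    # * is evaluated before +
    assert a / b - c == (a / b) - c    # / is evaluated before -
    assert a / b * c == (a / b) * c    # same power: left to right
    assert a - b + c == (a - b) + c    # same power: left to right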

Exponentiation

Up to this point the structure we have built is pretty clean. With rational numbers and our four operators (+, -, *, /), we have a system that is closed and mostly complete and consistent, with the only exception being that we can't divide by zero. Other than that one exception, operations are well-defined, we have a nice set of rules including our commutative, associative, and distributive rules, and we have a host of identities and lemmas we can apply to our rational numbers.

Once we add exponentiation, things get a lot messier: we will have expressions that have multiple values, bigger swaths of undefined operations, and many places where our lemmas and rules of manipulation no longer apply. It might seem like it's hardly worth trading our nice clean rational numbers for this mess. But despite all of the rough edges, there are enough useful things you can do with real and complex numbers that it is worth carefully defining where those rough edges are and avoiding them. So, let's forge ahead.

As with addition, once we start using multiplication for real problems, we often find we want to multiply the same number together many times, such as 3*3*3*3. As we did when defining multiplication, we define a new operator that means the same as repeated multiplication. We call this new operation exponentiation. The standard notation writes the exponent as a superscript; in programming languages this is sometimes written using the caret (^) as an operator, and I will use the caret here. For example the expression 3^4 means 3 multiplied by itself 4 times, or 3 * 3 * 3 * 3. We call the number on the left the base, and the raised number the exponent. The operation of exponentiation is also referred to as taking a base to a power, where the power is the exponent.

In line with our precedence rules by which we evaluate higher-power operations first, we will evaluate exponentiation before multiplication, division, addition, and subtraction, when there are no parentheses to otherwise indicate the order of evaluation.

From [a*] we know we can group repeated multiplication any way we want, so for example 3 * 3 * 3 * 3 = (3 * 3 * 3) * 3 = (3 * 3) * (3 * 3). Using our new exponent notation, we can write this as 3^4 = (3^3) * (3^1) = (3^2) * (3^2). More generally, we can see these things from our definition of exponentiation and [a*]:
[L201.1] a^(b + c) = a^b * a^c
[L201.2] a^1 = a
[L201.3] (a^b)^c = a^(b * c)
[L201.4] (a^b)^c = a^(b*c) = a^(c*b) = (a^c)^b     From [L201.3] and [c*]
We can figure out how to deal with (a * b)^n by starting with n=2:
         (a * b)^2 = (a * b) * (a * b)
                   = a * b * a * b        From [a*]
                   = a * a * b * b        From [c*]
                   = a^2 * b^2
[L201.5] (a * b)^2 = a^2 * b^2            Summary of above lines
Then we use induction for the general case:
         Assume (a * b)^n = a^n * b^n for some n
         (a * b)^(n + 1) = (a * b)^n * (a * b)^1     From [L201.1]
                         = (a^n * b^n) * (a * b)     From the assumption and [L201.2]
                         = a^n * a * b^n * b         From [a*] and [c*]
                         = a^(n + 1) * b^(n + 1)     From [L201.1] and [L201.2]
         True when n=2 from [L201.5], so by induction true for all positive n
[L201.6] (a * b)^n = a^n * b^n
Unlike addition and multiplication, we can quickly see from counterexamples that exponentiation is neither commutative:
      2^3 = 2 * 2 * 2 = 8
      3^2 = 3 * 3 = 9
      8 != 9, so 2^3 != 3^2
nor associative:
      2^(3^2) = 2^(3 * 3) = 2^9 = 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 = 512
      (2^3)^2 = (2 * 2 * 2)^2 = 8^2 = 8 * 8 = 64
      512 != 64, so 2^(3^2) != (2^3)^2
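Both counterexamples are easy to check mechanically. In Python, exponentiation is written **, and note that Python groups a**b**c to the right precisely because exponentiation is not associative, so the grouping matters:

    assert 2**3 == 8 and 3**2 == 9                 # not commutative: 8 != 9
    assert 2**(3**2) == 512 and (2**3)**2 == 64    # not associative: 512 != 64
    assert 2**3**2 == 2**(3**2)                    # Python's ** groups to the right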
These initial lemmas are based on our intuitive definition of exponentiation as repeated multiplication, which provides obvious answers only in the case where the exponent is a counting number (strictly positive integer). Let's extend our definition to cover other numbers in our algebra.
[A202.1] d = b + c                        Starting assumption
[I202.2] b = d - c
         a^d = a^b * a^c                  From [A202.1] and [L201.1]
         a^d / a^c = (a^b * a^c) / a^c    Assuming a^c != 0
[I202.3] a^b = a^d / a^c
[L202.4] a^(d - c) = a^d / a^c            Substitute b from [I202.2]
We can't divide by zero, so the above is not valid when ac is zero. When is that expression zero? From the definition of exponentiation, this expression represents repeated multiplication of a. What number when multiplied by itself is zero? There is only one such number: zero. So [L202.4] is not valid when a = 0, but it is valid for any other base.

Let's look at two special cases of [L202.4].
       a^0 = a^(1 - 1)     From [L46.5], a != 0
           = a^1 / a^1     From [L202.4]
           = a / a         From [L201.2]
           = 1             From [L124.2]
[L203] a^0 = 1             Above lines summarized, a != 0
       a^-b = a^(0 - b)     From [L76], a != 0
            = a^0 / a^b     From [L202.4], b != 0
            = 1 / a^b       From [L203]
[L204] a^-b = 1 / a^b       Above lines summarized, a != 0, b != 0
[L204.1] a^-1 = 1 / a       From [L204] with b = 1, and [L201.2]
The above extends our exponentiation operator to all integer exponents and all bases other than zero. What about rational exponents?

Remember that our goal is to define a set of consistent and useful operations. To that end, we want to ask ourselves how we can define exponentiation using a rational exponent such that it is consistent with the rest of our algebra. Rational numbers are equivalent to division using integers, which is the inverse of multiplication. Our exponentiation rule [L201.3] includes multiplication, from which we can derive a rule for division.
       a = a^1               [L201.2]
         = a^(b / b)         From [L124.2], b != 0
         = a^(b * (1/b))     From [L132]
         = a^((1/b) * b)     From [c*]
         = (a^(1/b))^b       From [L201.3]
[L205] (a^(1/b))^b = a       Summary of the above lines
What the above says is that the value of a^(1/b) is the number that, when raised to the power b, is equal to a. For example, the number a^(1/2) is the number that, when raised to the power 2, is equal to a. We call a^(1/b) the b-th root of a. The case where b is 2 or 3 is common enough that we define special names: we call a^2 "a squared" and a^(1/2) the square root of a; we call a^3 "a cubed" and a^(1/3) the cube root of a.
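A quick numeric spot-check of [L203], [L204], and [L205] in Python. Floating-point exponentiation rounds, so we compare approximately; the sample values are arbitrary and nonzero:

    import math

    a, b = 5.0, 3.0                          # arbitrary values, a != 0, b != 0

    assert a**0 == 1.0                       # [L203]
    assert math.isclose(a**-b, 1 / a**b)     # [L204]
    assert math.isclose((a**(1/b))**b, a)    # [L205]: the b-th root, raised to b, gives a back
    assert math.isclose(a**0.5 * a**0.5, a)  # the square root squared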

Previously when we added a new operation to represent repeated application of an earlier operation (addition as repeated counting and multiplication as repeated addition), we did not encounter closure problems until we added an inverse operation to the newly added operation (subtraction, division). As we will see below, this is not the case for exponentiation: here we will run into closure problems even without an inverse operation. But to keep the flow the same as with the other operators, I will discuss the inverse operation before getting back to closure.

Logarithms

As when we defined division to be the inverse operation of multiplication, we want an inverse operation to exponentiation so that we can solve for x in equations such as a^x = b.

We call our inverse operation logarithm.
There is a curious hole in math terminology about logarithms. Our other operations all have names: we talk about performing addition, multiplication, or exponentiation. We do addition by adding two addends to get a sum. But we don't "do logarithm": we "take a logarithm". The word logarithm refers to one of the elements in that operation, similar to how the word exponent refers to one of the elements in the operation of exponentiation. There seems to be no single word for logarithms that corresponds to the operation names such as addition, multiplication, and exponentiation. Talking about logarithms is like talking about sums rather than addition.
[A221.1] log_a(a^b) = b       for all a and b except a=0 or b=0
[A221.2] a^(log_a(b)) = b     for all a and b except a=0 or b=0
We can derive a few lemmas for log.
[L222.1] log_a(a) = log_a(a^1) = 1         [L201.2] and [A221.1] with b=1
[L222.2] log_a(1) = log_a(a^0) = 0         [L203] and [A221.1] with b=0
[L222.3] log_a(1/a) = log_a(a^-1) = -1     [L204.1] and [A221.1] with b=-1
[I223.1] log_a(a^c) = c                              [A221.1] using c instead of b
[I223.2] log_a(a^d) = d                              [A221.1] using d instead of b
[I223.3] log_a(a^c) + log_a(a^d) = c + d             Add left sides and right sides of [I223.1] and [I223.2]
[I223.4] log_a(a^(c+d)) = c + d                      [A221.1] using c+d instead of b
[L223.5] log_a(a^(c+d)) = log_a(a^c) + log_a(a^d)    Transitive equals on [I223.3] and [I223.4]
[I224.1] log_a(a^c) - log_a(a^d) = c - d             Subtract left sides and right sides of [I223.1] and [I223.2]
[I224.2] log_a(a^(c-d)) = c - d                      [A221.1] using c-d instead of b
[L224.3] log_a(a^(c-d)) = log_a(a^c) - log_a(a^d)    Transitive equals on [I224.1] and [I224.2]
[I225.1] log_a(a^(c+d)) = log_a(a^c * a^d)           [L201.1]
[I225.2] log_a(a^(c+d)) = log_a(a^c) + log_a(a^d)    [L223.5]
[I225.3] log_a(a^c * a^d) = log_a(a^c) + log_a(a^d)  Transitive equals on [I225.1] and [I225.2]
[L225.4] log_a(x*y) = log_a(x) + log_a(y)            Substitute x for a^c and y for a^d
[I226.1] log_a(a^(c-d)) = log_a(a^c / a^d)           [L202.4], a^d != 0
[I226.2] log_a(a^(c-d)) = log_a(a^c) - log_a(a^d)    [L224.3]
[I226.3] log_a(a^c / a^d) = log_a(a^c) - log_a(a^d)  Transitive equals on [I226.1] and [I226.2]
[L226.4] log_a(x/y) = log_a(x) - log_a(y)            Substitute x for a^c and y for a^d, y != 0
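Python's math.log takes the base as an optional second argument, so we can spot-check the assumptions and the product and quotient rules (sample values arbitrary and positive):

    import math

    a, x, y = 10.0, 6.0, 3.0   # base and arbitrary positive arguments

    assert math.isclose(math.log(x * y, a), math.log(x, a) + math.log(y, a))  # [L225.4]
    assert math.isclose(math.log(x / y, a), math.log(x, a) - math.log(y, a))  # [L226.4]
    assert math.isclose(a ** math.log(x, a), x)                               # [A221.2]
    assert math.isclose(math.log(a, a), 1)                                    # [L222.1]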

Principal Values

Previously, we noted that, when we added division to our algebraic structure, we had to add a small complication in that we can't divide by zero. When we add square root (or, more generally, exponentiation with any non-integer exponent), we run into another kind of special case where we have to take additional care: multivalued functions. We note that every positive number has two square roots: for example, the square root of 4 is 2 or -2, because either of those numbers, when multiplied by itself, is equal to 4. With multivalued functions like square root, we can run into trouble if we are not careful about choosing which value to use. Here's an example of this problem:
      (4^(1/2))^2 = 4
      4^(1/2) * 4^(1/2) = 4
      2 * 4^(1/2) = 4        Substitute 2 as the first square root
      2 * -2 = 4             Substitute -2 as the second square root
      -4 = 4                 Wrong!
The bad substitution in the above sequence may be easy to spot and understand, but as we go further into building our algebra, problems of this nature become subtler and harder to recognize.

We can reduce the probability of running into this kind of problem by carefully selecting which of these multiple values to use. When we have one preferred value for a multivalued function, we call that the principal value of the function. For example, the principal value of sqrt(4) is 2.
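Libraries bake this choice in. Python's math.sqrt always returns the principal (non-negative) square root; if we want the other root, we must negate it explicitly, and we must use whichever root we choose consistently:

    import math

    r = math.sqrt(4)           # principal value: always 2.0, never -2.0
    assert r * r == 4          # consistent use of the first root
    assert (-r) * (-r) == 4    # consistent use of the second root
    # mixing the two roots, r * (-r), gives -4: the error in the sequence above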

Irrational Numbers

The ancient Greeks knew that 2^(1/2) (the square root of two) is not a rational number. There are a lot of proofs of this. I happen to like this one that demonstrates that all roots (square root, cube root, and others) that are not integers are not rational.
      Assume a^b = c (b=2 for square root, b=3 for cube root, etc.), and a = d/e with e != 1, where d/e is reduced to the lowest form, so d and e have no prime factors in common.
      Then a^b = (d/e)^b = d^b/e^b = c = c/1.
      But d^b has no prime factors that are not in d, and e^b has no prime factors that are not in e, so d^b and e^b have no prime factors in common, and the fraction can not be reduced at all, and in particular can not be reduced to c/1, therefore it can not be equal to c.
      Since there is no rational number satisfying the original assumption, any solution must not be a rational number, except in the case that e=1, which means the root is an integer.
In order for our numbering system to be closed under exponentiation, we need to extend our numbers to include these values that are not rational numbers. We call them irrational numbers.

When we added negative numbers and rational numbers, that was after we had added not only an operation defined by repetition, but also its inverse. In this case, we had to extend our numbers to provide closure even without having yet added that inverse operation.
A brief aside about infinity: before adding irrational numbers, our set of numbers was always countably infinite, which means there was always a way to map the entire set of numbers onto the counting numbers. For example, we can count off all the integers, both positive and negative, by ordering them like this: 0, 1, -1, 2, -2, 3, -3, and so on. We can count off all the rational numbers by ordering them according to the sum of the numerator and denominator and alternating positive and negative, like this: 0, 1/1, -1/1, 1/2, -1/2, 2/1, -2/1, 1/3, -1/3, 2/2, -2/2, 3/1, -3/1, 1/4, and so on, then removing duplicates (any fraction that is not reduced). But once we add all the irrational numbers we can no longer come up with a counting order like this, which is why we say the set of all irrational numbers is uncountable.

For a proof of this assertion, look up Cantor's diagonalization argument.

Decimal Notation

When we introduced rational numbers, such as 1/2, we defined their values in terms of the division operation, but did not provide any other representation. This was perhaps acceptable, as we can easily manipulate rational numbers in order to answer questions about them.

With irrational numbers, it is not quite so easy. How can we tell, for example, which of 2^(1/2), 3^(1/3), or 723/510 is the largest? We would like a representation that allows us to do real-world calculations with these values.

When counting up with integers, we use a place-notation system in which each digit, as we move to the left, represents a value that is ten times as much as the digit just to its right. For example, 1234 means 1 * 1000 + 2 * 100 + 3 * 10 + 4. We extend this sequence by defining each place to the right of the ones digit as having a place value of one tenth of the digit to its left. In order to unambiguously know which place is the ones place, we put a decimal point (.) just to the right of the ones digit (we in America, that is; in some other parts of the world people use a comma (,) instead). For example, 0.5678 means 5 * 1/10 + 6 * 1/100 + 7 * 1/1000 + 8 * 1/10000.

We can convert fractions to decimal form such as a.bcde by remembering that that means a + b/10 + c/100 + d/1000 + e/10000. For example:
      723/510 = (510 + 213) / 510
              = 510/510 + 213/510
              = 1 + 213/510
              = 1 + (10 * 213/510) / 10
              = 1 + (2130/510) / 10
              = 1 + ((2040 + 90)/510) / 10
              = 1 + (2040/510) / 10 + (90/510) / 10
              = 1 + 4/10 + (10 * 90/510) / 100
              = 1 + 4/10 + (900/510) / 100
              = 1 + 4/10 + ((510 + 390)/510) / 100
              = 1 + 4/10 + (510/510 + 390/510) / 100
              = 1 + 4/10 + 1/100 + (390/510) / 100
              = 1 + 4/10 + 1/100 + (10 * 390/510) / 1000
              = 1 + 4/10 + 1/100 + (3900/510) / 1000
              = 1 + 4/10 + 1/100 + ((3570 + 330)/510) / 1000
              = 1 + 4/10 + 1/100 + (3570/510 + 330/510) / 1000
              = 1 + 4/10 + 1/100 + 7/1000 + (330/510) / 1000
              = 1.417 + more digits from (330/510) / 1000
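The pattern in that expansion (multiply the remainder by 10, take the whole part as the next digit, and carry the remainder forward) is just long division. Here's a small Python sketch of the procedure (the function name is my own):

    def decimal_digits(num, den, places):
        """Expand num/den to the given number of decimal places by long division."""
        whole, rem = divmod(num, den)       # integer part and first remainder
        digits = []
        for _ in range(places):
            rem *= 10                       # shift the remainder one decimal place
            digit, rem = divmod(rem, den)   # next digit and the new remainder
            digits.append(str(digit))
        return f"{whole}." + "".join(digits)

    print(decimal_digits(723, 510, 6))      # prints 1.417647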
Figuring out the decimal representation for a number such as 2^(1/2) is not quite as straightforward, but we can start by the brute-force approach of trial and error to get an estimate.
      1^2 = 1, 1 < 2
      2^2 = 4, 4 > 2, so our number must start with 1
      1.1^2 = 1.21
      1.2^2 = 1.44
      1.3^2 = 1.69
      1.4^2 = 1.96
      1.5^2 = 2.25, so our number must start with 1.4
      1.41^2 = 1.9881
      1.42^2 = 2.0164, so our number must start with 1.41
      1.411^2 = 1.990921
      1.412^2 = 1.993744
      1.413^2 = 1.996569
      1.414^2 = 1.999396
      1.415^2 = 2.002225, so our number must start with 1.414
From this much we can determine that 2^(1/2) is less than 723/510. We don't have an exact answer, but for real world questions we often don't need to go to very many decimal digits to get the answer.
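The trial-and-error search above is easy to mechanize: at each new decimal place, take the largest digit that doesn't make the square overshoot. A sketch in Python, done in scaled integer arithmetic so no rounding creeps in:

    def sqrt_digits(n, places):
        """Digit-by-digit decimal approximation of the square root of integer n."""
        x = 0                                   # the answer so far, scaled by a power of 10
        for p in range(places + 1):
            x *= 10                             # make room for the next digit
            while (x + 1)**2 <= n * 10**(2*p):  # largest digit that doesn't overshoot
                x += 1
        return x / 10**places

    print(sqrt_digits(2, 6))                    # prints 1.414213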

Our decimal notation is a sum of fractions, so any finite decimal number can be converted to a rational number. Conversely, irrational numbers can not be exactly represented as decimal numbers; we can only approximate them when using decimal notation. If we want to maintain an exact representation of an irrational number such as 2^(1/2), we have to keep it in that notation or something similar.

Imaginary Numbers

Adding irrational numbers extends our numbers to include the value of 2^(1/2) and other fractional roots of positive numbers, but it doesn't cover everything. In particular, our numbers don't yet include a value for the expression (-1)^(1/2). This is the square root of negative 1, which is equal to the number that, when multiplied by itself, equals negative 1. But any positive number multiplied by itself is a positive number, and from [L104.5] any negative number multiplied by itself is also a positive number, so we don't have any numbers that are candidates to be the square root of negative 1. In order to have exponentiation be closed for negative bases, we need to extend our numbers. We need to add a set of numbers that, when multiplied by themselves, produce negative numbers.

When we added negative numbers, we used our existing counting numbers with an added character (-) in front to indicate a negative number. We will do something similar here, using our existing counting numbers with an added character, in this case the letter i, following the number to indicate the new kind of numbers we are adding. We define 1i (or just i) to be the number such that i^2 = -1, and given a number a, we define ai = a * i (which is consistent with a common convention of defining ab = a * b).

We need to pick a name to distinguish these new numbers from what we had before, and "the square root of negative one" is too unwieldy, so we pick a shorter name and call them imaginary numbers.

When we defined negative numbers, we might have instead called them imaginary numbers, because you can't have negative lengths or a negative number of apples in the real world, so those numbers are not real, right? In the sense that they are highly useful for certain mathematical calculations, imaginary numbers are no more "imaginary" than negative numbers. It is unfortunate that we are stuck with a name that causes some people to get distracted from thinking about these new numbers as simply the next step in expanding our numbering system to be closed under exponentiation.

To distinguish them from our newly added imaginary numbers, we go back and lump together our previously defined rational and irrational numbers and call those real numbers. Having made the distinction between real and imaginary numbers, we note that we can have imaginary rational numbers, such as (1/2)i, or imaginary irrational numbers, such as 2^(1/2)*i, as well as negative imaginary numbers such as -4i or negative irrational imaginary numbers such as -2^(1/2)*i.

If we work through the mechanics of addition and subtraction with imaginary numbers, we find that they work the same as real numbers but with that extra i everywhere. To put it another way, imaginary numbers are closed under addition and subtraction. This is not the case with multiplication: imaginary numbers are not closed under multiplication, since i * i = -1, which is not an imaginary number. Similarly, imaginary numbers are not closed under division, since i / i = 1, which is not imaginary.

Complex Numbers

Since we defined imaginary numbers as being a different set of numbers from real numbers, we can't convert from one to the other, so if we try to add a real number a and an imaginary number bi together, we can't reduce that, so we just write it as a + bi. We call this kind of number a complex number, and since a or b could be zero, we note that all real numbers and all imaginary numbers are complex numbers.

We are, in a sense, cheating when we use the + symbol to enumerate the real and imaginary parts of a complex number, because, as just stated, we can't actually do anything with that operator to reduce the number. In that sense, we could have used any special character in that location. But we choose to use the + sign because it turns out the rules we have that deal with the + operator on real numbers also work with complex numbers: commutative, associative, and distributive rules all work consistently when applied to complex numbers when we use a + sign between the real and imaginary parts.

As with square root, complex numbers come with multivalued functions, some with an infinite number of solutions. It's easy to get bad results if you're not careful, so it's important to define a principal value for these functions and consistently use it.

Cartesian Coordinates

Since real and imaginary numbers can't be reduced to each other and are thus orthogonal, we can represent them on the plane. We choose real to be the X axis and imaginary to be the Y axis.

With this cartesian environment, we can represent complex numbers in polar coordinates using the standard conversion: (r, θ) = (sqrt(x^2 + y^2), arctan(y/x)), where x is the real part and y is the imaginary part (and with the appropriate sign adjustments for quadrants other than I). Converting the other way, we have (x, y) = (r * cos(θ), r * sin(θ)). Sometimes we refer to a complex number as z, where we can decompose it either by real and imaginary parts, written as x = Re(z), y = Im(z), or by polar coordinates, written as r = |z|, θ = Arg(z), where |z| is the magnitude of z and Arg(z) is the argument of z. More precisely, arg(z) is the argument of z, and Arg(z) is the principal argument of z. arg(z) is a multi-valued function equal to Arg(z) + n*2*π for all integer values of n.

We can treat our complex numbers as vectors in the two dimensional complex plane, so that adding two complex numbers can be displayed in our plane as vector addition. More interesting is multiplication, where we can see that when we use polar coordinates we get this nice result: (r1,θ1) * (r2,θ2) = (r1*r2, θ1+θ2).
       (r1,θ1) * (r2,θ2) = (r1*cos(θ1) + r1*sin(θ1)i) * (r2*cos(θ2) + r2*sin(θ2)i)
                         = r1*(cos(θ1) + sin(θ1)i) * r2*(cos(θ2) + sin(θ2)i)
                         = r1*r2 * (cos(θ1) + sin(θ1)i) * (cos(θ2) + sin(θ2)i)
                         = r1*r2 * (cos(θ1)*cos(θ2) + cos(θ1)*sin(θ2)i + sin(θ1)*cos(θ2)i + sin(θ1)*sin(θ2)*i^2)
                         = r1*r2 * ((cos(θ1)*cos(θ2) - sin(θ1)*sin(θ2)) + (cos(θ1)*sin(θ2) + sin(θ1)*cos(θ2))i)
                         = r1*r2 * (cos(θ1+θ2) + sin(θ1+θ2)i)
                         = r1*r2*cos(θ1+θ2) + r1*r2*sin(θ1+θ2)i
                         = (r1*r2, θ1+θ2)
[L301] (r1,θ1) * (r2,θ2) = (r1*r2, θ1+θ2)     The above summarized
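We can check [L301] with Python's cmath module, which converts to polar coordinates directly (sample values arbitrary; these particular angles happen not to wrap past π, so no adjustment is needed):

    import cmath, math

    w, z = complex(3, 4), complex(-1, 2)   # arbitrary sample values

    r1, t1 = cmath.polar(w)                # magnitude and angle of each factor
    r2, t2 = cmath.polar(z)
    r, t = cmath.polar(w * z)              # magnitude and angle of the product

    assert math.isclose(r, r1 * r2)        # magnitudes multiply
    assert math.isclose(t, t1 + t2)        # angles add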

Euler's Formula

Here is Euler's Formula:
e^(i*θ) = cos(θ) + i*sin(θ)
Feynman calls this "one of the most remarkable, almost astounding, formulas in all of mathematics" and refers to it as an "amazing jewel".

As described in an article at Brilliant, Euler's Formula can be derived using the series expansions of sin(x), cos(x), and e^x:
      cos(x) = 1 - x^2/2! + x^4/4! - ...
      sin(x) = x - x^3/3! + x^5/5! - ...
      e^x = 1 + x + x^2/2! + x^3/3! + ...
so:
      e^(i*x) = 1 + i*x + (i*x)^2/2! + (i*x)^3/3! + (i*x)^4/4! + (i*x)^5/5! + ...
              = 1 + i*x - x^2/2! - i*x^3/3! + x^4/4! + i*x^5/5! - ...
              = (1 - x^2/2! + x^4/4! - ...) + i*(x - x^3/3! + x^5/5! - ...)
              = cos(x) + i*sin(x)
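We can watch the partial sums of the series converge to cos(x) + i*sin(x). A quick Python check (the number of terms is an arbitrary cutoff):

    import math

    def exp_i(x, terms=20):
        """Partial sum of the series for e^(i*x)."""
        return sum((1j * x)**n / math.factorial(n) for n in range(terms))

    x = 1.234                                        # arbitrary angle
    expected = complex(math.cos(x), math.sin(x))     # cos(x) + i*sin(x)
    assert abs(exp_i(x) - expected) < 1e-12          # the series matches Euler's Formula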
In the section on Cartesian Coordinates above, we noted that any complex number can be represented in polar coordinates using r and theta, but we didn't have a good place to put the i. With Euler's Formula, we can now unambiguously represent any complex number z = x + i*y as |z| * e^(i*arg(z)), where |z| is the magnitude of z and arg(z) is the argument of z.

Complex Exponentiation

Given w = u + i*v and z = x + i*y, how do we calculate w^z?

We would like w^z to satisfy the rules of exponentiation that we derived for real numbers, such as k^(a+b) = k^a * k^b. We will assume that we can apply this rule to complex exponentiation and see how that works out.

From the discussion of Euler's Formula above we know that we can represent any nonzero complex number w as |w| * e^(i*arg(w)), and we can represent the real number |w| as e^ln(|w|). Let's see where that takes us.
       w^z = (|w| * e^(i*arg(w)))^z                                  Expand w
           = (e^ln(|w|) * e^(i*arg(w)))^z                            Use exp form for magnitude of w
           = (e^(ln(|w|) + i*arg(w)))^z                              e^a * e^b = e^(a+b)
           = e^((ln(|w|) + i*arg(w)) * z)                            (e^a)^b = e^(a*b)
           = e^((ln(|w|) + i*arg(w)) * (x + i*y))                    Expand z to real and imaginary parts
           = e^(ln(|w|)*x + ln(|w|)*i*y + i*arg(w)*x + i*arg(w)*i*y) (a+b)*(c+d) = ac+ad+bc+bd
           = e^((ln(|w|)*x - arg(w)*y) + i*(ln(|w|)*y + arg(w)*x))   i^2 = -1 and rearrange terms
[L310] w^z = e^((ln(|w|)*x - arg(w)*y) + i*(ln(|w|)*y + arg(w)*x))   The above summarized
This gives us a number of the form r * e^(i*θ) where r = e^(ln(|w|)*x - arg(w)*y) and θ = ln(|w|)*y + arg(w)*x, both of which we can evaluate.

Note that the above result includes arg(w) in two places, once multiplied by x and once multiplied by y. arg is a multi-valued function, and thus complex exponentiation is in general also multi-valued; as we will see below, integer exponents are the exception.
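Here's a sketch that evaluates [L310] directly, choosing the principal argument Arg(w); Python's built-in complex power makes the same choice, so the two should agree (the function name is my own; sample values arbitrary):

    import cmath, math

    def cpow(w, z):
        """w**z computed from [L310] using the principal argument Arg(w)."""
        x, y = z.real, z.imag
        ln_w = math.log(abs(w))              # ln(|w|)
        arg_w = cmath.phase(w)               # Arg(w), the principal argument
        r = math.exp(ln_w * x - arg_w * y)   # magnitude of the result
        theta = ln_w * y + arg_w * x         # angle of the result
        return cmath.rect(r, theta)          # back to x + i*y form

    w, z = complex(1, 1), complex(0.5, 2)    # arbitrary sample values
    assert abs(cpow(w, z) - w**z) < 1e-12    # matches the built-in principal value

Substituting Arg(w) + 2*π in place of Arg(w) gives a different, equally valid answer unless the exponent is an integer.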

If we are raising to a real power, then y is zero, so [L310] reduces to
      w^x = e^(ln(|w|)*x + i*arg(w)*x)     [L310] with y=0
          = |w|^x * e^(i*arg(w)*x)         For real x and all w
This equation says the magnitude of the result is the magnitude of w raised to the x power and the arg of the result is the arg of w multiplied by x. If, for example, we are squaring and thus x is 2, we square the magnitude of the number and double the angle. This result is consistent with our earlier observation that, when multiplying two complex numbers, we can multiply the magnitudes and add the angles.

If y is zero and x is an integer, then ei*arg(w)*x gives the same result for all of the multiple values of arg(w), so the overall function is single-valued. If x is not an integer, this is not the case. For example, if x is 1/2, then we get two different answers by plugging in Arg(w) and Arg(w) + 2*π. These are the two square roots of a number: they always have the same magnitude and differ in angle by π.

If we consider the path that would be traced out for powers of some fixed z as we change the real exponent x, we can see that it generates a circle or a spiral. Here is a nice visualization of z^x from Suitcase of Dreams for when |z|>1:


If we are raising to an imaginary power, then x is zero, so [L310] reduces to
[L311] w^(i*y) = e^(-arg(w)*y + i*ln(|w|)*y)     [L310] with x=0
Let's evaluate i^i. We use [L311] with w=i and y=1:
      i^i = e^(-arg(i) + i*ln(|i|))     [L311] with w=i and y=1
          = e^(-π/2) * e^(i*0)          arg(i) = π/2; |i| = 1, and ln(1) is 0
          = e^(-π/2)                    Imaginary part drops out completely!
          = 0.207879...
Surprisingly, i^i is a real number, a little larger than one fifth. At least, that's one answer. We can use any of the answers e^(-π/2 + k*2π) for any integer k.
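Python agrees, using the principal value, and we can list a few of the other branches:

    import math

    principal = 1j ** 1j                       # Python returns the principal value
    assert abs(principal - math.exp(-math.pi / 2)) < 1e-15
    print(principal)                           # (0.20787957635076193+0j)

    for k in (-1, 0, 1):                       # a few of the other branches
        print(math.exp(-math.pi / 2 + k * 2 * math.pi))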

We see that we can represent any nonzero complex number in the form e^(i*z), given z = x + i*y.
      e^(i*z) = e^(i*(x + i*y))
              = e^(i*x + i*i*y)
              = e^(-y + i*x)
              = e^(-y) * e^(i*x)
One interesting thing we can do now is to extend Euler's Formula from real theta to complex theta, which allows us to define sin and cos for the entire complex plane:
      e^(i*z) = cos(z) + i*sin(z)
      e^(-i*z) = cos(z) - i*sin(z)             cos is an even function, sin is an odd function
      e^(i*z) + e^(-i*z) = 2*cos(z)
      cos(z) = (e^(i*z) + e^(-i*z)) / 2
      e^(i*z) - e^(-i*z) = 2*i*sin(z)
      sin(z) = (e^(i*z) - e^(-i*z)) / (2*i)
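Python's cmath module implements sin and cos for complex arguments, and we can confirm they satisfy these exponential forms (sample value arbitrary):

    import cmath

    z = complex(1.5, -0.7)                                # arbitrary complex value

    cos_z = (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2
    sin_z = (cmath.exp(1j * z) - cmath.exp(-1j * z)) / (2 * 1j)

    assert abs(cos_z - cmath.cos(z)) < 1e-12
    assert abs(sin_z - cmath.sin(z)) < 1e-12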

Euler's Identity

We evaluate Euler's Formula with theta set to pi:
      e^(i*π) = cos(π) + i*sin(π) = -1 + 0 = -1
We add one to both sides to get the typical presentation, e^(i*π) + 1 = 0.

Not only does this identity tie together five of the key values of algebra (e, π, i, 1, and 0), it does it with one each of the key operations we derived above (equality, addition, multiplication, exponentiation). That's a pretty sweet equation.

Final Closure

Throughout this presentation, we have expanded our system of numbers as we defined new operators and discovered our system of numbers was not closed under the new operators. But with complex numbers, we have reached a point where we don't need to define any new number types. Complex numbers are sufficient to solve all algebraic equations. This is one of the interpretations of the Fundamental Theorem of Algebra, but the proofs are pretty difficult, so I'm not going to try to prove it here.