Wednesday, October 24, 2012

Role-Based Authorization

A simple, uniform, powerful and extensible authorization model.

Introduction

The "three As" of security are:
  • Authentication - assuring that the user is who he says he is.
  • Authorization - allowing each authenticated user to perform selected privileged actions.
  • Audit - recording privileged actions to allow review of changes or potential abuse of privileges.
Given authentication and auditing, it is fairly simple to add a bit more monitoring that is very useful for billing and resource management, so you more often see the combination AAA (Authentication, Authorization, Accounting) or AAAA (Authentication, Authorization, Audit, Accounting).

In this post I discuss only authorization. Authentication and auditing are each big topics, so I won't try to cover them here. Similarly, I assume that the code and data are themselves secure. In particular, I do not cover the issue of multiple security domains and the problem of having lower security code make requests to higher security code.

With my focus only on authorization, in the discussion below I assume that the user has been authenticated so that we can trust that piece of data within the application.

I will use the language of relational databases in this post because it is well-known and precise. An implementation of this model can use some other mechanism to store and query the authorization data. The SQL examples provide precision to the discussion, but you should be able to skip the SQL code and still gain a basic understanding of the model.

In the SQL example code I indicate replacement variables within braces; for example the string {user} in a SQL statement indicates that the application should plug in the user name at that point in the expression. For a real implementation, the actual syntax would depend on the database access package in use.

I have run into authorization systems that were intended to provide powerful capabilities for complex situations but that were, unfortunately, themselves so complex that it was difficult to understand how they were supposed to work. Even after having them explained, they were difficult to remember because there was no simple underlying model to tie it all together.

In this post I present an approach to authorization that I believe provides a very high level of power with a model that is relatively simple to understand and to extend as needed. This model initially implements a Role-Based Access Control (RBAC) mechanism, a widely used approach to security that is now a NIST standard. I add a few extensions to the common model that make it start to look more like an Attribute-Based Access Control (ABAC) model.

Separation of Concerns

In an authorization system, we want to separate the management of authorization from the application. The application asks the authorization system for permission to do what it wants to do, and the authorization system grants or denies that permission. All management of the granting of authorizations is handled from the authorization system, completely outside of the application. If you build a system in which any of the abstractions used to manage authorizations, such as roles, appear in the application, then, as they say, you are doing it wrong.

In this post I focus only on the part of the system that determines whether to grant authorization. A separate system is required to maintain the data that is used by the authorization system. That maintenance can become quite complex in enterprise systems, but I will not be discussing it further in this post except to mention that the authorization mechanism described here can be applied to the system that maintains the authorization data in order to control who is allowed to modify what parts of that data.

Users

Let's start with perhaps the simplest useful authorization model possible. We begin with a one-column user table containing user names.

create table user(name varchar(32) primary key);
When the application wants to check for our sole authorization, it takes a passed-in authenticated user name and calls the authorization function with that value. The authorization function just checks to see if that user exists in the table. If so, the user is authorized and the authorization function returns true; if not, the user is not authorized and the authorization function returns false.
-- authorized if count>0
select count(*) from user where name={user};
The user-only model is too simple for most applications.
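A minimal sketch of how an application might run this check, assuming Python's built-in sqlite3 module and an in-memory database (the function name is_authorized and the sample users are hypothetical). SQLite's ? placeholder stands in for the {user} replacement variable: the driver binds the value rather than splicing it into the SQL text, so an unusual user name cannot change the meaning of the query.

```python
import sqlite3

# In-memory database with the one-column user table from the text.
conn = sqlite3.connect(":memory:")
conn.execute("create table user(name varchar(32) primary key)")
conn.executemany("insert into user(name) values (?)", [("alice",), ("bob",)])


def is_authorized(conn: sqlite3.Connection, user: str) -> bool:
    """User-only model: authorized if the user name appears in the table."""
    # '?' plays the role of {user}; sqlite3 binds the value as data.
    (count,) = conn.execute(
        "select count(*) from user where name=?", (user,)
    ).fetchone()
    return count > 0


print(is_authorized(conn, "alice"))    # True
print(is_authorized(conn, "mallory"))  # False
```

Binding parameters this way is also a natural way to fill in the other brace-delimited variables in the queries that follow.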

Actions

The next step is to add a one-column action table containing actions. We will assume each action is represented by a string name, although for performance reasons some might choose a different representation.
create table action(name varchar(32) primary key);
We add one row to this table for each restricted action; for example, we might have entries for login, reboot_system, and view_system_users.

With the addition of the action table we can no longer just look up users in the user table. We add a third table called grant (or auth_grant, since grant is typically a reserved word in SQL) with two columns that are foreign-key columns to the user and action tables. Each row of the grant table refers to a user and an action, with the meaning that that user is granted authorization to perform that action.
create table auth_grant(
    user varchar(32) not null,
    action varchar(32) not null,
    constraint FK_grant_user foreign key(user)
        references user(name),
    constraint FK_grant_action foreign key(action)
        references action(name)
);
Our authorization function will now accept a combination of values. We will refer to this combination as the requested operation (the NIST standard uses transaction as the unit for which permissions are granted). When an application wants to perform a potentially restricted operation, it takes the passed-in authenticated user name, adds the action it wants to perform, and passes that data to the authorization function. The authorization function takes the passed-in user and action arguments and looks in the grant table for a row in which the passed-in values for user and action match the values in the corresponding columns in the table. That row defines a permission to execute the requested operation. If that row exists, the operation is authorized; if that row does not exist, the operation is not authorized.
-- authorized if count>0
select count(*) from auth_grant where
    user={user} and
    action={action};
The user+action model is sufficient for many simple systems, such as granting login rights to some users and admin rights to other users.
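The same approach extends to the user+action model. This sketch again assumes Python's sqlite3 module with hypothetical sample data: alice may log in and reboot the system, while bob may only log in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces these foreign keys only if 'pragma foreign_keys = on'
# has been run on the connection; this sketch leaves enforcement off.
conn.executescript("""
    create table user(name varchar(32) primary key);
    create table action(name varchar(32) primary key);
    create table auth_grant(
        user varchar(32) not null references user(name),
        action varchar(32) not null references action(name)
    );
    insert into user values ('alice'), ('bob');
    insert into action values ('login'), ('reboot_system');
    -- alice may log in and reboot; bob may only log in
    insert into auth_grant values
        ('alice', 'login'), ('alice', 'reboot_system'), ('bob', 'login');
""")


def is_authorized(conn: sqlite3.Connection, user: str, action: str) -> bool:
    """User+action model: look for a grant row matching both values."""
    (count,) = conn.execute(
        "select count(*) from auth_grant where user=? and action=?",
        (user, action),
    ).fetchone()
    return count > 0


print(is_authorized(conn, "bob", "login"))          # True
print(is_authorized(conn, "bob", "reboot_system"))  # False
```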

Objects

With just users and actions, each action granted to a user effectively has global scope within the system. This is fine for actions such as login which truly are intended to be global in scope, but we would also like to be able to specify that certain actions can be performed on specific objects. Modern operating systems include mechanisms to grant different access rights, such as read-file or write-file, to specific files based on the user.

We add a one-column object table containing references to the objects in our system for which we want to be able to issue grants, with one row for each such object. We are making the simplifying assumption that each object already has a unique identifier that can be stored in our database.
create table object(name varchar(32) primary key);
We add a third column to our grant table that is a foreign-key column to the object table, exactly analogous to the existing references to the user and action tables. Each row of the grant table now refers to a user, an action and an object, with the meaning that that user is granted authorization to perform that action on that object.
create table auth_grant(
    user varchar(32) not null,
    action varchar(32) not null,
    object varchar(32) not null,
    constraint FK_grant_user foreign key(user)
        references user(name),
    constraint FK_grant_action foreign key(action)
        references action(name),
    constraint FK_grant_object foreign key(object)
        references object(name)
);
If we still want to have actions with global scope, such as the example of a login action in the user+action model, we can add a special system object that can be used in that situation.

Our authorization requests from the application now include three pieces of data. We modify our function for authorizing a restricted operation to take an argument specifying the object, along with the user and action arguments that we already have. The authorization function looks in the grant table as before, but it now must find a row that matches all three fields rather than only user and action.
-- authorized if count>0
select count(*) from auth_grant where
    user={user} and
    action={action} and
    object={object};
The user+action+object model presented here is used in many databases, with the objects being database tables or views and the actions being the four database actions of select, insert, update and delete. There may also be additional actions such as grant (the ability to create additional grants on an object) or actions that allow creating and modifying users or databases.
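A sketch of the three-value check under the same assumptions (Python's sqlite3 module, hypothetical data). For brevity it omits the user, action and object tables and their foreign keys, and it uses the special system object for the global login action. Named placeholders such as :object play the role of the braces in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table auth_grant(
        user varchar(32) not null,
        action varchar(32) not null,
        object varchar(32) not null
    );
    insert into auth_grant values
        ('alice', 'login', 'system'),        -- global action via the system object
        ('alice', 'read', 'payroll.csv'),
        ('alice', 'write', 'payroll.csv'),
        ('bob', 'read', 'payroll.csv');
""")


def is_authorized(conn: sqlite3.Connection, user: str, action: str, obj: str) -> bool:
    """User+action+object model: all three values must match one grant row."""
    (count,) = conn.execute(
        "select count(*) from auth_grant"
        " where user=:user and action=:action and object=:object",
        {"user": user, "action": action, "object": obj},
    ).fetchone()
    return count > 0


print(is_authorized(conn, "bob", "read", "payroll.csv"))   # True
print(is_authorized(conn, "bob", "write", "payroll.csv"))  # False
```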

Roles

In order to simplify the maintenance of grants when we have a large number of users, we add a mechanism that allows us to group users together and grant permissions to a group of users rather than just to a single user. Users are grouped according to the roles they play; example roles are user, administrator, and superuser.

We add a role table with one row for each role we define. (We will look at other possible implementations later, but this choice serves well for explaining the concepts.)
create table role(name varchar(32) primary key);
In order to indicate which users have been granted (assigned) which roles, we add a user_role table with two columns: the user column is a foreign key to the user table that references the user, and the role column is a foreign key to the role table. A user having a role is indicated by adding a row to the user_role table referencing that user and that role. When checking authorization, a user receives the permissions granted to every role he has.
create table user_role(
    user varchar(32) not null,
    role varchar(32) not null,
    constraint FK_userrole_user foreign key(user)
        references user(name),
    constraint FK_userrole_role foreign key(role)
        references role(name)
);
We also add a role column to our grant table. This column is a foreign key to the one column in our role table. A row in the grant table can now refer either to a user or to a role. It must reference one or the other; while it might be possible to set up a structure to enforce that constraint directly in the database, we will skip that exercise and instead suggest that this constraint could be enforced by an application-level database consistency check.
create table auth_grant(
    user varchar(32),
    role varchar(32),
    action varchar(32) not null,
    object varchar(32) not null,
    constraint FK_grant_user foreign key(user)
        references user(name),
    constraint FK_grant_role foreign key(role)
        references role(name),
    constraint FK_grant_action foreign key(action)
        references action(name),
    constraint FK_grant_object foreign key(object)
        references object(name)
);
The addition of roles is entirely an abstraction within the authorization system; the application is not aware of roles. An operation is defined by the same three values as before, and the application calls the authorization function in the same way as before to see if an operation is authorized, but the authorization function has to do a little more work now.

The application still passes the user, action and object arguments to the authorization function, and the authorization function still looks in the grant table for a row that exactly matches those three values. In addition, it now looks up all of the roles the specified user has, and it looks for a row in the grant table whose action and object exactly match the values passed in and whose role is one of the roles the user has. If the authorization function finds a row that exactly matches the action and object and that matches either the user or any of the user's roles, then the operation is authorized; if no such row is found, the operation is not authorized.
-- authorized if count>0
select count(*) from auth_grant where
    (user={user} or role in (select role from user_role where user={user})) and
    action={action} and
    object={object};
The (user+role)+action+object model presented here has been used in the Unix filesystem for many years, with the objects being files and directories, the actions being read, write and execute/search, and the roles called groups.
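A sketch of the (user+role)+action+object check, again assuming Python's sqlite3 module with hypothetical data: bob can read the handbook through his staff role and write it through a direct user grant, while alice's admin role covers only rebooting the system. Each grant row fills in either user or role and leaves the other null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table user_role(user varchar(32) not null, role varchar(32) not null);
    -- each grant row names either a user or a role; the other column is null
    create table auth_grant(
        user varchar(32),
        role varchar(32),
        action varchar(32) not null,
        object varchar(32) not null
    );
    insert into user_role values ('alice', 'admin'), ('bob', 'staff');
    insert into auth_grant values
        (null, 'admin', 'reboot_system', 'system'),
        (null, 'staff', 'read', 'handbook.txt'),
        ('bob', null, 'write', 'handbook.txt');
""")

# Same shape as the query in the text, with named placeholders.
QUERY = """
    select count(*) from auth_grant where
        (user=:user or role in (select role from user_role where user=:user)) and
        action=:action and
        object=:object
"""


def is_authorized(conn: sqlite3.Connection, user: str, action: str, obj: str) -> bool:
    params = {"user": user, "action": action, "object": obj}
    (count,) = conn.execute(QUERY, params).fetchone()
    return count > 0


print(is_authorized(conn, "bob", "read", "handbook.txt"))    # True (via staff)
print(is_authorized(conn, "bob", "reboot_system", "system"))  # False
```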

In the NIST RBAC model permissions can only be assigned to roles, not to users. A strict implementation of this aspect could easily be implemented by dropping the user check in our authorization test (which also means we can drop the user column in the grant table):
-- authorized if count>0
select count(*) from auth_grant where
    role in (select role from user_role where user={user}) and
    action={action} and
    object={object};
Alternatively, we could think of each user as automatically being assigned a unique role whose name is the same as the user name. Or, we can choose never to assign any permissions to a user, only assigning them to roles.

Role Activation

The NIST RBAC standard includes a concept called Role Activation (or Role Authorization). When a user logs in, some subset of his roles can be activated. Allowing a user to activate and deactivate his assigned roles gives the user a way to ensure that he (or some program he is running) does not perform a privileged operation when he is not expecting it. Permissions are only granted for active roles, so even if a user has been given permissions through a role, a program will not be able to take advantage of them unless the user has activated a role that grants those permissions.

We can implement role activation globally by adding an is_active column to the user_role table.
create table user_role(
    user varchar(32) not null,
    role varchar(32) not null,
    is_active boolean not null default false,
    constraint FK_userrole_user foreign key(user)
        references user(name),
    constraint FK_userrole_role foreign key(role)
        references role(name)
);
When checking for authorization, we only include roles that are active for that user. If we continue to allow user-based permissions, then we would need to add an is_active flag for those permissions as well. When using activation it is simpler to exclude user-based permissions, as is done in the NIST RBAC model.
-- authorized if count>0
select count(*) from auth_grant where
    role in (select role from user_role where user={user} and is_active) and
    action={action} and
    object={object};
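A sketch of global role activation, assuming Python's sqlite3 module and the strict NIST style in which grants name only roles. SQLite has no separate boolean storage class, so 0 and 1 stand in for false and true. The helper set_active is hypothetical; in a real system that update would belong to the separate system that maintains the authorization data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- SQLite stores booleans as integers, so 0/1 stand in for false/true
    create table user_role(
        user varchar(32) not null,
        role varchar(32) not null,
        is_active boolean not null default 0
    );
    -- strict NIST style: grants name roles only
    create table auth_grant(
        role varchar(32) not null,
        action varchar(32) not null,
        object varchar(32) not null
    );
    insert into user_role(user, role) values ('alice', 'admin');
    insert into auth_grant values ('admin', 'reboot_system', 'system');
""")

QUERY = """
    select count(*) from auth_grant where
        role in (select role from user_role where user=? and is_active) and
        action=? and object=?
"""


def is_authorized(conn: sqlite3.Connection, user: str, action: str, obj: str) -> bool:
    (count,) = conn.execute(QUERY, (user, action, obj)).fetchone()
    return count > 0


def set_active(conn: sqlite3.Connection, user: str, role: str, active: bool) -> None:
    """Hypothetical helper: activate or deactivate one assigned role."""
    conn.execute(
        "update user_role set is_active=? where user=? and role=?",
        (1 if active else 0, user, role),
    )


print(is_authorized(conn, "alice", "reboot_system", "system"))  # False
set_active(conn, "alice", "admin", True)
print(is_authorized(conn, "alice", "reboot_system", "system"))  # True
```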
The NIST RBAC standard uses session-based activation rather than global activation. This allows a user to have multiple sessions open simultaneously with different roles active for each session. To implement this, rather than adding an is_active column to the user_role table, we create a session table that keeps track of our sessions and a session_role table that lists the roles that are active for each session.
create table session(
    id varchar(32) primary key,
    user varchar(32) not null,
    constraint FK_session_user foreign key(user)
        references user(name)
);

create table session_role(
    session_id varchar(32) not null,
    role varchar(32) not null,
    constraint FK_sessionrole_sessionid foreign key(session_id)
        references session(id),
    constraint FK_sessionrole_role foreign key(role)
        references role(name)
);
When testing for authorization we only want to use roles that are both assigned (in the user_role table) and active (in the session_role table). Assuming the mechanism that maintains active roles in the session_role table ensures that the only roles appearing in that table are in the user_role table (i.e. only an assigned role from the user_role table can be active in the session_role table), then we can modify the authorization function to accept an additional argument which is the session_id, and change our implementation SQL:
-- authorized if count>0
select count(*) from auth_grant where
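    role in (select session_role.role from session_role
        -- session_role has no user column; join session to check its owner
        join session on session.id=session_role.session_id
        where session.user={user} and session_role.session_id={session_id}) and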
    action={action} and
    object={object};
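A sketch of session-based activation, under the same assumptions (Python's sqlite3 module, hypothetical sessions s1 and s2 both belonging to alice). Because session_role has no user column, the subquery joins the session table to confirm that the session belongs to the requesting user; without that join, a request could name someone else's session and pick up its active roles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table session(id varchar(32) primary key, user varchar(32) not null);
    create table session_role(session_id varchar(32) not null,
                              role varchar(32) not null);
    create table auth_grant(role varchar(32) not null,
                            action varchar(32) not null,
                            object varchar(32) not null);
    insert into session values ('s1', 'alice'), ('s2', 'alice');
    -- admin is active in s1 only; s2 has no active roles
    insert into session_role values ('s1', 'admin');
    insert into auth_grant values ('admin', 'reboot_system', 'system');
""")

# The join ties the session to the user making the request.
QUERY = """
    select count(*) from auth_grant where
        role in (select session_role.role from session_role
                 join session on session.id=session_role.session_id
                 where session.user=:user and session.id=:session_id) and
        action=:action and object=:object
"""


def is_authorized(conn, user, session_id, action, obj) -> bool:
    params = {"user": user, "session_id": session_id,
              "action": action, "object": obj}
    (count,) = conn.execute(QUERY, params).fetchone()
    return count > 0


print(is_authorized(conn, "alice", "s1", "reboot_system", "system"))  # True
print(is_authorized(conn, "alice", "s2", "reboot_system", "system"))  # False
print(is_authorized(conn, "bob", "s1", "reboot_system", "system"))    # False
```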
At this point our model includes the capabilities of RBAC0, the first level of the NIST RBAC standard (although the NIST model does not include action and object as presented above). However, in order to keep the discussion of the other aspects of the model less cluttered, I will generally not be including role activation in the remainder of this discussion except where noted.

Role Hierarchies

Given the ability to group users into roles and thus reduce the number of grants we need to create, we can generalize that concept by also allowing roles to be grouped into other roles.

In the discussion of Roles above, we added a user_role table that allowed us to assign roles to users. We now add a role_hierarchy table with parent and child columns that allows us to assign roles (children) to other roles (parents).
create table role_hierarchy(
    parent varchar(32),
    child varchar(32),
    constraint FK_rolehierarchy_parent foreign key(parent)
        references role(name),
    constraint FK_rolehierarchy_child foreign key(child)
        references role(name)
);
When collecting the list of roles for a user, we now have to recursively consult the role_hierarchy table to collect all of the child roles for any role the user has. How this is actually done depends heavily on the implementation. Some SQL databases can formulate recursive queries (for example, through recursive common table expressions), but not all can.

We hide this implementation detail inside a view that collects the closure of the role-role relationships, effectively flattening our hierarchy. Defining this flattening in a view allows us to change how we collect the closure of the roles without affecting the queries that invoke this view. In this particular example, our view is defined using a non-recursive query that will suffice for a hierarchy of limited depth.
-- not a full closure if the hierarchy is more than three levels deep
create view role_closure as
    select user, role from user_role
    union
    select user_role.user, h1.child from user_role
        join role_hierarchy as h1 on user_role.role=h1.parent
    union
    select user_role.user, h2.child from user_role
        join role_hierarchy as h1 on user_role.role=h1.parent
        join role_hierarchy as h2 on h1.child=h2.parent
    union
    select user_role.user, h3.child from user_role
        join role_hierarchy as h1 on user_role.role=h1.parent
        join role_hierarchy as h2 on h1.child=h2.parent
        join role_hierarchy as h3 on h2.child=h3.parent
    ;
We can now use the role_closure view in place of the user_role table:
-- authorized if count>0
select count(*) from auth_grant where
    (user={user} or role in (select role from role_closure where user={user})) and
    action={action} and
    object={object};
If we want to use session-based activation, we can do that by modifying our role_closure view to be based on the session_role table rather than the user_role table:
-- not a full closure if the hierarchy is more than three levels deep
create view role_closure as
    select closure.session_id, session.user, closure.role from session
        join (
            select session_id, role from session_role
            union
            select session_id, h1.child from session_role
                join role_hierarchy as h1 on session_role.role=h1.parent
            union
            select session_id, h2.child from session_role
                join role_hierarchy as h1 on session_role.role=h1.parent
                join role_hierarchy as h2 on h1.child=h2.parent
            union
            select session_id, h3.child from session_role
                join role_hierarchy as h1 on session_role.role=h1.parent
                join role_hierarchy as h2 on h1.child=h2.parent
                join role_hierarchy as h3 on h2.child=h3.parent
        ) as closure on closure.session_id=session.id
    ;
As above when adding session-based role activation, the authorization SQL includes the session-id and we no longer allow user-based permissions:
-- authorized if count>0
select count(*) from auth_grant where
    role in (select role from role_closure where user={user} and session_id={session_id}) and
    action={action} and
    object={object};
This change can be made in any of the authorization SQL statements given below to add session-based authorization where it is otherwise not included.

Note that although we stated that the role parent/child relationships form a hierarchy, there is actually no reason to limit it to that, and our design does not preclude defining role relationships that form a more complex graph. We do want to avoid cycles in our role graph, as a graph with cycles would not provide any useful benefit, and we need to ensure that our implementation does not blow up if the role graph happens to contain some cycles. If we use the role_closure view implementation provided above, an incidental benefit is that the closure mechanism is so simple and limited that cycles will not cause any problems other than wasting a bit of processing power.
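Where the database supports recursive common table expressions (SQLite has since version 3.8.3, as do PostgreSQL and MySQL 8.0), role_closure can instead be defined without a depth limit, and the authorization queries that use the view do not need to change. This sketch assumes Python's sqlite3 module and hypothetical roles a through e. It uses UNION rather than UNION ALL so that rows already produced are discarded; a cycle in role_hierarchy therefore ends the recursion instead of looping forever.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table user_role(user varchar(32) not null, role varchar(32) not null);
    create table role_hierarchy(parent varchar(32), child varchar(32));

    -- Unbounded closure: the recursive step follows parent -> child links.
    -- UNION (not UNION ALL) discards rows already produced, so a cycle
    -- in role_hierarchy ends the recursion instead of looping forever.
    create view role_closure as
        with recursive closure(user, role) as (
            select user, role from user_role
            union
            select closure.user, role_hierarchy.child
                from closure
                join role_hierarchy on closure.role=role_hierarchy.parent
        )
        select user, role from closure;

    insert into user_role values ('alice', 'a');
    -- a chain of roles that loops back to 'a'
    insert into role_hierarchy values
        ('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'a');
""")

roles = sorted(r for (r,) in conn.execute(
    "select role from role_closure where user=?", ("alice",)))
print(roles)  # ['a', 'b', 'c', 'd', 'e']
```

Role e sits four links below alice's directly assigned role, deeper than a three-level view would reach, and the e-to-a link that closes the cycle is harmless.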

The NIST RBAC standard defines both general and restricted forms of hierarchy as part of the RBAC1 level. The restricted form is a tree structure and the general form is an arbitrary partial order. Our model above supports the general form.

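NIST RBAC levels RBAC2 and RBAC3 add Constraints (to ensure support of Separation of Duties) and Symmetry (the ability to review permission-role assignments as well as user-role assignments). With the simple database implementation presented here, these capabilities are readily available: permission-role assignments can be reviewed with an ordinary query on the grant table, and separation-of-duty constraints can be checked by the system that maintains the authorization data.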

Alternate Hierarchy Implementations

In the implementation of user roles and role hierarchies above we added a role table, a user_role table and a role_hierarchy table, we added a role column to the grant table, we added a role_closure view and we modified our example SQL select statement for checking authorization to use that view. In this section I present three alternate approaches to this step when using a relational database, and of course there are other approaches not discussed here that are not based on a relational database. These implementation alternatives do not affect the basic model being developed.

In the first alternate approach, after defining the role table we define a user_or_role view that is the union of the user and role tables.
create view user_or_role as
    (select name from user)
    union all
    (select name from role)
;
In the grant table, rather than adding a role column and having the user column be a foreign key to the user table, we make the user column a foreign key to the user_or_role view. Unfortunately, it is typically not possible to declare a foreign key to a view, in which case this foreign key relationship would have to remain implicit and not enforced by the database (it could be part of our application-level database consistency checks). Nonetheless, the SQL statements that join using this foreign key will work the same as if the foreign key were declared, although performance may be an issue if the user_or_role view cannot be indexed. By using a materialized view it might be possible to index the view and have a foreign key refer to it, but then we would need to rematerialize the view every time we changed the contents of the user or role tables.

Instead of creating a role_hierarchy table, we do the same thing to the user column of the user_role table as we did to the grant table, making it a foreign key to the user_or_role view rather than to the user table. This allows the user_role table to represent which roles have other roles as well as which roles users have directly been given.

In our second alternate implementation, we start by defining user_or_role as a table that contains the records for both users and roles, with an is_role column that indicates whether a row represents a user or a role. We then create user and role as appropriate views into that table.
create table user_or_role (
    name varchar(32) primary key,
    is_role boolean not null default false
);

create view user as
    select name from user_or_role where not is_role;

create view role as
    select name from user_or_role where is_role;
As in our first alternate implementation, the grant table points to the user_or_role table, as does the user column in the user_role table.
create table auth_grant(
    user_or_role varchar(32) not null,
    action varchar(32) not null,
    object varchar(32) not null,
    constraint FK_grant_user foreign key(user_or_role)
        references user_or_role(name),
    constraint FK_grant_action foreign key(action)
        references action(name),
    constraint FK_grant_object foreign key(object)
        references object(name)
);

create table user_role(
    user varchar(32) not null,
    role varchar(32) not null,
    constraint FK_userrole_user foreign key(user)
        references user_or_role(name),
    constraint FK_userrole_role foreign key(role)
        references role(name)
);
Many databases, including MySQL, do not allow indexes or foreign keys on views, so neither of the above two alternate implementations will work very well on those databases, and the table statements would have to be modified not to declare foreign keys to view columns.

If we want to use indexes and foreign keys, we have to compromise our data model a bit and not use views when we need foreign keys, which leads us to our final alternative.

In our third alternate implementation, we don't have a separate role table or view. Instead, we use the user_or_role approach as in the second alternative above: we place the role names into the user table and add an is_role column that indicates whether a row represents a user or a role.
create table user (
    name varchar(32) primary key,
    is_role boolean not null default false
);
In our user_role table, in which the role column was a foreign key to the role table, we make that column instead be a foreign key to the user table, where we are now storing our role names.
create table user_role(
    user varchar(32) not null,
    role varchar(32) not null,
    constraint FK_userrole_user foreign key(user)
        references user(name),
    constraint FK_userrole_role foreign key(role)
        references user(name)
);
We don't need a role_hierarchy table because we can now represent those role-to-role relationships in the user_role table. In our role_closure view we replace the role_hierarchy references with user_role references.
create view role_closure as
    select distinct a0.user, a3.role from user_role as a0
        join user_role as a1 on a0.role=a1.user or
            (a0.user=a1.user and a0.role=a1.role)
        join user_role as a2 on a1.role=a2.user or
            (a0.user=a2.user and a0.role=a2.role)
        join user_role as a3 on a2.role=a3.user or
            (a0.user=a3.user and a0.role=a3.role)
    ;
Because we are now storing our roles in the user table, the user column in our grant table can refer to either a user or a role, depending on what we are storing in the user table, so we don't need the role column and we can go back to the previous definition that did not have that column.
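The depth limit of this role_closure view is easy to observe. This sketch assumes Python's sqlite3 module and a hypothetical chain of five role links, and runs the view as defined above: alice receives her directly assigned role and the three roles above it in the chain, but not entity, which is four links away.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table user(name varchar(32) primary key,
                      is_role boolean not null default 0);
    create table user_role(
        user varchar(32) not null references user(name),
        role varchar(32) not null references user(name)
    );
    insert into user values ('alice', 0), ('staff', 1), ('employee', 1),
                            ('person', 1), ('being', 1), ('entity', 1);
    -- alice has staff; each role has the next one, five links in all
    insert into user_role values ('alice', 'staff'), ('staff', 'employee'),
        ('employee', 'person'), ('person', 'being'), ('being', 'entity');
    -- the depth-limited view from the text
    create view role_closure as
        select distinct a0.user, a3.role from user_role as a0
            join user_role as a1 on a0.role=a1.user or
                (a0.user=a1.user and a0.role=a1.role)
            join user_role as a2 on a1.role=a2.user or
                (a0.user=a2.user and a0.role=a2.role)
            join user_role as a3 on a2.role=a3.user or
                (a0.user=a3.user and a0.role=a3.role);
""")

roles = sorted(r for (r,) in conn.execute(
    "select role from role_closure where user=?", ("alice",)))
print(roles)  # ['being', 'employee', 'person', 'staff']
```

The "or" branch in each join matches the original a0 row again, which is what lets the view also return roles fewer than three links away.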

With this implementation our foreign key constraints all work because we are not dealing with any views, and our table structure is simpler because we have combined users and roles into one table. Putting roles into the user table is, however, just a convenient fiction that simplifies our implementation in those situations where we want to treat users and roles the same. If we forget about the difference between them and start treating them the same in other situations, we can easily start getting absurd behavior from our system.

(I have a mental image of our legal system as having a people table, and a law table with a foreign key to the people table. At some early point, someone wanted some laws that applied to corporations as well as people, so they said, "I know, let's just add an is_corporation flag to the people table and put the corporations in there, then our foreign keys from the law table will still work and we won't need to add a bunch more structure to our law schema!" With the passage of time, law programmers who should have been paying attention to the is_corporation flag started ignoring it more and more often, until finally the law programmers were saying, "Well, those corporations are in the people table, so they must be people." If you are concerned that this kind of situation might happen to you, you might not want to put roles into the user table.)

For the remainder of this discussion, we will use this third alternate implementation approach.

Interlude

In the above discussions, I have been assuming that the names of users, actions, objects and roles are also their key values. This implies that each of those names is unique. Given that I have discussed a couple of implementations in which users and roles have been mixed together, you might wonder whether it would cause problems to add a user whose name is the same as a role. In the above simple implementation the answer is "yes", and the system would have to disallow that. A real system is likely to be a bit more complex, using unique IDs as primary keys rather than names. The problem of having unique names thus moves from a database issue to an application-level issue. The system implementer must decide under what circumstances it is acceptable to have duplicate names, and there must be a way for someone operating the system to distinguish those duplicates.

We have reached a point in the development of our authorization model that is similar in power to many existing systems. People who need more flexibility than this model provides might diverge at this point into custom authorization systems with various forms of exceptions and extensions that rapidly start adding complexity to the model.

There are still a number of extensions we can make to our authorization model that will improve its power while adding only a small amount to the cognitive load of understanding how it all works. Let's get back to our model and add some more power to it.

Tasks

In the same way that we allow specifying a group of users having a role, we add the ability to specify a group of actions, which we call a task. The relation between tasks and actions is exactly analogous to the relation between users and roles. Each action can be assigned to multiple tasks, a task can be assigned other tasks, and an authorization grant can refer either to an action or to a task.

Analogous to our third alternative implementation above, in which we added an is_role column to the user table and put roles into the user table, for the equivalent addition of tasks we add an is_task column to the action table, add an action_task table whose action and task columns are both foreign key references to the action table, and add a task_closure view.
create table action(
    name varchar(32) primary key,
    is_task boolean not null
);

create table action_task(
    action varchar(32) not null,
    task varchar(32) not null,
    constraint FK_actiontask_action foreign key(action)
        references action(name),
    constraint FK_actiontask_task foreign key(task)
        references action(name)
);

create view task_closure as
    select distinct a0.action, a3.task as task from action_task as a0
        join action_task as a1 on a0.task=a1.action or
            (a0.action=a1.action and a0.task=a1.task)
        join action_task as a2 on a1.task=a2.action or
            (a0.action=a2.action and a0.task=a2.task)
        join action_task as a3 on a2.task=a3.action or
            (a0.action=a3.action and a0.task=a3.task)
    ;
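To see the fixed-depth view in action, here is a minimal sketch in Python using the standard sqlite3 module. It runs the schema above in an in-memory SQLite database, with the view's select list written as a0.action (SQLite, like most engines, rejects a bare action as ambiguous because a0 through a3 all have that column). The chain of sample tasks (view, browse, access, everything, root) is hypothetical.

```python
import sqlite3

# Schema from the text; the view's select list says a0.action because a bare
# "action" would be ambiguous (a0, a1, a2 and a3 all have that column).
SCHEMA = """
create table action(
    name varchar(32) primary key,
    is_task boolean not null
);
create table action_task(
    action varchar(32) not null references action(name),
    task varchar(32) not null references action(name)
);
create view task_closure as
    select a0.action as action, a3.task as task from action_task as a0
        join action_task as a1 on a0.task=a1.action or
            (a0.action=a1.action and a0.task=a1.task)
        join action_task as a2 on a1.task=a2.action or
            (a0.action=a2.action and a0.task=a2.task)
        join action_task as a3 on a2.task=a3.action or
            (a0.action=a3.action and a0.task=a3.task);
"""

# Hypothetical sample data: a chain of tasks five levels above "read".
EDGES = [("read", "view"), ("view", "browse"), ("browse", "access"),
         ("access", "everything"), ("everything", "root")]


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    names = sorted({name for edge in EDGES for name in edge})
    # Every name except the plain action "read" is a task.
    conn.executemany("insert into action(name, is_task) values (?, ?)",
                     [(name, name != "read") for name in names])
    conn.executemany("insert into action_task(action, task) values (?, ?)",
                     EDGES)
    return conn


def tasks_of(conn, action):
    """Tasks containing `action`, as far as the four-level view reaches."""
    rows = conn.execute(
        "select distinct task from task_closure where action = ?", (action,))
    return {task for (task,) in rows}


conn = build_db()
print(sorted(tasks_of(conn, "read")))  # ['access', 'browse', 'everything', 'view']
```

The closure of read stops at everything: root sits five levels up, beyond reach of the three extra joins, which is the hierarchy-depth caveat in concrete form.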
We expand our authorization query to look for tasks in the same way as we expanded it to handle roles, with the same caveats about hierarchy depth.
-- authorized if count>0
select count(*) from auth_grant where
    (user={user} or user in (select role from role_closure where user={user})) and
    (action={action} or action in (select task from task_closure where action={action})) and
    object={object};
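If the database supports recursive common table expressions (SQLite, PostgreSQL and MySQL 8.0 all do), the depth caveat can be removed by computing the closures inside the query instead of through fixed-depth views. This is a swapped-in technique, not the approach described above. The sketch assumes a user_role table with user and role columns, uses SQLite named parameters such as :user in place of the {user} replacement variables, and uses hypothetical sample data with a five-level task chain.

```python
import sqlite3

SCHEMA = """
create table user_role(user text not null, role text not null);
create table action_task(action text not null, task text not null);
create table auth_grant(user text not null, action text not null,
                        object text not null);
"""

# Same shape as the count(*) query in the text, but each closure is a
# recursive CTE, so there is no fixed limit on hierarchy depth.
AUTH_QUERY = """
with recursive
    roles(role) as (
        select role from user_role where user = :user
        union
        select ur.role from user_role as ur join roles on ur.user = roles.role
    ),
    tasks(task) as (
        select task from action_task where action = :action
        union
        select t.task from action_task as t join tasks on t.action = tasks.task
    )
select count(*) from auth_grant where
    (user = :user or user in (select role from roles)) and
    (action = :action or action in (select task from tasks)) and
    object = :object
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical data: alice -> staff -> employee, and a five-level task chain.
    conn.executemany("insert into user_role values (?, ?)",
                     [("alice", "staff"), ("staff", "employee")])
    conn.executemany("insert into action_task values (?, ?)",
                     [("read", "view"), ("view", "browse"), ("browse", "access"),
                      ("access", "everything"), ("everything", "root")])
    conn.execute("insert into auth_grant values ('employee', 'root', 'report')")
    return conn


def is_authorized(conn, user, action, obj):
    params = {"user": user, "action": action, "object": obj}
    (count,) = conn.execute(AUTH_QUERY, params).fetchone()
    return count > 0  # authorized if count > 0, as in the text


conn = build_db()
print(is_authorized(conn, "alice", "read", "report"))  # True
```

Using union rather than union all discards rows that were already produced, so the recursion also terminates if someone accidentally creates a cycle of groups.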

Domains

Roles and tasks give us the ability to group users and actions. We complete the pattern by adding the ability to group objects into groups that we call domains (not to be confused with internet domain names). As with the tasks example above, we add the is_domain column to the object table, create the object_domain table to allow defining groups of objects, create the domain_closure view, and modify the authorization function to check for either objects or domains in the same way as we modified it to check for either actions or tasks. All of these steps are exactly analogous to what we did when we added tasks.
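Because the domain pieces are exactly analogous to the task pieces, the fixed-depth closure view could be generated rather than written by hand. The helper below is a hypothetical sketch (closure_view_sql is not part of any library) that emits a view in the same shape as task_closure for any membership table and depth; here it builds domain_closure over an object_domain table with object and domain columns, using made-up sample data.

```python
import sqlite3


def closure_view_sql(view, table, member, group, depth=4):
    """Return DDL for a fixed-depth closure view shaped like task_closure.

    Each extra join either follows one more membership edge or re-matches the
    starting row, so the view reports every group up to `depth` levels above a
    member. Names are pasted in directly, so pass only trusted identifiers.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    lines = [
        f"create view {view} as",
        f"    select a0.{member} as {member}, a{depth - 1}.{group} as {group}",
        f"    from {table} as a0",
    ]
    for i in range(1, depth):
        lines.append(
            f"    join {table} as a{i} on a{i - 1}.{group}=a{i}.{member} or"
            f" (a0.{member}=a{i}.{member} and a0.{group}=a{i}.{group})")
    return "\n".join(lines) + ";"


conn = sqlite3.connect(":memory:")
conn.executescript("""
create table object(name varchar(32) primary key, is_domain boolean not null);
create table object_domain(object varchar(32) not null,
                           domain varchar(32) not null);
""")
conn.executescript(
    closure_view_sql("domain_closure", "object_domain", "object", "domain"))
# Hypothetical data: q3_report is in finance, inside internal, inside company.
conn.executemany("insert into object_domain values (?, ?)",
                 [("q3_report", "finance"), ("finance", "internal"),
                  ("internal", "company")])
domains = {d for (d,) in conn.execute(
    "select distinct domain from domain_closure where object = ?",
    ("q3_report",))}
print(sorted(domains))  # ['company', 'finance', 'internal']
```

Passing a larger depth adds joins and raises the limit on hierarchy depth; it does not remove the limit.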

Intermediate Summary

Let's take stock of what our model looks like:
  • There are three dimensions: user, action, and object.
  • The handling of the three dimensions is completely symmetric (unless role activation is being used, in which case the user dimension has that extra wrinkle).
  • The application passes those three values to the authorization function, which returns true if that operation is authorized, false if not.
  • For each dimension, there is a grouping mechanism: role for user group, task for action group, domain for object group.
  • The grouping mechanism supports a hierarchy of groups, or more generally a (directed acyclic) graph of groups (a partial ordering).
  • To determine if a request should be authorized, take each dimension, collect the closure of the groups for that dimension, and look for a grant in which each dimension of the grant matches any of the items in the closure for that dimension.
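The summary above translates almost line for line into code. This is a minimal in-memory sketch in Python, assuming each grouping relation is given as a dictionary that maps an element to the groups it directly belongs to; all names in the sample data are hypothetical. Unlike the fixed-depth views, the closure here is computed with a breadth-first walk, so it has no depth limit.

```python
from collections import deque


def closure(item, parents):
    """Return `item` plus every group reachable from it.

    `parents` maps an element (user, action or object) to the groups it
    belongs to directly. Each group is visited once, so a group shared by
    two branches of the graph is not expanded twice.
    """
    seen = {item}
    queue = deque([item])
    while queue:
        for group in parents.get(queue.popleft(), ()):
            if group not in seen:
                seen.add(group)
                queue.append(group)
    return seen


def is_authorized(user, action, obj, grants, roles, tasks, domains):
    """True if some grant matches every dimension of the operation."""
    users = closure(user, roles)
    actions = closure(action, tasks)
    objects = closure(obj, domains)
    return any(u in users and a in actions and o in objects
               for (u, a, o) in grants)


# Hypothetical sample data: one grant, expressed entirely in terms of groups.
ROLES = {"alice": ["accountant"], "accountant": ["staff"]}
TASKS = {"read": ["view"], "write": ["edit"]}
DOMAINS = {"q3_report": ["finance"]}
GRANTS = [("staff", "view", "finance")]

print(is_authorized("alice", "read", "q3_report", GRANTS, ROLES, TASKS, DOMAINS))   # True
print(is_authorized("alice", "write", "q3_report", GRANTS, ROLES, TASKS, DOMAINS))  # False
```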
The model presented above is easy to understand, but despite its simplicity it is quite powerful. Yet it does not suffice for everyone. Let's see how we can continue to enhance its power without significantly increasing its complexity.

Times, Periods and Schedules

In some systems it is desirable to allow some operations only at specified times. For example, one might want to allow users to log in to the system only during their work shift.

We define another dimension, the time dimension, and we define a time range as a period, where a period is an interval of time such as 8AM to 5PM, or Sunday, or 8AM to 5PM on weekdays. We add the time dimension to our definition of an operation, so when the application calls the authorization function, it must now pass the current time as a fourth argument.

The dimensions we have defined previously are all discrete dimensions, with only one matching value for each definition. The time dimension is different in that it is a continuous dimension: there are multiple time values that can match a period. This makes the authorization function a little more difficult to write, but it does not add much complexity to the user's conceptual model.

The other dimensions all have groups, so it would not add to the complexity of the model to add groups of periods. In fact, the model would be more complex if we did not add groups of periods, as that would make this dimension different from all the others in that aspect, which would be an additional detail that the user would have to factor into his mental model.

We add a group called schedule. As with all the other groups, a period can be included in any number of schedules, and schedules can contain other schedules. When checking authorization, we collect all the periods that match the current time and the closure of all the schedules for those periods, and we search for grants that include any of those in the period column.
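Here is a sketch of how the time dimension might be matched, assuming a period is a daily window (inclusive at both ends) on a set of weekdays; periods that cross midnight, specific dates and time zones are not handled. The Period class and the period and schedule names are hypothetical. The function returns the matching periods plus every schedule above them, which is the set a grant's period column would be compared against.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Period:
    """Hypothetical period: a daily window on a set of weekdays.

    Days use datetime.weekday() numbering (0 = Monday ... 6 = Sunday). The
    window is inclusive at both ends and must not cross midnight.
    """
    name: str
    days: frozenset
    start: time
    end: time

    def contains(self, when: datetime) -> bool:
        return when.weekday() in self.days and self.start <= when.time() <= self.end


PERIODS = [
    Period("weekday_hours", frozenset(range(5)), time(8), time(17)),
    Period("sunday", frozenset({6}), time.min, time.max),
]
# Schedules contain periods and other schedules (element -> its schedules).
SCHEDULES = {"weekday_hours": ["day_shift"], "day_shift": ["business_time"],
             "sunday": ["weekend"]}


def time_closure(when, periods, schedules):
    """Names of periods containing `when`, plus every schedule above them."""
    found = set()
    stack = [p.name for p in periods if p.contains(when)]
    while stack:
        name = stack.pop()
        if name not in found:
            found.add(name)
            stack.extend(schedules.get(name, ()))
    return found


# Wednesday, October 24, 2012 at 9:30 AM
print(sorted(time_closure(datetime(2012, 10, 24, 9, 30), PERIODS, SCHEDULES)))
# ['business_time', 'day_shift', 'weekday_hours']
```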

Locations, Areas and Regions

By now the pattern should be pretty clear. If the system requires other dimensions, they are easy to add by following the same pattern. By keeping to the pattern, the complexity of the model that the user must work with to understand the system is kept low, even when there is some small difference for the new dimension, as there was for the time dimension when compared to the three previously defined dimensions. When a dimension does need a small model extension, as the time dimension did, we can leverage that same extension when adding some other dimension.

Location is a system-specific concept. For some systems it might be a logical location, such as "console", "secure terminal", or "dial up". Since these are discrete values, it would suffice to have a location table, group locations in regions, and handle it in the same manner as the other discrete dimensions such as user.

For other systems a location might mean a physical location specified by one or more continuous values, such as latitude and longitude, in which case we define an area analogously to a period, where one area includes a range of locations. The area might be defined with a center point and radius, with a bounding box, as a polygon, using splines, or in some even more complex way. As with periods, the complexity of the definition of an area affects the difficulty of implementing the authorization function, which has to determine whether a location is or is not in an area, but has little effect on the complexity of the user's mental model of the authorization. For the user, it is sufficient to know that a given location will be either contained in or not contained in an area, and that grants are based on areas.
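For a continuous location, the authorization function needs one containment test per kind of area. The sketch below assumes planar (x, y) coordinates, which is only an approximation for latitude and longitude over small areas, and shows three of the shapes mentioned above: a center point and radius, a bounding box, and a polygon tested by ray casting. The class and area names are hypothetical; grouping the matched areas into regions would use the same closure walk as the other dimensions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    center: tuple  # (x, y)
    radius: float

    def contains(self, p):
        return math.dist(self.center, p) <= self.radius


@dataclass(frozen=True)
class Box:
    low: tuple   # (min_x, min_y)
    high: tuple  # (max_x, max_y)

    def contains(self, p):
        return (self.low[0] <= p[0] <= self.high[0]
                and self.low[1] <= p[1] <= self.high[1])


@dataclass(frozen=True)
class Polygon:
    vertices: tuple  # ((x, y), ...) in order around the boundary

    def contains(self, p):
        # Ray casting: count boundary crossings of a ray from p toward +x.
        x, y = p
        inside = False
        n = len(self.vertices)
        for i in range(n):
            x1, y1 = self.vertices[i]
            x2, y2 = self.vertices[(i + 1) % n]
            if (y1 > y) != (y2 > y):  # edge straddles the ray's line
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside


# Hypothetical areas; any object with a contains(point) method would do.
AREAS = {
    "campus": Box((0, 0), (5, 5)),
    "lab": Polygon(((0, 0), (4, 0), (0, 4))),
    "depot": Circle((10, 10), 2),
}


def areas_containing(point, areas):
    """Names of the areas that include `point`."""
    return {name for name, area in areas.items() if area.contains(point)}


print(sorted(areas_containing((1, 1), AREAS)))  # ['campus', 'lab']
```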

Our group for an area is a region, and it groups together areas and other regions in the same way as the groups in the other dimensions.

Denials

The approach described above is essentially a "whitelist" approach, which is the standard approach to authorization. If an operation is listed in the grant table then it is allowed; any operation which is not listed is not allowed.

It is also possible to use a "blacklist" approach: rather than allowing what is listed and denying everything else, we can deny what is listed and allow everything else. In this case we would create a denial table that is exactly like the grant table except that it contains operations to be denied rather than operations to be allowed. The authorization function would do the same search as before, except that it would deny the operation if any matching records were found, and allow the operation otherwise.

Using a blacklist approach to authorization as just described is generally not recommended (in fact the NIST RBAC standard specifically recommends against "Negative permissions", although it does not outright disallow them). Since the default action is to allow an operation, if a new operation is added to the system and through oversight the appropriate denials are not added, then there is no protection for the new operations.

Exceptions

We can combine the original grant approach and the denial approach described just above to give us the ability to have both a whitelist and a blacklist. We start with our original grant table approach, following the recommended position that the default is to deny any operation unless it is explicitly granted; on top of that, we add the denial table as exceptions to the grants.

Our authorization function first looks in the denial table; if a matching record is found, then the request is denied. If no matching record is found, then the function looks in the grant table; if a matching record is found, then the request is granted; otherwise it is denied.

This allows the admin to think in terms of exceptions: grant privileges to all of X, except for Y. In some situations this allows expressing the intended grants more simply than if one is restricted to just additive grants.
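The exception rule (check denials first, then grants, and deny by default) fits in a few lines. In this sketch, which is an assumed representation for illustration, an operation is passed as a tuple of already-expanded closures, one set per dimension, and a rule matches when each of its values appears in the corresponding set. The hypothetical rules let staff read everything in the finance domain except the payroll object.

```python
def matches(rule, op):
    """True if every value in the rule appears in the matching closure set."""
    return all(value in expanded for value, expanded in zip(rule, op))


def authorize(op, grants, denials):
    """Deny if any denial matches; else allow only if a grant matches."""
    if any(matches(rule, op) for rule in denials):
        return False
    return any(matches(rule, op) for rule in grants)


# Hypothetical rules: staff may read all of finance, except payroll.
GRANTS = [("staff", "read", "finance")]
DENIALS = [("staff", "read", "payroll")]

# Each operation is (user closure, action closure, object closure).
read_report = ({"alice", "staff"}, {"read"}, {"q3_report", "finance"})
read_payroll = ({"alice", "staff"}, {"read"}, {"payroll", "finance"})

print(authorize(read_report, GRANTS, DENIALS))   # True
print(authorize(read_payroll, GRANTS, DENIALS))  # False
```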

We could also flip the grant and denial tables around, first looking in the grant table for a match, then looking in the denial table for a match, then granting if nothing is found. As discussed in the previous section, this is not recommended, but understanding that it is possible is conceptually useful, and leads us to our last enhancement.

Prioritization

The structure of the grant and denial tables is identical, and their contents are checked in the same way, with the only difference being an inversion of the interpretation of the results in one case as compared to the other. We can easily combine both of these tables into a single auth table that includes an additional allow column that is true for all records from the grant table and false for all the records from the denial table. We can also add a priority column that we use to determine which records we should attend to first.
create table auth(
    id integer auto_increment primary key,
    allow boolean not null default true,
    priority integer not null default 0, -- higher values take precedence
    user varchar(32) not null,
    action varchar(32) not null,
    object varchar(32) not null,
    period varchar(32) not null,
    area varchar(32) not null,
    constraint FK_auth_user foreign key(user)
        references user(name),
    constraint FK_auth_action foreign key(action)
        references action(name),
    constraint FK_auth_object foreign key(object)
        references object(name),
    constraint FK_auth_period foreign key(period)
        references period(name),
    constraint FK_auth_area foreign key(area)
        references area(name)
);
If we define the priority value such that higher values are more important than lower values, then we can get the same behavior as described in the first part of the previous section by setting the priority on all the denial records to 2 and setting the priority on all the grant records to 1. Our authorization function then looks in the auth table for the matching record with the highest priority value and looks at the allow value for that record.

If we wanted to get the (non-recommended) behavior as described at the end of the previous section, we could do that by setting the priority of all the grant records to 2 and setting the priority of all the denial records to 1, plus making the default behavior (when no matching rows are found) allow the operation.

Given this structure, we can of course put in records with any priority value. This allows building up a series of toggling exceptions, much like the way leap years in the Gregorian calendar are defined (each year has 365 days, except every 4th year is a leap year with 366 days, except every 100th year is not a leap year, except every 400th year is a leap year).
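The leap-year comparison can be made literal by encoding the Gregorian rule as prioritized rules, where allow means "is a leap year" and the highest-priority matching rule wins. This is only an illustration of toggling exceptions, not authorization code; the Rule type is a hypothetical helper, and the result can be checked against the standard library's calendar.isleap.

```python
import calendar
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    priority: int                  # higher values take precedence
    allow: bool                    # here, "allow" means "is a leap year"
    matches: Callable[[int], bool]


GREGORIAN = [
    Rule(1, False, lambda year: True),              # every year has 365 days
    Rule(2, True, lambda year: year % 4 == 0),      # except every 4th year
    Rule(3, False, lambda year: year % 100 == 0),   # except every 100th year
    Rule(4, True, lambda year: year % 400 == 0),    # except every 400th year
]


def is_leap(year, rules=GREGORIAN):
    """Let the highest-priority matching rule decide; no match means False."""
    matching = [rule for rule in rules if rule.matches(year)]
    if not matching:
        return False
    return max(matching, key=lambda rule: rule.priority).allow


print([y for y in (1900, 2000, 2012, 2013) if is_leap(y)])  # [2000, 2012]
```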

Since we can stack up alternating grant and denial records, the only distinction between the "whitelist" and "blacklist" approaches discussed earlier is the question of what the default is when no matching records are found in the auth table (the default for whitelisting is deny, the default for blacklisting is grant). Given that using a default of allow is not recommended, we define the system to use a default of deny, but we provide a way that the system can effectively be set up with a default of allow if desired.

To simulate a default of allow, the admin can create a group for each of the dimensions in our authorization model (user, action, etc.) that includes all elements of that dimension. Thus there would be an AllUsers role, an AllActions task, an AllObjects domain, etc. The admin then creates a rule that includes all of these groups with allow set to true and priority set to zero. Since the rule has been defined to include all elements of every dimension, it will always match every operation, so there will never be a case where there are no matches and the system default of deny is used. Assuming all other priority values are greater than zero, this rule will have the lowest priority, so it will only have an effect if there are no other matches, and thus it acts as the default.

With the model as described so far, there is one more potential ambiguity to resolve: what happens if there are two rules with the same priority but opposite allow values? (Two rules with the same priority and the same allow value are not a problem, as they both give the same result.) We resolve this ambiguity by defining the denial records to take precedence over the grant records when they have the same priority value. This definition reduces nicely to the desired behavior for the simplest denial+grant case when all records have the same priority.

Our authorization function thus looks for all matching records in the auth table, sorts them by priority (highest first) and then by allow (denials before grants within the same priority), picks the first one, and uses its allow value to determine whether to allow the operation. If no matching records are found, the operation is not allowed.

Ignoring for now the more complicated portions of the WHERE clause for selecting time and location, here is our SQL statement for determining if an operation is authorized:
-- The single selected value is true if authorized; if false or no records, not authorized
select allow from auth where
    (user={user} or user in (select role from role_closure where user={user})) and
    (action={action} or action in (select task from task_closure where action={action})) and
    (object={object} or object in (select domain from domain_closure where object={object}))
    order by priority desc, allow asc  -- false (deny) sorts before true (allow)
    limit 1;
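Here is a minimal sketch of the prioritized query running against SQLite from Python. To keep it short, the role, task and domain closures are stored as plain tables of precomputed rows rather than as views, the period and area columns are omitted, and named parameters such as :user stand in for the {user} replacement variables; all user, action and object names are hypothetical. The optional catch-all rule at priority zero shows the simulated default of allow.

```python
import sqlite3

SCHEMA = """
create table auth(
    id integer primary key,
    allow boolean not null default 1,
    priority integer not null default 0, -- higher values take precedence
    user varchar(32) not null,
    action varchar(32) not null,
    object varchar(32) not null
);
-- Precomputed closure rows, stored as tables to keep the sketch short.
create table role_closure(user varchar(32), role varchar(32));
create table task_closure(action varchar(32), task varchar(32));
create table domain_closure(object varchar(32), domain varchar(32));
"""

# The prioritized query, with named parameters.
QUERY = """
select allow from auth where
    (user=:user or user in (select role from role_closure where user=:user)) and
    (action=:action or action in (select task from task_closure where action=:action)) and
    (object=:object or object in (select domain from domain_closure where object=:object))
    order by priority desc, allow asc
    limit 1
"""

RULES = [  # (allow, priority, user, action, object)
    (1, 1, "staff", "read", "finance"),   # staff may read finance...
    (0, 2, "staff", "read", "payroll"),   # ...except payroll
    (1, 2, "alice", "read", "payroll"),   # tie with the denial: deny wins
    (1, 3, "carol", "read", "payroll"),   # higher priority: carol may read it
]
CATCH_ALL = (1, 0, "AllUsers", "AllActions", "AllObjects")


def build_db(default_allow=False):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("insert into role_closure values (?, ?)", [
        ("alice", "staff"), ("carol", "staff"),
        ("alice", "AllUsers"), ("bob", "AllUsers"), ("carol", "AllUsers")])
    conn.executemany("insert into task_closure values (?, ?)", [
        ("read", "AllActions"), ("write", "AllActions")])
    conn.executemany("insert into domain_closure values (?, ?)", [
        ("payroll", "finance"), ("q3_report", "finance"),
        ("payroll", "AllObjects"), ("q3_report", "AllObjects")])
    rules = (RULES + [CATCH_ALL]) if default_allow else RULES
    conn.executemany("insert into auth(allow, priority, user, action, object)"
                     " values (?, ?, ?, ?, ?)", rules)
    return conn


def is_authorized(conn, user, action, obj):
    row = conn.execute(QUERY, {"user": user, "action": action,
                               "object": obj}).fetchone()
    return bool(row and row[0])  # no matching rule: deny


conn = build_db()
print(is_authorized(conn, "alice", "read", "payroll"))  # False
```

With these rules, alice can read q3_report through the priority 1 grant but not payroll, because the priority 2 denial beats her tied priority 2 grant; carol can read payroll because her grant has priority 3.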
Adding prioritization like this adds a new concept to the authorization model, but provides a good amount of additional power relative to the additional mental load to understand the model. However, creating well-structured rules using prioritization is trickier than it seems at first glance. It has the same essential problem as the blacklist approach described above: mistakes in setting up the conceptual layers of the different levels of prioritization can result in unexpected security holes. If you can figure out how to set up your authorizations using grants only, without denials, you should do that. But if the grant-only model is not sufficient, then adding prioritization as described in this section is a reasonable way to take the model to the next level of power - just remember that you have to be more careful in how you set up your rules.

Summary

With the addition of prioritization in the previous section, our authorization model is complete. Let's review the complete model.
  • There are two kinds of dimensions: discrete and continuous.
  • There are five dimensions: user, action, object, time and location.
  • User, action and object are discrete; time is continuous; location can be either discrete or continuous, depending on how the system defines it.
  • Additional dimensions can be added if necessary, following the pattern of the existing dimensions.
  • The handling of every discrete dimension is completely symmetrical with every other discrete dimension (unless session-based role activation is included, in which case the user dimension is a little different); the handling of each continuous dimension is close to completely symmetrical with the other continuous dimensions; and there is a high level of symmetry between the discrete and the continuous dimensions.
  • The application passes a value for each dimension to the authorization function. This collection of dimension values is the operation for which the application is requesting authorization. The authorization function returns true if that operation is authorized, false if not.
  • For each continuous dimension, there is a range defined as the basic match: period for time, area for location.
  • For each dimension, there is a grouping mechanism: role for user group, task for action group, domain for object group, schedule for period group, region for area or location group.
  • The grouping mechanism supports a hierarchy of groups, or more generally a (directed acyclic) graph of groups.
  • There is a set of rules that is used to determine whether an operation is authorized. Each rule includes a set of comparison values, one for each dimension, a priority, and an allow flag that tells whether that rule specifies that authorization for a matching operation should be granted or denied.
  • To determine if a request is authorized, take the value for each dimension in the request, collect the closure of the groups for that value, and collect the records in which each dimension of the record matches any of the items in the closure for that dimension. Pick the record with the highest priority, giving preference to deny records over grant records, and use the allow value of that record to determine whether to authorize or deny the operation. If no matching records are found, the operation is denied.
This conceptual model is no longer trivial, but the above rules are still relatively concise and easy to understand. The model is general enough and powerful enough that it should be suitable for a wide variety of applications.

In our model the application passes in a set of values to the authorization function, which uses its abstractions (in the form of groups) and rules (in the form of prioritization) to determine whether or not to grant permission for an operation. If we need more power, the application can pass in additional information, whether it is additional attribute information about the user, the environment, or other aspects of the operation, and the authorization system can apply even more complex rules. This is the approach used by Attribute-Based Access Control, with a rules engine used in place of the mechanisms described here.

Monday, April 30, 2012

Git Rebase Across Many Commits

Not all git merge conflicts are real.


The Scenario

In both my personal and my work projects I prefer to use git rebase to keep my commit histories simple and readable. To make this work in a team setting, we never work on the master branch, instead always working on a feature branch in our local repositories. Our process flow looks something like this:
$ git branch feature           #create the working branch
$ git checkout feature          #do all development work on that branch
#Edit files, etc.
$ git commit -m "Implement Feature"
#Repeat the above as desired during development.
#When ready to merge to master, do the following:
$ git checkout master
$ git pull                      #update master from shared repository
$ git checkout feature
$ git rebase master             #optionally with -i if squashing is desired
$ git checkout master
$ git merge feature
$ git push origin master
$ git branch -d feature
Because we never use our local master branch for development, the git pull on master is always a fast-forward merge. Likewise, because we have just rebased the feature branch against the master right before we merge that feature branch back into master, that merge is also always a fast-forward merge. Looking at it another way, we don't have any merge conflicts when updating or merging master because we resolve all of the merge conflicts when we rebase the feature branch against the latest master.

The Problem

At work, we have a large codebase and a handful of active developers who typically merge feature branches into master using the above workflow multiple times each day. Sometimes somebody has a feature branch that takes a long time to finish, so that between the time that branch was started and the time it is ready to go into master, there may have been 40 or 50 other commits made to master. In this situation we will usually rebase our local feature branch against the latest master a few times during feature development, but inevitably there are times when we end up doing one large rebase across many commits.

Even if there are many commits on the master branch, if none of those commits touched any of the same code as the commits on the feature branch, then there should be no merge conflicts when rebasing the feature branch against the updated master branch. However, in my experience this has not always been the case. Sometimes git rebase reports merge conflicts when I think there should not be any. Since I don't generally know exactly what code the other team members have edited, I can't immediately tell if the merge conflicts make sense.

The normal advice for how to handle merge conflicts is to edit the named file, look for the conflict markers, inspect the conflicting code fragments, determine what to keep, edit out what is not being kept along with the conflict markers, git add the repaired file, and git rebase --continue to let it tell you about the next merge conflict.

That's a lot of work, and it might all be completely unnecessary.

The Solution

It seems that git sometimes just gets confused when doing a rebase across a large number of commits. Sometimes if you rebase in smaller steps, git will happily rebase each smaller step with no merge conflicts, until you have stepped all the way up to the latest master, at which point your rebase is done.

You could rebase against every single commit and work your way up to master, but that, too, is a lot of work. Here's what I do when the initial rebase of the feature branch against the latest master tells me there are merge conflicts.

When the initial git rebase reports a merge conflict, I immediately do git rebase --abort to undo that rebase attempt. I then use gitk --all to view the commit tree, which shows the master branch and the commit at which my feature branch branches off it, and I select a commit on the master branch about halfway between that branch point and the tip of master. I copy the commit ID and paste it into a rebase command that looks something like this:
$ git rebase 8bc85584989e4435c2d98b13447bcab37648ba7f
If this rebase reports no merge conflicts, then I try rebasing against master and repeat the process.

If there are merge conflicts, then I abort the rebase and pick another commit halfway between the branch point and the commit I just tried. I repeat this until either the rebase succeeds or I am trying to rebase across a single commit. At that point, if there are still merge conflicts, they are real and I address them in the normal way. Since the conflict is only across a single commit, it is easier to see the cause of the conflict and to resolve it.

After resolving the conflict across that one commit, I go back to the first step and try rebasing against master again, repeating the process.
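The halving procedure can be scripted. The sketch below is a hypothetical Python helper, not a git feature: halving_rebase works on a list of upstream commit IDs (oldest first) and any attempt function, and try_rebase supplies one that runs git rebase and, on failure, git rebase --abort. It assumes the feature branch is checked out, treats any failed rebase as a conflict, and stops at the first commit that conflicts on its own instead of resolving it, since that step needs a person.

```python
import subprocess


def git(*args):
    """Run a git command in the current repository; return (ok, stdout)."""
    result = subprocess.run(["git", *args], capture_output=True, text=True)
    return result.returncode == 0, result.stdout


def master_commits(branch="feature", upstream="master"):
    """IDs of upstream commits made since the branch point, oldest first."""
    _, base = git("merge-base", branch, upstream)
    _, out = git("rev-list", "--reverse", "--first-parent",
                 f"{base.strip()}..{upstream}")
    return out.split()


def try_rebase(commit):
    """Rebase the checked-out branch onto `commit`; abort if that fails."""
    ok, _ = git("rebase", commit)
    if not ok:
        git("rebase", "--abort")
    return ok


def step_toward(commits, attempt):
    """Try the newest commit, then halve toward the branch point.

    Returns how many of `commits` the branch moved past, or 0 when even the
    first commit fails, meaning that conflict is real.
    """
    count = len(commits)
    while count > 0:
        if attempt(commits[count - 1]):
            return count
        count //= 2
    return 0


def halving_rebase(commits, attempt=try_rebase):
    """Step through `commits` (oldest first) until the last one is reached.

    Returns None on success, or the index of the first commit that conflicts
    by itself, which then has to be resolved by hand.
    """
    done = 0
    while done < len(commits):
        moved = step_toward(commits[done:], attempt)
        if moved == 0:
            return done
        done += moved
    return None
```

A run from the feature branch would look like halving_rebase(master_commits()); a return value of None means the branch is now rebased onto the tip of master.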

I have followed this process a number of times. I think that in a majority of those cases I halve the range of commits a few times and end up stepping through the commits piecemeal until I have rebased against master without ever having to resolve any conflicts. The other times I typically have to resolve one or two small conflicts, after which I can rebase against master.

The next time you do a rebase across more than one commit and git tells you there are merge conflicts, try this approach. You might save yourself a lot of work.