Linear Programming in Excel 2011 for Mac

A Quick Intro to Linear Programming

Linear programming (LP) is an optimization technique, expressed in the language of matrix algebra, that we can use to solve constrained multi-variable optimization problems. LP is part of a larger discipline known as Operations Research, which uses advanced mathematics to inform and optimize decision making.

The discussion and example herein are intended for intermediate-level Excel users who have little or no experience with using Solver to calculate LP problems.

Mathematically, we characterize an LP problem in the form:

max cᵀx
s.t. Ax ≤ b

As you may recall, where n is the number of decision variables, bold lower-case letters represent an n × 1 column array, and bold upper-case letters represent a matrix (here, one with n columns and one row per constraint). The superscript capital T denotes a transpose; for example, turning an n × 1 column 90 degrees to form its 1 × n row representation.

Also be aware that the term matrix is actually a generic expression that can apply to a full rectangular matrix, a one-dimensional array, or a single-cell scalar. To avoid confusion, in addition to matrix, we’ll use the more specific terms array and scalar where applicable.

In the LP model above, the expression cᵀx is called the objective function. It signifies a matrix multiplication of the objective coefficients, cᵀ (the quantities we want to minimize or maximize), by the decision variables, x. Decision variables are those elements of an LP problem that are discretionary; in other words, controllable by the decision maker.

Actually, LP is just a way of optimizing the objective function subject to the constraints defined by A and b. Along the way, Solver adjusts the values of the decision variables, x, as necessary to produce the optimal outcome, ranking candidate solutions according to the value of cᵀx they generate.

The matrix A holds the coefficients of the linear inequality constraints. Together with b, it limits the decision variables in x so that each constraint’s weighted sum is no greater than (or no less than) the corresponding value set forth in b. In this way, the feasible region for x is defined and enforced.

Fortunately, the built-in version of Solver will allow us to calculate LP solutions for most simple (and even some moderately complex) LP scenarios. In setting up the calculation for Solver, we’ll also be using the built-in Excel functions MMULT and/or SUMPRODUCT to do our matrix arithmetic.

As you formulate your LP arguments, you’ll want to keep in mind the following:

  1. SUMPRODUCT accepts two or more ranges separated by commas, while MMULT takes exactly two. For purposes of this article, we’ll assume exactly two ranges throughout.
  2. If the result of an operation will span more than one cell, you’ll need to set it up as an array formula. In Excel for Mac, you evaluate an array formula by pressing command-shift-return.
  3. SUMPRODUCT will return an error unless the dimensions of the ranges are exactly alike. In other words, SUMPRODUCT will not operate on a 1 × n array and an n × 1 array.
  4. MMULT will return an error unless the number of columns in the first range is equal to the number of rows in the second range. That means that if you want to use MMULT on the same two expressions that you successfully referenced in SUMPRODUCT, you’ll have to use the TRANSPOSE function on one or the other, like so:
=SUMPRODUCT(array1, array2)
=MMULT(TRANSPOSE(array1), array2)
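For a concrete (hypothetical) illustration, suppose array1 occupies B2:B8 and array2 occupies D2:D8, both 7 × 1 columns. Then the two formulas below return the same single value, with the MMULT version entered as an array formula (command-shift-return):

=SUMPRODUCT(B2:B8, D2:D8)
=MMULT(TRANSPOSE(B2:B8), D2:D8)

Reverse the treatment, =MMULT(B2:B8, TRANSPOSE(D2:D8)), and you get a 7 × 7 outer product instead of a single value, which is why the order and orientation of the ranges matter.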

Note: in this article, we’ll forgo a review of matrix arithmetic/matrix algebra theory and the various mathematical operations that can be performed on arrays and matrices. Interested readers can find a good one-page review of the topic here. For the truly interested, all the materials for an undergraduate MIT course in managerial LP are available free online.

Formulating an Employee Work Schedule

In terms of practical application, a simple LP example might involve optimizing the schedule of a company’s workforce to minimize labor cost. Assume a scheduled airline operates seven days a week at a particular airport. The station’s daily labor demands (in terms of hours worked) for ground crew employees are as follows:

hours_reqd

We’ll assume the collective bargaining agreement in place specifies the following:

  • All employees are paid $20 per hour for straight time worked
  • The company operates only one 8-hour shift per day
  • Management is prohibited from hiring part-time employees
  • The scope clause permits management to assign any employee to perform any necessary task
  • Employees must receive two consecutive days off per week

To keep things simple, we will disregard seniority, bidding and the relative desirability of certain days off. We will also disallow overtime.

Based on the demand for labor in terms of hours, we can calculate the minimum number of employees required to work on a given day with the following argument:

=CEILING(D3/8, 1)

This argument divides the number of hours required by 8 hours per employee, and if there is a remainder, the CEILING function returns the next integer value greater than the fractional result.
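For example, if cell D3 held a (hypothetical) Sunday demand of 132 hours, =CEILING(132/8, 1) evaluates CEILING(16.5, 1) and returns 17, meaning at least 17 employees must work that day.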

One way to solve the scheduling problem is to create a tableau of employee schedules (see below). Note that we treat the tableau values as binary: 1 for days worked, and 0 for days off. Note also that the tableau accounts for every possible combination of consecutive days off.

tableau1
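In case the screenshot is hard to read: every schedule in the tableau is simply a cyclic shift of the pattern 1 1 1 1 1 0 0. For example, the group of employees who start their work week on Sunday shows a 1 under Sunday through Thursday and a 0 under Friday and Saturday; the group that starts on Monday shows a 1 under Monday through Friday and a 0 under Saturday and Sunday; and so on around the week, for seven distinct schedules in all.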

Next, we need an array of values representing the number of employees who are starting their 5-day work week on a particular day. We can specify any integer value here, since these are the values that Solver will change to optimize our solution. For the sake of simplicity, I’ve specified 20 employees for each day.

decision_variables

Next, we’ll create an array that sets out the number of employees who are scheduled to work on a particular day.

emp_avail3

We’ll generate these values by matrix-multiplying the binary (1, 0) values for each day in the tableau by the entire array of employees who start work on each weekday. Assume the tableau of days worked and days off is located at D11:J17, and the array of employees who start their work week on a particular day is located in L11:L17. The argument for available employees on Sunday is:

=MMULT(D11:J11,$L$11:$L$17)

Note that I’ve included the dollar-sign designation in the second array because this range will be the same for all seven calculations.
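As an aside, you can generate all seven values in one step if you prefer: select the entire availability range (D20:J20 in the layout we’ll name below), enter the following as an array formula, and confirm with command-shift-return. It’s equivalent to entering the seven row-by-row MMULT formulas individually.

=TRANSPOSE(MMULT(D11:J17,$L$11:$L$17))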

After you’ve calculated the available employees for each day, it’s helpful to go ahead and recap the values for number of employees required just below. You’ll also want to designate a cell (L20) that captures the total number of employees, SUM(L11:L17).

emp_reqd2

We’ll create a cell (D23) to hold the value corresponding with the hourly rate of pay, and another cell (D24) to hold our objective function.

weekly_payroll

Our objective function in this case is the airline’s weekly payroll for ground crew:

=D23*8*L20

Or, hourly pay rate, times eight hours per day, times total employees. (Strictly speaking, a full week’s payroll would also include a factor of five workdays per employee, but since that’s a constant multiplier it has no effect on which schedule minimizes cost.)

Once you’ve entered all of the above, we’ll save time by naming the range L11:L17 as “Sked_Emp,” the range D20:J20 as “Emp_Avail,” and the range D21:J21 as “Emp_Reqd.” The easiest way to name a range of cells is to select all of the cells in question, and then type the (unique) name into the Name Box to the left of the formula bar:

named_range2
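Once the names exist, you can use them anywhere a normal range reference would go. For example, the total-employee cell L20 can simply read:

=SUM(Sked_Emp)

More to the point, the names are what we’ll type into the Solver dialog in the next step, which makes the setup much easier to read.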

We’re almost ready to calculate. Click on the Data tab and then on the Solver icon. When the Solver pane pops up, configure the LP problem like so:

solver_pane2
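If the screenshot is hard to make out, the configuration amounts to the following (my reading of the setup; the constraint order doesn’t matter, and the non-negativity constraint is a precaution worth adding even if your screenshot omits it):

Set Objective:                $D$24 (weekly payroll)
To:                           Min
By Changing Variable Cells:   Sked_Emp
Subject to the Constraints:   Emp_Avail >= Emp_Reqd
                              Sked_Emp = integer
                              Sked_Emp >= 0
Select a Solving Method:      Simplex LP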

There’s one final thing to check. Click on the Options button next to the Solving Method combo box, and make sure the box marked “Ignore Integer Constraints” is unchecked.

Now click the Solve button, and Excel should return the following changed values:

changed_values2

Optimization in Excel for Mac 2011 Using Solver

In what could be considered Part 1 of this article, we set up a VBA environment that allows us to run Monte Carlo simulations in Excel for Mac. Here, we’ll take the analysis one step further and use the Solver add-in to find specific target values.

If you’ve purchased MS Office for Mac 2011 recently, then the Solver add-in was installed automatically along with Excel. To activate it, go into Excel and click on Tools > Add-Ins. Check the box next to Solver, and you’re good to go. If you purchased an early copy of Office for Mac 2011, you may need to download and install Solver.

As with Excel for Windows, the maker of Solver also offers a premium version of its add-in for Mac. I have no experience with the premium version, and I have no basis on which to compare it to, e.g., Matlab, for complex, non-linear problems. As you can see from the webpage I’ve linked, it’s one of those deals where you have to email them just to find out how much the premium version costs.

Parenthetically, the fact that a Solver add-in exists for the Mac version of Excel means that add-ins are technically doable on the Mac side. Indeed, there are a few other third-party add-ins for the Mac version of Office, as described here. Nevertheless, it’s unclear whether, or to what extent, it’s feasible for independent developers and hobbyists to create Mac-version add-ins.

Fortunately, the basic version of Solver will accomplish everything we’ll want to do today (and considerably more).

The Solver Window

To use Solver, click on the Data tab and then on the Solver button. The Solver window looks like this:

Solver Window

In order for Solver to work, you have to specify one particular cell (the “Objective”) that contains a formula rather than a static value. After that, Solver will begin changing the value(s) of whatever variable(s) you specify to either maximize or minimize the value of the Objective cell, or to make the Objective cell equal a certain value.

You can also limit Solver’s calculation by imposing constraints on the value of one or more variables. We’ll discuss this in more detail below.

Some of the items in the Solver window are self-explanatory, and some items are anything but. To understand more fully requires us to know something about the difference between linear and non-linear functions.

Linear vs. Non-linear Functions

Of course, the quickest way to determine whether a function is linear or non-linear is to assign numbers to all of the variables and plot it. When you do, most non-linear functions will show up as a clearly discernible curve.

But not always. Recall from Part I of this article that the plot of our Monte Carlo simulation looked like this:

ROI Graph

As you can see, the points all appear to lie on a straight line. Nevertheless, our Revenue, Income and ROI functions are all non-linear. Here’s how we know.

A function is linear when its rate of change is constant. To illustrate, let’s create a table. Using the same worksheet we created in Part 1, enter 3500 in cell J2, and then in J3 enter the following:

=J2+100

Now drag down to cell J12. You’ll see that we’ve created a list of billable hours that increases by a constant value, 100. In this problem we only simulated one variable. Consequently, if we change the billable hours variable by a constant amount, we can measure whether the change in value for the Revenue function is also constant.

Note that had we simulated more than one variable, to test for non-linearity we’d have to make the rate of change constant for those other variables as well.

Next, Copy cell F2 and Paste Special > Formulas into cell K2. Change E2 in the argument to J2, and then drag down even with column J.
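Assuming the Revenue formula in F2 is the one we built in Part 1, cell K2 will end up reading:

=($B$2*$B$3*$B$5*J2)/(1+$B$4)^$B$6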

Recall that the formula for change is (new – old)/old. In cell L3, enter the following:

=(K3-K2)/K2

and drag down to L12. You can now see from column L that the change in Revenue from one observation to the next is small, but definitely not constant.

Beginning in column M, repeat this process for Income and ROI. When you finish, you’ll have a table that looks like this:

Rates of Change

As you can see, none of our functions is linear, regardless of whether they appear to be straight when we graph them. Also note that each function’s rate of change declines as Billable Hours increase, some faster than others.

The Minimum and Maximum of a Function

As we discussed above, Solver is designed to minimize or maximize the value of whatever function you specify. Generally speaking, however, only parabolic functions will have an intrinsic minimum or maximum value. For example, if we look at the plot of a convex parabolic function:

Convex Quadratic Function

it’s easy to see where the minimum value would be located. In the same way, if we look at the plot of a concave parabolic function:

Concave Quadratic Function

we can readily point out where the maximum value is located. The distinctive feature of a parabolic function is that the change in value is negative in some places on the curve and positive in others. As you may have guessed, the maximum or minimum of a parabolic function is the place on the curve where change, δ, is equal to zero.
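To make that concrete, take a generic parabola f(x) = ax^2 + bx + c. Its rate of change is the derivative f'(x) = 2ax + b, which equals zero at x = -b/(2a), the vertex. When a is positive the parabola opens upward and the vertex is the minimum; when a is negative it opens downward and the vertex is the maximum.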

Yet in many cases, the shape of the curve is more subtle. In our headcount analysis curve, change is strictly positive all along the curve – at least for the part we’ve plotted – and nowhere does δ = 0. This means we won’t be able to find a maximum or minimum value for the function unless we impose arbitrary constraints on the Billable Hours variable.

Obviously, we know there’s an absolute constraint on the number of hours an associate can bill that’s represented by 24 × 365 = 8760 hours in a year (insert appropriate lawyer joke here). We also know that there’s some practical limit on the number of billable hours an associate will generate that caps off somewhere around 2400 hours per year (except in Manhattan).

At the same time, we know there’s a practical constraint in the other direction that represents the minimum number of hours an associate must bill in order to keep from being fired.

So in reality, our headcount profitability function is already limited to a particular range. All we need to do is specify for Solver the minimum and maximum values for Billable Hours that we want it to consider.

Setup

Before we return to the Solver window, we need to set up worksheet cells that contain the functions we want Solver to work on.

Go to cell F2 and Copy, then Paste Special > Formulas in cell B10. Change E2 in the argument to B7. Then copy and paste the formula from cell G2 to cell B11, changing the reference from F2 to B10. Last, copy the formula from cell H2 to cell B12, changing the reference from G2 to B11.

Optimization Setup

Since Revenue is the only function that is directly dependent on Billable Hours, it’s the Revenue function that we’ll maximize.

In the Set Objective box, make sure it reads $B$10. Since we’re seeking the maximum, click the radio button labeled Max if it isn’t already selected. Then in the box labeled By Changing Variable Cells, you can either use the locator button to the right to select cell B7, or manually enter $B$7.

To include constraints, click the Add button. In the window that appears, designate that cell $B$7 should be <= 4800 (or whatever maximum value you prefer). It should come as no surprise that maximum Revenue will correspond with the maximum number of billable hours you enter.

Solver Constraints

Then click OK. Next, click Add again, and specify that cell $B$7 should be >= 3600 (or whatever minimum number of hours you want to specify). Click OK. You can ignore the box labeled Make Unconstrained Variables Non-Negative, because we don’t have any unconstrained variables in this example. Our Revenue function is non-linear, so for the solving method choose GRG Nonlinear, which should already be selected as the default.

One final thing – if your argument for Overhead in cell B8 contains a reference to B7, then replace B7 with the number 4000.

When you’ve completed all that, click Solve.

If all goes well, Solver will return a window that looks like this:

Solver Finish

Before you decide whether to keep the solution that Solver found, first look over at the changed variables and make sure that the value(s) is/are within the constraints you specified. Then look over at the Objective cell and make a determination as to whether the result is plausible. If both check out, then you can click OK. If not, it’s usually best to select the radio button labelled Restore Original Values before clicking OK.

You should now see the values below in cells B10:B12:

Optimization Result

Next, to find the number of billable hours that correspond with a particular ROI, first bring back up the Solver window. Change the Objective cell to $B$12. Let’s assume that we want to know the number of billable hours that corresponds with a twelve percent ROI. Start by clicking the radio button labeled Value Of and enter 0.12 in the box to the right. Leave everything else unchanged, and click Solve.

You should see that Solver determined the number of billable hours corresponding with a 12% ROI is 3983.75.

In the same way, if you want to calculate the number of billable hours that corresponds with break-even, leave $B$12 as the Objective cell, and enter 0 in the box to the right. Then click Solve.


Relational Database Functionality in MS Excel for Mac 2011

The Most Significant and Powerful Excel for Mac Workaround I’ve Yet Encountered

Think of MS Office for Windows as a professional auto mechanic’s toolbox. You know what I mean – a big red locking tool box with dozens of sliding drawers. For any kind of auto repair, it has everything you’ll need, in metric and standard, Phillips-head, slotted, and hex. In fact, there are tools in there that 99 out of 100 people wouldn’t even recognize – let alone know how to use.

By that same analogy, you can think of MS Office 2011 for Mac as a shade-tree mechanic’s toolbox. There’s a lot of useful stuff inside, and if all you ever do is change spark plugs and oil, it’s more than enough. And in all fairness, for any repair more complex than that most people just take their car to a professional mechanic, anyway.

But if for whatever reason you find yourself needing or wanting to overhaul a transmission or rebuild an engine, your task will be much harder. You’ll discover that your shade-tree mechanic’s toolbox is missing a number of things that, at a minimum, would greatly facilitate matters.

Now, if playing video games has taught me anything, it’s this: there’s always a way. The real issue is whether you have enough time and patience to deal with finding and then implementing the necessary workaround. Naturally, some workarounds are less optimal than others.

Then once in a blue moon, someone actually figures out how to use a ball-peen hammer as a torque wrench. Less frequently still, that person is willing to share their discovery without charging anything for it.

It says here – without exaggeration – that this set of instructions is that same kind of monumental discovery in the context of Excel for Mac.

Relative to the Windows version of MS Office, the most significant omission in MS Office for Mac 2011 (apart from ActiveX) is MS Access. Without question, Excel is an outstanding application, but its real power only becomes evident when Excel is coupled with a database, for it’s then that real quantitative “heavy lifting” can take place.

You can think of this article as a foreword to Jim Gordon’s excellent set of materials linked above.

Office for Mac Preliminaries and Setup

Naturally, I’m taking it as a given that you’ve already purchased MS Office 2011 for Mac. I use the Home & Business version, but to my knowledge everything I’ll discuss and refer to herein is equally applicable to the Home & Student version.

To the extent it’s material, I’m running Maverick OS (OS X version 10.9), but unless you’re still using one of the very early versions of OS X, that shouldn’t matter, either.

Before you can begin to follow along with the articles linked above, you’ll first need to download and install ODBC Manager. This (free) application came pre-installed in the very early versions of OS X, but Apple dropped it several versions ago. Once you’ve downloaded the application, make sure it’s installed in your Applications/Utilities folder.

You’ll also need a set of database drivers. I’m not aware of any freeware drivers, but fortunately they’re modestly priced. I use the drivers from Actual Technologies, available here. They come in 4 flavors: (a) SQL and Sybase; (b) Oracle; (c) Open Source, including MySQL, PostgreSQL and SQLite; and (d) MS Access. At this writing, each of the four costs the same: $34.95, or $139.80 for all four. The AT website also has an easy-to-follow set of instructions for installing the drivers in your ODBC Manager application.

You can freely test these AT drivers before purchasing a license, but the limitation is that database query results will be limited to three rows.

Once you have your database drivers installed, Gordon’s article will begin walking you through the process of interfacing Excel with ODBC Manager.


Monte Carlo Simulation in Excel for Mac 2011

Monte Carlo simulation isn’t something that Excel for Mac does naturally, but as I’ll demonstrate in this post, it’s certainly doable. It relies on a series of VBA modules, but if you take the time to set it all up once, you can use the Excel workbook we’ll create as a template for future simulation projects.

Assume you’re the managing partner of a law firm, and you’d like to know whether it would be profitable to bring in another first-year associate. Experience has taught you that evaluating a new attorney is ordinarily a two-year process, so your window of analysis will be two years.

You already know what the market will bear in terms of billing rates for first and second-year associates, so your decision is likely to turn on the difference between expected revenue and estimated overhead in relation to a first-year associate. Overhead is largely a fixed cost, but expected revenue is inherently variable. As it happens, expected revenue is primarily a function of the number of hours billed.

By virtue of the fact that you’re considering a headcount increase in the first place, we’ve established that you currently have more work than your present staff can accomplish working full-time. What’s uncertain is whether the additional work would amount to something like 4000 billable hours over the next two years. Consequently, we’ll simulate various totals of billable hours to study how these affect profitability and return on investment.

Setup in Excel

First, let’s identify and assign a value to each of the variables we need to consider. You know from experience that you’ll have to write off about 35 percent of the hours billed by a first or second year associate, so let’s identify a variable called Collectible Billings, and assign a value of 0.65.

You expect to bill the attorney’s time at $225 per hour for the first year, and $245 an hour for the second year; we’ll average the two and assign $235 as the expected Billing Rate.

We’ll need to include the applicable Risk-free interest Rate (RFR), which is typically the quoted rate on 90-day T-Bills. This is a basic measure of opportunity cost for any type of firm project, including a headcount increase. It also represents the estimated effect of inflation (if any). Since the RFR is unusually low at this writing, 0.1%, we’ll arbitrarily assign a more conservative rate of 2%. Note that if your firm already has an established hurdle rate for evaluating new projects (e.g., eight percent), you should use the hurdle rate here instead of the risk-free rate.

If you ultimately decided to defer hiring a new attorney for now, you estimate that hiring one later would take about 60 days. The economic value of avoiding this 60-day lag is called “Convenience Yield,” and we quantify it as 60/720, or approximately 0.08. This parameter is optional, and if you feel that including it would make your simulations too rosy, you can simply enter 1 in cell B5.

Our time increment is two years, or 2.

Next, we’ll assume that the firm requires associates to bill 2000 hours per year. The value in cell B7 is simply your annual billable hour quota times 2 years, or 4000. We’ll end up making this value dynamic, but we’ll use the firm quota as the basis for certain other estimates, as well as our starting estimate for number of hours billed.

Finally, we’ll estimate (reasonably, if arbitrarily) that Overhead (i.e., salary, benefits, administrative support, equipment, etc.) for an additional associate will amount to sixty percent of expected receivables, or Billing Rate × 4000 × 0.6.

Once you enter these variables, you should have a column of values that looks like this:

Starting Values in Excel

To keep things simple, we’ll hold all other variables constant and express our expected profit from hiring the associate purely as a function of hours billed (actual demand). Because we also expressed cost as a function of expected billable hours, this is not a trivializing assumption.

Monte Carlo Simulation with Excel and VBA

For purposes of simulation, we’ll model the number of hours billed in accordance with the Log Normal distribution. Since we know that there is a stochastic element to simulation, we’ll set that up next.

As I’ve noted in other posts, Excel’s random number generator is notoriously suspect, so we’ll make use of the excellent Mersenne Twister module available here. For my own convenience, I’ve made some minor changes to the original code to facilitate random seeding, and you can download my edited mersenne twister module here. N.B.: Excel for Windows users can and should make use of the free random number generating addin available here.

In what follows, I’ll assume that you’re using my modified mersenne twister module. Once you’ve copied the mersenne twister code, go into Excel and click on the Developer tab (click to enlarge).

Developer Tab Visible

(Here are instructions on how to make the Developer tab visible if it isn’t already). On the top menu bar, click Insert, then select Module. When the new blank module comes up, paste the mersenne twister code you copied. Next, for ease of reference I recommend that you go to View, then select the Properties window. There you can specify a unique name for the module.

We’ll also need to specify a random seed function. For this, my mersenne twister module uses the same VBA function that I provided in an earlier post. This I’ve included separately in a second module.

Finally, we’ll need VBA code to generate Lognormally-distributed random values. Here, we’ll use the random number generation code that I also set out in an earlier post. This I’ve included separately in the third module.
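If you’d rather see the shape of that third module than download mine, here is a minimal sketch. It uses the worksheet-facing name dynrngLN that we’ll call below, but the body is only an assumption on my part: it draws a standard Normal value via the Box-Muller transform and exponentiates it, using VBA’s built-in Rnd as a stand-in for the uniform source (swap in the Mersenne Twister module’s uniform function for anything serious, since Rnd is no better than Excel’s RAND).

Public Function dynrngLN(mu As Double, sigma As Double) As Double
' Minimal sketch: returns one Log Normal draw, where mu and sigma are
' the mean and standard deviation of the underlying Normal (log space).
' Replace Rnd with your Mersenne Twister uniform function if installed.

    Const PI As Double = 3.14159265358979
    Dim u1 As Double, u2 As Double, z As Double
    Application.Volatile
    Do
        u1 = Rnd                ' first uniform; loop guards against Log(0)
    Loop While u1 = 0
    u2 = Rnd                    ' second uniform
    z = Sqr(-2 * Log(u1)) * Cos(2 * PI * u2)    ' Box-Muller standard Normal
    dynrngLN = Exp(mu + sigma * z)              ' exponentiate to Log Normal

End Function

Paste it (or the downloaded module) into a regular code module, and the worksheet formula in the next step will pick it up.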

If we simulate values from a Log Normal distribution with mean = 0.05, and standard deviation = 0.1, we obtain a range of values with a histogram that looks like this (click to enlarge):

Random Values Histogram

We can then multiply these simulated values by the firm’s billable quota to obtain a series of simulated billable hour totals.

Going back to our Excel workbook, in cell D2 we’ll enter the following argument:

=dynrngLN(0.05,0.1)

Look in the lower right-hand corner of cell D2 and locate the fill handle. Click and drag the fill handle down to generate as many simulated values as you wish. Generally speaking, when it comes to simulation, the more the better. Nevertheless, I strongly advise you to set your calculation preference in Excel to manual (Excel > Preferences > Calculation > Manual); otherwise, Excel will go through the often time-consuming chore of calculation anytime you change any of the cells on that worksheet. On the manual setting, you can force calculation by pressing command-equal.

Also, if at any point you’re satisfied with the values you’ve simulated in column D, you can always copy them and use Paste Special > Values to make them constant. This will avoid the lengthy re-simulation process every time you recalculate.

Here, we’ll simulate only 100 values to keep things manageable, but you could easily use this same procedure to run 10,000 or more simulations.

Next, in cell E2 we’ll enter:

=$B$7*D2

then drag down so that columns D and E contain the same number of values. Don’t forget the dollar-signs in the above formula; that’s what tells Excel to use the same starting value for expected demand in every iteration.

After completing this step, you’ll have a list of simulated billable hour values in column E. Take a look at the list and see if it strikes you as realistic. For example, it may be unrealistic to expect a first-year associate to bill as little as 75% of quota (because they’d be fired before the end of year 1) or as much as 125% of quota (unless your firm is located in Manhattan). If you want to bring the range of simulated values closer to the 4000 hour estimate, you can adjust the standard deviation value in cell D2 downward from 0.1 to 0.05 or even lower (as long as the number is greater than zero). I’ve deliberately left the estimates somewhat rough in this example to better illustrate the range of possible outcomes.

The heart of the analysis is in column F. Before we begin, go ahead and format the cells in columns F and G to accommodate currency values, and specify that negative values are to be in red. You’ll see why in a moment.

In cell F2, you’ll want to enter the following argument:

=($B$2*$B$3*$B$5*E2)/(1+$B$4)^$B$6

then drag it down as before.

There’s a lot going on in this calculation, so I’ll break it down. Between the first set of parentheses, we’re multiplying the average billing rate times the estimated percentage that’s collectible, then times the convenience yield multiple, then times the simulated number of billable hours. This yields a gross estimate of revenue. We then discount the gross estimate by the opportunity cost, which is represented by the gross figure divided by the quantity 1 plus the risk-free rate, to the second power (since we’re using a 2 year window). This in turn yields a net simulation of revenue.
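As a sanity check with round numbers: if you enter 1 in B5 (i.e., exclude the convenience yield) and a simulation happens to return exactly the 4,000-hour estimate, the formula works out to (235 × 0.65 × 1 × 4000) / (1.02)^2 = 611,000 / 1.0404, or about $587,274 of discounted revenue for the two-year window.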

In column G, we’ll calculate the resulting income from each simulation. Insert the following code in cell G2, then drag down:

=F2-$B$8

Last, we want to calculate our return on investment for each simulation. Go ahead and format column H to contain percentage values, and specify two decimal places. Then in cell H2, enter the following argument:

=G2/$B$8

and drag down as before. Column H now gives us a range of simulated ROIs, which for most purposes is the number that the firm’s executive committee is really interested in.

columns_final

Bear in mind also that the ROI simulations in column H are somewhat conservative, in that we’ve already discounted revenue by the RFR (or whatever value you specified in cell B4).

To visualize the data, select columns E through H and then click on the Charts tab. Select Scatter Plots, and then Smooth. When Excel creates the chart, right-click on any of the markers showing Simulated Revenue, and choose Delete. Then right-click on any of the markers showing Simulated Income, and choose Delete. You’re now left with a chart that plots billed hours versus ROI, which should be the analysis you’re looking for. Since we’re using simulation, your chart will be slightly different, but the one I created looks like this (I’ve added axis labels) (click to enlarge):

ROI Graph

As you can see, break even is somewhere around 3500 hours for the two-year horizon, or about 1750 billable hours a year. It’s also easy to visualize the required number of billable hours for various returns on investment; simply choose an ROI percentage on the left, draw a straight line right until it intersects the curve, and then draw a straight line down to the number of billable hours.

In part 2 of this article, I’ll show you how to use the Solver add-in to calculate the exact number of billable hours that correspond with break-even, as well as any hurdle rate you choose.


Building a Quasi-Random Password Generator in Mathematica

Here’s a simple, down-and-dirty password generator in Mathematica. I wrote it in version 9, but it should be backward compatible to . . . I don’t know, but way back. There’s nothing really fancy or artful here, it’s just something that I find useful.

You’ll note that the function takes two inputs; the first is a binary indicator of whether you want to include non-alphanumeric (i.e., special) characters (1 for alphanumeric only, otherwise 0), and the second input is the number of characters you want your password to contain.

cut[charlist_] := RotateLeft[charlist, RandomInteger[{10, 58}]];

shuffle[charlist_] := Flatten[Transpose[Partition[charlist, 34]], 1];

pwdFn[anonly_: 0, len_] := Module[{chars, charsX, nums, set, numcuts, inter, ret, fin},
  charsX = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
    "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
    "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
    "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z",
    "!", "@", "#", "$", "%", "^", "&", "*", "-", "_", "+"};
  chars = Drop[charsX, -11];
  nums = {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0"};
  set = If[anonly == 1, Join[chars, nums], Join[charsX, nums]];
  numcuts = RandomInteger[{233, 6765}];
  inter = Nest[shuffle[cut[#1]] &, set, numcuts];
  ret = RandomChoice[inter, len];
  fin = StringJoin@ret;
  Return[Style[fin, FontFamily -> "Courier", FontSize -> 14]]];

pwdFn[1, 16]

gYf2RcR96Z7NYgKZ

pwdFn[0, 16]

02ktc%bl*U6v@+GD


Excel Random Numbers from Various Distributions

As a sort of companion to the most recent post dealing with alpha-Stable random values in Excel, here’s a VBA module that will generate quasi-random numbers from any of the following distributions:

  • Standard Normal(0, 1) distribution
  • Normal(μ, σ) distribution
  • Log-normal(λ, ρ) distribution
  • Chi-square(n) distribution
  • Student’s T(n) distribution
  • Weibull(η, σ) distribution
  • Pareto(α, κ) distribution
  • Exponential(α) distribution
  • Rayleigh(α) distribution
  • Cauchy(μ, γ) distribution

Note that you’ll need to choose a method of generating Uniform random variates. At the same time, be aware that Excel’s built-in quasi-random number generator, RAND(), is notoriously weak. I prefer to use the Mersenne Twister VBA module available here, and the random number module linked above incorporates the Mersenne Twister module’s nomenclature.

You’ll also need to apply your own random seeding algorithm. Below is a random seed function that utilizes your computer’s internal clock:

Public Function randseed()
' Generates a positive integer value
' from the Excel Date/Time reading

    Application.Volatile
    Application.ScreenUpdating = False
    With Application.WorksheetFunction
    
    Dim currtime As Date
    Dim stime As String
    Dim sval As Double
    Dim inter As Double
    currtime = Now
    stime = CStr(Now)
    sval = TimeValue(stime)
    inter = (sval * 10 ^ 11) / .RandBetween(1000, 10000)
    randseed = Round(inter, 0)
    End With
    Application.ScreenUpdating = True
    
End Function

alpha-Stable Random Values in Excel

It was a good day all around when Microsoft brought back macros in Excel 2011 for Mac.

Here’s a VBA routine that will generate a 1-D array of pseudo-random numbers from the alpha-Stable distribution. Note that you’ll need to include this excellent Mersenne Twister module to generate the uniform random variates. Please further note that you’ll need to provide the parameter values for alpha, beta, gamma, and delta; this routine won’t fit data.

It’s a helpful tool if you’re noodling on Excel and just want to run some quick return scenarios on a particular asset.


Portfolio Simulation in Mathematica (edited for version 9)

Last year I posted a demonstration of copula-based portfolio simulation in Mathematica version 8. Since then, Wolfram has released version 9, which includes lots of new functionality that will be useful for investors and managers – particularly in the area of statistics.

Also as I noted in the earlier post, one problem with multivariate simulation in Mathematica is excessive computation time. Using explicit Mathematica functions, it can take an hour or more to produce a 10^4-row matrix of simulated observations for a portfolio with fewer than a dozen assets. And since ten thousand observations is a bare minimum for serious analysis, Mathematica’s computation time is a barrier to use. Another purpose of this article is to present a neat workaround that returns a same-sized matrix of simulated data in just a few seconds.

While I’m tempted to take all the unchanged content from my earlier post and just incorporate it by reference in this article, I think readers using version 9 and later will find it more helpful to have all the information in one place. The earlier article is still a working, albeit less content-rich, example of portfolio simulation in Mathematica version 8, and it can still be utilized by those who have not yet updated to version 9.

As before, we simulate portfolio daily returns using a Student’s T copula. This time, though, we’ll be using a different set of daily stock prices, in part because I want to illustrate the ease with which Mathematica handles negative correlations in the context of multivariate distributions.

First the disclaimer: the following simulation and the particular stock price data referenced herein are purely illustrative. Absolutely no warranty or recommendation on the part of the author or site is intended, including no recommendation or opinion whatsoever with respect to the specific equities chosen or their suitability as an investment. Further, the Mathematica code herein is offered as-is, and with no warranty whatsoever, including warranties of merchantability or fitness. Use the following code at your own risk, but under no circumstances should any reader use the following methodology and/or Mathematica code for actual investing.

Data Import and Notional Portfolio

Assume we have a 5-asset portfolio composed of four small-cap equities with a recent history of high performance (EGHT, IOSP, MG, and MGAM), and as a simple hedge, we’ll arbitrarily choose ProShares Dow UltraShort (DXD) as our fifth asset.

Although Mathematica will fetch closing prices and a few select fundamentals with respect to a given equity, for purposes of this example it’s easier just to download the table of daily log-deltas and corresponding dates for the notional portfolio here, the characteristics of which can be seen below:

empchart

cumretp

Parenthetically, I should add that the fundamental data available through Mathematica’s built-in FinancialData function are, in my opinion, much too limited to use Mathematica as a stand-alone application for managing a portfolio. No doubt this is deliberate in light of Wolfram’s recently introduced Finance Platform. Since at present the Finance Platform is available only for Windows, I have no experience with it.

Portfolio Descriptive Statistics

First we construct a function that will return a list of useful descriptive statistics for a corresponding array of daily returns data:

Clear[descripStat];
descripStat[data_] := Module[{drift, volatility, autocorr, medi, fin},
  drift[x_List] := Mean[x]*252;
  volatility[x_List] := StandardDeviation[x]*Sqrt[252];
  autocorr[x_List] := Module[{x1 = Drop[x, -1], x2 = Drop[x, 1]}, Correlation[x1, x2]];
  medi = {N[Length[data]], N[Variance[data]], N[StandardDeviation[data]],
    N[MeanDeviation[data]], N[MedianDeviation[data]], N[QuartileDeviation[data]],
    N[Mean[data]], N[HarmonicMean[data]], N[Median[data]], N[Skewness[data]],
    N[QuartileSkewness[data]], N[Kurtosis[data]], N[Quantile[data, 0.25]],
    N[Quantile[data, 0.75]], Max[data], Min[data], drift[data] // N,
    volatility[data] // N, autocorr[data] // N};
  fin = TableForm[medi, TableHeadings -> {{"Length", "Variance", "Standard Deviation",
       "Mean Deviation", "Median Deviation", "Quartile Deviation", "Mean",
       "Harmonic Mean", "Median", "Skewness", "Quartile Skewness", "Kurtosis Excess",
       "Second Quartile", "Fourth Quartile", "Maximum", "Minimum", "Drift",
       "Volatility", "Autocorrelation"}, {""}}];
  Return[fin]];


Because closing stock prices are sometimes the same from one day to the next, and since a number of these statistical functions will throw a division-by-zero error, we need to change all of our zero-valued log-delta observations to a number that’s marginally different from zero:

xzero = (g_ :> RandomReal[{0.0000000001, 0.00000001}] /; g == 0);

Next we construct a table of descriptive statistics for each of the equities in the portfolio:

Clear[descripTbl];
descripTbl = TableForm[
  Transpose[{descripStatShort[lndxd /. xzero][[1]], descripStatShort[lneght /. xzero][[1]],
    descripStatShort[lniosp /. xzero][[1]], descripStatShort[lnmg /. xzero][[1]],
    descripStatShort[lnmgam /. xzero][[1]]}],
  TableHeadings -> {{"Length", "Variance", "Standard Deviation", "Mean Deviation",
     "Median Deviation", "Quartile Deviation", "Mean", "Harmonic Mean", "Median",
     "Skewness", "Quartile Skewness", "Kurtosis Excess", "Second Quartile",
     "Fourth Quartile", "Maximum", "Minimum", "Drift", "Volatility",
     "Autocorrelation", "Cowles-Jones Statistic"}, {"DXD", "EGHT", "IOSP", "MG", "MGAM"}}]

descrip_Tbl

Conditioned VCV Matrix in Mathematica

Next, we need to generate a correlation matrix for the asset log-deltas, and from the standard correlation matrix, we will generate a conditioned VCV matrix using the following Mathematica code:

Clear[matrixV];
matrixV[corrmat_] := Module[{evcmat, evlmat, inter, ret},
  evcmat = Eigenvectors[corrmat];
  evlmat = DiagonalMatrix[Eigenvalues[corrmat]];
  inter = evcmat.MatrixPower[evlmat, 1/2].Transpose[evcmat];
  ret = SetPrecision[inter, 8];
  Return[ret]];

The resulting VCV matrix looks like this (click to enlarge):

vcvtbl_032013

For purposes of portfolio simulation, it’s the VCV matrix – not the standard correlation matrix – that will govern the simulated portfolio’s codependency.

Before we start working with the VCV matrix, we need to ensure that it’s positive definite. Mathematica internally uses Cholesky decomposition in several matrix functions, and an input matrix that is not positive definite will frequently cause Mathematica to throw an error. We can check for the properties we need (symmetry and positive definiteness) like so:

vcvmat === Transpose[vcvmat]
PositiveDefiniteMatrixQ[vcvmat]

The latter is a new function introduced in Mathematica version 9.

The price data we’ve collected starts with the first trading day in 2011. Based on that, it’s useful to construct a vector of mean returns for each asset:

Clear[muvec];muvec=Mean[lnall];

For purposes of illustration, we’ll assume that each of our holdings in the respective assets makes up one-fifth of total portfolio value. But in any case, total portfolio return is simply the dot-product of muvec and a same-length vector of portfolio weights.

Clear[portret];portret=muvec.Table[1/5,{5}];

Then to obtain the portfolio mean annual return, we multiply portret by 252 (the usual number of trading days in a year).

Asset Marginal Distributions

It’s important to recognize that copula-based simulation will return values in the form of CDF probabilities, 1 > p > 0, for the specified multivariate distribution – not simulated observations, per se. Clearly, a 10^4 by 5 matrix of simulated Student’s T probabilities isn’t terribly useful for our purpose. The real power of copula-based simulation comes from applying the appropriate marginal distribution to the simulated CDF values for each asset.

As famously observed by Benoit Mandelbrot, daily changes in equity and commodity prices are not Normally distributed. In fact, they’re not even Log-Normally distributed. Accordingly, I typically model daily log-deltas in accordance with either the alpha-Stable distribution or the TsallisQ Gaussian distribution, whichever provides the better fit. Both of these distributions readily accommodate data sets that are both longer-tailed and higher-peaked than either the Normal or log-normal distributions:


compPDFs

The graphic above depicts the respective distribution PDFs for EGHT’s daily log-returns.

For an n × 1 array of numerical values, array, Mathematica accomplishes distribution fitting with the following function:

stable = EstimatedDistribution[array, StableDistribution[1, α, β, δ, γ]]
tsallisq = EstimatedDistribution[array, TsallisQGaussianDistribution[μ, β, q]]

Note that Mathematica’s parameterization of the alpha-Stable distribution transposes the mean (δ) and scale (γ) parameters in relation to the more standard parameterization offered by Nolan and others.

We can test the returned distribution values to see which is the better fit like so:


DistributionFitTest[array,#]&/@{stable,tsallisq}

For our test data, Mathematica returned the following best fits:


margDs

Calculating the Student’s T Degrees of Freedom Parameter

A Student’s T copula function takes as inputs the multivariate correlation (VCV) matrix and the estimated degrees of freedom parameter, v (from the Student’s T distribution). Using maximum likelihood, we can estimate the appropriate value for v by maximizing the following function with respect to our VCV matrix and the variable, v:

Clear[tLLFn];
tLLFn[v_, corrmat_] := Module[{dim, ret},
  dim = Length[corrmat];
  ret = -(Pi^(-dim/2)*v^(-1 - dim/2)*Gamma[(dim + v)/2]*(dim + v*PolyGamma[0, v/2] -
         v*PolyGamma[0, (dim + v)/2]))/(2*Sqrt[Det[corrmat]]*Gamma[v/2]);
  Return[ret]];

NMaximize[{tLLFn[v,vcvmat],v>0&&Element[v,Integers]},v]//Quiet

For the test data we’re using, Mathematica returns v = 9.

Ordinarily, we would take our VCV matrix and estimated value for v, and plug them into Mathematica’s built-in CopulaDistribution function. As I referenced above, however, the computation times for Monte Carlo simulation using this method are simply untenable. As a result, I’m particularly grateful to the member known as Sasha at mathematica.stackexchange.com for demonstrating how to use interpolation to derive the same simulation output in a fraction of the time.

To accomplish this, you can see from the linked page above that Sasha created a Fritsch-Carlson interpolation function, together with a grid system to hold the interpolated values. Obviously the grid Sasha designed can be adjusted for greater or lesser granularity, depending on user preference and purpose for the simulation. Thereafter, Sasha’s function applies a multivariate Student’s T distribution to the grid (which yields our simulated CDF values), and then it maps the respective marginal distribution to each asset’s CDF array.

The total elapsed time for Sasha’s operation is less than 5 seconds for a 10^4 × 5 simulation, as compared to 15 minutes or more using Mathematica’s explicit CopulaDistribution function.

Scrubbing the Simulated Data

One drawback of modeling with alpha-Stable and/or Tsallis Q Gaussian distributions is their infinite second moment (variance). In practice, it means that Monte Carlo simulation of these distributions will occasionally return values that are so far out of scale as to be unusable (in statistical terms, an “outlier”). Recalling from our descriptive statistics above that the maximum change in daily log-delta for any of the five equities was on the order of 0.4, you can see from the chart below that we did in fact generate some extreme observations:

rawsim

If we were simulating log-deltas for a single asset, the scrubbing process would be simple; we’d just remove any entry whose absolute value is greater than some assigned threshold. Since we’re simulating multivariate data, however, each 1 × 5 observation (or each day’s simulated return) is interdependent. Consequently, we need a way to test for multivariate outliers.

Mahalanobis Distance Test in Mathematica

It turns out that there are several ways to accomplish this, but the mathematically simplest is the Mahalanobis distance method. For an r × c matrix we can implement the method in Mathematica like so:

Clear[scrubCop];
scrubCop[rvraw_, ci_: 0.975] := Module[{md, dims, x2, twov, twov2, ret},
  dims = Length[Transpose[rvraw]];
  md = mDist[rvraw];
  x2 = Quantile[ChiSquareDistribution[dims], ci];
  twov = MapThread[Prepend, {rvraw, md}];
  twov2 = Complement[twov, Select[twov, #[[1]] > x2 &]];
  ret = Map[Drop[#, {1}] &, twov2];
  Return[ret]];
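For reference, and assuming the helper mDist returns the squared Mahalanobis distance of each simulated 1 × 5 observation (which is what the Chi-square comparison implies), the quantity computed for each row x is:

d^2(x) = (x − x̄) S^(−1) (x − x̄)ᵀ

where x̄ is the vector of column means and S is the sample covariance matrix of the simulated data. Under multivariate normality, d^2 follows (approximately) a Chi-square distribution with 5 degrees of freedom; here that serves as a pragmatic cutoff, which is why the code discards any row whose distance exceeds Quantile[ChiSquareDistribution[dims], ci].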

After scrubbing, the box and whiskers chart below shows our plausibly distributed yet long-tailed simulations, free from contaminating multivariate outliers:

scrubdat

Mathematica as Guitar Scratch Pad

Below is a block of Mathematica code that will generate and play the last few bars of Pat Metheny’s beautiful Solo from “More Travels”.  Just click the triangular play button in the bottom left-hand corner of the graphic once the .CDF application loads.  Typically, there is a 5-10 second delay before the sound begins.

Note that you will need to enable Javascript if it has been disabled. Please also note that some users have reported difficulty loading the application in Safari, but I’ve heard of no such difficulties with Firefox or Chrome.


Rationalizing Popular Outré or: Why Would Anyone Commission Frank Gehry?

It’s more important to be good than original.

—Mies van der Rohe


At the Intersection of Art and Architecture, Where Major Accidents Frequently Occur

Some topics actually beg to be addressed.

Those of us who live in free societies accept non-controversially that art is completely entitled to be provocative, even designedly and purposefully so.  Indeed, some legitimately contend that this is art’s primary function (I respectfully disagree, but I digress).

Architecture, at least public* architecture, must satisfy a different standard.

* I use the term “public” herein in a commercial sense to denote places of public accommodation, regardless of who owns them or how they were financed.  In legal terms, I mean structures that are or will be subject to the public accommodations provisions of the Americans With Disabilities Act.

To be sure, much of architecture can be appreciated in the same way as sculpture or other three-dimensional artistic media.  But the aegis is critically different; architecture must always facilitate.

That means aesthetics, no matter how inspired, are but a single component of architectural merit.  More to the point, it means aesthetics must never diminish the utility of public architecture.

At the very core of architecture lie constraints: fiscal, commercial, physical, legal, natural.  And since unrecognized or unheeded constraints are satisfied only accidentally, the first step in architectural creation is and must be analysis – not inspiration.

These dual notions of utility and constraints necessarily inform architectural criticism.  Good architecture satisfies constraints and reconciles aesthetics with utility.  Great architecture leverages constraints and integrates them with aesthetics to provide excess utility.

One important measure of architectural utility is sociological.  Great public architecture unifies and elevates a community, even as great art – no less importantly – often factionalizes.  I like the above quotation from Mies, in part because it encapsulates perfectly the architect’s practical imperative.  Architects are rightfully judged by their ability to envision and then deliver utility, and this is a chief distinction between architects and sculptors.

Which brings me to Frank Gehry, and specifically the Cleveland Clinic’s Lou Ruvo Center in Las Vegas.


nuvo_center

Monumental irony.  The Cleveland Clinic Lou Ruvo Center for Brain Health, designed by Frank Gehry – a building dedicated to the study and eradication of Alzheimer’s disease and other degenerative brain disorders.


Birds Gotta Fly, Fish Gotta Swim

Let me emphasize that I have no quarrel with Frank Gehry, per se.  As an architect, he makes no classical pretensions; he does what he does, and his design propensities are well known.  On the subject of Gehry’s design, I’d merely point out that Albert Hofmann never advocated a daily regimen of his famous discovery; he only claimed that it opened up some interesting possibilities.

In the same way, I’m libertarian enough not only to wish Frank Gehry and his associates well with respect to residential and private commissions, but to absolve him from guilt for accepting design commissions for public buildings.  After all, he has a business to run.

My real beef is with whoever handed the hot mic to Yoko – again, and I’m particularly interested in what would motivate someone to do it.


This Just In: Men Go Crazy in Congregations

By googling the terms “emperor’s new clothes” and “phenomenon,” I discovered the following paper written by three colleagues at Cornell and published in the American Journal of Sociology.  It concerns the sometimes inexplicable popularity of the unpopular; for example, how might one explain it if, over time, music critics began to praise recordings that featured gratuitous profanity spoken over cacophonous rhythms with no discernible melody?

According to the paper by Centola, Willer and Macy, this phenomenon cannot occur within a generalized and fully connected social network.  Popular unpopularity only takes root when agents within a developed local network create their own local preference and feedback loop for an unpopular norm, and thereafter export it to the generalized, connected social network.  To the extent that the generalized network attributes some special expertise to the local network, this of course greatly increases the probability of generalized acceptance.

If Centola et al. are correct, it means the AIA and the Pritzker Prize committee are as responsible as anyone for the Lou Ruvo Center and all of Gehry’s other aesthetic blasphemies.

It also means that the way to prevent future mis-coronation is for architects to connect regularly with people outside of the architecture community.  Actually show an architectural portfolio to people unaffiliated with architecture and ask them if it’s any good.  Listen to what they say and weigh their opinions; don’t automatically dismiss them as philistines if their opinions run counter to the architecture community’s prevailing assessment.  To charitable benefactors and governmental decision makers: trust your eyes.  Above all, don’t let an architect or architecture critic convince you that naked is clothed.

