Notation: a beginner’s guide

April 16, 2013 · by wciecon · in Frances Woolley, Teaching · 20 Comments

My colleague Lynda Khalaf's favourite saying is: Notation, notation, notation. Bad notation makes a paper difficult to follow. Papers that are hard to read and understand get rejected, or receive lower grades.

But what makes for good notation? First, symbols should be easy to remember. Take, for example, this passage from Holmstrom and Milgrom (1991).

It's simple: B stands for benefits, C stands for costs, and t represents effort devoted to each task. Yet coming up with an intuitive system of symbols is not as easy as it looks.

One problem is that a lot of letters are already taken. Notice that Holmstrom and Milgrom do not use "e" for effort, but instead "t" for effort spent on tasks. I would guess that they did this because, in a probabilistic model, "e" is generally used to represent the error term.

For similar reasons, Holmstrom and Milgrom couldn't use "p" to stand for "principal", because p is commonly associated with either price or probability (or perhaps producer, profit or productivity). Using it for anything else is confusing. The way Holmstrom and Milgrom avoid this problem is really quite neat: they do not use any symbol at all for the principal or the agent. They don't have to, since benefits always go to the principal, and costs are always borne by the agent.

This illustrates another general rule suggested to me by University of Victoria economic theorist Linda Welling: avoid multiple subscripts or subscript/superscript combinations whenever possible. Consider, for example, what would happen if Holmstrom and Milgrom had used this notation instead:

Consider a principal-agent relationship in which the agent a makes a one-time
choice of a vector of efforts e^a = (e^a_1, …, e^a_n) …

For the author, multiple subscripts and superscripts create too many chances for errors or slippage when working or presenting. For the reader, it just makes things hard to remember.

One other rule is illustrated by the Holmstrom and Milgrom paper quoted above. Generally speaking, Greek letters are used for parameters of the model, such as in this case μ (though note μ is actually a function). There are quite a number of conventions like this. For example, individual-level variables are often represented in lower-case, aggregate-level variables in upper-case. Perhaps others will suggest additional conventions in the comments.

The Holmstrom and Milgrom notation seems so simple and logical, but I would be willing to bet that it took them some time, and a bit of erasing everything and starting again, to come up with it. An alternative approach is just to use the same notation as others in the literature have done. This is one area where there are no points for originality – a paper that uses conventional notation will be easier for the reader to understand.

When working on a model, creating a list of symbols that you and any potential referees or examiners can refer back to is a good idea, even if that list does not go into the final version of the paper. Such a list will also reveal any confusing duplications, for example, use of c for cost and C for consumption. It may also suggest opportunities for simplification. Consider, for example, a model where the agent is splitting time three ways, between paid work, household production, and leisure. These three uses of time could be represented as L, l, and H, or as t₁, t₂ and t₃, or as l₁, l₂ and l₃. Which is simpler? Which is easier to remember? Is there something else that might be even better?
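The duplication check described above is easy to automate. The following is a minimal Python sketch, not a prescribed tool: the function name find_case_collisions and the symbol list are hypothetical, and the assignment of L, l and H to paid work, leisure and household production is only one possible reading of the example.

```python
from collections import defaultdict


def find_case_collisions(symbols):
    """Return groups of symbols that differ only by letter case."""
    groups = defaultdict(list)
    for symbol in symbols:
        groups[symbol.lower()].append(symbol)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical symbol list for illustration; not taken from any paper.
symbol_table = {
    "c": "cost",
    "C": "consumption",
    "L": "paid work",
    "l": "leisure",
    "H": "household production",
}

collisions = find_case_collisions(symbol_table)
for group in collisions:
    meanings = ", ".join(f"{symbol} = {symbol_table[symbol]}" for symbol in group)
    print(f"Easily confused: {meanings}")
```

Run on this list, the sketch flags the c/C and L/l pairs and leaves H alone. A list using t₁, t₂ and t₃ would produce no warnings at all.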

Once one has a model, one has to figure out what goes in between the equations (hint: economics). Every symbol used in a paper needs to be defined clearly the first time it is mentioned. If the symbol has not been used in a while, it is a good idea to give the reader a hint as to what it means. For example, instead of always referring to C(t), occasionally say something like "the cost C(t)". A jet-lagged referee or professor reading a paper on a plane after a beer or two may find it challenging to remember the meaning of 10 different symbols.

It's been a while since I've done any serious theory, so I may not be the best person to give advice in this area. I hope that others will provide reactions, or additional suggestions, in the comments.

20 comments

Stephen Gordon · April 16, 2013 - 9:15 pm

Oof. Yes. This is one of those things you learn with experience: that you are the only one who will ever make whatever effort it takes to understand your notation. If your notation is too opaque, people will simply give up and stop reading. I know I’ve written referee reports to the effect that since the author hadn’t made an effort to be understood, I wasn’t about to make the effort to understand the paper.
Stephen Gordon · April 16, 2013 - 9:16 pm

(Also, tell Lynda I said hello.)
Frances Woolley · April 16, 2013 - 9:30 pm

Stephen, will do.
Via Facebook I’ve just had another suggestion: William Thomson’s book A Guide for the Young Economist: http://mitpress.mit.edu/books/guide-young-economist Apparently it has a chapter on how to write a paper.
Giovanni · April 16, 2013 - 10:15 pm

“Generally speaking, greek letters are used for parameters of the model, such as in this case μ.”
Call me pedantic, but…isn’t this an example of notational inconsistency? The expression μ(t) is a vector-valued function defined over t. We’ve already established that lowercase Roman letters represent vectors, i.e., t itself. Why then suddenly start using lowercase Greek for the same purpose? Why not write (say): x = u(t) + e?
One possible rationale for using both Roman and Greek lowercase: to allow readers to easily distinguish between stochastic and deterministic vectors. But the authors aren’t doing that…both x and ε are stochastic, μ and t deterministic.
Frances Woolley · April 16, 2013 - 10:54 pm

Giovanni, the agent chooses t, that is, the amount of effort to devote to each task. Because it’s a thing that’s chosen by an individual agent, it gets a lower case Roman letter, t.
μ, on the other hand, is something outside of the agent’s control – it determines the relationship between inputs and outputs, between the amount of effort the agent devotes to the task and the “information signal” that the principal observes.
The way these models typically work, if ε has a large variance you get one set of results, if ε has a small variance there will be another set of results. E.g., if μ(t) = t and ε = 0, then the principal can directly observe the amount of effort that the agent is putting into the task, and can design appropriate incentive schemes. The principal-agent problem disappears.
Bob Smith · April 16, 2013 - 10:55 pm

Good post, and one with a more general application. The same logic applies to the use of defined terms (which pop up a lot in my profession). Ideally, they should be simple, intuitive and easy to remember – you don’t want the reader (be they a professor, client or judge) to have to keep flipping back to the beginning of the essay, memo or agreement to figure out what a particular defined term means. And they should be short – the whole reason for using defined terms is to avoid having to repeat verbiage ad nauseam. That point is defeated if the defined term is 7 words long.
At best, poorly thought-out defined terms create annoyance on the part of the readers, at worst confusion or genuine misunderstanding as to what you’re trying to say. I’m dealing with a file where sloppy use of a defined term (i.e., one that didn’t accurately reflect the legal status of one of the parties to the agreement) meant that none of the parties to the agreement actually appreciated its legal implications. Had the lawyers drafting the agreement used a defined term that actually reflected that party’s role, people would have twigged to the legal implications earlier and a lot of sleepless nights (and hefty legal fees) might have been avoided.
Frances Woolley · April 16, 2013 - 11:00 pm

Bob, absolutely. It’s easy to fall for the fallacy of sunk costs, and think “I’ve written 10 pages already, I don’t want to have to go back and change everything.”
No, it’s worth going back and making the changes (that’s something I learned the hard way).
Giovanni · April 16, 2013 - 11:33 pm

“the agent chooses t, that is, the amount of effort to devote to each task. Because it’s a thing that’s chosen by an individual agent, it gets a lower case Roman letter, t.”
Fair enough…so the symbol for the information vector must be a lowercase “chi”, not an “ex” (as I initially thought it to be).
Frances Woolley · April 17, 2013 - 7:01 am

Giovanni, the information vector isn’t a model parameter either, it’s the outcome of the agent’s actions and the model parameters.
A model parameter is something whose value is not determined within the model (although the results of the model may be sensitive to its value). It’s exogenous. For example, an agent’s discount rate (delta) is typically a model parameter.
Phil Koop · April 17, 2013 - 9:09 am

“creating a list of symbols that you and any potential referees or examiners can refer back to is a good idea”
That is excellent advice, though I would say it is nearly indispensable, rather than merely “good”.
“though note μ is actually a function”
Nothing wrong with functions as parameters, particularly if you are taking a Bayesian approach.
“Perhaps others will suggest additional conventions in the comments.”
I like to distinguish vectors from scalars in some way. In this example, it would be logical to use bold font, or perhaps upper case. That would answer Giovanni’s complaint.
Bob Smith · April 17, 2013 - 10:37 am

“It’s easy to fall for the fallacy of sunk costs, and think ‘I’ve written 10 pages already, I don’t want to have to go back and change everything.’”
Especially in the age of the “find and replace” feature (although, used carelessly, that poses its own problems).
Giovanni · April 17, 2013 - 12:18 pm

“A model parameter is something whose value is not determined within the model (although the results of the model may be sensitive to its value). It’s exogenous. For example an agent’s discount rate (delta) is typically a model parameter”
Ah…now me know difference endo and exo (what me do so many years econ school?) now me see rule…Latin letter = endogenous variable, Greek letter = exogenous variable/parameter/function…that is, except when denoting some familiar thingie like a benefit/cost function, where one may write B(t)/C(t)…or the (exogenous) end-value of an index, where one may write n or k…or a Cobb-Douglas production/utility function, where one may use a, b to represent the factor-share/expenditure-share parameters…or a generic system of structural equations (as in the econometrics textbook – Judge et al – I’m presently gazing at), where one may use e to represent the vector of exogenous shocks impinging on the system…
Oh, heck…now me confused again.
Frances Woolley · April 17, 2013 - 12:28 pm

Giovanni, perhaps you’d better stick to poetry.
These are conventions, not absolute rules. Micro, macro and econometrics use different conventions; I’m talking here mostly about the ones appropriate for micro theory, as one could deduce from the examples I used. But, yes, they are generally followed.
In a Cobb-Douglas utility function, for example, it’s more usual to use alpha and beta than a and b (or, actually, alpha and 1 – alpha).
Model parameters are exogenous variables, but not all exogenous variables are model parameters. The weather, for example, is exogenous, but it is not a model parameter.
I’m not here laying down the law telling people what conventions they must use. These are simply suggestions that may, possibly, help some PhD student or junior faculty member get a paper published, or a thesis accepted.
Gene Callahan · April 17, 2013 - 12:41 pm

‘It’s easy to fall for the fallacy of sunk costs, and think “I’ve written 10 pages already, I don’t want to have to go back and change everything.”’
Huh? In this case, it wouldn’t be the sunk cost that would be giving me pause, it would be the future cost of making the changes!
Giovanni · April 17, 2013 - 12:53 pm

“Giovanni, perhaps you’d better stick to poetry.”
Quite…will do.
Frances Woolley · April 17, 2013 - 2:48 pm

Terry McGarty gives his excellent rules for writing (following up on this post) on The Squirrel’s Nest here: http://terrymcgarty.blogspot.ca/2013/04/mathematics-and-language.html
Giovanni · April 18, 2013 - 12:34 am

Frances,
My apologies for being a nitpicking jerk in the above. Sometimes I get a little giddy with my own cleverness. Your basic point is well-taken…I’ve suffered through enough wretchedly written papers in my time to know. I’ll try to keep my contributions a little more substantive and on point in future.
Frances Woolley · April 18, 2013 - 6:59 am

Giovanni – no worries. I have no idea who you are – you might be an undergrad student in econ, you might be an evolutionary biologist, you might be a tax lawyer, you might be a lay preacher, you might be a hippie home schooler. I make a guess about someone’s level of econ knowledge, but if I guess wrong either I go right over the person’s head, or I come across as patronizing. In this case, it was probably the latter.
You’re right that people don’t always use Greek letters for model parameters. It’s also true that t is a lousy choice for effort on task, as it gets way too easily confused with time.
But I maintain that the micro theorist who has run out of Roman letters, and has to start using Greek letters somewhere, is better advised to use Greek for model parameters than for choice variables.
Giovanni · April 18, 2013 - 4:48 pm

Frances,
Thanks for your understanding. For what it’s worth, I’m actually a retired professional economist/statistician…and these days, I suppose, a sort of superannuated hippie home schooler. Cheers.
Min · April 18, 2013 - 10:15 pm

I am so glad that I learned computer programming in such a way as to break the habit of cryptic variables. Programs are written for humans to read, as well as for machines to perform. So I can write things like, effort_vector, cost(effort_vector), normal_random_error, and so on. I can read programs that I wrote 25 years ago and I know what I was doing. 🙂
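
The naming style Min describes can be sketched in Python. This is only an illustration under stated assumptions: it uses the signal equation x = μ(t) + ε from the discussion above with μ(t) = t, the quadratic cost function is a hypothetical choice rather than anything taken from Holmstrom and Milgrom, and the names expected_signal, observed_signal and error_std_dev are invented for the example.

```python
import random


def expected_signal(effort_vector):
    """mu(t): assumed here to be the identity, mu(t) = t."""
    return list(effort_vector)


def cost(effort_vector):
    """C(t): hypothetical quadratic cost of effort, for illustration only."""
    return 0.5 * sum(effort ** 2 for effort in effort_vector)


def observed_signal(effort_vector, error_std_dev, rng):
    """x = mu(t) + epsilon, with an independent normal error on each task."""
    return [
        mean + rng.gauss(0.0, error_std_dev)
        for mean in expected_signal(effort_vector)
    ]


rng = random.Random(2013)  # fixed seed so runs are repeatable
effort_vector = [1.0, 2.0]

noisy_signal = observed_signal(effort_vector, error_std_dev=0.5, rng=rng)
exact_signal = observed_signal(effort_vector, error_std_dev=0.0, rng=rng)

print("cost of effort:", cost(effort_vector))
print("noisy signal:", noisy_signal)
print("signal with no error:", exact_signal)
```

With error_std_dev set to zero the observed signal equals the effort vector, which is the case where the principal can observe effort directly.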