Reality is often underpowered

Introduction

When I worked as a doctor, we had a lecture by a paediatric haematologist on a condition called acute lymphoblastic leukaemia. I remember being impressed that very large proportions of patients were being offered trials randomising them between different treatment regimens, then in clinical equipoise, to establish which had the edge. At the time, one of the areas of interest was whether, given the disease tended to have a good prognosis, treatment intensity could be reduced to lessen the long-term side-effects of treatment without adversely affecting survival.

On a later rotation I worked in adult medicine, and one of the patients admitted to my team had an extremely rare cancer,[1] with a (recognised) incidence of a handful of cases worldwide per year. It happened that the world authority on this condition worked as a professor of medicine in London, and she came down to see them. She explained to me that treatment for this disease was almost entirely based on first principles, informed by a smattering of case reports. The disease unfortunately had a bleak prognosis, although she was uncertain whether this was because it was an aggressive cancer to which current medical science has no answer, or whether there was an effective treatment out there if only it could be found.

I aver that many problems EA concerns itself with are closer to the second story than the first. That in many cases, sufficient data is not only absent in practice but impossible to obtain in principle. Reality is often underpowered for us to wring the answers from it we desire.

Big units of analysis, small samples

The main driver of this problem for ‘EA topics’ is that the outcomes of interest have units of analysis for which the whole population (let alone any sample from it) is small-n: e.g. outcomes at the level of a whole company, or a whole state, or whole populations. For these big-unit-of-analysis/small-sample problems, RCTs face formidable in-principle challenges:

  1. Even if by magic you could get (e.g.) all countries on earth to agree to randomly allocate themselves to policy X or Y, this is merely a sample size of ~200. If you’re looking at companies relevant to cage-free campaigns, or administrative regions within a given state, this can easily fall another order of magnitude.
  2. These units of analysis tend to be highly heterogeneous, almost certainly in ways that affect the outcome of interest. Although the key ‘selling point’ of the RCT is that it implicitly controls for all confounders (even ones you don’t know about), this statistical control is a (convex) function of sample size, and isn’t hugely impressive at ~100 per arm: it is well within the realms of possibility for the randomisation to happen to give arms with an unbalanced allocation of any given confounding factor (see the simulation sketch after this list).
  3. ‘Roughly’ (in expectation) balanced intervention arms are unlikely to be good enough in cases where the intervention is expected to have much less effect on the outcome than other factors (e.g. wealth, education, size, whatever), so an effect size that favours one arm or the other can alternatively be attributed to one of these.
  4. Supplementing this raw randomisation by explicitly controlling for confounders you suspect (cf. block randomisation, propensity matching, etc.) has limited value when we don’t know all the factors which plausibly ‘swamp’ the likely intervention effect (i.e. you don’t have a good predictive model for the outcome but-for the intervention tested). In any case, these approaches tend to trade off against the already scarce resource of sample size.
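To make point 2 concrete, here is a minimal simulation sketch (my own illustration; the prevalence and the imbalance cut-off are assumptions for the example, not figures from any study). With two arms of ~100 units each, chance alone fairly often leaves a binary confounder noticeably unbalanced between the arms.

```python
# Illustrative simulation (assumed numbers): with simple randomisation of
# heterogeneous units into two arms of 100, how often does a binary confounder
# end up imbalanced by more than 10 percentage points between the arms?
import numpy as np

rng = np.random.default_rng(42)

n_per_arm = 100
prevalence = 0.3     # assumed share of units carrying the confounding trait
cutoff = 0.10        # an imbalance we would plausibly worry about
n_sims = 100_000

# With random allocation and independent units, each arm's count of the trait
# is a binomial draw.
arm_a = rng.binomial(n_per_arm, prevalence, n_sims) / n_per_arm
arm_b = rng.binomial(n_per_arm, prevalence, n_sims) / n_per_arm
imbalance = np.abs(arm_a - arm_b)

# Under these assumptions this comes out at a little over one chance in ten.
print(f"P(imbalance > {cutoff:.0%}) ≈ {(imbalance > cutoff).mean():.2f}")
```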

These ‘small sample’ problems aren’t peculiar to RCTs, but are endemic to all other empirical approaches. The wealth of econometric and quasi-experimental methods (e.g. IVs, regression discontinuity analysis) still runs up against hard data limits, as well as those owed to whatever respects in which these methods fall short of the ‘ideal’ RCT set-up (e.g. imperfect instrumentation, omitted variable bias, nagging concerns about reverse causation). Qualitative work (case studies, etc.) has the same problems, even if other ones (e.g. selection) loom larger.

Value of information and the margin of common-sense

None of this means such work has zero value – big enough effect sizes can still be reliably detected, and even underpowered studies still give us information. But we may learn very little on the margin of common sense. Suppose we are interested in ‘what makes social movements succeed or fail?’ and we retrospectively assess a (somehow) representative sample of social movements. It seems plausible that the big (and plausibly generalisable) hits from such an investigation would prove commonsensical (e.g. “Social movements are more likely to grow if members talk to other people about the social movement”), whilst the ‘new lessons’ remain equivocal and uncertain.

We should expect to see this if we believe the distribution of relevant effect sizes is heavy-tailed, with most of the variance in (say) social movement success owed to a small number of factors, and the rest composed of a large multitude of smaller effects. In such a case, modest increases in information (e.g. from small-sample data) may bring even more modest increases in either explaining the outcome or identifying what contributes to it:

[Figure: toy example of a heavy-tailed distribution of effect sizes; caption below]

Toy example, where we propose a roughly Pareto distribution of effect sizes among contributory factors. The largest factors (which nonetheless explain a minority of the variance) may prove obvious to the naked eye (blue). Adding in the accessible data may only slightly lower the detection threshold, with modest impacts on identifying further factors (green) and on overall accuracy. The great bulk of the variance remains owed to a large ensemble of small factors which cannot be identified (red). Note that the detection threshold tends to have diminishing returns with sample size.
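A minimal sketch of this toy example (my own illustration; every number below is assumed rather than taken from the figure): a few sizeable factors sit on top of a large ensemble of tiny ones which collectively carry most of the variance, while the detection threshold shrinks only with the square root of the sample size.

```python
# Illustrative sketch (all numbers assumed): a handful of sizeable factors plus
# very many tiny ones, with a minimum detectable effect that falls only as
# ~1/sqrt(n). Accessible sample sizes identify only a modest share of variance.
import numpy as np

rng = np.random.default_rng(0)

big = 1.0 + rng.pareto(a=3.0, size=20)    # a few factors with effect sizes >= 1
small = np.full(10_000, 0.1)              # a large ensemble of near-negligible factors
effects = np.concatenate([big, small])
var_share = effects**2 / np.sum(effects**2)

def detectable_share(n_per_arm, scale=20.0):
    """Share of variance owed to factors above a crude minimum detectable
    effect, which improves only with the square root of the sample size."""
    threshold = scale / np.sqrt(n_per_arm)
    return var_share[effects > threshold].sum()

# 'All countries on earth' caps us at roughly 100 per arm; catching the small
# factors would need samples orders of magnitude beyond anything available.
for n in (30, 100, 200, 40_000):
    print(f"n per arm = {n:>6}: detectable share of variance ≈ {detectable_share(n):.2f}")
```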

The scientific revolution for doing good?

The foregoing should not be read as general scepticism about using data. The triumphs of evidence-based medicine, although not unalloyed, have been substantial, and considerable gains remain on the table (e.g. leveraging routine clinical practice). The ‘randomista’ trend in international development is generally one to celebrate, especially as (I understand) it increasingly aims to isolate factors that have credible external validity. The people who run cluster-randomised, stepped-wedge, and other study designs with big units of analysis are not ignorant of their limitations, and can deploy these designs judiciously.

But it should temper our enthusiasm about how many insights we can glean by getting some data and doing something sciency to it.[2] The early successes of EA in global health owe a lot to this being one of the easier areas in which to get crisp, intersubjective and legible answers from a wealth of available data. For many, if not most, other issues, data-driven demonstration of ‘what really works’ will never be possible.

We see that people do better than chance (or better than others) in terms of prediction and strategic judgement. Yet, at least judging by the superforecasters (this writeup by AI Impacts is an excellent overview), how they do so is much more indirectly data-driven: one may have to weigh up several facially-relevant ‘base rates’, adjusting these rates by factors whose coefficients may be estimated from their role in loosely analogous cases, and so forth.[3] Although this process may be informed by statistical and numerical literacy (e.g. decomposition, ‘fermi-ization’), it seems to me the main action going on ‘under the hood’ is developing a large (and implicit, and mostly illegible) set of gestalts and impressions to determine how to ‘weigh’ relevant data that is nonetheless fairly remote from the question at issue.[4]

Three final EA takeaways:

  1. Most who (e.g.) write up a case study or a small-sample analysis tend to be well aware of the limitations of their work. Nonetheless I think it is worth paying more attention to how these limitations bear on the overall value of information before one embarks on such pieces of work. Small nuggets of information may not be worth the time to excavate even when the audience are ideal reasoners. As they aren’t, one risks them (or oneself) over-weighing their value when considering problems which demand tricky aggregation of a multitude of data sources.
  2. There can be good reasons why expert communities in some areas haven’t tried to use data explicitly to answer problems in their field. In these cases, the ‘calling card’ of EA-style analysis (doing this anyway) can be less a disruptive breakthrough and more a stigma of intellectual naivete.
  3. In areas where ‘being driven by the data’ isn’t a huge advantage, it can be hard to identify an ‘edge’ that the EA community has. There are other candidates: investigating topics neglected by existing work, better-aligned incentives, etc. We should be sceptical of stories which boil down to a generalised ‘EA exceptionalism’.

Footnotes

  1. Its name escapes me, although arguably including it would risk deductive disclosure. To play it safe I’ve obfuscated some details. 
  2. And statistics and study design generally prove hard enough that experts often go wrong. Given the EA community’s general lack of cultural competence in these areas, I think their (generally amateur) efforts at the same have tended to fare worse. 
  3. I take as supportive evidence that a common feature among superforecasters is that they read a lot – not just in areas closely relevant to their forecasts, but more broadly across history, politics, etc.
  4. Something analogous happens in other areas of ‘expert judgement’, whereby experts may not be able to explain why they made a given determination. We know that this implicit expert judgement can be outperformed by simple ‘reasoned rules’. My suspicion, however, is that it still performs better than chance (or inexpert judgement) when such rules are not available.

Is old art obsolete?

Summary

The present is much better than the past in almost all respects, and producing great works of art should not be one of the exceptions: in principle, modernity has a much larger ‘talent pool’ than the past, and in circumstances which much better allow this talent to flow into great accomplishments; when we look at elite performance in areas with ‘harder’ indicators, these show a trend of improvement from past to present; and aggregate consumption of older versus newer art provides a ‘wisdom of crowds’ consideration.

If we want to experience the ‘best’ artistic work, we should turn our attention away from the ‘classics’, and towards the modern body of work which all-but-inevitably surpasses them. Old art is obsolete.

Such a view runs counter to received wisdom, whereby the ‘canon’ of great art is dominated by classic works wrought by old masters. There’s a natural debunking argument that this view derives more from snobbish signalling than from genuine merit. I nonetheless explore other hypotheses that make old art non-obsolescent after all.

In concert, they may be enough to secure a place for old art. Yet they do not defeat the key claim that the modern artistic community is profoundly stronger than at any time before, and that posterity should find our art to have been greater than that of any time before. For this, and many other things besides, there has never been a better time to be a Whig historian.

Continue reading “Is old art obsolete?”

Notes on Deep Work

Introduction

Cal Newport has written a book on ‘Deep Work’. I’m usually archly sceptical of ‘productivity gurus’ and ‘productivity guides’, but this book was recommended and I found it very good – so much so that I made notes on the way through.

I put them here in case others find them valuable. I’ve done a bit of rearranging of the line of argument, and added some commentary on the areas I take to be less well-supported. Nonetheless, the notes follow the general structure of the book: section 1, why deep work is great; section 2, how to do it. Those after the armamentarium of suggestions Newport makes on how to do it can skip down to the third section. Avoiding anything in italics spares you most of my editorialising. Continue reading “Notes on Deep Work”

The person-affecting value of existential risk reduction

Introduction

The standard motivation for the far-future cause area in general, and existential risk reduction in particular, is to point to the vast future that is possible provided we do not go extinct (see Astronomical Waste). One crucial assumption is a ‘total’ or ‘no-difference’ view of population ethics: in sketch, it is just as good to bring a person into existence with a happy life for 50 years as it is to add fifty years of happy life to someone who already exists. Thus the 10^lots of potential people give profound moral weight to the cause of x-risk reduction.

Population ethics is infamously recondite, and so disagreement with this assumption is commonplace; many find at least some form of person-affecting/asymmetrical view plausible: that the value of ‘making happy people’ is either zero, or at least much lower than the value of making people happy. Such a view would remove a lot of the upside of x-risk reduction, as most of its value (by the lights of the total view) lies in ensuring a great host of happy potential people exist.

Yet even if we discount the (forgive me) person-effecting benefit, extinction would still entail vast person-affecting harm. There are 7.6 billion people alive today, and 7.6 billion premature deaths would be deemed a considerable harm by most. Even fairly small (albeit non-pascalian) reductions in the likelihood of extinction could prove highly cost-effective.

To my knowledge, no one has ‘crunched the numbers’ on the expected value of x-risk reduction by the lights of person-affecting views. So I’ve thrown together a Guesstimate model as a first-pass estimate.

An estimate

The (forward) model goes like this:

  1. There are currently 7.6 billion people alive on earth. The worldwide mean age is 38, and worldwide life expectancy is 70.5.
  2. Thus, very naively, if ‘everyone died tomorrow’, the average number of life years lost per person is 32.5, and the total loss is 247 billion life years.
  3. Assume the extinction risk is 1% over this century, uniform by year (i.e. the risk this year is 0.0001, ditto the next one, and so on.)
  4. Also assume the tractability of x-risk reduction is something like (borrowing from Millett and Snyder-Beattie) this: ‘There’s a project X that is expected to cost 1 billion dollars each year, and would reduce the risk (proportionately) by 1%’ (i.e. if we spent a billion each year this century, x-risk over this century declines from 1% to 0.99%).
  5. This gives a risk-reduction per year of around 1.3 × 10^-6, and so an expected value of around 330,000 years of life saved.

Given all these things, the model spits out a ‘cost per life year’ of $1,500-$26,000 (mean $9,200).
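As a sanity check, here is a point-estimate reconstruction of the model in code (my own sketch: the actual Guesstimate model places distributions over these inputs, so its mean and range differ from this single-number version).

```python
# Point-estimate sketch of the model above (my reconstruction; the real model
# uses distributions over these inputs rather than single numbers).
population = 7.6e9              # people alive today
mean_age = 38                   # worldwide mean age (years)
life_expectancy = 70.5          # worldwide life expectancy (years)

years_lost_per_person = life_expectancy - mean_age               # 32.5
total_life_years_at_stake = population * years_lost_per_person   # ~2.47e11

risk_per_year = 0.01 / 100      # 1% per century, assumed uniform by year
proportional_reduction = 0.01   # project X trims that risk by 1% of itself
annual_cost = 1e9               # dollars per year

risk_reduction_per_year = risk_per_year * proportional_reduction  # 1e-6
expected_life_years_saved = risk_reduction_per_year * total_life_years_at_stake

print(f"Expected life years saved per year: {expected_life_years_saved:,.0f}")
print(f"Cost per life year: ${annual_cost / expected_life_years_saved:,.0f}")
```

This point estimate comes out somewhat more optimistic than the model’s quoted mean, which is what one would expect once uncertainty over the inputs is folded in: the expectation of a ratio is not the ratio of expectations.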

Caveats and elaborations

The limitations of this are nigh-innumerable, but I list a few of the most important below, in approximately ascending order.

Zeroth: The model has a wide range of uncertainty, and reasonable sensitivity to distributional assumptions: you can shift the mean estimate and range by a factor of 2 or so depending on whether the distributions used are beta or log-normal, or by tweaking their variances.
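A sketch of the kind of sensitivity check this refers to (the distribution families and parameters below are assumed purely for illustration, not taken from the model): re-run the cost-per-life-year calculation under different distributional choices for the annual risk reduction and compare the results.

```python
# Illustrative sensitivity check (assumed distributions and parameters): how
# much do the mean and range of 'cost per life year' move when we swap the
# distribution placed over the annual risk reduction?
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
life_years_at_stake = 7.6e9 * 32.5   # as in the model above
annual_cost = 1e9

def cost_per_life_year(risk_reduction_per_year):
    return annual_cost / (risk_reduction_per_year * life_years_at_stake)

# Option A: a beta-shaped annual risk reduction rescaled to centre near 1e-6.
beta_draws = rng.beta(2, 2, N) * 2e-6
# Option B: a log-normal with a similar central value but a fatter right tail.
lognormal_draws = rng.lognormal(mean=np.log(1e-6), sigma=0.7, size=N)

for label, draws in [("beta", beta_draws), ("log-normal", lognormal_draws)]:
    costs = cost_per_life_year(draws)
    lo, hi = np.percentile(costs, [5, 95])
    print(f"{label:>10}: mean ${costs.mean():,.0f}, 5th-95th percentile ${lo:,.0f}-${hi:,.0f}")
```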

First: Adjustment to give a ‘cost per DALY/QALY’ would make the figures somewhat less favourable, although not dramatically so (a factor of 2 would imply that everyone who continues to live does so with a disability weight of 0.5, in the same ballpark as the weights used for major depression or blindness).

Second: Trends may have a large impact, although their importance is modulated by which person-affecting view is assumed. I deliberately set up the estimate to work in a ‘one shot’ single-year case (i.e. the figure applies to a ‘spend 1B to reduce extinction risk in 2018 from 0.0001 to 0.000099’ scenario).

By the lights of a person-affecting view which considers only people who exist now, making the same investment 10 years from now (i.e. spending 1B to reduce extinction risk in 2028 from 0.0001 to 0.000099) is less attractive, as some of these people will have died, and the new people who have replaced them have little moral relevance. These views thus imply a fairly short time horizon, and are particularly sensitive to x-risk in the near future. Given the ‘1%’ per century is probably not uniform by year, and is plausibly lower now but higher later, this implies a further penalty to cost-effectiveness.

Other person-affecting views consider people who will necessarily exist (however cashed out) rather than whether they happen to exist now (planting a bomb with a timer of 1000 years still accrues person-affecting harm). In an ‘extinction in 100 years’ scenario, this view would still count the harm to everyone alive then who dies, although it would still discount the foregone benefit of the people who ‘could have been’ born subsequently in the moral calculus.

Thus trends in the factual basis become more salient. One example is the ongoing demographic transition: the consequently older population gives smaller values of life-years saved when protected from extinction further in the future. This would probably make the expected cost-effectiveness somewhat (but not dramatically) worse.

A lot turns on the estimate for marginal ‘x-risk reduction’. I think the numbers offered for the base rate, and for how much it can be reduced at what cost, lean on the conservative side of the consensus of far-future EAs. Confidence that the (implied) scale or tractability is an order of magnitude ‘worse’ would impose a commensurate penalty on the cost-effectiveness estimate. Yet in such circumstances the bulk of the disagreement is explained by empirical disagreement rather than a different take on the population ethics.

Finally, this only accounts for something like the (welfare) ‘face value’ of existential risk reduction. There would be some further benefits by the lights of the person-affecting view itself, or of ethical views to which those holding a person-affecting view are likely sympathetic: extinction might impose other harms beyond years of life lost; there could be person-affecting benefits if some of those who survive can enjoy extremely long and happy lives; and there could be non-welfare goods on an objective list which rely on non-extinction (among others). On the other side, those with non-deprivationist accounts of the badness of death may still discount the proposed benefits.

Conclusion

Notwithstanding these challenges, I think the model, and the result that the ‘face value’ cost-effectiveness of x-risk reduction is still pretty good, is instructive.

First, there is a common pattern of thought along the lines of, “X-risk reduction only matters if the total view is true, and if one holds a different view one should basically discount it”. Although rough, this cost-effectiveness estimate suggests this is mistaken. Although it seems unlikely that x-risk reduction is the best buy by the lights of a person-affecting view (we should be suspicious if it were), given ~$10,000 per life year compares unfavourably to the best global health interventions, it is still a good buy: it compares favourably to the marginal cost-effectiveness of rich-country healthcare spending, for example.

Second, although it seems unlikely that x-risk reduction would be the best buy by the lights of a person-affecting view, this would not be wildly outlandish. Those with a person-affecting view who think x-risk is particularly likely, or that the cause area has easier wins available than implied in the model, might find here the best opportunities to make a difference. It may therefore supply reason for those with such views to investigate the factual matters in greater depth, rather than ruling the area out based on their moral commitments.

Finally, most should be morally uncertain in matters as recondite as population ethics. Unfortunately, how to address moral uncertainty is similarly recondite. If x-risk reduction is ‘good but not the best’ rather than ‘worthless’ by the lights of person affecting views, this likely implies x-risk reduction looks more valuable whatever the size of the ‘person affecting party’ in one’s moral parliament.

Continue reading “The person-affecting value of existential risk reduction”

How fragile was history?

Elsewhere (and better): 1, 2.

If one could go back in time and make a small difference in the past, would one expect it to effect dramatic changes to the future? Questions like these are fertile soil for fiction writers (generally writing under speculative or alternative history) but receive less attention in the historical academy, which tends to focus on explaining what in fact happened, rather than what could have been. Yet general questions of historical fragility (e.g. Are events in human history ‘generally’ fragile? In what areas is history particularly fragile? Are things getting more or less fragile over time?) are of particular interest to those interested in altering the course of the long-run future by differences they make today. Continue reading “How fragile was history?”

In defence of epistemic modesty

This piece defends a strong form of epistemic modesty: that, in most cases, one should pay scarcely any attention to what one finds the most persuasive view on an issue, hewing instead to an idealised consensus of experts. I start by better pinning down exactly what is meant by ‘epistemic modesty’, go on to offer a variety of reasons that motivate it, and reply to some common objections. Along the way, I show common traps people being inappropriately modest fall into. I conclude that modesty is a superior epistemic strategy, and ought to be more widely used – particularly in the EA/rationalist communities.

[gdoc]

Provocation

I argue for this:

In virtually all cases, the credence you hold for any given belief should be dominated by the balance of credences held by your epistemic peers and superiors. One’s own convictions should weigh no more heavily in the balance than those of one other epistemic peer.

Continue reading “In defence of epistemic modesty”

In defence of democracy

Last week the UK held a referendum on whether it should remain a member of the European Union. ‘Leave’ won by a narrow (52%) majority. The aftermath so far involves the resignation of the Prime Minister and consequent leadership election; a leadership challenge within the opposition; an increasingly restive Scotland looking for independence; and large slides of UK stocks and currency.

Contra the balance of the chattering classes (and the great majority of all those in my social media bubble), I think the results of the referendum should be respected, and therefore the UK should leave the EU.[1] Continue reading “In defence of democracy”

Are History’s “Greatest Philosophers” All That Great?

Introduction

In the canon of western philosophy, those generally regarded as the ‘greatest’ philosophers tend to have lived far in the past. Consider this example from an informal poll:

  1. Plato (428-348 BCE)
  2. Aristotle (384-322 BCE)
  3. Kant (1724-1804)
  4. Hume (1711-1776)
  5. Descartes (1596-1650)
  6. Socrates (469-399 BCE)
  7. Wittgenstein (1889-1951)
  8. Locke (1632-1704)
  9. Frege (1848-1925)
  10. Aquinas (1225-1274)

(source: LeiterReports)

I take this as fairly representative of consensus opinion—one might argue about some figures versus those left out, or the precise ordering, but most would think (e.g.) Plato and Aristotle should be there, and near the top. All are dead, and only two were alive during the 20th century.

But now consider this graph of human population over time (US Census Bureau, via Wikipedia):

[Figure: world population over time]

The world population at 500 BCE is estimated to have been 100 million; in the year 2000, it was 6.1 billion, over sixty times greater. Thus if we randomly selected people from among those born since the ‘start’ of western philosophy, they would generally have been born close to the present day. Yet the ‘greatest philosophers’ were generally born much further in the past than one would expect by chance. Continue reading “Are History’s “Greatest Philosophers” All That Great?”

Free will, without God?

Part 8 in series: 20 Atheist answers to questions they supposedly can’t

11. How is free will possible in a material universe?

Short answer: Depends what you mean by ‘free will’…

Long answer: What exactly do we need to ‘count as’ having free will (and does our situation satisfy that)? In particular, if we live in a world that is apparently determined via laws of nature, surely our brains (and perhaps therefore our minds) are included in this inviolable causal chain. So, if our thoughts are determined, what then for our intuition that we have free will? Continue reading “Free will, without God?”