In my previous post I published a bunch of R scripts that will enable a reader of Taleb’s “Silent Risk“, Chapter 3, Section 3.2 “Problems and Inverse Problems” to play with the ideas he presents. I thought I should discuss one of the results those scripts produce that does not jibe with Taleb’s.

I know from writing blog posts that it is incredibly difficult to be consistent and accurate when giving examples. Ideas evolve as you write them up; you try different things, and inconsistencies emerge between the text, the code, and the charts.

The inconsistent result concerns the expected loss. Taleb’s book suggests “that close to 67% of the observations underestimate the tail risk below 1%, and 99% for more severe risks”. My Monte Carlo simulations indicate that even at a tail risk below 5%, 99%+ of the observations underestimate the loss.

The expected shortfall is defined as follows:

\displaystyle S(\kappa )=\frac{\int_{-\infty }^{\kappa}x.f(x)dx}{\int_{-\infty }^{\kappa}f(x)dx}

Where κ is the value of x associated with the percentile tail risk, i.e. the value of κ such that the probability that x is less than or equal to κ equals the tail risk level. So the shortfall is the expected value of x given that x is less than or equal to κ.

The denominator of S(κ) is F(κ) = P(x <= κ), which is, of course, the tail risk level. For a left-tailed Pareto type 4 distribution there is no reliable finite number to substitute for -infinity, but we do know that there is no possibility of a value of x greater than μ, so we can work with the range κ to μ:

\displaystyle \int_{-\infty }^{\kappa}x.f(x)dx=\int_{-\infty }^{\mu}x.f(x)dx-\int_{\kappa }^{\mu}x.f(x)dx=E[x]-\int_{\kappa }^{\mu}x.f(x)dx

We already have a function for calculating the expected value of x. We can compute the integral numerically: choose some suitable increment to represent “dx”, calculate all the values of x between κ and μ at intervals equal to the increment, calculate f(x) – the pdf – for the same range of values of x, multiply x, f(x), and dx for each value of x, and add them all up.
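The Riemann-sum idea above can be sketched as follows. The post’s own scripts integrate the Pareto type 4 density via `dPareto4` from “Silent Risk Utilities.R”, which is not shown here; `dnorm` stands in as the density so the sketch is self-contained and can be checked against the Gaussian closed form.

```r
# Self-contained sketch of the numerical integration described above.
# dnorm stands in for dPareto4 (which lives in "Silent Risk Utilities.R").
tailIntegral <- function(f, lower, upper, increment = 0.001) {
  x <- seq(lower, upper, by = increment)
  sum(x * f(x) * increment)  # sum of x . f(x) . dx over the range
}

# Expected shortfall of N(0, 1) at the 5% tail level: the numerator is
# E[x] minus the integral from kappa up to a cutoff standing in for mu.
p     <- 0.05
kappa <- qnorm(p)                            # x value at the 5th percentile
es    <- (0 - tailIntegral(dnorm, kappa, 8)) / p
# Agrees with the Gaussian closed form -dnorm(qnorm(p)) / p, about -2.06
```

The same recipe works for the Pareto case by swapping in the Pareto density, E[x], and μ as the upper limit.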

I do the same thing for a Gaussian distribution using the means and standard deviations derived from the simulated samples.
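The post does not show its Gaussian shortfall function, but for a normal distribution the left-tail expected shortfall has a closed form, so the per-sample calculation can be sketched like this (the `shortFallGaussian` name and the `MCSeries` usage line are illustrative, not taken from the original scripts):

```r
# Sketch of the Gaussian comparison: fit a mean and sd to a sample,
# then use the closed-form left-tail expected shortfall of a normal,
# ES = m - s * dnorm(qnorm(p)) / p.
shortFallGaussian <- function(m, s, tail = 0.01) {
  m - s * dnorm(qnorm(tail)) / tail
}

# Per-sample usage, assuming MCSeries holds one simulated sample per row:
# shortFallsG <- apply(MCSeries, 1, function(x)
#   shortFallGaussian(mean(x), sd(x), tail = tail))
```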

The problem is, even at a 5% level, my script indicates 99.9%+ of the Gaussian loss values are less than the Pareto loss value. At 1% the value is always 100%!

Perhaps Taleb is calculating the Gaussian loss a different way. Maybe he is doing it from each observation …

Sure enough, if I estimate the loss from the observations by sorting each sample series, taking the first (tail fraction × sample size) elements, and averaging them, I get the following:

  • 5% tail => 61%+ underestimates
  • 1% tail => 67%+ underestimates
  • 0.1% tail => 77%+ underestimates

So I created a new function:

shortFallEmpirical <- function(empData, tail=0.01) {
  # Average the smallest (tail * n) observations in the sample
  mean(sort(empData)[1:(tail * length(empData))])
}
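As a quick sanity check of this estimator (the function is repeated here so the snippet is self-contained): for a large standard normal sample, the empirical 5% shortfall should land near the Gaussian closed form `-dnorm(qnorm(0.05)) / 0.05`, about -2.06.

```r
shortFallEmpirical <- function(empData, tail=0.01) {
  # Average the smallest (tail * n) observations in the sample
  mean(sort(empData)[1:(tail * length(empData))])
}

set.seed(1)                         # illustrative seed, not from the post
x <- rnorm(1e5)
shortFallEmpirical(x, tail = 0.05)  # close to -dnorm(qnorm(0.05)) / 0.05
```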

and updated the main script that runs all the tests:

rm(list=ls())
source("Silent Risk Utilities.R")

# Pareto4 distribution parameters:
mu <- 0 # Taleb has 1
sigma <- 1 # Taleb has 2
gamma <- 0.75
alpha <- 1.5
right <- F

# Monte Carlo simulation values:
M <- 1000
n <- 1000

# Shortfall estimate values
tail <- 0.01

message("Characteristics of the actual distribution shown in the charts")
dev.new()
plot(x=seq(-6, 0, 0.01), dPareto4(x=seq(-6, 0, 0.01), mu=mu, sigma=sigma, 
  gamma=gamma, alpha=alpha, right=right), t="l", xlab="x", ylab="Density",
  main="Probability Density Function")
message("Generating ", M, " samples with ", n, " random elements each ...")
MCSeries <- MonteCarloPareto4(M=M, n=n, mu=mu, sigma=sigma, 
  gamma=gamma, alpha=alpha, right=right)
dev.new()
par(mfrow=c(2, 2))
samples <- sample(M, 4)
for (i in 1:4){
  hist(MCSeries[samples[i], MCSeries[samples[i], ] >= -6], breaks=24, 
    main=paste("Series", samples[i]), xlab="x value")
}

expectedValue <- ePareto4(mu=mu, sigma=sigma, gamma=gamma, alpha=alpha, right=right)
message("True mean of distribution is ", round(expectedValue, 2), ".")
message("Mean of ", M * n, " variates is ", round(mean(MCSeries), 2), ".")
expectedVar <- vPareto4(mu=mu, sigma=sigma, gamma=gamma, alpha=alpha, right=right)
message("True variance of the distribution is ", ifelse(is.na(expectedVar), "infinite",
 round(expectedVar, 2)), ifelse(alpha <= 2 * gamma, " (alpha <= 2.gamma).", "."))
message("The variance of ", M * n, " variates is ", round(sd(MCSeries)^2, 2), ".")

sampleMeans <- rowMeans(MCSeries)
dev.new()
hist(sampleMeans, 20, main="Probability of Calculating Various Means",
 xlab="Calculated Means", ylab="Relative Probability", freq=F)
message(round(sum(sampleMeans > expectedValue) * 100 / M, 2),
 "% of sample means exceed 'true' value.")
message("Further, ", round(sum(MCSeries > expectedValue) * 100 / M / n, 2),
 "% of randomly generated variates exceed the 'true' mean.")

sampleSDs <- apply(MCSeries, 1, sd)
message("The sample standard deviations (< 20) have a distribution as shown in the histogram.")
message("A further ", length(sampleSDs[sampleSDs > 20]),
 " samples had sd's exceeding 20, to a maximum of ", round(max(sampleSDs), 2), ".")
message("The average deduced value for the sd is ", round(mean(sampleSDs), 2), ".")
dev.new()
hist(sampleSDs[sampleSDs <= 20], 40, main="Probability of Calculating Each (Wrong) SD",
 xlab="Calculated SDs", ylab="Relative Probability", freq=F)

pareto4ShortFall <- shortFallPareto4(tail=tail, increment=0.001, mu=mu, sigma=sigma,
 gamma=gamma, alpha=alpha, right=right)

shortFallsE <- apply(MCSeries, 1, shortFallEmpirical, tail=tail)

shortFallExcess <- sum(shortFallsE > pareto4ShortFall)
message("Expected shortfall at ", 100 * tail, "% level is ", round(pareto4ShortFall, 2), ".")
message("The apparent shortfall of the samples is less severe than the 'true' shortfall ",
 round(100 * shortFallExcess / M, 2), "% of time.")

The bottom line? I was assuming more than necessary. Taleb is not implying that the “hapless analysts”, as I called the subjects of our hypothetical experiment, are assuming a Gaussian distribution; they are simply calculating the characteristics of the samples they are given!

EDIT 04-05-2016: Updated Silent Risk C3.R script to reflect my improved understanding of Section 3.2.
