I have read most everything Nassim Nicholas Taleb has published. He is working on a new, more technical exposition to go along with his latest “Antifragile”. The draft of the book is called “Silent Risk” and is available on Taleb’s website: Fooled By Randomness. I have been (slowly) working my way through it. I had an idea to develop a set of R-scripts that can be used to play with the ideas he discusses. As a first step I present here some utilities (useful functions) that I have put together concerning Generalized Pareto Distributions which I wrote to go along with Chapter 3, section 3.2 “Problems and Inverse Problems”. I will add to them as I go. In the mean time, I will briefly discuss each, present the scripts and show what I have done to verify that they work as intended.
In each case the distribution is based Wikipedia’s entry on Generalized Pareto Distribution (Type 4). The Type 4 distribution is the most general of the four presented. The other types are each special cases of the Type 4. Note this is on the “Pareto Distribution” page, not the “Generalized Pareto Distribution” page. The functions are as follows:
- pPareto4() – returns the CDF for the values submitted.
- dPareto4() – returns the PDF for the values submitted.
- qPareto4() – returns the percentiles associated with the probabilities submitted.
- rPareto4() – returns a set of random numbers with a Pareto Distribution.
Since most of the examples I have looked at in “Silent Risk” concern Pareto Distributions with the tail out to the left, and the Wikipedia material uses a distribution with the tail to the right, nearly all the functions have a “right” parameter which defaults to “FALSE” in keeping with Taleb’s usage, but the user can set to “TRUE”.
Cumulative Distribution Function
The right-tailed version:
The left-tailed version:
Where: The shape parameter α is the tail index, μ is location, σ is scale, γ is an inequality parameter. The default values match the examples Taleb uses in Section 3.2, with the exception that his text indicates μ = 1 and σ = 2, but the figures are consistent with μ = 0 and σ = 1. In addition, Taleb uses k for scale instead of σ.
# cdf for Type 4. Returns p(X <= x) pPareto4 <- function(x=seq(-10, 10, 0.001), mu=0, sigma=1, gamma=0.75, alpha=1.5, right=F) { if(right){ c(0 * x[x <= mu], 1 - (1 + ((x[x > mu] - mu) / sigma)^(1/gamma))^(-alpha)) }else{ c((1 + ((-x[x < mu] + mu) / sigma)^(1/gamma))^(-alpha), rep(1, length(x[x >= mu]))) } }
Probability Density Function
The right-tailed version:
The left-tailed version:
Where: The shape parameter α is the tail index, μ is location, σ is scale, γ is an inequality parameter. The default values match the examples Taleb uses in Section 3.2.
# pdf for Type 4. dPareto4 <- function(x=seq(-10, 10, 0.001), mu=0, sigma=1, gamma=0.75, alpha=1.5, right=F){ if(right){ c(0 * x[x <= mu], alpha * (sigma^(-1 / gamma))*(((-mu + x[x > mu])^(1 / gamma - 1))* (((sigma / (-mu + x[x > mu]))^(-1 / gamma)) + 1)^(-alpha - 1)) / gamma) }else{ c(alpha * (sigma^(-1 / gamma))*(((mu - x[x < mu])^(1 / gamma - 1))* (((sigma / (mu - x[x < mu]))^(-1 / gamma)) + 1)^(-alpha - 1)) / gamma, 0 * x[x >= mu]) } }
Percentile Function
The right-tailed version:
The left-tailed version:
Where: The shape parameter α is the tail index, μ is location, σ is scale, γ is an inequality parameter. The default values match the examples Taleb uses in Section 3.2.
# Calculates the percentile for Type 4 qPareto4 <- function(p=seq(0, 1, 0.01), mu=0, sigma=1, gamma=0.75, alpha=1.5, right=F){ if (right){ mu + sigma * ((1 - p)^(-1 / alpha) - 1)^gamma }else{ mu - sigma * (p^(-1 / alpha) - 1)^gamma } }
Random Pareto Function
According to Wikipedia, random variates with a Pareto Type 4 Distribution can be generated using random variates from 2 gamma distributions, one with both shape and scale parameters set to 1, the other with shape equal to α, and scale = 1. The right-tailed version:
The left-tailed version:
Where: The shape parameter α is the tail index, μ is location, σ is scale, γ is an inequality parameter. The default values match the examples Taleb uses in Section 3.2. R has a built-in rgamma(n, shape, rate) function for generating n random variates with gamma distribution.
# Generates n random numbers with Type 4 distribution. rPareto4 <- function(n=1000, mu=0, sigma=1, gamma=0.75, alpha=1.5, right=F){ if(right){ mu + sigma * (rgamma(n, 1, 1) / rgamma(n, alpha, 1)) ^ gamma }else{ mu - sigma * (rgamma(n, 1, 1) / rgamma(n, alpha, 1)) ^ gamma } }
At this point it is wise to verify that the random number generator matches our expectations for the distribution. The following script accomplishes that by generating 10,000 random variates and deriving their “empirical” CDF and PDF. The results are plotted on top of the values derived from the definitions of the CDF and PDF of the Pareto Distribution.
rm(list=ls()) source("Silent Risk Utilities.R") # To demonstrate that rPareto4 is consistent with pPareto4 sampSize <- 10000 sample <- rPareto4(sampSize) increments <- 0.001 xRange <- seq(-6, 1, increments) genPSeries <- vector("numeric", length(xRange)) genPSeries[1] <- sum(sample <= xRange[1]) for (i in 2:length(xRange)){ genPSeries[i] <- genPSeries[i - 1] + sum(sample > xRange[i - 1] & sample <= xRange[i]) } genPSeries <- genPSeries / sampSize dev.new() plot(xRange, genPSeries, col="red", main="CDF vs rPareto4", xlab="X Values", ylab="CDF") lines(xRange, pPareto4(x=xRange)) # To demonstrate that rPareto4 is consistent with dPareto4 genDSeries <- vector("numeric", 100) newXrange <- seq(min(xRange), max(xRange), length=101) for (i in 1:100){ genDSeries[i] <- sum(sample > newXrange[i] & sample <= newXrange[i + 1]) } genDSeries <- genDSeries * 100 / (max(xRange) - min(xRange)) / sampSize newXrange <- (newXrange[-101] + newXrange[-1]) / 2 dev.new() plot(newXrange, genDSeries, col="red", main="PDF vs rPareto4", xlab="X Values", ylab="PDF") lines(xRange, dPareto4(x=xRange))
In order to dampen the noise in the pdf, the range of x is divided up into 100 segments, and all the appearances of a variate within each range are counted – so the points are the tops of the bars in a histogram. The results are as follows:
So it looks as though the generator is working as it should. The next post will make use of these functions and provide some functions more specific to “Silent Risk” Chapter 3, to illustrate the problems of not knowing much about the sample data one is working with, and how that might lead to nasty surprises!
Share
If you found this post informative, please share it!