Angara.Statistics


Distributions

A Distribution is a value which allows to draw a pseudo-random number or compute a probability density function log_pdf. The Distribution discriminated union supports most common probability distributions like Uniform or Normal. More details are in a separate document page.

For example, here is how you define normal distribution once you've referenced Angara.Statistics.dll:

1: 
2: 
3: 
open Angara.Statistics

let distribution = Normal(37.0, 9.0)

Random number generator

To draw random numbers we use Mersenne Twister, one of the most commonly used random number generator.

1: 
2: 
3: 
4: 
5: 
let generator = MT19937()
let random_values =
    generator.uniform_uint32(),  // low-level output
    generator.uniform_float64(), // a floating point number from [0,1)
    generator.normal()           // standard normal distribution with mean=0 and stdev=1
(3499211612u, 0.1354770041, -0.5456602766)

To reproduce the generator state use either exact seed or copy-constructor:

1: 
2: 
3: 
4: 
5: 
6: 
let seed = generator.get_seed()
let g' = MT19937(seed)
let g'' = MT19937(generator)
// the three instances should now produce
// exactly the same sequence of pseudo-random numbers
let test_3 = generator.normal(), g'.normal(), g''.normal()
(-2.165115627, -2.165115627, -2.165115627)

To draw a number from a non-standard distribution use draw : MT19337 -> Distribution -> float. For example, here we generate an array of 1000 random numbers from the above distribution.

1: 
let samples = [| for _ in 1..1000 -> draw generator distribution |]

Probability density function

One single function log_pdf : Distribution -> float -> float computes logarithm of normalized probability density of any distribution at any valid point.

If you do not have a Distribution but have a sample from the distribution used kernel density estimator kde : int -> float seq -> (float[] * float[]). The estimator needs target number of points at which the density is to be evaluated. If this number is not a power of two he number of points will be the next larger power of two. The estimator returns two arrays of x and y values:

The following sample code compares non-parametric density of the sample with the exact density from log_pdf function.

1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
9: 
// estimate density curve in 16 points
let sampling_density_x, sampling_density_y = kde 128 samples
// compute exact probability density function
let analytic_density_y = [|for x in sampling_density_x -> exp(log_pdf distribution x)|]

open Angara.Charting
let chart = 
    Chart.ofList [Plot.line(sampling_density_x, analytic_density_y, thickness=7., stroke="lightgray")
                  Plot.line(sampling_density_x, sampling_density_y)]

Sample statistics

Use summary and qsummary functions to quickly compute mean, standard deviation, 95% and 68% quantiles of a sample:

1: 
printfn "%A" (summary samples)
{count = 1000;
 min = 9.02008448;
 max = 67.19459218;
 mean = 37.14566509;
 variance = 80.56673735;}
1: 
printfn "%A" (qsummary samples)
{min = 9.02008448;
 lb95 = 19.81095462;
 lb68 = 28.21379981;
 median = 37.17099114;
 ub68 = 45.98669919;
 ub95 = 55.18024092;
 max = 67.19459218;}

The lb95 in the qsummary record stands for "lower bound of 95% interval" with is actually a 2.5% quantile. Similarly ub95 stands for 97.5% quantile.

namespace Angara
module Statistics

from Angara
val distribution : Distribution

Full name: Tutorial.distribution
union case Distribution.Normal: float * float -> Distribution
val generator : MT19937

Full name: Tutorial.generator
Multiple items
type MT19937 =
  new : copy:MT19937 -> MT19937
  new : seed:uint32 [] -> MT19937
  new : ?seed:uint32 -> MT19937
  private new : mt:uint32 [] * idx:int -> MT19937
  member bernoulli : p:float -> bool
  member private getIdx : int
  member private getMt : uint32 []
  member get_seed : unit -> uint32 []
  member normal : unit -> float
  member uniform_float64 : unit -> float
  ...

Full name: Angara.Statistics.MT19937

--------------------
new : ?seed:uint32 -> MT19937
new : seed:uint32 [] -> MT19937
new : copy:MT19937 -> MT19937
val random_values : uint32 * float * float

Full name: Tutorial.random_values
member MT19937.uniform_uint32 : unit -> uint32
member MT19937.uniform_float64 : unit -> float
member MT19937.normal : unit -> float
val seed : uint32 []

Full name: Tutorial.seed
member MT19937.get_seed : unit -> uint32 []
val g' : MT19937

Full name: Tutorial.g'
val g'' : MT19937

Full name: Tutorial.g''
val test_3 : float * float * float

Full name: Tutorial.test_3
val samples : float []

Full name: Tutorial.samples
val draw : gen:MT19937 -> d:Distribution -> float

Full name: Angara.Statistics.draw
val sampling_density_x : float []

Full name: Tutorial.sampling_density_x
val sampling_density_y : float []

Full name: Tutorial.sampling_density_y
val kde : n0:int -> xs:seq<float> -> float [] * float []

Full name: Angara.Statistics.kde
val analytic_density_y : float []

Full name: Tutorial.analytic_density_y
val x : float
val exp : value:'T -> 'T (requires member Exp)

Full name: Microsoft.FSharp.Core.Operators.exp
val log_pdf : d:Distribution -> v:float -> float

Full name: Angara.Statistics.log_pdf
namespace Angara.Charting
val chart : Chart

Full name: Tutorial.chart
type Chart =
  {Plots: PlotInfo list;}
  static member ofList : plots:PlotInfo list -> Chart

Full name: Angara.Charting.Chart
static member Chart.ofList : plots:PlotInfo list -> Chart
type Plot =
  private new : unit -> Plot
  static member band : seriesX:float [] * seriesY1:float [] * seriesY2:float [] * ?fill:string * ?displayName:string * ?titles:BandTitles -> PlotInfo
  static member heatmap : x:float [] * y:float [] * values:float [] * ?colorPalette:string * ?treatAs:HeatmapTreatAs * ?displayName:string * ?titles:HeatmapTitles -> PlotInfo
  static member heatmap : x:float [] * y:float [] * values:HeatmapValues * ?colorPalette:string * ?treatAs:HeatmapTreatAs * ?displayName:string * ?titles:HeatmapTitles -> PlotInfo
  static member line : seriesY:float [] * ?stroke:string * ?thickness:float * ?treatAs:LineTreatAs * ?fill68:string * ?fill95:string * ?displayName:string * ?titles:LineTitles -> PlotInfo
  static member line : seriesX:float [] * seriesY:float [] * ?stroke:string * ?thickness:float * ?treatAs:LineTreatAs * ?fill68:string * ?fill95:string * ?displayName:string * ?titles:LineTitles -> PlotInfo
  static member line : x:LineX * y:LineY * ?stroke:string * ?thickness:float * ?treatAs:LineTreatAs * ?fill68:string * ?fill95:string * ?displayName:string * ?titles:LineTitles -> PlotInfo
  static member markers : seriesX:float [] * seriesY:float [] * ?color:MarkersColor * ?colorPalette:string * ?size:MarkersSize * ?sizeRange:MarkersSizeRange * ?shape:MarkersShape * ?borderColor:string * ?displayName:string * ?titles:MarkersTitles -> PlotInfo
  static member markers : x:MarkersX * y:MarkersY * ?color:MarkersColor * ?colorPalette:string * ?size:MarkersSize * ?sizeRange:MarkersSizeRange * ?shape:MarkersShape * ?borderColor:string * ?displayName:string * ?titles:MarkersTitles -> PlotInfo

Full name: Angara.Charting.Plot
static member Plot.line : seriesY:float [] * ?stroke:string * ?thickness:float * ?treatAs:LineTreatAs * ?fill68:string * ?fill95:string * ?displayName:string * ?titles:LineTitles -> PlotInfo
static member Plot.line : seriesX:float [] * seriesY:float [] * ?stroke:string * ?thickness:float * ?treatAs:LineTreatAs * ?fill68:string * ?fill95:string * ?displayName:string * ?titles:LineTitles -> PlotInfo
static member Plot.line : x:LineX * y:LineY * ?stroke:string * ?thickness:float * ?treatAs:LineTreatAs * ?fill68:string * ?fill95:string * ?displayName:string * ?titles:LineTitles -> PlotInfo
val printfn : format:Printf.TextWriterFormat<'T> -> 'T

Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.printfn
val summary : data:seq<float> -> summaryType

Full name: Angara.Statistics.summary
val qsummary : data:seq<float> -> qsummaryType

Full name: Angara.Statistics.qsummary
Fork me on GitHub