A table is an immutable collection of named columns.
Column values are represented as lazy one-dimensional immutable array of one of the supported types.
Heights of all columns in a table are equal.
Columns names are arbitrary strings;
duplicate names are allowed but may cause ambiguity in some API functions.
Here we describe how the column and the column-based table are represented.
See also Table as Collection of Rows to work with table rows.
A column is represented as an immutable type Angara.Data.Column which
keeps column name, height and values. Column name cannot be a null string.
1:
2:
3:
4:
5:
6:
7:
|
type Column =
/// Gets a name of the column.
member Name : string with get
/// Gets a number of rows in the column.
member Height : int with get
/// Returns column values.
member Rows : ColumnValues with get
|
Column values are represented as an instance of discriminated union Angara.Data.ColumnValues:
1:
2:
3:
4:
5:
6:
|
type ColumnValues =
| IntColumn of Lazy<ImmutableArray<int>>
| RealColumn of Lazy<ImmutableArray<float>>
| StringColumn of Lazy<ImmutableArray<string>>
| DateColumn of Lazy<ImmutableArray<DateTime>>
| BooleanColumn of Lazy<ImmutableArray<Boolean>>
|
The ImmutableArray<'a>
structure represents an array that cannot be changed once it is created. Use of
Lazy<'a> enables evaluation of
the column array on demand.
To create a column, use overloaded static methods Column.Create and Column.CreateLazy.
Let we have a sequence xs and want to create a corresponding column named "x":
1:
|
let xs = seq{ for i in 0..99 -> float(i) / 10.0 }
|
-
Use
Column.Create to build a column from a string name and a sequence of values.
If the given sequence is a mutable array, it is copied to guarantee immutability of the column;
if the sequence is an immutable array, it is used as is without copying;
otherwise, if none of above, an immutable array is built from the sequence.
1:
|
let cx = Column.Create ("x", xs)
|
-
If you provide an optional argument
count to the Column.Create, the given sequence will be enumerated
only when the column values are first time accessed. The given count is the number of elements in the sequence.
If the real sequence length will be different than specified, a runtime exception will occur when values are requested.
1:
|
let lazyCx = Column.Create ("x", xs, 100)
|
-
Another way to create a lazy column is to use
Column.CreateLazy which takes a Lazy instance producing an immutable array, and the number of elements.
Evalutation of the array will be performed when the column values are first time accessed.
1:
|
let lazyCx' = Column.CreateLazy ("x", lazy(ImmutableArray.CreateRange xs), 100)
|
- To build a column with same values but different name, call
Colum.Create and pass an instance of the ColumnValues.
1:
|
let cx2 = Column.Create ("x2", cx.Rows, cx.Height)
|
Generic tools usually do not expect a column to have a certain type, but must handle all possible types.
In this case, use match by value of the Column.Rows property to get column values.
The following example prints values of the column:
1:
2:
3:
4:
5:
6:
|
match cx.Rows with
| RealColumn v -> printf "floats: %A" v.Value
| IntColumn v -> printf "ints: %A" v.Value
| StringColumn v -> printf "strings: %A" v.Value
| DateColumn v -> printf "dates: %A" v.Value
| BooleanColumn v -> printf "bools: %A" v.Value
|
floats: seq [0.0; 0.1; 0.2; 0.3; ...]
|
When a column is expected to be of a certain type, use one of the functions ColumnValues.AsReal,
ColumnValues.AsInt, ColumnValues.AsString, ColumnValues.AsDate, ColumnValues.AsBoolean
which evaluate the column array (if it is not evaluated yet) and return the ImmutableArray<'a> instance,
assuming that the column type corresponds the function;
otherwise, if the column type is incorrect, the function fails.
1:
|
let x : ImmutableArray<float> = cx.Rows.AsReal
|
seq [0.0; 0.1; 0.2; 0.3; ...]
|
Also, the type ColumnValues allows getting an individual data value by an index; again,
there is a generic approach based on match and a succinct approach when a certain type is expected.
The following example returns a median of the ordered column cx when type is unknown:
1:
2:
3:
4:
5:
6:
|
match cx.Rows.[cx.Height / 2] with
| RealValue v -> printf "float: %f" v
| IntValue v -> printf "int: %d" v
| StringValue v -> printf "string: %s" v
| DateValue v -> printf "date: %A" v
| BooleanValue v -> printf "bool: %A" v
|
The next example assumes that the column is real:
1:
|
printf "float: %f" (cx.Rows.[cx.Height / 2].AsReal)
|
The type Angara.Data.Table represents an immutable table.
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
|
type Table =
interface IEnumerable<Column>
/// Gets a count of the total number of columns in the table.
member Count : int with get
/// Gets a count of the total number of rows in the table.
member RowsCount : int with get
/// Gets a column by its index.
member Item : index:int -> Column with get
/// Gets a column by its name.
/// If there are several columns with same name, returns the fist column having the name.
member Item : name:string -> Column with get
/// Tries to get a column by its index.
member TryItem : index:int -> Column option with get
/// Tries to get a column by its name.
/// If there are several columns with same name, returns the fist column having the name.
member TryItem : name:string -> Column option with get
|
The Table.Empty property returns an empty table, i.e. a table that has no columns.
A table can be created from a finite sequence of columns:
1:
2:
3:
4:
|
let table =
Table.OfColumns
[ Column.Create ("x", xs)
Column.Create ("sin(x)", xs |> Seq.map sin) ]
|
To add a column to a table, you can use the static function Table.Add which creates
a new table that has all columns of the original table appended with the given column.
Duplicate names are allowed.
Normally, all columns of a table must have same height which is the table row count;
if the new table column has different height, Table.Add fails.
In the following example the resulting table is identical to the table of the previous example:
1:
2:
3:
4:
|
let table =
Table.Empty
|> Table.Add (Column.Create ("x", xs))
|> Table.Add (Column.Create ("sin(x)", xs |> Seq.map sin))
|
To remove columns from a table by names, you can use Table.Remove:
1:
|
let table2 = table |> Table.Remove ["sin(x)"]
|
The Table implements the IEnumerable<Column> interface and you can manipulate with a table
as a sequence of columns. For example, the following code removes all columns but first:
1:
|
let table3 = table |> Seq.take 1 |> Table.OfColumns
|
The following example prints a schema of the table without evalutation of the columns values:
1:
2:
3:
4:
5:
6:
7:
8:
9:
|
table
|> Seq.iteri (fun colIdx col ->
printfn "%d: %s of type %s" colIdx col.Name
(match col.Rows with
| RealColumn _ -> "float"
| IntColumn _ -> "int"
| StringColumn _ -> "string"
| DateColumn _ -> "DateTime"
| BooleanColumn _ -> "bool"))
|
0: x of type float
1: sin(x) of type float
|
The Table exposes members Count, Item and TryItem that allow to get a count of the total number of columns in the table
and get a column by its index or name.
In particular, the Table.Item is an indexed property finding a column. If a column is not found,
an exception is thrown; if there are two or more columns with the given name, the first column having the name is returned.
The example gets a name of a table column with index 1:
1:
|
let col_name = table.[1].Name
|
Next, we compute an average of the column named "sin(x)", assuming that it is real:
1:
|
let sin_avg = table.["sin(x)"].Rows.AsReal |> Seq.average
|