A table is an immutable collection of named columns.
Column values are represented as lazy one-dimensional immutable array of one of the supported types.
Heights of all columns in a table are equal.
Columns names are arbitrary strings;
duplicate names are allowed but may cause ambiguity in some API functions.
Here we describe how the column and the column-based table are represented.
See also Table as Collection of Rows to work with table rows.
A column is represented as an immutable type Angara.Data.Column
which
keeps column name, height and values. Column name cannot be a null string.
1:
2:
3:
4:
5:
6:
7:
|
type Column =
/// Gets a name of the column.
member Name : string with get
/// Gets a number of rows in the column.
member Height : int with get
/// Returns column values.
member Rows : ColumnValues with get
|
Column values are represented as an instance of discriminated union Angara.Data.ColumnValues
:
1:
2:
3:
4:
5:
6:
|
type ColumnValues =
| IntColumn of Lazy<ImmutableArray<int>>
| RealColumn of Lazy<ImmutableArray<float>>
| StringColumn of Lazy<ImmutableArray<string>>
| DateColumn of Lazy<ImmutableArray<DateTime>>
| BooleanColumn of Lazy<ImmutableArray<Boolean>>
|
The ImmutableArray<'a>
structure represents an array that cannot be changed once it is created. Use of
Lazy<'a> enables evaluation of
the column array on demand.
To create a column, use overloaded static methods Column.Create
and Column.CreateLazy
.
Let we have a sequence xs
and want to create a corresponding column named "x"
:
1:
|
let xs = seq{ for i in 0..99 -> float(i) / 10.0 }
|
-
Use
Column.Create
to build a column from a string name and a sequence of values.
If the given sequence is a mutable array, it is copied to guarantee immutability of the column;
if the sequence is an immutable array, it is used as is without copying;
otherwise, if none of above, an immutable array is built from the sequence.
1:
|
let cx = Column.Create ("x", xs)
|
-
If you provide an optional argument
count
to the Column.Create
, the given sequence will be enumerated
only when the column values are first time accessed. The given count
is the number of elements in the sequence.
If the real sequence length will be different than specified, a runtime exception will occur when values are requested.
1:
|
let lazyCx = Column.Create ("x", xs, 100)
|
-
Another way to create a lazy column is to use
Column.CreateLazy
which takes a Lazy
instance producing an immutable array, and the number of elements.
Evalutation of the array will be performed when the column values are first time accessed.
1:
|
let lazyCx' = Column.CreateLazy ("x", lazy(ImmutableArray.CreateRange xs), 100)
|
- To build a column with same values but different name, call
Colum.Create
and pass an instance of the ColumnValues
.
1:
|
let cx2 = Column.Create ("x2", cx.Rows, cx.Height)
|
Generic tools usually do not expect a column to have a certain type, but must handle all possible types.
In this case, use match
by value of the Column.Rows
property to get column values.
The following example prints values of the column:
1:
2:
3:
4:
5:
6:
|
match cx.Rows with
| RealColumn v -> printf "floats: %A" v.Value
| IntColumn v -> printf "ints: %A" v.Value
| StringColumn v -> printf "strings: %A" v.Value
| DateColumn v -> printf "dates: %A" v.Value
| BooleanColumn v -> printf "bools: %A" v.Value
|
floats: seq [0.0; 0.1; 0.2; 0.3; ...]
|
When a column is expected to be of a certain type, use one of the functions ColumnValues.AsReal
,
ColumnValues.AsInt
, ColumnValues.AsString
, ColumnValues.AsDate
, ColumnValues.AsBoolean
which evaluate the column array (if it is not evaluated yet) and return the ImmutableArray<'a>
instance,
assuming that the column type corresponds the function;
otherwise, if the column type is incorrect, the function fails.
1:
|
let x : ImmutableArray<float> = cx.Rows.AsReal
|
seq [0.0; 0.1; 0.2; 0.3; ...]
|
Also, the type ColumnValues
allows getting an individual data value by an index; again,
there is a generic approach based on match
and a succinct approach when a certain type is expected.
The following example returns a median of the ordered column cx
when type is unknown:
1:
2:
3:
4:
5:
6:
|
match cx.Rows.[cx.Height / 2] with
| RealValue v -> printf "float: %f" v
| IntValue v -> printf "int: %d" v
| StringValue v -> printf "string: %s" v
| DateValue v -> printf "date: %A" v
| BooleanValue v -> printf "bool: %A" v
|
The next example assumes that the column is real:
1:
|
printf "float: %f" (cx.Rows.[cx.Height / 2].AsReal)
|
The type Angara.Data.Table
represents an immutable table.
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
|
type Table =
interface IEnumerable<Column>
/// Gets a count of the total number of columns in the table.
member Count : int with get
/// Gets a count of the total number of rows in the table.
member RowsCount : int with get
/// Gets a column by its index.
member Item : index:int -> Column with get
/// Gets a column by its name.
/// If there are several columns with same name, returns the fist column having the name.
member Item : name:string -> Column with get
/// Tries to get a column by its index.
member TryItem : index:int -> Column option with get
/// Tries to get a column by its name.
/// If there are several columns with same name, returns the fist column having the name.
member TryItem : name:string -> Column option with get
|
The Table.Empty
property returns an empty table, i.e. a table that has no columns.
A table can be created from a finite sequence of columns:
1:
2:
3:
4:
|
let table =
Table.OfColumns
[ Column.Create ("x", xs)
Column.Create ("sin(x)", xs |> Seq.map sin) ]
|
To add a column to a table, you can use the static function Table.Add
which creates
a new table that has all columns of the original table appended with the given column.
Duplicate names are allowed.
Normally, all columns of a table must have same height which is the table row count;
if the new table column has different height, Table.Add
fails.
In the following example the resulting table
is identical to the table
of the previous example:
1:
2:
3:
4:
|
let table =
Table.Empty
|> Table.Add (Column.Create ("x", xs))
|> Table.Add (Column.Create ("sin(x)", xs |> Seq.map sin))
|
To remove columns from a table by names, you can use Table.Remove
:
1:
|
let table2 = table |> Table.Remove ["sin(x)"]
|
The Table
implements the IEnumerable<Column>
interface and you can manipulate with a table
as a sequence of columns. For example, the following code removes all columns but first:
1:
|
let table3 = table |> Seq.take 1 |> Table.OfColumns
|
The following example prints a schema of the table without evalutation of the columns values:
1:
2:
3:
4:
5:
6:
7:
8:
9:
|
table
|> Seq.iteri (fun colIdx col ->
printfn "%d: %s of type %s" colIdx col.Name
(match col.Rows with
| RealColumn _ -> "float"
| IntColumn _ -> "int"
| StringColumn _ -> "string"
| DateColumn _ -> "DateTime"
| BooleanColumn _ -> "bool"))
|
0: x of type float
1: sin(x) of type float
|
The Table
exposes members Count
, Item
and TryItem
that allow to get a count of the total number of columns in the table
and get a column by its index or name.
In particular, the Table.Item
is an indexed property finding a column. If a column is not found,
an exception is thrown; if there are two or more columns with the given name, the first column having the name is returned.
The example gets a name of a table column with index 1:
1:
|
let col_name = table.[1].Name
|
Next, we compute an average of the column named "sin(x)", assuming that it is real:
1:
|
let sin_avg = table.["sin(x)"].Rows.AsReal |> Seq.average
|