Class ArrowConverter

A set of helper functions for converting between the Maven DataFrame and Apache Arrow.

Arrow is very useful as a data shuttling mechanism (compact, efficient, relatively generic, and fast), but it doesn't support hierarchical row grouping and has no MavenWorks framework support.

Hierarchy

  • ArrowConverter

Methods

Static Private createArrowVectors

  • createArrowVectors(dataColumns: Array<Array<unknown>>, schema: Schema): RecordBatch
  • Create a RecordBatch given a list of columnar data and schema

    static

    Parameters

    • dataColumns: Array<Array<unknown>>

      The outgoing data in columnar form

    • schema: Schema

      The schema to use when transforming to Arrow

    Returns RecordBatch

Static Private createColumnArrays

  • createColumnArrays(table: Table, schema: Schema): Array<Array<unknown>>
  • Turn a table into a set of columnar TypedArrays or regular Arrays

    static

    Parameters

    • table: Table

      The outgoing MavenTable

    • schema: Schema

      The schema to use.

    Returns Array<Array<unknown>>
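
    The row-to-column pivot described above can be illustrated with a minimal sketch. It does not use the real Table or Schema classes: `Row`, `Column`, and `toColumns` are hypothetical names, and the rule that only all-number columns become a `Float64Array` is an assumption made for illustration, not the documented behavior of createColumnArrays.

    ```typescript
    // Hypothetical stand-in for a flat table: an array of rows, each row an array of cells.
    type Row = unknown[];
    type Column = Float64Array | unknown[];

    function toColumns(rows: Row[], columnCount: number): Column[] {
        const columns: Column[] = [];
        for (let c = 0; c < columnCount; c++) {
            // Collect the c-th cell of every row into one column.
            const values = rows.map(row => row[c]);
            // Assumption: only columns made up entirely of numbers get a TypedArray.
            const allNumbers = values.length > 0 && values.every(v => typeof v === "number");
            columns.push(allNumbers ? new Float64Array(values as number[]) : values);
        }
        return columns;
    }

    // Two columns: one numeric, one mixed (strings and null).
    const columns = toColumns([[1, "a"], [2, "b"], [3, null]], 2);
    ```

    In this sketch the first column ends up as a `Float64Array` and the second stays a regular Array, because it holds values that a numeric TypedArray cannot represent.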

Static fromArrow

  • fromArrow(arrowTable: ArrowTable): Table

Static Private inferSchema

  • inferSchema(table: Table): Schema
  • Infer an Arrow schema from a MavenTable, and attach metadata to it.

    Not all types have clean Arrow equivalents; those types should be serialized to strings.

    static

    Parameters

    • table: Table

      The table to infer the schema from

    Returns Schema

    An Arrow table schema
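
    As a rough illustration of inferring a schema with a string fallback, the sketch below maps one sample value per column to a type name and records in metadata whether the column needs serialization. `FieldSketch`, `inferField`, `inferFields`, the three type names, and the `serialized` metadata key are all hypothetical; the real Arrow Schema and Field classes are not used, and the actual metadata attached by inferSchema is not documented here.

    ```typescript
    // Hypothetical field description; not the Arrow Field class.
    interface FieldSketch {
        name: string;
        arrowType: "Float64" | "Bool" | "Utf8";
        // Assumed metadata key recording whether values must be serialized to strings.
        metadata: { serialized: "true" | "false" };
    }

    function inferField(name: string, sample: unknown): FieldSketch {
        switch (typeof sample) {
            case "number":
                return { name, arrowType: "Float64", metadata: { serialized: "false" } };
            case "boolean":
                return { name, arrowType: "Bool", metadata: { serialized: "false" } };
            case "string":
                return { name, arrowType: "Utf8", metadata: { serialized: "false" } };
            default:
                // No clean equivalent in this sketch: store as a string and flag it.
                return { name, arrowType: "Utf8", metadata: { serialized: "true" } };
        }
    }

    // Infer one field per column from a single sample row.
    function inferFields(names: string[], sampleRow: unknown[]): FieldSketch[] {
        return names.map((name, i) => inferField(name, sampleRow[i]));
    }

    const fields = inferFields(["price", "active", "tags"], [9.5, true, ["a", "b"]]);
    ```

    Here the array-valued `tags` column has no direct mapping in the sketch, so it falls back to `Utf8` with the serialization flag set, while `price` and `active` map directly.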

Static toArrow

  • toArrow(table: Table): ArrowTable
  • Generate an Arrow table, given a flat Maven Table.

    Using Arrow has certain performance and space benefits that Tables cannot normally take advantage of.

    This function does not guarantee complete correctness, and its output may not round-trip back to an identical table. It also supports only flat tables (for now).

    Remarks

    This method makes a best effort to translate between Maven types and Arrow types; however, the process is inexact. Further, each column that is turned into Utf8 pays a performance penalty on deserialization.

    For best results, make sure the table has type annotations. Arrow has limited support for heterogeneous types, so this function makes no attempt to account for them. If type annotations are not already present, the type of the first row is used to infer the column types.

    Types that Arrow cannot handle will be serialized as Utf8 JSON.

    static

    Parameters

    • table: Table

      The Maven table to translate

    Returns ArrowTable

    An Apache Arrow data table
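
    The round-trip caveat for columns serialized as Utf8 JSON can be seen with plain `JSON.stringify` and `JSON.parse`. This is a sketch of the general effect only: `serializeColumn` and `deserializeColumn` are hypothetical helpers, and the exact serialization toArrow performs may differ.

    ```typescript
    // Serialize every cell of a column to a JSON string, as a Utf8 fallback column might hold.
    function serializeColumn(values: unknown[]): string[] {
        return values.map(v => JSON.stringify(v));
    }

    // Deserialization has to parse every cell, which is where the extra cost comes from.
    function deserializeColumn(cells: string[]): unknown[] {
        return cells.map(cell => JSON.parse(cell));
    }

    const original: unknown[] = [new Date(0), { a: 1 }];
    const restored = deserializeColumn(serializeColumn(original));

    // The plain object survives structurally, but the Date comes back as an ISO string.
    const dateSurvived = restored[0] instanceof Date; // false
    const restoredDate = restored[0];                 // "1970-01-01T00:00:00.000Z"
    ```

    Without type annotations or metadata to guide the reverse conversion, a value such as a Date returns as a string, which is one way a table can fail to round-trip as the same table.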
