• Stars: 325
• Rank: 124,530 (Top 3 %)
• Language: Kotlin
• License: Apache License 2.0
• Created over 10 years ago
• Updated over 2 years ago

Repository Details

Kotlin Bigdata Toolkit

Centurion

Introduction

Centurion is a JVM toolkit, written in Kotlin, for columnar and streaming formats.

This library allows you to read, write and convert between the following formats: Avro, Parquet, Orc and Arrow.

Readers and writers are compatible with data generated by Apache Spark and do not require you to start a cluster to perform I/O operations.

Schema Conversions

Centurion allows easy conversion of schemas between any of the supported formats, via Centurion's own internal format.

This internal format is a superset of the functionality of all the supported formats. It is intended only as an intermediate representation for conversions, not as a format for storing data.
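The idea of converting through a shared intermediate schema can be sketched as follows. This is a minimal, self-contained illustration and not Centurion's actual API: `InternalType`, `InternalSchema`, `SchemaConverter`, `NamedTypeConverter` and `convert` are all hypothetical names, and a "schema" here is just a list of field name and type name pairs. The point it shows is that each format needs only one converter pair (to and from the intermediate model) to become convertible with every other format.

```kotlin
// Hypothetical names throughout; this is NOT Centurion's real API.

// A tiny stand-in for the intermediate schema model.
enum class InternalType { STRING, INT64, BOOLEAN }

data class InternalField(val name: String, val type: InternalType)
data class InternalSchema(val fields: List<InternalField>)

// Each format supplies one converter pair to and from the intermediate model.
interface SchemaConverter<S> {
    fun toInternal(schema: S): InternalSchema
    fun fromInternal(schema: InternalSchema): S
}

// Converting between any two formats always goes through the intermediate model.
fun <A, B> convert(source: A, from: SchemaConverter<A>, to: SchemaConverter<B>): B =
    to.fromInternal(from.toInternal(source))

// Toy converter: a schema is a list of (field name, type name) pairs,
// and each format differs only in the names it uses for the types.
class NamedTypeConverter(private val names: Map<InternalType, String>) :
    SchemaConverter<List<Pair<String, String>>> {

    // Reverse lookup from a format's type name back to the internal type.
    private val byName = names.entries.associate { (type, name) -> name to type }

    override fun toInternal(schema: List<Pair<String, String>>) =
        InternalSchema(schema.map { (field, typeName) ->
            InternalField(field, byName[typeName] ?: error("Unsupported type: $typeName"))
        })

    override fun fromInternal(schema: InternalSchema) =
        schema.fields.map { it.name to (names[it.type] ?: error("Unsupported type: ${it.type}")) }
}

// Type names chosen to resemble Avro and Arrow naming; the converters are toys.
val avroLike = NamedTypeConverter(
    mapOf(InternalType.STRING to "String", InternalType.INT64 to "Long", InternalType.BOOLEAN to "Boolean")
)
val arrowLike = NamedTypeConverter(
    mapOf(InternalType.STRING to "Utf8", InternalType.INT64 to "Int64 Signed", InternalType.BOOLEAN to "Bool")
)

fun main() {
    val avroSchema = listOf("id" to "Long", "name" to "String", "active" to "Boolean")
    println(convert(avroSchema, avroLike, arrowLike))
    // prints [(id, Int64 Signed), (name, Utf8), (active, Bool)]
}
```

With this shape, supporting a new format means writing one more `SchemaConverter` rather than one converter per existing format.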

The following table shows how types map between each of the formats.

| Centurion Type  | Avro                                   | Parquet                   | Orc         | Arrow               |
|-----------------|----------------------------------------|---------------------------|-------------|---------------------|
| Strings         | String                                 | Binary (String)           | String      | Utf8                |
| UUID            | String (UUID)                          | Binary (String)           | String      | Utf8                |
| Booleans        | Boolean                                | Boolean                   | Boolean     | Bool                |
| Int64           | Long                                   | Int64                     | Long        | Int64 Signed        |
| Int32           | Int                                    | Int32                     | Int         | Int32 Signed        |
| Int16           | N/A (Int)                              | Int32 (Signed Int16)      | Short       | Int16 Signed        |
| Int8            | N/A (Int)                              | Int32 (Signed Int8)       | Byte        | Int8 Signed         |
| Float64         | Double                                 | Double                    | Double      | FloatingPointDouble |
| Float32         | Float                                  | Float                     | Float       | FloatingPointSingle |
| Enum            | Enum                                   | Enum                      | String      | String              |
| Decimal         | Binary / Fixed with annotation Decimal | Decimal(precision, scale) | Decimal     | Decimal             |
| Varchar         | Fixed                                  | N/A (String)              | Varchar     | N/A (String)        |
| TimestampMillis | Long (TimestampMillis)                 | Int64 (Timestamp)         | Timestamp   | Timestamp (Millis)  |
| TimestampMicros | Long (TimestampMicros)                 | Int64 (Timestamp)         | Unsupported | Timestamp (Micros)  |
| Map             | Map                                    | Map                       | Map         | Map                 |
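
One practical consequence of the table is that some conversions are lossy. The sketch below transcribes only the integer rows for Avro and Orc as plain data and reports which Centurion types end up sharing a target type. It assumes that "N/A (Int)" means the format has no native type of that width and falls back to Int; that reading, along with the names `Format`, `integerRows` and `collapsedTypes`, is an assumption for illustration and not part of Centurion.

```kotlin
// Illustrative only: a few rows of the table above, transcribed as data.
// All names here are hypothetical, not Centurion's API.
enum class Format { AVRO, ORC }

// Centurion type name -> target type name per format.
// "N/A (Int)" in the table is recorded as its assumed fallback type, "Int".
val integerRows: Map<String, Map<Format, String>> = mapOf(
    "Int64" to mapOf(Format.AVRO to "Long", Format.ORC to "Long"),
    "Int32" to mapOf(Format.AVRO to "Int", Format.ORC to "Int"),
    "Int16" to mapOf(Format.AVRO to "Int", Format.ORC to "Short"),
    "Int8" to mapOf(Format.AVRO to "Int", Format.ORC to "Byte")
)

// Groups Centurion types by their target type in the given format and keeps
// only the groups where more than one source type maps to the same target.
fun collapsedTypes(format: Format): Map<String, List<String>> =
    integerRows.entries
        .groupBy({ it.value.getValue(format) }, { it.key })
        .filterValues { it.size > 1 }

fun main() {
    println(collapsedTypes(Format.AVRO)) // prints {Int=[Int32, Int16, Int8]}
    println(collapsedTypes(Format.ORC))  // prints {}
}
```

Under these assumptions, a schema written to Avro and read back cannot by itself distinguish an original Int8 or Int16 from an Int32, whereas Orc keeps the three widths distinct.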