• Stars
    star
    271
  • Rank 145,932 (Top 3 %)
  • Language
    Scala
  • License
    MIT License
  • Created over 5 years ago
  • Updated about 2 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

Parquet4S

Parquet4s is a simple I/O for Parquet. Allows you to easily read and write Parquet files in Scala.

Use just a Scala case class to define the schema of your data. No need to use Avro, Protobuf, Thrift or other data serialisation systems. You can use generic records if you don't want to use the case class, too.

Compatible with files generated with Apache Spark. However, unlike in Spark, you do not have to start a cluster to perform I/O operations.

Based on official Parquet library, Hadoop Client and Shapeless (Shapeless is not in use in a version for Scala 3).

As it is based on Hadoop Client then you can connect to any Hadoop-compatible storage like AWS S3 or Google Cloud Storage.

Integrations for Akka Streams and FS2.

Released for Scala 2.12.x, 2.13.x and 3.2.x.

Documentation

Documentation is available at here.

Contributing

Do you want to contribute? Please read the contribution guidelines.