This project contains basic runnable tools that help with common tasks in a Spark-based project.
The main tools available:
- FormatConverter: converts any supported file format into a different file format, with partitioning support.
- SimpleSqlProcessor: applies a given SQL query to the input files, which are mapped into tables.
- StreamingFormatConverter: converts any supported data stream format into a different data stream format, with partitioning support.
- SimpleFileStreamingSqlProcessor: applies a given SQL query to the input file streams, which are mapped into file output streams.
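To make the idea behind the file-based tools listed above more concrete, here is a minimal sketch that uses only the plain Apache Spark API. It is not the implementation of `FormatConverter` or `SimpleSqlProcessor`, and it does not use their configuration. The object name `ToolsSketch`, the sample data, the `country` and `amount` columns and the output file names are all assumptions made for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical illustration; not part of spark-tools.
object ToolsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tools-sketch")
      .master("local[*]")
      .getOrCreate()

    // Write a tiny sample CSV file (assumed data) into a temporary directory.
    val workDir = Files.createTempDirectory("tools-sketch")
    val input   = workDir.resolve("events.csv")
    Files.write(input, "country,amount\nNL,10\nRO,20\nNL,5\n".getBytes("UTF-8"))

    // Read the input in one format...
    val events = spark.read
      .format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load(input.toString)

    // ...and write it in another format, partitioned by a column.
    events.write
      .format("parquet")
      .partitionBy("country")
      .mode("overwrite")
      .save(workDir.resolve("events.parquet").toString)

    // Map the input to a table name, run a SQL query on it, and save the result.
    events.createOrReplaceTempView("events")
    val totals = spark.sql(
      "SELECT country, SUM(amount) AS total FROM events GROUP BY country ORDER BY country")
    totals.write
      .format("json")
      .mode("overwrite")
      .save(workDir.resolve("totals.json").toString)

    totals.show()
    spark.stop()
  }
}
```

The tools in this project cover the same kind of flow as runnable applications, so the formats, paths and SQL do not have to be written in code like this.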
This project also aims to create and encourage a friendly yet professional environment for developers to help each other, so please do not be shy and join through Gitter, Twitter, issue reports or pull requests.
- Java 8 or higher
- Scala 2.11 or 2.12
- Apache Spark 2.4.X
Getting Spark Tools
where the latest artifacts can be found.
- Group id / organization:
- Artifact id / name:
- Latest version is
Usage with SBT: add a dependency on the latest version of the tools to your sbt build definition file:
libraryDependencies += "org.tupol" %% "spark-tools" % "0.4.1"
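For context, a fuller `build.sbt` fragment might look like the sketch below. The `scalaVersion` value and the exact Spark patch version are assumptions chosen to match the prerequisites (Scala 2.11 or 2.12, Apache Spark 2.4.X); adjust them to your environment. Marking Spark as `Provided` is a common choice when the application runs through `spark-submit`, but it is not a requirement of this project.

```scala
// build.sbt (illustrative fragment; versions other than spark-tools are assumptions)
scalaVersion := "2.12.10"

libraryDependencies ++= Seq(
  // Spark is usually supplied by the cluster at runtime
  "org.apache.spark" %% "spark-sql"   % "2.4.5" % Provided,
  // The tools from this project
  "org.tupol"        %% "spark-tools" % "0.4.1"
)
```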
Include this package in your Spark Applications using
with Scala 2.11
$SPARK_HOME/bin/spark-shell --packages org.tupol:spark-tools_2.11:0.4.1
or with Scala 2.12
$SPARK_HOME/bin/spark-shell --packages org.tupol:spark-tools_2.12:0.4.1
- The project compiles with both Scala
- Updated Apache Spark to
- Updated the
- Removed the com.databricks:spark-avro dependency, as Avro support is now built into Apache Spark
- Updated the spark-utils dependency to the latest available snapshot
For previous versions please consult the release notes.
This code is open source software licensed under the MIT License.