• Stars
    star
    115
  • Rank 305,916 (Top 7 %)
  • Language
    Java
  • Created almost 10 years ago
  • Updated over 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Plugin for Presto to allow addition of user functions easily

Presto User-Defined Functions(UDFs)

Plugin for Presto to allow addition of user defined functions. The plugin simplifies the process of adding user functions to Presto.

Plugging in Presto UDFs

The details about how to plug in presto UDFs can be found here.

Presto Version Compatibility

Presto Version Last Compatible Release
ver 300+ current
ver 0.193-0.2xx udfs-2.0.3
ver 0.180 udfs-2.0.2
ver 0.157 udfs-2.0.1
ver 0.142 udfs-1.0.0
ver 0.119 udfs-0.1.3

Implemented User Defined Functions

The repository contains the following UDFs implemented for Presto :

HIVE UDFs

  • DATE-TIME Functions
  1. to_utc_timestamp(timestamp, string timezone) -> timestamp
    Assumes given timestamp is in given timezone and converts to UTC (as of Hive 0.8.0). For example, to_utc_timestamp('1970-01-01 00:00:00','PST') returns 1970-01-01 08:00:00.
  2. from_utc_timestamp(timestamp, string timezone) -> timestamp
    Assumes given timestamp is UTC and converts to given timezone (as of Hive 0.8.0). For example, from_utc_timestamp('1970-01-01 08:00:00','PST') returns 1970-01-01 00:00:00.
  3. unix_timestamp() -> timestamp
    Gets current Unix timestamp in seconds.
  4. year(string date) -> int
    Returns the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970.
  5. month(string date) -> int
    Returns the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11.
  6. day(string date) -> int
    Returns the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1.
  7. hour(string date) -> int
    Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, hour('12:58:59') = 12.
  8. minute(string date) -> int
    Returns the minute of the timestamp: minute('2009-07-30 12:58:59') = 58, minute('12:58:59') = 58.
  9. second(string date) -> int
    Returns the second of the timestamp: second('2009-07-30 12:58:59') = 59, second('12:58:59') = 59.
  10. to_date(string timestamp) -> string
    Returns the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01"
  11. weekofyear(string date) -> int
    Returns the week number of a timestamp string: weekofyear("1970-11-01 00:00:00") = 44, weekofyear("1970-11-01") = 44.
  12. date_sub(string startdate, int days) -> string
    Subtracts a number of days to startdate: date_sub('2008-12-31', 1) = '2008-12-30'.
  13. date_add(string startdate, int days) -> string
    Adds a number of days to startdate: date_add('2008-12-31', 1) = '2009-01-01'.
  14. datediff(string enddate, string startdate) -> string
    Returns the number of days from startdate to enddate: datediff('2009-03-01', '2009-02-27') = 2.
  15. format_unixtimestamp(bigint unixtime[, string format]) -> string
    Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of "1970-01-01 00:00:00" unless a format string is specified. If a format string is specified the epoch time is converted in the specified format. More information about the formatter can be found here.
    NOTE : Due to name collision of presto 0.142's implementaion of from_unixtime(bigint unixtime) function, which returns the value as a timestamp type and Hive's from_unixtime(bigint unixtime[, string format]) function, which returns the value as string type and supports formatter, the hive UDF has been implemented as format_unixtimestamp(bigint unixtime[, string format]).
  16. from_duration(string duration, string duration_unit) -> double
    Converts a string representing time duration in airlift's Duration format (https://github.com/airlift/units/blob/master/src/main/java/io/airlift/units/Duration.java) to a double representing time in specified unit: from_duration('4h', 'ms') = 1.44E7.
  17. from_datasize(string datasize, string size_unit) -> double
    Converts a string representing data size in airlift's DataSize format (https://github.com/airlift/units/blob/master/src/main/java/io/airlift/units/DataSize.java) to a double representing size in specified unit: from_datasize('1GB', 'B') = 1.073741824E9.
  • MATH Functions
  1. pmod(INT a, INT b) -> INT, pmod(DOUBLE a, DOUBLE b) -> DOUBLE
    Returns the positive value of a mod b: pmod(17, -5) = -3.
  2. rands(INT seed) -> DOUBLE
    Returns a random number (that changes from row to row) that is distributed uniformly from 0 to 1. Specifying the seed will make sure the generated random number sequence is deterministic: rands(3) = 0.731057369148862
    NOTE : Due to name collision of presto 0.142's implementaion of rand(int a) function, which returns a number between 0 to a and Hive's rand(int seed) function, which sets the seed for the random number generator, the hive UDF has been implemented as rands(int seed).
  3. bin(BIGINT a) -> STRING
    Returns the number in binary format: bin(100) = 1100100.
  4. hex(BIGINT a) -> STRING, hex(STRING a) -> STRING, hex(BINARY a) -> STRING
    If the argument is an INT or binary, hex returns the number as a STRING in hexadecimal format. Otherwise if the number is a STRING, it converts each character into its hexadecimal representation and returns the resulting STRING: hex(123) = 7b, hex('123') = 7b, hex('1100100') = 64.
  5. unhex(STRING a) -> BINARY
    Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of the number: unhex('7b') = 1111011.
  • STRING Functions
  1. locate(string substr, string str[, int pos]) -> int
    Returns the position of the first occurrence of substr in str after position pos: locate('si', 'mississipi', 2) = 4, locate('si', 'mississipi', 5) = 7
  2. find_in_set(string str, string strList) -> int
    Returns the first occurance of str in strList where strList is a comma-delimited string. Returns null if either argument is null. Returns 0 if the first argument contains any commas: find_in_set('ab', 'abc,b,ab,c,def') returns 3.
  3. instr(string str, string substr) -> int
    Returns the position of the first occurrence of substr in str. Returns null if either of the arguments are null and returns 0 if substr could not be found in str: instr('mississipi' , 'si') = 4.
  • CONDITIONAL Functions

    1. nvl(T value, T default_value) -> T
      ** Supported only till v1.0.0 due to the limitations presto new versions of Presto puts on plugins Returns default value if value is null else returns value: nvl(3,4) = 3, nvl(NULL,4) = 4.
  • MISCELLANEOUS Functions

    1. hash(a1[, a2...]) -> int
      ** Supported only till v1.0.0 due to the limitations presto new versions of Presto puts on plugins Returns a hash value of the arguments. hash('a','b','c') = 143025634.

Adding User Defined Functions to Presto-UDFs

Functions can be added using annotations, follow https://prestosql.io/docs/current/develop/functions.html for details on how to add functions

** Note that Code generated functions were supported only till v1.0.0 due to the limitations presto new versions of Presto puts on plugins

Release a new version of presto-udfs

Releases are always created from master. During development, master has a version like X.Y.Z-SNAPSHOT.

# Change version as per http://semver.org/
mvn release:prepare -Prelease
mvn release:perform -Prelease
git push
git push --tags

More Repositories

1

sparklens

Qubole Sparklens tool for performance tuning Apache Spark
Scala
562
star
2

rubix

Cache File System optimized for columnar formats and object stores
Java
182
star
3

spark-on-lambda

Apache Spark on AWS Lambda
Scala
151
star
4

kinesis-sql

Kinesis Connector for Structured Streaming
Scala
137
star
5

afctl

afctl helps to manage and deploy Apache Airflow projects faster and smoother.
Python
130
star
6

quark

Quark is a data virtualization engine over analytic databases.
Java
98
star
7

streamx

kafka-connect-s3 : Ingest data from Kafka to Object Stores(s3)
Java
97
star
8

spark-acid

ACID Data Source for Apache Spark based on Hive ACID
Scala
96
star
9

qds-sdk-py

Python SDK for accessing Qubole Data Service
Python
51
star
10

uchit

Python
29
star
11

streaminglens

Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines
Scala
17
star
12

s3-sqs-connector

A library for reading data from Amzon S3 with optimised listing using Amazon SQS using Spark SQL Streaming ( or Structured streaming).
Scala
17
star
13

spark-state-store

Rocksdb state storage implementation for Structured Streaming.
Scala
16
star
14

presto-kinesis

Presto connector to Amazon Kinesis service.
Java
14
star
15

kinesis-storage-handler

Hive Storage Handler for Kinesis.
Java
11
star
16

qds-sdk-java

A Java library that provides the tools you need to authenticate with, and use the Qubole Data Service API.
Java
7
star
17

demotrends

Code required to setup the demo trends website (http://demotrends.qubole.com)
Ruby
6
star
18

qubole-terraform

HCL
6
star
19

space-ui

UI Ember components based on Space design specs
JavaScript
5
star
20

caching-metastore-client

A metastore client that caches objects
Java
5
star
21

rubix-admin

Admin scripts for Rubix
Python
5
star
22

tco

Python
4
star
23

qds-sdk-R

R extension to execute Hive Commands through Qubole Data Service Python SDK.
Python
4
star
24

docker-images

Qubole Docker Images
Dockerfile
4
star
25

tableau-qubole-connector

JavaScript
3
star
26

metriks-addons

Utilities for collecting metrics in a Rails Application
Ruby
3
star
27

qds-sdk-ruby

Ruby SDK for Qubole API
Ruby
3
star
28

qubole-log-datasets

3
star
29

hubot-qubole

Interaction with Qubole Data Services APIs via Hubot framework
CoffeeScript
3
star
30

customer-success

HCL
2
star
31

bootstrap-functions

Useful functions for Qubole cluster bootstraps
Shell
2
star
32

qubole-jar-test

A maven project to test that qubole jars can be listed as dependencies
Java
2
star
33

etl-examples

Scala
2
star
34

perf-kit-queries

2
star
35

tuning-paper

TeX
2
star
36

blogs

1
star
37

jupyter

1
star
38

presto-event-listeners

1
star
39

qubole-rstudio-example

1
star
40

presto

Presto
Java
1
star
41

qubole.github.io

Qubole OSS Page
1
star
42

quboletsdb

Setup opentsdb using Qubole
Python
1
star