• Stars
    star
    157
  • Rank 238,399 (Top 5 %)
  • Language
    Elixir
  • Created about 5 years ago
  • Updated 9 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Zipping on the fly — Generate downloadable Zip streams by aggregating File or URL Sources

Packmatic

Packmatic generates Zip streams by aggregating File or URL Sources.

By using a Stream, the caller can compose it within the confines of Plug’s request/response model and serve the content of the resultant Zip archive in a streaming fashion. This allows fast delivery of a Zip archive consisting of many disparate parts hosted in different places, without having to first spool all of them to disk.

The generated archive uses Zip64, and works with individual files that are larger than 4GB. See the Compatibility section for more information.



Design Rationale

Problem

In modern Web applications, content is often stored away from the host, such as Amazon S3, to enable better scalability and resiliency, as data would not be lost should a particular host go down. However, one disadvantage of this design is that data has become more spread-out, so previously simple operations such as creation of a Zip archive consisting of various files is now more complicated.

In a hypothetical scenario where the developer opts for synchronous generation, the process may spend too long preparing the constituent files, which breaks certain connection proxies which expect a reasonable time to first byte. And in any case, disk utilisation will be high, since not only does each constituent file need to be on disk, there must be free space that will hold the archive temporarily as well.

In another hypothetical scenario, where the developer opts for asynchronous generation, spooling still has to happen and further infrastructure for background job execution, user notification, temporary storage, etc. is now needed, which increases complexity.

Solution

The overall goal for Packmatic is to deliver a modern Zip streamer, which is able to assemble a Zip file from various sources, such as HTTP(S) downloads or locally generated temporary files, on-the-fly, so as to minimise waiting time, reduce resource consumption and elevate customer happiness.

Packmatic solves the problem by separating the concerns of content definition (specifying what content should be retrieved, and from where), data retrieval (getting the content from the given source), and content presentation (creation of the Archive bundle).

With Packmatic, the developer first specifies a list of Source Entries, which indicates where content can be obtained, and in what way: downloaded from the Internet, read from the filesystem, or dynamically resolved when the archive is being built, etc. This information is then wrapped in a Stream, which is consumed to send out chunked responses.

When the Stream is started, it spins up an Encoder process, which works in a staged, iterative manner. This allows downloads to start almost immediately, which enhances customer happiness.

The Encoder consumes the manifest as the Stream is consumed. In other words, the Stream starts producing data only when the client starts reading, and only as fast as the client reads, which reduces the amount of networking and local storage needed to start the process. Most of the acquisition work is interleaved, and back-pressured by client connection speed.

Benefits

Since Packmatic only streams what it needs, and does not spool content locally, it is possible to produce large archives whose constituent files do not fit on one host. This is an advantage over conventional processes, where even in a synchronous scenario, all constituent files and the archive must be spooled to disk.

This design enhances developer happiness as well, since by using a synchronous, iterative mode of operation, there is no longer any need to spool constituent files in the archive on disk; there is also no longer any need to build technical workarounds in order to compensate for archive preparation delays.

With Packmatic, both the problem and the solution is drastically simplified. The user clicks “download”, and the archive starts downloading. This is how it should be.

Installation

To install Packmatic, add the following line in your application’s dependencies:

defp deps do
  [
    {:packmatic, "~> 1.1.3"}
  ]
end

Usage

The general way to use Packmatic within your application is to generate a Stream dynamically by passing a list of Source Entries directly to Packmatic.build_stream/2. This gives you a standard Stream which you can then send it off for download with help from Packmatic.Conn.send_chunked/3.

If you need more control, for example if you desire context separation, or if you wish to validate that the entries are valid prior to vending a Stream, you may generate a Packmatic.Manifest struct ahead of time, then pass it to Packmatic.build_stream/2 at a later time. See Packmatic.Manifest for more information.

In either case, the Stream is powered by Packmatic.Encoder, which consumes the Entries within the Manifest iteratively as the Stream is consumed, at the pace set by the client’s download connection.

Building the Stream with Entries

The usual way to construct a Stream is as follows.

entries = [
  [source: {:file, "/tmp/hello.pdf"}, path: "hello.pdf"],
  [source: {:file, "/tmp/world.pdf"}, path: "world.pdf", timestamp: DateTime.utc_now()],
  [source: {:url, "https://example.com/foo.pdf"}, path: "foo/bar.pdf"]
]

stream = Packmatic.build_stream(entries)

As you can see, each Entry used to build the Stream (under source:) is a keyword list, which concerns itself with the source, the path, and optionally a timestamp:

  • source: represents a 2-arity tuple, representing the name of the Source and its Initialisation Argument. This data structure specifies the nature of the data, and how to obtain its content.

  • path: represents the path in the Zip file that the content should be put under; it is your own responsibility to ensure that paths are not duplicated (see the Notes for an example).

  • timestamp: is optional, and represents the creation/modification timestamp of the file. Packmatic emits both the basic form (DOS / FAT) of the timestamp, and the Extended Timestamp Extra Field which represents the same value with higher precision and range.

Packmatic supports reading from any Source which conforms to the Packmatic.Source behaviour. To aid adoption and general implementation, there are built-in Sources as well; this is documented under Source Types.

Building a Manifest

If you wish, you can use the Packmatic.Manifest module to build a Manifest ahead-of-time, in order to validate the Entries prior to vending the Stream.

Manifests can be created iteratively by calling Packmatic.Manifest.prepend/2 against an existing Manifest, or by calling Packmatic.Manifest.create/1 with a list of Entries created elsewhere. For more information, see Packmatic.Manifest.

Specifying Error Behaviour

By default, Packmatic fails the Stream when any Entry fails to process for any reason. If you desire, you may pass an additional option to Packmatic.build_stream/2 in order to modify this behaviour:

stream = Packmatic.build_stream(entries, on_error: :skip)

Writing Stream to File

You can use the standard Stream.into/2 call to operate on the Stream:

stream
|> Stream.into(File.stream!(file_path, [:write]))
|> Stream.run()

Writing Stream to Conn (with Plug)

You can use the bundled Packmatic.Conn module to send a Packmatic stream down the wire:

stream
|> Packmatic.Conn.send_chunked(conn, "download.zip")

When writing the stream to a chunked Plug.Conn, Packmatic automatically escapes relevant characters in the name and sets the Content Disposition to attachment for maximum browser compatibility under intended use.

Source Types

Packmatic has default Source types that you can use easily when building Manifests and/or Streams:

  1. File, representing content on disk, useful when the content is already available and only needs to be integrated. See Packmatic.Source.File.

  2. URL, representing content that is available remotely. Packmatic will run a chunked download routine to incrementally download and archive available chunks. See Packmatic.Source.URL.

  3. Stream, representing content that is generated by a Stream (perhaps from other libraries) which can be incrementally consumed and incorporated in the archive. Anything that conforms to the Enumerable protocol, for example a List containing binaries, can be passed as well.

  4. Random, representing randomly generated bytes which is useful for testing. See Packmatic.Source.Random.

  5. Dynamic, representing a dynamically resolved Source, which is ultimately fulfilled by pulling content from either a File or an URL. If you have any need to inject a dynamically generated file, you may use this Source type to do it. This also has the benefit of avoiding expensive computation work in case the customer abandons the download midway. See Packmatic.Source.Dynamic.

These Streams can be referred by their internal aliases:

  • {:file, "/tmp/hello/pdf"}.
  • {:url, "https://example.com/hello/pdf"}.
  • {:stream, [<<0>>, <<1>>]}.
  • {:random, 1048576}.
  • {:dynamic, fn -> {:ok, {:random, 1048576}} end}.

Alternatively, they can also be referred by module names:

  • {Packmatic.Source.File, "/tmp/hello/pdf"}.
  • {Packmatic.Source.URL, "https://example.com/hello/pdf"}.
  • {Packmatic.Source.Stream, enum}.
  • {Packmatic.Source.Random, 1048576}.
  • {Packmatic.Source.Dynamic, fn -> {:ok, {:random, 1048576}} end}.

Dynamic & Custom Sources

If you have an use case where you wish to dynamically generate the content that goes into the archive, you may either use the Dynamic source or implement a Custom Source.

For example, if the amount of dynamic computation is small, but the results are time-sensitive, like when you already have Object IDs and just need to pre-sign URLs, you can use a Dynamic source with a curried function:

{:dynamic, MyApp.Packmatic.build_dynamic_fun(object_id)}

If you have a different use case, for example if you need to pull data from a FTP server (which uses a protocol that Packmatic does not have a bundled Source to work with), you can implement a module that conforms to the Packmatic.Source behaviour, and pass it:

{MyApp.Packmatic.Source.FTP, "ftp://example.com/my.docx"}

See Packmatic.Source for more information.

Events

The Encoder can be configured to emit events in order to enable feedback elsewhere in your application, for example:

entries = [
  [source: {:file, "/tmp/hello.pdf"}, path: "hello.pdf"],
  [source: {:file, "/tmp/world.pdf"}, path: "world.pdf", timestamp: DateTime.utc_now()],
  [source: {:url, "https://example.com/foo.pdf"}, path: "foo/bar.pdf"]
]

entries_count = length(entries)
entries_completed_agent = Agent.start(fn -> 0 end)

handler_fun = fn event ->
  case event do
    %Packmatic.Event.EntryCompleted{} ->
      count = Agent.get_and_update(entries_completed_agent, & &1 + 1)
      IO.puts "#{count} of #{entries_count} encoded"
    %Packmatic.Event.StreamEnded{} ->
      :ok = Agent.stop(entries_completed_agent)
      :ok
    _ ->
      :ok
  end
end

stream = Packmatic.build_stream(entries, on_event: handler_fun)

See documentation for Packmatic.Event for a complete list of Event types.

Compatibility

  • Windows

    • Windows Explorer (Windows 10): OK
    • 7-Zip: OK
  • macOS

    • Finder (High Sierra): OK
    • The Unarchiver (High Sierra): NG
    • Erlang/OTP (:zip): NG

Known Issues

  • Ability to switch between STORE and DEFLATE modes is under investigation. Currently all archives are generated with DEFLATE, although for some use cases, this only increases the size of the archive, as content within each file is already compressed.

  • Can not change compression level.

  • Explicit directory generation is not supported; each entry must be a file. This is pending further investigation.

  • Handling of 0-length files from sources is pending further investigation. Currently, they do get journaled but this may change in the future.

Future Enhancements

  • We will consider adding ability to resume downloads with range queries, so if the URL Source fails we can attempt to resume encoding from where we left off.

  • In practice, if there are large files that are archived repeatedly, they will be pulled repeatedly and there is currently no explicit ability to cache them. We expect that in these cases an optimisation by the application developer may be to temporarily spool these hot constituents on disk and use a Dynamic Source to wrap around the cache layer (which would emit a File Source if the file is available locally, and act accordingly otherwise).

Notes

  1. As with any user-generated content, you should exercise caution when building the Manifest, and ensure that only content that the User is entitled to retrieve is included.

  2. Due to design limitations, when downloading resources via a HTTP(S) connection, should the connection become closed halfway, a partial representation may be embedded in the output. A future release of this library may correct this in case the Content-Length header is present in the output, and act accordingly.

  3. You must ensure that paths are not duplicated within the same Manifest. You can do this by first building a list of sources and paths, then grouping and numbering them as needed with Enum.group_by/3.

    Given entries are {source, path} tuples, you can do this:

    annotate_fun = fn entries ->
      for {{source, path}, index} <- Enum.with_index(entries) do
        path_components = Path.split(path)
        {path_components, [filename]} = Enum.split(path_components, -1)
        extname = Path.extname(filename)
        basename = Path.basename(filename, extname)
        path_components = path_components ++ ["#{basename} (#{index + 1})#{extname}"]
        {source, Path.join(path_components)}
      end
    end
    
    duplicates_fun = fn
      [_ | [_ | _]] = entries -> annotate_fun.(entries)
      entries -> entries
    end
    
    entries
    |> Enum.group_by(& elem(&1, 1))
    |> Enum.flat_map(&duplicates_fun.(elem(&1, 1)))
  4. You must ensure that paths conform to the target environment of your choice, for example macOS and Windows each has its limitations regarding how long paths can be.

  5. The Stream emitted by Packmatic contains IO Data basically lists of binaries or themselves. To use the Stream with other libraries such as ex_aws_s3, you will have to write a transformer which converts the IO Lists to binaries. See the guest post.

Guest Post: Using Packmatic with AWS S3 Multipart Uploads

The following Guest Post is authored by Kyle Steger at Driver Technologies.

Our company develops a dashcam application. We sync loads of sensor data (locations, acceleration, orientation, attitudes, ect) to our cloud platform while folks are driving with the application. This data is voluminous and infrequently queried so the natural place to store it is in S3. We allow our users to download exports of their data through our portal. We also share anonymized and aggregated information with some of our partners, who are typically interested in smaller segments of data around specific scenarios identified throughout a drive (think: hard braking, near collision, emergency vehicles). Today, that process could be summarized in a few steps:

  1. Fetch sensor data from S3
  2. Filter/transform the data in memory
  3. Write the result to an export directory on disk
  4. Create an archive of the directory using :erlang.zip and write to disk
  5. Pray everything worked as it should
  6. Stream the archive to S3
  7. Remove all artifacts from disk

Packmatic allows us to remove steps 3, 4, 5, and 7 from our export flow. Since Packmatic works with streams, we create sources that are responsible for filtering the data and emitting the buffer that Packmatic passes off to the ExAws.S3.upload function. The only thing that I had to implement to support this was an intermediate Stream.transform on the Packmatic.build_stream().

Example below:

# 10mb; S3 has a limit of 1000 upload chunks. 
# Ideally this value is estimated based on the export.
@chunk_size 10_485_760
# This function takes a stream generated by `Packmatic.build_stream()` and chunks the emitted values
# into sizes/shapes that `ExAws.S3.upload` can handle.
@spec chunk_packmatic_stream(Enumerable.t()) :: Enumerable.t()
defp chunk_packmatic_stream(stream) do
  Stream.transform(
    stream,
    _start_fun = fn -> {_buffer = "", _size = 0} end,
    _reducer = fn
      elem, {buffer, size} when size >= @chunk_size ->
        new_buffer = IO.iodata_to_binary(elem)
        {[buffer], {new_buffer, byte_size(new_buffer)}}
      elem, {buffer, size} ->
        buffer_appendage = IO.iodata_to_binary(elem)
        {[], {buffer <> buffer_appendage, size + byte_size(buffer_appendage)}}
    end,
    _last_fun = fn {buffer, _size} -> {[buffer], {"", 0}} end,
    _after_fun = fn _acc -> :ok end
  )
end

Acknowledgements

During design and prototype development of this library, the Author has drawn inspiration from the following implementations, and therefore thanks all contributors for their generosity:

The Author wishes to thank the following individuals:

Reference

More Repositories

1

LiveFrost

Real time blurring for iOS
Objective-C
484
star
2

etso

Ecto 3 adapter allowing use of Ecto schemas held in ETS tables
Elixir
372
star
3

DayFlow

iOS Date Picker + Infinite Scrolling
Objective-C
229
star
4

shun

Easy & flexible traffic filtering with Elixir
Elixir
80
star
5

RAPageViewController

Sliding pages side by side, infinitely. Fancy parts from the redacted bits, re-implemented for expressiveness at expense of some naïvety
Objective-C
78
star
6

supervised-scaler

An Elixir integration with libVIPS
Elixir
44
star
7

LESS.mode

LESS syntax mode for SubEthaEdit and Coda
Perl
38
star
8

emporium

Elixir
36
star
9

DayFlow-Sample

Sample application for DayFlow the picker
Objective-C
30
star
10

ets-playground

Leveraging ETS Effectively
Elixir
29
star
11

RAExpendable

NSObject subclass with dynamically added implementations for custom selectors
Objective-C
27
star
12

SocialAuth

Painless Facebook & Twitter auth on iOS 6+
Objective-C
26
star
13

gen_magic

Fantasia in Elixir — Binary-level file content identification, powered by libmagic
Elixir
25
star
14

RATilingBackgroundView

A special view which tiles visually identical subviews.
Objective-C
24
star
15

snake

🐍 Multi-Snake 🐍 — A showcase of Elixir clustering capabilities and strategies.
Elixir
22
star
16

LiveFrost-Sample

Sample for LiveFrost
Objective-C
20
star
17

RAProjectTools

Collection of Xcode project “tool” scripts
Ruby
15
star
18

RASchedulingKit

Callback-based, lock-less, wait-free NSOperation subclasses
Objective-C
15
star
19

metrics

Metal Charting
Objective-C
12
star
20

RAReachability

Yet another SystemConfiguration Reachability Wrapper
Objective-C
11
star
21

RAContentStretchingView

UIView content stretching / grouped table view cells done right
Objective-C
10
star
22

RAContentLoader

Reference-counted content loading
Objective-C
10
star
23

gemini-notes

Gemini PDA Notes
10
star
24

RARangedSlider

Ranged (two-knob) slider for iOS
Objective-C
10
star
25

RAPageViewController-Sample

Sample app for RAPageViewController
Objective-C
9
star
26

xyozys_gen_statem

tic-tac-toe with gen_statem (in Elixir)
Elixir
8
star
27

html5shiv

unofficial fork of html5shiv on Google Code
JavaScript
8
star
28

PLSymbolicate

Symbolicate protobuf output from PLCrashReporter et. al.
7
star
29

irInterfaceKit

Custom interface kit for Cappuccino
Objective-J
6
star
30

WeakKVO

Sample: emitting KVO change notifications for weak properties
Objective-C
6
star
31

FixedWidthWebView

fixed-width UIWebView implementation for containment in UIViewController
Objective-C
5
star
32

NSWindow-Resize

5
star
33

angular-chosen-playground

Chosen + AngularJS + Typeahead + Auto-Suggestion with good search performance on 10,000 items
4
star
34

flesler-plugins

Fork of flesler-plugins
JavaScript
4
star
35

NSStatusItem-Iconology

NSStatusItem iconology helper methods
Objective-C
4
star
36

InvocationTesting

Objective-C Method forwarding fun (not that it will be fast)
Objective-C
4
star
37

NSTableView-DeleteKey

Objective-C
3
star
38

hero

Heroku account swizzling helper
Ruby
3
star
39

kennel

Sample Hound + Headless Chrome application
Elixir
3
star
40

slate

Custom Plurk theme
3
star
41

monoSnippets

Minimalist jQuery / JavaScript / Sass tool chain for a more civilized web
JavaScript
3
star
42

RATilingBackgroundView-Sample

Sample app for RATilingBackgroundView
Objective-C
3
star
43

termHere

Terminal niceties for for Coda
2
star
44

UIScrollViewExperiments

Just a bunch of experimental code
Objective-C
2
star
45

SocialAuth-Sample

Sample for SocialAuth
Objective-C
2
star
46

ReflectiveMapView

HACKY: Same story regarding iOS 6 status bar, but done for CAEAGLLayer as well.
Objective-C
2
star
47

CPTableview-customDataView-selectionTest

CPTableview custom data view -setSelected:animated: experiment
Objective-J
2
star
48

RAAnimation

WIP: UIKit / QuartzCore Animation Additions
Objective-C
2
star
49

gridViewNibTest

Loading AQGridViewCell from a nib
Objective-C
2
star
50

PinchThat

An attempt to fix an Apple UICollectionView WWDC sample app issue with duplicate decoration views
Objective-C
2
star
51

glitter

Bare-bones zBar-to-UIWebView trampoline
Objective-C
2
star
52

CommonCrypto.j

Crypto Support for Cappuccino
Objective-J
2
star
53

jQuery.spin.js

Spin any image using Raphaël. Adoption.
1
star
54

cloudy

WIP — Desktop / Mobile Continuum for iTunes Track Info
1
star
55

TabBarDemo

Objective-C
1
star
56

MNCodaHighlighter

1
star
57

jQuery.multiFile

JavaScript
1
star
58

js.delegatable

Cocoa-style delegation support for JS.Class
JavaScript
1
star
59

myGit

Shows how to call “which” from a Cocoa app while respecting the user’s $PATH
Objective-C
1
star
60

monoArray

array.supersets(), etc.
JavaScript
1
star
61

retinal

Ordinary + @2x Illustrator .PNGs in one click
JavaScript
1
star
62

LWHangDetector

LWHangDetector in repository form
Objective-C
1
star
63

js.synthesizable

1
star
64

monoTwitterEngine

jQuery-based Twitter API
JavaScript
1
star
65

snapshot

Make copies, make many of them
1
star
66

code-your-startup-idea

Samples for a She++ 2013 Session
1
star
67

stochastic

Full gamut of RGBA.
Ruby
1
star
68

RAContentStretchingView-Sample

Sample for RAContentStretchingView
Objective-C
1
star
69

regulation

RegEx testing
JavaScript
1
star
70

evadne.github.com

Site
HTML
1
star
71

7languages

7 Languages in 7 Weeks…
Io
1
star
72

monoMailer

PHP mail() wrapper
PHP
1
star
73

AQGridView-Issue-171-Sample-1

AlanQuatermain/AQGridView Issue #171 Sample 1
Objective-C
1
star
74

irDelegation

Easy delegation and protocol support for Cappuccino
Objective-J
1
star
75

RotationTest

Some sort of rotation math.
Objective-C
1
star
76

CPObject-DelayedPerforming

Cappuccino CPObject Delayed Performing
Objective-J
1
star
77

jQConstants

Semantic jQuery addition that brings legible constants
JavaScript
1
star
78

pictorial

Your tiny, automated image post-processing factory
Ruby
1
star
79

markdown-previewer

1
star
80

NSBundle-Information

Objective-C
1
star
81

radiance

Custom WordPress theme using Journalist as a starting point
PHP
1
star
82

NSMutableArray-Shuffling

Objective-C
1
star
83

CodaPluginsController.h

The missing header for Coda plug-ins.
1
star
84

KVC.js

Minimalist JavaScript Key-Value Coding support
1
star
85

monoString

JavaScript String additions
JavaScript
1
star
86

ObjC-SwizzleMethods

Obj-C method swizzling wrapper
C
1
star
87

wczg-2018-concurrent-js

Examples for the “Fearless Concurrency with JavaScript & Beyond” talk @ WebCamp Zagreb, 2018
JavaScript
1
star
88

wp-e-commerce

Forked WP e-Commerce plugin
PHP
1
star
89

tidyCJK.js

Spaces Chinese, Japanese and Korean glyphs properly — before browsers learned how to kern them. Needs XRegExp!
JavaScript
1
star
90

innerShadowedTextSample

Custom UILabel drawing with inner shadowed text
Objective-C
1
star
91

SortedSectionTest

Sorting NSFetchedResultsController sections on rigid but arbitrary principles
Objective-C
1
star
92

Aperio

markit finance api client
Objective-C
1
star
93

monoDate

JavaScript Date additions, including a minimal, template-based, Ruby-flavored formatter
JavaScript
1
star
94

vips-buildpack

VIPS + Anvil + Heroku
Shell
1
star
95

ISO8601DateFormatter

Mirror.
Objective-C
1
star
96

StickyHTMLTableHeaders

Fun with HTML
JavaScript
1
star
97

snipMe

Drop-dead fool-proof in-line localization
1
star
98

helveticSafari

Arial bashers rejoice.
1
star
99

coursework-introductory-core-data

Introductory Core Data tutorial
Objective-C
1
star