• This repository has been archived on 28/Mar/2024


[DEPRECATED] Compact ZIP file writing/reading for Ruby, for streaming applications

zip_tricks

⚠️ Deprecation notice

zip_tricks will not receive further updates or support, and will no longer be maintained. The story of zip_tricks continues in zip_kit which is going to be receiving regular updates and supports all of the zip_tricks functionality (and more). Thank you for being part of the zip_tricks community!


Allows streaming, non-rewinding ZIP file output from Ruby.

Initially written as a spiritual successor to zipline, and now proudly powering it under the hood.

Allows you to write a ZIP archive out to a File, Socket, String or Array without having to rewind it at any point. Usable for creating very large ZIP archives for immediate sending out to clients, or for writing large ZIP archives without memory inflation.

zip_tricks currently handles all our zipping needs (millions of ZIP files generated per day), so we are pretty confident it is widely compatible with a large number of unarchiving end-user applications.

Requirements

Ruby 2.1+ syntax support (keyword arguments with defaults) and a working zlib (both are available on JRuby as well). JRuby might experience problems when using the reader methods, because the argument of IO#seek is limited to 32-bit sizes there.

Diving in: send some large CSV reports from Rails

The easiest way is to include the ZipTricks::RailsStreaming module in your controller.

class ZipsController < ActionController::Base
  include ZipTricks::RailsStreaming

  def download
    zip_tricks_stream do |zip|
      zip.write_deflated_file('report1.csv') do |sink|
        CSV(sink) do |csv_write|
          csv_write << Person.column_names
          Person.all.find_each do |person|
            csv_write << person.attributes.values
          end
        end
      end
      zip.write_deflated_file('report2.csv') do |sink|
        ...
      end
    end
  end
end

If you want some more conveniences you can also use zipline which will automatically process and stream attachments (Carrierwave, Shrine, ActiveStorage) and remote objects via HTTP.

Create a ZIP file without size estimation, compress on-the-fly during writes

The basic use case is compressing on the fly. Some data will be buffered by the Zlib deflater, but memory inflation is going to be very constrained. Data will be written to the destination at fairly regular intervals. Deflate compression works best for things like text files.

out = my_tempfile # can also be a socket
ZipTricks::Streamer.open(out) do |zip|
  zip.write_stored_file('mov.mp4') do |sink|
    File.open('mov.mp4', 'rb'){|source| IO.copy_stream(source, sink) }
  end
  zip.write_deflated_file('long-novel.txt') do |sink|
    File.open('novel.txt', 'rb'){|source| IO.copy_stream(source, sink) }
  end
end

Unfortunately with this approach it is impossible to compute the size of the ZIP file being output, since you do not know how large the compressed data segments are going to be.
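To illustrate why the output size cannot be known ahead of time, here is a minimal sketch that uses only Ruby's standard zlib bindings, not zip_tricks itself. The helper name deflated_bytesize is hypothetical. Two payloads of the same uncompressed size end up with very different compressed sizes, so the size of a deflated entry is only known once its data has passed through the compressor.

```ruby
require 'zlib'

# Hypothetical helper: returns how many bytes the payload occupies after
# being compressed with Deflate (zlib-framed here, for simplicity).
def deflated_bytesize(data)
  Zlib::Deflate.deflate(data).bytesize
end

# Two payloads with an identical uncompressed size
repetitive = 'a' * 100_000
random     = Random.new(42).bytes(100_000)

# A stored entry always occupies exactly the size of its source data...
puts "stored:   #{repetitive.bytesize} / #{random.bytesize}"

# ...whereas the deflated size depends entirely on the content
puts "deflated: #{deflated_bytesize(repetitive)} / #{deflated_bytesize(random)}"
```

Stored entries do not have this problem, which is why the size can be computed upfront when every entry is stored and its size is known.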

Send a ZIP from a Rack response

To "pull" data from ZipTricks you can create an OutputEnumerator object, which yields the binary chunks piece by piece and applies some amount of buffering as well. Since this OutputEnumerator responds to #each and yields Strings, it also can (and should!) be used as a Rack response body. Return it to your webserver and you will have your ZIP streamed. The block that you give to the OutputEnumerator will only start executing once your response body starts getting iterated over, that is, when the response is actually being sent to the client (unless you are using a buffering Rack webserver, such as WEBrick).

body = ZipTricks::Streamer.output_enum do | zip |
  zip.write_stored_file('mov.mp4') do |sink| # Those MPEG4 files do not compress that well
    File.open('mov.mp4', 'rb'){|source| IO.copy_stream(source, sink) }
  end
  zip.write_deflated_file('long-novel.txt') do |sink|
    File.open('novel.txt', 'rb'){|source| IO.copy_stream(source, sink) }
  end
end
[200, {}, body]
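The deferred execution described above can be sketched in plain Ruby. This is not the OutputEnumerator implementation; LazyBody and demo_lazy_body are hypothetical names used only to show the general idea of a body object that stores its block and runs it when #each is called.

```ruby
# Hypothetical stand-in for a streaming response body; not part of zip_tricks.
class LazyBody
  def initialize(&producer)
    @producer = producer # the block is stored here, not executed
  end

  # A Rack server calls #each once it starts writing the response.
  # Only at that point does the producer block run.
  def each(&chunk_receiver)
    @producer.call(chunk_receiver)
  end
end

# Records the order of events to show that the block runs lazily
def demo_lazy_body
  log = []
  body = LazyBody.new do |out|
    log << :producer_started
    out.call('chunk 1')
    out.call('chunk 2')
  end
  log << :body_created

  chunks = []
  body.each { |chunk| chunks << chunk }
  [log, chunks]
end

log, chunks = demo_lazy_body
puts log.inspect
puts chunks.inspect
```

The log shows the body being created before the producer block starts, which is the property that lets a Rack server begin sending headers before any ZIP data has been generated.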

Send a ZIP file of known size, with correct headers

Use the SizeEstimator to compute the correct size of the resulting archive.

# Precompute the Content-Length ahead of time
bytesize = ZipTricks::SizeEstimator.estimate do |z|
 z.add_stored_entry(filename: 'myfile1.bin', size: 9090821)
 z.add_stored_entry(filename: 'myfile2.bin', size: 458678)
end

# Prepare the response body. The block will only be called when the response starts to be written.
zip_body = ZipTricks::RackBody.new do | zip |
  zip.add_stored_entry(filename: "myfile1.bin", size: 9090821, crc32: 12485)
  zip << read_file('myfile1.bin')
  zip.add_stored_entry(filename: "myfile2.bin", size: 458678, crc32: 89568)
  zip << read_file('myfile2.bin')
end

[200, {'Content-Length' => bytesize.to_s}, zip_body]

Writing ZIP files using the Streamer bypass

You do not have to "feed" all the contents of the files you put in the archive through the Streamer object. If the write destination for your use case is a Socket (say, you are writing using Rack hijack) and you know the metadata of the file upfront (the CRC32 of the uncompressed file and the sizes), you can write directly to that socket using some accelerated writing technique, and only use the Streamer to write out the ZIP metadata.

# io has to be an object that supports #<<
ZipTricks::Streamer.open(io) do |zip|
  # my_temp_file is written "as is" (STORED mode).
  # Its size and CRC32 (my_temp_file_crc32) must be known upfront.
  # Write the local file header first...
  zip.add_stored_entry(filename: "first-file.bin", size: my_temp_file.size, crc32: my_temp_file_crc32)

  # Adjust the ZIP offsets within the Streamer
  zip.simulate_write(my_temp_file.size)

  # ...and then send the actual file contents bypassing the Streamer interface
  io.sendfile(my_temp_file)
end

Other usage examples

Check out the examples/ directory at the root of the project. This will give you a good idea of various use cases the library supports.

Computing the CRC32 value of a large file

StreamCRC32 computes the CRC32 checksum of an IO in a streaming fashion. It is slightly more convenient for the purpose than using the raw Zlib library functions.

crc = ZipTricks::StreamCRC32.new
crc << next_chunk_of_data
...

crc.to_i # Returns the actual CRC32 value computed so far
...
# Append a known CRC32 value that has been computed previously
crc.append(precomputed_crc32, size_of_the_blob_computed_from)

You can also compute the CRC32 for an entire IO object if it responds to #eof?:

crc = ZipTricks::StreamCRC32.from_io(file) # Returns an Integer
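For comparison, the same two operations can be sketched with the raw Zlib functions from Ruby's standard library. This is only an illustration of the underlying technique, not the StreamCRC32 implementation; the helper names streaming_crc32 and combined_crc32 are hypothetical.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: computes the CRC32 of an IO by reading it in
# fixed-size chunks, so the whole content never has to be in memory.
def streaming_crc32(io, chunk_size: 64 * 1024)
  crc = Zlib.crc32 # initial CRC value (the CRC32 of empty input)
  while (chunk = io.read(chunk_size))
    crc = Zlib.crc32(chunk, crc) # fold the next chunk into the running value
  end
  crc
end

# Hypothetical helper: combines the CRC32 of blob A with the CRC32 of blob B
# (of known size) into the CRC32 of A followed by B, without re-reading either.
def combined_crc32(crc_a, crc_b, size_of_b)
  Zlib.crc32_combine(crc_a, crc_b, size_of_b)
end

data = 'hello world ' * 10_000
puts streaming_crc32(StringIO.new(data)) == Zlib.crc32(data)
puts combined_crc32(Zlib.crc32('foo'), Zlib.crc32('bar'), 3) == Zlib.crc32('foobar')
```

Combining precomputed checksums is useful when parts of a file were already checksummed elsewhere, for example when the file is assembled from separately stored segments.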

Reading ZIP files

The library contains a reader module; play with it to see what is possible. It is not a complete ZIP reader, but it was designed for a specific purpose (highly parallel unpacking of remotely stored ZIP files), and as such it performs its function quite well. Please beware of the security implications of using ZIP readers that have not been formally verified (ours hasn't been).

Contributing to zip_tricks

  • Check out the latest main to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
  • Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it.
  • Fork the project.
  • Start a feature/bugfix branch.
  • Commit and push until you are happy with your contribution.
  • Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
  • Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or if a change there is otherwise necessary, that is fine, but please isolate it in its own commit so I can cherry-pick around it.

Copyright and license

Copyright (c) 2020 WeTransfer.

zip_tricks is distributed under the conditions of the Hippocratic License. See LICENSE.txt for further details. If this license is not acceptable for your use case, we still maintain the 4.x version tree, which remains under the MIT license; see https://rubygems.org/gems/zip_tricks/versions for more information. Note that we only backport some performance optimizations and crucial bugfixes to that tree, not the new features.
