  • Stars: 820
  • Rank 55,603 (Top 2%)
  • Language: Swift
  • License: GNU Lesser Genera...
  • Created almost 3 years ago
  • Updated 10 months ago


Repository Details

A fast Xcode unarchiver

unxip

unxip is a command-line tool designed for rapidly unarchiving Xcode XIP files and writing them to disk with good compression. Its goal is to outperform Bom (which powers xip(1) and Archive Utility) in both performance and on-disk usage, and (at the time of writing) does so by a factor of about 2-3x in time spent decompressing and about 8% in space.

Installation

The easiest way to install unxip is to grab a precompiled binary for macOS 12.0 and later from the releases page. If you prefer, you can also install unxip from your package manager: it's available on MacPorts and Homebrew. Both will make the latest version of the command available under the package name "unxip".

Building

unxip is fairly simple and implemented as a single file. Thus, you can build it by compiling that file directly, with just an up-to-date version of the Command Line Tools (xcode-select --install):

$ swiftc -parse-as-library -O unxip.swift

This will build an optimized unxip binary for your computer's native architecture. Because unxip uses Swift Concurrency, it is recommended that you build on macOS Monterey or later; macOS Big Sur is technically supported but needs to use backdeployment libraries that are not very easy to distribute with a command line tool.

If you prefer to use Swift Package Manager to build your code, a Package.swift is also available. This has the downside of requiring a full Xcode installation to bootstrap the build, but makes it easy to build a Universal binary:

$ swift build -c release --arch arm64 --arch x86_64

When run from the project root, the resulting executable will be located at .build/apple/Products/Release/unxip.

Finally, you may also use the provided Makefile to build and install unxip:

$ make all
$ make install

The installation prefix is configurable via the PREFIX variable.

unxip is not currently designed to be embedded directly into the address space of another application. While it would "work" (with minor modifications to allow linking) its implementation expects to be the only user of the cooperative thread pool and effectively takes it over, which may adversely affect other code that wishes to run on it. The recommended way to use unxip is spawning it as a subtask.
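As an illustration of the recommendation above, here is a minimal sketch of launching a command-line tool as a child process with Foundation's Process. The function name and the paths in the commented example are hypothetical; only the Process API itself is real.

```swift
import Foundation

/// Runs a command-line tool as a child process and returns its exit status.
func runTool(at toolURL: URL, arguments: [String], in directory: URL) throws -> Int32 {
    let process = Process()
    process.executableURL = toolURL
    process.arguments = arguments
    // unxip writes its output into the current directory, so set it explicitly.
    process.currentDirectoryURL = directory
    try process.run()
    process.waitUntilExit()
    return process.terminationStatus
}

// Example call (both paths are assumptions; adjust them for your system):
// let status = try runTool(
//     at: URL(fileURLWithPath: "/usr/local/bin/unxip"),
//     arguments: ["/path/to/Xcode.xip"],
//     in: URL(fileURLWithPath: "/Applications"))
```

Because the tool runs in its own process, it can take over its own cooperative thread pool without affecting the caller.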

Usage

The intended usage of unxip is with a single command line parameter that represents the path to an XIP from Apple that contains Xcode. For example:

$ unxip Xcode.xip # will produce Xcode.app in the current directory

As the tool is still somewhat rough, its error handling is not very good at the moment. An attempt has been made to at least crash preemptively when things go wrong, but you may still run into strange behavior on edge cases. For best results, ensure that the directory you are running unxip from does not contain any existing Xcode(-beta).app bundles and that you are using a modern version of macOS on a fast APFS filesystem. For simplicity, unxip does not perform any signature verification, so if authentication is important you should use another mechanism (such as a checksum) for validation.

Contributing

When making changes, be sure to use swift-format on the source:

$ swift-format -i *.swift

Design

As a purpose-built tool, unxip outperforms Bom because of several key implementation decisions. Heavy use of Swift Concurrency allows unxip to unlock parallelization opportunities that Bom largely misses, and the use of LZFSE rather than the simpler LZVN gives it higher compression ratios. To understand its design, it's important to first be familiar with the Xcode XIP format and APFS transparent compression.

XIPs, including the ones that Xcode comes in, are XAR archives, which contain a table of contents that lists each file inside and the compression used for each. However, unlike most XARs, Xcode's has only two files: a bzip2-compressed Metadata that is just a few hundred bytes, and a multi-gigabyte file named Content that is stored "uncompressed". While marked as plain data, this file is an apparently proprietary archive format called pbzx. Luckily, the scheme is fairly straightforward and several people on the internet have already tried reverse engineering it. This tool contains an independent implementation that nonetheless shares many of its core details with these efforts. The file is compressed in a format documented by compression_tool(1). The compressed content inside the pbzx is an ASCII-representation cpio archive, which has been split apart into 16 MB chunks that have either been individually compressed with LZMA or included as-is. Unfortunately, pbzx does not contain a table of contents, or any structure aside from these (byte-, rather than file-aligned) chunks, so distinguishing individual files is not possible without decompressing the entire buffer.
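For readers who want to inspect a XIP by hand, the XAR container starts with a small fixed-size header. The sketch below parses it from a byte buffer. The field layout follows the publicly described XAR format rather than anything stated in this README, and the type and function names are invented for illustration; it is not taken from unxip's source.

```swift
/// The fixed-size fields at the start of a XAR archive (all big-endian).
struct XARHeader: Equatable {
    var headerSize: UInt16
    var version: UInt16
    var tocCompressedLength: UInt64
    var tocUncompressedLength: UInt64
    var checksumAlgorithm: UInt32
}

/// Reads a big-endian unsigned integer of `count` bytes starting at `offset`.
func readBigEndian(_ bytes: [UInt8], at offset: Int, count: Int) -> UInt64 {
    bytes[offset..<offset + count].reduce(0) { ($0 << 8) | UInt64($1) }
}

/// Returns nil if the buffer is too short or does not start with the "xar!" magic.
func parseXARHeader(_ bytes: [UInt8]) -> XARHeader? {
    guard bytes.count >= 28, Array(bytes[0..<4]) == Array("xar!".utf8) else { return nil }
    return XARHeader(
        headerSize: UInt16(readBigEndian(bytes, at: 4, count: 2)),
        version: UInt16(readBigEndian(bytes, at: 6, count: 2)),
        tocCompressedLength: readBigEndian(bytes, at: 8, count: 8),
        tocUncompressedLength: readBigEndian(bytes, at: 16, count: 8),
        checksumAlgorithm: UInt32(readBigEndian(bytes, at: 24, count: 4)))
}
```

The compressed table of contents follows the header, so `headerSize` and `tocCompressedLength` together tell a reader where the stored files begin.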

Parsing this cpio archive gives the information needed to reconstruct an Xcode bundle, but unxip (and Bom) go through an additional step to apply transparent APFS compression to files that could benefit from it, which significantly reduces size on-disk. For this operation, unxip chooses to use the LZFSE algorithm, while Bom uses the simpler LZVN. The compressed data is stored in the file's resource fork, a special header describing the compression is constructed in an xattr, and then UF_COMPRESSED is set on the file.

On the whole, this procedure is designed to be fairly linear, with the XIP being read sequentially, producing LZMA chunks that are reassembled in order to create the cpio archive, which can then be streamed to reconstruct an Xcode bundle. Unfortunately, a naive implementation of this process does not perform very well due to the varying performance bottlenecks of each step. To make matters worse, the size of Xcode makes it infeasible to operate on entirely in memory. To get around this problem, unxip parallelizes intermediate steps and then streams results in linear order, benefiting from much better processor utilization and allowing the file to be processed in "sliding window" fashion.

On modern processors, single-threaded LZMA decoding is limited to about 100 MB/s; as the Xcode cpio is almost 40 GB, this is not really fast enough for unxip. Instead, unxip carves out each chunk from the pbzx archive into its own task (the metadata in the file format makes this fairly straightforward) and decompresses each in parallel. To limit memory usage, a cap is applied to how many chunks are resident in memory at once. Since the next step (parsing the cpio) requires logical linearity, completed chunks are temporarily parked until their preceding ones complete, after which they are all yielded together. This preserves order while still providing an opportunity for multiple chunks to be decoded in parallel. In practice, this technique can decode the LZMA stream at effective speeds approaching 1 GB/s when provided with enough CPU cores.
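The park-and-yield scheme described above can be sketched with a task group. This is a simplified illustration and not unxip's actual code: `orderedParallelMap`, `maxInFlight`, and `deliver` are hypothetical names, and the `transform` closure stands in for decoding a single chunk.

```swift
/// Transforms `inputs` concurrently while delivering results in input order.
/// At most `maxInFlight` chunks are running or parked at any one time.
func orderedParallelMap<Input: Sendable, Output: Sendable>(
    _ inputs: [Input],
    maxInFlight: Int,
    transform: @escaping @Sendable (Input) async -> Output,
    deliver: (Output) -> Void
) async {
    precondition(maxInFlight >= 1, "at least one chunk must be allowed in flight")
    await withTaskGroup(of: (Int, Output).self) { group in
        var nextToStart = 0
        var nextToDeliver = 0
        var parked: [Int: Output] = [:]

        while nextToDeliver < inputs.count {
            // Start new work only while the window of undelivered chunks has room.
            while nextToStart < inputs.count, nextToStart - nextToDeliver < maxInFlight {
                let index = nextToStart
                let input = inputs[index]
                group.addTask { (index, await transform(input)) }
                nextToStart += 1
            }
            // Park whichever chunk happens to finish next.
            guard let (index, output) = await group.next() else { break }
            parked[index] = output
            // Deliver every parked chunk that is now contiguous with the output so far.
            while let ready = parked.removeValue(forKey: nextToDeliver) {
                deliver(ready)
                nextToDeliver += 1
            }
        }
    }
}
```

Counting both running and parked chunks against the cap is what bounds memory: a slow early chunk stalls new work rather than letting finished later chunks pile up.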

The linear chunk stream (now decompressed into a cpio) is then parsed in sequence to extract files, directories, and their associated metadata. cpios are naturally ordered (for example, all additional hardlinks must come after the original file), but Xcode's has the additional nice property that it has been sorted so that all directories appear before the files inside of them. This allows for a sequential stream of filesystem operations to correctly produce the bundle, without running into errors with missing intermediate directories or link targets.
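To make the parsing step concrete, here is a small sketch that decodes one entry header. It assumes the archive uses the portable ASCII ("odc") cpio layout: a 76-byte header beginning with the magic "070707", with fixed-width octal fields. The README only says "ASCII-representation", so treat that layout as an assumption; the type and function names are hypothetical.

```swift
/// Selected fields from one header of an ASCII ("odc") cpio archive.
struct CpioHeader: Equatable {
    var mode: Int
    var nameSize: Int
    var fileSize: Int

    /// The file-type bits of `mode` mark directories with 0o040000.
    var isDirectory: Bool { (mode & 0o170000) == 0o040000 }
}

/// Parses the 76-byte header; returns nil on a short buffer, bad magic, or a non-octal field.
func parseCpioHeader(_ bytes: [UInt8]) -> CpioHeader? {
    guard bytes.count >= 76, Array(bytes[0..<6]) == Array("070707".utf8) else { return nil }
    // Every field is a fixed-width run of ASCII octal digits.
    func octal(at offset: Int, width: Int) -> Int? {
        Int(String(decoding: bytes[offset..<offset + width], as: UTF8.self), radix: 8)
    }
    guard let mode = octal(at: 18, width: 6),
          let nameSize = octal(at: 59, width: 6),
          let fileSize = octal(at: 65, width: 11) else { return nil }
    return CpioHeader(mode: mode, nameSize: nameSize, fileSize: fileSize)
}
```

In this layout the entry's name (`nameSize` bytes) and then its data (`fileSize` bytes) follow the header directly, which is why the stream has to be read in order.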

While simplifying the implementation, this order makes it difficult for unxip to efficiently schedule filesystem operations and transparent compression. To resolve this, a dependency graph is created for each file (directories, files, and symlinks depend on their parent directory's existence; hardlinks require their target to exist) and then the tasks are scheduled in parallel with those constraints applied.

New file writes are particularly expensive because compression is applied before the data is written to disk. While this step is already parallelized to some extent because of the graph described earlier, there is a chance for additional parallelism in Apple's filesystem compression implementation because it splits data internally at 64 KB boundaries, and these chunks can be compressed in parallel. LZFSE achieves high compression ratios and has a performant implementation, which we can take advantage of largely for free.

Unlike most of our steps, which are compute-bound, the final step of writing to disk requires interacting with the kernel. If we're careless we can accidentally overload the system with operations and back up our entire pipeline. To prevent unconsumed chunks from sitting around in memory, we manually apply backpressure on our streams by having them yield results only when later steps are ready to consume them.
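The dependency rules described in this section (parent directories before their contents, hardlink targets before the links) can be illustrated with a small sketch. It groups entries into "waves" whose members could be created in parallel. This is a simplification for illustration: unxip schedules tasks against the graph rather than in discrete waves, and `Entry`, `dependencies(of:)`, and `parallelWaves` are hypothetical names.

```swift
/// The kinds of entries the cpio stream can describe.
enum Entry {
    case directory(path: String)
    case file(path: String)
    case symlink(path: String)
    case hardlink(path: String, target: String)

    var path: String {
        switch self {
        case .directory(let path), .file(let path), .symlink(let path): return path
        case .hardlink(let path, _): return path
        }
    }
}

/// Returns the paths that must already exist before `entry` can be created.
func dependencies(of entry: Entry) -> [String] {
    var result: [String] = []
    // Every entry depends on its parent directory.
    if let slash = entry.path.lastIndex(of: "/") {
        result.append(String(entry.path[..<slash]))
    }
    // A hardlink additionally depends on the file it links to.
    if case .hardlink(_, let target) = entry {
        result.append(target)
    }
    return result
}

/// Groups entries (given in archive order) so that each wave depends only on earlier waves.
func parallelWaves(_ entries: [Entry]) -> [[String]] {
    var waveOf: [String: Int] = [:]
    var waves: [[String]] = []
    for entry in entries {
        // Dependencies missing from the archive are ignored.
        let wave = (dependencies(of: entry).compactMap { waveOf[$0] }.max() ?? -1) + 1
        waveOf[entry.path] = wave
        if wave == waves.count { waves.append([]) }
        waves[wave].append(entry.path)
    }
    return waves
}
```

The single pass works only because the archive already lists every dependency before the entries that need it, which is the ordering property described earlier.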

Overall, this architecture allows unxip to utilize CPU cores and dispatch disk writes fairly well. It is likely that there is still some room for improvement in its implementation, especially around the constants chosen for batch sizes and backoff intervals (some of which can probably be done much better by the runtime itself once it is ready). Ideas on how its performance can be further improved are always welcome :)

Finally, I am very thankful to Kevin Elliott and the rest of the DTS team for fielding some of my filesystem-related questions; the answers were very helpful when I was designing unxip.
