UnityDataTools
UnityDataTools is a set of command-line tools showcasing what can be done with the UnityFileSystemApi native dynamic library. The main purpose of these tools is to analyze the content of Unity data files. If your goal is to understand how to use the UnityDataTool command-line tool, you can jump directly to its documentation.
The UnityFileSystemApi library is distributed in the Tools folder of the Unity editor (starting in version 2022.1.0a14). For simplicity, it is also included in this repository. The library is backward compatible, which means that it can read data files generated by any previous version of Unity.
Note that the UnityFileSystemApi library included in this repository is a custom version containing an additional function that is required to support the SerializeReference attribute. This version of the library will be included in future releases of Unity.
What is the purpose of the UnityFileSystemApi native library?
The purpose of the UnityFileSystemApi is to expose the functionality of the WebExtract and binary2text tools, but in a more flexible way. To fully understand what this means, let's first discuss how Unity generates the data files in a build. The data referenced by the scenes in a build is called the Player Data and is contained in SerializedFiles. SerializedFile is the file format used by Unity to store its data. In builds, SerializedFiles contain the serialized assets in the format specific to the target platform.
When using AssetBundles or Addressables, things are slightly different. First, note that Addressables are AssetBundles on disk, so we will only use the term AssetBundle in the remainder of this document. AssetBundles are archive files (similar to zip files) that can be mounted at runtime. They contain SerializedFiles but, contrary to those of the Player Data, they include what is called a TypeTree (see footnote 1).
Note: it is possible to generate TypeTrees for the Player Data starting in Unity 2021.2. To do so, the ForceAlwaysWriteTypeTrees Diagnostic Switch must be enabled in the Editor Preferences (Diagnostic/Editor section).
The TypeTree is a data structure exposing how objects have been serialized, i.e. the name, type and size of their properties. It is used by Unity when loading an AssetBundle that was built by a previous Unity version (so you don't necessarily have to update all AssetBundles after upgrading a project to a newer version of Unity).
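To make the idea concrete, here is a minimal sketch of how a TypeTree-like description lets a reader decode a binary blob without knowing the class layout in advance. It is a simplified model written for illustration only: the `TypeTreeNode` class, the `read_node` function, the leaf type table and the sample object are all assumptions, and they do not reflect Unity's actual on-disk format or the UnityFileSystemApi interface.

```python
import struct
from dataclasses import dataclass, field
from typing import List


@dataclass
class TypeTreeNode:
    """Hypothetical, simplified node describing one serialized property."""
    name: str
    type: str
    size: int  # size in bytes; -1 when the size depends on the children
    children: List["TypeTreeNode"] = field(default_factory=list)


# struct format codes for a few fixed-size leaf types (little-endian).
# This table is an assumption made for the sketch.
LEAF_FORMATS = {"int": "<i", "float": "<f", "UInt8": "<B"}


def read_node(node, data, offset=0):
    """Decode `node` from `data` at `offset`; return (value, new_offset)."""
    if node.children:
        # Composite property: decode each child in declaration order.
        result = {}
        for child in node.children:
            result[child.name], offset = read_node(child, data, offset)
        return result, offset
    # Leaf property: the node's type and size tell us how to read it.
    (value,) = struct.unpack_from(LEAF_FORMATS[node.type], data, offset)
    return value, offset + node.size


# Layout description for a made-up object containing a nested vector.
tree = TypeTreeNode("Base", "Example", -1, [
    TypeTreeNode("m_Count", "int", 4),
    TypeTreeNode("m_Position", "Vector2f", -1, [
        TypeTreeNode("x", "float", 4),
        TypeTreeNode("y", "float", 4),
    ]),
])

# Serialized bytes as they might have been written by another version.
blob = struct.pack("<iff", 7, 1.5, -2.0)

decoded, consumed = read_node(tree, blob)
print(decoded)
```

Because the layout travels with the data, a reader that only understands the node description can still walk the bytes, which is the property that makes loading data written by an older version possible.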
The content of a SerializedFile including a TypeTree can be converted to a human-readable format using the binary2text tool that can be found in the Tools folder of Unity. In the case of AssetBundles, the SerializedFiles must first be extracted using the WebExtract tool that is also in the Tools folder. For the Player Data, there is no TypeTree because it is included in a build and
The text files generated by binary2text can be very useful to diagnose issues with a build, but they are usually very large and difficult to navigate. Because of this, a tool called the AssetBundle Analyzer was created to make it easier to extract useful information from these files in the form of a SQLite database. The AssetBundle Analyzer has been quite successful, but it has several issues. It is extremely slow, as it runs WebExtract and binary2text on all the AssetBundles of a project and has to parse very large text files. It can also easily fail because the syntax used by binary2text is not standard and can even be impossible to parse on some occasions.
The UnityFileSystemApi library has been created to expose WebExtract and binary2text functionalities. This enables the creation of tools that can read Unity data files with TypeTrees. With it, it becomes very easy to create a binary2text-like tool that can output the data in any format or a new faster and simpler AssetBundle Analyzer.
Repository content
The repository contains the following items:
- UnityFileSystem: source code of a .NET class library exposing the functionalities of the UnityFileSystemApi native library.
- UnityFileSystem.Tests: test suite for the UnityFileSystem library.
- UnityFileSystemTestData: the Unity project used to generate the test data.
- TestCommon: a helper library used by the test projects.
- UnityDataTool: a command-line tool providing several features that can be used to analyze the content of Unity data files.
- Analyzer: a class library that can be used to extract key information from Unity data files and output it into a SQLite database (similar to the AssetBundle Analyzer).
- TextDumper: a class library that can be used to dump SerializedFiles into a human-readable format (similar to binary2text).
- ReferenceFinder: a class library that can be used to find reference chains from objects to other objects using a database created by the Analyzer.
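
As an illustration of what finding a reference chain in a SQLite database can look like, the sketch below runs a breadth-first search over a table of object-to-object references. The schema (`objects` and `refs` tables and their columns), the sample data and the `find_reference_chain` function are hypothetical and chosen only for this example; they are not the schema produced by the Analyzer nor the algorithm used by ReferenceFinder.

```python
import sqlite3
from collections import deque


def find_reference_chain(conn, source_id, target_id):
    """Breadth-first search from source to target.

    Returns the list of object ids along a shortest chain, or None
    when the target is not reachable from the source.
    """
    parents = {source_id: None}  # visited set that also records the path
    queue = deque([source_id])
    while queue:
        current = queue.popleft()
        if current == target_id:
            # Walk the parent links back to the source.
            chain = []
            while current is not None:
                chain.append(current)
                current = parents[current]
            return chain[::-1]
        rows = conn.execute(
            "SELECT to_object FROM refs WHERE from_object = ?", (current,)
        )
        for (neighbor,) in rows:
            if neighbor not in parents:
                parents[neighbor] = current
                queue.append(neighbor)
    return None


# Hypothetical schema and data, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE refs (from_object INTEGER, to_object INTEGER)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?)",
    [(1, "Scene"), (2, "Prefab"), (3, "Material"), (4, "Texture")],
)
conn.executemany(
    "INSERT INTO refs VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 4), (1, 3)],
)

chain = find_reference_chain(conn, 1, 4)
names = [
    conn.execute("SELECT name FROM objects WHERE id = ?", (i,)).fetchone()[0]
    for i in chain
]
print(" -> ".join(names))
```

A breadth-first search is used here because it returns a shortest chain first, which tends to be the most useful answer when asking why an object ended up in a build.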
How to build
The projects in this solution require the .NET 6.0 SDK. You can use your favorite IDE to build them. They were tested in Visual Studio on Windows and Rider on Mac.
It is also possible to build the projects from the CLI using this command:
dotnet build -c Release
Disclaimer
This project is provided on an "as-is" basis and is not officially supported by Unity. It is an experimental tool provided as an example of what can be done using the UnityFileSystemApi. You can report bugs and submit pull requests, but there is no guarantee that they will be addressed.
Footnotes:
1. AssetBundles include the TypeTree by default, but this can be disabled by using the DisableWriteTypeTree option.