Library and tools for parsing and writing MP4 files including video, audio and subtitles. The focus is on fragmented files. Includes mp4ff-info, mp4ff-encrypt, mp4ff-decrypt and other tools.

Package mp4ff implements MP4 media file parsing and writing for AVC and HEVC video, AAC and AC-3 audio, and stpp and wvtt subtitles. It is focused on fragmented files as used for streaming in DASH, MSS and HLS fMP4, but can also decode and encode all boxes needed for progressive MP4 files. In particular, the tool mp4ff-crop can be used to crop a progressive file.

Command Line Tools

Some useful command line tools are available in cmd.

  1. mp4ff-info prints a tree of the box hierarchy of a mp4 file with information about the boxes. The level of detail can be increased with the option -l, like -l all:1 for all boxes or -l trun:1,stss:1 for specific boxes.
  2. mp4ff-pslister extracts and displays SPS and PPS for AVC or HEVC in a mp4 or a bytestream (Annex B) file. Partial information is printed for HEVC.
  3. mp4ff-nallister lists NALUs and picture types for video in a progressive or fragmented file
  4. mp4ff-wvttlister lists details of wvtt (WebVTT in ISOBMFF) samples
  5. mp4ff-crop shortens a progressive mp4 file to a specified duration

You can install these tools by going to their respective directories and running go install . or directly from the repo with

go install github.com/Eyevinn/mp4ff/cmd/mp4ff-info@latest

Example code

Example code is available in the examples directory. The examples and their functions are:

  1. initcreator creates typical init segments (ftyp + moov) for video and audio
  2. resegmenter reads a segmented file (CMAF track) and resegments it with other segment durations using FullSample
  3. segmenter takes a progressive mp4 file and creates init and media segments from it. This tool has been extended to support generation of segments with multiple tracks as well as reading and writing mdat in lazy mode
  4. multitrack parses a fragmented file with multiple tracks
  5. decrypt-cenc decrypts a segmented mp4 file encrypted in cenc mode
  6. combine-segs combines single-track init and media segments into multi-track segments

Library

The library has functions for parsing (called Decode) and writing (Encode) in the package mp4ff/mp4. It also contains codec-specific parsing of AVC/H.264, including complete parsing of SPS and PPS, in the package mp4ff.avc. HEVC/H.265 parsing is less complete and is available as mp4ff.hevc. Supplemental Enhancement Information (SEI) can be parsed and written using the package mp4ff.sei.

Traditional multiplexed non-fragmented mp4 files can be parsed and decoded, but the focus is on fragmented mp4 files as used in DASH, HLS, and CMAF.

Beyond single-track fragmented files, support has been added to parse and generate multi-track fragmented files, as can be seen in examples/segmenter and examples/multitrack.

The top level structure for both non-fragmented and fragmented mp4 files is mp4.File.

In a progressive (non-fragmented) mp4.File, the top level attributes Ftyp, Moov, and Mdat point to the corresponding boxes.

A fragmented mp4.File can be more or less complete, like a single init segment, one or more media segments, or a combination of both like a CMAF track which renders into a playable one-track asset. It can also have multiple tracks. For fragmented files, the following high-level attributes are used:

  • Init contains a ftyp and a moov box and provides the general metadata for a fragmented file. It corresponds to a CMAF header. It can also contain one or more sidx boxes.
  • Segments is a slice of MediaSegment, each of which starts with an optional styp box, possibly one or more sidx boxes, and then one or more Fragments.
  • Fragment is a mp4 fragment with exactly one moof box followed by a mdat box where the latter contains the media data. It can have one or more trun boxes containing the metadata for the samples.

All child boxes of a container box such as MoovBox are listed in the Children attribute, but the most prominent child boxes also have direct named links, which makes it possible to write a path such as

fragment.Moof.Traf.Trun

to access the (only) trun box in a fragment with only one traf box, or

fragment.Moof.Trafs[1].Truns[1]

to get the second trun of the second traf box (provided that they exist). Care must be taken to assert that none of the intermediate pointers are nil to avoid panic.
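The nil checks mentioned above can be collected in a small helper. The following is a minimal sketch, assuming stand-in types that mirror only the field names used in the paths above; they are not the real mp4ff types, and firstTrun is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in types that mirror only the field names mentioned in the text.
// They are NOT the real mp4ff types.
type TrunBox struct{ SampleCount uint32 }

type TrafBox struct{ Trun *TrunBox }

type MoofBox struct{ Traf *TrafBox }

type Fragment struct{ Moof *MoofBox }

// firstTrun (hypothetical helper) returns fragment.Moof.Traf.Trun, or an
// error naming the first missing link instead of panicking on a nil pointer.
func firstTrun(f *Fragment) (*TrunBox, error) {
	switch {
	case f == nil:
		return nil, errors.New("fragment is nil")
	case f.Moof == nil:
		return nil, errors.New("fragment has no moof box")
	case f.Moof.Traf == nil:
		return nil, errors.New("moof has no traf box")
	case f.Moof.Traf.Trun == nil:
		return nil, errors.New("traf has no trun box")
	}
	return f.Moof.Traf.Trun, nil
}

func main() {
	complete := &Fragment{Moof: &MoofBox{Traf: &TrafBox{Trun: &TrunBox{SampleCount: 50}}}}
	if trun, err := firstTrun(complete); err == nil {
		fmt.Println("samples in trun:", trun.SampleCount)
	}
	// A fragment whose moof box has no traf yields an error, not a panic.
	if _, err := firstTrun(&Fragment{Moof: &MoofBox{}}); err != nil {
		fmt.Println("error:", err)
	}
}
```

Returning an error for each missing link keeps the calling code free of long chains of nil comparisons.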

Creating new fragmented files

A typical use case is to generate a fragmented file consisting of an init segment followed by a series of media segments.

The first step is to create the init segment. This is done in three steps as can be seen in examples/initcreator:

init := mp4.CreateEmptyInit()
init.AddEmptyTrack(timescale, mediatype, language)
init.Moov.Trak.SetHEVCDescriptor("hvc1", vpsNALUs, spsNALUs, ppsNALUs)

Here the third step fills in codec-specific parameters into the sample descriptor of the single track. Multiple tracks are also available via the slice attribute Traks instead of Trak.

The second step is to start producing media segments. They should use the timescale that was set when creating the init segment. Generally, that timescale should be chosen so that the sample durations have exact values without rounding errors.
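To illustrate the point about exact sample durations, the sketch below checks whether a given timescale represents one frame of 29.97 fps video (30000/1001 frames per second) without rounding. It uses plain arithmetic only; sampleDur is a hypothetical helper and not part of the library.

```go
package main

import "fmt"

// sampleDur (hypothetical helper) returns the duration of one sample in
// timescale ticks for a frame rate given as the fraction fpsNum/fpsDen,
// and reports whether that value is exact, i.e. needs no rounding.
func sampleDur(timescale, fpsNum, fpsDen uint64) (dur uint64, exact bool) {
	ticks := timescale * fpsDen
	return ticks / fpsNum, ticks%fpsNum == 0
}

func main() {
	// 29.97 fps video is 30000/1001 frames per second.
	for _, ts := range []uint64{1000, 90000, 30000} {
		dur, exact := sampleDur(ts, 30000, 1001)
		fmt.Printf("timescale %5d: dur=%d exact=%v\n", ts, dur, exact)
	}
}
```

With a timescale of 1000 the duration has to be rounded, so the error accumulates over time, while 30000 and 90000 both give integer durations (1001 and 3003 ticks).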

A media segment contains one or more fragments, where each fragment has a moof and a mdat box. If all samples are available before the segment is created, one can use a single fragment in each segment. Example code for this can be found in examples/segmenter.

A simple, but not optimal, way of creating a media segment is to first create a slice of FullSample with the data needed. The definition of mp4.FullSample is

type Sample struct {
 Flags uint32 // Flag sync sample etc
 Dur   uint32 // Sample duration in mdhd timescale
 Size  uint32 // Size of sample data
 Cto   int32  // Signed composition time offset
}

type FullSample struct {
 Sample            // Embedded; this part is written into the trun box
 DecodeTime uint64 // Absolute decode time (offset + accumulated sample Dur)
 Data       []byte // Sample data
}

The mp4.Sample part is what will be written into the trun box. DecodeTime is the accumulated time on the media timeline. The DecodeTime value of the first sample of a fragment will be set as the BaseMediaDecodeTime in the tfdt box.
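The relation between Dur and DecodeTime can be sketched as follows. The Sample and FullSample types here are local stand-ins that copy the field names from the definition above, and buildFullSamples is a hypothetical helper; a constant sample duration and zero Flags and Cto are assumed for brevity.

```go
package main

import "fmt"

// Local stand-ins mirroring the field names described in the text.
type Sample struct {
	Flags uint32
	Dur   uint32
	Size  uint32
	Cto   int32
}

type FullSample struct {
	Sample
	DecodeTime uint64
	Data       []byte
}

// buildFullSamples (hypothetical helper) assigns decode times by accumulating
// sample durations, starting at startTime (in the track timescale).
func buildFullSamples(startTime uint64, dur uint32, payloads [][]byte) []FullSample {
	samples := make([]FullSample, 0, len(payloads))
	t := startTime
	for _, p := range payloads {
		samples = append(samples, FullSample{
			Sample:     Sample{Dur: dur, Size: uint32(len(p))},
			DecodeTime: t,
			Data:       p,
		})
		t += uint64(dur)
	}
	return samples
}

func main() {
	samples := buildFullSamples(90000, 3003, [][]byte{{1, 2, 3}, {4, 5}, {6}})
	for i, s := range samples {
		fmt.Printf("sample %d: decodeTime=%d size=%d\n", i+1, s.DecodeTime, s.Size)
	}
	// The DecodeTime of the first sample is the value that would end up
	// as BaseMediaDecodeTime in the tfdt box of the fragment.
	fmt.Println("baseMediaDecodeTime:", samples[0].DecodeTime)
}
```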

Once a number of such full samples are available, they can be added to a media segment like

seg := mp4.NewMediaSegment()
frag := mp4.CreateFragment(uint32(segNr), mp4.DefaultTrakID)
seg.AddFragment(frag)
for _, sample := range samples {
 frag.AddFullSample(sample)
}

This segment can finally be output to a w io.Writer as

err := seg.Encode(w)

For multi-track segments, the code is a bit more involved. Please have a look at examples/segmenter to see how it is done. A more efficient way of handling media samples is to handle them lazily, as explained next.

Lazy decoding and writing of mdat data

For video and audio, the dominating part of a mp4 file is the media data which is stored in one or more mdat boxes. In some cases, for example when segmenting large progressive files, it is much more memory efficient to just read the movie or fragment data from the moov or moof box and defer the reading of the media data from the mdat box to later.

For decoding, this is supported by running mp4.DecodeFile() in lazy mode as

parsedMp4, err = mp4.DecodeFile(ifd, mp4.WithDecodeMode(mp4.DecModeLazyMdat))

In this case, the media data of the mdat box will not be read; only its size is recorded. To read or copy the actual data corresponding to a sample, one must calculate the corresponding byte range and either call

func (m *MdatBox) ReadData(start, size int64, rs io.ReadSeeker) ([]byte, error)

or

func (m *MdatBox) CopyData(start, size int64, rs io.ReadSeeker, w io.Writer) (nrWritten int64, err error)

Example code for this, including lazy writing of mdat, can be found in examples/segmenter with the lazy mode set.
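The byte-range calculation mentioned above can be sketched with the standard library only. The sketch assumes that the samples are stored back to back starting at a known offset, which is typical for a single-track fragment but not guaranteed for progressive files, where chunk offsets must be used. byteRange, sampleRanges and readRange are hypothetical names; whether the start argument of ReadData and CopyData is relative to the file or to the mdat payload should be checked in the library documentation.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// byteRange is a (start, size) pair for one sample.
type byteRange struct{ Start, Size int64 }

// sampleRanges returns the byte range of each sample, ASSUMING the samples
// are stored contiguously starting at offset firstOffset.
func sampleRanges(firstOffset int64, sizes []uint32) []byteRange {
	ranges := make([]byteRange, 0, len(sizes))
	offset := firstOffset
	for _, sz := range sizes {
		ranges = append(ranges, byteRange{Start: offset, Size: int64(sz)})
		offset += int64(sz)
	}
	return ranges
}

// readRange seeks to the start of a range and reads exactly Size bytes.
func readRange(rs io.ReadSeeker, r byteRange) ([]byte, error) {
	if _, err := rs.Seek(r.Start, io.SeekStart); err != nil {
		return nil, err
	}
	buf := make([]byte, r.Size)
	if _, err := io.ReadFull(rs, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	// A fake "file": 6 bytes of non-media data followed by three samples.
	file := bytes.NewReader([]byte("HEADERaaabbcccc"))
	for i, r := range sampleRanges(6, []uint32{3, 2, 4}) {
		data, err := readRange(file, r)
		if err != nil {
			fmt.Println("read error:", err)
			return
		}
		fmt.Printf("sample %d: start=%d size=%d data=%q\n", i+1, r.Start, r.Size, data)
	}
}
```

Only the sample sizes are needed to derive the ranges, so the media data itself is touched once, when it is actually read or copied.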

More efficient I/O using SliceReader and SliceWriter

The use of the interfaces io.Reader and io.Writer for reading and writing boxes gives a lot of flexibility, but is not optimal when it comes to memory allocation. In particular, the Read(p []byte) method needs a slice p of the proper size to read data, which leads to a lot of allocations and copying of data. In order to achieve better performance, it is advantageous to read the full top level boxes into one, or a few, slices and decode these.

To enable that mode, version 0.27 of the code introduced DecodeXSR(sr bits.SliceReader) methods for every box X, where mp4ff.bits.SliceReader is an interface. For example, the TrunBox gets the method DecodeTrunSR(sr bits.SliceReader) in addition to its old DecodeTrun(r io.Reader) method. The bits.SliceReader interface provides methods to read all kinds of data structures from an underlying slice of bytes. It has an implementation bits.FixedSliceReader which uses a fixed-size slice as the underlying slice, but one could consider implementing a growing version which would get its data from some external source.
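The idea behind reading from a slice can be illustrated with a small stand-alone reader. This is a minimal sketch and not the bits.SliceReader interface or the bits.FixedSliceReader implementation; the type sliceReader and its method names are hypothetical. It reads values directly from a fixed slice and remembers the first error, so that callers can check for errors once after a series of reads.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errShortSlice = errors.New("read beyond end of slice")

// sliceReader is a hypothetical, minimal reader over a fixed byte slice.
type sliceReader struct {
	data []byte
	pos  int
	err  error // first error encountered; later reads become no-ops
}

// take returns the next n bytes as a sub-slice (no copy), or nil on error.
func (s *sliceReader) take(n int) []byte {
	if s.err != nil {
		return nil
	}
	if n < 0 || s.pos+n > len(s.data) {
		s.err = errShortSlice
		return nil
	}
	b := s.data[s.pos : s.pos+n]
	s.pos += n
	return b
}

// ReadUint32 reads a big-endian uint32 without allocating a buffer.
func (s *sliceReader) ReadUint32() uint32 {
	b := s.take(4)
	if b == nil {
		return 0
	}
	return binary.BigEndian.Uint32(b)
}

// ReadFixedString reads n bytes as a string, e.g. a four-character box type.
func (s *sliceReader) ReadFixedString(n int) string {
	return string(s.take(n))
}

func main() {
	// A box header: 32-bit size followed by the four-character type.
	raw := []byte{0, 0, 0, 16, 'f', 't', 'y', 'p'}
	sr := &sliceReader{data: raw}
	size := sr.ReadUint32()
	boxType := sr.ReadFixedString(4)
	fmt.Println(size, boxType, sr.err)

	sr.ReadUint32() // nothing left: the error is recorded instead of returned
	fmt.Println(sr.err)
}
```

Compared to io.Reader, no intermediate buffer p has to be allocated per read, since integer values are decoded straight from the underlying slice.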

The memory allocation and speed improvements achieved by this may vary, but should be substantial, especially compared to versions before 0.27 which used an extra io.LimitReader layer.

For further reduction of memory allocation when reading the mdat data of a progressive file, some sort of buffered reader should be used.

Benchmarks

To investigate the efficiency of the new SliceReader and SliceWriter methods, benchmarks have been run. The benchmarks are defined in the files mp4/benchmarks_test.go and mp4/benchmarks_srw_test.go. For DecodeFile, one can see a big improvement going from version 0.26 to version 0.27, which both use the io.Reader interface, and another big improvement when using the SliceReader source. The latter benchmarks are called BenchmarkDecodeFileSR but have here been given the same name for easy comparison. Note that the allocations here refer to the heap allocations that are done inside the benchmark loop. Outside that loop, a slice is allocated to keep the input data.

For EncodeFile, one can see that v0.27 is actually worse than v0.26 when used with the io.Writer interface. That is because the code was restructured so that all writes go via the SliceWriter layer in order to reduce code duplication. However, when the SliceWriter methods are used directly, there is a big relative gain in allocations, as can be seen in the last column.

name \ time/op               v0.26     v0.27     v0.27-srw
DecodeFile/1.m4s-16          21.9µs    6.7µs     2.6µs
DecodeFile/prog_8s.mp4-16    143µs     48µs      16µs
EncodeFile/1.m4s-16          1.70µs    2.14µs    1.50µs
EncodeFile/prog_8s.mp4-16    15.7µs    18.4µs    12.9µs

name \ alloc/op              v0.26     v0.27     v0.27-srw
DecodeFile/1.m4s-16          120kB     28kB      2kB
DecodeFile/prog_8s.mp4-16    906kB     207kB     12kB
EncodeFile/1.m4s-16          1.16kB    1.39kB    0.08kB
EncodeFile/prog_8s.mp4-16    6.84kB    8.30kB    0.05kB

name \ allocs/op             v0.26     v0.27     v0.27-srw
DecodeFile/1.m4s-16          98.0      42.0      34.0
DecodeFile/prog_8s.mp4-16    454       180       169
EncodeFile/1.m4s-16          15.0      15.0      3.0
EncodeFile/prog_8s.mp4-16    101       86        1

Box structure and interface

Most boxes have their own file named after the box, but in some cases, there may be multiple boxes that have the same content, and the code file then has a generic name like mp4/visualsampleentry.go.

The Box interface is specified in mp4/box.go. It does not contain decode (parsing) methods, which have distinct names for each box type and are dispatched based on the box type read from the box header.

The mapping for decoding dispatch is given in the table mp4.decoders for the io.Reader methods and in mp4.decodersSR for the mp4ff.bits.SliceReader methods.

How to implement a new box

To implement a new box fooo, the following is needed.

Create a file fooo.go and create a struct type FoooBox.

FoooBox must implement the Box interface methods:

Type()
Size()
Encode(w io.Writer)
EncodeSW(sw bits.SliceWriter)  // new in v0.27.0
Info()

It also needs its own decode method DecodeFooo, which must be added to the decoders map in box.go, and, new in v0.27.0, a DecodeFoooSR method, which must be added to decodersSR. For a simple example, look at the PrftBox in prft.go.

A test file fooo_test.go should also have a test using the method boxDiffAfterEncodeAndDecode to check that the box information is equal after encoding and decoding.
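The overall shape of such a box, and of an encode/decode round-trip check, can be sketched in a self-contained way. The sketch is simplified: it omits EncodeSW, Info and the SliceReader decoder, the decoder reads the box header itself rather than receiving an already parsed header as the library decoders do, and the payload (a single 32-bit Value field) is an assumption made for illustration. roundTrip is a hypothetical stand-in for the idea behind boxDiffAfterEncodeAndDecode.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

const boxHeaderSize = 8 // 32-bit size + four-character type

// FoooBox is a hypothetical box with a single 32-bit payload field.
type FoooBox struct {
	Value uint32
}

// Type returns the four-character box type.
func (b *FoooBox) Type() string { return "fooo" }

// Size returns the total box size in bytes, header included.
func (b *FoooBox) Size() uint64 { return boxHeaderSize + 4 }

// Encode writes header and payload in big-endian byte order.
func (b *FoooBox) Encode(w io.Writer) error {
	buf := make([]byte, b.Size())
	binary.BigEndian.PutUint32(buf[0:4], uint32(b.Size()))
	copy(buf[4:8], b.Type())
	binary.BigEndian.PutUint32(buf[8:12], b.Value)
	_, err := w.Write(buf)
	return err
}

// decodeFooo is a simplified decoder that reads and checks the header itself.
func decodeFooo(r io.Reader) (*FoooBox, error) {
	buf := make([]byte, boxHeaderSize+4)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, err
	}
	if size := binary.BigEndian.Uint32(buf[0:4]); size != uint32(len(buf)) {
		return nil, fmt.Errorf("unexpected box size %d", size)
	}
	if typ := string(buf[4:8]); typ != "fooo" {
		return nil, fmt.Errorf("unexpected box type %q", typ)
	}
	return &FoooBox{Value: binary.BigEndian.Uint32(buf[8:12])}, nil
}

// roundTrip encodes a box and decodes the result again.
func roundTrip(b *FoooBox) (*FoooBox, error) {
	var buf bytes.Buffer
	if err := b.Encode(&buf); err != nil {
		return nil, err
	}
	return decodeFooo(&buf)
}

func main() {
	in := &FoooBox{Value: 42}
	out, err := roundTrip(in)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println("equal after round trip:", *in == *out)
}
```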

Direct changes of attributes

Many attributes are public and can therefore be changed freely. The advantage of this is that it is possible to write code that can manipulate boxes in many different ways, but one must be cautious to avoid breaking links to sub boxes or creating inconsistent states in the boxes.

As an example, container boxes such as TrafBox have a method AddChild which adds a box to Children, its slice of children boxes, but also sets a specific member reference such as Tfdt to point to that box. If Children is manipulated directly, that link may not be valid.
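The following sketch shows how such a link can get out of sync. The types are simplified stand-ins that reuse the names TrafBox, TfdtBox, AddChild and Children from the text, not the library types, and consistent is a hypothetical check added for the illustration.

```go
package main

import "fmt"

// Box is a stand-in for the library's box interface.
type Box interface{ Type() string }

type TfdtBox struct{ BaseMediaDecodeTime uint64 }

func (b *TfdtBox) Type() string { return "tfdt" }

// TrafBox is a stand-in container with one direct link and a child list.
type TrafBox struct {
	Tfdt     *TfdtBox
	Children []Box
}

// AddChild appends the box to Children and keeps the direct link in sync.
func (t *TrafBox) AddChild(b Box) {
	if tfdt, ok := b.(*TfdtBox); ok {
		t.Tfdt = tfdt
	}
	t.Children = append(t.Children, b)
}

// consistent (hypothetical check) reports whether the Tfdt link points at
// the last tfdt box in Children, or both are absent.
func (t *TrafBox) consistent() bool {
	var found *TfdtBox
	for _, c := range t.Children {
		if tfdt, ok := c.(*TfdtBox); ok {
			found = tfdt
		}
	}
	return found == t.Tfdt
}

func main() {
	viaMethod := &TrafBox{}
	viaMethod.AddChild(&TfdtBox{BaseMediaDecodeTime: 90000})
	fmt.Println("via AddChild, consistent:", viaMethod.consistent())

	// Manipulating Children directly leaves the Tfdt link unset.
	direct := &TrafBox{}
	direct.Children = append(direct.Children, &TfdtBox{BaseMediaDecodeTime: 90000})
	fmt.Println("direct append, consistent:", direct.consistent())
}
```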

Encoding modes and optimizations

For fragmented files, one can choose to either encode all boxes in a mp4.File, or only encode the ones which are included in the init and media segments. The attribute that controls that is called FragEncMode. Another attribute EncOptimize controls possible optimizations of the file encoding process. Currently, there is only one possible optimization called OptimizeTrun. It can reduce the size of the TrunBox by finding and writing default values in the TfhdBox and omitting the corresponding values from the TrunBox. Note that this may change the size of all ancestor boxes of trun.
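The principle behind this kind of optimization can be sketched for a single per-sample field such as the duration. This illustrates the idea only and is not the library's OptimizeTrun implementation; commonValue and netBytesSaved are hypothetical helpers. It assumes that the field is a 32-bit value per sample in trun and that a default costs one 32-bit value in tfhd.

```go
package main

import "fmt"

// commonValue returns the shared value if all entries are equal.
func commonValue(vals []uint32) (uint32, bool) {
	if len(vals) == 0 {
		return 0, false
	}
	for _, v := range vals[1:] {
		if v != vals[0] {
			return 0, false
		}
	}
	return vals[0], true
}

// netBytesSaved is the size reduction from replacing one 32-bit field per
// sample in trun with a single 32-bit default value in tfhd.
func netBytesSaved(nrSamples int) int {
	return 4*nrSamples - 4
}

func main() {
	durs := []uint32{1001, 1001, 1001, 1001}
	if def, ok := commonValue(durs); ok {
		fmt.Printf("default duration %d, saves %d bytes\n", def, netBytesSaved(len(durs)))
	}
	// Varying values cannot be replaced by a single default.
	if _, ok := commonValue([]uint32{1001, 1002}); !ok {
		fmt.Println("durations vary: keep them in trun")
	}
}
```

Since the box sizes change, the size fields of the enclosing traf and moof boxes change as well, which is the effect on ancestor boxes noted above.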

Sample Number Offset

Following the ISOBMFF standard, sample numbers and other numbers start at 1 (one-based). This applies to arguments of functions and methods. The actual storage in slices is zero-based, so sample nr 1 has index 0 in the corresponding slice.
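A small sketch of the conversion between one-based sample numbers and zero-based slice indices, including the range check that the off-by-one makes easy to get wrong. sizeOfSample is a hypothetical helper operating on a plain slice of sample sizes.

```go
package main

import "fmt"

// sizeOfSample returns the size of the sample with one-based number nr.
// Sample number 1 is stored at index 0 of the slice.
func sizeOfSample(sizes []uint32, nr uint32) (uint32, error) {
	if nr < 1 || int(nr) > len(sizes) {
		return 0, fmt.Errorf("sample number %d outside range 1-%d", nr, len(sizes))
	}
	return sizes[nr-1], nil
}

func main() {
	sizes := []uint32{100, 200, 300}
	if sz, err := sizeOfSample(sizes, 1); err == nil {
		fmt.Println("size of sample 1:", sz)
	}
	// Sample number 0 is never valid with one-based numbering.
	if _, err := sizeOfSample(sizes, 0); err != nil {
		fmt.Println("error:", err)
	}
}
```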

Stability

The APIs should be fairly stable, but minor non-backwards-compatible changes may happen until version 1.

Specifications

The main specification for the MP4 file format is the ISO Base Media File Format (ISOBMFF) standard ISO/IEC 14496-12 6th edition 2020. Some boxes are specified in other standards, as should be commented in the code.

LICENSE

MIT, see LICENSE.

Some code in pkg/mp4 comes from or is based on https://github.com/jfbus/mp4 which has Copyright (c) 2015 Jean-François Bustarret.

Some code in pkg/bits comes from or is based on https://github.com/tcnksm/go-casper/tree/master/internal/bits Copyright (c) 2017 Taichi Nakashima.

ChangeLog and Versions

See CHANGELOG.md.

Support

Join our community on Slack where you can post any questions regarding any of our open source projects. Eyevinn's consulting business can also offer you:

  • Further development of this component
  • Customization and integration of this component into your platform
  • Support and maintenance agreement

Contact [email protected] if you are interested.

About Eyevinn Technology

Eyevinn Technology is an independent consultant firm specialized in video and streaming. Independent in a way that we are not commercially tied to any platform or technology vendor. As our way to innovate and push the industry forward we develop proof-of-concepts and tools. The things we learn and the code we write we share with the industry in blogs and by open sourcing the code we have written.

Want to know more about Eyevinn and what it is like to work here? Contact us at [email protected]!
