• This repository has been archived on 05/Oct/2019
  • Stars
    star
    1,858
  • Rank 24,963 (Top 0.5 %)
  • Language
    Python
  • License
    Other
  • Created over 10 years ago
  • Updated over 5 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A forensic evidence collection & analysis toolkit for OS X

osxcollector

Build Status PyPI

OSXCollector Manual

OSXCollector is a forensic evidence collection & analysis toolkit for OSX.

Forensic Collection

The collection script runs on a potentially infected machine and outputs a JSON file that describes the target machine. OSXCollector gathers information from plists, SQLite databases and the local file system.

Forensic Analysis

Armed with the forensic collection, an analyst can answer the question like:

  • Is this machine infected?
  • How'd that malware get there?
  • How can I prevent and detect further infection?

Yelp automates the analysis of most OSXCollector runs converting its output into an easily readable and actionable summary of just the suspicious stuff. Check out OSXCollector Output Filters project to learn how to make the most of the automated OSXCollector output analysis.

Performing Collection

osxcollector.py is a single Python file that runs without any dependencies on a standard OSX machine. This makes it really easy to run collection on any machine - no fussing with brew, pip, config files, or environment variables. Just copy the single file onto the machine and run it:

sudo osxcollector.py is all it takes.

$ sudo osxcollector.py
Wrote 35394 lines.
Output in osxcollect-2014_12_21-08_49_39.tar.gz

If you have just cloned the GitHub repository, osxcollector.py is inside osxcollector/ directory, so you need to run it as:

$ sudo osxcollector/osxcollector.py

IMPORTANT: please make sure that python command on your Mac OS X machine uses the default Python interpreter shipped with the system and is not overridden, e.g. by the Python version installed through brew. OSXCollector relies on a couple of native Python bindings for OS X libraries, which might be not available in other Python versions than the one originally installed on your system. Alternatively, you can run osxcollector.py explicitly specifying the Python version you would like to use:

$ sudo /usr/bin/python2.7 osxcollector/osxcollector.py

The JSON output of the collector, along with some helpful files like system logs, has been bundled into a .tar.gz for hand-off to an analyst.

osxcollector.py also has a lot of useful options to change how collection works:

  • -i INCIDENT_PREFIX/--id=INCIDENT_PREFIX: Sets an identifier which is used as the prefix of the output file. The default value is osxcollect.

    $ sudo osxcollector.py -i IncontinentSealord
    Wrote 35394 lines.
    Output in IncontinentSealord-2014_12_21-08_49_39.tar.gz

    Get creative with incident names, it makes it easier to laugh through the pain.

  • -p ROOTPATH/--path=ROOTPATH: Sets the path to the root of the filesystem to run collection on. The default value is /. This is great for running collection on the image of a disk.

    $ sudo osxcollector.py -p '/mnt/powned'
  • -s SECTION/--section=SECTION: Runs only a portion of the full collection. Can be specified more than once. The full list of sections and subsections is:

    • version
    • system_info
    • kext
    • startup
      • launch_agents
      • scripting_additions
      • startup_items
      • login_items
    • applications
      • applications
      • install_history
    • quarantines
    • downloads
      • downloads
      • email_downloads
      • old_email_downloads
    • chrome
      • history
      • archived_history
      • cookies
      • login_data
      • top_sites
      • web_data
      • databases
      • local_storage
      • preferences
    • firefox
      • cookies
      • downloads
      • formhistory
      • history
      • signons
      • permissions
      • addons
      • extension
      • content_prefs
      • health_report
      • webapps_store
      • json_files
    • safari
      • downloads
      • history
      • extensions
      • databases
      • localstorage
      • extension_files
    • accounts
      • system_admins
      • system_users
      • social_accounts
      • recent_items
    • mail
    • full_hash
    $ sudo osxcollector.py -s 'startup' -s 'downloads'
  • -c/--collect-cookies: Collect cookies' value. By default OSXCollector does not dump the value of a cookie, as it may contain sensitive information (e.g. session id).

  • -l/--collect-local-storage: Collect the values stored in web browsers' local storage. By default OSXCollector does not dump the values as they may contain sensitive information.

  • -d/--debug: Enables verbose output and python breakpoints. If something is wrong with OSXCollector, try this.

    $ sudo osxcollector.py -d

Details of Collection

The collector outputs a .tar.gz containing all the collected artifacts. The archive contains a JSON file with the majority of information. Additionally, a set of useful logs from the target system logs are included.

Common Keys

Every Record

Each line of the JSON file records 1 piece of information. There are some common keys that appear in every JSON record:

  • osxcollector_incident_id: A unique ID shared by every record.
  • osxcollector_section: The section or type of data this record holds.
  • osxcollector_subsection: The subsection or more detailed descriptor of the type of data this record holds.
File Records

For records representing files there are a bunch of useful keys:

  • atime: The file accessed time.
  • ctime: The file creation time.
  • mtime: The file modified time.
  • file_path: The absolute path to the file.
  • md5: MD5 hash of the file contents.
  • sha1: SHA1 hash of the file contents.
  • sha2: SHA2 hash of the file contents.

For records representing downloaded files:

  • xattr-wherefrom: A list containing the source and referrer URLs for the downloaded file.
  • xattr-quarantines: A string describing which application downloaded the file.
SQLite Records

For records representing a row of a SQLite database:

  • osxcollector_table_name: The table name the row comes from.
  • osxcollector_db_path: The absolute path to the SQLite file.

For records that represent data associated with a specific user:

  • osxcollector_username: The name of the user

Timestamps

OSXCollector attempts to convert timestamps to human readable date/time strings in the format YYYY-mm-dd hh:MM:ss. It uses heuristics to automatically identify various timestamps:

  • seconds since epoch
  • milliseconds since epoch
  • seconds since 2001-01-01
  • seconds since 1601-01-01

Sections

version section

The current version of OSXCollector.

system_info section

Collects basic information about the system:

  • system name
  • node name
  • release
  • version
  • machine
kext section

Collects the Kernel extensions from:

  • /System/Library/Extensions
  • /Library/Extensions
startup section

Collects information about the LaunchAgents, LaunchDaemons, ScriptingAdditions, StartupItems and other login items from:

  • /System/Library/LaunchAgents
  • /System/Library/LaunchDaemons
  • /Library/LaunchAgents
  • ~/Library/LaunchAgents
  • /Library/LaunchDaemons
  • /System/Library/ScriptingAdditions
  • /Library/ScriptingAdditions
  • /System/Library/StartupItems
  • /Library/StartupItems
  • ~/Library/Preferences/com.apple.loginitems.plist

More information about the Max OS X startup can be found here: http://www.malicious-streams.com/article/Mac_OSX_Startup.pdf

applications section

Hashes installed applications and gathers install history from:

  • /Applications
  • ~/Applications
  • /Library/Receipts/InstallHistory.plist
quarantines section

Quarantines are basically the info necessary to show the 'Are you sure you wanna run this?' when a user is trying to open a file downloaded from the Internet. For some more details, checkout the Apple Support explanation of Quarantines: http://support.apple.com/kb/HT3662

This section collects also information from XProtect hash-based malware check for quarantines files. The plist is at: /System/Library/CoreServices/CoreTypes.bundle/Contents/Resources/XProtect.plist

XProtect also add minimum versions for Internet Plugins. That plist is at: /System/Library/CoreServices/CoreTypes.bundle/Contents/Resources/XProtect.meta.plist

downloads section

Hashes all users' downloaded files from:

  • ~/Downloads
  • ~/Library/Mail Downloads
  • ~/Library/Containers/com.apple.mail/Data/Library/Mail Downloads
chrome section

Collects following information from Google Chrome web browser:

  • History
  • Archived History
  • Cookies
  • Extensions
  • Login Data
  • Top Sites
  • Web Data

This data is extracted from ~/Library/Application Support/Google/Chrome/Default

firefox section

Collects information from the different SQLite databases in a Firefox profile:

  • Cookies
  • Downloads
  • Form History
  • History
  • Signons
  • Permissions
  • Addons
  • Extensions
  • Content Preferences
  • Health Report
  • Webapps Store

This information is extracted from ~/Library/Application Support/Firefox/Profiles

For more details about Firefox profile folder see http://kb.mozillazine.org/Profile_folder_-_Firefox

safari section

Collects information from the different plists and SQLite databases in a Safari profile:

  • Downloads
  • History
  • Extensions
  • Databases
  • Local Storage
accounts section

Collects information about users' accounts:

  • system admins: /private/var/db/dslocal/nodes/Default/groups/admin.plist
  • system users: /private/var/db/dslocal/nodes/Default/users
  • social accounts: ~/Library/Accounts/Accounts3.sqlite
  • users' recent items: ~/Library/Preferences/com.apple.recentitems.plist
mail section

Hashes files in the mail app directories:

  • ~/Library/Mail
  • ~/Library/Mail Downloads
full_hash section

Hashes all the files on disk. All of 'em. This does not run by default. It must be triggered with:

$ sudo osxcollector.py -s full_hash

Basic Manual Analysis

Forensic analysis is a bit of art and a bit of science. Every analyst will see a bit of a different story when reading the output from OSXCollector. That's part of what makes analysis fun.

Generally, collection is performed on a target machine because something is hinky: anti-virus found a file it doesn't like, deep packet inspect observed a callout, endpoint monitoring noticed a new startup item. The details of this initial alert - a file path, a timestamp, a hash, a domain, an IP, etc. - that's enough to get going.

Timestamps

Simply greping a few minutes before and after a timestamp works great:

$ cat INCIDENT32.json | grep '2014-01-01 11:3[2-8]'

Browser History

It's in there. A tool like jq can be very helpful to do some fancy output:

$ cat INCIDENT32.json | grep '2014-01-01 11:3[2-8]' | jq 'select(has("url"))|.url'

A Single User

$ cat INCIDENT32.json | jq 'select(.osxcollector_username=="ivanlei")|.'

Automated Analysis

The OSXCollector Output Filters project contains filters that process and transform the output of OSXCollector. The goal of filters is to make it easy to analyze OSXCollector output.

Development Tips

The functionality of OSXCollector is stored in a single file: osxcollector.py. The collector should run on a naked install of OS X without any additional packages or dependencies.

Ensure that all of the OSXCollector tests pass before editing the source code. You can run the tests using: make test

After making changes to the source code, run make test again to verify that your changes did not break any of the tests.

License

This work is licensed under the GNU General Public License and a derivation of https://github.com/jipegit/OSXAuditor

Blog post

Presentations

External Presentations

Resources

Want to learn more about OS X forensics?

A couple of other interesting tools:

  • KnockKnock - KnockKnock is a command line python script that displays persistent OS X binaries that are set to execute automatically at each boot.
  • Grr - Google Rapid Response: remote live forensics for incident response
  • osquery - SQL powered operating system instrumentation, monitoring, and analytics

More Repositories

1

elastalert

Easy & Flexible Alerting With ElasticSearch
Python
7,926
star
2

dumb-init

A minimal init system for Linux containers
Python
6,806
star
3

detect-secrets

An enterprise friendly way of detecting and preventing secrets in code.
Python
3,704
star
4

mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services
Python
2,615
star
5

paasta

An open, distributed platform as a service
Python
1,681
star
6

undebt

A fast, straightforward, reliable tool for performing massive, automated code refactoring
Python
1,634
star
7

MOE

A global, black box optimization engine for real world metric optimization.
C++
1,306
star
8

dockersh

A shell which places users into individual docker containers
Go
1,282
star
9

dataset-examples

Samples for users of the Yelp Academic Dataset
Python
1,189
star
10

yelp.github.io

A showcase of projects we've open sourced and open source projects we use
JavaScript
701
star
11

bravado

Bravado is a python client library for Swagger 2.0 services
Python
603
star
12

yelp-api

Examples of code using our v2 API
PHP
580
star
13

service-principles

A guide to service principles at Yelp for our service oriented architecture
423
star
14

swagger-gradle-codegen

💫 A Gradle Plugin to generate your networking code from Swagger
Kotlin
413
star
15

mysql_streamer

MySQLStreamer is a database change data capture and publish system.
Python
409
star
16

pyleus

Pyleus is a Python framework for developing and launching Storm topologies.
Python
406
star
17

yelp-fusion

Yelp Fusion API
Python
401
star
18

docker-custodian

Keep docker hosts tidy
Python
355
star
19

android-school

The best videos from the Android community and beyond
350
star
20

Tron

Next generation batch process scheduling and management
Python
340
star
21

kafka-utils

Python
313
star
22

bento

DEPRECATED - A delicious framework for building modularized Android user interfaces, by Yelp.
Kotlin
306
star
23

Testify

A more pythonic testing framework.
Python
303
star
24

clusterman

Cluster Autoscaler for Kubernetes and Mesos
Python
295
star
25

kotlin-android-workshop

A Kotlin Workshop for engineers familiar with Java and Android development.
Kotlin
288
star
26

threat_intel

Threat Intelligence APIs
Python
264
star
27

nrtsearch

A high performance gRPC server on top of Apache Lucene
Java
254
star
28

python-gearman

Gearman API - Client, worker, and admin client interfaces
Python
242
star
29

py_zipkin

Provides utilities to facilitate the usage of Zipkin in Python
Python
225
star
30

fuzz-lightyear

A pytest-inspired, DAST framework, capable of identifying vulnerabilities in a distributed, micro-service ecosystem through chaos engineering testing and stateful, Swagger fuzzing.
Python
205
star
31

yelp-python

A Python library for the Yelp API
Python
182
star
32

venv-update

Synchronize your virtualenv quickly and exactly.
Python
178
star
33

firefly

Firefly is a web application aimed at powerful, flexible time series graphing for web developers.
JavaScript
171
star
34

amira

AMIRA: Automated Malware Incident Response & Analysis
Python
150
star
35

aactivator

Automatically source and unsource a project's environment
Python
145
star
36

YLTableView

Objective-C
144
star
37

love

A system to share your appreciation
Python
142
star
38

lemon-reset

Consistent, cross-browser React DOM tags, powered by CSS Modules. 🍋
JavaScript
131
star
39

dataloader-codegen

🤖 dataloader-codegen is an opinionated JavaScript library for automatically generating DataLoaders over a set of resources (e.g. HTTP endpoints).
TypeScript
110
star
40

bravado-core

Python
109
star
41

data_pipeline

Data Pipeline Clientlib provides an interface to tail and publish to data pipeline topics.
Python
109
star
42

detect-secrets-server

Python
108
star
43

yelp-ruby

A Ruby gem for communicating with the Yelp REST API
Ruby
105
star
44

swagger_spec_validator

Python
104
star
45

ybinlogp

A fast mysql binlog parser
C
97
star
46

beans

Bringing people together, one cup of coffee at a time
Python
93
star
47

casper

A fast web application platform built in Rust and Luau
Rust
90
star
48

schematizer

A schema store service that tracks and manages all the schemas used in the Data Pipeline
Python
86
star
49

requirements-tools

requirements-tools contains scripts for working with Python requirements, primarily in applications.
Python
81
star
50

osxcollector_output_filters

Filters that process and transform the output of osxcollector
Python
77
star
51

sensu_handlers

Custom Sensu Handlers to support a multi-tenant environment, allowing checks themselves to emit the type of handler behavior they need in the event json
Ruby
75
star
52

graphql-guidelines

GraphQL @ Yelp Schema Guidelines
Makefile
74
star
53

kegmate

Arduino/iPad powered kegerator
Objective-C
72
star
54

ephemeral-port-reserve

Find an unused port, reliably
Python
68
star
55

parcelgen

Helpful tool to make data objects easier for Android
Python
65
star
56

salsa

A tool for exporting iOS components into Sketch 📱💎
Swift
62
star
57

yelp-ios

Objective-C
61
star
58

docker-observium

Observium docker image with both professional and community edition support, ldap auth, and easy plugin support.
ApacheConf
58
star
59

yelp-android

Java
55
star
60

terraform-provider-signalform

SignalForm is a terraform provider to codify SignalFx detectors, charts and dashboards
Go
44
star
61

mycroft

Python
42
star
62

pidtree-bcc

eBPF tool for logging process ancestry of outbound TCP connections
Python
41
star
63

terraform-provider-gitfile

Terraform provider for checking out git repositories and making changes
Go
40
star
64

ffmpeg-android

Shell
39
star
65

pushmanager

Pushmanager is a web application to manage source code deployments.
Python
38
star
66

zygote

A Python HTTP process management utility.
Python
38
star
67

yelp_kafka

An extension of the kafka-python package that adds features like multiprocess consumers.
Python
38
star
68

pgctl

Manage sets of developer services -- "playground control"
Python
31
star
69

EMRio

Elastic MapReduce instance optimizer
Python
31
star
70

s3mysqldump

Dump mysql tables to s3, and parse them
Python
31
star
71

android-varanus

A client-side Android library to monitor and limit network traffic sent by your apps
Kotlin
29
star
72

pyramid_zipkin

Pyramid tween to add Zipkin service spans
Python
29
star
73

puppet-netstdlib

A collection of Puppet functions for interacting with the network
Ruby
27
star
74

sqlite3dbm

sqlite-backed dictionary conforming to the dbm interface
Python
27
star
75

send_nsca

Pure-python NSCA client
Python
26
star
76

docker-push-latest-if-changed

Python
26
star
77

data_pipeline_avro_util

Provides a Pythonic interface for reading and writing Avro schemas
Python
26
star
78

cocoapods-readonly

Automatically locks all CocoaPod source files.
Ruby
26
star
79

uwsgi_metrics

Python
26
star
80

WebImageView

An enhanced and improved ImageView for Android that displays images loaded over the interwebs
Java
25
star
81

task_processing

Interfaces and shared infrastructure for generic task processing at Yelp.
Python
23
star
82

PushmasterApp

(Legacy) Yelp pushmaster application built on Google App Engine
Python
22
star
83

tlspretense-service

A Docker container that exposes tlspretense on a port.
Makefile
20
star
84

puppet-uchiwa

Puppet module for installing Uchiwa
Ruby
20
star
85

yelp_cheetah

cheetah, hacked by yelpers
Python
20
star
86

logfeeder

Python
20
star
87

fido

Asynchronous HTTP client built on top of Crochet and Twisted
Python
20
star
88

swagger-spec-compatibility

Python library to check Swagger Spec backward compatibility
Python
20
star
89

pyramid-hypernova

A Python client for Airbnb's Hypernova server, for use with the Pyramid web framework.
Python
19
star
90

mr3po

protocols for use with mrjob
Python
16
star
91

YPFastDateParser

A class for parsing strings into NSDate instances, several times faster than NSDateFormatter
Objective-C
15
star
92

yelp_uri

Utilities for dealing with URIs, invented and maintained by Yelp.
Python
14
star
93

pysensu-yelp

A Python library to emit Sensu events that the Yelp Sensu Handlers can understand for Self-Service Sensu Monitoring
Python
14
star
94

terraform-provider-cloudhealth

Terraform provider for Cloudhealth
Go
14
star
95

yelp-rails-example

An example Rails application that uses the Yelp gem to integrate with the API
Ruby
13
star
96

named_decorator

Dynamically name wrappers based on their callees to untangle profiles of large python codebases
Python
12
star
97

pt-online-schema-change-plugins

Perl
11
star
98

environment_tools

Tools for programmatically describing Yelp's different environments (prod, dev, stage)
Python
11
star
99

puppet-cron

A super great cron Puppet module with timeouts, locking, monitoring, and more!
Ruby
11
star
100

pyswf

Python
10
star