  • License: MIT License
  • Created about 10 years ago
  • Updated about 10 years ago

Repository Details

Corpus of data automatically shared with Apple by a standard installation of OS X Yosemite.

E.T. Phone Home?

This repository provides a corpus of network communications automatically sent to Apple by OS X Yosemite; we're using this dataset to explore how Yosemite shares user data with Apple.

The provided data was collected using our Net Monitor toolkit; more information regarding usage and methodology is provided below.

Examples

The following occur with all privacy options enabled -- including disabling analytics (i.e., Diagnostics and Usage Data).

About this Mac

When the user selects 'About this Mac' from the Apple menu, Yosemite phones home, and s_vi, a unique analytics identifier, is [included in the request](eff-user-r0/Applications/Utilities/System Information.app/Contents/MacOS/System Information/20141019T192957Z-effuser-[172.16.174.146]:49495-[23.3.12.195]:80.log). (s_vi is used by Adobe/Omniture's analytics software.)

If we search the logs for the cookie value, we can find:

  • Where the identifying cookie was first set -- when the user visited http://www.apple.com in Safari, with an expiration of two years.
  • Where else the cookie is sent to Apple -- for example, when both Spotlight and Help phone home.
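A search like this can be scripted. The following is a minimal sketch, not part of the Net Monitor toolkit: the function name `find_in_logs` is hypothetical, and it assumes only that each connection is stored as a `.log` file somewhere under the dataset directory. It compares raw bytes, so it finds plaintext matches only.

```python
from pathlib import Path


def find_in_logs(root, needle):
    """Return the sorted paths of *.log files under `root` that contain `needle`.

    `needle` is a bytes value; files are read as raw bytes because the
    connection logs may mix text headers with binary payloads.
    """
    hits = []
    for path in Path(root).rglob("*.log"):
        # Skip directories that merely happen to end in ".log".
        if path.is_file() and needle in path.read_bytes():
            hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Example: list every logged connection that carries the s_vi cookie name.
    for hit in find_in_logs("eff-user-r0", b"s_vi"):
        print(hit)
```

Because the directory part of each hit is the application path, the output also shows which application made the request.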

DuckDuckGo for Privacy

Having read DuckDuckGo's privacy statements, you might decide to switch Safari's default search to DuckDuckGo. If we enter a new search in Safari, we can then search the logged data to see who the search terms are actually sent to.

The logs show that copies of your Safari searches are still sent to Apple, even when DuckDuckGo is selected as your search provider and 'Spotlight Suggestions' is disabled in System Preferences > Spotlight.

Non-Cloud Mail Account

When setting up a new Mail.app account for an address at fix-macosx.com, which is hosted locally, searching the logs for "fix-macosx.com" shows that Mail quietly sends the domain entered by the user to Apple, too.

Methodology, Usage, and Caveats

Two different datasets are provided; these were generated in independent VMs with fresh installs of Mac OS X Yosemite:

  • eff-user-r0

    • All data sharing options disabled.
    • Location services disabled.
    • iCloud not used.
    • No Apple ID used.
    • DuckDuckGo selected as Safari search engine.
  • icloud-user-r0

    • Installed with all default options, including sending of "Diagnostics and Usage Data".
    • iCloud and most iCloud features enabled, including iCloud Drive.

All TCP/SSL connections are logged with one file per connection:

    <application path>/<iso 8601 time>-<username>-<src addr>-<dest-addr>.log

Non-TCP traffic (such as UDP, ICMP) is logged in pcap format in udp-monitor/*.pcap.
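The per-connection file names can be split back into their fields when indexing the dataset. The sketch below is an illustration, not part of the toolkit: `parse_log_name` and `ConnectionLog` are hypothetical names, and the bracketed `[address]:port` layout and the compact `YYYYMMDDTHHMMSSZ` timestamp are inferred from the single example file name linked in the 'About this Mac' section, so other files may need a looser pattern.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional, Tuple


class ConnectionLog(NamedTuple):
    time: datetime           # connection timestamp, UTC
    username: str
    src: Tuple[str, int]     # (address, port)
    dst: Tuple[str, int]     # (address, port)


# Assumed layout: <time>-<username>-[<src addr>]:<port>-[<dst addr>]:<port>.log
_NAME = re.compile(
    r"^(?P<time>\d{8}T\d{6}Z)-(?P<user>.+?)"
    r"-\[(?P<src>[^\]]+)\]:(?P<sport>\d+)"
    r"-\[(?P<dst>[^\]]+)\]:(?P<dport>\d+)\.log$"
)


def parse_log_name(name: str) -> Optional[ConnectionLog]:
    """Parse a log file name (without its directory); return None if it does not match."""
    m = _NAME.match(name)
    if m is None:
        return None
    when = datetime.strptime(m["time"], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return ConnectionLog(
        time=when,
        username=m["user"],
        src=(m["src"], int(m["sport"])),
        dst=(m["dst"], int(m["dport"])),
    )
```

Returning `None` for unmatched names, rather than raising, makes it easy to spot files that do not follow the assumed layout.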

Caveats

  • This data was collected over the course of a few hours, with only minimal interaction with the system and applications. It is not a complete, representative set of all data potentially collected by Yosemite; for example:
    • The icloud-user-r0 dataset does not contain the diagnostics data periodically sent to Apple.
    • Cursory usage means that application-specific logs are not representative -- e.g., when setting up a Mail account, we only entered information on the first screen.
  • Correlation of sockets with file system executable paths is reasonably accurate, but the actual correspondence should be sanity-checked (we've seen cases where proc_pidpath() returned paths for processes that could not be running).
  • TLS traffic using client certificates cannot be captured in plaintext by default. For example, Net Monitor (NM) captures the key exchange performed by apsd (Apple Push Services Daemon) that establishes a client certificate, but NM can't transparently sniff later communications protected by that certificate without the addition of apsd-specific protocol handling.
  • Not all traffic is logged in plaintext, so the lack of a match on a search should not be treated as conclusive; it may be necessary to decode data that was encoded for transmission via URL encoding, base64, protobuf, etc.
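To reduce false negatives from the last caveat, a search can try several encodings of the same term. The sketch below is one possible approach, not part of the toolkit: `encoded_variants` is a hypothetical helper that returns the raw UTF-8 bytes, two URL-encoded forms, and the base64 fragments the term produces at each of the three possible byte alignments. It assumes the standard base64 alphabet (not the URL-safe one). Protobuf string fields store raw UTF-8 bytes, so the raw variant also covers that case.

```python
import base64
from urllib.parse import quote, quote_plus


def encoded_variants(term: str) -> set:
    """Return byte strings to search for when `term` may have been encoded."""
    raw = term.encode("utf-8")
    variants = {
        raw,
        quote(term, safe="").encode("ascii"),   # "a b" -> "a%20b"
        quote_plus(term).encode("ascii"),       # "a b" -> "a+b"
    }
    # base64 maps 3 input bytes to 4 output characters, so the encoding of an
    # embedded term depends on its offset (mod 3) in the surrounding data.
    # For each offset, keep only the characters determined entirely by `term`.
    for offset in range(3):
        data = b"\x00" * offset + raw
        encoded = base64.b64encode(data)
        skip = (0, 2, 3)[offset]                # characters touched by the padding bytes
        stable_end = (8 * len(data)) // 6       # characters fully covered by the data bits
        fragment = encoded[skip:stable_end]
        if len(fragment) >= 4:                  # very short fragments match too easily
            variants.add(fragment)
    return variants
```

Each returned value can then be searched for in the raw bytes of the log files. A match on a base64 fragment is only a hint and should be confirmed by decoding the surrounding data.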

Contributing

Help is requested in all of the following areas:

  • Finding and documenting privacy issues.
  • Enhanced automated dataset visualization/decoding.
  • Adding application-specific support for processes using client-certificates to SSLsplit.
  • Automated (re-)generation of the datasets (e.g., scripting installation and application use).
  • Using net-monitor to gather data from AirDrop, Handoff, and other technologies that are difficult to run in a VM environment.
  • Exploring work-arounds (e.g., sandboxing, firewalling).