E.T. Phone Home?
This repository provides a corpus of network communications automatically sent to Apple by OS X Yosemite; we're using this dataset to explore how Yosemite shares user data with Apple.
The provided data was collected using our Net Monitor toolkit; more information regarding usage and methodology is provided below.
Examples
The following examples occur with all privacy options enabled, including analytics (i.e., Diagnostics and Usage Data) disabled.
About this Mac
When the user selects 'About this Mac' from the Apple menu, Yosemite phones home, and `s_vi`, a unique analytics identifier, is [included in the request](<eff-user-r0/Applications/Utilities/System Information.app/Contents/MacOS/System Information/20141019T192957Z-effuser-[172.16.174.146]:49495-[23.3.12.195]:80.log>). (`s_vi` is used by Adobe/Omniture's analytics software.)
If we search the logs for the cookie value, we can find:
- Where the identifying cookie was first set -- when the user visited http://www.apple.com in Safari, with an expiration of two years.
- Where else the cookie is sent to Apple -- for example, when both Spotlight and Help phone home.
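A minimal Python sketch of that search follows. It assumes each `.log` file holds the plaintext HTTP exchange, including a `Cookie:` request header; `extract_cookie` and `files_containing` are hypothetical helpers for illustration, not part of Net Monitor.

```python
import os
from typing import Iterator, Optional


def extract_cookie(text: str, name: str) -> Optional[str]:
    """Return the value of cookie `name` from the first matching Cookie: header.

    Assumes (hypothetically) that the log contains the raw HTTP request headers.
    """
    for line in text.splitlines():
        if line.lower().startswith("cookie:"):
            # Cookie header format: "Cookie: a=1; b=2"
            for pair in line.split(":", 1)[1].split(";"):
                key, sep, value = pair.strip().partition("=")
                if sep and key == name:
                    return value
    return None


def files_containing(root: str, needle: bytes) -> Iterator[str]:
    """Yield the path of every .log file under `root` whose raw bytes contain `needle`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(".log"):
                path = os.path.join(dirpath, filename)
                with open(path, "rb") as f:
                    if needle in f.read():
                        yield path
```

With `value` taken from the 'About this Mac' request, `files_containing("eff-user-r0", value.encode())` should list every logged connection carrying the same identifier, including the Safari response that set it. Connections where the cookie is not in plaintext will not match.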
DuckDuckGo for Privacy
Having read DuckDuckGo's privacy statements, you might decide to switch Safari's default search to DuckDuckGo. If we enter a new search in Safari, we can then search the logged data to see who the search terms are actually sent to.
The logs show that a copy of your Safari searches is still sent to Apple, even when DuckDuckGo is selected as your search provider and 'Spotlight Suggestions' is disabled in System Preferences > Spotlight.
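Search terms rarely appear verbatim in a request: a query such as `privacy test` is usually sent as `privacy+test` or `privacy%20test`. The sketch below searches a raw log for those spellings. It assumes only percent or form encoding (other encodings need their own handling), and `query_variants` and `found_in` are hypothetical helpers.

```python
from urllib.parse import quote, quote_plus


def query_variants(term: str) -> set[str]:
    """Spellings a search term commonly takes inside a URL or form body."""
    return {
        term,                  # verbatim
        quote(term, safe=""),  # percent-encoding: spaces become %20
        quote_plus(term),      # form encoding: spaces become +
    }


def found_in(data: bytes, term: str) -> list[str]:
    """Return the variants of `term` that occur in a raw log, sorted for stable output."""
    text = data.decode("utf-8", errors="replace")
    return sorted(v for v in query_variants(term) if v in text)
```

Running `found_in` over every log file (rather than only those under Safari's path) is what reveals which other processes received the same query.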
Non-Cloud Mail Account
When a new Mail.app account is set up for an address on fix-macosx.com, a domain that is hosted locally, searching the logs for "fix-macosx.com" shows that Mail quietly sends the domain entered by the user to Apple, too.
Methodology, Usage, and Caveats
Two different datasets are provided; these were generated in independent VMs with fresh installs of Mac OS X Yosemite:
- `eff-user-r0`
  - All data sharing options disabled.
  - Location services disabled.
  - iCloud not used.
  - No Apple ID used.
  - DuckDuckGo selected as Safari search engine.
- `icloud-user-r0`
  - Installed with all default options, including sending of "Diagnostics and Usage Data".
  - iCloud and most iCloud features enabled, including iCloud Drive.
All TCP/SSL connections are logged with one file per connection: `<application path>/<ISO 8601 time>-<username>-<src addr>-<dest addr>.log`, where each address includes its port (e.g., `[172.16.174.146]:49495`).
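A minimal sketch for turning a log path back into connection metadata. The bracketed `[address]:port` form and the compact ISO 8601 timestamp (`20141019T192957Z`) are inferred from the single example file name linked in the 'About this Mac' section, not from a documented specification; the sketch also assumes usernames contain no hyphen. `parse_log_path` and `Connection` are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Assumed file-name layout, inferred from one example in this dataset:
# <time>-<username>-[<src addr>]:<src port>-[<dest addr>]:<dest port>.log
LOG_NAME = re.compile(
    r"^(?P<time>\d{8}T\d{6}Z)-(?P<user>[^-]+)"
    r"-\[(?P<src>[^\]]+)\]:(?P<src_port>\d+)"
    r"-\[(?P<dst>[^\]]+)\]:(?P<dst_port>\d+)\.log$"
)


@dataclass
class Connection:
    app_path: str   # executable associated with the socket
    time: datetime  # connection time, UTC
    user: str
    src: str
    src_port: int
    dst: str
    dst_port: int


def parse_log_path(rel_path: str) -> Connection:
    """Split '<dataset>/<application path>/<file name>' into its fields."""
    path = PurePosixPath(rel_path)
    m = LOG_NAME.match(path.name)
    if m is None:
        raise ValueError(f"unexpected log file name: {path.name}")
    # Drop the leading dataset directory (e.g. eff-user-r0) to recover the absolute path.
    app_path = "/" + "/".join(path.parent.parts[1:])
    when = datetime.strptime(m["time"], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return Connection(app_path, when, m["user"], m["src"], int(m["src_port"]),
                      m["dst"], int(m["dst_port"]))
```

Grouping parsed connections by `app_path` and `dst` gives a quick per-executable view of who talks to which Apple hosts.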
Non-TCP traffic (such as UDP and ICMP) is logged in pcap format in `udp-monitor/*.pcap`.
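The pcap files can be opened directly in Wireshark or tcpdump. For scripted analysis without third-party packages, the sketch below walks the records of a pcap file using only the standard library. It assumes the classic libpcap format rather than pcapng, and it returns raw link-layer frames without decoding the UDP or ICMP headers inside them; `read_pcap` is a hypothetical helper.

```python
import struct
from typing import BinaryIO, Iterator, Tuple

# Classic libpcap magic numbers as they appear on disk: byte order and timestamp resolution.
_MAGIC = {
    b"\xd4\xc3\xb2\xa1": ("<", 1e-6),  # little-endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1e-6),  # big-endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1e-9),  # little-endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1e-9),  # big-endian, nanoseconds
}


def read_pcap(f: BinaryIO) -> Iterator[Tuple[float, int, bytes]]:
    """Yield (timestamp, link_type, frame_bytes) for each record in a classic pcap file."""
    header = f.read(24)
    if len(header) < 24 or header[:4] not in _MAGIC:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    endian, scale = _MAGIC[header[:4]]
    # The last 4 bytes of the 24-byte global header hold the link-layer type.
    link_type = struct.unpack(endian + "I", header[20:24])[0]
    while True:
        rec = f.read(16)
        if len(rec) < 16:
            return
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
        data = f.read(incl_len)
        if len(data) < incl_len:
            return  # truncated final record
        yield ts_sec + ts_frac * scale, link_type, data
```

Each file can be opened with `open(path, "rb")` and passed to `read_pcap`; the `link_type` value tells you which link-layer header to strip before looking at the IP payload.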
Caveats
- This data was collected over the course of a few hours, with only minimal interaction with the system and applications. It is not a complete, representative set of all data potentially collected by Yosemite; for example:
  - The `icloud-user-r0` dataset does not contain the diagnostics data periodically sent to Apple.
  - Cursory usage means that application-specific logs are not representative -- e.g., when setting up a Mail account, we only entered information on the first screen.
- Correlation of sockets with file system executable paths is reasonably accurate, but the actual correspondence should be sanity-checked (we've seen cases where `proc_pidpath()` returned paths for processes that could not be running).
- TLS traffic using client certificates cannot be captured in plaintext by default. For example, Net Monitor (NM) captures the key exchange performed by apsd (Apple Push Services Daemon) that establishes a client certificate, but NM can't transparently sniff later communications protected by that certificate without the addition of apsd-specific protocol handling.
- Not all traffic is logged in plaintext, so the lack of a match on a search should not be treated as conclusive; it may be necessary to decode data that was encoded for transmission via URL encoding, base64, protobuf, etc.
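Base64 is one such case. The heuristic sketch below scans a raw log for runs of standard-alphabet base64 characters, decodes each run, and reports those whose decoded bytes contain the search term. The 16-character minimum is an arbitrary threshold for skipping ordinary words, URL-safe base64 (`-`, `_`) is not handled, and protobuf payloads would still need a schema-aware decoder; `base64_hits` is a hypothetical helper.

```python
import base64
import binascii
import re
from typing import Iterator

# Runs of 16+ standard base64 characters, optionally padded (heuristic threshold).
_B64_RUN = re.compile(rb"[A-Za-z0-9+/]{16,}={0,2}")


def base64_hits(data: bytes, needle: bytes) -> Iterator[bytes]:
    """Yield base64 tokens in `data` whose decoded bytes contain `needle`."""
    for match in _B64_RUN.finditer(data):
        token = match.group()
        try:
            # Pad to a multiple of 4; invalid lengths or padding raise and are skipped.
            decoded = base64.b64decode(token + b"=" * (-len(token) % 4), validate=True)
        except (binascii.Error, ValueError):
            continue
        if needle in decoded:
            yield token
```

A miss from this scan is no more conclusive than a miss from a plain-text search; it only rules out one encoding.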
Contributing
Help is requested in all of the following areas:
- Finding and documenting privacy issues.
- Enhanced automated dataset visualization/decoding.
- Adding application-specific support to SSLsplit for processes using client certificates.
- Automated (re-)generation of the datasets (e.g., scripting installation and application use).
- Using net-monitor to gather data from AirDrop, Handoff, and other technologies that are difficult to run in a VM environment.
- Exploring workarounds (e.g., sandboxing, firewalling).