Facebook Message Scraper
A simple Python script that downloads an entire Facebook conversation, without the limits of the message archive in the data dump Facebook provides.
It outputs the full conversation as JSON, along with the JSON for each individual chunk.
Initial Setup
Complete these steps for both dumper.py and group_dumper.py.
- In Chrome, open facebook.com/messages and open any conversation with a fair number of messages
- Open the Network tab of Chrome Developer Tools
- Scroll up in the conversation until the page attempts to load previous messages
- Look for the POST request to thread_info.php
- You need to copy certain parameters from this request into the python script to complete the setup:
  - Set the cookie value to the value shown in Chrome under Request Headers
  - Set the __user value to the value shown in Chrome under Form Data
  - Set the __a value to the value shown in Chrome under Form Data
  - Set the __dyn value to the value shown in Chrome under Form Data
  - Set the __req value to the value shown in Chrome under Form Data
  - Set the fb_dtsg value to the value shown in Chrome under Form Data
  - Set the ttstamp value to the value shown in Chrome under Form Data
  - Set the __rev value to the value shown in Chrome under Form Data
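The values copied above end up in the POST request the script sends to thread_info.php. The sketch below shows one way such a request could be assembled with Python's standard library; it is not the scripts' actual code. The placeholder strings, the variable and function names, and the full endpoint URL are assumptions, and the request is only built here, not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholders: replace each string with the value copied
# from Chrome's Network tab (Request Headers / Form Data).
COOKIE = "<cookie>"
FORM_DATA = {
    "__user": "<__user>",
    "__a": "<__a>",
    "__dyn": "<__dyn>",
    "__req": "<__req>",
    "fb_dtsg": "<fb_dtsg>",
    "ttstamp": "<ttstamp>",
    "__rev": "<__rev>",
}

# Assumed full URL; the README only names thread_info.php.
THREAD_INFO_URL = "https://www.facebook.com/ajax/mercury/thread_info.php"


def build_request(url=THREAD_INFO_URL, cookie=COOKIE, form=FORM_DATA):
    """Build (but do not send) a POST request carrying the copied values."""
    # Form Data values go into a URL-encoded request body.
    body = urllib.parse.urlencode(form).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    # The cookie value goes into the request headers.
    req.add_header("Cookie", cookie)
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req
```

Passing the result to urllib.request.urlopen(req) would perform the actual request.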
You're now all set to start downloading messages.
Downloading Messages
- Get the ID of the conversation you want to download by opening http://graph.facebook.com/{username-of-chat-partner}
- Copy the id value from there
- For group conversations, the ID can be retrieved from the URL in the messages tab. You must use group_dumper.py instead.
- Run the command python dumper.py {id} 2000, replacing {id} with the ID you retrieved earlier
By default, messages are saved to Messages/{id}/.
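As a rough illustration of that layout, the following sketch writes one chunk into Messages/{id}/ using pathlib. The function name save_chunk and the <index>.json file naming are hypothetical; the README does not say how the scripts name their chunk files.

```python
import json
from pathlib import Path


def save_chunk(conversation_id, index, chunk, root="Messages"):
    """Write one chunk as JSON under <root>/<conversation_id>/.

    The "<index>.json" file name is a hypothetical choice for illustration.
    """
    out_dir = Path(root) / str(conversation_id)
    # Create Messages/{id}/ (and any missing parents) on first use.
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{index}.json"
    path.write_text(json.dumps(chunk), encoding="utf-8")
    return path
```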
Known Issues
The script sometimes has trouble with very large conversations (more than 100k messages). Facebook appears to rate-limit these requests and returns empty responses. In such cases, the script waits 30 seconds and retries until it gets a valid response.
It may take the script several tries to get a valid response. DO NOT PANIC.
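The retry behaviour described above amounts to a loop like the one below. This is a minimal sketch, not the scripts' implementation: fetch is a hypothetical callable that performs one request, and an empty or whitespace-only body is treated as the rate-limited case. The sleep function is a parameter so the loop can be tested without actually waiting.

```python
import time

RETRY_DELAY_SECONDS = 30  # the 30 s wait stated in the README


def fetch_until_valid(fetch, delay=RETRY_DELAY_SECONDS, sleep=time.sleep):
    """Call fetch() until it returns a non-empty body.

    Returns (body, attempts). `fetch` is a hypothetical zero-argument
    callable returning the raw response body as str or bytes.
    """
    attempts = 0
    while True:
        attempts += 1
        body = fetch()
        # An empty (or whitespace-only) body is treated as rate limiting.
        if body and body.strip():
            return body, attempts
        sleep(delay)
```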
If you interrupt the script before it finishes, only the individual JSON chunks are saved, not the stitched conversation file.
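Stitching is essentially concatenating the saved chunks in order. The sketch below assumes, hypothetically, that each chunk is a JSON list of messages stored as <index>.json in the conversation folder and that ascending index order is the desired message order; the real chunk format may differ. Run on the chunks left by an interrupted download, it would only produce a partial conversation.

```python
import json
from pathlib import Path


def stitch_chunks(chunk_dir, output_name="complete.json"):
    """Concatenate numbered chunk files into one JSON list of messages.

    Returns (output_path, message_count). File naming and ordering are
    assumptions for illustration.
    """
    chunk_dir = Path(chunk_dir)
    # Only numeric stems count as chunks, so a previous output file is skipped;
    # sort numerically so that 10.json comes after 2.json.
    chunk_files = sorted(
        (p for p in chunk_dir.glob("*.json") if p.stem.isdigit()),
        key=lambda p: int(p.stem),
    )
    messages = []
    for path in chunk_files:
        messages.extend(json.loads(path.read_text(encoding="utf-8")))
    out_path = chunk_dir / output_name
    out_path.write_text(json.dumps(messages), encoding="utf-8")
    return out_path, len(messages)
```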