telegram-history-dump
This utility is the successor of telegram-json-backup, written from the ground up in Ruby. It can create backups of your Telegram user and (super)group dialogs using telegram-cli's remote control feature.
Compared to the old project, telegram-history-dump:
- Has better support for media downloads
- Supports output formats other than JSON and is extensible with custom formats
- Supports incremental backup (only new messages are downloaded)
- Does not depend on unstable Python/Lua bindings within telegram-cli
- Has a separate YAML formatted configuration file
The default configuration will back up all dialogs to a directory named output, in JSON format, without downloading any media.
Usage
First time setup
- Compile telegram-cli and start it once to link your Telegram account
- Make sure Ruby 2+ is installed on your system:
ruby --version
- Optionally configure your backup routine by editing config.yaml
Performing a backup
- Start telegram-cli with at least the following options:
telegram-cli --json -P 9009
- While telegram-cli is running, execute the script:
ruby telegram-history-dump.rb
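Since the script needs telegram-cli to be listening before it starts, it can help to check the port first. The following is a minimal sketch, not part of telegram-history-dump itself: port_open? is a hypothetical helper, and the host and port simply mirror the `-P 9009` option shown above.

```ruby
require 'socket'

# Hypothetical helper (not part of telegram-history-dump):
# returns true if something accepts TCP connections on host:port.
def port_open?(host, port, timeout = 2)
  # Socket.tcp yields the connected socket and closes it after the block.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, unreachable host, name resolution failure or timeout.
  false
end

if port_open?('localhost', 9009)
  puts 'telegram-cli port is reachable, the dump can be started'
else
  puts 'nothing is listening on port 9009, start telegram-cli first'
end
```

This only confirms that the port accepts connections; it does not verify that the listener is actually telegram-cli.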
Formatters
History will always be stored in JSON Lines compliant files. However, additional output formats can be produced by uncommenting a few lines in the configuration file.
You can enable one or more of the following formatter modules:
- html: creates styled, paginated chat logs viewable with a web browser.
- plaintext: creates human-readable text files, organized as one file per day.
- bare: outputs only the actual message texts without any context. It is meant for linguistic or statistical analysis.
- pisg: creates daily logs compatible with the EnergyMech IRC logging format as input for the PISG chat statistics generator. Also see telegram-pisg.
You can also implement a custom formatter; see formatters/lib/formatter_base.rb for details.
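Because history is always stored as JSON Lines, the dump files can also be processed directly without any formatter. The sketch below parses JSON Lines text and counts messages per sender. The `from` and `print_name` keys are assumptions about the message layout and should be checked against an actual dump; parse_jsonl and count_by_sender are hypothetical helper names.

```ruby
require 'json'
require 'stringio'

# Parses JSON Lines input: one JSON document per non-empty line.
def parse_jsonl(io)
  io.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
end

# Counts messages per sender.
# ASSUMPTION: each message may carry a 'from' object with a 'print_name' key.
def count_by_sender(messages)
  counts = Hash.new(0)
  messages.each do |msg|
    from = msg['from']
    sender = from.is_a?(Hash) ? from['print_name'] : nil
    counts[sender || 'unknown'] += 1
  end
  counts
end

# Inline sample data; for a real dump, pass File.open(path) instead.
sample = StringIO.new(<<~JSONL)
  {"from": {"print_name": "Alice"}, "text": "hello"}
  {"from": {"print_name": "Bob"}, "text": "hi"}
  {"from": {"print_name": "Alice"}, "text": "how are you?"}
JSONL

count_by_sender(parse_jsonl(sample)).each do |sender, count|
  puts "#{sender}: #{count}"
end
```

Reading line by line keeps this approach workable for large dialogs, since each line is an independent JSON document.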
Command line options
Most of the backup configuration is done through the config file, but a few options can also be set on the command line. None of them are mandatory.
Usage: telegram-history-dump.rb [options]
    -c, --config=cfg.yaml    Path to YAML configuration file
    -k, --kill-tg            Kill telegram-cli after backup
    -h, --help               Show help
    -d, --dir=DIR            Subdirectory for output files
                             (relative to backup_dir in YAML config)
    -l, --limit=LIMIT        Maximum number of messages to backup
                             for each target (overrides YAML config)
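As an illustration of how these options combine, the sketch below builds a command line that writes each run into a dated subdirectory. dump_command is a hypothetical wrapper, not part of the project; it only uses the flags listed above, and the date-based directory naming is an assumption chosen for the example.

```ruby
require 'date'

# Hypothetical wrapper: builds the argument list for one backup run.
# Only flags from the usage text above are used.
def dump_command(date: Date.today, limit: nil, kill_tg: false, config: nil)
  cmd = ['ruby', 'telegram-history-dump.rb', "--dir=#{date.strftime('%Y-%m-%d')}"]
  cmd << "--limit=#{Integer(limit)}" if limit   # Integer() rejects non-numeric input
  cmd << '--kill-tg' if kill_tg
  cmd << "--config=#{config}" if config
  cmd
end

puts dump_command(limit: 1000, kill_tg: true).join(' ')
```

Passing the resulting array to `system(*dump_command(...))` would run the backup without going through a shell, which avoids quoting problems in paths.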
Notes
Usage notes:
- It is possible to run telegram-cli on a different machine, e.g. as a daemon on a server. In this case you must pass --accept-any-tcp to telegram-cli and firewall the port appropriately to prevent unwanted exposure. Keep in mind that some options regarding media files will not work in a remote setup.
- Be careful with decreasing chunk_delay or increasing chunk_size. Telegram seems to rate limit history requests. Going too fast may cause an operation to time out and force the script to skip part of a dump.
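To make the trade-off between chunk_size and chunk_delay concrete, here is a simplified sketch of chunked fetching with a pause between requests. It is not the script's actual implementation: fetch_in_chunks and the fetch callable are hypothetical, and treating chunk_delay as a number of seconds is an assumption.

```ruby
# Fetches up to `limit` items in chunks of `chunk_size`, pausing
# `chunk_delay` (assumed to be seconds) between consecutive requests.
# `fetch` is a hypothetical callable: fetch.call(offset, count) -> Array.
def fetch_in_chunks(limit:, chunk_size:, chunk_delay:, fetch:)
  items = []
  offset = 0
  while offset < limit
    count = [chunk_size, limit - offset].min
    chunk = fetch.call(offset, count)
    items.concat(chunk)
    offset += chunk.size
    break if chunk.size < count            # a short chunk means history is exhausted
    sleep(chunk_delay) if offset < limit   # throttle before the next request
  end
  items
end

# Stand-in for a real history request: serves slices of an in-memory array.
history = (1..25).to_a
fake_fetch = ->(offset, count) { history[offset, count] || [] }

messages = fetch_in_chunks(limit: 100, chunk_size: 10, chunk_delay: 0.1, fetch: fake_fetch)
puts "fetched #{messages.size} messages"
```

A larger chunk_size means fewer requests but more data per request, while a smaller chunk_delay means more requests per unit of time; both push toward whatever rate limit the server applies.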
Telegram-cli issues known to affect telegram-history-dump:
- vysheng/tg#947 can cause crashes when dumping channels with more than 100 messages.
- vysheng/tg#904 can cause crashes when dialogs contain certain media files. If you get this, recompile telegram-cli with the suggested workaround.