  • Stars: 298
  • Rank: 139,663 (Top 3 %)
  • Language: Python
  • License: MIT License
  • Created over 3 years ago
  • Updated 5 months ago

Repository Details

Provides tools to analyze hashtags within posts scraped from TikTok.

TikTok hashtag analysis toolset

IMPORTANT NOTE: this tool relies on drawrowfly/tiktok-scraper, which appears to be broken at the time of writing. That package has not been updated for some time and has several open issues (#796, #799) that need to be fixed before this library can work smoothly.

The tool helps to download posts and videos from TikTok for a given set of hashtags over a period of time. Users can create a growing database of posts for specific hashtags which can then be used for further hashtag analysis. It uses the tiktok-scraper Node package to download the posts and videos.
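The "growing database" idea can be sketched as a merge step that only keeps posts not seen before. This is a minimal illustration, not the tool's actual code: the function name merge_new_posts is hypothetical, and the assumption that every post is a dictionary with an "id" key is ours, not something the README states.

```python
def merge_new_posts(existing_posts, seen_ids, scraped_posts):
    """Append unseen posts to existing_posts and return how many were added.

    Assumption: each post is a dict carrying a unique "id" key.
    """
    added = 0
    for post in scraped_posts:
        post_id = post["id"]
        if post_id not in seen_ids:
            seen_ids.add(post_id)
            existing_posts.append(post)
            added += 1
    return added


# Repeated scrapes of the same hashtag only grow the database by the new posts.
database = [{"id": "1", "text": "first"}]
known_ids = {"1"}
new_count = merge_new_posts(
    database, known_ids, [{"id": "1", "text": "first"}, {"id": "2", "text": "second"}]
)
```

Tracking the ids in a set keeps each membership check cheap even as the database grows.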

Pre-requisites

  1. Make sure you have Python 3.6 or a later version installed
  2. Install Node.js version 16. On macOS, run brew install node, then npm install -g n, and then n 16
  3. Download and install TikTok scraper: https://github.com/drawrowfly/tiktok-scraper
  4. Install the tool with pip: pip install tiktok-hashtag-analysis
    1. or directly from the repo version: pip install git+https://github.com/bellingcat/tiktok-hashtag-analysis

You should now be ready to start using it.
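Before the first run, the pre-requisites above can be checked with a short script. This is a sketch under stated assumptions: the helper names are hypothetical, and it assumes the scraper's executable is called tiktok-scraper and is on the PATH after installation.

```python
import shutil
import subprocess
import sys


def node_major(version_text):
    """Return the major version from text such as 'v16.20.2', or None."""
    major = version_text.strip().lstrip("v").split(".")[0]
    return int(major) if major.isdigit() else None


def check_prerequisites():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or later is required")
    node = shutil.which("node")
    if node is None:
        problems.append("node was not found on PATH")
    else:
        try:
            output = subprocess.check_output([node, "--version"], universal_newlines=True)
        except (OSError, subprocess.CalledProcessError):
            output = ""
        if node_major(output) != 16:
            problems.append("node version 16 is expected, found: " + (output.strip() or "unknown"))
    # Assumption: the scraper installs an executable named "tiktok-scraper".
    if shutil.which("tiktok-scraper") is None:
        problems.append("tiktok-scraper was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```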

About the tool

Command-line arguments

tiktok-hashtag-analysis --help
usage: tiktok-hashtag-analysis [-h] [-t [T ...]] [-f F] [-p] [-v] [-ht HASHTAG] [-n NUMBER] [-plt] [-d] {download,frequencies}

Analyze hashtags within posts scraped from TikTok.

positional arguments:
  {download,frequencies}
                        command to initialize

options:
  -h, --help            show this help message and exit
  -t [T ...]            List of hashtags to scrape (module: run_downloader)
  -f F                  File name containing list of hashtags to scrape (module: run_downloader)
  -p                    Download post data (module: run_downloader)
  -v                    Download video files (module: run_downloader)
  -ht HASHTAG, --hashtag HASHTAG
                        The hashtag of scraped posts to analyze (module: hashtag_frequencies)
  -n NUMBER, --number NUMBER
                        The number of top n occurrences (module: hashtag_frequencies)
  -plt, --plot          Plot the occurrences (module: hashtag_frequencies)
  -d, --print           List top n hashtags (module: hashtag_frequencies)
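
For readers who want to see how the options above fit together, here is an argparse sketch that accepts the same flags as the help text. It is a reconstruction for illustration only, not the tool's source; the int type for -n is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flags listed in the help output."""
    parser = argparse.ArgumentParser(
        prog="tiktok-hashtag-analysis",
        description="Analyze hashtags within posts scraped from TikTok.",
    )
    parser.add_argument("command", choices=["download", "frequencies"],
                        help="command to initialize")
    parser.add_argument("-t", nargs="*", help="List of hashtags to scrape")
    parser.add_argument("-f", help="File name containing list of hashtags to scrape")
    parser.add_argument("-p", action="store_true", help="Download post data")
    parser.add_argument("-v", action="store_true", help="Download video files")
    parser.add_argument("-ht", "--hashtag",
                        help="The hashtag of scraped posts to analyze")
    # Assumption: the number of top occurrences is parsed as an integer.
    parser.add_argument("-n", "--number", type=int,
                        help="The number of top n occurrences")
    parser.add_argument("-plt", "--plot", action="store_true",
                        help="Plot the occurrences")
    parser.add_argument("-d", "--print", action="store_true",
                        help="List top n hashtags")
    return parser


download_args = build_parser().parse_args(["download", "-t", "london", "paris", "-p"])
frequency_args = build_parser().parse_args(["frequencies", "-ht", "london", "-n", "20", "-plt"])
```

Note that -p (download posts) and -plt (plot) are distinct options, so they are not interchangeable.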

Structure of output data

$ tree ../data
../data
├── ids
│   └── post_ids.json
├── london
│   └── posts
│       └── data.json
├── newyork
│   └── posts
│       └── data.json
└── paris
    └── posts
        └── data.json

The data folder contains all the downloaded data as shown in the tree diagram above.

  • The ids folder contains two files, post_ids.json and video_ids.json, that record the ids of the downloaded posts and videos for each hashtag.
  • Each hashtag has a folder with two subfolders, posts and videos, that store posts and videos respectively. The posts are stored in the data.json file in the posts folder, and the videos are stored as .mp4 files in the videos folder.
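
Given the layout described above, the hashtags that already have downloaded posts can be found by looking for <hashtag>/posts/data.json files. The sketch below relies only on that layout; the function name downloaded_hashtags is hypothetical and nothing is assumed about the contents of data.json.

```python
from pathlib import Path


def downloaded_hashtags(data_dir):
    """Return the sorted names of hashtag folders that contain posts/data.json."""
    data_dir = Path(data_dir)
    return sorted(
        path.parent.parent.name for path in data_dir.glob("*/posts/data.json")
    )


if __name__ == "__main__":
    # For the tree shown above this would list london, newyork and paris.
    print(downloaded_hashtags("../data"))
```

The ids folder is skipped automatically because it has no posts subfolder.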

How to use

Post downloading

Running the tiktok-hashtag-analysis download command with the following options will scrape posts containing the hashtags #london, #paris, or #newyork:

tiktok-hashtag-analysis download -t london paris newyork -p

and will produce an output similar to the following log:

$ tiktok-hashtag-analysis download -t london paris newyork -p
Hashtags to scrape: ['london', 'paris', 'newyork']
Scraped 963 posts containing the hashtag 'london'
Scraped 961 posts containing the hashtag 'paris'
Scraped 940 posts containing the hashtag 'newyork'
Successfully scraped 2864 total entries
  • The -t flag allows a space-separated list of hashtags to be specified as a command line argument
  • The -p flag specifies that posts, not videos, will be downloaded

Video downloading

Running the tiktok-hashtag-analysis download command with the following options will scrape trending videos containing the hashtag #london:

tiktok-hashtag-analysis download -t london -v

  • The -t flag allows a space-separated list of hashtags to be specified as a command line argument
  • The -v flag specifies that videos, not posts, will be downloaded

Note that video downloading is time-consuming and bandwidth-intensive. We therefore recommend using one hashtag at a time with the -v flag to avoid complications.
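
The one-hashtag-at-a-time recommendation can be automated with a small wrapper that issues a separate download command per hashtag. This is a sketch only: the function names are hypothetical, and it assumes the tiktok-hashtag-analysis command is installed and on the PATH.

```python
import subprocess


def video_download_commands(hashtags):
    """Build one 'download -t <hashtag> -v' command per hashtag."""
    return [
        ["tiktok-hashtag-analysis", "download", "-t", hashtag, "-v"]
        for hashtag in hashtags
    ]


def download_videos_one_by_one(hashtags):
    """Run the commands sequentially, stopping at the first failure."""
    for command in video_download_commands(hashtags):
        subprocess.run(command, check=True)


commands = video_download_commands(["london", "paris"])
```

Running the downloads sequentially means a failure on one hashtag does not leave several partial downloads in flight.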

Analyzing results

Top n hashtag occurrences

The tiktok-hashtag-analysis frequencies command analyzes the most frequently occurring hashtags in a given set of posts.

Assume we want to analyze the 20 most frequently occurring hashtags in the downloaded posts of the #london hashtag.

  • The results can be plotted and saved as a PNG file by executing the following command:

    tiktok-hashtag-analysis frequencies -ht london -n 20 -plt

    which will produce a figure similar to that shown below:

    [Figure: Top 20 most frequent common hashtags in posts containing the #london hashtag]

    In the above plot, the highest occurrence is the #fyp hashtag, which is tagged in more than half of all posts containing the #london hashtag.

  • The results can be displayed in tabular form by executing the following command:

    tiktok-hashtag-analysis frequencies -ht london -n 20 -d

    which will produce a terminal output similar to the following:

    Rank     Hashtag                        Occurrences     Frequency
    0        london                         960             1.0000
    1        fyp                            494             0.5146
    2        uk                             238             0.2479
    3        foryou                         221             0.2302
    4        foryoupage                     184             0.1917
    5        viral                          179             0.1865
    6        fypใ‚ท                           84              0.0875
    7        funny                          56              0.0583
    8        xyzbca                         51              0.0531
    9        british                        45              0.0469
    10       england                        44              0.0458
    11       trending                       40              0.0417
    12       fy                             33              0.0344
    13       comedy                         32              0.0333
    14       roadman                        28              0.0292
    15       4u                             27              0.0281
    16       usa                            26              0.0271
    17       tiktok                         26              0.0271
    18       travel                         21              0.0219
    19       america                        20              0.0208
    Total posts: 960
    

    The Frequency column shows the ratio of the occurrence to the total number of downloaded posts.
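
The Occurrences and Frequency columns can be reproduced with a few lines of Python. This is a minimal sketch rather than the tool's implementation: the function name top_hashtags is hypothetical, the input is assumed to be one list of hashtags per post, and each hashtag is assumed to count at most once per post.

```python
from collections import Counter


def top_hashtags(posts, n):
    """Return (hashtag, occurrences, frequency) for the n most common hashtags.

    posts is a list with one list of hashtags per post (assumed input shape).
    """
    total = len(posts)
    counts = Counter()
    for hashtags in posts:
        counts.update(set(hashtags))  # count a hashtag once per post
    return [(tag, count, count / total) for tag, count in counts.most_common(n)]


sample_posts = [
    ["london", "fyp"],
    ["london", "uk"],
    ["london", "fyp", "viral"],
    ["london"],
]
top_two = top_hashtags(sample_posts, 2)
# top_two == [("london", 4, 1.0), ("fyp", 2, 0.5)]
```

With the table above, the same ratio gives 494 / 960 ≈ 0.5146 for #fyp.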
