Dota: Radiant victory prediction

Link to competition.
Link to TO-DO.
Notebook to reproduce the results.

Problem description

Dota is a competitive game, where two teams of five players fight each-other. Before the game starts, each player chooses a hero and spawns on his base. Before the creeps appear, there is time to buy items from the shop using initial gold. Then, each player goes to his position on the map.

Data description

Dataset is in JSONL format - each line is a JSON object that describes one game. Available keys: ['game_time', 'match_id_hash', 'teamfights', 'objectives', 'chat', 'game_mode', 'lobby_type', 'players']. Furthermore, train_matches.jsonl contains additional key targets, which contains the following fields: ['game_time', 'duration', 'time_remaining', 'radiant_win', 'next_roshan_team']. radiant_win is our target.

Score is calculated using ROC AUC.

Game object

Game object consists of 8 (+1 for training matches) keys:

game_time - time at the moment of data collection
match_id_hash - game unique identifier
teamfights - list of fighting encounters between teams
objectives - ?
chat - list of encrypted chat messages in format (player_slot, time, message)
game_mode - type of game played (Majority of 22 - "All Draft")
lobby_type - type of game lobby created. (0 - Public matchmaking, 7 - Ranked)

Pre-processed features

There are 245 features in the dataset. First 5 of them are related to the game configuration. The remaining 240 describe the state of all 10 heros, each having 24 features, such as (x, y) location, hp, gold, etc.

Features engineered

General features

game_time
game phase based on time (early\mid\late)
game_mode
lobby_type

Chat

Chat is collected only before the current game_time.

Chat length
Number of messages per team
Number of messages per player

Teamfights

Since almost all information about teamfights is aggregated in players, we don't focus on feature extraction here.

Number of fights
Mean fight time (in sec.)
Mean number of deaths

Objectives

Num of objectives
Num of objectives per team
Num of Aegis per team
Num of Barracks killed per team
Firstblood indicator per team
Roshan kills per team
Tower denies per team
Tower kills per team

Player features

Features not used

player['obs'] - aggregates x, y coordinates of observations, obs_placed instead
player['sen'] - same reason, sen_placed instead
len of player['runes_log'] - player['rune_pickups']
len of kills_log - no need, there is counter for kills
len of obs_log
len of sen_log
player['observers_placed']
account_id_hash
runes
player[dn_t] - use the number of denies only
pred_vict - categorical, low variance
randomed - categorical, low variance
player[ability_upgrades]

Main features

level - categorical
kills
deaths
assists
denies
nearby_creep_death_count
obs_placed
sen_placed
creeps_stacked
camps_stacked
rune_pickups
teamfight_participation
max_health
health_frac = health / max_health
pings

Life_state

dead_sec
dead_frac
magic_sec
magic_frac

Reasons

net_gold
gold_reasons
gold_reasons_frac
net_xp
xp_reasons
xp_reasons_frac

Team-aggregated

Level difference

Items

len of hero_stash
add items from hero_stash
net worth

Logs

len of obs_left_log
len of sen_left_log
len of purchase_log
len of buyback_log

delta-time

using times, collect features from delta-time features gold_t, xp_t

exponential corrected gold/sec
exponential corrected xp/sec

Actions

binary-encoded actions

Location features

All this features are hard to infer because of the unknown scales for coordinates. My experiments with position visualization don't look very realistic, with players stuck off-map sometimes.

x
y
lane
proximity to player's base
proximity to enemy's base

team centroid proximity to self base (not done)
team centroid proximity to enemy base (not done)

FAQ

Q: There is negative value in time!

A: This means that action has happenned in the beginning phase of the game, before the creeps were spawned.

Q: How does player_slot correspond to team?

A: Slot < 128 -> Radiant team, otherwise Dire

Q: What was the version of game when the data was collected?

A: Assuming data was collected on the single game version, it is between 7.07 and 7.19. This was found by analyzing hero picks in train and test games, and then comparing them to heroes release dates. dota_map.png is for 7.07.

Q: What is player slot?

A: This field denotes player team (which is relevant) and position in a team (which is not). There are 10 unique values - [0, 1, 2, 3, 4, 128, 129, 130, 131, 132]. First five are for team Radiant, next five are for Dire.

Q: What is better: Bag-of-Roles or Bag-of-Roles-Levels?

A: ?

ptyshevs/dota

ptyshevs

Reviews

Repository Details