  • Stars: 141
  • Rank: 259,971 (Top 6%)
  • Language: Rust
  • License: MIT License
  • Created over 4 years ago
  • Updated about 2 years ago

Repository Details

A high performance tool for summarizing large directories or drives

Dirscan

Dirscan is a high-performance tool for quickly inspecting the contents of huge (possibly networked) disks. It provides a summary of every single directory on a given disk, complete with the number of files within, their total size, and the latest time a file was created, accessed or modified.

It's designed for disks that are too large to inspect with traditional tools, and it:

  • Is many orders of magnitude faster than tools like du, find or tree.
  • Can max out any disk you give it, assuming you have enough CPU resources to keep up.
  • Produces simple JSON or CSV output that can be analysed by the built-in viewer or other tools.
  • Supports a customisable number of threads.
  • Streams results to the output file, keeping memory usage relatively constant regardless of disk size.

Install 💿

Homebrew (macOS + Linux)

brew tap orf/brew, then brew install dirscan

Binaries (Windows)

Download the latest release from the GitHub releases page. Extract it and move it to a directory on your PATH.

Cargo

For optimal performance run cargo install dirscan

Docker

This project is packaged as a Docker container as tomforbes/dirscan.

Running docker run -vYOUR_DIRECTORY:/dir tomforbes/dirscan scan /dir will scan YOUR_DIRECTORY.

Usage 🎷

Scan a directory

You can start scanning a directory by executing:

dirscan scan [PATH] --output=[OUTPUT]

This will scan [PATH] and write all results, in JSON format, to [OUTPUT]. By default it uses a thread pool with 2 * number_of_cores threads, but you can customize this. Depending on your disk speed, changing the number of threads can drastically improve performance:

dirscan scan [PATH] --output=[OUTPUT] --threads=20
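
As a starting point for tuning, the documented default of 2 * number_of_cores can be worked out in the shell and then scaled. This is only a sketch: threads_for is a hypothetical helper, not part of dirscan, and the multiplier of 4 is an arbitrary example value rather than a recommendation.

```shell
# Hypothetical helper (not part of dirscan): thread count for a given number
# of cores ($1) and multiplier ($2). The multiplier defaults to 2, which
# mirrors the documented default of 2 * number_of_cores.
threads_for() {
    echo $(( $1 * ${2:-2} ))
}

# Number of online CPUs; getconf is available on Linux and macOS.
cores=$(getconf _NPROCESSORS_ONLN)

echo "documented default: $(threads_for "$cores")"
echo "4x cores:           $(threads_for "$cores" 4)"
```

The printed value can then be passed via the --threads flag shown above. Whether more threads actually help depends on the disk, so it is worth timing a few values.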

You can also output the results in CSV:

dirscan scan [PATH] --output=[OUTPUT] --format=csv

For example:

$ dirscan scan ~/ --output=output.json --threads=20
[00:00:15] Files/s: 17324/s | Total: 258734 | Size: 99.01GB | Components: 14291 | Errors: IO=0 Other=36

Stream results

You can stream all files to stdout by executing:

dirscan stream [PATH]

If you wanted to remove all files on a disk in parallel, you could create a pipeline like:

dirscan stream /my-dir | xargs -d '\n' -L10 -P500 rm

This would launch up to 500 rm processes at a time, each deleting up to 10 files.
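
Before pointing a destructive pipeline at real data, it can help to preview how xargs would group the paths. The following is a minimal dry-run sketch, not part of dirscan: preview_batches is a hypothetical helper, printf stands in for dirscan stream /my-dir, and echo prints each rm command instead of running it. It converts newlines to NUL bytes and uses -0 with -n, which is supported by both GNU and BSD xargs (-d is a GNU extension), and it leaves out -P so the output order is stable.

```shell
# Hypothetical helper: read newline-separated paths on stdin and print the
# rm command for each batch of at most $1 paths. Nothing is deleted.
preview_batches() {
    tr '\n' '\0' | xargs -0 -n "$1" echo rm --
}

# printf stands in for `dirscan stream /my-dir` in this sketch.
printf '%s\n' '/my-dir/a' '/my-dir/b b' '/my-dir/c' | preview_batches 2
# prints:
#   rm -- /my-dir/a /my-dir/b b
#   rm -- /my-dir/c
```

The path containing a space is still passed as a single argument because the delimiter is NUL rather than whitespace; echo simply does not show the quoting.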

Inspect results

Once a scan is complete you can inspect the output using:

dirscan parse [OUTPUT]

For example:

$ dirscan parse output.json --prefix=/System/
[00:00:02] Total: 580000 | Per sec: 220653/s
+----------------------+---------+----------+-------------+-------------+-------------+
| Prefix               | Files   | Size     | created     | accessed    | modified    |
+----------------------+---------+----------+-------------+-------------+-------------+
| /System/Applications | 57304   | 777.28MB | 2 weeks ago | 2 weeks ago | 2 weeks ago |
| /System/DriverKit    | 55      | 5.09MB   | 2 weeks ago | 2 weeks ago | 2 weeks ago |
| /System/Library      | 292190  | 13.56GB  | 7 hours ago | 1 hour ago  | 7 hours ago |
| /System/Volumes      | 1468296 | 197.93GB | 1 hour ago  | 1 hour ago  | 1 hour ago  |
| /System/iOSSupport   | 13856   | 600.20MB | 2 weeks ago | 2 weeks ago | 2 weeks ago |
+----------------------+---------+----------+-------------+-------------+-------------+

You can include more directories with the --depth flag, or change the prefix search with --prefix.

You can also order the results by name (the default), size or files:

$ dirscan parse output.json --prefix=/System/ --sort=size
[00:00:02] Total: 580000 | Per sec: 220653/s
+----------------------+---------+----------+-------------+-------------+-------------+
| Prefix               | Files   | Size     | created     | accessed    | modified    |
+----------------------+---------+----------+-------------+-------------+-------------+
| /System/Volumes      | 1468296 | 197.93GB | 2 hours ago | 2 hours ago | 2 hours ago |
| /System/Library      | 292190  | 13.56GB  | 7 hours ago | 2 hours ago | 7 hours ago |
| /System/Applications | 57304   | 777.28MB | 2 weeks ago | 2 weeks ago | 2 weeks ago |
| /System/iOSSupport   | 13856   | 600.20MB | 2 weeks ago | 2 weeks ago | 2 weeks ago |
| /System/DriverKit    | 55      | 5.09MB   | 2 weeks ago | 2 weeks ago | 2 weeks ago |
+----------------------+---------+----------+-------------+-------------+-------------+

More Repositories

1

gping

Ping, but with a graph
Rust
10,623
star
2

html-query

jq, but for HTML
HTML
622
star
3

simple

Simple is a clone of Obtvse written in Python running on Flask.
CSS
505
star
4

xcat

XPath injection tool
Python
355
star
5

cyborg

Python web scraping framework
Python
315
star
6

django-debug-toolbar-template-timings

A django-debug-toolbar panel that displays template rendering times for your Django application
Python
296
star
7

git-workspace

Sync personal and work git repositories from multiple providers 🚀
Rust
281
star
8

inliner

Automagically inline python methods
Python
102
star
9

cargo-bloat-action

Track rust binary sizes across builds using Github Actions
TypeScript
96
star
10

wordinserter

Insert HTML or Markdown into a Word document
Python
82
star
11

bare-hugo-theme

A Hugo theme based on Bulma.io
HTML
72
star
12

datatables

SQLAlchemy->Datatables
Python
52
star
13

ptail

Stream and display a fixed number of lines from a process's output.
Rust
49
star
14

human_id

Human readable IDs, in Python
Python
44
star
15

MovieFinder

A basic movie recommendation site built using Python, Flask, SQLAlchemy and Backbone.js
JavaScript
31
star
16

ripgrep-structured

Ripgrep over structured data
Rust
24
star
17

crontabula

Parse crontab expressions with Python
Python
23
star
18

websocket_stdout_example

Use websockets with Twisted's ProcessProtocol
Python
21
star
19

django-docker-box

See https://github.com/django/django-docker-box
Python
21
star
20

xcat_app

An XPath injection demonstration application
Java
20
star
21

django-choice-object

A choice object for Django
Python
17
star
22

spam

A tool to graph who has sent you the most emails
Python
17
star
23

HtmlToWord

Render HTML to a specific portion of a word document using Python and PyWin32
Python
16
star
24

dotfiles

My dotfiles.
Nushell
14
star
25

cel-rust-original

Rust
13
star
26

pytest-scrutinize

Find bottlenecks in your test suites
Python
12
star
27

xpath-expressions

Treat XPath expressions as Python objects
Python
11
star
28

petal

🌺 Petal - Flask, for gRPC services.
Python
11
star
29

TinyLink

Small link-shortening service written in Django
JavaScript
10
star
30

CTF

Simple capture the flag web application
JavaScript
9
star
31

django-github-actions

Github actions PoC for Django
Python
7
star
32

pinger

Archived: Now part of https://github.com/orf/gping
Rust
7
star
33

uni_timetables

A quick timetabling application written in Python using Flask
JavaScript
6
star
34

cvsslib

A library implementing CVSS v2 and v3 scores
Python
6
star
35

aio-pipes

Asynchronous pipes in Python
Python
6
star
36

hnewssimulator

Hacker news simulator using Markov chains. Very messy at the moment.
Python
6
star
37

alfred-quip-workflow

Fulltext, local Quip document search
Python
6
star
38

deterministic-zip

Deterministic zipfiles, with Rust
Rust
5
star
39

pyvector

https://vector.dev/ embedded inside Python
Rust
5
star
40

django-performance-metrics

Python
5
star
41

alfred-pycharm

Quickly open Pycharm projects via Alfred
Python
4
star
42

s3-deletion-visualizer

Rust
4
star
43

howslow_django

4
star
44

hncat

Grab all Hacker News stories + comments, quickly.
Rust
3
star
45

redis-parser

Rust
3
star
46

watchman-client

Python
3
star
47

apple-music-importer

Import your Library.xml file into Apple Music
TypeScript
3
star
48

digest

Simple RSS digester
2
star
49

pypaper

A windows desktop background manager written in Python
Python
2
star
50

Gmail-dumper

Dump Gmail inboxes
Python
2
star
51

cargo-bloat-backend

Python
2
star
52

blog-hugo

My blog!
CSS
2
star
53

logbot

Logbot tails local log files to an IRC channel.
Python
2
star
54

homebrew-brew

Personal homebrew things
Ruby
1
star
55

workaround

Python
1
star
56

Facebook-link-stats

Half finished facebook application that would track links shared on facebook.
Python
1
star
57

vulnerable_website

A vulnerable website I made for a presentation
CSS
1
star
58

wow_economy

World of Warcraft auction price average thing.
Python
1
star
59

FindMeChicken-mono

C#
1
star
60

trend

Simple terminal graphs
Rust
1
star
61

proximity-db

euclidean distance calculations, fast.
Rust
1
star
62

circleci-inspector

Python
1
star
63

Wikipedia-XML-Processor

Wikipedia XML Processor
C#
1
star
64

presentations

Presentations I've given since 2019
Shell
1
star
65

ripgrep-stream

Rust
1
star