• Stars
    star
    10,684
  • Rank 3,208 (Top 0.07 %)
  • Language
    Ruby
  • Created over 13 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!

Linguist

Actions Status

This library is used on GitHub.com to detect blob languages, ignore binary or vendored files, suppress generated files in diffs, and generate language breakdown graphs.

Documentation

Installation

Install the gem:

gem install github-linguist

Dependencies

Linguist is a Ruby library so you will need a recent version of Ruby installed. There are known problems with the macOS/Xcode supplied version of Ruby that causes problems installing some of the dependencies. Accordingly, we highly recommend you install a version of Ruby using Homebrew, rbenv, rvm, ruby-build, asdf or other packaging system, before attempting to install Linguist and the dependencies.

Linguist uses charlock_holmes for character encoding and rugged for libgit2 bindings for Ruby. These components have their own dependencies.

  1. charlock_holmes
  2. rugged

You may need to install missing dependencies before you can install Linguist. For example, on macOS with Homebrew:

brew install cmake pkg-config icu4c

On Ubuntu:

sudo apt-get install build-essential cmake pkg-config libicu-dev zlib1g-dev libcurl4-openssl-dev libssl-dev ruby-dev

Usage

Application usage

Linguist can be used in your application as follows:

require 'rugged'
require 'linguist'

repo = Rugged::Repository.new('.')
project = Linguist::Repository.new(repo, repo.head.target_id)
project.language       #=> "Ruby"
project.languages      #=> { "Ruby" => 119387 }

Command line usage

Git Repository

A repository's languages stats can also be assessed from the command line using the github-linguist executable. Without any options, github-linguist will output the language breakdown by percentage and file size.

cd /path-to-repository
github-linguist

You can try running github-linguist on the root directory in this repository itself:

$ github-linguist
66.84%  264519     Ruby
24.68%  97685      C
6.57%   25999      Go
1.29%   5098       Lex
0.32%   1257       Shell
0.31%   1212       Dockerfile

Additional options

--rev REV

The --rev REV flag will change the git revision being analyzed to any gitrevisions(1) compatible revision you specify.

This is useful to analyze the makeup of a repo as of a certain tag, or in a certain branch.

For example, here is the popular Jekyll open source project.

$ github-linguist jekyll

70.64%  709959     Ruby
23.04%  231555     Gherkin
3.80%   38178      JavaScript
1.19%   11943      HTML
0.79%   7900       Shell
0.23%   2279       Dockerfile
0.13%   1344       Earthly
0.10%   1019       CSS
0.06%   606        SCSS
0.02%   234        CoffeeScript
0.01%   90         Hack

And here is Jekyll's published website, from the gh-pages branch inside their repository.

$ github-linguist jekyll --rev origin/gh-pages
100.00% 2568354    HTML
--breakdown

The --breakdown or -b flag will additionally show the breakdown of files by language.

You can try running github-linguist on the root directory in this repository itself:

$ github-linguist --breakdown
66.84%  264519     Ruby
24.68%  97685      C
6.57%   25999      Go
1.29%   5098       Lex
0.32%   1257       Shell
0.31%   1212       Dockerfile

Ruby:
Gemfile
Rakefile
bin/git-linguist
bin/github-linguist
ext/linguist/extconf.rb
github-linguist.gemspec
lib/linguist.rb
โ€ฆ
--json

The --json or -j flag output the data into JSON format.

$ github-linguist --json
{"Dockerfile":{"size":1212,"percentage":"0.31"},"Ruby":{"size":264519,"percentage":"66.84"},"C":{"size":97685,"percentage":"24.68"},"Lex":{"size":5098,"percentage":"1.29"},"Shell":{"size":1257,"percentage":"0.32"},"Go":{"size":25999,"percentage":"6.57"}}

This option can be used in conjunction with --breakdown to get a full list of files along with the size and percentage data.

$ github-linguist --breakdown --json
{"Dockerfile":{"size":1212,"percentage":"0.31","files":["Dockerfile","tools/grammars/Dockerfile"]},"Ruby":{"size":264519,"percentage":"66.84","files":["Gemfile","Rakefile","bin/git-linguist","bin/github-linguist","ext/linguist/extconf.rb","github-linguist.gemspec","lib/linguist.rb",...]}}

Single file

Alternatively you can find stats for a single file using the github-linguist executable.

You can try running github-linguist on files in this repository itself:

$ github-linguist grammars.yml
grammars.yml: 884 lines (884 sloc)
  type:      Text
  mime type: text/x-yaml
  language:  YAML

Docker

If you have Docker installed you can build an image and run Linguist within a container:

$ docker build -t linguist .
$ docker run --rm -v $(pwd):$(pwd) -w $(pwd) -t linguist
66.84%  264519     Ruby
24.68%  97685      C
6.57%   25999      Go
1.29%   5098       Lex
0.32%   1257       Shell
0.31%   1212       Dockerfile
$ docker run --rm -v $(pwd):$(pwd) -w $(pwd) -t linguist github-linguist --breakdown
66.84%  264519     Ruby
24.68%  97685      C
6.57%   25999      Go
1.29%   5098       Lex
0.32%   1257       Shell
0.31%   1212       Dockerfile

Ruby:
Gemfile
Rakefile
bin/git-linguist
bin/github-linguist
ext/linguist/extconf.rb
github-linguist.gemspec
lib/linguist.rb
โ€ฆ

Contributing

Please check out our contributing guidelines.

License

The language grammars included in this gem are covered by their repositories' respective licenses. vendor/README.md lists the repository for each grammar.

All other files are covered by the MIT license, see LICENSE.

More Repositories

1

gitignore

A collection of useful .gitignore templates
160,684
star
2

copilot-docs

Documentation for GitHub Copilot
23,229
star
3

docs

The open-source repo for docs.github.com
JavaScript
14,053
star
4

opensource.guide

๐Ÿ“š Community guides for open source creators
HTML
12,947
star
5

gh-ost

GitHub's Online Schema-migration Tool for MySQL
Go
11,302
star
6

semantic

Parsing, analyzing, and comparing source code across many languages
Haskell
8,865
star
7

copilot.vim

Neovim plugin for GitHub Copilot
Vim Script
8,286
star
8

codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
CodeQL
7,579
star
9

roadmap

GitHub public roadmap
7,393
star
10

scientist

๐Ÿ”ฌ A Ruby library for carefully refactoring critical paths.
Ruby
7,389
star
11

personal-website

Code that'll help you kickstart a personal website that showcases your work as a software developer.
HTML
7,243
star
12

markup

Determines which markup library to use to render a content file (e.g. README) on GitHub
Ruby
5,678
star
13

dmca

Repository with text of DMCA takedown notices as received. GitHub does not endorse or adopt any assertion contained in the following notices. Users identified in the notices are presumed innocent until proven guilty. Additional information about our DMCA policy can be found at
DIGITAL Command Language
5,457
star
14

swift-style-guide

**Archived** Style guide & coding conventions for Swift projects
4,770
star
15

gemoji

Emoji images and names.
Ruby
4,280
star
16

training-kit

Open source courseware for Git and GitHub
HTML
4,247
star
17

explore

Community-curated topic and collection pages on GitHub
Ruby
3,840
star
18

mona-sans

Mona Sans, a variable font from GitHub
3,680
star
19

hubot-scripts

DEPRECATED, see https://github.com/github/hubot-scripts/issues/1113 for details - optional scripts for hubot, opt in via hubot-scripts.json
CoffeeScript
3,538
star
20

choosealicense.com

A site to provide non-judgmental guidance on choosing a license for your open source project
Ruby
3,379
star
21

git-sizer

Compute various size metrics for a Git repository, flagging those that might cause problems
Go
3,160
star
22

secure_headers

Manages application of security headers with many safe defaults
Ruby
3,104
star
23

gov-takedowns

Text of government takedown notices as received. GitHub does not endorse or adopt any assertion contained in the following notices.
3,088
star
24

archive-program

The GitHub Archive Program & Arctic Code Vault
3,000
star
25

scripts-to-rule-them-all

Set of boilerplate scripts describing the normalized script pattern that GitHub uses in its projects.
Shell
2,859
star
26

hotkey

Trigger an action on an element with a keyboard shortcut.
JavaScript
2,851
star
27

relative-time-element

Web component extensions to the standard <time> element.
JavaScript
2,799
star
28

janky

Continuous integration server built on top of Jenkins and Hubot
Ruby
2,759
star
29

github-elements

GitHub's Web Component collection.
JavaScript
2,523
star
30

renaming

Guidance for changing the default branch name for GitHub repositories
2,408
star
31

view_component

A framework for building reusable, testable & encapsulated view components in Ruby on Rails.
Ruby
2,370
star
32

VisualStudio

GitHub Extension for Visual Studio
C#
2,365
star
33

glb-director

GitHub Load Balancer Director and supporting tooling.
C
2,255
star
34

SoftU2F

Software U2F authenticator for macOS
Swift
2,201
star
35

accessibilityjs

Client side accessibility error scanner.
JavaScript
2,180
star
36

CodeSearchNet

Datasets, tools, and benchmarks for representation learning of code.
Jupyter Notebook
2,155
star
37

balanced-employee-ip-agreement

GitHub's employee intellectual property agreement, open sourced and reusable
2,126
star
38

github-services

Legacy GitHub Services Integration
Ruby
1,902
star
39

platform-samples

A public place for all platform sample projects.
Shell
1,885
star
40

hubot-sans

Hubot Sans, a variable font from GitHub
Shell
1,832
star
41

pages-gem

A simple Ruby Gem to bootstrap dependencies for setting up and maintaining a local Jekyll environment in sync with GitHub Pages
Ruby
1,782
star
42

india

GitHub resources and information for the developer community in India
Ruby
1,769
star
43

haikus-for-codespaces

EJS
1,753
star
44

site-policy

Collaborative development on GitHub's site policies, procedures, and guidelines
1,743
star
45

government.github.com

Gather, curate, and feature stories of public servants and civic hackers using GitHub as part of their open government innovations
HTML
1,727
star
46

advisory-database

Security vulnerability database inclusive of CVEs and GitHub originated security advisories from the world of open source software.
1,711
star
47

objective-c-style-guide

**Archived** Style guide & coding conventions for Objective-C projects
1,682
star
48

covid19-dashboard

A site that displays up to date COVID-19 stats, powered by fastpages.
Jupyter Notebook
1,644
star
49

lightcrawler

Crawl a website and run it through Google lighthouse
JavaScript
1,471
star
50

rest-api-description

An OpenAPI description for GitHub's REST API
1,372
star
51

feedback

Public feedback discussions for: GitHub for Mobile, GitHub Discussions, GitHub Codespaces, GitHub Sponsors, GitHub Issues and more!
1,359
star
52

developer.github.com

GitHub Developer site
Ruby
1,314
star
53

backup-utils

GitHub Enterprise Backup Utilities
1,190
star
54

brubeck

A Statsd-compatible metrics aggregator
C
1,185
star
55

dev

Press the . key on any repo
1,184
star
56

catalyst

Catalyst is a set of patterns and techniques for developing components within a complex application.
TypeScript
1,183
star
57

codeql-action

Actions for running CodeQL analysis
TypeScript
1,152
star
58

securitylab

Resources related to GitHub Security Lab
C
1,150
star
59

opensourcefriday

๐Ÿšฒ Contribute to the open source community every Friday
HTML
1,143
star
60

graphql-client

A Ruby library for declaring, composing and executing GraphQL queries
Ruby
1,139
star
61

Rebel

Cocoa framework for improving AppKit
Objective-C
1,127
star
62

gh-actions-importer

GitHub Actions Importer helps you plan and automate the migration of Azure DevOps, Bamboo, Bitbucket, CircleCI, GitLab, Jenkins, and Travis CI pipelines to GitHub Actions.
C#
982
star
63

licensed

A Ruby gem to cache and verify the licenses of dependencies
Ruby
942
star
64

.github

Community health files for the @GitHub organization
869
star
65

swordfish

EXPERIMENTAL password management app. Don't use this.
Ruby
740
star
66

details-dialog-element

A modal dialog that's opened with <details>.
JavaScript
739
star
67

stack-graphs

Rust implementation of stack graphs
Rust
725
star
68

codeql-cli-binaries

Binaries for the CodeQL CLI
725
star
69

github-ds

A collection of Ruby libraries for working with SQL on top of ActiveRecord's connection
Ruby
667
star
70

email_reply_parser

Small library to parse plain text email content
Ruby
658
star
71

vulcanizer

GitHub's ops focused Elasticsearch library
Go
657
star
72

github-ospo

Helping open source program offices get started
641
star
73

webauthn-json

๐Ÿ” A small WebAuthn API wrapper that translates to/from pure JSON using base64url.
TypeScript
638
star
74

gh-copilot

Ask for assistance right in your terminal.
637
star
75

rubocop-github

Code style checking for GitHub's Ruby projects
Ruby
616
star
76

safe-settings

JavaScript
606
star
77

codespaces-jupyter

Explore machine learning and data science with Codespaces
Jupyter Notebook
591
star
78

dat-science

Replaced by https://github.com/github/scientist
Ruby
582
star
79

maven-plugins

Official GitHub Maven Plugins
Java
581
star
80

details-menu-element

A menu opened with <details>.
JavaScript
554
star
81

trilogy

Trilogy is a client library for MySQL-compatible database servers, designed for performance, flexibility, and ease of embedding.
C
543
star
82

freno

freno: cooperative, highly available throttler service
Go
534
star
83

smimesign

An S/MIME signing utility for use with Git
Go
519
star
84

brasil

Recursos e informaรงรตes do GitHub para a comunidade de desenvolvedores no Brasil.
Ruby
515
star
85

gh-valet

Valet helps facilitate the migration of Azure DevOps, CircleCI, GitLab CI, Jenkins, and Travis CI pipelines to GitHub Actions.
C#
511
star
86

include-fragment-element

A client-side includes tag.
JavaScript
508
star
87

covid-19-repo-data

Data archive of identifiable COVID-19 related public projects on GitHub
505
star
88

vscode-github-actions

GitHub Actions extension for VS Code
TypeScript
492
star
89

vscode-codeql-starter

Starter workspace to use with the CodeQL extension for Visual Studio Code.
CodeQL
477
star
90

how-engineering-communicates

A community version of the "common API" for how the GitHub Engineering organization communicates
474
star
91

Archimedes

Geometry functions for Cocoa and Cocoa Touch
Objective-C
466
star
92

codeql-go

The CodeQL extractor and libraries for Go.
465
star
93

open-source-survey

The Open Source Survey
431
star
94

synsanity

netfilter (iptables) target for high performance lockless SYN cookies for SYN flood mitigation
C
424
star
95

entitlements-app

The Ruby Gem that Powers Entitlements - GitHub's Identity and Access Management System
Ruby
409
star
96

MVG

MVG = Minimum Viable Governance
379
star
97

issue-metrics

Gather metrics on issues/prs/discussions such as time to first response, count of issues opened, closed, etc.
Python
378
star
98

roskomnadzor

deprecated archive โ€” moved to https://github.com/github/gov-takedowns/tree/master/Russia
376
star
99

clipboard-copy-element

Copy element text content or input values to the clipboard.
JavaScript
374
star
100

codespaces-react

JavaScript
364
star