  • Stars: 153
  • Rank: 243,368 (Top 5 %)
  • Language: Python
  • License: Other
  • Created almost 12 years ago
  • Updated over 4 years ago


Repository Details

Language Savant, Python clone of github/linguist.

Linguist

Installation

PIP

pip install linguist

Easy_install

easy_install linguist

Features

Language detection

Linguist defines the list of all known languages in a YAML file. For a file to be highlighted, both a language and a lexer must be defined there.

Most languages are detected by their file extension. This is the fastest and most common situation.
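A minimal sketch of extension-based lookup, assuming a hypothetical in-memory `EXTENSIONS` table (Linguist's real mapping is loaded from its languages YAML file and is much larger). When a lookup returns more than one candidate, the extension alone is not enough and the file needs further disambiguation.

```python
import os

# Hypothetical excerpt of an extension table; not Linguist's actual data.
EXTENSIONS = {
    ".py": ["Python"],
    ".js": ["JavaScript"],
    ".coffee": ["CoffeeScript"],
    ".h": ["C", "C++", "Objective-C"],
}

def candidate_languages(filename):
    """Return every language registered for the file's extension."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSIONS.get(ext, [])

print(candidate_languages("test.py"))   # ['Python']
print(candidate_languages("header.h"))  # ['C', 'C++', 'Objective-C'] (ambiguous)
```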

To disambiguate between files that share a common extension, we use a Bayesian classifier. For example, this helps us tell whether a .h file is C, C++, or Objective-C.
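The idea behind the Bayesian step can be sketched as a multinomial naive Bayes classifier over source tokens. This is an illustrative sketch, not Linguist's actual classifier: the tokenizer regex, the `NaiveBayes` class, and the add-one smoothing are assumptions, and a real model would be trained on many sample files per language.

```python
import math
import re
from collections import Counter, defaultdict

# Identifiers (including @keywords and #directives) plus common punctuation.
TOKEN_RE = re.compile(r"[A-Za-z_@#][A-Za-z0-9_]*|[{}();<>:\[\]]")

def tokenize(source):
    return TOKEN_RE.findall(source)

class NaiveBayes:
    def __init__(self):
        self.token_counts = defaultdict(Counter)  # language -> token -> count
        self.doc_counts = Counter()               # language -> training samples
        self.vocab = set()

    def train(self, language, source):
        tokens = tokenize(source)
        self.token_counts[language].update(tokens)
        self.doc_counts[language] += 1
        self.vocab.update(tokens)

    def classify(self, source, candidates=None):
        """Pick the most probable language among trained candidates."""
        candidates = [c for c in (candidates or self.doc_counts) if self.doc_counts[c]]
        total_docs = sum(self.doc_counts[c] for c in candidates)
        tokens = tokenize(source)
        scores = {}
        for lang in candidates:
            counts = self.token_counts[lang]
            total = sum(counts.values())
            # log prior + log likelihood of each token, with add-one smoothing
            score = math.log(self.doc_counts[lang] / float(total_docs))
            for tok in tokens:
                score += math.log((counts[tok] + 1.0) / (total + len(self.vocab)))
            scores[lang] = score
        return max(scores, key=scores.get)

clf = NaiveBayes()
clf.train("C", '#include <stdio.h>\nint main(void) { printf("hi"); return 0; }')
clf.train("C++", "#include <iostream>\nclass Foo { public: std::string name; };\n"
                 "template <typename T> class Bar {};")
clf.train("Objective-C", "#import <Foundation/Foundation.h>\n@interface Foo : NSObject\n@end")
print(clf.classify("@interface Bar : NSObject\n@property int x;\n@end"))  # Objective-C
```

In practice the candidate list would come from the extension lookup, so the classifier only has to choose among languages that share the extension.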

For testing, there is a simple FileBlob API:

from linguist.libs.file_blob import FileBlob

FileBlob('test.py').language.name #=> 'Python'

FileBlob('test_file').language.name #=> 'Python'
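The README does not say how `test_file`, which has no extension, is recognized as Python. One common approach is to read the interpreter from a shebang line; the sketch below assumes that approach, and both the `INTERPRETERS` table and `language_from_shebang` are hypothetical names, not part of linguist's API.

```python
import os
import re

# Hypothetical interpreter table; not linguist's actual data.
INTERPRETERS = {"python": "Python", "ruby": "Ruby", "node": "JavaScript",
                "sh": "Shell", "bash": "Shell"}

SHEBANG_RE = re.compile(r"^#!\s*(\S+)(?:\s+(\S+))?")

def language_from_shebang(first_line):
    """Guess a language from a '#!' line, or return None."""
    match = SHEBANG_RE.match(first_line)
    if not match:
        return None
    interpreter = os.path.basename(match.group(1))
    # "#!/usr/bin/env python" names the real interpreter in the second word
    if interpreter == "env" and match.group(2):
        interpreter = match.group(2)
    # Strip version suffixes: "python2.7" -> "python"
    interpreter = re.sub(r"[\d.]+$", "", interpreter)
    return INTERPRETERS.get(interpreter)

print(language_from_shebang("#!/usr/bin/env python"))  # Python
```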

See linguist/libs/language.py and lib/linguist/languages.yml.

Syntax Highlighting

The actual syntax highlighting is handled by Pygments, which also provides a Lexer abstraction that determines which highlighter should be used for a file.
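For illustration, this is how Pygments itself picks a lexer from a file name and renders HTML. It uses Pygments' public API directly and is not a call into linguist; the file name and source string are made up.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_for_filename

source = "def greet(name):\n    return 'Hello, ' + name\n"

# Pygments chooses a lexer from the file name (the code helps break ties).
lexer = get_lexer_for_filename("greet.py", source)
html = highlight(source, lexer, HtmlFormatter())

print(lexer.name)  # Python
```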

Stats

The Language Graph you see on every repository is built by aggregating the languages of all the repository's blobs.
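A minimal sketch of that aggregation, assuming each blob is reduced to a hypothetical `(language, size, vendored, generated)` tuple. Skipping vendored and generated blobs mirrors the exclusions described later in this README, but the exact rules linguist applies are an assumption here.

```python
from collections import defaultdict

def aggregate_languages(blobs):
    """Sum blob sizes per language, skipping vendored, generated,
    and unrecognized blobs."""
    totals = defaultdict(int)
    for language, size, vendored, generated in blobs:
        if language is None or vendored or generated:
            continue
        totals[language] += size
    return totals

stats = aggregate_languages([
    ("Python", 1200, False, False),
    ("Python", 800, False, False),
    ("JavaScript", 90000, True, True),  # e.g. a vendored jquery.min.js
    ("JavaScript", 300, False, False),
    (None, 50, False, False),           # unrecognized file
])
print(dict(stats))  # {'Python': 2000, 'JavaScript': 300}
```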

The repository stats API can be used on a directory:

from linguist.libs.repository import Repository

project = Repository.from_directory(".")

project.language.name #=> 'Python'

project.languages #=> defaultdict(<type 'int'>, {<Language name:Python>: 53446, <Language name:JavaScript>: 1991})

# project.languages maps each Language object to its count
for lang, count in project.languages.items():
    print("%s, %d" % (lang.name, count))
#=> Python, 53446
#=> JavaScript, 1991

These stats are also printed by the pylinguist command-line script. Try running pylinguist [dir_path|file_path]:

$ pylinguist ~/douban/proj/code/
60.8% JavaScript
39.1% Python
0.1% Shell

$ pylinguist static/js/lib/jquery.min.js
static/js/lib/jquery.min.js: 2 lines (2 sloc)
  type: Text
  language: JavaScript
  appears to be generated source code
  appears to be a vendored file

$ pylinguist config.py
config.py: 34 lines (23 sloc)
  type: Text
  language: Python
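The percentage summary and the line counts above can be sketched as follows. Both functions are hypothetical illustrations: treating SLOC as "non-blank lines" is an assumption, and the sample counts are taken from the `project.languages` example, not from the `pylinguist` run shown.

```python
def percentages(totals):
    """Convert per-language sizes into percentage shares, largest first."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    shares = [(lang, 100.0 * size / grand_total) for lang, size in totals.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)

def line_counts(text):
    """Return (lines, sloc), counting non-blank lines as SLOC."""
    lines = text.splitlines()
    return len(lines), sum(1 for line in lines if line.strip())

for lang, share in percentages({"Python": 53446, "JavaScript": 1991}):
    print("%.1f%% %s" % (share, lang))
# 96.4% Python
# 3.6% JavaScript
```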

Ignore vendored files

Checking other people's code into your Git repository is common practice, but it often inflates your project's language stats and may even cause your project to be labeled as another language. We are able to identify some of these files and directories and exclude them.

from linguist.libs.file_blob import FileBlob

FileBlob('static/js/jquery-2.0.0.min.js').is_vendored #=> True

See BlobHelper#is_vendored and linguist/libs/vendor.yml.
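A minimal sketch of path-based vendoring detection, assuming the patterns are regular expressions matched against the file path. The `VENDOR_PATTERNS` list below is a hypothetical excerpt written for illustration; the contents of the real vendor.yml may differ.

```python
import re

# Hypothetical patterns in the spirit of vendor.yml.
VENDOR_PATTERNS = [
    r"(^|/)vendor/",
    r"(^|/)node_modules/",
    r"(^|/)jquery[^/]*\.js$",
    r"\.min\.(js|css)$",
]
VENDOR_RE = re.compile("|".join("(?:%s)" % p for p in VENDOR_PATTERNS))

def is_vendored(path):
    """True if any vendor pattern matches somewhere in the path."""
    return VENDOR_RE.search(path) is not None

print(is_vendored("static/js/jquery-2.0.0.min.js"))  # True
```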

Generated file detection

from linguist.libs.file_blob import FileBlob

FileBlob('jquery-2.0.0.min.js').is_generated #=> True
FileBlob('app.coffee').is_generated #=> True

See Generated#is_generated.
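One heuristic for spotting minified JavaScript is an unusually long average line. The sketch below assumes such a check; `is_minified_js` is a hypothetical helper and the 110-character threshold is an assumed value, not one confirmed for this project.

```python
import os

def is_minified_js(path, text, threshold=110):
    """Flag a .js file whose average line length exceeds `threshold`."""
    if os.path.splitext(path)[1].lower() != ".js":
        return False
    lines = text.splitlines()
    if not lines:
        return False
    average = sum(len(line) for line in lines) / float(len(lines))
    return average > threshold

print(is_minified_js("jquery-2.0.0.min.js", "var a=1;" * 50))  # True
```

A real generated-file check would combine several signals like this (file name, first-line markers, line length), since no single heuristic is reliable on its own.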

Contributing

* Fork the repository.
* Create a topic branch.
* Implement your feature or bug fix.
* Add, commit, and push your changes.
* Submit a pull request.

Testing

cd tests/
python run.py

Changelog

v0.1.1 [2014-11-03]

  • Updated the required Pygments version

v0.1.0 [2013-11-19]

  • Better performance: created and now require scanner
  • Synced with the latest version of github/linguist
  • Use MIME types: created and now require mime
  • Compatible with GitHub's custom lexers: created and now require pygments-github-lexers

v0.0.3 [2013-05-20]

  • Bugfix: ignore dir if dir.startswith('.')

v0.0.2 [2013-04-25]

  • Added script pylinguist
  • Disabled detection of files with unknown extensions
  • Bugfix: blob SLOC count
  • Added some unit tests

v0.0.1 [2013-04-22]

  • Release v0.0.1

More Repositories

  • DOUAudioStreamer (Objective-C, 2,768 stars): A Core Audio based streaming audio player for iOS and macOS
  • dpark (Python, 2,691 stars): Python clone of Spark, a MapReduce-like framework in Python
  • code (CSS, 1,811 stars): [DEPRECATED] Douban CODE
  • beansdb (C, 870 stars): Archived, see GoBeansDB instead.
  • douban-client (Python, 744 stars): Python client library for Douban APIs (OAuth 2.0)
  • rexxar-android (Java, 667 stars): Mobile Hybrid Framework Rexxar Android Container
  • rexxar-ios (Objective-C, 578 stars): Mobile Hybrid Framework Rexxar iOS Container
  • FRDIntent (Swift, 492 stars): A framework for handling calls between view controllers in iOS
  • gobeansdb (Go, 451 stars): Distributed object storage server from Douban Inc.
  • libmc (C++, 442 stars): Fast and light-weight memcached client for C++ / #python / #golang #libmc
  • greenify (C, 427 stars): Make blocking C library work with gevent
  • ynm3k (JavaScript, 410 stars): UI Automation + YUItest driven acceptance tests that can be hooked into Jenkins
  • paracel (C++, 337 stars): Distributed training framework with parameter server
  • douban-objc-client (Objective-C, 254 stars): Objective-C client library for Douban APIs (OAuth 2.0)
  • beanseye (Go, 233 stars): Proxy and monitor for beansdb in Go
  • rexxar-web (JavaScript, 206 stars): Mobile Hybrid Framework Rexxar Web SDK
  • Kenshin (Python, 206 stars): Kenshin: A time-series database alternative to Graphite Whisper with 40x improvement in IOPS
  • tfmesos (Python, 191 stars): Tensorflow in Docker on Mesos #tfmesos #tensorflow #mesos
  • pymesos (Python, 163 stars): A pure Python implementation of Mesos scheduler and executor
  • brownant (Python, 159 stars): Brownant is a web data extracting framework.
  • graph-index (Python, 129 stars): index of Graphite & Diamond
  • CaoE (Python, 101 stars): Kill all children processes when the parent dies
  • douban-quixote (Python, 82 stars): Douban's Quixote
  • douban-utils (Python, 59 stars): Douban's Utils
  • python-libmemcached (Python, 57 stars): DEPRECATED, use https://github.com/douban/libmc instead. python-libmemcached is a Python extension for libmemcached
  • PyCharlockHolmes (Common Lisp, 50 stars): Character encoding detecting library for Python using ICU and libmagic.
  • DOUSNSSharing (Objective-C, 47 stars): SNS OAuth 2 binding and sharing
  • ellen (Python, 41 stars): Ellen is a wrapper of pygit2 and git command.
  • Polymorph (Objective-C, 40 stars): Transform value of dictionary to property of Objective-C class, by using a `dynamic` like directive.
  • douban-sqlstore (Python, 31 stars): Douban's MySQL lib.
  • gpack (Python, 30 stars): GIT Smart HTTP Server Rack Implementation, Python clone of https://github.com/schacon/grack
  • douban-orz (Python, 29 stars): The Missing Data Manager In Douban
  • douban-mc (Python, 27 stars): Douban's Memcached lib for Python.
  • charts (Smarty, 24 stars): Helm charts from douban
  • helpdesk (Python, 22 stars): Yet another helpdesk based on multiple providers
  • sina (Python, 21 stars): A GIT Smart HTTP Server WSGI Implementation.
  • sa-tools-core (Python, 18 stars): Handy tools for sysadmin.
  • graphite-kenshin (Python, 16 stars): A plugin for using graphite-web with the kenshin-based storage backend.
  • gobeansproxy (Go, 13 stars): A proxy for GoBeansDB
  • beansdbadmin (Python, 9 stars): GoBeansDB Admin UI
  • redarrow-rs (Rust, 6 stars): A command dispatcher to run executables remotely and safely.
  • MTURLProtocol (Objective-C, 4 stars): Multiple NSURLProtocol subclasses alternative solution.
  • python-libmagic (Python, 3 stars): A wrapper for libmagic with static build.
  • qiniu-exporter (Go, 2 stars)
  • aliyun-exporter (Go, 2 stars)
  • pyquicklz (C, 1 star)
  • upyun-exporter (Go, 1 star)
  • sa-tools-go (Go, 1 star): go version for sa-tools