  • Stars
    113
  • Rank 310,115 (Top 7 %)
  • Language
    Ruby
  • License
    MIT License
  • Created about 9 years ago
  • Updated 8 months ago

Repository Details

Code Climate engine for code duplication analysis

codeclimate-duplication

codeclimate-duplication is an engine that wraps flay and supports Java, Ruby, Python, JavaScript, and PHP. You can run it on the command line using the Code Climate CLI or on our hosted analysis platform.

What is duplication?

The duplication engine's algorithm can be surprising, but it's actually very simple. We have a docs page explaining the algorithm.

Installation

  1. Install the Code Climate CLI, if you haven't already.
  2. You're ready to analyze! cd into your project's folder and run codeclimate analyze. Duplication analysis is enabled by default, so you don't need to do anything else.

Configuring

Mass Threshold

We set useful threshold defaults for the languages we support but you may want to adjust these settings based on your project guidelines.

The mass threshold configuration represents the minimum "mass" a code block must have to be analyzed for duplication. If the engine reports duplication too readily, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

To adjust this setting, use the top-level checks key in your config file:

checks:
  identical-code:
    config:
      threshold: 25
  similar-code:
    config:
      threshold: 50

Note that you have to update the YAML structure under the languages key to the Hash type to support extra configuration.
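To make the idea of mass more concrete, here is a minimal Ruby sketch. It assumes that mass is roughly the number of nodes in a code block's parse tree and that a block is considered once its mass reaches the threshold. The nested-array tree format and the helper names mass and candidates are hypothetical, chosen for illustration only; they are not the engine's API.

```ruby
# Toy model only: a parse tree is a nested Array whose first element is the
# node type. "Mass" is ASSUMED here to be the number of nodes in the tree.
def mass(node)
  return 0 unless node.is_a?(Array)

  1 + node.sum { |child| mass(child) }
end

# Hypothetical helper: keep only the trees heavy enough to be compared.
def candidates(trees, threshold)
  trees.select { |tree| mass(tree) >= threshold }
end

small = [:call, [:lvar, :a], :+, [:lit, 1]]
large = [:if,
         [:call, [:lvar, :a], :>, [:lit, 1]],
         [:call, nil, :puts, [:str, "big"]],
         [:call, nil, :puts, [:str, "small"]]]

puts mass(small)                         # 3 nodes: call, lvar, lit
puts mass(large)                         # 8 nodes
puts candidates([small, large], 5).size  # only the large tree passes
```

Under this model, raising the threshold drops small blocks from consideration, and lowering it lets more of them through.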

Count Threshold

By default, the duplication engine reports code that has been duplicated in just two locations. You can be less strict by raising a warning only if code is duplicated in three or more locations. To adjust this setting, add a count_threshold key to your config. For instance, to use the default mass_threshold for Ruby but enforce the Rule of Three, you could use this configuration:

plugins:
  duplication:
    enabled: true
    config:
      languages:
        ruby:
          count_threshold: 3

You can also change the default count_threshold for all languages:

plugins:
  duplication:
    enabled: true
    config:
      count_threshold: 3
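The effect of count_threshold can be sketched in a few lines of Ruby. This is a toy model, not the engine's code: each occurrence is assumed to be a pair of a structural fingerprint and a location, and the method name duplicated_groups is hypothetical.

```ruby
# Toy model only: each occurrence is [structural_fingerprint, location].
# How the real engine fingerprints code is not covered here.
def duplicated_groups(occurrences, count_threshold: 2)
  occurrences
    .group_by { |fingerprint, _location| fingerprint }
    .transform_values { |group| group.map { |_fingerprint, location| location } }
    .select { |_fingerprint, locations| locations.size >= count_threshold }
end

occurrences = [
  ["abc", "a.rb:1"], ["abc", "b.rb:9"],
  ["xyz", "a.rb:20"], ["xyz", "c.rb:3"], ["xyz", "d.rb:7"]
]

# With the default of 2, both groups are reported.
p duplicated_groups(occurrences).keys
# With the Rule of Three, only the group seen in three places remains.
p duplicated_groups(occurrences, count_threshold: 3).keys
```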

Custom file name patterns

All engines check only appropriate files, but you can override the default set of patterns. Patterns are run against the project root directory, so you have to use ** to match files in nested directories. Also note that you have to specify all patterns, not only the one you want to add.

plugins:
  duplication:
    enabled: true
    config:
      languages:
        ruby:
          patterns:
            - "**/*.rb"
            - "**/*.rake"
            - "Rakefile"
            - "**/*.ruby"
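The following Ruby sketch shows why ** is needed for nested directories. It assumes the patterns behave like Ruby globs matched relative to the project root, and uses File.fnmatch? with File::FNM_PATHNAME to approximate that; the engine's actual matching code may differ. The helper name included? is hypothetical.

```ruby
PATTERNS = ["**/*.rb", "**/*.rake", "Rakefile", "**/*.ruby"].freeze

# Hypothetical helper: true if a root-relative path matches any pattern.
# FNM_PATHNAME stops "*" from crossing "/" and enables "**/" for directories.
def included?(path, patterns = PATTERNS)
  patterns.any? { |pattern| File.fnmatch?(pattern, path, File::FNM_PATHNAME) }
end

p included?("app/models/user.rb")  # nested file, matched via **
p included?("lib/tasks/db.rake")
p included?("Rakefile")            # matched at the project root only
p included?("lib/Rakefile")        # not matched: the pattern has no **
p included?("lib/a.rb", ["*.rb"])  # not matched: "*" does not cross "/"
```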

Python 3

By default, the Duplication engine will use a Python 2 parser. To enable analysis for Python 3 code, specify the python_version as shown in the example below. This will enable a Python 3 parser and add the .py3 file extension to the list of included file patterns.

plugins:
  duplication:
    enabled: true
    config:
      languages:
        python:
          python_version: 3

Node Filtering

Sometimes structural similarities are reported that you just don't care about. For example, the contents of arrays or hashes might have similar structures, and there's little you can do to refactor them. You can specify language-specific filters to ignore any issues that match the pattern. Here is an example that filters simple hashes and arrays:

plugins:
  duplication:
    enabled: true
    config:
      languages:
        ruby:
          filters:
            - "(hash (lit _) (str _) ___)"
            - "(array (str _) ___)"

The syntax for patterns is pretty simple. The first pattern, "(hash (lit _) (str _) ___)", specifies "a hash with a literal key, a string value, followed by anything else (including nothing)". You could also specify "(hash ___)" to ignore all hashes altogether.
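To illustrate how _ and ___ behave, here is a toy matcher in plain Ruby. It is not the sexp_processor implementation: trees and patterns are modelled as nested arrays, ___ is treated simply as "ignore the rest of this node", and the method name match? is hypothetical.

```ruby
# Toy matcher, NOT the sexp_processor implementation.
#   :_   matches any single element
#   :___ matches the rest of the current node (including nothing)
def match?(pattern, node)
  return true if pattern == :_
  return pattern == node unless pattern.is_a?(Array)
  return false unless node.is_a?(Array)

  pattern.each_with_index do |sub, i|
    return true if sub == :___
    return false if i >= node.length
    return false unless match?(sub, node[i])
  end
  pattern.length == node.length
end

# "(hash (lit _) (str _) ___)" written as a nested array
simple_hash = [:hash, [:lit, :_], [:str, :_], :___]

p match?(simple_hash, [:hash, [:lit, :a], [:str, "x"]])                 # matches
p match?(simple_hash, [:hash, [:lit, :a], [:str, "x"], [:lit, :b], 2])  # matches
p match?(simple_hash, [:hash, [:str, "k"], [:str, "v"]])                # key is not a literal
p match?([:hash, :___], [:hash, [:str, "k"], [:str, "v"]])              # any hash matches
```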

Visualizing the Parse Tree

Figuring out what to filter is tricky. codeclimate-duplication comes with a configuration option to help with the discovery. Instead of scanning your code and printing out issues for codeclimate, it prints out the parse-trees instead! Just add dump_ast: true and debug: true to your .codeclimate.yml file:

---
plugins:
  duplication:
    enabled: true
    config:
      dump_ast: true
      debug: true
      ... rest of config ...

Then run codeclimate analyze while using the debug flag to output stderr:

% CODECLIMATE_DEBUG=1 codeclimate analyze

Running that command might output something like:

Sexps for issues:

# 1) ExpressionStatement#4261258897 mass=128:

# 1.1) bogus-examples.js:5

s(:ExpressionStatement,
 :expression,
 s(:AssignmentExpression,
  :"=",
  :left,
  s(:MemberExpression,
   :object,
   s(:Identifier, :EventBlock),
   :property,
   s(:Identifier, :propTypes)),
   ... LOTS more...)
   ... even more LOTS more...)

This is the internal representation of the actual code. Assuming you've looked at those issues and have determined them not to be an issue you want to address, you can filter it by writing a pattern string that would match that tree.

Looking at the tree output again, this time flattening it out:

s(:ExpressionStatement, :expression, s(:AssignmentExpression, :"=", :left, ...) ...)

The internal representation (which is Ruby) is different from the pattern language (which is Lisp-like), so first we need to convert s(: to ( and remove all commas and colons:

(ExpressionStatement expression (AssignmentExpression "=" left ...) ...)
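This mechanical conversion can be sketched as a small string transformation in Ruby. The helper name sexp_dump_to_pattern is hypothetical, and the sketch assumes the dump contains no commas or colons inside string literals.

```ruby
# Hypothetical helper: turn a flattened Ruby sexp dump into pattern syntax by
# replacing "s(:" with "(" and dropping every remaining comma and colon.
def sexp_dump_to_pattern(dump)
  dump.gsub("s(:", "(").delete(",:").squeeze(" ")
end

flat = 's(:ExpressionStatement, :expression, s(:AssignmentExpression, :"=", :left))'
puts sexp_dump_to_pattern(flat)
# prints: (ExpressionStatement expression (AssignmentExpression "=" left))
```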

Next, we don't care about expression, so let's get rid of that by replacing it with the matcher for any single element, _:

(ExpressionStatement _ (AssignmentExpression "=" left ...) ...)

The same goes for "=" and left, but we actually don't care about the rest of the AssignmentExpression node, so let's use the matcher that'll ignore the remainder of the tree ___:

(ExpressionStatement _ (AssignmentExpression ___) ...)

And finally, we don't care about what follows in the ExpressionStatement so let's ignore the rest too:

(ExpressionStatement _ (AssignmentExpression ___) ___)

This reads: "Any ExpressionStatement node, with any value and an AssignmentExpression node with anything in it, followed by anything else". There are other ways to write a pattern to match this tree, but this is pretty clear.

Then you can add that filter to your config:

---
plugins:
  duplication:
    enabled: true
    config:
      dump_ast: true
      languages:
        javascript:
          filters:
          - "(ExpressionStatement _ (AssignmentExpression ___) ___)"

Then rerun the analyzer and figure out what the next filter should be. When you are happy with the results, remove the dump_ast config (or set it to false) to go back to normal analysis.

For more information on pattern matching, see sexp_processor, especially sexp.rb.
