
Grasshopper

A load testing tool extended from Locust.

A lightweight framework for performing load tests against an environment, primarily against an API. Grasshopper glues together Locust, pytest, a few plugins (namely the Locust InfluxDBListener), and some custom code to provide a package that makes authoring load tests simple, with very little boilerplate needed.

Grasshopper extends Locust with several key capabilities, including checks, custom trends, thresholds, tagging, and time series metric reporting; each of these is described in the sections below.

Installation

This package can be installed via pip:

pip install locust-grasshopper

Example Load Test

  • You can refer to the test test_example.py in the example directory for a basic skeleton of how to get a load test running. In the same directory, there is also an example conftest.py that will show you how to get basic parameterization working.
  • This test can be invoked by running pytest example/test_example.py in the root of this project.
  • This test can also be invoked via a YAML scenario file:
cd example
pytest example_scenarios.YAML --tags=example1

In this example scenario file, you can see how grasshopper_args, grasshopper_scenario_args, and tags are being set.
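A hypothetical sketch of what a scenario entry in that file might look like (the key names grasshopper_args, grasshopper_scenario_args, and tags come from the description above; the nesting and the values shown are illustrative, so refer to example/example_scenarios.YAML for the real format):

example1:
  tags: example1
  grasshopper_args:
    runtime: 120
    users: 1
    spawn_rate: 1
  grasshopper_scenario_args:
    foo: bar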

(back to top)

Creating a load test

When creating a new load test, the primary grasshopper function you will be using is Grasshopper.launch_test, which can be imported like so:

from grasshopper.lib.grasshopper import Grasshopper

launch_test takes a wide variety of args:

  • user_classes: User classes that the runner will run. These user classes must extend BaseJourney, which is a grasshopper class (from grasshopper.lib.journeys.base_journey import BaseJourney). This can be a single class, a list of classes, or a dictionary where the key is the class and the value is the locust weight to assign to that class.
  • **complete_configuration: In order for the test to have the correct configuration, you must pass in the kwargs provided by the complete_configuration fixture. See the example load test for how to do this properly; a minimal sketch also follows this list.
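Putting those two pieces together, a minimal launch looks roughly like the following sketch (MyJourney is just a placeholder journey class; see the full example in the next section):

from grasshopper.lib.grasshopper import Grasshopper
from grasshopper.lib.journeys.base_journey import BaseJourney

class MyJourney(BaseJourney):
    ...  # @task methods go here

def test_run_my_journey(complete_configuration):
    # pass the kwargs from the complete_configuration fixture so the test
    # picks up the merged configuration
    locust_env = Grasshopper.launch_test(MyJourney, **complete_configuration)
    return locust_env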

(back to top)

Scenario Args

  • If you want to parameterize your journey class, use the scenario_args dict. This is the proper way to pass values in from outside the journey for access by the journey code. Each journey gets its own copy on start, so the journey can safely modify its dictionary once the test is running. scenario_args exists on any journey that extends the grasshopper BaseJourney class, and it also pulls from self.defaults on initialization. For example:
import logging

from locust import between, task
from grasshopper.lib.journeys.base_journey import BaseJourney
from grasshopper.lib.grasshopper import Grasshopper

# a journey class with an example task
class ExampleJourney(BaseJourney):
    # number of seconds to wait between each task
    wait_time = between(min_wait=20, max_wait=30)
    
    # this defaults dictionary will be merged into scenario_args with lower precedence 
    # when the journey is initialized
    defaults = {
        "foo": "bar",
    }
    
    @task
    def example_task(self):
        logging.info(f'foo is `{self.scenario_args.get("foo")}`.')
        
        # aggregate all metrics for the below request under the name "get google"
        # if name is not specified, then the full url will be the name of the metric
        response = self.client.get('https://google.com', name='get google')

# the pytest test which launches the journey class
def test_run_example_journey(complete_configuration):
    # update scenario args before initialization
    ExampleJourney.update_incoming_scenario_args(complete_configuration)
    
    # launch the journey
    locust_env = Grasshopper.launch_test(ExampleJourney, **complete_configuration)
    return locust_env

(back to top)

Commonly used grasshopper pytest arguments

  • --runtime: Number of seconds to run each test. Defaults to 120.
  • --users: Max number of users that are spawned. Defaults to 1.
  • --spawn_rate: Number of users to spawn per second. Defaults to 1.
  • --shape: The name of a shape to run for the test. If you don't specify a shape or shape instance, the Default shape is used, which simply runs with the users, runtime & spawn_rate specified on the command line (or the defaults of 1 user, spawn rate 1, 120s runtime). See utils/shapes.py for available shapes and information on how to add your own shapes.
  • --scenario_file: Path to a yaml file in which you pre-define some args, e.g. --scenario_file=example/scenario_example.YAML.
  • --scenario_name: If --scenario_file was specified, the name of the scenario within that YAML file that you wish to run. Defaults to None.
  • --tags: Run all scenarios in the scenario file that match a tag expression; see "Loop through a collection of scenarios that match some tag" below.
  • --scenario_delay: Delay in seconds between scenarios. Defaults to 0.
  • --influx_host: If you want to report your performance test metrics to an InfluxDB instance, the host to send them to, e.g. 1.1.1.1. Defaults to None.
  • --influx_port: Port for your influx_host, if it is non-default.
  • --influx_user: Username for your influx_host, if you have one.
  • --influx_pwd: Password for your influx_host, if you have one.
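For example, a single run that combines several of these arguments might look like:

pytest example/test_example.py --users=10 --spawn_rate=2 --runtime=300 --influx_host=1.1.1.1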

(back to top)

Launching tests with a configuration

All in all, there are a few ways you can actually collect and pass params to a test:

Run a test with its defaults

cd example
pytest test_example.py ...

Run a test with a specific scenario

cd example
pytest test_example.py --scenario_file=example_scenarios.YAML --scenario_name=example_scenario_1 ...

Loop through a collection of scenarios that match some tag

cd example
pytest example_scenarios.YAML --tags=smoke ...

  • As shown above, this case involves passing a .YAML scenario file to pytest instead of a .py file.
  • The --scenario_file and --scenario_name args are ignored in this case.
  • The --tags arg supports AND/OR operators according to the open source tag-matcher package; see that package's documentation for more information on these operators.
  • If no --tags arg is specified, then ALL the scenarios in the .yaml file will be run.
  • If a value is given for --scenario_delay, the test will wait that many seconds between collected scenarios.
  • All scenarios are implicitly tagged with the scenario name, to support easily selecting one single scenario.

(back to top)

Configuring Grasshopper

Grasshopper adds a variety of parameters relating to performance testing along with a variety of ways to set these values.

Recent changes (>= 1.1.1) include an expanded set of sources, almost full access to all arguments via every source (with some exceptions outlined below), and the addition of some new values that will be used with integrations such as ReportPortal & Slack (these integrations are not yet implemented). These changes are backwards compatible, meaning all existing grasshopper tests should still run without modification. The original fixtures and sources for configuration are deprecated, but still produce the same behavior.

All of the usual pytest arguments also remain available.

The rest of the sections on configuration assume you are using locust-grasshopper>=1.1.1.

(back to top)

Sources

Currently, there are 5 different sources for configuration values. They are, in precedence order:

  • command line arguments
  • environment variables
  • scenario values from a scenario yaml file
  • grasshopper configuration file
  • global defaults (currently stored in code, not configurable by consumers)

Obviously, the global defaults defined by Grasshopper are not really a source for consumers to modify, but we mention them so that values don't seem to appear "out of thin air".
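For example, if the grasshopper configuration file sets runtime to 600 but --runtime=120 is passed on the command line, the command line value wins and the test runs for 120 seconds.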

(back to top)

Categories

The argument list is getting lengthy, so we've broken it down into categories. These categories are entirely for humans: better readability, understanding, and ease of use. Once they are all fully loaded by Grasshopper, they will be stored in a single GHConfiguration object (dict). By definition, every argument is in only one category and there is no overlap of keys between the categories. If the same key is supplied in multiple categories, the values are merged with the precedence order in which the categories appear below.

  • Grasshopper (scope: Session): Variables that rarely change and may span many test runs.
  • Test Run (scope: Session): Variables that may change per test run, but are the same for every scenario in the run.
  • Scenario (scope: Session): Variables that may change per scenario and are often scenario specific; includes user-defined variables that are not declared as command line arguments by Grasshopper. However, you may use pytest's addoption hook in your conftest to define them.

At least one of the sections must be present in the global configuration file, and eventually the same will be true for the configuration section of a scenario in a scenario yaml file. Categories are not used when specifying environment variables or command line options. We recommend that you use these categories in file sources, but if a variable ends up in the wrong section, it won't actually affect the configuration loading process.

(back to top)

Using Configuration Values

Your test(s) may access the complete merged set of key-value pairs via the session scoped fixture complete_configuration. This returns a GHConfiguration object (dict) which is unique to the current scenario. This value will be re-calculated for each new scenario executed.

A few perhaps not obvious notes about configuration:

  • use the environment variable convention of all-uppercase key names (e.g. RUNTIME=10) to specify a key-value pair via an environment variable
  • use the lowercase key to access a value in the GHConfiguration object (e.g. x = complete_configuration["runtime"]), regardless of the original source(s)
  • use -- before the key name to specify it on the command line (e.g. --runtime=10)
  • configure a grasshopper configuration file by creating a session-scoped fixture in your conftest.py called grasshopper_config_file_path, which returns the full path to a configuration yaml file (see the fixture below)
  • grasshopper supports thresholds specified as
    • a json string - required for an environment variable or the command line, but also accepted from other sources
    • a dict - when passing in via scenario_args (more details on that below) or via a journey class's defaults attr
import pytest

@pytest.fixture(scope="session")
def grasshopper_config_file_path():
    return "path/to/your/config/file"

An example grasshopper configuration file:

grasshopper:
  influx_host: 1.1.1.1
test_run:
  users: 1.0
  spawn_rate: 1.0
  runtime: 600
scenario:
  key1: 'value1'
  key2: 0
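As noted above, thresholds must be supplied as a JSON string when they come from an environment variable or the command line. A sketch, assuming the uppercase environment variable convention applies to the thresholds key as well (the exact variable name is an assumption, not something confirmed above):

THRESHOLDS='{"get google": {"type": "get", "limit": 4000}}' pytest test_example.py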

(back to top)

Additional Extensions to the configuration loading process

If you would like to include other environment variables that might be present in the system, you can define a fixture called extra_env_var_keys which returns a list of key names to load from the os.environ. Keys that are missing in the environment will not be included in the GHConfiguration object.
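A minimal sketch of such a fixture (the key names shown are purely illustrative, and session scope is assumed to match the other configuration fixtures):

import pytest

@pytest.fixture(scope="session")
def extra_env_var_keys():
    # names of os.environ keys to pull into the configuration, if they are set
    return ["MY_API_TOKEN", "TARGET_REGION"]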

Any environment variables that use the prefix GH_ will also be included in the GHConfiguration object. The GH_ prefix is stripped before the key is added, and any names that become zero length after the strip are discarded. This is a mechanism for including any scenario arguments you might like to pass via an environment variable.
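For example (foo here is just a hypothetical scenario argument, as in the earlier journey example):

GH_FOO=bar pytest test_example.py

Following the access convention described earlier, the value should then be available as complete_configuration["foo"], or as self.scenario_args.get("foo") inside a journey.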

In the unlikely case that you need to use a different prefix to designate scenario variables, you can define a fixture called env_var_prefix_key which returns a prefix string. The same rules apply about which values are included in the configuration.
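A sketch, under the same assumptions about fixture scope (whether the prefix includes the trailing underscore mirrors the GH_ example above):

import pytest

@pytest.fixture(scope="session")
def env_var_prefix_key():
    # treat MYPREFIX_* environment variables as scenario arguments instead of GH_*
    return "MYPREFIX_"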

(back to top)

Checks

A check is an assertion that is recorded as a metric. Checks are useful both to ensure your test is working correctly (e.g. are you getting a valid id back from some post that you sent) and to evaluate whether the load is causing intermittent failures (e.g. some percentage of workflow runs stop completing correctly as the load increases). At the end of the test, checks are aggregated by their name across all journeys that ran and then reported to the console. They are also forwarded to the DB in the "checks" table. Here is an example of using a check:

from grasshopper.lib.util.utils import check
...
response = self.client.get(
    'https://google.com', name='get google'
)
check(
    "get google responded with a 200",
    response.status_code == 200,
    env=self.environment,
)

It is worth noting that it is NOT necessary to add checks on the HTTP status codes. All the HTTP return codes are tracked automatically by grasshopper and will be sent to the DB. If you aren't using a DB, the console output of the checks may still be useful.

(back to top)

Custom Trends

Custom trends are useful when you want to time something that spans multiple HTTP calls. They are reported to the specified database just like any other HTTP request, but with the "CUSTOM" HTTP verb as opposed to "GET", "POST", etc. Here is an example of using a custom trend:

from locust import between, task
from grasshopper.lib.util.utils import custom_trend
...

@task
@custom_trend("my_custom_trend")
def google_get_journey(self):
    for i in range(10):
        response = self.client.get(
            'https://google.com', name='get google', context={'foo1':'bar1'}
        )

(back to top)

Thresholds

Thresholds are time-based and can be added to any trend, whether it is a custom trend or a request response time. By default, thresholds are evaluated against the 0.9 percentile (90th percentile) of timings. Here is an example of using a threshold:

# a journey class with an example threshold
import time

from locust import between, task
from grasshopper.lib.journeys.base_journey import BaseJourney
from grasshopper.lib.grasshopper import Grasshopper
from grasshopper.lib.util.utils import custom_trend

class ExampleJourney(BaseJourney):
    # number of seconds to wait between each task
    wait_time = between(min_wait=20, max_wait=30)

    @task
    def example_task(self):
        self.client.get("https://google.com", name="get google")

    @task
    @custom_trend("my custom trend")
    def example_task_custom_trend(self):
        time.sleep(10)

# the pytest test which launches the journey class, thresholds could be 
# parameterized here as well.
def test_run_example_journey(complete_configuration):
    ExampleJourney.update_incoming_scenario_args(complete_configuration)
    ExampleJourney.update_incoming_scenario_args({
        "thresholds": {
            "get google":
                {
                    "type": "get",
                    "limit": 4000  # 4 second HTTP response threshold
                },
            "my custom trend":
                {
                    "type": "custom",
                    "limit": 11000  # 11 second custom trend threshold
                }
        }
    })
    
    locust_env = Grasshopper.launch_test(ExampleJourney, **complete_configuration)
    return locust_env

Thresholds can also be defined for individual YAML scenarios. Refer to the thresholds key in example/example_scenarios.YAML for how to use thresholds for YAML scenarios.

After a test has concluded, trend/threshold data can be found in locust_env.stats.trends. This data is also reported to the console at the end of each test.
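For example, a quick way to inspect that data once launch_test returns (the print is just illustrative):

locust_env = Grasshopper.launch_test(ExampleJourney, **complete_configuration)
print(locust_env.stats.trends)  # trend/threshold data collected during the run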

(back to top)

Time Series DB Reporting and Tagging

When you pass a time series database host param to launch_test, such as influx_host, all metrics are automatically reported to tables within the locust time series database at that host. These tables include:

  • locust_checks: check name, check passed, etc.
  • locust_events: test started, test stopped, etc.
  • locust_exceptions: error messages
  • locust_requests: HTTP requests and custom trends

An example grafana dashboard which queries these tables can be found in example/grafana_dashboards.

There are a few ways you can pass in extra tags which will be reported to the time series DB:

  1. HTTP Request Tagging
    All HTTP requests are automatically tagged with their name. If you want to pass in extra tags for a particular HTTP request, you can pass them in as a dictionary for the context param when making a request. For example:

    self.client.get('https://google.com', name='get google', context={'foo':'bar'})

    The tags on this metric would then be: {'name': 'get google', 'foo': 'bar'} which would get forwarded to the database if specified.

  2. Check Tagging
    When defining a check, you can pass in extra tags with the tags parameter:

    from grasshopper.lib.util.utils import check
    ...
    response = self.client.get(
        'https://google.com', name='get google', context={'foo1':'bar1'}
    )
    check(
        "get google responded with a 200",
        response.status_code == 200,
        env=self.environment,
        tags={'foo2': 'bar2'},
    )
  3. Custom Trend Tagging
    Since custom trends are decorators, they do not have access to non-static class variables when defined. Therefore, you must use the extra_tag_keys param, which is an array of keys that exist in the journey's scenario_args. For example, if a journey had the scenario args {"foo": "bar"} and you wanted to tag a custom trend based on the "foo" scenario arg key, you would do something like this:

    from locust import between, task
    from grasshopper.lib.util.utils import custom_trend
    ...
        
    @task
    @custom_trend("my_custom_trend", extra_tag_keys=["foo"])
    def google_get_journey(self):
        for i in range(10):
            response = self.client.get(
                'https://google.com', name='get google', context={'foo1':'bar1'}
            )

(back to top)

Project Roadmap

  • Custom Trends
  • Checks
  • Thresholds
  • Tagging
  • InfluxDB metric reporting
  • PrometheusDB metric reporting
  • Slack reporting
  • ReportPortal reporting

See the open issues for a full list of proposed features (and known issues).

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Make sure unit tests pass (pytest tests/unit)
  4. Add unit tests to keep coverage up, if necessary
  5. Commit your Changes (git commit -m 'Add some AmazingFeature')
  6. Push to the Branch (git push origin feature/AmazingFeature)
  7. Open a Pull Request

(back to top)
