
A tool that helps you get security patches for Docker images into production as quickly as possible without breaking things

Dockerfile Image Updater

This tool provides a mechanism to make security updates to Docker images at scale. The tool searches GitHub for declared Docker images and sends pull requests to projects that are not using the desired version of the requested Docker image.

Docker builds images using a declared Dockerfile. Within the Dockerfile, there is a FROM declaration that specifies the base image and a tag that will be used as the starting layers for the new image. If the base image named in FROM is rebuilt, the Docker images that depend on it are not automatically updated with the newer layers. This becomes a major problem if the reason the base image was updated was to fix a security vulnerability. Most Docker images are based on operating system libraries, and these get patched for security updates quite frequently. This tool, the Dockerfile Image Updater, was created to automatically make sure that child images are updated when the images they depend on get updated.
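The check described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation (the tool itself is written in Java): the function names parse_from_line and needs_update are hypothetical, digests and build stages are ignored, and an untagged reference is assumed to count as not matching the desired tag.

```python
def parse_from_line(line):
    """Return (image, tag) for a Dockerfile FROM line, or None for other lines.

    Hypothetical helper for illustration; tag is None when no tag is given.
    """
    parts = line.split()
    if len(parts) < 2 or parts[0].upper() != "FROM":
        return None
    # Skip flags such as --platform=linux/amd64 that may precede the image.
    refs = [p for p in parts[1:] if not p.startswith("--")]
    if not refs:
        return None
    ref = refs[0].split("@", 1)[0]  # drop a digest such as @sha256:...
    name, sep, tag = ref.rpartition(":")
    # A colon followed by a slash belongs to a registry port, not to a tag.
    if sep and "/" not in tag:
        return name, tag
    return ref, None


def needs_update(dockerfile_text, image, desired_tag):
    """True if any FROM line uses `image` with a tag other than `desired_tag`."""
    for line in dockerfile_text.splitlines():
        parsed = parse_from_line(line)
        if parsed is not None and parsed[0] == image and parsed[1] != desired_tag:
            return True
    return False
```

For example, needs_update("FROM my_org/my_image:v1.0.0", "my_org/my_image", "v1.0.1") reports that the Dockerfile is a candidate for a pull request.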

User Guide

What it does

The tool has three modes:

  1. all - Reads the store that declares the Docker images and versions that you intend others to use, and updates all base images found in GitHub to the tags in the store.

    Example:

    export git_api_url=https://api.github.com
    export git_api_token=my_github_token
    docker run --rm -e git_api_token -e git_api_url \
      salesforce/dockerfile-image-update all image-to-tag-store
    
  2. parent - Searches GitHub for images that use a specified image name and sends pull requests if the image tag doesn't match the intended tag. The intended image and tag are passed as command line parameters. The intended image-to-tag mapping is persisted in a store in a specified git repository under the token owner.

    Example:

    export git_api_url=https://api.github.com
    export git_api_token=my_github_token
    docker run --rm -e git_api_token -e git_api_url \
      salesforce/dockerfile-image-update parent my_org/my_image v1.0.1 \
      image-to-tag-store
    
  3. child - Given a specific git repo, sends a pull request to update the image to a given version. You can optionally persist the image-version combination in the image-to-tag store.

    Example:

    export git_api_url=https://api.github.com
    export git_api_token=my_github_token
    docker run --rm -e git_api_token -e git_api_url \
      salesforce/dockerfile-image-update child my_gh_org/my_gh_repo \
      my_image_name v1.0.1
    

Prerequisites

In environment variables, please provide:

  • git_api_token : This is the GitHub token for your account. To set its privileges, go to your GitHub account --> settings --> Personal access tokens and check repo and delete_repo.
  • git_api_url : This is the endpoint URL of the GitHub API. For public GitHub, this is https://api.github.com/; for Enterprise, this should be https://hostname/api/v3. (This variable is optional; you can provide it through the command line instead.)
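The precedence between the command line and the environment variable can be sketched as follows. This is a minimal illustration, assuming the -g/--ghapi value wins over git_api_url as the help text states; the function name resolve_github_api_url is hypothetical, and raising an error when neither is set is an assumption rather than documented behavior.

```python
import os


def resolve_github_api_url(cli_value=None, env=None):
    """Pick the GitHub API URL: the command line value overrides the environment.

    Hypothetical helper; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    url = cli_value or env.get("git_api_url")
    if not url:
        # Assumption: treat a missing URL as an error instead of guessing one.
        raise ValueError("set git_api_url or pass -g/--ghapi")
    return url
```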

Precautions

  1. This tool may create a LOT of forks in your account. All pull requests created are through a fork on your own account.
  2. We currently do not operate on forked repositories due to limitations in forking a fork on GitHub. We should invest some time in doing this right. See issue #21.
  3. Submodules are separate repositories and get their own pull requests.

How to use it

Our recommendation is to run it as a docker container:

export git_api_url=https://api.github.com
export git_api_token=my_github_token
docker run --rm -e git_api_token -e git_api_url \
  salesforce/dockerfile-image-update <COMMAND> <PARAMETERS>

usage: dockerfile-image-update [-h] [-l GHAPISEARCHLIMIT] [-o ORG] [-b BRANCH] [-g GHAPI] [-f] [-m M] [-c C] [-e EXCLUDES] [-B B] [-s {true,false}] [-x X] COMMAND ...

Image Updates through Pull Request Automator

named arguments:
  -h, --help             show this help message and exit
  -l GHAPISEARCHLIMIT, --ghapisearchlimit GHAPISEARCHLIMIT
                         limit the search results for github api (default: 1000)
  -o ORG, --org ORG      search within specific organization (default: all of github)
  -b BRANCH, --branch BRANCH
                         make pull requests for given branch name (default: main)
  -g GHAPI, --ghapi GHAPI
                         link to github api; overrides environment variable
  -f, --auto-merge       NOT IMPLEMENTED / set to automatically merge pull requests if available
  -m M                   message to provide for pull requests
  -c C                   additional commit message for the commits in pull requests
  -e EXCLUDES, --excludes EXCLUDES
                         regex of repository names to exclude from pull request generation
  -B B                   additional body text to include in pull requests
  -s {true,false}, --skipprcreation {true,false}
                         Only update image tag store. Skip creating PRs
  -x X                   comment snippet mentioned in line just before FROM instruction for ignoring a child image. Defaults to 'no-dfiu'
  -r, --rate_limit_pr_creations 
                         Enable rateLimiting for throttling the number of PRs DFIU will cut over a period of time. 
                         The argument value should be in format "<positive_integer>-<ISO-8601_formatted_time>". For example "--rate_limit_pr_creations 60-PT1H" to create 60 PRs per hour.
                         Default is not set, this means no ratelimiting is imposed.
                         
subcommands:
  Specify which feature to perform

  COMMAND                FEATURE
    parent               updates all repositories' Dockerfiles with given base image
    all                  updates all repositories' Dockerfiles
    child                updates one specific repository with given tag

The all command

Specify an image-to-tag store (a repository name on GitHub that contains a file called store.json). The tool looks through the JSON file and checks/updates all the base images in GitHub to the tags in the store.

usage: dockerfile-image-update all [-h] <IMG_TAG_STORE>

positional arguments:
  <IMG_TAG_STORE>        REQUIRED

optional arguments:
  -h, --help             show this help message and exit

The child command

Forcefully updates a repository's Dockerfile(s) to the given tag. If a store is specified, it will also forcefully update the store.

usage: dockerfile-image-update child [-h] [-s <IMG_TAG_STORE>] <GIT_REPO> <IMG> <FORCE_TAG>

positional arguments:
  <GIT_REPO>             REQUIRED
  <IMG>                  REQUIRED
  <FORCE_TAG>            REQUIRED

optional arguments:
  -h, --help             show this help message and exit
  -s <IMG_TAG_STORE>     OPTIONAL

The parent command

Given an image, tag, and store, it will create pull requests for any Dockerfile that has the image as a base image and an outdated tag. It also updates the store.

usage: dockerfile-image-update parent [-h] <IMG> <TAG> <IMG_TAG_STORE>

positional arguments:
  <IMG>                  REQUIRED
  <TAG>                  REQUIRED
  <IMG_TAG_STORE>        REQUIRED

optional arguments:
  -h, --help             show this help message and exit

Skip An Image

To make the tool skip a particular image tag, add the comment no-dfiu after the FROM declaration in the Dockerfile. The tool reads the comment that follows the FROM declaration and, if it contains no-dfiu, does not create a pull request for that image tag. To use a different comment string, pass the command line parameter -x IGNORE_IMAGE_STRING; the string given with that parameter is then used to skip PR creation.

Example:

FROM imagename:imagetag # no-dfiu
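A rough sketch of how such a marker could be detected is shown below. It is not the tool's actual matching logic: the function name find_skipped_from_lines is hypothetical, and because the help text for -x describes the comment as sitting on the line just before FROM while the example above puts it on the FROM line itself, the sketch assumes either placement counts.

```python
def find_skipped_from_lines(dockerfile_text, marker="no-dfiu"):
    """Return the FROM lines that carry the ignore marker.

    Assumption: the marker may appear in a comment trailing the FROM line
    or in a comment line directly above it.
    """
    skipped = []
    previous = ""
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("FROM "):
            # Text after the first '#' on the FROM line, if any.
            _, _, trailing = stripped.partition("#")
            # Only consider the previous line when it is a comment line.
            above = previous if previous.startswith("#") else ""
            if marker in trailing or marker in above:
                skipped.append(stripped)
        previous = stripped
    return skipped
```

Passing a different marker argument mirrors what the -x parameter is described as doing.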

PR throttling

To throttle the number of PRs cut by DFIU over a period of time, set --rate_limit_pr_creations to an appropriate value.

Default case:

By default, this feature is disabled. It is enabled when the argument --rate_limit_pr_creations is passed with an appropriate value.

example: dockerfile-image-update all image-tag-store-repo-falcon //throttling will be disabled by default

Configuring the rate limit:

The examples below throttle the number of PRs cut based on the value passed to the argument --rate_limit_pr_creations. The value should be in the format <positive_integer>-<ISO-8601_formatted_time>. For example, --rate_limit_pr_creations 60-PT1H means the tool will cut at most 60 PRs every hour, with one new PR allowed every (PT1H/60), i.e. one minute. This distributes the load uniformly and avoids sudden spikes. The process waits until the next PR can be sent.

Below are some more examples:

Usage: 
    dockerfile-image-update --rate_limit_pr_creations 60-PT1H all image-tag-store-repo-falcon //DFIU can send up to 60 PRs per hour.
    dockerfile-image-update --rate_limit_pr_creations 500-PT1H all image-tag-store-repo-falcon //DFIU can send up to 500 PRs per hour.
    dockerfile-image-update --rate_limit_pr_creations 86400-PT24H all image-tag-store-repo-falcon //DFIU can send up to 1 PR per second.
    dockerfile-image-update --rate_limit_pr_creations 1-PT1S all image-tag-store-repo-falcon //Same as above. DFIU can send up to 1 PR per second.
    dockerfile-image-update --rate_limit_pr_creations 5000 all image-tag-store-repo-falcon //rate limiting will be disabled because the argument is not in the correct format.
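The arithmetic behind these values can be sketched in Python. This is an illustration only, not the tool's own parser: the function names parse_rate_limit and seconds_between_prs are hypothetical, only the day/time subset of ISO-8601 durations (such as PT1H or P1DT12H) is handled, and returning None for a malformed value is an assumption that mirrors the "rate limiting will be disabled" case above.

```python
import re
from datetime import timedelta

# Day/time subset of ISO-8601 durations, e.g. PT1H, PT30M, PT1S, P1DT12H.
_DURATION = re.compile(
    r"^P(?:(?P<days>[0-9]+)D)?"
    r"(?:T(?:(?P<hours>[0-9]+)H)?(?:(?P<minutes>[0-9]+)M)?(?:(?P<seconds>[0-9]+)S)?)?$"
)


def parse_rate_limit(value):
    """Return (pr_count, window) or None when the value is malformed."""
    count_text, sep, duration_text = value.partition("-")
    if not sep or not (count_text.isascii() and count_text.isdigit()):
        return None
    count = int(count_text)
    match = _DURATION.match(duration_text)
    if count <= 0 or match is None:
        return None
    # Keep only the duration components that were actually present.
    parts = {name: int(text) for name, text in match.groupdict().items() if text}
    window = timedelta(**parts)
    if window <= timedelta(0):
        return None
    return count, window


def seconds_between_prs(value):
    """Spacing between PRs when the limit is spread evenly, or None if invalid."""
    parsed = parse_rate_limit(value)
    if parsed is None:
        return None
    count, window = parsed
    return window.total_seconds() / count
```

With these definitions, 60-PT1H works out to one PR every 60 seconds, and both 86400-PT24H and 1-PT1S work out to one PR per second.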

Developer Guide

Building

git clone https://github.com/salesforce/dockerfile-image-update.git
cd dockerfile-image-update
mvn clean install

Running locally

cd dockerfile-image-update/target
java -jar dockerfile-image-update-1.0-SNAPSHOT.jar <COMMAND> <PARAMETERS>

Creating a new feature

Under dockerfile-image-update/src/main/java/com/salesforce/dva/dockerfileimageupdate/subcommands/impl, create a new class YOUR_FEATURE.java. Make sure it implements ExecutableWithNamespace and has the SubCommand annotation with a help, requiredParams, and optionalParams. Then, under the execute method, code what you want this tool to do.

Running unit tests

Run unit tests by running mvn test.

Running integration tests

Before you run the integration tests (locally):

  1. Make sure that you have access to the GitHub orgs specified in TestCommon.ORGS. You will likely need to change it to three orgs where you have permissions to create repositories.

  2. Make sure you have git_api_url=https://api.github.com in /dockerfile-image-update-itest/itest.env, or set it to your internal GitHub Enterprise.

  3. Make sure you have a secret file which contains the git_api_token. The token needs to have the delete_repo and repo permissions. You can generate your token by going to personal access tokens in GitHub. Once you have your token, place it in a file:

    echo git_api_token=[copy personal access token here] > ${HOME}/.dfiu-itest-token
    
  4. Export the following environment variable to point to the file:

    export user_itest_secrets_file_secret=${HOME}/.dfiu-itest-token
    
  5. Run integration tests by running

    make integration-test
    

Release Process

We currently use GitHub Actions and Releases. In order to collect dependency updates from dependabot and any other minor changes, we've switched to triggering the release process manually. For now, that looks like the following:

1. Versioned Git Tag

  • Decide which version you want. If you want to bump the major or minor version, then you need to bump the MVN_SNAPSHOT_VERSION in the Makefile and in the Dockerfile before proceeding to the next steps. For example, change MVN_SNAPSHOT_VERSION=1.0-SNAPSHOT to MVN_SNAPSHOT_VERSION=2.0-SNAPSHOT.
  • After PRs have been merged to the primary branch, go to the Actions tab and trigger the Release new version Workflow. This will build, integration test, deploy the latest version to Docker Hub and Maven Central, and tag that commit hash with the next semantic version.

2. Cut Release with Release Notes

  • PRs continually get updated with labels by Pull Request Labeler and that helps set us up for nice release notes by Release Drafter.
  • Once that release has been tagged, you can go to the draft release, which is continually updated by Release Drafter, and select the latest tag to associate with that release. Change the version to reflect the same version as the tag (1.0.${NEW_VERSION}). Take a look at the release notes to make sure that PRs are categorized correctly. The categorization is based on the labels of the PRs. You can either fix the labels on the PRs, which will trigger the release drafter action, or simply modify the release notes before publishing. Ideally we'll automate this to run at the end of the triggered workflow with something like svu.

Checking Code Climate Locally

If you'd like to check Code Climate results locally you can run the following:

docker run --interactive --tty --rm \
 --env CODECLIMATE_CODE="$(pwd)" \
 --volume "$(pwd)":/code \
 --volume /var/run/docker.sock:/var/run/docker.sock \
 --volume /tmp/cc:/tmp/cc \
 codeclimate/codeclimate analyze README.md
