• This repository has been archived on 05/Jul/2023

Project Open Data Dashboard

The Project Open Data Dashboard provides a variety of tools and capabilities to help manage the implementation of Project Open Data. It is primarily used by Federal agencies, but also provides tools and resources for other entities such as state and local governments.

The primary place for the user-facing documentation is https://labs.data.gov/dashboard/docs

Features

  • Dashboard overview of the status of each federal agency's implementation of Project Open Data for each milestone.
  • Permissioned Content Editing for the fields in the dashboard that can't be automated. The fields are stored as JSON objects so the data model is very flexible and can be customized without database changes. User accounts are handled via GitHub.
  • Automated crawls for each agency to report metrics from Project Open Data assets (data.json, digitalstrategy.json, /data page, etc). This includes reporting on the number of data sets and validation against the Project Open Data metadata schema.
  • A validator to validate Project Open Data data.json files via URL, file upload, or text input. This can be used for testing both data.json Public Data Listing files and the Enterprise Data Inventory JSON. Federal agencies can use the validator as is; non-federal entities can use it by specifying the Non-Federal schema.
  • Converters to export existing data from data.gov
  • Changeset viewer to see how a data.json file for an agency compares to metadata currently hosted on data.gov

CLI Interface

In addition to the web interface, there's also a Command Line Interface to manage the crawls of data.json, digitalstrategy.json, and /data pages. This is helpful for running specific updates, but its primary use is with a cron job.

From the root of the application, you can update the status of agencies using a few different options on the campaign controller. The syntax is:

$ php public/index.php campaign status [id] [component]

If you wanted to update all components (data.json, digitalstrategy.json, /data) for all agencies, you'd run this command:

$ php public/index.php campaign status all all

If you just wanted to update the data.json status for CFO Act agencies you'd run:

$ php public/index.php campaign status cfo-act datajson

If you just wanted to update the data.json status for agencies being monitored by the OMB you'd run:

$ php public/index.php campaign status omb-monitored datajson

If you just wanted to update the digitalstrategy.json status for the Department of Agriculture you'd run:

$ php public/index.php campaign status 49015 digitalstrategy

There are agencies whose crawls take a long time to complete. These are identified with the id of long-running. You can find a current list of these in this db migration. To initiate a full-scan for these agencies, you'd run:

$ php public/index.php campaign status long-running full-scan

The options for [id] are: all, cfo-act, omb-monitored, long-running, or the ID provided by the USA.gov Federal Agency Directory API.

The options for [component] are: all, datajson, datapage, digitalstrategy, download, full-scan.

  • The datajson component captures the basic characteristics of a request to an agency's data.json file (like whether it returns an HTTP 200) and then attempts to parse the file, validate against the schema, and provide other reporting metrics like the number of datasets listed.
  • The digitalstrategy component captures the basic characteristics of a request to an agency's digitalstrategy.json file (like whether it returns an HTTP 200).
  • The datapage component captures the basic characteristics of a request to an agency's /data page (like whether it returns an HTTP 200).
  • The download component downloads an archive of the data.json and digitalstrategy.json files.
  • The full-scan component does further validation based on the content of the response.
  • As you'd expect, all does all of these things at once.
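
As a rough illustration of the options above, the following sketch checks a [component] value against the documented list and prints the resulting command rather than running it. The function name campaign_status_cmd, the cron schedule, and the /path/to/dashboard location are assumptions made for this example and are not part of the application.

```shell
#!/usr/bin/env bash
# Sketch: check [component] against the documented list, then print
# (not run) the matching `campaign status` command.
campaign_status_cmd() {
  local id="$1" component="$2"
  case "$component" in
    all|datajson|datapage|digitalstrategy|download|full-scan) ;;
    *) echo "unknown component: $component" >&2; return 1 ;;
  esac
  # [id] may be a named group or a numeric agency ID, so only reject empty input.
  if [ -z "$id" ]; then
    echo "missing id" >&2
    return 1
  fi
  printf 'php public/index.php campaign status %s %s\n' "$id" "$component"
}

# Print the command for the CFO Act data.json crawl.
campaign_status_cmd cfo-act datajson

# A crontab entry could then run such a command on a schedule, e.g.
# (the schedule and the /path/to/dashboard location are assumptions):
#   0 2 * * * cd /path/to/dashboard && php public/index.php campaign status all all
```

Printing the command instead of executing it keeps the helper safe to try on a machine without the application installed.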

Development

This is a CodeIgniter PHP application. We use Docker and Docker Compose for local development and cloud.gov for testing and production (pending migration from BSP).

Prerequisites:

By default, the ENVIRONMENT variable is set to production so that error messages will not be displayed. To display these messages while developing, you should edit your .env file to include the variable CI_ENV set to anything other than production. See index.php for more details.
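
For example, a .env fragment along these lines should be enough to enable error display. The value development is an arbitrary choice for this sketch; the text above only requires that it differ from production.

```shell
# .env (fragment). Per the paragraph above, any value other than
# "production" turns error display on; "development" is an arbitrary choice.
CI_ENV=development
```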

Setup

Install application dependencies

make install-dev-dependencies

Start up the application and database

make up

Run tests

make test

Open your browser to localhost:8000.

Restoring database dumps

If you need a database dump, you can create one following instructions from the Runbook. Clean up the database dump by removing any USE database or CREATE DATABASE statements. Then:

cat cleaned_database.sql |
  docker-compose run --rm database mysql \
  --host=database --user=root --password=mysql dashboard
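
The cleanup step described above can be sketched as a small filter. This is a heuristic, not the project's own tooling: the function name clean_dump and the input file name raw_database.sql are assumptions, and it only works if each USE or CREATE DATABASE statement sits on a single line of the dump.

```shell
# Sketch: drop USE and CREATE DATABASE statements from a SQL dump read on stdin.
# Assumes each such statement sits on a single line.
clean_dump() {
  grep -Eiv '^[[:space:]]*(USE[[:space:]]|CREATE[[:space:]]+DATABASE)'
}

# Usage (raw_database.sql is a hypothetical input file name):
#   clean_dump < raw_database.sql > cleaned_database.sql
```

It is worth reviewing the cleaned file by hand before restoring it, since a line-based filter cannot parse SQL.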

After a database restore, test by viewing a USDA detail page:

curl http://localhost:8000/offices/detail/49015/2017-08-31

Making database schema changes

To update the schema

Add a new numbered migration class, then change the configured version number to match. To perform the migration, CodeIgniter will automatically run up() migration methods until the schema version in the database matches the configured version.

If you want to invoke the migration explicitly to test that it's working, you can run php public/index.php migrate. Otherwise expect that the migration will be invoked automatically before CodeIgniter will handle any other requests.

To revert the schema

Change the configured version number to match the schema version you want to revert to. CodeIgniter will automatically run down() migration methods until the schema version in the database matches the configured version.

You can invoke the reversion as described for updates above.

Migration requirements

The dashboard uses MySQL for the backend database. MySQL doesn't support transactions around schema-altering statements. If any problems are encountered during a migration, the app is likely to wind up in a confused state where schema-altering statements have been applied, but the version of the schema in the database remains at the previous version. The migration will be attempted over and over again, often exhibiting user-visible errors or other bad behavior until manual intervention happens.

To avoid this, we need to be careful to write migrations that are both idempotent and reversible. (That is, we should be able to run them again without generating errors, and we should be able to downgrade to previous schema versions automatically.)

This requires some care because there's no guaranteed way to make it happen. Whenever we do a PR review that includes a schema change, the answer should be "yes" to all of these questions:

  • Does each of the schema-altering statements happen in its own migration?
  • Does the down() method exist on the migration, and does it undo any schema-changing action performed in the up() method?
  • Does every CREATE TABLE statement use IF NOT EXISTS?
  • Does every DROP TABLE statement use IF EXISTS?
  • Does every ADD/CHANGE/ALTER COLUMN happen via a call to the idempotent add_column_if_not_exists helper?
  • Does every DROP COLUMN happen via a call to the idempotent drop_column_if_exists helper?
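
Two of the questions above (IF NOT EXISTS on CREATE TABLE and IF EXISTS on DROP TABLE) can be partly checked mechanically. The sketch below is a hypothetical review aid and not part of the repository: the name check_migration is an assumption, and it matches line by line, so a statement split across lines would slip through.

```shell
# Sketch: list CREATE TABLE / DROP TABLE lines in a migration file that lack
# IF NOT EXISTS / IF EXISTS. Returns 1 and prints the lines if any are found.
check_migration() {
  local file="$1" offenders
  offenders="$(grep -nEi 'CREATE[[:space:]]+TABLE|DROP[[:space:]]+TABLE' "$file" \
    | grep -vEi 'CREATE[[:space:]]+TABLE[[:space:]]+IF[[:space:]]+NOT[[:space:]]+EXISTS|DROP[[:space:]]+TABLE[[:space:]]+IF[[:space:]]+EXISTS' \
    || true)"
  if [ -n "$offenders" ]; then
    printf '%s\n' "$offenders"
    return 1
  fi
  return 0
}

# Usage (the migration file name is a placeholder):
#   check_migration path/to/migration.php || echo "review the lines listed above"
```

A check like this complements the review questions rather than replacing them; it says nothing about whether down() really undoes up().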

CircleCI testing

All pushes to GitHub are integration tested with our CircleCI tests.

Updating composer dependencies

Edit version constraints in composer.json.

make update-dependencies

Commit the updated composer.json and composer.lock.

Deploying to cloud.gov

Quickstart with an empty database

Copy the vars.yml.template file and rename it to vars.yml. Edit any values following the comments in the file.

If you are not logged in to the Cloud Foundry CLI, follow the steps in this guide.

Assuming you're logged in to the Cloud Foundry CLI, run the following commands, replacing ${app_name} with the value in your vars.yml file.

$ cf create-service aws-rds small-mysql-redundant ${app_name}-db

$ cf create-service s3 basic-public ${app_name}-s3

$ cf create-user-provided-service ${app_name}-secrets -p '{
  "ENCRYPTION_KEY": "long-random-string"
}'

$ cf set-env ${app_name} NEWRELIC_LICENSE license-key-obtained-from-newrelic-account

$ cf push --vars-file vars.yml
Waiting for app to start...

name:              app
requested state:   started
routes:            app-boring-sable.app.cloud.gov
last uploaded:     Wed 28 Aug 10:02:06 EDT 2019
stack:             cflinuxfs3
buildpacks:        php_buildpack

type:            web
instances:       1/1
memory usage:    256M
start command:   $HOME/.bp/bin/start
     state     since                  cpu    memory          disk             details
#0   running   2019-08-28T14:02:25Z   0.3%   24.3M of 256M   301.7M of 512M

You should be able to visit https://<ROUTE>/offices/qa, where <ROUTE> is the route reported from cf push.

Restoring a database backup to cloud.gov:

If you need a database dump, you can create one following instructions from the Runbook. Clean up the database dump by removing any USE database or CREATE DATABASE statements. We'll call this cleaned_database.sql below. Then:

Install the cf-service-connect plugin, e.g., for version 1.1.0 of the plugin on a macOS system:

cf install-plugin https://github.com/18F/cf-service-connect/releases/download/1.1.0/cf-service-connect-darwin-amd64

Open up a tunnel to the database, and leave the tunnel open for the next step:

$ cf connect-to-service --no-client app database
Host: localhost
Port: NNNN
Username: randomuser
Password: randompass
Name: cgawsbrokerrandomname

In a separate terminal session, use the connection information to make a MySQL connection and restore cleaned_database.sql. When prompted for a password, paste in the password (randompass in this example).

cat cleaned_database.sql | 
  mysql -h 127.0.0.1 -PNNNN -u randomuser -p cgawsbrokerrandomname

After a restore, you should be able to view an agency's detail page, such as: https://<ROUTE>/offices/detail/49015/2017-08-31

CI configuration

Create a GitHub environment for each application you're deploying. Each GH environment should be configured with secrets from a ci-deployer service account.

Secret name        Description
CF_SERVICE_AUTH    The service key password.
CF_SERVICE_USER    The service key username.

Known issues

The agency hierarchy is designed to be populated from the contacts API at https://www.usa.gov/api/USAGovAPI/contacts.json/contact, but that API is no longer available, so the following steps no longer work:

  • Federal agencies were seeded using the USA.gov Federal Agency Directory API and the IDs provided by that resource are used as the primary IDs on this dashboard.
  • First populate the top of the agency hierarchy: $ php public/index.php import
  • Second, populate all the subagencies: $ php public/index.php import children
  • If the offices table in your database is empty, you'll also want to seed it with agency data by running the import script (/application/controllers/import.php) from a command line. You'll also need to temporarily change the import_active option in config.php to true.

Currently this tool does not handle large files in a memory efficient way. If you are unable to utilize a high amount of memory and are at risk of timeouts, you should set the maximum file size that the application can handle so it will avoid large files and fail more gracefully. The maximum size of JSON files to parse can be set with the max_remote_size option in config.php
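
Independently of the max_remote_size option, you may want to check how large a remote file claims to be before pointing the dashboard at it. The sketch below is an illustration outside the application: the function names, the example URL, and the 10000000-byte limit are all assumptions, and it relies on the server reporting a Content-Length header.

```shell
# Sketch: read HTTP response headers on stdin and print the Content-Length
# value (prints nothing if the header is absent).
content_length() {
  tr -d '\r' | awk 'tolower($1) == "content-length:" { print $2 }' | tail -n 1
}

# Return 0 when a known size is within the limit (both in bytes), 1 otherwise.
within_limit() {
  local size="$1" limit="$2"
  [ -n "$size" ] && [ "$size" -le "$limit" ]
}

# Usage (needs network access; the URL and limit are placeholders):
#   size="$(curl -sIL https://agency.example.gov/data.json | content_length)"
#   within_limit "$size" 10000000 || echo "skip: too large or size unknown"
```

Treating an unknown size as a failure is a deliberately cautious choice for memory-constrained hosts.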

What about S3?

S3 is used in a few places when config[use_local_storage] is false:

  • for archiving data.json and digitalstrategy (public)

The use_local_storage setting does not impact all uses of the upload class, just those cases above.

The archive_file function checks config[use_local_storage] whenever it is called, but that logic does not apply when filetype is set to datajson_lines.

Here's an outline of where S3 is used in the code:

models/Campaign_model.php:

  • archive_file which calls archive_to_s3 when use_local_storage is false
    • the validate_datajson function calls archive_file but sets filetype to datajson-lines so the archive_file function does not store it in S3, regardless of use_local_storage setting.
  • archive_to_s3 which calls put_to_s3 and stores with a PUBLIC acl
  • put_to_s3 which stores private by default
  • get_from_s3 previously used by csv_to_json; unused now

views/office_detail.php:

  • Builds a URL based on values of config/s3_bucket for displaying the "Analyze archive copies" line of Automated Metrics.

S3 changes for cloud.gov

  • There's a need for one public S3 bucket for archiving data.json from crawls, and fetching them in the office_detail.php.
