ckanapi

A command line interface and Python module for accessing the CKAN Action API

Installation

Installation with pip:

pip install ckanapi

Installation with conda:

conda install -c conda-forge ckanapi

ckanapi CLI

The ckanapi command line interface lets you access local and remote CKAN instances for bulk operations and simple API actions.

Actions

Simple actions with string parameters may be called directly. The response is pretty-printed to STDOUT.

🔧 List names of groups on a remote CKAN site

$ ckanapi action group_list -r https://demo.ckan.org --insecure
[
  "data-explorer",
  "example-group",
  "geo-examples",
  ...
]

Use -r to specify the remote CKAN instance, and -a to provide an API key. Remote actions connect as an anonymous user by default. For this example, we use --insecure because the CKAN demo uses a self-signed certificate.

Local CKAN actions may be run by specifying the config file with -c. If no remote server or config file is specified the CLI will look for a development.ini file in the current directory, much like paster commands.

Local CKAN actions are performed by the site user (default system administrator) when -u is not specified.

To perform local actions with a less privileged user, use the -u option with a user name or a name that doesn't exist. This is useful if you don't want things like deleted datasets or private information to be returned.

Note that all actions in the CKAN Action API and actions added by CKAN plugins are supported.

Action Arguments

Simple action arguments may be passed in KEY=STRING form for string values or in KEY:JSON form for JSON values.
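
To make the two forms concrete, here is a minimal sketch of how such arguments could be turned into the data dict for an action. The function name parse_action_args is hypothetical and this is not ckanapi's own parser; it only assumes that the first = or : in an argument separates the key from the value.

```python
import json


def parse_action_args(args):
    """Turn KEY=STRING and KEY:JSON arguments into a dict (illustrative only)."""
    data = {}
    for arg in args:
        # Find whichever separator appears first in the argument.
        positions = [(arg.find(sep), sep) for sep in "=:" if sep in arg]
        if not positions:
            raise ValueError("expected KEY=STRING or KEY:JSON, got %r" % arg)
        index, sep = min(positions)
        key, value = arg[:index], arg[index + 1:]
        # '=' keeps the raw string, ':' decodes the value as JSON.
        data[key] = value if sep == "=" else json.loads(value)
    return data


example = parse_action_args(['facet.field:["organization"]', "rows:0", "q=ponies"])
print(example)
```

Picking the earliest separator means a value such as url=http://example.com stays a plain string even though it contains a colon.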

🔧 View a dataset using a KEY=STRING parameter

$ ckanapi action package_show id=my-dataset-name
{
  "name": "my-dataset-name",
  ...
}

🔧 Get detailed info about a resource in the datastore

$ ckanapi action datastore_info id=my-resource-id-or-alias
{
  "meta": {
    "aliases": [
      "test_alias"
    ],
    "count": 1000,
  ...
}

🔧 Get the number of datasets for each organization using KEY:JSON parameters

$ ckanapi action package_search facet.field:'["organization"]' rows:0
{
  "facets": {
    "organization": {
      "org1": 42,
      "org2": 21,
      ...
    }
  },
  ...
}

🔧 Create a resource with a file attached

Files may be passed for upload using the KEY@FILE form.

$ ckanapi action resource_create package_id=my-dataset-with-files \
          upload@/path/to/file/to/upload.csv

🔧 Edit a dataset with a text editor

$ ckanapi action package_show id=my-dataset-id > my-dataset.json
$ nano my-dataset.json
$ ckanapi action package_update -I my-dataset.json
$ rm my-dataset.json

🔧 Update a single resource field

$ ckanapi action resource_patch id=my-resource-id size:42000000

Bulk Dumping and Loading

Datasets, groups, organizations, users and related items may be dumped to JSON lines text files and created or updated from JSON lines text files.

dump and load jobs can be run in parallel with multiple worker processes using the -p parameter. The jobs in progress, the rate of job completion and any individual errors are shown on STDERR while the jobs run.

There are no parallel limits when running against a CKAN on localhost.
When running against a remote site, there's a default limit of 3 worker processes.

The environment variables CKANAPI_MY_SITES and CKANAPI_PARALLEL_LIMIT can be used to adjust these limits. Sites listed in CKANAPI_MY_SITES (a comma-delimited list of CKAN URLs) will not have the parallel limit applied.
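
The rules above can be sketched as a small function. This is only an illustration of the described behaviour, not ckanapi's implementation: the function name allowed_workers, the way URLs are compared, and the localhost check are all assumptions.

```python
from urllib.parse import urlparse

DEFAULT_PARALLEL_LIMIT = 3  # default remote limit stated in the text above


def allowed_workers(requested, site_url, environ):
    """Cap the requested worker count for remote sites (illustrative only)."""
    host = urlparse(site_url).hostname
    my_sites = [
        site.strip().rstrip("/")
        for site in environ.get("CKANAPI_MY_SITES", "").split(",")
        if site.strip()
    ]
    # Assumption: localhost and sites listed in CKANAPI_MY_SITES are exempt.
    if host == "localhost" or site_url.rstrip("/") in my_sites:
        return requested
    limit = int(environ.get("CKANAPI_PARALLEL_LIMIT", DEFAULT_PARALLEL_LIMIT))
    return min(requested, limit)


print(allowed_workers(8, "https://demo.ckan.org", {}))
```

Passing the environment as a plain dict (for example os.environ) keeps the sketch easy to try with different settings.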

dump and load jobs may be resumed from the last completed record or split across multiple servers by specifying record start and max values.
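
As an illustration of the file format and of resuming from a given record, the sketch below reads a gzip-compressed JSON lines stream starting at a chosen record. The helper read_records and its zero-based numbering are assumptions made for this example; the CLI's own start and max options may count differently.

```python
import gzip
import io
import itertools
import json


def read_records(fileobj, start=0, max_records=None):
    """Yield (line_number, record) pairs from a JSON lines stream."""
    stop = None if max_records is None else start + max_records
    for number, line in itertools.islice(enumerate(fileobj), start, stop):
        yield number, json.loads(line)


# Build a small gzip-compressed JSON lines file in memory for the demonstration.
buffer = io.BytesIO()
with gzip.open(buffer, "wt", encoding="utf-8") as out:
    for i in range(5):
        out.write(json.dumps({"name": "dataset-%d" % i}) + "\n")

# Resume at the third record and read at most two records.
buffer.seek(0)
with gzip.open(buffer, "rt", encoding="utf-8") as lines:
    resumed = list(read_records(lines, start=2, max_records=2))
print(resumed)
```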

🔧 Dump datasets from CKAN into a local file with 4 processes

$ ckanapi dump datasets --all -O datasets.jsonl.gz -z -p 4 -r http://localhost

🔧 Export datasets including private ones using search

$ ckanapi search datasets include_private=true -O datasets.jsonl.gz -z \
          -c /etc/ckan/production.ini

search is faster than dump because it calls package_search to retrieve many records per call, paginating automatically.

You may add parameters supported by package_search to filter the records returned.
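
The pagination idea can be sketched in Python as a loop over the rows and start parameters of package_search. The generator iter_search_results and the stand-in fake_search are hypothetical names used only for this example; with a real client you would pass an action such as demo.action.package_search instead of the stand-in.

```python
def iter_search_results(search, rows=100, **params):
    """Yield every record from a package_search-style callable, page by page."""
    start = 0
    while True:
        page = search(rows=rows, start=start, **params)
        results = page["results"]
        if not results:
            break
        for record in results:
            yield record
        start += len(results)
        # Stop once every matching record has been seen.
        if start >= page["count"]:
            break


def fake_search(rows, start, **params):
    """Stand-in for package_search that serves 7 made-up datasets."""
    names = ["dataset-%d" % i for i in range(7)]
    page = names[start:start + rows]
    return {"count": len(names), "results": [{"name": name} for name in page]}


names = [record["name"] for record in iter_search_results(fake_search, rows=3)]
print(names)
```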

🔧 Load/update datasets from a dataset JSON lines file with 3 processes

$ ckanapi load datasets -I datasets.jsonl.gz -z -p 3 -c /etc/ckan/production.ini

Bulk Delete

Datasets, groups, organizations, users and related items may be deleted in bulk with the delete command. This command accepts ids or names on the command line or a number of different formats piped on standard input.

🔧 All datasets (JSON list of "id" or "name" values)

$ ckanapi action package_list -j | ckanapi delete datasets

🔧 Selective delete (JSON object with "results" list containing "id" values)

$ ckanapi action package_search q=ponies | ckanapi delete datasets

🔧 Processed JSON Lines (JSON objects with "id" or "name" value, one per line)

$ ckanapi dump groups --all > groups.jsonl
$ grep ponies groups.jsonl | ckanapi delete groups

🔧 Text list of "id" or "name" values (one per line)

$ cat users_to_remove.txt
fred
bill
larry
$ ckanapi delete users < users_to_remove.txt
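
The four input formats above could be normalized into a flat list of ids or names along the following lines. This is a rough sketch, not the delete command's actual parsing code, and extract_ids is a hypothetical name.

```python
import json


def extract_ids(text):
    """Return ids or names from any of the four input formats (illustrative only)."""
    text = text.strip()
    try:
        parsed = json.loads(text)
    except ValueError:
        parsed = None  # not a single JSON document; treat as lines below
    if isinstance(parsed, list):
        # JSON list of "id" or "name" values
        return [str(item) for item in parsed]
    if isinstance(parsed, dict) and "results" in parsed:
        # JSON object with a "results" list containing "id" values
        return [item["id"] for item in parsed["results"]]
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("{"):
            # JSON lines: one object per line with an "id" or "name" value
            record = json.loads(line)
            ids.append(record.get("id") or record["name"])
        else:
            # Plain text: one id or name per line
            ids.append(line)
    return ids


print(extract_ids("fred\nbill\nlarry\n"))
```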

Bulk Dataset and Resource Export - datapackage.json format

Datasets may be exported to a simplified datapackage.json format (which includes the actual resources, where available).

If the resource url is not available, the resource will be included in the datapackage.json file but the actual resource data will not be downloaded.

$ ckanapi dump datasets --all --datapackages=./output_directory/ -r http://sourceckan.example.com

Batch Actions

Run a set of actions from a JSON lines file. For local actions this is much faster than running ckanapi action ... in a shell loop because the local start-up time only happens once.

Batch actions can also be run in parallel with multiple processes and errors logged, just like the dump and load commands.

🔧 Update a dataset field across a number of datasets

$ cat update-emails.jsonl
{"action":"package_patch","data":{"id":"dataset-1","maintainer_email":"[email protected]"}}
{"action":"package_patch","data":{"id":"dataset-2","maintainer_email":"[email protected]"}}
{"action":"package_patch","data":{"id":"dataset-3","maintainer_email":"[email protected]"}}
$ ckanapi batch -I update-emails.jsonl

🔧 Replace a set of uploaded files

$ cat upload-files.jsonl
{"action":"resource_patch","data":{"id":"408e1b1d-d0ca-50ca-9ae6-aedcee37aaa9"},"files":{"upload":"data1.csv"}}
{"action":"resource_patch","data":{"id":"c1eab17f-c2d0-536d-a3f6-41a3dfe6a2c3"},"files":{"upload":"data2.csv"}}
{"action":"resource_patch","data":{"id":"8ed068c2-4d4c-5f20-90db-39d2d596ce1a"},"files":{"upload":"data3.csv"}}
$ ckanapi batch -I upload-files.jsonl --local-files

The "files" values in the JSON lines file are ignored unless the --local-files parameter is passed. Paths in the JSON lines file reference files on the local filesystem, relative to the current working directory.
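
A batch file like the ones above can be generated from Python rather than written by hand. The sketch below is one way to do it; the helper write_batch and the example field values are made up for illustration, while the "action", "data" and "files" keys follow the examples in this section.

```python
import io
import json


def write_batch(out, action, records, files=None):
    """Write one {"action": ..., "data": ...} JSON object per line."""
    for index, data in enumerate(records):
        line = {"action": action, "data": data}
        if files is not None:
            # Optional per-record mapping of upload field name to local path.
            line["files"] = files[index]
        out.write(json.dumps(line) + "\n")


out = io.StringIO()
write_batch(
    out,
    "package_patch",
    [{"id": "dataset-1", "notes": "updated"}, {"id": "dataset-2", "notes": "updated"}],
)
print(out.getvalue(), end="")
```

Replace the io.StringIO buffer with an open file to produce something that can be passed to ckanapi batch -I.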

Shell pipelines

Simple shell pipelines are possible with the CLI.

🔧 Copy the name of a dataset to its title with 'jq'

$ ckanapi action package_show id=my-dataset \
  | jq '.+{"title":.name}' \
  | ckanapi action package_update -i

🔧 Mirror all datasets from one CKAN instance to another

$ ckanapi dump datasets --all -q -r http://sourceckan.example.com \
  | ckanapi load datasets

ckanapi Python Module

The ckanapi Python module may be used from within a CKAN extension or in a Python 2 or Python 3 application separate from CKAN.

RemoteCKAN

Making a request:

from ckanapi import RemoteCKAN
ua = 'ckanapiexample/1.0 (+http://example.com/my/website)'

demo = RemoteCKAN('https://demo.ckan.org', user_agent=ua)
groups = demo.action.group_list(id='data-explorer')
print(groups)

result:

[u'data-explorer', u'example-group', u'geo-examples', u'skeenawild']

The example above is using an "action shortcut". The .action object detects the method name used ("group_list" above) and converts it to a normal call_action call. This is equivalent code without using an action shortcut:

groups = demo.call_action('group_list', {'id': 'data-explorer'})
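
For readers curious how a shortcut like this can work, here is a simplified sketch built on Python's __getattr__ hook. It is an illustration of the technique only, not ckanapi's source; ActionShortcut and EchoClient are hypothetical classes, and EchoClient.call_action just returns what it was given instead of contacting a server.

```python
class ActionShortcut:
    """Turn attribute access into call_action calls (illustrative only)."""

    def __init__(self, client):
        self._client = client

    def __getattr__(self, name):
        # Called only for attributes that are not found normally,
        # so any action name becomes a callable.
        def action(**kwargs):
            return self._client.call_action(name, kwargs)
        return action


class EchoClient:
    """Hypothetical stand-in client that records what would be sent."""

    def __init__(self):
        self.action = ActionShortcut(self)

    def call_action(self, action, data_dict=None):
        return (action, data_dict or {})


client = EchoClient()
print(client.action.group_list(id="data-explorer"))
```

Because the method name is resolved at call time, actions added by plugins need no special support on the client side.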

Once again, all actions in the CKAN Action API and actions added by CKAN plugins are supported by action shortcuts and call_action calls.

For example, if the Showcase extension is installed:

from ckanapi import RemoteCKAN
ua = 'ckanapiexample/1.0 (+http://example.com/my/website)'

demo = RemoteCKAN('https://demo.ckan.org', user_agent=ua)
showcases = demo.action.ckanext_showcase_list()
print(showcases)

Query clauses can be combined, as in the following package_search action. This query combines three clauses that are all satisfied by the single example dataset in the Demo CKAN site.

More detailed complex query syntax examples can be found in the SOLR documentation.

from ckanapi import RemoteCKAN
ua = 'ckanapiexample/1.0 (+http://example.com/my/website)'

demo = RemoteCKAN('https://demo.ckan.org', user_agent=ua)
packages = demo.action.package_search(q='+organization:sample-organization +res_format:GeoJSON +tags:geojson')
print(packages)
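
When the clauses come from variables, the q string can be assembled programmatically. The helper below is a minimal sketch with a hypothetical name, build_query; it assumes the values need no escaping, which will not hold for values containing spaces or Solr special characters.

```python
def build_query(clauses):
    """Join field/value pairs into required (+) Solr clauses."""
    return " ".join("+%s:%s" % (field, value) for field, value in clauses.items())


q = build_query({
    "organization": "sample-organization",
    "res_format": "GeoJSON",
    "tags": "geojson",
})
print(q)
```

The resulting string can be passed as the q argument of package_search.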

Many CKAN API functions can only be used by authenticated users. Use the apikey parameter to supply your CKAN API key to RemoteCKAN:

demo = RemoteCKAN('https://demo.ckan.org', apikey='MY-SECRET-API-KEY')

An example of updating a single field in an existing dataset can be found in the examples directory.

Exceptions

  • NotAuthorized - user unauthorized or accessing a deleted item
  • NotFound - name/id not found
  • ValidationError - field errors listed in .error_dict
  • SearchQueryError - error reported from SOLR index
  • SearchError
  • CKANAPIError - incorrect use of ckanapi or unable to parse response
  • ServerIncompatibleError - the remote API is not a CKAN API

When using an action shortcut or the call_action method, failures are raised as exceptions, just like when calling get_action from a CKAN plugin:

from ckanapi import RemoteCKAN, NotAuthorized
ua = 'ckanapiexample/1.0 (+http://example.com/my/website)'

demo = RemoteCKAN('https://demo.ckan.org', apikey='phony-key', user_agent=ua)
try:
    pkg = demo.action.package_create(name='my-dataset', title='not going to work')
except NotAuthorized:
    print('denied')

When it is possible to import ckan, all the ckanapi exception classes are replaced with the CKAN exceptions with the same names.
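
The general pattern behind this behaviour is a guarded import with local fallbacks. The sketch below illustrates the pattern and is not a copy of ckanapi's code: the fallback classes are simplified, and the describe helper is hypothetical.

```python
try:
    # Inside a CKAN environment, reuse CKAN's own exception classes.
    from ckan.logic import NotAuthorized, NotFound, ValidationError
except ImportError:
    # Outside CKAN, fall back to simple local definitions with the same names.
    class NotAuthorized(Exception):
        pass

    class NotFound(Exception):
        pass

    class ValidationError(Exception):
        def __init__(self, error_dict):
            super().__init__(error_dict)
            self.error_dict = error_dict


def describe(error):
    """Return a short message for a known exception type."""
    if isinstance(error, ValidationError):
        return "invalid fields: %s" % ", ".join(sorted(error.error_dict))
    if isinstance(error, NotFound):
        return "not found"
    if isinstance(error, NotAuthorized):
        return "denied"
    return "unexpected error"


print(describe(NotFound()))
```

Code that catches these names therefore behaves the same whether it runs inside a CKAN plugin or in a standalone script.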

File Uploads

File uploads for CKAN 2.2+ are supported by passing file-like objects to action shortcut methods:

from ckanapi import RemoteCKAN
ua = 'ckanapiexample/1.0 (+http://example.com/my/website)'

mysite = RemoteCKAN('http://myckan.example.com', apikey='real-key', user_agent=ua)
mysite.action.resource_create(
    package_id='my-dataset-with-files',
    url='dummy-value',  # ignored but required by CKAN<2.6
    upload=open('/path/to/file/to/upload.csv', 'rb'))

When using call_action you must pass file objects separately:

mysite.call_action('resource_create',
    {'package_id': 'my-dataset-with-files'},
    files={'upload': open('/path/to/file/to/upload.csv', 'rb')})

Session Control

As of ckanapi 4.0 RemoteCKAN will keep your HTTP connection open using a requests session.

For long-running scripts make sure to close your connections by using RemoteCKAN as a context manager:

from ckanapi import RemoteCKAN
ua = 'ckanapiexample/1.0 (+http://example.com/my/website)'

with RemoteCKAN('https://demo.ckan.org', user_agent=ua) as demo:
    groups = demo.action.group_list(id='data-explorer')
print(groups)

Or by explicitly calling RemoteCKAN.close().
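
The two approaches are equivalent in effect. The sketch below shows the relationship using a hypothetical stand-in class, SessionClient, so it runs without a CKAN server; it is not RemoteCKAN itself, only an illustration of a client whose context manager calls close() on exit.

```python
class SessionClient:
    """Hypothetical stand-in for a client that keeps a session open."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Release the underlying connection.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always close, even if the block raised an exception.
        self.close()


# Context manager form: close() runs when the block ends.
with SessionClient() as managed:
    pass  # make requests here

# Explicit form: try/finally guarantees close() is called.
manual = SessionClient()
try:
    pass  # make requests here
finally:
    manual.close()

print(managed.closed, manual.closed)
```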

LocalCKAN

A similar class is provided for accessing local CKAN instances from a plugin in the same way as remote CKAN instances. Unlike CKAN's get_action, LocalCKAN prevents data from one action call leaking into the next, which can cause issues that are very hard to debug.

This class defaults to using the site user with full access.

from ckanapi import LocalCKAN, ValidationError

registry = LocalCKAN()
try:
    registry.action.package_create(name='my-dataset', title='this will work fine')
except ValidationError:
    print('unless my-dataset already exists')

For extra caution pass a blank username to LocalCKAN and only actions allowed by anonymous users will be permitted.

from ckanapi import LocalCKAN

anon = LocalCKAN(username='')
print(anon.action.status_show())

TestAppCKAN

A class is provided for making action requests to a webtest.TestApp instance for use in CKAN tests:

from ckanapi import TestAppCKAN
from webtest import TestApp

test_app = TestApp(...)
demo = TestAppCKAN(test_app, apikey='my-test-key')
groups = demo.action.group_list(id='data-explorer')

Tests

To run the tests:

python setup.py test

License

🇨🇦 Government of Canada / Gouvernement du Canada

The project files are covered under Crown Copyright, Government of Canada, and are distributed under the MIT license. Please see COPYING / COPYING.fr for full details.
