Tutorials

Command-line Interface

Concord offers a command-line interface to make deploying, scaling and monitoring clusters simple, whether it’s in a production environment or during development.

Installation

To use the Concord CLI, first install pip, then use pip to install our CLI Python package.

$ sudo easy_install pip
$ sudo pip install concord

Computation JSON Manifest

Concord relies on a JSON manifest file that details how the computation should be packaged and run. This file can be named whatever you’d like. The schema is as follows:

{
    // optional list of arguments to pass to the command
    "executable_arguments": [],
    // files to include in the package sent to the cluster
    "compress_files": [],
    // memory to allocate for this task
    "mem": 1024,
    // disk space to allocate for this task
    "disk": 1024,
    // zookeeper path where concord metadata is stored
    "zookeeper_path": "/concord",
    // cpu shares to allocate for this task
    "cpus": 1,
    // log level
    "framework_v_module": "",
    // number of instances to launch at start
    "instances": 1,
    // number of attempts when restarting a failed operator
    "retries": 3,
    // command-line arguments to forward to Concord's executor
    "executor_args": [],
    // framework log level
    "framework_logging_level": 1,
    // list of KEY=VALUE pairs to be injected into the shell environment
    "environment_variables": [],
    // comma-separated list of zookeeper hosts
    "zookeeper_hosts": "localhost:2181",
    // executable to run
    "executable_name": "foobar",
    // if present, use blacklist for adding files to package
    "exclude_compress_files": [],
    // name of computation
    "computation_name": "my-computation",
    // force the scheduler to update the binary (useful when updating)
    "update_binary": true,
    // user to run the computation under
    "execute_as_user": "",
    // docker container to run in (optional)
    "docker_container": "",
    // Pull image from dockerhub or not (optional)
    "force_pull_container" : true
}
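
Strict JSON has no comment syntax, so a manifest generated by a script would omit the annotations shown in the schema. The following is a minimal sketch of building a manifest with Python's standard json module. The helper names build_manifest and write_manifest are hypothetical, the values are the placeholders from the schema above, and it is an assumption that the fields left out here are optional.

```python
import json


def build_manifest(name, executable, files, instances=1):
    """Return a manifest dict that uses keys from the schema above.

    Only a subset of the documented keys is filled in; treating the
    remaining keys as optional is an assumption of this sketch.
    """
    return {
        "computation_name": name,
        "executable_name": executable,
        "compress_files": list(files),
        "executable_arguments": [],
        "environment_variables": [],
        "zookeeper_hosts": "localhost:2181",
        "zookeeper_path": "/concord",
        "cpus": 1,
        "mem": 1024,
        "disk": 1024,
        "instances": instances,
    }


def write_manifest(manifest, path):
    """Serialize the manifest as plain JSON, without comments."""
    with open(path, "w") as fh:
        json.dump(manifest, fh, indent=4, sort_keys=True)
        fh.write("\n")
```

For example, write_manifest(build_manifest("my-computation", "foobar", ["foobar"]), "my-operator.json") would produce a file that can be passed to the CLI.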

Deploy pre-built Concord connectors (runway)

Check out our open source repository of connectors we call Concord Runway:

$ concord runway
INFO:2016-07-29 15:48:05,721 runway.py:204] Fetching concord runway metadata at: https://raw.githubusercontent.com/concord/runway/master/meta/repo_metadata.json
Select an operator to deploy:
+-------+----------------+--------------------------------------------------+--------------+------------+------------+
| Index | Connector      | Description                                      | Last Updated | Pull Count | Star Count |
+-------+----------------+--------------------------------------------------+--------------+------------+------------+
| 1     | Cassandra Sink | Push incoming stream data to Cassandra... fast   | 2016/07/27   | 13         | 1          |
| 2     | Kafka Source   | Pulls records from Kafka into a Concord topology | 2016/07/29   | 24         | 1          |
| 3     | Kafka Sink     | Pushes concord records to a kafka cluster        | 2016/07/29   | 3          | 1          |
+-------+----------------+--------------------------------------------------+--------------+------------+------------+
Selection:

You’ll be presented with a list of open source operators that are dockerized for ease of use. The runway command also accepts a manifest file via the -c option, although it may not be necessary in some cases. Runway can detect the ZooKeeper hosts list and ZooKeeper path that have been set up by concord config, or you can pass these values in explicitly via runway’s command-line arguments.

Sensible defaults for things like CPU and memory have already been set up by package authors. Most runway operators will only need a manifest that forwards any necessary command-line arguments to the operator. For more information and examples, check out the README of a specific connector, for example our Kafka connector.

Since runway is open source, you can create your own operators and share them with the Concord community! This process is in its infancy, and we are just beginning to spec it out. For more information, check out the repository README.

Deploying a Computation (deploy)

Once you’ve built your manifest, deploy your operator like this:

$ concord deploy my-operator.json

Scaling

To scale a computation, simply adjust the "instances" parameter in your JSON manifest file and redeploy.
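
The manifest edit can also be scripted. This is a minimal sketch, assuming the manifest on disk is plain JSON without comments; the helper name set_instances is hypothetical and not part of the Concord CLI.

```python
import json


def set_instances(manifest_path, instances):
    """Rewrite the manifest with a new "instances" value.

    Returns the previous value so the change can be logged or reverted.
    """
    if instances < 0:
        raise ValueError("instances must not be negative")
    with open(manifest_path) as fh:
        manifest = json.load(fh)
    previous = manifest.get("instances")
    manifest["instances"] = instances
    with open(manifest_path, "w") as fh:
        json.dump(manifest, fh, indent=4)
        fh.write("\n")
    return previous
```

After calling set_instances("my-operator.json", 4), redeploy with concord deploy my-operator.json as described above.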

Kill an Unresponsive Computation (kill)

If, for whatever reason, a computation gets stuck in an unresponsive state and won’t exit, you can force Concord to kill the task. The following command will bring up an interactive mode where you will be able to inspect running computations and selectively choose a group to kill:

$ concord kill --zookeeper-hosts <my hosts> \
  --zookeeper-path <path>

  Querying zookeeper for cluster topology...
  INFO:cmd.utils:Connecting to: localhost:2181

  Select a computation to inspect:
  +-----+------------------+-------------------+-------------------+
  | Idx | Computation name | istreams/grouping | ostreams/grouping |
  +-----+------------------+-------------------+-------------------+
  | 1   | word-source      |                   | (words <-> 1)     |
  | 2   | word-counter     | (words <-> 1)     |                   |
  +-----+------------------+-------------------+-------------------+
  Selection: [1..2, 1,.., quit(q), all(a)]:

When prompted, you may enter either ‘all’ or a comma-separated list of numbers and ranges (e.g. 2, 4, 5..7).
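
To make the selection syntax concrete, here is a small sketch of how such input could be expanded into indexes. It illustrates the format only and is not the CLI's actual parser; the function name parse_selection and the range-validation rules are assumptions.

```python
def parse_selection(text, count):
    """Expand input such as "2, 4, 5..7" into a sorted list of 1-based indexes.

    `count` is the number of rows shown in the table. "all" (or "a")
    selects every row.
    """
    text = text.strip().lower()
    if text in ("all", "a"):
        return list(range(1, count + 1))
    chosen = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if ".." in part:
            # A range such as "5..7" is inclusive on both ends.
            start, end = (int(piece) for piece in part.split("..", 1))
        else:
            start = end = int(part)
        if start > end or start < 1 or end > count:
            raise ValueError("selection out of range: %r" % part)
        chosen.update(range(start, end + 1))
    return sorted(chosen)
```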

To print the DAG of all the computations running in your cluster:

$ concord graph --zookeeper <my hosts> --file deps

deps.pdf will contain the topology layout.

Deploying the Concord Scheduler with Marathon (marathon)

For users with existing Mesos clusters running Marathon, the CLI supports a command to generate a Marathon configuration file in one step. If you are using DC/OS, however, check out our DC/OS installation instructions to install Concord with one command.

$ concord marathon -h
usage: marathon.py [-h] [-C CONCORD_ZOOKEEPER] [-M MESOS_ZOOKEEPER]
                   [-n FRAMEWORK_NAME] [-l] [-c CPU_SHARES] [-m MEM_ALLOCATED]
                   [-i INSTANCES] [-o OUTPUT_DESTINATION]

optional arguments:
  -h, --help            show this help message and exit
  -C CONCORD_ZOOKEEPER, --concord_zookeeper CONCORD_ZOOKEEPER
                        i.e. zk://1.2.3.4:2181,2.2.2.2:2181/concord
  -M MESOS_ZOOKEEPER, --mesos_zookeeper MESOS_ZOOKEEPER
                        i.e. zk://1.2.3.4:2181,2.2.2.2:2181/mesos
  -n FRAMEWORK_NAME, --framework_name FRAMEWORK_NAME
                        Name to give to Concord Scheduler framework
  -l, --locate_publicip
                        Use openDNS to automatically resolve public ip. If
                        this is not set then ifconfig will be used to query
                        for a public ip. This may not work if the selected
                        machine is behind a router that uses NAT.
  -c CPU_SHARES, --cpu_shares CPU_SHARES
  -m MEM_ALLOCATED, --mem_allocated MEM_ALLOCATED
  -i INSTANCES, --instances INSTANCES
                        Instances of Concord Scheduler
  -o OUTPUT_DESTINATION, --output_destination OUTPUT_DESTINATION

The configuration options you supply here apply only to the Concord Scheduler. Future computations will use the options in their manifest files to determine parameters such as ZooKeeper location, resource limits, etc.
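
The zk:// addresses in the help text combine a host list and a path, while a computation manifest stores them separately as "zookeeper_hosts" and "zookeeper_path". The sketch below shows one way to split such an address into those two values. The helper name split_zk_url is hypothetical, and defaulting a missing path to "/" is an assumption.

```python
def split_zk_url(url):
    """Split "zk://host1:2181,host2:2181/path" into (hosts, path)."""
    prefix = "zk://"
    if not url.startswith(prefix):
        raise ValueError("expected an address starting with zk://")
    rest = url[len(prefix):]
    # Everything before the first slash is the comma-separated host list.
    hosts, _, path = rest.partition("/")
    if not hosts:
        raise ValueError("no hosts found in %r" % url)
    return hosts, "/" + path
```

With the example from the help text, split_zk_url("zk://1.2.3.4:2181,2.2.2.2:2181/concord") yields the host list "1.2.3.4:2181,2.2.2.2:2181" and the path "/concord".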

Storing Defaults (config)

Many commands in the Concord CLI prompt you for the same information before performing their own action. You can store global defaults with the concord config command:

$ concord config init

The config program has three commands: init, set, and show. The init command creates a file named ‘.concord.cfg’ in your current directory with default settings. Before any concord command runs, it searches for this file in the current directory. If the file is not found there, the search moves up through each parent directory in turn until it reaches the root.
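
The lookup described above can be sketched as a walk from the current directory toward the filesystem root. This is an illustration of the search order, not the CLI's own code; the function name find_config is hypothetical.

```python
import os


def find_config(start_dir, filename=".concord.cfg"):
    """Return the path of the nearest config file at or above start_dir.

    Returns None if the filesystem root is reached without a match.
    """
    current = os.path.abspath(start_dir)
    while True:
        candidate = os.path.join(current, filename)
        if os.path.isfile(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:
            # dirname() of the root is the root itself, so stop here.
            return None
        current = parent
```

Calling find_config(os.getcwd()) from a project subdirectory would pick up a ‘.concord.cfg’ created earlier in the project root.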