Welcome to the developer documentation for Concord, an open source, high performance, language-neutral, distributed Stream Processing framework. This document introduces you to Concord with a quick overview and a simple Word Counter example.
To get up and running with Concord see the quick start guide. It has all of the information you’ll need to create your own distributed word counting program in our pre configured Vagrant environment.
What is Concord?
Concord is a Stream Processing framework. Stream processing is a computing abstraction that enables and eases in the development of processing large volumes of data as it is generated in real time. This data could be anything from user activity updates (webpage views, impression / click streams from display ads), exchange feeds, or even database logs.
Concord was built with the goal of making stream processing approachable for all developers, regardless of their programming background.
In our experience, the hardest part of using existing stream processors was configuring and maintaining the multiple distributed systems required to keep them running. Concord aims to be an all-in-one solution so you can spend your time working on business logic rather than configuring and managing your infrastructure.
1. Event-driven architecture
Concord supports an event driven, reactive programming model like that of Erlang. We recommend developers modularize their code , structuring it as a series of operators that form a greater topology. Each operator is connected to other operators in the topology by “streams”, globally unique message queues.
In its implementation, our framework connects operators in a manner similar to pub sub
- each operator lists the streams it wishes to subscribe to and the framework sees to it that each message produced on that stream passes through the operator.
By exposing an interface through apache thrift, Concord supports operators written in 15 languages.
2. Real-time code deployment
In Concord, you can update operators and deploy new operators independently of
one another - even when they’re running in production. Unlike most other
streaming frameworks where you have to statically define your topology in your
programs main method, concord allows you to be flexible with your topology.
Computations can be submitted to the cluster in any order, via the Command Line
$ concord deploy foo.json.
Each operator/task runs in a separate Linux container via Mesos. This guarantees that errant processes don’t cause other processes to crash.
Concord’s communication layer is built from the ground up in C++14, utilizing thrift - backed by libevent for efficient evented I/O, and rocksDB to manage backpressure. Message delivery between operators in Concord achieves low latency through non-blocking network I/O similar to that of nginx.
4. Fault Tolerance
Concord was built to be highly available by design so that your operators continue to run even in the event of failure. If the scheduler dies for any reason, zookeeper will coordinate a leader election between remaining schedulers to determine the next leader. We are also working on multiple supervision strategy that will help us re-launch user code in the event of failures. Come back soon!
Currently Concord’s message guarantees are best effort, at most once. This means that in the event of an unrecoverable failure, the system will not process any missed messages during the outage. At the moment we are also implementing and designing new features such as at least once and exactly once processing.