CloudGraphs: Part 2 – The CloudGraphs Architecture

The first article in the CloudGraphs series introduced the problem statement: let’s represent this nifty localization data in a cloudy graph database so that it can be distributed across a cloud system. This article follows up with the preliminary architecture, attempting to answer the ‘how’ and the ‘why’. Why was each technology in the first post’s stack chosen? Well, let’s start with the informal ‘back-of-napkin’ architecture diagram below.

Preliminary ‘Back-Of-Napkin’ System Architecture

[Figure: CloudGraphs architecture diagram]

It’s amazing how technology and business terms drift in and out of use. I was about to write ‘back of cigarette pack’, but quickly realized that phrase is about as archaic as ‘SLIP’… Right, so it’s a ‘back-of-napkin’ diagram 🙂 This is mostly because it’s preliminary, and it may not completely represent how the individual libraries interact. So, take it with a grain of salt, and off we go!

We are concerned with four layers:

  1. The Platform Persistence Layer
    1. Where is the data actually persisted?
    2. How to gain universal (authorized) access to data?
    3. How much data needs to be accessed from where?
  2. The Cloud Interface Layer
    1. How do we hide the technicalities of the persistence layer from the application layer?
    2. How many different interface layers are there?
  3. The Cloud Application Layer
    1. What do the applications need to know to use CloudGraphs?
  4. The Robotics Interface Layer
    1. What considerations do we need to think about when interacting with an underlying CloudGraphs system?

The Platform Persistence Layer

After several back-and-forth iterations, a two-tier data store seems best. Two databases are at the root of the implementation: Neo4j and MongoDB.

Note: These two stores may change depending on the stage of development, and on whether we run into connectivity issues later in the project.

At the end of the day, this layer needs only two endpoints and two sets of credentials. The Neo4j and MongoDB instances can live in CloudFoundry, on Amazon AWS, on a beast of a machine on the local network, or even on our local PCs. For the initial work, they actually reside on our local machines. As the project matures and approaches a pre-production environment, we will move the databases out to the cloud.
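Since the whole persistence layer boils down to two endpoints and two sets of credentials, moving from local machines to the cloud can be a configuration change rather than a code change. The sketch below shows one way to capture that in Python, assuming hypothetical environment variable names (`CG_GRAPH_URL` and friends) and defaulting to the usual local ports for Neo4j's HTTP interface (7474) and MongoDB (27017). It's an illustration, not the project's actual configuration mechanism.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class StoreEndpoint:
    """One persistence endpoint plus its credentials."""
    url: str
    user: str
    password: str


@dataclass(frozen=True)
class CloudGraphsConfig:
    """The two endpoints the persistence layer needs."""
    graph: StoreEndpoint    # the graph database (Neo4j)
    payload: StoreEndpoint  # the document store (MongoDB)


def load_config(env: Mapping[str, str] = os.environ) -> CloudGraphsConfig:
    """Build the config from (hypothetical) environment variables.

    Defaults point at local instances; pointing at the cloud only means
    setting the variables, not changing application code.
    """
    return CloudGraphsConfig(
        graph=StoreEndpoint(
            url=env.get("CG_GRAPH_URL", "http://localhost:7474"),
            user=env.get("CG_GRAPH_USER", "neo4j"),
            password=env.get("CG_GRAPH_PASSWORD", ""),
        ),
        payload=StoreEndpoint(
            url=env.get("CG_PAYLOAD_URL", "mongodb://localhost:27017"),
            user=env.get("CG_PAYLOAD_USER", ""),
            password=env.get("CG_PAYLOAD_PASSWORD", ""),
        ),
    )
```

For example, `load_config({"CG_GRAPH_URL": "http://graph.example.com:7474"})` swaps only the graph endpoint and leaves the payload store on the local machine.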

In the 4th article in this series, I’ll quickly talk through how to set up local instances of both databases.

The Cloud Interface Layer

User applications will use the Cloud Interface Layer to access the persistence services. This is where the majority of this project’s work will be done, because we are wrapping these services into a single interface. Applications like IncrementalInference.jl should be able to use these easily.

Yaarrrr, here be dragons! The remainder of this project is dedicated to these components, welding bits of new technology with pieces of existing technology.

The principal components of this layer are Julia libraries:

In the 3rd article I’ll also discuss in more depth why these components were used.
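To make ‘wrapping these services into a single interface’ a little more concrete, here is a minimal Python sketch of the idea (the real components are Julia libraries). Plain dictionaries stand in for the two databases, and JSON stands in for the BSON / protocol buffer encodings; the class name and methods are hypothetical. Bulky payload data goes to the payload store, the graph keeps only a key to it, and reads weave the two back together.

```python
import json
import uuid
from typing import Any, Dict, Optional


class CloudGraphFacade:
    """Single interface over a graph store and a payload store.

    Both stores are plain dicts here: graph_store stands in for the graph
    database (Neo4j) and payload_store for the document store (MongoDB).
    """

    def __init__(self) -> None:
        self.graph_store: Dict[int, Dict[str, Any]] = {}
        self.payload_store: Dict[str, bytes] = {}
        self._next_id = 0

    def add_vertex(self, label: str, props: Dict[str, Any],
                   payload: Optional[Dict[str, Any]] = None) -> int:
        vid = self._next_id
        self._next_id += 1
        node = {"label": label, **props}
        if payload is not None:
            # Bulky data goes to the payload store; the graph keeps only a key to it.
            key = str(uuid.uuid4())
            # JSON is a stand-in for the BSON / protocol buffer encodings.
            self.payload_store[key] = json.dumps(payload).encode("utf-8")
            node["payload_key"] = key
        self.graph_store[vid] = node
        return vid

    def get_vertex(self, vid: int) -> Dict[str, Any]:
        """Return the vertex with its payload woven back in; callers never see the split."""
        node = dict(self.graph_store[vid])
        key = node.pop("payload_key", None)
        if key is not None:
            node["payload"] = json.loads(self.payload_store[key].decode("utf-8"))
        return node
```

The caller only ever sees `add_vertex` and `get_vertex`; which store holds which piece is an internal detail of the interface layer.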

The Cloud Application Layer

The objective of this project is to replace a local graph with a cloud-enabled graph structure that can be traversed/modified by a variety of different agents. That means that the consumers of the new architecture (namely IncrementalInference.jl and possibly other applications) should be able to switch between their internal graph representations and CloudGraphs with minimal fuss.

In other words, the focus of this project is to provide these applications with a single interface to the complete graph structure, which is split between the two databases, with parts of it encoded using protocol buffers or BSON. That split, however, should be hidden from the consuming application. It should simply be able to ask for graph nodes, run graph queries, or request image/video data without having to know that the pieces were woven back together.
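One way to get the ‘switch with minimal fuss’ property is to write the application against a small graph interface and let both the local graph and the cloud graph implement it. The Python sketch below assumes a hypothetical three-method `GraphBackend` protocol; `InMemoryGraph` plays the role of an application's existing local representation. None of these names come from IncrementalInference.jl or CloudGraphs.jl.

```python
from typing import Any, Dict, List, Protocol


class GraphBackend(Protocol):
    """The small set of graph operations the application relies on."""

    def add_vertex(self, label: str, props: Dict[str, Any]) -> int: ...

    def add_edge(self, src: int, dst: int, props: Dict[str, Any]) -> None: ...

    def neighbors(self, vid: int) -> List[int]: ...


class InMemoryGraph:
    """Stand-in for an application's local graph; matches GraphBackend structurally."""

    def __init__(self) -> None:
        self._vertices: Dict[int, Dict[str, Any]] = {}
        self._adjacency: Dict[int, set] = {}
        self._next_id = 0

    def add_vertex(self, label: str, props: Dict[str, Any]) -> int:
        vid = self._next_id
        self._next_id += 1
        self._vertices[vid] = {"label": label, **props}
        self._adjacency[vid] = set()
        return vid

    def add_edge(self, src: int, dst: int, props: Dict[str, Any]) -> None:
        # Edge properties are ignored in this sketch; adjacency is undirected.
        self._adjacency[src].add(dst)
        self._adjacency[dst].add(src)

    def neighbors(self, vid: int) -> List[int]:
        return sorted(self._adjacency[vid])


def build_pose_chain(graph: GraphBackend, n_poses: int) -> List[int]:
    """Application code: it only touches GraphBackend, so the backend can be swapped."""
    ids = [graph.add_vertex("pose", {"index": i}) for i in range(n_poses)]
    for a, b in zip(ids, ids[1:]):
        graph.add_edge(a, b, {"type": "odometry"})
    return ids
```

A hypothetical cloud-backed class with the same three methods could be passed to `build_pose_chain` without touching the application code.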

The principal components of this layer are:

  • The target application for this project is Dehann’s IncrementalInference.jl project:
    • One of the really hard parts in navigating a robot is how to enable real-time, multiple access and persistent storage of commonly-inferred sensor data;
    • The IncrementalInference.jl library is open, under continued development and focuses on robust inference to navigate a mobile platform;
    • The project follows from a large body of robotic navigation work known as simultaneous localization and mapping (SLAM). As it turns out, the sensor measurements from a robot can be collected into a natural graphical model representation (and later refactored into an algebraically equivalent tree representation, which makes statistical inference over the data much more tractable);
    • We’re storing that graph in CloudGraphs;
    • In addition, IncrementalInference.jl aims to allow easier access to SLAM-style map inference and, when coupled with CloudGraphs.jl, to large amounts of sensory data in multi-view, map-aware systems – more to follow on this, as we’re a little excited about it!
  • There may be other applications that would interact with the graph, which are included in the placeholder ‘TreeRefinement.jl’ – e.g. applications that would process the raw video data from each node;
  • A critical component is the Julia Buildpack for CloudFoundry, which will allow Julia applications to be run inside of the CloudFoundry environment:
    • This is a small project unto itself, but having the ability to run (and scale) applications inside CloudFoundry is very appealing;
    • More discussions on this to follow as the project develops.
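To illustrate how a robot's sensor measurements fall naturally into a graphical model, here is a small, hypothetical factor-graph structure in Python: each pose is a variable, and each measurement is a factor linking the variables it constrains (a prior anchors the first pose, and odometry links consecutive poses). This only shows the shape of the data that ends up in the graph; it is not IncrementalInference.jl's data model and does no inference.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Factor:
    """A measurement constraining one or more variables."""
    kind: str                        # e.g. "prior" or "odometry"
    variables: Tuple[str, ...]       # names of the constrained variables
    measurement: Tuple[float, ...]   # raw measurement values


@dataclass
class FactorGraph:
    variables: List[str] = field(default_factory=list)
    factors: List[Factor] = field(default_factory=list)

    def add_variable(self, name: str) -> None:
        self.variables.append(name)

    def add_factor(self, kind: str, variables: Tuple[str, ...],
                   measurement: Tuple[float, ...]) -> None:
        unknown = [v for v in variables if v not in self.variables]
        if unknown:
            raise KeyError(f"unknown variables: {unknown}")
        self.factors.append(Factor(kind, variables, measurement))

    def factors_of(self, name: str) -> List[Factor]:
        """All factors touching a given variable, i.e. its graph neighbours."""
        return [f for f in self.factors if name in f.variables]


def from_odometry(steps: List[Tuple[float, float]]) -> FactorGraph:
    """Turn a sequence of 2-D odometry steps (dx, dy) into a factor graph."""
    g = FactorGraph()
    g.add_variable("x0")
    g.add_factor("prior", ("x0",), (0.0, 0.0))  # anchor the first pose
    for i, step in enumerate(steps, start=1):
        g.add_variable(f"x{i}")
        g.add_factor("odometry", (f"x{i - 1}", f"x{i}"), step)
    return g
```

Every new pose adds one variable and at least one factor, which is why the graph grows steadily as the vehicle moves and why persisting it outside the vehicle becomes attractive.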

The Robotics Interface Layer

This is all tied to a vehicle and its perception of its location. Well, ideally – once the distributed structure is implemented – it applies to multiple coordinated vehicles. That means we are concerned with how each vehicle communicates with the distributed tree.

We’ve put in a placeholder Julia application for vehicles that will interact with the graph, i.e. LocalizationAgent.jl. This is the minimum viable product component, no more. Why only concern ourselves with such a teeny slice? Well, for the moment, the vehicles are not the focal point. Once we have an idea of the general graph performance, and how Dehann’s algorithm maps to the underlying structure, we will be able to have a more informed discussion about this layer. Specifically:

  • Do we interact directly with the cloud structure, or do we capture a subgraph on the vehicle and asynchronously update the central tree?
  • How do multiple agents interact with shared vertices in the tree?
  • Do we consider refining the tree ‘online’, i.e. as vehicles update the graph, there is another agent in the cloud layer that cleans it up? Or, do we only concern ourselves with ‘offline’ refinement?
  • …Actually, now that we’re talking about it – aside from algorithm+performance concerns… do we want to provide a Python CloudGraphs interface because a large portion of our codebase is in Python?

See: can o’ worms! We haven’t even demonstrated how great it would be to have one centralized representation of our world, and BOOM! A hundred questions about how to consume it.

For the proof of concept, here are the assumptions:

  1. There is a root vertex in the graph – generally associated with the most recent events in the system;
  2. There may be multiple agents interacting with the graph – without this, there is arguably little purpose in moving the graph outside of the vehicle;
  3. Some of the agents may be vehicles:
    1. A vehicle will produce data, i.e. vertices and edges, and this data will be rooted against a vertex in the tree;
    2. That is, they will produce subgraphs that may share common root nodes but are otherwise independent;
    3. As the vehicles generate new poses, they will be pushed into the central graph – probably asynchronously because the subgraph is cached on the vehicle and there is no need to read back the graph from the cloud.
  4. Other agents may refine the graph:
    1. These may prune or combine vertices;
    2. This step is assumed to be done while the vehicles are not active, i.e. it’s an ‘offline’ update;
  5. Lastly, some agents may be retrieving payload data from the graph:
    1. A simple example of this is an engineer stitching together terrain data from the poses;
    2. As this is solely a read operation, it can be done at any stage.
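Assumption 3 can be sketched in a few lines of Python: each vehicle roots its subgraph at a shared root vertex, keeps the subgraph cached locally, and hands new poses to a background thread that pushes them to the central graph, so the vehicle never waits on (or reads back from) the cloud. `CentralGraph` and `VehicleAgent` are hypothetical stand-ins; a real agent such as LocalizationAgent.jl would write to the cloud databases instead of a locked dictionary.

```python
import queue
import threading
from typing import Any, Dict, List


class CentralGraph:
    """Stand-in for the shared cloud graph; a lock serializes concurrent writers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.vertices: Dict[str, Dict[str, Any]] = {}

    def push(self, vertex: Dict[str, Any]) -> None:
        with self._lock:
            self.vertices[vertex["id"]] = vertex


class VehicleAgent:
    """Caches its own subgraph and pushes new poses to the central graph in the background."""

    def __init__(self, name: str, root_id: str, central: CentralGraph) -> None:
        self.name = name
        self.root_id = root_id  # common root vertex shared by all agents
        self.local_subgraph: List[Dict[str, Any]] = []  # cached copy, read locally
        self._central = central
        self._outbox: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._push_loop, daemon=True)
        self._worker.start()

    def add_pose(self, index: int, pose: Any) -> None:
        # Chain the new pose to the previous one, or to the shared root for the first pose.
        parent = self.local_subgraph[-1]["id"] if self.local_subgraph else self.root_id
        vertex = {"id": f"{self.name}/x{index}", "pose": pose, "parent": parent}
        self.local_subgraph.append(vertex)
        self._outbox.put(vertex)  # returns immediately; the upload happens on the worker thread

    def _push_loop(self) -> None:
        while True:
            vertex = self._outbox.get()
            try:
                if vertex is None:  # sentinel from close()
                    return
                self._central.push(vertex)
            finally:
                self._outbox.task_done()

    def flush(self) -> None:
        """Block until every queued pose has reached the central graph."""
        self._outbox.join()

    def close(self) -> None:
        self._outbox.put(None)
        self._worker.join()
```

Two vehicles sharing the root `"root"` end up with independent chains (`auv1/x0 → auv1/x1 …`, `auv2/x0 → …`) in the same central graph, which matches the ‘independent subgraphs with potentially common root nodes’ assumption. Offline refinement (assumption 4) and read-only payload retrieval (assumption 5) would be separate agents working against the same central store.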

That’s pretty much it!

Next Steps

Before diving into the implementation, the next article discusses the language choice, namely Julia. Rather than giving a comprehensive rundown of the language, it’s more of a review of why the language is well suited to this problem.

– Sam and Dehann

