In the interest of a bit of learning and initiating some public projects on SemiSorted, we’re taking on a cloud-based project! Dehann Fourie in the MIT CSAIL lab is working on a really impressive open-source pose estimation framework, and the idea is to learn cloud technologies by growing it into a distributed cloud solution. This whole project is a learning curve in Neo4j, MongoDB, CloudFoundry, and JuliaLang, and the articles will be written in the same way the project was executed: with haphazard enthusiasm and a bit (~lot) of hackery.
What is Pose Estimation?
During a catch-up call on Njord, Dehann gave an overview of his work on pose estimation. Pose estimation is an umbrella term for fusing imperfect measurements (of location, orientation, and their derivatives) into a solid estimate of the true location and orientation. It’s closely related to Simultaneous Localization and Mapping (SLAM), where the map is built at the same time as you localize yourself within it. Pose estimation is an interesting problem with applications in a huge variety of fields – notable cases include the Apollo moon landings, GPS systems, and your everyday drones.
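To make "fusing imperfect measurements" concrete, here's a minimal sketch in Python of the simplest possible fusion: blending two noisy estimates of the same position by weighting each with the inverse of its variance (the classic inverse-variance trick that Kalman-style filters build on). The numbers are made up for illustration and have nothing to do with Dehann's actual algorithm.

```python
def fuse(x1, var1, x2, var2):
    """Fuse two scalar measurements with known variances
    (inverse-variance weighting)."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    fused = (w1 * x1 + w2 * x2) / (w1 + w2)
    fused_var = 1.0 / (w1 + w2)  # always smaller than either input variance
    return fused, fused_var

# GPS says we're at 10.0 m (noisy, var=4.0); odometry says 12.0 m (var=1.0)
position, uncertainty = fuse(10.0, 4.0, 12.0, 1.0)
print(position, uncertainty)  # the fused estimate leans toward odometry
```

The point to notice: the fused variance is smaller than either input's, which is why stacking more (even bad) sensors onto the problem keeps helping.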
This problem has really frustrated people for decades. I came across it by accident a long time ago while playing with a little drone in university: “Okay, I have linear accelerations from an accelerometer and rotation rates from a gyroscope, I have a compass, and I have a GPS, so it should take a couple of days to figure out how to blend those measurements into a good estimate of where I really am… Right?… ”
…[More Time Passes]…
“Fracking frack… this is hard!”
The problem is only now seeing widespread, robust solutions (e.g. the head-tracking in the Oculus Rift and HoloLens, and highly agile drones). A simple example of the evolution is the Kalman filter used for head-tracking in the early VR headsets *yuck* versus the new, exciting stuff from Oculus.
In Dehann’s Incremental Inference solution (which I’ll be using for this problem statement), the core of the pose estimation algorithm is a graph. The graph represents the statistical measurements of points in space, and the relationships between those measurements. In terms of graphs, this means:
- Each vertex is a pose, which is: “I think I’m here, with some measure of error”;
- Think of markers on Google Maps – these are vertices;
- Between vertices are edges, which link vertices by successive measurements and movement, i.e. an edge translates to: “I moved some measurable amount (with estimated error), and now I’m at this new vertex”;
- Think of lines between the Google Maps markers – My car odometer measured a total travel distance of 2km while travelling to the new vertex;
- Over and above this, there are other measures of movement, such as: “…while I was moving, my other sensors believed I made some related measure of change” – these are additional vertices and edges in the graph.
- As a simple example: I saw a sign at X while at vertex A and now, after moving, I recognize the same sign at Y from vertex B.
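The bullets above can be sketched as a toy data structure. This is purely illustrative – the field names and numbers are my own invention, not Dehann's actual representation – but it shows the three ingredients: poses with error estimates, odometry-style edges, and the extra "I saw the same landmark twice" constraints.

```python
# Vertices: "I think I'm here, with some measure of error"
poses = {
    "A": {"position": (0.0, 0.0), "error": 0.5},
    "B": {"position": (2.0, 0.1), "error": 0.8},  # after moving
}

edges = [
    # "My odometer measured ~2 km of travel (with estimated error)"
    {"from": "A", "to": "B", "kind": "odometry", "distance": 2.0, "error": 0.1},
    # "I recognized the same sign from both poses" - an extra constraint
    {"from": "A", "to": "B", "kind": "landmark", "landmark": "sign", "error": 0.3},
]

def constraints_between(a, b):
    """All measurements linking two poses."""
    return [e for e in edges if {e["from"], e["to"]} == {a, b}]

print(len(constraints_between("A", "B")))  # 2
```

Two poses, two independent constraints between them – and it's exactly that redundancy that lets the inference algorithm pull the error estimates down.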
Designing a Cloud Graph Solution
Graphs such as these, in a real-world environment, can grow to be huge. Consider this in terms of a person – the data being recognizable places in your memory – it could easily be terabytes upon terabytes of data. This leads to a couple of problems when addressing this in a real-world solution:
- You can’t realistically store the complete graph and all its data in local memory;
- Multiple users moving around in the same space may want to share the graph (it can be leveraged in a distributed system – think of a few drones flying around the same space);
- Some users may want to add to the graph, while others may just be consuming or mining it, i.e. there are different traversals and behaviors, which could be expensive if the graph is not represented in an optimal way.
The holy grail is to build this graph over time, successively improving it. Ya know, the way us squishy meat-bags do it? Ideally, if you know roughly where you are, you can operate in a sub-graph, never needing to store the complete structure, for example: “Oh yeah, I’ve seen that restaurant before – we must be close to our destination!”. If the graph can be traversed and consumed in parts, scaling it up and making it distributed would make it very powerful. Some workers consume it, some add to it, others refine it, etc.
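The "operate in a sub-graph" idea is just a bounded traversal: start from roughly where you are and pull in only the poses within a few hops, instead of loading the whole (potentially terabyte-scale) graph. A minimal sketch, using a hand-made adjacency list rather than any real database:

```python
from collections import deque

# Adjacency list for a small illustrative pose graph: A-B-C-D-E
neighbors = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D"],
}

def local_subgraph(start, max_hops):
    """Breadth-first expansion out to max_hops from the current pose."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # don't expand past the hop budget
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

print(sorted(local_subgraph("C", 1)))  # ['B', 'C', 'D']
```

In the real system this kind of neighborhood query is exactly what a graph database is good at – which is a big part of why Neo4j is on the menu below.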
…A little exciting for a problem statement, no? 🙂
The Technology Stack
For the base graph, the choice was fait accompli – Neo4j. In the interest of full disclosure, I’ve never used Neo4j. However, I have a habit of trying to remember the books my manager is reading and following up by trying to read the same material. A while back he had a book about Neo4j on his table and it looked really interesting. I was intrigued – I’ve seen too many graphs represented in relational databases and, like… EW!
Net result: this problem statement sounds like a great fit to learn some new tech and maybe build something exciting using this Neo4j thingey that everyone’s talking about.
The project in a nutshell:
- Principal Development Language: Julia
- Julia is a powerful language developed at MIT that JIT-compiles down to fast native code;
- The MIT CSAIL group uses Julia, so the interface between their code and the graph database will be built in Julia;
- I was originally reluctant to pick up a new language, but Julia is actually a great fit for implementing this solution – I’ll write up a bit of an article on why in the next installment;
- Significant Technologies:
- Incremental Inference: This is the algorithm that will make use of the new graph;
- Neo4j: The objective is to build a Neo4j database that abstracts away the internal structure of the pose graph used in the Incremental Inference algorithm;
- MongoDB: Parts of the graph may be big – 4K-video big – and we don’t want to store that in the Neo4j structures. A first-pass solution is to offload that data into a NoSQL database, specifically MongoDB;
- Google Protocol Buffers (protobufs): Dehann showed me a neat demo of encoding and decoding data using protobufs – forward- and backward-compatible binary representations of a data structure – these may be used to save the packed structure of the Neo4j nodes;
- Pivotal CloudFoundry: We want a centralized database that can easily grow in the cloud, boom: CloudFoundry! I’ve been using CloudFoundry at work quite a bit (as well as for CloudShips) – it’s a pretty impressive platform;
- Redis [Optional extra]: We may want a shared, distributed cache of nodes, and to receive updates when nodes change – I’ve heard this can be done with Redis, so I’m including it in the potential tech stack.
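The Neo4j/MongoDB split boils down to one pattern: heavyweight sensor payloads live in the document store, and each graph node carries only a small reference to its payload. Here's a sketch of that linkage with plain dicts standing in for both databases – the real version would use the actual Neo4j and MongoDB drivers, and the function names here are mine, not the project's.

```python
import hashlib

blob_store = {}   # stands in for a MongoDB collection
graph_nodes = {}  # stands in for Neo4j nodes

def store_pose(pose_id, position, big_sensor_blob):
    """Offload the heavy payload, keep the graph node lightweight."""
    # key the blob by its content hash so identical data is stored once
    key = hashlib.sha256(big_sensor_blob).hexdigest()
    blob_store[key] = big_sensor_blob
    # the graph node stays small: pose data plus a reference
    graph_nodes[pose_id] = {"position": position, "blob_key": key}
    return key

def load_sensor_data(pose_id):
    """Follow the reference from graph node to blob store."""
    return blob_store[graph_nodes[pose_id]["blob_key"]]

store_pose("A", (0.0, 0.0), b"\x00" * 1024)  # pretend this is 4K video
assert load_sensor_data("A") == b"\x00" * 1024
```

Keeping the graph nodes small means Neo4j traversals stay fast, and the blobs only get fetched when a worker actually needs the raw sensor data.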
I’ll try to write the articles in the same way in which the work progresses – starting with an explanation of the preliminary architecture. That is likely to change as the project develops, but it’ll be the best-guess design, taken with a couple of handfuls of salt. With that defined, the next step is an exploration of the Julia language, which is really interesting in its own right and will flesh out the design patterns for the various libraries.
More to follow on the preliminary architecture in a short while then!