Polyglot API Services

July 23, 2018

Five or ten years ago, if you were building an API, you would pick a language of your choice and a database, and off you went. You would write a few HTTP endpoints, and if you cared, you would try to make them RESTful. These days, new technologies such as GraphQL are changing the landscape of how APIs are written. I'll cover those in another post, but today I want to talk about a problem I see a lot of startups facing: how to manage a polyglot team.

A lot of companies today have several technical teams. Specifically, they will have an engineering team in charge of shipping fast, reliable, and performant APIs, and a data science team in charge of creating models and algorithms to answer deeper questions.

Data Science

Data science today is done in many languages: Python, R, Julia, and others. All of these languages offer a rich set of libraries that let scientists focus on solving the problem instead of reimplementing a well-documented, previously developed algorithm for a low-level task. The amount of open source tooling available to the science community is amazing, and I hope to see it keep improving every day.

The problem

After the data science team comes up with a solution to a technical problem, it is usually handed over to the engineering team to deploy into a production system. This is where I see a lot of inefficiency at the startups I talk to. They will take code written in Python, which leverages libraries such as scikit-learn, and try to rewrite it in another language such as Go, C++, or even Java.

This rewrite almost always becomes a large effort that forces your engineers to reimplement core, low-level functions that are readily available in a data-science-friendly language such as Python.

We faced this problem quite a lot at Spatially, and after trying many different approaches we found a happy balance that allows us to move fast and avoid inefficiencies such as reimplementing scikit-learn as a Go package.

The solution: serverless

We started leveraging Lambda, Fargate, and Step Functions to create API endpoints, data pipelines, and other services in whichever language is truly the best tool for the job. This doesn't mean we run hundreds of different languages; we actually only have three in production (Go, Python, and Kotlin). It does mean we have been able to move significantly faster since adopting this approach. We are no longer stuck in rewrite cycles and can move into an integration and performance phase as soon as we get the delivery from science.
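To make this concrete, here is a minimal sketch of the pattern: the science team's Python code is exposed directly as a Lambda handler behind an API Gateway endpoint, instead of being rewritten in another language. The `score` function and the payload shape are hypothetical stand-ins, not our actual service; in practice `score` would wrap a real model (for example a scikit-learn estimator loaded at cold start).

```python
import json

def score(features):
    # Hypothetical placeholder for the function delivered by the
    # data science team; a real service would call a trained model here.
    return sum(features) / len(features)

def handler(event, context):
    """AWS Lambda entry point, invoked via API Gateway.

    Expects a JSON body like {"features": [1.0, 2.0, 3.0]} and
    returns the model score, or a 400 if no features were sent.
    """
    body = json.loads(event.get("body") or "{}")
    features = body.get("features", [])
    if not features:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "features required"}),
        }
    return {
        "statusCode": 200,
        "body": json.dumps({"score": score(features)}),
    }
```

The Python code ships as-is, while endpoints better suited to Go or Kotlin live in their own functions next to it; the HTTP layer is the only contract the teams share.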

I really believe the rise of serverless should be leveraged to make your teams more efficient and help them ship faster. In the end, time to market matters more than whether your whole infrastructure runs on a single language.