API Database News

These are the news items I've curated in my monitoring of the API space that have some relevance to the API definition conversation, and that I wanted to include in my research. I'm using all of these links to better understand how the space is testing their APIs, going beyond just monitoring, and understanding the details of each request and response.


API Life Cycle Basics: Database

Deploying an API from a database is the most common approach to delivering APIs today. Most of the data resources we are making available to partners and 3rd party developers via APIs live in a database behind our firewall(s). While we have seen database platform providers begin to take notice of the need to make data available using the web, most APIs get deployed through custom frameworks, as well as gateways that expose backend systems as web APIs.

If you are deploying APIs from a centralized legacy database, there will be significantly more security, performance, and other operational concerns than if your database is dedicated to providing a backend to your API. There are a growing number of open source tools for helping broker the relationship between your API and the database, as well as evolving services, and entire database platforms that are API-centric. Here are just a handful of the solutions I'm seeing out there to support the database stop along an API life cycle.

  • DataBeam - Generic RESTful interface for databases.
  • Arrest-MySQL - A plug-n-play RESTful API for your MySQL database.
  • PostgREST - REST API for any Postgres database.
  • RESTHeart - The automatic REST API server for MongoDB.
  • NodeAPI - Simple RESTful API implementation on Node.js + MongoDB.
  • PHP CRUD API - Single file PHP script that adds a REST API to a SQL database.
  • Google Cloud Spanner - Cloud Spanner is the first and only relational database service that is both strongly consistent and horizontally scalable.
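
To make this concrete, here is a minimal sketch of what consuming one of these generated APIs can look like, using PostgREST's documented filter conventions against a hypothetical host and customers table, called from Node.js (18+ for the built-in fetch):

```javascript
// Hypothetical sketch: querying a PostgREST instance that exposes a
// Postgres "customers" table as a REST resource. The host and table are
// made up; the query syntax follows PostgREST's column=operator.value
// filter conventions.
const BASE = "https://api.example.com"; // wherever PostgREST is running

async function main() {
  // GET /customers returns the rows in the table as JSON
  const all = await fetch(`${BASE}/customers`);
  console.log(await all.json());

  // Query string filters become SQL WHERE clauses,
  // e.g. ?city=eq.London becomes WHERE city = 'London'
  const london = await fetch(`${BASE}/customers?city=eq.London&select=id,name,city`);
  console.log(await london.json());

  // Writes are plain HTTP verbs: POST inserts a row into the table
  const created = await fetch(`${BASE}/customers`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name: "Acme Ltd", city: "Portland" }),
  });
  console.log(created.status);
}

main().catch(console.error);
```

The URL structure maps directly to the table and its columns, which is exactly what makes these tools so quick to stand up, and also why the resulting API will only ever be as good as the schema underneath.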

There are many database to API tools and services available out there. There are also many cloud-native solutions available to help you generate APIs from your preferred cloud provider. Amazon, Azure, and Google all provide API deployment and management solutions directly from their database offerings. The most difficult part of helping folks think about this stop along the API life cycle is the many different scenarios for how data is stored, and the limitations on how that data can be made available via APIs.

Ideally you are starting from scratch with your API, and you can deploy a new database, with a brand new API layer exposing the data store within. If you are deploying from a legacy database which serves other systems and applications, I recommend thinking about replicating the database and creating read-only instances for access via the API, or if you need read / write capabilities, then take a look at the many gateway solutions available today. Beyond that, if you have the skills to securely connect directly to your database, there are many more options on the table to help you get the job done in today's web-centric, data-driven world.


Being Able To See Your Database In XML, JSON, and CSV

This is a sponsored post by my friends over at SlashDB. The topic is chosen by me, but the work is funded by SlashDB, making sure I keep doing what I do here at API Evangelist. Thank you SlashDB for your support, and helping me educate my readers about what is going on in the API space.

I remember making the migration from XML to JSON. It was hard for me to understand the difference between the formats, and that you could accomplish pretty much the same things in JSON that you could in XML. I’ve been seeing similarities in my migration to YAML from JSON. The parallels between these formats aren’t 100%, but this story is more about our perception of data formats than it is about the technical details. CSV has long been a tool in my toolbox, but it wasn’t until this recent migration from JSON to YAML that I really started seeing the importance of CSV when it comes to helping onboard business users with the API possibilities.

In my experience API design plays a significant role in helping us understand our data. Half of this equation is understanding our schema, and the dimensions, field names, and data types of the data we are moving around using APIs. As I was working through some stories on how my friends over at SlashDB are turning databases into APIs, I saw that they were translating database, table, and field names into API design, and that they also help you handle content negotiation between JSON, XML, and CSV, which I interpret as an excellent opportunity for learning more about the data we have in our databases, and getting to know the design aspects of the data schema.
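
As a rough illustration of what that content negotiation looks like from the consumer side, here is a hypothetical Node.js sketch that asks a single resource for JSON, XML, and CSV using the Accept header. The endpoint URL is made up, and some database-to-API tools select the format using a file extension (.json, .xml, .csv) instead of, or in addition to, a header:

```javascript
// Hypothetical sketch: requesting one resource in three different
// representations using standard HTTP content negotiation. The URL is a
// placeholder, not an actual provider endpoint.
const RESOURCE = "https://api.example.com/db/northwind/customers";

const formats = {
  json: "application/json",
  xml: "application/xml",
  csv: "text/csv",
};

async function main() {
  for (const [name, mediaType] of Object.entries(formats)) {
    const response = await fetch(RESOURCE, { headers: { Accept: mediaType } });
    const body = await response.text();
    console.log(`--- ${name} (${mediaType}) ---`);
    console.log(body.slice(0, 200)); // just a peek at each representation
  }
}

main().catch(console.error);
```

Seeing the same rows rendered three different ways is a surprisingly effective way to start a conversation about field names and schema with business users.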

In an earlier post about what SlashDB does I mentioned that many API designers cringe at translating a database directly into a web API. While I agree that people should be investing in API design to get to know their data resources, the more time I spend with SlashDB’s approach to deploying APIs from a variety of databases, the more I see the potential for teaching API design skills along the way. I know many API developers who understand API design, but do not understand content negotiation between XML, JSON, and CSV. I see an opportunity for helping publish web APIs from a database, while having a conversation about what the API design should be, and also getting to know the underlying schema, then being able to actively negotiate between the different formats, all using an existing service.

While I want everyone to be as advanced as they possibly can with their API implementations, I also understand the reality on the ground at many organizations. I’m looking for any possible way to just get people doing APIs and beginning their journey, and I am not going to be too heavy handed when it comes to people being up to speed on modern API design concepts. The API journey is the perfect way to learn, and going from database to API, and kicking off the journey, is more important than expecting everyone to be skilled from day one. This is why I’m partnering with companies like SlashDB, to help highlight tools that can help organizations take their existing legacy databases and translate them into web APIs, even if those APIs are just auto-translations of their database schema.

Being able to see your database as XML, JSON, and CSV is an important API literacy exercise for companies, organizations, institutions, and government agencies who are looking to make their data resources available to partners using the web. It is another important step in understanding what we have, and the naming and dimensions of what we are making available. I think the XML to JSON migration holds one particular set of lessons, but then CSV possesses a set of lessons all its own, helping keep the bar low for the average business user when it comes to making data available over the web. I’m feeling like there are a number of important lessons for companies looking to make their databases available via web APIs over at SlashDB, with automated XML, JSON, and CSV translation being just a notable one.


How Do You Ask Questions Of Data Using APIs?

I’m preparing to publish a bunch of transit related data as APIs, for use across a number of applications, from visualizations to conversational interfaces like bots and voice-enablement. As I’m learning about the data, publishing it as unsophisticated CRUD APIs, I’m thinking deeply about how I would enable others to ask questions of this data using web APIs. I’m thinking about the hard work of deriving visual meaning from specific questions, all the way to how you would respond to an Alexa query regarding transit data in less than a second. I want to go well beyond what CRUD gives us when we publish our APIs and take things to the next level.

Knowing the technology sector, the first response I’ll get is machine learning! You take all your data, you train up some machine learning models, put some natural language processing to work, and voila, you have your answer to how you provide answers. I think this is a sensible approach for many data sets, and for organizations who have the machine learning skills and resources at their disposal. There are also a growing number of SaaS solutions for helping put machine learning to work answering complex questions that might be asked of large databases. Machine learning is definitely part of the equation for me, but I’m not convinced it is the answer in all situations, and it might not always yield the correct answers we are looking for.

After machine learning, and first on my list of solutions to this challenge, is API design. How can I enable a domain expert to pull out the meaningful questions that will be asked of data, and expose them as simple API paths, allowing consumers to easily get at the answers? I’m a big fan of this approach because I feel like the chance we will get the right answers to questions will be greater, and the APIs will help consumers understand what questions they might want to be asking, even when they are not domain experts. This approach might be more labor intensive than the magic of machine learning, but I feel like it will produce much higher quality results, and better serve the objectives I have for making data available for querying. Plus, this is a lower impact solution, allowing more people to implement it, who might not have the machine learning skills or resources at their disposal. API design using low-cost web technology makes for very accessible solutions.
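
Here is a minimal sketch of what I mean by the API design route, using Express and some made-up transit data, where each path answers a single question rather than dumping a raw table:

```javascript
// Hypothetical sketch of the "questions as paths" approach: instead of
// generic CRUD endpoints, each route answers one question a consumer is
// likely to ask of the transit data. The route names and data are made up.
const express = require("express");
const app = express();

// Pretend data store; in practice this would sit in front of a database.
const stops = [
  { id: "a1", name: "5th & Main", lat: 45.52, lon: -122.68, wheelchair: true },
  { id: "b2", name: "Broadway & Oak", lat: 45.53, lon: -122.66, wheelchair: false },
];

// "Which stops are closest to me?" -- a question, not a table dump.
app.get("/questions/stops-near-me", (req, res) => {
  const lat = parseFloat(req.query.lat);
  const lon = parseFloat(req.query.lon);
  if (Number.isNaN(lat) || Number.isNaN(lon)) {
    return res.status(400).json({ error: "lat and lon query parameters are required" });
  }
  const ranked = stops
    .map((s) => ({ ...s, distance: Math.hypot(s.lat - lat, s.lon - lon) }))
    .sort((a, b) => a.distance - b.distance);
  res.json(ranked.slice(0, 5));
});

// "Which stops are wheelchair accessible?" -- another common question.
app.get("/questions/accessible-stops", (req, res) => {
  res.json(stops.filter((s) => s.wheelchair));
});

app.listen(3000, () => console.log("question API listening on port 3000"));
```

Each new question a consumer asks can become another path, which keeps the design conversation grounded in what people actually want to know.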

Whether you go the machine learning or artisanal domain expert API design route, there has to be a feedback loop in place to help improve the questions being asked, as well as the answers being given. If there is no feedback loop, the process will never be improved. This is what APIs excel at when you do them properly. The savvy API platform providers have established feedback loops for API consumers, and their users to correct answers when they are wrong, learn how to ask new types of questions, and improve upon the entire question and answer life cycle. I don’t care whether you are going the machine learning route, or the API design route, you have to have a feedback loop in place to make this work as expected. Otherwise it is a closed loop system, and unlikely to give the answers people are looking for.

For now, I’m leaning heavily on the API design route to allow my consumers to ask questions of the data I’m publishing as APIs. I’m convinced of my ability to ask some sensible questions of the data, expose them as simple URLs that anyone can query, and then evolve forward and improve upon them as time passes. I just don’t have the time and resources to invest in the machine learning route at this point. As the leading machine learning platforms evolve, or as I generate more revenue to be able to invest in these solutions, I may change my tune. However, for now I’ll just keep publishing data as simple web APIs, and crafting meaningful paths that allow people to ask questions of some of the data I’m coming across locked up in zip files, spreadsheets, and databases.


SQL Statement Pass-Through Using Web APIs

This is a sponsored post by my friends over at SlashDB. The topic is chosen by me, but the work is funded by SlashDB, making sure I keep doing what I do here at API Evangelist. Thank you SlashDB for your support, and helping me educate my readers about what is going on in the API space.

I’m closely following the approach of GraphQL when it comes to making data resources more accessible to API consumers when developing applications. I think there is some serious value introduced when it comes to empowering front-end developers with the ability to get exactly the data they need using a variety of querying structures. I enjoy studying up on different approaches to making different dimensions of a database available to consumers and end-users, and found a pretty scrappy one from my friends over at SlashDB, with their SQL statement pass-through. It’s not the most formal approach to querying a database, but I think it’s scrappy and simple enough that it might work for a wide variety of technical, as well as non-technical users.

Using SlashDB, an administrator or an application backend developer can define arbitrary SQL queries which, once defined, can be executed as a simple URL. The example query they provide returns customers from London: http://demo.slashdb.com/query/customers-in-city/city/London.html. It is something that will make RESTafarians pull their hair (dreads?) out, but for business users looking to get their hands on some data to populate a spreadsheet, or share with a partner when developing an application, it will be a lifesaver. As the GraphQL folks like to trumpet, REST isn’t the only way to get things done, and while I think we should be thinking critically about the long term impact of our API design choices, getting business done efficiently is an important aspect of doing APIs as well.
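
To show how scrappy this can be from the consumer side, here is a short Node.js sketch that calls the style of URL shown above. The query name and path come straight from the example, while requesting a .json suffix instead of .html is my assumption about how the format gets selected:

```javascript
// Sketch of calling a bookmark-style pass-through query from code. The
// query name and its parameters are baked into the URL path, which is
// what makes the pattern readable for non-developers.
const BASE = "http://demo.slashdb.com";

async function customersInCity(city) {
  // Assumes a .json variant of the .html URL shown above exists.
  const url = `${BASE}/query/customers-in-city/city/${encodeURIComponent(city)}.json`;
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Query failed: ${response.status}`);
  return response.json();
}

customersInCity("London")
  .then((rows) => console.log(rows))
  .catch(console.error);
```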

What I like about the SlashDB approach is it makes for an intuitive URL. Something business users can understand. I could see crafting these in bulk, and some becoming permanent, while others maybe being more of a temporary thing. Depending on the application you may want to standardize how you publish your URLs, using common patterns, and making sure queries aren’t changing, if they are being baked into applications. I think that simple URLs that retrieve data from a database will always trump a more complex, technical solution that developers often want. Developers are always going to want more robust solutions that they can tweak and play with, but business users just want what they need, and are looking for the quickest way to solve their business problem–SQL statement pass-through is this.

I’ve worked at companies that have an HTML textarea on the dashboard of the internal portal where you can hand type SQL statements, or choose from a pre-configured set of statements, allowing business users to quickly query a database, dump the results to a spreadsheet or CSV, and import them into other applications. I can see SQL pass-through being a quick and dirty solution that reflects these other approaches I’ve seen in the past. I could see bookmarks, quick links, and other scrappy ways of using the web to query backend databases like this. When you couple this with some sort of API key or other identifier, you can also begin to develop an awareness of who is making these types of queries, and what types of applications they are putting them to use in. That takes SQL query pass-through to the next level, going beyond just API deployment and moving into the realm of API management.


Getting A Handle On Our Database Schema Using APIs

This is a sponsored post by my friends over at SlashDB. The topic is chosen by me, but the work is funded by SlashDB, making sure I keep doing what I do here at API Evangelist. Thank you SlashDB for your support, and helping me educate my readers about what is going on in the API space.

When I take money from my partners, I am always looking for characteristics in their products and services that allow me to write honest stories about the solutions they provide. I can’t do this for all API companies that approach me, but the ones that are doing useful things, make it pretty easy for me. SlashDB helps me out on this front because they aren’t the shiny new startup doing APIs–they are the real world business helping other companies, organizations, institutions, and government agencies get a handle on their databases using APIs. One huge benefit of this process in my opinion is how it helps us get a handle on the schema we use, by letting a little light in on the process.

One of the main reasons our databases are such a mess is because they are hidden away behind a dark technical or organizational curtain, and there really isn’t much accountability regarding how we define, name, organize, and store our data. Of course there are exceptions to this, but a messy, bloated, unwieldy database is a hallmark of about 75% of the organizations I’ve worked with over my 30-year career. Central databases are often a mashup of years, even decades, of creating databases and tables and adding columns, often occurring over generations of database teams. The result is often an incoherent mess regarding how things are named, with layers of cryptic field names and irrelevant table names, which might seem normal until you go and try to expose these data resources to 3rd party and partner developers.

Many of the data APIs I come across in my research lack any API design investment. Meaning they didn’t give any consideration to exposing backend databases as coherent paths, parameters, and other elements. Many API providers just spit out the database as a web API, and called it good enough. This can be very frustrating for many RESTafarians and API designers. I agree, and I would love to see more effort from API providers when it comes to making their APIs more intuitive, and doing the hard work of understanding what resources they have, and how to best present those resources to their consumers. However, I feel like just exposing your database as endpoints can be an important first step in the API journey, and one that isn’t always 100% dialed in on day one–that is ok. Just publishing APIs, even if they reflect exactly the table and field structures behind them, is still an important first step for many companies. Not everybody is API design ready, and having APIs can prove to be more important than good design practices.

As I was looking through SlashDB’s site looking for potential story ideas, I thought their approach to exposing databases and tables as paths, helping take the first step of evolving any database towards being an API, was worth telling a story about. I know this is the stuff that drives API obsessed folks crazy, and I feel like I shouldn’t be encouraging people, but I think it is more important that folks are doing APIs, and have embarked on their API journey, than doing things perfectly. API providers like SlashDB aren’t the bleeding edge of API design technology, they are the industrial grade API deployment solutions folks need to go from database to API. So go ahead and publish APIs that look exactly like your database structure. I’m not going to shame you. I think letting the sunlight in a bit is way healthier than waiting until you have the perfect design, or worse, never doing it at all.

Tools like SlashDB allow us to begin the long process of unwinding our legacy database schema, and start being more consistent in the vocabulary we use. Even though the first version might not be as coherent and plain-language as we’d like, publishing a web API from your backend database like SlashDB provides at least gets things out on the workbench, allowing you to begin having a conversation with external partners about what the future of your schema should look like. You are never going to learn API design by keeping everything behind closed doors, and even though you are going to have to support your first version out of the box for a significant amount of time, at least you are pushing your schema forward, making it more usable by external partners, and (hopefully) opening up discussions about why your database schema might not work 100% at the moment.

Database to API is something ALL companies, organizations, institutions, and government agencies should be doing in 2017. ALL your databases should have web APIs available, even if you are still using ODBC/JDBC and other connectivity options. If you have the time and resources to inject some healthy API design practices into the mix you should, however don’t let it hold back your API deployments if you can’t. You should be eliminating any obstacles between your backend databases and the applications that need access to this data. Even if you did have the time to think through your API design, there is a good chance you will need to shift the design of your API down the road based upon the feedback of consumers. So, just get your APIs published today, and begin the hard work of getting a handle on your database schema–it is too important to put off until you have everything just right.


Making Sure You Operate In The Cloud Marketplaces As An API Service Provider

This is a sponsored post by my friends over at SlashDB. The topic is chosen by me, but the work is funded by SlashDB, making sure I keep doing what I do here at API Evangelist. Thank you SlashDB for your support, and helping me educate my readers about what is going on in the API space.

As the cloud giants (AWS, Microsoft, and Google) continue to assert their dominance of the digital world, one aspect of their operations I’m watching closely has to do with their marketplaces. Google’s marketplaces are still very Android focused, but Amazon and Microsoft have shifted the recent editions of their marketplaces to be more cloud oriented, accommodating a wide variety of applications, machine learning models, as well as APIs and API-focused services. While these marketplaces are still growing, and asserting their role in the digital economy, they are something I advise API providers and service providers to keep a close eye on, and to begin considering how they will want to operate within these environments.

If you are an API service provider, and you are selling services to API providers anywhere along the API lifecycle, I recommend you follow the example of my friends over at SlashDB, who have their database to API offerings in two of the leading marketplaces:

  • [AWS](https://aws.amazon.com/marketplace/pp/B01MU8W71L) - Automatically constructing a REST API to databases for reading and writing on the AWS platform.
  • Azure Marketplace - SlashDB enables you to do more with traditional databases and Microsoft Azure.

As more companies, organizations, institutions, and government agencies move their databases into the cloud, SlashDB sees the opportunity to help them quickly turn databases and tables into web interfaces for querying data. Having your API service ready to go, in the environments where your potential customers are already operating is how much of this API stuff will go down in the future. Amazon has set the stage for how we’ll be delivering IT infrastructure over the last decade with the introduction of the cloud, and Google and Microsoft are quickly playing catch up. The savvy API service providers understand their role in this cloud evolution and make sure their services are available as retail solutions, but also as plug and play wholesale solutions in these cloud marketplaces.

SlashDB is clearly serving the deployment and management aspects of the API lifecycle, but I’m also tracking providers of virtualization, testing, monitoring, security, and other aspects of doing business with APIs who are deploying using these cloud marketplaces. I’m also seeing an uptick in the growth of machine learning models being made available via AWS, Azure, and Google, demonstrating that the algorithmic evolution of the API sector will occur in these environments. The algorithmic wave of APIs is just getting started, but publishing APIs from your databases on the leading cloud platforms is standard operating procedure for businesses of all shapes and sizes in 2017. Are your API services available in the AWS or Azure marketplaces?


The Defensive Database Administrator And The Eager Blockchain Believer

Think about the power that database administrators have in your organization’s world. I’ve been working with databases since my first job in 1987. I’ve seen the power bestowed upon database administrators in organization after organization. They are fully aware of the power they control, and most other people in an organization are regularly reminded of this power. The defensive database administrator is always the biggest obstacle in the way of API teams, who are often seen as a threat to the power and budgets that database groups command. This power is why databases are often centralized, scaled vertically, and are the backends to so many web, mobile, desktop, and server applications.

I spend a significant amount of time thinking about the power that database administrators wield, and how we can work to find more constructive, secure, and sensible approaches to shifting legacy database behaviors. Lately, I also find myself thinking a lot more about Blockchain. Not because I’m a believer, but because so many believers are pushing it onto my radar. Blockchain will continue to be a thing, not because it is a thing, but because so many people believe it is a thing. Most blockchains will not withstand the test of time, they are vapor, but the blockchains that remain will do so because people have convinced other people to put something meaningful into their blockchain. Much like we have convinced so many companies, organizations, institutions, and government agencies to put data into databases. Yes we. I’m complicit.

A definition of the blockchain is, “a continuously growing list of records, called blocks, which are linked and secured using cryptography”. It’s a database, linked and secured using cryptography. The reason you hear about the blockchain so much, and how it can revolutionize almost every business sector, is that the blockchain believers want to convince you to put your digital assets into their blockchain, which will eventually make it something real. I can set up a blockchain today, and call it anything I want, but it is nothing more than an empty distributed database. It doesn’t become anything until there is something of value stored in it, which is why there are so many eager folks right now trying to convince you that blockchain is something, so you’ll put your valuable things in there, and it will become something.

Think of blockchain believers as the frontend version of the defensive database administrator. After a blockchain has been up for 20 years, and has a bunch of valuable things stored in it, the blockchain believers will become more like the database administrators. They’ll grow beards (even the women), become more defensive of their precious data stores from whatever the next threat to their power is, and do whatever it takes to defend their power. Blockchain believers are young and energetic, looking to build their empires, while database administrators are usually older and motivated to defend their empires. When you are down in the trenches, trapped within the tractor beam of a database, it is hard to see beyond it. When you are basking in the glow of Internet technology, and everything is new and exciting, it can also be hard to see beyond it. With everything, give it 20 years, and things often become whatever they’ve replaced.


A Simple API With AWS DynamoDB, Lambda, and API Gateway

I’ve setup a few Lambda scripts from time to time, but haven’t had any dedicated project time to push forward API serverless concepts. Over the weekend I had a chance to deploy a couple of APIs using AWS DynamoDB, Lambda, and API Gateway, lighting up some of the serverless API possibilities in my brain. Like most areas of the tech sector, I think the term is dumb, and there is too much hype, but I think underneath there are some interesting possibilities, at least enough to keep me playing around with things.

Right now my primary API setup is an Amazon Aurora (MySQL) backend, with the API deployed on EC2 using the Slim API framework in PHP. It is clean, simple, and gets the job done. I use 3Scale or GitHub for the API management layer. This new approach simplifies some things for me, but definitely goes further down the AWS rabbit hole with the adoption of API Gateway and Lambda. It also introduces enough interesting benefits that I am considering it for use on some specific projects.

Identity and Access Management (IAM) Role

The first thing you need to do to make the whole AWS thing work is setup a role using AWS IAM. I created a role just for this project, and added CloudWatchFullAccess, AmazonDynamoDBFullAccess, and AWSLambdaDynamoDBExecutionRole. I need this role to handle a bunch of management level things with the database, and logging. IAM is one of the missing aspects of hand crafting my APIs, and is why I am considering adopting it on behalf of my customers, to help them get a handle on security.

Simple API Database Backends Using AWS DynamoDB

I am a big fan of relational databases, mostly out of habit and experience. A client of mine is fluent in AWS DynamoDB, which is a simple NoSQL solution, so I felt compelled to ensure the backend database for their APIs spoke DynamoDB. It’s a pretty simple database, so I got to work creating an account table, added a simple JSON object that contained 4 or 5 fields, and fired up an index for the simple accounts database. The databases I’m creating are meant to track aspects of API management, so the tables won’t end up being too large, or have high performance requirements. Regardless, DynamoDB is a perfect backend for APIs, leaving me unsure why I don’t use the platform more often.
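
For illustration, here is roughly what standing up that accounts table looks like with the AWS SDK for JavaScript (v2), with the table name, key, and capacity settings being placeholder assumptions rather than the exact setup:

```javascript
// Hypothetical sketch of the accounts table described above, created with
// the AWS SDK for JavaScript (v2). Table name, key, and capacity are
// assumptions for illustration.
const AWS = require("aws-sdk");
const dynamodb = new AWS.DynamoDB({ region: "us-east-1" });

const params = {
  TableName: "accounts",
  // Only key attributes are declared up front; the handful of other fields
  // in each account item are schemaless attributes on the item itself.
  AttributeDefinitions: [{ AttributeName: "accountId", AttributeType: "S" }],
  KeySchema: [{ AttributeName: "accountId", KeyType: "HASH" }],
  ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
};

dynamodb
  .createTable(params)
  .promise()
  .then((data) => console.log("table created:", data.TableDescription.TableName))
  .catch(console.error);
```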

Using Lambda Functions Behind The API

Instead of firing up an Amazon EC2 instance and hand crafting my API framework, I crafted a handful of serverless scripts in Node.js that will run as independent Lambda functions. I’m going to eventually need a whole bunch of functions, but to get me going with this new API I crafted four separate Lambda functions that I can use to drive the API:

  • searchAccounts - Using the DynamoDB API scan method to query the table.
  • addAccount - Using the DynamoDB API putItem method to add a record to the table.
  • updateAccount - Using the DynamoDB API updateItem method to update a record in the table.
  • deleteAccount - Using the DynamoDB API deleteItem method to delete a record from the table.

Using the AWS SDK, I’m simply making calls to the DynamoDB API to get all the work done. I’m fluent in JavaScript, though not as well versed in Node.js, but it doesn’t take much energy to understand what is going on. The serverless functions are pretty utilitarian, and all that is unique is the DynamoDB method to call, and the JSON that is being sent with each call. It is something that is pretty straightforward, and easily replicated for other APIs. I will keep developing functions for my API, but now I can at least handle the basic CRUD functionality around my new database.
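
As a sketch, here is roughly what the addAccount function might look like, using the AWS SDK for JavaScript and the DynamoDB putItem method, with the table name, field names, and event shape all being placeholder assumptions:

```javascript
// Minimal sketch of one of the four Lambda functions -- addAccount backed
// by DynamoDB putItem. Field names and event shape are assumptions.
const AWS = require("aws-sdk");
const dynamodb = new AWS.DynamoDB();

exports.handler = (event, context, callback) => {
  const params = {
    TableName: "accounts",
    // putItem takes typed attribute values; DocumentClient.put() would
    // accept plain JSON objects instead.
    Item: {
      accountId: { S: event.accountId },
      name: { S: event.name },
      email: { S: event.email },
      created: { S: new Date().toISOString() },
    },
  };

  dynamodb.putItem(params, (err) => {
    if (err) return callback(err);
    // API Gateway's Lambda integration hands this JSON back to the client.
    callback(null, { status: "created", accountId: event.accountId });
  });
};
```

The other three functions follow the same shape, swapping in scan, updateItem, or deleteItem, and the JSON each one expects.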

Publish An API Using AWS API Gateway

The last piece of the puzzle for this story is the API. Each Lambda function accepts and returns JSON, which is technically an API, but there is no management layer, or RESTful infrastructure present. The AWS API Gateway gives me the ability to craft API paths, with accompanying GET, POST, PUT, DELETE, and other methods. For each method I add, I’m given four options for connecting to my backend: making an HTTP call, creating a mock API, leveraging another AWS service, or connecting to a Lambda function. I quickly wire up a GET, POST, PUT, and DELETE to each of my functions, and add my API to an AWS API Gateway plan, requiring API keys, and limiting who can access what.

I now have an accounts API which allows me to add, update, delete, and search for accounts using an API. My data is stored in DynamoDB, and served up via Lambda functions, through the API Gateway. It is secured. It is scalable. I can easily quantify what my database, functions, and gateway resource usage and costs will end up being. I get why folks are interested in serverless. It’s clean. It’s modular. It scales. It is very manageable. I don’t feel like it will be the answer for every API I need to deploy, but it does make sense for quickly deploying APIs for customers who are open to AWS, and need things to be secure, highly performant, and scalable.

A serverless approach definitely takes the sysadmin load off a little bit. Especially when you depend on DynamoDB for the backend. DynamoDB, Lambda, and API Gateway offer a pretty nice stack that can auto tune and scale itself. I’m going to fire up five separate APIs using this new approach, and setup some monitoring and testing to see how it delivers, and maybe get a handle on the costs associated with operating an API like this. I still need to attach a custom domain, get a handle on logging with AWS CloudWatch, and some of the other aspects of API management using AWS API Gateway. However, it provides me with a nice look into the serverless world, and how I can use it to deploy, and manage APIs, but also use APIs to manage a serverless approach by publishing functions using the Lambda API, keeping things in tune with my API definitions stored on Github.


The Tractor Beam Of The Database In An API World

I’m an old database person. I’ve been working with databases since my first job in 1987. Cobol. FoxPro. SQL Server. MySQL. I have had a production database in my charge accessible via the web since 1998. I understand how databases are the center of gravity when it comes to data. Something that hasn’t changed in an API driven world. This is something that will make microservices in a containerized landscape much harder than some developers will want to admit. The tractor beam of the database will not give up control to data so easily, either because of technical limitations, business constraints, or political gravity.

Databases are all about the storage of and access to data. APIs are about access to data. Storage, and the control that surrounds it, is what creates the tractor beam. Most of the reasons for control over the storage of data are not looking to do harm. Security. Privacy. Value. Quality. Availability. There are many reasons stewards of data want to control who can access data, and what they can do with it. However, once control over data is established, I find it often morphs and evolves in ways that can eventually become harmful to meaningful and beneficial access to data. That access is usually the goal behind doing APIs, but it is often seen as a threat to the mission of data stewards, and results in a tractor beam that API related projects will find themselves caught up in, and difficult to ever break free of.

The most obvious representation of this tractor beam is that all data retrieved via an API usually comes from a central database. Also, all data generated or posted via an API ends up within a database. The central database always has an appetite for more data, whether scaled horizontally or vertically. Next, it is always difficult to break off subsets of data into separate API-driven projects, or prevent newly established ones from being pulled in, and made part of existing database operations. Whether due to technical, business, or political reasons, many projects born outside this tractor beam will eventually be pulled into the orbit of legacy data operations. Keeping projects decoupled will always be difficult when your central database has so much pull when it comes to how data is stored and accessed. This isn’t just a technical decoupling, it is a cultural one, which will be much more difficult to break from.

Honestly, if your database is over 2-3 years old, and enjoys any amount of complexity, budget scope, and dependency across your organization, I doubt you’ll ever be able to decouple it. I see folks creating these new data lakes, which act as reservoirs for any and all types of data gathered and generated across operations. These lakes provide valuable opportunities for API innovators to potentially develop new and interesting ways of putting data to work, if they possess an API layer. However, I still think the massive data warehouse and database will look to consume and integrate anything structured and meaningful that evolves on their shores. Industrial grade data operations will just industrialize any smaller utilities that emerge along the fringes of large organizations. Power structures have long developed around central data stores, and no amount of decoupling, decentralizing, or blockchaining will change this any time soon. You can see this with the cloud, which was meant to disrupt this, when it just moved things from your data center to someone else’s, and allowed them to grow at a faster rate.

I feel like we API folks have been granted ODBC and JDBC leases for our API plantations, but rarely will we ever decouple ourselves from the mother ship. No matter what the technology whispers in our ears about what is possible, the business value of, and political control over, established databases will always dictate what is possible and what is not possible. I feel like this is one reason all the big database platforms have waited so long to provide native API features, and why next generation data streaming solutions rarely have simple, intuitive API layers. I think we will continue to see the tractor beam of database culture be aggressive, as well as passive aggressive, toward anything API, trumping the access possibilities brought to the table by APIs with outdated power and control beliefs rooted in how we store and control our data. These folks rarely understand they can be just as controlling and greedy with APIs, but they seem unable to get over the promises of access that APIs afford, and refuse to play along at all when it comes to turning down the volume on the tractor beam so anything can flourish.


Provide An Open Source Threat Information Database And API Then Sell Premium Data Subscriptions

I was doing some API security research and stumbled across vFeed, a “Correlated Vulnerability and Threat Intelligence Database Wrapper”, providing a JSON API of vulnerabilities from the vFeed database. The approach is a Python API, not a web API, but I think it provides an interesting blueprint for open source APIs. What I found (somewhat) interesting about the vFeed approach was the fact that they provide an open source API and database, but if you want a production version of the database with all the threat intelligence, you have to pay for it.

I would say their technical and business approach needs a significant amount of work, but I think there is a workable version of it in there. First, I would create Python, PHP, Node.js, Java, Go, and Ruby versions of the API, making sure it is a web API. Next, remove the production restriction on the database, allowing anyone to deploy a working edition, just minus all the threat data. There is a lot of value in there being an open source set of threat intelligence sharing databases and APIs. Then after that, get smarter about having a variety of different free and paid data subscriptions, not just a single database–leverage the API presence.

You could also get smarter about how the database and API enables companies to share their threat data, plugging it into a larger network, making some of it free, and some of it paid–with revenue share all around. There should be a suite of open source threat information sharing databases and APIs, and a federated network of API implementations. Complete with a wealth of open data for folks to tap into and learn from, but also with some revenue generating opportunities throughout the long tail, helping companies fund aspects of their API security operations. Budget shortfalls are a big contributor to security incidents, and some revenue generating activity would be positive.

So, not a perfect model, but enough food for thought to warrant a half-assed blog post like this. Smells like an opportunity for someone out there. Threat information sharing is just one dimension of my API security research where I’m looking to evolve the narrative around how APIs can contribute to security in general. However, there is also an opportunity for enabling the sharing of API related security information, using APIs. Maybe also generating revenue along the way, helping feed the development of tooling like this, maybe funding individual implementations and threat information nodes, or possibly even funding more storytelling around the concept of API security as well. ;-)


Looking At The 37 Apache Data Projects

I’m spending time investing in my data, as well as my database API research. I’ll have guides, with accompanying stories coming out over the next couple weeks, but I want to take a moment to publish some of the raw research that I think paints an interesting picture about where things are headed.

When studying what is going on with data and APIs you can’t do any search without stumbling across an Apache project doing something or other with data. I found 37 separate projects at Apache that were data related, and wanted to publish them as a single list I could learn from.

  • Airavata - Apache Airavata is a micro-service architecture based software framework for executing and managing computational jobs and workflows on distributed computing resources including local clusters, supercomputers, national grids, academic and commercial clouds. Airavata is dominantly used to build Web-based science gateways and assist to compose, manage, execute, and monitor large scale applications (wrapped as Web services) and workflows composed of these services.
  • Ambari - Apache Ambari makes Hadoop cluster provisioning, managing, and monitoring dead simple.
  • Apex - Apache Apex is a unified platform for big data stream and batch processing. Use cases include ingestion, ETL, real-time analytics, alerts and real-time actions. Apex is a Hadoop-native YARN implementation and uses HDFS by default. It simplifies development and productization of Hadoop applications by reducing time to market. Key features include Enterprise Grade Operability with Fault Tolerance, State Management, Event Processing Guarantees, No Data Loss, In-memory Performance & Scalability and Native Window Support.
  • Avro - Apache Avro is a data serialization system.
  • Beam - Apache Beam is a unified programming model for both batch and streaming data processing, enabling efficient execution across diverse distributed execution engines and providing extensibility points for connecting to different technologies and user communities.
  • Bigtop - Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem. The primary goal of Bigtop is to build a community around the packaging and interoperability testing of Hadoop-related projects. This includes testing at various levels (packaging, platform, runtime, upgrade, etc…) developed by a community with a focus on the system as a whole, rather than individual projects. In short we strive to be for Hadoop what Debian is to Linux.
  • BookKeeper - BookKeeper is a reliable replicated log service. It can be used to turn any standalone service into a highly available replicated service. BookKeeper is highly available (no single point of failure), and scales horizontally as more storage nodes are added.
  • Calcite - Calcite is a framework for writing data management systems. It converts queries, represented in relational algebra, into an efficient executable form using pluggable query transformation rules. There is an optional SQL parser and JDBC driver. Calcite does not store data or have a preferred execution engine. Data formats, execution algorithms, planning rules, operator types, metadata, and cost model are added at runtime as plugins.
  • CouchDB - Apache CouchDB is a database that completely embraces the web. Store your data with JSON documents. Access your documents with your web browser, via HTTP. Query, combine, and transform your documents with JavaScript. Apache CouchDB works well with modern web and mobile apps. You can even serve web apps directly out of Apache CouchDB. And you can distribute your data, or your apps, efficiently using Apache CouchDB’s incremental replication. Apache CouchDB supports master-master setups with automatic conflict detection.
  • Crunch - The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.
  • DataFu - Apache DataFu consists of two libraries: Apache DataFu Pig is a collection of useful user-defined functions for data analysis in Apache Pig. Apache DataFu Hourglass is a library for incrementally processing data using Apache Hadoop MapReduce. This library was inspired by the prevalence of sliding window computations over daily tracking data. Computations such as these typically happen at regular intervals (e.g. daily, weekly), and therefore the sliding nature of the computations means that much of the work is unnecessarily repeated. DataFu’s Hourglass was created to make these computations more efficient, yielding sometimes 50-95% reductions in computational resources.
  • Drill - Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google’s Dremel.
  • Edgent - Apache Edgent is a programming model and micro-kernel style runtime that can be embedded in gateways and small footprint edge devices enabling local, real-time, analytics on the continuous streams of data coming from equipment, vehicles, systems, appliances, devices and sensors of all kinds (for example, Raspberry Pis or smart phones). Working in conjunction with centralized analytic systems, Apache Edgent provides efficient and timely analytics across the whole IoT ecosystem: from the center to the edge.
  • Falcon - Apache Falcon is a data processing and management solution for Hadoop designed for data motion, coordination of data pipelines, lifecycle management, and data discovery. Falcon enables end consumers to quickly onboard their data and its associated processing and management tasks on Hadoop clusters.
  • Flink - Flink is an open source system for expressive, declarative, fast, and efficient data analysis. It combines the scalability and programming flexibility of distributed MapReduce-like platforms with the efficiency, out-of-core execution, and query optimization capabilities found in parallel databases.
  • Flume - Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store
  • Giraph - Apache Giraph is an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections.
  • Hama - The Apache Hama is an efficient and scalable general-purpose BSP computing engine which can be used to speed up a large variety of compute-intensive analytics applications.
  • Helix - Apache Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix automates reassignment of resources in the face of node failure and recovery, cluster expansion, and reconfiguration.
  • Ignite - Apache Ignite In-Memory Data Fabric is designed to deliver uncompromised performance for a wide set of in-memory computing use cases from high performance computing, to the industry most advanced data grid, in-memory SQL, in-memory file system, streaming, and more.
  • Kafka - A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers. Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees. Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
  • Knox - The Apache Knox Gateway is a REST API Gateway for interacting with Hadoop clusters. The Knox Gateway provides a single access point for all REST interactions with Hadoop clusters. In this capacity, the Knox Gateway is able to provide valuable functionality to aid in the control, integration, monitoring and automation of critical administrative and analytical needs of the enterprise.
  • Lens - Lens provides an Unified Analytics interface. Lens aims to cut the Data Analytics silos by providing a single view of data across multiple tiered data stores and optimal execution environment for the analytical query. It seamlessly integrates Hadoop with traditional data warehouses to appear like one.
  • MetaModel - With MetaModel you get a uniform connector and query API to many very different datastore types, including: Relational (JDBC) databases, CSV files, Excel spreadsheets, XML files, JSON files, Fixed width files, MongoDB, Apache CouchDB, Apache HBase, Apache Cassandra, ElasticSearch, OpenOffice.org databases, Salesforce.com, SugarCRM and even collections of plain old Java objects (POJOs). MetaModel isn’t a data mapping framework. Instead we emphasize abstraction of metadata and ability to add data sources at runtime, making MetaModel great for generic data processing applications, less so for applications modeled around a particular domain.
  • Oozie - Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).
  • ORC - ORC is a self-describing type-aware columnar file format designed for Hadoop workloads. It is optimized for large streaming reads, but with integrated support for finding required rows quickly. Storing data in a columnar format lets the reader read, decompress, and process only the values that are required for the current query.
  • Parquet - Apache Parquet is a general-purpose columnar storage format, built for Hadoop, usable with any choice of data processing framework, data model, or programming language.
  • Phoenix - Apache Phoenix enables OLTP and operational analytics for Apache Hadoop by providing a relational database layer leveraging Apache HBase as its backing store. It includes integration with Apache Spark, Pig, Flume, Map Reduce, and other products in the Hadoop ecosystem. It is accessed as a JDBC driver and enables querying, updating, and managing HBase tables through standard SQL.
  • REEF - Apache REEF (Retainable Evaluator Execution Framework) is a development framework that provides a control-plane for scheduling and coordinating task-level (data-plane) work on cluster resources obtained from a Resource Manager. REEF provides mechanisms that facilitate resource reuse for data caching, and state management abstractions that greatly ease the development of elastic data processing workflows on cloud platforms that support a Resource Manager service.
  • Samza - Apache Samza provides a system for processing stream data from publish-subscribe systems such as Apache Kafka. The developer writes a stream processing task, and executes it as a Samza job. Samza then routes messages between stream processing tasks and the publish-subscribe systems that the messages are addressed to.
  • Spark - Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala and Python as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
  • Sqoop - Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
  • Storm - Apache Storm is a distributed real-time computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing real-time computation.
  • Tajo - The main goal of Apache Tajo project is to build an advanced open source data warehouse system in Hadoop for processing web-scale data sets. Basically, Tajo provides SQL standard as a query language. Tajo is designed for both interactive and batch queries on data sets stored on HDFS and other data sources. Without hurting query response times, Tajo provides fault-tolerance and dynamic load balancing which are necessary for long-running queries. Tajo employs a cost-based and progressive query optimization techniques for optimizing running queries in order to avoid the worst query plans.
  • Tez - Apache Tez is an effort to develop a generic application framework which can be used to process arbitrarily complex directed-acyclic graphs (DAGs) of data-processing tasks and also a reusable set of data-processing primitives which can be used by other projects.
  • VXQuery - Apache VXQuery will be a standards compliant XML Query processor implemented in Java. The focus is on the evaluation of queries on large amounts of XML data. Specifically the goal is to evaluate queries on large collections of relatively small XML documents. To achieve this queries will be evaluated on a cluster of shared nothing machines.
  • Zeppelin - Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects.

There is a serious amount of overlap between these projects. Not all of these projects have web APIs, while some of them are all about delivering a gateway or aggregate API across projects. There is a lot to process here, but I think listing them out provides an easier way to understand the big data explosion of projects over at Apache.

It is tough to understand what each of these does without actually playing with them, but that is something I just don't have the time to do, so next up I'll be doing independent searches for these project names, and finding stories from across the space regarding what folks are doing with these data solutions. That should give me enough to go on when putting them into specific buckets, and finding their place in my data and database API research.


Database To Database Then API, Instead Of Directly To API

I am working with a team to expose a database as an API. With projects like this there can be a lot of anxiety about exposing a database directly as an API. Security is the concern cited first, but in my experience, most of the time security is just cover for anxiety about a messy backend. The group I'm working with has been managing the same database for over a decade, adding on clients, and making the magic happen via a whole bunch of database and table kung fu. Keeping this monster up and running has been priority number one, and evolving, decentralizing, or decoupling has never quite made the list.

The database team has learned the hard way, and they have the resources to keep things up and running, but never seem to have the resources when it comes to refactoring and thinking differently, let alone tackling the delivery of a web API on top of things. There will need to be a significant amount of education and training around REST, and doing APIs properly, before we can move forward, and there really isn't a lot of time or interest in doing that. To help bridge the gap I am suggesting that we do an entirely new API, with its own database, and focus on database to database communication, since that is what the team knows. We can launch an Amazon RDS instance, with an EC2 instance running the API, and the database team can work directly with RDS (MySQL), which they are already familiar with.
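To make the idea a little more concrete, here is a minimal sketch of what the database to database sync could look like, assuming both sides are MySQL and using a hypothetical services table. The connection details and column names are placeholders, not the actual project.

    <?php
    // A rough sketch of a database to database sync job, assuming both the legacy
    // database and the new Amazon RDS instance are MySQL. The DSNs, credentials,
    // and the services table are all hypothetical placeholders.
    $legacy = new PDO('mysql:host=legacy.internal;dbname=legacy_db', 'user', 'pass');
    $rds    = new PDO('mysql:host=api-db.rds.amazonaws.com;dbname=api_db', 'user', 'pass');

    // Pull only the columns the new API actually needs from the legacy system.
    $rows = $legacy->query('SELECT id, name, description, updated_at FROM services');

    // Upsert each record into the clean, API-facing database running on RDS.
    $upsert = $rds->prepare(
        'INSERT INTO services (id, name, description, updated_at)
         VALUES (:id, :name, :description, :updated_at)
         ON DUPLICATE KEY UPDATE name = VALUES(name),
           description = VALUES(description), updated_at = VALUES(updated_at)'
    );

    foreach ($rows as $row) {
        $upsert->execute([
            ':id'          => $row['id'],
            ':name'        => $row['name'],
            ':description' => $row['description'],
            ':updated_at'  => $row['updated_at'],
        ]);
    }

A cron job running something like this on a schedule keeps the new API's database fresh, without the API ever touching the legacy system.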

We can have a dedicated API team handle the new API and database, and the existing team can handle the syncing from database to database. This also keeps the messy, aggregate, overworked database out of reach of the new API. We get an API. The database team's anxiety levels are lowered. It balances things out a little. Sure, there will still be some work between databases, but the API can be a fresh start, and it won't be burdened by the legacy. The database to database connection can carry this load. Maybe once this pilot is done, the database team will feel a little better about doing APIs, and be a little more involved with the next one.

I am going to pitch this approach in the coming weeks. I'm not sure if it will be well received, but I'm hoping it will help bridge the new to the old a little bit. I know the database team likes to keep things centralized, which is one reason they have this legacy beast, so there might be some more selling to do on that front. Doing APIs isn't always about the technical. It is often about the politics of how things get done on the ground. Many organizations have messy databases, which they worry will make them look bad when any of it is exposed as an API. I get it, we are all self-conscious about the way our backends look. However, sometimes we still need to find ways to move things forward, and find compromise. I hope this database to database, then API approach does the trick.


Each Airtable Datastore Comes With Complete API and Developer Portal

I see a lot of tools come across my desk each week, and I have to be honest, I don't always fully get what they are and what they do. There are many reasons why I overlook interesting applications, but the most common reason is that I'm too busy and do not have the time to fully play with a solution. One application I've been keeping an eye on as part of my work is Airtable, which, to be honest, I didn't get at first, or really I just didn't notice because I was too busy.

Airtable is part spreadsheet, part database, operating as a simple, easy to use web application that, with the push of a button, can publish an API. You don't just get an API by default with each Airtable, you get a pretty robust developer portal for your API, complete with good looking API documentation, allowing you to go from an Airtable (spreadsheet / database) to API and documentation with no coding necessary. Trust me. Try it out. Anyone can create an Airtable and publish an API that any developer can visit and quickly understand what is going on.
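To give a sense of what developers get on the other end, here is a minimal sketch of calling one of these generated APIs, assuming a hypothetical base id, table name, and API key pulled from the developer portal Airtable gives you.

    <?php
    // A rough sketch of reading records from an Airtable-generated API.
    // The base id, table name, and API key below are placeholders.
    $baseId = 'appXXXXXXXXXXXXXX';
    $table  = 'Contacts';
    $apiKey = getenv('AIRTABLE_API_KEY');

    $ch = curl_init('https://api.airtable.com/v0/' . $baseId . '/' . rawurlencode($table));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['Authorization: Bearer ' . $apiKey]);

    $response = curl_exec($ch);
    curl_close($ch);

    // Each record comes back with an id and a fields object mirroring your columns.
    $records = json_decode($response, true)['records'] ?? [];
    foreach ($records as $record) {
        print_r($record['fields']);
    }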

As a developer, API deployment still feels like it can be a lot of work. Then, once I take off my programmer's hat, and put on my business user hat, I see that there are some very easy to use solutions like Airtable available to me. Knowing how to code is almost slowing me down when it comes to API deployment. Sure, the APIs that Airtable publishes aren't the perfectly designed, artisanally crafted APIs I make with my bare hands, but they work just as well as mine. Most importantly, they get business done. No coding necessary. Something that anyone can do without the burden of programming.

Airtable provides another solution that I can recommend my readers and clients consider using when managing their data, which will also allow them to easily deploy an API for developers to build applications against. I also notice that Airtable has a whole API integration part of their platform, which allows you to integrate your Airtables into other APIs–something I will have to write about separately in a future post. I just wanted to make sure and take the time to properly add Airtable to my research, and write a story about them so that they are in my brain, available for recall when people are asking me for easy to use solutions that will help them deploy an API.


Bringing The API Deployment Landscape Into Focus

I am finally getting the time to invest more into the rest of my API industry guides, which involves deep dives into core areas of my research like API definitions, design, and now deployment. The outline for my API deployment research has begun to come into focus and looks like it will rival my API management research in size.

With this release, I am looking to help onboard some of my less technical readers with API deployment. Not the technical details, but the big picture, so I wanted to start with some simple questions to help prime the discussion around API deployment:

  • Where? - Where are APIs being deployed? On-premise, in the cloud, via traditional website hosting, and even through containerized and serverless API deployment.
  • How? - What technologies are being used to deploy APIs? From spreadsheets, document and file stores, or the central database, to thinking smaller with microservices, containers, and serverless.
  • Who? - Who will be doing the deployment? Of course, IT and developer groups will be leading the charge, but increasingly business users are leveraging new solutions to play a significant role in how APIs are deployed.

The Role Of API Definitions

While not every deployment will be auto-generated using an API definition like OpenAPI, API definitions are increasingly playing a lead role as the contract that doesn't just deploy an API, but sets the stage for API documentation, testing, monitoring, and a number of other stops along the API lifecycle. I want to make sure to point out in my API deployment research that API definitions aren't just overlapping with deploying APIs, they are essential to connect API deployments with the rest of the API lifecycle.

Using Open Source Frameworks

Early on in this research guide I am focusing on the most common way for developers to deploy an API: using an open source API framework. This is how I deploy my APIs, and there are an increasing number of open source API frameworks available out there, in a variety of programming languages. In this round I am taking the time to highlight at least six separate frameworks in the top programming languages where I am seeing sustained deployment of APIs using a framework. I don't take a stance on any single API framework, but I do keep an eye on which ones are still active, and enjoying usage by developers.
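As a point of reference, here is about as small as a framework-based API gets, sketched with the Slim PHP framework that I personally reach for. The /services path and the hard-coded payload are just hypothetical stand-ins.

    <?php
    // A minimal Slim (PHP) API sketch. The /services path and its payload are
    // hypothetical, just to show how little code a framework-based API needs.
    require 'vendor/autoload.php';

    $app = new \Slim\App();

    // A single read-only endpoint returning a JSON list of services.
    $app->get('/services', function ($request, $response) {
        $services = [
            ['id' => 1, 'name' => 'Food Pantry'],
            ['id' => 2, 'name' => 'Job Training'],
        ];
        return $response->withJson($services);
    });

    $app->run();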

Deployment In The Cloud

After frameworks, I am making sure to highlight some of the leading approaches to deploying APIs in the cloud, going beyond just a server and framework, and leveraging the next generation of API deployment service providers. I want to make sure that both developers and business users know that there are a growing number of service providers who are willing to assist with deployment, and with some of them, no coding is even necessary. While I still like hand-rolling my APIs using my preferred framework, when it comes to simpler, more utility APIs, I prefer offloading the heavy lifting to a cloud service, saving me the time of getting my hands dirty.

Essential Ingredients for Deployment

Whether in the cloud, on-premise, or even on devices and the network, there are some essential ingredients to deploying APIs. In my API deployment guide I wanted to make sure and spend some time focusing on the essential ingredients every API provider will have to think about:

  • Compute - The base ingredient for any API, providing the compute under the hood. Whether it's bare metal, cloud instances, or serverless, you will need a consistent compute strategy to deploy APIs at any scale.
  • Storage - Next, I want to make sure my readers are thinking about a comprehensive storage strategy that spans all API operations, and hopefully multiple locations and providers.
  • DNS - Then I spend some time focusing on the frontline of API deployment–DNS. In today's online environment DNS is more than just addressing for APIs, it is also security.
  • Encryption - I also make sure encryption is baked into all API deployment by default, both in transit and in storage.

Some Of The Motivations Behind Deploying APIs

In previous API deployment guides I usually just listed the services, tools, and other resources I had been aggregating as part of my monitoring of the API space. Slowly I have begun to organize these into a variety of buckets that help speak to many of the motivations I encounter when it comes to deploying APIs. While not a perfect way to look at API deployment, it helps me think about the many reasons people are deploying APIs, craft a narrative, and provide a guide for others to follow that is potentially aligned with their own motivations.

  • Geographic - Thinking about the increasing pressure to deploy APIs in specific geographic regions, leveraging the expansion of the leading cloud providers.
  • Virtualization - Considering the fact that not all APIs are meant for production and there is a lot to be learned when it comes to mocking and virtualizing APIs.
  • Data - Looking at the simplest of Create, Read, Update, and Delete (CRUD) APIs, and how data is being made more accessible by deploying APIs.
  • Database - Also looking at how APIs are being deployed from relational, NoSQL, and other data sources–providing the most common way for APIs to be deployed.
  • Spreadsheet - I wanted to make sure and not overlook the ability to deploy APIs directly from a spreadsheet, putting APIs within reach of business users.
  • Search - Looking at how document and content stores are being indexed and made searchable, browsable, and accessible using APIs.
  • Scraping - Another often overlooked way of deploying an API, from the scraped content of other sites–an approach that is alive and well.
  • Proxy - Evolving beyond early gateways, using a proxy is still a valid way to deploy an API from existing services.
  • Rogue - I also wanted to think more about some of the rogue API deployments I’ve seen out there, where passionate developers reverse engineer mobile apps to deploy a rogue API.
  • Microservices - Microservices has provided an interesting motivation for deploying APIs–one that potentially can provide small, very useful and focused API deployments.
  • Containers - One of the evolutions in compute that has helped drive the microservices conversation is the containerization of everything, something that complements the world of APIs very well.
  • Serverless - Augmenting the microservices and container conversation, serverless is motivating many to think differently about how APIs are being deployed.
  • Real Time - Thinking briefly about real time approaches to APIs, something I will be expanding on in future releases, and thinking more about HTTP/2 and evented approaches to API deployment.
  • Devices - Considering how APIs are being deployed on devices, when it comes to Internet of Things and industrial deployments, as well as at the network level.
  • Marketplaces - Thinking about the role API marketplaces like Mashape (now RapidAPI) play in the decision to deploy APIs, and how other cloud providers like AWS, Google, and Azure will play in this discussion.
  • Webhooks - Thinking of API deployment as a two way street. Adding webhooks into the discussion and making sure we are thinking about how webhooks can alleviate the load on APIs, and push data and content to external locations.
  • Orchestration - Considering the impact of continuous integration and deployment on API deployment specifically, and looking at it through the lens of the API lifecycle.

I feel like API deployment is still all over the place. The mandate for API management was much better articulated by API service providers like Mashery, 3Scale, and Apigee, and nobody has taken the lead in the same way when it comes to API deployment. Service providers like DreamFactory and Restlet have kicked ass when it comes to not just API management, but making sure API deployment is also part of the puzzle. Newer API service providers like Tyk are also pushing the envelope, but I still don't have the number of API deployment providers I'd like when it comes to referring my readers. It isn't a coincidence that DreamFactory, Restlet, and Tyk are API Evangelist partners, it is because they have the services I want to be able to recommend to my readers.

This is the first time I have felt like my API deployment research has been in any sort of focus. I carved this layer off of my API management research some years ago, but I really couldn't articulate it very well beyond just open source frameworks, and the emerging cloud service providers. After I publish this edition of my API deployment guide I'm going to spend some time in the 17 areas of my research listed above. All these areas are heavily focused on API deployment, but I also think they are all worth looking at individually, so that I can better understand where they also intersect with other areas like management, testing, monitoring, security, and other stops along the API lifecycle.


Google Spanner Is A Database With An API Core

I saw the news that Google's Spanner database is ready for prime time, and I wanted to connect it with a note I took at the Google Analyst Summit a few months back–that gRPC is the heart of the database solution. I'm not intimate with the Spanner architecture, approach, or codebase yet, but the API focus, with both a gRPC core and REST APIs for a database platform, is very interesting.

My first programming job was in 1987, developing COBOL databases. I’ve watched the database world evolve, contributing to my interest in APIs, and I have to say Google Spanner isn’t something I anticipated. Databases have always been where you start deploying an API, but Spanner feels like something new, where the database and the API are one, and the way the database does everything internally and externally is done via APIs (gRPC).

Now that the Spanner database is ready for prime time, I will invest some more time in standing up an instance of it and get to work playing with what is possible with the REST APIs. I also want to push forward my gRPC education by hacking on this side of the database's interface. Spanner feels like a pretty seismic shift in how we do APIs, and how we do them at scale–when you combine this with the elasticity of the cloud, and the simplicity of RESTful interfaces, I think there is a lot of potential.
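For a sense of what that looks like from the developer side, here is a minimal sketch using Google's PHP client library for Spanner, which rides on gRPC under the hood. The project, instance, database, and table names are all hypothetical placeholders, not a real deployment.

    <?php
    // A rough sketch of querying Cloud Spanner from PHP via Google's client
    // library (gRPC under the hood). All identifiers below are placeholders.
    require 'vendor/autoload.php';

    use Google\Cloud\Spanner\SpannerClient;

    $spanner  = new SpannerClient(['projectId' => 'my-project']);
    $database = $spanner->connect('my-instance', 'my-database');

    // Standard SQL against a strongly consistent, horizontally scalable store.
    $results = $database->execute('SELECT id, name FROM services LIMIT 10');

    foreach ($results->rows() as $row) {
        echo $row['id'] . ' - ' . $row['name'] . PHP_EOL;
    }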


Your API Should Reflect A Business Objective Not A Backend System

I'm in the middle of evolving a data schema to be a living breathing API. I just finished generating 130 paths, all with the same names as the schema tables and their fields. It's a natural beginning to any data-centric API. In these situations, it is easy for us to allow the backend system to dictate our approach to API design, rather than considering how the API will actually be used.

I'm taking the Human Service Data Specification (HSDS) schema, and generating the 130 create, read, update, and delete (CRUD) API paths I need for the API. This allows the organizations, locations, services, and other details of any human service implementation to be managed in a very database-driven way. This makes sense to my database administrator brain, but as I sit in a room full of implementors I'm reminded that none of this matters if it isn't serving an actual business objective.

If my API endpoints don't allow a help desk technician to properly search for a service, or a website user to browse the possibilities and find what they are looking for, my API means nothing. The CRUD is the easy part. Understanding the many different ways my API paths will (or won't) help someone find the services they need, or assist a human service organization in better reaching their audience, is what the API(s) are all about, not just simply reflecting the backend system, and then walking away calling the job done.
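To show the difference in code, here is a hypothetical sketch using the Slim PHP framework: the first route reflects the actual business objective of helping someone find a service, while the second simply mirrors a database table. The paths, parameters, and payloads are placeholders, not the actual HSDS API.

    <?php
    // A hypothetical contrast between a business-driven path and a backend-driven
    // path, sketched with the Slim PHP framework. Nothing here is the real API.
    require 'vendor/autoload.php';

    $app = new \Slim\App();

    // Business-driven: a path named after the job to be done, finding services.
    $app->get('/services/search', function ($request, $response) {
        $query    = $request->getQueryParam('query');
        $location = $request->getQueryParam('location');
        // ...full text and geo search across services, locations, and organizations...
        return $response->withJson(['query' => $query, 'location' => $location, 'results' => []]);
    });

    // Backend-driven: a raw CRUD path named after the underlying database table.
    $app->get('/services/{id}', function ($request, $response, $args) {
        // ...fetch a single row from the services table by primary key...
        return $response->withJson(['id' => $args['id']]);
    });

    $app->run();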


I Am Keeping My Mind Open And Looking Forward To Learning More About GraphQL

I wrote a post the other day sharing my thoughts around GraphQL seeming like we were avoiding the hard work of API design. Shortly after publishing, Sashko Stubailo (@stubailo) from Apollo, a GraphQL solution provider, wrote a very thoughtful response to my comments and questions about GraphQL. First I wanted to say that I really dig this approach to responding to other people's blog posts, with a blog post of your own, within your own personal or company domain.

I don't think Sashko has convinced me 100% that GraphQL is the solution we are looking for, but he has convinced me that I should be learning more about it, keeping a closer eye on the technology, and better understand how people are putting it to use.

As for my primary question about how GraphQL could benefit non-technical folks and end-users, I would say he answered it 50% of the way:

I’m a frontend developer. GraphQL makes my life easy.

It doesn't touch on whether or not non-technical users will be able to reverse engineer it and put it to work for them, but that's ok for now. One thing that Sashko touched on for me, that moves GraphQL closer to being simple enough for non-technical users, is that he helped differentiate GraphQL from SQL:

GraphQL is a query language, like SQL? While GraphQL looks like a query language at first, I think its name might be one of the things that gets people off on the wrong foot. GraphQL is not at all like SQL...So GraphQL is a “query language” just like URLs are the “query language” of REST—it’s a contract that describes how to tell the API server what you’re looking for.

I like the separation from "structured" query language, and moving us to something that augments HTTP and the URL, and doesn't just tunnel into the backend database. This has the potential to move REST forward, not drag the database out front for me--which leaves me more hopeful.

Another area Sashko answered for me was regarding GraphQL seeming like it was too hard:

This is a very real concern whenever a new technology is introduced. Is this going to make stuff more complicated for everyone who isn’t in the know? 

Fair enough. Makes me happy to hear this from a service provider who is leading the charge when it comes to GraphQL. His stance seems pragmatic, and aware of the importance of GraphQL being accessible to as wide an audience as possible--building on the momentum REST has in this area.

What really pushed away my concern, and got me more interested in paying more attention to GraphQL was when Sashko talked about this just being the beginning:

The most exciting thing to me is that GraphQL has been publicly available for barely more than a year, and already a huge number of people I respect for their technical abilities and design thinking are trying to figure out ways to add it to their architecture. 

Ok. Now I want to see where this goes. I've been tracking on GraphQL as a subset of my data API research, but will spend some extra cycles each week keeping an eye on who is doing anything interesting with GraphQL. I've added Apollo to my research, and I will work on a roundup of other providers, and any open source tooling I can find out there. I also wanted to thank Sashko for taking the time to answer some of my questions, and respond to my uncertainty around GraphQL. I dig it when API service providers and API providers respond to my storytelling on API Evangelist--it makes for good conversation in the API community.


Create a PHP Script To Generate An OpenAPI Specification From The ClinicalTrials.Gov Database I Created

One of my objectives around importing the ClinicalTrials.gov data is to create an API. The first step in creating an API, before we ever get to programming anything, is to create an OpenAPI Spec to use as a version 1.0 scaffolding for the clinical trials data we now have stored in a MySQL database.

I sure wasn't going to be hand crafting an OpenAPI Spec for this fairly large data set, so I got to work creating a crude PHP script that would do the heavy lifting for me.
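A minimal sketch of that kind of script, assuming a local MySQL copy of the clinical trials database, looks something like this (the connection details are placeholders, and a real version would map column types and add descriptions rather than defaulting everything to strings):

    <?php
    // A crude sketch: loop through every table in a MySQL database and build a
    // Swagger 2.0 (OpenAPI Spec) definition with a schema and GET path per table.
    // Connection details are placeholders.
    $db = new PDO('mysql:host=localhost;dbname=clinical_trials', 'user', 'pass');

    $openapi = [
        'swagger'     => '2.0',
        'info'        => ['title' => 'Clinical Trials API', 'version' => '1.0'],
        'paths'       => [],
        'definitions' => [],
    ];

    foreach ($db->query('SHOW TABLES')->fetchAll(PDO::FETCH_COLUMN) as $table) {
        // Build a JSON schema definition from the table's columns.
        $properties = [];
        foreach ($db->query("DESCRIBE `{$table}`") as $column) {
            $properties[$column['Field']] = ['type' => 'string'];
        }
        $openapi['definitions'][$table] = ['properties' => $properties];

        // Add a basic GET path for each table, pointing at its schema.
        $openapi['paths']['/' . $table . '/'] = [
            'get' => [
                'summary'   => 'Returns ' . $table . ' records',
                'responses' => [
                    '200' => [
                        'description' => 'Successful response',
                        'schema'      => ['type' => 'array', 'items' => ['$ref' => '#/definitions/' . $table]],
                    ],
                ],
            ],
        ];
    }

    file_put_contents('clinical-trials-openapi.json', json_encode($openapi, JSON_PRETTY_PRINT));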

This script loops through all the tables in my clinical trials database, and auto generates the necessary JSON schema for the data structure, combined with an OpenAPI Spec describing the API interface for the clinical trials database. I have the result in a single OpenAPI Spec file, but will most likely be breaking it up to make it easier to work with.

This OpenAPI Spec for the clinical trials API gives me a base blueprint I can use to generate server side code, client side code, documentation, and other essential building blocks of the API operations I put in place to support accessing the clinical trials API.

I will be adding better descriptions for paths, parameters, schema, and other elements of this in the future, with this definition acting as the contract for the clinical trials API.


We Need An API For The Chronology of Data Breaches Database

I came across the Privacy Rights Clearinghouse, while conducting a search that turned up the chronology of data breaches, which provides details on 4,725 data breaches that have been made public since 2005. The website allows you to search for data breaches by type of breach, type of organization, and the year in which it occurred--some very valuable information.

In 2016, as breaches continue to be commonplace across almost all industries, we are going to need to take the chronology of data breaches up a notch. I would like to see an API made available for this valuable database. As I do in these situations, I write stories about what I'd like to see in the space, forward the link to key actors, and tell the story to the public at large, in hopes of inciting something to happen.

Making the data breach information available via API would encourage more storytelling around these events, which could include much more meaningful visualizations using solutions like D3.js. Information about companies could be included in other business search and discovery tooling, and more push notification networks could be set up to keep industry experts informed about what is happening across the sector.

Now that I am on the subject, it would make sense if all the privacy topics and other resources available via the Privacy Rights Clearinghouse were accessible through a simple API interface. If you work with the Privacy Rights Clearinghouse, and would like to talk about making this happen, feel free to reach out. If you are someone who would like to help work on this project, or possibly fund this work, also please let me know.

The type of information the Privacy Rights Clearinghouse is aggregating is only going to become more important in the insecure cyber world we have created, and making it accessible for reading and writing via a simple API would significantly help the organization make a bigger impact, and educate a larger audience.


State of Popular Database Platforms And Their Native API Publishing Features

I had a reminder on my task list to check in on where some of the common database platforms are when it comes to APIs. I think it was a Postgres announcement from a while back that put the thought in my notebook, but as an old database guy I tend to check in regularly on the platforms I have worked with most.

The point of this check-in is to see how far along each of the database platforms is when it comes to easy API deployment, directly from tables. The three relational database platforms I'm most familiar with are:

  • SQL Server - The platform has APIs for management, and you can deploy an OData service, as well as put .NET to work, but there is nothing really straightforward that would allow any developer to quickly expose a simple RESTful API.
  • PostgreSQL - I'd say PostgreSQL is furthest along with their "early draft proposal of an extension to PostgreSQL allowing clients to access the database using HTTP", as they have the most complete information about how to deploy an API.
  • MySQL - There was a writeup in InfoQ about MySQL offering a REST API, but from what I can tell it is still in MySQL Labs, without much movement or other stories I could find to show any next steps.

The database that drives my API platform is MySQL running via Amazon RDS. I haven't worked on Postgres for years, and jumped ship on SQL Server a while back (my therapist says I cannot talk about it). I automate the generation of my APIs using Swagger and the Slim framework, then do the finish work like polishing the endpoints to look less like their underlying database, and more like how they will actually be used. 

Maybe database platforms shouldn't get into the API game, leaving API deployment to API gateway providers like SlashDB and DreamFactory? It just seems like really low hanging fruit for these widely used database solutions to make it dead simple for developers to expose and craft APIs from existing data sources.

If you are using any database to API solutions for SQL Server, PostgreSQL, or MySQL, please let me know.


The Next Steps For The Recreation Information Database (RIDB) API

I referenced the Recreation Information Database (RIDB), in my story late last year, when I was asking for your help to make sure the Department of Agriculture leads with APIs in their parks and recreation RFP. I'm not exactly sure where it fits in with the RFP, because the RIDB spans multiple agencies.

Here is the description from the RIDB site:

RIDB is a part of the Recreation One Stop (Rec1Stop) project, initiated as a result of a government modernization study conducted in 2004. Rec1Stop provides a user-friendly, web-based resource to citizens, offering a single point of access to information about recreational opportunities nationwide. The web site represents an authoritative source of information and services for millions of visitors to federal lands, historic sites, museums, and other attractions/resources.

When I wrote the post last October, I referenced the PDF for the REST API Requirements for the US Forest Service Recreation Information Database (RIDB), but this week I got an update, complete with fresh links to a preview of a pretty well designed API, complete with documentation developed using Slate.

I haven't actually hacked on the endpoints, but I took a stroll through the docs, and my first impression is that it is well designed and robust, including resources for organizations involved with the RIDB, recreational areas, facilities, campsites, permit entrances, tours, activities, events, media, and links. The RIDB documentation also includes details on errors, pagination, versioning, and a data dictionary, and the onboarding was frictionless when looking to get a key.

Everything is in dev mode with the RIDB API, and the team is looking for feedback. I'm not entirely sure they wanted me to publish a story on API Evangelist, but I figure the more people involved the better, as I'm not sure when I'll get around to actually hacking on the API. I'm happy to see such a quick iteration towards the next generation of the RIDB API, and it makes me hopeful to be covering so many new API efforts out of the federal government in a single day.


If you think there is a link I should have listed here, feel free to tweet it at me, or submit it as a GitHub issue. Even though I do this full time, I'm still a one person show, and I miss quite a bit, and depend on my network to help me know what is going on.