A closer look at the problem
Part of the success of REST comes from the fact that it elegantly uses the HTTP framework, making it easy to understand and work with. The first challenge is that when HTTP was designed, the internet was largely a world of static content or slowly changing data. Things have moved on, and we now expect data to change constantly and at ever greater speeds. The first example of constant change is the delivery of video and audio media, particularly when the video covers a live event. There is no way we could push all the media to the client before it is consumed; historically, such an approach would have taken hours or days. Instead, a low-level connection is established between the client and server, allowing the server to send a steady series of data blocks, or a stream of data. Hence the term streaming. While they have grown, the sources of video and audio aren't so numerous that we can't lean on a few providers to support us, plugging their services into our solutions. How many websites actually stream their own videos compared to integrating YouTube or Vimeo into their pages?
Learn more about API Conference
The problem is that the number of use cases that now need or expect a constant flow of data, sometimes flowing in both directions, has grown massively and extends into the world of more conventional text-based payloads. In low-volume use cases, we can address the problem much like hitting refresh on the browser; in other words, we simply repeat the API request to the data source, which is what polling is. As you scale this up, it becomes a large consumption of resources, particularly if your refresh requests often result in no data changes. Even with network caches taking some of the burden, that is still network traffic and caching servers that need to be paid for. Not to mention the user experience, where the user can observe lag or latency because of the additional time spent round-tripping the data again. There is also the overhead that every request goes through a handshake and authentication process before we get to the logic of processing the actual request.
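To make the waste concrete, here is a minimal sketch in Python. The `poll` helper and the fake data source are invented for illustration; the point is that every poll returning unchanged data is a full round trip, handshake and all, that delivers nothing.

```python
import time

def poll(fetch, last_seen, interval=0.0, max_polls=5):
    """Naive polling loop (hypothetical): repeat the request, discard unchanged results."""
    wasted = 0
    updates = []
    for _ in range(max_polls):
        value = fetch()
        if value == last_seen:
            wasted += 1  # a round trip that returned nothing new
        else:
            updates.append(value)
            last_seen = value
        time.sleep(interval)
    return updates, wasted

# a data source that changes only once across five polls
values = iter(["a", "a", "b", "b", "b"])
updates, wasted = poll(lambda: next(values), last_seen="a")
```

Here four of the five requests were pure overhead, which is exactly the cost profile that grows with scale.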
As backends to applications become a composition of microservices, if they are all polling each other, we will see an accumulation of latency and polling overhead. Again, this presents a compute overhead, which in low-volume cases can be tolerated and addressed by increasing the amount of physical resources. When scaled up, it becomes a significant cost in terms of needing more network capacity, more servers, power, cooling, and so on. As a result, the savings in execution cost, let alone the benefits of a better user experience, make it worth investing engineering effort in creating the means for data to be streamed in at least one direction between client and server, if not both.
By allowing a stream of data to leave our servers, we do raise some security considerations. Can we be sure that the recipient of our stream is legitimate and the intended destination? Is it possible for a malicious actor to initiate or become the stream recipient (e.g., a man in the middle, compromised client, malicious request, etc.)? Can we be duped into putting into the stream more data than should be shared due to some sort of injection attack? Can we link the steady flow of data back to the request for audit purposes?
Conversely, if the client suddenly receives a connection and data pours down it, do we know who the data is from? Is someone spoofing a server? Can we trust the origin — after all, when a request is made, we authenticate ourselves. Should we expect the response to do likewise, particularly if it arrives on a separate connection? These considerations may be addressed when the stream is part of the original exchange, but if it isn't, then what? There are strategies we can use to mitigate these concerns; for example, the client can give the server a generated token and expect it to be passed back with the stream so the token can be validated.
The need to communicate a flow of data more easily and efficiently has driven different techniques for achieving streaming or streaming-like behaviors, and the evolution of two dominant frameworks that address these issues and include the streaming paradigm: GraphQL and gRPC. In addition to these frameworks, we should look at common techniques that can be used without a framework, such as Webhooks and WebSockets.
Webhooks are probably the simplest way to move from polling to something more efficient. As any client will have a web address, it becomes possible to initiate a flow of data by passing our address as a URL in the request and asking the server to send any changes it sees to that address. When there is data to share, the data provider (server) initiates a conventional REST API call to the provided address, in the same way a client pushes a data change to a server. We have, in effect, simply established a reverse path for normal API calls. The server continues to do this until some rule is satisfied; this may be time-based, or it may be when an error occurs communicating with the client, as you would expect if a client shifts across networks or connects and disconnects, resulting in its address changing.
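The mechanics can be sketched in a few lines of Python using only the standard library. The receiver, the callback path, and the payload are all invented for the example; in real systems the "provider" would be a remote service calling back over the public internet.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

received = []

class WebhookReceiver(BaseHTTPRequestHandler):
    """Client side: a tiny HTTP endpoint that accepts the provider's callbacks."""
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        received.append(json.loads(self.rfile.read(length)))
        self.send_response(204)
        self.end_headers()
    def log_message(self, *args):
        pass  # silence per-request logging

# client side: start a receiver and build the address we would register
server = HTTPServer(("127.0.0.1", 0), WebhookReceiver)
threading.Thread(target=server.serve_forever, daemon=True).start()
callback_url = f"http://127.0.0.1:{server.server_port}/events"

# provider side: when data changes, make a conventional POST to the registered URL
def notify(url, payload):
    req = Request(url, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    urlopen(req).close()

notify(callback_url, {"event": "price_changed", "value": 42})
server.shutdown()
```

Note that the roles have simply reversed: the provider is now the one making ordinary REST calls.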
Webhooks are simple to implement but do present a couple of challenges. Firstly, the client has to recognize when the server has lost contact with it so it can update the server with its new address. After all, the provider won't know any different until it has data to send, and applications rarely monitor the infrastructure they run on to recognize that something has changed which requires a refresh of the webhook address. We simply handle errors when something happens, but with a webhook we won't experience an error, as we passively wait for a call to be received. Of course, we could solve this by introducing a heartbeat exchange, but that just adds compute effort and bandwidth use.
From a security standpoint, webhooks have some weaknesses to consider. As the client provides an address for the server to talk to, that address can be hijacked, either by using the server's record of addresses to call, or by spoofing or acting as a man in the middle and sending poisoned data. An attacker could change the client address to somewhere that wants the data for malicious reasons, and the provider would never know it was leaking data.
Our client now also needs to accept incoming traffic from locations that may not have the same address as the location it communicated with to make the initial request. This represents an increase in risk for the client.
While these risks can be mitigated, for example by the server returning an authentication token that the client provided at registration, we continue to add complexity. We also have the issue that the conversation between client and server is focused on communication in one direction at a time, sometimes described as half-duplex.
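A common mitigation, used by several webhook providers, is for the sender to sign each payload with a shared secret so the receiver can verify both origin and integrity. A minimal sketch, assuming a secret agreed at registration time (the secret and helper names are invented):

```python
import hashlib
import hmac

SECRET = b"shared-secret"  # hypothetical: agreed when the webhook was registered

def sign(payload: bytes) -> str:
    # provider computes an HMAC over the body and sends it in a header
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, signature: str) -> bool:
    # client recomputes the HMAC and compares in constant time
    return hmac.compare_digest(sign(payload), signature)

body = b'{"event": "update"}'
signature = sign(body)
```

A tampered body or a sender without the secret fails verification, which addresses the spoofed-server concern without extra round trips.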
WebSockets are the foundation for most streaming solutions, from video and audio onwards. They work by the client initiating a connection, but rather than the connection being closed once the request is addressed, as in a standard HTTP conversation, the connection is kept open. Each connection consumes resources at both ends, as each end holds resources to receive the network traffic. Because the connection is kept open, the server can simply put any data that changes onto the connection and the client will receive it, and this can happen in both directions. As a result, we can easily do things like have the client confirm receipt of each data block. Nor do we have to repeatedly go through the initial handshake process of agreeing on a connection and authenticating each other.
Using WebSockets does mean the implementation needs to manage the use of TCP unless a suitable library can be obtained. Working at these lower levels distracts from developing functionality for the business solution.
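The persistent-connection idea can be sketched with plain TCP sockets. This is not the WebSocket protocol itself, which adds an HTTP upgrade handshake and message framing on top, but it shows the core shift: one open connection over which the server pushes updates without further requests.

```python
import socket
import threading

def server(listener):
    """Accept one client and push several updates down the open connection."""
    conn, _ = listener.accept()
    with conn:
        for update in (b"tick:1\n", b"tick:2\n", b"tick:3\n"):
            conn.sendall(update)  # no new request needed per update

listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
threading.Thread(target=server, args=(listener,), daemon=True).start()

# client: one connect, then passively receive newline-delimited updates
received = []
with socket.create_connection(("127.0.0.1", port)) as client:
    buf = b""
    while len(received) < 3:
        chunk = client.recv(1024)
        if not chunk:
            break
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            received.append(line.decode())
```

Managing this buffering, framing, and connection lifecycle by hand is exactly the low-level work a good WebSocket library takes off your plate.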
Server-Sent Events
HTML5 introduced a feature called Server-Sent Events (SSE). The client starts the process by requesting a URL on the server, and the server then holds the connection open, sending the client a one-way stream of events until it is time to stop. The client has no say in this; ideally, the server will include a notice of termination in the stream, otherwise the client has no means of knowing whether the flow has stopped, the connection has broken, or there is simply an absence of data.
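Whatever server framework is used, the SSE wire format is simple text. The field names `event`, `id`, and `data` come from the specification; the helper below is invented to show the shape of a single `text/event-stream` message.

```python
import json

def format_sse(data, event=None, event_id=None):
    """Render one message in the text/event-stream wire format (hypothetical helper)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"data: {json.dumps(data)}")
    # a blank line terminates each event
    return "\n".join(lines) + "\n\n"

message = format_sse({"price": 101.5}, event="quote", event_id=7)
```

The `id` field is what lets a reconnecting client tell the server where the stream left off, which partially mitigates the broken-connection problem described above.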
gRPC was developed by Google as a means to create more efficient API calls. The efficiency comes as a result of several characteristics:
- The payload doesn't carry the overhead of self-description that JSON does. Interpreting the payload depends instead on the schema, the position of fields, and knowing how many bytes each data type occupies.
- The payload uses a binary encoding, which is more efficient to transmit.
As a result, the message size is smaller than an equivalent JSON representation, and serialization and deserialization can be quicker because there is less data to process. It could be argued that the binary format adds an element of security, as the payload isn't human-readable; even if you converted the binary back to text, you would need to know the structure to interpret it.
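A rough illustration of the size difference, using Python's `struct` module to mimic a positional binary layout. This is not the actual Protobuf wire format, which uses field tags and varints, but it shows why dropping self-description shrinks the payload:

```python
import json
import struct

record = {"id": 12345, "price": 101.5, "qty": 7}
json_bytes = json.dumps(record).encode()

# fixed positional layout: a 4-byte int, an 8-byte double, a 2-byte short
binary = struct.pack("<idh", record["id"], record["price"], record["qty"])

# without field names, quotes, or punctuation, the binary form is far smaller
print(len(json_bytes), len(binary))
```

The 14-byte binary record carries the same values as the roughly 40-byte JSON one, and the receiver recovers them purely from position and type, exactly as the schema-driven approach describes.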
The payload is defined using a Protobuf schema. The schema includes explicit ordering of the attributes in the payload. Using positions and knowing the size of each data type makes extracting values very efficient. This doesn't prevent you from having variable-length fields such as strings or arrays of strings, as the length is encoded into the payload, but there are good practices about positioning. With a schema defined, we generate the client and server code using the Protobuf tooling. Defining whether an API call is streamed is simply a matter of including the keyword stream in the service definition.
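As an illustration, a hypothetical service definition might look like this (the service and message names are invented for the example; only the stream keyword is significant):

```protobuf
syntax = "proto3";

message PriceRequest {
  string symbol = 1;   // field numbers define the positional layout
}

message PriceUpdate {
  string symbol = 1;
  double price = 2;
}

service PriceFeed {
  // "stream" on the response makes this a server-to-client stream;
  // placing it on the request side instead, or on both, changes the direction
  rpc Subscribe (PriceRequest) returns (stream PriceUpdate);
}
```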
Where the keyword is used dictates the direction of the stream: from client to server, from server to client, or in both directions. This makes it very easy to define and use. However, gRPC does bring some constraints. The key considerations are:
- The generated code relies on HTTP/2, which imposes some restrictions on the app server that can host it and on any API gateway you use.
- Load balancing is challenging, to the point of not being helpful. With normal API calls, an open connection equates to activity, but a streamed connection can be open with no actual traffic flowing, and a request may have completed while its response continues as a stream.
- Both client and server depend on a common code base — the generated code.
As a technology, gRPC is particularly effective where you have control over both ends of a connection, such as microservice-to-microservice connectivity within your bounded context. Applying gRPC this way limits the potential issues of the tighter coupling gRPC can impose. It also means any API gateway in the path must understand how to handle the binary encoding. It is a very efficient and easy way to implement APIs, and extending it to define streaming connections creates little additional work. For the most part, the whole experience is not dissimilar to conventional coding of interface definitions in languages like Java. If you've been around for a while, much of it will feel like using CORBA.
GraphQL also comes with performance benefits, but the efficiencies are far more application-centric compared to gRPC. Facebook developed GraphQL to overcome a problem observed as people's use of Facebook shifted from the browser to the mobile app. To keep bandwidth consumption down, which used to be a critical constraint for mobile devices, the app ended up making many small data requests, each building on the data from the previous one. This meant Facebook's servers were being overloaded with millions of small API calls, each carrying an overhead for validating, serializing, and deserializing data. GraphQL solved this in two ways. Firstly, it allows the client to express just the attributes that are needed rather than an entire fixed record. Secondly, a single request can cover related entities, performing a logical data join. So rather than separate calls to get my list of friends and then request specific aspects of their contact details, such as just their email addresses, I can formulate a single call that says get my friends and return them with their email addresses, and the whole request is resolved in one go on the server side. GraphQL extended this paradigm with what it calls a subscription: rather than just sending me the current answer, I want to receive the results as the data changes. GraphQL communicates using JSON payloads, meaning everything is self-describing and human-readable.
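The friends-and-emails case from above might be expressed like this (the field names are invented for the example, as the exact schema depends on the API):

```graphql
# single request: my friends, returning only the fields I asked for
query FriendsEmails {
  me {
    friends {
      name
      email
    }
  }
}

# subscription: deliver each change as it happens, same shape of payload
subscription FriendUpdates {
  friendChanged {
    name
    email
  }
}
```

The query and the subscription share the same selection syntax; the subscription simply keeps delivering results as the underlying data changes.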
How GraphQL achieves the subscription functionality depends upon the implementation. Most implementations use WebSockets under the hood, although some have used SSE. Regardless of the low-level communication chosen, the client and server code doesn't need to handle the low-level mechanics.
You may have noticed that common challenges run across most of these answers, which we could view as the tax for adopting streaming. These challenges include:
- Load balancing the demands on the backend needs to be looked at more closely; simple network balancing is unlikely to solve the problem.
- Understanding when the client has lost contact needs to be considered.
- Most options mean streaming clients consume network connections.
- Your infrastructure needs to accept the connections, whether that means open sockets or the use of newer HTTP versions.
- How do you know what data the client has received (if it matters)? How do you ensure that messages being sent are going in the correct order (could some caching infrastructure between server and client influence the order received)?
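The delivery and ordering question is commonly addressed by tagging every message with a sequence number, so the receiver can detect gaps and reordering after the fact. A minimal sketch (the scheme and function are invented for illustration):

```python
def audit_stream(received_seqs, expected_count):
    """Given sequence numbers in arrival order, report missing and reordered messages."""
    missing = sorted(set(range(1, expected_count + 1)) - set(received_seqs))
    out_of_order = any(later < earlier
                       for earlier, later in zip(received_seqs, received_seqs[1:]))
    return missing, out_of_order

# message 5 was never delivered, and message 2 arrived after message 3
missing, out_of_order = audit_stream([1, 3, 2, 4, 6], expected_count=6)
```

The same sequence numbers also give you the audit trail linking each delivered block back to the originating request.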
Streaming isn't without its costs, so it is worth weighing them against the benefits.
How do I decide which technology to use?
Deciding which technology to use depends upon several factors. As the joke goes, ask a software architect anything, and they will all agree on the answer: 'it depends.' As an architect, I prefer to empower people to make their own decisions and to understand why a decision fits their needs. This means building what we call a decision matrix, which has columns representing each of the different technologies (in our case, GraphQL, etc.). The rows cover the different factors that impact your decision, and the cells describe how each factor is addressed or impacts the decision. The column with the most favorable responses is the technology to use. We can see part of a decision matrix in the following fragment.