Solution Architecture of Downloading Data from REST APIs in Azure

Fred Gu
Apr 22, 2021


Third-party platforms commonly expose their data through REST APIs. As a consumer of those APIs, however, building a reliable download solution is still challenging.

A download scenario and its potential challenges

The potential challenges are:

  1. How to renew the access token before or when it expires, and keep it accessible to the whole download logic.
  2. How to orchestrate the whole download logic in a robust and efficient manner.
  3. How to parse the JSON output without exceptions interrupting the download run.
  4. How to build the related data warehouse so that it can deal with duplicated downloads, missing downloads, incomplete downloads, change tracking, aggregation and any other data requirements.
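
To make the third challenge concrete, here is a minimal Python sketch of parsing response bodies without letting one malformed payload stop the run. The function name `parse_payloads` and the shape of the rejected records are my own assumptions for illustration, not part of any Azure SDK.

```python
import json
from typing import Any


def parse_payloads(payloads: list[str]) -> tuple[list[Any], list[dict[str, str]]]:
    """Parse each raw response body; collect failures instead of raising."""
    parsed: list[Any] = []
    rejected: list[dict[str, str]] = []
    for raw in payloads:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            # Keep the raw text and the reason so the item can be inspected
            # or downloaded again later (hypothetical record layout).
            rejected.append({"raw": raw, "error": str(exc)})
    return parsed, rejected


good, bad = parse_payloads(['{"id": 1}', '{"id": 2', "[]"])
print(len(good), len(bad))  # prints: 2 1
```

The rejected list can be written to a separate location, so that the download keeps going and the bad payloads are handled afterwards.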

There are many ways to address these challenges. Below I propose two architectures on the Azure platform, both of which I have implemented.

Server-less Architecture

In the server-less architecture, the major components are:

  • Data Factory is used to define and schedule the pipeline. The pipeline triggers Azure Functions either sequentially or in parallel.
  • Azure Functions are coded to perform the actual REST API download work.
  • Redis Cache is used to keep the latest token.
  • Blob storage or Cosmos DB is used as the data storage for the downloaded JSON files/data.
  • API Management is used to wrap Azure Functions as API endpoints, which makes it possible to integrate the functions into other applications with extra security checks.

Server-less Architecture for API Download

There will be multiple Azure functions:

  • One function looks after token renewal specifically, to enable continuous authentication. It could be scheduled by Data Factory to execute every x minutes.
  • Each of the other functions supports a specific API endpoint. They could be scheduled in Data Factory as well. All of these functions can fetch the token directly from Redis Cache.

Redis Cache is used as the token storage to speed up token retrieval inside every running function.
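
The split between one renewal function and many download functions can be sketched in Python as below. This is only an illustration under assumptions: `InMemoryCache` is a stand-in I wrote so the example runs without a Redis server, and `TOKEN_KEY`, `renew_token`, `get_token` and the 60-second safety margin are hypothetical names and values. The stand-in mimics the `get(key)` and `set(key, value, ex=seconds)` calls of the redis-py client, so a real client could be passed in instead (note that redis-py returns bytes unless it is created with `decode_responses=True`).

```python
import time
from typing import Callable, Optional


class InMemoryCache:
    """Stand-in for a Redis client: get(key) and set(key, value, ex=seconds)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict[str, tuple[str, float]] = {}

    def set(self, key: str, value: str, ex: int) -> None:
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, self._clock() + ex)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or self._clock() >= entry[1]:
            return None
        return entry[0]


TOKEN_KEY = "api:access_token"  # hypothetical key name


def renew_token(cache, request_token: Callable[[], tuple[str, int]],
                safety_margin: int = 60) -> str:
    """Renewal function: request a new token and cache it with a TTL."""
    token, expires_in = request_token()
    # Expire the cache entry a little early so that no download function
    # picks up a token that is about to lapse.
    cache.set(TOKEN_KEY, token, ex=max(1, expires_in - safety_margin))
    return token


def get_token(cache) -> str:
    """Download functions: read the current token, fail loudly if absent."""
    token = cache.get(TOKEN_KEY)
    if token is None:
        raise RuntimeError("no valid token in cache; check the renewal function")
    return token


cache = InMemoryCache()
renew_token(cache, lambda: ("token-abc", 3600))
print(get_token(cache))  # prints: token-abc
```

Because the cache entry carries its own expiry, a download function never has to know when the token was issued. It either gets a usable token or a clear error.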

Blob storage is a cheap way to save the downloaded JSON data. However, Cosmos DB is the better choice if you want to query the JSON data immediately and worry as little as possible about parsing it.
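
If Blob storage is chosen, one possible way to cope with duplicated and missing downloads is to give every file a deterministic name, so that a repeated download overwrites the earlier file and gaps can be found by comparing names. The Python sketch below only builds and compares names. The path layout and the helpers `blob_name` and `missing_pages` are assumptions of mine, not a convention required by Azure.

```python
from datetime import date


def blob_name(endpoint: str, day: date, page: int) -> str:
    """Build a deterministic blob path for one page of one endpoint on one day."""
    safe_endpoint = endpoint.strip("/").replace("/", "_")
    return f"{safe_endpoint}/{day.strftime('%Y/%m/%d')}/page-{page:04d}.json"


def missing_pages(endpoint: str, day: date, expected_pages: int,
                  existing: set[str]) -> list[int]:
    """Return the page numbers whose blob is not in the existing name set."""
    return [
        page
        for page in range(1, expected_pages + 1)
        if blob_name(endpoint, day, page) not in existing
    ]


day = date(2021, 4, 22)
print(blob_name("/v1/orders", day, 3))  # prints: v1_orders/2021/04/22/page-0003.json

already_stored = {blob_name("/v1/orders", day, 1), blob_name("/v1/orders", day, 3)}
print(missing_pages("/v1/orders", day, 3, already_stored))  # prints: [2]
```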

Azure Batch Architecture

Azure Batch is an undervalued service in the Azure ecosystem. It provides an environment where you can run your own scripts and monitor their status neatly. I will write a separate post about Azure Batch in the future.

The beauty of using Azure Batch is that programmers can write the download logic in the language they know best, and quickly deploy the scripts as batch jobs scheduled by Data Factory.
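
As an illustration, the core of such a script could look like the Python sketch below, which retries a download with exponential backoff. Everything here is assumed for the example: the helper names `http_get` and `download_with_retry`, the bearer-token header, and the retry counts. The fetch step is passed in as a function so that the retry logic can be tried without network access.

```python
import time
import urllib.request
from typing import Callable


def http_get(url: str, token: str, timeout: float = 30.0) -> str:
    """Fetch one URL with a bearer token (assumed auth scheme)."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def download_with_retry(fetch: Callable[[], str], attempts: int = 3,
                        base_delay: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            # urllib network errors and timeouts are OSError subclasses.
            if attempt == attempts - 1:
                raise  # out of attempts: let the script fail visibly
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Simulated endpoint that fails twice and then succeeds.
calls = {"n": 0}


def flaky_fetch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated outage")
    return '{"ok": true}'


print(download_with_retry(flaky_fetch, sleep=lambda seconds: None))  # prints: {"ok": true}
```

In a real script, `fetch` would be something like `lambda: http_get(url, token)`. Letting the final exception propagate makes the script exit with a non-zero code, which is how a failed run would normally become visible in job monitoring.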

Azure Batch Architecture for API Download

Summary

In this article, we looked at two high-level architectures for API download. The server-less architecture is recommended for any established team, while the Azure Batch architecture suits those who have already developed complicated scripts on their local machines and now want to run them in the cloud.

Please contact me if you have any doubts or questions.

