Unikernels, Software Containers and Serverless Architecture: Road to Modularity

This blog post discusses the implications of unikernels, software containers and serverless architecture for the modularity of complex software systems in a service mesh, as illustrated below. Modular software systems promise to be more maintainable, secure and future-proof than software monoliths.

[Figure: Stacked modules]

Software containers, and the alternative MicroVMs, have proven very successful for realizing extremely scalable cloud services. Examples can be found in serverless computing and in Big Data / NoSQL solutions in the form of serverless databases (which, however, are often not realized using containers). This has gone so far that, upon a user's web request, a software container is started that executes a business function developed by a software engineer, an answer is returned to the user and the container is stopped again. This yields large cost savings in a cloud world where infrastructure and services are paid by actual consumption.

However, we will see in this post that there is still room for optimization (cost/performance) by further modularizing the application, which is usually still based on a large monolith, such as the Java Virtual Machine with all its standard libraries or the Python environment with many libraries that are in most cases never used when executing a single business function. Furthermore, the operating system layer of the container is also not optimized towards the execution of a single business function, as it contains much more operating system functionality than needed (e.g. drivers, file systems, network protocols, system tools etc.). Thus unikernels are an attractive alternative for introducing cost savings in the cloud infrastructure.

Finally, we will discuss the grouping of functions, i.e. where it makes sense to combine a set of functions of your application, composed of single functions/microservices, into one unit. We will also briefly address composable infrastructure.

Background: Software Containers and Orchestrators

The example above is of course simplistic and much more happens behind the scenes. For example, the business function may need to fetch data from a datastore, or it may need to communicate with other business functions to return an answer. This requires that these business functions, the communication infrastructure and the datastores work together, i.e. they need to be orchestrated. Potentially, additional hardware (e.g. GPUs) needs to be taken into account that is not available all the time due to cost.

This may imply, for example, that these elements run together in the same virtual network, or that they should run on the same servers or on servers close to each other for optimal response times. Furthermore, in case of failures, requests need to be rerouted to working instances of the business function, the communication infrastructure or the datastore.

Here, orchestrators for containers come into play, for example Kubernetes (K8S) or Apache Mesos. In reality, much more needs to be provided, e.g. distributed configuration services, such as Etcd or Apache Zookeeper, so that every component always finds its configuration without relying on a complicated deployment of local configuration files.

Docker has been a popular implementation of software containers, but it was neither the first one nor was it based on new technologies. In fact, the underlying Linux kernel feature (cgroups) was introduced years before Docker emerged.
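To make this tangible, here is a minimal sketch (not a production setup) of how the same kernel feature can be used directly from Python: a new cgroup is created by making a directory in the cgroup v2 filesystem, and limits are set by writing to its control files. It assumes a Linux host with cgroup v2 mounted at /sys/fs/cgroup, root privileges and the memory controller enabled for child groups; the group name is made up.

    import os

    CGROUP = "/sys/fs/cgroup/demo-business-function"   # hypothetical group name

    # Creating a directory in the cgroup v2 filesystem creates a new cgroup.
    os.makedirs(CGROUP, exist_ok=True)

    # Cap the memory usage of the group to 64 MiB.
    with open(os.path.join(CGROUP, "memory.max"), "w") as f:
        f.write(str(64 * 1024 * 1024))

    # Move the current process into the group; the limit now applies to it
    # and to every child process it starts.
    with open(os.path.join(CGROUP, "cgroup.procs"), "w") as f:
        f.write(str(os.getpid()))

Container runtimes such as Docker essentially automate this bookkeeping (plus namespaces and image handling) for you.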

This concept has been extended by so-called MicroVM technologies, such as Firecracker, which draw on unikernel ideas to provide only the OS functionality that is actually needed. This increases reliability, security and efficiency significantly. Such VMs can start up much faster, e.g. within milliseconds, than Docker containers and are thus better suited even for the simple web service request use case described above.

About UniKernels

UniKernels (also known as library operating systems) are a core concept of modern container technologies, such as Firecracker, and popular for providing cloud services. They contain only the minimum set of operating system functionality necessary to run a business function. This makes them more secure, reliable and efficient, with significantly better reaction times. Nevertheless, they are still flexible enough to run a wide range of functionality: they contain a minimal kernel and a minimal set of drivers. UniKernels have been proposed for various domains, and despite some successes in running them productively they are at the moment still a niche. Examples are:

  • ClickOS: dynamically creates new network devices/functions (switching, routing etc.) within milliseconds on a device, potentially based on a software-defined network infrastructure
  • Runtime.js: a minimal kernel for running JavaScript functions
  • The L4 family of microkernels
  • UniK – compiles applications for use as UniKernels in cloud environments
  • Drawbridge – a Windows-based UniKernel
  • IncludeOS – a minimal, lightweight unikernel operating system for services running in containers/MicroVMs
  • Container Linux (formerly CoreOS): a lightweight OS to run containers, originally Docker but more recently rkt. While this approach is very lightweight, it still requires that the rkt containers designed by developers are lightweight, too. In particular, care must be taken that containers include only the libraries necessary, only the parts of those libraries that are needed, and only one version of each.
  • OSv – runs unmodified Linux applications on a UniKernel
  • MirageOS – an OCaml-based UniKernel

Serverless Computing

Serverless computing builds on MicroVMs and unikernels. Compared to traditional containerization approaches, this significantly reduces resource usage and maintenance cost. On top of that, such platforms provide a minimum set of libraries and engines (e.g. Java, Python etc.) to run a business function with, ideally, the minimum needed set of functionality (software/hardware). Examples of implementations of serverless computing are OpenFaaS, Kubeless or OpenWhisk. Furthermore, all popular cloud services offer serverless computing, such as AWS Lambda, Azure Functions or Google Cloud Functions.

The advantage of serverless computing is that one ideally does not have to design and manage a complex stack of operating systems, libraries etc., but simply specifies a business function to be executed. This significantly reduces the operating costs of the business function, as server maintenance, operating system maintenance and library maintenance are taken over by the serverless computing platform. Furthermore, the developer may specify required underlying platform versions and libraries. While those are usually offered by the service provider out of the box, custom ones need to be created manually by the provider or by the developer of the business function.
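As an illustration, a business function in serverless computing typically boils down to a single handler that the platform invokes per request. The following sketch is written in the style of an AWS Lambda Python handler; the event fields and the order-total logic are invented for illustration.

    import json

    def handler(event, context):
        # The platform passes the request payload in `event`; the servers,
        # operating system and runtime underneath are managed by the provider.
        items = event.get("items", [])            # hypothetical request field
        total = sum(i["price"] * i["quantity"] for i in items)

        return {
            "statusCode": 200,
            "body": json.dumps({"orderTotal": total}),
        }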

The libraries that provide the foundation for a business function should ideally be modularizable. For example, a given business function does not need all the functionality of a Java Virtual Machine (JVM) including its standard libraries. However, only recently has Java introduced a way to modularize the JVM, with the Jigsaw module system that came with JDK 9. This is already an improvement for more efficiency in serverless computing, but the resulting modules are still comparatively coarse-grained. For example, it is at the moment not possible to hand the Java compiler a given business function and have it strip the standard libraries and third-party libraries down to only the functionality needed. This still depends largely on the developer, and there are limits to it. For other libraries/engines, such as Python, the situation is worse.

The popular C standard library implementation glibc is also a big monolithic library, used by Java, Python and native applications, that contains a lot of functionality never used by a single business function or even a whole application. Here, alternatives exist, such as musl.

This means that perfect modularization cannot currently be achieved in serverless computing due to the lack of support in the underlying libraries, but the situation is improving continuously.

Service Mesh

A service mesh is a popular means for communication to and between functions in serverless computing. Examples of service mesh technologies are Istio, Linkerd or Consul Connect. Mostly this covers direct synchronous communication; asynchronous communication, an important pattern for calling remote functions that take a long time to complete (such as certain machine learning training and prediction tasks), is not supported directly.

However, you can deploy any messaging solution, such as ZeroMQ or RabbitMQ, to realize asynchronous communication.
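As a sketch of how this can look in practice, the snippet below uses pyzmq with a push/pull socket pair: one function hands off a long-running task (e.g. a training job) and returns immediately, while a worker function pulls and processes tasks at its own pace. The endpoint address and message format are assumptions for illustration.

    import zmq

    # Producer side: a short-lived function hands off a long-running task
    # and returns immediately instead of blocking on the result.
    def submit_training_job(parameters):
        ctx = zmq.Context.instance()
        sock = ctx.socket(zmq.PUSH)
        sock.connect("tcp://queue.internal:5557")   # hypothetical endpoint
        sock.send_json({"task": "train-model", "params": parameters})
        sock.close()

    # Consumer side: a worker function pulls tasks whenever it has capacity.
    def worker_loop():
        ctx = zmq.Context.instance()
        sock = ctx.socket(zmq.PULL)
        sock.bind("tcp://*:5557")
        while True:
            job = sock.recv_json()
            print("training with", job["params"])   # long-running work goes here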

The main point here is that service meshes and messaging solutions can benefit a lot from modularization. In fact, the aforementioned ClickOS is used in network devices to rapidly spin up any network function you may need from the device, such as routing, firewalling or proxying, as a container. These concepts can be transferred to service meshes and messaging solutions to deliver only the minimal functionality needed for secure communication between serverless computing functions.

The modularization of the user interface

One issue with user interfaces is that they essentially provide a sensible composition of the business functions that can be triggered through them. This means they support a more or less complex business process that is executed by one or more humans. To be usable, they should present a common view on the offered functionality to the human users. New technologies, such as Angular Ivy, support extracting only the needed functionality from a UI library, which reduces code size and improves the security and reliability of the UI.

The aforementioned definition of a UI means that there is at least one monolith that combines all the user interfaces related to the business functions in a single view for a given group of users. For decades there have been technologies that can do this, such as:

  • Portals/Portlets: more structured UI aggregation, already done on the server side
  • Mashups: loose coupling of UIs using various “Web 2.0” technologies, such as REST, Websockets, JSON and Javascript, integrating content from many different services (see the sketch after this list)
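As a minimal sketch of the mashup idea (service URLs and fields are invented), the UI layer pulls JSON from two independent business functions over REST and combines the results into one view model:

    import json
    from urllib.request import urlopen

    def build_customer_view(customer_id):
        # Each call goes to a separately deployed business function.
        profile = json.load(urlopen(f"https://crm.example.com/customers/{customer_id}"))
        orders = json.load(urlopen(f"https://orders.example.com/customers/{customer_id}/orders"))
        # The combined view model is what the UI actually renders.
        return {
            "name": profile.get("name"),
            "openOrders": [o for o in orders if o.get("status") == "open"],
        }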

One disadvantage of those technologies is that a developer needs to combine the different business functions into a single UI. However, a given user may not need all the functionality of that UI, and it cannot be expected that a developer manually combines the UIs of all business functions for every user group.

It would be more interesting if UIs were combined dynamically based on the user context (e.g. desk clerk vs. stewardess) using artificial intelligence techniques. Such approaches have existed in academia for many years, but have not yet made it into large-scale production environments.

Finally, one needs to think about distributed security technologies, such as OpenID Connect, to provide proper authentication and authorization for accessing those UI combinations.

Bringing it all together: Cloud Business Functions and Orchestration

With the emergence of serverless computing, microservices and container-based technologies, we have seen a trend towards more modularization of software. The key benefits are higher flexibility, higher security and simpler maintenance.

One issue related to this is how to include only the minimal set of software needed to run a given business function. We have seen that this is not so easy, and currently one still has to include large monolithic libraries and runtimes, such as glibc, Python or Java, to run a single business function. This increases the risk of security issues and still requires big upgrades (e.g. moving to another major version of an underlying library). Additionally, the underlying operating system layer is far from being highly modularizable. Some such operating systems exist, but they remain mostly in the domain of highly specialized devices.

Another open question is how to deal with the feature interaction problem, as the many possible combinations of modules may have unforeseen side effects. On the other hand, one may argue that higher modularization and isolation will make this less of a problem. However, these aspects still have to be studied.

Finally, let us assume several business functions need to interact with each other (Combined Business Functions – CBF). One could argue that they could share the same set of modules and versions that they need. This would reduce complexity, but it is not always easy in serverless computing, where it is quite common that a set of functions is developed by different organisations. Hence, they may have different versions of a shared module. This might not be so problematic if the underlying functionality has not changed between versions. However, if it has changed, it can lead to subtle errors when two business functions in serverless computing need to communicate. Here it would be desirable to capture those changes semantically, e.g. using some logic language, to automatically find issues or potentially resolve them in the service mesh / messaging bus layer. One may also think, in this context, that if business functions run on the same node they could share modules to reduce the memory footprint and potentially CPU resources.

Answers to these issues will also make it easier to upgrade serverless computing functions to the newest version, so that they always offer the latest fixes.

In the future, I expect:

  • CBF analyzers that automatically derive and extract the minimum set of VM, unikernel, driver and library/engine functionality needed to run a business function or a collection of loosely coupled business functions
  • Extended analysis for colocating CBFs that share the optimal minimum set of joint underlying dependencies (e.g. kernel, drivers etc.)
  • Sharing of underlying modules in native libraries and operating system code dynamically at runtime to reduce resource utilization
  • Composable infrastructure and software-defined infrastructure that modularize not only the underlying software infrastructure but also the hardware itself. For instance, if only a special function of a CPU is needed, the other parts of the CPU can be used by other functions (e.g. similar to Hyper-Threading). Another example is the availability and sharing of GPUs by plugging them anywhere into the data center.


Big Data: Bring Computation to Data

Big Data is the topic of the coming years. Even today, large Internet companies store exabytes of data, and their revenue model is based on selling products as well as services around this data. Consequently, they need to process data using advanced statistical methods, such as machine learning, and they need to think about how to do this efficiently. Currently, in-memory computing in particular is hyped as the answer. However, this is only one aspect. A fundamentally more important aspect is where the data is processed in a distributed multi-node data environment.

A brief history on software architectures

In the beginning of software development, many applications were single monolithic applications deployed on a single computer. This led to several problems: developers could hardly reuse code of monolithic applications, and the approach did not scale well since it was limited to a single computer. The first problem has been addressed by introducing different layers into the architecture. The resulting architectures are usually based on three layers (see next figure): a data layer, a service layer and a presentation layer. The data layer handles any functionality for managing data, such as querying or storing it. The service layer implements the business logic, e.g. it implements business processes. The presentation layer allows the user to interact with the implemented business processes, e.g. entering new customer data. The layers communicate with each other using well-defined interfaces, implemented today in REST, OData, SOAP, Websockets or HTTP/2.0.

[Figure: Three-layer architecture]
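As a minimal sketch of this separation (all names and data are invented), each layer can be represented by its own function behind a well-defined interface; in a real system each layer would typically run as a separate process or tier:

    # Data layer: encapsulates storage access (here just an in-memory dict).
    _CUSTOMERS = {1: {"name": "Alice", "city": "Berlin"}}

    def load_customer(customer_id):
        return _CUSTOMERS.get(customer_id)

    # Service layer: implements the business logic on top of the data layer.
    def customer_greeting(customer_id):
        customer = load_customer(customer_id)
        if customer is None:
            return "Unknown customer"
        return f"Welcome back, {customer['name']} from {customer['city']}!"

    # Presentation layer: renders the result for the user (e.g. behind REST).
    if __name__ == "__main__":
        print(customer_greeting(1))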

With the emergence of the Internet, these layers had to be put physically on different machines to provide greater scalability. However, they were never designed with this in mind. The network has only limited transport bandwidth and capacity. Indeed, for very large data sets it can be faster to store them on a large drive and transport it by truck to its destination than to send them over the network.

Additionally, during development, scalability of data computation is often of little interest, because in the Internet world it is usually not known how many people will access an application, and this may change over time. Hence, you need to be able to scale dynamically up and down. I observe that more and more of the development effort in this area has moved to operations, who need to implement monitors, load balancers and other technology to scale applications. This is also the reason why DevOps is a popular and emerging paradigm for developing and operating Internet-scale web applications, such as Netflix.

Towards New Software Architectures: Bring Computation to Data

The multi-layer approach does make sense, and you could even split it into more layers (“services”), but you have to evaluate carefully the complexity and reusability of your service design. More importantly, you will have to think about new interfaces, because if components are located on different machines or different memory instances, your application will spend a lot of time moving data between them. For instance, the application logic on the application server may request all customer transactions from the database and then correlate them, only to write the results back into the database. This requires a lot of data to be transferred from the database to the application server, potentially costs a lot of performance and, finally, does not scale at all.
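The difference is easy to see in code. The sketch below uses Python's built-in sqlite3 as a stand-in for the database and contrasts pulling all transactions to the application server with pushing the aggregation down to the data layer; table and column names are invented.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                     [(1, 10.0), (1, 25.5), (2, 7.0)])

    # Anti-pattern: move all rows to the application server and aggregate there.
    rows = conn.execute("SELECT customer_id, amount FROM transactions").fetchall()
    totals = {}
    for customer_id, amount in rows:
        totals[customer_id] = totals.get(customer_id, 0.0) + amount

    # Better: push the computation to the data layer, transfer only the result.
    totals_db = dict(conn.execute(
        "SELECT customer_id, SUM(amount) FROM transactions GROUP BY customer_id"))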

This problem first emerged when companies introduced the first Online Analytical Processing (OLAP) engines as part of business intelligence solutions for understanding their business. Plain database queries proved too simple and would require transferring a lot of data to the application server first. Hence, the Structured Query Language (SQL) was extended to cope with these new requirements (e.g. the CUBE operator). Moreover, you can define your own custom functions (e.g. SQL stored procedures), but these have to be implemented in a very vendor-specific way. Distributed databases based on Apache Hadoop, for instance, support custom functions, and sometimes you can integrate other programming languages, such as Java. While stored procedures are already an improvement in terms of security (protection against SQL injection attacks), they have the problem that it is very difficult to write sophisticated programs with them to handle modern Big Data applications. For instance, many applications require machine learning, statistical correlation or other statistical methods. It is difficult to write these as stored procedures and to maintain support for different vendors. Furthermore, it again leads to monolithic applications. Finally, they are not dynamic – the application cannot decide to do any new computation on the fly without reimplementing it in the database layer (e.g. implementing a new machine learning algorithm). Hence, I suggest another way to address this issue.
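For illustration, the CUBE operator mentioned above lets the database compute all subtotal combinations in a single statement, so only the (small) aggregated result leaves the data layer. The sketch assumes a DB-API connection to an engine that supports GROUP BY CUBE (e.g. PostgreSQL); the sales table and its columns are invented.

    def sales_cube(conn):
        # Run a CUBE aggregation inside the database and return the rows.
        cur = conn.cursor()
        cur.execute(
            "SELECT region, product, SUM(amount) AS total "
            "FROM sales GROUP BY CUBE (region, product)"
        )
        # Returns totals per (region, product), per region, per product
        # and the grand total, computed entirely in the data layer.
        return cur.fetchall()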

A Standard for Bringing Computation to Data?

As mentioned, we want to support modern Big Data applications by providing suitable language support for machine learning and statistical methods on top of any database system (e.g. MySQL, Hadoop, HBase or IBM DB2). The next figure illustrates the new approach. The communication between the presentation and service layer works as usual. However, the services do not call functions on the data layer; instead they send any data-intensive computation they want to perform as an R script to the data layer, which executes it and sends back only the result.

[Figure: Bring-computation-to-data architecture]

I have observed that the programming language R for statistical computing has recently been integrated into various data environments, such as transactional databases, Apache Hadoop clusters and in-memory databases such as SAP HANA. Hence, I think R could be a suitable language for describing computation that operates on data. Additionally, R already has a lot of built-in packages for machine learning and statistical data processing. Finally, depending on the openness of the underlying data environment, you can integrate R tightly into it, so you may not have to do extensive in-memory transfers.
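To sketch what “sending the computation” could look like from the service layer, the snippet below posts an R script as plain text to a hypothetical execution endpoint of the data layer and receives only the small aggregated result back. The endpoint URL, the JSON contract and the R accessor function are assumptions for illustration.

    import json
    from urllib.request import Request, urlopen

    # The data-intensive part is expressed in R and shipped to the data layer.
    R_SCRIPT = """
    tx <- read.transactions.table()   # hypothetical accessor provided by the data layer
    aggregate(amount ~ customer_id, data = tx, FUN = sum)
    """

    def run_in_data_layer(script):
        request = Request(
            "https://datalayer.example.com/r/execute",   # hypothetical endpoint
            data=json.dumps({"script": script}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urlopen(request) as response:
            return json.load(response)   # only the aggregated result travels back

    result = run_in_data_layer(R_SCRIPT)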

The advantages of this approach are:

  • Business logic stays in the service layer and does not move to the data layer
  • You can easily add new services without modifying the data layer – so you avoid tight coupling, which makes it easier to change the data layer or to introduce new functionality
  • You can mine the R scripts generated by services to determine which computation the user is likely to do next, so it can be started before the user requests it
  • Caching and distribution of data processing can be based on a more sophisticated analysis of the R scripts using the R profiler Rprof
  • R is already known by many business analysts and social scientists/psychologists

However, you will need some functionality for governing the execution of the R scripts in the data layer. This includes decisions on when to schedule computation or when to create new computing/data nodes (e.g. real-time vs. batch). This will require a company-wide enterprise architecture approach where you define which data should be processed in real time and which data should be batch-processed. Furthermore, you need to take into account security and separation of concerns.

In this context, Apache Hadoop might be an interesting solution from the technology perspective.

What is next

The aforementioned approach is only the beginning. Building on this solution, you can think about true inter-cloud deployments of your application and, finally, enable inter-organizational data-processing business processes.

Scenarios for Inter-Cloud Enterprise Architecture

The unstoppable cloud trend has reached end users and companies. Particularly the former openly embrace the cloud, for instance by using services provided by Google or Facebook. The latter are more cautious, fearing vendor lock-in or exposure of secret business data, such as customer records. Nevertheless, for many scenarios the risk can be managed and is accepted by companies, because the benefits, such as scalability, new business models and cost savings, outweigh the risks. In this blog entry, I will investigate in more detail the opportunities and challenges of inter-cloud enterprise applications. Finally, we will have a look at technology supporting inter-cloud enterprise applications via cloud bursting, i.e. enabling them to be extended dynamically over several cloud platforms.

What is an inter-cloud enterprise application?

Cloud computing encompasses all means to produce and consume computing resources, such as processing units, networks and storage, existing in your company (on-premise) or on the Internet. The latter in particular enables dynamic scaling of your enterprise applications, e.g. when you suddenly get a lot of new customers but do not have the necessary capacity to serve them all with your own computing resources.

Cloud computing comes in different flavors and combinations of them:

  • Infrastructure-as-a-Service (IaaS): Provides hardware and basic software infrastructure on which an enterprise application can be deployed and executed. It offers computing, storage and network resources. Example: Amazon EC2 or Google Compute.
  • Platform-as-a-Service (PaaS): Provides on top of an IaaS a predefined development environment, such as Java, ABAP or PHP, with various additional services (e.g. database, analytics or authentication). Example: Google App Engine or Agito BPM PaaS.
  • Software-as-a-Service (SaaS): Provides on top of an IaaS or PaaS a specific application over the Internet, such as a CRM application. Example: SalesForce.com or Netsuite.com.

When designing and implementing/buying your enterprise application, e.g. a customer relationship management (CRM) system, you need to decide where to put it in the cloud. For instance, you can run it fully on-premise or put it on a cloud in the Internet. However, different cloud vendors exist, such as Amazon, Microsoft, Google or Rackspace, and they offer different flavors of cloud computing. Depending on the design of your CRM, you can put it either on an IaaS, PaaS or SaaS cloud, or a mixture of them. Furthermore, you may put only selected modules of the CRM on a cloud in the Internet, e.g. a module for doing anonymized customer analytics. You will also need to think about how this CRM system is integrated with your other enterprise applications.

Inter-Cloud Scenario and Challenges

Basically, the exemplary CRM application runs partially in the private cloud and partially in different public clouds. The CRM database is stored in the private cloud (IaaS); some (anonymized) data is sent to different public clouds on Amazon EC2 (IaaS) and Microsoft Azure (IaaS) for number-crunching analysis. Paypal.com is used for payment processing. Besides customer data and buying history, the database contains sensor information from different points of sale, such as how long a customer was standing in front of an advertisement. Additionally, the sensor data can be used to trigger actuators, such as posting on the shop’s Facebook page what is currently trending, using the cloud service IFTTT. Furthermore, the graphical user interface presenting the analysis is hosted on Google App Engine (PaaS). The CRM is integrated with Facebook and Twitter to enhance the data with social network analysis. This is not an unrealistic scenario: many (grown) startups already deploy a similar setting and established corporations experiment with it. Clearly, this scenario supports cloud bursting, because the cloud is used heavily.

I present in the next figure the aforementioned scenario of an inter-cloud enterprise application leveraging various cloud providers.

[Figure: Inter-cloud architecture]

There are several challenges involved when you distribute your business application over your private and several public clouds.

  • API Management: How do you describe different types of business and cloud resources, so that you can make efficient and cost-effective decisions on where to run the analytics at a given point in time? Furthermore, how do you represent different storage capabilities (e.g. in-memory, on-disk) in different clouds? This goes further up to the level of the business application, where you need to harmonize or standardize business concepts, such as “customer” or “product”. For instance, a customer described in “Twitter” terms is different from a customer described in “Facebook” or “Salesforce.com” terms. You should also keep in mind that semantic definitions change over time, because a cloud provider changes its capabilities, such as new computing resources, or its focus. Additionally, you may want to change your cloud provider dynamically without disrupting the operation of the enterprise application.
  • Privacy, Risk and Security: How do you articulate your privacy, risk and security concerns? How do you enforce them? While there are already technologies and standards for this, the cloud setting imposes new problems. For example, if you update encrypted data regularly, the cloud provider may be able to infer parts or all of your data from the differences. Furthermore, it may maliciously change it. Finally, the market is fragmented without an integrated solution.
  • Social Network Challenge: Similar to the semantic challenge, the problem of semantically describing social data and doing efficient analysis over several different social networks exists. Users may also change their privacy preferences arbitrarily, making reliable analytics difficult. Additionally, your whole company organizational structure and the (in-)official networks within your company are already exposed in social business networks, such as LinkedIn or Xing. This blurs the borders of your enterprise further, and it has to adapt by integrating social networks into its business applications. For instance, your organizational hierarchy, informal networks or your company’s address book probably already exist partly in social networks.
  • Internet of Things: The Internet of Things consists of sensors and actuators delivering data or executing actions in the real world, supported by your business applications and processes. Different platforms exist to source real-world data or schedule actions in the real world using actuators. The API management challenge exists here as well, but it goes even further: you create dynamic semantic concepts and relate your Internet of Things data to them. For example, you have attached an RFID tag and a temperature sensor to your parcels. Their data needs to be added to the information about your parcel in the ERP system. Besides the semantic concept “parcel” you also have that of a “truck” transporting your “parcel” to a destination, i.e. you have additional location information. Furthermore, the parcel may be stored temporarily in a “warehouse”. Different business applications and processes may need to know where the parcel is. They do not query the sensor data (e.g. “give me data from tempsen084nl_98484”), but rather formulate a query like “list all parcels in warehouses with a temperature above 0 °C” or “list all parcels in transit”. Hence, Internet of Things data needs to be dynamically linked with business concepts used in different clouds (see the sketch after this list). This is particularly challenging for SaaS applications, which may have different conceptualizations of the same thing.
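As a minimal sketch of the last point (all identifiers and values are invented, except the sensor ID reused from the example above), linking raw sensor readings to the business concept “parcel” lets applications ask business-level questions instead of querying sensor IDs:

    # Raw readings keyed by technical sensor IDs.
    sensor_readings = {
        "tempsen084nl_98484": -2.5,
        "tempsen091de_11200": 4.0,
    }

    # Business-level model linking parcels to sensors and locations.
    parcels = [
        {"id": "P-1001", "sensor": "tempsen084nl_98484", "location": "warehouse"},
        {"id": "P-1002", "sensor": "tempsen091de_11200", "location": "in transit"},
    ]

    def parcels_in_warehouses_above(threshold_celsius):
        # Answers “list all parcels in warehouses with a temperature above X”
        # without exposing sensor IDs to the calling business application.
        return [
            p["id"] for p in parcels
            if p["location"] == "warehouse"
            and sensor_readings.get(p["sensor"], float("-inf")) > threshold_celsius
        ]

    print(parcels_in_warehouses_above(0))   # -> [] (the warehouse parcel is at -2.5 °C)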

Enterprise Architecture for Inter-Cloud Applications

You may wonder how you can integrate the above scenario into your application landscape at all, and why you should do it. The basic promise of cloud computing is that it scales according to your needs and that you can outsource infrastructure to people who have the knowledge and capabilities to run it. Particularly small and medium-sized enterprises benefit from this and from the cost advantage. It is not uncommon that modern startups start their IT in the cloud (e.g. FourSquare).

However, large corporations can also benefit from the cloud, e.g. as “neutral” ground for a complex supply chain with many partners, or to ramp up new innovative business models where the outcome is uncertain.

Be aware that in order to offer a solution based on the cloud, you first need a solid maturity of your enterprise architecture. Without it you are doomed to fail, because you cannot do proper risk and security analysis or scaling, and you will not benefit from cost reductions or innovation.

I propose in the following figure an updated model of the enterprise architecture with new components for managing cloud-based applications. The underlying assumption is that you have an enterprise architecture, more particularly a semantic model of business objects and concepts.

[Figure: Updated inter-cloud enterprise architecture]

  • Public/Private Border Gateway: This gateway is responsible for managing the transition between your private cloud and different public clouds. It may also deploy agents on each cloud to enable secure direct communication between different cloud platforms without the necessity to go through your own infrastructure. You might have more fine-granular gateways, such as private, closest supplier and public. A similar idea came to me a few years ago when I was working on inter-organizational crisis response information systems. The gateway does not only work on the lower network level, but also on the level of business processes and objects. It is business-driven and, depending on business processes as well as rules, it decides where the borders should be set dynamically. This may also mean that different business processes have access to different things in the Internet of Things.
  • Semantic Matcher: The semantic matcher is responsible for translating business concepts from and to different technical representations of business objects in different cloud platforms (a toy sketch follows after this list). This can be simple transformations of mismatched data types, but also enrichment of business objects from different sources. This goes well beyond current technical standards, such as EDI or ebXML, which I see as a starting point. Semantic matching is done automatically – there is no need for creating time-consuming manual mappings. Furthermore, the semantic matcher enhances business objects with Internet of Things information, so that business applications can query or trigger them on the business level as described before. The question here is how you can keep people in control of this (see Monitor) and leverage semantic information.
  • API Manager: Cloud API management is the topic of the coming years. Besides the semantic challenge, this component provides all the functionality necessary to bill, secure and publish your APIs. It keeps track of who is using your API and what impact changes to it may have. Furthermore, it supports you in composing new business software distributed over several cloud platforms using different APIs that are subject to continuous change. The API Manager will also have a registry of APIs with reputation and quality-of-service measures. We now see a huge variety of different APIs by different service providers (cf. ProgrammableWeb). However, the scientific community and companies have not yet picked up the inherent challenges, such as the aforementioned semantic matching, monitoring of APIs, API change management and alternative API compositions. While there exists some work in the web service community, it has not yet been extended to the full Internet dimension as described in the scenario here. Additionally, it is unclear how they integrate the Internet of Things paradigm.
  • Monitor: Monitoring is of key importance in this inter-cloud setting. Different cloud platforms offer different and possibly very limited means for monitoring. A key challenge here will be to consolidate the monitoring data and provide an adequate visual representation to do risk analysis and select alternative deployment strategies on the aggregated business process level. For instance, by leveraging semantic integration we can schedule requests to semantically similar cloud and business resources. Particularly in the Internet of Things setting, we may observe unpredictable delays, which lead to delayed execution of real-world activities, e.g. a robot is notified only after 15 minutes that a parcel fell off the shelf.
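To make the semantic matcher idea a bit more tangible, here is a toy sketch that maps a “customer” between two platform-specific representations via a shared business concept; all field names are invented, and a real matcher would of course derive such mappings automatically rather than from hand-written tables.

    # Platform-specific field names mapped onto a shared business concept.
    SALES_PLATFORM_TO_CONCEPT = {"AccountName": "name", "AccountEmail": "email"}
    SOCIAL_PLATFORM_TO_CONCEPT = {"screen_name": "name", "contact_mail": "email"}

    def to_business_concept(record, mapping):
        # Translate a platform record into the neutral business representation.
        return {concept: record[field] for field, concept in mapping.items() if field in record}

    def to_platform(concept, mapping):
        # Translate the neutral representation back into a platform record.
        reverse = {v: k for k, v in mapping.items()}
        return {reverse[key]: value for key, value in concept.items() if key in reverse}

    customer = to_business_concept(
        {"AccountName": "ACME Corp", "AccountEmail": "sales@acme.example"},
        SALES_PLATFORM_TO_CONCEPT,
    )
    social_record = to_platform(customer, SOCIAL_PLATFORM_TO_CONCEPT)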

Developing and Managing Inter-Cloud Business Applications

Based on your enterprise architecture you should ideally employ a model-driven engineering approach, which enables you to automate parts of the software development process. Be aware that this is not easy to do and has often failed in practice – however, I have also seen successful approaches. It is important that you select the right modeling languages, and you may need to implement your own translation tools.

Once you have this infrastructure, you should think about software factories, which are ideal for developing and deploying standardized services for selected platforms. I imagine that in the future we will see small emerging software factories focusing on specific aspects of a cloud platform. For example, you will have a software factory for designing graphical user interfaces using map applications enhanced with selected OData services (e.g. warehouse or plant locations). In fact, I expect soon a market for software factories which extends the idea of very basic crowdsourcing platforms, such as Amazon Mechanical Turk.

Of course, since more and more business applications shift towards private and public clouds, you will introduce new roles in your company, such as the Chief Cloud Officer (CCO). This role is responsible for managing the cloud suppliers, integrating them into your enterprise architecture, and proper controlling as well as risk management.

Technology

The cloud already exists today, and more and more tools emerge to manage it. However, they do not take the complete picture into account. I described several components for which no technologies exist yet, although some go in the right direction, as I will briefly outline.

First of all, you need technology to manage your APIs and provide a single point of management towards your cloud applications. For instance, Apache Deltacloud allows managing different IaaS providers, such as Amazon EC2, IBM SmartCloud or OpenStack.

IBM Research also provides a single point of management API for cloud storage. This goes beyond simple storage and enables fault tolerance and security.

Other providers, such as Software AG, Tibco, IBM or Oracle, provide “API Management” software, which covers only a special case of API management. In fact, they provide software to publish, manage the lifecycle of, monitor, secure and bill your own APIs for the public web. Unfortunately, they do not describe the business processes necessary to establish their technology in your company. Besides that, they do not support B2B interaction very well, focusing on business-to-developer aspects only. Additionally, you find registries for public web APIs, such as ProgrammableWeb or APIHub, which are a first starting point to find APIs. Unfortunately, they do not feature semantic descriptions and thus no semantic matching towards your business objects, which means a lot of laborious manual work for matching them to your application.

There is not much software for managing the borders between private and public clouds, or even allowing more fine-granular borders, such as private, closest partner and public. There is software for visualizing and monitoring these borders, such as the eCloudManager by Fluid Operations, which features semantic integration of different cloud resources. However, it is unclear how you can enforce these borders, how you control them and how you can manage the different borders. Dome9 goes in this direction, but focuses only on security policies for IaaS applications. It understands only data and low-level security, but not security and privacy over business objects. Deployment configuration software, such as Puppet or Chef, is only a first step, since it focuses on deployment but not on operation.

On the monitoring side you will find a lot of software, such as Apache Flume or Tibco HAWK. While these operate more on the lower level of software development, IFTTT enables the execution of business rules over data on several cloud providers with public APIs. Surprisingly, it considers itself at the moment more as an end-user facing company. Additionally, you find approaches in the academic community for monitoring distributed business processes.

Unfortunately, we find little ready-to-use software in the area of the “Internet of Things”. I have worked myself with several R&D prototypes enabling clouds and gateways, but they are not ready for the market. Products have emerged, but they serve only special niches, e.g. Internet-of-Things-enabled point-of-sale shops. They particularly lack a vision of how they can be used in an enterprise-wide application landscape or within a B2B enterprise architecture.

Conclusion

I described in this blog the challenges of inter-cloud business applications. I think that in the near future (3–5 years) all organizations will have some of them. Technically they are already possible and exist to some extent. The risk and costs will be, for many companies, lower than managing everything on their own. Nevertheless, a key requirement is that you have a working enterprise architecture management strategy. Without it you won’t see any benefits. More particularly, from the business side you will need adequate governance strategies for the different clouds and APIs.

We have already seen key technologies emerging, but there is still a lot to do. Despite decades of research on semantic technologies, there exists today no software that can perform automated semantic matching of cloud and business concepts existing in different components of an inter-cloud business application. Furthermore, there are no criteria for selecting a semantic description language for business purposes as broad as described here. Enterprise architecture management tools in this area are only slowly emerging. Monitoring is still fragmented, with many low-level tools but only few high-level business monitoring tools. They cannot answer simple questions such as “if cloud provider A goes down, how fast can I recover my operations and what are the limitations?”. API management is another evolving area which will have a significant impact in the coming years. However, current tools only consider low-level technical aspects and not high-level business concepts.

Finally, you see that a lot of the challenges mentioned in the beginning, such as the social network challenge or the Internet of Things challenge, are simply not yet solved, but large-scale research efforts are on their way. This means further investigation is needed to clarify the relationships between the aforementioned components. Unfortunately, many of the established middleware vendors lack a clear vision of cloud computing and the Internet of Things. Hence, I expect this gap will be filled by startups in this area.