Big Data – What is next? OLTP, OLAP, Predictive Analytics, Sampling and Probabilistic Databases

Big Data has matured over the last years and is becoming more and more a standard technology used in various industries. Starting from established concepts, such as OLAP and OLTP in the context of Big Data, I go beyond them in this blog post and describe what is needed for next-generation applications, such as autonomous cars, industry 4.0 and smart cities. I will cover three new aspects: (1) making the underlying technology of predictive analytics transparent to the data scientist, (2) avoiding Big Data processing of one large-scale dataset by employing sampling and probabilistic data structures and (3) ensuring quality and consistency of predictive analytics using probabilistic databases. Finally, I will talk about how these aspects change the Big Data Lambda architecture and briefly address some technologies covering the three new aspects.

Big Data

Big Data has emerged over the last years as a concept to handle data that requires new data modeling concepts, data structures, algorithms and/or large-scale distributed clusters. There are several reasons for this, such as large data volumes and new analysis models, but also changing requirements in the light of new use cases, such as industry 4.0 and smart cities.

During investigations of these new use cases it quickly became apparent that current technologies, such as relational databases, would not be sufficient to address the new requirements. This was due to inefficient data structures and algorithms for certain analytical questions, but also to the inherent limitations of scaling them.

Hence, Big Data technologies have been developed and are subject to continuous improvement for old and new use cases.

Big Online Transaction Processing (OLTP)

OLTP has been around for a long time and focuses on transaction processing. When the concept of OLTP emerged, it was usually a synonym for simply using relational databases to store various information related to an application – most people forgot that it was about the processing of transactions. Additionally, it was not about technical database transactions, but business transactions, such as ordering products or receiving money. Nevertheless, most relational databases secure business transactions via technical transactions by adhering to the ACID criteria.

Today OLTP remains relevant given its numerous implementations in enterprise systems, such as Enterprise Resource Planning systems, Customer Relationship Management systems or Supply Chain Management systems. Due to the growing complexity of international organisations, these systems tend to hold more and more data and – from a data volume point of view – they generate a lot of it. For instance, a large online vendor can have several exabytes of transaction data. Hence, Big Data happens for OLTP as well, particularly if this data needs to be historized for analytical purposes (see next section).

However, one important difference from other systems is the access pattern: usually, there are a lot of concurrent users, each interested in a small part of the data. For instance, a customer relations agent adds some details about a conversation with a customer, or an order is updated. Hence, you need to be able to find and update a small data set within a much larger data set. Different mechanisms to handle a lot of data for OLTP usage have existed in relational database systems for a long time.

Big Online Analytical Processing (OLAP)

OLAP has been around nearly as long as OLTP, because most analyses have been done on historized transactional data. Due to historization and different analysis needs, the amount of data is significantly higher than in OLTP systems. However, OLAP has a different access pattern: fewer concurrent users, but they are interested in the whole set of data, because they want to generate aggregated statistics over it. Hence, a lot of data is usually transferred into an OLAP system from different source systems and afterwards it is mostly read, often repeatedly.

This led very early to the development of special OLAP databases that store data for multidimensional analysis in cubes to match the aforementioned access pattern. They can be seen as very early NoSQL databases, although they were not described as such at the time, because the term NoSQL appeared only much later.

While data from OLTP systems was originally the primary source for OLAP systems, new sources of data have appeared, such as sensor data or social network graphs. This data goes beyond the capabilities of OLTP or special OLAP databases and requires new approaches.

Going beyond OLTP and OLAP

Aspect 1: Predictive Analytics

Data scientists employing predictive analytics use statistical and machine learning techniques to predict how a situation may evolve in the future. For example, they predict how sales will evolve given existing sales and patterns. Some of these techniques have existed for decades, but only recently have they become truly practical, because more data can be processed with Big Data technologies.

However, current Big Data technologies, such as Hadoop, are not transparent to the end user. This is not really an issue with the Big Data technologies themselves, but with the tools used for accessing and processing the data, such as R, Matlab or SAS.

They require the end user to think about writing distributed analysis algorithms, e.g. via map/reduce programs in R or other languages. The standard library functions for statistics can be included in such distributed programs, but the user still has to think about how to design the distributed program. This is undesirable, because these users are usually not skilled enough to design them optimally. Hence, frustration with respect to performance and effort is likely to occur.
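
To illustrate the burden: even a simple average over a distributed dataset forces the user into map/reduce thinking. The following Python sketch assumes a hypothetical RDD-style API (map/reduce as offered, for example, by Spark); it is an illustration of the problem, not a reference implementation.

    # Python sketch, hypothetical RDD-style API: even a simple average forces
    # the data scientist to think in distributed map/reduce terms.
    def average_mapreduce(distributed_transactions):
        # map: emit (value, 1) pairs on each partition
        partials = distributed_transactions.map(lambda value: (value, 1))
        # reduce: combine partial sums and counts across the cluster
        total, count = partials.reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))
        return total / count

    # What the data scientist would rather write:
    # average = transactions.mean()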

Furthermore, organisations have to think about an integrated repository where they store these programs to enable reuse and consistent analytics. This is particularly difficult, because these programs are usually maintained by business users, who lack proper software management skills.

Unfortunately, it cannot be expected that the situation changes very soon.

Aspect 2: Sampling & Probabilistic Data Structures

Surprisingly often when we deal with Big Data, end users tend to execute queries over the whole set of data, regardless of whether it has 1 million rows or 1 billion rows.

Sampling databases

While it is certainly possible to process a data set of nearly any size with modern Big Data technologies, one should carefully consider whether this is desirable, given the increased costs, time and effort needed.

For instance, if I want to know the average value of all transactions, I can of course calculate it over all of them. However, I could also take a random sample of 5 % of the transactions and know that the average of this sample is correct with an error of +-1 % compared to the total population. For most decision making this is perfectly fine. I have processed only a fraction of the data and can now do further analysis with the saved time and resources. This may even lead to better informed decisions.
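
As a rough illustration (a Python sketch with synthetic data, not tied to any particular sampling database), the following computes the average from a 5 % random sample together with an approximate 95 % confidence interval:

    import random
    import statistics

    def sample_average(transactions, fraction=0.05):
        # Estimate the average from a random sample and report a rough
        # 95 % confidence interval based on the standard error of the mean.
        k = max(2, int(len(transactions) * fraction))
        sample = random.sample(transactions, k)
        mean = statistics.mean(sample)
        stderr = statistics.stdev(sample) / (k ** 0.5)
        return mean, 1.96 * stderr

    # 1 million synthetic transactions, but only 5 % of them are processed
    transactions = [random.gauss(100, 20) for _ in range(1_000_000)]
    estimate, margin = sample_average(transactions)
    print(f"average is approximately {estimate:.2f} +- {margin:.2f}")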

Luckily, there are already technologies allowing this. For example, BlinkDB, which allows – amongst others – the following SQL queries:

  • Calculate the average of the transaction values within 2 seconds with some error: SELECT avg(transactionValue) FROM table WITHIN 2 SECONDS
  • Calculate the average of the transaction values within given error boundaries: SELECT avg(transactionValue) FROM table ERROR 0.1 CONFIDENCE 95.0%

Executed over a large-scale dataset, these queries run much faster than in any Big Data technology that does not employ sampling methods.

This makes particular sense for predictive algorithms, because they have underlying assumptions about statistical errors anyway, which a data scientist can easily combine with the errors introduced by sampling databases.

Bloom filters

Probabilistic data structures, such as Bloom filters or HyperLogLog, aim in the same direction. They are increasingly implemented in traditional SQL databases as well as NoSQL databases.

Bloom filters can tell you if an element is part of a set of elements without browsing through the set, by employing a smart hash structure. This means you can skip trying to access elements on disk that do not exist anyway. For instance, if you want to join two large datasets, you only need to load the data for which there is a corresponding value in the other dataset. This dramatically improves performance, because you load less data from slow storage.

However, Bloom filters can only tell you if an element is definitely not in the set; they can only tell you with a certain probability that a given element is in the set. For the given use cases of Bloom filters this is no problem.
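
The following is a minimal Python sketch of the idea (the size and hash count are arbitrary and untuned); real implementations inside databases are far more optimized:

    import hashlib

    class BloomFilter:
        # Minimal Bloom filter sketch: false positives are possible, false negatives are not.
        def __init__(self, size=1_000_000, hash_count=5):
            self.size = size
            self.hash_count = hash_count
            self.bits = bytearray(size)

        def _positions(self, item):
            for i in range(self.hash_count):
                digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
                yield int(digest, 16) % self.size

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] = 1

        def might_contain(self, item):
            # False means "definitely not in the set"; True means "probably in the set"
            return all(self.bits[pos] for pos in self._positions(item))

    # Join pre-filter: only load rows whose key might exist in the other dataset
    keys = BloomFilter()
    for key in ["order-1", "order-2"]:
        keys.add(key)
    print(keys.might_contain("order-1"), keys.might_contain("order-999"))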

HyperLogLog

HyperLogLog structures allow you to count the number of unique elements in a set without storing the whole set.

For example, let us assume you want to count the unique listeners of a song on the web.

If you use traditional SQL technologies, then for 5 million unique listeners of one song (not uncommon on Spotify) you need to store around 80 MB of data. It also takes several seconds per web site request just to count the unique listeners or to insert a new one.

By using HyperLogLog you need to store at most a few kilobytes of information (usually much less) and can read and update the count of unique listeners almost instantaneously.

Usually these results are correct within a small, configurable error margin, such as 0.12%.
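
For illustration, here is a simplified HyperLogLog sketch in Python. It omits the small- and large-range corrections of the full algorithm and the exact error margin depends on the chosen precision, so treat it as a teaching aid rather than a production implementation:

    import hashlib

    class HyperLogLog:
        # Simplified HyperLogLog sketch (no small/large-range corrections).
        def __init__(self, p=14):
            self.p = p                   # 2^p registers, conceptually ~12 KB at p=14
            self.m = 1 << p
            self.registers = [0] * self.m
            self.alpha = 0.7213 / (1 + 1.079 / self.m)   # bias correction for large m

        def add(self, item):
            x = int(hashlib.sha256(str(item).encode()).hexdigest(), 16) & ((1 << 64) - 1)
            idx = x >> (64 - self.p)                      # first p bits pick a register
            rest = x & ((1 << (64 - self.p)) - 1)         # remaining bits
            rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
            self.registers[idx] = max(self.registers[idx], rank)

        def count(self):
            # harmonic mean of the per-register estimates
            return self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)

    hll = HyperLogLog()
    for listener_id in range(5_000_000):
        hll.add(f"listener-{listener_id}")
    print(int(hll.count()))   # close to 5,000,000 within a small relative error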

Find some calculations in my lecture materials.

Aspect 3: Probabilistic Databases

Your company has deployed a Big Data technology and uses it productively. Data scientists generate new statistical models on a daily basis all over the world based on your enormous data reservoir. However, all these models have only a very selective view on the real world and different data scientists use different methods and assumptions to do the same analysis, i.e. they have a different statistical view on the same object in the real world.

This is a big issue: your organization has an inconsistent view of the market, because different data scientists may use different underlying data, assumptions and probabilities. This leads to a loss of revenue, because you may define contradictory strategies. For example, data scientist A may do a regression analysis on ice cream sales based on surveys in Northern France with a population of 1000. Data scientist B may independently do a regression analysis on ice cream sales in Southern Germany with a population of 100. Hence, both may come up with different predictions for ice cream sales in Central Europe, because they have different views of the world.

This is not only a collaboration issue: current Big Data technologies do not support a consistent statistical view of the world on top of your data.

This is where probabilistic databases will play a key role. They provide an integrated statistical view of the world and can be queried efficiently by employing new techniques, while still supporting SQL-style queries. For example, one can query the location of a truck from the database. However, instead of just one location for one truck you may get several locations with different probabilities associated with them. Similarly, you may join all trucks that are, with a certain probability, close to goods at a certain warehouse.
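
The following toy Python example only illustrates the idea of tuples carrying probabilities and query results inheriting them; it is a made-up miniature, not how an actual probabilistic database engine works internally:

    # Hypothetical uncertain relation: each tuple carries a probability.
    truck_locations = [
        # (truck_id, location, probability)
        ("truck-1", "warehouse-north", 0.7),
        ("truck-1", "warehouse-south", 0.3),
        ("truck-2", "warehouse-north", 0.4),
    ]

    goods = [
        # (goods_id, location) -- certain tuples
        ("goods-42", "warehouse-north"),
    ]

    # "Join" trucks with goods at the same location; results keep their probability
    close_to_goods = [
        (truck, item, p)
        for truck, loc, p in truck_locations
        for item, goods_loc in goods
        if loc == goods_loc
    ]
    print(close_to_goods)
    # [('truck-1', 'goods-42', 0.7), ('truck-2', 'goods-42', 0.4)]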

Current technologies are based on academic prototypes, such as MayBMS by Cornell University, BayesStore by the University of California, Berkeley, or Trio by Stanford University.

However, these technologies still lack commercial maturity and more research is needed.

Many organizations are not ready for this next big step in Big Data, and it is expected to take at least 5-10 years until the first ones are ready to employ such technology.

Context of the Lambda architecture

You may wonder how this all fits into the Big Data Lambda architecture and I will briefly explain it to you here.

Aspect 1: Integration of analytics tools in the cluster

Aspect 1, the integration of analytics tools with your cluster, has not really been the focus of the Lambda architecture. In fact, it is missing, although it has significant architectural consequences, since it affects the resources used, reusability (cf. also here) and security.

Aspect 2: Sampling databases and probabilistic data structures

Sampling databases and probabilistic data structures are most suitable for the speed and serving layers. They allow fast processing of data while being only as accurate as needed. If one is satisfied with their accuracy, which can be expected for most business cases after thoughtful reconsideration, then one won't even need a batch layer anymore.

Aspect 3: Probabilistic databases

Probabilistic databases will initially be part of the serving layer, because this is the layer data scientists directly interact with in most cases. Later, however, they will become an integral part of all layers. Especially the deployment of quantum computing, as we already see it in big Internet and high-tech companies, will drive this.

Conclusion

I presented in this blog post important aspects for the future of Big Data technologies. This covered the near-term future, medium-term future and long-term future.

In the near-term future we will see a better and better integration of analytics tools into Big Data technology, enabling transparent computation of sophisticated analytics models over a Big Data cluster. In the medium-term future we will see more and more usage of sampling databases and probabilistic data structures to avoid unnecessary processing of large data and thus save costs and resources. In the long-term future we will see companies build up an integrated statistical view of the world to avoid inconsistent and duplicated analyses. This enables proper strategy definition and execution for 21st-century information organizations in traditional as well as new industries.

DevOps for your business? – About Uniting Development and Operations

In recent years, DevOps has become a term for a new paradigm of integrating and managing development as well as operations of software within and across organizations. In this blog entry I will describe what DevOps is and relate it to existing methodologies, such as agile development, and organizational structures. Basically, DevOps is a broad term that summarizes a set of best practices supported by a high degree of automation of all development and operational processes around the software delivery process, using advanced public and/or private cloud technologies. I will conclude with a brief summary of the impact of DevOps on Big Data applications.

The Situation

Many companies have separate development and operations departments. Both usually work under high pressure to deliver and operate applications for business processes of strategic importance.

In recent years it has become clear that the development department has to work closer with internal and external customers to develop the right solutions that the customer will accept. Agile methodologies have been advocated to manage problems given the uncertainty of the business environment in which the future solutions will be deployed and/or the lack of understanding of customer or IT requirements. These agile methodologies broke with the paradigm of a clearly defined long-term process (e.g. the waterfall model) where the customer is only involved in the beginning and – often too late – at the end, so that mistakes were costly to correct.

At the same time, the operations department faced similar challenges. Given the uncertainties of the business environment, customers asked for a lot of new services and IT infrastructure changes, but there was little understanding on the customer side of their effects, which created high pressure on the operations department, which usually has a low budget to deal with these changes. Hence, a clearly structured and governed process was needed to handle customer requests. Thus popular IT Service Management frameworks, such as the Information Technology Infrastructure Library (ITIL), were born and used globally. Since critical business applications are operated, the operations department tends to be much more risk-averse and tries to avoid unknown technologies.

It can be observed that both departments needed and implemented different approaches to deal with their customers. This has led to the problem that both departments were divided not only from an organizational perspective, but also from a cultural one. For instance, the development department developed technology with little consultation of the operations department and after some time just threw a complex piece of software over the "fence" and told the operations department to operate it. However, since they did not collaborate much, there were a lot of (extremely) costly problems during software delivery and operations. For example, the different environments, such as development or test environments, did not match the required infrastructure. The newest updates needed to fulfill requirements could not be installed fast enough, so development was delayed. Operational staff required training, and this was not considered. Many more problems occurred, because there was a strong interdependency between both departments due to the software delivery process.

Recently, DevOps has been pushed to handle this lack of collaboration between the two departments with respect to the software delivery process. Additionally, it addresses the challenges of new technologies, such as the cloud and software-defined infrastructures, which require strong development skills in both the development and the operations department.

DevOps a New Paradigm?

DevOps is not a clearly defined term and there is no reference model, such as ITIL, behind it. However, we can identify some common aspects that can be found in many different papers on the topic (cf. Gartner, this blog entry or this article):

  • A clearly defined as well as highly automated continuous software delivery pipeline across different software environments and the development as well as operations departments
    • Describe involved stakeholders with roles and responsibilities.
    • Defined and measured key performance indicators, e.g. “after committing some software source code in the development environment it is fully tested as well as deployed in the production environment and can be used by business processes within one hour”.
    • Clearly defined environments, e.g. development environment, test environment, acceptance environment and production environment. This implies an unambiguous and idempotent description (i.e. a set of scripts) of how they are created, operated, destroyed and what virtual resources (computation, memory and network) they require. This means you have to fully leverage private and public cloud technologies. Fully virtualized environments can be created by anyone using just a graphical browser interface (see next section).
    • Avoid reconfiguration of software for different environments. Environments should ideally be the same (e.g. same network addresses and same hardware).
    • Continuous integration of software components delivered by different teams.
    • Fully automated deployment procedure in different environments. Manual deployment requiring human intervention is forbidden.
    • Fully automated regression, integration and acceptance testing. Manual test activities should be reduced to nearly zero, because rapidly changing economic environments require rapid deployment of new solutions.
    • Test-driven development: develop the tests for parts of the software before you develop the software itself (see the minimal sketch after this list).
    • Deploy software in production incrementally, in small chunks, and often (e.g. each week), as is required when using Service-oriented Architectures (SOA). Then you avoid big mistakes and changes cannot have a catastrophic impact if they do not work. If you continue the traditional way of deploying large chunks of software with a lot of changes every few months, you will continue to pay the price of production outages or obstacles to your business processes.
    • Have a consistent fully automated monitoring approach for your software environments. Leverage machine learning techniques to predict software problems before they happen.
    • Allow each stakeholder (development, test, operations and customer) to use a current build of the software deployed in a virtual environment by just using a browser interface
  • Integration of people from the operations department in the agile software development process. Operations has development skills and development has operation skills.
  • Integration of people from the development department in strategic IT operations processes
  • Clearly defined governance structures – have a sound program and project management
  • Integration of best practices, such as ITIL or agile methodologies
  • Fehlerkultur (error culture): do not blame mistakes on each other, but solve them collaboratively. Finding out whose fault it was costs time and money and is usually not important – just solve errors together as they occur. Be positive about errors and confident about handling them. Make error management part of your daily life. Each error is an opportunity for all involved parties to learn.
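
Here is the test-driven development sketch referenced in the list above, in Python with hypothetical function and test names: the tests are written first and fail, then just enough code is implemented to make them pass (e.g. run with pytest).

    # Test-driven development in miniature (hypothetical example).
    def apply_discount(price: float, percent: float) -> float:
        # Implementation written second, just enough to satisfy the tests below.
        return max(price * (1 - percent / 100.0), 0.0)

    def test_ten_percent_discount():
        assert apply_discount(price=100.0, percent=10) == 90.0

    def test_never_returns_negative_prices():
        assert apply_discount(price=5.0, percent=200) == 0.0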

DevOps is NOT about flat organizations. Flat organizations only make sense if your organization has just one business area and your employee structure is relatively homogeneous. This does not imply that a strict hierarchical model is better, but you need to carefully combine control and agility.

Is DevOps right for your Organization?

Every organization should leverage and adapt DevOps practices, because they lead to significant benefits for all involved stakeholders (cf. also here). It makes sense for startups as well as large corporations. However, there are exceptions, such as research & development of disruptive new technologies. When researching and developing completely new technologies, where the impact on the infrastructure is completely unclear or where you just want to explore very risky software that you are not certain you will ever use, the overhead of a DevOps organization is too high. From my experience, such work is done by small teams, which are – on purpose – separated from the others to develop a new way of thinking and to use novel technologies. They often have to redo everything from scratch, potentially using very diverse software technologies in a short time frame. Nevertheless, they may still employ some DevOps aspects, such as full computing, memory and network virtualization provided by cloud technologies.

What tools can I use?

You will find a lot of tools for enabling DevOps in your organization. Indeed, tools are an important aspect, because you need a high degree of automation of previously manual activities and you will manage your environments using cloud virtualization technologies.

However, do not forget that it is also about organization and culture. Simply having the tools won’t help you much.

I present here only a few of the many tools that can be used.

Cloud

As I mentioned before, you want to create and manage software environments fast and in a highly automated fashion. Each environment should ideally be identical to avoid errors due to reconfiguration of software. This means they should have the same underlying software, configuration, (virtualized) hardware and the same network configuration (including the same IP addresses). Large-scale public cloud providers, such as Amazon EC2 or Google Compute, already have technologies that make this feasible. If you prefer a private cloud, then you can use OpenStack, which is a Linux-based cloud computing distribution offering similar functionality to Amazon EC2. It offers a web interface for creating new environments in a browser.

Additionally, you can use tools, such as Vagrant, to automate the creation and management of software environments.
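
As a small illustration of such automation (assuming a Vagrantfile already exists in the given directory), one could drive the Vagrant command line from a Python script; the directory name is hypothetical:

    import subprocess

    def recreate_environment(env_dir):
        # Destroy and recreate a Vagrant-managed environment from its Vagrantfile,
        # so every developer and tester gets an identical, reproducible machine.
        subprocess.run(["vagrant", "destroy", "-f"], cwd=env_dir, check=True)
        subprocess.run(["vagrant", "up"], cwd=env_dir, check=True)

    recreate_environment("./test-environment")  # hypothetical directory containing a Vagrantfile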

Continuous Integration

Continuous integration supports the integration of different software components from different teams. It automatically builds and tests complete applications every time a new piece of source code is committed to the version management system, such as Git. Popular tools are Jenkins or Travis.

Besides normal unit and integration testing, one should look at acceptance testing tools, such as Cucumber. There, acceptance tests can be described by business users in (nearly) natural language. This makes them repeatable and reliable.

Continuous Deployment

Deployment should be automated and repeatable, independent of the environment. This means no manual configuration or manual deployment steps. You have to develop scripts that ensure that the target system is in the desired state after a deployment – independent of the state it is currently in. Popular tools, such as Puppet, Chef or Vagrant, help you with this task.
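
To make the idea of an idempotent deployment step concrete, here is a Python sketch (with hypothetical paths) that converges the target to the desired state regardless of its current state; running it a second time changes nothing:

    import hashlib
    import shutil
    from pathlib import Path

    def ensure_deployed(artifact: Path, target: Path) -> str:
        # Idempotent step: copy the artifact only if the target is missing or differs,
        # so the end state is the same no matter what state we start from.
        def digest(path: Path) -> str:
            return hashlib.sha256(path.read_bytes()).hexdigest()

        if target.exists() and digest(target) == digest(artifact):
            return "unchanged"
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(artifact, target)
        return "deployed"

    # Running it twice is safe: the second run reports "unchanged"
    # ensure_deployed(Path("build/app.jar"), Path("/opt/myapp/app.jar"))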

Recently, web interfaces have become popular, so you can do deployment of complex applications just using your browser (cf. Ubuntu Juju).

Monitoring

Basically, your monitoring infrastructure has to collect all the messages from the various applications deployed in an environment and be able to take action on them. Usually there are a lot of messages in various text formats. This means you have to find the right tools to collect and analyze a large amount of data.

You should not only present results (e.g. critical conditions), but also be able to handle them automatically (e.g. repair broken applications). Amazon OpsWorks is one product that can do this for applications deployed on the Amazon EC2 cloud.
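
A toy example of the "collect and act" idea in Python (the error patterns, log lines and service name are made up; a real setup would use a proper monitoring stack and restart mechanism):

    import re
    import subprocess

    CRITICAL = re.compile(r"OutOfMemoryError|Connection refused")  # example patterns

    def watch_and_repair(log_lines, service_name="myapp"):
        # Scan incoming log messages and restart the (hypothetical) service
        # when a critical condition is detected.
        for line in log_lines:
            if CRITICAL.search(line):
                print(f"critical condition detected: {line.strip()}")
                subprocess.run(["systemctl", "restart", service_name], check=False)

    watch_and_repair([
        "2015-01-01 12:00:01 INFO request served in 35ms",
        "2015-01-01 12:00:02 ERROR java.lang.OutOfMemoryError: Java heap space",
    ])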

An interesting application is the Netflix ChaosMonkey. It is an excellent example of a tool supporting the Fehlerkultur (error culture) mentioned before. Basically, it switches off machines on which your software is deployed at random, but only within a certain time frame (e.g. from 9 to 17 o'clock). This means errors will be detected more easily and can be handled while all employees of the company are there. Hence, you will have no or fewer errors on Sunday morning, when it is difficult to get the right people to work on a problem. It should be noted that Netflix, a media streaming service, requires strict quality of service and cannot allow, for example, streaming of media to the customer to be interrupted or disturbed. Nevertheless, the ChaosMonkey switches off machines in their production environment.
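
The principle can be sketched in a few lines of Python (the instance IDs and the terminate function are stand-ins; a real implementation would call a cloud API and respect many more constraints):

    import datetime
    import random

    def chaos_step(instances, terminate, start_hour=9, end_hour=17):
        # During working hours, pick one random instance and switch it off,
        # so failures happen while people are around to learn from them.
        now = datetime.datetime.now()
        if start_hour <= now.hour < end_hour and instances:
            victim = random.choice(instances)
            terminate(victim)

    # Example with a stand-in terminate function
    chaos_step(["i-001", "i-002", "i-003"], terminate=lambda i: print(f"terminating {i}"))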

The Netflix ConformityMonkey and JanitorMonkey check whether the state of the systems is still acceptable or degraded. If it is not acceptable anymore, the instances are automatically switched off and rebooted, so that an acceptable state is available all the time. Furthermore, they switch off unused instances to reduce costs.

Recently, predictive software maintenance has become a hot topic, where you predict, given the environmental conditions, when your application will fail or slow down – before the event has taken place.

DevOps and Big Data

Big Data is about processing a large amount of data of different nature (e.g. structured or unstructured) in acceptable time. There is a trend to leverage Big Data to enable new, sophisticated statistical models of complex real-world dependencies. For instance, you want to predict what customers will buy next or when, for example, company cars are likely to fail – even before the event happens.

Having DevOps is mandatory for Big Data. You will have a limited set of resources, you have very business-critical applications, and change will come often, e.g. new prediction models. You may also need to do experiments in production environments. For instance, you may want to evaluate new machine learning technologies involving real live business data.

Furthermore, another challenge, of which many people are not yet aware, is that business users will actually develop code and deploy it in production. For example, marketing specialists will develop and test new models using R. Similarly, hardware engineers will develop and deploy prediction algorithms for the reliability of hardware assets. This code will become part of existing critical business applications.

For example, assume you own a bus company. Your mechanics have a good understanding of the probability distributions for hardware errors, gas consumption etc. They will develop a model in a statistical programming language, such as R, to predict failure and maintenance needs of your buses based on sensor data as well as maintenance reports. The output of this is used by the scheduling application developed by your development department to avoid delays in bus service delivery. Similarly, a statistical marketing program developed by your marketing department will predict how many customers to expect and when, based on historical data, but also including current events (e.g. a soccer match, news or tweets). This can also be included in the scheduling application. It is not very realistic that the marketing department will ask the development department to implement a statistical model proposed by them – the overhead is simply too large for experimental Big Data applications.

In fact, this already happened a long time ago in the finance industry, where business analysts created macros in Excel/Access or other office software to support the calculation of their complex financial statistical models for creating more and more complex financial products. The problem there was that you have a variety of software somewhere in the business with critical impact, no versioning or backup, and unknown dependencies to other software or data. This is obviously bad and can even lead to bankruptcy.

How this can be handled has not yet been the subject of extensive research or experiments.

Revisiting Risk Management for Business Process Management/Workflow Systems

Risk management is a very important topic in the world after the financial crisis and dozens of recent natural hazards. I see risk management as a broad concept that does not only apply to the financial domain, but to any process of any organization. Two years ago, I worked briefly on the topic of integrating risk management into information systems, more particularly on combining risk management and business process management/workflow systems. The goal of this blog entry is to revisit this topic and see the progress made by others.

What is risk management?

In order to answer the question what risk management is, we need to define what risk is. There are a variety of definitions related to risks from a business, public safety, military, medical or personal perspective. We are interested here in business risks. ISO 31000 defines risk as the “effect of uncertainty on objectives”. It can be positive or negative.

Although business risks are not only related to risks associated with financial investment products, we assume that the impact of a risk can somehow be calculated and that quantitative probabilities for risks can be given. Most project managers and others know that impact and probability cannot always be provided as numbers. For instance, they may define that the impact of losing customer X because of product feature Y is high, with a low probability. Supporting these kinds of scenarios is another problem, not addressed in this blog.

Risk can be calculated by the following basic equation:

Impact * Probability = Risk

One example would be that the impact of losing customer X because of product feature Y is 100.000 Euro with a probability of 10%. The risk can be calculated as follows:

100.000 Euro * 10% = 10.000 Euro

Obviously, we can do more sophisticated calculations by using more advanced statistical methods.

Risk management is about identifying, assessing, monitoring and mitigating risks. In order to identify and assess risks you need to talk with the stakeholders of business processes (suppliers, customers, employees etc.). Monitoring risks can be done on an automated or manual basis. Mitigating risks is about reducing the impact and/or probability of negative risks as well as increasing the impact and/or probability of positive risks. Various measures can be used, such as transferring the risk by insuring against it. It is important that risks have owners who ensure that proper risk management is in place for a given risk. Otherwise we may manage risks that do not exist anymore or we may not address current risks.

We should not only consider big risks that are very unlikely, but also think about small risks (e.g. wrong data entries). We face many small risks in an enterprise, and summing them all up will probably lead to a larger total risk than a single big one.
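
A small back-of-the-envelope calculation in Python illustrates this point; the numbers are invented purely for illustration:

    # Expected-loss view of risk: Impact * Probability, summed over many small risks
    small_risks = [
        # (impact in Euro, probability)
        (500.0, 0.20),    # e.g. a wrong data entry that must be corrected manually
        (200.0, 0.50),
        (1_000.0, 0.10),
    ] * 100               # an enterprise faces many of these small risks

    big_risk = (1_000_000.0, 0.01)   # one large but unlikely risk

    expected_small = sum(impact * prob for impact, prob in small_risks)
    expected_big = big_risk[0] * big_risk[1]
    print(expected_small, expected_big)   # 30000.0 vs 10000.0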

What are Business Process Management/Workflow systems?

Business Process Management Systems (BPMS), also known as workflow systems, allow modeling a process composed of (alternative) sequences of (human or automated) activities, their resource needs (mostly human resources) and the data that is processed. Furthermore, they can coordinate the process by keeping track of its execution state, starting new activities/applications, assigning resources to them and providing the required data.

Most of the business applications you use today implicitly contain a business process or business logic. However, if the process is described in a workflow system, then it is made explicit and transparent. In addition, it is much easier to monitor, improve or change it.

Why combining them?

Our initial motivation for combining risk management and business process management systems was the following:

  • Each business process has risks associated with it
  • Managing these risks more effectively will reduce the impact or probability of negative risks or can increase the impact or probability of positive risks. Thus, more business value is created
  • A workflow system makes a process explicit and this allows (1) easier identification and assessment of risks (2) monitoring and initiating mitigation processes related to risks

Our early approach

A longer description of our approach can be found here. At the time we did research on this topic, not many approaches aiming at the integration of risk management and workflow systems were known. Only a few fragmented ones existed, addressing mainly the modeling of risks in business processes without further functionality, such as monitoring and managing risks during process execution controlled by a workflow system. Our approach can thus be seen as one of the first attempts to address this problem.

The blog entry containing our approach raised a lot of questions. Some were related to the integration of existing (SAP) technologies where risks are described and identified. In fact, this is an important topic, because we do not want to have several independent risk databases, possibly with different impacts and probabilities for the same risks. Others raised concerns about defining the probabilities, because we sketched several possibilities, but without any answer on how to select the correct method for calculating probabilities and impact. This also depends on your enterprise, the methods you use and your risk culture. Clearly, normative or well-tested methods by experts would be helpful. I think the area of reference models for various industries could help to address this problem.

Furthermore, the lack of interest by industry research groups, such as Gartner or Forrester, has been criticized. This has not changed much, but at least Gartner sees a need in their report "Hype Cycle for Business Process Management 2011", where they describe the integration of metadata repositories for compliance, risk and governance. Nevertheless, integrating risk management and business process management systems is not yet on the radar of Gartner's BPM hype cycle.

Finally, others criticized that models in general are not always perfect and may be wrong. This is true, but I believe with continuous review and open exchanges between risk experts in various departments of the company this problem can be addressed.

New approaches

Unfortunately, the situation has not changed much in the last months. We could not continue to work on the topic due to new priorities. However, recently Conforti et al. proposed an approach for integrating risk management and workflow systems. In fact, they describe many of the ideas we provided in more detail. Interestingly, they were looking for Pareto-optimal solutions, i.e. solutions where no party can increase its outcome without reducing the outcome of another one. This may not always be beneficial for the goals of the company, because it only considers the individual goals of the stakeholders.

Nevertheless, they describe in another paper how risk probabilities can be calculated using real-time sensor information. This seems to partly address some criticism of our approach, namely that models can change based on changes in the real world, and this needs to be taken into account by the information systems reasoning on the models.

Conclusions

I believe that the topic is still important and should be explored in more detail. This is why I decided to create this blog. Luckily, there has been at least some progress. I also see further extensions towards new approaches for managing dynamic, less predictable business processes. For example, the case management process modeling standard by the OMG could address the problem of risk management in this context. Furthermore, we need to think about how we can integrate qualitative risk assessment into the solution and how we keep the important stakeholders as well as risk owners informed.

I am looking forward to your comments!