Master Data Management and the Internet of Things

Master Data Management (MDM) has matured and grown significantly over recent years. The main motivation for master data management is to have a complete and accurate view of the master data objects in your organization. Master data objects describe key assets, such as machines or customers, that generate value for your organization. Hence, MDM fosters processes that improve the quality of master data, so that the key assets are used properly to generate value. However, most of these processes still require manual intervention by humans. Furthermore, master data is usually not up to date because it is improved and tracked manually. In particular, the current state of a master data object is often entered into the system only after hours or even days. This makes it difficult to act upon this state or to predict changes to it. Clearly, this can be a disadvantage compared to competitors who leverage real-time information when using their master data. For instance, one cannot predict that a customer might move to another city in the near future, or that the planes one operates will require maintenance at an inconvenient time, delaying a scheduled flight.

The vision of the Internet of Things (IoT) is to connect things that sense and act upon their environments to the Internet, where they exchange data about their state and their environment. IoT enables a real-time 360° view of your key assets and their interaction with the environment. Current studies estimate that by 2020 several billion things will be connected via the Internet.

Hence, it makes sense to combine MDM and IoT to improve the business processes that act upon master data. These processes will not only benefit from an up-to-date state of master data, but can also use this data to enable predictive analytics applications, such as predictive maintenance or customer retention.

I will describe both concepts in more detail and show how they can be integrated. Afterwards, I will discuss current challenges with respect to architectures, data models and predictive analytics applications. Finally, I will provide insights into what next-generation MDM systems look like.

What is Master Data Management?

Master Data is data about the key assets in a company. Examples are customers, machines, products, suppliers, financial assets or business partners.

One should differentiate master data from transactional data, which always refers to master data. Master data objects can exist on their own and do not necessarily need to refer to other data, i.e. they make sense without any relations. For instance, a customer can exist without other customers, although the customer usually has (social) relations to other customers. In contrast, a transaction for buying a product cannot exist without a customer and a product.
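The distinction can be sketched as a small data model. This is only an illustration: the class names, field names and the purchase date are assumptions and are not taken from any particular MDM product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Customer:
    """Master data object: meaningful on its own."""
    customer_id: str
    name: str


@dataclass(frozen=True)
class Product:
    """Master data object: meaningful on its own."""
    product_id: str
    title: str


@dataclass(frozen=True)
class Purchase:
    """Transactional data: cannot be created without master data objects."""
    customer: Customer
    product: Product
    purchased_on: date


# Hypothetical example values.
max_m = Customer("C-1", "Max Mustermann")
phone = Product("P-1", "Smartphone")
purchase = Purchase(max_m, phone, date(2015, 1, 15))
```

A Customer or Product can be instantiated in isolation, whereas a Purchase requires both, which mirrors the dependency described above.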

One of the key issues for MDM is the integration of various systems containing master data. Usually this data is inconsistent and incomplete for various reasons. This has a significant impact on the business processes using master data, which leads to significant costs and wasted resources.

Hence, master data management solutions provide various means to improve master data quality automatically and manually. For instance, they offer rules engines to validate data quality and workflow engines to assign tasks to data stewards who fix incorrect data. Currently, most efforts to improve master data quality are still manual.
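To make the interplay of validation rules and steward tasks more concrete, here is a minimal sketch. The two rules, the record layout and the task format are assumptions chosen for illustration; real MDM products ship far richer rule and workflow engines.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Rule = Tuple[str, Callable[[Record], bool]]

# Hypothetical quality rules: each pairs a description with a check.
RULES: List[Rule] = [
    ("name must not be empty", lambda r: bool(r.get("name", "").strip())),
    ("email must contain '@'", lambda r: "@" in r.get("email", "")),
]


def validate(record: Record) -> List[str]:
    """Return the descriptions of all rules the record violates."""
    return [description for description, check in RULES if not check(record)]


def steward_tasks(records: List[Record]) -> List[Tuple[str, List[str]]]:
    """Create one task per record that violates at least one rule."""
    tasks = []
    for record in records:
        violations = validate(record)
        if violations:
            tasks.append((record["id"], violations))
    return tasks


customers = [
    {"id": "C-1", "name": "Max Mustermann", "email": "max@example.org"},
    {"id": "C-2", "name": "", "email": "martha.example.org"},
]
tasks = steward_tasks(customers)
```

Only the second record ends up in the task list; the automatic part finds the problem, while fixing it remains manual work for a data steward.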

What is the Internet of Things (IoT)?

The Internet of Things is a paradigm that connects things, such as machines, cars, smartphones, thermostats or smoke detectors, to the Internet, where they provide information about their state and their environment to other things as well as to humans.

For example, a machine can report its utilization to other machines and inform its users about alternative machines to use in case of high utilization.

The Internet of Things does not only take into account the current state of things, but also the future state of things by employing predictive analytics applications.

For instance, a car can predict based on its sensor information that the engine is likely to fail within the next seven days. It can schedule maintenance with the manufacturer so it does not fail when it is needed by the driver.

Challenges Combining MDM and IoT

The main benefits of integrating MDM and IoT are the following:

  • Automatically update master data and its state to improve value-flow of current business processes.
  • Enable prediction on master data, such as predictive maintenance of machines or predictive customer behavior, to enable new types of business processes and models.

However, there are currently some challenges in integrating them.

Internet of Things and Semantic Challenges

The Internet of Things only brings value to an organization if the IoT information can be used within a proper analytics model that describes the semantic relations between things and master data objects.

For instance, if the company only collects information such as sensor “A4893983” reports its location as “50.106529,8.662162”, then this is of very little value to the company.

However, if it had a proper semantic description for MDM and IoT data, then it could leverage this data to generate the following information: “Customer Max Mustermann is currently at Frankfurt central station and using one of our products. His friend, Martha Musterfrau, is currently near him, but is having problems with one of our products”.
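The step from a raw reading to a business-level statement can be sketched as follows. The mapping from sensor to customer and the bounding box used for the place lookup are assumptions made up for this example; a real system would draw them from the master data and a geographic service.

```python
from typing import Dict, Optional, Tuple

# Hypothetical master data: which customer carries which sensor.
SENSOR_OWNER: Dict[str, str] = {"A4893983": "Max Mustermann"}

# Hypothetical place catalogue: name -> (min_lat, max_lat, min_lon, max_lon).
PLACES: Dict[str, Tuple[float, float, float, float]] = {
    "Frankfurt central station": (50.104, 50.109, 8.660, 8.666),
}


def resolve_place(lat: float, lon: float) -> Optional[str]:
    """Return the first catalogued place whose bounding box contains the point."""
    for name, (min_lat, max_lat, min_lon, max_lon) in PLACES.items():
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            return name
    return None


def describe(sensor_id: str, raw_location: str) -> str:
    """Turn a raw sensor reading into a statement about a master data object."""
    lat, lon = (float(part) for part in raw_location.split(","))
    owner = SENSOR_OWNER.get(sensor_id, "Unknown customer")
    place = resolve_place(lat, lon) or "an unknown location"
    return f"Customer {owner} is currently at {place}"


message = describe("A4893983", "50.106529,8.662162")
```

The raw identifier and coordinates only become useful once they are joined with master data (who owns the sensor) and a semantic model of places.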

These types of predictive analytics and semantic models, as well as IoT information, require new database technologies, which will be described later.

Combining Big Data and Master Data Management

Traditional master data management solutions have not been designed with “Big Data” in mind. However, combining MDM and IoT requires “Big Data” technologies:

  • Higher data volumes due to IoT Data
  • Complex analytics queries over existing MDM data with a lot of relationships
  • Variety of information in master data objects and IoT database

This also requires new database technologies.

Providing Prediction to Business Processes

Traditional master data management solutions only support provision of master data to business processes. However, modern master data management solutions supporting IoT will have to provide predictive analytics to business processes. Examples are answers to questions, such as the following:

  • Which of my machines is likely to fail next and which ones should be sent to maintenance?
  • What product is the customer most likely to buy next and which material do I need to buy to build it?

Relational databases are suitable for descriptive statistics, but quickly reach their limit with respect to even simple prediction models. Hence, new database technologies have to be supported.
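As an illustration of the first question above, the following sketch ranks machines by a very simple prediction: it fits a least-squares line through daily sensor readings and extrapolates when a failure threshold would be reached. The machine names, the readings and the threshold are invented for this example, and a linear trend is a deliberately naive stand-in for a real prediction model.

```python
from typing import Dict, List, Optional


def days_until_threshold(readings: List[float], threshold: float) -> Optional[float]:
    """Extrapolate a least-squares line through daily readings.

    Returns the estimated number of days after the last reading at which
    the threshold is reached, or None if the trend is not rising.
    Expects at least two readings.
    """
    n = len(readings)
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    return max(0.0, (threshold - intercept) / slope - (n - 1))


def maintenance_order(history: Dict[str, List[float]], threshold: float) -> List[str]:
    """List machines with a rising trend, most urgent first."""
    estimates = {m: days_until_threshold(r, threshold) for m, r in history.items()}
    at_risk = [m for m, days in estimates.items() if days is not None]
    return sorted(at_risk, key=lambda m: estimates[m])


# Hypothetical vibration readings, one value per day.
history = {
    "machine-A": [1.0, 2.0, 3.0, 4.0],
    "machine-B": [2.0, 2.0, 2.0, 2.0],
    "machine-C": [4.0, 6.0, 8.0],
}
order = maintenance_order(history, threshold=10.0)
```

Even this toy model needs a procedural fitting step per machine, which hints at why more realistic models are usually run in a dedicated analytics framework rather than expressed as plain SQL queries.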

Technology Support

Current MDM solutions are mostly based on relational SQL databases together with caching solutions. This is suitable for integrating master data objects from MDM systems into today’s business processes. Unfortunately, it makes them less suitable for predictive analytics applications due to the limitations of relational algebra. They also cannot handle the large number of relations between master data objects that is required today (e.g. many different versions of master data objects, or master reference data such as social network graphs or dependency graphs). This in turn limits the opportunities for data quality enhancements, which results in poorer data quality and higher costs within the business processes using master data.

Modern MDM solutions leverage graph databases to store and analyze master data objects as well as to provide them to business processes. They offer transactional guarantees similar to those of relational databases, but have different storage and index structures that are more suitable for MDM. However, they have not yet become first-class citizens in companies, which currently still have to build up knowledge in this area. Nevertheless, large software vendors, such as SAP or Oracle, are starting to offer graph databases as part of their database solutions. Popular open source graph databases and graph processing solutions, such as OrientDB, Neo4j, Spark GraphX or TitanDB, have existed for several years and can cope with large amounts of data.

Furthermore, relational databases integrate IoT data only poorly, because IoT requires the ability to ingest large volumes of data and run analytics on them. Vertical scaling, a prominent paradigm for relational databases, can no longer cope with this; instead, a database cluster consisting of several communicating nodes is needed. Column stores, such as Apache Cassandra (together with an analytics framework, such as Hadoop MapReduce or Apache Spark), Hadoop/HBase (Parquet) or SAP HANA, seem to be most suitable for this scenario. They offer high read/write throughput and are thus able to cope with the high volume of IoT data. Furthermore, they can be scaled horizontally by adding new database nodes to an existing network of nodes. Finally, you can manage load by using a messaging technology such as Apache Kafka.
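The idea behind horizontal scaling can be sketched without any particular database: every reading is assigned to one of several nodes based on a stable hash of its sensor identifier. The node names and readings below are hypothetical, and the simple modulo scheme is only a stand-in; distributed stores such as Cassandra use consistent hashing so that adding a node moves only a small share of the data.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

Reading = Tuple[str, float]  # (sensor id, measured value)


def node_for(sensor_id: str, nodes: List[str]) -> str:
    """Pick a node deterministically from a stable hash of the sensor id."""
    digest = hashlib.sha256(sensor_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def partition(readings: List[Reading], nodes: List[str]) -> Dict[str, List[Reading]]:
    """Distribute readings so that each sensor's data stays on one node."""
    placement: Dict[str, List[Reading]] = defaultdict(list)
    for sensor_id, value in readings:
        placement[node_for(sensor_id, nodes)].append((sensor_id, value))
    return dict(placement)


# Hypothetical cluster and readings.
nodes = ["node-1", "node-2", "node-3"]
readings = [("sensor-a", 20.5), ("sensor-b", 19.0), ("sensor-a", 21.0)]
placement = partition(readings, nodes)
```

Because the assignment depends only on the sensor identifier, writes for different sensors can be spread across the nodes while all data of one sensor can still be read from a single place.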



The following figure illustrates the concept of MDM and IoT by means of an exemplary data model. Master data objects are represented as nodes of a graph with relations to other nodes. The following master data objects can be identified: two electronic devices and two customers. The customers, Max Mustermann and Martha Musterfrau, are friends, and this is represented in the master data object graph. Furthermore, each of the customers has an ownership relation to a product (an electronic device) sold by a company.

Finally, IoT data is illustrated in the figure. This data is connected to the master data objects and provides information about their state. For example, the smartphones of the customers provide information about their location (“Central Station, Frankfurt, Germany”). The IoT data of the electronic devices provides information about their operating status. One electronic device is operating normally and the other one is broken.

[Figure: example MDM graph]

The example demonstrates only a small excerpt of what is possible with a next-generation master data and IoT management system. Some examples of queries that can be answered:

  • Who is the owner of devices in the state “Broken”?
  • Which customers can support other customers nearby with devices in state “Broken”?
  • Which customers influence their friends to buy new devices or recommend devices?
  • Which devices in Frankfurt are likely to fail within the next week and need replacement?
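The first two queries can be sketched over a tiny in-memory version of the example graph. The device identifiers, the relation names and the assignment of the broken device to Martha Musterfrau are assumptions for illustration; in practice a graph database and its query language would take the place of these helper functions.

```python
from typing import Dict, List, Set, Tuple

# Customer master data objects as graph nodes.
CUSTOMERS: Set[str] = {"Max Mustermann", "Martha Musterfrau"}

# Relations between master data objects as typed edges (names are hypothetical).
EDGES: List[Tuple[str, str, str]] = [
    ("Max Mustermann", "FRIEND_OF", "Martha Musterfrau"),
    ("Martha Musterfrau", "FRIEND_OF", "Max Mustermann"),
    ("Max Mustermann", "OWNS", "device-1"),
    ("Martha Musterfrau", "OWNS", "device-2"),
]

# IoT data attached to the master data objects.
DEVICE_STATE: Dict[str, str] = {"device-1": "Normal", "device-2": "Broken"}
LOCATION: Dict[str, str] = {
    "Max Mustermann": "Central Station, Frankfurt, Germany",
    "Martha Musterfrau": "Central Station, Frankfurt, Germany",
}


def neighbours(node: str, relation: str) -> List[str]:
    """Follow all outgoing edges of the given type."""
    return [dst for src, rel, dst in EDGES if src == node and rel == relation]


def owners_of_broken_devices() -> List[str]:
    """Who is the owner of devices in the state "Broken"?"""
    return sorted(
        c for c in CUSTOMERS
        if any(DEVICE_STATE[d] == "Broken" for d in neighbours(c, "OWNS"))
    )


def nearby_helpers(customer: str) -> List[str]:
    """Friends at the same location who own a normally operating device."""
    return sorted(
        f for f in neighbours(customer, "FRIEND_OF")
        if LOCATION.get(f) == LOCATION.get(customer)
        and any(DEVICE_STATE[d] == "Normal" for d in neighbours(f, "OWNS"))
    )
```

Both queries combine relations from the master data graph (friendship, ownership) with state from IoT data (device status, location), which is exactly the combination neither source could answer alone.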

Additional information from IoT data enables superior data quality. For instance, we can properly identify customers and devices. This avoids costly maintenance of working devices or costly replacement of non-working ones.

Such a system can also enhance sales to customers, because more information allows more targeted advertisement and more customization. Based on prediction models, one can offer completely new value-added services.


Master Data Management enters a new era: new database technologies and the Internet of Things enable superior data quality and open up new business cases, such as predictive analytics. Ultimately this leads to new business processes offering superior value.

Nevertheless, only a few MDM solutions leverage these new technologies so far, although the technologies are already quite mature. Additionally, the Internet of Things has to become more pervasive, and organizations need to push their suppliers and customers to engage more with it.

Scenarios for Inter-Cloud Enterprise Architecture

The unstoppable cloud trend has arrived at both end users and companies. End users in particular openly embrace the cloud; for instance, they use services provided by Google or Facebook. Companies are more cautious, fearing vendor lock-in or exposure of confidential business data, such as customer records. Nevertheless, for many scenarios the risk can be managed and is accepted by companies, because the benefits, such as scalability, new business models and cost savings, outweigh the risks. In this blog entry, I will investigate in more detail the opportunities and challenges of inter-cloud enterprise applications. Finally, we will have a look at technology supporting inter-cloud enterprise applications via cloudbursting, i.e. enabling them to be extended dynamically over several cloud platforms.

What is an inter-cloud enterprise application?

Cloud computing encompasses all means to produce and consume computing resources, such as processing units, networks and storage, existing in your company (on-premise) or on the Internet. Particularly the latter enables dynamic scaling of your enterprise applications, e.g. when you suddenly get a lot of new customers but do not have the necessary resources to serve them all using your own computing resources.

Cloud computing comes in different flavors and combinations of them:

  • Infrastructure-as-a-Service (IaaS): Provides hardware and basic software infrastructure on which an enterprise application can be deployed and executed. It offers computing, storage and network resources. Example: Amazon EC2 or Google Compute.
  • Platform-as-a-Service (PaaS): Provides on top of an IaaS a predefined development environment, such as Java, ABAP or PHP, with various additional services (e.g. database, analytics or authentication). Example: Google App Engine or Agito BPM PaaS.
  • Software-as-a-Service (SaaS): Provides on top of an IaaS or PaaS a specific application over the Internet, such as a CRM application.

When designing and implementing or buying your enterprise application, e.g. a customer relationship management (CRM) system, you need to decide where to put it in the cloud. For instance, you can put it fully on-premise or you can put it on a cloud on the Internet. However, different cloud vendors exist, such as Amazon, Microsoft, Google or Rackspace, and they also offer different flavors of cloud computing. Depending on the design of your CRM, you can put it on an IaaS, PaaS or SaaS cloud, or on a mixture of them. Furthermore, you may put only selected modules of the CRM on a cloud on the Internet, e.g. a module for doing anonymized customer analytics. You will also need to think about how this CRM system is integrated with your other enterprise applications.

Inter-Cloud Scenario and Challenges

In this scenario, the exemplary CRM application runs partially in the private cloud and partially in different public clouds. The CRM database is stored in the private cloud (IaaS), and some (anonymized) data is sent to different public clouds on Amazon EC2 (IaaS) and Microsoft Azure (IaaS) for number-crunching analysis. A further cloud service is used for payment processing. Besides customer data and buying history, the database contains sensor information from different points of sale, such as how long a customer was standing in front of an advertisement. Additionally, the sensor data can be used to trigger actuators, such as posting on the shop’s Facebook page what is currently trending, using the cloud service IFTTT. Furthermore, the graphical user interface presenting the analysis is hosted on Google App Engine (PaaS). The CRM is integrated with Facebook and Twitter to enhance the data with social network analysis. This is not an unrealistic scenario: many (grown) startups already deploy a similar setting and established corporations experiment with it. Clearly, this scenario supports cloud-bursting, because the cloud is used heavily.

I present in the next figure the aforementioned scenario of an inter-cloud enterprise application leveraging various cloud providers.


There are several challenges involved when you distribute your business application over your private and several public clouds.

  • API Management: How do you describe different types of business and cloud resources, so that you can make efficient and cost-effective decisions about where to run the analytics at a given point in time? Furthermore, how do you represent different storage capabilities (e.g. in-memory, on-disk) in different clouds? This goes further up to the level of the business application, where you need to harmonize or standardize business concepts, such as “customer” or “product”. For instance, a customer described in “Twitter” terms is different from a customer described in “Facebook” terms. You should also keep in mind that semantic definitions change over time, because a cloud provider changes its capabilities, such as new computing resources, or its focus. Additionally, you may want to change your cloud provider dynamically without disrupting the operation of the enterprise application.
  • Privacy, Risk and Security: How do you articulate your privacy, risk and security concerns? How do you enforce them? While there are already technologies and standards for this, the cloud setting imposes new problems. For example, if you update encrypted data regularly, the cloud provider may be able to infer parts or all of your data from the differences. Furthermore, it may maliciously change the data. Finally, the market is fragmented and lacks an integrated solution.
  • Social Network Challenge: Similarly to the semantic challenge, there is the problem of semantically describing social data and doing efficient analysis over several different social networks. Users may also arbitrarily change their privacy preferences, which makes reliable analytics difficult. Additionally, your whole company’s organizational structure and the official and unofficial networks within your company are already exposed in social business networks, such as LinkedIn or Xing. This further blurs the borders of your enterprise, which has to adapt by integrating social networks into its business applications. For instance, your organizational hierarchy, informal networks or your company’s address book probably already exist partly in social networks.
  • Internet of Things: The Internet of Things consists of sensors and actuators delivering data or executing actions in the real world, supported by your business applications and processes. Different platforms exist to source real-world data or schedule actions in the real world using actuators. The API Management challenge exists here, but it goes even beyond: you create dynamic semantic concepts and relate your Internet of Things data to them. For example, you have attached an RFID tag and a temperature sensor to your parcels. Their data needs to be added to the information about your parcel in the ERP system. Besides the semantic concept “parcel”, you also have that of a “truck” transporting your “parcel” to a destination, i.e. you have additional location information. Furthermore, the parcel may be stored temporarily in a “warehouse”. Different business applications and processes may need to know where the parcel is. They do not query the sensor data (e.g. “give me data from tempsen084nl_98484”), but rather formulate a query such as “list all parcels in warehouses with a temperature above 0 °C” or “list all parcels in transit”. Hence, Internet of Things data needs to be dynamically linked with business concepts used in different clouds. This is particularly challenging for SaaS applications, which may have different conceptualizations of the same thing.
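The parcel example from the last challenge can be sketched as follows. Apart from the sensor name quoted above, all identifiers, locations and temperature values are invented for illustration; the point is only that the queries are phrased in business terms and the sensor identifiers stay hidden behind the mapping.

```python
from typing import Dict, List

# Hypothetical link between business objects and their sensors.
PARCEL_SENSORS: Dict[str, str] = {
    "parcel-1": "tempsen084nl_98484",
    "parcel-2": "tempsen-b",
    "parcel-3": "tempsen-c",
}

# Hypothetical business context: where each parcel currently is.
PARCEL_CONTAINER: Dict[str, str] = {
    "parcel-1": "warehouse",
    "parcel-2": "truck",
    "parcel-3": "warehouse",
}

# Latest raw sensor readings in degrees Celsius (made-up values).
SENSOR_TEMPERATURE: Dict[str, float] = {
    "tempsen084nl_98484": 4.5,
    "tempsen-b": 2.0,
    "tempsen-c": -3.0,
}


def parcels_in_warehouses_above(celsius: float) -> List[str]:
    """List all parcels in warehouses with a temperature above the given value."""
    return sorted(
        parcel for parcel, sensor in PARCEL_SENSORS.items()
        if PARCEL_CONTAINER[parcel] == "warehouse"
        and SENSOR_TEMPERATURE[sensor] > celsius
    )


def parcels_in_transit() -> List[str]:
    """List all parcels currently transported by a truck."""
    return sorted(p for p, c in PARCEL_CONTAINER.items() if c == "truck")
```

In an inter-cloud setting the three dictionaries would live in different systems (ERP, logistics platform, IoT platform), which is what makes keeping this link up to date the hard part.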

Enterprise Architecture for Inter-Cloud Applications

You may wonder how you can integrate the above scenario into your application landscape at all, and why you should do so. The basic promise of cloud computing is that it scales according to your needs and that you can outsource infrastructure to people who have the knowledge and capabilities to run it. Small and medium-sized enterprises in particular benefit from this and from the cost advantage. It is not uncommon for modern startups to start their IT in the cloud (e.g. FourSquare).

However, also large corporations can benefit from the cloud, e.g. as a “neutral” ground for a complex supply chain with a lot of partners or to ramp up new innovative business models where the outcome is uncertain.

Be aware that in order to offer a solution based on the cloud, you first need a solid level of maturity in your enterprise architecture. Without it you are doomed to fail, because you cannot perform proper risk and security analysis, scale properly, or benefit from cost reductions and innovation.

I propose in the following figure an updated model of the enterprise architecture with new components for managing cloud-based applications. The underlying assumption is that you have an enterprise architecture, more particularly a semantic model of business objects and concepts.


  • Public/Private Border Gateway: This gateway is responsible for managing the transition between your private cloud and different public clouds. It may also deploy agents on each cloud to enable secure direct communication between different cloud platforms without the need to go through your own infrastructure. You might have more fine-grained gateways, such as private, closest supplier and public. A similar idea came to me a few years ago when I was working on inter-organizational crisis response information systems. The gateway works not only on the lower network level, but also on the level of business processes and objects. It is business-driven: depending on business processes and rules, it decides dynamically where the borders should be set. This may also mean that different business processes have access to different things in the Internet of Things.
  • Semantic Matcher: The semantic matcher is responsible for translating business concepts from and to the different technical representations of business objects in different cloud platforms. This can involve simple transformations of mismatched data types, but also enrichment of business objects from different sources. This goes well beyond current technical standards, such as EDI or ebXML, which I see as a starting point. Semantic matching is done automatically, so there is no need to create time-consuming manual mappings. Furthermore, the semantic matcher enhances business objects with Internet of Things information, so that business applications can query or trigger them on the business level as described before. The question here is how you can keep people in control of this (see Monitor) and leverage semantic information.
  • API Manager: Cloud API management is the topic of the coming years. Besides addressing the semantic challenge, this component provides all necessary functionality to bill, secure and publish your APIs. It keeps track of who is using your API and what impact changes to it may have. Furthermore, it helps you compose new business software distributed over several cloud platforms using different APIs that are subject to continuous change. The API Manager will also have a registry of APIs with reputation and quality-of-service measures. We now see a huge variety of APIs from different service providers (cf. ProgrammableWeb). However, the scientific community and companies have not yet picked up the inherent challenges, such as the aforementioned semantic matching, monitoring of APIs, API change management and alternative API compositions. While there exists some work in the web service community, it has not yet been extended to the full Internet dimension as described in the scenario here. Additionally, it is unclear how this work integrates the Internet of Things paradigm.
  • Monitor: Monitoring is of key importance in this inter-cloud setting. Different cloud platforms offer different and possibly very limited means for monitoring. A key challenge here will be to consolidate the monitoring data and provide an adequate visual representation for doing risk analysis and selecting alternative deployment strategies on the aggregated business process level. For instance, by leveraging semantic integration we can schedule requests to semantically similar cloud and business resources. Particularly in the Internet of Things setting, we may observe unpredictable delays, which lead to delayed execution of real-world activities, e.g. a robot is notified only after 15 minutes that a parcel fell off the shelf.
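To give an impression of what the semantic matcher has to produce, the following sketch translates two provider-specific customer records into one canonical business object. Note the simplification: the mapping table here is written by hand, whereas the component described above is meant to derive such mappings automatically. The source names and field names are purely hypothetical and do not reflect the actual APIs of any provider.

```python
from typing import Dict

# Hypothetical, hand-written field mappings: source field -> canonical field.
FIELD_MAPPINGS: Dict[str, Dict[str, str]] = {
    "social_network_a": {"screen_name": "name", "geo": "city"},
    "social_network_b": {"full_name": "name", "hometown": "city"},
}


def to_canonical(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Translate a source-specific record into canonical field names.

    Fields without a mapping are dropped.
    """
    mapping = FIELD_MAPPINGS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def merge(*records: Dict[str, str]) -> Dict[str, str]:
    """Enrich a business object from several sources; the first value wins."""
    merged: Dict[str, str] = {}
    for record in records:
        for key, value in record.items():
            merged.setdefault(key, value)
    return merged


a = to_canonical("social_network_a", {"screen_name": "Max Mustermann", "followers": "120"})
b = to_canonical("social_network_b", {"full_name": "Max Mustermann", "hometown": "Frankfurt"})
customer = merge(a, b)
```

The second source contributes the city that the first one lacks, which is the enrichment of business objects from different sources mentioned above.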

Developing and Managing Inter-Cloud Business Applications

Based on your enterprise architecture, you should ideally employ a model-driven engineering approach. This approach enables you to automate the software development process. Be aware that this is not easy to do and has often failed in practice. However, I have also seen successful approaches. It is important that you select the right modeling languages, and you may need to implement your own translation tools.

Once you have all this infrastructure, you should think about software factories, which are ideal for developing and deploying standardized services for selected platforms. I imagine that in the future we will see small emerging software factories focusing on specific aspects of a cloud platform. For example, you will have a software factory for designing graphical user interfaces using map applications enhanced with selected OData services (e.g. warehouse or plant locations). In fact, I expect a market for software factories to emerge soon, which extends the idea of very basic crowdsourcing platforms, such as Amazon Mechanical Turk.

Of course, since more and more business applications shift towards the private and public clouds, you will introduce new roles in your company, such as the Chief Cloud Officer (CCO). This role is responsible for managing the cloud suppliers, integrating them in your enterprise architecture and proper controlling as well as risk management.


The cloud exists already today! More and more tools emerge to manage it. However, they do not take into account the complete picture. I described several components for which no technologies exist. However, some go in the right direction as I will briefly outline.

First of all, you need technology to manage your APIs in order to provide a single point of management towards your cloud applications. For instance, Apache Deltacloud allows managing different IaaS providers, such as Amazon EC2, IBM SmartCloud or OpenStack.

IBM Research also provides a single point of management API for cloud storage. This goes beyond simple storage and enables fault tolerance and security.

Other providers, such as Software AG, Tibco, IBM or Oracle, provide “API Management” software, which covers only a special case of API Management. In fact, they provide software to publish, manage the lifecycle of, monitor, secure and bill your own APIs for the public on the web. Unfortunately, they do not describe the business processes necessary to put their technology to use in your company. Besides that, they do not support B2B interaction very well, but focus on business-to-developer aspects only. Additionally, you find registries for public web APIs, such as ProgrammableWeb or APIHub, which are a first starting point for finding APIs. Unfortunately, they do not feature semantic descriptions and thus offer no semantic matching towards your business objects, which means a lot of laborious manual work for matching them to your application.

There is not much software for managing the borders between private and public clouds, let alone for allowing more fine-grained borders, such as private, closest partner and public. There is software for visualizing and monitoring these borders, such as the eCloudManager by Fluid Operations. It features semantic integration of different cloud resources. However, it is unclear how you can enforce these borders, how you control them and how you can manage the different borders. Dome9 goes in this direction, but focuses only on security policies for IaaS applications. It only understands data and low-level security, but not security and privacy over business objects. Deployment configuration software, such as Puppet or Chef, is only a first step, since it focuses on deployment but not on operation.

On the monitoring side you will find a lot of software, such as Apache Flume or Tibco HAWK. While these operate more on the lower level of software development, IFTTT enables the execution of business rules over data on several cloud providers that offer public APIs. Surprisingly, it considers itself at the moment more as an end-user-facing company. Additionally, you find approaches for monitoring distributed business processes in the academic community.

Unfortunately, we find little ready-to-go software in the area of the “Internet of Things”. I have worked myself with several R&D prototypes enabling cloud and gateways, but they are not ready for the market. Products have emerged, but they only serve a special niche, e.g. an Internet of Things enabled point-of-sale shop. In particular, they lack a vision of how they can be used in an enterprise-wide application landscape or within a B2B enterprise architecture.


I described in this blog the challenges of inter-cloud business applications. I think in the near future (3-5 years) all organizations will have some of them. Technically they are already possible and exist to some extent. For many companies, the risk and costs will be lower than managing everything on their own. Nevertheless, a key requirement is that you have a working enterprise architecture management strategy. Without it you won’t have any benefits. More particularly, from the business side you will need adequate governance strategies for different clouds and APIs.

We have already seen key technologies emerging, but there is still a lot to do. Despite decades of research on semantic technologies, there exists today no software that can perform automated semantic matching of cloud and business concepts existing in different components of an inter-cloud business application. Furthermore, there are no criteria on how to select a semantic description language for business purposes that are as broad as described here. Enterprise Architecture Management tools in this area are emerging only slowly. Monitoring is still fragmented, with many low-level tools but only a few high-level business monitoring tools. They cannot answer simple questions, such as “if cloud provider A goes down, how fast can I recover my operations and what are the limitations?”. API Management is another evolving area, which will have a significant impact in the coming years. However, current tools only consider low-level technical aspects and not high-level business concepts.
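A first step towards answering such a what-if question is knowing which components are affected when a provider fails. The sketch below uses the placement from the CRM scenario; the component names and the dependencies between them are assumptions added for illustration, and every dependency is treated as a hard one, which is a pessimistic simplification.

```python
from typing import Dict, List, Set

# Where each component of the exemplary CRM runs (component names are hypothetical).
PLACEMENT: Dict[str, str] = {
    "crm-database": "private cloud",
    "analytics-1": "Amazon EC2",
    "analytics-2": "Microsoft Azure",
    "gui": "Google App Engine",
}

# Assumed dependencies: component -> components it needs in order to work.
DEPENDS_ON: Dict[str, List[str]] = {
    "analytics-1": ["crm-database"],
    "analytics-2": ["crm-database"],
    "gui": ["analytics-1", "analytics-2"],
}


def affected_by_outage(provider: str) -> Set[str]:
    """Components on the failed provider plus everything that depends on them."""
    affected = {c for c, p in PLACEMENT.items() if p == provider}
    changed = True
    while changed:  # repeat until no further dependants are found
        changed = False
        for component, dependencies in DEPENDS_ON.items():
            if component not in affected and any(d in affected for d in dependencies):
                affected.add(component)
                changed = True
    return affected
```

This only identifies the blast radius; estimating recovery time and limitations would additionally require the consolidated monitoring data and alternative deployment strategies discussed for the Monitor component.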

Finally, you see that a lot of the challenges mentioned in the beginning, such as the social network challenge or the Internet of Things challenge, are simply not yet solved, although large-scale research efforts are under way. This means further investigation is needed to clarify the relationships between the aforementioned components. Unfortunately, many of the established middleware vendors lack a clear vision of cloud computing and the Internet of Things. Hence, I expect this gap will be filled by startups in this area.