Master Data Management and the Internet of Things

Master Data Management (MDM) has matured and grown significantly over the last few years. The main motivation for master data management is to have a complete and accurate view of the master data objects in your organization. Master data objects describe key assets, such as machines or customers, that generate value for your organization. Hence, MDM fosters processes that improve the quality of master data, so that the key assets are used properly to generate value. However, most of these processes still require manual intervention by humans. Furthermore, master data is usually not up to date because it is improved and tracked manually. In particular, the current state of a master data object is usually entered into the system only after hours or even days. This makes it difficult to act upon this state or to predict changes to it. Clearly, this can be a disadvantage compared to competitors who leverage real-time information when using their master data. For instance, one cannot predict that a customer might move to another city in the near future, or that the planes one operates will require maintenance at an inconvenient time and delay a scheduled flight.

The vision of the Internet of Things (IoT) is to connect things that sense and act upon their environment to the Internet, where they exchange data about their state as well as their environment. IoT enables a real-time 360° view of your key assets and their interaction with the environment. Current studies estimate that by 2020 several billion things will be connected via the Internet.

Hence, it makes sense to combine MDM and IoT to improve the business processes that act upon master data. These processes will not only benefit from an up-to-date state of master data, but can also use this data to enable predictive analytics applications, such as predictive maintenance or customer retention.

I will describe both concepts in more detail and explain how they can be integrated. Afterwards, I will discuss current challenges with respect to architectures, data models and predictive analytics applications. Finally, I will provide insights into what next-generation MDM systems look like.

What is Master Data Management?

Master Data is data about the key assets in a company. Examples are customers, machines, products, suppliers, financial assets or business partners.

One should differentiate master data from transactional data, which always refers to master data. Master data objects can exist on their own and do not necessarily need to refer to other data, i.e. they make sense without any relations. For instance, a customer can exist without other customers, although the customer usually has (social) relations to other customers. In contrast, a transaction for buying a product cannot exist without a customer and a product.
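The distinction can be sketched in a few lines of Python. The class and field names below (Customer, Product, Purchase) and the sample values are hypothetical and chosen only for illustration: the master data objects stand alone, while the transaction cannot be constructed without referring to them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Customer:
    """Master data: meaningful on its own."""
    customer_id: str
    name: str


@dataclass(frozen=True)
class Product:
    """Master data: meaningful on its own."""
    product_id: str
    title: str


@dataclass(frozen=True)
class Purchase:
    """Transactional data: requires a customer and a product to exist."""
    customer: Customer
    product: Product
    purchased_on: date


# Master data objects can be created independently of each other ...
max_m = Customer("C-1", "Max Mustermann")
device = Product("P-1", "Electronic device")

# ... whereas the transaction has to reference both of them.
purchase = Purchase(max_m, device, date(2015, 1, 15))
```

Calling Purchase() without a customer and a product raises a TypeError, which mirrors the statement that a transaction cannot exist without its master data.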

One of the key issues for MDM is the integration of various systems containing master data. Usually this data is inconsistent and incomplete for various reasons. This has a significant impact on the business processes using master data, which leads to significant costs and wasted resources.

Hence, master data management solutions provide various means to improve master data quality automatically and manually. For instance, they offer rules engines to validate data quality and workflow engines to assign tasks to data stewards, who fix incorrect data. Currently, most efforts to improve master data quality are manual.
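To make the interplay of validation rules and steward tasks more concrete, here is a minimal sketch. It does not reflect the API of any particular MDM product; the rules, field names and records are assumptions made up for this example.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
# A rule is a name plus a predicate that returns True when the record is valid.
Rule = Tuple[str, Callable[[Record], bool]]

# Hypothetical data quality rules for customer records.
RULES: List[Rule] = [
    ("name is present", lambda r: bool(r.get("name", "").strip())),
    ("email contains @", lambda r: "@" in r.get("email", "")),
    ("country is a 2-letter code", lambda r: len(r.get("country", "")) == 2),
]


def validate(record: Record) -> List[str]:
    """Return the names of all rules the record violates."""
    return [name for name, check in RULES if not check(record)]


def steward_tasks(records: List[Record]) -> List[Tuple[str, List[str]]]:
    """Create one task per invalid record for a data steward to fix manually."""
    tasks = []
    for record in records:
        violations = validate(record)
        if violations:
            tasks.append((record.get("id", "?"), violations))
    return tasks


records = [
    {"id": "C-1", "name": "Max Mustermann", "email": "max@example.com", "country": "DE"},
    {"id": "C-2", "name": " ", "email": "martha.example.com", "country": "DE"},
]
tasks = steward_tasks(records)
```

In this sketch only record C-2 ends up as a task, because its name is blank and its email address lacks an @ sign. The automatic part is the detection; the correction is still left to a human.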

What is the Internet of Things (IoT)?

The Internet of Things is a paradigm that connects all kinds of things, such as machines, cars, smartphones, thermostats or smoke detectors, to the Internet, where they provide information about their state and their environment to other things as well as to humans.

For example, a machine can report its utilization to other machines and inform its users about alternative machines to use in case of high utilization.

The Internet of Things does not only take into account the current state of things, but also the future state of things by employing predictive analytics applications.

For instance, a car can predict based on its sensor information that the engine is likely to fail within the next seven days. It can schedule maintenance with the manufacturer so it does not fail when it is needed by the driver.

Challenges Combining MDM and IoT

The main benefits of integrating MDM and IoT are the following:

  • Automatically update master data and its state to improve value-flow of current business processes.
  • Enable prediction on master data, such as predictive maintenance of machines or predictive customer behavior, to enable new types of business processes and models.

However, there are currently some challenges in integrating them.

Internet of Things and Semantic Challenges

The Internet of Things only brings value to an organization if the organization can use the IoT information within a proper analytics model that describes the semantic relations between things and master data objects.

For instance, if the company only collects information such as “sensor ‘A4893983’ reports its location as ‘50.106529,8.662162’”, this is of very little value to the company.

However, if it had a proper semantic description for MDM and IoT data, it could leverage this data to generate the following information: “Customer Max Mustermann is currently at Frankfurt central station and using one of our products. His friend, Martha Musterfrau, is currently near him, but having problems with one of our products”.
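The step from a raw sensor reading to a meaningful statement can be sketched as follows. The two lookup tables stand in for the semantic model; the mapping of the sensor to Max Mustermann, the approximate station coordinates and the 500 m radius are assumptions made for this illustration only.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical lookup tables; in practice they would come from the MDM system.
SENSOR_OWNER = {"A4893983": "Max Mustermann"}
# Approximate coordinates, assumed for illustration only.
PLACES = {"Frankfurt central station": (50.1071, 8.6638)}


def distance_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs (haversine)."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(h))


def describe(sensor_id, raw_location, radius_m=500):
    """Turn a raw sensor reading into a statement about a master data object."""
    lat, lon = (float(part) for part in raw_location.split(","))
    owner = SENSOR_OWNER.get(sensor_id, "Unknown customer")
    for place, coords in PLACES.items():
        if distance_m((lat, lon), coords) <= radius_m:
            return f"{owner} is currently at {place}"
    return f"{owner} is at an unknown location ({raw_location})"


print(describe("A4893983", "50.106529,8.662162"))
```

Without the two tables the reading remains an opaque identifier and a pair of numbers. The value comes from linking it to a customer and to a named place.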

These types of predictive analytics and semantic models, as well as IoT information, require new database technologies, which will be described later.

Combining Big Data and Master Data Management

Traditional master data management solutions have not been designed with “Big Data” in mind. However, combining MDM and IoT requires “Big Data” capabilities:

  • Higher data volumes due to IoT Data
  • Complex analytics queries over existing MDM data with a lot of relationships
  • Variety of information in master data objects and IoT database

This also requires new database technologies.

Providing Prediction to Business Processes

Traditional master data management solutions only support provision of master data to business processes. However, modern master data management solutions supporting IoT will have to provide predictive analytics to business processes. Examples are answers to questions, such as the following:

  • Which of my machines is likely to fail next and which ones should be sent to maintenance?
  • What product is the customer most likely to buy next and which material do I need to buy to build it?

Relational databases are suitable for descriptive statistics, but quickly reach their limit with respect to even simple prediction models. Hence, new database technologies have to be supported.
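As an illustration of the machine question listed above, the following sketch ranks machines by the number of days until a wear indicator is expected to cross a failure threshold. It uses a plain least-squares trend line as a deliberately simple stand-in for a real prediction model; the daily vibration readings, the machine ids and the threshold are invented for this example.

```python
from typing import Dict, List, Optional, Tuple


def fit_line(values: List[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for values observed at t = 0, 1, 2, ...

    Requires at least two values.
    """
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_v - slope * mean_t


def days_until(values: List[float], threshold: float) -> Optional[float]:
    """Days from the last reading until the fitted trend reaches the threshold."""
    slope, intercept = fit_line(values)
    if slope <= 0:
        return None  # no upward trend, so no failure is predicted
    t_cross = (threshold - intercept) / slope
    return max(0.0, t_cross - (len(values) - 1))


# Hypothetical daily vibration readings per machine.
readings: Dict[str, List[float]] = {
    "M-1": [2.0, 2.5, 3.0, 3.5],
    "M-2": [1.0, 1.1, 1.2, 1.3],
    "M-3": [3.0, 3.0, 2.9, 2.9],
}
THRESHOLD = 5.0  # assumed failure level

forecast = {machine: days_until(v, THRESHOLD) for machine, v in readings.items()}
# Machines with a predicted failure, most urgent first.
ranking = sorted((m for m, d in forecast.items() if d is not None), key=forecast.get)
```

With these numbers M-1 comes first in the ranking and M-3 is left out because its readings are not rising. A production system would replace the trend line with a properly validated model.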

Technology Support

Current MDM solutions are mostly based on relational SQL databases together with caching solutions. This is suitable for integrating master data objects from MDM systems into today’s business processes. Unfortunately, it makes them less suitable for predictive analytics applications due to the limitations of relational algebra. They also cannot handle the large number of relations between master data objects that is required today (e.g. many different versions of master data objects, or master reference data such as social network graphs or dependency graphs). This also limits the opportunities for data quality enhancements, which results in poorer data quality and, in turn, higher costs within the business processes using master data.

Modern MDM solutions leverage graph databases to store and analyze master data objects as well as to provide them to business processes. They offer transactional guarantees similar to those of relational databases, but have different storage and index structures that are more suitable for MDM. However, they have not yet become first-class citizens in companies, which currently have to build up knowledge in this area. Nevertheless, large software vendors, such as SAP or Oracle, are starting to offer graph databases as part of their database solutions. Popular open source graph databases and graph processing solutions, such as OrientDB, Neo4j, Spark GraphX or TitanDB, have existed for several years and can cope with large amounts of data.

Furthermore, relational databases integrate IoT data only poorly, because IoT is about the ability to ingest large volumes of data and run analytics on them. Vertical scaling, a prominent paradigm for relational databases, can no longer cope with this; instead, a database cluster consisting of several communicating nodes is needed. Column stores, such as Apache Cassandra (together with an analytics framework, such as Hadoop MapReduce or Apache Spark), Hadoop/HBase (Parquet) or SAP HANA, seem to be most suitable for this scenario. They offer high read/write throughput and are thus able to cope with the high volume of IoT data. Furthermore, they can be scaled horizontally by adding new database nodes to an existing network of nodes. Finally, you can manage load by using a messaging technology such as Apache Kafka.
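The idea behind horizontal scaling can be sketched without any database at all: each reading is routed to one of several nodes based on a hash of its device id, so that no single node has to take the full write load. The node names and readings below are hypothetical, and the simple modulo placement is only an illustration, not how any of the products named above distribute data internally.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def node_for(device_id: str, nodes: List[str]) -> str:
    """Map a device id to a node using a hash that is stable across runs."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def route(readings: List[Tuple[str, str]], nodes: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (device_id, payload) readings by the node responsible for them."""
    shards = defaultdict(list)
    for device_id, payload in readings:
        shards[node_for(device_id, nodes)].append((device_id, payload))
    return dict(shards)


readings = [
    ("A4893983", "50.106529,8.662162"),
    ("device-1", "Normal"),
    ("device-2", "Broken"),
]
shards = route(readings, NODES)
```

A known weakness of modulo placement is that adding a node moves most keys to a different node. Clustered stores therefore tend to rely on schemes such as consistent hashing, which keep most keys in place when the cluster grows.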

Find here my university-level lecture material on NoSQL & Big Data platforms.


The following figure illustrates the concept of MDM and IoT by means of an example data model. Master data objects are represented as nodes of a graph with relations to other nodes. The following master data objects can be identified: two electronic devices and two customers. The customers, Max Mustermann and Martha Musterfrau, are friends, and this is represented in the master data object graph. Furthermore, each of the customers has an ownership relation to a product (an electronic device) sold by a company.

Finally, IoT data is illustrated in the figure. This data is connected to the master data objects and provides information about their state. For example, the smartphones of the customers provide information about their location (“Central Station, Frankfurt, Germany”). The IoT data of the electronic devices provides information about their operating status. One electronic device is operating normally and the other one is broken.

[Figure: example MDM graph]

The example demonstrates only a small excerpt of what is possible with a next-generation master data and IoT management system. Some examples of queries that can be answered:

  • Who is the owner of devices in the state “Broken”?
  • Which customers can support other customers nearby with devices in state “Broken”?
  • Which customers influence their friends to buy new devices or recommend devices?
  • Which devices in Frankfurt are likely to fail within the next week and need replacement?
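The first two queries can be sketched against a tiny in-memory version of the example graph. Plain Python sets and dictionaries are used here to keep the sketch self-contained; a graph database would express the same questions as traversals. The identifiers are hypothetical, the assignment of the broken device to Martha Musterfrau follows the scenario described earlier, and restricting "support" to friends at the same location who own a working device is my interpretation of the second query.

```python
# Nodes: master data objects with their latest IoT state attached.
customers = {
    "max": {"name": "Max Mustermann", "location": "Central Station, Frankfurt, Germany"},
    "martha": {"name": "Martha Musterfrau", "location": "Central Station, Frankfurt, Germany"},
}
devices = {
    "device-1": {"state": "Normal"},
    "device-2": {"state": "Broken"},
}

# Edges: relations between master data objects.
owns = {("max", "device-1"), ("martha", "device-2")}
friends = {frozenset({"max", "martha"})}


def owners_of_broken():
    """Names of customers who own a device in state 'Broken'."""
    return sorted(customers[c]["name"] for c, d in owns if devices[d]["state"] == "Broken")


def nearby_helpers():
    """(helper, owner) pairs: a friend at the same location with a working device."""
    helpers = set()
    for owner, device in owns:
        if devices[device]["state"] != "Broken":
            continue
        for pair in friends:
            if owner not in pair:
                continue
            (friend,) = pair - {owner}  # the other end of the friendship edge
            same_place = customers[friend]["location"] == customers[owner]["location"]
            has_working = any(
                c == friend and devices[d]["state"] == "Normal" for c, d in owns
            )
            if same_place and has_working:
                helpers.add((customers[friend]["name"], customers[owner]["name"]))
    return sorted(helpers)


print(owners_of_broken())
print(nearby_helpers())
```

Both queries follow relations (ownership, friendship) and then filter on IoT state, which is the access pattern that graph storage and indexing are designed for.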

Additional information from IoT data enables superior data quality. For instance, we can properly identify customers and devices. This avoids costly maintenance of working devices or costly replacement of non-working ones.

Such a system also enables enhanced sales to customers, because more information allows more targeted advertising and more customization. Based on prediction models, one can offer completely new value-added services.


Master Data Management enters a new era: new database technologies and the Internet of Things enable superior data quality and open up new business cases, such as predictive analytics. Ultimately, this leads to new business processes offering superior value.

Nevertheless, only a few MDM solutions leverage these new technologies so far, although the technologies are already quite mature. Additionally, the Internet of Things has to become more pervasive, and organizations need to pressure their suppliers and customers to engage more with it.

