Process Mining as Computational Process Intelligence
A formal and technical view of event-log analysis, object-centric process models, conformance checking, performance diagnosis, and enterprise applications
Process mining reconstructs operational behavior from event data rather than from procedural descriptions, making it a computational discipline for observing how enterprises actually execute their processes. This article develops process mining from first principles, starting from event-log construction, trace semantics, directly-follows relations, Petri-net models, variant analysis, behavioral entropy, conformance checking, performance mining, and Little’s Law. It then extends the classical case-centric model toward object-centric process mining, where events are related to multiple enterprise objects such as orders, deliveries, invoices, payments, materials, suppliers, and resources. The article frames process mining as a data-engineering and enterprise-architecture problem, because the validity of any discovered model depends on the semantic correctness of the event-construction function. It also discusses applications in order-to-cash, procure-to-pay, inventory, manufacturing, finance, audit, ERP migration, and digital transformation. Finally, it examines Celonis as a commercial instantiation of process intelligence and positions process mining as an architectural feedback mechanism for comparing intentional enterprise architecture with observed behavioral execution.
enterprise architecture
tutorial
🇬🇧
Author
Affiliation
Antonio Montano
4M4
Published
October 22, 2023
Modified
January 17, 2026
Keywords
process mining, event logs, computational process intelligence, process discovery, conformance checking, performance mining, object-centric process mining, OCEL, XES, Petri nets, directly-follows graph, behavioral entropy, process variants, ERP, business process management, enterprise architecture, data engineering, process intelligence, task mining, process automation, operational excellence, digital transformation, internal control, audit, supply chain, order-to-cash, procure-to-pay
From process descriptions to computational evidence
A business process is usually described as a normative abstraction: a purchase order is created, approved, received, invoiced, matched, and paid; a sales order is entered, confirmed, delivered, invoiced, and collected; a production order is planned, released, executed, confirmed, and closed. This representation is useful for governance, but it is not a measurement of operational reality. It is a model of intended behavior.
Process mining starts from a stricter premise. A process is not what a procedure says. A process is the set of temporally ordered state transitions that can be reconstructed from digital traces left by operational systems.
%%{init: {"theme": "neo", "look": "handDrawn", "layout": "elk"}}%%
flowchart TD
D["Raw operational data<br/>ERP, CRM, MES, WMS, logs"] --> PHI["Semantic extraction<br/>φ : D → E"]
PHI --> E["Event log or object-centric event structure"]
E --> A["Mining algorithms<br/>discovery, conformance, performance"]
A --> I["Process model and diagnostics"]
I --> G["Governed decisions and actions"]
G -. feedback .-> D
Figure 1: Process mining as an evidence pipeline. Raw operational data from enterprise systems are transformed into event logs or object-centric event structures, analyzed through process-mining algorithms, converted into diagnostics, and used to support governed decisions and actions.
The primitive object is therefore not a diagram, a procedure, or a workshop narrative. The primitive object is an event.
An event is an observed fact of the form:
e = (i, a, t, r, x)
where:
i is a case identifier or, in a more general formulation, a set of related business objects;
a is the activity or state transition observed;
t is a timestamp;
r is the resource, system, organizational unit, or agent responsible for the event;
x is a vector of attributes describing the event context.
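A minimal sketch of this event tuple as a Python record may help fix the notation. Field names and the example values are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

@dataclass(frozen=True)
class Event:
    """One observed fact e = (i, a, t, r, x)."""
    case_id: str                     # i: case identifier or related object key
    activity: str                    # a: observed activity / state transition
    timestamp: datetime              # t: when the transition was recorded
    resource: str                    # r: user, system, or organizational unit
    attributes: dict[str, Any] = field(default_factory=dict)  # x: context vector

# Hypothetical example event
e = Event("PO-450001", "Purchase Order Approved",
          datetime(2023, 10, 2, 14, 31), "buyer.rossi",
          {"plant": "IT01", "amount": 12500.0})
```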
The fundamental assumption is minimal: if an enterprise system records enough events with reliable ordering, then the operational process can be reconstructed empirically. The reconstructed process may contradict the designed process. That contradiction is not an error of the analytical method; it is the main object of analysis.
This shift is epistemologically important. Traditional process modeling is top-down and intentional. Process mining is bottom-up and evidential. The former asks what the organization believes should happen. The latter asks what the information systems prove has happened.
The technical discipline of process mining emerged around this exact distinction: event data are treated as first-class analytical objects, and process knowledge is extracted from event logs rather than inferred only from interviews, diagrams, or procedural documentation.12
Figure 2: Case-centric event-log structure. An event log is represented as a collection of traces, where each trace corresponds to one case and contains an ordered sequence of events with activity, timestamp, resource, and contextual attributes.
In classical case-centric process mining, an event log can be defined as a multiset of traces.3
L \in \mathcal{B}(\mathcal{A}^{*})
where:
\mathcal{A} is the finite set of activities;
\mathcal{A}^{*} is the set of all finite sequences over \mathcal{A};
\mathcal{B}(X) denotes the set of multisets over X.
A trace is a finite sequence:
\sigma = \langle a_1, a_2, \ldots, a_n \rangle
where each a_k \in \mathcal{A} corresponds to an activity executed for the same case.
More rigorously, an event log can be defined as a finite set of event identifiers E equipped with attribute functions:
case : E \rightarrow C,\quad act : E \rightarrow \mathcal{A},\quad time : E \rightarrow T,\quad res : E \rightarrow R,\quad attr : E \rightarrow X
where:
C is the set of cases;
\mathcal{A} is the activity alphabet;
T is a totally or partially ordered time domain;
R is the set of resources;
X is the attribute domain.
This formulation preserves event identity even when two events share the same case, activity, timestamp, resource, and attribute vector.
For a case c \in C, the observed trace is obtained by selecting all events associated with c and ordering them by timestamp:
\sigma_c = \langle act(e_1), act(e_2), \ldots, act(e_n) \rangle, \quad \text{where } \{e_1, \ldots, e_n\} = \{e \in E : case(e) = c\} \text{ and } time(e_1) \leq time(e_2) \leq \cdots \leq time(e_n)
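As a small illustration, and assuming the Event sketch introduced earlier, observed traces can be derived by grouping events per case and sorting by timestamp. The sketch deliberately ignores the timestamp and correlation problems listed below:

```python
from collections import defaultdict

def traces_by_case(events: list[Event]) -> dict[str, list[str]]:
    """Group events by case identifier and order each group by timestamp."""
    by_case: dict[str, list[Event]] = defaultdict(list)
    for ev in events:
        by_case[ev.case_id].append(ev)
    return {
        case: [ev.activity for ev in sorted(evs, key=lambda ev: ev.timestamp)]
        for case, evs in by_case.items()
    }
```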
This apparently simple definition hides several non-trivial problems:
Timestamps may be non-unique, inconsistent, delayed, batch-generated, or produced by different systems with different clock semantics.
The same business event may be represented by several technical records.
The case identifier may be ambiguous.
Some relevant events may be missing because they occur outside the transactional system, for example in emails, spreadsheets, manual controls, warehouse terminals, MES systems, supplier portals, banking systems, or desktop activity.
The quality of process mining is therefore bounded by the quality of event abstraction. Formally, if the source operational data are represented by a set of raw records D, event-log construction is a mapping:
\phi : D \rightarrow L_E
where L_E denotes the constructed event-log structure, including event identifiers, case assignments, activities, timestamps, resources, and attributes.
The mining problem is not only the discovery of a process model from L_E. It is also the correctness of \phi. A poor extraction function can create an apparently precise but semantically false process model.
This is why process mining is not merely an algorithmic discipline. It is also a data-engineering, semantic-modeling, and enterprise-architecture discipline.
Process discovery
Process discovery is the construction of a process model M from an event log L:
discover(L) = M
The model M can belong to several formalisms:
directed graphs;
directly-follows graphs;
Petri nets;
process trees;
BPMN-like models;
declarative constraint systems;
stochastic automata;
object-centric graphs.
The simplest representation is the directly-follows graph. Given an event log L, one defines a relation:
a >_L b
if there exists at least one trace \sigma \in L such that activity a is immediately followed by activity b in \sigma.
The directly-follows graph collects these relations into an edge set:
F = \{(a,b) \in \mathcal{A} \times \mathcal{A} : a >_L b\}
The frequency of an edge (a,b) is the number of observed adjacent occurrences of a followed by b across the multiset of traces:
freq_L(a,b) = \sum_{\sigma \in L} |\{\, k : 1 \leq k < |\sigma|,\ \sigma_k = a,\ \sigma_{k+1} = b \,\}|
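A minimal sketch of the directly-follows relation in Python, counting adjacent activity pairs over a list of traces (each trace a list of activity labels):

```python
from collections import Counter

def directly_follows(traces: list[list[str]]) -> Counter:
    """Count adjacent occurrences (a, b): a immediately followed by b."""
    freq: Counter = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            freq[(a, b)] += 1
    return freq

dfg = directly_follows([
    ["order created", "order confirmed", "delivery created", "invoice posted"],
    ["order created", "delivery created", "order confirmed", "invoice posted"],
])
# Edge set F is set(dfg); edge frequencies are the counter values.
```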
This model is computationally simple and operationally useful. It shows frequent paths, rare paths, loops, skips, and variant structures. However, it is not always semantically precise. It may confuse concurrency with choice, and it may over-represent spurious behavior if the event log contains noise or incomplete traces.
For example, if two activities a and b are concurrent, one may observe both sequences:
\langle a,b \rangle
and:
\langle b,a \rangle
A directly-follows graph may represent this as two causal paths, even though the correct interpretation may be parallelism rather than mutual causality.
More expressive discovery algorithms therefore try to infer constructs such as sequence, choice, concurrency, loop, silent transition, and synchronization. In Petri-net terms4, a discovered model can be represented as:
N = (P, T_N, F, M_0, M_f)
where:
P is the set of places;
T_N is the set of transitions;
F \subseteq (P \times T_N) \cup (T_N \times P) is the flow relation;
M_0 is the initial marking;
M_f is the final marking.
The value of Petri nets is that they provide formal execution semantics. A transition is enabled only when the required input tokens exist. This makes it possible to reason rigorously about reachability, deadlocks, soundness, concurrency, and conformance.
Variants and behavioral entropy
A process variant is a distinct trace pattern. If L is an event log, the set of variants is:
V(L) = \{\sigma : \sigma \in supp(L)\}
where supp(L) is the support of the multiset L.
The probability of a variant \sigma is:
p(\sigma) = \frac{count_L(\sigma)}{|L|}
A useful measure of process fragmentation is behavioral entropy:
H(L) = -\sum_{\sigma \in V(L)} p(\sigma) \log_2 p(\sigma)
Low entropy indicates that the process is concentrated around a small number of dominant variants. High entropy indicates behavioral dispersion. In enterprise processes, high entropy is not automatically bad. It may reflect product diversity, customer segmentation, legal-entity differences, localization, exception handling, or legitimate operational flexibility. However, high entropy without business explanation is a signal of weak standardization or uncontrolled exception handling.
A mathematically disciplined interpretation requires conditioning. Let Z be a set of contextual attributes, such as legal entity, business unit, product family, supplier type, customer segment, country, plant, warehouse, or channel. Then one should compare:
H(L)
with:
H(L \mid Z)
If entropy collapses after conditioning on Z, the variation is structurally explained. If entropy remains high within homogeneous partitions, the variation is likely operational rather than structural.
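A sketch of both quantities, treating a variant as a tuple of activity labels and Z as one categorical context value per trace (the conditional term is the share-weighted entropy within each partition):

```python
import math
from collections import Counter

def variant_entropy(traces: list[tuple[str, ...]]) -> float:
    """H(L): Shannon entropy (bits) of the variant distribution."""
    counts = Counter(traces)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def conditional_entropy(traces: list[tuple[str, ...]], context: list[str]) -> float:
    """H(L | Z): entropy within each context partition, weighted by its share."""
    by_ctx: dict[str, list[tuple[str, ...]]] = {}
    for trace, z in zip(traces, context):
        by_ctx.setdefault(z, []).append(trace)
    n = len(traces)
    return sum(len(part) / n * variant_entropy(part) for part in by_ctx.values())
```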
This distinction matters in ERP transformation. A multinational group may show hundreds of order-to-cash variants globally. Some are legitimate because fiscal, logistic, or customer-channel conditions differ. Others are pure organizational noise. Process mining becomes valuable when it separates these two cases.
Conformance checking
Discovery asks: what model can be inferred from the event log? Conformance checking asks a different question: given a normative model M and an event log L, how well does the observed behavior conform to the expected behavior?
%%{init: {"theme": "neo", "look": "handDrawn", "layout": "elk"}}%%
flowchart TD
M["Normative process model M"] --> LM["Allowed behavior<br/>𝓛(M)"]
LOBS["Observed event log<br/>L_obs"] --> ALIGN["Alignment computation"]
LM --> ALIGN
ALIGN --> FIT["Fitness<br/>Can the model replay the log?"]
ALIGN --> PREC["Precision<br/>Does the model allow too much?"]
ALIGN --> DEV["Deviation diagnostics<br/>skips, inserts, loops, wrong order"]
DEV --> ACT["Control remediation<br/>process, system, data, training"]
Figure 3: Conformance checking through alignment. The observed event log is compared with the behavior allowed by the normative process model. The alignment computation produces fitness, precision, deviation diagnostics, and remediation inputs for process, system, data, or training interventions.
Formally, conformance checking evaluates the relation between:
L_{obs}
and:
\mathcal{L}(M)
where \mathcal{L}(M) is the language of all traces allowed by model M.
The simplest view is language membership:
\sigma \in \mathcal{L}(M)
If \sigma is allowed by the model, it is conformant. If it is not, it deviates. In real processes, however, this binary view is too crude. A trace may deviate slightly or severely. It may skip one activity, execute activities in the wrong order, repeat a control step, or contain additional manual interventions.
A more useful formulation is alignment-based conformance. Given an observed trace \sigma and a model M, an alignment is a sequence of moves that synchronizes observed events with model transitions. A synchronous move means that the event and the model agree. A log move means that the event occurred but the model did not expect it. A model move means that the model expected an activity that was not observed.
The optimal alignment minimizes a cost function:
cost(\gamma) = \sum_{m \in \gamma} c(m)
where \gamma is an alignment and c(m) is the cost assigned to each move type. The deviation distance is then:
d(\sigma, M) = \min_{\gamma \in Align(\sigma,M)} cost(\gamma)
Fitness can be interpreted as the degree to which the model can replay the log:
fitness(\sigma, M) = 1 - \frac{d(\sigma, M)}{d_{max}(\sigma, M)}
Here, d_{max}(\sigma,M) denotes a normalizing maximum deviation cost for trace \sigma with respect to model M, so that the resulting fitness score can be interpreted on a bounded scale.
Precision measures the inverse problem: whether the model allows too much behavior that was never observed. A model with perfect fitness but poor precision may be so permissive that it accepts almost everything. Such a model is not analytically useful.
Generalization measures whether the model captures behavior likely to occur beyond the observed sample. Simplicity penalizes unnecessary structural complexity. Process discovery is therefore not a single-objective optimization problem. It is a trade-off among fitness, precision, generalization, and simplicity.
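As a hedged illustration of the alignment idea, assume the special case in which \mathcal{L}(M) is given explicitly as a finite set of reference traces and all log and model moves have unit cost. The optimal alignment cost then reduces to an insert/delete edit distance, and fitness follows from the normalization described above:

```python
def alignment_cost(observed: list[str], reference: list[str],
                   log_move: float = 1.0, model_move: float = 1.0) -> float:
    """Minimal cost of aligning one observed trace with one reference trace.

    Synchronous moves cost 0; a log move skips an observed event,
    a model move inserts an expected but unobserved activity.
    """
    n, m = len(observed), len(reference)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]  # dp[i][j]: observed[:i] vs reference[:j]
    for i in range(1, n + 1):
        dp[i][0] = i * log_move
    for j in range(1, m + 1):
        dp[0][j] = j * model_move
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sync = dp[i - 1][j - 1] if observed[i - 1] == reference[j - 1] else float("inf")
            dp[i][j] = min(sync, dp[i - 1][j] + log_move, dp[i][j - 1] + model_move)
    return dp[n][m]

def deviation_distance(observed: list[str], language: list[list[str]]) -> float:
    """d(sigma, M) when the model language is an explicit finite set of traces."""
    return min(alignment_cost(observed, ref) for ref in language)
```

Trace-level fitness can then be computed as 1 - d / d_max, where d_max is the cost of an alignment with no synchronous moves at all. Real alignment algorithms search the state space of a Petri net rather than an enumerated language, so this sketch only illustrates the cost structure, not the general procedure.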
Performance mining
Process mining becomes operationally powerful when control-flow analysis is combined with temporal and economic measures. Let e_i and e_j be two events in the same trace such that t(e_i) < t(e_j). The elapsed time between them is:
\Delta t(e_i,e_j) = t(e_j) - t(e_i)
For two activities a and b, one can define the empirical waiting-time multiset:
W_L(a,b) = [\, \Delta t(e_i, e_j) : act(e_i) = a,\ act(e_j) = b,\ e_i \rightarrow e_j \,]
where e_i \rightarrow e_j denotes the selected behavioral relation, usually direct succession or a domain-specific milestone relation. The mean waiting time is:
\overline{W}(a,b) = \frac{1}{|W_L(a,b)|} \sum_{w \in W_L(a,b)} w
In practice, the distribution is often more informative than the mean. Enterprise waiting times are typically heavy-tailed. A small number of extreme cases may dominate the average. Median, quantiles, conditional distributions, and survival functions are therefore more robust.
For a process instance c, cycle time can be defined as:
CT(c) = t_{end}(c) - t_{start}(c)
Throughput is:
TH = \frac{N}{T}
where N is the number of completed cases in observation window T.
Work in progress can be estimated as:
WIP(t) = |\{c : t_{start}(c) \leq t < t_{end}(c)\}|
Under standard stability and flow-conservation assumptions, Little’s Law provides a fundamental consistency relation:5
\overline{WIP} = TH \cdot \overline{CT}
This relation is useful because it prevents superficial interpretations. If throughput is constant and WIP grows, then cycle time must grow. If the organization wants shorter cycle time without reducing input volume, it must either increase capacity, reduce rework, reduce queues, or change the flow structure.
Process mining gives empirical estimates of these quantities per process segment, resource, product family, legal entity, supplier class, plant, warehouse, or customer group.
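A minimal sketch of these flow metrics, assuming each case is summarized by its start and end timestamps and the observation window spans at least one full day; the last returned value is the Little's Law consistency check on the same window:

```python
from datetime import datetime, timedelta

def flow_metrics(cases: dict[str, tuple[datetime, datetime]],
                 window_start: datetime, window_end: datetime) -> dict[str, float]:
    """Average cycle time (days), throughput (cases/day), and average WIP."""
    days = (window_end - window_start).total_seconds() / 86400
    completed = [(s, e) for s, e in cases.values() if window_start <= e < window_end]
    cycle_times = [(e - s).total_seconds() / 86400 for s, e in completed]
    avg_ct = sum(cycle_times) / len(cycle_times) if cycle_times else 0.0
    throughput = len(completed) / days

    # Average WIP, sampled once per day: cases started but not yet finished.
    samples, t = [], window_start
    while t < window_end:
        samples.append(sum(1 for s, e in cases.values() if s <= t < e))
        t += timedelta(days=1)
    avg_wip = sum(samples) / len(samples)

    return {"avg_cycle_time_days": avg_ct,
            "throughput_per_day": throughput,
            "avg_wip": avg_wip,
            "littles_law_wip": throughput * avg_ct}  # should roughly match avg_wip
```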
Object-centric process mining
The classical event-log model assumes one dominant case identifier. This assumption is often false in enterprise systems.
Consider procure-to-pay. A single purchase requisition can generate multiple purchase orders. A purchase order can contain multiple lines. A line can be partially received. Several receipts can be matched to one invoice. One invoice can refer to several purchase orders. One payment can settle multiple invoices. The process is not a sequence over one object. It is a network of interacting objects. The same problem appears in order-to-cash, make-to-order, engineer-to-order, maintenance, logistics, warehouse management, project accounting, and intercompany flows.
%%{init: {"theme": "neo", "look": "handDrawn", "layout": "elk"}}%%
flowchart TD
E1(("Event<br/>Invoice posted"))
INV["Invoice<br/>INV-9981"] --- E1
PO["Purchase Order<br/>PO-450001"] --- E1
POL["Purchase Order Line<br/>PO-450001-10"] --- E1
SUP["Supplier<br/>SUP-210"] --- E1
LE["Legal Entity<br/>IT01"] --- E1
PO --- POL
PO --- SUP
POL --- INV
INV --- LE
Figure 4: Object-centric event structure. A single business event, such as invoice posting, can be related to several enterprise objects at the same time, including invoices, purchase orders, purchase order lines, suppliers, and legal entities.
The object-centric formulation replaces the case-centric trace with a typed object-event structure. Let:
O = \bigcup_{\tau \in \mathcal{T}} O_{\tau}
be the set of business objects partitioned by object type \tau, such as Sales Order, Sales Order Line, Delivery, Invoice, Payment, Purchase Order, Production Order, Batch, Shipment, Handling Unit, or Service Ticket.
An event is associated not with one case but with a set of objects:
rel(e) \subseteq O
The event log becomes a structure:
L_{OC} = (E, O, type, act, time, rel, attr)
where:
E is the set of events;
O is the set of objects;
type : O \rightarrow \mathcal{T} assigns each object to an object type;
act : E \rightarrow \mathcal{A} assigns an activity to each event;
time : E \rightarrow T assigns a timestamp;
rel \subseteq E \times O relates events to objects;
attr stores event and object attributes.
This structure is naturally a typed temporal hypergraph. Events are hyperedges connecting multiple business objects. Objects persist. Events occur. Relationships carry the semantics of operational interaction.
The analytical gain is significant. Instead of forcing the organization to choose whether the process is by order, by order line, by delivery, or by invoice, one can analyze several perspectives over the same object-centric model.
For example:
from the Sales Order perspective, one sees customer-commitment behavior;
from the Delivery perspective, one sees logistic execution;
from the Invoice perspective, one sees billing behavior;
from the Payment perspective, one sees cash-collection behavior;
from the Item perspective, one sees material-flow behavior.
Object-centric event logs are therefore more faithful to ERP reality. The OCEL 2.0 standard formalizes this kind of event data and supports richer relationships between events and objects, including object-to-object relationships and multiple exchange formats.6
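A minimal sketch of such a structure and of its flattening onto one object-type perspective, using illustrative Python records rather than the OCEL exchange formats:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

@dataclass
class OCEvent:
    """An event related to several business objects: rel(e) is a set of object ids."""
    event_id: str
    activity: str
    timestamp: datetime
    objects: set[str]
    attributes: dict[str, Any] = field(default_factory=dict)

def flatten(events: list[OCEvent], object_type: dict[str, str],
            perspective: str) -> dict[str, list[str]]:
    """Project the object-centric log onto one object type.

    Each object of the chosen type receives the ordered activities of all
    events it participates in; one event may appear in several traces.
    """
    traces: dict[str, list[str]] = {}
    for ev in sorted(events, key=lambda ev: ev.timestamp):
        for obj in ev.objects:
            if object_type.get(obj) == perspective:
                traces.setdefault(obj, []).append(ev.activity)
    return traces
```

The fact that one event can appear in several flattened traces is exactly why the object-centric structure, not any single flattening, should be treated as the primary analytical artifact.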
Data architecture for process intelligence
A process-mining system is not an isolated analytical tool. It is a data product over operational systems.
A robust architecture contains at least seven layers.
Layer 1: source systems
The source layer includes ERP, CRM, MES, WMS, PLM, TMS, service-management systems, banking systems, supplier portals, customer portals, spreadsheets, workflow engines, API gateways, and sometimes desktop-interaction logs.
The essential question is not whether these systems contain data. They do. The essential question is whether they contain reconstructible event semantics.
A database table recording the current status of an order is insufficient unless historical status transitions are retained. Process mining requires temporal evidence, not only current state.
Layer 2: extraction layer
The extraction layer collects raw operational records. This may occur through database replication, APIs, change-data capture, flat-file exports, event streams, audit tables, message queues, or system-specific connectors.
The extraction design must preserve:
timestamp fidelity;
source-system identifiers;
transaction keys;
user and resource metadata;
status-transition history;
deletion and reversal semantics;
time-zone normalization;
legal-entity and organizational context.
Layer 3: semantic transformation layer
The semantic transformation layer maps raw records to event abstractions.
This layer defines the function:
\phi : D \rightarrow E
It decides that a row in a change-log table corresponds to an event such as “Purchase Order Approved”, “Sales Order Blocked”, “Delivery Created”, “Invoice Posted”, “Goods Receipt Reversed”, or “Payment Cleared”.
This layer is the most critical one. If it is wrong, all subsequent process analysis becomes formally valid but operationally false.
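A sketch of what an explicit, reviewable \phi can look like: a governed rule table mapping raw change-log rows to business events. Table names, field codes, and row fields below are hypothetical; real ERP change documents differ per system:

```python
from datetime import datetime

# Hypothetical mapping from (table, field, new_value) to a business event name.
EVENT_RULES = {
    ("PO_HEADER", "STATUS", "APPROVED"): "Purchase Order Approved",
    ("PO_HEADER", "STATUS", "BLOCKED"):  "Purchase Order Blocked",
    ("INVOICE",   "STATUS", "POSTED"):   "Invoice Posted",
}

def phi(raw_rows: list[dict]) -> list[dict]:
    """Map raw change-log rows to event records.

    Rows that match no rule are skipped; deciding what is skipped is itself
    a governed semantic decision, not a technical detail.
    """
    events = []
    for row in raw_rows:
        activity = EVENT_RULES.get((row["table"], row["field"], row["new_value"]))
        if activity is None:
            continue
        events.append({
            "case_id": row["object_id"],
            "activity": activity,
            "timestamp": datetime.fromisoformat(row["changed_at"]),
            "resource": row["changed_by"],
        })
    return events
```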
Layer 4: process data store
The process data store persists event logs, object tables, relationships, attributes, and derived metrics. In a case-centric model, this may be a set of event-log tables. In an object-centric model, this becomes a graph-like or relational structure connecting objects, events, and relationships.
The data store should support incremental refresh, historical reproducibility, lineage, versioning, and auditable transformation logic.
Layer 5: analytical layer
The analytical layer executes discovery, conformance checking, performance mining, variant analysis, root-cause analysis, predictive models, and simulation.
The output is not merely a graph. The output is a set of hypotheses about operational behavior, such as:
late deliveries are concentrated in specific product families;
invoice blocks correlate with missing purchase-order references;
maverick buying correlates with specific plants or buyers;
rework loops appear after manual price changes;
customer-credit blocks increase cycle time but reduce exposure;
intercompany flows create hidden waiting time between legal entities.
Layer 6: decision layer
The decision layer converts analytical signals into managerial or operational interventions. These may include alerts, exception queues, workflow tasks, control recommendations, automation triggers, policy changes, master-data remediation, or process redesign.
The decision layer must distinguish between correlation and intervention. A bottleneck signal is not automatically a prescription. It becomes a prescription only when the causal mechanism is understood well enough to justify action.
Layer 7: governance layer
The governance layer defines data ownership, process ownership, access control, semantic standards, metric definitions, auditability, and change management.
Without governance, process mining degenerates into visually persuasive analytics with unstable semantics. With governance, it becomes a computational representation of enterprise operations.
Potential applications
Order-to-cash
Order-to-cash is one of the most natural application areas because it combines commercial commitments, logistics, invoicing, credit management, and cash collection.
Relevant objects include:
customer;
quotation;
sales order;
sales order line;
delivery;
shipment;
invoice;
credit memo;
payment;
dispute.
Relevant events include:
quotation created;
order created;
order confirmed;
credit block applied;
block released;
delivery created;
goods issued;
invoice posted;
payment received;
dispute opened;
dispute closed.
Typical analyses include:
cycle time from order creation to delivery;
cycle time from delivery to invoice;
days sales outstanding;
frequency and duration of credit blocks;
rework caused by order changes;
delivery delays by warehouse, carrier, item, customer segment, or legal entity;
revenue leakage from billing errors;
exception paths that bypass standard approval.
In an ERP transformation, this analysis can validate whether the target operating model is actually being adopted after go-live.
Procure-to-pay
Procure-to-pay is structurally suitable for conformance checking because procurement processes usually contain explicit control expectations.
Relevant objects include:
purchase requisition;
purchase order;
purchase order line;
supplier;
goods receipt;
service entry sheet;
invoice;
payment;
contract.
Relevant events include:
requisition created;
requisition approved;
purchase order created;
purchase order approved;
goods received;
invoice received;
invoice blocked;
invoice released;
payment executed.
Typical analyses include:
maverick buying;
purchase orders created after invoice receipt;
missing approvals;
three-way match deviations;
invoice blocks by supplier or buyer;
payment-term deviations;
late payment risk;
contract leakage;
approval bottlenecks.
The mathematical value lies in comparing observed traces with control models. A deviation is not a subjective finding. It is a measurable distance between L_{obs} and \mathcal{L}(M).
Inventory and supply-chain planning
Inventory processes are not always modeled as clean workflows, but they generate rich event data. Object-centric process mining is especially relevant because inventory behavior emerges from interactions among items, warehouses, production orders, purchase orders, sales orders, transfer orders, forecasts, and replenishment policies.
Relevant analyses include:
stockout root-cause chains;
excess stock caused by forecast changes or minimum-order quantities;
lead-time variability by supplier and item;
replenishment-cycle deviations;
warehouse transfer delays;
blocked or quality-inspection stock;
planning nervousness;
mismatch between planning parameters and actual execution.
A process-mining layer can reveal whether the planning model is structurally coherent. If reorder points, safety stocks, lead times, and coverage groups are configured in one way but the actual replenishment process behaves differently, the difference becomes measurable.
Manufacturing and MES/ERP integration
Manufacturing processes combine ERP-level production orders with MES-level execution events and machine-level telemetry. The relevant process is not only administrative. It is cyber-physical.
Relevant objects include:
production order;
batch;
operation;
work center;
machine;
material lot;
quality inspection;
nonconformance;
maintenance order.
Relevant events include:
production order released;
material staged;
operation started;
operation completed;
machine stopped;
scrap recorded;
quality inspection completed;
rework initiated;
order technically completed.
Potential analyses include:
waiting time between release and execution;
queue time by work center;
rework loops;
scrap correlation with machine, shift, material lot, or operator;
divergence between planned and actual routing;
bottlenecks across production cells;
impact of maintenance events on order completion.
The key technical challenge is synchronization across time domains. ERP, MES, SCADA, and historian systems do not necessarily share the same event granularity or timestamp semantics.
Finance and working capital
Finance processes are attractive because their economic effect can be quantified directly.
Applications include:
invoice processing;
accounts payable;
accounts receivable;
dispute management;
cash application;
closing processes;
intercompany reconciliation.
Metrics include:
invoice cycle time;
blocked-invoice aging;
discount capture rate;
payment-term compliance;
overdue receivables;
dispute resolution time;
manual journal-entry frequency;
month-end closing bottlenecks.
The value is not only operational efficiency. It is liquidity, risk reduction, compliance, and working-capital optimization.
Compliance, internal control, and audit
Process mining can support internal controls by replacing sampling-based evidence with population-level behavioral analysis.
Examples include:
segregation-of-duties violations;
skipped approvals;
order changes after approval;
payments without proper invoice match;
manual price overrides;
goods receipt after invoice;
supplier-bank-account changes before payment;
emergency-user activity;
retrospective purchase-order creation.
The relevant mathematical concept is not discovery but conformance. The control model defines admissible behavior. The event log shows observed behavior. Deviations can be measured, ranked, and assigned to owners.
This does not eliminate audit judgment. It changes the empirical basis of audit judgment.
Digital transformation and ERP migration
In ERP programs, process mining can be used before, during, and after transformation.
Before transformation, it reconstructs the as-is process from actual data. This avoids designing the target model from workshop narratives alone.
During transformation, it supports fit-gap analysis. If the actual process contains many variants, the transformation team can distinguish necessary variants from historical noise.
After go-live, it measures adoption. The organization can test whether users follow the new process, whether legacy workarounds reappear, whether master-data issues create new bottlenecks, and whether local entities diverge from the template.
The most rigorous use is to treat process mining as a transformation-control system. The target operating model becomes a formal model M. The live ERP event log becomes L_t. Adoption is measured as a time-dependent conformance function:
A(t) = conformance(L_t, M)
where A(t) should increase after stabilization if the transformation is effective.
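A sketch of this adoption curve as a windowed average of trace fitness against the target model, reusing the simplified deviation_distance from the conformance sketch (finite reference language, unit move costs):

```python
from datetime import datetime

def adoption_curve(cases: list[tuple[datetime, list[str]]],
                   language: list[list[str]],
                   windows: list[tuple[datetime, datetime]]) -> list[float]:
    """A(t): mean trace fitness per time window after go-live.

    `cases` holds (completion time, observed trace); fitness uses the
    deviation_distance() sketched in the conformance section.
    """
    curve = []
    for start, end in windows:
        traces = [trace for done, trace in cases if start <= done < end]
        scores = []
        for trace in traces:
            d = deviation_distance(trace, language)
            d_max = len(trace) + min(len(ref) for ref in language)  # no synchronous moves
            scores.append(1 - d / d_max if d_max else 1.0)
        curve.append(sum(scores) / len(scores) if scores else float("nan"))
    return curve
```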
Limits and failure modes
Process mining is powerful, but its limits are structural:
It cannot see events that are not recorded. If a decision occurs in a phone call, spreadsheet, email, or informal meeting, the process model will only see its downstream trace.
It cannot infer semantics from identifiers alone. A timestamped status change does not explain why the status changed.
It can confuse correlation with causation. If late deliveries correlate with a certain warehouse, the warehouse may be the cause, or it may simply handle more complex orders.
Process mining can produce false managerial confidence. A graph generated from poor event abstraction may look objective while encoding wrong assumptions.
Excessive granularity can obscure process logic. Not every technical event should become a business event. The event model must be abstract enough to represent business semantics and concrete enough to preserve causal evidence.
Privacy and surveillance risks must be governed. The closer process mining moves toward user-level task mining or desktop interaction capture, the more it enters the domain of labor monitoring, privacy regulation, employee relations, and proportionality analysis.
Event logs may encode organizational bias. If a system records only formal approvals, only successful transactions, or only activities executed by certain roles, the resulting process model may overrepresent governed behavior and underrepresent informal recovery work, shadow processes, or exception handling. The observed log is therefore not the process itself; it is the projection of the process through the instrumentation layer of the enterprise.
Celonis as a commercial instantiation
Celonis can be understood as a commercial platform that industrializes the architecture described above. It is not merely a process-discovery tool. It packages process-data ingestion, process modeling, object-centric process analysis, dashboards, business applications, task mining, automation, and a process-intelligence semantic layer into a managed enterprise platform.
Celonis currently presents its platform around the concept of a Process Intelligence Graph. This graph combines data from operational systems, applications, and devices with business context such as rules, KPIs, benchmarks, models, and enterprise architecture, with the objective of creating a system-agnostic digital twin of enterprise operations.7
From a technical perspective, the most important architectural point is the shift from isolated, case-centric process logs toward an object-centric data model. Celonis documentation describes the object-centric model as a unified structure of business objects, events, and relationships, avoiding the need to force complex enterprise processes into a single case notion.8
This direction is coherent with the mathematical limitations of classical process mining. Real ERP processes are not linear traces attached to one identifier. They are networks of interacting objects: orders, order lines, deliveries, invoices, payments, materials, suppliers, customers, plants, warehouses, and users. An object-centric architecture is therefore not only a product feature. It is a better computational approximation of enterprise reality.
Celonis also extends process mining toward task mining. In this layer, user interactions with applications and web pages can be captured and analyzed, subject to configuration and governance controls.9 This moves the analytical boundary from transactional system events to human-computer interaction traces. The gain is visibility into manual work. The risk is that the system moves closer to employee-monitoring infrastructure and therefore requires strict privacy, labor-law, and governance assessment.
Another relevant extension is the action layer. Celonis Action Flows are designed to create automations and combine them into processes.10 This changes the role of process mining from descriptive analytics to operational intervention. The architecture becomes a closed loop: operational events are observed, deviations and bottlenecks are diagnosed, actions are triggered in the underlying systems, and the resulting behavior is observed again.
The strategic implication is that Celonis should not be evaluated only as a visualization tool. It should be evaluated as a process-intelligence operating layer over enterprise systems. The hard questions are therefore architectural and organizational:
Which systems are authoritative for process events?
Which business objects define the object-centric model?
Which KPIs are governed and reusable?
Which deviations are merely descriptive and which trigger action?
Which actions can be automated without violating control principles?
Which process owners are accountable for model correctness?
Which parts of the organization are allowed to see user-level or resource-level data?
Which process improvements are economically material?
In this sense, Celonis is best interpreted as an enterprise process-intelligence platform rather than as a simple process-mining application. Its value depends less on the beauty of discovered process maps and more on the quality of the semantic layer, the correctness of event extraction, the maturity of process ownership, and the ability to convert evidence into controlled operational change.
Process mining in enterprise architecture and digital transformation
Process mining should not be interpreted as an isolated analytical technique. In a mature enterprise, it is more correctly understood as a computational layer of enterprise architecture: a mechanism that connects operating-model intent, information-system behavior, organizational accountability, and measurable execution evidence.
Enterprise architecture traditionally represents the enterprise through relatively stable artifacts: capability maps, process models, application landscapes, information models, integration views, organizational structures, and governance principles. These artifacts describe what the enterprise is supposed to be. They define the intentional architecture.
Process mining adds a complementary dimension. It reconstructs the behavioral architecture of the enterprise from event data. It shows how the enterprise actually executes, where control paths diverge from designed procedures, where systems introduce latency, where local variants emerge, and where organizational units use the same application landscape in materially different ways.
The relation can be expressed as follows:
EA_{intent} = (C, P, A, D, O, G)
where:
C is the capability architecture;
P is the process architecture;
A is the application architecture;
D is the data architecture;
O is the organizational architecture;
G is the governance architecture.
Process mining introduces an empirical behavioral layer:
EA_{observed} = f(E)
where E is the event structure extracted from operational systems.
The architectural problem is then not merely to discover a process model. The real problem is to compare intentional architecture with observed architecture:
\Delta EA = EA_{observed} - EA_{intent}
This difference is one of the most important objects of digital transformation. It identifies where the designed enterprise and the executed enterprise diverge.
%%{init: {"theme": "neo", "look": "handDrawn", "layout": "elk"}}%%
flowchart LR
subgraph INTENT["Intentional Enterprise Architecture"]
C["Capability architecture"]
P["Process architecture"]
A["Application architecture"]
D["Data architecture"]
O["Organizational architecture"]
G["Governance architecture"]
end
subgraph OBS["Observed Behavioral Architecture"]
E["Event data"]
PM["Process mining model"]
K["KPIs, variants, deviations, bottlenecks"]
end
C --> DELTA["ΔEA<br/>Gap between designed and executed enterprise"]
P --> DELTA
A --> DELTA
D --> DELTA
O --> DELTA
G --> DELTA
E --> PM --> K --> DELTA
DELTA --> T["Transformation backlog"]
T --> R["Process redesign"]
T --> S["System configuration"]
T --> MD["Master-data remediation"]
T --> AU["Automation"]
T --> GOV["Governance intervention"]
Figure 5: Process mining as an enterprise-architecture feedback mechanism. Intentional architecture defines the designed enterprise, while observed behavioral architecture reconstructs execution from event data. The gap between the two feeds the transformation backlog across process redesign, system configuration, master-data remediation, automation, and governance intervention.
Capability architecture
At the capability level, process mining helps determine whether a business capability is only formally present or operationally mature.
For example, an organization may declare that it has a centralized procurement capability. However, event data may show that purchase orders are frequently created after invoice receipt, that approval paths differ by plant, that contracts are bypassed, or that suppliers are selected outside governed catalogues.
In architectural terms, the capability exists nominally but is not behaviorally instantiated in a controlled and repeatable way.
A capability can therefore be evaluated through process evidence:
Maturity(capability) = g(conformance, stability, cycle\ time, exception\ rate, automation\ rate)
This is more rigorous than assigning maturity through interviews alone. The maturity assessment becomes partially grounded in observed execution.
Process architecture
At the process-architecture level, process mining validates whether documented processes correspond to real operational flows.
The usual enterprise-architecture repository contains designed process models. These models are frequently normative, incomplete, outdated, or over-standardized. Process mining provides an empirical counter-model.
The designed process is:
P_{design}
The mined process is:
P_{observed}
The transformation-relevant object is the distance:
d(P_{design}, P_{observed})
This distance can represent skipped steps, additional activities, loops, local variants, timing deviations, control violations, or different routing patterns.
This matters because digital transformation often fails when the target process is designed as if the current process were already understood. Process mining reduces that epistemic risk. It does not eliminate judgment, but it constrains speculation by requiring evidence from the systems that execute the process.
Application architecture
At the application-architecture level, process mining reveals how systems actually participate in end-to-end execution.
Enterprise architecture diagrams often represent applications as boxes connected by interfaces. This is structurally useful but behaviorally incomplete. It says that two systems are integrated. It does not show whether the integration causes waiting time, rework, duplication, manual reconciliation, or inconsistent state propagation.
Process mining can expose application-level friction such as:
excessive waiting time between CRM quotation and ERP sales-order creation;
manual corrections after system-to-system transfer;
repeated status reversals after middleware failures;
invoice blocks caused by inconsistent master data;
production delays caused by ERP/MES synchronization gaps;
warehouse execution deviations caused by incomplete order attributes.
The application landscape can therefore be evaluated not only by technical ownership or interface count, but by behavioral contribution to process performance.
A useful architectural metric is the latency introduced by an application boundary:
Latency(A_i,A_j) = \mathbb{E}[t(e_j) - t(e_i)]
where e_i is the last relevant event in application A_i and e_j is the first corresponding event in application A_j.
This allows integration architecture to be evaluated empirically. A system boundary is not only a diagrammatic relation. It is a measurable delay, risk, and control point.
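A sketch of this boundary-latency metric, assuming each event record carries an application or source-system field in addition to its timestamp:

```python
from statistics import mean

def boundary_latency(events_by_case: dict[str, list[dict]],
                     app_from: str, app_to: str) -> float | None:
    """Mean latency in hours between the last event recorded in app_from
    and the first subsequent event recorded in app_to, per case."""
    gaps = []
    for evs in events_by_case.values():
        evs = sorted(evs, key=lambda ev: ev["timestamp"])
        last_from = max((ev["timestamp"] for ev in evs if ev["app"] == app_from),
                        default=None)
        if last_from is None:
            continue
        first_to = min((ev["timestamp"] for ev in evs
                        if ev["app"] == app_to and ev["timestamp"] >= last_from),
                       default=None)
        if first_to is not None:
            gaps.append((first_to - last_from).total_seconds() / 3600)
    return mean(gaps) if gaps else None
```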
Data architecture and semantic governance
Process mining depends on the correctness of event semantics. For this reason, it is tightly connected to enterprise data architecture.
The same operational fact may be represented differently across ERP, CRM, MES, WMS, PLM, workflow systems, and data warehouses. A process-mining layer must decide which records become events, which timestamps are authoritative, which identifiers correlate objects, and which attributes define the analytical context.
This makes process mining a semantic-governance problem.
The event-construction function:
\phi : D \rightarrow E
must be governed as an enterprise artifact. It is not a technical detail. It defines the empirical reality that the organization will analyze.
If \phi is unstable, process metrics become unstable. If \phi is semantically wrong, dashboards become misleading. If \phi is locally customized without governance, cross-company or cross-country comparisons become invalid.
Therefore, a process-intelligence architecture requires:
governed event definitions;
canonical business-object identifiers;
lineage from source records to analytical events;
versioned transformation logic;
ownership of process metrics;
explicit treatment of time zones, reversals, cancellations, and status changes;
clear distinction between technical events and business events.
In digital transformation, this is critical. Without semantic governance, process mining becomes another reporting layer. With semantic governance, it becomes part of the enterprise operating model.
Operating model and organizational accountability
A process does not belong only to an application. It belongs to an operating model.
The same process may cross sales, finance, logistics, procurement, production, customer service, compliance, and external partners. Process mining exposes these crossings. It shows where ownership is fragmented, where handovers create delay, where exception handling is informal, and where local autonomy contradicts global standardization.
This has a direct consequence for transformation governance. Every relevant process metric should have an owner. Every relevant deviation class should have an accountable function. Every recurring exception should be classified as either:
a legitimate business variant;
a defect in the process design;
a defect in master data;
a defect in system configuration;
a training or adoption issue;
a control violation;
a necessary exception requiring explicit governance.
This classification is architectural. It determines whether the answer is process redesign, system change, master-data remediation, automation, organizational intervention, or policy enforcement.
ERP transformation and template governance
In ERP programs, process mining is especially valuable because it creates continuity between the as-is state, the target template, and post-go-live stabilization.
Before implementation, it supports empirical as-is discovery. This is useful because workshop-based process collection tends to overstate formal behavior and understate exceptions.
During design, it supports template rationalization. A global ERP template should not encode every local historical variant. However, it should not suppress legitimate legal, fiscal, logistic, or commercial differences. Process mining helps distinguish structural variation from accidental variation.
After go-live, it supports adoption control. The target template can be formalized as a reference model M_T. The live event log after deployment is L_t. Adoption can then be measured as:
Adoption(t) = conformance(L_t, M_T)
This allows the transformation office to observe whether the organization is converging toward the target operating model or recreating old behaviors inside the new system.
The same principle applies to phased rollouts. If different legal entities, plants, or countries adopt the same ERP template, process mining can compare behavioral convergence:
d(L_i, L_j)
where L_i and L_j are event logs from different organizational units.
Low distance suggests template convergence. High distance requires explanation. It may indicate legitimate localization, poor adoption, configuration divergence, different data quality, or process ownership gaps.
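One simple instantiation of d(L_i, L_j) is a distance between the two variant distributions, for example total variation distance; a sketch:

```python
from collections import Counter

def variant_distance(log_i: list[tuple[str, ...]],
                     log_j: list[tuple[str, ...]]) -> float:
    """Total variation distance between the variant distributions of two logs.

    0 means identical variant mixes; 1 means completely disjoint behavior.
    """
    p, q = Counter(log_i), Counter(log_j)
    n_i, n_j = sum(p.values()), sum(q.values())
    variants = set(p) | set(q)
    return 0.5 * sum(abs(p[v] / n_i - q[v] / n_j) for v in variants)
```

Such a distance compares only variant frequencies; timing, resources, and attribute context require complementary measures.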
Automation and architectural control
Process mining often leads naturally to automation. If a repeated exception is detected, the organization may want to trigger a workflow, create a task, send a notification, update a field, release a block, or call an external service.
This must be treated carefully. Automation is not automatically transformation. Automating an incoherent process may only accelerate incoherence.
From an architectural perspective, automation should be introduced only when the causal mechanism behind a detected deviation is understood well enough to justify action and the automated intervention itself remains controlled. The action layer should therefore be governed by enterprise-architecture principles:
clear ownership of automated decisions;
separation between recommendation and execution;
auditability of actions;
compatibility with internal controls;
rollback and exception handling;
security and access-control design;
avoidance of uncontrolled point-to-point automation;
alignment with the canonical process and data model.
This is especially relevant in ERP landscapes, where apparently simple automations can affect financial postings, inventory positions, customer commitments, supplier obligations, credit exposure, or compliance controls.
Process mining as a transformation control system
The strongest architectural interpretation is to treat process mining as a control system for digital transformation.
A transformation defines a desired future state:
S^*
The enterprise produces observed behavior over time:
S(t)
Process mining measures the distance:
d(S(t), S^*)
Transformation management then acts to reduce that distance, subject to constraints of cost, risk, compliance, organizational capacity, and strategic intent.
This is fundamentally different from static transformation governance. A steering committee that only checks milestones, budgets, and deliverables may know whether the project is progressing. It may not know whether the operating model is actually changing. Process mining supplies this missing empirical layer.
In this sense, process mining is not merely a tool for process improvement. It is an architectural instrument for making digital transformation observable, measurable, and governable.
It connects the formal architecture of the enterprise with its behavioral execution. It converts operational traces into evidence. It transforms process diagrams from static descriptions into hypotheses that can be tested against reality. And it allows digital transformation to be managed not only as system deployment, but as measurable change in the way the enterprise actually works.
Conclusion
Process mining is the computational reconstruction of operational behavior from event data. Its foundations are simple: events, time, ordering, correlation, and process semantics. Its implementation is difficult because enterprise reality is not a clean sequence of activities. It is a multi-object, multi-system, multi-agent, temporally distributed structure.
The technical core consists of event-log construction, process discovery, variant analysis, conformance checking, performance mining, object-centric modeling, and operational intervention. The managerial value emerges when these capabilities are connected to concrete enterprise questions: working capital, delivery reliability, procurement control, production efficiency, ERP adoption, compliance, and transformation governance.
The mathematically disciplined view is therefore the safest one. Process mining should not be treated as automatic truth extraction. It should be treated as a formal inference pipeline:
D \xrightarrow{\phi} E \xrightarrow{mine} M \xrightarrow{evaluate} I \xrightarrow{govern} A
where:
D is raw operational data;
\phi is the semantic event-construction function;
E is the event log or object-centric event structure;
M is the discovered or normative process model;
I is the set of insights, deviations, bottlenecks, and predictions;
A is the set of governed actions.
Every arrow in this chain can fail. Raw data can be incomplete. Event construction can be semantically wrong. Models can overfit. Deviations can be misinterpreted. Actions can optimize local metrics while damaging the global system.
Used rigorously, however, process mining provides something that conventional process analysis rarely achieves: a measurable, reproducible, and continuously refreshable representation of how the enterprise actually operates.
IEEE Task Force on Process Mining (2012). Process Mining Manifesto. Webpage. From the manifesto: The idea of process mining is to discover, monitor and improve real processes (i.e., not assumed processes) by extracting knowledge from event logs readily available in today’s (information) systems. Process mining includes (automated) process discovery (i.e., extracting process models from an event log), conformance checking (i.e., monitoring deviations by comparing model and log), social network/organizational mining, automated construction of simulation models, model extension, model repair, case prediction, and history-based recommendations.↩︎
van der Aalst, W. (2016). Process mining: Data science in action (2nd ed.). Springer Berlin Heidelberg. DOI↩︎
XES, the eXtensible Event Stream standard, is a standard format for representing and exchanging event logs and event streams in process mining. The current IEEE version is IEEE 1849-2023, IEEE Standard for eXtensible Event Stream (XES) for Achieving Interoperability in Event Logs and Event Streams. See also the XES standardization initiative website.↩︎
A Petri net is a formal model for representing discrete-event systems, especially systems with concurrency, synchronization, conflict, and resource constraints. It is usually defined as a bipartite graph composed of places, transitions, and directed arcs. Places may contain tokens, and the distribution of tokens across places is called a marking, which represents the current state of the system. A transition is enabled when its input places contain the required tokens; when it fires, it consumes tokens from its input places and produces tokens in its output places. In process mining, Petri nets are often used because they provide precise execution semantics for process models, making it possible to reason formally about reachability, deadlocks, parallelism, loops, and conformance between an observed event log and a normative process model. See: Murata, T. (1989). Petri nets: Properties, analysis and applications. Proceedings of the IEEE, 77(4), 541–580. DOI↩︎
Little’s Law is a fundamental result of queueing theory. In its standard form, it states that the long-run average number of items in a stable system, usually denoted by L, is equal to the long-run average arrival rate, \lambda, multiplied by the average time an item spends in the system, W: L = \lambda W. In process mining and operations analysis, this relation is useful because it links work in progress, throughput, and cycle time. See: Little, J. D. C. (1961). A proof for the queuing formula: L = \lambda W. Operations Research, 9(3), 383–387. DOI↩︎
OCEL 2.0 is a standard format for representing object-centric event logs, where events may be related to multiple business objects rather than to a single case identifier. This makes it suitable for complex enterprise processes involving interacting objects such as orders, deliveries, invoices, payments, materials, and resources. See: Berti, A., Koren, I., Adams, J. N., Park, G., Knopp, B., Graves, N., Rafiei, M., Liß, L., Tacke Genannt Unterberg, L., Zhang, Y., Schwanen, C., Pegoraro, M., & van der Aalst, W. M. P. (2023). OCEL (Object-Centric Event Log) 2.0 specification (Version 2.0). Chair of Process and Data Science, RWTH Aachen University. PDF↩︎