Lineage in Flink Part 1: Introduction
Monitor data applications smarter with Apache Flink® and Datorios, the first observability platform that uses metrics, logs, and tracing to diagnose real-time data, fix hidden issues, and save the day.
Trace data across your Flink pipeline. Monitor transformations and identify where data changes occur for integrity, auditing, and root-cause analysis.
Get a high-level overview of Apache Flink jobs. Analyze session data, access detailed statistics, and track records processed to improve job performance.
Gain real-time insights into resource utilization, throughput, and latency. Leverage Datorios’ custom metrics to identify bottlenecks and optimize execution. Drill down for fast troubleshooting.
Visualize various window types, such as tumbling (time or count), sliding, or session windows. Identify the events that occur within a given window and detect missing events for a comprehensive view of your data.
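To make the window types concrete, here is a minimal pure-Python sketch of how timestamped events map to tumbling and session windows. This is illustrative semantics only, not Flink or Datorios API code, and the window size and session gap are made-up values:

```python
# Illustrative sketch of window-assignment semantics (not Flink API code).

def tumbling_window(ts_ms: int, size_ms: int) -> tuple:
    """Return the [start, end) tumbling window containing timestamp ts_ms."""
    start = ts_ms - (ts_ms % size_ms)
    return (start, start + size_ms)

def session_windows(timestamps: list, gap_ms: int) -> list:
    """Group event timestamps into session windows: a new session starts
    whenever the gap to the previous event exceeds gap_ms."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap_ms:
            sessions[-1][1] = ts              # extend the current session
        else:
            sessions.append([ts, ts])         # open a new session
    return [(s, e) for s, e in sessions]

events = [1_000, 2_500, 9_000, 9_500]
print(tumbling_window(2_500, size_ms=5_000))  # -> (0, 5000)
print(session_windows(events, gap_ms=3_000))  # -> [(1000, 2500), (9000, 9500)]
```

Note how the same four events fall into one five-second tumbling window but split into two sessions, because the 6.5-second gap between 2,500 ms and 9,000 ms exceeds the 3-second session gap.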
Track events on a timeline graph, allowing for detailed analysis of each event’s journey during processing. Zoom in to specific timeframes for focused examination.
Monitor state evolution throughout an operator's event processing. Easily identify state changes that occur during processing.
Utilize a versatile query language to extract specific operator states or event data, filter by values or keys, and target any processing period. Mark events for easy identification, such as flagging “late arrivals” within a defined timeframe.
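To show what flagging "late arrivals" means in practice, here is a small pure-Python sketch of one common convention: an event counts as late when its timestamp is at or behind the current watermark (the highest timestamp seen so far minus an allowed out-of-orderness bound). This is not Datorios' actual query language, and the bound is an illustrative assumption:

```python
# Illustrative sketch of "late arrival" flagging against a watermark.
# Convention assumed here: an event is late if its timestamp is at or
# behind the current watermark. Not Datorios' actual query language.

def flag_late_events(events, max_out_of_orderness_ms):
    watermark = float("-inf")
    flagged = []
    for ts in events:  # events arrive in processing order
        is_late = ts <= watermark
        flagged.append((ts, is_late))
        # Advance the watermark: highest timestamp seen, minus the bound.
        watermark = max(watermark, ts - max_out_of_orderness_ms)
    return flagged

arrivals = [1_000, 4_000, 2_000, 9_000, 3_500]
print(flag_late_events(arrivals, max_out_of_orderness_ms=1_000))
# -> [(1000, False), (4000, False), (2000, True), (9000, False), (3500, True)]
```

The event at 2,000 ms is flagged because the watermark had already advanced to 3,000 ms when it arrived; being able to mark exactly those events is what a lineage query for late arrivals gives you.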
In the ever-growing world of data processing, where information moves rapidly through intricate systems, understanding the path that data takes through those systems has never been more important.
In my previous article, we dove headfirst into the hows, whats, and whys of data lineage.
In this follow-up article (see part 1), building on my initial explorations with Apache Flink, I aim to dive into the practical side of data lineage in Flink.
Datorios extracts certain client statistical information. All customer data remains visible only to the customer.
The Datorios client collects record metadata, state data, and window details, along with the logs and metrics produced by Flink.
The data collected by the Datorios client is either hashed or encrypted, based on your preference, and then uploaded over SSL to our internal backend.
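As a rough illustration of the hashing option, here is a sketch of salting and hashing a sensitive field before upload. The function name, salt, and scheme are hypothetical; Datorios' actual hashing scheme is not described in this article:

```python
import hashlib

def hash_field(value: str, salt: str) -> str:
    """Illustrative only: salt and SHA-256 a sensitive field so the raw
    value never leaves the client. Not Datorios' actual scheme."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

digest = hash_field("user-123", salt="org-salt")
print(digest)  # 64-hex-character digest; the raw "user-123" is not uploaded
```

The point of hashing rather than encrypting is that the backend can still group and join records by the hashed key without ever being able to recover the original value.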
Yes, you can permanently delete any job and its associated data by navigating to the job screen and clicking the trashcan icon. To remove your organization, go to the organization screen and click the gear icon at the top right corner.
Datorios supports Java, Python, and Scala.
Datorios supports Flink version 1.6.x and above. For specific requirements, please contact us.
Yes, Datorios allows you to run in either session mode or application mode.
Datorios supports installation via Docker and Kubernetes. For other installation types, please contact us.
Datorios continuously collects metrics and logs, similar to other observability products like Datadog and Prometheus. Tracing data is collected only when triggered by specified metrics exceeding thresholds.
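The trigger mechanism described above can be sketched in a few lines of pure Python: metrics are sampled continuously, and trace capture happens only while a watched metric exceeds its threshold. Class and metric names here are hypothetical, not the Datorios API:

```python
# Illustrative sketch of threshold-triggered tracing (hypothetical names,
# not the Datorios API): metrics stream in continuously, but trace data is
# captured only for samples where the metric exceeds the threshold.

class ThresholdTracer:
    def __init__(self, threshold: float):
        self.threshold = threshold
        self.traces = []

    def record(self, metric_value: float, event: str) -> None:
        if metric_value > self.threshold:
            self.traces.append(event)  # capture trace data for this event

tracer = ThresholdTracer(threshold=100.0)
samples = [(50.0, "e1"), (120.0, "e2"), (80.0, "e3"), (150.0, "e4")]
for latency_ms, event in samples:
    tracer.record(latency_ms, event)
print(tracer.traces)  # -> ['e2', 'e4']
```

Keeping tracing off until a threshold is crossed is what lets metric and log collection run continuously without paying the full cost of tracing every record.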
No. Datorios uses a cache that continuously overwrites itself until a critical event is recognized; only the three minutes of data preceding a critical event are stored.
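The self-overwriting cache described above behaves like a time-bounded buffer: entries older than the retention window are continuously evicted, so when a critical event fires, only the trailing window is available to snapshot. A minimal sketch, assuming a simple timestamped deque (illustrative, not the actual implementation):

```python
from collections import deque

# Illustrative sketch of a rolling pre-event cache: data older than the
# retention window is continuously dropped, so only the last retention_ms
# of data before a critical event can be snapshotted.

class RollingCache:
    def __init__(self, retention_ms: int):
        self.retention_ms = retention_ms
        self.buffer = deque()  # (timestamp_ms, record) pairs, oldest first

    def add(self, ts_ms: int, record: str) -> None:
        self.buffer.append((ts_ms, record))
        # Overwrite behavior: evict everything outside the retention window.
        while self.buffer and self.buffer[0][0] < ts_ms - self.retention_ms:
            self.buffer.popleft()

    def snapshot(self) -> list:
        """Called when a critical event is recognized."""
        return [record for _, record in self.buffer]

cache = RollingCache(retention_ms=180_000)  # three minutes
cache.add(0, "a")
cache.add(100_000, "b")
cache.add(200_000, "c")  # "a" is now older than the 3-minute window
print(cache.snapshot())  # -> ['b', 'c']
```

This design keeps memory bounded regardless of throughput, at the cost of losing anything that happened more than the retention window before the trigger.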
Yes, example jobs are available; please refer to our GitHub and the preloaded jobs available in the application.