Keyed Watermarks: A Fine-grained Tracking of Event-time in Apache Flink

Big Data Stream processing engines such as Apache Flink use windowing techniques to handle unbounded streams of events. Gathering all pertinent input within a window is crucial for event time windowing since it affects how accurate results are. A significant part of this process is played by waterm...

Bibliographic Details
Main Author: Yasser, Tawfik (author)
Other Authors: Arafa, Tamer (author), El-Helw, Mohamed (author), Awad, Ahmed (author)
Published: 2023
Subjects: Keyed Watermarks, Big Data Stream Processing, Event-Time Tracking, Apache Flink
Online Access: https://bspace.buid.ac.ae/handle/1234/2938
https://ieeexplore.ieee.org/document/10296717
https://doi.org/10.1109/NILES59815.2023.10296717
dc.date.none.fl_str_mv 2023
2025-05-06T10:19:03Z
2025-05-06T10:19:03Z
dc.identifier.none.fl_str_mv Yasser, T. et al. (2023) “Keyed Watermarks: A Fine-grained Tracking of Event-time in Apache Flink,” in 2023 5th Novel Intelligent and Leading Emerging Sciences Conference (NILES), pp. 23–28.
https://bspace.buid.ac.ae/handle/1234/2938
https://ieeexplore.ieee.org/document/10296717
https://doi.org/10.1109/NILES59815.2023.10296717
dc.language.none.fl_str_mv en
dc.publisher.none.fl_str_mv IEEE
dc.relation.none.fl_str_mv Proceedings of NILES2023: 5th Novel Intelligent and Leading Emerging Sciences Conference
dc.subject.none.fl_str_mv Keyed Watermarks, Big Data Stream Processing, Event-Time Tracking, Apache Flink
dc.title.none.fl_str_mv Keyed Watermarks: A Fine-grained Tracking of Event-time in Apache Flink
dc.type.none.fl_str_mv Article
description Big Data Stream processing engines such as Apache Flink use windowing techniques to handle unbounded streams of events. Gathering all pertinent input within a window is crucial for event time windowing since it affects how accurate results are. A significant part of this process is played by watermarks, which are unique timestamps that show the passage of events in time. However, the current watermark generation method in Apache Flink, which works at the level of the input stream, tends to favor faster sub-streams, resulting in dropped events from slower sub-streams. In our analysis, we found that Apache Flink’s vanilla watermark generation approach caused around 33% loss of data if 50% of the keys around the median are delayed. Furthermore, the loss surpassed 37% when 50% of random keys are delayed. In this paper, we present a novel strategy called keyed watermarks to overcome data loss and increase the accuracy of data processing to at least 99% in most cases. We enable separate progress tracking by creating a unique watermark for each logical sub-stream (key). In our study, we outline the architectural and API changes necessary to implement keyed watermarks and discuss our experience in extending Apache Flink’s enormous code base. Additionally, we compare the effectiveness of our strategy against the conventional watermark generation method in terms of the accuracy of event-time tracking. Index Terms: Keyed Watermarks, Big Data Stream Processing, Event-Time Tracking, Apache Flink.
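The difference between a stream-level watermark and a per-key watermark, as the description above puts it, can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's implementation and not Flink's API: the function name `count_dropped`, the window size, the sample events, and the zero out-of-orderness bound are all hypothetical, and the lateness rule is simplified to "an event is dropped once the watermark has reached the end of its tumbling window".

```python
from collections import defaultdict

WINDOW = 10  # assumed tumbling-window size, in event-time units

# Hypothetical arrival order: key "a" is fast, key "b" lags behind it.
EVENTS = [("a", 1), ("b", 2), ("a", 5), ("a", 12), ("b", 3), ("a", 25), ("b", 14)]


def count_dropped(events, keyed):
    """Count events whose window has already closed when they arrive.

    keyed=False: one watermark shared by the whole stream.
    keyed=True:  one independent watermark per key.
    Simplification (assumption): watermark = largest timestamp seen so far.
    """
    watermarks = defaultdict(lambda: float("-inf"))
    dropped = 0
    for key, ts in events:
        slot = key if keyed else "__stream__"
        window_end = (ts // WINDOW + 1) * WINDOW
        if window_end <= watermarks[slot]:
            dropped += 1  # watermark already passed this window: late event
        # A watermark never moves backwards.
        watermarks[slot] = max(watermarks[slot], ts)
    return dropped


print(count_dropped(EVENTS, keyed=False))  # 2: both delayed "b" events are lost
print(count_dropped(EVENTS, keyed=True))   # 0: "b" advances at its own pace
```

With the shared watermark, the fast key pushes event time forward and the slower key's events arrive after their windows have closed; with one watermark per key, the slower key's windows stay open until that key itself advances.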
id budr_23fb760cf49d667964a6ecdcbd6c7f35
network_acronym_str budr
network_name_str The British University in Dubai repository
oai_identifier_str oai:bspace.buid.ac.ae:1234/2938