Text this: Cells are initially detected in frame t<sub>0</sub> and then tracked to frame t<sub>1</sub>. Object queries are processed in the first layer of the decoder for object detection. In the subsequent decoder layers, track queries are concatenated with the object queries to perform tracking.