The details of the publicly available dataset.

Bibliographic Details
Main Author: Ailian Gao (20629841) (author)
Other Authors: Zenglei Liu (20629838) (author)
Published: 2025
_version_ 1852016801639038976
author Ailian Gao (20629841)
author2 Zenglei Liu (20629838)
author2_role author
author_facet Ailian Gao (20629841)
Zenglei Liu (20629838)
author_role author
dc.creator.none.fl_str_mv Ailian Gao (20629841)
Zenglei Liu (20629838)
dc.date.none.fl_str_mv 2025-09-09T17:32:49Z
dc.identifier.none.fl_str_mv 10.1371/journal.pone.0330433.t001
dc.relation.none.fl_str_mv https://figshare.com/articles/dataset/The_details_of_the_publicly_available_dataset_/30088486
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Cancer
Science Policy
Biological Sciences not elsewhere classified
students’
integrate temporal information
conducted comparison experiments
bidirectional lstm model
series forecasting pipeline
machine learning algorithms
proposed lstkt model
proposed informer model
publicly available dataset
individual knowledge states
informer
achieved promising outcomes
short sequence prediction
probability sparse self
implement knowledge tracing
long sequence time
sparse self
series prediction
knowledge tracing
sequence time
tracing studies
time stamps
time stamp
ednet dataset
assistments2017 dataset
assistments2009 dataset
current knowledge
target exercises
previous approaches
learning performance
extensively utilized
existing models
exercising recordings
decoder architecture
canonical encoder
attention module
attention mechanism
answering records
dc.title.none.fl_str_mv The details of the publicly available dataset.
dc.type.none.fl_str_mv Dataset
info:eu-repo/semantics/publishedVersion
dataset
description <div><p>Knowledge tracing can reveal students’ level of knowledge in relation to their learning performance. Recently, many machine learning algorithms have been proposed to implement knowledge tracing and have achieved promising outcomes. However, most previous approaches were unable to cope with long-sequence time-series prediction, which is more valuable than the short-sequence prediction extensively used in current knowledge-tracing studies. In this study, we propose a long-sequence time-series forecasting pipeline for knowledge tracing that leverages both timestamp and exercise sequences. First, we introduce a bidirectional LSTM model to learn embeddings of exercise-answering records. Second, we combine each student’s exercise record and its timestamp into a single vector per record. Next, the sequence of vectors is fed into the proposed Informer model, which uses a probability-sparse self-attention mechanism. Note that the probability-sparse self-attention module addresses the quadratic computational complexity of the canonical encoder-decoder architecture. Finally, we integrate temporal information and individual knowledge states to predict the answers to a sequence of target exercises. To evaluate the proposed LSTKT model, we conducted comparison experiments with state-of-the-art knowledge-tracing algorithms on publicly available datasets. The model demonstrates quantitative improvements over existing models: on the Assistments2009 dataset it achieved an accuracy of 78.49% and an AUC of 78.81%; on the Assistments2017 dataset, an accuracy of 74.22% and an AUC of 72.82%; and on the EdNet dataset, an accuracy of 68.17% and an AUC of 70.78%.</p></div>
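The description above credits the Informer component's efficiency to probability-sparse self-attention: only queries whose score distribution deviates strongly from uniform attend in full, while the rest fall back to a cheap default. The following is a minimal NumPy sketch of that general idea, not the authors' LSTKT implementation; the function name, the exact sparsity measure, and the mean-of-values fallback are illustrative assumptions.

```python
import numpy as np

def prob_sparse_attention(Q, K, V, u):
    """Illustrative sketch of probability-sparse self-attention.

    Each query gets a sparsity measure
        M(q, K) = max(qK^T / sqrt(d)) - mean(qK^T / sqrt(d));
    only the u queries with the largest M attend via full softmax,
    while the remaining "lazy" queries receive the mean of V.
    This avoids computing softmax attention for every query.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (L_q, L_k) scaled scores
    M = scores.max(axis=1) - scores.mean(axis=1)   # sparsity measure per query
    top = np.argsort(M)[-u:]                       # u most "active" queries

    out = np.tile(V.mean(axis=0), (Q.shape[0], 1))  # lazy queries -> mean(V)
    s = scores[top]
    w = np.exp(s - s.max(axis=1, keepdims=True))    # numerically stable softmax
    w /= w.sum(axis=1, keepdims=True)
    out[top] = w @ V                                # active queries -> attention
    return out
```

With `u` equal to the number of queries, the sketch degenerates to ordinary scaled dot-product attention; smaller `u` trades accuracy on low-information queries for reduced computation, which is the point of the sparse variant.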
eu_rights_str_mv openAccess
id Manara_157ecbb77e85febc420b48a6c19d232a
identifier_str_mv 10.1371/journal.pone.0330433.t001
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/30088486
publishDate 2025
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title The details of the publicly available dataset.
title_full The details of the publicly available dataset.
title_fullStr The details of the publicly available dataset.
title_full_unstemmed The details of the publicly available dataset.
title_short The details of the publicly available dataset.
title_sort The details of the publicly available dataset.