The simulation I'm running uses a fixed-step solver with a step size of 5e-4 s. The sample time of my DQN agent (and of the corresponding S-function for the reward signal) is 0.25 s.
With that sample time I would expect 20 / 0.25 = 80 experiences from a simulation time of 20 seconds, so how is it possible that I end up with a BufferLength of ~1600 samples? I hope you can enlighten me...
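To make the mismatch concrete, here is a quick sanity check of the numbers. It is plain Python arithmetic rather than toolbox code, and it assumes that one experience is stored per agent sample time, which I have not confirmed. The variable names are mine and are not RL-Toolbox properties.

```python
# Values taken from my setup (all times in seconds).
solver_step = 5e-4          # fixed-step solver step size
agent_sample_time = 0.25    # sample time of the DQN agent
sim_time = 20.0             # simulated time
observed_buffer_length = 1600  # approximate BufferLength I see

# Number of solver steps in the simulated time.
solver_steps = round(sim_time / solver_step)

# Number of agent steps, ASSUMING one experience per agent sample time.
agent_steps = round(sim_time / agent_sample_time)

# How many times larger the observed buffer is than that expectation.
ratio = observed_buffer_length / agent_steps

print("solver steps:", solver_steps)
print("expected experiences:", agent_steps)
print("observed / expected:", ratio)
```

So the buffer holds roughly 20 times more samples than a single 20-second run should produce under that assumption, yet far fewer than one per solver step. One possible explanation, which is only a guess on my side, is that the buffer keeps experiences from several episodes instead of being cleared between them.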
Also, is it possible to look into the ExperienceBuffer? As impressed as I am by the RL Toolbox, I would really prefer it not to be such a black box in most cases.