The problem is defined on p. 214 of Sutton and Barto's book (Reinforcement Learning: An Introduction). In this implementation, the system is treated as deterministic.
The goal is to minimize the total cost, which equals the time needed to reach the target.
The cost is '+1' everywhere in the space of positions and velocities, except at the target (p >= 0.5), where the cost is '0'.
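
The cost described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of this implementation: the names `TARGET_POSITION`, `step_cost`, and `trajectory_cost`, as well as the sample trajectory, are hypothetical and chosen only for this example.

```python
TARGET_POSITION = 0.5  # target threshold stated in the text (p >= 0.5)


def step_cost(position, velocity):
    """Cost of one time step: 0 at the target, +1 everywhere else.

    The velocity is accepted only to mirror the (position, velocity) state;
    the cost described above does not depend on it.
    """
    return 0.0 if position >= TARGET_POSITION else 1.0


def trajectory_cost(states):
    """Sum the per-step cost over a sequence of (position, velocity) pairs."""
    return sum(step_cost(p, v) for p, v in states)


# Hypothetical trajectory: three states away from the target, then arrival.
states = [(-0.5, 0.0), (-0.1, 0.04), (0.3, 0.05), (0.5, 0.03)]
total = trajectory_cost(states)  # counts the steps spent before reaching the target
```

Because every step away from the target costs exactly 1, the accumulated cost of a trajectory is the number of time steps spent before reaching p >= 0.5, which is why minimizing the cost is the same as minimizing the time to the target.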