Jump(跳一跳) with ML-Agents
Welcome to follow the "洪流学堂" WeChat official account, where I will share the whole development process and all the pitfalls I ran into~

Final Result


The Jump(跳一跳) Game

Jump (跳一跳) is a mini game that recently became popular in China inside the WeChat app, and this project is a Unity version of it.
There is a WebGL version you can try for fun.
How to play:
  • Hold the space bar and release it after a while; how long you hold decides how far the player jumps.
  • You score +1 every time you land on the next box; if you miss, the game is over.
  • Landing on the center of the next box scores +2, and consecutive center landings keep doubling the bonus: +2, +4, +8, +16, and so on (sketched in code after this list).
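
To make the doubling combo rule concrete, here is a minimal C# sketch of the scoring logic; the ScoreKeeper class and its parameters are hypothetical stand-ins, not taken from the project's code.

```csharp
// Hypothetical scoring component; onNextBox/onCenter stand in for the
// project's own landing checks.
public class ScoreKeeper
{
    int centerCombo;   // consecutive center landings so far
    public int Score { get; private set; }

    public void OnLanding(bool onNextBox, bool onCenter)
    {
        if (!onNextBox)
        {
            return;   // missed the box: game over (handled elsewhere)
        }
        if (onCenter)
        {
            centerCombo++;
            Score += 2 << (centerCombo - 1);  // +2, +4, +8, +16, ...
        }
        else
        {
            centerCombo = 0;   // a normal landing resets the combo
            Score += 1;        // and scores +1
        }
    }
}
```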

How Curriculum Learning Is Used in This Project

A detailed article about the training process (in Chinese) is coming soon.
Three variables define the difficulty of this game:
  • The random distance between the next box and the current one
  • The random scale of the next box
  • The random direction in which boxes spawn
The curriculum file format is defined as follows:
{ "measure" : "reward", "thresholds" : [5,5,5,5,5,5,5,5,5,5,5,5,5,5,5], "min_lesson_length" : 2, "signal_smoothing" : true, "parameters" : { "max_distance" : [1.2,1.2,1.2,1.2,1.5,1.5,1.5,1.5,2,2,2,2,3,3,3,3], "min_scale" : [1,1,0.5,0.5,1,1,0.5,0.5,1,1,0.5,0.5,1,1,0.5,0.5], "random_direction":[0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1] } }
  • max_distance: the maximum distance between the next box and the current box; the minimum distance is fixed at 1.1
  • min_scale: the minimum scale of the next box; the maximum scale is fixed at 1
  • random_direction: whether the spawn direction is randomized
With measure set to reward, a lesson advances once the (smoothed) mean reward passes the corresponding threshold, after at least min_lesson_length episodes in the current lesson.
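
As a sketch of how the environment could consume these parameters when spawning the next box: this uses the environment-parameter API of current ML-Agents releases (Academy.Instance.EnvironmentParameters); the BoxSpawner class is hypothetical, and the original project, built on a much older ML-Agents version, would have read the academy's reset parameters instead.

```csharp
using UnityEngine;
using Unity.MLAgents;

public class BoxSpawner : MonoBehaviour
{
    const float MinDistance = 1.1f;   // fixed, per the article
    const float MaxScale    = 1f;     // fixed, per the article

    // Called whenever the next box should be spawned.
    public void SpawnNextBox(Vector3 currentBoxPosition)
    {
        var p = Academy.Instance.EnvironmentParameters;
        float maxDistance     = p.GetWithDefault("max_distance", 1.2f);
        float minScale        = p.GetWithDefault("min_scale", 1f);
        bool  randomDirection = p.GetWithDefault("random_direction", 0f) > 0.5f;

        float distance = Random.Range(MinDistance, maxDistance);
        float scale    = Random.Range(minScale, MaxScale);

        // Either keep spawning along one axis, or pick between two axes at random.
        Vector3 dir = (randomDirection && Random.value < 0.5f)
            ? Vector3.forward
            : Vector3.right;

        Vector3 position = currentBoxPosition + dir * distance;
        // ... instantiate the box prefab at 'position' with the chosen scale ...
    }
}
```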

Brief Training Process - Model#1

In this model, the reward is +1 when the player jumps to the next box and -1 when the player jumps to the ground.
Using this curriculum file:
```json
{
    "measure" : "reward",
    "thresholds" : [2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2],
    "min_lesson_length" : 2,
    "signal_smoothing" : true,
    "parameters" :
    {
        "max_distance" : [1.2, 1.2, 1.2, 1.2, 1.5, 1.5, 1.5, 1.5, 2, 2, 2, 2, 3, 3, 3, 3],
        "min_scale" : [1, 1, 0.5, 0.5, 1, 1, 0.5, 0.5, 1, 1, 0.5, 0.5, 1, 1, 0.5, 0.5],
        "random_direction" : [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
    }
}
```
  1. At first, the curriculum file was not defined as above: the first max_distance was 1.1, equal to the minimum distance, and the threshold was 10. After a while I realized this might overfit to a single fixed spacing between boxes, so I changed the first max_distance to 1.2 and the threshold to 5.
  2. After a long time of training, the lesson was not switching. I checked the GitHub issues and learned that the academy needs to be set done (or its max steps must not be 0), so the academy is now set to done when the game is over (see the sketch after this list).
  3. A long training process followed...
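
A minimal sketch of Model#1's reward and game-over handling, written against the current ML-Agents C# API (EndEpisode() here plays the role of the Done() call in the old versions the article used; OnJumpResolved is a hypothetical hook called by the game):

```csharp
using Unity.MLAgents;

public class JumpAgent : Agent
{
    // Called by the game once a jump has fully resolved.
    public void OnJumpResolved(bool landedOnNextBox)
    {
        if (landedOnNextBox)
        {
            AddReward(1f);    // reached the next box
        }
        else
        {
            AddReward(-1f);   // landed on the ground
            EndEpisode();     // end the episode so curriculum lessons can switch
        }
    }
}
```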

Brief Training Process - Model#2

The model above did not perform well, so I began thinking about how to simplify it and make it easier to learn.
I did the following to speed up training (a sketch of the resulting agent follows the curriculum file below):
  • Reconstructed the scene so that multiple games run at the same time
  • Reduced the state size to 2: the distance from the next box to the player, and the size of the next box
  • Removed random direction during training; since the input is now a distance, the direction no longer matters
  • Changed the reward system: +0.5 when the player jumps to the next box, +1 when the player jumps to the center of the next box, and -1 when the player jumps to the ground
{ "measure" : "reward", "thresholds" : [5,5,5,5,5,5,5,5,5,5,5,5,5,5,5], "min_lesson_length" : 2, "signal_smoothing" : true, "parameters" : { "max_distance" : [1.2,1.2,1.2,1.2,1.5,1.5,1.5,1.5,2,2,2,2,3,3,3,3], "min_scale" : [1,0.9,0.7,0.5,1,0.9,0.7,0.5,1,0.9,0.7,0.5,1,0.9,0.7,0.5], "random_direction":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] } }


After 3 hours of training, the result was quite impressive, as the GIF above shows. But there was a problem: the players tried to jump in very small steps so as to stay on the same box and avoid the -1 penalty for landing on the ground.

Brief Training Process - Model#3

The small-step jumping problem came from a wrong correspondence between states and rewards. Frame to Skip was previously set to 30, which is not enough: the player may still be in the air 30 frames later, so the reward for a jump was attributed to the wrong decision.
I also tried unsubscribing from and resubscribing to the brain, and multiple game instances now run at the same time correctly (an alternative fix for the timing issue is sketched below).
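
An alternative way to fix the state/reward correspondence (a sketch under the current ML-Agents API, not necessarily what the article's final code does) is to drop the fixed Frame to Skip entirely and request a decision only once the player has actually landed, so each reward lines up with the decision that produced it:

```csharp
using Unity.MLAgents;

public class JumpAgent : Agent
{
    // Called by the game when the player touches down after a jump.
    public void OnLanded()
    {
        // Ask the brain for the next action only now that the previous
        // jump has resolved, instead of every N frames.
        RequestDecision();
    }
}
```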
Finally, after fixing the game's bugs, I found the best Frame to Skip value; here is the result.
The agent's max steps is set to 100, and since the per-jump reward is at most +1, the maximum cumulative reward is 100. The mean reward reached 70 after 200K steps.
Step: 349000. Mean Reward: 88.70098911968347. Std of Reward: 10.67634076995865. Saved Model
Step: 350000. Mean Reward: 89.22838773491591. Std of Reward: 9.210647049058082. Saved Model
Step: 351000. Mean Reward: 89.44072978303747. Std of Reward: 12.138629221972357. Saved Model

Summary

Thanks to Unity for giving me the opportunity to take on a challenge like this.
The projects here are all excellent, and I learned a lot about Unity and machine learning.

Welcome to follow the "洪流学堂" WeChat official account, where I will share the whole development process and all the pitfalls I ran into~