This tutorial shows how to load a domain in scikit-decide and try to solve it with techniques from different communities:
Prerequisites¶
Install scikit-decide:
!pip install "scikit-decide[all]"
Install renderlab to render Gymnasium environments in Google Colab:
!pip install renderlab
Loading a domain¶
Once a problem is formalized as a scikit-decide domain, it can be tackled by any compatible solver. Domains can be created from scratch or imported from various formats. Here we demonstrate how to import an environment from Gymnasium (the maintained fork of OpenAI Gym, a standard API widely used in the RL community), such as CartPole:
import gymnasium as gym
from renderlab import RenderFrame
from skdecide.hub.domain.gym import GymDomain

# Select a Gymnasium environment
ENV_NAME = "CartPole-v1"

# Create a domain factory, a callable returning a skdecide domain (used by solvers)
def domain_factory(record_videos=False):
    # Create a Gymnasium environment
    env = gym.make(ENV_NAME, render_mode="rgb_array")
    # Maybe wrap it with RenderFrame to record/play episode videos (works in Colab)
    if record_videos:
        env = RenderFrame(env, "./render")
    # Return a skdecide domain built from the Gymnasium environment
    return GymDomain(env)

# In simple cases, domain_factory can be created in one line:
# domain_factory = lambda: GymDomain(gym.make(ENV_NAME))
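As a quick check (not part of the original tutorial), the factory can be exercised directly. This is a minimal sketch, assuming the usual skdecide environment API (reset(), step() and a samplable action space):

# Hypothetical sanity check: instantiate a domain and take one random step
test_domain = domain_factory()
observation = test_domain.reset()  # initial observation of a new episode
action = test_domain.get_action_space().sample()  # random action from the wrapped Gym space
outcome = test_domain.step(action)  # outcome bundles next observation, transition value and termination info
print(observation, action, outcome)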
The rollout utility provides a quick way to run episodes in the domain, taking random actions (or following a solver policy, as shown later):
from skdecide.utils import rollout
# Instantiate one domain (used for rollouts)
domain = domain_factory(record_videos=True)
# Do a random rollout of the domain (random actions are taken when no solver is specified)
rollout(domain, num_episodes=1, max_steps=1000, verbose=False) # try verbose=True for more printing
domain.unwrapped().play() # watch last episode in video by calling play() on the underlying Gymnasium environment (works in Colab)
Moviepy - Building video temp-{start}.mp4.
Moviepy - Writing video temp-{start}.mp4
Moviepy - Done !
Moviepy - video ready temp-{start}.mp4
Solving the domain¶
One of the key benefits of scikit-decide is its ability to connect the same domain definition to many different solvers from various communities. To demonstrate this versatility, we show below how to solve the domain loaded above with both Reinforcement Learning and Cartesian Genetic Programming.
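As an aside (not part of the original tutorial), scikit-decide can also suggest which of its registered solvers are compatible with a given domain. Here is a minimal sketch, assuming the match_solvers helper exposed by skdecide.utils:

from skdecide.utils import match_solvers

# List the solver classes whose requirements are met by this domain
print(match_solvers(domain=domain))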
With Reinforcement Learning (RL)¶
Scikit-decide provides wrappers for several RL solvers, such as RLlib and Stable-Baselines3. We use the latter in this example (a sketch of the RLlib alternative is given at the end of this sub-section):
from stable_baselines3 import PPO
from skdecide.hub.solver.stable_baselines import StableBaseline
# Check domain compatibility with StableBaseline RL solver (good practice)
assert StableBaseline.check_domain(domain)
# Instantiate solver with parameters of choice (e.g. type of algo/neural net, learning steps...)
solver = StableBaseline(
    domain_factory,
    algo_class=PPO,
    baselines_policy="MlpPolicy",
    learn_config={"total_timesteps": 10000},
    verbose=1,
)
# Solve with RL
solver.solve()
# Save solution
solver.save("saved_solution")
Using cpu device
-----------------------------
| time/              |      |
|    fps             | 891  |
|    iterations      | 1    |
|    time_elapsed    | 2    |
|    total_timesteps | 2048 |
-----------------------------
...
-----------------------------------------
| time/                   |             |
|    fps                  | 589         |
|    iterations           | 5           |
|    time_elapsed         | 17          |
|    total_timesteps      | 10240       |
| train/                  |             |
|    approx_kl            | 0.007361056 |
|    clip_fraction        | 0.0664      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.603      |
|    explained_variance   | 0.23        |
|    learning_rate        | 0.0003      |
|    loss                 | 24.4        |
|    n_updates            | 40          |
|    policy_gradient_loss | -0.0157     |
|    value_loss           | 67.7        |
-----------------------------------------
Now we can run episodes with rollout using the latest solver policy:
# Visualize solution (pass solver to rollout to use its policy)
rollout(domain, solver, num_episodes=1, max_steps=1000, verbose=False)
domain.unwrapped().play()
Moviepy - Building video temp-{start}.mp4.
Moviepy - Writing video temp-{start}.mp4
Moviepy - Done !
Moviepy - video ready temp-{start}.mp4
It is always possible to reload a saved solution (especially useful in a new Python session) and continue learning from there. By running this cell a couple of times, you should see increasingly better solutions (a fresh-session variant is sketched after the output below):
# Optional: reload solution (required if reloading in a new Python session)
solver.load("saved_solution")
# Continue learning
solver.solve()
# Save updated solution
solver.save("saved_solution")
# Visualize updated solution
rollout(domain, solver, num_episodes=1, max_steps=1000, verbose=False)
domain.unwrapped().play()
-----------------------------
| time/              |      |
|    fps             | 1058 |
|    iterations      | 1    |
|    time_elapsed    | 1    |
|    total_timesteps | 2048 |
-----------------------------
...
-----------------------------------------
| time/                   |             |
|    fps                  | 619         |
|    iterations           | 5           |
|    time_elapsed          | 16          |
|    total_timesteps      | 10240       |
| train/                  |             |
|    approx_kl            | 0.004066074 |
|    clip_fraction        | 0.0335      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.554      |
|    explained_variance   | 0.72        |
|    learning_rate        | 0.0003      |
|    loss                 | 13.5        |
|    n_updates            | 90          |
|    policy_gradient_loss | -0.00582    |
|    value_loss           | 57.1        |
-----------------------------------------
Moviepy - Building video temp-{start}.mp4.
Moviepy - Writing video temp-{start}.mp4
Moviepy - Done !
Moviepy - video ready temp-{start}.mp4
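As noted above, in a completely new Python session you would first re-create the solver object before calling load. Here is a minimal sketch of that workflow, assuming the domain_factory definition from the beginning has been re-run (hypothetical, not part of the original tutorial):

from stable_baselines3 import PPO
from skdecide.hub.solver.stable_baselines import StableBaseline
from skdecide.utils import rollout

# Hypothetical fresh-session reload: rebuild the solver object, then restore the saved policy
restored_solver = StableBaseline(
    domain_factory,
    algo_class=PPO,
    baselines_policy="MlpPolicy",
    learn_config={"total_timesteps": 10000},
    verbose=1,
)
restored_solver.load("saved_solution")
# Roll out the restored policy as before
domain = domain_factory(record_videos=True)
rollout(domain, restored_solver, num_episodes=1, max_steps=1000, verbose=False)
domain.unwrapped().play()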
After using a solver, it is good practice to clean it up as shown below (not critical here, but sometimes useful for C++ parallel solvers in scikit-decide). Note that this is done automatically if you use the solver within a with statement, as shown in the CGP sub-section below.
# Clean up solver after use (good practice)
solver._cleanup()
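For comparison, the RLlib wrapper mentioned earlier follows the same pattern. Here is a minimal, untested sketch, assuming Ray RLlib is installed and that the RayRLlib wrapper accepts algo_class and train_iterations parameters (not part of the original tutorial):

from ray.rllib.algorithms.ppo import PPO as RLlibPPO
from skdecide.hub.solver.ray_rllib import RayRLlib

# Train a PPO policy with RLlib instead of Stable-Baselines3 (with-syntax handles cleanup)
with RayRLlib(domain_factory, algo_class=RLlibPPO, train_iterations=5) as rllib_solver:
    rllib_solver.solve()
    rollout(domain, rllib_solver, num_episodes=1, max_steps=1000, verbose=False)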
With Cartesian Genetic Programming (CGP)¶
Cartesian Genetic Programming is an evolutionary technique that evolves programs encoded as graphs. Scikit-decide wraps a CGP solver that can be applied to the same domain without changing its definition:
from skdecide.hub.solver.cgp import CGP
# Check domain compatibility with CGP solver (good practice)
assert CGP.check_domain(domain)
# Instantiate solver with parameters of choice (using "with" syntax to avoid manual cleanup)
with CGP(domain_factory, folder_name="TEMP_CGP", n_it=50) as solver:
    # Solve with CGP
    solver.solve()
    # Visualize solution
    rollout(domain, solver, num_episodes=1, max_steps=1000, verbose=False)
    domain.unwrapped().play()
[Discrete(2)]
[Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)]
[20  1  2 14  2  1 12  4  1  0  0  6 ... 20 46  1 41 57  4]
1 10.0 True [ 9.  8. 10. 10.]
====================================================
2 50.0 True [10. 10. 50. 10.]
====================================================
3 51.0 True [36. 39.  9. 51.]
====================================================
4 52.0 True [38.  9.  9. 52.]
====================================================
5 52.0 False [31. 42.  8.  9.]
====================================================
6 71.0 True [40. 52. 71. 41.]
====================================================
...
====================================================
50 71.0 False [ 8. 10. 10. 21.]
====================================================
Moviepy - Building video temp-{start}.mp4.
Moviepy - Writing video temp-{start}.mp4
Moviepy - Done !
Moviepy - video ready temp-{start}.mp4
In this example, you may find that RL often finds better solutions than CGP (although this depends on the solver parameters and the random seed). Note however that this is highly problem-dependent: try re-running this notebook after setting ENV_NAME = "MountainCarContinuous-v0" at the beginning, and you may find the opposite result. That shows the power of having a wide catalog of solvers to find the best solution for each specific problem!