Introduction to Reinforcement Learning (Mar 2025)

Undergraduate course on RL, Jadavpur University, Information Technology (Online), 2024

Animals learn to select actions by interacting with the environment around them. Reinforcement Learning (RL) is a field of learning theory in which computer algorithms mimic this behaviour. An RL agent takes an action based on the current state of its environment and receives a reward. From the rewards it receives, the agent estimates which actions maximise its rewards (for an animal, its chances of survival).

This is a comprehensive 54-hour course (21 sessions of 2 hours each and 3 assignments of 4 hours each) that introduces students to the foundations of Reinforcement Learning. It is tailored for all UG students in Engineering and Sciences who want to learn the foundations of AI, particularly RL. It would be especially helpful for students who want to (i) pursue a research career in AI, RL, or statistics, or (ii) apply for RL roles in industry (many robotics companies, such as Boston Dynamics, use RL in designing their robots). We also showed how to code several RL algorithms in Python from scratch.

Session 1-3: Probability Space, Conditional Probability, Discrete and Continuous random variables and distributions - Bernoulli, Binomial, Poisson, Normal, Exponential and Uniform distribution, Joint distributions, Conditional expectation, Law of Large Numbers, Inequalities in Probability.

Session 4-5: Introduction to RL, Multi-armed Bandits, ODE interpretation, UCB, Gradient Based Action Selection in Bandits
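
As a flavour of the bandit material in these sessions, here is a minimal sketch of UCB action selection in Python. It is illustrative only and not the course's own code: the Bernoulli arms, the arm means, the exploration constant `c`, and the function name `ucb1` are all assumptions made for this example.

```python
import math
import random


def ucb1(true_means, horizon, c=2.0, seed=0):
    """Run UCB on a Bernoulli bandit; return (pull counts, value estimates)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k       # number of times each arm was pulled
    estimates = [0.0] * k  # running sample-mean reward of each arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once so each count is nonzero
        else:
            # Optimism: sample mean plus an exploration bonus that shrinks
            # as an arm is pulled more often.
            arm = max(
                range(k),
                key=lambda a: estimates[a]
                + math.sqrt(c * math.log(t) / counts[a]),
            )
        # Hypothetical environment: arm pays 1 with probability true_means[arm].
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


counts, estimates = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, [round(q, 2) for q in estimates])
```

In a run like this, most pulls should end up on the arm with the highest mean, while the weaker arms are still sampled occasionally because their bonus grows with log t.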

Session 7-8: Controlled Markov Chain, Markov Decision Process, Action, Reward, Value Function, Bellman Equations for a policy and optimality, Optimal Policy, Iterative Policy Evaluation, Value and Policy Iteration
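
The value iteration topic above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's implementation: the two-state MDP, its transition lists `P`, rewards `R`, the discount factor, and the stopping tolerance are all made up for the example.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (prob, next_state); R[s][a] is the expected reward."""
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * E[V(s')].
        Q = [
            [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
             for a in range(len(P[s]))]
            for s in range(n_states)
        ]
        V_new = [max(q_s) for q_s in Q]
        delta = max(abs(v_new - v) for v_new, v in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the final action values.
    policy = [max(range(len(q_s)), key=q_s.__getitem__) for q_s in Q]
    return V, policy


# Hypothetical MDP: state 1 pays reward 1 for staying (action 0).
# From state 0, action 1 reaches state 1 with probability 0.8.
P = [
    [[(1.0, 0)], [(0.8, 1), (0.2, 0)]],  # state 0: stay, try to move
    [[(1.0, 1)], [(1.0, 0)]],            # state 1: stay, go back
]
R = [[0.0, 0.0], [1.0, 0.0]]

V, policy = value_iteration(P, R)
print([round(v, 3) for v in V], policy)
```

Because the Bellman optimality operator is a gamma-contraction, the loop terminates for any gamma below 1; the greedy policy here moves from state 0 to state 1 and then stays there.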

Assignment 1: Sessions 1-8 – Theory + Coding (40 points, 4 hours)

Session 9-10: Q-learning, Finite Horizon MDPs, Dynamic Programming Examples, Stochastic Shortest Path, Bellman Operators, Proof of Convergence for Policy Evaluation

Session 11-12: Model-Free Methods, Monte Carlo, Temporal Difference Algorithm, TD(lambda) Algorithm, On-Policy vs Off-Policy Algorithms, Importance Sampling, Model-free Q-learning, SARSA, Implementation of TD(0)
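
For readers curious what an implementation of TD(0) might look like, here is a minimal tabular sketch in Python. It is an assumption-laden example rather than the code used in the course: the environment is a small random walk with a reward of 1 for exiting on the right, and the step size, episode count, and function name `td0_random_walk` are chosen for illustration.

```python
import random


def td0_random_walk(n_states=5, episodes=20000, alpha=0.005, gamma=1.0, seed=0):
    """Estimate V for the uniformly random policy with tabular TD(0)."""
    rng = random.Random(seed)
    # Indices 0 and n_states + 1 are terminal; their values stay at 0.
    V = [0.0] * (n_states + 2)
    for _ in range(episodes):
        s = (n_states + 1) // 2  # start each episode in the middle
        while 0 < s < n_states + 1:
            s_next = s + rng.choice((-1, 1))  # step left or right at random
            reward = 1.0 if s_next == n_states + 1 else 0.0
            # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s').
            V[s] += alpha * (reward + gamma * V[s_next] - V[s])
            s = s_next
    return V[1:-1]  # values of the non-terminal states only


values = td0_random_walk()
print([round(v, 2) for v in values])
```

With gamma equal to 1, the true value of state i in this walk is the probability of exiting on the right, i / (n_states + 1), so the estimates should increase from left to right and sit near 1/6, 2/6, ..., 5/6.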

Session 13-16: Function Approximation in RL, Policy Evaluation, SGD Monte Carlo, TD(0) with Linear Function Approximation, Glimpse at Stochastic Approximation Algorithm, TD(0) Convergence Proof, Point of Convergence of TD(0), gamma-contraction, Banach's Fixed Point Theorem, Deviation from Optimality

Session 17-18: Off-Policy Evaluation with TD(0) and Linear Function Approximation, Emphatic TD(0), Synchronous Q-learning (model-free, model-based, tabular, and with linear function approximation), Discussion on Convergence

Session 19: Asynchronous Q-learning, Classification in Machine Learning, Maximum Likelihood Estimation, Logistic and Softmax Regression

Session 20-21: Deep Neural Networks, MLP, Backpropagation, Policy Gradient, REINFORCE, Actor-Critic Policy Gradient, Safe RL, Planning, Dyna, Curriculum Learning

Assignment 2-3: Sessions 9-21 – Theory + Coding (60 points, 8 hours)

You can get the session notes here, and the video lectures at this YouTube link. Certificate


Mainak Biswas
