CSC415 - Introduction to Reinforcement Learning

Course website for CSC415H5S Introduction to Reinforcement Learning - Winter 2026

🎯 Welcome to CSC415H5S

Introduction to Reinforcement Learning
Winter 2026 • University of Toronto Mississauga

Reinforcement learning is a powerful paradigm for modeling autonomous, intelligent agents that interact with their environment. It is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling, and healthcare. This course provides an introduction to reinforcement learning, focusing on the study and design of agents that interact with a complex, uncertain world to achieve a goal. We will study agents that can make near-optimal decisions in a timely manner with incomplete information and limited computational resources.

The course will cover (among other topics):

  • Markov Decision Processes (MDPs)
  • Reinforcement Learning algorithms
  • Planning methods
  • Function Approximation (online supervised learning)

Note: The topics listed above represent a limited selection from the course. Additional topics will be covered throughout the semester.
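
To give a flavour of the first two topics, here is a minimal sketch of a Markov Decision Process and a planning method (value iteration) applied to it. The three-state chain, the action names "stay" and "go", and the discount factor are hypothetical and chosen only for illustration; this is not course starter code.

```python
# Hypothetical 3-state chain MDP, used only for illustration.
# transitions[state][action] = list of (probability, next_state, reward)
GAMMA = 0.9  # discount factor (assumed value)

transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "go": [(1.0, 2, 0.0)]},  # absorbing state
}


def backup(values, state, action):
    """Expected one-step return of taking `action` in `state`."""
    return sum(
        prob * (reward + GAMMA * values[next_state])
        for prob, next_state, reward in transitions[state][action]
    )


def value_iteration(tol=1e-8):
    """Apply Bellman optimality backups until the values stop changing."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state in transitions:
            best = max(backup(values, state, a) for a in transitions[state])
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        state: max(transitions[state], key=lambda a: backup(values, state, a))
        for state in transitions
    }
    return values, policy


values, policy = value_iteration()
print(values)
print(policy)
```

In this toy problem the only reward is earned on the transition from state 1 to state 2, so the greedy policy moves toward state 2 from both earlier states.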


📅 Course Information

Schedule

Lecture (LEC0101):
Wednesday, 11:00 AM - 1:00 PM
In Person: DH 2070

Practical (PRA0101):
Thursday, 7:00 PM - 8:00 PM
In Person: DH 2026

Instructor

Dr. Ameya Pore
amey.pore@utoronto.ca

Office Hours:
Wednesday, 6:00 PM - 7:00 PM
In Person: DH3110

Please allow 24-48 hours for response during regular business hours. Include [CSC415] in the subject line.

Teaching Assistants

Deniz Jafari
Office Hours:
TBA

Quentin Clark
Office Hours:
TBA


Learning Outcomes

By the end of this course, students will be able to:

  1. Theoretically analyze and practically implement fundamental and advanced Reinforcement Learning algorithms, ranging from tabular methods and DQN to Policy Gradients (PPO)
  2. Design and execute a research project that applies these concepts to domains such as robotics (both simulation and real-world robots)
  3. Communicate findings through a conference-level research paper
  4. Critically evaluate the work of peers
  5. Effectively defend technical decisions through oral presentations

Prerequisites

  • Prerequisites: CSC311H5
  • Recommended: CSC413
  • Credit Value: 0.5

📢 Announcements

Welcome to CSC415!

Welcome to Introduction to Reinforcement Learning! This page will be updated regularly with course materials, announcements, and important information. Please check back frequently for updates.

Posted: Course Start Date

📚 Course Materials

Required Textbook

Reinforcement Learning: An Introduction (2nd Edition)

Authors: Richard S. Sutton and Andrew G. Barto
Available online: http://incompleteideas.net/book/

This textbook covers the foundational theory (MDPs, Bellman equations, TD learning) in depth. It is most useful for Weeks 1–3 of the syllabus (Foundations, MC, TD).
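
For orientation, two of the central equations behind the topics named above are sketched here in the notation commonly used with this textbook. Treat them as a quick reference only; the book gives the definitions and derivations.

```latex
% Bellman expectation equation for the state-value function of a policy \pi
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, v_\pi(s') \bigr]

% TD(0) update of the estimate V after observing S_t, R_{t+1}, S_{t+1}
V(S_t) \leftarrow V(S_t) + \alpha \bigl[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \bigr]
```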

Additional Resources

The course draws inspiration from several excellent open-source courses and resources:


📊 Assessment & Grading

Assessment | Weight | Due Date | Description
Laboratory Exercises | 25% | Various dates | 6 lab exercises (top 5 count). Hands-on programming assignments in Python using Gymnasium and PyTorch. Implement algorithms from tabular methods to DQN and PPO.
Midterm Exam | 15% | Jan 29, 2026 | Written test covering foundational concepts from Weeks 1-4: MDPs, Bellman Equations, Q-Learning, and Policy Gradients.
Assignment 1 | 10% | Feb 13, 2026 | Literature review of assigned papers (2-3 papers) along with code implementation of one of the papers.
Project Proposal | 5% | Feb 24, 2026 | Concise document outlining selected research topic, intended environment/dataset, and hypothesis.
Final Project Paper | 25% | Mar 24, 2026 | Comprehensive research paper in conference format (e.g., ICML/ICRA style) detailing methodology, experimental setup, results, and discussion.
Assignment 2 (Peer Review) | 10% | Mar 31, 2026 | Critical evaluation of peer project reports, providing constructive feedback on technical correctness, clarity, and novelty.
Final Project Presentation | 10% | Apr 2, 2026 | 10-minute oral presentation of research findings, methodology, and analysis.

Lab Exercise Schedule

Lab | Due Date | Topic
Lab 1 | Jan 13, 2026 | Tabular value-iteration agent on Gridworld
Lab 2 | Jan 20, 2026 | Compare MC and TD methods; Q-Learning with ε-greedy
Lab 3 | Jan 27, 2026 | Implement DQN in Gymnasium (CartPole or MountainCar)
Lab 4 | Feb 17, 2026 | Train PPO agent on Pendulum-v1 (dm_control)
Lab 5 | Mar 3, 2026 | Implement RND agent in MiniGrid or Maze2D
Lab 6 | Mar 10, 2026 | Train CNN encoder on Atari frames; visualize latent space
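
As a rough indication of the style of the early labs, the following is a minimal sketch of tabular Q-Learning with ε-greedy exploration, the technique named for Lab 2. It is not the lab starter code: to stay self-contained it uses only the Python standard library, and the five-state corridor environment, the `step` helper, and all hyperparameter values are hypothetical. The actual labs use Gymnasium and PyTorch.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)  # move left / move right


def step(state, action):
    """Deterministic corridor dynamics; reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(q[state])
    # Break ties at random so the untrained agent does not get stuck.
    return rng.choice([i for i, v in enumerate(q[state]) if v == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, ACTIONS[a])
            # Off-policy target: bootstrap from the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = q_learning()
# Greedy action in each non-terminal state.
greedy_policy = [
    ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])]
    for row in q_table[:-1]
]
print(greedy_policy)
```

After training, the greedy action in every non-terminal state should be to move right, toward the goal.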

📖 Course Schedule

Weekly Schedule
Week | Date | Topic | Key Concepts
1 | Jan 7 | Foundations of Reinforcement Learning | Agent–environment loop, MDP structure, value functions, Bellman equations, biological motivation
2 | Jan 14 | Monte Carlo & Temporal-Difference Learning | MC prediction (first/every visit), TD(0), TD(λ), SARSA, on/off-policy learning
3 | Jan 21 | Q-Learning | Q-Learning algorithm, bias–variance trade-off, linear value approximation
4 | Jan 28 | Function Approximation & DQN | Deep Q-Networks (DQN), Policy-Gradient Theorem, REINFORCE with baseline
5 | Feb 4 | Policy Gradient Methods (PPO) | REINFORCE → A2C → PPO, trust-region optimization, GAE, training stability
6 | Feb 11 | Exploration in RL | Entropy regularization, intrinsic motivation (ICM, RND), robustness and generalization
7 | Feb 25 | Regularization and Representation Learning | Contrastive learning (CURL, BYOL-Explore), predictive state representations, auxiliary tasks
8 | Mar 4 | RL for Robotics (Embodied RL) | Continuous control policies, sim-to-real transfer, domain randomization, hybrid IL + RL strategies
9 | Mar 11 | World Models & Latent Planning | Latent dynamics models (VAE, RSSM, Dreamer), imagination rollouts, planning in latent space
10 | Mar 18 | RL for LLMs and Alignment (RLHF) | Preference modeling, reward models, PPO/DPO/RLAIF, alignment issues, reward mis-specification
11 | Mar 28 | Sequence Modelling in RL | Recurrent neural networks (RNNs, LSTMs, GRUs) for RL, Transformers in RL, Decision Transformers, trajectory transformers, history encoding, temporal dependencies, memory-augmented RL
12 | Apr 2 | Final Project Presentations | Student presentations of term projects (oral defense)
13 (Optional) | Apr 9 | Safe-RL and Hierarchical RL | Safe MDPs, constraint optimization, Lyapunov-based safety, CPO, risk-sensitive criteria, hierarchical task decomposition, options framework, HRL architectures

💻 Project Information

Final Project

The course culminates in a capstone research project where students produce a conference-level paper. This is the primary deliverable for the course.

Project Guidelines

Comprehensive guidelines for the final project, including format requirements, evaluation criteria, and submission instructions.

Download Guidelines

Project Topics

Suggested research topics and project ideas to help you get started on your final project.

View Topics

Simulation Setup

Instructions for setting up simulation environments for your RL experiments, including Gymnasium, dm_control, and other relevant frameworks.

Download Setup Guide

Project Timeline

  • Feb 24, 2026: Project Proposal Due (5%)
  • Mar 24, 2026: Final Project Paper Due (25%)
  • Mar 31, 2026: Peer Review Due (10%)
  • Apr 2, 2026: Final Project Presentation (10%)

📋 Course Policies

Late Submission Policy

Attendance

Success in this course is highly correlated with active participation in both lectures and tutorial sessions. While attendance is not strictly graded for standard lectures, students are strongly encouraged to attend in person.

Academic Integrity

All work submitted must be your own. Collaboration on assignments is allowed but must be acknowledged. Plagiarism or any form of academic dishonesty will result in severe penalties, including possible failure of the course.

Please familiarize yourself with the Code of Behaviour on Academic Matters and the Code of Student Conduct.

Generative AI Policy

AI Tool Usage Guidelines

Students are permitted to use AI tools (e.g., ChatGPT, Claude, GitHub Copilot) as learning aids and to assist with assignments. However:

  • Students remain ultimately accountable for all work submitted
  • Citation is required: Any content, code, or ideas produced by AI must be explicitly cited
  • Include an "AI Statement" at the end of assignments detailing which tools were used and for what purpose
  • No grading penalty for declared use of AI tools, provided they are cited correctly
  • Midterm Exam: Closed environment - use of AI tools/electronic devices is strictly prohibited

Accommodations

  • Religious Accommodations: Information is available in the University’s Policy on Scheduling
  • Temporary Absence: Students may use the ACORN Absence Declaration Tool for absences up to 7 consecutive days
  • Equity and Academic Rights: The University of Toronto is committed to equity, meaningful inclusion, and respect for diversity

📧 Contact

For questions about the course:

  1. Check this website and announcements first
  2. Attend office hours: Wednesday, 6:00 PM - 7:00 PM (DH3110)
  3. Email the instructor: amey.pore@utoronto.ca
    • Please include [CSC415] in the subject line
    • Allow 24-48 hours for response during regular business hours

Last updated: December 27, 2025