# reinforcement-learning

A Reinforcement Learning Riddle

I proved 1=0 starting from the formula for the on-policy distribution in episodic tasks. Obviously there is a mistake somewhere. Can you spot it? 🤔
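For readers who want the starting point in front of them, here is a minimal sketch of the on-policy distribution in an episodic task. It assumes the formula meant is the standard textbook one (Sutton and Barto): η(s) = h(s) + Σ_s' η(s') Σ_a π(a|s') p(s|s',a), followed by the normalization μ(s) = η(s) / Σ_s' η(s'). The two-state chain, the function name `on_policy_distribution`, and the fixed-point iteration are all illustrative assumptions, not part of the riddle itself.

```python
def on_policy_distribution(h, P, sweeps=1000):
    """Return (mu, eta) for an episodic task with non-terminal states 0..n-1.

    h[s]    : probability that an episode starts in state s
    P[i][j] : probability of moving from state i to state j under the policy
              (rows may sum to less than 1; the remainder goes to termination)
    """
    n = len(h)
    eta = list(h)
    # Fixed-point iteration on eta(s) = h(s) + sum_prev eta(prev) * P[prev][s].
    # It converges when every episode terminates with probability 1.
    for _ in range(sweeps):
        eta = [h[s] + sum(eta[prev] * P[prev][s] for prev in range(n))
               for s in range(n)]
    total = sum(eta)
    mu = [e / total for e in eta]  # normalize expected visits to a distribution
    return mu, eta


# Hypothetical chain: always start in state 0; state 0 always moves to state 1;
# state 1 returns to state 0 with probability 0.5 and terminates otherwise.
h = [1.0, 0.0]
P = [[0.0, 1.0],
     [0.5, 0.0]]
mu, eta = on_policy_distribution(h, P)
print(mu, eta)
```

In this toy chain each state is visited twice per episode on average, so the normalized distribution puts equal weight on both states.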

Link