I proved 1=0 starting from the formula for the on-policy distribution in episodic tasks. Obviously there is a mistake somewhere. Can you spot it? 🤔