When the policymaker is unable to distinguish among agents' types, suboptimal pooling equilibria are likely to emerge in public policy. Here, I consider the problem of repeatedly choosing policy parameters, such as benefit packages, in an adverse selection context. The policymaker has a mandate to maximize social welfare subject to exogenous budget constraints. However, unlike traditional bandit problems, (i) the policymaker does not observe realized utility at the end of the period, (ii) ex-post budget balance needs to be addressed carefully, and (iii) variation in the policy space remains insufficient for identification. Fortunately, the policymaker may rely on orthogonal variation of a screening device to improve her inference and overall social welfare. I show that increasing the accuracy of a screening device operates not only as a deterrence mechanism but also as a source of information. Improved selection of policy parameters therefore yields a double welfare gain, through both efficiency and information. Finally, heuristics based on different bandit and partial monitoring algorithms are discussed as solutions to the problem above. This policy recommendation has applications in disability insurance design, low-income scholarship programs, and tax evasion, among others.
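One minimal way to read the repeated problem, under assumed notation not introduced above, is as a per-period program in which the policymaker picks a policy parameter and a screening intensity, with the partial monitoring feature that the welfare objective is never observed directly:
\[
\max_{\theta_t,\, s_t} \; \mathbb{E}\big[ W(\theta_t, s_t) \big]
\quad \text{s.t.} \quad \mathbb{E}\big[ C(\theta_t, s_t) \big] \le B_t ,
\]
where \(\theta_t\) stands for the policy parameters (e.g., a benefit package) chosen in period \(t\), \(s_t\) for the accuracy of the screening device, \(W\) for social welfare, \(C\) for program cost, and \(B_t\) for the exogenous budget; all of these symbols are placeholders for illustration. In this reading, orthogonal variation in \(s_t\) is what generates the additional feedback that variation in \(\theta_t\) alone cannot provide.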