Sequential search is ubiquitous in empirical and theoretical economics. A worker is presented with a set of job offers that he explores sequentially until he accepts one or quits the labor force; a consumer is presented with a set of products that he queries until he buys one or exits the market. Identifying the welfare-maximizing sequence is difficult whenever the quality of the elements being ordered is uncertain. I characterize the ordering and information provision problem as a principal-agent model in a repeated-game setting. While agents are strict posterior maximizers, the principal is long-lived and is consequently willing to explore the ordering space to make better recommendations down the road.
I leverage bandit and learning theory to derive near-optimal sequencing strategies under incomplete information. When the outcome of the game is available to the principal at the end of the period, I identify a near-optimal algorithm in a non-parametric setting. I then show that, under some parametric assumptions, there is no loss in restricting the feedback space to the agents' actions, so that the principal need not observe the outcome of the game. I discuss applications of these results to labor markets, experimental design, platform design, and finance, among other areas.