We study learning mid-level representations from natural, non-curated data to achieve efficient performance that generalizes across downstream visual tasks such as recognition, segmentation, and detection. We exploit instance discrimination, instance grouping, model bias and variance analysis, pixel-to-segment contrastive learning, and visual memory to handle open-set recognition, long-tailed distributions, open compound domain adaptation, and unsupervised or weakly supervised recognition and segmentation.