Proactive caching at the baseband units (BBUs) in cloud radio access networks (CRANs) has attracted significant attention. However, most existing works assume a known content distribution and ignore the massive volume of data in CRANs. In contrast, this paper studies the problem of proactive caching for CRANs. In the considered model, the BBUs can predict the content request distribution of each user, determine which content to cache, and cluster remote radio heads (RRHs) based on the content predictions. The problem is formulated as an optimization problem that jointly incorporates backhaul loads, RRH clustering, and content caching. To solve it, an algorithm that combines the machine learning framework of echo state networks with sublinear algorithms is proposed. Using echo state networks, the BBUs can predict the users’ content request distribution with only limited information on the network’s and users’ states. A sublinear algorithm is then proposed to determine which content to cache and how to cluster the RRHs using only a limited number of content request samples. Simulation results using real data from Youku show that the proposed approach yields significant gains in sum effective capacity, of up to 26.8% and 36.5% compared to random caching with clustering and random caching without clustering, respectively.
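As a rough illustration of deciding what to cache from limited content request samples, the following minimal sketch estimates the request distribution from a random subset of a request log and caches the contents with the highest estimated probability. It is not the algorithm proposed in this paper: the function name `choose_cache`, its parameters, and the toy request log are hypothetical, and RRH clustering, backhaul loads, and the echo state network predictor are all omitted.

```python
import random
from collections import Counter


def choose_cache(requests, cache_size, sample_size, seed=0):
    """Pick contents to cache from a random sample of the request log.

    Hypothetical helper for illustration only; it is not the paper's method.
    """
    rng = random.Random(seed)
    # Inspect only sample_size requests instead of the full log.
    sample = rng.sample(requests, min(sample_size, len(requests)))
    counts = Counter(sample)
    # Empirical request distribution estimated from the sample.
    est = {content: n / len(sample) for content, n in counts.items()}
    # Rank by estimated probability (ties broken by content id).
    ranked = sorted(est, key=lambda content: (-est[content], content))
    return ranked[:cache_size], est


# Toy request log (assumed data): content "a" is requested most often.
requests = ["a"] * 700 + ["b"] * 200 + ["c"] * 100
cached, est = choose_cache(requests, cache_size=1, sample_size=100)
print(cached, est)
```

The caching decision here depends only on the sampled requests, so its cost grows with the sample size rather than with the length of the full request log.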