Use wholeTextFiles to create an RDD from the activations dataset. The resulting RDD will consist of tuples, in which the first value is the full path of the file and the second value is the contents of the file (XML) as a single string.
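As a rough illustration of this step, the sketch below uses PySpark. The helper names (load_activations, count_children, run) and the directory path passed to run are hypothetical, chosen only for this example, and the XML handling assumes nothing about the activations schema beyond each file being well-formed XML.

```python
import xml.etree.ElementTree as ET


def load_activations(sc, path):
    """Return an RDD of (file_path, file_contents) tuples, one per file."""
    # wholeTextFiles reads each file as a single record, unlike textFile,
    # which produces one record per line.
    return sc.wholeTextFiles(path)


def count_children(pair):
    """Map one (file_path, xml_string) tuple to (file_path, child_count)."""
    file_path, xml_string = pair
    root = ET.fromstring(xml_string)  # parse the whole file's XML
    return (file_path, len(root))     # number of direct child elements


def run(path):
    """Hypothetical driver: print the first two (path, count) pairs."""
    from pyspark import SparkContext  # imported here; needs Spark installed

    sc = SparkContext.getOrCreate()
    activations = load_activations(sc, path)
    for file_path, n in activations.map(count_children).take(2):
        print(file_path, n)
```

Calling run with the directory that holds the activations files (the actual location depends on your environment) should print two file paths, each followed by the number of top-level elements in that file. Because every file arrives as one string, it can be parsed as a complete XML document inside a single map call.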