Datasets with Scala Case Class and Java Bean Class
Learn how Scala's case classes and Java's bean classes can be used with Datasets.
Generating data using SparkSession
We can also create a Dataset using a SparkSession object as demonstrated below.
// define the case class
scala> case class MovieDetailShort(imdbID: String, rating: Int)
// define a random number generator with a fixed seed
scala> val rnd = new scala.util.Random(9)
// create some data: 101 objects, since 0 to 100 is inclusive
scala> val data = for (i <- 0 to 100) yield MovieDetailShort("movie-" + i, rnd.nextInt(10))
// use the Spark session to generate a Dataset from the objects created in the previous step
scala> val datasetMovies = spark.createDataset(data)
// display three rows from the Dataset
scala> datasetMovies.show(3)
+-------+------+
| imdbID|rating|
+-------+------+
|movie-0|     0|
|movie-1|     3|
|movie-2|     8|
+-------+------+
only showing top 3 rows
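The commands above run in spark-shell, which already provides the spark object. As a rough sketch of how the same steps might look in a standalone application, the listing below builds its own SparkSession. The object name CreateDatasetExample, the application name, and the local[*] master are assumptions made for illustration and do not come from the lesson.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// The case class is defined at the top level so Spark can derive an encoder for it.
case class MovieDetailShort(imdbID: String, rating: Int)

// Hypothetical object name, chosen only for this sketch.
object CreateDatasetExample {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally on all cores; a real deployment would configure this differently.
    val spark = SparkSession.builder()
      .appName("CreateDatasetExample")
      .master("local[*]")
      .getOrCreate()

    // spark-shell performs this import automatically; a standalone program must do it itself
    // so that createDataset can find the conversions it needs for case classes.
    import spark.implicits._

    // Same data generation as in the shell session: 101 movies with ratings from 0 to 9.
    val rnd = new scala.util.Random(9)
    val data = for (i <- 0 to 100) yield MovieDetailShort("movie-" + i, rnd.nextInt(10))

    val datasetMovies: Dataset[MovieDetailShort] = spark.createDataset(data)
    datasetMovies.show(3)

    spark.stop()
  }
}
```

The only structural difference from the shell session is the explicit session setup and the import of spark.implicits._, both of which spark-shell takes care of behind the scenes.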
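When working with Scala, we didn’t have to specify an encoder explicitly: spark-shell imports spark.implicits._ automatically, and that import supplies an implicit encoder for case classes. This is not the case for Java, where we have to pass the encoder ourselves, typically one built with Encoders.bean for a bean class. The equivalent Java bean class for MovieDetailShort is listed below: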
public class MovieDetailShort implements Serializable {
...