Spark Maven Based Projects
Get introduced to the common structure of the Maven projects we'll use in the code examples and projects.
Maven projects
Both the small projects and the main batch application template project in this course follow a standard Maven project structure and rely on the pom.xml file for configuration.
For this reason, this lesson describes the main parts of that structure. It can also serve as a reference to come back to whenever we need a Maven-based Spark project.
The structure follows the usual Maven conventions, which center on the so-called Project Object Model (POM) file, so it helps to review the POM briefly first.
The POM XML file from the previous project is a good starting point for understanding the configurations explained in this lesson:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.jmb.examples</groupId>
    <artifactId>dataframe-basics</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <java.version>1.8</java.version>
        <scala.version>2.12</scala.version>
        <spark.version>3.0.0</spark.version>
        <maven-compiler-plugin.version>3.8.0</maven-compiler-plugin.version>
    </properties>

    <dependencies>
        <!-- Spark -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-simple</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>${maven-compiler-plugin.version}</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <version>1.6.0</version>
                <configuration>
                    <executable>java</executable>
                    <arguments>
                        <argument>-classpath</argument>
                        <classpath />
                        <argument>com.jmb.DataFrameBasicsMain</argument>
                    </arguments>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4.1</version>
                <configuration>
                    <!-- Get all project dependencies packaged alongside the application, aka the Fat Jar -->
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <!-- MainClass in manifest makes an executable jar -->
                    <archive>
                        <manifest>
                            <mainClass>com.jmb.DataFrameBasicsMain</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <!-- Bind to the packaging phase -->
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
Basics of POM
The POM is perhaps the most fundamental unit of work in Maven. It is an XML file that contains information about the project and configuration details that Maven utilises while building a project.
“Building” is understood in this context as a set of tasks, or goals, that eventually produce a packaged executable Java application, a Java library, or both. Typically, the main outcome of the Maven execution process is a single JAR file.
Note: For more information about Maven, the official documentation can serve as an excellent manual to consult when needed.
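To make the idea of the POM more concrete, the sketch below shows roughly the smallest pom.xml Maven accepts. The coordinates (com.example and minimal-app) are hypothetical and used only for illustration. Everything not declared here falls back to Maven's built-in defaults, including the packaging type, which defaults to jar.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <!-- Version of the POM model itself, not of the project -->
    <modelVersion>4.0.0</modelVersion>

    <!-- Hypothetical project coordinates, chosen only for this example -->
    <groupId>com.example</groupId>
    <artifactId>minimal-app</artifactId>
    <version>1.0-SNAPSHOT</version>
</project>
```

The three coordinates together identify the project, which is why even the shortest POM declares them.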
Besides the configuration set up manually in the POM XML file, Maven also works behind the scenes by following certain default conventions. These are configurable too and can be overridden by specifying them explicitly, typically within this file.
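As a small sketch of overriding a convention, the fragment below changes two defaults inside the build section of a POM: the name of the packaged artifact and the location of the main sources. The values spark-examples and src/java are hypothetical and chosen only for this example.

```xml
<build>
    <!-- Hypothetical value; the default is ${project.artifactId}-${project.version} -->
    <finalName>spark-examples</finalName>
    <!-- Hypothetical value; the default is src/main/java -->
    <sourceDirectory>src/java</sourceDirectory>
</build>
```

Any element left out of the build section keeps its default convention.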
One example of these is the build directory, the directory where the built object is copied to, denoted by the ...