Save the configuration, and then restart the service as described in steps 6 and 7.

We’ve been using Apache Spark for a year, and we wanted to share some thoughts and tips about it.

Any values specified as flags or in the properties file will be passed on to the application and merged with those specified through SparkConf.

To investigate the capabilities of a distributed version of TensorFlow, we adapted the single-machine version of TensorFlow to Apache Spark.

In this analysis, we highlight the impact of tuning resource allocation: allocating all the available resources does not by itself give the best performance; it also depends on how you tune parameters such as --driver-memory, --num-executors, --executor-cores and --executor-memory. You can also set spark.executor.instances in order to control the tasks' processing and memory consumption.

Initialization of the SparkContext. It will run smoothly. I read an article before that said: "This is because of the runtime overhead imposed by Scala, which is usually around 3-7%, more or less."

Spark was developed to utilize distributed, in-memory data structures to improve data processing speeds for most workloads.

Under the Executors tab, you can see processing and storage for each executor.

With the 2X WorkerType configuration, the Spark drivers for the development endpoint run on 4 vCPUs, 16 GB of memory, and a 64 GB disk.

The counter referenced by the foreach function running on each executor is a copy of the counter variable inside the closure shipped from the driver node.

The main idea of this job is as follows: the job is of the filling type.

Memory configuration of the system with Intel Optane DC persistent memory: 8x128 GB DCPMM in App Direct mode + 12x16 GB DDR4.

A YARN Application Master still needs to be created to schedule the Spark executors, and you can set its memory requirements.
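The merge rule above (flags and properties-file values folded in beneath anything set through SparkConf) can be sketched in plain Python. The precedence ordering assumed here (SparkConf in code > spark-submit flags > spark-defaults.conf) follows the Spark configuration documentation; the property values are purely illustrative.

```python
def resolve_properties(defaults_file, cli_flags, code_conf):
    """Merge Spark properties: later updates win, so SparkConf settings in
    code override spark-submit flags, which override spark-defaults.conf."""
    merged = dict(defaults_file)   # lowest precedence: the properties file
    merged.update(cli_flags)       # flags override the defaults file
    merged.update(code_conf)       # SparkConf set in application code wins
    return merged

resolved = resolve_properties(
    {"spark.master": "spark://5.6.7.8:7077", "spark.executor.memory": "4g"},
    {"spark.executor.memory": "2g", "spark.driver.memory": "2g"},
    {"spark.driver.memory": "6g"},
)
# executor memory comes from the flag, driver memory from SparkConf
```

This is why a value you set in code can silently shadow the one you pass on the command line.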
maxResultSize: 10G; packages: com.…

What is the relationship between the three parameters above? As I understand it, DRIVER_MEMORY is the maximum memory the master node/process can request.

Here I also include the environment setup once (although it should really be in…).

Ever wondered how to configure --num-executors, --executor-memory and --executor-cores Spark config params for your cluster?

In this case, you do not need to specify spark.executor.instances manually.

Upscaling a Single Memory Intensive Spark Job: you can set a limit on the executor memory a job can use by setting spark.executor.memory.

Also, running tiny executors (with a single core, for example) throws away the benefits that come from running multiple tasks in a single JVM.

Driver memory does not need to be large if the job does not aggregate much data (as with a collect() action). Driver memory is the memory used for the driver process, that is, where the SparkContext is initialized.

Much of this performance increase is due to Spark's use of in-memory persistence.

In cluster mode, the local directories used by the Spark executors and the Spark driver will be the local directories configured for YARN (Hadoop YARN config yarn.nodemanager.local-dirs).

Launching Spark on YARN.

Overview: We’ve had some hiccups and now some success in getting the new Spark-based EMR job running.

In the Executors page of the Spark Web UI, we can see that the Storage Memory is at about half of the 16 gigabytes requested.
We can sqoop data from RDBMS tables into a Hadoop Hive table without using Sqoop. However, we can also opt for dynamic allocation.

The application master is used only to request resources from YARN.

You can set this through the spark.executor.memory property or the --executor-memory flag.

First an executor was lost, and then the driver went out of memory.

Spark Driver: the master node of a Spark application.

1- Saving an ORC file from Spark and creating a table directly on Hive; see this code: spark-shell --master yarn-client --driver-memory 512m --executor…

./bin/spark-submit --class <main-class> --master yarn-cluster --driver-memory 2g --executor-memory 1g --conf spark.…

In cluster mode, however, the executors' stdout is used, so the output appears on each cluster node rather than on the driver.

On the other hand, as each driver process is isolated, data cannot be shared among drivers (or SparkContext objects) without writing it to external storage, and that may hurt Spark's in-memory performance.

JVM overhead. By default, the Spark submit job uses all available memory (excluding reserved memory for Databricks services). But this is not how it works.

You will also learn how to develop Spark applications using the SparkR and PySpark APIs, interactive data analytics using Zeppelin, and in-memory data processing with Alluxio.

Memory tuning in Apache Spark performance tuning: while we tune memory usage, three considerations strike. As the whole dataset needs to fit in memory, consideration of the memory used by your objects is a must.

Performance: rxExecBy vs gapply on Spark (April 25, 2017, by jasonzhangmachinelearning). rxExecBy is a new API of the R Server 9.1 release.

Hadoop saves data to the hard drive along with each step of the MapReduce algorithm.

"This seems to apply to my calculation: (14*1024-300)*0.75*0.95 ≈ 10001 MB, or about 9.8 GB."

If the required jar is only to be used by driver code, use the option --driver-class-path. Launching spark-shell with external jars.
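The arithmetic quoted above can be checked directly. A minimal sketch, assuming a 14 GB executor heap, 300 MB reserved by Spark, a memory fraction of 0.75 and a further 0.95 safety factor (the two factors are taken from the quoted calculation, not from any particular Spark release's defaults):

```python
def usable_memory_mb(heap_mb, reserved_mb=300, fraction=0.75, safety=0.95):
    """Estimate memory usable for storage/execution out of the executor heap."""
    return (heap_mb - reserved_mb) * fraction * safety

mb = usable_memory_mb(14 * 1024)  # 14 GB executor heap -> roughly 10001 MB
```

Run with other heap sizes to see why the Storage Memory shown in the UI is always well below the `--executor-memory` you request.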
Spark provides the flexibility to configure properties like the number of executors, the number of cores per executor, and the amount of memory per executor for each application.

Spark is a scalable data analytics platform that incorporates primitives for in-memory computing and therefore has some performance advantages over Hadoop's cluster storage approach. The remaining 40% of memory is available for any objects created during task execution.

Three important features offered by BigDL are rich deep learning support, high single…

The driver and the executors run their individual Java processes, and users can run them on the same horizontal Spark cluster or on separate machines, i.e.

Change the driver memory of the Spark Thrift Server.

And the available RAM on each node is 63 GB. Memory for each executor: from the step above, we have 3 executors per node.

How does a Spark executor obtain the value of a variable declared in the driver?

This quick article is a wrap-up for reference on how to connect to ScyllaDB using Spark 2 when authentication and SSL are enforced for the clients on the Scylla cluster.

Each worker node launches its own Spark executor, with a configurable number of cores (or threads).

Cluster size vs. Spark UI Executors memory.

I set the driver memory via the spark-submit parameter --driver-memory.

To recover from driver failure, Spark Streaming uses data checkpointing. Final words: Apache Storm vs. Apache Spark.

Spark driver pod: the Spark driver runs within a pod. Also, all the data needed to run a task is cached within executor memory.

The fantastic Apache Spark framework provides an API for distributed data analysis and processing in three different languages: Scala, Java and Python.
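The "3 executors per node, 63 GB" figures above fall out of a simple calculation. A sketch under common walkthrough assumptions (16 cores and 64 GB RAM per node, with 1 core and 1 GB reserved for the OS and Hadoop daemons, 5 cores per executor, and roughly 7% set aside for YARN overhead; all of these numbers are assumptions, not fixed Spark defaults):

```python
def executors_per_node(cores_per_node, cores_per_executor, reserved_cores=1):
    """How many executors fit on one node after reserving cores for daemons."""
    return (cores_per_node - reserved_cores) // cores_per_executor

def executor_memory_gb(ram_per_node_gb, execs_per_node,
                       reserved_gb=1, overhead_frac=0.07):
    """Split the node RAM across executors, leaving room for YARN overhead."""
    per_exec = (ram_per_node_gb - reserved_gb) / execs_per_node
    return per_exec * (1 - overhead_frac)

n = executors_per_node(16, 5)    # -> 3 executors per node
mem = executor_memory_gb(64, n)  # 63 GB split 3 ways, minus ~7% overhead
```

With these inputs you land on 3 executors per node at roughly 19 GB each, which is the shape of answer the notes keep circling around.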
This usually happens during development for beginners.

Apache Spark™, an integrated part of CDH and supported with Cloudera Enterprise, is the open standard for flexible in-memory data processing that enables batch, real-time, and advanced analytics on the Apache Hadoop platform.

Spark Workers and Executors. spark.executor.memory=20G.

Consider boosting spark.yarn.executor.memoryOverhead.

As soon as they have run the task, the executors send the results to the driver.

Determine the maximum size of the data the Spark application will handle.

Similarly, Spark needs some source of data, and large amounts of it, to make using Spark worthwhile.

The best practices were tested on a Dell 12-server cluster, with Hadoop ….

You can find the R Markdown document used to generate this post here.

--num-executors 3 \ --driver-memory 4g \ --executor-memory 2g \ --executor-cores 1 \ lib/spark-examples*.jar

Data is organized in a resilient distributed dataset (RDD), which is stored across the nodes. --executor-cores 5 means that each executor can run a maximum of five tasks at the same time. Setting executor memory won't have any effect, as you have noticed.

Apache Spark™ is a fast, general-purpose engine for large-scale data processing.

Note that pods for drivers and executors of different applications may occur in arbitrary order, e.g. …

An executor is dedicated to a specific Spark application and is terminated when the application completes.

Concurrency.

How will you do memory tuning in Spark?
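The flag list above can be assembled programmatically before handing it to a process runner. A sketch; the jar path and the example class name are placeholders, not something this deployment is known to use:

```python
def build_spark_submit(jar, main_class, **flags):
    """Build a spark-submit argv list; snake_case kwargs become --kebab-case flags."""
    cmd = ["spark-submit", "--class", main_class]
    for name, value in flags.items():
        cmd += ["--" + name.replace("_", "-"), str(value)]
    cmd.append(jar)  # the application jar comes last
    return cmd

cmd = build_spark_submit(
    "lib/spark-examples.jar",
    "org.apache.spark.examples.SparkPi",  # placeholder main class
    num_executors=3, driver_memory="4g",
    executor_memory="2g", executor_cores=1,
)
```

The resulting list can be passed to subprocess.run(cmd) as-is, which avoids shell-quoting mistakes with the backslash-continued form shown above.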
spark.yarn.executor.memoryOverhead: executorMemory * 0.07, with a minimum of 384. This is the amount of off-heap memory (in megabytes) to be allocated per executor.

The client driver JVM: the driver is the client program for the Spark job.

Every Spark™ application consists of a driver program that manages the execution of your application on a cluster.

DAG Action: this is by far the least expensive action out of the three.

Registration and startup of the Driver. Second, Spark does not ….

But it is not working.

Spark executor.

For those who want to understand the Spark source code, please read the articles in the following order: 1. …

spark.memory.fraction: the default is set to 60% of the requested memory per executor.

We assume Spark and HBase are deployed in the same cluster, and Spark executors are co-located with region servers, as illustrated in the figure below.

These properties are set by creating a configuration object called SparkConf, which will be used to create the SparkContext; RDD transformations.

With 4 cores you can run 4 tasks in parallel; this affects the amount of execution memory being used.

It supports executing snippets of code or programs in a Spark context that runs locally or in YARN.

You call PyBDA like that: pybda run data/pybda-usecase.…

The Spark driver program uses the Spark context to connect to the cluster through a resource manager (YARN or Mesos).

e.g.: spark-shell --conf config_file_details --driver-memory 4G --executor-memory 4G -i spark_job.…

package statistics
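The "7%, with a minimum of 384" rule above is just a max() over two quantities. A sketch of that default, using the 0.07 factor and 384 MB floor stated in the notes:

```python
def yarn_memory_overhead_mb(executor_memory_mb, factor=0.07, minimum=384):
    """Off-heap overhead YARN reserves per executor: max(384 MB, 7% of heap)."""
    return max(minimum, int(executor_memory_mb * factor))

small = yarn_memory_overhead_mb(1024)    # 7% of 1 GB is under the floor -> 384
large = yarn_memory_overhead_mb(10240)   # 7% of 10 GB dominates -> ~716
```

This is why the overhead matters proportionally more for small executors: below roughly 5.5 GB of heap, the flat 384 MB floor wins.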
f) Determining memory consumption in Spark: if we want to know how much memory a dataset will require to create an RDD, put that RDD into the cache and look at the "Storage" page in the Web UI.

03 March 2016 on Spark, scheduling, RDD, DAG, shuffle.

Storage Memory and Execution Memory: the boundary between Storage ….

spark.yarn.executor.memoryOverhead, where we assign at least 512M.

The driver on-heap memory (usually set as an …); the driver off-heap memory. Some notes: on YARN, it is generally necessary to set the spark.…

There are tradeoffs between num-executors and executor-memory.

The last 3 options indicate the Spark master URL and the amount of memory to allocate for each Spark executor and the Spark driver.

SparkContext (sc) is the abstraction that encapsulates the cluster for the driver node (and the programmer).

We will discuss various topics about Spark such as lineage, reduceByKey vs. groupByKey, and yarn-client mode vs. yarn-cluster mode.

Answer #3, with overhead: 6 executors, 63 GB memory each, 15 cores each.

A Spark driver (an application's driver process) is a JVM process that hosts SparkContext for a Spark application.

TaskScheduler creation and startup.

Structured Streaming, as of today, provides only one implementation of State Store: HDFS-backed state management. This State Store has been implemented using an in-memory HashMap (in executors) and ….

Key takeaways: Spark driver resource-related configurations also control the YARN application master resources in yarn-cluster mode.

Spark vs. Hadoop.

Its "executor ID" is listed as "driver".
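With the "Answer #3" numbers (6 executors, 15 cores each, 63 GB each), the cluster-wide task parallelism and the memory available per concurrently running task follow directly. A small sketch of that arithmetic:

```python
def max_concurrent_tasks(num_executors, executor_cores):
    """Each core is one task slot, so slots = executors x cores-per-executor."""
    return num_executors * executor_cores

def memory_per_task_gb(executor_memory_gb, executor_cores):
    """Rough per-task share of an executor's heap when all slots are busy."""
    return executor_memory_gb / executor_cores

slots = max_concurrent_tasks(6, 15)      # 90 tasks can run at once
per_task = memory_per_task_gb(63, 15)    # ~4.2 GB of heap per running task
```

This makes the num-executors vs. executor-memory tradeoff concrete: more cores per executor means more slots, but each slot gets a thinner slice of the same heap.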
Livy is an open source REST interface for interacting with Spark from anywhere.

Unfortunately, Spark on Mesos and YARN only allows giving as many resources (cores, memory, etc.) … However, for this beta, only static resource allocation can be used.

Invoking an action inside a Spark application triggers the launch of a Spark job to fulfill it.

spark_session --executor-cores=3 --driver-memory 8G sample.py

Spark runs out of memory when either: 1. …

We are submitting the Spark job in our Tomcat application using yarn-cluster mode with great success.

How do I configure an Apache Spark application by using a Jupyter notebook on clusters?

This tends to grow with the executor size (typically 6-10%).

--packages: comma-separated list of Maven coordinates of jars to include on the driver and executor classpaths.

The default value for those parameters is 10% of the defined memory (spark.…). Therefore, the memory overhead should be increased in case the executor memory is lower than 10 GB.

Spark is beautiful. As Spark works on a cluster, it is network sensitive.

Fault refers to failure; thus fault tolerance in Apache Spark is the capability to operate and to recover loss after a failure occurs.

We also found that we needed to explicitly stipulate that Spark use all 20 executors we had provisioned.
An executor has multiple slots to run multiple tasks in parallel.

Executors are the processes that perform tasks.

If you simply divide all available memory into the number of executors you want to run on a single node, YARN will not be able to start that many executors, due to overhead.

This article covers core Apache Spark concepts, including Apache Spark terminologies.

The default value of the driver node type is the same as the worker node type.

collect(): collect will attempt to copy all the data in the RDD/DataFrame into the driver machine, and may run out of memory and crash.

Whether core requests are honored in scheduling decisions depends on which scheduler is in use and how it is configured.

Apache Spark Tutorial: the following is an overview of the concepts and examples that we shall go through in these Apache Spark tutorials.

Driver: acts as a coordinator for the application. Executors: run computation and store application data; they are launched at the beginning of an ….

Apache Spark Terminologies - Objective.

Example spark-defaults.conf entries:

spark.master            spark://5.6.7.8:7077
spark.executor.memory   4g
spark.eventLog.enabled  true
spark.serializer        org.apache.spark.serializer.KryoSerializer

This makes it very crucial for users to understand the ….

The program arguments come after the jar; the master URL specified with --master takes precedence; executor-memory specifies the memory used by each executor; and total-executor-cores is the total number of CPU cores used by all the executors.

Note: Spark configurations for resource allocation are set in spark-defaults.conf.

This article explains the resource allocation configurations for Spark on YARN, with examples.
These three params play a very important role in Spark performance, as they control the amount of CPU and memory your Spark application gets.

archive: points to the folder containing all the Spark JARs needed when running in Spark on YARN mode, that is, when spark.…

This can be determined by looking at the "Executors" tab in the Spark application UI.

Will search the local Maven repo, then Maven Central, and any additional remote repositories given by --repositories.

Apache Ignite is a memory-centric distributed database, caching, and processing platform for transactional, analytical, and streaming workloads, delivering in-memory speeds at petabyte scale.

It is a sequential process performed by the Spark driver that renames files one by one.

partitions=x, where x should be the same number as numPartitions in your Spark config.

From the Spark documentation, the definition of executor memory is: amount of memory to use per executor process, in the same format as JVM memory strings (e.g. 1000m, 2g).

Local threads.
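Values like 1000m and 2g are JVM memory strings. A minimal parser sketch, covering only the lowercase k/m/g suffixes used in the examples here (the real JVM and Spark parsers accept more forms):

```python
def parse_jvm_memory(s):
    """Parse a simple JVM memory string ('1000m', '2g', '512') into bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    s = s.strip().lower()
    if s and s[-1] in units:
        return int(s[:-1]) * units[s[-1]]
    return int(s)  # a bare number is taken as plain bytes

two_gb = parse_jvm_memory("2g")
```

Having the value in bytes makes comparisons between, say, --executor-memory and a container limit trivial.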
Spark Thrift Server driver memory is configured to 25% of the head node RAM size, provided the total RAM size of the head node is greater than 14 GB.

Spark-on-HBase Connector Architecture.

Default is max(384 MB, 0.07 × executorMemory) for off-heap memory (the memory overhead).

The tasks are load-balanced across the cluster nodes and are guaranteed to be executed as long as there is at least one node in the cluster.

So, you may need to decrease the amount of heap memory specified via --executor-memory to increase the off-heap memory via spark.…

spark.executor.cores: the number of cores of each executor. Driver configuration: spark.…

It is a JVM process that executes multiple tasks, and it provides in-memory storage for RDDs.

--master yarn-client --driver-memory 5g --num-executors 10 --executor-memory 9g --executor-cores 6. Theoretically, you only need to make sure that the total amount of resources calculated using the preceding formula does not exceed the total amount of resources in the cluster.

Using Properties: Properties inherits from Hashtable and is organized as key-value pairs.

Worker nodes communicate with the cluster manager.

Are you running in YARN or standalone mode? If you don't have enough memory in your containers (if you are using YARN), that could also explain the problem.

Does an RDD use memory to store data? How much of an RDD is in memory? Does an RDD use off-heap memory? Should an RDD be serialized or not (while storing the data)? How many replicas (default: 1) to use (can only be less than 40)?

As we know, Spark runs on a master-slave architecture.

In contrast, the Spark cache uses memory.

Submit your job.
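The driver-side commit described above (renaming output files one by one) can be illustrated with plain file renames. A sketch; the staging/final directory layout and part-file naming are only illustrative of the pattern, not of any specific committer:

```python
import os
import tempfile

def commit_task_files(staging_dir, final_dir):
    """Sequentially move each staged output file into the final directory."""
    os.makedirs(final_dir, exist_ok=True)
    committed = []
    for name in sorted(os.listdir(staging_dir)):  # one rename at a time
        os.rename(os.path.join(staging_dir, name),
                  os.path.join(final_dir, name))
        committed.append(name)
    return committed

staging = tempfile.mkdtemp()
final = os.path.join(tempfile.mkdtemp(), "output")
for i in range(3):
    open(os.path.join(staging, f"part-{i:05d}"), "w").close()

names = commit_task_files(staging, final)
```

Because the loop is sequential and runs in one process, commit time grows linearly with the number of output files, which is exactly why many small part files make this step slow.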
In-memory computing is much faster than disk-based applications such as Hadoop, which shares data through the Hadoop Distributed File System (HDFS).

This talk takes a deep dive through the memory management designs adopted in Spark since its inception and discusses their performance ….

Enhanced speed: MapReduce makes use of persistent storage for carrying out the data processing tasks.

in a vertical Spark cluster, or in a mixed machine configuration.

It hosts the Web UI for the environment.

Executors are only responsible for executing the part of the code assigned to them by the driver and for reporting the status back to the driver.

--properties-file FILE: path to a file from which to load extra properties. --driver-memory MEM: memory for the driver.

property_1 will be passed to SparkConf. Dependency management; syncing dependencies.

Next, let's try to … so that the next two sections can operate on in-memory data.

Deep Dive: Apache Spark Memory Management (A Developer's View into Spark's Memory).

In reality, the driver requests executor creation from Kubernetes.

Depending on the size of the pandas objects, it can add significant memory overhead to the Spark driver process (where all the data is collected).

Your application (driver program): SparkContext.

Spark SQL: for redundantly storing data on multiple nodes, there is no replication factor in Spark SQL.

spark.memory.fraction: a fraction of the heap space (minus 300 MB) used for execution and storage.
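The driver-memory risk of collecting everything (e.g. for a pandas conversion) can be caught early with a rough size estimate against the configured result-size cap. A sketch; the 10 GB cap mirrors the maxResultSize: 10G setting mentioned in these notes, and the row counts and per-row byte estimate are made-up illustrations:

```python
def fits_in_max_result_size(num_rows, avg_row_bytes, max_result_size_bytes):
    """Rough pre-flight check before collecting a result to the driver."""
    return num_rows * avg_row_bytes <= max_result_size_bytes

TEN_GB = 10 * 1024 ** 3
ok = fits_in_max_result_size(2_000_000, 1024, TEN_GB)        # ~2 GB: fine
too_big = fits_in_max_result_size(50_000_000, 1024, TEN_GB)  # ~49 GB: no
```

A check like this is cheaper than waiting for the driver to die mid-collect; when it fails, write the result out to storage instead of pulling it to the driver.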
Worker nodes manage resources on a single slave machine.

The difference between SparkSession vs. SparkContext vs. SQLContext lies in the Spark version used in the application.

It transmits control flow logic from the driver to the executors.

Master hang-up: standby restart is also invalid. The master defaults to 512 MB of memory; when there are particularly many tasks in the cluster, it hangs, because the master reads each task's event log to generate the Spark UI, so its memory naturally OOMs. You can see from the running logs that a master restarted through HA will fail for this reason.

Parquet vs. ORC vs. ORC with Snappy; Apache Spark effects of driver memory, executor memory, driver memory overhead and executor memory overhead on the success of job runs; can Apache Spark run without Hadoop?; how to connect to a Hive metastore programmatically in SparkSQL?

2 Insanely Clever Tips & Tricks for Running Spark on Hadoop, by Daniela, in What is Big Data: after some interesting industry banter about whether Hadoop or Spark would inevitably rule the universe of big data analytics, it's decided.

The 512 MB of memory used is what the executor on the node used out of the available 8 GB.
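The Thrift Server sizing rule above (25% of head-node RAM, but only when the head node has more than 14 GB) can be written directly. A sketch; the 4 GB fallback for smaller head nodes is an assumption for illustration, since the notes only state the rule for the large case:

```python
def thrift_driver_memory_gb(head_node_ram_gb, fallback_gb=4):
    """Driver memory for the Spark Thrift Server from head-node RAM size."""
    if head_node_ram_gb > 14:
        return head_node_ram_gb * 0.25
    return fallback_gb  # assumed value for head nodes at or under 14 GB

mem = thrift_driver_memory_gb(28)  # 28 GB head node -> 7 GB driver
```

Encoding the rule as a function keeps the threshold and percentage in one place if the platform defaults ever change.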
This piece of code does the real work of execution.

In our case we use Spark in standalone mode with 4 workers and 4 cores per worker, 16 GB driver memory, 16 GB executor memory + max 16 cores, and parallelism of 16.

Standard Spark property (prefixed with spark.).

When we submit a Spark job in cluster mode, the spark-submit utility interacts with the Resource Manager to start the Application Master.

Step 1: Creating the Spark session (Spark versions > 2.…).

Supports Spark natively; web interface to configure the number and type of instances, memory required, etc.

Spark runs out of memory when driver memory + executor memory > worker memory.

With that, we came up with the following Spark configuration: 20 Spark executors with 21 GB each (16 GB Spark memory and 5120 MB YARN container overhead) = 420 GB; 20 Spark executors with 8 vCores each = 160 vCores.

Take Action: the driver takes data from all executors.
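The 420 GB / 160 vCore totals above can be verified with one line of arithmetic per resource. A sketch using exactly the figures from the configuration just described:

```python
def total_memory_gb(num_executors, spark_mem_gb, overhead_mb):
    """Cluster memory footprint: executors x (heap + YARN container overhead)."""
    return num_executors * (spark_mem_gb + overhead_mb / 1024)

def total_vcores(num_executors, cores_per_executor):
    return num_executors * cores_per_executor

mem_gb = total_memory_gb(20, 16, 5120)  # 20 x (16 GB + 5 GB) = 420 GB
vcores = total_vcores(20, 8)            # 20 x 8 = 160 vCores
```

Checking these totals against the queue's capacity before submitting is exactly the sanity check the earlier "does not exceed the total resources of the cluster" advice calls for.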
Since you are launching Spark in local mode, setting spark.…

At the top of the execution hierarchy are jobs.

The diagramme above shows that the driver initiates executor creation.

Spark 2.0 environment set up with Python 3. Posted by Dong Meng on August 8, 2016.

In yarn-client mode, the driver runs in the client process.

Spark is a distributed system with a driver program, the main function.

Spark includes a fairly simple implementation of the necessary software ("Spark Standalone"), but it can also integrate with the YARN and Mesos cluster management systems.

I tested the code below with HDP 2.…

Modes of execution.

Every Spark application has a fixed heap size and a fixed number of cores for each Spark executor.

(for Spark versions > 2.0) or SQLContext. Step 2: Connecting to an Oracle database from Spark using JDBC.

Apache Hive: basically, Hive supports concurrent manipulation of data.
Basically, the driver is a ….

Executor memory can be set via the spark.executor.memory property or the --executor-memory command-line option when submitting the job using spark-submit.

TreeReduce over Reduce: both treeReduce and reduce return a result to the driver.

An ExecutorService that submits tasks to the cluster's server nodes for execution.

Reduce the executor memory to --executor-memory 1G or less; since you are running locally, remove --driver-memory from your configuration.

This is the presentation I made at JavaDay Kiev 2015 regarding the architecture of Apache Spark.

The default is 1g.

spark.driver.memory: specifies the driver's process memory heap (default 1 GB).
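The treeReduce-vs-reduce point above is about how partial results are combined: instead of the driver folding every partition result itself, a tree reduction combines them pairwise in rounds. A plain-Python sketch of that combining pattern (Spark's actual implementation distributes the intermediate rounds across executors; this only models the shape of the computation):

```python
def tree_reduce(values, combine):
    """Pairwise tree reduction: combine elements in rounds until one remains."""
    level = list(values)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(combine(level[i], level[i + 1]))
        if len(level) % 2:          # odd element carries over to the next round
            nxt.append(level[-1])
        level = nxt
    return level[0]

total = tree_reduce(range(1, 9), lambda a, b: a + b)  # 1+2+...+8
```

With n partition results, the driver-side fold touches all n at once, while the tree variant only ever merges two at a time over about log2(n) rounds, which is why it puts less pressure on the driver.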