
Apache Spark: Repartition vs Coalesce

Repartition can be used to either increase or decrease the number of partitions, whereas Coalesce can only decrease it. Coalesce is also the less expensive operation. It merges existing partitions and so avoids a full shuffle, which minimizes data movement between nodes. Repartition, by contrast, shuffles all of the data across the network.

Partitions

What are partitions?

The dataset in […]
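To make the cost difference between Repartition and Coalesce concrete, here is a minimal toy model in plain Python. It does not use Spark at all. The functions `toy_repartition` and `toy_coalesce` are hypothetical stand-ins, not Spark's API. The model also rests on simplifying assumptions: repartition deals every record round-robin through a full shuffle, so every record counts as moved; coalesce groups adjacent partitions and assumes the first partition in each group stays where it is, so only the records from the other partitions in that group move.

```python
from typing import List, Tuple

Partition = List[int]


def toy_repartition(parts: List[Partition], n: int) -> Tuple[List[Partition], int]:
    """Hypothetical full shuffle: deal every record round-robin into n partitions.

    Every record passes through the shuffle, so all of them count as moved.
    Works for n larger or smaller than len(parts).
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    out: List[Partition] = [[] for _ in range(n)]
    moved = 0
    for part in parts:
        for record in part:
            out[moved % n].append(record)
            moved += 1
    return out, moved


def toy_coalesce(parts: List[Partition], n: int) -> Tuple[List[Partition], int]:
    """Hypothetical coalesce: merge adjacent partitions into n groups, no shuffle.

    Assumption: the first partition in each group stays in place, and only the
    records of the other partitions in the group are moved onto it.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    if n >= len(parts):
        # Coalesce cannot increase the partition count, so nothing changes.
        return [list(p) for p in parts], 0
    out: List[Partition] = [[] for _ in range(n)]
    seeded = set()  # groups that already have a partition staying in place
    moved = 0
    for idx, part in enumerate(parts):
        target = idx * n // len(parts)  # contiguous grouping of parent partitions
        if target in seeded:
            moved += len(part)  # this partition joins one that lives elsewhere
        else:
            seeded.add(target)  # first partition of the group does not move
        out[target].extend(part)
    return out, moved


parts = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

shuffled, moved_r = toy_repartition(parts, 2)
print(shuffled)  # [[1, 3, 5, 7, 9, 11], [2, 4, 6, 8, 10, 12]]
print(moved_r)   # 12

merged, moved_c = toy_coalesce(parts, 2)
print(merged)    # [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
print(moved_c)   # 6
```

Going from four partitions to two, the toy repartition moves all 12 records while the toy coalesce moves only 6, because half of the partitions never leave their node. Asking `toy_coalesce` for more partitions than exist returns the input unchanged, whereas `toy_repartition` can grow the count. In real Spark code the corresponding calls are `DataFrame.repartition(n)` and `DataFrame.coalesce(n)`; how Spark actually places records and schedules tasks is more involved than this sketch.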
