If you access the files via Hive (instead of loading them into Spark directly), then perhaps you could store the files in Hive tables partitioned on the join columns? Just a thought, since you mentioned that you do a lot of joins.
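To make the idea concrete, here is a rough sketch of what such a table definition could look like. Everything specific in it is made up for illustration: the `orders` table, its columns, the choice of `order_date` as the join column, and the helper names `partitioned_table_ddl` and `create_table`. It also assumes you already have a `SparkSession` with Hive support enabled and that Parquet is an acceptable storage format for you.

```python
def partitioned_table_ddl(table, columns, partition_cols):
    """Build a HiveQL CREATE TABLE statement partitioned on the given columns.

    `columns` and `partition_cols` are lists of (name, hive_type) pairs.
    Hive expects partition columns to appear only in PARTITIONED BY,
    not in the regular column list.
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    parts = ", ".join(f"{name} {hive_type}" for name, hive_type in partition_cols)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        f"PARTITIONED BY ({parts}) STORED AS PARQUET"
    )


def create_table(spark, ddl):
    """Run the DDL through an existing Hive-enabled SparkSession (assumed)."""
    spark.sql(ddl)


# Hypothetical example: an `orders` table that is usually joined on order_date.
ORDERS_DDL = partitioned_table_ddl(
    "orders",
    columns=[("order_id", "BIGINT"), ("amount", "DOUBLE")],
    partition_cols=[("order_date", "STRING")],
)
print(ORDERS_DDL)
```

You would then call something like `create_table(spark, ORDERS_DDL)`, where `spark` comes from `SparkSession.builder.enableHiveSupport().getOrCreate()`, and load your files into the table. Queries that filter or join on `order_date` can then skip the partitions they don't need.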