Dear experts,

I am trying to work out what is needed to run Apache Spark [1] on an HTCondor cluster [2]. It seems that Spark can use an external cluster manager (YARN, Mesos) for scheduling, so, at least in theory, this should be possible.

Before I dive too deep into Spark, I wanted to ask whether anyone has tried this before.
There were talks about Spark at the last HTCondor Week [3], so there seems to be interest.

Our cluster runs Hadoop (HDFS + YARN), but with YARN disabled; we use HTCondor for scheduling instead (perhaps similar to some US sites?).

