I’ve been learning Scala as part of my continuing professional development. Scala is a functional language which runs primarily on the Java Virtual Machine. It is a first-class citizen for working with Apache Spark, an important platform for data science. I typically program in Python; my intention in learning Scala is to get myself thinking in a more functional programming style and to gain easy access to Java-based libraries and ecosystems.
In this post I describe how to get Scala installed and functioning on a workplace laptop, along with its dependency manager, sbt. The core issue here is that my laptop at work puts me behind a web proxy so that sbt does not Just Work™. I figure this is a common problem so I thought I’d write my experience down for the benefit of others, including my future self.
The test system in this case was a relatively recent (circa 2015) Windows 7 laptop. I like using bash as my shell on Windows rather than the Windows Command Prompt; I install this using the Git for Windows SDK.
Scala can be installed from the Scala website: https://www.scala-lang.org/download/. For our purposes we will use the Windows binaries, since the sbt build tool requires additional configuration to work. Scala needs the Java JDK version 1.8 to be installed, and the JAVA_HOME environment variable needs to point to the appropriate place. On my laptop this is:
JAVA_HOME=C:\Program Files (x86)\Java\jdk1.8.0_131
The Java version can be established using the command:
javac -version
My Scala version is 2.12.2, obtained using:
scala -version
sbt is the dependency manager and build tool for Scala. It is a separate install from:
http://www.scala-sbt.org/0.13/docs/Setup.html
It is possible the PATH environment variable will need to be updated manually to include the sbt executables (:/c/Program Files (x86)/sbt/bin).
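One way to make that PATH change stick is a few lines in a bash start-up file such as ~/.profile. This is a minimal sketch, assuming the install location mentioned above; SBT_BIN is just a variable name chosen for illustration, and the guard simply avoids adding the same directory twice.

```shell
# Assumed sbt install location (taken from the path mentioned above).
SBT_BIN="/c/Program Files (x86)/sbt/bin"

# Append the sbt directory to PATH only if it is not already present.
case ":$PATH:" in
  *":$SBT_BIN:"*) ;;                       # already on PATH, nothing to do
  *) export PATH="$PATH:$SBT_BIN" ;;
esac
```

After opening a new shell, running sbt with no path prefix should then find the executable, provided sbt really is installed at that location.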
I am a big fan of Visual Studio Code, so I installed the Scala helper for Visual Studio Code:
https://marketplace.visualstudio.com/items?itemName=dragos.scala-lsp
This requires a modification to the sbt config file which is described here:
http://ensime.org/build_tools/sbt/
Then we can write a trivial Scala program like:
object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("Hello, world!")
  }
}
Having saved this as first.scala, we can run it at the command line with:
scala first.scala
Using sbt in my workplace requires a proxy to be configured. The symptom of a failure to do this is that the sbt compile command fails to download the dependencies defined in a build.sbt file on first run, producing a line in the log like this:
[error] Server access Error: Connection reset url=https://repo1.maven.org/maven2/net/sourceforge/htmlcleaner/htmlcleaner/2.4/htmlcleaner-2.4.pom
In my case I established the appropriate proxy configuration from the Google Chrome browser:
chrome://net-internals/#proxy
This shows a link to the PAC (proxy auto-config) file, something like:
http://pac.madeupbit.com/proxy.pac?p=somecode
The PAC file can be inspected to identify the required proxy. In my case there is a statement towards the end of the PAC file which contains the host and port required for the proxy:
if (url.substring(0, 5) == 'http:' || url.substring(0, 6) == 'https:' || url.substring(0, 3) == 'ws:' || url.substring(0, 4) == 'wss:')
{
return 'PROXY longproxyhosturl.com:80';
}
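If you would rather not read through the PAC file by eye, the host and port can be pulled out with standard tools. The following is a rough sketch, assuming the PAC file has been saved locally (proxy.pac is a made-up filename) and contains a PROXY directive like the one above; pac_proxy is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: print "host port" for the first PROXY directive
# found in the PAC file given as the first argument.
pac_proxy() {
  grep -o "PROXY [^;'\"]*" "$1" |        # keep only the "PROXY host:port" fragments
    head -n 1 |                          # take the first one if there are several
    sed -E 's/^PROXY +//; s/ *: */ /'    # drop the keyword, split host and port
}

# Example use, assuming the PAC file was saved as proxy.pac:
#   pac_proxy proxy.pac
```

A PAC file can return different proxies for different URLs, so it is worth checking that the first directive is the one that applies to the repositories sbt needs. Either way, the result is the proxy host and port.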
The host and port are added to an SBT_OPTS environment variable, which can be set either in a bash-like .profile file or using the Windows environment variable setup.
export SBT_OPTS="-Dhttps.proxyHost=longproxyhosturl.com -Dhttps.proxyPort=80 -Dhttps.proxySet=true"
As a bonus, if you want to use Java’s Maven dependency management tool you can use the same proxy settings but put them in a MAVEN_OPTS environment variable.
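To keep the sbt and Maven settings in step, both variables can be built from the same values in a .profile file. This is only a sketch: the host and port are the placeholder values from the PAC example, PROXY_HOST, PROXY_PORT and PROXY_FLAGS are names invented here, and the http.* properties are an addition on the assumption that some repositories may be reached over plain HTTP.

```shell
# Placeholder values from the PAC file example; substitute your own.
PROXY_HOST="longproxyhosturl.com"
PROXY_PORT="80"

# JVM networking properties for plain HTTP traffic (an assumed extra).
PROXY_FLAGS="-Dhttp.proxyHost=$PROXY_HOST -Dhttp.proxyPort=$PROXY_PORT"
# The HTTPS settings used for sbt earlier in this post.
PROXY_FLAGS="$PROXY_FLAGS -Dhttps.proxyHost=$PROXY_HOST -Dhttps.proxyPort=$PROXY_PORT -Dhttps.proxySet=true"

# Both tools pass these options to the JVM they start.
export SBT_OPTS="$PROXY_FLAGS"
export MAVEN_OPTS="$PROXY_FLAGS"
```

Defining the host and port once means a change of proxy only needs editing in one place.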
Typically, to start a new project in Scala one uses the sbt new command with a pointer to a g8 template. In my workplace this does not work as normally stated because it uses the git protocol, which is blocked by default (it runs on port 9418). The normal new command in sbt looks like:
sbt new scala/scala-seed.g8
The workaround for this is to specify the g8 repo in full including the https prefix:
sbt new https://github.com/scala/scala-seed.g8
This should initialise a new project, creating a whole bunch of standard directories.
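If you start new projects often, the workaround can be wrapped in a small shell function so the short template name still works. This is a sketch only: g8_https_url is a hypothetical helper, and it assumes the template lives on GitHub under the short name given.

```shell
# Hypothetical helper: turn a short template name such as
# "scala/scala-seed.g8" into a full https GitHub URL.
# Arguments that are already http(s) URLs are passed through unchanged.
g8_https_url() {
  case "$1" in
    http://*|https://*) printf '%s\n' "$1" ;;
    *) printf 'https://github.com/%s\n' "$1" ;;
  esac
}

# Example use:
#   sbt new "$(g8_https_url scala/scala-seed.g8)"
```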
So far I’ve completed one small project in Scala. Having worked mainly in dynamically typed languages, it was nice that, once I had properly defined my types and got my program to compile, it ran without obvious error. I was a bit surprised to find no standard CSV reading / writing library as there is for Python. My Python has become a little more functional as a result of my Scala programming: I’m now a bit more likely to map a function over a list rather than loop over the list explicitly.
I’ve been developing intensively in Python over the last couple of years, and this seems to have helped me in configuring my Scala environment: getting to grips with modules and packaging, dependency managers and automated documentation building, and finding my test library (http://www.scalatest.org/) at an early stage.