Why I think Open Source software will eat the Industrial SCADA

This is the first of what I think will be 2 posts regarding open source software and Industrial SCADA systems.  In these posts I will address, at a very high level, the how and the why.

First, the how.

I have been involved in the industrial automation world going back to my days as a co-op sitting at a Provox console watching FST’s run on a funda filter cleaning up washwater for million dollar resin beds.  That was 1998.  My first year or so of permanent employment revolved more around process engineering but after I moved to our facility in Houston I was all in on the controls world.  Starting up a $350MM new facility on what was at the time the largest DeltaV installation in the world was an amazing opportunity that I only really appreciated years later.  Fast forward a few years and I left that operating company to join a wonderful integrator in North Carolina, Avid Solutions.  While at Avid I had the opportunity to work on systems as diverse as DeltaV, ABB Bailey Net 90, Rockwell, and Wonderware, just to name the top few.  While a majority of my time was spent with Wonderware software I think I have a pretty good basis to judge a wider range of offerings.

One thing you learn or accept very quickly is that setting up industrial automation software systems is just hard. If you have ever tried to set up a FactoryTalk environment you know exactly what I mean. Wonderware System Platform is better, but you need to get your SQL Server installed and then configure the web server if you want to use Information Server, don’t forget your special network accounts, and make sure you aren’t running a DC on any platform because….. My problem was that after a while I got so good at it, everything seemed simple and easy. At some point you convince yourself that this is really complex stuff because it has to be so reliable and runs such critical functions. If it was too easy, well, that meant it wasn’t really reliable and ready to use in “my” environment. Maybe the kiddos down the street can play with it but it’s not good enough for me. People from outside industrial automation certainly couldn’t understand what it takes to do what we do.

I had always known about famous open source projects like Linux and Apache. I even set up and played with a few Linux VMs. The problem was that it was always way too complicated to even get started. And oh goodness, running an Apache webserver. I tried one time and fell on my face hard. I liken it to a race car driver in an F1 car. You know the capabilities of the car are out of this world. But if you take an average person and put them in an F1 car, I suspect you will quickly form an opinion that the car is unwieldy, difficult to control, and doesn’t perform very well. However, put a professional driver behind the wheel and you quickly change your opinion. You realize that it takes serious expertise to harness the power and technology. I always thought that about open source software. It may be really powerful, but unless you are an expert it is simply not worth the effort.

But then something, or some things, changed in my mindset. If I had to pin it on one thing I think it might be the discovery of node.js in late 2013. For the uninitiated, node.js is basically a way you can run javascript, yes javascript, as a server side process to serve as the backend for an application. Most things will talk HTTP to it but you can just as easily have a conversation in TCP if you like. Here is a link to the main node.js site with an introduction. Then you see it. If I want to run a web server and serve up a page, the hello world is this.

var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(1337, '127.0.0.1');
console.log('Server running at http://127.0.0.1:1337/');

To run the server you simply type this at your command line

node example.js

That’s it. Go to your web browser, go to the URL, and you see a web page that simply says “Hello World”. That’s impossible. It can’t be that easy. Don’t I have to run a huge web server like IIS or some other complex project with some Java or some other resource hog? Nope. Install Node with click click next and roll. So the boo-birds will chime in and say “well that’s neat and all but that’s not for serious production use. It’s just scripting. How reliable can that be?” I’ll put it succinctly. The back end systems supporting Wal-Mart Mobile are all written in Node.JS. Yes, I said that correctly. Here is a nice article from Tech Crunch about the staggering scale of traffic Wal-Mart mobile handled this past year. Do you think your requirements can even sniff the roundoff errors in the kind of load and speed these guys are handling? If you want a deeper dive from 2013 here is a great NodeUp episode with the team from Wal-Mart labs. http://nodeup.com/fiftysix

Ok, so there is some open source technology out there that could support my needs in terms of a run-time environment, but the protocols and applications are just so complex. I thought the same. Until I kept sniffing around and discovered MQTT. What is MQTT, you ask? It is a standard protocol developed by IBM many years ago specifically for an oil pipeline project to convey sensor data over high latency, low bandwidth networks. There is plenty to read about it at MQTT.org, but basically think OPC designed for a much lighter weight implementation, and instead of having client-server you have client-broker with a publish-subscribe model. I won’t go into the details here, but if you do some brief reading you can very quickly understand how a data source (like a sensor or a PLC) can publish data to the broker, who then in turn sends that data out to all clients who have subscribed to it. The concepts are fairly basic and fairly universal across message buses in general. This is amazingly similar to how most if not all SCADA systems operate. The SCADA I/O server can be thought of as the broker, but it also acts as a client to many devices in the field. The only problem is that if you want to get data in or out of any SCADA system you have two main options: use their proprietary software tools for visualization, alarm and event handling, and historization, or use an OPC server/bridge to get data in and out. A third option, if your SCADA offers it, is to write a custom package using toolkits from the vendor to read and write the data. We can do this with the MXAccess toolkit for Wonderware System Platform. It’s fairly easy to use and works pretty well. But after all that work you can only access data from Wonderware System Platform.
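To make the publish-subscribe idea concrete, here is a minimal in-memory sketch in javascript. It is not the MQTT wire protocol and not how any real broker is implemented; the TinyBroker class and its method names are made up purely for illustration. It only shows the fan-out: one data source publishes to a topic and every subscriber to that topic gets a copy.

```javascript
// Minimal in-memory sketch of the publish/subscribe idea (not the MQTT wire protocol).
// TinyBroker and its method names are hypothetical, for illustration only.
class TinyBroker {
  constructor() {
    this.subscribers = new Map(); // topic -> Set of callbacks
  }

  // Register a callback for one exact topic string.
  subscribe(topic, callback) {
    if (!this.subscribers.has(topic)) {
      this.subscribers.set(topic, new Set());
    }
    this.subscribers.get(topic).add(callback);
  }

  // Deliver a message to every subscriber of the topic.
  // Returns how many subscribers were notified.
  publish(topic, message) {
    const callbacks = this.subscribers.get(topic);
    if (!callbacks) return 0;
    for (const cb of callbacks) cb(topic, message);
    return callbacks.size;
  }
}

const broker = new TinyBroker();
const hmiLog = [];
const historianLog = [];

// Two independent consumers subscribe to the same topic.
broker.subscribe('sensor/temp', (topic, msg) => hmiLog.push(`${topic}=${msg}`));
broker.subscribe('sensor/temp', (topic, msg) => historianLog.push(Number(msg)));

// A data source (think PLC or sensor) publishes once; the broker fans it out.
const delivered = broker.publish('sensor/temp', '72.5');
console.log(delivered, hmiLog, historianLog);
```

The publisher never needs to know who is listening, which is the same decoupling a SCADA I/O server gives you between field devices and the HMI or historian.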

So this MQTT thing sounds crazy complicated with all the brokers and publishers and subscribers.. well, hang on.

Step 1. Get yourself a broker. Think of this as the grand central station for your data. All data goes through the broker. The most popular one is mosquitto. It must be complex to run, right? Nope. Download from here and run a single command

mosquitto.exe

Good, now you have a broker running.

Step 2. Write a client that connects to the broker and sends/receives data. Because most of my life is in C# I’ll post the relevant snippets.

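// Assumes the M2Mqtt client library; these usings go at the top of the file
using System.Net;
using System.Text;
using uPLibrary.Networking.M2Mqtt;
using uPLibrary.Networking.M2Mqtt.Messages;

// Set up your client and connect to the broker
MqttClient client = new MqttClient(IPAddress.Parse("192.168.10.53"));
client.Connect("scada-demo-client"); // client ID is an arbitrary example value

// Send data to the broker so anyone else can see it (temp is a string here)
client.Publish("sensor/temp", Encoding.UTF8.GetBytes(temp));

// Set up a subscription to receive data with
// specific quality of service guarantees
string[] topic = { "sensor/temp", "sensor/humidity" };
byte[] qosLevels = { MqttMsgBase.QOS_LEVEL_AT_MOST_ONCE, MqttMsgBase.QOS_LEVEL_AT_MOST_ONCE };
// The return type of Subscribe may differ between library versions
byte[] grantedQos = client.Subscribe(topic, qosLevels);

// Finally, set up a handler for any data you receive from your subscriptions
client.MqttMsgPublishReceived += client_MqttMsgPublishReceived;

void client_MqttMsgPublishReceived(object sender, MqttMsgPublishEventArgs e)
{
    // access data bytes through e.Message
}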

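In literally less than 20 lines of code you have the foundation for a complete pub/sub model with selectable quality of service levels. The snippet above uses at-most-once delivery; MQTT’s higher QoS levels are what add delivery guarantees.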

OK, so we can move data around in an efficient manner, but what about visualization? I have one word..ok maybe acronym.. for you, HTML5. With a combination of a javascript client library for MQTT and HTML5 canvas there is essentially no limit to what can be accomplished for visualizations. Take a look at existing libraries like D3 and you get a pretty good idea that we aren’t talking about your mom and dad’s ugly dial gauges anymore. And D3 is only one library out of many that are available today and in use by many thousands around the world.
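As a rough sketch of how little code sits between a value arriving and something moving on a canvas, here is a hypothetical needle gauge. The function names and the 270 degree sweep are my own assumptions for illustration; the only browser pieces used are the standard Canvas 2D calls beginPath, moveTo, lineTo, and stroke.

```javascript
// Map a process value onto a gauge needle angle (radians, clockwise from 12 o'clock).
// The 270-degree sweep and all function names are illustrative assumptions.
const SWEEP_START = -0.75 * Math.PI;
const SWEEP_END = 0.75 * Math.PI;

function valueToAngle(value, min, max) {
  // Clamp so out-of-range readings pin the needle instead of wrapping around.
  const clamped = Math.min(Math.max(value, min), max);
  const fraction = (clamped - min) / (max - min);
  return SWEEP_START + fraction * (SWEEP_END - SWEEP_START);
}

// Draw the needle on any object exposing the Canvas 2D context methods used here.
function drawNeedle(ctx, cx, cy, length, angle) {
  ctx.beginPath();
  ctx.moveTo(cx, cy);
  // Canvas y grows downward, so "up" is cy minus the cosine component.
  ctx.lineTo(cx + length * Math.sin(angle), cy - length * Math.cos(angle));
  ctx.stroke();
}

// In a browser you would obtain ctx from a <canvas> element, e.g.
// const ctx = document.getElementById('gauge').getContext('2d');
// and call drawNeedle(ctx, 100, 100, 80, valueToAngle(72.5, 0, 100))
// each time a new value arrives from the MQTT subscription.
```

A library like D3 gives you far more than this, but the point stands: the drawing side is ordinary javascript that anyone can read.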

So I hear all of this and I’m still skeptical because any good SCADA needs a historian. And in industrial automation, well, we have lots of data and it’s moving really really fast. We all know off the shelf SQL Server databases are resource hogs, very expensive, and can’t really handle my situation. For the skeptics, I give to you.. InfluxDB. They call InfluxDB an application that is designed for time series data. In other words… a historian. It can handle non time-series data, like logs, alarms, and events, but their differentiation is the tooling around time series data. Do yourself a favor and spend 15 minutes looking over their site. Tell me, aside from the data ingest, what does it not do that your current historian is doing for you? 99% of what you do today is just graph some points and do some occasional min/max/average summaries. So that seems fine, but I’m sure it can’t scale to what I need it for. I’ve got like 50,000 tags and about 1/4 of them change every few seconds. While I can’t comment specifically about performance as I have not tested it directly, take a look at this write up from the middle of last year discussing their performance testing for underlying datastores. After you read that you start to get a small sense for what kind of scale they intend for this solution to operate at. Also, a post from the lead developer in late 2013 indicated that they were seeing somewhere around 20K to 70K points per second on a single node. And one of the more recent posts indicates they are looking at the 2M points PER SECOND type of performance for a cluster.
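To put the skeptic’s numbers next to the quoted ones, here is a quick back-of-the-envelope calculation. “Every few seconds” is vague, so the 2 second and 5 second update periods below are my assumptions, and the 20K figure is simply the low end of the single-node range mentioned above, not a benchmark I have run.

```javascript
// Back-of-the-envelope ingest estimate for the tag counts mentioned above.
// The 2 s and 5 s update periods are assumptions standing in for "every few seconds".
const totalTags = 50000;
const changingFraction = 0.25;
const changingTags = totalTags * changingFraction; // 12,500 tags

function pointsPerSecond(tagCount, updatePeriodSeconds) {
  return tagCount / updatePeriodSeconds;
}

const fastCase = pointsPerSecond(changingTags, 2); // every 2 seconds
const slowCase = pointsPerSecond(changingTags, 5); // every 5 seconds

// Low end of the single-node range quoted in the post (20K to 70K points/s).
const quotedSingleNodeLow = 20000;
const shareOfLowEnd = fastCase / quotedSingleNodeLow;

console.log({ changingTags, fastCase, slowCase, shareOfLowEnd });
```

Under those assumptions the “big” system needs a few thousand points per second, which is roughly a third of the low end of the quoted single-node figure.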

So what have I done here? I have just laid out the foundational elements for a 100% open source SCADA system using standards based communication methods. Yes, OPC is a standard. But I ask you to go off and write a client or a server. You will very quickly realize it is really complex, and at some point you will probably be ponying up some cash to the OPC Foundation for something. Doesn’t sound terribly open and inviting to me. A standard way to communicate, yes. Open, no.

So what’s missing, why hasn’t anyone done this yet? First, we are still missing the tools to make it configuration and not programming. While those who work on control systems (me included) like to call themselves programmers, we are really configurers. We configure scripts and code inside of a pre-built environment. We don’t worry about scheduling loops, multitasking, authentication mechanisms, etc. We point, click, and import csv files. No company or person has built the tooling necessary to make this underlying tech accessible to configurers. If you want a great case study on tools making all the difference, go read up on Docker. Basically think of a lightweight VM inside a VM. At its core all Docker has done is create a nicer interface to some underlying tech, namely LXC, that has been around for a while. I think right now we are at the pre-Docker stage. The technology is in place, we just need tools to configure, not program.
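As a tiny illustration of what “configure, not program” could look like on top of these pieces, here is a sketch that turns a csv tag list into a list of topics to subscribe to. The column names, tag names, and topic layout are all invented for the example, and the parsing is deliberately naive (no quoted fields), so treat it as a thought experiment rather than a tool.

```javascript
// Hypothetical tag list, the kind of csv a controls engineer might export.
// Column names and topic layout are invented for this example.
const tagCsv = `tag,topic,units,deadband
TT101,plant/area1/tt101,degC,0.5
PT204,plant/area1/pt204,bar,0.1`;

// Naive csv parsing: splits on commas and does not handle quoted fields.
function parseTagCsv(csv) {
  const [headerLine, ...rows] = csv.trim().split(/\r?\n/);
  const headers = headerLine.split(',').map((h) => h.trim());
  return rows
    .filter((row) => row.trim() !== '')
    .map((row) => {
      const cells = row.split(',').map((c) => c.trim());
      const record = {};
      headers.forEach((h, i) => { record[h] = cells[i]; });
      record.deadband = Number(record.deadband); // numeric column
      return record;
    });
}

const tags = parseTagCsv(tagCsv);

// The configurer edits the csv; the tooling derives the subscriptions.
const topics = tags.map((t) => t.topic);
console.log(topics);
```

The person maintaining the system only ever touches the csv, which is much closer to how we work today than hand-writing subscription code.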

So what has changed or is changing to make me think this could happen? One loaded acronym, IOT. I used an expression a while back that went something like this: “the barbarians are at the gate and they are bringing their fancy development tools and source control with them”. What I meant by this is that the rest of the world.. and it’s a really big world.. has all of a sudden gotten the bug for connecting their physical world to the digital world. Don’t worry, I won’t, and really can’t, wax poetic about what all this means in a big picture. But on a small scale the people who only cared about lines and pictures on a screen before are looking over at us and saying hey, I can read a temperature over TCP/IP and it’s really easy.. I can do something with that. I think I’ll make a Nest thermostat and sell it for $3B+ to Google. You want a good summary of how I feel right now with respect to all of this technology? Watch this video about people basically recreating a Nest thermostat for almost nothing. Someone is going to find an opportunity and seize it.

I feel incredibly strongly that we are going to see a massive influx of new interested parties gather around the walls of our little fortresses of solitude. We’ve told outsiders for years they are welcome to come on in, but they have to do things the way we’ve always done them. No wonder the pace of innovation in our industry sucks. Yes, we run our systems on Windows instead of VAX and the IDEs are a little better, but at the end of the day we really aren’t that far along as compared to the rest of the software world. Again, go read up on the histories of node.js and Docker. They literally did not exist a few years ago and now they are deployed on a massive scale across thousands of applications, many of them mission critical; again see Wal-Mart. Please don’t try to convince me that a technology has to be around and in use for 5-10 years before it’s “good enough”.

There will always be applications where new technologies are not welcome, and sometimes for good reason. Nuclear is the extreme example. I was chatting with a project manager for a very large multi-national automation company here in Taiwan about a year ago. At the time there was great debate about whether or not to build a 4th nuclear power plant. Basically what he was telling me was that if the plant was actually built, due to timelines and technology lock-ins they would probably start up on a 10 year old version of the DCS in the beginning. This is an extreme corner case. The vast majority of our applications don’t have the sort of redundancy and fail-safe requirements that prevent us from using even remotely modern tech.

So that’s the “how”. There are lots of holes in my argument. I accept that. Instead of showing you a finished product I am showing you just a few examples of foundational elements. What I have shown above is by no means the best or even a complete solution. Instead I want to inspire you to look at these pieces and go find others. The open source world is moving so fast that 6 months from now the best solution for a particular piece might be a completely different application that is not on anyone’s radar right now.

In my next post I will discuss the “why” of open source for Industrial SCADA.

12 thoughts on “Why I think Open Source software will eat the Industrial SCADA”

  1. This is one of the most brilliantly simple, elegant and honest expositions of our ripe-for-disruption space I’ve read in a long time. Really nice work. Come help us blow up Rockwell from the inside, in a good way!… We’ve assembled a band of merry men and women and the jolly roger is flying…

    If nothing else, love to have you help inspire a few old gorillas standing between the passionate (young and old) wanting to make a difference… I can’t promise a big speaking fee or anything but can promise a bunch of really cool nerdy engineers who love tech, love the art of making, and who want to change old ways

    Ryan

  2. Just curious if you took these ideas into a prototype stage or not? I am currently working on a PoC that uses most of the same building blocks but I keep hoping to find collaborators that are interested in building the core glue together so that we don’t all have to keep reinventing the wheel.

    1. Hey Bob
      Unfortunately I transitioned back to an integrator about a year ago and haven’t had much time to work on these things. I’m always up for engaging conversation and bouncing around ideas. If you check out my work at the aasopensource github you can see bits and pieces I’ve been working on but nothing as comprehensive as I would like.

      -andy

    2. Hey Bob,

      Is your PoC public (github?). I code on embedded, linux and hardened kernel, so
      I’d like to see if you are “on track”. A group of us had an opensource project to support multi-protocol scada some years ago, but folks left for a paycheck::

      https://sourceforge.net/projects/qscada/

      Reading about MQTT, it looks interesting, but it is not critical at all. In fact it will probably be a huge source of attack surface if not well designed. I have 1.4.10 built from sources, so I’ll go look at the code. Others do not understand that Modbus has morphed into Modbus TCP, so all the intelligence any server needs does not require a gateway broker. That said, I’m all for supporting dozens of open and proprietary protocols, encryption and embedded systems with hardened (linux) kernels.

      The big need is an open (non-proprietary) library or system for objects that can be automated to build the scada server. Perhaps we should move to a forum where open collaboration can work. I know lots of coders that are into SCADA; the problem is companies make their money off of proprietary systems, so you really need to find VC or other significant sources of revenue. Hell, I know a single coder who can be hired to develop the entire thing, if you find funding. He built SELinux for the NSA: Russell Coker, an Aussie and one hell of a smart and cool dude. If a company wants to open source this, then funding is really all that is needed.

      He could easily lead a team and have SELinux code and config to secure the entire thing via SELinux. SELinux is now available on some Android systems.

      That said, I’m more than willing to help the project stumble around until it gets legs.

      hth,
      James Horton, PE

  3. SCADA will change when a few strong coders actually put something out on github or another repo. It will expand rapidly, because all of the current SCADA systems are compromised. Pentesting of an open source SCADA solution is the only thing that will keep SCADA safe. The fees will be made by consulting and integration of stealth monitoring technologies employing out of band transmission. You’ve got to monitor the RF spectrum too, which will be expensive but not that difficult, if the NSA allows advanced systems for industry. The simple thing to do is just not connect the controls network to the internet or the corporate network (a no brainer solution that vendors just hate).
    TTFN
    James Horton, PE

  4. Like you, I’m a former WW, RA (insert brand) consultant who has been doing this for a while. I 100% agree with everything you’ve said and we actually found a platform that supports what you’ve just described natively. No chicken wire and duct tape needed. It’s the same software that is running the CERN super collider and was built on a publisher subscriber model over 20 years ago. Siemens bought the platform PVSS II back in 2006 and renamed it WinCC Open Architecture. It’s a lesser known platform that is catching fire in our region.

    Great article! I hope you find “OA” and quit beating your head against the keyboard:)

    Cheers,

    Shawn

    1. Funny you mention OA. Back in 2013 I had the privilege of getting to know the outgoing deputy for automation for the CMS experiment at the LHC. He is now the principal at Cleverdist (http://www.cleverdist.com/). We had a great time comparing and contrasting OA and System Platform. Both of them got a lot of things right in terms of building big open systems. My friend definitely has tried to lean on me to provide OA integration services stateside but it hasn’t lined up with our verticals or typical deliveries with my current employer. I saw a few things on your site that indicated they have overhauled the UI as well as adding support for C#. I definitely might take some time to check it out again.

      1. We’ve got the Cleverdist guys coming stateside in a few months to work with a large automotive manufacturer. Small world!

        Thanks again for the article, Andy.

        Shawn

  5. Fascinating story! Great write-up. Just stumbled upon this article while Googling “Node-JS for industrial automation”. I too have been working in industrial automation since 1999, starting with the Bailey Net90’s then migrating to A-B ControlLogix. I traveled globally for an integrator for 10 years installing/programming Step7’s, Wago’s, A-B’s, etc… and too found myself asking, does it really need to be this hard to build some screens and expose graphics to clients? The FactoryTalk SE software alone, in 2007, was $8k and my time brought the whole project to close to $20k, just for screens!
    I am now in consulting, still dealing with PLC’s but also building automation controls and DCS’s (e.g. Delta V). All three industries are very ripe for a complete takeover for all of the reasons you’ve mentioned in your article.
    I spent a couple of years playing with my Beaglebone Black (similar to RaspPi), trying to get it to simply monitor and store a temperature. I read books, spent countless hours working in Eclipse and native command line prompts to no avail. I gave up…it was too hard. I then found node-red and that changed everything. Within two weeks I was able to monitor, store, trend, text a number of temperatures, my furnace and water heater run times. I am also using MQTT but my end devices are simple/cheap ESP8266’s. For about $10 per temperature monitor, i’m up and running. It’s so easy to use and with Linux, it always works. That’s the amazing part, even in power glitches, network outages, it recovers, with no extra coding. I’m now thinking about testing it in an industrial environment, like you are suggesting. The problem is always time, never enough.

    I will continue to watch your progress with great enthusiasm.

  6. Great story Andy!
    I agree with your point of view here. I think the growing quality and coverage of open source software will surpass most proprietary expensive solutions, it already did in many cases.
    I use and develop open source software since 1999 for real time and historical SCADA data processing.
    I’ve built a cloud service for historical and real time SCADA data recording and visualization, almost entirely using open source software! Check it out: https://xplaincloud.com.
    Some tools I’ve used: InfluxDB, MySQL, Grafana, Kapacitor and Open Substation HMI (OSHMI).
    Thank you for sharing your insightful thoughts.
