Platforms which allow data scientists to build and deploy algorithms are increasingly important as businesses look to operationalise their data faster than ever before.
Gartner defines data science platforms simply as "engines for creating machine-learning solutions". For the sake of this article we have broadened Gartner's definition to include everything from data science workbenches, where teams can collaborate on code and deploy it themselves, to guided data science solutions.
It is important to remember that all data science platforms are relatively immature and none are a silver bullet. "Data science is not plug and play," Matt Jones, lead analytics strategist at Tessella Analytics, told Computerworld UK. "Platforms are fine, but they need to be trained by someone who understands the data and the context it exists in. If you’re outsourcing data science to a tech vendor, be absolutely sure they understand your business and your data."
With that in mind, here are some of the best and most popular data science platforms, from open source to established vendors, being used by enterprises today.
Our top picks are:
Microsoft Azure machine learning platform
Domino Data Lab
Cloudera Data Science Workbench
1. Microsoft Azure machine learning
Microsoft provides data scientists with a fully managed cloud service for building and deploying predictive analytics into live environments with its Azure Machine Learning platform. The platform comes with built-in packages to support custom code in your preferred language, be it Python or R, and a plethora of documentation to help data scientists get started.
The Azure platform allows data scientists to deploy models into production quickly as a web service and then share them on the Azure marketplace to gain exposure. Customers include Carnival Cruises, JLL and Fujitsu.
2. Domino Data Lab
Domino Data Lab's platform is another 'workbench' solution. It allows data science teams to model on their preferred data sources using whatever tools and programming languages they are comfortable with, and to collaborate and deploy models straight from Domino as APIs.
It then acts as a hub for all data science activity, elastically provisioning compute in the cloud and deploying in a consistent, secure manner so that IT can take a back seat. Data science teams at insurers Zurich and Allstate are both customers of Domino.
3. Cloudera Data Science Workbench
Analytics vendor Cloudera launched its "Data Science Workbench" in March 2017, following the acquisition of Sense.io a year earlier. The workbench is intended to be a platform where data science teams can work with their data using popular languages and frameworks such as R, Python and Spark in a secured-by-default, collaborative environment.
The idea is to let teams model and deploy machine learning and advanced analytics within the enterprise far faster than if they had to worry about anything other than the actual data science.
4. SAS Viya
Analytics and BI vendor SAS provides data science and machine learning capabilities through its Viya platform.
This is an example of an analytics vendor providing customers with a platform where they can take their advanced analytics work out of self-contained clusters and into an environment where models can be deployed in a secure, consistent way.
"We try to enable people to use what they want to use, but not reinvent the wheel every time," Peter Pugh-Jones, head of technology at SAS UK and Ireland, told Computerworld UK.
5. Dataiku
The French startup Dataiku provides a host of guided data science and machine learning processes on its platform DSS. The platform has a level of abstraction so that anyone using it can either code in Python, Pig, R, Hive etc. or use drag-and-drop functionality to wrangle and model data.
The platform allows teams of data scientists, data analysts and engineers to prototype, build and deliver data solutions into the business from a single place. Previous customers include L'Oreal, Trainline and AXA insurance.
In its more recent releases Dataiku has added point-and-click capabilities (called 'visual recipes') for data preparation, the ability to monitor model performance during training, and support for Python 3 with a new code editor.
6. IBM Data Science Experience
IBM offers a range of data science tools and is preparing to release an IBM Watson-guided machine learning platform.
The current iteration comes with built-in learning, so that data scientists can improve the more they engage with the platform, as well as collaboration features and notebook tools for working with popular programming languages, such as Jupyter Notebooks for Python and RStudio for R. The enterprise version of the platform retails at $9,200 per instance per month and provides managed Spark clusters and flexible storage.
7. RapidMiner
Open source data science platform RapidMiner helps the likes of BMW, Samsung, Domino's and Barclays launch data science projects.
Tools on the RapidMiner platform include Studio for visual data science workflows, Server for operationalising models, and Radoop for workflows using Hadoop data.
For larger customers or projects there are enterprise versions of the platform, which range from $2,500 to $10,000 a year depending on the number of rows of data.
8. Knime Analytics Platform
The free, open source Knime Analytics Platform looks to give data scientists a blank canvas to work on projects using various data sources and the tools they are comfortable with in a scalable environment.
The open platform comes with thousands of native nodes and modules, extensive documentation and pre-packaged advanced algorithms to get started quickly. Data scientists can toggle quickly between single-computer, streaming or big data execution on top of or alongside existing infrastructure, and the platform makes sure that everything is backwards compatible and easily portable for flexibility.
9. Splunk Machine Learning Toolkit
Big data specialist Splunk has moved into more integrated machine learning within its platform over the past year or so, but the vendor also provides a Machine Learning Toolkit for custom models.
The advantage of using Splunk over other workbench solutions is that you can model straight on top of machine-generated data - Splunk's area of expertise - so security and IoT use cases are a natural fit.
The Toolkit is a guided workbench for data scientists to model and deploy algorithms in the most popular programming languages. There is also a library of pre-built Python algorithms for popular use cases, and plenty of documentation and tutorials to get started straight away.