With the rise of big data comes the need for more highly skilled people to mine and interpret that data for businesses. This is the role of a data scientist, the job that Harvard Business Review called "the sexiest job of the 21st century" back in 2012.
With more and more tech companies looking to make sense of their customer data, and with salaries topping out at £100,000, you can see why graduates with quantitative degrees (think mathematics, computer science, astrophysics) are becoming data scientists.
What is a data scientist?
A data scientist's role is to derive actionable insights from huge data sets. This differs from the role of a data engineer, whose primary job is to store and prepare that data: someone with expertise in setting up and maintaining large databases. The skills required of a data engineer tend to be more technical, such as knowledge of Hadoop, SQL and NoSQL databases.
"Data engineers build massive reservoirs for big data," says Sophie Adelman, head of sales EMEA for Hired.com. "They develop, construct, test and maintain architectures such as databases and large-scale data processing systems. Once continuous pipelines are installed to, and from, these huge 'pools' of filtered information, data scientists can pull relevant data sets for their analyses."
Data scientist: Qualifications and skills
Adelman says a strong undergraduate degree in 'quantitative' subjects such as mathematics, economics, finance or statistics is key. "You do see a lot of PhD and Masters [graduates] coming out, as data science is a good way to apply what they have learned," she says.
In terms of hard skills, a data scientist will be expected to know how to interact with, and query, a database, so familiarity with data analysis and modelling tools such as Apache Hive and Pig, and with programming languages such as Python and R, is useful.
However, Adelman from Hired says it is the soft skills that many candidates fall down on.
"Technical skills are important but people tend to overemphasise that. Commercial experience and soft skills are equally important," she says.
"If people don't have the ability to understand what the problem they are trying to solve is and communicate it in a way that others can understand, it is difficult to be a great data scientist."
Her advice? Go in search of more applied experience if you are a student or recent graduate. "You could use Kaggle to get more real world experience and learn new techniques, or work as a financial analyst in banking, or take Coursera courses, or take on research projects."
Nuno Castro, director of data science at Expedia, also sees business awareness as an important skill. "People who can understand the organisation from a commercial perspective, and who create relevant relationships in the organisation, will be more successful," he says.
Castro also puts a premium on problem-solving skills. "You and your team will typically be set a high-level objective for which you need to determine the best course of action. Unlike other areas, there is no one recipe for data science. Often you are not trying to find the right answer to a question, you're trying to find the right questions to ask in the first place."
If you speak to enough data scientists, one thing you will hear is how much time they spend cleaning up data rather than analysing it. Sandra Greiss, a data scientist at online retailer Asos, says that although this takes up 80 percent of her time, and despite the availability of data cleansing tools such as Trifacta, OpenRefine and DataWrangler, she would only ever want to do it by hand.
"It is frustrating but I think it is also a relief when you are done with it as you will be using something which is correct," she says. "I don't think you would want to rely on a tool. You have to see it yourself."
One skill that is of growing importance in data science circles, and within the enterprise, is machine learning.
"Machine learning is a no-brainer to me. That is the true heart of data science," says Mike Ferguson, an analyst at Intelligent Business Strategies.
"People want pattern detection and a view into the future, so the traditional career in reporting is no longer enough, which is a key reason machine learning is critical. The days of taking data out of a database and doing the analysis somewhere else are done; the data is too big."
Asos' Greiss has seen it grow in importance within the industry: "I think it was already important, and I was asked about it at interview, but I don't have machine learning skills. I think machine learning is something you can pick up pretty quickly if you want to. Now it is something they will probably ask for because it has expanded so much and there is so much free online material available to give you some idea about how to do machine learning."
Castro at Expedia says that a great data scientist must be "persistent, highly energetic and motivated". His advice for any prospective candidates is to: "Follow lots of other data scientists on social media (Twitter), read blogs, learn a new data science technology, practise on Kaggle or possibly enrol in an intensive data science course.
"After you’ve done that, try to get a data science internship. Make sure that you’ll be working on a cool end-to-end data science project or a deep dive on a specific piece with a measurable output, e.g. a new algorithm that you can A/B test, rather than just doing what everyone else doesn’t want to do, e.g. unit tests, though you will still learn."
From a technical perspective, Castro says it is better to learn using open source technologies and skills, like Java, Hadoop and SparkML. This way "when the next technology buzzword arrives you will be ready. If you spend your days working with a proprietary technology with its own programming language and workflow, how transferable is that and how will that add value to your CV?"
Gary Damiano, vice president of marketing at NoSQL database specialist Couchbase, said: "The two things you need to consider in hiring a data scientist are: how are you going to use them and how does their skill set match the use?"
For data scientists Damiano wants to see "a programmer skill set backed by deep understanding of statistical regression metrics."
For data engineering, or "wrangling" as he calls it, candidates need to be "a database or spreadsheet ninja who can identify and build complicated data relationships and break them down into data segments that lend themselves to presentable views of relevant insights."
Data scientist perspective: Sandra Greiss
Asos data scientist Greiss's LinkedIn profile summary is a single line: "I am not looking for a career move. Thanks!"
This is because she is approached by recruiters on the platform on a daily basis, often with badly prepared pitches. "It is pretty frustrating, it's a shame that I feel like they don't all make the effort to know the candidate well," she says.
When she graduated from Warwick University with a PhD in astrophysics, Greiss turned down a job in the finance sector paying triple the entry salary she went on to accept at ecommerce company Lyst, because "I wasn't interested in the topic or the project so I wouldn't give it my best."
Even working at a startup like Lyst brought challenges, though. "Finding a job wasn't as hard as I thought it would be but adapting to the industry was challenging. The terms people use in companies and even the coding style were different; in academia they want performance and speed but aren't as particular."
If you can master the skills of data science, though, the rewards are plentiful. Hired's 2016 Mind The Gap report showed that data scientists are being paid increasingly well as employers look to attract them away from traditional roles in financial services. In the 18 months leading up to that report, salary offers for data scientists had risen by 29 percent. The only role that had increased more was security engineer, at 31 percent.
More recently Hired's State of Global Tech Salaries Report 2017 pegged the average UK data scientist salary at £56,000 for 2016, which is up 5.2 percent year-on-year.