Analyst house IDC has published its "digital universe" predictions for the coming years. The digital universe is a measure of all the digital data created, replicated and consumed, and IDC says companies will have to brace themselves to manage most of it on far smaller budgets per gigabyte.
The report, which was commissioned by EMC, claims that from now until 2020 the digital universe will "about double" every two years. By 2020, said the analyst, the digital universe will have grown to 40,000 exabytes, or 40 trillion gigabytes - more than 5,200 gigabytes for every man, woman and child alive that year.
But the overall investment in managing, containing, studying and storing the bits of data in the digital universe will only grow by 40 percent between 2012 and 2020. As a result the investment per gigabyte during the same period will drop from $2.00 (£1.24) to $0.20 (£0.12).
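The per-gigabyte figures can be roughly checked from the other numbers in the report. The sketch below is only a back-of-the-envelope illustration, not IDC's method: it assumes the 2012 size of the digital universe can be derived from IDC's statement that 643 exabytes was 23 percent of the total that year, and it uses decimal units (1 exabyte = 1 billion gigabytes). The variable names are purely illustrative.

```python
# Figures quoted in the article.
SIZE_2020_EB = 40_000        # projected digital universe in 2020, exabytes
USEFUL_2012_EB = 643         # data useful for big data in 2012, exabytes
USEFUL_2012_SHARE = 0.23     # that amount as a share of the 2012 total
INVESTMENT_GROWTH = 0.40     # growth in total investment, 2012 to 2020
COST_PER_GB_2012 = 2.00      # investment per gigabyte in 2012, US dollars

GB_PER_EB = 1_000_000_000    # decimal units: 1 EB = 10^9 GB

# Derived values (assumptions, not figures quoted by IDC).
size_2012_eb = USEFUL_2012_EB / USEFUL_2012_SHARE   # implied 2012 total
data_growth = SIZE_2020_EB / size_2012_eb           # growth in data volume
size_2020_gb = SIZE_2020_EB * GB_PER_EB

# Total spend grows by 40 percent while the data grows far faster,
# so the spend available per gigabyte shrinks.
cost_per_gb_2020 = COST_PER_GB_2012 * (1 + INVESTMENT_GROWTH) / data_growth

print(f"Implied 2012 size: {size_2012_eb:,.0f} EB")
print(f"Data growth 2012-2020: {data_growth:.1f}x")
print(f"2020 size: {size_2020_gb:,} GB")
print(f"Implied investment per GB in 2020: ${cost_per_gb_2020:.2f}")
```

Under these assumptions the data volume grows roughly fourteen-fold while spending grows 1.4-fold, which is consistent with the drop to about $0.20 per gigabyte.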
IDC said that although the majority of the information in the digital universe - 68 percent in 2012 - is created and consumed by consumers, enterprises have liability or responsibility for nearly 80 percent of it, including copyright, privacy and compliance issues.
IDC estimates that by 2020 as much as 33 percent of the digital universe will contain information that might be valuable if analysed.
However, the vast majority of new data being generated is unstructured. "This means that more often than not, we know little about the data, unless it is somehow characterised or tagged, a practice that results in metadata," said IDC.
The analyst said metadata is one of the fastest-growing sub-segments of the digital universe, although it still accounts for only a small part of the whole.
"We believe that by 2020, a third of the data in the digital universe (more than 13,000 exabytes) will have big data value, but only if it is tagged and analysed," said IDC.
IDC said CIOs still had a long way to go, however, to get their big data houses in order. In 2012, it said, around 23 percent of the information in the digital universe (or 643 exabytes) would be useful for big data - if it were tagged and analysed.
However, it said, only 3 percent of the potentially useful data is currently tagged, and even less is analysed, a shortfall IDC describes as the "big data gap".
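To put the gap in absolute terms, the percentages can be applied to the exabyte figures IDC quotes. This is a rough illustration rather than anything from the report: it treats "a third" as exactly one third, the variable names are hypothetical, and it does not attempt to estimate the analysed share, which IDC does not quantify.

```python
# Figures quoted in the article.
USEFUL_2012_EB = 643         # potentially useful data in 2012, exabytes
TAGGED_SHARE = 0.03          # share of that useful data currently tagged
SIZE_2020_EB = 40_000        # projected digital universe in 2020, exabytes
USEFUL_2020_SHARE = 1 / 3    # "a third" of 2020 data could have big data value

# Derived values (illustrative only).
tagged_2012_eb = USEFUL_2012_EB * TAGGED_SHARE
untagged_2012_eb = USEFUL_2012_EB - tagged_2012_eb
useful_2020_eb = SIZE_2020_EB * USEFUL_2020_SHARE

print(f"Useful and tagged in 2012:   about {tagged_2012_eb:.0f} EB")
print(f"Useful but untagged in 2012: about {untagged_2012_eb:.0f} EB")
print(f"Potentially useful in 2020:  about {useful_2020_eb:,.0f} EB")
```

On these figures, only about 19 of the 643 potentially useful exabytes in 2012 are tagged, leaving roughly 624 exabytes untagged.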