What Is Dark Data and Why Does It Matter? Data Science Topic

What is dark data?

Dark data is a term experts use for the information that organisations collect, process and store during regular business activities such as advertising, marketing or lead generation, but generally fail to use for other purposes. As the name suggests, dark data takes up huge amounts of space in data centres yet is virtually invisible: it goes unused and stays in the dark. This doesn’t mean we can ignore it. It’s worth taking a moment to think about the nature of dark data, its impact and what we might be able to do to improve things.

Personal footprint

Dark data can contain private information about the customers who interact with a business, but it is easiest to grasp and deal with at a personal level. For most of us it consists of unused photos and videos. In the old days, film was precious and development expensive, but now we can take 20 shots to get the one we want, and we can edit easily, creating more backup files in the process. In 2020, Google said it stored 4 trillion photos, with 28 billion new photos and videos uploaded each week. Google Photos is just one photo service, and those upload rates have no doubt grown in the last few years.

This personal dark data also creates a privacy issue. However secure a cloud service is, there is always the possibility that data such as ID photos, personal chat screengrabs and private files can be used by cybercriminals. The answer? Think before you shoot, tidy up caches and archives regularly, and be particularly careful not to leave sensitive files lying around.
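As a starting point for that tidy-up, here is a minimal Python sketch that lists byte-identical copies of files in a folder. It is only an illustration: the function names `file_digest` and `find_duplicates` are made up for this example, it finds exact duplicates only (not near-identical shots of the same scene), and it reports files without deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-identical."""
    # Cheap first pass: only files of equal size can be duplicates.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        # Second pass: hash only the candidates that share a size.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling `find_duplicates(Path.home() / "Pictures")` (the folder name is just an example) returns a list of groups, each holding the paths of files with the same content, so you can decide which copies to keep.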

Hidden losses

For businesses, dark data is a bigger management challenge because it exists on a larger scale and affects the bottom line. It consists of things like near-identical images, filled-in forms, attached documents, IoT data sets, log files and applications. This data takes up a huge amount of server space, and powering these servers takes energy and equipment, which not only costs money but can also mean significant emissions if low-carbon or renewable power is not being used. Dark data is also unstructured and unexplored, which brings with it privacy and compliance risks.

If you think about it, hardly any operating organisation is unaffected by the dark data issue. Globally, estimated levels of commercial dark data vary by sector from 40 to 90%, so it’s extremely likely that the majority of your company’s data is dark. According to the World Economic Forum, companies generate 1.3 trillion gigabytes of dark data every day. Storing that data for a year using non-renewables generates as much CO2 as three million flights from London to New York. So, if we’re interested in decarbonising (and we should be), we should tackle this issue.

Technology lag

For many businesses the level of dark data is a reflection of a lack of data structuring processes. The ability of an organisation to collect data can exceed the throughput at which it can analyse the data. In some cases a business may not even be aware the data is being collected.

Companies retain dark data for a multitude of reasons. Often it is stored for regulatory compliance and record keeping, but equally often the complexity of compliance, privacy and data discovery is the reason that these data lakes are allowed to build up. Some organisations believe that dark data could be useful to them in the future once they have acquired better analytic and business intelligence technology to process the information.

New tools and standards

There is good news here. The scale of the task may appear daunting for CIOs and CDOs, but artificial intelligence (AI) and machine learning (ML) have now advanced to the point that they can help automate the data structuring process. Only a tiny percentage of dark data needs to be reviewed at the outset by humans to kickstart the process. This can then be followed up with a reinforcement learning model to assess the relevance of remaining data and prioritise it. From then on, a virtuous cycle of tagging and analysis makes the process easier to manage.
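To make the idea concrete, here is a rough Python sketch of the "small human-labelled sample first, automated prioritisation after" workflow. Note the substitution: this is not a reinforcement learning model, just a naive word-count scorer standing in for one so the example stays short and dependency-free. The function names `tokens`, `train` and `prioritise` and the sample texts are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(seed: list[tuple[str, bool]]) -> dict[str, float]:
    """Learn a per-word log-odds weight from a small human-labelled seed."""
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in seed:
        (relevant if is_relevant else irrelevant).update(tokens(text))
    vocabulary = set(relevant) | set(irrelevant)
    # Add-one smoothing keeps unseen words from producing log(0).
    total_rel = sum(relevant.values()) + len(vocabulary)
    total_irr = sum(irrelevant.values()) + len(vocabulary)
    return {
        word: math.log((relevant[word] + 1) / total_rel)
        - math.log((irrelevant[word] + 1) / total_irr)
        for word in vocabulary
    }


def prioritise(weights: dict[str, float], documents: list[str]) -> list[str]:
    """Return documents sorted from most to least likely relevant."""
    def score(text: str) -> float:
        # Words never seen in the seed contribute nothing either way.
        return sum(weights.get(word, 0.0) for word in tokens(text))
    return sorted(documents, key=score, reverse=True)


# Hypothetical seed: the tiny sample a human reviewed and tagged.
seed = [
    ("customer invoice payment", True),
    ("customer contract renewal", True),
    ("debug heartbeat ping", False),
    ("debug cache ping", False),
]
weights = train(seed)
ranked = prioritise(weights, ["heartbeat ping debug", "invoice for customer renewal"])
```

The virtuous cycle comes from feeding each newly reviewed item back into `seed` and calling `train` again, so the ranking improves as more data is tagged.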

Measurement would also help to benchmark progress. Considering the scale of the problem, there may be a case for setting standards for effective data use such as a Data Usage Effectiveness (DUE) metric to sit alongside CUE (carbon), WUE (water) and PUE (power). This, or some similar metric, would be well worth working towards, and could also have value as a digital performance indicator. However, it may be too early to measure, while so much dark data remains invisible.
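No standard formula for DUE exists yet, so the Python sketch below simply shows one way such a metric might be computed: the share of stored bytes that were accessed within a chosen window. The definition itself, the `StoredObject` record, the `data_usage_effectiveness` function and the one-year default window are all assumptions for illustration, not an agreed standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class StoredObject:
    """One stored item, described by its size and last access time."""
    size_bytes: int
    last_accessed: datetime


def data_usage_effectiveness(
    objects: list[StoredObject],
    now: datetime,
    window: timedelta = timedelta(days=365),
) -> float:
    """Hypothetical DUE: fraction of stored bytes accessed within the window."""
    total = sum(obj.size_bytes for obj in objects)
    if total == 0:
        return 0.0  # nothing stored, so nothing is in use
    used = sum(
        obj.size_bytes for obj in objects if now - obj.last_accessed <= window
    )
    return used / total
```

Under this definition, a store holding 100 bytes touched last month and 300 bytes untouched for years would score 0.25, meaning three quarters of it is dark. A byte-weighted ratio is only one option; counting objects, or weighting by storage cost, would be equally defensible choices.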

Conclusion

Whatever dark data means to you or your business, it is an ‘elephant in the room’ for data centres, and the more we talk about it the likelier we are to come up with incremental improvements. For individual data users there are things we can do to reduce single-use data. For organisations, it’s a bit more complicated, but approaches and tools are emerging. These should be discussed and shared.

As with energy efficiency, identifying and eliminating waste at source is the most obvious opportunity. According to IBM, 60% of data loses its value within milliseconds of being acquired, and any scheme to use data more effectively must first address the issue of collecting useless data. A robust approach to data gathering is the key here: assessing how data can be used, or whether it is usable at all.

The next step is structuring the data we keep. Structured data is not only more valuable, but easier to track and, if necessary, delete. By making data more visible, it should be possible to reduce the environmental and financial burden of storage at the same time as using our valuable data to empower our organisations and serve our customers better.
