What’s the deal with ‘big data’?

There’s been a lot of talk recently about “big data” and what it means for our society. Businesses often use data mining to improve the bottom line, but that’s just one use of big data. Insights gained through data mining can also help organizations and people understand more about each other.

Definitions

Data mining, also called data discovery or knowledge discovery, is the process of analyzing a set of data from a variety of perspectives and with various methods to boil it down into information that is useful for a certain purpose to an individual or organization.

To understand data mining, it is first important to understand data: facts, text, and numbers, or basically anything that a computer can process. Operational data, or transactional data, in a business context means things like sales, costs, payroll, inventory, and accounting and financial data. Nonoperational data still matters to the business but does not involve transactions; examples include industry sales trends, forecast data, and macroeconomic data. Finally, metadata is data about the data itself, such as database designs and data dictionary definitions.

Data by itself can’t tell the whole story. It is the associations and relationships between available data, and the patterns that can be derived from those relationships, that provide useful information. Once useful information is gathered, it can be converted into knowledge about what has happened in the past and how those events might predict what happens in the future. Data is often stored in central repositories known as data warehouses, which allow for more efficient retrieval and analysis (Palace, 1996).

Data mining commonly looks for a few types of relationships in data. Classes place data in predetermined groups, for example sorting customers by the purchases they make. Clusters group data items by logical connections or preferences, for example to identify market segments. Associations identify how items relate to one another. Sequential patterns anticipate behavior and future trends, for example predicting what customers might purchase next (Palace, 1996).
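To make the idea of associations more concrete, here is a minimal sketch in Python. It is not taken from Palace or from any particular data mining product. The shopping baskets are made-up toy data, and the function names `pair_support` and `confidence` are hypothetical. The sketch measures how often two items show up together (support) and how often one item appears when another is already present (confidence), two simple measures commonly used when looking for associations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set is one customer's shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each unordered item pair."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a stable (alphabetical) key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` appears in baskets containing `antecedent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

support = pair_support(transactions)
bread_to_milk = confidence(transactions, "bread", "milk")
```

In this toy data, bread and milk appear together in half of the baskets, and two of the three baskets that contain bread also contain milk. Real data mining tools work on far larger data sets and use more efficient algorithms, but the underlying question is the same.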

Data mining has a few key properties, according to Oracle, a maker of servers and software that help businesses carry out the process. Oracle defines those properties as the automatic discovery of patterns, the prediction of likely outcomes, the creation of actionable information, and a focus on large data sets and databases. In particular, data mining answers questions that simpler queries and reporting techniques cannot (Oracle, 2014).

Applications

While there are many applications for data mining, the most common is in companies that want to convert their customer data into more useful information. As a result, data mining is often applied in retail, marketing, communication, and financial organizations.

Companies like these can combine their internal data with external data, such as demographics, information about competitors, and indicators in the overall economy, to learn more about their customers and figure out what those customers are likely to do in the future. They can gain insights that simply running queries on their internal data would not be able to provide (Palace, 1996).
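As a rough illustration of combining internal and external data, the Python sketch below joins a company's own sales records with outside demographic figures by region. Everything in it is an assumption made for the example: the records, the `median_income` field, and the function name `enrich_by_region` are all hypothetical, and a real project would more likely use a database or a data analysis library.

```python
from collections import defaultdict

# Hypothetical internal data: (customer_id, region, purchase_amount).
internal_sales = [
    ("c1", "north", 120.0),
    ("c2", "north", 80.0),
    ("c3", "south", 40.0),
]

# Hypothetical external data: regional demographics from an outside source.
external_demographics = {
    "north": {"median_income": 72000},
    "south": {"median_income": 48000},
}

def enrich_by_region(sales, demographics):
    """Total sales per region, joined with that region's external attributes."""
    totals = defaultdict(float)
    for _customer, region, amount in sales:
        totals[region] += amount
    # Regions missing from the external source are skipped in this sketch.
    return {
        region: {"total_sales": total, **demographics[region]}
        for region, total in totals.items()
        if region in demographics
    }

combined = enrich_by_region(internal_sales, external_demographics)
```

With the two sources side by side, an analyst can start asking questions that neither source answers alone, such as whether spending tracks regional income.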

Current Development

When data mining first took shape as a concept, its use cases were limited. Now organizations of all kinds employ data mining to understand more about the populations they serve; it isn’t just about for-profit corporations increasing the bottom line. Non-profit organizations can use data mining to better serve their communities. Educational institutions like universities can use data mining to better understand their student populations.

In recent years, data mining has moved into contexts that were previously not considered or not possible. Sometimes data mining can be used simply to start conversations. For example, Facebook recently aggregated and analyzed data from hundreds of millions of its users to find out how people of different demographics and in different areas feel about a range of political topics and issues ahead of this year’s midterm elections.

The company depersonalized the data and published it in aggregate form to share with its users and the public, creating an interesting set of information for the general public to peruse while also strengthening its value proposition to shareholders (Gold, 2014).

Future Trends

Moving into the future, we will see data mining applied to even more contexts that were not possible before. InformationWeek referred to the recent Ebola outbreak in West Africa as a “test for data mining and analytics”, noting that the U.S. Centers for Disease Control and Prevention and the World Health Organization have typically relied on conventional estimates to track the spread of diseases, but those traditional measures fell short this time.

The article mentions a service created by Boston Children’s Hospital called HealthMap that “provides early detection and real-time surveillance on emerging health threats by aggregating and analyzing information from multiple sources”, such as official government data, news reports, social media posts, and travel sites. The article notes that “big data doesn’t replace traditional data sources or surveillance networks in watching for outbreaks” but instead “helps make them better”, illustrating how data mining will be applied to ever more important and critical contexts in years to come (Vijayan, 2014).

Summary

We have seen over the past couple of decades how data mining has evolved from mainly a consumer information tool to something of global importance. The technology behind data mining is just one piece of the puzzle. The societal, political, cultural, and personal privacy factors that go into data mining are becoming more important as time goes on and as more of the public becomes aware of data mining and how it affects their lives. Data mining is now an invaluable process across all major industries and will only become more prevalent as our world becomes even more interconnected.

References
