Big Data with AIIM and Zaizi - An interview with AIIM’s Director of Market Intelligence
Recently we talked about the state of ECM in our exclusive interview with John Mancini, the President of AIIM. Today we have another exclusive interview, this time with Doug Miles, AIIM’s Director of Market Intelligence, who enlightened us on the challenges and future of ‘Big Data’ for organizations.
Before I sat down with Doug, it occurred to me to ask: do I know what ‘Big Data’ really is? After listening to his presentation, yes, I am a little more enlightened. I have read papers and articles on the challenges of Big Data and how people think we should overcome the issues that come with it. But it still seemed to me that Big Data can mean different things to different people. A quick look at Wikipedia shows that it defines Big Data as:
Big Data is a loosely-defined term used to describe data sets so large and complex that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analysis, and visualization.
Oh, so that’s not so bad, I tell myself. I always like to focus on what makes something a problem. In this case, Big Data is the problem that occurs when an organization decides to digitize everything. According to this definition, Big Data might include the difficulty of capturing information in digital format, like a torn receipt or an odd-sized piece of paper that is either too large or double-layered to fit into the scanning machine. As our world becomes more and more digitized, capturing this type of information can later lead to difficulties in storing it. Information that was once only physically available, taking up floor or warehouse space, is now cluttering up your digital storage: disorganized and unstructured.
AIIM – THE REALITY CHECK FOR BIG DATA
AIIM has been an advocate and supporter of information professionals for nearly 70 years. Founded in 1943, AIIM builds on a strong heritage of research and has a large community of members. So it’s not hard to see that when AIIM surveys its user or member base, the results are insightful and useful for organizations.
Big Data – the big hype
Doug kindly talked us through the results of a survey AIIM recently carried out on Big Data. He tells us that it’s not surprising that the definition of Big Data is still somewhat ‘loosely defined’. Even he feels he can’t define it with 100% accuracy. But what he stresses is not the definition, but the angle we take when discussing Big Data. The survey very much unveiled the consequences of when ‘Big Data meets Big Content’.
Big Data meets Big Content
Thornton May once said, “Big Data is not just ‘more data’”.
So, don’t sweat when people start talking about Big Data. They aren’t talking about adding to the data you currently have. It’s about making sense of data and knowing the difference between “Big Data and Big Content”.
What is Big Data in the scope of Systems of Engagement?
Within Systems of Engagement, Big Data can come in all sorts of forms. This explains why, every time I ask people what they think Big Data is, they all give a different answer. According to AIIM’s research, it can be:
- Financial transactions
- Quality monitoring, machine efficiency, network logging
- Citizen/patient data
- External datasets (public or subscription)
- Manufacturing/transportation/vehicle operations
- Scientific/weather/exploration data
- Sensors and machine monitoring devices
Big Data can also include the ‘Internet of Things’. You might be asking what that is. The ‘Internet of Things’ refers to a vision to attach ‘tiny devices to every single object to make it identifiable by its own unique IP address. These devices can then autonomously communicate with one another.’ (Reference)
Radio-frequency identification (RFID) is often seen as a prerequisite for the 'Internet of Things'. If all objects of daily life were equipped with radio tags, they could be identified and inventoried by computers. However, unique identification of things may be achieved through other means such as barcodes or 2D-codes as well.
For example, with such devices on every object, a medicine cabinet could be continuously aware of the status of each medicine bottle stored inside it, such as its name, contraindications and expiry date. Engineers could also query each cable in a suspension bridge to determine the extent of fatigue wear. (Reference)
For sectors such as the public sector, healthcare, retail, manufacturing and personal-location data globally, a McKinsey report on Big Data says that, with the help and realization of the ‘Internet of Things’, Big Data can do the following:
- Big Data can unlock significant value by making information transparent and usable at much higher frequency.
- As organizations create and store more transactional data in digital form, they can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore expose variability and boost performance. Leading companies are using data collection and analysis to conduct controlled experiments to make better management decisions; others are using data for anything from basic low-frequency forecasting to high-frequency nowcasting to adjust their business levers just in time.
- Big Data allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services.
- Sophisticated analytics can substantially improve decision-making.
- Big Data can be used to improve the development of the next generation of products and services. For instance, manufacturers are using data obtained from sensors embedded in products to create innovative after-sales service offerings such as proactive maintenance (preventive measures that take place before a failure occurs or is even noticed).
What is Big Content?
AIIM sees Big Content as the equivalent of unstructured data in systems of record, which are now evolving into systems of engagement. This may include all the ‘fuzzy’ unstructured content like:
- Document repositories/ECM
- PowerPoint files, spreadsheets, PDFs, XML, etc.
- Web behaviours, click streams
- External/public social media
- Voice, video, image
- Publicly available or open content files
- Internal social media
- Text communications channels (SMS, IM)
- Print stream archives
Seems a lot, doesn’t it? What’s worse, most of this unstructured ‘Big Content’ is unsearchable on a company website or intranet. So how does one actually become an ‘Information Professional’ within their organization if they can’t access the information they need? This is where Big Data analytics tools may become more .
Linked Data - Main drivers and benefits of Big Data & Big Content
Have you ever received your company’s Business Intelligence (BI) report and thought, “How is this relevant to me?” or “What does all this mean?” or even “Just show me which bit is relevant to me!” Well, you’re probably not alone. The daunting world of information overload, or digital landfill, can overwhelm any information professional. Big Data analytics seems to offer some light to those who are frustrated with their company search and conventional business intelligence. According to AIIM’s research, the main drivers for Big Data applications are:
- To better exploit internal and external data in order to improve the running of the business
- To avoid disruption
- To gain a competitive edge over the competition
By linking structured data (Big Data) with unstructured data (Big Content), organizations will be able to achieve all of these goals. Examples include linking case reports to geographical demographics, or customer web behaviour to their history of product sales. (Reference)
Results from the AIIM research show that:
"Over 60% would find it very useful if they could correlate text-based data with transactional data, but only 2% are able to do so at present."
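To make the idea of correlating text-based data with transactional data a little more concrete, here is a minimal Python sketch. It is only an illustration: the records, the field names (`customer_id`, `amount`, `text`) and both helper functions are made up for this example and do not come from AIIM's research or any particular product.

```python
from collections import defaultdict

# Hypothetical transactional records (structured "Big Data").
sales = [
    {"customer_id": "c1", "product": "scanner", "amount": 250.0},
    {"customer_id": "c2", "product": "toner", "amount": 40.0},
    {"customer_id": "c1", "product": "toner", "amount": 35.0},
]

# Hypothetical text-based records (unstructured "Big Content").
support_notes = [
    {"customer_id": "c1", "text": "Scanner jams on double-layered receipts."},
    {"customer_id": "c3", "text": "Asked about pricing for archive storage."},
]


def link_by_customer(sales, notes):
    """Group sales and free-text notes under a shared customer id."""
    linked = defaultdict(lambda: {"total_spend": 0.0, "products": [], "notes": []})
    for row in sales:
        entry = linked[row["customer_id"]]
        entry["total_spend"] += row["amount"]
        entry["products"].append(row["product"])
    for note in notes:
        linked[note["customer_id"]]["notes"].append(note["text"])
    return dict(linked)


def customers_mentioning(linked, keyword):
    """Return ids of customers whose notes mention the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return sorted(
        cid
        for cid, entry in linked.items()
        if any(keyword in text.lower() for text in entry["notes"])
    )


linked = link_by_customer(sales, support_notes)
```

With the two sources joined on a shared key, a question like "how much have customers who complain about scanners spent with us?" becomes a simple lookup. In practice the hard part is usually finding a reliable shared key and extracting meaning from the text, which this sketch deliberately skips.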
Zaizi knows all about Big Data, Big Content & Unstructured Data
As an Alfresco (a superior Open Source ECM provider) Platinum Partner and Best Partner of the Year 2012, Zaizi understands the challenges that Big Data may bring, especially to those using Alfresco as their ECM system. 99% of the information stored in Alfresco repositories is unstructured. It is normally a collection of documents, each with a set of metadata that describes it. As more and more content is added to the repository, keyword search breaks down. And since documents are not linked to each other as pages are on the internet, algorithms like PageRank are of no use in finding the relevant content. Documents tend to be organised or filed by company structure, so it is not easy to discover related and relevant content.
Zaizi presents Semantic Alfresco
Zaizi has developed a demo that shows how you can turn a large Alfresco repository into an intelligent knowledge repository through the integration of the open source Apache Stanbol semantic stack. This will include:
- Enhanced metadata modelling in Alfresco by supporting RDF triples in Alfresco
- Language detection of content to ensure correct search indexer is used
- Auto-classification of content
- Use of third party entities like DBPedia, Geonames and local Active Directory.
- Enhanced taxonomy management within Alfresco
- SPARQL query interface in Alfresco
- Overview of the Apache Stanbol project & functionality.
- Overview of the Apache Clerezza web framework for building semantically enhanced web applications on top of Alfresco
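For readers new to RDF triples and SPARQL, the following Python sketch shows the underlying idea in miniature. It is not how Alfresco, Apache Stanbol or Clerezza implement it: the triples, the `ex:` and `doc:` prefixes and both functions are hypothetical, and a real deployment would use a proper RDF store and SPARQL engine rather than a Python set.

```python
# Hypothetical triples: (subject, predicate, object).
triples = {
    ("doc:42", "dc:title", "Q3 Sales Report"),
    ("doc:42", "dc:language", "en"),
    ("doc:42", "ex:mentions", "geo:Berlin"),
    ("doc:77", "ex:mentions", "geo:Berlin"),
    ("doc:77", "dc:language", "de"),
    ("geo:Berlin", "rdfs:label", "Berlin"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def documents_mentioning(triples, entity):
    """Find every subject that is linked to the entity via ex:mentions.

    Roughly what a SPARQL query such as
    SELECT ?doc WHERE { ?doc ex:mentions geo:Berlin }
    would ask of a real triple store.
    """
    return [s for s, _, _ in match(triples, predicate="ex:mentions", obj=entity)]
```

Because both documents point at the same entity, they become discoverable together even though they are in different languages and may be filed in unrelated folders. That is the kind of link that plain keyword search over a folder hierarchy does not give you.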
This demo will be presented at the Alfresco DevCon 2012 in Berlin. So if you’re going, be sure not to miss it!
Zaizi puts the FUN in Big Data!
Zaizi has been working on making Alfresco more social and fun to work with. We've built an extension that provides gamification functionality within Alfresco Share. We have three dashlets right now (we want to create many more):
- A ranking dashlet, where a site's users are sorted according to their activity inside the site
- An achievement configuration dashlet, where site admin users can configure the achievements for a site
- A user achievements dashlet, where every user can see the achievements they have obtained
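As a rough idea of the logic behind dashlets like these, here is a small Python sketch. It is not Zaizi's extension code: the activity log, the achievement names and thresholds, and the functions `rank_users` and `earned` are all invented for illustration.

```python
from collections import Counter

# Hypothetical activity log for one site: (user, action) pairs.
activity = [
    ("alice", "upload"), ("bob", "comment"), ("alice", "comment"),
    ("carol", "like"), ("alice", "like"), ("bob", "upload"),
]

# Hypothetical thresholds a site admin might configure:
# achievement name -> minimum number of actions.
achievements = {"Contributor": 2, "Power user": 3}


def rank_users(activity):
    """Sort users by activity count, most active first; ties broken by name."""
    counts = Counter(user for user, _ in activity)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def earned(activity, user, achievements):
    """List the achievements whose threshold the user's activity count meets."""
    count = sum(1 for u, _ in activity if u == user)
    return sorted(name for name, threshold in achievements.items() if count >= threshold)
```

Here `rank_users(activity)` would feed a ranking view, the `achievements` mapping stands in for what an admin configures, and `earned(activity, "alice", achievements)` stands in for what a user sees on their own dashlet.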
Once you have deployed the gamification extension, you will be gathering a lot of useful information about what users are doing and what they are ‘liking’. Alfresco 4 also includes a number of social features to help users collaborate better, including the Alfresco Share Online Chat functionality. You can also use the social and gamification interaction audit data to start providing some intelligent suggestions to users.
For more information regarding these Alfresco features, contact Aingaran Pillai at firstname.lastname@example.org
The many opportunities with Big Data in government
I’m very much inclined to include this recent report titled “The Big Data Opportunity”, produced by the Policy Exchange (written by Chris Yiu), on some selected Big Data initiatives in the US. It’s a great example of how to take our imaginations that little bit further, by seeing how Big Data can be applied in a real environment.
So it’s clear to see that the future for Big Data is huge. But the question I have for you is: is your organization ready for it? AIIM’s research mentions that:
Although not necessarily a pre-requisite, a degree of content organization will certainly make big data analysis somewhat more straightforward, and in terms of priorities, many organizations are looking to address this before embarking on big data projects.
Is your content management system in check? Do you need some help organizing it? If you do, don’t hesitate to get in touch with our team of expert content technologists and let us help you get ready for Big Data! We hope this was useful!