How Tech Enterprises Handle Big Data on Open Source and Ensure User Privacy

The term “big data” gets thrown around a lot, especially given its importance in driving AI technology. Building scalable systems that provide valuable insights into what you’re doing well and what you could be doing better is imperative to maintaining a competitive edge. And as big data, artificial intelligence and machine learning become more advanced and interconnected each year, these scalable systems become more and more valuable.

When PicsArt was founded in 2011, the online landscape and the world of data collection, management and analysis were much less sophisticated. Since then, many startups have risen while others have faltered, and those that found success were largely companies able to adapt to an increasingly data-driven marketplace. Today, our users generate a staggering 10 terabytes of data every single day. On a global scale, PicsArt runs a medium-to-large big data cluster with most large-cluster functionality enabled.

It was evident that we were stepping into the big data arena when our data met all four characteristics of big data: volume, velocity, variability and complexity. Once the volume of data we were dealing with was too large to fit into a relational or other standard database, the die was cast and we jumped into the big data scene with optimism and gusto. At the same time, as AI and machine learning became mainstream technologies, we were able to put them to full use for the benefit of our users.

Adapting To A Big Data Mindset

When most people think about big data, they imagine the technical side will be the most difficult, but we learned through trial and error that approaching problems from the technical side first isn’t always ideal. Big data offers nearly endless possibilities, but if you don’t have a clear understanding of specific use cases and goals, you can unnecessarily prolong the development process. Because our system was constructed without a clearly delineated list of use cases, our data architects had to design it to handle as many future use cases as possible. The end result was a working system with extensive support and capabilities, but the rollout took longer than it would have had we defined things better from the start.

Getting used to the sheer scope of the data was a learning curve as well, especially since a big data community barely existed at the time. Initially, we placed responsibility for cleaning data on a single centralized team, which we quickly discovered would never work given the constant barrage of thousands of events happening across multiple apps. Getting the data clean, we discovered, requires simultaneous effort from the tech and business teams -- it only works if everyone is on the same page. Big data is considered the new oil nowadays, but it’s also a huge challenge in terms of how to prepare it, process it, store it and, most importantly, turn it into applicable knowledge. To make that happen, it’s important to define the most common use cases within the product and align technical and business team efforts from the beginning. Overall, maintaining flexibility, learning from mistakes and adapting were essential to getting past the first step of becoming a big data company.
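To illustrate the idea of cleaning event data close to where it is produced rather than in one centralized team, here is a minimal sketch in Python. The event names and required fields are invented for illustration and are not PicsArt’s actual schema; the point is that each event type carries its own validation rules, so malformed events are dropped before they pollute downstream storage.

```python
from typing import Optional

# Hypothetical per-event schemas: each producing team declares the fields
# its events must carry. Names and fields here are invented examples.
REQUIRED_FIELDS = {
    "photo_edit": {"user_id", "tool", "timestamp"},
    "photo_upload": {"user_id", "image_id", "timestamp"},
}

def clean_event(event: dict) -> Optional[dict]:
    """Return a normalized event, or None if it fails validation."""
    name = event.get("name")
    schema = REQUIRED_FIELDS.get(name)
    if schema is None or not schema.issubset(event):
        return None  # unknown event type or missing fields: drop it
    # Keep only the declared fields so downstream storage stays consistent.
    return {k: event[k] for k in schema | {"name"}}

events = [
    {"name": "photo_edit", "user_id": 1, "tool": "crop", "timestamp": 1700000000},
    {"name": "photo_edit", "user_id": 2},  # missing fields -> dropped
]
cleaned = [e for e in map(clean_event, events) if e is not None]
```

Keeping the validation rules next to each event definition means the tech and business teams can agree on one schema per event and evolve it together, instead of one team chasing thousands of event shapes after the fact.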

Finding The Right Tools For The Job

As the value of big data became more evident, conferences started popping up, giving innovators and companies a way to gather and share strategies. Open source solutions for data collection and analysis became more common and more robust, and it got easier to find the right technology. The lesson startups can take away from all this is to take advantage of the big data community that exists now, and to do so with direct aims in mind.

Among the wide range of tools available, it’s important to find those that fit your business needs. Which ones fit can only be determined empirically, and the answer depends on the size of your company. At a minimum, look for tools covering data processing, data analysis, crash monitoring and infrastructure monitoring.

Using Data To Fuel Innovation

Each piece of data my company collects falls into one of three categories: user device info, user behavior and uploaded images -- complete with editing logs and intermediate steps. The metadata we collect is used to directly improve the user experience by responding to the way people use our app and then creating the tools they want.

Privacy is an important topic for every tech enterprise that deals with a large amount of data. As an organization that operates globally and holds data on citizens of European Union countries, we must comply with strict rules around protecting customer data: the General Data Protection Regulation (GDPR) sets a new standard for consumer rights regarding their data. All of our users can adjust their privacy settings and make sure they are comfortable with the data they are sharing with us.
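One way user privacy settings can gate what gets collected is sketched below. This is a minimal illustration, not PicsArt’s actual implementation; the preference names and event categories are invented, and the idea is simply that a user’s stated choices are checked before any event enters the pipeline.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PrivacyPrefs:
    """Hypothetical per-user privacy settings."""
    share_device_info: bool = True   # device metadata, e.g. OS version
    share_behavior: bool = False     # behavioral events are opt-in

def collect(event: dict, prefs: PrivacyPrefs) -> Optional[dict]:
    """Return the event if the user's settings allow collecting it, else None."""
    if event["category"] == "behavior" and not prefs.share_behavior:
        return None
    if event["category"] == "device" and not prefs.share_device_info:
        return None
    return event
```

With the defaults above, behavioral events are dropped until the user opts in, while device info passes through -- making the consent check a single, auditable choke point rather than logic scattered across the codebase.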

Source: All the above opinions are a personal perspective based on information provided by Forbes and contributor Hovhannes Avoyan.
