It was back in the day when I was at the helm of the data arm of Unit 8200, the elite high-tech division within Israeli intelligence. We got our hands on some data which indicated that ISIS intended to sneak a meat grinder rigged with an explosive onto a plane departing from Australia.
We reached out to the Australian authorities, and they arrested the terrorists two days before the planned attack. Unfortunately, things did not always end this way. A few weeks later, ISIS carried out an attack in a major European city. When we looked back, it turned out that we had had the relevant pieces of data to alert our partners to that plot as well, but the data hadn’t been prepared for analysis in time. Therein lay another lesson I learned for good: for all its immense power, data is only valuable if you can process it quickly and effectively.
The great data bottleneck
Data has grown ever bigger and more diverse over the past decade, and the process will only accelerate moving forward. After leaving the armed forces, I quickly realized that this dramatic evolution was just as much a challenge for the corporate world as it had been for us. Both in the military and in business, data collection is less of a challenge than it used to be. The real challenge is generating value from the data in time.
Granted, the stakes are different in the corporate world, where we aren’t exactly talking about matters of life and death. But businesses do need to make fast decisions and respond to changes in near real time to stay ahead of the competition. And that’s where companies are struggling: they simply can’t keep up the pace and generate value from their data quickly enough, especially when it comes to complex event-based data. Sometimes, crucial data sources are left out of the picture or not integrated quickly enough. Sometimes, the business question changes before the data pipeline is even up and running, meaning everything has to be rebuilt from scratch. In these and many other scenarios, the business ends up losing critical ground to competitors who move faster.
This isn’t because data engineers aren’t good at what they do! On the contrary, engineers are the data heroes of our times, dealing with troves of diverse data and ever-changing demands. The challenge is that, as the data becomes more complex and the requirements pile up, data engineers have to write more code, and more complex code. Everyone expects them to deliver a fast, high-quality product, but there are never enough data engineers for the team to keep up.
As a result, we’re seeing more and more data experts think that data itself is the problem and claim that solving it is just a matter of more integration efforts. I’d argue that data itself isn’t the problem at all, but rather the solution to most of the issues humankind currently faces. For this solution to work, though, we must realize its full potential and change some of our ways to accommodate this powerful tool. Tackling the challenge through a new paradigm will allow the data engineers to do their job in a much simpler way and allow more data professionals to take part in the integration process, both conceptually and technologically.
Changing the data paradigm
One of the data bottlenecks lies in the time it takes to set up even a single data pipeline, which is usually the domain of a limited team of highly trained data engineers. As long as that remains the case, the bottleneck will not go away.
In most companies, data engineers and data users are locked in a never-ending loop of meetings that eats into the time to value, especially given the shortage of data engineers. That’s why one of the keys to the new paradigm is bringing data scientists, analysts, and engineers closer together to create hybrid production. By enabling data citizens to bring their business understanding into the integration process, we can speed the process up, save a lot of resources, and allow engineers to make the most of their unique capabilities by solving the most complex problems and creating building blocks for data pipelines.
Three steps to get started
But where do you begin with this change? I recommend starting with these three simple steps.
1. Change the ETL as we know it
Traditionally, most Extract-Transform-Load (ETL) pipeline projects put the highest emphasis on the Extract and Load parts. In the past, these were indeed the key challenges, but now, getting the T right is more crucial than ever. Pre-processing is where you begin to extract value from data before it is even in the warehouse, as the transformations you include are driven by your exact business needs. Use pre-built transformers that suit your organization’s business and operational demands (with or without code), or build the transformers you need using a simple and convenient framework.
2. Let your data engineers focus on their core capabilities
Data engineers will always be in limited supply, so use them wisely. Instead of weighing them down with day-to-day routines, let data engineers focus on more complex tasks and on developing building blocks for your data systems, and allow others to take part in the day-to-day data integration process.
3. Transition from relying on code to declarative code
Separate the design phase from the implementation phase, which saves time and raises product quality. Write custom code only for the truly difficult and complex problems.
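To make the three steps a little more concrete, here is a minimal Python sketch of what a declarative pipeline built from reusable transformers might look like. Everything in it is a hypothetical illustration rather than any specific product’s API: the transformer names (rename_field, cast_field, filter_min), the REGISTRY, the PIPELINE_SPEC format, and the sample events are all assumptions. The idea is that engineers write the small building blocks once, and the pipeline itself is described as plain data that an analyst could read and edit.

```python
from typing import Any, Dict, Iterable, List, Optional

Record = Dict[str, Any]

# --- Building blocks (hypothetical), written once by data engineers ---

def rename_field(record: Record, source: str, target: str) -> Optional[Record]:
    """Return a copy of the record with `source` renamed to `target`."""
    out = dict(record)
    out[target] = out.pop(source)
    return out

def cast_field(record: Record, field: str, to: str) -> Optional[Record]:
    """Return a copy of the record with `field` cast to the named type."""
    casts = {"int": int, "float": float, "str": str}
    out = dict(record)
    out[field] = casts[to](out[field])
    return out

def filter_min(record: Record, field: str, minimum: float) -> Optional[Record]:
    """Keep the record only if `field` is at least `minimum`; else drop it."""
    return record if record[field] >= minimum else None

# Maps the names used in a pipeline spec to the functions that implement them.
REGISTRY = {
    "rename": rename_field,
    "cast": cast_field,
    "filter_min": filter_min,
}

# --- Declarative design (hypothetical format): what to do, not how ---
PIPELINE_SPEC = [
    {"use": "rename", "with": {"source": "amt", "target": "amount"}},
    {"use": "cast", "with": {"field": "amount", "to": "float"}},
    {"use": "filter_min", "with": {"field": "amount", "minimum": 10.0}},
]

def run_pipeline(spec: List[Dict[str, Any]], records: Iterable[Record]) -> List[Record]:
    """Apply each declared step in order; a step returning None drops the record."""
    results: List[Record] = []
    for record in records:
        current: Optional[Record] = record
        for step in spec:
            current = REGISTRY[step["use"]](current, **step["with"])
            if current is None:
                break
        if current is not None:
            results.append(current)
    return results

if __name__ == "__main__":
    events = [{"user": "a", "amt": "12.5"}, {"user": "b", "amt": "3"}]
    print(run_pipeline(PIPELINE_SPEC, events))
```

In this sketch, changing the business question means editing PIPELINE_SPEC rather than rewriting the pipeline, and custom code is only needed when a new kind of transformer has to be added to the registry.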
From my many years of experience in dealing with data challenges, it is clear that these three steps aren’t a silver bullet for every struggle, but they do allow for a significant leap forward, especially if the conceptual adjustments are combined with the right technology. We should not be stuck in one data management paradigm forever. There is no reason to keep wasting quality engineering power and a lot of money on a task that can be done in a much simpler and more efficient way. Generating value from existing data and preparing properly for the challenges of the future requires adopting the right concepts and the appropriate technological platforms.
About the Author
Asaf Cohen is an Israel Defense Forces (IDF) veteran with 25 years of experience in military intelligence. Joining the elite Unit 8200 as a young recruit, Asaf worked his way up to become the unit’s deputy commander with the rank of Colonel. Throughout his military career, Asaf worked with advanced data analysis methods and helped pioneer AI integration in the IDF’s operations. After leaving the army, Asaf co-founded Datorios, a data transformation startup, in a bid to put his ample experience with data at the service of companies looking to streamline their data operations and management.