{"id":3041,"date":"2016-03-08T06:31:38","date_gmt":"2016-03-08T06:31:38","guid":{"rendered":"https:\/\/www.digitalcreed.in\/?p=3041"},"modified":"2017-11-21T07:19:45","modified_gmt":"2017-11-21T07:19:45","slug":"spark-is-becoming-more-important-than-hadoop","status":"publish","type":"post","link":"https:\/\/www.digitalcreed.in\/spark-is-becoming-more-important-than-hadoop\/","title":{"rendered":"\u2018Spark is becoming more important than Hadoop\u2019"},"content":{"rendered":"
Spark can run in the Hadoop ecosystem, or it can run in its own stand-alone environment. Over 25% of Spark projects today run outside of Hadoop, and the percentage is rising. Moshe Kranc<\/strong>, Chief Technology Officer, Ness Software Engineering Services (SES), talks about the big data trends for 2016 and Spark\u2019s growing importance relative to Hadoop.\u00a0The views and opinions expressed in this article are entirely those of\u00a0Moshe Kranc<\/strong>.<\/h4>\n
— Vinita Gupta Malu<\/h5>\n
Q. What are the Big Data trends that you observe in 2016?<\/strong><\/p>\n
Moshe<\/strong>: I’ve observed the following trends:<\/p>\n
\n
Spark is becoming more important than Hadoop: Hadoop has been around since the mid-2000s, and has evolved to the point where it can efficiently and reliably perform big data analytics. Spark has the advantage of being a fast follower, able to learn from and avoid Hadoop\u2019s mistakes. Spark has a more generic and extensible programming model, which makes it easier to use for analytics. It can also handle big data in motion, via Spark Streaming, and serves as the basis for a powerful graph processing library (GraphX) and a full-featured data science library (MLlib). Spark\u2019s closest relative in the Hadoop world is Tez, which, like Spark, can execute algorithms organized as directed acyclic graphs. The open source community, recognizing the similarity, has crowned Spark as the converged platform of choice, and it will soon replace Tez in the Hadoop platform. Spark is the future of big data computing.<\/li>\n<\/ul>\n
\n
There will be fewer big data startups: Venture capital investors view big data as last year\u2019s trend. They have already doubled down on a variety of startups, and want to see those investments pan out. Hence, this year it will be difficult for big data startups to win over investors.<\/li>\n<\/ul>\n
\n
Oracle continues to lose market share to open source big data technologies: The mainstream software giants have adopted various strategies to cope with the competition from open source big data platforms. Some have formed alliances (e.g., Microsoft and Hortonworks), while others have embraced and extended (e.g., IBM Watson). The company that least seems to get this brave new world is Oracle, which continues to sell Exadata (an expensive alternative for big data analytics), and has launched its own proprietary NoSQL database that has no advantage over open source alternatives. Oracle is having trouble understanding that in 2016, most customers prefer to avoid vendor lock-in.<\/li>\n<\/ul>\n
\n
Cassandra is becoming a dominant player in the NoSQL space: Cassandra was always the fastest NoSQL database, especially for write-heavy applications, and it provides an active-active distributed datacenter topology out of the box. The knock on Cassandra was that it was hard to deploy, maintain and program. DataStax, the commercial vendor for Cassandra, seems to have noticed: the CQL language makes Cassandra far easier to program, and the OpsCenter management tool makes maintenance a lot simpler.<\/li>\n<\/ul>\n
<\/p>\n
Q. What, according to you, will be the big data challenges in 2016, and how can they be tackled?<\/strong><\/p>\n