Continuuity Reactor 2.3: SQL and Security Release

Jul 23 2014, 10:22 am

Alex Baranau is a software engineer at Continuuity where he is responsible for designing and building software that fuels the next generation of Big Data applications. Alex is a contributor to HBase and Flume and has created several open-source projects. He also writes frequently about Big Data technologies.

The Continuuity Reactor platform is designed to make it easy for developers to build and manage data applications on Apache Hadoop™ and Apache HBase™. Every day we’re passionately focused on delivering an awesome experience for all developers, with or without Hadoop expertise. And today, we’re excited to release the next version of our platform, Continuuity Reactor 2.3.

In addition to continued stability, scalability, and performance, we have added a number of significant new features in Continuuity Reactor 2.3:

Ad-hoc SQL Queries

Procedures are an existing, programmatic way to access and query your data in Reactor, but sometimes you may want to explore a Dataset in an ad-hoc manner rather than writing procedural code. Reactor now supports ad-hoc SQL queries over Datasets via a new API that lets developers expose the schema of a Dataset and make it query-able through a REST API. Queries are submitted over REST and executed by Apache Hive or other Hadoop-based SQL engines, and the results are retrieved through the same interface.
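
As a rough illustration of the REST workflow, the sketch below submits a query from Java and prints the raw response. The endpoint path, port, JSON payload shape, and the Dataset name ("wordcounts") are assumptions made for this example, not the documented Reactor routes; see the Reactor 2.3 developer documentation for the actual API.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class AdHocQueryExample {
      public static void main(String[] args) throws Exception {
        // Hypothetical query-submission endpoint; the real route and port are
        // documented in the Reactor REST API reference.
        URL url = new URL("http://localhost:10000/v2/data/queries");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        // The SQL runs against the schema the Dataset exposes; "wordcounts"
        // is a placeholder Dataset name.
        String payload = "{\"query\": \"SELECT word, COUNT(*) FROM wordcounts GROUP BY word\"}";
        try (OutputStream out = conn.getOutputStream()) {
          out.write(payload.getBytes(StandardCharsets.UTF_8));
        }

        // Print the raw response (for example, a handle used to poll for results).
        try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
          String line;
          while ((line = reader.readLine()) != null) {
            System.out.println(line);
          }
        }
      }
    }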

Security Enhancements

We’re committed to making Hadoop applications secure. Continuuity Reactor now supports perimeter security, restricting access to resources only to authenticated users. With perimeter security, access to cluster nodes is restricted by a firewall. Cluster nodes can communicate with each other, but outside clients can only communicate with the cluster through a secured gateway.

With Reactor security enabled, the Reactor authentication server issues credentials (access tokens) to authenticated clients, and clients then send these tokens with their requests to Reactor. Calls that lack a valid access token are rejected, limiting access to authenticated clients only. You can learn more about the authentication process on the Reactor Security page.
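
As a rough sketch of the client side of this flow, the snippet below attaches an access token to a REST call. The port, resource path, and the use of a "Bearer" authorization scheme are assumptions for illustration; the exact token-granting endpoint and header format are described on the Reactor Security page.

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class AuthenticatedCallExample {
      public static void main(String[] args) throws Exception {
        // Placeholder: in practice this value comes from the Reactor
        // authentication server after the client authenticates.
        String accessToken = "<token issued by the authentication server>";

        // Attach the token to a gateway request; calls without a valid
        // token are rejected by Reactor.
        URL url = new URL("http://localhost:10000/v2/apps");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization", "Bearer " + accessToken);
        System.out.println("HTTP status: " + conn.getResponseCode());
      }
    }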

Additional Release Highlights

Other key enhancements in 2.3 include new Application, Stream, Flow, and Dataset features such as:

  • Stream support for data retention policies, reconfigurable at runtime while the Stream is in use
  • Stream support for truncate via REST
  • Simplified Flowlet @Batch support: process methods no longer require an Iterator (see the sketch after this list)
  • New Datasets API that gives more power and flexibility when developing custom Datasets
  • Dataset management outside of Applications exposes REST interfaces to create, truncate, drop and discover Datasets
  • New Application API with an improved way to define application components
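
To illustrate the simplified @Batch support called out above, here is a minimal Flowlet sketch. The package names, the batch-size parameter, and the Flowlet itself are assumptions based on the Reactor Java API; check the 2.3 developer documentation for the exact signatures.

    // Package and annotation names are assumed; verify against the Reactor docs.
    import com.continuuity.api.annotation.Batch;
    import com.continuuity.api.annotation.ProcessInput;
    import com.continuuity.api.flow.flowlet.AbstractFlowlet;

    public class WordSplitterFlowlet extends AbstractFlowlet {

      // With 2.3, a batched process method can accept a single event: the
      // runtime dequeues up to the given number of events per transaction but
      // invokes the method once per event, so no Iterator is required.
      @Batch(100)
      @ProcessInput
      public void process(String event) {
        System.out.println("received: " + event);
      }
    }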

Finally, we have added Reactor Services, an experimental feature that lets you add custom User Services that can be easily discovered from within Flows, Procedures, and MapReduce jobs. We'll add more services capabilities in our next release, but you can get an early preview right now of one of the features we are most excited about!

Try Reactor 2.3 Today

We are working hard to solve the challenging problems faced by both new and experienced data application developers and to enable a much more fun and productive development experience for Hadoop. Reactor unifies the capabilities you need when developing on Hadoop into an integrated developer experience so that you can focus on your application logic without the worries of distributed system architectures or scalability. Download the Continuuity Reactor 2.3 SDK and check out the developer documentation to get started.

We are excited about the latest release and would love to hear your thoughts. Please feel free to send us feedback at support@continuuity.com.


Hadoop Summit: Where is the value? Where are the apps?

Jun 24 2014, 8:00 am

Jonathan Gray, Founder & CEO of Continuuity, is an entrepreneur and software engineer with a background in open source and data. Prior to Continuuity, he was at Facebook working on projects like Facebook Messages. At the startup Streamy, Jonathan was an early adopter of Hadoop and an HBase committer.

Coming out of Hadoop Summit, one thing is clear to me – while there has been significant growth and success of the ecosystem, it is still early days and Hadoop is still exceptionally hard to consume for most organizations. As a result of this persistent issue, there weren’t many major announcements, nothing exceptionally new or different released, and the buzz remained largely centered on YARN and Spark, both of which are several years old.

While we saw reports of early-adopting companies creating real value with Hadoop, the focus this year was more technical than I anticipated. From the keynotes to the breakout sessions to the show floor, this year's summit seemed more about the endless variety of technologies than about use cases and actual return on investment. Below is a brief overview of a few other trends we observed:

Hadoop is not quite enterprise ready…yet

Hadoop Summit generated significant discussion about whether Hadoop is truly ready for real, production enterprise use. Of particular concern are security and the related issues of privacy and data policy, especially for companies dealing with customer or financial information. Recent acquisitions of Hadoop security upstarts by the major Hadoop distributions indicate that this will remain an important area of focus in the near term.

Hadoop vs. The EDW: To Replace or To Augment

Another hot topic is whether Hadoop is a replacement for the traditional EDW or whether it will only augment it and offload certain workloads. In years past this was much more of a debate; this year, however, it seems clear that most have accepted a symbiotic relationship for the time being. While I do expect this to change, it is evident that today there is a significant gap between the capabilities of the Hadoop stack and proprietary EDW technologies.

Hadoop is becoming more fragmented

This year it became apparent that the Hadoop ecosystem is splintering into multiple, often competing projects. Competing vendors are establishing parallel but increasingly divergent stacks, while vendors with differentiated products market overlapping messages. There has been an explosion in the variety of ways to work with Hadoop and in the number of companies trying to make Hadoop consumable, and it's becoming even more confusing to choose which path to follow. This is true not only for business leaders making decisions about Big Data projects in their companies but even for knowledgeable developers.

Hadoop (still) needs to be simplified

This mass confusion in the market is undercutting companies' ability to realize the value they want from their Big Data initiatives. A lot of attention is still being paid to the infrastructure rather than the applications, so although the disruptive value of Big Data should be at the forefront, it remains elusive for most.

The Big Data Application revolution is still forthcoming. It is still early days, Hadoop is still very difficult, and very few people understand how to work with it. That’s why we are building a platform that focuses on making Hadoop easier for developers, allowing anyone to build applications (today in Java) without worrying about the low-level infrastructure. Rather than grapple with myriad technology options, they are free to focus on what matters – turning their brilliant ideas for data into apps that solve real problems. This is where Hadoop can produce desired outcomes – in data applications that quickly provide measurable value.

Adding Jet Fuel to the Fire

Not to be left out of the new choices in the Hadoop menagerie, we announced (in case you missed it) a project in collaboration with AT&T Labs: a distributed framework for real-time data processing and analytics applications, codenamed jetStream. It will be available in open source in Q3 2014; you can find more information about this effort in our recent blog post and at jetStream.io.


HBaseCon: Moving Beyond the Core to Address Availability & Usability

May 19 2014, 12:58 pm

Jonathan Gray, CEO & Co-founder of Continuuity, is an entrepreneur and software engineer with a background in open source and data. Prior to Continuuity, he was at Facebook working on projects like Facebook Messages. At the startup Streamy, Jonathan was an early adopter of Hadoop and an HBase committer.

We just wrapped HBaseCon 2014, the annual event for Apache HBase™ contributors, developers, and users. As in years past, this is one of the most technical conferences that we attend, and it’s really focused on the core community of developers who are doing something meaningful with the enabling technology. What makes HBaseCon so compelling is that it’s not theoretical but rather all about overcoming real technical challenges and actual business use cases. And this year, we noticed a couple of key trends that are shaping the future of HBase.

Overall, we noticed that the HBase discussion has moved up a level, and this is a good thing. We're no longer talking about the core architecture of HBase, which is pretty much set at this point; instead of debating how to do the architecture better, it's all about building on top of what's already there. Last year was very focused on improvements to the core platform, such as detecting and recovering from server failures more quickly, and on new use cases launching on HBase. In the year since, HBase has further stabilized into a mature platform, and those new use cases have become established production systems. Now the conversation is about building above and around HBase for higher availability and usability.

There was a lot of good discussion of increasing availability from an HBase standpoint. In the Facebook keynote on HydraBase, they discussed using a consensus protocol for HBase reads and writes in order to tolerate individual server failures without sacrificing availability or strong consistency. Similarly, Hortonworks and others shared work they have been doing on timeline-consistent read replicas: if a single server goes down, you can still read data consistently up to a given point in time, the most recent available snapshot of the data. Google's Bigtable team also touched on availability by describing their approach to the long tail of latency.

Multiple approaches to availability are happening, but they ultimately lead to the same goals of trying to reduce the big latency outliers and getting to 5-9s (i.e., 99.999%) reliability. In addition to early adopters like Facebook, Cloudera, and Hortonworks, we’re also encouraged to see a lot of other real users step up and take an active role in the community—we’ve seen this particularly in contributions from Salesforce, Xiaomi, and Bloomberg.

All of these companies are using HBase at very large scale, contributing to its development to continue to move it forward, and then sharing their successes with others. For us at Continuuity, HBase usability is what we’re driving at, and we’ll remain very focused on improving usability so that more developers can build their own HBase and Hadoop applications. This is where HBase is going, and we’re excited to be a part of this community and contribute to its success.


Continuuity Reactor 2.1: New Developer Experience and Windows Support

Mar 6 2014, 8:04 am

Sreevatsan Raman is a software engineer at Continuuity where he is building and architecting a platform fueling the next generation of Big Data applications. Prior to Continuuity, Sree designed and implemented big data infrastructure at Klout, Nominum, and Yahoo!

Last November at Strata + Hadoop World 2013, we came out of public beta and launched Continuuity Reactor 2.0. Our goal was to enable developers and businesses to deploy production Big Data applications on our platform. Since then, we’ve been maniacally focused on making our developer experience better for all Java developers, with or without Hadoop experience. Today, we’re happy to announce the next version of Continuuity Reactor: 2.1.

The Continuuity Reactor 2.1 release includes the following new features:

New Developer Experience

We completely revamped our developer documentation and out-of-the-box experience to make it easier for Java developers–who may not be familiar with Hadoop or Big Data–to get started with Continuuity Reactor. We added real-world examples for you to get a quick and simple overview of our platform, a helpful guide to the Reactor user interface, and a streamlined SDK download so you can get started without missing a beat.

Windows Support

Some developers build Big Data applications on Windows machines, so we added support for running the single-node Reactor on 64-bit Windows (Windows XP and above).

Additional Features

This release adds support for HBase 0.96, which brings improved scalability, lower mean time to recovery, and support for larger clusters (hundreds of nodes and more). We also fixed a number of bugs and made the platform more stable.

We are working hard to solve the problems faced by both new and experienced Big Data developers. Reactor unifies the capabilities you need into an integrated developer experience, without the worries of distributed systems architecture or scalability. You will hear more from us as we progress, and we would love to hear from you. Feel free to leave a comment or send us feedback at support@continuuity.com.

Download the Continuuity Reactor 2.1 SDK and visit the developer documentation to get started.

Happy hacking!

—Sree

