Exciting Times at Continuuity

Apr 14 2014, 8:00 am

Tom Leonard is Executive Vice President, Sales & Business Development at Continuuity, where he is responsible for all relationships and go-to-market initiatives through our direct and indirect channels. Tom has over 25 years of experience in enterprise software and has held executive positions at Pentaho, OpenSpan, JBoss, and Bluestone Software.

These are very exciting times at Continuuity as we have aggressively ramped up our go-to-market and hiring efforts in 2014. In a short period of time, we have built strong momentum with leading telco and financial services enterprise customers, as well as with many leading channel partners (OEMs, service providers, and technology partners).

We continue to hire outstanding talent and to push differentiated, disruptive products to market in this fast-growing, ever-evolving Hadoop ecosystem. Continuuity Reactor 2.1 was made generally available in early March 2014, and just a couple of weeks later, at the Gigaom Structure Data conference in NYC, we announced the availability and open-sourcing of Continuuity Loom, a cluster management software platform that manages clusters on public and private clouds (more on this at a later date).

To support our growing momentum with enterprises and partners alike, the Continuuity team is very excited to announce that we are now a Certified Technology Partner of both Cloudera and Hortonworks – the two leading Hadoop distributions in the market. Continuuity Reactor 2.1 is officially certified and tested on Cloudera Hadoop Distribution (CDH5) and Hortonworks Data Platform (HDP2). We want Reactor to work seamlessly with Cloudera and Hortonworks’ Hadoop deployments in the enterprise to make it easier for developers to integrate Reactor with these existing Hadoop distributions.

This milestone is very important to Continuuity as it further validates the core value propositions that Reactor provides to your Hadoop distribution of choice:

  • Recognized as the industry’s first purpose-built application server for Apache Hadoop™, Continuuity is focused on the full data and application lifecycle for Hadoop initiatives, from Data Lake ingestion to running real-time applications. Hence, we sit right on top of CDH5 and HDP2 – now certified!

  • Continuuity is very focused on Java developers and on enabling them to be highly relevant across the Hadoop ecosystem. How? Reactor abstracts the complexity of the many Hadoop components (HBase, HDFS, etc.), allowing developers to focus exclusively on their application code without deep Hadoop domain knowledge (see the hypothetical sketch after this list).

  • By accomplishing the above, Reactor greatly broadens the Big Data developer base within organizations of all sizes and accelerates time-to-market and time-to-value when operationalizing and monetizing your Big Data initiatives.
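
To illustrate the kind of abstraction this list describes, here is a purely hypothetical sketch of what stream-processing code on such a platform might look like; the package, class, and annotation names below are assumptions for illustration, not a verbatim excerpt of the Reactor 2.1 API.

    import com.continuuity.api.annotation.ProcessInput;
    import com.continuuity.api.flow.flowlet.AbstractFlowlet;
    import com.continuuity.api.flow.flowlet.StreamEvent;

    import java.nio.charset.StandardCharsets;

    // Hypothetical flowlet: plain Java processing logic, while the platform
    // handles HBase, HDFS, and YARN underneath.
    public class CounterFlowlet extends AbstractFlowlet {

      @ProcessInput
      public void process(StreamEvent event) {
        String body = StandardCharsets.UTF_8.decode(event.getBody()).toString();
        // Application code only; no direct Hadoop component APIs required.
        System.out.println("Processing event: " + body);
      }
    }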

Continuuity Reactor is the application server for Hadoop. It plugs into your Big Data technology stack (or reference architecture) and enables you to develop Big Data applications efficiently and rapidly. Reactor has enabled many Big Data application use cases, including network analytics, fraud detection, consumer intelligence, geospatial and location analysis, social media and sentiment analysis, and many others.

We congratulate Cloudera and Hortonworks on their tremendous success to date. We are thrilled to be recognized as a Certified Technology Partner for both CDH5 and HDP2, and to be part of their growing partner ecosystems.

To get going with Continuuity, download the Reactor 2.1 SDK today and visit the developer documentation to start building applications on CDH5, HDP2 or Apache Hadoop.

Running Presto over Apache Twill

Apr 3 2014, 11:35 am

Alvin Wang is a software engineer at Continuuity where he is building software fueling the next generation of Big Data applications. Prior to Continuuity, Alvin developed real-time processing systems at Electronic Arts and completed engineering internships at Facebook, AppliedMicro, and Cisco Systems.

We open-sourced Apache Twill with the goal of enabling developers to easily harness the power of YARN using a simple programming framework and reusable components for building distributed applications. Twill hides the complexity of YARN with a programming model that is similar to running threads. Instead of writing boilerplate code again and again for every application you write, Twill provides a simple and intuitive API for building an application over YARN.
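
To give a feel for that programming model, here is a minimal sketch (not taken from any production codebase) of a Twill runnable and the few lines needed to launch it on YARN; the EchoServer name and the ZooKeeper connection string are placeholders.

    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.twill.api.AbstractTwillRunnable;
    import org.apache.twill.api.TwillController;
    import org.apache.twill.api.TwillRunnerService;
    import org.apache.twill.api.logging.PrinterLogHandler;
    import org.apache.twill.yarn.YarnTwillRunnerService;

    import java.io.PrintWriter;

    public class EchoServer extends AbstractTwillRunnable {

      @Override
      public void run() {
        // Ordinary Java code; Twill executes it inside a YARN container.
        System.out.println("Hello from a YARN container");
      }

      public static void main(String[] args) {
        // "zk-host:2181" is a placeholder ZooKeeper quorum string.
        TwillRunnerService runner =
          new YarnTwillRunnerService(new YarnConfiguration(), "zk-host:2181");
        runner.startAndWait();

        // Deploy the runnable to YARN, streaming its logs back to stdout.
        TwillController controller = runner.prepare(new EchoServer())
          .addLogHandler(new PrinterLogHandler(new PrintWriter(System.out, true)))
          .start();
      }
    }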

Twill makes it super simple to integrate new technologies to run within YARN. A great example of this is Presto, an ad-hoc query framework. In this post, I’ll explain what Presto is and how we were able to make it run within YARN using Twill in a short period of time.

Why did we want to run Presto over Twill?

We wanted to add ad-hoc query capabilities to our flagship product, Continuuity Reactor. We looked at different frameworks and started experimenting with Presto because it is written in Java and is emerging as an important Big Data tool. The next question was how to integrate it. We opted to run Presto within YARN because it gives developers the flexibility to manage and monitor resources efficiently within a Hadoop cluster, along with the capability to run multiple Presto instances.

We use Twill extensively in Reactor for running all services within YARN. So, in order for us to run Presto within Reactor, we had to integrate it with Twill.

What is Presto?

Presto is a distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Last fall, Facebook open-sourced Presto, giving the world a viable, faster alternative to Hive, the data warehousing framework for Hadoop. Presto was designed for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of large organizations.

How does Presto work?

When a query is executed, it is sent to the coordinator through the command-line interface. The coordinator distributes the workload across workers. Each worker reads and processes data for its portion of the input. The results are then sent from the workers back to the coordinator, which aggregates them into a full response to the query. Presto is much faster than Hive because it doesn’t need to run a new MapReduce job for every query; the workers can be left running even when there aren’t any active queries.
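
The same flow can also be driven programmatically. Below is a small, hedged sketch that submits a query to the coordinator using Presto’s JDBC driver (assuming the driver jar is on the classpath); the host, catalog, and table names are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PrestoQueryExample {
      public static void main(String[] args) throws Exception {
        // "coordinator-host" and the hive/default catalog and schema are placeholders.
        Connection conn = DriverManager.getConnection(
            "jdbc:presto://coordinator-host:8080/hive/default", "dev-user", null);

        Statement stmt = conn.createStatement();
        // The coordinator plans the query and fans it out to the workers, which
        // scan their splits and stream results back for aggregation.
        ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM page_views");
        while (rs.next()) {
          System.out.println("count = " + rs.getLong(1));
        }
        rs.close();
        stmt.close();
        conn.close();
      }
    }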

How did we integrate Presto with Twill?

First, we needed to run Presto services (discovery, coordinator, and worker) embedded in TwillRunnables, which posed a couple of challenges:

  • The public Presto distribution provides Bash scripts and Python scripts for running Presto services, but has no documented way to run in embedded mode.
  • Presto services normally use external configuration files for various properties like discovery URI and HTTP binding port.
  • Presto on Twill needed to handle varying discovery URIs, because YARN cannot guarantee that the discovery service will run on a particular host; any host could become unavailable.
To address these challenges, we configured the Presto services programmatically:

    // Set the properties that would normally live in Presto's config files
    // directly on the Bootstrap, so no external configuration files are needed.
    Bootstrap app = new Bootstrap(modules.build());
    app.setRequiredConfigurationProperty("coordinator", "true");
    app.setRequiredConfigurationProperty("datasources", "jmx");
    app.setRequiredConfigurationProperty("discovery-server.enabled", "false");
    app.setRequiredConfigurationProperty("http-server.http.port", propHttpServerPort);
    app.setRequiredConfigurationProperty("discovery.uri", propDiscoveryUri);
    

    Next, we needed to get Presto services to use an existing Hive metastore with the Hive connector so that the Presto CLI can run queries against Hive tables. While Presto includes basic documentation for file-based configuration of the Hive connector, there isn’t any documentation on how to do it programmatically. To tackle this, we inspected the code that loads the Hive connectors. We found that ConnectorManager.createConnection() was setting up the connectors, but the ConnectorManager instance was a private field in CatalogManager, so we had to use reflection. While not ideal, it worked. In the future, we may contribute our source code to Presto to make it easier to embed in Java. The code we used to register the Hive connector is shown below:

    // Install the Hive plugin, then use reflection to reach the private
    // ConnectorManager inside CatalogManager and register the Hive connector.
    injector.getInstance(PluginManager.class).installPlugin(new HiveHadoop2Plugin());
    CatalogManager catalogManager = injector.getInstance(CatalogManager.class);
    Field connectorManagerField = CatalogManager.class.getDeclaredField("connectorManager");
    connectorManagerField.setAccessible(true);
    ConnectorManager connectorManager = (ConnectorManager) connectorManagerField.get(catalogManager);
    connectorManager.createConnection("hive", "hive-hadoop2", ImmutableMap.<String, String>builder()
        .put("hive.metastore.uri", propHiveMetastoreUri)
        .build());
    

    Once we were able to run embedded Presto without Twill, we packaged Presto with all the dependency jars into a bundle jar file to avoid dependency conflicts. Then we simply configured a Twill application to run various instances of BundledJarRunnable that ran the Presto services contained within the jar file. Below is a full example of a Twill application that uses BundledJarRunnable to run Presto’s discovery service packaged within a jar file:

    public class PrestoApplication implements TwillApplication {
    
      public static final String JAR_NAME = "presto-wrapper.jar";
      public static final File JAR_FILE = new File("presto-wrapper-1.0-SNAPSHOT.jar");
    
      @Override
      public TwillSpecification configure() {
        return TwillSpecification.Builder.with()
          .setName("PrestoApplication")
          .withRunnable()
          .add("Discovery", new BundledJarRunnable())
          .withLocalFiles().add(JAR_NAME, JAR_FILE.toURI(), false).apply()
          .anyOrder()
          .build();
      }
    
      public static void main(String[] args) {
        if (args.length < 1) {
          System.err.println("Usage: PrestoApplication <zookeeper connection string>");
          return;
        }
    
        String zkStr = args[0];
    
        final TwillRunnerService twillRunner =
          new YarnTwillRunnerService(
            new YarnConfiguration(), zkStr);
        twillRunner.startAndWait();
    
        // configure BundledJarRunnable
        BundledJarRunner.Arguments discoveryArgs = new BundledJarRunner.Arguments.Builder()
            .setJarFileName(JAR_NAME)
            .setLibFolder("lib")
            .setMainClassName("com.continuuity.presto.DiscoveryServer")
            .setMainArgs(new String[] { "--port", "8411" })
            .createArguments();
    
        // run Twill application
        final TwillController controller = twillRunner.prepare(new PrestoApplication())
            .withArguments("Discovery", discoveryArgs.toArray())
            .addLogHandler(new PrinterLogHandler(new PrintWriter(System.out, true)))
            .start();
    
        Runtime.getRuntime().addShutdownHook(new Thread() {
          @Override
          public void run() {
            controller.stopAndWait();
          }
        });
    
        try {
          Services.getCompletionFuture(controller).get();
        } catch (InterruptedException e) {
          e.printStackTrace();
        } catch (ExecutionException e) {
          e.printStackTrace();
        }
      }
    }
    

    As you can see, once you can launch your application from Java code, Twill makes it straightforward to wrap that code in a Twill application and run it inside a YARN container.

    Adding new features to Twill

    During the process of getting Presto to run over Twill, we contributed a couple of new features to Twill to make it easier for anyone to implement applications with similar needs: we added support for running Twill runnables within a clean classloader, and we are currently working on allowing users to deploy Twill runnables on unique hosts. In the future, we plan to open-source our Presto work so that anyone can spin up their own Presto services in YARN, and we are also considering support for Presto in Reactor to speed up ad-hoc queries.

    Apache Twill is undergoing incubation at the Apache Software Foundation. Help us make it better by becoming a contributor.

    Introducing Continuuity Loom: Modern Cluster Management

    Mar 20 2014, 11:28 am

    Jonathan Gray, CEO & Co-founder of Continuuity, is an entrepreneur and software engineer with a background in open source and data. Prior to Continuuity, he was at Facebook working on projects like Facebook Messages. At the startup Streamy, Jonathan was an early adopter of Hadoop and became an HBase committer.

    Our flagship platform, Continuuity Reactor, utilizes several technologies within the Apache Hadoop™ ecosystem, so as we started building it, our developers needed an easy way to test Reactor on real distributed systems. They needed a fast, self-service way to provision clusters, both on our own hardware and in public clouds, because testing against real clusters is such a critical and frequent component of our development cycle. Our vision was to enable every Continuuity developer to quickly create a cluster with the push of a button.

    We started off with simple scripts. But scripts had multiple issues - they required a lot of setup, there was no central management keeping track of available clusters, and the scripts had to be changed whenever we moved from one cloud to another, or one software deployment to another. Next, we started looking at different SCM (software configuration management) technologies and picked Chef. Combining scripts and Chef recipes, we were able to get to a one-button push, but it became extremely complex for two reasons - each new type of cluster required an increasing number of customizations, and a significant amount of setup was needed on every developer’s box.

    When we launched Reactor, we wanted to enable developers to deploy our single-node Sandbox Reactor to a public cloud directly from the website. This forced us to think harder about designing a complete production system while still providing an easy and flexible solution for our developers. We wrote a combination of scripts and Chef recipes, designed a system that could templatize Hadoop clusters, added capabilities to make it extensible to other services and providers, and made it very developer and ops friendly - that’s when Loom was born.

    Today, Loom is used to provision Sandbox Reactors on the Continuuity Cloud and as an internal DevOps tool to test new and incremental features on an ongoing basis. We’ve been developing and using Loom for almost a year and we’re noticing the benefits: our developer and DevOps productivity has doubled, and our development and test cycles have become faster.

    We built Loom for ourselves. But today, we’re open sourcing it for anyone to use because we believe that the broader developer community can benefit from it.

    What is Continuuity Loom?

    Continuuity Loom is cluster management software that provisions, manages, and scales clusters on public and private clouds. Clusters created with Loom utilize templates of any hardware and software stack, from simple standalone LAMP-stack servers and traditional application servers like JBoss to full Apache Hadoop clusters of thousands of nodes. Clusters can be deployed across many cloud providers (Rackspace, Joyent, etc.) and virtualization platforms (like OpenStack) while utilizing common SCM tools (Chef, scripts, etc.).

    Loom from Your Laptop

    Developers can use Loom to provision and decommission a cluster directly from their laptop. Loom provides a very simple user API that lets the developer choose the type of cluster, how many nodes, and which cloud; a hypothetical example of what such a request might look like is sketched below. Any developer can go from zero to Hadoop, from their laptop to any cloud, in minutes.
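
    The sketch below is purely illustrative: the endpoint, port, headers, and JSON fields are assumptions rather than Loom’s documented API, so treat it as the shape of the interaction and consult the Loom documentation for the real details.

        import java.io.OutputStream;
        import java.net.HttpURLConnection;
        import java.net.URL;
        import java.nio.charset.StandardCharsets;

        public class LoomProvisionExample {
          public static void main(String[] args) throws Exception {
            // Hypothetical endpoint and fields; the real URL, port, headers, and
            // JSON schema should be taken from the Loom documentation.
            URL url = new URL("http://loom-server:55054/v1/loom/clusters");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setRequestProperty("X-Loom-UserID", "developer1"); // placeholder user
            conn.setDoOutput(true);

            // Ask for a 5-node Hadoop cluster from a named template on a chosen provider.
            String body = "{\"name\":\"dev-hadoop\","
                + "\"clusterTemplate\":\"hadoop-distributed\","
                + "\"numMachines\":5,"
                + "\"provider\":\"rackspace\"}";
            OutputStream out = conn.getOutputStream();
            out.write(body.getBytes(StandardCharsets.UTF_8));
            out.close();

            System.out.println("Loom responded with HTTP " + conn.getResponseCode());
          }
        }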

    Worry-Free Provisioning

    Loom simplifies the installation and configuration of any software stack, including Hadoop, and ensures that all installations are verified before a cluster is made available. Developers can create custom cluster types ranging from single VM application servers to large-scale, distributed, heterogeneous clusters.

    Download Loom for Free

    As a developer, building Big Data applications using Continuuity Reactor is just one part of the equation. The other part involves provisioning and managing your Hadoop cluster, and deploying your applications to it. With Loom, we’re empowering DevOps in a truly self-service way. Download and run Loom on a Mac laptop (currently experimental), and spin up Hadoop clusters and other software stacks across any private or public cloud.

    Open Sourcing Loom

    Loom is a relatively new project but highly extensible in the providers and services that can be supported. Open sourcing Loom allows others to add support for new software stacks, currently unsupported recipes/scripts, and new infrastructure/provisioning systems. Help us make Loom better by contributing here and please report any issues, bugs, or ideas.

    Loom for the Enterprise

    In many companies, developers must submit requests to individuals in other parts of the organization to obtain a Hadoop cluster or to run distributed software on a group of machines. This process can be incredibly slow and burdensome for both the developer and IT, decreasing developer productivity and increasing IT overhead. With Loom, the IT department can codify reference architectures and maintain a centrally controlled catalog of cluster templates that can be provisioned directly by developers. IT administrators then use a centralized dashboard to monitor and manage multiple clusters concurrently across multiple clouds.

    Download Loom to start managing and provisioning your own clusters today, or contact sales if you need enterprise support for Loom.

    Continuuity Reactor 2.1: New Developer Experience and Windows Support

    Mar 6 2014, 8:04 am

    Sreevatsan Raman is a software engineer at Continuuity where he is building and architecting a platform fueling the next generation of Big Data applications. Prior to Continuuity, Sree designed and implemented big data infrastructure at Klout, Nominum, and Yahoo!

    Last November at Strata + Hadoop World 2013, we came out of public beta and launched Continuuity Reactor 2.0. Our goal was to enable developers and businesses to deploy production Big Data applications on our platform. Since then, we’ve been maniacally focused on making our developer experience better for all Java developers, with or without Hadoop experience. Today, we’re happy to announce the next version of Continuuity Reactor: 2.1.

    The Continuuity Reactor 2.1 release includes the following new features:

    New Developer Experience

    We completely revamped our developer documentation and out-of-the-box experience to make it easier for Java developers–who may not be familiar with Hadoop or Big Data–to get started with Continuuity Reactor. We added real-world examples for you to get a quick and simple overview of our platform, a helpful guide to the Reactor user interface, and a streamlined SDK download so you can get started without missing a beat.

    Windows Support

    Some developers build Big Data applications on Windows machines. So, we added support for running single-node Reactor on 64-bit Windows (Windows XP and above).

    Additional Features

    The new release adds support for HBase 0.96, which brings features like improved scalability, lower mean-time-to-recovery, and support for larger clusters (hundreds of nodes and more). We also fixed a number of bugs and made our platform more stable.

    We are working hard to solve the problems faced by both new and experienced Big Data developers. Reactor unifies the capabilities you need, in an integrated developer experience, without the worries of distributed system architectures or scalability. You will hear more from us as we progress, and we would love to hear from you. Feel free to leave a comment or send us feedback at support@continuuity.com.

    Download the Continuuity Reactor 2.1 SDK and visit the developer documentation to get started.

    Happy hacking!

    —Sree

    What do you do at Continuuity, again? Part 2

    Feb 4 2014, 8:54 am

    Alex Baranau is a software engineer at Continuuity where he is responsible for building and designing software fueling the next generation of Big Data applications. Alex is a contributor to HBase and Flume, and has created several open-source projects. He also writes frequently about Big Data technologies.

    Let’s make it awesome!

    In our previous post, we introduced the basics of our Continuuity platform by using a simple example of a real-time processing application. In this post, we’ll take a step forward and introduce you to the details of building a non-trivial, real-time, data processing application.

    Read More

    Programming with Apache Twill*, Part II

    Jan 21 2014, 10:41 am

    Terence Yim is a Software Engineer at Continuuity, responsible for designing and building realtime processing systems on Hadoop/HBase. Prior to Continuuity, Terence spent over a year at LinkedIn Inc. and seven years at Yahoo!, building high performance large scale distributed systems.

    In the Programming with Weave (now Apache Twill), Part I blog post, we introduced the basics of writing a distributed application on Hadoop YARN using Twill. In this post, we are going to highlight some of the important features in Twill.

    Read More

    What do you do at Continuuity, again?

    Nov 20 2013, 12:10 pm

    As Continuuity gets more traction, my friends ask me what I do at Continuuity. The short answer is - we’ve created a platform that makes building Big Data applications easier. Let me try to give you more details with a short example. Let’s imagine you need to implement an app.

    The app

    This example app is very simple. I won’t argue why one should implement it; the point of the example is to walk you through the developer experience of implementing a Big Data app using Continuuity Reactor.

    Read More

    Twill, formerly Weave, accepted into the Apache Incubator

    Nov 14 2013, 10:51 pm

    For the past few years, applications have been generating hundreds of petabytes of data. Analyzing this data can create real business value, but until recently businesses had discarded the data because it was either too hard to analyze or too expensive to store using traditional relational databases.

    The answer to this problem is a free, open-source technology running on commodity hardware: Apache Hadoop. Hadoop makes it cheap to store Big Data and easy to extract valuable insights using a batch-driven analysis method called MapReduce. It turns every web app into a data-driven app.

    Read More

    Programming with Weave, Part I

    Nov 11 2013, 10:36 am

    Read first blog post of the Weave series

    In this second blog post of the Weave series, we would like to show you how writing a distributed application can be as simple as writing a standalone Java application using Weave.

    Writing applications to run on YARN

    Before we dive into the details about Weave, let’s talk briefly about what a developer has to do in order to write a YARN application, other than standard MapReduce. A YARN application always consists of three parts:

    Read More

    Strata + Hadoop World 2013: My Perspective

    Nov 6 2013, 5:34 pm

    “Tech Geeks. Your chariot awaits: SFO->NYC”. As I drove past this billboard on the 101, I felt very excited. The entire team had been working very hard for the past 2 months. This was the moment of truth. We were going to announce the general availability of Continuuity Reactor 2.0 and our strategic relationship with Rackspace. The team would get to meet actual customers and developers, and see Reactor 2.0 in action in the real world.

    Read More
