Many of you who read this blog loyally have complained recently that it's not being updated enough. Well, there's been a good reason for that: here at ETI HQ we've not only been focused on delivering the Badaboom™ Media Converter and RapiHD™ Accelerator for Adobe Premiere® Pro, we've also been working on solidifying the capital base underneath Elemental. Today I'm very happy to announce that we have closed our Series A financing round, bringing in two top-flight VCs and $7.1M to help us deliver on the promise of massively parallel computing. Check out the press release here.
The two we selected (ok, maybe they selected us as well :) ) were General Catalyst Partners, based in Boston, MA, and Voyager Capital, based in Seattle, WA. The capital will fuel our growth, but more importantly we are thrilled to have Neil Sequeira from GC and Erik Benson from Voyager joining our board of directors. Neil has deep domain expertise in the Internet media space, and Erik is one of the smartest, best-connected VCs in the Pacific Northwest. I have no doubt they will help Elemental achieve great things.
Speaking of which, they have already added value by helping us flesh out our board of directors. Along with Neil and Erik, Frank Gill has joined the board. A former Executive Vice President at Intel Corporation, Frank provides a shot in the arm to our marketing, sales, and business development acumen. Bob Greenberg from the Oregon Angel Fund will remain on the board, providing deep technical insight. And Bruce Chizen, most recently CEO of Adobe Systems, will join the board in an observer capacity. Under Bruce's leadership, Adobe built tremendously innovative technologies including Flash and Acrobat; we are looking forward to following in those footsteps with RapiHD.
Finally, Roy Coppinger will transition off the board into an advisory role. Roy has been with us at Elemental since day 1 -- actually, since day negative 30 or so -- and we wouldn't be where we are today without his patient counsel. Thanks for all your hard work, Roy!
We are excited to begin the next chapter in Elemental's story. And I promise that the blog will be updated more frequently from now on.
The Elemental team showed off RapiHD at the National Association of Broadcasters (NAB) show in April in Las Vegas. Our marketing superstar Monica was responsible for organizing everything, and made sure that we had all our ducks in a row. Assuming you've prepared diligently, these shows are a great opportunity to meet with a variety of prospective customers and business partners. We learned a few things along the way, and thought we would share some of them here to give a head start to any startups trying to pull off a smooth show.
T – x months: Book the space. Key learning here: check the site often. Booth assignments are in constant flux, so the more often you check, the better your chances of getting a good spot, one that is visible from the main aisle or near a larger company whose product has synergy with your own.
T – 4 months: Booth design. We decided to rent our booth since we were pretty sure we would want a larger, different booth in coming years. When you are moving at start-up speed, it is hard to believe anything will last for longer than a few months. Finding a rental booth that stands out is tough, but we believe we got one that did just that. A big part of it was making sure we had eye-catching graphics and deciding to have a screen with our demo running; but more on that later.
T – 3 months: Demos, Demos, Demos. Since we had accomplished something that had long been talked about but never achieved, demonstrations were critical to our booth. We had a small amount of space and lots of different things we could show. We decided to make one of our demos self-running in a loop and put it on a screen mounted on a stand about 7 feet off the ground. That left room for two more stations in the booth for interactive demos. The issues around speed of video processing seemed to resonate with all the attendees. It took demos of RapiHD running 7X faster than a CPU-only solution to convince most people that this can be done with off-the-shelf GPUs.
T – 3 months: Graphics. We knew that a large, eye-catching graphic would be required to draw people's eyes on the crazy NAB show floor. Shopping for a high-quality graphic that would look great at a large scale, and designing the booth to incorporate enough detail to explain ETI’s RapiHD™ technology, took some thought and careful planning.
T – 2 months: Trinkets. Having high-quality schwag is a core ETI value, so getting it picked out, ordered, and shipped in time to be ready in Vegas is critical. We went with green stress balls that kinda matched the ETI ball brandmark.
T – 2 months: Tagline. This is harder than you think: it has to be set in a font large enough for people to see, yet still say something meaningful and descriptive. We came up with "Hitting the sweet spot for video processing."
T – 1 month: Decisions, decisions, decisions. Padding thickness, carpet color, trash service, electrical, internet connections, stools, AV rentals, and much, much more. There are tons of things that you can buy at trade shows. Some you will decide are not worth it, but the tip here is that whatever you decide to do, get your paperwork in before the deadline where prices go up, because there is a steep penalty for being late.
T – 2-4 weeks: Booking meetings. We sent out our invitation about a month before the show, but discovered that the critical meeting booking time is about 2 weeks before the show. There is a key window you must hit, when people are ready to think about their schedule but before their dance card is filled.
T – 2 weeks: Uniforms. We bought polo shirts and had our logo stitched on them so that people would know who we are.
T – 1 week: Packing list. Make a packing list of everything you need to set up your booth and run your demos. Don’t forget to take Windex, tape, scissors, power strips, and twist ties. Give yourself a full day for setup, as there will be a variety of emergencies.
And then ... voilà! ... it's 8 AM on Day 1, and customers are streaming towards your booth!
Enjoy! Although there is a lot of work and months of preparation, a trade show is a great experience and you can get a lot accomplished in a very short amount of time.
3000 Cray1 supercomputers in your PC
If you are a software developer, the thought of half a teraflop of performance in your PC for around $250 starts to get the creative juices flowing. You start thinking, "What can I do with all that power?" and "How do I program it to do what I want?" NVIDIA and AMD are giving us 10s, 100s, and soon thousands of processors that will execute in parallel ... now what?!
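For the curious, here is the back-of-the-envelope arithmetic behind a headline like this one. It is only a sketch: the half-teraflop figure comes from the paragraph above, while the Cray-1 number is an assumption (the commonly cited peak of roughly 160 MFLOPS), and the function name is made up for illustration.

```python
# "Half a teraflop", as quoted in the post.
GPU_FLOPS = 0.5e12

# Assumption: commonly cited Cray-1 peak of about 160 MFLOPS.
CRAY1_PEAK_FLOPS = 160e6


def cray1_equivalents(gpu_flops=GPU_FLOPS, cray1_flops=CRAY1_PEAK_FLOPS):
    """Return how many Cray-1 machines match the GPU's peak throughput."""
    return gpu_flops / cray1_flops


if __name__ == "__main__":
    print(cray1_equivalents())  # 3125.0
```

Under that assumption the ratio comes out at 3,125, which rounds to the 3000 in the title. Peak numbers flatter both machines, so treat it as a rough comparison only.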
The Serial Universe
We have spent most of our careers writing software that reads like a book. Line by line, the software code tells the processor what to do, and the processor happily executes it serially. When the program is complete, the processor obediently waits for the next program to run. Operating systems (OS) such as Windows, Linux, and Mac OS have helped us run many programs on one processor. Programs that need input or output from keyboards, printers, disk drives, or other programs are scheduled by the operating system to run on the processor in the computer. In this way, the OS lets many programs or users share the one processor in a fair and efficient way. Support for multiple processors in the OS has been increasing, but to get the highest performance out of an application, it must be written properly to take advantage of the processors and memory cache. Intel and AMD have built incredible serial processors that execute the majority of the applications on a PC today, but they are hitting performance barriers in clock speed and are adding processors to compensate. There is a new methodology on the horizon...
The Parallel Universe
Parallel programming is not a new idea. Ever since someone connected two computers together, people have been trying to run applications on multiple processors. What has changed, however, is that it is not only possible to get 128 processors on a single chip, but these devices are on their way to becoming ubiquitous. The devices contain many simple, general-purpose processors that can execute instructions in parallel as fairly separate entities. NVIDIA has also provided an excellent language for programming them all, called CUDA (Compute Unified Device Architecture). It allows us to write a "program" that will run in parallel on as many processors as are available in the GPU. The language is very similar to the familiar C language, with some simple extensions. CUDA allows us to solve problems using grids of one-, two-, or three-dimensional groups of threads (CUDA blocks). The threads in each block can share a block of memory, and thus work together on a subset of the task at hand. Like an operating system, the device also includes a scheduler that selects a set of physical processors to run each CUDA block in parallel, which is an excellent level of abstraction.
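To make the grid-and-block idea concrete, here is a minimal sketch. It is written in plain Python rather than CUDA so that it runs anywhere, and it executes the "threads" one after another instead of in parallel. The names `launch`, `scale_kernel`, and `scale_on_grid` are hypothetical and are not part of CUDA. What carries over is the indexing pattern: each thread works out which element it owns from its block index and its thread index within the block.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Call kernel once per (block, thread) pair.

    A GPU would run these calls in parallel; this stand-in runs them serially.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def scale_kernel(block_idx, block_dim, thread_idx, data, factor, out):
    """One 'thread': multiply the single element this thread owns."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(data):  # the last block may have threads with no element
        out[i] = data[i] * factor


def scale_on_grid(data, factor, block_dim=128):
    """Scale every element using a 1-D grid of 1-D blocks."""
    out = [0] * len(data)
    # Round up so the grid has enough blocks to cover every element.
    grid_dim = (len(data) + block_dim - 1) // block_dim
    launch(scale_kernel, grid_dim, block_dim, data, factor, out)
    return out


if __name__ == "__main__":
    print(scale_on_grid([1, 2, 3], 2, block_dim=2))  # [2, 4, 6]
```

The kernel never loops over the data. It handles exactly one element, and the size of the grid decides how much work gets done, which is why the same "program" can spread across however many processors the device has.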
The Patio Problem
I often get asked, "What is it like programming the GPU, and how is it different from the CPU?" I find I am always telling this story to explain it. I hope it helps! :)
Around the time of founding Elemental, my wife and I decided we would build a patio in our front yard. It seemed simple enough: cut out all the grass, level the patio area, dig a French drain around the patio, lay gravel, lay sand, lay patio blocks (all 3000 of them), put in a simple sprinkler system, tamp it all down, plant the plants, and we would be having margaritas in the sun in a couple of weekends! Well, three months later, I could say I was done. Ugh.
Around this time, Brian and I had been writing the MPEG2 decoder on CUDA using the NVIDIA 8800GTX, which has 128 processors. Thanks to the long hours and intense concentration typical of a startup, everywhere I looked I would see blocks and grids and pixels: in my sleep, in buildings, and especially… in my patio. I realized there were very direct parallels between building the MPEG2 decoder and building that patio. The little springtime project that I will now refer to as the "Patio Problem" nicely illustrates the issues we've faced in developing codecs on the GPU.
As I was cursing the "Patio Project" on the 4th weekend of spring, I thought to myself, "How would I do this if I had 128 friends helping me?" I couldn't just turn them loose on the project. Even if each person knew the high-level goal and plan for the patio, it would take great coordination (and a lot of tasty beverages) to get 128 people to finish my patio faster than my wife and I could do it. The first step, cutting out the grass, would mean putting each person on one little square of grass and having them cut it out. This would be very efficient, and would get the grass out very quickly. Similarly, leveling the ground and digging a French drain could be done effectively as long as each worker knew how far down they had to dig. But what about getting the grass and dirt out and the gravel in? They would need to move the material in and out of the patio area in smaller groups so as not to bump into each other. Then comes the tricky part: how do all of those people lay the bricks in the right pattern efficiently? Do I have each of them grab one brick and lay it down in the correct spot, which may depend on previously laid bricks? Do I have each lay down a line of bricks, or have small groups do a line? Or do I have two groups start on opposite ends and work toward each other?
Building the patio and building codec technology (and presumably many other applications) on a GPU are very similar. Some parts of a codec are easily solved using many parallel processors, but some parts are more difficult and require serialization of groups of processors. The NVIDIA devices are incredibly flexible in what each processor can do, which allows us to come up with clever ways to solve the codec problems effectively. Just because we have a supercomputer on a chip doesn't mean that it solves the problems for us, just as having 128 friends offer to help me on the "Patio Project" won't automatically get it done faster. Now, if I only had 128 friends, think what I could accomplish!
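Here is a rough way to put numbers on the patio. This is a sketch under stated assumptions, not how our decoder works. The function names are hypothetical, the patio is imagined as a 50 x 60 rectangle of blocks (which gives the 3000 mentioned above), and the brick-laying rule is invented for illustration: a brick can only go down once the bricks directly above it and to its left are in place. Under that rule, bricks on the same diagonal do not depend on each other, so one diagonal can be laid per step (sometimes called a wavefront).

```python
import math


def rounds_independent(num_tiles, workers):
    """Rounds needed when every tile can be handled on its own (cutting grass)."""
    return math.ceil(num_tiles / workers)


def rounds_wavefront(rows, cols, workers):
    """Rounds needed when brick (r, c) must wait for the bricks above and to its left.

    Bricks with the same r + c sit on one anti-diagonal and are independent,
    so each diagonal is one parallel step, split into several rounds only if
    it holds more bricks than there are workers.
    """
    total = 0
    for d in range(rows + cols - 1):
        r_min = max(0, d - (cols - 1))  # first row that reaches this diagonal
        r_max = min(rows - 1, d)        # last row that reaches this diagonal
        bricks_on_diagonal = r_max - r_min + 1
        total += math.ceil(bricks_on_diagonal / workers)
    return total


if __name__ == "__main__":
    print(rounds_independent(50 * 60, 128))  # 24
    print(rounds_wavefront(50, 60, 128))     # 109
```

With 128 friends, cutting the grass takes 24 rounds, but laying the same 3000 blocks under the dependency rule takes 109, because no diagonal is ever long enough to keep everyone busy. The same helpers, the same amount of work, and a very different speedup: that is the patio problem in a nutshell.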
The Shift
This fundamental shift in thinking is a hurdle that software developers will need to overcome in order to solve problems using parallel processors. There will be advances in tools and parallel languages, but there is no substitute for understanding this method of solving problems. Some of the best writers of parallel processor software may very well be tucked away among landscapers, general building contractors, and assembly line managers. If you are someone who would like to join the shift in thinking, let us know!